STAT 666: Multivariate Statistical Methods
Instructor
William F. Christensen
E-mail: william@stat.byu.edu
Phone Number: 801-422-7057
Office Location: 237 TMCB
Office Hours: MW: 10:00-10:50AM; Th: 3:00-3:30PM; or by appointment.
Announcements
12/9/2019: Here are the daily quizzes for the semester: DailyQuiz666.pdf
11/26/2019: The term project presentation schedule is posted on the calendar below.
10/31/2019: Here is the link to the distance matrix and dendrogram for the personality analysis example I showed today: Example Dendrogram
10/23/2019: HW and Mini-project #3 dates have been updated. Due to time constraints, we will not cover section V.A.iii this semester.
10/10/2019: Here are the daily quizzes for the semester so far: DailyQuiz666f19.pdf
9/3/2019: Webpage for ggobi download is http://www.ggobi.org/. Note that ggobi has not been recently updated (appears to be unsupported). Those who cannot get ggobi working can use a (less elegant) version written as the “tourr” package in R.
9/3/2019: All HW must be turned in to me or the grader by the due date unless you have received permission from me. Please do not ask the grader to grade late work unless it has my signature on it. I understand that things will happen to complicate things occasionally, but rarely should there be need for more than one or two short extensions during the semester.
9/3/2019: There is a bonus point on the final exam for everyone who fills out the student evaluation form.
Objectives
At the end of Stat 666 the student will be able to
- Demonstrate understanding of multivariate random vectors and their distributions.
- Use basic principles of probability, statistics, and linear algebra to motivate:
- the comparison of mean vectors,
- multivariate regression and canonical correlation,
- principal component analysis and factor analysis,
- classification and clustering.
- Produce a complete analysis of appropriate multivariate data using R or SAS for each of the methods of multivariate analysis listed above
- Take a multivariate data analysis “consulting” project and
- Identify appropriate statistical approaches to address the clients’ problem
- Carry out a thorough and meaningful analysis of the data
- Clearly and effectively defend and communicate their approach and findings to the client in a report
Prerequisites:
Stat 535, 624, 642.
Lectures:
1:35-2:50 p.m., TTh; 1106 JKB
Course Materials:
- Textbook: Methods of Multivariate Analysis, Third Edition (We'll call it "RC")
by A. C. Rencher & W. F. Christensen, Wiley. Available from HBLL: the pdf of the book is available to BYU students when using the HBLL on campus. Go to http://onlinelibrary.wiley.com/book/10.1002/9781118391686 or go to http://lib.byu.edu/
- Supplementary textbook (don't buy it... but it's a nice additional reference):
Applied Multivariate Statistical Analysis, Sixth Edition (we’ll call it “JW”)
by R. A. Johnson & D. W. Wichern, Prentice Hall.
- Lecture Notes: Stat 666 Lecture Notes are available below. I recommend printing 2 slides to a page (or 4 slides to a page if you’ve got great eyesight). As you prepare for class, please do the reading assignment and review the slides that are designated as out-of-class responsibility. A spreadsheet denoting which slides will be covered in class is found here. In this Excel file, each slide is marked as one of the following: In Class (important), On Own (important, but cover on your own before class), Skim (useful as background, but will not be directly examined on Midterm/Final), Omit (not covered this semester). To make it a bit simpler, I’ve denoted the slides that will not be discussed in class with blue font.
Grader
Aubrey Odom, aubreyodom55@gmail.com
Grading
Your semester grade will be determined as follows:
Component | Weight | Details |
---|---|---|
Midterm Exam | 20% | Date & Time: approx. Oct 17-22 in Testing Center (any time of day that you want; 3 hour limit) |
Final Exam | 30% | Date & Time: Mon Dec 16, 11:00 – 2:00 p.m. in classroom |
Homework | 10% | Mostly textbook and textbook-like problems |
Mini-Projects | 25% | Data analysis and report |
Final Project | 15% | Multivariate Analysis Method Report (or other approved project) |
Final grades will depend on the level of difficulty of exams and projects. Grade breakdowns will approximately follow:
- A: 93 – 100.0
- A-: 90 – 92.99
- B+: 86 – 89.99
- B: 81 – 85.99
- B-: 75 – 80.99
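As a rough illustration of how the weights and approximate cutoffs above combine, here is a minimal Python sketch. The weights and cutoffs are copied from this syllabus. The function names and the example scores are hypothetical, and because the cutoffs are only approximate, the letter it prints is an estimate and not an official grade. The syllabus lists no cutoffs below B-, so lower scores are reported simply as "below B-".

```python
# Component weights from the grading table (they sum to 100%).
WEIGHTS = {
    "midterm": 0.20,
    "final_exam": 0.30,
    "homework": 0.10,
    "mini_projects": 0.25,
    "final_project": 0.15,
}

# Approximate lower cutoffs from the grade breakdown list, highest first.
# Nothing below B- is listed in the syllabus, so nothing is assumed here.
CUTOFFS = [(93.0, "A"), (90.0, "A-"), (86.0, "B+"), (81.0, "B"), (75.0, "B-")]


def semester_score(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def approximate_letter(score):
    """Map a 0-100 score to the approximate letter grade."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "below B-"


# Hypothetical component scores, for illustration only.
example = {
    "midterm": 88.0,
    "final_exam": 91.0,
    "homework": 95.0,
    "mini_projects": 90.0,
    "final_project": 92.0,
}
total = semester_score(example)
print(f"{total:.2f} -> {approximate_letter(total)}")
```

Because the exams carry half of the total weight, a change of one point on the final exam moves the semester score by 0.3 points, three times the effect of one point on homework.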
Notes on grading of homework problems:
Please make sure that when R or SAS output is included, it is clearly highlighted and annotated. I won't sort through pages of output to verify your conclusions.
Notes on grading of mini-projects and other reports:
Your project grades will be based on two equally weighted areas: "technical" and "exposition." Below is a description of what is expected for each of the two areas. And this is the grading rubric.
- Technical
- Evidence of substantial breadth and/or depth of analysis
- Proper implementation of statistical methods
- Note: document your numerical results (test statistics, estimated parameters, etc.) in your report so that I can properly grade your performance. Don’t just show me the result of your transformation; give me the values estimated to do the transformation. Don’t just interpret the discriminant function; report the standardized discriminant function coefficients. Etc.
- Well-documented computer code (R, SAS, etc.) attached to back of report. (These pages will not count against the page limit for the report.)
- Exposition: At a level appropriate for the target audience, the report has the following qualities:
- Introduction with explanation of problem and important issues
- Motivation and justification for statistical methods being used
- Understandable interpretation and conclusions
- Professional and attractive document
- free of spelling or other writing errors
- conforming to specifications
- all included tables and figures are discussed in the text
- figures and tables are placed in the body of the text instead of in appendices
- Important findings summarized in a brief conclusion
- When introducing technical concepts, give both: (1) a technical definition (formula) AND (2) an intuitive explanation of the technical concept/statistic.
IMPORTANT NOTE: You are really writing to two audiences simultaneously. First, you are writing to your client for the purpose of solving her/his problem and explaining the solution at her/his level. Second, you are writing to your professor to demonstrate your mastery of the subject. This is a difficult task. Save at least a couple of days just for writing and revising—even masterful statistical analyses cannot salvage a poorly written report.
Term Project:
Term projects should be approved by November 1. The typical project will be based on a multivariate method/technique that extends beyond the core methods discussed in class. Term Project reports are generally 4-6 single-spaced pages, not to exceed 6 pages including figures and tables. Additionally, you will turn in your slides for your oral presentation of your project.
Some ideas for methods projects (sorted into semi-related topics) are:
Visualization, Clustering, etc.
- Multi-dimensional scaling
- Self Organizing Maps
- Canonical Correspondence Analysis & Biplots
Dimension Reduction, etc.
- Independent Component Analysis
- Projection Pursuit
- Constrained Correspondence Analysis (see vegan package in R)
- partial Redundancy Analysis aka “pRDA” (see vegan package in R)
Prediction & Classification
- Random forests (proposing extensions beyond Stat 536)
- Boosting (AdaBoost)
- Learning Vector Quantization algorithm (for reducing memory usage in KNN applications)
- Neural Networks
- Deep Learning (i.e., deep neural nets, deep belief networks; common applications in image recognition, natural language processing, etc.)
- Support Vector Machines
- Support Vector Regression
- Functional Data Analysis
Latent Structure
- Multi-group Confirmatory Factor Analysis
- Multi-group Structural Equation Modeling
- Latent Class Analysis
- Latent Trait Analysis
Big Data, Missing Data, etc.
- p >> N extensions of classical analyses
- Bayesian imputation
The intent of the term project is that you extend learning beyond what was discussed in class. Projects will incorporate at least one of the following facets:
- Applications of multivariate analysis methods to situations that extend beyond the standard scenarios. For example, applying multivariate tools to data that: exhibit temporal or spatial dependence, have p >> N problems, are ordinal in scale, are functional in nature (i.e., require functional data analysis), or are in some other way complicated or non-standard.
- Applications of multivariate methods that have NOT been discussed directly in class. In these scenarios, students might view their report as if it were a 4-6 page section of a multivariate statistical methods textbook.
In all cases, students should try to do more than replicate analyses or illustrations of methods—try to find a question about your methods that goes beyond simple description of methods or analyses. Nearly all projects will use simulation (or some other ambitious quantitative comparison of competing methods). The term project need not be publishable research, but the best projects try to address some novel question.
Tentative Schedule & Textbook Sections for RC and JW
WEEK | TOPIC & READING ASSIGNMENT (to be completed in advance) | LECTURE SLIDES (anticipated…skim the slides ahead of time and be prepared with questions—some slides will be skipped if there are no questions. See the plan for in-class coverage of slides here) | ASSIGNMENTS & EXAMS (assignment details are below) |
---|---|---|---|
Sep 3, 5 (#1) | I. INTRODUCTION & OVERVIEW A. Overview of Multivariate Concepts, Tools, and Techniques, [not in book] B. Visualizing Multivariate Concepts, [3.5, 3.13 RC] C. Data Displays, [3.3-3.4 RC; or 1.3-1.4 JW] D. Notation & Descriptive Statistics, [3.1-3.2, 3.6-3.8 RC; or 1.3 JW] II. FOUNDATIONS OF MULTIVARIATE ANALYSIS A. Brief Coverage of Some Matrix Algebra, [2 RC; or 2 and 2A JW] B. Expected Values for the Sample Mean and Covariance Matrix, [3.5-3.7 RC; or 3.3 JW] | 1.1-1.26; 2.1-2.6 | HW1a : Sept 6 |
Sep 10, 12 (#2) | C. Geometry of the Sample D. Generalized Variance, [4.1.3 RC; or 3.4 JW] E. Multivariate Normal Distribution, [4.1-4.2 RC; or 4.2 JW] F. Graphical Analysis (Assessing Multivariate Normality and Detecting Outliers), [4.4-4.6 RC; or 4.6-4.7 JW] H. Sampling Distributions of Sample Mean and Sample Covariance Matrix, [4.3 RC; or 4.4-4.5 JW] | 2.7-2.68 | HW1b : Sept 13 |
Sep 17, 19 (#3) | I. EM Algorithm and Missing Data, [3.12 RC; or 5.7 JW] J. Multiple Imputation III. MULTIVARIATE STATISTICAL INFERENCE A. Inference for a Mean Vector i. Hotelling’s T², [5.3 RC; or 5.2-5.3 JW] ii. Confidence Regions, [5.2 RC; or 5.4 JW] B. Comparison of Several Multivariate Means i. Paired Comparisons, [5.7 RC; or 6.2 JW] ii. Two-Sample Comparisons, [5.4 RC; or 6.3 JW] | 3.1-3.41 | |
Sep 24, 26 (#4) | iii. MANOVA, [6.1-6.6 RC and 8.4-8.8 RC; or 6.4-6.6 JW] | 3.42-3.79, 5.3-5.9 | Mini-Project #1: Sep 26 |
Oct 1, 3 (#5) | (continued) v. Repeated Measures Analysis [6.9 RC] | 3.87-3.110 | HW2 : Oct 1 |
Oct 8, 10 (#6) | (continued) C. Regression & Correlation i. Multivariate Multiple Regression, [10.4-10.8 RC; or 7.7 JW] | 3.116-3.155 | HW3a : Oct 8; Mini-Project #2: Oct 11 |
Oct 15, 17 (#7) | (continued) ii. Seemingly Unrelated Regressions iii. Canonical Correlation Analysis, [11.1-11.6 RC; or 10.1-10.4 JW] | 3.156-3.167 | HW3b : Oct 16; Midterm: Oct 17-22 |
Oct 22, 24 (#8) | V. CLASSIFICATION AND CLUSTERING C. Clustering, [15 RC; or 12.1-12.4 JW] A. Discriminant (& Classification) Analysis ii. Foundational Classification Tools: LDA, QDA, KNN [9 RC; or 11.1-11.7 JW] | 5.1-5.2, 5.51-5.62, 5.10-5.20 | |
Oct 29, 31 (#9) | (continued) a. b. | 5.21-5.31, | Term Project Proposal: Nov 1 |
Nov 5, 7 (#10) | c. d. IV. ANALYSIS OF COVARIANCE STRUCTURE A. Principal Components, [12.1-12.8 RC; or 8.1-8.5 JW] | 4.1-4.18 (read ahead…Q&A in class), | HW4 : Nov 5 |
Nov 12, 14 (#11) | B. Factor Analysis i. Exploratory Factor Analysis, [13.1-13.7 RC; or 9.1-9.5 JW] | 4.19-4.48 | |
Nov 19, 21 (#12) | ii. Confirmatory Factor Analysis, [14 RC] | 4.49-4.69 | Mini-Project #3: Nov 19 |
 | --- Thanksgiving Break --- | | |
Dec 3, 5 (#13) | (continued) iii. Structural Equation Modeling | HW6a : Dec 3 | |
Dec 10, 12 (#14) | FINAL PROJECT PRESENTATIONS (8 min): Tue 12/10: Hannah, Hyejung, Aubrey, Brittany, Zoe, Dean, Matt; Thur 12/12: Jeremy, Timo, Wendy, Celeste, SpencerE, Shelby, SpencerN | | Term Project Report: Due Dec 10 at 1:30 pm; Term Project Presentation Slide Printout: Due at 1:30 pm on your presentation day; HW6b : Dec 12 |
Dec 16 | FINAL EXAM: Mon Dec 16, 11 am – 2 pm | | Final: Dec 16, 11am-2pm |
HW ASSIGNMENTS
HW #1a – Due Fri Sep 6 (in my office or the grader's)
HW #1b – Due Fri Sep 13 (in my office or the grader's)
Mini-Project #1 project1.pdf – Due Thur Sep 26 (in my office or the grader's)…make sure to read “Notes on grading of mini-projects and other reports” posted above before writing this mini-project. Partially-censored rubric
HW #2 – Due Tue Oct 1 (in my office or the grader's)
HW #3a – Due Tue Oct 8 (in my office or the grader's)
Mini-Project #2 project2.pdf – Due Fri Oct 11 (in my office or the grader's). Partially-censored rubric
HW #3b – Due Wed Oct 16 (in my office or the grader's)
Midterm Exam – Testing Center Oct 17-22
Term Project Proposal – Must be approved in person by Nov 1
HW #4 – Due Tue Nov 5 (in my office or the grader's)
Mini-Project #3 project3.pdf – Due Tue Nov 19 (in my office or the grader's). This summary is useful, or you can dig into the whole dissertation if you’re curious. Partially-censored rubric
HW #6a – Due Tue Dec 3 (in my office or the grader's). (some partial code to assist: hw6.R and hw6hints.sas).
Term Project – Due Tue Dec 10 at the beginning of class. A printout of your presentation slides is due the day of your presentation.
HW #6b – Due Thur Dec 12 (in my office or the grader's). (some partial code to assist: hw6.R and hw6hints.sas).
DATA
globalterrorismdb_0718dist.csv (further background at https://www.kaggle.com/START-UMD/gtd)
CLASS EXAMPLES
SectI.R -- Updated on 9/4/2019 at 4:00 pm
SectII.R -- Updated on 9/4/2019 at 4:30 pm
HONOR CODE STANDARDS
In keeping with the principles of the BYU Honor Code, students are expected to be honest in all of their academic work. Academic honesty means, most fundamentally, that any work you present as your own must in fact be your own work and not that of another. Violations of this principle may result in a failing grade in the course and additional disciplinary action by the university.
Students are also expected to adhere to the Dress and Grooming Standards. Adherence demonstrates respect for yourself and others and ensures an effective learning and working environment. It is the university’s expectation, and my own expectation in class, that each student will abide by all Honor Code standards. Please call the Honor Code Office at 422-2847 if you have questions about those standards.
PREVENTING SEXUAL DISCRIMINATION OR HARASSMENT
Title IX of the Education Amendments of 1972 prohibits sex discrimination against any participant in an educational program or activity that receives federal funds. The act is intended to eliminate sex discrimination in education and pertains to admissions, academic and athletic programs, and university-sponsored activities. Title IX also prohibits sexual harassment of students by university employees, other students, and visitors to campus. If you encounter sexual harassment or gender-based discrimination, please talk to your professor, contact the Equal Employment Office at 801-422-5895 or 1-888-238-1062 (24-hours), or http://www.ethicspoint.com; or contact the Honor Code Office at 801-422-2847.
STUDENTS WITH DISABILITIES
If you have a disability that may affect your performance in this course, you should get in touch with the office of Services for Students with Disabilities (1520 WSC). This office can evaluate your disability and assist the professor in arranging for reasonable accommodations.