STAT 666: Multivariate Statistical Methods


Instructor

William F. Christensen
E-mail: william@stat.byu.edu
Phone Number: 801-422-7057
Office Location: 237 TMCB
Office Hours: MW: 10:00-10:50AM; Th: 3:00-3:30PM; or by appointment.

Announcements

12/9/2019: Here are the daily quizzes for the semester: DailyQuiz666.pdf

11/26/2019: The term project presentation schedule is posted on the calendar below.

10/31/2019: Here is the link to the distance matrix and dendrogram for the personality analysis example I showed today: Example Dendrogram

10/23/2019: HW and Mini-project #3 dates have been updated. Due to time constraints, we will not cover section V.A.iii this semester.

10/10/2019: Here are the daily quizzes for the semester so far: DailyQuiz666f19.pdf

9/3/2019: The ggobi download page is http://www.ggobi.org/. Note that ggobi has not been updated recently and appears to be unsupported. Those who cannot get ggobi working can use a (less elegant) alternative, the “tourr” package in R.

9/3/2019: All HW must be turned in to me or the grader by the due date unless you have received permission from me. Please do not ask the grader to grade late work unless it has my signature on it. I understand that complications will arise occasionally, but there should rarely be a need for more than one or two short extensions during the semester.

9/3/2019: There is a bonus point on the final exam for everyone who fills out the student evaluation form.


Objectives

At the end of Stat 666 the student will be able to

  1. Demonstrate understanding of multivariate random vectors and their distributions.
  2. Use basic principles of probability, statistics, and linear algebra to motivate:
    • the comparison of mean vectors,
    • multivariate regression and canonical correlation,
    • principal component analysis and factor analysis,
    • classification and clustering.
  3. Produce a complete analysis of appropriate multivariate data using R or SAS for each of the methods of multivariate analysis listed above.
  4. Take a multivariate data analysis “consulting” project and
    • identify appropriate statistical approaches to address the client’s problem,
    • carry out a thorough and meaningful analysis of the data,
    • clearly and effectively defend and communicate their approach and findings to the client in a report.

Prerequisites:

Stat 535, 624, 642.


Lectures:

1:35-2:50 p.m., TTh; 1106 JKB


Course Materials:

  • Textbook: Methods of Multivariate Analysis, Third Edition (We'll call it "RC")

    by A. C. Rencher & W. F. Christensen, Wiley. Available from HBLL: the pdf of the book is available to BYU students when using the HBLL on campus. Go to http://onlinelibrary.wiley.com/book/10.1002/9781118391686 or go to http://lib.byu.edu/

  • Supplementary textbook (don't buy it... but it's a nice additional reference): Applied Multivariate Statistical Analysis, Sixth Edition (we’ll call it “JW”)

    by R. A. Johnson & D. W. Wichern, Prentice Hall.

  • Lecture Notes: Stat 666 Lecture Notes are available below. I recommend printing 2 slides to a page (or 4 slides to a page if you’ve got great eyesight). As you prepare for class, please do the reading assignment and review the slides that are designated as out-of-class responsibility. A spreadsheet denoting which slides will be covered in class is found here. In this Excel file, each slide is marked as one of the following: In Class (important), On Own (important, but cover on your own before class), Skim (useful as background, but will not be directly examined on Midterm/Final), Omit (not covered this semester). To make it a bit simpler, I’ve denoted the slides that will not be discussed in class with blue font.


Grader

Aubrey Odom, aubreyodom55@gmail.com


Grading

Your semester grade will be determined as follows:

  • Midterm Exam (20%): approx. Oct 17-22 in the Testing Center (any time of day that you want; 3-hour limit)
  • Final Exam (30%): Mon Dec 16, 11:00 a.m. – 2:00 p.m. in the classroom
  • Homework (10%): Mostly textbook and textbook-like problems
  • Mini-Projects (25%): Data analysis and report
  • Final Project (15%): Multivariate Analysis Method Report (or other approved project)

Final grades will depend on the level of difficulty of exams and projects. Grade breakdowns will approximately follow:

  • A: 93 – 100.0
  • A-: 90 – 92.99
  • B+: 86 – 89.99
  • B: 81 – 85.99
  • B-: 75 – 80.99
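
As a rough illustration of how the weights above combine into a semester score, here is a minimal Python sketch. The function and variable names are illustrative, the component scores are hypothetical, and the letter cutoffs are only the approximate ones listed above, so an actual final grade may differ.

```python
# Weights from the grading breakdown above (percent of the semester grade).
WEIGHTS = {
    "midterm": 20,
    "final_exam": 30,
    "homework": 10,
    "mini_projects": 25,
    "final_project": 15,
}

# Approximate lower cutoffs from the list above; grades below B- are not listed.
CUTOFFS = [(93, "A"), (90, "A-"), (86, "B+"), (81, "B"), (75, "B-")]


def semester_score(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def approximate_letter(score):
    """Return the approximate letter grade, or None if below the listed range."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return None


# Hypothetical component scores, for illustration only.
example = {
    "midterm": 88,
    "final_exam": 91,
    "homework": 95,
    "mini_projects": 90,
    "final_project": 85,
}
total = semester_score(example)
print(total, approximate_letter(total))
```

Because the final exam and mini-projects together carry 55% of the weight, those two components move the semester score more than any others.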

Notes on grading of homework problems:

Please make sure that when R or SAS output is included, it is clearly highlighted and annotated. I won't sort through pages of output to verify your conclusions.

Notes on grading of mini-projects and other reports:

Your project grades will be based on two equally weighted areas: "technical" and "exposition." Below is a description of what is expected for each of the two areas. And this is the grading rubric.

  • Technical
    • Evidence of substantial breadth and/or depth of analysis
    • Proper implementation of statistical methods
      • Note: document your numerical results (test statistics, estimated parameters, etc.) in your report so that I can properly grade your performance. Don’t just show me the result of your transformation; give me the values estimated to do the transformation. Don’t just interpret the discriminant function; report the standardized discriminant function coefficients. Etc.
    • Well-documented computer code (R, SAS, etc.) attached to back of report. (These pages will not count against the page limit for the report.)
  • Exposition: At a level appropriate for the target audience, the report has the following qualities:
    • Introduction with explanation of problem and important issues
    • Motivation and justification for statistical methods being used
    • Understandable interpretation and conclusions
    • Professional and attractive document
      • free of spelling or other writing errors
      • conforming to specifications
      • all included tables and figures are discussed in the text
      • figures and tables are placed in the body of the text instead of in appendices
    • Important findings summarized in a brief conclusion
    • When introducing technical concepts, give both: (1) a technical definition (formula) AND (2) an intuitive explanation of the technical concept/statistic.

IMPORTANT NOTE: You are really writing to two audiences simultaneously. First, you are writing to your client for the purpose of solving her/his problem and explaining the solution at her/his level. Second, you are writing to your professor to demonstrate your mastery of the subject. This is a difficult task. Save at least a couple of days just for writing and revising—even masterful statistical analyses cannot salvage a poorly written report.


Term Project:

Term projects should be approved by November 1. The typical project will be based on a multivariate method/technique that extends beyond the core methods discussed in class. Term Project reports are generally 4-6 single-spaced pages, not to exceed 6 pages including figures and tables. Additionally, you will turn in your slides for your oral presentation of your project.

Some ideas for methods projects (sorted into semi-related topics) are:

Visualization, Clustering, etc.

  • Multi-dimensional scaling
  • Self-Organizing Maps
  • Canonical Correspondence Analysis & Biplots

Dimension Reduction, etc.

  • Independent Component Analysis
  • Projection Pursuit
  • Constrained Correspondence Analysis (see vegan package in R)
  • Partial Redundancy Analysis aka “pRDA” (see vegan package in R)

Prediction & Classification

  • Random forests (proposing extensions beyond Stat 536)
  • Boosting (AdaBoost)
  • Learning Vector Quantization algorithm (for reducing memory usage in KNN applications)
  • Neural Networks
  • Deep Learning (i.e., deep neural nets, deep belief networks; common applications in image recognition, natural language processing, etc.)
  • Support Vector Machines
  • Support Vector Regression
  • Functional Data Analysis

Latent Structure

  • Multi-group Confirmatory Factor Analysis
  • Multi-group Structural Equation Modeling
  • Latent Class Analysis
  • Latent Trait Analysis

Big Data, Missing Data, etc.

  • p >> N extensions of classical analyses
  • Bayesian imputation

The intent of the term project is that you extend learning beyond what was discussed in class. Projects will incorporate at least one of the following facets:

  1. Applications of multivariate analysis methods to situations that extend beyond the standard scenarios. For example, applying multivariate tools to data that: exhibit temporal or spatial dependence, have p >> N problems, are ordinal in scale, are functional in nature (i.e., require functional data analysis), or are in some other way complicated or non-standard.
  2. Applications of multivariate methods that have NOT been discussed directly in class. In these scenarios, students might view their report as if it were a 4-6 page section of a multivariate statistical methods textbook.

In all cases, students should try to do more than replicate analyses or illustrations of methods—try to find a question about your methods that goes beyond simple description of methods or analyses. Nearly all projects will use simulation (or some other ambitious quantitative comparison of competing methods). The term project need not be publishable research, but the best projects try to address some novel question.


Tentative Schedule & Textbook Sections for RC and JW

WEEK

TOPIC & READING ASSIGNMENT (to be completed in advance)

LECTURE SLIDES (anticipated…skim the slides ahead of time and be prepared with questions; some slides will be skipped if there are no questions. See the plan for in-class coverage of slides here.)

ASSIGNMENTS & EXAMS (assignment details are below)

Sep 3, 5 (#1)

I. INTRODUCTION & OVERVIEW

A. Overview of Multivariate Concepts, Tools, and Techniques, [not in book]

B. Visualizing Multivariate Concepts, [3.5, 3.13 RC]

C. Data Displays, [3.3-3.4 RC; or 1.3-1.4 JW]

D. Notation & Descriptive Statistics, [3.1-3.2, 3.6-3.8 RC; or 1.3 JW]

II. FOUNDATIONS OF MULTIVARIATE ANALYSIS

A. Brief Coverage of Some Matrix Algebra, [2 RC; or 2 and 2A JW]

B. Expected Values for the Sample Mean and Covariance Matrix, [3.5-3.7 RC; or 3.3 JW]

Slides: 1.1-1.26; 2.1-2.6

HW1a : Sept 6

Sep 10, 12  (#2)

C. Geometry of the Sample

D. Generalized Variance, [4.1.3 RC; or 3.4 JW]

E. Multivariate Normal Distribution, [4.1-4.2 RC; or 4.2 JW]

F. Graphical Analysis (Assessing Multivariate Normality and Detecting Outliers), [4.4-4.6 RC; or 4.6-4.7 JW]

H. Sampling Distributions of Sample Mean and Sample Covariance Matrix, [4.3 RC; or 4.4-4.5 JW]

Slides: 2.7-2.68

HW1b : Sept 13

Sep 17, 19 (#3)

I. EM Algorithm and Missing Data, [3.12 RC; or 5.7 JW]

J. Multiple Imputation

III. MULTIVARIATE STATISTICAL INFERENCE

A. Inference for a Mean Vector

i. Hotelling’s T2, [5.3 RC; or 5.2-5.3 JW]

ii. Confidence Regions, [5.2 RC; or 5.4 JW]

B. Comparison of Several Multivariate Means

i. Paired Comparisons, [5.7 RC; or 6.2 JW]

ii. Two-Sample Comparisons, [5.4 RC; or 6.3 JW]

Slides: 3.1-3.41

Sep 24, 26 (#4)

iii. MANOVA, [6.1-6.6 RC and 8.4-8.8 RC; or 6.4-6.6 JW]

Slides: 3.42-3.79, 5.3-5.9

Mini-Project #1 : Sep 26

Oct 1, 3 (#5)

(continued)

v. Repeated Measures Analysis [6.9 RC]

Slides: 3.87-3.110

HW2 : Oct 1

Oct 8, 10 (#6)

(continued)

C. Regression & Correlation

i. Multivariate Multiple Regression, [10.4-10.8 RC; or 7.7 JW]

Slides: 3.116-3.155

HW3a : Oct 8

Mini-Project #2: Oct 11

Oct 15, 17 (#7)

(continued)

ii. Seemingly Unrelated Regressions

iii. Canonical Correlation Analysis, [11.1-11.6 RC; or 10.1-10.4 JW]

Slides: 3.156-3.167

HW3b : Oct 16

Midterm: Oct 17-22

Oct 22, 24 (#8)

V. CLASSIFICATION AND CLUSTERING

C. Clustering, [15 RC; or 12.1-12.4 JW]

A. Discriminant (& Classification) Analysis

i. Describing Group Separation [8 RC]

ii. Foundational Classification Tools: LDA, QDA, KNN [9 RC; or 11.1-11.7 JW]

Slides: 5.1-5.2, 5.51-5.62, 5.10-5.20

Oct 29, 31 (#9)

(continued)

iii. Modern Classification Tools (using a few sections from your Stat 536 text by James, Witten, Hastie, and Tibshirani [JWHT])

a. Trees & Random Forests [9.7.4 RC; 8.2.2 JWHT]

b. Boosting [8.2.3 JWHT]

Slides: 5.21-5.31, 5.32-5.39

Term Project Proposal: Nov 1

Nov 5, 7 (#10)

c. Support Vector Machines [9 JWHT]

d. Naïve Bayes

IV. ANALYSIS OF COVARIANCE STRUCTURE

A. Principal Components, [12.1-12.8 RC; or 8.1-8.5 JW]

Slides: 5.40-5.50, 4.1-4.18 (read ahead…Q&A in class)

HW4 : Nov 5

Nov 12, 14 (#11)

B. Factor Analysis

i. Exploratory Factor Analysis, [13.1-13.7 RC; or 9.1-9.5 JW]

Slides: 4.19-4.48

Nov 19, 21 (#12)

ii. Confirmatory Factor Analysis, [14 RC]

Slides: 4.49-4.69

Mini-Project #3: Nov 19

HW5 : Nov 21

--- Thanksgiving Break ---

Dec 3, 5 (#13)

(continued)

iii. Structural Equation Modeling

HW6a : Dec 3

Dec 10, 12 (#14)

FINAL PROJECT PRESENTATIONS (8 min):

Tue 12/10: Hannah, Hyejung, Aubrey, Brittany, Zoe, Dean, Matt

Thur 12/12: Jeremy, Timo, Wendy, Celeste, SpencerE, Shelby, SpencerN

Term Project Report: Due Dec 10 at 1:30 pm

Term Project Presentation Slide Printout: Due at 1:30 pm on your presentation day

HW6b : Dec 12

Dec 16

FINAL EXAM: Mon Dec 16, 11 am – 2 pm

Final: Dec 16, 11 am – 2 pm


HW ASSIGNMENTS

HW #1a – Due Fri Sep 6 (in my office or the grader's)

HW #1b – Due Fri Sep 13 (in my office or the grader's)

Mini-Project #1 project1.pdf – Due Thur Sep 26 (in my office or the grader's)…make sure to read “Notes on grading of mini-projects and other reports” posted above before writing this mini-project. Partially-censored rubric

HW #2 – Due Tue Oct 1 (in my office or the grader's)

HW #3a – Due Tue Oct 8 (in my office or the grader's)

Mini-Project #2 project2.pdf – Due Fri Oct 11 (in my office or the grader's). Partially-censored rubric

HW #3b – Due Wed Oct 16 (in my office or the grader's)

Midterm Exam – Testing Center Oct 17-22

Term Project Proposal – Must be approved in person by Nov 1

HW #4 – Due Tue Nov 5 (in my office or the grader's)

Mini-Project #3 project3.pdf – Due Tue Nov 19 (in my office or the grader's). This summary is useful, or you can dig into the whole dissertation if you’re curious. Partially-censored rubric

HW #6a – Due Tue Dec 3 (in my office or the grader's). (some partial code to assist: hw6.R and hw6hints.sas).

Term Project – Due Tue Dec 10 at the beginning of class. A printout of your presentation slides is due on the day of your presentation.

HW #6b – Due Thur Dec 12 (in my office or the grader's). (some partial code to assist: hw6.R and hw6hints.sas).


DATA

places.dat

places.col

places.row

receptor.col

receptor.dat

oliver3b.txt

olive.txt

CALCIUM.DAT

MUSCDYS.DAT

GLUCOSE.DAT

fraud.zip

oliver2a

oliver4a

ROOT.DAT

T6-13.DAT

perf.txt

WEAR.DAT

REAGENT.txt

probe.txt

goods.txt

essay.txt

FISH.DAT

MANDIBLE.DAT

SNAPBEAN.DAT

CHEM.DAT

AMITRIPTYLINE.DAT

SEISHU.DAT

FOOTBALL.DAT

TEMPERAT.DAT

Gradedat.txt

salespeople.txt

receptor2.txt

FBEETLES.DAT

bodyfat.txt

collins.txt

writingstyle.txt

PSAData.txt

PSAContributions.txt

guinea.dat

mushrooms.csv

countriesoftheworld.csv

globalterrorismdb_0718dist.csv (further background at https://www.kaggle.com/START-UMD/gtd)


CLASS EXAMPLES

SectI.R -- Updated on 9/4/2019 at 4:00 pm

SectII.R -- Updated on 9/4/2019 at 4:30 pm

SectIIIA.R

SectIIIB.R

SectIIIB.sas

SectIIIC.sas

SectIVA.R

PCandREG.R

SectIVB.sas

SectVAi.R

SectVAi.sas

SectVAii.sas

SectVAiii.R

SectVC.R


HONOR CODE STANDARDS

In keeping with the principles of the BYU Honor Code, students are expected to be honest in all of their academic work. Academic honesty means, most fundamentally, that any work you present as your own must in fact be your own work and not that of another. Violations of this principle may result in a failing grade in the course and additional disciplinary action by the university.

Students are also expected to adhere to the Dress and Grooming Standards. Adherence demonstrates respect for yourself and others and ensures an effective learning and working environment. It is the university’s expectation, and my own expectation in class, that each student will abide by all Honor Code standards. Please call the Honor Code Office at 422-2847 if you have questions about those standards.


PREVENTING SEXUAL DISCRIMINATION OR HARASSMENT

Title IX of the Education Amendments of 1972 prohibits sex discrimination against any participant in an educational program or activity that receives federal funds. The act is intended to eliminate sex discrimination in education and pertains to admissions, academic and athletic programs, and university-sponsored activities. Title IX also prohibits sexual harassment of students by university employees, other students, and visitors to campus. If you encounter sexual harassment or gender-based discrimination, please talk to your professor; contact the Equal Employment Office at 801-422-5895 or 1-888-238-1062 (24 hours) or through http://www.ethicspoint.com; or contact the Honor Code Office at 801-422-2847.


STUDENTS WITH DISABILITIES

If you have a disability that may affect your performance in this course, you should get in touch with the office of Services for Students with Disabilities (1520 WSC). This office can evaluate your disability and assist the professor in arranging for reasonable accommodations.