class: center, middle, inverse, title-slide # Factor Models ## High Dimensional Data Analysis ### Anastasios Panagiotelis & Ruben Loaiza-Maya ### Lecture 8 --- class: inverse, center, middle # Motivation --- # Boston Housing - In an earlier tutorial we considered the Boston Housing data. -- - Each observation is a town (suburb) in the Boston metropolitan area. -- - There are 14 variables measuring demographic information as well as other factors that may influence house price. --- # PCA on Boston Housing ```r #First load required packages library(tidyverse) Boston<-readRDS('Boston.rds') Boston%>% column_to_rownames('Town')%>% prcomp(scale.=TRUE)->pcaout screeplot(pcaout,type = 'l') ``` --- # Scree Plot <img src="FactorModel_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> --- #Biplot ```r biplot(pcaout) ``` <img src="FactorModel_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- # Discussion - Nearly 60% of the variation of all variables in explained by just 2 PCs. -- - Can these PCs be interpreted. -- - Sometimes they can but in this example it is difficult. -- - This is not surprising, PCA just finds a linear combination that maximises variances. -- - To obtain factors with some interpretation we need a more detailed model. --- # Factor Model - The factor model is defined as `$$y_{ij}=\lambda_{j1}f_{1i}+\lambda_{j2}f_{2i}+\ldots+\xi_{ij}$$` -- - Or in matrix form `$$\mathbf{y}_i=\boldsymbol{\Lambda}\mathbf{f}_i+\boldsymbol{\xi}_i$$` - `\(y\)` are observed data, `\(\Lambda\)` / `\(\lambda\)` are coefficients, `\(f\)` are latent factors, `\(\xi\)` are error terms. -- - The intercept is left out for simplicity. --- # Notation - The subscript `\(i\)` denotes the `\(i^{th}\)` cross sectional unit (in the Boston data the town). -- - The subscript `\(j\)` denotes the variable (e.g. teacher ratio, distance from downtown etc.) -- - The dimensions of `\(\mathbf{y}_i\)` and `\(\boldsymbol{\xi}_i\)` are `\(p\times 1\)` (or `\(14\times 1\)` in the Boston data). -- - If there are `\(r\)` factors then `\(\mathbf{f_i}\)` is `\(r\times 1\)` and `\(\boldsymbol{\Lambda}\)` is `\(p\times r\)`. -- - Verify that all matrix multiplication is conformable. --- # Regression - This is similar to a regression model. However - In a regression model there are `\(x\)` on the right hand side that are *observed*. - In a factor model these are replaced with `\(f\)` that are *unobserved*. -- - How can we estimate this model? --- # Assumptions: Errors - Each idiosyncratic error has its own variance. - These variances are called unique variance or uniquenesses -- - The idiosyncratic errors are uncorrelated with each other. - This is a crucial assumption -- - Together these imply that `\(\mbox{Var-Cov}(\boldsymbol{\xi})=\boldsymbol{\Psi}\)` is diagonal. --- # Assumptions: Factors - The factor and idiosyncratic errors are uncorrelated. - This is similar to regression -- - Each factor has a variance of 1. - This is harmless since the factor is latent. -- - The factors are uncorrelated with each other - We relax this assumption later on. -- - These imply that `\(\mbox{Var-Cov}(\mathbf{f})=\mathbf{I}\)`. --- # Estimation - In general these assumptions imply that `$$E(\mathbf{y}\mathbf{y}')=\boldsymbol{\Sigma}=\boldsymbol{\Lambda}\boldsymbol{\Lambda}'+\boldsymbol{\Psi}$$` - The variance is decomposed into two parts -- - Part explained by common factors `\(\boldsymbol{\Lambda}\boldsymbol{\Lambda}'\)` . - This is often called the *communality* or *common variance*. - Part unexplained by common factors `\(\boldsymbol{\Psi}\)`. - This is often called the *uniqueness* or *unique variance*. --- # Estimation - It is straightforward to estimate `\(\boldsymbol{\Sigma}\)` with its sample equivalent `\(\mathbf{S}\)` -- - We can then choose values `\(\hat{\boldsymbol{\Lambda}}\)` and `\(\hat{\boldsymbol{\Psi}}\)` so that `\(\hat{\boldsymbol{\Lambda}}\hat{\boldsymbol{\Lambda}}'+\hat{\boldsymbol{\Psi}}\)` is close to `\(\mathbf{S}\)`. -- - There are many ways to do this -- - *Maximum likelihood estimation* is one of the most popular. --- # Estimation issues - Using *Maximum likelihood estimation* does require a distributional assumption about the data. -- - The most common assumption is that the data are normally distributed. -- - Even when this assumption does not hold the maximum likelihood estimate is still quite robust as long as the data do not differ too much from normality. --- # Number of factors - There are a number of strategies for selecting factors - Scree plot - Kaiser rule - Hypothesis tests --- #Heywood cases - In some rare cases the maximum likelihood converges to an estimate where the unique variances are *negative*. -- - These are known as *Heywood cases* -- - Since a variance is cannot be negative this is usually caused by - Selecting too many factors - Too small a sample size. --- class: inverse, middle, center # Factor Analysis in R --- #Using R - Many packages in R do factor analysis -- - We use `factanal` from the `stats` package -- - First step use the following code ```r #First load required packages Boston<-readRDS('Boston.rds') Boston%>% column_to_rownames('Town')%>% factanal(factors = 2,scores = 'none', rotation = 'none')->facto ``` --- #Output ```r facto$loadings ``` ``` ## ## Loadings: ## Factor1 Factor2 ## CRIM 0.548 ## ZN -0.401 0.455 ## INDUS 0.752 -0.513 ## CHAS -0.120 ## NOX 0.630 -0.628 ## RM -0.280 0.371 ## AGE 0.531 -0.576 ## DIS -0.611 0.573 ## RAD 0.923 0.282 ## TX 0.979 0.103 ## PTRATIO 0.472 0.251 ## B -0.459 ## LSTAT 0.532 -0.384 ## MEDV -0.622 0.297 ## ## Factor1 Factor2 ## SS loadings 5.071 2.080 ## Proportion Var 0.362 0.149 ## Cumulative Var 0.362 0.511 ``` --- #Output - An advantage of printing the loadings like this is that values close to zero are surpressed. -- - This will help with the interpretation of factors. -- - For 2 factors, it can be useful to also plot the factors. -- - To prepare the data use the `tidy` function in the `broom` package. --- #Loadings ```r library(broom) fa_df<-tidy(facto) #Get into data frame ``` <div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:500px; "><table class="table table-striped table-hover table-condensed" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> variable </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> uniqueness </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> fl1 </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> fl2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> CRIM </td> <td style="text-align:right;"> 0.6957467 </td> <td style="text-align:right;"> 0.5475829 </td> <td style="text-align:right;"> 0.0660978 </td> </tr> <tr> <td style="text-align:left;"> ZN </td> <td style="text-align:right;"> 0.6318782 </td> <td style="text-align:right;"> -0.4009173 </td> <td style="text-align:right;"> 0.4554129 </td> </tr> <tr> <td style="text-align:left;"> INDUS </td> <td style="text-align:right;"> 0.1721753 </td> <td style="text-align:right;"> 0.7517319 </td> <td style="text-align:right;"> -0.5125564 </td> </tr> <tr> <td style="text-align:left;"> CHAS </td> <td style="text-align:right;"> 0.9847352 </td> <td style="text-align:right;"> -0.0248617 </td> <td style="text-align:right;"> -0.1202525 </td> </tr> <tr> <td style="text-align:left;"> NOX </td> <td style="text-align:right;"> 0.2081610 </td> <td style="text-align:right;"> 0.6302070 </td> <td style="text-align:right;"> -0.6282309 </td> </tr> <tr> <td style="text-align:left;"> RM </td> <td style="text-align:right;"> 0.7835958 </td> <td style="text-align:right;"> -0.2803734 </td> <td style="text-align:right;"> 0.3712276 </td> </tr> <tr> <td style="text-align:left;"> AGE </td> <td style="text-align:right;"> 0.3870295 </td> <td style="text-align:right;"> 0.5306744 </td> <td style="text-align:right;"> -0.5756284 </td> </tr> <tr> <td style="text-align:left;"> DIS </td> <td style="text-align:right;"> 0.2983700 </td> <td style="text-align:right;"> -0.6109165 </td> <td style="text-align:right;"> 0.5730549 </td> </tr> <tr> <td style="text-align:left;"> RAD </td> <td style="text-align:right;"> 0.0687967 </td> <td style="text-align:right;"> 0.9228758 </td> <td style="text-align:right;"> 0.2819623 </td> </tr> <tr> <td style="text-align:left;"> TX </td> <td style="text-align:right;"> 0.0307162 </td> <td style="text-align:right;"> 0.9790952 </td> <td style="text-align:right;"> 0.1032290 </td> </tr> <tr> <td style="text-align:left;"> PTRATIO </td> <td style="text-align:right;"> 0.7146628 </td> <td style="text-align:right;"> 0.4716189 </td> <td style="text-align:right;"> 0.2509158 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 0.7799470 </td> <td style="text-align:right;"> -0.4585974 </td> <td style="text-align:right;"> 0.0985286 </td> </tr> <tr> <td style="text-align:left;"> LSTAT </td> <td style="text-align:right;"> 0.5691174 </td> <td style="text-align:right;"> 0.5324934 </td> <td style="text-align:right;"> -0.3838319 </td> </tr> <tr> <td style="text-align:left;"> MEDV </td> <td style="text-align:right;"> 0.5247087 </td> <td style="text-align:right;"> -0.6220804 </td> <td style="text-align:right;"> 0.2970937 </td> </tr> </tbody> </table></div> --- # Plotting The plot is clearer if arrows are used ```r ggplot(fa_df,aes(x=fl1,y=fl2, label=variable))+ geom_segment(aes(xend=fl1, yend=fl2,x=0,y=0), arrow = arrow())+ geom_text(color='red',nudge_y = -0.05) ``` --- # Plotting <img src="FactorModel_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> --- #Interpretation It is difficult to interpret these Factors - Factor 1 seems to take everything into account except the Charles river dummy. -- - Factor 2 takes everything into account except crime and race. -- - It would be easier to interpret the factors if Factor 1 loaded onto a small set of variables and Factor 2 loaded onto a different small set of variables. -- - Can we do this? --- #Rotations Recall the model is `$$\mathbf{y}_i=\boldsymbol{\Lambda}\mathbf{f}_i+\boldsymbol{\xi}_i$$` Assume there is an r × r rotation matrix `\(\mathbf{R}\)`. Since `\(\mathbf{R}'\mathbf{R} = \mathbf{I}\)` the model above is equivalent to `$$\mathbf{y}_i=\boldsymbol{\Lambda}\mathbf{R}'\mathbf{R}\mathbf{f}_i+\boldsymbol{\xi}_i$$` --- # The rotation trick Grouping parts together we have `$$\mathbf{y}_i=\left(\boldsymbol{\Lambda}\mathbf{R}'\right)\left(\mathbf{R}\mathbf{f}_i\right)+\boldsymbol{\xi}_i$$` Now we have new loadings `\(\tilde{\boldsymbol{\Lambda}}=\boldsymbol{\Lambda}\mathbf{R}'\)` and new factors `\(\tilde{\mathbf{f}_i}=\mathbf{R}\mathbf{f}_i\)` - All rotated versions of the loadings and factors explain the data equally well and satisfy all assumptions of the model. --- # Varimax - Some rotated versions of the factors may be easier to interpret. -- - Generally if there are many zero loadings, then the factors are easy to interpret. -- - An algorithm known as varimax tries to find a rotation with as many loadings close to zero as possible. -- - It can be implemented using the `rotation='varimax'` option in `factanal`. --- #Varimax in R ```r Boston%>% column_to_rownames('Town')%>% factanal(factors = 2,scores = 'none', rotation = 'varimax')->facto_vari ``` --- #Loadings <div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:500px; "><table class="table table-striped table-hover table-condensed" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> variable </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> uniqueness </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> fl1 </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> fl2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> CRIM </td> <td style="text-align:right;"> 0.6957467 </td> <td style="text-align:right;"> 0.2269179 </td> <td style="text-align:right;"> 0.5027168 </td> </tr> <tr> <td style="text-align:left;"> ZN </td> <td style="text-align:right;"> 0.6318782 </td> <td style="text-align:right;"> -0.5971858 </td> <td style="text-align:right;"> -0.1072602 </td> </tr> <tr> <td style="text-align:left;"> INDUS </td> <td style="text-align:right;"> 0.1721753 </td> <td style="text-align:right;"> 0.8276842 </td> <td style="text-align:right;"> 0.3778276 </td> </tr> <tr> <td style="text-align:left;"> CHAS </td> <td style="text-align:right;"> 0.9847352 </td> <td style="text-align:right;"> 0.0900150 </td> <td style="text-align:right;"> -0.0835228 </td> </tr> <tr> <td style="text-align:left;"> NOX </td> <td style="text-align:right;"> 0.2081610 </td> <td style="text-align:right;"> 0.8637425 </td> <td style="text-align:right;"> 0.2139719 </td> </tr> <tr> <td style="text-align:left;"> RM </td> <td style="text-align:right;"> 0.7835958 </td> <td style="text-align:right;"> -0.4627562 </td> <td style="text-align:right;"> -0.0477061 </td> </tr> <tr> <td style="text-align:left;"> AGE </td> <td style="text-align:right;"> 0.3870295 </td> <td style="text-align:right;"> 0.7672114 </td> <td style="text-align:right;"> 0.1560450 </td> </tr> <tr> <td style="text-align:left;"> DIS </td> <td style="text-align:right;"> 0.2983700 </td> <td style="text-align:right;"> -0.8065489 </td> <td style="text-align:right;"> -0.2260306 </td> </tr> <tr> <td style="text-align:left;"> RAD </td> <td style="text-align:right;"> 0.0687967 </td> <td style="text-align:right;"> 0.2365087 </td> <td style="text-align:right;"> 0.9355566 </td> </tr> <tr> <td style="text-align:left;"> TX </td> <td style="text-align:right;"> 0.0307162 </td> <td style="text-align:right;"> 0.4185322 </td> <td style="text-align:right;"> 0.8911309 </td> </tr> <tr> <td style="text-align:left;"> PTRATIO </td> <td style="text-align:right;"> 0.7146628 </td> <td style="text-align:right;"> 0.0294672 </td> <td style="text-align:right;"> 0.5333993 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 0.7799470 </td> <td style="text-align:right;"> -0.3217031 </td> <td style="text-align:right;"> -0.3413600 </td> </tr> <tr> <td style="text-align:left;"> LSTAT </td> <td style="text-align:right;"> 0.5691174 </td> <td style="text-align:right;"> 0.6040563 </td> <td style="text-align:right;"> 0.2568894 </td> </tr> <tr> <td style="text-align:left;"> MEDV </td> <td style="text-align:right;"> 0.5247087 </td> <td style="text-align:right;"> -0.5762219 </td> <td style="text-align:right;"> -0.3784402 </td> </tr> </tbody> </table></div> --- #Varimax <img src="FactorModel_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> --- # Oblique Rotation - An orthogonal rotation did not work so instead of considering a matrix where `\(\mathbf{R}\mathbf{R}=\mathbf{I}\)`, consider a matrix `\(\mathbf{G}\mathbf{G}^{-1}=\mathbf{I}\)` `$$\mathbf{y}_i=\boldsymbol{\Lambda}\mathbf{G}\mathbf{G}^{-1}\mathbf{f}_i+\boldsymbol{\xi}_i$$` -- - Now we have new loadings `\(\tilde{\boldsymbol{\Lambda}}=\boldsymbol{\Lambda}\mathbf{G}\)` and new factors `\(\tilde{\mathbf{f}_i}=\mathbf{G}^{-1}\mathbf{f}_i\)` -- - By setting `rotation='promax'` in `factanal`, an oblique 'rotation' can be carried out. --- #Varimax in R ```r Boston%>% column_to_rownames('Town')%>% factanal(factors = 2,scores = 'none', rotation = 'promax')->facto_promax ``` --- #Loadings <div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:500px; "><table class="table table-striped table-hover table-condensed" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> variable </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> uniqueness </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> fl1 </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> fl2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> CRIM </td> <td style="text-align:right;"> 0.6957467 </td> <td style="text-align:right;"> 0.0831127 </td> <td style="text-align:right;"> 0.5012340 </td> </tr> <tr> <td style="text-align:left;"> ZN </td> <td style="text-align:right;"> 0.6318782 </td> <td style="text-align:right;"> -0.6473132 </td> <td style="text-align:right;"> 0.0800125 </td> </tr> <tr> <td style="text-align:left;"> INDUS </td> <td style="text-align:right;"> 0.1721753 </td> <td style="text-align:right;"> 0.8163722 </td> <td style="text-align:right;"> 0.1528374 </td> </tr> <tr> <td style="text-align:left;"> CHAS </td> <td style="text-align:right;"> 0.9847352 </td> <td style="text-align:right;"> 0.1327142 </td> <td style="text-align:right;"> -0.1267869 </td> </tr> <tr> <td style="text-align:left;"> NOX </td> <td style="text-align:right;"> 0.2081610 </td> <td style="text-align:right;"> 0.9155035 </td> <td style="text-align:right;"> -0.0480169 </td> </tr> <tr> <td style="text-align:left;"> RM </td> <td style="text-align:right;"> 0.7835958 </td> <td style="text-align:right;"> -0.5140824 </td> <td style="text-align:right;"> 0.1027513 </td> </tr> <tr> <td style="text-align:left;"> AGE </td> <td style="text-align:right;"> 0.3870295 </td> <td style="text-align:right;"> 0.8251783 </td> <td style="text-align:right;"> -0.0817944 </td> </tr> <tr> <td style="text-align:left;"> DIS </td> <td style="text-align:right;"> 0.2983700 </td> <td style="text-align:right;"> -0.8456368 </td> <td style="text-align:right;"> 0.0146546 </td> </tr> <tr> <td style="text-align:left;"> RAD </td> <td style="text-align:right;"> 0.0687967 </td> <td style="text-align:right;"> -0.0584710 </td> <td style="text-align:right;"> 0.9960908 </td> </tr> <tr> <td style="text-align:left;"> TX </td> <td style="text-align:right;"> 0.0307162 </td> <td style="text-align:right;"> 0.1660177 </td> <td style="text-align:right;"> 0.8829523 </td> </tr> <tr> <td style="text-align:left;"> PTRATIO </td> <td style="text-align:right;"> 0.7146628 </td> <td style="text-align:right;"> -0.1542302 </td> <td style="text-align:right;"> 0.6038121 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 0.7799470 </td> <td style="text-align:right;"> -0.2487379 </td> <td style="text-align:right;"> -0.2832482 </td> </tr> <tr> <td style="text-align:left;"> LSTAT </td> <td style="text-align:right;"> 0.5691174 </td> <td style="text-align:right;"> 0.6024474 </td> <td style="text-align:right;"> 0.0898444 </td> </tr> <tr> <td style="text-align:left;"> MEDV </td> <td style="text-align:right;"> 0.5247087 </td> <td style="text-align:right;"> -0.5276646 </td> <td style="text-align:right;"> -0.2392111 </td> </tr> </tbody> </table></div> --- #Promax <img src="FactorModel_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> --- #Possible Interpretation - Factor 1 is - Positively correlated with age. - Negatively correlated with distance. - Factor 1 is a **geographic** factor. -- - Factor 2 is - Positively correlated with crime, pupil-teacher ratio. - Negatively correlated with the race variable. - Factor 2 is a **socioeconomic** factor. --- # More than 2 factors - If there are more than 2 factors look at the loadings matrix. -- - The pattern of zeros should give some clue to the interpretation of the factors. -- - Also look for large loadings (in absolute value) --- # Oblique rotation - Oblique rotations will lead to factors that are correlated with one another. -- - This is not the case for orthogonal factors. -- - Other rotation options are available by downloading the package `GPArotation` -- - Orthogonal Rotations: Varimax, Quartimax, Equimax -- - Oblique Rotations: Promax, Oblimin, Quartimin, Simplimax --- #Factor scores - The factor scores themselves can be estimated using a variety of methods. Two are available as options in the factanal function. - Regression Scores - Bartlett’s Scores - Bartlett’s scores are unbiased estimates - These can be implemented setting `scores='regression'` or `scores='Bartlett'` in `factanal`. --- #Estimation alternatives Other estimation methods can also be used for the factor model. - One example is *Principal Axis Factoring*, which is available for R using the `psych` package. - Principal Axis Factoring does not require the normality assumption and can be adapted for item response data such as Likert scales. --- # Extended topics - What we have discussed today is often called *exploratory factor analysis*. -- - In many social sciences the latent variables may themselves influence other observed variables. -- - Such models are called *structural equation models*. -- - They can also be estimated by maximum likelihood.