poissonisfish


2017-01-24T07:15:00+00:00

Nicely explained. Well done, sir!


2017-01-25T00:31:08+00:00

Good job! Please treat PCA using mintab to solve a problem, and kindly explain it in more detail than you do for R. Thanks


2017-01-25T11:54:02+00:00

Hi Owolabi. I presume you meant “minitab”? I apologize but the focus of my blog is R. I hope you can find the information you seek somewhere else! Best regards


2017-06-18T08:20:49+00:00

Dear Francisco, thank you for your blogs. It is a bit of “self-promotion”, but if you are interested in PCA-based methods you could also look at FactoMineR http://factominer.free.fr/ and there are many videos https://www.youtube.com/playlist?list=PLnZgp6epRBbTsZEFXi_p6W48HhNyqwxIu
Best wishes,
Julie


2018-03-17T18:09:53+00:00

Did you make the animation at the start of the article with R? I had some idea about PCA, but even before going through the article, the animation gave me so much more insight. Thanks for a great job!


2018-10-23T21:14:28+00:00

This is an awesome explanation, but SO many variables. I only have two. How would I do this with only two variables?


2018-10-23T21:29:28+00:00

Hi Alterra, thank you! The same as with three or more. The animation on top illustrates the two-dimensional case very well. Is your question how to do it mathematically, via the SVD algorithm?
Francisco
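For the two-variable case asked about above, here is a minimal sketch in R. The simulated variables `x1` and `x2` are assumptions made purely for illustration, not data from the post; the same `prcomp()` and `svd()` calls apply to any two-column numeric matrix.

```r
# Two-variable PCA; x1 and x2 are simulated purely for illustration
set.seed(1)
x1 <- rnorm(100)
x2 <- 0.8 * x1 + rnorm(100, sd = 0.4)
X <- scale(cbind(x1, x2)) # center and scale both columns

# PCA through prcomp() (X is already standardized)
pca2 <- prcomp(X, center = FALSE, scale. = FALSE)

# The same decomposition through the SVD: X = U %*% diag(d) %*% t(V)
sv <- svd(X)
loadings <- sv$v              # columns match pca2$rotation (up to sign)
scores <- sv$u %*% diag(sv$d) # columns match pca2$x (up to sign)

pca2$rotation
summary(pca2)$importance # share of variance carried by PC1 and PC2
```

With two standardized variables the correlation matrix is [[1, r], [r, 1]], so for any nonzero correlation r both loadings are ±1/√2: PC1 runs along one diagonal and PC2 along the other. Only the PC variances, 1 + |r| and 1 - |r|, depend on the data.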


2020-08-18T16:03:36+00:00

Hi Francisco, thank you very much for this post! I found it very helpful and enjoyable to read (also the references to the PCA and SVD resources were great).

As I followed your analysis, I tried to install the library pcaMethods and I found that the installation is different for the new versions of R.

The following code worked for me:

	if (!requireNamespace("BiocManager", quietly = TRUE))
	install.packages("BiocManager")
	BiocManager::install(version = "3.11")

	BiocManager::install(c("pcaMethods"))

Maybe this can help others.


2020-08-18T17:47:19+00:00

Hi Fede,
Thanks for your feedback! I might use your code example to update mine above. Greetings,
Francisco


2020-11-15T23:39:26+00:00

Hi Francisco
I’ve been struggling all weekend because on Tuesday I want to teach my Analytical Chemistry students how to do Principal Component Analysis in R. I have been using R since this May, and I have been studying PCA on my own for six weeks. I have been watching many videos and reading many tutorials. Yours was able to make it clear for me, and in the light of it I am going to revisit the other materials and try to discover what I was missing.

I appreciate how I was able to copy your code and paste it into R and have it run.

Thank you!


2020-11-16T07:02:26+00:00

Dear Greg, thanks for the kind words. I am glad you found it useful, hopefully your students will build an intuition around PCA and its applications. Greetings and have a nice week, Francisco


2020-11-17T03:10:07+00:00

Question for you Francisco
I want students to be able to copy and paste commands from my notes. That’s not working when I use Google Docs: I’m getting errors when I paste a command, but it works when I retype it in R. What did you use to display R commands in your tutorial?


2020-11-17T08:00:22+00:00

Hi Greg, I suggest using a code-syntax-friendly text editor; you have plenty of free options, such as Notepad++, Sublime or Atom, to name a few. The best option, in my mind, is having your students install RStudio and paste the code into a new, blank script. Greetings


	# Generate a scaled 4 x 5 matrix of standard normal samples
	set.seed(101)
	mat <- scale(matrix(rnorm(20), 4, 5))
	dimnames(mat) <- list(paste("Sample", 1:4), paste("Var", 1:5))

	# Perform PCA (mat is already centered and scaled)
	myPCA <- prcomp(mat, scale. = FALSE, center = FALSE)
	myPCA$rotation # loadings
	myPCA$x # scores

	# Perform SVD
	mySVD <- svd(mat)
	mySVD # the diagonal of Sigma, mySVD$d, is returned as a vector
	sigma <- matrix(0, 4, 4) # 4 x 4: svd() returns min(4, 5) = 4 singular values
	diag(sigma) <- mySVD$d # sigma is now the full diagonal matrix Sigma

	# Compare the PCA scores with the SVD's U %*% Sigma
	theoreticalScores <- mySVD$u %*% sigma
	all(round(myPCA$x, 5) == round(theoreticalScores, 5)) # TRUE

	# Compare the PCA loadings with the SVD's V
	all(round(myPCA$rotation, 5) == round(mySVD$v, 5)) # TRUE

	# Show that mat == U %*% Sigma %*% t(V)
	recoverMatSVD <- theoreticalScores %*% t(mySVD$v)
	all(round(mat, 5) == round(recoverMatSVD, 5)) # TRUE

	# Show that mat == scores %*% t(loadings)
	recoverMatPCA <- myPCA$x %*% t(myPCA$rotation)
	all(round(mat, 5) == round(recoverMatPCA, 5)) # TRUE
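The same matrices also tie the standard deviations reported by `prcomp()` to the singular values. This sketch rebuilds `mat` (without the dimnames) so it runs on its own, and relies on `cov()` and `prcomp()` both using the n - 1 divisor.

```r
# Rebuild the 4 x 5 example matrix so this block runs on its own
set.seed(101)
mat <- scale(matrix(rnorm(20), 4, 5))
myPCA <- prcomp(mat, center = FALSE, scale. = FALSE)
mySVD <- svd(mat)

# prcomp() reports sdev = d / sqrt(n - 1), n being the number of rows
n <- nrow(mat)
sdevFromSVD <- mySVD$d / sqrt(n - 1)
myPCA$sdev
sdevFromSVD

# The PC variances (sdev^2) are the eigenvalues of the covariance matrix
eigVals <- eigen(cov(mat), symmetric = TRUE, only.values = TRUE)$values
eigVals[1:4]
sdevFromSVD^2
```

Because `mat` has only 4 centered rows, its rank is at most 3, so the fourth standard deviation comes out numerically zero.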

	# Name the variables (requires the `wine` data frame read from the UCI
	# wine.data URL with read.table())
	colnames(wine) <- c("Cvs", "Alcohol", "Malic acid", "Ash",
	"Alcalinity of ash", "Magnesium",
	"Total phenols", "Flavanoids",
	"Nonflavanoid phenols", "Proanthocyanins",
	"Color intensity", "Hue",
	"OD280/OD315 of diluted wines", "Proline")

	# The first column corresponds to the classes
	wineClasses <- factor(wine$Cvs)

	# Use pairs() to plot every pair of variables, colored by class
	pairs(wine[, -1], col = wineClasses, upper.panel = NULL,
	pch = 16, cex = 0.5)
	# col = 1:3 matches the palette colors that col = wineClasses assigns to the three levels
	legend("topright", bty = "n", legend = c("Cv1", "Cv2", "Cv3"),
	pch = 16, col = 1:3,
	xpd = TRUE, cex = 2, y.intersp = 0.5)

	# Turn the 10th observation into an extreme one by multiplying its profile by 10
	# (column 1 holds the class label and is left out of the PCA)
	wineOutlier <- wine
	wineOutlier[10, ] <- wineOutlier[10, ] * 10
	outlierPCA <- prcomp(scale(wineOutlier[, -1]))
	plot(outlierPCA$x[, 1:2], col = wineClasses)
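To put a number on the outlier effect without downloading the wine data, this sketch repeats the same trick on R's built-in `USArrests` data (an assumed stand-in, not part of the original analysis): row 10 is multiplied by 10 and the share of variance captured by PC1 is compared before and after.

```r
# USArrests (built into R) stands in for the wine data here
X <- USArrests
cleanPCA <- prcomp(X, scale. = TRUE)

# Make row 10 an extreme profile, as done with the wine data
Xout <- X
Xout[10, ] <- Xout[10, ] * 10
outPCA <- prcomp(Xout, scale. = TRUE)

# Share of the total variance captured by PC1
propPC1 <- function(p) p$sdev[1]^2 / sum(p$sdev^2)
c(clean = propPC1(cleanPCA), outlier = propPC1(outPCA))

# Which observation has the largest absolute PC1 score after the change?
which.max(abs(outPCA$x[, 1]))
```

In this example a single row supplies most of each variable's variance, so even after scaling PC1 mostly describes that one observation.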


Principal Component Analysis in R

Mathematical foundation

Let’s get started with R

PCA of the wine data set

PCA of the wine data set with pcaMethods

PCR with the housing data set

24 thoughts on “Principal Component Analysis in R”


	wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data",
	sep=",")

	dev.off() # close the current graphics device, discarding the pairs() plot and its legend
	winePCA <- prcomp(scale(wine[, -1]))
	plot(winePCA$x[, 1:2], col = wineClasses)
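The score plot shows where the samples fall; the loadings show which variables put them there. A minimal sketch, again using the built-in `USArrests` data as an assumed stand-in for `wine`, ranks the variables by their absolute loading on PC1 and checks that the scores are the standardized data multiplied by the loadings.

```r
# USArrests (built into R) is an assumed stand-in for the wine data
usPCA <- prcomp(USArrests, scale. = TRUE)

# Loadings: one row per original variable, one column per PC
round(usPCA$rotation[, 1:2], 3)

# Rank the variables by the magnitude of their PC1 loading
pc1Drivers <- sort(abs(usPCA$rotation[, 1]), decreasing = TRUE)
pc1Drivers

# Scores are the standardized data projected onto the loadings
scoresCheck <- scale(USArrests) %*% usPCA$rotation
```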

	if (!requireNamespace("BiocManager", quietly = TRUE))
	install.packages("BiocManager")

	BiocManager::install("pcaMethods")
	library(pcaMethods)

	winePCAmethods <- pca(wine[, -1], scale = "uv", center = TRUE,
	nPcs = 2, method = "svd")
	slplot(winePCAmethods, scoresLoadings = c(TRUE, TRUE),
	scol = wineClasses)

	str(winePCAmethods) # slots are marked with @
	winePCAmethods@R2
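`winePCAmethods@R2` holds the variance explained per component. Assuming it is the fraction of total variance per PC (my reading, not something shown here), the base-R counterpart can be computed from the `sdev` returned by `prcomp()`. This sketch uses `USArrests` as a stand-in for the wine data and cross-checks the result against `summary()`.

```r
# USArrests (built into R) is an assumed stand-in for the wine data
usPCA <- prcomp(USArrests, scale. = TRUE)

# Fraction of the total variance explained by each PC, and its running total
varExplained <- usPCA$sdev^2 / sum(usPCA$sdev^2)
cumVarExplained <- cumsum(varExplained)
round(rbind(varExplained, cumVarExplained), 3)

# summary() reports the same proportions (rounded to 5 decimals)
summary(usPCA)$importance
```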

	houses <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data",
	header = FALSE, na.strings = "?")
	colnames(houses) <- c("CRIM", "ZN", "INDUS", "CHAS",
	"NOX", "RM", "AGE", "DIS", "RAD",
	"TAX", "PTRATIO", "B", "LSTAT", "MEDV")

	# Perform PCA
	pcaHouses <- prcomp(scale(houses[,-14]))
	scoresHouses <- pcaHouses$x

	# Fit lm using the first 3 PCs
	modHouses <- lm(houses$MEDV ~ scoresHouses[,1:3])
	summary(modHouses)
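One way to judge how many PCs to keep is to watch R² grow as components are added. The sketch below uses the built-in `mtcars` data (an assumed stand-in for the housing data) with `mpg` as the response. It also shows that regressing on all PCs reproduces the ordinary least-squares fit on the original predictors, because both sets of regressors span the same column space.

```r
# mtcars (built into R) stands in for the housing data; mpg is the response
y <- mtcars$mpg
predictors <- mtcars[, names(mtcars) != "mpg"]

pcaCars <- prcomp(predictors, scale. = TRUE)
scoresCars <- pcaCars$x

# In-sample R^2 of a PCR fit on the first k PCs, for every k
r2ForK <- sapply(seq_len(ncol(scoresCars)), function(k) {
  summary(lm(y ~ scoresCars[, 1:k, drop = FALSE]))$r.squared
})
round(r2ForK, 3)

# Using all PCs spans the same space as the original predictors,
# so the R^2 matches the ordinary least-squares fit
fullR2 <- summary(lm(mpg ~ ., data = mtcars))$r.squared
fullR2
```

In-sample R² can only increase as regressors are added, so this curve alone cannot pick k; held-out data or cross-validation is the usual guide.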

	# Fit lm using all 13 predictors
	modHousesFull <- lm(MEDV ~ ., data = houses)
	summary(modHousesFull) # R2 = 0.741

	# Compare obs. vs. pred. plots
	par(mfrow = c(1, 2))
	plot(houses$MEDV, predict(modHouses),
	xlab = "Observed MEDV", ylab = "Predicted MEDV",
	main = "PCR")
	abline(a = 0, b = 1, col = "red") # identity line: perfect predictions fall on it
	plot(houses$MEDV, predict(modHousesFull),
	xlab = "Observed MEDV", ylab = "Predicted MEDV",
	main = "Full model")
	abline(a = 0, b = 1, col = "red")
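Because `prcomp(scale(houses[, -14]))` standardizes the data before `prcomp()` sees it, new observations would have to be scaled with the training means and standard deviations by hand. A sketch of the alternative, assuming a hypothetical 24/8 split of the built-in `mtcars` data in place of the housing data: passing `scale. = TRUE` stores `center` and `scale` in the fitted object, and `predict()` reuses them to project new rows onto the training PCs.

```r
# Hypothetical split of mtcars (built into R), standing in for the housing data
trainCars <- mtcars[1:24, ]
testCars <- mtcars[25:32, ]
predictors <- setdiff(names(mtcars), "mpg")

# scale. = TRUE stores the training means and SDs in pcaTrain$center / $scale
pcaTrain <- prcomp(trainCars[, predictors], scale. = TRUE)

# PCR on the first k PCs (k = 3 is an arbitrary choice)
k <- 3
trainScores <- as.data.frame(pcaTrain$x[, 1:k])
trainScores$mpg <- trainCars$mpg
pcrFit <- lm(mpg ~ ., data = trainScores)

# predict() standardizes new rows with the training statistics, then rotates
testScores <- as.data.frame(predict(pcaTrain, newdata = testCars[, predictors])[, 1:k])
pcrPred <- predict(pcrFit, newdata = testScores)
cbind(observed = testCars$mpg, predicted = round(pcrPred, 1))
```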
