plot acf and time series in the same plot


I have a time series with a 24-hour frequency. I extracted one full day of data and plotted the series and the result of acf() separately.
Here are the results:
[Image: 24h time series]
Then I ran acf() and plotted the result:
[Image: acf on the 24h time series]
I thought it could be useful to plot the time series and the acf() result in the same plot, just to help me understand the result of acf(). I have not seen any example of this, so maybe it is not useful at all, but I do not understand why it is not working.
Here is my code:
plot(trainingPeriod.1Day.ts, xaxt='n', col='blue', ylim=c(-100, 700))
tt <- time(trainingPeriod.1Day.ts)
ix <- seq(0, length(tt) - 1, by=1)
axis(side = 1, at = tt[ix], labels = FALSE, xlab='Hour of the day')
labs <- hour(date_decimal(index(trainingPeriod.1Day.ts)))
axis(side = 1, at = tt[ix], labels = labs[ix], tcl = -0.7, cex.axis = 0.7)
Apply acf() to the window of my series:
acf.24h <- acf(trainingPeriod.1Month.ts, lag.max = 24, plot = FALSE)
Prepare data to add acf() information and use lines() function:
acf.values <-acf.24h$acf[-1]
acf.sequence <- length(acf.values)
lines(seq(from=0 , by = 1, length.out = acf.sequence), acf.values, type='h')
When I add the last command, lines(), nothing gets plotted and no error appears in the console.
Do you have any idea what might be going on?
Here is the output of dput()
> dput(trainingPeriod.1Day.ts)
structure(c(19L, 10L, 32L, 24L, 65L, 279L, 437L, 543L, 293L,
188L, 280L, 252L, 209L, 181L, 203L, 214L, 264L, 229L, 148L, 108L,
55L, 72L, 47L, 32L), .Tsp = c(2018.08767123288, 2018.09029680365,
8760), class = "ts")

The problem is that the horizontal axis of the first plot is on the time scale of the series (decimal years), not the sequence of hours (0 to 23). If you multiply the acf values to fix the vertical scale (as mentioned by Brendan A.) and use the same time points for the x axis, you should get the plot you want. Here is my code to generate the plot.
trainingPeriod.1Day.ts <- structure(c(19L, 10L, 32L, 24L, 65L, 279L, 437L, 543L, 293L,
188L, 280L, 252L, 209L, 181L, 203L, 214L, 264L, 229L, 148L, 108L,
55L, 72L, 47L, 32L), .Tsp = c(2018.08767123288, 2018.09029680365,
8760), class = "ts")
par(mar=c(4,4,2,4))
plot(trainingPeriod.1Day.ts
,xaxt='n'
,col='blue'
,ylim=c(-100, 700)
)
tt <- time(trainingPeriod.1Day.ts)
# hour labels 0 to 23, one per observation
ix <- seq(0, length(tt) - 1, by=1)
# draw the x axis at the time points of the series, labelled with the hour
axis(side = 1
, at = tt
,labels = ix
,tcl = -0.7
,cex.axis = 0.7)
acf.24h <- acf(trainingPeriod.1Day.ts
,lag.max = 24
,plot = FALSE)
acf.values <-acf.24h$acf[-1]
acf.sequence <- length(acf.values)
lines(tt[1+seq(1,acf.sequence,1)]
,acf.values*500
,type='h'
,col='red'
)
# right-hand axis on the correlation scale (the acf values were multiplied by 500)
axis(side=4
,at=500*seq(-0.2,1,0.2)
,labels=seq(-0.2,1,0.2)
)
mtext("Correlation", side = 4, line = 3)
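To see why the original lines() call drew nothing, it can help to compare the x positions it used with the x range of the plot. The following is only a small base-R check built on the dput() data from the question; the names x_range, lag_x_original, lag_x_fixed, inside_original and inside_fixed are introduced here for illustration.

```r
# Data from the question's dput() output
trainingPeriod.1Day.ts <- structure(c(19L, 10L, 32L, 24L, 65L, 279L, 437L, 543L, 293L,
188L, 280L, 252L, 209L, 181L, 203L, 214L, 264L, 229L, 148L, 108L,
55L, 72L, 47L, 32L), .Tsp = c(2018.08767123288, 2018.09029680365,
8760), class = "ts")

tt <- time(trainingPeriod.1Day.ts)
x_range <- range(tt)   # x limits of the time series plot (decimal years)

acf.24h <- acf(trainingPeriod.1Day.ts, lag.max = 24, plot = FALSE)
acf.values <- acf.24h$acf[-1]   # drop lag 0

# x positions used in the question: 0, 1, 2, ...
lag_x_original <- seq(from = 0, by = 1, length.out = length(acf.values))
# x positions taken from the time axis of the series, one per lag
lag_x_fixed <- tt[1 + seq_along(acf.values)]

# TRUE where a position falls inside the plotted x range
inside_original <- lag_x_original >= x_range[1] & lag_x_original <= x_range[2]
inside_fixed <- lag_x_fixed >= x_range[1] & lag_x_fixed <= x_range[2]

print(any(inside_original))   # are any of the original positions visible?
print(all(inside_fixed))      # are all of the corrected positions visible?
```

The vertical scale matters as well: the acf values lie between -1 and 1 while the plot uses ylim=c(-100, 700), which is why the answer multiplies them by 500 and adds a second axis on the right.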

Related

Changing the labels in histogram when using lapply() or walk() to produce histogram

I am trying to create a matrix of several histograms using lapply() or walk() - from the purrr-package.
This is a fabricated version of my data set including only 5 of 11 columns and 3 of about 100 rows:
pid  gender  Rand  BP  GH  VT
1    F       D     5   7   5
2    M       A     6   10  5
3    F       D     0   30  5
This is the code I'm using, and where I would like to add something to change the x-label depending on the i-value.
x <- datf #dataframe
u <- x[,4:11]
par(mfrow=c(2,4))
walk(x[,4:11],
function(i)
{hist(i[x$rand=="D"],
col=rgb(0,0,1,0.2),
main = "Histogram of score",
ylim=c(0,100))
hist(i[x$rand=="A"],
col=rgb(1,0,0,0.2),
add=TRUE)})
I originally used lapply() instead of walk(), but switched to walk() to hide the output in the R Markdown document.
After reading similar questions (Using lapply on a dataframe to create histograms with labels and Labels for histogram, when using “lapply”), I tried xlab = paste(colnames(i)) and xlab = paste(colnames(u)).
xlab = paste(colnames(u)) comes closest, but the x-label in each histogram is a list of all the column names rather than the right one.
Please see the image.
[Image]
However, when I create a similar plot with only one set of data per histogram, i.e. without hist(i[x$rand=="A"], col=rgb(1,0,0,0.2), add=TRUE), it works fine:
mapply(hist, as.data.frame(x[,4:11]), main=colnames(x[,4:11]), xlab="score")
I created an example dataset that has the same form as mine; see the code.
library("dplyr")
datf <- data.frame(cbind(sample(0:100,size=150, replace=T),
sample(0:100,size=150, replace=T),
sample(0:100,size=150, replace=T),
sample(0:100,size=150, replace=T),
sample(0:100,size=150, replace=T),
sample(0:100,size=150, replace=T),
sample(0:100,size=150,replace=T),
sample(0:100,size=150, replace=T)))
datf$rand <- sample(c("D","A"),150, replace=T, prob=c(0.45,0.45))
datf$pid <- sample(1:150, replace=F, size=150)
datf$gender <- sample(c("F","M"),150, replace=T, prob=c(0.35,0.65))
datf <- datf%>%
rename(
BP=X1,
GH=X2,
VT=X3,
MH=X4,
SF=X5,
PF=X6,
RP=X7,
RE=X8
)
datf <- datf[, c("pid","rand","gender", "BP", "GH","VT","MH", "PF" , "RP", "RE","SF")]
And dput()
structure(list(pid = c(108L, 54L, 75L, 2L), rand = c("A", "A",
"A", "A"), gender = c("M", "M", "F", "M"), BP = c(70L, 13L, 27L,
66L), GH = c(2L, 68L, 61L, 19L), VT = c(57L, 68L, 30L, 0L), MH = c(65L,
69L, 21L, 47L), PF = c(100L, 38L, 70L, 60L), RP = c(77L, 27L,
59L, 38L), RE = c(66L, 9L, 68L, 48L), SF = c(30L, 74L, 64L, 20L
)), row.names = c(NA, 4L), class = "data.frame")
This is how I would like the output to look:
See image here
Would it be easier to use ggplot? - But then how?
Thank you in advance!

Maybe something like this is closer to what you are looking for?
library(tidyverse)
datf %>%
pivot_longer(cols = BP:SF) %>%
ggplot() + aes(value, fill = rand) +
geom_histogram() + facet_wrap(~name)
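If you would rather stay with base graphics, one option is to iterate over the column names instead of the columns, so that the name is available for xlab. This is only a sketch: it assumes the grouping column is called rand as in the example data, and the simulated data, plot_cols, hist_by_group and drawn are names made up for this illustration.

```r
set.seed(1)
# Small simulated data set in the same shape as the question's example
datf <- data.frame(
  rand = sample(c("D", "A"), 150, replace = TRUE),
  BP = sample(0:100, 150, replace = TRUE),
  GH = sample(0:100, 150, replace = TRUE),
  VT = sample(0:100, 150, replace = TRUE),
  MH = sample(0:100, 150, replace = TRUE)
)
plot_cols <- c("BP", "GH", "VT", "MH")   # score columns to plot

# Hypothetical helper: overlay the two groups for one column, labelled by name
hist_by_group <- function(nm) {
  hist(datf[[nm]][datf$rand == "D"], col = rgb(0, 0, 1, 0.2),
       main = paste("Histogram of", nm), xlab = nm, ylim = c(0, 100))
  hist(datf[[nm]][datf$rand == "A"], col = rgb(1, 0, 0, 0.2), add = TRUE)
  nm   # return the label that was used
}

par(mfrow = c(2, 2))
# Loop over names, not columns, so each panel knows its own label
drawn <- vapply(plot_cols, hist_by_group, character(1))
```

With purrr, iwalk() passes both the column and its name to the function, so iwalk(x[, 4:11], function(col, nm) ...) should allow the same labelling while keeping the output hidden in R Markdown.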

Boxplot labelling outliers returns an error using data rownames

I am trying to label the outliers in my boxplot using the text function so I can find out which class the outliers come from. I've stored the names of my data in the variable "rownames" using names(vehData). When I apply this, however, I get an error.
ERROR: Error in which(removeOutliers1 == bxpdat$out, arr.ind = TRUE) :
'list' object cannot be coerced to type 'double'
I am completely new to R programming and not sure how to fix this or what I am doing wrong.
Thanks in advance for any help!
library(reshape2)
vehData <-
structure(
list(
Samples = 1:6,
Comp = c(95L, 91L, 104L, 93L, 85L,
107L),
Circ = c(48L, 41L, 50L, 41L, 44L, 57L),
D.Circ = c(83L,
84L, 106L, 82L, 70L, 106L),
Rad.Ra = c(178L, 141L, 209L, 159L,
205L, 172L),
Pr.Axis.Ra = c(72L, 57L, 66L, 63L, 103L, 50L),
Max.L.Ra = c(10L,
9L, 10L, 9L, 52L, 6L),
Scat.Ra = c(162L, 149L, 207L, 144L, 149L,
255L),
Elong = c(42L, 45L, 32L, 46L, 45L, 26L),
Pr.Axis.Rect = c(20L,
19L, 23L, 19L, 19L, 28L),
Max.L.Rect = c(159L, 143L, 158L, 143L,
144L, 169L),
Sc.Var.Maxis = c(176L, 170L, 223L, 160L, 241L, 280L),
Sc.Var.maxis = c(379L, 330L, 635L, 309L, 325L, 957L),
Ra.Gyr = c(184L,
158L, 220L, 127L, 188L, 264L),
Skew.Maxis = c(70L, 72L, 73L,
63L, 127L, 85L),
Skew.maxis = c(6L, 9L, 14L, 6L, 9L, 5L),
Kurt.maxis = c(16L,
14L, 9L, 10L, 11L, 9L),
Kurt.Maxis = c(187L, 189L, 188L, 199L,
180L, 181L),
Holl.Ra = c(197L, 199L, 196L, 207L, 183L, 183L),
Class = c("van", "van", "saab", "van", "bus", "bus")
),
row.names = c(NA,
6L), class = "data.frame")
#Remove outliers
removeOutliers <- function(data) {
OutVals <- boxplot(data)$out
remOutliers <- sapply(data, function(x) x[!x %in% OutVals])
return (remOutliers)
}
vehDataRemove1 <- vehData[, -1]
vehDataRemove2 <- vehDataRemove1[,-19]
vehData <- vehDataRemove2
vehClass <- vehData$Class
rownames <- names(vehData) #column names
#Begin removing outliers
removeOutliers1 <- removeOutliers(vehData)
bxpdat <- boxplot(removeOutliers1)
#Also tried using vehicles$Class instead of rownames but get the same error
text(bxpdat$group, bxpdat$out,
rownames[which(removeOutliers1 == bxpdat$out, arr.ind = TRUE)[,1]],
pos = 4)
The boxplot looks like this. I am trying to label the outliers based on the x axis e.g. "Comp", "Circ", "D.Circ", "Rad.Ra", "Max.L.Ra" etc.. & by vehicle class "Van", "Bus" ..
[Image: crammed text issue when identifying class]

If it is the outliers in the 2nd boxplot, it would be:
bxpdat <- boxplot(removeOutliers1)
text(bxpdat$group, bxpdat$out,
bxpdat$names[bxpdat$group],
pos = 4)
Maybe looks better like this, if you adjust the margin and flip the labels:
par(mar=c(8,3.5,3.5,3.5))
bxpdat = boxplot(removeOutliers1,las=2,cex=0.5)
text(bxpdat$group, bxpdat$out,
bxpdat$names[bxpdat$group],
pos = 4,cex=0.5)

I understood the question differently from @StupidWolf. I thought the goal was to replace the points indicating outliers with the text of the vehicle class (bus, van or saab). If you simply print the variable name (e.g. Skew.maxis), then you might as well have plotted the outliers as points, unless I'm missing something.
Here is code to answer the question as I understood it, for what it's worth (beginning after defining removeOutliers):
# CHANGE: Create vehClass vector before removing Class from the dataframe
vehClass <- vehData$Class
vehDataRemove1 <- vehData[, -1]
vehDataRemove2 <- vehDataRemove1[,-19]
vehData <- vehDataRemove2
#Begin removing outliers
removeOutliers1 <- removeOutliers(vehData)
bxpdat <- boxplot(removeOutliers1) # use boxplot(vehData) if you plot all the outliers as points
# loop over columns
n_plot <- 1; set.seed(123) # only plot n_plot randomly-chosen outliers
for(i in 1:ncol(vehData)){
# find out which row indices were removed as outliers
diffInd <- which(vehData[[i]] %in% setdiff(vehData[[i]], removeOutliers1[[i]]))
# if none were, then don't add any outlier text
if(length(diffInd) == 0) next
print(i)
print(paste0("l:", length(diffInd)))
if(length(diffInd) > n_plot){
diffIndPlot <- sample(diffInd, n_plot, replace = FALSE)
} else diffIndPlot <- diffInd
text(x = i, y = vehData[[i]][diffIndPlot],
labels = paste0(vehClass[diffIndPlot], ": ", vehData[[i]][diffIndPlot]))
}
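A shorter variant of the same idea, as a sketch only: boxplot.stats() returns the values that boxplot() would flag for a single column, and which() recovers the rows they came from, so the class label can be looked up directly. The data below are three columns copied from the question's dput() output; veh, num_cols and outlier_rows are names chosen for this example.

```r
# Three columns from the question's data, plus the class labels
veh <- data.frame(
  Pr.Axis.Ra = c(72L, 57L, 66L, 63L, 103L, 50L),
  Max.L.Ra = c(10L, 9L, 10L, 9L, 52L, 6L),
  Skew.Maxis = c(70L, 72L, 73L, 63L, 127L, 85L),
  Class = c("van", "van", "saab", "van", "bus", "bus")
)
num_cols <- setdiff(names(veh), "Class")

# For each numeric column, the row indices that boxplot() treats as outliers
outlier_rows <- lapply(veh[num_cols], function(v) which(v %in% boxplot.stats(v)$out))

boxplot(veh[num_cols], las = 2)
for (j in seq_along(num_cols)) {
  rows <- outlier_rows[[j]]
  if (length(rows) == 0) next   # nothing to label in this column
  # label each outlier with the class of the row it came from
  text(x = j, y = veh[[num_cols[j]]][rows], labels = veh$Class[rows],
       pos = 4, cex = 0.7)
}
```

Because the outliers are computed per column here, a value is only labelled in the column where it is actually an outlier.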

Plotting results with missing categories in interaction with emmeans

My data are quite messy. I have a model with an interaction between two factors, and I want to plot it:
f1 <- structure(list(tipo = c("digitables", "digitables", "digitables",
"digitables", "digitables", "digitables", "digitables", "digitables",
"payments", "payments", "payments", "payments", "payments", "payments",
"payments", "payments", "traditionals", "traditionals", "traditionals",
"traditionals", "traditionals", "traditionals", "traditionals",
"traditionals"), categoria = c("Advice", "Digital banks", "Exchange",
"FinTech", "Insurance", "Investments", "Lending", "Payments and transfers",
"Advice", "Digital banks", "Exchange", "FinTech", "Insurance",
"Investments", "Lending", "Payments and transfers", "Advice",
"Digital banks", "Exchange", "FinTech", "Insurance", "Investments",
"Lending", "Payments and transfers"), Total = c(63L, 450L, 279L,
63L, 36L, 108L, 567L, 549L, 63L, 450L, 279L, 63L, 36L, 108L,
567L, 549L, 35L, 250L, 155L, 35L, 20L, 60L, 315L, 305L), Frequencia = c(44L,
266L, 118L, 9L, 14L, 45L, 134L, 242L, 33L, 68L, 2L, 10L, 3L,
8L, 11L, 78L, 27L, 226L, 142L, 10L, 20L, 45L, 300L, 245L), Perc = c(69.84,
59.11, 42.29, 14.29, 38.89, 41.67, 23.63, 44.08, 52.38, 15.11,
0.72, 15.87, 8.33, 7.41, 1.94, 14.21, 77.14, 90.4, 91.61, 28.57,
100, 75, 95.24, 80.33), Failure = c(19L, 184L, 161L, 54L, 22L,
63L, 433L, 307L, 30L, 382L, 277L, 53L, 33L, 100L, 556L, 471L,
8L, 24L, 13L, 25L, 0L, 15L, 15L, 60L)), row.names = c(NA, -24L
), class = "data.frame")
# Packages
library(dplyr)
library(ggplot2)
library(emmeans) #version 1.4.8. or 1.5.1
# Works as expected
m1 <- glm(cbind(Frequencia, Failure) ~ tipo*categoria,
data = f1, family = binomial(link = "logit"))
l1 <- emmeans(m1, ~categoria|tipo)
plot(l1, type = "response",
comparison = T,
by = "categoria")
Using by = "tipo" results in an error:
# Doesn't work:
plot(l1, type = "response",
comparison = T,
by = "tipo")
Error: Aborted -- Some comparison arrows have negative length!
In addition: Warning message:
Comparison discrepancy in group digitables, Advice - Insurance:
Target overlap = -0.0241, overlap on graph = 0.0073
If I use comparison = F, as suggested by the explanation supplement vignette, it works. However, it does not show me the arrows, which are very important.
Q1 - Is there a workaround for this? (Or is it impossible due to my data?)
As we can see from the last plot, there is a category with probability = 1 (categoria=Insurance and tipo=traditionals). So I deleted only this row from my data frame and tried to redo the plot, which gives me:
f1 <- f1 %>%
filter(!Perc ==100)
m1 <- glm(cbind(Frequencia, Failure) ~ tipo*categoria,
data = f1, family = binomial(link = "logit"))
l1 <- emmeans(m1, ~categoria|tipo)
plot(l1, type = "response",
comparison = T,
by = "categoria")
Error in if (dif[i] > 0) lmat[i, id1[i]] = rmat[i, id2[i]] = wgt * v1[i] else rmat[i, :
missing value where TRUE/FALSE needed
Q2 - How can I plot my results when one level of a variable is missing within a level of the other variable? I would expect the Insurance facet to have only the payments and digitables levels (while the others remain the same).

First, please don't ever re-use the same variable names for more than one thing; that makes things not reproducible. If you modify a dataset, or a model, or whatever, give it a new name so it can be distinguished.
Q1
As documented, comparison arrows cannot always be computed. This is such an example. I suggest displaying the results some other way, e.g. using pwpp() or pwpm()
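As a rough sketch of what that could look like, the example below fits the same kind of model to a subset of the question's f1 data (three categories only) and calls pwpm() and pwpp() on the result. The names f_sub, m_sub and l_sub are made up for this illustration, and the emmeans part only runs if emmeans and ggplot2 are installed.

```r
# Subset of the question's f1 data: three categories for each tipo
f_sub <- data.frame(
  tipo = rep(c("digitables", "payments", "traditionals"), each = 3),
  categoria = rep(c("Advice", "Digital banks", "Exchange"), times = 3),
  Frequencia = c(44L, 266L, 118L, 33L, 68L, 2L, 27L, 226L, 142L),
  Failure = c(19L, 184L, 161L, 30L, 382L, 277L, 8L, 24L, 13L)
)
m_sub <- glm(cbind(Frequencia, Failure) ~ tipo * categoria,
             data = f_sub, family = binomial(link = "logit"))

# emmeans and ggplot2 are not part of base R, so guard this part
if (requireNamespace("emmeans", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  l_sub <- emmeans::emmeans(m_sub, ~ categoria | tipo)
  # Pairwise comparison matrix, one per level of tipo
  print(emmeans::pwpm(l_sub, type = "response"))
  # Pairwise P-value plot, a graphical alternative to comparison arrows
  print(emmeans::pwpp(l_sub, type = "response"))
}
```

Neither display relies on comparison arrows, so they are not affected by the negative-length problem.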
Q2
There was a bug in handling missing cases. This has been fixed in the GitHub version:
f2 <- f1 %>%
filter(!Perc ==100)
m2 <- glm(cbind(Frequencia, Failure) ~ tipo*categoria,
data = f2, family = binomial(link = "logit"))
l2 <- emmeans(m2, ~categoria|tipo)
plot(l2, type = "response",
comparison = TRUE,
by = "categoria")
plot(l2, type = "response",
comparison = TRUE,
by = "tipo")
## Error: Aborted -- Some comparison arrows have negative length!
## (in group "payments")

Combine scatterplot and barplot, then lapply

I am trying to add a scatterplot and a barplot within the same plot area with ggplot. The scatterplot should be averages of var. '1' over var.'2' for one dataset, and the barplot should be the average value of '1' over my control dataset.
My data looks like this:
> dput(lapply(ubbs6, head))
list(structure(c(96L, 96L, 100L, 88L, 93L, 100L, 61L, 61L, 70L,
40L, 58L, 70L, 7807L, 7357L, 7695L, 6400L, 6009L, 7735L), .Dim = c(6L,
3L), .Dimnames = list(NULL, c("1", "2", "3"))), structure(c(99L,
96L, 100L, 96L, 96L, 96L, 66L, 67L, 70L, 63L, 57L, 62L, 7178L,
6028L, 6124L, 6082L, 6873L, 5629L, 31L, 27L, 60L, 42L, 12L, 18L
), .Dim = c(6L, 4L), .Dimnames = list(NULL, c("1", "2",
"3", "4"))), structure(c(99L, 95L, 95L, 100L, 96L, 95L, 69L,
58L, 56L, 70L, 61L, 65L, 6067L, 6331L, 6247L, 5988L, 7538L, 6162L,
50L, 36L, 67L, 10L, 55L, 70L), .Dim = c(6L, 4L), .Dimnames = list(
NULL, c("1", "2", "3", "4"))))
Example of what I've tried so far:
aggregate(ubbs6[[2]][,'1'], list(ubbs6[[2]][,'2']), mean)
m162 <- aggregate(ubbs6[[2]][,'1'], list(ubbs6[[2]][,'2']), mean)
m163 <- aggregate(ubbs6[[3]][,'1'], list(ubbs6[[3]][,'2']), mean)
m161 <- mean(ubbs6[[1]][,'1'])
ggplot(m162, aes_(x = m162[,'Group.1'], y = m162[,'x']))+
geom_point()+
geom_smooth(method = 'lm', formula = 'y ~ sqrt (x)')
I would like to do two things:
add a barplot of one x,y value of my control set (ubbs6[[1]])
throw this into a lapply structure so I can do this for 11 similar datasets
Any help would be greatly appreciated!
**EDIT: edited out specific details that aren't needed for others to understand the code **

Saving your data in d, you can try
ggplot(as.data.frame(d[[2]]),aes(age, FPAR) ) +
coord_cartesian(ylim = c(90,100)) +
geom_point() +
geom_smooth(method = 'lm', formula = 'y ~ sqrt (x)') +
geom_col(data=data.frame(x=max(as.data.frame(d[[2]])$age),
y=mean(as.data.frame(d[[1]])$FPAR)),
aes(x,y), inherit.aes = FALSE)
You have to use coord_cartesian to specify the y-limits and inherit.aes = FALSE. Otherwise the bar is not correctly drawn.
If you want to combine your second and third data frames in one plot, you can try
library(tidyverse)
d %>%
.[2:3] %>%
map(as.data.frame) %>%
bind_rows(.id = "id") %>%
mutate(max = max(age),
Mean = mean(d[[1]][, 1])) %>% # mean of the whole first column of the control set
ggplot(aes(age, FPAR, color=id)) +
geom_point() +
geom_smooth(method = 'lm', formula = 'y ~ sqrt (x)', se=FALSE) +
geom_col(data = . %>% distinct(max, Mean),
aes(max, Mean), inherit.aes = FALSE)
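For the lapply part of the question, the data preparation can be wrapped so that it runs once per dataset. The following is a base-R sketch using the first two columns of the head() data posted in the question; control_mean and group_means are names introduced here, and it assumes the control set is always the first list element.

```r
# First two columns of the head() data from the question
ubbs6 <- list(
  cbind("1" = c(96, 96, 100, 88, 93, 100), "2" = c(61, 61, 70, 40, 58, 70)),
  cbind("1" = c(99, 96, 100, 96, 96, 96), "2" = c(66, 67, 70, 63, 57, 62)),
  cbind("1" = c(99, 95, 95, 100, 96, 95), "2" = c(69, 58, 56, 70, 61, 65))
)

# Mean of variable '1' in the control set (assumed to be the first element)
control_mean <- mean(ubbs6[[1]][, "1"])

# Mean of variable '1' for each value of variable '2', one data frame per dataset
group_means <- lapply(ubbs6[-1], function(m) {
  out <- aggregate(m[, "1"], list(m[, "2"]), mean)
  names(out) <- c("var2", "mean_var1")
  out
})

print(control_mean)
print(group_means[[1]])
```

Each element of group_means could then be passed to a ggplot call like the ones above, inside the same lapply(), to get one plot per dataset.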

How to add multiple confidence ellipses to PCA plot in R?

I am very very new to R and stats in general, and am having trouble adding multiple confidence ellipses to a PCA plot.
My interest is in highlighting potential groupings/clusters in the PCA plot with 95% confidence ellipses. I have tried using the dataEllipse function in R, however I cannot figure out how to add multiple ellipses with different centers to the PCA plot (the centers would be at various points that appear to contain a cluster, in this case lithic sources and lithic tools likely made from that source).
Thanks for any help with this!
lithic_final <- LITHIC.DATASHEET.FOR.R.COMPLETE.FORMAT
lithic_final
pca1 <- princomp(lithic_final); pca1
lithic_source <- c("A1", "A1", "A1", "A1", "A2","A2", "A2", "A3","A3","A3","B","B","B","B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D","D","D","E","E","E","E","E","E","E","E","F","F","G","G","G","G","H","H","H","H","H","H","H","I1","I1","I1","I2","I2","I2","I2","I2","J1","J1","J2","J2","J2","J2","J2","J2","J2","J2","J2","K","K","K","K","K","K","K","L","L","L","L","L","L","L","L","L","L","L","L","L","L","BB1","BB1","BB1","FC","FC","FC","JRPP","JRPP","JRPP","BB2","BB2","BB2","BB2","MWP","MWP","MWP","MWP","RPO","RPO","RPO")
lithic_source
summary(pca1)
plot(pca1)
#Plotting the scores with the Lithic Source Info
round(pca1$scores[,1:2], 2)
pca_scores <-round(pca1$scores[,1:2], 2)
plot(pca1$scores[,1], pca1$scores[,2], type="n")
text(pca1$scores[,1], pca1$scores[,2],labels=abbreviate(lithic_source, minlength=3), cex=.45)
#Plotting PCA Scores of EACH SAMPLE for PCA 2 and 3 with Lithic Source Info
round(pca1$scores[,2:3], 2)
pca2_3_scores <-round(pca1$scores[,2:3], 2)
plot(pca1$scores[,2], pca1$scores[,3], type="n")
text(pca1$scores[,2], pca1$scores[,3], labels=abbreviate(lithic_source, minlength=3), cex=.45)
#Plotting PCA Scores of EACH SAMPLE for PCA 3 and 4 with Lithic Source Info
round(pca1$scores[,3:4], 2)
pca3_4_scores <-round(pca1$scores[,3:4], 2)
plot(pca1$scores[,3], pca1$scores[,4], type="n")
text(pca1$scores[,3], pca1$scores[,4], labels=abbreviate(lithic_source, minlength=3), cex=.45)
#Plotting PCA Scores of EACH SAMPLE for PCA 1 and 3 with Lithic Source Info
round(pca1$scores[,1:3], 2)
pca1_3_scores <-round(pca1$scores[,1:3], 2)
plot(pca1$scores[,1], pca1$scores[,3], type="n")
text(pca1$scores[,1], pca1$scores[,3], labels=abbreviate(lithic_source, minlength=3), cex=.45)
#Plotting PCA Scores of EACH SAMPLE for PCA 1 and 4 with Lithic Source Info
round(pca1$scores[,1:4], 2)
pca1_4_scores <-round(pca1$scores[,1:4], 2)
plot(pca1$scores[,1], pca1$scores[,4], type="n")
text(pca1$scores[,1], pca1$scores[,4], labels=abbreviate(lithic_source, minlength=3), cex=.45)
#TRYING TO GET ELLIPSES ADDED TO PCA 1 and 4 scores
dataEllipse(pca1$scores[,1], pca1$scores[,4],centers=12,add=TRUE,levels=0.9, plot.points=FALSE)
structure(list(Ca.K12 = c(418L, 392L, 341L, 251L, 297L, 238L,
258L, 5L, 2L, 37L), Cr.K12 = c(1L, 12L, 15L, 6L, 9L, 6L, 35L,
7L, 45L, 32L), Cu.K12 = c(89L, 96L, 81L, 63L, 88L, 103L, 104L,
118L, 121L, 90L), Fe.K12 = c(18627L, 18849L, 18413L, 12893L,
17757L, 17270L, 16198L, 2750L, 4026L, 3373L), K.K12 = c(20L,
23L, 28L, 0L, 34L, 17L, 45L, 102L, 150L, 147L), Mn.K12 = c(205L,
212L, 235L, 120L, 216L, 212L, 246L, 121L, 155L, 115L), Nb.K12 = c(139L,
119L, 154L, 91L, 122L, 137L, 137L, 428L, 414L, 428L), Rb.K12 = c(99L,
42L, 79L, 49L, 210L, 243L, 168L, 689L, 767L, 705L), Sr.K12 = c(3509L,
3766L, 3481L, 2715L, 2851L, 2668L, 2695L, 202L, 220L, 217L),
Ti.K12 = c(444L, 520L, 431L, 293L, 542L, 622L, 531L, 82L,
129L, 84L), Y.K12 = c(135L, 121L, 105L, 74L, 144L, 79L, 85L,
301L, 326L, 379L), Zn.K12 = c(131L, 133L, 108L, 78L, 124L,
111L, 114L, 81L, 78L, 59L), Zr.K12 = c(1348L, 1479L, 1333L,
964L, 1506L, 1257L, 1296L, 3967L, 4697L, 4427L)), .Names = c("Ca.K12",
"Cr.K12", "Cu.K12", "Fe.K12", "K.K12", "Mn.K12", "Nb.K12", "Rb.K12",
"Sr.K12", "Ti.K12", "Y.K12", "Zn.K12", "Zr.K12"), row.names = c(NA,
10L), class = "data.frame")

I think you would have received a speedier reply if you had focused on your question instead of all the extraneous stuff. You gave us your commands for plotting a bunch of principal components that had nothing to do with your question. The question is, how do you plot ellipses by group? Your sample data at 10 lines and three groups is not helpful because 3 points is not enough to plot data ellipses. You are using the dataEllipse function in package car which has the simplest answer to your question:
First, a reproducible example:
set.seed(42) # so you can get the same numbers I get
source_a <- data.frame(X1=rnorm(25, 50, 5), X2=rnorm(25, 40, 5))
source_b <- data.frame(X1=rnorm(25, 20, 5), X2=rnorm(25, 40, 5))
source_c <- data.frame(X1=rnorm(25, 35, 5), X2=rnorm(25, 25, 5))
lithic_dat <- rbind(source_a, source_b, source_c)
lithic_source <- c(rep("a", 25), rep("b", 25), rep("c", 25))
Plot ellipses with scatterplot() and add text:
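library(car) # provides scatterplot() and dataEllipse()
# (newer car releases spell these arguments regLine=FALSE and ellipse=list(levels=.9))
scatterplot(X2~X1 | lithic_source, data=lithic_dat, pch="", smooth=FALSE,
reg.line=FALSE, ellipse=TRUE, levels=.9)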
text(lithic_dat$X1, lithic_dat$X2, lithic_source, cex=.75)
Scatterplot can be tweaked to do everything you want, but it is also
possible to plot the ellipses without using it:
sources <- unique(lithic_source) # vector of the different sources
plot(lithic_dat$X1, lithic_dat$X2, type="n")
text(lithic_dat$X1, lithic_dat$X2, lithic_source, cex=.75)
for (i in sources) with(lithic_dat, dataEllipse(X1[lithic_source==i],
X2[lithic_source==i], levels=.9, plot.points=FALSE))
This will work for your principal components and any other data.
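If the car package is not available, a similar outline can be drawn by hand. The sketch below is not what dataEllipse() does internally; it draws the bivariate-normal contour based on a chi-square quantile, so the size may differ slightly from car's ellipses. ellipse_points is a hypothetical helper, and scores and grp are simulated stand-ins for PCA scores and source labels.

```r
set.seed(42)

# Hypothetical helper: points on a normal-theory ellipse for two variables
ellipse_points <- function(x, y, level = 0.9, n = 100) {
  centre <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  radius <- sqrt(qchisq(level, df = 2))
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))
  # chol(S) is the upper-triangular R with t(R) %*% R equal to S
  sweep(radius * circle %*% chol(S), 2, centre, "+")
}

# Simulated scores for two groups
scores <- data.frame(PC1 = c(rnorm(25, 50, 5), rnorm(25, 20, 5)),
                     PC2 = c(rnorm(25, 40, 5), rnorm(25, 25, 5)))
grp <- rep(c("a", "b"), each = 25)

plot(scores$PC1, scores$PC2, type = "n", xlab = "PC1", ylab = "PC2")
text(scores$PC1, scores$PC2, grp, cex = 0.75)
# one ellipse per group, each centred on that group's mean
for (g in unique(grp)) {
  lines(ellipse_points(scores$PC1[grp == g], scores$PC2[grp == g]))
}
```

For real PCA output, the columns of pca1$scores would take the place of scores$PC1 and scores$PC2, with lithic_source as the grouping vector.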

Here is a simple solution using a package called ggbiplot (available on github) with Iris data. I hope this is what you were looking for.
library(devtools);install_github('vqv/ggbiplot')
library(ggbiplot)
pca = prcomp(iris[,1:4])
ggbiplot(pca,groups = iris$Species,ellipse = T,ellipse.prob = .95)
