My goal is to plot a geom_smooth (in the first instance) based on a linear model controlling (or centering) for covariates. I can easily produce a chart that plots the IV (x) and the DV (y), but I can’t seem to figure out how to adjust the DV for covariates.
Variables:
"Age" = as stated (covariate to be controlled for)
"Gender" = as stated (covariate to be controlled for)
"O" = Openness (non-mean-centred personality trait - see "Omc")
"E" = Extraversion (the non-mean-centred independent variable to be plotted)
"Professionals" = Occupation (the non-mean-centred moderator variable to be plotted)
"Omc" = Mean-centred Openness (covariate to be controlled for in the lm model)
"Emc" = Mean-centred Extraversion (independent variable in the lm model)
"Pmc" = Mean-centred Professionals (moderator variable in the lm model)
"OxP" = Openness x Professionals interaction term (controlled for in the lm model)
"ExP" = Extraversion x Professionals interaction term (of primary interest in the lm model and serves as the justification for the ggplot)
Here is the code I have for the model and the associated chart:
lm.js <- lm(Job_Very_Stressful ~ Age + Gender + Omc + Emc + Pmc + OxP + ExP, data = df.eg)
summary(lm.js)
ggplot(df.eg, aes(x = E, y = Job_Very_Stressful, col = factor(Professionals))) + geom_smooth(method = "lm", alpha = 0.25) + labs(x = "Extraversion", y = "Job Stress") +
  theme(legend.title = element_blank(), legend.position = "top", text = element_text(size = 13), axis.text = element_text(size = 10), panel.grid.major = element_line(colour = "grey", size = 0.5, linetype = 3)) +
  coord_cartesian(ylim = c(1.00, 10.00), xlim = c(1.00, 7.00)) +
  scale_y_continuous(breaks = seq(1.00, 10.00, 0.50)) + scale_x_continuous(breaks = seq(1.00, 7.00, 1.00))
Sample data using dput(head(df.eg, 50)) (these data will not produce a significant lm result, but it doesn't matter for this purpose):
structure(list(Age = c(37L, 66L, 33L, 55L, 60L, 61L, 27L, 54L,
55L, 33L, 33L, 27L, 20L, 25L, 18L, 38L, 36L, 41L, 38L, 58L, 37L,
32L, 45L, 24L, 51L, 37L, 15L, 48L, 43L, 19L, 49L, 39L, 38L, 28L,
42L, 26L, 37L, 58L, 55L, 46L, 57L, 45L, 16L, 27L, 33L, 58L, 23L,
60L, 30L, 24L), Gender = c(1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L), O = c(6.16666666666667,
4.16666666666667, 5.16666666666667, 4.66666666666667, 6.5, 4.66666666666667,
3, 4.5, 5, 4.16666666666667, 3.5, 3.5, 4.66666666666667, 4.33333333333333,
4.83333333333333, 3.66666666666667, 6, 5.5, 2, 4.66666666666667,
3.16666666666667, 3.33333333333333, 3.66666666666667, 2.5, 4.33333333333333,
6.83333333333333, 5, 4.16666666666667, 4.66666666666667, 5.5,
4.33333333333333, 5.16666666666667, 3.5, 2.66666666666667, 5.33333333333333,
2.16666666666667, 4, 4.16666666666667, 4.16666666666667, 3.83333333333333,
2.83333333333333, 5.5, 3.33333333333333, 5.83333333333333, 4,
2.83333333333333, 5, 3.83333333333333, 4.83333333333333, 5.83333333333333
), E = c(5.33333333333333, 5.16666666666667, 5.83333333333333,
5.5, 5.33333333333333, 6.83333333333333, 4.5, 5, 6.83333333333333,
4.66666666666667, 3, 4.5, 5.33333333333333, 6.16666666666667,
5.16666666666667, 5.66666666666667, 6.5, 2, 3.16666666666667,
3.16666666666667, 2.83333333333333, 4, 3.66666666666667, 3.16666666666667,
4.16666666666667, 3.33333333333333, 6.5, 3.83333333333333, 4.33333333333333,
2.83333333333333, 4, 4, 3, 5.33333333333333, 3.83333333333333,
3.83333333333333, 4.33333333333333, 5.66666666666667, 4.33333333333333,
5.83333333333333, 3.83333333333333, 2.66666666666667, 4.16666666666667,
4.66666666666667, 4.5, 3.83333333333333, 6.5, 3.5, 4.33333333333333,
5.5), Professionals = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Job_Very_Stressful = c(1L, 1L,
5L, 7L, 1L, 3L, 5L, 5L, 4L, 6L, 4L, 1L, 2L, 2L, 2L, 4L, 2L, 1L,
1L, 2L, 5L, 5L, 2L, 6L, 5L, 5L, 2L, 5L, 3L, 1L, 2L, 2L, 7L, 1L,
2L, 3L, 5L, 1L, 3L, 3L, 5L, 6L, 3L, 4L, 4L, 3L, 3L, 1L, 3L, 5L
), Omc = c(1.76244927536232, -0.237550724637678, 0.762449275362322,
0.262449275362322, 2.09578260869565, 0.262449275362322, -1.40421739130435,
0.0957826086956519, 0.595782608695652, -0.237550724637678, -0.904217391304348,
-0.904217391304348, 0.262449275362322, -0.0708840579710177, 0.429115942028982,
-0.737550724637678, 1.59578260869565, 1.09578260869565, -2.40421739130435,
0.262449275362322, -1.23755072463768, -1.07088405797102, -0.737550724637678,
-1.90421739130435, -0.0708840579710177, 2.42911594202898, 0.595782608695652,
-0.237550724637678, 0.262449275362322, 1.09578260869565, -0.0708840579710177,
0.762449275362322, -0.904217391304348, -1.73755072463768, 0.929115942028982,
-2.23755072463768, -0.404217391304348, -0.237550724637678, -0.237550724637678,
-0.570884057971018, -1.57088405797102, 1.09578260869565, -1.07088405797102,
1.42911594202898, -0.404217391304348, -1.57088405797102, 0.595782608695652,
-0.570884057971018, 0.429115942028982, 1.42911594202898), Emc = c(0.843115942028983,
0.676449275362322, 1.34311594202898, 1.00978260869565, 0.843115942028983,
2.34311594202898, 0.00978260869565251, 0.509782608695653, 2.34311594202898,
0.176449275362322, -1.49021739130435, 0.00978260869565251, 0.843115942028983,
1.67644927536232, 0.676449275362322, 1.17644927536232, 2.00978260869565,
-2.49021739130435, -1.32355072463768, -1.32355072463768, -1.65688405797102,
-0.490217391304347, -0.823550724637677, -1.32355072463768, -0.323550724637678,
-1.15688405797102, 2.00978260869565, -0.656884057971018, -0.156884057971017,
-1.65688405797102, -0.490217391304347, -0.490217391304347, -1.49021739130435,
0.843115942028983, -0.656884057971018, -0.656884057971018, -0.156884057971017,
1.17644927536232, -0.156884057971017, 1.34311594202898, -0.656884057971018,
-1.82355072463768, -0.323550724637678, 0.176449275362322, 0.00978260869565251,
-0.656884057971018, 2.00978260869565, -0.990217391304347, -0.156884057971017,
1.00978260869565), Pmc = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5,
-0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5,
-0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5), OxP = c(0.881224637681161,
-0.118775362318839, 0.381224637681161, 0.131224637681161, 1.04789130434783,
0.131224637681161, -0.702108695652174, 0.047891304347826, 0.297891304347826,
-0.118775362318839, -0.452108695652174, -0.452108695652174, 0.131224637681161,
-0.0354420289855089, 0.214557971014491, -0.368775362318839, 0.797891304347826,
0.547891304347826, -1.20210869565217, 0.131224637681161, -0.618775362318839,
-0.535442028985509, -0.368775362318839, -0.952108695652174, -0.0354420289855089,
1.21455797101449, -0.297891304347826, 0.118775362318839, -0.131224637681161,
-0.547891304347826, 0.0354420289855089, -0.381224637681161, 0.452108695652174,
0.868775362318839, -0.464557971014491, 1.11877536231884, 0.202108695652174,
0.118775362318839, 0.118775362318839, 0.285442028985509, 0.785442028985509,
-0.547891304347826, 0.535442028985509, -0.714557971014491, 0.202108695652174,
0.785442028985509, -0.297891304347826, 0.285442028985509, -0.214557971014491,
-0.714557971014491), ExP = c(0.421557971014491, 0.338224637681161,
0.671557971014491, 0.504891304347826, 0.421557971014491, 1.17155797101449,
0.00489130434782625, 0.254891304347826, 1.17155797101449, 0.0882246376811611,
-0.745108695652174, 0.00489130434782625, 0.421557971014491, 0.838224637681161,
0.338224637681161, 0.588224637681161, 1.00489130434783, -1.24510869565217,
-0.661775362318839, -0.661775362318839, -0.828442028985509, -0.245108695652174,
-0.411775362318839, -0.661775362318839, -0.161775362318839, -0.578442028985509,
-1.00489130434783, 0.328442028985509, 0.0784420289855086, 0.828442028985509,
0.245108695652174, 0.245108695652174, 0.745108695652174, -0.421557971014491,
0.328442028985509, 0.328442028985509, 0.0784420289855086, -0.588224637681161,
0.0784420289855086, -0.671557971014491, 0.328442028985509, 0.911775362318839,
0.161775362318839, -0.0882246376811611, -0.00489130434782625,
0.328442028985509, -1.00489130434783, 0.495108695652174, 0.0784420289855086,
-0.504891304347826)), .Names = c("Age", "Gender", "O", "E", "Professionals",
"Job_Very_Stressful", "Omc", "Emc", "Pmc", "OxP", "ExP"), row.names = 2275:2324, class = "data.frame")
The jtools package has recently been released and has an option to center plots for the mean of covariates.
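You can also do this without jtools. Skip geom_smooth(), because it fits its own simple y ~ x model within each colour group and knows nothing about Age, Gender or Openness. Instead, plot predictions from the full model with the covariates held at their means.

The sketch below makes several assumptions:

- It uses simulated stand-in data, because the 50-row sample is too small to show much. The names df.sim, fit, grid and p are illustrative.
- It recovers the centring constants as E - Emc and Professionals - Pmc, so the prediction grid is centred exactly as the model data were.

```r
# Minimal sketch: simulated data (invented values) with the question's columns
set.seed(42)
n <- 200
df.sim <- data.frame(
  Age           = sample(18:66, n, replace = TRUE),
  Gender        = sample(1:2, n, replace = TRUE),
  O             = runif(n, 1, 7),
  E             = runif(n, 1, 7),
  Professionals = rep(0:1, each = n / 2)
)
df.sim$Job_Very_Stressful <- with(df.sim,
  3 + 0.02 * Age + 0.4 * E * Professionals + rnorm(n))

# Mean-centre and build the interaction terms as in the question
df.sim$Omc <- df.sim$O - mean(df.sim$O)
df.sim$Emc <- df.sim$E - mean(df.sim$E)
df.sim$Pmc <- df.sim$Professionals - mean(df.sim$Professionals)
df.sim$OxP <- df.sim$Omc * df.sim$Pmc
df.sim$ExP <- df.sim$Emc * df.sim$Pmc

fit <- lm(Job_Very_Stressful ~ Age + Gender + Omc + Emc + Pmc + OxP + ExP,
          data = df.sim)

# Centring constants recovered from the data (E - Emc is the same in every row)
e_centre <- mean(df.sim$E - df.sim$Emc)
p_centre <- mean(df.sim$Professionals - df.sim$Pmc)

# Grid: E varies within each group; covariates are held at their means and the
# interaction terms are rebuilt from the centred values so they stay consistent
grid <- expand.grid(E = seq(1, 7, length.out = 25), Professionals = 0:1)
grid$Age    <- mean(df.sim$Age)
grid$Gender <- mean(df.sim$Gender)
grid$Omc    <- mean(df.sim$Omc)
grid$Emc    <- grid$E - e_centre
grid$Pmc    <- grid$Professionals - p_centre
grid$OxP    <- grid$Omc * grid$Pmc
grid$ExP    <- grid$Emc * grid$Pmc

# Adjusted fitted values plus 95% confidence limits (columns fit, lwr, upr)
grid <- cbind(grid, predict(fit, newdata = grid, interval = "confidence"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(grid, aes(x = E, y = fit, colour = factor(Professionals),
                        fill = factor(Professionals))) +
    geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.25, colour = NA) +
    geom_line() +
    labs(x = "Extraversion", y = "Job Stress (adjusted for covariates)",
         colour = "Professionals", fill = "Professionals")
  print(p)
}
```

With the real data you would replace df.sim with df.eg and fit with lm.js. The difference in slope between the two lines is then exactly the ExP coefficient, which is the effect the plot is meant to show.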
I am trying to get a heat map where every column has a different color.
I have a heatmap like this:
# install.packages("reshape")
library(reshape)
library(ggplot2)
# Data
set.seed(8)
m <- matrix(round(rnorm(25), 2), 5, 5)
colnames(m) <- paste("Row", 1:5)
rownames(m) <- paste("col", 1:5)
# long format
df <- melt(m)
colnames(df) <- c("x", "y", "value")
ggplot(df, aes(x = x, y = y, fill = value)) +
geom_tile()
I would like each column (col1, col2, col3, col4 and col5) to get a different color.
For example:
col1 blue, col2 green, col3 violet, col4 yellow and col5 orange.
I need this because I am making the following plot with this dataset:
dput(bdd)
structure(list(var = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L), .Label = c("var_1", "var_2",
"var_3", "var_4", "var_5", "var_6", "var_7", "var_8", "var_9",
"var_10", "var_11", "var_12", "var_13", "var_14"), class = "factor"),
value = c(4.93, 2.85, 2.075, 1.91, 1.73, 1.34, 0.615, 0.145,
0.14, 0.11, 0.09, 0.06, 0.06, 0.015, 4.13, 1.65, 1.985, 0.51,
5.805, 0.84, 1.28, 0.03, 0.235, 0.145, 0.145, 0.205, 0.03,
0.2, 1.135, 2.175, 2.735, 1.69, 0.86, 0.715, 1.905, 0.17,
0.86, 0.055, 0.03, 0.075, 0.14, 0.005, 3.55, 4.225, 5.985,
0.185, 1.17, 0.91, 0.49, 1.34, 0.485, 0.1, 0.145, 1.145,
0.53, 0.11, 12.06, 1.995, 2.205, 0.48, 1.875, 2.03, 0.335,
0.26, 1.25, 0.225, 0.245, 0.52, 0.075, 0.04), country = structure(c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), .Label = character(0)),
country1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L), .Label = c("C1", "C2", "C3", "C4", "C5"), class = "factor")), row.names = c(NA,
-70L), class = c("tbl_df", "tbl", "data.frame"))
ggplot(data=bdd,aes(x=country1,y=var,fill=value))+
geom_tile(aes(alpha=value,fill=country),color="white")+
geom_text(aes(label = sprintf("%0.3f", round(value, digits = 3))))+
scale_fill_gradient(low="white", high="blue")+
scale_alpha(range = c(0, 1))+
theme_classic()+theme(axis.title.x=element_blank(), axis.text.x=element_text(angle=0,hjust=0.5,vjust=0.5), legend.position = "none")+
labs( fill="% ",y = "y ")
But what I need is every column with a different color as in the first example.
Best.
ggplot(data=bdd,aes(x=country1,y=var,fill=country1))+
geom_tile(aes(alpha=value),color="white")+
geom_text(aes(label = sprintf("%0.3f", round(value, digits = 3))))+
scale_alpha(range = c(0, 1))+
theme_classic()+theme(axis.title.x=element_blank(), axis.text.x=element_text(angle=0,hjust=0.5,vjust=0.5), legend.position = "none")+
labs( fill="% ",y = "y ")
To give each column a color other than those of the default spectrum, you could use one of the discrete fill scales, such as scale_fill_discrete or scale_fill_manual. Alternatively, use a custom palette such as ggthemes::scale_fill_tableau(palette = "Nuriel Stone").
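Here is a minimal sketch of the first (toy) heat map, using the same seeded 5 x 5 matrix. It maps the column variable to fill and picks one colour per column with scale_fill_manual(). It also maps value to alpha, so the cells still show their magnitude.

The sketch uses as.data.frame(as.table()) instead of reshape::melt() so that it needs no extra package. The names col_colours and p are illustrative.

```r
set.seed(8)
m <- matrix(round(rnorm(25), 2), 5, 5,
            dimnames = list(paste("col", 1:5), paste("Row", 1:5)))

# Long format without reshape: as.data.frame(as.table()) gives Var1, Var2, Freq
df <- as.data.frame(as.table(m))
names(df) <- c("x", "y", "value")

# One named colour per column level (names must match the factor levels of x)
col_colours <- c("col 1" = "blue", "col 2" = "green", "col 3" = "violet",
                 "col 4" = "yellow", "col 5" = "orange")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x, y = y, fill = x, alpha = value)) +
    geom_tile(colour = "white") +
    scale_fill_manual(values = col_colours, guide = "none") +
    scale_alpha(range = c(0.2, 1))
  print(p)
}
```

The same idea carries over to bdd. Use fill = country1 and pass a vector of five colours named "C1" to "C5" to scale_fill_manual().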
Is there a way to make a file with the correlation statistic between the raw number of fish observed ("num") and each environmental data column ("temp", "do", etc.) by species ("group")?
*Also, can I correlate the means and medians of num with the environmental factors?
I'd also like to be able to choose which correlation method to use (Pearson correlation, Kendall rank correlation, Spearman correlation, etc.)
My data:
zeros <- structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("2019", "2020"), class = "factor"), season = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("dry", "wet"), class = "factor"),
site = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 1L, 1L, 2L, 2L, 3L,
3L, 4L, 4L, 5L, 5L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L
), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
group = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L
), .Label = c("Hardhead silverside", "Sailfin molly"), class = "factor"),
num = c(0, 8, 0, 9, 0, 13, 0, 9, 0, 10, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 7, 0, 2,
0, 3, 0, 13, 0), temp = c(23L, 36L, 35L, 34L, 30L, 28L, 18L,
19L, 33L, 33L, 25L, 20L, 33L, 23L, 36L, 32L, 28L, 17L, 34L,
31L, 26L, 34L, 26L, 35L, 15L, 25L, 26L, 20L, 18L, 14L, 23L,
17L, 26L, 17L, 17L, 19L, 29L, 31L, 18L, 15L), sal = c(12.5,
25.5, 8.5, 15.5, 17.5, 27.5, 9.5, 31.5, 1.5, 34.5, 25.5,
21.5, 10.5, 8.5, 32.5, 19.5, 6.5, 5.5, 15.5, 28.5, 6.5, 3.5,
29.5, 13.5, 7.5, 16.5, 3.5, 28.5, 22.5, 5.5, 9.5, 12.5, 29.5,
24.5, 8.5, 32.5, 37.5, 3.5, 12.5, 19.5), do = c(9.66, 7.66,
1.66, 14.66, 15.66, 1.66, 14.66, 15.66, 0.66, 5.66, 10.66,
11.66, 4.66, 0.66, 13.66, 1.66, 13.66, 6.66, 6.66, 10.66,
9.66, 15.66, 9.66, 15.66, 4.66, 13.66, 1.66, 11.66, 6.66,
8.66, 12.66, 0.66, 6.66, 0.66, 9.66, 16.66, 1.66, 10.66,
15.66, 10.66), depth = c(120L, 161L, 52L, 52L, 43L, 105L,
165L, 23L, 79L, 136L, 41L, 59L, 65L, 118L, 122L, 69L, 137L,
88L, 152L, 105L, 108L, 79L, 96L, 80L, 22L, 110L, 157L, 118L,
126L, 93L, 156L, 64L, 74L, 24L, 111L, 113L, 157L, 78L, 121L,
130L)), class = "data.frame", row.names = c(NA, -40L))
The first part of your question is straightforward:
zeros.spl <- split(zeros, zeros$group)
zeros.cors <- sapply(zeros.spl, function(x) cor(x[, "num"], x[, 6:9]))
dimnames(zeros.cors)[[1]] <- colnames(zeros)[6:9]
zeros.cors
# Hardhead silverside Sailfin molly
# temp -0.3080334 0.36174046
# sal 0.1393580 0.47095129
# do 0.2544695 -0.06646818
# depth 0.1296208 0.08777425
t(zeros.cors)
# temp sal do depth
# Hardhead silverside -0.3080334 0.1393580 0.25446948 0.12962078
# Sailfin molly 0.3617405 0.4709513 -0.06646818 0.08777425
Use write.csv(zeros.cors, file="results.csv") or write.csv(t(zeros.cors), file="results.csv") depending on what you want the rows/cols to be.
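You may also want to choose the correlation method. The same split-and-apply idea can be wrapped in a small function that passes method on to cor().

Note the assumptions in this sketch:

- cor_by_group() is a hypothetical helper, not part of any package.
- The built-in iris data stands in for zeros, so the sketch runs on its own.

```r
# Hypothetical helper: correlation of y with each of vars, within each group
cor_by_group <- function(data, y, vars, group,
                         method = c("pearson", "kendall", "spearman")) {
  method <- match.arg(method)
  per_group <- lapply(split(data, data[[group]], drop = TRUE), function(d)
    vapply(vars, function(v) cor(d[[y]], d[[v]], method = method), numeric(1)))
  do.call(rbind, per_group)  # one row per group, one column per variable
}

cors <- cor_by_group(iris, y = "Sepal.Length",
                     vars = c("Sepal.Width", "Petal.Length", "Petal.Width"),
                     group = "Species", method = "spearman")
write.csv(cors, file = file.path(tempdir(), "cors_spearman.csv"))
```

For the question's data, the call would be cor_by_group(zeros, "num", c("temp", "sal", "do", "depth"), "group", method = "kendall"). Using vapply() and rbind() keeps the result a matrix even when only one variable is passed.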
The second question is not clear. Each group's mean or median of num is a single value, so it cannot be correlated with the environmental variables. You can, however, compute the group means and medians with aggregate:
aggregate(zeros[, 5:9], by=list(zeros$group), "mean")
# Group.1 num temp sal do depth
# 1 Hardhead silverside 1.45 25.95 15.35 8.51 105.20
# 2 Sailfin molly 2.45 25.00 18.90 9.06 90.25
aggregate(zeros[, 5:9], by=list(zeros$group), "median")
# Group.1 num temp sal do depth
# 1 Hardhead silverside 0 26 11.5 9.66 115.5
# 2 Sailfin molly 0 24 19.5 10.66 90.5
Data Sets
> dput(head(spdistuc,50))
structure(list(Lane = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), Vehicle.class = c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
speedmph = c(0, 3.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2,
30.6, 34, 37.4, 3.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2,
30.6, 34, 37.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2, 30.6,
34, 0, 3.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2, 30.6,
34, 37.4, 0, 3.4, 6.8, 10.2, 13.6, 17), cprob = c(0, 0.01,
0.04, 0.08, 0.14, 0.22, 0.32, 0.5, 0.73, 0.95, 0.99, 1, 0,
0, 0.03, 0.07, 0.16, 0.3, 0.51, 0.81, 0.99, 1, 1, 0, 0.03,
0.05, 0.1, 0.21, 0.49, 0.84, 1, 1, 0, 0, 0.01, 0.01, 0.06,
0.1, 0.17, 0.4, 0.76, 0.95, 1, 1, 0, 0, 0.01, 0.01, 0.02,
0.04)), .Names = c("Lane", "Vehicle.class", "speedmph", "cprob"
), row.names = c(6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L,
40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 64L, 65L, 66L, 67L,
68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 88L, 89L, 90L, 91L, 92L,
93L), class = "data.frame")
> dput(head(cspdistuv,50))
structure(list(lanem = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), cars.speedmph = c(18,
20, 22, 24, 26, 28, 30, 32, 34, 36, 10, 15, 20, 25, 30, 35, 5,
10, 15, 20, 25, 30, 35, 0, 5, 10, 15, 20, 25, 30, 35, 5, 10,
15, 20, 25, 30, 35), cars.prob = c(0, 0.13, 0.17, 0.2, 0.2, 0.27,
0.37, 0.8, 0.97, 1, 0, 0.03, 0.13, 0.4, 0.77, 1, 0, 0.03, 0.17,
0.27, 0.5, 0.8, 1, 0, 0.03, 0.1, 0.27, 0.53, 0.6, 0.83, 1, 0,
0.07, 0.17, 0.33, 0.53, 0.8, 1)), .Names = c("lanem", "cars.speedmph",
"cars.prob"), row.names = c(NA, 38L), class = "data.frame")
Problem
I plotted the spdistuc:
cu1 <- ggplot() + geom_point(data = spdistuc, mapping = aes(x=speedmph, y = cprob, color = 'observed')) + facet_wrap(~Lane) + theme_bw() + my.theme()
This gave me the following plot:
But when I added a second layer to the existing plot,
cu2 <- cu1 + geom_point(data = cspdistuv, mapping = aes(x = cars.speedmph, y = cars.prob, color = 'simulated-default')) + facet_wrap(~lanem)
I got the following plot:
Question
Why did the existing ("observed") points change? There is now more than one point for a single value on the x-axis. What am I doing wrong?
Expanding my comment into an answer:
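The problem is that you facet on "Lane" in the first dataset and on "lanem" in the second. Adding facet_wrap(~lanem) replaces the earlier facet_wrap(~Lane). Because spdistuc has no lanem column, ggplot2 then repeats all of its points (from every lane) in every panel.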
This can be fixed by making the column names the same.
names(cspdistuv)[names(cspdistuv) == "lanem"] <- "Lane"
When this change is made, you should not need to include facet_wrap in your cu2 definition. It will still be remembered from cu1's definition.
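The sketch below shows the mechanism with two invented six-row data frames that mimic spdistuc and cspdistuv. The names obs, sim, bad and good are illustrative. ggplot_build() exposes which panel each point lands in.

```r
# Toy data (invented) with the same column names as spdistuc / cspdistuv
obs <- data.frame(Lane = rep(1:2, each = 3), speedmph = rep(c(10, 20, 30), 2),
                  cprob = c(0.1, 0.5, 1, 0.2, 0.6, 1))
sim <- data.frame(lanem = rep(1:2, each = 3), cars.speedmph = rep(c(10, 20, 30), 2),
                  cars.prob = c(0.15, 0.55, 1, 0.1, 0.7, 1))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # As in the question: the second facet_wrap() replaces the first, and obs
  # has no 'lanem' column, so all of its points are repeated in every panel
  bad <- ggplot() +
    geom_point(data = obs, aes(x = speedmph, y = cprob, colour = "observed")) +
    facet_wrap(~Lane) +
    geom_point(data = sim, aes(x = cars.speedmph, y = cars.prob,
                               colour = "simulated-default")) +
    facet_wrap(~lanem)
  print(table(ggplot_build(bad)$data[[1]]$PANEL))   # 6 observed points per panel

  # Fix: give both data frames the same facetting column
  names(sim)[names(sim) == "lanem"] <- "Lane"
  good <- ggplot() +
    geom_point(data = obs, aes(x = speedmph, y = cprob, colour = "observed")) +
    geom_point(data = sim, aes(x = cars.speedmph, y = cars.prob,
                               colour = "simulated-default")) +
    facet_wrap(~Lane)
  print(table(ggplot_build(good)$data[[1]]$PANEL))  # 3 observed points per panel
}
```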
A similar question has been asked before (here), but I can't adapt the solution provided there to my specific problem. Every panel in the plot below should have both an x-axis and a y-axis.
The data:
dput(df_nSubj)
structure(list(nSubj = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 50L, 50L, 50L,
50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L,
50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L), family = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("AOV", "MLM1", "MLM2",
"MLM3"), class = "factor"), Spher = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L), .Label = c("met", "vio"), class = "factor"),
effSize = c(0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8, 0.2, 0.5, 0.8,
0.2, 0.5, 0.8), pow = c(0.12, 0.53, 0.84, 0.1, 0.4, 0.74,
0.13, 0.55, 0.84, 0.11, 0.44, 0.74, 0.12, 0.5, 0.82, 0.1,
0.43, 0.76, 0.12, 0.49, 0.81, 0.1, 0.43, 0.76, 0.2, 0.84,
0.99, 0.15, 0.72, 0.97, 0.21, 0.81, 0.98, 0.17, 0.69, 0.95,
0.2, 0.83, 0.99, 0.17, 0.75, 0.97, 0.19, 0.82, 0.99, 0.17,
0.75, 0.97, 0.32, 0.95, 1, 0.23, 0.87, 1, 0.32, 0.92, 1,
0.25, 0.83, 0.99, 0.3, 0.94, 1, 0.24, 0.89, 1, 0.3, 0.94,
1, 0.24, 0.89, 1, 0.41, 0.99, 1, 0.29, 0.96, 1, 0.4, 0.97,
1, 0.3, 0.92, 1, 0.38, 0.99, 1, 0.32, 0.97, 1, 0.37, 0.98,
1, 0.32, 0.97, 1, 0.5, 1, 1, 0.36, 0.98, 1, 0.47, 0.99, 1,
0.36, 0.96, 1, 0.47, 1, 1, 0.4, 0.99, 1, 0.46, 1, 1, 0.4,
0.99, 1)), class = "data.frame", .Names = c("nSubj", "family",
"Spher", "effSize", "pow"), row.names = c(NA, -120L))
plot:
library(ggplot2)
library(grid)
pl1 <- ggplot(data=df_nSubj,aes(x=nSubj,y=pow,group=family))+
geom_point(aes(shape=family))+geom_line()+
labs(x="Number of subjects",y="Power",shape="")+
scale_y_continuous(limits=c(0.2,1),breaks=c(0.2,0.4,0.6,0.8,1))+
guides(shape = guide_legend(ncol = 4))+
facet_grid(Spher~effSize)+
theme_bw()+
theme(legend.position = "top",
panel.spacing = unit(2, "lines"),
legend.key = element_blank(),
strip.text.x = element_blank(),
strip.text.y = element_blank(),
strip.background = element_blank(),
panel.border=element_blank(),
axis.line=element_line(),
axis.title.x = element_text(vjust=-0.5))
Thanks in advance
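Change facet_grid(Spher~effSize) to facet_wrap(Spher~effSize, scales = "free"). With free scales, facet_wrap() draws axes on every panel, at the cost of letting the axis ranges differ between panels.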
I'm late to this party, but I have two workarounds currently.
The first option is to add horizontal and/or vertical line geoms with intercepts set to -Inf, which requires turning off plot clipping. To the original plot, add the following lines:
geom_hline(aes(yintercept=-Inf)) +
geom_vline(aes(xintercept=-Inf)) +
coord_cartesian(clip="off")
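Below is a self-contained version of the -Inf workaround. It uses the built-in mtcars data as a stand-in for df_nSubj, purely for illustration.

The intercepts are passed as plain parameters rather than inside aes(). This gives one line per panel instead of one per data row.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    facet_grid(cyl ~ am) +
    theme_bw() +
    theme(panel.border = element_blank(),
          axis.line = element_line()) +
    # Lines at -Inf sit on the bottom and left edges of every panel;
    # clip = "off" keeps them from being half-clipped at the panel boundary
    geom_hline(yintercept = -Inf) +
    geom_vline(xintercept = -Inf) +
    coord_cartesian(clip = "off")
  print(p)
}
```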
Alternatively, use the lemon package and its facet_rep_grid() function. This repeats the axes and their ticks on every panel without repeating the tick labels.
library(lemon)
And change facet_grid(Spher~effSize) to facet_rep_grid(Spher~effSize)
The ggh4x package also provides this functionality through its facet_wrap2() function.
I have a data frame for which I'm computing a linear model and would like to include the correlation coefficient and its significance using geom_text.
structure(list(ppno = c(1L, 1L, 1L, 10L, 10L, 10L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L,
8L, 8L, 9L, 9L, 9L), light.color = structure(c(1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("B", "IR",
"IR+B"), class = "factor"), session = c(2L, 1L, 3L, 2L, 3L, 1L,
1L, 3L, 2L, 3L, 2L, 1L, 2L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 3L, 2L,
1L, 3L, 1L, 3L, 2L, 3L, 2L, 1L), time = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("pre",
"post"), class = "factor"), pre.pri.s = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), pre.pri.r = c(8L, 4L, 6L,
2L, 2L, 4L, 10L, 12L, 9L, 24L, 16L, 15L, 15L, 15L, 15L, 3L, 5L,
7L, 13L, 11L, 12L, 16L, 15L, 14L, 21L, 5L, 8L, 1L, 0L, 0L), pre.nwc = c(5L,
2L, 4L, 2L, 2L, 4L, 10L, 10L, 9L, 11L, 10L, 11L, 12L, 11L, 11L,
3L, 5L, 6L, 9L, 11L, 12L, 12L, 11L, 10L, 11L, 5L, 8L, 1L, 0L,
0L), pre.ppi = structure(c(3L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 2L,
3L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 2L, NA, 2L, 2L, 3L, 3L, 3L, 4L,
2L, 3L, 1L, 1L, 1L), .Label = c("1", "2", "3", "4", "NULL"), class = "factor"),
pre.pri.nwc = c(1.6, 2, 1.5, 1, 1, 1, 1, 1.2, 1, 2.18181818181818,
1.6, 1.36363636363636, 1.25, 1.36363636363636, 1.36363636363636,
1, 1, 1.16666666666667, 1.44444444444444, 1, 1, 1.33333333333333,
1.36363636363636, 1.4, 1.90909090909091, 1, 1, 1, NaN, NaN
), post.pri.s = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), post.pri.r = c(4L, 4L, 7L, 0L, 0L, 4L,
3L, 8L, 7L, 16L, 12L, 19L, 6L, 10L, 4L, 1L, 3L, 0L, 3L, 11L,
15L, 8L, 9L, 9L, 8L, 4L, 3L, 0L, 0L, 0L), post.nwc = c(4L,
3L, 4L, 0L, 0L, 3L, 3L, 8L, 7L, 10L, 9L, 15L, 5L, 9L, 4L,
1L, 3L, 0L, 3L, 8L, 13L, 8L, 9L, 9L, 8L, 4L, 3L, 0L, 0L,
0L), post.ppi = structure(c(2L, 2L, 3L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 2L, 5L, 1L, 1L, NA, 3L, 2L, 1L, 1L,
2L, 3L, 2L, 2L, 1L, 1L, 1L), .Label = c("1", "2", "3", "4",
"NULL"), class = "factor"), post.pri.nwc = c(1, 1.33333333333333,
1.75, NaN, NaN, 1.33333333333333, 1, 1, 1, 1.6, 1.33333333333333,
1.26666666666667, 1.2, 1.11111111111111, 1, 1, 1, NaN, 1,
1.375, 1.15384615384615, 1, 1, 1, 1, 1, 1, NaN, NaN, NaN),
delta.pri.r = c(4, 0.1, -1, 2, 2, 0.1, 7, 4, 2, 8, 4, -4,
9, 5, 11, 2, 2, 7, 10, 0.1, -3, 8, 6, 5, 13, 1, 5, 1, 0.1,
0.1), delta.nwc = c(1, -1, 0.1, 2, 2, 1, 7, 2, 2, 1, 1, -4,
7, 2, 7, 2, 2, 6, 6, 3, -1, 4, 2, 1, 3, 1, 5, 1, 0.1, 0.1
), delta.pri.nwc = c(-0.6, -0.666666666666667, 0.25, NaN,
NaN, 0.333333333333333, 0.1, -0.2, 0.1, -0.581818181818182,
-0.266666666666667, -0.0969696969696969, -0.05, -0.252525252525252,
-0.363636363636364, 0.1, 0.1, NaN, -0.444444444444444, 0.375,
0.153846153846154, -0.333333333333333, -0.363636363636364,
-0.4, -0.90909090909091, 0.1, 0.1, NaN, NaN, NaN), delta.vas = c(4.081632,
-43.877544, -8.163264, -2.040816, 0.510204, 9.183672, 8.163264,
8.163264, 11.224488, 0, -14.285712, -11.224488, 19.387752,
0, 26.530608, 2.040816, 10.20408, 11.224488, 42.346932, -10.20408,
-28.06122, 11.224488, 5.612244, 21.428568, 22.448976, 0,
23.469384, 0.510204, -1.020408, 0)), .Names = c("ppno", "light.color",
"session", "time", "pre.pri.s", "pre.pri.r", "pre.nwc", "pre.ppi",
"pre.pri.nwc", "post.pri.s", "post.pri.r", "post.nwc", "post.ppi",
"post.pri.nwc", "delta.pri.r", "delta.nwc", "delta.pri.nwc",
"delta.vas"), row.names = c(NA, -30L), class = "data.frame")
I'm using this code for the plot:
p <- ggplot(data=mpq.vas, mapping=aes(x=delta.vas, y=delta.pri.r,
colour=light.color)) +
geom_point() +
geom_smooth(aes(group=1), method="lm", size=1, colour="black")
#
# Clean up the basics.
pp <- p + geom_hline(yintercept=0, colour="grey60") +
geom_vline(xintercept=0, colour="grey60") +
scale_colour_manual(name="Treatment\ncolor", values=cols) +
scale_x_continuous(name=
expression(paste(Delta, " VAS pain [t(0) - t(60)]")))+
scale_y_continuous(name=expression(paste(Delta, "PRI(r) [pre - post]")))
#
# Add correlation info.
val <- cor.test(mpq.vas$delta.vas, mpq.vas$delta.pri.r)
When I then try to add the correlation coefficient somewhere in the text, I get an error about an unexpected symbol at the location of the Q in the label.
pp + geom_text(aes(x=20, y=-5, label=paste("italic(r) ==", 3, "Q", sep=" ")),
parse=TRUE, colour="black")
(Yes, I know a correlation of 3 is impossible; it's just an example.)
I would like to do:
pp + geom_text(aes(x=20, y=-5, label=paste("italic(r) ==", round(val$estimate, digits=2), "\np < 0.0001", sep=" ")), parse=TRUE, colour="black")
But this generates the same error, now at the \n. What am I doing wrong?
pp + geom_text(aes(x=20, y=-5,
label=paste("list(italic(r) ==", round(val$estimate, digits=2), ", p < 0.0001)")),
parse=TRUE, colour="black")
The key is that when parse = TRUE, the label is parsed as an R expression, so the text has to follow the syntax described in ?plotmath.
Internally, geom_text does roughly this:
expr <- parse(text=label)
and then draws the text using expr as the label. So the label has to be a valid expression. In your example,
paste("italic(r) ==", 3, "Q", sep=" ")
is not a valid expression, so
parse(text=paste("italic(r) ==", 3, "Q", sep=" "))
throws an error.
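You can check this outside ggplot2 by calling parse() yourself. The helper parses() below is a hypothetical convenience function, not part of any package.

```r
# Returns TRUE if the string parses as an R (and hence plotmath) expression
parses <- function(s) !inherits(try(parse(text = s), silent = TRUE), "try-error")

bad  <- paste("italic(r) ==", 3, "Q", sep = " ")       # "italic(r) == 3 Q"
good <- paste("list(italic(r) ==", 0.5, ", p < 0.0001)")
parses(bad)   # FALSE: "3 Q" is two tokens with no operator between them
parses(good)  # TRUE
```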
In plotmath, to concatenate several elements you use one of:
paste(x, y, z)
list(x, y, z)
So to simply concatenate, use
geom_text(foobar, label=paste("paste(italic(r) ==", 3, ", Q)"))
The outer paste() joins the pieces into a single string.
The inner paste is a plotmath function that is interpreted when that string is parsed.
In my example above, I used list (see ?plotmath) instead of paste. The statistic and the p-value should be separated by a comma, and that is what list() produces.
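Here is a sketch that builds the label from cor.test() and uses atop() in place of the unsupported "\n" to get two lines. It makes two choices for illustration:

- It uses mtcars instead of mpq.vas, so the example runs on its own.
- It uses annotate() rather than geom_text(aes(...)). geom_text(aes(...)) would draw the label once for every row of the data.

```r
ct <- cor.test(mtcars$wt, mtcars$mpg)

# Plotmath has no "\n": write the p-value as a relation (<) or equality (==)
p_txt <- if (ct$p.value < 1e-4) {
  "italic(p) < 0.0001"
} else {
  sprintf("italic(p) == %.3f", ct$p.value)
}
one_line <- sprintf("list(italic(r) == %.2f, %s)", ct$estimate, p_txt)
two_line <- sprintf("atop(italic(r) == %.2f, %s)", ct$estimate, p_txt)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, colour = "black") +
    # annotate() draws the label exactly once
    annotate("text", x = 4.5, y = 30, label = two_line, parse = TRUE)
  print(p)
}
```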