I have a dataset containing male and female data. I have the response variable metabolic rate, and several predictors (Behaviour, Temperature 1, Temperature 2, Activity, Sex, Body Size, and Body Mass).
First, I fit a GLMM with an identity link to the combined male and female data:
glmer(log(Metabolic_Rate)~ Temperature.1 + Behaviour * Temperature.2 + Sex + Activity + Body_Size + Body_Mass + (1|Week), data= AMRdata, family = Gamma(link = 'identity'))
And then run model simplification. The model works just fine and I have no error messages.
Then, I separate the data by sex and run the model on just female data:
females<-subset(AMRdata,Sex=="F")
glmer(log(Metabolic_Rate)~ Temperature.1 + Behaviour * Temperature.2 + Activity + Body_Size + Body_Mass + (1|Week), data = females, family = Gamma(link = 'identity'))
I am greeted by the error message:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev = compDev, :
(maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
Removing Temperature.2 makes the error go away, but I really need to keep it in the model.
Does anyone have any suggestions as to how to remove the error?
I'm using lme4 version 1.1.21, and the female model doesn't work if I don't use the log transformation, either. I have tried using a log link, but I get the same error message.
Some data:
structure(list(Metabolic_Rate = c(8.79514591, 16.71840387, 14.1932374,
10.90741585, 10.7436911, 14.97469781, 19.88267242, 12.43274774,
15.12038794, 11.84916117, 11.05467852, 19.53495917, 12.14440531,
12.09564168, 6.78392472, 10.51570692, 8.527792046, 8.731880804,
10.71404367), Behaviour = c(23L, 17L, 14L, 7L, 99L, 78L, 90L,
1L, 9L, 29L, 76L, 66L, 43L, 36L, 13L, 4L, 82L, 14L, 59L), Temperature.1 = c(21.9,
21.7, 18.52, 19.85, 20.45, 20.54, 21.7, 22, 21.32, 21.4, 21.44,
22.1, 22.22, 22.25, 20.43, 20.9, 21.63, 21.2, 21.52), Temperature.2 = c(17.5,
15.6, 12.5, 19.8, 16.6, 20.8, 21.4, 21.9, 21, 21.3, 20.5, 22,
22.1, 22.2, 20.6, 21.2, 21.9, 21.1, 21.5), Activity = c(39.54664352,
66.75914352, 40.85949074, 44.8505787, 37.20023148, 69.75388889,
72.43981481, 70.42199074, 20.71481481, 77.27662037, 62.21712963,
93.22673611, 82.39247685, 89.42141204, 35.35729167, 31.97777778,
74.65821759, 40.80590278, 54.3755787), Sex = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("F", "M"), class = "factor"), Body_Size = c(7.6,
5.8, 7.9, 7.6, 8, 7.5, 7.9, 7.6, 7.2, 7.8, 7.8, 7.4, 7.1, 8.4,
6, 7.9, 7.2, 7, 8.2), Body_Mass = c(0.747, 0.55, 0.76, 0.673,
0.691, 0.683, 0.689, 0.789, 0.6, 0.612, 0.637, 0.511, 0.582,
0.603, 0.408, 0.527, 0.666, 0.483, 0.602), Week = c(1L, 1L, 2L,
3L, 3L, 3L, 3L, 4L, 5L, 6L, 6L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 10L
)), class = "data.frame", row.names = c(NA, -19L))
There are 8 tests and numerous demographic variables. I want to omit the students who don't have complete test data, then compare the demographics of the remaining sample with the original dataset (using chi-square tests) to see whether there is selection bias.
I already tried na.omit, but only ended up with a new data frame containing just the 8 test variables.
dput(head(df))
structure(list(ï..leerlingnr2013 = 10048001:10048006, schoolnr = c(1004L,
1004L, 1004L, 1004L, 1004L, 1004L), toets_ws = c(78, 91, 75,
98, 79, 92), toets_dmt = c(103, 97, 112, 98, 71, 112), toets_bl = c(35,
57, 55, 63, 15, 46), toets_rw = c(109, 100, 115, 113, 92, 99),
citotaal = c(72L, 81L, 81L, 82L, 61L, 85L), citorekwisk = c(50L,
49L, 49L, 42L, 40L, 46L), citostudiev = c(31L, 36L, 35L,
34L, 31L, 34L), citowereld = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), gebmaand = c(6L,
6L, 3L, 6L, 7L, 1L), gebjaar = c(2001L, 2002L, 2002L, 2001L,
2001L, 2002L), geslacht = c(1L, 2L, 2L, 2L, 1L, 1L), oplei_vader = c(3L,
3L, 3L, NA, 2L, NA), oplei_moeder = c(1L, 2L, 1L, 3L, 2L,
2L), CoolSES = c(3L, 3L, 3L, 4L, 2L, 2L), zorgleerling = c(0L,
0L, 0L, 0L, 1L, 0L), welblk = c(3.71428571428571, 3.71428571428571,
4.28571428571429, 3.71428571428571, 3.71428571428571, 3.42857142857143
), welbmll = c(3.66666666666667, 3.66666666666667, 3.83333333333333,
2.83333333333333, 2.66666666666667, 4.16666666666667), zelfvertr = c(4.16666666666667,
2.16666666666667, 3.66666666666667, 4.16666666666667, 3,
3.66666666666667), taak = c(3.8, 3.8, 4.6, 4.6, 4.2, 3.4),
bekwaming = c(3.77777777777778, 3.44444444444444, 4.11111111111111,
4.66666666666667, 3, 3.33333333333333), extrinsiek = c(3,
2.66666666666667, 3.66666666666667, 3.44444444444444, 2.11111111111111,
3.33333333333333), prestatie = c(2.57142857142857, 3.85714285714286,
3.28571428571429, 1.57142857142857, 1.71428571428571, 2.28571428571429
), sociaal = c(3.57142857142857, 2.57142857142857, 3.42857142857143,
3.57142857142857, 3.28571428571429, 3.28571428571429)), row.names = c(NA,
6L), class = "data.frame")
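A minimal sketch of one way to keep every column while dropping only the rows with missing test scores, using base R's complete.cases() on the test columns alone. The toy data frame, the choice of which columns count as "tests", and the names scores, test_cols, has_all_tests and complete_df are all assumptions for illustration; on the real data, test_cols would list the 8 test variables.

```r
# Toy stand-in for the real data (values invented for illustration)
scores <- data.frame(
  id          = 1:5,
  toets_ws    = c(78, 91, NA, 98, 79),
  toets_dmt   = c(103, 97, 112, NA, 71),
  geslacht    = c(1L, 2L, 2L, NA, 1L),
  oplei_vader = c(3L, NA, 3L, 2L, 2L)
)

# Assumption: these are the test columns; replace with the 8 real ones
test_cols <- c("toets_ws", "toets_dmt")

# TRUE for rows with no NA in any test column; NAs elsewhere are ignored
has_all_tests <- complete.cases(scores[, test_cols])

# Subset rows only, so all demographic columns are retained
complete_df <- scores[has_all_tests, ]

# Flag inclusion on the full data and cross-tabulate against a demographic
scores$included <- has_all_tests
tab <- table(included = scores$included, geslacht = scores$geslacht)
# On the real data, chisq.test(tab) would be the next step
```

Calling na.omit() on a data frame that has already been reduced to the test columns would return only those columns, which may be what happened in the attempt described above; indexing the full data frame with the logical vector avoids that.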
My goal is to plot a geom_smooth (in the first instance) based on a linear model controlling (or centering) for covariates. I can easily produce a chart that plots the IV (x) and the DV (y), but I can’t seem to figure out how to adjust the DV for covariates.
Variables:
"Age" = as stated (covariate to be controlled for)
"Gender" = as stated (covariate to be controlled for)
"O" = Openness (non-mean-centred personality trait - see "Omc")
"E" = Extraversion (the non-mean-centred independent variable to be plotted)
"Professionals" = Occupation (the non-mean-centred moderator variable to be plotted)
"Omc" = Mean-centred Openness (covariate to be controlled for in the lm model)
"Emc" = Mean-centred Extraversion (independent variable in the lm model)
"Pmc" = Mean-centred Professionals (moderator variable in the lm model)
"OxP" = Openness x Professionals interaction term (controlled for in the lm model)
"ExP" = Extraversion x Professionals interaction term (of primary interest in the lm model and serves as the justification for the ggplot)
Here is the code I have for the model and the associated chart:
lm.js <- lm(Job_Very_Stressful ~ Age + Gender + Omc + Emc + Pmc + OxP + ExP, data = df.eg)
summary(lm.js)
ggplot(df.eg, aes(x = E, y = Job_Very_Stressful, col = Professionals)) + geom_smooth(method = "lm", alpha = 0.25) + labs(x = "Extraversion", y = "Job Stress") + theme(legend.title=element_blank(), legend.position = "top", text = element_text(size=13), axis.text = element_text(size=10), panel.grid.major = element_line(colour = "grey", size = 0.5, 3)) + coord_cartesian(ylim = c(1.00, 10.00), xlim = c(1.00, 7.00)) + scale_y_continuous(breaks = seq(1.00, 10.00, 0.50)) + scale_x_continuous(breaks = seq(1.00, 7.00, 1.00))
Sample data using dput(head(df.eg, 50)) (these data will not produce a significant lm result, but it doesn't matter for this purpose):
structure(list(Age = c(37L, 66L, 33L, 55L, 60L, 61L, 27L, 54L,
55L, 33L, 33L, 27L, 20L, 25L, 18L, 38L, 36L, 41L, 38L, 58L, 37L,
32L, 45L, 24L, 51L, 37L, 15L, 48L, 43L, 19L, 49L, 39L, 38L, 28L,
42L, 26L, 37L, 58L, 55L, 46L, 57L, 45L, 16L, 27L, 33L, 58L, 23L,
60L, 30L, 24L), Gender = c(1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L), O = c(6.16666666666667,
4.16666666666667, 5.16666666666667, 4.66666666666667, 6.5, 4.66666666666667,
3, 4.5, 5, 4.16666666666667, 3.5, 3.5, 4.66666666666667, 4.33333333333333,
4.83333333333333, 3.66666666666667, 6, 5.5, 2, 4.66666666666667,
3.16666666666667, 3.33333333333333, 3.66666666666667, 2.5, 4.33333333333333,
6.83333333333333, 5, 4.16666666666667, 4.66666666666667, 5.5,
4.33333333333333, 5.16666666666667, 3.5, 2.66666666666667, 5.33333333333333,
2.16666666666667, 4, 4.16666666666667, 4.16666666666667, 3.83333333333333,
2.83333333333333, 5.5, 3.33333333333333, 5.83333333333333, 4,
2.83333333333333, 5, 3.83333333333333, 4.83333333333333, 5.83333333333333
), E = c(5.33333333333333, 5.16666666666667, 5.83333333333333,
5.5, 5.33333333333333, 6.83333333333333, 4.5, 5, 6.83333333333333,
4.66666666666667, 3, 4.5, 5.33333333333333, 6.16666666666667,
5.16666666666667, 5.66666666666667, 6.5, 2, 3.16666666666667,
3.16666666666667, 2.83333333333333, 4, 3.66666666666667, 3.16666666666667,
4.16666666666667, 3.33333333333333, 6.5, 3.83333333333333, 4.33333333333333,
2.83333333333333, 4, 4, 3, 5.33333333333333, 3.83333333333333,
3.83333333333333, 4.33333333333333, 5.66666666666667, 4.33333333333333,
5.83333333333333, 3.83333333333333, 2.66666666666667, 4.16666666666667,
4.66666666666667, 4.5, 3.83333333333333, 6.5, 3.5, 4.33333333333333,
5.5), Professionals = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Job_Very_Stressful = c(1L, 1L,
5L, 7L, 1L, 3L, 5L, 5L, 4L, 6L, 4L, 1L, 2L, 2L, 2L, 4L, 2L, 1L,
1L, 2L, 5L, 5L, 2L, 6L, 5L, 5L, 2L, 5L, 3L, 1L, 2L, 2L, 7L, 1L,
2L, 3L, 5L, 1L, 3L, 3L, 5L, 6L, 3L, 4L, 4L, 3L, 3L, 1L, 3L, 5L
), Omc = c(1.76244927536232, -0.237550724637678, 0.762449275362322,
0.262449275362322, 2.09578260869565, 0.262449275362322, -1.40421739130435,
0.0957826086956519, 0.595782608695652, -0.237550724637678, -0.904217391304348,
-0.904217391304348, 0.262449275362322, -0.0708840579710177, 0.429115942028982,
-0.737550724637678, 1.59578260869565, 1.09578260869565, -2.40421739130435,
0.262449275362322, -1.23755072463768, -1.07088405797102, -0.737550724637678,
-1.90421739130435, -0.0708840579710177, 2.42911594202898, 0.595782608695652,
-0.237550724637678, 0.262449275362322, 1.09578260869565, -0.0708840579710177,
0.762449275362322, -0.904217391304348, -1.73755072463768, 0.929115942028982,
-2.23755072463768, -0.404217391304348, -0.237550724637678, -0.237550724637678,
-0.570884057971018, -1.57088405797102, 1.09578260869565, -1.07088405797102,
1.42911594202898, -0.404217391304348, -1.57088405797102, 0.595782608695652,
-0.570884057971018, 0.429115942028982, 1.42911594202898), Emc = c(0.843115942028983,
0.676449275362322, 1.34311594202898, 1.00978260869565, 0.843115942028983,
2.34311594202898, 0.00978260869565251, 0.509782608695653, 2.34311594202898,
0.176449275362322, -1.49021739130435, 0.00978260869565251, 0.843115942028983,
1.67644927536232, 0.676449275362322, 1.17644927536232, 2.00978260869565,
-2.49021739130435, -1.32355072463768, -1.32355072463768, -1.65688405797102,
-0.490217391304347, -0.823550724637677, -1.32355072463768, -0.323550724637678,
-1.15688405797102, 2.00978260869565, -0.656884057971018, -0.156884057971017,
-1.65688405797102, -0.490217391304347, -0.490217391304347, -1.49021739130435,
0.843115942028983, -0.656884057971018, -0.656884057971018, -0.156884057971017,
1.17644927536232, -0.156884057971017, 1.34311594202898, -0.656884057971018,
-1.82355072463768, -0.323550724637678, 0.176449275362322, 0.00978260869565251,
-0.656884057971018, 2.00978260869565, -0.990217391304347, -0.156884057971017,
1.00978260869565), Pmc = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5,
-0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5,
-0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5), OxP = c(0.881224637681161,
-0.118775362318839, 0.381224637681161, 0.131224637681161, 1.04789130434783,
0.131224637681161, -0.702108695652174, 0.047891304347826, 0.297891304347826,
-0.118775362318839, -0.452108695652174, -0.452108695652174, 0.131224637681161,
-0.0354420289855089, 0.214557971014491, -0.368775362318839, 0.797891304347826,
0.547891304347826, -1.20210869565217, 0.131224637681161, -0.618775362318839,
-0.535442028985509, -0.368775362318839, -0.952108695652174, -0.0354420289855089,
1.21455797101449, -0.297891304347826, 0.118775362318839, -0.131224637681161,
-0.547891304347826, 0.0354420289855089, -0.381224637681161, 0.452108695652174,
0.868775362318839, -0.464557971014491, 1.11877536231884, 0.202108695652174,
0.118775362318839, 0.118775362318839, 0.285442028985509, 0.785442028985509,
-0.547891304347826, 0.535442028985509, -0.714557971014491, 0.202108695652174,
0.785442028985509, -0.297891304347826, 0.285442028985509, -0.214557971014491,
-0.714557971014491), ExP = c(0.421557971014491, 0.338224637681161,
0.671557971014491, 0.504891304347826, 0.421557971014491, 1.17155797101449,
0.00489130434782625, 0.254891304347826, 1.17155797101449, 0.0882246376811611,
-0.745108695652174, 0.00489130434782625, 0.421557971014491, 0.838224637681161,
0.338224637681161, 0.588224637681161, 1.00489130434783, -1.24510869565217,
-0.661775362318839, -0.661775362318839, -0.828442028985509, -0.245108695652174,
-0.411775362318839, -0.661775362318839, -0.161775362318839, -0.578442028985509,
-1.00489130434783, 0.328442028985509, 0.0784420289855086, 0.828442028985509,
0.245108695652174, 0.245108695652174, 0.745108695652174, -0.421557971014491,
0.328442028985509, 0.328442028985509, 0.0784420289855086, -0.588224637681161,
0.0784420289855086, -0.671557971014491, 0.328442028985509, 0.911775362318839,
0.161775362318839, -0.0882246376811611, -0.00489130434782625,
0.328442028985509, -1.00489130434783, 0.495108695652174, 0.0784420289855086,
-0.504891304347826)), .Names = c("Age", "Gender", "O", "E", "Professionals",
"Job_Very_Stressful", "Omc", "Emc", "Pmc", "OxP", "ExP"), row.names = 2275:2324, class = "data.frame")
The jtools package has recently been released and has an option to center plots for the mean of covariates.
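Independently of jtools, a covariate-adjusted line can be sketched with predict(): fit the model, build a grid that varies only the predictor and the moderator, hold the covariates at their sample means, and plot the predictions instead of the raw DV. The block below uses simulated data (toy) rather than df.eg, and writes the model with the raw variables and `*` so that predict() constructs the interaction terms itself; this should give the same fitted values as the mean-centred version, but treat that as an assumption to check against your own model.

```r
# Simulated stand-in for df.eg (not the poster's data)
set.seed(1)
n <- 200
toy <- data.frame(
  Age           = round(runif(n, 18, 65)),
  Gender        = sample(1:2, n, replace = TRUE),
  O             = runif(n, 1, 7),
  E             = runif(n, 1, 7),
  Professionals = rep(0:1, each = n / 2)
)
toy$Job_Very_Stressful <- 3 + 0.02 * toy$Age +
  0.3 * toy$E * toy$Professionals + rnorm(n)

# Same terms as the question's model, expressed with raw variables
fit <- lm(Job_Very_Stressful ~ Age + Gender + O * Professionals +
            E * Professionals, data = toy)

# Vary E and Professionals; hold the covariates at their sample means
grid <- expand.grid(E = seq(1, 7, by = 0.5), Professionals = 0:1)
grid$Age    <- mean(toy$Age)
grid$Gender <- mean(toy$Gender)
grid$O      <- mean(toy$O)

# Adjusted predictions with confidence bounds (columns fit, lwr, upr)
grid <- cbind(grid, predict(fit, newdata = grid, interval = "confidence"))

# Plot the adjusted lines if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(grid, aes(x = E, y = fit,
                        colour = factor(Professionals),
                        fill = factor(Professionals))) +
    geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.25, colour = NA) +
    geom_line() +
    labs(x = "Extraversion", y = "Job Stress (adjusted)")
}
```

The difference from geom_smooth(method = "lm") is that geom_smooth refits a simple regression of y on x within each group, so it cannot know about Age, Gender or Openness; plotting predictions from the full model is what brings the covariates in.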
Data Sets
> dput(head(spdistbc,50))
structure(list(Lane = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), Vehicle.class = c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
speedmph = c(0, 3.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2,
30.6, 34, 37.4, 40.8, 0, 3.4, 6.8, 10.2, 13.6, 17, 20.4,
23.8, 27.2, 30.6, 34, 37.4, 40.8, 3.4, 6.8, 10.2, 13.6, 17,
20.4, 23.8, 27.2, 30.6, 34, 37.4, 40.8, 0, 3.4, 6.8, 10.2,
13.6, 17, 20.4, 23.8, 27.2, 30.6, 34, 37.4), cprob = c(0,
0, 0.03, 0.06, 0.11, 0.2, 0.28, 0.43, 0.56, 0.75, 0.91, 0.97,
1, 0, 0, 0.01, 0.01, 0.02, 0.05, 0.17, 0.36, 0.57, 0.76,
0.93, 0.99, 1, 0, 0.01, 0.01, 0.04, 0.07, 0.16, 0.32, 0.55,
0.76, 0.94, 0.99, 1, 0, 0, 0, 0.01, 0.03, 0.06, 0.11, 0.25,
0.47, 0.74, 0.92, 0.98)), .Names = c("Lane", "Vehicle.class",
"speedmph", "cprob"), row.names = c(7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 26L, 27L, 28L, 29L, 30L, 31L,
32L, 33L, 34L, 35L, 36L, 37L, 38L, 42L, 43L, 44L, 45L, 46L, 47L,
48L, 49L, 50L, 51L, 52L, 53L, 66L, 67L, 68L, 69L, 70L, 71L, 72L,
73L, 74L, 75L, 76L, 77L), class = "data.frame")
> dput(head(cspdistbv,50))
structure(list(lanem = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L), cars = structure(c(34, 35, 36, 37, 38, 39, 40, 24,
26, 28, 30, 32, 34, 36, 38, 40, 20, 25, 30, 35, 40, 10, 15, 20,
25, 30, 35, 40, 10, 15, 20, 25, 30, 35, 40, 35, 40, 45, 50, 55,
0, 0.03, 0.07, 0.17, 0.67, 0.93, 1, 0, 0.03, 0.1, 0.1, 0.2, 0.27,
0.33, 0.8, 1, 0, 0.1, 0.31, 0.52, 1, 0, 0.07, 0.27, 0.37, 0.5,
0.77, 1, 0, 0.03, 0.07, 0.23, 0.4, 0.77, 1, 0, 0.13, 0.47, 0.77,
1), .Dim = c(40L, 2L), .Dimnames = list(NULL, c("speedmph", "prob"
)))), .Names = c("lanem", "cars"), row.names = c(NA, 40L), class = "data.frame")
Problem
I created the plot using spdistbc:
cb1 <- ggplot() + geom_point(data = spdistbc, mapping = aes(x=speedmph, y = cprob, color = 'observed')) + facet_wrap(~Lane) + theme_bw() + my.theme()
Which gave me this:
But when I add points from the second data frame using the following code:
cb2 <- cb1 + geom_point(data = cspdistbv, mapping = aes(x = cars.speedmph, y = cars.prob, color = 'simulated-default')) + facet_wrap(~lanem)
I get the error:
Error in eval(expr, envir, enclos) : object 'cars.speedmph' not found
Question
As you can see, the cspdistbv data frame appears to have a column named cars.speedmph, so why can't R find it? Please help.
Somehow you've created an unusual data.frame: the second column of cspdistbv holds a whole matrix. As a result, dim(cspdistbv) reports only two columns (lanem and cars), and there is no column actually named cars.speedmph for ggplot to find. I'm not sure how you created it, but you can fix it with
cspdistbv <- cbind.data.frame(lanem=cspdistbv[,1], cspdistbv[,2])
And then
cb1 <- ggplot() + geom_point(data = spdistbc, mapping = aes(x=speedmph,
y = cprob, color = 'observed')) + facet_wrap(~Lane) + theme_bw()
cb2 <- cb1 + geom_point(data = cspdistbv, mapping = aes(x = speedmph,
y = prob, color = 'simulated-default')) + facet_wrap(~lanem)
should work.
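To see the matrix-column situation in isolation, here is a small sketch in base R. The data frame df_bad and its values are made up for illustration; only the structure mirrors cspdistbv.

```r
# Build a data frame whose second column is a two-column matrix
df_bad <- data.frame(lanem = c(6L, 6L, 7L))
df_bad$cars <- cbind(speedmph = c(34, 35, 24), prob = c(0, 0.03, 0))

# Only two columns exist at the data frame level
names(df_bad)  # "lanem" "cars"
ncol(df_bad)   # 2

# Printing shows "cars.speedmph" and "cars.prob", but those are display
# labels only; no column of the data frame carries either name
has_col <- "cars.speedmph" %in% names(df_bad)  # FALSE

# Same fix as above: bind the matrix's columns in as real columns
df_fixed <- cbind.data.frame(lanem = df_bad[, 1], df_bad[, 2])
names(df_fixed)  # "lanem" "speedmph" "prob"
```

After the fix, the columns are called speedmph and prob (without the cars. prefix), which is why the corrected plotting code maps x = speedmph and y = prob.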
Data Sets
> dput(head(spdistuc,50))
structure(list(Lane = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), Vehicle.class = c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
speedmph = c(0, 3.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2,
30.6, 34, 37.4, 3.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2,
30.6, 34, 37.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2, 30.6,
34, 0, 3.4, 6.8, 10.2, 13.6, 17, 20.4, 23.8, 27.2, 30.6,
34, 37.4, 0, 3.4, 6.8, 10.2, 13.6, 17), cprob = c(0, 0.01,
0.04, 0.08, 0.14, 0.22, 0.32, 0.5, 0.73, 0.95, 0.99, 1, 0,
0, 0.03, 0.07, 0.16, 0.3, 0.51, 0.81, 0.99, 1, 1, 0, 0.03,
0.05, 0.1, 0.21, 0.49, 0.84, 1, 1, 0, 0, 0.01, 0.01, 0.06,
0.1, 0.17, 0.4, 0.76, 0.95, 1, 1, 0, 0, 0.01, 0.01, 0.02,
0.04)), .Names = c("Lane", "Vehicle.class", "speedmph", "cprob"
), row.names = c(6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L,
40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 64L, 65L, 66L, 67L,
68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 88L, 89L, 90L, 91L, 92L,
93L), class = "data.frame")
> dput(head(cspdistuv,50))
structure(list(lanem = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), cars.speedmph = c(18,
20, 22, 24, 26, 28, 30, 32, 34, 36, 10, 15, 20, 25, 30, 35, 5,
10, 15, 20, 25, 30, 35, 0, 5, 10, 15, 20, 25, 30, 35, 5, 10,
15, 20, 25, 30, 35), cars.prob = c(0, 0.13, 0.17, 0.2, 0.2, 0.27,
0.37, 0.8, 0.97, 1, 0, 0.03, 0.13, 0.4, 0.77, 1, 0, 0.03, 0.17,
0.27, 0.5, 0.8, 1, 0, 0.03, 0.1, 0.27, 0.53, 0.6, 0.83, 1, 0,
0.07, 0.17, 0.33, 0.53, 0.8, 1)), .Names = c("lanem", "cars.speedmph",
"cars.prob"), row.names = c(NA, 38L), class = "data.frame")
Problem
I plotted the spdistuc:
cu1 <- ggplot() + geom_point(data = spdistuc, mapping = aes(x=speedmph, y = cprob, color = 'observed')) + facet_wrap(~Lane) + theme_bw() + my.theme()
This gave me the following:
But when I added another layer to the existing plot,
cu2 <- cu1 + geom_point(data = cspdistuv, mapping = aes(x = cars.speedmph, y = cars.prob, color = 'simulated-default')) + facet_wrap(~lanem)
I got the following:
Question
Why did the existing plot ("observed") change? You can see more than one point for a single value on the x-axis. What am I doing wrong?
Expanding my comment into an answer:
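The problem is that the faceting column is called "Lane" in the first dataset and "lanem" in the second, so the two layers cannot be split into the same panels.
This can be fixed by giving the column the same name in both data frames.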
names(cspdistuv)[names(cspdistuv) == "lanem"] <- "Lane"
When this change is made, you should not need to include facet_wrap in your cu2 definition. It will still be remembered from cu1's definition.