I want to add regression lines to my graph, but geom_smooth only adds lines to the figure legend.
This is the code I've tried (geom_smooth gives exactly the same result); I've also tried various tweaks of my own. DGRP.Line and Diet are factors, while Weighted.average is numeric.
RENA <- read.csv('RENA.csv')
RENAVG <- aggregate(Weighted.average~Diet+DGRP.Line, data = RENA, FUN = sum)
ggplot(RENAVG, aes(x=DGRP.Line, y=Weighted.average, colour=Diet))+
geom_point()+
stat_smooth(method='lm')
I'm not sure whether the failure to fit regression lines is a consequence of DGRP.Line being a factor, but I'd expect geom_smooth to fit regression lines to the aggregated data (RENAVG) regardless.
In another attempt, using the original RENA.csv data, I get the error "mapping must be created by aes()", but I'm not sure whether that's relevant to the RENAVG data frame I created in R.
My graph is included below. As you can see, the legend shows lines, but no regression lines appear over the data.
Edit:
I tried saving my original ggplot (without the smoother) as its own object, RENAVGPLOT, and then adding geom_smooth to it, which resulted in: Error: Don't know how to add RENAVGPLOT to a plot
Output of dput(RENAVG):
structure(list(Diet = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Control",
"Rena"), class = "factor"), DGRP.Line = structure(c(1L, 1L, 3L,
3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L,
11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L,
17L, 18L, 18L, 19L, 19L, 20L, 20L), .Label = c("105a", "105b",
"348", "354", "362a", "362b", "391a", "391b", "392", "397", "405",
"486a", "486b", "712", "721", "737", "757a", "757b", "853", "879"
), class = "factor"), Weighted.average = c(3.618181818, 7.516666667,
7.5, 10.464285714, 5.830882353, 7.0625, 6.411392405, 7.413953488,
6.079053054, 7.0375, 6.373640273, 10.406521739, 6.948020792,
9.851458886, 9.176727909, 10.164712153, 6.23826291, 11.023310023,
7.908730159, 9.537815126, 5.314323607, 6.655822854, 5.669226044,
7.818181818, 4.761481935, 9.468873129, 6.577764637, 12.170588235,
5.742087177, 10.529411765, 8.891608391, 2, 11.036572623, 3, 9.739878543,
9.782051282, 7.741384687, 8.739583333)), row.names = c(NA, -38L
), class = "data.frame")
Example of what I'd like, using the built-in mtcars data:
ggplot(mtcars, aes(x=mpg, y=wt, colour=cyl)) +
geom_point()+
geom_smooth()
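A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: by default ggplot2 groups the data by the interaction of all discrete variables in the plot, so with a factor on the x axis every DGRP.Line by Diet combination becomes its own group containing a single point, and the smoother has nothing to fit. Setting group = Diet explicitly gives one group, and so one line, per diet. The slope is then fitted over the factor's positions (1, 2, 3, ... in level order), which is only meaningful if that ordering means something for your lines.

```r
library(ggplot2)

# Data from dput(RENAVG) in the question
RENAVG <- structure(list(Diet = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Control",
"Rena"), class = "factor"), DGRP.Line = structure(c(1L, 1L, 3L,
3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L,
11L, 11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L,
17L, 18L, 18L, 19L, 19L, 20L, 20L), .Label = c("105a", "105b",
"348", "354", "362a", "362b", "391a", "391b", "392", "397", "405",
"486a", "486b", "712", "721", "737", "757a", "757b", "853", "879"
), class = "factor"), Weighted.average = c(3.618181818, 7.516666667,
7.5, 10.464285714, 5.830882353, 7.0625, 6.411392405, 7.413953488,
6.079053054, 7.0375, 6.373640273, 10.406521739, 6.948020792,
9.851458886, 9.176727909, 10.164712153, 6.23826291, 11.023310023,
7.908730159, 9.537815126, 5.314323607, 6.655822854, 5.669226044,
7.818181818, 4.761481935, 9.468873129, 6.577764637, 12.170588235,
5.742087177, 10.529411765, 8.891608391, 2, 11.036572623, 3, 9.739878543,
9.782051282, 7.741384687, 8.739583333)), row.names = c(NA, -38L
), class = "data.frame")

# group = Diet overrides the default grouping (the interaction of all
# discrete aesthetics), which would otherwise put each point in its own group
p <- ggplot(RENAVG, aes(x = DGRP.Line, y = Weighted.average,
                        colour = Diet, group = Diet)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p  # draw the plot
```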
I am trying to use nls in R to evaluate whether different populations reach different asymptotes. I have two data frames; df1 has only one population (represented by Site):
df1<- structure(list(Site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("ALT01",
"ALT02", "ALT03", "Cotton", "Deep", "Eckhardt", "Green", "Johnson",
"Kissinger", "Marsh", "Sand", "Shypoke", "Sora", "Spike", "Tamora",
"WRP01", "WRP05", "WRP08", "WRP10", "WRP11", "WRP12", "WRP14",
"WRP15", "WRP18"), class = "factor"), Nets = 1:18, Cumulative.spp = c(12L,
13L, 15L, 17L, 17L, 17L, 17L, 19L, 19L, 19L, 19L, 20L, 22L, 22L,
22L, 22L, 22L, 22L)), .Names = c("Site", "Nets", "Cumulative.spp"
), row.names = c(NA, 18L), class = "data.frame")
and df2 has two populations (again represented by Site):
df2 <- structure(list(Site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("ALT01",
"ALT02", "ALT03", "Cotton", "Deep", "Eckhardt", "Green", "Johnson",
"Kissinger", "Marsh", "Sand", "Shypoke", "Sora", "Spike", "Tamora",
"WRP01", "WRP05", "WRP08", "WRP10", "WRP11", "WRP12", "WRP14",
"WRP15", "WRP18"), class = "factor"), Nets = c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L), Cumulative.spp = c(12L, 13L, 15L, 17L, 17L,
17L, 17L, 19L, 19L, 19L, 19L, 20L, 22L, 22L, 22L, 22L, 22L, 22L,
7L, 10L, 11L, 12L, 13L, 14L, 14L, 14L, 15L, 15L, 16L, 16L, 16L,
16L, 16L, 17L, 17L, 17L)), .Names = c("Site", "Nets", "Cumulative.spp"
), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 25L, 26L, 27L, 28L, 29L, 30L,
31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L), class = "data.frame")
When I model for one population everything looks great:
Model1<-nls(Cumulative.spp ~ SSasympOff(Nets, A, lrc, c0), data = df1)
What I am trying to do is fit several populations in the same model by adding a Site variable. I have tried this:
Model2<-nls(Cumulative.spp ~ SSasympOff(Nets, A, lrc, c0) + Site , data = df2)
and this:
Model2<-nls(Cumulative.spp ~ SSasympOff(Nets + Site , A, lrc, c0), data = df2)
But no luck so far. Any help would be appreciated.
We assume that you want to have different Asym parameters for the two populations but common lrc and c0 parameters.
First, in (1), we show how to modify the code in the question to get the answer. Most of the code in (1) is just for getting starting values; the actual fit is only one line of code (two if you count defining the formula on a separate line).
Then in (2) we show how to simplify (1) by using algorithm "plinear" eliminating the need to get starting values for the linear parameters. In (2a) we show a further simplification which extends more readily to more sites and in (2b) we simplify that further under the condition that all sites are present (which is not the case in the question but may be the case in the real data).
1) Default algorithm. We can get starting values for nls by fitting each population separately (fm1, fm2) and together (fm3). Finally, we fit the model with different Asym parameters (fm4).
# get starting values
fo <- Cumulative.spp ~ SSasympOff(Nets, A, lrc, c0)
fm1 <- nls(fo, df2, subset = Site == "ALT01")
fm2 <- nls(fo, df2, subset = Site == "ALT03")
fm3 <- nls(fo, df2)
st <- c(A1 = coef(fm1)[["A"]], A2 = coef(fm2)[["A"]], coef(fm3)[c("lrc", "c0")])
# fit
fo4 <- Cumulative.spp ~ SSasympOff(Nets, A1*(Site=="ALT01")+A2*(Site=="ALT03"), lrc, c0)
fm4 <- nls(fo4, data = df2, start = st)
plot(Cumulative.spp ~ Nets, df2, col = Site)
points(fitted(fm4) ~ Nets, df2, col = "red", pch = 20)
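Since the goal is to test whether the populations reach different asymptotes, one option (not part of the answer above) is an F-test comparing fm3, which has one shared asymptote, with fm4, which has one per site. fm3 is the special case A1 = A2 of fm4, so anova() can compare the two nls fits. This is a sketch: df2 is rebuilt compactly from the values in the dput() output above (only the two sites present), and the F-test assumes independent errors, which is questionable for cumulative counts.

```r
# df2 rebuilt from the dput() values above (unused Site levels dropped)
df2 <- data.frame(
  Site = factor(rep(c("ALT01", "ALT03"), each = 18)),
  Nets = rep(1:18, times = 2),
  Cumulative.spp = c(12, 13, 15, 17, 17, 17, 17, 19, 19, 19, 19, 20, 22, 22, 22, 22, 22, 22,
                     7, 10, 11, 12, 13, 14, 14, 14, 15, 15, 16, 16, 16, 16, 16, 17, 17, 17)
)

# Starting values and fits as in (1)
fo  <- Cumulative.spp ~ SSasympOff(Nets, A, lrc, c0)
fm1 <- nls(fo, df2, subset = Site == "ALT01")
fm2 <- nls(fo, df2, subset = Site == "ALT03")
fm3 <- nls(fo, df2)  # one asymptote shared by both sites
st  <- c(A1 = coef(fm1)[["A"]], A2 = coef(fm2)[["A"]], coef(fm3)[c("lrc", "c0")])
fo4 <- Cumulative.spp ~ SSasympOff(Nets, A1*(Site=="ALT01") + A2*(Site=="ALT03"), lrc, c0)
fm4 <- nls(fo4, data = df2, start = st)  # separate asymptote per site

# F-test of the restricted model (A1 == A2) against the full model
cmp <- anova(fm3, fm4)
cmp
```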
2) plinear. Asym is special because the model is linear in it, and we can use this to simplify the above: with algorithm = "plinear" we don't need starting values for the linear parameters. This eliminates the need to run fm1 and fm2; we only need fm3 to generate starting values. Note that "plinear" requires the RHS of the formula to be a matrix in which each column is multiplied by the coefficient of one linear parameter. Here we have two linear parameters (the Asym for each Site), so it is a two-column matrix.
# get starting values
fo <- Cumulative.spp ~ SSasympOff(Nets, A, lrc, c0)
fm3 <- nls(fo, df2)
st5 <- coef(fm3)[c("lrc", "c0")]
# fit
mm <- with(df2, cbind(Site=="ALT01", Site=="ALT03"))
fo5 <- Cumulative.spp ~ mm * SSasympOff(Nets,1,lrc,c0)
fm5 <- nls(fo5, data = df2, start = st5, algorithm = "plinear")
2a) Alternatively, mm could be written like this, which has the advantage that it extends to more sites:
mm <- model.matrix(~ Site - 1, transform(df2, Site = droplevels(Site)))
2b) If all levels of the Site factor are present in the data, we can simplify even further: droplevels(Site), which drops the unused levels, can then be replaced by plain Site, allowing us to write:
mm <- model.matrix(~ Site - 1, df2)
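The point of 2a, that the model.matrix() form extends to more sites, can be made concrete by wrapping the plinear fit in a function. This is a sketch; fit_site_asymptotes is a hypothetical helper name, and df2 is rebuilt compactly from the values in the dput() output above.

```r
df2 <- data.frame(
  Site = factor(rep(c("ALT01", "ALT03"), each = 18)),
  Nets = rep(1:18, times = 2),
  Cumulative.spp = c(12, 13, 15, 17, 17, 17, 17, 19, 19, 19, 19, 20, 22, 22, 22, 22, 22, 22,
                     7, 10, 11, 12, 13, 14, 14, 14, 15, 15, 16, 16, 16, 16, 16, 17, 17, 17)
)

# Hypothetical helper: one asymptote per Site, shared lrc and c0,
# fitted with algorithm = "plinear" as in (2a)
fit_site_asymptotes <- function(data) {
  data$Site <- droplevels(data$Site)
  # starting values for the nonlinear parameters from a pooled fit (like fm3)
  pooled <- nls(Cumulative.spp ~ SSasympOff(Nets, A, lrc, c0), data = data)
  st <- coef(pooled)[c("lrc", "c0")]
  # one indicator column per site; plinear estimates one coefficient per column
  mm <- model.matrix(~ Site - 1, data)
  nls(Cumulative.spp ~ mm * SSasympOff(Nets, 1, lrc, c0),
      data = data, start = st, algorithm = "plinear")
}

fit <- fit_site_asymptotes(df2)
coef(fit)  # includes lrc, c0 and one linear coefficient (asymptote) per site
```

Because the number of columns of mm follows the number of Site levels, the same function should handle data with more sites without changes.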
Update: Some fixes and improvements.
How can the repeated-measures analysis with replicated measurements described on this page ( https://stats.stackexchange.com/questions/115135/repeated-measures-anova-with-replicated-measurements ) be done in R? I can perform an ANOVA using aov(), but I have some doubts about the Error term.
The data is as follows:
mydf = structure(list(User = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
Mode = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Trial1Time = c(20L,
5L, 40L, 10L, 15L, 30L, 13L, 11L, 35L), Trial2Time = c(30L,
7L, 25L, 20L, 17L, 35L, 26L, 11L, 38L)), .Names = c("User",
"Mode", "Trial1Time", "Trial2Time"), class = "data.frame", row.names = c(NA,
-9L))
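One common way to set this up in R, offered as a sketch rather than the analysis from the linked page: reshape the data to long format (one row per trial) and put User/Mode in the Error() term, so that Mode is tested against the User-by-Mode variation and the two trials act as replicates within each cell. The long data frame and its Trial column are introduced here for illustration; Trial is kept but not modelled.

```r
# Data from the question: 3 users x 3 modes, two timed trials each
mydf <- structure(list(User = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
    Mode = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Trial1Time = c(20L,
    5L, 40L, 10L, 15L, 30L, 13L, 11L, 35L), Trial2Time = c(30L,
    7L, 25L, 20L, 17L, 35L, 26L, 11L, 38L)), .Names = c("User",
    "Mode", "Trial1Time", "Trial2Time"), class = "data.frame", row.names = c(NA,
    -9L))

# Long format: one row per User x Mode x Trial
long <- data.frame(
  User  = factor(rep(mydf$User, times = 2)),
  Mode  = factor(rep(mydf$Mode, times = 2)),
  Trial = factor(rep(1:2, each = nrow(mydf))),
  Time  = c(mydf$Trial1Time, mydf$Trial2Time)
)

# Mode varies within users; the trials are replicates within each
# User x Mode cell, so Mode is tested in the User:Mode error stratum
fit <- aov(Time ~ Mode + Error(User/Mode), data = long)
s <- summary(fit)
s
```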
I have the following summary data frame, created with dplyr:
structure(list(maxrep = c(7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 12L, 12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L,
18L, 18L, 19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L, 23L, 23L, 24L,
24L, 26L, 26L), div = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Premier Division",
"Second Division"), class = "factor"), freq = c(1L, 10L, 4L,
39L, 26L, 89L, 73L, 146L, 107L, 162L, 117L, 133L, 121L, 125L,
116L, 91L, 110L, 65L, 95L, 43L, 75L, 38L, 43L, 24L, 38L, 16L,
36L, 5L, 15L, 2L, 9L, 7L, 9L, 1L, 3L, 3L, 2L, 1L)), .Names = c("maxrep",
"div", "freq"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -38L))
My intention is to use ggplot2 to plot a line graph with two lines in different colours, with a text label for each value.
What I did was:
ggplot(df, aes(x=maxrep, y=freq, colour=div)) +
geom_line() +
geom_text(aes(label=freq), vjust=-.5)
The result was
Now my question: all the labels in the chart are above the points on their respective lines. I want the labels for the two colours to have different relative positions, e.g. labels for the cyan line above it and labels for the red line below it (i.e. a variable vjust). Is there a way to do that?
Also, is there a way to get rid of the letter a in the colour legend on the right?
What about plotting the labels separately with different vjust values? You can get rid of the a in the legend by setting show.legend = FALSE (show_guide = FALSE in ggplot2 versions before 2.0.0).
ggplot(df, aes(x=maxrep, y=freq, colour=div, label = freq)) +
  geom_line() +
  # labels below the Second Division line
  geom_text(data = df[df$div == "Second Division",], vjust=2, show.legend = FALSE) +
  # labels above the Premier Division line
  geom_text(data = df[df$div == "Premier Division",], vjust=-2, show.legend = FALSE)
Which returns:
Create a new variable in the data.frame holding the vjust adjustment parameter:
# -2 (label above the point) for Premier Division, 2 (below) for Second Division
df$pos <- c(2, -2)[(df$div == "Premier Division")+1]
Then map vjust inside aes() to the new pos column:
ggplot(df, aes(x=maxrep, y=freq, colour=div)) +
geom_line() +
geom_text(aes(label=freq, vjust=pos))
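Both answers can be combined into one self-contained sketch: vjust mapped per row from a pos column, plus show.legend = FALSE to drop the "a" from the legend. Here df is rebuilt as a plain data frame from the dput() values in the question (the grouped_df class is not needed for plotting), and ifelse() is used as an equivalent way to build pos.

```r
library(ggplot2)

# df rebuilt from the dput() output in the question (same values)
df <- data.frame(
  maxrep = rep(c(7:24, 26), each = 2),
  div    = factor(rep(c("Premier Division", "Second Division"), times = 19)),
  freq   = c(1, 10, 4, 39, 26, 89, 73, 146, 107, 162, 117, 133, 121, 125,
             116, 91, 110, 65, 95, 43, 75, 38, 43, 24, 38, 16,
             36, 5, 15, 2, 9, 7, 9, 1, 3, 3, 2, 1)
)

# Per-row vertical justification: negative values put the label above the
# point, values above 1 put it below
df$pos <- ifelse(df$div == "Premier Division", -2, 2)

p <- ggplot(df, aes(x = maxrep, y = freq, colour = div)) +
  geom_line() +
  # vjust mapped per row; show.legend = FALSE removes the "a" key from the legend
  geom_text(aes(label = freq, vjust = pos), show.legend = FALSE)

p  # draw the plot
```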
I can't work out why prediction on my test dataset fails with R neural networks (nnet package).
I have two datasets with similar structures: one for training (trainset, 17 cases) and one for prediction (testset, 9 cases). Each dataset has the columns age, gender, Height and Weight. In the test dataset the age is unknown (NA).
Training runs successfully:
library(nnet)
trainednetwork<-nnet(age~gender+emLength+action5cnt,trainset, size=17)
However, when I try to use the test dataset for prediction in the next line of code,
prediction<-predict(trainednetwork,testset)
I get the error "No component terms, no attribute". Can anyone help?
The data (obtained with the dput() function):
testset:
structure(list(
age = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_),
gender = structure(
c(2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
.Label = c("f", "m"),
class = "factor"),
Height= c(9L, 11L, 9L, 11L, 9L, 11L, 9L, 11L, 9L),
Weight= c(1L, 41L, 2L, 1L, 2L, 29L, 12L, 6L, 12L)),
.Names = c("age", "gender", "Height", "Weight"),
class = "data.frame",
row.names = c(NA, 9L))
trainset:
structure(list(
age = c(43L, 35L, 22L, 28L, 20L, 47L, 41L, 23L,
42L, 27L, 22L, 60L, 62L, 47L, 42L, 26L, 54L),
gender = structure(
c(2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L),
.Label = c("f", "m"),
class = "factor"),
Height= c(7L, 9L, 11L, 11L, 11L, 9L, 11L, 9L, 23L, 9L,
9L, 9L, 10L, 7L, 7L, 11L, 7L),
Weight= c(2L, 2L, 9L, 9L, 28L, 8L, 6L, 3L, 1L, 2L, 40L,
1L, 9L, 1L, 7L, 4L, 35L)),
.Names = c("age", "gender", "Height", "Weight"),
class = "data.frame",
row.names = c(NA, 17L))
I think that in the R neuralnet package the function to use for prediction is compute(), not predict(), which is very confusing.
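For comparison, predict() does work on a model fitted with nnet's formula interface, as long as newdata contains every predictor named in the formula. Note that the posted formula uses emLength and action5cnt, which do not appear in the dput() data; this sketch assumes the predictors are gender, Height and Weight instead. It also sets linout = TRUE, because nnet uses logistic output units by default, which cannot produce ages above 1. The size, decay and maxit values are illustrative, not tuned.

```r
library(nnet)

# Data from the dput() output in the question
trainset <- data.frame(
  age    = c(43, 35, 22, 28, 20, 47, 41, 23, 42, 27, 22, 60, 62, 47, 42, 26, 54),
  gender = factor(c("m", "f", "m", "m", "f", "m", "f", "m", "m", "m",
                    "m", "m", "m", "m", "m", "m", "m"), levels = c("f", "m")),
  Height = c(7, 9, 11, 11, 11, 9, 11, 9, 23, 9, 9, 9, 10, 7, 7, 11, 7),
  Weight = c(2, 2, 9, 9, 28, 8, 6, 3, 1, 2, 40, 1, 9, 1, 7, 4, 35)
)
testset <- data.frame(
  age    = rep(NA_real_, 9),
  gender = factor(c("m", "m", "m", "f", "m", "m", "m", "f", "m"), levels = c("f", "m")),
  Height = c(9, 11, 9, 11, 9, 11, 9, 11, 9),
  Weight = c(1, 41, 2, 1, 2, 29, 12, 6, 12)
)

set.seed(1)  # nnet starts from random initial weights
fit <- nnet(age ~ gender + Height + Weight, data = trainset,
            size = 3, linout = TRUE, decay = 0.01, maxit = 500, trace = FALSE)

# age is NA in testset, but it is the response, so predict() ignores it
prediction <- predict(fit, newdata = testset)
prediction
```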