How to use method="nlsLM" (from the minpack.lm package) in geom_smooth - r

test <- data.frame(Exp = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6), t = c(0, 0.33, 0.67,
1, 1.33, 1.67, 2, 4, 6, 8, 10, 0, 33, 0.67, 1, 1.33, 1.67, 2, 4, 6, 8, 10,
0, 0.33, 0.67, 1, 1.33, 1.67, 2, 4, 6, 8, 10), fold = c(1,
0.957066345654286, 1.24139015724819, 1.62889151698633, 1.72008539595879,
1.82725412314402, 1.93164365299958, 1.9722929538061, 2.15842019312484,
1.9200507796933, 1.95804730344453, 1, 0.836176542548747, 1.07077717914707,
1.45471712491441, 1.61069357875771, 1.75576377806756, 1.89280913889538,
2.00219054189937, 1.87795513639311, 1.85242493827193, 1.7409346372629, 1,
0.840498729335292, 0.904130905000499, 1.23116185602517, 1.41897551928886,
1.60167656534099, 1.72389226836308, 1.80635095956481, 1.76640786872057,
1.74327897001172, 1.63581509884482))
d <- ggplot(test,aes(x=t, y=fold))+
#to make it obvious I use argument names instead of positional matching
geom_point()+
geom_smooth(method="nls",
formula=y~1+Vmax*(1-exp(-x/tau)), # this is an nls argument
method.args = list(start=c(tau=0.2,Vmax=2)), # this too
se=FALSE)
I found the code above on this site, but I wonder how to change method="nls" to method="nlsLM" in geom_smooth, since choosing start values for the original "nls" is a real problem for me.
Is there any way to use a fitting function from a CRAN package as the method in geom_smooth in ggplot2?
Thanks

You don't seem to have tried anything. You can simply do the obvious:
library(ggplot2)
library(minpack.lm)
d <- ggplot(test,aes(x=t, y=fold))+
geom_point()+
geom_smooth(method="nlsLM",
formula=y~1+Vmax*(1-exp(-x/tau)),
method.args = list(start=c(tau=0.2,Vmax=2)),
se=FALSE)
print(d)
#works
Note that convergence problems do not have an easy one-size-fits-all solution. Sometimes minpack can help, but often it will simply give you a bad fit where nls helpfully throws an error.
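The difference between an explicit nls failure and a silently bad fit can be handled defensively. Below is a minimal sketch in base R, assuming synthetic data generated from the same model. The true values tau = 1.2 and Vmax = 0.9, the noise level, and the helper name safe_nls are illustrative assumptions, not part of the original answer. It wraps nls in tryCatch so that a failed fit returns NULL instead of stopping the script:

```r
# Synthetic data from fold = 1 + Vmax * (1 - exp(-t / tau)) (assumed values).
set.seed(1)
t_obs <- c(0, 0.33, 0.67, 1, 1.33, 1.67, 2, 4, 6, 8, 10)
sim <- data.frame(
  t = t_obs,
  fold = 1 + 0.9 * (1 - exp(-t_obs / 1.2)) + rnorm(length(t_obs), sd = 0.02)
)

# Hypothetical helper: return the fitted model, or NULL if nls throws an error.
safe_nls <- function(start) {
  tryCatch(
    nls(fold ~ 1 + Vmax * (1 - exp(-t / tau)), data = sim, start = start),
    error = function(e) {
      message("nls failed: ", conditionMessage(e))
      NULL
    }
  )
}

good_fit <- safe_nls(c(tau = 1, Vmax = 1))  # reasonable start values
bad_fit <- safe_nls(c(tau = 0, Vmax = 1))   # tau = 0 gives -0/0 = NaN at t = 0
```

Whichever fitting function is used, it is still worth plotting the fitted curve over the points before trusting the coefficients.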

It's probably best to keep your nls results in a separate data frame, and plot the two items separately:
ggplot() +
geom_point(aes(x=t, y=fold), data = test) +
geom_line(aes(...), data = my.nls.results)
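One way to build such a results data frame is to predict from the fitted model on a fine grid of x values. This is only a sketch: the data are synthetic (generated with assumed values tau = 1.2 and Vmax = 0.9), sim_data and grid are illustrative names, and only my.nls.results is taken from the answer above.

```r
library(ggplot2)

# Synthetic stand-in for the observed data (assumed parameter values).
set.seed(42)
t_obs <- c(0, 0.33, 0.67, 1, 1.33, 1.67, 2, 4, 6, 8, 10)
sim_data <- data.frame(
  t = t_obs,
  fold = 1 + 0.9 * (1 - exp(-t_obs / 1.2)) + rnorm(length(t_obs), sd = 0.02)
)

# Fit once, outside ggplot2, so the model object can be inspected.
fit <- nls(fold ~ 1 + Vmax * (1 - exp(-t / tau)),
           data = sim_data, start = c(tau = 1, Vmax = 1))

# Predict on a dense grid so the curve looks smooth.
grid <- data.frame(t = seq(min(sim_data$t), max(sim_data$t), length.out = 200))
my.nls.results <- data.frame(t = grid$t, fold = predict(fit, newdata = grid))

p <- ggplot() +
  geom_point(aes(x = t, y = fold), data = sim_data) +
  geom_line(aes(x = t, y = fold), data = my.nls.results)
```

Keeping the fit separate means summary(fit) and the residuals are available for checking, which is not the case when the model only lives inside geom_smooth.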

Use geom_line() instead.
For example, let's say you're working with mtcars and your formula is mpg ~ k / wt + b
nls_model <- nls(mpg ~ k / wt + b, data, etc.)
ggplot(...) +
geom_line(stat = "smooth",
method = "nls",
formula = y ~ k / x + b,
method.args = list(start = as.list(coef(nls_model))),
se = FALSE)
This worked for me with nlsLM as well. The idea behind coef(nls_model) is to use the coefficients of your successful model as the starting values in geom_line, so you get the same model. Just make sure you use y and x in the formula inside geom_line.
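A complete version of the mtcars example might look like the sketch below. The start values list(k = 1, b = 1) and the aes(x = wt, y = mpg) mapping are my assumptions, since the answer leaves those parts out.

```r
library(ggplot2)

# Fit the model once with explicit start values (assumed, not from the answer).
nls_model <- nls(mpg ~ k / wt + b, data = mtcars, start = list(k = 1, b = 1))

# Reuse the converged coefficients as start values inside the plot layer.
# Note that the formula in the layer is written in terms of y and x.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_line(stat = "smooth",
            method = "nls",
            formula = y ~ k / x + b,
            method.args = list(start = as.list(coef(nls_model))),
            se = FALSE)
```

Because this particular model is linear in k and b, it converges easily; for a harder model the same pattern applies, but finding start values for the first nls call is where the effort goes.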

Related

Plotting an Exponential Best Fit Curve to ggplot2 using Stat_smooth [duplicate]

I am trying to fit data to an exponential decay function (RC-like system) with the equation:
fold = 1 + Vmax * (1 - exp(-t / tau))
My data are on the following dataframe:
dataset <- data.frame(Exp = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6), t = c(0, 0.33, 0.67, 1, 1.33, 1.67, 2, 4, 6, 8, 10, 0, 33, 0.67, 1, 1.33, 1.67, 2, 4, 6, 8, 10, 0, 0.33, 0.67, 1, 1.33, 1.67, 2, 4, 6, 8, 10), fold = c(1, 0.957066345654286, 1.24139015724819, 1.62889151698633, 1.72008539595879, 1.82725412314402, 1.93164365299958, 1.9722929538061, 2.15842019312484, 1.9200507796933, 1.95804730344453, 1, 0.836176542548747, 1.07077717914707, 1.45471712491441, 1.61069357875771, 1.75576377806756, 1.89280913889538, 2.00219054189937, 1.87795513639311, 1.85242493827193, 1.7409346372629, 1, 0.840498729335292, 0.904130905000499, 1.23116185602517, 1.41897551928886, 1.60167656534099, 1.72389226836308, 1.80635095956481, 1.76640786872057, 1.74327897001172, 1.63581509884482))
I have data from 3 experiments (Exp: 4, 5 and 6), and I want to fit each experiment to the given equation.
I have managed to do it for one of the experiments by subsetting my data and using the parameters calculated by nls:
test <- subset(dataset,Exp==4)
fit1 = nls(fold ~ 1+(Vmax*(1-exp(-t/tau))),
data=test,
start=c(tau=0.2,Vmax=2))
ggplot(test,aes(t,fold))+
stat_function(fun=function(t){1+coef(fit1)[[2]]*(1-exp(-t/coef(fit1)[[1]]))})+
geom_point()
But if I try to use the geom_smooth function directly on the full dataset with this code
d <- ggplot(test,aes(t,fold))+
geom_point()+
geom_smooth(method="nls",
formula='fold~1+Vmax*(1-exp(-t/tau))',
start=c(tau=0.2,Fmax=2))
print(d)
I get the following error:
Error in model.frame.default(formula = ~fold, data = data, weights = weight) :
variable lengths differ (found for '(weights)')
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Is there anything wrong with my syntax? I would like to get this working so that I can use the same function on the full dataset, with group giving one fit per Exp level.
There are several problems:
formula is a parameter of nls and you need to pass a formula object to it and not a character.
ggplot2 passes y and x to nls and not fold and t.
By default, stat_smooth tries to get the confidence interval. That isn't implemented in predict.nls.
In summary:
d <- ggplot(test,aes(x=t, y=fold))+
#to make it obvious I use argument names instead of positional matching
geom_point()+
geom_smooth(method="nls",
formula=y~1+Vmax*(1-exp(-x/tau)), # this is an nls argument,
#but stat_smooth passes the parameter along
start=c(tau=0.2,Vmax=2), # this too
se=FALSE) # this is an argument to stat_smooth and
# switches off drawing confidence intervals
Edit:
After the major ggplot2 update to version 2, you need:
geom_smooth(method="nls",
formula=y~1+Vmax*(1-exp(-x/tau)), # this is an nls argument
method.args = list(start=c(tau=0.2,Vmax=2)), # this too
se=FALSE)
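For the stated goal of one fit per Exp level, mapping Exp to a grouping aesthetic should make geom_smooth run nls separately for each group. Here is a sketch on synthetic data; the per-experiment plateau values, tau = 1.2, and the noise level are all assumptions made for illustration.

```r
library(ggplot2)

# Synthetic data for three experiments (assumed parameter values).
set.seed(7)
t_obs <- c(0, 0.33, 0.67, 1, 1.33, 1.67, 2, 4, 6, 8, 10)
sim <- do.call(rbind, lapply(c(4, 5, 6), function(e) {
  vmax <- 1.3 - 0.1 * e  # hypothetical plateau: 0.9, 0.8, 0.7 for Exp 4, 5, 6
  data.frame(
    Exp = e,
    t = t_obs,
    fold = 1 + vmax * (1 - exp(-t_obs / 1.2)) + rnorm(length(t_obs), sd = 0.02)
  )
}))

# colour = factor(Exp) defines the groups, so each Exp gets its own nls fit.
p <- ggplot(sim, aes(x = t, y = fold, colour = factor(Exp))) +
  geom_point() +
  geom_smooth(method = "nls",
              formula = y ~ 1 + Vmax * (1 - exp(-x / tau)),
              method.args = list(start = c(tau = 1, Vmax = 1)),
              se = FALSE)
```

The same start values are used for every group, so they need to be reasonable for all of them; if one group fails to converge, ggplot2 warns and draws no curve for that group.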

Passing smooth line through all data points with more than 50 points

I have data that looks like:
year mean.streak
1958 2.142857
1959 3.066667
1960 2.166667
1961 2.190476
The code for my plot with localized regression looks like:
ggplot(aes(x = year, y = mean.streak, color = year), data = streaks)+
geom_point(color = 'black')+
geom_smooth(method = 'loess')
and outputs:
I'd like to capture the somewhat sinusoidal pattern of the data by passing a smooth line through all of the data points, rather than the typical jagged geom_line. I tried polynomial interpolation by writing:
ggplot(df)+
geom_point(aes(x = year, y = mean.streak, colour = year), size = 3) +
stat_smooth(aes(x = year, y = mean.streak), method = "lm",
formula = y ~ poly(x, 57), se = FALSE)
Taken from this thread. But I get the error:
Warning message:
Computation failed in `stat_smooth()`:
'degree' must be less than number of unique points
seemingly because there are too many datapoints, as this answer seems to indicate.
Is there a way to pass a smooth line through all the data with 59 data points?
Full data is:
structure(list(year = 1958:2016, mean.streak = c(2.14285714285714,
3.06666666666667, 2.16666666666667, 2.19047619047619, 2.35, 2.42857142857143,
2.28571428571429, 1.92592592592593, 1.69230769230769, 2.61111111111111,
3, 2.94117647058824, 2.2, 2.5, 2.13636363636364, 1.76923076923077,
1.36111111111111, 1.41176470588235, 1.76, 2, 2.63157894736842,
2.08695652173913, 2.86666666666667, 2.125, 3, 3.125, 2.57894736842105,
1.84, 1.46666666666667, 1.7037037037037, 1.625, 1.67741935483871,
1.84, 1.6, 3, 3.11111111111111, 3.66666666666667, 4.18181818181818,
2.85714285714286, 3.66666666666667, 2.66666666666667, 2.92857142857143,
3.1875, 2.76923076923077, 5.375, 5.18181818181818, 4.08333333333333,
6.85714285714286, 2.77777777777778, 2.76470588235294, 3.15384615384615,
3.83333333333333, 3.06666666666667, 3.07692307692308, 4.41666666666667,
4.9, 5.22222222222222, 5, 5.27272727272727), median.streak = c(1,
3, 1.5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2,
2, 3, 2, 2, 2.5, 2, 2, 1, 1, 1, 1, 1, 1, 1.5, 2, 4, 4, 1, 3,
2, 2.5, 2, 2, 5.5, 4, 2.5, 9, 2, 2, 2, 1.5, 2, 3, 2.5, 4.5, 4,
5, 4), max.streak = c(6, 6, 9, 7, 5, 5, 7, 4, 3, 7, 9, 7, 6,
6, 6, 4, 3, 4, 4, 10, 8, 6, 6, 5, 10, 8, 5, 6, 3, 4, 4, 4, 4,
5, 8, 8, 11, 8, 8, 11, 10, 5, 12, 7, 10, 12, 12, 10, 7, 10, 10,
14, 9, 7, 9, 12, 10, 14, 12), mean.std = c(-0.73762950487994,
-0.480997734887942, -0.517355702126398, -0.387678832192802, -0.315808940316265,
-0.455313725347534, -0.520453518496716, -0.598412265824216, -0.523171795723798,
-0.62285788065637, -0.54170040191883, -0.590289727314622, -0.468222025966258,
-0.639180735884434, -0.656427002478427, -0.565745564840106, -0.473399411312895,
-0.564475310127763, -0.493531273810312, -0.543209721496256, -0.640240670332106,
-0.510337503791441, -0.596096374402028, -0.504696265560619, -0.620412635042488,
-0.497008319856979, -0.546623513153538, -0.613345407826292, -0.564945850817486,
-0.581770706442245, -0.5709080560492, -0.627986564445679, -0.680973485641403,
-0.548092447365696, -0.554620596559388, -0.483847268000936, -0.67619820292833,
-0.613245144944101, -0.509832316970819, -0.302654541906113, -0.623276311320811,
-0.431421947082012, -0.525548788393688, -0.244995094473986, -0.412444188256097,
-0.112114155982405, -0.299486359079708, -0.300201791042539, -0.240281366191648,
-0.359719754440627, -0.511417389357902, -0.474906675611613, -0.312106332395495,
-0.449137693833681, -0.526248555772371, -0.56052848268042, -0.390017880007091,
-0.537267264953157, -0.444528236868953)), class = c("tbl_df",
"tbl", "data.frame"), .Names = c("year", "mean.streak", "median.streak",
"max.streak", "mean.std"), row.names = c(NA, -59L))
Adjust the span:
ggplot(aes(x = year, y = mean.streak, color = year), data = streaks)+
geom_point(color = 'black')+
stat_smooth(method = 'loess', span = 0.3)
Or use a spline:
library(splines)
ggplot(aes(x = year, y = mean.streak, color = year), data = streaks)+
geom_point(color = 'black')+
stat_smooth(method = 'lm', formula = y ~ ns(x, 10))
Generally, you don't want to fit an extremely high-degree polynomial. Such fits look awful. It would be much better to fit an actual time series model to your data:
library(forecast)
library(zoo)
ggplot(aes(x = year, y = mean.streak, color = year), data = streaks)+
geom_point(color = 'black')+
geom_line(data = data.frame(year = sort(streaks$year),
mean.streak = fitted(auto.arima(zoo(streaks$mean.streak,
order.by = streaks$year)))),
show.legend = FALSE)
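The effect of the span and spline options can also be checked outside ggplot2 by fitting the same smoothers directly. This sketch uses synthetic, roughly sinusoidal data (the period, noise level and the name streaks_sim are assumptions) and compares residual sums of squares:

```r
library(splines)

# Synthetic yearly series with a slow oscillation (assumed shape).
set.seed(3)
year <- 1958:2016
streaks_sim <- data.frame(
  year = year,
  mean.streak = 3 + sin(2 * pi * (year - 1958) / 15) +
    rnorm(length(year), sd = 0.3)
)

# Residual sum of squares as a rough measure of how closely a fit follows the data.
rss <- function(fit) sum(residuals(fit)^2)

fit_default <- loess(mean.streak ~ year, data = streaks_sim)             # span = 0.75
fit_narrow <- loess(mean.streak ~ year, data = streaks_sim, span = 0.3)  # more local
fit_spline <- lm(mean.streak ~ ns(year, 10), data = streaks_sim)         # natural spline

print(round(c(default = rss(fit_default),
              narrow = rss(fit_narrow),
              spline = rss(fit_spline)), 2))
```

A smaller span follows the points more closely, but pushing it toward zero ends up chasing noise, so a lower residual sum of squares is not by itself evidence of a better model.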

How to update a list in a for loop (cannot store ggplot object into the list) [duplicate]

My problem is similar to this one: when I generate plot objects (in this case histograms) in a loop, it seems that all of them are overwritten by the most recent plot.
To debug, within the loop, I am printing the index and the generated plot, both of which appear correctly. But when I look at the plots stored in the list, they are all identical except for the label.
(I'm using multiplot to make a composite image, but you get the same outcome if you print(myplots[[1]]) through print(myplots[[4]]) one at a time.)
Because I already have an attached dataframe (unlike the poster of the similar problem), I am not sure how to solve the problem.
(By the way, the column classes are factor in the original dataset I am approximating here, but the same problem occurs if they are integer.)
Here is a reproducible example:
library(ggplot2)
source("http://peterhaschke.com/Code/multiplot.R") #load multiplot function
#make sample data
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3,
3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3,
3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3,
3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2,
3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")
#generate plots
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_histogram(fill="lightgreen") +
xlab(colnames(data2)[ i])
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
When I look at a summary of a plot object in the plot list, this is what I see
> summary(myplots[[1]])
data: A, B, C, D [60x4]
mapping: x = data2[, i]
faceting: facet_null()
-----------------------------------
geom_histogram: fill = lightgreen
stat_bin:
position_stack: (width = NULL, height = NULL)
I think that mapping: x = data2[, i] is the problem, but I am stumped! I can't post images, so you'll need to run my example and look at the graphs if my explanation of the problem is confusing.
Thanks!
In addition to the other excellent answer, here’s a solution that uses “normal”-looking evaluation rather than eval. Since for loops have no separate variable scope (i.e. they are performed in the current environment) we need to use local to wrap the for block; in addition, we need to make i a local variable, which we can do by re-assigning it to its own name [1]:
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1)
})
}
However, an altogether cleaner way is to forgo the for loop entirely and use list functions to build the result. This works in several possible ways. The following is the easiest in my opinion:
plot_data_column = function (data, column) {
ggplot(data, aes_string(x = column)) +
geom_histogram(fill = "lightgreen") +
xlab(column)
}
myplots <- lapply(colnames(data2), plot_data_column, data = data2)
This has several advantages: it’s simpler, and it won’t clutter the environment (with the loop variable i).
[1] This might seem confusing: why does i <- i have any effect at all? Because by performing the assignment we create a new, local variable with the same name as the variable in the outer scope. We could equally have used a different name, e.g. local_i <- i.
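As far as I know, aes_string() is deprecated in recent ggplot2 releases, and the .data pronoun is the suggested replacement for selecting a column by name. The sketch below shows the same lapply approach with .data; the small data frame demo is a made-up stand-in for data2, and geom_bar is used because the columns are factors.

```r
library(ggplot2)

# Tiny stand-in data with factor columns (hypothetical values).
demo <- data.frame(
  A = factor(c(1, 1, 2, 3)),
  B = factor(c(1, 2, 2, 2))
)

# .data[[column]] looks the column up by its name when the plot is built,
# so each plot refers to its own column rather than to a loop index.
plot_data_column <- function(data, column) {
  ggplot(data, aes(x = .data[[column]])) +
    geom_bar(fill = "lightgreen") +
    xlab(column)
}

myplots <- lapply(colnames(demo), plot_data_column, data = demo)
```

Each element of myplots can then be printed or passed to multiplot as before.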
Because of all the quoting of expressions that get passed around, the i that is evaluated at the end of the loop is whatever i happens to be at that time, which is its final value. You can get around this by using eval(substitute(...)) to substitute in the right value during each iteration.
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- eval(substitute(
ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_histogram(fill="lightgreen") +
xlab(colnames(data2)[ i])
,list(i = i)))
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
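The late-evaluation effect described above can be reproduced without ggplot2, which may make it easier to see. In this sketch (all names are illustrative), functions created in a plain for loop all look up the same i when they are finally called, while wrapping the body in local() gives each function its own copy:

```r
# Functions created in a for loop share one variable i in the same environment.
makers_loop <- list()
for (i in 1:3) {
  makers_loop[[i]] <- function() i
}
# Called after the loop has finished, every function sees the final value of i.
loop_values <- vapply(makers_loop, function(f) f(), integer(1))

# local() creates a fresh environment per iteration; i <- i copies the
# current value into it, so each function keeps its own i.
makers_local <- list()
for (i in 1:3) {
  makers_local[[i]] <- local({
    i <- i
    function() i
  })
}
local_values <- vapply(makers_local, function(f) f(), integer(1))

print(loop_values)   # 3 3 3
print(local_values)  # 1 2 3
```

A ggplot object built with aes(x = data2[, i]) behaves like the first case: the expression is stored unevaluated and i is only looked up when the plot is printed.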
Using lapply works too as x exists within the anonymous function environment (using mtcars as data):
library(ggplot2)
library(ggthemes) # provides theme_wsj() and scale_colour_wsj()
plot <- lapply(seq_len(ncol(mtcars)), FUN = function(x) {
ggplot(data = mtcars) +
geom_line(aes(x = mpg, y = mtcars[ , x]), size = 1.4, color = "midnightblue", inherit.aes = FALSE) +
labs(x="Date", y="Value", title = "Revisions 1M", subtitle = colnames(mtcars)[x]) +
theme_wsj() +
scale_colour_wsj("colors6")
})
I have run the code in the question and in the answer, changing geom_histogram to geom_bar to avoid the error: Error: StatBin requires a continuous x variable.
Here is the code with the visualizations:
Question
#generate plots
myplots <- list() # new empty list
for (i in 1:4) {
p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
geom_bar(fill="lightgreen") +
xlab(colnames(data2)[ i])
print(i)
print(p1)
myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Answer
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_bar(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1)
})
}
multiplot(plotlist = myplots, cols = 4)
Same result using lapply:
plot_data_column = function (data, column) {
ggplot(data, aes_string(x = column)) +
geom_bar(fill = "lightgreen") +
xlab(column)
}
myplots <- lapply(colnames(data2), plot_data_column, data = data2)
multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Created on 2021-04-09 by the reprex package (v0.3.0)

R: displaying scientific notation

chocolate <- data.frame(
Sabor =
c(5, 7, 3,
4, 2, 6,
5, 3, 6,
5, 6, 0,
7, 4, 0,
7, 7, 0,
6, 6, 0,
4, 6, 1,
6, 4, 0,
7, 7, 0,
2, 4, 0,
5, 7, 4,
7, 5, 0,
4, 5, 0,
6, 6, 3
),
Tipo = factor(rep(c("A", "B", "C"), 15)),
Provador = factor(rep(1:15, rep(3, 15))))
tapply(chocolate$Sabor, chocolate$Tipo, mean)
ajuste <- lm(chocolate$Sabor ~ chocolate$Tipo + chocolate$Provador)
summary(ajuste)
anova(ajuste)
a1 <- aov(chocolate$Sabor ~ chocolate$Tipo + chocolate$Provador)
posthoc <- TukeyHSD(x=a1, 'chocolate$Tipo', conf.level=0.95)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = chocolate$Sabor ~ chocolate$Tipo + chocolate$Provador)
$`chocolate$Tipo`
diff lwr upr p adj
B-A -0.06666667 -1.803101 1.669768 0.9950379
C-A -3.80000000 -5.536435 -2.063565 0.0000260
C-B -3.73333333 -5.469768 -1.996899 0.0000337
Here is some sample code using TukeyHSD. The output is a matrix, and I want the values to be displayed in scientific notation. I've tried using scipen and setting options(digits = 20), but some of the values from my actual data are still so small that the p adj values are displayed as 0.00000000000000000000.
How can I get the values to be displayed in scientific notation?
You could do this:
format(posthoc, scientific = TRUE)
If you want to change the number of digits, for instance using 3, you could do this:
format(posthoc, scientific = TRUE, digits = 3)
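Since TukeyHSD returns a list with one matrix per term, it may be clearer to format the matrix component itself. This sketch uses the built-in warpbreaks data rather than the chocolate data from the question, so the term name "tension" is specific to this example:

```r
# Two-way ANOVA on a built-in dataset, followed by Tukey's HSD for one term.
fit <- aov(breaks ~ wool + tension, data = warpbreaks)
posthoc <- TukeyHSD(fit, "tension", conf.level = 0.95)

# The comparison table is a numeric matrix stored under the term name.
tab <- posthoc$tension

# format() returns a character matrix with every entry in scientific notation.
sci <- format(tab, scientific = TRUE, digits = 3)
print(sci, quote = FALSE)

# formatC() gives a fixed number of decimals in the mantissa, here for p adj only.
p_sci <- formatC(tab[, "p adj"], format = "e", digits = 2)
print(p_sci, quote = FALSE)
```

Note that formatting only changes how stored values are displayed: a p value that is already stored as exactly 0 will print as 0e+00.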

how to make barplot bars same size in plot window in R using barplot function

I would like to plot 3 plots in the same window. Each will have a different number of bars. How could I make the bars all the same size and close together (the same distance from each other) without padding the smaller barplots with NAs? Example code is below. I do want to point out that my real data will be plotting numbers from dataframes$columns, not a vector of numbers as shown below. I am sure there is a magic way to do this, but I can't seem to find helpful info on the net. Thanks.
pdf(file="PATH.pdf");
par(mfrow=c(1,3));
par(mar=c(9,6,4,2)+0.1);
barcenter1<- barplot(c(1,2,3,4,5));
mtext("Average Emergent", side=2, line=4);
par(mar=c(9,2,4,2)+0.1);
barcenter2<- barplot(c(1,2,3));
par(mar=c(9,2,4,2)+0.1);
barcenter3<- barplot(c(1,2,3,4,5,6,7));
Or, instead of using par(mfrow....) to make a plot window, would there be a way to group the barcenter data on a single plot with an empty space between the bars? This way everything is spaced and looks the same.
Using the parameters xlim and width:
par(mfrow = c(1, 3))
par(mar = c(9, 6, 4, 2) + 0.1)
barcenter1 <- barplot(c(1, 2, 3, 4, 5), xlim = c(0, 1), width = 0.1)
mtext("Average Emergent", side = 2, line = 4)
par(mar = c(9, 2, 4, 2) + 0.1)
barcenter2 <- barplot(c(1, 2, 3), xlim = c(0, 1), width = 0.1)
par(mar = c(9, 2, 4, 2) + 0.1)
barcenter3 <- barplot(c(1, 2, 3, 4, 5, 6, 7), xlim = c(0, 1), width = 0.1)
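The reason this works is that, with a fixed xlim and a fixed width, the bar positions no longer depend on how many bars there are. A quick check, as a sketch, is to compare the bar midpoints that barplot returns (pdf(NULL) is used here only to avoid writing a file, and the names m3, m5 and m7 are illustrative):

```r
pdf(NULL)  # open a null graphics device so nothing is written to disk

# barplot() returns the x coordinates of the bar midpoints.
m5 <- barplot(c(1, 2, 3, 4, 5), xlim = c(0, 1), width = 0.1)
m3 <- barplot(c(1, 2, 3), xlim = c(0, 1), width = 0.1)
m7 <- barplot(c(1, 2, 3, 4, 5, 6, 7), xlim = c(0, 1), width = 0.1)

invisible(dev.off())

# The first bars sit at the same positions in every plot.
print(m3)
print(m5[1:3])
print(m7[1:3])
```

With the default spacing each bar takes up 0.12 units, so xlim = c(0, 1) has room for at most eight bars of width 0.1; longer vectors would need a smaller width or a wider xlim.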
Introducing zeroes:
df <- data.frame(barcenter1 = c(1, 2, 3, 4, 5, 0, 0),
barcenter2 = c(1, 2, 3, 0, 0, 0, 0),
barcenter3 = c(1, 2, 3, 4, 5, 6, 7))
barplot(as.matrix(df), beside = TRUE)
With ggplot2 you can get something like this:
df <- data.frame(x=c(1, 2, 3, 4, 5,1, 2, 3,1, 2, 3, 4, 5, 6, 7),
y=c(rep("bar1",5), rep("bar2",3),rep("bar3",7)))
library(ggplot2)
ggplot(data=df, aes(x = x, y = x)) +
geom_bar(stat = "identity")+
facet_grid(~ y)
For the option you mentioned in your second comment you would need:
x <- c(1, 2, 3, 4, 5, NA, 1, 2, 3, NA, 1, 2, 3, 4, 5, 6, 7)
barplot(x)
