Inconsistent predictions from predict.gbm() 2.1.4 vs 2.1.3 - r

This question is related to my earlier post here.
I have tracked down the problem and it seems to be related to which version of gbm I use. The latest version, 2.1.4, exhibits the problem on my system (R 3.4.4 and also 3.5, both on Ubuntu 18.04), whereas version 2.1.3 works as expected:
mydata <- structure(list(Count = c(1L, 3L, 1L, 4L, 1L, 0L, 1L, 2L, 0L, 0L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 0L, 2L, 3L, 1L, 4L, 3L, 0L, 4L, 1L, 2L, 1L, 1L, 0L, 2L, 1L, 4L, 1L, 5L, 3L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 2L, 0L, 0L, 1L, 1L, 1L, 0L, 3L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 0L, 0L, 3L, 5L, 1L, 2L, 1L, 1L, 0L, 0L, 1L, 2L, 1L, 3L, 1L, 1L, 0L, 2L, 2L, 1L, 3L, 3L, 2L, 0L, 0L, 1L, 2L, 1L, 0L, 2L, 0L, 0L, 4L, 4L, 2L), Treat1 = structure(c(10L, 14L, 8L, 2L, 3L, 12L, 1L, 10L, 6L, 2L, 11L, 11L, 15L, 1L, 8L, 3L, 13L, 9L, 9L, 11L, 1L, 8L, 14L, 5L, 10L, 8L, 15L, 11L, 7L, 6L, 13L, 11L, 7L, 1L, 1L, 2L, 7L, 12L, 5L, 1L, 8L, 1L, 9L, 8L,12L, 14L, 12L, 7L, 8L, 14L, 3L, 3L, 5L, 1L, 1L, 11L, 6L, 5L, 5L, 13L, 9L, 3L, 8L, 9L, 13L, 9L, 7L, 9L, 2L, 6L, 10L, 3L, 11L, 4L, 3L, 15L, 12L, 6L, 4L, 3L, 8L, 8L, 11L, 1L, 11L, 2L, 11L, 5L, 12L, 6L, 8L, 14L, 1L, 9L, 9L, 10L, 10L, 5L, 14L, 3L), .Label = c("D", "U", "R", "E", "C", "Y", "L", "O", "G", "T", "N", "J", "V", "X", "A"), class = "factor"), Treat2 = structure(c(15L, 13L, 7L, 8L, 2L, 5L, 15L, 4L, 2L, 7L, 6L, 2L, 3L, 14L, 10L, 7L, 7L, 14L, 11L, 7L, 6L, 1L, 5L, 13L, 11L, 6L, 10L, 5L, 3L, 1L, 7L, 9L, 6L, 10L, 5L, 11L, 15L, 9L, 7L, 11L, 10L, 2L, 3L, 3L, 5L, 11L, 8L, 6L,4L, 5L, 15L, 8L, 8L, 2L, 2L, 10L, 4L, 1L, 10L, 11L, 10L, 8L, 7L, 7L, 8L, 14L, 16L, 11L, 10L, 9L, 3L, 15L, 13L, 1L, 11L, 11L, 9L, 7L, 10L, 9L, 3L, 7L, 5L, 13L, 3L, 14L, 10L, 10L, 15L, 13L, 15L, 12L, 14L, 11L, 5L, 4L, 2L, 3L, 11L, 10L), .Label = c("B", "X", "R", "H", "L", "D", "U", "Q", "K", "C", "T", "V", "J", "E", "F", "A"), class = "factor"), Near = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0), Co1 = c(2, 5, 1, 1, 0, 1, 1, 2, 1, 2, 5, 2, 1, 0, 1, 2, 6, 3, 3, 1, 2, 2, 3, 0, 1, 0, 1, 0, 2, 1, 0, 1, 2, 3, 1, 2, 2, 0, 0, 2, 3, 3, 1, 1, NA, 2, 0, 2, 1, NA, 1, 1, 0, 1, 2, 0, 2, 1, 1, 1, 2, 3, 1, 0, 4, 0, 0, 0, 2, 2, 1, 1,2, 0, 1, 2, 1, 0, 0, 0, 0, 2, 1, 2, 2, 2, 2, 1, 0, 1, 1, 1, 1, 1, 0, 2, 0, 0, 5, 1), Co2 = c(1, 1, 2, 2, 4, 1, 3, 0, 5, 2, 2, 4, 1, 1, 2, 1, 2, 3, 0, 2, 3, 3, 0, 3, 1, 0, 1, 1, 1, 2, 0, 1, 1, 1, 2, 3, 2, 2, 3, 0, 0, 0, 1, 2, NA, 1, 1, 1, 0, 2, 1, 1, 2, 5, 0, 2, 1, 4, 1, 1, 3, 0, 1, 1, 1, 1, NA, 0, 2, 1, 1, 3, 2, 1, 2, 1, 3, 1, 2, 0, 1, 5, 2, 2, 1, 2, 3, 4, 3, 1, 1, 0, 5, 1, 1, 0, 1, 1, 2, 0)), .Names = c("Count", "Treat1", "Treat2", "Near", "Co1", "Co2"), row.names = c(1759L, 959L, 1265L, 1504L, 630L, 1905L, 1885L, 1140L, 1187L, 1792L, 1258L, 1125L, 756L, 778L, 1718L, 1797L, 388L, 715L, 63L, 311L, 1492L, 1128L, 629L, 536L, 503L, 651L, 1684L, 1893L, 721L, 1440L, 1872L, 1444L, 1593L, 143L, 1278L, 1558L, 1851L, 1168L, 1829L, 386L, 365L, 849L, 429L, 155L, 11L, 1644L, 101L, 985L, 72L, 459L, 1716L, 844L, 1313L, 77L, 1870L, 744L, 219L, 513L, 644L, 831L, 338L, 284L, 211L, 1096L,243L, 1717L, 1881L, 1784L, 1017L, 992L, 45L, 707L, 489L, 1267L, 1152L, 1819L, 995L, 510L, 1350L, 1700L, 56L, 1754L, 725L, 1625L, 319L, 1818L, 1287L, 1634L, 953L, 1351L, 1787L, 923L, 917L, 484L, 886L, 390L, 1531L, 679L, 1811L, 1736L), class = "data.frame")
detach("package:gbm", unload = TRUE )
remove.packages("gbm")
require(devtools)
install_version("gbm", version = "2.1.3")
set.seed(12345)
require(gbm)
n.trees <- 10000
m1.gbm <- gbm(Count ~ Treat1 + Treat2 + Near + Co1 + Co2, data = mydata, distribution = "poisson", n.trees = n.trees)
head(predict(m1.gbm, newdata = mydata, n.trees = n.trees, type = "response"))
[1] 0.8620154 2.8210216 0.8800267 3.7808341 0.4749737 0.3716022
predict(m1.gbm, newdata = head(mydata), n.trees = n.trees, type = "response")
[1] 0.8620154 2.8210216 0.8800267 3.7808341 0.4749737 0.3716022
...as expected. However,
detach("package:gbm", unload = TRUE )
remove.packages("gbm")
install.packages("gbm", dependencies = TRUE)
# I had to restart R after this, otherwise the following line failed with:
# Loading required package: gbm
# Error: package or namespace load failed for ‘gbm’ in get(method, envir = home):
# lazy-load database '/home/container/R/x86_64-pc-linux-gnu-library/3.5/gbm/R/gbm.rdb' is corrupt
require(gbm)
m1.gbm <- gbm(Count ~ Treat1 + Treat2 + Near + Co1 + Co2, data = mydata, distribution = "poisson", n.trees = n.trees)
head(predict(m1.gbm, newdata = mydata, n.trees = n.trees, type = "response"))
[1] 0.7524109 2.8789957 0.7843470 4.1724821 0.4525449 0.2036923
predict(m1.gbm, newdata = head(mydata), n.trees = n.trees, type = "response")
[1] 2.2216079 1.2806235 0.9109426 2.2842149 2.4828922 0.6124778
...which exhibits the problem in my earlier post.
I find this quite surprising since gbm is a well-known package, although I see that the vignette was updated last month, so perhaps the latest version was only recently released. I was unable to find the exact date from here. What is the best way to proceed here?
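As a side note (not from the original post), one way to look up CRAN release dates from within R is the pkgsearch package, assuming it is installed:
# List the CRAN release history of gbm, including version numbers and release dates
library(pkgsearch)
cran_package_history("gbm")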

Related

Different colors for every column in heat map using geom_tile()

I am trying to get a heat map where every column has a different color.
I have a heatmap like this:
# install.packages("reshape")
library(reshape)
library(ggplot2)
# Data
set.seed(8)
m <- matrix(round(rnorm(200), 2), 5, 5)
colnames(m) <- paste("Row", 1:5)
rownames(m) <- paste("col", 1:5)
# long format
df <- melt(m)
colnames(df) <- c("x", "y", "value")
ggplot(df, aes(x = x, y = y, fill = value)) +
geom_tile()
I would like to get a different color for each column col1, col2, col3, col4, and col5.
For example:
Blue for col1, green for col2, violet for col3, yellow for col4, and orange for col5.
I need these ideas because I am making the following plot with this dataset:
dput(bdd)
structure(list(var = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L), .Label = c("var_1", "var_2",
"var_3", "var_4", "var_5", "var_6", "var_7", "var_8", "var_9",
"var_10", "var_11", "var_12", "var_13", "var_14"), class = "factor"),
value = c(4.93, 2.85, 2.075, 1.91, 1.73, 1.34, 0.615, 0.145,
0.14, 0.11, 0.09, 0.06, 0.06, 0.015, 4.13, 1.65, 1.985, 0.51,
5.805, 0.84, 1.28, 0.03, 0.235, 0.145, 0.145, 0.205, 0.03,
0.2, 1.135, 2.175, 2.735, 1.69, 0.86, 0.715, 1.905, 0.17,
0.86, 0.055, 0.03, 0.075, 0.14, 0.005, 3.55, 4.225, 5.985,
0.185, 1.17, 0.91, 0.49, 1.34, 0.485, 0.1, 0.145, 1.145,
0.53, 0.11, 12.06, 1.995, 2.205, 0.48, 1.875, 2.03, 0.335,
0.26, 1.25, 0.225, 0.245, 0.52, 0.075, 0.04), country = structure(c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), .Label = character(0)),
country1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L), .Label = c("C1", "C2", "C3", "C4", "C5"), class = "factor")), row.names = c(NA,
-70L), class = c("tbl_df", "tbl", "data.frame"))
ggplot(data = bdd, aes(x = country1, y = var, fill = value)) +
  geom_tile(aes(alpha = value, fill = country), color = "white") +
  geom_text(aes(label = sprintf("%0.3f", round(value, digits = 3)))) +
  scale_fill_gradient(low = "white", high = "blue") +
  scale_alpha(range = c(0, 1)) +
  theme_classic() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 0, hjust = 0.5, vjust = 0.5),
        legend.position = "none") +
  labs(fill = "% ", y = "y ")
But what I need is every column with a different color as in the first example.
Best.
Map fill to country1 (the column) instead of value, and let alpha carry the value:
ggplot(data = bdd, aes(x = country1, y = var, fill = country1)) +
  geom_tile(aes(alpha = value), color = "white") +
  geom_text(aes(label = sprintf("%0.3f", round(value, digits = 3)))) +
  scale_alpha(range = c(0, 1)) +
  theme_classic() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 0, hjust = 0.5, vjust = 0.5),
        legend.position = "none") +
  labs(fill = "% ", y = "y ")
To specify colors for each column that differ from the default spectrum, you could use one of the discrete fill scales such as scale_fill_discrete or scale_fill_manual, or a custom palette such as ggthemes::scale_fill_tableau(palette = "Nuriel Stone").
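For example, a minimal sketch with scale_fill_manual using the colors requested in the question (the color choices are illustrative):
# One color per column, keyed by the country1 factor levels
ggplot(data = bdd, aes(x = country1, y = var, fill = country1)) +
  geom_tile(aes(alpha = value), color = "white") +
  scale_fill_manual(values = c("C1" = "blue", "C2" = "green", "C3" = "violet",
                               "C4" = "yellow", "C5" = "orange")) +
  scale_alpha(range = c(0, 1)) +
  theme_classic() +
  theme(legend.position = "none")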

Is it possible to extend the intervals of the x-axis in R?

I have two plots: a barplot, and a ggplot (geom_jitter bubbleplot). Ultimately, I am using a photo-editing app to line up these two plots. As you can see, the intervals at the bottom of the two plots do not match up, which is my problem here. I would like to change the bottom x-axis of both plots to end at 400 (the lowest common limit that covers the x-axis of both plots). I do not want to change the data values, just the axis values.
Barplot Code
GYPCdomain <- read.csv(file.choose(), header=TRUE)
GYPCbarplot <- barplot(as.matrix(GYPCdomain), horiz = TRUE,
                       xlab = "Length (Protein Domains Shown)",
                       col = c("azure", "plum1", "skyblue"),
                       legend = c("Cytoplasmic", "Helical Membrane", "Extracellular"))
sample data:
structure(list(GYPC = c(0L, 0L, 171L, 0L, 72L, 0L, 141L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L)), class = "data.frame", row.names = c(NA, -42L))
Bubbleplot Code
library(ggplot2)
library(scales)
# data(GYPC, package = "ggplot2") # this line errors: GYPC is not a ggplot2 dataset; the CSV read below supplies the data
GYPC <- read.csv(file.choose(), header = TRUE)
GYPCggplot <- ggplot(GYPC, aes(Position, log10(Frequency))) +
  geom_jitter(aes(col = Geographical.Location, size = p.value)) +
  labs(subtitle = "Frequency of Various Polymorphisms", title = "GYPC Gene") +
  labs(color = "Geographical Location") +
  labs(size = "p-value") +
  labs(x = "Position of Polymorphism on GYPC Gene") +
  scale_size_continuous(range = c(1, 4.5), trans = "reverse") +
  guides(size = guide_legend(reverse = TRUE))
sample data:
structure(list(Variant = structure(c(4L, 4L, 4L, 4L, 4L, 8L,
8L, 8L, 8L, 8L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 12L,
12L, 12L, 12L, 12L, 14L, 14L, 14L, 14L, 14L, 2L, 2L, 2L, 2L,
2L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L,
11L, 13L, 13L, 13L, 13L, 13L, 15L, 15L, 15L, 15L, 15L), .Label = c("rs111631066",
"rs114199197", "rs115178969", "rs115201071", "rs139780142", "rs139816143",
"rs143080607", "rs143216051", "rs199797395", "rs531807314", "rs545780841",
"rs551011574", "rs560942282", "rs567759380", "rs571586275"), class = "factor"),
Position = c(213L, 213L, 213L, 213L, 213L, 60L, 60L, 60L,
60L, 60L, 249L, 249L, 249L, 249L, 249L, 183L, 183L, 183L,
183L, 183L, 282L, 282L, 282L, 282L, 282L, 294L, 294L, 294L,
294L, 294L, 150L, 150L, 150L, 150L, 150L, 135L, 135L, 135L,
135L, 135L, 258L, 258L, 258L, 258L, 258L, 255L, 255L, 255L,
255L, 255L, 138L, 138L, 138L, 138L, 138L, 159L, 159L, 159L,
159L, 159L, 141L, 141L, 141L, 141L, 141L, 198L, 198L, 198L,
198L, 198L, 258L, 258L, 258L, 258L, 258L), Geographical.Location = structure(c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("AFR",
"AMR", "EAS", "EUR", "SAS"), class = "factor"), Frequency = c(0.023,
0.001, 0, 0, 0, 0.017, 0.001, 0, 0, 0, 0.012, 0, 0, 0, 0,
0.002, 0.003, 0.002, 0.023, 0.016, 0.001, 0, 0, 0, 0, 0,
0, 0, 0, 0.004, 0, 0, 0, 0.001, 0, 0, 0, 0, 0, 0.001, 0,
0, 0.001, 0, 0, 0.001, 0, 0, 0, 0, 0, 0.001, 0, 0, 0, 0,
0, 0, 0, 0.002, 0, 0, 0.001, 0, 0, 0, 0, 0, 0, 0.001, 0,
0, 0.001, 0, 0), pre.p.value = c(6.32e-17, 0.113, 0.00126,
0.00126, 0.00211, 2.51e-12, 0.356, 0.00806, 0.00809, 0.0139,
4.86e-10, 0.15, 0.0542, 0.0542, 0.0537, 0.000376, 0.0778,
0.0068, 7.4e-06, 0.0109, 0.264, 1, 1, 1, 1, 0.579, 1, 0.589,
0.59, 0.00144, 1, 1, 1, 0.201, 1, 1, 1, 1, 1, 0.195, 1, 1,
0.201, 1, 1, 1, 1, 0.201, 1, 1, 1, 0.139, 1, 1, 1, 1, 1,
1, 1, 0.0381, 1, 1, 0.201, 1, 1, 1, 1, 1, 1, 0.195, 1, 1,
0.201, 1, 1), p.value = c(0, 0.75, 0.5, 0.5, 0.5, 0, 0.75,
0.5, 0.5, 0.75, 0, 0.75, 0.75, 0.75, 0.75, 0.5, 0.75, 0.5,
0.25, 0.75, 0.75, 1, 1, 1, 1, 1, 1, 1, 1, 0.5, 1, 1, 1, 0.75,
1, 1, 1, 1, 1, 0.75, 1, 1, 0.75, 1, 1, 1, 1, 0.75, 1, 1,
1, 0.75, 1, 1, 1, 1, 1, 1, 1, 0.75, 1, 1, 0.75, 1, 1, 1,
1, 1, 1, 0.75, 1, 1, 0.75, 1, 1), log.p.value. = c(-16.19928292,
-0.947, -2.899629455, -2.899629455, -2.675717545, -11.60032628,
-0.449, -2.093664958, -2.092051478, -1.8569852, -9.313363731,
-0.824, -1.266000713, -1.266000713, -1.270025714, -3.424812155,
-1.11, -2.167491087, -5.13076828, -1.962573502, -0.5783960731,
0, 0, 0, 0, -0.2373214363, 0, -0.2298847052, -0.2291479884,
-2.841637508, 0, 0, 0, -0.6968039426, 0, 0, 0, 0, 0, -0.7099653886,
0, 0, -0.6968039426, 0, 0, 0, 0, -0.6968039426, 0, 0, 0,
-0.857, 0, 0, 0, 0, 0, 0, 0, -1.419075024, 0, 0, -0.6968039426,
0, 0, 0, 0, 0, 0, -0.7099653886, 0, 0, -0.6968039426, 0,
0), X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), range = structure(c(2L, 6L, 5L, 4L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "0 < p-value < 1E-9",
"1E-2 < p-value < 1", "1E-4 < p-value < 1E-2", "1E-6 < p-value < 1E-4",
"1E-9 < p-value < 1E-6"), class = "factor")), class = "data.frame", row.names = c(NA,
-75L))
I took the liberty of producing your barplot with ggplot as well, because then we can use the awesome features of the cowplot package, which was made for situations like this. Setting axis limits can be done with ylim() or xlim(), but because the legends have different widths, we need the cowplot package to truly align the plots (otherwise the legends would need to go below the plots).
# recreating the barplot
library(dplyr) # needed for data wrangling
GYPCbarplot_ggplot <- GYPCdomain %>%
  filter(GYPC > 0) %>%
  mutate(domain = factor(c("Cytoplasmic", "Helical Membrane", "Extracellular"),
                         levels = c("Cytoplasmic", "Helical Membrane", "Extracellular"),
                         ordered = TRUE)) %>%
  ggplot(aes(x = 1, y = GYPC, fill = domain)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = c("Cytoplasmic" = "azure",
                               "Helical Membrane" = "plum1",
                               "Extracellular" = "skyblue")) +
  coord_flip() +
  xlab("GYPC") +
  ylab("Length (Protein Domains Shown)") +
  ylim(0, 400) + # creates the limit
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
# the bubbleplot
GYPC_bubbleplot <- ggplot(GYPC, aes(Position, log10(Frequency))) +
  geom_jitter(aes(col = Geographical.Location, size = p.value)) +
  labs(subtitle = "Frequency of Various Polymorphisms", title = "GYPC Gene") +
  labs(color = "Geographical Location") +
  labs(size = "p-value") +
  labs(x = "Position of Polymorphism on GYPC Gene") +
  scale_size_continuous(range = c(1, 4.5), trans = "reverse") +
  guides(size = guide_legend(reverse = TRUE)) +
  xlim(0, 400) # added this limit
library(cowplot) # used to arrange the two plots
plot_grid(GYPCbarplot_ggplot, GYPC_bubbleplot,
          ncol = 1,               # both plots in one column (below each other)
          align = "v",            # align both bottom axes
          rel_heights = c(1, 1.5) # make the bottom plot a bit taller
)
et voila:
If I understand correctly, the OP is asking to synchronise the x-axes in order to show which protein domain a given position on the GYPC gene belongs to.
If my assumption is correct, then there is an alternative approach which fills the background of the bubble plot according to the protein domains:
library(dplyr)
domain_name <- c("Cytoplasmic", "Helical Membrane", "Extracellular")
domain_fill <- c("azure", "plum1", "skyblue")
names(domain_fill) <- domain_name
GYPCdomain_2 <- GYPCdomain %>%
  filter(GYPC > 0) %>%
  mutate(domain_name = forcats::fct_inorder(rev(domain_name)),
         end_pos = cumsum(GYPC),
         start_pos = lag(end_pos, default = 0L))
library(ggplot2)
ggplot(GYPC, aes(Position, log10(Frequency))) +
  geom_rect(aes(xmin = start_pos, xmax = end_pos, ymin = -Inf, ymax = Inf,
                fill = domain_name),
            data = GYPCdomain_2, inherit.aes = FALSE, alpha = 0.6) +
  scale_fill_manual(values = domain_fill) +
  geom_jitter(aes(color = Geographical.Location, size = p.value)) +
  labs(subtitle = "Frequency of Various Polymorphisms", title = "GYPC Gene") +
  labs(color = "Geographical Location") +
  labs(size = "p-value") +
  labs(x = "Position of Polymorphism on GYPC Gene") +
  labs(fill = "Protein Domain") +
  scale_size_continuous(range = c(1, 4.5), trans = "reverse") +
  guides(size = guide_legend(reverse = TRUE))

Emmeans continuous independent variable

I want to explain Type_f using the Type_space of the experiment, the rate of Exhaustion_product, and the quantitative variable Age.
Here is my data:
res=structure(list(Type_space = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("",
"29-v1", "29-v2", "88-v1", "88-v2"), class = "factor"), Id = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L,
68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L,
81L, 82L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L,
26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L,
39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L,
52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L,
65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L,
78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L,
91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L,
103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L,
114L, 115L, 116L, 117L, 118L, 119L, 120L, 121L, 122L, 123L, 124L,
125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L, 134L, 135L,
136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L, 145L, 146L,
147L, 148L, 149L, 150L, 151L, 152L, 153L, 154L, 155L, 156L, 157L,
158L, 159L, 160L, 161L, 162L, 163L, 164L, 165L, 166L, 167L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L,
68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L,
81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L,
94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L,
30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L,
43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L,
56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L,
69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L,
82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L,
95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L,
106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L,
117L, 118L, 119L, 120L, 121L, 122L, 123L, 124L, 125L, 126L, 127L,
128L, 129L, 130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L,
139L, 140L, 141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 149L,
150L, 151L, 152L, 153L, 154L, 155L, 156L, 157L, 158L, 159L, 160L,
161L, 162L, 163L, 164L), Age = c(3, 10, 1, 5, 4, 2, 1, 8, 2,
13, 1, 6, 3, 5, 2, 1, 3, 8, 3, 6, 1, 3, 7, 1, 2, 2, 2, 1, 2,
5, 4, 1, 6, 3, 6, 8, 2, 3, 4, 7, 3, 2, 6, 2, 3, 7, 1, 5, 4, 1,
4, 3, 2, 3, 5, 5, 2, 1, 1, 5, 8, 7, 2, 2, 4, 3, 4, 4, 2, 2, 10,
7, 5, 3, 3, 5, 7, 5, 3, 4, 5, 4, 1, 8, 6, 1, 12, 1, 6, 3, 4,
4, 13, 5, 2, 7, 7, 20, 1, 1, 1, 7, 1, 4, 3, 8, 2, 2, 4, 1, 1,
2, 3, 2, 2, 6, 11, 2, 5, 5, 9, 4, 4, 2, 7, 2, 7, 10, 6, 9, 2,
2, 5, 11, 1, 8, 8, 4, 1, 2, 14, 11, 13, 20, 3, 3, 4, 16, 2, 6,
11, 9, 11, 4, 5, 6, 19, 5, 2, 6, 1, 7, 11, 3, 9, 2, 3, 6, 20,
8, 6, 2, 11, 18, 9, 3, 7, 3, 2, 1, 8, 3, 5, 6, 2, 5, 8, 11, 4,
9, 7, 2, 12, 8, 2, 9, 5, 4, 15, 5, 13, 5, 10, 13, 7, 6, 1, 12,
12, 10, 4, 2, 16, 7, 17, 11, 18, 4, 3, 12, 1, 3, 7, 3, 6, 5,
11, 10, 12, 6, 14, 8, 6, 7, 8, 5, 10, 12, 6, 13, 3, 11, 14, 7,
9, 9, 4, 13, 4, 2, 1, 2, 2, 1, 7, 9, 3, 10, 3, 2, 1, 3, 1, 4,
2, 4, 5, 4, 2, 13, 4, 1, 3, 1, 11, 4, 1, 3, 3, 7, 5, 4, 5, 6,
1, 2, 1, 2, 1, 6, 1, 7, 6, 9, 5, 1, 6, 3, 2, 3, 3, 8, 8, 3, 2,
2, 4, 2, 5, 2, 6, 8, 11, 1, 6, 3, 3, 4, 5, 5, 7, 4, 2, 7, 3,
3, 1, 3, 9, 5, 2, 4, 12, 1, 4, 5, 2, 7, 6, 1, 2, 6, 4, 2, 7,
3, 5, 5, 3, 7, 1, 5, 2, 1, 15, 3, 5, 2, 5, 13, 6, 2, 3, 5, 2,
8, 4, 2, 6, 7, 2, 4, 1, 13, 8, 2, 1, 2, 1, 1, 5, 2, 1, 6, 11,
4, 1, 7, 7, 4, 3, 5, 1, 4, 10, 1, 2, 6, 1, 11, 3, 8, 9, 2, 6,
8, 11, 14, 16, 4, 1, 4, 2, 1, 10, 4, 9, 3, 12, 8, 11, 8, 8, 5,
1, 4, 13, 3, 8, 5, 14, 3, 5, 5, 12, 1, 3, 4, 5, 2, 7, 6, 9, 6,
10, 5, 2, 3, 2, 10, 10, 10, 10, 10, 1, 14, 3, 5, 9, 6, 2, 2,
2, 4, 4, 11, 14, 2, 2, 2, 8, 7, 2, 10, 12, 1, 6, 10, 2, 3, 5,
10, 6, 1, 8, 4, 11, 5, 4, 3, 6, 2, 4, 6, 9, 3, 9, 11, 7, 3, 15,
3, 7, 3, 5, 4, 6, 9, 13, 8, 5, 7, 8, 8, 5, 10), Type_product = c("f",
"s", "f", "f", "f", "f", "s", "c", "s", "f", "c", "f", "f", "f",
"s", "s", "f", "f", "c", "f", "s", "f", "f", "s", "f", "c", "f",
"f", "s", "f", "f", "c", "f", "c", "f", "f", "f", "f", "f", "c",
"c", "c", "f", "f", "c", "c", "f", "c", "c", "c", "c", "c", "s",
"f", "c", "c", "c", "s", "f", "c", "f", "f", "c", "c", "f", "c",
"c", "c", "f", "c", "c", "c", "c", "c", "c", "c", "f", "c", "c",
"c", "c", "f", "c", "f", "f", "s", "f", "c", "f", "f", "f", "c",
"f", "f", "f", "f", "f", "s", "c", "c", "f", "f", "c", "c", "f",
"f", "c", "c", "f", "f", "s", "f", "c", "c", "f", "f", "f", "c",
"f", "f", "f", "c", "f", "f", "f", "f", "f", "f", "c", "f", "f",
"f", "f", "c", "s", "f", "c", "f", "f", "c", "f", "f", "f", "c",
"f", "c", "c", "c", "f", "f", "f", "f", "c", "c", "c", "f", "f",
"c", "c", "f", "c", "f", "f", "c", "c", "c", "c", "f", "f", "f",
"c", "c", "c", "f", "c", "f", "c", "f", "f", "f", "c", "f", "c",
"c", "c", "c", "c", "f", "c", "c", "c", "c", "c", "c", "c", "f",
"f", "f", "c", "f", "c", "f", "f", "c", "c", "f", "f", "f", "c",
"c", "c", "f", "c", "c", "c", "c", "c", "f", "c", "f", "f", "c",
"c", "f", "c", "f", "c", "f", "c", "c", "c", "f", "c", "c", "c",
"c", "c", "c", "c", "f", "c", "c", "f", "c", "c", "f", "f", "c",
"f", "f", "s", "c", "s", "c", "f", "c", "c", "s", "c", "c", "s",
"c", "m", "c", "c", "f", "f", "f", "f", "f", "f", "s", "f", "f",
"c", "c", "f", "c", "f", "f", "f", "c", "f", "f", "f", "s", "f",
"f", "c", "f", "c", "f", "m", "c", "c", "c", "f", "s", "f", "f",
"f", "c", "s", "c", "m", "f", "c", "m", "c", "f", "c", "f", "f",
"f", "c", "m", "f", "c", "c", "f", "c", "f", "c", "c", "c", "c",
"c", "f", "f", "f", "c", "m", "f", "m", "m", "c", "c", "c", "c",
"m", "m", "c", "f", "m", "m", "m", "m", "m", "m", "m", "m", "m",
"c", "c", "f", "f", "f", "f", "c", "f", "m", "f", "f", "f", "c",
"f", "f", "f", "c", "f", "f", "c", "c", "f", "c", "f", "c", "m",
"f", "c", "f", "c", "f", "f", "f", "f", "c", "c", "f", "f", "c",
"c", "f", "f", "f", "f", "f", "f", "c", "f", "c", "c", "f", "c",
"f", "f", "f", "f", "f", "f", "f", "c", "f", "c", "f", "c", "f",
"c", "f", "c", "f", "f", "c", "c", "c", "c", "c", "f", "f", "f",
"c", "f", "c", "f", "f", "c", "c", "f", "f", "c", "f", "c", "f",
"c", "c", "c", "f", "f", "c", "f", "c", "c", "f", "c", "f", "c",
"f", "c", "f", "c", "m", "c", "c", "m", "c", "c", "f", "c", "c",
"f", "c", "c", "c", "f", "c", "c", "m", "c", "m", "m", "c", "c",
"f", "c", "c", "c", "c", "m", "c", "c", "c", "m", "m", "m", "c",
"c", "c", "c", "m", "m", "f", "m", "m", "m", "m", "m", "m", "m",
"m", "m", "m", "m", "m", "m", "m", "m"), Exhaustion_product = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L), .Label = c("(0,10]", "(10,20]", "(20,30]", "(30,40]", "(40,50]",
"(50,60]", "(60,70]", "(70,80]", "(80,90]", "(90,100]"), class = "factor"),
Type_f = c(1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0,
1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1,
1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0,
1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0,
1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1,
1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1,
1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0,
1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0,
1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1,
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0,
0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1,
1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1,
1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0)), .Names = c("Type_space", "Id", "Age",
"Type_product", "Exhaustion_product", "Type_f"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L,
68L, 69L, 70L, 71L, 73L, 75L, 76L, 79L, 80L, 81L, 82L, 84L, 85L,
86L, 91L, 102L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L,
111L, 112L, 113L, 114L, 115L, 116L, 117L, 118L, 119L, 120L, 121L,
122L, 123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L,
133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L,
144L, 145L, 146L, 147L, 148L, 149L, 150L, 151L, 152L, 153L, 154L,
155L, 156L, 157L, 158L, 159L, 160L, 161L, 162L, 163L, 164L, 165L,
166L, 167L, 168L, 169L, 170L, 171L, 172L, 173L, 174L, 175L, 176L,
177L, 178L, 179L, 180L, 181L, 182L, 183L, 184L, 185L, 186L, 187L,
188L, 189L, 190L, 191L, 192L, 193L, 194L, 195L, 197L, 198L, 199L,
201L, 202L, 203L, 204L, 206L, 207L, 208L, 209L, 210L, 212L, 213L,
214L, 215L, 217L, 218L, 219L, 220L, 221L, 222L, 223L, 225L, 227L,
229L, 230L, 231L, 232L, 233L, 234L, 235L, 236L, 237L, 238L, 239L,
242L, 243L, 244L, 246L, 247L, 248L, 249L, 250L, 251L, 253L, 254L,
256L, 259L, 260L, 261L, 262L, 263L, 264L, 265L, 266L, 269L, 270L,
272L, 273L, 274L, 276L, 277L, 278L, 279L, 280L, 281L, 282L, 283L,
284L, 285L, 287L, 289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L,
297L, 298L, 300L, 301L, 302L, 303L, 306L, 308L, 309L, 311L, 312L,
313L, 314L, 315L, 316L, 317L, 318L, 319L, 320L, 322L, 323L, 325L,
326L, 327L, 328L, 329L, 331L, 332L, 334L, 335L, 336L, 338L, 339L,
340L, 341L, 342L, 343L, 344L, 345L, 346L, 347L, 348L, 349L, 350L,
352L, 353L, 354L, 356L, 357L, 358L, 359L, 360L, 361L, 363L, 364L,
365L, 366L, 367L, 368L, 369L, 370L, 372L, 373L, 374L, 375L, 376L,
377L, 378L, 379L, 380L, 381L, 382L, 384L, 385L, 387L, 388L, 389L,
391L, 393L, 394L, 395L, 396L, 397L, 398L, 399L, 400L, 401L, 402L,
404L, 407L, 408L, 409L, 411L, 412L, 413L, 414L, 415L, 416L, 417L,
418L, 419L, 420L, 421L, 422L, 423L, 424L, 425L, 426L, 427L, 428L,
429L, 430L, 431L, 432L, 433L, 434L, 435L, 436L, 437L, 438L, 439L,
440L, 442L, 443L, 444L, 445L, 446L, 447L, 448L, 449L, 450L, 451L,
452L, 453L, 454L, 455L, 456L, 457L, 458L, 459L, 460L, 461L, 462L,
463L, 464L, 465L, 466L, 467L, 468L, 469L, 470L, 471L, 472L, 473L,
474L, 476L, 477L, 478L, 479L, 480L, 481L, 482L, 483L, 484L, 486L,
487L, 488L, 489L, 490L, 491L, 492L, 493L, 494L, 495L, 496L, 497L,
498L, 500L, 501L, 502L, 503L, 504L, 505L, 506L, 507L, 508L, 509L,
510L, 511L, 512L, 513L, 514L, 515L, 516L, 517L, 518L, 519L, 520L,
521L, 522L, 523L, 524L, 525L, 526L, 527L, 528L, 529L, 530L, 531L,
532L, 534L, 535L, 536L, 537L, 538L, 539L, 540L, 541L, 542L, 543L,
547L, 548L, 550L, 551L, 552L, 553L, 554L, 555L, 556L, 557L, 558L,
559L, 560L, 561L, 562L, 563L, 565L, 566L, 567L, 568L, 569L, 570L,
571L, 572L, 573L, 575L, 577L, 579L, 580L, 581L, 582L, 583L, 585L,
586L, 587L, 590L, 592L, 599L, 606L, 608L), class = "data.frame")
library(car)     # provides Anova()
library(emmeans)
gl <- glm(Type_f ~ Type_space + Exhaustion_product + Age, family = binomial, data = res)
an <- Anova(gl)
emmp <- emmeans(gl, pairwise ~ Exhaustion_product + Age)
summary(emmp, infer = TRUE)
(1) In the case of a categorical variable the results are clear. But in the case of Age, which is significant in the GLM, what is the value 5.455426 that emmeans generates? Is that its mean? How can I interpret this?
(0,10] 5.455426 0.36901411 0.2935894 Inf -0.20641061 0.94443883 1.257 0.2088
(2) I want to generate a graphical representation of the interaction between Age and Exhaustion_product, but this does not make sense either:
emmip(gl, Exhaustion_product ~ Age)
Edit 1
Contrast result
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
(0,10],5.45542635658915 - (10,20],5.45542635658915 0.33231353 0.4078967 Inf -0.95814279 1.6227698 0.815 0.9984
(0,10],5.45542635658915 - (20,30],5.45542635658915 -0.53694399 0.4194460 Inf -1.86393835 0.7900504 -1.280 0.9582
(0,10],5.45542635658915 - (30,40],5.45542635658915 -0.16100309 0.4139472 Inf -1.47060101 1.1485948 -0.389 1.0000
(0,10],5.45542635658915 - (40,50],5.45542635658915 0.40113723 0.4021403 Inf -0.87110757 1.6733820 0.998 0.9925
(0,10],5.45542635658915 - (50,60],5.45542635658915 0.60576562 0.4106536 Inf -0.69341247 1.9049437 1.475 0.9022
(0,10],5.45542635658915 - (60,70],5.45542635658915 1.38800301 0.4319258 Inf 0.02152631 2.7544797 3.214 0.0430
(0,10],5.45542635658915 - (70,80],5.45542635658915 1.01677522 0.4147441 Inf -0.29534399 2.3288944 2.452 0.2952
(0,10],5.45542635658915 - (80,90],5.45542635658915 1.99085692 0.4747929 Inf 0.48876247 3.4929514 4.193 0.0011
(0,10],5.45542635658915 - (90,100],5.45542635658915 2.03923289 0.4745872 Inf 0.53778910 3.5406767 4.297 0.0007
Because this question seems like a self-learning one, I am going to work through a similar example rather than use the same data. The structure is the same: one factor and one covariate as predictors.
The example is the emmeans::fiber dataset. Its response variable is fiber strength, the continuous predictor is the diameter, and the factor is the machine it was made on.
Model:
> mod = glm(log(strength) ~ machine + diameter, data = fiber)
> summary(mod)
... (output has been abbreviated) ...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.124387 0.068374 45.695 6.74e-14
machineB 0.026025 0.023388 1.113 0.290
machineC -0.044593 0.025564 -1.744 0.109
diameter 0.023557 0.002633 8.946 2.22e-06
(Dispersion parameter for gaussian family taken to be 0.001356412)
Analysis with emmeans is based on the reference grid, which by default consists of all levels of the factor and the mean of the covariate:
> ref_grid(mod)
'emmGrid' object with variables:
machine = A, B, C
diameter = 24.133
Transformation: “log”
You can confirm in R that mean(fiber$diameter) is 24.133. I emphasize this is the mean of the diameter values, not of anything in the model.
> summary(.Last.value)
machine diameter prediction SE df
A 24.13333 3.692901 0.01670845 Inf
B 24.13333 3.718925 0.01718853 Inf
C 24.13333 3.648307 0.01819206 Inf
Results are given on the log (not the response) scale.
Those summary values are the predictions from mod at each combination of machine and diameter. Now look at the EMMs for machine:
> emmeans(mod, "machine")
machine emmean SE df asymp.LCL asymp.UCL
A 3.692901 0.01670845 Inf 3.660153 3.725649
B 3.718925 0.01718853 Inf 3.685237 3.752614
C 3.648307 0.01819206 Inf 3.612652 3.683963
Results are given on the log (not the response) scale.
Confidence level used: 0.95
... we get exactly the same three predictions. But if we look at diameter:
> emmeans(mod, "diameter")
diameter emmean SE df asymp.LCL asymp.UCL
24.13333 3.686711 0.009509334 Inf 3.668073 3.705349
Results are averaged over the levels of: machine
Results are given on the log (not the response) scale.
Confidence level used: 0.95
... we see that the EMM is equal to the average of the three predicted values in the reference grid. Note also the annotation saying that results were averaged over the levels of machine; such annotations are worth reading.
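You can verify the averaging directly from the three reference-grid predictions shown above:
> mean(c(3.692901, 3.718925, 3.648307))
[1] 3.686711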
To get a graphical representation of the model results, we can do
> emmip(mod, machine ~ diameter, cov.reduce = range)
The argument cov.reduce = range is added to make the reference grid use the minimum and maximum diameter rather than its average. Without that, we'd have gotten three dots instead of three lines. This plot still shows the model predictions, just over a more detailed grid of values. Notice that all three lines have the same slope. That is because the model was specified that way: the diameter effect is added to the machine effect. Each line thus has the common slope of 0.023557 (see the output from summary(mod) above).
There is no post hoc test needed for diameter, since its one effect is already tested in summary(mod).
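That said, if you want the diameter slope packaged as an emmeans-style object (e.g. with confidence intervals), the emtrends() function from emmeans can estimate it. A minimal sketch (output not shown):
> # slope of diameter for each machine; all three are identical here
> # because the model contains no machine:diameter interaction
> emtrends(mod, ~ machine, var = "diameter")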
One last thing. The model used log(strength) as the response. If we want the EMMs on the same scale as strength, just add type = "response":
> emmeans(mod, "machine", type = "response")
machine response SE df asymp.LCL asymp.UCL
A 40.16118 0.6710311 Inf 38.86728 41.49815
B 41.22008 0.7085126 Inf 39.85455 42.63239
C 38.40960 0.6987496 Inf 37.06421 39.80384
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Again, the annotations below the results help explain the output.
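As a quick sanity check (mine, not part of the original answer), exponentiating a log-scale EMM reproduces the response-scale value:
> exp(3.692901) # approximately 40.161, matching the machine A response above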

r geom_bar reorder layers of bars by values

I have produced a bar chart that shows cumulative totals over periods of months for various programs using the following data structure and code:
library(dplyr)
data_totals <- data_long %>%
  group_by(Period, Program) %>%
  arrange(Period, Program) %>%
  ungroup() %>%
  group_by(Program) %>%
  mutate(Running_Total = cumsum(Value))
dput(data_totals)
structure(list(Period = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L,
8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 12L, 12L, 12L, 12L, 12L), .Label = c("2018-04", "2018-05",
"2018-06", "2018-07", "2018-08", "2018-09", "2018-10", "2018-11",
"2018-12", "2019-01", "2019-02", "2019-03", "Apr-Mar 2019"), class = "factor"),
Program = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L), .Label = c("A",
"B", "C", "D",
"E"), class = "factor"), Value = c(5597,
0, 0, 0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0, 1544, 0, 0,
850, 0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0,
0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0,
1544, 0, 0, 0, 0, 1544, 0, 0, 0, 0, 1544), Running_Total = c(5597,
0, 0, 0, 1544, 5597, 0, 0, 0, 3088, 5597, 0, 0, 0, 4632,
5597, 0, 850, 0, 6176, 5597, 0, 850, 0, 7720, 5597, 0, 850,
0, 9264, 5597, 0, 850, 0, 10808, 5597, 0, 850, 0, 12352,
5597, 0, 850, 0, 13896, 5597, 0, 850, 0, 15440, 5597, 0,
850, 0, 16984, 5597, 0, 850, 0, 18528)), .Names = c("Period",
"Program", "Value", "Running_Total"), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -60L), vars = "Program", labels = structure(list(
Program = structure(1:5, .Label = c("A",
"B", "C", "D",
"E"), class = "factor")), class = "data.frame", row.names = c(NA,
-5L), vars = "Program", drop = TRUE, .Names = "Program"), indices = list(
c(0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L
), c(1L, 6L, 11L, 16L, 21L, 26L, 31L, 36L, 41L, 46L, 51L,
56L), c(2L, 7L, 12L, 17L, 22L, 27L, 32L, 37L, 42L, 47L, 52L,
57L), c(3L, 8L, 13L, 18L, 23L, 28L, 33L, 38L, 43L, 48L, 53L,
58L), c(4L, 9L, 14L, 19L, 24L, 29L, 34L, 39L, 44L, 49L, 54L,
59L)), drop = TRUE, group_sizes = c(12L, 12L, 12L, 12L, 12L
), biggest_group_size = 12L)
# reorder the groups descending so that the lowest total will be on layers from front to back
reorder(data_totals$Program, -data_totals$Running_Total)
ggplot(data = data_totals, aes(x = Period, y = Running_Total)) +
  geom_bar(aes(color = Program, group = Program, fill = Program),
           stat = "identity", position = "identity", alpha = 1.0)
It works in that it creates the graph with all the proper data, but the smaller Running_Totals are obscured by the larger ones.
I also get the following warning, even though I do not have the plyr package loaded:
Warning message:
The plyr::rename operation has created duplicates for the following name(s): (`colour`)
I can see all the Running_Totals if I set the alpha to 0.5.
Running_Total for each Program by Period, alpha = 0.5:
How can I get the layers ordered so that the smallest values are on the front-most layers, working back toward the highest values?
The way I was trying to represent the data in the original question was flawed.
There is no advantage to having the Program with the maximum value for each Period be the top of the bar.
A more illustrative solution is to have a stacked bar, with labels indicating the contribution of each Program to the overall value of each Period:
ggplot(data = data_totals[which(data_totals$Running_Total > 0), ],
       aes(x = Period, y = Running_Total, fill = Program)) +
  geom_bar(aes(color = Program, group = Program), stat = "identity",
           position = "stack", alpha = 1.0) +
  geom_text(aes(label = Running_Total), position = position_stack(vjust = 0.5))
I used [which(data_totals$Running_Total > 0),] to eliminate any "0" bars and labels.
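The same subset can be written with dplyr::filter, which some may find more readable (a sketch, assuming dplyr and ggplot2 are loaded):
library(dplyr)
library(ggplot2)
data_totals %>%
  filter(Running_Total > 0) %>% # drop the "0" bars and their labels
  ggplot(aes(x = Period, y = Running_Total, fill = Program)) +
  geom_col(position = "stack") +
  geom_text(aes(label = Running_Total), position = position_stack(vjust = 0.5))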

R: minpack.lm::nls.lm failed with good results

I use nls.lm from the minpack.lm package to fit a lot of non-linear models.
It often fails after 20 iterations because of a singular gradient matrix at the initial parameter estimates.
The problem is that when I look at the iterations before the failure (trace = TRUE), I can see the results were fine.
Reproducible example:
Data:
df <- structure(list(x1 = c(7L, 5L, 10L, 6L, 9L, 10L, 2L, 4L, 9L, 3L,
11L, 6L, 4L, 0L, 7L, 12L, 9L, 11L, 11L, 0L, 2L, 3L, 5L, 6L, 6L,
9L, 1L, 7L, 7L, 4L, 3L, 13L, 12L, 13L, 5L, 0L, 5L, 6L, 6L, 7L,
5L, 10L, 6L, 10L, 0L, 7L, 9L, 12L, 4L, 5L, 6L, 3L, 4L, 5L, 5L,
0L, 9L, 9L, 1L, 2L, 2L, 13L, 8L, 2L, 5L, 10L, 6L, 11L, 5L, 0L,
4L, 4L, 8L, 9L, 4L, 2L, 12L, 4L, 10L, 7L, 0L, 4L, 4L, 5L, 8L,
8L, 12L, 4L, 6L, 13L, 5L, 12L, 1L, 6L, 4L, 9L, 11L, 11L, 6L,
10L, 10L, 0L, 3L, 1L, 11L, 4L, 3L, 13L, 5L, 4L, 2L, 3L, 11L,
7L, 0L, 9L, 6L, 11L, 6L, 13L, 1L, 5L, 0L, 6L, 4L, 8L, 2L, 3L,
7L, 9L, 12L, 11L, 7L, 4L, 10L, 0L, 6L, 1L, 7L, 2L, 6L, 3L, 1L,
6L, 10L, 12L, 7L, 7L, 6L, 6L, 1L, 7L, 8L, 7L, 7L, 5L, 7L, 10L,
10L, 11L, 7L, 1L, 8L, 3L, 12L, 0L, 11L, 8L, 5L, 0L, 6L, 3L, 2L,
2L, 8L, 9L, 2L, 8L, 2L, 13L, 10L, 2L, 12L, 6L, 13L, 2L, 11L,
1L, 12L, 6L, 7L, 9L, 8L, 10L, 2L, 6L, 0L, 2L, 11L, 2L, 3L, 9L,
12L, 1L, 11L, 11L, 12L, 4L, 6L, 9L, 1L, 4L, 1L, 8L, 8L, 6L, 1L,
9L, 8L, 2L, 10L, 10L, 1L, 2L, 0L, 11L, 6L, 6L, 0L, 4L, 13L, 4L,
8L, 4L, 10L, 9L, 6L, 11L, 8L, 1L, 6L, 5L, 10L, 8L, 10L, 8L, 0L,
3L, 0L, 6L, 7L, 4L, 3L, 7L, 7L, 8L, 6L, 2L, 9L, 5L, 7L, 7L, 0L,
7L, 2L, 5L, 5L, 7L, 5L, 7L, 8L, 6L, 1L, 2L, 6L, 0L, 8L, 10L,
0L, 10L), x2 = c(4L, 6L, 1L, 5L, 4L, 1L, 8L, 9L, 4L, 7L, 2L,
6L, 9L, 11L, 5L, 1L, 3L, 2L, 2L, 12L, 8L, 9L, 6L, 4L, 4L, 2L,
9L, 6L, 6L, 6L, 8L, 0L, 0L, 0L, 8L, 10L, 7L, 7L, 4L, 5L, 5L,
3L, 6L, 3L, 12L, 6L, 1L, 0L, 8L, 6L, 6L, 7L, 8L, 5L, 8L, 11L,
3L, 2L, 12L, 11L, 10L, 0L, 2L, 8L, 8L, 3L, 7L, 2L, 7L, 10L, 7L,
8L, 2L, 4L, 7L, 11L, 1L, 8L, 2L, 5L, 11L, 9L, 7L, 5L, 5L, 3L,
1L, 8L, 4L, 0L, 5L, 0L, 12L, 5L, 9L, 1L, 2L, 0L, 5L, 0L, 2L,
10L, 9L, 10L, 0L, 8L, 10L, 0L, 6L, 8L, 8L, 7L, 1L, 6L, 10L, 1L,
5L, 1L, 6L, 0L, 12L, 7L, 13L, 6L, 9L, 2L, 11L, 10L, 5L, 2L, 0L,
2L, 5L, 6L, 2L, 10L, 4L, 10L, 4L, 9L, 5L, 9L, 11L, 4L, 3L, 1L,
6L, 3L, 7L, 7L, 10L, 3L, 3L, 6L, 3L, 7L, 4L, 1L, 0L, 1L, 4L,
11L, 4L, 10L, 0L, 11L, 0L, 3L, 5L, 11L, 5L, 8L, 10L, 9L, 4L,
3L, 10L, 4L, 10L, 0L, 3L, 9L, 1L, 7L, 0L, 8L, 1L, 11L, 0L, 5L,
4L, 2L, 2L, 0L, 11L, 6L, 13L, 9L, 1L, 9L, 7L, 3L, 1L, 12L, 2L,
2L, 1L, 6L, 4L, 2L, 10L, 6L, 10L, 2L, 3L, 4L, 9L, 2L, 5L, 10L,
0L, 0L, 10L, 9L, 12L, 0L, 7L, 5L, 10L, 6L, 0L, 9L, 4L, 8L, 1L,
3L, 5L, 2L, 4L, 12L, 4L, 5L, 2L, 5L, 0L, 2L, 10L, 8L, 10L, 7L,
3L, 8L, 8L, 6L, 3L, 5L, 6L, 11L, 4L, 5L, 4L, 3L, 10L, 6L, 8L,
6L, 7L, 4L, 8L, 5L, 3L, 7L, 12L, 8L, 4L, 11L, 2L, 3L, 12L, 1L
), x3 = c(1, 1, 1, 1, 3, 1, 0, 3, 3, 0, 3, 2, 3, 1, 2, 3, 2,
3, 3, 2, 0, 2, 1, 0, 0, 1, 0, 3, 3, 0, 1, 3, 2, 3, 3, 0, 2, 3,
0, 2, 0, 3, 2, 3, 2, 3, 0, 2, 2, 1, 2, 0, 2, 0, 3, 1, 2, 1, 3,
3, 2, 3, 0, 0, 3, 3, 3, 3, 2, 0, 1, 2, 0, 3, 1, 3, 3, 2, 2, 2,
1, 3, 1, 0, 3, 1, 3, 2, 0, 3, 0, 2, 3, 1, 3, 0, 3, 1, 1, 0, 2,
0, 2, 1, 1, 2, 3, 3, 1, 2, 0, 0, 2, 3, 0, 0, 1, 2, 2, 3, 3, 2,
3, 2, 3, 0, 3, 3, 2, 1, 2, 3, 2, 0, 2, 0, 0, 1, 1, 1, 1, 2, 2,
0, 3, 3, 3, 0, 3, 3, 1, 0, 1, 3, 0, 2, 1, 1, 0, 2, 1, 2, 2, 3,
2, 1, 1, 1, 0, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 3, 3, 1, 3, 3, 3,
0, 2, 2, 2, 1, 1, 1, 0, 0, 3, 2, 3, 1, 2, 1, 0, 2, 3, 3, 3, 3,
3, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 3, 2, 0, 0, 1, 1, 2, 1, 3,
1, 0, 0, 3, 3, 2, 2, 1, 2, 1, 3, 2, 3, 0, 0, 2, 3, 0, 0, 0, 1,
0, 3, 0, 2, 1, 3, 0, 3, 2, 3, 3, 0, 1, 0, 0, 3, 0, 1, 2, 1, 3,
2, 1, 3, 3, 0, 0, 1, 0, 3, 2, 1), y = c(0.03688, 0.09105, 0.16246,
0, 0.11024, 0.16246, 0.13467, 0, 0.11024, 0.0807, 0.12726, 0.03934,
0, 0.0826, 0.03688, 0.06931, 0.1378, 0.12726, 0.12726, 0.08815,
0.13467, 0.01314, 0.09105, 0.12077, 0.12077, 0.02821, 0.15134,
0.03604, 0.03604, 0.08729, 0.04035, 0.46088, 0.20987, 0.46088,
0.06672, 0.24121, 0.08948, 0.07867, 0.12077, 0.03688, 0.02276,
0.04535, 0.03934, 0.04535, 0.08815, 0.03604, 0.50771, 0.20987,
0.08569, 0.09105, 0.03934, 0.0807, 0.08569, 0.02276, 0.06672,
0.0826, 0.1378, 0.02821, 0.03943, 0.03589, 0.04813, 0.46088,
0.22346, 0.13467, 0.06672, 0.04535, 0.07867, 0.12726, 0.08948,
0.24121, 0.06983, 0.08569, 0.22346, 0.11024, 0.06983, 0.03589,
0.06931, 0.08569, 0.04589, 0.03688, 0.0826, 0, 0.06983, 0.02276,
0.06238, 0.03192, 0.06931, 0.08569, 0.12077, 0.46088, 0.02276,
0.20987, 0.03943, 0, 0, 0.50771, 0.12726, 0.1628, 0, 0.41776,
0.04589, 0.24121, 0.01314, 0.03027, 0.1628, 0.08569, 0, 0.46088,
0.09105, 0.08569, 0.13467, 0.0807, 0.12912, 0.03604, 0.24121,
0.50771, 0, 0.12912, 0.03934, 0.46088, 0.03943, 0.08948, 0.07103,
0.03934, 0, 0.22346, 0.03589, 0, 0.03688, 0.02821, 0.20987, 0.12726,
0.03688, 0.08729, 0.04589, 0.24121, 0.12077, 0.03027, 0.03688,
0.03673, 0, 0.01314, 0.02957, 0.12077, 0.04535, 0.06931, 0.03604,
0.36883, 0.07867, 0.07867, 0.03027, 0.36883, 0.03192, 0.03604,
0.36883, 0.08948, 0.03688, 0.16246, 0.41776, 0.12912, 0.03688,
0.02957, 0.1255, 0, 0.20987, 0.0826, 0.1628, 0.03192, 0.02276,
0.0826, 0, 0.04035, 0.04813, 0.03673, 0.1255, 0.1378, 0.04813,
0.1255, 0.04813, 0.46088, 0.04535, 0.03673, 0.06931, 0.07867,
0.46088, 0.13467, 0.12912, 0.02957, 0.20987, 0, 0.03688, 0.02821,
0.22346, 0.41776, 0.03589, 0.03934, 0.07103, 0.03673, 0.12912,
0.03673, 0.0807, 0.1378, 0.06931, 0.03943, 0.12726, 0.12726,
0.06931, 0.08729, 0.12077, 0.02821, 0.03027, 0.08729, 0.03027,
0.22346, 0.03192, 0.12077, 0.15134, 0.02821, 0.06238, 0.04813,
0.41776, 0.41776, 0.03027, 0.03673, 0.08815, 0.1628, 0.07867,
0, 0.24121, 0.08729, 0.46088, 0, 0.1255, 0.08569, 0.16246, 0.1378,
0, 0.12726, 0.1255, 0.03943, 0.12077, 0.02276, 0.04589, 0.06238,
0.41776, 0.22346, 0.24121, 0.04035, 0.24121, 0.07867, 0.36883,
0.08569, 0.04035, 0.03604, 0.36883, 0.06238, 0.03934, 0.03589,
0.11024, 0.02276, 0.03688, 0.36883, 0.24121, 0.03604, 0.13467,
0.09105, 0.08948, 0.03688, 0.06672, 0.03688, 0.03192, 0.07867,
0.03943, 0.13467, 0.12077, 0.0826, 0.22346, 0.04535, 0.08815,
0.16246)), .Names = c("x1", "x2", "x3", "y"), row.names = c(995L,
1416L, 281L, 1192L, 1075L, 294L, 1812L, 2235L, 1097L, 1583L,
670L, 1485L, 2199L, 2495L, 1259L, 436L, 803L, 631L, 617L, 2654L,
1813L, 2180L, 1403L, 911L, 927L, 533L, 2024L, 1517L, 1522L, 1356L,
1850L, 222L, 115L, 204L, 1974L, 2292L, 1695L, 1746L, 915L, 1283L,
1128L, 880L, 1467L, 887L, 2665L, 1532L, 267L, 155L, 1933L, 1447L,
1488L, 1609L, 1922L, 1168L, 1965L, 2479L, 813L, 550L, 2707L,
2590L, 2373L, 190L, 504L, 1810L, 2007L, 843L, 1770L, 659L, 1730L,
2246L, 1668L, 1923L, 465L, 1108L, 1663L, 2616L, 409L, 1946L,
589L, 1277L, 2493L, 2210L, 1662L, 1142L, 1331L, 735L, 430L, 1916L,
922L, 208L, 1134L, 127L, 2693L, 1213L, 2236L, 240L, 623L, 108L,
1190L, 9L, 575L, 2268L, 2171L, 2308L, 103L, 1953L, 2409L, 184L,
1437L, 1947L, 1847L, 1570L, 365L, 1550L, 2278L, 270L, 1204L,
384L, 1472L, 205L, 2694L, 1727L, 2800L, 1476L, 2229L, 453L, 2630L,
2426L, 1275L, 523L, 163L, 635L, 1287L, 1349L, 561L, 2261L, 931L,
2339L, 973L, 2113L, 1229L, 2155L, 2554L, 936L, 892L, 433L, 1560L,
697L, 1791L, 1755L, 2351L, 720L, 740L, 1558L, 674L, 1736L, 988L,
321L, 18L, 375L, 959L, 2560L, 1047L, 2429L, 119L, 2468L, 98L,
773L, 1158L, 2520L, 1216L, 1872L, 2364L, 2094L, 1035L, 826L,
2374L, 1028L, 2368L, 176L, 895L, 2090L, 399L, 1789L, 179L, 1800L,
369L, 2568L, 140L, 1207L, 1001L, 518L, 481L, 12L, 2597L, 1474L,
2749L, 2097L, 379L, 2110L, 1615L, 800L, 423L, 2733L, 626L, 662L,
421L, 1363L, 898L, 530L, 2315L, 1365L, 2331L, 468L, 768L, 900L,
2027L, 544L, 1337L, 2376L, 53L, 44L, 2338L, 2075L, 2655L, 78L,
1782L, 1231L, 2291L, 1379L, 212L, 2212L, 1032L, 1929L, 331L,
790L, 1226L, 664L, 1018L, 2735L, 916L, 1157L, 590L, 1343L, 7L,
490L, 2257L, 1853L, 2251L, 1748L, 719L, 1941L, 1885L, 1544L,
725L, 1294L, 1494L, 2601L, 1077L, 1169L, 979L, 709L, 2282L, 1526L,
1797L, 1424L, 1690L, 993L, 1979L, 1268L, 730L, 1739L, 2697L,
1842L, 952L, 2483L, 479L, 864L, 2677L, 283L), class = "data.frame")
Starting value
starting_value <- structure(c(0.177698291502873, 0.6, 0.0761564106440883, 0.05,
1.9, 1.1, 0.877181493020499, 1.9), .Names = c("F_initial_x2",
"F_decay_x2", "S_initial_x2", "S_decay_x2", "initial_x1", "decay_x1",
"initial_x3", "decay_x3"))
nlsLM failure:
library(minpack.lm)
coef(nlsLM(
  formula = y ~ (F_initial_x2 * exp(-F_decay_x2 * x2) + S_initial_x2 * exp(-S_decay_x2 * x2)) *
    (1 + initial_x1 * exp(-decay_x1 * x1)) *
    (1 + initial_x3 * exp(-decay_x3 * x3)),
  data = df,
  start = starting_value, # the starting values defined above
  lower = c(0, 0, 0, 0, 0, 0, 0, 0),
  control = nls.lm.control(maxiter = 200),
  trace = TRUE))
It. 0, RSS = 1.36145, Par. = 0.177698 0.6 0.0761564 0.05 1.9 1.1 0.877181 1.9
It. 1, RSS = 1.25401, Par. = 0.207931 0.581039 0.0769047 0.0577244 2.01947 1.22911 0.772957 5.67978
It. 2, RSS = 1.19703, Par. = 0.188978 0.604515 0.0722749 0.0792141 2.44179 1.1258 0.96305 8.67253
It. 3, RSS = 1.1969, Par. = 0.160885 0.640958 0.0990201 0.145187 3.5853 0.847158 0.961844 13.2183
It. 4, RSS = 1.19057, Par. = 0.142138 0.685678 0.11792 0.167417 4.27977 0.936981 0.959606 13.2644
It. 5, RSS = 1.19008, Par. = 0.124264 0.757088 0.136277 0.188896 4.76578 0.91274 0.955142 21.0167
It. 6, RSS = 1.18989, Par. = 0.118904 0.798296 0.141951 0.194167 4.93099 0.91529 0.952972 38.563
It. 7, RSS = 1.18987, Par. = 0.115771 0.821874 0.145398 0.197773 5.02251 0.914204 0.949906 38.563
It. 8, RSS = 1.18986, Par. = 0.113793 0.837804 0.147573 0.199943 5.07456 0.914192 0.948289 38.563
It. 9, RSS = 1.18986, Par. = 0.112458 0.848666 0.149033 0.201406 5.11024 0.914099 0.947232 38.563
It. 10, RSS = 1.18986, Par. = 0.111538 0.856282 0.150035 0.202411 5.13491 0.914051 0.946546 38.563
It. 11, RSS = 1.18986, Par. = 0.110889 0.861702 0.15074 0.203118 5.15244 0.914013 0.946076 38.563
It. 12, RSS = 1.18986, Par. = 0.110426 0.865606 0.151243 0.203623 5.16501 0.913986 0.945747 38.563
It. 13, RSS = 1.18986, Par. = 0.110092 0.868441 0.151605 0.203986 5.17412 0.913966 0.945512 38.563
It. 14, RSS = 1.18986, Par. = 0.109849 0.87051 0.151868 0.20425 5.18075 0.913952 0.945343 38.563
It. 15, RSS = 1.18985, Par. = 0.109672 0.872029 0.15206 0.204443 5.18561 0.913941 0.94522 38.563
It. 16, RSS = 1.18985, Par. = 0.109542 0.873147 0.152201 0.204585 5.18918 0.913933 0.945131 38.563
It. 17, RSS = 1.18985, Par. = 0.109446 0.873971 0.152305 0.204689 5.19181 0.913927 0.945065 38.563
Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Questions:
Does it make sense to use the best parameters found before the singular-gradient-matrix problem, i.e. the ones found at iteration 17?
If so, is there a way to fetch them? I did not succeed in saving the results when the error occurred.
I've noticed that if I reduce maxiter to a number below 17, I still get the same error, now appearing at the new last iteration, which doesn't make sense to me.
E.g., with maxiter = 10:
It. 0, RSS = 1.36145, Par. = 0.177698 0.6 0.0761564 0.05 1.9 1.1 0.877181 1.9
It. 1, RSS = 1.25401, Par. = 0.207931 0.581039 0.0769047 0.0577244 2.01947 1.22911 0.772957 5.67978
It. 2, RSS = 1.19703, Par. = 0.188978 0.604515 0.0722749 0.0792141 2.44179 1.1258 0.96305 8.67253
It. 3, RSS = 1.1969, Par. = 0.160885 0.640958 0.0990201 0.145187 3.5853 0.847158 0.961844 13.2183
It. 4, RSS = 1.19057, Par. = 0.142138 0.685678 0.11792 0.167417 4.27977 0.936981 0.959606 13.2644
It. 5, RSS = 1.19008, Par. = 0.124264 0.757088 0.136277 0.188896 4.76578 0.91274 0.955142 21.0167
It. 6, RSS = 1.18989, Par. = 0.118904 0.798296 0.141951 0.194167 4.93099 0.91529 0.952972 38.563
It. 7, RSS = 1.18987, Par. = 0.115771 0.821874 0.145398 0.197773 5.02251 0.914204 0.949906 38.563
It. 8, RSS = 1.18986, Par. = 0.113793 0.837804 0.147573 0.199943 5.07456 0.914192 0.948289 38.563
It. 9, RSS = 1.18986, Par. = 0.112458 0.848666 0.149033 0.201406 5.11024 0.914099 0.947232 38.563
It. 10, RSS = 0.12289, Par. = 0.112458 0.848666 0.149033 0.201406 5.11024 0.914099 0.947232 38.563
Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
In addition: Warning message:
In nls.lm(par = start, fn = FCT, jac = jac, control = control, lower = lower, :
lmdif: info = -1. Number of iterations has reached `maxiter' == 10.
Do you see any explanation?
The underlying problem in the question is that convergence is not being achieved. This can be resolved by transforming the decay parameters using Y = log(X+1) and then transforming them back afterwards using X = exp(Y)-1. Such transformations can beneficially modify the Jacobian. Unfortunately, finding a suitable transformation tends to be largely a matter of trial and error. (Also see Note 1.)
ix <- grep("decay", names(starting_value))
fm <- nlsLM(
  formula = y ~ (F_initial_x2 * exp(-log(F_decay_x2 + 1) * x2) +
                 S_initial_x2 * exp(-log(S_decay_x2 + 1) * x2)) *
    (1 + initial_x1 * exp(-log(decay_x1 + 1) * x1)) *
    (1 + initial_x3 * exp(-log(decay_x3 + 1) * x3)),
  data = df,
  start = replace(starting_value, ix, exp(starting_value[ix]) - 1),
  lower = c(0, 0, 0, 0, 0, 0, 0, 0),
  control = nls.lm.control(maxiter = 200),
  trace = TRUE)
giving a similar residual sum of squares but achieving convergence:
> fm
Nonlinear regression model
model: y ~ (F_initial_x2 * exp(-log(F_decay_x2 + 1) * x2) + S_initial_x2 * exp(-log(S_decay_x2 + 1) * x2)) * (1 + initial_x1 * exp(-log(decay_x1 + 1) * x1)) * (1 + initial_x3 * exp(-log(decay_x3 + 1) * x3))
data: df
F_initial_x2 F_decay_x2 S_initial_x2 S_decay_x2 initial_x1 decay_x1
1.092e-01 1.402e+00 1.526e-01 2.275e-01 5.199e+00 1.494e+00
initial_x3 decay_x3
9.449e-01 1.375e+07
residual sum-of-squares: 1.19
Number of iterations to convergence: 38
Achieved convergence tolerance: 1.49e-08
> replace(coef(fm), ix, log(coef(fm)[ix]+1))
F_initial_x2 F_decay_x2 S_initial_x2 S_decay_x2 initial_x1 decay_x1
0.1091735 0.8763253 0.1525997 0.2049852 5.1993194 0.9139096
initial_x3 decay_x3
0.9448779 16.4368001
Note 1: After some experimentation I noticed that it is enough to apply the transformation to decay_x3 only.
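For completeness, a sketch of that reduced version (only decay_x3 is reparameterized; all other names are as in the question):
ix3 <- which(names(starting_value) == "decay_x3")
fm1 <- nlsLM(
  formula = y ~ (F_initial_x2 * exp(- F_decay_x2 * x2) +
                 S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1 * exp(- decay_x1 * x1)) *
    (1 + initial_x3 * exp(- log(decay_x3 + 1) * x3)),
  data = df,
  start = replace(starting_value, ix3, exp(starting_value[ix3]) - 1),
  lower = c(0, 0, 0, 0, 0, 0, 0, 0),
  control = nls.lm.control(maxiter = 200))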
Note 2: Regarding the comment that you would like something automatic, note that a third-degree polynomial fit with lm is less likely to run into trouble and has a lower residual sum of squares -- 1.14 vs. 1.19 -- but at the expense of more parameters -- 10 vs. 8.
# lm poly fit
fm.poly <- lm(y ~ poly(x1, x2, degree = 3), df)
deviance(fm.poly) # residual sum of squares
## [1] 1.141398
length(coef(fm.poly)) # no. of coefficients
## [1] 10
# nlsLM fit transforming decay parameters
deviance(fm)
## [1] 1.189855
length(coef(fm))
## [1] 8
Note 3: Here is another model, formed by replacing the x3 part with a quadratic polynomial and dropping F_initial_x2 as it becomes redundant. It also has 8 parameters, converges, and fits the data better than the model in the question (i.e. it has a lower residual sum of squares).
fm3 <- nlsLM(
  formula = y ~ (exp(- F_decay_x2 * x2) +
                 S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1 * exp(- decay_x1 * x1)) *
    cbind(1, poly(x3, degree = 2)) %*% c(p1, p2, p3),
  data = df,
  start = c(starting_value[-c(1, 7:8)], p1 = 0, p2 = 0, p3 = 0),
  lower = c(0, 0, 0, 0, 0, 0, NA, NA),
  control = nls.lm.control(maxiter = 200),
  trace = TRUE)
giving:
> fm3
Nonlinear regression model
model: y ~ (exp(-F_decay_x2 * x2) + S_initial_x2 * exp(-S_decay_x2 * x2)) * (1 + initial_x1 * exp(-decay_x1 * x1)) * cbind(1, poly(x3, degree = 2)) %*% c(p1, p2, p3)
data: df
F_decay_x2 S_initial_x2 S_decay_x2 initial_x1 decay_x1 p1
3.51614 2.60886 0.26304 8.26244 0.81232 0.09031
p2 p3
-0.16968 0.53324
residual sum-of-squares: 1.019
Number of iterations to convergence: 20
Achieved convergence tolerance: 1.49e-08
Note 4: nlxb from the nlmrt package converges without doing anything special.
library(nlmrt)
nlxb(
formula = y ~ (F_initial_x2 * exp(- F_decay_x2 * x2) + S_initial_x2 * exp(- S_decay_x2 * x2)) *
(1 + initial_x1 * exp(- decay_x1 * x1)) *
(1 + initial_x3 * exp(- decay_x3 * x3 )),
data = df,
start = starting_value,
lower = c(0, 0, 0, 0, 0, 0, 0, 0),
control = nls.lm.control(maxiter = 200),
trace = TRUE)
giving:
residual sumsquares = 1.1899 on 280 observations
after 31 Jacobian and 33 function evaluations
name coeff SE tstat pval gradient JSingval
F_initial_x2 0.109175 NA NA NA 3.372e-11 15.1
F_decay_x2 0.876313 NA NA NA -5.94e-12 8.083
S_initial_x2 0.152598 NA NA NA 6.55e-11 2.163
S_decay_x2 0.204984 NA NA NA 4.206e-11 0.6181
initial_x1 5.19928 NA NA NA -1.191e-12 0.3601
decay_x1 0.91391 NA NA NA 6.662e-13 0.1315
initial_x3 0.944879 NA NA NA 2.736e-12 0.02247
decay_x3 33.9921 NA NA NA -1.056e-15 2.928e-15
Often when this error occurs the problem is not the code but the model being used. A singular gradient matrix at the initial parameter estimates can indicate that the model has no single unique solution, or that it is overspecified for the data at hand.
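To see how little it takes to trigger this, here is a minimal toy example (hypothetical data and names, not from the question) in which two parameters only enter as a product and are therefore not separately identifiable:
library(minpack.lm)
set.seed(1)
toy <- data.frame(x = 1:20)
toy$y <- 2 * exp(-0.3 * toy$x) + rnorm(20, sd = 0.01)
# a and b only appear as the product a * b, so their gradient columns are
# proportional and the gradient matrix can never have full rank
try(nlsLM(y ~ a * b * exp(-k * x), data = toy,
          start = list(a = 1, b = 1, k = 0.1)))
# expected: "singular gradient matrix at initial parameter estimates" from nlsModel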
To answer your questions:
Yes, that makes sense. The function nlsLM first calls nls.lm, which does the actual iterating. When nls.lm reaches the end of its iterations (either because it found a best fit or because maxiter was reached), the result is passed on to the function nlsModel. That function performs a QR decomposition of the gradient matrix, multiplied by the square root of the weights. Your gradient matrix contains a column with only zeros, so it is not of full rank. In other words, while nls.lm can carry out its iterations just fine, it is only at the next step, in nlsModel, that the problem with the gradient matrix is actually checked and discovered.
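A minimal sketch of that check (toy numbers, not the actual internals of stats:::nlsModel, but the same test): a single all-zero column makes the QR decomposition rank-deficient.
x <- 1:10
G <- cbind(exp(-0.5 * x), x * exp(-0.5 * x), 0)  # last column is all zeros
qr(G)$rank              # 2, not 3
qr(G)$rank < ncol(G)    # TRUE -> "singular gradient matrix" error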
There is a way, but it requires you to change an option within R itself, specifically the error option. By setting it to dump.frames, you get a dump of all the environments that exist at the time of the error. These are stored in a list called last.dump, and you can dig into those environments to look for the values you want.
In this case the parameters are returned by a function getPars() that resides inside the environment of the workhorse function nlsModel:
old.opt <- options(error = dump.frames)
themod <- nlsLM(
formula = y ~ (F_initial_x2 * exp(- F_decay_x2 * x2) +
S_initial_x2 * exp(- S_decay_x2 * x2)) *
(1 + initial_x1 * exp(- decay_x1 * x1)) *
(1 + initial_x3 * exp(- decay_x3 * x3 )),
data = df,
start = starting_value,
lower = c(0, 0, 0, 0, 0, 0, 0, 0),
control = nls.lm.control(maxiter = 200),
trace = TRUE)
thecoefs <- last.dump[["nlsModel(formula, mf, start, wts)"]]$getPars()
options(old.opt) # reset to the previous value.
Note that this is NOT the kind of code you want to use in a production environment or share with colleagues. And it is not a solution to your problem either, because the problem is the model, not the code.
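As a sketch of a less intrusive variant (my own suggestion, under the assumption that the nlsModel frame already defines getPars() when the error is raised): a calling handler runs before the call stack unwinds, so the frame can be inspected without touching the global error option.
pars_at_error <- NULL
fit <- tryCatch(
  withCallingHandlers(
    nlsLM(formula = y ~ (F_initial_x2 * exp(- F_decay_x2 * x2) +
                         S_initial_x2 * exp(- S_decay_x2 * x2)) *
            (1 + initial_x1 * exp(- decay_x1 * x1)) *
            (1 + initial_x3 * exp(- decay_x3 * x3)),
          data = df, start = starting_value, lower = rep(0, 8),
          control = nls.lm.control(maxiter = 200)),
    error = function(e) {
      for (fr in sys.frames())  # the frames are still alive at this point
        if (exists("getPars", envir = fr, inherits = FALSE))
          pars_at_error <<- get("getPars", envir = fr)()
    }),
  error = function(e) NULL)     # swallow the error once the handler has run
pars_at_error                   # parameter values at the moment of failure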
This is another consequence of what I explained in 1: nls.lm iterates until maxiter is reached, and the gradient matrix is only checked afterwards in nlsModel, so the error always shows up at whatever the last iteration happens to be. So yes, that is logical.
I've done a very brief test to see whether it really is the model, and if I replace the last parameter (decay_x3) by its start value, the model fits without problems. I don't know what we're dealing with here, so dropping another parameter might make more sense in the real world, but just to prove that your code is fine:
themod <- nlsLM(
formula = y ~ (F_initial_x2 * exp(- F_decay_x2 * x2) +
S_initial_x2 * exp(- S_decay_x2 * x2)) *
(1 + initial_x1 * exp(- decay_x1 * x1)) *
(1 + initial_x3 * exp(- 1.9 * x3)),
data = df,
start = starting_value[-8],
lower = c(0, 0, 0, 0, 0, 0, 0, 0)[-8],
control = nls.lm.control(maxiter = 200),
trace = TRUE)
exits without errors at iteration 10.
EDIT:
I've been looking a bit deeper into this, and based on the data the "solution" the algorithm finds is basically to kick x3 out of the model. x3 contains only 3 unique values, and the estimate for decay_x3 ends up at about 38. So:
> exp(-38*c(1,2,3)) < .Machine$double.eps
[1] TRUE TRUE TRUE
If you compare that to the actual y values, it's clear that initial_x3 * exp(- decay_x3 * x3) contributes nothing to the model: it is practically 0 for every observation.
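A quick sanity check along those lines (a sketch, assuming df from the question and x3 taking the values 1, 2 and 3):
length(unique(df$x3))                  # only 3 distinct x3 values
unique(1 + 0.9449 * exp(-38 * df$x3))  # exactly 1 in double precision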
If you manually calculate the gradient as done in nlsModel, you get a matrix that is not of full rank; the last column contains only zeros:
theenv <- list2env(c(df, thecoefs))  # data plus the recovered parameter values
# form holds the model formula from the question; form[[3]] is its right-hand side
thederiv <- numericDeriv(form[[3]], names(starting_value), theenv)
thegrad <- attr(thederiv, "gradient")  # the decay_x3 column is all zeros
And that's what gives you the error. The model is overspecified for the data you have.
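You can confirm the rank deficiency directly on the objects built above (a quick check, not part of the original answer):
colSums(abs(thegrad))  # the decay_x3 column sums to 0
qr(thegrad)$rank       # 7, one short of the 8 parameters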
The log transformation Gabor suggests prevents your last estimate from becoming so big that it forces x3 out of the model. With the log transformation, the algorithm cannot jump to such extreme values as easily. To specify the same model as the original one, his decay_x3 would have to be as high as 3.2e16 (i.e. exp(38)). So the log transformation protects you from estimates that force the influence of a variable to 0.
Another side effect of the log transformation is that big steps in the value of decay_x3 have only a moderate effect on the model. The estimate Gabor finds is already a whopping 1.3e7, but after the back-transformation that is still a manageable value of about 16 for decay_x3. That still makes x3 redundant in the model, as you can see from:
> exp(-16*c(1,2,3))
[1] 1.125352e-07 1.266417e-14 1.425164e-21
But it no longer causes the singularity that produced your error.
You can also avoid this by setting upper boundaries, e.g.:
themod <- nlsLM(
formula = y ~ (F_initial_x2 * exp(- F_decay_x2 * x2) +
S_initial_x2 * exp(- S_decay_x2 * x2)) *
(1 + initial_x1 * exp(- decay_x1 * x1)) *
(1 + initial_x3 * exp(- decay_x3 * x3 )),
data = df,
start = starting_value,
lower = c(0, 0, 0, 0, 0, 0, 0, 0),
upper = rep(- log(.Machine$double.eps^0.5),8),
control = nls.lm.control(maxiter = 200),
trace = TRUE)
runs perfectly fine, gives you the same estimates, and again concludes that x3 is redundant.
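For reference, that particular upper bound is simply the decay rate at which exp(-decay * x) drops to the square root of the machine precision at x = 1; it is one reasonable cap, not the only possible one:
-log(.Machine$double.eps^0.5)  # ~ 18.03
exp(-18.03)                    # ~ 1.49e-08, still distinguishable from 0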
So whichever way you look at it, x3 has no impact on y; your model is overspecified and cannot be fitted decently with the data at hand.
