I have data produced by a particular function (defined in R below),
where t0 = 1, and alpha, q, gamma, C and beta are unknown parameters.
The question is: how do I fit this function to the following data in R?
mydata<-structure(list(x = 1:100, y = c(0, 0, 2, 1, 3, 4, 4, 3, 7, 8,
9, 11, 12, 11, 15, 15, 17, 21, 49, 43, 117, 75, 85, 97, 113,
129, 135, 147, 149, 149, 123, 129, 127, 122, 143, 157, 144, 139,
123, 117, 141, 138, 124, 134, 158, 151, 136, 133, 121, 117, 122,
125, 117, 111, 98, 94, 92, 89, 73, 87, 91, 88, 94, 90, 93, 76,
60, 96, 71, 80, 71, 63, 65, 47, 74, 63, 78, 68, 55, 48, 51, 45,
48, 50, 71, 48, 35, 51, 69, 62, 64, 66, 51, 59, 58, 34, 57, 56,
63, 50)), class = "data.frame", row.names = c(NA, -100L))
I defined the function as follows:
t0 <- 1
fyy <- function(t, cc0, alpha0, qq0, beta0, gamma0) {
  cc0 * (t - t0)^alpha0 / (1 + (qq0 - 1) * beta0 * (t - t0)^gamma0)^(1 / (qq0 - 1))
}
but I don't know how to continue.
As @mhovd suggested, I tried the "nls" function, but I got the following error:
> fit <- nls(y~fyy(x,cc0 ,alpha0 ,beta0 ,gamma0 ,qq0 ),
data=data.frame(mydata), start=list(cc0 = .01,alpha0 =1,beta0 =.3,gamma0
= 2,qq0 = 1))
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
In the comments, @masoud references a paper about the specific function in the question. The paper suggests fixing gamma0 and qq0; if we do that, nls converges to a solution, shown as fm in red in the plot. We also show an alternative parametric curve, fm2, in blue. It likewise has 3 optimized parameters but a lower residual sum of squares (lower is better).
fyy <- function(t,cc0,alpha0,qq0,beta0,gamma0){
cc0 * ((t-t0)^alpha0) / (((1+(qq0-1)*beta0*(t-t0)^gamma0))^(1/(qq0-1)))
}
mydata0 <- subset(mydata, y > 0)
# fixed values
t0 <- 1
gamma0 <- 3
qq0 <- 1.2
st <- list(cc0 = 1, alpha0 = 1, beta0 = 1) # starting values
fm <- nls(y ~ fyy(x, cc0, alpha0, qq0, beta0, gamma0), mydata0,
lower = list(cc0 = 0.1, alpha0 = 0.1, beta0 = 0.00001),
start = st, algorithm = "port")
deviance(fm) # residual sum of squares
## [1] 61458.5
st2 <- list(a = 1, b = 1, c = 1)
fm2 <- nls(y ~ exp(a + b/x + c*log(x)), mydata0, start = st2)
deviance(fm2) # residual sum of squares
## [1] 16669.24
plot(mydata0, ylab = "y", xlab = "t")
lines(fitted(fm) ~ x, mydata0, col = "red")
lines(fitted(fm2) ~ x, mydata0, col = "blue")
legend("topright", legend = c("fm", "fm2"), lty = 1, col = c("red", "blue"))
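One likely contributor to the error in the question, apart from the starting values, is argument order. fyy is declared as (t, cc0, alpha0, qq0, beta0, gamma0), but the original call passes beta0, gamma0, qq0 positionally, so beta0 = 0.3 lands in the qq0 slot. The base of the power then becomes negative, and a fractional power of a negative number is NaN in R. A minimal sketch using the question's starting values (qq0 = 1.2 in the named call is only an illustrative value, borrowed from the fixed value above):

```r
t0 <- 1
fyy <- function(t, cc0, alpha0, qq0, beta0, gamma0) {
  cc0 * (t - t0)^alpha0 / (1 + (qq0 - 1) * beta0 * (t - t0)^gamma0)^(1 / (qq0 - 1))
}

# Positional call as in the question: 0.3, 2, 1 are meant as beta0, gamma0, qq0
# but are matched to the qq0, beta0, gamma0 slots.
bad <- fyy(1:5, 0.01, 1, 0.3, 2, 1)

# Same values passed by name, with qq0 moved off 1 (illustrative value).
good <- fyy(1:5, cc0 = 0.01, alpha0 = 1, beta0 = 0.3, gamma0 = 2, qq0 = 1.2)

print(bad)   # NaN from t = 2 onwards
print(good)  # all finite
```

Passing the parameters by name in the nls formula avoids this kind of mismatch.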
Related
I have this dataframe:
df <- structure(list(a = c(2, 5, 90, 77, 56, 65, 85, 75, 12, 24, 52,
32), b = c(45, 78, 98, 55, 63, 12, 23, 38, 75, 68, 99, 73), c = c(77,
85, 3, 22, 4, 69, 86, 39, 78, 36, 96, 11), d = c(52, 68, 4, 25,
79, 120, 97, 20, 7, 19, 37, 67), e = c(14, 73, 91, 87, 94, 38,
1, 685, 47, 102, 666, 74)), class = "data.frame", row.names = c(NA,
-12L))
and the script:
R <- Map(`+`, list(1:3), 0:3)
df_cum <- as.matrix(rep(NA, ncol(df)))
for (r in seq(R)) {
for (f in seq(ncol(df))) {
df_cum <- sapply(df[R[[r]],], function(x) (cumprod(1 + x) - 1)*100)
}
}
I want to set all the first-row values to "0" in each window of the loop (rows 1:3, 2:4, 3:5, ...), before running
df_cum <- sapply(df[R[[r]],], function(x) (cumprod(1 + x) - 1)*100)
That is, for the first cycle, 1:3 (df rows), the first-row values should change from "2, 45, 77, 52, 14" to "0, 0, 0, 0, 0".
How can I do this?
Thanks
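One possible approach, sketched on a shortened stand-in for df (only two columns and four rows, to keep it small): copy each window of rows, overwrite its first row with 0, then apply the cumulative product. The names windows and df_cum_list are illustrative. Results are collected in a list, because assigning to a single df_cum inside the loop keeps only the last window.

```r
# Shortened stand-in for the data frame in the question
df <- data.frame(a = c(2, 5, 90, 77), b = c(45, 78, 98, 55))

# Row windows 1:3 and 2:4, built the same way as R in the question
windows <- Map(`+`, list(1:3), 0:1)

df_cum_list <- lapply(windows, function(rows) {
  w <- df[rows, ]
  w[1, ] <- 0  # set every column of the window's first row to 0
  sapply(w, function(x) (cumprod(1 + x) - 1) * 100)
})

df_cum_list[[1]]  # matrix for rows 1:3, first row all zeros
```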
I am trying to create a triangular (ternary) plot whose three dimensions represent three plant strategies.
One dimension represents strategy "C" (competitive plants), the second "S" (stress-tolerant plants) and the third "R" (ruderal plants); the points on the plot represent the plant species.
I want to write each species name outside the triangle and connect it to its point inside the triangle with an arrow. How do I draw this ternary plot?
The following is the data structure and my code:
require(Ternary)
TernaryPlot()
#Plot two stylised plots side by side, and plot data
par(mfrow=c(1, 1), mar=rep(0.3, 4))
TernaryPlot(atip='C%', btip='R%', ctip='S%',
point='UP', lab.cex=0.8, grid.minor.lines=0,
grid.lty='solid', col='#FFFFFF', grid.col='GREY',
axis.col=rgb(0.1, 0.1, 0.1), ticks.col=rgb(0.1, 0.1, 0.1),
padding=0.08)
data_points <- list("Bromus dantonia" = c(47, 59, 149),
"Calamagrosis psoudo phragmatis" = c(90, 102, 63),
"Carex diluta" = c(109, 64, 82),
"Carex divisa" = c(96, 99, 59),
"Carex pseudocyperus" = c(130, 71, 54),
"Carex stenophylla" = c(97, 98, 59),
"Catabrosa aquatica" = c(100, 5, 150),
"Centaurea iberica" = c(124, 85, 46),
"Cirsium hygrophilum" = c(158, 42, 55),
"Cladium mariscus" = c(159, 96, 0),
"cod2" = c(54, 82, 119),
"Cynodon dactylon" = c(121, 54, 80),
"Eleocharis palustri" = c(124, 100, 31),
"Epilobium parviflorum" = c(67, 80, 107),
"Eromopoa persica" = c(83, 15, 157),
"Funaria cf.microstoma" = c(8, 0, 247),
"Glaux maritime" = c(4, 196, 55),
"Hordeum brevisubulatum" = c(76, 70, 109),
"Hordeum glaucum" = c(40, 79, 136),
"Inula britannica" = c(95, 108, 51),
"Juncus articulatus" = c(107, 79, 69),
"Blysmus compressus" = c(81, 127, 47),
"Juncusinflexus"= c(149, 106, 0),
"Medicago polymorpha" = c(60, 86, 109),
"Mentha spicata" = c(150, 23, 82),
"Ononis spinosa" = c(66, 112, 77),
"Phragmites australis" = c(234, 0, 21),
"Plantago amplexicaulis" = c(108, 83, 64),
"Poa trivialis" = c(90, 28, 138),
"Polygonum paronychioides" = c(20, 12, 223),
"Potentila reptans" = c(106, 41, 108),
"Potentilla anserina" = c(105, 58, 91),
"Ranunculus grandiflorus" = c(129, 25, 101),
"Schoenus nigricans" = c(143, 91, 21),
"Setaria viridis" = c(10, 7, 238),
"Sonchus oleraceus" = c(178, 0, 77),
"Taraxacum officinale" = c(117, 28, 110),
"Trifolium repens" = c(94, 4, 157),
"Triglochin martima" = c(63, 96, 95),
"Veronica anagallis-aquatica" = c(55, 37, 163)
)
AddToTernary(points, data_points, pch=21, cex=1.2,
bg=vapply(data_points,
function (x) rgb(x[1], x[2], x[3], 128,
maxColorValue=255),
character(1))
)
AddToTernary(text, data_points, names(data_points), cex=0.8, font=1)
Having trouble fitting an appropriate curve to this data.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41, 51, 61, 71,
81, 91, 110, 210, 310, 410, 510, 610, 710, 810, 910, 1100, 2100,
3100, 4100, 5100, 6100, 7100, 8100, 9100)
y <- c(75, 84, 85, 89, 88, 91, 92, 92, 93, 92, 94, 95, 95, 96, 95,
95, 94, 97, 97, 97, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99)
Tried so far:
fit1 <- lm(y~log(x)+I(1/x))
fit2 <- lm(y~log(x)+I(1/x)+x)
plot(x,y, log="x")
lines(0.01:10000, predict(fit1, newdata = data.frame(x=0.01:10000)))
lines(0.01:10000, predict(fit2, newdata = data.frame(x=0.01:10000)), col='red')
The fits are OK, but I arrived at them entirely empirically and there is room for improvement. I did not find loess or splines to be any better.
The concrete goal is to increase the R^2 of the fit and improve regression diagnostics (e.g. Q-Q plots of residuals).
Edit: Expected Model: this is sampling data, where more samples (x) improve the accuracy of the estimate (y); it would saturate at 100%.
This would be my guess for the function, with the corresponding fit in Python:
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as so
def f( x, a, b , s, p ):
return a + b * s * ( x - 1 ) / ( 1 + ( s * ( x - 1 ) )**( abs( 1 / p ) ) )**abs( p )
def g( x, a , s, p ):
return a * s * x / ( 1 + ( s * x )**( abs( 1 / p ) ) )**abs( p )
def h( x, s, p ):
return 100 * s * x / ( 1 + ( s * x )**( abs( 1 / p ) ) )**abs( p )
xData = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41, 51, 61, 71,
81, 91, 110, 210, 310, 410, 510, 610, 710, 810, 910, 1100, 2100,
3100, 4100, 5100, 6100, 7100, 8100, 9100 ]
yData = [ 75, 84, 85, 89, 88, 91, 92, 92, 93, 92, 94, 95, 95, 96, 95,
95, 94, 97, 97, 97, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99,
99, 99, 99, 99, 99, 99 ]
xList = np.logspace( 0, 5, 100 )
bestFitF, err = so.curve_fit( f , xData, yData, p0=[ 75, 25, 1, 1])
bestFitG, err = so.curve_fit( g , xData, yData)
bestFitH, err = so.curve_fit( h , xData, yData)
fList = np.fromiter( ( f(x, *bestFitF ) for x in xList ), float)  # np.float was removed in NumPy 1.24
gList = np.fromiter( ( g(x, *bestFitG ) for x in xList ), float)
hList = np.fromiter( ( h(x, *bestFitH ) for x in xList ), float)
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( xData, yData, marker='o', linestyle='')
ax.plot( xList, fList, linestyle='-.', label='f')
ax.plot( xList, gList, linestyle='-.', label='g')
ax.plot( xList, hList, linestyle='-.', label='h')
ax.set_xscale( 'log' )
ax.legend( loc=0 )
plt.show()
Function f requires start values; g and h do not. It should be possible to write some code to guess the parameters: the first one is yData[0], the second is yData[-1] - yData[0], and the others do not matter and are just set to 1. Here I set them manually.
Both g and h pass through ( 0, 0 ).
Additionally, h saturates at 100.
Note: of course, more parameters give a better fit, but if this is, e.g., a CDF, you probably want a fixed saturation value and maybe the pass through ( 0, 0 ) as well.
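The two properties claimed for h can be checked numerically. The sketch below is an R translation of the Python h (R being the language of the question); the values of s and p are arbitrary illustrative choices, not fitted values.

```r
# R translation of the Python function h(x, s, p)
h <- function(x, s, p) {
  100 * s * x / (1 + (s * x)^abs(1 / p))^abs(p)
}

at_zero  <- h(0,    s = 0.5, p = 2)  # passes through (0, 0)
at_large <- h(1e12, s = 0.5, p = 2)  # approaches the saturation value 100

print(c(at_zero, at_large))
```

For large x the denominator behaves like s * x, so the ratio tends to 1 and h tends to 100 whatever the values of s and p.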
This might be an acceptable fit to the Gunary equation, with an R-squared value of 0.976:
y = x / (a + bx + cx^0.5)
Fitting target of lowest sum of squared absolute error = 2.4509677507601545E+01
a = 1.2327255760994933E-03
b = 1.0083740273268828E-02
c = 1.9179200839782879E-03
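To check these coefficients in R, one can evaluate the Gunary curve on the question's data and compute the residual sum of squares and R-squared directly. This is only a sketch of that check; it does not refit the model, and the function name gunary is illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41, 51, 61, 71,
       81, 91, 110, 210, 310, 410, 510, 610, 710, 810, 910, 1100, 2100,
       3100, 4100, 5100, 6100, 7100, 8100, 9100)
y <- c(75, 84, 85, 89, 88, 91, 92, 92, 93, 92, 94, 95, 95, 96, 95,
       95, 94, 97, 97, 97, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99,
       99, 99, 99, 99, 99, 99)

# Gunary equation: y = x / (a + b*x + c*sqrt(x))
gunary <- function(x, a, b, c) x / (a + b * x + c * sqrt(x))

# Coefficients as reported in the answer
pred <- gunary(x,
               a = 1.2327255760994933e-03,
               b = 1.0083740273268828e-02,
               c = 1.9179200839782879e-03)

rss <- sum((y - pred)^2)                   # residual sum of squares
r2  <- 1 - rss / sum((y - mean(y))^2)      # R-squared
print(c(rss = rss, r2 = r2))
```

If a refit is wanted, the same coefficients could serve as starting values for nls.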
R package drc has many options.
Here is a 5-parameter log-logistic model, which yields residuals lower than the fits in the question.
BONUS: It has a self-starter function, so you avoid the challenge of finding initial values for non-linear regression.
library(drc)
dosefit <- drm(y ~ x, fct = LL2.5())
I have a set of data that I have collected which consists of a time series, where each y-value is found by taking the mean of 30 samples of grape cluster weight.
I want to simulate more data from this, with the same number of x and y values, so that I can carry out some Bayesian analysis to find the posterior distribution of the data.
I have the data, and I know that the growth follows a Gompertz curve with formula:
y = a*exp(-exp(-(x-x0)/b)), with a = 88.8, b = 11.7, and x0 = 15.1.
The data I have is
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165)
Any help would be appreciated, thank you.
(Will edit when more information is given.)
I am a little confused by your question. I have put what you wrote into R. Please elaborate so that I can help you:
gompertz <- function(x, x0, a, b){
a*exp(-exp(-(x-x0)/b))
}
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165) # means of 30 samples of grape cluster weights?
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112) # ?
#??
gompertz(x, x0 = 15.1, a = 88.8, b = 11.7)
gompertz(y, x0 = 15.1, a = 88.8, b = 11.7)
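One possible reading of the question is "draw new y-values scattered around the Gompertz mean curve at the same x-values". A minimal sketch under that reading, assuming additive Gaussian noise; the noise level sigma and the seed are hypothetical choices, not values from the question.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

gompertz <- function(x, x0, a, b) {
  a * exp(-exp(-(x - x0) / b))
}

x <- c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)

# Mean curve with the parameter values stated in the question
mu <- gompertz(x, x0 = 15.1, a = 88.8, b = 11.7)

sigma <- 10  # hypothetical noise standard deviation
y_sim <- rnorm(length(x), mean = mu, sd = sigma)

print(round(cbind(x, mu, y_sim), 1))
```

Note that a Gompertz curve never exceeds a, so with a = 88.8 the mean stays below 88.8, whereas the observed y reaches 177; the stated parameters may need rechecking before they are used for simulation.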
I am struggling with some strange behaviour in R, with the quantile function.
I have two sets of numeric data, and a custom boxplot stats function (which someone helped me write, so I am actually not too sure about every detail):
sample_lang = c(91, 122, 65, 90, 90, 102,
98, 94, 84, 86, 108, 104,
94, 110, 100, 86, 92, 92,
124, 108, 82, 65, 102, 90, 114,
88, 68, 112, 96, 84, 92,
80, 104, 114, 112, 108, 68,
92, 68, 63, 112, 116)
sample_vocab = c(96, 136, 81, 92, 95,
112, 101, 95, 97, 94,
117, 95, 111, 115, 88,
92, 108, 81, 130, 106,
91, 95, 119, 103, 132, 103,
65, 114, 107, 108, 86,
100, 98, 111, 123, 123, 117,
82, 100, 97, 89, 132, 114)
my.boxplot.stats <- function (x, coef = 1.5, do.conf = TRUE, do.out = TRUE) {
if (coef < 0)
stop("'coef' must not be negative")
nna <- !is.na(x)
n <- sum(nna)
#stats <- stats::fivenum(x, na.rm = TRUE)
stats <- quantile(x, probs = c(0.15, 0.25, 0.5, 0.75, 0.85), na.rm = TRUE)
iqr <- diff(stats[c(2, 4)])
if (coef == 0)
do.out <- FALSE
else {
out <- if (!is.na(iqr)) {
x < (stats[2L] - coef * iqr) | x > (stats[4L] + coef *
iqr)
}
else !is.finite(x)
if (any(out[nna], na.rm = TRUE))
stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
}
conf <- if (do.conf)
stats[3L] + c(-1.58, 1.58) * iqr/sqrt(n)
list(stats = stats, n = n, conf = conf, out = if (do.out) x[out &
nna] else numeric())
}
However, when I call quantile and my.boxplot.stats on the same set of data, I am getting different quantile results for the sample_vocab data (but it appears consistent with the sample_lang data), and I am not sure why:
> quantile(sample_vocab, probs = c(0.15, 0.25, 0.5, 0.75, 0.85), na.rm=TRUE)
15% 25% 50% 75% 85%
89.6 94.5 101.0 114.0 118.4
>
> my.boxplot.stats(sample_vocab)
$stats
15% 25% 50% 75% 85%
81.0 94.5 101.0 114.0 136.0
Could someone help me understand what is happening? Please note, I am reasonably experienced with programming, but have no formal training in R, I am learning on my own.
Thanks so much in advance!
The relevant bit of code is right here:
if (coef == 0)
do.out <- FALSE
else {
out <- if (!is.na(iqr)) {
x < (stats[2L] - coef * iqr) | x > (stats[4L] + coef *
iqr)
}
else !is.finite(x)
if (any(out[nna], na.rm = TRUE))
stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
}
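Basically, if coef != 0 (in your case coef is 1.5, the default function parameter), the function flags as outliers all data points more than coef * iqr below the 25% quantile or above the 75% quantile, where iqr is the distance between those two quantiles. If at least one outlier exists, the first and last elements of the reported quantiles are replaced with the minimum and maximum of the remaining (non-outlier) data. In sample_vocab the value 65 lies below the lower fence, 94.5 - 1.5 * 19.5 = 65.25, so the 15% and 85% entries become 81 and 136. sample_lang has no outliers, so its quantiles are left unchanged.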
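The replacement can be reproduced outside the function. This sketch repeats the relevant steps of my.boxplot.stats on sample_vocab; the variable names lower and upper are illustrative.

```r
sample_vocab <- c(96, 136, 81, 92, 95, 112, 101, 95, 97, 94,
                  117, 95, 111, 115, 88, 92, 108, 81, 130, 106,
                  91, 95, 119, 103, 132, 103, 65, 114, 107, 108, 86,
                  100, 98, 111, 123, 123, 117, 82, 100, 97, 89, 132, 114)

q <- quantile(sample_vocab, probs = c(0.15, 0.25, 0.5, 0.75, 0.85))
iqr <- unname(q[4] - q[2])            # distance between 25% and 75% quantiles

lower <- unname(q[2]) - 1.5 * iqr     # lower fence
upper <- unname(q[4]) + 1.5 * iqr     # upper fence
out <- sample_vocab < lower | sample_vocab > upper

sample_vocab[out]          # 65
range(sample_vocab[!out])  # 81 136
```

These two range values are exactly what my.boxplot.stats reports in place of the 15% and 85% quantiles.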