I had an old function that worked like a charm:
lm_eqn = function(m) {
l <- list(a = format(coef(m)[1], digits = 2),
b = format(abs(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- substitute(italic(C)[i] == a + b %.% italic(I)[i]*","~~italic(r)^2~"="~r2,l)
as.character(as.expression(eq));
}
where m was an lm model. This would produce an equation like the following:
y = 0.3 + 4.4x, r = 0.67
which could then be used in a ggplot to show the model formula with its graph. The problem is that the same equation now includes unwanted symbols:
y = c(0.3) + c(4.4)x, r=0.67
A c() call now wraps each value taken from the list I am building, and I don't know why. Does anyone know how to
a) prevent this, or
b) correct it?
Note: the problem seems to emerge in substitution, the output of eq is:
"italic(y) == c(`(Intercept)` = \"0.3\") + c(x = \"4.4\") %.% italic(x) * \",\" ~ ~italic(r)^2 ~ \"=\" ~ \"0.67\""
It looks like substitute's output includes the c() for the intercept and slope.
edit
m in this case is a generic lm element. For example
x <- c(5,3,6,8,2,6)
y <- c(2,6,3,7,4,9)
test.lm <- lm(y~x)
lm_eqn(test.lm)
[1] "italic(C)[i] == c(`(Intercept)` = \"3.3\") + c(x = \"0.37\") %.% italic(I)[i] * \",\" ~ ~italic(r)^2 ~ \"=\" ~ \"0.0969\""
You apparently need to unname the coef() values:
lm_eqn = function(m) {
l <- list(a = format(unname(coef(m))[1], digits = 2),
b = format(abs(unname(coef(m))[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- bquote( italic(C)[i] == .(l$a) + .(l$b) %.% italic(I)[i]*","~~italic(r)^2~"="~.(l$r2))
as.character(as.expression(eq));
}
I also think you need to clarify exactly what you are hoping to see. At the moment you are creating an expression vector with two elements and then you are converting that to a character. The fact that ggplot requires character values for its "expressions" makes it quite difficult to look at a character value and figure out what will be displayed, so you should probably expand your test code to include the manner in which this value will be delivered. (It's much easier to look at a real R expression.) I think there are mechanisms that allow unevaluated expressions to be passed to ggplot annotations and titles, but they seem incredibly convoluted to my eyes.
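A minimal sketch of why the c() shows up, assuming a recent R version in which deparsing keeps names: format() preserves the names carried by coef(m), so substitute() splices a named character vector into the call, and a named vector can only be written out as c(name = value). The names a_named, with_names and without_names are made up for this illustration.

```r
# Small model, same data as the test.lm example in the question
x <- c(5, 3, 6, 8, 2, 6)
y <- c(2, 6, 3, 7, 4, 9)
m <- lm(y ~ x)

# format() keeps the "(Intercept)" name that coef(m)[1] carries
a_named <- format(coef(m)[1], digits = 2)
print(names(a_named))

# Splicing the named vector: the name forces a c(name = value) form
with_names <- as.character(as.expression(
  substitute(italic(y) == a, list(a = a_named))))

# Splicing the unnamed value: it is inserted as a plain string constant
without_names <- as.character(as.expression(
  substitute(italic(y) == a, list(a = unname(a_named)))))

print(with_names)
print(without_names)
```

Comparing the two printed strings shows that the only difference is the name attribute, which is what unname() removes.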
Could also use substitute which requires specifying a list that has named elements.
lm_eqn = function(m) {
l <- list(a = format(unname(coef(m))[1], digits = 2),
b = format(abs(unname(coef(m))[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
eq <- substitute( italic(C)[i] == a + b %.% italic(I)[i]*","~~italic(r)^2 == r2, env=l)
as.character(as.expression(eq));
}
lm_eqn(test.lm)
[1] "italic(C)[i] == \"3.3\" + \"0.37\" %.% italic(I)[i] * \",\" ~ ~italic(r)^2 == \"0.0969\""
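Since the result reaches ggplot as a string, one way to inspect it is to parse the string back into an expression, which is roughly what a parsed label has to go through. This is a base R sketch only, not part of the answer above; label_string and label_expr are made-up names, and the commented-out text() call is just one possible way to preview the plotmath rendering.

```r
x <- c(5, 3, 6, 8, 2, 6)
y <- c(2, 6, 3, 7, 4, 9)
test.lm <- lm(y ~ x)

lm_eqn <- function(m) {
  # unname() keeps the c(name = value) wrapper out of the label
  l <- list(a = format(unname(coef(m))[1], digits = 2),
            b = format(abs(unname(coef(m))[2]), digits = 2),
            r2 = format(summary(m)$r.squared, digits = 3))
  eq <- substitute(italic(C)[i] == a + b %.% italic(I)[i] * "," ~~ italic(r)^2 == r2, l)
  as.character(as.expression(eq))
}

label_string <- lm_eqn(test.lm)

# Parse the character label back into an R expression
label_expr <- parse(text = label_string)
print(label_string)
print(class(label_expr))

# Optional preview with base graphics:
# plot.new(); text(0.5, 0.5, label_expr)
```

If the string does not parse here, it is unlikely to work as a parsed label in a plot either.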
I have a function R^5 -> R, and I am interested in its minimum. There are plenty of functions in R like optim, optimize or fminbnd in the R package pracma. But they just accept one argument and I don't understand the help page.
mindisturbed <- function(a,d1,d2,d3,p){
sum((data^(- a) * (d1 + d2*cos(log(data)*2*pi/p) + d3 *
sin(log(data)*2*pi/p)) - log(j))^2)
}
The "data" and "j" variables are defined in my global environment. They are vectors of length k. The arguments of the function are all numeric values of length 1. The function is a residual sum of squares.
So does anyone know how to minimize this function with respect to all of its arguments?
Assuming data and j are vectors of the same length try the following. You may or may not need better starting values.
1) Use optim like this
st <- c(a = 1, d1 = 1, d2 = 1, d3 = 1, p = 1)
f <- function(x) mindisturbed(x[1], x[2], x[3], x[4], x[5])
optim(st, f)
2) or nls with default algorithm where st is from (1)
fo <- log(j) ~ data^(- a) * (d1 + d2*cos(log(data)*2*pi/p) + d3 *
sin(log(data)*2*pi/p))
nls(fo, start = st)
3) or nls with plinear algorithm. In that case the RHS of the formula is a matrix with column names d1, d2 and d3 such that first column multiplies d1, second d2 and third d3. Only the nonlinear parameters, i.e. a and p, are specified in start.
fo2 <- log(j) ~ data^(-a) * cbind(d1 = 1,
d2 = cos(log(data)*2*pi/p),
d3 = sin(log(data)*2*pi/p))
nls(fo2, start = c(a = 0.1, p = 0.1), algorithm = "plinear")
Note
The question did not include data and j but we can use these to try it out.
set.seed(123)
n <- 100
data <- runif(n, 1, 2)
j <- 1:n
o <- order(data)
j <- j[o]
data <- data[o]
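Putting option (1) together with the sample data from the note gives a complete run. This is only a sketch: the maxit value is an arbitrary choice of mine, the starting values are the ones from (1), and with made-up data the fitted parameters carry no meaning.

```r
# Sample data from the note (data and j live in the global environment)
set.seed(123)
n <- 100
data <- runif(n, 1, 2)
j <- 1:n
o <- order(data)
j <- j[o]
data <- data[o]

# Objective from the question: residual sum of squares
mindisturbed <- function(a, d1, d2, d3, p) {
  sum((data^(-a) * (d1 + d2 * cos(log(data) * 2 * pi / p) +
                      d3 * sin(log(data) * 2 * pi / p)) - log(j))^2)
}

# optim() minimizes over a single vector, so unpack it in a wrapper
st <- c(a = 1, d1 = 1, d2 = 1, d3 = 1, p = 1)
f <- function(x) mindisturbed(x[1], x[2], x[3], x[4], x[5])

res <- optim(st, f, control = list(maxit = 5000))
print(res$par)          # estimated a, d1, d2, d3, p
print(res$value)        # residual sum of squares at the estimate
print(res$convergence)  # 0 means optim reported convergence
```

The default Nelder-Mead method can stop at a local minimum, so it is worth trying several starting vectors and comparing res$value.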
I would like to add a p-value to a scatter-plot, while respecting APA style. This entails two elements: (a) an italicized p, and (b) stripping the leading zero (but also: formatting values smaller than .001 as < .001).
We can strip the leading zero with a custom function
# Formatting function
format.p <- function(p, precision = 0.001) {
digits <- -log(precision, base = 10)
p <- formatC(p, format = 'f', digits = digits)
p[p == formatC(0, format = 'f', digits = digits)] <- paste0('< ', precision)
sub("0", "", p)}
# Get p-value
(p = cor.test(mtcars$wt, mtcars$mpg)$p.value)
1.293959e-10
# Format p-value
(p = format.p(p))
"< .001"
# Make plot
library(ggplot2)
ggplot(mtcars,aes(x=wt,y=mpg)) +
stat_smooth(geom="line",method="lm")+
annotate(geom="text",label=paste0("p = ", p),x=4.5,y=25,size=8)
We can also achieve the italicized p like this:
ggplot(mtcars,aes(x=wt,y=mpg)) +
stat_smooth(geom="line",method="lm") +
annotate(geom="text",label=paste0("italic('p')~'='",p),parse=T,x=4.5,y=25,size=8)
But notice then that we lost the stripped zero (the leading zero is back while we don't want it). Any idea how to fix this?
Solution provided by @rawr in comments (thank you!)
The key was to change label=paste0("italic('p')~'='", p) to label=sprintf("italic('p')~'%s'", p).
Furthermore, in order to avoid having situations where the function would simultaneously output equal and smaller than signs (e.g., p = < .001), I have also modified the format.p() function to choose either < or = depending on the situation.
Here's the final solution:
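# Formatting function
format.p <- function(p, precision = 0.001) {
digits <- -log(precision, base = 10)
# Compare the numeric p first: formatC() returns a character string,
# and comparing a string with a number depends on the locale's sort order
if (p < precision) {
out <- paste0('< ', precision)
} else {
out <- paste0('= ', formatC(p, format = 'f', digits = digits))
}
sub("0", "", out)
}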
# Get p-value
(p = cor.test(mtcars$wt, mtcars$mpg)$p.value)
1.293959e-10
# Format p-value
(p = format.p(p))
"< .001"
# Make plot
library(ggplot2)
ggplot(mtcars,aes(x=wt,y=mpg)) +
stat_smooth(geom="line",method="lm")+
annotate(geom="text",label=sprintf("italic('p')~'%s'",p),parse=TRUE,x=4.5,y=25,size=8)
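A vectorized variant of the same idea, as a sketch: it compares the numeric p-values against the threshold before formatting, and removes only the zero directly in front of the decimal point. The name format_p_apa and the example values are made up for illustration.

```r
format_p_apa <- function(p, precision = 0.001) {
  # Number of decimals implied by the threshold (3 for 0.001)
  digits <- round(-log10(precision))
  shown <- formatC(p, format = "f", digits = digits)
  limit <- formatC(precision, format = "f", digits = digits)
  # Decide between "<" and "=" on the numeric values, element by element
  out <- ifelse(p < precision, paste0("< ", limit), paste0("= ", shown))
  # Drop the zero that sits right before the decimal point
  sub("0.", ".", out, fixed = TRUE)
}

p_values <- c(1.293959e-10, 0.0432, 0.5)
print(format_p_apa(p_values))
# [1] "< .001" "= .043" "= .500"

# Labels in the plotmath form used in the annotate() call
labels <- sprintf("italic('p')~'%s'", format_p_apa(p_values))
print(labels)
```

Because the formatted value stays inside quotes in the label, the parser treats it as text and does not put the leading zero back.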
I am running a nonlinear regression model that needs initial values to start, but the number of variables I want to include may be too large to manually type all the values - therefore I was wondering if there's an alternative to that.
set.seed(12345)
y = rnorm(100, 1000,150)
x1 = rnorm(100,10000,251)
x2 = rnorm(100, 3000,654)
x3 = rnorm(100, 25000,100)
x4 = rnorm(100, 200000,589)
x5 = rnorm(100, 31657,296)
adstock <- function(x,rate=0){
return(as.numeric(stats::filter(x=log(x+1),filter=rate,method="recursive")))
}
library(minpack.lm)
nlsLM(y~b0
+ b1 * adstock(x1, r1)
+ b2 * adstock(x2, r2)
+ b3 * adstock(x3, r3)
+ b4 * adstock(x4, r4)
+ b5 * adstock(x5, r5)
, algorithm = "LM"
# this is where I need to paste the results from the loop
, start = c(b0=1,b1=1,b2=1,b3=1,b4=1,b5=1
,r1=0.1,r2=0.1,r3=0.1,r4=0.1,r5=0.1
)
# end
, control = list(maxiter = 200)
)
My idea was to use a loop to pass the values to the model, but I can't make it work (the following code should be for b_i coefficients)
test_start <- NULL
for(i in 1:(5+1)) {
test_start[i] = paste0("b",i-1,"=",1)
}
cat(test_start)
This is the result, which is not exactly what the model expects:
b0=1 b1=1 b2=1 b3=1 b4=1 b5=1
How can I pass the results of the loop to the model?
Also, how can I add r_i start coefficients to b_i start coefficients in the loop?
Any help would be very appreciated.
PS: at the moment I am interested to assign to each b0,b1,...,b5 the same value (in this case, 1) and to each r1,r2,...,r5 the same value (in this case, 0.1)
Define the data as DF and the formula as fo and then grep out the b and r variables. The line defining v creates a vector with their names and the line defining st a named vector with value 1 for the b's and 0.1 for the r's.
DF <- data.frame(y, x1, x2, x3, x4, x5)
n <- ncol(DF) - 1
rhs <- c("b0", sprintf("b%d * adstock(x%d, r%d)", 1:n, 1:n, 1:n))
fo <- reformulate(rhs, "y")
v <- grep("[br]", all.vars(fo), value = TRUE)
st <- setNames(grepl("b", v) + 0.1 * grepl("r", v), v)
st
nlsLM(fo, DF, start = st, algorithm = "LM", control = list(maxiter = 200))
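If the formula is written by hand and only the start vector is needed, it can also be built directly from the parameter names. This is a small base R sketch, not part of the answer above; st_direct is a made-up name, and the last lines rebuild the grep-based vector only to check that both approaches agree.

```r
n <- 5  # number of x variables

# Names b0..b5 followed by r1..r5, with 1 for the b's and 0.1 for the r's
st_direct <- setNames(
  c(rep(1, n + 1), rep(0.1, n)),
  c(paste0("b", 0:n), paste0("r", seq_len(n)))
)
print(st_direct)

# Same vector via the formula-based approach, for comparison
rhs <- c("b0", sprintf("b%d * adstock(x%d, r%d)", 1:n, 1:n, 1:n))
fo <- reformulate(rhs, "y")
v <- grep("[br]", all.vars(fo), value = TRUE)
st <- setNames(grepl("b", v) + 0.1 * grepl("r", v), v)

# The order differs, so align by name before comparing
print(all.equal(st[names(st_direct)], st_direct))
```

The order of the elements should not matter to the fit, since the start values are matched to the formula by name.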
Regarding the comment try defining rhs like this. In the first line take whatever subset of labs you want, e.g. labs <- labels(...)[1:9] or change the formula in the first line, e.g. labs <- labels(terms(y ~ .*(1 + x1), data = DF))
labs <- labels(terms(y ~ .^2, data = DF))
labs <- sub(":", "*", labs)
n <- length(labs)
rhs <- c("b0", sprintf("b%d * adstock(%s, r%d)", 1:n, labs, 1:n))
I am curious how one would edit the following solution from Jayden so that the equation may be formatted y = bx + a or y = bx - a? I wanted to make it look as clean as possible.
lm_eqn = function(m) {
l <- list(a = format(coef(m)[1], digits = 2),
b = format(abs(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
if (coef(m)[2] >= 0) {
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,l)
} else {
eq <- substitute(italic(y) == a - b %.% italic(x)*","~~italic(r)^2~"="~r2,l)
}
as.character(as.expression(eq));
}
I have tried eliminating the %.%, which throws an error, and I have tried inverting the order, but I am having issues with the syntax in the if/else section of the function. I would also like the coefficient (a) to be presented without the negative sign. abs(a) returns |a|. Thanks for any input! It is appreciated!
This follows from another thread (Adding Regression Line Equation and R2 on graph).
If you want it in b*x+a form then just:
if (coef(m)[2] >= 0) {
eq <- substitute(italic(y) ==
b %.% italic(x) + a*","~~italic(r)^2~"="~r2, l)
} else {
eq <- substitute(italic(y) ==
- b %.% italic(x) + a *"," ~~ italic(r)^2 ~"="~r2, l)
}
Writing R expressions requires understanding that there is a syntax rule: token/separator/token, but you can use either "+" or "-" as a unary separator. The upper portion of the plotmath symbol table in ?plotmath has the acceptable separators. Spaces and linefeeds get ignored.
What error are you seeing? This works for me to give bx ± a as requested. You have to move the abs() to the definition of a instead of b and test coef(m)[1] instead of 2...
lm_eqn = function(m) {
l <- list(a = format(abs(coef(m)[1]), digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3));
if (coef(m)[1] >= 0) {
eq <- substitute(italic(y) == b %.% italic(x) + a*","~~italic(r)^2~"="~r2,l)
} else {
eq <- substitute(italic(y) == b %.% italic(x) - a*","~~italic(r)^2~"="~r2,l)
}
as.character(as.expression(eq));
}
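A sketch of the same bx ± a layout with unname() applied to the coefficients, so that recent R versions do not wrap the values in c(...) when the expression is turned into a string. The function name lm_eqn_signed and both small data sets are made up for illustration.

```r
lm_eqn_signed <- function(m) {
  cf <- unname(coef(m))  # drop names so the values are spliced in as plain strings
  l <- list(a = format(abs(cf[1]), digits = 2),
            b = format(cf[2], digits = 2),
            r2 = format(summary(m)$r.squared, digits = 3))
  # Pick the operator from the sign of the intercept
  eq <- if (cf[1] >= 0) {
    substitute(italic(y) == b %.% italic(x) + a * "," ~~ italic(r)^2 ~ "=" ~ r2, l)
  } else {
    substitute(italic(y) == b %.% italic(x) - a * "," ~~ italic(r)^2 ~ "=" ~ r2, l)
  }
  as.character(as.expression(eq))
}

# Negative intercept (roughly y = 3x - 2)
x <- 1:6
y <- c(1.1, 3.9, 7.2, 9.8, 13.1, 16.0)
eq_neg <- lm_eqn_signed(lm(y ~ x))
print(eq_neg)

# Positive intercept
x2 <- c(5, 3, 6, 8, 2, 6)
y2 <- c(2, 6, 3, 7, 4, 9)
eq_pos <- lm_eqn_signed(lm(y2 ~ x2))
print(eq_pos)
```

Note that a negative slope would still show its own minus sign in front of b, which matches the bx ± a form asked for.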
When you want to use R functions in VBA via RExcel, you have to use
RInterface.RRun "..."
Then, if you'd like to define your own R function, you can simply
RInterface.RRun "y <- function(x) { ... }"
If y consists of more than one line of code, you can separate the lines with ;, as you would in the R environment.
But... what if your y function is very, very long?
A 20 to 30 line R function is damn difficult to write this way in VBA, and there's a limit to the length of a VBA statement.
So: how can I wrap it?
Here's an example of a fairly long R function: can you show me how to put it in VBA using RExcel?
bestIV <- function(dT, IVTS.t, Spot, r) {
b <- r
xout <- seq(0, max(T), dT)
sfm <- matrix(NA, nrow = length(K), ncol = length(xout))
for(i in 1:length(K)) {
sfm[i,] <- approx(x = T, y = IVTS.t[i,], xout = xout, rule = 2)$y
}
sfm[,1] <- sfm[,1] + sfm[,2] - sfm[,3]
rownames(sfm) <- K
colnames(sfm) <- xout
Option <- matrix(NA, nrow = length(K), ncol = length(xout))
for(i in 1:length(K)) {
for(j in 1:length(xout)) {
TypeFlag <- ifelse(K[i] < Spot, 'p', 'c')
Option[i,j] <- GBSOption(TypeFlag = TypeFlag, S = Spot, X = K[i],
Time = xout[j] / 365, r = r, b = b,
sigma = sfm[i,j] / 100)@price
}
}
rownames(Option) <- K
colnames(Option) <- xout
dP <- (cbind(0, -t(apply(X = Option, MARGIN = 1, FUN = diff))) / Option)[,-(1:2)]
dV <- dP / dT
min.V <- which(dV == min(dV), arr.ind = TRUE, useNames = TRUE)
Strike <- as.numeric(dimnames(min.V)[1])
Maturity <- as.numeric(unlist(dimnames(dV)[2]))[min.V[2]]
Days <- dT
Mat <- c(dV[which(dV == min(dV))], Strike, Maturity, Days)
names(Mat) <- c('Value', 'Strike', 'Maturity', 'Days')
return(Mat)
}
Thanks,
Put your R code in your spreadsheet (in a range of cells) and use this function instead:
RInterface.RunRCodeFromRange range
Executes the commands in range on a worksheet
(allows to use commands prepared for interactive execution with R to be run in macro code)
You are passing a string as an argument to a VBA function. Thus your question reduces to "how can I concatenate strings in VBA".
The answer is to use the concatenation operator &, like this:
"a" & "b"
Say you have an R function:
y <- function(x, a, b){
return(x)
}
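Then you can do this in VBA (a statement that continues on the next line needs the line-continuation character _ at the end of the line):
RInterface.RRun "y <- function(x, a, b) {" & _
"return(x)" & _
"}"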