Because of a bug in the neuralnet command in R, I am building a formula manually instead of using the '.' notation for all variables. Inside of a loop, the paste function is transposing the "~" and "y" as shown below.
for(i in 1:3)
{
f <- as.formula(paste(c("y",i,"~", paste(c("x1","x2"), collapse = " + ")), collapse=""))
message(f)
}
produces:
~y1x1 + x2
~y2x1 + x2
~y3x1 + x2
I tried reversing the order of the "~" and "y", but that gives an error "unexpected symbol". So the question is, how do I get:
y1~x1 + x2
y2~x1 + x2
y3~x1 + x2
Thanks!
Note: your current for-loop overwrites f on every iteration because the assignment is not indexed. This is one way to produce 5 formula objects with sapply:
sapply( paste("y",1:5,"~", paste(c("x1","x2"), collapse = " + "),
sep="") , as.formula)
$`y1~x1 + x2`
y1 ~ x1 + x2
<environment: 0x121e1b668>
$`y2~x1 + x2`
y2 ~ x1 + x2
<environment: 0x121e1b668>
$`y3~x1 + x2`
y3 ~ x1 + x2
<environment: 0x121e1b668>
$`y4~x1 + x2`
y4 ~ x1 + x2
<environment: 0x121e1b668>
$`y5~x1 + x2`
y5 ~ x1 + x2
<environment: 0x121e1b668>
There is really no way to have any structure other than a list here: formulas are language objects, not atomic values, so they need to be stored in a list or list-like structure and accessed with "[[".
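For what it's worth, the formulas in the question appear to be built correctly. The odd display comes from message(), which converts a formula to its three components (the tilde, the left-hand side, the right-hand side) with as.character() and pastes them together. A minimal base-R sketch that shows this and stores one formula per response in a list (the names rhs, pieces, garbled, shown and formulas are only illustrative):

```r
# Right-hand side shared by all formulas
rhs <- paste(c("x1", "x2"), collapse = " + ")

# One formula, built the same way as in the question
f <- as.formula(paste0("y", 1, "~", rhs))

# as.character() splits a formula into operator, LHS and RHS;
# pasting those pieces together reproduces the confusing display
pieces <- as.character(f)
garbled <- paste(pieces, collapse = "")

# deparse() gives the formula in its usual written form
shown <- deparse(f)

# Keep one formula per response in a list instead of overwriting f
formulas <- lapply(1:3, function(i) as.formula(paste0("y", i, " ~ ", rhs)))
for (fm in formulas) message(deparse(fm))
```

Each element can then be passed on as formulas[[i]].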
I am new to coding and R and would like your help. For my analysis, I am trying to run regressions on time series data with one dependent variable (Y) and four independent variables (X1, X2, X3, X4). Each of these variables (Y and the Xs) comes in four transformations (for example, for X1: X1, SQRT(X1), Square(X1) and Ln(X1)). I want to run the regressions for all possible combinations of the Y transformations (Y, SQRT(Y), Square(Y), Ln(Y)) and the X transformations, so that in the end I can decide from the R-squared values which transformation to use for each variable.
I am currently writing each linear regression in R by hand and changing the variables manually, which takes a lot of time. Is there a loop or something similar I can use for the regressions? Thanks
lm(Y ~ X1 + X2 + X3 + X4)
lm(SQRT(Y) ~ X1 + X2 + X3 + X4)
lm(Square(Y) ~ X1 + X2 + X3 + X4)
lm(Ln(Y) ~ X1 + X2 + X3 + X4)
lm(Y ~ SQRT(X1) + X2 + X3 + X4)
lm(Y ~ Square(X1) + X2 + X3 + X4)
....
lm(ln(Y)~ ln(X1) + ln(X2) + ln(X3) + ln(X4))
This is my original code.
Regression10 <- lm(Final_Data_v2$`10 KW Installations (MW)`~Final_Data_v2$`10 KW Prio Installations (MW)`+Final_Data_v2$`FiT 10 KW (Cent/kWh)`+Final_Data_v2$`Electricity Prices 10 kW Cent/kW`+Final_Data_v2$`PV System Price (Eur/W)`)
summary(Regression10)
Regressionsqrt10 <- lm(Final_Data_v2$`SQRT(10 KW Installations (MW))`~Final_Data_v2$`10 KW Prio Installations (MW)`+Final_Data_v2$`FiT 10 KW (Cent/kWh)`+Final_Data_v2$`Electricity Prices 10 kW Cent/kW`+Final_Data_v2$`PV System Price (Eur/W)`)
summary(Regressionsqrt10)
And so on..
Here is the link to my DATA: LINK
This picks the transformations of RHS variables such that adjusted R-squared is maximized. This statistical approach will almost certainly lead to spurious results though.
# simulate some data
set.seed(0)
df <- data.frame(Y = runif(100),
X1 = runif(100),
X2 = runif(100),
X3 = runif(100),
X4 = runif(100))
# create new variables for log/sqrt transformations of every X and Y
for(x in names(df)){
df[[paste0(x, "_log")]] <- log(df[[x]])
df[[paste0(x, "_sqrt")]] <- sqrt(df[[x]])}
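# all combinations of Y and X's
# (combn() keeps the input order, so only combinations whose names already
# appear in Y, X1, X2, X3, X4 order can pass the filter below)
library(magrittr)  # provides the %>% pipe
yVars <- names(df)[substr(names(df),1,1)=='Y']
xVars <- names(df)[substr(names(df),1,1)=='X']
df2 <- combn(c(yVars, xVars), 5) %>% data.frame()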
# Ensure that formula is in form of some Y, some X1, some X2...
valid <- function(x){
ifelse(grepl("Y", x[1]) &
grepl("X1", x[2]) &
grepl("X2", x[3]) &
grepl("X3", x[4]) &
grepl("X4", x[5]), T, F)}
df2 <- df2[, sapply(df2, valid)]
# Create the formulas
formulas <- sapply(names(df2), function(x){
paste0(df2[[x]][1], " ~ ",
df2[[x]][2], " + ",
df2[[x]][3], " + ",
df2[[x]][4], " + ",
df2[[x]][5])})
# Run linear model for each formula
models <- lapply(formulas, function(x) summary(lm(as.formula(x), data=df)))
# Return the formula that maximizes adjusted R-squared
formulas[which.max(sapply(models, function(x) x[['adj.r.squared']]))]
"Y ~ X1 + X2 + X3 + X4_log"
Consider expand.grid to build all combinations of terms, selecting the candidates for each variable with grep. Then pass a model function that builds a dynamic formula to Map (a wrapper around mapply) to get a list of lm summaries, one per combination: N = 4^5 = 1,024 items.
The code below uses I() terms for the square root and squared transformations, alongside log. Note: the grep patterns are the only part that needs adjusting to the actual variable names.
# QUOTE NON-SYNTACTIC COLUMN NAMES WITH BACKTICKS BEFORE WRAPPING THEM IN TRANSFORMATIONS
vars <- paste0("`", names(Final_Data_v2), "`")
coeffs <- c(vars,
paste0("I(", vars, "^(1/2))"),
paste0("I(", vars, "^2)"),
paste0("log(", vars, ")"))
# BUILD DATA FRAME OF ALL COMBNS OF VARIABLE AND TRANSFORMATION TYPES
# (fixed = TRUE MATCHES LITERALLY; PARENTHESES WOULD OTHERWISE BE REGEX GROUPS)
all_combns <- expand.grid(y_var = coeffs[grep("`10 KW Installations (MW)`", coeffs, fixed = TRUE)],
x_var1 = coeffs[grep("`10 KW Prio Installations (MW)`", coeffs, fixed = TRUE)],
x_var2 = coeffs[grep("`FiT 10 KW (Cent/kWh)`", coeffs, fixed = TRUE)],
x_var3 = coeffs[grep("`Electricity Prices 10 kW Cent/kW`", coeffs, fixed = TRUE)],
x_var4 = coeffs[grep("`PV System Price (Eur/W)`", coeffs, fixed = TRUE)],
stringsAsFactors = FALSE)
# FUNCTION WITH DYNAMIC FORMULA TO RECEIVE ALL POLYNOMIAL TYPES
# (TERMS ARE ALREADY QUOTED ABOVE, SO THEY ARE PASTED IN AS THEY ARE)
proc_model <- function(y, x1, x2, x3, x4) {
myformula <- paste0(y, " ~ ", x1, " + ", x2, " + ", x3, " + ", x4)
summary(lm(as.formula(myformula), data=Final_Data_v2))
}
# MAP CALL PASSING COLUMN VALUES ELEMENTWISE AS FUNCTION PARAMS
lm_list <- with(all_combns, Map(proc_model, y_var, x_var1, x_var2, x_var3, x_var4))
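Since Final_Data_v2 is not available here, this is a small self-contained sketch of the same expand.grid plus Map idea on simulated data. The column names are hypothetical and deliberately non-syntactic, to show where the backticks have to go: around the column name inside the transformation, not around the whole term. The names dat, variants, grid, fit_one and adj_r2 are my own.

```r
set.seed(0)
dat <- data.frame(runif(100), runif(100), runif(100))
names(dat) <- c("Y (MW)", "X1 (Eur/W)", "X2")  # hypothetical column names

# Candidate terms per variable: raw, square root, square, log.
# The backticks wrap only the column name, so sqrt(), I() and log()
# are still evaluated as functions by lm().
variants <- lapply(names(dat), function(v) {
  q <- paste0("`", v, "`")
  c(q, paste0("sqrt(", q, ")"), paste0("I(", q, "^2)"), paste0("log(", q, ")"))
})
names(variants) <- c("y", "x1", "x2")

# One row per combination of transformations: 4^3 = 64 rows
grid <- expand.grid(variants, stringsAsFactors = FALSE)

# Fit one model from its term strings and return adjusted R-squared
fit_one <- function(y, x1, x2) {
  f <- as.formula(paste(y, "~", x1, "+", x2))
  summary(lm(f, data = dat))$adj.r.squared
}

# Map passes the grid columns elementwise, as in the answer above
adj_r2 <- unlist(Map(fit_one, grid$y, grid$x1, grid$x2), use.names = FALSE)

# Combination with the highest adjusted R-squared
best <- grid[which.max(adj_r2), ]
```

With five variables the grid grows to 4^5 = 1,024 rows, matching the count mentioned above.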
I'm trying to regress returns against FF 3-factors with a rolling window.
To do so, I have found the function roll_lm in R, but the function is only producing regression output for one of the 3 variables.
The code is described here:
Y <- as.matrix(Portfolio_returns[,2])
X1 <- as.matrix(Mydata[,2])
X2 <- as.matrix(Mydata[,3])
X3 <- as.matrix(Mydata[,4])
Five_years_Rolling_reg <- roll_lm(X1 + X2 + X3,Y,60)
When I apply the coef function, I only get output for X1 and not X2 nor X3.
What am I doing wrong?
Your problem seems to be a basic misunderstanding of how the function works. Looking at ?roll_lm:
Arguments
x
matrix or xts object. Rows are observations and columns are the independent variables.
Currently it seems like you are trying to use a formula-style input, X1 + X2 + X3, which is not what the help page describes. As a result the three matrices are added together elementwise, as if it were x1 = 2; x2 = 3; x1 + x2 = 5, and roll_lm only sees a single regressor.
Instead you should bind the columns together.
Y <- as.matrix(Portfolio_returns[,2])
X <- as.matrix(Mydata[,2:4])
roll_lm(X, Y, 60)
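Or alternatively use the model.frame, model.response and model.matrix functions from base R, which give you the familiarity of the formula interface.
# collect the response and the three factors in one data frame
dat <- data.frame(Portfolio_returns[,2], Mydata[,2:4])
names(dat) <- c("Y", "X1", "X2", "X3")
frame <- model.frame(Y ~ X1 + X2 + X3, data = dat)
# drop the intercept column, since roll_lm adds an intercept by default
X <- model.matrix(Y ~ X1 + X2 + X3, data = frame)[, -1]
roll_lm(X, as.matrix(model.response(frame)), 60)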
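To see why only one coefficient comes back, it can help to compare shapes: + adds the three one-column matrices elementwise, while cbind() keeps them as separate columns. A base-R sketch with simulated data (it does not use the roll package; the hand-rolled loop over lm.fit() merely stands in for a rolling regression, and all names are illustrative):

```r
set.seed(1)
n <- 120
X1 <- matrix(rnorm(n))
X2 <- matrix(rnorm(n))
X3 <- matrix(rnorm(n))
Y  <- 0.5 * X1 - 0.2 * X2 + 0.1 * X3 + matrix(rnorm(n, sd = 0.1))

# Elementwise sum: still a single column, so only one slope can be estimated
summed <- X1 + X2 + X3

# Column-bound design matrix: one column per factor
X <- cbind(X1, X2, X3)
colnames(X) <- c("X1", "X2", "X3")

# Hand-rolled rolling regression over windows of 60 observations
width  <- 60
starts <- 1:(n - width + 1)
roll_coefs <- t(sapply(starts, function(s) {
  idx <- s:(s + width - 1)
  lm.fit(cbind(Intercept = 1, X[idx, ]), Y[idx, 1])$coefficients
}))
```

Each row of roll_coefs then holds the intercept and the three slopes for one window.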
I have a very complex S4 object (output from a lavaan model) which has slots within slots within slots and variables ($) at the deepest level of each of the deepest slots. How do I extract and store the object.size(and potentially other functions like length and dim and the object name) of every element within this object so that I can compare it to another object of the same class?
I have tried storing the output from str(obj) and unclass(obj) and then manipulating the output to extract the information I want, but it's turning out to be very tedious. Looping over names is equally difficult. Is there a way to "flatten" the object into a list? Is there a recursive function anyone can think of to repeatedly dig into each slot?
Edit
Here's an example, using the lavaan package I referenced above, though ideally the solution shouldn't be dependent on the specific object class and could work across classes:
library(lavaan)
model <- '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit <- sem(model, data=PoliticalDemocracy)
The object fit contains many slots and objects inside. I can, of course, extract information from a particular element like object.size(fit@Data@X[[1]]), but I'm looking for a generalized solution. The challenge is that I want to extract the same information about each element, regardless of its "depth".
Thanks!
The purrr package might be of help here, especially functions like flatten, transpose, map / at_depth. Combined with character vectors as input, they let you extract elements from deeply nested lists. For example, you could write the "extractor" functions you need separately, then store them all in a list and use invoke (also from purrr) with your object as the sole argument, or invoke_map on many such objects.
Edit
Here is some code to help you extract object.size(fit@Data@X[[1]]) from one or many lavaan objects. Since the slots you are interested in are, in practice, most probably at different depths, my guess is that there is no easy general solution.
The idea is that once you know the exact elements you are interested in, it is fairly straightforward to code up some helper functions to manipulate single or multiple such objects. The functions that I mentioned above provide friendly shortcuts for achieving this.
Let me know if I can be of further help.
library("lavaan")
library("tidyverse")
model <- '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
# say you have 3 different models
fit1 <- sem(model, data=PoliticalDemocracy)
fit2 <- sem(model, data=PoliticalDemocracy)
fit3 <- sem(model, data=PoliticalDemocracy)
# S4 objects provide the slot() function for access, so
object.size(fit1@Data@X[[1]])
# becomes
object.size(slot(slot(fit1, "Data"), "X")[[1]])
# since fit is an S4 object - you have to wrap it up
# in a list to manipulate it easier.
# above code becomes :
list(fit1) %>%
map(~ slot(., "Data")) %>%
map(~ slot(., "X")) %>%
flatten %>%
map(object.size)
# wrap up the above code in a helper function ...
extr_obj_size <- function(lavaan_fit) {
list(lavaan_fit) %>%
map(~ slot(., "Data")) %>%
map(~ slot(., "X")) %>%
map(object.size)
}
extr_obj_size(fit1)
# which you can further wrap up in a function that operates
# on a vector of such objects
extr_multiple_obj_size <- function(vec_lavaan_fits) {
vec_lavaan_fits %>%
map(extr_obj_size) %>%
flatten
}
list(fit1, fit2, fit3) %>% extr_multiple_obj_size
Edit2
I don't know how helpful the following code would be in general, but I put together something that, given the name of the slot you are interested in, checks depths 1 and 2 and returns the corresponding value.
fit <- sem(model, data=PoliticalDemocracy)
slot_of_interest <- "eq.constraints"
# slot names at depth 1
depth1 <- names(getSlots("lavaan"))
# slot names at depth 2
depth2 <- depth1 %>% map(~ slotNames(slot(fit,.)))
# helper fun to check if a slot name of interest is inside a slot
in_slot <- function(x) slot_of_interest %in% x
# so if its at depth 1 - then just map slot()-function with that name
if (slot_of_interest %in% depth1) {
list(fit) %>% map(~slot(., slot_of_interest))
} else {
# otherwise you would need to detect at which index at depth2 does this name appear
index1 <- depth2 %>% detect_index(in_slot)
# and map first the slot-name at that index - then the corresponding slot of interest
list(fit) %>% map(~ slot(., depth1[index1])) %>% map(~ slot(., slot_of_interest))
}
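The question also asked for a recursive function. Here is a rough, class-agnostic sketch in base R: it descends into slots with slotNames() and slot() for S4 objects and into elements for lists, recording object.size() at every node. It has not been tested against lavaan objects; the toy classes Inner and Outer and the function name walk_sizes are made up for illustration.

```r
library(methods)

# Recursively record path, class and size of every node in an object
walk_sizes <- function(obj, path = "obj") {
  here <- data.frame(path = path,
                     class = class(obj)[1],
                     bytes = as.numeric(object.size(obj)),
                     stringsAsFactors = FALSE)
  if (isS4(obj)) {
    # S4 object: recurse into each slot
    kids <- lapply(slotNames(obj), function(s)
      walk_sizes(slot(obj, s), paste0(path, "@", s)))
  } else if (is.list(obj)) {
    # list (including data frames): recurse into each element
    kids <- lapply(seq_along(obj), function(i)
      walk_sizes(obj[[i]], paste0(path, "[[", i, "]]")))
  } else {
    # anything else is treated as a leaf
    kids <- list()
  }
  do.call(rbind, c(list(here), kids))
}

# Toy nested S4 object standing in for a fitted model
setClass("Inner", representation(X = "list", n = "numeric"))
setClass("Outer", representation(Data = "Inner", label = "character"))
toy <- new("Outer",
           Data = new("Inner", X = list(matrix(0, 10, 3)), n = 10),
           label = "demo")

sizes <- walk_sizes(toy, "toy")
```

For a lavaan fit, walk_sizes(fit, "fit") should in principle return one row per node, and two such tables could be merged on path to compare objects; treat that as an assumption to check rather than a tested result.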
I am trying to add a term to a model formula in R. This is straightforward to do using update() if I enter the variable name directly into the update function. However, it does not work if the variable name is stored in a variable.
myFormula <- as.formula(y ~ x1 + x2 + x3)
addTerm <- 'x4'
#Works: x4 is added
update(myFormula, ~ . + x4)
Output: y ~ x1 + x2 + x3 + x4
#Does not work: "+ addTerm" is added instead of x4
update(myFormula, ~ . + addTerm)
Output: y ~ x1 + x2 + x3 + addTerm
Adding x4 via the variable can be done in a slightly more complex way.
formulaString <- deparse(myFormula)
newFormula <- as.formula(paste(formulaString, "+", addTerm))
update(newFormula, ~.)
Output: y ~ x1 + x2 + x3 + x4
Is there a way to get update() to do this directly without needing these extra steps? I've tried paste, parse, and the other usual functions and they don't work.
For example, if paste0 is used the output is
update(myFormula, ~ . + paste0(addTerm))
Output: y ~ x1 + x2 + x3 + paste0(addTerm)
Does anybody have any recommendations on how to use a variable in update()?
Thanks
You can probably just do:
update(myFormula, paste("~ . +",addTerm))
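This works because update() converts its second argument with as.formula(), so a character string is accepted. As a small sketch, the same result can be had with reformulate() from base R, which also extends to several terms at once (the variable moreTerms and the term x5 are hypothetical):

```r
myFormula <- y ~ x1 + x2 + x3
addTerm <- "x4"

# Character string: update() turns it into the formula ~ . + x4
f1 <- update(myFormula, paste("~ . +", addTerm))

# reformulate() builds the one-sided formula ~ . + x4 from a character vector
f2 <- update(myFormula, reformulate(c(".", addTerm)))

# Several terms held in a character vector
moreTerms <- c("x4", "x5")
f3 <- update(myFormula, reformulate(c(".", moreTerms)))
```

Both f1 and f2 come out as y ~ x1 + x2 + x3 + x4.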
When using the add1 function to consider new variables, I would like to reference all variables (either in some dataframe or global environment), but I can not figure out how to use the scope argument to do this.
I am aware I can use it like this
X = data.frame(replicate(4,rnorm(20))) ; y = rnorm(20)
lm1 = lm(y ~ 1)
out = add1(lm1, scope= ~X$X1 + X$X2 + X$X3)
but I want to avoid manually writing in every variable.
As I have seen in other questions, I know the . symbol will not work but I am not sure why. It stands for what is already there, so if I do
x1 = rnorm(20) ; x2 = rnorm(20) ; x3 = rnorm(20) ; x4 = rnorm(20) ; y = rnorm(20)
out = add1(lm1, scope= ~ . )
it does not use what is already in the global environment.
I know the documentation says that scope must be "a formula giving the terms to be considered", but that is usually where . can be used to reference all variables.
Thanks in advance.
Also note I have read Chp 7 of MASS, and these related threads
scope from add1()-command in R
http://tolstoy.newcastle.edu.au/R/help/02b/3588.html
This is an even simpler answer, which I found after browsing this question
http://r.789695.n4.nabble.com/glm-formula-vs-character-td2543061.html
x1 = rnorm(100)
x2 = rnorm(100)
x3 = rnorm(100)
y = rnorm(100)
BaseReg = lm(y ~ 1)
newdf = data.frame(x1,x2,x3)
out = add1(BaseReg, names(newdf))
It is baffling that such a simple way to get this was not stated in the documentation for add1.
As the help page for add1 says, the formula ~ . means "what's already there". For a small number of names it is not any simpler to use as.formula, but this approach can be used in a function or script. (Generally one would expect to put the X's and Y in the same dataframe.)
YX <- cbind(y,X)
form <- as.formula(paste("~", paste(names(YX)[-c(1,5)],collapse="+")))
form
#~X1 + X2 + X3
add1(lm1, form)
You appear to have stumbled across a more efficient strategy. If using a data object with column names "y" "X1" "X2" "X3" "X4":
> formula(YX)
y ~ X1 + X2 + X3 + X4
> formula(YX)[-2]
~X1 + X2 + X3 + X4
> as.list(formula(YX))
[[1]]
`~`
[[2]]
y
[[3]]
X1 + X2 + X3 + X4
> names(YX)
[1] "y" "X1" "X2" "X3" "X4"
You can see that a formula object has as its first element the formula-defining tilde, which is really an R function. The second element is the LHS expression and the third element is the RHS expression. Dropping the second element with [-2] therefore leaves a one-sided formula.
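A short sketch of how this can be put to use with add1, on simulated data (the data and the names parts, rhs_only and out are illustrative): formula() on a data frame treats the first column as the response, and removing element 2 leaves the one-sided scope formula.

```r
set.seed(42)
# Response y plus four predictors, which data.frame() names X1..X4
YX <- data.frame(y = rnorm(20), replicate(4, rnorm(20)))

f <- formula(YX)      # first column becomes the LHS, the rest the RHS
parts <- as.list(f)   # the tilde, the LHS and the RHS, in that order

# Dropping element 2 (the LHS) leaves a one-sided formula
rhs_only <- f[-2]

# Use it as the scope for add1, starting from an intercept-only model
lm1 <- lm(y ~ 1, data = YX)
out <- add1(lm1, scope = rhs_only)
```

The rows of out then list each candidate term next to the baseline model.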
Here is something I found that works:
X = data.frame(replicate(4,rnorm(20)))
lm1 = lm(X1 ~ 1 ,data=X)
add1(lm1, scope=formula(X)[-2])
Granted, I have no idea why this is the case
formula(X)[-2]
# ~X2 + X3 + X4
I just found it by accident. Other things like formula(X)[-1] and formula(X)[-3] also return other things which are equally bizarre to me.