Loop through a character vector to use in a function - r

I am conducting a method-comparison study, comparing measurements from two different systems. My dataset has a large number of columns, each containing measurements from one of the two systems.
aX and bX are both measures of X, but from systems a and b respectively. I have about 80 pairs of variables like this.
A simplified version of my data looks like this:
set.seed(1)
df <- data.frame(
ID = as.factor(rep(1:2, each=10)),
aX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
bX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
aY = rep(1:10+rnorm(10,mean=1,sd=0.5), 2),
bY = rep(1:10-rnorm(10,mean=1,sd=0.5),2))
head(df)
ID aX bX aY bY
1 1 1.686773 2.755891 2.459489 -0.6793398
2 1 3.091822 3.194922 3.391068 1.0513939
3 1 3.582186 3.689380 4.037282 1.8061642
4 1 5.797640 3.892650 4.005324 3.0269025
5 1 6.164754 6.562465 6.309913 4.6885298
6 1 6.589766 6.977533 6.971936 5.2074973
I am trying to loop through the elements of a character vector, and use the elements to point to columns in the dataframe. But I keep getting error messages when I try to call functions with variable names generated in the loop.
For simplicity, I have changed the loop to include a linear model as this produces the same type of error as I have in my original script.
#This line is only included to show that
#the formula used in the loop works when
#called directly with the "real" column names
(broom::glance(lm(aX~bX, data = df)))$r.squared
[1] 0.9405218
#Now I try the loop
varlist <- c("X", "Y")
for(i in 1:length(varlist)){
aVAR <- paste0("a", varlist[i])
bVAR <- paste0("b", varlist[i])
#aVAR and bVAR appear to hold names identical to column names in the df dataframe
print(c(aVAR, bVAR))
#Try the formula with the loop variable names
print((broom::glance(lm(aVAR~bVAR, data = df)))$r.squared)
}
The error messages I get when calling functions from inside the loop vary according to the function I am calling. The common denominator for all the errors is that they occur when I try to use the character vector (varlist) to pick out specific columns.
Example of error messages:
rmcorr(ID, aVAR, bVAR, df)
Error in rmcorr(ID, aVAR, bVAR, df) :
'Measure 1' and 'Measure 2' must be numeric
or
broom::glance(lm(aVAR~bVAR, data = df))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion
Can you help me understand what goes wrong in the loop? Or suggest and show another way to accomplish what I am trying to do?

Variables aren't evaluated in formulas (the things with ~).
You can type
bert ~ ernie
and not get an error even if variables named bert and ernie do not exist. Formulas store relationships between symbols/names and do not attempt to evaluate them. Also note we are not using quotes here. Variable names (or symbols) are not interchangeable with character values (i.e., aX is very different from "aX").
So when putting together a formula from string values, I suggest you use the reformulate() function. It takes a vector of names for the right-hand side and an optional value for the left-hand side. So you would create the same formula with
reformulate("ernie", "bert")
# bert ~ ernie
And you can use that with your lm call:
lm(reformulate(bVAR, aVAR), data = df)
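As a minimal sketch of how this fits into the loop from the question: the code below rebuilds the question's df and uses base summary() in place of broom::glance() so that it runs without extra packages. The names r2, v and f are illustrative only.

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(
  ID = as.factor(rep(1:2, each = 10)),
  aX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  aY = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bY = rep(1:10 - rnorm(10, mean = 1, sd = 0.5), 2)
)

varlist <- c("X", "Y")

# One R-squared per variable pair; each formula is built from strings
r2 <- vapply(varlist, function(v) {
  f <- reformulate(paste0("b", v), response = paste0("a", v))  # e.g. aX ~ bX
  summary(lm(f, data = df))$r.squared
}, numeric(1))

print(r2)
```

Because vapply() keeps the names of a character input, r2 comes back as a numeric vector named by the entries of varlist, which is convenient with 80 pairs.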

I'm too lazy to search for a duplicate on how to construct formulas programmatically, so here is a solution:
varlist <- c("X", "Y")
for(i in 1:length(varlist)){
#make these symbols:
aVAR <- as.symbol(paste0("a", varlist[i]))
bVAR <- as.symbol(paste0("b", varlist[i]))
#aVAR and bVAR are now symbols matching column names in the df dataframe
print(c(aVAR, bVAR))
#Try the formula with the loop variable names
#construct the call to `lm` with `bquote` and `eval` the expression
print((broom::glance(eval(bquote(lm(.(aVAR) ~ .(bVAR), data = df)))))$r.squared)
}
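To see what bquote() builds before it is evaluated, the following sketch inspects the unevaluated call for a single pair. It uses a small made-up data frame (toy_df is hypothetical) rather than the question's data.

```r
# Hypothetical toy data with one pair of columns
toy_df <- data.frame(aX = c(1.1, 2.3, 2.9, 4.2, 5.1),
                     bX = c(0.9, 2.1, 3.2, 3.8, 5.3))

aVAR <- as.symbol("aX")
bVAR <- as.symbol("bX")

# .() splices the symbols into the call; nothing is evaluated yet
cl <- bquote(lm(.(aVAR) ~ .(bVAR), data = toy_df))
print(cl)

# Evaluating the call fits the model using the real column names
fit <- eval(cl)
print(all.vars(formula(fit)))
```

A side effect of this approach is that the call stored in the fitted model shows the real column names rather than aVAR and bVAR.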

Related

Following Error: Must subset columns with a valid subscript vector. x Can't convert from <double> to <integer> due to loss of precision

I was trying to score multiple psychological tests for a research project.
When trying to create a sub scale with the data I cleaned I used the following function:
dat <- subset(data, select = c(CODE, GRUPPE, ZEITPUNKT, POS, NEG,
SELBSTW, LZ, ERFOLG , MISSERFOLG , OPT , CES, RISK, CESPSYCH
,CESSOMAT, HERAUS, SHAME, SELF ))
All scales consist of numeric vectors apart from "CODE". Running this gives the following error:
Fehler: Must subset columns with a valid subscript vector.
x Can't convert from <double> to <integer> due to loss of precision.
I'm really bad at R, so I hope someone can help me solve this, please.
Edit:
Before that, I ran steps like the following for basically all scales, depending on their sub scales, need for recoding, and scoring method:
POS <- rowMeans(data[,c("PANAS1", "PANAS3", "PANAS4","PANAS6", "PANAS10", "PANAS11", "PANAS13","PANAS15", "PANAS17", "PANAS18")])
NEG <- rowMeans(data[,c("PANAS2", "PANAS5", "PANAS7","PANAS8", "PANAS9", "PANAS12", "PANAS14","PANAS16","PANAS19", "PANAS20")])
OPT <- rowSums(data[,c("OPT1","OPT2","OPT3")])
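One likely cause, assuming POS, NEG, OPT and the other scores exist only as free-standing vectors in the workspace and not as columns of data: select = then finds those numeric vectors and tries to use their values as column positions, which a tibble rejects with this error. A minimal sketch of the usual fix is to store each score as a column before subsetting. The toy values below are made up, and a plain data.frame stands in for the real data.

```r
# Made-up stand-in for the questionnaire data
data <- data.frame(
  CODE   = c("a", "b", "c"),
  PANAS1 = c(1, 2, 3),
  PANAS3 = c(2, 3, 4),
  OPT1   = c(1, 1, 2),
  OPT2   = c(2, 2, 3)
)

# Store the scale scores as columns of `data`, not as separate vectors
data$POS <- rowMeans(data[, c("PANAS1", "PANAS3")])
data$OPT <- rowSums(data[, c("OPT1", "OPT2")])

# Now select = refers to real columns
dat <- subset(data, select = c(CODE, POS, OPT))
print(dat)
```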

R: transforming vector of objects in a vector of object names

Context: I am estimating several different specifications with the R package rugarch. Let's say 2 as an example (y is the data):
garchmodel<-ugarchspec()
garchfit = ugarchfit(spec=garchmodel, y)
egarchmodel<-ugarchspec(variance.model = list(model = "eGARCH"))
egarchfit<-ugarchfit(spec=egarchmodel, y)
I wrote a function to gather the information criteria of the model:
CollectInfoCrieria <- function(models){
infoMat <- t(sapply(models, infocriteria))
return(infoMat)
}
If I then type
themodels = c(garchfit, egarchfit)
CollectInfoCrieria(themodels)
I obtain a matrix with the information criteria of each model specification per row. I would like to transform the model names into character strings to use as row names.
If I then define a function
getName<-function(names) deparse(substitute(names))
Then getName(garchfit) gives "garchfit", while
models <- c(garch11fit, egarchfit)
sapply(models, getName)
gives
[1] "X[[i]]" "X[[i]]"
How can I get
[1] "garch11fit" "egarchfit"
garch11fit=1
egarchfit=2
models <- c(garch11fit=garch11fit, egarchfit=egarchfit)
names(models)
If you name the variables, you can get their names back. But in your code, the names aren't associated in the models variable, only the values of garch11fit and egarchfit, so you can't get the names back.
This is how one can collect the object names inside the CollectInfoCrieria function:
CollectInfoCrieria <- function(...){
infoMat <- t(sapply(c(...), infocriteria))
models <- as.character(eval(substitute(alist(...))))
# return the data frame visibly so it prints when called at the console
data.frame(models, infoMat)
}
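The same idea can be sketched without rugarch by capturing the argument names once and letting sapply() carry them through as row names. Here lm fits with AIC() and BIC() stand in for the GARCH fits and infocriteria(), and named_list is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: build a list named after the argument expressions
named_list <- function(...) {
  args <- list(...)
  names(args) <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  args
}

# Stand-ins for garchfit and egarchfit
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp, data = mtcars)

models <- named_list(fit_small, fit_large)

# sapply() keeps the list names, so they end up as row names after t()
info <- t(sapply(models, function(m) c(AIC = AIC(m), BIC = BIC(m))))
print(info)
```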

In R, how do I modify a dataframe column in a list given a string name

I'm new to R. Thank you for your patience. I'm working with the survey package.
Background: I'm writing a function that loops through combinations of predictor and outcome variables (i.e., svyglm(outcome~predictor)) in a complex survey to output crude prevalence ratios. For each outcome/predictor combination, I want to first relevel the predictor within the survey design object to ensure the output ratios are all > 1.
Specific problem: Given the survey design object name, column name and reference level as strings, how do I tell R I want said column releveled?
prams16 is the name of the survey design object which includes a list of 9 items, variables is the analytic dataset (data frame) within the survey design object and mrace is a column in the variables DF.
These work:
prams16$variables$mrace <- relevel(prams16$variables$mrace, ref="White")
prams16[["variables"]]["mrace"] <- relevel(prams16$variables$mrace, ref="White")
However, when I try to construct references to prams16$variables$mrace or prams16[["variables"]]["mrace"] with strings, nothing seems to work.
Thanks!
EDIT: Requested reproducible example of problem.
myPredictor <- as.factor(c("Red","White","Black","Red","Green","Black","White","Black","Red","Green","Black"))
myOutcome <- c(1,0,1,0,1,0,1,0,1,0,1)
myDF <- tibble(myPredictor, myOutcome)
myOtherStuff <- c("etc","etc")
myObj <- list(myDF=myDF,myOtherStuff=myOtherStuff)
#These work...
myObj$myDF$myPredictor <- relevel(myObj$myDF$myPredictor, ref="White")
str(myObj$myDF$myPredictor) #"White" is now the referent level
myObj[["myDF"]]["myPredictor"] <- relevel(myObj$myDF$myPredictor, ref="Red")
str(myObj$myDF$myPredictor) #"Red" is now the referent level
#How to construct relevel assignment statement from strings?
anObj <- "myObj"
aPredictor <- "myPredictor"
aRef <- "Green"
#Produces error
as.name(paste0(anObj,"$myDF$",aPredictor)) <- relevel(as.name(paste0(anObj,"$myDF$",aPredictor)), ref=aRef)
Here's a way to solve this using expression arithmetic. Our task is to construct and evaluate the following expression:
myObj$myDF[[aPredictor]] <- relevel( myObj$myDF[[aPredictor]], ref=aRef )
Step 1: Convert the string "myObj" to a symbolic name:
sObj <- rlang::sym(anObj) # Option 1
sObj <- as.name(anObj) # Option 2
Step 2: Construct the expression myObj$myDF[[aPredictor]]:
e1 <- rlang::expr( (!!sObj)$myDF[[aPredictor]] )
Here, we use !! to tell rlang::expr that we want to replace sObj with whatever symbol is stored inside that variable. Without !!, the expression would be sObj$myDF[[aPredictor]], which is not quite what we want.
Step 3: Construct the target expression:
e2 <- rlang::expr( !!e1 <- relevel(!!e1, ref=aRef) )
As before, !! replaces e1 with whatever expression is stored inside it (i.e., what we constructed in Step 2).
Step 4: Evaluate the expression and inspect the result:
eval.parent(e2)
## The column is now correctly releveled to Green
myObj$myDF$myPredictor
# [1] Red White Black Red Green Black White Black Red Green Black
# Levels: Green Black Red White
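A base R alternative, sketched under the assumption that the object lives in the current environment: fetch it with get(), modify it with [[ ]] indexing inside a small function, and write it back with assign(). The function name relevel_column is hypothetical, and a plain data.frame replaces the tibble so that no extra packages are needed.

```r
# Hypothetical helper: relevel one factor column inside obj$myDF
relevel_column <- function(obj, predictor, ref) {
  obj$myDF[[predictor]] <- relevel(obj$myDF[[predictor]], ref = ref)
  obj
}

myObj <- list(
  myDF = data.frame(
    myPredictor = factor(c("Red", "White", "Black", "Green", "Black")),
    myOutcome   = c(1, 0, 1, 0, 1)
  ),
  myOtherStuff = c("etc", "etc")
)

anObj <- "myObj"
aPredictor <- "myPredictor"
aRef <- "Green"

# Look the object up by name, relevel, and store it back under the same name
assign(anObj, relevel_column(get(anObj), aPredictor, aRef))
print(levels(myObj$myDF$myPredictor))
```

If the object itself can be passed around instead of its name, the get()/assign() pair is unnecessary and relevel_column(myObj, aPredictor, aRef) is enough.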

How to automate numerous iterations/combinations for a function's arguments in R?

I am somewhat new to R and I am currently trying to automate all possible iterations for a function's arguments in R. What I mean by this is say we have a function with five arguments that can either be TRUE or FALSE. I want to run every possible combination of these arguments and dump them all to several different variables.
Given the number of arguments and their binary nature (10 possible arguments, pick 5), this creates 252 possible argument combinations for the function. Is there any way to automate this process? Or am I stuck generating all 252 possible combinations in code lines? I'm using the auto.arima function, and want to test all possible combinations of lambda, allowmean, allowdrift, seasonal, and stationary. Here's the function:
ARIMA1<-auto.arima(x, d=NA, D=NA, max.p=5, max.q=5,
max.P=2, max.Q=2, max.order=5, max.d=2, max.D=1,
start.p=2, start.q=2, start.P=1, start.Q=1,
stationary=FALSE, seasonal=TRUE,
ic=c("aicc", "aic", "bic"), stepwise=TRUE, trace=FALSE,
approximation=(length(x)>100 | frequency(x)>12), xreg=NULL,
test=c("kpss","adf","pp"), seasonal.test=c("ocsb","ch"),
allowdrift=TRUE, allowmean=TRUE, lambda=NULL, biasadj=FALSE,
parallel=FALSE, num.cores=2)
One method to do this would be to use expand.grid to construct a data.frame with all possible combinations, then loop through the rows of this data.frame and fill in the values.
Here's an example:
x <- 1:5
y <- 2:6
z <- c(1,3)
aa <- letters[7:10]
w <- c(FALSE, TRUE)
myInputs <- expand.grid(x, y, z, aa, w)
for(i in seq_len(nrow(myInputs))) {
print(myInputs[i, 1] + myInputs[i, 2] + myInputs[i, 3])
}
You can instead use these inputs to feed any function that you want.
Since expand.grid outputs a data.frame, there is no problem in mixing up vector types (like numeric, logical, character).
However, if you are inputting character vectors and you want them to remain character (rather than being converted into factor variables), make sure to include the stringsAsFactors=FALSE argument.
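Applied to the question's setting, five arguments that are each TRUE or FALSE give 2^5 = 32 combinations. The sketch below builds that grid and calls a function once per row with do.call(). demo_fun is a hypothetical stand-in for a wrapper around auto.arima, and use_lambda is an assumed flag name, since lambda itself is not a logical argument.

```r
# All TRUE/FALSE combinations of five flags: 2^5 = 32 rows
flags <- expand.grid(
  allowmean  = c(FALSE, TRUE),
  allowdrift = c(FALSE, TRUE),
  seasonal   = c(FALSE, TRUE),
  stationary = c(FALSE, TRUE),
  use_lambda = c(FALSE, TRUE)
)

# Hypothetical stand-in for the real model call; it just counts TRUE flags
demo_fun <- function(allowmean, allowdrift, seasonal, stationary, use_lambda) {
  sum(allowmean, allowdrift, seasonal, stationary, use_lambda)
}

# Call the function once per row, passing the row's columns as named arguments
results <- lapply(seq_len(nrow(flags)), function(i) {
  do.call(demo_fun, as.list(flags[i, ]))
})

print(nrow(flags))
```

Keeping the results in a list means each element lines up with the row of flags that produced it, so there is no need for 32 separately named variables.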

apply using values of each line in a data.frame as parameters in R

I have one data.frame as follows; each row holds the data for a different stock:
Teste=data.frame(matrix(runif(25), nrow=5, ncol=5))
colnames(Teste) <- c("AVG_VOLUME","AVG_RETURN","VOL","PRICE","AVG_XX")
AVG_VOLUME AVG_RETURN VOL PRICE AVG_XX
1 0.7028197 0.9264265 0.2169411 0.80897110 0.3047671
2 0.7154557 0.3314615 0.4839466 0.63529520 0.5633933
3 0.4038030 0.4347487 0.3441471 0.07028743 0.7704912
4 0.5392530 0.6414982 0.4482528 0.11087518 0.3512511
5 0.8720084 0.9615865 0.8081017 0.45781973 0.0137508
What I want to do is apply the function GBM from package sde (https://cran.r-project.org/web/packages/sde/sde.pdf) using the columns AVG_RETURN, VOL, PRICE as arguments for every row in the data.frame.
Something like this :
Result <- apply(Teste,1,function(x) {
GBM(x[,"PRICE"],x[,"AVG_RETURN"],x[,"VOL"],1,252)
})
So I want Result to be a data.frame holding a GBM run for each stock in the Teste data.frame.
How can I get this result?
The answer to the narrow question about why you are getting errors is that apply passes each row to the function as a vector rather than a dataframe, so removing the commas in the arguments to "[" will get you a result.
Result <- apply(Teste,1,function(x) {
GBM(x["PRICE"],x["AVG_RETURN"],x["VOL"],1,252)
})
If you need it to be a dataframe where each stock would be a column, and the input datastructure has meaningful stock names, then I suggest using:
dfRes <- setNames( data.frame(Result), rownames(Teste) )
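To check the row-as-vector behaviour without installing sde, here is a sketch with a hand-written stand-in. gbm_path is hypothetical and only mimics the general shape of a geometric Brownian motion path (starting value plus N steps); it is not the sde implementation.

```r
set.seed(42)
Teste <- data.frame(matrix(runif(25), nrow = 5, ncol = 5))
colnames(Teste) <- c("AVG_VOLUME", "AVG_RETURN", "VOL", "PRICE", "AVG_XX")

# Hypothetical stand-in for sde::GBM: one simulated path with N steps
gbm_path <- function(x0, r, sigma, horizon = 1, N = 100) {
  dt <- horizon / N
  steps <- (r - sigma^2 / 2) * dt + sigma * sqrt(dt) * rnorm(N)
  c(x0, x0 * exp(cumsum(steps)))
}

# Inside apply(), x is a named numeric vector, so index it by name only
Result <- apply(Teste, 1, function(x) {
  gbm_path(x[["PRICE"]], x[["AVG_RETURN"]], x[["VOL"]], 1, 252)
})

print(dim(Result))
```

Each column of Result is the path for one row of Teste, which is why wrapping it in data.frame() gives one column per stock.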
I think the only way this could be meaningful in a risk analysis context is if many more simulation runs than these single instances are assembled in some higher level context.

Resources