paste not returning values concatenated - r

I am trying to get the column names of a dataframe to use them in another call, but this apply call returns the values separated, instead of concatenated correctly. What did I do wrong here?
df<-data.frame(c(1,2,3),c(4,5,6))
colnames(df)<-c("hi","bye")
apply(df,2,function(x){
paste("subscale_scores$",colnames(x),sep="")
#this is the command I am eventually trying to run
#lm(paste("subscale_scores",colnames(x))~surveys$npitotal+ipip$extraversion+ipip$agreeableness+ipip$conscientiousness+ipip$emotionalStability+ipip$intelImagination)
})
Goal output:
subscale_scores$hi
subscale_scores$bye

Is there any need for the apply?
Is this what you mean?
paste0('subscale_scores$', names(df))
# [1] "subscale_scores$hi" "subscale_scores$bye"
if you need them concatenated by newline say, add , sep='\n'.
The paste0 is shorthand for paste(..., sep="").
A note on your lm call later - if you want to do lm(Y ~ ...) where Y is each of your columns separately, try:
lms <- lapply(colnames(df),
function (y) {
# construct your formula
frm <- paste0('subscale_scores$', y, ' ~ surveys$npitotal+ipip$extraversion+ipip$agreeableness+ipip$conscientiousness+ipip$emotionalStability+ipip$intelImagination')
lm(frm)
})
names(lms) <- colnames(df)
Then lms$hi will contain the output of lm(subscale_scores$hi ~ ...) and so on.
Or if the aim was to combine all the columns together (Y1 + Y2 ~ ...)
Then paste0('subscale_scores$', names(df), collapse='+') will give you subscale_scores$hi+subscale_scores$bye

How about this?
unlist(lapply(colnames(df),function(x){
paste("subscale_scores$",x,sep="")
}))

Related

R: Define ranges from text using regex

I need a way to call defined variables dependant from a string within text.
Let's say I have five variables (r010, r020, r030, r040, r050).
If there is a given text in that form "r010-050" I want to have the sum of values from all five variables.
The whole text would look like "{r010-050} == {r060}"
The first part of that equation needs to be replaced by the sum of the five variables and since r060 is also a variable the result (via parsing the text) should be a logical value.
I think regex will help here again.
Can anyone help?
Thanks.
Define the inputs: the variables r010 etc. which we assume are scalars and the string s.
Then define a pattern pat which matches the {...} part and a function Sum which accepts the 3 capture groups in pat (i.e. the strings matched to the parts of pat within parentheses) and performs the desired sum.
Use gsubfn to match the pattern, passing the capture groups to Sum and replacing the match with the output of Sum. Then evaluate it.
In the example the only variables in the global environment whose names are between r010 and r050 inclusive are r010 and r020 (it would have used more had they existed) and since they sum to r060 it returned TRUE.
library(gsubfn)
# inputs
r010 <- 1; r020 <- 2; r060 <- 3
s <- "{r010-050} == {r060}"
pat <- "[{](\\w+)(-(\\w+))?[}]"
Sum <- function(x1, x2, x3, env = .GlobalEnv) {
x3 <- if(x3 == "") x1 else paste0(gsub("\\d", "", x1), x3)
lst <- ls(env)
sum(unlist(mget(lst[lst >= x1 & lst <= x3], envir = env)))
}
eval(parse(text = gsubfn(pat, Sum, s)))
## [1] TRUE

R: created a names vector containing the means of multiple numeric vectors

I have over 20 numeric vectors which consist of a series of values. each vector is distinguished by a letter, e.g. val_a, val_b, val_c etc...
I would like to put the means from each of these vectors into a single named vector. I could of course do this in a laborious manner like so:
obs <- c("val_a" = round(mean(val_a),3),
"val_b" = round(mean(val_b),3),
"val_c" = round(mean(val_c),3))
But with 20 vectors this then becomes tedious to write out, and not to mention an inelegant solution. How can I create the named vector in a more succinct way? I have made an attempt using a for loop, as so:
obs <- c(for (j in 1:20) {
assign(paste("val",letters[j], sep = "_"),
mean(as.name(paste('val',letters[j], sep = '_'))),)
})
In the right hand argument passed to assign, "as.name" is used in order to remove the quotation marks from output of "paste". So the second argument passed to assign returns a character which has the exact same name as the numeric vector that I want get the mean of, e.g. val_a. But I get the error messsage:
Warning messages:
1: In mean.default(as.name(paste("val", letters[j], sep = "_"))) :
argument is not numeric or logical: returning NA
Does anyone know how to accomplish this?
Solution
To build on bouncyball's comment so you have a full answer, you can do this:
sapply(paste('val', letters[1:20], sep='_'), function(x) round(mean(get(x)), 3))
Explanation
For an object in your environment called x, get("x") will return x. See help("get"). Then we can do this for every element of paste('val', letters[1:20], sep='_') using sapply(), or if you like, a loop.
Example
val_a <- rnorm(100)
val_b <- rnorm(100)
val_c <- rnorm(100)
sapply(paste('val', letters[1:3], sep='_'), function(x) round(mean(get(x)), 3))
val_a val_b val_c
-0.09328504 -0.15632654 -0.09759111

R: dynamic arguments

I'm using an R function which requires a list of variables as input arguments in the following format:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Basically the input arguments are column names in the table (and are not to be enclosed in ""). I have a list of these variables that I want to add one by one; i.e. run the function with one variable first, get the output, and incrementally adding variables (getting an output each time) i.e.
iteration 1:
output <- funName(gender ~ height, data=tableName)
iteration 2:
output <- funName(gender ~ height + weight, data=tableName)
iteration 3:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Is this possible?
Try the following:
# vector of variable names
myNames <- c("gender", "height", "weight", "varName4")
# print out results
for(i in 2:4) {
print(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))))
}
Of course, you can replace print with the appropriate funName, such as lm, along with additional arguments. So
for(i in 2:4) {
lm(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))), data=tableName)
}
Should work as you would expect it to. You could also use lapply if you wanted to save the results in an orderly fashion:
temp <- lapply(2:4, function(i) as.formula(paste(myNames[1], "~",
paste(myNames[2:i], collapse="+"))))
will save a list of formulas, for example.
Using the reformulate function as mentioned by #ben-bolker, you can simplify the web of paste functions:
for(i in 2:4) {
print(reformulate(myNames[2:i], response = myNames[1], intercept = TRUE))
}

How do I convert this for loop into something cooler like by in R

uniq <- unique(file[,12])
pdf("SKAT.pdf")
for(i in 1:length(uniq)) {
dat <- subset(file, file[,12] == uniq[i])
names <- paste("Sample_filtered_on_", uniq[i], sep="")
qq.chisq(-2*log(as.numeric(dat[,10])), df = 2, main = names, pvals = T,
sub=subtitle)
}
dev.off()
file[,12] is an integer so I convert it to a factor when I'm trying to run it with by instead of a for loop as follows:
pdf("SKAT.pdf")
by(file, as.factor(file[,12]), function(x) { qq.chisq(-2*log(as.numeric(x[,10])), df = 2, main = paste("Sample_filtered_on_", file[1,12], sep=""), pvals = T, sub=subtitle) } )
dev.off()
It works fine to sort the data frame by this (now a factor) column. My problem is that for the plot title, I want to label it with the correct index from that column. This is easy to do in the for loop by uniq[i]. How do I do this in a by function?
Hope this makes sense.
A more vectorized (== cooler?) version would pull the common operations out of the loop and let R do the book-keeping about unique factor levels.
dat <- split(-2 * log(as.numeric(file[,10])), file[,12])
names(dat) <- paste0("IoOPanos_filtered_on_pc_", names(dat))
(paste0 is a convenience function for the common use case where normally one would use paste with the argument sep=""). The for loop is entirely appropriate when you're running it for its side effects (plotting pretty pictures) rather than trying to capture values for further computation; it's definitely un-cool to use T instead of TRUE, while seq_along(dat) means that your code won't produce unexpected results when length(dat) == 0.
pdf("SKAT.pdf")
for(i in seq_along(dat)) {
vals <- dat[[i]]
nm <- names(dat)[[i]]
qq.chisq(val, main = nm, df = 2, pvals = TRUE, sub=subtitle)
}
dev.off()
If you did want to capture values, the basic observation is that your function takes 2 arguments that vary. So by or tapply or sapply or ... are not appropriate; each of these assume that just a single argument is varying. Instead, use mapply or the comparable Map
Map(qq.chisq, dat, main=names(dat),
MoreArgs=list(df=2, pvals=TRUE, sub=subtitle))

Entering variables into regression function

I have this feature_list that contains several possible values, say "A", "B", "C" etc. And there is time in time_list.
So I will have a loop where I will want to go through each of these different values and put it in a formula.
something like for(i in ...) and then my_feature <- feature_list[i] and my_time <- time_list[i].
Then I put the time and the chosen feature to a dataframe to be used for regression
feature_list<- c("GPRS")
time_list<-c("time")
calc<-0
feature_dim <- length(feature_list)
time_dim <- length(time_list)
data <- read.csv("data.csv", header = TRUE, sep = ";")
result <- matrix(nrow=0, ncol=5)
errors<-matrix(nrow=0, ncol=3)
for(i in 1:feature_dim) {
my_feature <- feature_list[i]
my_time <- time_list[i]
fitdata <- data.frame(data[my_feature], data[my_time])
for(j in 1:60) {
my_b <- 0.0001 * (2^j)
for(k in 1:60) {
my_c <- 0.0001 * (2^k)
cat("Feature: ", my_feature, "\t")
cat("b: ", my_b, "\t")
cat("c: ", my_c, "\n")
err <- try(nlsfit <- nls(GPRS ~ 53E5*exp(-1*b*exp(-1*c*time)), data=fitdata, start=list(b=my_b, c=my_c)), silent=TRUE)
calc<-calc+1
if(class(err) == "try-error") {
next
}
else {
coefs<-coef(nlsfit)
ess<-deviance(nlsfit)
result<-rbind(result, c(coefs[1], coefs[2], ess, my_b, my_c))
}
}
}
}
Now in the nls() call I want to be able to call my_feature instead of just "A" or "B" or something and then to the next one on the list. But I get an error there. What am I doing wrong?
You can use paste to create a string version of your formula including the variable name you want, then use either as.formula or formula functions to convert this to a formula to pass to nls.
as.formula(paste(my_feature, "~ 53E5*exp(-1*b*exp(-1*c*time))"))
Another option is to use the bquote function to insert the variable names into a function call, then eval the function call.
I worked with R a while ago, maybe you can give this a try:
What you want is create a formula with a list of variables right?
so if the response variable is the first element of your list and the others are the explanatory variables you could create your formula this way:
my_feature[0] ~ reduce("+",my_feature[1:]) . This might work.
this way you can create formulae that depends on the variables in my_features.

Resources