I try to calculate the column means for diffrent groups in R. there exist several methods to assign groups and so two columns where created that contain diffrent groupings.
# create a test df
df.abcd.2<-data.frame(Grouping1=c("a","f","a","d","d","f","a"),Grouping2=c("y","y","z","z","x","x","q"),Var1=sample(1:7),Var2=sample(1:7),Var3=rnorm(1:7))
df.abcd.2
Now I created a loop with assign, lapply, split and colMeans to get my results and store the in diffrent dfs. The loop works fine.
#Loop to create the colmeans and store them in dataframes
for (i in 1:2){
nam <- paste("RRRRRR",deparse(i), sep=".")
assign(nam, as.data.frame(
lapply(
split(df.abcd.2[,3:5], df.abcd.2[,i]), colMeans)
)
)
}
So now i would like to create a function to apply this method on diffrent dataframes. My attemp looked like this:
# 1. function to calculate colMeans for diffrent groups
# df= desired datatframe,
# a=starting column: beginning of the columns that contain the groups, b= end of columns that contain the groups
# c=startinc column: beginning of columns to be analized, d=end of columns do be analized
function.split.colMeans<-function(df,a,b,c,d)
{for (i in a:b){
nam <- paste("OOOOO",deparse(i), sep=".")
assign(nam, as.data.frame(
lapply(
split(df[,c:d], df[,i]), colMeans)
)
)
}
}
#test the function
function.split.colMeans(df.abcd.2,1,2,3,5)
So when I test this function I get neither an error message nor results... Can anyone help me out, please?
It's working perfectly. Read the help for assign. Learn about frames and environments.
In other words, its creating the variables inside your function, but they don't leak out into the environment you see when you do ls() at the command line. If you put print(ls()) inside your functions loop you'll see them, but when the function ends, they disappear.
Normally, the only way functions interact with their calling environment is by their return value. Any other method is entering a whole world of pain.
DONT use assign to create things with sequential or informative names. Ever. Unless you know what you are doing, which you don't... Stick them in lists, then you can index the parts for looping and so on.
Related
Very green R user here. Sorry if this is asked and answered somewhere else, I haven't been able to find anything myself.
I can't figure out why I can't get a for loop to work define multiple new dataframes but looping through a predefined list.
My list is defined from a subset of variable names from an existing dataframe:
varnames <- colnames(dplyr::select(df_response, -1:-4))
Then I want to loop through the list to create a new dataframe for each variable name on the list containing the results of summary function:
for (i in varnames){
paste0("df_",i) <- summary(paste0("df$",i))
}
I've tried variations with and without paste function but I can't get anything to work.
paste0 returns a string. Both <- and $ require names, but you can use assign and [[ instead. This should work:
varnames <- colnames(dplyr::select(df_response, -1:-4))
for (i in varnames){
assign(
paste0("df_", i),
summary(df[[i]])
)
}
Anyway, I'd suggest working with a list and accessing the summaries with $:
summaries <- df_response |>
dplyr::select(-(1:4)) |>
purrr::imap(summary)
summaries$firstvarname
I'm having issues with a specific problem I have a dataset of a ton of matrices that all have V1 as their column names, essentially NULL. I'm trying to write a loop to replace all of these with column names from a list but I'm running into some issues.
To break this down to the most simple form, this code isn't functioning as I'd expect it to.
nameofmatrix <- paste('column_', i, sep = "")
colnames(eval(as.name(nameofmatrix))) <- c("test")
I would expect this to take the value of column_1 for example, and replace (in the 2nd line) with "test" as the column name.
I tried to break this down smaller, for example, if I run print(eval(as.name(nameofmatrix)) I get the object's column/rows printed as expected and if I run print(colnames(eval(as.name(nameofmatrix))) I'm getting NULL as expected for the column header (since it was set as V1).
I've even tried to manually type in the column name, such as colnames(column_1) <- c("test) and this successfully works to rename the column. But once this variable is put in the text's place as shown above, it does not work the same. I'm having difficulties finding a solution on how to rename several matrix columns after they have been created with this method. Does anyone have any advice or suggestions?
Note, the error I'm receiving on trying to run this is
Error in eval([as.name](nameofmatrix)) <- \`vtmp\` : could not find function "eval<-"
We could return the values of the objects in a list with get (if there are multiple objects use mget, then rename the objects in the list and update those objects in the global env with list2env
list2env(lapply(mget(nameofmatrix), function(x) {colnames(x) <- newnames
x}), .GlobalEnv)
It can also be done with assign
data(mtcars)
nameofobject <- 'mtcars'
assign(nameofobject, `colnames<-`(get(nameofobject),
c('mpg1', names(mtcars)[-1])))
Now, check the names of 'mtcars'
names(mtcars)[1]
#[1] "mpg1"
I want to use function for repetitively making up set with different names.
for example, if I have 5 random vectors.
number1<-sample(1:10, 3)
number2<-sample(1:10, 3)
number3<-sample(1:10, 3)
number4<-sample(1:10, 3)
number5<-sample(1:10, 3)
Then, I will use these vectors for selecting rows in raw data set(i.e. dataframe)
testset1<-raw[number1,]
testset2<-raw[number2,]
testset3<-raw[number3,]
tsetset4<-raw[number4,]
testset5<-raw[number5,]
It takes lot of spaces in manuscript for writing up each commands. I'm trying to shorten these commands with using 'function'
However, I found that it is hard to use variables in a function statement for writing 'text argument'. For example, it is easy to use variables like this.
mean_function<-function(x){
mean(x)
}
But, I want to use function like this.
testset "number with 1-5" <-raw[number"number 1-5",]
I would really appreciate your help.
You don't need to create a function for this task, simply use lapply to loop over the list of elements produced by mget(), then set some names and finally put all results in the global environment:
rowSelected <-lapply(mget(paste0("number", 1:5)), function(x) raw[x, ])
names(rowSelected) <- paste0("testset", 1:5)
list2env(rowSelected, envir = .GlobalEnv)
I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")}
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop since lapply would require the <<- assignment symbol (<- doesn't work on lapply`). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible() otherwise lapply would print the output on the console. The <<- assigns the vector runif(...) to the new created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
Sometimes I have code which references a specific dataset based on some variable ID. I have then been creating lines of code using paste0, and then eval(parse(...)) that line to execute the code. This seems to be getting sloppy as the length of the code increases. Are there any cleaner ways to have dynamic data reference?
Example:
dataset <- "dataRef"
execute <- paste0("data.frame(", dataset, "$column1, ", dataset, "$column2)")
eval(parse(execute))
But now imagine a scenario where dataRef would be called for 1000 lines of code, and sometimes needs to be changed to dataRef2 or dataRefX.
Combining the comments of Jack Maney and G.Grothendieck:
It is better to store your data frames that you want to access by a variable in a list. The list can be created from a vector of names using get:
mynames <- c('dataRef','dataRef2','dataRefX')
# or mynames <- paste0( 'dataRef', 1:10 )
mydfs <- lapply( mynames, get )
Then your example becomes:
dataset <- 'dataRef'
mydfs[[dataset]][,c('column1','column2')]
Or you can process them all at once using lapply, sapply, or a loop:
mydfs2 <- lapply( mydfs, function(x) x[,c('column1','column2')] )
#G.Grothendieck has shown you how to use get and [ to elevate a character value and return the value of a named object and then reference named elements within that object. I don't know what your code was intended to accomplish since the result of executing htat code would be to deliver values to the console, but they would not have been assigned to a name and would have been garbage collected. If you wanted to use three character values: objname, colname1 and colname2 and those columns equal to an object named after a fourth character value.
newname <- "newdf"
assign( newname, get(dataset)[ c(colname1, colname2) ]
The lesson to learn is assign and get are capable of taking character character values and and accessing or creating named objects which can be either data objects or functions. Carl_Witthoft mentions do.call which can construct function calls from character values.
do.call("data.frame", setNames(list( dfrm$x, dfrm$y), c('x2','y2') )
do.call("mean", dfrm[1])
# second argument must be a list of arguments to `mean`