Problems with renaming columns via variables in R


I'm having trouble with a specific problem. I have a dataset made up of a ton of matrices that all have V1 as their column name (their colnames are essentially NULL). I'm trying to write a loop that replaces all of these with column names from a list, but I'm running into some issues.
To break this down to the most simple form, this code isn't functioning as I'd expect it to.
nameofmatrix <- paste('column_', i, sep = "")
colnames(eval(as.name(nameofmatrix))) <- c("test")
I would expect this to take, for example, column_1 and (in the second line) set its column name to "test".
I tried to break this down further. For example, if I run print(eval(as.name(nameofmatrix))) I get the object's rows/columns printed as expected, and if I run print(colnames(eval(as.name(nameofmatrix)))) I get NULL for the column header, also as expected (since it was set as V1).
I've even tried typing in the object name manually, such as colnames(column_1) <- c("test"), and this successfully renames the column. But once the variable is put in place of the name as shown above, it does not work the same way. I'm having difficulty finding a way to rename the columns of several matrices after they have been created this way. Does anyone have any advice or suggestions?
Note, the error I'm receiving when trying to run this is:
Error in eval(as.name(nameofmatrix)) <- `*vtmp*` : could not find function "eval<-"

We could return the values of the objects in a list with get (or with mget if there are multiple objects), then rename the columns of each object in the list and update those objects in the global environment with list2env:
list2env(lapply(mget(nameofmatrix), function(x) {
  colnames(x) <- newnames  # newnames: the vector of replacement column names
  x
}), .GlobalEnv)
It can also be done with assign:
data(mtcars)
nameofobject <- 'mtcars'
assign(nameofobject, `colnames<-`(get(nameofobject),
c('mpg1', names(mtcars)[-1])))
Now, check the names of 'mtcars'
names(mtcars)[1]
#[1] "mpg1"
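A minimal, self-contained sketch of both answers, assuming a few one-column matrices named column_1, column_2 and column_3 (hypothetical stand-ins for the question's data). The point is that get()/mget() turn a character string into the object, and assign()/list2env() write the modified copy back under the same name, so `colnames<-` is applied to a real object rather than to a call such as eval(...).

```r
env <- environment()  # where the matrices live (the global env when run at top level)

# Hypothetical setup: three one-column matrices without column names
for (i in 1:3) {
  assign(paste0("column_", i), matrix(runif(4), ncol = 1), envir = env)
}

# Approach 1: get() a copy, rename it, assign() it back under the same name
for (i in 1:3) {
  nameofmatrix <- paste0("column_", i)
  m <- get(nameofmatrix, envir = env)
  colnames(m) <- "test"
  assign(nameofmatrix, m, envir = env)
}
colnames(column_2)   # "test"

# Approach 2: fetch all matrices at once, rename them in a list, write them back
mat_names <- paste0("column_", 1:3)
newnames  <- "renamed"   # hypothetical replacement name(s), one per column
invisible(list2env(lapply(mget(mat_names, envir = env), function(x) {
  colnames(x) <- newnames
  x
}), envir = env))
colnames(column_3)   # "renamed"
```

Keeping the matrices in a named list from the start avoids the get/assign round trip entirely.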

Related

Converting column names to upper case in a list of data frames using lapply explanation

This is probably a very simple problem but I have been struggling to search for this issue. Basically, I am using lapply to convert the column names to upper in a list of dataframes. My first attempt did not work, however adding ;x works. What exactly is going on?
This does not work:
df.list <- lapply(df.list,function(x) colnames(x) <- toupper(colnames(x)))
This does:
df.list <- lapply(df.list,function(x) {colnames(x) <- toupper(colnames(x));x})

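An R function returns the value of its last expression. In the first version, the last (and only) expression is the assignment colnames(x) <- ..., whose value is the new vector of column names, so lapply returns a list of name vectors instead of data frames. Since you are modifying x (or in this case only the colnames of x) inside the function, you have to return the modified object explicitly. That is what ;x does: the semicolon separates statements like a new line, and the final x makes the modified object the return value.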
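A quick way to see the difference, using two small hypothetical data frames. The value of an assignment such as colnames(x) <- ... is its right-hand side (returned invisibly), which is what the first version ends up returning. The setNames() variant is shown as one alternative that returns the renamed object directly.

```r
df.list <- list(a = data.frame(x = 1, y = 2), b = data.frame(p = 3))

# Last expression is the assignment, so each element is the new name vector
res1 <- lapply(df.list, function(x) colnames(x) <- toupper(colnames(x)))
res1$a   # "X" "Y"  (a character vector, not a data frame)

# Returning x explicitly gives back the modified data frames
res2 <- lapply(df.list, function(x) {
  colnames(x) <- toupper(colnames(x))
  x
})
names(res2$a)   # "X" "Y"

# Equivalent: setNames() returns the renamed copy, so no explicit return is needed
res3 <- lapply(df.list, function(x) setNames(x, toupper(names(x))))
```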

R creating variable tables from list of variable names

I am currently trying to create a table from a list of variable names (something I feel should be relatively simple) and I can't, for the life of me, figure out how to do it correctly.
I have a data table that I've named 'file', and there is a list of 3 variable names within this file. What I want to do is create a table of each variable and then rbind them together. For further context, these few lines of code will be worked into a much larger function, so the list of variable names must be able to accommodate however many variables the user defines.
I have tried the following:
file<-as.data.table(dt)
variable_list<-list("outcome", "type")
for (variable in variable_list){
var_table<-as.data.table(table(file$variable_list))
na_table<-as.data.table(table(is.na(file$variable)))
}
When I run the above code, R returns empty tables of var_table and na_table. What am I doing wrong?

An option is to loop over `variable_list`, extract each column, apply `table`, and `rbind` the results with `do.call`:
do.call(rbind, lapply(variable_list, function(nm) table(file[[nm]])))
NOTE: this assumes that the columns have the same levels.
If the levels are not the same, make them the same by converting the columns to factor with the levels specified:
lvls <- na.omit(sort(unique(unlist(file[, unlist(variable_list), with = FALSE]))))
do.call(rbind, lapply(variable_list, function(nm)
table(factor(file[[nm]], levels = lvls))))
Or if we have a data.table, use the data.table methods
rbindlist(lapply(variable_list, function(nm) file[, .N,by = c(nm)]), fill = TRUE)
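To see why the shared levels matter, here is a small sketch on a hypothetical base-R data frame (the question's object is a data.table, where the selection would be written file[, unlist(variable_list), with = FALSE]). rbind combines the tables by position, not by level name, so every table needs the same set of levels in the same order; converting each column to a factor with common levels guarantees that.

```r
dat <- data.frame(q1 = c("yes", "no", "yes"),
                  q2 = c("no", "no", "maybe"),
                  stringsAsFactors = FALSE)
variable_list <- list("q1", "q2")

# Union of all values across the selected columns; sort() drops NAs by default
lvls <- sort(unique(unlist(dat[unlist(variable_list)])))

# One row per variable, one column per level, zero where a level does not occur
tab <- do.call(rbind, lapply(variable_list, function(nm)
  table(factor(dat[[nm]], levels = lvls))))
rownames(tab) <- unlist(variable_list)
tab
```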

The problem (or at least one of the problems) may be that you are using the $ operator incorrectly. $ does not evaluate its second argument, so file$variable looks for a column literally named "variable"; you cannot substitute a text value there. You can use [[, which does evaluate its argument, instead of $. So this would be a possible improvement (I've not tested it since you provided no test material):
file<-as.data.table(dt)
variable_list<-list("outcome", "type")
for (variable in variable_list){
var_table<-as.data.table(table(file[[variable]])) # clearly not variable_list
na_table<-as.data.table(table(is.na(file[[variable]] )))
}
I'm guessing you might have tried something like ...
var_table <- file[, table(variable)]
... since in data.table syntax the expressions inside [ ] are evaluated within the table itself (which in this case is confusingly named "file"), so column names can be used there directly. It's better not to use such names, since there is also an R function called file.
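A small sketch of the $ versus [[ difference, using a hypothetical base-R data frame in place of the question's table (file[[variable]] behaves the same way on a data.table, which is also a list). It also stores the results per variable, because the original loop overwrites var_table and na_table on every pass, so only the last variable's tables would survive.

```r
# Hypothetical toy data standing in for the question's `file`
dat <- data.frame(outcome = c("yes", "no", "yes", NA),
                  type    = c("a", "b", "a", "a"))
variable <- "outcome"

dat$variable            # NULL: `$` looks for a column literally named "variable"
table(dat[[variable]])  # `[[` evaluates `variable`, so this tabulates "outcome"

# Keep one pair of tables per variable instead of overwriting them on each pass
variable_list <- list("outcome", "type")
results <- list()
for (variable in variable_list) {
  results[[variable]] <- list(
    var_table = as.data.frame(table(dat[[variable]])),
    na_table  = as.data.frame(table(is.na(dat[[variable]])))
  )
}
results$outcome$na_table
```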

How to remove the first row from multiple dataframes?

I have multiple dataframes and would like to remove the first row in all of them.
I have tried using a for loop but cannot understand what I am doing wrong:
for (i in cities){
i <- i[-1, ]
}
I get the following error code:
Error in i[-1, ] : incorrect number of dimensions

If we assume that the only objects in your workspace are dataframes then this might succeed:
cities <- objects()
for (i in cities) { assign(i, get(i)[-1,])}
Explanation:
Two things were wrong with the original code:
First, as already mentioned in the comments, "df" is not the same as df. You need get to convert a character value into the object that has that name; the result of objects() is only a character vector. In R the term "name" means a "language object" (see the help page ?mode). (There is potential confusion with row names and column names, which are always of class "character".) R is not like SAS, which is a macro language that has no such distinction.
Second, the loop tried to have i substituted by its value on the left-hand side of <-. That would have failed even if you were working with actual R names. The assign function is designed to take character values, which it then converts to R names.

Say you get a list of all the tables in your environment and call that list cities. You can't just iterate over each value of cities and change things, because in that list they are just character strings (the names), not the tables themselves.
Here is what you need:
for (i in cities){
tmp <- get(i) # load the actual table
tmp <- tmp[-1, ] # remove first row
assign(i, tmp) # re-assign table to original table name
}
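A self-contained sketch of the get()/assign() answer, plus a list-based alternative, assuming two hypothetical data frames named london and paris. drop = FALSE is added so that a one-column data frame stays a data frame after the row is removed.

```r
# Hypothetical data frames standing in for the question's objects
london <- data.frame(id = 1:3, temp = c(10, 12, 11))
paris  <- data.frame(id = 1:4, temp = c(14, 15, 13, 16))
cities <- c("london", "paris")   # names of the objects, as character strings

# List-based alternative: mget() collects the objects by name into a named list,
# then lapply() drops the first row of each
city_list <- lapply(mget(cities), function(df) df[-1, , drop = FALSE])

# get()/assign() version from the answers: overwrite each named object
for (nm in cities) {
  assign(nm, get(nm)[-1, , drop = FALSE])
}
nrow(london)  # 2
nrow(paris)   # 3
```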

Is there a more efficient/clean approach to an eval(parse(paste0( set up?

Sometimes I have code which references a specific dataset based on some variable ID. I have then been creating lines of code using paste0, and then eval(parse(...)) that line to execute the code. This seems to be getting sloppy as the length of the code increases. Are there any cleaner ways to have dynamic data reference?
Example:
dataset <- "dataRef"
execute <- paste0("data.frame(", dataset, "$column1, ", dataset, "$column2)")
eval(parse(text = execute))
But now imagine a scenario where dataRef would be called for 1000 lines of code, and sometimes needs to be changed to dataRef2 or dataRefX.

Combining the comments of Jack Maney and G.Grothendieck:
It is better to store your data frames that you want to access by a variable in a list. The list can be created from a vector of names using get:
mynames <- c('dataRef','dataRef2','dataRefX')
# or mynames <- paste0( 'dataRef', 1:10 )
mydfs <- setNames( lapply( mynames, get ), mynames )  # named, so mydfs[[dataset]] works
Then your example becomes:
dataset <- 'dataRef'
mydfs[[dataset]][,c('column1','column2')]
Or you can process them all at once using lapply, sapply, or a loop:
mydfs2 <- lapply( mydfs, function(x) x[,c('column1','column2')] )
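A runnable sketch of the list approach, assuming two hypothetical data frames called dataRef and dataRef2. mget() is used here because it returns a list already named after the objects, which is what lets mydfs[[dataset]] look a data frame up by its name.

```r
# Hypothetical stand-ins for dataRef and dataRef2
dataRef  <- data.frame(column1 = 1:3,   column2 = 4:6,   column3 = 7:9)
dataRef2 <- data.frame(column1 = 10:11, column2 = 12:13, column3 = 14:15)

mynames <- c("dataRef", "dataRef2")
mydfs   <- mget(mynames)   # named list: names(mydfs) is mynames

# Switching datasets is just a matter of changing a string; no eval(parse()) needed
dataset <- "dataRef2"
sub <- mydfs[[dataset]][, c("column1", "column2")]
sub$column1   # 10 11

# The same selection applied to every data frame at once
mydfs2 <- lapply(mydfs, function(x) x[, c("column1", "column2")])
```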

@G.Grothendieck has shown you how to use get and [ to evaluate a character value, return the value of the named object, and then reference named elements within that object. I don't know what your code was intended to accomplish, since executing it would only deliver values to the console; they would not have been assigned to a name and would have been garbage collected. If you wanted to use three character values (the object name in dataset, plus colname1 and colname2) and assign those columns to an object named after a fourth character value:
newname <- "newdf"
assign( newname, get(dataset)[ c(colname1, colname2) ] )
The lesson to learn is that assign and get are capable of taking character values and accessing or creating named objects, which can be either data objects or functions. Carl_Witthoft mentions do.call, which can construct function calls from character values.
do.call("data.frame", setNames(list( dfrm$x, dfrm$y), c('x2','y2') ) )
do.call("mean", dfrm[1])
# second argument must be a list of arguments to `mean`
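A self-contained version of these snippets, assuming a hypothetical data frame dfrm with columns x, y and z and hypothetical character values for the object and column names.

```r
# Hypothetical data frame and character values
dfrm     <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = 7:9)
dataset  <- "dfrm"
colname1 <- "x"
colname2 <- "y"
newname  <- "newdf"

# get() looks up the object named by `dataset`; assign() creates an object called "newdf"
assign(newname, get(dataset)[c(colname1, colname2)])
names(newdf)     # "x" "y"

# do.call() builds a call from a function name and a list of arguments
renamed <- do.call("data.frame", setNames(list(dfrm$x, dfrm$y), c("x2", "y2")))
names(renamed)   # "x2" "y2"

# dfrm[1] is a one-column data frame, i.e. a list, so this calls mean(x = c(1, 2, 3))
do.call("mean", dfrm[1])   # 2
```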

Recoding over multiple data frames in R

(edited to reflect help...I'm not doing great with formatting, but appreciate the feedback)
I'm a bit stuck on what I suspect is an easy enough problem. I have multiple data sets loaded into R, all of which have different numbers of observations, but all of which have three variables named "A1", "A2" and "A3". I want to create a new variable in each of the three data frames that contains the value held in "A1" if A3 is greater than zero, and the value held in "A2" if A3 is less than zero. Seems simple enough, right?
My attempt at this code uses this faux-data:
set.seed(1)
A1=seq(1,100,length=100)
A2=seq(-100,-1,length=100)
A3=runif(100,-1,1)
df1=cbind(A1,A2,A3)
A3=runif(100,-1,1)
df2=cbind(A1,A2,A3)
I'm about a thousand percent sure that R has some functionality for creating the same named variable in multiple data frames, but I have tried doing this with lapply:
mylist=list(df1,df2)
lapply(mylist,function(x){
x$newVar=x$A1
x$newVar[x$A3>0]=x$A2[x$A3>0]
return(x)
})
But the newVar is not available for me once I leave the lapply loop. For example, if I ask for the mean of the new variable:
mean(df1$newVar)
[1] NA
Warning message:
In mean.default(df1$newVar) :
argument is not numeric or logical: returning NA
Any help would be appreciated.
Thank you.

Well first of all, df1 and df2 are not data.frames but matrices (the dollar syntax doesn't work on matrices).
In fact, if you do:
set.seed(1)
A1=seq(1,100,length=100)
A2=seq(-100,-1,length=100)
A3=runif(100,-1,1)
df1=as.data.frame(cbind(A1,A2,A3))
A3=runif(100,-1,1)
df2=as.data.frame(cbind(A1,A2,A3))
mylist=list(df1,df2)
lapply(mylist,function(x){
x$newVar=x$A1
x$newVar[x$A3>0]=x$A2
})
the code almost works but gives some warnings. In fact, there's still an error in the last line of the function called by lapply. If you change it like this, it works as expected:
lapply(mylist,function(x){
x$newVar=x$A1
x$newVar[x$A3>0]=x$A2[x$A3>0] # you need to subset x$A2 otherwise it's too long
return(x) # better to state explicitly what's the return value
})
EDIT (as per comment):
As is almost always the case in R, functions do not modify existing objects; they return new ones.
So in this case df1 and df2 are unchanged, but lapply returns a list containing the two expected new data.frames, i.e.:
resultList <- lapply(mylist,function(x){
x$newVar=x$A1
x$newVar[x$A3>0]=x$A2[x$A3>0]
return(x)
})
newDf1 <- resultList[[1]]
newDf2 <- resultList[[2]]
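If the goal is to overwrite df1 and df2 themselves, one option is a named list plus list2env(). This sketch keeps the same rule as the code above (A2 where A3 > 0, otherwise A1), rewritten with ifelse(), and builds data frames rather than matrices so that $ works.

```r
set.seed(1)
A1 <- seq(1, 100, length.out = 100)
A2 <- seq(-100, -1, length.out = 100)
# Build data frames (not matrices), so that $ works on them
df1 <- data.frame(A1, A2, A3 = runif(100, -1, 1))
df2 <- data.frame(A1, A2, A3 = runif(100, -1, 1))

# A named list, so the results can be written back under the original names
mylist <- list(df1 = df1, df2 = df2)

resultList <- lapply(mylist, function(x) {
  # Same rule as the code above: A2 where A3 > 0, otherwise A1
  x$newVar <- ifelse(x$A3 > 0, x$A2, x$A1)
  x
})

# Replace df1 and df2 with the modified copies
invisible(list2env(resultList, envir = environment()))
mean(df1$newVar)
```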
