In R, I have several datasets, and I want to use a loop to create new variables (columns) within each of them:
All data frames have the same name structure, so I am using that to loop through them. Here is some pseudo-code showing what I want to do:
Name <- "Dataframe_1" # Assume the for-loop goes from Dataframe_1 to Dataframe_10 (loop not shown)
#Pseudo-code
eval(as.name(Name))$NewVariable <- "SomeString" # This is what I would like to do, but I get an error: could not find function "eval<-"
As a result, I should have the same dataframe with one extra column (NewVariable), where all rows have the value "SomeString".
If I use eval(as.name(Name)), I can call up the data frame stored under Name with no problem, but none of the usual data frame operators work when I try to assign through that call (neither <-, $, nor [[]]).
Any ideas would be appreciated, thanks in advance!
We can place the datasets in a list and create a new column by looping over the list with lapply. If needed, the original dataframe objects can be updated with list2env.
lst <- mget(paste0('Dataframe_', 1:10))
lst1 <- lapply(lst, transform, NewVariable = "SomeString")
list2env(lst1, envir = .GlobalEnv)
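A minimal, self-contained sketch of the lapply/transform/list2env approach. The three small data frames and their column x are made up for illustration, standing in for Dataframe_1 to Dataframe_10:

```r
# Hypothetical sample data named like the question's data frames
Dataframe_1 <- data.frame(x = 1:2)
Dataframe_2 <- data.frame(x = 3:4)
Dataframe_3 <- data.frame(x = 5:6)

# Collect the objects into a named list by looking them up by name
lst <- mget(paste0("Dataframe_", 1:3))

# Add a constant column to every data frame in the list
lst1 <- lapply(lst, transform, NewVariable = "SomeString")

# Overwrite the original objects in the global environment
list2env(lst1, envir = .GlobalEnv)

print(Dataframe_2)
```

Note that the list names returned by mget() are what list2env() uses as object names, so the originals are replaced in place.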
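Another option is to use assign:
nm1 <- ls(pattern = "^Dataframe_\\d+")  # names of the data frames
nm2 <- rep("NewVariable", length(nm1))   # column name to add to each one
for(j in seq_along(nm1)){
  # get() fetches the data frame by name, `[<-` adds the column, assign() stores it back
  assign(nm1[j], `[<-`(get(nm1[j]), nm2[j], value = "SomeString"))
}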
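The original error comes from trying to assign into the result of eval(). One more option, which is a sketch and not taken from the answers above, is to index the environment itself with [[, which does support replacement. This assumes the data frames live in the global environment; the sample data is hypothetical:

```r
# Hypothetical sample data
Dataframe_1 <- data.frame(x = 1:3)
Dataframe_2 <- data.frame(x = 4:6)

env <- globalenv()
for (Name in ls(envir = env, pattern = "^Dataframe_\\d+$")) {
  # env[[Name]] looks the object up by its name (a string);
  # the replacement form writes the modified copy back under that name
  env[[Name]]$NewVariable <- "SomeString"
}
```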
Related
I want to create a ranked variable that will appear in multiple data frames.
I'm having trouble getting the ranked variable into the data frames.
Simple code, but I can't make it work.
dfList <- list(df1,df2,df3)
for (df in dfList){
rAchievement <- rank(df["Achievement"])
df[[rAchievement]]<-rAchievement
}
The result I want is for df1, df2 and df3 to each gain a new variable called rAchievement.
I'm struggling!! And my apologies. I know there are similar questions out there. I have reviewed them all. None seem to work and accepted answers are rare.
Any help would be MUCH appreciated. Thank you!
We can use lapply with transform in a single line:
dfList <- lapply(dfList, transform, rAchievement = rank(Achievement))
If we need to update the objects df1, df2 and df3, set the names of dfList to the object names and use list2env (not recommended, though):
names(dfList) <- paste0("df", 1:3)
list2env(dfList, .GlobalEnv)
Or, using a for loop: loop over the sequence of the list, extract each list element and assign a new column based on the rank of 'Achievement':
for(i in seq_along(dfList)) {
  dfList[[i]][['rAchievement']] <- rank(dfList[[i]]$Achievement)
}
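A self-contained sketch of both versions, using two made-up data frames. It also shows why the question's loop had no effect: for (df in dfList) binds df to a copy of each element, so assigning into df never changes dfList (or df1 to df3), and df[[rAchievement]] uses the ranks themselves as the column index instead of the name "rAchievement":

```r
# Hypothetical sample data with an Achievement column
df1 <- data.frame(Achievement = c(30, 10, 20))
df2 <- data.frame(Achievement = c(5, 15))
dfList <- list(df1, df2)

# lapply version: returns a new list of modified copies
ranked <- lapply(dfList, transform, rAchievement = rank(Achievement))

# for-loop version: index into the list so the change is stored in dfList,
# and quote the new column name
for (i in seq_along(dfList)) {
  dfList[[i]][["rAchievement"]] <- rank(dfList[[i]]$Achievement)
}

print(dfList[[1]])
```

Either way, df1 itself is untouched until the list is written back (for example with list2env).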
I have a list of data frames allData. Each data frame has a column called idCode. How do I change the type of idCode to character with lapply (or some other function if possible)?
I've tried this, but it only returns a list of all the idCode columns; nothing changed in the original allData list.
lapply(allData, function(x) x$idCode <- as.character(x$idCode))
I've also tried this:
lapply(allData, function(x) {x$idCode <- as.character(x$idCode) x})
I hoped it would return all the data frames with idCode converted, so that I could "stitch" them together again in a new list. However, it gives me an error: unexpected symbol in "lapply(allData, function(x) {x$idCode <- as.character(x$idCode) x".
Is it possible to do this with lapply()? Other functions are also fine.
You have several options here:
You can just use a for loop and convert the column in each data frame with as.character():
for(i in seq_along(allData)){
  allData[[i]]$idCode <- as.character(allData[[i]]$idCode)
}
Or you can use the superassignment operator <<-, which modifies allData in the enclosing (here, global) environment instead of a local copy:
lapply(X = seq_along(allData), FUN = function(x){
  allData[[x]]$idCode <<- as.character(allData[[x]]$idCode)
  return(NULL)
})
To change the type of a column in a data frame, you can also use the replacement function class<-:
lapply(X = seq_along(allData), FUN = function(x){
  class(allData[[x]]$idCode) <<- "character"
  return(NULL)
})
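The syntax error in the second attempt comes from putting two statements on one line inside { } without a separator. A minimal sketch of the corrected lapply() call, with a made-up allData list for illustration:

```r
# Hypothetical sample list of data frames with a numeric idCode column
allData <- list(
  data.frame(idCode = c(101, 102), value = 1:2),
  data.frame(idCode = c(201, 202), value = 3:4)
)

# Each statement inside { } needs its own line (or a ';'), and the
# function must return x so lapply() collects the modified data frames
allData <- lapply(allData, function(x) {
  x$idCode <- as.character(x$idCode)
  x
})

str(allData[[1]])
```

Assigning the result back to allData is what makes the change stick, since lapply() never modifies its input.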
I have code similar to the one shown below:
# initialize an empty list of data frames here....
for (i in 1:10) {
  # create a new data frame here....
  # append this newly created data frame to the list here....
}
How can I create an empty list of dataframes at the start of the loop and then go on adding a newly created dataframe in each iteration of the for loop?
If the sole purpose is to merge the data frames, it may be easier to use merge_all from the reshape package:
reshape::merge_all(your_list_with_dfs, ...)
Alternatively, to append the rows, you can try:
do.call("rbind", your_list_with_dfs)
The way I did it was as follows (as suggested by @akrun):
list_of_frames <- replicate(10, data.frame(), simplify = FALSE)
for (i in 1:10) {
  # create a new data frame called new_dataframe
  list_of_frames[[i]] <- new_dataframe
}
myList <- vector(mode = "list", length = 10) # Create an empty list with 10 slots.
# The length is the number of data frames you want to add.
for (i in seq(1, 10)) {
  myList[[i]] <- new_dataframe # store the new data frame in slot i
}
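Putting the pieces above together: preallocate the list, fill one slot per iteration, then optionally stack the rows with do.call(rbind, ...). The contents of new_dataframe are hypothetical, and rbind assumes every data frame has the same columns:

```r
n <- 10
# Preallocate a list with one slot per data frame
list_of_frames <- vector(mode = "list", length = n)

for (i in seq_len(n)) {
  # Hypothetical stand-in for "create a new data frame here"
  new_dataframe <- data.frame(iteration = i, value = i^2)
  list_of_frames[[i]] <- new_dataframe
}

# Optionally stack all rows into a single data frame
combined <- do.call(rbind, list_of_frames)
```

Preallocating avoids growing the list one element at a time, and seq_len(n) behaves correctly even if n is 0.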
This is an edited version of my initial question, which I obviously explained poorly, so I'll try again.
I want to apply a function to every column of the data frame, and name the resulting objects (here, objects of class dist) after the original data frame and the column name:
library(vegan)
d1 <- as.data.frame(matrix(rnorm(12), nrow = 4, ncol = 3, dimnames = list(NULL, LETTERS[1:3])))
Fun <-function(x){
vegdist(decostand(x,"standardize",MARGIN=2), method="euclidean")
}
d1.A <- Fun(d1$A) # A being the colname of the first column of d1
d1.B <- Fun(d1$B)
d1.C <- Fun(d1$C)
I want to do this for more than 100 columns in my data frame.
So, in short, I want to apply my function to all columns of my data frame and create result objects whose names are built from the name of the original data frame pasted together with the name of the column the function was applied to.
Thank you very much!
If you want to clutter your global environment with lots of objects, one option is list2env; another is assign (though I would not recommend either). Instead, you can keep all the results in a named list, do the operations/analysis on that list, and later save/write them to separate files using lapply and write.table.
lst <- setNames(lapply(d1, Fun),
paste("d1", colnames(d1), sep="."))
The above list can be used for most of the analysis. If you need the results as individual objects:
list2env(lst, envir=.GlobalEnv)
#<environment: R_GlobalEnv>
Now, you can get the individual objects by calling d1.A, d1.B etc.
d1.A
# 1 2 3
#2 1.9838499
#3 1.2754209 0.7084290
#4 2.2286961 0.2448462 0.9532752
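To try the list approach without installing vegan, here is a sketch with a hypothetical base-R stand-in, Fun_base, built from scale() and dist(). For a single column, standardizing and taking Euclidean distances this way should match decostand(..., "standardize") followed by vegdist(..., method = "euclidean"), but treat that equivalence as an assumption:

```r
set.seed(1)
d1 <- as.data.frame(matrix(rnorm(12), nrow = 4, ncol = 3,
                           dimnames = list(NULL, LETTERS[1:3])))

# Hypothetical base-R stand-in for Fun(): standardize, then Euclidean distances
Fun_base <- function(x) {
  dist(scale(x), method = "euclidean")
}

# One "dist" object per column, named "<data frame>.<column>"
lst <- setNames(lapply(d1, Fun_base), paste("d1", colnames(d1), sep = "."))

names(lst)
lst[["d1.A"]]
```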
I am assuming you need to create as many objects of class "dist" as there are columns in d1.
If that is the case, you can do this:
for (i in 1:ncol(d1))
{
eval(parse(text=paste('d1.',colnames(d1)[i], "<-" ,"Fun(d1[,",i,"])", sep="")))
}
This evaluates, in each iteration, to:
d1.A <- Fun(d1[,1])
d1.B <- Fun(d1[,2])
d1.C <- Fun(d1[,3])
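The same objects can be created without building code as text: assign() takes the target name as a string, so eval(parse()) is not needed. A minimal sketch, again using a hypothetical base-R stand-in Fun_base (scale() plus dist()) so it runs without vegan:

```r
set.seed(1)
d1 <- as.data.frame(matrix(rnorm(12), nrow = 4, ncol = 3,
                           dimnames = list(NULL, LETTERS[1:3])))

# Hypothetical base-R stand-in for Fun() so the sketch runs without vegan
Fun_base <- function(x) dist(scale(x))

for (nm in colnames(d1)) {
  # Build the target name ("d1.A", ...) and bind the result to it
  assign(paste("d1", nm, sep = "."), Fun_base(d1[[nm]]))
}

d1.B
```

Indexing by column name with d1[[nm]] also keeps working if the columns are reordered.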
I have a number of dataframes (imported from CSV) that have the same structure. I would like to loop through all these dataframes and keep only two of these columns.
The loop below does not seem to work; any ideas why? Ideally I would like to do this using a loop, as I am trying to get better at using them.
frames <- ls()
for (frame in frames){
frame <- subset(frame, select = c("Col_A","Col_B"))
}
Cheers in advance for any advice.
For anyone interested, I used Richard Scriven's idea of reading in the data frames as one object, with a function added that records which file each data frame was imported from. This allowed me to then use the plyr package to manipulate the data:
library(plyr)
dataframes <- list.files(path = TEESMDIR, full.names = TRUE)
## Define a function to add the filename to the dataframe
read_csv_filename <- function(filename){
  ret <- read.csv(filename)
  ret$Source <- filename # record which file each row came from
  ret
}
# llply() returns a list with one data frame per file
# (ldply() would instead rbind them all into a single data frame)
list_dataframes <- llply(dataframes, read_csv_filename)
selection <- llply(list_dataframes, subset, select = c(var1, var3))
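For comparison, a base-R sketch of the same idea that needs no plyr: read every file into a list, tag each data frame with its source file, then keep two columns. The temporary directory and the two sample CSV files are made up so the example is self-contained; var1 and var3 follow the column names used above:

```r
# Write two small sample CSV files to a temporary directory (hypothetical data)
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(var1 = 1:2, var2 = c("a", "b"), var3 = c(TRUE, FALSE)),
          file.path(csv_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(var1 = 3:4, var2 = c("c", "d"), var3 = c(FALSE, TRUE)),
          file.path(csv_dir, "two.csv"), row.names = FALSE)

files <- list.files(path = csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into one list, tagging each data frame with its source file
list_dataframes <- lapply(files, function(filename) {
  ret <- read.csv(filename)
  ret$Source <- filename
  ret
})

# Keep only the two wanted columns in every data frame
selection <- lapply(list_dataframes, subset, select = c(var1, var3))
```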
The basic problem is that ls() returns a character vector of the names of all the objects in your environment, not the objects themselves. To get and replace an object using a character variable containing its name, you can use the get()/assign() functions. You could rewrite your loop as
frames <- ls()
for (frame in frames){
assign(frame, subset(get(frame), select = c("Col_A","Col_B")))
}
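Because a bare ls() returns every object in the environment, the loop above will fail as soon as it reaches something that is not a data frame. A sketch that narrows the names first, assuming (hypothetically) that the imported data frames share a sales_ prefix:

```r
# Hypothetical objects: two imported data frames and one unrelated object
sales_2019 <- data.frame(Col_A = 1:2, Col_B = 3:4, Col_C = 5:6)
sales_2020 <- data.frame(Col_A = 7:8, Col_B = 9:10, Col_C = 11:12)
sales_notes <- "text"

# Only pick objects whose names match the pattern AND that are data frames
frames <- ls(pattern = "^sales_")
frames <- frames[vapply(frames, function(nm) is.data.frame(get(nm)), logical(1))]

for (frame in frames) {
  assign(frame, subset(get(frame), select = c("Col_A", "Col_B")))
}
```

The is.data.frame() filter protects against objects that happen to match the name pattern but are not data frames.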