I have several data frames df1, df2, ..., df10. Columns (variables) are the same in all of them.
I want to create a new variable within each of them. I can easily do it "manually" as follows:
df1$newvariable <- ifelse(df1$oldvariable == 999, NA, df1$oldvariable)
or, alternatively
df1 <- transform(df1, newvariable = ifelse(oldvariable == 999, NA, oldvariable))
Unfortunately I'm not able to do this in a loop. If I write
for (i in names) { #names is the list of dataframes
i$newvariable <- ifelse(i$oldvariable == 999, NA, i$oldvariable)
}
I get the following output
Error in i$oldvariable : $ operator is invalid for atomic vectors
What I'd do is to pool all data.frame on to a list and then use lapply as follows:
df1 <- as.data.frame(matrix(runif(2*10), ncol=2))
df2 <- as.data.frame(matrix(runif(2*10), ncol=2))
df3 <- as.data.frame(matrix(runif(2*10), ncol=2))
df4 <- as.data.frame(matrix(runif(2*10), ncol=2))
# create a list and use lapply
df.list <- list(df1, df2, df3, df4)
out <- lapply(df.list, function(x) {
x$id <- 1:nrow(x)
x
})
Now all the data.frames have a new column id appended, and out is a list of data.frames. You can access each of them with out[[1]], out[[2]], etc.
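The same pattern can be applied to the recoding in the question. This is a minimal sketch with toy data: the column name oldvariable and the 999 sentinel come from the question, while the values and the list names are made up for illustration.

```r
# Toy data frames (values are invented for the example)
df1 <- data.frame(oldvariable = c(1, 999, 3))
df2 <- data.frame(oldvariable = c(999, 5, 6))

# A named list keeps track of which data frame is which
df.list <- list(df1 = df1, df2 = df2)

recoded <- lapply(df.list, function(x) {
  x$newvariable <- ifelse(x$oldvariable == 999, NA, x$oldvariable)
  x  # return the modified data frame, not the value of the assignment
})

recoded$df1$newvariable
```

Each element of recoded is a copy of the corresponding data frame with newvariable added; the originals df1 and df2 are left unchanged.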
This has been asked many times. The $<- is not capable of translating that "i" index into either the first or second arguments. The [[<- is capable of doing so for the second argument but not the first. You should be learning to use lapply and you will probably need to do it with two nested lapply's, one for the list of "names" and the other for each column in the dataframes. The question is incomplete since it lacks specific examples. Make up a set of three dataframes, set some of the values to "999" and provide a list of names.
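If a for loop is preferred, one way to make it work is to loop over the names of a named list and index with [[, so that the name is resolved to the data frame before $ is applied. The following is only a sketch with invented toy data; dfs and nm are hypothetical names.

```r
# Toy named list of data frames (values are invented for the example)
dfs <- list(
  df1 = data.frame(oldvariable = c(1, 999, 3)),
  df2 = data.frame(oldvariable = c(999, 5, 6))
)

for (nm in names(dfs)) {
  # dfs[[nm]] is the data frame itself, so `$<-` has something valid to work on
  old <- dfs[[nm]]$oldvariable
  dfs[[nm]]$newvariable <- ifelse(old == 999, NA, old)
}

dfs$df2
```

The loop in the question fails because i holds a character string (an atomic vector), not a data frame.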
Related
I have a list of titles that I would like to iterate over and create/save data frames to. I have tried using the paste() function (as seen below), but that does not work for me. Any advice would be greatly appreciated.
samples <- list("A","B","C")
for (i in samples){
paste(i,sumT,sep="_") <- data.frame(col1=NA,col1=NA)
}
My desired output is three empty data frames named: A_sumT, B_sumT and C_sumT
Here's an answer with purrr.
samples <- list("A", "B", "C")
samples %>%
purrr::map(~ data.frame()) %>%
purrr::set_names(~ paste(samples, "sumT", sep="_"))
Consider creating a list of dataframes rather than many separate objects that flood the global environment, since this example could extend to hundreds of dataframes and not just three. With this approach you also maintain one container on which you can run bulk operations across all dataframes.
By using sapply below on a character vector, you create a named list:
samples <- c("A","B","C") # OR unlist(list("A","B","C"))
df_list <- sapply(samples, function(x) data.frame(col1=NA,col2=NA), simplify=FALSE)
# RUN ANY DATAFRAME OPERATION
head(df_list$A)
tail(df_list$B)
summary(df_list$C)
# BULK OPERATIONS
stacked_df <- do.call(rbind, df_list)
wide_df <- do.call(cbind, df_list)
merged_df <- Reduce(function(x,y) merge(x,y,by="col1"), df_list)
Or if you need to rename list
# RENAME LIST
df_list <- setNames(df_list, paste0(samples, "_sumT"))
# RUN ANY DATAFRAME OPERATION
head(df_list$A_sumT)
tail(df_list$B_sumT)
summary(df_list$C_sumT)
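If separate objects named A_sumT, B_sumT and C_sumT are really required, the named list can be unpacked with list2env(). This is a sketch, not a recommendation: sandbox is a hypothetical environment used here to keep the demo out of the global workspace, and passing .GlobalEnv instead would create top-level objects.

```r
samples <- c("A", "B", "C")
df_list <- sapply(samples, function(x) data.frame(col1 = NA, col2 = NA), simplify = FALSE)
df_list <- setNames(df_list, paste0(samples, "_sumT"))

# Hypothetical environment for the demo; use .GlobalEnv for top-level objects
sandbox <- new.env()
invisible(list2env(df_list, envir = sandbox))

ls(sandbox)
```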
I have a list of data.frames, and I want to conditionally reassign values in the data.frame. If I were just assigning to a single data.frame I would say something like
DF[DF==9] <- NA
to set all entries in the data.frame that are 9 to NA. However, when I try to use lapply to do this same procedure on each data.frame in a list:
List_of_DFs <- list(DF1, DF2, DF3)
List_of_DFs <- lapply(List_of_DFs, function(x) x[x==9] <- NA)
Instead of each value of 9 becoming NA, the entire list entry becomes NA. So in the case above, List_of_DFs becomes NA,NA,NA.
I know I can do this with a for loop and the [[]] subsetting operator, but I figure there must be a better method.
If it's interesting or relevant, the motivation behind this problem is the list of data.frames is from XLConnect, and I will subsequently bind the data.frames by row, but I first want to drop some character values in a column that should be all numeric, so I can coerce to numeric, and subsequently bind by row.
We need to return x to get the data.frame
lapply(List_of_DFs, function(x) {x[x==9] <- NA; x})
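The reason is that the value of an assignment expression in R is its right-hand side, so a function whose last expression is x[x==9] <- NA returns NA. A small sketch with an invented data frame (bad and good are hypothetical helper names):

```r
# Toy data frame (values are invented for the example)
DF <- data.frame(a = c(1, 9, 3), b = c(9, 5, 6))

# The last expression is an assignment, so the function returns its right-hand side (NA)
bad <- function(x) x[x == 9] <- NA

# Returning x explicitly hands back the modified data frame
good <- function(x) { x[x == 9] <- NA; x }

bad_result <- bad(DF)
good_result <- good(DF)
```

Here bad_result is a single NA, while good_result is the data frame with the 9s replaced.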
Or another option is replace
lapply(List_of_DFs, function(x) replace(x, x == 9, NA))
If we are using dplyr, mutate with across can change the column values that are 9 to NA (mutate_each and funs, seen in older code, are deprecated in current dplyr)
lapply(List_of_DFs, function(x) x %>%
mutate(across(everything(), ~ replace(.x, .x == 9, NA))))
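For the motivating case in the question (stray character values in a column that should be numeric, followed by binding by row), a base R sketch could look like the following. The column name amount, the "n/a" marker and the sheet contents are assumptions made up for illustration.

```r
# Hypothetical sheets; `amount` and the stray "n/a" text are invented for the example
sheets <- list(
  data.frame(id = 1:2, amount = c("10", "n/a"), stringsAsFactors = FALSE),
  data.frame(id = 3:4, amount = c("7.5", "3"), stringsAsFactors = FALSE)
)

cleaned <- lapply(sheets, function(x) {
  # Strings that are not numbers become NA; suppressWarnings() hides the coercion warning
  x$amount <- suppressWarnings(as.numeric(x$amount))
  x
})

# Bind the cleaned data frames by row
combined <- do.call(rbind, cleaned)
```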
Let's say I have a list of 30 data.frames, each containing 2 variables (called value, and rank), called myList
I know I can use
my.DF <- do.call("cbind", myList)
to create the output my.DF containing all the variables next to each other.
Is it possible to cbind each variable individually into its own data.frame, i.e. to have a new data.frame of just the 2nd variable?
We can extract the second column by looping over the list (lapply) and wrap with data.frame.
data.frame(lapply(myList, `[`, 2))
If we want to separate the variables,
lapply(names(myList[[1]]), function(x)
do.call(cbind,lapply(myList, `[`, x)))
data
set.seed(24)
myList <- list( data.frame(value=1:6, rank= sample(6)),
data.frame(value=7:12, rank=sample(6)))
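With this example data, the per-variable split can also be given names so that each combined data.frame is accessible by variable. This is a sketch; vars and by_var are hypothetical names.

```r
set.seed(24)
myList <- list(data.frame(value = 1:6, rank = sample(6)),
               data.frame(value = 7:12, rank = sample(6)))

vars <- names(myList[[1]])

# One data.frame per variable, each holding that column from every list element
by_var <- setNames(
  lapply(vars, function(v) do.call(cbind, lapply(myList, `[`, v))),
  vars
)

by_var$value
```

Note that the columns inside each result share the same name (for example value and value), so they are easiest to address by position.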
Say I have a list dflist which contains dataframes df1 and df2.
df1 <- data.frame(VAR1 = letters[1:10], VAR2 = 1:10)
df2 <- data.frame(VAR3 = letters[11:20], VAR4 = 11:20)
dflist <- list(df1 = df1, df2 = df2)
In general, I want to apply a single argument function to each of the variables in each dataframe in the list. To make the question more concrete, say I'm interested in setting the variable names to lowercase. Using a dataframe paradigm, I'd just do this:
colnames(df1) <- tolower(colnames(df1))
colnames(df2) <- tolower(colnames(df2))
However, this becomes prohibitive when I have dozens of variables in each of the 20 or 30 dataframes I'm working on, hence the shift to using lists.
I'm aware that this question stems from my fundamental misunderstanding of the *apply family of functions, but I've been unable to locate examples of functions applied to deeper than the first sublevel of a list. Thanks for any input.
As @akrun suggested, the answer is simply:
lapply(dflist, function(x) {colnames(x) <- tolower(colnames(x)); x })
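For the more general case in the question (applying a single-argument function to every variable in every dataframe), the same idea extends to a nested lapply. The sketch below uses as.character as a stand-in function and a shortened version of the example data; both are choices made for illustration.

```r
dflist <- list(
  df1 = data.frame(VAR1 = letters[1:3], VAR2 = 1:3, stringsAsFactors = FALSE),
  df2 = data.frame(VAR3 = letters[4:6], VAR4 = 4:6, stringsAsFactors = FALSE)
)

result <- lapply(dflist, function(df) {
  # df[] <- keeps the data.frame structure while replacing every column
  df[] <- lapply(df, as.character)
  df
})

str(result$df2)
```

The outer lapply walks over the data frames and the inner one over the columns of each data frame.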
I have a function to deduplicate a data frame so that each person (indexed by PatID) is represented once by the latest record (largest RecID):
dedupit <- function(x) {
x <- x[order(x$PatID, -x$RecID),]
x <- x[ !duplicated(x$PatID), ]
return(x)
}
It can deduplicate and replace a dataframe if I do:
df <- dedupit(df)
But I have multiple data frames that need deduplication. Rather than write the above code for each individual data frame, I would like to apply the dedupit function across multiple dataframes at once so that it replaces each duplicated dataframe with its deduplicated version.
I was able to make a list of the dataframes and lapply the function across each element in the list with:
listofdfs <- list(df1, df2, ....)
listofdfs <- lapply(listofdfs, dedupit)
However, it only modifies the elements of the list and does not replace the original dataframes. How do I apply this function to modify and replace multiple dataframes?
Does it work?
Name your dataframes when creating the list, so you can recover them afterwards
list.df <- list(df1 = df1, df2 = df2, df3 = df3)
list2env(lapply(list.df, dedupit), .GlobalEnv)
As a result, your dataframes df1, df2, df3 will be replaced by their deduplicated versions.
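With many data frames, the named list can also be built from a vector of names using mget(). The sketch below runs inside a hypothetical environment called work so that the demo does not touch the global workspace; the toy PatID and RecID values are invented.

```r
dedupit <- function(x) {
  x <- x[order(x$PatID, -x$RecID), ]
  x[!duplicated(x$PatID), ]
}

# Hypothetical environment and toy data for the demo
work <- new.env()
work$df1 <- data.frame(PatID = c(1, 1, 2), RecID = c(10, 11, 12))
work$df2 <- data.frame(PatID = c(3, 3, 3), RecID = c(20, 22, 21))

# mget() collects the objects by name into a named list
list.df <- mget(c("df1", "df2"), envir = work)

# Write the deduplicated versions back over the originals
invisible(list2env(lapply(list.df, dedupit), envir = work))

work$df1
```

Replacing work with .GlobalEnv in both calls would give the behaviour described in the answer.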