I'm new to R, so I'm not very good at searching for relevant answers to my question. I apologize if similar questions have already been asked.
I have a list made of data frames and lists.
I'd like to know how to keep only the data frames so that I can bind them together to produce one huge data frame.
Here is an example:
L1 <- list(c(1, "abc", 3))
L2 <- list(c("b","d"))
L3 <- list(L1,L2)
brand <- c("A","B","C","D")
price <- c(1,1,3,7)
df <- data.frame(brand , price)
brand2 <- c("E","F","G","H")
price2 <- c(20,3,5,10)
df2 <- data.frame(brand2, price2)
L4 <- list(df, L3, df2)
finaldf <- do.call("rbind.fill", L4)
Unfortunately, I got this error: Error: All inputs to rbind.fill must be data.frames
So I know the problem is that there is a list inside the list L4. In my real data, there are even several lists in the big list. Can anyone tell me how to get rid of these lists inside the big list? Thank you very much!
You need to filter out the list entries that are not data.frames, like so (rbind.fill comes from the plyr package, which must be loaded):
is_df <- sapply(L4, is.data.frame)
finaldf <- do.call("rbind.fill", L4[is_df])
Alternatively,
do.call("rbind.fill", Filter(is.data.frame, L4))
You can create an index to subset your list like so:
# Subset list
index <- sapply(L4, is.data.frame)
and then use it to make your final data.frame like so:
finaldf <- do.call("rbind", L4[index])
Keep in mind that for this to work both data frames must have the same column names, so when you create df2 you should specify the column names like so:
df2 <- data.frame(brand = brand2, price = price2)
... before you even do the above.
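Putting the pieces above together, here is a minimal self-contained sketch in base R only. It assumes the columns of the data frames correspond by position, so it copies the column names of the first data frame onto the others before stacking; that renaming step is my assumption, not something the question asked for.

```r
# Example data mirroring the question: two data frames plus a nested list.
df  <- data.frame(brand = c("A", "B", "C", "D"), price = c(1, 1, 3, 7))
df2 <- data.frame(brand2 = c("E", "F", "G", "H"), price2 = c(20, 3, 5, 10))
L4  <- list(df, list(list(c(1, "abc", 3)), list(c("b", "d"))), df2)

# Keep only the elements that are data frames.
dfs <- Filter(is.data.frame, L4)

# Assumption: columns correspond by position, so give every data frame
# the column names of the first one before stacking with base rbind().
dfs <- lapply(dfs, setNames, names(dfs[[1]]))
finaldf <- do.call(rbind, dfs)
```

If the columns do not line up by position, plyr's rbind.fill (as in the question) is probably the safer choice, since it matches columns by name and fills the gaps with NA.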
Related
I ran into a small problem extracting the names of data frames from a list of data frames. Let me give a small example.
Assume I have three data frames (data1, data2, data3; their content doesn't matter) and I put them into a list as shown below.
my_list <- list(data1, data2, data3)
I'd like to print their names in a loop like this:
for (d in my_list){
cat(deparse(substitute(d)))}
As a result I get ddd, but I want data1, data2, data3 as character strings.
How could I resolve this kind of problem?
Thanks in advance.
The first problem is that the list my_list you are creating doesn't have any names assigned to its elements. To solve this, you can use the lst function from dplyr.
my_list <- dplyr::lst(data1, data2, data3)
And then to get the names as a single comma-separated string you can do:
cat(names(my_list), sep=',')
Use dplyr::lst, which names each list element after the data frame passed in.
my_list <- dplyr::lst(data1, data2, data3)
To get the names, you can use names(my_list).
And in a loop:
for (d in names(my_list)) {
name <- d
data <- my_list[[d]]
}
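If you would rather not depend on dplyr, base R's mget() can build the same kind of named list from object names. The sketch below uses made-up stand-ins for data1, data2 and data3, since the question doesn't show their content.

```r
# Hypothetical stand-ins for the question's data frames.
data1 <- data.frame(x = 1:2)
data2 <- data.frame(x = 3:4)
data3 <- data.frame(x = 5:6)

# mget() looks objects up by name and returns a list named after them.
my_list <- mget(c("data1", "data2", "data3"))

# Iterate over the names and fetch each data frame by name.
for (nm in names(my_list)) {
  cat(nm, "has", nrow(my_list[[nm]]), "rows\n")
}
```

Writing list(data1 = data1, data2 = data2, data3 = data3) by hand gives the same result; mget() just saves the repetition when there are many objects.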
I have a list of more than 600 data frames whose variables do not all have exactly the same types. Sometimes a column's class differs from the desired one because, for instance, there is a letter just before an integer, and that makes the whole column character instead (just because of a silly typo...). What I need to do is identify which of those data frames do not have the desired variable classes and fix them, so I can work with all the data for different purposes (summaries, analyses, etc.). (I asked for similar help regarding the column names here: How to split a list of data frames based on its column names?)
I am trying to create two lists from the main one based on the desired order and classes of the variables. For that I am trying to do the following:
v1 <- c(1:15)
v2 <- c(20:34)
v3 <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o")
v3b <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o")
df1 <- data.frame(v1, v2, v3)
df2 <- data.frame(v1, v2, v3)
df3 <- data.frame(v1, v2, v3b)
df1[1,2] <- "m3"
mylist <- list(df1, df2, df3)
str(mylist[[1]]) # here you can see the class of each variable and how in df1 the class has been changed from integer to character
d_class <- sapply(mylist[[2]], class) # this is how I get the desired class
# now I try two alternatives to identify the dfs with the undesired structure:
#1
list_a <- list()
list_a <- lapply(mylist, function(x) class(x) == d_class) # does not work
grep('FALSE', list_a)
#2
list_b <- list()
list_b <- split(mylist,sapply(mylist,function(x)
identical(class(x),d_class)))
As you can see, I always get all the data frames in return, and the code doesn't recognize the desired class even though it was specified beforehand.
Does anyone know what is wrong with this code?
Any help is much appreciated
As it is a list of data.frames, the class of each element is data.frame. We need to loop through the datasets to get the class of the individual columns
lapply(mylist, function(x) sapply(x, class) == d_class)
If we need a logical index vector to find the datasets whose column classes match 'd_class', wrap the comparison with all
sapply(mylist, function(x) all(sapply(x, class) == d_class))
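As a rough sketch of how that logical index can then be used, the example below rebuilds the question's data and splits the list into matching and non-matching data frames. Using identical() with unname() instead of all(... == ...) is my own variation: it ignores column names and also returns FALSE when the number of columns differs, where == would recycle or warn. The names has_class, good and bad are made up for illustration.

```r
v1 <- 1:15
v2 <- 20:34
v3 <- letters[1:15]
df1 <- data.frame(v1, v2, v3)
df2 <- data.frame(v1, v2, v3)
df3 <- data.frame(v1, v2, v3b = v3)
df1[1, 2] <- "m3"            # the typo turns v2 into a character column
mylist <- list(df1, df2, df3)

# Desired classes, taken from a data frame known to be correct.
d_class <- sapply(mylist[[2]], class)

# unname() makes the comparison ignore column names (df3 uses v3b).
has_class <- sapply(mylist, function(x) {
  identical(unname(sapply(x, class)), unname(d_class))
})

good <- mylist[has_class]    # data frames with the desired classes
bad  <- mylist[!has_class]   # data frames that need fixing
which(!has_class)            # positions of the offending data frames
```

The positions from which(!has_class) tell you which of the 600 data frames to inspect for typos.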
I have a list of titles that I would like to iterate over, creating and saving a data frame for each one. I have tried using the paste() function (as seen below), but that does not work for me. Any advice would be greatly appreciated.
samples <- list("A","B","C")
for (i in samples){
paste(i,sumT,sep="_") <- data.frame(col1=NA,col1=NA)
}
My desired output is three empty data frames named: A_sumT, B_sumT and C_sumT
Here's an answer with purrr.
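library(purrr)  # also makes the %>% pipe available
samples <- list("A", "B", "C")
samples %>%
  purrr::map(~ data.frame()) %>%                        # one empty data frame per sample
  purrr::set_names(paste(samples, "sumT", sep = "_"))  # name them A_sumT, B_sumT, C_sumT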
Consider creating a list of data frames rather than many separate objects flooding the global environment, since this example could extend to hundreds of data frames and not just three. With this approach, you also maintain one container on which you can run bulk operations across all data frames.
By using sapply below on a character vector, you create a named list:
samples <- c("A","B","C") # OR unlist(list("A","B","C"))
df_list <- sapply(samples, function(x) data.frame(col1=NA,col2=NA), simplify=FALSE)
# RUN ANY DATAFRAME OPERATION
head(df_list$A)
tail(df_list$B)
summary(df_list$C)
# BULK OPERATIONS
stacked_df <- do.call(rbind, df_list)   # stack rows
wide_df <- do.call(cbind, df_list)      # bind columns side by side
merged_df <- Reduce(function(x,y) merge(x,y,by="col1"), df_list)
Or, if you need to rename the list elements:
# RENAME LIST
df_list <- setNames(df_list, paste0(samples, "_sumT"))
# RUN ANY DATAFRAME OPERATION
head(df_list$A_sumT)
tail(df_list$B_sumT)
summary(df_list$C_sumT)
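If separate objects named A_sumT, B_sumT and C_sumT really are required, one possible sketch is to build the named list first and then copy its elements out with base R's list2env(). The environment name sumT_env is made up for this example; a fresh environment is used so the workspace is not cluttered.

```r
samples <- c("A", "B", "C")
df_list <- sapply(samples, function(x) data.frame(col1 = NA, col2 = NA),
                  simplify = FALSE)
df_list <- setNames(df_list, paste0(samples, "_sumT"))

# Copy each list element into an environment as its own object.
# Passing envir = globalenv() instead would create A_sumT etc. at top level.
sumT_env <- new.env()
list2env(df_list, envir = sumT_env)

ls(sumT_env)
```

Keeping everything in the list is still usually easier to work with; this only shows that the list can be unpacked afterwards if needed.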
I have several data frames df1, df2, ..., df10. The columns (variables) are the same in all of them.
I want to create a new variable within each of them. I can easily do it "manually" as follows:
df1$newvariable <- ifelse(df1$oldvariable == 999, NA, df1$oldvariable)
or, alternatively
df1 <- transform(df1, newvariable = ifelse(oldvariable == 999, NA, oldvariable))
Unfortunately I'm not able to do this in a loop. If I write
for (i in names) { #names is the list of dataframes
i$newvariable <- ifelse(i$oldvariable == 999, NA, i$oldvariable)
}
I get the following output
Error in i$oldvariable : $ operator is invalid for atomic vectors
What I'd do is pool all the data.frames into a list and then use lapply as follows:
df1 <- as.data.frame(matrix(runif(2*10), ncol=2))
df2 <- as.data.frame(matrix(runif(2*10), ncol=2))
df3 <- as.data.frame(matrix(runif(2*10), ncol=2))
df4 <- as.data.frame(matrix(runif(2*10), ncol=2))
# create a list and use lapply
df.list <- list(df1, df2, df3, df4)
out <- lapply(df.list, function(x) {
x$id <- 1:nrow(x)
x
})
Now you'll have all the data.frames with a new column id appended, and out is a list of data.frames. You can access each of the data.frames with out[[1]], out[[2]], etc.
This has been asked many times. The $<- is not capable of translating that "i" index into either the first or second arguments. The [[<- is capable of doing so for the second argument but not the first. You should be learning to use lapply and you will probably need to do it with two nested lapply's, one for the list of "names" and the other for each column in the dataframes. The question is incomplete since it lacks specific examples. Make up a set of three dataframes, set some of the values to "999" and provide a list of names.
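Applied to the recoding in the question, the lapply approach could look like the sketch below. The three small data frames are made up, since the question gives no sample data; only the column names oldvariable and newvariable and the 999 code come from the question.

```r
# Hypothetical data: three data frames sharing a column `oldvariable`,
# with 999 used as a missing-value code.
df1 <- data.frame(oldvariable = c(1, 999, 3))
df2 <- data.frame(oldvariable = c(999, 5, 6))
df3 <- data.frame(oldvariable = c(7, 8, 999))

# Name the list elements so the results stay identifiable.
df.list <- list(df1 = df1, df2 = df2, df3 = df3)

# Apply the same recoding to each data frame and return the modified copy.
out <- lapply(df.list, function(x) {
  x$newvariable <- ifelse(x$oldvariable == 999, NA, x$oldvariable)
  x
})

out$df1$newvariable
```

Note that the original df1, df2 and df3 are left unchanged; the recoded versions live in out.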
I have a data frame with following structure:
pat <- c(rep(1,50), rep(2,50), rep(3,50))
inc <- rep(c(rep(1,5), rep(2,5), rep(3,5), rep(4,5), rep(5,5),
rep(6,5), rep(7,5), rep(8,5), rep(9,5), rep(10,5)), 3)
df <- data.frame(cbind(pat, inc))
df is split into a list of elements:
all.inc = split(df, inc)
Now I want to split each element of this list into sub-lists. Something like:
all.pat = split(all.inc, pat)
This doesn't work, obviously. I've already tried the plyr functions and lapply, but didn't get it to work.
Any ideas?
Use lapply:
lapply(all.inc, function(x) split(x, x$pat))
If you'd like to split your data frame all at once, you could use
split(df, interaction(df$pat,df$inc))
However, the returned value will be a single list of data frames, which is slightly different from what you would get by splitting list elements.
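To make the difference between the two results concrete, here is a small sketch using the question's data, written more compactly with rep(..., each = ). The names nested and flat are made up for illustration.

```r
pat <- rep(1:3, each = 50)
inc <- rep(rep(1:10, each = 5), times = 3)
df  <- data.frame(pat, inc)

# Nested result: a list of 10 elements (one per inc), each itself a list
# of 3 data frames (one per pat).
all.inc <- split(df, df$inc)
nested  <- lapply(all.inc, function(x) split(x, x$pat))
nested[["2"]][["3"]]   # rows with inc == 2 and pat == 3

# Flat result: a single list of 30 data frames named "pat.inc".
flat <- split(df, interaction(df$pat, df$inc))
flat[["3.2"]]          # the same rows, reached through one name
```

Which shape is more convenient depends on whether later code needs to loop over inc first and pat second, or just over every combination.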