List elements to dataframes in R

How would I go about taking elements of a list and making them into dataframes, with each dataframe name consistent with the list element name?
Ex:
exlist <- list(west=c(2,3,4), north=c(2,5,6), east=c(2,4,7))
Where I'm tripping up is in the actual naming of the unique dataframes -- I can't figure out how to do this with a for() loop or with lapply:
for(i in exlist) {
i <- data.frame(exlist$i)
}
gives me an empty dataframe called i, whereas I'd expect three dataframes to be made (one called west, another called north, and another called east)
When I use lapply syntax and call the individual list element name, I get empty dataframes:
lapply(exlist, function(list) i <- data.frame(list["i"]))
yields
data frame with 0 columns and 0 rows
> $west
list..i..
1 NA
$north
list..i..
1 NA
$east
list..i..
1 NA

If you want to convert your list elements to data.frames, you can try either
lapply(exlist, as.data.frame)
Or (as suggested by @Richard), depending on your desired output (one column holding the vector, versus one row with a column per vector element):
lapply(exlist, as.data.frame.list)
It is always recommended to keep multiple data frames in a list rather than polluting your global environment, but if you insist on doing this, you could use list2env (don't do this), such as:
list2env(lapply(exlist, as.data.frame.list), .GlobalEnv)
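As a minimal sketch of the list-based approach: each element can be wrapped in a one-column data frame and, only if separate objects are really needed, copied into a dedicated environment rather than the global one. The column name value and the environment name sandbox are assumptions made for illustration and are not part of the original answer.

```r
exlist <- list(west = c(2, 3, 4), north = c(2, 5, 6), east = c(2, 4, 7))

# One data frame per list element; the list names are carried over by lapply.
# "value" is an assumed column name.
dfs <- lapply(exlist, function(v) data.frame(value = v))
dfs$west

# Optional: create one object per data frame, but in a separate
# (hypothetical) environment instead of .GlobalEnv.
sandbox <- new.env()
list2env(dfs, envir = sandbox)
ls(sandbox)
```

Keeping the data frames in dfs means later steps can still use lapply over all of them at once.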

This should create the three objects you want, each with the column names given in df.names:
df.names <- "value" ## vector with column names here
for (i in names(exlist)) assign(i, setNames(data.frame(exlist[[i]]), df.names))

Related

What is happening during assignment to a dataframe by lapply

Given a dataframe df and a function f which is applied to df:
df[] <- lapply(df, f)
What is the magic R is performing to replace columns in df with collection of vectors in the list from lapply? I see that the result from lapply is a list of vectors having the same names as the dataframe df. I assume some magic mapping is being done to map the vectors to df[], which is the collection of columns in df (methinks). Just works? Trying to better understand so that I remember what to use the next time.
A data.frame is merely a list of vectors having the same length. You can see it using is.list(a_data_frame). It will return TRUE.
[] can have a different meaning or action depending on the object it is applied to. It can even be redefined, since it is in fact a function.
For a data.frame, [] lets you subset or replace columns.
df[1] returns the first column (as a one-column data.frame)
df[1] <- 2 replaces the first column with 2 (recycled to the same length as the other columns)
df[] returns the whole data.frame
df[] <- list(c1,c2,c3) replaces the contents of the data.frame while keeping its structure (class, column names), which is why df[] <- lapply(df, f) gives back a data.frame
Plus many other ways to access or set data in a data.frame (by column name, by subset of rows or columns, ...)
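To make the difference concrete, here is a small sketch with a made-up data frame and a made-up function (doubling each column). It compares assigning the lapply result to df[] with keeping the plain lapply result.

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# lapply on its own returns a plain list of vectors
as_list <- lapply(df, function(col) col * 2)
class(as_list)

# Assigning into df[] replaces the columns but keeps the data.frame itself
df[] <- lapply(df, function(col) col * 2)
class(df)
names(df)
```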

How to test if column names of multiple dataframes are same

Say I have 10 dataframes. I would like to check if all have same column names irrespective of their cases.
I can do this in multiple steps, but I was wondering if there is a shortcut way to do this?
We place the datasets in a list, loop over the list with lapply, get the column names, convert them to a single case, sort them (so that column order does not matter), take the unique elements and check that the length is 1.
length(unique(lapply(lst1, function(x) sort(toupper(names(x)))))) == 1
#[1] TRUE
data
lst1 <- list(mtcars, mtcars, mtcars)
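A small sketch of the same check on data frames whose names differ only in case or order. The helper name same_colnames and the toy data frames are assumptions for illustration.

```r
# Hypothetical helper wrapping the one-liner above
same_colnames <- function(dfs) {
  length(unique(lapply(dfs, function(x) sort(toupper(names(x)))))) == 1
}

d1 <- data.frame(id = 1, Value = 2)
d2 <- data.frame(VALUE = 3, ID = 4)   # same names, different case and order
d3 <- data.frame(id = 5, other = 6)   # one differing column

same_colnames(list(d1, d2))
same_colnames(list(d1, d3))
```

The first call should report matching names and the second should not, since d3 has a column the others lack.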
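You can use Reduce + intersect to get the column names common to all data frames in the list and compare them with the names of any single data frame in the list. Note that this comparison is case-sensitive, and it also passes when the other data frames have extra columns that the first one lacks. identical is used instead of == to avoid recycling when the two vectors differ in length.
identical(sort(Reduce(intersect, lapply(list_df, names))), sort(names(list_df[[1]])))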

How do I pass a data frame as an argument to a function?

T12 is a data frame with 22 columns (but I only want columns 2 to 8) and about one million entries.
Some of the entries in column one are NA. Every time there is an NA in the first column, complete.cases deletes the complete row. Everything works well.
I have many more data frames and I don't want to write the whole code again for each of them.
I would like to have something like this function, passing T12, T13, T14, T15 and so on as x.
Could you help me?
split <- function (x){
x <- x[,2:8]
x <- x[complete.cases(x[ ,1]),]
}
If you have data frames named "T12", "T13" etc., you can use the pattern "T" followed by a number (anchored with ^ and $ so that only names of exactly that form match) to capture the names of all such data frames in a character vector using ls.
Using mget you can fetch the data frames named in that character vector as a named list.
You can then use lapply to apply the split function to each element of the list.
new_data <- lapply(mget(ls(pattern = '^T\\d+$')), split)
new_data is a named list of data frames. If you want these changes reflected in the original data frames, use list2env.
list2env(new_data, .GlobalEnv)
PS - split is already a base R function, so it is better to give your function a different name.
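Putting the pieces together, a self-contained sketch might look like this. The function name keep_complete, the environment env and the toy data frames are assumptions for illustration; the real T12, T13, ... would live in your workspace.

```r
# Renamed so that it no longer masks base::split (hypothetical name)
keep_complete <- function(x) {
  x <- x[, 2:8]                      # keep columns 2 to 8
  x[complete.cases(x[, 1]), ]        # drop rows with NA in the first kept column
}

# Toy stand-ins for T12 and T13, stored in a separate environment
env <- new.env()
toy <- as.data.frame(matrix(1:24, nrow = 3, ncol = 8))
assign("T13", toy, envir = env)
toy[2, 2] <- NA                      # NA in original column 2
assign("T12", toy, envir = env)

df_names <- ls(envir = env, pattern = "^T\\d+$")
new_data <- lapply(mget(df_names, envir = env), keep_complete)
sapply(new_data, nrow)
```

Returning the subset as the last expression (rather than ending on an assignment) also makes the function print its result when called directly.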

R - whats the type of this object and how to work with it?

I'm lost with the following object in R:
# create a list of filenames
files <- list.files("directory", full.names = TRUE)
# read all files as csv
data <- lapply(files, function(x) (data.frame(read.csv(x))))
That's fine, but I have no idea what the type of data is and how to get my hands on it. Let's have a look:
data[1]
[[1]]
Date value1 value2 ID
1 2003-01-01 NA NA 1
2 2003-01-02 NA NA 1
...
OK, that looks like a data frame (that's also what I intended when I did data.frame(read.csv(x))) -- I wanted a list of data frames. Unfortunately, when I ask
typeof(data[1])
[1] "list"
R claims data[1] to be a list. Why? I have now figured out that data[[1]] gives access to the data.frame as intended. But I could not figure out how to apply operations to the data frames packed in data. For instance, I would like to keep only those elements of data whose data frame has more than 100 rows. I tried
lapply(data, Filter, f = function(x) (nrow(data.frame(x))>100))
but this just gives back a list of the same length as data which contains for instance
[[1]]
data frame with 0 columns and 1461 rows
Basically I have three questions:
Why do I get a list of lists instead of a list of data frames?
Could I convert this list of lists into, let's say, a vector of data frames?
How could I subset the list in the way described above (for instance, get all frames with more than 100 rows)?
Ad1: This is pretty basic stuff: the [ operator does not select a single element from a list; it returns a subset of the list, which is itself a list. For a single element use [[ or $. So the answer to the first question is: you do get a list of data frames.
Ad2: You can't have an atomic vector of data frames; a list is the appropriate container for them.
Ad3: lapply needs a FUN argument, and in your call that function is Filter, which is therefore applied to the columns of each data frame. Even if used correctly, lapply produces one output element for every input element of the list. For filtering, use Filter directly, in your case: Filter(function(x) nrow(x) > 100, data)
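A short sketch of the three points, using made-up data frames in place of the CSV files (the name frames and the row counts are assumptions for illustration):

```r
frames <- list(
  data.frame(x = 1:150),
  data.frame(x = 1:20),
  data.frame(x = 1:101)
)

class(frames[1])     # single bracket: still a list (of length 1)
class(frames[[1]])   # double bracket: the data frame itself

# Keep only the data frames with more than 100 rows
big <- Filter(function(x) nrow(x) > 100, frames)
length(big)
sapply(big, nrow)
```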

in R: combine columns of different dataframes

I am trying to combine the columns of three different dataframes, so that for each column position I get an object with the same number of rows as the original dataframes and three columns, one from each of them. Each of the original dataframes has 10 columns and 14 rows.
I tried it with a for-loop, but the result is not usable for me.
t <- NULL
for(i in 1 : length(net)) {
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
t <- list(t, a)
}
t
But in the end I would like to get 10 separate dataframes with three columns each.
So I want to loop through this:
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
for every column of each original dataframe. But if I use t <- list(t, a) it constructs a crazy list. Thanks.
The way you append elements to t is wrong; you should do it this way instead:
t <- list()
for(i in 1:length(net)) {
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
t[[length(t)+1]] <- a
}
t
Your code is wrong because at each step you turn t into a list whose first element is the previous t (itself a list, except in the first iteration) and whose second element is the subset. So in the end you get a sort of recursive list of two elements, where the second is the data.frame subset and the first is again a list of two elements with the same structure, ten levels deep.
Anyway, the corrected loop is equivalent to this one-liner (which is probably more efficient, since it does not grow the list step by step):
t <- lapply(1:length(net),
function(i){cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])})
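Here is a runnable sketch of the lapply version on toy data. The data frames imp, exp_df and net (2 columns, 3 rows each) stand in for imp.qua.00.09, exp.qua.00.09 and net, and the column labels "imp", "exp", "net" are assumptions for illustration.

```r
imp    <- data.frame(a = 1:3,   b = 4:6)
exp_df <- data.frame(a = 7:9,   b = 10:12)
net    <- data.frame(a = 13:15, b = 16:18)

# One three-column data frame per column position
combined <- lapply(seq_along(net), function(i) {
  setNames(cbind(imp[i], exp_df[i], net[i]), c("imp", "exp", "net"))
})
names(combined) <- names(net)   # label each result by the original column name

combined$a
```

seq_along(net) is used instead of 1:length(net) so that the loop also behaves sensibly if net has no columns.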
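Alternatively, this binds all columns of the three data frames side by side into a single data frame (30 columns here) rather than ten separate ones: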
do.call(cbind,list(imp.qua.00.09, exp.qua.00.09, net))

Resources