R Extract columns in list of dataframes into lists - r

I have a very specific problem. I have a list of dataframes:
AB_df = data.frame(replicate(2,sample(0:130,201,rep=TRUE)))
BC_df = data.frame(replicate(2,sample(0:130,200,rep=TRUE)))
DE_df = data.frame(replicate(2,sample(0:130,197,rep=TRUE)))
FG_df = data.frame(replicate(2,sample(0:130,203,rep=TRUE)))
AB_pc = data.frame(replicate(2,sample(0:130,201,rep=TRUE)))
BC_pc = data.frame(replicate(2,sample(0:130,200,rep=TRUE)))
DE_pc = data.frame(replicate(2,sample(0:130,197,rep=TRUE)))
FG_pc = data.frame(replicate(2,sample(0:130,203,rep=TRUE)))
df_list = list(AB_df, BC_df, DE_df, FG_df, AB_pc, BC_pc, DE_pc, FG_pc)
names(df_list) = c("AB_df", "BC_df", "DE_df", "FG_df", "AB_pc", "BC_pc", "DE_pc", "FG_pc")
I now want to extract the 1st column of every 2nd dataframe into a list called "picked" and the other dataframes into a list called "unpicked". I tried doing this with a loop and sequences. The sequences give me the correct list entries, but every element of my output lists ends up holding the same entry. This is my attempt so far:
picked = list()
unpicked = list()
for (a in 1:(length(df_list)/2)) {
for (b in seq(1,length(df_list), by = 2)){
for (c in seq(2,length(df_list), by = 2)) {
picked[[a]] = df_list[[b]][[1]]
unpicked[[a]] = df_list[[c]][[1]]}}}
I think I am close, but something is still not right.

We can use lapply to select the 1st column of every second dataframe (picked) and the 1st column of the remaining dataframes (unpicked).
picked <- lapply(df_list[c(FALSE, TRUE)], `[`, 1)
unpicked <- lapply(df_list[c(TRUE, FALSE)], `[`, 1)
We use FALSE/TRUE to select alternate list elements. So here we select element 2, 4, 6 and 8 and with [ subset 1st columns from the data. For unpicked we select list element 1, 3, 5 and 7 and get their first column.
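As a small illustration of the recycling described above, here is a sketch on a toy list (the names toy_list, as_df and as_vec are made up for this example). It also shows the difference between `[`, which keeps a one-column data frame, and `[[`, which returns the column as a plain vector, as the loop in the question did.

```r
# Toy list standing in for df_list (names and sizes are illustrative only)
toy_list <- list(
  first  = data.frame(X1 = 1:3, X2 = 4:6),
  second = data.frame(X1 = 7:9, X2 = 10:12),
  third  = data.frame(X1 = 13:15, X2 = 16:18),
  fourth = data.frame(X1 = 19:21, X2 = 22:24)
)

# c(FALSE, TRUE) is recycled along the list, so it keeps elements 2 and 4
names(toy_list[c(FALSE, TRUE)])   # "second" "fourth"

# `[` keeps a one-column data frame; `[[` returns the column as a plain vector
as_df  <- lapply(toy_list[c(FALSE, TRUE)], `[`, 1)
as_vec <- lapply(toy_list[c(FALSE, TRUE)], `[[`, 1)

class(as_df$second)   # "data.frame"
as_vec$second         # 7 8 9
```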

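We can also split the list positions by parity to get a single list that holds both groups:
lst1 <- lapply(split(seq_along(df_list), seq_along(df_list) %% 2),
  function(i) lapply(df_list[i], `[`, 1))
# lst1[["0"]] holds the even positions (picked), lst1[["1"]] the odd positions (unpicked)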

Related

Extract fix columns and one variable column from a list of df´s in R

I want to extract the columns 2 and 20 of each df within a list and add a variable for the columns 3:19 and for each of those (16) I want to create a new df.
I tried to build a for loop
for i in (3:19){
lapply(abs_bezirke)
y = paste("straftat", i , sep = "")
assign(y, filter.values <- c(2,i,20))
}
thanks in advance
This is not tested
sapply(3:19, FUN = function(i, mydata) {
mydata[, c(2, i, 20)]
}, mydata = mydf, simplify = FALSE)
It basically does what your loop is up to, but using sapply. The result should be a list of data.frames.
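Since the question starts from a list of data frames rather than a single one, the same idea can be nested. The following is only a sketch: abs_bezirke_demo and make_df are hypothetical stand-ins for the real data, and the "straftat" names are taken from the loop in the question.

```r
# Hypothetical stand-in for the list in the question: two data frames with 20 columns each
make_df <- function() as.data.frame(matrix(seq_len(40), nrow = 2, ncol = 20))
abs_bezirke_demo <- list(a = make_df(), b = make_df())

# The outer lapply walks the data frames, the inner lapply walks the variable column
result <- lapply(abs_bezirke_demo, function(df) {
  pieces <- lapply(3:19, function(i) df[, c(2, i, 20)])
  names(pieces) <- paste0("straftat", 3:19)
  pieces
})

names(result$a)[1:2]        # "straftat3" "straftat4"
names(result$a$straftat3)   # "V2" "V3" "V20"
```

Keeping the pieces in a named list avoids creating many separate objects with assign().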

Using a loop to select a column names from a list

I've been struggling with column selection with lists in R. I've loaded a bunch of csv's (all with different column names and different number of columns) with the goal of extracting all the columns that have the same name (just phone_number, subregion, and phonetype) and putting them together into a single data frame.
I can get the columns I want out of one list element with this;
var<-data[[1]] %>% select("phone_number","Subregion", "PhoneType")
But I cannot select the columns from all the elements in the list this way, just one at a time.
I then tried a for loop that looks like this:
new.function <- function(a) {
for(i in 1:a) {
tst<-datas[[i]] %>% select("phone_number","Subregion", "PhoneType")
}
print(tst)
}
But when I try:
new.function(5)
I'll only get the columns from the 5th element.
I know this might seem like a noob question for most, but I am struggling to learn lists and loops and R. I'm sure I'm missing something very easy to make this work. Thank you for your help.
Another way you could do this is to make a function that extracts your columns and apply it to all data.frames in your list with lapply:
library(dplyr)
extractColumns = function(x){
select(x,"phone_number","Subregion", "PhoneType")
#or x[,c("phone_number","Subregion","PhoneType")]
}
final_df = lapply(data,extractColumns) %>% bind_rows()
The way you have your loop set up currently is only saving the last iteration of the loop because tst is not set up to store more than a single value and is overwritten with each step of the loop.
You can establish tst as a list first with:
tst <- list()
Then in your code be explicit that each step is saved as a separate element in the list by adding brackets and an index to tst. Here is a full example the way you were doing it.
#Example data.frame that could be in datas
df_1 <- data.frame("not_selected" = rep(0, 5),
"phone_number" = rep("1-800", 5),
"Subregion" = rep("earth", 5),
"PhoneType" = rep("flip", 5))
# Another bare data.frame that could be in datas
df_2 <- data.frame("also_not_selected" = rep(0, 5),
"phone_number" = rep("8675309", 5),
"Subregion" = rep("mars", 5),
"PhoneType" = rep("razr", 5))
# Datas is a list of data.frames, we want to pull only specific columns from all of them
datas <- list(df_1, df_2)
library(dplyr)
# Function for looping through the first 'a' elements of datas
new.function <- function(a) {
  # create the list that stores the new data.frames once columns are selected
  tst <- list()
  for(i in seq_len(a)) {
    tst[[i]] <- datas[[i]] %>% select("phone_number","Subregion", "PhoneType")
  }
  # return the list so the result can be assigned by the caller
  tst
}
#Proof of concept for 2 elements
new.function(2)

Index nested lists of named data frames using character vector - R

I have a nested list of named data frames like so:
mylist2 <- list(
list(df1.a = data.frame(replicate(2,sample(0:1,5,rep=TRUE))), df2.b = data.frame(replicate(2,sample(0:1,5,rep=TRUE)))),
list(df3.c = data.frame(replicate(2,sample(0:1,5,rep=TRUE))), df4.d = data.frame(replicate(2,sample(0:1,5,rep=TRUE)))),
list(df5.e = data.frame(replicate(2,sample(0:1,5,rep=TRUE))), df6.f = data.frame(replicate(2,sample(0:1,5,rep=TRUE)))))
I run a test (not important what sort of test) and it produces a character vector telling me which data frames in this list are important:
test
[1] "df1.a" "df5.e"
What is the most efficient way to extract these data frames from the nested list using this character vector? The test only shows the names of second list, so nestedlist[test] does not work.
As the OP mentioned it was a nested list, we can loop through the initial list and then extract the elements of the second list with [
lapply(mylist2, '[', test)
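or using tidyverse (this form only applies when each element is a data frame and test holds column names, because select() does not handle plain lists)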
library(tidyverse)
map(mylist2, ~ .x %>%
select(test))
Update
Based on the updated dataset:
Filter(length, lapply(mylist2, function(x) x[intersect(test, names(x))]))
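To see why the updated version uses intersect() and Filter(), here is a sketch on a simplified nested list (the names nested, wanted, raw, hits and kept are invented for this example). It compares plain `[` indexing with the filtered variant.

```r
nested <- list(
  list(df1.a = 1, df2.b = 2),
  list(df3.c = 3, df4.d = 4),
  list(df5.e = 5, df6.f = 6)
)
wanted <- c("df1.a", "df5.e")

# Plain `[` returns one slot per requested name, filled with NULL when the name is absent
raw <- lapply(nested, `[`, wanted)
lengths(raw)    # 2 2 2

# intersect() restricts the lookup to names that exist in each sublist
hits <- lapply(nested, function(x) x[intersect(wanted, names(x))])
lengths(hits)   # 1 0 1

# Filter(length, ...) drops the empty sublists, because a length of 0 counts as FALSE
kept <- Filter(length, hits)
length(kept)    # 2
```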
Here is a reproducible example including sample data using nested lists:
# Sample data
lst <- list(
list(df1.a = 1, df2.b = 2),
list(df3.c = 3, df4.d = 4),
list(df5.e = 5, df6.f = 6))
test <- c("df1.a", "df5.e")
ret <- lapply(lst, function(x) x[names(x) %in% test])
ret[sapply(ret, length) > 0]
#[[1]]
#[[1]]$df1.a
#[1] 1
#
#
#[[2]]
#[[2]]$df5.e
#[1] 5

Is there an easy way to extract specific combinations of values from a list?

I have several (named) vectors in a list:
data = list(a=runif(n = 50, min = 1, max = 10), b=runif(n = 50, min = 1, max = 10), c=runif(n = 50, min = 1, max = 10), d=runif(n = 50, min = 1, max = 10))
I want to play around with different combinations of them depending on the row from another array called combs:
var <- letters[1:length(data)]
combs <- do.call(expand.grid, lapply(var, function(x) c("", x)))[-1,]
I would like to be able to extract each combination so that I can use the vectors created by these combinations.
All this is to be able to apply functions to each row extracted, and then to each combinations of these dataframes. So for example:
# Row 5 is "a", "c"
combs[5,]
# Use this information to extract this particular combination from my data:
# by hand it would be:
res_row5 = cbind(data[["a"]], data[["c"]])
# Extract another combination
# Row 11 is "a", "b", "d"
combs[11,]
res_row11 = cbind(data[["a"]], data[["b"]], data[["d"]])
# So that I can apply functions to each row across all these vectors
res_row_5_func = apply(res_row5, 1, sum)
# Apply another function to res_row11
res_row_11_func = apply(res_row11, 1, prod)
# Multiply the two, do other computations which can do as long as I have extracted the right vectors
I had already asked a very similar question here: Is there an easy way to match values of a list to array in R?
But can't figure out how to extract the actual data...
Thanks so much!
What you could do is first generate a list of vectors indexing the relevant entries in data:
library(magrittr)
combList <- lapply(1:nrow(combs), function(ii) combs[ii,] %>% unlist %>% setdiff(""))
You could then use this list to index the columns in data and generate a new list of the desired matrices:
dataMatrixList <- lapply(combList, function(indVec) data[indVec] %>% do.call('cbind', .))
The i-th entry in your dataMatrixList then contains a matrix with columns corresponding to the i-th row in combs. You can then compute sums, products etc. using
rowSumsList <- lapply(dataMatrixList, function(x) apply(x, 1, sum))
Here is another approach that I think gives what you want. It returns a list of data frames, built by subsetting your data list with the non-empty elements of each row of combs:
data_sets <- apply(combs,
1,
function(x) do.call(cbind.data.frame, data[unlist(x[x!=''])])
)
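Either approach yields an unnamed (or row-name indexed) list, so finding a particular combination means knowing its row number. One possible refinement, sketched here with smaller vectors and invented names (m, members, mats), is to label each matrix with the members it contains.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable
data <- list(a = runif(5), b = runif(5), c = runif(5), d = runif(5))
var <- names(data)
combs <- do.call(expand.grid, lapply(var, function(x) c("", x)))[-1, ]

# Character matrix with one row per combination
m <- as.matrix(combs)

# One character vector of member names per row, dropping the empty strings
members <- lapply(seq_len(nrow(m)), function(i) unname(m[i, m[i, ] != ""]))

# Bind the selected vectors column-wise and label each result, e.g. "a+c"
mats <- lapply(members, function(nm) do.call(cbind, data[nm]))
names(mats) <- vapply(members, paste, character(1), collapse = "+")

colnames(mats[["a+c"]])    # "a" "c"
rowSums(mats[["a+b+d"]])   # row-wise sums across a, b and d
```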

cbind equally named vectors in multiple data.frames in a list to a single data.frame

I have a list similar to this one:
set.seed(1602)
l <- list(data.frame(subst_name = sample(LETTERS[1:10]), perc = runif(10), crop = rep("type1", 10)),
data.frame(subst_name = sample(LETTERS[1:7]), perc = runif(7), crop = rep("type2", 7)),
data.frame(subst_name = sample(LETTERS[1:4]), perc = runif(4), crop = rep("type3", 4)),
NULL,
data.frame(subst_name = sample(LETTERS[1:9]), perc = runif(9), crop = rep("type5", 9)))
Question: How can I extract the subst_name-column of each data.frame and combine them with cbind() (or similar functions) to a new data.frame without messing up the order of each column? Additionally the columns should be named after the corresponding crop type (this is possible 'cause the crop types are unique for each data.frame)
EDIT: The output should look as follows:
Having read the comments I'm aware that within R it doesn't make much sense, but for the sake of having a look at the output the data.frame's View option is quite handy.
With the help of this SO question I came up with the following solution. (There's probably room for improvement.)
a <- lapply(l, '[[', 1) # extract the first element of the dfs in the list
a <- Filter(function(x) !is.null(unlist(x)), a) # remove NULLs
a <- lapply(a, as.character)
max.length <- max(sapply(a, length))
## Add NA values to list elements
b <- lapply(a, function(v) { c(v, rep(NA, max.length-length(v)))})
e <- as.data.frame(do.call(cbind, b))
names(e) <- unlist(lapply(lapply(lapply(l, '[[', "crop"), '[[', 2), as.character))
It is not really correct to do this with the given example, because the number of rows is not the same in each of the list's data frames. But if you don't care you can do:
nullElements = unlist(sapply(l,is.null))
l = l[!nullElements] #delete useless null elements in list
columns=lapply(l,function(x) return(as.character(x$subst_name)))
newDf = as.data.frame(Reduce(cbind,columns))
If you don't want recycled elements in the columns you can do
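for(i in seq_len(ncol(newDf))){
  colLength = nrow(l[[i]])
  # only pad shorter columns; for a full-length column (colLength+1):nrow(newDf)
  # counts backwards and would overwrite its last value with NA
  if (colLength < nrow(newDf)) newDf[(colLength+1):nrow(newDf),i] = NA
}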
newDf = newDf[1:max(unlist(sapply(l,nrow))),] #remove possible extra NA rows
Note that I edited my previous code to remove NULL entries from l to simplify things
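A shorter way to pad the columns, offered only as a sketch and not as either author's method, relies on the fact that indexing a vector past its end yields NA. The reduced list below and the names cols, padded and wide are assumptions made for the example.

```r
set.seed(1602)
l <- list(
  data.frame(subst_name = sample(LETTERS[1:10]), crop = "type1"),
  NULL,
  data.frame(subst_name = sample(LETTERS[1:4]), crop = "type3")
)

l <- Filter(Negate(is.null), l)                            # drop NULL entries
cols <- lapply(l, function(x) as.character(x$subst_name))  # one vector per data frame
max_len <- max(lengths(cols))

# Indexing past the end of a vector yields NA, which pads the shorter columns
padded <- lapply(cols, `[`, seq_len(max_len))
names(padded) <- vapply(l, function(x) as.character(x$crop[1]), character(1))

wide <- as.data.frame(padded, stringsAsFactors = FALSE)
dim(wide)                # 10 2
sum(is.na(wide$type3))   # 6
```

Because every padded vector has the same length, no recycling takes place and the original order within each column is preserved.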
