Search unnamed lists of lists for string - r

I have a list of unnamed lists of named dataframes of varying lengths. I'm looking for a way to grep or search through the indices of the list elements to find specific named dfs.
Here is the current method:
library(tibble) # for tibbles
## list of lists of dataframes
abc_list <- list(list(dfAAA = tibble(names = state.abb[1:10]),
dfBBB = tibble(junk = state.area[5:15]),
dfAAA2 = tibble(names = state.abb[8:20])),
list(dfAAA2 = tibble(names = state.abb[10:15]),
dfCCC = tibble(junk2 = state.area[4:8]),
dfGGG = tibble(junk3 = state.area[12:14])))
# Open list, manually ID list index which has "AAA" dfs
# extract from list of lists into separate list
desired_dfs_list <- abc_list[[1]][grepl("AAA", names(abc_list[[1]]))]
# unlist that list into a combined df
desired_rbinded_list <- as.data.frame(data.table::rbindlist(desired_dfs_list, use.names = F))
I know there's a better way than this.
What I've attempted so far:
## attempt:
## find pattern in df names
aaa_indices <- sapply(abc_list, function(x) grep(pattern = "AAA", names(x)))
## apply that to rbind ???
desired_aaa_rbinded_list <- purrr::map_df(aaa_indices, data.table::rbindlist(abc_list))
the steps from the manual example would be:
pull identified list items (dfs) into a separate list
rbind the list of dfs into one df
I'm just not sure how to do that in a way that allows me more flexibility, instead of manually opening the lists and ID-ing the indices to pull.
thanks for any help or ideas!

If your tibbles (or data frames) always sit exactly one level deep, that is, in a list (level 0) of lists (level 1), you can use unlist with recursive = FALSE to remove that first level:
all_dfs_list <- unlist(abc_list,
recursive = FALSE # will stop unlisting after the first level
)
This will result in a list of tibbles:
> all_dfs_list
$dfAAA
# A tibble: 10 x 1
names
<chr>
1 AL
2 AK
...
Then you can filter by name and use rbindlist on the desired elements, as you already did in your question:
desired_dfs_list <- all_dfs_list[grepl("AAA", names(all_dfs_list))]
desired_rbinded_list <- as.data.frame(
data.table::rbindlist(desired_dfs_list, use.names = FALSE)
)
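The two steps above can be wrapped into one reusable function. This is a minimal sketch in base R, assuming every matching data frame has the same column names; bind_matching is a hypothetical helper name, and plain data frames stand in for the tibbles so that no packages are needed. Note that rbind matches columns by name, unlike rbindlist with use.names = FALSE.

```r
# Hypothetical helper (not from any package): flatten one level,
# keep the elements whose names match `pattern`, and stack them.
bind_matching <- function(nested, pattern) {
  flat <- unlist(nested, recursive = FALSE)   # drop the outer list level
  hits <- flat[grepl(pattern, names(flat))]   # filter by element name
  do.call(rbind, unname(hits))                # row-bind the survivors
}

# Cut-down version of the question's data, using data frames
abc_list <- list(
  list(dfAAA  = data.frame(names = state.abb[1:10]),
       dfBBB  = data.frame(junk  = state.area[5:15]),
       dfAAA2 = data.frame(names = state.abb[8:20])),
  list(dfAAA2 = data.frame(names = state.abb[10:15]),
       dfCCC  = data.frame(junk2 = state.area[4:8]))
)

res <- bind_matching(abc_list, "AAA")
nrow(res)   # 10 + 13 + 6 rows
```

Changing the pattern argument (for example "CCC") selects a different family of data frames without touching the indices by hand.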

Related

Add different suffix to column names on multiple data frames in R

I'm trying to add different suffixes to my data frames so that I can distinguish them after I've merged them. I have my data frames in a list and created a vector for the suffixes, but so far I have not been successful.
data2016 is the list containing my 7 data frames
new_names <- c("june2016", "july2016", "aug2016", "sep2016", "oct2016", "nov2016", "dec2016")
data2016v2 <- lapply(data2016, paste(colnames(data2016)), new_names)
Your query is not quite clear. Therefore two solutions.
The beginning is the same for either solution. Suppose you have these four dataframes:
df1x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df2x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
df3x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df4x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
Suppose further you assemble them in a list, something akin to your data2016, using mget and ls with a pattern to match them:
my_list <- mget(ls(pattern = "^df\\d+x$"))
The names of the dataframes in this list are the following:
names(my_list)
[1] "df1x" "df2x" "df3x" "df4x"
Solution 1:
Suppose you want to change the names of the dataframes thus:
new_names <- c("june2016", "july2016","aug2016", "sep2016")
Then you can simply assign new_names to names(my_list):
names(my_list) <- new_names
And the result is:
names(my_list)
[1] "june2016" "july2016" "aug2016" "sep2016"
Solution 2:
You want to add the new_names literally as suffixes to the 'old' names, in which case you would use paste or paste0 thus:
names(my_list) <- paste0(names(my_list), "_", new_names)
And the result is:
names(my_list)
[1] "df1x_june2016" "df2x_july2016" "df3x_aug2016" "df4x_sep2016"
You could use an index number within lapply to reference both the list and your vector of suffixes. Because there are a couple steps, I'll wrap the process in a function(). (Called an anonymous function because we aren't assigning a name to it.)
data2016v2 <- lapply(seq_along(data2016), function(i) { # seq_along() adapts to the list length
this_data <- data2016[[i]] # Double brackets for a list
names(this_data) <- paste0(names(this_data), new_names[i]) # Single bracket for vector
this_data # The renamed data frame to be placed into data2016v2
})
Notice in the paste0() line we are recycling the term in new_names[i], so for example if new_names[i] is "june2016" and your first data.frame has columns "A", "B", and "C" then it would give you this:
> paste0(c("A", "B", "C"), "june2016")
[1] "Ajune2016" "Bjune2016" "Cjune2016"
(You may want to add an underscore in there?)
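The index-based lapply can also be written with Map, which walks the list and the suffix vector in parallel. This is only a sketch: the two small data frames and the two suffixes are made-up stand-ins for the asker's data2016 and new_names, and the underscore separator is an assumption.

```r
# Toy stand-ins for data2016 and new_names (assumed, not the real data)
data2016  <- list(data.frame(A = 1:2, B = 3:4),
                  data.frame(A = 5:6, B = 7:8))
new_names <- c("june2016", "july2016")

# Map() pairs the i-th data frame with the i-th suffix
data2016v2 <- Map(function(d, suffix) {
  names(d) <- paste0(names(d), "_", suffix)   # suffix every column name
  d                                           # return the renamed data frame
}, data2016, new_names)

names(data2016v2[[1]])
names(data2016v2[[2]])
```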
As an aside, it sounds like you might be better served by adding the "june2016" as a column in your data (like say a variable named month with "june2016" as the value in each row) and combining your data using something like bind_rows() from the dplyr package, running it "long" instead of "wide".
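A rough base R sketch of that "long" layout, using do.call(rbind, ...) in place of dplyr's bind_rows() so it runs without extra packages. The toy data frames and the column name month are assumptions for illustration.

```r
# Toy stand-ins for data2016 and new_names (assumed, not the real data)
data2016  <- list(data.frame(A = 1:2), data.frame(A = 3:4))
new_names <- c("june2016", "july2016")

# Tag each data frame with its month, then stack them row-wise
tagged <- Map(function(d, m) {
  d$month <- m    # the single value is recycled down every row
  d
}, data2016, new_names)

long <- do.call(rbind, tagged)
long
```

The month column then carries the information the suffixes would have encoded, and every data frame keeps the same column names.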

Transforming list obtained via strsplit to merge common categories

I have a list resembling the one below:
# Initial object
vec <- c("levelA-1", "levelA-2", "levelA-3",
"levelB-1", "levelB-2", "levelB-3")
lstVec <- strsplit(x = vec, split = "-")
I would like to arrive at a list of the following structure:
lstRes <- list(levelA = list(1:3),
levelB = list(1:3))
Notes
The list has the following characteristics:
First level elements are transformed into distinct lists
Second level elements created via strsplit are elements of those lists
this suffices:
mat <- do.call(rbind, lstVec)
result <- split(mat[,2], mat[,1])
do.call and rbind stack the elements of lstVec by row into a character matrix (thanks to G. Grothendieck for pointing out that this is not a data frame); split then groups mat[,2] by the values in mat[,1].
As Aaron says, it is a little odd that you want a nested list, but you can get it with:
lapply(result, as.list)
I am not sure how efficient rbind is here, but another way to obtain mat is:
mat <- matrix(unlist(lstVec), ncol = 2, byrow = TRUE)
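Putting the pieces together on the question's own vector gives an end-to-end check. One deliberate difference from the answer above, flagged as an assumption about what the asker wants: the second column is converted with as.integer, because strsplit returns character strings and the target structure uses 1:3.

```r
vec <- c("levelA-1", "levelA-2", "levelA-3",
         "levelB-1", "levelB-2", "levelB-3")
lstVec <- strsplit(vec, split = "-", fixed = TRUE)

# Each element of lstVec becomes one row of a two-column character matrix
mat <- do.call(rbind, lstVec)

# Group the (converted) second column by the first column
result <- split(as.integer(mat[, 2]), mat[, 1])

# Optional: turn each group into a list of single values
nested <- lapply(result, as.list)

str(result)
```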

R -- Use List value to lookup and return value from dataframe

I do not want to loop if I don't have to!
I am trying to iterate through a list. For each value in the list, I want to look up that value in a dataframe and pull data from another column (like a vlookup). I did my best to explain in more detail in my code below.
# Create First dataframe
df = data.frame(Letter=c("a","b","c"),
Food=c("Apple","Bannana","Carrot"))
# Create Second dataframe
df1 = data.frame(Testing=c("ab","abc","c"))
# Create Function
SplitAndCalc <- function(i,dat){
# Split into characters
EachCharacter <- strsplit(as.character(dat$Testing), "")
# Iterate through Each Character, look up the matching Letter in df, pull back Food from df
# In the end df1 will look something like Testing=c("ab","abc","c"), Food=c("AppleBannana","AppleBannanaCarrot","Carrot")
return(Food)
}
library("parallel")
library("snow")
# Detect the number of CPU cores on local workstation
num.cores <- detectCores()
# Create cluster on local host
cl <- makeCluster(num.cores, type="SOCK")
# Get count of rows in dataframe
row.cnt = nrow(df1)
# Call function in parallel
system.time(Weight <- parLapply(cl, c(1:row.cnt), SplitAndCalc, dat=df1))
# Create new column in dataframe to store results from function
df1$Food <- NA
# Unlist the Weight to fill out dataframe
df1$Food <- as.numeric(unlist(Weight))
Thanks!
I think I found something that will work for me, posting in case it can help someone else...
# Create First dataframe
df = data.frame(Letter=c("a","b","c"),
Food=c("Apple","Bannana","Carrot"))
# Create Second dataframe
df1 = data.frame(ID=c(1,2,3,4,5),
Testing=c("ab","abc","a","cc","abcabcabc"))
# Split into individual characters
EachCharacter <- strsplit(as.character(df1$Testing), "")
# Set the names of list values to df1$ID so we can merge back together later
temp <- setNames(EachCharacter,df1$ID)
# Unlist temp list and rep ID for each letter
out.dat <- data.frame(ID = rep(names(temp), sapply(temp, length)),
Letter = unlist(temp))
# Merge Individual letter weight
PullInLetterFood <- (merge(df, out.dat, by = 'Letter'))
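The merge attaches a Food to every letter but stops short of the one-string-per-row result described in the question. One possible way to get there, sketched with a named lookup vector instead of merge (a swapped-in technique, not the poster's method; the name lookup is made up for this example). Indexing by name keeps the letters in their original order, which merge does not guarantee.

```r
df  <- data.frame(Letter = c("a", "b", "c"),
                  Food   = c("Apple", "Bannana", "Carrot"),
                  stringsAsFactors = FALSE)
df1 <- data.frame(Testing = c("ab", "abc", "c"),
                  stringsAsFactors = FALSE)

# Named vector: the letters are the names, the foods are the values
lookup <- setNames(df$Food, df$Letter)

# Split each string into characters, look each one up, glue the foods together
df1$Food <- vapply(strsplit(df1$Testing, ""),
                   function(chars) paste(lookup[chars], collapse = ""),
                   character(1))
df1
```

A letter with no entry in df would come back as NA and be pasted as the text "NA", so real data may need a check for unknown letters first.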

R: Any R function can find mapping or matching from one nested list to another (reproducible example is given)?

I have two lists: a nested list containing two "IntegerList"-like objects, and a list of data.frame objects. Is there an R function that maps each element of the nested list to the corresponding data.frame in the second list, so that I can call my own function on each pair? My function takes two arguments: an element of the nested list and a data.frame from the list of data frames.
This is the quick reproducible example with toy data:
m <- as(list(c(1:2), 3, 4, integer(0), 6), "CompressedIntegerList")
n <- as(list(1,3,6, integer(0), 8), "CompressedIntegerList")
d <- data.frame(x1=seq(2,5,9), x2=seq(4,5,9), x3=LETTERS[seq(1:8)], score=sample(1:12, 8))
e <- data.frame(x1=seq(1,9,8), x2=seq(7,9,8), x3=letters[seq(1:8)], score=sample(1:10, 8))
df.li <- list(d,e)
nested_list <- list(m,n)
For example, the first element of nested_list corresponds to the first data.frame in df.li, so I want to match these pairs and pass each one to my own function. If nested_list contains several objects, what would be a good way to write this?
I tried nested lapply calls, but calling my own function that way gave an error. I have tested my function on its own, and I am pretty sure that nested_list and df.li correspond element by element (pair-wise).
So, is there an R function that can do this kind of mapping?
Use mapply.
Here is an example which determines if certain values show up in corresponding data.frames
# List of data.frames
dfs = list(
data.frame( name = c("Lorem", "ipsum"),
id = c(0, 1)),
data.frame( name = c("dolor", "sit"),
id = c(2,3))
)
# Vector of ids
ids = c(1,5)
# Function checks whether i occurs in the id column of the data.frame df
id.in.data.frame = function(i, df){
return( i %in% df$id )
}
# Check whether id 1 occurs in the first data.frame and id 5 occurs in the
# second data.frame
mapply(FUN = id.in.data.frame,
i = ids, # use the parameter names of id.in.data.frame ...
df = dfs # ... as parameter names here
)
# --> Returns: [1] TRUE FALSE
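When the function returns something larger than a single value per pair, Map (which is mapply with SIMPLIFY = FALSE) keeps each result as its own list element. The sketch below uses plain lists of integer vectors as stand-ins for the CompressedIntegerList objects in the question, and pick_scores is a hypothetical function invented for this example.

```r
# Plain-list stand-ins for the question's nested_list and df.li
nested_list <- list(list(1:2, 3L),
                    list(1L, integer(0)))
df.li <- list(data.frame(score = c(10, 20, 30)),
              data.frame(score = c(5, 6)))

# Hypothetical worker: for each index vector, pull those rows' scores
pick_scores <- function(idx, df) {
  lapply(idx, function(i) df$score[i])
}

# Pair the k-th nested element with the k-th data frame
res <- Map(pick_scores, nested_list, df.li)
str(res)
```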

Nested named list to data frame

I have the following named list output from an analysis. The reproducible code is as follows:
list(structure(c(-213.555409754509, -212.033637890131, -212.029474755074,
-211.320398316741, -211.158815833294, -210.470525157849), .Names = c("wasn",
"chappal", "mummyji", "kmph", "flung", "movie")), structure(c(-220.119433774144,
-219.186901747536, -218.743319709963, -218.088361753899, -217.338920075687,
-217.186050877079), .Names = c("crazy", "wired", "skanndtyagi",
"andr", "unveiled", "contraption")))
I want to convert this to a data frame. I have tried unlist to data frame options using reshape2, dplyr and other solutions given for converting a list to a data frame but without much success. The output that I am looking for is something like this:
Col1 Val1 Col2 Val2
1 wasn -213.55 crazy -220.11
2 chappal -212.03 wired -219.18
3 mummyji -212.02 skanndtyagi -218.74
and so on. The actual output has multiple columns with paired values and runs to many rows. I have already tried the following code:
do.call(rbind, lapply(df, data.frame, stringsAsFactors = TRUE))
works partially: it puts all the character values in one column and the numeric values in a second.
data.frame(Reduce(rbind, df))
didn't work: it gives the names from the first list and the numbers from both lists as two different rows
colNames <- unique(unlist(lapply(df, names)))
M <- matrix(0, nrow = length(df), ncol = length(colNames),
dimnames = list(names(df), colNames))
matches <- lapply(df, function(x) match(names(x), colNames))
M[cbind(rep(sequence(nrow(M)), sapply(matches, length)),
unlist(matches))] <- unlist(df)
M
didn't work correctly.
Can someone help?
Since the list elements are all of the same length, you should be able to stack them and then combine them by columns.
Try:
do.call(cbind, lapply(myList, stack))
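To see what this produces, here is a small sketch on a shortened, rounded copy of the question's list (the values are trimmed for readability, so treat them as illustrative). stack returns the numbers in a column called values and the names in a factor column called ind, so the combined columns are renamed afterwards; the Val/Col names follow the layout requested in the question.

```r
# Shortened, rounded stand-in for the list in the question
myList <- list(c(wasn  = -213.56, chappal = -212.03),
               c(crazy = -220.12, wired   = -219.19))

# stack() turns each named vector into a two-column data frame
out <- do.call(cbind, lapply(myList, stack))

# Columns arrive as values, ind, values, ind; give them distinct names
names(out) <- c("Val1", "Col1", "Val2", "Col2")
out
```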
Here's another way:
as.data.frame( c(col = lapply(x, names), val = lapply(x,unname)) )
How it works. lapply returns a list; two lists combined with c make another list; and a list is easily coerced to a data.frame, since the latter is just a list of vectors having the same length.
Better than coercing to a data.frame is just modifying its class, effectively telling the list "you're a data.frame now":
L = c(col = lapply(x, names), val = lapply(x,unname))
library(data.table)
setDF(L)
The result doesn't need to be assigned anywhere with = or <- because L is modified "in place."
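For reference, a sketch of the as.data.frame variant on a shortened, rounded copy of the data (illustrative values only). Because c() appends an index to the col and val prefixes, the columns come out as col1, col2, val1, val2, and a final reordering step interleaves them the way the question asks.

```r
# Shortened, rounded stand-in for the list in the question
x <- list(c(wasn  = -213.56, chappal = -212.03),
          c(crazy = -220.12, wired   = -219.19))

# c() on two prefixed lists yields names col1, col2, val1, val2
res <- as.data.frame(c(col = lapply(x, names), val = lapply(x, unname)),
                     stringsAsFactors = FALSE)
names(res)

# Interleave name and value columns
res <- res[, c("col1", "val1", "col2", "val2")]
res
```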
