data.frame from lists in list, weird column names


I'm trying to make a data.frame from a "list in list":
l <- list(c("sam1", "GSM6683", "GSM6684", "GSM6687", "GSM6688"),
          c("sam2", "GSM6681", "GSM6682", "GSM6685", "GSM6686"))
df <- data.frame(l)
1) I get a data.frame with weird column names. How can I avoid that?
2) I'd like to take the column names from the first element of each inner vector,
like so:
column names: sam1, sam2
row1 GSM6683 GSM6681
row2 GSM6684 GSM6682
row3 GSM6687 GSM6685
row4 GSM6688 GSM6686

You were almost there. Since you want sam1 and sam2 to be column names, you don't need to make them part of your list; just assign them as column names afterwards.
l <- list(c("GSM6683", "GSM6684", "GSM6687", "GSM6688"),
          c("GSM6681", "GSM6682", "GSM6685", "GSM6686"))
df <- data.frame(l)
colnames(df) <- c("sam1", "sam2")

If you're starting with the data structure in your example, do this:
df <- data.frame(lapply(l, function(x) x[-1]))
names(df) <- lapply(l, function(x) x[1])
If you have a choice on how to construct the data structure, do what R_Newbie says in his answer.
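For completeness, here is a minimal sketch of that approach run end to end on the data from the question. It names the list before building the data frame, so the auto-generated column names never appear. The use of setNames() and vapply() is my own choice, not something the answers above prescribe.

```r
# Data exactly as in the question: first element of each vector is the column name
l <- list(c("sam1", "GSM6683", "GSM6684", "GSM6687", "GSM6688"),
          c("sam2", "GSM6681", "GSM6682", "GSM6685", "GSM6686"))

# Pull the first element of every vector out as the column names
col_names <- vapply(l, function(x) x[1], character(1))

# Drop that first element, name the remaining vectors, then build the data frame
df <- data.frame(setNames(lapply(l, function(x) x[-1]), col_names),
                 stringsAsFactors = FALSE)
df
```

Because the list is already named when data.frame() sees it, there is nothing to rename afterwards.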

Related

Add different suffix to column names on multiple data frames in R

I'm trying to add different suffixes to the column names of my data frames so that I can distinguish them after I've merged them. I have my data frames in a list and created a vector for the suffixes, but so far I have not been successful.
data2016 is the list containing my 7 data frames:
new_names <- c("june2016", "july2016", "aug2016", "sep2016", "oct2016", "nov2016", "dec2016")
data2016v2 <- lapply(data2016, paste(colnames(data2016)), new_names)

Your question is not quite clear, so here are two solutions.
The beginning is the same for either solution. Suppose you have these four dataframes:
df1x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df2x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
df3x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df4x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
Suppose further that you assemble them in a list, something akin to your data2016, using mget and ls with a pattern that matches them:
my_list <- mget(ls(pattern = "^df\\d+x$"))
The names of the dataframes in this list are the following:
names(my_list)
[1] "df1x" "df2x" "df3x" "df4x"
Solution 1:
Suppose you want to change the names of the dataframes thus:
new_names <- c("june2016", "july2016","aug2016", "sep2016")
Then you can simply assign new_names to names(my_list):
names(my_list) <- new_names
And the result is:
names(my_list)
[1] "june2016" "july2016" "aug2016" "sep2016"
Solution 2:
You want to add the new_names literally as suffixes to the 'old' names, in which case you would use paste or paste0 thus:
names(my_list) <- paste0(names(my_list), "_", new_names)
And the result is:
names(my_list)
[1] "df1x_june2016" "df2x_july2016" "df3x_aug2016" "df4x_sep2016"

You could use an index number within lapply to reference both the list and your vector of suffixes. Because there are a couple steps, I'll wrap the process in a function(). (Called an anonymous function because we aren't assigning a name to it.)
data2016v2 <- lapply(1:7, function(i) {
this_data <- data2016[[i]] # Double brackets for a list
names(this_data) <- paste0(names(this_data), new_names[i]) # Single bracket for vector
this_data # The renamed data frame to be placed into data2016v2
})
Notice in the paste0() line we are recycling the term in new_names[i], so for example if new_names[i] is "june2016" and your first data.frame has columns "A", "B", and "C" then it would give you this:
> paste0(c("A", "B", "C"), "june2016")
[1] "Ajune2016" "Bjune2016" "Cjune2016"
(You may want to add an underscore in there?)
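If you do want the underscore, one possible variant pairs each data frame with its suffix using Map() instead of an index. This is only a sketch: the two small data frames below are made-up stand-ins for the real data2016, and new_names is shortened to match.

```r
# Hypothetical stand-ins for the real list of data frames and suffixes
data2016  <- list(data.frame(A = 1:2, B = 3:4),
                  data.frame(A = 5:6, B = 7:8))
new_names <- c("june2016", "july2016")

# Map() walks both inputs in parallel: d is one data frame, suffix its label
data2016v2 <- Map(function(d, suffix) {
  names(d) <- paste(names(d), suffix, sep = "_")  # e.g. "A" becomes "A_june2016"
  d
}, data2016, new_names)

lapply(data2016v2, names)
```

Map() requires the list and the suffix vector to line up one to one, which is the same assumption the indexed lapply() makes.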
As an aside, it sounds like you might be better served by adding the "june2016" as a column in your data (like say a variable named month with "june2016" as the value in each row) and combining your data using something like bind_rows() from the dplyr package, running it "long" instead of "wide".
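A rough sketch of that "long" layout, using base R only so it runs without extra packages. The example data frames are made up, and do.call(rbind, ...) stands in for bind_rows() here; with dplyr loaded, bind_rows(tagged) should give the same rows.

```r
# Hypothetical stand-ins for the real list of data frames and labels
data2016  <- list(data.frame(A = 1:2, B = 3:4),
                  data.frame(A = 5:6, B = 7:8))
new_names <- c("june2016", "july2016")

# Add the label as a column instead of baking it into the column names
tagged <- Map(function(d, m) {
  d$month <- m
  d
}, data2016, new_names)

# Stack everything into one long data frame
long <- do.call(rbind, tagged)
long
```

Every data frame keeps the same column names this way, so later filtering or grouping by month needs no string handling on column names.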

turning lists of lists of lists into a dataframe

I have a set of lists whose names are stored in all_list.
all_list=c("LIST1","LIST2")
From these, I would like to create a data frame such that
LISTn$findings${Col_i}$character is entered into the i-th column of the n-th row, with the row name taken from LISTn$rowname.
DATA
LIST1=list()
LIST1[["findings"]]=list(s1a=list(character="a1",number=1,string="a1type",exp="great"),
s1b=list(number=2,string="b1type"),
in2a=list(character="c1",number=3,string="c1type"),
del3b=list(character="d1",number=4,string="d1type"))
LIST1[["rowname"]]="Row1"
LIST2=list()
LIST2[["findings"]]=list(s1a=list(character="a2",number=5,string="a2type",exp="great"),
s1b=list(character="b2",number=6,string="b2type"),
in2a=list(character="c2",number=7,string="c2type"),
del3b=list(character="d2",number=8,string="d2type"))
LIST2[["rowname"]]="Row2"
Please note that some character entries are missing; NA would suffice for those.
Desired output is this data frame:
s1a s1b in2a del3b
Row1 a1 NA c1 d1
Row2 a2 b2 c2 d2
There are about 1000 of these lists, so speed is a factor. Each list is about 50 MB after I load it through rjson::fromJSON(file=x).
The row and column names don't follow a particular pattern. They're names and attributes.

We can use a couple of lapply/sapply combinations to loop over the nested list and extract the elements that have "Row" as the name
do.call(rbind, lapply(mget(all_list), function(x)
sapply(lapply(x$findings[grep("^Row\\d+", names(x$findings))], `[[`,
"character"), function(x) replace(x, is.null(x), NA))))
Alternatively, change all the names to a single value and then extract those elements:
do.call(rbind, lapply(mget(all_list), function(x) {
x1 <- setNames(x$findings, rep("Row", length(x$findings)) )
sapply(x1[names(x1)== "Row"], function(y)
pmin(NA, y$character[1], na.rm = TRUE)[1])}))
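For the example data as posted, where the findings are named s1a, s1b, in2a and del3b, a base-R sketch could look like the following. The helper get_chars() is a name I made up for illustration, and the lists are collected with list() here; mget(all_list) would look them up by name instead.

```r
LIST1 <- list(
  findings = list(
    s1a   = list(character = "a1", number = 1, string = "a1type", exp = "great"),
    s1b   = list(number = 2, string = "b1type"),           # no "character" entry
    in2a  = list(character = "c1", number = 3, string = "c1type"),
    del3b = list(character = "d1", number = 4, string = "d1type")),
  rowname = "Row1")
LIST2 <- list(
  findings = list(
    s1a   = list(character = "a2", number = 5, string = "a2type", exp = "great"),
    s1b   = list(character = "b2", number = 6, string = "b2type"),
    in2a  = list(character = "c2", number = 7, string = "c2type"),
    del3b = list(character = "d2", number = 8, string = "d2type")),
  rowname = "Row2")
lists <- list(LIST1, LIST2)

# Hypothetical helper: one named character vector per list, NA where missing
get_chars <- function(x) {
  vapply(x$findings, function(f) {
    if (is.null(f[["character"]])) NA_character_ else f[["character"]]
  }, character(1))
}

# Union of all finding names, so lists with different findings still line up
cols <- unique(unlist(lapply(lists, function(x) names(x$findings))))
res  <- do.call(rbind, lapply(lists, function(x) get_chars(x)[cols]))
dimnames(res) <- list(vapply(lists, function(x) x$rowname, character(1)), cols)
df <- as.data.frame(res, stringsAsFactors = FALSE)
df
```

Indexing by cols fills in NA for any finding a given list lacks entirely, not only for findings that lack a character entry.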

purrr has a handy function called map_chr that is built for tasks like this.
library(purrr)
sapply(mget(all_list), function(x) purrr::map_chr(x$findings, "character", .default = NA)) %>%
t %>%
data.frame

R: Merge lists containing vectors and scalars

I have N lists that have some identical element names. Here is a MWE with two lists:
ls <- list()
ls[[1]] <- list("a"=1:2,
"b"=20,
"c"=numeric(0))
names(ls[[1]]$a) <- c("a1", "a2")
ls[[2]] <- list("a"=3:4,
"b"=30,
"c"=1:4,
"d"="f")
names(ls[[2]]$a) <- c("a1", "a2")
Is it possible to merge these into a resulting list lsRes, where lsRes has the following properties:
lsRes$a contains two elements, where the first is the named vector
c(1,2) (with names c(a1, a2)) and the second a named vector c(3,4)
(with names c(a1, a2))
lsRes$b contains two elements, where the first is 20 and the second is 30
lsRes$c contains two elements, where the first is numeric(0) and the second is 1:4
lsRes$d contains two elements, where the first is NA and the second is "f"
I looked at this and this, but they describe different cases.

Assuming that the output should also be a list, we collect the union of all element names and then, in each list, set any element it doesn't have to NA:
nm1 <- unique(unlist(sapply(ls, names)))
lsRes <- lapply(ls, function(x) {x[setdiff(nm1, names(x))] <- NA; x})
lengths(lsRes)
#[1] 4 4
If we need a list of 4 elements (one per name, each holding one entry per input list), then use transpose:
library(purrr)
lsRes %>%
transpose
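If purrr is not available, the same regrouping can be sketched in base R. The input list is called ls_in here (my own name) to avoid masking base::ls().

```r
# The two input lists from the question
ls_in <- list(
  list(a = c(a1 = 1L, a2 = 2L), b = 20, c = numeric(0)),
  list(a = c(a1 = 3L, a2 = 4L), b = 30, c = 1:4, d = "f"))

# Union of all element names across the inputs
nm1 <- unique(unlist(lapply(ls_in, names)))

# Pad every input so it has all names, using NA for the missing ones
padded <- lapply(ls_in, function(x) {
  x[setdiff(nm1, names(x))] <- NA
  x
})

# Regroup: one element per name, each holding one entry per input list
lsRes <- lapply(setNames(nm1, nm1), function(n) lapply(padded, `[[`, n))
str(lsRes)
```

Each lsRes element stays a list rather than being simplified, so entries of different lengths such as numeric(0) and 1:4 can sit side by side.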

R: Merging lists of data frames

I'm a total noob at R and I've tried (and retried) to search for an answer to the following problem, but I've not been able to get any of the proposed solutions to do what I'm interested in.
I have two lists of named elements, with each element pointing to data frames with identical layouts:
(EDIT)
df1 <- data.frame(A=c(1,2,3),B=c("A","B","C"))
df2 <- data.frame(A=c(98,99),B=c("Y","Z"))
lst1 <- c(X=df1,Y=df2)
df3 <- data.frame(A=c(4,5),B=c("D","E"))
lst2 <- c(X=df3)
(EDIT 2)
So it seems like storing multiple data frames in a list is a bad idea, as it will convert the data frames to lists. So I'll go out looking for an alternative way to store a set of named data frames.
In general the names of the elements in the two lists might overlap partially, completely, or not at all.
I'm looking for a way to merge the two lists into a single list:
<some-function-sequence>(lst1, lst2)
->
c(X=rbind(df1,df3),Y=df2)
-resulting in something like this:
[EDIT: Syntax changed to correctly reflect desired result (list-of-data frames)]
$X
A B
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
$Y
A B
1 98 Y
2 99 Z
I.e:
IF the lists contain identical element names, each pointing to a data frame, THEN I want to 'rbind' the rows from these two data frames and assign the resulting data frame to the same element name in the resulting list.
Otherwise the element names and data frames from both lists should just be copied into the resulting list.
I've tried the solutions from a number of discussions such as:
Can I combine a list of similar dataframes into a single dataframe?
Combine/merge lists by elements names
Simultaneously merge multiple data.frames in a list
Combine/merge lists by elements names (list in list)
Convert a list of data frames into one data frame
-but I've not been able to find the right solution. A general problem seems to be that the data frame ends up being converted into a list by the application of 'mapply/sapply/merge/...' - and usually also sliced and/or merged in ways which I am not interested in. :)
Any help with this will be much appreciated!
[SOLUTION]
The solution seems to be to change the use of c(...) when collecting data frames to list(...) after which the solution proposed by Pierre seems to give the desired result.

Here is a proposed solution using split and c to combine like terms. Please read the caveat at the bottom:
s <- split(c(lst1, lst2), names(c(lst1,lst2)))
lapply(s, function(lst) do.call(function(...) unname(c(...)), lst))
# $X.A
# [1] 1 2 3 4 5
#
# $X.B
# [1] "A" "B" "C" "D" "E"
#
# $Y.A
# [1] 98 99
#
# $Y.B
# [1] "Y" "Z"
This solution relies on the strings NOT being stored as factors. It will not throw an error if they are, but the factors will be converted to numbers. Below I show how I transformed the data to remove factors. Let me know if you require factors:
df1 <- data.frame(A=c(1,2,3),B=c("A","B","C"), stringsAsFactors=FALSE)
df2 <- data.frame(A=c(98,99),B=c("Y","Z"), stringsAsFactors=FALSE)
lst1 <- c(X=df1,Y=df2)
df3 <- data.frame(A=c(4,5),B=c("D","E"), stringsAsFactors=FALSE)
lst2 <- c(X=df3)
If the data frames are collected with list(...) instead of c(...), so that they stay data frames, we can use:
lapply(split(c(lst1, lst2), names(c(lst1,lst2))), function(lst) do.call(rbind, lst))
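Put together with the list(...) change from the question's [SOLUTION] note, a runnable sketch might look like this. The unname() call is my addition; it only stops rbind from prefixing the row names with the list names.

```r
df1 <- data.frame(A = c(1, 2, 3), B = c("A", "B", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(A = c(98, 99),  B = c("Y", "Z"),      stringsAsFactors = FALSE)
df3 <- data.frame(A = c(4, 5),    B = c("D", "E"),      stringsAsFactors = FALSE)

# list() keeps each data frame intact; c() would flatten them into columns
lst1 <- list(X = df1, Y = df2)
lst2 <- list(X = df3)

both <- c(lst1, lst2)               # a list of three data frames named X, Y, X
groups <- split(both, names(both))  # group the data frames by element name

# rbind the data frames within each group; singletons pass through unchanged
merged <- lapply(groups, function(dfs) do.call(rbind, unname(dfs)))
merged
```

Names that occur in only one of the lists end up as a group of one, so those data frames are simply copied into the result.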

The following solution is probably not the most efficient way. However, if I got your problem right this should work ;)
# Example data
# Some vectors
a <- 1:5
b <- 3:7
c <- rep(5, 5)
d <- 5:1
# Some dataframes, data1 and data3 have identical column names
data1 <- data.frame(a, b)
data2 <- data.frame(c, b)
data3 <- data.frame(a, b)
data4 <- data.frame(c, d)
# 2 lists
list1 <- list(data1, data2)
list2 <- list(data3, data4)
# Loop which checks the column names and rbinds data frames with the same column names
final_list <- list1
used_lists <- numeric()
for(i in 1:length(list1)) {
  for(j in 1:length(list2)) {
    if(sum(colnames(list1[[i]]) == colnames(list2[[j]])) == ncol(list1[[i]])) {
      final_list[[i]] <- rbind(list1[[i]], list2[[j]])
      used_lists <- c(used_lists, j)
    }
  }
}
# Adding the other dataframes, which did not have the same column names
for(i in 1:length(list2)) {
  if((i %in% used_lists) == FALSE) {
    final_list[[length(final_list) + 1]] <- list2[[i]]
  }
}
# Final list, which includes all other lists
final_list

Nested named list to data frame

I have the following named list output from an analysis. The reproducible code is as follows:
list(structure(c(-213.555409754509, -212.033637890131, -212.029474755074,
-211.320398316741, -211.158815833294, -210.470525157849), .Names = c("wasn",
"chappal", "mummyji", "kmph", "flung", "movie")), structure(c(-220.119433774144,
-219.186901747536, -218.743319709963, -218.088361753899, -217.338920075687,
-217.186050877079), .Names = c("crazy", "wired", "skanndtyagi",
"andr", "unveiled", "contraption")))
I want to convert this to a data frame. I have tried unlist-to-data-frame approaches using reshape2 and dplyr, and other solutions given for converting a list to a data frame, but without much success. The output that I am looking for is something like this:
Col1 Val1 Col2 Val2
1 wasn -213.55 crazy -220.11
2 chappal -212.03 wired -219.18
3 mummyji -212.02 skanndtyagi -218.74
and so on. The actual output has multiple columns of paired values and runs to many rows. I have already tried the following:
do.call(rbind, lapply(df, data.frame, stringsAsFactors = TRUE))
works partially: it puts all the character values in one column and the numeric values in a second.
data.frame(Reduce(rbind, df))
didn't work: it gives the names from the first list, and the numbers from both lists as two different rows.
colNames <- unique(unlist(lapply(df, names)))
M <- matrix(0, nrow = length(df), ncol = length(colNames),
dimnames = list(names(df), colNames))
matches <- lapply(df, function(x) match(names(x), colNames))
M[cbind(rep(sequence(nrow(M)), sapply(matches, length)),
unlist(matches))] <- unlist(df)
M
didn't work correctly.
Can someone help?

Since the list elements are all of the same length, you should be able to stack them and then combine them by columns.
Try:
do.call(cbind, lapply(myList, stack))
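If you want the name column first and the Col1/Val1/Col2/Val2 headers from the question, a variant of the same idea builds each name/value pair by hand instead of calling stack(). This is a sketch on a shortened copy of the data, with the values rounded to two decimals for readability; myList follows the naming used above.

```r
# First three entries of each vector from the question, rounded to 2 decimals
myList <- list(
  c(wasn = -213.56, chappal = -212.03, mummyji = -212.03),
  c(crazy = -220.12, wired = -219.19, skanndtyagi = -218.74))

# One two-column data frame per vector: its names, then its values
pieces <- lapply(seq_along(myList), function(i) {
  v <- myList[[i]]
  out <- data.frame(names(v), unname(v), stringsAsFactors = FALSE)
  names(out) <- paste0(c("Col", "Val"), i)   # Col1/Val1, Col2/Val2, ...
  out
})

# Bind the pairs side by side; this needs all vectors to have the same length
res <- do.call(cbind, pieces)
res
```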

Here's another way:
as.data.frame( c(col = lapply(x, names), val = lapply(x,unname)) )
How it works. lapply returns a list; two lists combined with c make another list; and a list is easily coerced to a data.frame, since the latter is just a list of vectors having the same length.
Better than coercing to a data.frame is just modifying its class, effectively telling the list "you're a data.frame now":
L = c(col = lapply(x, names), val = lapply(x,unname))
library(data.table)
setDF(L)
The result doesn't need to be assigned anywhere with = or <- because L is modified "in place."
