I have the following question:
I have a list (L1) with two elements, each a data frame with the same four variables.
The fourth variable, V4, holds the name of the list element, e.g. every row of $a has V4 = "a".
a <- data.frame(V1=c("a","b","c"), V2=c(4,7,9), V3=1:3, V4=c("a","a","a"))
b <- data.frame(V1=c("d","e","f"), V2=c(10,14,16), V3=1:3, V4=c("b","b","b"))
L1 <- list(a=a, b=b)
L1
$a
V1 V2 V3 V4
a 4 1 a
b 7 2 a
c 9 3 a
$b
V1 V2 V3 V4
d 10 1 b
e 14 2 b
f 16 3 b
I would like to extract, from each element of the list, the rows with V3 == 2. If an element has no row with this value, it should still give one row, with V1 to V3 set to NA and V4 set to the name of that list element.
In the example the outcome should look like this:
V1 V2 V3 V4
b 7 2 a
e 14 2 b
If I select a value that does not occur, e.g. V3 == 4, my result should look like this:
V1 V2 V3 V4
<NA> <NA> <NA> a
<NA> <NA> <NA> b
I can extract a column with unlist(lapply(L1, "[", 3)), but I can't figure out how to extract the rows that have a certain value in a variable.
I also tried to combine lapply with the subset function, but this didn't work for me.
Thanks for your help!
This should work. The first command returns a list with one data frame per element; the second binds them into a single data frame. If the value is not in an element, that element contributes a row of NAs with V4 set to the element's name.
l <- Map(function(x, nm) {
  i <- which(x$V3 == 2)
  if (length(i) > 0) x[i, ]
  else data.frame(V1 = NA, V2 = NA, V3 = NA, V4 = nm)
}, L1, names(L1))
df <- do.call(rbind, l)
We could write a function using data.table: rbindlist binds the list elements, then, grouped by V4, we return the matching rows (.SD[tmp]) or, if no V3 equals the given value, a row of NAs (.SD[.N+1], which indexes one row past the end of the group).
library(data.table)
f1 <- function(lst, val){
  rbindlist(lst)[, {tmp <- V3 == val
                    if (!any(tmp)) .SD[.N + 1]
                    else .SD[tmp]},
                 by = V4][, names(lst[[1]]), with = FALSE]
}
f1(L1, 4)
# V1 V2 V3 V4
#1: NA NA NA a
#2: NA NA NA b
f1(L1, 3)
# V1 V2 V3 V4
#1: c 9 3 a
#2: f 16 3 b
f1(L1, 2)
# V1 V2 V3 V4
#1: b 7 2 a
#2: e 14 2 b
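You can also use bind_rows with dplyr. Note that this only returns the matching rows: list elements without a match are dropped rather than padded with NA.
library(dplyr)
list(a = a, b = b) %>%
  bind_rows(.id = "source") %>%
  filter(V3 == 2)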
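If the NA rows are needed as well, one option is to right-join the filtered rows back onto the list names. This is a sketch that assumes V4 always equals the element's name (as in the question) and uses a hypothetical variable `val` for the value searched in V3; right_join() keeps every name, filling V1 to V3 with NA where nothing matched.

```r
library(dplyr)

a <- data.frame(V1 = c("a", "b", "c"), V2 = c(4, 7, 9), V3 = 1:3, V4 = c("a", "a", "a"))
b <- data.frame(V1 = c("d", "e", "f"), V2 = c(10, 14, 16), V3 = 1:3, V4 = c("b", "b", "b"))
L1 <- list(a = a, b = b)

val <- 4  # hypothetical value to look for in V3; try 2 as well

res <- bind_rows(L1) %>%
  filter(V3 == val) %>%
  # every list name is kept; names with no matching row get NA in V1-V3
  right_join(data.frame(V4 = names(L1)), by = "V4")
res
```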
I am trying to put a list into each cell of a data.table column, where the list holds the values of the other columns in that row. I don't want to type the name of each column explicitly. Basically, I want to redo the following working example without referring to V1 and V2 explicitly:
# Create a data table as an example:
dat <- data.table(V1 = c(1:4), V2 = c('A','B','C','D'))
print(dat)
V1 V2
1: 1 A
2: 2 B
3: 3 C
4: 4 D
Now I create a new variable V3 containing the list of the row values:
dat[, id := rownames(dat)] # Creating unique id per row
dat[, V3 := list(list(list(V1,V2))), by = 'id']
print(dat)
V1 V2 id V3
1: 1 A 1 <list[2]>
2: 2 B 2 <list[2]>
3: 3 C 3 <list[2]>
4: 4 D 4 <list[2]>
And we can see that the first element of V3 is correct and composed of the list 1,A:
unlist(dat$V3[[1]])
[1] "1" "A"
How can I do this without listing V1 and V2 explicitly? In my real code I have to do this for over 60 variables (i.e. build a list from each row across 60 columns), so I don't want to write list(list(list(V1, V2, V3, ..., V60))).
You can use apply:
library(data.table)
dat <- data.table(V1 = c(1:4), V2 = c('A','B','C','D'))
dat[, V3 := apply(.SD, 1, as.list), .SDcols = V1:V2]
dat
# V1 V2 V3
#1: 1 A <list[2]>
#2: 2 B <list[2]>
#3: 3 C <list[2]>
#4: 4 D <list[2]>
unlist(dat$V3[[1]])
# V1 V2
#"1" "A"
If you need an unnamed vector use unname(.SD) in apply.
I think this will do the trick:
dat[, v3 := lapply(transpose(dat), as.list)]
# V1 V2 v3
# 1: 1 A <list[2]>
# 2: 2 B <list[2]>
# 3: 3 C <list[2]>
# 4: 4 D <list[2]>
unlist(dat$v3[[1]])
#[1] "1" "A"
We can use Map with asplit:
dat[, V3 := Map(as.list, asplit(.SD, 1))][]
V1 V2 V3
1: 1 A <list[2]>
2: 2 B <list[2]>
3: 3 C <list[2]>
4: 4 D <list[2]>
Using Map, we could do the following; because the anonymous function takes ..., it works for any number of columns:
dat[, V3 := do.call(Map, c(f = function(...) list(...), unname(.SD)))]
dat
V1 V2 V3
1: 1 A <list[2]>
2: 2 B <list[2]>
3: 3 C <list[2]>
4: 4 D <list[2]>
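For the 60-column case in the question, here is a sketch on a hypothetical wide table whose column names are held in a character vector `cols`. `do.call(Map, c(f = list, ...))` builds one unnamed list per row without naming any column, and, unlike apply, it does not go through a matrix, so each value keeps its column's type (apply converts mixed columns to character).

```r
library(data.table)

# Hypothetical stand-in for the real data: 4 rows, 60 integer columns plus one character column
wide <- as.data.table(setNames(lapply(1:60, function(k) k * 1:4), paste0("V", 1:60)))
wide[, label := c("A", "B", "C", "D")]
cols <- c(paste0("V", 1:60), "label")

# Map(list, col1, col2, ...) returns, for row i, list(col1[i], col2[i], ...);
# the outer list() makes := store the result as a single list column
wide[, row_list := list(do.call(Map, c(f = list, unname(as.list(.SD))))), .SDcols = cols]

wide$row_list[[1]][c(1, 61)]
```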
I have two different lists of dataframes. Some of the dataframes in the two lists have the same name, and others don't. When I merge the two lists, I need the dataframes with the same name to be merged rbind-style, and the ones that are unique to either list to remain as separate dataframes and be tacked on to the newly created merged list.
list1 is likely to have more dataframes and more rows per dataframe than list2, since it is the cumulatively bound list built up by a loop. list2 is the new result of each loop iteration, to be added to the cumulative list1.
Mock Example:
mydf1 <- data.frame(V1=1, V2=rep("A", 4))
mydf2 <- data.frame(V1=1, V2=rep("B", 3))
mydf3 <- data.frame(V1=1, V2=rep("C", 2))
mydf4 <- data.frame(V1=2, V2="A")
mydf5 <- data.frame(V1=3, V2="C")
mydf6 <- data.frame(V1=4, V2="D")
mydf7 <- data.frame(V1=7, V2="E")
list1 <- list(AA=mydf1, BB=mydf2, CC=mydf3)
list2 <- list(AA=mydf4, CC=mydf5, DD=mydf6, EE=mydf7)
Expected result:
$AA
V1 V2
1 1 A
2 1 A
3 1 A
4 1 A
1 2 A
$BB
V1 V2
1 1 B
2 1 B
3 1 B
$CC
V1 V2
1 1 C
2 1 C
1 3 C
$DD
V1 V2
1 4 D
$EE
V1 V2
1 7 E
I have tried the solutions here, but have not been able to get them to work properly.
This solution isn't putting the right dataframes together and is creating other weird combinations.
(m <- match(names(list2), names(list1), nomatch = 0L))
# [1] 1 3 0 0
Map(rbind, list1[m], list2)
And this one never seems to rbind the dataframes with the same names; all the dataframes just keep 1 row.
stackMe <- function(x) {
a <- eval.parent(quote(names(X)))[substitute(x)[[3]]]
rbind(list1[[a]], x)
}
lapply(list2, stackMe)
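How can I merge two lists of dataframes so that dataframes with the same name are appended (rbind), and the other, unique dataframes are just tacked on to the list?
We can use indexing with Map. For names that only exist in list2, list1[names(list2)] returns NULL, and rbind(NULL, df) is just df, so those dataframes are appended unchanged: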
list1[names(list2)] <- Map(rbind, list1[names(list2)], list2)
list1
#$AA
# V1 V2
#1 1 A
#2 1 A
#3 1 A
#4 1 A
#5 2 A
#$BB
# V1 V2
#1 1 B
#2 1 B
#3 1 B
#$CC
# V1 V2
#1 1 C
#2 1 C
#3 3 C
#$DD
# V1 V2
#1 4 D
#$EE
# V1 V2
#1 7 E
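Since list1 is built up inside a loop, the same step can be wrapped in a small helper and applied once per iteration, or folded over a list of results with Reduce(). `merge_df_lists` is a hypothetical name, and the sketch assumes every list is fully named. Starting from an empty list() works because indexing it by new names yields NULL elements, which rbind() drops.

```r
mydf1 <- data.frame(V1 = 1, V2 = rep("A", 4))
mydf2 <- data.frame(V1 = 1, V2 = rep("B", 3))
mydf3 <- data.frame(V1 = 1, V2 = rep("C", 2))
mydf4 <- data.frame(V1 = 2, V2 = "A")
mydf5 <- data.frame(V1 = 3, V2 = "C")
mydf6 <- data.frame(V1 = 4, V2 = "D")
mydf7 <- data.frame(V1 = 7, V2 = "E")
list1 <- list(AA = mydf1, BB = mydf2, CC = mydf3)
list2 <- list(AA = mydf4, CC = mydf5, DD = mydf6, EE = mydf7)

# Hypothetical helper: rbind same-named data frames, append new names unchanged
merge_df_lists <- function(acc, new) {
  acc[names(new)] <- Map(rbind, acc[names(new)], new)
  acc
}

# inside the loop this would be: list1 <- merge_df_lists(list1, list2)
res <- Reduce(merge_df_lists, list(list1, list2), list())
sapply(res, nrow)
```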
I would like to convert the following list into a dataframe.
test <- list(list("a",c("b","c","d"),character(0)),list("j",c("r","s"),character(0)),list("h",character(0),"i"))
I tried the following:
df.test <- do.call(rbind,Map(data.frame, V1=sapply(test, "[[", 1),V2=sapply(test, "[[", 2),V3=sapply(test, "[[", 3)))
But this doesn't work with nested lists containing character(0). A satisfactory output looks something like this:
V1 V2 V3
1 a b NA
2 a c NA
3 a d NA
4 j r NA
5 j s NA
6 h NA i
Many thanks in advance.
library(tidyverse)
test %>%
map_df(~.x %>%
map(~if(length(.)) . else NA) %>%
do.call(what = cbind) %>%
as_tibble)
Gives
V1 V2 V3
<chr> <chr> <chr>
1 a b NA
2 a c NA
3 a d NA
4 j r NA
5 j s NA
6 h NA i
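A base R alternative, sketched under the assumption (true for test) that within each inner list every piece has length 0, 1 or the longest length: zero-length pieces are replaced by NA, and data.frame() then recycles the length-1 pieces to the longest one.

```r
test <- list(list("a", c("b", "c", "d"), character(0)),
             list("j", c("r", "s"), character(0)),
             list("h", character(0), "i"))

rows <- lapply(test, function(el) {
  # character(0) cannot be recycled, so swap it for a single NA
  el <- lapply(el, function(v) if (length(v)) v else NA_character_)
  names(el) <- paste0("V", seq_along(el))
  # data.frame() recycles the length-1 pieces to the longest piece
  as.data.frame(el, stringsAsFactors = FALSE)
})
df.test <- do.call(rbind, rows)
df.test
```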
I've searched longer than I'd like to admit for a way to shift leading NAs to the end of each row.
I got close with a few Stack Overflow questions ("Cut out outer NAs in R", "Rotate a Matrix in R", "na.locf remove leading NAs, keep others [closed]") and also looked over the na.trim function in the zoo package. Essentially, I want to turn this:
D <- matrix(c(1:9), 3)
D[2,1]<- NA
D[3,1]<- NA
D[3,2]<- NA
D <- as.data.frame(D)
into this:
D1 <- data.frame(V1 = c(1,5,9),
V2 = c(4,8,NA),
V3 = c(7,NA,NA))
As always, any help is much appreciated!
Thanks,
You can use sort(..., na.last = TRUE) within a row-wise apply:
as.data.frame(t(apply(D, 1, sort, na.last = TRUE)))
# V1 V2 V3
#1 1 4 7
#2 5 8 NA
#3 9 NA NA
Update
sort also reorders the non-NA values; to keep them in their original order, you can do:
# Revised sample data
D <- matrix(c(1:9), 3)
D[2,1]<- NA
D[3,1]<- NA
D[3,2]<- NA
D <- as.data.frame(D)
D[2,2:3] <- c(8, 5)
D
# V1 V2 V3
#1 1 4 7
#2 NA 8 5
#3 NA NA 9
as.data.frame(t(apply(D, 1, function(x) c(x[!is.na(x)], x[is.na(x)]))))
#V1 V2 V3
#1 1 4 7
#2 8 5 NA
#3 9 NA NA
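Another way to keep the non-NA entries in place relies on order() being stable: ordering by is.na(x) puts every FALSE (non-NA) before every TRUE while preserving the original order within each group. `shift_na` is a hypothetical helper name; the data are the revised D from above, written out explicitly.

```r
D <- data.frame(V1 = c(1, NA, NA), V2 = c(4, 8, NA), V3 = c(7, 5, 9))

# Hypothetical helper: move NAs to the end of x, keeping the order of the rest
shift_na <- function(x) unname(x[order(is.na(x))])

D_shifted <- as.data.frame(t(apply(D, 1, shift_na)))
D_shifted
```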
I am trying to use a loop to pick out the column (variable) in which a row (sample) of a large data frame has a specific value. For example:
c1<-c(1,2,3)
c2<-c(4,5,6)
c3<-c(7,8,9)
data<-as.data.frame(rbind(c1,c2,c3))
row V1 V2 V3
r1 1 2 3
r2 4 5 6
r3 7 8 9
If row i contains a value j (where j comes from a list of target values, one per row), then I want to add to the data frame the column in which value j appears in row i. For example, if the target values were
for r1=2
for r2=12
for r3=7
Then the results would be
row V1 V2 V3 V4 V5
r1 1 2 3 2 1
r2 4 5 6 5 4
r3 7 8 9 8 7
Any advice?
Suppose we have a list, where each element is the values you want to match for each row:
r.values <- list(r1=c(2), r2=c(12), r3=c(7))
And your data frame as before:
data <- as.data.frame(rbind(c1 = c(1,2,3), c2 = c(4,5,6), c3 = c(7,8,9)))
Now we want to build a vector of column indices of interest based on where each row matches the values in r.values:
indices <- c()
for (i in seq_len(nrow(data))) {
  indices <- c(indices, which(unlist(data[i, ]) %in% r.values[[i]]))
}
data[,indices]
This gives you the following:
V2 V1
c1 2 1
c2 5 4
c3 8 7
The nice thing is this can be extended to look at multiple values across rows, or ignore rows entirely:
r.values <- list(c(2,3), NA, r3=c(7,8))
Running the above loop again gives you:
V2 V3 V1 V2.1
c1 2 3 1 2
c2 5 6 4 5
c3 8 9 7 8
Let's say you have a vector of row indices rows <- c(1,2,3) and a vector of corresponding values val <- c(2, 12, 7).
We start by creating a vector to grab all the columns that should be added:
newcols <- c()
for(i in seq_along(rows))
{
temp <- which(data[rows[i],]==val[i])
if(length(temp)==0) temp <- NA
newcols[i] <- temp
}
Now we simply add your columns:
result <- cbind(data, data[, newcols[!is.na(newcols)]])
Alternatively, using mapply with match:
i <- c(1, 2, 3)
j <- c(2, 12, 7)
col.idx <- mapply(match, j, split(data, rownames(data))[i])
# [1] 2 NA 1
data.frame(data, data[i, na.omit(col.idx)])
# V1 V2 V3 V2.1 V1.1
# c1 1 2 3 2 1
# c2 4 5 6 5 4
# c3 7 8 9 8 7
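A loop-free sketch, assuming exactly one target value per row, stored here in a hypothetical vector `targets`: comparing the numeric matrix with `targets` recycles the vector down each column, so row r is compared against `targets[r]`, and match(TRUE, ...) then gives the first matching column (or NA) per row.

```r
data <- as.data.frame(rbind(c1 = c(1, 2, 3), c2 = c(4, 5, 6), c3 = c(7, 8, 9)))
targets <- c(2, 12, 7)  # hypothetical: one target value per row, in row order

# as.matrix(data) == targets recycles targets down each column,
# so hits[r, c] is TRUE when data[r, c] equals targets[r]
hits <- as.matrix(data) == targets

# first matching column per row, NA when the row has no match
col.idx <- apply(hits, 1, function(h) match(TRUE, h))

result <- cbind(data, data[, col.idx[!is.na(col.idx)], drop = FALSE])
result
```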