Find the maximum values of a variable in a list - r

I have a list of the following structure & I intend to find the maximum value of X2 in the second variable (b) in the list
sample data
[[1]]
[[1]]$a
[1] 2
[[1]]$b
X1 X2
1 58 1686729
2 106 1682303
[[2]]
[[2]]$a
[1] 3
[[2]]$b
X1 X2
1 24 1642468
2 89 1695581
3 156 1634019
I looked into multiple filters that can be applied to the list like:
library(rlist)
list.filter(result, max(b$area))
and also tried lapply but with no success
lapply(result, function(x) x[which.max(x$b)])
I need the following output:
a x1 x2
2 58 1686729
3 89 1695581

With lapply() you can find the max of X2 in $b in each list, then cbind() with the a element.
l_max <- lapply(l, function(x) {
b <- x$b
cbind(a=x$a, b[which.max(b$X2),])
})
Use bind_rows() from dplyr for binding together.
l_max %>%
dplyr::bind_rows()
# a X1 X2
# 1 2 58 1686729
# 2 3 89 1695581
Example data:
l <- list(
list(a = 2,
b = data.frame(X1 = c(58, 106), X2 = c(1686729, 1682303))),
list(a = 3,
b = data.frame(X1 = c(24, 89,156), X2 = c(1642468, 1695581,1634019)))
)
With your example:
l_max <- lapply(l, function(x) {
b <- x$b
cbind(a = x$a, b[which.max(b[,2]),]) # NOTICE I used [,2] to refer to the second column
#b$area works too if all df share the col name
})
l_max %>%
dplyr::bind_rows()
# a rt area
# 1 2 58 1686729
# 2 3 89 1695581
# 3 4 101 1679889
# 4 5 88 1695983
# 5 6 105 1706445
# 6 7 121 1702019
Another solution with purrr::map_df() avoids the use of bind_rows():
purrr::map_df(l, function(x) {
b <- x$b
cbind(a = x$a, b[which.max(b[,2]),])
})
All base R using mapply():
t(mapply(function(x) {
b <- x$b
cbind(a = x$a, b[which.max(b[,2]),])
}, l))
Or with Map():
do.call("rbind", Map(function(x) {
b <- x$b
cbind(a = x$a, b[which.max(b[,2]),])
}, l))

You can also use sapply():
t(sapply(list, function(elem){
c(a = elem$a, elem$b[which.max(elem$b$area), ])
}))

Related

Find values in data frame 2 which is found in data frame 1, within a certain range

I want to find which values in df2 which is also present in df1, within a certain range. One value is considering both a and b in the data frames (a & b can't split up). For examples, can I find 9,1 (df1[1,1]) in df2? It doesn't have to be on the same position. Also, we can allow a diff of for example 1 for "a" and 1 for "b". For example, I want to find all values 9+-1,1+-1 in df2. "a" & "b" always go together, each row stick together. Does anyone have a suggestion of how to code this? Many many thanks!
set.seed(1)
a <- sample(10,5)
set.seed(1)
b <- sample(5,5, replace=T)
feature <- LETTERS[1:5]
df1 <- data.frame(feature,a,b)
df1
> df1
feature a b
A 9 1
B 4 4
C 7 1
D 1 2
E 2 5
set.seed(2)
a <- sample(10,5)
b <- sample(5,5, replace=T)
feature <- LETTERS[1:5]
df2 <- data.frame(feature,a,b)
df2
df2
feature a b
A 5 1
B 6 4
C 9 5
D 1 1
E 10 2
Not correct but Im imaging this can be done for a for loop somehow!
for(i in df1[,1]) {
for(j in df1[,2]){
s<- c(s,(df1[i,1] & df1[j,2]== df2[,1] & df2[,2]))# how to add certain allowed diff levels?
}
}
s
Output wanted:
feature_df1 <- LETTERS[1:5]
match <- c(1,0,0,1,0)
feature_df2 <- c("E","","","D", "")
df <- data.frame(feature_df1, match, feature_df2)
df
feature_df1 match feature_df2
A 1 E
B 0
C 0
D 1 D
E 0
I loooove data.table, which is (imo) the weapon of choice for these kind of problems..
library( data.table )
#make df1 and df2 a data.table
setDT(df1, key = "feature"); setDT(df2)
#now perform a join operation on each row of df1,
# creating an on-the-fly subset of df2
df1[ df1, c( "match", "feature_df2") := {
val = df2[ a %between% c( i.a - 1, i.a + 1) & b %between% c(i.b - 1, i.b + 1 ), ]
unique_val = sort( unique( val$feature ) )
num_val = length( unique_val )
list( num_val, paste0( unique_val, collapse = ";" ) )
}, by = .EACHI ][]
# feature a b match feature_df2
# 1: A 9 1 1 E
# 2: B 4 4 0
# 3: C 7 1 0
# 4: D 1 2 1 D
# 5: E 2 5 0
One way to go about this in Base R would be to split the data.frames() into a list of rows then calculate the absolute difference of row vectors to then evaluate how large the absolute difference is and if said difference is larger than a given value.
Code
# Find the absolute difference of all row vectors
listdif <- lapply(l1, function(x){
lapply(l2, function(y){
abs(x - y)
})
})
# Then flatten the list to a list of data.frames
listdifflat <- lapply(listdif, function(x){
do.call(rbind, x)
})
# Finally see if a pair of numbers is within our threshhold or not
m1 <- 2
m2 <- 3
listfin <- Map(function(x){
x[1] > m1 | x[2] > m2
},
listdifflat)
head(listfin, 1)
[[1]]
V1
[1,] TRUE
[2,] FALSE
[3,] TRUE
[4,] TRUE
[5,] TRUE
[6,] TRUE
[7,] TRUE
[8,] TRUE
[9,] TRUE
[10,] TRUE
Data
df1 <- read.table(text = "
4 1
7 5
1 5
2 10
13 6
19 10
11 7
17 9
14 5
3 5")
df2 <- read.table(text = "
15 1
6 3
19 6
8 2
1 3
13 7
16 8
12 7
9 1
2 6")
# convert df to list of row vectors
l1<- lapply(1:nrow(df1), function(x){
df1[x, ]
})
l2 <- lapply(1:nrow(df2), function(x){
df2[x, ]
})

Loop with table function in R

I am a very beginner with R. I have a question about the table function. I have a result like this :
table(my_vector)
1 2 3
11 23 7
And I want to extract elements from a matrix :
From 1 to 11 as my_matrix[1:11,]
Form 11+1 to 11+23 as my_matrix[12:34,]
Form 11+23+1 to 11+23+7 as my_matrix[35:41,]
How can I do a loop with this ?
Think this should do it
my_matrix <- matrix(rep(1:41, times=3), 41)
my_vector <- rep(1:3, times=c(11, 23, 7))
my_tab <- table(my_vector)
my_csum1 <- c(0, cumsum(my_tab)) + 1
my_csum2 <- cumsum(my_tab)
my_list <- list()
for (i in 1:length(my_csum2)) {
my_list[[i]] <- my_matrix[my_csum1[i]:my_csum2[i], ]
}
lapply(my_list, range)
# [[1]]
# [1] 1 11
# [[2]]
# [1] 12 34
# [[3]]
# [1] 35 41

LIst of lists in R into a data.frame - inconsistent variable names

I have a list of lists and I want to convert it into a dataframe. The challenge is that there are missing variables names in lists (not NA's but the variable is missing completely).
To illustrate on example: from
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)
I would like to get
a b c
[1,] 1 2 3
[2,] 4 NA 6
Another option is
library(reshape2)
as.data.frame(acast(melt(my_list), L1~L2, value.var='value'))
# a b c
#1 1 2 3
#2 4 NA 6
Or as #David Arenburg suggested a wrapper for melt/dcast would be recast
recast(my_list, L1 ~ L2, value.var = 'value')[, -1]
# a b c
#1 1 2 3
#2 4 NA 6
You can use the bind_rows function from the dplyr package :
my_list <- list()
my_list[[1]] <- list(a = 1, b = 2, c = 3)
my_list[[2]] <- list(a = 4, c = 6)
dplyr::bind_rows(lapply(my_list, as.data.frame))
This outputs:
Source: local data frame [2 x 3]
a b c
1 1 2 3
2 4 NA 6
Another answer, this requires to change the class of the arguments to data.frames:
library(plyr)
lista <- list(a=1, b=2, c =3)
listb <- list(a=4, c=6)
lista <- as.data.frame(lista)
listb <- as.data.frame(listb)
my_list <- list(lista, listb)
my_list <- do.call(rbind.fill, my_list)
my_list
a b c
1 1 2 3
2 4 NA 6

Operate on attributes of data.frames stored in a list using lapply

Consider the following:
df <- list(df1 = data.frame(a.1 = 1, b..2 = 2), df2 = data.frame(c.1 = 3, d...4 = 5, e..3 = 8))
## $df1
## a.1 b..2
## 1 1 2
## $df2
## c.1 d...4 e..3
## 1 3 5 8
names(df[[1]])
## [1] "a.1" "b..2"
Now, for example, I would like to remove the dotes out of the data frames' names so the output will be
## $df1
## a1 b2
## 1 1 2
## $df2
## c1 d4 e3
## 1 3 5 8
Obvioulsy the following won't work
lapply(names(df), function(x) gsub("[.]", "", x))
Neither will this
lapply(df[attributes(df)$names], function(x) gsub("[.]", "", x))
a for loop works though
for(i in 1:length(df)){names(df[[i]]) <- gsub("[.]", "", names(df[[i]]))}
## $df1
## a1 b2
## 1 1 2
## $df2
## c1 d4 e3
## 1 3 5 8
Edit
#jdharrison solution is very nice, but i was looking to find a way to operate only on the attributes rather on the whole data set (like in the for loop), something like
df2 <- list(a..1 = 2, b..3 = 5)
names(df2) <- lapply(names(df2), function(x) gsub("[.]", "", x))
Something like this maybe?
lapply(df, function(x){
`names<-`(x, gsub("[.]", "",names(x)))
}
)
> lapply(df, function(x){ `names<-`(x, gsub("[.]", "",names(x)))})
$df1
a1 b2
1 1 2
$df2
c1 d4 e3
1 3 5 8

Alternative to class() that does not distinguish between "numeric" and "integer"

Given a data.frame, I would like to test if all the columns are of the same "class". if they are I'd like to leave the data.frame as is. If they aren't I'd like to keep all columns that match the first variables class and drop any that are not of that class. The exception being that, for my purposes, integer and numeric are equal.
For example:
dat <- data.frame(numeric,numeric,integer,factor)
Would be:
data.frame(numeric,numeric,integer)
Additionally
dat <- data.frame(character,character,integer)
Would be:
data.frame(character,character)
And finally:
dat <- data.frame(numeric,numeric,numeric,factor)
Would be:
data.frame(numeric,numeric,numeric)
I would do this:
dat <- data.frame(
a=as.integer(1:26), b=as.integer(26:1), c=as.numeric(1:26), d=as.factor(1:26)
)
Create two helper functions:
is.numint <- function(x)is.numeric(x) || is.integer(x)
is.charfact <- function(x)is.character(x) || is.factor(x)
Return only numeric columns:
head(dat[, sapply(dat, is.numint)])
a b c
1 1 26 1
2 2 25 2
3 3 24 3
4 4 23 4
5 5 22 5
Return only factor columns:
head(dat[, sapply(dat, is.charfact), drop=FALSE])
d
1 1
2 2
3 3
4 4
5 5
6 6
Combining this approach, and rewriting your function:
dropext <- function(x){
is.numint <- function(x)is.numeric(x) || is.integer(x)
is.charfact <- function(x)is.character(x) || is.factor(x)
cl <- rep(NA, length(x))
cl[sapply(x, is.numint)] <- "num"
cl[sapply(x, is.charfact)] <- "char"
x[, cl == unique(cl)[1], drop=FALSE]
}
dropext(dat)
a b c
1 1 26 1
2 2 25 2
3 3 24 3
4 4 23 4
5 5 22 5
How about:
if(length(unique(cl <- sapply(dat, class))) > 1 &&
any(!sapply(dat, is.numeric))) {
dat <- dat[ , which(cl == cl[1]), drop = FALSE]
}
This assumes that in the following example:
dat2 <- data.frame(A = factor(sample(LETTERS, 26, replace = TRUE)),
B = factor(sample(LETTERS, 26, replace = TRUE)),
C = sample(LETTERS, 26, replace = TRUE),
dat, stringsAsFactors = FALSE)
> sapply(dat2, class)
A B C
"factor" "factor" "character"
as.integer.1.26. as.integer.26.1. as.numeric.1.26.
"integer" "integer" "numeric"
you want only the factor variables, i.e. you want to distinguish between character and factor variables - which is what your code appears to do.
For this example I used
if(length(unique(cl <- sapply(dat2, class))) > 1 &&
any(!sapply(dat2, is.numeric))) {
dat2 <- dat2[ ,which(cl == cl[1]), drop = FALSE]
}
which results in
> head(dat2)
A B
1 D G
2 P D
3 C T
4 X F
5 N R
6 A E
> sapply(dat2, class)
A B
"factor" "factor"
On dat, the above if() statement would not change dat:
> if(length(unique(cl <- sapply(dat, class))) > 1 &&
+ any(!sapply(dat, is.numeric))) {
+ dat <- dat[ , which(cl == cl[1]), drop = FALSE]
+ }
> head(dat)
as.integer.1.26. as.integer.26.1. as.numeric.1.26.
1 1 26 1
2 2 25 2
3 3 24 3
4 4 23 4
5 5 22 5
6 6 21 6
Appreciate the commentary and your answers, in the end all I needed was a class() function that does not distinguish between integer and numeric. Which can be accomplished with a simple wrapper.
class.wrap <- function(x) {
test <- class(x)
if(test == "integer") test <- "numeric"
return(test)
}

Resources