How do I remove empty data frames from a list?

I've got dozens of lists, each of which is a collection of 11 data frames. Some data frames are empty (another script did not output any data; this is expected, not a bug).
I need to push each list through a function, but that function chokes when it sees an empty data frame. So how do I write a function that takes a list, calls dim() on each element (i.e. each data frame) and, if the number of rows is 0, skips to the next one?
I tried something like this:
empties <- function(mlist)
{
  for (i in 1:length(mlist))
  {
    if (dim(mlist[[i]])[1] != 0) return(mlist[[i]])
  }
}
But clearly, that didn't work. I would do this manually at this point, but that would take forever. Help?

I'm not sure if this is exactly what you're asking for, but if you want to trim mlist down to contain only non-empty data frames before running the function on it, try mlist[sapply(mlist, function(x) dim(x)[1]) > 0].
E.g.:
R> M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
R> M2 <- data.frame(matrix(nrow = 0, ncol = 0))
R> M3 <- data.frame(matrix(9:12, nrow = 2, ncol = 2))
R> mlist <- list(M1, M2, M3)
R> mlist[sapply(mlist, function(x) dim(x)[1]) > 0]
[[1]]
  X1 X2
1  1  3
2  2  4

[[2]]
  X1 X2
1  9 11
2 10 12
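The loop in the question fails because return() exits the function as soon as it reaches the first non-empty data frame, so only that one element is ever returned. If you prefer a loop over indexing, a minimal sketch is to collect the non-empty elements and return the collection at the end. The helper name keep_nonempty and the example list are hypothetical, and unlike the indexing approach this sketch does not keep list names:

```r
# Hypothetical helper: keep only the data frames that have at least one row
keep_nonempty <- function(mlist) {
  out <- list()
  for (i in seq_along(mlist)) {            # seq_along() also handles an empty list
    if (nrow(mlist[[i]]) > 0) {
      out[[length(out) + 1]] <- mlist[[i]] # append, rather than return()
    }
  }
  out                                      # return the collected list at the end
}

example_list <- list(data.frame(a = 1:2), data.frame(), data.frame(a = 3))
length(keep_nonempty(example_list))        # 2: the empty data frame is dropped
```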

A slightly simpler and more transparent approach to the sapply/indexing combination is to use the Filter() function:
> Filter(function(x) dim(x)[1] > 0, mlist)
[[1]]
  X1 X2
1  1  3
2  2  4

[[2]]
  X1 X2
1  9 11
2 10 12

Instead of dim(x)[1] you can use nrow(), so you could do:
mlist[sapply(mlist, nrow) > 0]
Filter(function(x) nrow(x) > 0, mlist)
You could also use keep() and discard() from purrr:
purrr::keep(mlist, ~nrow(.) > 0)
purrr::discard(mlist, ~nrow(.) == 0)
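purrr also has compact(), which discards elements that are NULL or of length 0; it is a wrapper around discard(). Note that the length of a data frame is its number of columns, so compact() drops data frames with no columns but keeps zero-row data frames that still have columns: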
purrr::compact(mlist)
If filtering on the number of columns is acceptable, you can replace nrow with ncol in the answers above. You can also use lengths(), which returns the number of columns of each data frame:
mlist[lengths(mlist) > 0]
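The row-based and length-based checks disagree when a data frame has columns but no rows: its length() is its column count, so lengths() keeps it (and, per its documentation, so does purrr::compact(), which only looks at length). A small base-R sketch with made-up data to show the difference:

```r
no_rows <- data.frame(x = numeric(0), y = character(0))  # 0 rows, 2 columns
no_cols <- data.frame()                                   # 0 rows, 0 columns
full    <- data.frame(x = 1:2, y = c("a", "b"))
mlist <- list(no_rows = no_rows, no_cols = no_cols, full = full)

# Row-based filter: drops both empty data frames
names(mlist[sapply(mlist, nrow) > 0])
# Length-based filter: only drops the data frame with no columns
names(mlist[lengths(mlist) > 0])
```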

Adding a tidyverse option (map_int() returns an integer vector, which can be compared with > 0 directly):
library(tidyverse)
mlist[map_int(mlist, function(x) dim(x)[1]) > 0]
mlist[map_int(mlist, ~dim(.)[1]) > 0]

Related

Use paste0 to create multiple object names with a for loop

I would like to create multiple object names with a for loop. I have tried the following, which fails horribly:
somevar_1 = c(1,2,3)
somevar_2 = c(4,5,6)
somevar_3 = c(7,8,9)
for (n in length(1:3)) {
x <- as.name(paste0("somevar_",[i]))
x[2]
}
The desired result is x being somevar_1, somevar_2, somevar_3 for the respective iterations, and x[2] being 2, 5 and 8 respectively.
How should I do this?
somevar_1 = c(1,2,3)
somevar_2 = c(4,5,6)
somevar_3 = c(7,8,9)
for (n in 1:3) {
x <- get(paste0("somevar_", n))
print(x[2])
}
Result
[1] 2
[1] 5
[1] 8
We can use mget to get all the required objects in a list and use sapply to extract the 2nd element from each of them.
sapply(mget(paste0("somevar_", 1:3)), `[`, 2)
#somevar_1 somevar_2 somevar_3
#        2         5         8
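vapply() is a stricter variant of the sapply() call above: it declares the expected result type (here one number per object) and stops with an error if any object breaks that expectation, instead of silently returning a list. A minimal sketch using the same somevar_* objects; passing envir = environment() is an assumption that makes the lookup explicit when the code runs inside a function:

```r
somevar_1 <- c(1, 2, 3)
somevar_2 <- c(4, 5, 6)
somevar_3 <- c(7, 8, 9)

# mget() returns a named list of the objects; vapply() extracts element 2
# from each and checks that every result is a single numeric value
second <- vapply(mget(paste0("somevar_", 1:3), envir = environment()),
                 function(v) v[2], numeric(1))
second
```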

Trouble applying function to data frame

Toy example:
> myfn = function(a,x){sum(a*x)}
> myfn(a=2, x=c(1,2,3))
[1] 12
Good so far. Now:
> df = data.frame(a=c(4,5))
> df$ans = myfn(a=df$a, x=c(1,2,3))
Warning message:
In a * x : longer object length is not a multiple of shorter object length
> df
  a ans
1 4  26
2 5  26
What I want to happen is that for the first row, it is as if I called myfn(a=4, x=c(1,2,3)), giving an answer of 24, and for the second row, it is as if I called myfn(a=5, x=c(1,2,3)), giving an answer of 30. How do I do this? Thank you.
EDIT: slightly more complex version. Now suppose that the function is
myfn = function(a,b, x){sum((a+b)*x)}
and that I have the data frame
df = data.frame(a=c(4,5), b=c(6,7), c=c(9,9))
I want to create df$ans such that, for the first row, it is as if I called myfn(a=4, b=6, x=c(1,2,3)), and for the second row it is as if I called myfn(a=5, b=7, x=c(1,2,3)); that is, use df$a for a, df$b for b, and ignore df$c.
Something like this would work:
myfn = function(a,x){
return(sum(a*x))
}
df <- data.frame(a=c(4,5))
df$ans <- apply(df, 1, myfn, x = c(1,2,3))
df
  a ans
1 4  24
2 5  30
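Because myfn only needs the a column, an alternative to apply() is to iterate over that column directly with sapply(). This also avoids apply()'s conversion of the data frame to a matrix, which coerces every column to a common type when the columns are mixed (for example, to character if one column is text). A minimal sketch with the same toy function:

```r
myfn <- function(a, x) sum(a * x)

df <- data.frame(a = c(4, 5))
# Call myfn once per value of df$a, with x held fixed
df$ans <- sapply(df$a, myfn, x = c(1, 2, 3))
df
```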
** Edited Based On User Edit **
df = data.frame(a=c(4,5), b=c(6,7), c=c(9,9))
df$ans <- apply(df[, c("a", "b")], 1, function(y) sum((y['a']+y['b'])*c(1,2,3)))
df
  a b c ans
1 4 6 9  60
2 5 7 9  72
There are several ways this can be done, each with its own charms. If you don't want to modify the function, I would just do
mapply(myfn, df$a, df$b, MoreArgs = list(x = 1:3))
Alternatively, you can bake the iteration right into the function, e.g.,
myfn = function(a,b, x){
sapply(a+b, function(ab) {
sum(ab*x)
})
}
myfn(df$a, df$b, 1:3)
That's probably the way I would do it.
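For this particular function no explicit iteration is needed at all: when a and b are single numbers, sum((a + b) * x) equals (a + b) * sum(x), and that expression is already vectorized over the rows. This relies on the specific form of the toy myfn, so treat it as a sketch for this case rather than a general technique:

```r
df <- data.frame(a = c(4, 5), b = c(6, 7), c = c(9, 9))
x <- c(1, 2, 3)

# (a + b) is computed element-wise per row; sum(x) is a single number
df$ans <- (df$a + df$b) * sum(x)
df
```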

Multiply values of column with itself in R

I am trying to multiply the elements of a column with each other but am unable to do it.
I have a column A with values a, b, c, and I want the result a*b + a*c + b*c.
For example, with
A <- c(2, 3, 5) the expected output is 6 + 10 + 15 = 31.
I tried a for loop but couldn't get it to work. Can anyone provide R code to do this?
Example data:
df1 <- data.frame(A=c(2,3,5))
combn() gives you the pairwise combinations:
combinations <- combn(df1$A, 2)
#      [,1] [,2] [,3]
# [1,]    2    2    3
# [2,]    3    5    5
apply() with margin 2 (by columns) does the multiplication:
multiplied_terms <- apply(combinations, 2, function(x) x[1]*x[2])
# [1]  6 10 15
Or, shorter and more general (thanks to @zacdav):
multiplied_terms <- apply(combinations, 2, prod)
Then we can sum them:
output <- sum(multiplied_terms)
# [1] 31
Piped for a compact solution:
library(magrittr)
df1$A %>% combn(2) %>% apply(2,prod) %>% sum
Here's another way. The approach by @Moody_Mudskipper may be easier to extend to groups of 3, etc., but I think this one should be much faster, since there is no need to actually generate the combinations.
Using a for loop
It walks through the vector A, multiplying each element by all of the elements after it and adding up the products.
len <- length(A)
res <- 0  # start at 0: numeric(0) + x would stay numeric(0)
for (j in seq_len(len - 1))
  res <- res + sum(A[j] * A[(j+1) : len])
res
#[1] 31
Using lapply or sapply
The for loop can be replaced by using lapply
res <- sum(unlist(lapply(1 : (len - 1), function(j) sum(A[j] * A[(j+1) : len]))))
or sapply,
res <- sum(sapply(1 : (len - 1), function(j) sum(A[j] * A[(j+1) : len])))
I didn't check which of these is the fastest.
# If you need to store the pairwise multiplications, use the following:
# res <- NULL
# for (j in 1 : (len-1))
# res <- c(res, A[j] * A[(j+1) : len])
# res
# [1] 6 10 15
# sum(res)
# [1] 31
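Another option avoids generating the pairs or looping altogether. Expanding (a + b + c)^2 gives a^2 + b^2 + c^2 + 2(ab + ac + bc), and the same holds for any length, so the sum of all pairwise products is ((sum A)^2 - sum(A^2)) / 2. A minimal sketch (pairwise_sum is a hypothetical name; for very large values the subtraction can lose floating-point precision):

```r
A <- c(2, 3, 5)

# Sum of A[i] * A[j] over all i < j, via (sum(A)^2 - sum(A^2)) / 2
pairwise_sum <- function(A) (sum(A)^2 - sum(A^2)) / 2

pairwise_sum(A)                 # 31
sum(combn(A, 2, FUN = prod))    # same result via combn()
```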

Apply a list of n functions to each row of a dataframe?

I have a list of functions
funs <- list(fn1 = function(x) x^2,
fn2 = function(x) x^3,
fn3 = function(x) sin(x),
fn4 = function(x) x+1)
#in reality these are all f = splinefun()
And I have a dataframe:
mydata <- data.frame(x1 = c(1, 2, 3, 2),
x2 = c(3, 2, 1, 0),
x3 = c(1, 2, 2, 3),
x4 = c(1, 2, 1, 2))
#actually a 500x15 dataframe of 500 samples from 15 parameters
For each row i, I would like to evaluate function j on column j and sum the results:
unlist(funs)
attach(mydata)
a <- rep(NA,4)
for (i in 1:4) {
a[i] <- sum(fn1(x1[i]), fn2(x2[i]), fn3(x3[i]), fn4(x4[i]))
}
How can I do this efficiently? Is this an appropriate occasion to implement plyr functions? If so, how?
bonus question: why is a[4] NA?
Ignoring your code snippet and sticking to your initial specification, that you want to apply function j to column j and then "sum the results", you can do:
mapply( do.call, funs, lapply( mydata, list))
#      fn1 fn2       fn3 fn4
# [1,]   1  27 0.8414710   2
# [2,]   4   8 0.9092974   3
# [3,]   9   1 0.9092974   2
# [4,]   4   0 0.1411200   3
I wasn't sure which way you want to add the results (row-wise or column-wise), so you can apply either rowSums() or colSums() to this matrix. E.g.:
colSums( mapply( do.call, funs, lapply( mydata, list)) )
#       fn1       fn2       fn3       fn4
# 18.000000 36.000000  2.801186 10.000000
Why not just write one function for all 4 and apply it to the data frame?
All your functions are vectorized, and so are the functions returned by splinefun(), so this will work:
fun <- function(df)
cbind(df[, 1]^2, df[, 2]^3, sin(df[, 3]), df[, 4] + 1)
rowSums(fun(mydata))
This is considerably more efficient than "foring" or "applying" over the rows.
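The same vectorized idea can be kept general, so it works with any list of functions (for example, one splinefun() result per column) without writing each call out by hand: Map() pairs function j with column j, and Reduce("+", ...) adds the resulting vectors element-wise, giving one total per row. A sketch assuming, as in the question, that funs and the columns of mydata are in the same order:

```r
funs <- list(fn1 = function(x) x^2,
             fn2 = function(x) x^3,
             fn3 = function(x) sin(x),
             fn4 = function(x) x + 1)
mydata <- data.frame(x1 = c(1, 2, 3, 2),
                     x2 = c(3, 2, 1, 0),
                     x3 = c(1, 2, 2, 3),
                     x4 = c(1, 2, 1, 2))

# Apply function j to column j (each call sees the whole column at once)
per_column <- Map(function(f, col) f(col), funs, mydata)
# Add the per-column results element-wise: one total per row
row_totals <- Reduce(`+`, per_column)
row_totals
```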
I tried using plyr::each:
library(plyr)
sapply(mydata, each(min, max))
    x1 x2 x3 x4
min  1  0  1  1
max  3  3  3  2
and it works fine, but when I pass custom functions I get:
sapply(mydata, each(fn1, fn2))
Error in proto[[i]] <- fs[[i]](x, ...) :
more elements supplied than there are to replace
each() has very brief documentation, and I don't quite understand what the problem is.

Check that a vector is contained in a matrix in R

I can't believe this is taking me this long to figure out, and I still can't figure it out.
I need to keep a collection of vectors and later check whether a certain vector is in that collection. I tried lists combined with %in%, but that doesn't appear to work properly.
My next idea was to create a matrix and rbind vectors to it, but now I don't know how to check whether a vector is contained in a matrix. %in% appears to compare sets, not exact rows. The same seems to apply to intersect().
Help much appreciated!
Do you mean like this:
wantVec <- c(3,1,2)
myList <- list(A = c(1:3), B = c(3,1,2), C = c(2,3,1))
sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or, is the vector in the set?
any(sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec))
We can do a similar thing with a matrix:
myMat <- matrix(unlist(myList), ncol = 3, byrow = TRUE)
## As the vectors are now in the rows, we use apply over the rows
apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec))
Or by columns:
myMat2 <- matrix(unlist(myList), ncol = 3)
## As the vectors are now in the cols, we use apply over the cols
apply(myMat2, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat2, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec))
If you need to do this a lot, write your own function:
vecMatch <- function(x, want) {
isTRUE(all.equal(x, want))
}
And then use it, e.g. on the list myList:
> sapply(myList, vecMatch, wantVec)
    A     B     C
FALSE  TRUE FALSE
> any(sapply(myList, vecMatch, wantVec))
[1] TRUE
Or even wrap the whole thing:
vecMatch <- function(x, want) {
out <- sapply(x, function(x, want) isTRUE(all.equal(x, want)), want)
any(out)
}
> vecMatch(myList, wantVec)
[1] TRUE
> vecMatch(myList, 5:3)
[1] FALSE
EDIT: A quick comment on why I wrapped the all.equal() calls in isTRUE(). When the two arguments are not equal, all.equal() does not return FALSE; it returns a character vector describing the differences:
> all.equal(1:3, c(3,2,1))
[1] "Mean relative difference: 1"
isTRUE() is useful here because it returns TRUE if and only if its argument is TRUE, and FALSE for anything else.
> M <- matrix(1:9, nrow = 3)
> M
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
v <- c(2, 5, 8)
Check each column:
c1 <- which(M[, 1] == v[1])
c2 <- which(M[, 2] == v[2])
c3 <- which(M[, 3] == v[3])
intersect() only compares two vectors at a time, so nest the calls to use it on all three:
> intersect(intersect(c1, c2), c3)
[1] 2
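For a matrix with any number of columns, the per-column which()/intersect() steps can be replaced by a single comparison: t(M) has the rows of M as its columns, and comparing it with v recycles v down each column, so column j of the result checks row j of M against v. This sketch assumes exact equality is what you want; for floating-point data, a tolerance-based check such as all.equal() from the other answer is safer:

```r
M <- matrix(1:9, nrow = 3)  # same matrix as above, filled column-wise
v <- c(2, 5, 8)

# TRUE for each row of M whose elements all equal the corresponding v
row_matches <- colSums(t(M) == v) == ncol(M)
which(row_matches)  # index of matching rows
any(row_matches)    # is v contained as a row at all?
```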

Resources