Summing a dataframe with lapply - r

Here I'm attempting to sum the columns c1, c2 and c3 for each row and add the total as a new column in the res dataframe:
res <- data.frame("ID" = c(1,2), "c1" = c(1,2), "c2" = c(3,4), "c3" = c(5,6))
res_subset <- data.frame(res$c1 , res$c2 , res$c3)
tr <- t(res_subset)
s1 <- lapply(tr , function(x){
sum(x)
})
s1 contains :
s1
[[1]]
[1] 1
[[2]]
[1] 3
[[3]]
[1] 5
[[4]]
[1] 2
[[5]]
[1] 4
[[6]]
[1] 6
I take the transpose of the columns to be summed ( tr <- t(res_subset) ) because lapply executes the function against each column, whereas I'm attempting to execute the function against each row.
Is there an issue with how I take the transpose? The same approach appears to work for a simpler example:
res1 <- data.frame("c1" = c(1,2), "c2" = c(3,4), "c3" = c(5,6))
lapply(res1 , function(x){
sum(x)
})
returns :
$c1
[1] 3
$c2
[1] 7
$c3
[1] 11

If I understand correctly what you need, just use the rowSums() function.
res$sum <- rowSums(res[,2:4])

The function sum returns a scalar, which is not what you want here. Instead, col1 + col2 + ... gives the desired result. So you can use Reduce in combination with +:
res$sum <- Reduce(`+`, res[, c('c1','c2','c3')])
The + operator must be quoted with backticks, since we are using it as a function. Passing the string "+" also works, because Reduce looks the function up with match.fun.
rowSums also works, but it first converts the data frame to a matrix (via as.matrix), which adds an intermediate copy.
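As a minimal sketch of why the lapply call in the question returns six values: t() turns the data frame into a matrix, and lapply treats a matrix as a plain vector of cells, so sum is applied to each cell separately. The variable names cols, per_cell and by_apply are illustrative only; the data is the example from the question.

```r
res <- data.frame(ID = c(1, 2), c1 = c(1, 2), c2 = c(3, 4), c3 = c(5, 6))
cols <- c("c1", "c2", "c3")

# t() returns a matrix; lapply walks over its individual cells
tr <- t(res[, cols])
per_cell <- lapply(tr, sum)
length(per_cell)                       # one result per cell, not per row

# Row-wise alternatives that give one total per row
by_apply  <- apply(res[, cols], 1, sum)
by_reduce <- Reduce(`+`, res[, cols])
res$sum   <- rowSums(res[, cols])
res
```

All three row-wise variants should agree on this data; which one to prefer mostly depends on the size of the data frame and on readability.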

Related

subset of list of vector with grep?

I have a list of vectors and I want to create a new list containing every value that contains the letter 'a', while keeping the internal structure.
l = list ( g1 = c('a','b','ca') ,
g2 = c('a','b') )
lapply(l, function(x) grep('a',x) )
lapply only provides the index numbers, but what I want it to return are the values.
The end result should be a list with vector g1 containing a and ca, and g2 containing just a.
thanks!
Add value = TRUE.
lapply(l, function(x) grep('a', x, value = TRUE))
# $g1
# [1] "a" "ca"
#
# $g2
# [1] "a"
Alternatively, you can do:
lapply(l, function(x) x[grepl("a", x)])
$g1
[1] "a" "ca"
$g2
[1] "a"
If you want to try with tidyverse here are couple of approaches.
library(tidyverse)
map(l, ~grep('a', .x, value = TRUE))
map(l, ~str_subset(.x, 'a')) # str_subset from stringr package is a wrapper for grep shown above.
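The two base R variants above can be checked against each other with a short sketch. The extra element g3 is an assumption added here to show what happens when a vector has no match at all; by_value and by_mask are illustrative names.

```r
l <- list(g1 = c("a", "b", "ca"),
          g2 = c("a", "b"),
          g3 = c("x", "y"))   # hypothetical group without any match

# grep() with value = TRUE returns the matching strings directly
by_value <- lapply(l, function(x) grep("a", x, value = TRUE))

# grepl() returns a logical mask that is then used for subsetting
by_mask <- lapply(l, function(x) x[grepl("a", x)])

identical(by_value, by_mask)
by_value$g3                   # an empty character vector, the name g3 is kept
```

Both variants keep the names and the length of the list, so a group without matches stays in the result as an empty vector.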

Remove duplicated elements from list

I have a list of character vectors:
my.list <- list(e1 = c("a","b","c","k"),e2 = c("b","d","e"),e3 = c("t","d","g","a","f"))
And I'm looking for a function that for any character that appears more than once across the list's vectors (in each vector a character can only appear once), will only keep the first appearance.
The result list for this example would therefore be:
res.list <- list(e1 = c("a","b","c","k"),e2 = c("d","e"),e3 = c("t","g","f"))
Note that it is possible that an entire vector in the list is eliminated so that the number of elements in the resulting list doesn't necessarily have to be equal to the input list.
We can unlist the list, get a logical list using duplicated and extract the elements in 'my.list' based on the logical index
un <- unlist(my.list)
res <- Map(`[`, my.list, relist(!duplicated(un), skeleton = my.list))
identical(res, res.list)
#[1] TRUE
Here is an alternative using mapply with setdiff and Reduce.
# make a copy of my.list
res.list <- my.list
# take set difference between contents of list elements and accumulated elements
res.list[-1] <- mapply("setdiff", res.list[-1],
head(Reduce(c, my.list, accumulate=TRUE), -1))
The first element of the list is kept as is. For each subsequent element, we take the set difference between it and the cumulative vector of all preceding elements, which is produced by Reduce with c and the accumulate=TRUE argument. head(..., -1) drops the final list item containing all elements so that the lengths align.
This returns
res.list
$e1
[1] "a" "b" "c" "k"
$e2
[1] "d" "e"
$e3
[1] "t" "g" "f"
Note that in Reduce, we could replace c with function(x, y) unique(c(x, y)) and accomplish the same ultimate output.
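To make the intermediate step visible, here is a sketch of the same idea with the accumulated vectors stored separately. The names seen_before and res, the SIMPLIFY = FALSE argument and the final Filter call are additions for illustration: SIMPLIFY = FALSE keeps mapply from collapsing equal-length results into a matrix, and Filter(length, ...) drops vectors that end up empty, as the question allows.

```r
my.list <- list(e1 = c("a", "b", "c", "k"),
                e2 = c("b", "d", "e"),
                e3 = c("t", "d", "g", "a", "f"))

# Element i holds everything contained in my.list[1:i];
# dropping the last one aligns it with my.list[-1]
seen_before <- head(Reduce(c, my.list, accumulate = TRUE), -1)

res <- my.list
res[-1] <- mapply(setdiff, my.list[-1], seen_before, SIMPLIFY = FALSE)

# Remove vectors that lost all of their elements
res <- Filter(length, res)
res
```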
I found the solutions here very complex for my understanding and sought a simpler technique. Suppose you have the following list.
my_list <- list(a = c(1,2,3,4,5,5), b = c(1,2,2,3,3,4,4),
d = c("Mary", "Mary", "John", "John"))
The following much simpler piece of code removes the duplicates within each vector (it does not compare values across vectors).
sapply(my_list, unique)
You will end up with the following.
$a
[1] 1 2 3 4 5
$b
[1] 1 2 3 4
$d
[1] "Mary" "John"
There is beauty in simplicity!

Applying as.numeric only to elements of a list that can be coerced to numeric (in R)

I have a function which returns a list containing individual character vectors which I would like to convert to numeric. Most of the time, all the elements of the list can easily be coerced to numeric, and so a simple lapply(x, FUN = as.numeric) works fine.
e.g.
l <- list(a = c("1","1"), b = c("2","2"))
l
$a
[1] "1" "1"
$b
[1] "2" "2"
lapply(l, FUN = as.numeric)
$a
[1] 1 1
$b
[1] 2 2
However, in some situations, vectors contain true characters:
e.g.
l <- list(a = c("1","1"), b = c("a","b"))
l
$a
[1] "1" "1"
$b
[1] "a" "b"
lapply(l, FUN = as.numeric)
$a
[1] 1 1
$b
[1] NA NA
The solution I have come up with works but feels a little convoluted:
l.id <- unlist(lapply(l, FUN = function(x){all(!is.na(suppressWarnings(as.numeric(x))))}))
l.id
a b
TRUE FALSE
l[l.id] <- lapply(l[l.id], FUN = as.numeric)
l
$a
[1] 1 1
$b
[1] "a" "b"
So I was just wondering if anyone out there had a more streamlined and elegant solution to suggest.
Thanks!
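One option would be to check whether every element of the vector consists only of digits and decimal points and, if so, convert it to numeric; otherwise leave it unchanged. Note that this pattern does not match negative numbers or scientific notation.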
lapply(l, function(x) if(all(grepl('^[0-9.]+$', x))) as.numeric(x) else x)
Or we can use type.convert to automatically convert the class, but the character vectors will be converted to factor class.
lapply(l, type.convert)
You could also do something like
lapply(l, function(x) if(is.numeric(t <- type.convert(x))) t else x)
# $a
# [1] 1 1
#
# $b
# [1] "a" "b"
This doesn't convert anything other than when a numeric results from type.convert(). Or, for this simple case we can use as.is = TRUE but note that this will not always give us what we want.
lapply(l, type.convert, as.is = TRUE)
# $a
# [1] 1 1
#
# $b
# [1] "a" "b"
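The approaches above can also be wrapped in a small helper. This is only a sketch: to_numeric_if_possible is a hypothetical name, and the rule it applies (convert only if coercion introduces no new NA) is one reasonable reading of "can be coerced to numeric". The element c is added here to show that negative numbers and existing NA values are handled.

```r
# Hypothetical helper: convert only when every non-NA value is coercible
to_numeric_if_possible <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  # A new NA means at least one value was not a number: keep x as it is
  if (any(is.na(converted) & !is.na(x))) x else converted
}

l <- list(a = c("1", "1"),
          b = c("a", "b"),
          c = c("-1.5", NA))

res <- lapply(l, to_numeric_if_possible)
str(res)
```

Unlike type.convert, this always yields double vectors for the converted elements and never touches the ones that stay character.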

R dataframe factors

I want to droplevels a dataframe (please do not mark this question as duplicate :)).
Given all the methods available only one works. What am I doing wrong?
Example:
> df = data.frame(x = (c("a","b","c")),y=c("d","e","f"), stringsAsFactors = TRUE)
> class(df$x)
[1] "factor"
> levels(df$x)
[1] "a" "b" "c"
Method 1 not working:
> df1 = droplevels(df)
> class(df1$x)
[1] "factor"
> levels(df1$x)
[1] "a" "b" "c"
Method 2 not working:
> df2 = as.data.frame(df, stringsAsFactors = FALSE)
> class(df2$x)
[1] "factor"
> levels(df2$x)
[1] "a" "b" "c"
Method 3 not working:
> df3 = df
> df3$x = factor(df3$x)
> class(df3$x)
[1] "factor"
> levels(df3$x)
[1] "a" "b" "c"
Method 4 finally works:
> df4 = df
> df4$x = as.vector(df4$x)
> class(df4$x)
[1] "character"
> levels(df4$x)
NULL
While working, I think method 4 is the least elegant. Can you help me to debug this? Many thanks
EDIT: Following comments and answers: I want to remove the factor structure from a data frame and not only droplevels
"Dropping levels" refers to getting rid of unused factor levels, but keeping the object as class factor. You're looking for a way to convert all factor columns into character columns:
> df2 = data.frame(lapply(df,
function(x) if (is.factor(x)) as.character(x) else x),
stringsAsFactors = FALSE)
> lapply(df2, class)
$x
[1] "character"
$y
[1] "character"
> df2
x y
1 a d
2 b e
3 c f
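The difference between dropping levels and removing the factor structure can be seen in a short sketch (the names f and sub are illustrative):

```r
f <- factor(c("a", "b", "c"))
sub <- f[1:2]              # "c" is no longer used, but still a level

levels(sub)                # the unused level "c" is still listed
levels(droplevels(sub))    # the unused level is gone
class(droplevels(sub))     # the result is still a factor

as.character(sub)          # a plain character vector without levels
```

In the question every level of df$x is in use, which is why droplevels appears to do nothing there.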
I'm guessing you want:
df[] <- lapply(df, as.character)
This has two differences from your code: the "[]" on the LHS of the assignment, which preserves the dataframe structure of df, and the use of lapply. The droplevels function only drops unused levels; it does not convert to a character vector. The as.character function does not have a data.frame method, so it needs to be (l)-applied to each of the factor vectors rather than to the list of factor vectors as a whole. A more general function to do that (avoiding unwanted coercion of numeric vectors) would be:
makefac2char <- function(v) if(is.factor(v)){as.character(v)} else {v}
df[] <- lapply(df, makefac2char)
# To make a new dataframe
df2 <- lapply(df, makefac2char)
df2 <- data.frame(df2, stringsAsFactors = FALSE)
If you do not want to destructively replace 'df', then you need to wrap data.frame around the lapply results, since lapply returns a plain list and does not maintain the data.frame attributes. If you had created that dataframe with 'stringsAsFactors = FALSE' (or set that option with options()), you would not have needed to do this on a data.frame-wide basis.
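A small sketch of the in-place variant on a data frame with mixed column types. The column n and the names fac_to_chr and df_chr are assumptions for illustration; stringsAsFactors = TRUE is set explicitly so that the example starts from factor columns regardless of the R version.

```r
df <- data.frame(x = c("a", "b", "c"), n = 1:3, stringsAsFactors = TRUE)

# Convert factors to character, leave every other column type alone
fac_to_chr <- function(v) if (is.factor(v)) as.character(v) else v

df_chr <- df                              # work on a copy, df stays unchanged
df_chr[] <- lapply(df_chr, fac_to_chr)    # [] keeps the data.frame structure

sapply(df_chr, class)
```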

How to extract the non-empty elements of list in R?

I have a very big list, but some of its elements (positions) are NULL, meaning there is nothing inside them.
I just want to extract the non-empty part of my list. Here is my attempt, which fails with an error:
ind<-sapply(mylist, function() which(x)!=NULL)
list<-mylist[ind]
#Error in which(x) : argument to 'which' is not logical
Would someone help me to implement it ?
You can use the logical negation of is.null here. That can be applied over the list with vapply, and we can return the non-null elements with [
(mylist <- list(1:5, NULL, letters[1:5]))
# [[1]]
# [1] 1 2 3 4 5
# [[2]]
# NULL
# [[3]]
# [1] "a" "b" "c" "d" "e"
mylist[vapply(mylist, Negate(is.null), NA)]
# [[1]]
# [1] 1 2 3 4 5
# [[2]]
# [1] "a" "b" "c" "d" "e"
Try:
myList <- list(NULL, c(5,4,3), NULL, 25)
Filter(Negate(is.null), myList)
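For reference, a sketch of what the Filter call does with this list; kept is an illustrative name.

```r
myList <- list(NULL, c(5, 4, 3), NULL, 25)

# Filter keeps the elements for which the predicate returns TRUE
kept <- Filter(Negate(is.null), myList)
str(kept)
length(kept)    # only the two non-NULL elements remain
```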
If you don't care about the structure of the result, you can just unlist:
unlist(mylist)
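The error arises because the comparison is written outside the call to which, so which receives x itself instead of a logical vector (the anonymous function also needs to declare x as its argument). Moving the condition inside, as in which(x != NULL), still would not help, because any comparison with NULL returns logical(0). Use is.null to test for NULL instead:
!is.null(x)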
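One can extract the indices of the NULL entries in the list using the "which" function and exclude them from the new list by using "-". Note that is.null has to be applied to each element (is.null(mylist) only tests the list as a whole), and that negative indexing with an empty index drops everything, so that case is handled separately:
null_idx <- which(vapply(mylist, is.null, logical(1)))
new_list <- if (length(null_idx)) mylist[-null_idx] else mylist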
should do the job :)
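One caveat with negative indexing via which() is worth a short sketch: when no element matches, which() returns integer(0), and indexing with -integer(0) selects nothing at all. The list x below is a hypothetical example without any NULL entries.

```r
x <- list(1, "a", 3)                                # no NULL entries

null_idx <- which(vapply(x, is.null, logical(1)))   # integer(0)
dropped_all <- x[-null_idx]                         # selects nothing
length(dropped_all)

# A logical mask does not have this problem
keep <- !vapply(x, is.null, logical(1))
length(x[keep])
```

For that reason a logical mask (or Filter) is usually the safer choice when the list may contain no NULL at all.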
Try this:
library(magrittr) # provides the %>% pipe
list(NULL, 1, 2, 3, NULL, 5) %>%
purrr::map_if(is.null, ~ NA_character_) %>% #convert NULL into NA
is.na() %>% #find NA
`!` %>% #Negate
which() #get index of Non-NULLs
or even this:
list(NULL, 1, 2, 3, NULL, 5) %>%
purrr::map_lgl(is.null) %>%
`!` %>% #Negate
which()
MyList <- list(NULL, c(5, 4, 3), NULL, NULL)
[[1]]
NULL
[[2]]
[1] 5 4 3
[[3]]
NULL
[[4]]
NULL
MyList[!unlist(lapply(MyList,is.null))]
[[1]]
[1] 5 4 3
