R: Applying function to DataFrame

I have the following code:
library(Ecdat)
data(Fair)
Fair[1:5,]
x1 = function(x){
  mu = mean(x)
  l1 = list(s1=table(x),std=sd(x))
  return(list(l1,mu))
}
mylist <- as.list(Fair$occupation, Fair$education)
x1(mylist)
What I want is for x1 to output the result for each of the items selected in mylist. However, I get the warning: In mean.default(x) : argument is not numeric or logical: returning NA.

You need to use lapply if you're passing a list to a function:
output<-lapply(mylist,FUN=x1)
This will process your function x1 for each element in mylist and return a list of results to output.
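Here is a minimal, self-contained sketch of this approach. It uses a small made-up data frame, fair_demo, in place of Ecdat::Fair, so the values are invented. It also builds mylist with list() so that each column stays a single element. With as.list(), each value would instead become its own element.

```r
# x1 as in the question (indented; same behavior)
x1 <- function(x) {
  mu <- mean(x)
  l1 <- list(s1 = table(x), std = sd(x))
  list(l1, mu)
}

# Hypothetical stand-in for Ecdat::Fair; the values are invented
fair_demo <- data.frame(occupation = c(7, 6, 1, 6, 6),
                        education  = c(18, 14, 12, 18, 17))

# list() keeps each column as one element, giving a list of length 2
mylist <- list(occupation = fair_demo$occupation,
               education  = fair_demo$education)

# lapply calls x1 once per element and collects the results in a list
output <- lapply(mylist, FUN = x1)
length(output)          # 2
output$occupation[[2]]  # 5.2, the mean of occupation
```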

Here, mylist is not created correctly, and a list is not needed anyway, since a data.frame is already a list of columns of equal length. So just subset the columns of interest and apply the function:
lapply(Fair[c("occupation", "education")], x1)
In the OP's code, as.list simply creates a list of length 601 with a single value in each element.
str(mylist)
#List of 601
#$ : int 7
#$ : int 6
#$ : int 1
#...
#...
Another problem in the code is that the second argument is not even considered. Using a simple example:
as.list(1:3, 1:2)
#[[1]]
#[1] 1
#[[2]]
#[1] 2
#[[3]]
#[1] 3
The second argument is silently ignored. It could have been written with list():
list(1:3, 1:2)
#[[1]]
#[1] 1 2 3
#[[2]]
#[1] 1 2
But for data.frame columns, we don't need to call list explicitly, as a data.frame is already a list of vectors of equal length.
Regarding the warning in the OP's post: mean works on numeric or logical vectors, not on a list or data.frame.
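Here is a short sketch of both points, using a hypothetical three-row data frame, df_demo, with invented values. It shows three things: a data.frame is itself a list of columns, mean() on the whole data.frame only warns and returns NA, and applying mean to each column works.

```r
df_demo <- data.frame(occupation = c(7, 6, 1),
                      education  = c(18, 14, 12))

is.list(df_demo)   # TRUE: a data.frame is a list of equal-length columns
length(df_demo)    # 2: one list element per column

# mean() on the whole data.frame: capture the warning message instead of NA
msg <- tryCatch(mean(df_demo), warning = function(w) conditionMessage(w))
msg                # "argument is not numeric or logical: returning NA"

# Applying mean to each column (an atomic vector) works as intended
col_means <- sapply(df_demo[c("occupation", "education")], mean)
col_means
```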

Related

finding the index of list elements which are greater than 0

I have a list, "my_list", below:
$`2015-03-01 00:18:50`
integer(0)
$`2015-03-01 11:19:59`
[1] 4 6
$`2015-03-01 12:18:29`
[1] 12 13
$`2015-03-01 13:19:09`
[1] 1
$`2015-03-01 17:18:44`
integer(0)
$`2015-03-01 22:18:49`
integer(0)
I want to get the element index (not the subelement index) of the values greater than 0 (i.e., where a list element is NOT empty), repeated once per subelement. The expected output looks like:
2,2,3,3,4
I have gotten close with:
indices<-which(lapply(my_list,length)>0)
This code, however, only gives me the following, and doesn't account for a list element containing more than one subelement:
2,3,4
Does anyone know how to achieve what I am looking for?

We can use lapply along with a seq_along trick to bring in the indices of each element of the list. Then, for each list element, generate a vector of matching indices. Finally, unlist the entire list to obtain a single vector of matches.
x <- list(a=integer(0),b=c(4,6),c=c(12,13),d=c(1),e=integer(0),f=integer(0))
result <- lapply(seq_along(x), function(i) { rep(i, sum(x[[i]] > 0)) })
unlist(result)
#[1] 2 2 3 3 4

You can try this; I hope it is what you expected. Use lengths to calculate the length of each item in the list, then pass each index and its length to rep to get the final result:
lyst <- list(l1=integer(0), l2= c(1,2), l3=c(3,4), l4=character(0), l5=c(5,6,6))
lyst1 <- lengths(lyst)
unlist(lapply(1:length(lyst1), function(x)rep(x, lyst1[[x]])))
Output:
#> unlist(lapply(1:length(lyst1), function(x)rep(x, lyst1[[x]])))
#[1] 2 2 3 3 5 5 5

Repeat each numeric index by the respective length:
rep(seq_along(x), lengths(x))
#[1] 2 2 3 3 4
Using @Tim's x data.
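The two approaches above answer slightly different questions. rep(seq_along(x), lengths(x)) repeats an index for every subelement of a non-empty element. The sum(x[[i]] > 0) version counts only subelements strictly greater than 0. A small sketch with a made-up list that contains a negative value and a zero shows the difference:

```r
x <- list(a = integer(0), b = c(4, -6), c = c(12, 13), d = 0)

# One index per subelement of every non-empty element
by_length <- rep(seq_along(x), lengths(x))

# One index per subelement that is strictly greater than 0
n_pos <- vapply(x, function(v) sum(v > 0), integer(1))
by_positive <- rep(seq_along(x), n_pos)

by_length    # 2 2 3 3 4
by_positive  # 2 3 3
```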

R: operations between vectors inside of lists and vectors outside

Suppose I have a list of 3 elements, each of which is a list of 2 other elements: the first a vector of length 4 and the second, say, a character string. The following code produces a list exactly as I just described:
x <- NULL
for(i in 1:3){
  set.seed(i); a <- list(sample(1:4, 4, replace = TRUE), LETTERS[i])
  x <- c(x, list(a))
}
Its structure is therefore of the following type (the exact values may change, since I used the sample function):
> str(x)
List of 3
$ :List of 2
..$ : int [1:4] 2 2 3 4
..$ : chr "A"
$ :List of 2
..$ : int [1:4] 1 3 3 1
..$ : chr "B"
$ :List of 2
..$ : int [1:4] 1 4 2 2
..$ : chr "C"
Now I have another vector of length 4, say y:
y <- 1:4
Finally, I want to create a matrix resulting from an operation (say, a sum) between y and each length-4 vector stored in the list. For the given example, this matrix would be:
[,1] [,2] [,3]
[1,] 3 2 2
[2,] 4 5 6
[3,] 6 6 5
[4,] 8 5 6
Question: How can I create the above matrix in a simple and elegant way? I was looking for a solution that uses some apply function, or that uses the sum function directly in some way I'm not aware of.

Try this:
# you can also simply write: sapply(x, function(x) x[[1]]) + y
foo <- function(x) x[[1]]
sapply(x, foo) + y
The function foo extracts the vector inside each inner list;
sapply returns those vectors as the columns of a matrix;
finally, the recycling rule for addition adds y to each column (y has one value per row).
Update 1
Well, since @Frank mentioned it, let me explain a little. The '[[' operator in R (note the quotes!) is a function, here taking two arguments: the first is a vector-type object, such as a vector or list, and the second is the index you want to extract. For example:
a <- 1:4
a[2] # 2
'[['(a, 2) # 2
Though my original answer is easier to digest, it is not the most efficient. For each list element, two function calls are made to extract the vector (one to foo and one to '[[' inside it). If we use '[[' directly, only one call is made. So we save time by reducing function-call overhead, which can be noticeable when the function is small and does little work.
Operators in R are essentially functions. +, *, etc. are arithmetic operators, and you can look them up with ?'+'. Similarly, you can look up ?'[['. Don't worry too much if you can't follow this at the moment; sooner or later you will get it.
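For reference, passing '[[' directly to sapply looks like the sketch below. It rebuilds x with the question's loop. Because the sampled values can differ between R versions, it only checks two things: that the result matches the foo-based version, and that it has one column per list element.

```r
# Rebuild the list from the question
x <- NULL
for (i in 1:3) {
  set.seed(i)
  a <- list(sample(1:4, 4, replace = TRUE), LETTERS[i])
  x <- c(x, list(a))
}
y <- 1:4

# `[[` itself is FUN; sapply forwards the extra argument 1,
# so each call is `[[`(x[[i]], 1)
m1 <- sapply(x, `[[`, 1) + y

# The helper-function version from the answer, for comparison
foo <- function(el) el[[1]]
m2 <- sapply(x, foo) + y

identical(m1, m2)  # TRUE
dim(m1)            # 4 3
```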
Update 2
I don't understand how it actually does the job. When I simply ask for x[[1]] at the console, I get the first element of the list (both the integer vector and the char value), not just the vector. I guess the rest must be the magic of the sapply function.
Ah, if you have difficulty understanding sapply (or, similarly, lapply), consider the following. I will start with lapply.
output <- lapply(x, foo) is doing something like:
output <- vector("list", length = length(x))
for (i in 1:length(x)) output[[i]] <- foo(x[[i]])
So lapply returns a list:
> output
[[1]]
[1] 2 2 3 4
[[2]]
[1] 1 4 4 3
[[3]]
[1] 3 1 1 1
Yes, lapply loops through the elements of x, applies the function foo to each, and returns the results in another list.
sapply follows a similar idea, but returns a vector/matrix. You can think of sapply as collapsing the result of lapply into a vector/matrix.
Of course, this part of the explanation is only meant to make things understandable. lapply and sapply are not really implemented as R loops; lapply does its looping in internal C code, which is more efficient.
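Here is a sketch of the equivalences described above. The values from the question's str(x) output are hard-coded so that the result is reproducible. The sketch relies on sapply, in its default form, being roughly lapply followed by the base function simplify2array().

```r
# Values copied from the question's str(x) output
x <- list(list(c(2L, 2L, 3L, 4L), "A"),
          list(c(1L, 3L, 3L, 1L), "B"),
          list(c(1L, 4L, 2L, 2L), "C"))
foo <- function(el) el[[1]]

# Loop version of lapply(x, foo), as described above
output <- vector("list", length = length(x))
for (i in seq_along(x)) output[[i]] <- foo(x[[i]])
identical(output, lapply(x, foo))  # TRUE

# sapply is (roughly) lapply followed by simplify2array()
mat <- simplify2array(output)
identical(mat, sapply(x, foo))     # TRUE

mat + 1:4  # the matrix shown in the question
```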

sapply doesn't return (numeric) vector when calculating gradients

I am calculating gradient values using:
DF$gradUx <- sapply(1:nrow(DF), function(i) ((DF$V4[i+1])-DF$V4[i]), simplify = "vector")
but when checking class(DF$gradUx), I still get a list. What I want is a numeric vector. What am I doing wrong?
Browse[1]> head(DF)
V1 V2 V3 V4
1 0 0 -2.913692e-09 2.913685e-09
2 1 0 1.574589e-05 3.443367e-09
3 2 0 2.111406e-05 3.520451e-09
4 3 0 2.496275e-05 3.613013e-09
5 4 0 2.735775e-05 3.720385e-09
6 5 0 2.892444e-05 3.841937e-09

You will only get a numeric vector when all return values are of length 1. More precisely, sapply only simplifies (to a vector or a matrix) if all return values have the same length. From ?sapply "Details":
Simplification in 'sapply' is only attempted if 'X' has length
greater than zero and if the return values from all elements of
'X' are all of the same (positive) length. If the common length
is one the result is a vector, and if greater than one is a matrix
with a column corresponding to each element of 'X'.
When i == 1, the original formula indexes DF$V4[i-1], i.e. DF$V4[0], which returns numeric(0). That result has length 0, so the whole return value will be a list.
You need to change your calculation to account for indexing outside the bounds of your vector. DF$V4[1-1] returns numeric(0), and DF$V4[nrow(DF)+1] returns NA. Fix this logic and you should remedy the vector problem.
Edit: for the record, the original question calculated the difference as DF$V4[i+1]-DF$V4[i-1], a lag-2 difference, whereas the edited question (and the OP's intent) uses a lag-1 difference.
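Here is a small sketch of this behaviour, with a made-up vector v standing in for DF$V4. The i = 1 call indexes v[0] and returns numeric(0), which prevents simplification. Returning NA for the out-of-bounds case restores a numeric vector.

```r
v <- c(1, 4, 9, 16)  # hypothetical stand-in for DF$V4

# i = 1 indexes v[0], which is numeric(0), so 1 - numeric(0) has length 0
lagged <- sapply(seq_along(v), function(i) v[i] - v[i - 1])
class(lagged)    # "list": the return lengths differ, so no simplification
lengths(lagged)  # 0 1 1 1

# Handling the out-of-bounds case explicitly gives every call length 1
fixed <- sapply(seq_along(v), function(i) if (i == 1) NA_real_ else v[i] - v[i - 1])
fixed            # NA, 3, 5, 7
```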

Instead of sapply, I should simply use diff(DF$V3) and write the result into a new data.frame:
gradients = data.frame(gradUx=diff(DF$V3),gradUy=diff(DF$V4))
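To keep the result as a column of the original DF, the shorter diff() output has to be padded. The sketch below assumes a hypothetical DF holding only the first four V4 values from the question. It computes the lag-1 difference V4[i+1] - V4[i], with NA in the last row, where there is no next observation. With that lag-1 formula, every sapply call returns a single value, so sapply gives the same numeric vector.

```r
DF <- data.frame(V4 = c(2.913685e-09, 3.443367e-09, 3.520451e-09, 3.613013e-09))

# diff() returns nrow(DF) - 1 values; pad with NA so the lengths match
DF$gradUx <- c(diff(DF$V4), NA)
class(DF$gradUx)  # "numeric"

# Lag-1 sapply version: DF$V4[nrow(DF) + 1] is NA, so each call has length 1
via_sapply <- sapply(seq_len(nrow(DF)), function(i) DF$V4[i + 1] - DF$V4[i])
all.equal(DF$gradUx, via_sapply)  # TRUE
```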

This calculation can be vectorized very easily if you line up the observations. I use tail(..., -2) to drop the first 2 observations and head(..., -2) to drop the last 2, then pad both ends with NA:
gradUx <- c(NA, tail(DF$V4, -2) - head(DF$V4, -2), NA)
> gradUx
[1] NA 6.06766e-10 1.69646e-10 1.99934e-10 2.28924e-10 NA
This gives the same values as your approach, but as a vector instead of a list:
> sapply(1:nrow(DF), function(i) ((DF$V4[i+1])-DF$V4[i-1]), simplify = "vector")
[[1]]
numeric(0)
[[2]]
[1] 6.06766e-10
[[3]]
[1] 1.69646e-10
[[4]]
[1] 1.99934e-10
[[5]]
[1] 2.28924e-10
[[6]]
[1] NA

Replace values in list

I have a nested list, which could look something like this:
characlist<-list(list(c(1,2,3,4)),c(1,3,2,NA))
Next, I want to replace all values equal to one with NA. I tried the following, but it produces an error:
lapply(characlist,function(x) ifelse(x==1,NA,x))
Error in ifelse(x == 1, NA, x) :
(list) object cannot be coerced to type 'double'
Can someone tell me what's wrong with the code?

Use rapply instead:
> rapply(characlist,function(x) ifelse(x==1,NA,x), how = "replace")
#[[1]]
#[[1]][[1]]
#[1] NA 2 3 4
#
#
#[[2]]
#[1] NA 3 2 NA
The problem with your initial approach is that your first list element is itself a list, so you cannot apply the ifelse logic to it directly as you would to an atomic vector. rapply (see ?rapply), a recursive version of lapply, avoids that problem.
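The how argument of rapply controls the shape of the result. Here is a small sketch on the question's characlist that contrasts how = "replace" (nested structure kept) with the default how = "unlist" (result flattened):

```r
characlist <- list(list(c(1, 2, 3, 4)), c(1, 3, 2, NA))
f <- function(x) ifelse(x == 1, NA, x)

# how = "replace": keep the nested structure, applying f to every non-list leaf
res_replace <- rapply(characlist, f, how = "replace")
str(res_replace)

# how = "unlist" (the default): apply f, then flatten into one vector
res_unlist <- rapply(characlist, f, how = "unlist")
res_unlist  # NA 2 3 4 NA 3 2 NA
```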

Another option is to use relist after replacing the elements equal to 1 with NA in the unlisted vector. We specify the original list as the skeleton to get the same structure back.
v1 <- unlist(characlist)
relist(replace(v1, v1==1, NA), skeleton=characlist)
#[[1]]
#[[1]][[1]]
#[1] NA 2 3 4
#[[2]]
#[1] NA 3 2 NA

Finding the existence of a vector within matrix within list within list

I am trying to use R to find a vector within a matrix within a list within a list. I have tried checking whether the vector 'ab' exists with the following exists calls, but none of them work. How can I make it work?
aa <- list(x = matrix(1,2,3), y = 4, z = 3)
colnames(aa$x) <- c('ab','bb','cb')
aa
#$x
# ab bb cb
#[1,] 1 1 1
#[2,] 1 1 1
#
#$y
#[1] 4
#
#$z
#[1] 3
exists('ab', where=aa)
#[1] FALSE
exists('ab', where=aa$x)
# Error in exists("ab", where = aa$x) : invalid 'envir' argument
exists('ab', where=colnames(aa$x))
# Error in as.environment(where) : no item called "ab" on the search list
colnames(aa$x)
#[1] "ab" "bb" "cb"

Column names are an attribute of a matrix or data.frame, not variables in an environment, which is why exists does not find them. So we loop over the list using sapply, get the column names (colnames), unlist them, and check whether 'ab' is in that vector:
'ab' %in% unlist(sapply(aa, colnames))
#[1] TRUE
If we want to be more specific for a particular list element, we extract the element (aa$x), get the column names and check whether 'ab' is among them.
'ab' %in% colnames(aa$x)
#[1] TRUE
Another option is to loop through 'aa' and, if the element is a matrix, extract the 'ab' column and check whether it is a vector. Wrapping the sapply with any gives a single TRUE/FALSE output. Note that x[, 'ab'] stops with a "subscript out of bounds" error if a matrix has no 'ab' column.
any(sapply(aa, function(x) if(is.matrix(x)) is.vector(x[, 'ab']) else FALSE))
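The title mentions lists nested inside lists, but sapply over aa only looks one level deep. Below is a sketch of a recursive check. It uses a hypothetical helper, has_colname (not a base R function), which descends into nested lists and tests the column names of any matrix or data.frame it finds.

```r
# Hypothetical helper, not part of base R:
# TRUE if any matrix or data.frame inside `obj` (at any depth) has column `name`
has_colname <- function(obj, name) {
  # Check matrices and data.frames first: a data.frame is also a list
  if (is.matrix(obj) || is.data.frame(obj)) {
    return(name %in% colnames(obj))
  }
  # Recurse into plain lists, one logical result per element
  if (is.list(obj)) {
    return(any(vapply(obj, has_colname, logical(1), name = name)))
  }
  # Any other leaf (numbers, strings, ...) has no columns
  FALSE
}

aa <- list(x = matrix(1, 2, 3), y = 4, z = 3)
colnames(aa$x) <- c('ab', 'bb', 'cb')
nested <- list(outer = list(inner = aa), d = data.frame(qq = 1:2))

has_colname(aa, 'ab')      # TRUE
has_colname(nested, 'ab')  # TRUE, found two levels down
has_colname(nested, 'qq')  # TRUE, a column of the data.frame
has_colname(nested, 'zz')  # FALSE
```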
