I use apply to a matrix in order to apply a function row by row.
My syntax is as follows :
res = apply(X,1,MyFunc)
The above function MyFunc returns a list of two values.
But the result of this apply application is a strange structure, where R seems to add some of its own (housekeeping?) data :
res = $`81`
$`81`$a
[1] 80.8078
$`81`$b
[1] 6247
Whereas the result I expected is simply:
res = $a
[1] 80.8078
$b
[1] 6247
I do not know why R inserts this strange 81, or how I can get rid of it.
Thanks for help
This is perfectly normal behaviour. You are applying a function over a matrix with named rows. Your function returns a list for each row, and each element in this new list of lists is named with the corresponding rowname.
Here is an example that reproduces what you describe:
x <- matrix(1:4, nrow=2)
rownames(x) <- 80:81
myFunc <- function(x)list(a=1, b=2)
xx <- apply(x, 1, myFunc)
xx
This returns:
$`80`
$`80`$a
[1] 1
$`80`$b
[1] 2
$`81`
$`81`$a
[1] 1
$`81`$b
[1] 2
Take a look at the structure of this list:
str(xx)
List of 2
$ 80:List of 2
..$ a: num 1
..$ b: num 2
$ 81:List of 2
..$ a: num 1
..$ b: num 2
To index the first element, simply use xx[[1]]:
xx[[1]]
$a
[1] 1
$b
[1] 2
Here is a guess as to what you may have intended... Rather than returning a list, if you return a vector, the result of the apply will be a matrix:
myFunc <- function(x)c(a=1, b=2)
xx <- apply(x, 1, myFunc)
xx
80 81
a 1 1
b 2 2
And to get a specific row, without names, do:
unname(xx[2, ])
[1] 2 2
It would help to know what your matrix (X) looks like. Let's try something like this:
mf <- function(x) list(a=sum(x),b=prod(x))
mat <- matrix(1:6,nrow=2)
Then:
> apply(mat,1,mf)
[[1]]
[[1]]$a
[1] 9
[[1]]$b
[1] 15
[[2]]
[[2]]$a
[1] 12
[[2]]$b
[1] 48
You need that first subscript to differentiate between the lists that each row will generate. I suspect that your rownames are numbered, which results in the $`81` that you are seeing.
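As a minimal sketch of two ways around the row-name labels (my_func and the variable names are made up for illustration, they are not from the question): either strip the names from the result with unname(), or call the function directly on the one row you care about.

```r
# Hypothetical stand-in for the asker's matrix and function
x <- matrix(1:4, nrow = 2)
rownames(x) <- 80:81
my_func <- function(row) list(a = sum(row), b = prod(row))

# apply() labels each per-row list with the corresponding row name
res <- apply(x, 1, my_func)
res_names <- names(res)        # "80" "81"

# Option 1: drop the labels but keep one list per row
res_unnamed <- unname(res)

# Option 2: if only one row matters, skip apply() entirely
one <- my_func(x["81", ])      # plain list with elements a and b
```

Either way, the 81 is only a label taken from the row names; the values inside each list are unchanged.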
I want to sort vectors in a list. I tried the following:
test <- list(c(2,3,1), c(3,2,1), c(1,2,3))
for (i in length(test)){
test[[i]] <- sort(test[[i]])
}
test
Which returns the list unchanged (vectors not sorted):
[[1]]
[1] 2 3 1
[[2]]
[1] 3 2 1
[[3]]
[1] 1 2 3
However when I sort manually outside the loop the order is stored:
test[[1]]
[1] 2 3 1
test[[1]] <- sort(test[[1]])
test[[1]]
[1] 1 2 3
Why does the behaviour in the loop differ? I would expect the loop to store three vectors c(1,2,3) in the list. What am I missing?
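I just figured it out: the loop only runs once, with i = 3, because length(test) is the single number 3 rather than the sequence 1, 2, 3. The third vector is already sorted, so nothing appears to change. Hence I should have used for (i in 1:length(test)).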
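For what it's worth, here is a small sketch of two common ways to write this. seq_along() is often preferred over 1:length() because it yields an empty sequence for an empty list, and lapply() avoids the explicit loop altogether. The variable names are just for illustration.

```r
test <- list(c(2, 3, 1), c(3, 2, 1), c(1, 2, 3))

# Loop version: seq_along(test) is 1, 2, 3 (and empty for an empty list)
looped <- test
for (i in seq_along(looped)) {
  looped[[i]] <- sort(looped[[i]])
}

# Functional version: apply sort() to every element and return a new list
applied <- lapply(test, sort)
```

Both looped and applied should hold three copies of the sorted vector.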
I want to run a function on each level of a data.frame variable based on the condition of each level of another data.frame variable (or lists, if it's better to work with them for some reason).
If one of the variables meets a certain condition (e.g., > 15), I then want to run a simple function (e.g., product) on each pair of variables and add the results to a new list. For the sake of my needs and others' future needs, I am hoping for a solution that is flexible for any condition and any function.
I am new to programming/R and do not know how to appropriately structure a for loop (or other method) in order to run the function for all combinations of elements in both data.frame variables. It seems like this should be really easy to achieve, but I've been searching for hours and cannot find a solution.
This is the nested for loop code I am working on:
df1 <- data.frame(c(1, 2, 3))
df2 <- data.frame(c(10, 20, 30))
list1 <- list()
for (i in 1:length(df1)) {
for (j in 1:length(df2)) {
if (df2[j,] > 15) {
list1[[i]] <- df1[i,] * df2[j,]}
}}
list1
When I run the current code I get an empty list as the result: list().
What I want returned is something like this:
[[1]]
[1] 20
[[2]]
[1] 30
[[3]]
[1] 40
[[4]]
[1] 60
[[5]]
[1] 60
[[6]]
[1] 90
Consider sapply with two inputs to iterate across the rows of both data frames, followed by a list conversion:
mat <- sapply(1:nrow(df1), function(i, j) ifelse(df2[j,] > 15, df1[i,]*df2[j,], NA),
1:nrow(df2))
mat <- mat[!is.na(mat)]
mat
# [1] 20 30 40 60 60 90
as.list(mat)
# [[1]]
# [1] 20
#
# [[2]]
# [1] 30
#
# [[3]]
# [1] 40
#
# [[4]]
# [1] 60
#
# [[5]]
# [1] 60
#
# [[6]]
# [1] 90
There are many ways to do this; here are two of them: one is your for loop, and the other is vectorised.
for loop
There are a few mistakes in your code. Both df1 and df2 have length 1, because length() of a data frame counts its columns, so i and j only ever take the value 1. This can be fixed by using nrow instead of length. You also need an index, created outside the loop, so that each result is assigned to its own slot in the list. The following code works:
df1 <- data.frame(c(1, 2, 3))
df2 <- data.frame(c(10, 20, 30))
list1 <- list()
index=0
for (i in 1:nrow(df1)) {
for (j in 1:nrow(df2)) {
if (df2[j,] > 15) {
index=index+1
list1[[index]] <- df1[i,] * df2[j,]}
}}
list1
[[1]]
[1] 20
[[2]]
[1] 30
[[3]]
[1] 40
[[4]]
[1] 60
[[5]]
[1] 60
[[6]]
[1] 90
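The length versus nrow point can be checked directly. A quick sketch (n_cols, n_rows and row_idx are illustrative names only):

```r
df1 <- data.frame(c(1, 2, 3))

n_cols <- length(df1)        # a data frame is a list of columns, so this is 1
n_rows <- nrow(df1)          # 3

# seq_len(nrow(df)) is a safe way to build the row index for a loop
row_idx <- seq_len(n_rows)   # 1 2 3
```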
vectorized way
Using expand.grid to generate the required combinations and prod to find their products
dat=expand.grid(df1[,1], df2[df2 > 15,1])
dat=dat[order(dat$Var1),]
apply(dat, 1, prod)
1 4 2 5 3 6
20 30 40 60 60 90
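Since the question asks for something flexible in both the condition and the function, here is one possible sketch using outer(). pairwise_if, keep and f are hypothetical names introduced for this example, and f is assumed to be vectorised (as `*` is), which outer() requires.

```r
# Hypothetical helper: apply f to every pair (x[i], y[j]) where y[j] passes keep()
pairwise_if <- function(x, y, keep = function(v) v > 15, f = `*`) {
  y_kept <- y[keep(y)]
  # outer() builds a matrix with one row per x and one column per kept y;
  # transposing first makes as.vector() read it x by x, matching the loop order
  as.vector(t(outer(x, y_kept, f)))
}

df1 <- data.frame(c(1, 2, 3))
df2 <- data.frame(c(10, 20, 30))

products <- pairwise_if(df1[[1]], df2[[1]])
sums <- pairwise_if(df1[[1]], df2[[1]], keep = function(v) v < 25, f = `+`)
```

Wrap the result in as.list() if a list is really needed.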
I'm creating a function for intersecting a list of vectors.
The idea is for the function to recurse until it intersects the last two vectors in the list and then move upward.
The problem is that this results in list(), even though there should definitely be an answer.
intersect_list <- function (x) {
if (length (x) == 1) {return (unique (x))}
else {intersect (x [[1]], intersect_list (x [-1]))
}
}
> listing
[[1]]
[1] 1 2 3 4
[[2]]
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5 6
[[3]]
[1] 2 4
> intersect_list(listing)
list()
If I do everything manually, it works fine (naturally), and the output should be [1] 2 4.
Change the case when length(x)==1 to return a vector instead of a list.
intersect_list <- function (x) {
if (length (x) == 1) {return (unique (x[[1]]))}
else {intersect (x [[1]], intersect_list (x [-1]))
}
}
When unique is used on a list, it returns a list of the unique elements; it doesn't recursively apply unique to each element. When the input has length 1, it is still a list whose first element is a vector, so unique just returns the input unchanged.
> unique(list(c(1,1,1,2)))
[[1]]
[1] 1 1 1 2
To see how unique is acting take a look at this example
> mylist <- list(1:2, 1:2, 3:4, c(1,1,1,2))
> unique(mylist)
[[1]]
[1] 1 2
[[2]]
[1] 3 4
[[3]]
[1] 1 1 1 2
The first two elements of mylist are the same vector so unique only returns a single instance of it.
So the issue with your function is that you were passing a list into unique which ends up returning a list. When intersect comes across a vector and a list it doesn't do what you were expecting it to.
> intersect(1:3, list(1:3))
list()
> intersect(1:3, list(1))
[[1]]
[1] 1
> intersect(list(1:3), list(1:3))
[[1]]
[1] 1 2 3
> intersect(list(1:3, 2:3), list(1:3))
[[1]]
[1] 1 2 3
By changing the unique(x) to unique(x[[1]]) we pass in the vector instead of the list which in turn takes the unique elements from that vector and returns a vector. Then both of the inputs to intersect will be a vector like you expected instead of having a list in there.
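As a side note, the same fold over a list can be written without explicit recursion using Reduce(). This is just an alternative sketch, not something the question asked for; listing is rebuilt here from the values shown in the question.

```r
listing <- list(1:4, -5:6, c(2, 4))

# Corrected recursive version from above
intersect_list <- function(x) {
  if (length(x) == 1) {
    return(unique(x[[1]]))
  }
  intersect(x[[1]], intersect_list(x[-1]))
}

recursive_result <- intersect_list(listing)

# Reduce() applies intersect pairwise from left to right across the list
reduce_result <- Reduce(intersect, listing)
```

Both results should contain the values 2 and 4.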
tl;dr - What the hell is a vector in R?
Long version:
Lots of stuff is a vector in R. For instance, a number is a numeric vector of length 1:
is.vector(1)
[1] TRUE
A list is also a vector.
is.vector(list(1))
[1] TRUE
OK, so a list is a vector. And a data frame is a list, apparently.
is.list(data.frame(x=1))
[1] TRUE
But, (seemingly violating the transitive property), a data frame is not a vector, even though a dataframe is a list, and a list is a vector. EDIT: It is a vector, it just has additional attributes, which leads to this behavior. See accepted answer below.
is.vector(data.frame(x=1))
[1] FALSE
How can this be?
To answer your question another way, the R Internals manual lists R's eight built-in vector types: "logical", "numeric", "character", "list", "complex", "raw", "integer", and "expression".
To test whether the non-attribute part of an object is really one of those vector types "underneath it all", you can examine the results of is(), like this:
isVector <- function(X) "vector" %in% is(X)
df <- data.frame(a=1:4)
isVector(df)
# [1] TRUE
# Use isVector() to examine a number of other vector and non-vector objects
la <- structure(list(1:4), mycomment="nothing")
chr <- "word" ## STRSXP
lst <- list(1:4) ## VECSXP
exp <- expression(rnorm(99)) ## EXPRSXP
rw <- raw(44) ## RAWSXP
nm <- as.name("x") ## SYMSXP
pl <- pairlist(b=5:8) ## LISTSXP
sapply(list(df, la, chr, lst, exp, rw, nm, pl), isVector)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
Illustrating what @joran pointed out, that is.vector returns FALSE on a vector which has any attributes other than names (I never knew that) ...
# 1) Example of when a vector stops being a vector...
> dubious = 7:11
> attributes(dubious)
NULL
> is.vector(dubious)
[1] TRUE
#now assign some additional attributes
> attributes(dubious) <- list(a = 1:5)
> attributes(dubious)
$a
[1] 1 2 3 4 5
> is.vector(dubious)
[1] FALSE
# 2) Example of how to strip a dataframe of attributes so it looks like a true vector ...
> df = data.frame()
> attributes(df)
$names
character(0)
$row.names
integer(0)
$class
[1] "data.frame"
> attributes(df)[['row.names']] <- NULL
> attributes(df)[['class']] <- NULL
> attributes(df)
$names
character(0)
> is.vector(df)
[1] TRUE
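If the goal is to ask "what is this underneath?" without being thrown off by attributes, typeof() and is.atomic() are arguably more direct than is.vector(). A small sketch, with illustrative variable names:

```r
df <- data.frame(x = 1)
df_is_vector <- is.vector(df)   # FALSE: it has class and row.names attributes
df_is_list <- is.list(df)       # TRUE
df_type <- typeof(df)           # "list": the underlying storage type

v <- 7:11
attr(v, "a") <- 1:5
v_is_vector <- is.vector(v)     # FALSE: an attribute other than names is present
v_is_atomic <- is.atomic(v)     # TRUE: attributes do not affect this check
v_type <- typeof(v)             # "integer"
```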
Not an answer, but here are some other interesting things that are definitely worth investigating. Some of this has to do with the way objects are stored in R.
One example:
If we set up a matrix of one element, that element being a list, we get the following. Even though it's a list, it can be stored in one element of the matrix.
> x <- matrix(list(1:5)) # we already know that list is also a vector
> x
# [,1]
# [1,] Integer,5
Now if we coerce x to a data frame, its dimensions are still (1, 1)
> y <- as.data.frame(x)
> dim(y)
# [1] 1 1
Now, if we look at the first element of y, it's the data frame column,
> y[1]
# V1
# 1 1, 2, 3, 4, 5
But if we look at the first column of y, it's a list
> y[,1]
# [[1]]
# [1] 1 2 3 4 5
which is exactly the same as the first row of y.
> y[1,]
# [[1]]
# [1] 1 2 3 4 5
There are a lot of properties about R objects that are cool to investigate if you have the time.
Can someone explain what's happening in this example code? I have a function which does a calculation loop and, as always, I wanted to initialize my output vector instead of growing it each time through the loop.
Rgames> library(Rmpfr)
Rgames> foo<-rep(NA,5)
Rgames> foo
[1] NA NA NA NA NA
Rgames> rfoo<-mpfr(rep(NA,5),20)
Rgames> rfoo
5 'mpfr' numbers of precision 20 bits
[1] NaN NaN NaN NaN NaN
Rgames> for(jj in 1:5) {
+ foo[jj]<- mpfr(jj,10)
+ rfoo[jj]<-mpfr(jj,10)
+ }
Rgames> rfoo
5 'mpfr' numbers of precision 10 bits
[1] 1 2 3 4 5
Rgames> foo
[[1]]
'mpfr1' 1
[[2]]
'mpfr1' 2
[[3]]
'mpfr1' 3
[[4]]
'mpfr1' 4
[[5]]
'mpfr1' 5
I don't understand why, apparently, the existing non-mpfr vector foo is not only coerced to a list, but then each time through the loop, the new value is inserted into foo[jj] as a list, giving me an unpleasant "list of lists" . The mpfr vector rfoo does what I expected I'd get in both cases. (I checked, and if I do not initialize, and put something inside the loop like foo<-c(foo,mpfr(jj,10)) I do get a result equivalent to rfoo)
What's happening here is the same thing that would happen if you were working with lists instead of mpfr objects, as in the example below. I believe this makes sense because S4 objects are stored in a similar way to lists, but I'm not an S4 expert.
> foo <- rep(NA,2)
> foo
[1] NA NA
> foo[1] <- list(1)
> foo
[[1]]
[1] 1
[[2]]
[1] NA
I believe that what happens is that the original atomic vector gets coerced to a list to be able to include the object that you've asked to put there. I can't find any documentation about that right here; I think it's discussed in Chambers's book but don't have that at hand.
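The coercion can be watched step by step with typeof(). This is a base-R sketch only (no Rmpfr involved), and the variable names are illustrative: each assignment promotes the whole vector to the most general type needed, with list at the top.

```r
foo <- rep(NA, 2)
t0 <- typeof(foo)     # "logical"

foo[1] <- 1.5
t1 <- typeof(foo)     # "double"

foo[2] <- "a"
t2 <- typeof(foo)     # "character"

foo[1] <- list(1)     # a list cannot fit in an atomic vector, so foo becomes a list
t3 <- typeof(foo)     # "list"

# Preallocating a list up front makes the intended container explicit
out <- vector("list", 2)
```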
One can easily recreate this behavior using S3 methods as well; first the S3 methods to create a new class:
mynum <- function(x) {structure(as.list(x), class="mynum")}
print.mynum <- function(x) { cat("My numbers are:\n")
print(do.call(paste, x), quote=FALSE) }
Here's what happens if you start with an atomic vector:
> (foo <- rep(NA, 2))
[1] NA NA
> foo[1] <- mynum(1)
> foo
[[1]]
[1] 1
[[2]]
[1] NA
and here's what happens if you start with the mynum vector:
> (rfoo <- mynum(rep(NA, 2)))
My numbers are:
[1] NA NA
> rfoo[1] <- mynum(1)
> rfoo
My numbers are:
[1] 1 NA