I have noticed a curious quirk in R. Let's say I have a vector and I want to use a logical index that I intend to apply only to the first two elements:
vec <- rep(2, 10)
vec[c(TRUE,FALSE)] <- 13
My vector now has 13 assigned to every other element:
vec
[1] 13 2 13 2 13 2 13 2 13 2
Now, I know I could just use numeric indices (for example, by wrapping the logical values in a which() call), but I am curious about this behavior.
Why does R repeat logical values when indexing a vector?
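R's documented recycling rule for logical subscripts (see ?Extract) explains this: a logical index shorter than the vector is repeated until it matches the vector's length, so c(TRUE, FALSE) acts as TRUE, FALSE, TRUE, FALSE, ... across all 10 positions. The minimal sketch below makes the recycled mask explicit and shows two ways to confine the replacement to the first two positions; the names mask and vec2 are only illustrative.

```r
vec <- rep(2, 10)

# `[` recycles the short logical index to length(vec); this is the mask it
# effectively uses: TRUE FALSE TRUE FALSE ... (length 10)
mask <- rep_len(c(TRUE, FALSE), length(vec))
print(mask)

# Padding the index with FALSE up to length(vec) leaves nothing to recycle,
# so only position 1 is replaced
vec[c(TRUE, FALSE, rep(FALSE, length(vec) - 2))] <- 13
print(vec)

# A numeric index from which() needs no padding and gives the same result
vec2 <- rep(2, 10)
vec2[which(c(TRUE, FALSE))] <- 13
print(identical(vec, vec2))
```

Recycling is what makes short "pattern" indices such as c(TRUE, FALSE) (every other element) convenient, which is why R does not treat a short logical index as an error.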
Suppose a data frame df has a column speed. What is the difference between accessing the column like so:
df["speed"]
or like so:
df$speed
The following calculates the mean value correctly:
lapply(df["speed"], mean)
But this prints all values under the column speed:
lapply(df$speed, mean)
There are two elements to the question in the OP. The first element was addressed in the comments: df["speed"] is an object of class data.frame, whereas df$speed is a numeric vector. We can see this via the str() function.
We'll illustrate this with Ezekiel's 1930 analysis of speed and stopping distance, the cars data set from the datasets package.
> library(datasets)
> data(cars)
>
> str(cars["speed"])
'data.frame': 50 obs. of 1 variable:
$ speed: num 4 4 7 7 8 9 10 10 10 11 ...
> str(cars$speed)
num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
>
The second element, which was not addressed in the comments, is that lapply() behaves differently when passed a vector than when passed a list().
With a vector, lapply() processes each element in the vector independently, producing unexpected results for a function such as mean().
> unlist(lapply(cars$speed,mean))
[1] 4 4 7 7 8 9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25
What happened?
Since each element of cars$speed is processed by mean() independently, lapply() returns a list of 50 means of 1 number each: the original elements in the cars$speed vector.
Processing a list with lapply()
With a list, each element of the list is processed independently. We can calculate how many items will be processed by lapply() with the length() function.
> length(cars["speed"])
[1] 1
>
A data frame is a list whose elements are its columns. Since cars["speed"] contains a single column, the length() function returns the value 1. Therefore, when it is processed by lapply(), a single mean is calculated, not one per row of the speed column.
> lapply(cars["speed"],mean)
$speed
[1] 15.4
>
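To make the list view concrete, here is a short sketch using the same cars data; df_speed is just an illustrative name. It checks that a one-column data frame is a list of length 1, that the full data frame has one element per column, and that lapply() over the bare vector calls mean() once per value.

```r
library(datasets)

df_speed <- cars["speed"]              # one-column data frame
print(is.list(df_speed))               # a data frame is a list of columns
print(length(df_speed))                # 1 column, so lapply() calls mean() once
print(length(cars))                    # 2 columns: speed and dist

# Over the one-column data frame, lapply() yields the same value as mean() on the vector
per_column <- lapply(df_speed, mean)$speed
print(all.equal(per_column, mean(cars$speed)))

# Over the bare numeric vector, lapply() calls mean() once per value (50 calls)
per_value <- lapply(cars$speed, mean)
print(length(per_value))
```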
If we pass the entire cars data frame as the input object for lapply(), we obtain one mean per column in the data frame, since both variables in the data frame are numeric.
> lapply(cars,mean)
$speed
[1] 15.4
$dist
[1] 42.98
>
A theoretical perspective
The differing behaviors of lapply() reflect the fact that R is an object-oriented language. As John Chambers, creator of the S language on which R is based, once put it:
In R, two slogans are helpful.
-- Everything that exists is an object, and
-- Everything that happens is a function call.
John Chambers, quoted in Advanced R, p. 79.
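The fact that lapply() works differently on a data frame than on a vector illustrates the object-oriented feature of polymorphism, where the same behavior is implemented in different ways for different types of objects. Concretely, ?lapply documents that classed objects such as data frames are coerced with as.list(), a generic function whose data.frame method returns one element per column, while an atomic vector is processed one value at a time.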
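A small sketch of that dispatch, again on the cars data: is.object() is TRUE for the classed data frame (so lapply() hands it to as.list(), which dispatches to the data.frame method) and FALSE for the plain numeric column, which is iterated element by element exactly as if it had been converted with as.list().

```r
library(datasets)

# The data frame is a classed object; as.list() gives one element per column
print(is.object(cars))
print(length(as.list(cars)))           # 2
print(names(as.list(cars)))            # "speed" "dist"

# The column is a plain atomic vector; lapply() walks its 50 values
print(is.object(cars$speed))
print(length(as.list(cars$speed)))     # 50

# Iterating the vector directly matches iterating its as.list() form
same <- identical(lapply(cars$speed, mean), lapply(as.list(cars$speed), mean))
print(same)
```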
While this looks like a beginner's question, I think it's worth answering since many beginners could have a similar question, and a guide to the corresponding documentation is helpful IMHO.
No up-votes please: I am just collecting the comment fragments from the question that contribute to the answer. Feel free to edit this answer.
A data.frame is a list of vectors with the same length (number of elements). Please read the help in the R console (by typing ?data.frame).
The $ operator is implemented by returning one column as a vector (?"$.data.frame").
lapply applies a function to each element of a list (see ?lapply). If the first argument X is an atomic vector (integer, double, ...) with multiple elements, each element of the vector is treated as a separate list element (the same as as.list(1:26)).
Examples:
x <- data.frame(a = LETTERS, b = 1:26, stringsAsFactors = FALSE)
b.vector <- x$b
b.data.frame <- x["b"]
class(b.vector) # integer
class(b.data.frame) # data.frame
lapply(b.vector, mean)
# returns a result list with 26 list elements, the same as `lapply(1:26, mean)`
# [[1]]
# [1] 1
#
# [[2]]
# [1] 2
# ... up to list element 26
lapply(b.data.frame, mean)
# returns a list with one element per column of the data frame
# (here only column b), so mean() is called once on the whole column
# $b
# [1] 13.5
So IMHO your original question can be reduced to: Why does lapply behave differently if the first argument is an atomic vector instead of a list?
I have a vector of length 1000. It contains (numeric) survey answers of 100 participants, thus 10 answers per participant. I would like to drop the first three values for every participant to create a new vector of length 700 (including only the answers to questions 4-10).
I only know how to extract every n-th value of the vector, but cannot figure out how to solve the above problem.
vector <- seq(1,1000,1)
Expected output:
4 5 6 7 8 9 10 14 15 16 17 18 19 20 24 ...
Using a matrix to first structure and then flatten is one method. Another somewhat similar method is to use what I am calling a "logical pattern index":
head( # just showing the first couple of "segments"
vector[ c( rep(FALSE, 3), rep(TRUE, 10-3) ) ],
15)
[1] 4 5 6 7 8 9 10 14 15 16 17 18 19 20 24
This method can also be used inside the two-argument version of [ to select rows or columns using a logical pattern index. This works because of R's recycling of logical indices.
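A sketch that generalizes the logical pattern index; drop_leading() and its arguments are hypothetical names, not part of any package. It builds one participant's pattern with rep(..., times =) and lets `[` recycle it over the whole vector, then shows the same recycling in the two-argument form on a small matrix.

```r
vector <- seq(1, 1000, 1)

# Hypothetical helper: drop the first n_drop answers of every block of
# per_participant answers by recycling one block's logical pattern
drop_leading <- function(x, per_participant = 10, n_drop = 3) {
  pattern <- rep(c(FALSE, TRUE), times = c(n_drop, per_participant - n_drop))
  x[pattern]  # `[` recycles pattern to length(x)
}

kept <- drop_leading(vector)
print(length(kept))    # 700
print(head(kept, 8))   # 4 5 6 7 8 9 10 14

# Same recycling in the two-argument form: every other row of a matrix
m <- matrix(1:12, nrow = 4)
odd_rows <- m[c(TRUE, FALSE), ]   # rows 1 and 3
print(odd_rows)
```

Note that recycling does not warn when the vector length is not a multiple of the pattern length, so it is worth checking that length(x) is a multiple of per_participant before relying on it.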
Thanks for providing example data; it makes this question reproducible. Here is one solution:
c(matrix(vector, 10)[4:10, ])
We first convert the vector to a matrix with 10 rows, so that each column corresponds to one participant. Then we use row subsetting to remove the first three rows. Finally, c() flattens the matrix back into a vector.
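The steps above can be traced one at a time in a short sketch: matrix() fills column by column, so column j holds participant j's ten answers, and c() reads the trimmed matrix back in the same column order. The final check compares the result with the logical pattern index from the other answer.

```r
vector <- seq(1, 1000, 1)

# Fill column by column: 10 rows (questions) x 100 columns (participants)
m <- matrix(vector, nrow = 10)
print(dim(m))
print(m[, 1])    # participant 1's answers: 1 to 10

# Keep questions 4-10, then flatten column by column again
kept <- c(m[4:10, ])
print(length(kept))

# Same result as recycling a logical pattern over the vector
same <- identical(kept, vector[rep(c(FALSE, TRUE), times = c(3, 7))])
print(same)
```

matrix() warns when length(vector) is not a multiple of nrow, which makes this approach a little safer than silent recycling for malformed input.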
Currently, the following code does what I want: dx$res is a vector that takes its values from dx$val1 or dx$val2 depending on the value of dx$x0.
x0<-c(1,2,1,2,2,1)
val1<-c(8,6,4,5,3,2)
val2<-c(4,8,6,7,9,5)
dx<-data.frame(x0,val1,val2)
dx$res<-(dx$x0==1)*dx$val1+(dx$x0==2)*dx$val2
I would like to know whether there is a more elegant way to do this, for example using an apply function.
One option is model.matrix with rowSums. It also generalizes to any number n of distinct values in the x0 column, given one value column per value.
dx$res <- rowSums(dx[-1]*model.matrix(~ factor(x0) - 1 , dx))
dx$res
#[1] 8 8 4 7 9 2
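To see why this works, the sketch below rebuilds dx from the question and looks at the intermediate indicator matrix. It selects the value columns by name rather than with dx[-1], an adjustment made here so the code does not depend on whether dx$res already exists.

```r
x0 <- c(1, 2, 1, 2, 2, 1)
val1 <- c(8, 6, 4, 5, 3, 2)
val2 <- c(4, 8, 6, 7, 9, 5)
dx <- data.frame(x0, val1, val2)

# One 0/1 indicator column per level of x0; "- 1" drops the intercept,
# giving columns named factor(x0)1 and factor(x0)2
ind <- model.matrix(~ factor(x0) - 1, dx)
print(ind)

# Elementwise product zeroes the value that is not selected in each row;
# rowSums() then returns the surviving value
res <- rowSums(dx[c("val1", "val2")] * ind)
print(unname(res))

# Same as the arithmetic version from the question
print(all(res == (dx$x0 == 1) * dx$val1 + (dx$x0 == 2) * dx$val2))
```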
Assume I have three matrices...
A=matrix(c("a",1,2),nrow=1,ncol=3)
B=matrix(c("b","c",3,4,5,6),nrow=2,ncol=3)
C=matrix(c("d","e","f",7,8,9,10,11,12),nrow=3,ncol=3)
I want to find all possible combinations of column 1 (characters or names) while summing up columns 2 and 3. The result would be a single matrix with one row per possible combination, in this case 6. The result would look like the following matrix...
Result <- matrix(c("abd","abe","abf","acd","ace","acf",11,12,13,12,13,14,17,18,19,18,19,20),nrow=6,ncol=3)
I do not know how to add a table into this question, otherwise I would show it more descriptively. Thank you in advance.
You are mixing character and numeric values in a matrix, and this will coerce all elements to character. It is much better to define your matrix as numeric and keep the character values as the row names:
A <- matrix(c(1,2),nrow=1,dimnames=list("a",NULL))
B <- matrix(c(3,4,5,6),nrow=2,dimnames=list(c("b","c"),NULL))
C <- matrix(c(7,8,9,10,11,12),nrow=3,dimnames=list(c("d","e","f"),NULL))
#put all the matrices in a list
mlist<-list(A,B,C)
Then we use some Map, Reduce and lapply magic:
res <- Reduce("+",Map(function(x,y) y[x,],
expand.grid(lapply(mlist,function(x) seq_len(nrow(x)))),
mlist))
Finally, we build the row names:
rownames(res)<-do.call(paste0,expand.grid(lapply(mlist,rownames)))
# [,1] [,2]
#abd 11 17
#acd 12 18
#abe 12 18
#ace 13 19
#abf 13 19
#acf 14 20
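The Map/Reduce step can be unpacked into named intermediates, which may make the "magic" easier to follow; idx and picked are illustrative names, and drop = FALSE is added only as a safeguard against a single-row selection collapsing to a vector.

```r
A <- matrix(c(1, 2), nrow = 1, dimnames = list("a", NULL))
B <- matrix(c(3, 4, 5, 6), nrow = 2, dimnames = list(c("b", "c"), NULL))
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 3, dimnames = list(c("d", "e", "f"), NULL))
mlist <- list(A, B, C)

# Step 1: one row per combination, holding the row number to take from each matrix
idx <- expand.grid(lapply(mlist, function(x) seq_len(nrow(x))))
print(idx)

# Step 2: for each matrix, pull the rows listed in its column of idx (rows may repeat)
picked <- Map(function(i, m) m[i, , drop = FALSE], idx, mlist)

# Step 3: add the three 6 x 2 matrices elementwise, then label the combinations
res <- Reduce(`+`, picked)
rownames(res) <- do.call(paste0, expand.grid(lapply(mlist, rownames)))
print(res)
```

Because expand.grid() varies its first column fastest, the row order (abd, acd, abe, ...) differs from the order in the question's Result; res[order(rownames(res)), ] sorts it alphabetically if that matters.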
I've got a fairly basic question concerning vector operations in R. I want to apply a certain operation (e.g. an increment) to specific elements of a vector by using a vector containing the indices of the elements.
For example:
ind <- c(2,5,8)
vec <- seq(1,10)
I want to add 1 to the 2nd, 5th and 8th element of vec. In the end I'd like to have:
vec <- c(1,3,3,4,6,6,7,9,9,10)
I tried vec[ind] + 1
but that returns only the three elements. I could use a for-loop, of course, but knowing R, I'm sure there's a more elegant way.
Any help would be much appreciated.
You have to assign the result back into the selected positions:
vec[ind] <- vec[ind] + 1
vec
#[1] 1 3 3 4 6 6 7 9 9 10
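A short sketch contrasting the two expressions, plus one caveat not raised in the question: with repeated indices, the assignment idiom increments a position only once, because both copies write the same value. Counting occurrences with tabulate() is one base-R way to add 1 per occurrence; v2 and v3 are illustrative names.

```r
ind <- c(2, 5, 8)
vec <- seq(1, 10)

# Only computes the three incremented values; vec itself is unchanged
print(vec[ind] + 1)

# The replacement form writes them back into vec
vec[ind] <- vec[ind] + 1
print(vec)

# Caveat: a repeated index is incremented only once by this idiom
v2 <- seq(1, 10)
v2[c(2, 2)] <- v2[c(2, 2)] + 1   # v2[2] becomes 3, not 4

# tabulate() counts how often each position occurs, adding 1 per occurrence
v3 <- seq(1, 10)
v3 <- v3 + tabulate(c(2, 2), nbins = length(v3))   # v3[2] becomes 4
print(c(v2[2], v3[2]))
```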