My question is a continuation of this:
Split a vector into chunks
What would be the best way to access all of these chunks? For example, is there an easy way to access these mini-vectors if I have around a hundred of them? I need to find the minimum of each chunk and store the results in a new vector.
Look at the plyr package: it has a family of functions for processing lists, arrays, and data frames. In the post you mentioned, the chunks end up in a list. Thus, use llply for list input and list output; for list input and vector (array) output, laply is your choice.
# Examples from ?lapply
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
> x
$a
[1] 1 2 3 4 5 6 7 8 9 10
$beta
[1] 0.04978707 0.13533528 0.36787944 1.00000000 2.71828183 7.38905610 20.08553692
$logic
[1] TRUE FALSE FALSE TRUE
library(plyr)
llply(x, mean)
llply(x, quantile, probs = 1:3/4)
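For the original goal (the minimum of each chunk collected into a new vector), base R alone may be enough. The sketch below assumes the chunks were produced with split(); the vector v and the chunk size of 2 are made up for illustration.

```r
# Hypothetical data: 10 values split into 5 chunks of 2 consecutive elements
v <- c(5, 3, 9, 1, 7, 2, 8, 6, 4, 10)
chunks <- split(v, ceiling(seq_along(v) / 2))

# Access a single chunk by position, like any other list element
chunks[[3]]

# sapply() applies min() to every chunk and simplifies the result to a vector;
# vapply() does the same but also checks that each result is a single number
mins <- sapply(chunks, min)
mins_checked <- vapply(chunks, min, numeric(1))
unname(mins)
```

With plyr loaded, laply(chunks, min) should return the same minimums.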
I want to be sure that the result of which(..., arr.ind = TRUE) is always ordered, specifically: sorted ascending by (col, row). I do not see such a remark in the documentation of which, although it seems to be the case based on some experiments I ran. How can I check or find out whether this is guaranteed?
For example, when I run the code below, the output is a matrix whose rows are sorted ascending by (col, row).
> set.seed(1)
> vals <- rnorm(10)
> valsall <- sample(as.numeric(replicate(10, vals)))
> mat <- matrix(valsall, 10, 10)
> which(mat == max(mat), arr.ind = TRUE)
row col
[1,] 1 1
[2,] 3 1
[3,] 1 2
[4,] 2 2
[5,] 10 2
[6,] 1 6
[7,] 2 8
[8,] 4 8
[9,] 1 9
[10,] 6 9
Part 1:
This answers the part of your question about how to understand a function more deeply when the documentation is not enough, without going into the details of which() itself.
Because which() is not a primitive function (primitives are implemented in C), i.e. it is written using the basic building blocks of R, we can see what's going on behind the scenes by printing the function itself. Note that backticks let you inspect functions with non-syntactic names, e.g. +, and are therefore optional in this example. Dense R code like this can be tiresome to read, but I've found it very educational, and it does untangle some mental knots every once in a while.
> print(`which`)
function (x, arr.ind = FALSE, useNames = TRUE)
{
wh <- .Internal(which(x))
if (arr.ind && !is.null(d <- dim(x)))
arrayInd(wh, d, dimnames(x), useNames = useNames)
else wh
}
<bytecode: 0x00000000058673e0>
<environment: namespace:base>
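A small companion sketch of the same idea: is.primitive() tells you whether printing a function will show R source or only a .Primitive stub, and formals() and body() pull a closure apart. The functions checked here are just examples.

```r
# which() is an ordinary R closure, so print(which) shows its R source
is.primitive(which)          # FALSE
# sum() and `+` are primitives implemented in C; printing them shows .Primitive(...)
is.primitive(sum)            # TRUE
is.primitive(`+`)            # TRUE (backticks needed for the non-syntactic name)
# formals() lists a closure's arguments; body() returns its code
names(formals(which))        # "x" "arr.ind" "useNames"
body(which)
```

Note that even for a closure like which(), the core work happens in .Internal(which(x)), which is C code, so printing the function only takes you as far as that boundary.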
Part 2:
So, after giving up on understanding which() and arrayInd() in the way described above, I tried common sense. To me, the most efficient way to check each value of a matrix or array is to treat it, at some point, as a one-dimensional object. R stores matrices and arrays in column-major order, so coercing a matrix to an atomic vector (or any other reduction of dimensions) concatenates complete columns, one after another. It is therefore natural that higher-level functions follow this fundamental rule as well.
> testmat <- matrix(1:10, nrow = 2, ncol = 5)
> testmat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
> as.numeric(testmat)
[1] 1 2 3 4 5 6 7 8 9 10
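The column-major rule can be checked directly on a small example: converting the linear indices from which() to (row, col) pairs by hand gives the same pairs as arr.ind = TRUE, and they come out sorted by column, then row. This only illustrates the behaviour on one made-up input; it is not a proof or a documented guarantee.

```r
# Small example; matrix() fills column by column (column-major order)
m <- matrix(c(1, 0, 1, 1, 0, 1), nrow = 2)

wh  <- which(m == 1)                 # linear indices: 1 3 4 6
idx <- which(m == 1, arr.ind = TRUE)

# Convert linear indices to (row, col) by hand using column-major arithmetic
rows <- (wh - 1L) %% nrow(m) + 1L
cols <- (wh - 1L) %/% nrow(m) + 1L

# The arr.ind result matches the hand conversion...
stopifnot(all(idx[, "row"] == rows), all(idx[, "col"] == cols))
# ...and is therefore sorted by column first, then by row
stopifnot(identical(order(cols, rows), seq_along(wh)))
idx
```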
I found Hadley Wickham's Advanced R an extremely valuable resource in answering your question, especially the chapters about functions and data structures.
http://adv-r.had.co.nz/
I have data
test <- 1:10
and I would like to obtain the indices of test that fulfill different related conditions. For example,
which(test>5)[1]
which(test>8)[1]
which(test>9)[1]
yield
[1] 6
[1] 9
[1] 10
when carried out individually, but is there a way to execute them simultaneously using a vector like
bounds <- c(5,8,9)
so that it yields a vector containing the index for each value in bounds?
A couple of options are
findInterval(bounds, test) + 1
#[1] 6 9 10
which is the fastest, or
max.col(outer(bounds, test, `<`), 'first')
#[1] 6 9 10
which is the slowest. The third option is the one suggested in a comment below the question:
sapply(bounds, function(x) which(test > x)[1])
#[1] 6 9 10
which is neither the fastest nor the slowest.
Just use sapply:
sapply(bounds, function(x) which(test>x)[1])
[1] 6 9 10
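A sketch comparing two of the approaches, with one extra bound (10, added for illustration) that no element of test exceeds. Note that findInterval() requires test to be sorted in non-decreasing order (it stops with an error otherwise), so the which()-based version is the safer choice for unsorted data.

```r
test <- 1:10
bounds <- c(5, 8, 9, 10)   # 10 added to show the "no match" case

# findInterval() returns, for each bound, how many elements of the sorted
# 'test' are <= that bound, so + 1 is the position of the first element > bound
fi <- findInterval(bounds, test) + 1

# vapply() is a type-safe sapply(): each call must return exactly one integer
vp <- vapply(bounds, function(x) which(test > x)[1], integer(1))

fi   # 6 9 10 11  (11 is one past the end when nothing exceeds the bound)
vp   # 6 9 10 NA  (which() finds nothing, so [1] gives NA)
```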
I am having some fundamental confusion with R. I have a snippet of R code.
> m <- 1:10
> m
[1] 1 2 3 4 5 6 7 8 9 10
> dim(m) <- c(2,5)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
I am a C/Python programmer, and the line dim(m) <- c(2,5) is incredibly confusing to me. I realize that it effectively changed a vector into a matrix; however, looking at it, I do not understand the logic or the order of operations.
<- is the assignment operator in R. So to me, the logical order of operations is: assign (2,5) to the output of dim(m). Since the output of dim(m) isn't assigned to a variable, it would be lost.
Could someone explain how I should read the line dim(m) <- c(2,5)? What is the order of operations? It seems that what <- does changes depending on the LHS and RHS of the expression.
These are special functions called Replacement Functions. I quote from Hadley's Advanced-R book:
Replacement functions act like they modify their arguments in place, and have the special name xxx<-. They typically have two arguments (x and value), although they can have more, and they must return the modified object. For example, the following function allows you to modify the second element of a vector:
`second<-` <- function(x, value) {
x[2] <- value
x
}
x <- 1:10
second(x) <- 5L
x
#> [1] 1 5 3 4 5 6 7 8 9 10
When R evaluates the assignment second(x) <- 5, it notices that the left hand side of the <- is not a simple name, so it looks for a function named second<- to do the replacement.
You can read the full chapter in Advanced R under the "Replacement functions" heading.
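Applied to the line from the question: R evaluates dim(m) <- c(2,5) essentially as m <- `dim<-`(m, value = c(2,5)), i.e. it calls the replacement function and assigns the returned object back to m (internally via a temporary *tmp* variable, as the R Language Definition describes). The sketch below does both by hand and compares the results.

```r
# Form used in the question
m <- 1:10
dim(m) <- c(2, 5)

# Equivalent explicit call: `dim<-` returns a modified copy, which is assigned back
m2 <- 1:10
m2 <- `dim<-`(m2, c(2, 5))

identical(m, m2)   # TRUE
dim(m2)            # 2 5
```

So the "output" of the left-hand side is not lost: the whole left-hand side names the function `dim<-`, and its result replaces m.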
I cannot seem to convert a list to a matrix. I load a .csv file using:
dat = read.csv("games.csv", header = TRUE)
> typeof(dat)
[1] "list"
But when I try to convert it into a numeric matrix using:
games = data.matrix(dat)
The entries' values are all changed for some reason. What is the problem?
While Nathaniel's solution worked for you, I think it's important to point out that you might need to adjust your perception of what is going on.
typeof(dat) may be "list", but its class is "data.frame".
This might help illustrate the difference:
# typeof() & class() of `pts` is `list`
# whereas typeof() `dat` in your example is `list` but
# class() of `dat` in your example is a `data.frame`
pts <- list(x = cars[,1], y = cars[,2])
as.matrix(pts)
## [,1]
## x Numeric,50
## y Numeric,50
head(as.matrix(data.frame(pts)))
## x y
## [1,] 4 2
## [2,] 4 10
## [3,] 7 4
## [4,] 7 22
## [5,] 8 16
## [6,] 9 10
Those are two substantially different outcomes from the same `as.matrix()` function.
I'm just making sure you aren't disappointed by the outcome if you try this in a context other than read.csv.
Without any other information being provided, perhaps you might try:
games <- as.matrix(dat)
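One common reason data.matrix() appears to change the values (an assumption here, since the CSV isn't shown): it converts character and factor columns to their integer factor codes. The small data frame below is made up to illustrate this.

```r
# Hypothetical data: one numeric column and one character column
df <- data.frame(score = c(10, 20, 30), team = c("b", "a", "c"),
                 stringsAsFactors = FALSE)

# data.matrix(): character -> factor -> integer codes (sorted levels: a=1, b=2, c=3)
num <- data.matrix(df)
num[, "team"]     # 2 1 3

# as.matrix(): a non-numeric column makes the whole matrix character
chr <- as.matrix(df)
is.character(chr) # TRUE

# Checking the column classes first shows which columns are not numeric
sapply(df, class)
```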
I have a list of data frames in R. All of the data frames in the list are of the same size. However, the elements may be of different types. For example,
I would like to apply a function to corresponding elements of the data frames. For example, I want to use the paste function to produce a data frame such as
"1a" "2b" "3c"
"4d" "5e" "6f"
Is there a straightforward way to do this in R? I know it is possible to use the Reduce function to apply a function to corresponding elements of data frames within a list, but using Reduce in this case does not have the desired effect:
Reduce(paste,l)
Produces:
"c(1, 4) c(\"a\", \"d\")" "c(2, 5) c(\"b\", \"e\")" "c(3, 6) c(\"c\", \"f\")"
I'm wondering if I can do this without writing messy for loops. Any help is appreciated!
Instead of Reduce, use Map.
# not quite the same as your data
l <- list(data.frame(matrix(1:6,ncol=3)),
data.frame(matrix(letters[1:6],ncol=3), stringsAsFactors=FALSE))
# this returns a list
LL <- do.call(Map, c(list(f=paste0),l))
#
as.data.frame(LL)
# X1 X2 X3
# 1 1a 3c 5e
# 2 2b 4d 6f
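Because a data frame is a list of its columns, mapply() can also walk two data frames column by column directly. This is an alternative to the do.call(Map, ...) version, sketched here for exactly two data frames built like the ones above; the do.call() form generalizes to any number of them.

```r
df1 <- data.frame(matrix(1:6, ncol = 3))
df2 <- data.frame(matrix(letters[1:6], ncol = 3), stringsAsFactors = FALSE)

# Each call is paste0(df1[[i]], df2[[i]]); mapply() simplifies the
# length-2 results into a 2 x 3 character matrix with columns X1, X2, X3
res <- mapply(paste0, df1, df2)
as.data.frame(res)
```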
To explain @mnel's excellent answer a bit more, consider the simple example of summing the corresponding elements of two vectors:
Map(sum,1:3,4:6)
[[1]]
[1] 5 # sum(1,4)
[[2]]
[1] 7 # sum(2,5)
[[3]]
[1] 9 # sum(3,6)
Map(sum,list(1:3,4:6))
[[1]]
[1] 6 # sum(1:3)
[[2]]
[1] 15 # sum(4:6)
Why the second call behaves this way may be clearer if you add a second list:
Map(sum,list(1:3,4:6),list(0,0))
[[1]]
[1] 6 # sum(1:3,0)
[[2]]
[1] 15 # sum(4:6,0)
The next part is trickier. As the help page ?do.call states:
‘do.call’ constructs and executes a function call from a name or a
function and a list of arguments to be passed to it.
So, doing:
do.call(Map,c(sum,list(1:3,4:6)))
calls Map with the elements of the list c(sum,list(1:3,4:6)) as its arguments; that list looks like:
[[1]] # first argument to Map
function (..., na.rm = FALSE) .Primitive("sum") # the 'sum' function
[[2]] # second argument to Map
[1] 1 2 3
[[3]] # third argument to Map
[1] 4 5 6
...and which is therefore equivalent to:
Map(sum, 1:3, 4:6)
Looks familiar! It is equivalent to the first example at the top of this answer.
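The difference between passing a list as one argument and splicing it into several arguments can be seen side by side:

```r
lst <- list(1:3, 4:6)

# The list is ONE argument, so Map iterates over its two elements: sum(1:3), sum(4:6)
unlist(Map(sum, lst))              # 6 15

# do.call() splices the list into separate arguments: Map(sum, 1:3, 4:6)
unlist(do.call(Map, c(sum, lst)))  # 5 7 9
```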