I have a matrix defined as the pair-wise differences between the elements of an array:
a <- as.matrix(dist(c(1,2,3,4,5)))
I need to compute, without looping, the sum of the pair-wise differences among the first two elements, the first three elements, etc., i.e., I need to arrive at the vector:
v <- c(1,4,10,20)
Try
head(cumsum(cumsum(1:5)),-1)
#[1] 1 4 10 20
I don't know if you indeed want to call the cumulative sum function twice, as I think "the sum of the pair-wise differences between the first two elements, the first three elements, etc." should result in:
c(1, 3, 6, 10)
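Anyway, this accepts x other than 1:5 and still gives your required output, but note that it equals the sum of all pair-wise differences only when x is evenly spaced and increasing: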
> cumsumdiff <- function (x) cumsum(cumsum(sapply(x[-1], `-`, x[1])))
> cumsumdiff(1:5)
[1] 1 4 10 20
Or, based on @Jota's suggestion, using the distance matrix:
> cumsumdiff <- function(x) cumsum(cumsum(unname(as.matrix(dist(x))[1, -1])))
> cumsumdiff(1:5)
[1] 1 4 10 20
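A sketch that works for any x (not only evenly spaced values), assuming "difference" means the absolute difference |x[i] - x[j]| as dist() computes it: zero out the lower triangle so each pair is counted once, take column sums (column j then holds the differences between x[j] and the elements before it), and accumulate. The name pairwise_cumsum is only illustrative. The same two steps work directly on the matrix a from the question, since a[i, j] is already |x[i] - x[j]|.

```r
# pairwise_cumsum is a hypothetical helper name
pairwise_cumsum <- function(x) {
  d <- abs(outer(x, x, `-`))          # d[i, j] = |x[i] - x[j]|, as in dist()
  d[lower.tri(d, diag = TRUE)] <- 0   # keep each pair once (i < j)
  cumsum(colSums(d))[-1]              # running totals for the first 2, 3, ... elements
}

pairwise_cumsum(1:5)
#[1] 1 4 10 20
pairwise_cumsum(c(0, 1, 3))
#[1] 1 6
```

For c(0, 1, 3) the sums are |0 - 1| = 1 for the first two elements and 1 + 3 + 2 = 6 for all three, whereas cumsumdiff() returns 1 5 for that input. This version builds the full n x n matrix, so its memory use grows quadratically with length(x).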
Related
Say I have two lists:
list1 <- list(c(32, 43, 42))
list2 <- list(c(42, 46, 42))
How do I compute a column-wise (position-wise?) evaluation of the values stored across those lists? For example, I would like to count the number of values in each respective column that are greater than 40. So my results would be 1 for the first column, 2 for the second column, and 2 for the third column.
Is there a straightforward way to do this in R? I'm only finding resources for computing the mean down a column instead of a count-greater-than. Thanks in advance.
We can get this by looping over the lists and applying rowSums:
rowSums(sapply(list(list1, list2), function(x) do.call(rbind, x) > 40))
#[1] 1 2 2
Or, if each list contains only a single element, the lists can be converted to vectors with unlist:
rowSums(cbind(unlist(list1), unlist(list2)) > 40)
#[1] 1 2 2
We can combine the two lists using rbind/cbind, compare with 40 and use rowSums/colSums to calculate number of values that are greater than 40.
With rbind and colSums
colSums(rbind(list1[[1]], list2[[1]]) > 40)
#[1] 1 2 2
With cbind and rowSums
rowSums(cbind(list1[[1]], list2[[1]]) > 40)
#[1] 1 2 2
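If there are more than two lists, here is a sketch of the rbind/colSums idea that takes any number of them, assuming (as in the question) that each list holds a single numeric vector and that all vectors have the same length. count_above and list3 are hypothetical names for illustration.

```r
list1 <- list(c(32, 43, 42))
list2 <- list(c(42, 46, 42))
list3 <- list(c(39, 41, 50))  # hypothetical third list

# Stack the (single) vector of each list as a row, then count per column
count_above <- function(lists, threshold) {
  m <- do.call(rbind, lapply(lists, `[[`, 1))
  colSums(m > threshold)
}

count_above(list(list1, list2), 40)
#[1] 1 2 2
count_above(list(list1, list2, list3), 40)
#[1] 1 3 3
```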
I am trying to run a summation on each row of a dataframe. Let's say I want to take the sum of 100n^2, from n=1 to n=4.
> df <- data.frame(n = seq(1:4),a = rep(100))
> df
n a
1 1 100
2 2 100
3 3 100
4 4 100
Simpler example:
Let's make fun our example summation function. I can pull 100 out because I can just multiply it in later.
fun <- function(x) {
  i <- seq(1, x, 1)
  sum(i^2)
}
I then want to apply this function to each row of the dataframe, where df$n provides the upper bound of the summation.
The desired outcome would be as follows, in df$b:
> df
n a b
1 1 100 1
2 2 100 5
3 3 100 14
4 4 100 30
To achieve these results I've tried the apply function
apply(df$n,1,fun)
and also with df converted into a matrix
mat <- as.matrix(df)
apply(mat[1,],1,fun)
Both return an error:
Error in seq.default(1, x, 1) : 'to' must be of length 1
I understand this error, in that I understand why seq requires a 'to' value of length 1. I don't know how to go forward.
I have also tried the same while reading the dataframe as a matrix.
Maybe less simple example:
In my case I only need to multiply the results above, df$b, by 100 (or df$a) to get my final answer for each row. In other cases, though, the second variable might enter the summand in a more involved way, for example a^i. How would I call on both variables, a and n?
Underlying question:
My underlying goal is to apply a summation to each row of a dataframe (or a matrix). The above questions stem from my attempt to do so using seq(), as I saw advised in an answer on this site. I will gladly accept an answer that obviates the above questions with a different way to run a summation.
seq() does not take vectors for its from and to arguments, so we can loop over df$n and apply fun to each element:
df$b <- sapply(df$n, fun)
df$b
#[1] 1 5 14 30
Or we can Vectorize
Vectorize(fun)(df$n)
#[1] 1 5 14 30
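For the "less simple" case where the summand uses both columns, mapply() walks over df$n and df$a in parallel and calls the function once per row. A sketch, assuming the summand is a * i^2 (and, for the a^i example, the sum of a^i for i = 1..n). seq_len(n) replaces seq(1, n, 1) so that n = 0 gives an empty sum of 0 instead of an error. For the plain sum of squares, the closed form n(n+1)(2n+1)/6 or a cumulative-sum lookup avoids the per-row call entirely.

```r
df <- data.frame(n = 1:4, a = rep(100, 4))

# Summand depends on both columns: sum over i = 1..n of a * i^2
df$b <- mapply(function(n, a) sum(a * seq_len(n)^2), df$n, df$a)
df$b
#[1] 100 500 1400 3000

# The a^i case from the question: sum over i = 1..n of a^i
mapply(function(n, a) sum(a^seq_len(n)), df$n, df$a)

# Loop-free alternatives for the sum of squares alone
df$n * (df$n + 1) * (2 * df$n + 1) / 6   # closed form: 1 5 14 30
cumsum(seq_len(max(df$n))^2)[df$n]       # lookup of running totals: 1 5 14 30
```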
Motivation: I am currently trying to rethink my coding so as to avoid for-loops where possible. The problem below can easily be solved with conventional for-loops, but I was wondering if R offers a way to use the apply family to make the problem easier.
Problem: I have a matrix, say X (an n x k matrix), and two matrices of start and stop indices, called index.starts and index.stops, respectively. They are of size n x B, and index.stops = index.starts + m for some integer m. Each pair index.starts[i,j] and index.stops[i,j] is needed to subset X as X[(index.starts[i,j]:index.stops[i,j]), ], i.e., to select all the rows of X in that index range.
Can I solve this problem using one of the apply functions?
Application: (Not necessarily important for understanding my problem.) In case you are interested, this is needed for block bootstrapping in a time-series application. X represents the original sample. index.starts is sampled as replicate(repetitionNumber, sample.int((n-r), ceiling(n/r), replace=TRUE)) and index.stops is obtained as index.stops = index.starts + m. What I want in the end is a collection of rows of X. In particular, I want to resample, repetitionNumber times, m blocks of length r from X.
Example:
#generate data
n<-100 #the size of your sample
B<-5 #the number of columns for index.starts and index.stops
#and equivalently the number of block bootstraps to sample
k<-2 #the number of variables in X
X<-matrix(rnorm(n*k), nrow=n, ncol = k)
#take a random sample of the indices 1:100 to get index.starts
r<-10 #this is the block length
#get a sample of the indices 1:(n-r), and get ceiling(n/r) of these
#(for n=100 and r=10, ceiling(n/r) = n/r = 10). Replicate this B times
index.starts<-replicate(B, sample.int((n-r), ceiling(n/r), replace=TRUE))
index.stops<-index.starts + r
#Now can I use apply-functions to extract the r subsequent rows that are
#paired in index.starts[i,j] and index.stops[i,j] for i = 1,2,...,10 = ceiling(n/r) and
#j=1,2,3,4,5=B ?
It's probably way more complicated than what you want/need, but here is a first approach. Just comment if that helps you in any way and I am happy to help.
My approach uses (multiple) *apply functions. The outer lapply "loops" over the 1:B cases; for each case it first calculates the start and end points, which are combined into take.rows (a list of row-index vectors). Next, the initial matrix is subsetted by take.rows (and the subsets are returned in a list). As a last step, the standard deviation of each column of the subsetted matrices is computed (as a dummy function).
The code (with heavy commenting) looks like this:
# you can use lapply in parallel mode if you want to speed up code...
lapply(1:B, function(i){
starts <- sample.int((n-r), ceiling(n/r), replace=TRUE)
# [1] 64 22 84 26 40 7 66 12 25 15
ends <- starts + r
take.rows <- Map(":", starts, ends)
# [[1]]
# [1] 72 73 74 75 76 77 78 79 80 81 82
# ...
res <- lapply(take.rows, function(subs) X[subs, ])
# res is now a list of 10 with the ten subsets
# [[1]]
# [,1] [,2]
# [1,] 0.2658915 -0.18265235
# [2,] 1.7397478 0.66315385
# ...
# say you want to compute something (sd in this case) you can do the following
# but better you do the computing directly in the former "lapply(take.rows...)"
res2 <- t(sapply(res, function(tmp){
apply(tmp, 2, sd)
})) # simplify into a matrix (one row per block)
# [,1] [,2]
# [1,] 1.2345833 1.0927203
# [2,] 1.1838110 1.0767433
# [3,] 0.9808146 1.0522117
# ...
return(res2)
})
Does that point you in the right direction or give you the answer?
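A sketch that uses the precomputed index.starts and index.stops from the question directly, assuming one bootstrap sample per column j: Map(`:`, ...) builds the row ranges for that column, unlist() concatenates them, and a single subset returns all blocks stacked as one matrix. The set.seed() call is an addition for reproducibility, and boot.samples is a hypothetical name. Note that start:(start + r) contains r + 1 rows; if each block should have exactly r rows, index.stops would be index.starts + r - 1.

```r
set.seed(1)  # added for reproducibility
n <- 100; B <- 5; k <- 2; r <- 10
X <- matrix(rnorm(n * k), nrow = n, ncol = k)
index.starts <- replicate(B, sample.int(n - r, ceiling(n / r), replace = TRUE))
index.stops  <- index.starts + r

# One element per bootstrap replicate (column j of the index matrices)
boot.samples <- lapply(seq_len(ncol(index.starts)), function(j) {
  # all row indices for replicate j, block after block
  rows <- unlist(Map(`:`, index.starts[, j], index.stops[, j]))
  X[rows, , drop = FALSE]
})

# e.g. column standard deviations of each bootstrap sample
sapply(boot.samples, function(s) apply(s, 2, sd))
```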
Let's say I have a data frame consisting of one variable (x):
df <- data.frame(x=c(1,2,3,3,5,6,7,8,9,9,4,4))
I want to know how many numbers are less than 2,3,4,5,6,7.
I know how to do this manually using
# This will tell you how many numbers in df less than 4
xnew <- length(df[ which(df$x < 4), ])
My question is: how can I automate this using a for-loop or other method(s)? I need to store the results in an array as follows:
i length
2 1
3 2
4 4
5 6
6 7
7 8
Thanks
One way would be to loop over the numbers 2:7 with sapply, check which elements of df$x are less than (<) each number, and sum the resulting logical vector; cbind-ing the counts with the numbers gives the matrix output:
res <- cbind(i=2:7, length=sapply(2:7, function(y) sum(df$x <y)))
Or you can vectorize: create a matrix with one column per number in 2:7, each number repeated nrow(df) times, and compare df$x against it with <. Because df$x is recycled down each column, every column holds the comparison for one number, and colSums gives the counts:
counts <- colSums(df$x < matrix(2:7, nrow=nrow(df), ncol=6, byrow=TRUE))
#or
#counts <- colSums(df$x < `dim<-`(rep(2:7, each=nrow(df)), c(nrow(df), 6)))
cbind(i=2:7, length=counts)
num = c(2,3,4,5,6,7)
res = sapply(num, function(u) length(df$x[df$x < u]))
data.frame(number=num,
numberBelow=res)
A vectorized solution:
findInterval(2:7*(1-.Machine$double.eps),sort(df$x))
The .Machine$double.eps factor shrinks each threshold slightly, so that findInterval counts only the numbers strictly less than it, not less than or equal to it.
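A variant of the findInterval() answer that avoids the .Machine$double.eps adjustment, assuming an R version in which findInterval() has the left.open argument (added in R 3.3.0). With left.open = TRUE the intervals are (v[i], v[i+1]], so each result is the number of sorted values strictly less than the threshold. The names thresholds and counts are illustrative.

```r
df <- data.frame(x = c(1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 4, 4))
thresholds <- 2:7

# Number of values in df$x strictly below each threshold
counts <- findInterval(thresholds, sort(df$x), left.open = TRUE)

res <- data.frame(i = thresholds, length = counts)
res
```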
I need some help in determining more than one minimum value in a vector. Let's suppose, I have a vector x:
x<-c(1,10,2, 4, 100, 3)
and would like to determine the indexes of the smallest 3 elements, i.e. 1, 2 and 3. I need the indexes because I will be using them to access the corresponding elements in another vector. Of course, sorting will give me the minimum values, but I want to know the indexes where they occur before sorting.
In order to find the index try this
which(x %in% sort(x)[1:3]) # this gives you an index vector
[1] 1 3 6
This says that the first, third and sixth elements are the three lowest values in your vector. To see which values these are, try:
x[ which(x %in% sort(x)[1:3])] # this gives the vector of values
[1] 1 2 3
or just
x[c(1,3,6)]
[1] 1 2 3
If you have duplicated values, you may want to select the unique values first and then sort them before finding the indices, like this (suggested by @Jeffrey Evans in his answer):
which(x %in% sort(unique(x))[1:3])
I think you mean you want to know the indices of the bottom 3 elements? In that case you want order(x)[1:3]
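A sketch of the order() approach that also uses the indices on a second vector, as the question intends; y is a hypothetical companion vector. head(order(x), 3) returns the positions of the three smallest values, smallest first, and unlike order(x)[1:3] it does not produce NA when x has fewer than three elements. With ties, order() keeps them in their original order and returns exactly three positions, whereas the rank()-based answer in this thread returns every tied position.

```r
x <- c(1, 10, 2, 4, 100, 3)
y <- c("a", "b", "c", "d", "e", "f")  # hypothetical vector aligned with x

idx <- head(order(x), 3)  # positions of the 3 smallest values, smallest first
idx
#[1] 1 3 6
x[idx]
#[1] 1 2 3
y[idx]
#[1] "a" "c" "f"
```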
You can use unique to account for duplicate minimum values.
x<-c(1,10,2,4,100,3,1)
which(x %in% sort(unique(x))[1:3])
Here's another way with rank that includes duplicates.
x <- c(x, 3)
# [1] 1 10 2 4 100 3 3
which(rank(x, ties.method='min') <= 3)
# [1] 1 3 6 7