Avoid for loops in R/ vectorize

Suppose I have a vector
a <- c(1, 3, 4, 5, 6, 1, 2, 1, 1, 1)
I want to make a new vector that stores the running (cumulative) sum of a, starting from the 3rd element, like the vector below:
b <- c(8, 13, 19, 20, 22, 23, 24, 25)
How can I do this without a for loop?

We can use cumsum on the vector and index to remove the first two elements
b1 <- cumsum(a)[-(1:2)]
b1
#[1] 8 13 19 20 22 23 24 25
Or another option is Reduce
b1 <- Reduce(`+`, a, accumulate = TRUE)[-(1:2)]

Another base R option with Reduce (@akrun's cumsum answer is the most concise one, I believe)
> tail(Reduce(`+`,a,accumulate = TRUE),-2)
[1] 8 13 19 20 22 23 24 25
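The cumsum approach above can be wrapped in a small helper so the starting position is a parameter. This is only a sketch: the function name cumsum_from and its start argument are hypothetical, not part of base R. It uses positive indexing because the negative form -(1:k) does not behave as one might hope when k is 0 (1:0 counts down, so the first element would still be dropped).

```r
a <- c(1, 3, 4, 5, 6, 1, 2, 1, 1, 1)

# Hypothetical helper: cumulative sum of x, reported from position `start` on
cumsum_from <- function(x, start = 3) {
  stopifnot(start >= 1, start <= length(x))
  # Positive indexing keeps start = 1 well-defined (returns the full cumsum)
  cumsum(x)[start:length(x)]
}

b <- cumsum_from(a, start = 3)
b
#[1]  8 13 19 20 22 23 24 25
```

Calling cumsum_from(a, 1) returns the same values as cumsum(a).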

Related

Examine if a value is in an interval using R

Having the following vector:
t <- c(2, 6, 8, 20, 22, 30, 40, 45, 60)
I would like to find the values that fall between the following intervals:
g <- list(c(1,20), c(20, 40))
The desired output is:
1, 20 c(2, 6, 8)
20, 40 c(20, 22, 30)
Using the dplyr library, I do the following:
library(dplyr)
for(i in t){
for(h in g){
if(between(i, h[[1]], h[[2]])==TRUE){print(c(i, h[[1]], h[[2]]))}
}}
Is there a better way of doing this in R?

We can loop over the list 'g' and, for each interval, build a logical vector with >= and < from its first and second values, then use it to extract the matching elements of 't'
lapply(g, function(x) t[t >= x[1] & t < x[2]])
-output
[[1]]
[1] 2 6 8
[[2]]
[1] 20 22 30

library(purrr)
library(dplyr)
map(g,~keep(t,between(t,.[1],.[2])))
[[1]]
[1] 2 6 8 20
[[2]]
[1] 20 22 30 40
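The two answers above return different results at the interval boundaries: the lapply version uses >= and <, so each interval is half-open, while dplyr's between() includes both end points. A minimal base R sketch of the two conventions side by side, using the question's data (the names half_open and closed are just illustrative):

```r
t <- c(2, 6, 8, 20, 22, 30, 40, 45, 60)
g <- list(c(1, 20), c(20, 40))

# Half-open [lower, upper): a value can fall into at most one interval
half_open <- lapply(g, function(x) t[t >= x[1] & t < x[2]])

# Closed [lower, upper]: a shared boundary such as 20 is reported twice
closed <- lapply(g, function(x) t[t >= x[1] & t <= x[2]])

half_open[[1]]
#[1] 2 6 8
closed[[1]]
#[1]  2  6  8 20
```

The half-open form matches the desired output in the question, where 20 belongs only to the second interval.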

You may find findInterval() from base R useful:
g <- c(1, 20, 40)
t <- c(2, 6, 8, 20, 22, 30, 40, 45, 60)
findInterval(t, g)
#> [1] 1 1 1 2 2 2 3 3 3
So t[1], t[2] and t[3] are in the first interval, t[4], t[5] and t[6] are in the second, and t[7], t[8] and t[9] are in the third (meaning that these values are at or beyond the right end point of the second interval).
If you had values lower than one they would be labelled by 0:
t2 <- c(-1, 0, 2, 6, 8, 20, 22, 30, 40, 45, 60)
findInterval(t2, g)
#> [1] 0 0 1 1 1 2 2 2 3 3 3
You can save the result of findInterval() as e.g. y and use which(y==1) to find which entries correspond to the first interval.
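To get from the findInterval() labels to the grouped output the question asks for, one option is to combine them with split(). This is a sketch under the assumption that the intervals are contiguous and given as a sorted vector of break points; the names idx, inside and groups are illustrative.

```r
g <- c(1, 20, 40)
t <- c(2, 6, 8, 20, 22, 30, 40, 45, 60)

idx <- findInterval(t, g)

# Keep only values that fall inside one of the length(g) - 1 intervals
inside <- idx >= 1 & idx < length(g)

# Using a factor with fixed levels keeps intervals that happen to be empty
groups <- split(t[inside], factor(idx[inside], levels = seq_len(length(g) - 1)))
names(groups) <- paste(g[-length(g)], g[-1], sep = ", ")

groups
#$`1, 20`
#[1] 2 6 8
#
#$`20, 40`
#[1] 20 22 30
```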

We can try cut + is.na like below
lapply(
g,
function(x) {
t[!is.na(cut(t, x, include.lowest = TRUE))]
}
)
which gives
[[1]]
[1] 2 6 8 20
[[2]]
[1] 20 22 30 40

Is there an easy way of performing arithmetic on elements in a vector in R?

The input can be a vector of numbers or a string of digits, and the output should be each element plus its position in the input.
myFunction(c(4,10))
[1] 5, 12
myFunction(1:10)
[1] 2, 4, 6, 8, 10, 12, 14, 16, 18, 20

You can use:
myfunction <- function(x) x + seq_along(x)
myfunction(c(4, 10))
#[1] 5 12
myfunction(1:10)
#[1] 2 4 6 8 10 12 14 16 18 20

Sum every x elements in a vector

I have a vector v like:
v <- c(1, 2, 46, 6, 3, 5, 67, 2, ..., 9)
I want to add up every third element, so I would have the result of adding 1+6+67...
Thank you!

I would suggest creating an index sequence from 1 to the length of your vector, in steps of the width you want (3 in this case), and then summing the elements at those positions:
#Data
v <- c(1, 2, 46, 6, 3, 5, 67, 2, 9)
#Seq
seqv <- seq(1,length(v),by = 3)
#Sum
sum(v[seqv])
Output:
[1] 74

You could create a sequence of values by three and use that to index the vector v and then sum the result.
v <- 10:19
s <- seq(1,9, by=3)
> v
[1] 10 11 12 13 14 15 16 17 18 19
> s
[1] 1 4 7
> sum(v[s])
[1] 39

Formula to substitute dataframe column names with categories defined in a second dataframe

Let's say I have data in wide format (samples in row and species in columns).
species <- data.frame(
Sample = 1:10,
Lobvar = c(21, 15, 12, 11, 32, 42, 54, 10, 1, 2),
Limtru = c(2, 5, 1, 0, 2, 22, 3, 0, 1, 2),
Pocele = c(3, 52, 11, 30, 22, 22, 23, 10, 21, 32),
Genmes = c(1, 0, 22, 1, 2,32, 2, 0, 1, 2)
)
And I want to automatically change the species names, based on a reference of functional groups that I have for all of the species (so it works even if I have more references than actual species in the dataset), for example:
reference <- data.frame(
Species_name = c("Lobvar", "Ampmis", "Pocele", "Genmes", "Limtru", "Secgio", "Nasval", "Letgos", "Salnes", "Verbes"),
Functional_group = c("Crustose", "Geniculate", "Erect", "CCA", "CCA", "CCA", "Geniculate", "Turf","Turf", "Crustose"),
stringsAsFactors = FALSE
)
EDIT
Thanks to @Dan Y's suggestions, I can now change the species names to their functional group names:
names(species)[2:ncol(species)] <- reference$Functional_group[match(names(species), reference$Species_name)][-1]
However, in my actual data.frame I have more species, and this creates many functional groups with the same name in different columns. I would now like to sum the columns that have the same name. I updated the example to give a result in which there is more than one functional group with the same name.
So I get this:
Sample  Crustose  CCA  Erect  CCA  Crustose
1       21        2    3      1    2
2       15        5    52     0    3
3       12        1    11     22   4
4       11        0    30     1    1
5       32        2    22     2    0
6       42        22   22     32   0
and the final result I am looking for is this:
Sample  Crustose  CCA  Erect
1       23        3    3
2       18        5    52
3       16        22   11
4       12        1    30
5       32        4    22
6       42        54   22
How do you advise on approaching this? Thanks for your help and the amazing suggestions I already received.

Re Q1) We can use match to do the name lookup:
names(species)[2:ncol(species)] <- reference$Functional_group[match(names(species), reference$Species_name)][-1]
Re Q2) Then we can mapply the rowSums function over the unique column names, after some regular expression work to strip any numeric suffix (such as ".1") from the colnames:
namevec <- gsub("\\.[[:digit:]]", "", names(species))
mapply(function(x) rowSums(species[which(namevec == x)]), unique(namevec))
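As an alternative sketch, the lookup and the summing can be done in one pass without renaming the columns first, by splitting the species columns by functional group. It uses the example data from the question; the names groups, summed and result are illustrative, and it assumes every species column has a match in reference (columns whose group is NA would be dropped by the split).

```r
species <- data.frame(
  Sample = 1:10,
  Lobvar = c(21, 15, 12, 11, 32, 42, 54, 10, 1, 2),
  Limtru = c(2, 5, 1, 0, 2, 22, 3, 0, 1, 2),
  Pocele = c(3, 52, 11, 30, 22, 22, 23, 10, 21, 32),
  Genmes = c(1, 0, 22, 1, 2, 32, 2, 0, 1, 2)
)
reference <- data.frame(
  Species_name = c("Lobvar", "Ampmis", "Pocele", "Genmes", "Limtru",
                   "Secgio", "Nasval", "Letgos", "Salnes", "Verbes"),
  Functional_group = c("Crustose", "Geniculate", "Erect", "CCA", "CCA",
                       "CCA", "Geniculate", "Turf", "Turf", "Crustose"),
  stringsAsFactors = FALSE
)

# Functional group of each species column (everything except Sample)
groups <- reference$Functional_group[match(names(species)[-1], reference$Species_name)]

# split.default() treats the data frame as a list of columns, so each piece
# holds the columns of one functional group; rowSums() collapses each piece
summed <- sapply(split.default(species[-1], groups), rowSums)

result <- data.frame(Sample = species$Sample, summed)
names(result)
#[1] "Sample"   "CCA"      "Crustose" "Erect"
```

Here the CCA column is the row-wise sum of Limtru and Genmes, while Crustose and Erect each come from a single species column.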

Consecutive Sum of a Vector

This is a follow-up to a previous question. In that question, it was suggested to use rollapply to calculate the sum of the 1st, 2nd and 3rd entries of a vector; then the 2nd, 3rd and 4th; and so on.
My question is how to calculate the sum of the 1st, 2nd and 3rd; then the 4th, 5th and 6th. That is, rolling without overlapping. Can this be done easily, please?

Same idea. You just need to specify the by argument. Default is 1.
x <-c(1, 5, 4, 5, 7, 8, 9, 2, 1)
zoo::rollapply(x, 3, by = 3, sum)
#[1] 10 20 12
#or another Base R option
sapply(split(x, ceiling(seq_along(x)/3)), sum)
# 1 2 3
#10 20 12

Using tapply in base R:
set.seed(1)
vec <- sample(10, 20, replace = TRUE)
#[1] 3 4 6 10 3 9 10 7 7 1 3 2 7 4 8 5 8 10 4 8
# (values as drawn by R versions before 3.6.0; newer versions sample differently for this seed)
unname(tapply(vec, (seq_along(vec)-1) %/% 3, sum))
# [1] 13 22 24 6 19 23 12
Alternatively,
colSums(matrix(vec[1:(ceiling(length(vec)/3)*3)], nrow = 3), na.rm = TRUE)
#[1] 13 22 24 6 19 23 12
vec[1:(ceiling(length(vec)/3)*3)] fills in the vector with NA if the length is not divisible by 3. Then, you simply ignore NAs in colSums.
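The padding step can be seen on a shorter vector. The following sketch uses a made-up vector of length 8 (the name short_vec is illustrative) so that the last group is incomplete:

```r
short_vec <- c(3, 4, 6, 10, 3, 9, 10, 7)   # length 8, not divisible by 3

# Indexing past the end of a vector yields NA, which pads it to length 9
padded <- short_vec[1:(ceiling(length(short_vec) / 3) * 3)]
padded
#[1]  3  4  6 10  3  9 10  7 NA

# Each column of the matrix is one group of three; NA is ignored in the sum
colSums(matrix(padded, nrow = 3), na.rm = TRUE)
#[1] 13 22 17
```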
Yet another one using cut and aggregate:
x <- ceiling(length(vec)/3)*3
df <- data.frame(vec=vec[1:x], col=cut(1:x, breaks = seq(0,x,3)))
aggregate(vec~col, df, sum, na.rm = TRUE)[[2]]
#[1] 13 22 24 6 19 23 12

We can use roll_sum from RcppRoll which would be very efficient
library(RcppRoll)
roll_sum(x, n=3)[c(TRUE, FALSE, FALSE)]
#[1] 10 20 12
data
x <-c(1, 5, 4, 5, 7, 8, 9, 2, 1)

You can define the window size n, take the cumulative sum at every n-th position, and difference the result:
x <- c(1, 5, 4, 5, 7, 8, 9, 2, 1)
n <- 3
diff(c(0, cumsum(x)[seq_along(x) %% n == 0]))
#[1] 10 20 12
P.S. This uses the input from the answer by @Sotos.
