calculating column sum for certain row - r

I am trying to calculate a rolling column sum over 5 rows for each row in R, using the following code:
df <- data.frame(count=1:10)
for (loop in (1:nrow(df)))
{df[loop,"acc_sum"] <- sum(df[max(1,loop-5):loop,"count"])}
But I don't like the explicit loop here. How can I modify it? Thanks.

According to your question, your desired result is:
df
# count acc_sum
# 1 1 1
# 2 2 3
# 3 3 6
# 4 4 10
# 5 5 15
# 6 6 21
# 7 7 27
# 8 8 33
# 9 9 39
# 10 10 45
This can be done like this:
df <- data.frame(count=1:10)
library(zoo)
df$acc_sum <- rev(rollapply(rev(df$count), 6, sum, partial = TRUE, align = "left"))
To obtain this result, we reverse the order of df$count, compute the rolling sums (partial = TRUE and align = "left" are important here), and reverse the result to get the vector we need.
rev(rollapply(rev(df$count), 6, sum, partial = TRUE, align = "left"))
# [1] 1 3 6 10 15 21 27 33 39 45
Note that this sums 6 elements, not 5, because the index range max(1,loop-5):loop in your code spans 6 rows, so this gives the same output as your loop. If you just want to sum 5 rows, replace the 6 with a 5.
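For comparison, the same windowed sum can be sketched in base R, without zoo, by differencing a cumulative sum. This is an alternative to the rollapply call above, not part of the original answer; the names x, k, cs and lagged are introduced here for illustration, and k = 6 is assumed so that the window matches the question's loop.

```r
df <- data.frame(count = 1:10)

x <- df$count
k <- 6                      # window width (assumed: current row plus 5 before it)
cs <- cumsum(x)             # running total up to each row

# Running total as it stood k rows earlier (0 where fewer than k rows exist)
lagged <- c(rep(0, k), cs)[seq_along(cs)]

# Sum of the last k rows = total so far minus total k rows ago
df$acc_sum <- cs - lagged
df$acc_sum
```

Changing k to 5 gives a strict 5-row window, mirroring the "replace the 6 with a 5" advice.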

Related

R Select N evenly spaced out elements in vector, including first and last

I'm looking for a way to extract evenly spaced elements in a vector. I'd like a general way to do this because I am trying to specify the values that I want in a plotly chart. I tried using pretty but that only seems to work with ggplot2.
I'm pretty much looking for an R version of this question that was answered for python.
Here's a sample set. This sample is a vector of 23 elements; 23 is prime, so it cannot be factored.
x <- 1:23
Ideally, there would be a function that takes a number for the spacing (n) and that splits x into a subset of n evenly spaced values that also includes the first and last element. For example:
split_func(x, n = 4)
[1] 1 4 8 12 16 20 23
The output elements are centered between the first and last elements and are spaced by 4, with the exception of the first/second and second-to-last/last pairs.
A couple other examples:
split_func(x, n = 5)
[1] 1 5 10 15 20 23 # either this
[1] 1 4 9 14 19 23 # or this would work
split_func(1:10, n = 3)
[1] 1 3 6 9 10 # either this
[1] 1 2 5 8 10 # or this would work
split_func(1:27, n = 6)
[1] 1 5 11 17 23 27
Is there a function that does this already?
Try this:
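split_func <- function(x, by) {
  r <- diff(range(x))                 # total span of x
  out <- seq(0, r - by - 1, by = by)  # interior offsets, spaced by `by`
  # Shift the interior points so they sit roughly centered between min(x) and
  # max(x); the 0.51 nudges halfway values downward before rounding.
  c(round(min(x) + c(0, out - 0.51 + (max(x) - max(out)) / 2), 0), max(x))
}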
split_func(1:23, 4)
# [1] 1 4 8 12 16 20 23
split_func(1:23, 5)
# [1] 1 4 9 14 19 23
split_func(1:10, 3)
# [1] 1 4 7 10
split_func(1:27, 6)
# [1] 1 5 11 17 23 27
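If a fixed number of points matters more than a fixed spacing, a simpler sketch is to pick indices with seq(). This is a different technique from split_func above: it takes how many elements to keep rather than the spacing between them. The names pick_even and n_out are hypothetical and introduced here only for illustration.

```r
# Pick n_out roughly evenly spaced elements of x, always keeping first and last
# (assumes n_out >= 2 and n_out <= length(x)).
pick_even <- function(x, n_out) {
  idx <- round(seq(1, length(x), length.out = n_out))  # evenly spaced positions
  x[idx]
}

res <- pick_even(1:23, 7)
res
```

Because the positions are rounded, the gaps between neighbours can differ by one, which is the same compromise the question already accepts.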

selecting row in a dataframe according to defined values

I have an annual record of temperature. I need to select specific rows (days) together with the five rows before each of them, and then take the mean of each selected group. Here is my data frame and the code that I applied, but it didn't work.
Day T.m
1 22
2 21
3 34
4 28
5 14
6 7
7 12
8 22
9 11
10 12
11 14
12 3
13 4
14 11
15 16
a <- c(8, 12,14)
apply(DF [c((a-5):a),2], 1, mean)
We can use mapply
mapply(function(x, y) mean(DF[[2]][x:y]), a-5, a)
#[1] 19.500000 12.333333 9.166667
Or a vectorized approach would be
tapply(DF[[2]][rep(a-5 , each = 6) + 0:5], rep(1:3, each = 6), FUN = mean)
# 1 2 3
#19.500000 12.333333 9.166667
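For a self-contained check, the sketch below rebuilds the data frame from the question and computes the same window means with vapply. It also hints at why the original attempt fails: the : operator only uses the first element of each operand, so (a-5):a yields just 3:8. The name window_means is introduced here for illustration.

```r
DF <- data.frame(
  Day = 1:15,
  T.m = c(22, 21, 34, 28, 14, 7, 12, 22, 11, 12, 14, 3, 4, 11, 16)
)
a <- c(8, 12, 14)

# For each selected day d, average T.m over d and the five days before it
window_means <- vapply(a, function(d) mean(DF$T.m[(d - 5):d]), numeric(1))
window_means
```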

Using rollsum function for future instances (forward rollsum)

I want to apply a forward rollsum, i.e., instead of giving me the sum (or median) of past instances, I want to calculate the sum of future instances.
I know the function rollsum (and rollmedian, rollapply), but they just work for past instances. At least, I haven't been able to find information on how to do it.
Example:
price = c(5,5,8,2,6,2,6,6,6,0,7,0,3,8,9,9)
past = rollsum(price, 4, align='right',fill=NA)
future = c(21,18,16,20,20,18,19,13,10,18,20,29,rep(NA,4))
price past future
5 NA 21
5 NA 18
8 NA 16
2 20 20
6 21 20
2 18 18
6 16 19
6 20 13
6 20 10
0 18 18
7 19 20
0 13 29
3 10 NA
8 18 NA
9 20 NA
9 29 NA
The align argument controls this. For example, by specifying align = "left" we get this:
library(zoo)
rollsum(1:6, 3, align = "left", fill = NA)
## [1] 6 9 12 15 NA NA
The 6 in the output is 1+2+3, the 9 in the output is 2+3+4, etc. The last two elements are NA since there are not 3 future elements.
Even more flexibility is available if you use rollapply. For example, this is the same as above:
rollapply(1:6, 3, sum, align = "left", fill = NA)
## [1] 6 9 12 15 NA NA
whereas the following sums the 3 components AFTER but not including the current component (the elements of the list are the offsets from the current position to use where 0 means current position, 1 is the next position, etc. -- negative numbers can be used for the prior positions).
rollapply(1:6, list(1:3), sum, fill = NA)
## [1] 9 12 15 NA NA NA
Thus 9 is 2+3+4 since 2, 3, 4 are the 3 components that come after the first component, 1.
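Applied to the price vector from the question, the "next 4 values, excluding the current one" sum can be checked with a base R sketch built on embed(). This is only an illustration that avoids the zoo dependency; with zoo loaded, rollapply(price, list(1:4), sum, fill = NA) should give the same values. The names k and m are introduced here.

```r
price <- c(5, 5, 8, 2, 6, 2, 6, 6, 6, 0, 7, 0, 3, 8, 9, 9)
k <- 4  # number of future values to sum

# Row i of embed(price, k + 1) is price[i + k], ..., price[i + 1], price[i]
m <- embed(price, k + 1)

# Drop the last column (the current value) and pad the tail with NA
future <- c(rowSums(m[, 1:k, drop = FALSE]), rep(NA, k))
future
```

The result matches the future column in the question's table.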
Assuming you're ordering your data by date, couldn't you do something like:
df %>%
group_by(someFactorColumn) %>% # optional grouping variable
arrange(desc(dateItHappened)) %>% # newest first, so cumsum runs over future rows
mutate(forwardsum = cumsum(valYouCareAbout)) %>% # cumulative, not a fixed window
arrange(dateItHappened)
We could also use roll_sum from library(RcppRoll)
library(RcppRoll)
roll_sum(df1$price,4, align='left', fill=NA)

How to get a vector which identify to which intervals the elements belong in R

I need to sort my vector values into custom intervals and subsequently identify which element belongs to which interval.
For example if a vector is:
x <- c(1,4,12,13,18,24)
and the intervals are:
interval.vector <- c(1,7,13,19,25)
1st interval: 1 - 7
2nd interval: 7 - 13
3rd interval: 13 - 19
4th interval: 19 - 25
...how do I combine x and interval.vector to get this:
element: 1 4 12 13 18 24
interval: 1 1 2 2 3 4
You can also use cut.
x <- c(1,4,12,13,18,24)
interval.vector <- c(1,7,13,19,25)
x.cut <- cut(x, breaks = interval.vector, include.lowest = TRUE)
data.frame(x, x.cut, group = as.numeric(x.cut))
x x.cut group
1 1 [1,7] 1
2 4 [1,7] 1
3 12 (7,13] 2
4 13 (7,13] 2
5 18 (13,19] 3
6 24 (19,25] 4
Another option is the very efficient findInterval function, but I'm not sure how robust this solution is across different variations of x (shifting the breaks by 1 assumes integer values):
findInterval(x, interval.vector + 1L, all.inside = TRUE)
## [1] 1 1 2 2 3 4
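Two variants that avoid shifting the breaks by 1 are sketched below, so they should also behave for non-integer values. The first asks cut for integer codes directly; the second uses findInterval with left.open = TRUE, where, per the findInterval documentation, rightmost.closed then means the leftmost interval is closed. The names grp_cut and grp_fi are introduced here for illustration.

```r
x <- c(1, 4, 12, 13, 18, 24)
interval.vector <- c(1, 7, 13, 19, 25)

# Integer interval codes straight from cut(): intervals are (lo, hi],
# with the lowest break included
grp_cut <- cut(x, breaks = interval.vector, include.lowest = TRUE, labels = FALSE)

# Same (lo, hi] convention with findInterval; with left.open = TRUE,
# rightmost.closed = TRUE closes the leftmost interval
grp_fi <- findInterval(x, interval.vector, left.open = TRUE, rightmost.closed = TRUE)

data.frame(x, grp_cut, grp_fi)
```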

How to add a date to each row for a column in a data frame?

df <- data.frame(DAY = character(), ID = character())
I'm running a loop (for i in DAYS[i]), getting the IDs for each day and storing them in a data frame:
df <- rbind(df, data.frame(ID = IDs))
I want to add the DAY[i] in a second column across each row in a loop.
How do I do that?
As @Pascal says, this isn't the best way to create a data frame in R. R is a vectorised language, so generally you don't need for loops.
I'm assuming each ID is unique, so you can create a vector of IDs from 1 to 10:
ID <- 1:10
Then, you need a vector for your DAYs which can be the same length as your IDs, or can be recycled (i.e. if you only have a certain number of days that are repeated in the same order you can have a smaller vector that's reused). Use c() to create a vector with more than one value:
DAY <- c(1, 2, 9, 4, 4)
df <- data.frame(ID, DAY)
df
# ID DAY
# 1 1 1
# 2 2 2
# 3 3 9
# 4 4 4
# 5 5 4
# 6 6 1
# 7 7 2
# 8 8 9
# 9 9 4
# 10 10 4
Or with a DAY vector that has one value per ID (sampled at random here, so repeats are possible):
DAY <- sample(1:100, 10, replace = TRUE)
df <- data.frame(ID, DAY)
df
# ID DAY
# 1 1 61
# 2 2 30
# 3 3 32
# 4 4 97
# 5 5 32
# 6 6 74
# 7 7 97
# 8 8 73
# 9 9 16
# 10 10 98
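If each day yields a different number of IDs, as in the question, one loop-free sketch is to repeat each day once per ID and build the data frame in a single call. The days and ids_by_day objects below are hypothetical stand-ins for whatever the loop in the question produces.

```r
# Hypothetical inputs: one entry in ids_by_day per element of days
days <- c("2021-01-01", "2021-01-02", "2021-01-03")
ids_by_day <- list(c("a", "b"), c("c"), c("d", "e", "f"))

df <- data.frame(
  DAY = rep(days, times = lengths(ids_by_day)),  # repeat each day once per ID
  ID  = unlist(ids_by_day)                       # flatten the IDs in the same order
)
df
```

This avoids growing the data frame with rbind inside a loop, which is the pattern the answer above advises against.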
