I have the same question as this post, but I want to use dplyr.
Given an R data frame, e.g.:
df <- data.frame(id = rep(1:3, each = 5)
, hour = rep(1:5, 3)
, value = sample(1:15))
How do I add a cumulative sum column computed within each id?
Without dplyr, the accepted solution from the previous post is:
df$csum <- ave(df$value, df$id, FUN=cumsum)
Like this?
library(dplyr)
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = sample(1:15))
mutate(group_by(df, id), csum = cumsum(value))
Or, if you use dplyr's pipe operator:
df %>% group_by(id) %>% mutate(csum = cumsum(value))
Result in both cases:
Source: local data frame [15 x 4]
Groups: id
id hour value csum
1 1 1 4 4
2 1 2 14 18
3 1 3 8 26
4 1 4 2 28
5 1 5 3 31
6 2 1 10 10
7 2 2 7 17
8 2 3 5 22
9 2 4 12 34
10 2 5 9 43
11 3 1 6 6
12 3 2 15 21
13 3 3 1 22
14 3 4 13 35
15 3 5 11 46
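One thing worth keeping in mind is that cumsum() follows the current row order within each group. Below is a minimal sketch, assuming the running total should go by hour. The set.seed() call and the arrange() step are additions for illustration and are not part of the answer above.

```r
library(dplyr)

set.seed(42)  # assumption: a fixed seed, only so the example is repeatable
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = sample(1:15))

res <- df %>%
  arrange(id, hour) %>%             # make the within-group order explicit
  group_by(id) %>%
  mutate(csum = cumsum(value)) %>%
  ungroup()                         # drop the grouping once the column exists

# The dplyr result should agree with the base R ave() solution
all.equal(res$csum, ave(df$value, df$id, FUN = cumsum))
```

If the rows were not already sorted by hour, skipping arrange() would still run, but the running total would follow whatever order the rows happen to be in.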
I have a data frame in which some rows have missing values. Here is a sample data frame:
df <- data.frame(id = c(1,1,1, 2,2,2, 3,3,3),
item = c(11,12,13, 24,25,26, 56,45,56),
score = c(5,5, NA, 6,6,6, 7,NA, 7))
> df
id item score
1 1 11 5
2 1 12 5
3 1 13 NA
4 2 24 6
5 2 25 6
6 2 26 6
7 3 56 7
8 3 45 NA
9 3 56 7
Grouping the dataset by the id column, I would like to fill those NA values with the score shared by the rest of the group.
The desired output is:
> df
id item score
1 1 11 5
2 1 12 5
3 1 13 5
4 2 24 6
5 2 25 6
6 2 26 6
7 3 56 7
8 3 45 7
9 3 56 7
Any ideas?
Thanks!
We can group by 'id' and use fill() from tidyr:
library(dplyr)
library(tidyr)
df %>%
  group_by(id) %>%
  fill(score, .direction = "downup") %>%
  ungroup()
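To make the effect of .direction = "downup" concrete, here is a small sketch on the sample data from the question. The name filled is only illustrative. With "downup", values are carried down first and then up, so an NA in the first row of a group would also be filled.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 item = c(11, 12, 13, 24, 25, 26, 56, 45, 56),
                 score = c(5, 5, NA, 6, 6, 6, 7, NA, 7))

filled <- df %>%
  group_by(id) %>%
  fill(score, .direction = "downup") %>%  # fill down, then up for leading NAs
  ungroup()

filled$score
# 5 5 5 6 6 6 7 7 7
```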
Here is another option with base R
> transform(df, score = ave(score, id, FUN = function(x) mean(x, na.rm = TRUE)))
id item score
1 1 11 5
2 1 12 5
3 1 13 5
4 2 24 6
5 2 25 6
6 2 26 6
7 3 56 7
8 3 45 7
9 3 56 7
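Note that mean() only reproduces the group's score when all non-missing scores within an id are equal, as they are here. If that might not hold, one possible variant is to take the first non-missing value instead. This is a sketch, and first_known is a hypothetical helper name, not a base R function.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 item = c(11, 12, 13, 24, 25, 26, 56, 45, 56),
                 score = c(5, 5, NA, 6, 6, 6, 7, NA, 7))

# Hypothetical helper: first non-missing value of a vector (NA if there is none)
first_known <- function(x) x[!is.na(x)][1]

# ave() recycles the single value returned per group across that group's rows
df$score <- ave(df$score, df$id, FUN = first_known)
df$score
# 5 5 5 6 6 6 7 7 7
```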
Another option is to write your own function, e.g.:
# Fill each id's scores with the group's maximum non-missing score
fill.in <- function(dataf) {
  dataf2 <- data.frame()
  for (i in unique(dataf$id)) {
    dataf1 <- subset(dataf, id == i)
    dataf1$score <- max(dataf1$score, na.rm = TRUE)
    dataf2 <- rbind(dataf2, dataf1)
  }
  return(dataf2)
}
fill.in(df)
I have the following data.
id type
1 15
1 16
2 10
3 10
3 11
3 13
3 14
4 9
5 8
5 20
5 21
5 22
Using the above data, I want to calculate the "interval" (the difference between consecutive "type" values) within each "id".
id type interval
1 15 -
1 16 1
2 10 -
3 10 -
3 11 1
3 13 2
3 14 1
4 9 -
5 8 -
5 20 12
5 21 1
5 22 1
We group by 'id' and take the diff to create the 'interval' column
library(data.table)
setDT(df1)[, interval := c(0, diff(type)), by = id]
Or with dplyr
library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(interval = c(0, diff(type)))
Or with ave from base R
df1$interval <- with(df1, ave(type, id, FUN = function(x) c(0, diff(x))))
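The solutions above put 0 in the first row of each id, whereas the desired output shows "-" there. If a missing value is preferred for those rows, a small variation is to pad with NA instead of 0. This is a sketch on the question's data, rebuilt here as df1 so it runs on its own.

```r
df1 <- data.frame(id   = c(1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 5, 5),
                  type = c(15, 16, 10, 10, 11, 13, 14, 9, 8, 20, 21, 22))

# NA marks the first row of each id, where no previous value exists
df1$interval <- ave(df1$type, df1$id, FUN = function(x) c(NA, diff(x)))
df1
```

Keeping NA rather than a "-" string leaves the column numeric, so it can still be used in later arithmetic.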
I have a data frame and I want to create new variables holding the sum within each ID and Group. A normal grouped sum reduces the number of rows; in my case I need to keep every row and repeat the group sum on each one.
ID <- c(rep(1,3), rep(3, 5), rep(4,4))
Group <-c(1,1,2,1,1,1,2,2,1,1,1,2)
x <- c(1:12)
y<- c(12:23)
df <- data.frame(ID,Group,x,y)
ID Group x y
1 1 1 1 12
2 1 1 2 13
3 1 2 3 14
4 3 1 4 15
5 3 1 5 16
6 3 1 6 17
7 3 2 7 18
8 3 2 8 19
9 4 1 9 20
10 4 1 10 21
11 4 1 11 22
12 4 2 12 23
The desired output has two more variables, "sumx" and "sumy", computed by (ID, Group):
ID Group x y sumx sumy
1 1 1 1 12 3 25
2 1 1 2 13 3 25
3 1 2 3 14 3 14
4 3 1 4 15 15 48
5 3 1 5 16 15 48
6 3 1 6 17 15 48
7 3 2 7 18 15 37
8 3 2 8 19 15 37
9 4 1 9 20 30 63
10 4 1 10 21 30 63
11 4 1 11 22 30 63
12 4 2 12 23 12 23
Any ideas?
As short as:
df$sumx <- with(df,ave(x,ID,Group,FUN = sum))
df$sumy <- with(df,ave(y,ID,Group,FUN = sum))
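To illustrate why ave() keeps every row while an ordinary grouped sum does not, here is a short comparison with aggregate() on the question's data. The name collapsed is only illustrative.

```r
ID <- c(rep(1, 3), rep(3, 5), rep(4, 4))
Group <- c(1, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2)
df <- data.frame(ID, Group, x = 1:12, y = 12:23)

# aggregate() collapses the data to one row per (ID, Group) pair
collapsed <- aggregate(cbind(x, y) ~ ID + Group, data = df, FUN = sum)
nrow(collapsed)   # 6 rows, one per group

# ave() returns a vector as long as its input, so every row is kept
df$sumx <- ave(df$x, df$ID, df$Group, FUN = sum)
df$sumy <- ave(df$y, df$ID, df$Group, FUN = sum)
nrow(df)          # still 12 rows
```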
We can use dplyr
library(dplyr)
# mutate_each() and funs() are deprecated; across() (dplyr >= 1.0.0) replaces them
df %>%
  group_by(ID, Group) %>%
  mutate(across(c(x, y), sum, .names = "sum{.col}")) %>%
  ungroup()
If there are only two columns to sum, then
df %>%
group_by(ID, Group) %>%
mutate(sumx = sum(x), sumy = sum(y))
The code below does this for a single column; if you have more than one column, add further expressions to mutate() accordingly:
library(dplyr)
data13 <- data12 %>%
group_by(Category) %>%
mutate(cum_Cat_GMR = cumsum(GrossMarginRs))
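As a runnable illustration of the snippet above, here is a sketch with made-up data. Only the names data12, Category, GrossMarginRs and cum_Cat_GMR come from the answer; the values and the Units column are hypothetical.

```r
library(dplyr)

# Hypothetical data; only the column names come from the answer above
data12 <- data.frame(Category = c("A", "A", "B", "B", "B"),
                     GrossMarginRs = c(100, 50, 20, 30, 10),
                     Units = c(1, 2, 3, 4, 5))

data13 <- data12 %>%
  group_by(Category) %>%
  mutate(cum_Cat_GMR = cumsum(GrossMarginRs),
         cum_Cat_Units = cumsum(Units)) %>%   # a second column is added the same way
  ungroup()

data13
```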
With the data frame:
df <- data.frame(id = rep(1:3, each = 5)
, hour = rep(1:5, 3)
, value = sample(1:15))
I want to add a cumulative sum column that matches the id:
df
id hour value csum
1 1 1 7 7
2 1 2 9 16
3 1 3 15 31
4 1 4 11 42
5 1 5 14 56
6 2 1 10 10
7 2 2 2 12
8 2 3 5 17
9 2 4 6 23
10 2 5 4 27
11 3 1 1 1
12 3 2 13 14
13 3 3 8 22
14 3 4 3 25
15 3 5 12 37
How can I do this efficiently? Thanks!
df$csum <- ave(df$value, df$id, FUN=cumsum)
ave is the "go-to" function if you want a by-group vector of equal length to an existing vector and it can be computed from those sub vectors alone. If you need by-group processing based on multiple "parallel" values, the base strategy is do.call(rbind, by(dfrm, grp, FUN)).
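The do.call(rbind, by(...)) strategy mentioned above might look like the following sketch. The function add_running_mean and the column mean_per_hour are hypothetical, chosen only because they need two columns at once. Fixed values replace sample() so the example is repeatable.

```r
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = c(4, 14, 8, 2, 3, 10, 7, 5, 12, 9, 6, 15, 1, 13, 11))

# Hypothetical per-group function that works on a whole sub data frame
add_running_mean <- function(d) {
  d$csum <- cumsum(d$value)
  d$mean_per_hour <- d$csum / d$hour   # uses both value and hour
  d
}

# by() splits df by id and applies the function; rbind() stacks the pieces
res <- do.call(rbind, by(df, df$id, add_running_mean))
rownames(res) <- NULL   # drop the group-prefixed row names created by rbind()
res
```

For a single column such as csum, ave() stays the simpler choice; this pattern is mainly useful when the per-group computation cannot be expressed on one vector.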
To add to the alternatives, data.table's syntax is nice:
library(data.table)
DT <- data.table(df, key = "id")
DT[, csum := cumsum(value), by = key(DT)]
Or, more compactly:
library(data.table)
setDT(df)[, csum := cumsum(value), id][]
The above will:
1. Convert the data.frame to a data.table by reference
2. Calculate the cumulative sum of value grouped by id and assign it by reference
3. Print (the last [] there) the result of the entire operation
"df" will now be a data.table with a "csum" column.
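The same steps can be spelled out one at a time, which makes the by-reference behaviour easier to see. This is a sketch that uses fixed values in place of sample() so it is repeatable.

```r
library(data.table)

df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = c(4, 14, 8, 2, 3, 10, 7, 5, 12, 9, 6, 15, 1, 13, 11))

setDT(df)                              # converts df in place, no copy is made
is.data.table(df)                      # TRUE after setDT()

df[, csum := cumsum(value), by = id]   # adds csum by reference, prints nothing
df[]                                   # the trailing [] forces printing
```

Because both steps modify df itself, there is no need to assign the result back. If the original data.frame must stay untouched, working on copy(df) is one option.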
Using dplyr:
library(dplyr)
df %>% group_by(id) %>% mutate(csum = cumsum(value))
Using plyr:
library(plyr)
ddply(df,.(id),transform,csum=cumsum(value))
Using base R
df <- data.frame(id = rep(1:3, each = 5)
, hour = rep(1:5, 3)
, value = sample(1:15))
transform(df , csum = ave(value , id , FUN = cumsum))
#> id hour value csum
#> 1 1 1 4 4
#> 2 1 2 12 16
#> 3 1 3 13 29
#> 4 1 4 6 35
#> 5 1 5 5 40
#> 6 2 1 15 15
#> 7 2 2 1 16
#> 8 2 3 2 18
#> 9 2 4 8 26
#> 10 2 5 9 35
#> 11 3 1 11 11
#> 12 3 2 7 18
#> 13 3 3 10 28
#> 14 3 4 3 31
#> 15 3 5 14 45
Created on 2022-06-05 by the reprex package (v2.0.1)