Sum 1:n by group - r

I have a dataset in which I need to compute a running sum over rows 1:n within each group.
demo<-data.frame(th=c(c(0,24,26),(c(0,1,2,4))),hs=c(rep(220,3),c(rep(240,4))),
seq=(c(1:3,1:4)),group=c(rep(1,3),rep(2,4)))
Here's what that looks like:
> demo
th hs seq group
1 0 220 1 1
2 24 220 2 1
3 26 220 3 1
4 0 240 1 2
5 1 240 2 2
6 2 240 3 2
7 4 240 4 2
I need a new column based on the hs, seq, and th columns: for each row, the sum of hs raised to the power of seq, times th, over all rows up to and including that row within the group.
demo[1,"an"]<- demo[1,"hs"]^demo[1,"seq"] * demo[1,"th"]
demo[2,"an"]<-sum(demo[1,"hs"]^demo[1,"seq"] * demo[1,"th"],
demo[2,"hs"]^demo[2,"seq"] * demo[2,"th"] )
demo[3,"an"]<-sum(demo[1,"hs"]^demo[1,"seq"] * demo[1,"th"],
demo[2,"hs"]^demo[2,"seq"] * demo[2,"th"],
demo[3,"hs"]^demo[3,"seq"] * demo[3,"th"])
demo[6,"an"]<-sum(demo[4,"hs"]^demo[4,"seq"] * demo[4,"th"],
demo[5,"hs"]^demo[5,"seq"] * demo[5,"th"],
demo[6,"hs"]^demo[6,"seq"] * demo[6,"th"])
Here's what that new column (an) should look like
> demo
th hs seq group an
1 0 220 1 1 0
2 24 220 2 1 1161600
3 26 220 3 1 278009600
4 0 240 1 2 NA
5 1 240 2 2 NA
6 2 240 3 2 27705600
7 4 240 4 2 NA
Ignore the NA's in this MRE, those need to be filled in too.

Libraries
library(tidyverse)
Sample data
df <-
read.csv(
text =
"th hs seq group
0 220 1 1
24 220 2 1
26 220 3 1
0 240 1 2
1 240 2 2
2 240 3 2
4 240 4 2",
sep = " ",header = T
)
Code
df %>%
#Grouping by group
group_by(group) %>%
#Applying a cumulative sum of the formula, by group
mutate(an = cumsum(hs^seq*th))
Output
th hs seq group an
<int> <int> <int> <int> <dbl>
1 0 220 1 1 0
2 24 220 2 1 1161600
3 26 220 3 1 278009600
4 0 240 1 2 0
5 1 240 2 2 57600
6 2 240 3 2 27705600
7 4 240 4 2 13298745600

We can use data.table
library(data.table)
setDT(df)[, an := cumsum(hs^seq * th), by = group]
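The same per-group running sum can also be sketched without any packages. This is a minimal base R version, assuming the data frame is laid out as in the question; `term` is a helper name introduced here for illustration only.

```r
# Rebuild the question's example data
demo <- data.frame(
  th    = c(0, 24, 26, 0, 1, 2, 4),
  hs    = c(rep(220, 3), rep(240, 4)),
  seq   = c(1:3, 1:4),
  group = c(rep(1, 3), rep(2, 4))
)

# Per-row term hs^seq * th ('term' is a hypothetical helper name)
term <- demo$hs^demo$seq * demo$th

# ave() applies cumsum() within each group and returns values in row order
demo$an <- ave(term, demo$group, FUN = cumsum)
demo
```

Because ave() keeps the original row order, the result can be assigned straight back as a column without sorting first.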

Related

cumsum by participant and reset on 0 R [duplicate]

I have a data frame that looks like this below. I need to sum the number of correct trials by participant, and reset the counter when it gets to a 0.
Participant TrialNumber Correct
118 1 1
118 2 1
118 3 1
118 4 1
118 5 1
118 6 1
118 7 1
118 8 0
118 9 1
118 10 1
120 1 1
120 2 1
120 3 1
120 4 1
120 5 0
120 6 1
120 7 0
120 8 1
120 9 1
120 10 1
I've tried using splitstackshape:
df$Count <- getanID(cbind(df$Participant, cumsum(df$Correct)))[,.id]
But it cumulatively sums the correct trials when it gets to a 0 and not by participant:
Participant TrialNumber Correct Count
118 1 1 1
118 2 1 1
118 3 1 1
118 4 1 1
118 5 1 1
118 6 1 1
118 7 1 1
118 8 0 2
118 9 1 1
118 10 1 1
120 1 1 1
120 2 1 1
120 3 1 1
120 4 1 1
120 5 0 2
120 6 1 1
120 7 0 2
120 8 1 1
120 9 1 1
120 10 1 1
I then tried using dplyr:
df %>%
group_by(Participant) %>%
mutate(Count=cumsum(Correct)) %>%
ungroup %>%
as.data.frame(df)
Participant TrialNumber Correct Count
118 1 1 1
118 2 1 2
118 3 1 3
118 4 1 4
118 5 1 5
118 6 1 6
118 7 1 7
118 8 0 7
118 9 1 8
118 10 1 9
120 1 1 1
120 2 1 2
120 3 1 3
120 4 1 4
120 5 0 4
120 6 1 5
120 7 0 5
120 8 1 6
120 9 1 7
120 10 1 8
Which gets me closer, but still doesn't reset the counter when it gets to 0. If anyone has any suggestions to do this it would be greatly appreciated, thank you
Does this work?
library(dplyr)
library(data.table)
df %>%
mutate(grp = rleid(Correct)) %>%
group_by(Participant, grp) %>%
mutate(Count = cumsum(Correct)) %>%
select(- grp)
# A tibble: 10 x 4
# Groups: Participant, grp [6]
grp Participant Correct Count
<int> <chr> <dbl> <dbl>
1 1 A 1 1
2 1 A 1 2
3 1 A 1 3
4 2 A 0 0
5 3 A 1 1
6 3 B 1 1
7 3 B 1 2
8 4 B 0 0
9 5 B 1 1
10 5 B 1 2
Toy data:
df <- data.frame(
Participant = c(rep("A", 5), rep("B", 5)),
Correct = c(1,1,1,0,1,1,1,0,1,1)
)
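For comparison, here is a minimal base R sketch of the same reset-on-0 idea, using the toy data above. It assumes a new run should start at every 0 within a participant; `run` is a helper name introduced here, not something from the answer.

```r
df <- data.frame(
  Participant = c(rep("A", 5), rep("B", 5)),
  Correct     = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
)

# Run id: increases by one at every 0 within a participant
# ('run' is a hypothetical helper name)
run <- ave(as.integer(df$Correct == 0), df$Participant, FUN = cumsum)

# Cumulative sum of Correct within each participant/run pair,
# so the counter restarts at each 0
df$Count <- ave(df$Correct, df$Participant, run, FUN = cumsum)
df
```

Unlike rleid(), the run id here is computed per participant, so a run can never carry over from one participant to the next.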

How can I create a lag difference variable within group relative to baseline?

I would like a variable that is a lagged difference to the within group baseline. I have panel data that I have balanced.
my_data <- data.frame(id = c(1,1,1,2,2,2,3,3,3), group = c(1,2,3,1,2,3,1,2,3), score=as.numeric(c(0,150,170,80,100,110,75,100,0)))
id group score
1 1 1 0
2 1 2 150
3 1 3 170
4 2 1 80
5 2 2 100
6 2 3 110
7 3 1 75
8 3 2 100
9 3 3 0
I would like it to look like this:
id group score lag_diff_baseline
1 1 1 0 NA
2 1 2 150 150
3 1 3 170 170
4 2 1 80 NA
5 2 2 100 20
6 2 3 110 30
7 3 1 75 NA
8 3 2 100 25
9 3 3 0 -75
The data.table version of @Liam's answer
library(data.table)
setDT(my_data)
# id is supplied by `by`, so it is not repeated inside .()
my_data[, .(group, score, lag_diff_baseline = score - first(score)), by = id]
I missed the easy answer:
library(dplyr)
my_data %>%
group_by(id) %>%
mutate(lag_diff_baseline = score - first(score))
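Note that `score - first(score)` gives 0 on the baseline row, whereas the desired output in the question shows NA there. A minimal base R sketch, assuming NA really is wanted on the first row of each id:

```r
my_data <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  group = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  score = c(0, 150, 170, 80, 100, 110, 75, 100, 0)
)

# Difference from the first score within each id
my_data$lag_diff_baseline <- ave(my_data$score, my_data$id,
                                 FUN = function(s) s - s[1])

# Assumption: the baseline row itself should be NA, as in the desired output
my_data$lag_diff_baseline[!duplicated(my_data$id)] <- NA
my_data
```

This relies on the first row of each id being the baseline, which holds for the example because it is ordered by group within id.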

How to rank a column with a condition

I have a data frame :
dt <- read.table(text = "
1 390
1 366
1 276
1 112
2 97
2 198
2 400
2 402
3 110
3 625
4 137
4 49
4 9
4 578 ")
The first column is Index and the second is distance.
I want to add a column that ranks the distance within each Index in descending order (the highest distance is ranked first).
The result will be :
dt <- read.table(text = "
1 390 1
1 66 4
1 276 2
1 112 3
2 97 4
2 198 3
2 300 2
2 402 1
3 110 2
3 625 1
4 137 2
4 49 3
4 9 4
4 578 1")
Another base R approach (it relies on dt already being sorted by V1, as it is here)
dt$Rank <- unlist(tapply(-dt$V2, dt$V1, rank))
A tidyverse solution
dt %>%
group_by(V1) %>%
mutate(Rank=rank(-V2))
transform(dt,s = ave(-V2,V1,FUN = rank))
V1 V2 s
1 1 390 1
2 1 66 4
3 1 276 2
4 1 112 3
5 2 97 4
6 2 198 3
7 2 300 2
8 2 402 1
9 3 110 2
10 3 625 1
11 4 137 2
12 4 49 3
13 4 9 4
14 4 578 1
You could group, arrange, and rownumber. The result is a bit easier on the eyes than a simple rank, I think, and so worth an extra step.
dt %>%
group_by(V1) %>%
arrange(V1,desc(V2)) %>%
mutate(rank = row_number())
# A tibble: 14 x 3
# Groups: V1 [4]
V1 V2 rank
<int> <int> <int>
1 1 390 1
2 1 366 2
3 1 276 3
4 1 112 4
5 2 402 1
6 2 400 2
7 2 198 3
8 2 97 4
9 3 625 1
10 3 110 2
11 4 578 1
12 4 137 2
13 4 49 3
14 4 9 4
An alternative that keeps the original (unsorted) row order is min_rank
dt %>%
group_by(V1) %>%
mutate(rank = min_rank(desc(V2)))
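The approaches above differ mainly in how they treat ties: base rank() averages tied ranks by default, min_rank() gives ties the smallest rank, and row_number() breaks ties by position. A minimal base R sketch that mimics the min_rank behaviour, assuming the input data as first posted in the question (which happens to contain no ties):

```r
dt <- data.frame(
  V1 = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  V2 = c(390, 366, 276, 112, 97, 198, 400, 402, 110, 625, 137, 49, 9, 578)
)

# Rank within V1, largest V2 first; tied values share the smallest rank
dt$rank <- ave(-dt$V2, dt$V1,
               FUN = function(v) rank(v, ties.method = "min"))
dt
```

Row order is left untouched, so the ranks line up with the original rows.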

Add column with numbers based on a second column

Here my data.frame:
df = read.table(text = 'Day ID Event
100 1 1
100 1 1
99 1 1
97 1 1
87 2 1
86 2 1
85 2 1
965 1 2
964 1 2
960 1 2
959 1 2
709 2 2
708 2 2
12 3 2
9 3 2', header = TRUE)
What I would like to do is create a new column that, for each combination of ID and Event, assigns every observation a number based on its Day relative to the other Days in that block, in decreasing order.
My desired output would be:
Day ID Event Count
100 1 1 4
100 1 1 4
99 1 1 3
97 1 1 1
87 2 1 3
86 2 1 2
85 2 1 1
965 1 2 7
964 1 2 6
960 1 2 2
959 1 2 1
709 2 2 2
708 2 2 1
12 3 2 4
9 3 2 1
E.g. If you look at the first 'block' above: Day 97 = 1, Day 98 = 2, Day 99 = 3 and Day 100 = 4. We are missing Day 98 but we still need to include it in the count.
I tried the following but the output is not the one I need:
df$Count <- ave(df$Day, df$Event, df$ID, FUN = seq_along)
Thanks for your help
We can try
library(dplyr)
df %>%
group_by(ID, Event) %>%
mutate(Count = 1+(Day-Day[n()]))
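The answer above uses the last row of each block as the reference day, which works because Day is sorted in decreasing order. A minimal base R sketch that uses min(Day) instead, so it does not depend on row order; `key` is a helper name introduced here for illustration:

```r
df <- data.frame(
  Day   = c(100, 100, 99, 97, 87, 86, 85, 965, 964, 960, 959, 709, 708, 12, 9),
  ID    = c(1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 3, 3),
  Event = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
)

# One grouping key per ID/Event block; drop = TRUE skips combinations
# that never occur (e.g. ID 3 with Event 1)
key <- interaction(df$ID, df$Event, drop = TRUE)

# Days since the earliest Day in the block, counting that day as 1
df$Count <- ave(df$Day, key, FUN = function(d) d - min(d) + 1)
df
```

Missing days are counted automatically because the value is a difference of Day values rather than a row index.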

Create a new column with a sum based on the value of three other columns

I have a data frame and I want to create another column based on the information of three different columns. I am using R.
I want to start counting on 0 and to add 2 in each new cell, based on a column Time and on Item and Participants information. I want to have 0 for the beginning of the Time counting (which is in ms) for each item of each participant.
df <- data.frame(Item=c(1,1,1,1,1,1,2,2,2,2,2,2),
Part=c(1,1,1,2,2,2,1,1,1,2,2,2),
Time=c(1234,1235,1236,345,346,347,1546,1547,1548,234,235,236))
Item Part Time
1 1 1 1234
2 1 1 1235
3 1 1 1236
4 1 2 345
5 1 2 346
6 1 2 347
7 2 1 1546
8 2 1 1547
9 2 1 1548
10 2 2 234
11 2 2 235
12 2 2 236
With the new column the table would be something like:
Item Part Time NewColumn
1 1 1 1234 0
2 1 1 1235 2
3 1 1 1236 4
4 1 2 345 0
5 1 2 346 2
6 1 2 347 4
7 2 1 1546 0
8 2 1 1547 2
9 2 1 1548 4
10 2 2 234 0
11 2 2 235 2
12 2 2 236 4
Many thanks in advance.
In case the structure stays as it is
library(dplyr)
result <- df %>% group_by(Part, Item) %>% mutate(NewColumn = seq (0,4,2))
I group by Item and Part and create a new column that counts 0, 2, 4
Item Part Time NewColumn
1 1 1 1234 0
2 1 1 1235 2
3 1 1 1236 4
4 1 2 345 0
5 1 2 346 2
6 1 2 347 4
7 2 1 1546 0
8 2 1 1547 2
9 2 1 1548 4
10 2 2 234 0
11 2 2 235 2
12 2 2 236 4
In order to be more flexible (if you have more than 3 rows per group), you can use
result <- df %>% group_by(Part, Item) %>% mutate(NewColumn = 2* (row_number()-1))
which will generate numbers in the sequence 0, 2, 4, 6, 8,...
library(data.table)
df <- data.table(df)
# one value per row in each group: 0, 2, 4, ...
df[, NewCol := seq(0, by = 2, length.out = .N), by = list(Item, Part)]
Er... df = cbind(df,NewColumn=c(0,2,4))?
+1 for library(plyr)
library(plyr)
ddply(df, c("Item","Part"), mutate,NewColumn = seq(0,4,2))
Item Part Time NewColumn
1 1 1234 0
1 1 1235 2
1 1 1236 4
1 2 345 0
1 2 346 2
1 2 347 4
2 1 1546 0
2 1 1547 2
2 1 1548 4
2 2 234 0
2 2 235 2
2 2 236 4
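A package-free variant of the flexible version can be sketched with ave(). This assumes the rows are already ordered by Time within each Item/Part block, as in the example, and it works for any number of rows per block.

```r
df <- data.frame(
  Item = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Part = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2),
  Time = c(1234, 1235, 1236, 345, 346, 347, 1546, 1547, 1548, 234, 235, 236)
)

# Within each Item/Part block, number the rows 0, 2, 4, ...
# (the Time values themselves are only used to size each block)
df$NewColumn <- ave(df$Time, df$Item, df$Part,
                    FUN = function(t) 2 * (seq_along(t) - 1))
df
```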

Resources