Calculate difference between current row and first row within group [duplicate] - r

This question already has answers here:
subtract first or second value from each row [duplicate]
(2 answers)
Closed 3 days ago.
I would like to create a new column in my dataset that shows the difference between the value in column b of the current row and the value in column b of the first row of its group (groups are defined by column a in the example dataset). How would I go about doing this in R?
a<-c(1,1,1,1,2,2,2,2)
b<-c(2,4,6,8,10,12,14,16)
have<-as.data.frame(cbind(a,b))
> have
a b
1 2
1 4
1 6
1 8
2 10
2 12
2 14
2 16
> want
a b c
1 2 0
1 4 2
1 6 4
1 8 6
2 10 0
2 12 2
2 14 4
2 16 6

You can use dplyr's first() to refer to the first value in each group:
library(dplyr)
as.data.frame(cbind(a,b)) %>%
group_by(a) %>%
mutate(c = b - first(b)) %>%
ungroup()
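If you would rather avoid the dplyr dependency, a base-R sketch can look up each group's first row with match(), which returns the position of the first occurrence of each value. This assumes "first row" means the first row of the group in the data frame's current order:

```r
a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(2, 4, 6, 8, 10, 12, 14, 16)
have <- data.frame(a, b)

# match(a, a) gives, for every row, the index of the first row with the same a,
# so b[match(a, a)] is the first b value of that row's group
have$c <- have$b - have$b[match(have$a, have$a)]
have$c
# [1] 0 2 4 6 0 2 4 6
```

Because match() does not care where the group's rows sit, this also works when the rows of a group are not contiguous.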

Related

Group-ID according to numbering reset [duplicate]

This question already has answers here:
Group variable based on continuous values
(1 answer)
Group a dataframe based on sequence breaks in a column?
(2 answers)
Something like conditional seq_along on grouped data
(1 answer)
How do I create a variable that increments by 1 based on the value of another variable?
(3 answers)
Closed 3 years ago.
I have following data:
d <- tibble::tibble(value = c(1,2,1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7))
The running numbers form one group, and every time the numbering resets I need a new group. What I need is a group ID that increments at every numbering reset; hence:
d$ID <- c(1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4,4)
To visualize it:
value ID
1 1
2 1
1 2
2 2
3 2
4 2
5 2
1 3
2 3
3 3
4 3
1 4
2 4
3 4
4 4
5 4
6 4
7 4
I have tried using group_indices() from dplyr, but that doesn't do the trick, as it groups rows with the same value:
d$ID <- d %>% group_indices(value)
We can use diff() to subtract the previous value from the current one and increment the counter whenever the value drops, i.e. whenever the numbering resets:
cumsum(c(TRUE, diff(d$value) < 0))
#[1] 1 1 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4
In dplyr, we can use lag() to compare each value with the previous one:
library(dplyr)
d %>% mutate(ID = cumsum(value < lag(value, default = first(value))) + 1)
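The diff() approach treats any decrease as a reset. If you instead define a group as a run of values that each increase by exactly 1 (an assumption, not something stated in the question), a base-R variant starts a new group whenever the step is anything other than +1:

```r
d <- data.frame(value = c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7))

# TRUE marks the first row of a new group: row 1, plus every row whose step
# from the previous value is not exactly +1
new_group <- c(TRUE, diff(d$value) != 1)
d$ID <- cumsum(new_group)
# For this data, d$ID matches the IDs produced by the diff() < 0 answer
d$ID
```

The two rules agree here, but they differ on data such as 1, 2, 2, 3: the `!= 1` rule splits at the repeated 2, while the `< 0` rule does not.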

Build rowSums in dplyr based on columns containing pattern in their names [duplicate]

This question already has answers here:
Sum across multiple columns with dplyr
(8 answers)
R, create a new column in a data frame that applies a function of all the columns with similar names
(3 answers)
Closed 4 years ago.
My data frame looks something like this:
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3
A 1 0 1 1
A 2 1 1 2
A 3 3 0 0
With dplyr, I want to build a column that sums the values of the count variables for each row, selecting the count variables based on their names.
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
A 1 0 1 1 2
A 2 1 1 2 4
A 3 3 0 0 3
How do I do that?
As you asked for a dplyr solution, you can do:
library(dplyr)
df %>%
mutate(SUM = rowSums(select(., starts_with("COUNT"))))
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
1 A 1 0 1 1 2
2 A 2 1 1 2 4
3 A 3 3 0 0 3
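For comparison, a base-R sketch selects the COUNT columns by name with startsWith() and sums them row by row; the data frame is retyped from the table in the question:

```r
df <- data.frame(
  USER        = c("A", "A", "A"),
  OBSERVATION = c(1, 2, 3),
  COUNT.1     = c(0, 1, 3),
  COUNT.2     = c(1, 1, 0),
  COUNT.3     = c(1, 2, 0)
)

# Logical index of the columns whose names begin with "COUNT"
count_cols <- startsWith(names(df), "COUNT")
# df[count_cols] (no comma) always returns a data frame, even if only one column matches
df$SUM <- rowSums(df[count_cols])
df
```

In dplyr 1.1.0 and later, pick() can likely replace the select(.) call, e.g. mutate(SUM = rowSums(pick(starts_with("COUNT")))).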

Loop over each group and subtract their value [duplicate]

This question already has answers here:
R: Differences by group and adding
(3 answers)
Closed 6 years ago.
I have the following dataset:
df <- data.frame(id = c(1,1,1,2,2), time = c(13,14,17,17,17))
id time
1 1 13
2 1 14
3 1 17
4 2 17
5 2 17
and, within each id, I wish to subtract the previous time from the current time (using 0 for the first row of each id). So my ideal output would be:
#output
id time diff
1 1 13 0
2 1 14 1
3 1 17 3
4 2 17 0
5 2 17 0
What is the most efficient way to do that?
Thanks to Zheyuan Li.
This is a great solution:
df$diff <- with(df, ave(time, id, FUN = function (x) c(0, diff(x))))
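If the rows of each id are contiguous, as in the example, a vectorised base-R sketch avoids calling a function once per group: take diff() over the whole column, then reset the first row of every id to 0. It relies on that sorted-by-id assumption:

```r
df <- data.frame(id = c(1, 1, 1, 2, 2), time = c(13, 14, 17, 17, 17))

# Difference from the previous row across the whole column (0 for row 1)
df$diff <- c(0, diff(df$time))
# The first row of each id must not be compared with the previous id's last row
df$diff[!duplicated(df$id)] <- 0
df
```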

in R: Sum by group without summarising [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 6 years ago.
I have searched a lot, but not found a solution.
I have the following data frame:
Age no.observations Factor
1 1 4 A
2 1 3 A
3 1 12 A
4 1 5 B
5 1 9 B
6 1 3 B
7 2 12 A
8 2 3 A
9 2 6 A
10 2 7 B
11 2 9 B
12 2 1 B
I would like to create another column with the sum by the categories Age and Factor, thus having 19 for the first three rows, 17 for the next three, etc. I want this to be a column added to this data.frame, so dplyr's summarise() function does not help.
Use mutate() together with group_by() to add the group sum without summarising:
library(dplyr)
df %>%
group_by(Age, Factor) %>%
mutate(no.observations.in.group = sum(no.observations)) %>%
ungroup()
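Without dplyr, base R's ave() does the same job: it applies FUN within each combination of the grouping variables and returns a vector as long as the input, so each group total is repeated on every row of that group. A sketch with the question's data retyped:

```r
df <- data.frame(
  Age             = rep(c(1, 2), each = 6),
  no.observations = c(4, 3, 12, 5, 9, 3, 12, 3, 6, 7, 9, 1),
  Factor          = rep(rep(c("A", "B"), each = 3), times = 2)
)

# One sum per Age x Factor combination, recycled to every row of that group
# Expected totals: Age 1/A = 19, Age 1/B = 17, Age 2/A = 21, Age 2/B = 17
df$no.observations.in.group <- ave(df$no.observations, df$Age, df$Factor, FUN = sum)
df
```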

How to create a new row that would show me the number of observations in a group in an unbalanced panel dataset in R? [duplicate]

This question already has answers here:
Create counter with multiple variables [duplicate]
(6 answers)
Closed 6 years ago.
I have a dataset that looks like this:
id time
1 1
1 2
2 5
2 3
3 2
3 7
3 8
And I want to add another column that numbers the observations within each group (a running count per id):
id time label
1 1 1
1 2 2
2 5 1
2 3 2
3 2 1
3 7 2
3 8 3
We can use ave():
df1$label <- with(df1, ave(seq_along(id), id, FUN=seq_along))
Or with dplyr:
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(label = row_number())
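When the rows of each id are stored next to each other, as in the example, another base-R option takes the run lengths of id with rle() and expands each into a 1..n counter with sequence(). This sketch depends on that contiguity assumption; the ave() answer does not:

```r
df1 <- data.frame(id = c(1, 1, 2, 2, 3, 3, 3), time = c(1, 2, 5, 3, 2, 7, 8))

# rle() gives the length of each run of identical ids: 2, 2, 3
run_lengths <- rle(df1$id)$lengths
# sequence() turns each length n into 1, 2, ..., n and concatenates the results
df1$label <- sequence(run_lengths)
df1$label
# [1] 1 2 1 2 1 2 3
```

If you need the total number of rows per id instead of a running count, ave(df1$id, df1$id, FUN = length) repeats each group's size on every row.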
