This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 6 years ago.
I have searched a lot but have not found a solution.
I have the following data frame:
Age no.observations Factor
1 1 4 A
2 1 3 A
3 1 12 A
4 1 5 B
5 1 9 B
6 1 3 B
7 2 12 A
8 2 3 A
9 2 6 A
10 2 7 B
11 2 9 B
12 2 1 B
I would like to create another column containing the sum of no.observations within each combination of Age and Factor, i.e. 19 for the first three rows, 17 for the next three, and so on. I want this column added to the existing data.frame, so dplyr's summarise(), which collapses each group to a single row, does not help.
Use mutate() together with group_by(), which keeps every row instead of summarising:
library(dplyr)
df %>%
  group_by(Age, Factor) %>%
  mutate(no.observations.in.group = sum(no.observations)) %>%
  ungroup()
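If you would rather not depend on dplyr, base R's ave() does the same grouped sum and returns a vector as long as the input. A minimal sketch; the data frame below is rebuilt by hand from the printed table, so its construction is an assumption rather than the asker's code:

```r
# Rebuild the example data from the printed table in the question
df <- data.frame(
  Age = rep(c(1, 2), each = 6),
  no.observations = c(4, 3, 12, 5, 9, 3, 12, 3, 6, 7, 9, 1),
  Factor = rep(rep(c("A", "B"), each = 3), times = 2)
)

# ave() applies FUN within every Age x Factor combination and repeats the
# result for each row of that group, so it can be stored as a new column
df$no.observations.in.group <- ave(df$no.observations, df$Age, df$Factor, FUN = sum)

print(df)
```

The grouping columns are passed through ave()'s `...`, so any number of them can be combined.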
This question already has answers here:
subtract first or second value from each row [duplicate]
(2 answers)
Closed 3 days ago.
I would like to create a new column in my dataset that holds the difference between the value of b (column b in the example dataset) in the current row and in the first row of its group, where groups are defined by column a. How would I go about doing this in R?
a<-c(1,1,1,1,2,2,2,2)
b<-c(2,4,6,8,10,12,14,16)
have<-as.data.frame(cbind(a,b))
> have
a b
1 2
1 4
1 6
1 8
2 10
2 12
2 14
2 16
> want
a b c
1 2 0
1 4 2
1 6 4
1 8 6
2 10 0
2 12 2
2 14 4
2 16 6
You can use dplyr's first() to refer to the first value of b in each group:
library(dplyr)
as.data.frame(cbind(a,b)) %>%
group_by(a) %>%
mutate(c = b - first(b)) %>%
ungroup()
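For comparison, a base R sketch of the same idea using ave(), where an anonymous function subtracts the first element of each group from every element of that group:

```r
# Example data from the question
a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(2, 4, 6, 8, 10, 12, 14, 16)
have <- data.frame(a, b)

# Within each group of a, subtract the group's first b value from every b value
have$c <- ave(have$b, have$a, FUN = function(v) v - v[1])

print(have)
```

As with first(), "first" means the first row in the current order, so sort the data by whatever defines that order before computing c.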
This question already has an answer here:
Include levels of zero count in result of table()
(1 answer)
Closed 2 years ago.
I have this vector:
x <- c(3,4,5,7,7,9,9,9,10,10,10,10,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,13,13,13,13,14,14,14,15,15)
And I want the frequency of each value in the sequence 3:15 within that vector. table(x) only gives me the frequencies of the values that actually occur, so, for example, 6, which should have a frequency of 0, is not shown.
Convert x to a factor with explicit levels before calling table(); levels that never occur are then reported with a count of 0:
table(factor(x, levels = 3:15))
# 3 4 5 6 7 8 9 10 11 12 13 14 15
# 1 1 1 0 2 0 3 4 7 10 14 3 2
Or, more generally, to cover the full range of x:
table(factor(x, levels = min(x):max(x)))
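An alternative in base R, swapping in tabulate() instead of table(): tabulate() counts how often each integer 1..nbins occurs and reports empty bins as 0. A minimal sketch, assuming x holds whole numbers; lo and hi are illustrative names for the range to report:

```r
x <- c(3,4,5,7,7,9,9,9,10,10,10,10,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,13,13,13,13,14,14,14,15,15)

# Range of values to report (chosen to match the question's 3:15)
lo <- 3
hi <- 15

# Shift x so that lo falls into bin 1; bins with no observations get a count of 0
counts <- tabulate(x - lo + 1, nbins = hi - lo + 1)
names(counts) <- lo:hi

print(counts)
```

tabulate() returns a plain named integer vector rather than a table object, and values that fall outside lo..hi land outside 1..nbins and are silently ignored.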
This question already has answers here:
Group variable based on continuous values
(1 answer)
Group a dataframe based on sequence breaks in a column?
(2 answers)
Something like conditional seq_along on grouped data
(1 answer)
How do I create a variable that increments by 1 based on the value of another variable?
(3 answers)
Closed 3 years ago.
I have the following data:
library(dplyr)
d <- tibble(value = c(1,2,1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7))
Each run of increasing numbers forms one group, and every time the numbering resets I need a new group. In other words, I need a group ID that increments at every reset:
d$ID <- c(1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4,4)
To visualize it:
value ID
1 1
2 1
1 2
2 2
3 2
4 2
5 2
1 3
2 3
3 3
4 3
1 4
2 4
3 4
4 4
5 4
6 4
7 4
I have tried dplyr's group_indices(), but that doesn't do the trick because it groups rows that have the same value:
d$ID <- d %>% group_indices(value)
We can use diff() to subtract the previous value from the current one; a negative difference marks a reset, and cumsum() increments the counter at each reset:
cumsum(c(TRUE, diff(d$value) < 0))
#[1] 1 1 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4
In dplyr, we can use lag() to compare each value with the previous one:
library(dplyr)
d %>% mutate(ID = cumsum(value < lag(value, default = first(value))) + 1)
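If every run restarts at exactly 1, as it does in the example (an assumption; the question does not guarantee it), a simpler base R variant starts a new group at each 1. A minimal sketch, checked against the diff() answer:

```r
value <- c(1,2,1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7)

# Assumption: each run starts at 1, so every 1 opens a new group
id_from_ones <- cumsum(value == 1)

# The diff()-based version from the answer, for comparison
id_from_drops <- cumsum(c(TRUE, diff(value) < 0))

print(id_from_ones)
```

Starting groups at each 1 also separates back-to-back runs of length one (1, 1), where diff() is 0 and the `< 0` test does not fire. It gives wrong IDs if a run can start at some other value (and starts counting at 0 if the first value is not 1), in which case the diff() or lag() versions are the safer choice.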
This question already has answers here:
How to create a consecutive group number
(13 answers)
Closed 3 years ago.
I'm trying to use the tidyverse (whatever package is appropriate) to add a column (via mutate()) that is a running count of the distinct values that have occurred in the column so far. Here is some toy data showing the desired output:
data.frame("n"=c(1,1,1,6,7,8,8),"Unique cumsum"=c(1,1,1,2,3,4,4))
How can I accomplish this in the tidyverse?
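Here is an option using group IDs. Older dplyr allowed group_indices(., n) inside mutate(); since dplyr 1.0.0 that form is deprecated, and cur_group_id() on a grouped data frame gives the same IDs:
library(dplyr)
df1 %>%
  group_by(n) %>%
  mutate(unique_cumsum = cur_group_id()) %>%
  ungroup()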
# n unique_cumsum
#1 1 1
#2 1 1
#3 1 1
#4 6 2
#5 7 3
#6 8 4
#7 8 4
Data:
df1 <- data.frame("n"=c(1,1,1,6,7,8,8))
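Here's one way, using the fact that a factor assigns a sequential integer code to each unique value, and then extracting those codes with as.numeric(). Note that the codes follow the sorted order of the levels, so this matches a running count only when n is sorted: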
library(dplyr)
data.frame("n"=c(1,1,1,6,7,8,8)) %>% mutate(unique_cumsum=as.numeric(factor(n)))
n unique_cumsum
1 1 1
2 1 1
3 1 1
4 6 2
5 7 3
6 8 4
7 8 4
Another solution:
df <- data.frame("n"=c(1,1,1,6,7,8,8))
df <- df %>% mutate(`unique cumsum` = cumsum(!duplicated(n)))
Because !duplicated(n) is TRUE only at the first occurrence of each value, this also works when the data are not sorted.
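The approaches above agree on the sorted toy data but not in general. A small base R comparison on a made-up unsorted vector (hypothetical data, not from the question); match(n, unique(n)) is included as a further variant that numbers values by first appearance:

```r
# Hypothetical unsorted input to show where the approaches differ
n <- c(8, 8, 1, 6, 1, 7)

# Running count of distinct values seen so far
running <- cumsum(!duplicated(n))

# Factor codes follow the sorted levels (1, 6, 7, 8), not order of appearance
codes <- as.integer(factor(n))

# Each value gets the ID of its first appearance; a returning value reuses its old ID
first_seen <- match(n, unique(n))

print(rbind(n, running, codes, first_seen))
```

Only cumsum(!duplicated(n)) is a running count of distinct values. Factor codes and grouped IDs number values in sorted order, and match(n, unique(n)) repeats an earlier ID when an old value reappears.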
This question already has answers here:
Create counter with multiple variables [duplicate]
(6 answers)
Closed 6 years ago.
I have a dataset that looks like this:
id time
1 1
1 2
2 5
2 3
3 2
3 7
3 8
And I want to add another column that numbers the observations within each id group (1, 2, 3, ...):
id time label
1 1 1
1 2 2
2 5 1
2 3 2
3 2 1
3 7 2
3 8 3
We can use ave() from base R:
df1$label <- with(df1, ave(seq_along(id), id, FUN=seq_along))
Or with dplyr:
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(label = row_number())
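When rows with the same id are contiguous, as in the example (an assumption the question's data happens to satisfy), another base R option swaps in rle() and sequence(): rle() gives the length of each run of identical ids, and sequence() expands each length k into 1, 2, ..., k. A minimal sketch:

```r
df1 <- data.frame(id = c(1, 1, 2, 2, 3, 3, 3),
                  time = c(1, 2, 5, 3, 2, 7, 8))

# Run lengths of consecutive identical ids (here 2, 2, 3), each expanded into a counter
df1$label <- sequence(rle(df1$id)$lengths)

print(df1)
```

If the same id can appear in separate, non-adjacent blocks, rle() treats each block as a new run; the ave() answer numbers rows per id regardless of position.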