R: Assign incremental ids based on the groups [duplicate] - r

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 3 years ago.
I have the following sample data frame:
> test = data.frame(UserId = sample(1:5, 10, replace = T)) %>% arrange(UserId)
> test
UserId
1 1
2 1
3 1
4 1
5 1
6 3
7 4
8 4
9 4
10 5
I now want another column called loginCount for that user, which is something like assigning incremental ids within each group, something like below. Using the mutate like below creates id within each group, but how do I get the incremental ids within each group independent of each other ?
> test %>% mutate(loginCount = group_indices_(test, .dots = "UserId"))
UserId loginCount
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 3 2
7 4 3
8 4 3
9 4 3
10 5 4
I want something like shown below:
UserId loginCount
1 1
1 2
1 3
1 4
1 5
3 1
4 1
4 2
4 3
5 1

You could group and use row_number:
test %>%
arrange(UserId) %>%
group_by(UserId) %>%
mutate(loginCount = row_number()) %>%
ungroup()
# A tibble: 10 x 2
# Groups: UserId [4]
UserId loginCount
<int> <int>
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 3 1
7 4 1
8 4 2
9 4 3
10 5 1

One solution using base R tapply()
test$loginCount <- unlist(tapply(rep(1, nrow(test)), test$UserId, cumsum))
> test
UserId loginCount
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 3 1
7 4 1
8 4 2
9 4 3
10 5 1

Related

Remove groups with only one individual in R without using dplyr package [duplicate]

This question already has answers here:
Select groups with more than one distinct value
(3 answers)
Closed 1 year ago.
Consider the following dataset. The data is grouped with either one or two people per group. However, an individual may have several entries.
group<-c(1,1,1,1,2,2,3,3,3,3,4,4)
individualID<-c(1,1,2,2,3,3,5,5,6,6,7,7)
X<-rbinom(12,1,0.5)
df1<-data.frame(group,individualID,X)
> df1
group individualID X
1 1 1 0
2 1 1 1
3 1 2 1
4 1 2 1
5 2 3 1
6 2 3 1
7 3 5 1
8 3 5 1
9 3 6 1
10 3 6 1
11 4 7 0
12 4 7 1
From the above Group 1 and group 3 have 2 individuals whereas group 2 and group 4 have 1 individual each.
> aggregate(data = df1, individualID ~ group, function(x) length(unique(x)))
group individualID
1 1 2
2 2 1
3 3 2
4 4 1
How can I subset the data without use of dplyr package to have only groups that have more than 1 individual. i.e. omit groups with 1 individual.
I should end up with only group 1 and group 3.
Or another option is with tidyverse - after grouping by 'group', filter the rows where the number of distinct (n_distinct) elements in 'individualID' is greater than 1
library(dplyr)
df1 %>%
group_by(group) %>%
filter(n_distinct(individualID) > 1) %>%
ungroup
# A tibble: 8 × 3
group individualID X
<dbl> <dbl> <int>
1 1 1 0
2 1 1 0
3 1 2 1
4 1 2 1
5 3 5 0
6 3 5 0
7 3 6 1
8 3 6 0
Or with subset and ave from base R
subset(df1, ave(individualID, group, FUN = function(x) length(unique(x))) > 1)
group individualID X
1 1 1 0
2 1 1 0
3 1 2 1
4 1 2 1
7 3 5 0
8 3 5 0
9 3 6 1
10 3 6 0
There are more concise ways for sure, but here is the general idea.
# use your code to get the counts by group
df1_counts <- aggregate(data = df1, individualID ~ group, function(x) length(unique(x)))
# create a vector of groups where the count is > 1
keep_groups <- df1_counts$group[df1_counts$individualID > 1]
# filter the rows to only groups you want to keep
df1[df1$group %in% keep_groups,]
# group individualID X
# 1 1 1 0
# 2 1 1 0
# 3 1 2 1
# 4 1 2 0
# 7 3 5 1
# 8 3 5 1
# 9 3 6 0
# 10 3 6 1

Remove groups with only one individual in R [duplicate]

This question already has an answer here:
Select groups with more than one distinct value per group [duplicate]
(1 answer)
Closed 1 year ago.
Consider the following dataset. The data is grouped with either one or two people per group. However, an individual may have several entries.
df1<-data.frame(group,individualID,X)
> df1
group individualID X
1 1 1 0
2 1 1 1
3 1 2 1
4 1 2 1
5 2 3 1
6 2 3 1
7 3 5 1
8 3 5 1
9 3 6 1
10 3 6 1
11 4 7 0
12 4 7 1
From the above Group 1 and group 3 have 2 individuals whereas group 2 and group 4 have 1 individual each.
> aggregate(data = df1, individualID ~ group, function(x) length(unique(x)))
group individualID
1 1 2
2 2 1
3 3 2
4 4 1
How can I subset the data to have only groups that have more than 1 individual. i.e. omit groups with 1 individual.
I should end up with only group 1 and group 3.
You could make a lookup table to identify the groups that have more than one unique individualID (similar to what you did with aggregate), then filter df1 based on that:
library(dplyr)
lookup <- df1 %>%
group_by(group) %>%
summarise(count = n_distinct(individualID)) %>%
filter(count > 1)
df1 %>% filter(group %in% unique(lookup$group))
group individualID X
1 1 1 0
2 1 1 1
3 1 2 1
4 1 2 1
5 3 5 1
6 3 5 1
7 3 6 1
8 3 6 1
Or, as #MrGumble suggests above, you could also merge df1 after creating lookup:
merge(df1, lookup)
group individualID X count
1 1 1 0 2
2 1 1 1 2
3 1 2 1 2
4 1 2 1 2
5 3 6 1 2
6 3 6 1 2
7 3 5 1 2
8 3 5 1 2

Retrieve a value by another column criteria in R

i need some help:
i got this df:
df <- data.frame(month = c(1,1,1,1,1,2,2,2,2,2),
day = c(1,2,3,4,5,1,2,3,4,5),
flow = c(2,5,7,8,5,4,6,7,9,2))
month day flow
1 1 1 2
2 1 2 5
3 1 3 7
4 1 4 8
5 1 5 5
6 2 1 4
7 2 2 6
8 2 3 7
9 2 4 9
10 2 5 2
but i want to know the day of min per month:
month day flow dayminflowofthemonth
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
this repetition is not a problem, i will use pivot fuction
tks people!
We can use which.min to return the index of 'min'imum 'flow' per group and use that to get the corresponding 'day' to create the column with mutate
library(dplyr)
df <- df %>%
group_by(month) %>%
mutate(dayminflowofthemonth = day[which.min(flow)]) %>%
ungroup
-output
df
# A tibble: 10 x 4
# month day flow dayminflowofthemonth
# <dbl> <dbl> <dbl> <dbl>
# 1 1 1 2 1
# 2 1 2 5 1
# 3 1 3 7 1
# 4 1 4 8 1
# 5 1 5 5 1
# 6 2 1 4 5
# 7 2 2 6 5
# 8 2 3 7 5
# 9 2 4 9 5
#10 2 5 2 5
Another option using indexing inside dplyr pipeline:
library(dplyr)
#Code
newdf <- df %>% group_by(month) %>% mutate(Val=day[flow==min(flow)][1])
Output:
# A tibble: 10 x 4
# Groups: month [2]
month day flow Val
<dbl> <dbl> <dbl> <dbl>
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
Here is a base R option using ave
transform(
df,
dayminflowofthemonth = ave(day*(ave(flow,month,FUN = min)==flow),month,FUN = max)
)
which gives
month day flow dayminflowofthemonth
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
One more base R approach:
df$dayminflowofthemonth <- by(
df,
df$month,
function(x) x$day[which.min(x$flow)]
)[df$month]

Nested Subseting

I have the following data frame
Library(dplyr)
ID <- c(1,1,1,2,2,2,2,3,3)
Tag <- c(1,2,6,1,3,4,6,4,3)
Value <- c(5,9,3,3,5,6,4,8,9)
DF <- data.frame(ID,Tag,Value)
ID Tag Value
1 1 1 5
2 1 2 9
3 1 6 3
4 2 1 3
5 2 3 5
6 2 4 6
7 2 6 4
8 3 4 8
9 3 3 9
I would like to perform the following 1) group by rows ID 2) assign the Value corresponding to a specific Tag a new column. In the following example, I am assigning the Value of Tag 6 to a new column by ID
ID Tag Value New_Value
1 1 1 5 3
2 1 2 9 3
3 1 6 3 3
4 2 1 3 4
5 2 3 5 4
6 2 4 6 4
7 2 6 4 4
8 3 4 8 NA
9 3 3 9 NA
To the best of my knowledge, I need to subset the data in each group to get the Value for Tag 6. Here is my code and the error msg
DF %>% group_by(ID) %>% mutate(New_Value = select(filter(.,Tag==6),Value))
Adding missing grouping variables: `ID`
Error: Column `New_Value` is of unsupported class data.frame
Another possible solution is to create a new dataframe with IDs and Values for Tag 6 and join it with DF. However, I believe there is a better generic solution by only using dplyr.
I would appreciate it if you can help me understand how to perform a nested subset in this situation
Thank you
On the assumption that Tag is unique within groups, you could do:
library(dplyr)
DF %>%
group_by(ID) %>%
mutate(New_Value = ifelse(any(Tag == 6), Value[Tag == 6], NA))
# A tibble: 9 x 4
# Groups: ID [3]
ID Tag Value New_Value
<dbl> <dbl> <dbl> <dbl>
1 1 1 5 3
2 1 2 9 3
3 1 6 3 3
4 2 1 3 4
5 2 3 5 4
6 2 4 6 4
7 2 6 4 4
8 3 4 8 NA
9 3 3 9 NA

Generate data frame with parameters [duplicate]

This question already has answers here:
Fill missing dates by group
(3 answers)
Fastest way to add rows for missing time steps?
(4 answers)
Closed 3 years ago.
I have a data frame of ids with number column
df <- read.table(text="
id nr
1 1
2 1
1 2
3 1
1 3
", header=TRUE)
I´d like to create new dataframe from it, where each id will have unique nr from df dataframe. As you may notice, id 3 have only nr 1, but no 2 and 3. So result should be.
result <- read.table(text="
id nr
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
", header=TRUE)
You can use expand.grid as:
library(dplyr)
result <- expand.grid(id = unique(df$id), nr = unique(df$nr)) %>%
arrange(id)
result
id nr
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
We can do:
tidyr::expand(df,id,nr)
# A tibble: 9 x 2
id nr
<int> <int>
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3

Resources