I want to create a column (in R) that indexes the presence of a number in another column grouped by a user_id column. And when the other column is NA, the new desired column should not increase.
The example should bring clarity.
I have this df:
data <- data.frame(user_id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
one=c(1,NA,3,2,NA,0,NA,4,3,4,NA))
user_id tobeindexed
1 1 1
2 1 NA
3 1 3
4 2 2
5 2 NA
6 2 0
7 2 NA
8 3 4
9 3 3
10 3 4
11 3 NA
I want to make a new column looking like "desired" in the following df:
> cbind(data,data.frame(desired = c(1,1,2,1,1,2,2,1,2,3,3)))
user_id tobeindexed desired
1 1 1 1
2 1 NA 1
3 1 3 2
4 2 2 1
5 2 NA 1
6 2 0 2
7 2 NA 2
8 3 4 1
9 3 3 2
10 3 4 3
11 3 NA 3
How can I solve this?
Using colsum and group_by gets me close, but the count does not start over from 1 when the user_id changes...
> data %>% group_by(user_id) %>% mutate(desired = cumsum(!is.na(tobeindexed)))
user_id tobeindexed desired
<dbl> <dbl> <int>
1 1 1 1
2 1 NA 1
3 1 3 2
4 2 2 3
5 2 NA 3
6 2 0 4
7 2 NA 4
8 3 4 5
9 3 3 6
10 3 4 7
11 3 NA 7
Given the sample data you provided (with the one) column, this works unchanged. The code is retained below for demonstration.
base R
data$out <- ave(data$one, data$user_id, FUN = function(z) cumsum(!is.na(z)))
data
# user_id one out
# 1 1 1 1
# 2 1 NA 1
# 3 1 3 2
# 4 2 2 1
# 5 2 NA 1
# 6 2 0 2
# 7 2 NA 2
# 8 3 4 1
# 9 3 3 2
# 10 3 4 3
# 11 3 NA 3
dplyr
library(dplyr)
data %>%
group_by(user_id) %>%
mutate(out = cumsum(!is.na(one))) %>%
ungroup()
# # A tibble: 11 × 3
# user_id one out
# <dbl> <dbl> <int>
# 1 1 1 1
# 2 1 NA 1
# 3 1 3 2
# 4 2 2 1
# 5 2 NA 1
# 6 2 0 2
# 7 2 NA 2
# 8 3 4 1
# 9 3 3 2
# 10 3 4 3
# 11 3 NA 3
I need to shift valid values to the top the of dataframe withing each id. Here is an example dataset:
df <- data.frame(id = c(1,1,1,2,2,2,3,3,3,3),
itemid = c(1,2,3,1,2,3,1,2,3,4),
values = c(1,NA,0,NA,NA,0,1,NA,0,NA))
df
id itemid values
1 1 1 1
2 1 2 NA
3 1 3 0
4 2 1 NA
5 2 2 NA
6 2 3 0
7 3 1 1
8 3 2 NA
9 3 3 0
10 3 4 NA
excluding the id column, when there is a missing value in values column, I want to shift all values aligned to the top for each id.
How can I get this desired dataset below?
df1
id itemid values
1 1 1 1
2 1 2 0
3 1 3 NA
4 2 1 0
5 2 2 NA
6 2 3 NA
7 3 1 1
8 3 2 0
9 3 3 NA
10 3 4 NA
Using tidyverse you can arrange by whether values is missing or not (which will put those at the bottom).
library(tidyverse)
df %>%
arrange(id, is.na(values))
Output
id itemid values
<dbl> <dbl> <dbl>
1 1 1 1
2 1 3 0
3 1 2 NA
4 2 3 0
5 2 1 NA
6 2 2 NA
7 3 1 1
8 3 3 0
9 3 2 NA
10 3 4 NA
Or, if you wish to retain the same order for itemid and other columns, you can use mutate to specifically order columns of interest (like values). Other answers provide good solutions, such as #Santiago and #ThomasIsCoding. If you have multiple columns of interest to move NA to the bottom per group, you can also try:
df %>%
group_by(id) %>%
mutate(across(.cols = values, ~values[order(is.na(.))]))
where the .cols argument would contain the columns to transform and reorder independently.
Output
id itemid values
<dbl> <dbl> <dbl>
1 1 1 1
2 1 2 0
3 1 3 NA
4 2 1 0
5 2 2 NA
6 2 3 NA
7 3 1 1
8 3 2 0
9 3 3 NA
10 3 4 NA
We can try ave + order
> transform(df, values = ave(values, id, FUN = function(x) x[order(is.na(x))]))
id itemid values
1 1 1 1
2 1 2 0
3 1 3 NA
4 2 1 0
5 2 2 NA
6 2 3 NA
7 3 1 1
8 3 2 0
9 3 3 NA
10 3 4 NA
With data.table:
library(data.table)
setDT(df)[, values := values[order(is.na(values))], id][]
#> id itemid values
#> 1: 1 1 1
#> 2: 1 2 0
#> 3: 1 3 NA
#> 4: 2 1 0
#> 5: 2 2 NA
#> 6: 2 3 NA
#> 7: 3 1 1
#> 8: 3 2 0
#> 9: 3 3 NA
#> 10: 3 4 NA
I'd define a function that does what you want and then group by id:
completed_first <- function(x) {
completed <- x[!is.na(x)]
length(completed) <- length(x)
completed
}
library(dplyr)
df %>%
group_by(id) %>%
mutate(
values = completed_first(values)
) %>%
ungroup()
# # A tibble: 10 × 3
# id itemid values
# <dbl> <dbl> <dbl>
# 1 1 1 1
# 2 1 2 0
# 3 1 3 NA
# 4 2 1 0
# 5 2 2 NA
# 6 2 3 NA
# 7 3 1 1
# 8 3 2 0
# 9 3 3 NA
# 10 3 4 NA
(This method preserves the order of itemid.)
Or building upon ThomasIsCoding's answer:
library(dplyr)
df %>%
group_by(id) %>%
mutate(
values = values[order(is.na(values))]
) %>%
ungroup()
# # A tibble: 10 × 3
# id itemid values
# <dbl> <dbl> <dbl>
# 1 1 1 1
# 2 1 2 0
# 3 1 3 NA
# 4 2 1 0
# 5 2 2 NA
# 6 2 3 NA
# 7 3 1 1
# 8 3 2 0
# 9 3 3 NA
# 10 3 4 NA
I have a very complex problem, i hope someone can help -> i want to copy a row value (i.e. Player 1 or Player 2) into two other rows (for Player 3 and 4) if and only if these players are in the same Treatment, Group and Period AND this player was indeed picked (see column Player.Picked)
I know that with tidyverse I can group_by my columns of interest: Treatment, Group, and Period.
However, I am unsure how to proceed with the condition that Player Picked is fulfilled and then how to extract this value appropriately for the players 3 and 4 in the same treatment, group, period.
The column "extracted.Player 1/2 Value" should be the output. (I have manually provided the first four correct solutions).
Any ideas? Help would be very much appreciated. Thanks a lot in advance!
df
T Player Group Player.Picked Period Player1/2Value extracted.Player1/2Value
1 1 6 1 1 10
1 2 6 1 1 9
1 3 5 2 1 NA -> 4
1 4 6 1 1 NA -> 10
1 5 3 1 1 NA
1 1 5 2 1 8
1 2 1 0 1 7
1 3 6 1 1 NA -> 10
1 4 2 2 1 NA
1 5 2 2 1 NA
1 1 1 0 1 7
1 2 2 2 1 11
1 3 3 1 1 NA
1 4 4 1 1 NA
1 5 4 1 1 NA
1 1 2 2 1 21
1 2 4 1 1 17
1 3 1 0 1 NA
1 4 5 2 1 NA -> 4
1 5 6 1 1 NA
1 1 3 1 1 12
1 2 3 1 1 15
1 3 4 1 1 NA
1 4 1 0 1 NA
1 5 1 0 1 NA
1 1 4 1 1 11
1 2 5 2 1 4
1 3 2 2 1 NA
1 4 3 1 1 NA
1 5 5 2 1 NA
I'm not sure if I understood the required logic; here I'm assuming that Player 5 always picks Player 1 or 2 per Group.
So, here is my go at this using library(data.table):
library(data.table)
DT <- data.table::data.table(
check.names = FALSE,
T = c(1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L),
Player = c(1L,2L,3L,
4L,5L,1L,2L,3L,4L,5L,1L,2L,3L,4L,5L,
1L,2L,3L,4L,5L,1L,2L,3L,4L,5L,1L,
2L,3L,4L,5L),
Group = c(6L,6L,5L,
6L,3L,5L,1L,6L,2L,2L,1L,2L,3L,4L,4L,
2L,4L,1L,5L,6L,3L,3L,4L,1L,1L,4L,
5L,2L,3L,5L),
Player.Picked = c(1L,1L,2L,
1L,1L,2L,0L,1L,2L,2L,0L,2L,1L,1L,1L,
2L,1L,0L,2L,1L,1L,1L,1L,1L,0L,0L,
1L,2L,2L,2L),
Period = c(1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L),
`Player1/2Value` = c(10L,9L,NA,
NA,NA,8L,7L,NA,NA,NA,7L,11L,NA,NA,
NA,21L,17L,NA,NA,NA,12L,15L,NA,NA,NA,
11L,4L,NA,NA,NA),
`extracted.Player1/2Value` = c(NA,NA,4L,
10L,NA,NA,NA,10L,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,4L,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA)
)
setorderv(DT, cols = c("T", "Group", "Period", "Player"))
Player5PickedDT <- DT[Player == 5, Player.Picked, by = c("T", "Group", "Period")]
setnames(Player5PickedDT, old = "Player.Picked", new = "Player5Picked")
DT <- DT[Player5PickedDT, on = c("T", "Group", "Period")]
extractedDT <- DT[Player == Player5Picked & Player5Picked > 0, `Player1/2Value`, by = c("T", "Group", "Period")]
setnames(extractedDT, old = "Player1/2Value", new = "extractedValue")
DT[, "Player5Picked" := NULL]
DT <- extractedDT[DT, on = c("T", "Group", "Period")]
DT[, extractedValue := fifelse(Player %in% c(3, 4), yes = extractedValue, no = NA_real_)]
setcolorder(DT, c("T", "Group", "Period", "Player", "Player.Picked", "Player1/2Value", "extracted.Player1/2Value", "extractedValue"))
DT
The resulting table differs from your expected result (extracted.Player1/2Value vs extractedValue, but in my eyes it is following the explained logic):
T Group Period Player Player.Picked Player1/2Value extracted.Player1/2Value extractedValue
1: 1 1 1 1 0 7 NA NA
2: 1 1 1 2 0 7 NA NA
3: 1 1 1 3 0 NA NA NA
4: 1 1 1 4 1 NA NA NA
5: 1 1 1 5 0 NA NA NA
6: 1 2 1 1 2 21 NA NA
7: 1 2 1 2 2 11 NA NA
8: 1 2 1 3 2 NA NA 11
9: 1 2 1 4 2 NA NA 11
10: 1 2 1 5 2 NA NA NA
11: 1 3 1 1 1 12 NA NA
12: 1 3 1 2 1 15 NA NA
13: 1 3 1 3 1 NA NA 12
14: 1 3 1 4 2 NA NA 12
15: 1 3 1 5 1 NA NA NA
16: 1 4 1 1 0 11 NA NA
17: 1 4 1 2 1 17 NA NA
18: 1 4 1 3 1 NA NA 11
19: 1 4 1 4 1 NA NA 11
20: 1 4 1 5 1 NA NA NA
21: 1 5 1 1 2 8 NA NA
22: 1 5 1 2 1 4 NA NA
23: 1 5 1 3 2 NA 4 4
24: 1 5 1 4 2 NA 4 4
25: 1 5 1 5 2 NA NA NA
26: 1 6 1 1 1 10 NA NA
27: 1 6 1 2 1 9 NA NA
28: 1 6 1 3 1 NA 10 10
29: 1 6 1 4 1 NA 10 10
30: 1 6 1 5 1 NA NA NA
T Group Period Player Player.Picked Player1/2Value extracted.Player1/2Value extractedValue