How to merge subsequent values of a specific column in a table in R

I want to merge repeated rows that have one specific value only.
Let's say I have the following dataframe
df:
user action
1 A
1 A
1 B
1 B
2 A
2 C
2 C
2 A
2 A
I want to merge only consecutive repeats of action A.
So the result would be:
user action
1 A
1 B
1 B
2 A
2 C
2 C
2 A
How can I do this in R?
Thanks.

As long as there are no other conditions to match, this will work with:
library(magrittr)
library(dplyr)
Start by creating a dummy column that pastes each action together with whether it repeats the previous action of the same user (so "ATRUE" marks an "A" that immediately follows another "A"):
> df %>% group_by(user) %>%
mutate(condition=paste0(action,lag(action)==action))
# A tibble: 9 x 3
# Groups: user [2]
user action condition
<fct> <fct> <chr>
1 1 A ANA
2 1 A ATRUE
3 1 B BFALSE
4 1 B BTRUE
5 2 A ANA
6 2 C CFALSE
7 2 C CTRUE
8 2 A AFALSE
9 2 A ATRUE
Then you can filter out the rows within each user where A follows another A:
> df %>% group_by(user) %>%
mutate(condition=paste0(action,lag(action)==action)) %>%
filter(condition!="ATRUE")
# A tibble: 7 x 3
# Groups: user [2]
user action condition
<fct> <fct> <chr>
1 1 A ANA
2 1 B BFALSE
3 1 B BTRUE
4 2 A ANA
5 2 C CFALSE
6 2 C CTRUE
7 2 A AFALSE
You don't need to keep the dummy column: filter out the rows that match "ATRUE" and then select only the two variables you care about:
> df %>% group_by(user) %>%
mutate(condition=paste0(action,lag(action)==action)) %>%
filter(condition!="ATRUE") %>% select(user,action)
# A tibble: 7 x 2
# Groups: user [2]
user action
<fct> <fct>
1 1 A
2 1 B
3 1 B
4 2 A
5 2 C
6 2 C
7 2 A
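For comparison, here is a minimal base R sketch of the same filter, with no dplyr needed. It assumes the rows are already ordered by user and that action is stored as character; the helper names (same_user, repeat_a, drop) are just illustrative. Each row is compared with the row directly above it, and an "A" is dropped only when the previous row belongs to the same user and is also "A".

```r
df <- data.frame(user = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
                 action = c("A", "A", "B", "B", "A", "C", "C", "A", "A"),
                 stringsAsFactors = FALSE)

n <- nrow(df)
# TRUE where row i+1 has the same user as row i (assumes rows sorted by user)
same_user <- df$user[-1] == df$user[-n]
# TRUE where both row i and row i+1 have action "A"
repeat_a <- df$action[-1] == "A" & df$action[-n] == "A"
# the first row can never be a repeat, hence the leading FALSE
drop <- c(FALSE, same_user & repeat_a)

result <- df[!drop, ]
rownames(result) <- NULL
result
```

This keeps the first "A" of every run and leaves repeated "B" or "C" rows untouched, matching the desired output in the question.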

Related

How to create new column of repeating sequence based on other column

I have the following dataframe:
Participant_ID Order
1 A
1 A
2 B
2 B
3 A
3 A
4 B
4 B
5 B
5 B
6 A
6 A
Every two rows refer to the same participant. I want to create a new column, Period, based on the value in the column 'Order'. If Order == "A", the participant's two rows should get [1, 2]; if Order == "B", they should get [2, 1] in the same column.
The preferred output would be the following:
Participant_ID Order Period
1 A 1
1 A 2
2 B 2
2 B 1
3 A 1
3 A 2
4 B 2
4 B 1
5 B 2
5 B 1
6 A 1
6 A 2
Any help would be appreciated.
Here are a couple of possibilities. They assume that the Order value is the same for a given Participant_ID and that each participant has exactly two rows (1:2 and 2:1 have length 2). If this isn't the case, you will need to include additional logic.
You can use if_else:
library(tidyverse)
df %>%
group_by(Participant_ID) %>%
mutate(Period = if_else(Order == "A", 1:2, 2:1))
Or to explicitly check for multiple different values (e.g., "A", "B", etc.), have more flexibility, and include NA for other cases, you can use case_when:
df %>%
group_by(Participant_ID) %>%
mutate(Period = case_when(
Order == "A" ~ 1:2,
Order == "B" ~ 2:1,
TRUE ~ NA_integer_
))
Output
Participant_ID Order Period
<int> <chr> <int>
1 1 A 1
2 1 A 2
3 2 B 2
4 2 B 1
5 3 A 1
6 3 A 2
7 4 B 2
8 4 B 1
9 5 B 2
10 5 B 1
11 6 A 1
12 6 A 2
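The same logic can be sketched in base R, under the same assumptions (one Order value per participant, exactly two rows each). The idea is to number the rows within each participant with ave() and seq_along, then keep that position for "A" and reverse it (3 - position) for "B"; pos is an illustrative name.

```r
df <- data.frame(Participant_ID = rep(1:6, each = 2),
                 Order = rep(c("A", "B", "A", "B", "B", "A"), each = 2),
                 stringsAsFactors = FALSE)

# position of each row within its participant: 1, 2, 1, 2, ...
pos <- ave(df$Participant_ID, df$Participant_ID, FUN = seq_along)

# "A" keeps the order (1, 2); anything else is reversed to (2, 1)
# assumes exactly two rows per participant
df$Period <- ifelse(df$Order == "A", pos, 3L - pos)
df
```

Unlike case_when above, this sketch treats every non-"A" value as "B", so it is only a fit when Order has just those two values.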

Add original values for columns after group by

For the dataframe below I want to add the original values of VAR_X after a group_by on ID and event and a max() on quest, but I cannot get my code right. Any suggestions? By the way, in my original dataframe more than one column needs to be added.
df <- data.frame(ID = c(1,1,1,1,1,1,2,2,2,3,3,3),
quest = c(1,1,2,2,3,3,1,2,3,1,2,3),
event = c("A","B","A","B","A",NA,"C","D","C","D","D",NA),
VAR_X = c(2,4,3,6,3,NA,6,4,5,7,5,NA))
Code:
df %>%
group_by(ID,event) %>%
summarise(quest = max(quest))
Desired output:
ID quest event VAR_X
1 1 2 B 6
2 1 3 A 3
3 2 2 D 4
4 2 3 C 5
5 3 2 D 5
Start by omitting the NA values, and at the end do an inner_join with the original data set on ID, event and quest:
df %>%
na.omit() %>%
group_by(ID, event) %>%
summarise(quest = max(quest)) %>%
inner_join(df, by = c("ID", "event", "quest"))
## A tibble: 5 x 4
## Groups: ID [3]
# ID event quest VAR_X
# <dbl> <fct> <dbl> <dbl>
#1 1 A 3 3
#2 1 B 2 6
#3 2 C 3 5
#4 2 D 2 4
#5 3 D 2 5
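The summarise-then-join idea also has a direct base R counterpart, sketched here under the assumption that dropping incomplete rows first (as na.omit() does above) is acceptable: aggregate() computes the maximum quest per ID/event, and merge() joins it back to recover VAR_X and any other columns.

```r
df <- data.frame(ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
                 quest = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1, 2, 3),
                 event = c("A", "B", "A", "B", "A", NA, "C", "D", "C", "D", "D", NA),
                 VAR_X = c(2, 4, 3, 6, 3, NA, 6, 4, 5, 7, 5, NA),
                 stringsAsFactors = FALSE)

# drop rows with any NA, like na.omit() in the dplyr version
complete <- df[complete.cases(df), ]

# maximum quest for each ID/event combination
agg <- aggregate(quest ~ ID + event, data = complete, FUN = max)

# inner join back on all three keys to pick up VAR_X (and any other columns)
result <- merge(agg, complete, by = c("ID", "event", "quest"))
result <- result[order(result$ID, result$event), ]
result
```

Because the join keeps every remaining column of complete, this also covers the case where more than one column needs to be added.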
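Alternatively, use filter() to keep the rows where quest equals its maximum within each ID/event group; this retains all the other columns without a join:
df %>%
drop_na() %>% # drop rows with NA (tidyr); remove if not needed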
group_by(ID, event) %>%
filter(quest == max(quest)) %>%
ungroup()
# A tibble: 5 x 4
# ID quest event VAR_X
#<dbl> <dbl> <chr> <dbl>
# 1 1 2 B 6
# 2 1 3 A 3
# 3 2 2 D 4
# 4 2 3 C 5
# 5 3 2 D 5
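The filter() approach translates to base R as well. This is a sketch, assuming incomplete rows should be dropped first: ave() returns the group maximum aligned with every row, so comparing it with quest keeps the top row(s) of each group. Grouping on a single pasted ID/event key (an illustrative choice) avoids empty group combinations, which would make max() warn.

```r
df <- data.frame(ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
                 quest = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1, 2, 3),
                 event = c("A", "B", "A", "B", "A", NA, "C", "D", "C", "D", "D", NA),
                 VAR_X = c(2, 4, 3, 6, 3, NA, 6, 4, 5, 7, 5, NA),
                 stringsAsFactors = FALSE)

complete <- df[complete.cases(df), ]

# group key combining ID and event, e.g. "1 A"
key <- paste(complete$ID, complete$event)

# maximum quest of each group, repeated on every row of that group
grp_max <- ave(complete$quest, key, FUN = max)

# keep rows whose quest equals their group maximum (ties are all kept)
result <- complete[complete$quest == grp_max, ]
result
```

As with filter(), ties on the maximum would keep every tied row rather than one.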

R add rows to grouped df using dplyr

I have a grouped df and I would like to add rows from a second table to the top of each group whose item_code matches.
The additional rows do not have an id column. The additional rows should not be duplicated within the groups of df.
Example data:
df <- as_tibble(data.frame(id=rep(1:3,each=2),
item_code=c("A","A","B","B","B","Z"),
score=rep(1,6)))
additional_rows <- as_tibble(data.frame(item_code=c("A","Z"),
score=c(6,6)))
What I tried
I found this post and tried to apply it:
Add row in each group using dplyr and add_row()
df %>% group_by(id) %>% do(add_row(additional_rows %>%
filter(item_code %in% .$item_code)))
What I get:
# A tibble: 9 x 3
# Groups: id [3]
id item_code score
<int> <fct> <dbl>
1 1 A 6
2 1 Z 6
3 1 NA NA
4 2 A 6
5 2 Z 6
6 2 NA NA
7 3 A 6
8 3 Z 6
9 3 NA NA
What I am looking for:
# A tibble: 8 x 3
id item_code score
<int> <fct> <dbl>
1 1 A 6
2 1 A 1
3 1 A 1
4 2 B 1
5 2 B 1
6 3 B 1
7 3 Z 6
8 3 Z 1
This should do the trick:
library(magrittr) # for %>%
library(plyr) # for join() and arrange()
df %>%
join(subset(df, item_code %in% additional_rows$item_code, select = c(id, item_code)) %>%
join(additional_rows) %>%
subset(!duplicated(.)), type = "full") %>%
arrange(id, item_code, -score)
Not sure if it's the best way, but it works.
Edit: added the other arrange() terms so that the scores come out in the same order.
Edit 2: as per your comment, the additional rows are now added only once per group, with no duplicates.
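A plain base R sketch of the same idea, assuming data frames with character item_code rather than tibbles with factors: find the unique (id, item_code) pairs that have an additional row, attach the extra score with merge(), bind them to the original data, and sort so the added row sits on top of its group. The names pairs, extra and combined are illustrative.

```r
df <- data.frame(id = rep(1:3, each = 2),
                 item_code = c("A", "A", "B", "B", "B", "Z"),
                 score = rep(1, 6),
                 stringsAsFactors = FALSE)
additional_rows <- data.frame(item_code = c("A", "Z"),
                              score = c(6, 6),
                              stringsAsFactors = FALSE)

# one (id, item_code) pair per group whose item_code has an additional row;
# unique() ensures each additional row is added only once per group
pairs <- unique(df[df$item_code %in% additional_rows$item_code, c("id", "item_code")])

# attach the additional score and put the columns in df's order
extra <- merge(pairs, additional_rows, by = "item_code")[, names(df)]

combined <- rbind(extra, df)

# sort by id and item_code, with the higher (added) score first
result <- combined[order(combined$id, combined$item_code, -combined$score), ]
rownames(result) <- NULL
result
```

Sorting on -score relies on the added score being larger than the original ones, as in the example; another ordering column would be needed otherwise.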

Spread one column in multiple columns

I have one column "m" that contains multiple values associated with one subject (ID). I need to spread the values in this column into 5 different columns to obtain the second table below. I also need to give names to those columns.
f <- read.table(header = TRUE, text = "
Scale ID m
1 1 1 0.4089795
2 1 1 0.001041055
3 1 1 0.1843616
4 1 1 0.03398921
5 1 1 FALSE
6 3 1 0.1179424
7 3 1 0.3569155
8 3 1 0.2006204
9 3 1 0.04024855
10 3 1 FALSE
")
Here's what the output should look like
ID Scale x y z a b
1 1 1 0.4089795 0.001041055 0.1843616 0.03398921 FALSE
2 1 3 0.1179424 0.356915500 0.2006204 0.04024855 FALSE
Thanks for any help!
df <- read.table(header = TRUE, text = "
Scale ID m
1 1 1 0.4089795
2 1 1 0.001041055
3 1 1 0.1843616
4 1 1 0.03398921
5 1 1 FALSE
6 3 1 0.1179424
7 3 1 0.3569155
8 3 1 0.2006204
9 3 1 0.04024855
10 3 1 FALSE
")
library(tidyverse)
df %>%
group_by(Scale, ID) %>% # for each combination of Scale and ID
mutate(names = c("x","y","z","a","b")) %>% # add column names
ungroup() %>% # forget the grouping
spread(names, m) %>% # reshape: one column per name, filled from m
select(Scale, ID, x, y, z, a, b) # order columns
# # A tibble: 2 x 7
# Scale ID x y z a b
# <int> <int> <fct> <fct> <fct> <fct> <fct>
# 1 1 1 0.4089795 0.001041055 0.1843616 0.03398921 FALSE
# 2 3 1 0.1179424 0.3569155 0.2006204 0.04024855 FALSE

Add missing subtotals to each group using dplyr

I need to add a new row to each id group where key = "n" and value is total - (a + b)
x <- tibble( id = c(1,1,1,2,2,2,2),
key = c("a","b","total","a","x","b","total"),
value = c(1,2,10,4,1,3,12) )
# A tibble: 7 × 3
id key value
<dbl> <chr> <dbl>
1 1 a 1
2 1 b 2
3 1 total 10
4 2 a 4
5 2 x 1
6 2 b 3
7 2 total 12
In this example, the new rows should be
1 n 7
2 n 5
I tried getting the a+b subtotal and joining that to the total count to get the difference, but after using nine dplyr verbs I seem to be going in the wrong direction. Thanks.
This isn't a join, it's just binding new rows on:
x %>% group_by(id) %>%
summarize(
value = sum(value[key == 'total']) - sum(value[key %in% c('a', 'b')]),
key = 'n'
) %>%
bind_rows(x) %>%
select(id, key, value) %>% # back to original column order
arrange(id, key) # and a sensible row order
# # A tibble: 9 × 3
# id key value
# <dbl> <chr> <dbl>
# 1 1 a 1
# 2 1 b 2
# 3 1 n 7
# 4 1 total 10
# 5 2 a 4
# 6 2 b 3
# 7 2 n 5
# 8 2 total 12
# 9 2 x 1
Here's a way using data.table, binding rows as in Gregor's answer:
library(data.table)
setDT(x)
dcast(x, id ~ key, value.var = "value")[, .(id, key = "n", value = total - a - b)][, rbind(.SD, x)][order(id)]
id key value
1: 1 n 7
2: 1 a 1
3: 1 b 2
4: 1 total 10
5: 2 n 5
6: 2 a 4
7: 2 x 1
8: 2 b 3
9: 2 total 12
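For completeness, a base R sketch of the same "compute per group, then bind rows" approach, assuming each id has at most one total row: split() by id, compute total - (a + b) for each piece, build the new "n" rows, and rbind() them onto the original data. The names n_value and new_rows are illustrative.

```r
x <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                key = c("a", "b", "total", "a", "x", "b", "total"),
                value = c(1, 2, 10, 4, 1, 3, 12),
                stringsAsFactors = FALSE)

# per id: total minus (a + b); names of the result are the ids
n_value <- sapply(split(x, x$id), function(g)
  sum(g$value[g$key == "total"]) - sum(g$value[g$key %in% c("a", "b")]))

new_rows <- data.frame(id = as.numeric(names(n_value)),
                       key = "n",
                       value = unname(n_value),
                       stringsAsFactors = FALSE)

# bind the new rows on and sort by id, then key
result <- rbind(x, new_rows)
result <- result[order(result$id, result$key), ]
rownames(result) <- NULL
result
```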
