Conditional replacement of values in a column in R

I have the following:
ID Value1 Value2 Code
0001 3.3 432 A
0001 0 654 A
0001 0 63 A
0002 0 78 B
0002 1 98 B
0003 0 22 C
0003 0 65 C
0003 0 91 C
I need the following:
ID Value1 Value2 Code
0001 3.3 432 A
0001 0 0 A
0001 0 0 A
0002 0 0 B
0002 1 98 B
0003 0 22 C
0003 0 65 C
0003 0 91 C
i.e., within the same "Code", if there is at least one row with Value1 != 0, then Value2 in all the other rows of that Code should be set to 0 (so 654 and 63 for ID 0001 will be set to 0). If no row has Value1 != 0 (as for 0003), nothing is changed.
Can anyone help me please?
Thank you in advance

dplyr
library(dplyr)
quux %>%
  group_by(Code) %>%
  mutate(Value2 = if_else(abs(Value1) > 0 | !any(abs(Value1) > 0),
                          Value2, 0L)) %>%
  ungroup()
# # A tibble: 8 x 4
# ID Value1 Value2 Code
# <int> <dbl> <int> <chr>
# 1 1 3.3 432 A
# 2 1 0 0 A
# 3 1 0 0 A
# 4 2 0 0 B
# 5 2 1 98 B
# 6 3 0 22 C
# 7 3 0 65 C
# 8 3 0 91 C
base R
quux |>
  transform(Value2 = ifelse(ave(abs(Value1), Code, FUN = function(v) abs(v) > 0 | !any(abs(v) > 0)),
                            Value2, 0L))
# ID Value1 Value2 Code
# 1 1 3.3 432 A
# 2 1 0.0 0 A
# 3 1 0.0 0 A
# 4 2 0.0 0 B
# 5 2 1.0 98 B
# 6 3 0.0 22 C
# 7 3 0.0 65 C
# 8 3 0.0 91 C
data.table
library(data.table)
as.data.table(quux)[, Value2 := fifelse(abs(Value1) > 0 | !any(abs(Value1) > 0), Value2, 0L), by = Code][]
# ID Value1 Value2 Code
# <int> <num> <int> <char>
# 1: 1 3.3 432 A
# 2: 1 0.0 0 A
# 3: 1 0.0 0 A
# 4: 2 0.0 0 B
# 5: 2 1.0 98 B
# 6: 3 0.0 22 C
# 7: 3 0.0 65 C
# 8: 3 0.0 91 C
Data
quux <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L), Value1 = c(3.3, 0, 0, 0, 1, 0, 0, 0), Value2 = c(432L, 654L, 63L, 78L, 98L, 22L, 65L, 91L), Code = c("A", "A", "A", "B", "B", "C", "C", "C")), class = "data.frame", row.names = c(NA, -8L))

This should do it:
df %>%
  group_by(Code) %>%
  mutate(Value2 = if_else(row_number() == 1 & any(Value1 != 0), Value2, 0))
# A tibble: 8 × 4
# Groups: Code [3]
# ID Value1 Value2 Code
# <int> <dbl> <dbl> <fct>
# 1 1 3.3 432 A
# 2 1 0 0 A
# 3 1 0 0 A
# 4 2 0 78 B
# 5 2 1 0 B
# 6 3 0 0 C
# 7 3 0 0 C
# 8 3 0 0 C

We can use an if_else here. For example
library(dplyr)
dd %>%
  group_by(ID) %>%
  mutate(Value2 = if_else(any(Value1 != 0) & Value1 == 0, 0L, Value2))
Basically we use any() to check for non-zero values and then replace with 0s if one is found.
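As a quick sanity check (a minimal sketch of that logic, using the quux data defined in the first answer and grouping by Code as the question asks), the per-group condition can be inspected directly:
tapply(quux$Value1, quux$Code, function(v) any(v != 0))
#     A     B     C
#  TRUE  TRUE FALSE
# only groups A and B contain a non-zero Value1, so only their remaining rows get zeroed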

Related

Q About Converting the Format of Data Frame in R

May I know how to convert the format of this data frame? It shows a participant who took three tests (A, B, C) at two times (0, 2) on two words (Word_ID: 201, 202), with the score at each time coded as 0 or 1.
I would like to convert my data frame to look like this, with "Time" occurring as "0, 0, 0, 2, 2, 2".
Participant Time Measure Word_ID Score
100 0 A 201 0
100 0 B 201 1
100 0 C 201 0
100 2 A 201 1
100 2 B 201 1
100 2 C 201 1
100 0 A 202 0
100 0 B 202 0
100 0 C 202 0
100 2 A 202 1
100 2 B 202 1
100 2 C 202 1
But my current data frame looks like this. May I have your suggestions? Thank you very much.
Participant Time Measure 201 202
100 0 A 0 0
100 0 B 1 0
100 0 C 0 0
100 2 A 1 1
100 2 B 1 1
100 2 C 1 1
Reading your data as df like this:
df <- read.table(text = " Participant Time Measure 201 202
100 0 A 0 0
100 0 B 1 0
100 0 C 0 0
100 2 A 1 1
100 2 B 1 1
100 2 C 1 1", header = T)
In this case, the column names 201 and 202 become X201 and X202.
library(dplyr)
library(stringr)
library(reshape2)
df %>%
  reshape2::melt(id = c('Participant', 'Time', 'Measure'),
                 variable.name = "Word_ID",
                 value.name = "Score") %>%
  mutate(Word_ID = str_remove(Word_ID, "X"))
Participant Time Measure Word_ID Score
1 100 0 A 201 0
2 100 0 B 201 1
3 100 0 C 201 0
4 100 2 A 201 1
5 100 2 B 201 1
6 100 2 C 201 1
7 100 0 A 202 0
8 100 0 B 202 0
9 100 0 C 202 0
10 100 2 A 202 1
11 100 2 B 202 1
12 100 2 C 202 1
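A small aside (my addition, not part of the original answer): the X prefix comes from check.names = TRUE, the default in read.table(). Reading with check.names = FALSE keeps the numeric column names, so the str_remove() step is not needed:
df <- read.table(text = "Participant Time Measure 201 202
100 0 A 0 0
100 0 B 1 0
100 0 C 0 0
100 2 A 1 1
100 2 B 1 1
100 2 C 1 1", header = TRUE, check.names = FALSE)
names(df)
# [1] "Participant" "Time"        "Measure"     "201"         "202"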
You can use pivot_longer from tidyr:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(`201`:`202`, names_to = "Word_ID", values_to = "Score") %>%
  arrange(Participant, Word_ID)
Output
Participant Time Measure Word_ID Score
<int> <int> <chr> <chr> <int>
1 100 0 A 201 0
2 100 0 B 201 1
3 100 0 C 201 0
4 100 2 A 201 1
5 100 2 B 201 1
6 100 2 C 201 1
7 100 0 A 202 0
8 100 0 B 202 0
9 100 0 C 202 0
10 100 2 A 202 1
11 100 2 B 202 1
12 100 2 C 202 1
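A small side note (my addition, assuming tidyr >= 1.1): if Word_ID should be an integer rather than a character column, pivot_longer() can convert the extracted names during the reshape via names_transform:
df %>%
  pivot_longer(`201`:`202`, names_to = "Word_ID", values_to = "Score",
               names_transform = list(Word_ID = as.integer)) %>%
  arrange(Participant, Word_ID)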
Data
df <- structure(list(Participant = c(100L, 100L, 100L, 100L, 100L,
100L), Time = c(0L, 0L, 0L, 2L, 2L, 2L), Measure = c("A", "B",
"C", "A", "B", "C"), `201` = c(0L, 1L, 0L, 1L, 1L, 1L), `202` = c(0L,
0L, 0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-6L))

Add frequency into dataframe for each group and unique element (R)

I have a table such as
Group Family Nb
1 A 15
2 B 20
3 A 2
3 B 1
3 C 1
4 D 10
4 A 5
5 B 1
5 D 1
And I would like to transform that dataframe so that each unique Family element becomes a column and, for each Group, the cells hold the relative frequency (proportion) of Nb. I should then get:
Group A B C D E F
1 1 0 0 0 0 0
2 0 1 0 0 0 0
3 0.5 0.25 0.25 0 0 0
4 0.33 0 0 0.67 0 0
5 0 0.5 0 0.5 0 0
Here is the dput format of the table if it helps:
df <- structure(list(Group = c(1L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L),
Family = c("A", "B", "A", "B", "C", "D", "A", "B", "D"),
Nb = c(15L, 20L, 2L, 1L, 1L, 10L, 5L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-9L))
in Base R:
prop.table(xtabs(Nb ~ ., df), 1)
# Family
#Group A B C D
# 1 1.0000000 0.0000000 0.0000000 0.0000000
# 2 0.0000000 1.0000000 0.0000000 0.0000000
# 3 0.5000000 0.2500000 0.2500000 0.0000000
# 4 0.3333333 0.0000000 0.0000000 0.6666667
# 5 0.0000000 0.5000000 0.0000000 0.5000000
If you need it as a data.frame, just wrap the results in as.data.frame.matrix
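For example, a short sketch of that wrapping (the object name df is taken from the dput above):
as.data.frame.matrix(prop.table(xtabs(Nb ~ ., df), 1))
# a plain data.frame with one column per Family and the Group values as row names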
You can first group_by the Group column, then calculate the frequency and finally pivot the data into a "wide" format.
library(tidyverse)
df %>%
  group_by(Group) %>%
  mutate(Nb = Nb/sum(Nb)) %>%
  pivot_wider(Group, names_from = "Family", values_from = "Nb", values_fill = 0)
# A tibble: 5 × 5
# Groups: Group [5]
Group A B C D
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 0 0
2 2 0 1 0 0
3 3 0.5 0.25 0.25 0
4 4 0.333 0 0 0.667
5 5 0 0.5 0 0.5
Another possible solution:
library(tidyverse)
df %>%
  pivot_wider(names_from = Family, values_from = Nb, values_fill = 0) %>%
  mutate(aux = rowSums(.[-1]), across(-Group, ~ .x / aux), aux = NULL)
#> # A tibble: 5 × 5
#> Group A B C D
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0 0 0
#> 2 2 0 1 0 0
#> 3 3 0.5 0.25 0.25 0
#> 4 4 0.333 0 0 0.667
#> 5 5 0 0.5 0 0.5

dplyr: add numbers based on matching rows

Let's say I have
> fig
hands imp_spe n
1 A 0 39
2 A 1 32
3 B 0 3
4 B 1 2
5 C 0 115
6 C 1 24
7 D 0 11
8 D 1 3
I want to add a new column fig$new that sums the numbers in fig$n, but only across rows where fig$hands matches.
I need to keep the dataframe as it is.
Expected output
> fig
hands imp_spe n new
1 A 0 39 71
2 A 1 32 71
3 B 0 3 5
4 B 1 2 5
5 C 0 115 139
6 C 1 24 139
7 D 0 11 14
8 D 1 3 14
I am looking for a solution in dplyr
fig <- structure(list(hands = c("A", "A", "B", "B", "C", "C", "D", "D"
), imp_spe = c(0, 1, 0, 1, 0, 1, 0, 1), n = c(39L, 32L, 3L, 2L,
115L, 24L, 11L, 3L)), row.names = c(NA, -8L), class = "data.frame")
here you go
library(dplyr)
fig %>%
  group_by(hands) %>%
  mutate(new = sum(n)) %>%
  ungroup()
dplyr solution:
dplyr::add_count(fig, hands, wt = n, name = 'new')
# hands imp_spe n new
# 1 A 0 39 71
# 2 A 1 32 71
# 3 B 0 3 5
# 4 B 1 2 5
# 5 C 0 115 139
# 6 C 1 24 139
# 7 D 0 11 14
# 8 D 1 3 14
base solution:
transform(
  fig,
  new = ave(x = n, hands, FUN = sum)
)
# hands imp_spe n new
# 1 A 0 39 71
# 2 A 1 32 71
# 3 B 0 3 5
# 4 B 1 2 5
# 5 C 0 115 139
# 6 C 1 24 139
# 7 D 0 11 14
# 8 D 1 3 14
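For completeness (my addition, not part of the original answers), a data.table sketch of the same grouped sum, mirroring the dplyr and base versions above:
library(data.table)
as.data.table(fig)[, new := sum(n), by = hands][]
# adds the same `new` column of per-hands totals shown above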

By ID, Identify Highest Value, then assign that to all sharing ID

I have the following table, in each EVENTID, there are several PERSONID:
PERSONID EVENTID INJURYSCORE DIABETES
222 A734 3 0
353 A734 4 1
45 B823 5 1
423 B283 2 1
232 B283 1 0
432 Y821 1 0
How do I make two new variables:
maxscore - which, per EVENTID, assigns a 1 to the PERSONID with the highest INJURYSCORE
maxdiabetes - per EVENTID, if any of the PERSONID have diabetes (DIABETES = 1), a 1 is assigned to every PERSONID in that EVENTID
Here is a base R option using ave within transform, e.g.,
transform(
  df,
  maxscore = +(ave(INJURYSCORE, EVENTID, FUN = max) == INJURYSCORE),
  maxdiabetes = ave(DIABETES, EVENTID, FUN = any)
)
which gives
PERSONID EVENTID INJURYSCORE DIABETES maxscore maxdiabetes
1 222 A734 3 0 0 1
2 353 A734 4 1 1 1
3 45 B823 5 1 1 1
4 423 B283 2 1 1 1
5 232 B283 1 0 0 1
6 432 Y821 1 0 1 0
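A brief note on the ave() call above (my reading, not from the original answer): ave() writes the result of FUN back into a vector with the storage mode of its input, so the logical returned by any() comes back as 0/1 rather than TRUE/FALSE, e.g.
ave(c(0L, 1L, 0L), c(1, 1, 2), FUN = any)
# [1] 1 1 0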
We can use as.integer
library(dplyr)
df1 %>%
  group_by(EVENTID) %>%
  mutate(maxscore = as.integer(INJURYSCORE == max(INJURYSCORE)),
         maxidiabetes = as.integer(any(DIABETES > 0)))
-output
# A tibble: 6 x 6
# Groups: EVENTID [4]
# PERSONID EVENTID INJURYSCORE DIABETES maxscore maxidiabetes
# <int> <chr> <int> <int> <int> <int>
#1 222 A734 3 0 0 1
#2 353 A734 4 1 1 1
#3 45 B823 5 1 1 1
#4 423 B283 2 1 1 1
#5 232 B283 1 0 0 1
#6 432 Y821 1 0 1 0
data
df1 <- structure(list(PERSONID = c(222L, 353L, 45L, 423L, 232L, 432L
), EVENTID = c("A734", "A734", "B823", "B283", "B283", "Y821"
), INJURYSCORE = c(3L, 4L, 5L, 2L, 1L, 1L), DIABETES = c(0L,
1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-6L))
tidyverse
library(dplyr)
dat %>%
  group_by(EVENTID) %>%
  mutate(
    maxscore = +(INJURYSCORE == max(INJURYSCORE)),
    maxdiabetes = +any(DIABETES > 0)
  ) %>%
  ungroup()
# # A tibble: 6 x 6
# PERSONID EVENTID INJURYSCORE DIABETES maxscore maxdiabetes
# <int> <chr> <int> <int> <int> <int>
# 1 222 A734 3 0 0 1
# 2 353 A734 4 1 1 1
# 3 45 B823 5 1 1 1
# 4 423 B283 2 1 1 1
# 5 232 B283 1 0 0 1
# 6 432 Y821 1 0 1 0
data.table
library(data.table)
datDT <- as.data.table(dat)
datDT[, maxscore := +(INJURYSCORE == max(INJURYSCORE)), by = EVENTID
][, maxdiabetes := +any(DIABETES > 0), by = EVENTID ][]
# PERSONID EVENTID INJURYSCORE DIABETES maxscore maxdiabetes
# 1: 222 A734 3 0 0 1
# 2: 353 A734 4 1 1 1
# 3: 45 B823 5 1 1 1
# 4: 423 B283 2 1 1 1
# 5: 232 B283 1 0 0 1
# 6: 432 Y821 1 0 1 0
Data
dat <- read.table(header = TRUE, text = "
PERSONID EVENTID INJURYSCORE DIABETES
222 A734 3 0
353 A734 4 1
45 B823 5 1
423 B283 2 1
232 B283 1 0
432 Y821 1 0")

How do I change subsequent row values if some condition is met with multiple groups?

I have a dataframe that looks like this:
ID value condition
A 0 0
A 3 0
A 0 1
A 7 1
A 5 0
A 5 0
A 5 0
A 7 0
B 6 0
B 2 1
B 7 0
B 10 1
B 0 0
B 6 0
I want to change the ID name when the condition is met, and also change the ID of the rows that follow. The condition can be met multiple times per ID, so I'd like to update it each time.
The result would change the original ID or just add a new column:
ID value condition newID
A 0 0 A
A 3 0 A
A 0 1 A1
A 7 1 A1
A 5 0 A2
A 5 0 A2
A 5 0 A2
A 7 0 A2
B 6 0 B
B 2 1 B1
B 7 0 B2
B 10 1 B3
B 0 0 B4
B 6 0 B4
One option: after grouping by 'ID', create a run-length index with rleid (from data.table) and then use case_when to paste it onto the 'ID' based on the condition, leaving the first run as the bare 'ID'.
library(dplyr)
library(data.table)
df1 %>%
  group_by(ID) %>%
  mutate(newID = rleid(condition) - 1,
         newID = case_when(newID == 0 ~ first(ID), TRUE ~ paste0(first(ID), newID)))
# A tibble: 14 x 4
# Groups: ID [2]
# ID value condition newID
# <chr> <int> <int> <chr>
# 1 A 0 0 A
# 2 A 3 0 A
# 3 A 0 1 A1
# 4 A 7 1 A1
# 5 A 5 0 A2
# 6 A 5 0 A2
# 7 A 5 0 A2
# 8 A 7 0 A2
# 9 B 6 0 B
#10 B 2 1 B1
#11 B 7 0 B2
#12 B 10 1 B3
#13 B 0 0 B4
#14 B 6 0 B4
data
df1 <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B"), value = c(0L, 3L, 0L, 7L, 5L,
5L, 5L, 7L, 6L, 2L, 7L, 10L, 0L, 6L), condition = c(0L, 0L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)), class = "data.frame",
row.names = c(NA, -14L))
Could also do:
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(newID = cumsum(c(0, (condition != lag(condition))[-1])),
         newID = ifelse(newID != 0, paste0(ID, newID), ID))
Output:
# A tibble: 14 x 4
# Groups: ID [2]
ID value condition newID
<chr> <int> <int> <chr>
1 A 0 0 A
2 A 3 0 A
3 A 0 1 A1
4 A 7 1 A1
5 A 5 0 A2
6 A 5 0 A2
7 A 5 0 A2
8 A 7 0 A2
9 B 6 0 B
10 B 2 1 B1
11 B 7 0 B2
12 B 10 1 B3
13 B 0 0 B4
14 B 6 0 B4
Same idea as @akrun, but using only data.table.
library(data.table)
setDT(df)
df[, newID := paste0(ID, gsub('^0$', '', rleid(condition) - 1)), ID]
df
# ID value condition newID
# 1: A 0 0 A
# 2: A 3 0 A
# 3: A 0 1 A1
# 4: A 7 1 A1
# 5: A 5 0 A2
# 6: A 5 0 A2
# 7: A 5 0 A2
# 8: A 7 0 A2
# 9: B 6 0 B
# 10: B 2 1 B1
# 11: B 7 0 B2
# 12: B 10 1 B3
# 13: B 0 0 B4
# 14: B 6 0 B4
If I understand correctly, the OP wants to create subgroups within each ID for each contiguous streak of condition.
Unfortunately, the OP has requested to name the subgroups in a special way, which makes the solutions more complicated than necessary. By the OP's request, the subgroups are to be named, e.g., A, A1, A2, which means that subgroup number and subgroup name are off by one: the second subgroup is named A1, the third one A2, etc.
If a simpler naming scheme is acceptable, we can benefit directly from the prefix parameter of the rleid() function. Then the first subgroup of group A will be named A1, the second A2, etc.
dplyr
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(newID = data.table::rleid(condition, prefix = first(ID)))
# A tibble: 14 x 4
# Groups: ID [2]
ID value condition newID
<chr> <int> <int> <chr>
1 A 0 0 A1
2 A 3 0 A1
3 A 0 1 A2
4 A 7 1 A2
5 A 5 0 A3
6 A 5 0 A3
7 A 5 0 A3
8 A 7 0 A3
9 B 6 0 B1
10 B 2 1 B2
11 B 7 0 B3
12 B 10 1 B4
13 B 0 0 B5
14 B 6 0 B5
data.table
library(data.table)
setDT(df)[, newID := rleid(condition, prefix = ID), ID][]
ID value condition newID
1: A 0 0 A1
2: A 3 0 A1
3: A 0 1 A2
4: A 7 1 A2
5: A 5 0 A3
6: A 5 0 A3
7: A 5 0 A3
8: A 7 0 A3
9: B 6 0 B1
10: B 2 1 B2
11: B 7 0 B3
12: B 10 1 B4
13: B 0 0 B5
14: B 6 0 B5
Data
library(data.table)
df <- fread("ID value condition
A 0 0
A 3 0
A 0 1
A 7 1
A 5 0
A 5 0
A 5 0
A 7 0
B 6 0
B 2 1
B 7 0
B 10 1
B 0 0
B 6 0")
