I am trying to create a new variable out of two variables in the same dataframe (df), like those below. The categories are mutually exclusive.
VAR1 VAR2
1 1
2 2
6 6
1 = yes
2 = no
6 = did not answer
The script I tried for creating the combined variable is below, but it is not working:
if (df$VAR1 == 1) {
df$combo = 1
} else if (df$VAR2 == 1) {
df$combo = 2
} else if ((df$VAR1 == 2) & (df$VAR2 == 2)) {
df$combo = 3
} else if ((df$VAR1 == 6) & (df$VAR2 == 6)) {
df$combo = 6
}
Any pointers will be appreciated.
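You may try looping over the rows. The condition in if() must be a single TRUE or FALSE, not a whole column, so each row is tested on its own:
df$combo <- NA  # create the column first; rows matching no condition stay NA
for (i in seq_len(nrow(df))){
  if (df$VAR1[i] == 1) {
    df$combo[i] = 1
  } else if (df$VAR2[i] == 1) {
    df$combo[i] = 2
  } else if ((df$VAR1[i] == 2) && (df$VAR2[i] == 2)) {
    df$combo[i] = 3
  } else if ((df$VAR1[i] == 6) && (df$VAR2[i] == 6)) {
    df$combo[i] = 6
  }
}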
VAR1 VAR2 combo
1 1 1 1
2 2 2 3
3 6 6 6
Or use dplyr
library(dplyr)
df %>%
  mutate(combo = case_when(
    VAR1 == 1 ~ 1,
    VAR2 == 1 ~ 2,
    (VAR1 == 2 & VAR2 == 2) ~ 3,
    (VAR1 == 6 & VAR2 == 6) ~ 6,
    TRUE ~ NA_real_
  ))
VAR1 VAR2 combo
1 1 1 1
2 2 2 3
3 6 6 6
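For comparison, the same rules can also be written in base R without a loop by nesting ifelse(), which tests the whole column at once. This is only a sketch: the three-row df is rebuilt from the example in the question, and rows matching none of the rules are assumed to become NA.

```r
# Example data rebuilt from the question (1 = yes, 2 = no, 6 = did not answer)
df <- data.frame(VAR1 = c(1, 2, 6), VAR2 = c(1, 2, 6))

# Nested ifelse() is vectorised: the first matching rule wins for each row
df$combo <- ifelse(df$VAR1 == 1, 1,
            ifelse(df$VAR2 == 1, 2,
            ifelse(df$VAR1 == 2 & df$VAR2 == 2, 3,
            ifelse(df$VAR1 == 6 & df$VAR2 == 6, 6, NA))))

df
```

With more than a handful of rules, case_when() is usually easier to read than deeply nested ifelse() calls.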
Related
I am trying to combine treatment allocations for patients who completed two different randomisation forms. I can simulate some example data here:
data <- data.frame(id = 1:100,
trt_a = factor(c(sample(0:1, 50, TRUE), rep(NA, 50))),
trt_b = factor(c(sample(0:1, 50, TRUE), rep(NA, 50))),
trt_ab = factor(c(rep(NA, 50), sample(c("a", "b", "ab", "neither"), 50, TRUE))))
Is there any way of creating a new column with the same factor levels as trt_ab? Half the patients had a choice of either trt_a or trt_b, and the other half had the combined choice trt_ab. I want to use some sort of case_when statement to generate a new column with the actual treatment choices:
data %>%
mutate(trt = case_when(trt_a == 0 & trt_b == 0 ~ "neither",
trt_a == 1 & trt_b == 0 ~ "a",
trt_a == 0 & trt_b == 1 ~ "b",
trt_a == 1 & trt_b == 1 ~ "ab",
!is.na(trt_ab) ~ trt_ab))
However, when any of the columns are factors, I get the following error:
Error in `mutate()`:
! Problem while computing `trt = case_when(...)`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]
data %>%
mutate(trt = case_when(trt_a == 0 & trt_b == 0 ~ "neither",
trt_a == 1 & trt_b == 0 ~ "a",
trt_a == 0 & trt_b == 1 ~ "b",
trt_a == 1 & trt_b == 1 ~ "ab",
!is.na(trt_ab) ~ trt_ab)) %>% head
-output
id trt_a trt_b trt_ab trt
1 1 0 0 <NA> neither
2 2 0 0 <NA> neither
3 3 1 1 <NA> ab
4 4 1 1 <NA> ab
5 5 0 1 <NA> b
6 6 1 1 <NA> ab
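If the error still appears, here is a minimal sketch of one possible workaround, assuming the problem comes from mixing character strings and a factor on the right-hand sides of case_when(). Wrapping trt_ab in as.character() makes every branch a character value, and factor() then restores the levels of trt_ab. The set.seed(1) call is an addition made only so the sketch is reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: not in the original, added for reproducibility
data <- data.frame(id = 1:100,
                   trt_a = factor(c(sample(0:1, 50, TRUE), rep(NA, 50))),
                   trt_b = factor(c(sample(0:1, 50, TRUE), rep(NA, 50))),
                   trt_ab = factor(c(rep(NA, 50),
                                     sample(c("a", "b", "ab", "neither"), 50, TRUE))))

res <- data %>%
  mutate(trt = case_when(trt_a == 0 & trt_b == 0 ~ "neither",
                         trt_a == 1 & trt_b == 0 ~ "a",
                         trt_a == 0 & trt_b == 1 ~ "b",
                         trt_a == 1 & trt_b == 1 ~ "ab",
                         # every right-hand side is now character
                         !is.na(trt_ab) ~ as.character(trt_ab)),
         # convert back to a factor with the same levels as trt_ab
         trt = factor(trt, levels = levels(trt_ab)))

head(res)
```

Note that levels(trt_ab) only contains the values that actually occur in trt_ab, so any treatment that never occurs there would turn into NA in trt.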
I am trying to print specific observations from a data frame. Consider this simple example:
df <- data.frame(ID = c(1,1,1,2,2,2,3,3),
Week = c(1,2,2,1,1,2,1,1),
Y = c(4,2,6,7,5,3,1,9))
I would like to (only) print the rows where (ID = 1 & Week = 2), (ID = 2 & Week = 1) as well as (ID = 3 & Week = 1), giving this output:
rbind(df[(df$ID == 1) & (df$Week == 2),],
df[(df$ID == 2) & (df$Week == 1),],
df[(df$ID == 3) & (df$Week == 1),])
The values to be used for indexing are stored in a vector for each variable:
IDidx <- c(1,2,3)
Weekidx <- c(2,1,1)
Is there any solution that takes these vectors and indexes element-wise from them as I have done it "manually" using rbind()?
Thanks for your help!
We can create a data frame from IDidx and Weekidx, and then use semi_join from the dplyr package to keep only the rows of df that match one of the ID/Week pairs.
inx <- data.frame(ID = IDidx, Week = Weekidx)
library(dplyr)
df %>% semi_join(inx, by = c("ID", "Week"))
# ID Week Y
# 1 1 2 2
# 2 1 2 6
# 3 2 1 7
# 4 2 1 5
# 5 3 1 1
# 6 3 1 9
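If you would rather avoid the dplyr dependency, the same element-wise matching can be sketched in base R by pasting each ID/Week pair into a single key. The names wanted and res are only illustrative, and the "_" separator is assumed not to occur inside the values themselves.

```r
df <- data.frame(ID = c(1,1,1,2,2,2,3,3),
                 Week = c(1,2,2,1,1,2,1,1),
                 Y = c(4,2,6,7,5,3,1,9))
IDidx <- c(1,2,3)
Weekidx <- c(2,1,1)

# One key per requested (ID, Week) pair, matched element-wise
wanted <- paste(IDidx, Weekidx, sep = "_")

# Keep the rows of df whose own key is among the requested keys
res <- df[paste(df$ID, df$Week, sep = "_") %in% wanted, ]
res
```

Unlike the rbind() version, this keeps the rows in their original order in df rather than in the order of the index vectors.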
I have a dataset that looks like this:
Group ID
UP 1
UP 1
UP 2
UP 2
UP 2
UP 1
UP 1
UP 2
UP 2
UP 1
UP 1
Is there any way to see how many times a 1 is under a 1 in the ID column?
Does this work:
library(dplyr)
df %>% mutate(flag = case_when(ID == 1 & lag(ID) == 1 ~ 1, TRUE ~ 0)) %>% pull(flag) %>% sum
[1] 3
Base R:
sum(df$ID == 1 & c(tail(df$ID, -1), NA) == 1, na.rm = TRUE)
#[1] 3
You can also use dplyr::lag or data.table::shift:
sum(df$ID == 1 & dplyr::lag(df$ID) == 1, na.rm = TRUE)
sum(df$ID == 1 & data.table::shift(df$ID) == 1, na.rm = TRUE)
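As a cross-check, here is a small self-contained sketch that rebuilds the example data and counts the same thing in two other base R ways: comparing each value with the next one, and summing run lengths with rle(). The variable names n_pairs, runs and n_runs are only illustrative.

```r
# Example data rebuilt from the question
df <- data.frame(Group = "UP", ID = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1))

# Compare every value with the one directly below it
n_pairs <- sum(head(df$ID, -1) == 1 & tail(df$ID, -1) == 1)

# Equivalent view: a run of k consecutive 1s contains k - 1 such pairs
runs <- rle(df$ID)
n_runs <- sum(runs$lengths[runs$values == 1] - 1)

n_pairs
n_runs
```

Both counts should agree with the result of 3 shown above.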
I would like to select rows of a data frame based on conditions on two columns that together should identify a unique row. In the concrete example below, I would like to select id = 1, 2, 3, ... each with a specific mtry value given in a vector, i.e. for id = 1 I just want the first line with mtry = 3, and for id = 2 I would like mtry = 5.
I tried using group_by and using filter e.g.
filter(df, (mtry,id) %in% c([3,1],[5,2],[3,3]))
but this gives an error
Error: unexpected ',' in .
What is the tidyverse way of doing this?
You can do this kind of filter with an inner join
library(dplyr)
df %>%
inner_join(tibble(mtry = c(3, 5, 3), id = c(1, 2, 3)))
Example:
set.seed(100)
df <- data.frame(mtry = sample(1:3, 100, T), id = sample(1:5, 100, T))
df %>%
inner_join(tibble(mtry = c(3, 5, 3), id = c(1, 2, 3)))
# Joining, by = c("mtry", "id")
# mtry id
# 1 3 1
# 2 3 3
# 3 3 3
# 4 3 3
# 5 3 1
# 6 3 3
# 7 3 1
# 8 3 1
# 9 3 1
# 10 3 3
# 11 3 1
# 12 3 3
# 13 3 1
You need to create different conditions for each combination
subset(df, (mtry == 3 & id == 1) | (mtry == 5 & id == 2) | (mtry == 3 & id == 3))
Or if you want tidyverse put the conditions in filter
library(dplyr)
df %>% filter((mtry == 3 & id == 1) | (mtry == 5 & id == 2) | (mtry == 3 & id == 3))
Since conditions 1 and 3 share the same mtry value, you can combine them:
df %>% filter((mtry == 3 & id %in% c(1, 3)) | (mtry == 5 & id == 2))
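When the list of combinations is long, writing each condition by hand gets tedious. The sketch below builds the same kind of filter from two vectors in base R with Map() and Reduce(). The names mtry_sel, id_sel, conds and keep are hypothetical, and the example data samples mtry from 1:5 (an assumption that differs from the example above) so that mtry == 5 can actually occur.

```r
set.seed(100)
df <- data.frame(mtry = sample(1:5, 100, TRUE), id = sample(1:5, 100, TRUE))

# Requested combinations, paired element-wise: (3, 1), (5, 2), (3, 3)
mtry_sel <- c(3, 5, 3)
id_sel <- c(1, 2, 3)

# One logical vector per combination ...
conds <- Map(function(m, i) df$mtry == m & df$id == i, mtry_sel, id_sel)

# ... combined with OR into a single row filter
keep <- Reduce(`|`, conds)
res <- df[keep, ]
head(res)
```

This gives the same rows as the hand-written subset() call, so it is mainly useful when the combinations come from data rather than from a fixed list.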
I have data like so:
ID membership AdultChild
1 1 A
2 1 A
3 2 A
4 2 C
5 2 C
6 3 A
7 3 A
: : :
I want to group by membership and apply a 'code' after counting the AdultChild variable i.e.
ID membership AdultChild code
1 1 A x1
2 1 A x1
3 2 A x2
4 2 C x2
5 2 C x2
6 3 A x1
7 3 A x1
: : : :
I will have conditions similar to:
count <- function(x){
if(sum(x == "A") == 2 && sum(x == "C") == 0){
code <<- x1
}else if (sum(x == "A") == 1 & sum(x == "C") >= 1){
code <<- x2
}else {
code <<- X3
}
I have tried using dplyr to group and mutate, using the function above to add a new variable called code. I also thought about using the aggregate function but didn't have much luck.
df.2 <- df %>% group_by(membership)
%>% mutate(n = count(AdultChild)) %>%
ungroup()
df.2 <- aggregate.data.frame(df, by = membership, FUN =
count(df$AdultChild))
Basically, I want a new variable whose value is decided by certain conditions and applied to every ID within each membership group.
Thanks in advance.
library(dplyr)
df %>% group_by(membership) %>%
  mutate(code = case_when(
    sum(AdultChild == 'A', na.rm = TRUE) == 2 & sum(AdultChild == 'C', na.rm = TRUE) == 0 ~ 'X1',
    sum(AdultChild == 'A', na.rm = TRUE) == 1 & sum(AdultChild == 'C', na.rm = TRUE) >= 1 ~ 'X2',
    TRUE ~ 'X3'
  ))
# A tibble: 7 x 4
# Groups: membership [3]
ID membership AdultChild code
<int> <int> <fct> <chr>
1 1 1 A X1
2 2 1 A X1
3 3 2 A X2
4 4 2 C X2
5 5 2 C X2
6 6 3 A X1
7 7 3 A X1
# Note: defining count() here masks dplyr::count() in the global environment
count <- function(x){
  a <- sum(x == "A", na.rm = TRUE)   # how many "A" values in the group
  ch <- sum(x == "C", na.rm = TRUE)  # how many "C" values in the group
  if (a == 2 && ch == 0){
    y <- "4"
  } else if (a > 2 && ch == 0){
    y <- "5"
  } else if (a == 1 && ch >= 1){
    y <- "6"
  } else if (a == 2 && ch >= 1 && ch <= 3){
    y <- "7"
  } else {
    y <- "8"
  }
  y  # return the code explicitly
}
df.2 <- df %>% group_by(membership) %>% mutate(code = count(AdultChild)) %>% ungroup()
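The same idea of one code per membership group can also be sketched in base R with ave(), which applies a function within each group and puts the results back in row order. The helper name code_for_group is hypothetical, the data is rebuilt from the seven rows shown in the question, and the X1/X2/X3 rules follow the case_when() answer above.

```r
# Example data rebuilt from the question
df <- data.frame(ID = 1:7,
                 membership = c(1, 1, 2, 2, 2, 3, 3),
                 AdultChild = c("A", "A", "A", "C", "C", "A", "A"))

# Hypothetical helper: decide one code per group, repeated for each row
code_for_group <- function(x) {
  n_a <- sum(x == "A", na.rm = TRUE)
  n_c <- sum(x == "C", na.rm = TRUE)
  code <- if (n_a == 2 && n_c == 0) {
    "X1"
  } else if (n_a == 1 && n_c >= 1) {
    "X2"
  } else {
    "X3"
  }
  rep(code, length(x))
}

# as.character() guards against AdultChild being stored as a factor
df$code <- ave(as.character(df$AdultChild), df$membership, FUN = code_for_group)
df
```

Inside if(), && is used instead of & because each side is a single TRUE or FALSE once the sums have been computed.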