I want to set the next row i+1 in the same column to NA if there is already an NA in row i and then do this by groups. Here is my attempt:
dfeg <- tibble(id = c(rep("A", 5), rep("B", 5)),
x = c(1, 2, NA, NA, 3, 5, 6, NA, NA, 7))
setNextrowtoNA <- function(x){
  for (j in 1:length(x)){
    if(is.na(x[j])){x[j+1] <- NA}
  }
}
dfeg <- dfeg %>% group_by(id) %>% mutate(y = setNextrowtoNA(x))
However, my attempt doesn't create the column y that I am looking for. Can anyone help with this? Thanks!
EDIT: In my actual data I have multiple values in a row that need to be set to NA, for example my data is more like this:
dfeg <- tibble(id = c(rep("A", 6), rep("B", 6)),
x = c(1, 2, NA, NA, 3, 4, 15, 16, NA, NA, 17, 18))
And need to create a column like this:
y = c(1, 2, NA, NA, NA, NA, 15, 16, NA, NA, NA, NA)
Any ideas? Thanks!
EDIT 2:
I figured it out on my own, this seems to work:
dfeg <- tibble(id = c(rep("A", 6), rep("B", 6)),
x = c(1, 2, NA, NA, 3, 4, 15, 16, NA, NA, 17, 18))
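setNextrowtoNA <- function(x){
  # Walk forward so that an NA (original or just propagated) blanks the next element.
  # Stopping at length(x) - 1 keeps the vector from growing past its original length.
  for (j in seq_len(max(length(x) - 1, 0))){
    if (is.na(x[j])) {
      x[j + 1] <- NA
    }
  }
  return(x)
}
dfeg <- dfeg %>% group_by(id) %>% mutate(y = setNextrowtoNA(x))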
Use cumany:
library(dplyr)
dfeg %>%
group_by(id) %>%
mutate(y = ifelse(cumany(is.na(x)), NA, x))
id x y
<chr> <dbl> <dbl>
1 A 1 1
2 A 2 2
3 A NA NA
4 A NA NA
5 A 3 NA
6 A 4 NA
7 B 15 15
8 B 16 16
9 B NA NA
10 B NA NA
11 B 17 NA
12 B 18 NA
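To see why this works, it can help to look at `cumany()` on a bare vector. The sketch below uses the group A values from the second example; `x_a`, `flag`, `y_a` and `flag_base` are hypothetical names used only for illustration. `cumany()` becomes TRUE at the first NA and stays TRUE for the rest of the vector, which is what blanks every later row.

```r
library(dplyr)

# Group "A" from the second example (x_a is a hypothetical name)
x_a <- c(1, 2, NA, NA, 3, 4)

# TRUE from the first NA onwards
flag <- cumany(is.na(x_a))

# Blank every value at or after the first NA
y_a <- ifelse(flag, NA, x_a)

# A base R way to build the same flag without dplyr
flag_base <- cumsum(is.na(x_a)) > 0
```

Inside `group_by(id)`, the flag is recomputed per group, so an NA in group A does not leak into group B.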
Previous answer:
Use an ifelse statement with lag:
library(dplyr)
dfeg %>%
group_by(id) %>%
mutate(y = ifelse(is.na(lag(x, default = 0)), NA, x))
I have a dataset like the following simplified one:
x_1 <- c(1, NA, 2, 3, NA, 4, 5)
x_2 <- c(2, 1, NA, NA, NA, 4, 6)
y_1 <- c(2, 4, 6, 8, NA, 10, NA)
y_2 <- c(NA, 4, NA, 8, 10, 11, 13)
df <- data.frame(x_1, x_2, y_1, y_2)
x_1 x_2 y_1 y_2
1 1 2 2 NA
2 NA 1 4 4
3 2 NA 6 NA
4 3 NA 8 8
5 NA NA NA 10
6 4 4 10 11
7 5 6 NA 13
The goal is to coalesce each of the two corresponding variables (x and y) and to replace the values that are not the same (e.g. first row of x_1 and x_2) with NA. I did this with the following:
df <- df %>%
  mutate(x = coalesce(x_1, x_2)) %>%
  mutate(x = ifelse(!is.na(x) &
                      !is.na(x_2) &
                      x != x_2,
                    NA,
                    x)) %>%
  select(!c(x_1, x_2))
Now, I have to do this for 21 variables, so I thought I would put the variable names in a vector and feed them through the pipeline with a for loop like this:
cols <- c("x", "y")
for(i in cols){
  var_1 <- paste(i, "1", sep = "_")
  var_2 <- paste(i, "2", sep = "_")
  df <- df %>%
    mutate(i = coalesce(var_1, var_2)) %>%
    mutate(i = ifelse(!is.na(i) &
                        !is.na(var_2) &
                        i != var_2,
                      NA,
                      i)) %>%
    select(!c(var_1, var_2))
}
What happens is that the code is executed, but instead of the new variables there is only the variable "i" with empty values. It seems as if R does not recognise the "i" in the pipeline as the iterator, however it does recognize "var_1" and "var_2" (because they are being removed from the dataset).
Does anyone know why that is and how I can fix it?
Thanks a lot in advance.
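library(dplyr)  # dplyr re-exports sym() from rlang
fun <- function(x, var) {
  # Turn the column names into symbols so they can be unquoted with !!
  var_1 <- sym(paste(var, "1", sep = "_"))
  var_2 <- sym(paste(var, "2", sep = "_"))
  x %>%
    mutate(!!var := ifelse((!!var_1 != !!var_2) %in% TRUE,
                           NA, coalesce(!!var_1, !!var_2))) %>%
    # var_1 and var_2 are symbols, so they need unquoting in select() too
    select(!c(!!var_1, !!var_2))
}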
cols <- c("x", "y")
Reduce(fun, cols, init = df)
# x y
# 1 NA 2
# 2 1 4
# 3 2 6
# 4 3 8
# 5 NA 10
# 6 4 NA
# 7 NA 13
If you want to avoid rlang:
library(tidyverse)
library(stringr)
x_1 <- c(1, NA, 2, 3, NA, 4, 5)
x_2 <- c(2, 1, NA, NA, NA, 4, 6)
y_1 <- c(2, 4, 6, 8, NA, 10, NA)
y_2 <- c(NA, 4, NA, 8, 10, 11, 13)
df <- data.frame(x_1, x_2, y_1, y_2)
my_coalesce <- function(d) {
vec_1 <- select(d, 1) %>% pull()
vec_2 <- select(d, 2) %>% pull()
res <- coalesce(vec_1, vec_2)
res[vec_1 != vec_2] <- NA
res
}
cols <- c("x", "y")
map(cols, ~df %>%
select(starts_with(.x)) %>% # or:
#select(str_c(.x, "_", 1:2)) %>%
my_coalesce()) %>%
set_names(cols) %>%
as_tibble()
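As for why the original loop produced a column literally called "i": inside `mutate()`, the left-hand side of `=` is taken as a name, and `var_1` / `var_2` on the right are plain strings rather than columns. A minimal sketch of the same loop, assuming a dplyr version that supports glue-style names on the left of `:=` (1.0 or later) and using the `.data` pronoun to look columns up by string; the loop variable `v` and the helpers `v1` / `v2` are hypothetical names:

```r
library(dplyr)

df <- data.frame(
  x_1 = c(1, NA, 2, 3, NA, 4, 5),
  x_2 = c(2, 1, NA, NA, NA, 4, 6),
  y_1 = c(2, 4, 6, 8, NA, 10, NA),
  y_2 = c(NA, 4, NA, 8, 10, 11, 13)
)

for (v in c("x", "y")) {
  v1 <- paste0(v, "_1")
  v2 <- paste0(v, "_2")
  df <- df %>%
    # "{v}" := names the new column after the current value of v;
    # .data[[v1]] fetches the column whose name is stored in v1
    mutate("{v}" := ifelse((.data[[v1]] != .data[[v2]]) %in% TRUE,
                           NA,
                           coalesce(.data[[v1]], .data[[v2]]))) %>%
    # all_of() tells select() that v1 and v2 hold column names
    select(!all_of(c(v1, v2)))
}
df
```

The `%in% TRUE` trick treats an NA comparison as "not different", so a value that is present in only one of the two columns is kept.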
I have a dataframe with the following column headers:
df <- data.frame(
ABC1_1_1DEF = c(1, 2, 3),
ABC1_2_1DEF = c(NA, 1, 2),
ABC1_3_1DEF = c(1, 1, NA),
ABC1_1_2DEF = c(3, NA, NA),
ABC1_2_2DEF = c(2, NA, NA),
ABC1_3_2DEF = c(NA, 1, 1)
)
I want to pivot the dataframe longer such that the middle number of each column is the group that contains the new columns:
df2 <- data.frame(
ABC1_1 = c(1, 2, 3, 3, NA, NA),
ABC1_2 = c(3, NA, NA, 2, NA, NA),
ABC1_3 = c(2, NA, NA, NA, 1, 1)
)
What's the best way to achieve this using R, ideally with dplyr?
To combine all the ABC1_1, ABC1_2 and ABC1_3 columns you can use:
tidyr::pivot_longer(df, cols = everything(),
names_to = '.value',
names_pattern = '([A-Z]+\\d+_\\d+)')
# ABC1_1 ABC1_2 ABC1_3
# <dbl> <dbl> <dbl>
#1 1 NA 1
#2 3 2 NA
#3 2 1 1
#4 NA NA 1
#5 3 2 NA
#6 NA NA 1
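To check what the capture group in `names_pattern` picks out of each header, the same regular expression can be tried on the names alone. This is only an illustration with base R's `sub()`, not how tidyr does the matching internally; `nms` and `captured` are hypothetical names.

```r
# Column names from the question
nms <- c("ABC1_1_1DEF", "ABC1_2_1DEF", "ABC1_3_1DEF",
         "ABC1_1_2DEF", "ABC1_2_2DEF", "ABC1_3_2DEF")

# Keep only the text matched by the capture group and drop the rest
captured <- sub("^([A-Z]+\\d+_\\d+).*$", "\\1", nms, perl = TRUE)
captured
```

Columns that share a captured prefix end up stacked in the same output column because the prefix is mapped to `.value`.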
I have a tibble where the rows and columns are the same IDs and I would like to take the mean (ignoring the NAs) to make the df symmetrical. I am struggling to see how.
data <- tibble(group = LETTERS[1:4],
A = c(NA, 10, 20, NA),
B = c(15, NA, 25, 30),
C = c(20, NA, NA, 10),
D = c(10, 12, 15, NA)
)
I would normally do
A <- as.matrix(data[-1])
(A + t(A))/2
But this does not work because of the NAs.
Edit: below is the expected output.
output <- tibble(group = LETTERS[1:4],
A = c(NA, 12.5, 20, 10),
B = c(12.5, NA, 25, 21),
C = c(20, 25, NA, 12.5),
D = c(10, 21, 12.5, NA))
Here is a suggestion using tidyverse code.
library(tidyverse)
data <- tibble(group = LETTERS[1:4],
A = c(NA, 10, 20, NA),
B = c(15, NA, 25, 30),
C = c(20, NA, NA, 10),
D = c(10, 12, 15, NA)
)
A <- data %>%
  pivot_longer(-group, values_to = "x")
# Drop `group` before transposing; otherwise t() coerces everything to character
B <- t(data[-1]) %>%
  as.data.frame() %>%
  setNames(LETTERS[1:4]) %>%
  rownames_to_column("group") %>%
  pivot_longer(-group, values_to = "y") %>%
  left_join(A, by = c("group", "name")) %>%
  mutate(
    # Average when both values exist, otherwise fall back to whichever is present
    mean = if_else(!(is.na(x) | is.na(y)), (x + y)/2, x),
    mean = if_else(is.na(mean) & !is.na(y), y, mean)
  ) %>%
  select(-x, -y) %>%
  pivot_wider(names_from = name, values_from = mean)
B
## A tibble: 4 x 5
# group A B C D
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 A NA 12.5 20 10
#2 B 12.5 NA 25 21
#3 C 20 25 NA 12.5
#4 D 10 21 12.5 NA
Okay, so this is how I ended up doing it. I would have preferred not to use a for loop because my actual data is much bigger, but beggars can't be choosers!
A <- as.matrix(data[-1])
for (i in 1:nrow(A)){
  for (j in 1:ncol(A)){
    if(is.na(A[i,j])){
      A[i,j] <- A[j, i]
    }
  }
}
output <- (A + t(A))/2
output %>%
  as_tibble() %>%
  mutate(group = data$group) %>%
  select(group, everything())
# A tibble: 4 x 5
group A B C D
<chr> <dbl> <dbl> <dbl> <dbl>
1 A NA 12.5 20 10
2 B 12.5 NA 25 21
3 C 20 25 NA 12.5
4 D 10 21 12.5 NA
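The double loop can probably be avoided by working on the whole matrix at once. A minimal sketch in base R, assuming the same `data` as in the question (built here as a plain data frame); `tA` and `sym_mean` are hypothetical names. Where one of the two mirrored cells is NA the other is used, and where both are present they are averaged:

```r
data <- data.frame(group = LETTERS[1:4],
                   A = c(NA, 10, 20, NA),
                   B = c(15, NA, 25, 30),
                   C = c(20, NA, NA, 10),
                   D = c(10, 12, 15, NA))

A  <- as.matrix(data[-1])
tA <- t(A)

# Element-wise: take the mirrored cell if this one is NA,
# take this one if the mirrored cell is NA, otherwise average both
sym_mean <- ifelse(is.na(A), tA,
                   ifelse(is.na(tA), A, (A + tA) / 2))
sym_mean
```

Cells where both entries are NA (the diagonal here) stay NA.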
I have a dataset with answers to a Likert scale and reaction times that are both results of an experimental manipulation. Ideally, I would like to copy the Likert_Answer values and align them with the experimental manipulation associated with each value.
The dataset looks like this:
x <- rep(c(NA, round(runif(5, min=0, max=100), 2)), times=3)
myDF <- data.frame(ID = rep(c(1,2,3), each=6),
Condition = rep(c("A","B"), each=3, times=3),
Type_of_Task = rep(c("Test", rep(c("Experiment"), times=2)), times=6),
Likert_Answer = c(5, NA, NA, 6, NA, NA, 1, NA, NA, 5, NA, NA, 5, NA, NA, 1, NA, NA),
Reaction_Times = x)
I find it very hard to formulate the problem I have, so this is what my expected output should look like:
myDF_Output <- data.frame(ID = rep(c(1,2,3), each=6),
Condition = rep(c("A","B"), each=3, times=3),
Type_of_Task = rep(c("Test", rep(c("Experiment"), times=2)), times=6),
Likert_Answer = rep(c(5, 6, 1, 5, 5, 1), each = 3),
Reaction_Times = x)
I have seen in this post a feasible solution that is the following:
library(dplyr)
library(tidyr)
myDF2 <- myDF %>%
group_by(ID) %>%
fill(Likert_Answer) %>%
fill(Likert_Answer, .direction = "up")
The problem is that this solution is only valid as long as the person replies to the Likert scale. If they did not, I am afraid this solution would "drag" the Likert answer from the previous experimental condition into the current one. For example:
myDF_missing <- myDF
myDF_missing[4,4] = NA
myDF3 <- myDF_missing %>%
group_by(ID) %>%
fill(Likert_Answer) %>%
fill(Likert_Answer, .direction = "up")
In this case, what should have been an NA in Likert_Answer for all rows in condition B for ID 1 has become a 5. Any idea how I could avoid this?
(Excuse me if the code is dirty: I am quite new to R and I am learning the hard way... But I got pretty stuck with this problem at this stage.)
If I understood your problem correctly, you are very close to a solution. I manipulated the demo df to show how the grouping works:
library(dplyr)
library(tidyr)
myDF <- data.frame(ID = rep(c(1,2,3), each=6),
Condition = rep(c("A","B"), each=3, times=3),
Type_of_Task = rep(c("Test", rep(c("Experiment"), times=5)), times=3),
Likert_Answer = c(5, NA, NA, 6, NA, NA, 1, NA, NA, 5, NA, NA, NA, NA, NA, 1, NA, NA),
Reaction_Times = x)
myDF %>%
dplyr::group_by(ID) %>%
tidyr::fill(Likert_Answer)
ID Condition Type_of_Task Likert_Answer Reaction_Times
<dbl> <chr> <chr> <dbl> <dbl>
1 1 A Test 5 NA
2 1 A Experiment 5 18.4
3 1 A Experiment 5 41.1
4 1 B Experiment 6 59.8
5 1 B Experiment 6 93.4
6 1 B Experiment 6 38.5
7 2 A Test 1 NA
8 2 A Experiment 1 18.4
9 2 A Experiment 1 41.1
10 2 B Experiment 5 59.8
11 2 B Experiment 5 93.4
12 2 B Experiment 5 38.5
13 3 A Test NA NA
14 3 A Experiment NA 18.4
15 3 A Experiment NA 41.1
16 3 B Experiment 1 59.8
17 3 B Experiment 1 93.4
18 3 B Experiment 1 38.5
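If the answer should never carry over from one condition to the next, one option is to add `Condition` to the grouping so that `fill()` restarts in every ID/Condition block. This is a sketch under the assumption that each block starts with its own "Test" row; the data below is a reduced, hypothetical version of the question's data in which ID 1 gave no answer in condition B, and `filled` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# Reduced example: ID 1 did not answer the Likert scale in condition B
myDF_missing <- data.frame(
  ID            = rep(c(1, 2), each = 6),
  Condition     = rep(c("A", "B"), each = 3, times = 2),
  Type_of_Task  = rep(c("Test", "Experiment", "Experiment"), times = 4),
  Likert_Answer = c(5, NA, NA, NA, NA, NA, 1, NA, NA, 5, NA, NA)
)

filled <- myDF_missing %>%
  group_by(ID, Condition) %>%                 # fill only within one block
  fill(Likert_Answer, .direction = "down") %>%
  ungroup()
filled
```

With this grouping, the three condition B rows of ID 1 stay NA instead of inheriting the 5 from condition A.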
I would like to compare multiple values by USER.
Taking USER "A" as the reference: if its values (A, B, C, D, and E) are the same as those of USER "B", then "B" should be written to a newly created variable EQUAL.
Here is my data
Desired value
I am very new to R, I tried to look at the compare function but got a little overwhelmed. Would very much appreciate any help.
Here's an abridged version of the data you provided:
library(tidyverse)
df <- data.frame(
id = c(1001, 1002, 1003, 1001, 1002, 1003),
user = c('a', 'a', 'a', 'b', 'b', 'b'),
point_a = c(1, 1, NA, 1, 1, NA),
point_b = c(NA, NA, 2, NA, NA, NA),
point_c = c(3, 2, 3, 3, 2, 3),
point_d = c(2, 1, NA, 2, 1, NA),
point_e = c(4, NA, 1, 4, NA, NA)
)
df
id user point_a point_b point_c point_d point_e
1 1001 a 1 NA 3 2 4
2 1002 a 1 NA 2 1 NA
3 1003 a NA 2 3 NA 1
4 1001 b 1 NA 3 2 4
5 1002 b 1 NA 2 1 NA
6 1003 b NA NA 3 NA NA
If you inner_join on the columns you want to match, and then filter for rows where user.x is less than user.y (i.e. earlier in alphabetical order, to get rid of duplicates and rows matching themselves), you should be left with the matches you're looking for:
df %>%
inner_join(df, by = c('point_a', 'point_b', 'point_c', 'point_d', 'point_e')) %>%
filter(user.x < user.y) %>%
rename(user = user.x,
equal = user.y)
id.x user point_a point_b point_c point_d point_e id.y equal
1 1001 a 1 NA 3 2 4 1001 b
2 1002 a 1 NA 2 1 NA 1002 b
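Note that this relies on dplyr joins treating NA as matching NA by default (`na_matches = "na"`), which is why rows with missing points can still pair up. A small sketch with two hypothetical data frames `a` and `b` shows the difference between the default and `na_matches = "never"`:

```r
library(dplyr)

# Hypothetical toy data: one ordinary key and one missing key on each side
a <- data.frame(k = c(1, NA), va = c("a1", "a2"))
b <- data.frame(k = c(1, NA), vb = c("b1", "b2"))

# Default: the NA keys are matched to each other
joined_default <- inner_join(a, b, by = "k")

# Stricter: NA keys never match anything
joined_strict <- inner_join(a, b, by = "k", na_matches = "never")
```

If missing points should not count as equal, `na_matches = "never"` would be the setting to try.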
We may split the data along users, and put the result in mapply and calculate the rowSums of TRUEs after comparison with `==`. From the resulting matrix we want to know which.max which allows us to subset the users (without "A"). The result just needs to be subsetted by user "A".
transform(dat, EQUAL=
split(dat, dat$user) |>
(\(.) mapply(\(x, y) rowSums(x == y, na.rm=TRUE),
unname(.['A']),
.[c('B', 'C')]))() |>
(\(.) sort(unique(dat$user))[-1][apply(., 1, which.max)])()
) |>
(\(.) .[.$user == 'A', ])()
# id user point_a point_b point_c point_d point_e EQUAL
# 1 1001 A 1 NA 3 2 4 B
# 2 1002 A 1 NA 2 1 NA B
# 3 1003 A NA 2 3 NA 1 C
Note: R version 4.1.2 (2021-11-01)
Data:
dat <- structure(list(id = c(1001L, 1002L, 1003L, 1001L, 1002L, 1003L,
1001L, 1002L, 1003L), user = c("A", "A", "A", "B", "B", "B",
"C", "C", "C"), point_a = c(1, 1, NA, 1, 1, NA, 4, 1, NA), point_b = c(NA,
NA, 2, NA, NA, NA, 3, NA, 2), point_c = c(3, 2, 3, 3, 2, 3, 3,
2, 3), point_d = c(2, 1, NA, 2, 1, NA, 2, 1, NA), point_e = c(4,
NA, 1, 4, NA, NA, 4, NA, 1)), class = "data.frame", row.names = c(NA,
-9L))