pivot_longer with groups of columns [duplicate] - r

This question already has an answer here:
How to use Pivot_longer to reshape from wide-type data to long-type data with multiple variables
(1 answer)
Closed 2 years ago.
I've got a dataset that looks like this:
df_start <- tribble(
~name, ~age, ~x1_sn_ctrl1, ~x1_listing2_2, ~x1_affect1, ~x2_sn_ctrl1, ~x1_listing2_2, ~x2_affect1, ~number,
"John", 28, 1, 1, 9, 4, 5, 9, 6,
"Paul", 27, 2, 1, 4, 1, 3, 3, 4,
"Ringo", 31, 3, 1, 2, 2, 5, 8, 9)
I need to pivot_longer() while handling the groupings within my columns:
There are 2 x-values (1 and 2)
There are 3 questions (sn_ctrl1, listing2_2, affect1) for each x-value
In my actual dataset, there are 14 x's.
Essentially, what I'd like to do is to apply pivot_longer() to the x-values but leave my 3 questions (sn_ctrl1, listing2_2, affect1) wide.
What I'd like to end up with is this:
df_end <- tribble(
~name, ~age, ~xval, ~sn_ctrl1, ~listing2_2, ~affect1, ~number,
"John", 28, 1, 1, 1, 9, 6,
"John", 28, 2, 4, 5, 9, 6,
"Paul", 27, 1, 2, 1, 4, 4,
"Paul", 27, 2, 1, 3, 3, 4,
"Ringo", 31, 1, 3, 1, 2, 9,
"Ringo", 31, 2, 2, 5, 8, 9)
I have made many unsuccessful attempts with regex in names_pattern in pivot_longer(), but I keep striking out.
Does anyone know how to tackle this?
Thanks!
PS: Note that I tried to make a straightforward reproducible example. The actual names of my columns vary slightly. For instance, there is x1_sn_ctrl1 & x1_attr1_ctrl2.

You can use:
tidyr::pivot_longer(df_start,
cols = -c(name, age, number),
names_to = c("xval", ".value"),
names_pattern = 'x(\\d+)_(.*)')
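Which yields the output below. Note that df_start as posted contains the column x1_listing2_2 twice. The second copy is presumably meant to be x2_listing2_2, and the duplicated name is what produces the extra xval = 1 rows and the NAs. With that column renamed, the same call returns six rows with the same values as df_end (with the columns in a different order and xval as character).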
# A tibble: 9 x 7
name age number xval sn_ctrl1 listing2_2 affect1
<chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 John 28 6 1 1 1 9
2 John 28 6 2 4 NA 9
3 John 28 6 1 NA 5 NA
4 Paul 27 4 1 2 1 4
5 Paul 27 4 2 1 NA 3
6 Paul 27 4 1 NA 3 NA
7 Ringo 31 9 1 3 1 2
8 Ringo 31 9 2 2 NA 8
9 Ringo 31 9 1 NA 5 NA
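For comparison, here is a base-R sketch of the same reshape, for cases where tidyr is not available. It is not the answer's method. It assumes that every repeated column is named x<digits>_<question>, and that the second listing column is x2_listing2_2 (the posted df_start repeats x1_listing2_2). Each x value becomes one block of rows, and the blocks are stacked.

```r
# Base-R sketch of pivot_longer(names_to = c("xval", ".value")).
# Assumption: repeated columns are named x<digits>_<question>, and the second
# listing column is x2_listing2_2 (the posted df_start repeats x1_listing2_2).
df_start <- data.frame(
  name = c("John", "Paul", "Ringo"),
  age = c(28, 27, 31),
  x1_sn_ctrl1 = c(1, 2, 3),
  x1_listing2_2 = c(1, 1, 1),
  x1_affect1 = c(9, 4, 2),
  x2_sn_ctrl1 = c(4, 1, 2),
  x2_listing2_2 = c(5, 3, 5),
  x2_affect1 = c(9, 3, 8),
  number = c(6, 4, 9),
  stringsAsFactors = FALSE
)

id_cols <- c("name", "age", "number")
x_cols  <- grep("^x[0-9]+_", names(df_start), value = TRUE)
x_index <- sub("^x([0-9]+)_.*$", "\\1", x_cols)   # the digits after the leading "x"

# One block per x value: id columns + xval + question columns without the "x<n>_" prefix
pieces <- lapply(unique(x_index), function(x) {
  cols  <- x_cols[x_index == x]
  block <- df_start[, cols, drop = FALSE]
  names(block) <- sub("^x[0-9]+_", "", cols)
  cbind(df_start[id_cols], .row = seq_len(nrow(df_start)),
        xval = as.integer(x), block)
})

# rbind() matches data-frame columns by name, so question order may differ per x
df_long <- do.call(rbind, pieces)
# Restore the original row order (each person's x values together), then drop the helper
df_long <- df_long[order(df_long$.row, df_long$xval),
                   setdiff(names(df_long), ".row")]
rownames(df_long) <- NULL
df_long
```

The tidyr answer returns xval as <chr>. If you want an integer there, pivot_longer() also accepts names_transform = list(xval = as.integer).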

Related

dplyr solution: absolute difference of two values in one column matched by other column

I have a dataframe that looks like this, but there will be many more IDs:
# Groups: ID [1]
ID ARS stim
<int> <int> <chr>
1 3 0 1
2 3 4 2
3 3 2 3
4 3 3 4
5 3 1 5
6 3 0 6
7 3 2 10
8 3 4 11
9 3 0 12
10 3 3 13
11 3 2 14
12 3 2 15
I would like to calculate the sum of the absolute differences, abs(), between ARS values: for stim=1 and stim=10, plus for stim=2 and stim=11, and so on.
Any good solutions are appreciated!
The desired output calculation is:
abs(0-2) + abs(4-4) + abs(2-0) + abs(3-3) + abs(1-2) + abs(0-2)
Hence, 2+0+2+0+1+2
Output for ID==3: 7
A possible solution:
library(dplyr)
df <- structure(list(ID = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), ARS = c(0, 4, 2, 3, 1, 0, 2, 4, 0, 3, 2, 2), stim = c(1, 2, 3, 4, 5, 6,
10, 11, 12, 13, 14, 15)), row.names = c(NA, -12L), class = "data.frame")
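df %>%
  group_by(ID) %>%
  # match() finds the row of each stim value, so stim 1..6 is paired with stim 10..15
  # (stim == 1:6 would only work through recycling when the rows happen to be sorted)
  summarise(value = sum(abs(ARS[match(1:6, stim)] - ARS[match(10:15, stim)])),
            .groups = "drop")
# one row per ID; for ID 3, value is 7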
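A base-R sketch of the same per-ID calculation follows, in case a dplyr-free version is useful. The pairing of stim 1..6 with stim 10..15 comes from the question. The helper name paired_abs_diff and its offset parameter (the gap of 9 between paired stim values) are hypothetical names chosen for this illustration.

```r
# Base-R sketch: per-ID sum of |ARS(stim = k) - ARS(stim = k + 9)| for k = 1..6.
# paired_abs_diff() and `offset` are hypothetical names for this example.
df <- data.frame(
  ID   = rep(3, 12),
  ARS  = c(0, 4, 2, 3, 1, 0, 2, 4, 0, 3, 2, 2),
  stim = c(1:6, 10:15)
)

paired_abs_diff <- function(d, first = 1:6, offset = 9) {
  # match() returns the row of each stim value (NA if that stim is absent)
  a <- d$ARS[match(first, d$stim)]
  b <- d$ARS[match(first + offset, d$stim)]
  sum(abs(a - b))
}

# split() gives one data frame per ID; sapply() returns a vector named by ID
res <- sapply(split(df, df$ID), paired_abs_diff)
res
```

If an ID lacks one of the stim values, match() yields NA and the sum is NA. Passing na.rm = TRUE to sum() would instead skip incomplete pairs.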

replace negative values with na using na_if{dplyr}

Let's say I have the following dataframe:
dat <- tribble(
~V1, ~V2,
2, -3,
3, 2,
1, 3,
3, -4,
5, 1,
3, 2,
1, -4,
3, 4,
4, 1,
3, -5,
4, 2,
3, 4
)
How can I replace negative values with NA using na_if()? I know how to do this with ifelse(), but I can't come up with a correct condition for na_if():
> dat %>%
+ mutate(V2 = ifelse(V2 < 0, NA, V2))
# A tibble: 12 x 2
V1 V2
<dbl> <dbl>
1 2 NA
2 3 2
3 1 3
4 3 NA
5 5 1
6 3 2
7 1 NA
8 3 4
9 4 1
10 3 NA
11 4 2
12 3 4
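dplyr's na_if(x, y) replaces values of x that are equal to y, so it does not directly accept a condition such as V2 < 0. The sketch below swaps in base R's replace(), which sets the positions where a logical condition is TRUE. Inside a dplyr pipeline, the equivalent would be mutate(V2 = replace(V2, V2 < 0, NA)).

```r
# Sketch: na_if() matches on equality, so a "less than zero" rule is
# expressed here with base R's replace() instead.
dat <- data.frame(
  V1 = c(2, 3, 1, 3, 5, 3, 1, 3, 4, 3, 4, 3),
  V2 = c(-3, 2, 3, -4, 1, 2, -4, 4, 1, -5, 2, 4)
)

# replace(x, list, values): positions where V2 < 0 is TRUE become NA
dat$V2 <- replace(dat$V2, dat$V2 < 0, NA)
dat
```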

dplyr Find records that have a specific set of values

I have a dataset with IDs and associated timepoints. I want to keep only the IDs that have a specific combination of timepoints. If I filter using %in% or |, I also get IDs that have only some of those timepoints. How do I do this in R?
ID  Timepoint
1   1
1   6
1   12
2   1
3   1
3   6
3   12
3   18
4   1
4   6
4   12
I want to keep the IDs that have all of the timepoints 1, 6 and 12 and exclude the other IDs.
The result should be IDs 1, 3 and 4.
library(dplyr)
library(tidyr)  # complete() comes from tidyr
df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))
df %>%
filter(Timepoint %in% c(1, 6, 12)) %>%
mutate(indicator = 1) %>%
group_by(ID) %>%
complete(Timepoint = c(1, 6, 12)) %>%
filter(!ID %in% pull(filter(., is.na(indicator)), ID)) %>%
select(indicator)
Output:
# A tibble: 9 × 2
# Groups: ID [3]
ID indicator
<dbl> <dbl>
1 1 1
2 1 1
3 1 1
4 3 1
5 3 1
6 3 1
7 4 1
8 4 1
9 4 1
We can use
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(c(1, 6, 12) %in% Timepoint)) %>%
ungroup
-output
# A tibble: 10 x 2
ID Timepoint
<dbl> <dbl>
1 1 1
2 1 6
3 1 12
4 3 1
5 3 6
6 3 12
7 3 18
8 4 1
9 4 6
10 4 12
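The all(c(1, 6, 12) %in% Timepoint) check in the answer above can also be written in base R. The sketch below uses tapply() in place of group_by()/filter(). The names required, has_all and keep_ids are chosen only for this illustration.

```r
# Base-R sketch of "keep IDs that have every required timepoint".
# required, has_all and keep_ids are illustrative names.
df <- data.frame(
  ID        = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
  Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
)
required <- c(1, 6, 12)

# tapply() runs the check once per ID and returns a logical vector named by ID
has_all  <- tapply(df$Timepoint, df$ID, function(tp) all(required %in% tp))
keep_ids <- as.numeric(names(has_all)[has_all])

# Keep every row of the qualifying IDs (including extra timepoints such as 18)
df_kept <- df[df$ID %in% keep_ids, ]
keep_ids
```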
From your data, ID 2 has time point 1. So if you filter by time points 1, 6 and 12, the result will be 1, 2, 3, 4 instead of 1, 3, 4.
ids <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4)
time_points <- c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
dat <- data.frame(ids, time_points)
unique(dat$ids[dat$time_points %in% c(1, 6, 12)])
#> [1] 1 2 3 4

How to find corresponding ID comparing values in two columns?

I have the following problem, which I illustrated in an attached picture. Any help solving it would be greatly appreciated.
In Table 1, the IDs are correctly linked to the values in column A. Column B contains values that are not ordered and whose IDs are not given in the same rows. We need to find the ID of each value in column B by looking it up in column A.
If we run the following code in R, we get the IDs corresponding to the values in column B:
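mydata <- read.csv('C:/Users/Windows/Desktop/practice_1.csv')
# header = TRUE is a read.csv() argument (and its default), not a data.frame() one
df <- data.frame(B = mydata$B, A = mydata$A, ID = mydata$ID)
library(qdap)
# %l% looks up each value of B in the first column of df[, -1] (A) and returns the matching ID
df[, "New ID"] <- df[, 1] %l% df[, -1]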
After running the code above, the new IDs appear in the column New ID, as in Table 2.
What you need is a simple match operation:
table1$ID2 <- table1$ID1[match(table1$z, table1$y)]
table1
# ID1 y z ID2
# 1 0 1 11 10
# 2 1 2 3 2
# 3 2 3 5 4
# 4 3 4 4 3
# 5 4 5 8 7
# 6 5 6 7 6
# 7 6 7 15 15
# 8 7 8 6 5
# 9 8 9 2 1
# 10 9 10 16 17
# 11 10 11 1 0
# 12 11 12 NA NA
# 13 15 15 NA NA
# 14 17 16 NA NA
The next time you ask a question where sample data is necessary (most questions), please provide the data in this format:
Data
# dput(table1)
structure(list(ID1 = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 17), y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16), z = c(11, 3, 5, 4, 8, 7, 15, 6, 2, 16, 1, NA, NA, NA), ID2 = c(10, 2, 4, 3, 7, 6, 15, 5, 1, 17, 0, NA, NA, NA)), row.names = c(NA, -14L), class = "data.frame")
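To show what the match() lookup in the answer does step by step, here is a minimal illustration on made-up vectors. They are not the question's data. match(z, y) gives the position of each z value in y (NA when it is absent), and indexing the ID vector with those positions returns the IDs.

```r
# Minimal illustration of id[match(z, y)]; the vectors are made up for the example.
y  <- c(1, 2, 3, 4)   # values whose IDs are known
id <- c(0, 1, 2, 3)   # ID of each value in y (same positions)
z  <- c(3, 1, 9)      # values whose IDs we want

pos <- match(z, y)    # position of each z in y; NA when z is not found
id[pos]               # ID at that position; NA propagates for unmatched values
```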

Replacing variables [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
Apologies in advance if this is too basic for this community.
I have a data frame like the one attached here. For every missing value (NA) in column B4, I want R to compute B1 + B2 /2 and use the result to replace that NA.
Any advice will be appreciated.
Thanks
Create an index for the NAs in B4 and use that index to assign the formula's results.
i <- is.na(df1$B4)
df1$B4[i] <- with(df1[i, ], B1 + B2/2)
df1
# B1 B2 B3 B4
#1 2 NA 5 NA
#2 4 4 9 6
#3 3 6 8 6
#4 6 2 NA 10
#5 4 6 2 12
#6 2 6 3 14
#7 1 1 1 3
#8 1 3 6 6
#9 NA 7 NA 2
Data in dput format
df1 <-
structure(list(B1 = c(2, 4, 3, 6, 4, 2, 1, 1, NA),
B2 = c(NA, 4, 6, 2, 6, 6, 1, 3, 7), B3 = c(5, 9, 8,
NA, 2, 3, 1, 6, NA), B4 = c(NA, 6, NA, 10, 12, 14, 3,
6, 2)), class = "data.frame", row.names = c(NA, -9L))
Might this be what you're looking for? (This uses the native pipe |>, which requires R 4.1.0 or later.)
library(tidyverse)
my_df <- tibble(
b1 = c(2, 4, 3, 6, 4, 2, 1, 1, NA),
b2 = c(NA, 4, 6, 2, 6, 6, 1, 3, 7),
b3 = c(5, 9, 8, NA, 2, 3, 1, 6, NA),
b4 = c(NA, 6, NA, 10, 12, 14, 3, 6, 2)
)
my_df |> mutate(
b4 = case_when(
is.na(b4) ~ b1 + b2 / 2,
TRUE ~ b4
)
)
Results in:
# A tibble: 9 x 4
b1 b2 b3 b4
<dbl> <dbl> <dbl> <dbl>
1 2 NA 5 NA
2 4 4 9 6
3 3 6 8 6
4 6 2 NA 10
5 4 6 2 12
6 2 6 3 14
7 1 1 1 3
8 1 3 6 6
9 NA 7 NA 2
Note, though, that row 1 has NA in both b2 and b4, so b4 stays NA there. I'm not sure that is what you want.
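Both answers implement the formula literally. In R, / binds more tightly than +, so B1 + B2 /2 means B1 + (B2 / 2). The question does not say which of the following was intended, so the sketch below simply compares three readings on the NA rows of B4: the formula as written, the mean of B1 and B2, and rowMeans() with na.rm = TRUE. The last one averages whichever of B1 and B2 is present, so it fills row 1.

```r
# Compare three readings of "B1 + B2 /2" on the rows where B4 is NA.
# Which one the asker meant is an assumption the question leaves open.
df1 <- data.frame(
  B1 = c(2, 4, 3, 6, 4, 2, 1, 1, NA),
  B2 = c(NA, 4, 6, 2, 6, 6, 1, 3, 7),
  B3 = c(5, 9, 8, NA, 2, 3, 1, 6, NA),
  B4 = c(NA, 6, NA, 10, 12, 14, 3, 6, 2)
)
i <- is.na(df1$B4)   # rows 1 and 3

# `/` binds tighter than `+`, so this is B1 + (B2 / 2)
as_written <- with(df1[i, ], B1 + B2 / 2)
# The mean of B1 and B2 needs explicit parentheses
as_mean    <- with(df1[i, ], (B1 + B2) / 2)
# rowMeans(na.rm = TRUE) averages whichever of B1, B2 is not NA
as_rowmean <- rowMeans(df1[i, c("B1", "B2")], na.rm = TRUE)

as_written
as_mean
as_rowmean
```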
