I have two data frames, let's say A and B.
Table A is constructed like this:
ID a b c d
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
And table B is constructed like this:
A B
a 1
a 2
a 3
b 2
b 6
b 8
b 9
c 1
c 6
c 11
d 5
d 4
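Basically, for each ID in table A, I'd like to change the NA to 1 in a given column (say a) if that ID appears in column B of table B next to that letter in column A. For example, the cell for ID 1 in column a becomes 1 because table B contains the row "a 1".
I'm not sure this is the best way to do it; maybe using a matrix would be simpler.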
I think what you want is to convert B to a dense table, with a 1 wherever that combination is present in B. You can do that by recognising that B holds the same data in long form, just with the cell values left out. We need to add the value column and then spread to convert from long to wide:
library(tidyverse)
tbl_b <- tibble(
A = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d"),
B = c(1, 2, 3, 2, 6, 8, 9, 1, 6, 11, 5, 4)
)
tbl_b %>%
mutate(value = 1) %>%
spread(A, value)
#> # A tibble: 9 x 5
#> B a b c d
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 NA 1 NA
#> 2 2 1 1 NA NA
#> 3 3 1 NA NA NA
#> 4 4 NA NA NA 1
#> 5 5 NA NA NA 1
#> 6 6 NA 1 1 NA
#> 7 8 NA 1 NA NA
#> 8 9 NA 1 NA NA
#> 9 11 NA NA 1 NA
Created on 2019-02-22 by the reprex package (v0.2.1)
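As a hedged follow-up sketch: spread() has been superseded by pivot_wider() in newer versions of tidyr, and the wide table can be joined back onto the IDs of table A so that only IDs 1 to 4 are kept. The name tbl_a_ids is made up for illustration, and the ID column is assumed to be numeric.

```r
library(dplyr)
library(tidyr)

tbl_b <- tibble(
  A = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d"),
  B = c(1, 2, 3, 2, 6, 8, 9, 1, 6, 11, 5, 4)
)

# Hypothetical stand-in for the ID column of table A
tbl_a_ids <- tibble(ID = c(1, 2, 3, 4))

# Long to wide: one column per letter, 1 where the pair exists in B
wide <- tbl_b %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = A, values_from = value)

# Keep only the IDs that exist in table A
result <- tbl_a_ids %>%
  left_join(wide, by = c("ID" = "B"))

result
```

Cells with no matching pair in B stay NA, which matches the layout of table A in the question.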
I have a data frame like this:
dt <- tibble(
TRIAL = c("A", "A", "A", "B", "B", "B", "C", "C", "C","D","D","D"),
RL = c(1, NA, 3, 1, 6, 3, 2, 3, 1, 0, 1.5, NA),
SL = c(6, 1.5, 1, 0, 0, 1, 1, 2, 0, 1, 1.5, NA),
HC = c(0, 1, 5, 6,7, 8, 9, 3, 4, 5, 4, 2)
)
# A tibble: 12 x 4
TRIAL RL SL HC
<chr> <dbl> <dbl> <dbl>
1 A 1 6 0
2 A NA 1.5 1
3 A 3 1 5
4 B 1 0 6
5 B 6 0 7
6 B 3 1 8
7 C 2 1 9
8 C 3 2 3
9 C 1 0 4
10 D 0 1 5
11 D 1.5 1.5 4
12 D NA NA 2
I want to group the data frame by TRIAL and check the values in RL and SL within each group. If any value in either column is greater than 5, all RL and SL values for that group should be moved to RLCT and SLCT respectively.
# A tibble: 12 x 6
TRIAL HC RLCT SLCT SL RL
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0 1 6 NA NA
2 A 1 NA 1.5 NA NA
3 A 5 3 1 NA NA
4 B 6 1 0 NA NA
5 B 7 6 0 NA NA
6 B 8 3 1 NA NA
7 C 9 NA NA 1 2
8 C 3 NA NA 2 3
9 C 4 NA NA 0 1
10 D 5 NA NA 1 0
11 D 4 NA NA 1.5 1.5
12 D 2 NA NA NA NA
When I run the code below, I do not get the expected output:
dt0 <- dt %>%
mutate(RLCT = NA,
SLCT = NA) %>%
group_by(TRIAL) %>%
filter(!any(RL > 5.0 | SL > 5.0))
dt1 <- dt %>%
group_by(TRIAL) %>%
filter(any(RL > 5.0 | SL > 5.0)) %>%
mutate(RLCT = RL,
SLCT = SL) %>%
rbind(dt0, .) %>%
mutate(RL = ifelse(!is.na(RLCT), NA, RL),
SL = ifelse(!is.na(SLCT), NA, SL)) %>% arrange(TRIAL)
This is what I get
# A tibble: 9 x 6
# Groups: TRIAL [3]
TRIAL RL SL HC RLCT SLCT
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A NA NA 0 1 6
2 A NA NA 1 NA 1.5
3 A NA NA 5 3 1
4 B NA NA 6 1 0
5 B NA NA 7 6 0
6 B NA NA 8 3 1
7 C 2 1 9 NA NA
8 C 3 2 3 NA NA
9 C 1 0 4 NA NA
You can define a column to store the condition, and then change RL and SL with ifelse inside across.
dt %>%
group_by(TRIAL) %>%
mutate(cond = any(RL > 5.0 | SL > 5.0, na.rm = TRUE),
across(c(RL, SL), ~ ifelse(cond, ., NA), .names = "{.col}CT"),
across(c(RL, SL), ~ ifelse(!cond, ., NA)),
cond = NULL)
Result:
# A tibble: 12 x 6
# Groups: TRIAL [4]
TRIAL RL SL HC RLCT SLCT
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A NA NA 0 1 6
2 A NA NA 1 NA 1.5
3 A NA NA 5 3 1
4 B NA NA 6 1 0
5 B NA NA 7 6 0
6 B NA NA 8 3 1
7 C 2 1 9 NA NA
8 C 3 2 3 NA NA
9 C 1 0 4 NA NA
10 D 0 1 5 NA NA
11 D 1.5 1.5 4 NA NA
12 D NA NA 2 NA NA
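A short sketch of why na.rm = TRUE matters here, using the RL and SL values of group D from the question. Without it, any() returns NA for that group, and filter() drops rows whose condition is NA, which is most likely why group D is missing from both halves of the original attempt.

```r
# RL and SL values for TRIAL == "D", copied from the question
rl <- c(0, 1.5, NA)
sl <- c(1, 1.5, NA)

cond_d <- rl > 5 | sl > 5   # FALSE FALSE NA

any(cond_d)                 # NA: no TRUE found, but the NA could hide one
any(cond_d, na.rm = TRUE)   # FALSE: the NA is ignored
```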
With dplyr, you could use group_modify():
library(dplyr)
dt %>%
group_by(TRIAL) %>%
group_modify(~ {
if(any(select(.x, c(RL, SL)) > 5, na.rm = TRUE)) {
rename_with(.x, ~ paste0(.x, 'CT'), c(RL, SL))
} else {
.x
}
})
Output
# A tibble: 12 × 6
# Groups: TRIAL [4]
TRIAL RLCT SLCT HC RL SL
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 1 6 0 NA NA
2 A NA 1.5 1 NA NA
3 A 3 1 5 NA NA
4 B 1 0 6 NA NA
5 B 6 0 7 NA NA
6 B 3 1 8 NA NA
7 C NA NA 9 2 1
8 C NA NA 3 3 2
9 C NA NA 4 1 0
10 D NA NA 5 0 1
11 D NA NA 4 1.5 1.5
12 D NA NA 2 NA NA
I have created the following dataframe in R
library(tidyr)
library(dplyr)
DF11<- data.frame("ID"= c("A", "A", "A", "B", "B", "B", "B", "B"))
DF11$X_F<-c(5, 7,9,6,7,8,9,10)
DF11$X_A<-c(7, 8,9,3,6,7,9,10)
The dataframe looks as follows
ID X_F X_A
A 5 7
A 7 8
A 9 9
B 6 3
B 7 6
B 8 7
B 9 9
B 10 10
ID is the grouping variable. I would like to use dplyr to create the following dataframe.
ID X_F X_A
A 0 NA
A 1 NA
A 2 NA
A 3 NA
A 4 NA
A 5 7
A 7 8
A 9 9
A 10 NA
A 11 NA
A 12 NA
B 0 NA
B 1 NA
B 2 NA
B 3 NA
B 4 NA
B 5 NA
B 6 3
B 7 6
B 8 7
B 9 9
B 10 10
B 11 NA
B 12 NA
B 13 NA
The result should take DF11, group it by ID, and complete X_F within each group: from 0 up to the group's minimum X_F, and from the group's maximum X_F up to that maximum plus 3. Values missing between the minimum and the maximum should not be filled in.
I tried the following code and was able to solve it partially.
DF112<-DF11%>%group_by(ID)%>%complete(X_F=seq(0, max(X_F)+3, by =1))
ID X_F X_A
A 0 NA
A 1 NA
A 2 NA
A 3 NA
A 4 NA
A 5 7
A 6 NA
A 7 8
A 8 NA
A 9 9
A 10 NA
A 11 NA
A 12 NA
B 0 NA
B 1 NA
B 2 NA
B 3 NA
B 4 NA
B 5 NA
B 6 3
B 7 6
B 8 7
B 9 9
B 10 10
B 11 NA
B 12 NA
B 13 NA
How do I get the desired output shown above? I would appreciate any guidance.
It would work to pass two vectors into your complete function call, one to do the lower values and one to do the upper:
library(tidyr)
library(dplyr)
DF11 <- data.frame("ID" = c("A", "A", "A", "B", "B", "B", "B", "B"))
DF11$X_F <- c(5, 7, 9, 6, 7, 8, 9, 10)
DF11$X_A <- c(7, 8, 9, 3, 6, 7, 9, 10)
DF11 %>%
group_by(ID) %>%
complete(X_F = c(seq(0, min(X_F) - 1 , by = 1), seq(max(X_F) + 1, max(X_F) + 3, by = 1))) %>%
arrange(ID, X_F)
# A tibble: 25 × 3
# Groups: ID [2]
ID X_F X_A
<chr> <dbl> <dbl>
1 A 0 NA
2 A 1 NA
3 A 2 NA
4 A 3 NA
5 A 4 NA
6 A 5 7
7 A 7 8
8 A 9 9
9 A 10 NA
10 A 11 NA
11 A 12 NA
12 B 0 NA
13 B 1 NA
14 B 2 NA
15 B 3 NA
16 B 4 NA
17 B 5 NA
18 B 6 3
19 B 7 6
20 B 8 7
21 B 9 9
22 B 10 10
23 B 11 NA
24 B 12 NA
25 B 13 NA
Created on 2022-11-01 with reprex v2.0.2
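One edge case worth noting: seq(0, min(X_F) - 1, by = 1) raises an error if a group's minimum is already 0, because the sequence would have to count downwards. A hedged variant is sketched below with a small helper, pad_values(), which is a made-up name and not part of tidyr. It assumes X_F holds non-negative whole numbers.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: values below the minimum (starting at 0) and
# `extra` values above the maximum. seq_len(0) is empty, so a minimum
# of 0 simply adds nothing at the lower end.
pad_values <- function(x, extra = 3) {
  c(seq_len(min(x)) - 1, max(x) + seq_len(extra))
}

DF11 <- data.frame(
  ID  = c("A", "A", "A", "B", "B", "B", "B", "B"),
  X_F = c(5, 7, 9, 6, 7, 8, 9, 10),
  X_A = c(7, 8, 9, 3, 6, 7, 9, 10)
)

out <- DF11 %>%
  group_by(ID) %>%
  complete(X_F = pad_values(X_F)) %>%
  arrange(ID, X_F) %>%
  ungroup()

pad_values(c(0, 2))  # only the upper values, no error
```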
I understand how to use complete() from the tidyverse (tidyverse: complete a dataframe).
In the example they give:
df <- tibble(
group = c(1:2, 1, 2),
item_id = c(1:2, 2, 3),
item_name = c("a", "a", "b", "b"),
value1 = c(1, NA, 3, 4),
value2 = 4:7
)
df
#> # A tibble: 4 × 5
#> group item_id item_name value1 value2
#> <dbl> <dbl> <chr> <dbl> <int>
#> 1 1 1 a 1 4
#> 2 2 2 a NA 5
#> 3 1 2 b 3 6
#> 4 2 3 b 4 7
Is there a way of adding a group and completing? e.g. add a group 3 and complete the table.
For example, I have a df which I populate in a for loop to make a plot. The df is like so:
variant Location Position variable value protein Mutation.type
FANCI_L605F FANCI chr15:89828441_C/T B 0.45 L605F nonsynonymous_SNV
PLCG2_R953* PLCG2 chr16:81969788_C/T B 0.87 R953* stopgain
STAT3_R278C STAT3 chr17:40486033_G/A B 0.38 R278C nonsynonymous_SN
FANCI_L605F FANCI chr15:89828441_C/T C 0.45 L605F nonsynonymous_SNV
PLCG2_R953* PLCG2 chr16:81969788_C/T C 0.87 R953* stopgain
STAT3_R278C STAT3 chr17:40486033_G/A C 0.38 R278C nonsynonymous_SNV
I also have a vector of possible variable names:
all_var<-c("A","B","C")
I have worked out how to add any missing variables (I think):
new_df<-complete(df,variable=all_var,Position)
>new_df
variant Location Position variable value protein Mutation.type
NA NA chr15:89828441_C/T A NA NA NA
NA NA chr16:81969788_C/T A NA NA NA
NA NA chr17:40486033_G/A A NA NA NA
FANCI_L605F FANCI chr15:89828441_C/T B 0.45 L605F nonsynonymous_SNV
PLCG2_R953* PLCG2 chr16:81969788_C/T B 0.87 R953* stopgain
STAT3_R278C STAT3 chr17:40486033_G/A B 0.38 R278C nonsynonymous_SN
FANCI_L605F FANCI chr15:89828441_C/T C 0.45 L605F nonsynonymous_SNV
PLCG2_R953* PLCG2 chr16:81969788_C/T C 0.87 R953* stopgain
STAT3_R278C STAT3 chr17:40486033_G/A C 0.38 R278C nonsynonymous_SNV
How do I now fill in variant, Location, protein and Mutation.type for the added rows?
You can add a row that specifies the new group and then use complete() to fill in the combinations, i.e.
library(dplyr)
library(tidyr)
df %>%
add_row(group = 3) %>%
complete(group, nesting(item_id, item_name)) %>%
drop_na(item_id)
# A tibble: 12 x 5
group item_id item_name value1 value2
<dbl> <dbl> <chr> <dbl> <int>
1 1 1 a 1 4
2 1 2 a NA NA
3 1 2 b 3 6
4 1 3 b NA NA
5 2 1 a NA NA
6 2 2 a NA 5
7 2 2 b NA NA
8 2 3 b 4 7
9 3 1 a NA NA
10 3 2 a NA NA
11 3 2 b NA NA
12 3 3 b NA NA
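For the second part of the question (filling in the descriptive columns for a newly added variable), one possible approach is to put every column that travels together with Position inside nesting(), so that complete() copies them into the new rows. This is a sketch on a cut-down copy of the data that keeps only some of the columns; the name muts is made up for illustration.

```r
library(tibble)
library(tidyr)

# Cut-down copy of the data from the question (fewer columns)
muts <- tibble(
  variant  = rep(c("FANCI_L605F", "PLCG2_R953*", "STAT3_R278C"), 2),
  Location = rep(c("FANCI", "PLCG2", "STAT3"), 2),
  Position = rep(c("chr15:89828441_C/T", "chr16:81969788_C/T",
                   "chr17:40486033_G/A"), 2),
  variable = rep(c("B", "C"), each = 3),
  value    = rep(c(0.45, 0.87, 0.38), 2)
)

all_var <- c("A", "B", "C")

# nesting() keeps variant, Location and Position together as observed,
# so the new rows for variable "A" inherit them; only value stays NA
new_df <- complete(muts, variable = all_var,
                   nesting(variant, Location, Position))

new_df
```

The remaining columns from the question (protein, Mutation.type) could presumably be added to the same nesting() call, provided they are constant for each Position.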
I have a dataframe like this:
df <- tibble(id = c(rep('A', 10), rep('B', 10)),
value = c(1:3, rep(NA, 2), 1:2, rep(NA, 3), 1, rep(NA, 4), 1:3, rep(NA, 2)))
I need to count the number of consecutive NAs in the value column. The count needs to be grouped by id, and it needs to restart at 1 every time a new NA or a new series of NAs is encountered. The expected output should look like this:
df$expected_output <- c(rep(NA, 3), 1:2, rep(NA, 2), 1:3, NA, 1:4, rep(NA, 3), 1:2)
If anyone can give me a dplyr solution, that would also be great :)
I've tried a few things, but nothing gives any sensible result. Thanks in advance!
A solution using dplyr and data.table.
library(dplyr)
library(data.table)
df2 <- df %>%
group_by(id) %>%
mutate(info = rleid(value)) %>%
group_by(id, info) %>%
mutate(expected_output = row_number()) %>%
ungroup() %>%
mutate(expected_output = ifelse(!is.na(value), NA, expected_output)) %>%
select(-info)
df2
# # A tibble: 20 x 3
# id value expected_output
# <chr> <dbl> <int>
# 1 A 1 NA
# 2 A 2 NA
# 3 A 3 NA
# 4 A NA 1
# 5 A NA 2
# 6 A 1 NA
# 7 A 2 NA
# 8 A NA 1
# 9 A NA 2
# 10 A NA 3
# 11 B 1 NA
# 12 B NA 1
# 13 B NA 2
# 14 B NA 3
# 15 B NA 4
# 16 B 1 NA
# 17 B 2 NA
# 18 B 3 NA
# 19 B NA 1
# 20 B NA 2
We can use rle to get the lengths of the runs that are or are not NA, and use purrr::map2 to apply seq to the NA runs (giving the growing count) and rep to fill the other runs with NA.
library(tidyverse)
count_na <- function(x) {
r <- rle(is.na(x))
consec <- map2(r$lengths, r$values, ~ if (.y) seq(.x) else rep(NA, .x))
unlist(consec)
}
df %>%
mutate(expected_output = count_na(value))
#> # A tibble: 20 × 3
#> id value expected_output
#> <chr> <dbl> <int>
#> 1 A 1 NA
#> 2 A 2 NA
#> 3 A 3 NA
#> 4 A NA 1
#> 5 A NA 2
#> 6 A 1 NA
#> 7 A 2 NA
#> 8 A NA 1
#> 9 A NA 2
#> 10 A NA 3
#> 11 B 1 NA
#> 12 B NA 1
#> 13 B NA 2
#> 14 B NA 3
#> 15 B NA 4
#> 16 B 1 NA
#> 17 B 2 NA
#> 18 B 3 NA
#> 19 B NA 1
#> 20 B NA 2
Here is a solution using rle:
x <- rle(is.na(df$value))
df$new[is.na(df$value)] <- sequence(x$lengths[x$values])
# A tibble: 20 x 3
id value new
<chr> <dbl> <int>
1 A 1 NA
2 A 2 NA
3 A 3 NA
4 A NA 1
5 A NA 2
6 A 1 NA
7 A 2 NA
8 A NA 1
9 A NA 2
10 A NA 3
11 B 1 NA
12 B NA 1
13 B NA 2
14 B NA 3
15 B NA 4
16 B 1 NA
17 B 2 NA
18 B 3 NA
19 B NA 1
20 B NA 2
Yet another solution:
library(tidyverse)
df %>%
mutate(aux =data.table::rleid(value)) %>%
group_by(id, aux) %>%
mutate(eout = ifelse(is.na(value), row_number(), NA_real_)) %>%
ungroup %>% select(-aux)
#> # A tibble: 20 × 4
#> id value expected_output eout
#> <chr> <dbl> <int> <dbl>
#> 1 A 1 NA NA
#> 2 A 2 NA NA
#> 3 A 3 NA NA
#> 4 A NA 1 1
#> 5 A NA 2 2
#> 6 A 1 NA NA
#> 7 A 2 NA NA
#> 8 A NA 1 1
#> 9 A NA 2 2
#> 10 A NA 3 3
#> 11 B 1 NA NA
#> 12 B NA 1 1
#> 13 B NA 2 2
#> 14 B NA 3 3
#> 15 B NA 4 4
#> 16 B 1 NA NA
#> 17 B 2 NA NA
#> 18 B 3 NA NA
#> 19 B NA 1 1
#> 20 B NA 2 2
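Note that the rle-based answers above do not group by id. They give the expected result on this data only because no run of NAs crosses from one id into the next. A minimal sketch of a grouped variant follows, using a made-up helper count_na_run() and a small example where a run does cross the boundary.

```r
library(dplyr)

# Hypothetical helper: position within each run, kept only for NA runs
count_na_run <- function(x) {
  r <- rle(is.na(x))
  out <- sequence(r$lengths)  # 1, 2, ... within every run
  out[!is.na(x)] <- NA        # blank out the non-NA runs
  out
}

# A run of NAs that spans the boundary between ids A and B
small <- tibble(id = c("A", "A", "B", "B"),
                value = c(1, NA, NA, 2))

ungrouped <- count_na_run(small$value)  # NA 1 2 NA: count leaks into B

grouped <- small %>%
  group_by(id) %>%
  mutate(n = count_na_run(value)) %>%
  ungroup()

grouped$n                               # NA 1 1 NA: restarts for B
```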
I would like to convert repeating values in a vector into NA's, such that I keep the position of the first occurrence of each new value.
I can find lots of posts on how to solve the removal of duplicate rows, but no posts that solve this issue.
Can you help me convert the column "problem" into the values in the column "desire"?
dplyr solutions are preferred.
library(tidyverse)
df <- tribble(
~frame, ~problem, ~desire,
1, NA, NA,
2, "A", "A",
3, NA, NA,
4, "B", "B",
5, "B", NA,
6, NA, NA,
7, "C", "C",
8, "C", NA,
9, NA, NA,
10, "E", "E")
df
# A tibble: 10 x 3
frame problem desire
<dbl> <chr> <chr>
1 1 NA NA
2 2 A A
3 3 NA NA
4 4 B B
5 5 B NA
6 6 NA NA
7 7 C C
8 8 C NA
9 9 NA NA
10 10 E E
EDIT with "base R" / "dplyr" solution:
Ronak Shah's solution works. Here it is within a dplyr workflow in case anyone is interested:
df %>%
mutate(
solved = replace(problem, duplicated(problem), NA))
# A tibble: 10 x 4
frame problem desire solved
<dbl> <chr> <chr> <chr>
1 1 NA NA NA
2 2 A A A
3 3 NA NA NA
4 4 B B B
5 5 B NA NA
6 6 NA NA NA
7 7 C C C
8 8 C NA NA
9 9 NA NA NA
10 10 E E E
Using rleid from data.table, we can replace the duplicated values with NA.
library(data.table)
df$answer <- replace(df$problem, duplicated(rleid(df$problem)), NA)
# frame problem desire answer
# <dbl> <chr> <chr> <chr>
# 1 1 NA NA NA
# 2 2 A A A
# 3 3 NA NA NA
# 4 4 B B B
# 5 5 B NA NA
# 6 6 NA NA NA
# 7 7 C C C
# 8 8 C NA NA
# 9 9 NA NA NA
#10 10 E E E
For a complete base R option, we can use rle instead of rleid to create the run ids:
df$answer <- replace(df$problem, duplicated(with(rle(df$problem),
rep(seq_along(values), lengths))), NA)
If, as in the example shown, all identical values are always adjacent, duplicated alone is enough:
df$problem <- replace(df$problem, duplicated(df$problem), NA)
We can use data.table
library(data.table)
setDT(df)[duplicated(rleid(problem)), problem := NA][]
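The difference between plain duplicated() and the run-based variants only shows when a value comes back later in the vector. A small base R sketch on a made-up vector x (not the data from the question):

```r
# Hypothetical vector in which "A" reappears after other values
x <- c("A", "A", NA, "B", "B", "A")

# Global de-duplication: the later "A" is also blanked
first_only <- replace(x, duplicated(x), NA)

# Run-based: number each run of equal neighbours, keep the first of each run
run_ids <- with(rle(x), rep(seq_along(values), lengths))
first_of_run <- replace(x, duplicated(run_ids), NA)

first_only    # "A" NA NA "B" NA NA
first_of_run  # "A" NA NA "B" NA "A"
```

Which of the two is wanted depends on whether a reappearing value should count as a new occurrence.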