Here is my data:
df <- tibble::tribble(
~A, ~B, ~C,
"a", "b", 2L,
"a", "b", 4L,
"c", "d", 3L,
"c", "d", 5L
)
var <- "AB"
I want to get this output:
df1 <- df %>%
unite("AB", c("A", "B")) %>%
group_by(AB) %>%
nest()
However, I want to refer to the column through var, perhaps using rlang, rather than typing "AB" manually. I tried the following, but it does not give the desired output:
df1 <- df %>%
unite(var, c("A", "B")) %>%
group_by(!!var) %>%
nest()
This solved the problem:
df1 <- df %>%
unite(!!var, c("A", "B")) %>%
group_by(!!sym(var)) %>%
nest()
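Why the two calls need different treatment: unite() expects a name for the new column, so unquoting the string with !!var is enough, whereas group_by(!!var) groups by the constant string "AB" rather than by the column called AB. A minimal sketch of an alternative that avoids sym(), assuming dplyr 1.0 or later so that across() and all_of() are available:

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~A, ~B, ~C,
  "a", "b", 2L,
  "a", "b", 4L,
  "c", "d", 3L,
  "c", "d", 5L
)
var <- "AB"

df1 <- df %>%
  # var holds a string, so !!var supplies the new column's name
  unite(!!var, c("A", "B")) %>%
  # all_of() selects the column whose name is stored in var
  group_by(across(all_of(var))) %>%
  nest()
```

Either form should give one row per AB value with the remaining columns nested in data.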
I need to duplicate rows to fill in the missing dates in a data frame whose dates are not consecutive.
Suppose this df:
df <- data.frame(date = c("2022-07-05", "2022-07-07", "2022-07-11", "2022-07-15", "2022-07-18"), letter = c("a", "b", "a", "b", "c"))
The desired output is this df_new:
df_new <- data.frame(date = c("2022-07-05", "2022-07-06",
"2022-07-07", "2022-07-08", "2022-07-09", "2022-07-10",
"2022-07-11", "2022-07-12", "2022-07-13", "2022-07-14",
"2022-07-15"),
letter = c("a", "a",
"b", "b", "b", "b",
"a", "a", "a", "a",
"c"))
Could you please help?
We could use complete() from tidyr to expand the data to every date between the min and max date, in steps of '1 day', and then use fill() to replace the NA elements in 'letter' with the previous non-NA element.
library(dplyr)
library(tidyr)
df %>%
mutate(date = as.Date(date)) %>%
complete(date = seq(min(date), max(date), by = '1 day')) %>%
fill(letter)
I would like to compare df1 and df2 in R and find the entries that are present in df1 but not in df2, and those that are present in df2 but not in df1.
Input:
df1 <- data.frame(ID1 = c("d", "p", "n", "m", "c"))
df2 <- data.frame(ID2 = c("c", "b", "a", "d", "s", "p"))
Output:
nonMatch_Uniquedf1 <- data.frame(ID1 = c("n", "m"))
nonMatch_Uniquedf2 <- data.frame(ID2 = c("b", "a", "s"))
Please note that df1 and df2 have different numbers of rows.
Thank you for your help.
Here's another way of reaching the desired output using the anti_join function.
library(dplyr)
df1 %>%
  anti_join(df2,
            # by = c("column in x" = "column in y"): here x is df1, y is df2
            by = c("ID1" = "ID2"))
df2 %>%
  anti_join(df1,
            # here x is df2, y is df1
            by = c("ID2" = "ID1"))
With dplyr:
library(dplyr)
df1 %>%
  filter(!ID1 %in% df2$ID2) # For df1 values not in df2
df2 %>%
  filter(!ID2 %in% df1$ID1) # For df2 values not in df1
Edit: with the expected output:
nonMatch_Uniquedf1 <- df1 %>%
  filter(!ID1 %in% df2$ID2) # For df1 values not in df2
nonMatch_Uniquedf2 <- df2 %>%
  filter(!ID2 %in% df1$ID1) # For df2 values not in df1
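For single-column comparisons like this one, base R's setdiff() is another option. The sketch below is an alternative to the dplyr answers, not a replacement for them; note that setdiff() also drops duplicate values, which does not matter for the example data.

```r
df1 <- data.frame(ID1 = c("d", "p", "n", "m", "c"))
df2 <- data.frame(ID2 = c("c", "b", "a", "d", "s", "p"))

# Values of ID1 that never occur in ID2, in their original order
nonMatch_Uniquedf1 <- data.frame(ID1 = setdiff(df1$ID1, df2$ID2))

# Values of ID2 that never occur in ID1
nonMatch_Uniquedf2 <- data.frame(ID2 = setdiff(df2$ID2, df1$ID1))
```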
I'm trying to make combn() work in dplyr::mutate, but I'm failing and can't quite figure out why.
This works:
c("a", "b", "c") %>% combn(2, FUN = paste, collapse = ";", simplify = TRUE)
[1] "a;b" "a;c" "b;c"
But how can I make this work?
tribble(
~col,
c("a", "b", "c"),
c("a", "d", "f")
) %>%
mutate(col = combn(str_split(names, ";"), 2, FUN = paste, collapse = ";"))
I want each row in the matrix to be a character vector in this form:
[1] "a;b" "a;c" "b;c"
The example above would be the first row.
Edit: I guess it's fine if combn() isn't used.
We could use map() to loop over the list column and apply combn() with paste to each element:
library(tidyverse)
out <- tribble(
~col,
c("a", "b", "c"),
c("a", "d", "f")
) %>%
mutate(col = map(col, ~ combn(.x, 2, FUN = paste, collapse=";")))
Try:
tribble(
~col,
c("a", "b", "c"),
c("a", "d", "f")
) %>%
rowwise() %>%
mutate(new = toString(combn(col, 2, FUN = paste, collapse = ";")))
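The two answers return different shapes, which may matter depending on what happens next. A small sketch comparing them side by side (the names dat, as_list and as_string are only for illustration):

```r
library(dplyr)
library(purrr)
library(tibble)

dat <- tribble(
  ~col,
  c("a", "b", "c"),
  c("a", "d", "f")
)

# map(): each row keeps a character vector of pairs (a list column)
as_list <- dat %>%
  mutate(col = map(col, ~ combn(.x, 2, FUN = paste, collapse = ";")))

# rowwise() + toString(): each row is collapsed into a single string
as_string <- dat %>%
  rowwise() %>%
  mutate(new = toString(combn(col, 2, FUN = paste, collapse = ";"))) %>%
  ungroup()

as_list$col[[1]]   # character vector: "a;b" "a;c" "b;c"
as_string$new[1]   # single string: "a;b, a;c, b;c"
```

The list-column form matches the output requested in the question; the string form is easier to print or export.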
After grouping by id I wish to replace the NAs in dist_from_top with sequential values such that dist_from_top becomes c(5,4,3,2,1,5,4,3,2). I am using the one dist_from_top value within each id grouping as a seed of sorts to fill in the values of dist_from_top that are above and below.
tidyr::fill() can fill in the same value throughout the grouping, but I can't think of a way to make it increase and decrease by 1 as it fills. Any help is greatly appreciated.
library(dplyr)
library(tidyr)
df <-
tribble(
~id, ~mgr, ~dist_from_top,
"A", "B", NA,
"A", "C", NA,
"A", "D", 3,
"A", "E", NA,
"A", "F", NA,
"B", "C", NA,
"B", "D", 4,
"B", "E", NA,
"B", "F", NA
)
An "almost there" solution using fill()
df %>%
group_by(id) %>%
fill(dist_from_top, .direction = "up") %>%
fill(dist_from_top, .direction = "down")
Create a column that counts downwards in each group, from any starting point:
... %>% mutate(rn = -row_number())
Add the offset that is defined by the difference between dist_from_top and rn for the one row where dist_from_top is not NA:
... %>% mutate(dist_from_top = rn + max(dist_from_top - rn, na.rm = TRUE))
This uses max() merely to pick one value, assuming there is only one value that isn't NA.
Both mutate() operations operate on groups:
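df %>%
  group_by(id) %>%
  # count downwards within each group
  mutate(rn = -row_number()) %>%
  # shift by the offset taken from the one non-NA row
  mutate(dist_from_top = rn + max(dist_from_top - rn, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-rn)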
If there is an all-NA group, you'll see a warning.
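A variation on the same idea, sketched here as an alternative rather than as the answer's method: locate the seed row with which() and compute the offset from its position directly. The helper column name seed is hypothetical. Under the same assumption of at most one non-NA value per group, an all-NA group should simply stay NA without triggering the max() warning.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~id, ~mgr, ~dist_from_top,
  "A", "B", NA,
  "A", "C", NA,
  "A", "D", 3,
  "A", "E", NA,
  "A", "F", NA,
  "B", "C", NA,
  "B", "D", 4,
  "B", "E", NA,
  "B", "F", NA
)

result <- df %>%
  group_by(id) %>%
  mutate(
    # position of the first non-NA value in the group (NA if there is none)
    seed = which(!is.na(dist_from_top))[1],
    # seed value plus the number of rows between the seed and this row
    dist_from_top = dist_from_top[seed] + seed - row_number()
  ) %>%
  ungroup() %>%
  select(-seed)
```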
I have the following problem: I have two dataframes. df1 contains, among other variables (which are not shown in the code below), a date variable. In df2 I have an id (referring to the id in table df1), a factor variable (type) and another date.
df1 <- data.frame(id=1:5, referenceDate=c("2018-01-20","2018-02-03","2018-05-20", "2018-08-01", "2018-07-31"))
df2 <- data.frame(id=c(1,1,1,2,2,4,4,5,5), type=c("A", "A", "B", "A", "A", "B", "A", "B", "B"), dates=c("2018-01-10", "2018-01-23", "2018-01-24", "2018-05-21", "2018-05-18", "2018-06-01", "2018-09-01", "2018-07-10", "2018-07-20"))
My goal is to create a new column in df1 indicating the number of rows in df2 where (e.g.) df2$type=='A' and df2$dates occurs before df1$referenceDate.
In R I have the following solution that gives me the number of rows where df2$type=='A'. But how can I additionally consider the date? I had the idea of first joining the two tables in order to get the referenceDate-Variable from df1 into df2 and then do the counting and join the two tables again in the other direction (in order to get the count variable back into df1). But this does not sound very elegant to me.
library(tidyverse)
reduced <- df2 %>% filter(type=='A') %>% group_by(id) %>% mutate(count=n()) %>% filter(!duplicated(id))
df1 %>% left_join(reduced[, c("id", "count")])
I think this might be what you want:
df1 <- tibble(id = 1:5,
referenceDate = as.Date(c("2018-01-20","2018-02-03","2018-05-20", "2018-08-01", "2018-07-31")))
df2 <- tibble(id = c(1,1,1,2,2,4,4,5,5),
type = c("A", "A", "B", "A", "A", "B", "A", "B", "B"),
dates = as.Date(c("2018-01-10", "2018-01-23", "2018-01-24", "2018-05-21", "2018-05-18", "2018-06-01", "2018-09-01", "2018-07-10", "2018-07-20")))
df1 %>%
left_join(
df2 %>%
left_join(df1, by = 'id') %>%
filter(dates < referenceDate) %>%
group_by(id) %>%
count(type) %>%
ungroup(),
by = 'id'
)
The key is to join df1 to df2 first and then filter on the reference date, which lets filter() keep only the rows you want. Then use count(), and finally join the result back to df1.
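If only one type is of interest, as in the question's type == 'A' example, the same join, filter, count, join-back steps can be narrowed to produce a single count column. This is a sketch under that assumption; the names counts, result and count_A are hypothetical, and ids with no matching rows are set to 0 rather than left as NA.

```r
library(dplyr)
library(tibble)

df1 <- tibble(id = 1:5,
              referenceDate = as.Date(c("2018-01-20", "2018-02-03", "2018-05-20",
                                        "2018-08-01", "2018-07-31")))
df2 <- tibble(id = c(1, 1, 1, 2, 2, 4, 4, 5, 5),
              type = c("A", "A", "B", "A", "A", "B", "A", "B", "B"),
              dates = as.Date(c("2018-01-10", "2018-01-23", "2018-01-24",
                                "2018-05-21", "2018-05-18", "2018-06-01",
                                "2018-09-01", "2018-07-10", "2018-07-20")))

# Bring referenceDate into df2, keep type A rows dated before it, count per id
counts <- df2 %>%
  inner_join(df1, by = "id") %>%
  filter(type == "A", dates < referenceDate) %>%
  count(id, name = "count_A")

# Join the counts back; ids without any qualifying row get 0
result <- df1 %>%
  left_join(counts, by = "id") %>%
  mutate(count_A = coalesce(count_A, 0L))
```

With the example data, only id 1 has a type A row dated before its reference date.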