How can I define the columns I want to use for nesting in the tidyr::complete function?
Neither one_of() nor as.name() works.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
df <- tibble(
group = c(1:2, 1),
item_id = c(1:2, 2),
item_name = c("a", "b", "b"),
value1 = 1:3,
value2 = 4:6
)
char_vec <- c("item_id", "item_name")
df %>% complete(group, nesting(char_vec))
Error: `by` can't contain join column `char_vec` which is missing from RHS
Run `rlang::last_error()` to see where the error occurred.
An up-to-date solution with dplyr version 1.0.6 is !!!syms():
library(dplyr)
library(tidyr)
df %>%
complete(group, nesting(!!!syms(char_vec)))
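As a quick sanity check of the splicing approach, the sketch below rebuilds the question's data and confirms that syms() (from rlang, re-exported by dplyr) turns the character vector into a list of symbols, one per column. It assumes tidyr is attached for complete() and nesting(); the names col_syms and out are only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  value1 = 1:3,
  value2 = 4:6
)
char_vec <- c("item_id", "item_name")

# syms() returns a list with one symbol per element of char_vec
col_syms <- rlang::syms(char_vec)

# !!! splices the symbols, so this is equivalent to nesting(item_id, item_name)
out <- df %>%
  complete(group, nesting(!!!col_syms))
```

Here the two groups are crossed with the two observed item_id/item_name pairs, so out has four rows; the one combination that is absent from df gets NA in value1 and value2.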
Ok, I figured it out.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
df <- tibble(
group = c(1:2, 1),
item_id = c(1:2, 2),
item_name = c("a", "b", "b"),
value1 = 1:3,
value2 = 4:6
)
char_vec <- c("item_id", "item_name")
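# as.symbol() uses only the first element of a vector, so convert each name and splice with !!!
df %>% complete(group, nesting(!!!lapply(char_vec, as.symbol)))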
Related
I want to iterate over several columns of a flextable using the mk_par function. Consider the following example:
tibble(a = c(1:10),
b1 = letters[1:10],
b2 = LETTERS[1:10],
c1 = paste0("new_",letters[1:10]),
c2 = paste0(LETTERS[1:10], "_new")) %>%
flextable(col_keys = c("a", "b", "c")) %>%
mk_par(j = "b", value = as_paragraph(b1, b2)) %>%
mk_par(j = "c", value = as_paragraph(c1, c2))
I would like to replace the two mk_par statements with a single expression that takes c("b", "c") as its argument and renders the same output. I have succeeded in rewriting this with a for loop, where tt holds the flextable:
for(pref in c("b", "c")){
tt <- tt %>%
mk_par(j = pref,
value = as_paragraph(.data[[paste0(pref,1)]],
.data[[paste0(pref,2)]]))
}
but I wonder whether there is a one-line expression that does the same and integrates smoothly into dplyr pipe syntax.
I have the following input data frame with 4 columns and 3 rows.
The time column can take values from 1 up to the value of the maturity column for that customer. I want to create additional observations for each customer until time equals maturity, with the other columns retaining their original values. Please see the links below for the input and expected output.
Input
Output
Here is a dplyr solution inspired by, but not identical to, this post.
library(dplyr)
df <- data.frame(custno = 1:3, time = 1, dept = c("A", "B", "A"))
df %>%
slice(rep(1:n(), each = 5)) %>%
group_by(custno) %>%
mutate(time = seq_along(time))
Edit
After the comments by the OP, the following seems to be better.
First, the data:
df <- data.frame(custno = 1:3, time = 1,
dept = c("A", "B", "A"),
maturity = c(5,4,6))
And the solution.
df %>%
tidyr::uncount(maturity) %>%
group_by(custno) %>%
mutate(time = seq_along(time))
We can also use slice with row_number, with rowid from data.table numbering the rows within each custno:
library(dplyr)
library(data.table)
df %>%
slice(rep(row_number(), maturity)) %>%
mutate(time = rowid(custno))
data
df <- data.frame(custno = 1:3, time = 1,
dept = c("A", "B", "A"),
maturity = c(5,4,6))
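Note that uncount() drops the weights column by default. If maturity should stay in the output, a sketch along the following lines may help. It assumes tidyr's .remove = FALSE argument and uses row_number() for the within-customer counter; out is an illustrative name.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- data.frame(custno = 1:3, time = 1,
                 dept = c("A", "B", "A"),
                 maturity = c(5, 4, 6))

out <- df %>%
  uncount(maturity, .remove = FALSE) %>%  # repeat each row maturity times, keep the column
  group_by(custno) %>%
  mutate(time = row_number()) %>%         # 1, 2, ..., maturity within each customer
  ungroup()
```

The result has one row per customer and time step (15 rows for this data), and time runs up to the customer's maturity.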
Let's say I have the following data:
> data.frame(value = 1:2, name = c("a", "b"))
value name
1 1 a
2 2 b
Goal:
Can I give it as input to the pipe operator and "send" it to setNames (or magrittr::set_names)?
What I have tried:
library(magrittr)
data.frame(value = 1:2, name = c("a", "b")) %>%
setNames(object = .$value, nm = .$name)
That doesn't work, I guess, because the pipe wants to hand over the whole data.frame and use it as the first argument. That got me interested in whether I can skip this behaviour and use two subsets instead.
(So that data.frame(value = 1:2, name = c("a", "b")) %>% is fixed and not replaced by a variable).
Desired Output:
How it would look without the pipe operator:
> a <- data.frame(value = 1:2, name = c("a", "b"))
> setNames(object = a$value, nm = a$name)
a b
1 2
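For this case, we can simply wrap it inside {}, which stops the pipe from inserting the data.frame as the first argument, so . can be used freely inside the braces: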
library(dplyr)
data.frame(value = 1:2, name = c("a", "b")) %>%
{ setNames(object = .$value, nm = .$name)}
With tidyverse, there is also deframe, which turns a two-column data frame into a named vector (names from the first column, values from the second):
library(dplyr)
library(tibble)
data.frame(value = 1:2, name = c("a", "b")) %>%
select(2:1) %>%
deframe
#a b
#1 2
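Two further options that avoid the braces, sketched under the assumption that magrittr is installed: with() evaluates an expression using the data frame's columns, and magrittr's exposition operator %$% does the same for the left-hand side of a pipe. The names res_with and res_expo are illustrative.

```r
library(magrittr)

# with() receives the data.frame as its first argument from the pipe
res_with <- data.frame(value = 1:2, name = c("a", "b")) %>%
  with(setNames(value, name))

# %$% exposes the columns of the left-hand side to the right-hand expression
res_expo <- data.frame(value = 1:2, name = c("a", "b")) %$%
  setNames(value, name)
```

Both keep the data.frame call fixed on the left-hand side and return the same named vector as the setNames(a$value, a$name) call in the question.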
Since the update to tidyr version 1.0.0 I have started to get an error when unnesting a list of dataframes.
The error occurs because some of the data frames in the list contain a column with all NA values, while others contain the same column with some character values. The all-NA columns are stored as logical vectors, while the others are stored as character vectors.
The default behavior of earlier versions of tidyr handled the different column types without problems (at least I didn't get this error when running the script).
Can I solve this issue from inside tidyr::unnest()?
Reproducible example:
library(tidyr)
a <- tibble(
value = rnorm(3),
char_vec = c(NA, "A", NA))
b <- tibble(
value = rnorm(2),
char_vec = c(NA, "B"))
c <- tibble(
value = rnorm(3),
char_vec = c(NA, NA, NA))
tibble(
file = list(a, b, c)) %>%
unnest(cols = c(file))
#> No common type for `..1$file$char_vec` <character> and `..3$file$char_vec`
#> <logical>.
Created on 2019-10-11 by the reprex package (v0.3.0)
You can convert all relevant columns to character one step before unnesting.
library(dplyr)
library(purrr)
tibble(
file = list(a, b, c)) %>%
mutate(file = map(file, ~ mutate(.x, char_vec = as.character(char_vec)))) %>%
unnest(cols = c(file))
If there are several columns that need treatment you can do:
tibble(
file = list(a, b, c)) %>%
mutate(file = map(file, ~ mutate_at(.x, vars(starts_with("char")), ~as.character(.)))) %>%
unnest(cols = c(file))
Data for the latter example:
a <- tibble(
value = rnorm(3),
char_vec = c(NA, "A", NA),
char_vec2 = c(NA, NA, NA))
b <- tibble(
value = rnorm(2),
char_vec = c(NA, "B"),
char_vec2 = c("C", "A"))
c <- tibble(
value = rnorm(3),
char_vec = c(NA, NA, NA),
char_vec2 = c("B", NA, "A"))
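In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A sketch of the same conversion with across(), using the data defined above (repeated here so the example runs on its own); out is an illustrative name.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tidyr)

a <- tibble(value = rnorm(3),
            char_vec = c(NA, "A", NA),
            char_vec2 = c(NA, NA, NA))
b <- tibble(value = rnorm(2),
            char_vec = c(NA, "B"),
            char_vec2 = c("C", "A"))
c <- tibble(value = rnorm(3),
            char_vec = c(NA, NA, NA),
            char_vec2 = c("B", NA, "A"))

out <- tibble(file = list(a, b, c)) %>%
  # coerce every column whose name starts with "char" to character in each nested tibble
  mutate(file = map(file, ~ mutate(.x, across(starts_with("char"), as.character)))) %>%
  unnest(cols = c(file))
```

After the conversion every nested tibble has the same column types, so unnest() can combine them without a type conflict.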
I'm new to the Tidyverse and dplyr and was hoping to get some guidance on how best to concatenate data from the rows below the current row. For example, in the dataframe below I want to use data in the Grade column to create the data in the Prior3Grades column. The Prior3Grades data for 1/2/2019 would be created by concatenating the grades from 12/3/18, 11/3/18 and 10/4/18.
Can this be achieved in dplyr using mutate or some other means? Also, is this in dplyr's wheelhouse, or would it be better suited to SQL?
Using some basic packages from the tidyverse:
library(dplyr)
library(tidyr)
library(tibble)
df <- tibble(
Name = "Bob",
TestDate = seq(as.Date("2019-02-01"), as.Date("2019-05-08"), length.out = 6), ## some random dates
Grade = c("A", "A", "B", "C", "D", "A")
)
df %>%
group_by(Name) %>%
mutate(
grade1 = lead(Grade),
grade2 = lead(Grade, 2),
grade3 = lead(Grade, 3)
) %>%
replace_na(list(grade1 = "", grade2 = "", grade3 = "")) %>%
mutate(
Prior3Grades = paste0(grade1, grade2, grade3)
)
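For reference, a more compact sketch of the same idea that avoids the helper columns. It assumes, as in the answer above, that rows are ordered so that the prior tests sit below the current row. prior_n is a hypothetical helper written for this example, not part of dplyr.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  Name = "Bob",
  TestDate = seq(as.Date("2019-02-01"), as.Date("2019-05-08"), length.out = 6),
  Grade = c("A", "A", "B", "C", "D", "A")
)

# Hypothetical helper: for each position, paste together up to n values
# taken from the rows directly below it
prior_n <- function(x, n = 3) {
  vapply(seq_along(x), function(i) {
    idx <- i + seq_len(n)
    paste(x[idx[idx <= length(x)]], collapse = "")
  }, character(1))
}

out <- df %>%
  group_by(Name) %>%
  mutate(Prior3Grades = prior_n(Grade)) %>%
  ungroup()
```

Rows near the end of a group simply get fewer grades (or an empty string for the last row), which matches what the lead() and replace_na() version produces.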