I want to iterate over several columns of a flextable using the mk_par function. Consider the following example:
library(dplyr)      # provides %>% and re-exports tibble()
library(flextable)

tibble(a = c(1:10),
       b1 = letters[1:10],
       b2 = LETTERS[1:10],
       c1 = paste0("new_", letters[1:10]),
       c2 = paste0(LETTERS[1:10], "_new")) %>%
  flextable(col_keys = c("a", "b", "c")) %>%
  mk_par(j = "b", value = as_paragraph(b1, b2)) %>%
  mk_par(j = "c", value = as_paragraph(c1, c2))
I would like to replace the two mk_par statements with a single expression that takes the argument c("b", "c") and renders the same output. I have succeeded in rewriting this with a for loop:
# tt is assumed to hold the flextable from above, before any mk_par() call
for (pref in c("b", "c")) {
  tt <- tt %>%
    mk_par(j = pref,
           value = as_paragraph(.data[[paste0(pref, 1)]],
                                .data[[paste0(pref, 2)]]))
}
but I wonder whether there is a one-line expression that does the same and integrates smoothly into a dplyr pipe?
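One possible way to fold the loop into the pipe is base R's Reduce(), which threads the flextable through one mk_par() call per prefix. This is only a sketch: it assumes, as the loop above does, that .data can be used inside as_paragraph(), and the helper name add_pair is made up for illustration.

```r
library(dplyr)      # provides %>% and re-exports tibble()
library(flextable)

# Hypothetical helper: fill column `pref` from the columns `<pref>1` and `<pref>2`.
add_pair <- function(ft, pref) {
  mk_par(ft, j = pref,
         value = as_paragraph(.data[[paste0(pref, 1)]],
                              .data[[paste0(pref, 2)]]))
}

ft <- tibble(a = 1:10,
             b1 = letters[1:10],
             b2 = LETTERS[1:10],
             c1 = paste0("new_", letters[1:10]),
             c2 = paste0(LETTERS[1:10], "_new")) %>%
  flextable(col_keys = c("a", "b", "c")) %>%
  # `.` is the flextable coming down the pipe; it is the starting value of the fold
  Reduce(f = add_pair, x = c("b", "c"), init = .)
```

Because the dot is passed directly as init, the pipe does not also insert the flextable as the first argument of Reduce().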
In the following example I use a dplyr::arrange on a data.table with a key. This overrides the sort on that column:
x <- data.table(a = sample(1000:1100), b = sample(c("A", NA, "B", "C", "D"), replace = TRUE), c = letters)
setkey(x, "a")
# lose order on datatable key
x <- dplyr::arrange(x, b)
y <- data.table(a = sample(1000:1100), f = c(letters, NA), g = c("AA", "BB", NA, NA, NA, NA))
setkey(y, "a")
res <- merge(x, y, by = c("a"), all.x = TRUE)
# try merge with key removed
res2 <- merge(x %>% as.data.frame() %>% as.data.table(), y, by = c("a"), all.x = TRUE)
# merge results are inconsistent
identical(res, res2)
I can see that if I ordered with x <- x[order(b)], I would maintain the sort on the key and the results would be consistent.
I am not sure why I cannot use dplyr::arrange and what relationship the sort key has with the merge. Any insight would be appreciated.
The problem is that dplyr::arrange(x, b) does not remove the sorted attribute from your data.table, unlike x <- x[order(b)] or setorder(x, "b").
The data.table way would be to use setorder in the first place, e.g.:
library(data.table)
x <- data.table(a = sample(1000:1100), b = sample(c("A", NA, "B", "C", "D"), replace = TRUE), c = letters)
setorder(x, "b", "a", na.last=TRUE)
Wrong join results on data.tables that carry a key but are not actually sorted by it are a known bug (see also #5361 in the data.table bug tracker).
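To see what happens to the key (the sorted attribute) with the two data.table ways of reordering, here is a minimal sketch on a made-up three-row table. The values and variable names are only for illustration.

```r
library(data.table)

x <- data.table(a = c(3L, 1L, 2L), b = c("B", "C", "A"))
setkey(x, a)                     # sorts by a and records "a" as the key
key_before <- key(x)

# Reordering with [ returns a new table that no longer carries the key.
y <- x[order(b)]
key_after_subset <- key(y)

# setorder() reorders by reference and clears the key as well.
setorder(x, b)
key_after_setorder <- key(x)
```

In both cases key() returns NULL afterwards, so a later merge does not trust a sort order that is no longer there.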
I have a data frame with 3 columns. What I want to do is calculate the product of the returns over a rolling window of a selected number of months for each monthly period (in other words, each row), where available. This is the basic structure of the data.
set.seed(100)
assets <- c("A", "B", "C", "D", "E", "F", "G", "H", "I")
FileDate <- seq(as.Date("2011-12-30"), as.Date("2019-01-31"), by="months")
df <- merge(x = assets, y = FileDate, all.x = TRUE)
df$return <- runif(774, min=0, max=1)
What it should end with is a dataframe where a new column is added with the selected period cumulative return for that time frame. For example, I have shown below a four month return. The calculation of the 4-month return on 03/30/2012 from the data would be:
((1+0.81/100)*(1+0.715/100)*(1+0.27/100)*(1+0.80/100)-1)*100
This would be repeated for each value under the X column.
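The four-month calculation can be checked with a few lines of base R. The four percentages are the ones from the formula; the variable names are made up.

```r
# Monthly returns in percent, taken from the formula above.
monthly_returns <- c(0.81, 0.715, 0.27, 0.80)

# Compound the growth factors, then convert back to a percentage.
four_month_return <- (prod(1 + monthly_returns / 100) - 1) * 100
four_month_return
```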
I ended up using the mutate function together with zoo's rolling functions, where you can set the window width. This is the final version I wanted:
library(dplyr)
library(zoo)
# Create Test Dataframe
set.seed(100)
assets <- c("A", "B", "C", "D", "E", "F", "G", "H", "I")
FileDate <- seq(as.Date("2011-12-30"), as.Date("2019-01-31"), by="months")
df <- merge(x = assets, y = FileDate, all.x = TRUE)
df$performance <- runif(774, min=0, max=1)
This particular code creates a 5-month average on a rolling basis. If you sort by column x you can see the result and recreate it in Excel.
df <- df %>%
group_by(x) %>%
mutate(x_mean = rollmean(performance, 5, fill = NA, align = 'right'))
I also found a way to create a lag so I could take the 4 prior values to the observation and calculate the mean:
df2 <- df %>%
  mutate(perf.1.previous = dplyr::lag(performance),  # assumption: the lagged column was built like this
         perf.4.previous = rollapply(data = perf.1.previous, width = 4, FUN = mean,
                                     align = "right", fill = NA, na.rm = TRUE))
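The snippets above compute rolling means. For the compounded return the question asks about, the same pattern should work with prod instead of mean. The following is a sketch on made-up toy data; the column names ret and return_4m are my own, not from the question.

```r
library(dplyr, warn.conflicts = FALSE)
library(zoo)

# Toy data (made up): monthly returns in percent for two assets.
toy <- data.frame(
  x = rep(c("A", "B"), each = 5),
  y = rep(seq(as.Date("2012-01-15"), by = "month", length.out = 5), times = 2),
  ret = c(0.81, 0.715, 0.27, 0.80, 0.50, 1, 2, 3, 4, 5)
)

toy <- toy %>%
  arrange(x, y) %>%
  group_by(x) %>%
  # right-aligned window of 4 months: multiply the growth factors, then go back to percent
  mutate(return_4m = (rollapplyr(1 + ret / 100, width = 4, FUN = prod, fill = NA) - 1) * 100) %>%
  ungroup()
```

The first three rows of each asset stay NA because a full four-month window is not yet available.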
How can I define the columns I want to use for nesting in the tidyr::complete function?
Neither one_of nor as.name works.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)  # complete() and nesting() come from tidyr
df <- tibble(
group = c(1:2, 1),
item_id = c(1:2, 2),
item_name = c("a", "b", "b"),
value1 = 1:3,
value2 = 4:6
)
char_vec <- c("item_id", "item_name")
df %>% complete(group, nesting(char_vec))
Error: `by` can't contain join column `char_vec` which is missing from RHS
Run `rlang::last_error()` to see where the error occurred.
An up-to-date solution with dplyr version 1.0.6 is !!!syms():
library(dplyr)
library(tidyr)  # complete() and nesting() come from tidyr
df %>%
complete(group, nesting(!!!syms(char_vec)))
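As a quick check of what the splice does, the sketch below compares it with the same call written out by hand, using the data from the question. I call rlang::syms() explicitly here; the names spliced and by_hand are made up.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  value1 = 1:3,
  value2 = 4:6
)
char_vec <- c("item_id", "item_name")

# syms() turns each string into a symbol; !!! splices them in as separate arguments.
spliced <- df %>% complete(group, nesting(!!!rlang::syms(char_vec)))

# The same call with the column names typed out.
by_hand <- df %>% complete(group, nesting(item_id, item_name))

all.equal(spliced, by_hand)
```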
Ok, I figured it out.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)  # complete() and nesting() come from tidyr
df <- tibble(
group = c(1:2, 1),
item_id = c(1:2, 2),
item_name = c("a", "b", "b"),
value1 = 1:3,
value2 = 4:6
)
char_vec <- c("item_id", "item_name")
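# as.symbol() only uses the first element of a vector, so convert each name and splice with !!!
df %>% complete(group, nesting(!!!lapply(char_vec, as.symbol)))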
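One thing to watch with as.symbol(): called on a character vector it keeps only the first element, so with two column names each one has to be converted separately. A small base R sketch (the variable names are made up):

```r
char_vec <- c("item_id", "item_name")

# Only the first element survives when the whole vector is passed at once.
first_only <- as.symbol(char_vec)

# Converting element by element gives one symbol per column name.
one_per_name <- lapply(char_vec, as.symbol)
```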
Let's say I have the following data:
> data.frame(value = 1:2, name = c("a", "b"))
value name
1 1 a
2 2 b
Goal:
Can I give it as input to the pipe operator and "send" it to setNames (or magrittr::set_names)?
What I have tried:
library(magrittr)
data.frame(value = 1:2, name = c("a", "b")) %>%
setNames(object = .$value, nm = .$name)
That doesn't work, I guess, because the pipe wants to hand over the whole data.frame and use it as the first argument. That got me interested in whether I can skip this behaviour and use two subsets instead.
(So that data.frame(value = 1:2, name = c("a", "b")) %>% is fixed and not replaced by a variable).
Desired Output:
How it would look without the pipe operator:
> a <- data.frame(value = 1:2, name = c("a", "b"))
> setNames(object = a$value, nm = a$name)
a b
1 2
For this case, we can simply wrap it inside {}:
library(dplyr)
data.frame(value = 1:2, name = c("a", "b")) %>%
{ setNames(object = .$value, nm = .$name)}
With tidyverse, there is also deframe(), which returns a named vector:
library(dplyr)   # select() and %>%
library(tibble)
data.frame(value = 1:2, name = c("a", "b")) %>%
  select(2:1) %>%
  deframe()
#a b
#1 2
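The braces in the first snippet matter because %>% otherwise inserts the data frame as the first argument whenever the dot only appears nested inside another expression such as .$value. Below is a small sketch of that, plus magrittr's exposition pipe %$% as another option. The variable names are made up, and stringsAsFactors = FALSE is set only so the result is the same on older R versions.

```r
library(magrittr)

df <- data.frame(value = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

# Inside braces the pipe does not add the data frame as a first argument,
# so `.` is used only where it is written.
with_braces <- df %>% { setNames(object = .$value, nm = .$name) }

# The exposition pipe makes the columns available by name instead.
with_exposition <- df %$% setNames(value, name)
```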
The block below runs and produces df_all as intended. But when I uncomment the function at the top (I do not even call it here, but I need it for other things) and rerun the same block, I get: Error in bind_rows_(x, .id): Argument 1 must be a data frame or a named atomic vector, not a function
library(data.table)
library(dplyr)  # %>%, filter() and bind_rows()
# addxtoy_newy_csv <- function(df) {
# zdf1 <- df %>% filter(Variable == "s44")
# setDT(df)
# setDT(zdf1)
# df[zdf1, Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
# setDF(df)
#}
tstep <- rep(c("a", "b", "c", "d", "e"), 5)
Variable <- c(rep(c("v"), 5), rep(c("w"), 5), rep(c("x"), 5), rep(c("y"), 5), rep(c("x"), 5))
Value <- c(1,2,3,4,5,10,11,12,13,14,33,22,44,57,5,3,2,1,2,3,34,24,11,11,7)
Scenario <- c(rep(c("i"), 20), rep(c("j"), 5) )
df1 <- data.frame(tstep, Variable, Value, Scenario)
tstep <- c("a", "b", "c", "d", "e")
Variable <- rep(c("x"), 5)
Value <- c(100, 34, 100,22, 100)
Scenario <- c(rep(c("i"), 5))
df2<- data.frame(tstep, Variable, Value, Scenario)
setDT(df1)
setDT(df2)
df1[df2, Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
setDF(df1)
df_all <- mget(ls(pattern="df*")) %>% bind_rows()
The pattern you use in ls() will match any object with a "d" in its name, so addxtoy_newy_csv gets included in the list of object names. The f* in your pattern means you currently search for "d, followed by zero or more f's". I think a safer pattern to use would be ^df.*, to match objects that start with "df":
df1 = data.frame(x = 1:3)
df2 = data.frame(x = 4:6)
adder = function(x) x + 1
ls(pattern = "df*")
ls(pattern = "^df.*")
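Since the pattern in ls() is an ordinary regular expression, the difference can also be checked with grep() on a fixed vector of names, independent of what is in your workspace. The vector below is made up from the object names in this thread.

```r
object_names <- c("adder", "addxtoy_newy_csv", "df1", "df2")

# "df*" means: a "d" followed by zero or more "f", anywhere in the name.
loose <- grep("df*", object_names, value = TRUE)

# "^df" means: the name starts with "df".
anchored <- grep("^df", object_names, value = TRUE)
```

Here loose contains all four names, while anchored contains only df1 and df2.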