Equivalence between function(x) and purrr::map - r

I have this list:
list(structure(list(a = 1:10, b = 2:11, c = 3:12), .Names = c("a",
"b", "c"), row.names = c(NA, -10L), class = "data.frame"), structure(list(
a = 1:10, b = 2:11, c = 3:12), .Names = c("a", "b", "c"), row.names = c(NA,
-10L), class = "data.frame"), structure(list(a = 1:10, b = 2:11,
c = 3:12), .Names = c("a", "b", "c"), row.names = c(NA, -10L
), class = "data.frame"))
And this function:
fun1<-function(x){
funs<-c(s=sum,m=mean)
lapply(funs,function(f)f(x,na.rm=TRUE))
}
With lapply the result is ok. See:
list%>%
lapply(function(x){
lapply(x,fun1)
})
But, purrr::map doesn't work:
list%>%
map(.)%>%
map(.,fun1)
What's wrong?

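Your syntax for the map part is wrong. You need the same code structure as you are using with lapply: an outer call over the list and an inner call over the columns of each data frame. Also, don't give objects the same name as R functions; below, the list is called my_list instead of list. First, let's get rid of the pipes so the two versions are easier to compare: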
library(purrr)
lapply_outcome <- lapply(my_list, function(x) {lapply(x, fun1)})
map_outcome <- map(my_list, function(x) {map(x, fun1)})
identical(lapply_outcome, map_outcome)
[1] TRUE
With pipes:
my_list %>%
lapply(function(x) lapply(x,fun1))
my_list %>%
map(., function(x) map(x, fun1))
or with a formula call inside map, but personally I find this less readable:
my_list %>%
map(~ map(., fun1))
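To make the comparison concrete, here is a minimal self-contained sketch. It assumes my_list is rebuilt with replicate() as a stand-in for the dput output in the question; the names via_lapply, via_map and via_formula are only illustrative. In the question's attempt, map(.) is called without a function to apply, and the second map then works on whole data frames rather than on their columns, so the nesting has to be written explicitly.

```r
library(purrr)

# Stand-in for the question's data: three identical data frames
# (an assumption that mirrors the dput output above).
my_list <- replicate(3, data.frame(a = 1:10, b = 2:11, c = 3:12), simplify = FALSE)

fun1 <- function(x) {
  funs <- c(s = sum, m = mean)
  lapply(funs, function(f) f(x, na.rm = TRUE))
}

# Outer call walks the list, inner call walks the columns of each data frame.
via_lapply  <- lapply(my_list, function(df) lapply(df, fun1))
via_map     <- map(my_list, function(df) map(df, fun1))
via_formula <- map(my_list, ~ map(.x, fun1))

identical(via_lapply, via_map)
identical(via_lapply, via_formula)

# Sum of column a in the first data frame
via_map[[1]]$a$s
```

All three versions use the same two-level structure; only the way the anonymous function is spelled differs.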

Related

Transform tidy dataframe into form for sparklines (dataui)

I have some tidy data and need to transform it into a format that works for building small graphs (sparklines) using the dataui package. You can see the required dataframe format in the code example below, df_sparkline.
The tidy data covers about 30 companies over one year, which is fewer than 10,000 rows. What is the best way to transform df_tidy into df_sparkline? Clarity matters more to me than raw speed.
library("dataui")
library("reactable")
library("tidyverse")
df_tidy <- tibble(
company = c("A", "B", "A", "B", "A", "B"),
line_data = c(1, 2, 2, 2, 1, 1),
date = c(as.Date("2021-01-01"), as.Date("2021-01-01"), as.Date("2021-01-02"), as.Date("2021-01-02"), as.Date("2021-01-03"), as.Date("2021-01-03"))
)
df_sparkline <- structure(list(company = c("A", "B"), line_data = list(list(c(1, 2, 1)), list(c(2, 2, 1)))), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))
rt1 <- reactable(
  df_sparkline,
  columns = list(
    line_data = colDef(
      cell = function(value, index) {
        dui_sparkline(
          data = value[[1]],
          height = 80,
          # https://github.com/williaster/data-ui/tree/master/packages/sparkline#series
          components = dui_sparklineseries(curve = "linear")
        )
      }
    )
  )
)
rt1
All you need is group_by() and summarise():
df_sparkline2 = df_tidy %>%
group_by(company) %>%
summarise(line_data=list(list(line_data)))
waldo::compare(df_sparkline, df_sparkline2)
# √ No differences
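The key here is to call list() inside summarise(): the inner list() collapses each company's values into a single list-column cell, and the outer list() reproduces the extra level of nesting that df_sparkline has.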
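As a rough illustration of what that nesting produces, the sketch below rebuilds df_tidy without the dataui and reactable dependencies. The arrange() step and the name df_nested are my additions, on the assumption that the points should be in date order within each company.

```r
library(dplyr)
library(tibble)

df_tidy <- tibble(
  company = c("A", "B", "A", "B", "A", "B"),
  line_data = c(1, 2, 2, 2, 1, 1),
  date = as.Date("2021-01-01") + c(0, 0, 1, 1, 2, 2)
)

df_nested <- df_tidy %>%
  arrange(company, date) %>%                 # assumption: sparkline points ordered by date
  group_by(company) %>%
  summarise(line_data = list(list(line_data)), .groups = "drop")

# Each cell is a list holding one numeric vector, which is why the
# reactable cell function reads it with value[[1]].
str(df_nested$line_data[[1]])
```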

Split Columns in a List of Dataframes R

I have a list of data frames in which some columns contain the special character -> (arrow). I want to loop through this list, locate the columns containing ->, and split each of them into two new columns named with the suffixes _old and _new. This is a sample of the data frames:
dput(df1)
df1 <- structure(list(v1 = c("reg->joy", "ress", "mer->dls"),
t2 = c("James","Jane", "Egg")),
class = "data.frame", row.names = c(NA, -3L))
dput(df2)
df2 <- structure(list(v1 = c("me", "df", "kl"),
t2 = c("James","Jane->dlt", "Egg"),
t3 = c("James ->may","Jane", "Egg")),
class = "data.frame", row.names = c(NA, -3L))
dput(df3)
df3 <- structure(list(v1 = c("56->34", "df23-> ", "mkl"),
t2 = c("James","Jane", "Egg"),
d3 = c("James->","Jane", "Egg")),
class = "data.frame", row.names = c(NA, -3L))
This is what I have tried
dfs <- list(df1,df2,df3)
for (y in 1:length(dfs)){
setDT(dfs[[y]])
df1<- lapply(names(dfs[[y]]), function(x) {
mDT <- df2[[y]][, tstrsplit(get(x), " *-> *")]
if (ncol(mDT) == 2L) setnames(mDT, paste0(x, c("_old", "_new")))
}) %>% as.data.table()
}
This only splits one data frame; I need to split all of the data frames.
NOTE: The code works well on a single data frame. What I need is how to apply it to a list of data frames.
EXPECTED OUTPUT
dput(df1)
df1 <- structure(list(v1_old = c("reg", "mer"),
v1_new = c("joy", "dls")),
class = "data.frame", row.names = c(NA, -3L))
dput(df2)
df2 <- structure(list(t2_old = c("dlt"),
t2_new = c("dlt"),
t3_old = c("James"),
t3_new = c("may")),
class = "data.frame", row.names = c(NA, -3L))
dput(df3)
df3 <- structure(list(v1_old = c("56", "df23 "),
v1_new = c("34", " "),
d3 = c("James"),
d3 = c(" ")),
class = "data.frame", row.names = c(NA, -3L))
Below is a solution using the tidyverse.
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)
Keep only the columns in which at least one string contains an arrow:
col_arrow_ls <- purrr::map(dfs, ~ select_if(., ~ any(str_detect(., "->"))))
Then split each column using tidyr::separate. Since each split column comes back as a data frame, purrr::map_dfc is used to column-bind them together:
split_df_fn <- function(df1) {
  names(df1) %>%
    map_dfc(~ df1 %>%
      select(all_of(.x)) %>%
      # Rows without an arrow get NA in the _new column (separate() warns about them)
      tidyr::separate(.x,
        into = paste0(.x, c("_old", "_new")),
        sep = "->"
      ))
}
Apply the function to the list of data frames:
purrr::map(col_arrow_ls, split_df_fn)
[[1]]
  v1_old v1_new
1    reg    joy
2   ress   <NA>
3    mer    dls
[[2]]
  t2_old t2_new t3_old t3_new
1  James   <NA> James     may
2   Jane    dlt   Jane   <NA>
3    Egg   <NA>    Egg   <NA>
[[3]]
  v1_old v1_new d3_old d3_new
1     56     34  James
2   df23          Jane   <NA>
3    mkl   <NA>    Egg   <NA>
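For comparison, the same idea can be sketched in base R without any packages. This is only an illustration under my own assumptions: split_arrow_cols is a hypothetical helper name, the sample data frames are rebuilt with data.frame() rather than structure(), and the split pattern " *-> *" is borrowed from the question, so whitespace around the arrow is dropped and an empty right-hand side becomes NA (which differs slightly from the tidyr output above).

```r
df1 <- data.frame(v1 = c("reg->joy", "ress", "mer->dls"),
                  t2 = c("James", "Jane", "Egg"), stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("me", "df", "kl"),
                  t2 = c("James", "Jane->dlt", "Egg"),
                  t3 = c("James ->may", "Jane", "Egg"), stringsAsFactors = FALSE)
df3 <- data.frame(v1 = c("56->34", "df23-> ", "mkl"),
                  t2 = c("James", "Jane", "Egg"),
                  d3 = c("James->", "Jane", "Egg"), stringsAsFactors = FALSE)
dfs <- list(df1, df2, df3)

# Hypothetical helper: split every column that contains "->" into
# <name>_old and <name>_new; columns without an arrow are dropped.
split_arrow_cols <- function(df) {
  has_arrow <- vapply(df, function(col) any(grepl("->", col, fixed = TRUE)), logical(1))
  pieces <- lapply(names(df)[has_arrow], function(nm) {
    parts <- strsplit(df[[nm]], " *-> *")
    out <- data.frame(
      old = vapply(parts, function(p) p[1], character(1)),
      new = vapply(parts, function(p) if (length(p) >= 2) p[2] else NA_character_,
                   character(1)),
      stringsAsFactors = FALSE
    )
    names(out) <- paste0(nm, c("_old", "_new"))
    out
  })
  do.call(cbind, pieces)
}

res <- lapply(dfs, split_arrow_cols)
res[[1]]
```

Because the helper takes one data frame and returns one data frame, applying it to the whole list is a single lapply() call.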

split columns in a list of dataframes in R

I have a list of data frames in which some columns contain the special character -> (arrow). I want to loop through this list, locate the columns containing ->, and split each of them into two new columns named with the suffixes _old and _new. This is a sample of the data frames:
dput(df1)
df1 <- structure(list(v1 = c("reg->joy", "ress", "mer->dls"),
t2 = c("James","Jane", "Egg")),
class = "data.frame", row.names = c(NA, -3L))
dput(df2)
df2 <- structure(list(v1 = c("me", "df", "kl"),
t2 = c("James","Jane->dlt", "Egg"),
t3 = c("James ->may","Jane", "Egg")),
class = "data.frame", row.names = c(NA, -3L))
dput(df3)
df3 <- structure(list(v1 = c("56->34", "df23-> ", "mkl"),
t2 = c("James","Jane", "Egg"),
d3 = c("James->","Jane", "Egg")),
class = "data.frame", row.names = c(NA, -3L))
This is what I have tried
dfs <- list(df1,df2,df3)
for (y in 1:length(dfs)){
setDT(dfs[[y]])
df1<- lapply(names(dfs[[y]]), function(x) {
mDT <- df2[[y]][, tstrsplit(get(x), " *-> *")]
if (ncol(mDT) == 2L) setnames(mDT, paste0(x, c("_old", "_new")))
}) %>% as.data.table()
}
This only splits one data frame, I need to split all of the data frames
EXPECTED OUTPUT
dput(df1)
df1 <- structure(list(v1_old = c("reg", "mer"),
v1_new = c("joy", "dls")),
class = "data.frame", row.names = c(NA, -3L))
dput(df2)
df2 <- structure(list(t2_old = c("dlt"),
t2_new = c("dlt"),
t3_old = c("James"),
t3_new = c("may")),
class = "data.frame", row.names = c(NA, -3L))
dput(df3)
df3 <- structure(list(v1_old = c("56", "df23 "),
v1_new = c("34", " "),
d3 = c("James"),
d3 = c(" ")),
class = "data.frame", row.names = c(NA, -3L))
After playing around, I found the answer:
library(data.table)
library(magrittr) # for %>%
df1 <- list() # one split result per data frame
for (y in seq_along(dfs)) {
  setDT(dfs[[y]])
  df1[[y]] <- lapply(names(dfs[[y]]), function(x) {
    mDT <- dfs[[y]][, tstrsplit(get(x), " *-> *")]
    # Keep only the columns that actually split into two parts
    if (ncol(mDT) == 2L) setnames(mDT, paste0(x, c("_old", "_new")))
  }) %>% as.data.table()
}

Map nested data by row r

I have data that look like this (thanks once again dput!):
dat <- structure(list(vars = c("var_1", "var_2"), data = list(structure(list(
time = 1:10, value = c(1:10
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(time = 1:10, value = c(11:20
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))), mu = c(1, 2), stdev = c(1,2)), class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA,-2L))
I am trying to mutate an extra column which maps a function over each row, e.g. calculate dnorm for each element of the nested variable in dat$data[[1]]$value using dat$mu[1] and dat$stdev[1], and then go on to do the same for row two.
The column I would like to mutate is a tibble [10 x 1] for each row containing this as the output:
dnorm(dat$data[[1]]$value, mean = dat$mu[1], sd = dat$stdev[1])
dnorm(dat$data[[2]]$value, mean = dat$mu[2], sd = dat$stdev[2])
Things I have tried that don't work but might be close?:
# This alternates between mean and stdev for each element of each nested variable
dat_1 <- dat %>%
mutate(z = map(data, ~ dnorm(.x$value, mean = dat$mu, sd = dat$stdev)))
# apply by row has structure issues
dat_2 <- dat %>%
apply(MARGIN = 1, function(x){
mutate(x, z = map(data, ~ dnorm(.x$value, mean = dat$mu, sd = dat$stdev)))
})
A basic map call like dat_3 <- dat %>% mutate(sigma = map(data, ~ sum(.x$value))) works fine because it does not reference other columns in the data frame. I am new to using nested data and map in this way; I have been through the documentation for all the map functions to try to solve this, but no luck yet. If that's clear as mud I can try to clarify. Thanks in advance!
We can use a parallel map:
library(purrr)
library(dplyr)
expected_out1 <- dnorm(dat$data[[1]]$value, mean = dat$mu[1], sd = dat$stdev[1])
expected_out2 <- dnorm(dat$data[[2]]$value, mean = dat$mu[2], sd = dat$stdev[2])
out <-
dat %>%
mutate(z = pmap(list(map(data, "value"), mu, stdev), dnorm))
all.equal(out$z, list(expected_out1, expected_out2))
# [1] TRUE
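The attempts in the question pass the whole dat$mu and dat$stdev vectors to dnorm on every row, so the two means and standard deviations are recycled across the ten values. A parallel map instead hands dnorm one row's data, mu and stdev at a time. The sketch below spells that out with an explicit function; the argument names d, m and s are only illustrative, and dat is rebuilt with tibble() as a stand-in for the dput output.

```r
library(purrr)
library(dplyr)
library(tibble)

dat <- tibble(
  vars = c("var_1", "var_2"),
  data = list(
    tibble(time = 1:10, value = 1:10),
    tibble(time = 1:10, value = 11:20)
  ),
  mu = c(1, 2),
  stdev = c(1, 2)
)

out <- dat %>%
  mutate(
    # pmap walks data, mu and stdev in parallel: one call per row
    z = pmap(list(data, mu, stdev), function(d, m, s) dnorm(d$value, mean = m, sd = s))
  )

all.equal(out$z[[2]], dnorm(11:20, mean = 2, sd = 2))
```

The shorter form in the answer works for the same reason: the three list elements are matched positionally to dnorm's first three arguments x, mean and sd.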

discard last or first group after group_by by referencing group directly

Data:
df <- data.frame(A=c(rep(letters[1],3),rep(letters[2],3),rep(letters[3],3)),
B=rnorm(9),
stringsAsFactors=F)
I don't know if this is possible, but I'd like to know whether there's a way to discard the last group by directly referencing the groups after group_by(A), to get the desired output:
A B
1 a -0.4900863
2 a 1.4106594
3 a -0.2245738
4 b -0.2124955
5 b 0.6963785
6 b 0.9151825
I AM INTERESTED IN SOLUTIONS THAT DIRECTLY WORK AT THE GROUPS LEVEL
For instance, something like:
df %>% group_by(A) %>% head(.Groups,-1)
or
df %>% group_by(A) %>% Groups[1:2]
I AM NOT INTERESTED IN THE FOLLOWING KINDS OF SOLUTIONS
df %>% filter(!(A == max(A)))
df %>% filter(!(A %in% max(A)))
OR OTHER SOLUTIONS THAT DO NOT REQUIRE group_by TO WORK
I assumed that the number of groups is not known in advance. Try using the labels attribute of the grouped data frame:
all_but_last <- df %>% group_by(A) %>% attr("labels") %>% head(-1)
A
1 a
2 b
... to extract desired rows
> df %>% filter(A %in% all_but_last[[1]])
A B
1 a -0.799026840
2 a -0.712402478
3 a 0.685320094
4 b 0.971492883
5 b -0.001479117
6 b -0.817766296
It helps to use dput to look at the actual contents of a "grouped_df":
dput( df %>% group_by(A) )
structure(list(A = c("a", "a", "a", "b", "b", "b", "c", "c",
"c"), B = c(-0.799026840397576, -0.712402478350695, 0.685320094252465,
0.971492883452258, -0.00147911717469651, -0.817766295631676,
-1.00112471676908, 1.88145909873596, -0.305560178617216)), .Names = c("A",
"B"), row.names = c(NA, -9L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = "A", drop = TRUE, indices = list(
0:2, 3:5, 6:8), group_sizes = c(3L, 3L, 3L), biggest_group_size = 3L,
labels = structure(list(
A = c("a", "b", "c")),
row.names = c(NA, -3L),
class = "data.frame",
vars = "A", drop = TRUE, .Names = "A"))
Note that the labels are a data.frame so you could have further applied unlist to the result that became all_but_last and you then would not have needed to extract its value with "[[".
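The labels attribute shown above belongs to the internal structure of older dplyr releases; as far as I know, from dplyr 0.8.0 onward the grouping metadata is stored differently and group_keys() is the supported way to read the group labels. A minimal sketch of the same approach under that assumption follows; set.seed() and the names keep and res are my additions.

```r
library(dplyr)

set.seed(1) # only so the random column is reproducible
df <- data.frame(A = rep(letters[1:3], each = 3),
                 B = rnorm(9),
                 stringsAsFactors = FALSE)

# One row per group, in group order; drop the last one
keep <- df %>%
  group_by(A) %>%
  group_keys() %>%
  head(-1)

# Keep only the rows whose group survived
res <- df %>% semi_join(keep, by = "A")
res
```

Replacing head(-1) with tail(-1) would discard the first group instead.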
Perhaps this helps
library(dplyr)
df %>%
group_by(A) %>%
group_indices(.) %in% 1:2 %>%
df[.,]
Or with data.table
library(data.table)
setDT(df)[, grp := .GRP, A][grp %in% unique(grp)[1:2]][, grp := NULL][]
