R: How to use map() inside map()

I need to access a tibble that sits inside a nested column created by nest(), which is itself inside another column created by nest().
library(tidyverse)
library(reshape2) # provides melt()
x <- list(factory = c('a','b','c','d'), cost = c(21,30,44,100))
x <- as.data.frame(x)
x <- x %>%
melt('cost','factory')
colnames(x) <- c('cost','client','type')
x <- x %>%
group_by(client)%>%
nest()
for (m in 1:4) {
if(m==1){
x$scene <- m
x2 <- x
}else{
x3 <- x
x3$scene <- m
x2 <- rbind(x2,x3)
}
}
x2 <- x2 %>%
group_by(scene) %>%
nest()
What I am trying to do is apply a function to the innermost data column, something like:
test <- function(df){
df$data %>%
mutate(increa = cost + 15)
}
x2$data%>%
map(test)
dput(x2) returns:
structure(list(scene = 1:4, data = list(structure(list(client =
structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100), type = c("a",
"b", "c", "d")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), structure(list(client =
structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100), type = c("a",
"b", "c", "d")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), structure(list(client =
structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100), type = c("a",
"b", "c", "d")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), structure(list(client =
structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100), type = c("a",
"b", "c", "d")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame")))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")))), row.names = c(NA, -4L), class =
c("tbl_df", "tbl", "data.frame"))
The expected result:
[[1]]
[[1]]$`factory`
[1] "a" "b" "c" "d"
[[1]]$cost
[1] 21 30 44 100
[[1]]$increa
[1] 36 45 59 115
[[2]]
[[2]]$`factory`
[1] "a" "b" "c" "d"
[[2]]$cost
[1] 21 30 44 100
[[2]]$increa
[1] 36 45 59 115
[[3]]
[[3]]$`factory`
[1] "a" "b" "c" "d"
[[3]]$cost
[1] 21 30 44 100
[[3]]$increa
[1] 36 45 59 115
[[4]]
[[4]]$`factory`
[1] "a" "b" "c" "d"
[[4]]$cost
[1] 21 30 44 100
[[4]]$increa
[1] 36 45 59 115
Could someone help me solve this issue?
ANSWER
This is the result that I was looking for:
map(x2$data, function(df) map(df$data, function(inner) mutate(inner, increa = cost + 15)))

To get your desired output, I think it is easier to first extract the level of information you want and then calculate the new column. If, on the other hand, you want to manipulate the data within this structure and preserve it, then a nested call of map and mutate is necessary.
library(tidyverse)
First solution - extract information and then calculate new column:
We can get to desired level of information with
map(x2$data, ~ .x$data)
# [[1]]
# [[1]][[1]]
# # A tibble: 4 x 2
# cost type
# <dbl> <chr>
# 1 21 a
# 2 30 b
# 3 44 c
# 4 100 d
#
#
# [[2]]
# [[2]][[1]]
# # A tibble: 4 x 2
# cost type
# <dbl> <chr>
# 1 21 a
# 2 30 b
# 3 44 c
# 4 100 d
#
# ...
Because this is a nested list structure, a second map is needed to calculate the new column. Here mutate is applied to each of the nested data entries, with the additional argument inc = cost + 15 creating the new column inc.
map(x2$data, ~ map(.x$data, mutate, inc = cost + 15))
# [[1]]
# [[1]][[1]]
# # A tibble: 4 x 3
# cost type inc
# <dbl> <chr> <dbl>
# 1 21 a 36
# 2 30 b 45
# 3 44 c 59
# 4 100 d 115
#
#
# [[2]]
# [[2]][[1]]
# # A tibble: 4 x 3
# cost type inc
# <dbl> <chr> <dbl>
# 1 21 a 36
# 2 30 b 45
# 3 44 c 59
# 4 100 d 115
#
# ...
The same result would be obtained with an extra function test which takes a data.frame as input parameter and calculates the new column:
test <- function(df){
mutate(df, increa = cost + 15)
}
map(x2$data, ~ map(.x$data, test))
Second solution - manipulate in place
If, however, you want to keep the nested structure, use mutate with map on the outer data column, and then mutate with map again on the inner data column:
x2_new <- x2 %>%
mutate(data = map(data, function(df1) mutate(df1, data = map(data, test))))
To verify that this worked we again extract the needed information as above:
map(x2_new$data, ~ .x$data)
# [[1]]
# [[1]][[1]]
# # A tibble: 4 x 3
# cost type increa
# <dbl> <chr> <dbl>
# 1 21 a 36
# 2 30 b 45
# 3 44 c 59
# 4 100 d 115
#
#
# [[2]]
# [[2]][[1]]
# # A tibble: 4 x 3
# cost type increa
# <dbl> <chr> <dbl>
# 1 21 a 36
# 2 30 b 45
# 3 44 c 59
# 4 100 d 115
#
# ...
Third solution - break the structure but keep the information
This is my favourite solution as it turns the data into a tidy format and keeps all information:
x2 %>%
unnest(data) %>%
unnest(data) %>%
mutate(inc = cost + 15)
# A tibble: 16 x 5
# scene client cost type inc
# <int> <fct> <dbl> <chr> <dbl>
# 1 1 factory 21 a 36
# 2 1 factory 30 b 45
# 3 1 factory 44 c 59
# 4 1 factory 100 d 115
# 5 2 factory 21 a 36
# 6 2 factory 30 b 45
# 7 2 factory 44 c 59
# 8 2 factory 100 d 115
# 9 3 factory 21 a 36
# 10 3 factory 30 b 45
# 11 3 factory 44 c 59
# 12 3 factory 100 d 115
# 13 4 factory 21 a 36
# 14 4 factory 30 b 45
# 15 4 factory 44 c 59
# 16 4 factory 100 d 115
Data
generic_data <- structure(
list(client = structure(1L, .Label = "factory", class = "factor"),
data = list(structure(list(cost = c(21, 30, 44, 100),
type = c("a", "b", "c", "d")),
row.names = c(NA, -4L),
class = c("tbl_df", "tbl", "data.frame")))),
row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
x2 <- structure(
list(scene = 1:4,
data = list(generic_data, generic_data, generic_data, generic_data)),
row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

Based on your description, I think you're looking for map_depth
From the documentation https://purrr.tidyverse.org/reference/map_if.html:
map_depth(x, 2, fun) is equivalent to x <- map(x, ~ map(., fun))
which looks like the answer/solution you settled on.
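As a minimal sketch of that equivalence, the following compares map_depth with a hand-written nested map on a plain nested list. The list nested and the helper add_fifteen are hypothetical toy names, not the asker's x2; whether depth 2 lines up with the nested tibbles in x2 is not checked here.

```r
library(purrr)

# Toy nested list (hypothetical data, not the asker's x2)
nested <- list(list(1, 2), list(3, 4))

# Hypothetical helper standing in for the asker's test()
add_fifteen <- function(v) v + 15

# Apply the helper two levels down with map_depth ...
by_depth <- map_depth(nested, 2, add_fifteen)

# ... and with an explicit map inside map
by_hand <- map(nested, function(inner) map(inner, add_fifteen))

# Compare the two results
identical(by_depth, by_hand)
```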

Related

Matching data replacement in R

I have two datasets with the same dimensions and the same column names. The goal is to check whether NA values exist in one of the datasets and replace them with the corresponding values from the other dataset, as shown in the example below.
I tried to solve the problem with nested for loops, but that failed.
df is a new data frame filled with NAs.
loop = for (a in 1:nrow(data1)) {
for (b in 1:ncol(data1)) {
for (c in 1:nrow(data2)) {
for (d in 1:ncol(data2)) {
for (x in 1:nrow(df)) {
for (y in 1:ncol(df)) {
df[x,y]<- ifelse(data1[a,b] != "NA", data1[a,b], data2[c,d])
return(df)
}
}
}
}
}
}
Example
# The first data frame
structure(list(age = c(23, 22, 21, 20), gender = c("M", "F",
NA, "F")), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
# age gender
# 1 23 M
# 2 22 F
# 3 21 NA
# 4 20 F
# The second data frame
structure(list(age = c(23, 22, 21, 20), gender = c("M", "F",
"M", "F")), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
# age gender
# 1 23 M
# 2 22 F
# 3 21 M
# 4 20 F
Desired output
Age Gender
23 M
22 F
21 M
20 F
You might try this:
df1 <- tibble(age = c(23,22,21,20),
gender = c("M", "F", NA, "F"))
# -------------------------------------------------------------------------
#> df1
# # A tibble: 4 x 2
# age gender
# <dbl> <chr>
# 1 23 M
# 2 22 F
# 3 21 NA
# 4 20 F
# -------------------------------------------------------------------------
df2 <- tibble(age = c(23,22,21,20),
gender = c("M", "F", "M", "F"))
# -------------------------------------------------------------------------
#> df2
# # A tibble: 4 x 2
# age gender
# <dbl> <chr>
# 1 23 M
# 2 22 F
# 3 21 M
# 4 20 F
# -------------------------------------------------------------------------
# get the na in df1 of gender var
df1.na <- is.na(df1$gender)
#> df1.na
# [1] FALSE FALSE TRUE FALSE
# -------------------------------------------------------------------------
# use the values in df2 to replace na in df1 (Note that this is index based)
df1$gender[df1.na] <- df2$gender[df1.na]
df1
# -------------------------------------------------------------------------
#> df1
# A tibble: 4 x 2
# age gender
# <dbl> <chr>
# 1 23 M
# 2 22 F
# 3 21 M
# 4 20 F
# -------------------------------------------------------------------------
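The same index-based replacement can also be written with dplyr::coalesce(), which takes the first non-missing value position by position. This is a sketch that assumes, as above, that the rows of both tibbles are in the same order; the name filled is only illustrative.

```r
library(dplyr)

df1 <- tibble(age = c(23, 22, 21, 20),
              gender = c("M", "F", NA, "F"))
df2 <- tibble(age = c(23, 22, 21, 20),
              gender = c("M", "F", "M", "F"))

# Keep df1$gender where present, otherwise fall back to df2$gender
filled <- mutate(df1, gender = coalesce(gender, df2$gender))
filled
```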
This can be done using the natural_join function from the rqdatatable library. The function does require an index to merge on, so we will need to create one.
Creating a reproducible example will help other people help you. Here I've created two simple data frames that should cover most cases for your problem.
# Create example data
tbl1 <-
data.frame(
w = c(1, 2, 3, 4),
x = c(1, 2, 3, NA),
y = c(1, 2, 3, 4),
z = c(1, NA, NA, NA)
)
tbl2 <-
data.frame(
w = c(9, 9, 9, 9), # check value doesn't overwrite value
x = c(1, 2, 3, 4), # check NA gets filled in
y = c(1, 2, 3, NA), # check NA doesn't overwrite value
z = c(9, NA, NA, NA) # check NA in both stays NA
)
# Create join index
tbl1$indx <- 1:nrow(tbl1)
tbl2$indx <- 1:nrow(tbl2)
# Use natural_join
library("rqdatatable")
natural_join(tbl1, tbl2, by = "indx")
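If both data frames have the same shape, the same row order, and columns of a single type, a base R sketch with a logical mask may be enough and needs no join index. This is offered as an alternative under those assumptions, not as what natural_join does internally; na_mask and filled are illustrative names.

```r
# Same example data as above, without the join index
tbl1 <- data.frame(w = c(1, 2, 3, 4), x = c(1, 2, 3, NA),
                   y = c(1, 2, 3, 4), z = c(1, NA, NA, NA))
tbl2 <- data.frame(w = c(9, 9, 9, 9), x = c(1, 2, 3, 4),
                   y = c(1, 2, 3, NA), z = c(9, NA, NA, NA))

# Logical matrix marking the missing cells of tbl1
na_mask <- is.na(tbl1)

# Fill only those cells with the values at the same positions in tbl2
filled <- tbl1
filled[na_mask] <- tbl2[na_mask]
filled
```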

Add the index of list to bind_rows?

I have this data:
dat=list(structure(list(Group.1 = structure(3:4, .Label = c("A","B", "C", "D", "E", "F"), class = "factor"), Pr1 = c(65, 75)), row.names = c(NA, -2L), class = "data.frame"),NULL, structure(list( Group.1 = structure(3:4, .Label = c("A","B", "C", "D", "E", "F"), class = "factor"), Pr1 = c(81,4)), row.names = c(NA,-2L), class = "data.frame"))
I want to combine these using bind_rows(dat) while keeping the list index as a variable.
The output should include a type column holding the list index ([[1]] and [[3]]):
type Group.1 Pr1
1 1 C 65
2 1 D 75
3 3 C 81
4 3 D 4
data.table solution
Use rbindlist() from the data.table package, which has built-in id support that respects NULL elements.
library(data.table)
rbindlist( dat, idcol = TRUE )
.id Group.1 Pr1
1: 1 C 65
2: 1 D 75
3: 3 C 81
4: 3 D 4
dplyr - partial solution
bind_rows also has ID-support, but it 'skips' empty elements...
bind_rows( dat, .id = "id" )
id Group.1 Pr1
1 1 C 65
2 1 D 75
3 2 C 81
4 2 D 4
Note that the ID of the third element from dat becomes 2, and not 3.
According to the documentation of bind_rows(), you can supply a column name to the .id argument. When you apply bind_rows() to a list of data.frames, the names of the list elements are used as the values of the identifier column. [EDIT] But there is a problem mentioned by @Wimpel:
names(dat)
NULL
However, supplying names to the list will do the trick:
names(dat) <- 1:length(dat)
names(dat)
[1] "1" "2" "3"
bind_rows(dat, .id = "type")
type Group.1 Pr1
1 1 C 65
2 1 D 75
3 3 C 81
4 3 D 4
Or in one line, if you prefer:
bind_rows(setNames(dat, seq_along(dat)), .id = "type")
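For comparison, here is a base R sketch that records the original list position explicitly before binding, so the NULL element cannot shift the ids. The data is a simplified stand-in for dat (character instead of factor columns), and keep, pieces and out are illustrative names.

```r
# Simplified stand-in for dat: two data frames with a NULL in between
dat <- list(
  data.frame(Group.1 = c("C", "D"), Pr1 = c(65, 75)),
  NULL,
  data.frame(Group.1 = c("C", "D"), Pr1 = c(81, 4))
)

# Positions of the non-NULL elements in the original list
keep <- which(!vapply(dat, is.null, logical(1)))

# Attach each element's original position as a 'type' column
pieces <- lapply(keep, function(i) data.frame(type = i, dat[[i]]))

# Row bind the pieces
out <- do.call(rbind, pieces)
out
```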

Rename columns in nested lists and row bind

I have a list of data frames. I'd like to first rename some columns and then row bind the data frames, keeping only some of the columns.
In the example below, I'd like to rename column A to a in the second object and w to x in the third object, then row bind all three objects, selecting only columns a and x.
Data:
df <- list(structure(list(a = 1:3,
x = c(-1.99, -1.11, -0.34),
y = c("C", "B", "A")), .Names = c("a", "x", "y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)), structure(list(a = 1:3, x = c(-0.44, -1.07, -0.23)), .Names = c("A", "x"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)),
structure(list(a = 1:3, x = c(-0.62, -0.60, -0.06),
y = c(3L, 2L, 1L)), .Names = c("a", "w", "y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)))
List structure:
> lapply(df, names)
[[1]]
[1] "a" "x" "y"
[[2]]
[1] "A" "x"
[[3]]
[1] "a" "w" "y"
Then I row bind them with:
library(plyr)
df2 <- ldply(df, data.frame)
Using purrr (map_df), dplyr (rename, select, %>%) and magrittr (%<>%):
library(purrr)
library(dplyr)
library(magrittr)
df[[2]] %<>% rename(.,a = A)
df[[3]] %<>% rename(.,x = w)
df %>% map_df(. %>% select("a","x"))
# # A tibble: 9 x 2
# a x
# <int> <dbl>
# 1 1 -1.99
# 2 2 -1.11
# 3 3 -0.34
# 4 1 -0.44
# 5 2 -1.07
# 6 3 -0.23
# 7 1 -0.62
# 8 2 -0.60
# 9 3 -0.06
Or in base R:
names(df[[2]])[names(df[[2]]) == "A"] <- "a"
names(df[[3]])[names(df[[3]]) == "w"] <- "x"
do.call(rbind,lapply(df,"[",c("a","x")))
You could achieve that with:
library(plyr)
df = lapply(df, function(x) {plyr::rename(x,c("A"="a","w"="x"),warn_missing = FALSE)})
df2 <- ldply(lapply(df, function(x) {x[,c("a","x")]}), data.frame)
Output:
a x
1 1 -1.99
2 2 -1.11
3 3 -0.34
4 1 -0.44
5 2 -1.07
6 3 -0.23
7 1 -0.62
8 2 -0.60
9 3 -0.06
Hope this helps.
Another idea could be to create a named vector v with the replacement values, loop over your list, rename if there is a match and select the desired columns.
v <- c("a" = "A", "x" = "w")
map_df(df, .f = ~ rename_if(
.x,
.p = names(.x) %in% v,
.f = funs(stringi::stri_replace_all_fixed(., v, names(v), vectorize_all = FALSE))) %>%
select(names(v))
)
Which gives:
## A tibble: 9 x 2
# a x
# <int> <dbl>
#1 1 -1.99
#2 2 -1.11
#3 3 -0.34
#4 1 -0.44
#5 2 -1.07
#6 3 -0.23
#7 1 -0.62
#8 2 -0.60
#9 3 -0.06
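Since funs() has been deprecated in dplyr (as of 0.8.0), the same lookup-vector idea can be sketched with rename() and any_of(), which skips lookup entries that are absent from a given data frame. This assumes a reasonably recent dplyr (1.0 or later); dfs, lookup and combined are illustrative names, and the data is a rebuilt copy of the example.

```r
library(dplyr)
library(purrr)

# Rebuilt copy of the example list
dfs <- list(
  tibble(a = 1:3, x = c(-1.99, -1.11, -0.34), y = c("C", "B", "A")),
  tibble(A = 1:3, x = c(-0.44, -1.07, -0.23)),
  tibble(a = 1:3, w = c(-0.62, -0.60, -0.06), y = c(3L, 2L, 1L))
)

# new_name = "old_name"
lookup <- c(a = "A", x = "w")

# Rename whatever matches, keep a and x, then row bind
combined <- map(dfs, function(d) select(rename(d, any_of(lookup)), a, x)) %>%
  bind_rows()
combined
```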

Appending list items

I have a list of some length (let's say 1000). Each element of the list is another list of length 2. Each element of that inner list is a data.table. The second element of each inner list might be an empty data.table.
I need to rbind() all the data.tables in the first position of each inner list, and likewise those in the second position. I am currently doing the following:
DT1 = data.table()
DT2 = data.table()
for (i in 1:length(myList)){
DT1 = rbind(DT1, myList[[i]][[1]])
DT2 = rbind(DT2, myList[[i]][[2]])
}
This works, but it is too slow. Is there a way I can avoid the for-loop?
Thank you in advance!
data.table has a dedicated fast function for this: rbindlist
Cf: http://www.inside-r.org/packages/cran/data.table/docs/rbindlist
Edited:
Here is an example of code
library(data.table)
srcList=list(list(DT1=data.table(X=0),DT2=NULL),list(DT1=data.table(X=2),data.table(Y=3)))
# first have a list for all DT1s
DT1.list= lapply(srcList, FUN=function(el){el$DT1})
rbindlist(DT1.list)
X
1: 0
2: 2
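When the inner lists are unnamed, as in the question, the elements can be picked by position instead of by name. A small sketch, assuming a toy myList in which a missing second element is represented by NULL:

```r
library(data.table)

# Toy stand-in for myList: each element is a list of length 2
myList <- list(
  list(data.table(X = 1:2), data.table(Y = "a")),
  list(data.table(X = 3L), NULL)
)

# Pull out the first and second positions, then bind each set in one call
DT1 <- rbindlist(lapply(myList, `[[`, 1))
DT2 <- rbindlist(lapply(myList, `[[`, 2))
```

This replaces the repeated rbind() inside the loop with a single bind per position, which is where the loop spends its time.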
Do this:
do.call("rbind", lapply(df.list, "[[", 1)) # for first list element
# x y
# 1 1 10
# 2 2 20
# 3 3 30
# 4 4 40
# 5 5 50
# 6 6 60
do.call("rbind", lapply(df.list, "[[", 2)) # for second list element
# x y
# 1 1 30
# 2 2 40
# 3 3 50
# 4 4 70
# 5 5 80
# 6 6 90
DATA
df.list=list(list(structure(list(x = 1:3, y = c(10, 20, 30)), .Names = c("x",
"y"), row.names = c(NA, -3L), class = "data.frame"), structure(list(
x = 1:3, y = c(30, 40, 50)), .Names = c("x", "y"), row.names = c(NA,
-3L), class = "data.frame")), list(structure(list(x = 4:6, y = c(40,
50, 60)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame"),
structure(list(x = 4:6, y = c(70, 80, 90)), .Names = c("x",
"y"), row.names = c(NA, -3L), class = "data.frame")))
# df.list
# [[1]]
# [[1]][[1]]
# x y
# 1 1 10
# 2 2 20
# 3 3 30
# [[1]][[2]]
# x y
# 1 1 30
# 2 2 40
# 3 3 50
# [[2]]
# [[2]][[1]]
# x y
# 1 4 40
# 2 5 50
# 3 6 60
# [[2]][[2]]
# x y
# 1 4 70
# 2 5 80
# 3 6 90

List of data frames with names instead of numbers?

I am not sure if this question is too basic but as I haven't found an answer despite searching google for quite some time I have to ask here..
Suppose I want to create a list out of data frames (df1 and df2), how can I use the name of the data frame as the list "index"(?) instead of numbers? I.e., how do I get [[df1]] instead of [[1]] and [[df2]] instead of [[2]]?
list(structure(list(a = 1:10, b = 1:10), .Names = c("a", "b"), row.names = c(NA,
-10L), class = "data.frame"), structure(list(b = 1:10, a = 1:10), .Names = c("b",
"a"), row.names = c(NA, -10L), class = "data.frame"))
OK, entirely different way to ask this question to hopefully make things clearer ;)
I have three data frames
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
I want to create a list out of them..
li <- list(weguihl, raeg, awezilf)
But now I have the problem that - without remembering the order of the data frames - I do not know which data frame is which in the list..
> li
[[1]]
a b
1 1 1
2 2 2
3 3 3
[[2]]
b a
1 1 1
2 2 2
3 3 3
[[3]]
a b
1 1 1
2 2 2
3 3 3
Thus I'd prefer this output
> li
[[weguihl]]
a b
1 1 1
2 2 2
3 3 3
[[raeg]]
b a
1 1 1
2 2 2
3 3 3
[[awezilf]]
a b
1 1 1
2 2 2
3 3 3
How do I get there?
You could potentially achieve this with mget on a clean global environment.
Something like this:
Clean the global environment
rm(list = ls())
Your data frames
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
Running mget which will return a list of data frames by default
li <- mget(ls(), .GlobalEnv)
li
# $awezilf
# a b
# 1 1 1
# 2 2 2
# 3 3 3
#
# $raeg
# b a
# 1 1 1
# 2 2 2
# 3 3 3
#
# $weguihl
# a b
# 1 1 1
# 2 2 2
# 3 3 3
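If clearing the environment is not an option, two variants that name the data frames explicitly may be preferable. This is a sketch using the three data frames from the question; df_names, li_explicit and li_mget are illustrative names.

```r
weguihl <- data.frame(a = 1:3, b = 1:3)
raeg <- data.frame(b = 1:3, a = 1:3)
awezilf <- data.frame(a = 1:3, b = 1:3)

# Variant 1: name the elements when building the list
li_explicit <- list(weguihl = weguihl, raeg = raeg, awezilf = awezilf)

# Variant 2: look the objects up by name, keeping the order given here
df_names <- c("weguihl", "raeg", "awezilf")
li_mget <- mget(df_names, envir = environment())

names(li_mget)
```

Both variants keep the order you specify, whereas ls() returns names sorted alphabetically.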
