I have a character vector nms of variable names that all appear in at least one of several files. If a variable exists in more than one file, the values will be the same.
I have a named list test_lst whose top-level names are the names of the files. Each file's entry contains a sublist that holds a vector of the names of the variables in that file.
I would like to use purrr to go through test_lst, find the first file that contains each of the variables, and return a named list where the names are the filenames and each element is a vector of the variables in nms that exist in that file. I would also like to index the sublist by name, not by position.
It seems like this should be easy, and I don't know why I cannot make it work.
Data:
test_lst <- list(ob1 = list(v1 = list(s1 = "X", s2 = paste0("A", 1:3)), v2 = paste0("A", 4:8)),
ob2 = list(v1 = list(s1 = "X", s2 = paste0("A", 9:11)), v2 = paste0("A", 12:16)))
nms <- c(paste0("A", 1:2), paste0("A", 9:10))
Non-working code:
find_vars <- function(var_names, meta){
map_chr(meta, c("v1", "s2")) -> var_vecs
names(var_vecs)<- names(meta)
map_chr(var_vecs, var_names %in% .) -> out
names(out) <- names(var_vecs)
out
}
find_vars(var_names = nms, meta = test_lst)
Desired output, a list:
$ob1
[1] "A1" "A2"
$ob2
[1] "A9" "A10"
We can use modify_depth
library(tidyverse)
modify_depth(test_lst, 2, ~ enframe(.x) %>%
select(value) %>%
unnest %>%
filter(value %in% nms)) %>%
flatten %>%
keep(~ nrow(.x) > 0) %>%
map(~ .x %>%
pull(value)) %>%
set_names(names(test_lst))
#$ob1
#[1] "A1" "A2"
#$ob2
#[1] "A9" "A10"
Or we can enframe first and then loop through the 'value' column to subset the elements
enframe(test_lst) %>%
unnest %>%
mutate(value = map(value, ~ intersect(nms, unlist(.x)))) %>%
unnest %>%
deframe %>%
split(names(.))
Or using the same notation we used with intersect earlier
map(test_lst, ~ intersect(nms, unlist(.x)))
or another option is melt
library(reshape2)
melt(test_lst) %>%
select(L1, value) %>%
group_by(L1) %>%
filter(value %in% nms) %>%
{split(as.character(.$value), .$L1)}
We can unlist all values of test_lst and find out common values using intersect
lapply(test_lst, function(x) intersect(unlist(x), nms))
#$ob1
#[1] "A1" "A2"
#$ob2
#[1] "A9" "A10"
If you want to use purrr, we can change lapply to map
purrr::map(test_lst, ~intersect(unlist(.), nms))
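The intersect-based answers above return every matching variable for every file. If a variable could appear in more than one file and only the first file should claim it, a base R sketch along the following lines might help. It assumes the variable names always live under v1 and then s2 (indexed by name, as the question asks); the names vars_by_file, first_file and result are made up for illustration.

```r
test_lst <- list(
  ob1 = list(v1 = list(s1 = "X", s2 = paste0("A", 1:3)), v2 = paste0("A", 4:8)),
  ob2 = list(v1 = list(s1 = "X", s2 = paste0("A", 9:11)), v2 = paste0("A", 12:16))
)
nms <- c(paste0("A", 1:2), paste0("A", 9:10))

# Pull the variable-name vector out of each file, indexing by name
# (assumption: the names are always stored under v1 -> s2)
vars_by_file <- lapply(test_lst, function(f) f[["v1"]][["s2"]])

# For each variable, record the name of the first file that contains it
first_file <- vapply(nms, function(v) {
  hit <- which(vapply(vars_by_file, function(vars) v %in% vars, logical(1)))
  if (length(hit) > 0) names(hit)[1] else NA_character_
}, character(1))

# Group the variables by that file, keeping the original file order
result <- split(nms, factor(unname(first_file), levels = names(test_lst)))
result
```

A file that claims no variable comes back as an empty character vector; purrr::compact(result) would drop those if they are unwanted.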
I built a function to use inside purrr::map2 and run over two lists. When I run the function's steps separately, it works fine. But inside map2 it apparently runs once (for the first elements, .x[[1]] and .y[[1]]) and then, in the second round, throws this error in the for loop:
How can I find out why it's not working?
PS: It's hard to put an example of the data here because they are lists with very specific characteristics for this function. I'm sorry.
The function follows:
df <- list()
build_HUW_raster <- function(.x, .y) {
list.time <- .x %>%
split(.$id) %>%
purrr::map(~list(t=as.matrix(.x$date),
xy=unname(as.matrix(.x[,c(22,23)])))
)
for(i in 1:50){
cat(i," ")
path=list.time[[i]]
ctmc=ctmcmove::path2ctmc(path$xy,path$t,r,method="LinearInterp")
df[[i]] <- as.data.frame(do.call(cbind, ctmc))
}
df <- df %>% purrr::map(~ group_by(., ec) %>%
summarise(rt = mean(rt)) %>%
arrange(desc(rt))
)
stacktime <- df %>% purrr::map(~ rename(., cell = ec)) %>%
map(~dplyr::left_join(cargo.grid, ., by="cell", copy=T)) %>%
map(~raster::rasterize(., r, field="rt", na.rm=F, background=0)) %>%
raster::stack()
stackprop <- .y %>%
split(.$id) %>%
purrr::map(~ raster::rasterize(., y = r,
field=.$proportion,
fun=function(x, ...)median(x))) %>%
raster::stack()
stack_huw <- raster::overlay(raster::calc(stacktime, fun=function(x)
ifelse(is.na(x), NA, x/sum(x, na.rm=T))), stackprop, fun=function(x,y)x*y
)
raster_mean <- raster::stackApply(stack_huw,
indices = rep(1,raster::nlayers(stack_huw)),
fun = "mean",
na.rm = F
)
}
result.list <- purrr::map2(.x = list1, .y = list2, .f = build_HUW_raster)
The reason lies in which object is being looped over. [[ extracts the list element itself, so map then loops over whatever that element contains: the individual values if it is a vector or matrix, or the columns if it is a data.frame, since those are its units. [ instead returns a list of length one that still contains the element, so map loops over that single element as a whole.
list(1, 2, 3)[1]
[[1]]
[1] 1
vs
list(1, 2, 3)[[1]]
[1] 1
When we loop with map and apply a function that requires a specific structure (for example, colSums requires a matrix or data.frame, i.e. an object with a dim attribute), it fails if we use [[, because each column is passed on its own as a plain vector
> map(replicate(2, data.frame(col1 = 1:5, col2 = 6:10), simplify = FALSE)[[1]], colSums)
Error in .f(.x[[i]], ...) :
'x' must be an array of at least two dimensions
> map(replicate(2, data.frame(col1 = 1:5, col2 = 6:10), simplify = FALSE)[1], colSums)
[[1]]
col1 col2
15 40
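To make the difference concrete, here is a small base R sketch that uses lapply in place of map (the two loop over their input in the same way). The object names dfs, one_elem_list and bare_elem are made up for illustration.

```r
dfs <- list(data.frame(col1 = 1:5, col2 = 6:10),
            data.frame(col1 = 1:5, col2 = 6:10))

one_elem_list <- dfs[1]    # a list of length 1 that holds the data.frame
bare_elem     <- dfs[[1]]  # the data.frame itself

length(one_elem_list)  # number of list elements
length(bare_elem)      # number of columns

# Looping over the length-1 list hands the whole data.frame to colSums
res_list <- lapply(one_elem_list, colSums)

# Looping over the data.frame hands over one column (a vector) at a time,
# so a vector function such as sum is needed instead of colSums
res_cols <- lapply(bare_elem, sum)
```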
Here, we may change the code to
purrr::map2(.x = list1[1], .y = list2[1], .f = build_HUW_raster)
How can this be done more elegantly? I'm looking to convert a vector of key-value pairs, stored as concatenated strings, into a vector of values with the keys as names.
library(tidyverse)
library(purrr)
x <- c("key1|value1", "key2|value2")
# Current way
x_split <- x %>% str_split("\\|")
keys <- x_split %>% map(pluck(1)) %>% unlist()
values <- x_split %>% map(pluck(2)) %>% unlist()
y <- values %>% set_names(keys)
# More elegant way
y <- x %>% some_functions()
You can use simplify = TRUE in str_split and use set_names.
stringr::str_split(x, "\\|", simplify = TRUE) %>% {purrr::set_names(.[, 2], .[, 1])}
# key1 key2
#"value1" "value2"
I've always liked data.table::tstrsplit.
library(data.table)
tstrsplit(x,"\\|") %>% {setNames(.[[2]],.[[1]])}
# key1 key2
#"value1" "value2"
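For comparison, a dependency-free sketch in base R follows. It assumes every string contains exactly one "|" separator; parts is a made-up name.

```r
x <- c("key1|value1", "key2|value2")

# fixed = TRUE treats "|" as a literal character, so no escaping is needed
parts <- strsplit(x, "|", fixed = TRUE)

# Take the second piece of each pair as the value and the first as its name
y <- setNames(vapply(parts, `[`, character(1), 2),
              vapply(parts, `[`, character(1), 1))
y
```

vapply raises an error if any piece is not a single string, which gives an early warning when the one-separator assumption does not hold.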
In order to filter a data.frame for only the columns of interest, I need to find the columns in this data.frame that contain data outside a specific range.
Let the data.frame be
df<-data.frame(x1=c(1,5,9),x2=c(10,20,30),x3=c(20,100,1000))
ranges<-data.frame(y1=c(3,8),y2=c(10,20), y3=c(15,1250))
As an output I'd like a list returning the colnames: "x1","x2"
I tried the following, but the code only works if "ranges" enumerates every number, as specified below, and it reports a match when a number is found. That's unfortunately not what I need.
ranges<-c(15:300,10:20)
df.l<-colnames(df)[sapply(df,function(x) any(x %in% ranges))]
Any ideas?
Thanks!
If 'ranges' is a data.frame or list, one option is
names(which(unlist(Map(function(x, y) any(!(x >= y[1] & x <= y[2])), df, ranges))))
#[1] "x1" "x2"
Or use the reverse logic
names(which(unlist(Map(function(x, y) any(x < y[1]| x > y[2]), df, ranges))))
Or in tidyverse,
library(purrr)
library(dplyr)
library(tibble)
map2(df, ranges, ~ between(.x, .y[1], .y[2]) %>% `!` %>% any) %>%
enframe %>%
unnest(cols = value) %>%
filter(value) %>%
pull(name)
#[1] "x1" "x2"
data
ranges <- data.frame(y1 = c(3, 8), y2 = c(10, 20), y3 = c(15, 1250))
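The same pairing of each column with its own range can be sketched with mapply, which returns a named logical vector directly. This assumes the columns of df and ranges line up by position and that each ranges column holds a lower and an upper bound; outside and cols_outside are made-up names.

```r
df <- data.frame(x1 = c(1, 5, 9), x2 = c(10, 20, 30), x3 = c(20, 100, 1000))
ranges <- data.frame(y1 = c(3, 8), y2 = c(10, 20), y3 = c(15, 1250))

# TRUE for a column when at least one of its values falls outside its range
outside <- mapply(function(x, r) any(x < min(r) | x > max(r)), df, ranges)

cols_outside <- names(df)[outside]
cols_outside
```

Using min(r) and max(r) rather than r[1] and r[2] means the order of the two bounds inside each ranges column does not matter.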
I want to select items by index from a list before applying another function to it using purrr::map. I have tried the following, but can't find a way that works.
require(dplyr)
require(purrr)
dat <- list(1:3,
4:6,
letters[1:3])
# I can select one item
dat[1]
# I can select two items
dat[c(1,2)]
# But how can I do this in a pipeline by index?
dat %>% map(mean)
dat %>%
filter(c(1,2)) %>%
map(mean)
dat %>%
keep(1,2) %>%
map(mean)
dat %>%
select(1,2) %>%
map(mean)
We can use `[` and do
dat %>%
.[c(1, 2)] %>%
map(., mean)
#[[1]]
#[1] 2
#[[2]]
#[1] 5
Or define an alias in the way the magrittr package does it
extract <- `[` # literally the same as magrittr::extract
dat %>%
extract(c(1, 2)) %>%
map(., mean)
Which could also be written as
dat %>% `[`(c(1,2))
Using the base R pipe operator, this could read
dat |>
(\(x) x[c(1, 2)])() |> # anonymous function, R >= 4.1.0; `[` cannot be called directly on the right-hand side of |>
lapply(mean)
#[[1]]
#[1] 2
#
#[[2]]
#[1] 5
An option is
library(tidyverse)
keep(dat, seq_along(dat) %in% 1:2) %>%
map(mean)
#[[1]]
#[1] 2
#[[2]]
#[1] 5
Or map with pluck
map(1:2, ~ pluck(dat, .x) %>%
mean)
Or with assign_in
assign_in(dat, 3, NULL) %>%
map(mean)
Or another option is map_if
map_if(dat, is.numeric, mean, .else = ~ NULL) %>%
discard(is.null)
Or with discard
discard(dat, is.character) %>%
map(mean)
or with Filter and map
Filter(is.numeric, dat) %>%
map(mean)
NOTE: All of these give the expected output.
I am trying to apply a custom function to a data.frame row by row, but I can't figure out how to do it. I'm trying rowwise(), as in the simple artificial example below:
library(tidyverse)
my_fun <- function(df, col_1, col_2){
df[,col_1] + df[,col_2]
}
dff <- data.frame("a" = 1:10, "b" = 1:10)
dff %>%
rowwise() %>%
mutate(res = my_fun(., "a", "b"))
However, the data does not get passed row by row. How can I achieve that?
After rowwise(), dplyr exposes the current row through .data as a list-like object, so you need to use [[ to extract its columns. You also need to pass .data rather than ., because . is the entire dff rather than the individual row.
my_fun <- function(df, col_1, col_2){
df[[col_1]] + df[[col_2]]
}
dff %>%
rowwise() %>%
mutate(res = my_fun(.data, 'a', 'b'))
You can see what .data looks like with the code below
dff %>%
rowwise() %>%
do(res = .data) %>%
.[[1]] %>%
head(1)
# [[1]]
# [[1]]$a
# [1] 1
#
# [[1]]$b
# [1] 1
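As a cross-check that does not depend on dplyr, the row-by-row idea can be sketched in base R by passing one-row slices of the data frame to the function. row_fun is a made-up helper that mirrors my_fun, and integer(1) is used because columns a and b are integers.

```r
dff <- data.frame(a = 1:10, b = 1:10)

# Hypothetical helper: works on a single row and extracts columns with [[
row_fun <- function(row, col_1, col_2) {
  row[[col_1]] + row[[col_2]]
}

# Apply the helper to each one-row slice of the data frame
dff$res <- vapply(seq_len(nrow(dff)),
                  function(i) row_fun(dff[i, ], "a", "b"),
                  integer(1))
dff
```

Since + is already vectorised, dff$a + dff$b gives the same result here without any loop; the row-wise form only pays off for functions that cannot take whole columns.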