I have a folder with different files, each with a different format, so I created different functions able to read each of the files. Is it possible to use map to apply the corresponding function to the corresponding file?
I have found this post on applying several functions to an object, but I don't think it is applicable here, since in that case every function is always applied to every object.
all_files <- list.dirs(file.path(path))
fun_A <- function(x) {read.csv(x)}
fun_B <- function(x) {read.table(x)}
fun_C <- function(x) {read.delim(x)}
funs <- c(fun_A , fun_B , fun_C)
So, if I do it manually it works:
(all_files %>%
purrr::map(., ~list.files(., full.names = T)))[[1]][1] %>% fun_A() %>%
dplyr::bind_rows((all_files %>%
purrr::map(., ~list.files(., full.names = T)))[[1]][2] %>% fun_B ()) %>%
dplyr::bind_rows((all_files %>%
purrr::map(., ~list.files(., full.names = T)))[[1]][3] %>% fun_C())
But I tried several times with purrr and I am not able to make it work. This is my final attempt:
all_files %>% purrr::map(.x = ., ~{
df = (.x)
funs %>% purrr::map(., ~ df %>% (.))
})
Any suggestions?
You can use Map or map2, as suggested by @akrun:
do.call(rbind, Map(function(x, y) y(x), all_files, funs))
Using map2_df :
purrr::map2_df(all_files, funs, ~.y(.x))
For this to work, length(all_files) and length(funs) must be equal, and the functions must be in the same order as the files they read.
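If the files and functions do not line up one to one, one option is to pick the reader from each file's extension instead. This is only a minimal sketch: the demo files, the `readers` lookup list and the `.csv`/`.tsv` extensions are assumptions made for illustration and are not part of the question.

```r
library(purrr)

# Hypothetical demo files written to a temporary directory
tmp <- tempfile("readers_")
dir.create(tmp)
write.csv(head(cars, 3), file.path(tmp, "a.csv"), row.names = FALSE)
write.table(head(cars, 3), file.path(tmp, "b.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Lookup table: file extension -> reader function (names are assumptions)
readers <- list(
  csv = function(x) read.csv(x),
  tsv = function(x) read.delim(x)
)

files <- list.files(tmp, full.names = TRUE)

# One reader per file, selected by extension, so both vectors have equal length
funs <- readers[tools::file_ext(files)]

# Apply each reader to its own file, then stack the results
combined <- map2(files, funs, ~ .y(.x)) %>%
  dplyr::bind_rows()
```

Looking the reader up by extension keeps the pairing explicit, so adding a file does not require reordering `funs` by hand.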
I'm testing some operations with the classic '99 Czech bank data set, trying to run a tidyverse task on several data.frames in my global environment, but the loop I've created keeps overwriting the same object val, which was supposed to be a stand-in for the data frames themselves:
x <- c("loans93","loans94","loans95","loans96",
"loans97","loans98")
x <- base::mget(x, envir=as.environment(-1), mode= "any", inherits=F)
for (val in x) {
val <- val %>%
select(account_id, district_id, balance, status, date) %>%
group_by(account_id, district_id, status, date) %>%
summarise(balance=mean(balance, na.rm=T)) %>%
ungroup()
}
What am I doing wrong? I've searched for similar questions but people keep answering lapply solutions, I just need the task to be saved upon my DFs instead of this "val" object I keep getting.
You could try this:
df_names <- c("loans93", "loans94", "loans95",
"loans96", "loans97", "loans98")
for(df_name in df_names){
get(df_name) %>%
head %>% ## replace with desired manipulations
assign(value = .,
x = paste0(df_name,'_manipulated'), ## or just: df_name to overwrite original
envir = globalenv())
}
Aside: list2env is handy to "spawn" list members (e.g. data frames) into an environment. Example:
list_of_dataframes <- list(
iris_short = head(iris),
cars_short = head(cars)
)
list2env(x = list_of_dataframes, envir = globalenv()) ## without envir, a new environment is created instead
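The two ideas above can be combined: collect the data frames into a named list, transform each one, and write the results back under the original names. A minimal sketch, assuming two small made-up data frames in place of the real loans tables and a reduced set of columns:

```r
library(dplyr)

# Made-up stand-ins for the real loans data frames
loans93 <- data.frame(account_id = c(1, 1, 2), balance = c(10, 20, 30))
loans94 <- data.frame(account_id = c(3, 3), balance = c(5, 15))

df_names <- c("loans93", "loans94")
env <- environment()  # the global environment when run at top level

# Fetch the data frames by name and summarise each one
summarised <- lapply(mget(df_names, envir = env), function(df) {
  df %>%
    group_by(account_id) %>%
    summarise(balance = mean(balance, na.rm = TRUE), .groups = "drop")
})

# Overwrite the originals with the summarised versions
list2env(summarised, envir = env)
```

The loop in the question only ever modifies the copy held in `val`, which is why the original objects stay unchanged; writing back by name avoids that.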
Is there a quick and easy way using dplyr to add a column called 'site_id' which populates rows from the number given to the filename when using map_df from purrr package to bring the data in to one dataframe?
For example my.files will read in two csv files:
"H:/Documents/2015.csv" and "H:/Documents/2021.csv"
my.files <- list.files(my.path, pattern = "*.csv", full.names = TRUE)
I then use map_df to bring all the data in to one data frame, but would like to create an additional column called 'site_id' that will populate each row from that file with its original file title e.g. 2015 or 2021
I currently merge the .csv files together with this code:
temp.df <- my.files %>% map_df(~read.csv(., skip = 15))
But I envisage using mutate to help but am unsure how it would work...
temp.df <- my.files %>% map_df(~read.csv(., skip = 15) %>%
mutate(site_id = ????))
Any help is much appreciated.
We may use imap if we want to use mutate
library(dplyr)
library(purrr)
setNames(my.files, my.files) %>%
imap_df(~ read.csv(.x, skip = 15) %>%
mutate(site_id = .y))
Or specify the .id argument in map_dfr
setNames(my.files, my.files) %>%
map_dfr(read.csv, skip = 15, .id = "site_id")
Using purrr & dplyr:
temp.df <- my.files %>%
purrr::set_names() %>%
purrr::map(., ~read.csv(., skip = 15)) %>%
dplyr::bind_rows(.id = "site_id")
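Note that naming the vector with the file paths themselves puts the full path into site_id. If only the bare file name (e.g. 2015) is wanted, the names can be trimmed first. A minimal sketch, assuming two small demo files in a temporary folder and omitting the skip = 15 argument because the demo files have no preamble:

```r
library(purrr)

# Hypothetical demo files standing in for 2015.csv and 2021.csv
tmp <- tempfile("sites_")
dir.create(tmp)
write.csv(data.frame(value = 1:2), file.path(tmp, "2015.csv"), row.names = FALSE)
write.csv(data.frame(value = 3:4), file.path(tmp, "2021.csv"), row.names = FALSE)

my.files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

temp.df <- my.files %>%
  # Name each path by its file name without directory or extension
  set_names(tools::file_path_sans_ext(basename(my.files))) %>%
  map(read.csv) %>%
  # The list names become the site_id column
  dplyr::bind_rows(.id = "site_id")
```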
I am trying to loop through hundreds of weather data files (.nc) and merge them together.
I can load them, and merge them manually using:
library(raster)
library(ncdf4)
library(ncdf4.helpers)
require(data.table)
#define input paths, load data, then merge
baseline_path_file <- "E:/input_data/HadOBS/tas/tas_hadukgrid_uk_1km_mon_202001-202012.nc"
baseline_path_file2 <- "E:/input_data/HadOBS/tas/tas_hadukgrid_uk_1km_mon_201901-201912.nc"
BASELINE <- setDT(as.data.frame(brick(baseline_path_file), xy = T))
BASELINE2 <- setDT(as.data.frame(brick(baseline_path_file2), xy = T))
combined <- merge(BASELINE, BASELINE2, by = c("x","y"))
but what I would like to do is define the list of files in a folder and merge them automatically.
e.g.
library(fs)
files <- dir_ls("E:/input_data/HadOBS/tas")
combined2 <- map(files, brick) %>%
as.data.frame %>%
setDT %>%
reduce(inner_join, by = c("x", "y"))
but that obviously isn't working... I can't seem to get the piping in the right order. Any ideas how to get this right? Many thanks indeed.
The problem appears to be that you are only using map() to apply brick to your list elements, and not also as.data.frame() and setDT().
Lacking the data, I didn't run this code, so it might not work, but you get the idea:
combined2 <- map(files,
~ .x %>%
brick() %>%
as.data.frame(xy = TRUE) %>%  # xy = TRUE keeps the x and y columns needed for the join
setDT()
) %>%
reduce(inner_join, by = c("x", "y"))
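The reduce() step can be tried without the NetCDF files. The sketch below uses three small made-up data frames as stand-ins for what as.data.frame(brick(file), xy = TRUE) would return; the column names and values are assumptions for illustration only.

```r
library(purrr)
library(dplyr)

# Made-up stand-ins for the per-file data frames, one value column per file
grids <- list(
  data.frame(x = c(0, 1), y = c(0, 0), tas_2019 = c(9.1, 9.4)),
  data.frame(x = c(0, 1), y = c(0, 0), tas_2020 = c(9.8, 10.0)),
  data.frame(x = c(0, 1), y = c(0, 0), tas_2021 = c(9.5, 9.9))
)

# Join the first two, then join the result with the third, and so on
combined <- reduce(grids, inner_join, by = c("x", "y"))
```

Each step joins on the shared coordinates, so the result has one row per grid cell and one value column per input file.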
Suppose I have two Excel files in my subdirectory:
.../Myfolder/File1.xlsx
.../Myfolder/File2.xlsx
I know that I can read them into R as a list using the following code:
data <- list.files(path = "./Myfolder/", pattern="*.xlsx", full.names = T)
data.list <- lapply(data, read_excel)
However, I want to name the objects in the list according to the file name. That is, the first object's name should be "File1" and the second one's should be "File2". I can use:
names(data.list) <- data
But then I get the full name (because I use full.names = T).
You can do:
names(data.list) <- sub('\\.xlsx$', '', basename(data))
Or without any regex:
names(data.list) <- tools::file_path_sans_ext(basename(data))
This is what you're asking.
library(tidyverse)
library(stringr)
library(readxl)
(list.files('folder_with_sheets', full.names = TRUE) %>% # full paths so read_excel can find the files
keep(~ str_detect(.x, '\\.xlsx$')) %>%
set_names(.) %>%
map(read_excel) ->
data)
But supposing they all have the same columns in each:
library(tidyverse)
library(stringr)
library(readxl)
(list.files('folder_with_sheets', full.names = TRUE) %>% # full paths so read_excel can find the files
keep(~ str_detect(.x, '\\.xlsx$')) %>%
map_dfr(~ read_excel(.x) %>% mutate(sheet = .x)) ->
data)
Supposing they all share an identification column and represent different data about the same individuals:
library(tidyverse)
library(stringr)
library(readxl)
(list.files('folder_with_sheets', full.names = TRUE) %>% # full paths so read_excel can find the files
keep(~ str_detect(.x, '\\.xlsx$')) %>%
map(read_excel) %>%
reduce(left_join) -> # or reduce(~ left_join(.x, .y, by = 'key_variable_name'))
data)
Either way, with set_names you can assign the names inside the pipe, which is preferable to having two expressions, one to create the data and the other to label it.
P.S:
This is how I'd do it nowadays:
library(tidyverse)
library(readxl)
library(fs)
fs::dir_ls(
path = "folder/",
glob = "*.xlsx") %>%
purrr::set_names(
x = purrr::map(., readxl::read_excel),
nm = .)
# or maybe within a tibble?
tibble::tibble(
path = fs::dir_ls(
path = "folder/",
glob = "*.xlsx"),
data = purrr::map(path, readxl::read_excel))
I had to modify it as shown below. However, it still keeps the full path name and extension in the list names, which I don't like.
(list.files(path = 'filepath ', pattern = "\\.xlsx$", full.names = TRUE) %>%
keep(~ str_detect(.x, '\\.xlsx$')) %>%
set_names(.) %>%
map(read_excel) ->
data)
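One way to get names without the directory or extension is to build them explicitly and pass them to set_names. A minimal sketch: the two paths are hypothetical, and `fake_reader` is a made-up stand-in for readxl::read_excel so that the example runs without real workbooks.

```r
library(purrr)

# Hypothetical paths, as list.files(..., full.names = TRUE) might return them
paths <- c("./Myfolder/File1.xlsx", "./Myfolder/File2.xlsx")

# Stand-in for readxl::read_excel, used only for illustration
fake_reader <- function(path) data.frame(source = path)

data <- paths %>%
  # basename() drops the directory, file_path_sans_ext() drops ".xlsx"
  set_names(tools::file_path_sans_ext(basename(paths))) %>%
  map(fake_reader)

names(data)
```

The full paths are still what gets passed to the reader; only the list names are shortened.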
The following code harvests data from a website. I retrieve a list of lists; I want to unlist one of the lists, edit it, then re-nest it back into the data in the form in which it was received. Here is my code below; it fails on the re-nesting.
library(jsonlite)
library(plyr)
library(ckanr)
library(purrr)
library(dplyr)
ckanr_setup(url = "https://energydata.info/")
package_search(q = 'organization:world-bank-grou')$count
json_data2 <- fromJSON("https://energydata.info/api/3/action/package_search?q=organization:world-bank-grou", flatten = TRUE)
dat2 <- json_data2$result
str(dat2)
###########
#Get the datasets and unlist metadata
###########
df <- as.data.frame(json_data2$result$results)
Tags <- select(df, id, topic)
#Make some edits
Tags$topic <- tolower(Tags$topic)
res <- rbind.fill(lapply(Tags,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))
res$V1 = paste0("Some edit:",res$V1)
res$V2 = paste0("Some edits:", res$V2)
res$V3 = paste0("Some edit:", res$V3)
res[res=="Some edit:NA"]<-NA
res$V1 <- gsub(" ", "_", res$V1)
res$V2 <- gsub(" ", "_", res$V2)
res$V3 <- sub(" ", "_", res$V3)
res
###########
#Re-nest
###########
#turning res df back into list of lists
nestedList <- flatten(by_row(res, ..f = function(x) flatten_chr(x), .labels = FALSE)) #FAILS HERE
ERROR: Error in flatten(by_row(res, ..f = function(x) flatten_chr(x),
.labels = FALSE)) : could not find function "by_row"
Unclear from the question wording exactly what kind of list of lists you want to end up with, but maybe this is what you're looking for?
res %>%
rowwise() %>%
as.list()
or
res %>%
t() %>%
as.data.frame() %>%
rowwise() %>%
as.list()
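As far as I know, by_row() was moved out of purrr into the separate purrrlyr package, which would explain the error. If the goal is one inner list per row with the NA padding dropped, that can also be done in base R. A minimal sketch, assuming a small made-up `res` with character columns in place of the real one:

```r
# Made-up stand-in for the edited res data frame
res <- data.frame(V1 = c("a", "b"), V2 = c("c", NA), stringsAsFactors = FALSE)

# One list element per row, keeping only the non-missing values
nested <- lapply(seq_len(nrow(res)), function(i) {
  row <- unlist(res[i, ], use.names = FALSE)
  as.list(row[!is.na(row)])
})

str(nested)
```

Whether this matches the structure the API originally returned depends on the data, so the result is worth comparing against the original with str().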