Saving a dataset through a function, but it is not loaded afterwards - r

I have created a function that reads some text data from a set of files, does some manipulation (omitted here), and then saves each modified dataframe in the resulting list as .RData. I have checked that the function does its job. However, when loading the output back into RStudio, the load command runs without errors, but no new object appears in my environment.
Any possible fixes?
f <- function(directory_input, directory_output, par1, par2){
  library(tidyverse)
  library(readxl)
  if(dir.exists(directory_output) == F) {
    dir.create(directory_output)
  }
  key <- data.frame(par = as.character(paste0(0, par1, par2)))
  paths <- key %>% mutate(
    filepath_in = file.path(directory_input, paste0(par, '.txt'), sep = ''),
    filepath_out = file.path(directory_output, paste0(par, '.RData'), sep = '')
  )
  filepath_in <- paths$filepath_in
  filepath_out <- paths$filepath_out
  DF <- filepath_in %>% map( ~ .x %>% read.delim2(., encoding = 'Latin-1', nrows = 1000))
  map2(DF, filepath_out, ~ .x %>% save(file = .y))
}
EDIT
After the comments, here is a bit more context:
I was instructed to write a function that will be part of a future package.
The function does not create a new dataframe; it only saves it to disk. I designed it this way to make it easier for the user in case they are manipulating multiple datasets.
On the other hand, it is natural to assume that the user will want to load these datasets later in the most intuitive way, with a plain load. That is why a solution that requires assign to retrieve the results would not be ideal.
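For what it's worth, a likely explanation: save() stores each object under the name it has at the call site, and inside ~ .x %>% save(file = .y) that name is the magrittr dot pronoun. load() therefore does restore an object, but names starting with a dot are hidden from ls() and from the RStudio environment pane, which would explain why nothing seems to appear. A minimal sketch of the usual saveRDS()/readRDS() alternative, which stores a single object without a name and lets the user pick one at load time (reusing DF and filepath_out from the function above; you might also switch the extension to .rds):

# inside f(): save each dataframe without baking in an object name
map2(DF, filepath_out, ~ saveRDS(.x, file = .y))
# later, in the user's session (hypothetical file name):
df_0ab <- readRDS(file.path(directory_output, '0ab.RData'))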

Related

How to import a lot of datasets based on some patterns in R

I have a lot of datasets (more than 20) to import and I want to import them all at the same time
The names of all of the datasets begin with: SearchResults, like:
SearchResults_2014_S1_2.csv
SearchResults_2014_S1.csv
SearchResults_2015_S1.csv
SearchResults_2015_S2.csv
All of the datasets have the same column names in the same order.
I want to import them all in a single line of code and then bind them together.
I've tried to put together a complete example, since the question gives very little information to work with.
First step is to have some data to load:
library(tidyverse)
c("SearchResults_2014_S1_2.csv",
  "SearchResults_2014_S1.csv",
  "SearchResults_2015_S1.csv",
  "SearchResults_2015_S2.csv") %>%
  walk(~
    iris %>%
      sample_n(replace = TRUE,
               size = nrow(iris) *
                 # sample(c(2, 3, 4), size = 1)) %>%
                 runif(1, 1, 4)) %>%
      readr::write_excel_csv2(., file = fs::path(fs::path_temp(), .x)) %>%
      print())
fs::dir_ls(fs::path_temp())
And now the temporary directory has some valid data files, but
we want to make it a bit challenging, so let us save some other files
in the same directory:
replicate(25, fs::file_temp() %>%
            write.csv(x = list()))
Then look at the files in the temporary directory once more:
fs::dir_ls(fs::path_temp())
Finally, let us read the files that have SearchResults in the name:
fs::dir_ls(
  path = tempdir(),
  glob = "*SearchResults_*.csv",
  type = "file"
) %>% {
  tibble(path = .,
         data = map(., . %>%
           read_csv2(show_col_types = FALSE)))
} -> all_files
At this point, you've got the file contents in the data column. Run
spec() on them to see if the parsing went well in all of them.
Preferably, set the col_types in the reading code above, so
you can be sure that things are being read in correctly.
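For example, the data = map(...) line in the reading pipeline above could pin the columns down explicitly; a sketch, where the column types are assumptions based on the iris-derived files written in this example:

data = map(., ~ read_csv2(.x, col_types = cols(.default = col_double(),
                                               Species = col_character())))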
Finally, unnest to collate all the loaded datasets. The path column
is also reduced to just the filename, so you can tell which file each
row came from (in case there is another piece of information encoded in there).
all_files %>%
  mutate(path = fs::path_file(path)) %>%
  unnest(data)

Iterating through values in R

I'm new-ish to R and am having some trouble iterating through values.
For context: I have data on 60 people over time, and each person has his/her own dataset in a folder (I received the data with id #s 00:59). For each person, there are 2 values I need - time of response and picture response given (a number 1 - 16). I need to convert this data from wide to long format for each person, and then eventually append all of the datasets together.
My problem is that I'm having trouble writing a loop that will do this for each person (i.e. each dataset). Here's the code I have so far:
pam[x] <- fromJSON(file = "PAM_u[x].json")
pam[x]df <- as.data.frame(pam[x])

#Creating long dataframe for times
pam[x]_long_times <- gather(
  select(pam[x]df, starts_with("resp")),
  key = "time",
  value = "resp_times"
)

#Creating long dataframe for pic_nums (affect response)
pam[x]_long_pics <- gather(
  select(pam[x]df, starts_with("pic")),
  key = "picture",
  value = "pic_num"
)

#Combining the two long dataframes so that I have one df per person
pam[x]_long_fin <- bind_cols(pam[x]_long_times, pam[x]_long_pics) %>%
  select(resp_times, pic_num) %>%
  add_column(id = [x], .before = 1)
If you replace [x] in the above code with a person's id# (e.g. 00), the code will run and will give me the dataframe I want for that person. Any advice on how to do this so I can get all 60 people done?
Thanks!
EDIT
So, using library(jsonlite) rather than library(rjson) set up the files in the format I needed without having to do all of the manipulation. Thanks all for the responses, but the solution was apparently much easier than I'd thought.
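For the record, a minimal sketch of that jsonlite route (the file name follows the pattern in the question):

library(jsonlite)
# jsonlite::fromJSON() simplifies JSON arrays into data frames by default,
# which is presumably why the extra manipulation became unnecessary
pam00_df <- fromJSON("PAM_u00.json")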
I don't know the structure of your json files. If your working directory is not the same folder as the json files, try this:
library(jsonlite)
# setup - read file names, with full paths so they can be read from anywhere
json_folder <- "U:/test/" # adjust your folder here
files <- list.files(path = json_folder, pattern = "\\.json$", full.names = TRUE)
# import data
pam <- NULL
pam_df <- NULL
for (i in seq_along(files)) {
  pam[[i]] <- fromJSON(files[i])
  pam_df[[i]] <- as.data.frame(pam[[i]])
}
Here you read all the json file names in the folder, building a vector of length 60; then you sequence along that vector and read each file.
I assume at the end you can do bind_rows, or add your own code inside the for loop. But remember to initialize the collecting objects to NULL before the loop starts, e.g. pam_long_pics <- NULL.
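For the combining step, a minimal sketch (assuming the per-person data frames share column names):

# bind all per-person data frames into one; .id records the list position as an id column
pam_all <- dplyr::bind_rows(pam_df, .id = "id")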
Hope that helped? Let me know.
Something along these lines could work:
#library("tidyverse")
#library("jsonlite")
file_list <- list.files(pattern = "*.json", full.names = TRUE)
Data_raw <- tibble(File_name = file_list) %>%
mutate(File_contents = map(File_name, fromJSON)) %>% # This should result in a nested tibble
mutate(File_contents = map(File_contents, as_tibble))
Data_raw %>%
mutate(Long_times = map(File_contents, ~ gather(key = "time", value = "resp_times", starts_with("resp"))),
Long_pics = map(File_contents, ~ gather(key = "picture", value = "pic_num", starts_with("pic")))) %>%
unnest(Long_times, Long_pics) %>%
select(File_name, resp_times, pic_num)
EDIT: you may or may not need to include as_tibble() after reading in the JSON files, depending on what your data looks like.

Sequencing along a list, reading files from folder and applying a given function

I need help modifying my code to do the following tasks. I've used help from the following questions and answers thus far:
Opening all files in a folder, and applying a function
How to assign a unique ID number to each group of identical values in a column
Here are the things I hope to be able to do with my code:
I need to read in several files from a folder.
I would like to use the name of each file in the folder to add a column. I was able to do this simply with mutate, but only for a single file.
I would like to save the result for each file separately and also combine them into a single file.
I also want to keep the code for reading the files separate from the function, so I can apply it to other projects.
I'm trying to avoid using loop statements.
Here is a sample of my incomplete code, which gives an error:
library(tidyverse)
library(readr)

cleaningdata <- function(data){
  data$Label <- gsub(".tif", "", data$Label)
  data %>% select(Label:Solidity) %>% group_by(Label) %>%
    mutate(view = seq_along(Label), Station = "T1-1") %>%
    rename(Species = Label) %>%
    mutate(view = recode(view, "1" = "a", "2" = "b", "3" = "c"))
}

filenames <- list.files("Data", pattern = "*.txt", full.names = TRUE)
ldf <- lapply(filenames, read.txt)
res <- lapply(ldf, cleaningdata)
Here is a sample of my dataset (Data Folder), and below is my work thus far.
The fs package contains the useful dir_map function, which applies a function to each file in the path. If you need more control over the files to use, you could alternatively pipe a vector of the filenames into purrr::map() instead.
Your error ("Warning message: Unreplaced values treated as NA as .x is not compatible. Please specify replacements exhaustively or supply .default") occurred because you were recoding 1, 2, 3 to a, b, c, but one of the Species had 6 rows, so 4, 5, 6 were recoded to NA. I've used letters[View] to avoid this problem.
library(tidyverse)
library(fs)

result <- dir_map(path = 'Data', fun = function(filepath) {
  read_tsv(filepath) %>%
    select(-1) %>%
    rename(Species = Label) %>%
    mutate(Species = sub('.tif$', '', Species)) %>%
    group_by(Species) %>%
    mutate(
      View = seq_along(Species),
      View = letters[View], # a, b, c, etc. instead of 1, 2, 3, etc.
      Station = sub('.txt$', '', basename(filepath))
    )
})
# get rows from second file
result[[2]]
# bind rows from all files
result %>% bind_rows()
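If you need the finer control mentioned at the top of this answer, here is a sketch of the purrr::map() alternative. read_and_clean is a hypothetical name for the anonymous function passed to dir_map() above, and dir_ls()'s glob restricts the input to .txt files:

files <- fs::dir_ls(path = 'Data', glob = '*.txt')
result <- purrr::map(files, read_and_clean)
result %>% bind_rows()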

How can I read in and bind together several data frames without using a for loop?

for() loops in R always seem to be my go-to for reading in multiple things but there must be a better way of doing what I want to do.
Let's say I have several data sets that all come from an automated data-pull system:
#Fake data set up
library(lubridate)
library(dplyr) # for %>%, mutate() and bind_rows() below

dir_path <- tempdir()
file_1 <- paste0(Sys.Date() - days(2), ".rds")
file_2 <- paste0(Sys.Date() - days(1), ".rds")
file_3 <- paste0(Sys.Date(), ".rds")

data.frame(thing = rnorm(100)) %>%
  saveRDS(file.path(dir_path, file_1))
data.frame(thing = rnorm(100)) %>%
  saveRDS(file.path(dir_path, file_2))
data.frame(thing = rnorm(100)) %>%
  saveRDS(file.path(dir_path, file_3))
I want to read each of these into my R session, do a small bit of processing to each, then stick them all into the same dataframe:
read_in_data <- function(file_name, dir){
  d <- substr(file_name, 1, 10)
  thing <-
    readRDS(file.path(dir, file_name)) %>%
    mutate(date = d)
}

files <- list.files(dir_path, pattern = "\\.rds$")

this_thing <- NULL
for(i in 1:length(files)){
  this_thing <-
    this_thing %>%
    bind_rows(read_in_data(files[i], dir_path))
}
This is great and does exactly what I want, but I have the sneaking suspicion that, as the number of files I want to read in and bind together grows, the for() loop will end up being very slow.
I could do something like
this_thing <-
read_in_data(files[1], dir_path) %>%
bind_rows(read_in_data(files[2], dir_path)) %>%
bind_rows(read_in_data(files[3], dir_path))
but this is gross and will be impossible to maintain, especially as the number of files I want to read in grows.
How can I get rid of this for loop? I know that growing things in a for() loop is a bad idea but I don't know how else to do this kind of operation. What am I missing? Probably something pretty simple.
I ended up using the purrr package:
library(purrr)

files %>%
  map(safely(read_in_data, quiet = FALSE), dir = dir_path) %>%
  transpose() %>%
  simplify_all() %>%
  .$result %>%
  bind_rows() %>%
  saveRDS(file.path("path to .rds file"))
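If you don't need the error capture that safely() provides, map_dfr() collapses the whole read-and-bind loop into one call; a sketch reusing read_in_data and dir_path from above:

# read every file and row-bind the results in one step
this_thing <- purrr::map_dfr(files, read_in_data, dir = dir_path)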

R, creating variables on the fly in a list using assign statement

I want to create variable names on the fly inside a list and assign them values in R, but I am unable to get the desired result. Here is the logic of my code:
Upon the function call dat_in <- readf(1, 2), an input file is read based on a product and a site. After reading, a particular column (the 13th, here) is assigned to a variable aot500. I want this variable returned from the function for each combination of product and site. For example, I need the variables named in the list statement, aot500.AF, aot500.CM, aot500.RB, to be returned from this function. I am having trouble with the return statement. There is no error, but there is nothing in dat_in; I expect it to have dat_in$aot500.AF etc. Please point out what is wrong in the return statement. Furthermore, I want to read files for all combinations in a single call to the function, say using a for loop, and I wonder how the return statement would handle a list of more variables.
prod <- c('inv','tot')
site <- c('AF','CM','RB')

readf <- function(pp, kk) {
  fname.dsa <- paste("../data/site_data_", prod[pp], "/daily_", site[kk], ".dat", sep = "")
  inp.aod <- read.csv(fname.dsa, skip = 4, sep = ",", stringsAsFactors = F, na.strings = "N/A")
  aot500 <- inp.aod[, 13]
  return(list(assign(paste("aot500", siteabbr[kk], sep = "."), aot500)))
}
Almost always there is no need to use assign(); we can solve the problem in two steps: read the files into a list, then give the list names.
(Not tested, as we don't have your files.)
prod <- c('inv', 'tot')
site <- c('AF', 'CM', 'RB')

# get all combinations of prod and site
prod_site <- expand.grid(prod, site)
colnames(prod_site) <- c("prod", "site")

# Step 1: read the files into a list
res <- lapply(1:nrow(prod_site), function(i){
  fname.dsa <- paste0("../data/site_data_",
                      prod_site[i, "prod"],
                      "/daily_",
                      prod_site[i, "site"],
                      ".dat")
  inp.aod <- read.csv(fname.dsa,
                      skip = 4,
                      stringsAsFactors = FALSE,
                      na.strings = "N/A")
  inp.aod[, 13]
})

# Step 2: assign names to the list
names(res) <- paste("aot500", prod_site$prod, prod_site$site, sep = ".")
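The elements can then be pulled out by name, with no assign() needed; for example:

res$aot500.inv.AF  # the 13th column of ../data/site_data_inv/daily_AF.dat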
I propose two answers, one based on dplyr and one based on base R.
You'll probably have to adapt the filename in the readAOT_500 function to your particular case.
Base R answer
#' Function that reads AOT_500 from the given product and site file
#' @param prodsite character vector containing 2 elements:
#'   the name of a product and the name of a site
readAOT_500 <- function(prodsite,
                        selectedcolumn = c("AOT_500"),
                        path = tempdir()){
  cat(path, prodsite)
  filename <- paste0(path, prodsite[1],
                     prodsite[2], ".csv")
  dtf <- read.csv(filename, stringsAsFactors = FALSE)
  dtf <- dtf[selectedcolumn]
  dtf$prod <- prodsite[1]
  dtf$site <- prodsite[2]
  return(dtf)
}
# Load one file for example
readAOT_500(c("inv", "AF"))

listofsites <- list(c("inv", "AF"),
                    c("tot", "AF"),
                    c("inv", "CM"),
                    c("tot", "CM"),
                    c("inv", "RB"),
                    c("tot", "RB"))

# Load all files in a list of data frames
prodsitedata <- lapply(listofsites, readAOT_500)

# Combine all data frames together
prodsitedata <- Reduce(rbind, prodsitedata)
dplyr answer
I use Hadley Wickham's packages to clean data.
library(dplyr)
library(tidyr)
daily_CM <- read.csv("~/downloads/daily_CM.dat",skip=4,sep=",",stringsAsFactors=F,na.strings="N/A")
# Generate all combinations of product and site.
prodsite <- expand.grid(prod = c('inv','tot'),
                        site = c('AF','CM','RB')) %>%
  # Group variables to use do() later on
  group_by(prod, site)
Create 6 fake files by sampling from the data you provided.
You can skip this section when you have real data.
I used varying sample lengths so that the number of observations
differs for each site.
prodsite$samplelength <- sample(1:495, nrow(prodsite))
prodsite %>%
  do(stuff = write.csv(sample_n(daily_CM, .$samplelength),
                       paste0(tempdir(), .$prod, .$site, ".csv")))
Read many files using dplyr::do()
prodsitedata <- prodsite %>%
  do(read.csv(paste0(tempdir(), .$prod, .$site, ".csv"),
              stringsAsFactors = FALSE))

# Select only the columns you are interested in
prodsitedata2 <- prodsitedata %>%
  select(prod, site, AOT_500)
