Thank you all in advance for your help. I am faced with the task of importing multiple Excel files into R. The files may have a .csv or an .xlsx extension, and they all share the same name format: the word "bookings" followed by a date in YYYY_MM_DD format, like this:
bookings_2016_05_23
bookings_2016_12_06
After reading these files I have to write them out as .xlsx files to a specific directory, replacing each name with the last two characters of the original file name. To illustrate: if I read a file named bookings_2016_05_23, I have to parse it correctly whether it is .xlsx or .csv, and then save (write) it as 23.xlsx, and do that for every file. I am trying to accomplish this with the following code:
library(fs)
library(readxl)  # excel_sheets(), read_excel()
library(purrr)   # set_names(), map_df()
library(dplyr)   # mutate(), %>%

path <- fs::dir_ls(choose.dir())
read_all_files <- function(path){
  path %>%
    excel_sheets() %>%
    set_names() %>%
    map_df(~ read_excel(path, sheet = .x, col_types = "text") %>%
             mutate(file = path, sheet = .x))
}
This allows me to import all the files and their respective sheets, but it only works if every file is .xlsx. To change the name of each file I am trying to use a regex within set_names(), with no luck. I would also be very grateful if you could show me, or point me to a reference on, how to save all the files separately as .xlsx in a working directory using a for loop. Thank you all so much, I truly do appreciate it.
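One possible shape for the whole workflow is sketched below. It is only a sketch under assumptions: the files sit together in one folder, `in_dir` and `out_dir` are hypothetical placeholder paths, and the writexl package (rather than some other writer) is acceptable for producing the .xlsx output.

```r
library(readxl)   # read_excel()
library(writexl)  # write_xlsx()

in_dir  <- "input_folder"    # hypothetical paths: adjust to your directories
out_dir <- "output_folder"

files <- list.files(in_dir,
                    pattern = "^bookings_\\d{4}_\\d{2}_\\d{2}\\.(csv|xlsx)$",
                    full.names = TRUE)

for (f in files) {
  # Dispatch on the extension so both formats are read correctly
  dat <- if (grepl("\\.csv$", f)) {
    read.csv(f, colClasses = "character")
  } else {
    read_excel(f, col_types = "text")
  }
  # The day is the last two characters before the extension, e.g. "23"
  day <- sub("^.*_(\\d{2})\\.(csv|xlsx)$", "\\1", basename(f))
  write_xlsx(dat, file.path(out_dir, paste0(day, ".xlsx")))
}
```

The regex captures the two digits immediately before the extension, so bookings_2016_05_23.csv becomes 23.xlsx.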
I am attempting to extract a CSV that is inside of a zip file nested in another zip file posted online.
The analysis I am doing draws on files that usually have the same name but are updated regularly. Every so often they update format. This time they decided to put multiple versions of the data embedded in zip files inside of a larger zip file.
What have I done and tried?
I now have a list of many other files that I have downloaded and then loaded into objects. In all cases the code block looks similar to this:
temp <- tempfile()
download.file("http://fakeurl.com/data/filename.zip", temp, mode = "wb")
unzip(temp, "data.csv")
db <- read.csv("data.csv", header = TRUE)
I cannot wrap my head around taking this to the next level. Because I am not downloading the inner file directly, I do not know how to manipulate it.
Ideally, I want to unzip the outer file into a temp location, then unzip the nested file, and finally read the csv into a data frame.
I thank you all for your help and will answer any questions you might have to help clarify.
Unzip the downloaded file into the current directory and then iterate through the generated files, unzipping any that is itself a zip file.
Files <- unzip(temp)
for(File in Files) if (grepl("\\.zip$", File)) unzip(File)
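Putting the pieces together with the questioner's download step, the full workflow might look like the sketch below. The URL is the placeholder from the question, and the scratch-directory layout is just one possible choice.

```r
# Hypothetical end-to-end sketch: download the outer zip, extract it,
# unzip any nested zips, then read every extracted csv.
temp <- tempfile(fileext = ".zip")
download.file("http://fakeurl.com/data/filename.zip", temp, mode = "wb")

exdir <- tempfile()          # scratch directory for the extracted files
dir.create(exdir)

files <- unzip(temp, exdir = exdir)
for (f in files) {
  if (grepl("\\.zip$", f)) unzip(f, exdir = exdir)
}

# Read every csv that ended up in the scratch directory
csvs <- list.files(exdir, pattern = "\\.csv$",
                   recursive = TRUE, full.names = TRUE)
dfs  <- lapply(csvs, read.csv)
```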
There are also various approaches listed here:
How do I recursively unzip nested ZIP files?
Creating zip file from folders in R
https://superuser.com/questions/1287028/how-to-explore-nested-zip-without-extract
https://github.com/ankitkaushal/nzip
I am trying to load multiple csv files that are each saved in different folders within my working directory in R. However I only know part of each of the file name.
For example the file in "folder1" will be named "xxx_xxx_folder1.csv", and the file in "folder2" is "xxx_xxx_folder2.csv" etc. There is only one csv in each folder.
I was wondering is there a way to load files saved in different folders with only a partial file name?
The only way I have got it to work so far, partially, is to have all the files in one folder.
Thanks and sorry if any of this is unclear!
From your description, you could use list.files with the option recursive = TRUE to get a list of your csv files, then loop over that list to read them:
fn <- list.files(PATH_TO_WORKING_DIRECTORY, "\\.csv$", recursive = TRUE, full.names = TRUE)
lapply(fn, read.csv)
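If you want each data frame keyed by the folder it came from, one possibility (assuming, as in your example, that each csv sits directly inside its folder) is to name the list elements from the last path component:

```r
fn <- list.files("PATH_TO_WORKING_DIRECTORY", "\\.csv$",
                 recursive = TRUE, full.names = TRUE)
dfs <- lapply(fn, read.csv)
# Name each element after its containing folder, e.g. "folder1", "folder2"
names(dfs) <- basename(dirname(fn))
```

You can then access a file's data as dfs[["folder1"]] without knowing the full file name.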
Does anyone know a good way to read .vrl files from Vemco acoustic telemetry receivers directly into R as an object? Converting .vrl files to .csv files in the program VUE before analyzing the data in R seems like a waste of time if there is a way to bring them in directly. My internet searches have not turned up anything that worked for me.
I figured out a way using glatos to convert all the .vrl files to .csv, then read the .csv files in and bind them.
glatos has to be installed from GitHub.
Convert all .vrl files to .csv files using vrl2csv. The help page has info on finding the path for vueExePath.
library(glatos)
vrl2csv(vrl = "VRLFileInput",outDir = "VRLFilesToCSV", vueExePath = "C:/Program Files (x86)/VEMCO/VUE")
The following pulls in all the .csv files in the output folder from vrl2csv and rbinds them together. I had to add the paste0 function to create the full file path for each .csv in the list.
library(data.table)
AllDetections <- do.call(rbind,
  lapply(paste0("VRLFilesToCSV/", list.files(path = "VRLFilesToCSV")),
         read.csv))
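As an aside, list.files() can build the full paths itself via full.names = TRUE, which makes the paste0 step unnecessary. An equivalent sketch:

```r
# Equivalent to the paste0() version: full.names = TRUE returns full paths
csvs <- list.files("VRLFilesToCSV", pattern = "\\.csv$", full.names = TRUE)
AllDetections <- do.call(rbind, lapply(csvs, read.csv))
```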
I have a folder of .txt files totalling 52.6 GB. The .txt files are located in various subfolders, and each subfolder has a unique label ("F", "G", etc.) and contains many .txt files. I need to combine all the .txt files under each unique label ("F", "G") into one single file. I tried to use vaex, but I could not find a way to do this for .txt files. Can anyone please help me out?
Provided the text files contain csv-formatted data with the same structure across files, you could use:
df = vaex.open_many([fpath1, fpath2, ..., fpathX])
To fetch all the filenames and their paths, you can conveniently use pathlib to recursively glob the file paths:
from pathlib import Path
txt_files = Path('your_label_folder_path').rglob('*.txt')
# since this returns a generator and vaex.open_many expects a list
# and while we're here, resolve the absolute path as well
txt_files = [txt.absolute() for txt in txt_files]
df = vaex.open_many(txt_files)
I know that this question has been asked exhaustively on this site, but I cannot find any question which addresses my problem.
I am trying to import multiple .csv files into R which are located in nested .zip files on my PC. The other questions seem to relate to importing a single file from a URL, which is not my issue.
I have set my working directory to the folder which contains the first .zip file, but there is another one inside of it, which then contains normal folders, and finally hundreds of .csv files which I am looking to access.
Up to now I have always extracted the data manually, since I have no idea where to begin with unzipping code, but considering this folder contains around 20 GB of data, I'm going to need to try something else.
Any help would be appreciated!
EDIT - CODE:
setwd("C:/docs/data/241115")
temp <- tempfile()
unzip("C:/docs/data/241115/Requested.zip",exdir=temp)
l = list.files(temp)
unzip("C:/docs/data/241115/Requested/Data Requested.zip",exdir=temp)
> error 1 in extracting from zip file
Without a minimal reproducible example it's difficult to know exactly where the problem lies. My best guess is that using a tempfile() is causing problems.
I would create a folder within your working directory to unzip the files to. You can do this from within R if you like:
# Create the folder 'temp' in your wd
dir.create("temp")
Now, assuming your zip file is in the working directory, I would unzip the first .zip into temp in one step:
unzip("Requested.zip", exdir = "temp")
Finally, unzip the final .zip:
unzip("temp/Data Requested.zip", exdir = "temp")
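Once everything is extracted, the hundreds of .csv files can be picked up recursively in one pass. A sketch, assuming you want them in a single list of data frames:

```r
# List every csv under temp, including those inside the normal subfolders,
# and read each one into a data frame
csv_files <- list.files("temp", pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)
all_data <- lapply(csv_files, read.csv)
```

With 20 GB of data you may want to read the files in batches, or use a faster reader such as data.table::fread in place of read.csv.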