Vemco Acoustic Telemetry Data (.vrl files) in R

Does anyone know a good way to read .vrl files from Vemco acoustic telemetry receivers directly into R as an object? Converting .vrl files to .csv files in the program VUE before analyzing the data in R seems like a waste of time if there is a way to bring them in directly. My internet searches have not turned up anything that worked for me.

I figured out a way using glatos to convert all .vrl files to .csv files and then read the .csv files in and bind them.
glatos has to be installed from GitHub.
Convert all .vrl files to .csv files using vrl2csv(). The help page has information on finding the path for vueExePath.
library(glatos)
vrl2csv(vrl = "VRLFileInput", outDir = "VRLFilesToCSV", vueExePath = "C:/Program Files (x86)/VEMCO/VUE")
The following pulls in all .csv files in the output folder from vrl2csv() and rbinds them together. I had to add the paste0() call to create the full file path for each .csv in the list.
library(data.table)
AllDetections <- do.call(rbind,
                         lapply(paste0("VRLFilesToCSV/", list.files(path = "VRLFilesToCSV")),
                                read.csv))
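Since {data.table} is already loaded, a sketch of an equivalent (and usually faster) approach is fread() plus rbindlist(); full.names = TRUE also removes the need for paste0(). The two small demo CSVs below are made-up stand-ins for vrl2csv() output:

```r
library(data.table)

# demo: write two small CSVs standing in for vrl2csv() output
out_dir <- file.path(tempdir(), "VRLFilesToCSV")
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(Receiver = "VR2W-1", Transmitter = "A69-1601-1"),
          file.path(out_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(Receiver = "VR2W-2", Transmitter = "A69-1601-2"),
          file.path(out_dir, "two.csv"), row.names = FALSE)

# full.names = TRUE returns complete paths, so paste0() is unnecessary;
# rbindlist(lapply(..., fread)) is a drop-in for do.call(rbind, ...)
csv_files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
AllDetections <- rbindlist(lapply(csv_files, fread))
nrow(AllDetections)  # 2
```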

Related

Extract a CSV inside a zip file inside another zip online in R?

I am attempting to extract a CSV that is inside of a zip file nested in another zip file posted online.
The analysis I am doing draws on files that usually have the same name but are updated regularly. Every so often they update the format. This time they decided to put multiple versions of the data embedded in zip files inside of a larger zip file.
What have I done and tried?
I now have a list of many other files that I have downloaded and then loaded into objects. In all cases the code block looks similar to this:
temp <- tempfile()
download.file("http://fakeurl.com/data/filename.zip",temp, mode="wb")
unzip(temp, "data.csv")
db <- read.csv("data.csv", header = TRUE)
I cannot wrap my head around taking it to the next level. Because I am not downloading the inner file directly, I do not know how to manipulate it.
Ideally, I want to unzip one file into a temp file, then unzip the next file, then read the CSV into a data frame.
I thank you all for your help and will answer any questions you might have to help clarify.
Unzip the downloaded file into the current directory and then iterate through the generated files, unzipping any that is itself a zip file.
Files <- unzip(temp)
for(File in Files) if (grepl("\\.zip$", File)) unzip(File)
There are also various approaches listed here:
How do I recursively unzip nested ZIP files?
Creating zip file from folders in R
https://superuser.com/questions/1287028/how-to-explore-nested-zip-without-extract
https://github.com/ankitkaushal/nzip

Create parquet file directory from CSV file in R

I'm running into more and more situations where I need out-of-memory (OOM) approaches to data analytics in R. I am familiar with other OOM approaches, like sparklyr and DBI, but I recently came across arrow and would like to explore it more.
The problem is that the flat files I typically work with are large enough that they cannot be read into R without help. So I would ideally prefer a way to make the conversion without actually needing to read the dataset into R in the first place.
Any help you can provide would be much appreciated!
arrow::open_dataset() can work on a directory of files and query them without reading everything into memory. If you do want to rewrite the data into multiple files, potentially partitioned by one or more columns in the data, you can pass the Dataset object to write_dataset().
One (temporary) caveat: as of {arrow} 3.0.0, open_dataset() only accepts a directory, not a single file path. We plan to accept a single file path or list of discrete file paths in the next release (see issue), but for now if you need to read only a single file that is in a directory with other non-data files, you'll need to move/symlink it into a new directory and open that.
You can do it in this way:
library(arrow)
library(dplyr)
csv_file <- "obs.csv"
dest <- "obs_parquet/"
sch <- arrow::schema(checklist_id = float32(),
                     species_code = string())
csv_stream <- open_dataset(csv_file, format = "csv",
                           schema = sch, skip_rows = 1)
write_dataset(csv_stream, dest, format = "parquet",
              max_rows_per_file = 1000000L,
              hive_style = TRUE,
              existing_data_behavior = "overwrite")
In my case (a 56 GB csv file), I ran into a really weird situation with the resulting parquet tables, so double-check your parquet tables to spot any funky new rows that didn't exist in the original csv. I filed a bug report about it:
https://issues.apache.org/jira/browse/ARROW-17432
If you also experience the same issue, use the Python Arrow library to convert the csv into parquet and then load it into R. The code is also in the Jira ticket.
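Once the parquet directory exists, it can be queried lazily with open_dataset() and dplyr verbs; only collect() materializes results in memory. A sketch, using a toy dataset written to a temp folder as a stand-in for the obs_parquet/ output above:

```r
library(arrow)
library(dplyr)

# toy partitioned dataset standing in for the obs_parquet/ output above
dest <- file.path(tempdir(), "obs_parquet")
write_dataset(data.frame(checklist_id = c(1, 2, 2), n = c(5L, 3L, 1L)),
              dest, format = "parquet")

# open_dataset() only scans metadata; filter()/summarise() are pushed down
# to Arrow, and collect() materializes just the query result
open_dataset(dest) %>%
  filter(checklist_id == 2) %>%
  summarise(total = sum(n)) %>%
  collect()
```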

R Can I use .rds files for my data in a package?

I'm trying to convert some code into a package. According to the documentation, only .RData files should be in the data directory, but I'd rather use .rds files because they don't retain the object name. There are times when I save an object under a different name than I want to use when reading it in later. And I really only want one data set per file, so the ability of .RData files to store more is actually a negative.
So my question is why not allow .rds files in the package data directory? Or is there another way to solve this problem?
The only acceptable data files in /data are those saved with save(), which means they are in the .RData format. Hadley's link, which @r2evans points to, says this. As does section 1.1.6, which @rawr points to.
Old question, but you can. It is a two-step process:
save your data as an .rds file
create an R file in the data directory which loads the .rds data.
I do this as follows:
rdsFile <- paste0(schemeName, "_example.rds")
saveRDS(
  dmdScheme_example,
  file = here::here("data", rdsFile)
)
cat(
  paste0(schemeName, "_example <- readRDS(\"./", rdsFile, "\")"),
  file = here::here("data", paste0(schemeName, "_example.R"))
)
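A self-contained sketch of the same two-step pattern: schemeName and dmdScheme_example are made-up stand-ins, and a temp folder plays the role of the package's data/ directory, so the generated one-liner can be inspected:

```r
# made-up stand-ins for the package's real names and data
schemeName <- "dmdScheme"
dmdScheme_example <- data.frame(a = 1:3)
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# step 1: save the object as .rds
rdsFile <- paste0(schemeName, "_example.rds")
saveRDS(dmdScheme_example, file = file.path(data_dir, rdsFile))

# step 2: write the one-line .R loader next to it
cat(
  paste0(schemeName, "_example <- readRDS(\"./", rdsFile, "\")"),
  file = file.path(data_dir, paste0(schemeName, "_example.R"))
)

# the generated data/dmdScheme_example.R holds a single line, which the
# package build then sources to create the object under the desired name
readLines(file.path(data_dir, "dmdScheme_example.R"))
# [1] "dmdScheme_example <- readRDS(\"./dmdScheme_example.rds\")"
```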

Import multiple csv files into R from zip folder

I know that this question has been asked exhaustively on this site, but I cannot find any question which addresses my problem.
I am trying to import multiple .csv files into R which are located in nested .zip files on my PC. The other questions seem to relate to importing a single file from a URL, which is not my issue.
I have set my working directory to the folder which contains the first .zip file, but there is another one inside of it, which then contains normal folders, and finally hundreds of .csv files which I am looking to access.
Up to now I have always manually extracted the data since I have no idea where to begin with unzipping code, but considering this folder contains around 20GB of data, I'm going to need to try something else.
Any help would be appreciated!
EDIT - CODE:
setwd("C:/docs/data/241115")
temp <- tempfile()
unzip("C:/docs/data/241115/Requested.zip",exdir=temp)
l = list.files(temp)
unzip("C:/docs/data/241115/Requested/Data Requested.zip",exdir=temp)
> error 1 in extracting from zip file
Without a minimal reproducible example it's difficult to know exactly where the problem lies. My best guess is that using a tempfile() is causing problems.
I would create a folder within your working directory to unzip the files to. You can do this from within R if you like:
# Create the folder 'temp' in your wd
dir.create("temp")
Now assuming your zip file is in the working directory I would unzip the first .zip in to temp in one step:
unzip("Requested.zip", exdir = "temp")
Finally, unzip the final .zip:
unzip("temp/Data Requested.zip", exdir = "temp")
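Once both levels are extracted into temp, list.files() with recursive = TRUE finds the CSVs inside the nested folders and lapply() reads them in. A sketch with made-up demo folders and files standing in for the extracted data:

```r
# demo folders/files standing in for the extracted zip contents
old <- setwd(tempdir())
dir.create("temp/sub1", recursive = TRUE, showWarnings = FALSE)
dir.create("temp/sub2", recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1), "temp/sub1/a.csv", row.names = FALSE)
write.csv(data.frame(x = 2), "temp/sub2/b.csv", row.names = FALSE)

# recursive = TRUE descends into the nested folders; full.names = TRUE
# returns ready-to-use paths for read.csv()
csvs <- list.files("temp", pattern = "\\.csv$",
                   recursive = TRUE, full.names = TRUE)
all_data <- do.call(rbind, lapply(csvs, read.csv))
setwd(old)
nrow(all_data)  # 2
```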

Reading data from zip files located in zip files with R

I'd like to use R to extract data from zip files located in zip files (i.e. perform some ZIP file inception).
An example "directory" of one of my datapoints looks like this:
C:\ZipMother.zip\ZipChild.zip\index.txt
My goal is to read in the "index.txt" from each ZipChild.zip. The issue is that I have 324 ZipMother.zip files with an average of 2000 ZipChild.zip files, therefore unzipping the ZipMother.zip files is a concern due to memory constraints (the ZipMother files are about 600 megabytes on average).
With the unzip() function, I can successfully get the file paths of each ZipChild located in the ZipMother, but I cannot use it to list the files located in the ZipChild archives.
Therefore,
unzip("./ZipMother.zip",list=TRUE)
works just fine, but...
unzip("./ZipMother.zip/ZipChild.zip",list=TRUE)
gives me the following error
Error in unzip("./ZipMother.zip/ZipChild.zip", list = TRUE) :
  zip file './ZipMother.zip/ZipChild.zip' cannot be opened
Is there any way to use unzip or another method to extract the data from the ZipChild files?
Once I get this to work, I plan on using the ldply function to compile the index.txt files into a dataset.
Any input is very much appreciated. Thank you!
A reproducible example (i.e. a link to a zip file with the appropriate structure) would be useful, but how about:
tmpd <- tempdir()
## extract just the child
unzip("./ZipMother.zip",
      files = "zipChild.zip", exdir = tmpd)
ff <- file.path(tmpd, "zipChild.zip")
index <- unzip(ff, list = TRUE)
unlink(ff)
This could obviously be packaged into a function for convenience.
It could be slow, but it means you never have to unpack more than one child at a time ...
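One way the steps above might be packaged up, assuming each index.txt has a header row: base R's unz() can read a named file straight out of a zip as a connection, so nothing beyond the child zip itself is ever unpacked.

```r
# sketch: extract one child zip, read its index.txt via unz(), clean up
read_child_index <- function(mother, child) {
  tmpd <- tempdir()
  unzip(mother, files = child, exdir = tmpd)   # pull out just this child
  ff <- file.path(tmpd, child)
  on.exit(unlink(ff))                          # remove the child afterwards
  # unz() reads index.txt directly from inside the child zip
  read.table(unz(ff, "index.txt"), header = TRUE)
}

# e.g., loop over every child listed in one mother archive:
# children <- grep("\\.zip$", unzip("ZipMother.zip", list = TRUE)$Name,
#                  value = TRUE)
# all_idx <- do.call(rbind,
#                    lapply(children, read_child_index,
#                           mother = "ZipMother.zip"))
```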
