Convert multiple AVI files to JPEG in R

I am trying to convert multiple (250 or so) .avi video files into .jpeg files with R.
I have managed to convert single .avi files using the av_video_images() function from the av package, but I would love to know how to iterate this over multiple input files.
av::av_video_images("FILE001.AVI", destdir = "Site_1_JPEG", format = "jpg", fps = 1)
I have the 250 .avi files in a folder and would like all frames produced in the output folder Site_1_JPEG.

This is not a complete solution, since I cannot reproduce your issue, but I think it will get you closer. First, your example suggests that the files you want to process are in your current working directory. Second, your code will not produce the desired results, because av_video_images names the extracted files "image_000001.jpg", "image_000002.jpg", "image_000003.jpg", and so on, and I see no way to alter those names. That means each call will overwrite the files from the previous one, and at the end you will only have the final set of JPEGs. To prevent that you need to create a separate folder for each video file. Here is one solution:
library(av)
path  <- "path/to/avi/files/"   # placeholder: the folder that holds the .avi files
flist <- list.files(path, pattern = "\\.avi$", ignore.case = TRUE)
sapply(flist[1:3], function(x) av_video_images(paste0(path, x), destdir = x, fps = 0.5))
To test the code I specified that only the first 3 files should be processed. There are two differences between my code and yours. First, my video files are located in a different directory (path), so I pasted the path onto the file name. Second, I provided a different destination directory for each file, namely the file name itself. This produced three folders with jpg files in each.
If you get an error message, it could indicate that one or more of the .avi files is corrupt. You can get the file information for all of the files with
file.info(flist)
The main thing to look at is the size column, to make sure each file is large enough.
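If the first three files process cleanly, the same idea extends to all 250. Below is a rough sketch of the full loop (the input folder name Site_1_AVI is an assumption; each video gets its own subfolder under Site_1_JPEG so the numbered frame names cannot collide):

library(av)

in_dir  <- "Site_1_AVI"    # assumption: folder holding the 250 .avi files
out_dir <- "Site_1_JPEG"
dir.create(out_dir, showWarnings = FALSE)

flist <- list.files(in_dir, pattern = "\\.avi$", ignore.case = TRUE)

for (f in flist) {
  # one subfolder per video, named after the file, to avoid overwriting
  dest <- file.path(out_dir, tools::file_path_sans_ext(f))
  dir.create(dest, showWarnings = FALSE)
  av_video_images(file.path(in_dir, f), destdir = dest, format = "jpg", fps = 1)
}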

Related

Is there a way to reference files in a folder within the working directory in R?

I have already finished my R Markdown document and I'm trying to clean up the workspace a little. This isn't strictly necessary, more of an organizational practice (and I'm not even sure it's good practice), so that I can keep the data separate from the scripts and other R- and git-related files.
I have a bunch of .csv files for the data that I used. Previously they were in (for example)
C:/Users/Documents/Project
which is what I set as my working directory. But now I want them in
C:/Users/Documents/Project/Data
The problem is that this breaks the following code, because the files are no longer in the working directory.
#create one big dataframe by unioning all the data
bigfile <- vroom(list.files(pattern = "*.csv"))
I've tried pointing list.files() at the full path where the CSVs are, but no luck.
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv"))
Error: 'data1.csv' does not exist in current working directory ('C:/Users/Documents/Project').
Is there a way to only access the /Data folder once for creating my dataframe with vroom() instead of changing the working directory multiple times?
You can list files including those in all subdirectories (Data in particular) using list.files(pattern = "*.csv", recursive = TRUE)
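For example, feeding that straight into vroom (note that pattern is a regular expression, so "\\.csv$" is the stricter form):

library(vroom)
# relative paths such as "Data/data1.csv" resolve against the working directory
bigfile <- vroom(list.files(pattern = "\\.csv$", recursive = TRUE))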
Best practices
Have one directory of raw and only raw data (the stuff you measured)
Have another directory of external data (e.g. reference databases). This is something you can remove afterwards and re-download if required.
Have another directory for the source code
Put only the source code directory under version control, plus one other file containing checksums of the raw and external data to prove integrity (a small checksum sketch follows this list)
Every other thing must be reproducible using the raw data and the source code. This can be removed after the project. Maybe you want to keep small result files (e.g. tables) which take a long time to reproduce.
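For the checksum file mentioned above, a minimal sketch using base R's tools::md5sum() (the raw/ and external/ directory names are placeholders):

# write an MD5 checksum line ("hash path") for everything under raw/ and external/
files <- list.files(c("raw", "external"), recursive = TRUE, full.names = TRUE)
writeLines(paste(tools::md5sum(files), files), "checksums.md5")

# later: verify that nothing has changed
stopifnot(identical(unname(tools::md5sum(files)),
                    sub(" .*$", "", readLines("checksums.md5"))))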
You can list the files and capture the full file path of each one, right?
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv", full.names = TRUE))
and that should read the files in that directory without reference to your working directory.
Try one of these:
# list all csv files within Data within current directory
Sys.glob("Data/*.csv")
# list all csv files within immediate subdirectories of current directory
Sys.glob("*/*.csv")
If you only have csv files then these would also work, but they seem less desirable; they might be useful, though, if you quickly want to review what files and directories are there. (I would be very careful not to use the second one inside statements that delete files: if you are not in the directory you think you are in, you can wind up deleting files you did not intend to delete. The first one might too, but it is a bit safer, since it would only delete the wrong files if the directory you are in happens to have a Data subdirectory.)
# list all files & directories within Data within current directory
Sys.glob("Data/*")
# list all files & directories within immediate subdirectories of current directory
Sys.glob("*/*")
If the subfolder always has the same name (or the same number of characters), you should be able to do it with substring. In your example, "Data" has 4 characters (5 with the /), so the following should do:
Repository <- substring(getwd(), 1, nchar(getwd())-5)

Is there an R function that can comb through a .csv file and move selected files listed on that spreadsheet to a new folder?

I have used the package camtrapR to rename thousands of trail camera photos. The output .csv file has a column with the file path to the renamed photos and the new names they were given by camtrapR. Following the use of camtrapR one of my team members has added a new column to the .csv file for the type of species in the photo. They then went through all of the photos and put a value in that column based on what they saw in the picture (example: squirrel).
I would like to use the package MLWIC to train a model with the photos that have already been characterized by my team member. My goal is to save all of the characterized pictures in folders specific to each photographed species. I've started by going through the .csv file, finding the files that were characterized as squirrels, and then going through and moving each of those to my new squirrel folder. Then doing the same with foxes, etc. This is very time intensive, and I know there must be an R script that can expedite this process. I'm looking for something that will allow me to specify "squirrel" and then R will find all instances of "squirrel" in the species column of the .csv file, and then follow the file path on that same line of the spreadsheet to find the photo, and move it to a new designated folder.
Based on some research online I have found that file.copy can be used to create a new folder, copy, and move photos into it from the original location. The problem with this is that it will move all photos from the original folder to the new folder.
cams <- read.csv("siteAcameras.csv", header = TRUE)
dir.create("squirrel")
photos <- list.files(pattern = "*.jpg")
file.copy(photos,
          to = "squirrel", recursive = TRUE,
          overwrite = TRUE, copy.mode = TRUE, copy.date = FALSE)
I expect that there is an R script that can comb a .csv file and use file.copy to move only the files from selected lines of the .csv, based on a column value. Searching the internet has so far proven fruitless.
1: The .csv file
2: The manual file-moving process
I'm sorry, I just thought of the answer myself. The MLWIC package uses a .csv called data_info.csv that specifies the file names of all photos being run through in one column and a numerical species identifier in the second column. I can simply cut and paste the .csv file I'm working with into that and it should work fine.
This is why we think before we type.
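For anyone landing here with the same problem, the subset-then-copy approach the question describes is only a few lines with file.copy(); a minimal sketch, assuming the .csv has columns named Species and FilePath (both names are guesses about the layout):

cams <- read.csv("siteAcameras.csv", header = TRUE, stringsAsFactors = FALSE)

# copy every photo of one species into a folder named after that species;
# "Species" and "FilePath" are assumed column names
copy_species <- function(species, df) {
  dir.create(species, showWarnings = FALSE)
  file.copy(df$FilePath[df$Species == species], to = species, overwrite = TRUE)
}

copy_species("squirrel", cams)

# or all species at once
invisible(lapply(unique(cams$Species), copy_species, df = cams))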

Reading data from zip files located in zip files with R

I'd like to use R to extract data from zip files located inside other zip files (i.e. perform some ZIP file inception).
An example "directory" of one of my datapoints looks like this:
C:\ZipMother.zip\ZipChild.zip\index.txt
My goal is to read in the "index.txt" from each ZipChild.zip. The issue is that I have 324 ZipMother.zip files with an average of 2000 ZipChild.zip files, therefore unzipping the ZipMother.zip files is a concern due to memory constraints (the ZipMother files are about 600 megabytes on average).
With the unzip() function, I can successfully get the file paths of each ZipChild located in the ZipMother, but I cannot use it to list the files located inside the ZipChild archives.
Therefore,
unzip("./ZipMother.zip",list=TRUE)
works just fine, but...
unzip("./ZipMother.zip/ZipChild.zip",list=TRUE)
gives me the following error
Error in unzip("./ZipMother.zip/ZipChild.zip", list = TRUE) :
zip file './ZipMother.zip/ZipChild.zip' cannot be opened
Is there any way to use unzip or another method to extract the data from the ZipChild files?
Once I get this to work, I plan on using the ldply function to compile the index.txt files into a dataset.
Any input is very much appreciated. Thank you!
A reproducible example (i.e. a link to a zip file with the appropriate structure) would be useful, but how about:
tmpd <- tempdir()
## extract just the child zip from the mother zip
unzip("./ZipMother.zip",
      files = "ZipChild.zip", exdir = tmpd)
ff <- file.path(tmpd, "ZipChild.zip")
## list the contents of the child zip
index <- unzip(ff, list = TRUE)
unlink(ff)
This could obviously be packaged into a function for convenience.
It could be slow, but it means you never have to unpack more than one child at a time ...
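As a rough illustration of what that packaged-up function might look like (the index.txt read options and the object names are assumptions, not from the original post):

library(plyr)   # for ldply, as mentioned in the question

read_child_indexes <- function(mother) {
  tmpd <- tempdir()
  children <- unzip(mother, list = TRUE)$Name
  children <- children[grepl("\\.zip$", children, ignore.case = TRUE)]
  ldply(children, function(child) {
    # extract one child at a time, read its index.txt, then discard it
    unzip(mother, files = child, exdir = tmpd)
    ff  <- file.path(tmpd, child)
    idx <- read.table(unz(ff, "index.txt"), header = FALSE, stringsAsFactors = FALSE)
    unlink(ff)
    idx
  })
}

mothers   <- list.files(pattern = "\\.zip$")   # the 324 ZipMother files
all_index <- ldply(mothers, read_child_indexes)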

Unix: merging multiple file types in multiple folders into one PDF

I have a parent folder with around 30 subfolders which each contain pdfs,.doc,.docx, and .jpg files. I need to combine all files into one large pdf. I want the order in which the files are appended into the 'master pdf' to reflect my current folder and file order (which is alphabetic for the subfolder names and numeric for the files within each subfolder).
I am fairly new to Unix and am a bit stuck on this. I would be most grateful for any advice you may have on how to approach this problem. Thank you.
There are three problems here:
Traverse the directory tree to find all documents
Convert each file into PDF
Merge the PDFs
For the first part you could use the find command to get the list of files, or script the directory traversal.
For the second part you could use the OpenOffice/LibreOffice command-line driver to convert the .doc and .docx files, and Ghostscript to convert the .jpg files.
For the third part, probably Ghostscript again.
Alternatively, there are good PDF APIs available for some programming languages, such as iText (by Bruno Lowagie) for Java.
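Since the surrounding questions are R-based, here is a hedged sketch of those three steps driven from R, using system2() to call LibreOffice and the magick and qpdf packages in place of the Ghostscript steps; the folder names, tool availability, and the assumption that file basenames are unique across subfolders are all hypothetical:

library(magick)
library(qpdf)

files  <- sort(list.files("parent_folder", recursive = TRUE, full.names = TRUE))
outdir <- "converted"
dir.create(outdir, showWarnings = FALSE)

pdfs <- vapply(files, function(f) {
  ext <- tolower(tools::file_ext(f))
  out <- file.path(outdir, paste0(tools::file_path_sans_ext(basename(f)), ".pdf"))
  if (ext == "pdf") {
    file.copy(f, out, overwrite = TRUE)
  } else if (ext %in% c("doc", "docx")) {
    # requires LibreOffice's soffice binary on the PATH
    system2("soffice", c("--headless", "--convert-to", "pdf", "--outdir", outdir, f))
  } else if (ext == "jpg") {
    image_write(image_read(f), out, format = "pdf")
  }
  out
}, character(1))

qpdf::pdf_combine(pdfs, output = "master.pdf")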

Merging EBCDIC converted files and pdf files into a single file and pushing to mainframes

I have two PDF files and two text files that have been converted into EBCDIC format. The two text files act as cover files for the PDF files, containing details like the PDF name, number of pages, etc. in a fixed format.
Cover1.det, Firstpdf.pdf, Cover2.det, Secondpdf.pdf
Format of the cover file could be:
Firstpdf.pdf|22|03/31/2012
that is
pdfname|page num|date generated
which is then converted into EBCDIC format.
I want to merge all these files in a single file in the order first text file, first pdf file, second text file, second pdf file.
The idea is to then push this single merged file into mainframes using scp.
1) How do I merge the above-mentioned four files into a single file?
2) Do I need to convert the PDF files into EBCDIC format as well? If yes, how?
3) As far as I know, mainframe files also need record-length details during transit. How do I find out the record length of the file, if I succeed in merging them into a single file?
I remember reading somewhere that this could be done using put and append in FTP. However, since I have to use scp, I am not sure how to achieve this merging.
Thanks for reading.
1) Why not use something like PKZIP?
2) I don't think converting the PDF files to EBCDIC is necessary, or even possible. The files need to be transferred in binary mode.
3) Using PKZIP and scp you will not need the record length.
File merging can easily be achieved with the Unix cat command and the > and >> (append) redirection operators.
Also, if the next file should start on a new line (as was my case), a blank echo can be inserted between files.
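The same byte-for-byte concatenation can also be sketched from R with file.append(), for anyone doing this from an R session (the output file name is made up):

parts  <- c("Cover1.det", "Firstpdf.pdf", "Cover2.det", "Secondpdf.pdf")
merged <- "merged.bin"
file.create(merged)
file.append(merged, parts)   # appends each part to merged.bin in order, like cat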
