Merging/concatenating videos and keeping sound in R using the av package

I am trying to merge/concatenate multiple videos with sound sequentially into one video using only R (I don't want to work with ffmpeg on the command line, as the rest of the project doesn't require it and I would rather not bring it in at this stage).
My code looks like the following:
dir<-"C:/Users/Admin/Documents/r_programs/"
videos<-c(
paste0(dir,"video_1.mp4"),
paste0(dir,"video_2.mp4"),
paste0(dir,"video_3.mp4")
)
#encoding
av_encode_video(
videos,
output=paste0(dir,"output.mp4"),
framerate=30,
vfilter="null",
codec="libx264rgb",
audio=videos,
verbose=TRUE
)
It almost works: the output file is an mp4 containing the 3 videos played sequentially one after the other, but the only audio present is from the first of the 3 videos, and then it cuts off.
It doesn't really matter what the videos are; I have reproduced the issue with the videos I was originally using and with 3 randomly downloaded 1080p 30 fps videos from YouTube.
Any help is appreciated & thank you in advance.

The behavior you are experiencing (only 1 audio source) is exactly how the package is designed. In the C source code you can see that encode_video only takes the first audio entry and ignores the rest. Overall, audio is poorly supported by ropensci/av at the moment, as its primary focus is turning R plots into videos. Perhaps you can file a feature request issue on GitHub.
Meanwhile, why not just use the base system() function to call FFmpeg from R? Assuming the videos have identical formats, this will likely also speed up the process significantly by using the concat demuxer with stream copying (-c copy), which the av library does not support as far as I can tell. (If the formats differ, you need to use the concat filter instead; both are covered in the FFmpeg concatenation documentation.)
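For example, a minimal sketch of that approach, assuming ffmpeg is installed and on the PATH (it reuses the videos vector and dir from the question):

# write a concat list file, then stream-copy the inputs into one mp4
list_file <- tempfile(fileext = ".txt")
writeLines(sprintf("file '%s'", normalizePath(videos, winslash = "/")), list_file)
system2("ffmpeg", c(
  "-f", "concat", "-safe", "0",   # concat demuxer, allow absolute paths
  "-i", list_file,
  "-c", "copy",                   # stream copy: no re-encoding
  paste0(dir, "output.mp4")
))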

Related

(How) can I merge audio files in R using the av package?

I have a bunch of audio files, which I extracted from mp4 video files using the av package. Now I want to merge all the audio files into one long output mp3.
My question: Is there a way to merge audio files in R using the av package?
I.e. when having a vector of file paths/names such as
files <- c("file1.mp3", "file2.mp3", "file3.mp3")
I am looking for a function or concise workaround within R that could handle this, maybe similar to:
av_function_that_should_exist_already(files, output = "big_fat_file.mp3")
Note 1: I do not want to paste an ffmpeg command into the terminal. If I wanted to use the terminal or some script, I could have done that. What I would like to do is solve this completely within R, preferably using av. (I want to avoid pulling in yet another library and overhauling my code into a library mixtape when everything else already works just fine.)
Note 2: I have already checked this post: How to concatenate multiple .wav files from a list in R? I am specifically asking about av in this question, preferably not about other packages.
So, I just want to know if this is possible or not (and if maybe I'm just not seeing it). I haven't found anything in the documentation, which is mostly about converting audio and video files, not about concatenating audio or video files such as mp3 or aac.
I was thinking that this should be possible using something like:
av_audio_convert(files, output = "big_fat_file.mp3")
However, this just leads to "file1.mp3" being written to "big_fat_file.mp3" in this example, so from a vector of file names, only the first element will be processed by av_audio_convert.
Thanks for your help and ideas in advance,
Cat
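As far as the av documentation shows, there is no concatenation helper at the time of writing, so staying strictly within av may not be possible. For completeness, a hedged sketch of a non-av workaround in the spirit of the linked .wav question, using the tuneR package (an extra dependency, so admittedly not what is asked for here):

library(tuneR)

files <- c("file1.mp3", "file2.mp3", "file3.mp3")
waves <- lapply(files, readMP3)       # decode each mp3 into a Wave object
big   <- do.call(bind, waves)         # concatenate end to end; inputs must share sample rate/channels
writeWave(big, "big_fat_file.wav")    # tuneR writes wav; re-encode to mp3 separately if needed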

saving frames from webcam stream

I would like a routine that systematically extracts and saves the frames from webcam footage to a local directory on my personal computer.
Specifically, I am trying to save frames from the webcam at Old Faithful geyser in Yellowstone Natl. Park. (https://www.nps.gov/yell/customcf/geyser_webcam_updated.htm)
Ideally, I would like to:
be able to control the rate at which frames are downloaded (e.g. take 1 frame every minute)
use FFMPEG or R
Save the actual frame and not a snapshot of the webpage
Despite point 3 above, I've tried simply taking a screenshot in R using the package webshot:
library(webshot)

i <- 1
while (i <= 2) {
  webshot(
    'https://www.nps.gov/yell/customcf/geyser_webcam_updated.htm',
    delay = 60,
    file = paste(i, '.png', sep = "")
  )
  i <- i + 1
}
However, the above code gives me two images that are identical, despite the 60-second delay in the webshot() call, and both show the obvious play button in the middle. This method also feels like a bit of a hack, since it saves a snapshot of the webpage rather than the frames themselves.
I am certainly open to using more appropriate command-line tools (I am just unsure of what they are). Any help is greatly appreciated!
The page source of the URL shows, under the video tag:
<source type="application/x-mpegurl" src="//56cf3370d8dd3.streamlock.net:1935/nps/faithful.stream/playlist.m3u8">
The src identifies an HLS playlist, so you can run ffmpeg periodically to grab a single frame like this:
ffmpeg -i https://56cf3370d8dd3.streamlock.net:1935/nps/faithful.stream/playlist.m3u8 -vframes 1 out.png
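To automate the "take 1 frame every minute" requirement from within R, here is a minimal sketch, assuming ffmpeg is on the PATH and the playlist URL is still live (the frame count and file names are just illustrative):

stream <- "https://56cf3370d8dd3.streamlock.net:1935/nps/faithful.stream/playlist.m3u8"
for (i in seq_len(10)) {                                  # grab 10 frames as an example
  out <- sprintf("frame_%03d.png", i)
  system2("ffmpeg", c("-y", "-i", stream, "-vframes", "1", out))
  Sys.sleep(60)                                           # wait one minute between frames
}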

Extract Zip File with 100% Compression Ratio

I noticed this problem when trying to run the following R script.
library(downloader)

download('http://download.cms.gov/nppes/NPPES_Data_Dissemination_Feb_2016.zip',
         dest = 'dataset.zip', mode = 'wb')

npi <- read.csv(unz('dataset.zip', 'npidata_20050523-20160207.csv'),
                as.is = TRUE)
The script kept spinning for some reason so I manually downloaded the data and noticed the compression ratio was 100%.
I am not certain if StackOverflow is the best Exchange for this question, so I am open to moving it if another Exchange is suggested. The Open Data Exchange might be appropriate, but there isn't very much activity on that site.
My question is this: I work a lot with government-curated data from the Centers for Medicare and Medicaid Services (CMS). The data downloads from this site come as zip files, and occasionally they have zip ratios of 100%. This is clearly impossible, since the uncompressed size is reported as ~800 PB. (CMS notes on their site that they estimate the uncompressed size to be ~4 GB.) This has affected me on my work computer, and I have replicated the problem on a co-worker's computer as well as my own personal computer.
One example can be found here. (Click the link and then click on NPPES Data Dissemination). There are other examples I've noticed and I've emailed CMS about this. They respond that the files are large and can't be handled with Excel. I am aware of this and this isn't really the problem I'm facing.
Does any one know why this would be happening and how I can fix it?
Per cdetermans' point, how much system memory do you have available for R to do the uncompressing and the subsequent loading of the data? Looking at both the image you posted and the link to the actual data, which reads as ~560 MB compressed, it was no problem on my system (Win 10, 16 GB, Core i7, R v3.2.3) to download, uncompress, and read the uncompressed CSV into a table.
I would recommend, if nothing else works, decoupling your uncompressing and data-loading steps. You might even go as far as invoking (depending on your OS) an R system command to decompress the data, inspecting it manually, and then separately issuing piecewise read.table calls on the dataset.
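For instance, a rough sketch of that decoupled approach, assuming the CSV name inside the archive is still npidata_20050523-20160207.csv:

# step 1: download only
download.file("http://download.cms.gov/nppes/NPPES_Data_Dissemination_Feb_2016.zip",
              destfile = "dataset.zip", mode = "wb")
# step 2: decompress to disk and inspect the file manually before loading
unzip("dataset.zip", files = "npidata_20050523-20160207.csv", exdir = "nppes")
# step 3: load the uncompressed CSV separately
npi <- read.csv(file.path("nppes", "npidata_20050523-20160207.csv"), as.is = TRUE)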
Best of luck
rudycazabon

Load dicom files

I am trying to view (uncompressed) DICOM files using XTK. However, the browser does not show anything, although it seems to be loading normally.
Does it matter that the slices in the DICOM files are horizontal? In Lesson 15 at https://github.com/xtk/X#readme the slices are vertical. The DICOM files come from http://www.osirix-viewer.com/datasets/ (the BRAINX dataset).
Thanks in advance!
Everything appears to work fine with the latest XTK Edge (https://github.com/xtk/X#readme). Could you point us to a non-working data set?
You might at some point see "white noise" instead of the actual image. That is most likely because the DICOM is JPEG compressed, which XTK does not support yet. To work around it, you can decompress the DICOM with DCMTK's dcmdjpeg (http://support.dcmtk.org/docs/dcmdjpeg.html).
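For example (the file names here are hypothetical), the basic invocation takes an input and an output file:
dcmdjpeg compressed.dcm uncompressed.dcm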
Thanks
Since the OsiriX dataset is stored using the JPEG 2000 Transfer Syntax, you are pretty much required to use GDCM:
$ gdcmconv --raw osirix_dataset.dcm output_raw.dcm
See the gdcmconv man page
The DCMTK JPEG 2000 module is not free; see here.

Converting .pdf files to excel (.xls)

A friend of mine doing an internship asked me 2 hours ago if I could help him avoid manually converting 462 PDF files to .xls using free online software.
I thought of a shell script using unoconv, but I couldn't figure out how to use it properly, and I am not sure unoconv can solve this problem since it mainly converts files to PDF, not the other way around.
Conversion from PDF to any other structured format is not always possible and not generally recommended.
Having said that, this does look like a one-off job, and there are a fair few of them (462).
It's worth pursuing if you can reliably extract text from most of them and the text is reasonably structured. It's a matter of getting regular text output across a sample of the PDFs that you can reliably parse into a table structure.
There are plenty of tools around that do either direct or OCR-based text extraction; just google around.
One I like is pstotext from the ghostscript suite; the -bboxes option gives me the coordinates of each word and leaves it up to me to re-assemble the structure. Despite its name, it does work on PDF input. The downside is that it can be a bit flaky and works on some PDFs but not others.
If you get this far, you'd most likely then need to write a shell script or program to convert that output to CSV. You can either open the CSV directly in a spreadsheet or look for tools to convert it into XLS.
PS: If he hasn't already, get the intern to ask whether there's any possible way of getting at the original data that was used to create the PDFs. It will save a lot of time and effort and lead to a far more accurate result.
Update: An alternative to pstotext is the renderpdf.pl command included in the Perl CAM::PDF module. It is more robust, but only reports the (x, y) position of each piece of text, not bounding boxes.
Other responses on a linked question also suggest Tabula:
https://github.com/tabulapdf/tabula
I tried it and it works very well.
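If an R-based route is acceptable, here is a hedged sketch using the tabulizer package, an R wrapper around Tabula's extraction engine (it requires Java, and the folder name and output scheme are just illustrative):

library(tabulizer)

pdfs <- list.files("pdfs", pattern = "\\.pdf$", full.names = TRUE)
for (f in pdfs) {
  tables <- extract_tables(f)                        # one matrix per table Tabula detects
  for (i in seq_along(tables)) {
    out <- sprintf("%s_table%02d.csv",
                   tools::file_path_sans_ext(basename(f)), i)
    write.csv(tables[[i]], out, row.names = FALSE)   # CSV opens directly in Excel
  }
}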
