Ways to extract images from pdf using R - r

Is there a way to extract images from a PDF using R and save them into a folder?
There are a lot of similar questions for other programming languages, and there is apparently a way to do this in Python; I was wondering whether the same can be done in R: https://www.thepythoncode.com/article/extract-pdf-images-in-python
There is the pdftools package in R, but it does not sound like it can help much with images: it reads text and has an option for OCR. I just want to extract the images and store them in a folder.
I could use the reticulate package to call this Python method from R, but then I would not be able to loop / map over it as I would like. That's why I am asking whether anyone knows a way in R.
Thank you.

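You can try something like this. Note that pdf_convert() renders each whole page to an image file (PNG by default) in the working directory; it does not pull out the individual embedded images:
library(pdftools)
path_To_PDF <- "C:/my_pdf.pdf"
# Render every page of the PDF to its own image file
pdf_convert(path_To_PDF)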
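If the goal is the embedded images themselves rather than page renderings, one option that stays callable from R is to shell out to the pdfimages command-line tool from Poppler. This is only a sketch under assumptions: it assumes Poppler's pdfimages is installed and on the PATH (it is a separate program, not part of pdftools), and extract_pdf_images and its arguments are hypothetical names made up for this example.

```r
# Hypothetical helper: extract embedded images from one PDF into out_dir.
# Assumes Poppler's `pdfimages` command-line tool is installed and on the PATH.
extract_pdf_images <- function(pdf, out_dir) {
  stopifnot(file.exists(pdf))
  if (!nzchar(Sys.which("pdfimages"))) {
    stop("pdfimages (Poppler) was not found on the PATH")
  }
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # Output files are named <prefix>-000.<ext>, <prefix>-001.<ext>, ...
  prefix <- file.path(out_dir, tools::file_path_sans_ext(basename(pdf)))
  # -all asks pdfimages to write each image in its native format where possible
  status <- system2("pdfimages", c("-all", shQuote(pdf), shQuote(prefix)))
  if (status != 0) stop("pdfimages failed with exit status ", status)
  # Return only the files that belong to this PDF
  files <- list.files(out_dir, full.names = TRUE)
  files[startsWith(basename(files), paste0(basename(prefix), "-"))]
}
```

Because it is a plain R function, it can be mapped over many files, for example with lapply(pdf_paths, extract_pdf_images, out_dir = "images"), where pdf_paths is a character vector of PDF paths you supply.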

Related

How to determine online file size before download in R?

The question is simple and there are similar questions for other languages, but not for R, as far as I could find.
I want to download a file from R code, but before downloading, I want to print out the size and an estimate of the download time.
Is there any way to do this directly in base R, or using curl utilities?
A simple solution would be:
download_size <- function(url) as.numeric(httr::HEAD(url)$headers$`content-length`)
Which would allow
download_size("https://cran.r-project.org/doc/manuals/r-release/R-ints.pdf")
#> [1] 452557
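The question also asks for an estimate of the download time, which the size alone does not give. A rough sketch, assuming you supply your own bandwidth figure (the function name estimate_download_seconds and the 1 MB/s value are illustrative assumptions, not measurements):

```r
# Hypothetical helper: rough download time from a size in bytes and an
# assumed bandwidth in bytes per second (the bandwidth is a guess you supply).
estimate_download_seconds <- function(size_bytes, bytes_per_second) {
  stopifnot(is.numeric(size_bytes), is.numeric(bytes_per_second),
            bytes_per_second > 0)
  size_bytes / bytes_per_second
}

# Using the size reported above and an assumed 1 MB/s connection
size <- 452557
eta <- estimate_download_seconds(size, 1e6)
cat(sprintf("Size: %.1f KB, estimated time: %.2f s\n", size / 1024, eta))
```

Keep in mind that servers are not required to send a Content-Length header; in that case download_size() returns an empty numeric vector and no estimate is possible.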

(How) can I merge audio files in R using the av package?

I have a bunch of audio files, which I extracted from mp4 video files using the av package. Now I want to merge all the audio files into one long output mp3.
My question: Is there a way to merge audio files in R using the av package?
I.e. when having a vector of file paths/names such as
files <- c("file1.mp3", "file2.mp3", "file3.mp3")
I am looking for a function or concise workaround within R that could handle this, maybe similar to:
av_function_that_should_exist_already(files, output = "big_fat_file.mp3")
Note 1: I do not want to paste an ffmpeg command to the terminal. If I wanted to use the terminal or some script, I could have done that. What I would like to do, is to solve this completely within R, preferably using av. (I want to avoid implementing yet another library, and overthrowing my complete code, making it into a library mixtape, when everything else already works just fine).
Note 2: I have already checked this post: How to concatenate multiple .wav files from a list in R? In this question I am specifically asking about av, preferably not about other packages.
So, I just want to know if this is possible or not (and if maybe I'm just not seeing it). I haven't found anything in the documentation, which is mostly about converting audio and video files, not about concatenating audio or video files such as mp3 or aac.
I was thinking that this should be possible using something like:
av_audio_convert(files, output = "big_fat_file.mp3")
However, this just leads to "file1.mp3" being written to "big_fat_file.mp3" in this example, so from a vector of file names, only the first element will be processed by av_audio_convert.
Thanks for your help and ideas in advance,
Cat

"filename.rdata" file Exploring and Converting to CSV

I'm no R programmer (I started learning it because of this problem); I use Python. For a forecasting task I got a dataset, signalList.rdata, about a phenomenon called partial discharge.
I tried some commands to load, open, and view it, but hardly got a glimpse:
my_data <- get(load('C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/signalList.Rdata'))
But since I lack deep knowledge of R, I would like to convert it into a CSV file, or any format that I can deal with in Python,
or explore it and copy-paste manually.
So I'm asking for any solution, whether using R, Python, or any other tool, to get at what's in the .rdata file.
Have you managed to load the data successfully into your working environment?
If so, write.csv is the function you are looking for.
If not,
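setwd("C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/")
# load() returns the names of the restored objects, not the data itself
loaded_names <- load("signalList.Rdata")
signalList <- get(loaded_names[1])
# write.csv() expects a data frame or matrix
write.csv(signalList, "signalList.csv")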
should do the trick.
If you would like to remove signalList from your workspace afterwards,
rm(signalList)
will accomplish this.
Note: changing your working directory isn't necessary; I just feel it makes the code easier to read here. You may also specify another path for the CSV in the second argument of write.csv.
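One caveat: write.csv only works directly on a data frame or matrix, and a name like signalList suggests the object may be a list. Below is a minimal sketch of one way to handle that case, loading into a separate environment and writing one CSV per element. The toy data, the file names, and the assumption that each element is a numeric vector are all made up for illustration; the structure of the real file is unknown.

```r
# Toy stand-in for the real file: a list of numeric signals (structure assumed)
signalList <- list(sig1 = c(0.1, 0.5, 0.2), sig2 = c(1.2, 0.9))
rdata_file <- file.path(tempdir(), "signalList.Rdata")
save(signalList, file = rdata_file)
rm(signalList)

# Load into its own environment so nothing in the workspace is overwritten
env <- new.env()
obj_names <- load(rdata_file, envir = env)  # load() returns the object names
signals <- get(obj_names[1], envir = env)
str(signals, max.level = 1)                 # inspect the structure first

# Write each list element to its own CSV file
out_dir <- file.path(tempdir(), "signals_csv")
dir.create(out_dir, showWarnings = FALSE)
for (nm in names(signals)) {
  write.csv(data.frame(value = signals[[nm]]),
            file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
list.files(out_dir)
```

The resulting files can then be read from Python, for instance with pandas.read_csv.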

R tabulizer encoding or security

I have been practicing with the tabulizer package in R and have the following problem. Unfortunately I can't offer a reproducible example, as the PDF is the firm's property, but I will describe the problem in detail.
I'm trying to read a PDF that has a start and end date in the upper-right corner. When I open the PDF they look normal:
Start: 01-Mar-2018
End: 31-Mar-2018
Now the fun part. When I highlight them and use Ctrl+C to copy them, here is the result when pasted into R:
:tttt: 11-rrr-8118
tt:: 11-rrr-8118
This is exactly the same kind of nonsense that extract_text(path, pages=1) gives: a lot of t::ttttt:ttt... My question: is there some security on this PDF, do I just need to figure out the correct encoding, or, because this PDF is automatically generated by a system, is there some weird notation for everything?
I figured it out. This PDF is mainly built from metadata (I didn't know that), and a great tool in R for accessing metadata in PDFs is pdftools.
library(pdftools)
pdf_info(path.pdf)
and you can wrangle out all the important metadata bits.

Create and save R's default codebooks as a pdf

If I load data(mtcars) it comes with a very neat codebook that I can call using ?mtcars.
I'm interested to document my data in the same way and, furthermore, save that neat codebook as a pdf.
Is it possible to save the 'content' of ?mtcars and how is it created?
Thanks, Eric
P.S. I did read this thread.
update 2012-05-14 00:39:59 PDT
I am looking for a solution using only R; unfortunately I cannot rely on other software (e.g. Tex)
update 2012-05-14 09:49:05 PDT
Thank you very much everyone for the many answers.
Reading these answers I realized that I should have made my priorities much clearer. Therefore, here is a list of my priorities in regard to this question:
1. R: I am looking for a solution that is based exclusively on R.
2. Reproducibility: the codebook can be part of an automated script.
3. Readability: the text should be easy to read.
4. Searchability: a file that can be opened with any standard software and searched (this is why I thought pdf would be a good solution, but this is overruled by 1 through 3).
I am currently labeling my variables using label() from the Hmisc package and might end up writing a .txt codebook using Label() from the same package.
(I'm not completely sure what you're after, but):
Like other package documentation, the file for mtcars is an .Rd file. You can convert it into formats other than pdf (e.g. plain ASCII), but the usual way of producing a pdf does use pdflatex.
However, most information in such an .Rd file is written more or less by hand (unless you use yet another R package like roxygen/roxygen2 to help you generate parts of it automatically).
For user-data, usually Noweb is much more convenient.
.Rnw -Sweave-> .tex -pdflatex-> pdf is certainly the most usual way with such files.
However, you can also use Sweave with e.g. OpenOffice (if that is installed), or with plain ASCII files instead of TeX.
Have a look at package knitr which may be easier with pure-ASCII files. (I'm not an expert, just switching over from Sweave)
If html is an option, both Sweave and knitr can work with that.
I don't know how to get the pdf of individual data sets but you can build the pdf of the entire datasets package from the LaTeX version using:
path <- find.package('datasets')
system(paste(shQuote(file.path(R.home("bin"), "R")),"CMD",
"Rd2pdf",shQuote(path)))
I'm not sure about this, but it makes sense that you'd need some sort of LaTeX distribution such as MiKTeX. I'm also not sure how this works on other operating systems; I'm on Windows and it works for me.
PS this is only a partial answer to your question as you want to do this for your data, but if nothing else it may get the ball rolling.
The help page that is displayed when entering ?mtcars is generated from an .Rd file, which is a LaTeX-like file that is used for all of R's help pages. Although .Rd files are LaTeX-like, you don't actually need to know LaTeX to read or write them. The actual mtcars.Rd file is available here: http://commondatastorage.googleapis.com/jthetzel-public/mtcars.Rd , which can be viewed with any text editor.
.Rd files included in the ./man directory of a package are converted to .html files when installing the package. They are converted by functions in the "tools" package. If you would like functionality like ?mtcars for your datasets, you would need to create a package for them. That might sound complicated if you have never created a package before, but it is easy enough to learn and will make you a better R programmer. There are a number of examples of dataset-only packages on CRAN, for example msProstate: http://cran.r-project.org/web/packages/msProstate/index.html . Consider downloading the package source to see how it is organized.
For more information on creating your own packages, writing .Rd files, and building packages:
http://cran.r-project.org/doc/manuals/R-exts.html, especially "1.1.5 Data in packages".
Edit
And if you want to convert the .Rd file in your package to a .pdf, you can do so when building your package, but you will need a LaTeX compiler. If you are on Windows, see here: http://cran.r-project.org/bin/windows/Rtools/ .
You can't create a PDF with just R; you need to use other software that creates PDFs.
You could use a combination of utils::promptData, tools::Rd2HTML, and a simple custom function to open the created HTML file in the users' browser.
It would probably be easier to just make a package containing your data sets. Look at the "datasets" package for an example.
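A rough sketch of the Rd route without building a package or using LaTeX. It uses tools::Rd2HTML and tools::Rd2txt on a small hand-written .Rd file; promptData could generate such a skeleton for you instead, but here the file is written by hand so that the example is self-contained. The dataset name my_data and its variables are invented for illustration.

```r
# Minimal hand-written Rd codebook for a hypothetical dataset `my_data`
rd_lines <- c(
  "\\name{my_data}",
  "\\docType{data}",
  "\\alias{my_data}",
  "\\title{Example codebook}",
  "\\description{A made-up dataset used to illustrate the workflow.}",
  "\\format{A data frame with 2 variables:",
  "  \\describe{",
  "    \\item{age}{Age in years}",
  "    \\item{income}{Yearly income}",
  "  }",
  "}"
)
rd_file <- file.path(tempdir(), "my_data.Rd")
writeLines(rd_lines, rd_file)

# Convert to HTML and to plain text using only R
html_file <- file.path(tempdir(), "my_data.html")
txt_file <- file.path(tempdir(), "my_data.txt")
tools::Rd2HTML(rd_file, out = html_file)
tools::Rd2txt(rd_file, out = txt_file)

# Open the HTML version in the browser when run interactively
if (interactive()) utils::browseURL(html_file)
```

Both outputs are searchable with standard software, and the whole thing can run as part of an automated script.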
It looks like an external tool such as LaTeX is always needed if you want to generate a pdf. I would recommend using a simple ASCII text format to generate such a file. In principle the .Rd files are also ASCII text, but I do not find them particularly readable.
Instead, I would recommend using a plain text ASCII format such as Markdown (which is e.g. used on StackOverflow) to write the text file. Such a file is already much more readable than an .Rd formatted file, and as a bonus it can quite easily be processed into a PDF should you choose to do so later on. The knitr package I think is capable of generating PDF files from Markdown sources. In addition, knitr allows you to mix in R code in the Markdown text. This code can be evaluated and the results (even figures) added to the resulting PDF.
In practice you can use sprintf to generate character vectors that you can pipe to a file in order to dynamically generate the markdown text. Just write the template one time, and mark the places for the text you want to add later like this:
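base_text <- "
First header
============
This document was generated on %s, by %s.
"
# Placeholder values to fill into the template
some_date <- format(Sys.Date())
some_name <- "Your Name"
text_forfile <- sprintf(base_text, some_date, some_name)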
Just dump the text in text_forfile to a .md file and you're done, no external tools needed. See this post on SO for how to dump text to a file.
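Putting the template idea together for a codebook, here is a minimal sketch that builds a Markdown file from a data frame using only base R. The choice of three mtcars columns, the bullet layout, and the output file name are arbitrary assumptions made for illustration; the descriptions are typed in by hand.

```r
# Hand-written descriptions for a few mtcars variables (illustrative only)
descriptions <- c(mpg = "Miles/(US) gallon",
                  cyl = "Number of cylinders",
                  hp  = "Gross horsepower")

header <- sprintf("mtcars codebook\n===============\n\nGenerated on %s.\n",
                  format(Sys.Date()))
# One Markdown bullet per variable: name, storage class, description
entries <- sprintf("* %s (%s): %s",
                   names(descriptions),
                   vapply(mtcars[names(descriptions)],
                          function(x) class(x)[1], character(1)),
                   descriptions)

# Write the header and the bullets to a .md file
md_file <- file.path(tempdir(), "codebook.md")
writeLines(c(header, entries), md_file)
cat(readLines(md_file), sep = "\n")
```

The resulting .md file is plain text, so it can be opened and searched with any editor, and regenerated whenever the script runs.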

Resources