I have downloaded pdftotext on my Mac and wrote the following code to convert PDF files to text:
pdf_to_load =("~/my_directory/my.pdf")
system(paste('pdftotext', pdf_to_load))
The code runs without errors, but I cannot see my.txt in the source directory, nor has it been saved anywhere else in my folders. Where did I go wrong?
One of my mentors was able to run the same code on his computer and could see the converted .txt file.
Kindly guide.
You get a wrong result if the default PDF extraction engine is not found on your computer; see ?tm::readPDF. Those engines are not part of R or of the tm package, so whether the necessary programs are already installed depends on your computer.
The easiest solution is to install the programs pdftotext and pdfinfo (you'll need both), which you can obtain as precompiled binaries here.
Once these programs are correctly installed, you should be able to extract the text of the PDF file without a system call, by using the readPDF() function of the tm package:
library(tm)
my_pdf_txt <- readPDF(control=list(text="-layout"))(elem=list(uri="~/my_directory/my.pdf"), language="en")
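If you would rather keep the system call, it may help to pass the output file explicitly and check that it was actually written. The following is only a sketch: pdf_to_text() is a hypothetical helper name, and it assumes pdftotext is on the PATH that R sees (which can differ from the PATH in your terminal).

```r
# Hypothetical helper: run pdftotext with an explicit output file and
# fail loudly if the program is missing or the text file was not created.
pdf_to_text <- function(pdf, txt = sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)) {
  pdf <- path.expand(pdf)   # expand "~" ourselves instead of relying on the shell
  txt <- path.expand(txt)
  exe <- Sys.which("pdftotext")
  if (!nzchar(exe)) {
    stop("pdftotext was not found on the PATH used by R")
  }
  # pdftotext usage: pdftotext [options] PDF-file [text-file]
  status <- system2(exe, c("-layout", shQuote(pdf), shQuote(txt)))
  if (status != 0 || !file.exists(txt)) {
    stop("pdftotext did not produce ", txt)
  }
  txt
}

# Usage (path taken from the question):
# txt_file <- pdf_to_text("~/my_directory/my.pdf")
# my_text  <- readLines(txt_file)
```

Returning the path of the text file makes it obvious where the output landed.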
I have a script I was using a few months ago, and I was asked to re-output it using new source data. Now I get the message "Error: openxlsx can only read .xlsx or .xlsm files."
Of course, it IS an .xlsx file.
Not only that, I went back and ran the same script on the old source file (which worked), and...I get the same error!! I haven't changed any code, but my version of R has been updated by the administrators from 3.6 to 4.1.3 (I work in a virtual environment). I have confirmed that openxlsx version 4.2.5 is installed.
I've seen in other posts that people recommend using other packages to read xlsx files. That is not an ideal option here for administrative reasons (getting permission to install new packages can be very time-consuming and may blow deadlines). I've started pursuing that option anyway, but in the meantime, does anyone have any ideas?
Unfortunately, changing the format (i.e. exporting as csv and using read.csv) is also not an option, because we're auditors and doing that will break the audit trail.
OK, a colleague of mine solved the problem.
The source file has the extension ".XLSX". In order for openxlsx to read the file, you have to change the extension IN THE CODE to ".xlsx". In my case I didn't even have to change the actual extension of the file, just the quoted reference in the code, although other colleagues say they have had to rename the file itself.
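If changing the quoted name is not enough on your system (for example on a case-sensitive file system), one option that leaves the original file untouched is to read a temporary copy with a lower-case extension. This is only a sketch: lowercase_ext_copy() is a hypothetical helper, and "SOURCE.XLSX" is a placeholder name.

```r
# Hypothetical helper: copy a file to a temporary path whose extension is
# lower-case, so extension checks like the one in openxlsx accept it.
lowercase_ext_copy <- function(path) {
  stopifnot(file.exists(path))
  ext <- tolower(tools::file_ext(path))          # "XLSX" -> "xlsx"
  tmp <- tempfile(fileext = paste0(".", ext))
  if (!file.copy(path, tmp, overwrite = TRUE)) {
    stop("could not copy ", path)
  }
  tmp
}

# Usage (placeholder file name; needs the openxlsx package):
# dat <- openxlsx::read.xlsx(lowercase_ext_copy("SOURCE.XLSX"))
```

The source file is never renamed or modified, which may matter if you need to preserve an audit trail.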
I would like to be able to open up an .Rd documentation file and preview it in R.
For example, I can create a data documentation file using promptData:
df <- data.frame(var1=1:5,var2=6:10)
promptData(df,filename = "df_doc.Rd")
which will produce a documentation file "df_doc.Rd" in the working directory.
In order to preview this file, I can open it up in the RStudio editor and then hit "Preview", which will open up df_doc properly formatted in the Help window. However, I'd like to be able to do that with code rather than having to open up the file and hit the Preview button in the RStudio GUI. Something like a preview("df_doc.Rd") function.
I'm aware that there are ways to 'install' the documentation files so R knows where to find them. But I'm writing some code that will generate these files automatically and preview them (hopefully without having to load in the dev tools that install the documentation files), so I'm specifically hoping to be able to preview these directly from file. Is that possible?
Man, the documentation for this one was pretty well hidden! To be fair, "Rd" isn't exactly Googleable, nor is documentation about documentation. But I managed to scrounge it up.
What I've been looking for is the
previewRd('df_doc.Rd')
command in the rstudioapi library. Unfortunately, this only works in RStudio, so if I want it to be generally usable I'll need to write HTML directly instead of Rd and open that in a browser.
According to 'Writing R Extensions', run:
R CMD Rdconv -t html filename.Rd > filename.html
in the command line. See also:
R CMD Rd2pdf --help
In R: system("R CMD Rdconv -t html filename.Rd > filename.html && chromium-browser filename.html")
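The same conversion can also be done without leaving R, using tools::Rd2HTML() and utils::browseURL(), both of which ship with R. A minimal sketch, where preview_rd() is a hypothetical name for the kind of preview("df_doc.Rd") function asked about:

```r
# Hypothetical helper: render an .Rd file to a temporary HTML file and
# open it in the default browser. Returns the HTML path invisibly.
preview_rd <- function(rd_file, open = interactive()) {
  html_file <- tempfile(fileext = ".html")
  tools::Rd2HTML(rd_file, out = html_file)   # Rd may be given as a file name
  if (open) {
    utils::browseURL(html_file)
  }
  invisible(html_file)
}

# Usage with the file from the question:
# preview_rd("df_doc.Rd")
```

This does not depend on RStudio, so it should behave the same from a plain R console.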
I'm re-running some R Markdown scripts that worked fine a month ago, but now kable_as_image is unable to find Ghostscript (yes, I'm on Windows 8). I get the following error message:
Error in kable_as_image(criteria.table,"Criteria",file_format="jpeg"):
Ghostscript is required to read PDF on windows. Please download it here: https://ghostscript.com/
My computer still has Ghostscript, which runs fine when I open it up independently (I tried reinstalling Ghostscript; it didn't help). My guess is that the problem has something to do with R, RStudio, or a package being unable to find Ghostscript.
I'm pretty sure I've upgraded R in the interim, and I'm currently on 3.4.3 with the latest versions of kableExtra and magick. I've also tried
Sys.setenv(R_GSCMD="C:/Program Files/gs/gs9.22/bin/gswin64.exe")
(and also for gswin64c.exe) but that didn't help, either. Any advice would be appreciated.
Despite what the error message says, R needs the path to MiKTeX (or your TeX program of choice), not to Ghostscript itself. The best solution is to add it to PATH in your operating system directly so it's always there, but it also works to add it within R. Doing it within R is helpful for testing that you have the right path before digging into your OS settings, or if you don't have administrator privileges on your work machine.
Sys.setenv("PATH"=sprintf("%s;C:\\Users\\me\\AppData\\Local\\Programs\\MiKTeX 2.9\\miktex\\bin\\x64\\",Sys.getenv("PATH")))
Your path to MiKTeX will likely differ from mine. Note that you need sprintf() or something similar to append the directory to the end of the existing PATH instead of overwriting it.
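A slightly more defensive variant of the same idea, as a sketch: add_to_path() is a hypothetical helper that appends a directory only if it is not already on the PATH, using .Platform$path.sep (";" on Windows) as the separator.

```r
# Hypothetical helper: append a directory to PATH for the current R session,
# skipping the update if the directory is already listed.
add_to_path <- function(dir) {
  sep <- .Platform$path.sep
  current <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]
  if (!dir %in% current) {
    Sys.setenv(PATH = paste(c(current, dir), collapse = sep))
  }
  invisible(Sys.getenv("PATH"))
}

# Usage (directory taken from the answer above; yours will differ):
# add_to_path("C:\\Users\\me\\AppData\\Local\\Programs\\MiKTeX 2.9\\miktex\\bin\\x64")
# Sys.which("pdflatex")   # non-empty if the TeX binaries are now found
```

The change only lasts for the current R session, so it is safe for trying out a path before editing the system settings.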
I've just had RStudio crash on me unexpectedly, and on re-starting, contrary to what I've come to expect, the R script I had been tinkering around with was nowhere to be found.
I've managed to track down the Rhistory file so I'll be able to piece together all the commands, which is reassuring.
However, I am curious if there's somewhere I might try looking to find the temporary unsaved file on the off chance that might be cached somewhere (after all, it is usually cached somewhere that RStudio apparently knows to look). Is there a particular file extension/format I should be searching for?
Currently running R 3.3.1 through RStudio 0.99.903 on Linux Mint 17.3 (over Ubuntu 14.04.3 LTS).
I've tried running grep on the command line to find some of the more recently updated lines of code; I may be out of luck. I found two files:
~/.rstudio-desktop/history_database
Which appears to basically be a more centralized .Rhistory for RStudio
and
~/.rstudio-desktop/sdb/s-9CD2C698/D7986B2A
This looks JSON-like and also appears to basically be an Rhistory. Please correct me if I'm wrong.
As indicated by @KevinUshey from RStudio:
RStudio stores autosave data as part of the JSON 'blobs' within the sdb folder. You should see the document serialized as a long 'string', with newlines embedded.
Use packages such as jsonlite to parse this and best of luck.
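As a rough sketch of that parsing step: the helper below reads one sdb file with jsonlite and writes the stored document to a new script. recover_sdb_script() is a hypothetical name, and the field name "contents" is an assumption about how this RStudio version lays out the JSON; inspect names() of the parsed object if it is not there.

```r
# Hypothetical helper: pull the serialized document out of an RStudio sdb
# JSON file. Assumes the text is stored in a field called "contents".
recover_sdb_script <- function(sdb_file, out_file) {
  doc <- jsonlite::fromJSON(sdb_file)
  if (is.null(doc$contents)) {
    stop("no 'contents' field in ", sdb_file,
         "; available fields: ", paste(names(doc), collapse = ", "))
  }
  writeLines(doc$contents, out_file)   # embedded newlines become real lines
  invisible(out_file)
}

# Usage with the file found in the question:
# recover_sdb_script("~/.rstudio-desktop/sdb/s-9CD2C698/D7986B2A", "recovered.R")
```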
If you used RStudio on Linux, the temporary R script files are stored in the .rstudio/sources folder, and you can open all of the script files directly.
Good luck
In RStudio, whether or not you saved the script, any code you ran is recorded in the application's history; this is the "telemetric data" that RStudio keeps about you.
On Windows, this is the path:
C:\Users\ANALISTA\AppData\Local\RStudio\history_database
Open it with Visual Studio Code or a similar text editor.
I am creating a package in R. Everything runs properly, but when I run R CMD check, it shows an error message while running the examples:
"can't open the file." "No such file or directory"
My function needs a text file containing abstracts from PubMed. I have placed the text file in every sub-directory of my package, but it still does not work and shows the same error again and again.
Please suggest the right way to put a text file in a package so that the examples can use it and run properly.
I will be very thankful to you.
Usually you put such data in the /inst folder. E.g.:
<packageRoot>/inst/pubmed/myfile
After the package is built and installed, you can access the content of this folder from within the package like this:
system.file( "pubmed/myfile", package="<package>" )
For more information, see http://cran.r-project.org/doc/manuals/r-release/R-exts.pdf (section 1.1.5, Data in packages).
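To see how system.file() behaves, here is a small runnable sketch using a file that ships with the base package stats. The commented lines show how the same call might look in your own example; "mypackage" is a hypothetical package name.

```r
# A file that exists in an installed package: system.file() returns its full path.
found <- system.file("DESCRIPTION", package = "stats")
print(file.exists(found))

# A file that does not exist: system.file() returns "" rather than an error,
# which is one way to end up with "No such file or directory" later on.
missing <- system.file("pubmed", "myfile", package = "stats")
print(nzchar(missing))

# In your own package example (hypothetical package name), mustWork = TRUE
# turns a missing file into an immediate, clearer error:
# abstract_file <- system.file("pubmed", "myfile",
#                              package = "mypackage", mustWork = TRUE)
# abstracts <- readLines(abstract_file)
```

Note that the contents of inst/ are copied to the top level of the installed package, so "inst" itself does not appear in the path passed to system.file().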
I suggest you use the devtools and roxygen2 packages. Basically, you just need to prepare the DESCRIPTION file and the .R files.
See more details in this brilliant answer: devtools roxygen package creation and rd documentation