pdftools in R performing differently across machines - r

I'm noticing an issue where the pdftools package in R seems to be performing differently when run locally on my Windows 7 machine versus when I run it on a shared Ubuntu server via ssh.
My code:
download.file("http://www.nber.org/lbid/docs/LinkCO95Guide.pdf",
"1995codebook.pdf",
mode = "wb",
method = "libcurl")
codebook <- pdf_text("1995codebook.pdf")
On my local windows 7 machine, the object codebook shows up as "Large character (258 elements, 710.2 Kb)", whereas on the Ubuntu server it shows up as "Large character (258 elements, 701.9 Kb)".
As you might imagine, this is causing problems for me downstream where code that works on my local machine is not producing the same results on the Ubuntu server. Looking at the text contained in codebook the first difference I notice right away is that where the version produced on Windows has "\r\n" the version produced on Ubuntu only has "\n" instead (I rely on "\r\n" downstream).
Why would that character series be different? Might it have something to do with encoding? Any help appreciated on what's causing this and how I can get the same results on both machines.
One last thing to mention: I had to install the poppler library to my home directory on the Ubuntu server (don't have sudo access) in order to get pdftools to install:
apt-get source poppler
cd poppler-0.24.5
./configure --prefix=$HOME/myapps
make
make install
export PKG_CONFIG_PATH=$HOME/myapps/lib/pkgconfig
After having done so, install.packages("pdftools") seems to run correctly. And pdftools loads without issue. So if it's a bad install I'm not sure what has gone wrong.

A few things:
Windows has different line endings, this is extensively documented. This alone accounts for the size difference
Even after the download, you can convert between both conventions. One tool to do so is dos2unix which you can get via apt-get install dos2unix
You are making your life too complicated by building poppler. As the configure script for pdftools says, just install the library via apt-get install libpoppler-cpp-dev
However: most sane programs, and R included, treat \r\n and n identically so your imported data should be the same. If yours does not,
use dos2unix or equivalent tools to convert as needed. In the longer run you want your code to not care.

Related

Running R from Mac OSX terminal

I've searched the web, and I'm still unclear on how to run R from the Mac terminal. I have Rstudio and the standalone R app installed. I thought I could just type "R" from the command line as I do with "python", but that doesn't work. Is it necessary to edit the PATH in my bash profile? If so, how do I give the correct location of R?
Thanks for any help
Edits after receiving comments
So, I'm running Sierra, and when I type "r" or "R" at the terminal, I get "-bash: R: command not found." If I type, "which R" in the terminal I do not get any output.
Here is the output from "echo $PATH": /usr/local/heroku/bin:/opt/local/bin:/opt/local/sbin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/Users/samuelcolon/anaconda/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Users/samuelcolon/.rvm/gems/ruby-2.1.0/bin:/Users/samuelcolon/.rvm/gems/ruby-2.1.0#global/bin:/Users/samuelcolon/.rvm/rubies/ruby-2.1.0/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/samuelcolon/.rvm/bin:/Users/samuelcolon/.rvm/bin
As for the installation, I believe I downloaded it directly from cran.r-project.org a while ago. I can locate the GUI in my applications and open it--
I have version 3.13. Is it possible, I only have R.app installed but not R? Perhaps that's the reason I'm getting the 'command not found' when typing "R" into the terminal?
Generally, I've been working in RStudio, but I'd still like to access R from the terminal and also to find where things are located. I'm fine with removing and re-installing R if it's easiest to start from square one. I hope the extra detail helps, and I appreciate the responses.
An answer for those not that familiar with Terminal and Bash.
I have done a fresh update install of R from the R.org cran site as part of seeking an answer to your question.
I found this latest install version 3.4.0 installs R for access in Terminal, and also installs R.app as part of the package.
To my understanding, reading support docs, if you have an older version of R it will update that. However it will not update an installation of R installed by the anaconda package.
Where are the R files stored?
I can only assume that with a fresh install of the latest R, R will work for you in Terminal.
To learn where the R files are that are being accessed - in Terminal after starting R, and in R.app, type:
>R.home()
In my case as example:
In R.app - the R version 3.4.0 is accessed in the top directory (not my user folder):
R.home()
[1] "/Library/Frameworks/R.framework/Resources"
In Terminal - the R version 3.3.2 is accessed in the Anaconda package, again in the top level directory.
R.home()
[1] "/anaconda/lib/R"
So I have two different versions of R, and Terminal accesses a different version to R.app.
How can I ensure I access the same version in Terminal as I do in the R.app?
For someone familiar with bash, and how the whole bash command system works I am sure there is a well constructed command. All the same here are some novice solutions.
-
• First Solution:
I could update the anaconda version, however, I would prefer not to as as other elements of the anaconda package my depend on this older version of R. For those not yet familiar with Terminal and bash, not such a novice solution.
-
• Second Solution:
This solution came from mko. It provides a single use solution. From the result above, and checking the directory structure a little further to find this R file.
Finding the significant R file enables me to edit an extension of the above path shown in the R.app. So add /bin/R to enter
/Library/Frameworks/R.framework/Resources/bin/R
Entering and pressing return will start R from this version.
Alternatively, one can find this file and icon in the GUI Finder, lead by the above result, and just double click on it, and it will open Terminal and a session with R running for you. Easy!
One could also make an alias of it and put it on your desktop for easy future starts.
-
• Third Solution:
My last solution I think may be best, adding to mko's solution. Make an alias.
Being in my home directory in Terminal I open .bash_profile using the nano text editor. (If you do not already know how to do this, then best not use this solution.)
I then add the line in this env file.
alias Rv340='/Library/Frameworks/R.framework/Resources/bin/R'
I then save the changes and exit this terminal session. I then open a new Terminal window. (This is so the changes to the env above are incorporated in the new terminal session).
Then when I enter the alias:
Rv340
The version of R I want opens.
You can choose a different alias name to "Rv340".
-
• Fourth Solution:
A second more permanent solution for opening the same version of R in Terminal is as follows.
Copy the path as showing in R.app in response to the R.home() command above, and add that path to PATH in your .bash_profile. (If you do not know already how to do this, then ignore this solution.) Do so as follows.
export PATH="/Library/Frameworks/R.framework/Resources:$PATH"
To my understanding, this ensures that bash looks here for R (and anything else), then moves on to the other paths in PATH. Since this adds this path to the beginning of $PATH, an env variable, bash looks here first where it finds the newer version first, and stops looking.
When it comes to understanding PATH in the env set up in .bash_profile the following two links were helpful.
About PATH.
How to correctly add a path to PATH.
This solution may muck with anaconda's invocation of R. I have yet to check this.
First of all, you have to start terminal application. You can use either built in Terminal.app, or you can use replacement. My favorite one is iTerm2
https://www.iterm2.com
Then, you simply open terminal window and run R. Just like shown below:
Have fun with R!
Just ran into the same issue when installing R-4.0.3.pkg on my MacBook (MacOS BigSur). Can open R.app to the clunky R GUI, but typing in 'R' in terminal doesn't work.
Turns out, an R executable lives here: /Library/Frameworks/R.framework/Versions/4.0/Resources/bin/R
So I added this alias to my newly created .zshrc script:
alias R '/Library/Frameworks/R.framework/Versions/4.0/Resources/bin/R'
Now when I type in R, it opens... I swear this all happened seamlessly in earlier versions.
There is currently a bug in CRAN's R installation package that results in it not correctly installing symbolic links to R and Rscript for commandline use. I've just verified this by inspecting the postflight script in their 4.0.5 installation package. This only impacts MacOS system releases of 20 and above (you can check with uname -r).
I've included more info here, along with what the "correct" fix should be: manually creating symbolic links to /usr/local/bin that point to the R and Rscript binaries themselves. If this is the current challenge, then this would be a far better solution to creating aliases or manipulating PATH in various ways, since it's what the installation package intended to do (and presumably will again soon).
R: command not found
In short, if this is the problem, then Ashkan Mirzaee's answer (https://stackoverflow.com/a/67202173/2093929) to create the symbolic links directly is correct in form, but might not have the right link command. The 4.0.5 package intends instead to use:
mkdir -p /usr/local/bin
cd /usr/local/bin
rm -f R Rscript
ln -s /Library/Frameworks/R.framework/Resources/bin/R .
ln -s /Library/Frameworks/R.framework/Resources/bin/Rscript .
You can create a symbolic link from R and Rscript binaries to /usr/local/bin to add them to the PATH:
sudo ln -s /Library/Frameworks/R.framework/Versions/Current/Resources/bin/R /usr/local/bin
sudo ln -s /Library/Frameworks/R.framework/Versions/Current/Resources/bin/Rscript /usr/local/bin
Now which R should return /usr/local/bin/R and you can use R.
An easy way to open RStudio with admin privilege on macOS:
Go to Applications, then right click on RStudio
Select "Show Package Contents"
Go to Contents/MacOS
Now open terminal(in bash mode). Type sudo and drag the RStudio.exec into terminal and press on ENTER
Now RStudio will have admin access!

How to build qpdf on Windows?

When running the checks for my R-package (via devtools::check()) I face the warning ''qpdf' is needed for checks on size reduction of PDFs. I found this question were it was suggested (if I understood the answer correctly) to run Sys.which(Sys.getenv("R_QPDF", "qpdf")) and see whether qpdf is found or not. In my case this just returns
qpdf
""
so, I think I didn't install qpdf correctly. Unfortunately it seems to be quite complicated to install qpdf on Windows. My first side question is: does it really is so painful and complicated to install qpdf for Windows or is there an easy solution?
I've followed the instructions until it is said to add C:\MinGW-w64\bin and C:\MinGW-w64\lib\mingw to the PATH variable. But then I don't find further specific instructions to install qpdf, only about how to build qpdf with different other programs. The second side question is: is my assumption correct that after I've build qpdf it is installed? But the real question is: What is the best way to build qpdf? I tried the ./config-mingw32 and ./config-mingw64 commands from the section "Building with MinGW" in my C:\MinGW\msys\1.0\bin\bash.exe but got the error messages ./config-mingw32: No such file or directory and have no idea how to fix this issue.
I'm using Windows 10, R version 3.3.2 Patched (2017-01-07 r71934) -- "Sincere Pumpkin Patch" and RStudio 1.0.136.
You basically do not need to build the file on windows. Please follow three steps below:
Download qpdf for windows from https://sourceforge.net/projects/qpdf/?source=typ_redirect
Extract files in a temp folder
Copy the contents of the bin folder to %SystemRoot%\System32
job done!
Sys.which(Sys.getenv("R_QPDF", "qpdf"))
qpdf
"C:\\WINDOWS\\SYSTEM32\\qpdf.exe"
To flesh out an answer provided elsewhere:
If you are running the 32-bit version of R, it is important that you download the 32-bit version of qpdf, which is the version linked from the SourceForge homepage. If you are running a 64-bit installation of R, you will need to do a bit of digging to locate the 64-bit version of qpdf, which is buried a little more deeply (version 10.0.1 is listed here).
Rather than copying files to C:/Windows/System32, a potentially safer option is to extracted the zipped qpdf directory to C:\Program Files. If you do this, you'll need to add C:\Program Files\qpdf-version_number\bin to your system PATH under the environment variables.
To do this within R, run Sys.setenv('PATH' = paste0('C:\Program Files\qpdf-version_numer\bin;', Sys.getenv('PATH')))
To do this in Windows, open the start menu, type "edit the system environment variables" to open the System Properties, and at the bottom of the "Advanced" tab click "Environment variables". Find the "Path" entry under "System variables" and click "Edit". Then, re-start R so it picks up the modified PATH.
One further step may be required to convince Windows that pqdf is safe to run.
Navigate to C:\Program Files\qpdf-version_numer\bin and execute qpdf.exe (by double-clicking). Windows 10 throws up a security warning, as it's an unrecognized executable file. You'll need to use the more options link to find the button to run the program. This done, Windows will recognize the file as safe to run and allow other software, including R, to use it.

Converting MP3 file to WAV

Is it possible to convert a file from .mp3 to .wav in R in order to be able to play the song with R?
Yes (probably). Here's an example:
Converting MP3 to WAV is pretty straightforward:
library(tuneR)
r <- readMP3("04 Trip to Paris.mp3") ## MP3 file in working directory
writeWave(r,"tmp.wav",extensible=FALSE)
(to install tuneR on Linux, see here).
Playback is harder and platform-dependent. tuneR::play() tries to use an external player.
On Windows it tries to guess:
If under Windows and no
player is given, “mplay32.exe” or “wmplayer.exe” (if the
former does not exists as under Windows 7) will be chosen as
the default.
On MacOS, specifying "open" probably works.
On Linux, specifying "play" probably works if you have the sox package installed (sudo apt-get install sox).
So on my MacOS system
tuneR::play("tmp.wav","open")
works.
An alternative that does not use external resources is audio::play().
library(audio)
w <- load.wave("tmp.wav")
play(load.wave("tmp.wav"))
It works on MacOS. I don't know if it works on Windows. It does not work on my Linux system; audio doesn't even install unless you sudo apt-get install portaudio19-dev first, and works poorly even once installed.
(When I say "Linux" here I mean the only system I've tested, Ubuntu 14.04. The sudo apt-get install ... incantations I've listed are likely to work on other reasonably recent Debian-based systems, but ... ???)

Multiple versions of R installed but can't locate or remove the old one

I have downloaded and installed R.
I see it here in Applications folder (I am on a Mac with Yosemite):
Fine. I can launch R.app and indeed, yes, I am running the version I want, which is 3.2.2:
So far so good. I can even open up RStudio and see that I am indeed running 3.2.2!
So after all this, I simply go to my terminal, type
r
and turns out I am running 3.1.1!
I understand the old user of my work computer probably had installed this older version.
So here's what I'm wondering:
If I just installed R.app correctly, WHY is this old version still living on my computer, and how do I get rid of it?
If this is some sort of $PATH thing, WHY doesn't the most obvious location for an app, the Applications folder, get checked for the existence of R?
Thanks.
UPDATE
Turns out the old version of R has been installed by homebrew.
Typing which r in your terminal will give you where the shell thinks R is. Then, you need only uninstall it from that location.
Since we've determined it's homebrew, all you need to do now is brew unlink r; rm -Rf /usr/local/Cellar/r/3.1.1 and you should be golden after you rehash in your shell.
It worked because you installed it using homebrew. To remove it from the system, you must first unlink it and then remove it from the system.

check if a program is installed

I'm writing a function that uses pandoc in R through the command line. How can I use R to check if pandoc installed (I also assume it would have to be on the path which may be an issue for windows users)?
I don't have pandoc to install , but generally I test if a program is installed like this :
pandoc.installed <- system('pandoc -v')==0
For example to test if java is installed:
java.installed <- system('java -version') ==0
java version "1.7.0_10"
Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
> java.installed
[1] TRUE
This suggestion is based entirely on my personal experience with this question that RStudio can't seem to read what's in my .bashrc file on my Ubuntu system. I have installed Pandoc using the cabal install pandoc method described here since there were features I needed from more recent versions of Pandoc than were available with Ubuntu's package manager. Running R from the terminal could detect Pandoc as expected using Sys.which, but when using RStudio, it could not. I have no idea if this is a problem with Windows users or not though!
One alternative/workaround in this case is actually creating a vector of typical paths where you expect the Pandoc executable to be found (under the presumption that many users don't really futz around with where they install programs). This info is, again, available at the install page linked above, plus the typical C:\\PROGRA~1\\... path for Windows. Thus, you might have something like the following as the paths to Pandoc:
myPaths <- c("pandoc",
"~/.cabal/bin/pandoc",
"~/Library/Haskell/bin/pandoc",
"C:\\PROGRA~1\\Pandoc\\bin\\pandoc")
# Maybe a .exe is required for that last one?
# Don't think so, but not a regular Windows user
Which you can use with Sys.which() (for example, Sys.which(myPaths)) and some of the other ideas already shared.
If the first option is uniquely matched, then there's no problem: you can use system calls to Pandoc directly.
If any of the other options are uniquely matched, you can write your functions in such a way that you paste in the full path to the executable in your system call instead of just "pandoc".
If the first option and any of the other options are matched, then you can just select the first option and proceed.
If none are matched, prompt the user for the path to their Pandoc installation or provide a message on how to install Pandoc.
I suppose you could use Sys.which and see if the result is an empty string.
pandoc.location <- Sys.which("pandoc")
if(pandoc.location == ""){
print("pandoc not available")
}else{
print("pandoc available")
}

Resources