Is it possible to install pandoc on Windows using an R command?

I would like to download and install pandoc on a Windows 7 machine by running a command in R. Is that possible?
(I know I can do this manually, but when I show this to students, the more steps I can organize within an R code chunk, the better.)

What about simply downloading the most recent version of the installer and starting that from R:
a) Identify the most recent version of Pandoc and grab the URL with the help of the XML package:
library(XML)
page <- readLines('http://code.google.com/p/pandoc/downloads/list', warn = FALSE)
pagetree <- htmlTreeParse(page, error=function(...){}, useInternalNodes = TRUE, encoding='UTF-8')
url <- xpathSApply(pagetree, '//tr[2]//td[1]//a ', xmlAttrs)[1]
url <- paste('http', url, sep = ':')
b) Or apply some regexp magic thanks to @G.Grothendieck instead (no need for the XML package this way):
page <- readLines('http://code.google.com/p/pandoc/downloads/list', warn = FALSE)
pat <- "//pandoc.googlecode.com/files/pandoc-[0-9.]+-setup.exe"
line <- grep(pat, page, value = TRUE); m <- regexpr(pat, line)
url <- paste('http', regmatches(line, m), sep = ':')
c) Or simply check the most recent version manually if you'd feel like that:
url <- 'http://pandoc.googlecode.com/files/pandoc-1.10.1-setup.exe'
Download the file as binary:
t <- tempfile(fileext = '.exe')
download.file(url, t, mode = 'wb')
And simply run it from R:
system(t)
Remove the needless file after installation:
unlink(t)
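Putting the pieces together in a small helper keeps everything inside one code chunk (a minimal sketch built from the steps above; pass it whichever URL you obtained in a), b) or c)):
install_pandoc <- function(url) {
  installer <- tempfile(fileext = '.exe')     # temporary location for the setup file
  download.file(url, installer, mode = 'wb')  # binary mode is important on Windows
  system(installer)                           # starts the pandoc installer
  unlink(installer)                           # remove the installer afterwards
  invisible(TRUE)
}
install_pandoc(url)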
PS: sorry, only tested on Windows XP

Related

R: How to download single file from specific branch of private GitHub repo?

How to download single file from specific branch of GitHub private repo using R?
It can be easily done for default branch, e.g.:
require(httr)
github_path = "https://api.github.com/repos/{user}/{repo}/contents/{path_to}/{file}"
github_pat = Sys.getenv("GITHUB_PAT")
req <- content(GET(github_path,
add_headers(Authorization = paste("token", github_pat))), as = "parsed")
tmp <- tempfile()
r1 <- GET(req$download_url, write_disk(tmp))
...but I can't figure out how to do that for specific branch.
Tried to include branch name in github_path but it didn't work (Error in handle_url(handle, url, ...)).
Since it is easy with classic curl, e.g.:
curl -s -O https://{PAT}@raw.githubusercontent.com/{user}/{repo}/{branch}/{path_to}/{file}
...I tried to do it like:
tmp <- tempfile()
curl::curl_download("https://{PAT}@raw.githubusercontent.com/{user}/{repo}/{branch}/{path_to}/{file}", tmp)
But it didn't work as well.
What am I missing?
Thanks!
You can use curl in R like this to include the auth header and the path to the desired file:
library(curl)
h <- new_handle(verbose = TRUE)
handle_setheaders(h,
"Authorization" = "token ghp_XXXXXXX"
)
con <- curl("https://raw.githubusercontent.com/username/repo/branch/path/file.R", handle = h)
readLines(con)
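If you would rather save the file to disk than read it into memory, the same handle can be reused with curl_download() (the URL, token and file name below are placeholders):
library(curl)
h <- new_handle()
handle_setheaders(h, "Authorization" = "token ghp_XXXXXXX")
tmp <- tempfile(fileext = ".R")
curl_download("https://raw.githubusercontent.com/username/repo/branch/path/file.R", tmp, handle = h)
readLines(tmp)  # or source(tmp), read.csv(tmp), ... depending on the file type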

How can I create a data frame in R from a zip file with multiple levels located in an URL?

I have been trying to work this out but I have not been able to do it...
I want to create a data frame with four columns: country-number-year-(content of the .txt file)
There is a .zip file in the following URL:
https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/0TJX8Y/PZUURT
The file contains a folder with 49 folders in it, and each of them contains 150 .txt files, give or take.
I first tried to download the zip file with get_dataset, but it did not work:
if (!require("dataverse")) devtools::install_github("iqss/dataverse-client-r")
library("dataverse")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
get_dataset("=doi:10.7910/DVN/0TJX8Y/PZUURT", key = "", server = "dataverse.harvard.edu")
"Error in get_dataset("=doi:10.7910/DVN/0TJX8Y/PZUURT", key = "", server = "dataverse.harvard.edu") :
Not Found (HTTP 404)."
Then I tried
temp <- tempfile()
download.file("https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/0TJX8Y/PZUURT",temp)
UNGDC <-unzip(temp, "UNGDC+1970-2018.zip")
It worked up to a point: I downloaded the .zip file and then created UNGDC, but nothing happened, because it only holds the following information:
UNGDC
A connection with
description "/var/folders/nl/ss_qsy090l78_tyycy03x0yh0000gn/T//RtmpTc3lvX/fileab730f392b3:UNGDC+1970-2018.zip"
class "unz"
mode "r"
text "text"
opened "closed"
can read "yes"
can write "yes"
At this point I don't know what to do... I have not found relevant information on how to proceed... Can someone please give me some hints, or point me to somewhere I can learn how to do it?
Thanks for your attention and help!!!
How about this? I used the zip package to unzip, but possibly the base unzip might work as well.
library(zip)
dir.create(temp <- tempfile())
url<-'https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/0TJX8Y/PZUURT'
download.file(url, paste0(temp, '/PZUURT.zip'), mode = 'wb')
unzip(paste0(temp, '/PZUURT.zip'), exdir = temp)
Note in particular I had to set the mode = 'wb' as I'm on a Windows machine.
I then saw that the unzipped archive had a _MACOSX folder and a Converted sessions folder. Assuming I don't need the MACOSX stuff, I did the following to get just the files I'm interested in:
root_folder <- paste0(temp,'/Converted sessions/')
filelist <- list.files(path = root_folder, pattern = '\\.txt$', recursive = TRUE)
filenames <- basename(filelist)
'filelist' contains each text file's path relative to root_folder (which is why that folder gets pasted back on below), while 'filenames' has just each file name, which I'll then break up to get the country, the number and the year:
df <- data.frame(t(sapply(strsplit(filenames, '_'),
function(x) c(x[1], x[2], substr(x[3], 1, 4)))))
colnames(df) <- c('Country', 'Number', 'Year')
Finally, I can read the text from each of the files and stick it into the dataframe as a new Text field:
df$Text <- sapply(paste0(root_folder, filelist), function(x) readChar(x, file.info(x)$size))
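As a quick sanity check on the resulting data frame (one row per .txt file):
dim(df)                                      # number of files read vs. number of columns
head(df[, c('Country', 'Number', 'Year')])   # the metadata parsed out of the file names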

Reading MS Access (.mdb, .accdb) into R; Mac to PC conversion

I am working on a program that pulls data out of .mdb and .accdb files and creates the appropriate tables in R.
My working program on my Mac looks like this:
library(Hmisc)
p <- '/Users/Josh/Desktop/Directory/'
mdbfilename <- 'x.mdb'
mdbconcat <- paste(p, mdbfilename, sep = "")
mdb <- mdb.get(mdbconcat)
mdbnames <- data.frame(mdb.get(mdbconcat, tables = TRUE))
list2env(mdb, .GlobalEnv)
accdbfilename <- 'y.accdb'
accdbconcat <- paste(p, accdbfilename, sep = '')
accdb <- mdb.get(accdbconcat)
accdbnames <- data.frame(mdb.get(accdbconcat, tables = TRUE))
list2env(accdb, .GlobalEnv)
This works fine on my Mac, but on the PC I'm developing this for, I get this error message:
Error in system(paste("mdb-tables -1", file), intern = TRUE) :
'mdb-tables' not found
I've thought a lot about using RODBC, but this program allows me to have the tables arranged in a way where subsequent querying and dplyr functions work. Is there any way to get these functions to work on a Windows machine?
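The error shows that mdb.get() shells out to the mdb-tables command from mdbtools, which isn't available on that PC. As a hedged sketch of a Windows-side equivalent via RODBC (assuming the Microsoft Access ODBC driver / Access Database Engine is installed; the path is a placeholder), the same "named list fed into list2env()" arrangement could be rebuilt like this:
library(RODBC)
# assumes the Access ODBC driver (installed with Office or the Access
# Database Engine redistributable) is available on the Windows machine
con <- odbcConnectAccess2007('C:/Users/Josh/Desktop/Directory/y.accdb')
tbls <- sqlTables(con, tableType = 'TABLE')$TABLE_NAME      # user tables only
accdb <- setNames(lapply(tbls, function(t) sqlFetch(con, t)), tbls)
list2env(accdb, .GlobalEnv)                                 # same arrangement as with mdb.get()
odbcClose(con)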

Download zipped files from ftp via RCurl

I had problems to download zipped files from a ftp server. But now I have solved the problem and because I haven't found any solution to my problem here, I'm sharing my approach.
First I tried it with
download.file()
But there was the problem that my password ended with an "@". That's why the solution of submitting user and password within the URL wasn't working: the double @ was apparently confusing R.
url <- "ftp://user:password@@url"  # fails: the extra @ from the password breaks the URL parsing
You'll find the solution below.
Maybe someone has some improvements.
Hopefully it's useful for someone,
Florian
Here is my solution:
library(RCurl)
url<- "ftp://adress/"
filenames <- getURL(url, userpwd="USER:PASSWORD", ftp.use.epsv = FALSE, dirlistonly = TRUE) #reading filenames from ftp-server
destnames <- filenames <- strsplit(filenames, "\r*\n")[[1]] # destfiles = origin file names
con <- getCurlHandle( ftp.use.epsv = FALSE, userpwd="USER:PASSWORD")
mapply(function(x,y) writeBin(getBinaryURL(x, curl = con, dirlistonly = FALSE), y), x = filenames, y = paste("C:\\temp\\",destnames, sep = "")) #writing all zipped files in one directory
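If the FTP directory also lists files you don't want, one small refinement is to keep only the .zip entries before the download loop (assuming the zipped files all end in .zip):
destnames <- filenames <- filenames[grepl("\\.zip$", filenames)]  # keep only the zipped files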
Hopefully it's useful for somebody!
Regards,
Florian
If you have no particular reason to stay with Rcurl, you can use this bash-based method:
URL <- "ftp.server.ca"
USR <- "aUserName"
MDP <- "myPassword"
OUT <- "output.file"
cmd <- paste("wget -m --ftp-user=",USR," --ftp-password=",MDP, " ftp://", URL," -O ", OUT, sep="")
system(cmd)
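Note that this relies on wget being installed and on the PATH of the machine running R; a quick way to check from R:
Sys.which("wget")  # returns the full path to wget, or "" if it cannot be found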

Using R to download newest files from ftp-server

I have a number of files named
FileA2014-03-05-10-24-12
FileB2014-03-06-10-25-12
Where the part "2014-03-05-10-24-12" means "Year/Day/Month/Hours/Minutes/Seconds/". These files reside on a ftp-server. I would like to use R to connect to the ftp-server and download whatever file is newest based on date.
I have started trying to list the content, using RCurl and dirlistonly. Next step will be to try to parse and find the newest file. Not quite there yet...
library(RCurl)
getURL("ftpserver/",verbose=TRUE,dirlistonly = TRUE)
This should work
library(RCurl)
url <- "ftp://yourServer"
userpwd <- "yourUser:yourPass"
filenames <- getURL(url, userpwd = userpwd,
ftp.use.epsv = FALSE,dirlistonly = TRUE)
Then extract the timestamp from each file name and pick the newest one:
times<-lapply(strsplit(filenames,"[-.]"),function(x){
time<-paste(c(substr(x[1], nchar(x[1])-3, nchar(x[1])),x[2:6]),
collapse="-")
time<-as.POSIXct(time, "%Y-%m-%d-%H-%M-%S", tz="GMT")
})
ind <- which.max(times)
dat <- try(getURL(paste(url,filenames[ind],sep=""), userpwd = userpwd))
So dat now contains the newest file.
To make it reproducible, everyone else can use this instead of the listing part above:
filenames<-c("FileA2014-03-05-10-24-12.csv","FileB2014-03-06-10-25-12.csv")
