Reading MS Access (.mdb, .accdb) into R; Mac to PC conversion - r

I am working on a program that pulls data out of .mdb and .accdb files and creates the appropriate tables in R.
My working program on my Mac looks like this:
library(Hmisc)   # mdb.get() relies on the mdbtools command-line utilities
p <- '/Users/Josh/Desktop/Directory/'
# Pull every table from the .mdb file and place each one in the global environment
mdbfilename <- 'x.mdb'
mdbconcat <- paste(p, mdbfilename, sep = "")
mdb <- mdb.get(mdbconcat)
mdbnames <- data.frame(mdb.get(mdbconcat, tables = TRUE))
list2env(mdb, .GlobalEnv)
# Repeat for the .accdb file
accdbfilename <- 'y.accdb'
accdbconcat <- paste(p, accdbfilename, sep = '')
accdb <- mdb.get(accdbconcat)
accdbnames <- data.frame(mdb.get(accdbconcat, tables = TRUE))
list2env(accdb, .GlobalEnv)
This works fine on my Mac, but on the PC I'm developing this for, I get this error message:
Error in system(paste("mdb-tables -1", file), intern = TRUE) :
'mdb-tables' not found
I've thought a lot about using RODBC, but this program allows me to have the tables arranged in a way where subsequent querying and dplyr functions work. Is there any way to get these functions to work on a Windows machine?
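For what it's worth, the error happens because Hmisc::mdb.get() shells out to the mdbtools command-line programs (the system("mdb-tables -1", ...) call in the message), and those are rarely installed on Windows. Below is a minimal, untested sketch of the RODBC route that still ends with every table in the global environment; it assumes the Microsoft Access ODBC driver is present, and read_access() is just a hypothetical helper name:
library(RODBC)
# Hypothetical helper: fetch every user table from an Access file into a named list
read_access <- function(path) {
  ch <- odbcConnectAccess2007(path)                      # handles both .mdb and .accdb
  on.exit(odbcClose(ch))
  tabs <- sqlTables(ch, tableType = "TABLE")$TABLE_NAME  # user tables only, no system tables
  setNames(lapply(tabs, function(tb) sqlFetch(ch, tb)), tabs)
}
mdb   <- read_access(mdbconcat)
accdb <- read_access(accdbconcat)
list2env(mdb, .GlobalEnv)    # same end state as the mdb.get() version above
list2env(accdb, .GlobalEnv)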

Related

Error in file(file, "rt") : cannot open the connection - Unsure of what to do

I am currently working through Coursera's R Programming course and have hit a bit of a snag with this assignment. I have been getting various errors (that I'm not totally sure I've nailed down), but this is a new one, and no matter what I do I can't seem to shake it.
Whenever I run the below code it comes back with
Error in file(file, "rt") : cannot open the connection
pollutantmean <- function (directory, pollutant, id){
files<- list.files(path = directory, "/", full.names = TRUE)
dat <- data.frame()
dat <- sapply(file = directory,"/", read.csv)
mean(dat["pollutant"], na.rm = TRUE)
}
I have tried numerous solutions posted here on SO for this issue, but none of them has worked. I made sure to set the working directory to the folder with all of the CSV files before running, and I can see all of the files in the Files pane. I have also moved that working directory around a few times, since some of the suggestions were to put it on the desktop, etc., but none of that has worked. I am currently running RStudio as an admin, but that does not seem to have done anything, and I have also modified the permissions on the specdata folder to ensure there are no weird restrictions there. Any help is appreciated.
The error comes from how the file paths are built: list.files() and sapply() are called with the wrong arguments, so read.csv() ends up being handed the directory itself rather than a CSV file inside it. Here are two possible implementations:
# list all files in "directory", read them, combine, then take the mean of the requested pollutant column
pollutantmean_1 <- function(directory, pollutant){
  files <- list.files(path = directory, full.names = TRUE)
  dat <- lapply(files, read.csv)
  dat <- data.table::rbindlist(dat) |> as.data.frame()
  mean(dat[[pollutant]], na.rm = TRUE)
}
# list all files in "directory", read them, take the mean of the requested pollutant column for each file and return the means
pollutantmean_2 <- function(directory, pollutant){
  files <- list.files(path = directory, full.names = TRUE)
  dat <- lapply(files, read.csv)
  pollutant_means <- sapply(dat, function(x) mean(x[[pollutant]], na.rm = TRUE))
  names(pollutant_means) <- basename(files)
  pollutant_means
}
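For example, assuming the course's specdata folder of CSV files sits in the working directory and the column of interest is sulfate (both are assumptions, not stated in the question), the calls would look like:
pollutantmean_1("specdata", "sulfate")   # one overall mean across all files
pollutantmean_2("specdata", "sulfate")   # one mean per file, named by file name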

Running R notebook from command line

Ok, I'm trying to run this script through a batch file in Windows (Server 2016), but it just pushes line breaks and dots to the output screen:
"c:\Program Files\R\R-3.5.1\bin\rscript.exe" C:\projects\r\HentTsmReport.R
The script works like a charm in RStudio: it reads an HTML file (a TSM backup report), transforms the content into a data frame, then saves one of the HTML tables as a CSV file.
Why do I just get a screen full of junk instead of the CSV output when running it through Rscript.exe?
My goal is to run this script through a scheduled task each day, to keep a history of the backup status in a table and keep track of failed backups through Tivoli.
This is the script in the R-file:
library(XML)
library(RCurl)
#library(tidyverse)
library(rlist)
theurl <- getURL("file://\\\\chill\\um\\backupreport20181029.htm",.opts = list(ssl.verifypeer = FALSE) )
tables <- readHTMLTable(theurl)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
head(tables)
test <- tables[5] # select table number 5
write.csv(test, file = "c:\\temp\\backupreport.csv")
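One plausible (and unconfirmed) explanation for the screen output: under Rscript.exe every auto-printed expression, such as head(tables), is written straight to the terminal, which would produce exactly the flood of characters described above. A batch-friendly sketch of the tail of the script, with the implicit printing removed and an explicit confirmation message added, might look like this:
test <- tables[[5]]                       # the data frame itself rather than a one-element list
outfile <- "c:\\temp\\backupreport.csv"   # same target path as above
write.csv(test, file = outfile, row.names = FALSE)
message("Wrote ", nrow(test), " rows to ", outfile)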

Having trouble reading table using sqldf package (R)

Background:
I can successfully pull a particular dataset (shown in the code below) from the internet using the read.csv() function. However, when I try to utilize the sqldf package to speed up the process using read.csv.sql() it produces errors. I've tried various solutions but can't seem to solve this problem.
I can successfully pull the data and create the data frame that I want with read.csv() using the following code:
ce_data <- read.csv("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
fill=TRUE, header=TRUE, sep="")
To test the functionality of sqldf on my machine, I successfully tested read.csv.sql() by reading in the data as 1 variable rather than the 5 desired using the following code:
library(sqldf)
ce_data_sql1 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
sql = "select * from file")
To produce the result that I got using read.csv() but utilizing the speed of read.csv.sql(), I tried this code:
ce_data_sql2 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
fill=TRUE, header=TRUE, sep="", sql = "select * from file")
Unfortunately, it produced this error:
trying URL
'http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData' Content
type 'text/plain' length 24846571 bytes (23.7 MB) downloaded 23.7 MB
Error in sqldf(sql, envir = p, file.format = file.format, dbname =
dbname, : unused argument (fill = TRUE)
I have tried various methods to address the errors, using sqldf documentation and have been unsuccessful.
Question:
Is there a solution where I can read in this table with 5 variables desired using read.csv.sql()?
The reason you are reading it in as a single variable is that you did not correctly specify the separator for the original file. Try the following, with sep = "\t" for tab-separated:
ce_data_sql2 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
sep = "\t", sql = "select * from file")
The error you are getting in the final example:
Error in sqldf(sql, envir = p, file.format = file.format, dbname =
dbname, : unused argument (fill = TRUE)
is due to the fact that read.csv.sql does not accept a fill argument.
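As a quick sanity check (a small aside, not part of the original thread), you can list the arguments that read.csv.sql actually accepts before calling it:
library(sqldf)
names(formals(read.csv.sql))
# header and sep are in the list, but fill is not, hence the "unused argument" error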

Using R to download newest files from ftp-server

I have a number of files named
FileA2014-03-05-10-24-12
FileB2014-03-06-10-25-12
Where the part "2014-03-05-10-24-12" means "Year/Day/Month/Hours/Minutes/Seconds/". These files reside on a ftp-server. I would like to use R to connect to the ftp-server and download whatever file is newest based on date.
I have started trying to list the content, using RCurl and dirlistonly. Next step will be to try to parse and find the newest file. Not quite there yet...
library(RCurl)
getURL("ftpserver/",verbose=TRUE,dirlistonly = TRUE)
This should work:
library(RCurl)
url <- "ftp://yourServer"
userpwd <- "yourUser:yourPass"
filenames <- getURL(url, userpwd = userpwd,
                    ftp.use.epsv = FALSE, dirlistonly = TRUE)
Then parse the timestamp out of each file name, convert it to a date-time, and pick the newest one:
times <- lapply(strsplit(filenames, "[-.]"), function(x){
  time <- paste(c(substr(x[1], nchar(x[1]) - 3, nchar(x[1])), x[2:6]),
                collapse = "-")
  as.POSIXct(time, format = "%Y-%m-%d-%H-%M-%S", tz = "GMT")
})
ind <- which.max(times)
dat <- try(getURL(paste(url,filenames[ind],sep=""), userpwd = userpwd))
So dat now contains the newest file.
To make it reproducible, anyone else can use this instead of the listing part above:
filenames<-c("FileA2014-03-05-10-24-12.csv","FileB2014-03-06-10-25-12.csv")
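One caveat if you start from the real getURL() listing rather than this hard-coded vector (my own addition, not part of the original answer): getURL() returns the whole directory listing as a single newline-separated string, so split it into a character vector first:
filenames <- unlist(strsplit(filenames, "\r?\n"))   # one element per file name
filenames <- filenames[nzchar(filenames)]           # drop any empty trailing entry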

Is it possible to install pandoc on windows using an R command?

I would like to download and install pandoc on a windows 7 machine, by running a command in R. Is that possible?
(I know I can do this manually, but when I show this to students, the more steps I can organize within an R code chunk, the better.)
What about simply downloading the most recent version of the installer and starting that from R:
a) Identify the most recent version of Pandoc and grab the URL with the help of the XML package:
library(XML)
page <- readLines('http://code.google.com/p/pandoc/downloads/list', warn = FALSE)
pagetree <- htmlTreeParse(page, error=function(...){}, useInternalNodes = TRUE, encoding='UTF-8')
url <- xpathSApply(pagetree, '//tr[2]//td[1]//a ', xmlAttrs)[1]
url <- paste('http', url, sep = ':')
b) Or apply some regexp magic, thanks to @G.Grothendieck, instead (no need for the XML package this way):
page <- readLines('http://code.google.com/p/pandoc/downloads/list', warn = FALSE)
pat <- "//pandoc.googlecode.com/files/pandoc-[0-9.]+-setup.exe"
line <- grep(pat, page, value = TRUE); m <- regexpr(pat, line)
url <- paste('http', regmatches(line, m), sep = ':')
c) Or simply check the most recent version manually if you'd feel like that:
url <- 'http://pandoc.googlecode.com/files/pandoc-1.10.1-setup.exe'
Download the file as binary:
t <- tempfile(fileext = '.exe')
download.file(url, t, mode = 'wb')
And simply run it from R:
system(t)
Remove the needless file after installation:
unlink(t)
PS: sorry, only tested on Windows XP
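Putting the download-and-run steps together, a small wrapper (my own sketch, reusing option (c)'s hard-coded URL purely as a stand-in) could look like this:
install_pandoc_from_url <- function(url) {
  t <- tempfile(fileext = '.exe')
  download.file(url, t, mode = 'wb')   # binary mode matters on Windows
  system(t)                            # launches the installer and waits for it
  unlink(t)                            # remove the needless file afterwards
}
install_pandoc_from_url('http://pandoc.googlecode.com/files/pandoc-1.10.1-setup.exe')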
