I am setting up sparklyr in R but keep getting an error message. This is essentially what I typed:
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "2.1.0")
sc <- spark_connect(master = "local")
However, when I create the Spark connection, I receive the following error message:
Using Spark: 2.1.0
Error in if (a[k] > b[k]) return(1) else if (a[k] < b[k]) return(-1L) :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: running command '"C:\WINDOWS\SYSTEM32\java.exe" -version' had status 2
2: In compareVersion(parsedVersion, "1.7") : NAs introduced by coercion
Any thoughts?
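The first warning above reports that the java -version command exited with status 2, which suggests the version string could not be read. A minimal sketch for checking what R sees when it calls Java, using only base R; whether this is the actual cause on this machine is an assumption:

```r
# Locate the java executable R would run; "" means none is on the PATH.
java_path <- unname(Sys.which("java"))

if (nzchar(java_path)) {
  # java prints its version to stderr, so capture both streams.
  out <- suppressWarnings(
    system2(java_path, "-version", stdout = TRUE, stderr = TRUE)
  )
  # The "status" attribute is only set when the exit code is non-zero.
  status <- attr(out, "status")
  print(out)
  if (!is.null(status)) {
    message("java -version exited with status ", status)
  }
} else {
  message("No java executable found on the PATH")
}
```

If the command fails here as well, the problem sits with the Java installation or the PATH rather than with sparklyr itself.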
The following R code works fine from my Windows 8 laptop:
> inst<- "https://www.sec.gov/Archives/edgar/data/51143/000104746916010329/ibm-20151231.xml"
> options(stringsAsFactors = FALSE)
> xbrl.vars <- xbrlDoAll(inst, cache.dir = "XBRLcache", prefix.out = NULL, verbose=TRUE)
However, when I attempt to run it from my Ubuntu 16.04 machine, I receive the following output:
Error in fileFromCache(file) :
Error in download.file(file, cached.file, method = "auto", quiet = !verbose) :
cannot download all files
In addition: Warning message:
In download.file(file, cached.file, method = "auto", quiet = !verbose) :
URL 'https://www.sec.gov/Archives/edgar/data/51143/000104746916010329/ibm-20151231.xsd': status was '404 Not Found'
It's finding the initial xml file but then cannot find the referenced schemas. Any help would be appreciated. Thanks in advance.
I would like to create a loop to load these files through read.eset of Bioconductor.
I tried this:
for(k in 1:29){
expr <- paste0("/home/proj/MT_Nellore/R/eBrowser/Adjusted/LRRadjustedextremes0.5kgchr",k,".txt")
pdat <- paste0("/home/proj/MT_Nellore/R/eBrowser/Adjusted/Samplesbinary0.5.txt")
ffdat <- paste0("/home/proj/MT_Nellore/R/LRR/Chr_adjusted/probeslabeladjustedchr",k,".txt")
eset <- read.eset(exprs.file="expr", pdat.file="/home/proj/MT_Nellore/R/eBrowser/Adjusted/Samplesbinary0.5.txt", fdat.file="ffdat")
}
However I get this error:
## Error in file(file, "r") : cannot open the connection
## In addition: Warning message:
## In file(file, "r") : cannot open file 'ffdat': No such file or directory
Any suggestions?
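Ah - just spotted the error - you must remove the quotes around "ffdat" on the final line, and likewise around "expr". With the quotes, R looks for files literally named ffdat and expr instead of using the paths stored in those variables.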
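The difference between the quoted and unquoted forms can be seen with base R alone. The sketch below uses a small stand-in file in a temporary directory (hypothetical data, not the files from the question) and read.table in place of read.eset:

```r
# Create a stand-in file in a temporary directory (hypothetical data).
tmp_dir <- tempfile("demo")
dir.create(tmp_dir)
ffdat <- file.path(tmp_dir, "probes_chr1.txt")
writeLines(c("probe\tchr", "p1\t1"), ffdat)

# Quoted: R looks for a file literally named "ffdat" in the working directory.
file.exists("ffdat")

# Unquoted: R uses the path stored in the variable ffdat.
file.exists(ffdat)
fdat <- read.table(ffdat, header = TRUE, sep = "\t")

# The same applies inside the loop from the question, for example:
#   esets[[k]] <- read.eset(exprs.file = expr, pdat.file = pdat, fdat.file = ffdat)
# where esets <- vector("list", 29) is created before the loop.
```

Storing each result in a list element, as in the commented line, also avoids overwriting eset on every iteration.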
(Windows 7 / R version 3.0.1)
Below the commands and the resulting error:
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpS8Uql1\pdfinfo167c2bc159f8': No such file or directory
How do I solve this issue?
EDIT I
(As suggested by Ben and described here)
I downloaded Xpdf and copied the 32-bit version to
C:\Program Files (x86)\xpdf32
and the 64-bit version to
C:\Program Files\xpdf64
The environment variables pdfinfo and pdftotext refer to the respective executables, either the 32-bit ones (tested with 32-bit R) or the 64-bit ones (tested with 64-bit R).
EDIT II
One very confusing observation is that starting from a fresh session (tm not loaded) the last command alone will produce the error:
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpKi5GnL\pdfinfode8283c422f': No such file or directory
I don't understand this at all, because at that point the variable pdf has not yet been assigned the result of tm's readPDF. Below is the function that pdf refers to "naturally", followed by what readPDF returns:
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0674bd8c>
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0c3d7364>
Apparently there is no difference - then why use readPDF at all?
EDIT III
The pdf file is located here: C:\Users\Raffael\Documents
> getwd()
[1] "C:/Users/Raffael/Documents"
EDIT IV
First instruction in pdf() is a call to tm:::pdfinfo() - and there the error is caused within the first few lines:
> outfile <- tempfile("pdfinfo")
> on.exit(unlink(outfile))
> status <- system2("pdfinfo", shQuote(normalizePath("C:/Users/Raffael/Documents/17214.pdf")),
+ stdout = outfile)
> tags <- c("Title", "Subject", "Keywords", "Author", "Creator",
+ "Producer", "CreationDate", "ModDate", "Tagged", "Form",
+ "Pages", "Encrypted", "Page size", "File size", "Optimized",
+ "PDF version")
> re <- sprintf("^(%s)", paste(sprintf("%-16s", sprintf("%s:",
+ tags)), collapse = "|"))
> lines <- readLines(outfile, warn = FALSE)
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6\pdfinfo8d419174450': No such file or direc
Apparently tempfile() simply doesn't create a file.
> outfile <- tempfile("pdfinfo")
> outfile
[1] "C:\\Users\\Raffael\\AppData\\Local\\Temp\\RtmpquRYX6\\pdfinfo8d437bd65d9"
The folder C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6 exists and holds some files but none is named pdfinfo8d437bd65d9.
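For reference, tempfile() only generates an unused file name; the file itself appears once something writes to that path. In tm:::pdfinfo that writer is the system2 call with stdout = outfile, so a missing file may mean the pdfinfo command never ran. A minimal base-R sketch of this behaviour (the written line is placeholder content):

```r
# tempfile() returns a path that is not yet in use; it does not create the file.
outfile <- tempfile("pdfinfo")
existed_before <- file.exists(outfile)   # FALSE

# The file only appears once something writes to the path.
writeLines("Title: demo", outfile)
existed_after <- file.exists(outfile)    # TRUE
content <- readLines(outfile, warn = FALSE)
unlink(outfile)

# Check whether R can find a pdfinfo executable on the PATH at all.
pdfinfo_found <- nzchar(Sys.which("pdfinfo"))
print(pdfinfo_found)
```

If the last line prints FALSE, R cannot locate pdfinfo, which would be consistent with the output file never being written.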
Interesting: on my machine, after a fresh start, pdf is the grDevices function that opens a PDF graphics device:
getAnywhere(pdf)
A single object matching ‘pdf’ was found
It was found in the following places
package:grDevices
namespace:grDevices [etc.]
But back to the problem of reading in PDF files as text: fiddling with the PATH is a bit hit-and-miss (and annoying if you work across several different computers), so I think the simplest and safest method is to call pdftotext directly using system, as Tony Breyal describes here.
In your case it would be (note the two sets of quotes):
system(paste('"C:/Program Files/xpdf64/pdftotext.exe"',
'"C:/Users/Raffael/Documents/17214.pdf"'), wait=FALSE)
This could easily be extended with an *apply function or loop if you have many PDF files.
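A sketch of such a loop, assuming the executable and document folder from above (adjust both paths for your machine). pdftotext_cmd is a hypothetical helper that only builds the quoted command line; pdftotext writes a .txt file next to each PDF when no output name is given:

```r
# Path to the pdftotext executable, taken from the call above; adjust as needed.
pdftotext_exe <- "C:/Program Files/xpdf64/pdftotext.exe"

# Hypothetical helper: build the command line with both paths double-quoted.
pdftotext_cmd <- function(pdf_file, exe = pdftotext_exe) {
  paste(shQuote(exe, type = "cmd"), shQuote(pdf_file, type = "cmd"))
}

# Folder holding the PDF files (assumed from the question).
pdf_dir <- "C:/Users/Raffael/Documents"
pdf_files <- list.files(pdf_dir, pattern = "\\.pdf$",
                        full.names = TRUE, ignore.case = TRUE)

if (file.exists(pdftotext_exe)) {
  # wait = TRUE so each text file is complete before it is read back.
  for (f in pdf_files) system(pdftotext_cmd(f), wait = TRUE)
  txt_files <- sub("\\.pdf$", ".txt", pdf_files, ignore.case = TRUE)
  texts <- lapply(txt_files[file.exists(txt_files)], readLines, warn = FALSE)
}
```

Here wait = TRUE is used instead of wait = FALSE because the text files are read immediately afterwards.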
I've tried read.spss(), but I get an encoding error:
library(foreign)
read.spss('persona.sav')
#>re-encoding from CP1252
Error in iconv(names(rval), cp, "") :
unsupported conversion from 'CP1252' to ''
In addition: Warning message:
In read.spss("persona.sav") :
persona.sav: Unrecognized record type 7, subtype 18 encountered in system file
Try passing the encoding explicitly through the reencode argument, so that the file is read as UTF-8:
library(foreign)
read.spss('persona.sav', reencode='utf-8')
You can also try adding to.data.frame = TRUE to the read.spss() call.
For instance:
df <- read.spss("data.sav", to.data.frame = TRUE)
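The error message says iconv cannot convert from CP1252, so it may help to check whether this R build knows that encoding and, if not, to skip re-encoding. A minimal sketch, assuming the file name from the question; reencode = FALSE returns strings as stored in the file, which may or may not be acceptable for your data:

```r
library(foreign)

# Does this R build's iconv list the encoding named in the error message?
cp1252_supported <- any(toupper(iconvlist()) == "CP1252")

sav_file <- "persona.sav"  # file name from the question
if (file.exists(sav_file)) {
  # reencode = NA is the default behaviour; FALSE skips the conversion step.
  dat <- read.spss(sav_file, to.data.frame = TRUE,
                   reencode = if (cp1252_supported) NA else FALSE)
  str(dat)
}
```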