RStudio fails to read GBP pound sign - r

After a bit of a gap I updated RStudio and all packages this morning.
I have a little function that I use to prettify currencies
currency <- function(n, k=FALSE) {
n <- ifelse(!k, str_c("£", comma(round(n,0))), str_c("£", comma(round(n/1000,0)),"k"))
return(n)
}
It now fails to parse - the problem is the £ sign.
Error in parse(text = lines, n = -1, srcfile = srcfile) :
[path]/plot_helpers.R:72:
25: unexpected INCOMPLETE_STRING
71: currency <- function(n, k=FALSE) {
72: n <- ifelse(!k, str_c("
^
In addition: Warning message:
In readLines(con, warn = FALSE, n = n, ok = ok, skipNul = skipNul) :
invalid input found on input connection '/home/richardc/ownCloud/prodr/R/plot_helpers.R'
However I can run the code within the editor and it works fine. What is causing readLines to fail in this way ?

After some messing about today, I realise that the problem is in devtools. To recap here is a test project testencr.prj:
library(stringr)
library(devtools)
main <- function(n) {
n <- str_c("£", n)
return(n)
}
I can run the code fine from the console, but when I use devtools it barfs on the UTF-8 character:
> devtools::load_all()
Loading testencr
Error in parse(text = lines, n = -1, srcfile = srcfile) :
/home/richardc/ownCloud/test/R/test_enc.R:6:14: unexpected INCOMPLETE_STRING
5: main <- function(n) {
6: n <- str_c("
^
In addition: There were 27 warnings (use warnings() to see them)
But when I add a specific encoding into the DESCRIPTION
Encoding: UTF-8
It is all fine (this notwithstanding the project defaults are UTF8)
Loading testencr
There were 36 warnings (use warnings() to see them)```

I was having the same problem, specifically in a Shiny app (not the rest of the time). I managed to solve it by using this unicode instead of £:
enc2utf8("\u00A3")

Related

how to get document() to accept µ

I have been using this code
nutList <- gsub("_µg","",nutList, fixed = TRUE)
to remove the string "_µg" from the variable nutList for at least a year and have had no problems. Now I'm trying use document() to see where I have problems in getting ready for a package.
Error in parse(text = lines, n = -1, srcfile = srcfile) : [filename] : unexpected INCOMPLETE_STRING
nutList <- gsub("_
It appears that document() doesn't like the µ that is there. How can I modify the code so it is accepted?

Ignoring the symbol ° in R devtools function document()

I would like to create a package for internal usage (not to distribute somewhere). One of my functions contains the line
if (data$unit[i] != "°C") {
It works perfectly in the script, but if I want to create the documentation for my package using document() from devtools, i get the error
Error in parse(text = lines, keep.source = TRUE, srcfile = srcfilecopy(file, path_to_my_code: unexpected INCOMPLETE_STRING
279: if (! is.na(data$unit[i]){
280: if (data$unit[i] != "
addition: Warning message:
In readLines(con, warn = FALSE, n = n, ok = ok, skipNul = skipNul) :
invalid input found on input connection 'path_to_my_code'
If I delete the °-character, document() works. But I need this character there, so this is not an option.
When using double-\ in the if-clause, my function doesn't detect °C anymore as shown here:
test <- c("mg/l", "°C")
"\\°C" %in% test
[1] FALSE
If I use tryCatch, the documentation is also not created.
Replacing "°C" by gsub(pattern = '\\\\', replacement = "", x = '\\°C') causes the function to crash at the double-\ .
How can I tell document() that everything is fine and it should just create the files?

Error when trying to write files using 'if' function with l_ply

I'm relatively new to using R in this way and I'm completely stuck with the following problem.
I'm attempting to save html pages for parliamentary debates to a local folder in order to carry out some scraping in the future. I've written the following function (relying on other snippets of code rather than entirely freestyle!) in order to construct the directory, strip the URL down to a more understandable format (e.g. "2010-12_academieshl.html"), and then, if the file does not exist, write the file to the specified folder. (At this point I'm aware that the use of gsub below is kind of clumsy!)
dlPages <- function(pageurl, folder ,handle) {
dir.create(folder, showWarnings = FALSE)
gsub_URL <- gsub("/stages.html", "", link_list)
gsub_URL <- gsub("http://services.parliament.uk/bills/", "", gsub_URL)
gsub_URL <- gsub("/", "_", gsub_URL)
page_name <- str_c(gsub_URL, ".html")
if (!file.exists(str_c(folder, "/", page_name))) {
content <- try(getURL(pageurl, curl = handle))
write(content, str_c(folder, "/", page_name))
Sys.sleep(1)
} }
I'm then using l_ply to run a list (link_list) of links over the function:
handle <- getCurlHandle()
l_ply(link_list, dlPages,
folder = "lords_bills_all",
handle = handle)
The following error message is being returned:
Error in file(file, ifelse(append, "a", "w")) : invalid 'description' argument
Along with the following warning messages:
In addition: Warning messages:
1: In if (!file.exists(str_c(folder, "/", page_name))) { :
the condition has length > 1 and only the first element will be used
2: In if (file == "") file <- stdout() else if (substring(file, 1L, :
the condition has length > 1 and only the first element will be used
3: In if (substring(file, 1L, 1L) == "|") { :
the condition has length > 1 and only the first element will be used
Can someone help me understand where I'm going wrong? Also, is it best to use an 'if' argument in this scenario or write a 'for' loop instead?
Thanks in advance.
Andy

R/SublimeREPL R - code not working in sublime but working in RStudio

I am following the tutorials of Machine Learning for Hackers (https://github.com/johnmyleswhite/ML_for_Hackers) and I am using Sublime Text as a text editor. To run my code, I use SublimeREPL R.
I am using this code, taken directly from the book:
setwd("/path/to/folder")
# Load the text mining package
library(tm)
library(ggplot2)
# Loading all necessary paths
spam.path <- "data/spam/"
spam2.path <- "data/spam_2/"
easyham.path <- "data/easy_ham/"
easyham.path2 <- "data/easy_ham_2/"
hardham.path <- "data/hard_ham/"
hardham2.path <- "data/hard_ham_2/"
# Get the content of each email
get.msg <- function(path) {
con <- file(path, open = "rt", encoding = "latin1")
text <- readLines(con)
msg <- text[seq(which(text == "")[1] + 1, length(text),1)]
close(con)
return(paste(msg, collapse = "\n"))
}
# Create a vector where each element is an email
spam.docs <- dir(spam.path)
spam.docs <- spam.docs[which(spam.docs != "cmds")]
all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path, p, sep = "")))
# Log the spam
head(all.spam)
This piece of code works fine in RStudio (with the data provided here: https://github.com/johnmyleswhite/ML_for_Hackers/tree/master/03-Classification) but when I run it in Sublime, Iget the following error message:
> all.spam <- sapply(spam.docs,
+ function(p) get.msg(file.path(spam.path, p)))
Error in seq.default(which(text == "")[1] + 1, length(text), 1) :
'from' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In readLines(con) :
invalid input found on input connection 'data/spam/00006.5ab5620d3d7c6c0db76234556a16f6c1'
2: In readLines(con) :
invalid input found on input connection 'data/spam/00009.027bf6e0b0c4ab34db3ce0ea4bf2edab'
3: In readLines(con) :
invalid input found on input connection 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
4: In readLines(con) :
incomplete final line found on 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
5: In readLines(con) :
invalid input found on input connection 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
6: In readLines(con) :
incomplete final line found on 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
>
I get the same results when I take the code from John Myles White's repo.
How can I fix this?
Thanks
I think the problem got is in using encoding=latin1, you can just remove this one, I test it in my environment, it ran well.
spam.docs <- paste(spam.path,spam.docs,sep="")
all.spam <- sapply(spam.docs,get.msg)
Warning message:
In readLines(con) :
incomplete final line found on 'XXXXXXXXXXXXXXXXX/ML_for_Hackers-master/03-Classification/data/spam/00136.faa39d8e816c70f23b4bb8758d8a74f0'
still some warnnings in it, but it can produce the results well.
Thanks.

Error trying to read a PDF using readPDF from the tm package

(Windows 7 / R version 3.0.1)
Below the commands and the resulting error:
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp
\RtmpS8Uql1\pdfinfo167c2bc159f8': No such file or directory
How do I solve this issue?
EDIT I
(As suggested by Ben and described here)
I downloaded Xpdf copied the 32bit version to
C:\Program Files (x86)\xpdf32
and the 64bit version to
C:\Program Files\xpdf64
The environment variables pdfinfo and pdftotext are referring to the respective executables either 32bit (tested with R 32bit) or to 64bit (tested with R 64bit)
EDIT II
One very confusing observation is that starting from a fresh session (tm not loaded) the last command alone will produce the error:
> dat <- pdf(elem = list(uri = "17214.pdf"), language="de", id="id1")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpKi5GnL
\pdfinfode8283c422f': No such file or directory
I don't understand this at all because the function variable is not defined by tm.readPDF yet. Below you'll find the function pdf refers to "naturally" and to what is returned by tm.readPDF:
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0674bd8c>
> library(tm)
> pdf <- readPDF(PdftotextOptions = "-layout")
> pdf
function (elem, language, id)
{
meta <- tm:::pdfinfo(elem$uri)
content <- system2("pdftotext", c(PdftotextOptions, shQuote(elem$uri),
"-"), stdout = TRUE)
PlainTextDocument(content, meta$Author, meta$CreationDate,
meta$Subject, meta$Title, id, meta$Creator, language)
}
<environment: 0x0c3d7364>
Apparently there is no difference - then why use readPDF at all?
EDIT III
The pdf file is located here: C:\Users\Raffael\Documents
> getwd()
[1] "C:/Users/Raffael/Documents"
EDIT IV
First instruction in pdf() is a call to tm:::pdfinfo() - and there the error is caused within the first few lines:
> outfile <- tempfile("pdfinfo")
> on.exit(unlink(outfile))
> status <- system2("pdfinfo", shQuote(normalizePath("C:/Users/Raffael/Documents/17214.pdf")),
+ stdout = outfile)
> tags <- c("Title", "Subject", "Keywords", "Author", "Creator",
+ "Producer", "CreationDate", "ModDate", "Tagged", "Form",
+ "Pages", "Encrypted", "Page size", "File size", "Optimized",
+ "PDF version")
> re <- sprintf("^(%s)", paste(sprintf("%-16s", sprintf("%s:",
+ tags)), collapse = "|"))
> lines <- readLines(outfile, warn = FALSE)
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6\pdfinfo8d419174450': No such file or direc
Apparently tempfile() simply doesn't create a file.
> outfile <- tempfile("pdfinfo")
> outfile
[1] "C:\\Users\\Raffael\\AppData\\Local\\Temp\\RtmpquRYX6\\pdfinfo8d437bd65d9"
The folder C:\Users\Raffael\AppData\Local\Temp\RtmpquRYX6 exists and holds some files but none is named pdfinfo8d437bd65d9.
Intersting, on my machine after a fresh start pdf is a function to convert an image to a PDF:
getAnywhere(pdf)
A single object matching ‘pdf’ was found
It was found in the following places
package:grDevices
namespace:grDevices [etc.]
But back to the problem of reading in PDF files as text, fiddling with the PATH is a bit hit-and-miss (and annoying if you work across several different computers), so I think the simplest and safest method is to call pdf2text using system as Tony Breyal describes here.
In your case it would be (note the two sets of quotes):
system(paste('"C:/Program Files/xpdf64/pdftotext.exe"',
'"C:/Users/Raffael/Documents/17214.pdf"'), wait=FALSE)
This could easily be extended with an *apply function or loop if you have many PDF files.

Resources