Error only when running whole block of code - r

I have code that came with a dataset that I downloaded. This code is supposed to convert factor variables to numeric. When I run each line individually, it works fine, but if I try to highlight a whole section, then I get the following error:
Error: unexpected input in ...
It gives me this error for every line of code, but again if I run each line individually, then it works fine. I've never run into this before. What's going on?? Thanks!
Here's the code that I'm trying to run:
library(prettyR)
lbls <- sort(levels(DF$myVar))
lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
DF$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", DF$myVar))
DF$myVar <- add.value.labels(DF$myVar, lbls)
And here is the output with the errors:
> library(prettyR)
"rror: unexpected input in "library(prettyR)
> lbls <- sort(levels(DF$myVar))
"rror: unexpected input in "lbls <- sort(levels(DF$myVar))
> lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
"rror: unexpected input in "lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
> surv.df$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", DF$myVar))
"rror: unexpected input in "DF$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1",DF$myVar))
> surv.df$BATTLEGROUND <- add.value.labels(DF$myVar, lbls)
Error in add.value.labels(surv.df$myVar, lbls) :
object 'lbls' not found

I figured out the issue (actually, someone told me what the problem was).
The code was downloaded as a .R file and must have been written in a text editor that uses a non-standard newline encoding. I copied the code into a text editor and used replace-all to switch every "\n" to "#####". Then I used replace-all again to switch "#####" back to newlines, and copied the result back into RStudio.
And everything works!
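The same cleanup can be done from R itself. The following is a minimal sketch, assuming the stray characters are carriage returns (\r) left over from Windows-style or old Mac-style line endings; the helper name normalize_newlines is made up for illustration and is not part of any package.

```r
# Hypothetical helper: convert CRLF ("\r\n") and lone CR ("\r") line endings to LF ("\n").
normalize_newlines <- function(text) {
  gsub("\r\n?", "\n", text)
}

# Example: a two-line script stored with Windows-style line endings.
script <- "x <- 1\r\ny <- x + 1\r\n"
cleaned <- normalize_newlines(script)

# The cleaned text can be parsed and evaluated as ordinary R code.
eval(parse(text = cleaned))
```

To apply this to a file, the text could be read in with readChar() and written back out with cat() after cleaning.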


How to remove parentheses but keep the text inside them in R

I am trying to clean a dataset with the column: ltaCpInfoDF$weekdays_rate_1
For some of the rows, I would like to do this:
input: Daily(7am-11pm): $1.20 ; output: 7am-11pm: $1.20
The values within the bracket can be different timings for the rows.
Initially, I was thinking of removing it in parts, for example removing "Daily(" with gsub first and then removing ")". However, I am running into issues with that.
ltaCpInfoDF$weekdays_rate_1 <- gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1)
Here is the error shown:
Error in gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1) :
invalid regular expression 'Daily(', reason 'Missing ')''
In addition: Warning message:
In gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1) :
TRE pattern compilation error 'Missing ')''
Could someone share with me a better way? Thank you in advance!
Use gsub with a capture group:
input <- "Daily(7am-11pm): $1.20"
output <- gsub("\\S+\\s*\\((.*?)\\)", "\\1", input)
output
[1] "7am-11pm: $1.20"
We can also do this without a capture group:
gsub("^[^(]+\\(|\\)", "", str1)
[1] "7am-11pm: $1.20"
data
str1 <- "Daily(7am-11pm): $1.20"
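As for the original error: gsub() treats its pattern as a regular expression by default, so an unescaped "(" opens a group that is never closed. A minimal sketch of two ways around that, using the sample string from the question:

```r
x <- "Daily(7am-11pm): $1.20"

# Option 1: fixed = TRUE matches the pattern as a literal string, not a regex.
step1 <- gsub("Daily(", "", x, fixed = TRUE)
step2 <- gsub(")", "", step1, fixed = TRUE)

# Option 2: escape the parenthesis so the regex engine treats it literally.
escaped <- gsub("Daily\\(", "", x)
```

Note that the second call in option 1 removes every ")" in the string, which is fine for this sample value but may be too broad for other rows.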

How to get data in from OS X clipboard in R

I often see R posts where there is a paste of the output of someone's data, not using a dput(). Sometimes I see people use
data_in <- read.table("clipboard")
which on my OS X machine results in
data_in <- read.table("clipboard")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : clipboard cannot be opened or contains no text
I found some previous answers here and here but for one the copy doesn't work on the way in, and for the other the readLines is leading to runaway R sessions for me, both documented below. I have worked through how to get text to and from the clipboard in OS X, which might be useful to some others, but I am hopeful that there are better methods or there is more finesse possible:
# some test data
data <- rbind(c(1,1,2,3), c(1,1, 3, 4), c(1,4,6,7))
str <- "Here is a special string \n\r with \t many üñåé tokens"
# a test input set of numbers to copy to your clipboard if you have nothing to hand
# [10:17:55, 10:37:40, 10:40:26, 10:48:18, 11:00:17, 11:01:12, 11:06:58, 11:09:20, 11:43:41, 11:48:24, 11:49:14, 12:07:31, 12:10:52, 12:10:52, 12:19:00, 12:19:00, 12:19:43, 12:20:55, 12:38:27, 12:38:27, 12:55:09, 12:55:10, 12:57:31, 12:57:31, 13:04:16, 13:04:16, 13:06:51 13:06:51, 14:55:06, 14:56:10, 15:01:30, 15:28:42, 3:29:17, 15:35:33, 15:58:32, 16:05:07, 16:09:16, 16:10:36, 16:32:57, 16:32:57, 16:34:32, 16:38:16, 17:43:27, 17:53:01, 17:56:14, 18:08:21, 18:17:23, 18:37:23, 18:37:23, 18:43:13, 18:43:13 18:51:43, 18:51:43, 19:05:39, 19:05:39]
# Input works reasonably well for tables and text
cb_handle <- pipe("pbcopy", "w")
write.table(data, file=cb_handle)
close(cb_handle)
cb_handle <- pipe("pbcopy", "w")
write(str, file = cb_handle)
close(cb_handle)
# DO NOT USE THIS ONE as it leads to a runaway R process
cb_handle <- pipe("pbcopy", "r")
read.table(cb_handle)
# This reads in the contents but leaves cleanup to do if not really a table
cb_handle <- pipe("pbpaste")
data_in <- read.table(cb_handle)
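When the clipboard does not hold a table, reading it as plain lines avoids that cleanup. A minimal sketch, assuming the macOS pbpaste command is on the PATH; read_clipboard_lines is a hypothetical helper name, and its cmd argument exists only so the function can be tried with another command.

```r
# Hypothetical helper: run a shell command and return its output as lines.
# The default command, pbpaste, prints the macOS clipboard contents.
read_clipboard_lines <- function(cmd = "pbpaste") {
  con <- pipe(cmd, open = "r")
  on.exit(close(con))           # always release the pipe, even on error
  readLines(con, warn = FALSE)  # warn = FALSE: no warning if the last line lacks a newline
}

# Stand-in for the clipboard so the sketch can be tried on any Unix-like system.
clip <- read_clipboard_lines("printf 'first line\\nsecond line'")

# On macOS, the real clipboard would be read with: read_clipboard_lines()
```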

error: unexpected input

Consider the following code:
Xij <- scan(n=45)
6398400 6273897 6038777 5810740 5673521 5688332 5669445 5682840 5679432
5723561 5555929 5345696 5321179 5199592 5165409 5130744 5132372
4717909 4925673 4999103 4960733 4840036 4824080 4821902
7115151 7114401 7039423 6967723 6967513 6901684
8203359 8286980 8222974 8323470 8067521
5930080 5862383 5994123 6017566
5558436 5754304 5613530
4595506 5074887
3443322
n <- length(Xij); TT <- trunc(sqrt(2*n))
i <- rep(1:TT,TT:1); j <- sequence(TT:1)
i <- as.factor(i); j <- as.factor(j)
If I now try to run the following command:
Xij.1 <- xtabs(Xij˜i+j)
I get the error 'Error: unexpected input in "Xij.1 <- xtabs(Xij˜"'.
This exercise is, however, analogous to an example from the book 'Modern Actuarial Risk Theory using R'.
Does anybody know what might be wrong?
It works fine:
xtabs(Xij~i+j)
Note that in an R formula you have to use the tilde character ~ rather than the ˜ character. These are two different characters.
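Look-alike characters such as this one often sneak in when code is copied from a PDF or an e-book. A minimal sketch for spotting them, assuming the offending character is U+02DC (SMALL TILDE); has_non_ascii is a hypothetical helper name. The tools package that ships with R also provides showNonASCII() for the same purpose.

```r
# Hypothetical helper: flag each line that contains any non-ASCII character.
has_non_ascii <- function(lines) {
  vapply(
    lines,
    function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

code_lines <- c(
  "Xij.1 <- xtabs(Xij\u02dci+j)",  # U+02DC SMALL TILDE (assumed to match the failing command)
  "Xij.1 <- xtabs(Xij~i+j)"        # plain ASCII tilde
)
flags <- has_non_ascii(code_lines)
```

Here flags marks only the first line, which is the one that R refuses to parse.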

R/SublimeREPL R - code not working in sublime but working in RStudio

I am following the tutorials of Machine Learning for Hackers (https://github.com/johnmyleswhite/ML_for_Hackers) and I am using Sublime Text as a text editor. To run my code, I use SublimeREPL R.
I am using this code, taken directly from the book:
setwd("/path/to/folder")
# Load the text mining package
library(tm)
library(ggplot2)
# Loading all necessary paths
spam.path <- "data/spam/"
spam2.path <- "data/spam_2/"
easyham.path <- "data/easy_ham/"
easyham.path2 <- "data/easy_ham_2/"
hardham.path <- "data/hard_ham/"
hardham2.path <- "data/hard_ham_2/"
# Get the content of each email
get.msg <- function(path) {
  con <- file(path, open = "rt", encoding = "latin1")
  text <- readLines(con)
  msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]
  close(con)
  return(paste(msg, collapse = "\n"))
}
# Create a vector where each element is an email
spam.docs <- dir(spam.path)
spam.docs <- spam.docs[which(spam.docs != "cmds")]
all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path, p, sep = "")))
# Log the spam
head(all.spam)
This piece of code works fine in RStudio (with the data provided here: https://github.com/johnmyleswhite/ML_for_Hackers/tree/master/03-Classification), but when I run it in Sublime, I get the following error message:
> all.spam <- sapply(spam.docs,
+ function(p) get.msg(file.path(spam.path, p)))
Error in seq.default(which(text == "")[1] + 1, length(text), 1) :
'from' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In readLines(con) :
invalid input found on input connection 'data/spam/00006.5ab5620d3d7c6c0db76234556a16f6c1'
2: In readLines(con) :
invalid input found on input connection 'data/spam/00009.027bf6e0b0c4ab34db3ce0ea4bf2edab'
3: In readLines(con) :
invalid input found on input connection 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
4: In readLines(con) :
incomplete final line found on 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
5: In readLines(con) :
invalid input found on input connection 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
6: In readLines(con) :
incomplete final line found on 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
>
I get the same results when I take the code from John Myles White's repo.
How can I fix this?
Thanks
I think the problem is the encoding = "latin1" argument. You can simply remove it; I tested this in my environment and it ran well.
spam.docs <- paste(spam.path,spam.docs,sep="")
all.spam <- sapply(spam.docs,get.msg)
Warning message:
In readLines(con) :
incomplete final line found on 'XXXXXXXXXXXXXXXXX/ML_for_Hackers-master/03-Classification/data/spam/00136.faa39d8e816c70f23b4bb8758d8a74f0'
There are still some warnings, but it produces the results.
Thanks.
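The "'from' cannot be NA" error itself arises when which(text == "")[1] finds no blank line, which may happen if reading a file is cut short. A minimal sketch of a more defensive reader, assuming that returning NA for such files is acceptable; get.msg.safe is a hypothetical name and not code from the book.

```r
# Hypothetical variant of get.msg: returns NA instead of failing when a file
# has no blank line separating the header from the body.
get.msg.safe <- function(path) {
  con <- file(path, open = "rt")        # no encoding argument, as suggested above
  on.exit(close(con))                   # close the connection even on error
  text <- readLines(con, warn = FALSE)  # suppress "incomplete final line" warnings
  first.blank <- which(text == "")[1]
  if (is.na(first.blank) || first.blank >= length(text)) {
    return(NA_character_)               # no header/body separator, or empty body
  }
  paste(text[(first.blank + 1):length(text)], collapse = "\n")
}

# Small self-contained check with temporary files instead of the spam corpus.
ok.file <- tempfile()
writeLines(c("Subject: hi", "", "line one", "line two"), ok.file)
bad.file <- tempfile()
writeLines(c("Subject: hi", "no blank line here"), bad.file)

ok.msg <- get.msg.safe(ok.file)
bad.msg <- get.msg.safe(bad.file)
```

Files that come back as NA can then be dropped or inspected separately before building the corpus.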

R: Gwidget simpleError in envRefInferField , for gedit

For every gedit I create, I get this error on the console whenever I run my entire R widget code:
<simpleError in envRefInferField(x, what, getClass(class(x)), selfEnv): ‘no_items’ is not a valid field or method name for reference class “Entry”>
Here are some examples of my gedit calls:
textBox_keyword1 <- gedit(text="",container=grp_keywordSearch)
textBox_keyword2 <- gedit(text="",container=grp_keywordSearch)
textBox_keyword3 <- gedit(text="",container=grp_keywordSearch)
textBox_keyword4 <- gedit(text="",container=grp_keywordSearch)
textBox_numTweet <- gedit(text="",container=grp_numTweet)
Any idea how to solve this?
