why does ddply malfunction in this case? - r

In Chapter 4 of the book Machine Learning for Hackers, there is a line of code that uses ddply:
from.weight<-ddply(priority.train,.(From.EMail),summarize,Freq=length(Subject))
and it doesn't work. I have seen elsewhere that the same result can be obtained by changing the code to:
from.weight <- melt(with(priority.train, table(From.EMail)), value.name="Freq")
from.weight <- from.weight[with(from.weight, order(Freq)), ]
Is there any way to keep using ddply and still make it work?
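One possible cause, stated here as an assumption because the question does not show the error message, is that another attached package masks plyr's summarize (Hmisc and dplyr both export a function of that name). A minimal sketch that sidesteps masking by naming the plyr namespace explicitly; the toy data frame is hypothetical and only stands in for priority.train:

```r
library(plyr)

# Toy stand-in for priority.train (hypothetical data, not from the book)
priority.train <- data.frame(
  From.EMail = c("a@example.org", "b@example.org", "a@example.org"),
  Subject    = c("s1", "s2", "s3"),
  stringsAsFactors = FALSE
)

# Name the namespace explicitly so a masking summarize() cannot be picked up
from.weight <- ddply(priority.train, .(From.EMail), plyr::summarise,
                     Freq = length(Subject))

# Sort senders by how many messages they sent, as in the melt() version
from.weight <- from.weight[order(from.weight$Freq), ]
```

If the call still fails with the namespace spelled out, masking is probably not the problem and the actual error message would be needed to go further.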

Related

How do I close unused connections after read_html in R

I am quite new to R and am trying to access some information on the internet, but am having problems with connections that don't seem to be closing. I would really appreciate it if someone here could give me some advice...
Originally I wanted to use the WebChem package, which theoretically delivers everything I want, but when some of the output data is missing from the webpage, WebChem doesn't return any data from that page. To get around this, I have taken most of the code from the package but altered it slightly to fit my needs. This worked fine, for about the first 150 usages, but now, although I have changed nothing, when I use the command read_html, I get the warning message " closing unused connection 4 (http:....." Although this is only a warning message, read_html doesn't return anything after this warning is generated.
I have written a simplified version of the code, given below, which has the same problem.
Closing R completely (or even rebooting my PC) doesn't seem to make a difference - the warning message now appears the second time I use the code. I can run the queries one at a time, outside of the loop, with no problems, but as soon as I try to use the loop, the error occurs again on the 2nd iteration.
I have tried to vectorise the code, and again it returned the same error message.
I tried showConnections(all=TRUE), but only got connections 0-2 for stdin, stdout, stderr.
I have tried searching for ways to close the html connection, but I can't define the url as a con, and close(qurl) and close(ttt) also don't work. (They return the errors no applicable method for 'close' applied to an object of class "character" and no applicable method for 'close' applied to an object of class "c('xml_document', 'xml_node')", respectively.)
Does anybody know a way to close these connections so that they don't break my routine? Any suggestions would be very welcome. Thanks!
PS: I am using R version 3.3.0 with RStudio Version 0.99.902.
library(xml2)  # provides read_html(), xml_find_all() and xml_text()
CasNrs <- c("630-08-0","463-49-0","194-59-2","86-74-8","148-79-8")
tit <- character()
baseurl <- 'http://chem.sis.nlm.nih.gov/chemidplus/rn/'
for (i in seq_along(CasNrs)) {
  CurrCasNr <- as.character(CasNrs[i])
  qurl <- paste0(baseurl, CurrCasNr, '?DT_START_ROW=0&DT_ROWS_PER_PAGE=50')
  # try() keeps the loop alive if a request fails
  ttt <- try(read_html(qurl), silent = TRUE)
  tit[i] <- xml_text(xml_find_all(ttt, "//head/title"))
}
After researching the topic I came up with the following solution:
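page_url <- "https://website_example.com"
con <- url(page_url, "rb")  # open an explicit connection instead of passing the URL string
html <- read_html(con)
close(con)                  # close it ourselves so nothing is left for the garbage collector
# The parsed page is already stored in html, so it can be used after the connection is closed.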
I haven't found a good answer for this problem. The best work-around that I came up with is to include the function below, with Secs = 3 or 4. I still don't know why the problem occurs or how to stop it without building in a large delay.
CatchupPause <- function(Secs) {
  Sys.sleep(Secs)  # pause to let the connection finish
  closeAllConnections()
  gc()
}
I found this post as I was running into the same problems when I tried to scrape multiple datasets in the same script. The script would get progressively slower and I feel it was due to the connections. Here is a simple loop that closes out all of the connections.
for (i in seq_along(df$URLs)) {
  # scrape df$URLs[i] here, then release any connections left open
  closeAllConnections()  # takes no arguments
}
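The open, read, close pattern from the first answer can be wrapped in a small helper so the connection is closed even when reading fails. This is a minimal base R sketch, not the method used by xml2 or WebChem: read_and_close is a hypothetical helper, and the demo reads a temporary local file with readLines so that it runs offline. For the original use case the connection would come from url(qurl) and the reader would be read_html.

```r
# Hypothetical helper: open a connection, hand it to a reader function,
# and guarantee it is closed again, even if the reader throws an error.
read_and_close <- function(con, reader, mode = "rb") {
  open(con, mode)
  on.exit(close(con), add = TRUE)
  reader(con)
}

# Offline demo: a temporary file stands in for the web page
tmp <- tempfile(fileext = ".html")
writeLines("<html><head><title>demo</title></head></html>", tmp)

n_before <- nrow(showConnections())        # open user connections before the call
page_lines <- read_and_close(file(tmp), readLines)
n_after <- nrow(showConnections())         # same count afterwards: nothing leaked
```

Because the close happens in on.exit, no separate cleanup such as closeAllConnections() is needed after each iteration of a loop.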

Problems with reassignInPackage

I am trying to understand the way the YourCast R package works and make it work with my data.
For example, if a function produces errors, I:
- get the source code of that function using YourCast:::bad.fn
- add outputs of critical values at critical stages
- use reassignInPackage(name="original.fn", pkgName="YourCast", value="my.fn")
Once I find the cause of the error, I fix it in the function and reassign it in the package.
However, for some strange reason this does not work for non-hidden functions.
For example:
install.packages("YourCast")
library(YourCast)
YourCast:::check.depvar
This will print the hidden function check.depvar. One line if (all(ix == 1:3)) will produce an error message if any of the x is missing.
Thus, I change the whole function to the following and replace the original function:
mzuba.check.depvar <- function(formula)
{
  return(grepl("log[(]", as.character(formula)[2]))
}
reassignInPackage("check.depvar",
                  pkgName="YourCast",
                  mzuba.check.depvar)
rm(mzuba.check.depvar)
Now YourCast:::check.depvar will print my version of that function, and everything is fine.
However, YourCast::yourcast, YourCast:::yourcast, or simply yourcast will print the non-hidden function yourcast. Suppose I want to change that function as well.
reassignInPackage(name="yourcast",
                  pkgName="YourCast",
                  value=test)
Now, YourCast::yourcast and YourCast:::yourcast will print the new, modified version but yourcast still gives the old version!
That might not be a problem if I could simply call YourCast::yourcast instead of yourcast, but that produces some kind of error that I can't trace back, because suddenly RStudio does not print error messages at all anymore, although it still evaluates whatever it can:
> Uagh! do something!
> 1 + 1
[1] 2
> Why no error msg?
>
Restarting the R-session will solve the error-msg problem, though.
So my question is: How do I reassign non-hidden functions in packages?
Furthermore (this would facilitate testing a lot), is there a way to make all hidden functions available without using the ::: operator? I.e., how do I export all functions from a package?
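The behaviour described above, where YourCast::yourcast shows the new version while a bare yourcast shows the old one, is consistent with how R handles exported functions: when a package is attached, its exports are copied into a separate package environment on the search path, so replacing the namespace version leaves that copy untouched. A toy model of this, using two plain environments as stand-ins (these are not real package environments, and all names are illustrative):

```r
# Stand-ins for the two places an exported function lives:
# the namespace, and the copy made when the package is attached.
namespace_env <- new.env()
attached_env  <- new.env()

namespace_env$yourcast <- function() "original"
attached_env$yourcast  <- namespace_env$yourcast   # copy made at attach time

# Replacing only the namespace version, as in the question
namespace_env$yourcast <- function() "modified"
after_namespace_only <- attached_env$yourcast()    # the attached copy is unchanged

# Replacing the attached copy as well
attached_env$yourcast <- namespace_env$yourcast
after_both <- attached_env$yourcast()
```

For a real package the attached copy would live in as.environment("package:YourCast"), whose bindings are normally locked, so unlockBinding() would be needed before assign(). That step is not tested here and should be treated as an assumption about this particular package.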

Error in code from split-apply-combine paper - how to resolve?

To try and get to grips with data manipulations in R, I've started reading Hadley's paper on split-apply-combine.
I'm on page 3 and trying to work through the code to understand it. Unfortunately the code throws an error even though my reproduction is faithful (I've both copied and pasted it and typed it by hand). As I'm trying to learn this stuff and I'm right at the beginning, I can't tell what's wrong with it. I tried it on both R 2.5 and R 3.0.
library("MASS")
library("plyr")
data(ozone)
one<-ozone[1,1,]
month<-ordered(rep(1:12,length=72))
model<-rlm(one ~ month - 1)
deseas<-resid(model)
deseasf<-function(value) {rlm(value ~ month - 1)}
models<-aaply(ozone,1:2,deseasf)
deseas<-aaply(models,1:2,resid)
Where the models line errors with Error: Results must have one or more dimensions.
Can somebody tell me whether it works for them, or what needs to be fixed/amended if it doesn't and why?
PS - Can't check on http://plyr.had.co.nz/ for errata because my work proxy currently blocks the site!
It should be
models <- alply(ozone, 1:2, deseasf)
deseas <- ldply(models, resid)
It turns out this is a bug in aaply and Hadley has said he will look into it soon:
https://groups.google.com/forum/#!topic/manipulatr/kg2wDU96mGM
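A small sketch of the alply/ldply replacement on made-up data, in case the ozone array is not at hand. The 2 x 2 x 24 array is hypothetical, and lm() is swapped in for rlm() only to keep the example free of MASS; the structure of the calls is the same as in the corrected lines above.

```r
library(plyr)

set.seed(1)
# Hypothetical 2 x 2 x 24 array standing in for ozone (lat x long x time)
toy <- array(rnorm(2 * 2 * 24), dim = c(2, 2, 24))
month <- ordered(rep(1:12, length.out = 24))

# lm() replaces rlm() here purely to avoid the MASS dependency
fit_month <- function(value) lm(value ~ month - 1)

# alply keeps each fitted model as one element of a list
models <- alply(toy, 1:2, fit_month)

# ldply turns the residual vectors into one data frame row per grid cell
deseas <- ldply(models, resid)
```

The result is a data frame rather than the 3-d array the paper's aaply call was meant to produce, so downstream code that indexes deseas by three dimensions would need adjusting.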

Message within function (for status) not showing immediately in console

I have written a function, which takes some time to run (due to a 1000+ loop on a huge dataset in combination with forecasting model testing).
To get an idea of the status while the function is running, I use the message command inside the for-loop in the function. The problem is that all the messages are shown in the console after the function has finished, instead of immediately. So it doesn't help me :)
I tried to find a solution on Stack Overflow, but didn't find one. I looked, for instance, at the question "showing a status message in R". All the answers and example code in that topic still only give me text in the console after the function has been processed, not immediately.
How to solve this? Is there maybe a setting in R which prevents immediate printing of message text in the console?
note: examples I tried below, which give the same results as my function; showing text after processing the function.
example1 (Joshua Ulrich):
for(i in 1:10) {
  Sys.sleep(0.2)
  # Dirk says using cat() like this is naughty ;-)
  #cat(i,"\r")
  # So you can use message() like this, thanks to Sharpie's
  # comment to use appendLF=FALSE.
  message(i,"\r",appendLF=FALSE)
  flush.console()
}
example2 (Tyler):
test.message <- function() {
  for (i in 1:9){
    cat(i)
    Sys.sleep(1)
    cat("\b")
  }
}
edit: the first example does work ('flush console' was the problem)...but when I tested it, I commented out flush console for some reason :S
test.message <- function() {
  for (i in 1:9){
    cat(paste(as.character(i),'\n'))
    flush.console()
    Sys.sleep(1)
  }
}
which is similar to the recommendation by fotNelton.
Edit: ttmaccer is most likely right. I've just tested on an Ubuntu server and the code works without flushing the console.
I think this may be a Windows-specific problem. On Linux, or when running R in a Cygwin shell, flush.console() may not be needed.
You may be interested in using one of the progress bar functions (winProgressBar, tkProgressBar, or txtProgressBar). The win version only works on windows, but the win and tk versions have the advantage that they do not clutter your output, but rather open another small window and display the progress there.
The progress through a loop can be shown with the progress bar, but other detailed information can be updated and shown with the label argument.
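A minimal sketch of the text progress bar, which is part of base R's utils package and works on every platform. The function name run_with_progress and the summing loop are placeholders for the real long-running work.

```r
# Hypothetical long-running function that reports progress as it goes
run_with_progress <- function(n) {
  pb <- txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb), add = TRUE)   # tidy up the bar even if the loop fails
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i             # stand-in for the real work
    setTxtProgressBar(pb, i)       # redraw the bar at step i
  }
  total
}

result <- run_with_progress(10)
```

winProgressBar and tkProgressBar follow the same create, update, close pattern, with setWinProgressBar and setTkProgressBar additionally accepting the label text mentioned above.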

R Language: Error in read.table(file.path(data.dir, file_name1)) : no lines available in input

I am having a hard time coding in R. What I am trying to do is read a large amount of data into one data frame and make pretty pictures.
This is what I have:
# assign data
file_name1<-"data1_txt"
file_name2<-"data2_txt"
data.dir<-"/...../Documents/R programing Language/"
for(i in 1:length(1)){
  newData1<-read.table(file.path(data.dir, file_name1))
  #Replace negative numbers with NA
  xx <- which(datavalues<0)
  datavalues[xx] <- NA
  newData2<-read.table(file.path(data.dir,file_name2))
}
Error I have is:
Error in read.table(file.path(data.dir, file_name1)) :
no lines available in input
I am trying to figure this out by myself, but I am very new to R and don't know enough about its functions. Please explain what this error means and advise me on my code.
Thank you very much,
Uka
A similar situation was solved here in a similar question (I know this post is quite old). Recently I got this error while parsing several files. The reason was that some of the files were empty, which makes sense given the error message.
Anyway, just make sure your input is not empty, or wrap the call in try or tryCatch as suggested in the mentioned link.
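The two suggestions above, checking for an empty file and wrapping the call in tryCatch, can be combined in one small helper. This is a sketch under the assumption that skipping unreadable files and returning NULL is acceptable; read_table_or_null is a hypothetical name, and the temporary files exist only for the demo.

```r
# Hypothetical helper: return NULL instead of failing on a missing,
# empty, or otherwise unreadable file.
read_table_or_null <- function(path) {
  if (!file.exists(path) || file.info(path)$size == 0) {
    return(NULL)
  }
  tryCatch(read.table(path), error = function(e) NULL)
}

# Demo with one empty and one non-empty temporary file
empty_file <- tempfile()
file.create(empty_file)
full_file <- tempfile()
writeLines(c("1 2", "3 4"), full_file)

empty_result <- read_table_or_null(empty_file)
full_result  <- read_table_or_null(full_file)
```

Checking is.null() on the result then tells the calling loop which files to skip.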

Resources