getURL gets stuck, need a wait function - R

I am trying to use R to surf the web, but I have a strange problem. Let's say that I have a list named URLlist containing some URLs. Here is my code:
library(RCurl)  # getURL() comes from RCurl

for (k in 1:length(URLlist)) {
  temp <- getURL(URLlist[k])
}
I don't know why, but at some random URL, R blocks. It has nothing to do with the URL itself, since the same URL can work on one execution of the loop and hang on another. I think the loop is going too fast and the download of data can't keep up. So I was thinking of making the code wait for 1 second before each new call to the getURL function, but I didn't find such a wait function.
Any idea, please? Thank you!

?Sys.sleep()
Description:
Suspend execution of R expressions for a given number of seconds
Usage:
Sys.sleep(time)
Arguments:
time: The time interval to suspend execution for, in seconds.
Whether or not this will solve your problem is another issue.
I would suggest looking at the XML package and using htmlParse() to surf the web with R, since there are rarely instances where you want HTML returned as plain text.
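For example, here is a minimal sketch of the original loop with a one-second pause before each request (assuming RCurl is loaded and URLlist exists as in the question):

library(RCurl)

results <- character(length(URLlist))
for (k in seq_along(URLlist)) {
  Sys.sleep(1)                      # pause for 1 second before each request
  results[k] <- getURL(URLlist[k])  # fetch the page source as text
}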

Related

R/Rstudio Frustration - Can't Stop Code Execution

I often include View() statements in my R scripts. If I accidentally forget the closing bracket at the end of the line, and then run the line of code from the script window using Ctrl+Enter, R just keeps trying to execute the remainder of my script. I don't know why it does that (rather than using the + symbol to prompt me to provide further input).
Moreover, I've tried to stop this by setting breakpoints in my code - I can click on the LHS of the page and a little red circle appears. But the breakpoints don't seem to work - R just ignores them and keeps going.
The only way I can get out of it is by killing the process in the Windows task manager and then going back in afterwards. But it's wasting a lot of time.
Does anyone know how I can fix this please?
Thank you.
In effect, what R is processing looks like this:
... %>% View(
lm(am~cyl, mtcars)
...
...
As R can't find the closing bracket ), it treats the remaining statements as input to View() and keeps searching for that bracket.
Solutions
It depends on what you want to do with those scripts, but if the intention is to run them in the background, consider using callr. This package lets you run R from R and offers kill methods to stop the processes you start that way.
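A rough sketch of that approach with callr::r_bg(); the long computation here is just a placeholder:

library(callr)

# start a long-running computation in a separate background R process
p <- r_bg(function() {
  Sys.sleep(60)  # stand-in for the slow part of your script
  42
})

p$is_alive()  # TRUE while the background process is still running
p$kill()      # stop it without killing your interactive session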
On Windows, pressing Esc should get you back to the console, but if it's a memory-intensive process, that may be difficult.
You may try pressing Ctrl+c in order to kill the process.

Executing an R script in a way other than using source() in JRI

I am new to R and have been trying to use JRI. Through JRI, I have used the "eval()" function to get certain results. If I want to execute an R script, I have used "source()". However, I am now in a situation where I need to execute a script on continuously incoming data. While I can still use "source()", I don't think that would be an optimal way from a performance perspective.
What I did was to read the entire R script into memory and then try and use "eval()" passing the script - but this does not seem to work. I have ensured that the script has been correctly loaded into memory - that is because if I write this script (loaded into the memory) into a file and source this newly created file, it does produce the expected results.
Is there a way for me to not keep sourcing the same file over and over again and execute it from memory? Each of my data units are independent and have to be processed independently and as soon as they become available. I cannot wait to collect a bunch of data units and then pass them on to the R script.
I have searched a lot and not found anything related to this. Any pointers which could help me in this direction would be really helpful.
The way I handled this is as below:
I enclosed the entire script into a function.
I sourced the script file (which now contains the function) at the start of the execution of my program.
Where I was previously sourcing the file, I now just call the function that contains the script itself, i.e.:
REXP result = rengine.eval("retVal<-" + getFunctionName() + "()");
Here, getFunctionName() gives me the name of the function which contains the script.
Since this is loaded into memory and available, I do not have to source the script file every time I want to execute it. Any arguments are passed to the script as environment variables.
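On the R side, the wrapped script might look roughly like this (processDataUnit and DATA_UNIT_PATH are made-up names for illustration):

# the whole original script body now lives inside one function,
# so the file only has to be sourced once per program run
processDataUnit <- function() {
  input_file <- Sys.getenv("DATA_UNIT_PATH")  # set from the Java side before eval()
  dat <- read.csv(input_file)
  # ... original script logic goes here ...
  summary(dat)  # whatever value the script is expected to return
}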
This seems to be a workaround, but it solves my problem. Any better options are welcome.

Read line causing a wait for input R

I'm a total noob at R and I may have bitten off a little more than I can chew, but if you can help me I will appreciate it.
So what I'm trying to do is retrieve the top trending hashtags from Twitter (working) and then use them as part of a URL to pull back their definitions. My issue is that the readline function seems to wait for me to hit return before it attempts the URL, and I'm looking for a way to make it do the rest automagically. Please find my code below:
definitions <- ""
lapply(X = hashtags, FUN = function(X) {
  tagdef <- c(tagdefurl, X[[dfPointer]])
  tagdef <- paste(tagdef, collapse = " ")
  tagdef <- stringr::str_replace(string = tagdef, pattern = " ", replacement = "")
  definitions <- tryCatch(readline(tagdef), silent = F)
})
tagdef is defined above, and definitions is supposed to be the list to store the returned definitions in.
I've checked all my OAuth nonsense and everything on that side is fine; I'm getting the trends back without issue. Can anyone give me some pointers?
Unfortunately, you might have just stumbled on a case of "user error due to similarly named functions". In R, there is both readline (which reads a line from the terminal, in interactive use) and readLines (which is used to read some or all text lines from a connection).
The former expects user input, and the first argument is "prompt", hence the waiting for input.
Remember also that cApItaLiZation matters in R.
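So the fix is to build the URL as a string and hand it to readLines() instead. A sketch, under the assumption that tagdefurl is the base URL from the question:

url <- paste0(tagdefurl, hashtags[[1]])     # e.g. base URL plus one hashtag
definition <- readLines(url, warn = FALSE)  # reads the page; no prompt, no waiting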

Why does loading saved R file increase CPU usage?

I have an R script that I want to run frequently. A few months ago, when I wrote it and first ran it, there was no problem.
Now, my script is consuming almost all (99%) of the CPU, and it's slower than it used to be. I am running the script on a server, and other users experience slow response from the server when the script is running.
I tried to find out which piece of code is slow. The following loop is taking almost all of the time and CPU used by the script.
for (i in 1:100) {
  load(paste(saved_file, i, ".RData", sep = ""))
  # do something (which is fast)
  assign(paste("var", i, sep = ""), vector)
}
The loaded data is about 11 MB in each iteration. When I run the above script for an arbitrary "i", the file-loading step takes longer than the other commands.
I spent a few hours reading forum posts but could not get any hint about my problem. It would be great if you could point out if there's something I am missing, or suggest a more effective way to load a file in R.
EDIT: Added space in the codes to make it easier to read.
paste(saved_file, i, ".RData", sep = "")
This builds a different file name at each iteration, so load() restores an object from xxx1.RData, xxx2.RData, and so on.
Did you try to rm() the object at the end of the loop? I guess the object stays in memory, regardless of your variable being reused.
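Something along these lines might help; load() returns the names of the objects it restored, so you can rm() exactly those at the end of each iteration:

for (i in 1:100) {
  loaded <- load(paste(saved_file, i, ".RData", sep = ""))  # names of restored objects
  # do something (which is fast)
  assign(paste("var", i, sep = ""), vector)
  rm(list = loaded)  # drop the loaded objects before the next iteration
  gc()               # optionally give the memory back to the OS
}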
Just a tip: add spaces in your code (like I did), it's much easier to read/debug.

R: Dealing with functions that sometimes crash the R session?

I have an R function (let's call it MyFunction) that sometimes crashes the R session; most of the time it does not.
I have to apply this function to a large number of objects in a sequential manner.
for (i in 1:nrow(objects)) {
  result[i] <- MyFunction(objects[i])
}
I'm coming from a C# background, where functions rarely crash the "session" and programmers normally surround such function calls in try-catch blocks. However, in R I've seen some functions that just crash the session, and using tryCatch is of no help, since the function does not raise an exception but causes a full-blast session crash ;-)
Just wondering what's the best way of "catching" the crash.
I'm considering writing a Python script that calls the R function from Python (via one of the R-Python connectors) and catching the R crash in Python. Would that work?
Any advice?
Cheers!
Use the mcparallel function from the parallel package to run the function in a forked process. That way, if it crashes R, only the subprocess crashes, and an error is returned to the main process. If you want to apply this function to a large number of objects and collect the results in a list, use mclapply.
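A rough sketch of that idea (forking only works on Unix-alikes, not on Windows, and objects is treated here as a plain list of inputs):

library(parallel)

# run MyFunction in a forked child; if the child segfaults,
# only the child dies and the parent just gets no result back
safe_call <- function(x) {
  job <- mcparallel(MyFunction(x))
  res <- mccollect(job)
  if (is.null(res) || is.null(res[[1]])) NA else res[[1]]
}

results <- lapply(objects, safe_call)
# or, for many objects at once:
# results <- mclapply(objects, MyFunction, mc.cores = 4)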
Hello, such behaviour is very rare in my experience. You might not know that there is a debugger that can help you step through your function.
install.packages('debug') # install the debug package
library(debug)
mtrace(myFunctionToBeDebuged) # this starts the debugger on your function
mtrace(myFunctionToBeDebuged, FALSE) # stop tracing the function
NOTE: when you are in the debugger, should you want to quit it, use qqq()
