Reading in XML data from URL using parLapply in R

Beginner here, so please alert me to any formatting errors, or general etiquette mistakes.
I am attempting to scrape data from ~800 websites, and to increase speed I was looking into running the requests in parallel.
I have simplified my code down to a version that reproduces the error:
library(parallel)
library(xml2)

url <- c("https://apps.hydroshare.org/apps/nwm-forecasts/api/GetWaterML/?config=long_range&geom=channel_rt&variable=streamflow&COMID=6251152&startDate=2018-03-12&lag=t18z&member=4",
         "https://apps.hydroshare.org/apps/nwm-forecasts/api/GetWaterML/?config=long_range&geom=channel_rt&variable=streamflow&COMID=6244518&startDate=2018-03-12&lag=t18z&member=4")

cores.number <- 2
cluster1 <- makeCluster(cores.number)

# make the URL vector and the xml2 package available on every worker
clusterExport(cluster1, "url")
clusterEvalQ(cluster1, library("xml2"))
clusterEvalQ(cluster1, url)  # sanity check: the vector is visible on each worker

# fetch and parse each URL on a worker
temp <- parLapply(cluster1, url, function(x) read_xml(x))
stopCluster(cluster1)
I get the following error:
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: HTTP error 500.
When I run the same code with lapply instead of parLapply, or set cores.number to 1, I have no issues.
Thanks
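
One way to see which of the ~800 requests actually fail, instead of letting the first HTTP 500 kill the whole job through checkForRemoteErrors(), is to catch errors inside the worker function. A minimal sketch (this only isolates the failing URLs; the 500 itself comes from the server):

library(parallel)
library(xml2)

cluster1 <- makeCluster(2)
clusterEvalQ(cluster1, library("xml2"))

# parLapply ships the chunks of `url` to the workers itself, so no
# clusterExport is needed for the data being iterated over
temp <- parLapply(cluster1, url, function(x) {
  tryCatch(
    as.character(read_xml(x)),  # return the XML as text: xml2 documents are
                                # external pointers and don't survive serialization
    error = function(e) e       # return the condition object on failure
  )
})
stopCluster(cluster1)

# which URLs failed, and why
bad <- vapply(temp, inherits, logical(1), "condition")
url[bad]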

Related

Making one table in R

I am learning a programming language for the first time. I am trying to combine tables in R using:
Trip_data <- bind_rows(oct_td, sep_td, aug_td,jul_td, jun_td,may_td)
and I get the following error:
Error in bind_rows():
! Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'symbol'
Run rlang::last_error() to see where the error occurred.
What does that mean, and how do I solve it?
I have another set of tables that I managed to combine just fine, and when I ran this same code yesterday I did not get an error.
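
That error usually means one of the objects being bound is not a plain data frame/tibble, or that a column changes type between tables. A quick diagnostic sketch, assuming the six objects from the question are in scope:

library(dplyr)

tables <- list(oct_td, sep_td, aug_td, jul_td, jun_td, may_td)

# each element should be a data.frame or tbl_df; anything else is suspect
lapply(tables, class)

# compare the class of every column across the six tables; a column that is
# character in one table and numeric (or a factor) in another is a likely culprit
lapply(tables, function(df) sapply(df, class))

Once the offending column is found, converting it to a common type in each table before calling bind_rows usually resolves the error.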

Error when trying to read_html from a website with delayed loading

I'm trying to scrape the public consultations from the European Commission's website with R for my master's thesis. However, the loading screen that appears for the first second after opening the website seems to prevent this.
By running
test <- read_html("https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives")
I get the following error:
Error in readBin(3L, "raw", 65536L) :
Failure when receiving data from the peer
I couldn't find a solution yet, and I also struggle to pin down the specific problem, because I would actually expect the function to return the HTML of the loading screen that appears in the first second.
Does anyone have an idea how to get beyond the loading part with R?
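
read_html only fetches the initial HTML, so a page that builds its content with JavaScript behind a loading screen needs a real browser to render it first. A rough sketch with RSelenium (this assumes a working Selenium/browser setup, which is not shown in the question):

library(RSelenium)
library(rvest)

driver <- rsDriver(browser = "firefox", verbose = FALSE)
remDr <- driver$client

remDr$navigate("https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives")
Sys.sleep(5)  # crude: wait for the JavaScript loading screen to finish

# hand the rendered page over to rvest/xml2
test <- read_html(remDr$getPageSource()[[1]])

remDr$close()
driver$server$stop()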

Is it possible to not print error messages when knitting an R Markdown document? [duplicate]

I am running a simulation study in R. Occasionally, my simulation study produces an error message. As I implemented the simulation study in a function, the whole simulation stops when this error occurs. I know that it is bad practice to suppress errors, but at the moment I have no option other than to suppress the error and then go on with the next simulation until I reach the total number of simulations I want to run. To do this, I have to suppress the error message R produces.
To do this, I tried different things:
library(base64)
suppressWarnings
suppressMessages
options(error = expression(NULL))
In the first two options, only warnings and messages are suppressed, so that's no help. If I understand it correctly, in the last case all error messages should be avoided. However, that does not help either; the function still stops with an error message.
Has someone any idea why this does not work the way I expect it to work? I searched the internet for solutions, but could only find the above mentioned ways.
In the function that runs my simulation, part of the code is analysed by the external program JAGS (a Gibbs sampler), and the error message is produced by that analysis. Might this be where it goes wrong?
Note that I do not have to suppress one specific error message; since no other error messages are produced, it is 'good enough' to have an option that simply suppresses all error messages.
Thanks for your time and help!
As suggested by the previous solution, you can use the try or tryCatch functions, which will encapsulate the error (more info in Advanced R). However, by default they will not suppress the error message reported to stderr.
This can be achieved by setting their parameters: for try, set silent=TRUE; for tryCatch, set error=function(e){}.
Examples:
o <- try(1 + "a")
# Error in 1 + "a" : non-numeric argument to binary operator
o <- try(1 + "a", silent = TRUE)                # no error printed
o <- tryCatch(1 + "a")
# Error in 1 + "a" : non-numeric argument to binary operator
o <- tryCatch(1 + "a", error = function(e) {})  # no error printed
There is a big difference between suppressing a message and suppressing the response to an error. If a function cannot complete its task, it will of necessity signal an error (although some functions have an argument to take some other action in case of error). What you need, as Zoonekynd suggested, is to use try or tryCatch to "encapsulate" the error so that your main program flow can continue even when the function fails.
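
Applied to the simulation study in the question, the pattern looks roughly like this (run_one_simulation is a hypothetical stand-in for the function that wraps the JAGS call):

n_sims <- 1000
results <- vector("list", n_sims)

for (i in seq_len(n_sims)) {
  results[[i]] <- tryCatch(
    run_one_simulation(i),    # hypothetical: the function that runs one replication
    error = function(e) NULL  # record nothing and continue with the next one
  )
}

# indices of the replications that failed
which(vapply(results, is.null, logical(1)))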

Problems parsing StreamR JSON data

I am attempting to use the streamR package in R to download and analyze Twitter data, on the premise that this library can overcome the limitations of the twitteR package.
When downloading data, everything seems to work fabulously using the filterStream function (to clarify: the function captures Twitter data, and simply running it produces the JSON file, saved in the working directory, that is needed in the further steps):
filterStream(file.name = "tweets_test.json",
             track = "NFL", tweets = 20, oauth = credential, timeout = 10)
Capturing tweets...
Connection to Twitter stream was closed after 10 seconds with up to 21 tweets downloaded.
However, when moving on to parse the json file, I keep getting all sorts of errors:
readTweets("tweets_test.json", verbose = TRUE)
0 tweets have been parsed.
list()
Warning message:
In readLines(tweets) : incomplete final line found on 'tweets_test.json'
Or with this function from the same package:
tweet_df <- parseTweets(tweets='tweets_test.json')
Error in `$<-.data.frame`(`*tmp*`, "country_code", value = NA) :
replacement has 1 row, data has 0
In addition: Warning message:
In stream_in_int(path.expand(path)) : Parsing error on line 0
I have tried reading the json file with jsonlite and rjson with the same results.
Originally, it seemed that the error came from special characters ({, then \) within the json file, which I tried to clean up following the suggestion from this post; however, not much came out of it.
I found out about the streamR package from this post, which shows the process as very straight forward and simple (which it is, except for the parsing part!).
If any of you have experience with this library and/or these parsing issues, I'd really appreciate your input. I have been searching non-stop but haven't been able to locate a solution.
Thanks!
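
The "incomplete final line" warning suggests the stream was cut off mid-tweet, leaving a truncated last record that can break parsing for the whole file. One thing worth trying (a sketch, not a guaranteed fix) is to keep only the lines that look like complete JSON objects and parse a cleaned copy:

library(streamR)

raw_lines <- readLines("tweets_test.json", warn = FALSE)

# keep only lines that open and close a JSON object
ok <- grepl("^\\{", raw_lines) & grepl("\\}$", raw_lines)
writeLines(raw_lines[ok], "tweets_clean.json")

tweet_df <- parseTweets(tweets = "tweets_clean.json")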

"PickSoftThreshold" function issue in WGCNA?

Currently I am applying a dataset to the WGCNA code for network construction and module detection. Here I have to use a function called "pickSoftThreshold" to analyze the network topology. When I run it, it shows me this error:
> sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
pickSoftThreshold: will use block size 18641.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 18641 of 54675
Error in serialize(data, node$con) : error writing to connection
Any idea how to get rid of that?
Thanks in Advance!!
I myself just started using WGCNA a couple of days ago and am not really familiar with it yet, but the error looks like it comes from using too many genes (about 55k): I think you should find a way to filter some of them out if your computer isn't powerful enough.
(Ideas from http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html )
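
A common way to thin out the 54,675 genes before calling pickSoftThreshold is to keep only the most variable ones. A sketch, assuming the usual WGCNA layout of samples in rows and genes in columns (the cutoff of 10,000 genes is arbitrary):

library(WGCNA)

# variance of each gene (column) across samples (rows)
gene_var <- apply(datExpr, 2, var, na.rm = TRUE)

# keep the 10,000 most variable genes
datExpr_small <- datExpr[, rank(-gene_var) <= 10000]

sft <- pickSoftThreshold(datExpr_small, powerVector = powers, verbose = 5)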
