rnoaa station data pull times out - r

I'm trying to pull NOAA data using R. I've done this pull before, but all of a sudden it's not working. I have the rnoaa package loaded and an authorization key from NOAA. I try to run this command:
station_data <- ghcnd_stations()
And I get this error:
Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) : Timeout was reached: Operation timed out after 10000 milliseconds with 0 out of 0 bytes received
I found pages online that suggested updating everything. First I updated all my packages and then updated to a new version of R. With all that done, it's still giving me a timeout message for this simple command. I know that the ghcnd pull sometimes takes a while, but it's timing out after about 10 seconds. Is this just an NOAA issue (as is sometimes the case) and I should try again tomorrow? Or is there something I can actually do to make this work? Can I change the timeout period so that it waits longer? Is NOAA just overloaded because of the hurricane?

Looks like it was just something on the NOAA end of things. It took a day, but it's finally back up and running properly.
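Follow-up for anyone who hits the client-side timeout itself: you can raise the timeout in the R session, although whether ghcnd_stations() picks it up depends on how rnoaa issues its requests internally. A minimal sketch, assuming the call goes through httr and honors its global config (the 600-second value is arbitrary):

library(rnoaa)
library(httr)

# Raise the curl timeout for all httr-based requests in this session
# (whether ghcnd_stations() honors this depends on rnoaa's internals).
httr::set_config(httr::timeout(600))

# Base-R internet operations (download.file, url connections) use a separate option.
options(timeout = 600)

station_data <- ghcnd_stations()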

Related

Manually close connection in request stream HTTP.jl

My problem is the following: I am making a GET request which returns a stream, and after some time I get my desired data, but the webserver does not close the connection, so I want to close it on my side. My code is as follows:
using HTTP
HTTP.open(:GET, "https://someurl.com", query = Dict(
    "somekey" => "somevalue"
)) do io
    for i = 1:4
        println("DATA: ---")
        @show io
        println(String(readavailable(io)))
    end
    @info "Close read"
    closeread(io)
    @info "After close read"
    @show io
end
println("At the end!")
However, I never reach the last line. I have tried dozens of different approaches by consulting the docs of HTTP.jl, but none worked for me, and I suspect that is because this webserver is not sending the Connection: close header. I have not been able to find an example that closes the connection on the client side manually / forcefully.
Interesting note: when running this from the REPL, interrupting the connection by hitting Ctrl-C a couple of times, and then rerunning the script, it hangs forever. I then have to wait some random amount of time, seconds to minutes, before I can run it again "successfully". I suspect this has to do with the stale connection not being closed properly.
As is evident, I am neither very proficient in network programming nor Julia, so any help would be highly appreciated!
EDIT: I suspect I was not quite clear enough about the behaviour of the webserver and what I want to do, so I will try to break it down as simply as possible: I want to get responses from the webserver until I detect a certain keyword. After that I want to close the connection - the webserver would keep on sending me data, but I already got everything I am interested in, so I don't want to wait another few minutes for the webserver to close the connection for me!
Your code assumes that you will get all the data in exactly four calls to readavailable, which might not be true depending on the buffer state.
Instead, your loop should be:
while !eof(io)
    println("DATA: ---")
    println(String(readavailable(io)))
end
In your case the connection gets stuck because you try to read four chunks of data, and perhaps you are getting everything in the first chunk, so the next read blocks.
On top of that, if you are using the do syntax, you should not close the resource yourself - it will be done automatically at the end of the block.

How can I determine how much time an rvest query spends waiting for the HTTP response

I have been using the rvest package to perform screen scraping for some data analytics, but there are some queries that are taking a few seconds each to actually collect the data. e.g.
sectorurl <- paste("http://finance.yahoo.com/q/pr?s=", ticker, "+Profile", sep = "")
index <- read_html(sectorurl)
The second step is the one that is taking the time, so I was wondering if there are any diagnostics in the background of R, or a clever package I could run, that would distinguish "network wait time" from CPU time, or something similar.
I would like to know if I'm stuck with the performance I have, or if my R code is actually performing well and it is the HTTP response that is limiting my process speed.
I don't think you will be able to separate the REST call from the client-side code. However, my experience with accessing web services is that the network time generally dominates the total running time, with the "CPU" time being an order of magnitude, or more, smaller.
One option for you to try would be to paste your URL, which appears to be a GET request, into a web browser and see how long it takes to complete there. You can compare this time against the total time taken in R for the same call. For this, try using system.time, which reports both the CPU time and the elapsed (wall-clock) time used by a given expression.
require(stats)
system.time(read_html(sectorurl))
Check out the documentation for more information.
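If you want to go one step further and split the network wait from the parsing work, one approach is to download and parse in two separate steps and time each. A minimal sketch, assuming the httr package is available and using a hypothetical ticker value; the "elapsed" figure is wall-clock time, so the first measurement is dominated by the network:

library(httr)
library(rvest)

ticker <- "AAPL"  # hypothetical example value
sectorurl <- paste("http://finance.yahoo.com/q/pr?s=", ticker, "+Profile", sep = "")

# Time the network fetch on its own ...
fetch_time <- system.time(resp <- GET(sectorurl))

# ... then time the parsing of the already-downloaded page.
parse_time <- system.time(index <- read_html(content(resp, "text")))

fetch_time["elapsed"]  # mostly network wait
parse_time["elapsed"]  # mostly CPU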

Slow querying speed for RODBC with Vertica

I usually access Vertica in two ways: vsql on the command line and RODBC through R. However, queries taking ~20s in vsql will usually take 10-15 minutes through RODBC. Does anyone have this problem?
If you dig into the vertica.log, you might be able to see when your sql statement is actually getting processed, or if it's actually held up by queuing or something else.
Calling with the same user?
Very probably this is a Fetch issue. I'd suggest:
Option 1: continue using RODBC and increase the number of rows retrieved per Fetch cycle (rows_at_time). For example:
ch <- odbcConnect("mydsn", uid = "mouser", pwd = "XXX", rows_at_time = 8192)
Option 2: try replacing RODBC with RJDBC.
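For option 2, a minimal sketch of what an RJDBC connection to Vertica can look like, assuming the Vertica JDBC driver jar has been downloaded locally; the jar path, host, port, database, table, and credentials below are placeholders:

library(RJDBC)

# Load the Vertica JDBC driver from a locally downloaded jar (path is a placeholder).
vdriver <- JDBC(driverClass = "com.vertica.jdbc.Driver",
                classPath = "/path/to/vertica-jdbc.jar")

# The connection URL follows the usual jdbc:vertica://host:port/database pattern.
conn <- dbConnect(vdriver, "jdbc:vertica://myhost:5433/mydb", "mouser", "XXX")

result <- dbGetQuery(conn, "SELECT COUNT(*) FROM my_table")
dbDisconnect(conn)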

R: URL function timing out prematurely

I'm currently requesting data, sent back as XML, via the url function, passing parameters in the URL like so:
asset.data <- url("http://www.blah.com/?parameter=value", open = "r")
Each request yields a vector of length ~10,000. I've had some problems with requests timing out when looped (I'm calling data for about 500 "assets"). I've set the timeout in options to something high (600 seconds, or ten minutes), but I still notice that the loop will stop if a call takes longer than 60 seconds or so (definitely less than the 10 minutes I've defined). I feel like I must be missing something about how the connection timeout works - any advice here?
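For reference, a minimal sketch of the looped-request pattern described above, using a placeholder URL and placeholder asset identifiers. options(timeout = ...) is the setting that governs base-R connections such as url(), and wrapping each call in tryCatch keeps one slow or failed request from stopping the whole loop:

# Raise the timeout for base-R internet operations (the default is 60 seconds).
options(timeout = 600)

assets <- paste0("asset", 1:500)  # placeholder asset identifiers
results <- vector("list", length(assets))

for (i in seq_along(assets)) {
  results[[i]] <- tryCatch({
    con <- url(paste0("http://www.blah.com/?parameter=", assets[i]), open = "r")
    xml_lines <- readLines(con, warn = FALSE)
    close(con)
    xml_lines
  }, error = function(e) {
    message("Request ", i, " failed: ", conditionMessage(e))
    NULL  # in a real script you may also want to close the connection here
  })
}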

Executing a time-consuming stored procedure on ASP.NET 2.0 + SQL Server 2008

In one part of our application, we have to save some data. The saving of the data takes some time, and it can vary from a few seconds to a few minutes. To handle this, so that there isn't a SQL timeout, we have the following process in place.
Note: This was all set up by our DBA, and the process has been in use for several years without any problem.
1: Call a stored procedure that places a job in a table. This stored procedure returns me a jobid. The input parameter to this sp is an exec statement that executes another stored procedure with parameters.
2: Then I call another stored procedure passing the jobid, that will return the status of the job e.g. completed(true) else false + an error message.
3: I loop through step 2 a maximum of 5 times with a sleep of 2 seconds. If the status returned is completed or there is some error, I break out of the loop. If after 5 times the status is still incomplete, I just display a message that it's taking too long and please visit the page later (this isn't a good idea, but so far the process has finished within about 2-3 loops and never reached the 5-time max, though there's a chance it could).
Now I am wondering if there is any better way to do the above job, or whether, even today after several years, this is still the best option I have?
Note: Using ASP.NET 2.0 Webforms / ADO.NET
I would look into queueing the operation, or if that's not an option (sounds like it might not be) then I would execute the stored procedure asynchronously, and use a callback to let you know when it's finished.
This question might help:
Ping to stored procedure to know execution completed in .net?
