What to do when a NOAA ERDDAP dataset is not found?

I'm trying to download some gridded ERDDAP data using the rnoaa package in R. While the retrieval works perfectly for some datasets, I'm having problems with others. For example, when I run:
library(rnoaa)
ds.info <- erddap_info("noaa_pfeg_95de_54ab_a60a")
erddap_grid(ds.info,
            time = c("2005-01-01", "2015-01-01"),
            altitude = c(0, 0),
            latitude = c(3.25, 3.75),
            longitude = c(72.5, 73.25),
            fields = "all")
I get the following error:
`Error: (404) - Resource not found: /erddap/griddap/ncdcOwDly.csv (Currently unknown datasetID=ncdcOwDly)`.
The error is not consistent: the same call sometimes works when I try different time spans. But I get it almost every time I try to download data from the datasets noaa_pfeg_95de_54ab_a60a, noaa_pfeg_1a4b_0c2a_2365, and some others by NOAA-NCDC.
Because erddap_grid works for some datasets but not for others, I'm inclined to think it's not a bug. Could it be a problem with the ERDDAP server, or something to do with my API key? Is there a way around it?
Update - 2015-01-10: It seems to be a server-side problem. When I try to download the data using the URL generated by the web interface (see below), I get the same error. I guess I'll just have to wait until "they" fix the problem with the database.
http://coastwatch.pfeg.noaa.gov/erddap/griddap/ncdcOw6hr.csv?u[(2006-01-01):1:(2015-01-09T18:00:00Z)][(10.0):1:(10.0)][(3.25):1:(3.75)][(72.5):1:(73.25)],v[(2006-01-01):1:(2015-01-09T18:00:00Z)][(10.0):1:(10.0)][(3.25):1:(3.75)][(72.5):1:(73.25)]

ERDDAP servers often become overloaded and return 404 errors for some requests. These are public-facing servers that do heavy data lifting, after all.
So the answer here is to retry after waiting some time (please wait a while, to be nice to the ERDDAP administrators), and to contact the server administrator to make sure your IP address has not been blacklisted for making too many requests.
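For example, here is a minimal retry sketch (fetch_grid is my own hypothetical helper, not part of rnoaa) that simply re-issues the request a few times, pausing between attempts:

library(rnoaa)

# Hypothetical helper: retry an ERDDAP grid request a few times,
# sleeping between attempts so as not to hammer the server.
fetch_grid <- function(info, ..., tries = 3, wait = 60) {
  for (i in seq_len(tries)) {
    out <- tryCatch(erddap_grid(info, ...), error = function(e) e)
    if (!inherits(out, "error")) return(out)
    message("Attempt ", i, " failed: ", conditionMessage(out))
    if (i < tries) Sys.sleep(wait)  # wait before trying again
  }
  stop("Dataset still unavailable after ", tries, " attempts")
}

# Same arguments as the failing call in the question:
fetch_grid(ds.info,
           time = c("2005-01-01", "2015-01-01"),
           altitude = c(0, 0),
           latitude = c(3.25, 3.75),
           longitude = c(72.5, 73.25),
           fields = "all")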

Related

How to decrease the number of URL requests per second in the RISmed R package?

I am currently using the R packages RISmed and bibliometrix to analyze publication data.
Here is my code:
library(bibliometrix)
library(RISmed)
search_topic <- "(((Dental Health Services[MESH]) OR (Dentistry[MESH] OR Dentists[MESH])) AND Research[MESH]) AND Research[MESH] AND (Computers[MESH] OR Medical Informatics[MESH] OR Information Technology[MESH])"
search_query <- EUtilsSummary(search_topic, retmax = 200, mindate = 2014, maxdate = 2020)
summary(search_query)
D <- EUtilsGet(search_query)
M <- pubmed2df(D)
Everything works great, but when I run the last line, I get the error:
In readLines(the_url) :
cannot open URL 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmode=xml&dbfrom=pubmed&id=31173232&cmd=neighbor': HTTP status was '429 Unknown Error'
The documentation for the EUtilsGet function says that users should post no more than three URL requests per second:
In order not to overload the E-utility servers, NCBI recommends that
users post no more than three URL requests per second and limit large
jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time
during weekdays. Failure to comply with this policy may result in an
IP address being blocked from accessing NCBI.
After consulting this post as well, it became clear that I might be able to fix the problem either by restarting R and updating the package (both of which had no effect) or by modifying the code to increase the time between requests.
How might one increase the time between requests? This seems like the only option in my case, but any alternative solutions would also be appreciated.
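One hedged approach (my own sketch, not a RISmed feature) is to retry the rate-limited step with a growing pause whenever NCBI rejects a request:

# with_retry is a hypothetical helper: it retries a call that may hit
# NCBI's rate limit (HTTP 429), backing off a little longer after each
# failure. The 429 may surface as an error or a warning, so both are caught.
with_retry <- function(f, tries = 5, wait = 1) {
  out <- NULL
  for (i in seq_len(tries)) {
    out <- tryCatch(f(), error = function(e) e, warning = function(w) w)
    if (!inherits(out, "condition")) return(out)
    Sys.sleep(wait * i)  # pause longer on each successive attempt
  }
  stop("Still rate-limited after ", tries, " attempts: ", conditionMessage(out))
}

M <- with_retry(function() pubmed2df(D))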

R tryCatch RODBC function issue

We have a number of MS Access databases on a server; these are copies from remote locations and are updated overnight. We collate some of the data from these databases for daily reporting. Sometimes the overnight update fails, meaning we don't have access to all of the databases, so I am attempting to write an R script which tests whether we can connect (using a list of the database paths) and outputs an updated version of the list containing only the databases we can reach. This list will then drive a further script which only updates the data related to the available databases.
This is what I have so far (I am new to R but reasonably proficient in SAS and SQL; I am attempting to use R both as a learning exercise and for potential cost savings):
{
  # Create store data locations listing
  A <- matrix(c(1000, 1, "One",   "//Server/Comms1/Access.mdb",
                2000, 2, "Two",   "//Server/Comms2/Access.mdb",
                3000, 3, "Three", "//Server/Comms3/Access.mdb"),
              nrow = 3, ncol = 4, byrow = TRUE)
  # Add column names
  colnames(A) <- c("Ref1", "Ref2", "Ref3", "Location")
  # Create summary for testing connections (Ref1 and Location)
  B <- A[, c(1, 4)]
  ConnectionTest <- function(Ref1, Location) {
    out <- tryCatch({
      ch <- odbcDriverConnect(paste("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=", Location))
      sqlQuery(ch, paste("select ", Ref1, " as Ref1, COUNT(variable) as Count from table"))
    },
    error = matrix(c(Ref1, 0), nrow = 1, ncol = 2, byrow = TRUE))
    return(out)
  }
  # Run function, using 'B' to provide arguments
  C <- apply(B, 1, function(x) do.call(ConnectionTest, as.list(x)))
  # Convert to matrix and add column names
  D <- matrix(unlist(C), ncol = 2, byrow = TRUE)
  colnames(D) <- c("Ref1", "Count")
}
When I run the script I get the following error message:
Error in value[3L] : attempt to apply non-function
I am guessing this is because I am using tryCatch incorrectly inside the UDF?
Does anyone have any advice on what I am doing wrong, or whether this is even a sensible way to do what I am attempting?
Thanks
(Apologies if this is formatted incorrectly; I am having to post from my phone because posting to Stack Overflow is blocked here.)
Edit - I think I fixed the 'Error in value[3L]' issue by wrapping the matrix call in the error part of the tryCatch in function(e) {}.
The issue now is that the script simply fails when it can't reach one of the databases, rather than running the matrix fallback. Do I need to add something else to make it ignore the error?
Edit 2 - tryCatch does now work; it runs the alternate function upon error but also shows warnings about the error, which makes sense.
As mentioned in the edits above, wrapping the matrix call in the error section of the tryCatch in function(e) {} fixed the 'Error in value[3L]' issue, so the script now works, but it displays error messages when it can't access a particular channel. The warning section of the tryCatch can presumably be used to handle those as necessary.
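For reference, a minimal sketch of the corrected function (assuming the same placeholder table and field names as the question); the key point is that the error and warning arguments of tryCatch must be functions of the condition object:

library(RODBC)

ConnectionTest <- function(Ref1, Location) {
  tryCatch({
    ch <- odbcDriverConnect(paste0(
      "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=", Location))
    on.exit(odbcClose(ch))  # close the channel however we leave this block
    sqlQuery(ch, paste0("select ", Ref1,
                        " as Ref1, COUNT(variable) as Count from table"))
  },
  # odbcDriverConnect typically signals a warning rather than an error when
  # it cannot connect, so a warning handler is needed to reach the fallback.
  error   = function(e) matrix(c(Ref1, 0), nrow = 1, ncol = 2),
  warning = function(w) matrix(c(Ref1, 0), nrow = 1, ncol = 2))
}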

Error in curl: Curl Fetch Memory - RStudio Curl Memory - Failed Writing Body

For context, I have been using RStudio to pull data from our reporting platform (Domo); I then manipulate the tables in R and send the data frame back into Domo at the end. This code worked perfectly for a few months but is now returning an "Error in curl" message. Here is the exact code and error:
library(DomoR)
library(dplyr)
DomoR::init('Company-Name', 'TOKEN-NUMBER')
Total_Vendor_Revenue <- DomoR::fetch('DATA-FILE-NUMBER')
Error in curl::curl_fetch_memory(url, handle = handle) :
Failed writing body (0 != 16384)
Do I need to clear some memory on my computer? Is there a way to clear it after the code finishes running? Any information helps. Thanks!
I think this means the table you are trying to fetch is too big. You could probably filter the table down in size in an ETL and then pull the ETL's output dataset into R instead.

Getting Internal Server Error when trying to use Gviz's IdeogramTrack

I posted this on Bioconductor's support page but didn't get any answers, hence trying here.
I am using the IdeogramTrack function of the R/Bioconductor package Gviz from my institution's cluster:
IdeogramTrack(genome = "mm10", chromosome = "chr1")
When I try this from the master node it works fine, but when I try it from any other node in the cluster (which does its IO through the master node) it hangs, and eventually I get the error message:
Error: Internal Server Error
I am able to reach genome.ucsc.edu or any other UCSC mirror from these nodes (e.g., using traceroute), and can successfully download data from other repositories such as Ensembl (e.g., using getBM).
Any idea what's wrong?
By the way, any idea which port IdeogramTrack is trying to use?
It sounds like your institution's cluster has issues fetching annotation data from UCSC through Gviz. One suggestion is to check whether you can manually download the mm9 annotation from UCSC (the per-chromosome downloads are a good place to start). Alternatively, you may use a Bioconductor annotation package.
Once you have a data.frame with the chromosome and position information (e.g., mapInfo), you can take advantage of GenomicRanges::makeGRangesFromDataFrame to convert the mm9 annotation to a GRanges object, which allows you to make your own IdeogramTrack object. Details on building a custom IdeogramTrack are in the Gviz documentation.
In general, here is the workflow:
library(GenomicRanges)
library(Gviz)
mm9_annot <- read.table(<file or url with annotation>)
mm9_granges <- makeGRangesFromDataFrame(mm9_annot)
# Alternatively, you may use the rtracklayer package:
# mm9_granges <- rtracklayer::import(<file or url with annotation>)
my_ideo <- IdeogramTrack(genome = "mm9_custom", bands = mm9_granges)
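If that succeeds, a quick sanity check (hypothetical usage) is to draw the custom track with Gviz's plotTracks:

# Draw the custom ideogram for one chromosome to verify it renders.
plotTracks(my_ideo, chromosome = "chr1")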
Hope this helps.

"PickSoftThreshold" function issue in WGCNA?

Currently I am applying a dataset to the WGCNA code for network construction and module detection. I have to use the function pickSoftThreshold to analyze the network topology, and when I run it I get this error:
> sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
pickSoftThreshold: will use block size 18641.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 18641 of 54675
Error in serialize(data, node$con) : error writing to connection
Any idea how to get rid of that?
Thanks in advance!
I myself just started using WGCNA a couple of days ago and am not really familiar with it yet, but the error looks like you are using too many genes (about 55k). If your computer isn't powerful enough, I think you should find a way to filter some of them out; see the sketch after the link below.
(Ideas from http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html )
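For instance, a hedged sketch of that filtering (my own illustration; datExpr is assumed to be a samples-by-genes matrix, as WGCNA expects, and 20000 is an arbitrary cutoff):

library(WGCNA)

# Keep only the most variable genes so the connectivity calculation
# fits in memory.
gene_mad <- apply(datExpr, 2, mad)
datExpr_small <- datExpr[, rank(-gene_mad) <= 20000]

powers <- c(1:10, seq(12, 20, by = 2))
sft <- pickSoftThreshold(datExpr_small,
                         powerVector = powers,
                         blockSize = 5000,  # smaller blocks also lower peak memory use
                         verbose = 5)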
