Get aggregated data from OPeNDAP ncml that requires authentication using R

I'm trying to get TRMM data from NASA's OPeNDAP server using the raster package in R. Initially I had some difficulty with authentication, but that issue was resolved.
The NASA OPeNDAP server publishes TRMM 3B42_daily data both as individual files, one for each day, and as an aggregated annual dataset (using ncml). My problem now is that, using the R raster package and the authentication files .dodsrc and .netrc, I can download the individual NetCDF files, but I can't download the aggregated data.
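For reference, here is a minimal sketch of creating those two authentication files from R; the Earthdata host name follows NASA's standard instructions, and MY_USER / MY_PASS are placeholders for your Earthdata Login credentials, so adjust paths and values for your system.
# Sketch: write ~/.netrc and ~/.dodsrc for NASA Earthdata / OPeNDAP access.
# MY_USER and MY_PASS are placeholders; paths assume a Unix-like home directory.
home <- Sys.getenv("HOME")
writeLines("machine urs.earthdata.nasa.gov login MY_USER password MY_PASS",
           file.path(home, ".netrc"))
Sys.chmod(file.path(home, ".netrc"), mode = "600")   # .netrc must not be world-readable
writeLines(c(paste0("HTTP.NETRC=", file.path(home, ".netrc")),
             paste0("HTTP.COOKIEJAR=", file.path(home, ".urs_cookies"))),
           file.path(home, ".dodsrc"))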
So, this works:
library(raster)
single_date_opendap <- 'https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/TRMM_L3/TRMM_3B42_Daily.7/2002/04/3B42_Daily.20020405.7.nc4'
test <- stack(single_date_opendap, varname = 'precipitation')
This doesn't:
library(raster)
url_opendap_no_brkt <- 'https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/ncml/aggregation/TRMM_3B42_Daily.7/TRMM_3B42_daily.7_Aggregation_2001.ncml'
test <- stack(url_opendap_no_brkt, varname = 'precipitation')
And gives me the error message:
Error in .local(.Object, ...) :
  An error occurred while creating a virtual connection to the DAP server:
  Error while reading the URL: https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/ncml/aggregation/TRMM_3B42_Daily.7/TRMM_3B42_daily.7_Aggregation_2001.ncml.
  The OPeNDAP server returned the following message:
  Unauthorized: Contact the server administrator.
Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer",
  Cannot create a RasterLayer object from this file. (file does not exist)
Is it possible to get data from an OPeNDAP server that publishes aggregated data?

After some exchange with NASA support, and with Antonio's tip, I found out that the R raster package will not work with the aggregated datasets, but ncdf4::nc_open can handle them. That is strange because, from what I understand, the raster package calls nc_open in the background.
Anyway, this works:
library(ncdf4)
url_opendap <- 'https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/ncml/aggregation/TRMM_3B42_Daily.7/TRMM_3B42_daily.7_Aggregation_2001.ncml'
trmm <- nc_open(url_opendap)
and this doesn't
library(raster)
url_opendap <- 'https://disc2.gesdisc.eosdis.nasa.gov:443/opendap/ncml/aggregation/TRMM_3B42_Daily.7/TRMM_3B42_daily.7_Aggregation_2001.ncml'
trmm <- stack(url_opendap, varname = "precipitation")
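For completeness, here is a minimal sketch of pulling one day of precipitation out of the aggregated file opened with nc_open above and wrapping it as a RasterLayer. The variable and dimension names ("precipitation", "lon", "lat") and the (lon, lat, time) dimension order are assumptions based on the single-file example; check names(trmm$var) and trmm$var$precipitation$dim before relying on them.
library(ncdf4)
library(raster)
trmm <- nc_open(url_opendap)
lon <- ncvar_get(trmm, "lon")
lat <- ncvar_get(trmm, "lat")
# Read only the first time step, so the whole year is not pulled over OPeNDAP at once.
prec <- ncvar_get(trmm, "precipitation",
                  start = c(1, 1, 1),
                  count = c(length(lon), length(lat), 1))
# ncvar_get returns a lon x lat matrix; transpose and flip so north ends up on top.
r <- raster(t(prec),
            xmn = min(lon), xmx = max(lon),
            ymn = min(lat), ymx = max(lat),
            crs = "+proj=longlat +datum=WGS84")
r <- flip(r, direction = "y")
nc_close(trmm)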

Related

R packages not connecting to peer for data download

I am working with a couple of R packages for genetic pathway enrichment analysis, and both packages are now throwing errors when trying to connect to their respective servers to download the reference data for the analysis.
In the first package, gage, I get the following error when attempting the download:
library(gage)
kg.ko = kegg.gsets("ko")  # ("ko" is the KEGG ortholog pathway)
Error in curl::curl_fetch_memory(url, handle = handle) :
Failure when receiving data from the peer
In the second package clusterProfiler, I am getting the following error:
library(clusterProfiler)
# the data
dput(head(de_kegg_chr))
c("K14847", "K19009", "K00078", "K21407", "K23285", "K06972")
# KEGG enrichment (which will pull relevant reference data during this step)
# over-representation analysis (Fisher's)
enrich <- enrichKEGG(gene = de_kegg_chr,
                     organism = "ko",
                     keyType = "kegg",
                     pvalueCutoff = 0.01)
Reading KEGG annotation online:
fail to download KEGG data...
Error in download.KEGG.Path(species) :
'species' should be one of organisms listed in 'http://www.genome.jp/kegg/catalog/org_list.html'...
In addition: Warning message:
In utils::download.file(url, quiet = TRUE, method = method, ...) :
URL 'https://rest.kegg.jp/link/ko/pathway': status was 'Failure when receiving data from the peer'
After the first error, I thought it was something specific to the gage package and found a simple workaround, because those data are downloaded from the server before the analysis function runs.
This is more of a problem with the second package, because the reference data are downloaded within the function that conducts the analysis.
Now that this is happening with more than one package (both of these scripts were working perfectly before yesterday), I'm thinking it is something systematic within R or RStudio.
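A minimal check (a sketch, using the URL taken from the warning above) to see whether the failure happens at the network layer rather than inside either package:
library(curl)
# Fetch the same KEGG endpoint that the packages try to reach.
res <- curl_fetch_memory("https://rest.kegg.jp/link/ko/pathway")
res$status_code                               # 200 means the connection itself works
cat(substr(rawToChar(res$content), 1, 200))   # first lines of the pathway/KO mapping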

Download failed using get_ssurgo in R from FedData package

I am unable to download a large SSURGO dataset using the FedData package in R. I get the same error using the example code from the package:
vepPolygon <- polygon_from_extent(raster::extent(672800, 740000, 4102000, 4170000),
                                  proj4string = '+proj=utm +datum=NAD83 +zone=12')
# Get the NRCS SSURGO data (USA ONLY)
SSURGO.VEPIIN <- get_ssurgo(template=vepPolygon, label='VEPIIN')
produces the error:
Error in { :
task 1 failed - "Download of https://sdmdataaccess.nrcs.usda.gov/Spatial/SDMNAD83Geographic.wfs?Service=WFS&Version=1.0.0&Request=GetFeature&Typename=SurveyAreaPoly&BBOX=-109.056777043215,37.033556543269,-108.279725827617,37.6609211495406 failed!"
It does work if I use extremely small templates, like 20' x 20', but my understanding is that this package is supposed to enable larger downloads. Has something changed with the Web Soil Survey since this package was created?
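As a sanity check (a sketch that simply requests the exact WFS URL from the error message), this separates a FedData problem from an outage or timeout on the NRCS Soil Data Access service:
# URL copied verbatim from the error above.
wfs_url <- paste0(
  "https://sdmdataaccess.nrcs.usda.gov/Spatial/SDMNAD83Geographic.wfs",
  "?Service=WFS&Version=1.0.0&Request=GetFeature&Typename=SurveyAreaPoly",
  "&BBOX=-109.056777043215,37.033556543269,-108.279725827617,37.6609211495406"
)
out <- tempfile(fileext = ".gml")
download.file(wfs_url, destfile = out, mode = "wb")  # a hang or timeout here points at the server, not FedData
file.size(out)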

Downloading Financial Statements in R with finstr

I'm trying to download financial statements in R using the package described at:
Financial statements in R
I'm trying to modify the example in their README for other companies, and have tried to download the last two Tesla quarterly filings (10-Qs).
The code I modified so far is:
xbrl_url2017Q3 <- "https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/tsla-20180930.xml"
xbrl_url2017Q2 <- "https://www.sec.gov/Archives/edgar/data/1318605/000156459018019254/tsla-20180630.xml"
old_o <- options(stringsAsFactors = FALSE)
xbrl_data_tsla2017Q3 <- xbrlDoAll(xbrl_url2017Q3)
Error from the line above is:
Error in fileFromCache(file) :
Error in download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'
In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459018026353/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '403 Forbidden'
xbrl_data_tsla2017Q2 <- xbrlDoAll(xbrl_url2017Q2)
options(old_o)
tsla2017Q3 <- xbrl_get_statements(xbrl_data_tsla2017Q3)
tsla2017Q2 <- xbrl_get_statements(xbrl_data_tsla2017Q2 )
tsla2017Q2
balance_sheet2017Q2 <- tsla2017Q2$StatementOfFinancialPositionClassified
balance_sheet2017Q3<- tsla2017Q3$StatementOfFinancialPositionClassified
income2017Q2 <- tsla2017Q2$StatementOfIncome
income2017Q3 <- tsla2017Q3$StatementOfIncome
balance_sheet2017Q3
Returns "NULL"
See the 10-Q in Tesla's SEC filings (the most recent 10-Q).
Any recommendations on how I can go about this?
I'm looking to download the financial data to play around with, and would like it in tidy format.
This is a common problem with the XBRL package, where not all XML schemas are downloaded into the cache for some SEC filings. Download the missing schema into your cache folder and retry the xbrlDoAll call; it should work this time.
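A hedged sketch of that workaround is below. The cache folder name "xbrl.Cache" is the package default but may differ on your machine (see the cache.dir argument of xbrlDoAll), and the schema URL is taken from the error message above.
schema_url <- "https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd"
cache_dir <- "xbrl.Cache"                      # assumed default cache folder of the XBRL package
dir.create(cache_dir, showWarnings = FALSE)
download.file(schema_url, file.path(cache_dir, basename(schema_url)), mode = "wb")
xbrl_data_tsla2017Q3 <- xbrlDoAll(xbrl_url2017Q3)   # retry once the schema is cached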

send2cy doesn't work in RStudio ~ CyREST, Cytoscape

When I run "send2cy" function in R studio, I got error.
# Basic setup
library(igraph)
library(RJSONIO)
library(httr)
dir <- "/currentdir/"
setwd(dir)
port.number = 1234
base.url = paste("http://localhost:", toString(port.number), "/v1", sep="")
print(base.url)
# Load list of edges as Data Frame
network.df <- read.table("./data/eco_EM+TCA.txt")
# Convert it into igraph object
network <- graph.data.frame(network.df,directed=T)
# Remove duplicate edges & loops
g.tca <- simplify(network, remove.multiple=T, remove.loops=T)
# Name it
g.tca$name = "Ecoli TCA Cycle"
# This function will be published as part of a utility package, but it is not ready yet.
source('./utility/cytoscape_util.R')
# Convert it into Cytoscape.js JSON
cygraph <- toCytoscape(g.tca)
send2cy(cygraph, 'default%20black', 'circular')
Error in file(con, "r") : cannot open the connection
Called from: file(con, "r")
But I don't get the error when I use the send2cy function from terminal R (running R from the terminal by just calling "R").
Any advice is welcome.
I tested your script with local copies of the network data and utility script, and with updated file paths. The script ran fine for me in RStudio.
Given the error message you are seeing ("Error in file...") I suspect the issue is with your local files and file paths... somehow in an RStudio-specific way?
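A small check of that hypothesis (a sketch): print the working directory RStudio is actually using and confirm that the files the script reads exist relative to it.
getwd()                                   # is this the directory you expect?
dir.exists("/currentdir/")                # the directory the script setwd()s into
file.exists("./data/eco_EM+TCA.txt")
file.exists("./utility/cytoscape_util.R")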
FYI: an updated, consolidated set of R scripts for Cytoscape is available here: https://github.com/cytoscape/cytoscape-automation/tree/master/for-scripters/R. I don't think anything has significantly changed, but perhaps trying in a new context will resolve the issue you are facing.

Download a custom dataset in Azure ML Jupyter/iPython Notebook using R

I need to download a custom dataset in an Azure Jupyter/iPython Notebook.
My ultimate goal is to install an R package. To be able to do this the package (the dataset) needs to be downloaded in code. I followed the steps outlined by Andrie de Vries in the comments section of this post: Jupyter Notebooks with R in Azure ML Studio.
Uploading the package as a ZIP file was without problems, but when I run the code in my notebook I get an error:
Error in curl(x$DownloadLocation, handle = h, open = conn): Failure when receiving data from the peer
Traceback:
download.datasets(ws, "plotly_3.6.0.tar.gz.zip")
lapply(1:nrow(datasets), function(j) get_dataset(datasets[j, ], ...))
FUN(1L[[1L]], ...)
get_dataset(datasets[j, ], ...)
curl(x$DownloadLocation, handle = h, open = conn)
So I simplified my code into:
library("AzureML")
ws <- workspace()
ds <- datasets(ws)
ds$Name
data <- download.datasets(ws, "plotly_3.6.0.tar.gz.zip")
head(data)
Where "plotly_3.6.0.tar.gz.zip" is the name of my dataset of data type "Zip".
Unfortunately this results in the same error.
To rule out data type issues I also tried to download another dataset of mine which is of data type "Dataset". Also the same error.
Next, I changed the dataset I want to download to one of the sample datasets of Azure ML Studio.
# "text.preprocessing.zip" is of data type Zip
data <- download.datasets(ws, "text.preprocessing.zip")
# "Flight Delays Data" is of data type GenericCSV
data <- download.datasets(ws, "Flight Delays Data")
Both of the sample datasets can be downloaded without problems.
So why can't I download my own saved dataset?
I could not find anything helpful in the documentation of the download.datasets function, neither on rdocumentation.org nor on cran.r-project.org (pages 17-18).
Try this:
library(AzureML)
ws <- workspace(
  id   = "your AzureML ID",
  auth = "your AzureML Key"
)
name <- "Name of your saved data"
data <- download.datasets(ws, name)
It turned out the error I got was due to a bug in the (then early) Azure ML Studio.
I tried again after Daniel Prager's reply, only to find out my code works as expected without any changes. Adding the id and auth parameters was not needed.
