NcML aggregation of remote THREDDS catalog - netcdf

I want to aggregate all files within a specific directory of a remote THREDDS catalog. These are GRIB2 files for the NAM forecast; the main catalog is a list of directories, one for each month. Here is my NcML file for the aggregation of this catalog of files:
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" >
<aggregation dimName="time" type="joinExisting">
<scan location="http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/" regExp="^.*\.grb2$" subdirs="false"/>
<dimension name="time" orgName="t" />
</aggregation>
</netcdf>
Also, I am mostly interested in having these two variables in the files: u-component_of_wind_height_above_ground and v-component_of_wind_height_above_ground.
I am not sure the above aggregation is correct for a remote catalog. I get this error from the above NcML file:
There are no datasets in the aggregation DatasetCollectionManager{ collectionName='http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/^.*\.grb2$' recheck=null
dir=http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/ filter=^.*\.grb2$
How should this NcML file be written?
Thanks.

You cannot glob remote URLs, so you will need to provide an explicit list of these OPeNDAP endpoints to the aggregation, like:
<dataset name="Nam218" urlPath="nam218">
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<aggregation dimName="time" type="joinExisting">
<netcdf location="http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/<file01>.grb2"/>
<netcdf location="http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/<file02>.grb2"/>
<netcdf location="http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/<file03>.grb2"/>
</aggregation>
</netcdf>
</dataset>
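One way to avoid typing that list by hand is to generate it from the catalog's catalog.xml. A minimal Python sketch of the idea (the embedded catalog fragment and dataset names are made up for illustration; a real THREDDS catalog.xml nests dataset elements with urlPath attributes in the InvCatalog namespace):

```python
import xml.etree.ElementTree as ET

# A made-up fragment in the shape of a THREDDS catalog.xml (illustration only).
CATALOG_XML = """<?xml version="1.0"?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="20180723">
    <dataset name="nam_218_20180723_0000_000.grb2"
             urlPath="nam218/201807/20180723/nam_218_20180723_0000_000.grb2"/>
    <dataset name="nam_218_20180723_0000_001.grb2"
             urlPath="nam218/201807/20180723/nam_218_20180723_0000_001.grb2"/>
    <dataset name="catalog_notes.txt" urlPath="nam218/201807/20180723/catalog_notes.txt"/>
  </dataset>
</catalog>"""

DODS_BASE = "http://www.ncei.noaa.gov/thredds/dodsC/"
DATASET_TAG = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}dataset"

def ncml_entries(catalog_xml):
    """Build <netcdf location=.../> lines for every .grb2 dataset in the catalog."""
    root = ET.fromstring(catalog_xml)
    return [f'<netcdf location="{DODS_BASE}{ds.get("urlPath")}"/>'
            for ds in root.iter(DATASET_TAG)
            if ds.get("urlPath", "").endswith(".grb2")]

for line in ncml_entries(CATALOG_XML):
    print(line)
```

In practice you would fetch the real catalog.xml over HTTP first (e.g. with urllib) and paste the printed lines into the aggregation.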

You can write a simple program for use at the command prompt (I used C++ on Windows). It launches a BAT file that runs wget to download the latest THREDDS catalog and save it as plain text; the C++ program then loads the entire file into a string, parses it, and does whatever is needed with the data.

Related

Is there a way to get the content created time of an Excel file in R?

I have been trying to obtain the time at which the content of an .xlsx file was created without any success so far. I can track the much-desired information on Windows either through File Properties -> Details -> Origin -> Content created, or by opening the Excel file and navigating to File -> Info -> Related Dates -> Created.
I was hoping that I would be able to obtain this information through openxlsx but while I am able to track down the creators by using the getCreators() function there does not appear to exist a similar function for the time.
I have also tried the file.info() function but it won't cut it as mtime, ctime, and atime all point to the time of the download.
Any help would be much appreciated!
I don't think openxlsx is going to do it for you, but you might want to submit a feature request for them to add or extend file-metadata availability. Here's something that works in a pinch, assuming the XLSX file is in the newer zip-based format and not the older binary format.
myfile <- "path/to/yourfile.xlsx"
docProps <- xml2::read_xml(unz(myfile, "docProps/core.xml"))
docProps
# {xml_document}
# <coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
# [1] <dc:creator>r2</dc:creator>
# [2] <cp:lastModifiedBy>r2</cp:lastModifiedBy>
# [3] <dcterms:created xsi:type="dcterms:W3CDTF">2021-05-09T20:01:41Z</dcterms:created>
# [4] <dcterms:modified xsi:type="dcterms:W3CDTF">2021-05-10T00:14:14Z</dcterms:modified>
xml2::xml_text(xml2::xml_find_all(docProps, "dcterms:created"))
# [1] "2021-05-09T20:01:41Z"
It's a text file, so in a pinch you can look at it manually, but I recommend not trying to do regex on XML in general. (You could get away with it here, but it's still fraught with peril.)
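The same trick works outside R, since an XLSX file is just a ZIP archive whose docProps/core.xml holds the creation timestamp. A Python sketch of the idea (a minimal in-memory archive stands in for a real workbook here; for an actual file you would pass its path to zipfile.ZipFile):

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Stand-in for a real .xlsx: a ZIP containing only docProps/core.xml.
CORE_XML = """<?xml version="1.0"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:created>2021-05-09T20:01:41Z</dcterms:created>
  <dcterms:modified>2021-05-10T00:14:14Z</dcterms:modified>
</cp:coreProperties>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", CORE_XML)

def content_created(xlsx):
    """Return the dcterms:created timestamp from an XLSX's embedded core.xml."""
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    return root.findtext("{http://purl.org/dc/terms/}created")

print(content_created(buf))  # "2021-05-09T20:01:41Z" for this stand-in
```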

How can I parse an XML file in R, which has probably been generated using SSRS?

In my job I have to perform analytics on data shared by an external organisation through user access granted on a web portal. Various reports are available there, which I can view and download in many formats. Two of these formats are very useful, namely MS Excel and 'XML file with report data'. The Excel file is normally heavily formatted (with sub-totals, merged cells, etc.) to suit Excel users, and converting these Excel files to a data frame/table is normally a big hassle. I therefore prefer to download the XML file, parse it, save it as CSV, and then carry out my analysis in R.
However, whenever I try to parse the XML file directly in R (to avoid the intervening convert-to-CSV step) I never succeed. So far I have tried the XML and xml2 libraries in R, but to no avail.
Recently I tried this code:
library(XML)
library(methods)
setwd("C:\\Users\\Administrator\\Desktop\\")
res <- xmlParse("Skil.xml")
# xmlns: URI RptSancDig_VoucherCompilationSheet is not absolute
rootnode <- xmlRoot(res)
rootsize <- xmlSize(rootnode)
rootsize
# [1] 2
xmldataframe <- xmlToDataFrame("Skil.xml")
# xmlns: URI RptSancDig_VoucherCompilationSheet is not absolute
xmldataframe
#   Textbox24 Textbox63 DDOName_Collection
# 1      <NA>      <NA>               <NA>
# 2
Just to mention, Skil.xml is about 12.1 MB and is successfully parsed by Excel.
I have also tried the read_xml() function from xml2, but to no avail.
I would happily share a sample file, but I am unable to do so. Moreover, I am also unable to generate a sample file in that kind of XML format.
Can someone help?

Converting csv into a standardized xml format using R

I'm trying to take biological data saved in a .csv format and convert it into a specific xml format set by Darwin Core standards (an extension of Dublin Core). The data are set up in rows of observation records with headers in the first row. I need to repackage the data with Darwin Core standard XML tags using a basic XML tree/schema. The purpose is to standardize the data and make it readily available to load into any kind of database program.
I am a biologist, so I'm fairly new to computer programming and code. I would like to write something in R or Excel that can do this repackaging step automatically, so I don't have to manually re-enter thousands of records.
I have tried using the developer tools in excel 365 to save the .csv as an .xml file, but it seems like I would have to develop the xml tree or schema in a text editor program first. Also, it seems like the xml add-ons that I would use are no longer available. I have downloaded the free text editor called "Brackets" build 1.14 to write some simple xml. I also have RStudio version 1.1.419 with the XML package downloaded to potentially write a script with R version 3.4.3. I've read up on all the Darwin Core Terms and basic XML syntax and rules, but I don't really know where to start.
This is an example of the data in simple .csv format:
type,institutionCode,collectionCode,catalogNumber,scientificName,individualCount,datasetID
PhysicalObject,ANSP,PH,123,"Cryptantha gypsophila Reveal & C.R. Broome",12,urn:lsid:tim.lsid.tdwg.org:collections:1
PhysicalObject,ANSP,PH,124,"Buxbaumia piperi",2,urn:lsid:tim.lsid.tdwg.org:collections:1
This is what the records should look like as an end product:
[<?xml version="1.0"?>
<dwr:SimpleDarwinRecordSet
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
<dwr:SimpleDarwinRecord>
<dcterms:type>PhysicalObject</dcterms:type>
<dwc:institutionCode>ANSP</dwc:institutionCode>
<dwc:collectionCode>PH</dwc:collectionCode>
<dwc:catalogNumber>123</dwc:catalogNumber>
<dwc:scientificName>Cryptantha gypsophila reveal & C.R. Boome</dwc:scientificName>
<dwc:individualCount>12</dwc:individualCount>
<dwc:datasetID>urn:lsid:tim.lsid.tdwg.org:collections:1</dwc:datasetID>
</dwr:SimpleDarwinRecord>
<dwr:SimpleDarwinRecord>
<dcterms:type>PhysicalObject</dcterms:type>
<dwc:institutionCode>ANSP</dwc:institutionCode>
<dwc:collectionCode>PH</dwc:collectionCode>
<dwc:catalogNumber>124</dwc:catalogNumber>
<dwc:scientificName>Buxbaumia piperi</dwc:scientificName>
<dwc:individualCount>2</dwc:individualCount>
<dwc:datasetID>urn:lsid:tim.lsid.tdwg.org:collections:1</dwc:datasetID>
</dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>]
This can be done in a number of ways. Here, I go for the stringr solution because it's easy to read what the inputs are.
The code below imports the data, writes the first part of the XML, then writes a SimpleDarwinRecord for each row, and finally writes the last part of the file. unlink is there to clean up before anything is appended to the file. If indentation matters (apparently it doesn't), you may need to tweak the template a bit.
This could also be done using a Jinja2 template and Python.
library(stringr)
xy <- read.table(text = 'type,institutionCode,collectionCode,catalogNumber,scientificName,individualCount,datasetID
PhysicalObject,ANSP,PH,123,"Cryptantha gypsophila Reveal & C.R. Broome",12,urn:lsid:tim.lsid.tdwg.org:collections:1
PhysicalObject,ANSP,PH,124,"Buxbaumia piperi",2,urn:lsid:tim.lsid.tdwg.org:collections:1', header = TRUE,
sep = ",")
unlink("output.txt")
outfile <- file(description = "output.txt", open = "at")
writeLines('[<?xml version="1.0"?>
<dwr:SimpleDarwinRecordSet
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">', con = outfile)
writeLines(str_glue('<dwr:SimpleDarwinRecord>
<dcterms:type>{xy$type}</dcterms:type>
<dwc:institutionCode>{xy$institutionCode}</dwc:institutionCode>
<dwc:collectionCode>{xy$collectionCode}</dwc:collectionCode>
<dwc:catalogNumber>{xy$catalogNumber}</dwc:catalogNumber>
<dwc:scientificName>{xy$scientificName}</dwc:scientificName>
<dwc:individualCount>{xy$individualCount}</dwc:individualCount>
<dwc:datasetID>{xy$datasetID}</dwc:datasetID>
</dwr:SimpleDarwinRecord>'), con = outfile)
writeLines(
'</dwr:SimpleDarwinRecordSet>]',
con = outfile)
close(outfile)
This is the result:
[<?xml version="1.0"?>
<dwr:SimpleDarwinRecordSet
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
<dwr:SimpleDarwinRecord>
<dcterms:type>PhysicalObject</dcterms:type>
<dwc:institutionCode>ANSP</dwc:institutionCode>
<dwc:collectionCode>PH</dwc:collectionCode>
<dwc:catalogNumber>123</dwc:catalogNumber>
<dwc:scientificName>Cryptantha gypsophila Reveal & C.R. Broome</dwc:scientificName>
<dwc:individualCount>12</dwc:individualCount>
<dwc:datasetID>urn:lsid:tim.lsid.tdwg.org:collections:1</dwc:datasetID>
</dwr:SimpleDarwinRecord>
<dwr:SimpleDarwinRecord>
<dcterms:type>PhysicalObject</dcterms:type>
<dwc:institutionCode>ANSP</dwc:institutionCode>
<dwc:collectionCode>PH</dwc:collectionCode>
<dwc:catalogNumber>124</dwc:catalogNumber>
<dwc:scientificName>Buxbaumia piperi</dwc:scientificName>
<dwc:individualCount>2</dwc:individualCount>
<dwc:datasetID>urn:lsid:tim.lsid.tdwg.org:collections:1</dwc:datasetID>
</dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>]
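One caveat with the template approach: characters such as & are written unescaped (see the Cryptantha record above), which strictly makes the output invalid XML. Here is an alternative sketch in Python using only the standard library, which escapes reserved characters; the column-to-namespace mapping (dcterms: for type, dwc: for everything else) follows the sample output above:

```python
import csv
import io
from xml.sax.saxutils import escape

# The sample records from the question.
CSV_DATA = '''type,institutionCode,collectionCode,catalogNumber,scientificName,individualCount,datasetID
PhysicalObject,ANSP,PH,123,"Cryptantha gypsophila Reveal & C.R. Broome",12,urn:lsid:tim.lsid.tdwg.org:collections:1
PhysicalObject,ANSP,PH,124,"Buxbaumia piperi",2,urn:lsid:tim.lsid.tdwg.org:collections:1'''

# Every column maps to the dwc: namespace except "type", which is dcterms:.
PREFIX = {"type": "dcterms"}

HEADER = '''<?xml version="1.0"?>
<dwr:SimpleDarwinRecordSet
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ http://rs.tdwg.org/dwc/xsd/tdwg_dwc_simple.xsd"
 xmlns:dcterms="http://purl.org/dc/terms/"
 xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
 xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">'''

def darwin_core(csv_text):
    """Convert CSV rows into a SimpleDarwinRecordSet, escaping &, <, >."""
    parts = [HEADER]
    for row in csv.DictReader(io.StringIO(csv_text)):
        parts.append(" <dwr:SimpleDarwinRecord>")
        for col, value in row.items():
            ns = PREFIX.get(col, "dwc")
            parts.append(f"  <{ns}:{col}>{escape(value)}</{ns}:{col}>")
        parts.append(" </dwr:SimpleDarwinRecord>")
    parts.append("</dwr:SimpleDarwinRecordSet>")
    return "\n".join(parts)

print(darwin_core(CSV_DATA))
```

Because the columns are read from the CSV header row, the same function handles any set of Darwin Core terms without editing the template.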

save file in XYZ format as vector (GML or shp)

I am using QGIS software. I would like to show the value of each raster cell as a label.
My idea (I don't know of a QGIS plugin or built-in functionality that makes this easier) is to export the raster using gdal2xyz.py into coordinates-value format and then save it as a vector (GML or shapefile). For this second task, I tried gdal_polygonize.py:
gdal_polygonize.py rainfXYZ.txt rainf.shp
Creating output rainf.shp of format GML.
0...10...20...30...40...50...60...70...80...90...100 - done.
Unfortunately I am unable to load the created file (even if I change the extension to .gml); the ogr2ogr tool doesn't even recognize this format.
Yes, sorry, I forgot to add this information.
In general, after preparing the CSV file (using gdal2xyz.py with the -csv option), I need to add one line at the beginning of it:
"Longitude,Latitude,Value" (without the quotes)
Then I need to create a VRT file which contains:
<OGRVRTDataSource>
  <OGRVRTLayer name="Shapefile_name">
    <SrcDataSource>Shapefile_name.csv</SrcDataSource>
    <GeometryType>wkbPoint</GeometryType>
    <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
  </OGRVRTLayer>
</OGRVRTDataSource>
Then I run the command "ogr2ogr -select Value Shapefile_name.shp Shapefile_name.vrt". I got the file evap_OBC.shp and two other associated files.
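The header-prepending and VRT-writing steps above are easy to script. A Python sketch, under the assumption that the gdal2xyz output is a headerless Longitude,Latitude,Value CSV (filenames here are placeholders; the final conversion still requires GDAL's ogr2ogr, run from the same directory):

```python
import tempfile
from pathlib import Path

def prepare_vrt(csv_path, layer_name):
    """Prepend the Longitude,Latitude,Value header to a gdal2xyz CSV
    and write a matching OGR VRT next to it; returns the VRT path."""
    csv_file = Path(csv_path)
    body = csv_file.read_text()
    if not body.startswith("Longitude,Latitude,Value"):
        csv_file.write_text("Longitude,Latitude,Value\n" + body)
    vrt_text = f'''<OGRVRTDataSource>
  <OGRVRTLayer name="{layer_name}">
    <SrcDataSource>{csv_file.name}</SrcDataSource>
    <GeometryType>wkbPoint</GeometryType>
    <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
  </OGRVRTLayer>
</OGRVRTDataSource>
'''
    vrt_file = csv_file.with_suffix(".vrt")
    vrt_file.write_text(vrt_text)
    return vrt_file

# Demo with a throwaway CSV standing in for gdal2xyz.py -csv output.
demo = Path(tempfile.mkdtemp()) / "Shapefile_name.csv"
demo.write_text("1.0,2.0,3.5\n")
vrt_path = prepare_vrt(demo, "Shapefile_name")
print(vrt_path.read_text())
# afterwards: ogr2ogr -select Value Shapefile_name.shp Shapefile_name.vrt
```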
For the sake of archive completeness: this question was also asked on the GDAL mailing list in the thread "save raster as point-vector file". It seems Chaitanya provided a solution there.

standalone library for OpenOffice formula renderer?

Is there a standalone library of OpenOffice's formula renderer? I'm looking for something that can take plain text (e.g. E = mc^2) in the same syntax as used by OpenOffice, and convert to png or pdf fragments.
(note: I don't need the WYSIWYG editor, just the renderer. Basically I would like to work in OpenOffice to interactively edit my formulas, and then copy the source text for use in other contexts w/o needing OpenOffice to render them.)
I'm using unoconv to convert OpenOffice/LibreOffice document to PDF.
However, first I had to create some input document with a formula.
Unfortunately, it is not possible to use just the formula editor to create the ODF file, because the output PDF file would contain weird headers and footers.
Therefore, I created a simple text document (in Writer) and embedded the formula as a single object (aligned as a character). I saved the ODT file, unzipped it (since ODT is just a ZIP) and edited the content. Then, I identified what files can be deleted and formatted the remaining files to get a minimal example.
In my example, the formula itself is located in Formula/content.xml. It should be easy to change just the code within the <annotation>...</annotation> tags in an automated way.
Finally, I zipped the directory and produced a new ODT file.
Then, using unoconv and pdfcrop, I produced a nice formula as PDF.
# this trick prevents zip from creating an additional directory
cd formula.odt.unzipped
zip -r ../formula.odt .
cd ..
unoconv -f pdf formula.odt # ODT to PDF
pdfcrop formula.pdf # keep only the formula
# you can convert the PDF to bitmap as follows
convert -density 300x300 formula-crop.pdf formula.png
Here is the minimal content of the unzipped ODT directory for formula.odt:
formula.odt.unzipped/Formula/content.xml
formula.odt.unzipped/META-INF/manifest.xml
formula.odt.unzipped/content.xml
File formula.odt.unzipped/Formula/content.xml contains:
<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<annotation encoding="StarMath 5.0">
f ( x ) = sum from { { i = 0 } } to { infinity } { {f^{(i)}(0)} over {i!} x^i}
</annotation>
</semantics>
</math>
File formula.odt.unzipped/content.xml contains:
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink">
<office:body>
<office:text>
<text:p>
<draw:frame>
<draw:object xlink:href="./Formula"/>
</draw:frame>
</text:p>
</office:text>
</office:body>
</office:document-content>
File formula.odt.unzipped/META-INF/manifest.xml contains:
<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
<manifest:file-entry manifest:full-path="/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.text"/>
<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Formula/content.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Formula/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.formula"/>
</manifest:manifest>
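The annotation swap described above (changing only the StarMath code between the <annotation> tags and re-zipping) can be automated. A minimal Python sketch, assuming the unzipped layout listed above; a plain re-zip suffices here because this minimal ODT carries no mimetype entry:

```python
import re
import tempfile
import zipfile
from pathlib import Path

def set_formula(unzipped_dir, starmath):
    """Replace the StarMath source between the <annotation> tags."""
    content = Path(unzipped_dir, "Formula", "content.xml")
    xml = content.read_text(encoding="utf-8")
    xml = re.sub(r"(<annotation[^>]*>).*?(</annotation>)",
                 lambda m: m.group(1) + starmath + m.group(2),
                 xml, flags=re.DOTALL)
    content.write_text(xml, encoding="utf-8")

def rezip(unzipped_dir, odt_path):
    """Zip the directory contents back into an ODT (no extra top-level dir)."""
    root = Path(unzipped_dir)
    with zipfile.ZipFile(odt_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(root.rglob("*")):
            if f.is_file():
                zf.write(f, f.relative_to(root).as_posix())

# Demo on a stripped-down stand-in for the unzipped template.
demo = Path(tempfile.mkdtemp(), "formula.odt.unzipped")
(demo / "Formula").mkdir(parents=True)
(demo / "Formula" / "content.xml").write_text(
    '<math><semantics><annotation encoding="StarMath 5.0">'
    "E = mc^2</annotation></semantics></math>", encoding="utf-8")
set_formula(demo, "a^2 + b^2 = c^2")
odt = demo.parent / "formula.odt"
rezip(demo, odt)
```

After this, unoconv and pdfcrop can be run on the new ODT exactly as in the shell steps above.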
There are several web services that run LaTeX for you and return an image. For instance, http://rogercortesi.com/eqn/
