How can I read gzip compressed grib files in R? - r

I am trying to open multi-sensor precipitation data from EUMETSAT in R. The data are only available with GZIP compression, in GRIB format. When I download the data I get a tar file.
How can I open these data in R?
I tried to use code
> untar("1098496-1of1")
but got error message
Error in gzfile(path.expand(tarfile), "rb") : cannot open the connection
In addition: Warning message:
In gzfile(path.expand(tarfile), "rb") :
cannot open compressed file '1098496-1of1', probable reason 'No such file or directory'
But when I use the following code:
> dir.create("rainfalldataeumetstatR")
> getwd()
[1] "C:/Users/st/Documents"
> untar("1098496-1of1.tar")
> untar("1098496-1of1.tar", files="rainfalldataeumetstatR")
> list.files("rainfalldataeumetstatR")
I don't get any files in my directory, and the result is:
character(0)
Could this be happening because the files inside the tar archive are gz archives?

I, too, have grappled with opening GRIB files in R. You have several problems and can tackle them one by one.
For the untar and gzip issues, work from the command line. I don't know how the tar package is built/packaged from Eumetsat; does it create a directory and put all the data files in that directory? In that case, put the tarball in a top-level data directory and then
tar xvf tar_file_name
cd (to the directory that was just created)
gunzip *.gz
Note down the full path name of the files you will want to open for later use.
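The same two steps can also be done from inside R. Note that the question's second call passed the directory name to files, which selects archive members to extract; the target directory belongs in exdir. Below is a minimal sketch using only base R. The helper names gunzip_file and extract_grib_tar are made up for illustration, and it assumes the archive contains plain .gz members.

```r
# Decompress one .gz file next to the original, using only base R.
gunzip_file <- function(src, dest = sub("\\.gz$", "", src)) {
  con_in <- gzfile(src, "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = 1e6)  # up to ~1 MB per read
    if (length(chunk) == 0) break
    writeBin(chunk, con_out)
  }
  invisible(dest)
}

# Unpack a tar file into exdir, then decompress every .gz file found there.
# Returns the paths of the decompressed files.
extract_grib_tar <- function(tarfile, exdir) {
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  untar(tarfile, exdir = exdir)
  gz <- list.files(exdir, pattern = "\\.gz$", recursive = TRUE,
                   full.names = TRUE)
  vapply(gz, gunzip_file, character(1), USE.NAMES = FALSE)
}
```

With the names from the question, the call would look like extract_grib_tar("1098496-1of1.tar", "rainfalldataeumetstatR").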
Are the files in GRIB1 or GRIB2? If in GRIB1, you need to install wgrib. If in GRIB2, you need to install wgrib2. Both are available from NCEP.
You can download them from:
http://www.cpc.ncep.noaa.gov/products/wesley/
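If you are unsure which edition you have, the file header can tell you. In both GRIB1 and GRIB2, a message starts with the four characters "GRIB" and stores the edition number in the eighth byte. The sketch below reads only those bytes; the function name grib_edition is made up for illustration, and it assumes the (already decompressed) file begins directly with a GRIB message, which is not guaranteed if the provider prepends other header bytes.

```r
# Return 1 or 2 for a GRIB1/GRIB2 file, or NA if the file does not
# start with a GRIB message.
grib_edition <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 8)
  if (length(header) < 8 || !identical(header[1:4], charToRaw("GRIB"))) {
    return(NA_integer_)
  }
  as.integer(header[8])  # octet 8 holds the edition number
}
```

A result of 1 points to wgrib, a result of 2 to wgrib2.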
In R 3.1 or later, install the rNOMADS package, version 2.0.1 or later.
NOAA National Operational Model Archive and Distribution System (NOMADS) distributes global grid data in GRIB format (currently in GRIB2).
rNOMADS helps you open GRIB1 and GRIB2 data in R by calling wgrib or wgrib2 to decode the binary GRIB data and pipe it (in csv format) for R to read in.
Open up R, load up rNOMADS, and then call the ReadGrib routine using the full path name of your data file in "data_file_name". This is not the way described in the rNOMADS documentation, but it works.
Installing wgrib and wgrib2 is the only hard part and it may not even be that hard, depending on your system. I'm writing tutorials on how to install wgrib, wgrib2 and use rNOMADS with local data files. When I am done, they will be posted here:
http://rda.ucar.edu/datasets/ds083.2/#!software
Now for some bad news:
You need to open each file sequentially. But, you can extract and save the subfields you need, and then read in the next datafile, overwriting the large data structure into which you read the previous file. If that is too much of a PITA, have you considered using the GRADS tool for displaying GRIB data?
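The read-extract-overwrite loop described above could be sketched as follows. This is only an illustration: process_files, read_fun and extract_fun are hypothetical names, and read_fun stands in for whatever reader you use (for example, a small wrapper around ReadGrib).

```r
# Read files one at a time and keep only a small extract of each.
# read_fun:    function(path) returning the full (large) data structure
# extract_fun: function(data) returning the piece you want to keep
process_files <- function(files, read_fun, extract_fun) {
  results <- vector("list", length(files))
  names(results) <- basename(files)
  for (i in seq_along(files)) {
    big <- read_fun(files[i])             # overwritten on every iteration
    results[i] <- list(extract_fun(big))  # list() keeps NULL results in place
  }
  results
}
```

Only one large object is held in memory at a time; the returned list contains just the extracted pieces, named by file.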

There is no native way to read grib files into R. Use wgrib or wgrib2 depending on whether your file is in grib or grib2 format. I am the package manager for rNOMADS - and trust me, we tried to figure out a simple R way, and ended up dropping it. Maybe the folks at NCEP will do it someday, but it's out of our skill range.

Personally, I untar my files using Cygwin, partly because the wgrib package in Cygwin lets you produce an inventory file, so you can tell R what data each layer contains. Assuming the data is GRIB1, R can read it directly. GRIB2 requires wgrib2 on your machine; rNOMADS is working on that challenge.
I recently found a great website that shows how to install wgrib so that it can be run from R in conjunction with rNOMADS.
https://bovineaerospace.wordpress.com/2015/04/26/how-to-install-rnomads-with-grib-file-support-on-windows/#comments

Related

Error message "Error: openxlsx can only read .xlsx or .xlsm files" in R 4.1.3

I have a script I was using a few months ago, and I was asked to re-output it using new source data. Now I get the message "Error: openxlsx can only read .xlsx or .xlsm files."
Of course, it IS an .xlsx file.
Not only that, I went back and ran the same script on the old source file (which worked), and...I get the same error!! I haven't changed any code, but my version of R has been updated by the administrators from 3.6 to 4.1.3 (I work in a virtual environment). I have confirmed that openxlsx version 4.2.5 is installed.
I've seen in other posts that people recommend using other packages to read xlsx files. That is not an ideal option here for administrative reasons (getting permission to install new packages can be very time-consuming and may blow deadlines). I've started pursuing that option anyway, but in the meantime, does anyone have any ideas?
Unfortunately, changing the format (i.e. exporting as csv and using read.csv) is also not an option, because we're auditors and doing that will break the audit trail.
OK, a colleague of mine solved the problem.
The source file has the extension ".XLSX". For openxlsx to read the file, you have to change the extension IN THE CODE to ".xlsx". In my case I didn't even have to change the actual extension of the file, just the quoted reference in the code, although other colleagues say they did have to rename the file itself.
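To avoid editing paths by hand, the extension can be normalised in code. The sketch below uses only base R; the function name xlsx_path_for_openxlsx is made up for illustration. It first tries the lower-case spelling of the path (which resolves to the same file on a case-insensitive file system) and otherwise copies the file to a temporary path with a lower-case extension, leaving the original untouched. Whether a temporary copy is acceptable for your audit trail is something only you can judge.

```r
# Return a path to the same workbook whose extension is lower-case.
xlsx_path_for_openxlsx <- function(path) {
  ext <- tools::file_ext(path)
  if (ext %in% c("xlsx", "xlsm")) return(path)  # already acceptable
  if (!tolower(ext) %in% c("xlsx", "xlsm")) {
    stop("not an .xlsx/.xlsm file: ", path)
  }
  fixed <- paste0(tools::file_path_sans_ext(path), ".", tolower(ext))
  if (file.exists(fixed)) return(fixed)  # case-insensitive file system
  tmp <- tempfile(fileext = paste0(".", tolower(ext)))
  if (!file.copy(path, tmp)) stop("could not copy ", path)
  tmp
}
```

The result can then be passed on, for example as openxlsx::read.xlsx(xlsx_path_for_openxlsx("source.XLSX")), where "source.XLSX" is a placeholder file name.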

Not automatically downloading dropbox files when opening data in R

I recently bought a new MacBook Pro (M1) and am trying to run R code.
I open and run an R script that I have been working on, but data such as csv files, Excel files, and RData files stored in Dropbox are not loaded. It works only after I manually download the files in Finder; otherwise the files are not downloaded automatically when R tries to load them.
Here is an example of the code and the error message
setwd('/Users/xxx/Dropbox/Data')
load('dta.RData')
Error in load("dta.RData") :
empty (zero-byte) input file
If I manually download "dta.RData" in Finder (using Make Available Offline) and run the same code, it works well.
Any solution?
I found this:
https://help.dropbox.com/installs-integrations/desktop/macos-12-monterey-support
I guess we will have to wait until Dropbox finds a solution.

Submitting a package to CRAN: .tar.gz file

I have a package I am ready to submit to CRAN (everything checks out). However, in the spot where it says Choose File, I am unsure what file to choose, as it says it requires a .tar.gz file, which I gather is some kind of compressed file?
Do I need to compress everything into a .tar.gz file? If so, how?
If not, I have a .Rproj file, and various files like namespace and description and license, so it is unclear to me which file to submit.
I apologize if this is a simple question, this is my first package to be submitted.
You have two options here. Use R's command-line build tool from a terminal (not from the R prompt):
$ R CMD build /path/to/package/directory
Or use devtools::build from within R:
R> devtools::build( "path/to/package/directory" )
Both result in a tar.gz file on your local file system. The name will look like: mypackage_[Version].tar.gz
It is this file that you upload to CRAN.

R script to check if my zip folder is corrupt

Python has a testzip() method (on zipfile.ZipFile objects) to check whether a zip file is corrupt. I would like to know its equivalent in R.
How can I check whether a zip file is corrupt by writing a small script in R?
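One possible approach, offered as a sketch rather than an exact counterpart of testzip(): let base R's unzip() try to read the archive and treat any error or warning as a sign of corruption. The function name zip_is_readable is made up for illustration. With full = FALSE only the archive's file listing is read; with full = TRUE every member is extracted into a temporary directory, which is slower but exercises the compressed data as well. How thoroughly checksums are verified depends on the unzip implementation R uses.

```r
# TRUE if the zip file can be listed (and, with full = TRUE, extracted)
# without errors or warnings; FALSE otherwise.
zip_is_readable <- function(zipfile, full = FALSE) {
  if (!file.exists(zipfile)) return(FALSE)
  tmp <- tempfile("zipcheck")
  dir.create(tmp)
  on.exit(unlink(tmp, recursive = TRUE))
  tryCatch({
    listing <- utils::unzip(zipfile, list = TRUE)  # read the file listing
    if (full) utils::unzip(zipfile, exdir = tmp)   # try extracting everything
    is.data.frame(listing)
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}
```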

PDF to text in R in Mac

I have installed pdftotext on my Mac and wrote the following code to convert PDF files to text:
pdf_to_load =("~/my_directory/my.pdf")
system(paste('pdftotext', pdf_to_load))
The code runs without errors, but I cannot find my.txt in the source directory, and it has not been saved anywhere else in my folders either. Where did I go wrong?
One of my mentors was able to run the same code on his computer, and he could see the converted .txt file.
Kindly guide.
You get a wrong result if the default PDF extraction engine is not found on your computer, see ?tm::readPDF. Those engines are not part of R or of the tm package, and it depends on your computer whether the necessary programs are already installed.
The easiest solution is to install the programs pdftotext and pdfinfo (you'll need both), which you can obtain as precompiled binaries here.
Once these programs are correctly installed, you should be able to extract the text of the PDF file without a system call, by using the readPDF() function of the tm package
library(tm)
my_pdf_txt <- readPDF(control=list(text="-layout"))(elem=list(uri="~/my_directory/my.pdf"), language="en")
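If you would rather keep the direct call to pdftotext from the question, it can help to check that R can actually find the program, to expand the "~" yourself, and to name the output file explicitly so you know where it ends up. The sketch below assumes a Unix-like system such as macOS; the function name pdf_to_text is made up for illustration.

```r
# Convert a PDF to text with the external pdftotext program.
# Returns the path of the text file that was written.
pdf_to_text <- function(pdf, txt = sub("\\.pdf$", ".txt", pdf,
                                       ignore.case = TRUE)) {
  pdf <- path.expand(pdf)   # turn "~" into the full home directory path
  txt <- path.expand(txt)
  if (!file.exists(pdf)) stop("PDF not found: ", pdf)
  exe <- unname(Sys.which("pdftotext"))
  if (!nzchar(exe)) stop("pdftotext was not found on the PATH that R sees")
  status <- system2(exe, c("-layout", shQuote(pdf), shQuote(txt)))
  if (status != 0) stop("pdftotext failed with exit status ", status)
  txt
}
```

For the file in the question, pdf_to_text("~/my_directory/my.pdf") would write my.txt next to the PDF and return its full path.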
