NetCDF - What are 'phony_dim_0', 'phony_dim_1', 'phony_dim_2'?

I am very new to NetCDF files and am at the exploratory stage, trying to understand what this file contains. I am using the 'netCDF4' Python library. What do 'phony_dim_0', 'phony_dim_1', and 'phony_dim_2' mean, and what do they contain? Could they be 'lat', 'lon', and/or 'time'?
Loading nc file
import netCDF4 as nc
ds = nc.Dataset('my_file.nc')
type(ds)
>>> netCDF4._netCDF4.Dataset
Printing Keys
print(ds.dimensions.keys())
>>> dict_keys(['phony_dim_0', 'phony_dim_1', 'phony_dim_2'])
Extract what is in this key?
ds.dimensions['phony_dim_0']
>>> <class 'netCDF4._netCDF4.Dimension'>: name = 'phony_dim_0', size = 1179
Error:
for c in ds.dimensions['phony_dim_0']:
    print(c)  # Want to see what is in this
# TypeError: 'netCDF4._netCDF4.Dimension' object is not iterable
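As the printed output shows, a Dimension carries only a name and a size, which is why it cannot be iterated. The values live in ds.variables. As far as I know, names of the form phony_dim_N are generated by the netCDF library when the underlying HDF5 file does not record dimension names, so the names alone do not say which axis is 'lat', 'lon', or 'time'. A minimal sketch of how one might list the variables defined on a given dimension follows. The helper name variables_using and the stand-in dataset (variable names and shapes) are made up for illustration. With a real file you would pass the open netCDF4 Dataset instead, relying only on its variables mapping and each variable's dimensions and shape attributes.

```python
from types import SimpleNamespace


def variables_using(ds, dim_name):
    """Return {variable name: shape} for every variable defined on dim_name."""
    return {
        name: var.shape
        for name, var in ds.variables.items()
        if dim_name in var.dimensions
    }


# Stand-in for an open netCDF4.Dataset; the variable names and shapes are
# hypothetical and only mimic the attributes used by the helper.
fake_ds = SimpleNamespace(
    variables={
        "temperature": SimpleNamespace(
            dimensions=("phony_dim_0", "phony_dim_1"), shape=(1179, 3)
        ),
        "flag": SimpleNamespace(dimensions=("phony_dim_2",), shape=(7,)),
    }
)

print(variables_using(fake_ds, "phony_dim_0"))  # {'temperature': (1179, 3)}
```

Once you know which variables use a dimension, slicing one of them (for example ds.variables[name][:]) shows the actual data, and its attributes may hint at what the axis represents.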

Related

Cannot write int16 data type using R's rhdf5 package

In R I would like to write a matrix of integers into an HDF5 file ".h5" as an int16 data type. To do so I am using the rhdf5 package. The documentation says that you should set one of the supported H5 data types when creating the dataset. However, even when setting up the int16 data type the result is always int32. Is it possible to store the data as int16 or uint16?
library(rhdf5)
m <- matrix(1,5,5)
outFile <- "test.h5"
h5createFile(outFile)
h5createDataset(file=outFile,"m",dims=dim(m),H5type = "H5T_NATIVE_INT16")
h5write(m,file=outFile,name="m")
H5close()
h5ls(outFile)
The result is:
I used another library, hdf5r, because I could not find rhdf5:
library(hdf5r)
m <- matrix(1L,5L,5L)
outFile <- h5file("test.h5")
createDataSet(outFile, "m", m, dtype=h5types$H5T_NATIVE_INT16)
print(outFile)
print(outFile[["m"]])
h5close(outFile)
For the first print (the file)
Class: H5File
Filename: D:\Travail\Projets\SWM\swm.gps\test.h5
Access type: H5F_ACC_RDWR
Listing:
name obj_type dataset.dims dataset.type_class
m H5I_DATASET 5 x 5 H5T_INTEGER
Here we see it displays H5T_INTEGER as the datatype for the dataset m
and the second (the dataset)
Class: H5D
Dataset: /m
Filename: D:\Travail\Projets\SWM\swm.gps\test.h5
Access type: H5F_ACC_RDWR
Datatype: H5T_STD_I16LE
Space: Type=Simple Dims=5 x 5 Maxdims=Inf x Inf
Chunk: 64 x 64
We can see that it has the right datatype H5T_STD_I16LE
The code you provided works as expected, but it's a limitation of the h5ls() function in rhdf5 that it doesn't report a more detailed data type. As @r2evans points out, it's technically true that it's an integer; you just want to know a bit more detail than that.
If we run your code and use the h5ls tool distributed by the HDF5 Group, we get more information:
library(rhdf5)
m <- matrix(1,5,5)
outFile <- tempfile(fileext = ".h5")
h5createFile(outFile)
h5createDataset(file=outFile,"m", dims=dim(m),H5type = "H5T_NATIVE_INT16")
h5write(m,file=outFile, name="m")
system2("h5ls", args = list("-v", outFile))
## Opened "/tmp/RtmpFclmR3/file299e79c4c206.h5" with sec2 driver.
## m Dataset {5/5, 5/5}
## Attribute: rhdf5-NA.OK {1}
## Type: native int
## Location: 1:800
## Links: 1
## Chunks: {5, 5} 50 bytes
## Storage: 50 logical bytes, 14 allocated bytes, 357.14% utilization
## Filter-0: shuffle-2 OPT {2}
## Filter-1: deflate-1 OPT {6}
## Type: native short
Here the most important part is the final line, which confirms the datatype is "native short", a.k.a. native int16.
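The byte counts in the h5ls output give a second, independent check. A quick sketch of the arithmetic in Python (used here only as a calculator; the 5 x 5 shape and the 50-byte figure come from the output above):

```python
import struct

rows, cols = 5, 5

# Standard sizes of 16-bit and 32-bit signed integers, in bytes
int16_size = struct.calcsize("=h")
int32_size = struct.calcsize("=i")

print(rows * cols * int16_size)  # 50, matches "50 logical bytes"
print(rows * cols * int32_size)  # 100, what an int32 dataset would need
```

A 5 x 5 dataset stored as int32 would occupy 100 logical bytes, so the reported 50 bytes is consistent with 2-byte elements.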

Error converting a string in to datetime object

I am reading a line from a text file. It contains the date in YYYY-MM-DD format. I am trying to convert it to datetime object so as to find the difference between two dates.
l = datetime.strptime(last_execution_date,"%Y-%m-%d").date()
It throws an error: ValueError: unconverted data remains:
But when I use the line below, it works perfectly fine:
l = datetime.strptime('2019-01-25',"%Y-%m-%d").date()
My complete code looks something like this:
from datetime import date, datetime

def incoming_mails_duration():
    f = open('last_script_execution_time.txt', 'r')
    last_execution_date = f.readline()
    print(last_execution_date)
    print(type(last_execution_date))
    l = datetime.strptime(last_execution_date,"%Y-%m-%d").date()
    print(l)
    print(type(l))
    present_date = date.today()
    delta_days = abs((present_date - l).days)
    f.close()
Why am I getting the above error when I pass the string as a variable read from a file?
This happens because f.readline() returns the string with a trailing \n. You either have to strip the newline character or include it in the strptime format argument.
Solution 1:
last_execution_date = f.readline().strip()
Solution 2:
l = datetime.strptime(last_execution_date,"%Y-%m-%d\n").date() # Note \n
Note
It is also good practice to open files using the with statement. This is a safe way to handle files: the file is closed even if an exception occurs inside the with block.
with open(filepath) as f:
    for line in f:
        # Work with line here
        pass
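Putting the two pieces of advice together, a minimal sketch might look like the following. The helper name read_last_execution_date is hypothetical, and the temporary file and the fixed "today" of 2019-02-01 exist only so the example is reproducible.

```python
import os
import tempfile
from datetime import date, datetime


def read_last_execution_date(path):
    """Read the first line of path and parse it as a YYYY-MM-DD date."""
    with open(path) as f:
        raw = f.readline()
    # strip() removes the trailing newline that readline() keeps
    return datetime.strptime(raw.strip(), "%Y-%m-%d").date()


# Demo with a throwaway file standing in for last_script_execution_time.txt
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "last_script_execution_time.txt")
    with open(path, "w") as f:
        f.write("2019-01-25\n")
    last = read_last_execution_date(path)

# A fixed date is used instead of date.today() to keep the result stable
delta_days = abs((date(2019, 2, 1) - last).days)
print(last, delta_days)  # 2019-01-25 7
```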

Error in running a Python code from R with the package rPithon

I would like to run this Python code from R:
>>> import nlmpy
>>> nlm = nlmpy.mpd(nRow=50, nCol=50, h=0.75)
>>> nlmpy.exportASCIIGrid("raster.asc", nlm)
Nlmpy is a Python package to build neutral landscape models. The example comes from the website
To run this Python code from R, I'm trying to use the package rPithon. However, the following code produces the error message shown after it:
if (pithon.available())
{
  nRow <- 50
  nCol <- 50
  h <- 0.75
  # this file contains the definition of function concat
  pithon.load("C:/Users/Anaconda2/Lib/site-packages/nlmpy/nlmpy.py")
  pithon.call( "mpd", nRow, nCol, h)
} else {
  print("Unable to execute python")
}
Error in pithon.get("_r_call_return", instance.name = instname) :
Couldn't retrieve variable: Traceback (most recent call last):
File "C:/Users/Documents/R/win-library/3.3/rPithon/pythonwrapperscript.py", line 110, in <module>
reallyReallyLongAndUnnecessaryPrefix.data = json.dumps([eval(reallyReallyLongAndUnnecessaryPrefix.argData)])
File "C:\Users\ANACON~1\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Users\ANACON~1\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Users\ANACON~1\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "C:\Users\ANACON~1\lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([[ 0.36534654, 0.31962481, 0.44229946, ..., 0.11513079,
0.07156331, 0.00286971], [ 0.41534291, 0.41333479, 0.48118995, ..., 0.19203674,
0.04192771, 0.03679473], [ 0.5188
Is this error caused by a syntax issue in my code? I work with the Anaconda 4.2.0 platform for Windows, which uses Python 2.7.
I haven't used the nlmpy package, so I am not sure what your expected output would be. However, this code successfully communicates between R and Python.
There are two files,
nlmpyInR.R
command ="python"
path2script="path_to_your_pythoncode/nlmpyInPython.py"
nRow <-50
nCol <-50
h <- 0.75
# Build up args in a vector
args = c(nRow, nCol, h)
# Add path to script as first arg
allArgs = c(path2script, args)
Routput = system2(command, args=allArgs, stdout=TRUE)
#The command would be python nlmpyInPython.py 50 50 0.75
print(paste("The Output is:\n", Routput))
nlmpyInPython.py
import sys
import nlmpy
# Get the arguments from the command line call.
# sys.argv values are strings, so convert them to numbers first.
nRow = int(sys.argv[1])
nCol = int(sys.argv[2])
h = float(sys.argv[3])
nlm = nlmpy.mpd(nRow, nCol, h)
pythonOutput = nlmpy.exportASCIIGrid("raster.asc", nlm)
# Whatever you print will get stored in R's output variable.
print(pythonOutput)
The cause of the error that you're getting is hinted at by the
"is not JSON serializable" line. Your R code calls the mpd
function with certain arguments, and that function itself will
execute correctly. The rPithon library will then try to send the
return value of the function back to R, and to do this it will try
to create a JSON object
that describes the return value.
This works well for integers, floating point values, arrays, etc,
but not every kind of Python object can be converted to such a
JSON representation. And because rPithon can't convert the return value
of mpd this way, an error is generated.
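The behaviour described above can be reproduced with Python's json module alone. In this sketch the Grid class is a hypothetical stand-in for the array object that mpd returns, and export_only is a made-up function that, like a function that only writes a file, returns nothing. The return value is wrapped in a list because the traceback shows the wrapper calling json.dumps([...]).

```python
import json


class Grid:
    """Hypothetical stand-in for the array object returned by mpd."""

    def __init__(self, rows):
        self.rows = rows


def export_only(grid):
    """Made-up function that works by side effect and returns None."""
    return None


# Plain numbers and lists serialize without trouble
print(json.dumps([50, 0.75, [1, 2, 3]]))  # [50, 0.75, [1, 2, 3]]

# An arbitrary object has no JSON representation, so dumps raises TypeError
try:
    json.dumps([Grid([[0.1, 0.2]])])
except TypeError as err:
    print("TypeError:", err)

# A function that returns None is unproblematic
print(json.dumps([export_only(Grid([]))]))  # [null]
```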
You can still use rPithon to call the mpd function though. The following
code creates a new Python function that performs two steps: first
it calls the mpd function with the specified parameters, and then it
exports the result to a file, of which the filename is also an argument.
Using rPithon, the new function is then called from R. Because myFunction doesn't return anything, representing the return value in JSON format will not be a problem.
library("rPithon")
pythonCode = paste("import nlmpy.nlmpy as nlmpy",
                   "",
                   "def myFunction(nRow, nCol, h, fileName):",
                   "    nlm = nlmpy.mpd(nRow, nCol, h)",
                   "    nlmpy.exportASCIIGrid(fileName, nlm)",
                   sep = "\n")
pithon.exec(pythonCode)
nRow <- 50
nCol <- 50
h <- 0.75
pithon.call("myFunction", nRow, nCol, h, "outputraster.asc")
Here, the Python code is defined as an R string and executed using
pithon.exec. You could also put that Python code in a separate file
and use pithon.load to process the code so that the myFunction
function is known.

Porting to Python3: PyPDF2 mergePage() gives TypeError

I'm using Python 3.4.2 and PyPDF2 1.24 (also using reportlab 3.1.44 in case that helps) on windows 7.
I recently upgraded from Python 2.7 to 3.4, and am in the process of porting my code. This code is used to create a blank pdf page with links embedded in it (using reportlab) and merge it (using PyPDF2) with an existing pdf page. I had an issue with reportlab in that saving the canvas used StringIO which needed to be changed to BytesIO, but after doing that I ran into this error:
Traceback (most recent call last):
File "C:\cms_software\pdf_replica\builder.py", line 401, in merge_pdf_files
input_page.mergePage(link_page)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2013, in mergePage
self.mergePage(page2)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2059, in mergePage
page2Content = PageObject._pushPopGS(page2Content, self.pdf)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1973, in _pushPopGS
stream = ContentStream(contents, pdf)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2446, in __init__
stream = BytesIO(b_(stream.getData()))
File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 826, in getData
decoded._data = filters.decodeStreamData(self)
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 326, in decodeStreamData
data = ASCII85Decode.decode(data)
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 264, in decode
data = [y for y in data if not (y in ' \n\r\t')]
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 264, in <listcomp>
data = [y for y in data if not (y in ' \n\r\t')]
TypeError: 'in <string>' requires string as left operand, not int
Here is the line and the line above where the traceback mentions:
link_page = self.make_pdf_link_page(pdf, size, margin, scale_factor, debug_article_links)
if link_page is not None:
    input_page.mergePage(link_page)
Here are the relevant parts of that make_pdf_link_page function:
packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=(size['width'], size['height']))
....# left out code here is just reportlab specifics for size and url stuff
can.linkURL(url, r1, thickness=1, color=colors.green)
can.rect(x1, y1, width, height, stroke=1, fill=0)
# create a new PDF with Reportlab that has the url link embedded
can.save()
packet.seek(0)
try:
    new_pdf = PdfFileReader(packet)
except Exception as e:
    logger.exception('e')
    return None
return new_pdf.getPage(0)
I'm assuming it's a problem with using BytesIO, but I can't create the page with reportlab with StringIO. This is a critical feature that used to work perfectly with Python 2.7, so I'd appreciate any kind of feedback on this. Thanks!
UPDATE:
I've also tried changing from using BytesIO to just writing to a temp file, then merging. Unfortunately I got the same error.
Here is tempfile version:
import os
import tempfile
temp_dir = tempfile.gettempdir()
temp_path = os.path.join(temp_dir, "tmp.pdf")
can = canvas.Canvas(temp_path, pagesize=(size['width'], size['height']))
....
can.showPage()
can.save()
try:
    new_pdf = PdfFileReader(temp_path)
except Exception as e:
    logger.exception('e')
    return None
return new_pdf.getPage(0)
UPDATE:
I found an interesting bit of information on this. It seems if I comment out the can.rect and can.linkURL calls it will merge. So drawing anything on a page, then trying to merge it with my existing pdf is causing the error.
After digging in to PyPDF2 library code, I was able to find my own answer. For python 3 users, old libraries can be tricky. Even if they say they support python 3, they don't necessarily test everything. In this case, the problem was with the class ASCII85Decode in filters.py in PyPDF2. For python 3, this class needs to return bytes. I borrowed the code for this same type of function from pdfminer3k, which is a port for python 3 of pdfminer. If you exchange the ASCII85Decode() class for this code, it will work:
import struct

class ASCII85Decode(object):
    # PyPDF2 calls ASCII85Decode.decode(data) on the class itself,
    # so decode is declared as a static method.
    @staticmethod
    def decode(data, decodeParms=None):
        if isinstance(data, str):
            data = data.encode('ascii')
        n = b = 0
        out = bytearray()
        # In Python 3, iterating over bytes yields ints
        for c in data:
            if ord('!') <= c and c <= ord('u'):
                n += 1
                b = b*85+(c-33)
                if n == 5:
                    out += struct.pack(b'>L',b)
                    n = b = 0
            elif c == ord('z'):
                assert n == 0
                out += b'\0\0\0\0'
            elif c == ord('~'):
                if n:
                    for _ in range(5-n):
                        b = b*85+84
                    out += struct.pack(b'>L',b)[:n-1]
                break
        return bytes(out)
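To convince yourself that this decoding logic is sound without touching PyPDF2, one option is to run it as a standalone function against the standard library's encoder. The sketch below makes a few assumptions: the name ascii85_decode is mine, the input carries the "~>" end marker but no "<~" prefix (the loop would treat "<" as data), and the test payload is arbitrary. The last few lines reproduce the failing expression from the traceback on bytes input.

```python
import base64
import struct


def ascii85_decode(data):
    """Standalone copy of the decoding loop; returns bytes."""
    if isinstance(data, str):
        data = data.encode("ascii")
    n = b = 0
    out = bytearray()
    for c in data:  # iterating over bytes yields ints in Python 3
        if ord("!") <= c <= ord("u"):
            n += 1
            b = b * 85 + (c - 33)
            if n == 5:  # a full group of five characters gives four bytes
                out += struct.pack(">L", b)
                n = b = 0
        elif c == ord("z"):  # shorthand for four zero bytes
            assert n == 0
            out += b"\0\0\0\0"
        elif c == ord("~"):  # start of the "~>" end marker
            if n:
                for _ in range(5 - n):
                    b = b * 85 + 84  # pad the partial group with 'u'
                out += struct.pack(">L", b)[: n - 1]
            break
    return bytes(out)


payload = b"hello world"
encoded = base64.a85encode(payload) + b"~>"

print(ascii85_decode(encoded))  # b'hello world'

# The original list comprehension compares ints with a str and fails:
try:
    [y for y in encoded if not (y in " \n\r\t")]
except TypeError as err:
    print("TypeError:", err)
```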

Parallel IO in R with rhdf5

I have a large amount of data in R data frames that I would like to write to an HDF5 file in parallel. In my initial experiments, the hdf5 file gets corrupted, presumably because parallel IO is not enabled in rhdf5. Is it possible to do parallel IO from R with rhdf5?
Here is an example of what I am trying to do:
library(parallel)
library(Rmpi)
library(rhdf5)
nwrites = 10000
db_name = 'testing_parallel_io.h5'
if(file.exists(db_name))unlink(db_name)
h5createFile(db_name)
write_data = function(index, db_name){
  suppressPackageStartupMessages(require(rhdf5))
  nr = 1000
  nc = 10
  df = as.data.frame(matrix(rnorm(nr*nc),nr,nc))
  group_name = sprintf('group%05d',index)
  dataset_name = sprintf('%s/A',group_name)
  h5createGroup(db_name, group_name)
  h5write(df, db_name, dataset_name)
  return(0)
}
cl = makeCluster(detectCores(),type='MPI')
res = parSapply(cl, 1:nwrites, write_data, db_name)
stopCluster(cl)
mpi.quit()
When I run this I get all sorts of errors from hdf5 like this:
HDF5-DIAG: Error detected in HDF5 (1.8.7) thread 0:
#000: H5D.c line 170 in H5Dcreate2(): unable to create dataset
major: Dataset
minor: Unable to initialize object
#001: H5Dint.c line 431 in H5D_create_named(): unable to create and link to dataset
major: Dataset
minor: Unable to initialize object
#002: H5L.c line 1640 in H5L_link_object(): unable to create new link to object
major: Links
minor: Unable to initialize object
#003: H5L.c line 1884 in H5L_create_real(): can't insert link
major: Symbol table
minor: Unable to insert object
#004: H5Gtraverse.c line 905 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#005: H5Gtraverse.c line 799 in H5G_traverse_real(): component not found
major: Symbol table
minor: Object not found
