Read HDF5 data with numpy axis order with Julia HDF5

I have an HDF5 file containing arrays that are saved with Python/numpy. When I read them into Julia using HDF5.jl, the axes are in the reverse of the order in which they appear in Python. To reduce the mental gymnastics involved in moving between the Python and Julia codebases, I reverse the axis order when I read the data into Julia. I have written my own function to do this:
function reversedims(ary::Array)
    # Reverse the axis order to match the numpy convention (this copies the data)
    permutedims(ary, ndims(ary):-1:1)
end
data = HDF5.read(someh5file, somekey) |> reversedims
This is not ideal because (1) I always have to import reversedims to use this; (2) I have to remember to do this for each Array I read. I am wondering if it is possible to either:
instruct HDF5.jl to read in the arrays with a numpy-style axis order, either through a keyword argument or some kind of global configuration parameter
use a builtin single argument function to reverse the axes

The best approach would be to create an H5py.jl package, modeled on MAT.jl (which reads and writes .mat files created by Matlab). See also https://github.com/timholy/HDF5.jl/issues/180.

It looks to me like permutedims! does what you're looking for, although it does copy the array. If you can rewrite the HDF5 files in Python, numpy.asfortranarray claims to return your data stored in column-major format. However, the numpy internals docs seem to suggest that the data isn't altered and only the strides are, so I don't know whether the HDF5 file output would be any different.
Edit: Sorry, I just saw you are already using permutedims in your function. I couldn't find anything else on the Julia side, but I would still try the numpy.asfortranarray and see if that helps.
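For context on why the axes come out reversed: HDF5 stores datasets in row-major order, as numpy does by default, while Julia arrays are column-major. The same bytes can therefore be read without reordering only if the dimensions are listed in reverse. Below is a minimal plain-Python sketch of that equivalence, using 0-based indices. The helper names row_major_offset and col_major_offset are made up for this illustration and are not part of numpy, h5py, or HDF5.jl.

```python
def row_major_offset(idx, shape):
    """Flat offset of idx in a row-major (C / numpy-style) array."""
    offset = 0
    for i, n in zip(idx, shape):
        offset = offset * n + i
    return offset


def col_major_offset(idx, shape):
    """Flat offset of idx in a column-major (Fortran / Julia-style) array."""
    offset = 0
    for i, n in zip(reversed(idx), reversed(shape)):
        offset = offset * n + i
    return offset


shape = (2, 3, 4)                    # shape as seen from Python
buffer = list(range(2 * 3 * 4))      # stand-in for the bytes in the file

# Element (i, j, k) of the row-major view is the same buffer entry as
# element (k, j, i) of a column-major view with the reversed shape.
for i in range(shape[0]):
    for j in range(shape[1]):
        for k in range(shape[2]):
            a = buffer[row_major_offset((i, j, k), shape)]
            b = buffer[col_major_offset((k, j, i), shape[::-1])]
            assert a == b
```

Under this reading, reversing the axes on the Julia side necessarily moves data in memory, which is why a permutedims-style copy is hard to avoid.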

Related

R write in feather format to byte vector

In R, how can a data.frame be written to an in-memory raw byte vector in the feather format?
The arrow package has a write_feather function in which the destination can be a BufferOutputStream, but the documentation doesn’t describe how to create such a stream or access its underlying buffer.
Other than that most other packages assume usage of a local file system rather than in-memory storage.
Thank you in advance for your consideration and response.
BufferOutputStream$create() is how you create one. You can pass that to write_feather(). If you want a raw R vector back, you can use write_to_raw(), which wraps that. See https://arrow.apache.org/docs/r/reference/write_to_raw.html for docs; there's a link there to the source if you want to see exactly what it's doing, in case you want to do something slightly differently.

How to make an R object immutable? [duplicate]

I'm working in R, and I'd like to define some variables that I (or one of my collaborators) cannot change. In C++ I'd do this:
const std::string path( "/projects/current" );
How do I do this in the R programming language?
Edit for clarity: I know that I can define strings like this in R:
path = "/projects/current"
What I really want is a language construct that guarantees that nobody can ever change the value associated with the variable named "path."
Edit to respond to comments:
It's technically true that const is a compile-time guarantee, but it would be valid in my mind for the R interpreter to stop execution with an error message. For example, look what happens when you try to assign a value to a numeric constant:
> 7 = 3
Error in 7 = 3 : invalid (do_set) left-hand side to assignment
So what I really want is a language feature that allows you to assign a value once and only once, with some kind of error when you try to assign a new value to a variable declared as const. I don't care if the error occurs at run-time, especially if there's no compilation phase. This might not technically be const by the Wikipedia definition, but it's very close. It also looks like this is not possible in the R programming language.
See lockBinding:
a <- 1
lockBinding("a", globalenv())
a <- 2
Error: cannot change value of locked binding for 'a'
Since you are planning to distribute your code to others, you could (should?) consider creating a package. Create a NAMESPACE within that package. There you can define variables that will have a constant value, at least for the functions that your package uses. Have a look at Tierney (2003), Name Space Management for R.
I'm pretty sure that this isn't possible in R. If you're worried about accidentally re-writing the value then the easiest thing to do would be to put all of your constants into a list structure then you know when you're using those values. Something like:
my.consts<-list(pi=3.14159,e=2.718,c=3e8)
Then when you need to access them you have an aide-mémoire of what not to do, and it also keeps them out of your normal namespace.
Another place to ask would be R development mailing list. Hope this helps.
(Edited for new idea:) The bindenv functions provide an experimental interface for adjustments to environments and bindings within environments. They allow for locking environments as well as individual bindings, and for linking a variable to a function.
This seems like the sort of thing that could give a false sense of security (like a const pointer to a non-const variable) but it might help.
(Edited for focus:) const is a compile-time guarantee, not a lock-down on bits in memory. Since R doesn't have a compile phase where it looks at all the code at once (it is built for interactive use), there's no way to check that future instructions won't violate any guarantee. If there's a right way to do this, the folks at the R-help list will know. My suggested workaround: fake your own compilation. Write a script to preprocess your R code that will manually substitute the corresponding literal for each appearance of your "constant" variables.
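The "fake your own compilation" workaround above could be sketched as a small preprocessing script. This is only an illustration in Python, under stated assumptions: the function name substitute_constants, the constant PROJECT_PATH, and the sample R snippet are all hypothetical. The script also does not parse R, so it would replace names inside strings and comments too, and the constants must be defined outside the file being processed.

```python
import re


def substitute_constants(r_source, constants):
    """Replace whole-identifier uses of each constant name with its R literal.

    `constants` maps a name to the literal text to paste in, e.g.
    {"PROJECT_PATH": '"/projects/current"'}.
    """
    if not constants:
        return r_source
    # Longest names first so a name is never shadowed by one of its prefixes.
    names = sorted(constants, key=len, reverse=True)
    # R identifiers may contain letters, digits, "_" and ".", so a match must
    # not be preceded or followed by any of those characters.
    pattern = re.compile(
        r"(?<![\w.])(" + "|".join(re.escape(n) for n in names) + r")(?![\w.])"
    )
    # A function replacement avoids backslash processing of the literal.
    return pattern.sub(lambda m: constants[m.group(1)], r_source)


# Hypothetical input: one real use and one unrelated, longer identifier.
constants = {"PROJECT_PATH": '"/projects/current"'}
r_code = "files <- list.files(PROJECT_PATH)\nMY.PROJECT_PATH <- 1\n"
print(substitute_constants(r_code, constants))
```

The substituted source can then be run in place of the original, so a later assignment to the "constant" name cannot affect code that used it.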
(Original:) What benefit are you hoping to get from having a variable that acts like a C "const"?
Since R has exclusively call-by-value semantics (unless you do some munging with environments), there isn't any reason to worry about clobbering your variables by calling functions on them. Adopting some sort of naming conventions or using some OOP structure is probably the right solution if you're worried about you and your collaborators accidentally using variables with the same names.
The feature you're looking for may exist, but I doubt it given the origin of R as an interactive environment where you'd want to be able to undo your actions.
R doesn't have a language constant feature. The list idea above is good; I personally use a naming convention like ALL_CAPS.
I took the answer below from this website
The simplest sort of R expression is just a constant value, typically a numeric value (a number) or a character value (a piece of text). For example, if we need to specify a number of seconds corresponding to 10 minutes, we specify a number.
> 600
[1] 600
If we need to specify the name of a file that we want to read data from, we specify the name as a character value. Character values must be surrounded by either double-quotes or single-quotes.
> "http://www.census.gov/ipc/www/popclockworld.html"
[1] "http://www.census.gov/ipc/www/popclockworld.html"

What is Julia's best approximation to R objects' attributes?

I store important metadata in R objects as attributes. I want to migrate my workflow to Julia and I am looking for a way to represent at least temporarily the attributes as something accessible by Julia. Then I can start thinking about extending the RData package to fill this data structure with actual objects' attributes.
I understand that annotating columns with things like a label or unit in a DataFrame (I think the most important use for objects' attributes) is probably going to be implemented in the DataFrames package at some point (https://github.com/JuliaData/DataFrames.jl/issues/35). But I am asking about a more general solution that doesn't depend on this specific use case.
For anyone interested, here is a related discussion in the RData package
In Julia it is idiomatic to define your own types: you'd simply add fields to the type to store the attributes. In R, the nice thing about storing things as attributes is that they don't affect how the type dispatches, e.g. adding metadata to a Vector doesn't make it stop behaving like a Vector. In Julia, that approach is a little more complicated: you'd have to implement the AbstractVector interface for your type (https://docs.julialang.org/en/latest/manual/interfaces/#man-interface-array-1) to have it behave like a Vector.
In essence, this means that the workflow solutions are a little different - e.g. often the attribute metadata in R is used to associate metadata to an object when it's returned from a function. An easy way to do something similar in Julia is to have the function return a tuple and assign the result to a tuple:
function ex()
    res = rand(5)
    met = "uniformly distributed random numbers"
    res, met
end
result, metadata = ex()
I don't think there are plans to implement attributes like in R.

Which functions should I use to work with an XDF file on HDFS?

I have an .xdf file on an HDFS cluster which is around 10 GB and has nearly 70 columns. I want to read it into an R object so that I can perform some transformation and manipulation. I searched Google and came across two functions:
rxReadXdf
rxXdfToDataFrame
Could anyone tell me which function is preferred, given that I want to read the data and perform the transformation in parallel on each node of the cluster?
Also if I read and perform transformation in chunks, do I have to merge the output of each chunks?
Thanks for your help in advance.
Cheers,
Amit
Note that rxReadXdf and rxXdfToDataFrame have different arguments and do slightly different things:
rxReadXdf has a numRows argument, so use this if you want to read the top 1000 (say) rows of the dataset
rxXdfToDataFrame supports rxTransforms, so use this if you want to manipulate your data in addition to reading it
rxXdfToDataFrame also has the maxRowsByCols argument, which is another way of capping the size of the input
So in your case, you want to use rxXdfToDataFrame since you're transforming the data in addition to reading it. rxReadXdf is a bit faster in the local compute context if you just want to read the data (no transforms). This is probably also true for HDFS, but I haven’t checked this.
However, are you sure that you want to read the data into a data frame? You can use rxDataStep to run (almost) arbitrary R code on an xdf file, while still leaving your data in that format. See the linked documentation page for how to use the transforms arguments.

Create an alias to a slot of an object in R

I've bumped my head on the walls trying to create an alias (aka a pointer, or a new short nickname designating the same object in memory without copying that object) to a subpart of a complex object. Let's say I am working with an object of class SpatialPolygonsDataFrame (package "sp"), and I want to perform operations on a part thereof, deep down in the hierarchical representation of that object. Instead of repeatedly writing things like
myBigMap@polygons[FRA][[1]]@Polygons[[1]]
I want to be able to write simply
mypolygon
so that
myBigMap@polygons[FRA][[1]]@Polygons[[1]]@coords
can be abbreviated
mypolygon@coords
etc. I've seen that I should maybe use environments as a replacement to the former .Alias defunct function, but can't find out how to tell R that I want to consider a subpart of a complex object as an environment. Thanks!
Assignment:
mypolygon = myBigMap@polygons[FRA][[1]]@Polygons[[1]]
doesn't create a copy until you modify something in it. So if it's just shorthand for accessing the data to make some code more readable, then that will be fine:
mypolygon@coords
mean(mypolygon@coords[,1])
Neither of those will make a copy.
However, if you do modify mypolygon, e.g. by changing @coords, you need to put the modified value back into the structure, since a copy is made:
mypolygon@coords = mypolygon@coords * 1000
myBigMap@polygons[FRA][[1]]@Polygons[[1]] = mypolygon
I think that's the preferred solution, since it's just as efficient as any kind of magic aliasing scheme, and it's explicit: there's no magic action-at-a-distance happening.
I don't think there's any way to alias parts of an object like the way you want to do.
