In Julia, how do I find out why Dict{String, Any} is Any?

I am very new to Julia and mostly code in Python these days. I am using Julia to work with and manipulate HDF5 files.
So when I get to writing out (h5write), I get an error because the data argument is of mixed type and I need to find out why.
The error message says Array{Dict{String,Any},4} is what I am trying to pass in, but when I look at the values (and it is a huge structure), I see a lot of 0xff and values like this. How do I quickly find out why the value type is Any and not a single type?

Just to make this an answer:
If my_dicts is an Array{Dict{String, Any}, 4}, then one way of working out what types are hiding in the Any part of the dict is:
unique(typeof.(values(my_dicts[1])))
To explain:
my_dicts[1] picks out the first element of your Array, i.e. one of your Dict{String, Any}
values then extracts the values, which is the Any part of the dictionary,
typeof. (notice the dot) broadcasts the typeof function over all elements returned by values, returning the types of all of these elements; and
unique takes the list of all these types and reduces it to its unique elements, so you'll end up with a list of each separate type contained in the Any part of your dictionary.
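For readers coming from Python, as the asker is, the same idea can be sketched in Python terms. This is only an analogy, not Julia code: `my_dicts` below is made-up stand-in data and `value_types_by_key` is a hypothetical helper. It extends the one-dict check to every dict and groups the types by key, which tends to point at the field that forces the Any.

```python
from collections import Counter, defaultdict

def value_types_by_key(dicts):
    """Count the type names of the values stored under each key."""
    counts = defaultdict(Counter)
    for d in dicts:
        for key, value in d.items():
            counts[key][type(value).__name__] += 1
    return counts

# Hypothetical stand-in data: a flat list of dicts with mixed value types.
my_dicts = [
    {"id": 1, "flag": 0xFF, "name": "a"},
    {"id": 2, "flag": None, "name": "b"},
]

types = value_types_by_key(my_dicts)

# Keys whose values do not all share a single type.
mixed_keys = sorted(k for k, c in types.items() if len(c) > 1)
print(mixed_keys)
```

In Julia the equivalent would loop over all elements of the array rather than only my_dicts[1].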

Related

Create unique symbol in Julia, similar to Mathematica's `Unique[]`

I need to populate a column of a data frame with unique factors. I have been using sequential integers, but I don't want the consumer of my function to be confused and think that they can do arithmetic on these values. These values are categorical with no definition for order, distance, and scale. In R, I would have solved this problem with as.factor. I see that there is a CategoricalArrays.jl project, which I have never used, that might offer similar functionality.
Mathematica has a useful Unique function that can create a (as the name implies) unique symbol.
In[1]:= Unique[]
Out[1]= $10
Julia has a similar Symbol that generates a lightweight value that I think makes sense to treat as a factor, but I haven't found a built-in technique to automatically generate unique symbols. You cannot invoke Symbol() without a parameter. I suppose I could call Symbol(UUIDs.uuid1()), but these are very long.
julia> using UUIDs
julia> Symbol(UUIDs.uuid1())
Symbol("8a9452d0-2451-11ec-08b4-3bb7f56a346a")
Is there an idiomatic way to generate short and unique symbols in Julia?
The way to generate a unique Symbol is to use the gensym function.
However, I assume you most likely want to use CategoricalArrays.jl, as you have commented. This package allows you to create arrays of either ordered or unordered factors, just like in R. The difference from R is that the user will be able to clearly see that what is stored in an array is a factor even after extracting it from an array, e.g.:
julia> using CategoricalArrays
julia> x = categorical(1:3)
3-element CategoricalArray{Int64,1,UInt32}:
1
2
3
julia> x[1]
CategoricalValue{Int64, UInt32} 1
As you can see, the notion of being categorical is not lost, which I guess is exactly what you want.

Can a SQLite user-defined function take a row argument?

They are described as scalar, but I think that refers to the return type rather than the arguments.
I'm trying to define one in Rust that will provide a TEXT value derived from other columns in the row. For convenience and readability at the point of use, I'd like to call it as select myfunc(mytable) from mytable rather than explicitly listing the columns that it derives from.
The rusqlite example simply gets an argument as f64, so it's not that clear to me how it might be possible to interpret it as a row and retrieve columnar values from within it. Nor have I been able to find examples in other languages.
Is this possible?
This doesn't seem possible.
The func(tablename) syntax that I'm familiar with seems to be PostgreSQL-specific; SQLite supports func(*), but when func is user-defined it receives zero arguments, not one (structured) or N (all columns separately) as I expected.
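What does work is passing the needed columns explicitly, since each argument of a scalar function is an ordinary column value. The sketch below uses Python's standard sqlite3 module rather than rusqlite, purely for illustration; the table mytable, its columns, and the function myfunc are hypothetical.

```python
import sqlite3

def myfunc(first, last):
    """Build a TEXT value from two column values of the current row."""
    return f"{last}, {first}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (first TEXT, last TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)

# Register a scalar function that takes exactly two arguments.
conn.create_function("myfunc", 2, myfunc)

# Each argument is a plain column value; the function never sees a row object.
rows = conn.execute(
    "SELECT myfunc(first, last) FROM mytable ORDER BY last"
).fetchall()
print(rows)
```

If naming the columns at every call site is the concern, wrapping the call in a view that selects myfunc(first, last) once may recover some of the readability.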

In R what makes NULL atomical and therefore unable to exist in a vector?

In R for Everyone by Jared P. Lander on p. 54 it says "...NULL is atomical and cannot exist within a vector. If used inside a vector, it simply disappears."
I understand the concept of being atomic is being indivisible and that NULL represents "nothingness", used commonly to handle returns that are undefined.
Therefore, is NULL atomical because it always has this one value of "nothingness", meaning that something simply does not exist, and R's way of handling that is to not let it exist in a vector, or, on assignment in a list, to actually remove that element?
Trying to wrap my head around it and find a more intuitive and comprehensive answer.
In my opinion talking about vectors as being "atomic" is more confusing than helpful. Instead, consider that R has a series of data types built into the language. They are given by definition and are distinct from one another.
For example, one such data type is "integer vector", which represents a sequence of integer values. Note that R does not have a data type of "integer". If we are talking about integer 5 in R, it is actually an integer vector of length 1.
Another built-in data type is NULL. There is a single object of type NULL, which is also called NULL. Since NULL is a type and an object, but not an integer value, it cannot be part of an integer vector.
Missing data in an integer vector are represented by NA. In this context NA is considered an integer value. Note that NA can also be a numeric value, logical value, etc. NA is not a data type, but a value.
A complete list of built-in data types can be found in the R source code and also in the documentation, e.g. https://cran.r-project.org/doc/manuals/r-release/R-ints.html#SEXPTYPEs

Determining argument descriptions within R

I need a way to determine the description of an argument within R.
For example, if I'm using the function qplot() from the package ggplot2, I need to be able to extract a description of each argument in qplot(). From ggplot2's reference manual, it's easy to see that there are several arguments to consider. One of them is called "data", which is described as: "data frame to use (optional). If not specified, will create one, extracting vectors from the current environment."
Is there a way to get this information from within an R session, rather than by reading a reference manual? Maybe an R function similar to packageDescription(), but for a function's arguments?
Thanks!
edit: I found a variant on my question answered here:
How to access the help/documentation .rd source files in R?
Reading the .Rd files seems like the safest way to get the information I need. For anyone interested, the following code returns a list of arguments and their descriptions, where "package_name" can be any package you want:
db <- tools::Rd_db("package_name")
lapply(db, tools:::.Rd_get_metadata, "arguments")
Thank you for your help, everyone.
From the R console in the Mac GUI R.app ... When I look at the text output from help('seq', help_type="text") (which goes to a temporary file) I see that the beginning of the descriptions you want is demarcated by:
_A_r_g_u_m_e_n_t_s: # Those underscores were ^H's before I pasted
And then the arguments appear as name:description pairs:
...: arguments passed to or from methods.
from, to: the starting and (maximal) end values of the sequence. Of
length ‘1’ unless just ‘from’ is supplied as an unnamed
argument.
by: number: increment of the sequence.
length.out: desired length of the sequence. A non-negative number,
which for ‘seq’ and ‘seq.int’ will be rounded up if
fractional.
along.with: take the length from the length of this argument.
When I use a Terminal session to get that same output it appears in the same window but as a Unix help page like:
Arguments:
...: arguments passed to or from methods.
from, to: the starting and (maximal) end values of the sequence. Of
length ‘1’ unless just ‘from’ is supplied as an unnamed
argument.
by: number: increment of the sequence.
length.out: desired length of the sequence. A non-negative number,
which for ‘seq’ and ‘seq.int’ will be rounded up if
fractional.
along.with: take the length from the length of this argument.
I believe these are displayed by whatever system program is called by the value of options("pager"). In my case, that is the program "less".

Fortran90 created allocatable arrays but elements incorrect

Trying to create an array from an xyz data file. The data file is arranged so that x,y,z of each atom is on a new line and I want the array to reflect this.
Then I want to use this array to find the distance from each atom in the list to all the others.
To do this the array has been copied such that atom1 & atom2 should be identical to the input file.
length is simply the number of atoms in the list.
The write statement: WRITE(20,'(3F12.9)') atom1 actually gives the matrix wanted but when I try to find individual elements they're all wrong!
Any help would be really appreciated!
Thanks guys.
DOUBLE PRECISION, DIMENSION(:,:), ALLOCATABLE :: atom1,atom2
ALLOCATE(atom1(length,3),atom2(length,3))
READ(10,*) ((atom1(i,j), i=1,length), j=1,3)
atom2=atom1
distn=0
distc=0
DO n=1,length
x1=atom1(n,1)
y1=atom1(n,2) !1st atom
z1=atom1(n,3)
DO m=1,length
x2=atom2(m,1)
y2=atom2(m,2) !2nd atom
z2=atom2(m,3)
Your READ statement reads all the x coordinates for all atoms from however many records, then all the y coordinates, then all the z coordinates. That's inconsistent with your description of the input file. You have the nesting of the io-implied-do's in the READ statement around the wrong way - it should be ((atom1(i,j),j=1,3),i=1,length).
Similarly, as per the comment, your diagnostic write misled you: you were outputting all x coordinates, followed by all y coordinates, etc. Array element order of a whole array reference varies the first (leftmost) dimension fastest (colloquially known as column major order).
(There are various pitfalls associated with list-directed formatting that mean I wouldn't recommend it for production code, except perhaps for input specifically written with knowledge of, and defence against, those pitfalls. One of those pitfalls is that a READ under list-directed formatting will pull in as many records as it requires to satisfy the input list. You might have detected the problem earlier if you had been using an explicit format that nominated the number of fields per record.)
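The effect of the two nesting orders can be imitated outside Fortran. The following Python sketch is only an illustration: the helper fill and the two-atom data are hypothetical, and Python lists stand in for the Fortran array. It consumes a flat stream of numbers in the order an implied-do would, once with the atom index varying fastest (the original READ) and once with the coordinate index varying fastest (the corrected READ).

```python
def fill(values, length, inner):
    """Fill a length x 3 table from a flat stream, as an implied-do would.

    inner="i" mimics ((atom1(i,j), i=1,length), j=1,3)  (atom index fastest)
    inner="j" mimics ((atom1(i,j), j=1,3), i=1,length)  (coordinate fastest)
    """
    atom = [[None] * 3 for _ in range(length)]
    stream = iter(values)
    if inner == "i":
        for j in range(3):
            for i in range(length):
                atom[i][j] = next(stream)
    else:
        for i in range(length):
            for j in range(3):
                atom[i][j] = next(stream)
    return atom

# Hypothetical file contents: two atoms, one "x y z" triple per line.
flat = [1.0, 2.0, 3.0,
        4.0, 5.0, 6.0]

wrong = fill(flat, 2, inner="i")   # rows no longer hold one atom each
right = fill(flat, 2, inner="j")   # row n holds x, y, z of atom n

# Writing the whole "wrong" array in column-major order reproduces the
# file, which is why the diagnostic WRITE looked correct.
column_major = [wrong[i][j] for j in range(3) for i in range(2)]

print(wrong)
print(right)
print(column_major == flat)
```

With the original nesting the first row ends up mixing values from both atoms, while a whole-array write still echoes the input in file order.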
