Vector graphics formats documentation

I would like to extract vectors from a vector graphics file (for example, a .ai file) to use in my own programs. What would be a good way to do it? I have seen free converters, so I'm assuming there must be openly accessible documentation of the .ai format. Converting to another format that's open would also work for me.
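One possible route, sketched here with hypothetical file names: recent .ai files are PDF-compatible, so a tool such as Inkscape can usually import them and re-export them as SVG, an open XML format whose path data is straightforward to parse. In R, the xml2 package can then pull out the path commands (the shell step assumes Inkscape 1.x syntax):

# Shell step first, e.g.:
#   inkscape drawing.ai --export-type=svg --export-filename=drawing.svg
library(xml2)
svg <- read_xml("drawing.svg")
# SVG keeps vector outlines in <path> elements; the d attribute holds the
# move/line/curve commands, e.g. "M 10,20 L 30,40 ..."
paths <- xml_find_all(svg, "//*[local-name()='path']")  # namespace-agnostic XPath
d <- xml_attr(paths, "d")
head(d)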

Related

R Social Network Analysis/Data Manipulation Question: Reading in .edges, .circles, .egofeat, .feat, and .featnames files

So I'm working with a network dataset from Stanford's SNAP Datasets. SNAP has wrappers for Python and C++ but not R; however, the data is still usable, since I believe it's a mix of CSV-like files.
I can read in the .edges file and form an igraph object, but I want to read in the other files, get the attributes, and add those attributes to the igraph object for analysis. I'm just confused about how to work with the .circles, .egofeat, .feat, and .featnames files, since the documentation on the dataset is very scarce. I'm hoping someone has worked with this dataset in R, or even another language, and has pointers for getting started.
Thank you!
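For what it's worth, here is a hedged R sketch based on the usual layout of SNAP's ego-network files (an edge list in .edges; one circle per line in .circles; in .feat, one row per node holding the node id followed by 0/1 feature flags; one feature name per line in .featnames). The ego id 0 in the file names is hypothetical, and the layout is worth double-checking against your dataset's README:

library(igraph)

edges <- read.table("0.edges")                      # two columns: from, to
g <- graph_from_data_frame(edges, directed = FALSE)

# .featnames: "index name" per line; keep only the name part
feat_names <- sub("^\\d+ ", "", readLines("0.featnames"))

# .feat: node id in column 1, then one 0/1 column per feature
feat <- read.table("0.feat")
idx <- match(V(g)$name, as.character(feat[[1]]))    # align feature rows to vertices
for (j in seq_along(feat_names)) {
  g <- set_vertex_attr(g, feat_names[j], value = feat[idx, j + 1])
}

# .circles: a circle label followed by its member node ids, tab-separated
circ <- strsplit(readLines("0.circles"), "\t")
circles <- setNames(lapply(circ, `[`, -1), vapply(circ, `[`, "", 1))

# .egofeat has the same feature columns as .feat but describes the ego itself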

Write custom metadata to Parquet file in Julia

I am currently storing the output of my Julia simulation (a Julia DataFrame) in a Parquet file using Parquet.jl. I would also like to save some of the simulation parameters (e.g. a list of (byte-)strings) to that same output file.
Preferably, these parameters would differ per column, as each column is the result of different starting conditions of my code. However, I could also work with a global parameter list and then untangle it afterwards by indexing.
I have found a solution for Python using pyarrow: https://mungingdata.com/pyarrow/arbitrary-metadata-parquet-table/
Do you know a way to do it in Julia?
It's not quite done yet, and it's not registered, but my rewrite of the Julia parquet package, Parquet2.jl, does support both custom file metadata and individual column metadata (the keyword arguments metadata and column_metadata in Parquet2.writefile).
I haven't gotten to the documentation for writing yet, but if you are feeling adventurous you can give it a shot. I do expect to finish up this package and register it within the next couple of weeks. I don't have unit tests in place for writing yet, so of course, if you try it and have problems, please open an issue.
It's probably also worth mentioning that the main case in which I recommend parquet is when you must have parquet for compatibility reasons. Most of the time, Julia users are probably better off with Arrow.jl, as that format has a number of advantages over parquet for most use cases; please see my FAQ answer on this. Of course, the reason I undertook writing the package is that parquet is arguably the only ubiquitous binary format in the "big data" world, so a robust writer is desperately needed.

Non-programmer, ASCII file data extract (can I even learn to code?)

As the title says, I'm not a programmer. I've tried R before, got very confused, and abandoned it. I'm a physician, and I do all my statistics with either SPSS or Excel. I'd like to learn some coding for when I run into problems like this:
I have an ASCII file that I'd like to extract data from. The fields are contained within columns of variable width, and 90% of the file is useless to me. For example, the fields I'm interested in extracting are encoded in columns 00645-00649, 03315-03319, etc. I'd like to get this into a format I can run stats on in SPSS/Excel. Should I be looking to use R, Python, or something else, or am I totally beyond hope?
Thanks in advance.
It's impossible to say for certain given only the information here, but the DATA LIST command in SPSS may well allow you to read the data into SPSS directly from the current file. If you can specify the column locations of the desired variables, you can specify those on that command, and SPSS will simply skip over the unnamed columns.
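If you do decide to give R another try, this task is a good fit for read.fwf, which reads fixed-width fields by position; a negative width skips that many characters. A minimal sketch using the two field positions from the question (the file and variable names are hypothetical):

# Field 1 sits in columns 645-649, field 2 in columns 3315-3319;
# the negative widths skip the stretches in between
widths <- c(-644, 5, -2665, 5)
dat <- read.fwf("records.txt", widths = widths,
                col.names = c("field1", "field2"))
# Write a CSV that SPSS or Excel can open directly
write.csv(dat, "extract.csv", row.names = FALSE)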

Read C++ binary file in R

Can I read a binary file written by C++ in R?
I have been using Rcpp in my R package, and the simulations typically generate a large amount of data. I am planning to write the output to binary files in C++ and then read those back into R. This works if I write text files, but I haven't found a solution for binary files. The program sometimes crashes abruptly if I pass data using many NumericVectors (I have yet to fully understand memory management with Rcpp).
Can this approach let me share larger datasets between C++ and R than is possible by passing vectors? In C++, the maximum vector size is limited by RAM and the address bus (maybe?), but I think R is able to load larger vectors using swap. Am I correct, or am I misunderstanding the concepts?
Yes you can. But it's "complicated".
You are embarking on a topic called binary serialization. There is a lot of work out there. In essence, you are somewhere on the continuum between:
minimal: open a file, write out N binary items; then on the other side, read N binary items back. We did something similar at work years ago, where we wrote some metadata with <rows,cols,version> and then a binary blob of rows * cols doubles to attach to a matrix.
maximal: use a fully descriptive meta language like Protocol Buffers or MessagePack to describe the binary content, write it in C++ (using the appropriate library) and read it back in R (using the corresponding packages; I am involved with one of each: RProtoBuf and RcppMsgPack).
And there is a lot in between. If you really only need to communicate between C(++) and R, you could try the RData / rds format. There is one library, librdata, and I have experimented with it (and filed some bug reports and made some pull requests). I might start there.
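To make the minimal end of that continuum concrete, here is a sketch of the R side, assuming the C++ code wrote a header of three 32-bit integers <rows, cols, version> followed by rows * cols doubles in column-major order (the file name and exact layout are hypothetical):

con <- file("simulation_output.bin", "rb")
hdr <- readBin(con, what = "integer", n = 3, size = 4)   # <rows, cols, version>
rows <- hdr[1]; cols <- hdr[2]                           # hdr[3] (version) unused here
vals <- readBin(con, what = "double", n = rows * cols, size = 8)
close(con)
m <- matrix(vals, nrow = rows, ncol = cols)              # matches column-major write order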
So in short: do some research, figure out what to do and then do it :)
PS If you call C++ via Rcpp from R, then you may not need files at all. We can pass large objects back and forth; the limit may be your RAM.

R: Help reading a particular .mat file into R

So I've been trying to read this particular .mat file into R. I don't know too much about Matlab, but I know enough to know that the R.matlab package can only read uncompressed data into R, and that to save data as uncompressed I need to do so in Matlab using
save new.mat -v6
Okay, so I did that, but when I used readMat("new.mat") in R, it just got stuck loading forever. I also tried using the hdf5 package via:
> hdf5load("new.mat", load=FALSE)->g
Error in hdf5load("new.mat", load = FALSE) :
can't handle hdf type 201331051
I'm not sure what the problem could be, but if anyone wants to try to figure this out, the file is located at http://dibernardo.tigem.it/MANTRA/MANTRA_online/Matlab_Code%26Data.html and is called inventory.mat (the first file).
Thanks for your help!
This particular file has one object, inventory, which is a struct with a lot of different things inside it: some cell arrays, some vectors of doubles or logicals, and a couple of matrices of doubles. It looks like R.matlab does not like cell arrays within structs, but I'm not sure exactly what's preventing R from loading this. For reasons like this, I'd generally recommend avoiding mapping Matlab structs to R objects. A struct is similar to an R list, and this one can be transformed into a list, but it's not always a good idea.
I recommend creating a new variable for each object, e.g. ids = inventory.instance_ids, and then saving each object to a separate .mat file, or saving all of them, except for the inventory object, into one file. Even better is to go to text, e.g. via csvwrite, so that you can see exactly what's being created.
I realize that sidesteps the use of a Matlab-to-R reader, but having things in a common, universal format is much more useful for reproducibility than acquiring a bunch of different readers for a proprietary format.
Alternatively, you can pass objects in memory via R.matlab, or this set of functions + the R/DCOM interface (on Windows).
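As a sketch of the in-memory route: R.matlab ships a Matlab server, so you can evaluate code in a running Matlab session and pull variables straight into R. This assumes Matlab is installed and on the system path; inventory.instance_ids is the field mentioned above:

library(R.matlab)
Matlab$startServer()                # launches Matlab running R.matlab's MatlabServer
matlab <- Matlab()
open(matlab)                        # connect to the server
evaluate(matlab, "load inventory.mat; ids = inventory.instance_ids;")
ids <- getVariable(matlab, "ids")   # returns a list containing the variable
close(matlab)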
Although this doesn't address how to use R.matlab, I've done a lot of transferring of data between R and Matlab, in both directions, and I find that it's best to avoid .mat files (and, similarly, .rdat files). I prefer to pass objects either in memory, so that I can inspect them on each side, or via standard text files. Dealing with application-specific file formats, especially ones that change quite a bit and are inefficient (I'm looking at you, MathWorks), is not a good use of time. I appreciate the folks who work on readers, but having far more control over the data structures used in the target language is well worth the space overhead of a simple output file format. In-memory data transfer is very nice because you can interface the programs directly, but that may be a distraction if your only goal is to move data.
Have you run the examples in http://cran.r-project.org/web/packages/R.matlab/R.matlab.pdf on pages 22 to 24? That will test your ability to read from versions 4 and 5. And I'm not sure it's true that R cannot read compressed files: there is an Rcompression package on Omegahat.
