How can I make DOT/neato graphs more compact without introducing overlap? - dot

My question is essentially the same as this one but the given answer doesn't work for me.
Here is a sample rendering (source) with
compound=true;
overlap=scalexy;
splines=true;
layout=neato;
There is some unnecessary overlap in the edges but this isn't too bad, the main problem is all the wasted space.
I tried setting sep=-0.7; and here's what happens.
The spacing is much better but now there is some overlap with the nodes. I experimented with different overlap parameters and this is the only one which gives remotely acceptable results.
I tried changing to fdp layout and setting the spring constant attribute K globally but I just got stuff like this:
The source is all straightforward a--b--c sort of stuff, no fancy tricks.
What I want is for all edges to be shortened as much as possible (up to a minimum) provided that this adjustment doesn't introduce any new overlaps, which is where sep fails completely. That doesn't seem like it should be too hard for a layout engine to do. Is it possible with the graphviz suite? I don't mind changing rendering software, but I don't want to annotate the source on a per-node or per-edge basis.
My ideal result would be to minimize the deviation in edge length, considered one node at a time, i.e. each node would have edges of equal length apart from the necessary exceptions, but that's wishful thinking. The priority is to reduce the length of each edge with the constraint that this cannot introduce overlap.
I will accept partial solutions but they must be fully automatic and open source.
How can I do this? Thanks.

I found https://sites.google.com/site/kuabus/programming-by-hu/graphviz-test-tool, an interactive tool for parameterizing the many options and repeatedly rendering them. I went through the full list provided by the Java application, eventually ending up with this set of attributes:
overlap=false
maxiter=99999999
damping=9999999
voro_margin=.001
start=0.1
K=1
nodesep=999999999999
labelloc=c
defaultdist=9999999
size=20,20
sep=+1
normalize=99999999
labeljust=l
outputorder=nodesfirst
concentrate=true
mindist=2
fontsize=99999999
center=true
scale=.01
inputscale=99999999
levelsgap=9999999
epsilon=0.0001
I was not able to find a parameterization of neato that made producing the desired "moderately scaled" graph possible.

You should set
overlap = compress;
this should compress it at much as possible.
Try sep = +1; first, and then play with values between 0 and +1 to find the optimal setting for you.

I have a graph with 50 nodes and 68 edged (sorry cannot publish the whole picture, just a fragment). Found two reasonable presets (1 and 2):
digraph {
graph[
# 1. Less overlaps but less compact.
# This is the choice for now.
layout=neato; overlap=prism; overlap_scaling=-3.5;
# 2. More compact but some overlaps exist (may be adjusted by `sep`).
#layout=neato; overlap=voronoi; sep=-0.15;
# The following is common.
outputorder=nodesfirst, # Will always draw edges over nodes.
splines=curved;
]
node[fontname="Helvetica",];
node[shape=box;style="filled";penwidth="0.5";width=0;height=0;margin="0.05,0.05"];
edge[label=" ";color="#000080";penwidth="0.5";arrowhead="open";arrowsize="0.7";];
. . .
}

Related

Plotting a transition map in Gnuplot 4.6 (or other softwares?)

Goodmorning everybody,
I searched in the web for this problem for something like an hour an nobody seemed to have my same difficulty, so here I am. I'll explain you my problem: I have four states (but in future they can become much more the four) A, B, C and D between which a particle can do transitions. Suppose I know exactly how many times the particle undergoes the A->B, B->C, C->D, A->C (... and so on) transitions. I want to represent my states as dots on a circle and the transitions as arrows between states, which are wider (the arrows, not the dots!) proportionally to the number of transitions. I hope I made myself clear. Is this possible on Gnuplot 4.6? Otherwise: do you know other programs able to do that? Because I saw maps like this before, but sincerly I don't even know how to search for them!!
Thank you very much for your help.
A much more suitable tool for your problem would be Graphviz. You can use the Graphviz “DOT” language to express your transition map directly, and then use the Graphviz tools to perform various types of automated layout (including circular ones) and to produce image files showing the nodes and edges in the graph. If none of the automated layouts are what you want, you can compute them yourself and set the positions explicitly, and still use graphviz's renderer.

Problem with Principal Component Analysis

I'm not sure this is the right place but here I go:
I have a database of 300 picture in high-resolution. I want to compute the PCA on this database and so far here is what I do: - reshape every image as a single column vector - create a matrix of all my data (500x300) - compute the average column and substract it to my matrix, this gives me X - compute the correlation C = X'X (300x300) - find the eigenvectors V and Eigen Values D of C. - the PCA matrix is given by XV*D^-1/2, where each column is a Principal Component
This is great and gives me correct component.
Now what I'm doing is doing the same PCA on the same database, except that the images have a lower resolution.
Here are my results, low-res on the left and high-res on the right. Has you can see most of them are similar but SOME images are not the same (the ones I circled)
Is there any way to explain this? I need for my algorithm to have the same images, but one set in high-res and the other one in low-res, how can I make this happen?
thanks
It is very possible that the filter you used could have done a thing or two to some of the components. After all, lower resolution images don't contain higher frequencies that, too, contribute to which components you're going to get. If component weights (lambdas) at those images are small, there's also a good possibility of errors.
I'm guessing your component images are sorted by weight. If they are, I would try to use a different pre-downsampling filter and see if it gives different results (essentially obtain lower resolution images by different means). It is possible that the components that come out differently have lots of frequency content in the transition band of that filter. It looks like images circled with red are nearly perfect inversions of each other. Filters can cause such things.
If your images are not sorted by weight, I wouldn't be surprised if the ones you circled have very little weight and that could simply be a computational precision error or something of that sort. In any case, we would probably need a little more information about how you downsample, how you sort the images before displaying them. Also, I wouldn't expect all images to be extremely similar because you're essentially getting rid of quite a few frequency components. I'm pretty sure it wouldn't have anything to do with the fact that you're stretching out images into vectors to compute PCA, but try to stretch them out in a different direction (take columns instead of rows or vice versa) and try that. If it changes the result, then perhaps you might want to try to perform PCA somewhat differently, not sure how.

How to detect a trend inside unsteady data (e.g. Trendly)?

I was wondering what kind of model / method / technique Trendly might use to achieve this model:
[It tries to find the moments where significant changes set in and ignores random movements]
Any pointers very welcome! :)
I've never seen 'Trendly', and don't know anything about it, but if I wanted to produce that red line from that blue line, in an algorithmic fashion, I would try:
Fourier the whole data set
Choose a block size longer than the period of the dominant frequency
Divide the data up into blocks of the chosen size
Compare adjacent ones with a statistical test of some sort.
Where the test says two blocks belong to the same underlying distribution, merge them.
If any were merged, go back to 4.
Red trend line is the mean of each block.
A simple "median" function could produce smoother curves over a mostly un-smooth curve.
Otherwise, a brute-force or genetic algorithm could be used; attempting to find the way to split the data into sections, so that more sections = worse solution, and less accuracy of the lines = worse solution.
Another way would be like this: Start at the beginning. As soon as the line moves outside of some radius (3 above or 3 below the first, for instance) set the new height to an average of the current line's height and the previous marker.
If you keep doing that, it would ignore small fluctuations. However, if the fluctuation was large enough, it would still effect it.

Very general question about the implementation of vectors, vertices, edges, rays, lines & line segments

This is just a LARGE generalized question regarding rays (and/or line segments or edges etc) and their place in a software rendered 3d engine that is/not performing raytracing operations. I'm learning the basics and I'm the first to admit that I don't know much about this stuff so please be kind. :)
I wondered why a parameterized line is not used instead of a ray(or are they??). I have looked around at a few cpp files around the internet and seen a couple of resources define a Ray.cpp object, one with a vertex and a vector, another used a point and a vector. I'm pretty sure that you can define an infinate line with only a normal or a vector and then define intersecting points along that line to create a line segment as a subset of that infinate line. Are there any current engines implementing lines in this way, or is there a better way to go about this?
To add further complication (or simplicity?) Wikipedia says that in vector space, the end points of a line segment are often vectors, notably u -> u + v, which makes alot of sence if defining a line by vectors in space rather than intersecting an already defined, infinate line, but I cannot find any implementation of this either which makes me wonder about the validity of my thoughts when applying this in a 3d engine and even further complication is created when looking at the Flash 3D engine, Papervision, I looked at the Ray class and it takes 6 individual number values as it's parameters and then returns them as 2 different Number3D, (the Papervision equivalent of a Vector), data types?!?
I'd be very interested to see an implementation of something which actually uses the CORRECT way of implementing these low level parts as per their true definitions.
I'm pretty sure that you can define an infinate line with only a normal or a vector
No, you can't. A vector would define a direction of the line, but all the parallel lines share the same direction, so to pick one, you need to pin it down using a specific point that the line passes through.
Lines are typically defined in Origin + Direction*K form, where K would take any real value, because that form is easy for other math. You could as well use two points on the line.

Similarity Between Colors

I'm writing a program that works with images and at some point I need to posterize the image. This means I need to bin the colors, but I'm having trouble deciding how to tell how close one color is to another.
Given a color in RGB, I can think of at least 2 ways to see how different they are:
|r1 - r2| + |g1 - g2| + |b1 - b2|
sqrt((r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2)
And if I move into HSV, I can think of other ways of doing it.
So I ask, ignoring speed, what is the best way to tell how similar two colors are? Best meaning most accurate to the human eye.
Well, if speed is not an issue, the most accurate way would be to take some sample images and apply the filter to them using various cutoff values for the distance (distance being determined by one of the equations on the Color_difference page that astander linked to, meaning you'd have to use one of those color spaces listed there with the calculations, then convert to sRGB or something [which also means that you'd need to convert the image into the other color space first if it's not in it to begin with]), and then have a large number of people examine the images to see what looks best to them, then go with the cutoff value for the images that the majority agrees looks best.
Basically, it's largely a matter of subjectiveness; in fact, it also depends on how stylized you want the images, and you might even want to add in some sort of control so that you can alter the cutoff distance on the fly.
If speed does become a bit of an issue and/or you want more simplicity, then just use your second choice for distance calculation (which is simply the CIE76 equation; just make sure to use the Lab* color space) with the cutoff being around 2 or 2.3.
What do you mean by "posterize the image"?
If you're trying to cluster the colors into bins, you should look at
cluster analysis
Just a comment if you are going to move to HSV (or similar spaces):
Diffing on H: difference between 0° and 359° is numerically big but perceptually is negligible.
H difference if V or S are small - is small.
For computer vision apps, more important not perceptual difference (used mostly by paint manufacturers) but are these colors belong to the same object/segment or not. Which means that we might partially ignore V, which can change from lighting conditions.

Resources