I am looking for a graph that shows the percentage of elements transferred from each node in one stage to each node in the next stage. In principle, the number of nodes in one stage need not equal the number of nodes in the next. What is this type of graph called, and is it possible to create it in R?
This is a Sankey diagram:
https://r-graph-gallery.com/sankey-diagram.html
You can find more information on how to create one here:
https://plotly.com/r/sankey-diagram/
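As a rough illustration of the data a Sankey diagram needs, here is a minimal Python sketch that turns a list of observed transitions into node labels plus links carrying a count and a percentage of the source node's total. The function name `sankey_links` and the sample data are made up for this example. The source/target/value index layout mirrors what Plotly's Sankey trace expects, but check the linked pages before relying on that.

```python
from collections import Counter

def sankey_links(transitions):
    """Turn (source_node, target_node) pairs into Sankey link data."""
    counts = Counter(transitions)
    # Total number of elements leaving each source node.
    totals = Counter(src for src, _ in transitions)
    # Node labels in order of first appearance.
    labels = []
    for src, dst in transitions:
        for name in (src, dst):
            if name not in labels:
                labels.append(name)
    index = {name: i for i, name in enumerate(labels)}
    links = [
        {
            "source": index[src],
            "target": index[dst],
            "value": n,
            "percent": 100.0 * n / totals[src],
        }
        for (src, dst), n in counts.items()
    ]
    return labels, links

# Hypothetical data: stage 1 has two nodes, stage 2 has three.
transitions = [("A1", "B1"), ("A1", "B1"), ("A1", "B2"), ("A1", "B3"),
               ("A2", "B2"), ("A2", "B3")]
labels, links = sankey_links(transitions)
for link in links:
    print(labels[link["source"]], "->", labels[link["target"]],
          f'{link["percent"]:.0f}%')
```

Note that the two stages have different numbers of nodes here, which is fine because each link only refers to node indices.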
I am trying to find a way to build a dynamic plot (for the moment I use plotly) in which hovering the mouse over a point calls a routine that shows something, e.g. a photo associated with that specific data point. An example would be the MNIST dataset embedded in 2 dimensions (e.g. using t-SNE), where moving the pointer over a point shows the actual digit that the point stands for.
I'm getting familiar with Graphviz and wonder if it's possible to generate a diagram/graph like the one below (I'm not sure what it's called). If not, does anyone know a good open source framework that can do it? (Preferably C++, Java or Python.)
According to Many Eyes, this is a bubble chart. They say:
It is especially useful for data sets with dozens to hundreds of values, or with values that differ by several orders of magnitude.
...
To see the exact value of a circle on the chart, move your mouse over it. If you are charting more than one dimension, use the menu to choose which dimension to show. If your data set has multiple numeric columns, you can choose which column to base the circle sizes on by using the menu at the bottom of the chart.
Thus, any presentation with a lot of bubbles in it (especially with many small bubbles) would have to be dynamic to respond to the mouse.
My usual practice with bubble charts is to show three or four variables (x, y and another variable through the size of the bubble, and perhaps another variable with the color or shading of the bubble). With animation, you can show development over time too - see GapMinder. FlowingData provides a good example with a tutorial on how to make static bubble charts in R.
In the example shown in the question, though, the bubbles appear to be positioned so that similar companies are close together. Even then, the exact design criteria are unclear to me. For example, I'd have expected Volkswagen to be closer to General Motors than Pfizer is (if some measure of company similarity is used to place the bubbles), but that isn't so in this diagram.
You could use Graphviz to produce a static version of a bubble chart, but it would involve quite a lot of work. You would have to preprocess the data to calculate a similarity matrix, derive edge weights from that matrix, and assign colours and sizes to each bubble. The preprocessing script would then write the Graphviz file with all edges hidden, and you would run that file through neato to draw it.
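The preprocessing step could look something like the following Python sketch. Everything here is illustrative: the function name `bubbles_to_dot`, the sizes, colours, similarity values and threshold are invented, and mapping similarity to the edge attribute `len` as `1 / similarity` is just one possible choice. The sketch only writes DOT text; it does not compute the similarity matrix.

```python
def bubbles_to_dot(bubbles, similarity, threshold=0.5):
    """Build DOT source for neato.

    bubbles:    {name: (diameter_in_inches, fill_colour)}
    similarity: {(name_a, name_b): value in (0, 1]}
    """
    lines = ["graph bubbles {",
             "  node [shape=circle, style=filled, fixedsize=true];"]
    for name, (size, colour) in bubbles.items():
        lines.append(f'  "{name}" [width={size}, fillcolor="{colour}"];')
    for (a, b), sim in similarity.items():
        if sim >= threshold:
            # Higher similarity -> shorter preferred edge length.
            # style=invis hides the edge but neato still uses it for layout.
            length = round(1.0 / sim, 2)
            lines.append(f'  "{a}" -- "{b}" [len={length}, style=invis];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical input values, chosen only to exercise the function.
bubbles = {
    "Volkswagen": (1.2, "lightblue"),
    "General Motors": (1.5, "lightblue"),
    "Pfizer": (1.0, "salmon"),
}
similarity = {
    ("Volkswagen", "General Motors"): 0.8,
    ("Volkswagen", "Pfizer"): 0.1,
    ("General Motors", "Pfizer"): 0.1,
}
dot = bubbles_to_dot(bubbles, similarity)
print(dot)
```

Saving the output as bubbles.dot and running `neato -Tpng bubbles.dot -o bubbles.png` should then render it.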
The title of this question might not reflect the actual scenario, so please read everything below. Thanks.
I am developing a scatter plot based on the following data (JSON, in a file named simple.json):
{
"docs":
[
{"timestamp":"01","id":"100","quantity":"5","pay":"50","bp":"25","city":"Multan"},
{"timestamp":"02","id":"200","quantity":"10","pay":"100","bp":"50","city":"Lahore"},
{"timestamp":"03","id":"300","quantity":"3","pay":"30","bp":"15","city":"Multan"},
{"timestamp":"04","id":"400","quantity":"5","pay":"50","bp":"25","city":"Multan"},
{"timestamp":"05","id":"500","quantity":"6","pay":"60","bp":"30","city":"Lahore"},
{"timestamp":"06","id":"600","quantity":"15","pay":"150","bp":"75","city":"Islamabad"},
{"timestamp":"07","id":"700","quantity":"14","pay":"140","bp":"70","city":"Islamabad"},
{"timestamp":"08","id":"800","quantity":"18","pay":"180","bp":"90","city":"Islamabad"},
{"timestamp":"09","id":"900","quantity":"7","pay":"70","bp":"35","city":"Lahore"},
{"timestamp":"10","id":"1000","quantity":"20","pay":"200","bp":"100","city":"Islamabad"}
]
}
I am trying to develop a reusable graph in which I can present the user with the available data columns (from the data above). The user can select one column (say "id") for the X axis and another column (say "quantity") for the Y axis; up to here everything works as expected. Later, the user can select another column and click a button to plot that column on the graph, alongside the previously added columns.
Here comes the problem:
When I add another column (say "pay") for the Y axis while keeping the previous ones on the graph, the new points are plotted correctly (I also rescale the axis based on the new data), but the old ones are NOT re-arranged. This is the actual problem. I am thinking of keeping track of each added column (by storing column references in a separate array), so that every time a new column is added I redraw the old ones (should I?). But this doesn't seem like the right approach given D3's capabilities, or in terms of performance.
For this I also applied a class "update" to every circle drawn, so that I can select all "update" circles, but that raises another issue: how would I know the new position of these circles? Do I need to traverse the data again for that particular series and redo the drawing? Keeping track of the old series and redrawing them for every new series will increase the processing overhead each time. Is there a handy solution or a built-in D3 mechanism to re-adjust the previous drawing according to the new scale?
Please suggest something. I am sure I am missing some key points.
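To make the rescaling problem concrete, here is a small language-neutral sketch in Python (not D3 code). It assumes one shared Y scale whose domain grows whenever a series is added, with every point's pixel position recomputed from its stored data value rather than from its old pixel position. The names `linear_scale` and `Chart` are invented for this illustration, and the rows are the first three entries of the JSON above.

```python
def linear_scale(domain, range_):
    """Return a function mapping domain values linearly onto range_."""
    d0, d1 = domain
    r0, r1 = range_
    def scale(v):
        return r0 + (v - d0) * (r1 - r0) / (d1 - d0)
    return scale

class Chart:
    def __init__(self, rows, height=200):
        self.rows = rows
        self.height = height
        self.y_cols = []  # columns currently plotted on the Y axis

    def add_series(self, col):
        self.y_cols.append(col)
        return self.layout()

    def layout(self):
        # One shared domain over every plotted column.
        values = [float(r[c]) for r in self.rows for c in self.y_cols]
        y = linear_scale((0, max(values)), (self.height, 0))
        # Recompute pixel positions for ALL series from their data values.
        return {c: [y(float(r[c])) for r in self.rows] for c in self.y_cols}

rows = [
    {"id": "100", "quantity": "5", "pay": "50"},
    {"id": "200", "quantity": "10", "pay": "100"},
    {"id": "300", "quantity": "3", "pay": "30"},
]
chart = Chart(rows)
first = chart.add_series("quantity")
second = chart.add_series("pay")
print(first["quantity"])
print(second["quantity"])
```

The second print shows the "quantity" points moving once "pay" widens the domain. In D3 terms, circles keep their bound data, so the analogous step would be to update the scale's domain and then re-apply the position attribute to the existing selection; treat that mapping as an assumption to check against the D3 documentation.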
I'm searching for a data viewer/plotter for some data I've generated.
Facts
First some facts about the data I've generated:
There are several datasets with about 3 million data points each.
Each dataset is currently stored in ASCII format.
Every line represents a point and consists of multiple columns.
The first two columns determine the position of the point (i.e. x and y value), where the first column is a timestamp and the second is a normalized float between 0 and 1.
The other columns contain additional data which may be used to colorize the plot or filter the data.
An example data point:
2012-08-08T01:02:03.040 0.0165719281 foobar SUCCESS XX:1
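For reference, a line in this format could be parsed along the following lines. This is only a sketch: it assumes whitespace-separated columns and a timestamp with fractional seconds exactly as in the example, and the names `Point` and `parse_point` are made up here.

```python
from datetime import datetime
from typing import NamedTuple

class Point(NamedTuple):
    timestamp: datetime  # x value
    value: float         # y value, normalized to [0, 1]
    extra: tuple         # remaining columns, usable for colouring/filtering

def parse_point(line):
    """Parse one whitespace-separated data line into a Point."""
    fields = line.split()
    ts = datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S.%f")
    return Point(ts, float(fields[1]), tuple(fields[2:]))

p = parse_point("2012-08-08T01:02:03.040 0.0165719281 foobar SUCCESS XX:1")
print(p.timestamp.isoformat(), p.value, p.extra)
```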
Current Approach
Currently I am generating multiple PNG files (with gnuplot) for each data set, using different selection criteria such as the following:
Display all points in grey.
Display all points in grey, but SUCCESS in red.
Display all points in grey, but SUCCESS in red, XX:-1 in green; if both SUCCESS and XX:-1 match use blue as coloring.
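The third colouring rule can be written as a small function, shown here as a Python sketch. The function name `pick_colour` is invented, and it assumes the additional columns of a point are available as a sequence of strings.

```python
def pick_colour(extra_columns):
    """Return the plot colour for a point given its additional columns."""
    success = "SUCCESS" in extra_columns
    flagged = "XX:-1" in extra_columns
    if success and flagged:
        return "blue"
    if success:
        return "red"
    if flagged:
        return "green"
    return "grey"

print(pick_colour(("foobar", "SUCCESS", "XX:1")))
print(pick_colour(("foobar", "SUCCESS", "XX:-1")))
```

Switching a filter on or off then amounts to changing this function and recolouring, rather than regenerating an image file.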
Drawbacks
The current approach has some drawbacks I'd like to have addressed:
I can't easily switch filters or colorings on and off because I have to generate a new PNG file every time.
I have to limit the resolution of the image file, because the higher the resolution, the slower the viewer. So I can only zoom in to a limited level of detail.
I don't have the raw data for each point available in the PNG viewer. Ideally I'd like to see the data when a point is selected.
Already tested
I've already tested some other approaches:
Gnuplot itself has a viewer but it can't handle that amount of points efficiently - it is too slow and consumes too much memory.
I've had a quick look at KST, but I couldn't find a way to display 2D data and I don't think it will meet my wishes.
Wishes
I'd like to have a viewer that operates on the raw data, displays the points quickly when zoomed out, zooms in quickly, and resolves the aforementioned drawbacks.
Question
So finally, does anybody know of such a viewer or have another suggestion?
If there isn't such a viewer, recommendations for programming it myself are welcome, too.
Thanks in advance
Stefan