Visualising changing rank-ordering when missing data is present


I wish to visualise changes in relative rankings between categories through time, much like this so-called 'subway-style' plot. However, not all categories are present in all time steps.
I have made a preliminary plot (see attached) that is sufficient to interpret the data (one simply needs to look at the crossing lines). However, because not every category is represented in every time slice, lines may move up or down the y-axis without any change in relative rank, which is visually confusing:
Do algorithms exist for minimising these kinks in otherwise static ranks when data are missing? To put it another way, my goal is to keep lines straight wherever possible (i.e. whenever there is no change in relative rank).
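One simple heuristic (a greedy sketch, not an optimal kink-minimising algorithm) is to give every category a persistent vertical slot and, at each time step, reshuffle only the slots held by the categories present at that step according to their ranks. An absent category leaves its slot empty instead of letting every category below it shift up, so a line only bends when its order relative to the co-present categories actually changes. The input shape (category mapped to an array of ranks, with null for missing steps), the function name layoutRanks, and the initial slot order by mean observed rank are all assumptions for illustration.

```javascript
// ranks: { category: [rank at t0, rank at t1, ...] } with null where a category is absent.
// Each category must be observed at least once.
// Returns { category: [slot at t0, ...] } with null where absent; the slot is the y-position to plot.
function layoutRanks(ranks) {
  const cats = Object.keys(ranks);
  const nSteps = ranks[cats[0]].length;

  // Assumption: the initial slot order is by mean observed rank
  const meanRank = (c) => {
    const obs = ranks[c].filter((r) => r !== null);
    return obs.reduce((a, b) => a + b, 0) / obs.length;
  };
  const slot = {};
  [...cats]
    .sort((a, b) => meanRank(a) - meanRank(b))
    .forEach((c, i) => { slot[c] = i + 1; });

  const positions = {};
  for (const c of cats) positions[c] = new Array(nSteps).fill(null);

  for (let t = 0; t < nSteps; t++) {
    const present = cats.filter((c) => ranks[c][t] !== null);
    // Slots currently held by the categories present at this step, top to bottom
    const slots = present.map((c) => slot[c]).sort((a, b) => a - b);
    // Hand those slots back out in this step's rank order (ties keep their current slot order)
    present.sort((a, b) => ranks[a][t] - ranks[b][t] || slot[a] - slot[b]);
    present.forEach((c, i) => {
      slot[c] = slots[i];
      positions[c][t] = slots[i];
    });
  }
  return positions;
}

const positions = layoutRanks({ A: [1, 1, 2], B: [2, null, 1], C: [3, 2, 3] });
console.log(positions);
```

In this example C's raw rank goes 3, 2, 3 only because B is missing at the middle step, but its slot stays 3 at every step, so its line is drawn straight; A and B still swap at the last step, where their relative order really changes. The y-axis then shows slots rather than raw ranks, so gaps appear where categories are absent.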


Is there a way to show the significance of category means against a set value, not one another?

I am graphing measured results versus expected results from a model, grouped by categories (the category in the boxplot below is one of a few different ones I'm using). For each data point, I subtracted the expected from the observed to determine the difference. My task is to modify the model to minimize the difference.
I would like to add the significance level to this chart, but all the resources I am finding compare the means of the categories to one another. In this case, I would like to know whether each category's mean is significantly different from 0. I can run this test one category at a time, selecting the data points in each category and testing for a difference from 0, but this seems inefficient.
Is there a way to automatically generate this and plot it? stat_compare_means seemed promising but I couldn't figure out how to make it work, while stat_pvalue_manual may hold more promise if I figure out how to code this.
Thanks in advance!
Sample boxplot (too new to add preview)
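Testing each category's mean against a fixed value is a one-sample t-test per group (what R's t.test(x, mu = 0) does for a single vector). The sketch below spells out that computation in JavaScript: group the differences by category, compute t = (mean - mu) / (sd / sqrt(n)) with n - 1 degrees of freedom, and get the two-sided p-value from the regularized incomplete beta function, evaluated with the standard Lanczos log-gamma and continued-fraction routines. The field names category and diff and the function names are hypothetical.

```javascript
// Log-gamma via the Lanczos approximation (Numerical Recipes' gammln coefficients)
function logGamma(x) {
  const c = [76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5];
  let y = x;
  const tmp = x + 5.5 - (x + 0.5) * Math.log(x + 5.5);
  let ser = 1.000000000190015;
  for (const ci of c) ser += ci / ++y;
  return -tmp + Math.log((2.5066282746310005 * ser) / x);
}

// Continued fraction for the incomplete beta function (modified Lentz's method)
function betacf(a, b, x) {
  const MAXIT = 200, EPS = 3e-14, FPMIN = 1e-300;
  const qab = a + b, qap = a + 1, qam = a - 1;
  let c = 1;
  let d = 1 - (qab * x) / qap;
  if (Math.abs(d) < FPMIN) d = FPMIN;
  d = 1 / d;
  let h = d;
  for (let m = 1; m <= MAXIT; m++) {
    const m2 = 2 * m;
    let aa = (m * (b - m) * x) / ((qam + m2) * (a + m2));
    d = 1 + aa * d; if (Math.abs(d) < FPMIN) d = FPMIN;
    c = 1 + aa / c; if (Math.abs(c) < FPMIN) c = FPMIN;
    d = 1 / d; h *= d * c;
    aa = (-(a + m) * (qab + m) * x) / ((a + m2) * (qap + m2));
    d = 1 + aa * d; if (Math.abs(d) < FPMIN) d = FPMIN;
    c = 1 + aa / c; if (Math.abs(c) < FPMIN) c = FPMIN;
    d = 1 / d;
    const del = d * c;
    h *= del;
    if (Math.abs(del - 1) < EPS) break;
  }
  return h;
}

// Regularized incomplete beta function I_x(a, b)
function incBeta(x, a, b) {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const front = Math.exp(logGamma(a + b) - logGamma(a) - logGamma(b)
    + a * Math.log(x) + b * Math.log(1 - x));
  return x < (a + 1) / (a + b + 2)
    ? (front * betacf(a, b, x)) / a
    : 1 - (front * betacf(b, a, 1 - x)) / b;
}

// One-sample t-test of H0: mean == mu (two-sided); needs at least 2 values
function oneSampleT(values, mu = 0) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1);
  const t = (mean - mu) / Math.sqrt(variance / n);
  const df = n - 1;
  // Two-sided p-value: P(|T| > |t|) = I_{df / (df + t^2)}(df / 2, 1 / 2)
  const p = incBeta(df / (df + t * t), df / 2, 0.5);
  return { n, mean, t, df, p };
}

// Run the test separately for every category
function testsByCategory(rows, mu = 0) {
  const groups = new Map();
  for (const { category, diff } of rows) {
    if (!groups.has(category)) groups.set(category, []);
    groups.get(category).push(diff);
  }
  return [...groups].map(([category, values]) => ({ category, ...oneSampleT(values, mu) }));
}

const rows = [
  ...[1, 2, 3, 4, 5].map((diff) => ({ category: "A", diff })),
  ...[-1, 1, -2, 2].map((diff) => ({ category: "B", diff })),
];
const results = testsByCategory(rows);
// A: t ≈ 4.243, df = 4, p ≈ 0.0132; B: t = 0, p = 1
console.log(results);
```

The per-category p-values (or significance stars derived from them) can then be placed above each box as text annotations. With several categories tested at once, some multiple-comparison adjustment may be worth considering.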

Issues with combining different (continuous and ordinal) plot types into one plot

I am preparing a figure for a paper that presents data from 2 different experiments in one plot. Because of that I don't need a legend for every plot, so I am trying to combine them with ggdraw from cowplot.
My code (which should generate a reproducible example) gives this output:
It seems that both figures end up in the same slot (A) and the legend gets slot (B). Normally I would use facet_wrap to plot them together (which would also keep the scaling and legend consistent across the two plots), but that probably will not work here, because I am trying to add a different figure type in C and D.
The problem is that this figure type is ordinal so I have used a somewhat “hacky” approach to plot it, giving me this figure looking essentially as I want it to:
So far I have not been able to extract it into an element that ggdraw can use.
Ideally the final plot should roughly look like this (of course with different labels):
How would you go about plotting these different types together?
Thank you for taking the time to read my question; I hope you can help me. I know it is quite a mouthful, but I was not sure how I could meaningfully reduce it to smaller chunks.

Is there a way to create a geom_path heatmap in ggplot?

For example, this is a heatmap from a website using GPS data:
I have had some success by adding a weight parameter to each vertex and counting the number of events that have vertices near it, but that takes a long time, especially with a large amount of data. It also looks spotty when the spacing between vertices is irregular, which causes random splotches of different colors throughout the heatmap. It looks kind of cool, but it makes the data harder to read.
When you zoom out, it looks a bit more continuous due to the paths overlapping more.
In R, the closest I can get to this uses an alpha channel, but that only gives me a monochromatic heatmap, which is not always desirable, especially when you want lesser-traveled paths to remain visible. In theory I could draw two layers of lines to fix the visibility problem (the first opaque, the second semi-transparent), but I would like to be able to use different hues.
Ideally I would like this to work with ggplot, but if it cannot, I would accept other methods, provided they are reasonably quick computationally.
Edit: The data format is a data frame with sequential (latitude, longitude) coordinate pairs, along with some associated data that can be used for filter & grouping (such as activity type and event ID).
Here is a sample of the data for the region displayed in the above images (~1.5 MB):
https://www.dropbox.com/s/13p2jtz4760m26d/sample_coordinate_data.csv?dl=0

I would try something like
ggplot() + geom_count(data = data, aes(longitude, latitude, alpha = after_stat(prop)))
but you would need to share some data to check how well it works.
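One way to get a multi-hue path heatmap without per-vertex weighting is to rasterise the tracks onto a grid: walk each segment between consecutive coordinates, record which grid cells it passes through, count how many distinct events touch each cell, and colour the cells by that count on a log scale so that lightly travelled paths stay visible. The sketch below assumes hypothetical field names eventId, lat and lon, a cell size in degrees, and a blue-to-red hue ramp; it treats coordinates as planar, which is usually acceptable for a small region. Segments are sampled at half-cell spacing, so a cell that a segment only clips at a corner can be missed.

```javascript
// rows: [{ eventId, lat, lon }], in sequential order within each event.
// Returns a Map from "row,col" grid-cell keys to the number of distinct events passing through.
function pathDensity(rows, cellSize) {
  const byEvent = new Map();
  for (const r of rows) {
    if (!byEvent.has(r.eventId)) byEvent.set(r.eventId, []);
    byEvent.get(r.eventId).push(r);
  }
  const counts = new Map();
  for (const pts of byEvent.values()) {
    const visited = new Set(); // each event counts at most once per cell
    const mark = (lat, lon) =>
      visited.add(`${Math.floor(lat / cellSize)},${Math.floor(lon / cellSize)}`);
    mark(pts[0].lat, pts[0].lon);
    for (let k = 1; k < pts.length; k++) {
      const a = pts[k - 1], b = pts[k];
      // Sample the segment at half-cell spacing between the two vertices
      const span = Math.max(Math.abs(b.lat - a.lat), Math.abs(b.lon - a.lon));
      const steps = Math.ceil(span / (cellSize / 2));
      for (let s = 1; s <= steps; s++) {
        const f = s / steps;
        mark(a.lat + f * (b.lat - a.lat), a.lon + f * (b.lon - a.lon));
      }
    }
    for (const key of visited) counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Map a count to an HSL colour on a log scale: blue (rare) through green to red (busiest cell)
function countToColour(n, maxCount) {
  const hue = 240 * (1 - Math.log1p(n) / Math.log1p(maxCount));
  return `hsl(${Math.round(hue)}, 100%, 50%)`;
}

const counts = pathDensity([
  { eventId: "e1", lat: 0.5, lon: 0.5 }, { eventId: "e1", lat: 0.5, lon: 2.5 },
  { eventId: "e2", lat: 0.5, lon: 1.5 }, { eventId: "e2", lat: 1.5, lon: 1.5 },
], 1);
const maxCount = Math.max(...counts.values());
for (const [cell, n] of counts) console.log(cell, n, countToColour(n, maxCount));
```

Counting events rather than vertices keeps a slow track with many GPS fixes in one cell from dominating, which addresses the splotchiness caused by irregular vertex spacing. The per-cell counts could then be converted back into a data frame and drawn with ggplot2's geom_tile or geom_raster.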

Tableau map shapes overlapped

I am trying to render some geographic data onto a map in Tableau. However, some data points are located at the same position, so their shape images overlap, and clicking on a shape only selects the top one.
How can we distinguish the overlapping data points in Tableau? I know that we can manually exclude the top data point to see the next one, but is there another way, for example a drop-down list in the right-click menu for selecting among the overlapping data points?
Thank you!

There are a couple of ways to deal with this issue.
Some choices you can try are:
Add some transparency to the marks by editing the color shelf properties. That way at least you get a visual indication when there are multiple marks stacked on top of each other. This approach can be considered a poor man's heat map if you have many points in different areas as the denser/darker sections will have more marks. (But that just affects the appearance and doesn't help you select and view details for marks that are covered by others)
Add some small pseudo-random jitter to each coordinate using calculated fields. This will be easier when Tableau supports a rand() function, but in the meantime you can get creative with other fields and the math functions to add a little jitter. The goal is to shift locations just enough that they don't stack exactly, but not enough to affect precision; how much is appropriate depends on the scale.
Make a grid-style heat map where the color indicates the number of data points in each grid cell. To do this, you'll need calculated fields that bin together nearby latitudes and longitudes, for example by rounding each to a certain number of decimal places, or by using the hex bin functions in Tableau. Those calculated fields will need to have a geographic role and be treated as continuous dimensions.
Define your visualization to display one mark for each unique location rather than one mark for each individual data point, and then use color or size to indicate the number of data points at that location.
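The jitter, grid-binning and one-mark-per-location ideas above come down to a few lines of arithmetic. The sketch below expresses them in JavaScript rather than as Tableau calculated fields (in Tableau the binning would typically be a ROUND() on the latitude and longitude fields). The integer id field, the classic linear congruential generator constants used as a stand-in for rand(), and the decimal precision are assumptions for illustration.

```javascript
// One step of a classic linear congruential generator (constants 9301, 49297, 233280).
// Assumes a non-negative integer input; the output is an integer in [0, 233280).
const lcg = (x) => (x * 9301 + 49297) % 233280;

// Deterministic jitter: offsets in [-scale/2, scale/2) derived from the record's integer id
function jitter(point, scale) {
  const r1 = lcg(point.id);
  const r2 = lcg(r1); // a second step so the lat and lon offsets differ
  return {
    ...point,
    lat: point.lat + scale * (r1 / 233280 - 0.5),
    lon: point.lon + scale * (r2 / 233280 - 0.5),
  };
}

// Grid binning: round both coordinates to a fixed number of decimal places
const binKey = (p, decimals) => `${p.lat.toFixed(decimals)},${p.lon.toFixed(decimals)}`;

// One mark per (binned) location, carrying the number of data points behind it
function countByLocation(points, decimals) {
  const counts = new Map();
  for (const p of points) {
    const key = binKey(p, decimals);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const pts = [
  { id: 1, lat: 30.1571, lon: 71.4753 },
  { id: 2, lat: 30.1572, lon: 71.4754 },
  { id: 3, lat: 31.52, lon: 74.35 },
];
console.log(pts.map((p) => jitter(p, 0.001)));
console.log(countByLocation(pts, 3));
```

Because the jitter is derived from the id rather than from a random source, each mark lands in the same place every time the view refreshes, which keeps selections stable.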

D3.js - Multiple Series (columns) of Data on ScatterPlot at Y Axis

The title of this question may not fully describe the scenario, so please read everything below. Thanks.
I am developing a scatter plot based on the following data (JSON, in a file named simple.json):
{
"docs":
[
{"timestamp":"01","id":"100","quantity":"5","pay":"50","bp":"25","city":"Multan"},
{"timestamp":"02","id":"200","quantity":"10","pay":"100","bp":"50","city":"Lahore"},
{"timestamp":"03","id":"300","quantity":"3","pay":"30","bp":"15","city":"Multan"},
{"timestamp":"04","id":"400","quantity":"5","pay":"50","bp":"25","city":"Multan"},
{"timestamp":"05","id":"500","quantity":"6","pay":"60","bp":"30","city":"Lahore"},
{"timestamp":"06","id":"600","quantity":"15","pay":"150","bp":"75","city":"Islamabad"},
{"timestamp":"07","id":"700","quantity":"14","pay":"140","bp":"70","city":"Islamabad"},
{"timestamp":"08","id":"800","quantity":"18","pay":"180","bp":"90","city":"Islamabad"},
{"timestamp":"09","id":"900","quantity":"7","pay":"70","bp":"35","city":"Lahore"},
{"timestamp":"10","id":"1000","quantity":"20","pay":"200","bp":"100","city":"Islamabad"}
]
}
I am trying to develop a reusable graph in which I present the user with the available data columns (from the data above). The user can select one column (say "id") for the X axis and another column (say "quantity") for the Y axis (up to this point everything works as expected). Later, the user can select another column and click a button to plot it on the graph alongside the previously added columns.
Here comes the problem:
When I add another column (say "pay") for the Y axis while keeping the previous ones on the graph, the new points are plotted correctly (I rescale the axis based on the new data as well), but the old points are NOT rearranged. This is the actual problem. I am thinking of keeping track of each added column (by storing column references in a separate array), so that every time a new column is added I redraw the old ones as well (should I?). But this doesn't seem to make good use of D3's capabilities, and I am worried about performance.
For this I also added a generic class "update" to every circle drawn, so that I can select all the "update" circles. But then another issue arises: how do I know the new position of these circles? Do I need to traverse the data for that particular series again and redraw it? Keeping track of the old series and redrawing them for every new series will increase the processing overhead each time. Is there a handy solution or a built-in D3 mechanism to readjust the previous drawing according to the new scale?
Please suggest something. I am sure I am lacking some key points.
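Because a circle's position is a pure function of its datum and the current scales, re-laying out every series after a new column is added is a single pass over points that are already in memory; nothing has to be re-read from simple.json. The sketch below models that without a DOM, using a hand-written linear scale as a stand-in for d3.scaleLinear; the function names and the point fields (series, x, y, cx, cy) are hypothetical.

```javascript
// Minimal linear scale mapping [d0, d1] onto [r0, r1] (a stand-in for d3.scaleLinear)
function linearScale([d0, d1], [r0, r1]) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// docs: the parsed "docs" array; xCol: the X column; yCols: every Y column plotted so far.
// Returns one record per circle, with all series positioned under one shared Y scale.
function layoutSeries(docs, xCol, yCols, width, height) {
  // The JSON stores numbers as strings, so convert them before computing extents
  const points = yCols.flatMap((col) =>
    docs.map((d) => ({ series: col, x: +d[xCol], y: +d[col] })));
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const x = linearScale([Math.min(...xs), Math.max(...xs)], [0, width]);
  // SVG y grows downwards, so the largest value maps to 0
  const y = linearScale([0, Math.max(...ys)], [height, 0]);
  return points.map((p) => ({ ...p, cx: x(p.x), cy: y(p.y) }));
}

const docs = [
  { id: "100", quantity: "5", pay: "50" },
  { id: "200", quantity: "10", pay: "100" },
];
const before = layoutSeries(docs, "id", ["quantity"], 100, 100);
const after = layoutSeries(docs, "id", ["quantity", "pay"], 100, 100);
// The quantity point at id 200 moves from the top (cy 0) down to cy 90 once "pay" widens the Y domain
console.log(before, after);
```

In D3 itself the data is already bound to each circle, so after updating the y scale's domain, a call such as svg.selectAll("circle").attr("cy", d => y(d.y)) (optionally after .transition()) repositions the old and new circles together, assuming each bound datum carries its numeric y value.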
