How to create a hybrid categorical / linear axis / ticker in bokeh - bokeh

I have process data of a machine that produces stuff in batches. Inside each batch, there are some continuous quantities that I want to plot over time across batches.
Think of a 3D printer printing cylinders as an example. The quantities that I'd plot would be the x and y position of the print head. With this, the plot would look like this (except not hand-drawn, of course):
The batches (indicated as #1, #2, #3 here, and plotted with different colours) map to CategoricalAxis, and the time to LinearAxis. Is it possible to create a hybrid one, or have a hybrid ticker that indicates both?
The OP of this question might want to achieve something similar, so these questions might be related.
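One way to approximate such a hybrid axis is to keep a single linear axis, lay the batches end to end on it, and relabel the ticks so that each batch's first tick shows the batch name and the rest show the time within the batch. The sketch below only computes the positions and labels; `hybrid_axis` and its parameters are hypothetical names, and the batch durations are made up. In bokeh the result could then be handed to a fixed ticker and to the axis's `major_label_overrides` property, but that wiring is an assumption to check against your bokeh version.

```python
def hybrid_axis(batch_durations, tick_step, gap=0.0):
    """Lay batches end to end on one linear axis.

    batch_durations: list of (batch_name, duration) pairs, in plot order.
    Returns (offsets, labels):
      offsets: {batch_name: x position where that batch's time 0 lands}
      labels:  {tick position: label text}; the first tick of each batch
               carries the batch name, the others the within-batch time.
    """
    offsets = {}
    labels = {}
    start = 0.0
    for name, duration in batch_durations:
        offsets[name] = start
        n_ticks = int(duration // tick_step) + 1
        for i in range(n_ticks):
            t = i * tick_step
            # A later batch's name overwrites the previous batch's end tick
            # when the two coincide (gap == 0).
            labels[start + t] = name if i == 0 else f"{t:g}"
        start += duration + gap
    return offsets, labels


# Hypothetical example: two batches of 10 time units each, ticks every 5.
offsets, labels = hybrid_axis([("#1", 10), ("#2", 10)], tick_step=5)

# Shift each batch's local times onto the shared axis before plotting.
local_times = [0, 2.5, 5, 7.5, 10]
global_times_batch2 = [offsets["#2"] + t for t in local_times]

# Assumed bokeh wiring (not executed here):
#   p.xaxis.ticker = sorted(labels)
#   p.xaxis.major_label_overrides = labels
```

The data itself stays numeric, so panning, zooming, and hover tools keep working on a continuous axis; only the tick text is categorical.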

Related

Box plots, plots in octave

I'm new to Octave, so there are many confusing things for me, and I've never done computer programming before so most of the language is also confusing.
I have sets of data c_o, m_o, y_o, k_o as 144 x 1 matrices (column vectors?)
Box plots
Using examples I found online, I wrote this:
axis ([0,5]);
boxplot (c_o, m_o, y_o, k_o);
set(gca (), "xtick", [1,2,3,4], "xticklabel", {"cyan", "magenta", "yellow", "key"});
However, it results in an error
Boxplot.m: grouping vector may only be passed as second arg
I have no idea what this means.
Plots
I'm trying to figure out how to plot multiple data sets with different colors.
For example,
figure (1); plot (c_o , "c");
works perfectly fine.
However, I'd like to remove the horizontal axis, change the horizontal axis range from [0,200] to [0,150], and plot multiple sets of data on the same plot (not multiple plots in the same figure, but the different data on the same set of axes). I haven't been able to find out how, though.
For the record, I do know that there are probably other programming languages more suited for statistical analysis; it just so happens that my first use of this happened to be statistical in nature.

Making a 'flip-book' type animation using density plots from R

I'm new to R, but have worked out how to graph the distribution of my students' grades for a given term using a density plot, and have made some ridgeline plots to show how the distribution evolves throughout the academic year.
I'm thinking it might be fun (and make the graphs easier to interpret) if I could make a kind of flip-book animation that went from one term's grades to the next, relatively quickly, to see how the distribution changes. At its simplest, I could just pop these distribution plots into PowerPoint and scroll through the pages, but I'm wondering what commands I need to put into R's ggplot command to ensure that the axes/scaling stay consistent from one chart to the next?
At the moment, I'm just making a simple chart using this command, where ht102 is the data from the 2nd term of Year 10, and A8 is a vector containing all the (numeric) grades. I am then doing the same thing with another set of grades called ht103, and so on...
ggplot(ht102, aes(x = A8)) +
geom_density(alpha=.3)
What would you recommend to keep the scaling consistent, and any thoughts on a better way to animate this than just popping them into PowerPoint?

Is there a way to create a geom_path heatmap in ggplot?

For example, this is a heatmap from a website using GPS data:
I have gotten some degree of success with adding a weight parameter to each vertex and calculating the number of events that have vertices near those, but that takes a long time, especially with a large amount of data. It also appears a bit spotty when the distance between vertices is a bit wonky, which causes random splotches of different colors throughout the heatmap. It looks kind of cool, but it makes the data a bit harder to read.
When you zoom out, it looks a bit more continuous due to the paths overlapping more.
In R, the closest I can do to this involves using an alpha channel, but that only gets me a monochromatic heatmap, which is not always desirable, especially when you want to see lesser-traveled paths visibly. In theory I could do two lines to resolve the visibility part (first opaque, second semi-transparent), but I would like to be able to have different hue values.
Ideally I would like this to work with ggplot, but if it cannot, I would accept other methods, provided they are reasonably quick computationally.
Edit: The data format is a data frame with sequential (latitude, longitude) coordinate pairs, along with some associated data that can be used for filter & grouping (such as activity type and event ID).
Here is a sample of the data for the region displayed in the above images (~1.5 MB):
https://www.dropbox.com/s/13p2jtz4760m26d/sample_coordinate_data.csv?dl=0
I would try something like
ggplot(data, aes(longitude, latitude)) + geom_count(aes(alpha = after_stat(prop)))
but you need to show some data to check how it works.

R plotting strangeness with large dataset

I have a data frame with several million points in it - each having two values.
When I plot this like this:
plot(myData)
All the points are plotted, but the plot is quite busy, so I thought I'd plot it as a line:
plot(myData, type="l")
But while the x axis doesn't change (i.e. goes from 0 to 7e+07), the actual plotting stops at about 3e+07 and I don't actually get a proper line plot either.
Is there a limitation on line plotting?
Update
If I use
plot(myData, type="h")
I get correct and useable output, but I still wonder why the type="l" option fails so badly.
Further update
I am plotting a time series - here is one output using type="h":
That's perfectly usable, but having a line would allow me to compare several outputs.
Graphical representation of high-dimensional data is a growing issue in data analysis. The problem is not actually creating the graph; it is making the graph communicate information that we can turn into useful knowledge. Let me illustrate the point with a data set of a million observations, which is not even that big.
x <- rnorm(10^6, 0, 1)
y <- rnorm(10^6, 0, 1)
Let's plot it. R can easily handle such a problem. But can we? Probably not.
After all, what kind of information can we deduce from a solid ink stain? Probably no more than a tasseographer trying to divine the future from patterns of tea leaves, coffee grounds, or wine sediments.
plot(x, y)
A different approach is the smoothScatter function, which creates a density plot of bivariate data. Here are two examples.
First, with defaults.
smoothScatter(x, y)
Second, with the bandwidth set larger than the default and five points drawn with a different symbol, pch = 3.
smoothScatter(x, y, bandwidth=c(5,1)/(1/3), nrpoints=5, pch=3)
As you can see, the problem is not solved. Nevertheless, we get a better grasp of the distribution of our data. This kind of approach is still under development, and several aspects of it are still being discussed and refined. If it seems a more suitable way to represent your big dataset, I suggest you visit this blog, which discusses the issue thoroughly.
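The idea behind a density display can be sketched without any plotting library: count how many points fall in each cell of a grid, then colour the cells by count instead of drawing every point. Note that this is plain 2D binning, a simpler stand-in for the kernel density estimate that smoothScatter computes, and that the sketch is in Python rather than R. `bin_counts` and its parameters are hypothetical names.

```python
import random
from collections import Counter


def bin_counts(xs, ys, x_range, y_range, n_bins):
    """Count points per cell of an n_bins x n_bins grid over the given ranges.

    Points outside the half-open ranges [lo, hi) are ignored.
    Returns a Counter mapping (column, row) -> number of points.
    """
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    counts = Counter()
    for x, y in zip(xs, ys):
        if not (x_lo <= x < x_hi and y_lo <= y < y_hi):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        i = min(int((x - x_lo) / (x_hi - x_lo) * n_bins), n_bins - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * n_bins), n_bins - 1)
        counts[(i, j)] += 1
    return counts


# Standard-normal points, mirroring the rnorm example (smaller n for speed).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(10_000)]
ys = [rng.gauss(0, 1) for _ in range(10_000)]
grid = bin_counts(xs, ys, (-4, 4), (-4, 4), n_bins=40)
```

Plotting at most 40 x 40 cells costs the same whether the input has ten thousand points or ten million, which is why binned or smoothed displays scale where a raw scatter plot does not.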
For what it's worth, all the evidence I have is that the computer - even though it was a lump of big iron - ran out of memory.

R: update plot [xy]lims with new points() or lines() additions?

Background:
I'm running a Monte Carlo simulation to show that a particular process (a cumulative mean) does not converge over time, and often diverges wildly in simulation (the expectation of the random variable = infinity). I want to plot about 10 of these simulations on a line chart, where the x axis has the iteration number, and the y axis has the cumulative mean up to that point.
Here's my problem:
I'll run the first simulation (each sim. having 10,000 iterations), and build the main plot based on its current range. But often one of the simulations will have a range a few orders of magnitude larger than the first one, so the plot flies outside of the original range. So, is there any way to dynamically update the ylim or xlim of a plot upon adding a new set of points or lines?
I can think of two workarounds for this:
1. Store each simulation, then pick the one with the largest range, and build the base graph off of that (not elegant, and I'd have to store a lot of data in memory, but would probably be laptop-friendly). [[EDIT: as Marek points out, this is not a memory-intense example, but if you know of a nice solution that'd support far more iterations such that it becomes an issue (think high dimensional walks that require much, much larger MC samples for convergence) then jump right in]]
2. Find a seed that appears to build a nice looking version of it, and set the ylim manually, which would make the demonstration reproducible.
Naturally I'm holding out for something more elegant than my workarounds. Hoping this isn't too pedestrian a problem, since I imagine it's not uncommon with simulations in R. Any ideas?
I'm not sure if this is possible using base graphics; if someone has a solution, I'd love to see it. However, graphics systems based on grid (lattice and ggplot2) allow the graphics object to be saved and updated. It's insanely easy in ggplot2.
library(ggplot2)
make some data:
foo <- data.frame(data = rnorm(100), numb = seq_len(100))
make an initial ggplot object and plot it:
p <- ggplot(foo, aes(numb, data)) + geom_line()
p
make some more data and add it to the plot:
foo2 <- data.frame(data = rnorm(200), numb = seq_len(200))
p <- p + geom_line(aes(numb, data), data = foo2, colour = "red")
plot the new object; the axis limits are recomputed to cover both data sets:
p
I think (1) is the best option. I actually don't think it is inelegant. I think it would be more computationally intensive to redraw every time you hit a point outside xlim or ylim.
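If storing every run ever became a memory concern, options (1) and (2) from the question could be combined into a two-pass scheme: give each simulation a fixed seed, run everything once while keeping only the running minimum and maximum, then replay the same seeds to draw against limits that are known up front. The sketch below is in Python rather than R, and only shows the bookkeeping. `cumulative_means` and `shared_ylim` are hypothetical names, and the Pareto draw with shape 1 is just an illustrative choice of a variable with infinite expectation.

```python
import random


def cumulative_means(seed, n_iter):
    """Yield the running mean of one simulation, reproducible from its seed."""
    rng = random.Random(seed)
    total = 0.0
    for i in range(1, n_iter + 1):
        # Pareto with shape 1 has an infinite mean (illustrative assumption).
        total += rng.paretovariate(1.0)
        yield total / i


def shared_ylim(seeds, n_iter):
    """First pass: track the overall range without storing any trajectory."""
    lo, hi = float("inf"), float("-inf")
    for seed in seeds:
        for m in cumulative_means(seed, n_iter):
            lo = min(lo, m)
            hi = max(hi, m)
    return lo, hi


seeds = range(10)
ylim = shared_ylim(seeds, n_iter=10_000)

# Second pass: replay each seed and draw it against the fixed limits.
# Only one trajectory is in memory at a time.
for seed in seeds:
    trajectory = list(cumulative_means(seed, n_iter=10_000))
    # ... draw `trajectory` with the y axis fixed to `ylim` ...
```

The cost is running every simulation twice, in exchange for never redrawing and never holding more than one run in memory.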
Also, I saw in Peter Hoff's book about Bayesian statistics a cool use of ts() instead of lines() for cumulative sums/means. It looks pretty spiffy:
