Fine-grained control over plotting order in Gadfly? - julia

I'm creating a scatterplot that looks something like this:
using DataFrames
using Gadfly
using ColorBrewer
using Distributions
colors = palette("Set1", 4)
df1 = DataFrame(rand(Normal(0, 0.5), 1000,2))
df1[:x3] = :a
df2 = DataFrame(rand(Normal(-0.25, 0.25), 500,2))
df2[:x3] = :b
df3 = DataFrame(rand(Normal(0.25, 0.25), 500,2))
df3[:x3] = :c
df4 = DataFrame(rand(Normal(0, 0.25), 500,2))
df4[:x3] = :d
df = vcat(df1, df2, df3, df4)
plot(df, x=:x1, y=:x2, color=:x3, Geom.point, Scale.color_discrete_manual(colors..., levels=[:b, :c, :d, :a]),
Theme(highlight_width=0pt))
I want the points plotted from front to back in the order [:d, :b, :c, :a], so that the larger number of points in :a ends up at the back. So why do I have to specify levels=[:b, :c, :d, :a] to get my desired result? What's the discrepancy here?
Also, interestingly, the order seems to depend on which colors are used: trying different colors from ColorBrewer leads to different orderings, which is probably a bug. Relevant issue: https://github.com/dcjones/Gadfly.jl/issues/858

FWIW, I have given up trying to fully control what layers overwrite what layers when using Gadfly.
Maybe it is even quite difficult to do, because sometimes my exact same code, run twice, produces two slightly different figures, in which the order in which a layer overwrites another layer has changed, apparently randomly. This happens at least when I send the output figure to a postscript file via Gadfly.draw(PS(file, size...), p), which is what I usually do.
Of course this may improve in the future. I presently use Gadfly 0.5.2 in Julia 0.5.0 under Windows 10, 64 bits.


Have two (or more) node label sets in Julia GraphPlot maybe using Compose?

Here is a minimal working code from Julia Discourse:
using LightGraphs
using GraphPlot
using Colors
g = graphfamous("karate")
membership = [1,1,1,1,1,1,1,1,2,1,1,1,1,1,2,2,1,1,2,1,2,1,2,1,1,1,1,2,1,1,2,1,1,1]
nodelabels = 1:34
nodecolor = [colorant"lightgrey", colorant"orange"]
nodefillc = nodecolor[membership]
colors = [colorant"lightgray" for i in 1:78]
colors[42] = colorant"orange"
gplot(g, nodefillc=nodefillc, layout=circular_layout, edgestrokec=colors, nodelabel=nodelabels)
Which produces:
I succeeded in getting node labels from 1 to 34. However, I need to display a second type of label for some specific nodes, e.g. their weights. For instance, I need to show that the weight of node 19 is 100 and the weight of node 1 is 0.001.
Is there a way to display such data? I could not find a relevant keyword in GraphPlot (nodelabel only accepts a Vector), and I could not find another Julia package for plotting graphs that could do it.
EDIT: Thanks to @Dan Getz. Before posting on SE, I had the same idea he suggested: label the nodes with a string of the format "$i\n $weight".
However, the result is highly unsatisfying, as you can see in this picture of one of my actual graphs. Node 12 in orange, separated from its weight 177.0 by \n, is not really nice to read!
EDIT: Thanks to @Przemyslaw Szufel. Maybe my question could be resolved with Compose (which I already use), the graphics backend for GraphPlot. Unfortunately it is somewhat underdocumented, even though I and other people have asked about it!
You could use GraphMakie.jl, which is also compatible with (Light)Graphs.jl and possibly a bit more flexible than GraphPlot.jl.
using Graphs, GraphMakie, Colors
g = smallgraph(:karate)
membership = [1,1,1,1,1,1,1,1,2,1,1,1,1,1,2,2,1,1,2,1,2,1,2,1,1,1,1,2,1,1,2,1,1,1]
nodelabels = repr.(collect(1:34))
nodecolor = [colorant"lightgrey", colorant"orange"]
nodefillc = nodecolor[membership]
colors = [colorant"lightgray" for i in 1:78]
colors[42] = colorant"orange"
fig = Figure(resolution=(500,500))
ax = Axis(fig[1,1])
pos = Shell()(g) # = circular layout
graphplot!(ax, g,
layout=_->pos,
edge_color=colors,
node_color=nodefillc,
node_size=30,
nlabels=nodelabels,
nlabels_align=(:center, :center)
)
hidedecorations!(ax)
hidespines!(ax)
# add additional annotation to node 17
weightOffset = Point2(0, 0.045)
text!(ax, "0.001", position=pos[17] - weightOffset, space=:data, align=(:center, :top), fontsize=10)
display(fig)

Vega-Lite - One plot for multiple datasets

I'm working on a package for Julia with the goal of doing quick plots using Vega-Lite as backend.
As people familiar with Matplotlib know, it is very common to have several sets of vectors and plot all of them in the same figure, each with its own label. For example:
import numpy as np
import matplotlib.pyplot as plt
x = range(0,10)
y = np.random.rand(10)
w = range(0,5)
z = np.random.rand(5)
plt.plot(x,y,label = 'y')
plt.plot(w,z,label = 'z')
plt.legend()
What I'd like to know is how can I do something similar, but using Vega-Lite (or Altair).
I know that I can do two separate plots and then layer one over the other. My problem is mainly how to get the legends to work, since to get a legend, one usually needs another channel such as "color", pointing to another field in the dataframe.
I've seen similar posts, but they deal with plotting data from different columns. The answer in that case is basically to use the Fold Transform. But for my question this doesn't quite work, because I'm more interested in starting from two different plots, possibly using two different datasets, so "merging" the datasets is not a good solution.
You can take advantage of the fact that in composite charts, Vega-Lite uses shared scales by default. If you assign the color, shape, strokeDash, etc. to a unique value for each layer, an appropriate legend will be generated automatically.
Here is an example, using Altair to generate the Vega-Lite specification:
import pandas as pd
import numpy as np
import altair as alt
x = np.linspace(0, 10)
df1 = pd.DataFrame({
'x': x,
'y': np.sin(x)
})
df2 = pd.DataFrame({
'x': x,
'y': np.cos(x)
})
chart1 = alt.Chart(df1).transform_calculate(
label='"sine"'
).mark_line().encode(
x='x',
y='y',
color='label:N'
)
chart2 = alt.Chart(df2).transform_calculate(
label='"cosine"'
).mark_line().encode(
x='x',
y='y',
color='label:N'
)
alt.layer(chart1, chart2)
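Since the question is about emitting Vega-Lite directly, the same idea can also be written as a raw specification. The following is a minimal sketch, not output copied from Altair: it assumes inline data, and `labelled_layer` is a hypothetical helper name. Each layer gets a `calculate` transform that adds a constant `label` field, and every layer encodes `color` from that field, so the shared color scale should produce a single legend.

```python
import json
import math


def labelled_layer(xs, ys, label):
    """Build one Vega-Lite layer whose rows all carry a constant label.

    Hypothetical helper for illustration; not part of Altair or Vega-Lite.
    """
    return {
        "data": {"values": [{"x": x, "y": y} for x, y in zip(xs, ys)]},
        # Same trick as transform_calculate above: a constant string expression.
        "transform": [{"calculate": json.dumps(label), "as": "label"}],
        "mark": "line",
        "encoding": {
            "x": {"field": "x", "type": "quantitative"},
            "y": {"field": "y", "type": "quantitative"},
            "color": {"field": "label", "type": "nominal"},
        },
    }


xs = [i * 0.2 for i in range(51)]
spec = {
    "layer": [
        labelled_layer(xs, [math.sin(x) for x in xs], "sine"),
        labelled_layer(xs, [math.cos(x) for x in xs], "cosine"),
    ]
}

# Serialize; the resulting JSON can be handed to any Vega-Lite renderer.
spec_json = json.dumps(spec)
```

Because each layer carries its own `data` block, the two datasets never need to be merged.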

Multiple histograms in Julia using Plots.jl

I am working with a large number of observations and to really get to know it I want to do histograms using Plots.jl
My question is how I can do multiple histograms in one plot as this would be really handy. I have tried multiple things already, but I am a bit confused with the different plotting sources in julia (plots.jl, pyplot, gadfly,...).
I don't know if it would help for me to post some of my code, as this is a more general question. But I am happy to post it, if needed.
There is an example that does just this:
using Plots
pyplot()
n = 100
x1, x2 = rand(n), 3rand(n)
# see issue #186... this is the standard histogram call
# our goal is to use the same edges for both series
histogram(Any[x1, x2], line=(3,0.2,:green), fillcolor=[:red :black], fillalpha=0.2)
I looked for "histograms" in the Plots.jl repo, found this related issue and followed the links to the example.
With Plots, there are two possibilities to show multiple series in one plot:
First, you can use a matrix, where each column constitutes a separate series:
a, b, c = randn(100), randn(100), randn(100)
histogram([a b c])
Here, hcat is used to concatenate the vectors (note the spaces instead of commas).
This is equivalent to
histogram(randn(100,3))
You can apply options to the individual series using a row matrix:
histogram([a b c], label = ["a" "b" "c"])
(Again, note the spaces instead of commas)
Second, you can use plot! and its variants to update a previous plot:
histogram(a) # creates a new plot
histogram!(b) # updates the previous plot
histogram!(c) # updates the previous plot
Alternatively, you can specify which plot to update:
p = histogram(a) # creates a new plot p
histogram(b) # creates an independent new plot
histogram!(p, c) # updates plot p
This is useful if you have several subplots.
Edit:
Following Felipe Lema's links, you can implement a recipe for histograms that share the edges:
using StatsBase
using PlotRecipes
function calcbins(a, bins::Integer)
lo, hi = extrema(a)
StatsBase.histrange(lo, hi, bins) # nice edges
end
calcbins(a, bins::AbstractVector) = bins
@userplot GroupHist
@recipe function f(h::GroupHist; bins = 30)
args = h.args
length(args) == 1 || error("GroupHist should be given one argument")
bins = calcbins(args[1], bins)
seriestype := :bar
bins, mapslices(col -> fit(Histogram, col, bins).weights, args[1], 1)
end
grouphist(randn(100,3))
Edit 2:
Because it is faster, I changed the recipe to use StatsBase.fit for creating the histogram.
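The idea behind the recipe, computing one set of bin edges and then counting every series against those same edges, can be sketched independently of Plots. This is a rough illustration in plain Python, not a translation of the recipe: `shared_edges` and `counts` are hypothetical names, and the edges here are simple equal-width bins over the combined range rather than the "nice" edges that StatsBase.histrange picks.

```python
import random


def shared_edges(series, bins):
    """Equal-width bin edges spanning the extrema of all series together."""
    lo = min(min(s) for s in series)
    hi = max(max(s) for s in series)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def counts(values, edges):
    """Count values per bin; the last bin is closed on the right."""
    bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    out = [0] * bins
    for v in values:
        # Map the value to a bin index, clamping the maximum into the last bin.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        out[idx] += 1
    return out


random.seed(0)
a = [random.gauss(0, 1) for _ in range(100)]
b = [random.gauss(2, 1) for _ in range(100)]

edges = shared_edges([a, b], 10)
hist_a = counts(a, edges)
hist_b = counts(b, edges)
```

Since both count vectors refer to the same edges, they can be drawn as grouped bars bin by bin, which is what the `seriestype := :bar` line in the recipe relies on.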

Color option in xtsExtra

I am having trouble adjusting the colors of a multiple time series plot using xtsExtra.
This is the code of a minimal example:
require("xtsExtra")
n <- 50
data <- replicate(2, rnorm(n))
my.ts <- as.xts(ts(data, start=Sys.Date()-n, end=Sys.Date()))
plot.zoo(my.ts, col = c('blue', 'green'))
plot.xts(my.ts, col = c('blue', 'green'))
The plot.zoo command yields [image], whereas the plot command from the xtsExtra package results in [image].
In the second plot, the two time series are nicely overlaid, but seem insensitive to the col option.
I'm using the latest version 0.0-1 of the xtsExtra package (rev. 862).
It is my understanding that the xts and xtsExtra packages are designed as extensions of zoo and should work with the same arguments (plus many additional ones). Even though I can get the same overlay behavior in plot.zoo using the screens option, I cannot really resort to using it because the call to plot.xts that causes my problems is within the quantstrat package (functions chart.forward.training and chart.forward.testing for example) which I'd loathe to modify. (Incidentally, the dev.new() in these functions is causing me trouble as well.)
Question: Why does plot from the xtsExtra package seem not to respond to the col= option, and what can be done about it if modifying the call to the function is not a real option?
Q1. If you read the help text for plot.xts, you will see that the function does not have a col argument. Together with the fact that partial matching of argument names doesn't seem to be allowed in the function, this explains why plot.xts does not respond to col =.
Compare with a case where partial matching works:
plot(x = 1:2, y = 1:2, type = "b"); plot(x = 1:2, y = 1:2, ty = "b") # "ty" matches "type"
See here: "If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched".
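The quoted rule can be modelled in a few lines. This is only a simplified sketch of prefix matching, written in Python for illustration: it ignores positional matching and any special handling R applies around `...`, and the formal-argument lists below are made up for the example rather than taken from the real signatures of plot or plot.xts.

```python
def match_argument(supplied, formals):
    """Return the formal argument selected by `supplied`, or None.

    An exact match wins; otherwise `supplied` must be a prefix of exactly
    one formal name. Simplified model, not R's actual implementation.
    """
    if supplied in formals:
        return supplied
    candidates = [f for f in formals if f.startswith(supplied)]
    if len(candidates) > 1:
        raise ValueError(f"{supplied!r} matches several formals: {candidates}")
    return candidates[0] if candidates else None


# Hypothetical formal lists, for illustration only.
matched_type = match_argument("ty", ["x", "y", "type"])
matched_col = match_argument("col", ["x", "colorset"])
unmatched = match_argument("col", ["x", "y"])
```

Under this model "ty" selects "type", and "col" would select "colorset" if partial matching applied to it, which is why the absence of partial matching matters for the behavior described in Q1.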
Q2. Instead you may use the colorset argument:
"color palette to use, set by default to rational choices" (colorset = 1:12).
plot.xts(my.ts, colorset = c('blue', 'green'))

What does "negative length vectors" in a wireframe plot (lattice package) mean?

I want to plot a wireframe in R using the lattice package. However, I get the following error message "error using packet 1 negative length vectors are not allowed". The data looks like the following:
> result_mean
experiment alpha beta packet
1 0 1.0 1 3.000000
2 0 1.1 1 2.571429
The command to create the plot is the following:
png(file=paste("foobar.png"),width=1280, height=1280);
plot <- wireframe(result_mean$packet ~ result_mean$alpha * result_mean$beta,
data=result_mean, scales = list(arrows=FALSE, cex= .45, col = "black", font = 3),
drape = TRUE, colorkey = TRUE, main = "Foo",
col.regions = terrain.colors(100),
screen = list(z = -60, x = -60),
xlab="alpha", ylab="beta", zlab="mean \npackets");
print(plot);
dev.off();
I'm wondering what this error message means and if there is a good way to debug this?
Thanks in advance!
Debugging lattice graphics is a bit difficult because (a) the code is complex and multi-layered and (b) the errors get trapped in a way that makes them hard to intercept. However, you can at least get some way in diagnosing the problem.
First create a minimal example. I suspected that your problem was that your data fall on a single line, so I created data that looked like that:
d <- data.frame(x=c(1,1.1),
y=c(1,1),
z=c(2,3))
library(lattice)
wireframe(z~y*x,data=d)
Now confirm that fully three-dimensional data (data that define a plane) work just fine:
d2 <- data.frame(expand.grid(x=c(1,1.1),
y=c(1,1.1)),
z=1:4)
wireframe(z~y*x,data=d2)
So the question is really -- did you intend to draw a wireframe of two points lying on a line? If so, what did you want to have appear in the plot? You could hack things a little bit to set the y values to differ by a tiny bit -- I tried it, though, and got no wireframe appearing (but no error either).
edit: I did a bit more tracing, with various debug() incantations (and searching the source code of the lattice package and R itself for "negative length") to deduce the following: within a function called lattice:::panel.3dwire, there is a call to a C function wireframePanelCalculations, which you can see at https://r-forge.r-project.org/scm/viewvc.php/pkg/src/threeDplot.c?view=markup&root=lattice
Within this function:
nh = (nx-1) * (ny-1) * ng; /* number of quadrilaterals */
sHeights = PROTECT(allocVector(REALSXP, nh));
In this case nx is zero, so this code is asking R to allocate a negative-length vector, which is where the error comes from.
In this case, though, I think the diagnosis is more useful than the explicit debugging.
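The arithmetic behind the failing allocation can be checked in isolation. A minimal sketch of the size expression from the C snippet, written in Python; `quad_count` is a hypothetical name, and ny = 2 and ng = 1 are assumed values for illustration, since the answer only states that nx is zero.

```python
def quad_count(nx, ny, ng=1):
    """Number of quadrilaterals, mirroring (nx-1) * (ny-1) * ng from the C code."""
    return (nx - 1) * (ny - 1) * ng


full_grid = quad_count(2, 2)   # a 2 x 2 grid has exactly one quadrilateral
degenerate = quad_count(0, 2)  # nx = 0 (ny = 2 assumed) gives a negative size

print(full_grid)   # 1
print(degenerate)  # -1
```

A negative result passed as a length is what triggers "negative length vectors are not allowed".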
