bokeh axis limits fail when mixing x_range with y_range across multiple plots

I'm trying to visualize a high-dimensional point set x (here of shape (6 x 42)) in a series of 2D scatter plots (x[1] vs x[2], etc.) using bokeh. See this nice example from scikit-opt as a reference. When x[1] occurs in two plots, both should use the same range, and the plots should rescale simultaneously. I have accomplished the linking, but I can't get the initial scaling right. Here's a minimal example:
import bokeh
import bokeh.io
import bokeh.layouts  # gridplot lives here; `import bokeh` alone does not load it
import bokeh.plotting
import numpy as np

bokeh.io.output_notebook()

# That's my fictional dataset
x = np.random.randn(6, 42)
x[2] *= 10

# Build the pairwise scatter plots
kw = dict(width=165, height=165)
# `ranges` stores the range in each dimension,
# used as both x- and y-range depending on
# where the variable is.
figs, ranges = {}, {}
for r, row in enumerate(x):
    for c, col in enumerate(x):
        if r != c:  # compare values, not object identity
            fig = bokeh.plotting.figure(
                x_range=ranges.get(c, None), y_range=ranges.get(r, None),
                **kw)
            fig.scatter(x=col, y=row)
            fig.xaxis.axis_label = f'Dim {c}'
            fig.yaxis.axis_label = f'Dim {r}'
            if c not in ranges:
                ranges[c] = fig.x_range
            if r not in ranges:
                ranges[r] = fig.y_range
            figs[f'{r}_{c}'] = fig
        else:
            break

# Set up the plotting layout (each figure already carries its own size)
plots = [[]]
for r, row in enumerate(x):
    for c, col in enumerate(x):
        if r != c:
            plots[-1].append(figs[f'{r}_{c}'])
        else:
            plots.append([])
            break
staircase = bokeh.layouts.gridplot(plots)
bokeh.plotting.show(staircase)
When I run this in an IPython notebook (Python >= 3.6), bokeh sets the scales for dims 1 and 2 correctly, but then it applies the scale of dim 2 to all the following dimensions. Notice that I scaled dim 2 tenfold to make this point.
Interactively, I can rescale the plots back to sensible settings, but I'd like that to happen by default. What options do I have inside bokeh to rescale? I played a bit with fig.xaxis.bounds, but without success. Thanks for your help!
Epilogue:
Following @bigreddot's answer, I added these lines to fix the starting ranges:
for i, X in enumerate(x):
    ranges[i].start = X.min()
    ranges[i].end = X.max()
I still think that the behaviour is a bug.

From your code and description, I still can't quite tell what you are hoping to accomplish. [1] But I will say that the default DataRange1d ranges that plots use automatically make space for all renderers, across all plots they are shared by. In that sense, I see exactly what I would expect when I run your code. If you want something different, there are two things you can control:
DataRange1d has a .renderers property. If you only want the "auto" ranging to cover a subset of the renderers, you can explicitly set this property to the list you want. Renderers are returned by the glyph functions, e.g. fig.scatter.
Don't use the "auto" ranges. You can also set x_range and y_range yourself to Range1d objects. These have start and end properties that you can set, and these will be the definite bounds of the range, e.g. x_range=Range1d(0, 10).
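A minimal sketch of the second option applied to the question's grid, assuming NumPy and a recent bokeh (it passes width/height rather than the older plot_width/plot_height). Each dimension gets one Range1d computed from that row's minimum and maximum, and the same object is reused wherever the dimension appears. Panning and zooming therefore stay linked, while the initial view fits each dimension on its own. The helper name padded_ranges and the 5% padding are illustrative choices, not part of bokeh's API.

```python
import numpy as np
from bokeh.layouts import gridplot
from bokeh.models import Range1d
from bokeh.plotting import figure


def padded_ranges(data, pad=0.05):
    """Hypothetical helper: one Range1d per row of `data`, spanning the
    row's min/max plus a fractional `pad` (an assumed choice) on each side."""
    ranges = {}
    for i, row in enumerate(data):
        lo, hi = float(row.min()), float(row.max())
        margin = (hi - lo) * pad or 1.0  # avoid a zero-width range for a constant row
        ranges[i] = Range1d(lo - margin, hi + margin)
    return ranges


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 42))
x[2] *= 10  # same exaggerated dimension as in the question

ranges = padded_ranges(x)
rows = []
for r in range(1, len(x)):
    row = []
    for c in range(r):  # lower triangle: plot dim c (x-axis) against dim r (y-axis)
        # Reusing the same Range1d object is what links the plots together
        fig = figure(width=165, height=165, x_range=ranges[c], y_range=ranges[r])
        fig.scatter(x=x[c], y=x[r])
        fig.xaxis.axis_label = f'Dim {c}'
        fig.yaxis.axis_label = f'Dim {r}'
        row.append(fig)
    rows.append(row)

staircase = gridplot(rows)
# Display with bokeh.plotting.show(staircase) in a notebook or script.
```

Because these are fixed Range1d objects rather than DataRange1d, bokeh no longer recomputes them from every renderer that shares the range, which is why dim 2's spread cannot leak into the other dimensions.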
[1] The ranges are linked in what I would consider an odd way, and I can't tell if that is intended. But that is a result of your looping/python code and not Bokeh.

Related

Plot not showing in Julia

I have a file named mycode.jl with the following code, taken from here.
using MultivariateStats, RDatasets, Plots
# load iris dataset
println("loading iris dataset:")
iris = dataset("datasets", "iris")
println(iris)
println("loaded; splitting dataset: ")
# split half to training set
Xtr = Matrix(iris[1:2:end,1:4])'
Xtr_labels = Vector(iris[1:2:end,5])
# split other half to testing set
Xte = Matrix(iris[2:2:end,1:4])'
Xte_labels = Vector(iris[2:2:end,5])
print("split; Performing PCA: ")
# Suppose Xtr and Xte are training and testing data matrix, with each observation in a column. We train a PCA model, allowing up to 3 dimensions:
M = fit(PCA, Xtr; maxoutdim=3)
println(M)
# Then, apply PCA model to the testing set
Yte = predict(M, Xte)
println(Yte)
# And, reconstruct testing observations (approximately) to the original space
Xr = reconstruct(M, Yte)
println(Xr)
# Now, we group results by testing set labels for color coding and visualize first 3 principal components in 3D plot
println("Plotting fn:")
setosa = Yte[:,Xte_labels.=="setosa"]
versicolor = Yte[:,Xte_labels.=="versicolor"]
virginica = Yte[:,Xte_labels.=="virginica"]
p = scatter(setosa[1,:],setosa[2,:],setosa[3,:],marker=:circle,linewidth=0)
scatter!(versicolor[1,:],versicolor[2,:],versicolor[3,:],marker=:circle,linewidth=0)
scatter!(virginica[1,:],virginica[2,:],virginica[3,:],marker=:circle,linewidth=0)
plot!(p,xlabel="PC1",ylabel="PC2",zlabel="PC3")
println("Reached end of program.")
I run the above code from a Linux terminal with the command: julia mycode.jl
The code runs all right and reaches the end, but the plot does not appear.
Where is the problem, and how can it be solved?
As the Output section of the Plots docs says:
A Plot is only displayed when returned (a semicolon will suppress the return), or if explicitly displayed with display(plt), gui(), or by adding show = true to your plot command.
You can have MATLAB-like interactive behavior by setting the default value: default(show = true)
The first part about "when returned" is about when you call plot from the REPL (or Jupyter, etc.), and doesn't apply here.
Here, you can use one of the other options:
calling display(p) after the last plot! call (this is the most common way to do it)
calling gui() after the last plot!
adding a show = true argument to the last plot! call
setting the default to always show the plot by setting Plots.default(show = true) at the beginning of the script
Any one of these is sufficient to make the plot window appear.
The plot closes when the Julia process ends; if that happens too soon, you can either:
Run your code as julia -i mycode.jl at the terminal: this runs your code, displays the plot, and then drops you into the Julia REPL. This both keeps the plot open and lets you keep working with the variables in your code if you need to.
Add a readline() call at the end of your program. This keeps Julia waiting for a press of the Enter/Return key, and the plot stays on screen until you press it.
(Credit to ffevotte on Julia Discourse for these suggestions.)

Vega-Lite - One plot for multiple datasets

I'm working on a package for Julia with the goal of doing quick plots using Vega-Lite as backend.
As people familiar with Matplotlib know, it is very common to have several sets of vectors and plot all of them in the same figure, each with its own label. For example:
import numpy as np
import matplotlib.pyplot as plt

x = range(0,10)
y = np.random.rand(10)
w = range(0,5)
z = np.random.rand(5)
plt.plot(x,y,label = 'y')
plt.plot(w,z,label = 'z')
plt.legend()
What I'd like to know is how can I do something similar, but using Vega-Lite (or Altair).
I know that I can make two separate plots and then layer one over the other. My problem is mainly how to get the legends to work, since a legend usually requires an encoding such as "color" that points to another field in the dataframe.
I've seen similar posts, but they deal with plotting data from different columns of one dataset, where the answer is basically to use the Fold Transform. That doesn't quite work in my case, because I'm more interested in starting from two different plots, possibly using two different datasets, so "merging" the datasets is not a good solution.
You can take advantage of the fact that in composite charts, Vega-Lite uses shared scales by default. If you assign the color, shape, strokeDash, etc. to a unique value for each layer, an appropriate legend will be generated automatically.
Here is an example, using Altair to generate the Vega-Lite specification:
import pandas as pd
import numpy as np
import altair as alt
x = np.linspace(0, 10)
df1 = pd.DataFrame({
'x': x,
'y': np.sin(x)
})
df2 = pd.DataFrame({
'x': x,
'y': np.cos(x)
})
chart1 = alt.Chart(df1).transform_calculate(
label='"sine"'
).mark_line().encode(
x='x',
y='y',
color='label:N'
)
chart2 = alt.Chart(df2).transform_calculate(
label='"cosine"'
).mark_line().encode(
x='x',
y='y',
color='label:N'
)
alt.layer(chart1, chart2)
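Altair only emits Vega-Lite JSON, so the same idea can be written as a raw specification, which may be easier to generate from another language such as Julia. The sketch below builds that specification as a Python dict; the helper line_layer is hypothetical, and the schema URL assumes Vega-Lite v5. Each layer carries its own inline data (here the two layers even have different lengths), adds a constant label field with a calculate transform, and encodes color on that field, so the shared color scale yields a single legend.

```python
import json
import math


def line_layer(xs, ys, label):
    """Hypothetical helper: one Vega-Lite line layer with its own inline data."""
    return {
        "data": {"values": [{"x": x, "y": y} for x, y in zip(xs, ys)]},
        # The inner single quotes make this a string literal in Vega's
        # expression language (assumes `label` contains no single quote).
        "transform": [{"calculate": f"'{label}'", "as": "label"}],
        "mark": "line",
        "encoding": {
            "x": {"field": "x", "type": "quantitative"},
            "y": {"field": "y", "type": "quantitative"},
            # Same field name in every layer -> shared color scale -> one legend
            "color": {"field": "label", "type": "nominal"},
        },
    }


xs = [i * 10 / 49 for i in range(50)]
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "layer": [
        line_layer(xs, [math.sin(v) for v in xs], "sine"),
        # A second, shorter dataset: no merging of the two datasets is needed
        line_layer(xs[:25], [math.cos(v) for v in xs[:25]], "cosine"),
    ],
}
spec_json = json.dumps(spec, indent=2)  # paste into the Vega editor or any Vega-Lite renderer
```

The only coupling between the layers is the shared field name in the color encoding; each layer's data stays independent.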

Custom legend labels - geopandas.plot()

A colleague and I have been trying to set custom legend labels, but so far have failed. Code and details below - any ideas much appreciated!
Notebook: toy example uploaded here
Goal: change default rate values used in the legend to corresponding percentage values
Problem: cannot figure out how to access the legend object or pass legend_kwds to geopandas.GeoDataFrame.plot()
Data: KCMO metro area counties
Excerpts from toy example
Step 1: read data
# imports
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
# read data
gdf = gpd.read_file('kcmo_counties.geojson')
Option 1 - get legend from ax as suggested here:
ax = gdf.plot('val', legend=True)
leg = ax.get_legend()
print('legend object type: ' + str(type(leg))) # <class NoneType>
plt.show()
Option 2: pass a legend_kwds dictionary. I assume I'm doing something wrong here (and clearly don't fully understand the underlying details), but the __doc__ from GeoPandas's plotting.py, for which GeoDataFrame.plot() is simply a wrapper, does not appear to come through...
# create number of tick marks in legend and set location to display them
import numpy as np
numpoints = 5
leg_ticks = np.linspace(-1,1,numpoints)
# create labels based on number of tickmarks
leg_min = gdf['val'].min()
leg_max = gdf['val'].max()
leg_tick_labels = [str(round(x*100,1))+'%' for x in np.linspace(leg_min,leg_max,numpoints)]
leg_kwds_dict = {'numpoints': numpoints, 'labels': leg_tick_labels}
# error "Unknown property legend_kwds" when attempting it:
f, ax = plt.subplots(1, figsize=(6,6))
gdf.plot('val', legend=True, ax=ax, legend_kwds=leg_kwds_dict)
UPDATE
I just came across this conversation on adding legend_kwds, and this other bug(?), which clearly states that legend_kwds was not in the most recent release of GeoPandas (v0.3.0). Presumably, that means we'll need to install from the GitHub master source rather than with pip/conda...
I've just come across this issue myself. After following your link to the GeoPandas source code, it appears that the colourbar is added to the figure as a second axis, so you have to do something like this to access the colourbar labels (assuming you have plotted a choropleth with legend=True):
# Get colourbar from second axis
colourbar = ax.get_figure().get_axes()[1]
Having done this, you can manipulate the labels like this:
# Get numerical values of yticks, assuming a linear range between vmin and vmax
# (vmin and vmax are the colour-scale limits, e.g. gdf['val'].min() and gdf['val'].max()):
yticks = np.interp(colourbar.get_yticks(), [0,1], [vmin, vmax])
# Apply some function f to each tick, where f can be your percentage conversion
colourbar.set_yticklabels(['{0:.2f}%'.format(ytick*100) for ytick in yticks])
With a GeoPandas release that supports it, this can be done by passing key-value pairs in the legend_kwds dictionary argument:
gdf.plot(column='col1', cmap='Blues', alpha=0.5, legend=True, legend_kwds={'label': 'FOO', 'shrink': 0.5}, ax=ax)
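Keys such as 'label' and 'shrink' above are matplotlib colorbar options, which suggests GeoPandas forwards legend_kwds to matplotlib's Figure.colorbar. Assuming that holds for your version (worth checking), a 'format' entry holding matplotlib's PercentFormatter should turn the fractional rates into percentage tick labels directly, e.g. gdf.plot('val', legend=True, legend_kwds={'format': PercentFormatter(xmax=1)}). The sketch below shows only the matplotlib side, with a plain scatter plot and made-up rate values standing in for gdf['val'], so it runs without the question's data file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

# Made-up fractional rates standing in for gdf['val']
vals = np.array([-0.04, 0.01, 0.03, 0.07, 0.12])

fig, ax = plt.subplots(figsize=(6, 6))
sc = ax.scatter(np.arange(len(vals)), vals, c=vals, cmap="Blues")

# xmax=1 means a value of 1.0 is shown as 100%; decimals=1 keeps one decimal place
cbar = fig.colorbar(sc, ax=ax, format=PercentFormatter(xmax=1, decimals=1),
                    label="rate", shrink=0.5)

fig.canvas.draw()  # tick labels are only populated once the figure is drawn
labels = [t.get_text() for t in cbar.ax.get_yticklabels()]
```

Compared with rewriting the tick labels afterwards, a formatter keeps the labels correct when the ticks change, and it does not depend on how the colourbar maps its axis coordinates to data values.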

Multiple histograms in Julia using Plots.jl

I am working with a large number of observations, and to really get to know the data I want to make histograms using Plots.jl.
My question is how I can put multiple histograms in one plot, as this would be really handy. I have already tried multiple things, but I am a bit confused by the different plotting packages in Julia (Plots.jl, PyPlot, Gadfly, ...).
I don't know if it would help for me to post some of my code, as this is a more general question. But I am happy to post it, if needed.
There is an example that does just this:
using Plots
pyplot()
n = 100
x1, x2 = rand(n), 3rand(n)
# see issue #186... this is the standard histogram call
# our goal is to use the same edges for both series
histogram(Any[x1, x2], line=(3,0.2,:green), fillcolor=[:red :black], fillalpha=0.2)
I looked for "histograms" in the Plots.jl repo, found this related issue and followed the links to the example.
With Plots, there are two possibilities to show multiple series in one plot:
First, you can use a matrix, where each column constitutes a separate series:
a, b, c = randn(100), randn(100), randn(100)
histogram([a b c])
Here, hcat is used to concatenate the vectors (note the spaces instead of commas).
This is equivalent to
histogram(randn(100,3))
You can apply options to the individual series using a row matrix:
histogram([a b c], label = ["a" "b" "c"])
(Again, note the spaces instead of commas)
Second, you can use plot! and its variants to update a previous plot:
histogram(a) # creates a new plot
histogram!(b) # updates the previous plot
histogram!(c) # updates the previous plot
Alternatively, you can specify which plot to update:
p = histogram(a) # creates a new plot p
histogram(b) # creates an independent new plot
histogram!(p, c) # updates plot p
This is useful if you have several subplots.
Edit:
Following Felipe Lema's links, you can implement a recipe for histograms that share the edges:
using StatsBase
using Plots  # re-exports RecipesBase, which provides @userplot and @recipe

function calcbins(a, bins::Integer)
    lo, hi = extrema(a)
    StatsBase.histrange(lo, hi, bins) # nice edges
end
calcbins(a, bins::AbstractVector) = bins

@userplot GroupHist

@recipe function f(h::GroupHist; bins = 30)
    args = h.args
    length(args) == 1 || error("GroupHist should be given one argument")
    bins = calcbins(args[1], bins)
    seriestype := :bar
    # one column of counts per series, all computed against the same edges
    bins, mapslices(col -> fit(Histogram, col, bins).weights, args[1], dims = 1)
end

grouphist(randn(100,3))
Edit 2:
Because it is faster, I changed the recipe to use StatsBase.fit for creating the histogram.
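The recipe's key step, computing one set of edges from all series together and then counting every series against those same edges, does not depend on Julia. For comparison, here is a NumPy sketch of just that step. The function name shared_histograms is hypothetical, and np.histogram_bin_edges spaces the edges evenly between the pooled minimum and maximum, rather than picking "nice" edges the way StatsBase.histrange does.

```python
import numpy as np


def shared_histograms(data, bins=30):
    """Hypothetical helper: histogram each column of `data` against one
    set of edges computed from all columns together."""
    # Edges span the pooled min..max of every column: bins + 1 edges in total
    edges = np.histogram_bin_edges(data, bins=bins)
    # One column of counts per series, all sharing the same edges
    counts = np.column_stack([np.histogram(col, bins=edges)[0] for col in data.T])
    return edges, counts


rng = np.random.default_rng(0)
data = rng.standard_normal((100, 3))
edges, counts = shared_histograms(data)
```

Since every series uses the same edges, the count columns line up bin for bin, which is what the recipe returns to the :bar series type as (edges, counts).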

How to draw lines on a plot in R?

I need to draw lines from the data stored in a text file.
So far I am only able to draw points on a graph, and I would like to have them as lines (a line graph).
Here's the code:
pupil_data <- read.table("C:/a1t_left_test.dat", header=T, sep="\t")
max_y <- max(pupil_data$PupilLeft)
plot(NA,NA,xlim=c(0,length(pupil_data$PupilLeft)), ylim=c(2,max_y));
for (i in 1:(length(pupil_data$PupilLeft) - 1))
{
  points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red", cex = 0.5, lwd = 2.0)
}
Please help me change this line of code:
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red")
to draw lines from the data.
Here is the data in the file:
PupilLeft
3.553479
3.539469
3.527239
3.613131
3.649437
3.632779
3.614373
3.605981
3.595985
3.630766
3.590724
3.626535
3.62386
3.619688
3.595711
3.627841
3.623596
3.650569
3.64876
By default, R will plot a single vector as the y coordinates, and use a sequence for the x coordinates. So to make the plot you are after, all you need is:
plot(pupil_data$PupilLeft, type = "o")
You haven't provided any example data, but you can see this with the built-in iris data set:
plot(iris[,1], type = "o")
This does in fact plot the points connected by lines. If you are actually getting points without lines, you'll need to provide a working example with your data to figure out why.
EDIT:
Your original code doesn't work because of the loop. You are in effect asking R to plot a line connecting a single point to itself each time through the loop. The next time through the loop R doesn't know that there are other points that you want connected; if it did, this would break the intended use of points, which is to add points/lines to an existing plot.
Of course, the line connecting a point to itself doesn't really make sense, and so it isn't plotted (or is plotted too small to see, same result).
Your example is most easily done without a loop:
PupilLeft <- c(3.553479, 3.539469, 3.527239, 3.613131, 3.649437, 3.632779, 3.614373,
               3.605981, 3.595985, 3.630766, 3.590724, 3.626535, 3.62386, 3.619688,
               3.595711, 3.627841, 3.623596, 3.650569, 3.64876)
plot(PupilLeft, type = 'o')
If you really do need to use a loop, then the coding becomes more involved. One approach would be to use a closure:
makeaddpoint <- function(firstpoint){
  ## firstpoint is the y value of the first point in the series
  lastpt <- firstpoint
  lastptind <- 1
  addpoint <- function(nextpt, ...){
    pts <- rbind(c(lastptind, lastpt), c(lastptind + 1, nextpt))
    points(pts, ... )
    lastpt <<- nextpt
    lastptind <<- lastptind + 1
  }
  return(addpoint)
}
myaddpoint <- makeaddpoint(PupilLeft[1])
plot(NA,NA,xlim=c(0,length(PupilLeft)), ylim=c(2,max(PupilLeft)))
for (i in 2:(length(PupilLeft)))
{
  myaddpoint(PupilLeft[i], type = "o")
}
You can then wrap the myaddpoint call in the for loop with whatever testing you need to decide whether or not you will actually plot that point. The function returned by makeaddpoint will keep track of the plot indexing for you.
This is normal programming for Lisp-like languages. If you find it confusing you can do this without a closure, but you'll need to handle incrementing the index and storing the previous point value 'manually' in your loop.
There is a strong aversion among experienced R coders to using for-loops when they are not really needed. Here is a loop-free example using a vectorized function named segments, which takes 4 vectors as arguments: x0, y0, x1, y1.
npups <- length(pupil_data$PupilLeft)
# segments() draws onto an existing plot, e.g. the empty plot(NA, NA, ...) set up earlier
segments(1:(npups-1), pupil_data$PupilLeft[-npups],  # the starting points
         2:npups, pupil_data$PupilLeft[-1])          # the ending points
