Vega-Lite - One plot for multiple datasets - julia

I'm working on a package for Julia with the goal of doing quick plots using Vega-Lite as backend.
As people familiar with Matplotlib know, it is very common to have different sets for vectors, and plot all of them in the same figure, each with it's own label. For example:
x = range(0,10)
y = np.random.rand(10)
w = range(0,5)
z = np.random.rand(5)
plt.plot(x,y,label = 'y')
plt.plot(w,z,label = 'z')
plt.legend()
What I'd like to know is how can I do something similar, but using Vega-Lite (or Altair).
I know that I can do two separate plots and then add one over another. My problem is mainly about how to get the legends to work, since to get a legend, one usually needs another field
such as "color", pointing to another field in the dataframe.
I've seen similar posts, but dealing with the question of posting data from different columns. The answer to this case is basically to use the Fold Transform. But in my question this doesn't quite work, because I'm more interested in starting from two different plots, possibly using two different datasets, so "merging" the datasets is not a good solution.

You can take advantage of the fact that in composite charts, Vega-Lite uses shared scales by default. If you assign the color, shape, strokeDash, etc. to a unique value for each layer, an appropriate legend will be generated automatically.
Here is an example, using Altair to generate the Vega-Lite specification:
import pandas as pd
import numpy as np
import altair as alt
x = np.linspace(0, 10)
df1 = pd.DataFrame({
'x': x,
'y': np.sin(x)
})
df2 = pd.DataFrame({
'x': x,
'y': np.cos(x)
})
chart1 = alt.Chart(df1).transform_calculate(
label='"sine"'
).mark_line().encode(
x='x',
y='y',
color='label:N'
)
chart2 = alt.Chart(df2).transform_calculate(
label='"cosine"'
).mark_line().encode(
x='x',
y='y',
color='label:N'
)
alt.layer(chart1, chart2)

Related

Linking HoloViews plots with Bokeh customizations

I'm struggling with some of the finer points of complex HoloViews plots, especially linked plots customizing the appearance of fonts and data points.
Using the following code, I can create this plot that has most of the features I want, but am stumped by a few things:
I want one marginal for the whole set of plots linked to 'ewr' (with individual marginals for each of the other axes), ideally on the left of the set; but my attempts to get just one in my definitions of s1 and s2 haven't worked, and I can find nothing in the documentation about moving a marginal to the left (or bottom for that matter).
I want to be able to define tooltips that use columns from my data that are not displayed in the plots. I can see one way of accomplishing this as shown in the commented alternate definition for s1, but that unlinks the plot it creates from the others. How do I create linked plots that have tooltips with elements not in those plots?
For reference, the data used is available here (converted in the code below to a Pandas dataframe, df).
import holoviews as hv
from holoviews import dim, opts
hv.extension('bokeh')
renderer = hv.renderer('bokeh')
from bokeh.models import HoverTool
from holoviews.plotting.links import DataLink
TOOLS="crosshair,pan,wheel_zoom,zoom_in,zoom_out,box_zoom,undo,redo,reset,tap,save,box_select,poly_select,lasso_select".split(",")
ht = HoverTool(
tooltips=[('Name', '#{name}'), ('EWR', '#{ewr}{%0.2f}'), ('Win Rate', '#{winrate}{%d}')],
formatters={'ewr' : 'printf', 'winrate' : 'printf'})
point_opts = opts.Scatter(fill_color='black', fill_alpha=0.1, line_width=1, line_color='gray', size=5, tools=TOOLS+[ht])
hist_opts = opts.Histogram(fill_color='gray', fill_alpha=0.9, line_width=1, line_color='gray', tools=['box_select'], labelled=[None, None])
#s1 = hv.Scatter(df[['kfai','ewr','name','winrate']]).hist(num_bins=51, dimension='kfai')
s1 = hv.Scatter(df, 'kfai','ewr').hist(num_bins=51, dimension='kfai')
s2 = hv.Scatter(df, 'aerc', 'ewr').hist(num_bins=51, dimension=['aerc',None])
s3 = hv.Scatter(df, 'winrate', 'ewr').hist(num_bins=51, dimension=['winrate','ewr'])
p = (s1 + s2 + s3).opts(point_opts, hist_opts, opts.Layout(shared_axes=True, shared_datasource=True))
renderer.save(p, '_testHV')

Custom legend labels - geopandas.plot()

A colleague and I have been trying to set custom legend labels, but so far have failed. Code and details below - any ideas much appreciated!
Notebook: toy example uploaded here
Goal: change default rate values used in the legend to corresponding percentage values
Problem: cannot figure out how to access the legend object or pass legend_kwds to geopandas.GeoDataFrame.plot()
Data: KCMO metro area counties
Excerpts from toy example
Step 1: read data
# imports
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
# read data
gdf = gpd.read_file('kcmo_counties.geojson')
Option 1 - get legend from ax as suggested here:
ax = gdf.plot('val', legend=True)
leg = ax.get_legend()
print('legend object type: ' + str(type(leg))) # <class NoneType>
plt.show()
Option 2: pass legend_kwds dictionary - I assume I'm doing something wrong here (and clearly don't fully understand the underlying details), but the _doc_ from Geopandas's plotting.py - for which GeoDataFrame.plot() is simply a wrapper - does not appear to come through...
# create number of tick marks in legend and set location to display them
import numpy as np
numpoints = 5
leg_ticks = np.linspace(-1,1,numpoints)
# create labels based on number of tickmarks
leg_min = gdf['val'].min()
leg_max = gdf['val'].max()
leg_tick_labels = [str(round(x*100,1))+'%' for x in np.linspace(leg_min,leg_max,numpoints)]
leg_kwds_dict = {'numpoints': numpoints, 'labels': leg_tick_labels}
# error "Unknown property legend_kwds" when attempting it:
f, ax = plt.subplots(1, figsize=(6,6))
gdf.plot('val', legend=True, ax=ax, legend_kwds=leg_kwds_dict)
UPDATE
Just came across this conversation on adding in legend_kwds - and this other bug? which clearly states legend_kwds was not in most recent release of GeoPandas (v0.3.0). Presumably, that means we'll need to compile from the GitHub master source rather than installing with pip/conda...
I've just come across this issue myself. After following your link to the Geopandas source code, it appears that the colourbar is added as a second axis to the figure. so you have to do something like this to access the colourbar labels (assuming you have plotted a chloropleth with legend=True):
# Get colourbar from second axis
colourbar = ax.get_figure().get_axes()[1]
Having done this, you can manipulate the labels like this:
# Get numerical values of yticks, assuming a linear range between vmin and vmax:
yticks = np.interp(colourbar.get_yticks(), [0,1], [vmin, vmax])
# Apply some function f to each tick, where f can be your percentage conversion
colourbar.set_yticklabels(['{0:.2f}%'.format(ytick*100) for ytick in yticks])
This can be done by passing key-value pairs to dictionary argument legend_kwds:
gdf.plot(column='col1', cmap='Blues', alpha=0.5, legend=True, legend_kwds={'label': 'FOO', 'shrink': 0.5}, ax=ax)

how to plot more than two plots using for loop in python?

I'm trying to do 4 plots using for loop.But I'm not sure how to do it.how can I display the plots one by one orderly?or save the figure as png?
Here is my code:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from astropy.io import fits
import pyregion
import glob
# read in the image
xray_name = glob.glob("*.fits")
for filename in xray_name:
f_xray = fits.open(filename)
#name = file_name[:-len('.fits')]
try:
from astropy.wcs import WCS
from astropy.visualization.wcsaxes import WCSAxes
wcs = WCS(f_xray[0].header)
fig = plt.figure()
ax = plt.subplot(projection=wcs)
fig.add_axes(ax)
except ImportError:
ax = plt.subplot(111)
ax.imshow(f_xray[0].data, cmap="summer", vmin=0., vmax=0.00038, origin="lower")
reg_name=glob.glob("*.reg")
for i in reg_name:
r =pyregion.open(i).as_imagecoord(header=f_xray[0].header)
from pyregion.mpl_helper import properties_func_default
# Use custom function for patch attribute
def fixed_color(shape, saved_attrs):
attr_list, attr_dict = saved_attrs
attr_dict["color"] = "red"
kwargs = properties_func_default(shape, (attr_list, attr_dict))
return kwargs
# select region shape with tag=="Group 1"
r1 = pyregion.ShapeList([rr for rr in r if rr.attr[1].get("tag") == "Group 1"])
patch_list1, artist_list1 = r1.get_mpl_patches_texts(fixed_color)
r2 = pyregion.ShapeList([rr for rr in r if rr.attr[1].get("tag") != "Group 1"])
patch_list2, artist_list2 = r2.get_mpl_patches_texts()
for p in patch_list1 + patch_list2:
ax.add_patch(p)
#for t in artist_list1 + artist_list2:
# ax.add_artist(t)
plt.show()
the aim of the code is to plot a region on fits file image,if there is a way to change the color of the background image to white and the brighter (centeral region) as it is would be okay.Thanks
You are using colormap "summer" with provided limits. It is not clear to me what you want to achieve since the picture you posted looks more or less digital black and white pixelwise.
In matplotlib there are built in colormaps, and all of those have a reversed twin.
'summer' has a reversed twin with 'summer_r'
This can be picked up in the mpl docs at multiple spots, like colormap example, or SO answers like this.
Hope that is what you are looking for. For the future, when posting code like this, try to remove all non relevant portions as well as at minimum provide a description of the data format/type. Best is to also include a small sample of the data and it's structure. A piece of code only works together with a set of data, so only sharing one is only half the problem formulation.

bokeh axis limits fail when mixing x_range with y_range across multiple plots

I'm trying to visualize a high-dim point set x (here of dim (6 x 42)) in a series of 2D scatter plots (x[1] vs x[2] etc.) using bokeh. [edit2] See this nice example from scikit-opt as a reference. When x[1] occurs in two plots it should interact with the same range and the plots should rescale simultaneously. I have accomplished this, but I don't get it to scale correctly. Here's a minimal example: [edit2]
import bokeh
import bokeh.io
import numpy as np
import bokeh.plotting
bokeh.io.output_notebook()
# That's my fictional dataset
x = np.random.randn(6, 42)
x[2] *= 10
# Build the pairwise scatter plots
kw = dict(plot_width=165, plot_height=165)
# `ranges` stores the range in each dimension,
# used as both, x- and y-range depending on
# where the variable is.
figs, ranges = {}, {}
for r, row in enumerate(x):
for c, col in enumerate(x):
if r is not c:
fig = bokeh.plotting.figure(
x_range=ranges.get(c, None), y_range=ranges.get(r, None),
**kw)
fig.scatter(x=col, y=row)
fig.xaxis.axis_label = f'Dim {c}'
fig.yaxis.axis_label = f'Dim {r}'
if c not in ranges:
ranges[c] = fig.x_range
if r not in ranges:
ranges[r] = fig.y_range
figs[f'{r}_{c}'] = fig
else:
break
# Setup the plotting layout
plots = [[]]
for r, row in enumerate(x):
for c, col in enumerate(x):
if r is not c:
plots[-1].append(figs[f'{r}_{c}'])
else:
plots.append([])
break
staircase = bokeh.layouts.gridplot(plots, **kw)
bokeh.plotting.show(staircase)
.. into an ipython notebook (>=py3.6), bokeh sets the scale for dim 1, and 2 correctly. Then, it starts to set the scale for the following dimensions as in dim 2. Notice that I scaled dim 2 10-fold to make this point.
Interactively, I can rescale the plot back to optimal settings. However, I'd like to do that by default. What options do I have inside bokeh to rescale? I played a bit with fig.xaxis.bounds, but unsuccessfully. Thanks for your help!
Epilogue:
Following #bigreddot's answer, I added the lines:
for i, X in enumerate(x):
ranges[i].start = X.min()
ranges[i].end = X.max()
to fix the starting ranges. I still think that the behaviour is a bug.
From your code and description I still can't quite tell what you are hoping to accomplish. [1] But I will state that the default DataRange1d ranges that plot's use automatically make space for all renderers, across all plots they are shared by. In this sense, I see exactly what I would expect when I run your code. If you want something different, there are two things you could control:
DataRange1d has a .renderers property. If you only want the "auto" ranging to be over a subset of the renderers, then you can explicitly set this property to the list you want. Renderers are returned by the glyph functions, e.g. fig.scatter
Don't use the "auto" ranges. You can also set the x_range and y_range yourself to be Range1d objects. These have start and end properties that you can set, and these will be the definite bounds of the range, e.g. x-range=Range1d(0, 10)
[1] The ranges are linked in what I would consider an odd way, and I can't tell if that is intended. But that is a result of your looping/python code and not Bokeh.

ggplot2 equivalent of 'factorization or categorization' in googleVis in R

Due to static graph prepared by ggplot, we are shifting our graphs to googleVis with interactive charts. But when it comes to categorization we are facing many problems. Let me give example which will help you understand:
#dataframe
df = data.frame( x = sample(1:100), y = sample(1:100), cat = sample(c('a','b','c'), 100, replace=TRUE) )
ggplot2 provides parameter like alpha, colour, linetype, size which we can use with categories like shown below:
ggplot(df) + geom_line(aes(x = x, y = y, colour = cat))
Not just line chart, but majority of ggplot2 graphs provide categorization based on column values. Now I would like to do the same in googleVis, based on value df$cat I would like parameters to get changed or grouping of line or charts.
Note:
I have already tried dcast to make multiple columns based on category column and use those multiple columns as Y input, but that it not what I would like to do.
Can anyone help me regarding this?
Let me know if you need more information.
vrajs5 you are not alone! We struggled with this issue. In our case we wanted to fill bar charts like in ggplot. This is the solution. You need to add specifically named columns, linked to your variables, to your data table for googleVis to pick up.
In my fill example, these are called roles, but once you see my syntax you can abstract it to annotations and other cool features. Google has them all documented here (check out superheroes example!) but it was not obvious how it applied to r.
#mages has this documented on this webpage, which shows features not in demo(googleVis):
http://cran.r-project.org/web/packages/googleVis/vignettes/Using_Roles_via_googleVis.html
EXAMPLE ADDING NEW DIMENSIONS TO GOOGLEVIS CHARTS
# in this case
# How do we fill a bar chart showing bars depend on another variable?
# We wanted to show C in a different fill to other assets
suppressPackageStartupMessages(library(googleVis))
library(data.table) # You can use data frames if you don't like DT
test.dt = data.table(px = c("A","B","C"), py = c(1,4,9),
"py.style" = c('silver', 'silver', 'gold'))
# Add your modifier to your chart as a new variable e.g. py1.style
test <-gvisBarChart(test.dt,
xvar = "px",
yvar = c("py", "py.style"),
options = list(legend = 'none'))
plot(test)
We have shown py.style deterministically here, but you could code it to be dependent on your categories.
The secret is myvar.googleVis_thing_youneed linking the variable myvar to the googleVis feature.
RESULT BEFORE FILL (yvar = "py")
RESULT AFTER FILL (yvar = c("py", "py.style"))
Take a look at mages examples (code also on Github) and you will have cracked the "categorization based on column values" issue.

Resources