I'm using holoviews with bokeh backend for interactive visualizations. I have a histogram with edges and frequency data. What is an elegant way of overlaying my histogram with the cumulative distribution (cdf) curve?
I tried using the cumsum option in hv.dim but don't think I'm doing it right. The help simply says:
Help on function cumsum in module holoviews.util.transform:
cumsum(self, **kwargs)
My code looks something like,
df_hist = pd.DataFrame(columns=['edges', 'freq'])
df_hist['edges'] = [-2, -1, 0, 1, 2]
df_hist['freq'] = [1, 3, 5, 3, 1]
hv.Histogram((df_hist.edges, df_hist.freq))
The result is a histogram plot.
Is there something like a...
hv.Histogram((df_hist.edges, df_hist.freq), type='cdf')
... to show the cumulative distribution?
One possible solution is to use histogram(cumulative=True) as follows:
from holoviews.operation import histogram
histogram(hv.Histogram((df_hist.edges, df_hist.freq)), cumulative=True)
More info on transforming elements here:
http://holoviews.org/user_guide/Transforming_Elements.html
Or a more general solution by turning the original data into a hv.Dataset():
import holoviews as hv
from holoviews.operation import histogram
import seaborn as sns
hv.extension('bokeh')
iris = sns.load_dataset('iris')
hv_data = hv.Dataset(iris['petal_width'])
histogram(hv_data, cumulative=True)
But I prefer the hvplot library, which is built on top of HoloViews:
import hvplot
import hvplot.pandas
iris['petal_width'].hvplot.hist(cumulative=True)
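If the goal is to overlay the cumulative curve on top of the existing histogram rather than replace it, one option is to compute the running total yourself and overlay an hv.Curve with the * operator. This is only a minimal sketch: the names cumulative_counts and histogram_with_cumulative are made up for illustration, and it assumes the five 'edges' values from the question are treated as bin positions (one value per frequency).

```python
import numpy as np

def cumulative_counts(freq):
    """Running total of the bin frequencies (hypothetical helper)."""
    return np.cumsum(np.asarray(freq, dtype=float))

positions = [-2, -1, 0, 1, 2]
freq = [1, 3, 5, 3, 1]
cum = cumulative_counts(freq)
cdf = cum / cum[-1]  # normalised to end at 1.0, if a true CDF is wanted

def histogram_with_cumulative(positions, freq, cum):
    """Overlay the histogram with its cumulative curve (hypothetical helper)."""
    # Imported here so the numeric part above runs even without HoloViews.
    import holoviews as hv
    hist = hv.Histogram((positions, freq))
    curve = hv.Curve((positions, cum), 'x', 'Cumulative frequency')
    return hist * curve
```

In a notebook, after hv.extension('bokeh'), calling histogram_with_cumulative(positions, freq, cum) should display both elements on one set of axes. Passing cdf instead of cum gives the normalised curve, but then the two elements no longer share a meaningful y scale.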
I'm working on a package for Julia with the goal of doing quick plots using Vega-Lite as backend.
As people familiar with Matplotlib know, it is very common to have different sets of vectors and plot all of them in the same figure, each with its own label. For example:
import numpy as np
import matplotlib.pyplot as plt
x = range(0,10)
y = np.random.rand(10)
w = range(0,5)
z = np.random.rand(5)
plt.plot(x,y,label = 'y')
plt.plot(w,z,label = 'z')
plt.legend()
What I'd like to know is how can I do something similar, but using Vega-Lite (or Altair).
I know that I can do two separate plots and then add one over another. My problem is mainly about how to get the legends to work, since to get a legend, one usually needs another channel such as "color", pointing to another field in the dataframe.
I've seen similar posts, but they deal with plotting data from different columns. The answer in that case is basically to use the Fold Transform. But in my question this doesn't quite work, because I'm more interested in starting from two different plots, possibly using two different datasets, so "merging" the datasets is not a good solution.
You can take advantage of the fact that in composite charts, Vega-Lite uses shared scales by default. If you assign the color, shape, strokeDash, etc. to a unique value for each layer, an appropriate legend will be generated automatically.
Here is an example, using Altair to generate the Vega-Lite specification:
import pandas as pd
import numpy as np
import altair as alt
x = np.linspace(0, 10)
df1 = pd.DataFrame({
'x': x,
'y': np.sin(x)
})
df2 = pd.DataFrame({
'x': x,
'y': np.cos(x)
})
chart1 = alt.Chart(df1).transform_calculate(
label='"sine"'
).mark_line().encode(
x='x',
y='y',
color='label:N'
)
chart2 = alt.Chart(df2).transform_calculate(
label='"cosine"'
).mark_line().encode(
x='x',
y='y',
color='label:N'
)
alt.layer(chart1, chart2)
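Since the question is about generating Vega-Lite from another language, it may help to see roughly what such a layered specification looks like as plain JSON. The sketch below builds it by hand with the Python standard library. The helper name labeled_layer is hypothetical, and the spec is a hand-written approximation of the same idea, not the exact output Altair produces.

```python
import json
import math

def labeled_layer(values, label):
    """Return one Vega-Lite layer whose rows all get a constant `label` field."""
    return {
        "data": {"values": values},
        # The calculate expression must be a string literal, hence the
        # extra quotes that json.dumps adds around the label.
        "transform": [{"calculate": json.dumps(label), "as": "label"}],
        "mark": "line",
        "encoding": {
            "x": {"field": "x", "type": "quantitative"},
            "y": {"field": "y", "type": "quantitative"},
            # Both layers encode color from the same field name, so the
            # shared color scale yields a single legend with two entries.
            "color": {"field": "label", "type": "nominal"},
        },
    }

xs = [i * 0.2 for i in range(51)]
sine = [{"x": x, "y": math.sin(x)} for x in xs]
cosine = [{"x": x, "y": math.cos(x)} for x in xs]

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "layer": [labeled_layer(sine, "sine"), labeled_layer(cosine, "cosine")],
}
spec_json = json.dumps(spec, indent=2)
```

Each layer keeps its own dataset, so nothing has to be merged; the only thing the layers have in common is the name of the calculated field.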
I'm struggling with some of the finer points of complex HoloViews plots, especially linking plots and customizing the appearance of fonts and data points.
Using the following code, I can create this plot that has most of the features I want, but am stumped by a few things:
I want one marginal for the whole set of plots linked to 'ewr' (with individual marginals for each of the other axes), ideally on the left of the set; but my attempts to get just one in my definitions of s1 and s2 haven't worked, and I can find nothing in the documentation about moving a marginal to the left (or bottom for that matter).
I want to be able to define tooltips that use columns from my data that are not displayed in the plots. I can see one way of accomplishing this as shown in the commented alternate definition for s1, but that unlinks the plot it creates from the others. How do I create linked plots that have tooltips with elements not in those plots?
For reference, the data used is available here (converted in the code below to a Pandas dataframe, df).
import holoviews as hv
from holoviews import dim, opts
hv.extension('bokeh')
renderer = hv.renderer('bokeh')
from bokeh.models import HoverTool
from holoviews.plotting.links import DataLink
TOOLS="crosshair,pan,wheel_zoom,zoom_in,zoom_out,box_zoom,undo,redo,reset,tap,save,box_select,poly_select,lasso_select".split(",")
ht = HoverTool(
tooltips=[('Name', '@{name}'), ('EWR', '@{ewr}{%0.2f}'), ('Win Rate', '@{winrate}{%d}')],
formatters={'ewr' : 'printf', 'winrate' : 'printf'})
point_opts = opts.Scatter(fill_color='black', fill_alpha=0.1, line_width=1, line_color='gray', size=5, tools=TOOLS+[ht])
hist_opts = opts.Histogram(fill_color='gray', fill_alpha=0.9, line_width=1, line_color='gray', tools=['box_select'], labelled=[None, None])
#s1 = hv.Scatter(df[['kfai','ewr','name','winrate']]).hist(num_bins=51, dimension='kfai')
s1 = hv.Scatter(df, 'kfai','ewr').hist(num_bins=51, dimension='kfai')
s2 = hv.Scatter(df, 'aerc', 'ewr').hist(num_bins=51, dimension=['aerc',None])
s3 = hv.Scatter(df, 'winrate', 'ewr').hist(num_bins=51, dimension=['winrate','ewr'])
p = (s1 + s2 + s3).opts(point_opts, hist_opts, opts.Layout(shared_axes=True, shared_datasource=True))
renderer.save(p, '_testHV')
Surprisingly, nobody has taken the trouble to add an example of 2D histogram plotting to the bokeh gallery.
numpy's histogram2d gives the raw material, but it would be nice to have an example like the one that exists for matplotlib.
Any idea for a short way to make one?
Following up on a proposed answer, let me attach a case in which hexbin does not do the job because hexagons are not a good fit. Also check out the matplotlib result.
Of course I am not saying bokeh cannot do this, but it does not seem straightforward. It would be enough to change the hexbin plot into a square-bin plot, but quad(left, right, top, bottom, **kwargs) does not seem to do this, nor does hexbin have an option to change the "tile" shape.
You can make something close with relatively few lines of code (comparing with this example from the matplotlib gallery). Note that bokeh has some examples of hex binning in the gallery here and here. Adapting those and the example provided in the numpy docs, you can get the below:
import numpy as np
from bokeh.plotting import figure, show
from bokeh.layouts import row
# normal distribution center at x=0 and y=5
x = np.random.randn(100000)
y = np.random.randn(100000) + 5
H, xe, ye = np.histogram2d(x, y, bins=100)
# produce an image of the 2d histogram
p = figure(x_range=(min(xe), max(xe)), y_range=(min(ye), max(ye)), title='Image')
# histogram2d puts x along the first axis, but image expects rows to run along y, hence H.T
p.image(image=[H.T], x=xe[0], y=ye[0], dw=xe[-1] - xe[0], dh=ye[-1] - ye[0], palette="Spectral11")
# produce hexbin plot
p2 = figure(title="Hexbin", match_aspect=True)
p2.grid.visible = False
r, bins = p2.hexbin(x, y, size=0.1, hover_color="pink", hover_alpha=0.8, palette='Spectral11')
show(row(p, p2))
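If you specifically want square (rectangular) bins drawn as individual glyphs, one way is to turn the histogram2d output into one quad per bin. The following is a sketch under my own assumptions: hist2d_to_quads and plot_quads are hypothetical helper names, and the palette and bin count are arbitrary choices.

```python
import numpy as np

def hist2d_to_quads(x, y, bins=20):
    """Flatten a 2D histogram into per-bin rectangle coordinates and counts."""
    H, xe, ye = np.histogram2d(x, y, bins=bins)
    # H[i, j] counts samples in x-bin i and y-bin j, so index with "ij".
    left, bottom = np.meshgrid(xe[:-1], ye[:-1], indexing="ij")
    right, top = np.meshgrid(xe[1:], ye[1:], indexing="ij")
    return {
        "left": left.ravel(),
        "right": right.ravel(),
        "bottom": bottom.ravel(),
        "top": top.ravel(),
        "count": H.ravel(),
    }

def plot_quads(data):
    """Draw one colored quad per bin (imports bokeh only when called)."""
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure, show
    from bokeh.transform import linear_cmap

    p = figure(title="Square bins", match_aspect=True)
    p.quad(left="left", right="right", bottom="bottom", top="top",
           source=ColumnDataSource(data), line_color=None,
           fill_color=linear_cmap("count", "Viridis256", 0,
                                  float(data["count"].max())))
    show(p)

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
y = rng.standard_normal(10000) + 5
quads = hist2d_to_quads(x, y, bins=10)
```

Calling plot_quads(quads) then opens the figure. Compared with the image glyph, quads are heavier for large bin counts, but each bin becomes a real glyph, so hover tools and selections work per bin.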
I wanted to create a heatmap of a probability density matrix using plotly.
import numpy as np
from plotly.offline import download_plotlyjs, init_notebook_mode, plot
import plotly.graph_objs as go
probability_matrix = np.loadtxt("/path/to/file")
trace = go.Heatmap(z = probability_matrix)
data=[trace]
plot(data, filename='basic-heatmap')
This gives me an image like this:
I want to smooth the colors of the squares so that the transitions between adjacent squares in the image are "smoother". I was wondering if there is a way of doing that without manually resizing the matrix using interpolation.
You can use the zsmooth argument which can take three values ('fast', 'best', or False). For example:
data = [go.Heatmap(z=[[1, 20, 30],
[20, 1, 60],
[30, 60, 1]],
zsmooth = 'best')]
iplot(data)
Will give you the following smooth heatmap:
I'm trying to make 4 plots using a for loop, but I'm not sure how to do it. How can I display the plots one by one, in order? Or save each figure as a PNG?
Here is my code:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from astropy.io import fits
import pyregion
import glob
# read in the image
xray_name = glob.glob("*.fits")
for filename in xray_name:
    f_xray = fits.open(filename)
    #name = file_name[:-len('.fits')]
    try:
        from astropy.wcs import WCS
        from astropy.visualization.wcsaxes import WCSAxes
        wcs = WCS(f_xray[0].header)
        fig = plt.figure()
        ax = plt.subplot(projection=wcs)
        fig.add_axes(ax)
    except ImportError:
        ax = plt.subplot(111)
    ax.imshow(f_xray[0].data, cmap="summer", vmin=0., vmax=0.00038, origin="lower")
    reg_name=glob.glob("*.reg")
    for i in reg_name:
        r =pyregion.open(i).as_imagecoord(header=f_xray[0].header)
    from pyregion.mpl_helper import properties_func_default
    # Use custom function for patch attribute
    def fixed_color(shape, saved_attrs):
        attr_list, attr_dict = saved_attrs
        attr_dict["color"] = "red"
        kwargs = properties_func_default(shape, (attr_list, attr_dict))
        return kwargs
    # select region shape with tag=="Group 1"
    r1 = pyregion.ShapeList([rr for rr in r if rr.attr[1].get("tag") == "Group 1"])
    patch_list1, artist_list1 = r1.get_mpl_patches_texts(fixed_color)
    r2 = pyregion.ShapeList([rr for rr in r if rr.attr[1].get("tag") != "Group 1"])
    patch_list2, artist_list2 = r2.get_mpl_patches_texts()
    for p in patch_list1 + patch_list2:
        ax.add_patch(p)
    #for t in artist_list1 + artist_list2:
    # ax.add_artist(t)
    plt.show()
The aim of the code is to plot a region on a FITS image. If there is a way to change the background of the image to white while leaving the brighter (central) region as it is, that would be fine. Thanks.
You are using the colormap "summer" with the provided limits. It is not clear to me what you want to achieve, since the picture you posted looks more or less binary black and white, pixel by pixel.
Matplotlib has built-in colormaps, and each of them has a reversed twin.
The reversed twin of 'summer' is 'summer_r'.
This can be picked up in the mpl docs in multiple spots, like the colormap example, or in SO answers like this.
Hope that is what you are looking for. For the future, when posting code like this, try to remove all irrelevant portions and, at minimum, provide a description of the data format/type. Best is to also include a small sample of the data and its structure. A piece of code only works together with a set of data, so sharing only one of them is only half of the problem formulation.
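As for saving each plot as a PNG from the loop, a rough sketch is below. It uses random arrays in place of your FITS data, since I don't have the files; the helper name save_images, the image names and the output folder are all made up for illustration, and the region overlay is left out.

```python
import os
import tempfile

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: figures are only written to disk
import matplotlib.pyplot as plt

def save_images(images, out_dir, cmap="summer_r"):
    """Save one PNG per (name, 2D array) pair and return the file paths."""
    paths = []
    for name, data in images.items():
        fig, ax = plt.subplots()  # a fresh figure for every image
        ax.imshow(data, cmap=cmap, origin="lower")
        ax.set_title(name)
        path = os.path.join(out_dir, f"{name}.png")
        fig.savefig(path, dpi=150)
        plt.close(fig)  # release the figure so they do not pile up in memory
        paths.append(path)
    return paths

# Stand-in data: four random images instead of the FITS files.
rng = np.random.default_rng(0)
images = {f"image_{i}": rng.random((32, 32)) for i in range(4)}
out_dir = tempfile.mkdtemp()
saved = save_images(images, out_dir)
```

Creating the figure inside the loop and closing it after savefig keeps the plots separate and in order. If you would rather see them on screen one by one, drop the matplotlib.use("Agg") line and call plt.show() in place of fig.savefig and plt.close.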