Legend for gridded Holoviews visualization with categorical data - bokeh

I'm using the HoloViz xarray extension (hvplot.xarray) to visualize a gridded dataset with landcover classes. Plotting the data is straightforward with da.hvplot(). However, this results in a continuous colormap with standard tick labels, whereas I need the categories plotted using a specific colormap and their labels included in a legend.
So how can I plot gridded categorical data using Holoviews? My plot needs to:
Have the categories plotted according to a specific colormap (hex color codes).
Include a legend with labels ["water", "cirrus", ...].
Handle situations where the data do not contain all classes. Explanation: when using da.hvplot(cmap=tuple(color_key.values())) while da does not contain all classes, the result is typically a plot where the colorbar ticks do not match the color classes.
Have a legend outside the plotted data.
The best I got so far is the example provided below. But how can I move that legend out of the plot? Or is there a more straightforward solution?
import holoviews as hv
import hvplot.xarray
import numpy as np
import xarray as xr
color_key = {
    "No Data": "#000000",
    "Saturated / Defective": "#ff0000",
    "Dark Area Pixels": "#2f2f2f",
    "Cloud Shadows": "#643200",
    "Vegetation": "#00a000",
    "Bare Soils": "#ffe65a",
    "water": "#0000ff",
    "Clouds low probability / Unclassified": "#808080",
    "Clouds medium probability": "#c0c0c0",
    "Clouds high probability": "#ffffff",
    "Cirrus": "#64c8ff",
    "Snow / Ice": "#ff96ff",
}
# Generate sample data
nx = 40
ny = 70
xcoords = [37 + 0.1 * i for i in range(nx)]
ycoords = [5 + 0.2 * i for i in range(ny)]
data = np.random.randint(low=0, high=len(color_key), size=nx * ny).reshape(nx, ny)
da = xr.DataArray(
    data,
    dims=["x", "y"],
    coords={"x": xcoords, "y": ycoords},
)
# Visualization
legend = hv.NdOverlay(
    {
        k: hv.Points([0, 0], label=f"{k}").opts(color=v, size=0, apply_ranges=False)
        for k, v in color_key.items()
    },
    "Classification",
)
da.hvplot().opts(cmap=tuple(color_key.values())) * legend

You could either set .opts(legend_position='right') to move the legend outside the plot, or override the actual ticks on the colorbar using the colorbar_opts option, providing a fixed ticker along with major_label_overrides like this:
from bokeh.models import FixedTicker
ticks = np.arange(len(color_key), dtype='float') + 0.0001
ticker = FixedTicker(ticks=ticks)
labels = dict(zip(ticks, color_key))
da.hvplot(height=600).opts(clim=(-0.5, 11.5), cmap=tuple(color_key.values()), colorbar_opts={'ticker': ticker, 'major_label_overrides': labels})
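The numbers in the colorbar approach can be derived from color_key instead of being hard-coded. The following is only a sketch: the helper name categorical_colorbar_settings and its offset parameter are hypothetical, and it assumes the class codes in the data are the integers 0..n-1 in the same order as color_key. The answer nudges the ticks by 0.0001 without stating why, so the sketch simply exposes that value as a parameter.

```python
def categorical_colorbar_settings(color_key, offset=0.0001):
    """Derive colorbar settings for integer class codes 0..n-1.

    Hypothetical helper: assumes class code i is drawn with the i-th
    color of color_key (as in the sample data of the question).
    """
    names = list(color_key)
    n = len(names)
    ticks = [i + offset for i in range(n)]
    return {
        # Fixed limits give every integer code its own unit-wide color bin,
        # centered on the code, even if some classes are absent from the data.
        "clim": (-0.5, n - 0.5),
        "cmap": tuple(color_key.values()),
        # One tick per class, placed at (roughly) the center of its bin.
        "ticks": ticks,
        # Tick position -> class name, for the colorbar label overrides.
        "labels": dict(zip(ticks, names)),
    }


# Possible usage with the objects from the answer (needs bokeh and hvplot):
# from bokeh.models import FixedTicker
# s = categorical_colorbar_settings(color_key)
# da.hvplot(height=600).opts(
#     clim=s["clim"], cmap=s["cmap"],
#     colorbar_opts={"ticker": FixedTicker(ticks=s["ticks"]),
#                    "major_label_overrides": s["labels"]},
# )
```

For the 12 classes in the question this gives clim=(-0.5, 11.5), matching the value used in the answer. Because the limits depend only on color_key and not on the data, a class missing from da cannot shift the colors of the others.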

Related

Border Plot Not Displaying

I am plotting a GeoPandas map with an overlay of a scatterplot and circle patches. However, when I run the code, the output only displays the scatterplot and circle patches, without the border map. There is nothing wrong with the map itself: when I plot it separately, it shows the right output. Please see the image description below for what the plot looks like.
Here is the code:
def plot_dbscan(points, dbscan, title, pt_sizer=1, plot_circles=False):
    # Index noise and clusters out of the dbscan points
    noise = points[dbscan.labels_ == -1]
    clusters = points[dbscan.labels_ != -1]
    # Plot country border
    fig, ax = plt.subplots(1, figsize=(12,8))
    map_1_prj.plot(ax=ax, fc='None', ec='k', linewidth=1.5)
    # Allow relative point size adjustment with pt_sizer argument
    sns.scatterplot(x=noise[:,0], y=noise[:,1], ax=ax, alpha=1, s=2*pt_sizer, color='gray')
    sns.scatterplot(x=clusters[:,0], y=clusters[:,1], ax=ax, s=4*pt_sizer, color='red')
    # Option to plot a minimum bounding circle around each cluster
    if plot_circles:
        for label in np.unique(dbscan.labels_):
            if label != -1:
                cluster_points = points[dbscan.labels_ == label]
                # Get minimum bounding circle using pointpats.centrography.minimum_bounding_circle()
                (center_x, center_y), radius = minimum_bounding_circle(cluster_points)
                # Create matplotlib patch
                circle_patch = mpatches.Circle((center_x, center_y), radius=radius, fc='None', ec='yellow', linewidth=2)
                ax.add_patch(circle_patch)
    ax.axis('equal')
    # Limit bounds of plot to earthquake data
    ax.set_xlim(gdf_prj.total_bounds[0], gdf_prj.total_bounds[2])
    ax.set_ylim(gdf_prj.total_bounds[1], gdf_prj.total_bounds[3])
    # Manually prepare legend items
    Ph_border = mlines.Line2D([], [], color='k', linewidth=1.5, label='Philippine Border')
    noise_l = mlines.Line2D([], [], marker='.', linewidth=0, markersize=4,
                            color='gray', label='Noise')
    if plot_circles:
        # Draw yellow circle around red point for legend
        mec = 'yellow'
    else:
        mec = 'None'
    clusters_l = mlines.Line2D([], [], marker='.', linewidth=0,
                               markersize=12, color='red', markeredgecolor=mec,
                               label='DBSCAN Clusters')
    # Define legend
    plt.legend(handles=[Ph_border, noise_l, clusters_l])
    plt.title(title)
    plt.show()

Import of pngs and plotting them according to additional parameters

I am conducting a test with two factorial parameters (x, y) in an image processing program, which has produced a number of test images as pngs of the same dimensions. For example, in the mtcars dataset, they could represent one illustrative car image for each of the cyl/carb combinations.
I would like to import these images into R and plot them in a facet grid with the respective parameter values on the axes (e.g. cyl and carb).
What's the best way to a) import the images into a tibble/df, and b) plot them as described above?
(If necessary, I'd be happy to update the question with example code once I know what package to use).
If you have the images as png files, you don't actually need to convert them to data frames. You can use geom_image from ggimage, which only requires the file path. Suppose I have two categories, "animal" and "color", with one image for each unique combination of "color" and "animal".
I need only do:
df <- data.frame(animal = rep(c("cat", "dog"), each = 3),
                 color = rep(c("red", "green", "blue"), 2),
                 img = path.expand(
                   c("~/redcat.png", "~/greencat.png", "~/bluecat.png",
                     "~/reddog.png", "~/greendog.png", "~/bluedog.png")))
library(ggimage)
ggplot(df) +
  geom_image(aes(x = 1, y = 1, image = img), size = 1) +
  facet_grid(color~animal, scales = "free")

How do I make x and y axes thicker with Plots (Julia)?

How can I make the lines for the x- and y-axes thicker in Julia Plots?
Is there a simple way to achieve this?
MWE:
using Plots
Nx, Ny = 101,101
x = LinRange(0, 100, Nx)
y = LinRange(0, 100, Ny)
foo(x,y; x0=50, y0=50, sigma =1) = exp(- ((x-x0)^2 + (y-y0)^2)/(2*sigma^2) )
NA = [CartesianIndex()] # for "newaxis"
Z = foo.(x[:,NA], y[NA,:], sigma=10);
hm = heatmap(x, y, Z, xlabel="x", ylabel="y", c=cgrad(:Blues_9), clim=(0,1))
plot(hm, tickfontsize=10, labelfontsize=14)
Leads to:
The posts I found so far suggested that this was not possible:
https://discourse.julialang.org/t/plots-jl-modify-frame-thickness/24258/4
https://github.com/JuliaPlots/Plots.jl/issues/1099
Is this still so?
The actual code for my plot is much longer.
I would not like to rewrite all of it in a different plot library.
Currently, there does not seem to be an attribute for axes thickness in Plots.jl.
As a workaround, you may use the attribute thickness_scaling, which scales the thickness of everything: lines, grid lines, axes lines, etc. Since you only want to change the thickness of the axes, you need to scale the others back down. Here is your example code doing that with the pyplot backend.
using Plots
pyplot() # use pyplot backend
Nx, Ny = 101,101
x = LinRange(0, 100, Nx)
y = LinRange(0, 100, Ny)
foo(x,y; x0=50, y0=50, sigma =1) = exp(- ((x-x0)^2 + (y-y0)^2)/(2*sigma^2) )
NA = [CartesianIndex()] # for "newaxis"
Z = foo.(x[:,NA], y[NA,:], sigma=10);
hm = heatmap(x, y, Z, xlabel="x", ylabel="y", c=cgrad(:Blues_9), clim=(0,1))
plot(hm, tickfontsize=10, labelfontsize=14) # your previous plot
# here is the plot code that shows the same plot with thicker axes on a new window
# note that GR backend does not support `colorbar_tickfontsize` attribute
plot(hm, thickness_scaling=2, tickfontsize=10/2, labelfontsize=14/2, colorbar_tickfontsize=8/2, reuse=false)
See Julia Plots Documentation for more about plot attributes.
A simple workaround that does not require setting attributes for all the fonts is to add vertical and horizontal lines at the x and y limits of the plots. For example, if I have a figure fig with 4 subplots, each with the same bounds, I can use this to get a thicker box frame:
for i ∈ 1:4
    vline!(fig[i], [xlim_lb, xlim_ub],
           linewidth=3,
           color=:black,
           label=false)
    hline!(fig[i], [ylim_lb, ylim_ub],
           linewidth=3,
           color=:black,
           label=false)
end
or for the original example here, add this to the end:
frame_thickness = 5
vline!([x[1], x[end]], color=:black, linewidth=frame_thickness, label=false)
hline!([y[1], y[end]], color=:black, linewidth=frame_thickness, label=false)

matplotlib bar plot add legend from categories dataframe column

I am trying to add a legend which should, according to my example, show:
a red square with the word fruit and
a green square with the word veggie.
I tried several things (the example below is just one of many attempts), but I can't get it to work.
Can someone tell me how to solve this problem?
import pandas as pd
from matplotlib import pyplot as plt
data = [['apple', 'fruit', 10], ['nanaba', 'fruit', 15], ['salat','veggie', 144]]
data = pd.DataFrame(data, columns = ['Object', 'Type', 'Value'])
colors = {'fruit':'red', 'veggie':'green'}
c = data['Type'].apply(lambda x: colors[x])
bars = plt.bar(data['Object'], data['Value'], color=c, label=colors)
plt.legend()
The usual way to create a legend for objects which are not in the axes is to create proxy artists, as shown in the legend guide.
Here:
colors = {'fruit':'red', 'veggie':'green'}
labels = list(colors.keys())
handles = [plt.Rectangle((0,0),1,1, color=colors[label]) for label in labels]
plt.legend(handles, labels)
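Put together with the bar plot from the question, the proxy-artist approach could look like the sketch below. The function names legend_entries and plot_with_category_legend are hypothetical, and using matplotlib.patches.Patch for the handles is one option among several (the snippet above uses plt.Rectangle). The legend only lists categories that actually occur in the data.

```python
def legend_entries(types, colors):
    """Return (label, color) pairs for the categories present, in first-seen order."""
    seen = []
    for t in types:
        if t not in seen:
            seen.append(t)
    return [(t, colors[t]) for t in seen]


def plot_with_category_legend(objects, types, values, colors):
    """Draw one bar per object, colored by type, with one legend entry per type."""
    # Imported here so legend_entries stays usable without matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.patches import Patch

    fig, ax = plt.subplots()
    ax.bar(objects, values, color=[colors[t] for t in types])
    # Proxy artists: patches that are never added to the axes and exist
    # only to give the legend something to draw.
    handles = [Patch(color=color, label=label)
               for label, color in legend_entries(types, colors)]
    ax.legend(handles=handles)
    return fig, ax


# Possible usage with the data from the question:
# import matplotlib.pyplot as plt
# colors = {'fruit': 'red', 'veggie': 'green'}
# plot_with_category_legend(['apple', 'nanaba', 'salat'],
#                           ['fruit', 'fruit', 'veggie'],
#                           [10, 15, 144], colors)
# plt.show()
```

Passing the label to each Patch means only handles has to be given to ax.legend, so labels and colors cannot get out of sync.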
So this is a hacky solution and I'm sure there are probably better ways to do this. What you can do is plot individual bar plots that are invisible using width=0 with the original plot colors and specify the labels. You will have to do this in a subplot though.
import pandas as pd
from matplotlib import pyplot as plt
data = [['apple', 'fruit', 10], ['nanaba', 'fruit', 15], ['salat','veggie', 144]]
data = pd.DataFrame(data, columns = ['Object', 'Type', 'Value'])
colors = {'fruit':'red', 'veggie':'green'}
c = data['Type'].apply(lambda x: colors[x])
ax = plt.subplot(111) #specify a subplot
bars = ax.bar(data['Object'], data['Value'], color=c) #Plot data on subplot axis
for i, j in colors.items(): #Loop over color dictionary
    ax.bar(data['Object'], data['Value'],width=0,color=j,label=i) #Plot invisible bar graph but have the legends specified
ax.legend()
plt.show()

Bubble chart for integer variables where the largest bubble has a diameter of 1 (on the x or y axis scale)?

I want to achieve the following outcomes:
Rescale the size of the bubbles such that the largest bubble has a diameter of 1 (on whichever of the x and y axes has the more compressed scale).
Rescale the size of the bubbles such that the smallest bubble has a diameter of 1 mm.
Have a legend whose first and last points are the minimum non-zero frequency and the maximum frequency.
The best I have been able to do is as follows, but I need a more general solution where the value of maxSize is computed rather than hard-coded. If I was doing it in the traditional R plots I would use par("pin") to work out the size of plot area and work backwards, but I cannot figure out how to access this information with ggplot2. Any suggestions?
library(ggplot2)
agData = data.frame(
class=rep(1:7,3),
drv = rep(1:3,rep(7,3)),
freq = as.numeric(xtabs(~class+drv,data = mpg))
)
agData = agData[agData$freq != 0,]
rng = range(agData$freq)
mn = rng[1]
mx = rng[2]
minimumArea = mx - mn
maxSize = 20
minSize = max(1,maxSize * sqrt(mn/mx))
qplot(class,drv,data = agData, size = freq) + theme_bw() +
scale_area(range = c(minSize,maxSize),
breaks = seq(mn,mx,minimumArea/4), limits = rng)
Here is what it looks like so far:
When ggplot, lattice or another high-level package cannot do the job without hours of fine-tuning, I always revert to base graphics. The following code gets you what you want, and after it I have another example based on how I would have plotted it.
Note, however, that I have set the maximum radius to 1 cm; divide size.range by 2 to make that the diameter instead. I just thought the radius gave me nicer plots, and you'll probably want to adjust things anyway.
size.range <- c(.1, 1) # Min and max radius of circles, in cm
# Calculate the relative radius of each circle
radii <- sqrt(agData$freq)
radii <- diff(size.range)*(radii - min(radii))/diff(range(radii)) + size.range[1]
# Plot in two panels
mar0 <- par("mar")
layout(t(1:2), widths=c(4,1))
# Panel 1: The circles
par(mar=c(mar0[1:3],.5))
symbols(agData$class, agData$drv, radii, inches=size.range[2]/cm(1), bg="black")
# Panel 2: The legend
par(mar=c(mar0[1],.5,mar0[3:4]))
symbols(c(0,0), 1:2, size.range, xlim=c(-4, 4), ylim=c(-2,4),
inches=1/cm(1), bg="black", axes=FALSE, xlab="", ylab="")
text(0, 3, "Freq")
text(c(2,0), 1:2, range(agData$freq), col=c("black", "white"))
# Reset par settings
par(mar=mar0)
Now follows my suggestion. The largest circle has a radius of 1 cm and area of the circles are proportional to agData$freq, without forcing a size of the smallest circle. Personally I think this is easier to read (both code and figure) and looks nicer.
with(agData, symbols(class, drv, sqrt(freq),
inches=size.range[2]/cm(1), bg="black"))
with(agData, text(class, drv, freq, col="white"))
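The arithmetic behind the two radius choices above can be checked outside R. The following Python sketch mirrors the formulas only; the function names rescale_radii and proportional_radii are hypothetical, and it is not a plotting solution. The first function follows the linear rescaling of sqrt(freq) onto size.range, the second keeps circle area proportional to frequency with only the largest radius fixed.

```python
import math


def rescale_radii(freqs, r_min=0.1, r_max=1.0):
    """Map sqrt(freq) linearly onto [r_min, r_max], as in the first snippet."""
    roots = [math.sqrt(f) for f in freqs]
    lo, hi = min(roots), max(roots)
    if hi == lo:
        # All frequencies equal: no range to stretch, use the largest radius.
        return [r_max for _ in roots]
    return [(r_max - r_min) * (r - lo) / (hi - lo) + r_min for r in roots]


def proportional_radii(freqs, r_max=1.0):
    """Scale sqrt(freq) so the largest radius is r_max; area stays proportional to freq."""
    roots = [math.sqrt(f) for f in freqs]
    hi = max(roots)
    return [r_max * r / hi for r in roots]


freqs = [1, 4, 9]
print(rescale_radii(freqs))       # smallest radius pinned to r_min
print(proportional_radii(freqs))  # smallest radius follows from the data
```

With the first function the smallest circle always gets radius r_min, so areas are no longer strictly proportional to frequency. The second trades the fixed minimum size for that proportionality.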
