"BAD_COLUMN_NAME" message from Bokeh plot - bokeh

I am trying to create a simple Bokeh bar chart (vertical or horizontal) from a CSV file and am having issues. I can create the chart with ColumnDataSource when I list the items manually, but when I try to create the same chart from a simple CSV file, it gives me trouble. Now I am trying to create the figure by reading a pandas DataFrame, and I am getting the dreaded BAD_COLUMN_NAME error message. Any help is appreciated. This is my first time posting, so let me know if I have posted incorrectly and I will fix it. Thanks in advance.
from bokeh.io import output_notebook, show
output_notebook()
from bokeh.core.properties import value
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, HoverTool, FactorRange, Range1d
from bokeh.plotting import figure
from bokeh.transform import dodge
output_file("test.html")
import pandas as pd
df = pd.read_csv(r'C:\test.csv')
print(df)
Mains Total Length Length Surveyed
0 1.0 88.4 87.6
1 2.0 313.8 316.8
2 3.0 271.0 265.6
3 4.0 155.0 153.1
4 5.0 301.8 299.0
5 6.0 293.9 132.3
6 7.0 148.1 147.2
7 8.0 292.9 290.1
8 9.0 307.6 306.0
9 10.0 559.0 236.0
10 11.0 448.8 441.5
11 12.0 297.9 13.0
12 13.0 172.2 67.5
source = ColumnDataSource(data=dict(df))
Mains = data=dict(df)
data = {'Mains': df}
p = figure(x_range=(0,20), y_range=(0, 500),
           plot_height=250, title="CCTV Survey August 6-9th, 2018",
           toolbar_location=None, tools="")
p.vbar(x=dodge('Mains', -0.25, range=p.x_range),
       top='2015', width=0.2, source=source,
       color="#c9d9d3", legend=value("Total Length"))
p.vbar(x=dodge('Mains', 0.0, range=p.x_range),
       top='2016', width=0.2, source=source,
       color="#718dbf", legend=value("Length Surveyed"))
p.add_tools(HoverTool(tooltips=[("Total Length", "#2015 ft"), ("Length Surveyed", "#2016 ft")]))
p.xaxis.major_label_orientation = 1.4
##p.x_range.factors=data_dict['x']
##p.x_range.range_padding = 0.0
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
ERROR:bokeh.core.validation.check:E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: 2015 [renderer: GlyphRenderer(id='107d32b5-2700-4608-b1d8-9d0602f82a5b', ...)]
ERROR:bokeh.core.validation.check:E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: 2016 [renderer: GlyphRenderer(id='4dcb960b-b29e-4998-972a-046311d037f8', ...)]

You are telling Bokeh that the top of the bars should be driven by columns named "2015" and "2016":
top='2016', width=0.2, source=source,
But your DataFrame / ColumnDataSource has no such columns. You need to configure the bar glyphs to use columns that are actually part of your data (presumably "Total Length" and "Length Surveyed").

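A minimal sketch of the corrected chart, assuming a recent Bokeh release (2.4 or later, where `figure` takes `height` instead of `plot_height` and `vbar` takes `legend_label` instead of `legend=value(...)`). The data is typed in from the printed DataFrame above instead of being read from C:\test.csv. The dodge offsets, bar widths and the raised y range (600, so the 559.0 bar is not clipped) are my choices, not from the question. The tooltips use `@{...}` to refer to columns, with braces because the column names contain spaces.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.transform import dodge

# Values typed in from the printed DataFrame in the question
df = pd.DataFrame({
    "Mains": [float(n) for n in range(1, 14)],
    "Total Length": [88.4, 313.8, 271.0, 155.0, 301.8, 293.9, 148.1,
                     292.9, 307.6, 559.0, 448.8, 297.9, 172.2],
    "Length Surveyed": [87.6, 316.8, 265.6, 153.1, 299.0, 132.3, 147.2,
                        290.1, 306.0, 236.0, 441.5, 13.0, 67.5],
})
source = ColumnDataSource(df)

p = figure(x_range=(0, 14), y_range=(0, 600), height=250,
           title="CCTV Survey August 6-9th, 2018",
           toolbar_location=None, tools="")

# 'top' names a column that really exists in the source
p.vbar(x=dodge("Mains", -0.15, range=p.x_range), top="Total Length",
       width=0.3, source=source, color="#c9d9d3", legend_label="Total Length")
p.vbar(x=dodge("Mains", 0.15, range=p.x_range), top="Length Surveyed",
       width=0.3, source=source, color="#718dbf", legend_label="Length Surveyed")

# '@{...}' refers to a source column; braces are needed because of the spaces
p.add_tools(HoverTool(tooltips=[("Total Length", "@{Total Length} ft"),
                                ("Length Surveyed", "@{Length Surveyed} ft")]))
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
# show(p)  # or call output_file("test.html") first, as in the question
```

Since both `top` fields now match real columns, the E-1001 validation errors should no longer appear.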
Related

ggplot2 black bar when outputting plot using bbplot theme

I've started using the bbplot package today which is generally working really well, and is a theme and wrapper for ggplot2 created by the BBC. However, when finalising the plot, the exports seem to include a black bar at the bottom of the image which masks the source details. I've fiddled with the code and it seems to be happening when the subtitle or source are longer than a certain number of characters (although still less wide than the width of the available space). I can't get any plots to finalise with both a subtitle and source at present.
Using Databricks on Azure (Runtime 9.1 LTS).
Code:
svannualcount <- ggplot(svkc_count, aes(x = as.character(Recorded_Year), y = n)) +
geom_col(fill="#082a52") +
geom_hline(yintercept = 0, size = 1, colour="#333333") +
theme(axis.text.x = element_text(angle = 0)) +
bbc_style() +
labs(title="This is a test title",
subtitle = "Can you read my test title?")
finalise_plot(plot_name = svannualcount,
source = "Source: This is a test source.",
save_filepath = "/tmp/svkcannualplot.png",
width_pixels = 640,
height_pixels = 450,
logo_image_path = "/tmp/vrulogo.png"
)
Data:
Recorded_Year n `as.character(Recorded_Year)`
<int> <dbl> <chr>
1 2018 2171 2018
2 2015 1128 2015
3 2019 2128 2019
4 2020 2041 2020
5 2016 1572 2016
6 2017 1968 2017
7 2021 1711 2021
On further testing, and with useful input from #rui-barradas, this was identified as an issue with Databricks not displaying ggplot2 plots correctly inline in a notebook. The exported plots render correctly when opened in an image viewer.

How to replicate the Column with data set in table

I want to convert the question columns into rows using Python pandas, like below:
import pandas as pd
from openpyxl import load_workbook
df = pd.read_excel (r'file' ,sheet_name='results')
d = {'Score.1':'Score','Score.2':'Score','Duration.1':'Duration','Duration.2':'Duration'}
melted=pd.melt(df, id_vars=['userid','Candidate','Score','Duration'], value_vars=['Question 1'],var_name='myVarname', value_name='myValname')
melted1=pd.melt(df, id_vars=['userid','Candidate','Score.1','Duration.1'], value_vars=['Question 2'],var_name='myVarname', value_name='myValname').rename(columns=d)
melted2=pd.melt(df, id_vars=['userid','Candidate','Score.2','Duration.2'], value_vars=['Question 3 '],var_name='myVarname', value_name='myValname').rename(columns=d)
......
melted2=pd.melt(df, id_vars=['userid','Candidate','Score.25','Duration.25'], value_vars=['Question 25 '],var_name='myVarname', value_name='myValname').rename(columns=d)
meltedfinal=[melted,melted1,melted2]
result = pd.concat(meltedfinal)
result.to_excel(r'file')
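One way to avoid writing 25 near-identical `pd.melt` calls is to build them in a loop and concatenate the pieces. This is a sketch that assumes the sheet follows pandas' usual renaming of duplicate headers (`Score`, `Score.1`, `Score.2`, ... and likewise for `Duration`) and that the question columns are `Question 1` to `Question N` apart from stray trailing spaces. It uses a small made-up DataFrame in place of the real `results` sheet, and the helper name `melt_questions` is hypothetical.

```python
import pandas as pd


def melt_questions(df, n_questions):
    """Stack 'Question N' columns, with their Score/Duration columns, into rows.

    Assumes pandas' duplicate-header naming: the first block is 'Score'/'Duration',
    later blocks are 'Score.1'/'Duration.1', 'Score.2'/'Duration.2', ...
    """
    # Strip stray whitespace such as the trailing space in 'Question 3 '
    df = df.rename(columns=lambda c: c.strip())
    frames = []
    for i in range(n_questions):
        suffix = "" if i == 0 else f".{i}"
        score_col, dur_col = f"Score{suffix}", f"Duration{suffix}"
        part = pd.melt(
            df,
            id_vars=["userid", "Candidate", score_col, dur_col],
            value_vars=[f"Question {i + 1}"],
            var_name="myVarname",
            value_name="myValname",
        ).rename(columns={score_col: "Score", dur_col: "Duration"})
        frames.append(part)
    return pd.concat(frames, ignore_index=True)


# Tiny made-up sheet with two questions, standing in for the 'results' sheet
df = pd.DataFrame({
    "userid": [1, 2],
    "Candidate": ["Ann", "Bob"],
    "Question 1": ["A", "B"],
    "Score": [5, 3],
    "Duration": [10, 12],
    "Question 2 ": ["C", "D"],
    "Score.1": [4, 2],
    "Duration.1": [8, 9],
})
result = melt_questions(df, n_questions=2)
print(result)
```

For the real workbook, calling `melt_questions` on the DataFrame returned by `pd.read_excel(r'file', sheet_name='results')` with the actual number of questions, then `result.to_excel(...)`, should give the same layout as the hand-written version.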

IndexError while plotting netCDF data using Basemap, matplotlib and contour command

I am trying to plot variables of pressure and wind velocities from the netcdf file of MERRA re-analysis data at a single level. I'm using the basemap module for the plotting and other necessary packages to get the data. Unfortunately, I end up with an error.
Here is my code:
from netCDF4 import Dataset as NetCDFFile
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
import os
import netCDF4
from netCDF4 import Dataset
from osgeo import gdal
os.chdir('F:\Atmospheric rivers\MERRA')
dirrec=str('F:\Atmospheric rivers\MERRA')
ncfile='MERRA2_400.inst6_3d_ana_Np.20180801.SUB.nc'
file=Dataset(ncfile,'r',format='NETCDF4')
lon=file.variables['lon'][:]
lat=file.variables['lat'][:]
time=file.variables['time'][:]
u=file.variables['U'][:]
v=file.variables['V'][:]
p=file.variables['PS'][:]
q=file.variables['QV'][:]
map = Basemap(projection='merc',llcrnrlon=70.,llcrnrlat=8.,urcrnrlon=90.,urcrnrlat=20.,resolution='i')
map.drawcoastlines()
map.drawstates()
map.drawcountries()
#map.drawlsmask(land_color='Linen', ocean_color='#CCFFFF', resolution ='i') # can use HTML names or codes for colors
#map.drawcounties()
parallels = np.arange(0,50,5.) # make latitude lines every 5 degrees from 30N-50N
meridians = np.arange(-95,-70,5.) # make longitude lines every 5 degrees from 95W to 70W
map.drawparallels(parallels,labels=[1,0,0,0],fontsize=10)
map.drawmeridians(meridians,labels=[0,0,0,1],fontsize=10)
x,y= np.meshgrid(lon-180,lat) # for this dataset, longitude is 0 through 360, so you need to subtract 180 to properly display on map
#x,y = map(lons,lats)
clevs = np.arange(960,1040,4)
cs = map.contour(x,y,p[0,:,:]/100,clevs,colors='blue',linewidths=1.)
plt.savefig('Pressure.png')  # save before show(); after show() the saved figure can be empty
plt.show()
Error
%run "F:/Atmospheric rivers/MERRA/aosc.py"
C:\Users\Pavilion\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\mpl_toolkits\basemap\__init__.py:3505: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
b = ax.ishold()
C:\Users\Pavilion\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\mpl_toolkits\basemap\__init__.py:3570: MatplotlibDeprecationWarning: axes.hold is deprecated.
See the API Changes document (http://matplotlib.org/api/api_changes.html)
for more details.
ax.hold(b)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
F:\Atmospheric rivers\MERRA\aosc.py in <module>()
36 #x,y = map(lons,lats)
37 clevs = np.arange(960,1040,4)
---> 38 cs = map.contour(x,y,p[0,:,:]/100,clevs,colors='blue',linewidths=1.)
39
40
C:\Users\Pavilion\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\mpl_toolkits\basemap\__init__.py in with_transform(self, x, y, data, *args, **kwargs)
519 # convert lat/lon coords to map projection coords.
520 x, y = self(x,y)
--> 521 return plotfunc(self,x,y,data,*args,**kwargs)
522 return with_transform
523
C:\Users\Pavilion\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\mpl_toolkits\basemap\__init__.py in contour(self, x, y, data, *args, **kwargs)
3540 # only do this check for global projections.
3541 if self.projection in _cylproj + _pseudocyl:
-> 3542 xx = x[x.shape[0]/2,:]
3543 condition = (xx >= self.xmin) & (xx <= self.xmax)
3544 xl = xx.compress(condition).tolist()
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
So, in summary the error turns out to be
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
How can I fix this error?
I guess you are not using the correct values for x and y to make the plot. Basemap's contour expects x, y and z all to be 2D arrays. In your case, x and y are most likely vectors of longitude and latitude values. In addition, you should convert the longitude and latitude to map coordinates for the selected projection.
Can you try:
lons,lats = np.meshgrid(lon-180.,lat)
x,y = map(lons,lats)
cs = map.contour(x,y,p[0,:,:]/100,clevs,colors='blue',linewidths=1.)
The second line should convert your longitude and latitude values into the correct x and y values for your selected projection.
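Separately from the projection conversion, it is worth looking at where the traceback actually stops: Basemap's own `contour` runs `xx = x[x.shape[0]/2,:]`. Under Python 3, `/` is true division, so the row index is a float (for example `2.0`), and NumPy rejects float indices with exactly the IndexError shown above; floor division (`//`) gives an integer. The snippet below reproduces this with a plain NumPy array standing in for the projected x grid; it does not use Basemap. If that is the cause here, the fix would lie in the installed Basemap version rather than in the calling code, which is an assumption worth checking against the version in use.

```python
import numpy as np

x = np.zeros((4, 3))  # stands in for the 2-D grid of projected x values

half = x.shape[0] / 2  # true division in Python 3: 2.0, a float
try:
    x[half, :]
except IndexError as exc:
    print("IndexError:", exc)

row = x[x.shape[0] // 2, :]  # floor division yields an int, so indexing works
print(row.shape)  # (3,)
```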

Error thrown from periodic callback: ValueError('Must stream updates to all existing columns (missing: index)',)

I'm trying to figure out how to stream data from a pandas DataFrame. My code looks like this:
def modify_doc(doc):
    df_all = pd.read_csv(data)
    df_all['Date'] = pd.to_datetime(df_all['Date'])
    # startup using most of the data and stream the rest
    df = df_all[0:-1]
    source = ColumnDataSource(df)
    plot = figure(x_axis_type='datetime',
                  y_range=(0, 10000000),
                  y_axis_label='Y Label',
                  title="Title")
    plot.line('Date', 'ALL_EXCL_FUEL', color='blue', alpha=1, source=source)
    plot.line('Date', 'MOSTLY_FOOD', color='lightblue', alpha=1, source=source)
    plot.line('Date', 'NON_SPECIALISED', color='grey', alpha=1, source=source)
    def callback():
        # hardcode update values for now
        source.stream( df[-1:] )
    doc.add_root(plot)
    doc.add_periodic_callback(callback, 50)
bokeh_app = Application(FunctionHandler(modify_doc))
However, this results in the error:
Error thrown from periodic callback: ValueError('Must stream updates to all existing columns (missing: index)',)
I can print out the source.data.keys:
source = ColumnDataSource(df)
print(source.data.keys())
dict_keys(['Date', 'ALL_EXCL_FUEL', 'MOSTLY_FOOD', 'NON_SPECIALISED', 'TEXTILE', 'HOUSEHOLD', 'OTHER', 'NON_STORE', 'index'])
It appears that the ColumnDataSource is using the DataFrame index internally? A few others have also run into this issue: https://github.com/bokeh/bokeh/issues/4797, although the ticket has been closed.
I have included a Minimal, Complete, and Verifiable example below to enable reproducing my issue:
bokeh_server.py
import pandas as pd
from tornado.ioloop import IOLoop
import yaml
from jinja2 import Template
from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider, Div
from bokeh.plotting import figure
from bokeh.server.server import Server
from bokeh.themes import Theme
import os
# if running locally, listen on port 5000
PORT = int(os.getenv('PORT', '5000'))
HOST = "0.0.0.0"
# this is set in the cloud foundry manifest
try:
    ALLOW_WEBSOCKET_ORIGIN = os.getenv("ALLOW_WEBSOCKET_ORIGIN").split(',')
except AttributeError:
    # variable not set: os.getenv returned None
    ALLOW_WEBSOCKET_ORIGIN = [ 'localhost:{0}'.format(PORT) ]
print('ALLOW_WEBSOCKET_ORIGIN', ALLOW_WEBSOCKET_ORIGIN)
io_loop = IOLoop.current()
import io
data = io.StringIO("""Date,ALL_EXCL_FUEL,MOSTLY_FOOD,NON_SPECIALISED,TEXTILE,HOUSEHOLD,OTHER,NON_STORE
1986 Jan,1883154,747432,163708,267774,261453,281699,161088
1986 Feb,1819796,773161,152656,223836,246502,275121,148519
1986 Mar,1912582,797104,169440,251438,249614,292348,152638
1986 Apr,1974419,809334,170540,275975,260086,299271,159213
1986 May,1948915,800193,170173,274979,251175,297655,154740
1986 Jun,2019114,821785,178366,295463,251507,311447,160546
1986 Jul,2051539,816033,184812,297969,269786,323187,159752
1986 Aug,2011746,804386,180911,297138,263427,310220,155665
1986 Sep,2046678,792943,181055,305350,280640,318368,168322
1986 Oct,2110669,810147,187728,308919,298637,325617,179621
1986 Nov,2315710,847794,231599,352009,332079,358077,194152
1986 Dec,2830206,970987,319570,490001,373714,469399,206536
1987 Jan,2032021,798562,172215,288186,288534,307900,176624
1987 Feb,1980748,805713,165682,247219,282836,313577,165721
1987 Mar,2009717,816051,174034,256756,280207,315562,167106
1987 Apr,2156967,862749,189729,308543,284440,336755,174751
1987 May,2075808,834375,175464,287515,280404,330093,167957
1987 Jun,2137092,844051,183014,304706,286522,345149,173651
1987 Jul,2208377,847098,198848,330804,301537,356037,174054
1987 Aug,2193689,854672,186160,317375,304843,356241,174399
1987 Sep,2177927,825398,188343,317164,314681,350923,181418
1987 Oct,2281593,850022,202862,340464,334112,355424,198710
1987 Nov,2506843,892292,248366,381103,371953,397845,215285
1987 Dec,3075829,1028966,346378,533443,422524,519848,224669
1988 Jan,2267165,845068,193734,316077,354371,364295,193620
1988 Feb,2164201,864420,178627,267003,324824,351326,178001
1988 Mar,2227296,893751,192979,283258,319268,356518,181522
1988 Apr,2309954,899831,195328,312896,330680,379170,192049
1988 May,2321889,904736,193670,322577,325868,385344,189694
1988 Jun,2331091,900316,199227,330852,323326,387613,189757
1988 Jul,2443590,907775,212694,356501,363880,406913,195827
1988 Aug,2410116,913793,204410,339444,355879,405094,191497
""")
def modify_doc(doc):
    df_all = pd.read_csv(data)
    df_all['Date'] = pd.to_datetime(df_all['Date'])
    df = df_all[0:-1]
    source = ColumnDataSource(df)
    plot = figure(x_axis_type='datetime',
                  y_range=(0, 10000000),
                  y_axis_label='Y Label',
                  title="Title")
    plot.line('Date', 'ALL_EXCL_FUEL', color='blue', alpha=1, source=source)
    plot.line('Date', 'MOSTLY_FOOD', color='lightblue', alpha=1, source=source)
    plot.line('Date', 'NON_SPECIALISED', color='grey', alpha=1, source=source)
    def callback():
        # hardcode update values for now
        source.stream( df[-1:] )
    doc.add_root(plot)
    doc.add_periodic_callback(callback, 50)
bokeh_app = Application(FunctionHandler(modify_doc))
server = Server(
{'/': bokeh_app},
io_loop=io_loop,
allow_websocket_origin=ALLOW_WEBSOCKET_ORIGIN,
**{'port': PORT, 'address': HOST}
)
server.start()
if __name__ == '__main__':
    io_loop.add_callback(server.show, "/")
    io_loop.start()
Running
python bokeh_server.py
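The solution for me was to convert the DataFrame to a dict of lists before building the source, so that ColumnDataSource does not add the extra index column, and to stream dicts built the same way: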
def modify_doc(doc):
    df_all = ...
    start_data = df_all.to_dict(orient='list')
    source = ColumnDataSource(data=start_data)
    ...
    def callback():
        ...
        new_data = df_new.to_dict(orient='list')
        source.stream( new_data )
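A small check of the difference, assuming a recent Bokeh version: a ColumnDataSource built directly from a DataFrame gets an extra `index` column (the same `index` key printed in the question), while one built from `to_dict(orient='list')` holds only the DataFrame's own columns, so a dict streamed from the same DataFrame layout has exactly matching keys. The three rows are taken from the question's data, with the dates written in ISO form.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Three rows in the shape of the question's data (values copied from it)
df_all = pd.DataFrame({
    "Date": pd.to_datetime(["1986-01-01", "1986-02-01", "1986-03-01"]),
    "ALL_EXCL_FUEL": [1883154, 1819796, 1912582],
    "MOSTLY_FOOD": [747432, 773161, 797104],
})

# Built straight from the DataFrame, the source gains an extra 'index' column
print(ColumnDataSource(df_all).column_names)

# Built from a dict of lists, it holds only the DataFrame's own columns
source = ColumnDataSource(data=df_all.iloc[:-1].to_dict(orient="list"))
new_data = df_all.iloc[-1:].to_dict(orient="list")
source.stream(new_data)  # keys match exactly, so there is no "missing: index" error
print(len(source.data["Date"]))
```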

How to make `Heatmaps` in `Bokeh` with a continuous color map, using Python 3?

I was trying to replicate this style of HeatMap that maps continuous values to a LinearColorMapper instance: http://docs.bokeh.org/en/latest/docs/gallery/unemployment.html
I wanted to make a HeatMap (w/ either charts or rect) and then add a single selection widget to select the obsv_id and then a slider widget to go through the dates.
However, I was having trouble in the beginning with the HeatMap itself with a single obsv_id/date pair. What am I doing wrong in creating this HeatMap? This would essentially be a 3x3 rectangle plot of the size variable and the loc variable.
Bonus: Can you help me/give some advice on how to wire the output of these widgets to control the plot?
I saw these posts but all of the examples use actual hex colors as a list instead of mapping using a continuous measure:
python bokeh, how to make a correlation plot? http://docs.bokeh.org/en/latest/docs/gallery/categorical.html
# Init
import numpy as np
import pandas as pd
from bokeh.plotting import figure, output_notebook, output_file, reset_output, show, ColumnDataSource
from bokeh.models import LinearColorMapper
reset_output()
output_notebook()
np.random.seed(0)
# Coords
dates = ["07-3","07-11","08-6","08-28"]
#locs = ["air","water","earth"]
locs = [0,1,2]
size = [3.0, 0.2, 0.025]
observations = ["obsv_%d"%_ for _ in range(10)]
# Data
Ar_tmp = np.zeros(( len(dates)*len(locs)*len(size)*len(observations), 5 ), dtype=object)
i = 0
for date in dates:
    for loc in locs:
        for s in size:
            for obsv_id in observations:
                Ar_tmp[i,:] = np.array([obsv_id, date, loc, s, np.random.random()])
                i += 1
DF_tmp = pd.DataFrame(Ar_tmp, columns=["obsv_id", "date", "loc", "size", "value"])
DF_tmp["value"] = DF_tmp["value"].astype(float)
DF_tmp["size"] = DF_tmp["size"].astype(float)
DF_tmp["loc"] = DF_tmp["loc"].astype(float)
# obsv_id date loc size value
# 0 obsv_0 07-3 air 3.0 0.548814
# 1 obsv_1 07-3 air 3.0 0.715189
# 2 obsv_2 07-3 air 3.0 0.602763
# 3 obsv_3 07-3 air 3.0 0.544883
# 4 obsv_4 07-3 air 3.0 0.423655
mapper = LinearColorMapper(low = DF_tmp["value"].min(), high = DF_tmp["value"].max())
# # Create Heatmap of a single observation and date pair
query_idx = set(DF_tmp.index[DF_tmp["obsv_id"] == "obsv_0"]) & set(DF_tmp.index[DF_tmp["date"] == "08-28"])
# p = HeatMap(data=DF_tmp.loc[query_idx,:], x="loc", y="size", values="value")
p = figure()
p.rect(x="loc", y="size",
source=ColumnDataSource(DF_tmp.loc[query_idx,:]),
fill_color={'field': 'value', 'transform': mapper},
line_color=None)
show(p)
My Error:
# Javascript error adding output!
# TypeError: Cannot read property 'length' of null
# See your browser Javascript console for more details.
You have to provide a palette to LinearColorMapper. For example:
mapper = LinearColorMapper(
palette='Magma256',
low=DF_tmp["value"].min(),
high=DF_tmp["value"].max()
)
From the LinearColorMapper doc:
class LinearColorMapper(palette=None, **kwargs)
Map numbers in a range [low, high] linearly into a sequence of colors (a palette).
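To show why the palette matters, here is a simplified, pure-Python illustration of linear colour mapping: the range [low, high] is split into as many equal bins as the palette has colours, and each value takes the colour of its bin. This is not Bokeh's actual implementation (the mapping runs in BokehJS in the browser), and the function name `linear_color` is hypothetical. With `palette=None` there is nothing to index, which is consistent with the `Cannot read property 'length' of null` error in the question.

```python
def linear_color(value, low, high, palette):
    """Pick a palette entry by splitting [low, high] into len(palette) equal bins."""
    if not palette:
        raise ValueError("a non-empty palette is required")
    n = len(palette)
    if value >= high:
        return palette[-1]  # the top of the range maps to the last colour
    if value <= low:
        return palette[0]
    idx = int((value - low) / (high - low) * n)
    return palette[min(idx, n - 1)]


palette = ["#000000", "#555555", "#aaaaaa", "#ffffff"]
print(linear_color(0.10, 0.0, 1.0, palette))  # 0.1 * 4 = 0.4 -> bin 0 -> #000000
print(linear_color(0.60, 0.0, 1.0, palette))  # 0.6 * 4 = 2.4 -> bin 2 -> #aaaaaa
print(linear_color(1.00, 0.0, 1.0, palette))  # high -> #ffffff
```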
Not related to your exception, but you'll also need to pass width and height parameters to p.rect().
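Putting both points together, here is a minimal sketch of the single observation/date heatmap, assuming a recent Bokeh release. It regenerates data in the same shape as the question (with `itertools.product` instead of the nested loops), passes the `Magma256` palette explicitly, selects rows with a boolean mask rather than a set of index labels (recent pandas versions no longer accept sets as indexers), and gives every rectangle `width=1` and `height=1`. Treating the unevenly spaced size values as categorical y factors is my assumption, so that each size gets an equal-height row; the title and tooltip are illustrative.

```python
import itertools

import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, LinearColorMapper
from bokeh.palettes import Magma256
from bokeh.plotting import figure

np.random.seed(0)
dates = ["07-3", "07-11", "08-6", "08-28"]
locs = [0, 1, 2]
sizes = [3.0, 0.2, 0.025]
observations = ["obsv_%d" % n for n in range(10)]

# One row per (date, loc, size, obsv_id) combination, in the question's loop order
rows = [(obsv_id, date, loc, s, np.random.random())
        for date, loc, s, obsv_id in itertools.product(dates, locs, sizes, observations)]
DF_tmp = pd.DataFrame(rows, columns=["obsv_id", "date", "loc", "size", "value"])

# The palette is what was missing; low/high span the full value range
mapper = LinearColorMapper(palette=Magma256,
                           low=DF_tmp["value"].min(),
                           high=DF_tmp["value"].max())

# Boolean mask instead of a set of index labels
sub = DF_tmp[(DF_tmp["obsv_id"] == "obsv_0") & (DF_tmp["date"] == "08-28")].copy()
# Sizes as string categories so every row of cells has the same height
sub["size_label"] = sub["size"].astype(str)
size_factors = [str(s) for s in sorted(sizes)]

p = figure(x_range=(-0.5, 2.5), y_range=size_factors,
           title="obsv_0 on 08-28", tooltips=[("value", "@value")])
p.rect(x="loc", y="size_label", width=1, height=1,
       source=ColumnDataSource(sub),
       fill_color={"field": "value", "transform": mapper},
       line_color=None)
# show(p)
```

In a notebook, `output_notebook()` followed by `show(p)` should display the 3x3 grid.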
