Purpose of ColumnDataSource in Bokeh

I am new to Bokeh and trying to figure out what ColumnDataSource does. It appears in many places, but I am uncertain of its purpose and how it works. Can someone illuminate? Apologies if this is a silly question...

ColumnDataSource is the object in which the data of a Bokeh plot is stored. You can choose not to use a ColumnDataSource and feed your plot directly with Python dictionaries, pandas DataFrames, etc., but certain features require one. For example, to have a popup window show extra data when the user hovers the mouse over a glyph, you must use a ColumnDataSource, otherwise the popup cannot look up the data. Streaming data is another common use.
You can create a ColumnDataSource from a dictionary or a pandas DataFrame and then use it to create the glyphs.
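As a minimal sketch of the two construction routes mentioned above (the column names x and y are made up for illustration), the following builds one source from a dictionary and one from a DataFrame, then inspects which columns each one holds:

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Route 1: a plain dictionary mapping column names to equal-length sequences
source_from_dict = ColumnDataSource(data={"x": [1, 2, 3], "y": [4, 5, 6]})

# Route 2: a pandas DataFrame; each DataFrame column becomes a source column,
# and the DataFrame index is added as an extra column as well
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
source_from_df = ColumnDataSource(df)

# Inspect the column names held by each source
print(source_from_dict.column_names)
print(sorted(source_from_df.data.keys()))
```

Glyph methods then refer to these columns by name, for example x="x", y="y", source=source_from_df, which is also what lets a hover tooltip look up other columns of the same row.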

This should work:
import pandas as pd
import bokeh.plotting as bp
from bokeh.models import HoverTool, DatetimeTickFormatter
# Create the base data
data_dict = {"Dates":["2017-03-01",
"2017-03-02",
"2017-03-03",
"2017-03-04",
"2017-03-05",
"2017-03-06"],
"Prices":[1, 2, 1, 2, 1, 2]}
# Turn it into a dataframe
data = pd.DataFrame(data_dict, columns = ['Dates', 'Prices'])
# Convert the date column to the dateformat, and create a ToolTipDates column
data['Dates'] = pd.to_datetime(data['Dates'])
data['ToolTipDates'] = data.Dates.map(lambda x: x.strftime("%b %d")) # Saves work with the tooltip later
# Create a ColumnDataSource object
mySource = bp.ColumnDataSource(data)
# Create your plot as a bokeh.figure object
myPlot = bp.figure(height = 600,
width = 800,
x_axis_type = 'datetime',
title = 'ColumnDataSource',
y_range=(0,3))
# Format your x-axis as datetime.
myPlot.xaxis[0].formatter = DatetimeTickFormatter(days='%b %d')
# Draw the plot on your plot object, identifying the source as your Column Data Source object.
myPlot.circle("Dates",
"Prices",
source=mySource,
color='red',
size = 25)
# Add your tooltips
myPlot.add_tools( HoverTool(tooltips= [("Dates","@ToolTipDates"),
("Prices","@Prices")]))
# Create an output file
bp.output_file('columnDataSource.html', title = 'ColumnDataSource')
bp.show(myPlot) # et voilà.

Related

How to split date to separate dateonly and time only columns but formatting data to datetime instead of object to be able to group sales by week?

I have a data set that I am trying to tidy up so that I can group it by day and then group the sales data by week, so that I can use linear regression.
The issue I'm having is that when I split the column, the resulting columns have dtype object. How can I fix this?
Any help would really be appreciated.
My code so far is
import pandas as pd
import numpy as np
import io
import matplotlib.pyplot as plt
from sklearn import linear_model
from google.colab import files
date = pd.to_datetime("8th of March, 2022")
uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['sales.csv']))
df.head()
Below is a sample of the file I'm uploading:
index,Till,Date,Receipt,Account,StaffMember,Value,Discount,Lines
0,22,03/01/2022 09:37:01,684629,NaN,1,-35.7,0.0,1
1,21,03/01/2022 09:46:01,682593,NaN,1,45.5,0.0,2
2,22,03/01/2022 09:48:01,684630,NaN,1,7.0,0.0,1
3,22,03/01/2022 09:50:01,684631,NaN,2,65.92,0.0,1
4,22,03/01/2022 10:01:01,684632,NaN,1,35.7,0.0,1
df[['DateOnly', 'TimeOnly']] = df['Date'].str.split(' ', n=1, expand=True)
df.head()
When I check df.dtypes, DateOnly and TimeOnly are set as object.
pd.date_range('2022-01-01', '2022-03-26', freq='W')
I can group by week, but I cannot seem to get the value along with it, which is what I need.
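One possible approach, sketched here under assumptions rather than taken from an answer: parse the Date column into a real datetime first, derive the date and time parts from it with the .dt accessor, and sum Value per week with resample. The sketch assumes the dates are day-first (03/01/2022 meaning 3 January 2022) and reads the sample rows from an inline string instead of the uploaded file.

```python
import io
import pandas as pd

# Sample rows copied from the question, read from a string for illustration
csv_text = """index,Till,Date,Receipt,Account,StaffMember,Value,Discount,Lines
0,22,03/01/2022 09:37:01,684629,NaN,1,-35.7,0.0,1
1,21,03/01/2022 09:46:01,682593,NaN,1,45.5,0.0,2
2,22,03/01/2022 09:48:01,684630,NaN,1,7.0,0.0,1
3,22,03/01/2022 09:50:01,684631,NaN,2,65.92,0.0,1
4,22,03/01/2022 10:01:01,684632,NaN,1,35.7,0.0,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: day/month/year order. Adjust the format string if it is month-first.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M:%S")

# DateOnly keeps a datetime dtype (time set to midnight), so it can still be grouped
df["DateOnly"] = df["Date"].dt.normalize()
# TimeOnly holds datetime.time objects, so its dtype is object by design
df["TimeOnly"] = df["Date"].dt.time

# Total sales value per calendar week
weekly_sales = df.resample("W", on="Date")["Value"].sum()
print(weekly_sales)
```

The weekly totals come back as a Series indexed by week-end date, which can be turned into a two-column frame with weekly_sales.reset_index() before fitting a regression.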

Extract the max and min values from a ColumnDataSource column

I'm doing a Bokeh application in which I have an input table, some calculations are performed on it and it produces a new table. I'm trying to plot a heatmap of this new table, so I have to create a colorbar using the LinearColorMapper function, however I can't use the min and max values from the calculated table (which is a ColumnDataSource), this is how the table is stored:
def val_portafolio_mostrar():
    val_portafolio=datos_calcular()
    val_mapa=pd.DataFrame(val_portafolio.stack(), columns=['valoracion']).reset_index()
    datos_heatmap.data=dict(val_mapa)
The values which are going to be plotted on the heatmap are in the 'valoracion' column from datos_heatmap, this is the code I'm using for the LinearColorMapper
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]
mapper = LinearColorMapper(palette=colors, low=min(datos_heatmap.data['valoracion']),
high=max(datos_heatmap.data['valoracion']))
however I'm getting the following error:
in mapper_fun
mapper = LinearColorMapper(palette=colors, low=min(datos_heatmap.data['valoracion']),
ValueError: min() arg is an empty sequence
I think it's because in order to access a ColumnDataSource value, the function needs to have the "source" parameter, however the LinearColorMapper function does not have this parameter so it's not possible to solve it this way. I also tried to store the max and min values in another ColumnDataSource but I get the same error because I'm not using a source rather just extracting the values as in "datos_heatmap.data['valoracion']"
Thanks in advance!
I guess the problem is that your datos_heatmap.data['valoracion'] really is an empty sequence. Please provide further information here with a minimal example, and try printing your datos_heatmap.data['valoracion'] right before you pass it to the LinearColorMapper. If you work with Python functions, make sure that your definitions are correct within the namespaces you want them to exist in.
Check the following example, that works the same way as you want it to:
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, LinearColorMapper
from bokeh.layouts import layout
plot1 = figure(plot_width=1000, plot_height=250)
df = pd.DataFrame({"ID":[0, 1, 2, 3],
"Value1":[0, 100, 200, 300]})
source = ColumnDataSource(df)
cmap = LinearColorMapper(palette="Turbo256", low = min(source.data["Value1"]), high = max(source.data["Value1"]))
print(f"high value cmap: {cmap.high}")
print(f"low value cmap: {cmap.low}")
circle = plot1.circle(x='ID', y='Value1', source=source, size=30,
fill_color={"field":'Value1', "transform":cmap})
show(plot1)
This produces a plot of four circles whose fill color is mapped from Value1.
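If the source is still empty when the mapper is created and only gets filled later (as it would be when a function such as val_portafolio_mostrar runs after the mapper is built), one option is to create the mapper with placeholder limits and set low and high once the data is in place. The following is a minimal sketch of that idea; update_mapper_range is a hypothetical helper, and the placeholder limits 0 and 1 and the sample values are made up.

```python
from bokeh.models import ColumnDataSource, LinearColorMapper

colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
          "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]

# The source starts out empty, so min()/max() cannot be called on it yet
source = ColumnDataSource(data={"valoracion": []})
mapper = LinearColorMapper(palette=colors, low=0, high=1)  # placeholder limits

def update_mapper_range(source, mapper, column):
    """Hypothetical helper: copy the column's min/max onto the mapper."""
    values = source.data[column]
    if len(values) == 0:
        return  # nothing to compute yet, keep the current limits
    mapper.low = min(values)
    mapper.high = max(values)

# Later, once the calculated table is available:
source.data = {"valoracion": [3.0, 7.5, 1.2]}
update_mapper_range(source, mapper, "valoracion")
print(mapper.low, mapper.high)
```

Calling the helper at the end of the function that fills the source keeps the color range in step with the data each time the table is recalculated.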

Plot all columns from dataframe without having to define them in Plotly

I would like to plot in Plotly all columns from dataframe without having to define them.
What I need is the same functionality in Plotly as in this matplotlib example:
import glob
import pandas as pd
df = pd.DataFrame({
'A': ['15','21','30'],
'M': ['12','24','31'],
'I': ['28','32','10']})
%matplotlib inline
from matplotlib import pyplot as plt
df=df.astype(float)
df.plot()
Here is my code for Plotly, but as I said, I have no idea how to plot all the columns automatically. I have also noticed that in Plotly the x-axis needs to be defined, but I can live with this restriction.
import plotly.express as px
import pandas as pd
import numpy as np
import os
# data
df = pd.DataFrame({
'ID': ['1','2','3'],
'A': ['15','21','30'],
'M': ['12','24','31'],
'I': ['28','32','10']})
df_long=pd.melt(df , id_vars=['ID'], value_vars=['A', 'M' , 'I'])
fig = px.line(df_long, x='ID', y='value', color='variable')
fig.show()
How can I define how to plot in Plotly all the columns automatically?
Okay, I have found the solution to my problem:
df_long=pd.melt(df , id_vars=['ID'])
instead of:
df_long=pd.melt(df , id_vars=['ID'], value_vars=['A', 'M' , 'I'])
The pandas documentation describes value_vars as follows: Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
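A small sketch of what the shorter call produces, using the question's own frame. The conversion to float is an addition for illustration (the question stores the numbers as strings), and the Plotly call is left as a comment so the snippet only needs pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['1', '2', '3'],
    'A': ['15', '21', '30'],
    'M': ['12', '24', '31'],
    'I': ['28', '32', '10']})

# Convert every non-ID column to numbers so the y-axis is numeric
value_columns = [c for c in df.columns if c != 'ID']
df[value_columns] = df[value_columns].astype(float)

# Without value_vars, every column not listed in id_vars is unpivoted
df_long = pd.melt(df, id_vars=['ID'])
print(df_long)

# The long frame can then be passed to Plotly as in the question:
# fig = px.line(df_long, x='ID', y='value', color='variable')
```

Adding another column to df later needs no change to the melt call, since the new column is picked up automatically.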

Can a Bokeh ColumnDataSource be plotted in one go like a Pandas DataFrame?

Can I plot a bokeh ColumnDataSource in one go, similar to plotting a Pandas DataFrame (e.g., second figure on the Pandas plotting documentation); especially if the ColumnDatasource is derived from a DataFrame? Or is the loss of e.g. the index preventing this?
So, with Pandas I can do:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
years = np.arange(2000, 2020)
columns = ['first', 'second', 'third', 'fourth']
df = pd.DataFrame(data=np.random.random((20, 4)), columns=columns, index=years)
df.plot()
plt.show()
But I'd like some interactivity on that figure; in particular, hovering over the lines should show the y-value and the respective label name.
With Bokeh, I currently use:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category20
output_notebook()
p = figure(plot_width=800, plot_height=600)
for column, color in zip(columns, Category20[4]):
    line = p.line(df.index, df[column], color=color, legend=column, name=column)
    p.add_tools(HoverTool(renderers=[line], tooltips=[(column, "$y")], toggleable=False))
p.legend.background_fill_alpha = 0.5
show(p)
The separate p.add_tools(...) calls are so that the individual dataset labels and y-values show up when hovered over the corresponding line.
But the manual loop feels a tad unwieldy. I wonder if there is something available along the following lines:
source = ColumnDataSource(df)
p.multi_line(source=source, color=Category20[4])
# or something similar to: p.multi_line(source=source, xs='index', ys=models)
where the index is automatically used for the x-axis, the columns are looped over, and the legend labels are derived from the column names.
The best alternative I've found appears to be
source = ColumnDataSource(dict(
x=[df.index]*len(columns),
y=[df[column].values for column in columns],
color=Category20[4],
legend=columns))
p.multi_line(source=source, xs='x', ys='y', color='color')
show(p)
but that again feels unwieldy in creating the ColumnDatasource. Plus, I don't know how to create tooltips for each line individually: I guess multi-line isn't supposed to be used for lines that should be regarded individually.
Is there a more direct, easier, way?
My previous comment did not come out at all like what I was trying to type.
I was trying to suggest:
p.add_tools(HoverTool( tooltips=[ ("X" ,"@x"), ("Y", "@y") ]))
Your ColumnDataSource construction seems fairly concise to me already.
As to your question in the comment, would adding
p.add_tools(HoverTool( tooltips=[ ("X" ,"$x"), ("Y", "$y"), ("Name" , "@legend")]))
do the trick?
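If building the source by hand feels unwieldy, the construction can be wrapped in a small function that works for any DataFrame. This is only a sketch: multi_line_source is a hypothetical helper, not part of Bokeh, and the column names xs, ys, color and legend are arbitrary choices.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category20

def multi_line_source(df, palette):
    """Hypothetical helper: one row per DataFrame column, for use with multi_line."""
    columns = list(df.columns)
    return ColumnDataSource(data={
        "xs": [df.index.values] * len(columns),      # the index is shared by every line
        "ys": [df[column].values for column in columns],
        "color": list(palette[:len(columns)]),
        "legend": columns,                           # column name, usable as "@legend"
    })

years = np.arange(2000, 2020)
columns = ['first', 'second', 'third', 'fourth']
df = pd.DataFrame(data=np.random.random((20, 4)), columns=columns, index=years)

source = multi_line_source(df, Category20[4])
print(source.data["legend"])
```

The result can be passed to p.multi_line(xs='xs', ys='ys', color='color', source=source), and a single HoverTool with ("Name", "@legend") and ("Y", "$y") then reports the hovered line's column name.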

How to display GroupBy Count as Bokeh vbar for categorical data

I have a small issue creating a Bokeh vbar in 0.13.0
from a dataframe groupby count operation. The response here was for a multi-level groupby, whereas mine isn't.
Updates since posting
added sample data and code based on provided answer to see if issue is my code or something else
Outline
The pandas dataframe contains survey responses
Excellent
Good
Poor
Satisfactory
Very Good
under columns ('ResponseID','RateGeneral','RateAccomodation','RateClean','RateServices') and the dtype has been set as category. I want to display a Bokeh vbar of the response count per group using
DemoDFCount = DemoDF.groupby('RateGeneral').count()
My bokeh code looks like this
pTest= figure(title='Rating in General',plot_height=350)
pTest.vbar(width=0.9,source=DemoDFCount, x='RateGeneral',top='ResponseID')
show(pTest)
but it doesn't produce any chart, only a title and toolbar.
If I use pandas DemoDFCount.plot.bar(legend=False) I can plot something, but how do I create this chart in Bokeh?
Sample data as json export
50 rows of sample data from DemoDF.to_json()
'{"ResponseID":{"0":1,"1":2,"2":3,"3":4,"4":5,"5":6,"6":7,"7":8,"8":9,"9":10,"10":11,"11":12,"12":13,"13":14,"14":15,"15":16,"16":17,"17":18,"18":19,"19":20,"20":21,"21":22,"22":23,"23":24,"24":25,"25":26,"26":27,"27":28,"28":29,"29":30,"30":31,"31":32,"32":33,"33":34,"34":35,"35":36,"36":37,"37":38,"38":39,"39":40,"40":41,"41":42,"42":43,"43":44,"44":45,"45":46,"46":47,"47":48,"48":49,"49":50},"RateGeneral":{"0":"Good","1":"Satisfactory","2":"Good","3":"Poor","4":"Good","5":"Satisfactory","6":"Excellent","7":"Good","8":"Good","9":"Satisfactory","10":"Satisfactory","11":"Excellent","12":"Satisfactory","13":"Excellent","14":"Satisfactory","15":"Very Good","16":"Satisfactory","17":"Excellent","18":"Very Good","19":"Excellent","20":"Satisfactory","21":"Good","22":"Satisfactory","23":"Excellent","24":"Satisfactory","25":"Good","26":"Excellent","27":"Very Good","28":"Good","29":"Very Good","30":"Good","31":"Satisfactory","32":"Very Good","33":"Very Good","34":"Very Good","35":"Good","36":"Excellent","37":"Satisfactory","38":"Excellent","39":"Good","40":"Good","41":"Satisfactory","42":"Very Good","43":"Very Good","44":"Poor","45":"Excellent","46":"Good","47":"Excellent","48":"Satisfactory","49":"Good"},"RateAccomodation":{"0":"Very Good","1":"Excellent","2":"Satisfactory","3":"Satisfactory","4":"Good","5":"Good","6":"Very Good","7":"Very Good","8":"Good","9":"Satisfactory","10":"Satisfactory","11":"Excellent","12":"Satisfactory","13":"Excellent","14":"Good","15":"Very Good","16":"Good","17":"Excellent","18":"Excellent","19":"Very Good","20":"Good","21":"Satisfactory","22":"Good","23":"Excellent","24":"Satisfactory","25":"Very Good","26":"Excellent","27":"Excellent","28":"Good","29":"Very Good","30":"Very Good","31":"Very Good","32":"Excellent","33":"Very Good","34":"Very Good","35":"Very Good","36":"Excellent","37":"Satisfactory","38":"Excellent","39":"Good","40":"Excellent","41":"Poor","42":"Very Good","43":"Very 
Good","44":"Poor","45":"Excellent","46":"Satisfactory","47":"Excellent","48":"Good","49":"Good"},"RateClean":{"0":"Excellent","1":"Excellent","2":"Satisfactory","3":"Good","4":"Excellent","5":"Very Good","6":"Very Good","7":"Excellent","8":"Excellent","9":"Satisfactory","10":"Satisfactory","11":"Excellent","12":"Good","13":"Good","14":"Excellent","15":"Excellent","16":"Good","17":"Excellent","18":"Excellent","19":"Excellent","20":"Good","21":"Very Good","22":"Poor","23":"Very Good","24":"Satisfactory","25":"Very Good","26":"Excellent","27":"Good","28":"Poor","29":"Good","30":"Excellent","31":"Good","32":"Good","33":"Very Good","34":"Satisfactory","35":"Good","36":"Excellent","37":"Satisfactory","38":"Excellent","39":"Good","40":"Very Good","41":"Satisfactory","42":"Excellent","43":"Excellent","44":"Very Good","45":"Excellent","46":"Good","47":"Excellent","48":"Good","49":"Excellent"},"RateServices":{"0":"Very Good","1":"Excellent","2":"Good","3":"Good","4":"Excellent","5":"Good","6":"Good","7":"Very Good","8":"Good","9":"Satisfactory","10":"Satisfactory","11":"Excellent","12":"Good","13":"Very Good","14":"Good","15":"Excellent","16":"Poor","17":"Excellent","18":"Excellent","19":"Excellent","20":"Good","21":"Good","22":"Very Good","23":"Excellent","24":"Satisfactory","25":"Very Good","26":"Excellent","27":"Very Good","28":"Good","29":"Excellent","30":"Very Good","31":"Excellent","32":"Good","33":"Excellent","34":"Very Good","35":"Very Good","36":"Excellent","37":"Satisfactory","38":"Excellent","39":"Good","40":"Very Good","41":"Satisfactory","42":"Excellent","43":"Excellent","44":"Good","45":"Excellent","46":"Very Good","47":"Excellent","48":"Good","49":"Very Good"}}'
The fact that it is multi-level in the other question is not really relevant. When you use a Pandas GroupBy as a data source for Bokeh, Bokeh uses the results of group.describe (which includes counts for each column per group) as the contents of the data source. Here is a complete example that shows Counts-per-Origin from the "cars" data set:
from bokeh.io import show, output_file
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg as df
output_file("groupby.html")
df.origin = df.origin.astype(str)
group = df.groupby('origin')
p = figure(plot_height=350, x_range=group, title="Count by Origin",
toolbar_location=None, tools="")
# using yr_count, but count for any column would work
p.vbar(x='origin', top='yr_count', width=0.8, source=group)
p.y_range.start = 0
p.xgrid.grid_line_color = None
show(p)
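Applied to the survey data from the question, the same idea might look like the sketch below. It uses only the first six sample rows and two of the columns, keeps RateGeneral as plain strings rather than a categorical dtype, and relies on the count column being exposed under the name ResponseID_count, following the yr_count naming in the example above.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# First six rows of the question's sample data, reduced to two columns
demo = pd.DataFrame({
    "ResponseID": [1, 2, 3, 4, 5, 6],
    "RateGeneral": ["Good", "Satisfactory", "Good", "Poor", "Good", "Satisfactory"],
})

# Passing the GroupBy object lets Bokeh build the per-group summary columns
group = demo.groupby("RateGeneral")
source = ColumnDataSource(group)

# Categorical x-axis: the factors are the group labels
factors = list(source.data["RateGeneral"])
p = figure(x_range=factors, title="Rating in General")
p.vbar(x="RateGeneral", top="ResponseID_count", width=0.9, source=source)
p.y_range.start = 0

print(sorted(source.data.keys()))
```

The likely reason the original attempt shows an empty chart is that the figure has no categorical x_range and the plain count() result has no column that matches what the glyph expects. Printing the source's keys, as above, is a quick way to check which column names are available.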
