I have a dataset that I am trying to tidy up: group it by day and then group the sales data by week, so that I can use linear regression.
The issue I'm having is that when I split the column, the new columns are of dtype object. How can I fix this?
Any help would really be appreciated.
My code so far is:
```
import pandas as pd
import numpy as np
import io
import matplotlib.pyplot as plt
from sklearn import linear_model
from google.colab import files

uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['sales.csv']))
df.head()
```
Below is a sample of the file I'm uploading:
```
index,Till,Date,Receipt,Account,StaffMember,Value,Discount,Lines
0,22,03/01/2022 09:37:01,684629,NaN,1,-35.7,0.0,1
1,21,03/01/2022 09:46:01,682593,NaN,1,45.5,0.0,2
2,22,03/01/2022 09:48:01,684630,NaN,1,7.0,0.0,1
3,22,03/01/2022 09:50:01,684631,NaN,2,65.92,0.0,1
4,22,03/01/2022 10:01:01,684632,NaN,1,35.7,0.0,1
```
```
# n must be passed by keyword in pandas 2.x
df[['DateOnly', 'TimeOnly']] = df['Date'].str.split(' ', n=1, expand=True)
df.head()
```
When I check df.dtypes, DateOnly and TimeOnly are both of dtype object.
```
pd.date_range('2022-01-01', '2022-03-26', freq='W')
```
I can group by week, but I cannot seem to get the sales values along with it, which is what I need.
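A minimal sketch of one way to avoid the object columns, assuming the dates are day-first (03/01/2022 = 3 January 2022, which fits the date_range starting in January): parse Date once with pd.to_datetime, derive the date and time parts from the parsed column, and let pandas group Value by week so the totals stay attached to the dates. The csv_text string only reproduces the sample rows above in place of the uploaded file.

```python
import io
import pandas as pd

# Sample rows copied from the question (stand-in for the uploaded sales.csv)
csv_text = """index,Till,Date,Receipt,Account,StaffMember,Value,Discount,Lines
0,22,03/01/2022 09:37:01,684629,NaN,1,-35.7,0.0,1
1,21,03/01/2022 09:46:01,682593,NaN,1,45.5,0.0,2
2,22,03/01/2022 09:48:01,684630,NaN,1,7.0,0.0,1
3,22,03/01/2022 09:50:01,684631,NaN,2,65.92,0.0,1
4,22,03/01/2022 10:01:01,684632,NaN,1,35.7,0.0,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: day-first dates, so 03/01/2022 is 3 January 2022
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M:%S")

# Derive the parts from the parsed column instead of splitting strings
df["DateOnly"] = df["Date"].dt.normalize()    # datetime64, time set to midnight
df["TimeOnly"] = df["Date"] - df["DateOnly"]  # timedelta64 since midnight

# Daily and weekly totals of Value ("W" = weeks ending on Sunday)
daily = df.groupby("DateOnly")["Value"].sum()
weekly = df.groupby(pd.Grouper(key="Date", freq="W"))["Value"].sum()

# Align to the same weekly calendar as the date_range; weeks without sales become 0
weeks = pd.date_range("2022-01-01", "2022-03-26", freq="W")
weekly = weekly.reindex(weeks, fill_value=0)

print(df.dtypes)
print(weekly)
```

df["Date"].dt.date would also drop the time, but it returns Python date objects, so that column would show as object again; dt.normalize() keeps a datetime64 dtype.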
I'm building a Bokeh application with an input table; some calculations are performed on it and it produces a new table. I'm trying to plot a heatmap of this new table, so I need a color bar created with LinearColorMapper. However, I can't use the min and max values from the calculated table (which is a ColumnDataSource). This is how the table is stored:
```
def val_portafolio_mostrar():
    val_portafolio = datos_calcular()
    val_mapa = pd.DataFrame(val_portafolio.stack(), columns=['valoracion']).reset_index()
    datos_heatmap.data = dict(val_mapa)
```
The values to be plotted on the heatmap are in the 'valoracion' column of datos_heatmap. This is the code I'm using for the LinearColorMapper:
```
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]
mapper = LinearColorMapper(palette=colors, low=min(datos_heatmap.data['valoracion']),
                           high=max(datos_heatmap.data['valoracion']))
```
However, I'm getting the following error:
```
in mapper_fun
    mapper = LinearColorMapper(palette=colors, low=min(datos_heatmap.data['valoracion']),
ValueError: min() arg is an empty sequence
```
I think it's because accessing a ColumnDataSource value requires a "source" parameter, and LinearColorMapper does not have one, so it can't be solved this way. I also tried storing the max and min values in another ColumnDataSource, but I get the same error, because I'm still not using a source, just extracting the values directly as in datos_heatmap.data['valoracion'].
Thanks in advance!
I guess the problem is that your datos_heatmap.data['valoracion'] really is an empty sequence. Please provide further information here with a minimal example, and try printing datos_heatmap.data['valoracion'] right before you pass it to LinearColorMapper. If you work with Python functions, make sure your definitions are correct within the namespaces where you want them to exist.
Check the following example, which works the way you want:
```
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, LinearColorMapper

# Bokeh 3.x uses width/height (plot_width/plot_height were removed)
plot1 = figure(width=1000, height=250)
df = pd.DataFrame({"ID": [0, 1, 2, 3],
                   "Value1": [0, 100, 200, 300]})
source = ColumnDataSource(df)

# The source already holds data here, so min()/max() receive a non-empty sequence
cmap = LinearColorMapper(palette="Turbo256", low=min(source.data["Value1"]), high=max(source.data["Value1"]))
print(f"high value cmap: {cmap.high}")
print(f"low value cmap: {cmap.low}")

# Color each marker by Value1 through the mapper (scatter draws circles by default)
markers = plot1.scatter(x='ID', y='Value1', source=source, size=30,
                        fill_color={"field": 'Value1', "transform": cmap})
show(plot1)
```
This produces a plot of four circles whose fill colors run through the Turbo256 palette, from the lowest to the highest Value1.
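If the heatmap source is only filled later (for example inside val_portafolio_mostrar()), one possible pattern is to create the mapper with placeholder limits and update its low and high attributes after the data has been assigned. This is a sketch only: datos_calcular() below is a hypothetical stand-in that returns a small DataFrame, and the placeholder range of 0 to 1 is arbitrary.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LinearColorMapper

colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
          "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]


def datos_calcular():
    # Hypothetical stand-in for the question's calculation step
    return pd.DataFrame({"a": [1.0, 2.5], "b": [-0.5, 4.0]}, index=["x", "y"])


# The source starts empty, which is why min()/max() fail if called at this point
datos_heatmap = ColumnDataSource(data=dict(level_0=[], level_1=[], valoracion=[]))
# Placeholder limits; they are replaced once real data exists
mapper = LinearColorMapper(palette=colors, low=0, high=1)


def val_portafolio_mostrar():
    val_mapa = datos_calcular().stack().to_frame("valoracion").reset_index()
    datos_heatmap.data = val_mapa.to_dict(orient="list")
    valores = datos_heatmap.data["valoracion"]
    if len(valores) > 0:  # guard against the empty-sequence error
        mapper.low = min(valores)
        mapper.high = max(valores)


val_portafolio_mostrar()
print(mapper.low, mapper.high)
```

Glyphs that use the mapper, e.g. fill_color={"field": "valoracion", "transform": mapper}, reference the same LinearColorMapper object, so they should pick up the updated limits.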
I would like to plot all columns of a DataFrame in Plotly without having to name them explicitly.
I need the same functionality in Plotly as this matplotlib code provides:
```
import pandas as pd

df = pd.DataFrame({
    'A': ['15', '21', '30'],
    'M': ['12', '24', '31'],
    'I': ['28', '32', '10']})

%matplotlib inline
from matplotlib import pyplot as plt
df = df.astype(float)
df.plot()
```
Here is my code for Plotly, but as I said, I have no idea how to plot all the columns automatically. One thing I have also noticed is that in Plotly the x-axis needs to be defined, but I can live with this restriction.
```
import plotly.express as px
import pandas as pd

# data
df = pd.DataFrame({
    'ID': ['1', '2', '3'],
    'A': ['15', '21', '30'],
    'M': ['12', '24', '31'],
    'I': ['28', '32', '10']})
# The values are strings; convert them so Plotly draws a numeric y-axis
df = df.apply(pd.to_numeric)

df_long = pd.melt(df, id_vars=['ID'], value_vars=['A', 'M', 'I'])
fig = px.line(df_long, x='ID', y='value', color='variable')
fig.show()
```
How can I make Plotly plot all the columns automatically?
Okay, I have found the solution to my problem:
```
df_long = pd.melt(df, id_vars=['ID'])
```
instead of:
```
df_long = pd.melt(df, id_vars=['ID'], value_vars=['A', 'M', 'I'])
```
The pandas documentation for the value_vars parameter of melt says: "Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars."
Can I plot a Bokeh ColumnDataSource in one go, similar to plotting a pandas DataFrame (e.g., the second figure in the pandas plotting documentation), especially if the ColumnDataSource is derived from a DataFrame? Or does the loss of, e.g., the index prevent this?
So, with Pandas I can do:
```
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

years = np.arange(2000, 2020)
columns = ['first', 'second', 'third', 'fourth']
df = pd.DataFrame(data=np.random.random((20, 4)), columns=columns, index=years)
df.plot()
plt.show()
```
But I'd like some interactivity on that figure; in particular, hovering over the lines should show the y-value and the respective label name.
With Bokeh, I currently use:
```
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category20

output_notebook()
p = figure(width=800, height=600)
for column, color in zip(columns, Category20[4]):
    line = p.line(df.index, df[column], color=color, legend_label=column, name=column)
    # One HoverTool per line, so each tooltip shows that line's own label
    p.add_tools(HoverTool(renderers=[line], tooltips=[(column, "$y")], toggleable=False))
p.legend.background_fill_alpha = 0.5
show(p)
```
The separate p.add_tools(...) calls are so that the individual dataset labels and y-values show up when hovered over the corresponding line.
But the manual loop feels a tad unwieldy. I wonder if there is something available along the following lines:
```
source = ColumnDataSource(df)
p.multi_line(source=source, color=Category20[4])
# or something similar to: p.multi_line(source=source, xs='index', ys=models)
```
where the index is automatically used for the x-axis, the columns are looped over, and the legend labels are derived from the column names.
The best alternative I've found appears to be:
```
source = ColumnDataSource(dict(
    x=[df.index]*len(columns),
    y=[df[column].values for column in columns],
    color=Category20[4],
    legend=columns))
p.multi_line(source=source, xs='x', ys='y', color='color')
show(p)
```
but creating the ColumnDataSource that way again feels unwieldy. Plus, I don't know how to create tooltips for each line individually; I guess multi_line isn't meant for lines that should be treated individually.
Is there a more direct, easier, way?
My previous comment did not come out at all like what I was trying to type.
I was trying to suggest:
```
p.add_tools(HoverTool(tooltips=[("X", "@x"), ("Y", "@y")]))
```
Your ColumnDataSource construction seems fairly concise to me already.
As to your question in the comment, would adding
```
p.add_tools(HoverTool(tooltips=[("X", "$x"), ("Y", "$y"), ("Name", "@legend")]))
```
do the trick?
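Another option, sketched here assuming Bokeh 3.x (width/height and legend_label), keeps one p.line call per column but shares a single ColumnDataSource built from the DataFrame and a single HoverTool. ColumnDataSource(df) exposes the unnamed index as a column called "index", and the $name tooltip field shows the name of the hovered renderer, so each line still reports its own label without one HoverTool per line.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category20
from bokeh.plotting import figure

years = np.arange(2000, 2020)
columns = ['first', 'second', 'third', 'fourth']
df = pd.DataFrame(data=np.random.random((20, 4)), columns=columns, index=years)

# ColumnDataSource(df) stores the unnamed index as a column called "index"
source = ColumnDataSource(df)

p = figure(width=800, height=600)
for column, color in zip(df.columns, Category20[4]):
    # Every line reads from the same source; name= is what $name reports below
    p.line(x='index', y=column, source=source, color=color,
           legend_label=column, name=column)

# One hover tool for all lines: $name is the hovered renderer's name,
# $y is the y-coordinate (in data space) under the cursor
p.add_tools(HoverTool(tooltips=[("series", "$name"), ("y", "$y")]))
p.legend.background_fill_alpha = 0.5
# show(p) renders it, as in the question
```

The loop is still there, but it no longer copies data per line, and the legend labels come straight from the column names.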
I have a small issue creating a Bokeh vbar in 0.13.0 from a DataFrame groupby count operation. The existing answer I found was for a multi-level groupby, whereas mine isn't.
Update since posting: added sample data and code based on the provided answer, to see whether the issue is my code or something else.
Outline
The pandas DataFrame contains survey responses (Excellent, Good, Poor, Satisfactory, Very Good) under the columns ('ResponseID', 'RateGeneral', 'RateAccomodation', 'RateClean', 'RateServices'), and the dtype has been set to category. I want to display a Bokeh vbar of the response counts from this groupby:
```
DemoDFCount = DemoDF.groupby('RateGeneral').count()
```
My Bokeh code looks like this:
```
pTest = figure(title='Rating in General', plot_height=350)
pTest.vbar(width=0.9, source=DemoDFCount, x='RateGeneral', top='ResponseID')
show(pTest)
```
but it doesn't produce a chart, only a title and toolbar.
If I use pandas, DemoDFCount.plot.bar(legend=False), I can plot something, but how do I create this chart in Bokeh?
Sample data as a JSON export (50 rows from DemoDF.to_json()):
```
'{"ResponseID":{"0":1,"1":2,"2":3,"3":4,"4":5,"5":6,"6":7,"7":8,"8":9,"9":10,"10":11,"11":12,"12":13,"13":14,"14":15,"15":16,"16":17,"17":18,"18":19,"19":20,"20":21,"21":22,"22":23,"23":24,"24":25,"25":26,"26":27,"27":28,"28":29,"29":30,"30":31,"31":32,"32":33,"33":34,"34":35,"35":36,"36":37,"37":38,"38":39,"39":40,"40":41,"41":42,"42":43,"43":44,"44":45,"45":46,"46":47,"47":48,"48":49,"49":50},"RateGeneral":{"0":"Good","1":"Satisfactory","2":"Good","3":"Poor","4":"Good","5":"Satisfactory","6":"Excellent","7":"Good","8":"Good","9":"Satisfactory","10":"Satisfactory","11":"Excellent","12":"Satisfactory","13":"Excellent","14":"Satisfactory","15":"Very Good","16":"Satisfactory","17":"Excellent","18":"Very Good","19":"Excellent","20":"Satisfactory","21":"Good","22":"Satisfactory","23":"Excellent","24":"Satisfactory","25":"Good","26":"Excellent","27":"Very Good","28":"Good","29":"Very Good","30":"Good","31":"Satisfactory","32":"Very Good","33":"Very Good","34":"Very Good","35":"Good","36":"Excellent","37":"Satisfactory","38":"Excellent","39":"Good","40":"Good","41":"Satisfactory","42":"Very Good","43":"Very Good","44":"Poor","45":"Excellent","46":"Good","47":"Excellent","48":"Satisfactory","49":"Good"},"RateAccomodation":{"0":"Very Good","1":"Excellent","2":"Satisfactory","3":"Satisfactory","4":"Good","5":"Good","6":"Very Good","7":"Very Good","8":"Good","9":"Satisfactory","10":"Satisfactory","11":"Excellent","12":"Satisfactory","13":"Excellent","14":"Good","15":"Very Good","16":"Good","17":"Excellent","18":"Excellent","19":"Very Good","20":"Good","21":"Satisfactory","22":"Good","23":"Excellent","24":"Satisfactory","25":"Very Good","26":"Excellent","27":"Excellent","28":"Good","29":"Very Good","30":"Very Good","31":"Very Good","32":"Excellent","33":"Very Good","34":"Very Good","35":"Very Good","36":"Excellent","37":"Satisfactory","38":"Excellent","39":"Good","40":"Excellent","41":"Poor","42":"Very Good","43":"Very Good","44":"Poor","45":"Excellent","46":"Satisfactory","47":"Excellent","48":"Good","49":"Good"},"RateClean":{"0":"Excellent","1":"Excellent","2":"Satisfactory","3":"Good","4":"Excellent","5":"Very Good","6":"Very Good","7":"Excellent","8":"Excellent","9":"Satisfactory","10":"Satisfactory","11":"Excellent","12":"Good","13":"Good","14":"Excellent","15":"Excellent","16":"Good","17":"Excellent","18":"Excellent","19":"Excellent","20":"Good","21":"Very Good","22":"Poor","23":"Very Good","24":"Satisfactory","25":"Very Good","26":"Excellent","27":"Good","28":"Poor","29":"Good","30":"Excellent","31":"Good","32":"Good","33":"Very Good","34":"Satisfactory","35":"Good","36":"Excellent","37":"Satisfactory","38":"Excellent","39":"Good","40":"Very Good","41":"Satisfactory","42":"Excellent","43":"Excellent","44":"Very Good","45":"Excellent","46":"Good","47":"Excellent","48":"Good","49":"Excellent"},"RateServices":{"0":"Very Good","1":"Excellent","2":"Good","3":"Good","4":"Excellent","5":"Good","6":"Good","7":"Very Good","8":"Good","9":"Satisfactory","10":"Satisfactory","11":"Excellent","12":"Good","13":"Very Good","14":"Good","15":"Excellent","16":"Poor","17":"Excellent","18":"Excellent","19":"Excellent","20":"Good","21":"Good","22":"Very Good","23":"Excellent","24":"Satisfactory","25":"Very Good","26":"Excellent","27":"Very Good","28":"Good","29":"Excellent","30":"Very Good","31":"Excellent","32":"Good","33":"Excellent","34":"Very Good","35":"Very Good","36":"Excellent","37":"Satisfactory","38":"Excellent","39":"Good","40":"Very Good","41":"Satisfactory","42":"Excellent","43":"Excellent","44":"Good","45":"Excellent","46":"Very Good","47":"Excellent","48":"Good","49":"Very Good"}}'
```
The fact that it is multi-level in the other question is not really relevant. When you use a Pandas GroupBy as a data source for Bokeh, Bokeh uses the results of group.describe (which includes counts for each column per group) as the contents of the data source. Here is a complete example that shows Counts-per-Origin from the "cars" data set:
```
from bokeh.io import show, output_file
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg as df

output_file("groupby.html")

df.origin = df.origin.astype(str)
group = df.groupby('origin')

p = figure(plot_height=350, x_range=group, title="Count by Origin",
           toolbar_location=None, tools="")

# using yr_count, but count for any column would work
p.vbar(x='origin', top='yr_count', width=0.8, source=group)

p.y_range.start = 0
p.xgrid.grid_line_color = None

show(p)
```
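Applied to the question's own data, one likely reason the original vbar showed only a title and toolbar is that figure() was created without a categorical x_range, so the string ratings had no axis positions to be drawn at. A minimal sketch assuming a recent Bokeh (height instead of plot_height), using only the first ten RateGeneral values from the sample and an assumed Poor-to-Excellent order for the bars:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# First ten rows of the question's sample (ResponseID and RateGeneral only)
DemoDF = pd.DataFrame({
    "ResponseID": range(1, 11),
    "RateGeneral": ["Good", "Satisfactory", "Good", "Poor", "Good",
                    "Satisfactory", "Excellent", "Good", "Good", "Satisfactory"],
})

# Assumed display order; ratings with no responses are kept with a count of 0
order = ["Poor", "Satisfactory", "Good", "Very Good", "Excellent"]
counts = DemoDF["RateGeneral"].value_counts().reindex(order, fill_value=0)

source = ColumnDataSource(data=dict(rating=counts.index.tolist(),
                                    count=counts.tolist()))

# x_range=order makes the x-axis categorical, so string x values can be placed
p = figure(x_range=order, height=350, title="Rating in General",
           toolbar_location=None, tools="")
p.vbar(x="rating", top="count", width=0.9, source=source)
p.y_range.start = 0
# show(p) renders the chart
```

With the groupby source from the answer instead, the matching count column for this data should be named ResponseID_count, following the same pattern as yr_count.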