BokehUserWarning: ColumnDataSource's columns must be of the same length - Workaround? - bokeh

I'm trying to create a split horizontal bar chart to show differences in distribution by gender (female, male). I created the data manually because it's only 11 rows x 2 columns. When I run the code, the output is empty and I get this warning:
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('+75', 1), ('20-24', 2), ('25-29', 2), ('30-34', 2), ('35-39', 2), ('40-44', 2), ('45-49', 2), ('50-54', 2), ('55-59', 2), ('60-64', 2), ('65-74', 1), ('F', 2)
I checked, and the female and male data have the same length, so I'm not sure what could be wrong.
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import RdYlBu11
from bokeh.plotting import figure
output_file("example_split.html")
gender = ['F', 'M']
age_group = ["20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-74", "+75"]
females = {'F' : gender,
'20-24' : [3, 0],
'25-29' : [22, 0],
'30-34' : [24, 0],
'35-39' : [16, 0],
'40-44' : [9, 0],
'45-49' : [5, 0],
'50-54' : [3, 0],
'55-59' : [2, 0],
'60-64' : [7, 0],
'65-74' : [0],
'+75' : [2]}
males = {'M' : gender,
'20-24' : [0],
'25-29' : [0],
'30-34' : [9],
'35-39' : [20],
'40-44' : [22],
'45-49' : [16],
'50-54' : [11],
'55-59' : [6],
'60-64' : [7],
'65-74' : [0],
'+75' : [1]}
p = figure(y_range=gender, plot_height=250, x_range=(-25, 25), title="My_title",
toolbar_location=None)
p.hbar_stack(age_group, y='gender', height=0.9, color=RdYlBu11, source=ColumnDataSource(females),
legend_label=["%s females" % x for x in age_group])
p.hbar_stack(age_group, y='gender', height=0.9, color=RdYlBu11, source=ColumnDataSource(males),
legend_label=["%s males" % x for x in age_group])
p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "top_left"
p.axis.minor_tick_line_color = None
p.outline_line_color = None
show(p)

It's right there in the warning - each of your data sources has columns of different lengths.
females has column F of length 2, but columns 65-74 and +75 have length 1.
males has column M of length 2 and all the rest of the columns are of length 1.
I'm not sure what you want to achieve, but your data layout is definitely wrong.
A ColumnDataSource is a data source that represents tabular data. Each key in its data dict is like a column in a table, and each item at index i in each value of the data dict is like a cell in that table somewhere in the row with index i.
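To make the "table" idea concrete, here is a minimal sketch in plain Python (no Bokeh needed) that checks whether a data dict is rectangular before it is handed to a ColumnDataSource. The helper names `column_lengths` and `is_rectangular` are made up for this illustration, and the `fixed` layout is only an assumption about what the question intended (one row per gender, one column per age group), shown for a few columns.

```python
def column_lengths(data):
    """Return {column name: number of rows} for a dict-of-lists table."""
    return {name: len(values) for name, values in data.items()}


def is_rectangular(data):
    """True when every column has the same number of rows."""
    return len(set(column_lengths(data).values())) <= 1


# A few columns copied from the question's `females` dict: lengths differ.
broken = {"F": ["F", "M"], "20-24": [3, 0], "65-74": [0], "+75": [2]}

# Hypothetical repaired layout (an assumption): row 0 is F, row 1 is M.
fixed = {"gender": ["F", "M"], "20-24": [3, 0], "65-74": [0, 0], "+75": [2, 1]}

print(column_lengths(broken))
print(is_rectangular(broken), is_rectangular(fixed))
```

With a layout like `fixed`, every column has one cell per row, which is what the warning is asking for.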

Related

ChAMP package:Error in cmdscale(d) : 'k' must be in {1, 2, .. n - 1}

I selected two GSM samples.
When I use the champ.QC function of the ChAMP package, an error appears:
> champ.QC(beta = myLoad$beta,pheno=myLoad$pd$type)
[===========================]
[<<<<< ChAMP.QC START >>>>>>]
-----------------------------
champ.QC Results will be saved in ./CHAMP_QCimages/
[QC plots will be proceed with 411557 probes and 2 samples.]
<< Prepare Data Over. >>
Error in cmdscale(d) : 'k' must be in {1, 2, .. n - 1}
The complete code is as follows:
pd10 <- data.frame(stringsAsFactors = FALSE,
Sample_Name = c("GSM1669564","GSM1669589"),
type = c("lung_NSCLC_adenocarcinoma"))
idat.name10 <- list.files("/home/shuangshuang/R/Rstudio/03.MethyICIBERSORT/dataset/LUAD",
pattern = "*.idat") |> substr(1L,30L)
pd10$Sentrix_ID <- substr(idat.name10[seq(1,4,2)],12,21)
pd10$Sentrix_Position <- substr(idat.name10[seq(1,4,2)],23,28)
pd10$Sample_Type <- ("tumor")
write.csv(pd10,file = "sample_type1.csv",row.names = F,quote = F)
myDir="/home/shuangshuang/R/Rstudio/03.MethyICIBERSORT/dataset/LUAD"
myLoad <- champ.load(myDir, arraytype="450K")
class(myLoad)
#[1] "list"
champ.QC(beta = myLoad$beta,pheno=myLoad$pd$type)
How can I solve this problem? Thanks!
I checked my code and found no problems, so I don't know where the error comes from.

pandera - use decorator to specify multiple output schemas

I would like to know if it is possible to use pandera decorator to specify multiple output schemas.
Let's say for example you have a function that returns 2 dataframes and you want to check the schema of these dataframes using check_io() decorator:
import pandas as pd
import pandera as pa
from pandera import DataFrameSchema, Column, Check, check_input
df = pd.DataFrame({
"column1": [1, 4, 0, 10, 9],
"column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
})
in_schema = DataFrameSchema({
"column1": Column(int),
"column2": Column(float),
})
out_schema1 = DataFrameSchema({
"column1": Column(int),
"column2": Column(float),
"column3": Column(float),
})
out_schema2 = DataFrameSchema({
"column1": Column(int),
"column2": Column(float),
"column3": Column(int),
})
def preprocessor(df1, df2):
    df_out1 = (df1 + df2).assign(column3=lambda x: x.column1 + x.column2)
    df_out2 = (df1 + df2).assign(column3=lambda x: x.column1 ** 2)
    return df_out1, df_out2
How would this be implemented for the above example?
Just in case anyone else is looking for the solution:
# each (index, schema) pair in `out` validates one element of the returned tuple
@pa.check_io(df1=in_schema, df2=in_schema, out=[(0, out_schema1), (1, out_schema2)])
def preprocessor(df1, df2):
    df_out1 = (df1 + df2).assign(column3=lambda x: x.column1 + x.column2)
    df_out2 = (df1 + df2).assign(column3=lambda x: x.column1 ** 2)
    return df_out1, df_out2
preprocessor(df, df)

How to create custom hover tool with value mapping

I am trying to create a custom hover tool that takes the y-value of the plot and maps it to a different value.
The code I have come up with so far is:
import numpy as np
import pandas as pd
import holoviews as hv
from bokeh.models import HoverTool
df = pd.DataFrame(
    {
        "zero": [0, 0, 0, 0, 0, 0, 0],
        "one": [1, 1, 1, 1, 1, 1, 1],
        "two": [2, 2, 2, 2, 2, 2, 2],
    }
)
# map column position -> column name, e.g. 0 -> "zero"
mapping = {i: c for i, c in enumerate(df.columns)}
def col_mapping(num):
    return mapping[int(num)]
hover = HoverTool(tooltips=[("x", "$x"), ("y", "$y")])
img = hv.Image((df.index, np.arange(df.shape[1]), df.T)).opts(tools=[hover])
img
x and y will be float values, so the idea is to map the y coordinate to its corresponding value in the mapping dictionary.
How can I add a new value to the hover tool so that, for example, a y value between 0 and 1 shows its mapped value?
Thanks
Here's how I'd do it:
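import json
from bokeh.models import CustomJSHover, HoverTool
# JS snippet: look up the floored y coordinate in the mapping
code = f"return ({json.dumps(mapping)})[Math.floor(special_vars.y)];"
hover = HoverTool(tooltips=[("x", "$x"), ("y", "$y"), ('mapped_y', '$y{0}')],
                  formatters={'$y': CustomJSHover(code=code)})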
If you need more complicated logic than that of col_mapping, then you'd have to use a ColumnDataSource and add the fully transformed column to it.
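To see what the generated JavaScript actually looks like, here is a small stdlib-only sketch that builds the same `code` string from the question's mapping (written out by hand here rather than derived from the DataFrame). Note that `json.dumps` turns the integer keys into strings; the lookup still works on the JavaScript side because a numeric property key is coerced to a string.

```python
import json

# Same mapping as enumerate(df.columns) produces in the question.
mapping = {0: "zero", 1: "one", 2: "two"}

# Integer keys are serialized as JSON strings: {"0": "zero", ...}
js_object = json.dumps(mapping)

# The snippet passed to CustomJSHover(code=...) in the answer above.
code = f"return ({js_object})[Math.floor(special_vars.y)];"
print(code)
```

A y value outside the mapped range would look up a missing key, so the tooltip would presumably show an undefined value there; clamp the index in the snippet if that matters.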

How to plot a vertical line on a bar plot in Bokeh?

Based on the first example of the user-guide of Bokeh,
from bokeh.io import show, output_file
from bokeh.plotting import figure
from bokeh.models import Span
output_file("bars.html")
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
p = figure(x_range=fruits, plot_height=250, title="Fruit Counts",
toolbar_location=None, tools="")
p.vbar(x=fruits, top=counts, width=0.9)
# these two lines
vline = Span(location='Apples', dimension='height', line_color='blue', line_width=4)
p.renderers.extend([vline])
p.xgrid.grid_line_color = None
p.y_range.start = 0
show(p)
I am trying to add a vertical line to a bar plot whose x-range are categories. However, this does not seem to be possible, as this raises an error "ValueError: expected a value of type Real, got Apples of type str".
location='Apples' does not work as intended as it expected a number.
One solution is to convert the categorical value to the corresponding numeric value on the plot:
index = p.x_range.factors.index("Apples")
# each factor is one unit wide and centered at index + 0.5 (assuming the default range with no extra padding)
location = index + 0.5
If the plot is dynamic (e.g. values are not known when the plot is built), then use an auxiliary JS function to do the conversion:
function _value_to_location(x_range, value) {
var index = x_range.factors.findIndex(x => x == value)
var delta = (x_range.end - x_range.start)/x_range.factors.length;
return delta/2 + index;
};
...
vline.location = _value_to_location(figure.x_range, "Apples");
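The same conversion can be sketched as a small reusable Python helper. The name `factor_to_location` and the `offset` parameter are made up for this example, and it assumes the default categorical range where each factor is one unit wide with no extra padding, so the center of factor i sits at i + 0.5.

```python
def factor_to_location(factors, value, offset=0.5):
    """Map a category to a numeric axis position.

    Assumes each factor occupies one unit and the range has no padding,
    so the center of the i-th factor is at i + 0.5.
    """
    try:
        index = list(factors).index(value)
    except ValueError:
        raise ValueError(f"{value!r} is not one of the factors") from None
    return index + offset


fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
location = factor_to_location(fruits, 'Apples')
print(location)
```

The result can then be passed as the Span's location instead of the category name.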

Using NetworkX to Study the Shift Operator and other Mathematical Creations

A branch of operator theory studies the shift operator S. Basically, given a graph with weights assigned to each vertex of the graph, the shift operator produces a new graph by taking the same graph (A) and replacing the weight of each vertex with the sum of the weights of the vertex's neighbors. For example, 3 in graph (A) is replaced by 5 + 5 + 2 + 0.
[Figures (A) and (B): images not included]
Does anyone know if networkx can help me automate such a process for an arbitrary graph, G? Also, what are the limits in size (vertexes, edges, etc) of graphs that I may construct?
First you need to create a graph and add the node weights.
I name the nodes with letters from a to h.
For larger graphs you'll need a different way of naming nodes (so each node has a unique name).
In the code below I also draw the node names.
Note that I manually set the node positions so I have the same example as you.
For larger graphs check out graph layouts.
import networkx as nx
from matplotlib import pyplot as plt
G = nx.Graph()
nodes = [
['a', {'weight' : 5}],
['b', {'weight' : 4}],
['c', {'weight' : 2}],
['d', {'weight' : 3}],
['e', {'weight' : 5}],
['f', {'weight' : 0}],
['g', {'weight' : 0}],
['h', {'weight' : 1}]
]
for node in nodes:
    G.add_node(node[0], **node[1])  # add node and node weight from list
G.add_edges_from([
('a', 'd'),
('b', 'e'),
('c', 'd'),
('d', 'e'),
('d', 'g'),
('e', 'h'),
('e', 'f')
])
pos = {'a' : (1, 2), 'b' : (2, 2), 'c' : (0, 1), 'd' : (1, 1), 'e' : (2, 1), 'f' : (3, 1), 'g' : (1, 0), 'h' : (2, 0)} # manual fixed positions
plt.figure()
nx.draw(G, pos=pos, with_labels=True, node_size=700, node_color='w') # draw node names
plt.show()
Output:
Here is the code which draws the node weights:
plt.figure()
nx.draw(G, pos=pos, labels=nx.get_node_attributes(G, 'weight'), node_size=700, node_color='w') # draw node weights
plt.show()
And finally the code for calculating your shift operator S.
You can get the neighbors of a node with G[node].
The weight attribute of a neighbor can be accessed with G.nodes[neighbor]['weight'] (older NetworkX releases used G.node).
Using that and a list comprehension, I sum the weights of all neighbors of the current node. Note that the new weights are set with nx.set_node_attributes(G, new_weights, 'weight'), which is the NetworkX 2.x argument order.
new_weights = {}
for node in G.nodes():
    new_weights[node] = sum([G.nodes[neighbor]['weight'] for neighbor in G[node]])  # sum weights of all neighbors of current node
nx.set_node_attributes(G, new_weights, 'weight')  # set new weights
plt.figure()
nx.draw(G, pos=pos, labels=nx.get_node_attributes(G, 'weight'), node_size=700, node_color='w') # draw new node weights
plt.show()
Final graph:
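As a cross-check of the neighbor sums, here is a dependency-free sketch of the same shift step using a plain adjacency dict instead of networkx. The `shift` function is a name made up for this example, and it assumes a simple undirected graph (no self-loops or repeated edges). The weights and edges are copied from the example above.

```python
# Node weights and undirected edges copied from the example graph.
weights = {'a': 5, 'b': 4, 'c': 2, 'd': 3, 'e': 5, 'f': 0, 'g': 0, 'h': 1}
edges = [('a', 'd'), ('b', 'e'), ('c', 'd'), ('d', 'e'),
         ('d', 'g'), ('e', 'h'), ('e', 'f')]


def shift(weights, edges):
    """Return new weights: each node gets the sum of its neighbors' old weights."""
    neighbors = {node: set() for node in weights}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # All sums read the old weights, so the update is simultaneous.
    return {node: sum(weights[n] for n in neighbors[node]) for node in weights}


new_weights = shift(weights, edges)
print(new_weights['d'])  # 12, i.e. 5 + 5 + 2 + 0 as in the question
```

Building the full dict of new weights before writing anything back matters: updating weights in place while iterating would mix old and new values.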
