How to make `Heatmaps` in `Bokeh` with a continuous color map, using Python 3?

I was trying to replicate this style of HeatMap that maps continuous values to a LinearColorMapper instance: http://docs.bokeh.org/en/latest/docs/gallery/unemployment.html
I wanted to make a HeatMap (w/ either charts or rect) and then add a single selection widget to select the obsv_id and then a slider widget to go through the dates.
However, I was having trouble in the beginning with the HeatMap itself with a single obsv_id/date pair. What am I doing wrong in creating this HeatMap? This would essentially be a 3x3 rectangle plot of the size variable and the loc variable.
Bonus: Can you help me/give some advice on how to wire the output of these widgets to control the plot?
I saw these posts but all of the examples use actual hex colors as a list instead of mapping using a continuous measure:
python bokeh, how to make a correlation plot? http://docs.bokeh.org/en/latest/docs/gallery/categorical.html
# Init
import numpy as np
import pandas as pd
from bokeh.plotting import figure, output_notebook, output_file, reset_output, show, ColumnDataSource
from bokeh.models import LinearColorMapper
reset_output()
output_notebook()
np.random.seed(0)
# Coords
dates = ["07-3","07-11","08-6","08-28"]
#locs = ["air","water","earth"]
locs = [0,1,2]
size = [3.0, 0.2, 0.025]
observations = ["obsv_%d"%_ for _ in range(10)]
# Data
Ar_tmp = np.zeros(( len(dates)*len(locs)*len(size)*len(observations), 5 ), dtype=object)
i = 0
for date in dates:
    for loc in locs:
        for s in size:
            for obsv_id in observations:
                Ar_tmp[i,:] = np.array([obsv_id, date, loc, s, np.random.random()])
                i += 1
DF_tmp = pd.DataFrame(Ar_tmp, columns=["obsv_id", "date", "loc", "size", "value"])
DF_tmp["value"] = DF_tmp["value"].astype(float)
DF_tmp["size"] = DF_tmp["size"].astype(float)
DF_tmp["loc"] = DF_tmp["loc"].astype(float)
# obsv_id date loc size value
# 0 obsv_0 07-3 air 3.0 0.548814
# 1 obsv_1 07-3 air 3.0 0.715189
# 2 obsv_2 07-3 air 3.0 0.602763
# 3 obsv_3 07-3 air 3.0 0.544883
# 4 obsv_4 07-3 air 3.0 0.423655
mapper = LinearColorMapper(low = DF_tmp["value"].min(), high = DF_tmp["value"].max())
# # Create Heatmap of a single observation and date pair
query_idx = set(DF_tmp.index[DF_tmp["obsv_id"] == "obsv_0"]) & set(DF_tmp.index[DF_tmp["date"] == "08-28"])
# p = HeatMap(data=DF_tmp.loc[query_idx,:], x="loc", y="size", values="value")
p = figure()
p.rect(x="loc", y="size",
       source=ColumnDataSource(DF_tmp.loc[query_idx,:]),
       fill_color={'field': 'value', 'transform': mapper},
       line_color=None)
show(p)
My Error:
# Javascript error adding output!
# TypeError: Cannot read property 'length' of null
# See your browser Javascript console for more details.

You have to provide a palette to LinearColorMapper. For example:
mapper = LinearColorMapper(
    palette='Magma256',
    low=DF_tmp["value"].min(),
    high=DF_tmp["value"].max()
)
From the LinearColorMapper doc:
class LinearColorMapper(palette=None, **kwargs)
Map numbers in a range [low, high] linearly into a sequence of colors (a palette).
Not related to your exception, but you'll also need to pass width and height parameters to p.rect().
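To make the quoted docstring concrete, here is a minimal pure-Python sketch of what "map numbers in a range [low, high] linearly into a palette" could look like. It is an illustration of the idea only, not Bokeh's actual implementation; the function name `linear_color_index` and the four-color palette are assumptions made up for this example.

```python
def linear_color_index(value, low, high, n_colors):
    """Return the palette index a linear mapper could pick for value."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp so out-of-range values land on the first or last color.
    clamped = min(max(value, low), high)
    # Scale to [0, 1], then to a bin index in [0, n_colors - 1].
    fraction = (clamped - low) / (high - low)
    return min(int(fraction * n_colors), n_colors - 1)


# Hypothetical 4-color palette (any list of hex strings would do).
palette = ["#000004", "#721f81", "#f1605d", "#fcfdbf"]

values = [0.0, 0.3, 0.6, 1.0]
colors = [palette[linear_color_index(v, 0.0, 1.0, len(palette))] for v in values]
print(colors)
```

Without a palette there is nothing to index into, which is consistent with the "Cannot read property 'length' of null" error in the question.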

Related

Get prediction of OLS fit from statsmodels

I am trying to get in-sample predictions from an OLS fit, as below:
import numpy as np
import pandas as pd
import statsmodels.api as sm
macrodata = sm.datasets.macrodata.load_pandas().data
macrodata.index = pd.period_range('1959Q1', '2009Q3', freq='Q')
mod = sm.OLS(macrodata['realgdp'], sm.add_constant(macrodata[['realdpi', 'realinv', 'tbilrate', 'unemp']])).fit()
mod.get_prediction(sm.add_constant(macrodata[['realdpi', 'realinv', 'tbilrate', 'unemp']])).summary_frame(0.95).head()
This is fine. But if I alter the positions of the regressors in mod.get_prediction, I get different estimates:
mod.get_prediction(sm.add_constant(macrodata[['tbilrate', 'unemp', 'realdpi', 'realinv']])).summary_frame(0.95).head()
This is surprising. Can't mod.get_prediction identify the regressors based on column names?
As noted in the comments, sm.OLS will convert your data frame into an array for fitting, and likewise for prediction, it expects the predictors to be in the same order.
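The effect of positional matching can be sketched without statsmodels at all. In the toy example below, the coefficient values, the `fit_order` list and both helper functions are hypothetical; they only illustrate why swapping columns changes a prediction when names are ignored.

```python
# Hypothetical coefficients, keyed by regressor name; fit_order is the
# column order that was used when the model was fitted.
fit_order = ["const", "realdpi", "realinv"]
coefs = {"const": 10.0, "realdpi": 0.5, "realinv": 2.0}


def predict_positional(row_values):
    """Multiply the i-th value by the i-th fitted coefficient (names ignored)."""
    return sum(coefs[name] * x for name, x in zip(fit_order, row_values))


def predict_by_name(row):
    """Look each regressor up by name, so column order no longer matters."""
    return sum(coefs[name] * row[name] for name in fit_order)


row = {"const": 1.0, "realdpi": 100.0, "realinv": 20.0}
same_order = [row[n] for n in ["const", "realdpi", "realinv"]]
swapped = [row[n] for n in ["const", "realinv", "realdpi"]]

print(predict_positional(same_order))  # matches the fitted order
print(predict_positional(swapped))     # silently pairs values with the wrong coefficients
print(predict_by_name(row))            # order-independent
```

If you want to stay with sm.OLS, the analogous step would be to reorder the prediction frame's columns to the fitted order before calling get_prediction. I believe that order is exposed as mod.model.exog_names, but treat that as an assumption to check against your statsmodels version.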
If you would like the column names to be used, you can use the formula interface; see the documentation for more details. Below I apply it to your example:
import statsmodels.api as sm
import statsmodels.formula.api as smf
macrodata = sm.datasets.macrodata.load_pandas().data
mod = smf.ols(formula='realgdp ~ realdpi + realinv + tbilrate + unemp', data=macrodata)
res = mod.fit()
In the order provided:
res.get_prediction(macrodata[['realdpi', 'realinv', 'tbilrate', 'unemp']]).summary_frame(0.95).head()
          mean    mean_se  mean_ci_lower  mean_ci_upper  obs_ci_lower  obs_ci_upper
0  2716.423418  14.608110    2715.506229    2717.340607   2710.782460   2722.064376
1  2802.820840  13.714821    2801.959737    2803.681943   2797.188729   2808.452951
2  2781.041564  12.615903    2780.249458    2781.833670   2775.419588   2786.663539
3  2786.894138  12.387428    2786.116377    2787.671899   2781.274166   2792.514110
4  2848.982580  13.394688    2848.141577    2849.823583   2843.353507   2854.611653
Results are the same if we flip the columns:
res.get_prediction(macrodata[['tbilrate', 'unemp', 'realdpi', 'realinv']]).summary_frame(0.95).head()
          mean    mean_se  mean_ci_lower  mean_ci_upper  obs_ci_lower  obs_ci_upper
0  2716.423418  14.608110    2715.506229    2717.340607   2710.782460   2722.064376
1  2802.820840  13.714821    2801.959737    2803.681943   2797.188729   2808.452951
2  2781.041564  12.615903    2780.249458    2781.833670   2775.419588   2786.663539
3  2786.894138  12.387428    2786.116377    2787.671899   2781.274166   2792.514110
4  2848.982580  13.394688    2848.141577    2849.823583   2843.353507   2854.611653

Plotly choropleth map in jupyter notebooks not showing color

I am trying to make a choropleth map in Plotly from data in a CSV file, but the map I get does not show any color.
Below is the code I have written:
import json
import pandas as pd
import plotly.express as px
asean_country = json.load(open("aseancovidmap.geojson","r"))
df= pd.read_csv("covidcases.csv")
id_map={}
for feature in asean_country['features']:
    feature['id']= feature['properties']['sform']
    id_map[feature['properties']['name']]=feature['id']
df["iso-2"]=df['Country'].apply(lambda x: id_map[x])
figure=px.choropleth(df,locations='iso-2',locationmode='country names',geojson=asean_country,color='Ttlcases',scope='asia',title='Total COVID 19 cases in ASEAN Countries as on 10/1/2022')
figure.show()
I don't have access to your files, so I have sourced geometry and COVID data myself; for reference, that code is at the end of this answer.
The key changes I have made: don't loop over the GeoJSON; instead, define locations as a column in the dataframe and set featureidkey to the matching GeoJSON property.
With these changes the countries are colored.
Solution:
import json
import pandas as pd
import plotly.express as px
# asean_country = json.load(open("aseancovidmap.geojson","r"))
asean_country = gdf_asean.rename(columns={"adm0_a3": "iso_a2"}).__geo_interface__
# df= pd.read_csv("covidcases.csv")
df = gdf_asean_cases.loc[:, ["iso_code", "adm0_a3", "total_cases", "date"]].rename(
    columns={"iso_code": "iso_a2", "total_cases": "Ttlcases"}
)
figure = px.choropleth(
    df,
    locations="iso_a2",
    featureidkey="properties.iso_a2",
    geojson=asean_country,
    color="Ttlcases",
    title="Total COVID 19 cases in ASEAN Countries as on 10/1/2022",
).update_geos(fitbounds="locations", visible=True).update_layout(margin={"t":40,"b":0,"l":0,"r":0})
figure.show()
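A quick way to debug an uncolored choropleth is to check that every value in the locations column has a matching feature under the featureidkey path. The sketch below uses only the standard library; the helper names `lookup` and `unmatched_locations` and the two-feature GeoJSON are made up for illustration and stand in for your real file and dataframe column.

```python
def lookup(feature, dotted_key):
    """Follow a dotted path such as 'properties.iso_a2' into a GeoJSON feature."""
    value = feature
    for part in dotted_key.split("."):
        value = value[part]
    return value


def unmatched_locations(locations, geojson, featureidkey):
    """Return the location values that have no matching GeoJSON feature."""
    feature_ids = {lookup(f, featureidkey) for f in geojson["features"]}
    return [loc for loc in locations if loc not in feature_ids]


# Tiny hypothetical stand-in for the real GeoJSON and dataframe column.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"iso_a2": "MYS", "name": "Malaysia"}, "geometry": None},
        {"type": "Feature", "properties": {"iso_a2": "THA", "name": "Thailand"}, "geometry": None},
    ],
}
locations = ["MYS", "THA", "Singapore"]

print(unmatched_locations(locations, geojson, "properties.iso_a2"))
```

Any value this returns is a row that cannot be joined to a shape, so it will not be colored.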
data sourcing
import requests, io
import geopandas as gpd
import pandas as pd
# get asia geometry
gdf = gpd.read_file(
    "https://gist.githubusercontent.com/hrbrmstr/94bdd47705d05a50f9cf/raw/0ccc6b926e1aa64448e239ac024f04e518d63954/asia.geojson"
)
# get countries that make up ASEAN
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_ASEAN_countries_by_GDP")[1].loc[1:]
# no geometry for singapore.... just ASEAN geometry
gdf_asean = (
    gdf.loc[:, ["admin", "adm0_a3", "geometry"]]
    .merge(
        df.loc[:, ["Country", "Rank"]], left_on="admin", right_on="Country", how="right"
    )
)
# get COVID data
dfall = pd.read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv")
# filter to last date in data
dfall["date"] = pd.to_datetime(dfall["date"])
dflatest = dfall.groupby(["iso_code"], as_index=False).last()
# merge geometry and COVID data
gdf_asean_cases = gdf_asean.merge(
    dflatest.loc[:, ["iso_code", "total_cases", "date"]], left_on="adm0_a3", right_on="iso_code"
)

Listing All Variables (Column Names) in R Shiny's checkboxGroupInput

I'm writing an R Shiny application and am having trouble with the checkboxGroupInput function in particular. I'd like to create a dynamic list of choices that automatically includes every column of a dataset named source_file except the first column, source_file$Date, and I'm not sure how to do it. I would greatly appreciate any help you can provide!
Sample dataset of source_file would look something like this:
Date        Index 1  Index 2  Index 3  Index 4  Index 5
2016-01-01  +5%      -2%      +5%      +10%     +12%
2016-01-08  +3%      +13%     -8%      -3%      +10%
2016-01-15  +2%      +11%     -3%      +4%      -15%
The end goal is that I hope the checkboxGroupInput function will be able to automatically read all columns starting from the second column (ignore Date). In this case, the check box would load up 5 options, Index 1 to Index 5. It should be replicable such that it can load any number of indexes depending on the data specified. I tried hard-coding each individual index in but it's definitely counter-intuitive and so frustrating to do.
tabPanel("Target Volatility Portfolio",
         sidebarPanel(
           tags$h3("Find an optimised portfolio to achieve maximum return for a given level of risk/volatility"),
           tags$h4("Input:"),
           checkboxGroupInput("portfolio_selection",
                              "Select Number of Indexes for Portfolio",
                              choices = list(#####please send help here#####)
Edit: I would also appreciate help with the following.
I want to use the output of the checkbox in my global.R in the format below. Basically, I want to use the selected variables to plot a graph. Selecting 2 variables should produce a plot of those 2 variables, whereas selecting 10 variables should produce a plot involving all 10. (I'm plotting the efficient market frontier of x stocks, where x is the number of variables selected. It's a little hard to explain, but I hope the attached code gives you some insight.) The line marked with hashes is the one I need help fixing. Thank you!
plot_emf = function(n_points, target_vol, portfolio_selection)
{
  first <- portfolio_selection[1]
  last <- portfolio_selection[length(portfolio_selection)]
  #######asset_returns = source_file[first:last]########
  # Extract necessary parameters
  n_assets = ncol(asset_returns)
  n_obs = nrow(asset_returns)
  n_years = n_obs / 52
  # Initialize containers for holding return and vol simulations
  return_vector = c()
  vol_vector = c()
  sharpe_vector = c()
  for (i in 1:n_points)
  {
    # Generate random weights for n assets from uniform(0,1)
    asset_weights = runif(n_assets, min = 0, max = 1)
    normalization_ratio = sum(asset_weights)
    # Asset weights need to add up to 100%
    asset_weights = asset_weights / normalization_ratio
    # print(asset_weights)
    # print(asset_returns)
    # Generate the portfolio return vector using these weights
    random_portfolio_returns = emf_portfolio_returns(
      asset_weights,
      asset_returns)
    # print(random_portfolio_returns)
    # plot_returns_histogram(random_portfolio_returns$portfolio_returns)
    cumulative_return = calculate_cumulative_return(random_portfolio_returns$portfolio_returns)
    annualized_return = 100*((1 + cumulative_return/100)^(1/n_years) - 1)
    annualized_vol = sd(random_portfolio_returns$portfolio_returns)*(52^0.5)
    sharpe = annualized_return / annualized_vol
    return_vector = append(return_vector, annualized_return)
    vol_vector = append(vol_vector, annualized_vol)
    sharpe_vector = append(sharpe_vector, sharpe)
    #print(paste("Asset weights:",asset_weights))
    #print(paste("Anualized return:",annualized_return))
    #print(paste("Annualized vol:",annualized_vol))
  }
  g = ggplot(data = data.frame(vol_vector, return_vector, sharpe_vector),
             aes(x = vol_vector, y = return_vector, color = sharpe_vector)) +
    scale_color_gradient(low = "red", high = "blue", name = "Sharpe Ratio\n(Return/Risk)") +
    ggtitle("Efficient Market Frontier") +
    xlab("Annualized Vol (%)") +
    ylab("Annualized Return (%)") +
    theme(plot.title = element_text(hjust=0.5)) + geom_vline(xintercept=target_vol) +
    geom_point()
  print(g)
}
You can try something like the following which uses colnames() to extract the new choices, and then updates the checkboxGroupInput with updateCheckboxGroupInput():
server <- function(input, output, session) {
  # Read the data once per session - this step might be better to
  # put in a `global.R` file
  source_file <- read.csv("source_file.csv")
  # Column names we want to show - all except `Date`
  opts <- setdiff(colnames(source_file), "Date")
  # Update your checkboxGroupInput:
  updateCheckboxGroupInput(
    session, "portfolio_selection", choices = opts
  )
  # Rest of app after this point --------------------------------------
}

Filtering a torch dataset in R

I'm trying to follow along with the book "Deep Learning with PyTorch". I am using the new R packages torch and torchvision.
On page 173, section 7.2.1 I'm just not sure how to filter this dataset to include only labels 1 and 3 (corresponding to the 0 and 2 in the book).
This is my code. I'd like to know how to filter transformed_cifar10 as the book's code does: keep only the samples whose transformed_cifar10$y label is 1 or 3, and then remap {1, 3} to {1, 2}.
library(dplyr)
library(torch)
library(torchvision)
data_path <- "./ch7/data" # need to change this?
train_transforms <- function (img) {
  img %>%
    transform_to_tensor() %>%
    transform_normalize(mean = c(0.4915, 0.4823, 0.4468),
                        std = c(0.2470, 0.2435, 0.2616))
}
transformed_cifar10 <- cifar10_dataset(data_path,
                                       train = TRUE,
                                       download = TRUE,
                                       transform = train_transforms)
This is the python code in the book:
# In[5]:
label_map = {0: 0, 2: 1}
class_names = ['airplane', 'bird']
cifar2 = [(img, label_map[label])
          for img, label in cifar10
          if label in [0, 2]]
First I thought of trying something like this but clearly it doesn't work...
tensor_cifar10[tensor_cifar10$y == 1]
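To pin down the behavior being asked for, here is the book's filter-and-remap step run on a toy dataset in plain Python, using the 1-based labels described above. The `filter_and_remap` helper and the string "images" are stand-ins invented for this sketch; it shows the expected result, not how to do it with the R torch API.

```python
def filter_and_remap(dataset, label_map):
    """Keep only samples whose label is a key of label_map, and relabel them."""
    return [(img, label_map[label]) for img, label in dataset if label in label_map]


# Toy stand-in: strings instead of image tensors, labels 1-based as in R.
toy_cifar10 = [("img_a", 1), ("img_b", 2), ("img_c", 3), ("img_d", 10), ("img_e", 1)]

# Keep labels 1 and 3, then remap {1, 3} to {1, 2}.
cifar2 = filter_and_remap(toy_cifar10, {1: 1, 3: 2})
print(cifar2)
```

Note that the selection works on (image, label) pairs, so both the images and the labels have to be subset together; indexing by the label vector alone is not enough.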

IndexError while plotting netCDF data using Basemap, matplotlib and contour command

I am trying to plot variables of pressure and wind velocities from the netcdf file of MERRA re-analysis data at a single level. I'm using the basemap module for the plotting and other necessary packages to get the data. Unfortunately, I end up with an error.
Here is my code:
from netCDF4 import Dataset as NetCDFFile
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
import os
import netCDF4
from netCDF4 import Dataset
from osgeo import gdal
os.chdir('F:\Atmospheric rivers\MERRA')
dirrec=str('F:\Atmospheric rivers\MERRA')
ncfile='MERRA2_400.inst6_3d_ana_Np.20180801.SUB.nc'
file=Dataset(ncfile,'r',format='NETCDF4')
lon=file.variables['lon'][:]
lat=file.variables['lat'][:]
time=file.variables['time'][:]
u=file.variables['U'][:]
v=file.variables['V'][:]
p=file.variables['PS'][:]
q=file.variables['QV'][:]
map = Basemap(projection='merc',llcrnrlon=70.,llcrnrlat=8.,urcrnrlon=90.,urcrnrlat=20.,resolution='i')
map.drawcoastlines()
map.drawstates()
map.drawcountries()
#map.drawlsmask(land_color='Linen', ocean_color='#CCFFFF', resolution ='i') # can use HTML names or codes for colors
#map.drawcounties()
parallels = np.arange(0,50,5.) # make latitude lines ever 5 degrees from 30N-50N
meridians = np.arange(-95,-70,5.) # make longitude lines every 5 degrees from 95W to 70W
map.drawparallels(parallels,labels=[1,0,0,0],fontsize=10)
map.drawmeridians(meridians,labels=[0,0,0,1],fontsize=10)
x,y= np.meshgrid(lon-180,lat) # for this dataset, longitude is 0 through 360, so you need to subtract 180 to properly display on map
#x,y = map(lons,lats)
clevs = np.arange(960,1040,4)
cs = map.contour(x,y,p[0,:,:]/100,clevs,colors='blue',linewidths=1.)
plt.show()
plt.savefig('Pressure.png')
Error
%run "F:/Atmospheric rivers/MERRA/aosc.py"
C:\Users\Pavilion\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\mpl_toolkits\basemap\__init__.py:3505: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
b = ax.ishold()
C:\Users\Pavilion\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\mpl_toolkits\basemap\__init__.py:3570: MatplotlibDeprecationWarning: axes.hold is deprecated.
See the API Changes document (http://matplotlib.org/api/api_changes.html)
for more details.
ax.hold(b)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
F:\Atmospheric rivers\MERRA\aosc.py in <module>()
36 #x,y = map(lons,lats)
37 clevs = np.arange(960,1040,4)
---> 38 cs = map.contour(x,y,p[0,:,:]/100,clevs,colors='blue',linewidths=1.)
39
40
C:\Users\Pavilion\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\mpl_toolkits\basemap\__init__.py in with_transform(self, x, y, data, *args, **kwargs)
519 # convert lat/lon coords to map projection coords.
520 x, y = self(x,y)
--> 521 return plotfunc(self,x,y,data,*args,**kwargs)
522 return with_transform
523
C:\Users\Pavilion\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\mpl_toolkits\basemap\__init__.py in contour(self, x, y, data, *args, **kwargs)
3540 # only do this check for global projections.
3541 if self.projection in _cylproj + _pseudocyl:
-> 3542 xx = x[x.shape[0]/2,:]
3543 condition = (xx >= self.xmin) & (xx <= self.xmax)
3544 xl = xx.compress(condition).tolist()
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
So, in summary the error turns out to be
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
How can I fix this error?
I guess you are not using the correct values for x and y to make the plot. Basemap's contour expects x, y and z to all be 2D arrays. In your case, x and y are most likely vectors of longitude and latitude values. In addition, you should convert the longitude and latitude to figure coordinates based on the map projection.
Can you try:
lons,lats = np.meshgrid(lon-180.,lat);
x,y = map(lons,lats)
cs = map.contour(x,y,p[0,:,:]/100,clevs,colors='blue',linewidths=1.)
The second line should convert your longitude and latitude values to the correct x and y values for your selected projection.
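It is also worth noting where the traceback actually stops: at `xx = x[x.shape[0]/2,:]` inside Basemap's own contour method. In Python 3, `/` always returns a float, and a float is not a valid index, which matches the IndexError message. The sketch below reproduces the same mistake with a plain list (a stand-in for the coordinate grid, chosen so the example needs no third-party packages).

```python
rows = [[0, 1], [2, 3], [4, 5], [6, 7]]  # stand-in for a 2D coordinate grid

middle_float = len(rows) / 2   # true division: 2.0, a float in Python 3
middle_int = len(rows) // 2    # floor division: 2, an int

try:
    rows[middle_float]
    float_index_failed = False
except TypeError:
    # Lists raise TypeError here; NumPy arrays raise the IndexError
    # shown in the traceback instead.
    float_index_failed = True

print(float_index_failed, rows[middle_int])
```

If that is what is happening, the coordinate conversion alone may not remove the error, and a Basemap release that uses floor division on that line may be needed. I have not checked which release that is, so treat this as a lead to verify rather than a confirmed fix.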
