holoviews overlay of quadmesh and points with time slider not working - multidimensional-array

I am trying to overlay hv.Points on top of hv.QuadMesh or hv.Image, where both of these plots have a time dimension.
import xarray as xr
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
#create data
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
precipitation = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
time = pd.date_range("2014-09-06", periods=3)
reference_time = pd.Timestamp("2014-09-05")
ds = xr.Dataset(
    data_vars=dict(
        temperature=(["x", "y", "time"], temperature),
        precipitation=(["x", "y", "time"], precipitation),
    ),
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(description="Weather related data."),
)
df = ds.temperature.to_dataframe().reset_index()
#create quadmesh plot
img = hv.Dataset(ds, ['lon', 'lat','time'],['temperature']).to(hv.QuadMesh,['lon', 'lat'],['temperature'])
img
#create points plot
pnt = hv.Points(df,vdims=['temperature','time'], kdims=['lon','lat']).groupby('time').opts(color='temperature',cmap='turbo')
pnt
The overlay of both the image and the points only shows the first plot:
img*pnt
If I remove the time component from the points plot, I can overlay the data, but then the slider does not change the point values with time:
img*hv.Points(df,vdims=['temperature','time'], kdims=['lon','lat']).opts(color='temperature',cmap='turbo')
[Image: image*points overlay with no time component]
Thank you for your help !!

Your problem is likely due to the transformation of the time objects when you cast to a dataframe.
If you manually try to select data for a particular time step via
dfSubset = df.where(df.time == ds.time[0]).dropna()
you will encounter an error.
In my experience, selecting the time step manually and then letting HoloViews decide how to handle the objects works better than grouping and then transforming via .to().
In your case this would be the following 3 lines:
dsSubset = ds.where(ds.time == ds.time[0], drop=True)
dfSubset = df.where(df.time == df.time[0]).dropna()
hv.QuadMesh(dsSubset, ['lon', 'lat'] ) * hv.Points(dfSubset, ['lon','lat'], vdims=['temperature']).opts(color='temperature',cmap='plasma',colorbar=True)
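The three lines above handle a single time step. As a rough sketch of how the same manual selection could be repeated for every time step, the snippet below splits a long-format frame into one subset per timestamp using only pandas. The toy frame and the helper name split_by_time are assumptions made for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

# Toy long-format frame shaped like the question's `df` (illustrative values):
# one row per (lon, lat, time) combination with a temperature value.
times = pd.date_range("2014-09-06", periods=3)
lon = [-99.83, -99.32, -99.79, -99.23]
lat = [42.25, 42.21, 42.63, 42.59]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "lon": np.repeat(lon, len(times)),
    "lat": np.repeat(lat, len(times)),
    "time": np.tile(times.values, len(lon)),
    "temperature": 15 + 8 * rng.standard_normal(len(lon) * len(times)),
})


def split_by_time(frame, time_col="time"):
    """Hypothetical helper: return {timestamp: rows recorded at that timestamp}."""
    return {
        pd.Timestamp(t): sub.reset_index(drop=True)
        for t, sub in frame.groupby(time_col)
    }


frames = split_by_time(df)
for t, sub in frames.items():
    print(t.date(), len(sub), "points")
```

Each subset could then be wrapped as in the answer, for example hv.Points(frames[t], ['lon', 'lat'], vdims=['temperature']). A dict of per-time overlays keyed by timestamp can in principle be handed to hv.HoloMap with kdims='time' to get a slider, but that last step is untested here.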

Related

Getting a list of points (in x, y form) in a Julia graph

If I have a Julia plot with any number of points, would it be possible for me to get a list of all of the data points within the graph (using the Plots library)?
EDIT: I am working with GeoStats.jl to create temporal variograms, and I just wanted to calculate the error (using RMSE and MAE) of the model's fit. To do this, I thought I had to compare the points on the model's curve with the original semivariogram. The code I currently have running is:
using GeoStats, Plots, DataFrames, CSV, Dates, MLJ
data_frame = CSV.read("C:/Users/VSCode/MINTS-Variograms/data/MINTS_001e06373996_IPS7100_2022_01_02.csv", DataFrame)
ms = [parse(Float64,x[20:26]) for x in data_frame[!,:dateTime]]
ms = string.(round.(ms,digits = 3)*1000)
ms = chop.(ms,tail= 2)
data_frame.dateTime = chop.(data_frame.dateTime,tail= 6)
data_frame.dateTime = data_frame.dateTime.* ms
data_frame.dateTime = DateTime.(data_frame.dateTime,"yyyy-mm-dd HH:MM:SS.sss")
ls_index = findall(x-> Millisecond(500)<x<Millisecond(1500), diff(data_frame.dateTime))
df = data_frame[ls_index, :]
#include calculation for average lag
#initialize georef data
𝒟 = georef((Z=df.pm2_5, ))
#empirical variogram - same thing as semivariogram
g = EmpiricalVariogram(𝒟, :Z, maxlag=300.)
plot(g, label = "")
γ = fit(Variogram, g)
plot!(γ, label = "")
hline!([γ.nugget], label = "")
hline!([γ.sill], label = "")
println("nugget: " * string(γ.nugget))
println("sill: " * string(γ.sill))
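For reference, the two error metrics mentioned in the question can be sketched as follows. This is written in Python rather than Julia and does not use GeoStats.jl. It assumes you already have the empirical semivariogram values and the fitted model evaluated at the same lags, as two equal-length sequences. The numbers are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical values, NOT taken from the question's data:
empirical = [0.10, 0.25, 0.40, 0.50]   # semivariogram at each lag
fitted = [0.12, 0.22, 0.43, 0.50]      # model evaluated at the same lags
print("RMSE:", rmse(empirical, fitted))
print("MAE:", mae(empirical, fitted))
```

The same two formulas carry over directly to Julia once the empirical and fitted values are available as vectors.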

How to plot scatter plot with original variables after scaling with K-means

I have scaled my original data X1:
scaler = StandardScaler()
X1_scaled = pd.DataFrame(scaler.fit_transform(X1),columns = X1.columns)
and then performed k-means clustering:
kmeans = KMeans(
    init="random",
    n_clusters=3,
    n_init=10,
    max_iter=300,
    random_state=123)
X1['label'] = kmeans.fit_predict(X1_scaled[['Wn', 'LL']])
# get centroids
centroids = kmeans.cluster_centers_
cen_x = [i[0] for i in centroids]
cen_y = [i[1] for i in centroids]
Now I would like to plot the original data (X1) and the centroids, but the centroids are scaled, so when I plot the results:
g = sns.scatterplot(x=X1.Wn, y=X1.LL, hue=X1.label,
                    data=X1, palette='colorblind',
                    legend='full')
g = sns.scatterplot(cen_x, cen_y, s=80, color='black')
the centroids are outside the clusters.
How can I plot the original data, with the groups and the centroids?
This is the image I got:
And this is what I would like to have, but with the original data and not the scaled data:
You can call scaler.inverse_transform() on the centroids. (Note that sns.scatterplot is an axes-level function and returns an ax, not a FacetGrid.)
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
X1 = pd.DataFrame({'Wn': np.random.rand(30) * 12, 'LL': np.random.rand(30) * 6})
scaler = StandardScaler()
X1_scaled = pd.DataFrame(scaler.fit_transform(X1), columns=X1.columns)
kmeans = KMeans(init="random",
                n_clusters=3,
                n_init=10,
                max_iter=300,
                random_state=123)
X1['label'] = kmeans.fit_predict(X1_scaled[['Wn', 'LL']])
# get centroids
centroids = scaler.inverse_transform(kmeans.cluster_centers_)
cen_x = [i[0] for i in centroids]
cen_y = [i[1] for i in centroids]
ax = sns.scatterplot(x='Wn', y='LL', hue='label',
                     data=X1, palette='colorblind',
                     legend='full')
sns.scatterplot(x=cen_x, y=cen_y, s=80, color='black', ax=ax)
plt.tight_layout()
plt.show()
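To see why inverse_transform puts the centroids back among the original points, here is a small NumPy-only sketch of the arithmetic that standard scaling and its inverse perform: subtract the column mean and divide by the population standard deviation, then undo it. The random data and the centroid values are stand-ins for illustration, not output of a real KMeans run.

```python
import numpy as np

rng = np.random.default_rng(123)
# Stand-in for the two original columns (Wn, LL)
X = np.column_stack([rng.random(30) * 12, rng.random(30) * 6])

mean = X.mean(axis=0)
std = X.std(axis=0)            # population std (ddof=0), as StandardScaler uses
X_scaled = (X - mean) / std    # what fit_transform computes per column

# Pretend these came from kmeans.cluster_centers_ (scaled space); made-up values
centroids_scaled = np.array([[-1.0, 0.5], [0.0, 0.0], [1.2, -0.8]])

# Inverse transform: multiply by the std and add the mean back
centroids_original = centroids_scaled * std + mean
print(centroids_original)
```

A centroid at (0, 0) in scaled space maps to the column means of the original data, which is why unscaled plotting of scaled centroids places them near the origin instead of inside the clusters.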

Holoviews tap stream of correlation heatmap and regression plot

I want to make a correlation heatmap for a DataFrame and a regression plot for each pair of variables. I have tried to read all the docs and am still having a very hard time connecting the two plots so that when I tap the heatmap, the corresponding regression plot shows up.
Here's some example code:
import holoviews as hv
from holoviews import opts
import seaborn as sns
import numpy as np
import pandas as pd
hv.extension('bokeh')
df = sns.load_dataset('tips')
df = df[['total_bill', 'tip', 'size']]
corr = df.corr()
heatmap = hv.HeatMap((corr.columns, corr.index, corr))\
    .opts(tools=['tap', 'hover'], height=400, width=400, toolbar='above')
m, b = np.polyfit(df.tip, df.total_bill, deg=1)
x = np.linspace(df.tip.min(), df.tip.max())
y = m*x + b
curve = hv.Curve((x, y))\
    .opts(height=400, width=400, color='red', ylim=(0, 100))
points = hv.Scatter((df.tip, df.total_bill))
hv.Layout((points * curve) + heatmap).cols(2)
I adjusted the relevant parts of the docs http://holoviews.org/reference/streams/bokeh/Tap.html with your code. Maybe this clears up your confusion.
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', width=90)
import seaborn as sns
# Declare dataset
df = sns.load_dataset('tips')
df = df[['total_bill', 'tip', 'size']]
# Declare HeatMap
corr = df.corr()
heatmap = hv.HeatMap((corr.columns, corr.index, corr))
# Declare Tap stream with heatmap as source and initial values
posxy = hv.streams.Tap(source=heatmap, x='total_bill', y='tip')
# Define function to compute the regression plot based on tap location
def tap_histogram(x, y):
    m, b = np.polyfit(df[x], df[y], deg=1)
    # span the tapped x variable rather than always using df.tip
    x_data = np.linspace(df[x].min(), df[x].max())
    y_data = m*x_data + b
    return hv.Curve((x_data, y_data), x, y) * hv.Scatter((df[x], df[y]), x, y)
tap_dmap = hv.DynamicMap(tap_histogram, streams=[posxy])
(heatmap + tap_dmap).opts(
    opts.Scatter(height=400, width=400, color='red', ylim=(0, 100), framewise=True),
    opts.HeatMap(tools=['tap', 'hover'], height=400, width=400, toolbar='above'),
    opts.Curve(framewise=True)
)
Two common problems we face while modeling are collinearity and nonlinearity. Collinearity can be visualized with a correlation heatmap, but it becomes hard to explore with a large number of variables/features. In the following application, you can hover the mouse over a cell to check the correlation coefficient between any two variables. When you tap, the scatter plot is updated with a second-degree fitted curve to reveal the nonlinearity between the two variables.
With the help of #doopler, I changed the code a little bit and share it here:
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
# generate random data
df = pd.DataFrame(data={'col_1': np.random.normal(5, 2, 100)})
df['col_2'] = df.col_1 + np.random.gamma(5, 2, 100)
df['col_3'] = df.col_1*2 + np.random.normal(0, 10, 100)
df['col_4'] = df.col_1**2 + np.random.normal(0, 10, 100)
df['col_5'] = np.sin(df.col_1)
df['col_6'] = np.cos(df.col_1)
corr = df.corr().abs()
# mask the upper triangle of the heatmap
corr.values[np.triu_indices_from(corr, 0)] = np.nan
heatmap = hv.HeatMap((corr.columns, corr.index, corr))\
    .opts(tools=['hover'], height=400, width=400, fontsize=9,
          toolbar='above', colorbar=False, cmap='Blues',
          invert_yaxis=True, xrotation=90, xlabel='', ylabel='',
          title='Correlation Coefficient Heatmap (absolute value)')
# define tap stream with heatmap as source
tap_xy = hv.streams.Tap(source=heatmap, x='col_1', y='col_4')
# calculate correlation plot based on tap
def tap_corrplot(x, y):
    # drop missing values if there are any
    df_notnull = df[[x, y]].dropna(how='any')
    # fit a 2nd degree line/curve
    m1, m2, b = np.polyfit(df_notnull[x], df_notnull[y], deg=2)
    # generate data to plot fitted line/curve
    x_curve = np.linspace(df[x].min(), df[x].max())
    y_curve = m1*x_curve**2 + m2*x_curve + b
    curve = hv.Curve((x_curve, y_curve), x, y)\
        .opts(color='#fc4f30', framewise=True)
    scatter = hv.Scatter((df[x], df[y]), x, y)\
        .opts(height=400, width=400, fontsize=9, size=5,
              alpha=0.2, ylim=(df[y].min(), df[y].max()),
              color='#30a2da', framewise=True,
              title='Correlation Plot (2nd degree fit)')
    return curve * scatter
# map tap in heatmap with correlation plot
tap_dmap = hv.DynamicMap(tap_corrplot, streams=[tap_xy])
layout = heatmap + tap_dmap
layout
In case that you need to run a Bokeh application:
from bokeh.server.server import Server
renderer = hv.renderer('bokeh')
app = renderer.app(layout)
server = Server({'/': app}, port=0)
server.start()
server.show('/')
The code works well with Jupyter Lab. If you use Jupyter Notebook, check this link.
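The second-degree fit inside the tap callback can be checked on its own, without HoloViews. The sketch below uses a noiseless, made-up quadratic to show that np.polyfit returns the coefficients with the highest power first, and that np.polyval evaluates them in the same way as the explicit m1*x**2 + m2*x + b expression used in the callback.

```python
import numpy as np

# Illustrative, noiseless quadratic: y = 2x^2 - x + 0.5
x = np.linspace(-3, 3, 25)
y = 2.0 * x**2 - 1.0 * x + 0.5

# polyfit returns [m1, m2, b], highest power first
m1, m2, b = np.polyfit(x, y, deg=2)

# Evaluate the fitted curve on a grid, two equivalent ways
x_curve = np.linspace(x.min(), x.max(), 50)
y_explicit = m1 * x_curve**2 + m2 * x_curve + b
y_polyval = np.polyval([m1, m2, b], x_curve)

print(m1, m2, b)
```

With real, noisy data the recovered coefficients will only approximate the underlying relationship, and any NaN values need to be dropped before fitting, as the callback does with dropna.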

LSTM Sequential Model, Predict future Values on M15 chart for one day

Hello Stackoverflow members,
I have built an LSTM Sequential model for Forex M15 values, specifically for the pair EURUSD, with typical_price as the price type.
Now, after setting up and training the model, I would like to predict (extrapolate) the typical_price for one future day.
In my dataset I took the data for one month (January 2017), from the 1st to the 30th, as the training and testing dataset (1920 values). Now I would like to extrapolate the prices for the 31st of January. I cannot work out what input data and shape the model expects in order to extrapolate from the last value of the 30th of January.
Can someone give me a hint or explain what the function model.predict() needs as input values?
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from subprocess import check_output
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
from sklearn.cross_validation import train_test_split
import time #helper libraries
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
from numpy import newaxis
from keras.metrics import mean_squared_error
from sklearn.model_selection import StratifiedKFold
import time
df = pd.read_csv('EURUSD15.csv')
df.columns = ['date','time','open','high','low','close','vol']
df['date']=df['date'].str.replace('.','-')
J = df[(df['date'] > '2017-01-01') & (df['date'] < '2017-01-30')]
J['timestamp'] = pd.to_datetime(J['date'].apply(str)+' '+J['time'])
J['tp']=((J['high']+J['low']+J['close'])/3)
EURUSD = J[['timestamp','open','high','low','close','vol','tp']]
df = EURUSD.drop(['timestamp','open','high','low','close','vol'], axis=1)
scaler = MinMaxScaler(feature_range=(0,1))
df = scaler.fit_transform(df)
def window_transform_series(series,window_size):
    # containers for input/output pairs
    dataX = []
    datay = []
    for i in range(window_size, len(series)):
        dataX.append(series[i - window_size:i])
        datay.append(series[i])
    # reshape
    dataX = np.asarray(dataX)
    dataX.shape = (np.shape(dataX)[0:2])
    datay = np.asarray(datay)
    datay.shape = (len(datay),1)
    return dataX,datay
window_size = 50
dataX,datay = window_transform_series(series = df, window_size = window_size)
train_test_split = int(np.ceil(2*len(datay)/float(3))) # set the split point
# partition the training set
# X_train = dataX[:train_test_split,:]
# y_train = datay[:train_test_split]
# partition the training set
X_train = dataX[:train_test_split,:]
y_train = datay[:train_test_split]
#keep the last chunk for testing
X_test = dataX[train_test_split:,:]
y_test = datay[train_test_split:]
# NOTE: to use keras's RNN LSTM module our input must be reshaped
X_train = np.asarray(np.reshape(X_train, (X_train.shape[0], window_size, 1)))
X_test = np.asarray(np.reshape(X_test, (X_test.shape[0], window_size, 1)))
import keras
np.random.seed(0)
#Build an RNN to perform regression on our time series input/output data
model = Sequential()
model.add(LSTM(5, input_shape=(window_size, 1)))
model.add(Dense(1))
optimizer = keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
# compile the model
model.compile(loss='mean_squared_error', optimizer=optimizer)
model.fit(X_train, y_train, epochs=500, batch_size=64, verbose=1)
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
# print out training and testing errors
training_error = model.evaluate(X_train, y_train, verbose=0)
print('training error = ' + str(training_error))
testing_error = model.evaluate(X_test, y_test, verbose=0)
print('testing error = ' + str(testing_error))
training error = 0.0001732897365647525
testing error = 0.00019586048660112955
%matplotlib inline
#plot original series
plt.plot(df, color = 'k')
# plot training set prediction
split_pt = train_test_split + window_size
plt.plot(np.arange(window_size,split_pt,1),train_predict,color = 'b')
# plot testing set prediction
plt.plot(np.arange(split_pt,split_pt + len(test_predict),1), test_predict,color ='r')
# pretty up graph
plt.xlabel('day')
plt.ylabel('(normalized) price of EURUSD')
plt.legend(['original series','training fit','testing fit'],loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
The input is supposed to be the open, highest and lowest prices and the volume, so that you can predict the closing price for some imaginary date, or you can call model.predict(X_test[30]). But one line in your code is strange: the line where you drop all your features. I wonder what your X_train[0] looks like.
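Regarding the input shape: the model in the question was built with input_shape=(window_size, 1), so a single prediction would need an array of shape (1, window_size, 1), that is, one batch entry holding the last window_size scaled values. One possible way to extrapolate a whole day is to feed each prediction back into the window. The sketch below shows only that bookkeeping with NumPy. The function name forecast and the dummy predictor are assumptions for illustration. predict_fn stands in for model.predict, and no trained model is involved.

```python
import numpy as np


def forecast(predict_fn, history, window_size, n_steps):
    """Recursively forecast n_steps values from the end of `history`.

    predict_fn stands in for model.predict: it receives an array of shape
    (1, window_size, 1) and returns an array of shape (1, 1).
    """
    window = np.asarray(history, dtype=float).reshape(-1)[-window_size:].copy()
    if window.shape[0] != window_size:
        raise ValueError("history is shorter than window_size")
    predictions = []
    for _ in range(n_steps):
        x = window.reshape(1, window_size, 1)        # (batch, timesteps, features)
        next_val = float(np.asarray(predict_fn(x)).reshape(-1)[0])
        predictions.append(next_val)
        window = np.append(window[1:], next_val)     # slide the window forward
    return np.array(predictions)


def dummy_predict(x):
    """Dummy predictor (NOT a trained model): mean of the window, shape (1, 1)."""
    return x.mean(axis=1)


scaled_series = np.linspace(0.0, 1.0, 200)           # stand-in for the scaled tp series
one_day = forecast(dummy_predict, scaled_series, window_size=50, n_steps=96)
print(one_day.shape)
```

Here 96 steps corresponds to one day of M15 bars, assuming 24 hours of trading with 4 bars per hour. With the real model, predict_fn would be model.predict, and the result would still be in scaled units, so it would need to go through scaler.inverse_transform(one_day.reshape(-1, 1)) to get prices. Errors compound when predictions are fed back in, so the later steps of such a forecast are less reliable.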

Interactive version of charts.PerformanceSummary()

I would like to create an interactive version of charts.PerformanceSummary() using rCharts.
This is my attempt so far...but am struggling to put it all together....
# Load xts and PerformanceAnalytics package
require(xts)
require(PerformanceAnalytics)
# Generate rtns data
set.seed(123)
X.stock.rtns <- xts(rnorm(1000,0.00001,0.0003), Sys.Date()-(1000:1))
Y.stock.rtns <- xts(rnorm(1000,0.00003,0.0004), Sys.Date()-(1000:1))
Z.stock.rtns <- xts(rnorm(1000,0.00005,0.0005), Sys.Date()-(1000:1))
rtn.obj <- merge(X.stock.rtns , Y.stock.rtns, Z.stock.rtns)
colnames(rtn.obj) <- c("x.stock.rtns","y.stock.rtns","z.stock.rtns")
# The below output is what we are aiming for
charts.PerformanceSummary(rtn.obj,lwd=1,main="Performance of stocks x,y and z")
# So this is what I have tried to do to replicate the data and try and generate graphs
# custom function to convert xts to data.frame
xts.2.df <- function(xts.obj){
  df <- ggplot2:::fortify(xts.obj)
  df[,1] <- as.character(df[,1])
  df
}
# calculating the data for the top and bottom graph
cum.rtn <- do.call(merge,lapply(seq(ncol(rtn.obj)),function(y){cumprod(rtn.obj[,y]+1)-1}))
dd.rtn <- do.call(merge,lapply(seq(ncol(rtn.obj)),function(y){Drawdowns(rtn.obj[,y])}))
# Loading rCharts package
require(devtools)
install_github('rCharts', 'ramnathv',ref='dev')
require(rCharts)
# creating the first cumulative return graph
m1 <- mPlot(x = "Index", y = c("x.stock.rtns","y.stock.rtns","z.stock.rtns"), type = "Line", data = xts.2.df(cum.rtn),
            pointSize = 0, lineWidth = 1)
# Top cumulative return graph
m1
# Creating the individual bar graphs that are to be shown when one line is hovered over
m.x <- mPlot(x = "Index", y = c("x.stock.rtns"), type="Bar",data = xts.2.df(rtn.obj))
m.y <- mPlot(x = "Index", y = c("y.stock.rtns"), type="Bar",data = xts.2.df(rtn.obj))
m.z <- mPlot(x = "Index", y = c("z.stock.rtns"), type="Bar",data = xts.2.df(rtn.obj))
# Creating the drawdown graph
m2 <- mPlot(x = "Index", y = c("x.stock.rtns","y.stock.rtns","z.stock.rtns"), type = "Line", data = xts.2.df(dd.rtn),
            pointSize = 0, lineWidth = 1)
m2
So there are few parts to the question:
How do you put three morris.js charts together so that they are linked?
Can you make bold the line that is being hovered over in the top graph (m1)?
How do you get the middle one (i.e. one of m.x, m.y, m.z) to change according to what is being hovered over, i.e. if hovering over stock z, then stock z's returns (m.z) show up in the middle?
Can you get it to make bold in the bottom graph, the same asset that is being made bold in the top graph?
Can you change the information that is being displayed to in the floating box to display some stats about the asset being hovered over?
How do you add axes labels?
How do you add an overall title?
BONUS: How do you integrate crossfilter.js into it so that a subset of time can be chosen...and all graphs get re-drawn?
Even if you can't answer all parts any help/comments/answers would be appreciated...
