Plots.jl and multiple sided boxplots - julia

In Julia, I've managed to get a boxplot with the following minimal working code:
using Plots
using DataFrames
function boxplot_smaa_similarity(arr_nb_alternative::Vector{Int},
                                 arr_nb_montecarlo::Vector{Int},
                                 nb_criteria::Int, nb_simulations::Int)
    # Create and fill the DataFrame
    df = DataFrame(NbAlternative = Int[], NbMonteCarlo = Int[], Similarity = Float64[])
    for na in arr_nb_alternative
        @show na
        for mt in arr_nb_montecarlo
            println()
            println("...$mt")
            append!(df, (NbAlternative = ones(Int, nb_simulations) * na,
                         NbMonteCarlo = ones(Int, nb_simulations) * mt,
                         Similarity = rand(Float64, nb_simulations)))
        end
    end
    # Boxplot of the DataFrame data (df.col replaces the removed df[:col] syntax)
    p = Plots.boxplot(df.NbMonteCarlo,
                      df.Similarity,
                      group = df.NbAlternative,
                      ylims = (0.0, 1.1),
                      xlabel = "Nb Simulations Monte Carlo",
                      ylabel = "Similarity",
                      dpi = 500)
    # Save figure to path, do not hesitate to change path if necessary
    Plots.savefig("../output/plot_compare_SMAA-TRI-AD_crit$(nb_criteria)" *
                  "_nb_alternative_$(arr_nb_alternative[1])-$(arr_nb_alternative[end])" *
                  "_nb_MC$(arr_nb_montecarlo[1])-$(arr_nb_montecarlo[end]).png")
    return p
end
boxplot_smaa_similarity([50,100,150], [2,4,6,8,10], 5, 10)
However, the result is not satisfactory because the three boxplots overlap. Is there a fix within Plots.jl, or should I move to PyPlot or another Julia library?

Felipe's comment is correct: you should use StatsPlots.jl, which provides the statistical recipes for Plots.jl. It has a groupedboxplot recipe, which does not seem to be mentioned in the readme:
a = rand(1:5, 100)
b = rand(1:5, 100)
c = randn(100)
using StatsPlots
groupedboxplot(a, c, group = b, bar_width = 0.8)

Related

Percentage stacked barplot in Julia

I would like to create a percentage stacked barplot in Julia. In R we may do the following:
set.seed(7)
data <- matrix(sample(1:30,6), nrow=3)
colnames(data) <- c("A","B")
rownames(data) <- c("V1","V2","V3")
library(RColorBrewer)
cols <- brewer.pal(3, "Pastel1")
df_percentage <- apply(data, 2, function(x){x*100/sum(x,na.rm=T)})
barplot(df_percentage, col=cols, border="white", xlab="group")
Created on 2022-12-29 with reprex v2.0.2
I am now able to create the axis in percentages, but not to make it stacked and percentage for each stacked bar like above. Here is some reproducible code:
using StatsPlots
measles = [38556, 24472]
mumps = [20178, 23536]
chickenPox = [37140, 32169]
ticklabel = ["A", "B"]
foo = @. measles + mumps + chickenPox
my_range = LinRange(0, maximum(foo), 11)
groupedbar(
[measles mumps chickenPox],
bar_position = :stack,
bar_width=0.7,
xticks=(1:2, ticklabel),
yticks=(my_range, 0:10:100),
label=["measles" "mumps" "chickenPox"]
)
Output:
This is almost what I want. So I was wondering if anyone knows how to make a stacked percentage barplot like above in Julia?
You just need to change the maximum threshold of the LinRange to be fitted to the maximum value of bars (which is 1 in this case), and change the input data for plotting to be the proportion of each segment:
my_range = LinRange(0, 1, 11)
foo = @. measles + mumps + chickenPox
groupedbar(
[measles./foo mumps./foo chickenPox./foo],
bar_position = :stack,
bar_width=0.7,
xticks=(1:2, ["A", "B"]),
yticks=(my_range, 0:10:100),
label=["measles" "mumps" "chickenPox"],
legend=:outerright
)
If you want to have the percentages on each segment, then you can use the following function:
function percentages_on_segments(data)
first_phase = permutedims(data)[end:-1:1, :]
a = [0 0;first_phase]
b = accumulate(+, 0.5*(a[1:end-1, :] + a[2:end, :]), dims=1)
c = vec(b)
annotate!(
repeat(1:size(data, 1), inner=size(data, 2)),
c,
["$(round(100*item, digits=1))%" for item=vec(first_phase)],
:white
)
end
percentages_on_segments([measles./foo mumps./foo chickenPox./foo])
Note that [measles./foo mumps./foo chickenPox./foo] is the same data that I passed to the groupedbar function.

Getting a list of points (in x, y form) in a Julia graph

If I have a Julia plot with any number of points, would it be possible for me to get a list of all of the data points within the graph (using the Plots library)?
EDIT: I am working with GeoStats.jl to create temporal variograms, and I just want to calculate the error (using RMSE and MAE) of the model's fit. To do this, I think I have to compare the points on the model's curve with the original semivariogram. The code I currently have running is:
using GeoStats, Plots, DataFrames, CSV, Dates, MLJ
data_frame = CSV.read("C:/Users/VSCode/MINTS-Variograms/data/MINTS_001e06373996_IPS7100_2022_01_02.csv", DataFrame)
ms = [parse(Float64,x[20:26]) for x in data_frame[!,:dateTime]]
ms = string.(round.(ms,digits = 3)*1000)
ms = chop.(ms,tail= 2)
data_frame.dateTime = chop.(data_frame.dateTime,tail= 6)
data_frame.dateTime = data_frame.dateTime.* ms
data_frame.dateTime = DateTime.(data_frame.dateTime,"yyyy-mm-dd HH:MM:SS.sss")
ls_index = findall(x-> Millisecond(500)<x<Millisecond(1500), diff(data_frame.dateTime))
df = data_frame[ls_index, :]
#include calculation for average lag
#initialize georef data
𝒟 = georef((Z=df.pm2_5, ))
#empirical variogram - same thing as semivariogram
g = EmpiricalVariogram(𝒟, :Z, maxlag=300.)
plot(g, label = "")
γ = fit(Variogram, g)
plot!(γ, label = "")
hline!([γ.nugget], label = "")
hline!([γ.sill], label = "")
println("nugget: " * string(γ.nugget))
println("sill: " * string(γ.sill))

R print groups of data points in different colors

I'm doing some basic statistics in R and I'm trying to use a different color for each iteration of the loop. All the data points for i=1 should have the same color, all the data points for i=2 should have the same color, and so on. Ideally the colors would range from yellow to blue as i varies. (I already tried colorRamp and similar functions, but I didn't manage to get it working.)
Thanks for your help.
library(ggplot2)
#dput(thedata[,2])
#c(1.28994585412464, 1.1317747077577, 1.28029504741834, 1.41172820353708,
#1.13172920065253, 1.40276516298315, 1.43679599499374, 1.90618019359643,
#2.33626745030772, 1.98362330686504, 2.22606615548188, 2.40238822720322)
#dput(thedata[,4])
#c(NA, -1.7394747097211, 2.93081902519318, -0.33212717268786,
#-1.78796119503752, -0.5080871442002, -0.10110379236627, 0.18977632798691,
#1.7514277696687, 1.50275797771879, -0.74632159611221, 0.0978774103243802)
#OR
#dput(thedata[,c(2,4)])
#structure(list(LRUN74TTFRA156N = c(1.28994585412464, 1.1317747077577,
#1.28029504741834, 1.41172820353708, 1.13172920065253, 1.40276516298315,
#1.43679599499374, 1.90618019359643, 2.33626745030772, 1.98362330686504,
#2.22606615548188, 2.40238822720322), SELF = c(NA, -1.7394747097211,
#2.93081902519318, -0.33212717268786, -1.78796119503752, -0.5080871442002,
#-0.10110379236627, 0.18977632798691, 1.7514277696687, 1.50275797771879,
#-0.74632159611221, 0.0978774103243802)), row.names = c(NA, 12L
#), class = "data.frame")
x1=1
xn=x1+3
plot(0,0,col="white",xlim=c(0,12),ylim=c(-5,7.5))
for(i in 1:3){
y=thedata[x1:xn,4]
x=thedata[x1:xn,2]
reg<-lm(y~x)
points(x,y,col=colors()[i])
abline(reg,col=colors()[i])
x1=x1+4
xn=x1+3
}
The basic idea of colorRamp and colorRampPalette is that they are functionals - they are functions that return functions.
From the help page:
colorRampPalette returns a function that takes an integer argument (the required number of colors) and returns a character vector of colors (see rgb) interpolating the given sequence (similar to heat.colors or terrain.colors).
So, we'll get a yellow-to-blue palette function from colorRampPalette, and then we'll give it the number of colors we want along that ramp to actually get the colors:
# create the palette function
my_palette = colorRampPalette(colors = c("yellow", "blue"))
# test it out, see how it works
my_palette(3)
# [1] "#FFFF00" "#7F7F7F" "#0000FF"
my_palette(5)
# [1] "#FFFF00" "#BFBF3F" "#7F7F7F" "#3F3FBF" "#0000FF"
# Now on with our plot
x1 = 1
xn = x1 + 3
# Set the number of iterations (number of colors needed) as a variable:
nn = 3
# Get the colors from our palette function
my_cols = my_palette(nn)
# type = 'n' means nothing will be plotted, no points, no lines
plot(0, 0, type = 'n',
xlim = c(0, 12),
ylim = c(-5, 7.5))
# plot
for (i in 1:nn) {
y = thedata[x1:xn, 2]
x = thedata[x1:xn, 1]
reg <- lm(y ~ x)
# use the ith color
points(x, y, col = my_cols[i])
abline(reg, col = my_cols[i])
x1 = x1 + 4
xn = x1 + 3
}
You can play with just visualizing the palette---try out the following code for different n values. You can also try out different options, maybe different starting colors. I like the results better with the space = "Lab" argument for the palette.
n = 10
my_palette = colorRampPalette(colors = c("yellow", "blue"), space = "Lab")
n_palette = my_palette(n)
plot(1:n, rep(1, n), col = n_palette, pch = 15, cex = 4)
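To make the "functions that return functions" point concrete, here is a minimal sketch contrasting colorRamp with colorRampPalette. The variable names are only illustrative; the one assumption worth stating is that colorRamp's returned function takes values in [0, 1] and gives back a matrix of RGB values on a 0-255 scale, which then has to be converted with rgb().

```r
# colorRamp returns a function of values in [0, 1]
ramp <- colorRamp(c("yellow", "blue"))

# One row per input value; the columns are red, green, blue on a 0-255 scale
rgb_matrix <- ramp(c(0, 0.5, 1))

# Convert the RGB rows into hex color strings usable in col =
ramp_cols <- rgb(rgb_matrix, maxColorValue = 255)

# colorRampPalette returns a function of the number of colors wanted
pal_cols <- colorRampPalette(c("yellow", "blue"))(3)

print(ramp_cols)
print(pal_cols)
```

colorRamp is the handier of the two when the color should follow a continuous value rather than an iteration count.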
Besides lacking a reproducible example, your question seems to rest on some misconceptions.
First, the function colors doesn't take a numeric argument, see ?colors. So if you want to fetch a different color in each iteration, you need to index its result, as in colors()[i]. The code should look something like this (in the absence of a reproducible example):
for (i in 20:30){
plot(1:10, 1:10, col = colors()[i])
}
Please bear in mind that the call of x1 and xn in your first and second lines inside the for loop, before defining them will cause an error too.

ggplot equivalent of gelman.plot MCMC diagnostic in r

I am creating a series of MCMC diagnostic plots in R using ggplot. I realize there is already a package available in gg for MCMC plotting, but much of this is for my own education as well as practical use. One thing I can't seem to figure out is how to generate the gelman.plot in a ggplot framework.
The gelman.diag function only returns a simple data point and I would like to recreate the complete running chart as shown in gelman.plot.
Is anyone familiar with the algorithmic structure of the gelman potential scale reduction factor and/or a means to port its output to ggplot?
Thank you!
You haven't provided a reproducible example, so I've used the example here. We need the object called combinedchains from that example. In order to avoid cluttering the answer, I've put the code for that at the end of this post.
Now we can run gelman.plot on combinedchains. This is the plot we want to duplicate:
library(coda)
gelman.plot(combinedchains)
To create a ggplot version, we need to get the data for the plot. I haven't done MCMC before, so I'm going to let gelman.plot generate the data for me. For your actual use case, you can probably just generate the appropriate data directly.
Let's look at what gelman.plot is doing: We can see the code for that function by typing the bare function name in the console. A portion of the function code is below. The ... show where I've removed sections of the original code for brevity. Note the call to gelman.preplot, with the output of that function stored in y. Note also that y is returned invisibly at the end. y is a list containing the data we need to create a gelman.plot in ggplot.
gelman.plot = function (x, bin.width = 10, max.bins = 50, confidence = 0.95,
transform = FALSE, autoburnin = TRUE, auto.layout = TRUE,
ask, col = 1:2, lty = 1:2, xlab = "last iteration in chain",
ylab = "shrink factor", type = "l", ...)
{
...
y <- gelman.preplot(x, bin.width = bin.width, max.bins = max.bins,
confidence = confidence, transform = transform, autoburnin = autoburnin)
...
return(invisible(y))
}
So, let's get the data that gelman.plot returns invisibly and store it in an object:
gp.dat = gelman.plot(combinedchains)
Now for the ggplot version. First, gp.dat is a list and we need to convert the various parts of that list into a single data frame that ggplot can use.
library(ggplot2)
library(dplyr)
library(reshape2)
df = data.frame(bind_rows(as.data.frame(gp.dat[["shrink"]][,,1]),
as.data.frame(gp.dat[["shrink"]][,,2])),
q=rep(dimnames(gp.dat[["shrink"]])[[3]], each=nrow(gp.dat[["shrink"]][,,1])),
last.iter=rep(gp.dat[["last.iter"]], length(gp.dat)))
For the plot, we'll melt df into long format, so that we can have each model parameter (the columns V1, V2, V3 of the shrink array) in a separate facet.
ggplot(melt(df, c("q","last.iter"), value.name="shrink_factor"),
       aes(last.iter, shrink_factor, colour=q, linetype=q)) +
  geom_hline(yintercept=1, colour="grey30", lwd=0.2) +
  geom_line() +
  facet_wrap(~variable, labeller= labeller(.cols=function(x) gsub("V", "Parameter ", x))) +
  labs(x="Last Iteration in Chain", y="Shrink Factor",
       colour="Quantile", linetype="Quantile") +
  scale_linetype_manual(values=c(2,1))
MCMC example code to create the combinedchains object (code copied from here):
library(coda)  # provides mcmc() and mcmc.list() used below
trueA = 5
trueB = 0
trueSd = 10
sampleSize = 31
x = (-(sampleSize-1)/2):((sampleSize-1)/2)
y = trueA * x + trueB + rnorm(n=sampleSize,mean=0,sd=trueSd)
likelihood = function(param){
a = param[1]
b = param[2]
sd = param[3]
pred = a*x + b
singlelikelihoods = dnorm(y, mean = pred, sd = sd, log = T)
sumll = sum(singlelikelihoods)
return(sumll)
}
prior = function(param){
a = param[1]
b = param[2]
sd = param[3]
aprior = dunif(a, min=0, max=10, log = T)
bprior = dnorm(b, sd = 5, log = T)
sdprior = dunif(sd, min=0, max=30, log = T)
return(aprior+bprior+sdprior)
}
proposalfunction = function(param){
return(rnorm(3,mean = param, sd= c(0.1,0.5,0.3)))
}
run_metropolis_MCMC = function(startvalue, iterations) {
chain = array(dim = c(iterations+1,3))
chain[1,] = startvalue
for (i in 1:iterations) {
proposal = proposalfunction(chain[i,])
probab = exp(likelihood(proposal) + prior(proposal) - likelihood(chain[i,]) - prior(chain[i,]))
if (runif(1) < probab){
chain[i+1,] = proposal
}else{
chain[i+1,] = chain[i,]
}
}
return(mcmc(chain))
}
startvalue = c(4,2,8)
chain = run_metropolis_MCMC(startvalue, 10000)
chain2 = run_metropolis_MCMC(startvalue, 10000)
combinedchains = mcmc.list(chain, chain2)
UPDATE: gelman.preplot is an internal coda function that's not directly visible to users. To get the function code, in the console type getAnywhere(gelman.preplot). Then you can see what the function is doing and, if you wish, construct your own function to return the appropriate diagnostic data in a form more suitable for ggplot.
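As a rough illustration of what such a hand-written function could look like, here is a minimal sketch of the basic (uncorrected) potential scale reduction factor, evaluated on growing windows of the chains. The function names psrf_basic and running_psrf, the iterations-by-chains matrix layout, and the toy data are all assumptions made for this example. coda's gelman.diag applies additional corrections and by default uses only the second half of each chain, so its values will differ somewhat from these.

```r
# Basic potential scale reduction factor for ONE parameter.
# `draws` is a numeric matrix: rows = iterations, columns = chains (assumed layout).
psrf_basic <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(draws))        # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled estimate of the target variance
  sqrt(var_hat / W)
}

# Running version: evaluate the factor on the first k iterations for a grid of k.
# Assumes at least 50 iterations are available.
running_psrf <- function(draws, n_bins = 50) {
  stopifnot(nrow(draws) >= 50)
  ends <- unique(round(seq(50, nrow(draws), length.out = n_bins)))
  data.frame(
    last.iter = ends,
    shrink_factor = vapply(
      ends,
      function(k) psrf_basic(draws[seq_len(k), , drop = FALSE]),
      numeric(1)
    )
  )
}

# Toy data: two independent, well-mixed "chains" for a single parameter
set.seed(1)
toy <- cbind(rnorm(2000), rnorm(2000))
res <- running_psrf(toy)
head(res)
```

The resulting data frame has one row per window and can be passed straight to ggplot(res, aes(last.iter, shrink_factor)) + geom_line(). For real chains, each parameter's draws would first have to be arranged into such an iterations-by-chains matrix.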

R quantmod chartSeries newTA chob - modify legend and axis (primary and secondary)

This is an advanced question.
I use my own layout for the chartSeries quantmod function, and I can even create my own newTA. Everything works fine. But ...
What I want to do but can't:
a) Manipulate the legend of each of the 3 charts:
- move it to another corner (from "topleft" to "topright")
- change the content
- remove it completely if needed
b) My indicator generates 2 legends (value1 and value2). Same as above: how can I modify them? How can I delete them?
c) Control the position and range of the y-axis (place it on the left or right, or even remove it). The same applies when there is a secondary axis on the graph.
d) Modify the main legend (the one in the top right, where the range of dates is written).
A working sample code:
# Load Library
library(quantmod)
# Get Data
getSymbols("SPY", src="yahoo", from = "2010-01-01")
# Create my indicator (30 values)
value1 <- rnorm(30, mean = 50, sd = 25)
value2 <- rnorm(30, mean = 50, sd = 25)
# merge with the first 30 rows of SPY
dataset <- merge(first(SPY, n = 30),
value1,
value2)
# **** data has now 8 columns:
# - Open
# - High
# - Low
# - Close
# - Volume
# - Adjusted
# - a (my indicator value 1)
# - b (my indicator value 2)
#
# create my TA function - This could also be achieved using the preFUN option of newTA
myTAfun <- function(a){
# input: a: function will receive whole dataset
a[,7:8] # just return my indicator values
}
# create my indicator to add to chartSeries
newMyTA <- newTA(FUN = myTAfun, # chartSeries will pass whole dataset,
# I just want to process the last 2 columns
lty = c("solid", "dotted"),
legend.name = "My_TA",
col = c("red", "blue")
)
# define my layout
layout(matrix(c(1, 2, 3), 3, 1),
heights = c(2.5, 1, 1.5)
)
# create the chart
chartSeries(dataset,
type = "candlesticks",
main = "",
show.grid = FALSE,
name = "My_Indicator_Name",
layout = NULL, # bypass internal layout
up.col = "blue",
dn.col = "red",
TA = c(newMyTA(),
addVo()
),
plot = TRUE,
theme = chartTheme("wsj")
)
I have tried using the legend command, and also the legend.name option (which gives very limited control over the output).
I have had a look at the chob object returned by chartSeries, but I can't figure out what to do next.
After some time learning a little bit more about R internals, S3 and S4 objects, and quantmod package, I've come up with the solution. It can be used to change anything in the graph.
A) If the legend belongs to a secondary indicator window:
Do not plot the chartSeries (set the option plot = FALSE) and keep the returned "chob" object.
One of the slots of the "chob" object holds a "chobTA" object with 2 params related to the legend. Set them to NULL.
Finally, call the hidden function chartSeries.chob.
In my case:
#get the chob object
my.chob <- chartSeries(dataset,
type = "candlesticks",
main = "",
show.grid = FALSE,
name = "My_Indicator_Name",
layout = NULL, # bypass internal layout
up.col = "blue",
dn.col = "red",
TA = c(newMyTA(),
addVo()
),
plot = FALSE, # do not plot, just get the chob
#plot = TRUE,
theme = chartTheme("wsj")
)
#if the legend is in a secondary window, and represents
#an indicator created with newTA(), this will work:
my.chob@passed.args$TA[[1]]@params$legend <- NULL
my.chob@passed.args$TA[[1]]@params$legend.name <- NULL
quantmod:::chartSeries.chob(my.chob)
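The slot manipulation used above can be tried in isolation on a toy S4 object. This is only a sketch: the class ToyTA and its contents are hypothetical stand-ins, not quantmod's actual chobTA definition. It shows that `@` reaches a slot and that assigning NULL to a list element inside the slot removes that element.

```r
library(methods)

# Hypothetical stand-in for a chobTA-like object: one slot holding a list of params
setClass("ToyTA", representation(params = "list"))

toy <- new("ToyTA",
           params = list(legend = "auto", legend.name = "My_TA", col = "red"))

# `@` accesses the slot; assigning NULL drops the element from the list
toy@params$legend <- NULL
toy@params$legend.name <- NULL

# Only the untouched parameter is left
print(names(toy@params))
```

Calling slotNames() and str() on the real chob object is a quick way to find which slot holds the parameters to change.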
B) In any other case, it is possible to modify "chartSeries.chob", "chartTA", "chartBBands", etc and then call chartSeries.chob
In my case:
fixInNamespace("chartSeries.chob", ns = "quantmod")
quantmod:::chartSeries.chob(my.chob)
In the editor that opens, it is enough to add "#" at the beginning of the lines that call legend().
That's it.
