Processing statistics in Gadfly - julia

I want to extend the Gadfly package to match my own idiosyncratic preferences. However I am having trouble understanding how to use Gadfly's statistics in a way that allows for their output to be processed before plotting.
For example, say I want to use the x,y aesthetics produced by Stat.histogram. To add these to a plot, I understand I can include Stat.histogram as an argument in a layer(). But what do I do if I want to use Stat.histogram to calculate the x,y aesthetics, edit them using my own code, and then plot these edited aesthetics?
I'm looking for a function like load_aesthetics(layer(x=x, Stat.histogram)), or a field like layer(x=x, Stat.histogram).aesthetics.

You can create your own statistic. See https://github.com/GiovineItalia/Gadfly.jl/issues/894

Building off @bjarthur's answer, I wrote the function below.
"Return the aesthetics produced by a Gadfly Statistic object."
function process_statistic(statistic::Gadfly.StatisticElement,
                           input_aesthetics::Dict{Symbol,<:Any}
                           )
    # Check that all required aesthetics have been provided.
    required_aesthetics = Gadfly.input_aesthetics(statistic)
    for required_aesthetic in required_aesthetics
        if required_aesthetic ∉ keys(input_aesthetics)
            error("Aesthetic $(required_aesthetic) is required")
        end
    end
    # Create the aes object, which holds the input aesthetics.
    aes = Gadfly.Aesthetics()
    for (key, value) in input_aesthetics
        setfield!(aes, key, value)
    end
    # These need to be passed to the apply_statistic() function. I do
    # not understand them, and the below code might need to be edited
    # for this function to work in some cases.
    scales = Dict{Symbol, Gadfly.ScaleElement}()
    coord = Gadfly.Coord.Cartesian()
    # This function edits the aes object, filling it with the desired aesthetics.
    Gadfly.Stat.apply_statistic(statistic, scales, coord, aes)
    # Return the produced aesthetics in a dictionary.
    outputs = Gadfly.output_aesthetics(statistic)
    return Dict(output => getfield(aes, output) for output in outputs)
end
Example usage:
process_statistic(Stat.histogram(), Dict(:x => rand(100)))

Related

Add text to a ggpairs() scatterplot?

Dumb but maddening question: how can I add text labels to my scatterplot points in a ggpairs(...) plot? ggpairs(...) is from the GGally library. The normal geom_text(...) function doesn't seem to be an option, as it takes x,y arguments and ggpairs creates an NxN matrix of differently-styled plots.
Not showing data, but imagine I have a column called "ID" with id's of each point that's displayed in the scatterplots.
Happy to add data if it helps, but not sure it's necessary. And maybe the answer is simply that it isn't possible to add text labels to ggpairs(...)?
library(ggplot2)
library(GGally)
ggpairs(hwWrld[, c(2,6,4)], method = "pearson")
Note: Adding labels is for my personal reference. So no need to tell me it would look like an absolute mess. It will. I'm just looking to identify my outliers.
Thanks!
It is most certainly possible. Looking at the documentation for ?GGally::ggpairs, there are three arguments, upper, lower and diag, which the Details section of the documentation describes as follows:
Upper and lower are lists that may contain the variables 'continuous', 'combo', 'discrete' and 'na'. Each element of the list may be a function or a string
... (more description)
If a function is supplied as an option to upper, lower, or diag, it should implement the function api of function(data, mapping, ...){#make ggplot2 plot}. If a specific function needs its parameters set, wrap(fn, param1 = val1, param2 = val2) the function with its parameters.
Thus a way to "make a label" would be to overwrite the default value of a plot. For example if we wanted to write "hello world" in the upper triangle we could do something like:
library(ggplot2)
library(GGally)
#' Plot continuous upper function, by adding text to the standard plot
#' text is placed straight in the middle, over anything already residing there!
continuous_upper_plot <- function(data, mapping, text, ...){
  p <- ggally_cor(data, mapping, ...)
  if(!is.data.frame(text))
    text <- data.frame(text = text)
  lims <- layer_scales(p)
  p + geom_label(data = text, aes(x = mean(lims$x$range$range),
                                  y = mean(lims$y$range$range),
                                  label = text),
                 inherit.aes = FALSE)
}
ggpairs(iris, upper = list(continuous = wrap(continuous_upper_plot,
                                             text = 'hello world')))
with the end result being:
There are 3 things to note here:
I've decided to add the text in the function itself. If your text is part of your existing data, simply using the mapping (aes) argument when calling the function will suffice. And this is likely also better, as you are looking to add text to specific points.
If you have any additional arguments to a function (outside data and mapping) you will need to use wrap to add these to the call.
The function documentation specifically says that arguments should be data, mapping rather than the standard for ggplot2 which is mapping, data. As such for any of the ggplot functions a small wrapper switching their positions will be necessary to overwrite the default arguments for ggpairs.

How to plot the function 4(x)^2 = ((y)^2/(1-y))?

I want to plot the function
4(x)^2 = ((y)^2/(1-y));
how can I plot this?
--> 4*(x) = ((y^2)*(1-y)^-1)^0.5;
4*(x) = ((y^2)*(1-y)^-1)^0.5;
^^
Error: syntax error, unexpected =, expecting end of file
Since Scilab 6.1.0, plotimplicit() does it:
plotimplicit "4*x^2 = y^2/(1-y)"
xgrid()
It can't get much simpler. Result:
Well, you have to first create a function and for that you have to express one variable in terms of the other.
function x = f(y)
    // Positive branch of 4*x^2 = y^2/(1-y); elementwise operators allow vector input.
    x = sqrt(y.^2 ./ (1 - y)) / 2;
endfunction
Then you need to generate the input data (i.e., the points at which you want to evaluate the function). The right-hand side is only non-negative for y < 1, so the points must stay below 1:
ydata = linspace(-10, 0.99)
Now you push your input point through the function to get your output points
xdata = f(ydata)
Then, you can plot the pairs of x and y using:
plot(xdata, ydata)
Or even easier, without the intermediate step of generating the output data, you can simply do:
plot(f(ydata), ydata)
BTW. I find it strange that the function you are trying to plot is x in terms of y, usually, x is the input variable, but I hope you know what you are trying to accomplish.
Reference: https://www.scilab.org/tutorials/getting-started/plotting
Take care that y must lie in ]-inf, 1[:
y = linspace(-10, 0.99999, 1000);
x = sqrt(y.^2 ./ (1-y)) / 2;
clf; plot(y, x), plot(y, -x)
If x is a solution, -x is also a solution.
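For readers without Scilab, the same approach (solve for x, keep y strictly below 1, and emit both signs) can be sketched in plain Python. The function name curve_points and the default sampling range are illustrative assumptions, not part of the answers above. Note the division by 2, since 4*x^2 = (2*x)^2. The sketch only generates the points and leaves the drawing to whatever plotting library is at hand.

```python
import math

def curve_points(y_min=-10.0, y_max=0.999, n=200):
    """Sample points (x, y) on 4*x**2 == y**2 / (1 - y) for y < 1."""
    if y_max >= 1.0:
        raise ValueError("y must be strictly less than 1")
    points = []
    for i in range(n):
        y = y_min + (y_max - y_min) * i / (n - 1)
        x = math.sqrt(y * y / (1.0 - y)) / 2.0  # positive branch
        points.append((x, y))
        points.append((-x, y))  # mirror branch: -x solves the equation too
    return points
```

Each (x, y) pair can then be passed to a line or scatter plot, with the positive and negative branches drawn as separate curves.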

How to plot StatsBase.Histogram object in Julia?

I am using a package (LightGraphs.jl) in Julia, and it has a predefined histogram method that creates the degree distribution of a network g.
deg_hist = degree_histogram(g)
I want to make a plot of this, but I am new to plotting in Julia. The object returned is a StatsBase.Histogram, which has the following inner fields:
StatsBase.Histogram{Int64,1,Tuple{FloatRange{Float64}}}
edges: 0.0:500.0:6000.0
weights: [79143,57,32,17,13,4,4,3,3,2,1,1]
closed: right
Can you help me use this object to plot the histogram?
I thought this was already implemented, but I just added the recipe to StatPlots. If you check out master, you'll be able to do:
julia> using StatPlots, LightGraphs
julia> g = Graph(100,200);
julia> plot(degree_histogram(g))
For reference, the associated recipe that I added to StatPlots:
@recipe function f(h::StatsBase.Histogram)
    seriestype := :histogram
    h.edges[1], h.weights
end
Use the histogram fields .edges and .weights to plot it e.g.
using PyPlot, StatsBase
a = rand(1000); # generate something to plot
test_hist = fit(Histogram, a)
# line plot
plot(test_hist.edges[1][2:end], test_hist.weights)
# bar plot
bar(0:length(test_hist.weights)-1, test_hist.weights)
xticks(0:length(test_hist.weights), test_hist.edges[1])
or you could create/extend a plotting function adding a method like so:
function myplot(x::StatsBase.Histogram)
... # your code here
end
Then you will be able to call your plotting functions directly on the histogram object.
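Whatever plotting function is used, the body of such a method mostly has to turn the edges and weights into bar positions. A minimal language-neutral sketch of that step in Python, using the edges and weights printed in the question; the helper name bar_geometry is hypothetical:

```python
def bar_geometry(edges, weights):
    """Turn histogram bin edges and weights into (center, width, height) bars."""
    if len(edges) != len(weights) + 1:
        raise ValueError("expected one more edge than weights")
    bars = []
    for left, right, height in zip(edges[:-1], edges[1:], weights):
        bars.append(((left + right) / 2, right - left, height))
    return bars

# Values copied from the question: edges 0.0:500.0:6000.0 and the listed weights.
edges = [500.0 * i for i in range(13)]
weights = [79143, 57, 32, 17, 13, 4, 4, 3, 3, 2, 1, 1]
bars = bar_geometry(edges, weights)
```

Each tuple maps directly onto the center, width and height arguments that most bar-plot functions accept.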

How to only change parameters for "lower" plots in the ggpairs function from GGally package

I have the following example
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
axisLabels='show'
)
Resulting in a really nice figure:
But my problem is that in the real dataset I have too many points, so I would like to change the parameters for the point geom. I want to reduce the dot size and use a lower alpha value. However, I cannot do this with the "params" option, because it applies to all plots, not just the lower ones:
ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
params=c(alpha=1/10),
axisLabels='show'
)
resulting in this plot:
Is there a way to apply parameters to only "lower" plots - or do I have to use the ability to create custom plots as suggested in the topic How to adjust figure settings in plotmatrix?
In advance - thanks!
There doesn't seem to be any elegant way to do it, but you can bodge it by writing a function to get back the existing subchart calls from the ggally_pairs() object and then squeezing the params in before the last bracket. [Not very robust: it will only work if the graphs are already valid.]
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
g<-ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
axisLabels='show'
)
add_p<-function(g,i,params){
  side=length(g$columns) # get number of cells per side
  lapply(i,function(i){
    s<-as.character(g$plots[i]) # get existing call as a template
    l<-nchar(s)
    p<-paste0(substr(s,1,l-1),",",params,")") # append params before last bracket
    r<-(i-1)%/%side+1 # work out the row on the grid (1-based)
    c<-(i-1)%%side+1 # work out the column on the grid (1-based)
    array(c(p,r,c)) # return the sub-plot and position data
  })
}
rep_cells<-c(4,7,8)
add_params<-"alpha=0.3, size=0.1, color='red'"
ggally_data<-g$data # makes sure that the internal parameter picks up your data (it always calls its data 'ggally_data')
calls<-add_p(g,rep_cells,params=add_params) #call the function
for(i in 1:length(calls)){g<-putPlot(g,calls[[i]][1],as.numeric(calls[[i]][2]),as.numeric(calls[[i]][3]))}
g # call the plot

Save heatmap.2 in variable and plot again

I use heatmap.2 from gplots to make a heatmap:
library(gplots)
# some fake data
m = matrix(c(0,1,2,3), nrow=2, ncol=2)
# make heatmap
hm = heatmap.2(m)
When I do 'heatmap.2' directly I get a plot that I can output to a device. How can I make the plot again from my variable 'hm'? Obviously this is a toy example, in real life I have a function that generates and returns a heatmap which I would like to plot later.
There are several alternatives, although none of them are particularly elegant. It depends on if the variables used by your function are available in the plotting environment. heatmap.2 doesn't return a proper "heatmap" object, although it contains the necessary information for plotting the graphics again. See str(hm) to inspect the object.
If the variables are available in your environment, you could just re-evaluate the original plotting call:
library(gplots)
# some fake data (adjusted a bit)
set.seed(1)
m = matrix(rnorm(100), nrow=10, ncol=10)
# make heatmap
hm = heatmap.2(m, col=rainbow(4))
# Below fails if all variables are not available in the global environment
eval(hm$call)
I assume this won't be the case though, as you mentioned that you are calling the plot command from inside a function and I think you're not using any global variables. You could just re-construct the heatmap drawing call from the fields available in your hm-object. The problem is that the original matrix is not available, but instead we have a re-organized $carpet-field. It requires some tinkering to obtain the original matrix, as the projection has been:
# hm2$carpet = t(m[hm2$rowInd, hm2$colInd])
At least in the case when the data matrix has not been scaled, the below should work. Add extra parameters according to your specific plotting call.
func <- function(mat){
h <- heatmap.2(mat, col=rainbow(4))
h
}
# eval(hm2$call) does not work, 'mat' is not available
hm2 <- func(m)
# here hm2$carpet = t(m[hm2$rowInd, hm2$colInd])
# Finding the projection back can be a bit cumbersome:
revRowInd <- match(c(1:length(hm2$rowInd)), hm2$rowInd)
revColInd <- match(c(1:length(hm2$colInd)), hm2$colInd)
heatmap.2(t(hm2$carpet)[revRowInd, revColInd], Rowv=hm2$rowDendrogram, Colv=hm2$colDendrogram, col=hm2$col)
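The index juggling above is an inverse-permutation step: match(1:n, ind) finds where each original index ended up. A small Python sketch of the same recovery, using 0-based indices and nested lists instead of an R matrix; the names inverse_permutation and recover_matrix are hypothetical, and it assumes, as above, that the data were not scaled:

```python
def inverse_permutation(perm):
    """Return inv such that perm[inv[k]] == k (0-based analogue of match(1:n, perm))."""
    inv = [0] * len(perm)
    for position, value in enumerate(perm):
        inv[value] = position
    return inv

def recover_matrix(carpet, row_ind, col_ind):
    """Undo carpet = transpose(m[row_ind, col_ind]) and return m."""
    transposed = [list(col) for col in zip(*carpet)]  # t(carpet)
    rev_rows = inverse_permutation(row_ind)
    rev_cols = inverse_permutation(col_ind)
    return [[transposed[r][c] for c in rev_cols] for r in rev_rows]
```

Applying the inverse permutations to the transposed carpet puts every row and column back in its original position.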
Furthermore, I think you may be able to work your way to evaluating hm$call in the function's environment. Perhaps the with() function would be useful.
You could also make mat available by attaching it to the global environment, but I think this is considered bad practice, as too eager use of attach can result in problems. Notice that in my example every call to func creates the original plot.
I would do some functional programming:
create_heatmap <- function(...) {
plot_heatmap <- function() heatmap.2(...)
}
data = matrix(rnorm(100), nrow = 10)
show_heatmap <- create_heatmap(x = data)
show_heatmap()
Pass all of the arguments you need to send to plot_heatmap through the .... The outer function call sets up an environment in which the inner function looks first for its arguments. The inner function is returned as an object and is now completely portable. This should produce the exact same plot each time!
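The same closure pattern carries over to other languages. A minimal Python sketch follows, in which fake_heatmap is a hypothetical stand-in for a real plotting call and create_plot is an illustrative name; the arguments are captured once and replayed on every call:

```python
def create_plot(plot_fn, *args, **kwargs):
    """Return a zero-argument function that replays plot_fn(*args, **kwargs)."""
    def show():
        return plot_fn(*args, **kwargs)
    return show

# Hypothetical stand-in for a real plotting call; it just records what it drew.
drawn = []
def fake_heatmap(data, title="untitled"):
    drawn.append((title, len(data)))
    return title

show_heatmap = create_plot(fake_heatmap, [[0, 1], [2, 3]], title="demo")
show_heatmap()
show_heatmap()
```

Because the returned function carries its arguments with it, it can be stored, passed around, and called later without the original variables being in scope.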
