In Stata I can easily add bands in the background, e.g. to signal a period of recession:
clear
set obs 2
gen year = 1990
replace year = 2000 if _n == 2
gen x = 0
replace x = 1 if _n == 2
twoway ///
(scatteri 1 1995 1 1996, bcolor(gs10) recast(area) lwidth(none)) ///
(line x year)
The result is an increasing line with a background vertical band:
In Julia with Gadfly, the best I could find was contrived:
using Gadfly, DataFrames, Colors
df = DataFrame(year = [1990; 2000], x = [0; 1], color = [1; 1])
x_shade = [1995 1995 1996 1996]
y_shade = [0 1 1 0]
theme = Theme(
discrete_highlight_color = u -> ARGB(1, 1, 1, 0),
default_color = colorant"grey")
p = plot(
layer(
x = x_shade,
y = y_shade,
Geom.polygon(preserve_order = true, fill = true),
order = 1
),
layer(df,
y = "x",
x = "year",
color = "color",
Geom.line,
order = 2
),
theme
)
The result is similar to Stata:
To remove the stroke, I gave the theme a function that returns a transparent white (as suggested in this thread). I was unable to set the fill of the band, so I set the default color to grey and added a dummy field color to change the color of the line plot from grey to another color. I also gave the y-coordinates of the vertical band from the maximum and minimum of the data, which may not coincide with the maximum and minimum of the viewport.
Does someone know of a better way?
With Plots.jl you can just add a vspan, e.g.:
using Plots
vspan([1995, 1996], linecolor = :grey, fillcolor = :grey)
plot!(df.year, df.x)
This would require non-trivial work.
You could base it on Geom.vline: you would need to create something like a VBandGeometry (analogous to VLineGeometry) and specify how to build the band (as in render(::VLineGeometry)), getting the limits right at the Compose level.
Following the discussion on the GitHub issue, one contributor suggested Geom.rect, which requires adding a minimum and a maximum for the y axis:
using Gadfly, DataFrames
df = DataFrame(year = [1990; 2000], x = [0; 1], color = [1; 1])
recessions = DataFrame(peaks = [1995],
troughs = [1996],
ymin = minimum(df.x),
ymax = maximum(df.x))
plot(
recessions,
xmin = :peaks,
xmax = :troughs,
ymin = :ymin,
ymax = :ymax,
Geom.rect,
Theme(default_color = colorant"grey"),
layer(df,
y = "x",
x = "year",
Geom.line,
order = 2,
Theme(default_color = colorant"black")
)
)
Related
There are other variations of this question, such as:
R: place geom_text() relative to plot borders rather than fixed position on the plot
ggplot2 annotate layer position in R
Position ggplot text in each corner
In my opinion, these do not solve the general problem. The first simply pre-calculates the x and y ranges so that proportions can be used. The other two use the "trick" that one can pass +/- Inf to position text in a given corner.
Here are two improvements I think would make for a more generalized solution:
allow arbitrary positioning of a label via relative positioning
works with variables calculated on the fly via dplyr (rules out pre-calculating ranges/ratios)
For sample data:
data.frame(
x = runif(100, min = sample(0:50, 1), max = sample(50:1000, 1)),
y = runif(100, min = sample(0:1000, 1), max = sample(1000:10000, 1))
) %>%
mutate(z = x + y) %>%
# code here to plot and put an annotation at e.g. x = 0.95, y = 0.1, relative to plot limits
I've been wrestling with this today and had a possible improvement to existing answers, leveraging some additional learning about how one can access the data from within the ggplot() call.
I found (see 1 and 2) that by surrounding the ggplot call in {} and passing . as the data argument, one can continue to refer to . throughout the call. This enables:
library(ggplot2)
library(dplyr)
pos_x <- 0.95
pos_y <- 0.1
data.frame(
x = runif(100, min = sample(0:50, 1), max = sample(50:1000, 1)),
y = runif(100, min = sample(0:1000, 1), max = sample(1000:10000, 1))
) %>%
mutate(z = x + y) %>% {
ggplot(., aes(x = x, y = z)) + geom_point() +
annotate(geom = "text", label = "some label",
x = min(.$x) + pos_x * diff(range(.$x)),
y = min(.$z) + pos_y * diff(range(.$z)),
hjust = 1, vjust = 1) +
scale_x_continuous(limits = range(.$x)) +
scale_y_continuous(limits = range(.$z))
}
You can re-run this and observe the plot label stay fixed even as the x/y axis ranges change significantly. For some improvement opportunities:
the lower axis limit varies with the data so just using y = number_close_to_zero*max(.$y) could be risky if min(.$y) is too high. For this reason, I manually specified the axis limits
similarly, for this reason the position isn't exact between plots if you just do pos_x * max(.$x), so I used min(.$x) + pos_x * diff(range(.$x)) instead
hjust and vjust aren't automatic; they need to be tweaked depending on the desired label location
it would be nice to automagically get the variable used for x/y vs. having to use the column name. In other words, if I wanted to change to aes(..., y = y), I wouldn't have to change instances of .$z to .$y.
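The arithmetic behind the relative positioning does not depend on R. Below is a minimal Python sketch (not part of the ggplot2 solution; the helper name `relative_to_data` is hypothetical) of the mapping that `min(.$x) + pos_x * diff(range(.$x))` performs, assuming the axis limits equal the data range:

```python
def relative_to_data(values, frac):
    """Map a relative position (0 = axis minimum, 1 = axis maximum) to data units.

    Assumes the axis limits are exactly the range of the data.
    """
    lo, hi = min(values), max(values)
    return lo + frac * (hi - lo)


# Two data sets with very different ranges: the label lands at the same
# relative spot in each, even though the data coordinates differ.
print(relative_to_data([10, 20, 30], 0.95))
print(relative_to_data([1000, 5000, 9000], 0.1))
```

Using the minimum as the offset, rather than multiplying the maximum by a fraction, is what keeps the label at the same place when the lower limit is far from zero.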
I'm trying to create a heat map for an OD matrix, but I wanted to scale the rows and columns by certain weights. Since these weights are constant across each category I would expect the plot would keep the rows and columns structure.
# Tidy OD matrix
df <- data.frame (origin = c(rep("A", 3), rep("B", 3),rep("C", 3)),
destination = rep(c("A","B","C"),3),
value = c(0, 1, 10, 5, 0, 11, 15, 6, 0))
# Weights
wdf <- data.frame(region = c("A","B","C"),
w = c(1,2,3))
# Add weights to the data (%>% and rename() come from dplyr).
library(dplyr)
plot_df <- df %>%
merge(wdf %>% rename(w_origin = w), by.x = 'origin', by.y = 'region') %>%
merge(wdf %>% rename(w_destination = w), by.x = 'destination', by.y = 'region')
Here's what the data looks like:
> plot_df
  destination origin value w_origin w_destination
1           A      A     0        1             1
2           A      C    15        3             1
3           A      B     5        2             1
4           B      A     1        1             2
5           B      B     0        2             2
6           B      C     6        3             2
7           C      B    11        2             3
8           C      A    10        1             3
9           C      C     0        3             3
However, when passing the weights as width and height in the aes() I get this:
ggplot(plot_df,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value),
color = 'black')
It seems to be working for the size of the columns (width), but not quite, because the proportions are not right. And the rows are all over the place and not aligned.
I'm only using geom_tile because I could pass height and width as aesthetics, but I accept other suggestions.
The issue is that your tiles are overlapping. The reason is that while you can pass the widths and heights as aesthetics, geom_tile will not adjust the x and y positions of the tiles for you. As you are mapping a discrete variable on x and y, your tiles are centred on an equidistant grid. In your case the tile centres are at 1, 2 and 3. The tiles are then drawn around these positions with the specified width and height.
This could be easily seen by adding some transparency to your plot:
library(ggplot2)
library(dplyr)
ggplot(plot_df,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value), color = "black", alpha = .2)
To achieve your desired result you have to manually compute the x and y positions according to the desired widths and heights to prevent the boxes from overlapping. To this end you could switch to a continuous scale and set the desired breaks and labels via scale_x/y_continuous:
breaks <- wdf %>%
mutate(cumw = cumsum(w),
pos = .5 * (cumw + lag(cumw, default = 0))) %>%
select(region, pos)
plot_df <- plot_df %>%
left_join(breaks, by = c("origin" = "region")) %>%
rename(y = pos) %>%
left_join(breaks, by = c("destination" = "region")) %>%
rename(x = pos)
ggplot(plot_df,
aes(x = x,
y = y)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value), color = "black") +
scale_x_continuous(breaks = breaks$pos, labels = breaks$region, expand = c(0, 0.1)) +
scale_y_continuous(breaks = breaks$pos, labels = breaks$region, expand = c(0, 0.1))
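To make the position arithmetic explicit, here is a small Python sketch (a language-neutral illustration, not part of the ggplot2 answer; `tile_centers` is a hypothetical helper) of the midpoint rule used for `pos`: each tile's centre is the average of the cumulative weight before and after it.

```python
from itertools import accumulate


def tile_centers(weights):
    """Centre of each tile when tiles of the given sizes are laid edge to edge from 0."""
    ends = list(accumulate(weights))   # far edge of each tile
    starts = [0] + ends[:-1]           # near edge of each tile
    return [0.5 * (s + e) for s, e in zip(starts, ends)]


# Weights 1, 2, 3 for regions A, B, C, as in wdf.
print(tile_centers([1, 2, 3]))  # [0.5, 2.0, 4.5]
```

With these centres and the weights as widths, the tiles cover [0, 1], [1, 3] and [3, 6], so they touch but never overlap.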
So I think I have a partial solution for you. After playing around with geom_tile, it appears that the order of your data frame matters when you are using height and width.
Here is some example code I came up with off of yours (run your code first). I converted your data_frame to a tibble (part of dplyr) to make it easier to sort by a column.
# Converted your dataframe to a tibble dataframe
plot_df_tibble = tibble(plot_df)
# Sorted your dataframe by your w_origin column:
plot_df_tibble2 = plot_df_tibble[order(plot_df_tibble$w_origin),]
# Plotted the sorted data frame:
ggplot(plot_df_tibble2,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value),
color = 'black')
And got this plot:
I should note that if you run the converted tibble before you sort that you get the same plot you posted.
It seems like the height and width arguments may not be fully developed for this portion of geom_tile, as I feel that the order of the df should not matter.
Cheers
When plotting an ellipse with ggplot, is it possible to constrain the ellipse to values that are actually possible?
For example, the following reproducible code and data plot Ele vs. Var for two species. Var is a positive variable and cannot be negative. Nonetheless, negative values are included in the resulting ellipse. Is it possible to bound the ellipse by 0 on the x-axis (using ggplot)?
More specifically, I am picturing a flat edge with the ellipses truncated at 0 on the x-axis.
library(ggplot2)
set.seed(123)
df <- data.frame(Species = rep(c("BHS", "MTG"), each = 100),
Ele = c(sample(1500:3000, 100), sample(2500:3500, 100)),
Var = abs(rnorm(200)))
ggplot(df, aes(Var, Ele, color = Species)) +
geom_point() +
stat_ellipse(aes(fill = Species), geom="polygon",level=0.95,alpha=0.2)
You could edit the default stat to clip points to a particular value. Here we change the basic stat so that x values less than 0 are set to 0:
StatClipEllipse <- ggproto("StatClipEllipse", Stat,
required_aes = c("x", "y"),
compute_group = function(data, scales, type = "t", level = 0.95,
segments = 51, na.rm = FALSE) {
xx <- ggplot2:::calculate_ellipse(data = data, vars = c("x", "y"), type = type,
level = level, segments = segments)
# clip negative x values to 0 (base R, so dplyr is not required)
xx$x <- pmax(xx$x, 0)
xx
}
)
Then we have to wrap it in a ggplot stat that is identical to stat_ellipse except that it uses our custom Stat object:
stat_clip_ellipse <- function(mapping = NULL, data = NULL,
geom = "path", position = "identity",
...,
type = "t",
level = 0.95,
segments = 51,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatClipEllipse,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
type = type,
level = level,
segments = segments,
na.rm = na.rm,
...
)
)
}
Then you can use it to make your plot:
ggplot(df, aes(Var, Ele, color = Species)) +
geom_point() +
stat_clip_ellipse(aes(fill = Species), geom="polygon",level=0.95,alpha=0.2)
This was inspired by the source code for stat_ellipse.
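The clipping step itself is simple: every vertex of the ellipse polygon with a negative x is moved onto x = 0, which produces the flat edge. A minimal Python sketch of that idea (illustrative only; the function names and the axis-aligned ellipse are assumptions and do not reproduce what `calculate_ellipse` computes):

```python
import math


def ellipse_points(cx, cy, rx, ry, segments=51):
    """Vertices of an axis-aligned ellipse centred at (cx, cy)."""
    return [
        (cx + rx * math.cos(2 * math.pi * i / segments),
         cy + ry * math.sin(2 * math.pi * i / segments))
        for i in range(segments + 1)
    ]


def clip_x_at_zero(points):
    """Move any vertex with x < 0 onto the line x = 0 (the analogue of pmax(x, 0))."""
    return [(max(x, 0.0), y) for x, y in points]


original = ellipse_points(cx=0.5, cy=2000.0, rx=1.5, ry=800.0)
clipped = clip_x_at_zero(original)
print(min(x for x, _ in clipped))  # 0.0
```

Only the x coordinates change; the y coordinates of the clipped vertices are kept, so the truncated shape still spans the same vertical extent.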
Based on my comment above, I created a less-misleading option for visualization. This is ignoring the problem with y being uniformly distributed, since that's a somewhat less egregious problem than the heavily skewed x variable.
Both these options use the ggforce package, which is an extension of ggplot2, but just in case, I've also included the source for the particular function I used.
library(ggforce)
library(scales)
# power_trans <- function (n)
# {
# scales::trans_new(name = paste0("power of ", fractions(n)), transform = function(x) {
# x^n
# }, inverse = function(x) {
# x^(1/n)
# }, breaks = scales::extended_breaks(), format = scales::format_format(),
# domain = c(0, Inf))
# }
Option 1:
ggplot(df, aes(Var, Ele, color = Species)) +
geom_point() +
stat_ellipse(aes(fill = Species), geom="polygon",level=0.95,alpha=0.2) +
scale_x_sqrt(limits = c(-0.1,3.5),
breaks = c(0.0001,1:4),
labels = 0:4,
expand = c(0.00,0))
This option stretches the x-axis along a square-root transform, spreading out the points clustered near zero. Then it computes an ellipse over this new space.
Advantage: looks like an ellipse still.
Disadvantage: in order to get it to play nice and label the Var=0 point on the x axis, you have to use expand = c(0,0), which clips the limits exactly, and so requires a bit more fiddling with manual limits/breaks/labels, including choosing a very small value (0.0001) to be represented as 0.
Disadvantage: the x values aren't linearly distributed along the axis, which requires a bit more cognitive load when reading the figure.
Option 2:
ggplot(df, aes(sqrt(Var), Ele, color = Species)) +
geom_point() +
stat_ellipse() +
coord_trans(x = ggforce::power_trans(2)) +
scale_x_continuous(breaks = sqrt(0:4), labels = 0:4,
name = "Var")
This option plots the pre-transformed sqrt(Var) (notice the aes(...)). It then calculates the ellipses based on this new approximately normal value. Then it stretches out the x-axis so that the values of Var are once again linearly spaced, which distorts the ellipse in the same transformation.
Advantage: looks cool.
Advantage: values of Var are easy to interpret on the x-axis.
Advantage: you can see the density near Var=0 with the points and the wide flat end of the "egg" easily.
Advantage: the pointy end shows you how low the density is at those values.
Disadvantage: looks unfamiliar and requires explanation and additional cognitive load to interpret.
I am very new to reshape and rgl. To check that I had understood correctly, I am going through the code presented in http://www.r-bloggers.com/creating-3d-geographical-plots-in-r-using-rgl/ . It works with the dataset presented on the website but not with mine, and I do not understand why: all I get is a grey image.
This is a link to my dataset: http://www.mediafire.com/download/8ifpssvdyr665g7/sig.RData
As in the example, the first two columns are lat and long, and the third is a value.
Is this the best way to plot data which are not on a grid?
Any help is greatly appreciated.
library(rgl)
library(reshape)
rgl.clear(type = c("shapes"))
load("sig.RData")
calls = sig
calls=as.matrix(calls)
dimnames(calls) <- list(NULL, c("lat","long","total"))
calls<-as.data.frame(calls)
head(calls)
bin_size = 0.5
calls$long_bin = cut(calls$long, seq(min(calls$long), max(calls$long), bin_size))
calls$lat_bin = cut(calls$lat, seq(min(calls$lat), max(calls$lat), bin_size))
calls$total = log(calls$total) / 3 #need to do this to flatten out totals
calls = melt(calls[,3:5])
calls = cast(calls, lat_bin~long_bin, fun = sum, fill = 0)
calls = calls[,2:(ncol(calls)-1)]
calls = as.matrix(calls)
# simple black and white plot
x = (1: nrow(calls))
z = (1: ncol(calls))
rgl.surface(x, z, calls)
rgl.bringtotop()
rgl.pop()
# nicer colored plot
ylim <- range(calls)
ylen <- ylim[2] - ylim[1] + 1
col <- topo.colors(ylen)[ calls-ylim[1]+1 ]
x = (1: nrow(calls))
z = (1: ncol(calls))
rgl.bg(sphere=FALSE, color=c("black"), lit=FALSE)
rgl.viewpoint( theta = 300, phi = 30, fov = 170, zoom = 0.03)
rgl.surface(x, z, calls, color = col, shininess = 10)
rgl.bringtotop()
I would like to create a simple scatter plot in R or MATLAB involving two variables $x$ and $y$ which have errors associated with them, $\epsilon_x$ and $\epsilon_y$.
Instead of adding error bars, however, I was hoping to create a "shaded box" around each $(x,y)$ pair where the height of the box ranges from ($y - \epsilon_y$) to ($y + \epsilon_y$) and the width of the box ranges from ($x - \epsilon_x$) to ($x + \epsilon_x$).
Is this possible in R or MATLAB? If so, what package or code can I use to generate these plots? Ideally, I would like the package to also support asymmetric error bounds.
You could do it in matlab by creating the following function:
function errorBox(x,y,epsx,epsy)
%# make sure inputs are all column vectors
x = x(:); y = y(:); epsx = epsx(:); epsy = epsy(:);
%# define the corner points of the error boxes
errBoxX = [x-epsx, x-epsx, x+epsx, x+epsx];
errBoxY = [y-epsy, y+epsy, y+epsy, y-epsy];
%# plot the transparent error boxes
fill(errBoxX',errBoxY','b','FaceAlpha',0.3,'EdgeAlpha',0)
end
x, y, epsx and epsy can all be vectors.
Example:
x = randn(1,5); y = randn(1,5);
epsx = rand(1,5)/5;
epsy = rand(1,5)/5;
plot(x,y,'rx')
hold on
errorBox(x,y,epsx,epsy)
Result:
It's probably easier using ggplot2. First create some data:
set.seed(1)
dd = data.frame(x = 1:5, eps_x = rnorm(5, 0, 0.1), y = rnorm(5), eps_y = rnorm(5, 0, 0.1))
##Save space later
dd$xmin = dd$x - dd$eps_x
dd$xmax = dd$x + dd$eps_x
dd$ymin = dd$y - dd$eps_y
dd$ymax = dd$y + dd$eps_y
Then use the rectangle geom in ggplot2:
library(ggplot2)
ggplot(dd) +
geom_rect(aes( xmax = xmax, xmin=xmin, ymin=ymin, ymax = ymax))
gives the first plot. Of course, you don't need to use ggplot2, to get something similar in base graphics, try:
plot(0, 0, xlim=c(0.5, 5.5), ylim=c(-1, 1), type="n")
for(i in 1:nrow(dd)){
d = dd[i,]
polygon(c(d$xmin, d$xmax, d$xmax, d$xmin), c(d$ymin, d$ymin, d$ymax,d$ymax), col="grey80")
}
to get the second plot.
Here's how to do it using Matlab (with asymmetric intervals). Converting to symmetric ones should be trivial.
%# define some random data
x = rand(5,1)*10;y = rand(5,1)*10;
%# ex, ey have two columns for lower/upper bounds
ex = abs(randn(5,2))*0.3;ey=abs(randn(5,2));
%# create vertices, faces, for patches
%# lower bound is value - column 1, upper bound is value + column 2
vertx = [x-ex(:,1), x+ex(:,2), x+ex(:,2), x-ex(:,1)]';
verty = [y-ey(:,1), y-ey(:,1), y+ey(:,2), y+ey(:,2)]';
vertices = [vertx(:),verty(:)];
faces = bsxfun(@plus,[1 2 3 4],(0:4:(length(x)-1)*4)');
%# create patch
patch(struct('faces',faces,'vertices',vertices),'FaceColor',[0.5 0.5 0.5]);
%# add "centers" - note, the intervals are asymmetric
hold on, plot(x,y,'oy','MarkerFaceColor','r')
It's simple with the ggplot2 package in R.
# An example data frame
dat <- data.frame(x = 1:5, y = 5:1, ex = (1:5)/10, ey = (5:1)/10)
# Plot
library(ggplot2)
ggplot(dat) +
geom_rect(aes(xmin = x - ex, xmax = x + ex, ymin = y - ey, ymax = y + ey),
fill = "grey") +
geom_point(aes(x = x, y = y))
In the aes function inside geom_rect the size of the rectangle is defined by ex and ey around x and y.
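The question also asks about asymmetric error bounds. The rectangle corners only need separate lower and upper errors; a minimal Python sketch of that bookkeeping (the function and argument names are hypothetical, chosen to mirror the xmin/xmax/ymin/ymax aesthetics):

```python
def error_box(x, y, ex_lower, ex_upper, ey_lower, ey_upper):
    """Return rectangle bounds for one point with asymmetric errors."""
    return {
        "xmin": x - ex_lower,
        "xmax": x + ex_upper,
        "ymin": y - ey_lower,
        "ymax": y + ey_upper,
    }


# Symmetric errors are the special case where lower == upper.
box = error_box(x=2.0, y=4.0, ex_lower=0.5, ex_upper=1.0, ey_lower=0.25, ey_upper=0.75)
print(box)  # {'xmin': 1.5, 'xmax': 3.0, 'ymin': 3.75, 'ymax': 4.75}
```

In the geom_rect call, the same idea amounts to mapping xmin and xmax to two different error columns instead of subtracting and adding the same one.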
Here's a MATLAB answer:
x = randn(1,5); y = 3-2*x + randn(1,5);
ex = (.1+rand(1,5))/5; ey = (.2+rand(1,5))/3;
plot(x,y,'ro')
patch([x-ex;x+ex;x+ex;x-ex],[y-ey;y-ey;y+ey;y+ey],[.9 .9 .9],'facealpha',.2,'linestyle','none')