I would like to make this plot:
Plot 1: The plot that I wanted
My data looks like this:
> head(ranges_example)
labels Minimum Maximum error
1 One -275 -240 1
2 Two -265 -210 1
3 Three -260 -215 1
4 Four -273 -230 1
5 Five NaN -200 1
6 Six NaN -240 1
But, alas, I had to make that plot in Illustrator by modifying the plot that I did make in R, this one:
Plot 2: The plot that I got
And I made it using geom_linerange, specifically:
ggplot() +
geom_linerange(data = ranges_example,
mapping=aes(x = labels, ymin = Minimum, ymax = Maximum,
lwd = 1, color = error, alpha = 0.5),
position = position_dodge(width = 1)) +
scale_y_continuous(c(-240, -300)) +
coord_flip()
Plot 2 is good enough for this once--it takes maybe 15 minutes to turn it into Plot 1 in Illustrator--but I'll probably need to make a good few more of these.
The reason why I don't just remove the position_dodge statement is that then it just blends the colors together, like this:
I need them to be their own, distinct colors so that it's easy to tell them apart. The different shades mean different things and I need to be able to easily distinguish between and alter them.
How can I create a plot that looks more like Plot 1 right out of the box?
library(tidyverse)

# Example data
ranges_example <- tribble(
  ~labels, ~Minimum, ~Maximum, ~error,
  "One", -275, -240, 1,
  "Two", -265, -210, 1,
  "One", -285, -215, 2,
  "Two", -275, -190, 2,
  "One", -300, -200, 3,
  "Two", -290, -180, 3)

# Sort so the widest ranges (largest error) are drawn first and the
# narrower ranges are drawn on top of them, with no dodging and no alpha
ggplot() +
  geom_linerange(data = ranges_example %>% arrange(-error),
                 mapping = aes(x = labels, ymin = Minimum, ymax = Maximum,
                               lwd = 1, color = error)) +
  scale_color_continuous(high = "lightgreen", low = "forestgreen") +
  coord_flip() +
  theme_classic()
I have data (from excel) with the y-axis as ranges (also calculated in excel) and the x-axis as cell counts and I would like to draw a horizontal line at a specific value in the range, like a reference line. I tried using geom_hline(yintercept = 450) but I am sure it is quite naive and does not work that way for a number in range. I wonder if there are any better suggestions for it :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you proceed: don't calculate those ranges in Excel. Let ggplot do it for you.
Say, your data is like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#> x
#>1 322.76123
#>2 57.46708
#>3 223.31943
#>4 498.91870
#>5 155.05416
#>6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
geom_histogram(aes(x = x),
boundary = 0,
binwidth = 50,
fill = "steelblue",
colour = "white") +
geom_vline(xintercept = 450, colour = "red", linetype = 2, size = 1) +
coord_flip()
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, use a numeric value of 9.5 which will put it between the top two values:
g + geom_hline(yintercept = 9.5, linetype = 2)
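Where the 9.5 comes from can be worked out rather than guessed. On a discrete axis the i-th level sits at position i, so with consecutive equal-width bins a numeric value can be converted to an axis position. The sketch below assumes bins of width 50 starting at 0, as in the example data frame; `value_to_position` is a hypothetical helper, not part of ggplot2.

```r
# Hypothetical helper: map a numeric value to a position on a discrete axis
# whose levels are consecutive, equal-width bins (bin i is centred at i).
value_to_position <- function(value, bin_width, first_bin_start = 0) {
  (value - first_bin_start) / bin_width + 0.5
}

pos_450 <- value_to_position(450, bin_width = 50)  # 9.5: edge between the 9th and 10th bins
pos_425 <- value_to_position(425, bin_width = 50)  # 9: centre of the "400-449.9" bin
```

The result can then be passed straight to the line layer, e.g. `g + geom_hline(yintercept = pos_450, linetype = 2)`.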
I want to plot a very simple boxplot like this in R:
desired graph
It is a log-link (Gamma distributed: jh_conc is a hormone concentration variable) Generalized linear model of a continuous dependent variable (jh_conc) for a categorical grouping variable (group: type of bee)
My script that I already have is:
> jh=read.csv("data_jh_titer.csv",header=T)
> jh
group jh_conc
1 Queens 6.38542714
2 Queens 11.22512563
3 Queens 7.74472362
4 Queens 11.56834171
5 Queens 3.74020100
6 Virgin Queens 0.06080402
7 Virgin Queens 0.12663317
8 Virgin Queens 0.08090452
9 Virgin Queens 0.04422111
10 Virgin Queens 0.14673367
11 Workers 0.03417085
12 Workers 0.02449749
13 Workers 0.02927136
14 Workers 0.01648241
15 Workers 0.02150754
fit1=glm(jh_conc~group,family=Gamma(link=log), data=jh)
ggplot(fit, aes(group, jh_conc))+
geom_boxplot(aes(fill=group))+
coord_trans(y="log")
the resulting plot looks like this:
My question is: what (geom) extensions can I use to split the y-axis and rescale the two parts differently? Also, how do I add the black circles (averages, which are calculated on a log scale and then back-transformed to the original scale) and the horizontal lines, which are significance levels based on post hoc tests performed on log-transformed data (**: p < 0.01, ***: p < 0.001)?
You can't create a broken numeric axis in ggplot2 by design, mainly because it visually distorts the data/differences being represented and is considered misleading.
You can however use scale_y_log10() + annotation_logticks() to help condense data across a wide range of values or better show heteroskedastic data. You can also use annotate to build out your p-value representation stars and bars.
Also you can easily grab information from a model using its named attributes; here we care about fit2$coef:
# make a zero intercept version for easy plotting
fit2 <- glm(jh_conc ~ 0 + group, family = Gamma(link = log), data = jh)
# extract relevant group means and use exp() to scale back
means <- data.frame(group = gsub("group", "",names(fit2$coef)), means = exp(fit2$coef))
ggplot(jh, aes(group, jh_conc)) +
geom_boxplot(aes(fill=group)) +
# plot the circles from the model extraction (means)
geom_point(data = means, aes(y = means),size = 4, shape = 21, color = "black", fill = NA) +
# use this instead of coord_trans
scale_y_log10() + annotation_logticks(sides = "l") +
# use annotate "segment" to draw the horizontal lines
annotate("segment", x = 1, xend = 2, y = 15, yend = 15) +
# use annotate "text" to add your pvalue *'s
annotate("text", x = 1.5, y = 15.5, label = "**", size = 4) +
annotate("segment", x = 1, xend = 3, y = 20, yend = 20) +
annotate("text", x = 2, y = 20.5, label = "***", size = 4) +
annotate("segment", x = 2, xend = 3, y = .2, yend = .2) +
annotate("text", x = 2.5, y = .25, label = "**", size = 4)
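One point worth checking about the circles: when the model contains only a grouping factor, the back-transformed coefficients of a log-link GLM should coincide with the ordinary (arithmetic) group means, which is not the same as averaging on the log scale and back-transforming, as the question describes. The sketch below uses the data printed in the question and only base R; the variable names are mine.

```r
# Data copied from the question
jh <- data.frame(
  group = rep(c("Queens", "Virgin Queens", "Workers"), each = 5),
  jh_conc = c(6.38542714, 11.22512563, 7.74472362, 11.56834171, 3.74020100,
              0.06080402, 0.12663317, 0.08090452, 0.04422111, 0.14673367,
              0.03417085, 0.02449749, 0.02927136, 0.01648241, 0.02150754)
)

fit2 <- glm(jh_conc ~ 0 + group, family = Gamma(link = log), data = jh)

# Back-transformed model coefficients
model_means <- exp(coef(fit2))
# Plain group means on the original scale
arithmetic_means <- tapply(jh$jh_conc, jh$group, mean)
# Mean of the logs, back-transformed (geometric mean)
geometric_means <- exp(tapply(log(jh$jh_conc), jh$group, mean))

round(cbind(model_means, arithmetic_means, geometric_means), 4)
```

If the circles are meant to be the log-scale averages, `geometric_means` is the column to plot; if they are meant to be the model's fitted group means, `model_means` is.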
I'm doing multiple plots split by one variable and in each plot, colour code based on another variable.
set.seed(12345)
dates = seq(as.Date("2000-01-01"), as.Date("2016-01-01"), by = 1)
dd = data.table(date = dates, value = rnorm(length(dates)))
dd[, year := lubridate::year(date)]
dd[, c := cut(value, c(-Inf, -3, 3, Inf))]
for (thisyear in 2000:2015) {
ggplot(dd[year == thisyear]) +
geom_ribbon(aes(x = date, ymin = -Inf, ymax = Inf, fill = c), alpha = 0.1)
}
dd[, length(unique(c)), by = year]
year V1
1: 2000 1
2: 2001 2
3: 2002 2
4: 2003 3
5: 2004 3
....
Now the colour in different plots will be inconsistent, since not every year has the same number of unique cut values. Even worse, if one year has only (-Inf,-3] values (unlikely here, of course) and another year has only (3, Inf] values, both will be coloured red in their respective plots.
How can I specify that (-Inf,-3] always takes blue and (-3,3] always takes green?
One way to manually specify the colors to use would be to simply create a column in your data frame specifying the plot color to use.
For example:
# scatter plot
dd$color <- ifelse(dd$value <= 3, 'blue', 'green')
ggplot(dd, aes(date, value)) + geom_point(colour=dd$color)
# ribbon plot
thisyear <- '2001'
dd_year <- dd[year == thisyear,]
ggplot(dd_year, aes(date, group=color, colour=color)) +
geom_ribbon(aes(ymin=value - 1, ymax=value + 1, fill=color), alpha=0.5) +
# the color column already holds colour names, so use them as-is
scale_fill_identity() +
scale_color_identity()
This would result in all points <= 3 being colored blue, and the remaining ones green.
Not the most interesting example perhaps, since there is only one data point that gets colored green here, but it should look like this:
You can create a named vector of colors to pass to scale_fill_manual. This allows you to choose the colors of each group as well as ensuring that each plot has the same colors among groups.
colors = c("blue", "green", "red")
names(colors) = levels(dd$c)
colors
#> (-Inf,-3]    (-3,3]  (3, Inf]
#>    "blue"   "green"     "red"
Now the same plot, but with scale_fill_manual added.
for (thisyear in 2000:2015) {
print(ggplot(dd[year == thisyear]) +
geom_ribbon(aes(x = date, y = value, ymin = -Inf, ymax = Inf, fill = c), alpha = 0.1) +
scale_fill_manual(values = colors))
}
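Why the names matter can be seen without drawing anything. The sketch below is a base-R imitation of the two matching strategies (by position versus by name), not ggplot2's internal code; it uses three made-up values and the same breaks as in the question.

```r
values <- c(-4, 0, 5)                       # one made-up value per interval
c_all  <- cut(values, c(-Inf, -3, 3, Inf))  # same breaks as in the question

colors <- c("blue", "green", "red")
names(colors) <- levels(c_all)

# A subset in which only the highest interval occurs
only_high <- droplevels(c_all[values > 3])

# Matching by position: the single remaining level gets the first colour
unnamed_pick <- c("blue", "green", "red")[seq_along(levels(only_high))]
# Matching by name: the level keeps the colour assigned to it
named_pick <- unname(colors[levels(only_high)])

unnamed_pick
named_pick
```

With the named vector, the top interval stays "red" however many other intervals are present in a given year.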
I am using geom_boxplot to draw candlesticks using stock market data. The problem is that the individual boxplot's upper and lower edges as well as the upper whisker end point show up way higher on the y-axis than their corresponding values. The relative height (difference between upper and lower edges) and the end point of the lower whisker of each boxplot are fine though. Here's my code :
candlestickPlot <- function(x){
library("ggplot2")
# x is a data.frame with columns 'date','open','high','low','close'
x$candleLower <- pmin(x$open, x$close)
x$candleUpper <- pmax(x$open, x$close)
x$candleMiddle <- NA
x$fill <- "red"
x$fill[x$open < x$close] = "green"
# Draw the candlesticks
g <- ggplot(x, aes(x=date, lower=candleLower, middle=candleMiddle, upper=candleUpper, ymin=low, ymax=high))
g <- g + geom_boxplot(stat='identity', aes(group=date, fill=fill))
g
}
Here's x :
date close volume open high low
5 2013-12-30 25.82 3525026 27.30 27.76 25.7
4 2013-12-31 27.41 5487204 25.25 27.70 25.25
3 2014-01-02 30.70 7835374 29.25 31.24 29.21
2 2014-01-03 30.12 4577278 31.49 31.80 30.08
1 2014-01-06 30.65 4042724 30.89 31.88 30.37
Am I doing something wrong here?
There are more efficient ways to create OHLC candlesticks with ggplot2 than the way you have described using geom_boxplot. Your code seems very similar to the example in the link:
http://www.perdomocore.com/2012/using-ggplot-to-make-candlestick-charts-alpha/
Many of the ggplot candlestick examples on the net appear to be based on the geom_boxplot example in that link. The problem with geom_boxplot is that plotting becomes slow as the number of bars increases.
Here is one computationally faster solution for plotting financial data using candlesticks/OHLC bars:
library(ggplot2)
library(quantmod)
FOSL <- getSymbols("FOSL", from="2015-01-01", auto.assign=FALSE)
names(FOSL) <- gsub("^.+\\.","",names(FOSL)) # remove "FOSL." from column names
rng <- "2015-08"
FOSL <- FOSL[rng]
FOSL <- data.frame(Date=as.POSIXct(index(FOSL)), FOSL[,1:4])
FOSL$chg <- ifelse(Cl(FOSL) > Op(FOSL), "up", "dn")
FOSL$width <- as.numeric(periodicity(FOSL)[1])
FOSL$flat_bar <- FOSL[, "High"] == FOSL[, "Low"]
# Candle chart:
pl <- ggplot(FOSL, aes(x=Date))+
geom_linerange(aes(ymin=Low, ymax=High)) +
theme_bw() +
labs(title="FOSL") +
geom_rect(aes(xmin = Date - width/2 * 0.9, xmax = Date + width/2 * 0.9,
ymin = pmin(Open, Close), ymax = pmax(Open, Close), fill = chg)) +
guides(fill = "none", colour = "none") +
scale_fill_manual(values = c("dn" = "darkred", "up" = "darkgreen"))
# Handle special case of drawing a flat bar where OHLC = Open:
if (any(FOSL$flat_bar)) pl <- pl + geom_segment(data = FOSL[FOSL$flat_bar,], aes(x = Date - width / 2 * 0.9, y = Close, yend = Close, xend = Date + width / 2 * 0.9))
print(pl)
Thank you FXQuantTrader for introducing a beautiful and fast alternative approach to the candlestick bars in R! Awesome, concise, easy to read!
Here is a slightly improved version of FXQuantTrader's solution, which:
- wraps it into a function
- supports finer time resolutions (down to 1 sec bars)
- changes the colour of the candle whiskers from black to match the candle body
- adds a small horizontal line for bars with Close == Open
- adds a 3rd colour (blue) for bars with Close == Open
- adds an 'alpha' argument which allows you to make the whole candlestick chart more transparent, so when you draw Bollinger Bands and/or Moving Averages on top, the bars will be less distracting (more like a background)
- has a few more comments so newbies can figure out what is going on :)
Here she comes:
library(ggplot2)
library(quantmod)
draw_candles <- function(df, title_param, alpha_param = 1){
df$change <- ifelse(df$Close > df$Open, "up", ifelse(df$Close < df$Open, "down", "flat"))
# originally the width of the bars was calculated by FXQuantTrader with use of 'periodicity()', which
# seems to work ok only with: ‘minute’,‘hourly’, ‘daily’,‘weekly’, ‘monthly’,
# ‘quarterly’, and ‘yearly’, but can not do 1 sec bars while we want arbitrary bar size support!-)
# df$width <- as.numeric(periodicity(df)[1])
# So let us instead find the deltas (in seconds) between the first 4 rows and just
# use the smallest one for all rows. We check 3 deltas to avoid larger "weekend gaps"
width_candidates <- c(as.numeric(difftime(df$Date[2], df$Date[1]), units = "secs"),
as.numeric(difftime(df$Date[3], df$Date[2]), units = "secs"),
as.numeric(difftime(df$Date[4], df$Date[3]), units = "secs"))
df$width_s = min(width_candidates) # one (same) candle width (in seconds) for all the bars
# define the vector of candle colours either by name or by rgb()
#candle_colors = c("down" = "red", "up" = "green", "flat" = "blue")
candle_colors = c("down" = rgb(192,0,0,alpha=255,maxColorValue=255), "up" = rgb(0,192,0,alpha=255,maxColorValue=255), "flat" = rgb(0,0,192,alpha=255,maxColorValue=255))
# Candle chart:
g <- ggplot(df, aes(x=Date))+
geom_linerange(aes(ymin=Low, ymax=High, colour = change), alpha = alpha_param) + # candle whiskers (vertical thin lines)
theme_bw() +
labs(title=title_param) +
geom_rect(aes(xmin = Date - width_s/2 * 0.9, xmax = Date + width_s/2 * 0.9, ymin = pmin(Open, Close), ymax = pmax(Open, Close), fill = change), alpha = alpha_param) + # candle body
guides(fill = "none", colour = "none") +
scale_color_manual(values = candle_colors) + # color for line
scale_fill_manual(values = candle_colors) # color for candle fill
# Handle special cases: flat bar and Open == close:
if (any(df$change == "flat")) g <- g + geom_segment(data = df[df$change == "flat",], aes(x = Date - width_s / 2 * 0.9, y = Close, yend = Close, xend = Date + width_s / 2 * 0.9, colour = change), alpha = alpha_param)
#print(g)
g
}
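The width calculation inside draw_candles can be tried in isolation. The sketch below uses four made-up daily timestamps that straddle a weekend (a Thursday, a Friday, then the following Monday and Tuesday) to show why the minimum of the first few gaps is taken rather than just the first gap.

```r
# Hypothetical daily bars: Thursday, Friday, then Monday and Tuesday
dates <- as.POSIXct(c("2015-08-06", "2015-08-07", "2015-08-10", "2015-08-11"),
                    tz = "UTC")

# Gaps between consecutive bars, in seconds
deltas <- as.numeric(diff(dates), units = "secs")
deltas

# The smallest gap is the bar size; the weekend gap (3 days) is ignored
width_s <- min(deltas)
width_s
```

Here the gaps are one day, three days and one day, so the candle width comes out as one day (86400 seconds) even though a weekend falls inside the first four rows.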
Could not completely understand your problem but this seems to work nicely:
http://www.perdomocore.com/2012/using-ggplot-to-make-candlestick-charts-alpha/
I created a package generating candlestick chart with the possibility of further extension.
https://github.com/dominikduda/candlePlotter
From help:
Plots OHLC chart
(...)
Arguments:
time_series: A data frame with c('Time', 'Open', 'High', 'Low',
'Close') columns where Time column must be of POSIXct type.
chart_title: An optional string with main chart title
under_candles_layers: A vector of ggplot layers to print under candles
Working example how to use:
# Plotting a chart and saving it from a string:
raw_data <- "
Time Open High Low Close
2018-08-30 7050.267 7068.232 6740.648 6985.976
2018-08-31 6982.225 7075.417 6915.935 7046.783
2018-09-01 7040.911 7257.571 7030.790 7193.122
2018-09-02 7203.630 7314.289 7136.561 7277.199
2018-09-03 7286.205 7334.481 7201.419 7255.241
2018-09-04 7269.067 7394.179 7251.269 7364.443
2018-09-05 7365.232 7391.967 6704.715 6704.715
2018-09-06 6715.508 6715.508 6365.000 6503.564
2018-09-07 6514.690 6544.672 6378.351 6446.210
2018-09-08 6426.220 6485.850 6147.691 6203.588
2018-09-09 6202.271 6417.675 6178.907 6260.216
2018-09-10 6270.848 6351.214 6263.048 6317.647
2018-09-11 6320.536 6391.365 6241.453 6289.961
2018-09-12 6296.140 6349.481 6238.578 6339.010
2018-09-13 6345.973 6525.523 6337.746 6498.652
2018-09-14 6488.631 6583.669 6428.993 6492.367
2018-09-15 6488.870 6561.979 6480.306 6524.671"
data_for_chart <- read.table(text = raw_data, header = TRUE)
data_for_chart <- transform(data_for_chart, Time = as.POSIXct(Time))
plot <- prettyCandlePlot(data_for_chart, 'BTCUSD')
ggsave(
'btc_usd_daily.png',
plot = plot,
width = 30,
height = 18,
units = 'cm'
)
I was thinking of doing this in R but am new to it and would appreciate any help
I have a dataset (pitches) of baseball pitches identified by
'pitchNumber' and 'outcome' e.g S = swinging strike, B = ball, H= hit
etc.
e.g.
1 B ;
2 H ;
3 S ;
4 S ;
5 X ;
6 H; etc.
All I want to do is have a graph that plots them in a line cf BHSSXB
but replacing the letter with a small bar colored to represent the letter, with a legend, and optionally having the pitch number above the color . Somewhat like a sparkline.
Any suggestion on how to implement this much appreciated
And the same graph using ggplot.
Data courtesy of @GavinSimpson.
ggplot(baseball, aes(x=pitchNumber, y=1, ymin=0, ymax=1, colour=outcome)) +
geom_point() +
geom_linerange() +
ylab(NULL) +
xlab(NULL) +
scale_y_continuous(breaks=c(0, 1)) +
theme(
panel.background=element_blank(),
panel.grid.minor=element_blank(),
axis.text.y = element_blank()
)
Here is a base graphics idea from which to work. First some dummy data:
set.seed(1)
baseball <- data.frame(pitchNumber = seq_len(50),
outcome = factor(sample(c("B","H","S","S","X","H"),
50, replace = TRUE)))
> head(baseball)
pitchNumber outcome
1 1 H
2 2 S
3 3 S
4 4 H
5 5 H
6 6 H
Next we define the colours we want:
## better colours - like ggplot for the cool kids
##cols <- c("red","green","blue","yellow")
cols <- head(hcl(seq(from = 0, to = 360,
length.out = nlevels(with(baseball, outcome)) + 1),
l = 65, c = 100), -1)
Then plot the pitchNumber as a height-1 histogram-like bar (type = "h"), suppressing the normal axes, and add points to the tops of the bars to help visualisation:
with(baseball, plot(pitchNumber, y = rep(1, length(pitchNumber)), type = "h",
ylim = c(0, 1.2), col = cols[outcome],
ylab = "", xlab = "Pitch", axes = FALSE, lwd = 2))
with(baseball, points(pitchNumber, y = rep(1, length(pitchNumber)), pch = 16,
col = cols[outcome]))
Add on the x-axis and the plot frame, plus a legend:
axis(side = 1)
box()
## note: this assumes that the levels are in alphabetical order B,H,S,X...
legend("topleft", legend = c("Ball","Hit","Swinging Strike","X??"), lty = 1,
pch = 16, col = cols, bty = "n", ncol = 2, lwd = 2)
Gives this:
This is in response to your last comment on @Gavin's answer. I'm going to build off of the data provided by @Gavin and the ggplot2 plot by @Andrie. ggplot() supports the concept of faceting by a variable or variables. Here you want to facet by pitcher, with a limit of 50 pitches per row. We'll create a new variable that corresponds to each row we want to plot separately. The equivalent code in base graphics would entail adjusting mfrow or mfcol in par() and calling separate plots for each group of data.
#150 pitches represents a somewhat typical 9 inning game.
#Thanks to Gavin for sample data.
longGame <- rbind(baseball, baseball, baseball)
#Starter goes 95 pitches, middle relief throws 35, closer comes in for 20 and the glory
longGame$pitcher <- c(rep("S", 95), rep("M", 35), rep("C",20))
#Adjust pitchNumber accordingly
longGame$pitchNumber <- c(1:95, 1:35, 1:20)
#We want to show 50 pitches at a time, so will combine the pitcher name
#with which set of pitches this is
longGame$facet <- with(longGame, paste(pitcher, ceiling(pitchNumber / 50), sep = ""))
#Create the x-axis in increments of 1-50, by pitcher (ddply() is from the plyr package)
longGame <- plyr::ddply(longGame, "facet", transform, pitchFacet = rep(1:50, 5)[1:length(facet)])
#Convert facet to factor in the right order
longGame$facet <- factor(longGame$facet, levels = c("S1", "S2", "M1", "C1"))
#Thanks to Andrie for ggplot2 function. I change the x-axis and add a facet_wrap
ggplot(longGame, aes(x=pitchFacet, y=1, ymin=0, ymax=1, colour=outcome)) +
geom_point() +
geom_linerange() +
facet_wrap(~facet, ncol = 1) +
ylab(NULL) +
xlab(NULL) +
scale_y_continuous(breaks=c(0, 1)) +
theme(
panel.background=element_blank(),
panel.grid.minor=element_blank(),
axis.text.y = element_blank()
)
You can obviously change the labels for the facet variable, but the above code will produce:
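The row-splitting step can also be checked on its own, without plotting. The sketch below rebuilds the pitcher and pitch-number vectors from the code above and computes the row label the same way; the position within a row is computed with modular arithmetic, which is my substitute for the ddply() call and is only meant as an illustration.

```r
# Same pitch counts as above: starter 95, middle relief 35, closer 20
pitcher     <- c(rep("S", 95), rep("M", 35), rep("C", 20))
pitchNumber <- c(1:95, 1:35, 1:20)

# Row label: pitcher code plus which block of 50 pitches this is
facet <- paste(pitcher, ceiling(pitchNumber / 50), sep = "")

# Position within the row, restarting at 1 for every block of 50
# (an alternative to the ddply() step, assumed equivalent here)
pitchFacet <- (pitchNumber - 1) %% 50 + 1

# Pitches per row, in plotting order
counts <- table(factor(facet, levels = c("S1", "S2", "M1", "C1")))
counts
range(pitchFacet)
```

The starter's 95 pitches split into a full row of 50 and a second row of 45, while the two relievers fit in one row each.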