I am using geom_boxplot to draw candlesticks from stock market data. The problem is that each box's upper and lower edges, as well as the end of the upper whisker, appear much higher on the y-axis than their corresponding values. The relative height (the difference between the upper and lower edges) and the end of the lower whisker of each box are correct, though. Here's my code:
candlestickPlot <- function(x){
  library("ggplot2")
  # x is a data.frame with columns 'date','open','high','low','close'
  x$candleLower <- pmin(x$open, x$close)
  x$candleUpper <- pmax(x$open, x$close)
  x$candleMiddle <- NA
  x$fill <- "red"
  x$fill[x$open < x$close] <- "green"
  # Draw the candlesticks
  g <- ggplot(x, aes(x=date, lower=candleLower, middle=candleMiddle, upper=candleUpper, ymin=low, ymax=high))
  g <- g + geom_boxplot(stat='identity', aes(group=date, fill=fill))
  g
}
Here's x:
date close volume open high low
5 2013-12-30 25.82 3525026 27.30 27.76 25.7
4 2013-12-31 27.41 5487204 25.25 27.70 25.25
3 2014-01-02 30.70 7835374 29.25 31.24 29.21
2 2014-01-03 30.12 4577278 31.49 31.80 30.08
1 2014-01-06 30.65 4042724 30.89 31.88 30.37
Am I doing something wrong here?
There are more efficient ways to create OHLC candlesticks with ggplot2 than using geom_boxplot as you describe. Your code looks very similar to the example at this link:
http://www.perdomocore.com/2012/using-ggplot-to-make-candlestick-charts-alpha/
Many of the ggplot candlestick examples on the web seem to be based on the geom_boxplot approach from that link. The problem with geom_boxplot is that plotting becomes slow as the number of bars increases.
Here is a computationally faster solution for plotting financial data as candlesticks/OHLC bars:
library(ggplot2)
library(quantmod)
FOSL <- getSymbols("FOSL", from="2015-01-01", auto.assign=FALSE)
names(FOSL) <- gsub("^.+\\.","",names(FOSL))  # remove "FOSL." from column names
rng <- "2015-08"
FOSL <- FOSL[rng]
FOSL <- data.frame(Date=as.POSIXct(index(FOSL)), FOSL[,1:4])
FOSL$chg <- ifelse(Cl(FOSL) > Op(FOSL), "up", "dn")
FOSL$width <- as.numeric(periodicity(FOSL)[1])
FOSL$flat_bar <- FOSL[, "High"] == FOSL[, "Low"]
# Candle chart:
pl <- ggplot(FOSL, aes(x=Date)) +
  geom_linerange(aes(ymin=Low, ymax=High)) +
  theme_bw() +
  labs(title="FOSL") +
  geom_rect(aes(xmin = Date - width/2 * 0.9, xmax = Date + width/2 * 0.9,
                ymin = pmin(Open, Close), ymax = pmax(Open, Close), fill = chg)) +
  guides(fill = "none", colour = "none") +
  scale_fill_manual(values = c("dn" = "darkred", "up" = "darkgreen"))
# Handle special case of drawing a flat bar where OHLC = Open:
if (any(FOSL$flat_bar)) pl <- pl + geom_segment(data = FOSL[FOSL$flat_bar,], aes(x = Date - width / 2 * 0.9, y = Close, yend = Close, xend = Date + width / 2 * 0.9))
print(pl)
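For comparison, here is a minimal sketch of the same geom_rect approach applied to the five rows from the question, re-entered by hand so it runs without downloading any data. It assumes `date` is a Date column, so one x unit is one day; the 0.9 factor leaves a small gap between candles, as in the code above. The `flat` category and its grey fill are my own additions for bars where Open == Close.

```r
# The question's five rows, re-entered by hand (date converted to Date)
x <- data.frame(
  date  = as.Date(c("2013-12-30", "2013-12-31", "2014-01-02", "2014-01-03", "2014-01-06")),
  open  = c(27.30, 25.25, 29.25, 31.49, 30.89),
  high  = c(27.76, 27.70, 31.24, 31.80, 31.88),
  low   = c(25.70, 25.25, 29.21, 30.08, 30.37),
  close = c(25.82, 27.41, 30.70, 30.12, 30.65)
)

# Candle body spans open..close; whiskers span low..high
x$body_low  <- pmin(x$open, x$close)
x$body_high <- pmax(x$open, x$close)
x$change    <- ifelse(x$close > x$open, "up",
                      ifelse(x$close < x$open, "dn", "flat"))

# On a Date axis one unit is one day; 0.9 leaves a gap between candles
half_w <- 0.9 / 2

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(x, aes(x = date)) +
    geom_linerange(aes(ymin = low, ymax = high)) +           # whiskers
    geom_rect(aes(xmin = date - half_w, xmax = date + half_w,
                  ymin = body_low, ymax = body_high,
                  fill = change)) +                          # bodies
    scale_fill_manual(values = c(dn = "darkred", up = "darkgreen", flat = "grey50")) +
    guides(fill = "none") +
    theme_bw()
  print(g)
}
```

Because the body limits come straight from the data via geom_rect, no boxplot statistics are involved in placing them.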
Thank you FXQuantTrader for introducing a beautiful and fast alternative approach to the candlestick bars in R! Awesome, concise, easy to read!
Here is a slightly improved version of FXQuantTrader's solution, which:
- wraps it into a function
- supports higher resolutions (down to 1-second bars)
- colours each candle's whiskers to match its body instead of black
- adds a small horizontal line for bars with Close == Open
- adds a third colour (blue) for bars with Close == Open
- adds an 'alpha' argument that makes the whole candlestick chart more transparent, so when you draw Bollinger Bands and/or moving averages on top, the bars are less distracting (more like a background)
- adds more comments so newcomers can figure out what is going on :)
Here she comes:
library(ggplot2)
library(quantmod)
draw_candles <- function(df, title_param, alpha_param = 1){
  df$change <- ifelse(df$Close > df$Open, "up", ifelse(df$Close < df$Open, "down", "flat"))
  # originally the width of the bars was calculated by FXQuantTrader with use of 'periodicity()', which
  # seems to work ok only with: 'minute', 'hourly', 'daily', 'weekly', 'monthly',
  # 'quarterly', and 'yearly', but can not do 1 sec bars while we want arbitrary bar size support!-)
  # df$width <- as.numeric(periodicity(df)[1])
  # So let us instead find the delta (seconds) between consecutive rows and use the same
  # value for all rows. We take the smallest of the first 3 deltas (rows 1-4) to avoid larger "weekend gaps"
  width_candidates <- c(as.numeric(difftime(df$Date[2], df$Date[1]), units = "secs"),
                        as.numeric(difftime(df$Date[3], df$Date[2]), units = "secs"),
                        as.numeric(difftime(df$Date[4], df$Date[3]), units = "secs"))
  df$width_s <- min(width_candidates)  # one (same) candle width (in seconds) for all the bars
  # define the vector of candle colours either by name or by rgb()
  #candle_colors <- c("down" = "red", "up" = "green", "flat" = "blue")
  candle_colors <- c("down" = rgb(192,0,0,alpha=255,maxColorValue=255), "up" = rgb(0,192,0,alpha=255,maxColorValue=255), "flat" = rgb(0,0,192,alpha=255,maxColorValue=255))
  # Candle chart:
  g <- ggplot(df, aes(x=Date)) +
    geom_linerange(aes(ymin=Low, ymax=High, colour = change), alpha = alpha_param) +  # candle whiskers (thin vertical lines)
    theme_bw() +
    labs(title=title_param) +
    geom_rect(aes(xmin = Date - width_s/2 * 0.9, xmax = Date + width_s/2 * 0.9, ymin = pmin(Open, Close), ymax = pmax(Open, Close), fill = change), alpha = alpha_param) +  # candle body
    guides(fill = "none", colour = "none") +
    scale_color_manual(values = candle_colors) +  # colour for lines
    scale_fill_manual(values = candle_colors)     # colour for candle fill
  # Handle special case: flat bar where Open == Close
  if (any(df$change == "flat")) g <- g + geom_segment(data = df[df$change == "flat",], aes(x = Date - width_s / 2 * 0.9, y = Close, yend = Close, xend = Date + width_s / 2 * 0.9, colour = change), alpha = alpha_param)
  #print(g)
  g
}
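draw_candles() uses the smallest of the first three gaps, which gives NA if the data has fewer than four rows and a too-wide candle if all three gaps happen to span weekends or holidays. A minimal sketch of a variation, assuming the bars are regularly spaced apart from market closures: take the minimum over all gaps, ignoring zero gaps from duplicated timestamps. estimate_bar_width() is a hypothetical helper, not part of the code above.

```r
# Hypothetical helper (not part of draw_candles()): estimate the bar spacing
# as the smallest positive gap between consecutive timestamps, in seconds
estimate_bar_width <- function(times) {
  gaps <- as.numeric(diff(times), units = "secs")
  min(gaps[gaps > 0])
}

# Daily bars starting on a Friday, so the first gap spans a weekend
daily <- as.POSIXct(c("2015-08-07", "2015-08-10", "2015-08-11", "2015-08-12"), tz = "UTC")
estimate_bar_width(daily)  # 86400 (one day)

# One-second bars
secs <- as.POSIXct("2015-08-10 09:30:00", tz = "UTC") + 0:9
estimate_bar_width(secs)   # 1
```

Inside draw_candles() it could replace the width_candidates lines, e.g. `df$width_s <- estimate_bar_width(df$Date)`.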
I couldn't completely understand your problem, but this seems to work nicely:
http://www.perdomocore.com/2012/using-ggplot-to-make-candlestick-charts-alpha/
I created a package that generates candlestick charts and can be extended further.
https://github.com/dominikduda/candlePlotter
From the help page:
Plots OHLC chart
(...)
Arguments:
time_series: A data frame with c('Time', 'Open', 'High', 'Low',
'Close') columns where Time column must be of POSIXct type.
chart_title: An optional string with main chart title
under_candles_layers: A vector of ggplot layers to print under candles
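A working example:
library(ggplot2)  # for ggsave()
# assumes the candlePlotter package from the repository above is installed and loaded
# Plotting a chart and saving it from a string: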
raw_data <- "
Time Open High Low Close
2018-08-30 7050.267 7068.232 6740.648 6985.976
2018-08-31 6982.225 7075.417 6915.935 7046.783
2018-09-01 7040.911 7257.571 7030.790 7193.122
2018-09-02 7203.630 7314.289 7136.561 7277.199
2018-09-03 7286.205 7334.481 7201.419 7255.241
2018-09-04 7269.067 7394.179 7251.269 7364.443
2018-09-05 7365.232 7391.967 6704.715 6704.715
2018-09-06 6715.508 6715.508 6365.000 6503.564
2018-09-07 6514.690 6544.672 6378.351 6446.210
2018-09-08 6426.220 6485.850 6147.691 6203.588
2018-09-09 6202.271 6417.675 6178.907 6260.216
2018-09-10 6270.848 6351.214 6263.048 6317.647
2018-09-11 6320.536 6391.365 6241.453 6289.961
2018-09-12 6296.140 6349.481 6238.578 6339.010
2018-09-13 6345.973 6525.523 6337.746 6498.652
2018-09-14 6488.631 6583.669 6428.993 6492.367
2018-09-15 6488.870 6561.979 6480.306 6524.671"
data_for_chart <- read.table(text = raw_data, header = TRUE)
data_for_chart <- transform(data_for_chart, Time = as.POSIXct(Time))
plot <- prettyCandlePlot(data_for_chart, 'BTCUSD')
ggsave(
'btc_usd_daily.png',
plot = plot,
width = 30,
height = 18,
units = 'cm'
)
I have data (from Excel) with ranges on the y-axis (also calculated in Excel) and cell counts on the x-axis, and I would like to draw a horizontal reference line at a specific value within those ranges. I tried geom_hline(yintercept = 450), but I suspect that is naive and it does not work that way for a value inside a range. Are there any better suggestions? :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you proceed: don't calculate those ranges in Excel. Let ggplot do it for you.
Say your data looks like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#> x
#>1 322.76123
#>2 57.46708
#>3 223.31943
#>4 498.91870
#>5 155.05416
#>6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
  geom_histogram(aes(x = x),
                 boundary = 0,
                 binwidth = 50,
                 fill = "steelblue",
                 colour = "white") +
  geom_vline(xintercept = 450, colour = "red", linetype = 2, linewidth = 1) +  # use size = 1 on ggplot2 < 3.4.0
  coord_flip()
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, use a numeric value of 9.5 which will put it between the top two values:
g + geom_hline(yintercept = 9.5, linetype = 2)
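With all ten bins present and ordered as above, level k is drawn at y = k, so a numeric value can be converted to a position on the discrete axis instead of being worked out by eye. A minimal sketch; bin_axis_position() is a hypothetical helper that assumes equal-width bins of 50 starting at 0, as in the example data.

```r
# Hypothetical helper: position of a numeric value on a discrete axis whose
# levels are consecutive equal-width bins starting at `origin`.
# Level k covers [origin + (k-1)*width, origin + k*width) and is drawn at y = k,
# so a bin's lower boundary sits at k - 0.5 and its middle at k.
bin_axis_position <- function(value, width = 50, origin = 0) {
  (value - origin) / width + 0.5
}

bin_axis_position(450)  # 9.5: boundary between "400-449.9" and "450-499.9"
bin_axis_position(475)  # 10: middle of the "450-499.9" bar
```

For example, `g + geom_hline(yintercept = bin_axis_position(450), linetype = 2)` draws the same line as `yintercept = 9.5`.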
I would like to make this plot:
Plot 1: The plot that I wanted
My data looks like this:
> head(ranges_example)
labels Minimum Maximum error
1 One -275 -240 1
2 Two -265 -210 1
3 Three -260 -215 1
4 Four -273 -230 1
5 Five NaN -200 1
6 Six NaN -240 1
But, alas, I had to make that plot in illustrator by modifying the plot that I did make in R, this one:
Plot 2: The plot that I got
And I made it using geom_linerange, specifically:
ggplot() +
geom_linerange(data = ranges_example,
mapping=aes(x = labels, ymin = Minimum, ymax = Maximum,
lwd = 1, color = error, alpha = 0.5),
position = position_dodge(width = 1)) +
scale_y_continuous(c(-240, -300)) +
coord_flip()
Plot 2 is good enough for this once--it takes maybe 15 minutes to turn it into Plot 1 in Illustrator--but I'll probably need to make a good few more of these.
The reason why I don't just remove the position_dodge statement is that then it just blends the colors together, like this:
I need them to be their own, distinct colors so that it's easy to tell them apart. The different shades mean different things and I need to be able to easily distinguish between and alter them.
How can I create a plot that looks more like Plot 1 right out of the box?
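Sort the rows so that the widest ranges (highest error) are drawn first and the narrower ones are painted on top of them, then map error to a continuous colour scale so that each level gets its own shade:
library(ggplot2)
library(dplyr)   # for %>% and arrange()
library(tibble)  # for tribble()
# Example data
ranges_example <- tribble(
  ~labels, ~Minimum, ~Maximum, ~error,
  "One", -275, -240, 1,
  "Two", -265, -210, 1,
  "One", -285, -215, 2,
  "Two", -275, -190, 2,
  "One", -300, -200, 3,
  "Two", -290, -180, 3)
ggplot() +
  geom_linerange(data = ranges_example %>% arrange(-error),
                 mapping = aes(x = labels, ymin = Minimum, ymax = Maximum,
                               lwd = 1, color = error)) +
  scale_y_continuous(c(-240, -300)) +
  scale_color_continuous(high = "lightgreen", low = "forestgreen") +
  coord_flip() +
  theme_classic()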
I want to create a histogram from already existing classes. I have this dataset:
interval counts
0 - 8.50 2577
8.51 - 10.00 1199
10.01 - 12.00 1878
12.01 - 14.00 637
14.01 - 16.00 369
16.01 - 18.00 98
18.00 - 20.00 308
library(ggplot2)
plot_tab5_lohn <- ggplot(DS18, aes(x=interval)) + geom_histogram(stat="count")
return(plot_tab5_lohn)})
does result in this graph:
I want the counts on the y-axis, and the bars need to have different widths that match the intervals. How can I do this?
EDIT:
I've made it this far:
using this code
DS18$interval <- factor(DS18$interval, levels = DS18$interval)
output$DS32 <- renderPlot({
plot_tab5_lohn <- ggplot(DS18, aes(x=interval, y = counts)) +
geom_col() +
geom_point(color = "red") +
geom_line(aes(group = 1), color = "red")
return(plot_tab5_lohn)
})
I'd like each bar to be as wide as its interval, with density on the y-axis, so that the areas of all bars sum to 1 (100%).
Something like this link
You can extract the boundaries, then plot using geom_rect:
# Using dt from @www's answer
library(tidyr)
dt2 <- separate(dt, interval, c('left', 'right'), sep = ' - ', convert = TRUE)
ggplot(dt2) +
geom_rect(aes(xmin = left, xmax = right, ymin = 0, ymax = counts),
col = 1) +
geom_line(aes(x = right + (left - right) / 2, y = counts),
col = 'red')
Alternatively, you can first expand your data into individual observations; this also makes it easy to plot densities instead:
library(dplyr)
library(tidyr)
dt3 <- dt %>%
group_by(interval) %>%
do(data.frame(interval = rep.int(.$interval, .$counts), stringsAsFactors = FALSE)) %>%
separate(interval, c('left', 'right'), sep = ' - ', convert = TRUE) %>%
mutate(value = right + (left - right) / 2)
breaks <- c(0, unique(dt3$right))
# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
ggplot(dt3, aes(value)) +
  geom_histogram(aes(y = after_stat(density)), breaks = breaks, col = 1) +
  geom_freqpoly(aes(y = after_stat(density)), breaks = breaks, col = 'red')
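The density that geom_histogram() computes here can also be worked out directly from the class counts, without expanding the data: density = count / (total count × bar width), which makes the bar areas sum to 1. A minimal sketch in base R; like the breaks above, it uses contiguous breaks (0, 8.5, 10, ...) rather than the literal lower bounds such as 8.51, so adjacent bars touch.

```r
# Class table from the question
classes <- data.frame(
  interval = c("0 - 8.50", "8.51 - 10.00", "10.01 - 12.00", "12.01 - 14.00",
               "14.01 - 16.00", "16.01 - 18.00", "18.00 - 20.00"),
  counts   = c(2577, 1199, 1878, 637, 369, 98, 308),
  stringsAsFactors = FALSE
)

# Upper bound of each class, taken from the "a - b" label
bounds        <- strsplit(classes$interval, " - ", fixed = TRUE)
classes$right <- as.numeric(vapply(bounds, `[`, character(1), 2))

# Contiguous breaks: each bar starts where the previous one ends
breaks_c      <- c(0, classes$right)
classes$left  <- head(breaks_c, -1)
classes$width <- diff(breaks_c)

# Share of observations per unit of x; area of each bar = its share of the total
classes$density <- classes$counts / (sum(classes$counts) * classes$width)

classes
```

The `left`, `right` and `density` columns can then be used as `xmin`, `xmax` and `ymax` in the geom_rect() approach from the first answer.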
I think what you need is not a histogram but a bar plot. Here I show how to use geom_col to create one. Note that I use factor() to keep the bars in the order of the classes before plotting the data.
library(ggplot2)
# Order the bar
dt$interval <- factor(dt$interval, levels = dt$interval)
# Create the bar plot
ggplot(dt, aes(x=interval, y = counts)) + geom_col()
DATA
dt <- read.table(text = "interval counts
'0 - 8.50' 2577
'8.51 - 10.00' 1199
'10.01 - 12.00' 1878
'12.01 - 14.00' 637
'14.01 - 16.00' 369
'16.01 - 18.00' 98
'18.00 - 20.00' 308",
header = TRUE, stringsAsFactors = FALSE)
You can use stat = "identity" and add a y aesthetic to get your desired graph:
ggplot(DS18, aes(x=interval, y = counts)) +
geom_histogram(stat="identity")
that gives you this:
I'd like to make small returns in this plot more visible. The most appropriate function seems to be scale_colour_gradient2, but this washes out the small returns, which happen most often. Using limits helped but I couldn't work out how to set oob (out of bounds) so it would just have a "saturated" value rather than be grey. And the log transform just made small values stand out. Has someone else figured out how to do this elegantly?
library(zoo)
library(ggplot2)
library(tseries)
spx <- get.hist.quote(instrument="^gspc", start="2000-01-01",
end="2013-12-14", quote="AdjClose",
provider="yahoo", origin="1970-01-01",
compression="d", retclass="zoo")
spx.rtn <- diff(log(spx$AdjClose)) * 100
rtn.data <- data.frame(x=time(spx.rtn),yend=spx.rtn)
p <- ggplot(rtn.data) +
geom_segment(aes(x=x,xend=x,y=0,yend=yend,colour=yend)) +
xlab("") + ylab("S&P 500 Daily Return %") +
theme(legend.position="null",axis.title.x=element_blank())
# low returns invisible
p + scale_colour_gradient2(low="blue",high="red")
# extreme values are grey
p + scale_colour_gradient2(low="blue",high="red",limits=c(-3,3))
# log transform returns has opposite problem
max_val <- max(log(abs(spx.rtn)))
values <- seq(-max_val, max_val, length = 11)
library(RColorBrewer)
p + scale_colour_gradientn(colours = brewer_pal(type="div",pal="RdBu")(11),
values = values
, rescaler = function(x, ...) sign(x)*log(abs(x)), oob = identity)
Here is another possibility, using scale_colour_gradientn. Mapping of colours is set using values = rescale(...) so that resolution is higher for values close to zero. I had a look at some colour scales here: http://colorbrewer2.org. I chose a 5-class diverging colour scheme, RdBu, from red to blue via near-white. There might be other scales that suit your needs better, this is just to show the basic principles.
# check the colours
library(RColorBrewer)
# cols <- brewer_pal(pal = "RdBu")(5)  # brewer_pal() is in scales, not RColorBrewer
cols <- brewer.pal(n = 5, name = "RdBu")
cols
# [1] "#CA0020" "#F4A582" "#F7F7F7" "#92C5DE" "#0571B0"
# show_col(cols)  # show_col() is also in scales, not RColorBrewer
display.brewer.pal(n = 5, name = "RdBu")
Using rescale, -10 corresponds to blue #0571B0, -1 to light blue #92C5DE, 0 to light grey #F7F7F7, 1 to light red #F4A582 and 10 to red #CA0020. Values between -1 and 1 are interpolated between light blue and light red, and so on. The mapping is therefore not linear, and resolution is higher for small values.
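The positions that rescale() passes to `values` can be checked by hand. A minimal sketch, where rescale01() is a hypothetical stand-in for scales::rescale() with its default 0 to 1 output range: the five breakpoints land at 0, 0.45, 0.5, 0.55 and 1, so the four colour transitions are spread over 9, 1, 1 and 9 units of return, and the colour changes about nine times faster per unit near zero than in the tails.

```r
# Hypothetical helper doing what scales::rescale() does with its defaults:
# map the smallest value to 0 and the largest to 1, linearly
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

breakpoints <- c(-10, -1, 0, 1, 10)
positions   <- rescale01(breakpoints)
positions
# [1] 0.00 0.45 0.50 0.55 1.00

# Colour transitions per unit of return between neighbouring breakpoints:
# one transition over 9 units in each tail, one per unit around zero
steps_per_unit <- 1 / diff(breakpoints)
steps_per_unit
```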
library(ggplot2)
library(scales)  # needed for rescale
ggplot(rtn.data) +
  geom_segment(aes(x = x, xend = x, y = 0, yend = yend, colour = yend)) +
  xlab("") + ylab("S&P 500 Daily Return %") +
  # rev(): brewer.pal() lists RdBu from red to blue, so reverse it to put blue at -10
  scale_colour_gradientn(colours = rev(cols),
                         values = rescale(c(-10, -1, 0, 1, 10)),
                         guide = "colorbar", limits = c(-10, 10)) +
  theme(legend.position = "none", axis.title.x = element_blank())
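The question also asks how to give values outside `limits` the end colour instead of grey. ggplot2's continuous scales accept an `oob` (out-of-bounds) function, and scales::squish() moves out-of-range values to the nearest limit. A minimal sketch using simulated heavy-tailed "returns" (not the S&P 500 data, so it runs offline); the pmin/pmax line shows in base R what squish does to the values, and the plot is only drawn if ggplot2 and scales are installed.

```r
# Simulated heavy-tailed "returns" (NOT the S&P 500 series), so this runs offline
set.seed(1)
rtn_sim <- data.frame(x = seq_len(500), yend = rt(500, df = 3))

lims <- c(-3, 3)

# What squish does, written out in base R: values outside the limits are
# moved to the nearest limit instead of becoming NA (and therefore grey)
clamped <- pmin(pmax(rtn_sim$yend, lims[1]), lims[2])

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE)) {
  library(ggplot2)
  p_sim <- ggplot(rtn_sim) +
    geom_segment(aes(x = x, xend = x, y = 0, yend = yend, colour = yend)) +
    # oob = scales::squish: out-of-range returns get the end colour, not grey
    scale_colour_gradient2(low = "blue", high = "red",
                           limits = lims, oob = scales::squish) +
    theme(legend.position = "none")
  print(p_sim)
}
```

The same `oob = squish` argument can be added to the scale_colour_gradientn() call above, where returns beyond ±10 would otherwise also turn grey.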
How about:
p + scale_colour_gradient2(low="blue",high="red",mid="purple")
or
p + scale_colour_gradient2(low="blue",high="red",mid="darkgrey")