R ggplot2: colouring step plot depending on value

How do I configure a ggplot2 step plot so that when the value being plotted is over a certain level it is one colour and when it is below that certain level it is another colour? (Ultimately I would like to specify the colours used.)
My first thought was that this would be a simple issue that only required me to add a column to my existing data frame and map this column to the aes() for geom_step(). That works to a point: I get two colours, but they overlap as shown in this image:
I have searched SO for the past several hours and found many similar but not identical questions. However, despite trying a wide variety of combinations in different layers I have not been able to resolve the problem. Code follows. Any help much appreciated.
require(ggplot2)
tmp <- structure(list(date = structure(c(1325635200, 1325635800, 1325636400,
1325637000, 1325637600, 1325638200, 1325638800, 1325639400, 1325640000,
1325640600, 1325641200, 1325641800, 1325642400, 1325643000, 1325643600,
1325644200, 1325647800, 1325648400, 1325649000, 1325649600, 1325650200,
1325650800, 1325651400, 1325652000, 1325652600, 1325653200, 1325653800,
1325654400, 1325655000, 1325655600, 1325656200, 1325656800), tzone = "", tclass = c("POSIXct",
"POSIXt"), class = c("POSIXct", "POSIXt")), Close = c(739.07,
739.86, 740.41, 741.21, 740.99, 741.69, 742.64, 741.34, 741.28,
741.69, 741.6, 741.32, 741.95, 741.86, 741.02, 741.08, 742.08,
742.88, 743.19, 743.18, 743.78, 743.65, 743.66, 742.78, 743.34,
742.81, 743.31, 743.81, 742.91, 743.09, 742.47, 742.99)), .Names = c("date",
"Close"), row.names = c(NA, -32L), class = "data.frame")
prevclose <- 743
tmp$status <- as.factor(ifelse(tmp$Close > prevclose, "Above", "Below"))
ggplot() +
  geom_step(data = tmp, aes(date, Close, colour = status))

You need group = 1 in aes:
# top panel
ggplot(tmp, aes(date, Close, colour = status, group = 1)) +
geom_step() + scale_colour_manual(values = c("pink", "green"))
Maybe you want to do something like this:
# make sure that data is sorted by date (arrange() comes from dplyr; plyr also provides it)
library(dplyr)
tmp2 <- arrange(tmp, date)
# add intermediate rows at prevclose wherever status flips between below/above
tmp3 <- tmp2[1, ]
for (i in seq(nrow(tmp2))[-1]) {
  if (tmp2[i-1, ]$status != tmp2[i, ]$status) {
    tmp3 <- rbind(tmp3,
                  transform(tmp2[i, ], Close = prevclose, status = tmp2[i-1, ]$status),
                  transform(tmp2[i, ], Close = prevclose))
  }
  tmp3 <- rbind(tmp3, tmp2[i, ])
}
# bottom panel
ggplot(tmp3, aes(date, Close, colour = status, group = 1)) + geom_step() +
  scale_colour_manual(values = c("pink", "green"))
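The loop above grows the data frame with rbind() one row at a time, which is fine for 32 rows but can get slow on long series. Below is a vectorised sketch of the same idea. The helper name add_crossings() and the small demo data are made up for illustration, and it assumes one row per timestamp and columns named date, Close and status as in the question.

```r
# Hypothetical helper: insert two rows at the threshold wherever `status` flips,
# mirroring the loop above without growing the data frame row by row.
add_crossings <- function(df, threshold) {
  df <- df[order(df$date), ]
  n <- nrow(df)
  status <- as.character(df$status)
  # rows whose status differs from the row just before them
  flip <- which(status[-1] != status[-n]) + 1
  if (length(flip) == 0) return(df)

  before <- df[flip, ]                  # ends the old colour at the threshold
  before$Close <- threshold
  before$status <- df$status[flip - 1]

  after <- df[flip, ]                   # starts the new colour at the threshold
  after$Close <- threshold

  out <- rbind(before, after, df)
  # within one timestamp: "before" row, then "after" row, then the original row
  key <- c(rep(1, length(flip)), rep(2, length(flip)), rep(3, n))
  out <- out[order(out$date, key), ]
  rownames(out) <- NULL
  out
}

# Made-up demo data with a threshold of 2.5
demo <- data.frame(
  date  = as.POSIXct("2012-01-04 00:00:00", tz = "UTC") + 600 * 0:4,
  Close = c(1, 3, 2, 4, 1)
)
demo$status <- factor(ifelse(demo$Close > 2.5, "Above", "Below"))
demo2 <- add_crossings(demo, 2.5)
```

With the question's data, add_crossings(tmp, prevclose) should be usable in place of tmp3 in the plotting call.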

Related

Plotting geom_segment with position_dodge

I have a data set with information on where individuals work over time. More specifically, I have the interval during which each individual works at a given workplace.
library('tidyverse')
library('lubridate')
# individual A
a_id <- c(rep('A',1))
a_start <- c(201201)
a_end <- c(201212)
a_workplace <-c(1)
# individual B
b_id <- c(rep('B',2))
b_start <- c(201201, 201207)
b_end <- c(201206, 201211)
b_workplace <-c(1, 2)
# individual C
c_id <- c(rep('C',2))
c_start <- c(201201, 201202)
c_end <- c(201204, 201206)
c_workplace <-c(1, 2)
# individual D
d_id <- c(rep('D',1))
d_start <- c(201201)
d_end <- c(201201)
d_workplace <-c(1)
# final data frame
id <- c(a_id, b_id, c_id, d_id)
start <- c(a_start, b_start, c_start, d_start)
end <- c(a_end, b_end, c_end, d_end)
workplace <- as.factor(c(a_workplace, b_workplace, c_workplace, d_workplace))
mydata <- data.frame(id, start, end, workplace)
mydata_ym <- mydata %>%
mutate(ymd_start = as.Date(paste0(start, "01"), format = "%Y%m%d"),
ymd_end0 = as.Date(paste0(end, "01"), format = "%Y%m%d"),
day_end = as.numeric(format(ymd_end0 + months(1) - days(1), format = "%d")),
ymd_end = as.Date(paste0(end, day_end), format = "%Y%m%d")) %>%
select(-ymd_end0, -day_end)
I would like a plot where I can see how long each individual works at each workplace as well as how they move around. I tried plotting a geom_segment, as I have the start and end dates for each individual at each workplace. Also, because the same individual may work in more than one place during the same month, I would like to use position_dodge to make overlaps between workplaces visible for the same id and time. This was suggested in this post: Ggplot (geom_line) with overlaps
ggplot(mydata_ym) +
geom_segment(aes(x = id, xend = id, y = ymd_start, yend = ymd_end),
position = position_dodge(width = 0.1), size = 2) +
scale_x_discrete(limits = rev) +
coord_flip() +
theme(panel.background = element_rect(fill = "grey97")) +
labs(y = "time", title = "Work affiliation")
The problem I am having is that: (i) the position_dodge doesn't seem to be working, (ii) I don't know why all the segments are being colored in black. I would expect each workplace to have a different color and a legend to show up.
If you include colour = workplace in the aes() mapping for geom_segment, you get colours, a legend, and some dodging, but it doesn't work quite right: it looks like position_dodge only applies to x and not xend. This seems like a bug, or at least an "infelicity", in position_dodge.
However, replacing geom_segment with an appropriate use of geom_linerange does seem to work:
ggplot(mydata_ym) +
geom_linerange(aes(x = id, ymin = ymd_start, ymax = ymd_end, colour = workplace),
position = position_dodge(width = 0.1), size = 2) +
scale_x_discrete(limits = rev) +
coord_flip()
(Some tangential components omitted.)
A similar approach was previously documented here, in a near-duplicate of your question once the colour = mapping is taken care of.
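As a side note on the question's data preparation: the month-end dates are built there by formatting a date and re-parsing a string. A sketch of a more direct route is to take the first day of the following month and subtract one day. The helper name month_end() is hypothetical (it is not a lubridate function), and it assumes the input is a numeric yyyymm value like the start and end columns.

```r
# Hypothetical helper: last calendar day of a month given as numeric yyyymm
month_end <- function(yyyymm) {
  year  <- yyyymm %/% 100
  month <- yyyymm %% 100
  # first day of the following month (December rolls over to January)
  next_year  <- ifelse(month == 12, year + 1, year)
  next_month <- ifelse(month == 12, 1, month + 1)
  as.Date(sprintf("%04d-%02d-01", next_year, next_month)) - 1
}

ends <- month_end(c(201201, 201202, 201212))
```

In the question's pipeline this would replace the ymd_end0 and day_end helper columns, for example mutate(ymd_end = month_end(end)).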

Gap between forecast and actual data in ggplot

I am trying to plot some data, fitted values and forecasts in a nice ggplot format, but when I plot my data the way I think should work, I get a gap between the real data and the forecast. The gap is meaningless, but it would be nice if it were gone.
Some R code you can use to recreate my problem is:
library(xts)
library(tidyverse)
library(forecast)
dates <- seq(as.Date("2016-01-01"), length = 100, by = "days")
realdata <- arima.sim(model = list(ar = 0.7, order = c(1,1,0)), n = 99)
data <- xts(realdata, order.by = dates)
user_arima <- arima(data, order = c(1,1,0))
user_arimaf <- forecast(user_arima)
fits <- xts(user_arimaf$fitted, order.by = dates)
fcastdates <- as.Date(dates[100]) + 1:10
meancast <- xts(user_arimaf$mean[1:10], order.by = fcastdates)
lowercast95 <- xts(user_arimaf$lower[1:10], order.by = fcastdates)
uppercast95 <- xts(user_arimaf$upper[1:10], order.by = fcastdates)
frame <- merge(data, fits, meancast, uppercast95, lowercast95, all = TRUE, fill = NA)
frame <- as.data.frame(frame) %>%
mutate(date = as.Date(dates[1] + 0:(109)))
frame %>%
ggplot() +
geom_line(aes(date, data, color = "Data")) +
geom_line(aes(date, fits, color = "Fitted")) +
geom_line(aes(date, meancast, color = "Forecast")) +
geom_ribbon(aes(date, ymin=lowercast95,ymax=uppercast95),alpha=.25) +
scale_color_manual(values = c(
'Data' = 'black',
'Fitted' = 'red',
'Forecast' = 'darkblue')) +
labs(color = 'Legend') +
theme_classic() +
ylab("some data") +
xlab("Date") +
labs(title = "chart showing a gap",
subtitle = "Shaded area is the 95% CI from the ARIMA")
And the chart is below
I know there is a geom_forecast in ggplot now, but I would like to build this particular plot the way I'm doing it. If there's no other solution to the gap, though, I'll use geom_forecast.
Closing the gap requires providing a data point in the meancast column for the blank area. I guess it makes sense just to use the value for the last "real" data point.
# Grab the y-value corresponding to the date just before the gap.
last_data_value = frame[frame$date == as.Date("2016-04-09"), "data"]
# Construct a one-row data.frame.
extra_row = data.frame(data=NA_real_,
fits=NA_real_,
meancast=last_data_value,
uppercast95=last_data_value,
lowercast95=last_data_value,
date=as.Date("2016-04-09"))
# Add extra row to the main data.frame.
frame = rbind(frame, extra_row)
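The snippet above hardcodes the date of the last observation and appends a second row for that date. A sketch that avoids both is shown here on a small made-up stand-in for frame (the name frame_demo and its values are illustrative only): find the last non-missing data value and copy it into the forecast columns of that same row.

```r
# Made-up stand-in for `frame`: 5 observed days followed by 3 forecast days
frame_demo <- data.frame(
  date        = as.Date("2016-01-01") + 0:7,
  data        = c(1, 2, 3, 4, 5, NA, NA, NA),
  fits        = c(1.1, 2.1, 2.9, 4.2, 4.8, NA, NA, NA),
  meancast    = c(rep(NA, 5), 5.5, 6.0, 6.5),
  uppercast95 = c(rep(NA, 5), 6.5, 7.5, 8.5),
  lowercast95 = c(rep(NA, 5), 4.5, 4.5, 4.5)
)

# Row index and value of the last real observation
last_row   <- max(which(!is.na(frame_demo$data)))
last_value <- frame_demo$data[last_row]

# Start the forecast line and the ribbon at that point
frame_demo[last_row, c("meancast", "uppercast95", "lowercast95")] <- last_value
```

Because the existing row is filled in rather than a new one appended, the data frame keeps one row per date.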

Map custom color gradient to POSIXct values

Data:
df1 <- structure(list(Index = 1:11, Duration = structure(c(1487577655,
1487577670, 1487577675, 1487577680, 1487577685, 1487577680, 1487577700,
1487577705, 1487577695, 1487577700, 1487577680), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("Index", "Duration"), class = "data.frame", row.names = 3:13)
Now I construct the graph as follows:
library(ggplot2)
library(scales)  # date_format()
g1 <- ggplot(df1, aes(x = Index, y = Duration, color = Duration)) +
  geom_point() +
  geom_line() +
  scale_y_datetime(labels = date_format("%M:%S"))
As it is now, the color scale is set to the default "Black" to "Blue" gradient.
The problem is, I get an error trying to assign a custom gradient to the data.
For a non-POSIXct object:
scale_color_gradient("Duration", low = "#D80427", high = "#07a0ff", space = "Lab")
works, but I get the following error with the POSIXct object df1$Duration as the explanatory variable:
Error in Ops.POSIXt((x - from[1]), diff(from)) : '/' not defined
for "POSIXt" objects
Is there a different gradient function I need to use when graphing a POSIXct object?
You may use trans = time_trans():
library(ggplot2)
library(scales)
g1 +
scale_color_gradient("Duration", low = "#D80427", high = "#07a0ff",
trans = time_trans())
If you want a different format for the labels in the legend, add e.g. labels = format(pretty(df1$Duration), "%M:%S").
We can convert date to number for colour:
library(ggplot2)
library(scales)
ggplot(df1, aes(x = Index, y = Duration, color = as.numeric(Duration))) +
geom_point() +
geom_line() +
scale_y_datetime(labels = date_format("%M:%S")) +
scale_color_gradient("Duration", low = "#D80427", high = "#07A0FF",
labels = c("00", "10", "20", "30", "40"))
As suggested by @Henrik, to avoid hardcoding the labels, use the following:
# avoid hardcoding labels using pretty()
ggplot(df1, aes(x = Index, y = Duration, color = as.numeric(Duration))) +
geom_point() +
geom_line() +
scale_y_datetime(labels = date_format("%M:%S")) +
scale_color_gradient("Duration", low = "#D80427", high = "#07A0FF",
breaks = pretty(as.numeric(df1$Duration)),
labels = format(pretty(df1$Duration), "%M:%S"))
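One caveat with the last call: pretty() is applied once to the numeric values (for breaks) and once to the POSIXct values (for labels), and the two results are not guaranteed to have the same length or positions. A sketch that derives both from a single set of break points follows. The timestamps are made up as a stand-in for df1$Duration, and the names brks, brk_values and brk_labels are illustrative.

```r
# Made-up timestamps standing in for df1$Duration
dur <- as.POSIXct("2017-02-20 08:00:55", tz = "UTC") + c(0, 15, 20, 25, 30, 45, 50)

# Compute the break points once, on the date-time scale
brks <- pretty(dur)
# Keep only breaks that fall inside the data range
brks <- brks[brks >= min(dur) & brks <= max(dur)]

brk_values <- as.numeric(brks)        # positions on the numeric colour scale
brk_labels <- format(brks, "%M:%S")   # matching labels, same length by construction
```

These could then be passed to scale_color_gradient() as breaks = brk_values and labels = brk_labels in place of the two separate pretty() calls.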

ggplot and the geom_text()

I am new to R and ggplot2. I have searched a lot for this, but I could not find a solution.
Sample observation1 observation2 observation3 percentage
sample1_A 163453473 131232689 61984186 30.6236955883
Sample1_B 170151351 137202212 59242536 26.8866816109
sample2_A 194102849 162112484 89158170 40.4183031852
sample2_B 170642240 141888123 79925652 41.7493687378
sample3_A 192858504 161227348 90532447 41.8068248626
sample3_B 177174787 147412720 81523935 40.5463120438
sample4_A 199232380 174656081 118115358 55.6409038531
sample4_B 211128931 186848929 123552556 54.7201927527
sample5_A 186039420 152618196 87012356 40.9656544833
sample5_B 145855252 118225865 66265976 39.5744515254
sample6_A 211165202 186625116 112710053 48.5457722338
sample6_B 220522502 193191927 114882014 47.238670909
I am planning to plot a bar plot with ggplot2. I want to plot the first three columns as a bar plot "dodge" and label the observation3 bar with the percentage. I could plot the bars as below but I could not use geom_text() to add the label.
data1 <- read.table("readStats.txt", header=T)
data1.long <- melt(data1)
ggplot(data1.long[1:36,], aes(data1.long$Sample[1:36],y=data1.long$value[1:36], fill=data1.long$variable[1:36])) + geom_bar(stat="identity", width=0.5, position="dodge")
Transform data1 to long form with the observation columns as the measure variables and the Sample and percentage columns as the id variables. Compute the maximum value, mx, to be used to place the percentages. Then perform the plot. Note that geom_bar uses data1.long but geom_text uses data1. We have colored the text giving the percentages the same color as the observation3 bars. (See this post for how to specify default colors.) Both inherit aes(x = Sample) but use different y and other aesthetics. We clean up the X axis labels by removing all lower case letters and underscores from the data1$Sample (optional).
library(ggplot2)
library(reshape2)
data1.long <- melt(data1, measure = 2:4) # cols 2:4 are observation1, ..., observation3
mx <- max(data1.long$value) # maximum observation value
ggplot(data1.long, aes(x = Sample, y = value)) +
geom_bar(aes(fill = variable), stat = "identity", width = 0.5, position = "dodge") +
geom_text(aes(y = mx, label = paste0(round(percentage), "%")), data = data1,
col = "#619CFF", vjust = -0.5) +
scale_x_discrete(labels = gsub("[a-z_]", "", data1$Sample))
Note: We used this data. Note that one occurrence of Sample was changed to sample with a lower case s:
Lines <- "Sample observation1 observation2 observation3 percentage
sample1_A 163453473 131232689 61984186 30.6236955883
sample1_B 170151351 137202212 59242536 26.8866816109
sample2_A 194102849 162112484 89158170 40.4183031852
sample2_B 170642240 141888123 79925652 41.7493687378
sample3_A 192858504 161227348 90532447 41.8068248626
sample3_B 177174787 147412720 81523935 40.5463120438
sample4_A 199232380 174656081 118115358 55.6409038531
sample4_B 211128931 186848929 123552556 54.7201927527
sample5_A 186039420 152618196 87012356 40.9656544833
sample5_B 145855252 118225865 66265976 39.5744515254
sample6_A 211165202 186625116 112710053 48.5457722338
sample6_B 220522502 193191927 114882014 47.238670909"
data1 <- read.table(text = Lines, header = TRUE)
UPDATE: minor improvements
It might be that G. Grothendieck's answer is a better solution, but here's my suggestion (code below)
# install.packages("ggplot2", dependencies = TRUE)
require(ggplot2)
df <- structure(list(Sample = structure(1:12, .Label = c("sample1_A",
"Sample1_B", "sample2_A", "sample2_B", "sample3_A", "sample3_B",
"sample4_A", "sample4_B", "sample5_A", "sample5_B", "sample6_A",
"sample6_B"), class = "factor"), observation1 = c(163453473L,
170151351L, 194102849L, 170642240L, 192858504L, 177174787L, 199232380L,
211128931L, 186039420L, 145855252L, 211165202L, 220522502L),
observation2 = c(131232689L, 137202212L, 162112484L, 141888123L,
161227348L, 147412720L, 174656081L, 186848929L, 152618196L,
118225865L, 186625116L, 193191927L), observation3 = c(61984186L,
59242536L, 89158170L, 79925652L, 90532447L, 81523935L, 118115358L,
123552556L, 87012356L, 66265976L, 112710053L, 114882014L),
percentage = c(30.6236955883, 26.8866816109, 40.4183031852,
41.7493687378, 41.8068248626, 40.5463120438, 55.6409038531,
54.7201927527, 40.9656544833, 39.5744515254, 48.5457722338,
47.238670909)), .Names = c("Sample", "observation1", "observation2",
"observation3", "percentage"), class = "data.frame", row.names = c(NA,
-12L))
# install.packages("reshape2", dependencies = TRUE)
require(reshape2)
# keep percentage as an id variable so it survives the melt
data1.long <- melt(df, id.vars = c("Sample", "percentage"), measure.vars = c("observation1", "observation2", "observation3"))
data1.long$percentage <- paste(round(data1.long$percentage, 2), "%", sep="")
# only the observation3 bars get a label
data1.long$percentage[data1.long$variable != "observation3"] <- ""
ggplot(data1.long, aes(x = Sample, y = value, fill=variable)) +
  geom_bar(stat="identity", width=0.5, position="dodge") +
  geom_text(aes(label = percentage), vjust=2.10, size=2, hjust=-.06, angle = 90)

Overlaying plots with a horizontal date in R

I am attempting to overlay two plots using ggplot2. I can graph them individually, but I want to overlay them to show a comparison. They have the same y axis, a score from 0 to 100. The x axis is a specific date in the month (from a range of 3 weeks).
Here is what I have tried:
data <- read.table(text = Level5avg, header = TRUE)
data2 <- read.table(text = Level6avg, header = TRUE)
colnames(data) = c("x","y")
colnames(data2) = c("x","y")
ggplot(rbind(data.frame(data2, group="a"), data.frame(data, group="b")), aes(x=x,y=y)) +
stat_density2d(geom="tile", aes(fill = group, alpha=..density..), contour=FALSE) + scale_fill_manual(values=c("b"="#FF0000", "a"="#00FF00")) + geom_point() + theme_minimal()
When I do this, I get a strange graph that has several dots, but I'm not sure if my code is right, since I can't distinguish the data. I want to add 3 more (small) datasets to the plot, if it is possible. If it is possible, how do I make it into a line graph in order to distinguish the datasets?
Note: I was under the impression ggplot would work for my purposes because of this post (and several other posts on this site advised using ggplot as opposed to Lattice). I'm not sure if what I want is possible, so I came here.
Data sets:
# dput(data)
structure(list(x = structure(1:6, .Label = c("10/27/2015", "10/28/2015",
"10/29/2015", "10/30/2015", "10/31/2015", "11/1/2015"), class = "factor"),
y = c(0, 12.5, 0, 0, 11, 43)), .Names = c("x", "y"), class = "data.frame",
row.names = c(NA, -6L))
# dput(data2)
structure(list(x = structure(1:3, .Label = c("10/28/2015", "10/31/2015",
"11/1/2015"), class = "factor"), y = c(0, 0, 41.5)), .Names = c("x",
"y"), class = "data.frame", row.names = c(NA, -3L))
I've now managed to get my overlay, but is there a way to organize the horizontal axis? The dates have no order.
It seems to me that the answer that you are basing your plots on uses density plots that are not useful for your data. If you are just looking for some line plots with points, you could do the following (note I created a dataframe outside of the ggplot() call to make it look a little cleaner):
data$group <- "b"
data2$group <- "a"
df <- rbind(data2,data)
df$x <- as.Date(df$x,"%m/%d/%Y")
ggplot(df,aes(x=x,y=y,group=group,color=group)) + geom_line() +
geom_point() + theme_minimal()
Note that by converting the date, the dates end up in the right order all on their own.
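For the additional datasets mentioned in the question, the same pattern extends to any number of tables. This is a minimal sketch with made-up score tables (the list names level5, level6 and level7 and their values are illustrative only): tag each table with its name, stack them, and convert the dates once.

```r
# Made-up score tables; in practice these would be data, data2 and the extra sets
sets <- list(
  level5 = data.frame(x = c("10/27/2015", "10/28/2015", "11/1/2015"), y = c(0, 12.5, 43)),
  level6 = data.frame(x = c("10/28/2015", "11/1/2015"), y = c(0, 41.5)),
  level7 = data.frame(x = c("10/29/2015", "10/31/2015"), y = c(5, 20))
)

# Tag each table with its list name, then stack them into one data frame
tagged <- Map(function(d, g) cbind(d, group = g), sets, names(sets))
all_scores <- do.call(rbind, tagged)
rownames(all_scores) <- NULL

# Convert the date strings once so the x axis is ordered chronologically
all_scores$x <- as.Date(all_scores$x, format = "%m/%d/%Y")
```

The combined all_scores can be passed to the same ggplot() call as df above, giving one line per group.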
