I have some time series data with gaps.
library(reshape2)  # provides melt()
library(ggplot2)
df <- read.table(header=TRUE, sep=";", text="Date;x1;x2
2014-01-10;2;5
2014-01-11;4;7
2014-01-20;8;9
2014-01-21;10;15
")
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df.long <- melt(df,id="Date")
ggplot(df.long, aes(x = Date, y = value, fill = variable, order=desc(variable))) +
geom_area(position = 'stack')
ggplot now fills in the missing dates (12th, 13th, ...). What I want is for ggplot not to interpolate across the missing dates and to draw only the data that is available. I've tried adding NA rows for the missing dates with merge, but that only produces a message about removed rows.
Is this possible? Thanks.
You can add an additional variable, group, to your data frame. It is a counter that increases whenever the difference between two successive dates is not equal to one day, so each run of consecutive days gets its own group:
df.long$group <- c(0, cumsum(diff(df.long$Date) != 1))
ggplot(df.long, aes(x = Date, y = value, fill = variable,
order=desc(variable))) +
geom_area(position = 'stack', aes(group = group))
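To see what the grouping expression produces, here is a minimal base-R sketch on the four dates from the question. The names `dates`, `gaps`, `run_id` and `long_group` are illustrative and not part of the original answer.

```r
# The four dates from the question, stored as Date objects
dates <- as.Date(c("2014-01-10", "2014-01-11", "2014-01-20", "2014-01-21"))

# diff() on Dates gives differences in days; TRUE marks a gap before that row
gaps <- diff(dates) != 1          # FALSE TRUE FALSE

# A running count of gaps, with 0 prepended for the first row
run_id <- c(0, cumsum(gaps))      # 0 0 1 1

# In the melted (long) data the dates appear once per variable, so the jump
# from the last x1 date back to the first x2 date also counts as a gap
long_dates <- rep(dates, times = 2)
long_group <- c(0, cumsum(diff(long_dates) != 1))  # 0 0 1 1 2 2 3 3
```

Because the counter keeps increasing across variables, every (variable, run) pair ends up with its own group value, which is what lets geom_area draw each run separately.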
Update:
In order to remove the space between the groups, I recommend facetting:
library(plyr)
df.long2 <- ddply(df.long, .(variable), mutate,
group = c(0, cumsum(diff(Date) > 1)))
ggplot(df.long2, aes(x = Date, y = value, fill = variable,
order=desc(variable))) +
geom_area(position = 'stack') +
facet_wrap( ~ group, scales = "free_x")
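If you would rather not load plyr, the per-variable group counter can also be sketched in base R with `ave()`. This is a substitute for the `ddply()` call above, not the method used in the answer; the helper name `run_counter` is made up for illustration, and the data frame is rebuilt by hand so the snippet runs on its own.

```r
# Long-format data equivalent to melting the question's data frame
dates <- as.Date(c("2014-01-10", "2014-01-11", "2014-01-20", "2014-01-21"))
df.long <- data.frame(
  Date     = rep(dates, times = 2),
  variable = rep(c("x1", "x2"), each = 4),
  value    = c(2, 4, 8, 10, 5, 7, 9, 15)
)

# Hypothetical helper: counts how many gaps (> 1 day) precede each row
run_counter <- function(d) c(0, cumsum(diff(d) > 1))

# ave() applies the helper within each variable and keeps the row order;
# Dates are converted to day numbers so the result is a plain numeric vector
df.long$group <- ave(as.numeric(df.long$Date), df.long$variable,
                     FUN = run_counter)
```

Here the counter restarts for every variable, so x1 and x2 share the same group values and land in the same facets.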
I want to create a function that makes a heatmap where the y axis has unique breaks but repeated, ordered labels. I know that this might not be great practice, and I am aware that similar questions have been asked before (for example, "ggplot in R, reordering the bars"). However, I want to achieve these repeated and ordered labels by sorting within a function, not by typing them manually. I am also aware of solutions for reordering axes based on the values of a factor (e.g., "Order Bars in ggplot2 bar graph"), but I don't think they apply, or I can't see how to apply them, to my case, where the breaks are unique but the labels repeat.
Here is some code to reproduce the problem and some of my attempts:
Libraries and data
library(ggplot2)
library(dplyr)
library(tidyr)
set.seed(4)
id <- LETTERS[1:10]
lab <- paste(c("AB", "CD"), 1:5, sep = "_") %>%
sample(., size = 10, replace = TRUE)
val <- sample.int(n = 6, size = 10, replace = TRUE)
tes <- ifelse(val >= 4, 1, 0)
dat <- data.frame(id, lab, val, tes)
A heatmap with unique breaks on the y axis
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1)
A heatmap where the y axis is labeled with repeated labels instead of the unique breaks
This works, to the point that labels are used instead of unique ids, but the y axis is not ordered by the labels. Also, I am not sure about setting breaks and labels from the data frame in wide format (dat), rather than the data frame in long format used by ggplot (dat2).
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks=dat$id, labels=dat$lab)
Mapping the vector with repeated values to the y axis obviously doesn't work
dat2 <- dat %>% gather(kind, value, val:tes)
ggplot(dat2) +
geom_tile(aes(x = kind, y = lab, fill = value), color="white", size=1)
Repeated and ordered labels, try 1
As expected, merely sorting the input data by the non-unique lab variable does not work.
dat2 <- dat %>% gather(kind, value, val:tes) %>%
arrange(lab)
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks=id, label=lab)
Repeated and ordered labels, try 2
Try to create a named breaks vector ordered by the (repeating) labels. This gets me nowhere. Half the labels are missing and they are still not sorted.
dat2 <- dat %>% gather(kind, value, val:tes)
brks <- setNames(dat$id, dat$lab)[sort(dat$lab)]
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks = brks, labels = names(brks))
Repeated and ordered labels, try 3
Starting with the data frame sorted by label, try to create an ordered factor for lab. Then sort the table by this ordered factor. No luck.
dat2 <- dat %>% gather(kind, value, val:tes) %>% arrange(lab)
dat2 <- mutate(dat2, lab_f=factor(lab, levels=sort(unique(lab)), ordered = TRUE))
dat2 <- arrange(dat2, lab_f)
# check
dat2$lab_f
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
scale_y_discrete(breaks = dat2$id, labels = dat2$lab_f)
A workaround, which I can use if I have to, but I am trying to avoid
We can create a combination of id and lab which will be unique and use it for the y axis
dat2 <- dat %>% gather(kind, value, val:tes) %>%
mutate(id_lab=paste(lab, id, sep="_"))
ggplot(dat2) +
geom_tile(aes(x = kind, y = id_lab, fill = value), color="white", size=1)
I must be missing something. Any help is much appreciated.
The goal is to have a function that will take an arbitrarily long table and plot a y axis with unique breaks but (possibly) repeated and ordered labels.
heat <- function(dat) {
dat2 <- dat %>% gather(kind, value, val:tes)
# any other manipulation here
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1)
# scale_y_discrete() (if needed)
}
The plot I am looking for is something like this (created in Inkscape):
Using limits instead of breaks sets the order:
ggplot(dat2) +
geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
geom_text(aes(x = 1, y = id, label = id), col = 'white') +
scale_y_discrete(limits = dat$id[order(dat$lab)], labels = sort(dat$lab))
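Since the question asks for something that works inside a function, the ordering step can be pulled out into a small helper. The sketch below is base R only; the function name `y_scale_args` and the toy vectors are assumptions made for illustration.

```r
# Hypothetical helper: given unique ids and (possibly repeated) labels,
# return ids ordered by label together with the labels in the same order
y_scale_args <- function(id, lab) {
  ord <- order(lab)
  list(
    limits = as.character(id[ord]),   # unique break values, sorted by label
    labels = as.character(lab[ord])   # same result as sort(lab)
  )
}

# Toy input: four unique ids sharing two labels
sc <- y_scale_args(id  = c("A", "B", "C", "D"),
                   lab = c("CD_2", "AB_1", "CD_2", "AB_1"))
sc$limits  # "B" "D" "A" "C"
sc$labels  # "AB_1" "AB_1" "CD_2" "CD_2"
```

Inside the plotting function, the result could then be passed on as `scale_y_discrete(limits = sc$limits, labels = sc$labels)`. Indexing both vectors with the same `order()` result keeps each label attached to its own id.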
I need to plot hourly data for different days using ggplot, and here is my dataset:
The data consist of hourly observations, and I want to plot each day's observations as a separate line.
Here is my code:
xbj1 = bj[c(1:24),c(1,6)]
xbj2 = bj[c(25:48),c(1,6)]
xbj3 = bj[c(49:72),c(1,6)]
ggplot()+
geom_line(data = xbj1,aes(x = Date, y= Value), colour="blue") +
geom_line(data = xbj2,aes(x = Date, y= Value), colour = "grey") +
geom_line(data = xbj3,aes(x = Date, y= Value), colour = "green") +
xlab('Hour') +
ylab('PM2.5')
Please advise on this.
I'll make some fake data (I won't try to transcribe yours) first:
set.seed(2)
x <- data.frame(
Date = rep(Sys.Date() + 0:1, each = 24),
# Year, Month, Day ... are not used here
Hour = rep(0:23, times = 2),
Value = sample(1e2, size = 48, replace = TRUE)
)
This is a straightforward ggplot2 plot, first with one colored line per day and then with one facet per day:
library(ggplot2)
ggplot(x) +
geom_line(aes(Hour, Value, color = as.factor(Date))) +
scale_color_discrete(name = "Date")
ggplot(x) +
geom_line(aes(Hour, Value)) +
facet_grid(Date ~ .)
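Both plots rely on having a separate Date and Hour column. The original dataset is not shown, so the following is only a sketch under the assumption that the observations come with a single timestamp column; the names `ts` and `obs` and the start date are invented for illustration.

```r
# Assumed input: 48 hourly timestamps in UTC, starting at midnight
ts <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 3600 * (0:47)

# Split each timestamp into a calendar day and an hour of day (0-23)
obs <- data.frame(
  Date = as.Date(ts, tz = "UTC"),
  Hour = as.integer(format(ts, "%H", tz = "UTC"))
)

head(obs, 3)
```

With the day and hour separated like this, `Hour` can go on the x axis and `Date` can drive the color or the facets, as in the plots above. The time zone is given explicitly so that the day boundaries do not depend on the machine's locale.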
I highly recommend you find good tutorials for ggplot2, such as http://www.cookbook-r.com/Graphs/. Others exist, many quite good.
Say I have two datasets. One that contains two months of data:
units_sold <- data.frame(date = seq(as.Date("2017-05-01"), as.Date("2017-07-01"), 1),
units = rep(20,62),
category = "units_sold")
And one that contains just a week:
forecast <- data.frame(date = seq(as.Date("2017-06-12"), as.Date("2017-06-18"), 1),
units = 5,
category = "forecast")
I can put them on the same plot, i.e.:
joined <- rbind(units_sold, forecast)
ggplot(data = joined, aes(x=date, y=units, colour = category)) + geom_line()
However, I can't seem to figure out how to put a ribbon between the two lines.
This is what I'm trying:
library(dplyr)
ribbon_dat <- left_join(forecast, units_sold, by = "date") %>%
rename(forecast = units.x) %>%
rename(units_sold = units.y) %>%
select(-c(category.x, category.y))
ggplot(data = joined, aes(x=date, y=units, colour = category)) +
geom_line() +
geom_ribbon(aes(x=ribbon_dat$date, ymin=ribbon_dat$forecast, ymax=ribbon_dat$units_sold))
I get this error: Error: Aesthetics must be either length 1 or the same as the data (69): x, ymin, ymax, y, colour
You are very close. You need to pass the second dataset to the data argument in geom_ribbon(), so that the ribbon's aesthetics are looked up in ribbon_dat rather than in joined:
ggplot(data = joined, aes(x = date)) +
geom_line(aes(y = units, colour = category)) +
geom_ribbon(
data = ribbon_dat,
mapping = aes(ymin = forecast, ymax = units_sold)
)
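The error in the question comes from a length mismatch: `joined` has 69 rows (62 + 7), while the vectors taken from `ribbon_dat` have only 7. The sketch below rebuilds the ribbon data with base `merge()` instead of dplyr's `left_join()` (a swapped-in technique, chosen so the snippet has no package dependencies) and checks its shape.

```r
# Same data as in the question, without the category column
units_sold <- data.frame(
  date  = seq(as.Date("2017-05-01"), as.Date("2017-07-01"), by = 1),
  units = 20
)
forecast <- data.frame(
  date  = seq(as.Date("2017-06-12"), as.Date("2017-06-18"), by = 1),
  units = 5
)

# Inner join on date; only the 7 forecast days have a match in both tables
ribbon_dat <- merge(forecast, units_sold, by = "date",
                    suffixes = c("_forecast", "_sold"))
names(ribbon_dat) <- c("date", "forecast", "units_sold")

nrow(ribbon_dat)                        # 7
nrow(units_sold) + nrow(forecast)       # 69, the row count of `joined`
```

Because the ribbon layer gets its own 7-row data frame, `ymin` and `ymax` only need to match that data frame, not the 69-row one used for the lines.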
In R with ggplot, I want to create a spaghetti plot (2 quantitative variables) grouped by a third variable that determines line color. In addition, I want to map an aggregated version of that grouping variable to line type or width.
Here's an example using the airquality dataset. I want the line's color to represent the month, and the summer months to have a different line width from non-summer months.
First, I created an indicator variable for the aggregated groups:
airquality$Summer <- with(airquality, ifelse(Month >= 6 & Month < 9, 1, 0))
I would like something like this, but with differing line widths:
However, this fails:
library(ggplot2)
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month), group = Summer)) +
geom_point() +
geom_line(linetype = as.factor(Summer))
This also fails (specifying airquality$Summer):
ggplot(data = airquality, aes(x=Wind, y = Temp,
color = as.factor(Month), group = airquality$Summer)) +
geom_point() +
geom_line(linetype = as.factor(airquality$Summer))
I attempted this solution, but get another error:
lty <- setNames(c(0, 1), levels(airquality$Summer))
ggplot(data = airquality, aes(x=Wind, y = Temp,
color = as.factor(Month), group = airquality$Summer)) +
geom_point() +
geom_line(linetype = as.factor(airquality$Summer)) +
scale_linetype_manual(values = lty)
Any ideas?
EDIT:
My actual data show very clear trends, and I want to differentiate the top line from all the others below. My goal is to convince people they should make more than just the minimum payment on their student loans:
You just need to change the group to Month and put linetype inside aes():
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month), group = Month)) +
geom_point() +
geom_line(aes(linetype = factor(Summer)))
If you want to specify the linetype you can use a few methods. Here is one way:
lineT <- c("solid", "dotdash")
names(lineT) <- c("1","0")
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month))) +
geom_point() +
geom_line(aes(linetype = factor(Summer))) +
scale_linetype_manual(values = lineT)
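The named vector works because its names are matched against the levels of `factor(Summer)`. A small base-R check of that correspondence, using the built-in airquality data (the variable name `summer` is illustrative):

```r
# Indicator from the question: 1 for June-August, 0 otherwise
summer <- with(airquality, ifelse(Month >= 6 & Month < 9, 1, 0))

# Named vector: names must equal the factor levels as character strings
lineT <- c("1" = "solid", "0" = "dotdash")

levels(factor(summer))                         # "0" "1"
all(levels(factor(summer)) %in% names(lineT))  # TRUE

# Months flagged as summer
sort(unique(airquality$Month[summer == 1]))    # 6 7 8
```

If a level has no entry in the named vector, that group is left without a line type, so a check like this can be useful before plotting.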
Is it possible to enforce the stack order when using geom_area()? I cannot figure out why geom_area(position = "stack") produces this strange fluctuation in stack order around 1605.
There are no missing values in the data frame.
library(ggplot2)
counts <- read.csv("https://gist.githubusercontent.com/mdlincoln/d5e1bf64a897ecb84fd6/raw/34c6d484e699e0c4676bb7b765b1b5d4022054af/counts.csv")
ggplot(counts, aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
You need to order your data. In your data, the first value found for each year is 'Flemish' until 1605, and from 1606 the first value is 'Dutch'. So, if we do this:
ggplot(counts[order(counts$nationality),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
This gives a consistent stack order across all years.
As a further illustration, here is what happens if we put the rows in random order:
set.seed(123)
ggplot(counts[sample(nrow(counts)),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
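The row-order effect can be checked without downloading the CSV. The toy data frame below is invented to mimic the situation described (Flemish listed first in early years, Dutch first later); the helper `first_per_year` is likewise hypothetical.

```r
# Toy data: the first nationality within each year switches in 1606
counts <- data.frame(
  year           = c(1604, 1604, 1605, 1605, 1606, 1606),
  nationality    = c("Flemish", "Dutch", "Flemish", "Dutch", "Dutch", "Flemish"),
  artists_strict = c(3, 1, 4, 2, 5, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which nationality appears first for each year?
first_per_year <- function(d) tapply(d$nationality, d$year, function(v) v[1])

first_per_year(counts)   # Flemish, Flemish, Dutch: the order flips

# After sorting by nationality (then year), the first entry is always Dutch
sorted <- counts[order(counts$nationality, counts$year), ]
first_per_year(sorted)
```

Sorting the rows makes the within-year order the same everywhere, which is the property the answer relies on for older ggplot2 versions.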
As randy said, ggplot2 2.2.0 does automatic ordering. If you want to change the order, just reorder the factors used for fill. If you want to switch which group is on top in the legend but not the plot, you can use scale_fill_manual() with the limits option.
(Code to generate ggplot colors from John Colby)
gg_color_hue <- function(n) {
hues = seq(15, 375, length = n + 1)
hcl(h = hues, l = 65, c = 100)[1:n]
}
cols <- gg_color_hue(2)
Default ordering in legend
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Dutch","Flemish"))
Reversed ordering in legend
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Flemish","Dutch"))
Reversed ordering in plot and legend
# factor() first, so this also works when read.csv() returns a character column
counts$nationality <- factor(counts$nationality, levels = rev(levels(factor(counts$nationality))))
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Flemish","Dutch"))
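Reversing the stack order comes down to reversing the factor levels. A minimal base-R sketch with a toy vector (the names `nat`, `f` and `f_rev` are illustrative):

```r
# A plain character vector has no levels, so levels() returns NULL
nat <- c("Dutch", "Flemish", "Dutch")
is.null(levels(nat))        # TRUE

# Convert to a factor first; levels are sorted alphabetically by default
f <- factor(nat)
levels(f)                   # "Dutch" "Flemish"

# Rebuild the factor with the levels reversed; the values stay the same
f_rev <- factor(f, levels = rev(levels(f)))
levels(f_rev)               # "Flemish" "Dutch"
as.character(f_rev)         # "Dutch" "Flemish" "Dutch"
```

Only the level order changes, not the data, so the fill colors assigned by name in scale_fill_manual() stay attached to the same nationalities.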
This should do it for you:
ggplot(counts[order(counts$nationality),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
Hope this helps.