I have a data set of Standardized Precipitation Index values from 1980 to 2005. There is one value for each month, so altogether there are 312 (26 years * 12 months) values. The SPI values range between -3 and +3. Here is an easy reproducible example, since the exact values are not important for my question:
vec1 <- rep(seq(1980, 2005), each= 12)
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
df <- data.frame(vec1, vec2)
colnames(df) <- c("Year", "SPI")
Now I would like to plot the SPI values with the years being the x-axis.
When I try to plot it using ggplot2:
ggplot() +
geom_line(aes(x=df$Year, y=df$SPI))
Something like this comes out:
So the problem is, there is no continuous line.
I can plot it with a continuous line with Base R for example:
plot(vec2, type="l")
But then the problem is that the x-axis only shows the values 1:312 and I need the years as the x-values.
Anybody with a hint?
EDIT after the answer of marcguery:
It turned out that I cannot use a line plot for my purpose. Instead, I need to do a column plot with many single columns when using ggplot2 since I need to color the areas above/below zero.
marcguery's answer works for a geom_line() plot, but unfortunately not for a geom_col() plot. I have no idea why.
Here is the modified code:
vec1 <- seq(as.Date("1980-01-01"),
by = "month",
to = as.Date("2005-12-01"))
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
vec3 <- 1:312
df <- data.frame(vec1, vec2, vec3)
colnames(df) <- c("Date", "SPI", "ID")
library(data.table)
df <- as.data.table(df)
Unfortunately, this does not work with the dates as the x-axis; the output looks strange:
library(ggplot2)
# with Date as x-axis
ggplot(data= df, aes(x= Date, y= SPI, width= 1)) +
geom_col(data = df[SPI <= 0], fill = "red") +
geom_col(data = df[SPI >= 0], fill = "blue") +
theme_bw()
This is what works with the simple rownumber as x-axis:
# with ID as x-axis
ggplot(data= df, aes(x= ID, y= SPI, width= 1)) +
geom_col(data = df[SPI <= 0], fill = "red") +
geom_col(data = df[SPI >= 0], fill = "blue") +
theme_bw()
I need something like the last example, just with the dates as the x-axis.
All twelve monthly observations of a year share the same value in your Year column, which is why ggplot cannot assign them different x values. Since you are working with dates, you could use the Date format for your time points so that each month is assigned a different value.
#Seed for reproducibility
set.seed(123)
#Data
vec1 <- seq(as.Date("1980-01-01"),
by = "month",
to = as.Date("2005-12-01"))
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
df <- data.frame(vec1, vec2)
colnames(df) <- c("Date", "SPI")
#Plot
library(ggplot2)
ggplot(df) +
geom_line(aes(x = Date, y = SPI))+
scale_x_date(breaks = "5 years", date_labels = "%Y",
limits = c(as.Date("1979-12-01"),
as.Date("2006-01-01")),
expand = c(0,0))
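To see why the original plot had no continuous line, it can help to count the distinct x positions that each encoding provides. This is a minimal base R sketch; the variable names `years` and `dates` are illustrative and not from the question.

```r
# Year-only x values: every month of a year shares one x position
years <- rep(seq(1980, 2005), each = 12)
# Monthly dates: every observation gets its own x position
dates <- seq(as.Date("1980-01-01"), by = "month", to = as.Date("2005-12-01"))

length(years)           # number of observations
length(unique(years))   # distinct x positions when using Year
length(unique(dates))   # distinct x positions when using Date
```

With only one x position per year, geom_line has twelve y values stacked at each x and draws vertical segments instead of a time series.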
Edit after you added your question about coloring the area between your values and 0 based on the sign of the values:
You can definitely use a geom_line plot for that purpose. Using a geom_col plot is a possibility, but you would lose visual information about the change between your x values, which are continuously related as they represent dates.
To plot a nice geom_line, I will base my approach on the answer here https://stackoverflow.com/a/18009173/14027775. You will have to adapt your data by transforming your dates to numerical values, for instance number of days since a given date (typically 1970/01/01).
#Colored plot
#Numerical format for dates (number of days after 1970-01-01)
df$numericDate <- difftime(df$Date,
as.Date("1970-01-01", "%Y-%m-%d"),
units="days")
df$numericDate <- as.numeric(df$numericDate)
rx <- do.call("rbind",
lapply(seq_len(nrow(df) - 1), function(i){
f <- lm(numericDate~SPI, df[i:(i+1),])
if (f$qr$rank < 2) return(NULL)
r <- predict(f, newdata=data.frame(SPI=0))
if(df[i,]$numericDate < r & r < df[i+1,]$numericDate)
return(data.frame(numericDate=r,SPI=0))
else return(NULL)
}))
#Get back to Date format
rx$Date <- as.Date(rx$numericDate, origin = "1970-01-01")
d2 <- rbind(df,rx)
ggplot(d2,aes(Date,SPI)) +
geom_area(data=subset(d2, SPI<=0), fill="red") +
geom_area(data=subset(d2, SPI>=0), fill="blue") +
geom_line()+
scale_x_date(breaks = "5 years", date_labels = "%Y",
limits = c(as.Date("1979-12-01"),
as.Date("2006-01-01")),
expand = c(0,0))
Now if you want to keep using geom_col, the reason why you don't see all the bars when using dates for the x axis is that they are too thin to be filled: with width = 1, each bar spans a single day on an axis that covers 26 years. By setting both the fill and the outline color, you should be able to see all of them.
ggplot(data= df, aes(x= Date, y= SPI)) +
geom_col(data = df[df$SPI <= 0,],
fill = "red", color="red", width= 1) +
geom_col(data = df[df$SPI >= 0,],
fill = "blue", color="blue", width= 1) +
scale_x_date(breaks = "5 years", date_labels = "%Y",
limits = c(as.Date("1979-12-01"),
as.Date("2006-01-01")),
expand = c(0,0))
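As a variation on the geom_col approach: the bar width on a date axis is measured in days, so a width close to one month nearly fills each month and makes the outline trick unnecessary. The following is only a sketch under that assumption; the width of 28 days and the mapping of `SPI >= 0` to the fill are illustrative choices, not part of the original code.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(
  Date = seq(as.Date("1980-01-01"), by = "month", to = as.Date("2005-12-01")),
  SPI  = sample(x = -3:3, size = 312, replace = TRUE)
)

# One layer; the fill colour depends on the sign of SPI.
# width is in days on a Date axis, so 28 roughly covers one month.
p <- ggplot(df, aes(x = Date, y = SPI, fill = SPI >= 0)) +
  geom_col(width = 28) +
  scale_fill_manual(values = c("TRUE" = "blue", "FALSE" = "red"),
                    guide = "none") +
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  theme_bw()
p
```

Mapping the sign to the fill also avoids subsetting the data twice.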
I need to visualize and compare two equally long sales periods: 2018/2019 and 2019/2020. Both periods begin at week 44 and end at week 36 of the following year. If I plot them against the year-week, the two periods follow one after the other instead of overlapping. If I use only the week number, the values are sorted numerically rather than in period order, and the graph does not make sense. Can you think of a solution?
Thank you
Data:
set.seed(1)
df1 <- data.frame(sells = runif(44),
week = c(44:52,1:35),
YW = yearweek(seq(as.Date("2018-11-01"), as.Date("2019-08-31"), by = "1 week")),
period = "18/19")
df2 <- data.frame(sells = runif(44),
week = c(44:52,1:35),
YW = yearweek(seq(as.Date("2019-11-01"), as.Date("2020-08-31"), by = "1 week")),
period = "19/20")
# Yearweek on x axis, when both period are separated
ggplot(df1, aes(YW, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
labs(color="Legend text")
# week on x axis: weeks are treated as a continuum and not split by year
ggplot(df1, aes(week, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
labs(color="Legend text")
Another alternative is to facet it. This'll require combining the two sets into one, preserving the data source. (This is commonly a better way of dealing with it in general, anyway.)
(I don't have tsibble, so my YW just has seq(...), no yearweek. It should translate.)
ggplot(dplyr::bind_rows(tibble::lst(df1, df2), .id = "id"), aes(YW, sells)) +
geom_line(aes(color = id)) +
facet_wrap(id ~ ., scales = "free_x", ncol = 1)
In place of dplyr::bind_rows, one might also use data.table::rbindlist(..., idcol="id"), or do.call(rbind, ...), though with the latter you will need to assign id externally.
One more note: the default formatting of the x-axis is obscuring the "year" of the data. If this is relevant/important (and not apparent elsewhere), then use ggplot2's normal mechanism for forcing labels, e.g.,
... +
scale_x_date(labels = function(z) format(z, "%Y-%m"))
In the unlikely case that you have dplyr but not tibble::lst available, you can replace it with list(df1=df1, df2=df2) or similar.
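For the do.call(rbind, ...) route, the id column has to be added by hand before binding. This is a minimal base R sketch using small stand-in data frames; the `sells` values and the names `frames`, `tagged` and `combined` are made up for illustration.

```r
set.seed(1)
df1 <- data.frame(sells = runif(3), week = c(51, 52, 1))
df2 <- data.frame(sells = runif(3), week = c(51, 52, 1))

frames <- list(df1 = df1, df2 = df2)

# Tag each data frame with its list name, then stack them
tagged <- Map(function(d, nm) cbind(id = nm, d), frames, names(frames))
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL

combined
```

The resulting `id` column can then be used for color and for faceting, just like the one created by dplyr::bind_rows(..., .id = "id").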
If you want to keep the x axis as a numeric scale, you can do:
ggplot(df1, aes((week + 9) %% 52, sells)) +
geom_line(aes(color="Period 18/19")) +
geom_line(data=df2, aes(color="Period 19/20")) +
scale_x_continuous(breaks = 1:52,
labels = function(x) ifelse(x == 9, 52, (x - 9) %% 52),
name = "week") +
labs(color="Legend text")
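The shift by 9 can be checked outside ggplot. This base R sketch (the names `pos` and `label` are illustrative) applies the same arithmetic as the aes() mapping and the labels function above to the 44 weeks of one period.

```r
week <- c(44:52, 1:35)          # weeks in the order they occur in a period

# Shift so that week 44 lands on position 1 and week 35 on position 44
pos <- (week + 9) %% 52

# Inverse used for the axis labels: position 9 is week 52, the rest wrap around
label <- ifelse(pos == 9, 52, (pos - 9) %% 52)

pos[1:10]                 # positions of weeks 44..52 and week 1
all(pos == 1:44)          # positions increase in period order
all(label == week)        # labels recover the original week numbers
```

The special case for position 9 is needed because (9 - 9) %% 52 would give 0 rather than 52.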
Try this. You can format your week variable as a factor and keep the desired order. Here is the code:
library(ggplot2)
library(tsibble)
#Data
df1$week <- factor(df1$week,levels = unique(df1$week),ordered = T)
df2$week <- factor(df2$week,levels = unique(df2$week),ordered = T)
#Plot
ggplot(df1, aes(week, sells)) +
geom_line(aes(color="Period 18/19",group=1)) +
geom_line(data=df2, aes(color="Period 19/20",group=1)) +
labs(color="Legend text")
I have measurement data from the past months:
Variables
x <- df$DatoTid
y <- df$Partikler
color <- df$Opgave
I'm trying to plot my data based on the timestamp, so that I have the hours of the day on the x-axis, instead of the specific POSIXct datetime.
I would like the labels and ticks of the x-axis to be, e.g., "00:00", "01:00", ..., "24:00".
So that noon is in the middle of the x-axis.
So far I tried to convert the datetime values into characters.
Doesn't look good yet (as you can see the axis ticks and labels are gone. Possibly other things are wrong as well).
Can someone help me?
And please let me know how to upload the data for you. I don't know how to add a huge .csv-file....
# Rounding to the nearest 5 min (300 s):
head(df)
df$Tid2 <- format(strptime("1970-01-01", "%Y-%m-%d", tz="CET") +
round(as.numeric(df$DatoTid)/300)*300 + 3600, "%Y-%m-%d %H:%M:%S")
head(df)
df$Tid2 <- as.character(df$Tid2)
str(df)
x <- df$Tid2
y <- df$Partikler
color <- df$Opgave
plot2 <- ggplot(data = df, aes(x = x, y = y, color = color)) +
geom_point(shape=16, alpha=0.6, size=1.8) +
scale_y_continuous(labels=function(x) format(x, big.mark = ".", decimal.mark = ",", scientific = FALSE)) +
scale_x_discrete(breaks=c("00:00:00", "06:00:00", "09:00:00", "12:00:00", "18:00:00", "21:00:00")) +
scale_color_discrete(name = "Case") +
xlab(" ") +
ylab(expression(paste("Partikelkoncentration [pt/cc]"))) +
myTheme +
theme(legend.text=element_text(size=8), legend.title=element_text(size=8))
plot2
I would approach this by making a new time stamp that uses a single day, but the hours/minutes/seconds of your existing time stamp.
First, here's a made-up version of your data, here using a linear trend in Partikler:
library(tidyverse); library(lubridate)
df <- tibble(Tid2 = seq.POSIXt(from = ymd_h(2019010100),
to = ymd_h(2019011500), by = 60*60),
Partikler = seq(from = 0, to = 2.5E5, along.with = Tid2),
Opgave = as.factor(floor_date(Tid2, "3 days")))
# Here's a plot that's structurally similar to yours:
ggplot(df, aes(Tid2, Partikler, col = Opgave)) +
geom_point() +
scale_color_discrete(name = "Case")
Now, if we change the timestamps to be in the same day, we can control them like usual in ggplot, but with them collapsed into a single day of timing. We can also change the x axis so it doesn't mention the date component of the time stamp:
df2 <- df %>%
mutate(Tid2_sameday = ymd_hms(paste(Sys.Date(),
hour(Tid2), minute(Tid2), second(Tid2))))
ggplot(df2, aes(Tid2_sameday, Partikler, col = Opgave)) +
geom_point() +
scale_color_discrete(name = "Case") +
scale_x_datetime(date_labels = "%H:%M")
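The same-day idea does not depend on lubridate. This is a minimal base R sketch, assuming UTC timestamps and an arbitrary dummy date of 2000-01-01; both are illustrative choices, as are the names `stamps` and `same_day`.

```r
stamps <- as.POSIXct(c("2019-01-01 06:30:00",
                       "2019-01-05 18:45:00",
                       "2019-01-09 06:30:00"), tz = "UTC")

# Keep the clock time, replace the date with a fixed dummy day
same_day <- as.POSIXct(format(stamps, "2000-01-01 %H:%M:%S", tz = "UTC"),
                       tz = "UTC")

format(same_day, "%H:%M")       # clock times only, suitable for axis labels
unique(as.Date(same_day))       # every observation now falls on one day
```

Using a fixed dummy date instead of Sys.Date() keeps the result the same regardless of the day on which the script is run.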
I'm trying to make a simple date * temperature heatmap (a raster graph?) that shows temperature over time based on binned temperature groups. Something like this, but only along the date axis (no time variable). I'd prefer to use ggplot, but I keep running astray. The graph produced by the code below is headed in the right direction, but I can't figure out how to get rid of the y-axis. I'd appreciate any help.
dat <- data.frame(temp = sample(20,100, replace=TRUE), date=seq(as.Date("2011-07-01"), by=1, len=100))
p <- ggplot(dat, aes(date, temp)) + geom_tile(aes(fill = temp)) + scale_fill_gradient(low = "blue", high = "red")
Thanks!
So you don't want to map temp on the y axis?
Well then you could use a fixed value for y and remove the rest of the y-axis:
dat <- data.frame(temp = sample(20,100, replace=TRUE),
date=seq(as.Date("2011-07-01"), by=1, len=100))
require(ggplot2)
ggplot(dat, aes(x = date, y = 1)) +
geom_tile(aes(fill = temp)) +
scale_fill_gradient(low = "blue", high = "red") +
labs(y = NULL) +
scale_y_continuous(breaks = NULL)
You could also try doing something like the plot below with the metvurst package.
http://i.imgur.com/8Js1Uz7.png
dat <- data.frame(temp = sample(20,60, replace=TRUE),
date=seq(as.POSIXct("2011-01-01 00:00"), by=3600, len=8760))
dat$year <- as.numeric(format(dat$date,"%Y"))
dat$month <- as.numeric(format(dat$date,"%m"))
# Install and load the metvurst package (from GitHub, via devtools)
devtools::install_github('tim-salabim/metvurst')
library(metvurst)
library(RColorBrewer)  # provides brewer.pal()
plot.air.temp <- strip(x = dat$temp,
date = dat$date,
cond = dat$year,
arrange = "long",
colour = colorRampPalette(rev(brewer.pal(11, "Spectral"))),
main = "Daily Air Temperatures\n\nTemperature [°C]")
plot.air.temp
I am trying to develop a weather plot like those that appear in weather reports.
I want to plot daily values (although the average value could appear as a circle). I am using ggplot2 because the plot needs to be faceted (by month and year).
st <- as.Date ("2009-1-1")
en <- as.Date ("2011-12-28")
date1 <- seq(st, en, "1 day")
year <- format(date1, "%Y")
month <- format (date1, "%b")
day <- as.numeric (format(date1, "%d"))
avgtm <- round (rnorm (length(date1), 50,5), 1)
maxtm <- avgtm + abs(rnorm (length (avgtm), 0, 5))
mintm <- avgtm - abs(rnorm (length (avgtm), 0, 5))
myd <- data.frame ( year, month, day, avgtm, maxtm, mintm)
require(ggplot2)
qplot(day, avgtm, data = myd, geom = "line", col = "red") +
facet_grid(year ~ month) + theme_bw()
There is one major problem here: the line will connect between months.
Each month is plotted to the maximum length (although a month can end at day 28, leaving blank space at the end of that month).
Is there a smart way to achieve what I want? I tried ggplot2, but there might be other nice options.
Edit:
I am trying to add a vertical line at the first day of each month to demarcate the months. Here is how I tried to find the first day of each month:
td = as.Date (seq(as.Date("2009/1/1"), as.Date("2011/12/28"), "months"))
I tried to use this to plot line:
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3) +
geom_vline(xintercept=td, linetype="dotted") + theme_bw()
But this raises an error:
Error : Invalid intercept type: should be a numeric vector, a function, or a name of a function
How can I plot the vertical lines with the dates?
There is a solution with panel.xblocks from latticeExtra:
st <- as.Date("2009-1-1")
en <- as.Date("2011-12-28")
date1 <- seq(st, en, "1 day")
avgtm <- round (rnorm (length(date1), 50,5), 1)
myd <- data.frame(date1, avgtm)
I define two functions to extract month and year values instead of
including them in the data.frame. This approach is useful with
panel.xblocks in the panel function of xyplot:
month <- function(x)format(x, '%m')
year <- function(x)format(x, '%Y')
I use year(date1) as conditioning variable to produce three
panels. Each of these panels will display the time series for that
year (panel.xyplot) and a sequence of contiguous blocks with
alternating colors to highlight months (panel.xblocks). You
should note that the y argument in panel.xblocks is the
function month previously defined:
library(latticeExtra)  # provides panel.xblocks; also attaches lattice for xyplot
xyplot(avgtm ~ date1 | year(date1), data=myd,
type='l', layout=c(1, 3),
scales=list(x=list(relation='free')),
xlab='', ylab='',
panel=function(x, y, ...){
panel.xblocks(x, month,
col = c("lightgray", "white"),
border = "darkgray")
panel.xyplot(x, y, lwd = 1, col='black', ...)
})
How about making a date column, then faceting on year only
myd$date <- as.Date(paste(myd$year, myd$month, myd$day), format='%Y %b %d')
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3)
You could add scales='free_x' to your plot as well, but will find it makes interpretation difficult.
By faceting on month and year you are telling the viewer and the plotting tool that the variables plotted are not continuous. This is incorrect as you've pointed out in your question. Thus, no faceting... You can add tick marks for each month or each day if you want.
library(scales)
qplot(date, avgtm, data = myd, geom = "line", col = "red") +
facet_wrap(~year, scales='free_x', ncol=1, nrow=3) +
scale_x_date(breaks=date_breaks("month"), labels=date_format("%b"))
Alternatively you could extract day of year and plot everything on one plot, coloring by year:
myd$doy <- as.numeric(format(myd$date, '%j'))  # numeric day of year, so the x-axis is continuous
p <- ggplot(myd, aes(x=doy, y=avgtm, color=year, group=year))
p + geom_line()
or
p + geom_smooth()
I am trying to plot the change in a time series for each calendar year using ggplot and I am having problems with the fine control of the x-axis. If I do not use scale="free_x" then I end up with an x-axis that shows several years as well as the year in question, like this:
If I do use scale="free_x" then as one would expect I end up with tick labels for each plot, and that in some cases vary by plot, which I do not want:
I have made various attempts to define the x-axis using scale_x_date etc but without any success. My question is therefore:
Q. How can I control the x-axis breaks and labels on a ggplot facet grid so that the (time series) x-axis is identical for each facet, shows only at the bottom of the panel and is in the form of months formatted 1, 2, 3 etc or as 'Jan','Feb','Mar'?
Code follows:
require(lubridate)
require(ggplot2)
require(plyr)
# generate data
df <- data.frame(date=seq(as.Date("2009/1/1"), by="day", length.out=1115),price=runif(1115, min=100, max=200))
# remove weekend days
df <- df[!(weekdays(as.Date(df$date)) %in% c('Saturday','Sunday')),]
# add some columns for later
df$year <- as.numeric(format(as.Date(df$date), format="%Y"))
df$month <- as.numeric(format(as.Date(df$date), format="%m"))
df$day <- as.numeric(format(as.Date(df$date), format="%d"))
# calculate change in price since the start of the calendar year
df <- ddply(df, .(year), transform, pctchg = ((price/price[1])-1))
p <- ggplot(df, aes(date, pctchg)) +
geom_line( aes(group = 1, colour = pctchg),size=0.75) +
facet_wrap( ~ year, ncol = 2,scale="free_x") +
scale_y_continuous(formatter = "percent") +
opts(legend.position = "none")
print(p)
here is an example:
df <- transform(df, doy = as.Date(paste(2000, month, day, sep="/")))
p <- ggplot(df, aes(doy, pctchg)) +
geom_line( aes(group = 1, colour = pctchg),size=0.75) +
facet_wrap( ~ year, ncol = 2) +
scale_x_date(format = "%b") +
scale_y_continuous(formatter = "percent") +
opts(legend.position = "none")
p
Do you want this one?
The trick is to generate day of year of a same dummy year.
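The dummy-year trick can be tried in isolation. This is a minimal base R sketch with made-up dates; the choice of 2000 as the dummy year follows the example above, and it has the side benefit of being a leap year.

```r
dates <- as.Date(c("2009-03-15", "2010-03-15", "2011-12-01"))

# Replace the real year with a fixed dummy year; 2000 is a leap year,
# so 29 February also maps to a valid date
doy <- as.Date(format(dates, "2000-%m-%d"))

doy
doy[1] == doy[2]   # same calendar day in different years shares one x value
```

Because every facet now plots against the same dummy year, the x-axis is identical across panels and can be labelled by month only.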
UPDATED
here is an example for newer versions of ggplot2 (0.9.2 and later, where theme() replaced opts())
library(scales)  # provides date_format() and percent_format()
p <- ggplot(df, aes(doy, pctchg)) +
geom_line( aes(group = 1, colour = pctchg), size=0.75) +
facet_wrap( ~ year, ncol = 2) +
scale_x_date(labels = date_format("%b"), breaks = seq(min(df$doy), max(df$doy), "month")) +
scale_y_continuous(labels = percent_format()) +
theme(legend.position = "none")
p