I have some time series data and I would like to customize the x-axis (dates) to show the date labels where I obtain measurements, as opposed to having regular breaks per week/month/year.
Sample data:
dates <- as.Date("2011/01/01") + sample(0:365, 5, replace=F)
number <- sample(1:100, 5)
df <- data.frame(
dates = dates,
number = number
)
This way I can plot my df with regular breaks every month...
library(ggplot2)
library(scales)  # provides date_breaks() and date_format()
ggplot(df, aes(as.Date(dates), number)) +
geom_point(size=6) +
geom_segment(aes(x = dates, y = 0, xend = dates, yend = number),
size=0.5, linetype=2) +
scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%d-%b-%Y")) +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5))
... but I would like to set the major breaks to the actual 5 dates in df$dates. It works with a normal continuous scale (scale_x_continuous(breaks = c(1, 3, 7, 9))) but I can't figure out how to do it for a continuous date scale.
I am looking to do something like...
scale_dates_continuous(breaks = df$dates)
...but that doesn't exist, unfortunately. Thanks a lot for your help!
Please read ?scale_x_date, about the breaks argument: you can use a "vector of breaks". Thus, try
scale_x_date(breaks = df$dates, labels = date_format("%d-%b-%Y"))
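For reference, here is a minimal end-to-end sketch of the same idea. It assumes a fixed seed (not in the question) so the plot is repeatable, and it uses the date_labels argument instead of date_format() so that only ggplot2 needs to be loaded; treat it as one possible way to write it rather than the only one.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only so the sketch is repeatable
df <- data.frame(
  dates  = as.Date("2011-01-01") + sample(0:365, 5),
  number = sample(1:100, 5)
)

p <- ggplot(df, aes(dates, number)) +
  geom_point(size = 6) +
  # dashed stem from y = 0 up to each measurement
  geom_segment(aes(xend = dates, y = 0, yend = number), linetype = 2) +
  # one major break per measurement date
  scale_x_date(breaks = df$dates, date_labels = "%d-%b-%Y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p)
```

If two measurement dates fall close together their labels may overlap; that is a consequence of placing breaks at irregular positions, not of the scale itself.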
I have a dataframe which contains a variable for week-since-2017. So, it counts up from 1 to 313 in that column. I mutated another variable into the dataframe to indicate the year. So, in my scatterplot, I have each week as a point, but the x-axis is horrid, counting up from 1 to 313. Is there a way I can change the scale at the bottom to instead display the variable year, possibly even adding vertical lines in between to show when the year changes?
Currently, I have this:
ggplot(HS, aes(as.integer(Obs), Total)) + geom_point(aes(color=YEAR)) + geom_smooth() + labs(title="Weekly Sales since 2017",x="Week",y="Written Sales") + theme(axis.line = element_line(colour = "orange", size = 1, linetype = "solid"))
You can convert the number of weeks to a number of days using 7 * Obs and add this value on to the start date (as.Date('2017-01-01')). This gives you a date-based x axis which you can format as you please.
Here, we set the breaks at the turn of each year so the grid fits to them:
ggplot(HS, aes(as.Date('2017-01-01') + 7 * Obs, Total)) +
geom_point(aes(color = YEAR)) +
geom_smooth() +
labs(title = "Weekly Sales since 2017", x = "Week", y = "Written Sales") +
theme(axis.line = element_line(colour = "orange", size = 1)) +
scale_x_date('Year', date_breaks = 'year', date_labels = '%Y')
Data used
Obviously, we don't have your data, so I had to create a reproducible set with the same names and similar values to yours for the above example:
set.seed(1)
HS <- data.frame(Obs = 1:312,
Total = rnorm(312, seq(1200, 1500, length = 312), 200)^2,
YEAR = rep(2017:2022, each = 52))
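The week-to-date arithmetic used in the answer can be checked in base R on its own. The sketch below takes the start date 2017-01-01 from the answer; the three week numbers and the year_starts vector are illustrative choices, the latter being one way to get positions for the vertical lines asked about in the question.

```r
# Week counter as in the question: 1, 2, 3, ...
obs <- c(1, 52, 53)

# Each week lies 7 days further from the start date
week_dates <- as.Date("2017-01-01") + 7 * obs
format(week_dates)

# First day of each later year, usable as positions for vertical lines,
# e.g. geom_vline(xintercept = year_starts)
year_starts <- seq(as.Date("2018-01-01"), as.Date("2022-01-01"), by = "year")
format(year_starts, "%Y")
```

Depending on your ggplot2 version, geom_vline() may need as.numeric(year_starts) rather than the Date vector itself.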
I have a data set of Standardized Precipitation Index values from 1980 to 2005. There is one value for each month, so altogether there are 312 (26 years * 12 months) values. The SPI values range between -3 and +3. Here is an easy reproducible example, since the exact values are not important for my question:
vec1 <- rep(seq(1980, 2005), each= 12)
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
df <- data.frame(vec1, vec2)
colnames(df) <- c("Year", "SPI")
Now I would like to plot the SPI values with the years being the x-axis.
When I try to plot it using ggplot2:
ggplot() +
geom_line(aes(x=df$Year, y=df$SPI))
Something like this comes out:
So the problem is, there is no continuous line.
I can plot it with a continuous line with Base R for example:
plot(vec2, type="l")
But then the problem is that the x-axis only shows the values 1:312 and I need the years as the x-values.
Anybody with a hint?
EDIT after the answer of marcguery:
It turned out that I cannot use a line plot for my purpose. Instead, I need to do a column plot with many single columns when using ggplot2 since I need to color the areas above/below zero.
marcguery's answer works for a geom_line() plot, but unfortunately not for a geom_col() plot. I have no idea why.
Here is the modified code:
vec1 <- seq(as.Date("1980-01-01"),
by = "month",
to = as.Date("2005-12-01"))
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
vec3 <- 1:312
df <- data.frame(vec1, vec2, vec3)
colnames(df) <- c("Date", "SPI", "ID")
library(data.table)
df <- as.data.table(df)
Unfortunately, this does not work with the dates as the x-axis; the output looks strange:
library(ggplot2)
# with Date as x-axis
ggplot(data= df, aes(x= Date, y= SPI, width= 1)) +
geom_col(data = df[SPI <= 0], fill = "red") +
geom_col(data = df[SPI >= 0], fill = "blue") +
theme_bw()
This is what works with the simple rownumber as x-axis:
# with ID as x-axis
ggplot(data= df, aes(x= ID, y= SPI, width= 1)) +
geom_col(data = df[SPI <= 0], fill = "red") +
geom_col(data = df[SPI >= 0], fill = "blue") +
theme_bw()
I need something like the last example, just with the dates as the x-axis.
All twelve monthly observations of a given year share the same value in your Year column, so ggplot cannot assign them different x values. Since you are working with dates, you could store your time points as Date values so that each month gets its own x position.
#Seed for reproducibility
set.seed(123)
#Data
vec1 <- seq(as.Date("1980-01-01"),
by = "month",
to = as.Date("2005-12-01"))
vec2 <- sample(x = -3:3, size = 312, replace = TRUE)
df <- data.frame(vec1, vec2)
colnames(df) <- c("Date", "SPI")
#Plot
library(ggplot2)
ggplot(df) +
geom_line(aes(x = Date, y = SPI))+
scale_x_date(breaks = "5 years", date_labels = "%Y",
limits = c(as.Date("1979-12-01"),
as.Date("2006-01-01")),
expand = c(0,0))
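A quick base R check makes the difference between the two x variables visible. The vector names year and date are only used for this illustration.

```r
year <- rep(1980:2005, each = 12)
date <- seq(as.Date("1980-01-01"), as.Date("2005-12-01"), by = "month")

length(year)          # number of observations
length(unique(year))  # distinct x positions when plotting against Year
length(unique(date))  # distinct x positions when plotting against Date
```

With Year on the x-axis, twelve points are stacked at each position, which is why the line appears as a set of vertical strokes.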
Edit after you added your question about coloring the area between your values and 0 based on the sign of the values:
You can definitely use a geom_line plot for that purpose. Using a geom_col plot is a possibility, but you would lose visual information about the change between your x values, which are continuously related as they represent dates.
To plot a nice geom_line, I will base my approach on the answer here https://stackoverflow.com/a/18009173/14027775. You will have to adapt your data by transforming your dates to numerical values, for instance number of days since a given date (typically 1970/01/01).
#Colored plot
#Numerical format for dates (number of days after 1970-01-01)
df$numericDate <- as.numeric(df$Date)
rx <- do.call("rbind",
# lapply always returns a list, whereas sapply may try to simplify the result
lapply(seq_len(nrow(df) - 1), function(i){
f <- lm(numericDate~SPI, df[i:(i+1),])
if (f$qr$rank < 2) return(NULL)
r <- predict(f, newdata=data.frame(SPI=0))
if(df[i,]$numericDate < r & r < df[i+1,]$numericDate)
return(data.frame(numericDate=r,SPI=0))
else return(NULL)
}))
#Get back to Date format
rx$Date <- as.Date(rx$numericDate, origin = "1970-01-01")
d2 <- rbind(df,rx)
ggplot(d2,aes(Date,SPI)) +
geom_area(data=subset(d2, SPI<=0), fill="red") +
geom_area(data=subset(d2, SPI>=0), fill="blue") +
geom_line()+
scale_x_date(breaks = "5 years", date_labels = "%Y",
limits = c(as.Date("1979-12-01"),
as.Date("2006-01-01")),
expand = c(0,0))
Now, if you want to keep using geom_col: the reason you don't see all the bars with dates on the x axis is that they are too thin to be filled. On a date axis, width = 1 means a single day, which is tiny over a 26-year span. By giving the bars an outline color as well as a fill, you should be able to see all of them.
ggplot(data= df, aes(x= Date, y= SPI)) +
geom_col(data = df[df$SPI <= 0,],
fill = "red", color="red", width= 1) +
geom_col(data = df[df$SPI >= 0,],
fill = "blue", color="blue", width= 1) +
scale_x_date(breaks = "5 years", date_labels = "%Y",
limits = c(as.Date("1979-12-01"),
as.Date("2006-01-01")),
expand = c(0,0))
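As an alternative sketch, not taken from the answer above, the bars can also be widened, since width on a date axis is measured in days. The value 28 and the helper column sign with its labels "neg" and "pos" are assumptions made for this illustration.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(
  Date = seq(as.Date("1980-01-01"), as.Date("2005-12-01"), by = "month"),
  SPI  = sample(-3:3, size = 312, replace = TRUE)
)

# Precompute the sign so one geom_col() call can colour both groups
df$sign <- ifelse(df$SPI >= 0, "pos", "neg")  # illustrative labels

p <- ggplot(df, aes(Date, SPI, fill = sign)) +
  geom_col(width = 28) +  # width in days; 28 is an assumed value close to one month
  scale_fill_manual(values = c(neg = "red", pos = "blue"), guide = "none") +
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  theme_bw()

print(p)
```

Mapping fill to a column avoids subsetting the data twice; zero values have no visible bar, so it does not matter which group they fall into.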
I am trying to plot a graph in ggplot2 where the x-axis represents month-day combinations and the dots represent y-values for two different groups.
When graphing my original data set using this code,
ggplot(graphing.df, aes(MONTHDAY, y.var, color = GROUP)) +
geom_point() +
ylab(paste0(""))+
scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 15)])+
theme(legend.text = element_blank(),
legend.title = element_blank()) +
geom_vline(xintercept = which(graphing.df$MONTHDAY == "12-27")[1], col='red', lwd=2)
I get this graph where the vertical line is not showing.
When I tried to create a reproducible example using the following code...
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
TYPE = rep(c("A", "B"), 3),
VALUE = sample(1:10, 6, replace = TRUE))
verticle_line <- "01-02"
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
#geom_vline(xintercept = which(df$MONTHDAY == verticle_line)[1], col='red', lwd=2)+
geom_vline(xintercept = which(df$MONTHDAY == verticle_line), col='blue', lwd=2)
The vertical line is showing, but now it's showing in the wrong place.
In my original data set I have two values for each month-day combination (representing each of the two groups). The month-day combination column is a character vector, it is not a factor and does not have levels.
Here is a way. It subsets the data keeping only the rows of interest and plots the vertical line defined by MONTHDAY.
library(ggplot2)
verticle_line <- "01-02"
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(data = subset(df, MONTHDAY == verticle_line),
mapping = aes(xintercept = MONTHDAY), color = 'blue', size = 2)
Data
I will repost the data creation code, this time setting the RNG seed in order to make the example reproducible.
set.seed(2020)
df <- data.frame(MONTHDAY = c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03"),
TYPE = rep(c("A", "B"), 3),
VALUE = sample(1:10, 6, replace = TRUE))
The reason your line is not showing up where you expect is because you are setting the value of xintercept= via the output of the which() function. which() returns the index value where the condition is true. So in the case of your reproducible example, you get the following:
> which(df$MONTHDAY == verticle_line)
[1] 3 4
It returns a vector indicating that in df$MONTHDAY, indexes 3 and 4 in that vector are true. So your code below:
geom_vline(xintercept = which(df$MONTHDAY == verticle_line)...
Reduces down to this:
geom_vline(xintercept = c(3,4)...
Your MONTHDAY axis is not formatted as a date, but treated as a discrete axis of character vectors. In this case xintercept=c(3,4) applied to a discrete axis draws two vertical lines at x intercepts equivalent to the 3rd and 4th discrete position on that axis: in other words, "01-03" and... some unknown 4th position that is not observable within the axis limits.
How do you fix this? Just take out which():
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(xintercept = verticle_line, col='blue', lwd=2)
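The difference between a row index and an axis position can be seen in base R. In this sketch the names monthday, target, row_idx and axis_pos are illustrative, and it assumes the default ordering of a character column on a discrete axis (sorted unique values).

```r
monthday <- c("01-01", "01-01", "01-02", "01-02", "01-03", "01-03")
target <- "01-02"

# Row indices where the condition holds: these are not axis positions
row_idx <- which(monthday == target)
row_idx

# Position of the value among the sorted unique labels, which is where
# a discrete axis places it by default
axis_pos <- match(target, sort(unique(monthday)))
axis_pos
```

So if a numeric xintercept is ever needed on a discrete axis, axis_pos is the number to use, not the output of which().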
We can get the corresponding values of 'MONTHDAY' after subsetting:
ggplot(df, aes(MONTHDAY, VALUE, color = TYPE)) +
geom_point() +
geom_vline(xintercept = df$MONTHDAY[df$MONTHDAY == verticle_line],
col='blue', lwd=2)
I need to add value labels for data points on two separate lines that are generated from a data frame in R using ggplot2. The following is the code snippet that I am using:
DataFrame = data.frame(Amount = c(results$Costs,
results$TotalPoC),
Legend = rep(c("Cost as % of initial costs",
"Revenue as % of cost"),
each = nrow(results)),
Year = rep(0:5,2))
p <- ggplot(ResultsCR, aes(x=Year, y=Amount, group=Legend)) +
geom_line(aes(linetype=Legend))+
geom_point(aes(shape=Legend))+
geom_text(aes(label=Amount))+
theme_classic(base_size = 15) +
ggtitle("Hospital Costs and Revenues")
print(p)
However, the graph is only displaying the labels on the second line, i.e. the one corresponding to the Legend "Revenue as % of cost". How can I generate labels for data points on all lines generated from the same data frame in ggplot2?
I am not able to reproduce your example. Can you please use this dataset to reproduce your problem or draw on it what you would like to change?
library(tidyverse)
set.seed(1)
df <-
tibble(
amount = sample(10:30, 10),
legend = rep(
c("Cost as % of initial costs",
"Revenue as % of cost"),
each = 5),
year = rep(1:5, 2)
)
ggplot(df, aes(x = year, y = amount, group = legend)) +
geom_line(aes(linetype = legend)) +
geom_point(aes(shape = legend)) +
geom_text(aes(label = amount), hjust = -1) +
theme_classic(base_size = 15) +
xlim(1, 5.5) +
ggtitle("Hospital Costs and Revenues")
I'm learning to use ggplot2 and am looking for the smallest ggplot2 code that reproduces the base::plot result below. I've tried a few things and they all ended up being horrendously long, so I'm looking for the smallest expression and ideally would like to have the dates on the x-axis (which are not there in the plot below).
df = data.frame(date = c(20121201, 20121220, 20130101, 20130115, 20130201),
val = c(10, 5, 8, 20, 4))
plot(cumsum(rowsum(df$val, df$date)), type = "l")
Try this:
ggplot(df, aes(x=1:5, y=cumsum(val))) + geom_line() + geom_point()
Just remove geom_point() if you don't want it.
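Edit: Since you want the x labels to be the dates, you can plot with x = 1:5 and relabel the axis from the data frame. Because x is numeric here, scale_x_continuous with explicit breaks and labels is the safer choice (scale_x_discrete on a numeric x may leave the axis unlabelled in recent ggplot2 versions). Taking df:
ggplot(data = df, aes(x = 1:5, y = cumsum(val))) + geom_line() +
geom_point() + theme(axis.text.x = element_text(angle=90, hjust = 1)) +
scale_x_continuous(breaks = seq_len(nrow(df)), labels = as.character(df$date)) + xlab("Date")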
Since you say you'll have more than 1 val for "date", you can aggregate them first using plyr, for example.
require(plyr)
dd <- ddply(df, .(date), summarise, val = sum(val))
Then you can proceed with the same command by replacing x = 1:5 with x = seq_len(nrow(dd)).
After a couple of years, I've settled on doing:
ggplot(df, aes(as.Date(as.character(date), '%Y%m%d'), cumsum(val))) + geom_line()
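If several values can share a date, the aggregation and the date parsing can be done in base R before plotting. This is a sketch under that assumption; the data frame name daily and the column cum are made up for the example.

```r
df <- data.frame(date = c(20121201, 20121220, 20130101, 20130115, 20130201),
                 val  = c(10, 5, 8, 20, 4))

# Sum values per date first, so repeated dates collapse to one row
daily <- aggregate(val ~ date, data = df, FUN = sum)

# Parse the yyyymmdd numbers into real Dates, then accumulate
daily$date <- as.Date(as.character(daily$date), format = "%Y%m%d")
daily$cum  <- cumsum(daily$val)
daily
```

The result can then be passed to ggplot(daily, aes(date, cum)) + geom_line(), which gives a properly spaced date axis without further scale settings.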
Jan Boyer seems to have found a more concise solution to this problem in this question, which I have shortened a bit and combined with Prradep's answer to provide a (hopefully) up-to-date answer:
ggplot(data = df,
aes(x=date)) +
geom_col(aes(y=value)) +
geom_line(aes(x = date, y = cumsum((value))/5, group = 1), inherit.aes = FALSE) +
ylab("Value") +
theme(axis.text.x = element_text(angle=90, hjust = 1))
Note that date is stored as character here, not as a Date, and that value is already grouped as suggested by Prradep in his answer above.