scaling x-axis in ggplot - r

I have an object of class xts that I would like to plot with ggplot2.
My xts object:
structure(c(463829L, 469849L, 608148L, 470825L, 560057L, 431183L,
418000L, 508168L, 422579L, 589829L, 462264L, 487183L, 612174L,
467904L, 454620L, 450243L, 549898L, 422026L, 508311L, 385633L,
420200L, 619074L, 462605L, 465353L, 565804L, 464841L, 505977L,
624608L, 491175L, 459701L, 563406L, 461595L, 499607L, 674799L,
505167L, 637375L, 500131L, 473494L, 488527L, 613972L, 468938L,
454034L, 566511L, 456879L, 592797L, 491368L, 481690L, 597927L
), .Tsp = c(2012, 2015.91666666667, 12), class = "ts")
I would also like to have the months shown on my plot. I have tried this code:
library(ggplot2)
library(zoo)
library(scales)
autoplot(as.zoo(a2)) + geom_line()
+scale_x_date(format = "%b-%Y")
but I get this error:
Error in continuous_scale(aesthetics, "date", identity, breaks = breaks, :
unused argument (format = "%b-%Y")
How can I do this? I want something like this plot, but with months on the x-axis:

Simply try this:
a2 <- read.table(text=' Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012 463829 469849 608148 470825 560057 431183 418000 508168 422579 589829 462264 487183
2013 612174 467904 454620 450243 549898 422026 508311 385633 420200 619074 462605 465353
2014 565804 464841 505977 624608 491175 459701 563406 461595 499607 674799 505167 637375
2015 500131 473494 488527 613972 468938 454034 566511 456879 592797 491368 481690 597927', header=TRUE)
library(ggplot2)
library(reshape2)
a2$year <- rownames(a2)
a2 <- melt(a2)
ggplot(a2, aes(variable, value, group=year)) + geom_line() + facet_wrap(~year, ncol=1)
or all in one plot:
ggplot(a2, aes(variable, value, group=year, col=year)) + geom_line()
or this:
a2 <- read.table(text=' Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012 463829 469849 608148 470825 560057 431183 418000 508168 422579 589829 462264 487183
2013 612174 467904 454620 450243 549898 422026 508311 385633 420200 619074 462605 465353
2014 565804 464841 505977 624608 491175 459701 563406 461595 499607 674799 505167 637375
2015 500131 473494 488527 613972 468938 454034 566511 456879 592797 491368 481690 597927', header=TRUE)
a2$year <- rownames(a2)
a2 <- melt(a2, id='year')
a2$date <- as.Date(paste(a2$year, a2$variable, '01'), '%Y %b %d')
ggplot(a2, aes(date, value)) + geom_line() +
scale_x_date(date_breaks = "months", date_labels = "%b %Y") +
theme(axis.text.x = element_text(angle = 90))

I think the problem is that the index to your time series is in decimal date (i.e., numeric) format, and scale_x_date is expecting something in date format.
Here's some code that gets close to what I think you want. It involves creating a zoo object with the index in date format first, then plotting that. Like:
a3 <- zoo(a2, order.by = as.Date(yearmon(index(a2))))
p <- autoplot(a3)
p + scale_x_date(date_breaks = "1 month") +
theme(axis.text.x = element_text(angle = 90))
You'll probably want to tinker with the options in scale_x_date to improve the look of the result, but this should get you on the right path.
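To make the point about the decimal index concrete, here is a minimal sketch of how the numeric time index of a monthly ts object can be turned into Date values. The series x is a stand-in with made-up values (1 to 48); only its start and frequency match a2.

```r
library(zoo)

# Stand-in monthly series with the same time base as a2 (values are made up)
x <- ts(1:48, start = c(2012, 1), frequency = 12)

# The time index of a "ts" object is numeric: 2012.000, 2012.083, ...
decimal_index <- as.numeric(time(x))

# as.yearmon() reads the decimal year; as.Date() then gives the first day of each month
date_index <- as.Date(as.yearmon(decimal_index))

range(date_index)
```

Once the index is of class Date, scale_x_date() has something it can format.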

Try adding this option to the ggplot call (date_format() comes from the scales package):
+ scale_x_date(labels = date_format("%Y-%m"))

Note that the class of a2 is not "xts" -- it is a "ts" class object. Anyways, first convert the index to class "yearmon" and then use scale_x_yearmon like this:
z <- as.zoo(a2)
index(z) <- as.yearmon(index(z))
autoplot(z) + scale_x_yearmon()

Related

Converting month_year variable into week_year (dplyr) & (lubridate)

I have a dataset structured as follows, where I am tracking collective action mentions by subreddit by month, relative to a policy treatment which was introduced on Feb 17th, 2012. As a result, the period "Feb 2012" appears twice in my dataset: "pre" refers to the Feb 2012 days before treatment, and "post" to the days after.
treatment_status   month_year   collective_action_percentage
pre                Dec 2011     5%
pre                Jan 2012     8%
pre                Feb 2012     10%
post               Feb 2012     3%
post               March 2012   10%
I am not sure how best to visualize this indicator by month. I made the following graph, but since I am interested in showing how collective action mentions decline after treatment, I was wondering whether presenting this variable on a week-and-year basis, rather than month-and-year, would be clearer.
ggplot(data = df1, aes(x = as.Date(month_year), fill = collective_action_percentage ,y = collective_action_percentage)) +
geom_bar(stat = "identity", position=position_dodge()) +
scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
xlab("Criticism by individuals active before and after treatment") +
theme_classic()+
theme(plot.title = element_text(size = 10, face = "bold"),
axis.text.x = element_text(angle = 90, vjust = 0.5))
I created the month_year variable as follows, using the zoo package:
df<- df %>%
mutate(month_year = zoo::as.yearmon(date))
Finally, I tried aggregating the data on a weekly basis as follows. However, since my dataset spans multiple years, I would ideally like to aggregate by week and year, not simply by week:
df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))
Plot a point for each row and connect them with a line so that it is clear what the order is. We also color the pre and post points differently and make treatment status a factor so that we can order the pre level before the post level.
library(ggplot2)
library(zoo)
df2 <- transform(df1, month_year = as.yearmon(month_year, "%b %Y"),
treatment_status = factor(treatment_status, c("pre", "post")))
ggplot(df2, aes(month_year, collective_action_percentage)) +
geom_point(aes(col = treatment_status), cex = 4) +
geom_line()
Note
We assume df1 is as follows. The % signs have already been removed.
df1 <-
structure(list(treatment_status = c("pre", "pre", "pre", "post",
"post"), month_year = c("Dec 2011", "Jan 2012", "Feb 2012", "Feb 2012",
"March 2012"), collective_action_percentage = c(5L, 8L, 10L,
3L, 10L)), class = "data.frame", row.names = c(NA, -5L))
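For the week-and-year aggregation asked about in the question, here is a minimal base R sketch. The data frame daily and its columns date and value are hypothetical stand-ins, since the original daily data is not shown.

```r
# Hypothetical daily data: 21 days spanning a year boundary
daily <- data.frame(
  date  = seq(as.Date("2011-12-26"), as.Date("2012-01-15"), by = "day"),
  value = 1:21
)

# %G is the ISO 8601 week-based year and %V the ISO week number,
# so 2012-01-01 (a Sunday) is still counted in week 52 of 2011
daily$year_week <- format(daily$date, "%G-W%V")

# Mean per year-week combination
weekly <- aggregate(value ~ year_week, data = daily, FUN = mean)
weekly
```

With dplyr and lubridate, grouping by both isoyear(date) and isoweek(date) should give the same groups. Using the week-based year rather than the calendar year keeps days around New Year in the same week.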

Automatically set data representative breaks in ggplot with facet_grid

Here's a reproducible example taken from the R Graph Gallery:
library(ggplot2)
library(dplyr)
library(viridis)
library(Interpol.T)
library(lubridate)
library(ggExtra)
library(tidyr)
data <- data(Trentino_hourly_T,package = "Interpol.T")
names(h_d_t)[1:5]<- c("stationid","date","hour","temp","flag")
df <- as_tibble(h_d_t) %>%
filter(stationid =="T0001")
df$date<-ymd(df$date)
df <- df %>% mutate(date = ymd(date),
year = year(date),
month = month(date, label=TRUE),
day = day(date))
rm(list=c("h_d_t","mo_bias","Tn","Tx",
"Th_int_list","calibration_l",
"calibration_shape","Tm_list"))
df <- df %>%
filter(between(date, as.Date("2004-02-13"), as.Date("2004-04-29")) | between(date, as.Date("2005-02-13"), as.Date("2005-04-29")))
df <-df %>% select(stationid,day,hour,month,year,temp)%>%
fill(temp)
statno <-unique(df$stationid)
######## Plotting starts here#####################
p <-ggplot(df, aes(day,hour,fill=temp))+
geom_tile(color= "white",size=0.1) +
scale_fill_viridis(name="Hrly Temps C",option ="C") +
facet_grid(year~month, scales = "free") +
scale_y_continuous(trans = "reverse", breaks = unique(df$hour)) +
theme_minimal(base_size = 8) +
labs(title= paste("Hourly Temps - Station",statno), x="Day", y="Hour Commencing") +
theme(legend.position = "bottom",
plot.title=element_text(size = 14, hjust = 0),
axis.text.y=element_text(size=6),
strip.background = element_rect(colour="white"),
axis.ticks=element_blank(),
axis.text=element_text(size=7),
legend.text=element_text(size=6))+
removeGrid()
What is bothering me is that the x-axis breaks don't explicitly show the first and last day of each month. Even worse, they show a February 30th, a March 0th and an April 0th.
My goal is to use a function that automatically and explicitly shows the REAL first and last day of each plotted month (in the example, February 13th to February 29th, March 1st to March 31st, and April 1st to April 29th), with 4 to 6 breaks within each month.
As this plot will be shown in a Shiny app where the user can change the plotted time period, the solution REALLY needs to be automated.
Here are some things I've tried:
library(scales)
p + scale_x_continuous(breaks =breaks_pretty())
But it doesn't change much.
I've tried to write my own function but something horrible happened:
breaksFUN <- function(x){
round(seq(min(x), max(x), length.out = 5), 0)
}
p + scale_x_continuous(breaks =breaksFUN)
Thank you in advance.
Thank you, Axeman, for your contribution, it really helped! It works for my example, but I encountered some issues when trying it on my own data. I modified it and it works properly now. Here's my solution, inspired by Axeman's:
breaksFUN <- function(x) {
s <- round(c(seq(min(x) + 1.5, max(x) - 5.5, length.out = 4), max(x) - 1.5))
s[s == 0] <- 1
s[s > 31] <- 31
s <- round(seq(range(s)[1], range(s)[2], length.out = 5))
unique(s)
}
p + scale_x_continuous(breaks = breaksFUN)
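To see what this function returns, it can be called directly on a pair of limits, outside of ggplot. The limits below are an assumption, not taken from the post: tiles for days 13 to 29 span 12.5 to 29.5, and the values used here additionally include ggplot2's default 5% continuous expansion on each side. Whether the plot passes exactly these numbers would need checking against the real output.

```r
# Same function as in the answer, repeated so the example is self-contained
breaksFUN <- function(x) {
  s <- round(c(seq(min(x) + 1.5, max(x) - 5.5, length.out = 4), max(x) - 1.5))
  s[s == 0] <- 1
  s[s > 31] <- 31
  s <- round(seq(range(s)[1], range(s)[2], length.out = 5))
  unique(s)
}

# Assumed limits for 13-29 February: tile edges plus 5% expansion
feb_limits <- c(12.5, 29.5) + c(-1, 1) * 0.05 * 17
feb_breaks <- breaksFUN(feb_limits)

# Assumed limits for 1-31 March, built the same way
mar_limits <- c(0.5, 31.5) + c(-1, 1) * 0.05 * 31
mar_breaks <- breaksFUN(mar_limits)

feb_breaks
mar_breaks
```

Under these assumptions, the offsets of 1.5 pull the outer breaks back onto the first and last real day, and the two clamping lines keep a break from landing on day 0 or beyond day 31.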

reorder character class data frame column and plot

I have a data frame that looks like the following:
df <- data.frame(date.time = c("Fri 00:00", "Fri 23:30", "Mon 00:00", "Mon 23:30",
"Sat 00:00", "Sat 23:30", "Sun 00:00", "Sun 23:30",
"Thu 00:00", "Thu 23:30", "Tue 00:00", "Tue 23:30",
"Wed 00:00", "Wed 23:30"),
Price = c(36.15368, 41.61206, 30.80412, 37.47360, 38.04516, 35.72798,
33.05613, 32.65447, 35.50335, 41.81241, 35.14006, 37.56432,
35.04553, 38.00721))
The date.time values are of class character and the Price values are of class numeric. I would like to plot the data using ggplot. The problem is that the data is in the wrong order. I would like the order to be: Sun, Mon, ..., Sat.
I have attempted to do this using the following code:
my.order <- c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)
df %>%
ggplot(aes(x = reorder(date.time, my.order), y = Price, group = 1)) +
geom_line()
but I end up getting a strange order that begins at the 'Tue' row of the original data frame. What am I doing wrong?
I would also like to label the x-axis, so I have tried the following code:
df %>%
ggplot(aes(x = reorder(date.time, my.order), y = Price, group = 1)) +
geom_line() +
scale_x_discrete(name = 'Day', breaks = df$date.time[c(1,3,5,7,9,11,13)],
labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
But the labels end up in the order of the original data set, while the plot is ordered beginning on 'Tue' as above. How can I get both the data and labels to appear in the order I would like?
Edit: I think it might have something to do with the levels. Running the following code
df$date.time[c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)]
results in the following output
[1] Sun 00:00 Sun 23:30 Mon 00:00 Mon 23:30 Tue 00:00 Tue 23:30 Wed 00:00 Wed 23:30
[9] Thu 00:00 Thu 23:30 Fri 00:00 Fri 23:30 Sat 00:00 Sat 23:30
14 Levels: Tue 00:00 Tue 23:30 Mon 00:00 Mon 23:30 Wed 00:00 Wed 23:30 ... Sun 23:30
Not sure why.
Your code actually does what you ask it to do in the first part of your problem: respecting the order of your data in df, you assigned position 1 and 2 to the two Tue values, which is why ggplot2 plots them first.
You can see the numbers associated to each element when running the following:
my.order <- c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)
reorder(df$date.time, my.order)
You can use this vector for my.order instead:
my.order <- c(11,12,3,4,13,14,1,2,9,10,5,6,7,8)
df %>%
ggplot(aes(x = reorder(date.time, my.order), y = Price, group = 1)) +
geom_line()
The difference with the method df$date.time[c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)] is that in your first reorder method, you associate a position to each element of your vector (i.e. 1st element has position 7, 2nd element has position 8, etc.) whereas, in the square bracket method you define the order in which elements in your vector come up (i.e. 7th element comes 1st, 8th element comes 2nd, etc.).
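A small sketch of that difference, using a made-up three-element vector rather than the question's data:

```r
x   <- c("Fri", "Mon", "Sat")
pos <- c(3, 1, 2)

# reorder(): pos[i] is the position assigned to x[i]
# ("Fri" gets position 3, "Mon" position 1, "Sat" position 2)
by_position <- reorder(x, pos)
levels(by_position)

# Square brackets: pos[i] says which element of x comes i-th
# (3rd element first, then the 1st, then the 2nd)
by_index <- x[pos]
by_index
```

The first call yields the level order Mon, Sat, Fri, while the second returns Sat, Fri, Mon, so the same numbers lead to different orderings depending on which method reads them.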
You will find that using the square bracket method in your ggplot call won't help as ggplot2 automatically uses the alphabetic order by default, i.e. the order of the data in your dataframe does not matter (the data being strings or factors won't make a difference).
However, if you use factors (which was the default when storing strings with the data.frame() function before R 4.0.0), you can order their levels:
df$date.time <- ordered(df$date.time,
levels = df$date.time[c(7,8,3,4,11,12,13,14,9,10,1,2,5,6)])
# see the new ordered levels
levels(df$date.time)
# visualise as is, ggplot2 uses ordered levels
df %>%
ggplot(aes(x = date.time, y = Price, group = 1)) +
geom_line()
For your labels, as the ordering of levels has not changed the order of your data in your dataframe, you still have to refer to their original position. But if you want your original code to work, you can add a step to reorganise your whole dataframe according to the ordered levels:
library(dplyr)
df <- df %>%
arrange(date.time)
The dplyr::arrange() function will take the ordered levels into account, and your rows are now ordered as expected.
Your original labelling method should then work fine:
df %>%
ggplot(aes(x = date.time, y = Price, group = 1)) +
geom_line() +
scale_x_discrete(name = 'Day', breaks = df$date.time[c(1,3,5,7,9,11,13)],
labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
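The hand-typed index vector can also be derived from the strings themselves, which may be less error-prone. The following is a sketch only: it uses five rows of the question's data, and the helper names days, day, time and ord are introduced here for illustration.

```r
# A few rows of the question's data, deliberately out of order
df <- data.frame(
  date.time = c("Fri 00:00", "Fri 23:30", "Mon 00:00", "Sun 00:00", "Sun 23:30"),
  Price     = c(36.15368, 41.61206, 30.80412, 33.05613, 32.65447),
  stringsAsFactors = FALSE
)

# Desired weekday order
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Split each label into its weekday and time parts
day  <- substr(df$date.time, 1, 3)
time <- substr(df$date.time, 5, 9)

# Sort by weekday position first, then by time within the day
ord <- order(match(day, days), time)

# Use the sorted labels as factor levels; ggplot2 follows the level order
df$date.time <- factor(df$date.time, levels = df$date.time[ord])
levels(df$date.time)
```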
To get Sunday to appear first do this:
df$date.time <- reorder(df$date.time, my.order)
df %>%
ggplot(aes(x = as.character(date.time), y = Price, group = 1)) +
geom_line()
No idea why, but making it a character sorts out the re-order issue.
EDIT: with as.character() it looks like the labels work as well:
df %>%
ggplot(aes(x = as.character(date.time), y = Price, group = 1)) +
geom_line() +
scale_x_discrete(name = 'Day', breaks = df$date.time[c(1,3,5,7,9,11,13)],
labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

Scale the x-axes with quarterly date format

I created a plot in R using the ggplot library:
library(ggplot2)
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = variable), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800"))
I got the plot that I want; the only problem is that the values of the variable yQ have the format:
1990Q1
1990Q2
1990Q3
1990Q4
......
......
2017Q1
2017Q2
2017Q3
2017Q4
and because there are many years, the x-axis labels cannot show all the dates clearly (they overlap).
Therefore, I want the x-axis label to show only Q1 and Q3 for every 5 years.
So I want the x-axis to be something like this:
1990Q1 1990Q3 1995Q1 1995Q3 ...... 2015Q1 2015Q3
I tried to use scale_x_date but my dates are not in date format (e.g. 1990Q1) and therefore this does not work. How can I fix it?
The question does not provide reproducible input but using df from the Note below with the autoplot.zoo method of ggplot's autoplot generic we can write:
library(ggplot2)
library(zoo)
z <- read.zoo(df, index = "yQ", FUN = as.yearqtr)
autoplot(z) + scale_x_yearqtr()
Note
Test input--
df <- data.frame(yQ = c("1990Q1", "1990Q2", "1990Q3", "1990Q4"), value = 1:4)
The zoo::format.yearqtr() function is quite easy to use with ggplot2.
Try
scale_x_date(labels = function(x) zoo::format.yearqtr(x, "%YQ%q"))
Use function zoo::as.yearqtr (zoo package) to work with quarterly dates.
Generate example data:
year <- 1990:2000
quar <- paste0("Q", 1:4)
foo <- as.vector(outer(year, quar, paste0))
data <- data.frame(dateQ = foo, Y = rnorm(length(foo)))
head(data)
dateQ Y
1 1990Q1 -0.09944705
2 1991Q1 0.14493910
3 1992Q1 0.54856787
4 1993Q1 1.12966224
5 1994Q1 -0.93539302
6 1995Q1 0.24772265
Transform quarterly date to "normal" date:
data$dateNorm <- as.Date(zoo::as.yearqtr(data$dateQ))
head(data)
dateQ Y dateNorm
1 1990Q1 -0.09944705 1990-01-01
2 1991Q1 0.14493910 1991-01-01
3 1992Q1 0.54856787 1992-01-01
4 1993Q1 1.12966224 1993-01-01
5 1994Q1 -0.93539302 1994-01-01
6 1995Q1 0.24772265 1995-01-01
It sets Q1/2/3/4 as the first day of January/April/July/October.
data[grep("1991", data$dateQ), ]
dateQ Y dateNorm
2 1991Q1 0.1449391 1991-01-01
13 1991Q2 1.5878678 1991-04-01
24 1991Q3 -0.1071823 1991-07-01
35 1991Q4 2.2905729 1991-10-01
Now you can plot it or perform other calculations as it's in Date format.
library(ggplot2)
ggplot(data, aes(dateNorm, Y)) +
geom_line()
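Building on the Date conversion, the labels asked for in the question (Q1 and Q3 of every fifth year) could be set up roughly as follows. This is a sketch with made-up data; the names quarterly, q_breaks and q_labels are introduced here and are not part of the original answer.

```r
library(zoo)
library(ggplot2)

# Made-up quarterly data, 1990Q1 to 2017Q4, already converted to Date
set.seed(42)
qtr <- as.yearqtr(1990 + seq(0, 111) / 4)
quarterly <- data.frame(dateNorm = as.Date(qtr), Y = rnorm(length(qtr)))

# Q1 and Q3 of every fifth year, as first-of-quarter dates
years <- seq(1990, 2015, by = 5)
q_breaks <- as.Date(as.yearqtr(sort(c(years, years + 0.5))))

# Turn each break back into a label such as "1990Q1"
q_labels <- function(x) format(as.yearqtr(x), "%YQ%q")

p <- ggplot(quarterly, aes(dateNorm, Y)) +
  geom_line() +
  scale_x_date(breaks = q_breaks, labels = q_labels)
```

Printing p draws the plot. Keeping the x variable as a Date means the spacing between points stays proportional to time, and only the break positions and label text are customised.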
You can:
- manipulate x-axis breaks and labels with scale_x_discrete(breaks = ..., labels = ...)
- change the angle of text with theme(axis.text.x = element_text(angle = ...))
I generated some data
Combs <- expand.grid(1990:2017, c("Q1", "Q2", "Q3", "Q4"))
df <- data.frame(
yQ = sort(apply(Combs, 1, paste, collapse="")),
value = runif(112)
)
In the first example, I subset yQ values you want with a logical vector - and change the angle of text
library(ggplot2)
pattern <- c(T, F, T, F, rep(F, 16))
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ[pattern], labels = df$yQ[pattern]) +
theme(axis.text.x = element_text(angle=90))
But notice that tick marks not listed in breaks are not shown, so the alternative is to copy the yQ values into a vector and set the labels you don't want to "":
xVec <- as.character(df$yQ)
xVec[pattern==F] <- ""
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ, labels = xVec) +
theme(axis.text.x = element_text(angle=90))

Reordering month results in the x axis (ggplot)

I'd like to produce a plot with reordered months on the x axis (instead of starting in Jan and ending in Dec, I'd like to start on Apr and end on Mar).
My data is something like:
Month An Fiscal.Year Month.Number Month.Name
1 2009-04-01 40488474 2009 4 Apr
2 2009-05-01 53071971 2009 5 May
3 2009-06-01 24063572 2009 6 Jun
...
44 2012-11-01 39457771 2012 11 Nov
45 2012-12-01 44045572 2012 12 Dec
46 2013-01-01 90734077 2012 1 Jan
My code for producing the plot is:
g <- ggplot(data = data, aes(x = Month.Number, y = An)) +
geom_line(aes(group = Fiscal.Year, colour = factor(Fiscal.Year))) +
scale_x_discrete(
name = "Month",
breaks = data$Month.Number,
labels = data$Month.Name
) +
scale_y_continuous();
but the result is a plot ordered by month from Jan to Dec, not from Apr to Mar as I want.
I've tried the limits option inside scale_x_discrete, but I think this just reorders the x axis labels, not the real data.
Could you please help me?
Thanks in advance for your answer!
You have to reorder the factor levels of Month.Name. Assuming df is your data.frame:
df$Month.Name <- factor( df$Month.Name, levels = c( "Apr", "May", ..., "Feb", "Mar" ) )
g <- ggplot(data = df, aes(x = Month.Name, y = An) ) +
geom_line(aes(group = Fiscal.Year, colour = factor(Fiscal.Year))) +
scale_x_discrete( name = "Month" ) +
scale_y_continuous();
Alternatively, you can just change Month.Number so that Apr is 1, May is 2, and so on.
Just run before plotting:
data$Month.Number <- ((data$Month.Number+8) %% 12) + 1
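To check what that arithmetic does, it can be applied to the numbers 1 to 12 on their own. The sketch below uses R's built-in month.abb for the month names; fiscal_position and fiscal_levels are names introduced here for illustration.

```r
month_number <- 1:12

# Shift so that April (4) becomes 1 and March (3) becomes 12
fiscal_position <- ((month_number + 8) %% 12) + 1
setNames(fiscal_position, month.abb)

# The same ordering expressed as month names, April first
fiscal_levels <- month.abb[order(fiscal_position)]
fiscal_levels

# These names can serve as factor levels for an April-to-March axis
example <- factor(c("Jan", "Apr", "Dec"), levels = fiscal_levels)
sort(example)
```

Both approaches lead to the same April-to-March order. The factor version keeps the month names on the axis without a separate labels argument.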
