ggplot line chart does not show data correctly - r

I am trying to be as specific as possible.
The data I am working with looks like:
dates bsheet mro ciss
1 2008 Oct 490509 3.751000 0.8579982
2 2008 Nov 513787 3.434333 0.9153926
3 2008 Dec 570591 2.718742 0.9145012
4 2009 Jan 534985 2.323581 0.8811410
5 2009 Feb 528390 2.001000 0.8551557
6 2009 Mar 551730 1.662290 0.8286146
7 2009 Apr 514041 1.309333 0.7460113
8 2009 May 486151 1.097774 0.5925725
9 2009 Jun 484629 1.001000 0.5412631
10 2009 Jul 454379 1.001000 0.5398128
11 2009 Aug 458111 1.001000 0.3946989
12 2009 Sep 479956 1.001000 0.2232348
13 2009 Oct 448080 1.001000 0.2961637
14 2009 Nov 427756 1.001000 0.3871220
15 2009 Dec 448548 1.001000 0.3209175
and can be produced via
structure(list(dates = c("2008 Oct", "2008 Nov", "2008 Dec",
"2009 Jan", "2009 Feb", "2009 Mar", "2009 Apr", "2009 May", "2009 Jun",
"2009 Jul", "2009 Aug", "2009 Sep", "2009 Oct", "2009 Nov", "2009 Dec"
), bsheet = c(490509, 513787, 570591, 534985, 528390, 551730,
514041, 486151, 484629, 454379, 458111, 479956, 448080, 427756,
448548), mro = c(3.751, 3.43433333333333, 2.71874193548387, 2.32358064516129,
2.001, 1.66229032258065, 1.30933333333333, 1.09777419354839,
1.001, 1.001, 1.001, 1.001, 1.001, 1.001, 1.001), ciss = c(0.857998173913043,
0.9153926, 0.914501173913044, 0.881140954545454, 0.85515565,
0.828614636363636, 0.746011318181818, 0.592572476190476, 0.541263136363636,
0.539812782608696, 0.394698857142857, 0.223234772727273, 0.296163727272727,
0.387122047619048, 0.32091752173913)), row.names = c(NA, 15L), class = "data.frame")
The line chart I created using the following code
ciss_plot = ggplot(data = example) +
geom_line(aes(x = dates, y = ciss, group = 1)) +
labs(x = 'Time', y = 'CISS') +
scale_x_discrete(breaks = dates_breaks, labels = dates_labels) +
scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)), expand = c(0, 0)) +
theme_bw() +
theme(axis.text.x = element_text(hjust = c(rep(0.5, 11), 0.8, 0.2)))
ciss_plot
for ggplot2 looks like:
whereas plotting the same data with R's built-in plot() function via
plot(example$ciss, type = 'l')
results in
which obviously is NOT identical!
Could someone please help me out? These plots have already taken me forever and I cannot figure out where the problem is. I suspect something is wrong either with "group = 1" or with the data type of the example$dates column!
I am thankful for any constructive input!!
Thank you all in advance!
Manuel

Your date column is in character format. This means that ggplot will by default convert it to a factor and arrange it in alphabetical order, which is why the plot appears in a different shape. One way to fix this is to ensure you have the levels in the correct order before plotting, like this:
library(dplyr)
library(ggplot2)
dates_breaks <- as.character(example$dates)
ggplot(data = example %>% mutate(dates = factor(dates, levels = dates))) +
geom_line(aes(x = dates, y = ciss, group = 1)) +
labs(x = 'Time', y = 'CISS') +
scale_x_discrete(breaks = dates_breaks, labels = dates_breaks,
guide = guide_axis(n.dodge = 2)) +
scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)),
expand = c(0, 0)) +
theme_bw()
A smarter way would be to convert the date column to actual date times, which allows greater freedom of plotting and prevents you having to use a grouping variable at all:
example <- example %>%
mutate(dates = as.POSIXct(strptime(paste(dates, "01"), "%Y %b %d")))
ggplot(example) +
geom_line(aes(x = dates, y = ciss)) +
labs(x = 'Time', y = 'CISS') +
scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)),
expand = c(0, 0)) +
scale_x_datetime(breaks = seq(min(example$dates), max(example$dates), "year"),
labels = function(x) strftime(x, "%Y\n%b")) +
theme_bw() +
theme(panel.grid.minor.x = element_blank())
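To see the alphabetical reordering for yourself, here is a minimal sketch using only the first five labels from the question's data. The variable name dates is just for illustration.

```r
# First five labels from the question, in chronological order
dates <- c("2008 Oct", "2008 Nov", "2008 Dec", "2009 Jan", "2009 Feb")

# A factor built without explicit levels sorts its levels alphabetically,
# which is the order a discrete ggplot axis would use
default_levels <- levels(factor(dates))
print(default_levels)

# Passing the labels themselves as levels keeps the original (chronological) order
ordered_levels <- levels(factor(dates, levels = dates))
print(ordered_levels)
```

The first print shows December ahead of November and October within 2008, which is the scrambling seen in the ggplot version of the chart.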

Related

Converting month_year variable into week_year (dplyr) & (lubridate)

I have a dataset structured as follows, where I am tracking collective action mentions by subreddit by month, relative to a policy treatment that was introduced on Feb 17th, 2012. As a result, the period "Feb 2012" appears twice in my dataset: "pre" refers to the Feb 2012 days before treatment, and "post" to the days after.
treatment_status month_year collective_action_percentage
pre Dec 2011 5%
pre Jan 2012 8%
pre Feb 2012 10%
post Feb 2012 3%
post March 2012 10%
I am not sure how best to visualize this indicator by month. I made the following graph, but I was wondering whether presenting this variable on a week-and-year basis, rather than month-and-year, would be clearer, given that I am interested in showing how collective action mentions decline after treatment.
ggplot(data = df1, aes(x = as.Date(month_year), fill = collective_action_percentage ,y = collective_action_percentage)) +
geom_bar(stat = "identity", position=position_dodge()) +
scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
xlab("Criticism by individuals active before and after treatment") +
theme_classic()+
theme(plot.title = element_text(size = 10, face = "bold"),
axis.text.x = element_text(angle = 90, vjust = 0.5))
output:
I created the month_year variable as follows using the zoo package:
df<- df %>%
mutate(month_year = zoo::as.yearmon(date))
Finally, I tried aggregating the data on a weekly basis as follows. However, since I have multiple years in my dataset, I ideally want to aggregate by week and year, not simply by week:
df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))
Plot a point for each row and connect them with a line so that it is clear what the order is. We also color the pre and post points differently and make treatment status a factor so that we can order the pre level before the post level.
library(ggplot2)
library(zoo)
df2 <- transform(df1, month_year = as.yearmon(month_year, "%b %Y"),
treatment_status = factor(treatment_status, c("pre", "post")))
ggplot(df2, aes(month_year, collective_action_percentage)) +
geom_point(aes(col = treatment_status), cex = 4) +
geom_line()
Note
We assume df1 is as follows. The % signs have already been removed.
df1 <-
structure(list(treatment_status = c("pre", "pre", "pre", "post",
"post"), month_year = c("Dec 2011", "Jan 2012", "Feb 2012", "Feb 2012",
"March 2012"), collective_action_percentage = c(5L, 8L, 10L,
3L, 10L)), class = "data.frame", row.names = c(NA, -5L))
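For the week-and-year aggregation asked about in the question, one possible sketch is to group by the ISO year together with the ISO week, so that the same week number in different years is not pooled. The data below is made up. The column names time and values are taken from the question's snippet, and everything else is an assumption.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily data spanning a year boundary
set.seed(1)
df2 <- tibble(
  time   = seq(as.Date("2011-12-01"), as.Date("2012-03-31"), by = "day"),
  values = runif(length(time))
)

# Group by ISO year AND ISO week, then average within each group
weekly <- df2 %>%
  group_by(year = isoyear(time), week = isoweek(time)) %>%
  summarise(value = mean(values), .groups = "drop")

print(weekly)
```

isoyear() is used rather than year() because the ISO week that contains 1 January can belong to the previous ISO year. Here 2012-01-01 falls in week 52 of 2011.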

How to plot Quarterly and Year-to-Date values in ggplot?

Raw data
structure(list(attainment_target = c(7.5, 15), quarter_2022 = c("Q1",
"Q2"), total_attainment = c(2, 4), percent_attainment = c(0.2666,
0.2666)), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
Quarter    | Target | Attainment
2022-01-01 | 7.5    | 2
2022-04-01 | 15     | 4
Scenario
I would like to plot a ggplot (geom_col or geom_bar) with Quarter as x-axis and Attainment as y-axis with Target as a horizontal dash line that shows how far off I am from that value.
However, I am having trouble plotting YTD (total attainment accumulated over the quarters so far) in the same plot. Below is the output I am after, with a new field that shows the calculated YTD value:
Desired output
Quarter    | Target | Attainment | YTD | % Attainment
2022-01-01 | 7.5    | 2          | 2   | 27
2022-04-01 | 15     | 4          | 6   | 40
What is the best way to plot this with ggplot in R? Here is my current approach, but I am having trouble incorporating all of the above:
df1 <- df %>%
mutate(YTD_TOTAL = sum(total_attainment)) %>%
mutate(YTD_PERCENT_ATTAINMENT = sum(total_attainment) / max(attainment_target))
ggplot(data = df1, aes(fill=quarter_2022, x=attainment_target, y=total_attainment, color = quarter_2022, palette = "Paired",
label = TRUE,
position = position_dodge(0.9)))
Not sure exactly what you have in mind but here are some of the pieces you might want to use:
library(dplyr)
library(ggplot2)
df %>%
mutate(YTD_TOTAL = cumsum(total_attainment)) %>%
mutate(YTD_PERCENT_ATTAINMENT = YTD_TOTAL/ attainment_target) %>%
ggplot(aes(quarter_2022, total_attainment)) +
geom_col(aes(y = YTD_TOTAL), fill = NA, color = "gray20") +
geom_text(aes(y = YTD_TOTAL, label = scales::percent(YTD_PERCENT_ATTAINMENT)),
vjust = -0.5) +
geom_col(fill = "gray70", color = "gray20") +
geom_text(aes(label = total_attainment),
position = position_stack(vjust = 0.5)) +
geom_segment(aes(x = as.numeric(as.factor(quarter_2022)) - 0.4,
xend = as.numeric(as.factor(quarter_2022)) + 0.4,
y = attainment_target, yend = attainment_target),
linetype = "dashed")
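As a quick check of the arithmetic behind the YTD columns, here is a small base R sketch. It rebuilds the two-row data from the question (the percent_attainment column is omitted) and compares cumsum() with the sum() used in the question's attempt.

```r
# Data from the question, reduced to the columns needed here
df <- data.frame(
  attainment_target = c(7.5, 15),
  quarter_2022      = c("Q1", "Q2"),
  total_attainment  = c(2, 4)
)

# sum() returns one grand total, repeated on every row
grand_total <- rep(sum(df$total_attainment), nrow(df))

# cumsum() returns the running (year-to-date) total per quarter
df$YTD_TOTAL <- cumsum(df$total_attainment)
df$YTD_PERCENT_ATTAINMENT <- df$YTD_TOTAL / df$attainment_target

print(grand_total)
print(df)
print(round(100 * df$YTD_PERCENT_ATTAINMENT))
```

The running totals are 2 and 6, and the rounded percentages are 27 and 40, matching the desired output table in the question.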

Color the current data points differently in the scatter plot

My data tables can contain either daily data or weekly data from the year 2020. I create the scatter plot with the regression line as follows:
##########
## DAY: ##
##########
dt.day <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
DE = rnorm(365, 4, 1), Austria = rnorm(365, 10, 2),
Czechia = rnorm(365, 1, 2), check.names = FALSE)
## Linear regression: ##
regLine <- lm(DE ~ Austria, data = dt.day)
## PLOT: ##
p <- ggplot(data = dt.day, aes(x = Austria, y = DE,
text = paste("Date: ", date, '\n',
"Austria: ", Austria, "GWh/h", '\n',
"DE: ", DE, "\u20ac/MWh"),
group = 1)
) +
geom_point(color = "#419F44") +
geom_smooth(method = "lm", se = FALSE, color = "#007d3c") +
theme_classic() +
theme(legend.position = "none") +
theme(panel.background = element_blank()) +
xlab("Austria") +
ylab("DE")
# Correlation plot converting from ggplot to plotly: #
AUSTRIA <- plotly::ggplotly(p, tooltip = "text")
###########
## WEEK: ##
###########
dt.week <- data.table(date = seq(as.Date('2020-01-01'), by = '7 day', length.out = 53),
Germany = rnorm(53, 4, 1), Austria = rnorm(53, 10, 2),
Czechia = rnorm(53, 1, 2), check.names = FALSE)
The plot for the daily data looks like this:
I would like to plot the last 10 days of my data table in a different color ("#F07D00"), regardless of how much data the table contains (for example, it might only hold daily data from January to April).
The plot for the weekly data table is plotted analogously. I would like to color differently the last 4 weeks.
I also have another question:
If I had a data table with hourly entries (i.e. 24 per day), how would I color the points for the last 2 days? In that case the first column (date) has class "POSIXct" "POSIXt".
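You could use library(lubridate). For the last 10 days, compare against the latest date in the data rather than now() (the data is from 2020, so nothing would fall within 10 days of today), and add scale_color_identity() so that the hex codes are used as literal colors:
geom_point(aes(color = ifelse(date > max(date) - days(10), "#F07D00", "#007d3c"))) +
scale_color_identity()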
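The same idea can be sketched for daily and hourly data. The data below is made up and only loosely mirrors the question's dt.day. The helper flag_recent() and the recent column are hypothetical names introduced for illustration, not part of ggplot2 or lubridate.

```r
library(ggplot2)

set.seed(1)
# Hypothetical daily data covering only January to April 2020
dt.day <- data.frame(
  date    = seq(as.Date("2020-01-01"), by = "1 day", length.out = 120),
  DE      = rnorm(120, 4, 1),
  Austria = rnorm(120, 10, 2)
)

# Hypothetical helper: TRUE for rows within `n` units of the latest date.
# Subtracting a number from a Date counts days; from a POSIXct it counts seconds.
flag_recent <- function(date, n) date > max(date) - n

dt.day$recent <- flag_recent(dt.day$date, 10)   # last 10 days

p <- ggplot(dt.day, aes(x = Austria, y = DE)) +
  geom_point(aes(color = recent)) +
  scale_color_manual(values = c(`FALSE` = "#419F44", `TRUE` = "#F07D00"),
                     guide = "none") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "#007d3c") +
  theme_classic()

# Hourly POSIXct data: the last 2 days are 2 * 24 * 3600 seconds
dt.hour <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
             by = "1 hour", length.out = 24 * 30)
)
dt.hour$recent <- flag_recent(dt.hour$date, 2 * 24 * 3600)
```

For weekly Date data the same helper with n = 28 flags the last four weekly rows. Mapping a logical column and setting the colors in scale_color_manual() is an alternative to the ifelse() plus scale_color_identity() approach above.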

Changing Date Labels From Odd to Even Years

I want to make a seemingly trivial adjustment to the chart pictured below:
I would like the labels along the x-axis to be even years, rather than odd years. So instead of going from 2009 -> 2011 -> 2013, they should go from 2008 -> 2010 -> 2012, and so forth...
How do I go about doing this?
Here is the code:
germany_yields <- read.csv(file = "Germany 10-Year Yield Weekly (2007-2020).csv", stringsAsFactors = F)
italy_yields <- read.csv(file = "Italy 10-Year Yield Weekly (2007-2020).csv", stringsAsFactors = F)
germany_yields <- germany_yields[, -(3:6)]
italy_yields <- italy_yields[, -(3:6)]
colnames(germany_yields)[1] <- "Date"
colnames(germany_yields)[2] <- "Germany.Yield"
colnames(italy_yields)[1] <- "Date"
colnames(italy_yields)[2] <- "Italy.Yield"
combined <- join(germany_yields, italy_yields, by = "Date")
combined <- na.omit(combined)
combined$Date <- as.Date(combined$Date,format = "%B %d, %Y")
combined["Spread"] <- combined$Italy.Yield - combined$Germany.Yield
fl_dates <- c(tail(combined$Date, n=1), head(combined$Date, n=1))
ggplot(data=combined, aes(x = Date, y = Spread)) + geom_line() +
scale_x_date(limits = fl_dates,
expand = c(0, 0),
date_breaks = "2 years",
date_labels = "%Y")
A (not very elegant) way would be to put these arguments in your scale_x_date(), with lubridate loaded:
scale_x_date(date_labels = "%Y",
breaks = ymd(unique(year(combined$Date)[year(combined$Date) %% 2 == 0]), truncated = 2L))
(We define the breaks manually, by taking the years of all dates in combined$Date and keeping only the even ones.)
That's actually fairly simple. Just set the lower limit to a date in an even year, and set the upper limit to NA. As you haven't provided a reproducible example, here is an example on some fake data.
library(tidyverse)
mydates <- seq(as.Date("2007/1/1"), by = "3 months", length.out =100)
df <- tibble(
myvalue = rnorm(length(mydates))
)
# without limits argument
ggplot(df ) +
aes(x = mydates, y = myvalue) +
geom_line(size = 1L, colour = "#0c4c8a") +
scale_x_date(date_breaks = "2 years",
date_labels = "%Y")
# with limits argument
ggplot(df ) +
aes(x = mydates, y = myvalue) +
geom_line(size = 1L, colour = "#0c4c8a") +
scale_x_date(date_breaks = "2 years",
date_labels = "%Y",
limits = c(as.Date("2006/1/1"), NA))
Created on 2020-04-29 by the reprex package (v0.3.0)
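If you would rather not rely on how the limits interact with date_breaks, another option is to compute the even-year breaks explicitly and pass them via breaks. This is a minimal sketch. even_year_breaks() is a hypothetical helper, not a ggplot2 or scales function, and it assumes the date range covers at least one even year.

```r
# Hypothetical helper: 1 January of every even year between the
# first and last year found in `dates`
even_year_breaks <- function(dates) {
  yrs   <- as.integer(format(range(dates, na.rm = TRUE), "%Y"))
  first <- yrs[1] + yrs[1] %% 2   # round an odd start year up
  last  <- yrs[2] - yrs[2] %% 2   # round an odd end year down
  as.Date(paste0(seq(first, last, by = 2), "-01-01"))
}

b <- even_year_breaks(as.Date(c("2007-03-01", "2020-04-29")))
print(format(b, "%Y"))
```

With the question's data this could be used as scale_x_date(breaks = even_year_breaks(combined$Date), date_labels = "%Y").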

How plot timing graph with specific options

I have this data.table, which has 3 columns: the first is MonthlySalesMean, the second is the year and the third is the month.
> data[,MonthlySalesMean:=mean(StoreMean),by=c("DateMonth","DateYear")][,c("MonthlySalesMean","DateYear","DateMonth")]
MonthlySalesMean DateYear DateMonth
1: 6839.340 2015 7
2: 6839.340 2015 7
3: 6839.340 2015 7
4: 6839.340 2015 7
5: 6839.340 2015 7
---
641938: 6852.171 2013 1
641939: 6852.171 2013 1
641940: 6852.171 2013 1
641941: 6852.171 2013 1
641942: 6852.171 2013 1
I need to plot a graph of three lines because I have 3 years:
> unique(data[,DateYear])
[1] 2015 2014 2013
>
Each line should represent one year and show the MonthlySalesMean values across all months of that year. In other words, it should look like this graph:
How can I do this, please?
Thank you in advance!
Without a reproducible example, I can't test with your data, but here's the idea. You plot a path, with aesthetics of sales (y) against month (x) grouped by year (color)
library(tidyverse)
example_data <- tibble(
MonthlySalesMean = rnorm(36, 100, 20),
DateYear = c(rep(2013, 12), rep(2014, 12), rep(2015, 12)),
DateMonth = c(1:12, 1:12, 1:12)
)
ggplot(example_data, aes(x = DateMonth, y = MonthlySalesMean, color = as.factor(DateYear))) +
geom_path() +
geom_point(size = 2) +
geom_text(aes(label = DateYear),
data = filter(example_data, DateMonth == 1),
nudge_x = -0.5) + # plot year numbers
scale_x_continuous(breaks = 1:12, labels = month.abb) +
scale_colour_manual(guide = "none", # hides legend
values = c("red", "green", "blue")) + # custom colors
expand_limits(x = 0.5) + # adds a space before January
labs(x = "Month", y = "Sales") +
theme_bw() +
theme(panel.grid = element_blank()) # removes gridlines
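The table in the question repeats the monthly mean on every row, so it may help to collapse it to one row per year and month before plotting. Here is a sketch with data.table on made-up data. The column names StoreMean, DateYear and DateMonth come from the question, while the values and the name monthly are assumptions.

```r
library(data.table)

set.seed(42)
# Hypothetical stand-in for the question's table: 10 rows per month and year
data <- data.table(
  DateYear  = rep(2013:2015, each = 120),
  DateMonth = rep(rep(1:12, each = 10), times = 3),
  StoreMean = rnorm(360, 6800, 500)
)

# One row per (year, month) instead of a repeated value on every row
monthly <- data[, .(MonthlySalesMean = mean(StoreMean)),
                by = .(DateYear, DateMonth)]
setorder(monthly, DateYear, DateMonth)

print(monthly)
```

The resulting 36-row table has the same shape as example_data above and can be passed to the same ggplot() call.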
