Removing unused months on ggplot - r

I have a dataframe that only lists months from October to April. When I plot this data on a line graph, it includes the unused months as well. I would only like to show the months that are listed in the data so there is no unused space on the plot.
The code I am using for the plot
ggplot(df,aes(GAME_DATE,DEF_RATING)) +
geom_line(aes(y=rollmean(df$DEF_RATING,9, na.pad = TRUE))) +
geom_line(aes(y=rollmean(df$OFF_RATING,9,na.pad = TRUE)),color='steelblue')
Sample data
GAME_DATE OFF_RATING DEF_RATING
<dttm> <dbl> <dbl>
1 2017-04-12 106.1 113.1
2 2017-04-10 107.1 100.8
3 2017-04-08 104.4 105.1
4 2017-04-06 116.1 105.9
5 2017-04-04 116.9 116.0

ggplot2 doesn't allow broken axes because such axes can be misleading. However, if you still want to proceed with this, you can simulate a broken axis with faceting. To do this, create a grouping variable to mark each "island" where data is present with a unique group code and then facet by those group codes.
Also, the data should be converted to long format before plotting, so that you can get two separate colored lines with a single call to geom_line. Mapping a column to color inside aes will also automatically generate a legend.
Here's an example with fake data:
library(tidyverse)
# Fake data
set.seed(2)
dat = data.frame(x=1950:2000,
y1=c(cumsum(rnorm(30)), rep(NA,10), cumsum(rnorm(11))),
y2=c(cumsum(rnorm(30)), rep(NA,10), cumsum(rnorm(11))))
dat %>%
# Convert to long format
gather(key, value, y1:y2) %>%
# Add the grouping variable
group_by(key) %>%
mutate(group=c(0, cumsum(diff(as.integer(is.na(value)))!=0))) %>%
# Remove missing values
filter(!is.na(value)) %>%
ggplot(aes(x, value, colour=key)) +
geom_line() +
scale_x_continuous(breaks=seq(1950,2000,10), expand=c(0,0.1)) +
facet_grid(. ~ group, scales="free_x", space="free_x") +
theme(strip.background=element_blank(),
strip.text=element_blank())
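To see what the grouping step in the pipeline above computes, here is a minimal base R sketch on a made-up vector (the values are purely illustrative). The group code increments every time the series switches between missing and non-missing, so each run of present data keeps its own code once the NA rows are dropped.

```r
# Made-up series with two gaps (illustrative values only)
value <- c(1.2, 0.8, NA, NA, 2.1, 2.5, NA, 3.0)

# is.na() flips at every boundary between data and gap;
# diff() is non-zero at those flips, and cumsum() counts them
group <- c(0, cumsum(diff(as.integer(is.na(value))) != 0))

islands <- data.frame(value, group)
# Drop the gaps: each remaining "island" keeps a distinct group code
islands <- islands[!is.na(islands$value), ]
print(islands)
```

The codes of the gaps themselves are skipped after filtering, which is harmless because facet_grid only needs the codes to differ between islands.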

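You can try delimiting the x-axis with scale_x_date() so that breaks are drawn only across the dates present in your data. Note that scale_x_date() expects a Date column; GAME_DATE is shown as <dttm> in the sample, so convert it with as.Date() first or use scale_x_datetime() instead:
ggplot(df,aes(GAME_DATE,DEF_RATING)) +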
geom_line(aes(y=rollmean(df$DEF_RATING,9, na.pad = TRUE))) +
geom_line(aes(y=rollmean(df$OFF_RATING,9,na.pad = TRUE)),color='steelblue') +
scale_x_date(date_labels="%b",breaks=seq(min(df$GAME_DATE),max(df$GAME_DATE), "1 month"))
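As a rough sketch of the same idea for a date-time column, the version below uses scale_x_datetime() with monthly breaks. The data frame df_demo and its ratings are made up for illustration (they are not the asker's data), and the rolling mean is left out to keep the example short.

```r
library(ggplot2)

# Made-up weekly ratings over one season (illustrative values only)
game_dates <- seq(as.POSIXct("2016-10-26", tz = "UTC"),
                  as.POSIXct("2017-04-12", tz = "UTC"), by = "week")
set.seed(1)
df_demo <- data.frame(GAME_DATE = game_dates,
                      DEF_RATING = rnorm(length(game_dates), 105, 5))

# One break per month between the first and last game
month_breaks <- seq(min(df_demo$GAME_DATE), max(df_demo$GAME_DATE),
                    by = "1 month")

# GAME_DATE is POSIXct here, so the datetime scale is the matching one
p <- ggplot(df_demo, aes(GAME_DATE, DEF_RATING)) +
  geom_line() +
  scale_x_datetime(breaks = month_breaks, date_labels = "%b")
```

This only controls where the breaks and labels fall; it does not by itself remove empty stretches between seasons.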

Related

How to have separate columns for duplicate x-axis values in geom_col()?

I have a dataframe as below (very simple structure) and I want to draw a column chart to show the amount for each date. The issue is that the date has duplicate entries (e.g., 2020-01-15).
# A tibble: 5 x 2
date amount
<date> <dbl>
1 2020-01-02 4000
2 2020-01-06 2568.
3 2020-01-15 2615.
4 2020-01-15 2565
5 2020-01-16 2640
When I try doing the following it somehow groups the similar dates together and draws a stacked column chart which is NOT what I want.
df %>%
ggplot(aes(x= factor(date), y=amount)) +
geom_col() +
scale_x_discrete( labels = df$date ) #this creates discrete x-axis labels but the values are still stacked. So it just messes things up.
There's no issue if I'm using geom_line() but I want to see a bar for each date. Any idea how to do this?
Try:
df %>%
ggplot(aes(date, amount)) +
geom_col(position = position_dodge2()) +
scale_x_date(breaks = unique(df$date))
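An alternative sketch, in case one evenly spaced bar per row is preferred over dodging on a date axis: give every row its own discrete key and relabel the axis with the dates. The column name row_id is a hypothetical helper introduced here, and the amounts are the rounded values printed in the question.

```r
library(ggplot2)

# Data as printed in the question (amounts rounded as displayed)
df <- data.frame(
  date   = as.Date(c("2020-01-02", "2020-01-06", "2020-01-15",
                     "2020-01-15", "2020-01-16")),
  amount = c(4000, 2568, 2615, 2565, 2640)
)

# Hypothetical helper column: one factor level per row, so duplicate
# dates are never stacked into the same bar
df$row_id <- factor(seq_len(nrow(df)))

p <- ggplot(df, aes(row_id, amount)) +
  geom_col() +
  scale_x_discrete(labels = format(df$date))  # show dates instead of ids
```

The trade-off is that the x axis is no longer a true time axis, so gaps between dates are not drawn to scale.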

ggplot2 - How to plot length of time using geom_bar?

I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph like the one taken from an answer to this question (image not shown). Note that the dates are in Julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=T))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there are a few issues with this plot:
Because the bars are stacked, the values on the x-axis are aggregated and end up too high - out of the 1-365 scale that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such graph?
I don't think this is something you can do with geom_bar or geom_col. A more general approach is to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr) # provides the %>% pipe used below
plotdata <- mydat %>%
dplyr::mutate(Crop = factor(Crop)) %>%
tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to="period") %>%
tidyr::separate(period, c("Type","Event")) %>%
tidyr::pivot_wider(names_from=Event, values_from=value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that we have one row per rectangle we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The part that's a bit messy here is that we want a discrete scale for y, but geom_rect needs numeric values. Since Crop is now a factor, we use the underlying integer codes of its levels to compute the ymin and ymax positions, and then relabel the y axis with the names of the factor levels.
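One thing to watch for in the reshaped data is a period whose Begin is later than its End, such as South Soybean planting (245 to 1), which wraps around the end of the year. A possible way to handle this, sketched below in base R, is to split such rows into two rectangles before plotting. The function split_wrap and the 365-day year end are assumptions of this sketch, not part of the original answer.

```r
# Two rows from the reshaped data shown above (South, Soybean)
rects <- data.frame(
  Region = c("South", "South"),
  Crop   = c("Soybean", "Soybean"),
  Type   = c("Planting", "Harvest"),
  Begin  = c(245, 1),
  End    = c(1, 122)
)

# Hypothetical helper: split periods that wrap past the end of the year
# into one piece up to year_end and one piece starting at day 1
split_wrap <- function(d, year_end = 365) {
  wraps <- d$Begin > d$End
  first <- d[wraps, ]
  first$End <- year_end      # Begin .. end of year
  second <- d[wraps, ]
  second$Begin <- 1          # start of year .. End
  rbind(d[!wraps, ], first, second)
}

rects_split <- split_wrap(rects)
print(rects_split)
```

After the split every row satisfies Begin <= End, so xmin and xmax in geom_rect run left to right.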
If you also wanted to get the month names on the x axis you could do something like
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),by="month")
# then add this to your plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))
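To check which tick positions that produces, the same day-of-year breaks can be computed in base R. This is a small sketch that swaps format() in for the lubridate helpers; the variable names tick_breaks and tick_labels are illustrative.

```r
# First day of each month in 2020 (a leap year, as in the snippet above)
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),
                      by = "month")

# "%j" is the day of the year, the base R counterpart of lubridate::yday()
tick_breaks <- as.integer(format(dateticks, "%j"))

# month.abb holds the English month abbreviations Jan..Dec
tick_labels <- month.abb[as.integer(format(dateticks, "%m"))]

print(data.frame(tick_labels, tick_breaks))
```

These positions line up with the day-of-year values in the sample data (for example 245 for 1 September and 336 for 1 December).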

How do I plot 3 variables separately in ggplot?

I want to create a time series plot showing how two variables have changed over time, with the lines coloured by region.
I have 2 regions, England and Wales and for each I have calculated the total_tax and the total_income.
I want to plot these on a ggplot over the years, using the years variable.
How would I do this and colour the regions separately?
I have the year variable, which I will put on the x axis; I then want to plot both incometax and taxpaid on the graph to show how they have both changed over time.
How would I add a 3rd axis to plot how these two variables have changed over time?
I have tried this code but it has not worked the way I wanted it to do.
ggplot(tax_data, filter %>% aes(x=date)) +
geom_line(aes(y=incometax, color=region)) +
geom_line(aes(y=taxpaid, color=region))+
ggplot is a bit hard to grasp at the beginning. I guess you're trying to achieve something like the following.
Assuming your data has one column each for date, incometax and taxpaid, here is an example:
library(tidyverse)
dataset <- tibble(date = seq(from = as.Date("2015-01-01"), to = as.Date("2019-12-31"), by = "month"),
incometax = rnorm(60, 100, 10),
taxpaid = rnorm(60, 60, 5))
Now, to plot a line for each of incometax and taxpaid, we need to reshape or "tidy" the data:
dataset <- dataset %>% pivot_longer(cols = c(incometax, taxpaid))
Now you have three columns like this - we've turned the former column names into the variable name:
# A tibble: 6 x 3
date name value
<date> <chr> <dbl>
1 2015-01-01 incometax 106.
2 2015-01-01 taxpaid 56.9
3 2015-02-01 incometax 112.
4 2015-02-01 taxpaid 65.0
5 2015-03-01 incometax 95.8
6 2015-03-01 taxpaid 64.6
This now has the right format for ggplot, and you can map name to the colour of the lines:
ggplot(dataset, aes(x = date, y = value, colour = name)) + geom_line()
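The question also asks about colouring by region. One way this might be done, assuming the data has a region column alongside the two measures, is to map region to colour and the measure to line type. The data frame tax_demo and all of its values below are made up for illustration.

```r
library(tidyr)
library(ggplot2)

# Made-up yearly figures for two regions (illustrative values only)
set.seed(42)
tax_demo <- data.frame(
  year      = rep(2015:2019, times = 2),
  region    = rep(c("England", "Wales"), each = 5),
  incometax = rnorm(10, 100, 10),
  taxpaid   = rnorm(10, 60, 5)
)

# Long format: one row per year, region and measure
tax_long <- pivot_longer(tax_demo, cols = c(incometax, taxpaid),
                         names_to = "measure", values_to = "amount")

# Colour separates the regions, line type separates the two measures
p <- ggplot(tax_long, aes(year, amount, colour = region, linetype = measure)) +
  geom_line()
```

Both measures share one y axis here, so no third axis is needed as long as they are on a comparable scale.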

R - How to create a seasonal plot - Different lines for years

I already asked the same question yesterday, but I didn't get any suggestions, so I decided to delete the old one and ask again with additional information.
So here again:
I have a dataframe like this:
Link to the original dataframe: https://megastore.uni-augsburg.de/get/JVu_V51GvQ/
Date DENI011
1 1993-01-01 9.946
2 1993-01-02 13.663
3 1993-01-03 6.502
4 1993-01-04 6.031
5 1993-01-05 15.241
6 1993-01-06 6.561
....
....
6569 2010-12-26 44.113
6570 2010-12-27 34.764
6571 2010-12-28 51.659
6572 2010-12-29 28.259
6573 2010-12-30 19.512
6574 2010-12-31 30.231
I want to create a plot that enables me to compare the monthly values in the DENI011 over the years. So I want to have something like this:
http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Seasonal%20Plot
Jan-Dec on the x-scale, values on the y-scale and the years displayed by different colored lines.
I found several similar questions here, but nothing works for me. I tried to follow the instructions on the website with the example, but the problem is that I can't create a ts-object.
Then I tried it this way:
Ref_Data$MonthN <- as.numeric(format(as.Date(Ref_Data$Date),"%m")) # Month's number
Ref_Data$YearN <- as.numeric(format(as.Date(Ref_Data$Date),"%Y"))
Ref_Data$Month <- months(as.Date(Ref_Data$Date), abbreviate=TRUE) # Month's abbr.
g <- ggplot(data = Ref_Data, aes(x = MonthN, y = DENI011, group = YearN, colour=YearN)) +
geom_line() +
scale_x_discrete(breaks = Ref_Data$MonthN, labels = Ref_Data$Month)
That also didn't work; the plot looks horrible. I don't need to put all the years from 1993-2010 in one plot. Actually only a few years would be OK, maybe 1998-2006.
Any suggestions on how to solve this?
As others have noted, in order to create a plot such as the one you used as an example, you'll have to aggregate your data first. However, it's also possible to retain daily data in a similar plot.
reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2018-02-11
library(tidyverse)
library(lubridate)
# Import the data
url <- "https://megastore.uni-augsburg.de/get/JVu_V51GvQ/"
raw <- read.table(url, stringsAsFactors = FALSE)
# Parse the dates, and use lower case names
df <- as_tibble(raw) %>%
rename_all(tolower) %>%
mutate(date = ymd(date))
One trick to achieve this would be to set the year component in your date variable to a constant, effectively collapsing the dates to a single year, and then controlling the axis labelling so that you don't include the constant year in the plot.
# Define the plot
p <- df %>%
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, deni011, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
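To make the year-collapsing trick concrete, here is a small base R sketch of the same idea without lubridate. It uses 2000 as the constant year, which is an assumption of this sketch rather than the value used above: a leap year keeps 29 February valid, whereas a non-leap constant year has no such date.

```r
# A few example dates from different years (illustrative only)
dates <- as.Date(c("1993-03-15", "2004-02-29", "2010-12-31"))

# Keep the original year for colouring the curves
year <- factor(format(dates, "%Y"))

# Replace the year with a constant so all dates share one x axis
collapsed <- as.Date(format(dates, "2000-%m-%d"))

print(data.frame(dates, year, collapsed))
```

With date_labels = "%b" on the date scale, the constant year never appears on the axis.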
In this case though, your daily data are quite variable, so this is a bit of a mess. You could hone in on a single year to see the daily variation a bit better.
# Hone in on a single year
p + geom_line(aes(group = year), color = "black", alpha = 0.1) +
geom_line(data = function(x) filter(x, year == 2010), size = 1)
But ultimately, if you want to look at several years at a time, it's probably a good idea to present smoothed lines rather than raw daily values. Or, indeed, some monthly aggregate.
# Smoothed version
p + geom_smooth(se = F)
#> `geom_smooth()` using method = 'loess'
#> Warning: Removed 117 rows containing non-finite values (stat_smooth).
There are multiple values from one month, so when plotting your original data, you got multiple points in one month. Therefore, the line looks strange.
If you want to create something similar to the example you provided, you have to summarize your data by year and month. Below I calculated the mean for each year and month in your data. In addition, you need to convert your year and month to factors if you want to plot them as discrete variables.
library(dplyr)
Ref_Data2 <- Ref_Data %>%
group_by(MonthN, YearN, Month) %>%
summarize(DENI011 = mean(DENI011)) %>%
ungroup() %>%
# Convert the Month column to factor variable with levels from Jan to Dec
# Convert the YearN column to factor
mutate(Month = factor(Month, levels = unique(Month)),
YearN = as.factor(YearN))
g <- ggplot(data = Ref_Data2,
aes(x = Month, y = DENI011, group = YearN, colour = YearN)) +
geom_line()
g
If you don't want to add in library(dplyr), this is the base R code. Exact same strategy and results as www's answer.
dat <- read.delim("~/Downloads/df1.dat", sep = " ")
dat$Date <- as.Date(dat$Date)
dat$month <- factor(months(dat$Date, TRUE), levels = month.abb)
dat$year <- gsub("-.*", "", dat$Date)
month_summary <- aggregate(DENI011 ~ month + year, data = dat, mean)
ggplot(month_summary, aes(month, DENI011, color = year, group = year)) +
geom_path()

Grouped bar chart with date on x-axis

I'm getting back to R, and I have some trouble plotting the data I want.
It's in this format :
date value1 value2
10/25/2016 50 60
12/16/2016 70 80
01/05/2017 35 45
And I would like to plot value1 and value2 next to each other, with the corresponding date on the x axis. So far I have this, I tried to plot only value1 first :
df$date <- as.Date(df$date, "%m/%d/%Y")
ggplot(data=df,aes(x=date,y=value1))
But the resulting plot doesn't show anything. The maximum values on the x and y axis seem to correspond to the ranges of my dataframe, but why is nothing showing up?
It works with plot(df$date,df$value1) though, so I don't get what I am doing wrong.
The ggplot call alone does not actually create any layers on the plot. You need to add a geom.
For this you probably want geom_point() or geom_line():
ggplot(data=df,aes(x=date,y=value1)) +
geom_point()
or
ggplot(data=df,aes(x=date,y=value1)) +
geom_line()
or you could do both if you want points and lines
ggplot(data=df,aes(x=date,y=value1)) +
geom_point() +
geom_line()
If you want both values on the plot, I would recommend doing some data manipulation first with the tidyr package (load it with library(tidyr), which also makes the %>% pipe available):
df %>%
gather(key = "group", value = "value", value1:value2) %>%
ggplot(aes(date, value, color = group, group = group)) +
geom_line()
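Since the title asks for a grouped bar chart, here is a possible sketch that puts value1 and value2 side by side for each date, using the three rows from the question. Treating the date as a factor is a choice made in this sketch so the bars are evenly spaced; it is not part of the answer above.

```r
library(tidyr)
library(ggplot2)

# The three rows shown in the question
df <- data.frame(
  date   = c("10/25/2016", "12/16/2016", "01/05/2017"),
  value1 = c(50, 70, 35),
  value2 = c(60, 80, 45)
)
df$date <- as.Date(df$date, "%m/%d/%Y")

# Long format: one row per date and series
df_long <- pivot_longer(df, cols = c(value1, value2),
                        names_to = "group", values_to = "value")

# factor(date) gives one evenly spaced slot per date;
# position = "dodge" places the two series next to each other
p <- ggplot(df_long, aes(factor(date), value, fill = group)) +
  geom_col(position = "dodge") +
  labs(x = "date")
```

Keeping date as a Date instead would space the bars according to the actual time between observations.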
