How to use conditional arguments while plotting in R?

I have a data frame in which two of the columns are date and sales. The date column ranges from 2012-10-22 to 2016-09-22. I want to plot daily sales for January 2013 without creating a subset.
I have used this:
ggplot(subsales,aes(Date,spdby))+geom_line()
Is it possible by using ggplot()?
I have plotted the sales per day, and it looks like this (plot not included).
I want to zoom in on January 2013 and extract that part as a new plot.

Yes, in ggplot:
library(ggplot2)
subsales <- data.frame(
date = seq(as.Date("2013-1-1"), as.Date("2017-1-1"), by = "day"),
spdby = runif(1462, 2000, 6000)
)
ggplot(subsales, aes(date, spdby)) +
geom_line()
ggplot(subsales, aes(date, spdby)) + geom_line() +
scale_x_date(limits = c(as.Date("2013-1-1"), as.Date("2013-1-31")))
#> Warning: Removed 1431 rows containing missing values (geom_path).
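If the goal is only to zoom in, a possible alternative is to set the x range on the coordinate system instead of on the scale. The sketch below reuses the same made-up `subsales` data (the seed and the helper vectors `jan_2013` and `n_dropped` are illustrative additions): `coord_cartesian()` changes only the visible window, so no rows are turned into missing values and the warning above does not appear.

```r
library(ggplot2)

# Same kind of made-up data as above: one random sales value per day
set.seed(1)
subsales <- data.frame(
  date = seq(as.Date("2013-01-01"), as.Date("2017-01-01"), by = "day"),
  spdby = runif(1462, 2000, 6000)
)

# Window to zoom in on: January 2013
jan_2013 <- as.Date(c("2013-01-01", "2013-01-31"))

# coord_cartesian() zooms without removing any rows from the data
p <- ggplot(subsales, aes(date, spdby)) +
  geom_line() +
  coord_cartesian(xlim = jan_2013)
# print(p) draws the plot

# For comparison: how many rows scale_x_date(limits = ...) would drop
n_dropped <- sum(subsales$date < jan_2013[1] | subsales$date > jan_2013[2])
```

Because the scale is still trained on all 1462 days, the y-axis keeps the full range of the data; pass `ylim` as well if you want it fitted to January. Adding `expand = FALSE` removes the small padding around the window.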

Related

geom_bar: compare years per month

I have two data sets, one for 2020 and one for 2019. Each is divided into 5 groups, and each group has a value for every month.
I want to create a graph that compares each month of each group between 2020 and 2019.
The data for 2020 looks like this (image not included), and the data for 2019 has the same layout.
I combined the two data sets into one table (image not included).
The problem is that all the graphs I found online have either a single column of values or no division into months.
How can I create one graph that compares each month between 2019 and 2020?
library(tidyverse)
library(ggplot2)
# bring table in long format
longerTable <- tibble(month = 1:12, value_2020 = rnorm(12), value_2019 = rnorm(12)) %>%
# names_prefix strips "value_" so the legend shows just the year
pivot_longer(cols = starts_with("value"), names_to = "year", names_prefix = "value_", values_to = "value")
# plot with ggplot.
ggplot(longerTable, aes(x=month, y=value, fill=year)) +
# stat = identity -> plot numbers as they are
# position = dodge -> show bars next to each other
geom_bar(stat="identity", position = "dodge")
Created on 2020-10-01 by the reprex package (v0.3.0)
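The answer above leaves out the five groups mentioned in the question. One way to include them, sketched here with made-up values and a hypothetical `group` column (the real column names are not shown in the question), is to stack both years into one long table and give each group its own panel with `facet_wrap()`. Turning `month` into a factor with `month.abb` labels keeps the months in calendar order on the x-axis.

```r
library(ggplot2)

set.seed(42)
# Made-up example data: 5 groups x 12 months for one year
make_year <- function(yr) {
  data.frame(
    group = rep(paste("Group", 1:5), each = 12),  # hypothetical group labels
    month = rep(1:12, times = 5),
    value = runif(60, 10, 100),
    year  = yr
  )
}
combined <- rbind(make_year(2019), make_year(2020))

# Month names in calendar order; year as a discrete variable for the fill colour
combined$month <- factor(month.abb[combined$month], levels = month.abb)
combined$year  <- factor(combined$year)

p <- ggplot(combined, aes(x = month, y = value, fill = year)) +
  geom_col(position = "dodge") +   # 2019 and 2020 bars side by side
  facet_wrap(~ group, ncol = 1) +  # one panel per group
  labs(x = "Month", y = "Value", fill = "Year")
# print(p) draws the plot
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")`; swap `facet_wrap(~ group)` for `fill = interaction(year, group)` if you would rather keep everything in a single panel.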

R ggplot boxplot group plots by combining factors

I have some water quality (metals) results from samples taken in June and December of each year. My current df has the columns Month, Year and Detection. I would like to group by each test, i.e. June 2019, December 2019 and June 2020. I could create a new factor, say Test, with the values 0619, 1219 and 0620, or create a new factor by combining Month and Year for each value.
Before doing that, I was wondering whether geom_boxplot could combine the Month and Year factors to plot the 3 unique tests. Grouping by Year or by Month alone will not give me the 3 unique tests.
I am looking for a call syntax solution before going the new-factor route.
ggplot(data = Agm, aes(x = Month+Year, y = Level) , na.rm=TRUE) +
ggtitle("Lead Levels",subtitle=subtext )+
xlab("Test") + ylab("ppb") +
geom_boxplot( fill="red",width = 0.8) + theme_bw()
If I understand correctly, you want to display a boxplot using two columns of factors (Month and Year).
There are a couple of ways you can accomplish this. First, you can simply paste your columns together within the ggplot call, for example:
ggplot(data = Agm, aes(x = paste(Year, Month), y = Level)) +
geom_boxplot() + theme_bw()
In this situation though I usually create a new column and use that as the variable for the X axis. This will allow you more flexibility in managing the values and how they display. For example:
library(tidyverse)
# Create a new Date column, combining year and month, separated by a -
Agm <- Agm %>% mutate(Date = paste(Year, Month, sep = "-")) %>% arrange(Date)
ggplot(data = Agm, aes(x = Date, y = Level)) +
geom_boxplot() + theme_bw()
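Note: with either method above, I would suggest joining the year first and then the month, as I have done, so that the data is not ordered incorrectly on your plot. ggplot sorts these character values alphabetically, so if you put the month first, January for all the years will be displayed first (leftmost), followed by February or October depending on whether you use leading zeros. Year-first ordering is only chronological when the month is a zero-padded number; with month names such as June and December, "2019-December" still sorts before "2019-June", so in that case convert the column to a factor with its levels in the order you want.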
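For the call-syntax route the question asks about, base R's `interaction()` combines two factors without pasting strings, and `aes(x = interaction(Month, Year))` also works directly inside the ggplot call. The sketch below uses made-up data shaped like the question (five samples per test) and assumes Month holds month names; making Month a factor with calendar-ordered levels is what gives a chronological axis.

```r
library(ggplot2)

set.seed(7)
# Made-up data: five samples for each of the three tests
tests <- data.frame(Month = c("June", "December", "June"),
                    Year  = c(2019, 2019, 2020))
Agm <- tests[rep(1:3, each = 5), ]
Agm$Level <- runif(15, 0, 10)

# Month as a factor in calendar order (not alphabetical)
Agm$Month <- factor(Agm$Month, levels = c("June", "December"))

# interaction() lets the first factor vary fastest, giving the levels
# June.2019, December.2019, June.2020; drop = TRUE removes the unused
# December.2020 combination
Agm$Test <- interaction(Agm$Month, Agm$Year, drop = TRUE)

p <- ggplot(Agm, aes(x = Test, y = Level)) +
  geom_boxplot(fill = "red", width = 0.8) +
  theme_bw()
# print(p) draws the plot
```

The `sep` argument of `interaction()` changes the "." in the labels, e.g. `sep = " "` gives "June 2019".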

How to plot bar chart of monthly deviations from annual mean?

SO!
I am trying to create a bar chart of monthly deviations from annual means for temperature data. I have data across many years and I want to show the seasonal behaviour in temperatures between months. The bars should represent the deviation from the annual average, which is recalculated for each year. Here is an example similar to what I want, only for a single year (image not included).
My data is sensitive so I cannot share it yet, but I made a reproducible example using the txhousing dataset (it comes with ggplot2). The salesdiff column is the deviation between monthly sales (averaged across all cities) and the annual average for each year. Now the problem is plotting it.
library(ggplot2)
df <- aggregate(sales~month+year,txhousing,mean)
df2 <- aggregate(sales~year,txhousing,mean)
df2$sales2 <- df2$sales #RENAME sales
df2 <- df2[,-2] #REMOVE sales
df3<-merge(df,df2) #MERGE dataframes
df3$salesdiff <- df3$sales - df3$sales2 #FIND deviation between monthly and annual means
#plot deviations
ggplot(df3,aes(x=month,y=salesdiff)) +
geom_col()
My plot does not look right at the moment (image not included).
Somehow it is stacking the columns for each month with all of the data across the years. Ideally the date would run along the x-axis, spanning many years (I think the dataset covers 2000-2015), with different colors depending on whether salesdiff is higher or lower. You are all awesome, and I would welcome any advice!
Probably the main issue here is that geom_col() will not take on different aesthetic properties unless you explicitly tell it to. One way to get what you want is to map the fill aesthetic to a variable that records whether each deviation is higher or lower than the annual mean. You also need date information that can be passed to ggplot(); I use the lubridate package for this task.
Note that we combine the "month" and "year" columns here and then use ymd() to obtain date values. I chose not to convert the double-valued "date" column in txhousing using something like date_decimal(), because it can sometimes confuse February and January (e.g. Feb 1 gets "rounded down" to Jan 31).
I decided to plot a subset of the txhousing dataset, which is a lot more convenient to display for teaching purposes.
Code:
library("tidyverse")
library("ggplot2")
library("lubridate") # provides ymd()
# subset txhousing to just years >= 2011, and calculate nested means and dates
housing_df <- filter(txhousing, year >= 2011) %>%
group_by(year, month) %>%
summarise(monthly_mean = mean(sales, na.rm = TRUE),
date = first(date)) %>%
# summarise() drops the month grouping, so the data is still grouped by year
# and mean(monthly_mean) below is the annual mean of each year
mutate(yearmon = paste(year, month, sep = "-"),
date = ymd(yearmon, truncated = 1), # create date column
salesdiff = monthly_mean - mean(monthly_mean), # monthly deviation
higherlower = case_when(salesdiff >= 0 ~ "higher", # for fill aes later
salesdiff < 0 ~ "lower"))
ggplot(data = housing_df, aes(x = date, y = salesdiff, fill = as.factor(higherlower))) +
geom_col() +
scale_x_date(date_breaks = "6 months",
date_labels = "%b-%Y") +
scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
theme_bw()+
theme(legend.position = "none") # remove legend
You can see the periodic behaviour here nicely; an increase in sales appears to occur every spring, with sales decreasing during the fall and winter months. Do keep in mind that you might want to reverse the colours I assigned if you want to use this code for temperature data! This was a fun one - good luck, and happy plotting!
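The same data preparation can also be sketched in base R, with only the plot built by ggplot2. This version makes two assumptions that differ from the answer above: it keeps all years rather than only 2011 onwards, and it builds the date with `as.Date()` instead of lubridate. `ave()` computes the mean of the monthly means within each year, which matches the "recalculated for each year" requirement in the question; `direction` is an illustrative column name.

```r
library(ggplot2)  # also provides the txhousing data set

# Monthly mean sales across all cities (rows with missing sales are dropped)
monthly <- aggregate(sales ~ year + month, data = txhousing, FUN = mean)

# Deviation from the mean of the monthly means, recomputed within each year
monthly$salesdiff <- monthly$sales - ave(monthly$sales, monthly$year)

# A real Date (first day of each month) gives a continuous, chronological x-axis
monthly$date <- as.Date(sprintf("%d-%02d-01", monthly$year, monthly$month))
monthly$direction <- ifelse(monthly$salesdiff >= 0, "higher", "lower")

p <- ggplot(monthly, aes(x = date, y = salesdiff, fill = direction)) +
  geom_col() +
  scale_fill_manual(values = c(higher = "blue", lower = "red")) +
  theme_bw()
# print(p) draws the plot
```

Because the deviations are taken from each year's own mean, they sum to (numerically) zero within every year, which is a quick sanity check on the grouping.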
Something like this should work?
Basically you need to create a binary variable, called factordiff below, that lets you change the colour (fill) depending on whether salesdiff is positive or negative.
You also need a date variable that combines month and year.
library(ggplot2)
library(dplyr)
df3$factordiff <- ifelse(df3$salesdiff > 0, 1, 0) # binary variable for the fill colour
df3 <- df3 %>%
mutate(date = sprintf("%d-%02d", year, month)) # builds dates like "2001-01"; zero-padding keeps them in chronological order
#plot deviations
ggplot(df3, aes(x = date, y = salesdiff, fill = as.factor(factordiff))) +
geom_col()
Of course, this results in a hard-to-read plot because there are many dates. You can subset it and show only a restricted period:
df3 %>%
filter(date >= "2014-01") %>% # keep data from January 2014 onwards
ggplot(aes(x = date, y = salesdiff, fill = as.factor(factordiff))) +
geom_col() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # rotates the axis labels
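Why the month needs zero-padding when `date` is a character string: ggplot places character x values in alphabetical order, and `format()` does not turn such strings into dates. This small base-R check shows the difference; a real `Date` column avoids the issue altogether.

```r
# Unpadded "year-month" strings sort alphabetically, not chronologically
unpadded <- paste0(2001, "-", c(1, 2, 10))
sort(unpadded)   # "2001-1"  "2001-10" "2001-2"

# Zero-padding the month makes alphabetical order equal calendar order
padded <- sprintf("%d-%02d", 2001L, c(1L, 2L, 10L))
sort(padded)     # "2001-01" "2001-02" "2001-10"

# Alternatively, a real Date column sorts (and plots) chronologically
as_dates <- as.Date(paste0(padded, "-01"))
```

Zero-padding also makes string comparisons such as `date >= "2014-01"` behave like date comparisons, since every value has the same width.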

ggplot: Issue when converting axis values from number of days to months in a boxplot

When I convert a numeric variable, "number of days from 1 January 2015", to a date, the boxplot shows only part of the range of y-values.
In this example, I plotted "gender" vs "months". Months were obtained by transforming the original "days" variable (i.e. days starting from 2015/1/1). The range of values should extend from late April 2015 to early May of the following year, but ggplot() is only plotting values between Aug and Jan and showing only month labels within that range on the y-axis.
Any help to solve this issue is very welcome!
Here is the code and the corresponding plot:
gender <- c(rep("female",144), rep("male",144))
days <- c(274,285,302,330,117,230,271,207,235,249,268,NA,NA,NA,NA,210,255,290,267,252,257,268,288,220,264,270,277,303,222,252,296,323,369,NA,258,NA,240,245,310,271,272,282,314,345,214,211,258,268,145,176,244,273,249,257,277,284,272,273,272,282,290,297,260,266,277,213,247,244,269,349,268,NA,220,235,269,299,266,273,274,307,285,299,300,224,257,284,291,305,278,294,455,280,262,272,276,295,338,264,339,232,277,230,270,312,276,285,308,241,273,340,249,260,270,352,297,217,247,287,320,191,249,265,287,320,432,262,265,324,309,234,441,409,264,381,262,276,316,330,252,264,298,315,287,330,274,287,371,237,259,266,349,247,249,241,333,379,486,198,249,270,275,279,314,182,234,252,289,319,216,262,293,234,272,284,311,258,NA,299,314,290,292,296,300,274,289,359,267,319,NA,492,294,319,293,265,273,315,307,315,287,378,238,239,315,325,361,249,NA,192,224,226,204,208,234,263,283,294,430,267,273,307,327,460,240,307,319,492,300,311,485,348,297,348,317,317,318,338,316,316,336,255,284,316,249,302,307,308,301,265,273,316,281,326,272,283,NA,NA,243,254,271,191,259,324,287,265,310,337,287,326,304,399,337,295,313,228,288,307,270,347,290,245,NA,283,423,223,NA,264,314,283)
mytable <- data.frame(gender,days)
range(mytable$days, na.rm = TRUE) # 117 to 492
mytable$months <- (as.Date(days,origin = "2015/1/1"))
ggplot(mytable, aes(x=gender, y=months,fill=gender)) +
geom_boxplot()
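I am not sure about the intuition behind this plot, but this should give you what you want. Note that values outside the scale limits are removed before the boxplots are computed, so the limits must cover the whole range of the data:
ggplot(mytable, aes(x=gender, y=months, fill=gender)) +
geom_boxplot() +
scale_y_date(date_labels="%b", date_breaks ="1 month",
limits = c(as.Date("2015-4-1"), as.Date("2016-6-1")))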
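To choose limits that enclose all the data, it helps to convert the extreme day counts first. A minimal base-R sketch, assuming the same origin as the question (day 0 = 1 January 2015) and the range 117 to 492 reported there; `lims` is just one possible choice, rounded out to whole months.

```r
# Convert the extreme day counts from the question to dates
day_range  <- c(117, 492)
date_range <- as.Date(day_range, origin = "2015-01-01")
date_range  # "2015-04-28" "2016-05-07"

# Limits that fully contain the data, rounded out to whole months
lims <- as.Date(c("2015-04-01", "2016-06-01"))
all(date_range >= lims[1] & date_range <= lims[2])  # TRUE
```

Since the data spans two calendar years, `date_labels = "%b %Y"` may be clearer than month names alone.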

ggplot: Multiple years on same plot by month

So, I've hit something I don't think I have ever come across. I scoured Google looking for the answer, but have not found anything (yet)...
I have two data sets - one for 2015 and one for 2016. They represent the availability of an IT system. The data frames read as such:
2015 data set:
variable   value
Jan 2015   100
Feb 2015   99.95
...        ...
2016 data set:
variable   value
Jan 2016   99.99
Feb 2016   99.90
...        ...
They simply run from Jan to Dec, listing the availability of the system. The "variable" column is of class yearmon (created with as.yearmon) and value is a plain numeric.
I want to create a geom_line() chart with ggplot2 that has the percentages on the y-axis and the months on the x-axis. I have been able to draw two lines, but the x-axis runs from Jan 2015 to Dec 2016. What I'd like is to plot them by month only, so that they overlap. I have tried various things with the scales and so forth, but I have yet to figure out how to do this.
Basically, I need the x-axis to read January - December in chronological order, but I want to plot both 2015 and 2016 on the same chart. Here is my ggplot code (non-working) as I have it now:
ggplot(data2015,aes(variable,value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
scale_x_yearmon() +
theme_classic()
This plots a continuous series because I am dealing with a yearmon data type. I have also tried something like this:
ggplot(data2015,aes(months(variable),value)) +
geom_line(aes(color="2015")) +
geom_line(data=data2016,aes(color="2016")) +
theme_classic()
Obviously that won't work. I figure months() is probably still carrying the year somehow. If I plot them as factors, they are not in order. Any help would be very much appreciated. Thank you in advance!
To get a separate line for each year, you need to extract the year from each date and map it to colour. To get months (without year) on the x-axis, you need to extract the month from each date and map to the x-axis.
library(zoo)
library(lubridate)
library(ggplot2)
Let's create some fake data with the dates in as.yearmon format. I'll create two separate data frames so as to match what you describe in your question:
# Fake data
set.seed(49)
dat1 = data.frame(date = seq(as.Date("2015-01-15"), as.Date("2015-12-15"), "1 month"),
value = cumsum(rnorm(12)))
dat1$date = as.yearmon(dat1$date)
dat2 = data.frame(date = seq(as.Date("2016-01-15"), as.Date("2016-12-15"), "1 month"),
value = cumsum(rnorm(12)))
dat2$date = as.yearmon(dat2$date)
Now for the plot. We'll extract the year and month from date with the year and month functions, respectively, from the lubridate package. We'll also turn the year into a factor, so that ggplot will use a categorical color palette for year, rather than a continuous color gradient:
ggplot(rbind(dat1,dat2), aes(month(date, label=TRUE, abbr=TRUE),
value, group=factor(year(date)), colour=factor(year(date)))) +
geom_line() +
geom_point() +
labs(x="Month", colour="Year") +
theme_classic()
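You need one long-format dataset that has a year column, for example:
month   value   year
Jan     100     2015
Feb     99.95   2015
Jan     99.99   2016
Feb     99.90   2016
Then you can plot both lines with ggplot. Month should be a factor in calendar order, and year should be discrete so that each year gets its own colour and line:
ggplot(dataset, aes(x = month, y = value, color = factor(year), group = year)) + geom_line()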
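A minimal base-R sketch of building that long-format table from two separate yearly data frames. The values are the four shown in the question (the remaining months are omitted), and plain `Date` columns are assumed here instead of zoo's `yearmon`. Making `month` a factor with `month.abb` levels keeps the x-axis in calendar order, and `group = year` tells `geom_line()` to draw one line per year even though the x variable is discrete.

```r
library(ggplot2)

# The first two months of each year, as listed in the question
data2015 <- data.frame(date = as.Date(c("2015-01-01", "2015-02-01")),
                       value = c(100, 99.95))
data2016 <- data.frame(date = as.Date(c("2016-01-01", "2016-02-01")),
                       value = c(99.99, 99.90))

# Stack both years, then split each date into a month and a year column
dataset <- rbind(data2015, data2016)
month_num <- as.integer(format(dataset$date, "%m"))
dataset$month <- factor(month.abb[month_num], levels = month.abb)
dataset$year  <- factor(format(dataset$date, "%Y"))

p <- ggplot(dataset, aes(x = month, y = value, colour = year, group = year)) +
  geom_line() +
  geom_point() +
  labs(x = "Month", y = "Availability (%)", colour = "Year")
# print(p) draws the plot
```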
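ggseasonplot() from the forecast package can do that for you. Example code with a ts object (the a10 data set comes from the fpp2 package):
library(forecast)
library(ggplot2)
library(fpp2) # provides the a10 data set
ggseasonplot(a10, year.labels=TRUE, year.labels.left=TRUE) +
ylab("$ million") +
ggtitle("Seasonal plot: antidiabetic drug sales")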
