I was trying to create a plot for each of the 4 quarters in my data. So I used:
qplot(date, mape, data = df, facets = .~quarter)
The resulting plot was:
The X-axis is using all the months present in the "date" column - from Jan - Dec. But the data on the chart is limited to the months of the quarter for e.g. Jan-Feb-March in the Q1 facet, Apr-May-June for Q2 etc. How can I limit the x-axis to the months corresponding to the quarter only?
as an alternate, I did this but its not exactly how I want to display my data: qplot(strftime(df$date, format = "%b"), mape, data = predicted.modelset)
It would be best if you could provide data for this but I am pretty sure the following will work:
qplot(date, mape, data = df) + facet_wrap(~quarter,scales="free_x") #without a dot before ~
Related
SO!
I am trying to create a plot of monthly deviations from annual means for temperature data using a bar chart. I have data across many years and I want to show the seasonal behavior in temperatures between months. The bars should represent the deviation from the annual average, which is recalculated for each year. Here is an example that is similar to what I want, only it is for a single year:
My data is sensitive so I cannot share it yet, but I made a reproducible example using the txhousing dataset (it comes with ggplot2). The salesdiff column is the deviation between monthly sales (averaged acrross all cities) and the annual average for each year. Now the problem is plotting it.
library(ggplot2)
df <- aggregate(sales~month+year,txhousing,mean)
df2 <- aggregate(sales~year,txhousing,mean)
df2$sales2 <- df2$sales #RENAME sales
df2 <- df2[,-2] #REMOVE sales
df3<-merge(df,df2) #MERGE dataframes
df3$salesdiff <- df3$sales - df3$sales2 #FIND deviation between monthly and annual means
#plot deviations
ggplot(df3,aes(x=month,y=salesdiff)) +
geom_col()
My ggplot is not looking good at the moment-
Somehow it is stacking the columns for each month with all of the data across the years. Ideally the date would be along the x-axis spanning many years (I think the dataset is from 2000-2015...), and different colors depending on if salesdiff is higher or lower. You are all awesome, and I would welcome ANY advice!!!!
Probably the main issue here is that geom_col() will not take on different aesthetic properties unless you explicitly tell it to. One way to get what you want is to use two calls to geom_col() to create two different bar charts that will be combined together in two different layers. Also, you're going to need to create date information which can be easily passed to ggplot(); I use the lubridate() package for this task.
Note that we combine the "month" and "year" columns here, and then useymd() to obtain date values. I chose not to convert the double valued "date" column in txhousing using something like date_decimal(), because sometimes it can confuse February and January months (e.g. Feb 1 gets "rounded down" to Jan 31).
I decided to plot a subset of the txhousing dataset, which is a lot more convenient to display for teaching purposes.
Code:
library("tidyverse")
library("ggplot2")
# subset txhousing to just years >= 2011, and calculate nested means and dates
housing_df <- filter(txhousing, year >= 2011) %>%
group_by(year, month) %>%
summarise(monthly_mean = mean(sales, na.rm = TRUE),
date = first(date)) %>%
mutate(yearmon = paste(year, month, sep = "-"),
date = ymd(yearmon, truncated = 1), # create date column
salesdiff = monthly_mean - mean(monthly_mean), # monthly deviation
higherlower = case_when(salesdiff >= 0 ~ "higher", # for fill aes later
salesdiff < 0 ~ "lower"))
ggplot(data = housing_df, aes(x = date, y = salesdiff, fill = as.factor(higherlower))) +
geom_col() +
scale_x_date(date_breaks = "6 months",
date_labels = "%b-%Y") +
scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
theme_bw()+
theme(legend.position = "none") # remove legend
Plot:
You can see the periodic behaviour here nicely; an increase in sales appears to occur every spring, with sales decreasing during the fall and winter months. Do keep in mind that you might want to reverse the colours I assigned if you want to use this code for temperature data! This was a fun one - good luck, and happy plotting!
Something like this should work?
Basically you need to create a binary variable that lets you change the color (fill) if salesdiff is positive or negative, called below factordiff.
Plus you needed a date variable for month and year combined.
library(ggplot2)
library(dplyr)
df3$factordiff <- ifelse(df3$salesdiff>0, 1, 0) # factor variable for colors
df3 <- df3 %>%
mutate(date = paste0(year,"-", month), # this builds date like "2001-1"
date = format(date, format="%Y-%m")) # here we create the correct date format
#plot deviations
ggplot(df3,aes(x=date,y=salesdiff, fill = as.factor(factordiff))) +
geom_col()
Of course this results in a hard to read plot because you have lots of dates, you can subset it and show only a restricted time:
df3 %>%
filter(date >= "2014-1") %>% # we filter our data from 2014
ggplot(aes(x=date,y=salesdiff, fill = as.factor(factordiff))) +
geom_col() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # adds label rotation
I have a dataset with a few organisms, which I would like to plot on my y-axis, against date, which I would like to plot on the x-axis. However, I want the fluctuation of the curve to represent the abundance of the organisms. I.e I would like to plot a time series with the relative abundance separated by the organism to show similar patterns with time.
However, of course, plotting just date against an organism does not yield any information on the abundance. So, my question is, is there a way to make the curve represent abundance using ggridges?
Here is my code for an example dataset:
set.seed(1)
Data <- data.frame(
Abundance = sample(1:100),
Organism = sample(c("organism1", "organism2"), 100, replace = TRUE)
)
Date = rep(seq(from = as.Date("2016-01-01"), to = as.Date("2016-10-01"), by =
'month'),times=10)
Data <- cbind(Date, Data)
ggplot(Data, aes(x = Abundance, y = Organism)) +
geom_density_ridges(scale=1.15, alpha=0.6, color="grey90")
This produces a plot with the two organisms, however, I want the date on the x-axis and not abundance. However, this doesn't work. I have read that you need to specify group=Date or change date into julian day, however, this doesn't change the fact that I do not get to incorporate abundance into the plot.
Does anyone have an example of a plot with date vs. a categorical variable (i.e. organism) plotted against a continuous variable in ggridges?
I really like to output from ggridges and would like to be able to use it for these visualizations. Thank you in advance for your help!
Cheers,
Anni
To use geom_density_ridges, it'll help to reshape the data to show observations in separate rows, vs. as summarized by Abundance.
library(ggplot2); library(ggridges); library(dplyr)
# Uncount copies the row "Abundance" number of times
Data_sum <- Data %>%
tidyr::uncount(Abundance)
ggplot(Data_sum, aes(x = Date, y = Organism)) +
ggridges::geom_density_ridges(scale=1, alpha=0.6, color="grey90")
I have a continuous variable y measured on different dates. I need to make boxplots with a box showing the distribution of y for each 5 year interval.
Sample data:
rdob <- as.Date(dob, format= "%m/%d/%y")
ggplot(data = data, aes(x=rdob, y=ageyear)) + geom_boxplot()
#Warning message:
#Continuous x aesthetic -- did you forget aes(group=...)?
This image is the first one I tried. What I want is a box for every five year interval, instead of a box for every year.
Here is a way to pull out the year in base R:
format(as.Date("2008-11-03", format="%Y-%m-%d"), "%Y")
Simply wrap your date vector in a format() and add the "%Y". To get this to be integer, you can use as.integer.
You could also take a look at the year function in the lubridate package which will make this extraction a little bit more straightforward.
One method to get 5 year intervals is to use cut to create a factor variable that creates levels at selected break points. Unless you have dozens of years your best bet would be to set the break points manually:
df$myTimeInterval <- cut(df$years, breaks=c(1995, 2000, 2005, 2010, 2015))
Here's an example taking Dave2e's suggestion of using cut on date intervals along with ggplot's group aesthetic mapping:
library(ggplot2)
n <- 1000
## Randomly sample birth dates and dummy up an effect that trends upward with DOB
dobs <- sample(seq(as.Date('1970/01/01'), Sys.Date(), by="day"), n)
effect <- rnorm(n) + as.numeric(as.POSIXct(dobs)) / as.numeric(as.POSIXct(Sys.Date()))
data <- data.frame(dob=dobs, effect=effect)
## boxplot w/ DOB binned to 5 year intervals
ggplot(data=data, aes(x=dob, y=effect)) + geom_boxplot(aes(group=cut(dob, "5 year")))
library(lubridate)
year=year(rdob)
I have about 20 years of daily data in a time series. It has columns Date, rainfall and other data.
I am trying plot rainfall vs Time. I want to get 20 line plots with different colours and legend is generated that show the years in one graph. I tried the following codes but it is not giving me the desired results. Any suggestion to fix my issue would be most welcome
library(ggplot2)
library(seas)
data(mscdata)
p<-ggplot(data=mscdata,aes(x=date,y=precip,group=year,color=year))
p+geom_line()+scale_x_date(labels=date_format("%m"),breaks=date_breaks("1 months"))
It doesnt look great but here's a method. We first coerce the data into dates in the same year:
mscdata$dayofyear <- as.Date(format(mscdata$date, "%j"), format = "%j")
Then we plot:
library(ggplot2)
library(scales)
p <- ggplot(data = mscdata, aes(x = dayofyear, y = precip, group = year, color = year))
p + geom_line() +
scale_x_date(labels = date_format("%m"), breaks = date_breaks("1 months"))
While I agree with #Jaap that this may not be the best way to depict these data, try to following:
mscdata$doy <- as.numeric(strftime(mscdata$date, format="%j"))
ggplot(data=mscdata,aes(x=doy,y=precip,group=year)) +
geom_line(aes(color=year))
Although the given answers are good answers to your questions as it stands, i don't think it will solve your problem. I think you should be looking at a different way to present the data. #Jaap already suggested using facets. Take for example this approach:
#first add a month column to your dataframe
mscdata$month <- format(mscdata$date, "%m")
#then plot it using boxplot with year on the X-axis and month as facet.
p1 <- ggplot(data = mscdata, aes(x = year, y = precip, group=year))
p1 + geom_boxplot(outlier.shape = 3) + facet_wrap(~month)
This will give you a graph per month, showing the rainfall per year next to one each other. Because i use boxplot, the peaks in rainfall show up as dots ('normal' rain events are inside box).
Another possible approach would be to use stat_summary.
My goal is to compare the distribution of various socioeconomic factor such as income over multiple years to see how the population has evolved in particular region in say, over 5 years. The primary data for this comes from the Public Use Microdata Sample. I am using R + ggplot2 as my preferred tool.
When comparing two years worth of data (2005 and 2010) I have two data frames hh2005 and hh2010 with the household data for the two years. The income data for the two years are stored in the variable hincp in both data frames. Using ggplot2 I am going about creating the density plot for individual years as follows (example for 2010):
p1 <- ggplot(data = hh2010, aes(x=hincp))+
geom_density()+
labs(title = "Distribution of income for 2010")+
labs(y="Density")+
labs(x="Household Income")
p1
How do I overlay the 2005 density over this plot? I am unable to figure it out as having read data in as hh2010 I am not sure how to proceed. Should I be processing the data in a fundamentally different way from the very beginning?
You can pass data arguments to individual geoms, so you should be able to add the second density as a new geom like this:
p1 <- ggplot(data = hh2010, aes(x=hincp))+
geom_density() +
# Change the fill colour to differentiate it
geom_density(data=hh2005, fill="purple") +
labs(title = "Distribution of income for 2010")+
labs(y="Density")+
labs(x="Household Income")
This is how I would approach the problem:
Tag each data frame with the variable of interest (in this case, the year)
Merge the two data sets
Update the 'fill' aesthetic in the ggplot function
For example:
# tag each data frame with the year^
hh2005$year <- as.factor(2005)
hh2010$year <- as.factor(2010)
# merge the two data sets
d <- rbind(hh2005, hh2010)
d$year <- as.factor(d$year)
# update the aesthetic
p1 <- ggplot(data = d, aes(x=hincp, fill=year)) +
geom_density(alpha=.5) +
labs(title = "Distribution of income for 2005 and 2010") +
labs(y="Density") +
labs(x="Household Income")
p1
^ Note, the 'fill' parameter seems to work best when you use a factor, thus I defined the years as such. I also set the transparency of the overlapping density plots with the 'alpha' parameter.