Draw monthly data in geofacet US maps - r

I have a data frame df with the format
State Date Ratio
AL 2019-01 10.1
AL 2019-02 12.1
... ... ...
NY 2019-01 15.1
... ... ...
And I would like to draw a time series for each state with the geofacet package. I am having trouble with the Date format, I guess.
ggplot(df,aes(Date, Ratio)) + geom_line() + facet_geo(~ State, grid = "us_state_grid2") + ylab("Rate (%)")
The following error is shown: geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
How can I adjust it?

Your date is structured 'yyyy-mm', so I'm guessing it's a character vector rather than a date object. You should convert it to class Date with as.Date() and then it should work as expected. (You'll need to paste on the day of the month.)
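A minimal sketch of that conversion in base R. The toy data frame mimics the layout in the question; the second NY value is made up for illustration.

```r
# Toy data in the asker's layout (the last Ratio value is invented)
df <- data.frame(
  State = c("AL", "AL", "NY", "NY"),
  Date  = c("2019-01", "2019-02", "2019-01", "2019-02"),
  Ratio = c(10.1, 12.1, 15.1, 14.2),
  stringsAsFactors = FALSE
)

# Paste on a day of the month so as.Date() can parse the string
df$Date <- as.Date(paste0(df$Date, "-01"))

class(df$Date)
```

With Date as a real date column, the x-axis is continuous and geom_line can connect the points within each state.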
You get a grouping error because, when the x-axis is a character vector, geom_line treats each distinct x value as its own group. Lines are then drawn between the various y values at each x value instead of across x values. Once the data are faceted by state, each group is left with a single observation, which triggers the message. Here's an example using the geofacet package's own state_ranks dataset.
library(ggplot2)
library(dplyr)
library(geofacet)
data(state_ranks)
# The lines are not connected across a character x-axis.
ggplot(state_ranks) +
geom_line(aes(x = variable, y = rank))
# Throws error: geom_path: Each group consists of only one observation. Do
# you need to adjust the group aesthetic?
ggplot(state_ranks) +
geom_line(aes(x = variable, y = rank)) +
facet_geo(~ state)
If you group by state, you get the expected result (with an alphabetically ordered x-axis).
# Works, x-axis is alphabetized and lines are connected
ggplot(state_ranks) +
geom_line(aes(x = variable, y = rank, group = state)) +
facet_geo(~ state)

Related

Zig Zag when using geom_line with ggplot in R

I would really appreciate some insight on the zagging when using the following code in R:
tbi_military %>%
ggplot(aes(x = year, y = diagnosed, color = service)) +
geom_line() +
facet_wrap(vars(severity))
The dataset is comprised of 5 variables (3 character, 2 numerical). Any insight would be so appreciated.
This is just an illustration with a standard dataset. Let's say we're interested in plotting the weight of chicks over time depending on a diet. We would attempt to plot this like so:
library(ggplot2)
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line()
You can see the zigzag pattern appear, because per diet/time point, there are multiple observations. Because geom_line sorts the data depending on the x-axis, this shows up as a vertical line spanning the range of datapoints at that time per diet.
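One quick way to check whether this is what is happening in your own data is to count the observations per group and x value. A small sketch with the same built-in dataset (the column name n_obs is just a label chosen here):

```r
# Count how many rows share the same Diet and Time
counts <- aggregate(weight ~ Diet + Time, data = ChickWeight, FUN = length)
names(counts)[3] <- "n_obs"

head(counts)

# TRUE means at least one Diet/Time combination has several observations,
# which is what produces the vertical zigzag segments
any(counts$n_obs > 1)
```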
The data has an additional variable called 'Chick' that separates out individual chicks. Including that in the grouping resolves the zigzag pattern and every line is the weight over time per individual chick.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line(aes(group = interaction(Chick, Diet)))
If you don't have an extra variable that separates out individual trends, you could instead choose to summarise the data per timepoint by, for example, taking the mean at every timepoint.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line(stat = "summary", fun = mean)
Created on 2021-08-30 by the reprex package (v1.0.0)

How to shade under part of a line from a dataset

I have a simple plot of some data from an experiment.
plot(x=sample95$PositionA, y=sample95$AbsA, xlab=expression(position (mm)), ylab=expression(A[260]), type='l')
I would like to shade a particular area under the line, let's say from 35-45mm. From what I've searched so far, I think I need to use the polygon function, but I'm unsure how to assign vertices from a big dataset like this. Every example I've seen so far uses a normal curve.
Any help is appreciated, I am very new to R/RStudio!
Here is a solution using tidyverse tools including ggplot2. I use the built in airquality dataset as an example.
This first part is just to put the data in a format that we can plot by combining the month and the day into a single date. With your data, you can just use PositionA wherever date appears.
library(tidyverse)
df <- airquality %>%
as_tibble() %>%
magrittr::set_colnames(str_to_lower(colnames(.))) %>%
mutate(date = as.Date(str_c("1973-", month, "-", day)))
This is the plot code. In ggplot2, we start with the function ggplot() and add geom functions to it with + to create the plot in layers.
The first function, geom_line, joins up all observations in the order that they appear based on the x variable, so it makes the line that we see. Each geom needs a particular mapping to an aesthetic, so here we want date on the x axis and temp on the y axis, so we write aes(x = date, y = temp).
The second function, geom_ribbon, is designed to plot bands at particular x values between a ymax and a ymin. This lets us shade the area underneath the line by choosing a constant ymin = 55 (a value lower than the minimum temperature) and setting ymax = temp.
We shade a specific part of the chart by specifying the data argument. Normally geom functions act on the dataset inherited from ggplot(), but you can override it for an individual layer. Here we use filter so that geom_ribbon only plots the points where the date is in June.
ggplot(df) +
geom_line(aes(x = date, y = temp)) +
geom_ribbon(
data = filter(df, date >= as.Date("1973-06-01") & date < as.Date("1973-07-01")),
mapping = aes(x = date, ymax = temp, ymin = 55)
)
Created on 2018-02-20 by the reprex package (v0.2.0).
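Since the question asked about polygon, here is a rough base R sketch as well. The sample95 data frame below is a made-up stand-in (a smooth peak around 40 mm), so only the column names PositionA and AbsA come from the question. The idea is to take the points inside the window as the top edge of the polygon and close it along the bottom of the plot.

```r
# Hypothetical stand-in for sample95: position in mm and absorbance
sample95 <- data.frame(PositionA = seq(0, 80, by = 0.5))
sample95$AbsA <- 0.2 + exp(-(sample95$PositionA - 40)^2 / 50)

plot(sample95$PositionA, sample95$AbsA, type = "l",
     xlab = "position (mm)", ylab = expression(A[260]))

# Keep only the points inside the window to shade (35 to 45 mm)
in_window <- sample95$PositionA >= 35 & sample95$PositionA <= 45
xs <- sample95$PositionA[in_window]
ys <- sample95$AbsA[in_window]

# Vertices: left to right along the curve, then back along the baseline
baseline <- par("usr")[3]  # bottom edge of the plot region
polygon(x = c(xs, rev(xs)),
        y = c(ys, rep(baseline, length(xs))),
        col = "grey80", border = NA)

# Redraw the line so it sits on top of the shading
lines(sample95$PositionA, sample95$AbsA)
```

No normal curve is needed: the vertices are simply the rows of the data that fall inside the window.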

Grouped bar chart with date on x-axis

I'm getting back to R, and I have some trouble plotting the data I want.
It's in this format:
date value1 value2
10/25/2016 50 60
12/16/2016 70 80
01/05/2017 35 45
And I would like to plot value1 and value2 next to each other, with the corresponding date on the x axis. So far I have this, where I tried to plot only value1 first:
df$date <- as.Date(df$date, "%m/%d/%Y")
ggplot(data=df,aes(x=date,y=value1))
But the resulting plot doesn't show anything. The maximum values on the x and y axis seem to correspond to the ranges of my dataframe, but why is nothing showing up?
It works with plot(df$date,df$value1) though, so I don't get what I am doing wrong.
The ggplot() call alone does not create any layers on the plot. You need to add a geom.
For this you probably want geom_point() or geom_line():
ggplot(data=df,aes(x=date,y=value1)) +
geom_point()
or
ggplot(data=df,aes(x=date,y=value1)) +
geom_line()
or you could do both if you want points and lines
ggplot(data=df,aes(x=date,y=value1)) +
geom_point() +
geom_line()
If you want both values on the plot, I would recommend doing some data manipulation first with the tidyr package.
library(tidyr)
library(ggplot2)
df %>%
gather(key = "group", value = "value", value1:value2) %>%
ggplot(aes(date, value, color = group, group = group)) +
geom_line()
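The snippet above draws lines. If you want the bars side by side as in the title, a sketch along these lines might work. It assumes tidyr 1.0 or later for pivot_longer (the newer replacement for gather) and rebuilds the three rows from the question so that it runs on its own.

```r
library(tidyr)
library(ggplot2)

# The three rows shown in the question
df <- data.frame(
  date   = c("10/25/2016", "12/16/2016", "01/05/2017"),
  value1 = c(50, 70, 35),
  value2 = c(60, 80, 45)
)
df$date <- as.Date(df$date, "%m/%d/%Y")

# Wide to long: one row per date and series
long <- pivot_longer(df, cols = c(value1, value2),
                     names_to = "group", values_to = "value")

# factor(date) spaces the dates evenly; position = "dodge" puts the
# two series next to each other instead of stacking them
p <- ggplot(long, aes(x = factor(date), y = value, fill = group)) +
  geom_col(position = "dodge") +
  labs(x = "date")
print(p)
```

Using x = date instead of factor(date) keeps a true time axis, but with irregularly spaced dates the bars can end up very thin.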

R Setting Y Axis to Count Distinct in ggplot2

I have a data frame that contains 4 variables: an ID number (chr), a degree type (factor w/ 2 levels of Grad and Undergrad), a degree year (chr with year), and Employment Record Type (factor w/ 6 levels).
I would like to display this data as a count of the unique ID numbers by year as a stacked area plot of the 6 Employment Record Types. So, count of # of ID numbers on the y-axis, degree year on the x-axis, the value of x being number of IDs for that year, and the fill will handle the Record Type. I am using ggplot2 in RStudio.
I used the following code, but the y axis does not count distinct IDs:
ggplot(AlumJobStatusCopy, aes(x=Degree.Year, y=Entity.ID,
fill=Employment.Data.Type)) + geom_freqpoly() +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
I also tried setting y = Entity.ID to y = ..count.. and that did not work either. I have searched for solutions as it seems to be a problem with how I am writing the aes code.
I also tried the following code based on examples of similar plots:
ggplot(AlumJobStatusCopy, aes(interval)) +
geom_area(aes(x=Degree.Year, y = Entity.ID,
fill = Employment.Data.Type)) +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
This does not even seem to work. I've read the documentation and am at my wit's end.
EDIT:
After figuring out the answer to the problem, I realized that I was not actually using the correct values for my Year variable. A count tells me nothing as I am trying to display the rise in a lack of records and the decline in current records.
My Dataset:
Year, int, 1960-2015
Current Record, num: % of total records that are current
No Record, num: % of total records that are not current
Ergo each Year value has two corresponding percent values. I am now using 2 lines instead of an area plot since the Y axis has distinct values instead of a count function, but I would still like the area under the curves filled. I tried using Melt to convert the data from wide to long, but was still unable to fill both lines. Filling is just for aesthetic purposes as I would like to use a gradient for each with 1 fill being slightly lighter than the other.
Here is my current code:
ggplot(Alum, aes(Year)) +
geom_line(aes(y = Percent.Records, colour = "Percent.Records")) +
geom_line(aes(y = Percent.No.Records, colour = "Percent.No.Records")) +
scale_y_continuous(labels = percent) + ylab('Percent of Total Records') +
ggtitle("Active, Living Alumni Employment Record") +
scale_x_continuous(breaks=seq(1960, 2014, by=5))
I cannot post an image yet.
I think you're missing a step where you summarize the data to get the quantities to plot on the y-axis. Here's an example with some toy data similar to how you describe yours:
# Make toy data with three levels of employment type
set.seed(1)
df <- data.frame(Entity.ID = rep(LETTERS[1:10], 3), Degree.Year = rep(seq(1990, 1992), each=10),
Degree.Type = sample(c("grad", "undergrad"), 30, replace=TRUE),
Employment.Data.Type = sample(as.character(1:3), 30, replace=TRUE))
# Here's the part you're missing, where you summarize for plotting
library(dplyr)
dfsum <- df %>%
group_by(Degree.Year, Employment.Data.Type) %>%
tally()
# Now plot that, using the counts as your y values
library(ggplot2)
ggplot(dfsum, aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
The result could use some fine-tuning, but I think it's what you mean. Here, the bars are equal height because each year in the toy data includes an equal number of IDs; if the count of IDs varied, so would the total bar height.
If you don't want to add objects to your workspace, just do the counting in the call to ggplot():
ggplot(tally(group_by(df, Degree.Year, Employment.Data.Type)),
aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
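Note that tally() counts rows. If the same ID can appear more than once within a year and employment type, and you want each ID counted once as the question asks, something like n_distinct() may be closer. A sketch with separate toy data in which ID "A" is duplicated (the column name n_ids is just a label chosen here, and .groups assumes dplyr 1.0 or later):

```r
library(dplyr)

# Toy data: ID "A" appears twice in 1990 with the same employment type
df <- data.frame(
  Entity.ID            = c("A", "A", "B", "C", "A", "B"),
  Degree.Year          = c(1990, 1990, 1990, 1990, 1991, 1991),
  Employment.Data.Type = c("1", "1", "1", "2", "1", "2")
)

# Count unique IDs, not rows, per year and employment type
dfsum <- df %>%
  group_by(Degree.Year, Employment.Data.Type) %>%
  summarise(n_ids = n_distinct(Entity.ID), .groups = "drop")

dfsum
```

The summarised data frame can then go into the same ggplot() call, with y = n_ids in place of y = n.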

ggplot2 time series plotting: how to omit periods when there is no data points?

I have a time series with multiple days of data. In between each day there's one period with no data points. How can I omit these periods when plotting the time series using ggplot2?
An artificial example shown as below, how can I get rid of the two periods where there's no data?
code:
Time = Sys.time()+(seq(1,100)*60+c(rep(1,100)*3600*24, rep(2, 100)*3600*24, rep(3, 100)*3600*24))
Value = rnorm(length(Time))
g <- ggplot()
g <- g + geom_line (aes(x=Time, y=Value))
g
First, create a grouping variable. Here, a new group starts whenever the time difference between consecutive points is larger than 1 minute (this relies on diff() reporting the differences in minutes):
Group <- c(0, cumsum(diff(Time) > 1))
Now three distinct panels could be created using facet_grid and the argument scales = "free_x":
library(ggplot2)
g <- ggplot(data.frame(Time, Value, Group)) +
geom_line (aes(x=Time, y=Value)) +
facet_grid(~ Group, scales = "free_x")
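Because diff() on date-times picks its units automatically, a slightly more defensive variant converts the gaps to seconds explicitly before comparing. This is only a sketch: the fixed start time and the 5 minute threshold are assumptions chosen for the example, not part of the original question.

```r
# Deterministic stand-in for the question's Time vector:
# three days, 100 one-minute steps per day
start <- as.POSIXct("2020-01-01 09:00:00", tz = "UTC")
Time <- start + rep(seq_len(100) * 60, times = 3) +
  rep(1:3, each = 100) * 24 * 3600

# Gap between consecutive points in seconds, independent of the
# units that diff() would choose on its own
gap_secs <- as.numeric(diff(Time), units = "secs")

# A new group starts wherever the gap exceeds the threshold
# (assumed here: 5 minutes)
Group <- c(0, cumsum(gap_secs > 5 * 60))

table(Group)
```

Group can then be used for facet_grid as above, or as a group aesthetic in geom_line.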
The question is how ggplot2 should know that you have missing values. I see two options:
1. Pad out your time series with NA values.
2. Add an additional variable representing a "group". For example,
dd = data.frame(Time, Value)
##type contains three distinct values
dd$type = factor(cumsum(c(0, as.numeric(diff(dd$Time) - 1))))
##Plot, but use the group aesthetic
ggplot(dd, aes(x=Time, y=Value)) +
geom_line (aes(group=type))
csgillespie mentioned padding by NA, but a simpler method is to overwrite the first value of each block with NA:
Value[seq(1,length(Value)-1,by=100)]=NA
where the -1 avoids a warning.
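If you would rather not overwrite real observations, one alternative is to append an extra NA row inside each gap. A rough sketch, again with a fixed start time and a 5 minute gap threshold as assumptions, and a made-up Value series:

```r
# Deterministic stand-in for the question's data
start <- as.POSIXct("2020-01-01 09:00:00", tz = "UTC")
Time <- start + rep(seq_len(100) * 60, times = 3) +
  rep(1:3, each = 100) * 24 * 3600
Value <- sin(seq_along(Time) / 10)
dd <- data.frame(Time, Value)

# Index of the last point before each gap (threshold assumed: 5 minutes)
gap_after <- which(as.numeric(diff(dd$Time), units = "secs") > 5 * 60)

# One padding row per gap, placed one second after the last real point
pad <- data.frame(Time = dd$Time[gap_after] + 1, Value = NA_real_)

# Combine and restore time order
padded <- rbind(dd, pad)
padded <- padded[order(padded$Time), ]
```

Plotting padded with geom_line(aes(x = Time, y = Value)) should then leave the line broken at each gap while keeping all 300 original values.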
