I'm having trouble plotting some data on two separate y-scales. Here are two visualizations of some air quality data I've been working with. The first figure shows each pollutant on a parts-per-billion (ppb) y-scale. In this figure, co dominates the y-axis, and the variation in the other pollutants is not properly represented. In air quality science, co is conventionally reported in parts per million (ppm) rather than ppb. The second figure shows the same no, no2, and o3 data, but with the co concentration converted from ppb to ppm (divided by 1000). However, while no, no2, and o3 look better, the variation in co is no longer fairly represented...
Is there an easy way in ggplot() to scale the y-axis so that each pollutant is well represented? I'm also working through some other examples that use gridExtra to stitch together two separate plots, each retaining its original y-scale.
The data required to generate these figures is huge (26,295 observations), so I'm still working on a reproducible example. Hopefully a solution can be found within the ggplot() code described below:
plt <- ggplot(df, aes(x=date, y = value, color = pollutant)) +
geom_point() +
facet_grid(id~pollutant, labeller = label_both, switch = "y")
plt
Here's what the head(df) looks like (before converting the co to ppm):
date id pollutant value
1 2017-06-16 10:00:00 Pohl co 236.00
2 2017-06-16 10:00:00 Pohl no 23.06
3 2017-06-16 10:00:00 Pohl no2 12.05
4 2017-06-16 10:00:00 Pohl o3 8.52
5 2017-06-16 11:00:00 Pohl co 207.00
6 2017-06-16 11:00:00 Pohl no 20.82
Marius pointed out that including scales = "free_y" in the facet_grid() function would provide the desired output. Thanks!
Solution:
plt <- ggplot(df, aes(x=date, y = value, color = pollutant)) +
geom_point() +
facet_grid(pollutant~id, scales = "free_y", labeller = label_both, switch = "y")
plt
Output:
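To make the effect of scales = "free_y" concrete, here is a minimal sketch on made-up data shaped like head(df) above. The values are synthetic (not the real measurements), and the ppb-to-ppm conversion for co is optional: with free y-scales it only changes the units shown on that panel.

```r
library(ggplot2)

set.seed(1)
n <- 48
dates <- seq(as.POSIXct("2017-06-16 10:00:00", tz = "UTC"),
             by = "hour", length.out = n)

# Synthetic stand-in for df: co is roughly ten times larger than the rest (ppb)
df <- rbind(
  data.frame(date = dates, id = "Pohl", pollutant = "co",  value = rnorm(n, 220, 20)),
  data.frame(date = dates, id = "Pohl", pollutant = "no",  value = rnorm(n, 22, 3)),
  data.frame(date = dates, id = "Pohl", pollutant = "no2", value = rnorm(n, 12, 2)),
  data.frame(date = dates, id = "Pohl", pollutant = "o3",  value = rnorm(n, 8, 1.5))
)

# Optional: report co in ppm, as is conventional
is_co <- df$pollutant == "co"
df$value[is_co] <- df$value[is_co] / 1000

# facet_grid() frees the y-scale per row, so pollutant has to define the rows
plt <- ggplot(df, aes(x = date, y = value, color = pollutant)) +
  geom_point() +
  facet_grid(pollutant ~ id, scales = "free_y", switch = "y")
plt
```

This is also why the solution swaps id~pollutant for pollutant~id: in facet_grid() all panels in a row share one y-axis, so pollutants laid out as columns would still be forced onto a common scale.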
Working with a large chemistry dataset of samples collected at different depths. The data is in long format as:
<Date> <Depth> <Temp>
2015-06-11 4 m 15
2015-07-11 4 m 16
2015-08-11 4 m 17
2015-06-11 3 m 19
2015-07-11 3 m 20
2015-08-11 3 m 21
2015-06-11 2 m 25
2015-07-11 2 m 26
2015-08-11 2 m 27
I'm trying to plot it with temperature on the x-axis and depth on the y-axis, colored by date. Currently, when I add geom_line to the plot, it just connects all the dots.
ggplot(df, aes(x = Temp, y = Depth, color = Date)) +
geom_point() +
geom_line()
Geoms usually decide which points to connect from the aesthetics you map, as you did with color=. ggplot2 applies the visible aesthetic (drawing the color) and also sets a somewhat hidden aesthetic, group=, from the same variable. This works fine unless the column you assign to color= is continuous (like a date) rather than a factor (which is discrete). If df$Date is stored as a "Date" class, it's continuous, so no grouping is created and you get the behavior you describe. The fix is to either explicitly define the group= aesthetic in addition to color=, or to convert df$Date to a factor.
The example below, using your dataset, should help explain. For illustration, I'm adding a column called df$Other, which is discrete (a factor, or a character column in R 4.0 and later).
df <- data.frame(
Date=rep(c('2015-06-11','2015-07-11','2015-08-11'),3),
Other=rep(c('Jun','July','Aug'),3),
Depth=c(4,4,4,3,3,3,2,2,2),
Temp=c(15,16,17,19,20,21,25,26,27)
)
df$Date <- as.Date(df$Date, format='%Y-%m-%d')
First, here's what the code you posted gives you:
ggplot(df, aes(x=Temp, y=Depth, color=Date)) + geom_point() + geom_line()
Look familiar? We know that df$Date is continuous, because ggplot2 draws a legend which is continuous by default, and also because we know it is formatted as a Date class. Consider what happens if you swap out df$Other in place of df$Date:
ggplot(df, aes(x=Temp, y=Depth, color=Other)) + geom_point() + geom_line()
Now the issue should be very clear, but how can you solve it? Well, like I mentioned there are two approaches. One is to maintain df$Date as a continuous variable, but clarify to ggplot2 that you want to use this as a grouping variable. In order to do so, ggplot2 will basically convert it to a factor for purposes of connecting the lines, but keep it continuous to make the color scale:
ggplot(df, aes(x=Temp, y=Depth, color=Date)) +
geom_point() + geom_line(aes(group=Date))
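One way to see the hidden grouping at work is to inspect the data ggplot2 computes for the line layer. The sketch below uses layer_data() for that; the object names (p_implicit, p_explicit) are made up for illustration, and the data is the question's table.

```r
library(ggplot2)

df <- data.frame(
  Date  = as.Date(rep(c("2015-06-11", "2015-07-11", "2015-08-11"), 3)),
  Depth = c(4, 4, 4, 3, 3, 3, 2, 2, 2),
  Temp  = c(15, 16, 17, 19, 20, 21, 25, 26, 27)
)

# Continuous colour only: no discrete variable to derive groups from
p_implicit <- ggplot(df, aes(x = Temp, y = Depth, color = Date)) +
  geom_line()

# Same plot, but with the grouping stated explicitly
p_explicit <- ggplot(df, aes(x = Temp, y = Depth, color = Date)) +
  geom_line(aes(group = Date))

# layer_data() returns the computed data for a layer, including `group`
n_groups_implicit <- length(unique(layer_data(p_implicit, 1)$group))
n_groups_explicit <- length(unique(layer_data(p_explicit, 1)$group))
c(implicit = n_groups_implicit, explicit = n_groups_explicit)
```

If the counts differ, with a single group for the first plot and one group per date for the second, that is the difference between one line through every point and one line per sampling date.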
One of the best options might be to set df$Date as a factor with ordered levels, since you're not actually using the "Date" class's continuous nature anyway. You can actually just use color=factor(Date) to fix it right in-line, but you'll notice that the levels are not going to be correct (in terms of the months in the correct order). In this case, I'd recommend changing the column prior to plotting using factor() and setting the levels there. Here's my solution:
# convert to character vector first
df$Date <- as.character(df$Date)
# it's already in the correct order, so just use the order of the df
df$Date <- factor(df$Date, levels=unique(df$Date))
ggplot(df, aes(x=Temp, y=Depth, color=Date)) + geom_point() + geom_line()
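The point about level order matters most when the labels are not ISO dates. As a small base-R sketch (the month labels here mirror df$Other and are only an illustration), factor() sorts its levels by default, which for text labels means alphabetically, while levels=unique(...) keeps the order of first appearance:

```r
months <- c("Jun", "Jul", "Aug", "Jun", "Jul", "Aug")

# Default: levels are sorted, so month names end up out of calendar order
default_levels <- levels(factor(months))

# Explicit: keep the order in which the labels first appear in the data
explicit_levels <- levels(factor(months, levels = unique(months)))

default_levels
explicit_levels
```

The legend and the color assignment follow the level order, so setting the levels before plotting controls both.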
I have tried to use ggplot2 to create a professional-looking graph, but I am having trouble with several things. I would like to add color to the data points, add dates on the x-axis, and add a line of best fit or something similar if possible. I have been searching Stack Exchange and Google in general to try to solve this problem, but to no avail. I am using the "Civilian Labor Force Participation Rate: 20 years and over, Black or African American Men" series from the Federal Reserve Bank of St. Louis (FRED).
I am using RStudio. I downloaded the data for LNS11300031 and used the read.csv() function to read it in. I initially used the plot() function to plot the data, but I want to use ggplot() to create a better-looking graph. When I create the graph, the data points look very faint, blurry, and cloudy, and there are no labels on the x-axis. I would like to add color and a line of best fit, but I do not know how to do that.
This is the code I used to create the graph with no x-axis labels:
ggplot(data = labor, mapping = aes(x = labor$DATE, y = labor$LNS11300031)) + geom_point(alpha = 0.1)
This is the graph that my code produced:
Here is some sample data (labor is the variable I used to store the data from the FRED site):
head(labor)
DATE LNS11300031
1 1972-01-01 77.6
2 1972-02-01 78.3
3 1972-03-01 78.7
4 1972-04-01 78.6
5 1972-05-01 78.7
6 1972-06-01 79.4
I would like to change the variable name LNS11300031 to Labor Force Participation Rate
Additional information about the data:
str(labor)
'data.frame': 566 obs. of 2 variables:
$ DATE : Factor w/ 566 levels "1972-01-01","1972-02-01",..: 1 2 3 4 5 6 7 8 9 10 ...
$ LNS11300031: num 77.6 78.3 78.7 78.6 78.7 79.4 78.8 78.7 78.6 78.1 ...
I would like the code to create much clearer data points with color and a trend line, and be able to have an x-axis with the corresponding dates.
Here's a basic attempt to cover all 3 of your desired improvements:
Clearer points: don't set the alpha too low! A bit of alpha is good for overlapping points, but alpha = 0.1 makes them too blurry.
Colour: R understands simple colour names like "red", but also hex colour codes. Pick any colours you want.
Trend line: easy to add with stat_smooth(). I've used method='lm' which gives a straight linear regression line but there are more flexible alternatives.
Date labels on the x-axis: Make sure your DATE column is correctly set as a Date type, and use scale_x_date() to tweak the labels.
library(dplyr)   # %>% and mutate()
library(tibble)  # rownames_to_column()
library(ggplot2)
# Your data is available from FRED via the quantmod package
quantmod::getSymbols("LNS11300031", src="FRED")
labor = LNS11300031 %>%
  as.data.frame() %>%
  rownames_to_column(var = "DATE") %>%
  # Make sure DATE is a Date column
  mutate(DATE = as.Date(DATE))
# Generally, you don't use data$column syntax within ggplot,
# just give the column name
ggplot(data = labor, mapping = aes(x = DATE, y = LNS11300031)) +
geom_point(alpha = 0.7, colour = "#B07AA1") +
stat_smooth(method = "lm", colour = "#E15759", se = FALSE) +
scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
theme_minimal()
Output:
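If you would rather keep the data frame you already read with read.csv(), a rough sketch of the same idea is below. It rebuilds only the six rows shown in head(labor), with DATE as a factor as in str(labor), and uses labs() to show "Labor Force Participation Rate" on the axis instead of renaming the column; that choice is an assumption about what the question is after.

```r
library(ggplot2)

# First rows of the question's data, with DATE stored as a factor
labor <- data.frame(
  DATE = factor(c("1972-01-01", "1972-02-01", "1972-03-01",
                  "1972-04-01", "1972-05-01", "1972-06-01")),
  LNS11300031 = c(77.6, 78.3, 78.7, 78.6, 78.7, 79.4)
)

# Turn the factor into real dates; as.character() makes the intent explicit
labor$DATE <- as.Date(as.character(labor$DATE))

p <- ggplot(labor, aes(x = DATE, y = LNS11300031)) +
  geom_point(colour = "#B07AA1") +
  scale_x_date(date_labels = "%b %Y") +
  # Axis titles can differ from the column names
  labs(x = "Date", y = "Labor Force Participation Rate")
p
```

Without the conversion, ggplot2 treats each of the 566 factor levels as its own category on a discrete x-axis, which is one likely reason the original plot had no readable date labels.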
New to R, new to stackoverflow, so forgive me....
I'm trying to do a timeseries plot in R using ggplot2. I want to show two line graphs which are filled below their value for a given date. I've been trying to do this with the geom_area(position="identity") function.
However, only one color shows up on my graph (though both show in the legend). I started by melting my data using melt() and am now working with three columns (X=time, variable=groundwater well, value=groundwater elevation). Below is a simplified version of my code, and a screenshot of what I get.
Bank01MWtest<-data.frame(X=(c(1,2,2,1)),variable=(c("MW-01A","MW-01A","MW-01B","MW-01B")),value=(c(576,571,584,580)))
ggplot(data=Bank01MWtest, aes(x=X, y=value,group=variable))+geom_area(position="identity", aes(fill=variable))+geom_line(aes(color=variable))+coord_cartesian(ylim=c(570,590))
I want to show two colors: one below the MW-01A line and one below the MW-01B line.
Any help?
Try this with geom_area, with some synthetically generated Bank01MWtest dataset:
head(Bank01MWtest)
Time variable value
1 2016-07-01 MW-01A 582.5482
2 2016-07-02 MW-01A 580.5652
3 2016-07-03 MW-01A 582.3305
4 2016-07-04 MW-01A 583.3122
5 2016-07-05 MW-01A 576.3432
6 2016-07-06 MW-01A 584.4086
tail(Bank01MWtest)
Time variable value
195 2016-10-03 MW-01B 573.8355
196 2016-10-04 MW-01B 575.3218
197 2016-10-05 MW-01B 570.8007
198 2016-10-06 MW-01B 572.3415
199 2016-10-07 MW-01B 575.3291
200 2016-10-08 MW-01B 578.0055
ggplot(data=Bank01MWtest, aes(x=Time, y=value,group=variable))+
geom_area(position='identity', aes(fill=variable), alpha=0.2)+
scale_x_date(date_breaks= "1 month", date_minor_breaks = "15 days", date_labels = "%b",
limits = c(min(Bank01MWtest$Time),max(Bank01MWtest$Time))) +
geom_line(aes(color=variable))+coord_cartesian(ylim=c(570,590))
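The answer does not show how its synthetic Bank01MWtest was generated, so here is one possible way to build a dataset with the same shape (100 daily readings per well, starting 2016-07-01). The value ranges and the seed are assumptions chosen only so that the two wells overlap inside the 570 to 590 window.

```r
library(ggplot2)

set.seed(42)
days <- seq(as.Date("2016-07-01"), by = "day", length.out = 100)

# Long format: one row per well per day (values are made up)
Bank01MWtest <- data.frame(
  Time     = rep(days, times = 2),
  variable = rep(c("MW-01A", "MW-01B"), each = length(days)),
  value    = c(runif(100, 575, 585), runif(100, 570, 580))
)

# alpha < 1 keeps the lower area visible where the two fills overlap
p <- ggplot(Bank01MWtest, aes(x = Time, y = value, group = variable)) +
  geom_area(position = "identity", aes(fill = variable), alpha = 0.2) +
  geom_line(aes(color = variable)) +
  coord_cartesian(ylim = c(570, 590))
p
```

With position = "identity" both areas start at zero, so an opaque fill for the higher well hides the lower one completely; that is the likely reason only one color appeared in the original plot.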
geom_area is a special case of geom_ribbon in ggplot2 (with ymin fixed at 0), so I'll use the more general geom_ribbon in my solution. You'll also need to restructure your data from long to wide for this solution, giving each of the legend categories its own column. I'll do this with the dcast function from the reshape2 package.
The idea here is to add layers with different ymax variables, assign legend labels with the fill option, and then add a legend with colors using the scale_fill_manual function.
library(ggplot2)
library(reshape2)
Bank01MWtest<-data.frame(X=c(1,2,2,1),
variable=c("MW01A","MW01A","MW01B","MW01B"),
value=c(576,571,584,580))
### Note above I modified your category labels by getting rid of the "-" sign
### so that they can be used as variable names below.
dat = dcast(Bank01MWtest, X~variable, value.var="value")
ggplot(data=dat, aes(x=X)) +
geom_ribbon(aes(ymin=0, ymax=MW01A, fill="MW01A")) +
geom_ribbon(aes(ymin=0, ymax=MW01B, fill="MW01B")) +
scale_fill_manual("", values=c("green", "blue")) +
coord_cartesian(ylim=c(570,590))
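As a possible alternative that avoids reshaping, geom_ribbon() can also be used on the long data directly. The sketch below keeps the original "MW-01A"/"MW-01B" labels and orders the factor levels so that the well with the higher readings comes first; my understanding is that groups within a layer are drawn in level order, so this should put the lower ribbon on top, but treat that as an assumption to check against your own plot. The baseline of 570 is simply the bottom of the zoom window.

```r
library(ggplot2)

Bank01MWtest <- data.frame(
  X        = c(1, 2, 2, 1),
  variable = c("MW-01A", "MW-01A", "MW-01B", "MW-01B"),
  value    = c(576, 571, 584, 580)
)

# Order wells from highest to lowest mean elevation
means <- tapply(Bank01MWtest$value, Bank01MWtest$variable, mean)
well_order <- names(means)[order(means, decreasing = TRUE)]
Bank01MWtest$variable <- factor(Bank01MWtest$variable, levels = well_order)

# One ribbon per well, from the bottom of the window up to the measured value
p <- ggplot(Bank01MWtest, aes(x = X, fill = variable)) +
  geom_ribbon(aes(ymin = 570, ymax = value)) +
  geom_line(aes(y = value, color = variable)) +
  coord_cartesian(ylim = c(570, 590))
p
```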
head(bktst.plotdata)
date method product type actuals forecast residual Percent_error month
1 2012-12-31 bauwd CUSTM NET 194727.51 -8192.00 -202919.51 -104.21 Dec12
2 2013-01-31 bauwd CUSTM NET 470416.27 1272.01 -469144.26 -99.73 Jan13
3 2013-02-28 bauwd CUSTM NET 190943.57 -1892.45 -192836.02 -100.99 Feb13
4 2013-03-31 bauwd CUSTM NET -42908.91 2560.05 45468.96 -105.97 Mar13
5 2013-04-30 bauwd CUSTM NET -102401.68 358807.48 461209.16 -450.39 Apr13
6 2013-05-31 bauwd CUSTM NET -134869.73 337325.33 472195.06 -350.11 May13
I have been trying to plot my backtest results using ggplot2. A sample of the dataset is shown above. The dates range from Dec 2012 to Jul 2013, with 3 levels in 'method', 5 levels in 'product', and 2 levels in 'type'.
I tried the code below. The trouble is that R is not labeling the x-axis correctly: I get 'Jan, Feb, Mar, Apr, May, Jun, Jul, Aug', whereas I expect the axis to run from Dec to Jul.
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= date, y=Percent_error, colour=method))
facet4 <- facet_grid(product~type,scales="free_y")
title3 <- ggtitle("Percent Error - Month-over-Month")
xaxis2 <- xlab("Date")
yaxis3 <- ylab("Error (%)")
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
# Tried changing the code to this still not getting the X-axis right
month.plot1 <- ggplot(data=bktst.plotdata, aes(x= format(date,'%b%y'), y=Percent_error, colour=method))
month.plot1+geom_line(stat="identity", size=1, position="identity")+facet4+title3+xaxis2+yaxis3
Well, it looks like you are plotting the last day of each month, so it actually makes sense to me that December 31 is plotted very very close to January. If you look at the plotted points (with geom_point) you can see that each point is just to the left of the closest month axis.
It sounds like you want to plot years and months instead of actual dates. There are a variety of ways you might do this, but one thing you could do is change the day part of the date to the first of the month instead of the last. Here I show how to do this using some functions from the lubridate package along with paste (I have assumed your variable date is already a Date object).
require(lubridate)
bktst.plotdata$date2 = as.Date(with(bktst.plotdata,
paste(year(date), month(date), "01", sep = "-")))
Then the plot axes start at December. You can change the format of the x axis if you load the scales package.
require(scales)
ggplot(data=bktst.plotdata, aes(x = date2, y=Percent_error, colour=method)) +
facet_grid(product~type,scales="free_y") +
ggtitle("Percent Error - Month-over-Month") +
xlab("Date") + ylab("Error (%)") +
geom_line() +
scale_x_date(labels=date_format(format = "%m-%Y"))
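For the month-start step, lubridate also provides floor_date(), which might be a slightly shorter route than pasting the pieces back together. A minimal sketch on three of the month-end dates from the question (the vector names are made up for illustration):

```r
library(lubridate)

month_ends <- as.Date(c("2012-12-31", "2013-01-31", "2013-02-28"))

# Round each date down to the first day of its month
month_starts <- floor_date(month_ends, unit = "month")
month_starts
```

Applied to the question's data, this would read bktst.plotdata$date2 = floor_date(bktst.plotdata$date, unit = "month"), assuming date is already a Date column.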
I have data in R like the following:
bag_id location_type event_ts
2 155 sorter 2012-01-02 17:06:05
3 305 arrival 2012-01-01 07:20:16
1 155 transfer 2012-01-02 15:57:54
4 692 arrival 2012-03-29 09:47:52
10 748 transfer 2012-01-08 17:26:02
11 748 sorter 2012-01-08 17:30:02
12 993 arrival 2012-01-23 08:58:54
13 1019 arrival 2012-01-09 07:17:02
14 1019 sorter 2012-01-09 07:33:15
15 1154 transfer 2012-01-12 21:07:50
where class(event_ts) is POSIXct.
I wanted to find the density of bags at each location at different times.
I used geom_density() from ggplot2 and could plot it very nicely. I wonder if there is any difference between density() from base R and this function: for example, in the method used or in the default bandwidth.
I need to add the densities to my data frame. If I had used density(), I know I could use approxfun() to add these values to my data frame, but I wonder if it works the same way when I use geom_density().
A quick perusal of the ggplot2 documentation for geom_density() reveals that it wraps the functionality in stat_density(). The documentation for its adjust argument refers to the base function density(). So, to answer your direct question: they are built on the same function, though the exact parameters used may differ. You have some control over those parameters, but you may not get as much flexibility as you want.
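As a rough illustration of that control, stat_density() accepts bandwidth-related arguments such as adjust and kernel, and geom_density() passes them through. The particular values below are arbitrary, and layer_data() is used only to peek at the curve ggplot2 computed:

```r
library(ggplot2)

set.seed(1)
x <- data.frame(x = rnorm(100))

# Default settings
p_default <- ggplot(x, aes(x)) + geom_density()

# Half the default bandwidth and a different kernel
p_custom <- ggplot(x, aes(x)) +
  geom_density(adjust = 0.5, kernel = "epanechnikov")

# The computed curve is available as a data frame with a `density` column
curve_custom <- layer_data(p_custom, 1)
head(curve_custom[, c("x", "density")])
```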
One alternative to using geom_density() is to calculate the density that you want outside of ggplot() and then plot it with geom_line(). For example:
library(ggplot2)
#100 random variables
x <- data.frame(x = rnorm(100))
#Calculate own density, set parameters as you desire
d <- density(x$x)
x2 <- data.frame(x = d$x, y = d$y)
#Using geom_density()
ggplot(x, aes(x)) + geom_density()
#Using home grown density
ggplot(x2, aes(x,y)) + geom_line(colour = "red")
Here, they give nearly identical plots, though they may vary more significantly with your data and your settings.
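To get the density values back into the data frame, as the question asks, the approxfun() route works on the output of density() directly. Below is a sketch on made-up timestamps: the bags data frame and its event_ts column imitate the question's data, and converting POSIXct to numeric seconds before calling density() is my assumption about how to handle the time class.

```r
set.seed(1)

# Made-up event times spread over 30 days
bags <- data.frame(
  event_ts = as.POSIXct("2012-01-01 00:00:00", tz = "UTC") +
    runif(200, 0, 30 * 24 * 3600)
)

# density() works on plain numbers, so use seconds since the epoch
secs <- as.numeric(bags$event_ts)
d <- density(secs)

# Interpolate the estimated curve and evaluate it at each observation
dens_at <- approxfun(d$x, d$y)
bags$density <- dens_at(secs)
head(bags)
```

The values are densities per second, so they are very small numbers. To do this per location_type, the same steps could be run on each subset of the data.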