Tips to make plot with 5 datasets clear - r

I'm really new to R and I'm trying to plot NOx air pollution data from 5 different locations (I have monthly averages for every location from 01-1996 to 12-2019). Each line in the plot should represent a different location.
I've created a ggplot, but I find it really unclear. I would like to ask for tips on making the plot easier to read (it will be no bigger than A4, because it will be included in my work and printed). I would also like to have more years on the x axis (1996, 1997, 1998, ...).
ALIBA <- read_csv("ALIBA_Praha/NOx/all_sorted.csv")
BMISA <- read_csv("BMISA_Mikulov/NOx/all_sorted.csv")
CCBDA <- read_csv("CCBDA_CB/NOx/all_sorted.csv")
TKARA <- read_csv("TKARA_Karvina/NOx/all_sorted.csv")
UULKA <- read_csv("UULKA_UnL/NOx/all_sorted.csv")
ggplot() +
geom_line(data = ALIBA, aes(x = START_TIME, y = VALUE), color = "blue") +
geom_line(data = BMISA, aes(x = START_TIME, y = VALUE), color = "red") +
geom_line(data = CCBDA, aes(x = START_TIME, y = VALUE), color = "yellow") +
geom_line(data = TKARA, aes(x = START_TIME, y = VALUE), color = "green") +
geom_line(data = UULKA, aes(x = START_TIME, y = VALUE), color = "pink")
All CSV files are in this format:
START_TIME,VALUE
1996-01-01T00:00:00Z,61.3049451304964
1996-02-01T00:00:00Z,47.7234010245664
1996-03-01T00:00:00Z,33.083512309072
1996-04-01T00:00:00Z,47.771166691758
1996-05-01T00:00:00Z,24.7022422574005
1996-06-01T00:00:00Z,25.4495954480684
1996-07-01T00:00:00Z,23.301224242488
...
Thanks

First, I would paste all data sets together:
library(readr)
ALIBA <- read_csv("ALIBA_Praha/NOx/all_sorted.csv")
ALIBA$Location <- "ALIBA"
BMISA <- read_csv("BMISA_Mikulov/NOx/all_sorted.csv")
BMISA$Location <- "BMISA"
CCBDA <- read_csv("CCBDA_CB/NOx/all_sorted.csv")
CCBDA$Location <- "CCBDA"
TKARA <- read_csv("TKARA_Karvina/NOx/all_sorted.csv")
TKARA$Location <- "TKARA"
UULKA <- read_csv("UULKA_UnL/NOx/all_sorted.csv")
UULKA$Location <- "UULKA"
# Stack the five tables into one long data frame
df <- rbind(ALIBA, BMISA, CCBDA, TKARA, UULKA)
library(ggplot2)
ggplot(data = df, aes(x = START_TIME, y = VALUE, color = Location)) +
geom_line(size = 1) + # play with the stroke thickness (linewidth = 1 in ggplot2 >= 3.4.0)
scale_color_brewer(palette = "Set1") # here you can choose from a wide variety of palettes, just google
How would you like to add more years? In the same graph (everything will be tiny) or in separate "windows" (= facets, better)?
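Both options can be sketched as follows. This is only an illustration: the data frame is synthetic (random values standing in for the five CSV files), and the two-year break spacing is an assumption chosen so the labels stay legible on an A4 page. It also assumes START_TIME was parsed as a date-time, which read_csv is expected to do for ISO 8601 timestamps like the ones shown.

```r
library(ggplot2)

# Synthetic stand-in for the combined data: 5 locations x 288 months
set.seed(1)
months <- seq(as.POSIXct("1996-01-01", tz = "UTC"),
              as.POSIXct("2019-12-01", tz = "UTC"), by = "month")
locations <- c("ALIBA", "BMISA", "CCBDA", "TKARA", "UULKA")
df <- data.frame(
  START_TIME = rep(months, times = length(locations)),
  VALUE = runif(length(months) * length(locations), 10, 80),
  Location = rep(locations, each = length(months))
)

# Option 1: one panel, a labelled tick every second year
p_single <- ggplot(df, aes(x = START_TIME, y = VALUE, color = Location)) +
  geom_line() +
  scale_x_datetime(date_breaks = "2 years", date_labels = "%Y") +
  scale_color_brewer(palette = "Set1")

# Option 2: one facet per location, stacked vertically
p_facets <- ggplot(df, aes(x = START_TIME, y = VALUE)) +
  geom_line() +
  scale_x_datetime(date_breaks = "2 years", date_labels = "%Y") +
  facet_wrap(~ Location, ncol = 1)
```

Printing p_single or p_facets draws the plot. Changing date_breaks to "1 year" gives a tick for every year, at the cost of more crowded labels.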

Related

dual y-axis (bar and line) in ggplot in r

The data I have contain four columns: x, y_cnt, y1_rate, y2_rate.
set.seed(123)
x <- seq(1,10)
y_cnt <- rnorm(10, 200, 50)
y1_rate <- runif(10,0,1)
y2_rate <- runif(10,0,1)
df <- data.frame(x, y_cnt, y1_rate, y2_rate)
I need to produce a plot such that x is on the x-axis, both y1_rate and y2_rate are on the main y-axis, and y_cnt on the secondary y-axis.
Here is how it looks in Excel:
Update:
This is what I have so far. It seems that the figure below shows y1_rate only.
transf_fact <- max(df$y_cnt)/max(df$y1_rate)
# Plot
ggplot(data = df,
mapping = aes(x = as.factor(x),
y = y_cnt)) +
geom_col(fill = 'red') +
geom_line(aes(y = transf_fact * y1_rate), group = 1) +
geom_line(aes(y = transf_fact * y2_rate)) +
scale_y_continuous(sec.axis = sec_axis(trans = ~ . / transf_fact,
name = "Rate"))+
labs(x = "X")
Here's an approach that adjusts the scaling of the rate variables, then gathers all the series into long form, and then shows the variables with their respective geoms.
transf_fact <- max(df$y_cnt)/max(df$y1_rate)
library(tidyverse) # Using v1.2.1
df %>%
# Scale any variables with "rate" in their name
mutate_at(vars(matches("rate")), ~.*transf_fact) %>%
# Gather into long form;
# one column specifying variable, one column specifying value
gather(y_var, val, -x) %>%
# Pipe into ggplot; all layers share x, y, and fill/color columns
ggplot(aes(x = as.factor(x), y = val, fill = y_var)) +
# bar chart only uses "y_cnt" data
geom_col(data = . %>% filter(y_var == "y_cnt")) +
# lines only use non-"y_cnt" data
geom_line(data = . %>% filter(y_var != "y_cnt"),
aes(color = y_var, group = y_var),
size = 1.2) +
# Use the "fill" aesthetic to control both colour and fill;
# geom_col typically uses fill, geom_line uses colour
scale_fill_discrete(aesthetics = c("colour", "fill")) +
scale_y_continuous(sec.axis = sec_axis(trans = ~ . / transf_fact,
name = "Rate")) +
labs(x = "X")
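The scaling trick is easier to trust once the arithmetic is spelled out. The sketch below uses only base R and the question's data: the rates are multiplied by transf_fact so they share the count axis, and the secondary axis divides by the same factor to recover the original values. The names y1_scaled and y1_recovered are introduced here for illustration.

```r
set.seed(123)
x <- seq(1, 10)
y_cnt <- rnorm(10, 200, 50)
y1_rate <- runif(10, 0, 1)
y2_rate <- runif(10, 0, 1)
df <- data.frame(x, y_cnt, y1_rate, y2_rate)

# Factor that stretches the largest y1_rate up to the largest count
transf_fact <- max(df$y_cnt) / max(df$y1_rate)

# Forward transform: the values geom_line() draws on the primary axis
y1_scaled <- df$y1_rate * transf_fact

# Inverse transform: what sec_axis(~ . / transf_fact) applies to the labels
y1_recovered <- y1_scaled / transf_fact
```

Because the factor is derived from y1_rate alone, any y2_rate value larger than max(y1_rate) would be drawn above the tallest bar. Taking the maximum over both rate columns is one way to avoid that.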

plot multiple lines in ggplot

I need to plot hourly data for different days using ggplot, and here is my dataset:
The data consists of hourly observations, and I want to plot each day's observation into one separate line.
Here is my code
xbj1 = bj[c(1:24),c(1,6)]
xbj2 = bj[c(24:47),c(1,6)]
xbj3 = bj[c(48:71),c(1,6)]
ggplot()+
geom_line(data = xbj1,aes(x = Date, y= Value), colour="blue") +
geom_line(data = xbj2,aes(x = Date, y= Value), colour = "grey") +
geom_line(data = xbj3,aes(x = Date, y= Value), colour = "green") +
xlab('Hour') +
ylab('PM2.5')
Please advise on this.
I'll make some fake data (I won't try to transcribe yours) first:
set.seed(2)
x <- data.frame(
Date = rep(Sys.Date() + 0:1, each = 24),
# Year, Month, Day ... are not used here
Hour = rep(0:23, times = 2),
Value = sample(1e2, size = 48, replace = TRUE)
)
This is a straightforward ggplot2 plot:
library(ggplot2)
ggplot(x) +
geom_line(aes(Hour, Value, color = as.factor(Date))) +
scale_color_discrete(name = "Date")
ggplot(x) +
geom_line(aes(Hour, Value)) +
facet_grid(Date ~ .)
I highly recommend you find good tutorials for ggplot2, such as http://www.cookbook-r.com/Graphs/. Others exist, many quite good.
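If the real data has a single timestamp column rather than separate Date and Hour columns (an assumption, since the original dataset is not shown), both can be derived in base R before plotting. The column name Timestamp and the values below are hypothetical.

```r
# Hypothetical input: 48 hourly timestamps spanning two days
bj <- data.frame(
  Timestamp = seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
                  by = "hour", length.out = 48),
  Value = seq_len(48)
)

# Calendar day becomes the grouping variable, hour of day the x position
bj$Date <- as.Date(bj$Timestamp, tz = "UTC")
bj$Hour <- as.integer(format(bj$Timestamp, "%H", tz = "UTC"))
```

Grouping by a derived Date column also avoids slicing by row index, where ranges such as 1:24 and 24:47 share row 24.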

R ggplot2 - Add a ribbon for only part of the x axis

Say I have two datasets. One that contains two months of data:
units_sold <- data.frame(date = seq(as.Date("2017-05-01"), as.Date("2017-07-01"), 1),
units = rep(20,62),
category = "units_sold")
And one that contains just a week:
forecast <- data.frame(date = seq(as.Date("2017-06-12"), as.Date("2017-06-18"), 1),
units = 5,
category = "forecast")
I can put them on the same plot. I.e.,
joined <- rbind(units_sold, forecast)
ggplot(data = joined, aes(x=date, y=units, colour = category)) + geom_line()
However, I can't seem to figure out how to put a ribbon between the two lines.
This is what I'm trying:
library(dplyr)
ribbon_dat <- left_join(forecast, units_sold, by = "date") %>%
rename(forecast = units.x) %>%
rename(units_sold = units.y) %>%
select(-c(category.x, category.y))
ggplot(data = joined, aes(x=date, y=units, colour = category)) +
geom_line() +
geom_ribbon(aes(x=ribbon_dat$date, ymin=ribbon_dat$forecast, ymax=ribbon_dat$units_sold))
I get this error: Error: Aesthetics must be either length 1 or the same as the data (69): x, ymin, ymax, y, colour
You are very close; you just need to pass the second dataset to the data argument in geom_ribbon().
ggplot(data = joined, aes(x = date)) +
geom_line(aes(y = units, colour = category)) +
geom_ribbon(
data = ribbon_dat,
mapping = aes(ymin = forecast, ymax = units_sold)
)
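For completeness, here is a self-contained sketch of the same idea, using base merge() in place of left_join() and adding a translucent fill. The column suffixes and the fill and alpha values are illustrative choices, not part of the original answer.

```r
library(ggplot2)

units_sold <- data.frame(
  date = seq(as.Date("2017-05-01"), as.Date("2017-07-01"), by = 1),
  units = 20, category = "units_sold"
)
forecast <- data.frame(
  date = seq(as.Date("2017-06-12"), as.Date("2017-06-18"), by = 1),
  units = 5, category = "forecast"
)
joined <- rbind(units_sold, forecast)

# Inner join on date keeps only the 7 days present in both tables
ribbon_dat <- merge(forecast, units_sold, by = "date",
                    suffixes = c(".forecast", ".sold"))

# The ribbon layer gets its own data, so its aesthetics match its own rows
p <- ggplot(joined, aes(x = date)) +
  geom_line(aes(y = units, colour = category)) +
  geom_ribbon(data = ribbon_dat,
              aes(ymin = units.forecast, ymax = units.sold),
              fill = "grey50", alpha = 0.3)
```

Keeping colour out of the top-level aes() matters here: ribbon_dat has no category column, so a layer that inherited colour = category would fail.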

ggplot with variable line types and colors

In R with ggplot, I want to create a spaghetti plot (2 quantitative variables) grouped by a third variable to specify line color. Secondly, I want to aggregate that grouping variable with the line type or width.
Here's an example using the airquality dataset. I want the line's color to represent the month, and the summer months to have a different line width from non-summer months.
First, I created an indicator variable for the aggregated groups:
airquality$Summer <- with(airquality, ifelse(Month >= 6 & Month < 9, 1, 0))
I would like something like this, but with differing line widths:
However, this fails:
library(ggplot2)
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month), group = Summer)) +
geom_point() +
geom_line(linetype = as.factor(Summer))
This also fails (specifying airquality$Summer):
ggplot(data = airquality, aes(x=Wind, y = Temp,
color = as.factor(Month), group = airquality$Summer)) +
geom_point() +
geom_line(linetype = as.factor(airquality$Summer))
I attempted this solution, but get another error:
lty <- setNames(c(0, 1), levels(airquality$Summer))
ggplot(data = airquality, aes(x=Wind, y = Temp,
color = as.factor(Month), group = airquality$Summer)) +
geom_point() +
geom_line(linetype = as.factor(airquality$Summer)) +
scale_linetype_manual(values = lty)
Any ideas?
EDIT:
My actual data show very clear trends, and I want to differentiate the top line from all the others below. My goal is to convince people they should make more than just the minimum payment on their student loans:
You just need to change the group to Month and put linetype inside aes():
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month), group = Month)) +
geom_point() +
geom_line(aes(linetype = factor(Summer)))
If you want to specify the linetype you can use a few methods. Here is one way:
lineT <- c("solid", "dotdash")
names(lineT) <- c("1","0")
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month))) +
geom_point() +
geom_line(aes(linetype = factor(Summer))) +
scale_linetype_manual(values = lineT)
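Since the question asked for differing line widths rather than line types, the same pattern can be applied to the width aesthetic. This sketch assumes ggplot2 3.4.0 or later, where line width is mapped with linewidth; the widths 0.4 and 1.2 are arbitrary, and the copy aq is used only to leave airquality untouched.

```r
library(ggplot2)

aq <- airquality
# 1 for June through August, 0 otherwise
aq$Summer <- with(aq, ifelse(Month >= 6 & Month < 9, 1, 0))

p <- ggplot(aq, aes(x = Wind, y = Temp,
                    color = factor(Month), group = Month)) +
  geom_point() +
  geom_line(aes(linewidth = factor(Summer))) +
  # Named vector: names are the factor levels, values are the widths
  scale_linewidth_manual(values = c("0" = 0.4, "1" = 1.2), name = "Summer")
```

Mapping a variable inside aes() and then setting the actual values in a manual scale is the same division of labour as in the linetype version above.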

How to enforce stack ordering in ggplot geom_area

Is it possible to enforce the stack order when using geom_area()? I cannot figure out why geom_area(position = "stack") produces this strange fluctuation in stack order around 1605.
There are no missing values in the data frame.
library(ggplot2)
counts <- read.csv("https://gist.githubusercontent.com/mdlincoln/d5e1bf64a897ecb84fd6/raw/34c6d484e699e0c4676bb7b765b1b5d4022054af/counts.csv")
ggplot(counts, aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
You need to order your data. In your data, the first value found for each year is 'Flemish' until 1605, and from 1606 the first value is 'Dutch'. So, if we do this:
ggplot(counts[order(counts$nationality),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
It results in
Further illustration if we use random ordering:
set.seed(123)
ggplot(counts[sample(nrow(counts)),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
As randy said, ggplot2 2.2.0 does automatic ordering. If you want to change the order, just reorder the factors used for fill. If you want to switch which group is on top in the legend but not the plot, you can use scale_fill_manual() with the limits option.
(Code to generate ggplot colors from John Colby)
gg_color_hue <- function(n) {
hues = seq(15, 375, length = n + 1)
hcl(h = hues, l = 65, c = 100)[1:n]
}
cols <- gg_color_hue(2)
Default ordering in legend
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Dutch","Flemish"))
Reversed ordering in legend
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Flemish","Dutch"))
Reversed ordering in plot and legend
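# Wrap in factor() first: read.csv() returns character columns in R >= 4.0, so levels() alone would be NULL
counts$nationality <- factor(counts$nationality, levels = rev(levels(factor(counts$nationality))))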
ggplot(counts,
aes(x = year, y = artists_strict, fill = factor(nationality))) +
geom_area()+
scale_fill_manual(values=c("Dutch" = cols[1],"Flemish"=cols[2]),
limits=c("Flemish","Dutch"))
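When the desired order is known in advance, the levels can also be written out explicitly instead of reversed. A minimal sketch on synthetic data (the years and counts below are made up; only the column names follow the question):

```r
library(ggplot2)

# Synthetic stand-in for the counts data
counts <- data.frame(
  year = rep(1600:1610, times = 2),
  nationality = rep(c("Dutch", "Flemish"), each = 11),
  artists_strict = c(11:21, 21:11)
)

# Explicit level order; by default the legend lists levels in this order
counts$nationality <- factor(counts$nationality,
                             levels = c("Flemish", "Dutch"))

p <- ggplot(counts, aes(x = year, y = artists_strict, fill = nationality)) +
  geom_area()
```

Swapping the two names in levels swaps the order in both the stack and the legend, without touching the row order of the data.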
This should do it for you:
ggplot(counts[order(counts$nationality),],
aes(x = year, y = artists_strict, fill = factor(nationality))) + geom_area()
Hope this helps.
