Trying to plot data from two different datasets with two axes - r

So I am using chemistry and precipitation data in the following two df:
chem_df
rain_df
I plotted the two datasets using ggplot() and, to get two axes, used the sec.axis argument of scale_y_continuous() as follows:
chem_rain_fig <- ggplot() +
geom_point(data = chem_df, aes(x = Date, y = Temp)) +
geom_line(data = rain_df, aes(x = Date, y = Rain)) +
scale_y_continuous(name = "Temp", sec.axis = (.~*, name = "Rain"))
But it keeps plotting both of the data sets to the original y-axis as follows:
Graph with Issue
Note that the rain data range from 0 to 10 cm, which is why the line follows the first axis rather than a secondary axis with limits (0, 10).

This might answer your question
In essence, you have to transform the data manually and apply the inverse transformation to the secondary axis so that the series appears at the right size. I can't test this without sample data, but it should work: multiply the rain values by 2 and divide the secondary axis by 2:
chem_rain_fig <- ggplot() +
geom_point(data = chem_df, aes(x = Date, y = Temp)) +
geom_line(data = rain_df, aes(x = Date, y = Rain*2)) +
scale_y_continuous(name = "Temp", sec.axis = sec_axis(~./2, name = "Rain"))
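The factor of 2 above is a guess made without the data. As a minimal sketch, the factor could instead be computed from the two ranges. The values in `chem_df` and `rain_df` below are invented for illustration (only the column names come from the question), and `scale_factor` is a hypothetical helper name:

```r
library(ggplot2)

# Invented sample data; only the column names follow the question.
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 30)
chem_df <- data.frame(Date = dates, Temp = runif(30, 5, 25))
rain_df <- data.frame(Date = dates, Rain = runif(30, 0, 10))

# Factor that stretches the rain range onto the temperature range.
scale_factor <- max(chem_df$Temp) / max(rain_df$Rain)

chem_rain_fig <- ggplot() +
  geom_point(data = chem_df, aes(x = Date, y = Temp)) +
  # Draw rain in "temperature units" ...
  geom_line(data = rain_df, aes(x = Date, y = Rain * scale_factor)) +
  # ... and undo the scaling in the secondary axis labels.
  scale_y_continuous(
    name = "Temp",
    sec.axis = sec_axis(~ . / scale_factor, name = "Rain")
  )
```

The secondary axis is only a relabelling of the primary one, so the transformation passed to sec_axis() has to be the exact inverse of the one applied to the data.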

Tips to make plot with 5 datasets clear

I'm really new to R and I'm trying to plot NOx air pollution data from 5 different locations (I have monthly averages for every location from 01-1996 to 12-2019). Each line in the plot should represent a different location.
I've created a ggplot but I find it really unclear. I would like to ask for tips to make the plot easier to read (it will be no bigger than A4, because it will be included in my work and printed). I would also like to have more years on the x axis (1996, 1997, 1998, ...).
ALIBA <- read_csv("ALIBA_Praha/NOx/all_sorted.csv")
BMISA <- read_csv("BMISA_Mikulov/NOx/all_sorted.csv")
CCBDA <- read_csv("CCBDA_CB/NOx/all_sorted.csv")
TKARA <- read_csv("TKARA_Karvina/NOx/all_sorted.csv")
UULKA <- read_csv("UULKA_UnL/NOx/all_sorted.csv")
ggplot() +
geom_line(data = ALIBA, aes(x = START_TIME, y = VALUE), color = "blue") +
geom_line(data = BMISA, aes(x = START_TIME, y = VALUE), color = "red") +
geom_line(data = CCBDA, aes(x = START_TIME, y = VALUE), color = "yellow") +
geom_line(data = TKARA, aes(x = START_TIME, y = VALUE), color = "green") +
geom_line(data = UULKA, aes(x = START_TIME, y = VALUE), color = "pink")
all csv files are in format:
START_TIME,VALUE
1996-01-01T00:00:00Z,61.3049451304964
1996-02-01T00:00:00Z,47.7234010245664
1996-03-01T00:00:00Z,33.083512309072
1996-04-01T00:00:00Z,47.771166691758
1996-05-01T00:00:00Z,24.7022422574005
1996-06-01T00:00:00Z,25.4495954480684
1996-07-01T00:00:00Z,23.301224242488
...
Thanks
First, I would combine all the data sets into one, with a column that identifies the location:
library(readr)
library(ggplot2)
ALIBA <- read_csv("ALIBA_Praha/NOx/all_sorted.csv")
ALIBA$Location <- "ALIBA"
BMISA <- read_csv("BMISA_Mikulov/NOx/all_sorted.csv")
BMISA$Location <- "BMISA"
CCBDA <- read_csv("CCBDA_CB/NOx/all_sorted.csv")
CCBDA$Location <- "CCBDA"
TKARA <- read_csv("TKARA_Karvina/NOx/all_sorted.csv")
TKARA$Location <- "TKARA"
UULKA <- read_csv("UULKA_UnL/NOx/all_sorted.csv")
UULKA$Location <- "UULKA"
df <- rbind(ALIBA, BMISA, CCBDA, TKARA, UULKA)
ggplot(data = df, aes(x = START_TIME, y = VALUE, color = Location)) +
geom_line(size = 1) + # play with the stroke thickness (use linewidth = 1 in ggplot2 3.4.0 or later)
scale_color_brewer(palette = "Set1") # many other palettes are available
How would you like to add more years? In the same graph (everything will be tiny) or in separate "windows" (= facets, better)?
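If "more years" in the question means more year labels on the x axis, one option is a date-time scale with yearly breaks. The sketch below uses invented values for two of the locations and assumes START_TIME is a date-time (POSIXct) column, which is what read_csv() would typically produce for values such as 1996-01-01T00:00:00Z:

```r
library(ggplot2)

# Invented monthly series standing in for the real CSV files.
set.seed(1)
months <- seq(as.POSIXct("1996-01-01", tz = "UTC"),
              as.POSIXct("2019-12-01", tz = "UTC"), by = "month")
df <- data.frame(
  START_TIME = rep(months, times = 2),
  VALUE = runif(2 * length(months), 10, 60),
  Location = rep(c("ALIBA", "BMISA"), each = length(months))
)

nox_fig <- ggplot(df, aes(x = START_TIME, y = VALUE, color = Location)) +
  geom_line() +
  # One tick per year, labelled with the four-digit year.
  scale_x_datetime(date_breaks = "1 year", date_labels = "%Y") +
  # Rotate the labels so that 24 years fit across a page-width plot.
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

If START_TIME were a Date rather than a date-time column, scale_x_date() takes the same date_breaks and date_labels arguments.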

How to stop ggplot line plot adding fill

I am producing a ggplot which looks at a curve in a dataset. When I build the plot, ggplot is automatically adding fill to data which is on the negative side of the x axis. Script and plot shown below.
ggplot(df, aes(x = Var1, y = Var2)) +
geom_line() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = Var2[1])
Using base R, I am able to get the plot shown below which is how it should look.
plot(x = df$Var1, y = df$Var2, type = "l",
xlab = "Var1", ylab = "Var2")
abline(v = 0)
abline(h = df$Var2[1])
If anyone could help identify why I might be getting the automatic fill and how I could make it stop, I would be very appreciative. I would like to make this work in ggplot so I can later animate the line as it is a time series that can be used to compare between other datasets from the same source.
Can add data if necessary. Data set is 1561 obs long however. Thanks in advance.
I guess you should try
ggplot(df, aes(x = Var1, y = Var2)) +
geom_path() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = df$Var2[1])
instead. geom_line() connects the points in order of the x variable, whereas geom_path() connects them in the order in which they appear in the data.
Take a look at this example
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_line()
The two points with x-coordinate -pi/2 will be connected first, creating a vertical black segment. Next, x = -pi/2 + 0.001 will be processed, and so on: the x values are always processed in order, so the line jumps back and forth between the two curves and the region between them looks filled.
Therefore you should use geom_path() to get the desired result:
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_path()
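The ordering difference can also be checked on a tiny data set without looking at the picture. This is a minimal sketch with four invented points; it relies on layer_data(), which returns the data a layer hands to the drawing code:

```r
library(ggplot2)

# Four points whose row order differs from their x order.
d <- data.frame(x = c(0, 2, 1, 3), y = c(0, 1, 2, 3))

line_data <- layer_data(ggplot(d, aes(x, y)) + geom_line())
path_data <- layer_data(ggplot(d, aes(x, y)) + geom_path())

line_data$x  # x values in the order geom_line() draws them
path_data$x  # x values in the order geom_path() draws them
```

With geom_line() the x values should come back sorted, while geom_path() should keep the original row order.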

plot multiple lines in ggplot

I need to plot hourly data for different days using ggplot, and here is my dataset:
The data consists of hourly observations, and I want to plot each day's observation into one separate line.
Here is my code
xbj1 = bj[c(1:24),c(1,6)]
xbj2 = bj[c(24:47),c(1,6)]
xbj3 = bj[c(48:71),c(1,6)]
ggplot()+
geom_line(data = xbj1,aes(x = Date, y= Value), colour="blue") +
geom_line(data = xbj2,aes(x = Date, y= Value), colour = "grey") +
geom_line(data = xbj3,aes(x = Date, y= Value), colour = "green") +
xlab('Hour') +
ylab('PM2.5')
Please advise.
I'll make some fake data (I won't try to transcribe yours) first:
set.seed(2)
x <- data.frame(
Date = rep(Sys.Date() + 0:1, each = 24),
# Year, Month, Day ... are not used here
Hour = rep(0:23, times = 2),
Value = sample(1e2, size = 48, replace = TRUE)
)
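This is a straightforward ggplot2 plot, either with one colored line per day or with one facet per day:
library(ggplot2)
# one colored line per day
ggplot(x) +
geom_line(aes(Hour, Value, color = as.factor(Date))) +
scale_color_discrete(name = "Date")
# alternatively, one facet per day
ggplot(x) +
geom_line(aes(Hour, Value)) +
facet_grid(Date ~ .)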
I highly recommend you find good tutorials for ggplot2, such as http://www.cookbook-r.com/Graphs/. Others exist, many quite good.

ggplot with variable line types and colors

In R with ggplot, I want to create a spaghetti plot (2 quantitative variables) grouped by a third variable to specify line color. Secondly, I want to aggregate that grouping variable with the line type or width.
Here's an example using the airquality dataset. I want the line's color to represent the month, and the summer months to have a different line width from non-summer months.
First, I created an indicator variable for the aggregated groups:
airquality$Summer <- with(airquality, ifelse(Month >= 6 & Month < 9, 1, 0))
I would like something like this, but with differing line widths:
However, this fails:
library(ggplot2)
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month), group = Summer)) +
geom_point() +
geom_line(linetype = as.factor(Summer))
This also fails (specifying airquality$Summer):
ggplot(data = airquality, aes(x=Wind, y = Temp,
color = as.factor(Month), group = airquality$Summer)) +
geom_point() +
geom_line(linetype = as.factor(airquality$Summer))
I attempted this solution, but get another error:
lty <- setNames(c(0, 1), levels(airquality$Summer))
ggplot(data = airquality, aes(x=Wind, y = Temp,
color = as.factor(Month), group = airquality$Summer)) +
geom_point() +
geom_line(linetype = as.factor(airquality$Summer)) +
scale_linetype_manual(values = lty)
Any ideas?
EDIT:
My actual data show very clear trends, and I want to differentiate the top line from all the others below. My goal is to convince people they should make more than just the minimum payment on their student loans:
You just need to change the group to Month and put linetype inside aes():
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month), group = Month)) +
geom_point() +
geom_line(aes(linetype = factor(Summer)))
If you want to specify the linetype you can use a few methods. Here is one way:
lineT <- c("solid", "dotdash")
names(lineT) <- c("1","0")
ggplot(data = airquality, aes(x=Wind, y = Temp, color = as.factor(Month))) +
geom_point() +
geom_line(aes(linetype = factor(Summer))) +
scale_linetype_manual(values = lineT)
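The question also asks about line width. The same pattern should carry over: map the width inside aes() and set the values with a manual scale. The sketch below assumes ggplot2 3.4.0 or later, where the line-width aesthetic is called linewidth; the names `aq`, `lineW` and `width_fig` and the width values are arbitrary choices for illustration:

```r
library(ggplot2)

# Work on a copy so the built-in data set is left untouched.
aq <- airquality
aq$Summer <- with(aq, ifelse(Month >= 6 & Month < 9, 1, 0))

# Widths keyed by the factor levels of Summer; the values are arbitrary.
lineW <- c("1" = 1.2, "0" = 0.4)

width_fig <- ggplot(aq, aes(x = Wind, y = Temp,
                            color = as.factor(Month), group = Month)) +
  geom_point() +
  # Summer months get the thicker line.
  geom_line(aes(linewidth = factor(Summer))) +
  scale_linewidth_manual(values = lineW, name = "Summer") +
  labs(color = "Month")
```

On older ggplot2 versions the equivalent would be the size aesthetic with scale_size_manual().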

Increasing the distance in ggplot

ggplot(data = sortmax, aes(x = Date, y = price, colour = Grade)) +geom_line(aes(group = Grade)) + geom_point()
I have five different graphs for five different grades. All the graphs intersect and overwrite each other because they share common price values on the y axis. How can I increase the distance between these graphs?
It will be useful if you can post output of command: dput(sortmax)
You can try separating the graphs completely by using facet_grid:
ggplot(data = sortmax, aes(x = Date, y = price, color=Grade)) +
geom_line() +
geom_point()+
facet_grid(Grade ~ .)
If you group your data by only one variable, you can also use facet_wrap. If 5 different Grade values result in a plot that is too wide, you can add the nrow or ncol argument (number of rows/columns) to adjust the final layout.
A variant of rnso's answer:
ggplot(data = sortmax, aes(x = Date, y = price, color=Grade)) +
geom_line() +
geom_point()+
facet_wrap(~Grade,nrow=2)
