Remove lines between data, grouping and smoothing in ggplot

Here is the problem:
I have to plot a time series (9 years of non-continuous data). In the data frame I am using, a row of NA values separates each year from the next. My aim is to plot this series with a smoothing line.
To plot the time series with the smooth, I use this code:
library(ggplot2)
library(scales)  # provides date_format()
df$Date <- as.Date(df$Date, "%Y/%m/%d")
ggplot(df, aes(x = Date, y = antSO4)) +
  geom_line(color = "gray40") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 3000)) +
  scale_x_date(breaks = "year", labels = date_format("%Y")) +
  geom_smooth(aes(x = Date, y = antSO4), method = lm, formula = y ~ splines::bs(x, 3), se = FALSE, colour = 'red')
I obtain this:
As you can see, there are lines connecting some of the yearly series to each other, but not all of them. To eliminate this problem I grouped the dataset by year (year() is from lubridate):
ggplot(df, aes(x = Date, group = lubridate::year(Date), y = antSO4))
This is the plot that I obtain.
Grouping the data eliminates the connections, but at the same time the smooth is, as expected, calculated year by year and not on the complete dataset.
I am quite new to R, so I searched previous posts, but most of the problems I found involve NA separator rows where the connecting lines appear between every series, not just some of them.
Thanks in advance for any kind of help!
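
One possible way to get both behaviours is to set the grouping only inside geom_line(), so the lines are broken per year while geom_smooth() still sees the whole dataset as a single group. The sketch below uses made-up stand-in data (the real df is not shown in the question), and the helper column yr is an assumption introduced for illustration:

```r
library(ggplot2)

# Hypothetical stand-in data: three years, 60 daily values each, with gaps between years
set.seed(1)
df <- data.frame(
  Date = as.Date(paste0(rep(2001:2003, each = 60), "-06-01")) + rep(0:59, times = 3)
)
df$antSO4 <- runif(nrow(df), 200, 2500)
df$yr <- format(df$Date, "%Y")  # assumed helper column, one value per year

p <- ggplot(df, aes(x = Date, y = antSO4)) +
  # group only this layer, so no line is drawn across the gap between years
  geom_line(aes(group = yr), colour = "gray40") +
  # no grouping here, so the spline is fitted to the complete series
  geom_smooth(method = lm, formula = y ~ splines::bs(x, 3),
              se = FALSE, colour = "red")
print(p)
```

Because the group aesthetic is not in the top-level aes(), it does not leak into the smoothing layer.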

Related

Plotting a continuous line from incomplete data in echarts4r

The data I'm trying to visualize has two assessments performed on the same time scale, but at different intervals (i.e. Temperature taken 4 times over 12 hours, pain assessed every hour), as an example:
df <- data.frame(
Hour = 0:12,
Pain = sample(7:10, 13, TRUE),
Temp = c(36.8,rep(NA,3),37.2,rep(NA,3),37.4,rep(NA,3),37.0)
)
In ggplot, I'd visualize it like this:
library(ggplot2)
ggplot(df) +
geom_col(aes(x = Hour, y = Pain)) +
geom_point(aes(x = Hour, y = Temp/3)) +
geom_line(data = df[!is.na(df$Temp), ], aes(x = Hour, y = Temp/3)) +
scale_y_continuous(sec.axis = sec_axis(~.*3,name = "Temp"))
In echarts4r however, I cannot get my line to be continuous (I believe because of the NA values)
library(echarts4r)
e_chart(df, Hour) |> e_bar(Pain) |> e_line(Temp)
Is there a way to subset the dataset before e_line to remove the missing values - I've searched online and can't seem to find anything? Or should I be structuring my data differently?
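
A sketch of one option, assuming e_line() forwards extra arguments to the underlying ECharts line series (its ... argument is documented as doing so): ECharts has a connectNulls series option that joins points across missing values, so the data would not need to be subset at all.

```r
library(echarts4r)

set.seed(1)
df <- data.frame(
  Hour = 0:12,
  Pain = sample(7:10, 13, TRUE),
  Temp = c(36.8, rep(NA, 3), 37.2, rep(NA, 3), 37.4, rep(NA, 3), 37.0)
)

chart <- df |>
  e_charts(Hour) |>
  e_bar(Pain) |>
  # connectNulls is an ECharts line-series option, passed through via ...
  e_line(Temp, connectNulls = TRUE)
chart
```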

How to plot mixed-frequency series with NAs in ggplot?

I have the following dataframe x:
x1 <- data.frame(Date = seq(as.Date("2010-01-01"),
as.Date("2012-12-01"),
by = "month"),
TS1 = rnorm(36,0,1),
TS2 = rnorm(36,0,1),
stringsAsFactors = F)
x2 <- data.frame(Date = seq(as.Date("2010-01-01"),
as.Date("2012-12-01"),
by = "quarter"),
TS3 = rnorm(12,0,1),
stringsAsFactors = F)
x <- dplyr::left_join(x1, x2, by = "Date")
x contains two monthly series, while one is quarterly.
I would like to plot all three series at the same time with ggplot. I am aware of dualplot as a way to do it. The issue with it however is that it allows you to plot only 2 mixed frequency series.
Is there anyone who can help me with this?
Thanks!

Note that ggplot requires long format, so we first use tidyr::pivot_longer.
Next, we can plot TS1 and TS2 easily, but TS3 will not plot at all as it contains missing values.
One option is to plot the line with missings with a separate geom_line call:
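library(dplyr)    # for %>% and mutate()
library(ggplot2)
x2 <- x %>%
  tidyr::pivot_longer(cols = c(TS1, TS2, TS3), names_to = "TS") %>%
  mutate(TS = as.factor(TS))
ggplot(x2, aes(x = Date, y = value, group = TS, color = TS)) +
  geom_line() +
  # draw TS3 again from its non-missing rows only, so its points get connected
  geom_line(data = subset(x2, TS == "TS3" & !is.na(value)))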

In this instance, the data does not have to be transformed into long format for ggplot (although that is a nice solution if you are familiar with reshaping data, and it is recommended when there are many columns or separate lines to plot).
For simplicity, especially when learning ggplot, may I propose an alternative solution?
TS1 and TS2 can easily be plotted against Date, as neither has NA values. Here, we call geom_line() twice, once for each line:
library(dplyr)    # for %>% and mutate()
library(ggplot2)
x %>%
  ggplot() +
  geom_line(aes(Date, TS1), colour = 'red') +
  geom_line(aes(Date, TS2), colour = 'blue')
If you try to include a third geom_line() with TS3, only the original two lines are plotted, because TS3's missing values (NA) leave no two adjacent points to connect. A solution is to fill in the NA values in the data before plotting, using zoo::na.approx(). As the name suggests, zoo::na.approx() approximates the missing values by linear interpolation. In this instance, I assume linear interpolation between known values is appropriate for plotting (as geom_line is doing anyway). Check out ?zoo::na.approx for more details, including non-linear interpolation.
zoo::na.approx(TS3, Date, na.rm = FALSE) may be read aloud like: "We want to approximate the values of TS3 where they are missing (NA), based on the values of Date, and any NAs that cannot be interpolated should be kept as NA rather than dropped."
x %>%
mutate(
TS3 = zoo::na.approx(TS3, Date, na.rm = FALSE)
) %>%
ggplot()+
geom_line(aes(Date, TS1), colour = 'red')+
geom_line(aes(Date, TS2), colour = 'blue')+
geom_line(aes(Date, TS3), colour = 'green')
Note that the green line finishes just short (2 data points) of the other two lines. This is because, by default, zoo::na.approx() only interpolates between two known data points, so the two trailing NAs after the last quarterly observation cannot be filled. With na.rm = FALSE those values are kept as NA, which keeps TS3 the same length as the other columns, as mutate() requires. Look at the help page ?zoo::na.approx for alternatives (such as repeating the last known observation).
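
To make the trailing-NA behaviour concrete, here is a small sketch on a toy vector (ts3 is a made-up stand-in for the TS3 column). It compares the default interpolation with two ways of filling the tail: zoo::na.locf(), and rule = 2, which na.approx() hands on to stats::approx().

```r
library(zoo)

# Toy stand-in for a quarterly series stored in a monthly column
ts3 <- c(1, NA, NA, 4, NA, NA)

# Linear interpolation; trailing NAs are kept because nothing follows them
filled_linear <- na.approx(ts3, na.rm = FALSE)

# rule = 2 extends the series with the nearest known value
filled_rule2 <- na.approx(ts3, rule = 2)

# Last observation carried forward: a step function, no interpolation
filled_locf <- na.locf(ts3, na.rm = FALSE)

print(filled_linear)  # 1 2 3 4 NA NA
print(filled_rule2)   # 1 2 3 4 4 4
print(filled_locf)    # 1 1 1 4 4 4
```

Whether extending the last value is acceptable depends on the data; for a plot it only lengthens the line with a flat segment.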

Shading different regions of the graph based on time period

I am creating a graph using ggplot2 that takes dates on the x-axis (e.g. 1000 years ago) and probabilities on the y-axis. I would like to distinguish different time periods by shading regions of the graph in different colors. I stored the following dates here:
paleo.dates <- c(c(13500,8000), c(13500,10050) ,c(10050,9015),
c(9015,8000), c(8000,2500), c(8000,5500), c(5500,3500), c(3500,2500),
c(2500,1150), c(2500,2000), c(2000,1500), c(1500,1150), c(1150,500))
I would like to take a time period, say 13500 to 8000, and color code it until it overlaps with another period, such as the third entry.
I am using the ggplot2 cheatsheet, and I attempted to use aes(fill = paleo.dates), but this does not work because the vector is not the same length as my dataset. I was also thinking of using + geom_rect() to fill the areas manually, but that does not seem very elegant, and I am not sure it will even work.
Any advice is appreciated, thank you.

You just need to assign each observation to a period. In this case I created a helper vector of period numbers and converted it to a factor to use as the fill.
library(dplyr)
library(ggplot2)
df <- data.frame(paleo.dates = seq(500, 13000, 100),
p = runif(n = length(seq(500, 13000, 100)),
0, 1))
sub <- data.frame(sub = rep(1:(13000/500), each = 5))
sub <- sub %>%
dplyr::slice(1:nrow(df))
df <- df %>%
dplyr::mutate(period = sub$sub,
period = as.factor(period))
ggplot2::ggplot(df) +
geom_bar(aes(x = paleo.dates, y = p,
fill = period,
col = period),
show.legend = F, stat = "identity") +
theme_bw()
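
The geom_rect() idea from the question can also work. The sketch below is one possible version, not the answerer's method: the probabilities are random placeholders, the periods table uses three of the (start, end) pairs from the question, and the labels A, B and C are hypothetical.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(age = seq(500, 13500, 100))
df$p <- runif(nrow(df))  # placeholder probabilities

# One row per period; labels are made up for illustration
periods <- data.frame(
  start  = c(13500, 8000, 2500),
  end    = c(8000, 2500, 1150),
  period = c("A", "B", "C")
)

p <- ggplot(df) +
  # -Inf/Inf make each rectangle span the full height of the panel
  geom_rect(data = periods,
            aes(xmin = end, xmax = start, ymin = -Inf, ymax = Inf, fill = period),
            alpha = 0.3) +
  geom_line(aes(x = age, y = p)) +
  scale_x_reverse()  # oldest dates on the left
print(p)
```

Keeping the periods in their own data frame means the fill vector no longer has to match the length of the main dataset.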

Plot time series as one year

I have a time series of monthly data for 10 years:
myts <- ts(rnorm(12*10), frequency = 12, start = 2001)
Now, I'd like to plot the data with the x-axis restricted to a range/ticks from Jan to Dec (a generic year). Thus, the whole time series should be broken into ten lines, each starting at Jan and ending at Dec, and overplotted on each other so that I can visually compare different years. Is there a straightforward command to do that in R?
So far I have come up with the following solution using matplot, which might not be the most sophisticated one:
mydf <- as.data.frame(matrix(myts, 12))
matplot(mydf,type="l")
Even better would be a way to calculate an average value and the corresponding CI/standard deviation for each month, and then plot the average from Jan to Dec as a line with the CI/standard deviation as a band around it.

Consider using ggplot2.
library(ggplot2)
library(ggfortify)
d <- fortify(myts)
d$year <- format(d$Index, "%Y")
d$month <- format(d$Index, "%m")
It's useful to start by reshaping the ts object into a long dataframe. Given the dataframe, it's straightforward to create the plots you have in mind:
ggplot(d, aes(x = month, y = Data, group = year, colour = year)) +
geom_line()
ggplot(d, aes(x = month, y = Data, group = month)) +
stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96))
You can also summarise the data yourself, then plot it:
d_sum <- do.call(rbind, lapply(split(d$Data, d$month), mean_se, mult = 1.96))
d_sum$month <- rownames(d_sum)
ggplot(d_sum, aes(x = month, y = y, ymin = ymin, ymax = ymax)) +
  geom_errorbar() +
  geom_point() +
  geom_line(aes(group = 1))  # a single group, so the line joins the discrete months
Result:
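
The question asked for a band around the monthly average, so here is a sketch of that variant using geom_ribbon(). It avoids ggfortify by reading the month index from stats::cycle(); the 1.96 multiplier gives an approximate 95% interval around the mean, as in the answer above.

```r
library(ggplot2)

set.seed(1)
myts <- ts(rnorm(12 * 10), frequency = 12, start = 2001)

# cycle() returns the position within the year (1 = Jan, ..., 12 = Dec)
d <- data.frame(
  month = as.integer(cycle(myts)),
  value = as.numeric(myts)
)

# Mean and mean +/- 1.96 standard errors for each month
d_sum <- do.call(rbind, lapply(split(d$value, d$month), mean_se, mult = 1.96))
d_sum$month <- as.integer(rownames(d_sum))

p <- ggplot(d_sum, aes(x = month, y = y)) +
  geom_ribbon(aes(ymin = ymin, ymax = ymax), alpha = 0.3) +
  geom_line() +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
print(p)
```

Keeping month numeric lets the ribbon and line be drawn without an explicit group.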

Barplot from data organised by factors ggplot2 in r

I have a data frame with the following structure
df <- data.frame(Build = rep(2000:2003, each = 4),
Year = rep(2000:2003, each = 4) + 1:4, val = sort(rnorm(16)))
I would like to generate a ggplot bar plot for this data frame, using Build as x-coordinate and Year as y-coordinate, adding a gradient fill for val.
I have tried the following
ggplot(df, aes(x = Build, y = Year, fill = val)) + geom_bar(stat = "identity")
But this is what I get
What I want to see in the y-axis is the range of values that the Year variable takes for each value of Build, while preserving the color-gradient representation for value; instead what I see in the y-axis is a quantity that is not related to what I have in my data frame (sum of the values for Year?).
Could someone please point me in the right direction?
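
The guess in the question is right: with stat = "identity", geom_bar() stacks the four rows of each Build, so the bar height is the sum of Year. One possible alternative, assuming the goal is one coloured cell per (Build, Year) pair, is geom_tile():

```r
library(ggplot2)

set.seed(1)
df <- data.frame(Build = rep(2000:2003, each = 4),
                 Year = rep(2000:2003, each = 4) + 1:4,
                 val = sort(rnorm(16)))

# Bar heights in the original plot: the four Year values of each Build, summed
year_sums <- aggregate(Year ~ Build, data = df, FUN = sum)
print(year_sums)

# One tile per row, placed at its actual Year and filled by val
p <- ggplot(df, aes(x = Build, y = Year, fill = val)) +
  geom_tile()
print(p)
```

The y-axis then shows the range of Year values for each Build, and the gradient fill for val is preserved.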
