Format lubridate duration column for axis labelling - r

I want to plot data containing numbers and durations. For the transformation of the character vector I chose the lubridate package. Unfortunately, the duration is always printed in seconds on the x axis:
set.seed(20161027)
a <- c("00:30:45", "00:59:07", "01:08:30", "02:10:09", "02:20:53")
b <- rnorm(n = 5)
example <- data.frame(a, b)
library(ggplot2)
# This is what I want
ggplot(data = example, aes(x = a, y = b)) +
geom_point()
library(lubridate)
a <- hms(a)
a <- as.duration(a)
example <- data.frame(a, b)
ggplot(data = example, aes(x = a, y = b)) +
geom_point()
The first plot is how I want it to look.
The second plot, with the axis in seconds, is how it currently looks.
Is there a lubridate way to format the time to a prettier format? Or do I need to preserve the character vector for axis labels?

I'm not sure why you'd want to do this with lubridate. You can get readable axis labels by using as.POSIXct instead:
a <- as.POSIXct(c("00:30:45", "00:59:07", "01:08:30", "02:10:09", "02:20:53"),
format="%H:%M:%S")
b <- rnorm(n = 5)
example <- data.frame(a, b)
ggplot(data = example, aes(x = a, y = b)) +
geom_point()

I think @HubertL's solution is good enough, but if you insist on using lubridate, you can try:
library(lubridate)
set.seed(20161027)
a <- c("00:30:45", "00:59:07", "01:08:30", "02:10:09", "02:20:53")
b <- rnorm(5)
example <- data.frame(a=hms(a), b=b)
ggplot(data = example, aes(x = a$hour + a$minute / 60 + a$second / 60^2, y = b)) +
geom_point() +
scale_x_continuous(name="a",
breaks=c(0.5, 1, 1.5, 2),
labels=c("00:30", "01:00", "01:30", "02:00"))
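An alternative sketch, not taken from either answer above: keep the x values as plain seconds and let a labelling function turn the breaks into hh:mm strings, so the labels do not have to be typed by hand. The helper name fmt_hm and the 30-minute break spacing are assumptions made for this illustration; period_to_seconds() is lubridate's converter from a Period to a number of seconds.

```r
library(ggplot2)
library(lubridate)

set.seed(20161027)
a <- c("00:30:45", "00:59:07", "01:08:30", "02:10:09", "02:20:53")
b <- rnorm(n = 5)

# Store the durations as plain seconds so ggplot2 sees an ordinary numeric axis
example <- data.frame(a = period_to_seconds(hms(a)), b = b)

# Hypothetical helper (not part of lubridate): format seconds as "hh:mm"
fmt_hm <- function(secs) {
  sprintf("%02d:%02d",
          as.integer(secs %/% 3600),
          as.integer((secs %% 3600) %/% 60))
}

p <- ggplot(data = example, aes(x = a, y = b)) +
  geom_point() +
  scale_x_continuous(name = "a",
                     breaks = seq(1800, 7200, by = 1800),  # assumed spacing: every 30 minutes
                     labels = fmt_hm)
p
```

Because the labels are computed from the breaks, changing the break spacing does not require editing a second vector of strings.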

Related

How to show integers when using ggplot2::geom_smooth()

In the example below, how can I round the x labels to even numbers? I can't convert them to factors first, because then geom_smooth does not work.
library(ggplot2)
set.seed(32)
df <- data.frame(a = as.integer(rnorm(250, 2, 0.1)))
df$b <- df$a + rnorm(250)
df$id = 1
df_2 <- df
df_2$id <- 2
df_tot <- rbind(df, df_2)
ggplot(df_tot, aes(x = a, y = b)) +
geom_smooth() +
facet_wrap(~id)
If we want even numbers, an option is to supply the labels as a function in scale_x_continuous:
library(ggplot2)
ggplot(df_tot, aes(x = a, y = b)) +
geom_smooth() +
facet_wrap(~id) +
scale_x_continuous(labels = function(x) seq(2, length.out = length(x)))
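If the goal is simply whole-number ticks, as the title suggests, a different sketch is to pass a function to the breaks argument rather than relabelling whatever breaks ggplot2 picked. The helper name integer_breaks is made up for this illustration, and the linear fit is an assumption chosen because a only takes a few distinct values.

```r
library(ggplot2)

set.seed(32)
df <- data.frame(a = as.integer(rnorm(250, 2, 0.1)))
df$b <- df$a + rnorm(250)
df$id <- 1
df_2 <- df
df_2$id <- 2
df_tot <- rbind(df, df_2)

# Hypothetical helper: all whole numbers that fall inside the axis limits
integer_breaks <- function(limits) {
  lo <- ceiling(limits[1])
  hi <- floor(limits[2])
  if (lo > hi) return(numeric(0))  # no whole number falls inside the limits
  seq(lo, hi, by = 1)
}

p <- ggplot(df_tot, aes(x = a, y = b)) +
  geom_smooth(method = "lm") +  # assumption: a linear fit instead of the default smoother
  facet_wrap(~id) +
  scale_x_continuous(breaks = integer_breaks)
p
```

Here the tick positions themselves are integers, so each label still matches the value at its position on the axis.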

Remove data to the left and right of local minima

I have a lot of measurements where I get data that looks something like this:
# Generate example data
x <- 1:100
y <- 100*(1-exp(-0.3*x))
x2 <- 101:200
y2 <- rev(y)
df <- data.frame("x" = c(x, x2),
"y" = c(y, y2))
df$x <- df$x + 50
rm(x, x2, y, y2)
x <- 1:50
y <- 25.91818
x2 <- 251:300
y2 <- 25.91818
df2 <- data.frame("x" = c(x, x2),
"y" = c(y, y2))
rm(x, x2, y, y2)
df <- rbind(df, df2)
rm(df2)
If I plot this I can see that there are left-most and right-most local minima.
library(ggplot2)
p <- ggplot(df, aes(x,y))+
geom_line()+
geom_point(data = data.frame("x" = c(50, 250), "y" = c(25.91818, 25.91818)),
mapping = aes(x, y), colour = "red")+
scale_y_continuous(limits = c(0, 101))
p + annotate("text", label = "minimum 1", x = 50, y = 20) +
annotate("text", label = "minimum 2", x = 250, y = 20)
What I would like to do is trim those data that are to the left of minimum 1 and right of minimum 2. It's not super straightforward as there may also be local minima between those two points, because the real data doesn't look this ideal. I would also need to apply this process to many many samples, but I think this may be trivial because I could use e.g. dplyr and group_by().
I had some luck plotting the local minima using the ggpmisc package, but I'm not sure how I can use that to actually subset my data. Just for clarity I included the code to do so below, and with the real data it looks a little better:
library(ggpmisc)
p2 <- ggplot(df, aes(x, y))+
geom_line()+
ggpmisc::stat_peaks(col="red", span=3)
p2
I hope this is clear and I'm happy to clarify any questions. Thank you in advance.
You could do this using the following steps:
1. Sort your data according to its x co-ordinates.
2. On your sorted data, find the diff of the y co-ordinates, which will be 0 (or close to 0) for the flat sections at either end (as well as any flat sections in between).
3. Starting from the left, find the first point where the diff is not zero (or at least is above a minimal threshold). Store this index as a variable called left.
4. Starting from the right, find the first point where the diff is not zero (or at least is above a minimal threshold). Store this index as a variable called right.
5. Subset your data frame so it only contains the data between rows left:right.
So, in your example we would have:
# Define a minimal threshold above which we are not at the minimum line
minimal_change <- 1e-6
df <- df[order(df$x),] # Step 1
left <- which(diff(df$y) > minimal_change)[1] # Steps 2 and 3
right <- nrow(df) - which(diff(rev(df$y)) > minimal_change)[1] + 1 # Steps 2 and 4
df <- df[left:right, ] # Step 5
Now we can plot the result:
ggplot(df, aes(x, y)) +
geom_line()+
geom_point(data = data.frame("x" = c(50, 250), "y" = c(25.91818, 25.91818)),
mapping = aes(x, y), colour = "red") +
scale_y_continuous(limits = c(0, 101)) +
scale_x_continuous(limits = c(0, 300))
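To apply the same steps to many samples, they can be wrapped in a function and run once per group. The sketch below uses base R only; the function name trim_flat_ends, the sample column, and the toy data are assumptions made for this illustration. Like the code above, it only looks for increases in y, so it suits curves that rise away from a flat baseline.

```r
# Hypothetical helper: drop the flat runs at both ends of one sample
trim_flat_ends <- function(d, minimal_change = 1e-6) {
  d <- d[order(d$x), ]
  rising     <- which(diff(d$y) > minimal_change)       # scanning from the left
  rising_rev <- which(diff(rev(d$y)) > minimal_change)  # scanning from the right
  if (length(rising) == 0 || length(rising_rev) == 0) return(d)  # nothing to trim
  left  <- rising[1]
  right <- nrow(d) - rising_rev[1] + 1
  d[left:right, ]
}

# Toy data: two samples, each flat, then a peak, then flat again
one <- data.frame(x = 1:15, y = c(rep(1, 5), 2, 4, 8, 4, 2, rep(1, 5)))
samples <- rbind(cbind(sample = "s1", one), cbind(sample = "s2", one))

# Split by sample, trim each piece, and stack the results again
trimmed <- do.call(rbind, lapply(split(samples, samples$sample), trim_flat_ends))
```

The same function could also be called per group from dplyr, as the question suggests, but that is not shown here.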

NA Sawtooth signal

How would it be possible to represent (both as a plot and numerically) a sawtooth signal in R from:
y <- c(NA,NA,NA,NA,1,NA,NA,NA,1,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,1,NA)
where each 1 in y marks a time point at which the sawtooth reaches a peak (of height 1). Note that the distances between peaks are unequal.
I thought about using interpolation but maybe it is unnecessary.
Thank you,
You can create a sequence of falling numbers like this:
peaks <- c(0, which(!is.na(y)), length(y))
drop <- -1/max(diff(peaks))
df <- do.call(rbind, lapply(diff(peaks), function(x) {
data.frame(x = c(0, rep(1, x)),
y = c(1, seq(1 + drop, by = drop, length.out = x)))
}))
df$x <- cumsum(df$x)
Which gives this result:
plot(df$x, df$y, type = "l")
Or if you want to be fancy...
library(ggplot2)
ggplot(df, aes(x, y)) +
geom_line(col = "deepskyblue4", size = 1.5) +
theme_bw()
Created on 2020-09-18 by the reprex package (v0.3.0)

R function using ifelse to plot returns data and plot

Reproducible example
require(ggplot2)
A <- data.frame(x = 1:5, y = 1:5)
B <- data.frame(x = 1:5, y = (1:5)^2)
plotA <- ggplot(data = A, aes(x = x, y = y)) + geom_line()
plotB <- ggplot(data = B, aes(x = x, y = y)) + geom_line()
myfn <- function(){print(plotA)}
myfn()
myfn2 <- function(printA = TRUE){ifelse(printA, print(plotA), print(plotB))}
myfn2(TRUE)
myfn() returns exactly what I would expect, specifically plotA. myfn2(TRUE), on the other hand, does return plotA, but it also returns the data behind the plot. How do I just return the plot? (With more complicated plots the amount of data returned can be significant.)
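A possible explanation and fix, offered as a sketch rather than a confirmed answer: print() returns its argument invisibly, but ifelse() builds and returns a new, visible value from it, which R then echoes at the console. A plain if/else evaluates only one branch and passes the invisible result through. The name myfn3 is made up for this example.

```r
library(ggplot2)

A <- data.frame(x = 1:5, y = 1:5)
B <- data.frame(x = 1:5, y = (1:5)^2)
plotA <- ggplot(data = A, aes(x = x, y = y)) + geom_line()
plotB <- ggplot(data = B, aes(x = x, y = y)) + geom_line()

# if/else runs a single branch and keeps print()'s invisible return value
myfn3 <- function(printA = TRUE) {
  if (printA) print(plotA) else print(plotB)
}

myfn3(TRUE)  # draws plotA without echoing anything else to the console
```

ifelse() is intended for element-wise selection over vectors, so if/else is usually the more natural choice for a single TRUE/FALSE switch like this one.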

Faceted time series with mean profile in ggplot2

Using the following simulated time series:
n=70
m1 = matrix(rnorm(n), ncol=7)
m2 = matrix(rnorm(n, 0,4), ncol=7)
d = data.frame(rbind(m1,m2), cl=rep(c(1,2), each=nrow(m1))) # each matrix has 10 rows, so each class needs 10 labels
(first 7 columns represent the time point, last column the class)
Is it possible to construct a faceted time series that includes the mean curve in each plot, using ggplot2?
The results should look something like this:
It might not be the most beautiful code, but I believe it gets you what you are looking for:
n=70
m1 = matrix(rnorm(n), ncol=7)
m2 = matrix(rnorm(n, 0,4), ncol=7)
d = data.frame(rbind(m1,m2), cl=rep(c(1,2), each=nrow(m1))) # each matrix has 10 rows, so each class needs 10 labels
d <- cbind(paste("d", 1:NROW(d), sep = ""), d)
names(d)[1] <- "id.var"
library(reshape)
longDF <- melt(d, id=c("cl", "id.var"))
library(ggplot2)
p <- ggplot(data = longDF, aes(x = variable, y = value, group = id.var))
p + geom_line() + stat_smooth(aes(group = 1), method = "lm",
se = FALSE, colour="red") + facet_grid(cl ~ .)
Please don't hesitate to improve my code.
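One possible refinement, sketched under assumptions: stat_smooth(method = "lm") draws a straight-line fit, whereas a mean profile is the average of the series at each time point. stat_summary() with fun = mean (available in ggplot2 3.3.0 and later) draws that average directly. The long-format data frame long_df, its column names, and the seed are made up for this illustration and are built by hand to avoid a reshaping package.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only to make the sketch repeatable
n_series <- 10  # series per class
n_time   <- 7   # time points per series
m1 <- matrix(rnorm(n_series * n_time), ncol = n_time)
m2 <- matrix(rnorm(n_series * n_time, 0, 4), ncol = n_time)

# Long format: one row per (series, time point); c() reads the matrix column by column
long_df <- data.frame(
  id    = rep(seq_len(2 * n_series), times = n_time),
  cl    = rep(rep(c(1, 2), each = n_series), times = n_time),
  time  = rep(seq_len(n_time), each = 2 * n_series),
  value = c(rbind(m1, m2))
)

# The mean profile as numbers: one mean per class and time point
profile <- aggregate(value ~ cl + time, data = long_df, FUN = mean)

p <- ggplot(long_df, aes(x = time, y = value, group = id)) +
  geom_line(colour = "grey60") +
  stat_summary(aes(group = 1), fun = mean, geom = "line", colour = "red") +
  facet_grid(cl ~ .)
p
```

Setting group = 1 inside stat_summary() makes the mean run across all series in a facet instead of being computed separately for each one.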
