Hello I am working with sigmoidal data and am attempting to plot two scatter plots on top of each other: the raw data & the first derivative of the raw data. My issue doesn't lie in plotting the data, but more-so finding a function that will create an accurate representation of the first derivative.
What have I tried: Creating a function that calculates the slope of the current & next point: (y2-y1)/(x2-x1) & assigning the value to the current temperature.
dput() of Data Frame:
structure(list(Temperature = c(4.98, 5.49, 6.01, 6.5, 7.02, 7.52, 8.03, 8.52, 9.03, 9.54, 10.04, 10.54, 11.05, 11.55, 12.05, 12.55, 13.05, 13.56, 14.06, 14.57, 15.07, 15.57, 16.07, 16.59, 17.08, 17.59, 18.08, 18.59, 19.09, 19.6, 20.1, 20.64, 21.12, 21.63, 22.13, 22.62, 23.13, 23.63, 24.13, 24.63, 25.11, 25.62, 26.11, 26.68, 27.19, 27.7, 28.2, 28.71, 29.21, 29.71, 30.21, 30.7, 31.21, 31.69, 32.19, 32.69, 33.19, 33.7, 34.19, 34.68, 35.19, 35.68, 36.19, 36.69, 37.19, 37.7, 38.19, 38.7, 39.2, 39.7, 40.21, 40.7, 41.22, 41.71, 42.21, 42.71, 43.21, 43.72, 44.22, 44.72, 45.22, 45.73, 46.23, 46.73, 47.23, 47.97, 48.71, 49.23, 49.74, 50.23, 50.73, 51.23, 51.73, 52.24, 52.75, 53.24, 53.75, 54.24, 54.75, 55.26, 55.75, 56.25, 56.75, 57.24, 57.75, 58.27, 58.77, 59.26, 59.77, 60.26, 60.78, 61.27, 61.79, 62.27, 62.77, 63.29, 63.79, 64.27, 64.78, 65.3, 65.8, 66.27, 66.8, 67.3, 67.8, 68.31, 68.78, 69.3, 69.8, 70.32, 70.81, 71.32, 71.81, 72.33, 72.82, 73.31, 73.83, 74.33, 74.82, 75.32, 75.83, 76.34, 76.84, 77.35, 77.82, 78.34, 78.85, 79.36, 79.84, 80.35, 80.85, 81.36, 81.86, 82.37, 82.86, 83.37, 83.88, 84.36, 84.88, 85.38, 85.88, 86.38, 86.89, 87.38, 87.89, 88.39, 88.89, 89.4, 89.9, 90.39, 90.9, 91.4, 91.91, 92.37, 92.89, 93.4, 93.91, 94.41, 94.91, 95.42), Absorbance = c(1.401351929, 1.403320313, 1.405181885, 1.406326294, 1.407440186, 1.409118652, 1.410095215, 1.410797119, 1.411560059, 1.412918091, 1.413970947, 1.414245605, 1.416000366, 1.415435791, 1.41809082, 1.4190979, 1.419677734, 1.420150757, 1.421966553, 1.420333862, 1.422637939, 1.422790527, 1.423461914, 1.426513672, 1.426315308, 1.426071167, 1.426467896, 1.428710938, 1.428070068, 1.428817749, 1.429733276, 1.432144165, 1.432434082, 1.433227539, 1.434616089, 1.435806274, 1.434814453, 1.436096191, 1.436096191, 1.436447144, 1.437896729, 1.4375, 1.438934326, 1.440139771, 1.440139771, 1.441741943, 1.442108154, 1.443969727, 1.444778442, 1.443862915, 1.444534302, 1.445648193, 1.444473267, 1.446395874, 1.447219849, 1.446151733, 1.449569702, 1.449066162, 1.448852539, 1.4503479, 1.451385498, 1.45111084, 1.451217651, 1.453125, 1.452560425, 1.455047607, 1.455093384, 1.456665039, 1.457977295, 1.457336426, 1.458648682, 1.46043396, 1.462158203, 1.464813232, 1.463531494, 1.468048096, 1.468643188, 1.470748901, 1.471878052, 1.476257324, 1.478057861, 1.482040405, 1.484466553, 1.486129761, 1.48815918, 1.496520996, 1.499786377, 1.504302979, 1.507217407, 1.512985229, 1.517471313, 1.524108887, 1.528198242, 1.534637451, 1.539169312, 1.546142578, 1.554611206, 1.55809021, 1.56854248, 1.572875977, 1.580307007, 1.585739136, 1.592514038, 1.600067139, 1.609222412, 1.616607666, 1.622375488, 1.631469727, 1.635635376, 1.642929077, 1.649780273, 1.655014038, 1.661483765, 1.663742065, 1.671859741, 1.677200317, 1.677108765, 1.683380127, 1.684082031, 1.687438965, 1.694595337, 1.694961548, 1.696685791, 1.696685791, 1.699768066, 1.702514648, 1.703613281, 1.705093384, 1.70022583, 1.707595825, 1.707962036, 1.709075928, 1.705276489, 1.71055603, 1.709259033, 1.70916748, 1.709732056, 1.710189819, 1.710281372, 1.711868286, 1.711883545, 1.713104248, 1.713760376, 1.711120605, 1.709716797, 1.711776733, 1.712814331, 1.714324951, 1.711120605, 1.713378906, 1.712432861, 1.716125488, 1.710006714, 1.710845947, 1.711502075, 1.711120605, 1.710006714, 1.70980835, 1.708602905, 1.708236694, 1.710189819, 1.707672119, 1.706939697, 1.710006714, 1.706192017, 1.706573486, 1.706207275, 1.705734253, 1.706207275, 1.705184937, 1.70954895, 1.705841064, 1.702972412, 1.703979492, 1.703063965, 1.709350586, 1.703338623, 1.700408936, 1.705276489, 1.705368042)), row.names = 1621:1800, class = "data.frame")
Code For my Attempt
raw = "<insert dput line>>"
columns = c("Temperature","Absorbance")
first = data.frame(matrix(nrow=0,ncol=2))
colnames(dFrame) = columns
for (i in 1:nrow(raw)) {
if(i != nrow(raw)) {
cAbs = raw[i,2]
nextAbs = raw[i+1,2]
cT = raw[i,1]
nextT = raw[i+1,1]
Temperature = raw[i,1]
Absorbance =((nextAbs-cAbs)/(nextT-cT))
t <- data.frame(Temperature,Absorbance)
names(t) <- names(raw)
first <- rbind(first, t)
}
}
ggplot()+
geom_point(data=raw, aes(x=Temperature,y=Absorbance), color = "red") +
geom_point(data = first, aes(x=Temperature,y = Absorbance), color = "blue")
What I was expecting
I was expecting an output that had the shape of something like so:
library(dplyr); library(ggplot2)
df %>%
arrange(Temperature) %>%
mutate(slope = (Absorbance - lag(Absorbance))/
(Temperature - lag(Temperature))) %>%
ggplot(aes(Temperature)) +
geom_line(aes(y= Absorbance, color = "Absorbance"), size = 1.2) +
geom_point(aes(y= slope * 20 + 1.4, color = "slope")) +
geom_smooth(aes(y= slope * 20 + 1.4, color = "slope"), se = FALSE, size = 0.8) +
scale_y_continuous(sec.axis = sec_axis(trans = ~(.x - 1.4)/20, name = "slope"))
If the data is even a little noisy, calculating the derivative by first differencing can be very noisy.
You can get a better estimate by fitting a smoothing spline function and calculating the derivative of the spline function. By differentiating a smooth function, you get a smooth derivative.
In most cases, smooth.spline with default arguments is fine, but I recommend taking a look at the result and possibly tuning the smooth.spline parameters for more or less smoothing, depending on your judgment.
edit: I learned this approach from the Numerical Recipes textbook.
library(tidyverse)
df <- tibble(
x = seq(1, 15, by = 0.1),
y = sin(x) + runif(length(x), -0.2, 0.2),
d1_diff = c(NA, diff(y) / diff(x)),
d1_spline = smooth.spline(x, y) %>% predict(x, deriv = 1) %>% pluck("y")
)
df %>%
pivot_longer(-x) %>%
mutate(name = factor(name, unique(name))) %>%
ggplot() + aes(x, value, color = name) + geom_point() + geom_line() +
facet_wrap(~name, ncol = 1)
#> Warning: Removed 1 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).
Created on 2022-10-26 with reprex v2.0.2
Related
I am trying to fit an equation through the given data but have been unsuccessful
data <- data.frame(x=c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
y=c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
frequency <- (data$x)
brightness <- (data$y)*2.71057477e-3
# Define the Planck blackbody equation as a function in R
B <- function(frequency, t) {
h <- 6.62607015e-34
c <- 299792458
k <- 1.380649e-23
(2 * h * frequency^3 * c^-2) / (exp((h * frequency) / (k * t)) - 1)
}
library(stats)
fit <- nls(brightness ~ B(frequency, t), data = data, start = list(t = 2.5))
# Summarize the fit
summary(fit)
# Plot the data and the fitted model
library(ggplot2)
ggplot(data, aes(x = frequency, y = brightness)) +
geom_point() +
geom_line(aes(x = frequency, y = predict(fit)))
I just need a basic scatter plot of the data with the equation fitted over it as a line
data <- data.frame(x=c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
y=c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
data$frequency <- (data$x)
data$brightness <- (data$y)*2.71057477e-3
# Define the Planck blackbody equation as a function in R
B <- function(frequency, t) {
h <- 6.62607015e-34
c <- 299792458
k <- 1.380649e-23
(2 * h * frequency^3 * c^-2) / (expm1((h * frequency) / (k * t)))
}
library(stats)
fit <- nls(brightness ~ B(frequency, t), data = data, start = list(t = 2.5))
# Summarize the fit
summary(fit)
# Plot the data and the fitted model
library(ggplot2)
ggplot(data, aes(x = frequency, y = brightness)) +
geom_point() +
geom_line(aes(x = frequency, y = predict(fit)))
This has been asked as comments on your previous questions, What are the units on frequency and brightness?
Going back to your original question the units on data$x is 1/cm so it is not frequency but wavenumber and brightness is MJy/sr which is equal to 1.256E-19 kg/s2
So if you use the correct form of the Plank's equation (There is a typo on the Wikipedia page) and perform the unit conversions you are able to perform a fit. Not necessary a great fit.
data <- data.frame(freq = c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
brightness = c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
planck <- function(freq, t, h, c, k) {
# freq = wavenumber * c
# (2 * h * freq^3) / (c^2) * 1/(exp((h * freq) / (k * t)) - 1)
(2 * h * c * freq^3) * 1/(exp((h*c*freq)/(k*t))-1)
}
# fit the data using nls
h = 6.62607e-34
c= 3e8
k = 1.38065e-23
data$freq<- data$freq*100 #(convert from 1/cm to 1/m)
fit <- nls(brightness*1.256E-19 ~ planck(freq, t, h, c, k), start = list(t = 3), data = data)
summary(fit)
#1.256E-19 to convert from kg/s^2 to MJy/sr
brightness_fit <- predict(fit, data)/1.256E-19
# plot the data and the fitted curve
plot(data$freq, data$brightness, xlab = "Frequency (1/m)", ylab = "Brightness")
lines(data$freq, y=brightness_fit, col = "red")
I have a series of x and y values that I've used to build a linear model.
I can use predict() to find a value of y from a known value of x, but I'm struggling to calculate x from a known value of y. I've seen a few posts that talk about using the approx() function, but I can't figure out how to implement it for my use case. The idea is to write a function that takes a numerical value of y as an input and returns the expected value of x that it would correspond to, ideally with a prediction interval, eg "The expected value of x is 38.90, plus or minus 0.7", or something like that.
Here's my data:
> dput(x)
c(4.66, 5.53, 5.62, 5.85, 6.26, 6.91, 7.04, 7.32, 7.43, 7.85,
8.1, 8.3, 8.34, 8.53, 8.69, 8.7, 8.73, 8.76, 8.96, 9.06, 9.42,
9.78, 10.3, 10.82, 10.98, 11.07, 11.09, 11.32, 11.75, 12.1, 12.46,
12.5, 12.99, 13.02, 13.28, 13.43, 13.96, 14, 14.07, 14.29, 14.57,
14.66, 15.21, 15.56, 15.97, 16.44, 16.8, 17.95, 18.33, 18.62,
18.92, 19.49, 19.9, 19.92, 20.14, 20.18, 21.19, 22.7, 23.25,
23.48, 23.49, 23.58, 23.7, 23.83, 23.83, 23.97, 24.05, 24.14,
24.15, 24.19, 24.32, 24.62, 24.9, 24.92, 25, 25.06, 25.31, 25.36,
25.86, 25.9, 25.95, 25.99, 26.08, 26.2, 26.27, 26.39, 26.5, 26.51,
26.68, 26.78, 26.82, 26.92, 26.92, 27.05, 27.05, 27.07, 27.32,
27.6, 27.77, 27.8, 27.91, 27.96, 27.97, 28.04, 28.05, 28.15,
28.2, 28.28, 28.37, 28.51, 28.53, 28.53, 28.66, 28.68, 28.72,
28.74, 28.82, 28.83, 28.83, 28.86, 28.89, 28.91, 29.04, 29.2,
29.35, 29.4, 29.42, 29.48, 29.53, 29.65, 29.67, 29.69, 29.7,
29.72, 29.93, 29.97, 30.03, 30.08, 30.09, 30.11, 30.18, 30.62,
30.66, 30.78, 31, 31.32, 31.43, 31.47, 31.69, 31.96, 32.33, 32.5,
32.5, 32.58, 32.7, 32.92, 33.2, 33.6, 33.72, 33.77, 33.95, 34.02,
34.08, 34.42, 34.79, 34.91, 34.99, 35.08, 35.15, 35.49, 35.6,
35.6, 35.74, 35.8, 36.05, 36.17, 36.3, 36.37, 36.84, 37.31, 37.95,
38.75, 38.78, 38.81, 38.9, 39.21, 39.31, 39.5, 42.68, 43.92,
43.95, 44.64, 45.7, 45.95, 46.25, 46.8, 49.08, 50.33, 51.23,
52.76, 53.06, 62)
> dput(y)
c(11.91, 13.491, 13.708, 13.984, 14.624, 15.688, 15.823, 16.105,
16.387, 17.004, 17.239, 17.498, 17.686, 17.844, 17.997, 18.044,
18.003, 18.191, 18.332, 18.25, 18.778, 19.237, 19.693, 20.177,
20.441, 20.876, 20.512, 20.894, 21.493, 21.539, 21.951, 21.763,
22.498, 22.451, 22.744, 22.785, 23.409, 23.314, 23.408, 23.567,
23.849, 23.978, 24.472, 24.678, 25.236, 25.547, 25.676, 26.81,
26.83, 27.275, 27.331, 27.844, 28.009, 28.244, 28.497, 28.555,
29.067, 30.412, 30.788, 30.965, 31.058, 31.423, 31.346, 31.118,
31.252, 31.258, 31.399, 31.605, 31.552, 31.881, 31.822, 31.91,
32.333, 32.174, 32.222, 32.704, 32.445, 32.557, 32.993, 32.845,
32.997, 32.909, 32.911, 33.121, 33.191, 33.156, 33.426, 33.332,
33.52, 33.526, 33.697, 33.379, 33.849, 33.726, 33.538, 33.885,
33.961, 34.284, 34.208, 33.896, 34.278, 34.355, 34.276, 34.267,
34.399, 34.507, 34.492, 34.531, 34.695, 34.642, 34.872, 34.772,
34.813, 34.942, 34.883, 34.948, 34.719, 34.983, 34.99, 35.136,
35.007, 34.026, 35.148, 35.201, 35.459, 35.418, 35.236, 35.411,
35.459, 35.5, 35.665, 35.724, 35.636, 35.667, 35.747, 35.788,
35.882, 35.9, 35.83, 36.106, 36.029, 36.364, 36.358, 36.517,
37.005, 36.74, 36.963, 36.634, 37.04, 37.48, 37.581, 37.78, 37.686,
38.262, 37.998, 37.986, 38.498, 39.296, 38.467, 38.779, 38.885,
38.72, 39.038, 38.932, 39.719, 39.654, 39.367, 40.072, 39.707,
39.742, 39.919, 40.054, 40.189, 40.197, 40.154, 40.383, 42.146,
40.595, 40.971, 41.441, 41.964, 42.328, 42.463, 42.627, 42.633,
42.721, 42.786, 42.857, 45.318, 45.665, 46.406, 46.335, 47.663,
47.181, 48.074, 48.109, 49.931, 50.377, 51.053, 52.451, 53.004,
65.889)
> model <- lm(y ~ poly(x,3,raw=TRUE))
> model
Call:
lm(formula = y ~ poly(x, 3, raw = TRUE))
Coefficients:
(Intercept) poly(x, 3, raw = TRUE)1 poly(x, 3, raw = TRUE)2 poly(x, 3, raw = TRUE)3
6.6096981 1.4736619 -0.0238935 0.0002445
Since you have fitted a low order polynomial in ordinary form (raw = TRUE), you can use polyroot to directly find x given y.
## pc: polynomial coefficients in increasing order
solvePC <- function (pc, y) {
pc[1] <- pc[1] - y
## all roots, including complex ones
roots <- polyroot(pc)
## keep real roots
Re(roots)[abs(Im(roots)) / Mod(roots) < 1e-10]
}
y0 <- 38.9 ## example y-value
x0 <- solvePC(coef(model), y0)
#[1] 34.28348
plot(x, y, col = 8)
lines(x, model$fitted, lwd = 2)
abline(h = y0)
abline(v = x0)
To get an interval estimate, we can use sampling methods.
## polyfit: an ordinary polynomial regression model fitted by lm()
rootCI <- function (polyfit, y, nSamples = 1000, level = 0.05) {
## sample regression coefficients from their joint distribution
pc <- MASS::mvrnorm(nSamples, coef(polyfit), vcov(polyfit))
## for each row (a sample), call solvePC()
roots <- apply(pc, 1, solvePC, y)
## confidence interval
quantile(roots, prob = c(0.5 * level, 1 - 0.5 * level))
}
## 95% confidence interval
rootCI(model, y = y0)
# 2.5% 97.5%
#34.17981 34.38828
You can use optim:
Predict the y values given x:
pred_y <- function(x)predict(model, data.frame(x))
pred_y(x = 10)
[1] 19.20145
Now to predict x given y, we do:
pred_x <- function(y) optim(1, \(x) (y-pred_y(x))^2, method='BFGS')[[1]]
pred_x(19.20145)
[1] 10
The uniroot function is intended for this type of problem.
#coefficients for the model
coeff <- c(6.6096981, 1.4736619, -0.0238935, 0.0002445)
#define the equation which one needs the root of
modely <- function(x, y) {
# could use the predict function here
my<-coeff[1] + coeff[2]*x + coeff[3]*x**2 + coeff[4]*x**3
y-my
}
#use the uniroot functiion
#In this example y=10
uniroot(modely, lower=-100, upper=100, y=10)
$root
[1] 2.391022
$f.root
[1] -1.208443e-08
$iter
[1] 10
$init.it
[1] NA
$estim.prec
[1] 6.103516e-05
In this case for y=10, x = 2.391022
I'm able to convert the following df to tsibble object and plot using gg_season():
library(tsibble)
library(feasts)
library(tidyr)
library(dplyr)
df <- structure(list(date = structure(c(18292, 18321, 18352, 18382,
18413, 18443, 18474, 18505, 18535, 18566, 18596, 18627, 18658,
18686, 18717, 18747, 18778, 18808, 18839, 18870, 18900, 18931,
18961, 18992), class = "Date"), value1 = c(-2.94, -40.61, -6.89,
3.04, -3.5, 0.18, 6.79, 9.08, 9.35, 10.92, 20.53, 18.04, 24.6,
154.6, 30.4, 32.1, 27.7, 32.1, 19.2, 25.4, 28, 26.9, 21.7, 20.9
), value2 = c(-12.66, 7.56, -1.36, -14.39, -16.18, 3.29, -0.69,
-1.6, 13.47, 4.83, 4.56, 7.58, 28.7, 18.9, 39.1, 44, 52, 37.1,
28.2, 32.7, 17.2, 20.4, 31.4, 19.5)), class = "data.frame", row.names = c(NA,
-24L))
# Convert to tsibble object and plot using gg_season()
df %>%
pivot_longer(value1:value2) %>%
mutate(date = yearmonth(date)) %>%
mutate(year = year(date)) %>%
as_tsibble(index = date, key = name) %>%
gg_season(value) +
geom_point() # +
# scale_color_manual(values = c('2020' = 'blue', '2021' = 'red'))
Now I try to customize colors based on different years, ie., setting blue for 2020, and red for 2021. I've added scale_color_manual(values = c('2020' = 'blue', '2021' = 'red')), but I didn't succeed yet, how could I do that correctly? Thanks.
Reference:
how to change the color in geom_point or lines in ggplot
...
gg_season(value, pal = c("#3333FF", "#FF3333")) +
geom_point()
The year scale here is a continuous one (explaining why the scale_color_manual line produces "Error: Continuous value supplied to discrete scale"). But we can give gg_season a vector of color codes to use in its pal parameter.
I have trajectories of some moving objects in a data frame for example three trajectories were considered here as shown below:
> d1 <- data.frame(X1 = c(86.46, 79.88, 73.63, 67.63, 61.78, 56.05, 50.40, 44.80, 39.25, 33.79, 28.49, 23.40, 18.53, 13.84, 9.31, 5.04, 1.12),
Y1 = c(4.28, 5.49, 6.80, 8.16, 9.59, 11.18, 13.05, 15.28, 17.86, 20.81, 24.12, 27.75, 31.68, 35.87, 40.31, 44.92, 49.61))
> d2 <- data.frame(X2 = c(0.32, 4.00, 8.00, 12.31, 16.87, 21.64, 26.60, 31.75, 37.08, 42.62, 48.36, 54.33, 60.59, 67.25, 74.48, 82.42),
Y2 = c(57.55, 52.67, 47.98, 43.49, 39.24, 35.26, 31.59, 28.24, 25.19, 22.42, 19.92, 17.65, 15.55, 13.54, 11.54, 9.47))
> d3 <- data.frame(X3 = c(0.04, 1.76, 3.61, 5.63, 7.89, 10.42, 13.19, 16.14, 19.25, 22.61, 26.29, 30.35, 34.83, 39.71, 44.97, 50.58, 56.47, 62.56, 68.79, 75.19, 81.82),
Y3 = c(58.34, 55.97, 53.49, 50.89, 48.15, 45.27, 42.35, 39.43, 36.50, 33.57, 30.66, 27.85, 25.18, 22.66, 20.27, 18.02, 15.94, 14.02, 12.22, 10.48, 8.83))
my.lists <- list(d1, d2, d3)
df12 <- do.call(qpcR:::cbind.na, my.lists)
now I would like to calculate the radius of each trajectory in the df12 data frame.
from each trajectory we can get useful parameters such as arc length and chord length as follows:
#for arc length
> library(geosphere) ## for calculating distance between successive points #step-1:
> d11 <- data.frame(Distance = sqrt(diff(d1$X1)^2 + diff(d1$Y1)^2)) #step-2:
> d11$csum1 <- ave(d11$Distance, FUN=cumsum) #step-3:
#for chord length
> sqrt((d1[1,1]-d1[17,1])^2+(d1[1,2]-d1[17,2])^2)
is there any approach to calculate the radius of a trajectory?
thanks in advance
It is not completely clear what you mean with "radius of a trajectory".
Here I assume you want to fit data points of a trajectory with a circle
that runs through these points. We will look at your first example.
library(pracma)
x <- d1$X1; y <- d1$Y1 # data points
res <- circlefit(x, y, fast=TRUE) # "fitting a circle"
res
## RMS error: 0.2696326
## [1] 93.85894 123.25466 118.51384
This computes a circle of radius r0 = 118.51384 and the center at
(93.85894, 123.25466) with an RMS error of about 0.27. To visualize:
x0 <- res[1]; y0 <- res[2]; r0 <- res[3] # center and radius
ts <- seq(0, 2*pi, length.out = 100)
xs <- x0 + r0*cos(ts); ys <- y0 + r0*sin(ts)
plot(xs, ys, type='l', col = "red", asp=1)
points(x, y); grid()
Do not use the fast = FALSE option in circlefit, the optimization
process steered by optim() runs wild.
You can also apply function CircleFitBy...() in package conicfit --
maybe that's even the better alternative. It depends also on what your
preferred 'measure of fit' is.
I'm trying to make a 2d density plot where the density is displayed for each category. For example, in the image below, we have a density plot for each day, and all the daily densities are combined into the coloured plots. These types of plots are common in the scientific literature on atmospheric sciences and aerosol pollution studies.
So far I've got this
ggplot(dat, aes(y = `dN/dlogDp`, x = date)) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
scale_fill_gradient(low="blue", high="red") +
geom_point(alpha = 0.1) +
theme_bw()
But I want to facet it by day, and I'm not sure where to start.
Here are the example data:
structure(list(date = structure(c(1359244800, 1359245400, 1359246000,
1359246600, 1359247200, 1359247800, 1359248400, 1359249000, 1359249600,
1359250200, 1359250800, 1359251400, 1359252000, 1359252600, 1359253200,
1359253800, 1359254400, 1359255000, 1359255600, 1359256200, 1359256800,
1359257400, 1359258000, 1359258600, 1359259200, 1359259800, 1359260400,
1359261000, 1359261600, 1359262200, 1359262800, 1359263400, 1359264000,
1359264600, 1359265200, 1359265800, 1359266400, 1359267000, 1359267600,
1359268200, 1359268800, 1359269400, 1359270000, 1359270600, 1359271200,
1359271800, 1359272400, 1359273000, 1359273600, 1359274200, 1359274800,
1359275400, 1359276000, 1359276600, 1359277200, 1359277800, 1359278400,
1359279000, 1359279600, 1359280200, 1359280800, 1359281400, 1359282000,
1359282600, 1359283200, 1359283800, 1359284400, 1359285000, 1359285600,
1359286200, 1359286800, 1359287400, 1359288000, 1359288600, 1359289200,
1359289800, 1359290400, 1359291000, 1359291600, 1359292200, 1359292800,
1359293400, 1359294000, 1359294600, 1359295200, 1359295800, 1359296400,
1359297000, 1359297600, 1359298200, 1359298800, 1359299400, 1359300000,
1359300600, 1359301200, 1359301800, 1359302400, 1359303000, 1359303600,
1359304200), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
`dN/dlogDp` = c(49.8, 49.275, 47.4, 47.975, 48.625, 51.725,
50.7, 47.55, 45.975, 45.35, 45.4, 47.75, 49.625, 48.225,
47.65, 47.3, 48.75, 50.075, 34.725, 42.025, 48.825, 52.25,
54.05, 49.15, 34.6, 34.375, 42.85, 30.325, 43.15, 36.875,
32.85, 36.85, 35.725, 39.8, 38.65, 40.1, 42.675, 38.5, 37.2,
34.425, 25.2, 14.725, 22.675, 14.875, 37.45, 46.025, 49.275,
35.425, 30, 38.9, 28.6, 41.675, 46.05, 48.6, 62.425, 62.65,
61.7, 49.5, 70.05, 71.875, 59.4, 38.525, 36.85, 25.625, 14.675,
14.7, 14.6, 14.725, 15.6, 15, 14.6, 14.75, 15.05, 14.975,
15.425, 15.1, 15.95, 14.95, 15, 14.6, 14.725, 14.85, 15.175,
28.95, 14.975, 14.725, 16.6, 18.925, 53.225, 60.2, 56.425,
54.55, 41.4, 19.025, 19.825, 31.875, 14.85, 16.375, 16.65,
34.325), Diameter = c(14.6, 15.1, 15.7, 16.3, 16.8, 17.5,
18.1, 18.8, 19.5, 20.2, 20.9, 21.7, 22.5, 23.3, 24.1, 25,
25.9, 26.9, 27.9, 28.9, 30, 31.1, 32.2, 33.4, 34.6, 35.9,
37.2, 38.5, 40, 41.4, 42.9, 44.5, 46.1, 47.8, 49.6, 51.4,
53.3, 55.2, 57.3, 59.4, 61.5, 63.8, 66.1, 68.5, 71, 73.7,
76.4, 79.1, 82, 85.1, 88.2, 91.4, 94.7, 98.2, 101.8, 105.5,
109.4, 113.4, 117.6, 121.9, 126.3, 131, 135.8, 140.7, 145.9,
151.2, 156.8, 162.5, 168.5, 174.7, 181.1, 187.7, 194.6, 201.7,
209.1, 216.7, 224.7, 232.9, 241.4, 250.3, 259.5, 269, 278.8,
289, 299.6, 310.6, 322, 333.8, 346, 358.7, 371.8, 385.4,
399.5, 414.2, 429.4, 445.1, 461.4, 478.3, 495.8, 514)), .Names = c("date",
"dN/dlogDp", "Diameter"), row.names = c(NA, 100L), class = c("tbl_df",
"tbl", "data.frame"))
UPDATE This question is misguided and I now think that using categories isn't relevant to recreating this plot. These other questions are more closely related to the task of recreating this plot:
geom_raster interpolation with log scale
Use R to recreate contour plot made in Igor
And after I asked this question I have been keeping an updated gist of R code that combines details from the answers to these questions, and successfully replicates these plots (example output included in the gist). That gist is here: https://gist.github.com/benmarwick/9a54cbd325149a8ff405
The key steps are to strip away much of the decoration in the panels, and use scale_*_continuous(expand = c(0,0)) to make the density plot fill the entire panel. Here's an example of how to put it together:
# get the day and hour to use as facet panels
dat$day <- as.Date(dat$date)
dat$hour <- as.numeric(format(dat$date, "%H"))
library(ggplot2)
library(viridis)
# theme to suppress many details
squeeze_grid_theme <- theme_bw() + theme(axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank(),
strip.text = element_blank(),
strip.background = element_blank(),
panel.margin.y = unit(0, "lines"),
panel.margin.x = unit(-1,"lines"),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.text.x = element_text(margin=margin(0,0,0,0,"pt")),
axis.text.y = element_text(margin=margin(0,0,0,0,"pt")))
p <- ggplot(dat, aes(z = Diameter, y = `dN/dlogDp`, x = date)) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
scale_fill_viridis() +
geom_point(alpha = 0.1) +
facet_grid(~hour) +
scale_y_continuous(expand = c(0,0)) +
scale_x_datetime(expand = c(0,0)) +
squeeze_grid_theme
p
Then we get a separate density plot for each hour, tightly squeezed together like the example plot in the question.