Flow duration curve (fdc) extract low threshold

I am a newbie working with streamflow duration curves and the fdc function.
I am working with more than 300 series and I want to save the low-flow threshold value (Qlow.thr) that appears in the generated plot.
Here is a reproducible example:
library(hydroTSM)  # fdc() comes from the hydroTSM package
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98, 4.21, 2.51, 38.29, 8.57, 2.48, 3.78, 18.09, 15.16, 13.46, 8.69, 6.85, 11.97, 12.10, 9.87, 21.89, 2.60, 2.40, 27.40, 4.94, 83.17, 12.10, 5.08, 12.42, 6.19, 3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91, 21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60, 12.73, 4.17, 6.70, 16.45)
fdc(dat, plot = TRUE, lQ.thr = 0.8, ylab = 'Hm3', main = 'Upstream monthly duration curve', thr.shw = TRUE)
The fdc function returns a vector of probabilities, but I am not sure how to convert these probabilities back to the original units and select the value at the 80% threshold, expressed in Hm3, as I would do with qnorm when working with normal probabilities.
Thank you so much.

You can construct the FDC yourself as follows:
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98,
4.21, 2.51, 38.29, 8.57, 2.48 , 3.78, 18.09 ,15.16,
13.46, 8.69, 6.85, 11.97, 12.10, 9.87 ,21.89, 2.60,
2.40, 27.40, 4.94, 83.17 ,12.10, 5.08 ,12.42, 6.19,
3.60 ,32.58, 53.69, 38.49,3.61, 14.84, 34.48, 1.91,
21.79, 31.53, 6.70, 9.52, 22.64, 1.80 , 8.13, 10.60,
12.73, 4.17, 6.70 ,16.45)
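# sort the flows from largest to smallest
dat <- sort(dat, decreasing = TRUE)
# x: percentage of time each flow is equalled or exceeded; y: the flow itself
df <- data.frame(x = 100 / length(dat) * seq_along(dat), y = dat)
plot(x = df$x, y = df$y, type = "l", log = "y")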
So the sorted flow data is simply plotted against the percentage-exceedance scale. This scale is built by dividing 100% by the number of data points, which gives the increment between consecutive points.
Therefore
quantile(dat, probs = c(0.2, 0.8), type = 1)
gives you the desired results.
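To make the probability-to-units conversion explicit, here is a minimal sketch. A flow that is exceeded a fraction p of the time is the (1 - p) quantile of the data, so an exceedance probability such as lQ.thr = 0.8 maps to probs = 0.2. The name `exceed` is hypothetical, and type = 1 is chosen so the result is an observed value:

```r
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98, 4.21, 2.51,
         38.29, 8.57, 2.48, 3.78, 18.09, 15.16, 13.46, 8.69, 6.85, 11.97,
         12.10, 9.87, 21.89, 2.60, 2.40, 27.40, 4.94, 83.17, 12.10, 5.08,
         12.42, 6.19, 3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91,
         21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60, 12.73, 4.17,
         6.70, 16.45)

# Exceedance probabilities of interest (hypothetical variable name)
exceed <- c(low = 0.8, high = 0.2)

# Exceeded p of the time == (1 - p) non-exceedance quantile;
# type = 1 inverts the empirical CDF, so it returns observed values.
thr <- quantile(dat, probs = 1 - exceed, type = 1, names = FALSE)
names(thr) <- names(exceed)
thr
```

With this rule the high-flow value is 26.39 rather than the 27.40 that fdc reports, because fdc indexes the sorted series differently.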
Notice that fdc computes the quantile differently. It seems to simply use
p <- c(0.8, 0.2)
dat[round(p * length(dat))]
#> [1]  4.21 27.40
to compute the values (with dat sorted in decreasing order, as above).
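Since the question involves more than 300 series, one option is to wrap that indexing rule in a small function and apply it to every series. This sketch assumes the series are stored in a named list; `low_thr()` and `series` are hypothetical names, and the rule mirrors the `dat[round(p * length(dat))]` indexing inferred above, not fdc's documented internals:

```r
# Hypothetical helper: flow equalled or exceeded a fraction p of the time,
# using the same indexing rule as dat[round(p * length(dat))].
low_thr <- function(x, p = 0.8) {
  x <- sort(x[!is.na(x)], decreasing = TRUE)  # drop NAs, largest flow first
  x[round(p * length(x))]
}

dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98, 4.21, 2.51,
         38.29, 8.57, 2.48, 3.78, 18.09, 15.16, 13.46, 8.69, 6.85, 11.97,
         12.10, 9.87, 21.89, 2.60, 2.40, 27.40, 4.94, 83.17, 12.10, 5.08,
         12.42, 6.19, 3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91,
         21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60, 12.73, 4.17,
         6.70, 16.45)

# Hypothetical collection of series; in practice it could be built with
# lapply() over files or split() over a long data frame.
series <- list(upstream = dat, doubled = dat * 2)

thresholds <- sapply(series, low_thr)  # one Qlow.thr per series
thresholds
```

The named vector returned by sapply() can then be written out with write.csv() or joined to station metadata.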

Related

Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates for Bb

I am trying to fit an equation through the given data but have been unsuccessful
data <- data.frame(x=c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
y=c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
frequency <- (data$x)
brightness <- (data$y)*2.71057477e-3
# Define the Planck blackbody equation as a function in R
B <- function(frequency, t) {
h <- 6.62607015e-34
c <- 299792458
k <- 1.380649e-23
(2 * h * frequency^3 * c^-2) / (exp((h * frequency) / (k * t)) - 1)
}
library(stats)
fit <- nls(brightness ~ B(frequency, t), data = data, start = list(t = 2.5))
# Summarize the fit
summary(fit)
# Plot the data and the fitted model
library(ggplot2)
ggplot(data, aes(x = frequency, y = brightness)) +
geom_point() +
geom_line(aes(x = frequency, y = predict(fit)))
I just need a basic scatter plot of the data with the equation fitted over it as a line

data <- data.frame(x=c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
y=c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
data$frequency <- (data$x)
data$brightness <- (data$y)*2.71057477e-3
# Define the Planck blackbody equation as a function in R
B <- function(frequency, t) {
h <- 6.62607015e-34
c <- 299792458
k <- 1.380649e-23
(2 * h * frequency^3 * c^-2) / (expm1((h * frequency) / (k * t)))
}
library(stats)
fit <- nls(brightness ~ B(frequency, t), data = data, start = list(t = 2.5))
# Summarize the fit
summary(fit)
# Plot the data and the fitted model
library(ggplot2)
ggplot(data, aes(x = frequency, y = brightness)) +
geom_point() +
geom_line(aes(x = frequency, y = predict(fit)))

As already asked in comments on your previous questions: what are the units of frequency and brightness?
Going back to your original question, data$x is in 1/cm, so it is a wavenumber, not a frequency, and brightness is in MJy/sr. Since 1 Jy = 1e-26 W m^-2 Hz^-1, 1 MJy/sr = 1e-20 W m^-2 Hz^-1 sr^-1, i.e. 1e-20 kg/s^2 per steradian. (The factor 1.256E-19 is 4π × 1e-20; it integrates over the whole sphere and no longer matches a per-steradian radiance.)
So if you use the form of Planck's law that matches these units, the spectral radiance per unit frequency B_nu = 2 h nu^3 / c^2 / (exp(h nu / (k T)) - 1) with nu = c × wavenumber, and convert the units consistently, nls is able to fit the data.
data <- data.frame(freq = c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
brightness = c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
planck <- function(freq, t, h, c, k) {
# freq is a wavenumber (1/m); with nu = c * freq,
# 2 * h * nu^3 / c^2 / (exp(h * nu / (k * t)) - 1) becomes:
(2 * h * c * freq^3) / expm1((h * c * freq) / (k * t))
}
# physical constants in SI units
h <- 6.62607015e-34 # Planck constant, J s
c <- 299792458 # speed of light, m/s
k <- 1.380649e-23 # Boltzmann constant, J/K
data$freq <- data$freq * 100 # convert wavenumber from 1/cm to 1/m
# fit the data using nls; 1e-20 converts MJy/sr to W m^-2 Hz^-1 sr^-1
fit <- nls(brightness * 1e-20 ~ planck(freq, t, h, c, k), start = list(t = 3), data = data)
summary(fit)
# convert the fitted values back to MJy/sr
brightness_fit <- predict(fit, data) / 1e-20
# plot the data and the fitted curve
plot(data$freq, data$brightness, xlab = "Wavenumber (1/m)", ylab = "Brightness (MJy/sr)")
lines(data$freq, brightness_fit, col = "red")
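An alternative sketch, assuming wavenumbers in 1/cm and brightness in MJy/sr (1 MJy/sr = 1e-20 W m^-2 Hz^-1 sr^-1): instead of scaling the response down to SI units, scale the model up to MJy/sr, so nls works with values between roughly 1 and 400 rather than around 1e-18. `planck_mjy()` and the variable names are hypothetical:

```r
# Data from the question: wavenumber in 1/cm, brightness in MJy/sr
wn_cm <- c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81,
           7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34,
           11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43,
           15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51,
           19.97, 20.42, 20.87, 21.33)
y_mjy <- c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493,
           383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924,
           271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750,
           128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223,
           42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811,
           11.716, 9.921, 8.364, 7.087, 5.801, 4.523)

h <- 6.62607015e-34   # Planck constant, J s
c_light <- 299792458  # speed of light, m/s
k_B <- 1.380649e-23   # Boltzmann constant, J/K

# Planck's law per unit frequency, returned directly in MJy/sr
planck_mjy <- function(wn_cm, t) {
  nu <- c_light * wn_cm * 100                                   # 1/cm -> 1/m -> Hz
  b_si <- 2 * h * nu^3 / c_light^2 / expm1(h * nu / (k_B * t))  # W m^-2 Hz^-1 sr^-1
  b_si / 1e-20                                                  # -> MJy/sr
}

fit <- nls(y_mjy ~ planck_mjy(wn_cm, t), start = list(t = 3))
summary(fit)
t_hat <- coef(fit)[["t"]]

# scatter plot of the data with the fitted curve on top
plot(wn_cm, y_mjy, xlab = "Wavenumber (1/cm)", ylab = "Brightness (MJy/sr)")
lines(wn_cm, fitted(fit), col = "red")
```

Keeping the response in its original units also makes the residuals in summary(fit) directly readable in MJy/sr.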

How to plot a titration curve in R?

I would like to plot a smooth titration curve with empirical values in R. Unfortunately, I was not able to calculate the point of inflection of the curve where the equivalence point is located.
Do you have any ideas on how I can do this?
par(mfrow=c(1, 1))
vtit <- c(7.05, 9.00, 11.10, 13.00, 15.00, 17.00, 18.05, 18.95, 20.00, 21.00,
21.95, 23.05, 24.00, 25.05, 26.00, 28.10, 30.00, 33.05, 36.10, 39.05,
41.10, 42.10, 42.55, 43.15, 44.99)
vtit. <- vtit - 7.05
pH <- c(2.99, 3.48, 3.82, 4.02, 4.18, 4.30, 4.37, 4.42, 4.45, 4.51, 4.57, 4.64,
4.67, 4.74, 4.79, 4.86, 4.99, 5.18, 5.42, 5.77, 6.33, 9.01, 10.62,
11.06, 11.39)
plot(vtit., pH, type="o", lwd=2, main="Titration of acetic acid with 0.86M NaOH",
cex.main=0.8, xlab=expression(italic(V[NaOH])), ylab=expression(pH))
model <- lm(pH ~ poly(vtit.,17))
pHcurve <- predict(model)
lines(vtit., pHcurve, col='green', lwd=2)
abline(v=34.9, lty=2)

One option is to use the approx() function to interpolate the curve linearly onto a finer grid of points. The code below uses 200 points; you may want to increase or decrease this value to see how the result changes.
For this example the method works reasonably well:
plot(vtit., pH, type="o", lwd=2, main="Titration of acetic acid with 0.86M NaOH",
cex.main=0.8, xlab=expression(italic(V[NaOH])), ylab=expression(pH))
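# interpolate linearly onto 200 equally spaced points
#plot(approx(vtit., pH, n=200))
app <- approx(vtit., pH, n=200)
# slope between consecutive interpolated points; diff() is used because base R's
# stats::lag() only shifts the time base of a ts, not the values of a plain vector
slope <- diff(app$y) / diff(app$x)
# titration point with max slope (right end of the steepest segment)
equ_pt <- app$x[which.max(slope) + 1]
# plot the initial estimate against the found point
abline(v=34.9, lty=2)
abline(v=equ_pt, lty=2, col="red")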
This draws the original chart with the initial estimate (black dashed line) and the new estimate (red dashed line).
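For a curve that is both smooth and differentiable, a different technique is a monotone cubic spline: splinefun(method = "monoH.FC") builds a Fritsch-Carlson interpolant that does not overshoot between points (unlike the degree-17 polynomial), and its first derivative can be evaluated directly. This is a sketch, assuming the inflection (equivalence) point is taken as the maximum of the first derivative on a fine grid; `equ_pt_spline` is a hypothetical name:

```r
vtit <- c(7.05, 9.00, 11.10, 13.00, 15.00, 17.00, 18.05, 18.95, 20.00, 21.00,
          21.95, 23.05, 24.00, 25.05, 26.00, 28.10, 30.00, 33.05, 36.10, 39.05,
          41.10, 42.10, 42.55, 43.15, 44.99)
pH <- c(2.99, 3.48, 3.82, 4.02, 4.18, 4.30, 4.37, 4.42, 4.45, 4.51, 4.57, 4.64,
        4.67, 4.74, 4.79, 4.86, 4.99, 5.18, 5.42, 5.77, 6.33, 9.01, 10.62,
        11.06, 11.39)
vtit. <- vtit - 7.05

# Monotone cubic (Fritsch-Carlson) interpolation through the measured points
f <- splinefun(vtit., pH, method = "monoH.FC")

# Evaluate the first derivative dpH/dV on a fine grid
grid <- seq(min(vtit.), max(vtit.), length.out = 2000)
d1 <- f(grid, deriv = 1)

# Equivalence point: where the curve is steepest (its inflection point)
equ_pt_spline <- grid[which.max(d1)]

plot(vtit., pH, lwd = 2, xlab = expression(italic(V[NaOH])), ylab = "pH")
lines(grid, f(grid), col = "blue", lwd = 2)
abline(v = equ_pt_spline, lty = 2, col = "blue")
equ_pt_spline
```

Both this and the approx() approach look for the steepest part of the curve, so they should land close together; the spline additionally gives a smooth curve to draw.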

Frequency table for intervals

I saved data into the object datos so I could calculate the AF (absolute frequency) and RF (relative frequency) of the continuous variable in column V1. But I want the frequencies to be computed over intervals.
I don't really know how to do it, so I need your help. Here is my code:
k is the number of intervals I'm using
and largo is the number of observations I have.
read.table("datos.txt", header = FALSE)-> datos
largo<-length(datos$V1)
k<- (1+log2(largo))
k<-round(k,digits = 0)
vectordatos <- datos$V1
histograma<-hist(datos$V1,breaks=k)
FA<-table(datos$V1)
FR<-table(datos$V1)/largo
FA
FR
The datos object is as follows:
datos = structure(list(V1 = c(6.16, 5.83, 5.66, 3.63, 1.38, 9.64, 7.46,
5.34, 7.93, 8.5, 4.18, 5.18, 10.27, 5.41, 4.76, 4.67, 10.02,
7.1, 5.38, 8.55, 4.85, 8.28, 2.9, 7.18, 6.54, 5.66, 7.26, 6.45,
3.97, 6.55, 5.15, 7.83, 5.52, 7.21, 7.3, 6.19)), class = "data.frame", row.names = c(NA,
-36L))

You can use cut to create k intervals and table to count the observations per interval (with the question's k, which is 6 for 36 observations):
table(cut(datos$V1,k))
Output:
(1.37,2.86] (2.86,4.34] (4.34,5.83] (5.83,7.31] (7.31,8.79] (8.79,10.3]
          1           4          11          11           6           3
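Building on that, a sketch that lays out absolute, relative and cumulative frequencies per interval in one data frame. The cumulative columns and the `freq_table` layout are additions not in the question; k follows the question's Sturges' rule, k = 1 + log2(n), and prop.table() turns counts into proportions:

```r
# datos as given in the question (36 observations in column V1)
datos <- data.frame(V1 = c(6.16, 5.83, 5.66, 3.63, 1.38, 9.64, 7.46,
  5.34, 7.93, 8.5, 4.18, 5.18, 10.27, 5.41, 4.76, 4.67, 10.02,
  7.1, 5.38, 8.55, 4.85, 8.28, 2.9, 7.18, 6.54, 5.66, 7.26, 6.45,
  3.97, 6.55, 5.15, 7.83, 5.52, 7.21, 7.3, 6.19))

largo <- length(datos$V1)
k <- round(1 + log2(largo))      # Sturges' rule, as in the question

intervals <- cut(datos$V1, k)    # k equal-width intervals
AF <- table(intervals)           # absolute frequency per interval
RF <- prop.table(AF)             # relative frequency per interval

freq_table <- data.frame(
  interval = names(AF),
  AF = as.vector(AF),
  RF = as.vector(RF),
  cum_AF = cumsum(as.vector(AF)),
  cum_RF = cumsum(as.vector(RF))
)
freq_table
```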

Getting the distance matrix back from already clustered data

I have used hclust, together with the TSclust package, to do agglomerative hierarchical clustering. My question is: can I get the dissimilarity (distance) matrix back from hclust? I want the distance values to rank which variables are closest to a single variable in the group.
For example, if x1, x2, ..., x10 are the variables used to form the distance matrix, then what I want is the distance between x3 and each of the other variables (x3-x1, x3-x2, x3-x4, x3-x5, and so on). Can we do that? Here is the code and reproducible data.
Data:
cdata <- structure(list(x1 = c(186.41, 100.18, 12.3, 14.38, 25.97, 0.06,
0, 6.17, 244.06, 19.26, 256.18, 255.69, 121.88, 75, 121.45, 11.34,
34.68, 3.09, 34.3, 26.13, 111.31), x2 = c(327.2, 8.05, 4.23,
6.7, 3.12, 1.91, 37.03, 39.17, 140.06, 83.72, 263.29, 261.22,
202.48, 23.27, 2.87, 7.17, 14.48, 3.41, 5.95, 70.56, 91.58),
x3 = c(220.18, 126.14, 98.59, 8.56, 0.5, 0.9, 17.45, 191.1,
164.64, 224.36, 262.86, 237.75, 254.88, 42.05, 9.12, 0.04,
12.22, 0.61, 61.86, 114.08, 78.94), x4 = c(90.74, 26.11,
47.86, 10.86, 3.74, 23.69, 61.79, 68.12, 87.92, 171.76, 260.98,
266.62, 96.27, 57.15, 78.89, 16.73, 6.59, 49.44, 57.21, 202.2,
67.17), x5 = c(134.09, 27.06, 7.44, 4.53, 17, 47.66, 95.96,
129.53, 40.23, 157.37, 172.61, 248.56, 160.84, 421.94, 109.93,
22.77, 2.11, 49.18, 64.13, 52.61, 180.87), x6 = c(173.17,
46.68, 6.54, 3.05, 0.35, 0.12, 5.09, 72.46, 58.19, 112.31,
233.77, 215.82, 100.63, 65.84, 2.69, 0.01, 3.63, 12.93, 66.55,
28, 61.74), x7 = c(157.22, 141.81, 19.98, 116.18, 16.55,
122.3, 62.67, 141.84, 78.3, 227.27, 340.22, 351.38, 147.73,
0.3, 56.12, 33.2, 5.51, 54.4, 82.98, 152.66, 218.26), x8 = c(274.08,
51.92, 54.86, 15.37, 0.31, 0.05, 36.3, 162.04, 171.78, 181.39,
310.73, 261.55, 237.99, 123.99, 1.92, 0.74, 0.23, 18.51,
7.68, 65.55, 171.33), x9 = c(262.71, 192.34, 2.75, 21.68,
1.69, 3.92, 0.09, 9.33, 120.36, 282.92, 236.7, 161.59, 255.44,
126.44, 7.63, 2.04, 1.02, 0.12, 5.87, 146.25, 134.11), x10 = c(82.71,
44.09, 1.52, 2.63, 4.38, 28.64, 168.43, 80.62, 20.36, 39.29,
302.31, 247.52, 165.73, 18.27, 2.67, 1.77, 23.13, 53.47,
53.14, 46.61, 86.29)), class = "data.frame", row.names = c(NA,
-21L))
Code:
as.ts(cdata)
library(dplyr) # data wrangling
library(ggplot2) # grammar of graphics
library(ggdendro) # dendrograms
library(TSclust) # cluster time series
# cluster analysis
dist_ts <- TSclust::diss(SERIES = t(cdata), METHOD = "INT.PER") # note the data frame must be transposed
hc <- stats::hclust(dist_ts, method = "complete") # method can also be "average"; for DIvisive ANAlysis clustering see cluster::diana()
hcdata <- ggdendro::dendro_data(hc)
names_order <- hcdata$labels$label
# To remove the labels from the dendrogram so they are not doubled up (though they are useful for checking), run: hcdata$labels$label <- ""
hcdata %>% ggdendro::ggdendrogram(., rotate = FALSE, leaf_labels = FALSE)

I believe the object you are looking for is stored in the variable dist_ts:
dist_ts <- TSclust::diss(SERIES = t(cdata), METHOD = "INT.PER")
print(dist_ts)
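The object returned by hclust() stores the merge steps, merge heights, leaf order and labels, but not the input dissimilarities, so the ranking has to come from the dist object itself. A minimal sketch using a small toy matrix and stats::dist() as a stand-in for TSclust::diss() so that it runs without extra packages; `m`, `dmat` and `from_a` are hypothetical names. cophenetic() is shown as well, because it gives the distances implied by the tree, which differ from the input distances:

```r
# Toy data: three hypothetical series as rows (stand-in for t(cdata))
m <- rbind(a = c(0, 0), b = c(3, 4), c = c(0, 10))
d <- dist(m)  # stand-in for TSclust::diss(SERIES = t(cdata), METHOD = "INT.PER")

# Full symmetric matrix; row/column names come from the labels
dmat <- as.matrix(d)

# Distances from series "a" to every other series, closest first
from_a <- sort(dmat["a", colnames(dmat) != "a"])
from_a

# Distances implied by the dendrogram (merge heights), not the input ones
hc <- hclust(d, method = "complete")
coph <- as.matrix(cophenetic(hc))
coph
```

With the question's objects, the analogous call would be sort(as.matrix(dist_ts)["x3", ]), dropping the zero self-distance, assuming diss() keeps the column names of cdata as labels.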

Rolling Regression Data Frame

I appreciate this may have been asked before, but I have not found a clear solution that works over a data frame.
I want to run a rolling linear regression over a look-back window of 5 days (kept small so I can illustrate it here).
So far I am trying:
rollingbeta <- rollapply(df,
width=5,
FUN = function(Z)
{
t = lm(formula=y_Close ~ x_Close+0, data = as.data.frame(Z));
return(t$coef)[1]
},
by.column=FALSE, align="right",fill = NA)
head(rollingbeta,100)
However, I expect to get the beta for each rolling look-back window. Instead I get an output with 10 columns.
> NCOL(rollingbeta)
[1] 10
Can anyone assist?
Here is dummy data (save it to a .txt file and read it in):
df <- read.table("your_dir/df.txt", header=TRUE, sep="", stringsAsFactors=FALSE)
Date open.x high.x low.x x_Close volume.x open.y high.y low.y y_Close volume.y x.y.cor
1451 2010-01-04 57.32 58.13 57.32 57.85 442900 6.61 6.8400 6.61 6.83 833100 NA
1452 2010-01-05 57.90 58.33 57.54 58.20 436900 6.82 7.1200 6.80 7.12 904500 NA
1453 2010-01-06 58.20 58.56 58.01 58.42 850600 7.05 7.3800 7.05 7.27 759800 NA
1454 2010-01-07 58.31 58.41 57.14 57.90 463600 7.24 7.3000 7.06 7.11 557800 NA
1455 2010-01-08 57.45 58.62 57.45 58.47 206500 7.08 7.3500 6.95 7.29 588100 NA
1456 2010-01-11 58.79 59.00 57.22 57.73 331900 7.38 7.4500 7.17 7.22 450500 NA
1457 2010-01-12 57.20 57.21 56.15 56.34 428500 7.15 7.1900 6.87 7.00 694700 NA
1458 2010-01-13 56.32 56.66 54.83 56.56 577500 7.05 7.1700 6.98 7.15 528800 NA
1459 2010-01-14 56.51 57.05 55.37 55.53 368100 7.08 7.1701 7.08 7.11 279900 NA
1460 2010-01-15 56.59 56.59 55.19 55.84 417900 7.03 7.0500 6.95 7.03 407600 NA
The output for the first rolling linear regression should be:
NA NA NA NA NA 0.1229065

A faster alternative to wibeasley's answer is to use the rollRegres package as follows:
ds <- structure(list(Date = structure(
c(14613, 14614, 14615, 14616, 14617, 14620, 14621, 14622, 14623, 14624), class = "Date"),
open.x = c(57.32, 57.9, 58.2, 58.31, 57.45, 58.79, 57.2, 56.32, 56.51, 56.59),
high.x = c(58.13, 58.33, 58.56, 58.41, 58.62, 59, 57.21, 56.66, 57.05, 56.59),
low.x = c(57.32, 57.54, 58.01, 57.14, 57.45, 57.22, 56.15, 54.83, 55.37, 55.19),
x_Close = c(57.85, 58.2, 58.42, 57.9, 58.47, 57.73, 56.34, 56.56, 55.53, 55.84),
volume.x = c(442900L, 436900L, 850600L, 463600L, 206500L, 331900L, 428500L, 577500L, 368100L, 417900L),
open.y = c(6.61, 6.82, 7.05, 7.24, 7.08, 7.38, 7.15, 7.05, 7.08, 7.03),
high.y = c(6.84, 7.12, 7.38, 7.3, 7.35, 7.45, 7.19, 7.17, 7.1701, 7.05),
low.y = c(6.61, 6.8, 7.05, 7.06, 6.95, 7.17, 6.87, 6.98, 7.08, 6.95),
y_Close = c(6.83, 7.12, 7.27, 7.11, 7.29, 7.22, 7, 7.15, 7.11, 7.03),
volume.y = c(833100L, 904500L, 759800L, 557800L, 588100L, 450500L, 694700L, 528800L, 279900L, 407600L)),
row.names = c(NA, -10L), class = "data.frame")
# both packages give the same coefficients
library(roll)
library(rollRegres)
X <- as.matrix(ds$x_Close)
Y <- ds$y_Close
Ymat <- as.matrix(Y)
all.equal(
roll_lm(x = X, y = Ymat, intercept = FALSE, width = 5L)$coefficients[, 2],
drop(roll_regres.fit(x = X, y = Y, width = 5L)$coefs),
check.attributes = FALSE)
#R [1] TRUE
You can also fit the model with a formula, as with lm, using the roll_regres function:
all.equal(
roll_lm(x = X, y = Ymat, intercept = FALSE, width = 5L)$coefficients[, 2],
drop(roll_regres(y_Close ~ x_Close - 1, ds, width = 5L)$coefs),
check.attributes = FALSE)
#R [1] TRUE
Here is a benchmark of the computation speed
# We add a few more observations to get a more interesting example
set.seed(1)
n <- 250 * 5 # 5 years of trading data
X <- as.matrix(rnorm(n))
Y <- rnorm(n)
Ymat <- as.matrix(Y)
microbenchmark::microbenchmark(
roll_lm(x = X, y = Ymat, intercept = FALSE, width = 5L),
roll_regres.fit(x = X, y = Y, width = 5L),
times = 1e3)
#R Unit: microseconds
#R expr min lq mean median uq max neval
#R roll_lm(x = X, y = Ymat, intercept = FALSE, width = 5L) 663.7 739.9 834.2 777.1 860.2 3972.3 1000
#R roll_regres.fit(x = X, y = Y, width = 5L) 186.9 204.6 237.4 224.8 248.3 546.4 1000

Consider using the roll package.
library(magrittr); requireNamespace("roll")
ds <- readr::read_csv(
" Date, open.x, high.x, low.x, x_Close, volume.x, open.y, high.y, low.y, y_Close, volume.y
2010-01-04, 57.32, 58.13, 57.32, 57.85, 442900, 6.61, 6.8400, 6.61, 6.83, 833100
2010-01-05, 57.90, 58.33, 57.54, 58.20, 436900, 6.82, 7.1200, 6.80, 7.12, 904500
2010-01-06, 58.20, 58.56, 58.01, 58.42, 850600, 7.05, 7.3800, 7.05, 7.27, 759800
2010-01-07, 58.31, 58.41, 57.14, 57.90, 463600, 7.24, 7.3000, 7.06, 7.11, 557800
2010-01-08, 57.45, 58.62, 57.45, 58.47, 206500, 7.08, 7.3500, 6.95, 7.29, 588100
2010-01-11, 58.79, 59.00, 57.22, 57.73, 331900, 7.38, 7.4500, 7.17, 7.22, 450500
2010-01-12, 57.20, 57.21, 56.15, 56.34, 428500, 7.15, 7.1900, 6.87, 7.00, 694700
2010-01-13, 56.32, 56.66, 54.83, 56.56, 577500, 7.05, 7.1700, 6.98, 7.15, 528800
2010-01-14, 56.51, 57.05, 55.37, 55.53, 368100, 7.08, 7.1701, 7.08, 7.11, 279900
2010-01-15, 56.59, 56.59, 55.19, 55.84, 417900, 7.03, 7.0500, 6.95, 7.03, 407600"
)
runs <- roll::roll_lm(
x = as.matrix(ds$x_Close),
y = as.matrix(ds$y_Close),
width = 5,
intercept = FALSE
)
# Nested in a named-column, within a matrix, within a list.
ds$beta <- runs$coefficients[, "x1"]
ds$beta
# [1] NA NA NA NA 0.1224813
# [6] 0.1238653 0.1242478 0.1246279 0.1256553 0.1259121
Double-check the alignment of the variables in your dataset. x_Close is around 50, while y_Close is around 7. That might explain the small disparity between the expected 0.1229065 and the 0.1224813 value above.
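For comparison, a dependency-free sketch in base R: it loops over right-aligned windows and fits lm(y ~ x + 0) on each one, passing only the two numeric columns (the question's rollapply() call received the whole data frame, Date column included). roll_beta() is a hypothetical helper, not part of any package:

```r
# Close prices from the question's dummy data
x_Close <- c(57.85, 58.2, 58.42, 57.9, 58.47, 57.73, 56.34, 56.56, 55.53, 55.84)
y_Close <- c(6.83, 7.12, 7.27, 7.11, 7.29, 7.22, 7, 7.15, 7.11, 7.03)

# Hypothetical helper: slope of y on x (no intercept) over a right-aligned
# rolling window; the first (width - 1) positions stay NA.
roll_beta <- function(y, x, width) {
  n <- length(y)
  beta <- rep(NA_real_, n)
  for (i in seq(width, n)) {
    idx <- (i - width + 1):i          # indices of the current window
    fit <- lm(y[idx] ~ x[idx] + 0)    # regression through the origin
    beta[i] <- coef(fit)[[1]]
  }
  beta
}

beta <- roll_beta(y_Close, x_Close, width = 5)
round(beta, 7)
```

The fifth and tenth values (0.1224813 and 0.1259121) match the roll_lm() output shown above; for long series the packaged implementations will be much faster than this loop.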
