I appreciate this may have been asked before, but I have not found a clear solution that works over a data frame.
I want to run a rolling linear regression with a lookback of 5 days (small, so I can illustrate it here).
So far I am trying:
library(zoo)  # rollapply comes from zoo
rollingbeta <- rollapply(df,
                         width = 5,
                         FUN = function(Z) {
                           t <- lm(y_Close ~ x_Close + 0, data = as.data.frame(Z))
                           return(coef(t)[1])
                         },
                         by.column = FALSE, align = "right", fill = NA)
head(rollingbeta,100)
However, I expect to get the beta for each rolling lookback window. Instead I get an output with 10 columns.
> NCOL(rollingbeta)
[1] 10
Can anyone assist?
Here is dummy data (save it to a .txt file and read it in):
df <- read.table("your_dir/df.txt", header = TRUE, sep = "", stringsAsFactors = FALSE)
Date open.x high.x low.x x_Close volume.x open.y high.y low.y y_Close volume.y x.y.cor
1451 2010-01-04 57.32 58.13 57.32 57.85 442900 6.61 6.8400 6.61 6.83 833100 NA
1452 2010-01-05 57.90 58.33 57.54 58.20 436900 6.82 7.1200 6.80 7.12 904500 NA
1453 2010-01-06 58.20 58.56 58.01 58.42 850600 7.05 7.3800 7.05 7.27 759800 NA
1454 2010-01-07 58.31 58.41 57.14 57.90 463600 7.24 7.3000 7.06 7.11 557800 NA
1455 2010-01-08 57.45 58.62 57.45 58.47 206500 7.08 7.3500 6.95 7.29 588100 NA
1456 2010-01-11 58.79 59.00 57.22 57.73 331900 7.38 7.4500 7.17 7.22 450500 NA
1457 2010-01-12 57.20 57.21 56.15 56.34 428500 7.15 7.1900 6.87 7.00 694700 NA
1458 2010-01-13 56.32 56.66 54.83 56.56 577500 7.05 7.1700 6.98 7.15 528800 NA
1459 2010-01-14 56.51 57.05 55.37 55.53 368100 7.08 7.1701 7.08 7.11 279900 NA
1460 2010-01-15 56.59 56.59 55.19 55.84 417900 7.03 7.0500 6.95 7.03 407600 NA
The output for the first rolling linear regression should be:
NA NA NA NA NA 0.1229065
A faster alternative to wibeasley's answer is to use the rollRegres package, as follows:
ds <- structure(list(Date = structure(
c(14613, 14614, 14615, 14616, 14617, 14620, 14621, 14622, 14623, 14624), class = "Date"),
open.x = c(57.32, 57.9, 58.2, 58.31, 57.45, 58.79, 57.2, 56.32, 56.51, 56.59),
high.x = c(58.13, 58.33, 58.56, 58.41, 58.62, 59, 57.21, 56.66, 57.05, 56.59),
low.x = c(57.32, 57.54, 58.01, 57.14, 57.45, 57.22, 56.15, 54.83, 55.37, 55.19),
x_Close = c(57.85, 58.2, 58.42, 57.9, 58.47, 57.73, 56.34, 56.56, 55.53, 55.84),
volume.x = c(442900L, 436900L, 850600L, 463600L, 206500L, 331900L, 428500L, 577500L, 368100L, 417900L),
open.y = c(6.61, 6.82, 7.05, 7.24, 7.08, 7.38, 7.15, 7.05, 7.08, 7.03),
high.y = c(6.84, 7.12, 7.38, 7.3, 7.35, 7.45, 7.19, 7.17, 7.1701, 7.05),
low.y = c(6.61, 6.8, 7.05, 7.06, 6.95, 7.17, 6.87, 6.98, 7.08, 6.95),
y_Close = c(6.83, 7.12, 7.27, 7.11, 7.29, 7.22, 7, 7.15, 7.11, 7.03),
volume.y = c(833100L, 904500L, 759800L, 557800L, 588100L, 450500L, 694700L, 528800L, 279900L, 407600L)),
row.names = c(NA, -10L), class = "data.frame")
# we get the same results from both packages
library(roll)
library(rollRegres)
X <- as.matrix(ds$x_Close)
Y <- ds$y_Close
Ymat <- as.matrix(Y)
all.equal(
roll_lm(x = X, y = Ymat, intercept = FALSE, width = 5L)$coefficients[, 2],
drop(roll_regres.fit(x = X, y = Y, width = 5L)$coefs),
check.attributes = FALSE)
#R [1] TRUE
You can also fit the model with a formula, as with lm, using the roll_regres function:
all.equal(
roll_lm(x = X, y = Ymat, intercept = FALSE, width = 5L)$coefficients[, 2],
drop(roll_regres(y_Close ~ x_Close - 1, ds, width = 5L)$coefs),
check.attributes = FALSE)
#R [1] TRUE
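If you need the rolling betas themselves, aligned with the rows of ds, they are stored in the coefs element used above; a minimal sketch building on the objects already defined:
betas <- drop(roll_regres(y_Close ~ x_Close - 1, ds, width = 5L)$coefs)
cbind(ds["Date"], beta = betas)  # first four rows are NA until the window fills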
Here is a benchmark of the computation speed:
# We add a few more observations to get an interesting example
set.seed(1)
n <- 250 * 5 # 5 years of trading data
X <- as.matrix(rnorm(n))
Y <- rnorm(n)
Ymat <- as.matrix(Y)
microbenchmark::microbenchmark(
roll_lm(x = X, y = Ymat, intercept = FALSE, width = 5L),
roll_regres.fit(x = X, y = Y, width = 5L),
times = 1e3)
#R Unit: microseconds
#R expr min lq mean median uq max neval
#R roll_lm(x = X, y = Ymat, intercept = FALSE, width = 5L) 663.7 739.9 834.2 777.1 860.2 3972.3 1000
#R roll_regres.fit(x = X, y = Y, width = 5L) 186.9 204.6 237.4 224.8 248.3 546.4 1000
Consider using the roll package.
library(magrittr); requireNamespace("roll")
ds <- readr::read_csv(
" Date, open.x, high.x, low.x, x_Close, volume.x, open.y, high.y, low.y, y_Close, volume.y
2010-01-04, 57.32, 58.13, 57.32, 57.85, 442900, 6.61, 6.8400, 6.61, 6.83, 833100
2010-01-05, 57.90, 58.33, 57.54, 58.20, 436900, 6.82, 7.1200, 6.80, 7.12, 904500
2010-01-06, 58.20, 58.56, 58.01, 58.42, 850600, 7.05, 7.3800, 7.05, 7.27, 759800
2010-01-07, 58.31, 58.41, 57.14, 57.90, 463600, 7.24, 7.3000, 7.06, 7.11, 557800
2010-01-08, 57.45, 58.62, 57.45, 58.47, 206500, 7.08, 7.3500, 6.95, 7.29, 588100
2010-01-11, 58.79, 59.00, 57.22, 57.73, 331900, 7.38, 7.4500, 7.17, 7.22, 450500
2010-01-12, 57.20, 57.21, 56.15, 56.34, 428500, 7.15, 7.1900, 6.87, 7.00, 694700
2010-01-13, 56.32, 56.66, 54.83, 56.56, 577500, 7.05, 7.1700, 6.98, 7.15, 528800
2010-01-14, 56.51, 57.05, 55.37, 55.53, 368100, 7.08, 7.1701, 7.08, 7.11, 279900
2010-01-15, 56.59, 56.59, 55.19, 55.84, 417900, 7.03, 7.0500, 6.95, 7.03, 407600"
)
runs <- roll::roll_lm(
x = as.matrix(ds$x_Close),
y = as.matrix(ds$y_Close),
width = 5,
intercept = FALSE
)
# Nested in a named column, within a matrix, within a list.
ds$beta <- runs$coefficients[, "x1"]
ds$beta
# [1] NA NA NA NA 0.1224813
# [6] 0.1238653 0.1242478 0.1246279 0.1256553 0.1259121
Double-check the alignment of the variables in your dataset. x_Close is around 50, while y_Close is around 7. That might explain the small disparity between the expected 0.1229065 and the 0.1224813 value above.
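For completeness, the original rollapply approach can also be made to return a single beta column; the key is to hand rollapply only the two price columns so nothing else enters the regression. A minimal sketch using the ds object above:
library(zoo)
z <- zoo(ds[, c("x_Close", "y_Close")], order.by = ds$Date)
rollingbeta <- rollapplyr(z, width = 5, by.column = FALSE, fill = NA,
                          FUN = function(w) coef(lm(y_Close ~ x_Close + 0, data = as.data.frame(w))))
rollingbeta  # NA for the first four days, then one beta per day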
I am trying to fit an equation through the given data but have been unsuccessful.
data <- data.frame(x=c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
y=c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
frequency <- (data$x)
brightness <- (data$y)*2.71057477e-3
# Define the Planck blackbody equation as a function in R
B <- function(frequency, t) {
h <- 6.62607015e-34
c <- 299792458
k <- 1.380649e-23
(2 * h * frequency^3 * c^-2) / (exp((h * frequency) / (k * t)) - 1)
}
library(stats)
fit <- nls(brightness ~ B(frequency, t), data = data, start = list(t = 2.5))
# Summarize the fit
summary(fit)
# Plot the data and the fitted model
library(ggplot2)
ggplot(data, aes(x = frequency, y = brightness)) +
geom_point() +
geom_line(aes(x = frequency, y = predict(fit)))
I just need a basic scatter plot of the data with the equation fitted over it as a line
data <- data.frame(x=c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
y=c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
data$frequency <- (data$x)
data$brightness <- (data$y)*2.71057477e-3
# Define the Planck blackbody equation as a function in R
B <- function(frequency, t) {
h <- 6.62607015e-34
c <- 299792458
k <- 1.380649e-23
(2 * h * frequency^3 * c^-2) / (expm1((h * frequency) / (k * t)))
}
library(stats)
fit <- nls(brightness ~ B(frequency, t), data = data, start = list(t = 2.5))
# Summarize the fit
summary(fit)
# Plot the data and the fitted model
library(ggplot2)
ggplot(data, aes(x = frequency, y = brightness)) +
geom_point() +
geom_line(aes(x = frequency, y = predict(fit)))
This has been asked in comments on your previous questions: what are the units on frequency and brightness?
Going back to your original question: the units on data$x are 1/cm, so it is not frequency but wavenumber, and brightness is in MJy/sr, which is equal to 1.256E-19 kg/s^2.
So if you use the correct form of Planck's equation (there is a typo on the Wikipedia page) and perform the unit conversions, you are able to perform a fit. Not necessarily a great fit.
data <- data.frame(freq = c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
brightness = c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
planck <- function(freq, t, h, c, k) {
  # freq here is wavenumber; frequency = wavenumber * c
  # the frequency form would be (2 * h * freq^3) / (c^2) * 1 / (exp((h * freq) / (k * t)) - 1)
  (2 * h * c * freq^3) * 1 / (exp((h * c * freq) / (k * t)) - 1)
}
# fit the data using nls
h <- 6.62607e-34
c <- 3e8
k <- 1.38065e-23
data$freq <- data$freq * 100  # convert from 1/cm to 1/m
fit <- nls(brightness * 1.256E-19 ~ planck(freq, t, h, c, k), start = list(t = 3), data = data)
summary(fit)
# divide by 1.256E-19 to convert back from kg/s^2 to MJy/sr
brightness_fit <- predict(fit, data) / 1.256E-19
# plot the data and the fitted curve
plot(data$freq, data$brightness, xlab = "Wavenumber (1/m)", ylab = "Brightness (MJy/sr)")
lines(data$freq, y=brightness_fit, col = "red")
I am trying to create a clustered bar chart and am using the gather function to get the correct bars to group together. When I do this, I get an error that the gather function could not be found. I have dplyr and magrittr installed. Any thoughts on how to make this work, or is there a better way to create the bar chart grouped by test and date?
Plaster <- Plaster_2019_Data %>%
gather("pH", "Temperature", "Surface", -Date)
Data:
Surface pH Temperature Date
12.08 8.56 11.16 5/13/2019
11.68 8.90 8.76 5/29/2019
8.69 9.07 14.65 6/10/2019
2.26 7.49 17.51 6/24/2019
4.54 7.77 23.82 7/8/2019
2.13 8.17 25.29 8/5/2019
6.34 8.62 26.50 8/19/2019
9.33 9.03 24.31 9/4/2019
10.98 8.58 21.02 9/16/2019
9.59 8.61 17.33 9/30/2019
16.07 8.70 10.39 10/14/2019
9.12 8.07 6.38 11/14/2019
The gather function is from the tidyr package, not dplyr or magrittr, so that package has to be installed and loaded; require will load it:
require('tidyr')
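With tidyr loaded, a gather call along these lines (a sketch, using the column names shown in the question) stacks the three test columns into long format for plotting:
library(tidyr)
Plaster <- gather(Plaster_2019_Data, key = "Test", value = "value", -Date)
head(Plaster)  # one row per Date/Test combination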
As mentioned by @akrun, you need tidyr. Moreover, the function pivot_longer is intended to replace gather in the near future (https://tidyr.tidyverse.org/reference/gather.html).
Also, as the ranges of values across your tests are quite different, I would suggest using facet_wrap to make a nice plot.
Altogether, you can write something like this:
df$Date = as.Date(df$Date, format = "%m/%d/%Y")
library(tidyr)
library(ggplot2)
library(dplyr)
df %>% pivot_longer(., -Date, names_to = "Test", values_to = "value") %>%
ggplot(aes(x = Date, y = value, fill = Test))+
geom_bar(stat = "identity", position = position_dodge())+
facet_wrap(.~Test, scales = "free") +
scale_x_date(date_labels = "%b %d",
date_breaks = "2 weeks")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Data
structure(list(Surface = c(12.08, 11.68, 8.69, 2.26, 4.54, 2.13,
6.34, 9.33, 10.98, 9.59, 16.07, 9.12), pH = c(8.56, 8.9, 9.07,
7.49, 7.77, 8.17, 8.62, 9.03, 8.58, 8.61, 8.7, 8.07), Temperature = c(11.16,
8.76, 14.65, 17.51, 23.82, 25.29, 26.5, 24.31, 21.02, 17.33,
10.39, 6.38), Date = structure(c(18029, 18045, 18057, 18071,
18085, 18113, 18127, 18143, 18155, 18169, 18183, 18214), class = "Date")), row.names = c(NA,
-12L), class = "data.frame")
I have a problem I am hoping someone can help me with. I have a data set which shows the prices of multiple goods (each a column) on a daily basis over several years. I've run the regression below on part of my data frame and then created predicted values for the rest of the time period. My predicted values are the log changes in the price of pet. For clarification, I already have the actual prices of pet; I am just trying to predict them using this method.
lin <- lm(diff(log(pet)) ~ diff(log(bron)) + diff(log(yen)) +
diff(yal) - 1, data = codData[1:634,])
predictions <- (predict(lin, newdata = codData[635:1025,]))
My problem now is that I want to get the actual predicted price of pet. I would normally do this by multiplying (the first predicted log change + 1) by the last known price of pet, which gives the first predicted price; I would then multiply (the second predicted log change + 1) by that newly predicted value, and so on. I'm not sure how to do this in R, though. Does anyone have any ideas?
Thanks ahead of time!
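For reference, the recursion described above can be vectorised with cumprod; a sketch, assuming predictions holds the predicted log changes and row 634 holds the last price used in fitting:
last_price <- codData$pet[634]  # last observed price before the forecast window
pet_hat <- last_price * cumprod(1 + predictions)  # the (1 + change) recursion described above
# or, treating the predicted values as exact log differences:
pet_hat_exp <- last_price * exp(cumsum(predictions))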
Code to get sample data
codData <- structure(list(date = structure(c(1306800000, 1306886400, 1306972800,
1307059200, 1307318400, 1307404800, 1307491200, 1307577600, 1307664000,
1307923200, 1308009600, 1308096000, 1308182400, 1308268800, 1308528000,
1308614400, 1308700800, 1308787200, 1308873600, 1309132800, 1309219200,
1309305600, 1309392000, 1309478400, 1309824000, 1309910400, 1309996800,
1310083200, 1310342400, 1310428800, 1310515200, 1310601600, 1310688000,
1310947200, 1311033600, 1311120000, 1311206400, 1311292800, 1311552000,
1311638400, 1311724800, 1311811200, 1311897600, 1312156800, 1312243200,
1312329600, 1312416000, 1312502400, 1312761600, 1312848000, 1312934400,
1313020800, 1313107200, 1313366400, 1313452800, 1313539200, 1313625600,
1313712000, 1313971200, 1314057600, 1314144000, 1314230400, 1314316800,
1314576000, 1314662400, 1314748800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), yal = c(3.05, 2.96, 3.04, 2.99, 3.01, 3.01,
2.98, 3.01, 2.99, 3, 3.11, 2.98, 2.93, 2.94, 2.97, 2.99, 3.01,
2.93, 2.88, 2.95, 3.05, 3.14, 3.18, 3.22, 3.16, 3.12, 3.17, 3.03,
2.94, 2.92, 2.92, 2.98, 2.94, 2.94, 2.91, 2.96, 3.03, 2.99, 3.03,
2.99, 3.01, 2.98, 2.82, 2.77, 2.66, 2.64, 2.47, 2.58, 2.4, 2.2,
2.17, 2.34, 2.24, 2.29, 2.23, 2.17, 2.08, 2.07, 2.1, 2.15, 2.29,
2.23, 2.19, 2.28, 2.19, 2.23), sp = c(1345.2, 1314.55, 1312.9399,
1300.16, 1286.17, 1284.9399, 1279.5601, 1289, 1270.98, 1271.83,
1287.87, 1265.42, 1267.64, 1271.5, 1278.36, 1295.52, 1287.14,
1283.5, 1268.45, 1280.1, 1296.67, 1307.41, 1320.64, 1339.67,
1337.88, 1339.22, 1353.22, 1343.8, 1319.49, 1313.64, 1317.72,
1308.87, 1316.14, 1305.4399, 1326.73, 1325.84, 1343.8, 1345.02,
1337.4301, 1331.9399, 1304.89, 1300.67, 1292.28, 1286.9399, 1254.05,
1260.34, 1200.0699, 1199.38, 1119.46, 1172.53, 1120.76, 1172.64,
1178.8101, 1204.49, 1192.76, 1193.89, 1140.65, 1123.53, 1123.8199,
1162.35, 1177.6, 1159.27, 1176.8, 1210.08, 1212.92, 1218.89),
pet = c(102.63, 100.13, 100.54, 100.49, 98.85, 98.98, 100.93,
101.71, 99.02, 96.98, 99.17, 95.29, 94.96, 92.96, 93.25,
93.4, 94.59, 91.75, 91.25, 90.81, 92.89, 94.93, 94.92, 94.7,
96.8, 96.64, 98.49, 96.31, 95.05, 96.77, 97.89, 95.73, 97.32,
96, 97.7, 98.14, 99.25, 99.82, 99.13, 99.44, 97.31, 97.13,
95.92, 95.33, 93.25, 91.93, 86.44, 87.07, 80.74, 81.12, 81.55,
85.46, 85.25, 87.89, 86.93, 87.45, 81.58, 82.63, 84.12, 86.12,
85.17, 84.94, 85.42, 87.45, 88.76, 88.91), bron = c(419.25,
409.5, 409.7, 412.4, 412.25, 414.65, 411.25, 410.5, 404.45,
403.38, 415.85, 411.63, 412.3, 410.05, 407.7, 408.35, 405.85,
406.58, 408.45, 407.2, 409.85, 421.8, 426.45, 430.25, 432.95,
432.4, 442.15, 439.08, 434.5, 438.52, 438.52, 437.95, 440.73,
440.55, 446.45, 442.42, 437.92, 440.2, 440.33, 447.3, 443.15,
447.3, 448.3, 441, 438.3, 433.65, 421.4, 412.35, 393.05,
403.55, 389.5, 404.1, 399.5, 403.67, 399.25, 404, 394.13,
396.85, 393.98, 401.25, 401.27, 409.17, 409.8, 409.5, 414.7,
418.2), yen = c(929.87, 932.16, 927.79, 922.76, 925.77, 921.77,
925.73, 926.87, 934, 929.98, 928.28, 939.99, 939.99, 934.44,
934.93, 929.78, 932.43, 936.68, 940.12, 938.95, 935.56, 930.47,
927.23, 925.86, 929.43, 932.42, 930.49, 931.15, 939.64, 938.86,
929.71, 930.59, 929.31, 931.59, 929.23, 925.3, 919.2, 919.95,
918.83, 912.58, 917.17, 919.02, 915.52, 918.61, 920.61, 918.09,
932.46, 926.3, 931.17, 921.45, 931.42, 929.27, 929.41, 922.31,
923.17, 920.27, 926.05, 924.52, 926.53, 923.23, 926.24, 929.12,
923.74, 922.74, 924.79, 925.04)), row.names = c(NA, -66L), class = c("tbl_df",
"tbl", "data.frame"))
I recommend you read this post to get familiar with the fable package: https://www.mitchelloharawild.com/blog/fable/
library(tidyverse)
library(lubridate)
library(tsibble)
library(fable)
df_example <- codData %>%
mutate(simple_date = as_date(date)) %>%
select(-date) %>%
as_tsibble(index = simple_date) %>%
tsibble::fill_gaps() %>%
tidyr::fill(yal:yen)
fit <- df_example %>%
filter(simple_date < yearmonth("2011 08")) %>%
model(linear_reg = TSLM(log(pet) ~ log(bron) + log(yen) + log(yal)))
forecasts_result <- fit %>% forecast(df_example)
forecasts_result %>%
filter(simple_date >= yearmonth("2011 08")) %>%
autoplot(df_example)
forecasts_result %>%
accuracy(df_example)
#> # A tibble: 1 x 9
#> .model .type ME RMSE MAE MPE MAPE MASE ACF1
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 linear_reg Test -3.34 6.30 4.70 -3.97 5.34 NaN 0.947
Another option is using a VAR model:
var_fit <- df_example %>%
filter(simple_date < yearmonth("2011 08")) %>%
model(VAR_MODEL = VAR(vars(yal, pet, bron, yen) ~ AR(7)))
forecast_result_var <- var_fit %>%
forecast(h = 31)
forecast_result_var %>%
autoplot(df_example)
forecast_result_var %>%
accuracy(df_example)
#> # A tibble: 4 x 10
#> .model .response .type ME RMSE MAE MPE MAPE MASE ACF1
#> <chr> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 VAR_MODEL yal Test -0.448 0.477 0.450 -20.2 20.2 4.53 0.713
#> 2 VAR_MODEL pet Test -10.7 11.3 10.7 -12.6 12.6 4.33 0.639
#> 3 VAR_MODEL bron Test -32.0 34.7 32.0 -7.95 7.95 4.75 0.746
#> 4 VAR_MODEL yen Test 41.8 45.3 41.8 4.52 4.52 6.19 0.827
Created on 2020-01-05 by the reprex package (v0.3.0)
I am a newbie working with streamflow duration curves and the fdc function.
I am working with more than 300 series, and I am interested in saving the low quartile threshold (Qlow.thr) value that appears in the generated plot.
Here is the reproducible example:
library(hydroTSM)  # fdc() comes from the hydroTSM package
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98, 4.21, 2.51, 38.29, 8.57, 2.48, 3.78, 18.09, 15.16, 13.46, 8.69, 6.85, 11.97, 12.10, 9.87, 21.89, 2.60, 2.40, 27.40, 4.94, 83.17, 12.10, 5.08, 12.42, 6.19, 3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91, 21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60, 12.73, 4.17, 6.70, 16.45)
fdc(dat, plot = TRUE, lQ.thr = 0.8, ylab = "Hm3", main = "Upstream monthly duration curve", thr.shw = TRUE)
The fdc function returns a vector of probabilities, but I am not sure how to convert these probabilities back to the original units and select the value at the 80% exceedance probability, expressed in Hm3, as I would do with pnorm, for example, when working with normal probabilities.
Thank you so much.
You can construct the FDC yourself:
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98,
         4.21, 2.51, 38.29, 8.57, 2.48, 3.78, 18.09, 15.16,
         13.46, 8.69, 6.85, 11.97, 12.10, 9.87, 21.89, 2.60,
         2.40, 27.40, 4.94, 83.17, 12.10, 5.08, 12.42, 6.19,
         3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91,
         21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60,
         12.73, 4.17, 6.70, 16.45)
dat <- sort(dat, decreasing = TRUE)
df <- data.frame(x = 100 / length(dat) * seq_along(dat), y = dat)
plot(x = df$x, y = df$y, type = "l", log = "y")
So the sorted flow data is simply plotted against the percentage-exceedance scale. This scale is created by dividing 100% by the number of data points, which gives the increment for each point.
Therefore
quantile(dat, probs = c(0.2, 0.8), type = 1)
gives you your desired results.
Note that fdc computes the quantile differently. It seems to just use
p <- c(0.8, 0.2)
dat[round(p * length(dat))]
> [1] 4.21 27.40
to compute the values.
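For the 300+ series mentioned in the question, the same quantile call can be applied across all of them at once; a sketch, assuming the series are stored in a list (flows is a hypothetical name):
flows <- list(site1 = dat, site2 = 2 * dat)  # stand-ins for the real 300+ series
q_low <- sapply(flows, quantile, probs = 0.2, type = 1)  # low-flow (80% exceedance) threshold per series
q_low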
I am working with the data frames shown below:
tbl45 <- structure(list(`2010's` = c(0.48, 1.45, 33.33, 25.6, 32.37, 6.76
), `2020's` = c(0.48, 0.97, 31.88, 36.71, 28.5, 1.45), `2030's` = c(0.48,
1.93, 27.54, 34.3, 33.33, 2.42), `2040's` = c(0.48, 1.93, 33.33,
26.57, 28.5, 9.18), `2050's` = c(0.48, 1.93, 33.33, 26.09, 32.85,
5.31), `2060's` = c(0.48, 3.38, 25.6, 32.37, 36.23, 1.93), `2070's` = c(0.48,
1.93, 33.82, 28.99, 31.4, 3.38), `2080's` = c(0.48, 2.42, 34.3,
31.4, 28.99, 2.42), `2090's` = c(0.48, 2.42, 31.4, 33.33, 29.95,
2.42)), .Names = c("2010's", "2020's", "2030's", "2040's", "2050's",
"2060's", "2070's", "2080's", "2090's"), row.names = c("[0,100]",
"(100,200]", "(200,300]", "(300,400]", "(400,500]", "(500,600]"
), class = "data.frame")
tbl85 <- structure(list(`2010's` = c(0.48, 1.45, 31.4, 30.43, 34.78, 1.45
), `2020's` = c(0.48, 1.45, 36.23, 29.95, 30.43, 1.45), `2030's` = c(0.48,
1.93, 32.37, 28.02, 34.3, 2.9), `2040's` = c(0.48, 2.9, 30.43,
33.33, 31.4, 1.45), `2050's` = c(0.48, 2.9, 32.85, 30.43, 29.47,
3.86), `2060's` = c(0.48, 4.83, 33.33, 30.43, 26.57, 4.35), `2070's` = c(0.48,
5.8, 31.88, 36.23, 24.15, 1.45), `2080's` = c(0.48, 5.8, 35.27,
33.82, 23.19, 1.45), `2090's` = c(1.45, 8.21, 38.16, 32.85, 17.87,
1.45)), .Names = c("2010's", "2020's", "2030's", "2040's", "2050's",
"2060's", "2070's", "2080's", "2090's"), row.names = c("[0,100]",
"(100,200]", "(200,300]", "(300,400]", "(400,500]", "(500,600]"
), class = "data.frame")
and I would like to combine them into a single table (or data frame), with the values separated by a slash ("/") or parentheses. Then I will save it as an .xls file and copy the table into Word.
The final result would be something like this (I am showing only the first column for simplicity's sake):
2010's
[0,100] 0.48 / 0.48
(100,200] 1.45 / 1.45
(200,300] 33.33 / 31.40
(300,400] 25.60 / 30.43
(400,500] 32.37 / 34.78
(500,600] 6.76 / 1.45
How can I achieve that using R?
Try this:
res <- mapply(function(x, y) paste(x, y, sep = "/"), tbl45, tbl85)
rownames(res) <- rownames(tbl45)
res
2010's 2020's 2030's 2040's 2050's 2060's
[0,100] "0.48/0.48" "0.48/0.48" "0.48/0.48" "0.48/0.48" "0.48/0.48" "0.48/0.48"
(100,200] "1.45/1.45" "0.97/1.45" "1.93/1.93" "1.93/2.9" "1.93/2.9" "3.38/4.83"
(200,300] "33.33/31.4" "31.88/36.23" "27.54/32.37" "33.33/30.43" "33.33/32.85" "25.6/33.33"
(300,400] "25.6/30.43" "36.71/29.95" "34.3/28.02" "26.57/33.33" "26.09/30.43" "32.37/30.43"
(400,500] "32.37/34.78" "28.5/30.43" "33.33/34.3" "28.5/31.4" "32.85/29.47" "36.23/26.57"
(500,600] "6.76/1.45" "1.45/1.45" "2.42/2.9" "9.18/1.45" "5.31/3.86" "1.93/4.35"
2070's 2080's 2090's
[0,100] "0.48/0.48" "0.48/0.48" "0.48/1.45"
(100,200] "1.93/5.8" "2.42/5.8" "2.42/8.21"
(200,300] "33.82/31.88" "34.3/35.27" "31.4/38.16"
(300,400] "28.99/36.23" "31.4/33.82" "33.33/32.85"
(400,500] "31.4/24.15" "28.99/23.19" "29.95/17.87"
(500,600] "3.38/1.45" "2.42/1.45" "2.42/1.45"
We could do this by unlisting both datasets and then using paste:
res <- tbl45
res[] <- paste(unlist(tbl45), unlist(tbl85), sep='/')
res
# 2010's 2020's 2030's 2040's 2050's
#[0,100] 0.48/0.48 0.48/0.48 0.48/0.48 0.48/0.48 0.48/0.48
#(100,200] 1.45/1.45 0.97/1.45 1.93/1.93 1.93/2.9 1.93/2.9
#(200,300] 33.33/31.4 31.88/36.23 27.54/32.37 33.33/30.43 33.33/32.85
#(300,400] 25.6/30.43 36.71/29.95 34.3/28.02 26.57/33.33 26.09/30.43
#(400,500] 32.37/34.78 28.5/30.43 33.33/34.3 28.5/31.4 32.85/29.47
#(500,600] 6.76/1.45 1.45/1.45 2.42/2.9 9.18/1.45 5.31/3.86
# 2060's 2070's 2080's 2090's
#[0,100] 0.48/0.48 0.48/0.48 0.48/0.48 0.48/1.45
#(100,200] 3.38/4.83 1.93/5.8 2.42/5.8 2.42/8.21
#(200,300] 25.6/33.33 33.82/31.88 34.3/35.27 31.4/38.16
#(300,400] 32.37/30.43 28.99/36.23 31.4/33.82 33.33/32.85
#(400,500] 36.23/26.57 31.4/24.15 28.99/23.19 29.95/17.87
#(500,600] 1.93/4.35 3.38/1.45 2.42/1.45 2.42/1.45