Building a formula from a scientific paper in R

I am trying to build a mathematical formula from a scientific paper in R.
In the example given, I used a variable of 164 microns. This was taken from the first result in Table 3 of the paper I have attached. You'll see in this table the current velocities calculated (pretty neat!).
The overall point of this is that I wish to acquire two things: erosional velocity and depositional velocity. The paper I have attached does so using the formulae given. I am trying to build a package which can run hundreds of mean grain sizes (i.e. the variable) through these formulae. In an ideal world, my main goal is to build code using the given formulae which takes the variable (mean grain size) and spits out lovely data. I think it is possible, but unfortunately my R skills are inadequate.
Link to formulae: https://imgur.com/a/DEN721v?
Link to original scientific paper: https://link.springer.com/article/10.1007/s00531-008-0312-5
There are 5 equations, all of which feed into each other. The outcome depends on one variable, which I input at the start.
I am given four known values:
p (water density given as m^3),
ps (grain density as m^3),
g (acceleration due to gravity given as m/s^2),
v (kinematic viscosity of water given as m^2/s)
and a variable (written as d), which is the mean grain size of a sediment sample.
For example, if I had a mean grain size of 164μm this would be input as 1.64e-4.
Seeking help, as my outcomes are definitely not even close to what they should be.
p <- 1027.4 #water density (m^3)
ps <- 2650 #grain density (m^3)
g <- 9.81 #acceleration due to gravity (m/s^2)
v <- 1.4313e-6 #kinematic viscosity of water (m^2/s)
z100 <- 100 #level above seabed (cms)
d <- 1.64e-4 #variable (mean grain size in microns)
EQUATION 1
D1 <- 9.81*(ps-p)
D2 <- (p*v)^2
D3 <- (D1/D2)
D4 <- D3^(1/3)
D5 <- D4*d
D <- D5 #Dimensionless grain size
D
EQUATION 2.3
1 - exp(-0.001374634317)
Tcr1 <- -0.020*D
Tcr2 <- 1 - exp(Tcr1)
Tcr3 <- 0.055*Tcr2
Tcr4 <- 0.30/1+(1.2*D)
Tcr5 <- Tcr4 + Tcr3
Tcr6 <- 9.81*(ps-p)
Tcr7 <- Tcr6*d
Tcr8 <- Tcr7*Tcr5
Tcr <- Tcr8 #threshold bed shear stress (N/m^2)
exp(Tcr1)
Tcr
Ucr1 <- Tcr/p
Ucr2 <- sqrt(Ucr1)
Ucr <- Ucr2 #critical shear velocity
EQUATION 3
z0 <- d/12 #roughness length
z0
EQUATION 4
Ue1 <- z100/z0
Ue2 <- Ucr/0.41
Ue3 <- log(Ue1)
Ue4 <- Ue2*Ue3
Ue <- Ue4 # critical current velocity erosional threshold from particle size distribution
Ue
EQUATION 5
Usetl1 <- 10.36^(2)
Usetl2 <- D^(3)
Usetl3 <- 1.049*Usetl2
Usetl4 <- Usetl1 + Usetl3
Usetl5 <- Usetl4^(1/2)
Usetl6 <- Usetl5 - 10.36
Usetl7 <- v/d
Usetl8 <- Usetl7*Usetl6
Usetl <- Usetl8
Results should be in cm/s, somewhere around 20-50 cm/s.

OK, so let us try it from the start.
p <- 1027.4 #water density (m^3)
ps <- 2650 #grain density (m^3)
g <- 9.81 #acceleration due to gravity (m/s^2)
v <- 1.4313e-6 #kinematic viscosity of water (m^2/s)
z100 <- 100 #level above seabed (cms)
d <- 1.64e-4 #variable (mean grain size in microns)
This last value is incorrect if the mean grain size is supposed to be in microns. It is in meters.
D <- d * (g * (ps - p) / (p * v^2))^(1/3)
The result is 3.22. There was an error in your formula: (p * v)^2 instead of p * v^2.
Tcr <- g * (ps - p) * d * (.3 / (1 + 1.2 * D) + .055 * (1 - exp(-.02 * D)) )
Again, there was an error in your formula: .3/1 + 1.2 * D instead of .3/(1 + 1.2 * D). The result is .17.
Ucr <- sqrt(Tcr / p)
Result is .01.
z0 = d / 12
Result is 1.37E-5.
Ue <- Ucr / .41 * log(z100 / z0)
Result is .50. Not sure why we calculate it, however. Are we supposed to compare it with Uset?
Uset <- v / d * ( sqrt(10.36^2 + 1.049 * D^3) - 10.36)
Result is .01 (.0137014).
This is not what you say you should get, but it is also different from what you were getting. Also, assuming it is not centimeters but meters per second, it is about 1 cm per second.
Now, let us check the units. Firstly, you need to be more careful when you specify the units. Water and grain density is not m^3; it is kg * m^-3.
First, D. The p's cancel (present in both the numerator and the denominator):
m * (m * s^-2 / (m^4 * s^-2))^(1/3) =
m * (1/m^3)^(1/3) = m / m = 1
OK, unitless.
Next, Tcr: the bracketed factor is unitless (it depends only on D). The rest is
m * s^-2 * kg * m^-3 * m = (m * kg * s^-2) * m^-2 = N / m^2.
OK, that checks out too.
Now the formula for Uset. Again, the right-hand factor is unitless. The rest is
m^2 * s^-1 / m = m / s
At least the units check out.
Hope this helps (somehow).
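Since the goal is to run hundreds of grain sizes through these formulae, here is a minimal sketch that consolidates the corrected steps into one vectorized function (the function name, the default arguments, and the data.frame output are my additions, not from the paper):
critical_velocities <- function(d,                # mean grain size (m); may be a vector
                                p = 1027.4,       # water density (kg m^-3)
                                ps = 2650,        # grain density (kg m^-3)
                                g = 9.81,         # gravity (m s^-2)
                                v = 1.4313e-6,    # kinematic viscosity (m^2 s^-1)
                                z100 = 100) {     # level above seabed, as in the question
  D    <- d * (g * (ps - p) / (p * v^2))^(1/3)          # Eq. 1: dimensionless grain size
  Tcr  <- g * (ps - p) * d *
          (0.3 / (1 + 1.2 * D) + 0.055 * (1 - exp(-0.02 * D)))  # Eq. 2: threshold stress
  Ucr  <- sqrt(Tcr / p)                                  # critical shear velocity
  z0   <- d / 12                                         # Eq. 3: roughness length
  Ue   <- Ucr / 0.41 * log(z100 / z0)                    # Eq. 4: erosional velocity
  Uset <- v / d * (sqrt(10.36^2 + 1.049 * D^3) - 10.36)  # Eq. 5: settling velocity
  data.frame(d = d, D = D, Ue = Ue, Uset = Uset)
}
critical_velocities(c(1.64e-4, 2.0e-4, 5.0e-4))  # many grain sizes at once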

Related

Creating a 3D Plot of a Polynomial Function with Uniformly Distributed Values

I have an equation which goes like this:
2 * (1 - x - a - b)^2 * x * theta + 2 * (1 - a - b - x) * x^2 * theta - 2 * b * x^2 + 2 * a * (1 - a - b - x)^2 = 0
I want to create a function in R that selects a and b, with the restriction (a + b < 1 - a + b), from a uniform distribution. After selecting, I want it to find the solutions for x (both negative and positive).
I want to repeat this process t times in a for loop, where I give the theta value as an input.
After that I want it to create a 3D density plot where the solutions are shown with respect to the values of a and b on two axes and x on the third.
So far I have tried to use the polynom package and the solve function, but I am having a hard time with R when it comes to mathematics.
You need to rewrite the polynomial in standard form a0 + a1*x + a2*x^2 + a3*x^3, then you can use the base function polyroot() to find the roots. For example,
a0 <- 2 * a * (1 - a - b)^2
a1 <- 2 * (1 - a - b)^2 * theta - 4 * a * (1 - a - b)
a2 <- -4 * (1 - a - b) * theta + 2 * (1 - a - b) * theta - 2 * b + 2 * a
a3 <- 0
So this is a quadratic equation, not a cubic as it appears at first glance.
Then use
polyroot(c(a0, a1, a2))
to find the roots. Select the real roots, and put them together into a matrix roots with columns a, b, root, then use rgl::plot3d(roots) to display them.
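For instance, with one hypothetical pair of values (a = 0.2 and b = 0.3 are arbitrary choices for illustration) and theta = 1:
a <- 0.2; b <- 0.3; theta <- 1
a0 <- 2 * a * (1 - a - b)^2
a1 <- 2 * (1 - a - b)^2 * theta - 4 * a * (1 - a - b)
a2 <- -4 * (1 - a - b) * theta + 2 * (1 - a - b) * theta - 2 * b + 2 * a
polyroot(c(a0, a1, a2))  # both roots are real here: 1/3 and -1/4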
I think you have a typo in your restriction, so I'll ignore it, and this is the plot I get for theta == 1:
theta <- 1
a <- runif(1000)
b <- runif(1000)
a0 <- 2*a*(1-a-b)^2
a1 <- 2*(1-a-b)^2*theta -4*a*(1-a-b)
a2 <- -4*(1-a-b)*theta + 2*(1-a-b)*theta-2*b+2*a
result <- matrix(numeric(), ncol = 3, dimnames = list(NULL, c("a", "b", "root")))
for (i in seq_along(a)) {
  root <- polyroot(c(a0[i], a1[i], a2[i]))
  if (max(abs(Im(root))) < 1.e-8)  # keep only the (numerically) real roots
    result <- rbind(result, cbind(a[i], b[i], Re(root)))
}
library(rgl)
plot3d(result)
Created on 2022-06-14 by the reprex package (v2.0.1)
Most of the roots are really small, but for some of them a2 is nearly zero, and then they can be very large.
You can create a table with a column for each variable and keep only the rows satisfying your equation:
library(tidyverse)
set.seed(1337)
n <- 1000
tibble(
  a = runif(n),
  b = runif(n)
) |>
  filter(a + b < 1 - a + b) |>
  expand_grid(
    theta = seq(0, 1, by = 1),
    x = seq(0, 1, by = 1)
  ) |>
  filter(
    2 * (1 - x - a - b)^2 * x * theta + 2 * (1 - a - b - x) * x^2 * theta - 2 *
      b * x^2 + 2 * a * (1 - a - b - x)^2 == 0
  )
#> # A tibble: 0 × 4
#> # … with 4 variables: a <dbl>, b <dbl>, theta <dbl>, x <dbl>
Created on 2022-06-13 by the reprex package (v2.0.0)
Unfortunately, there is no point in the sampled space satisfying your equation. This is probably due to testing == 0 instead of < e, where e is a very small tolerance; one needs to allow small errors when sampling for numerical solutions.
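For example, a sketch of the same pipeline with a tolerance instead of exact equality (eps and the grid steps are arbitrary choices; a much finer x grid is needed to land near the roots):
library(tidyverse)
set.seed(1337)
eps <- 1e-3
n <- 100
tibble(a = runif(n), b = runif(n)) |>
  filter(a + b < 1 - a + b) |>
  expand_grid(theta = seq(0, 1, by = 0.1),
              x = seq(0, 1, by = 0.001)) |>
  filter(abs(2 * (1 - x - a - b)^2 * x * theta +
             2 * (1 - a - b - x) * x^2 * theta -
             2 * b * x^2 + 2 * a * (1 - a - b - x)^2) < eps)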
Why not just solve for the roots of the equation analytically?

How to find root with more than one unknown

fff5=function(x)x*31*24 * (1/(31*24))*0.30 + 400*31*24 * (1/(31*24))*0.025 + ( (10 * 31 * 24 - 100*31*24/20 )/(31*24) * 6 ) - 200
The fff5 function describes the cost of Amazon Elastic File System, where x is the GB of storage in the Standard plan for 24 hours per day over 31 days, 400 is the GB of storage in EFS Infrequent Access for 24 hours per day over 31 days, 10 is the MB/s throughput for 24 hours per day over 31 days, and 200 is the maximum budget.
When I do:
uniroot(fff5, lower=0, upper=1, extendInt = "yes",maxiter = 10000)$root
[1] 533.3333
I find the highest number of GB that can be stored in the Standard plan 24 hours a day for 31 days, plus the cost of 400 GB in Infrequent Access, plus the cost of 10 MB/s of throughput, with a maximum budget of 200:
fff5(533.3333)
>[1] -0.00001
> fff5(533.3334)
[1] 0.00002
How do I do the same for the other two unknowns (y, z)? How do I find a root with more than one unknown? How do I find all the combinations of values of x, y and z that make this function positive?
fff6=function(x,y,z)x*31*24 * (1/(31*24))*0.30 + y*31*24 * (1/(31*24))*0.025 + ( (z* 31 * 24 - 100*31*24/20 )/(31*24) * 6 ) - 200
The equation you propose is of the type
ax + by + cz + d = 0
that is, a plane. This means that there are infinitely many solutions: all the points belonging to the plane defined by the equation.
Since there are infinitely many solutions, the only thing you can do is narrow the space in which to look for them as much as possible.
You can choose one unknown (for example x) and treat the other two as parameters.
At this point, assign reasonable values to y and z. Unfortunately I don't know what those variables represent, but I assume they have the same order of magnitude as the x found in the previous point (~500):
yy <- seq(400, 600, 10)
zz <- seq(400, 600, 10)
These two variables must be recombined in order to obtain a grid:
df_grid <- expand.grid(y = yy, z = zz)
ATTENTION: the longer the vectors, the heavier the calculation will be.
Now you can find the x solutions via uniroot (passing y and z as numbers), and the solutions of your problem (within the chosen range) will be all the triples x, y, z:
fff6 <- function(x, y, z) {
  x*31*24 * (1/(31*24))*0.30 +
    y*31*24 * (1/(31*24))*0.025 +
    ( (z*31*24 - 100*31*24/20) / (31*24) * 6 ) - 200
}
x_sol <- NULL
for (i in 1:nrow(df_grid)) {
  xs <- uniroot(fff6, c(-10000, 10000), y = df_grid$y[i], z = df_grid$z[i])$root
  x_sol <- c(x_sol, xs)
}
df_grid$x <- x_sol
NOTE1: There are more elegant ways to avoid writing the previous for loop. For example:
x_sol <- mapply(function(y, z) uniroot(fff6, interval = c(-10000, 10000),
                                       y = y, z = z)$root, df_grid$y, df_grid$z)
df_grid$x <- x_sol
NOTE2: The range I chose yields negative solutions (which I suspect are not useful). A possible choice for obtaining positive solutions is:
yy <- seq(100, 300, 10)
zz <- seq(10, 30, 1)
Choose to search for solutions in an appropriate range!
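Incidentally, since the function simplifies to 0.30*x + 0.025*y + 6*(z - 5) - 200, x has a closed form on that plane, which you can use to check the uniroot results (my simplification, worth verifying):
df_grid$x_closed <- (230 - 0.025 * df_grid$y - 6 * df_grid$z) / 0.30
all.equal(df_grid$x, df_grid$x_closed)  # equal up to uniroot's tolerance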

Can I write my equation more efficiently in R?

I'm quite new to coding, so I don't know the limits of what I can do in R, and I haven't been able to find an answer for this particular kind of problem yet, although it probably has quite a simple solution.
For equation 2, A.1 is the starting value, but in each subsequent equation I need to use the previous answer (i.e. for A.3 I need A.2, for A.4 I need A.3, etc.).
A.1 <- start.x*(1-rate[1])+start.x*rate[1]
A.[2:n] <- A.[n-1]*(1-rate[2:n])+x*rate[2:n]
How do I set A.1 as the initial value, and is there a better way of writing equation 2 than copying and pasting the equation 58 times?
I've included the variables I have below:
A.1<- -13.2 # which is the same as start.x
x<- -10.18947 # x[2:n]
n<- 58
Age<-c(23:80)
rate <- function(Age){
  Turnover <- (1 / (1.0355 * Age - 3.9585))
  return(Turnover)
}
I need to find the age at which A can be rounded to -11.3. I expect to see it from ages 56 to 60.
Using the new information, try this:
x<- -10.18947
n<- 58
Age <- 23:80
rate <- (1 / (1.0355 * Age - 3.9585))
A <- vector("numeric", 58)
A[1] <- -13.2
for (i in 2:n) {
  A[i] <- A[i-1] * (1 - rate[i]) + x * rate[i]
}
Age[which.min(abs(A + 11.3))]
# [1] 58
plot(Age, A, type="l")
abline(h=-11.3, v=58, lty=3)
So the age at which A is closest to -11.3 is 58 years.
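If you prefer to avoid the explicit loop, the same recursion can be written with Reduce() (a stylistic alternative, same numbers):
A2 <- Reduce(function(prev, r) prev * (1 - r) + x * r,
             rate[2:n], init = -13.2, accumulate = TRUE)
all.equal(A, A2)  # TRUE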

How do I write optimization code in R when the initial values are not directly contained within the objective function?

I'm trying to write a piece of optimization code in R to calculate a set of unknown values for a biological question concerning Drosophila flies.
The dataframe is composed of 13 columns (just showing 9 in the code below for clarity) with varying numbers of rows. The first three columns contain collected data and the rest of the columns are calculated using various formulas. Two of the columns, Miss_C and Miss_D, are initially populated with empty data and represent the initial values in the optimization problem.
Time.min. Prob_C Prob_D Miss_C Miss_D Event_C Event_D Risk_C Risk_D
1 0 1.00 1.00 0 0 0.00 0.00 86.00 78.00
2 5 0.98 0.97 0 0 1.93 1.98 84.07 76.02
3 16 0.84 0.95 0 0 10.67 1.90 73.40 74.12
4 17 0.50 0.75 0 0 21.02 12.85 52.38 61.27
5 20 0.30 0.50 0 0 14.97 15.32 37.42 45.95
As an example of some of the formulas used, Event_C and Risk_C are calculated with a for loop as follows:
#define values for events_c and risk_c with for loops
temp_events_c <- vector()
temp_risk_c <- vector()
for (i in 2:no_rows) {
  temp_events_c <- ((prob_c[i] * risk_c[i-1]) - (prob_c[i] * miss_c[i-1]) - (prob_c[i-1] * risk_c[i-1]) + (prob_c[i-1] * miss_c[i-1])) / (prob_c[i] - (2 * prob_c[i-1]))
  events_c[i] <- temp_events_c
  temp_risk_c <- risk_c[i-1] - miss_c[i-1] - events_c[i]
  risk_c[i] <- temp_risk_c
}
From this data, I also have a single collected value (9.1 in this instance) which relates to the values in the table. The following function defines the relationship of this value to the columns Event_C, Event_D and two columns not shown above, Expected_C and Expected_D, where the sums of those columns are represented by x[1], x[2], x[3], x[4]:
fn <- function(x) ((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]
I would then like to use a minimization algorithm, i.e. slsqp from nloptr, to calculate the values in Miss_C and Miss_D which ultimately satisfy this single value. The extra code for the optimization would be something like this:
x0 <- c(Miss_C,Miss_D)
heq <- function(x) (((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]) - 9.1 # heq == 0
slsqp(x0, fn, gr = NULL,
hin = NULL, heq = heq)
Obviously that doesn't work, as the initial values are not directly contained within the function that needs to be solved, and that is the point I am stuck at! I'm not sure if this is an optimization problem as such or more of a general R coding question; either way, any help would be much appreciated.
Cheers, Andrew
* Edit - complete code added as per request *
#input variables
time_vector <- c(0,5,16,17,20)
prob_c <- c(1,0.977,0.835,0.5,0.30)
prob_d <- c(1,0.974,0.949,0.75,0.50)
miss_c <- c(0,0,0,0,0)
miss_d <- c(0,0,0,0,0)
#get number of rows
no_rows <- length(time_vector)
#fill events columns with dummy data
events_c <- c(0:(no_rows - 1))
events_d <- c(0:(no_rows - 1))
#define starting number at risk
risk_c_t0 <- 86
risk_d_t0 <- 78
#add t0 risk to each column
risk_c <- risk_c_t0
risk_d <-risk_d_t0
#fill risk columns with dummy data
risk_c[2:no_rows] <- c(2:no_rows)
risk_d[2:no_rows] <- c(2:no_rows)
#re-define values for events_c and risk_c with for loops
temp_events_c <-vector()
temp_risk_c <-vector()
for (i in 2:no_rows) {
  temp_events_c <- ((prob_c[i] * risk_c[i-1]) - (prob_c[i] * miss_c[i-1]) - (prob_c[i-1] * risk_c[i-1]) + (prob_c[i-1] * miss_c[i-1])) / (prob_c[i] - (2 * prob_c[i-1]))
  events_c[i] <- temp_events_c
  temp_risk_c <- risk_c[i-1] - miss_c[i-1] - events_c[i]
  risk_c[i] <- temp_risk_c
}
#re-define values for events_t with for loops
temp_events_d <-vector()
temp_risk_d <-vector()
for (j in 2:no_rows) {
  temp_events_d <- ((prob_d[j] * risk_d[j-1]) - (prob_d[j] * miss_d[j-1]) - (prob_d[j-1] * risk_d[j-1]) + (prob_d[j-1] * miss_d[j-1])) / (prob_d[j] - (2 * prob_d[j-1]))
  events_d[j] <- temp_events_d
  temp_risk_d <- risk_d[j-1] - miss_d[j-1] - events_d[j]
  risk_d[j] <- temp_risk_d
}
#calculate total risk, events and expected
total_risk <- risk_c + risk_d
total_events <- events_c + events_d
expected_c <- (risk_c * (total_events/total_risk))
expected_d <- (risk_d * (total_events/total_risk))
#place values into dataframe
df1 <- data.frame(time_vector,prob_c,prob_d, miss_c, miss_d, events_c, events_d, risk_c, risk_d, total_risk, total_events, expected_c, expected_d)
#sum of values
sum_events_C <- sum(events_c)
sum_events_d <- sum(events_d)
sum_expected_c <- sum(expected_c)
sum_expected_d <- sum(expected_d)
#chi_sq formula
chi_sq_combo <- (((sum_events_C - sum_expected_c)^2)/sum_expected_c) + (((sum_events_d - sum_expected_d)^2)/sum_expected_d)
#### end of table calculations before sim
#x <- c(sum_events_C, sum_expected_c, sum_events_d, sum_expected_d)
#x0 <- c(miss_c,miss_d) #initial values
#fn <- function(x) ((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]
#heq <- function(x) (((x[1]-x[2])^2)/x[2] + ((x[3]-x[4])^2)/x[4]) - 6.5 # heq == 0
#slsqp(x0, fn, gr = NULL,
# hin = NULL, heq = heq)
Rephrasing the comments above, I believe the problem is to use optimization to find
two values which yield a target chi-square value. A complication which may cause problems is that there are likely many solutions that yield the target, so it might be necessary to add some other requirement to make the answer unique.
To do this, you need a function of two variables which calculates the square of the difference between the chi-square value using those variables and the target value, and then you minimize that.
For example,
fn2 <- function(x) {
  c <- x[1]
  d <- x[2]
  chisq <- (((c - sum_expected_c)^2)/sum_expected_c) +
    (((d - sum_expected_d)^2)/sum_expected_d)
  (chisq - 6.5)^2
}
for (i in 1:no_rows) {
  x0 <- c(df1$miss_c[i], df1$miss_d[i])  # initial values
  res <- nloptr::slsqp(x0, fn2)
  miss_c[i] <- res$par[1]
  miss_d[i] <- res$par[2]
}
This gives the same values all 5 times, so I might not have understood you completely.
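If the intention is for the optimizer to actually vary miss_c and miss_d, a sketch of one possible setup (built only from the question's own formulas, with 9.1 as the target; this is my reading, not a tested solution) is to recompute the whole table inside the objective, so the chi-square genuinely depends on the miss vectors:
library(nloptr)
obj <- function(m) {
  miss_c <- m[1:no_rows]
  miss_d <- m[(no_rows + 1):(2 * no_rows)]
  events_c <- events_d <- numeric(no_rows)
  risk_c <- c(risk_c_t0, numeric(no_rows - 1))
  risk_d <- c(risk_d_t0, numeric(no_rows - 1))
  for (i in 2:no_rows) {
    # the question's events formula, factored
    events_c[i] <- (prob_c[i] - prob_c[i-1]) * (risk_c[i-1] - miss_c[i-1]) /
      (prob_c[i] - 2 * prob_c[i-1])
    risk_c[i] <- risk_c[i-1] - miss_c[i-1] - events_c[i]
    events_d[i] <- (prob_d[i] - prob_d[i-1]) * (risk_d[i-1] - miss_d[i-1]) /
      (prob_d[i] - 2 * prob_d[i-1])
    risk_d[i] <- risk_d[i-1] - miss_d[i-1] - events_d[i]
  }
  total_events <- events_c + events_d
  total_risk <- risk_c + risk_d
  expected_c <- risk_c * total_events / total_risk
  expected_d <- risk_d * total_events / total_risk
  chisq <- (sum(events_c) - sum(expected_c))^2 / sum(expected_c) +
    (sum(events_d) - sum(expected_d))^2 / sum(expected_d)
  (chisq - 9.1)^2  # squared distance from the collected value
}
res <- slsqp(rep(0, 2 * no_rows), obj, lower = rep(0, 2 * no_rows))  # misses >= 0
As noted above, many miss vectors may yield the same chi-square, so expect non-unique solutions unless you add a further requirement.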

Central limit theorem in [R]

I am working with the language [R] to generate a sample of M = 32000 averages, each calculated by averaging 36 independent values of a continuous uniform(0, 1) random variable, generated as follows:
sampleA <- 1:32000
for (i in 1:32000) {
  MuestraAUnif <- runif(36)
  sampleA[i] <- mean(MuestraAUnif)
}
For the generated sample, I am asked to calculate the relative frequency of observed averages greater than L = 0.32 + 4 * 1/100 and compare it with the probability (approximated by the central limit theorem) that the average of N values is greater than L, as follows:
L<- 0.32+4*1/100
sigma <- sqrt(1/12) #(b-a)/sqrt(12)
miu = 0.5 #(a+b)/2
greaterA <-sum(sampleA > L) #values of the sample greater than L are 23693
xBar<- greaterA/length(sampleA)
X <- sum(sampleA)
n<-32000
Zn<- (X - n*miu)/(sigma*sqrt(n))
cat("P(xBar >",L,") = P(Z>", Zn, ")=","1 - P (Z < ", Zn,") =",1-pnorm(Zn),"\n") #print the theoretical prob Xbar greater than L
cat("sum (sampleA >",L,")/","M=", n," para N =", 36,":",xBar, "\n") #print the sampling probability print when is greater than L
The output is:
P(xBar > 0.36 ) = P(Z> -3.961838 )= 1 - P (Z < -3.961838 ) = 0.9999628
sum (sampleA > 0.36 )/ M= 32000 para N = 36 : 0.7377187
My question is: why are the values so far apart? Presumably they should be much closer (0.9999628 is far from 0.7377187). Am I doing something wrong in my implementation? Excuse my English.
Melkhiah66, you did everything right. Only change
MuestraAUnif <- runif(2) to MuestraAUnif <- runif(32)
and it should work.
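For reference, a sketch of how the theoretical side is usually computed (my assumption of the intended comparison: each entry of sampleA is a mean of N = 36 uniforms, so its standard error is sigma/sqrt(36), and Zn should standardize one average, not the sum of all 32000):
L  <- 0.32 + 4 * 1/100      # 0.36
mu <- 0.5
se <- sqrt(1/12) / sqrt(36) # standard error of a mean of 36 uniforms
1 - pnorm((L - mu) / se)    # theoretical P(xbar > L), about 0.998
mean(sampleA > L)           # empirical relative frequency from the simulation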
