How to find root with more than one unknown - r

fff5 <- function(x) x*31*24 * (1/(31*24))*0.30 + 400*31*24 * (1/(31*24))*0.025 + ((10*31*24 - 100*31*24/20)/(31*24) * 6) - 200
The fff5 function describes the cost of Amazon Elastic File System, where x is the GB of storage in the Standard plan for 24 hours per day over 31 days, 400 is the GB of storage in EFS Infrequent Access for 24 hours per day over 31 days, 10 is the MB/s of throughput for 24 hours per day over 31 days, and 200 is the maximum budget.
When I do:
uniroot(fff5, lower = 0, upper = 1, extendInt = "yes", maxiter = 10000)$root
[1] 533.3333
I find the highest number of GB that can be stored in the Standard plan 24 hours a day for 31 days, on top of the cost of 400 GB in Infrequent Access and the cost of 10 MB/s of throughput, within a maximum budget of 200:
fff5(533.3333)
[1] -0.00001
fff5(533.3334)
[1] 0.00002
How can I do the same for the other two unknowns (y, z)? How do I find a root with more than one unknown? How do I find all the combinations of x, y, and z values that make this function positive?
fff6 <- function(x, y, z) x*31*24 * (1/(31*24))*0.30 + y*31*24 * (1/(31*24))*0.025 + ((z*31*24 - 100*31*24/20)/(31*24) * 6) - 200

The equation you propose is of the type
ax + by + cz + d = 0
that is, a plane. This means that your solutions are infinite: they are all the points belonging to the plane defined by the equation.
Since there are infinitely many solutions, the only thing you can do is narrow the space in which to look for them as much as possible.
You can choose one unknown (for example x) and treat the other two as parameters.
At this point, assign reasonable values to y and z. Unfortunately I don't know what those variables indicate, but I assume they have the same order of magnitude as the x found in the previous point (~500):
yy <- seq(400, 600, 10)
zz <- seq(400, 600, 10)
These two variables must be recombined in order to obtain a grid:
df_grid <- expand.grid(y = yy, z = zz)
ATTENTION: the longer the vectors, the heavier the calculation will be.
Now you can find the x solutions via uniroot (passing y and z as numbers), and the solutions of your problem (within the chosen range) will be all the triples (x, y, z):
fff6 <- function(x, y, z) {
  x*31*24 * (1/(31*24))*0.30 +
    y*31*24 * (1/(31*24))*0.025 +
    ((z*31*24 - 100*31*24/20)/(31*24) * 6) - 200
}
x_sol <- NULL
for (i in 1:nrow(df_grid)) {
  xs <- uniroot(fff6, c(-10000, 10000), y = df_grid$y[i], z = df_grid$z[i])$root
  x_sol <- c(x_sol, xs)
}
df_grid$x <- x_sol
NOTE 1: There are more elegant ways to avoid writing the previous for loop. For example:
x_sol <- mapply(function(y, z) uniroot(fff6, interval = c(-10000, 10000),
                                       y = y, z = z)$root,
                df_grid$y, df_grid$z)
df_grid$x <- x_sol
NOTE 2: The range I have chosen yields negative solutions (which I suspect are not useful). A possible choice for obtaining positive solutions is:
yy <- seq(100, 300, 10)
zz <- seq(10, 30, 1)
Choose to search for solutions in an appropriate range!
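Incidentally, since the function is linear in x, you can also skip uniroot altogether and solve for x in closed form on the same grid. A minimal sketch, equivalent to the loop above (the function simplifies to 0.30*x + 0.025*y + 6*z - 230):
# closed-form root for each (y, z) pair on the grid
df_grid$x_alg <- (230 - 0.025 * df_grid$y - 6 * df_grid$z) / 0.30
all.equal(df_grid$x, df_grid$x_alg, tolerance = 1e-4)  # should agree up to uniroot's tolerance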

Related

Adding a constraint to a simple Optimization in R

I've looked at several questions like this on SO and still cannot resolve it. I'm looking to add a constraint where the change in volume (New Volume / Old Volume - 1) cannot be less than -10%.
Example:

Customer  Old_volume  Elasticity  Price
X         100         -0.68       15.00
#example dataset
df <- data.frame(customer = c("X"),
                 old_volume = c(100),
                 elasticity = c(-0.68),
                 price = c(15.00))
#function
f3 <- function(x) {
  new_vol = 100 * (1 + (-0.68 * x))
  new_rev = new_vol * (15.00 * (1 + x))
  new_cost = new_vol * 11.25
  return(new_rev - new_cost)
}
n_vol <- function(x) {
  new_vol = 100 * (1 + (-0.68 * x))
  return(new_vol)
}
#example run function
f3(0.25)
# [1] 622.5
#running the optimization
res <- optimize(f3, lower = 0, upper = 10, maximum = TRUE)
res$maximum
# [1] 0.6102941
res$objective
# [1] 754.9081
n_vol(0.6102941)
# [1] 58.5
Therefore the volume change is 58.5/100 - 1 = -0.415; however, I want to limit this to -0.1.
Now I want to add a constraint where new_vol/old_vol - 1 >= -0.1. However, I'm not sure how to add this using optimize(), optim(), or lpSolve(). I was reading through the lpSolve documentation and it seemed like the way to go, but I am confused as to what my vector of coefficients would be in objective.in. Also, looking at other packages like constrOptim, they seemed too complex for something one-dimensional.
Thanks
As an example: normally, in Excel, I would have a bunch of columns with these calculations and run Solver with the constraint volume change >= -0.1. However, running a macro takes too long with more than 1,000 rows.
EDIT: Added input example
You can calculate your new boundary algebraically.
change_inv <- function(frac, oldvol = 100) { oldvol * (1 + frac) }
n_vol_inv <- function(vol) { (vol / 100 - 1) / -0.68 }
n_vol_inv(vol = change_inv(frac = -0.1))
# [1] 0.1470588
max_x <- optimize(f3,
                  lower = 0,
                  upper = n_vol_inv(vol = change_inv(-0.1)),
                  maximum = TRUE)
max_x
# $maximum
# [1] 0.1469922
#
# $objective
# [1] 535.9664
n_vol(max_x$maximum) / 100 - 1
# [1] -0.09995469
Unfortunately, this means that the maximum is just the boundary.
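If you need to run this over many rows (the Excel/Solver use case above), note that f3 is a downward-opening quadratic in x, so its unconstrained maximum has a closed form, and the constrained maximum is simply the smaller of that vertex and the boundary. A sketch along those lines; opt_price_change is a hypothetical helper, not from the original post, and it assumes the vertex and boundary are both non-negative:
opt_price_change <- function(old_volume, elasticity, price,
                             unit_cost = 11.25, max_vol_drop = -0.1) {
  margin <- price - unit_cost
  # d/dx of old_volume*(1 + elasticity*x)*(margin + price*x) vanishes at the vertex;
  # old_volume cancels in the derivative but is kept for a data-frame interface
  vertex <- -(price + elasticity * margin) / (2 * elasticity * price)
  bound  <- max_vol_drop / elasticity   # x at which the volume change hits the limit
  pmin(vertex, bound)                   # vectorised, so whole columns work at once
}
opt_price_change(100, -0.68, 15.00)
# [1] 0.1470588  -- the boundary, consistent with the optimize() result above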

Can I write my equation more efficiently in R?

I'm quite new to coding, so I don't know what the limits are for what I can do in R, and I haven't been able to find an answer for this particular kind of problem yet, although it probably has quite a simple solution.
For equation 2, A.1 is the starting value, but in each subsequent equation I need to use the previous answer (i.e. for A.3 I need A.2, for A.4 I need A.3, etc.).
A.1 <- start.x*(1-rate[1])+start.x*rate[1]
A.[2:n] <- A.[n-1]*(1-rate[2:n])+x*rate[2:n]
How do I set A.1 as the initial value, and is there a better way of writing equation 2 than to copy and paste the equation 58 times?
I've included the variables I have below:
A.1 <- -13.2   # which is the same as start.x
x <- -10.18947 # x[2:n]
n <- 58
Age <- c(23:80)
rate <- function(Age){
  Turnover <- (1/(1.0355*Age - 3.9585))
  return(Turnover)
}
I need to find the age at which A can be rounded to -11.3. I expect to see it from ages 56 to 60.
Using the new information, try this:
x<- -10.18947
n<- 58
Age <- 23:80
rate <- (1 / (1.0355 * Age - 3.9585))
A <- vector("numeric", 58)
A[1] <- -13.2
for (i in 2:n) {
  A[i] <- A[i-1] * (1 - rate[i]) + x * rate[i]
}
Age[which.min(abs(A + 11.3))]
# [1] 58
plot(Age, A, type="l")
abline(h=-11.3, v=58, lty=3)
So the age at which A is closest to -11.3 is 58 years.
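If you'd rather avoid the explicit loop, the same recurrence can be written with Reduce(), which feeds each result into the next step. A minimal equivalent sketch:
# accumulate = TRUE keeps every intermediate value, so A2 has one entry per age
A2 <- Reduce(function(prev, i) prev * (1 - rate[i]) + x * rate[i],
             2:n, init = A[1], accumulate = TRUE)
all.equal(A, A2)
# [1] TRUE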

Hoping for help to translate a thought experiment into R code, using randomization

I'm more experienced with R than many of my peers, yet it sometimes takes hours to move a novel-to-me concept into the code line, and usually a few more to get a successful output. I don't know how to describe this in R language, so I hope you can help me- either with sample code, or pointing me in the right direction.
I have c(X1,X2,X3,...Xn) for starting variable, a non-random numeric value.
I have c(Y1,Y2,Y3,...Yn) for change variable, a non-random numeric value denoting by how much to change X, give or take, and a value between 0-10.
I have c(Z1,Z2,Z3,...Zn) which is the min and max range of X.
What I want to observe is the random sampling of all the X values, after each has randomly had some multiple of its corresponding Y variable added to or subtracted from it. What I'm trying to ask in this problem is: how many times will I draw X values which are exactly the X values I initially input, give or take only a low multiple of Y?
For instance,
Exes<-c(135,462,579,222)
Whys<-c(1,3,3,2)
Zees<-c(c(115,155),c(450,474),c(510,648),c(200,244))
First iteration: X=c(135,462,579,222); second iteration: X=c(130,471,585,230) <- as you can see, X in the second iteration has changed by (-5*Y1), (+3*Y2), (+2*Y3), and (+4*Y4).
What I want to output is a list of randomized X values which have each changed only by a multiple of their corresponding Y value, and which always fall within the range of the given Z values. Further, I want to examine how many times at least one (and only one) X value will be significantly different from the corresponding starting input X.
I feel like I'm not wording the question succinctly, but I also feel that this is why I've posted. I'm not trying to ask for hand-holding, but rather seeking advice.
I am not sure that I understood the question. Do you want to repeat the process numerous times? Is it for the purpose of simulation? Here is the start of a solution.
library(dplyr)
x <- c(135,462,579,222)
y <- c(1,3,3,2)
z.lower <- c(115, 450, 510, 200)
z.upper <- c(155, 474, 648, 244)
temp.df <- data.frame(x, y, z.lower, z.upper)
temp.df %>%
  mutate(samp = sample(seq(-10, 10, 1), nrow(temp.df))) %>% ### Sample multipliers between -10 and 10
  mutate(new.val = x + samp * y) %>%                        ### Create new X
  mutate(is.bound = new.val < z.upper & new.val > z.lower)  ### Check that it falls in bounds
x y z.lower z.upper samp new.val is.bound
1 135 1 115 155 -10 125 TRUE
2 462 3 450 474 10 492 FALSE
3 579 3 510 648 8 603 TRUE
4 222 2 200 244 6 234 TRUE
For this dataset, this is a possibility:
Exes<-c(135,462,579,222)
Whys<-c(1,3,3,2)
Zees<-c(c(115,155),c(450,474),c(510,648),c(200,244))
n <- 10000
x_range_l <- split(Zees, rep(seq_len(length(Zees) / 2), each = 2))
mapply(function(y, x_range) sample(seq(from = x_range[1], to = x_range[2], by = y), size = n, replace = TRUE),
       Whys, x_range_l)
Note that this option depends more on the Zees than the Exes. A more complete way to do it would be:
Exes<-c(135,462,579,222)
Whys<-c(1,3,3,2)
Why_Range <- c(20, 4, 13, 11)
x_range_l <- Map(function(x, y, rng) c(x - y * rng, x + y * rng), Exes, Whys, Why_Range)
n <- 10000
mapply(function(y, x_range) sample(seq(from = x_range[1], to = x_range[2], by = y), size = n, replace = TRUE),
       Whys, x_range_l)
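To get at the "how many times do I draw exactly the starting X" part of the question, you can count matches against Exes in the matrix of draws. A small sketch building on the first option above:
draws <- mapply(function(y, x_range) sample(seq(from = x_range[1], to = x_range[2], by = y), size = n, replace = TRUE),
                Whys, x_range_l)
hits <- sweep(draws, 2, Exes, "==")  # TRUE where a draw equals the starting value
colSums(hits)                        # per-variable match counts out of n draws
sum(rowSums(hits) == length(Exes))   # draws where every X matches its start
sum(rowSums(hits) == 1)              # draws where exactly one X matches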

Building formula from scientific paper

I am trying to build a mathematical formula from a scientific paper in R.
In the example given, I used a variable of 164 microns. This was taken from the first result in Table 3 of the paper I have attached. You'll see in this table the current velocities calculated (pretty neat!).
The overall point of this is that I wish to acquire two things: erosional velocity and depositional velocity. The paper I have attached does so using the formulae given. I am trying to build a package which can run hundreds of mean grain sizes (i.e. the variable) through these formulae. In an ideal world, my main goal is to build code using the given formulae which takes the variable (mean grain size) and spits out lovely data... I think it is possible, but unfortunately my R skills are inadequate.
Link to formulae: https://imgur.com/a/DEN721v?
Link to original scietific paper: https://link.springer.com/article/10.1007/s00531-008-0312-5
There are 5 equations, all of which feed into each other. The outcome depends on one variable, which I input at the start.
I am given four known values:
p (water density, given as m^3),
ps (grain density, given as m^3),
g (acceleration due to gravity, given as m/s^2),
v (kinematic viscosity of water, given as m^2/s),
and one variable (written as d), which is the mean grain size of a sediment sample.
For example, if I had a mean grain size of 164 μm this would be input as 1.64e-4.
Seeking help, as my outcomes are definitely not even close to what they should be.
p <- 1027.4 #water density (m^3)
ps <- 2650 #grain density (m^3)
g <- 9.81 #acceleration due to gravity (m/s^2)
v <- 1.4313e-6 #kinematic viscosity of water (m^2/s)
z100 <- 100 #level above seabed (cms)
d <- 1.64e-4 #variable (mean grain size in microns)
EQUATION 1
D1 <- 9.81*(ps-p)
D2 <- (p*v)^2
D3 <- (D1/D2)
D4 <- D3^(1/3)
D5 <- D4*d
D <- D5 #Dimensionless grain size
D
EQUATION 2.3
1 - exp(-0.001374634317)
Tcr1 <- -0.020*D
Tcr2 <- 1 - exp(Tcr1)
Tcr3 <- 0.055*Tcr2
Tcr4 <- 0.30/1+(1.2*D)
Tcr5 <- Tcr4 + Tcr3
Tcr6 <- 9.81*(ps-p)
Tcr7 <- Tcr6*d
Tcr8 <- Tcr7*Tcr5
Tcr <- Tcr8 #threshold bed shear stress (N/m^2)
exp(Tcr1)
Tcr
Ucr1 <- Tcr/p
Ucr2 <- sqrt(Ucr1)
Ucr <- Ucr2 #critical shear velocity
EQUATION 3
z0 <- d/12 #roughness length
z0
EQUATION 4
Ue1 <- z100/z0
Ue2 <- Ucr/0.41
Ue3 <- log(Ue1)
Ue4 <- Ue2*Ue3
Ue <- Ue4 # critical current velocity erosional threshold from particle size distribution
Ue
EQUATION 5
Usetl1 <- 10.36^(2)
Usetl2 <- D^(3)
Usetl3 <- 1.049*Usetl2
Usetl4 <- Usetl1 + Usetl3
Usetl5 <- Usetl4^(1/2)
Usetl6 <- Usetl5 - 10.36
Usetl7 <- v/d
Usetl8 <- Usetl7*Usetl6
Usetl <- Usetl8
Results are given as cm/s, and should be in and around 20-50 cm/s.
OK, so let us try it from the start.
p <- 1027.4 #water density (m^3)
ps <- 2650 #grain density (m^3)
g <- 9.81 #acceleration due to gravity (m/s^2)
v <- 1.4313e-6 #kinematic viscosity of water (m^2/s)
z100 <- 100 #level above seabed (cms)
d <- 1.64e-4 #variable (mean grain size in microns)
This last value is incorrect if the mean grain size is supposed to be in microns. It is in meters.
D <- d * (g * (ps - p) / (p * v^2))^(1/3)
The result is 3.22. There was an error in your formula; (p * v)^2 instead of p * (v^2).
Tcr <- g * (ps - p) * d * (.3 / (1 + 1.2 * D) + .055 * (1 - exp(-.02 * D)) )
Again, there was an error in your formula: .3/1 + 1.2 * D instead of .3/(1 + 1.2 * D). The result is .17.
Ucr <- sqrt(Tcr / p)
Result is .01.
z0 = d / 12
Result is 1.37E-5.
Ue <- Ucr / .41 * log(z100 / z0)
Result is .50. Not sure why we calculate it, however. Are we supposed to compare it with Uset?
Uset <- v / d * ( sqrt(10.36^2 + 1.049 * D^3) - 10.36)
Result is .01 (.0137014).
This is not what you say you should get, but it is different from what you are getting. Also, assuming it is not centimeters but meters per second, then it is about 1 cm per second.
Now, let us check the units. Firstly, you need to be more careful when you specify the units. Water and grain density is not m^3; it is kg * m^-3.
First, D. The densities are dimensionally silent, since (ps - p)/p is unitless; what remains is d * (g / v^2)^(1/3):
m * (m * s^-2 / (m^4 * s^-2))^(1/3) =
m * (1/m^3)^(1/3) = m / m = 1
OK, unitless.
Next, Tcr: the bracketed part of the formula is unitless (it depends only on D). The rest is
m * s^-2 * kg * m^-3 * m = (m * kg * s^-2) * m^-2 = N / m^2.
OK, that also checks out.
Now, the formula for Uset. Again, the right part of the right side of the equation is unitless. The rest is
m^2 * s^-1 / m = m * s^-1
At least the units check out.
Hope this helps (somehow).
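Putting the corrected steps above together, a minimal consolidated sketch (values exactly as given in the question, results as computed above):
p  <- 1027.4    # water density (kg/m^3)
ps <- 2650      # grain density (kg/m^3)
g  <- 9.81      # acceleration due to gravity (m/s^2)
v  <- 1.4313e-6 # kinematic viscosity of water (m^2/s)
z100 <- 100     # level above seabed
d  <- 1.64e-4   # mean grain size (m)
D    <- d * (g * (ps - p) / (p * v^2))^(1/3)                                  # dimensionless grain size, ~3.22
Tcr  <- g * (ps - p) * d * (.3 / (1 + 1.2 * D) + .055 * (1 - exp(-.02 * D)))  # threshold shear stress, ~0.17 N/m^2
Ucr  <- sqrt(Tcr / p)                                                         # critical shear velocity, ~0.013
z0   <- d / 12                                                                # roughness length, ~1.37e-5
Ue   <- Ucr / .41 * log(z100 / z0)                                            # erosional threshold velocity, ~0.50
Uset <- v / d * (sqrt(10.36^2 + 1.049 * D^3) - 10.36)                         # settling velocity, ~0.0137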

Print dates without scientific notation in rpart classification tree

When I create an rpart tree that uses a date cutoff at a node, the print methods I use (both rpart.plot and fancyRpartPlot) print the dates in scientific notation, which makes it hard to interpret the result.
Is there a way to print this tree with more interpretable date values? As it stands, the tree plot is meaningless because all those dates look the same.
Here's my code for creating the tree and plotting two ways:
library(rpart) ; library(rpart.plot) ; library(rattle)
my_tree <- rpart(a ~ ., data = dat)
rpart.plot(my_tree)
fancyRpartPlot(my_tree)
Using this data:
# define a random date/time selection function
generate_days <- function(N, st="2012/01/01", et="2012/12/31") {
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  dt <- as.numeric(difftime(et, st, unit="sec"))
  ev <- runif(N, 0, dt)
  rt <- st + ev
  rt
}
set.seed(1)
dat <- data.frame(
  a = runif(100),
  b = rpois(100, 5),
  c = sample(c("hi","med","lo"), 100, TRUE),
  d = generate_days(100)
)
From a practical standpoint, perhaps you'd like to just use days from the start of the data:
dat$d <- dat$d-as.POSIXct(as.Date("2012/01/01"))
my_tree <- rpart(a ~ ., data = dat)
rpart.plot(my_tree,branch=1,extra=101,type=1,nn=TRUE)
This reduces the number to something manageable and meaningful (though not as meaningful as a specific date, perhaps). You may even want to round it to the nearest day or week. (I can't install GTK+ on my computer, so I can't use fancyRpartPlot.)
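For instance, a minimal sketch of the rounding idea, assuming d is now the difftime produced by the subtraction above:
dat$d <- round(as.numeric(dat$d, units = "days"))  # whole days since 2012/01/01
my_tree <- rpart(a ~ ., data = dat)
rpart.plot(my_tree)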
One possible way might be to use the digits option in print to examine the tree, and as.POSIXlt to convert the split values back to dates:
> print(my_tree,digits=100)
n= 100
node), split, n, deviance, yval
* denotes terminal node
1) root 100 7.0885590 0.5178471
2) d>=1346478795.049611568450927734375 33 1.7406368 0.4136051
4) b>=4.5 23 1.0294497 0.3654257 *
5) b< 4.5 10 0.5350040 0.5244177 *
3) d< 1346478795.049611568450927734375 67 4.8127122 0.5691901
6) d< 1340921905.3460228443145751953125 55 4.1140164 0.5368048
12) c=hi 28 1.8580913 0.4779574
24) d< 1335890083.3241622447967529296875 18 0.7796261 0.3806526 *
25) d>=1335890083.3241622447967529296875 10 0.6012662 0.6531062 *
13) c=lo,med 27 2.0584052 0.5978317
26) d>=1337494347.697483539581298828125 8 0.4785274 0.3843749 *
27) d< 1337494347.697483539581298828125 19 1.0618892 0.6877082 *
7) d>=1340921905.3460228443145751953125 12 0.3766236 0.7176229 *
## Get date on first node
> as.POSIXlt(1346478795.049611568450927734375,origin="1970-01-01")
[1] "2012-08-31 22:53:15 PDT"
I also checked the digits option available in rpart.plot and fancyRpartPlot:
rpart.plot(my_tree,digits=10)
fancyRpartPlot(my_tree, digits=10)
I don't know how important the specific chronological date is in your classification, but an alternative method would be to break down your dates by their characteristics. In other words, create indicator variables based on the year (2012, 2013, 2014, ...) as [1,0], the day of the week (Mon, Tues, Wed, Thurs, Fri, ...) as [1,0], and maybe even the day of the month (1, 2, 3, 4, 5, ..., 31) as [1,0]. This adds a lot more categories to classify by, but it eliminates the issue of working with a fully formatted date.
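As a rough sketch of that idea, using the dat from the question (the new column names here are illustrative, not from the original post):
dat$year    <- as.integer(format(dat$d, "%Y"))  # calendar year
dat$weekday <- weekdays(dat$d)                  # day of the week
dat$mday    <- as.integer(format(dat$d, "%d"))  # day of the month
my_tree <- rpart(a ~ b + c + year + weekday + mday, data = dat)
rpart.plot(my_tree)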
