Understanding what prior.scale is and how to use it in Prophet? - r

I'm reviewing Prophet's documentation on the add_regressors method and came across a parameter called prior.scale, which is explained as "Float scale for the normal prior. If not provided, holidays.prior.scale will be used."
I'm looking for information on what this is, how it affects predictions, and how to assess/tune the value it is set to.

Usually, if you see some sort of scale parameter associated with a prior, it essentially refers to the standard deviation or spread of the prior. In this case, when you're adding a regressor, I'm guessing the prior mean is that the coefficient is equal to 0. The scale parameter is essentially a check on how large an effect you expect this regressor to have. If you think it should have a relatively large effect, increase it, and vice versa. For what it's worth, here's the description for holidays.prior.scale:
Parameter modulating the strength of the holiday components model, unless
overridden in the holidays input.
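If it helps, here's a hedged sketch using Prophet's Python API, where the corresponding argument is prior_scale; the 'promo' regressor and the toy data are invented purely for illustration:
import numpy as np
import pandas as pd
from prophet import Prophet

# Toy data: a daily series plus an extra 'promo' regressor column.
df = pd.DataFrame({
    'ds': pd.date_range('2023-01-01', periods=100, freq='D'),
    'y': np.arange(100, dtype=float),
})
df['promo'] = (df['ds'].dt.dayofweek == 0).astype(float)

m = Prophet()
# A smaller prior_scale shrinks the regressor coefficient toward 0 (smaller
# allowed effect); a larger value lets the fit assign a bigger effect.
m.add_regressor('promo', prior_scale=0.5)
m.fit(df)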

Min Max component, objective function

I would like to perform some optimizations by minimizing the maximum of a specific path variable within Dymos, or the maximum of the absolute value of such a variable.
In linear programming methods, this can be done by introducing slack variables.
Do you know if this has been attempted before with Dymos, or if there was a reason not to include it?
I understand gradient-based methods are not entirely suitable for these problems, though I think some "functions" can be introduced to mitigate this.
For example,
The space shuttle reentry problem from [Betts][1] is used as a [test example][2] in Dymos; the original source contains a variant in which the maximum heat flux is minimized. Such functionality could be implemented with the "loc" argument as:
phase.add_objective('q_c', loc='max')
[1]: J. Betts. Practical Methods for Optimal Control and Estimation Using Nonlinear Programming. Society for Industrial and Applied Mathematics, second edition, 2010. doi:10.1137/1.9780898718577. URL: https://epubs.siam.org/doi/abs/10.1137/1.9780898718577
[2]: https://openmdao.github.io/dymos/examples/reentry/reentry.html
This has been done with pseudospectral methods before. Dymos currently doesn't have any direct way of implementing this, for a few reasons:
As you said, doing this naively can introduce discontinuous gradients that confuse the optimizer. When the node at which the maximum occurs switches, this tends to cause a sharp edge discontinuity in the gradient.
Since the pseudospectral methods are discrete, you cannot guarantee that the maximum will occur at a node. It's often fine to assume it does, but sometimes your requirements might demand more precision.
There are two possible ways to get around this.
The KSComp in OpenMDAO can be used as a "differentiable maximum". Add one after the trajectory, feed it the timeseries data for the output of interest, and set it up such that it returns a smooth approximation to the maximum. The KS function is a bit conservative, so it won't pick out the precise maximum, but depending on the value of the rho option it can be tuned to get pretty close.
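As a rough, self-contained sketch of that idea (the node count and random data below simply stand in for a Dymos timeseries output you would connect in a real model):
import numpy as np
import openmdao.api as om

n = 30  # stand-in for the number of timeseries nodes
p = om.Problem()
# KSComp aggregates a vector input 'g' into a single smooth, conservative maximum 'KS'.
p.model.add_subsystem('ks', om.KSComp(width=n, rho=50.0))
p.setup()

p.set_val('ks.g', np.random.uniform(0.0, 10.0, (1, n)))
p.run_model()
# 'KS' is a smooth upper bound on max(g); larger rho tracks the true maximum more closely.
print(p.get_val('ks.KS'), np.max(p.get_val('ks.g')))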
When a more precise value of a maximum is needed, it's pretty common to set up a trajectory such that a phase ends when the maximum or minimum is reached.
If the variable whose maximum is being sought is a state, this can be done by adding a boundary constraint on the rate source for that state.
This ensures that the maximum occurs at the first or last node in the phase (depending on whether it's an initial or final boundary constraint). That lets you more accurately capture its value.
If the variable being sought is not a state, it's possible to use the polynomials that are used for fitting states and controls in a phase to interpolate the variable of interest. By then taking the time derivative of that polynomial, we can get a reasonably good approximation for its rate. The master branch of dymos has a method add_timeseries_rate_output that does this. And soon, within a few weeks hopefully, we'll add add_boundary_rate_constraint so that these interpolated rates can be easily used as boundary constraints.
In the meantime, you should be able to achieve this by adding the timeseries rate output and then manually applying the OpenMDAO method 'add_constraint' to the resulting timeseries output, using either indices=[0] or indices=[-1] to treat it as an initial or final constraint.
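A hedged fragment of what that could look like; the variable name, phase path, and rate-output naming below are assumptions rather than verified API output:
# Add the interpolated time derivative of 'q_c' to the phase timeseries (assumed variable name).
phase.add_timeseries_rate_output('q_c')
# Constrain that rate to zero at the final node so the extremum lands on the phase
# boundary; the promoted path below is a guess at the usual Dymos naming convention.
p.model.add_constraint('traj.phase0.timeseries.q_c_rate', equals=0.0, indices=[-1])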
This is a common enough request that we'll add some documentation on how to achieve this behavior using both the KSComp approach and the boundary constraint approach.
Personally I'm not as much of a fan of KSComp, because I've had trouble getting those types of objectives to converge in the past. I've used the slack-variable approach and that has worked well. In the following example, we take a guess at the rotor power in static analysis, and then we run a trajectory and get the actual rotor power during the mission. The objective was to minimize aircraft weight, so carrying a large amount of power in statics costs more weight. The constraint shown below prevents us from decreasing our updated guess of rotor power in statics below the maximum power required during the trajectory.
p.model.add_subsystem(
    'static_power_check',
    om.ExecComp('Power_check = Power_ODE - Power_statics',
                Power_check={'value': np.ones(nn_timeseries_main_tx), 'units': 'kW'},
                Power_ODE={'value': np.ones(nn_timeseries_main_tx), 'units': 'kW'},
                Power_statics={'value': 0.0, 'units': 'kW'}),
    promotes_inputs=[('Power_ODE', 'hop0.main_phase.timeseries.Power_R'),
                     ('Power_statics', 'Power_{rotor,slack}')],
    promotes_outputs=['Power_check'])
p.model.add_constraint('Power_check', upper=0, ref=1)
The constraint on the slack variable effectively helped us ensure that our slack rotor power matched the maximum rotor power during the mission. This allowed us to get the right sizes for the rotor parts (i.e. motors).
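For completeness, a hedged fragment of how the slack power itself might be exposed to the optimizer; the bound and ref are made-up values, and the variable name comes from the snippet above:
# Hypothetical sketch: let the optimizer choose the static rotor power; the
# Power_check constraint above keeps it at or above the trajectory's maximum.
p.model.add_design_var('Power_{rotor,slack}', lower=0.0, ref=1.0e2)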

How do I interpret the scaling report to improve my model?

Pretty nice feature. I am trying to figure out how to interpret the report to see if there are any glaring problems in my model, which includes a trajectory with Dymos. My model usually converges fine, although sometimes I have to change the NLP scaling to gradient-based. I have no idea what that really does, but usually it will make IPOPT converge if the default setting doesn't work, and vice versa.
This is what the Jacobian looks like according to the tool.
I am guessing that what is desired is that the order of magnitude of the partials in the Jacobian spans as few orders of magnitude as possible. The lower two diagonal bands have magnitudes from 0.1 to 10E5 and seem to be related to phase linkages. For example we have 'traj.linkages.phase_1:h_final|phase_2:h_initial wrt traj.phases.phase_1.indep_states.states:h' with a magnitude of 10E5. Should I be doing something about this?
In the design variables, everything seems to be scaled OK, with driver values on the order of magnitude of 1.
In the constraints report the order-of-magnitude span is wider, from 10E-5 to 10E2. I am not setting defect_refs. Maybe I need to do something here?
I tend to use the table that shows the norms from the driver and model perspectives.
In this case the driver values are all scaled on the order of 1 or so. If they're huge, then the scalers/refs may need to be adjusted. (Scaling to the order of around 1 isn't guaranteed to be a good strategy, but it's usually a pretty good going-in position.)
You may want to do this after limiting the driver iterations. For a converged case in dymos, the values of the defect constraints are going to be close to zero and this may not give you a good idea of their initial values.
For defect_refs, for instance, if I note that the value of the constraint is something like 5.5E-4, then I might set that defect_ref to 1.0E-4. Again, unit scaling isn't always correct, but it does frequently work.
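In code, that might look like the following fragment (the state name 'h' and the phase object are placeholders):
# If the raw defect value for state 'h' sits around 5.5E-4, a defect_ref of 1.0E-4
# makes the driver see a value near 5.5, i.e. roughly order one.
phase.set_state_options('h', defect_ref=1.0e-4)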

OpenMDAO Dymos defect_refs -- how should I set these?

I was hoping to get some information on how to set my defect refs in Dymos in a smart way. I found the following notes on scaling here https://github.com/hweyandtnasa/scaling-tutorial but it still lists defect scaling in Dymos as a TODO. Should I just set them equal to the ref value for the state they pertain to?
Scaling pseudospectral optimal control problems is tricky. If you can get a copy of John Betts' Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, I highly recommend it. Betts suggests using the same scaling for both the state design variable values and the defects. This is often a good rule of thumb but, as with most approaches to scaling, isn't universal. The collocation "defects", which dictate whether the dynamics are physically correct, are just the difference between the slope of the approximating polynomial and the computed equations of motion.
In situations where state values are large but tiny rates of change are significant, different scaling is warranted in my experience. Examples of states where this can be true are aircraft range or spacecraft orbital elements. Just recently we had a situation where a low-thrust orbit transfer of a spacecraft wasn't matching physics. The semi-latus rectum, for instance, is typically measured in km (so on the scale of thousands when in Earth orbit). In the units being used, a "significant" difference in the defect was less than 1E-6 (the threshold for feasibility being used). In this case, the problem was solved by bumping the defect_scaler up a few orders of magnitude (equivalent to bumping the defect_ref down a few orders of magnitude).
I'd also recommend this paper from Ross, Gong, Karpenko, and Proulx. It lays out some good rules of thumb and has an approachable example in the brachistochrone. It references costates a lot. Dymos doesn't provide automatic costate estimation yet, but costates are closely related to the Lagrange multipliers of the problem, which are printed in the pyoptsparse output if you use SNOPT.
The GitHub repo you pointed out was the work of an intern and was based around this scaling method developed by Sagliano. We found it to work well in many situations, but it's also not a panacea.
Ultimately we want some automatic scaling options in Dymos and/or OpenMDAO, but we're not sure when they might find their way into the framework. Our past work has typically tied scaling approaches more tightly to the equations of motion, and Dymos is designed to be more general in that the user can supply whatever EOM they choose.
In Dymos, if you leave the defect_ref value unset when you call set_state_options, then the default behavior is to make the defect_ref equal to the ref value. Here is why that is done:
Defects are the differences between the computed state rate from the polynomial interpolation function and the actual state rate computed by the ODE.
As you can see here:
defect = (f_approx-f_computed) * dt_dstau
the dt_dstau just adjusts things into a normalized time space called tau, but it also multiplies by the time unit (tau is dimensionless). That means the defects are computed in the same units as the states themselves. Thus a reasonable guess for scaling is to match the scaling between the states and the defects. As Rob Falck's answer points out, that is not always the right solution, but it's a good starting point.
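As a hedged sketch of that default and the occasional override (state names, units, and values below are invented), the two common patterns are:
# Rely on the default: defect_ref falls back to ref, so defects are scaled like the state.
phase.set_state_options('r', units='km', ref=1.0e4)
# Override: the state is large but significant rate differences are tiny, so the
# defects get a much smaller reference than the state itself.
phase.set_state_options('p', units='km', ref=1.0e4, defect_ref=1.0e-2)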

Meaning of pct.change parameter in longpower package?

Hey, so I'm trying to perform a power calculation for a longitudinal study. I've been working with the longpower package. I was a bit confused about the meaning of the pct.change parameter in the lmmpower command when I was trying to calculate sample size for an nlme model. So, for example, in the following command, what does the .3 represent?
lmmpower(model.3, pct.change = .3, t = seq(1,7,1), power = 0.80)$n
The package write-up lists it as "the percent change in the pilot estimate of the parameter of interest (beta, the placebo/null effect)" but I am having trouble understanding it. If someone could explain it with a simple example I'd really appreciate it. Also, I'm not sure if this belongs here or on Cross Validated, so sorry if it doesn't.
Whenever you do a power calculation, you need to specify an effect size, i.e. the magnitude of response that you're expecting from your experiment. This is usually the smallest effect you would reasonably expect to be able to detect, and an effect size below which you would be comfortable concluding that the effect probably wasn't important in terms of your subject area (biology, economics, business, whatever ...)
lmmpower allows you to specify either pct.change or delta; probably the reason that pct.change is emphasized is that it's often easier and more reasonable to interpret proportional changes in an effect. Among other things, this makes the values of the parameter independent of the scale (units) on which the response variable is measured. Alternatively you can specify delta, which is the change in absolute units (i.e., the units in which the parameter of interest is measured).
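In other words (a made-up illustration, not taken from the longpower documentation): if the pilot model's estimated slope is beta = 2.0 units per visit, then
delta = pct.change * beta = 0.3 * 2.0 = 0.6 units per visit,
so pct.change = .3 asks for the sample size needed to detect a 30% change in that pilot estimate.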
For what it's worth
"the percent change in the pilot estimate of the parameter of interest (beta, the placebo/null effect)"
seems a little puzzling to me too; I would add "as a proportion of" before "beta" in the parenthetical clause.

How can optimization be used as a solver?

In a question on Cross Validated (How to simulate censored data), I saw that the optim function was used as a kind of solver instead of as an optimizer. Here is an example:
optim(1, fn=function(scl){(pweibull(.88, shape=.5, scale=scl, lower.tail=F)-.15)^2})
# $par
# [1] 0.2445312
# ...
pweibull(.88, shape=.5, scale=0.2445312, lower.tail=F)
# [1] 0.1500135
I have found a tutorial on optim here, but I am still not able to figure out how to use optim as a solver. I have several questions:
What is the first parameter (i.e., the value 1 being passed in)?
What is the function that is passed in?
I can understand that it is taking the Weibull probability distribution and subtracting 0.15, but why are we squaring the result?
I believe you are referring to my answer. Let's walk through a few points:
The OP (of that question) wanted to generate (pseudo-)random data from a Weibull distribution with specified shape and scale parameters, where the censoring would be applied to all data past a certain censoring time, and end up with a prespecified censoring rate. The problem is that once you have specified any three of those, the fourth is necessarily fixed. You cannot specify all four simultaneously unless you are very lucky and the values you specify happen to fit together perfectly. As it happened, the OP was not so lucky with the four preferred values; it was impossible to have all four, as they were inconsistent. At that point, you can decide to specify any three and solve for the last. The code I presented showed examples of how to do that.
As noted in the documentation for ?optim, the first argument is par "[i]nitial values for the parameters to be optimized over".
Very loosely, the way the optimization routine works is that it calculates an output value given a function and an input value. Then it 'looks around' to see if moving to a different input value would lead to a better output value. If that appears to be the case, it moves in that direction and starts the process again. (It stops when it does not appear that moving in either direction will yield a better output value.)
The point is that it has to start somewhere, and the user is obliged to specify that value. In each case, I started with the OP's preferred value (although really I could have started almost anywhere).
The function that I passed in is ?pweibull. It is the cumulative distribution function (CDF) of the Weibull distribution. It takes a quantile (X value) as its input and returns the proportion of the distribution that has been passed through up to that point. Because the OP wanted to censor the most extreme 15% of that distribution, I specified that pweibull return the proportion that had not yet been passed through instead (that is the lower.tail=F part). I then subtracted .15 from the result.
Thus, the ideal output (from my point of view) would be 0. However, it is possible to get values below zero by finding a scale parameter that makes the output of pweibull less than .15. Since optim (or really almost any optimizer) finds the input value that minimizes the output value, that is what it would have done. To keep that from happening, I squared the difference. That means that when the optimizer went 'too far' and found a scale parameter that yielded an output of .05 from pweibull, so that the difference was -.10 (i.e., < 0), the squaring makes the ultimate output +.01 (i.e., > 0, or worse). This pushes the optimizer back towards the scale parameter that makes pweibull output .15, so that the objective is (.15-.15)^2 = 0.
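If it helps to see the same trick outside R, here is a rough Python equivalent using scipy (a sketch: scipy.stats.weibull_min's survival function plays the role of pweibull with lower.tail=F, and the starting value and method are arbitrary choices):
import numpy as np
from scipy.stats import weibull_min
from scipy.optimize import minimize

# Squared distance between the Weibull survival probability at 0.88 and the
# desired censoring rate of 15%; the minimum (zero) is the 'solution'.
def objective(params):
    scale = params[0]
    return (weibull_min.sf(0.88, c=0.5, scale=scale) - 0.15) ** 2

res = minimize(objective, x0=[1.0], method='Nelder-Mead')
print(res.x[0])                                     # roughly 0.24, as in the R example
print(weibull_min.sf(0.88, c=0.5, scale=res.x[0]))  # approximately 0.15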
In general, the distinction you are making between an "optimizer" and a "solver" is opaque to me. They seem like two different views of the same elephant.
Another possible confusion here involves optimization vs. regression. Optimization is simply about finding an input value (or values) that minimizes (or maximizes) the output of a function. In regression, we conceptualize data as draws from a data generating process that is a stochastic function. Given a set of realized values and a functional form, we use optimization techniques to estimate the parameters of the function, thus extracting the data generating process from noisy instances. Part of regression analysis thus involves optimization, but other aspects of regression are less concerned with optimization, and optimization itself is much broader than regression. For example, the functions optimized in my answer to the other question are deterministic, and there were no "data" being analyzed.
