Recommendations (functions/solutions) to apply in OpenMDAO instead of boolean conditions (if/else) - openmdao

I have been working with OpenMDAO for a couple of months, and I find myself struggling with my code whenever I want to impose conditions that replicate a physical/engineering behaviour.
I have tried using sigmoid functions, but I am still not convinced by them, because of the difficulty of trading off sensitivity against numerical stability. Most of the time I run into overflows in exp, so I end up adding other conditionals (like np.where) and losing linearity.
outputs['sigmoid'] = 1 / (1 + np.exp(-x))
I am looking for another kind of step function, or something similar, that keeps linearity and differentiability to ease the optimization. I don't know whether something like that exists or whether there is a strategy that can help me. If it helps, I am working with an OpenConcept benchmark, which uses vectorized computations and Simpson's rule numerical integration.
Thank you very much.
PS: This is my first ever question on Stack Overflow, so I would like to apologize in advance for any error or bad practice committed. I hope to eventually collaborate and become active in the community.
Update after Justin's answer:
I will take the opportunity to define my problem, and the strategy I tried, in a little more detail. I am trying to monitor and control the thermodynamic conditions inside a tank. One of the requirements is to take action when the pressure P1 reaches a certain threshold P2. To define this:
eval= (inputs['P1'] - inputs['P2']) / (inputs['P1'] + inputs['P2'])
# P2 = threshold [Pa]
# P1 = calculated pressure [Pa]
k=100 #steepness control
outputs['sigmoid'] = (1 / (1 + np.exp(-eval * k)))
eval is defined this way to avoid overflows by normalizing the values, so that corrections are taken when the threshold is reached. In a very similar way, I defined a function to check whether there is still mass (so flow can continue between systems):
eval= inputs['mass']/inputs['max']
k=50
outputs['sigmoid'] = (1 / (1 + np.exp(-eval*k)))**3
max is also used to normalize the value, and the exponent is added so that the function reaches zero before entering the negative domain.
Plot (sorry, it seems I cannot post images yet because of my reputation)
It may be important to highlight that both mass and pressure are calculated from a coupled ODE integration, in which these activation functions take part. I guess that OpenConcept, by its nature, 'explores' a lot of possible values before arriving at the solution, so most of the time it produces negative, infeasible values for mass and pressure and creates overflows. Because of that, I sometimes try to include:
eval[np.where(eval > 1.5)] = 1.5
eval[np.where(eval < -1.5)] = -1.5
That is not a beautiful solution, but it is sometimes effective. I try to avoid it because I sense that these bounds make the solver's and optimizer's work harder.

I could give you a more complete answer if you distilled your question down to a specific code example of the function you're wrestling with and its expected input range. If you provide that code-sample, I'll update my answer.
Broadly, this is a common challenge when using gradient-based optimization. You want some kind of behavior like an if-condition to turn something on/off, and in many cases that's a fundamentally discontinuous function.
To work around that we often use sigmoid functions, but these do have some of the numerical challenges you pointed out. You could try a hyperbolic tangent as an alternative, though it may suffer the same kinds of problems.
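One concrete way to try the hyperbolic-tangent suggestion is the identity sigmoid(x) = 0.5 * (1 + tanh(x / 2)). The sketch below is only an illustration under that assumption: the function names are made up here and are not part of OpenMDAO or OpenConcept. np.tanh saturates at -1 and 1 rather than overflowing, so very large arguments need no extra clipping, although the steepness trade-off itself remains.

```python
import numpy as np

def sigmoid_tanh(x, k=1.0):
    """Logistic sigmoid of k*x, written with tanh so large |k*x| saturates instead of overflowing."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.tanh(0.5 * k * x))

def sigmoid_tanh_deriv(x, k=1.0):
    """Analytic derivative with respect to x, e.g. for hand-written partials."""
    s = sigmoid_tanh(x, k)
    return k * s * (1.0 - s)

# Hypothetical inputs, including values that would overflow np.exp(-k*x)
x = np.array([-1.0e4, -1.0, 0.0, 1.0, 1.0e4])
print(sigmoid_tanh(x, k=100.0))
```

Because the derivative is available in closed form, the same two functions can feed both the output and the partials of a component.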
I will give you two broad options:
Option 1
Sometimes it's OK (even if not ideal) to leave the purely discrete conditional in the code. Let's say you wanted to represent a kind of simple piecewise function:
y = 2x; x>=0
y = 0; x < 0
There is a sharp corner in that function right at 0. That corner is not differentiable, but the function is fine everywhere else. This is very much like the absolute value function in practice, though you might not draw the analogy looking at the piecewise definition of the function because the piecewise nature of abs is often hidden from you.
If you know (or at least can check after the fact) that your final answer will not lie right on or very near to that C1 discontinuity, then it's probably fine to leave the code the way it is. Your derivatives will be well defined everywhere but right at 0, and you can simply pick the left or the right answer for 0.
It's not strictly mathematically correct, but it works fine as long as you're not ending up stuck right there.
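As a minimal sketch of Option 1 for vectorized inputs, the piecewise function and its derivative can each be written with np.where. The names ramp and ramp_deriv are illustrative only, and picking the right-hand slope at exactly x = 0 is an arbitrary convention, as discussed above.

```python
import numpy as np

def ramp(x):
    """y = 2x for x >= 0 and y = 0 for x < 0, evaluated elementwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, 2.0 * x, 0.0)

def ramp_deriv(x):
    """dy/dx per element; at x == 0 the right-hand slope (2) is chosen by convention."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, 2.0, 0.0)

x = np.array([-1.0, 0.0, 1.5])
print(ramp(x), ramp_deriv(x))
```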
Option 2
Apply a smoothing function. This can be a sigmoid, or a simple polynomial. The exact nature of the smoothing function is highly specific to the kind of discontinuity you are trying to approximate.
In the case of the piecewise function above, you might be tempted to define that function as:
2x*sig(x)
That would give you roughly the correct behavior, and would be differentiable everywhere. But Wolfram Alpha shows that it actually undershoots a little (it dips below zero for negative x). That's probably undesirable, so you can increase the exponent to mitigate that. This, however, is where you start to get underflow and overflow problems.
So to work around that, and make a better behaved function all around, you could instead define a three-part piecewise polynomial:
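The undershoot can be checked numerically without Wolfram Alpha. The short sketch below samples 2x*sig(x) on a grid (the grid bounds and resolution are arbitrary choices) and reports where the minimum sits; the sigmoid is written in its tanh form only to avoid overflow warnings.

```python
import numpy as np

# Sample 2x * sigmoid(x) on an arbitrary grid around the origin
x = np.linspace(-10.0, 10.0, 200001)
sig = 0.5 * (1.0 + np.tanh(0.5 * x))   # same value as 1 / (1 + exp(-x))
y = 2.0 * x * sig

i_min = np.argmin(y)
x_min, y_min = x[i_min], y[i_min]
print(f"minimum of 2x*sig(x): y = {y_min:.4f} at x = {x_min:.4f}")
```

The minimum is negative and sits at a negative x of order one, which is the dip referred to above.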
y = 2x; x>=a
y = c0 + c1*x + c2*x**2; -a <= x < a
y = 0; x < -a
You can solve for the coefficients as a function of a by requiring the value and the slope to match at both x = -a and x = a (please double check my algebra before using this!):
c0 = a/2
c1 = 1
c2 = 1/(2a)
The nice thing about this approach is that it will never overshoot and go negative. You can also make the half-width a reasonably small and still get decent numerics. But if you try to make it too small, c2 will obviously blow up.
In general, I consider the sigmoid function to be a bit of a blunt instrument. It works fine in many cases, but if you try to make it approximate a step function too closely, it's a nightmare. If you want to represent physical processes, I find polynomial fillet functions work more nicely.
It takes a little effort to derive that polynomial, because you want it to be C1 continuous on both sides of the curve. So you have to construct the system of equations to solve for it as a function of the polynomial order and the specific relaxation you want (0.1 here).
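Taking up the invitation to double check the algebra: imposing value and slope continuity at x = -a (y = 0, y' = 0) and at x = a (y = 2a, y' = 2) gives c0 = a/2, c1 = 1, c2 = 1/(2a), so the blend is simply (x + a)^2 / (2a). The sketch below uses those values; the function names are illustrative, and the default a = 0.1 is taken from the relaxation value mentioned above.

```python
import numpy as np

def fillet_ramp(x, a=0.1):
    """Smoothed y = 2x (x >= 0), y = 0 (x < 0) using a quadratic blend on [-a, a]."""
    x = np.asarray(x, dtype=float)
    c0, c1, c2 = 0.5 * a, 1.0, 1.0 / (2.0 * a)
    blend = c0 + c1 * x + c2 * x**2      # equals (x + a)**2 / (2a), so it is never negative
    return np.where(x >= a, 2.0 * x, np.where(x < -a, 0.0, blend))

def fillet_ramp_deriv(x, a=0.1):
    """Analytic derivative of fillet_ramp; continuous at x = -a and x = a."""
    x = np.asarray(x, dtype=float)
    blend_slope = 1.0 + x / a            # c1 + 2*c2*x
    return np.where(x >= a, 2.0, np.where(x < -a, 0.0, blend_slope))

# Check value and slope on both sides of each junction (eps is an arbitrary small offset)
a, eps = 0.1, 1e-9
for x0 in (-a, a):
    print(x0, fillet_ramp(x0 - eps, a), fillet_ramp(x0 + eps, a),
          fillet_ramp_deriv(x0 - eps, a), fillet_ramp_deriv(x0 + eps, a))
```

The slope grows linearly from 0 to 2 across the blend, so the second derivative is 1/a there, which is the quantity that blows up when a is made very small.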

My go-to has generally been to consult the table of activation functions on Wikipedia: https://en.wikipedia.org/wiki/Activation_function
I've had good luck with sigmoid and the hyperbolic tangent, scaling them such that we can choose the lower and upper values as well as choosing the location of the activation on the x-axis and the steepness.
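A minimal sketch of that kind of scaling, assuming a tanh-based switch: lower and upper set the two plateau values, x0 the location of the transition, and k the steepness. All names and the sample numbers are hypothetical and not taken from Dymos or OpenConcept.

```python
import numpy as np

def smooth_switch(x, lower=0.0, upper=1.0, x0=0.0, k=1.0):
    """Smooth transition from `lower` to `upper`, centered at x0 with steepness k."""
    x = np.asarray(x, dtype=float)
    return lower + (upper - lower) * 0.5 * (1.0 + np.tanh(0.5 * k * (x - x0)))

def smooth_switch_deriv(x, lower=0.0, upper=1.0, x0=0.0, k=1.0):
    """Analytic derivative of smooth_switch with respect to x."""
    x = np.asarray(x, dtype=float)
    t = np.tanh(0.5 * k * (x - x0))
    return (upper - lower) * 0.25 * k * (1.0 - t**2)

# Hypothetical usage: a valve fraction that opens around a 1.0e5 Pa threshold
p = np.array([0.9e5, 1.0e5, 1.1e5])
print(smooth_switch(p, lower=0.0, upper=1.0, x0=1.0e5, k=1.0e-3))
```

Choosing k in the units of x (here per pascal) plays the same role as normalizing the input before applying the activation.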
Dymos uses a vectorization that I think is similar to OpenConcept's, and I've had success with numpy.where there as well, providing derivatives for each possible "branch" taken. It is true that you may have issues with derivative mismatches if you have an analysis point right on the transition, but often I've had success despite that. If the derivative at the transition becomes a hindrance, then implementing a sigmoid or ReLU is more appropriate.
If x is of a magnitude such that it can cause overflows, consider applying units or using scaling to put it within reasonable limits if you cannot bound it directly.
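When the argument cannot be bounded directly, one possibility (an assumption of this sketch, not something the libraries prescribe) is to replace a hard clip with a smooth saturation such as limit * tanh(x / limit). It behaves like x near zero, never exceeds the limit in magnitude, and keeps a nonzero, continuous derivative. The limit of 1.5 and the steepness of 100 are the values quoted in the question.

```python
import numpy as np

def soft_clip(x, limit=1.5):
    """Smoothly saturate x to the open interval (-limit, limit); approximately x near zero."""
    x = np.asarray(x, dtype=float)
    return limit * np.tanh(x / limit)

def soft_clip_deriv(x, limit=1.5):
    """Analytic derivative of soft_clip; equals 1 at x = 0 and decays smoothly to 0."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.tanh(x / limit) ** 2

# Hypothetical normalized inputs, including wildly infeasible solver iterates
e = np.array([-1.0e6, -0.2, 0.0, 0.2, 1.0e6])
k = 100.0
activation = 1.0 / (1.0 + np.exp(-k * soft_clip(e)))   # exp argument stays within +/- 150
print(activation)
```

The saturation slightly distorts values of |x| that approach the limit, so the limit should sit comfortably outside the range where the switch is meant to act.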

Related

Is there a more efficient way of nesting logarithms?

This is a continuation of the two questions posted here,
Declaring a functional recursive sequence in Matlab
Nesting a specific recursion in Pari-GP
To make a long story short, I've constructed a family of functions which solve the tetration functional equation. I've proven these things are holomorphic. And now it's time to make the graphs, or at least, somewhat passable code to evaluate these things. I've managed to get to about 13 significant digits in my precision, but if I try to get more, I encounter a specific error. That error is really nothing more than an overflow error. But it's a peculiar overflow error; Pari-GP doesn't seem to like nesting the logarithm.
My particular mathematical function is approximated by taking something large (think of the order e^e^e^e^e^e^e) to produce something small (of the order e^(-n)). The math inherently requires samples of large values to produce these small values. And strangely, as we get closer to numerically approximating (at about 13 significant digits or so), we also get closer to overflowing because we need such large values to get those 13 significant digits. I am a god awful programmer; and I'm wondering if there could be some work around I'm not seeing.
/*
This function constructs the approximate Abel function
The variable z is the main variable we care about; values of z where real(z)>3 almost surely produces overflow errors
The variable l is the multiplier of the approximate Abel function
The variable n is the depth of iteration required
n can be set to 100, but about 15 already produces enough accuracy
The functional equation this satisfies is exp(beta_function(z,l,n))/(1+exp(-l*z)) = beta_function(z+1,l,n); and this program approaches the solution for n to infinity
*/
beta_function(z,l,n) =
{
my(out = 0);
for(i=0,n-1,
out = exp(out)/(exp(l*(n-i-z)) +1));
out;
}
/*
This function is the error term between the approximate Abel function and the actual Abel function
The variable z is the main variable we care about
The variable l is the multiplier
The variable n is the depth of iteration inherited from beta_function
The variable k is the new depth of iteration for this function
n can be set about 100, still; but 15 or 20 is more optimal.
Setting the variable k above 10 will usually produce overflow errors unless the complex arguments of l and z are large.
Precision of about 10 digits is acquired at k = 5 or 6 for real z, for complex z less precision is acquired. k should be set to large values for complex z and l with large imaginary arguments.
*/
tau_K(z,l,n,k)={
if(k == 1,
-log(1+exp(-l*z)),
log(1 + tau_K(z+1,l,n,k-1)/beta_function(z+1,l,n)) - log(1+exp(-l*z))
)
}
/*
This is the actual Abel function
The variable z is the main variable we care about
The variable l is the multiplier
The variable n is the depth of iteration inherited from beta_function
The variable k is the depth of iteration inherited from tau_K
The functional equation this satisfies is exp(Abl_L(z,l,n,k)) = Abl_L(z+1,l,n,k); and this function approaches that solution for n,k to infinity
*/
Abl_L(z,l,n,k) ={
beta_function(z,l,n) + tau_K(z,l,n,k);
}
This is the code for approximating the functions I've proven are holomorphic; but sadly, my code is just horrible. Attached here is some expected output, where you can see the functional equation being satisfied to about 10 - 13 significant digits.
Abl_L(1,log(2),100,5)
%52 = 0.1520155156321416705967746811
exp(Abl_L(0,log(2),100,5))
%53 = 0.1520155156321485241351294757
Abl_L(1+I,0.3 + 0.3*I,100,14)
%59 = 0.3353395055605129001249035662 + 1.113155080425616717814647305*I
exp(Abl_L(0+I,0.3 + 0.3*I,100,14))
%61 = 0.3353395055605136611147422467 + 1.113155080425614418399986325*I
Abl_L(0.5+5*I, 0.2+3*I,100,60)
%68 = -0.2622549204469267170737985296 + 1.453935357725113433325798650*I
exp(Abl_L(-0.5+5*I, 0.2+3*I,100,60))
%69 = -0.2622549205108654273925182635 + 1.453935357685525635276573253*I
Now, you'll notice I have to change the k value for different values. When the arguments z,l are further away from the real axis, we can make k very large (and we have to to get good accuracy), but it'll still overflow eventually; typically once we've achieved about 13-15 significant digits, is when the functions will start to blow up. You'll note, that setting k =60, means we're taking 60 logarithms. This already sounds like a bad idea, lol. Mathematically though, the value Abl_L(z,l,infinity,infinity) is precisely the function I want. I know that must be odd; nested infinite for-loops sounds like nonsense, lol.
I'm wondering if anyone can think of a way to avoid these overflow errors and obtain a higher degree of accuracy. In a perfect world, this object most definitely converges, and this code is flawless (albeit, it may be a little slow); but we'd probably need to increase the stacksize indefinitely. In theory this is perfectly fine; but in reality, it's more than impractical. Is there any way, as a programmer, one can work around this?
The only other option I have at this point is to try and create a bruteforce algorithm to discover the Taylor series of this function; but I'm having less than no luck at doing this. The process is very unique, and trying to solve this problem using Taylor series kind of takes us back to square one. Unless, someone here can think of a fancy way of recovering Taylor series from this expression.
I'm open to all suggestions, any comments, honestly. I'm at my wits end; and I'm wondering if this is just one of those things where the only solution is to increase the stacksize indefinitely (which will absolutely work). It's not just that I'm dealing with large numbers. It's that I need larger and larger values to compute a small value. For that reason, I wonder if there's some kind of quick work around I'm not seeing. The error Pari-GP spits out is always with tau_K, so I'm wondering if this has been coded suboptimally; and that I should add something to it to reduce stacksize as it iterates. Or, if that's even possible. Again, I'm a horrible programmer. I need someone to explain this to me like I'm in kindergarten.
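On the narrower point of recursion depth in tau_K, the recursion can be unrolled into a loop that starts at the innermost level z + k - 1 and works back to z, so no call stack grows with k. The sketch below is a Python port using double-precision complex numbers, written only to show the loop structure: it assumes the definitions of beta_function and tau_K given above, the name tau_K_iter is made up here, and Python floats overflow far earlier than Pari-GP, so this does nothing about the numerical overflow in exp itself.

```python
import cmath
import math

def beta_function(z, l, n):
    """Port of the Pari-GP beta_function: n nested exponentials with logistic weights."""
    out = 0.0
    for i in range(n):
        out = cmath.exp(out) / (cmath.exp(l * (n - i - z)) + 1.0)
    return out

def tau_K_iter(z, l, n, k):
    """Same value as the recursive tau_K, computed from the innermost level outward."""
    t = -cmath.log(1.0 + cmath.exp(-l * (z + k - 1)))      # the k == 1 base case at z + k - 1
    for j in range(k - 2, -1, -1):                          # levels z + k - 2 down to z
        zj = z + j
        t = cmath.log(1.0 + t / beta_function(zj + 1, l, n)) - cmath.log(1.0 + cmath.exp(-l * zj))
    return t

def Abl_L(z, l, n, k):
    return beta_function(z, l, n) + tau_K_iter(z, l, n, k)

# Functional-equation check for the first sample in the question
lhs = Abl_L(1, math.log(2), 100, 5)
rhs = cmath.exp(Abl_L(0, math.log(2), 100, 5))
print(lhs, rhs)
```

The same unrolling should carry over to a for loop in Pari-GP, since each level only needs the value produced by the level after it.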
Any help, comments, questions for clarification, are more than welcome. I'm like a dog chasing his tail at this point; wondering why he can't take 1000 logarithms, lol.
Regards.
EDIT:
I thought I'd add in that I can produce arbitrary precision but we have to keep the argument of z way off in the left half plane. If the variables n,k = -real(z) then we can produce arbitrary accuracy by making n as large as we want. Here's some output to explain this, where I've used \p 200 and we pretty much have equality at this level (minus some digits).
Abl_L(-1000,1+I,1000,1000)
%16 = -0.29532276871494189936534470547577975723321944770194434340228137221059739121428422475938130544369331383702421911689967920679087535009910425871326862226131457477211238400580694414163545689138863426335946 + 1.5986481048938885384507658431034702033660039263036525275298731995537068062017849201570422126715147679264813047746465919488794895784667843154275008585688490133825421586142532469402244721785671947462053*I
exp(Abl_L(-1001,1+I,1000,1000))
%17 = -0.29532276871494189936534470547577975723321944770194434340228137221059739121428422475938130544369331383702421911689967920679087535009910425871326862226131457477211238400580694414163545689138863426335945 + 1.5986481048938885384507658431034702033660039263036525275298731995537068062017849201570422126715147679264813047746465919488794895784667843154275008585688490133825421586142532469402244721785671947462053*I
Abl_L(-900 + 2*I, log(2) + 3*I,900,900)
%18 = 0.20353875452777667678084511743583613390002687634123569448354843781494362200997943624836883436552749978073278597542986537166527005507457802227019178454911106220050245899257485038491446550396897420145640 - 5.0331931122239257925629364016676903584393129868620886431850253696250415005420068629776255235599535892051199267683839967636562292529054669236477082528566454129529102224074017515566663538666679347982267*I
exp(Abl_L(-901+2*I,log(2) + 3*I,900,900))
%19 = 0.20353875452777667678084511743583613390002687634123569448354843781494362200997943624836883436552749978073278597542986537166527005507457802227019178454911106220050245980468697844651953381258310669530583 - 5.0331931122239257925629364016676903584393129868620886431850253696250415005420068629776255235599535892051199267683839967636562292529054669236477082528566454129529102221938340371793896394856865112060084*I
Abl_L(-967 -200*I,12 + 5*I,600,600)
%20 = -0.27654907399026253909314469851908124578844308887705076177457491260312326399816915518145788812138543930757803667195961206089367474489771076618495231437711085298551748942104123736438439579713006923910623 - 1.6112686617153127854042520499848670075221756090591592745779176831161238110695974282839335636124974589920150876805977093815716044137123254329208112200116893459086654166069454464903158662028146092983832*I
exp(Abl_L(-968 -200*I,12 + 5*I,600,600))
%21 = -0.27654907399026253909314469851908124578844308887705076177457491260312326399816915518145788812138543930757803667195961206089367474489771076618495231437711085298551748942104123731995533634133194224880928 - 1.6112686617153127854042520499848670075221756090591592745779176831161238110695974282839335636124974589920150876805977093815716044137123254329208112200116893459086654166069454464833417170799085356582884*I
The trouble is, we can't just apply exp over and over to go forward and expect to keep the same precision. The trouble is with exp, which displays so much chaotic behaviour as you iterate it in the complex plane, that this is doomed to fail.
Well, I answered my own question. @user207421 posted a comment, and I'm not sure if it meant what I thought it meant, but I think it got me to where I want. I sort of assumed that exp wouldn't inherit the precision of its argument, but apparently it does. So all I needed was to define,
Abl_L(z,l,n,k) ={
if(real(z) <= -max(n,k),
beta_function(z,l,n) + tau_K(z,l,n,k),
exp(Abl_L(z-1,l,n,k)));
}
Everything works perfectly fine from here; of course, for what I need it for. So, I answered my own question, and it was pretty simple. I just needed an if statement.
Thanks anyway, to anyone who read this.

Constructing Taylor Series from a Recursive function in Pari-GP

This is a continuation of my questions:
Declaring a functional recursive sequence in Matlab
Is there a more efficient way of nesting logarithms?
Nesting a specific recursion in Pari-GP
But I'll keep this question self contained. I have made a coding project for myself; which is to program a working simple calculator for a tetration function I've constructed. This tetration function is holomorphic, and stated not to be Kneser's solution (as to all the jargon, ignore); long story short, I need to run the numbers; to win over the nay-sayers.
As to this, I have to use Pari-GP; as this is a fantastic language for handling large numbers and algebraic expressions. As we are dealing with tetration (think numbers of the order e^e^e^e^e^e); this language is, of the few that exist, the best for such affairs. It is the favourite when doing iterated exponential computations.
Now, the trouble I am facing is odd. It is not so much that my code doesn't work; it's that it's overflowing because it should overflow (think, we're getting inputs like e^e^e^e^e^e; and no computer can handle it properly). I'll post the first batch of code, before I dive deeper.
The following code works perfectly; and does everything I want. The trouble is with the next batch of code. This produces all the numbers I want.
\\This is the asymptotic solution to tetration. z is the variable, l is the multiplier, and n is the depth of recursion
\\Warning: z with large real part looks like tetration; and therefore overflows very fast. Additionally there are singularities which occur where l*(z-j) = (2k+1)*Pi*I.
\\j,k are integers
beta_function(z,l,n) =
{
my(out = 0);
for(i=0,n-1,
out = exp(out)/(exp(l*(n-i-z)) +1));
out;
}
\\This is the error between the asymptotic tetration and the tetration. This is pretty much good for 200 digit accuracy if you need.
\\modify the 0.000000001 to a bigger number to make this go faster and receive less precision. When graphing 0.0001 is enough
\\Warning: This will blow up at some points. This is part of the math; these functions have singularities/branch cuts.
tau(z,l,n)={
if(1/real(beta_function(z,l,n)) <= 0.000000001, \\this is where we'll have problems; if I try to grab a taylor series with this condition we error out
-log(1+exp(-l*z)),
log(1 + tau(z+1,l,n)/beta_function(z+1,l,n)) - log(1+exp(-l*z))
)
}
\\This is the sum function. I occasionally modify it; to make better graphs, but the basis is this.
Abl(z,l,n) = {
beta_function(z,l,n) + tau(z,l,n)
}
Plugging this in, you get the following expressions:
Abl(1,log(2),100)
realprecision = 28 significant digits (20 digits displayed)
%109 = 0.15201551563214167060
exp(Abl(0,log(2),100))
%110 = 0.15201551563214167060
Abl(1+I,2+0.5*I,100)
%111 = 0.28416643148885326261 + 0.80115283113944703984*I
exp(Abl(0+I,2+0.5*I,100))
%112 = 0.28416643148885326261 + 0.80115283113944703984*I
And so on and so forth; where Abl(z,l,n) = exp(Abl(z-1,l,n)). There's no problem with this code. Absolutely none at all; we can set this to 200 precision and it'll still produce correct results. The graphs behave exactly as the math says they should behave. The problem is, in my construction of tetration (the one we actually want); we have to sort of paste together the solutions of Abl(z,l,n) across the value l. Now, you don't have to worry about any of that at all; but, mathematically, this is what we're doing.
This is the second batch of code; which is designed to "paste together" all these Abl(z,l,n) into one function.
\\This is the modified asymptotic solution to the Tetration equation.
beta(z,n) = {
beta_function(z,1/sqrt(1+z),n);
}
\\This is the Tetration function.
Tet(z,n) ={
if(1/abs(beta_function(z,1/sqrt(1+z),n)) <= 0.00000001, \\Again, we see here this if statement; and we can't have this.
beta_function(z,1/sqrt(1+z),n),
log(Tet(z+1,n))
)
}
This code works perfectly for real-values; and for complex values. Some sample values,
Tet(1+I,100)
%113 = 0.12572857262453957030 - 0.96147559586703141524*I
exp(Tet(0+I,100))
%114 = 0.12572857262453957030 - 0.96147559586703141524*I
Tet(0.5,100)
%115 = -0.64593666417664607364
exp(Tet(0.5,100))
%116 = 0.52417133958039107545
Tet(1.5,100)
%117 = 0.52417133958039107545
We can also effectively graph this object on the real-line. Which just looks like the following,
ploth(X=0,4,Tet(X,100))
Now, you may be asking; What's the problem then?
If you try and plot this function in the complex plane, it's doomed to fail. The nested logarithms produce too many singularities near the real line. For imaginary arguments away from the real-line, there's no problem. And I've produced some nice graphs; but the closer you get to the real line; the more it misbehaves and just short circuits. You may be thinking; well then, the math is wrong! But, no, the reason this is happening is because Kneser's tetration is the only tetration that is stable about the principal branch of the logarithm. Since this tetration IS NOT Kneser's tetration, it's inherently unstable about the principal branch of the logarithm. Of course, Pari just chooses the principal branch. So when I do log(log(log(log(log(beta(z+5,100)))))); the math already says this will diverge. But on the real line; it's perfectly adequate. And for values of z with an imaginary argument away from zero, we're fine too.
So, how I want to solve this, is to grab the Taylor series at Tet(1+z,100); which Pari-GP is perfect for. The trouble?
Tet(1+z,100)
*** at top-level: Tet(1+z,100)
*** ^------------
*** in function Tet: ...unction(z,1/sqrt(1+z),n))<=0.00000001,beta_fun
*** ^---------------------
*** _<=_: forbidden comparison t_SER , t_REAL.
The numerical comparison I've done doesn't translate to a comparison between t_SER and t_REAL.
So, my question, at long last: what is an effective strategy at getting the Taylor series of Tet(1+z,100) using only real inputs. The complex inputs near z=0 are erroneous; the real values are not. And if my math is right; we can take the derivatives along the real-line and get the right result. Then, we can construct a Tet_taylor(z,n) which is just the Taylor Series expansion. Which; will most definitely have no errors when trying to graph.
Any help, questions, comments, suggestions--anything, is greatly appreciated! I really need some outside eyes on this.
Thanks so much if you got to the bottom of this post. This one is bugging me.
Regards, James
EDIT:
I should add that a Tet(z+c,100) for some number c is the actual tetration function we want. There is a shifting constant I haven't talked about yet. Nonetheless; this is spurious to the question, and is more a mathematical point.
This is definitely not an answer - I have absolutely no clue what you are trying to do. However, I see no harm in offering suggestions. PARI has a built in type for power series (essentially Taylor series) - and is very good at working with them (many operations are supported). I was originally going to offer some suggestions on how to get a Taylor series out of a recursive definition using your functions as an example - but in this case, I'm thinking that you are trying to expand around a singularity which might be doomed to failure. (On your plot it seems as x->0, the result goes to -infinity???)
In particular if I compute:
log(beta(z+1, 100))
log(log(beta(z+2, 100)))
log(log(log(beta(z+3, 100))))
log(log(log(log(beta(z+4, 100)))))
...
The different series are not converging to anything. Even the constant term of the series is getting smaller with each iteration, so I am not entirely sure there is even a Taylor series expansion about x = 0.
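For readers without PARI at hand, the operation behind each log(...) of a power series can be sketched in Python. This is the textbook recurrence obtained from f' = f * g' with g = log(f), not PARI's actual implementation, and the input coefficients in the usage lines are arbitrary placeholders rather than values of beta. It also makes one requirement explicit: every log needs a nonzero constant term in the series it is applied to.

```python
import cmath
import math

def series_log(a):
    """Coefficients of log(f) for f = sum a[n] * x**n, truncated to len(a) terms."""
    if a[0] == 0:
        raise ValueError("log of a power series needs a nonzero constant term")
    b = [cmath.log(a[0])]                 # principal branch for the constant term
    for n in range(1, len(a)):
        # From n*a[n] = sum_{k=1..n} k*b[k]*a[n-k], solved for b[n]
        s = sum(k * b[k] * a[n - k] for k in range(1, n))
        b.append((a[n] - s / n) / a[0])
    return b

# log(1 + x) = x - x^2/2 + x^3/3 - x^4/4 + ...
print(series_log([1.0, 1.0, 0.0, 0.0, 0.0]))

# Nesting two logs on an arbitrary series (placeholder coefficients)
nested = series_log(series_log([math.e ** 2, 1.0, 0.5, 0.25]))
print(nested)
```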
Questions/suggestions:
Should you be expanding about a different point? (say where the curve crosses the x-axis).
Does the Taylor series satisfy some recursive relation? For example: A(z) = log(A(z+1)). [This doesn't work, but perhaps there is another way to write it].
I suspect my answer is unlikely to be satisfactory - but then again your question is more mathematical than a practical programming problem.
So I've successfully answered my question. I haven't programmed in so long; I'm kind of shoddy. But I figured it out after enough coffee. I created 3 new functions, which allow me to grab the Taylor series.
\\This function attempts to find the number of iterations we need.
Tet_GRAB_k(A,n) ={
my(k=0);
while( 1/real(beta(A+k,n)) >= 0.0001, k++);
return(k);
}
\\This function will run and produce the same results as Tet; it's slower, but it lets us estimate Taylor coefficients.
\\You have to guess which k to use for whatever accuracy before overflowing; which is what the last function is good for.
Tet_taylor(z,n,k) = {
my(val = beta(z+k,n));
for(i=1,k,val = log(val));
return(val);
}
\\This function produces an array of all the coefficients about a value A.
TAYLOR_SERIES(A,n) = {
my(ser = vector(40,i,0));
for(i=1,40, ser[i] = polcoeff(Tet_taylor(A+z,n,Tet_GRAB_k(A,n)),i-1,z));
return(ser);
}
After running the numbers, I'm confident this works. The Taylor series is converging; albeit rather slowly and slightly less accurately than desired; but this will have to do.
Thanks to anyone who read this. I'm just answering this question for completeness.

CRAN package submission: "Error: C stack usage is too close to the limit"

Right upfront: this is an issue I encountered when submitting an R package to CRAN. So I
don't have control of the stack size (as the issue occurred on one of CRAN's platforms)
can't provide a reproducible example (as I don't know the exact configurations on CRAN)
Problem
When trying to submit the cSEM.DGP package to CRAN the automatic pretest (for Debian x86_64-pc-linux-gnu; not for Windows!) failed with the NOTE: C stack usage 7975520 is too close to the limit.
I know this is caused by a function with three arguments whose body is about 800 rows long. The function body consists of additions and multiplications of these arguments. It is the function varzeta6() which you find here (from row 647 onwards).
How can I address this?
Things I can't do:
provide a reproducible example (at least I would not know how)
change the stack size
Things I am thinking of:
try to break the function into smaller pieces. But I don't know how to best do that.
somehow precompile (?) the function (to be honest, I am just guessing) so CRAN doesn't complain?
Let me know your ideas!
Details / Background
The reason why varzeta6() (and varzeta4() / varzeta5() and even more so varzeta7()) are so long and R-inefficient is that they are essentially copy-pasted from Mathematica (after simplifying the Mathematica code as well as possible and adapting it to be valid R code). Hence, the code is by no means R-optimized (which @MauritsEvers rightly pointed out).
Why do we need Mathematica? Because what we need is the general form for the model-implied construct correlation matrix of a recursive structural equation model with up to 8 constructs as a function of the parameters of the model equations. In addition there are constraints.
To get a feel for the problem, let's take a system of two equations that can be solved recursively:
Y2 = beta1*Y1 + zeta1
Y3 = beta2*Y1 + beta3*Y2 + zeta2
What we are interested in is the covariances: E(Y1*Y2), E(Y1*Y3), and E(Y2*Y3) as a function of beta1, beta2, beta3 under the constraint that
E(Y1) = E(Y2) = E(Y3) = 0,
E(Y1^2) = E(Y2^2) = E(Y3^2) = 1
E(Yi*zeta_j) = 0 (with i = 1, 2, 3 and j = 1, 2)
For such a simple model, this is rather trivial:
E(Y1*Y2) = E(Y1*(beta1*Y1 + zeta1)) = beta1*E(Y1^2) + E(Y1*zeta1) = beta1
E(Y1*Y3) = E(Y1*(beta2*Y1 + beta3*(beta1*Y1 + zeta1) + zeta2)) = beta2 + beta3*beta1
E(Y2*Y3) = ...
But you see how quickly this gets messy when you add Y4, Y5, until Y8.
In general the model-implied construct correlation matrix can be written as follows (the expression actually looks more complicated because we also allow for up to 5 exogenous constructs. This is why varzeta1() already looks complicated. But ignore this for now.):
V(Y) = (I - B)^-1 V(zeta)(I - B)'^-1
where I is the identity matrix and B a lower triangular matrix of model parameters (the betas). V(zeta) is a diagonal matrix. The functions varzeta1(), varzeta2(), ..., varzeta7() compute the main diagonal elements. Since we constrain Var(Yi) to always be 1, the variances of the zetas follow. Take for example the equation Var(Y2) = beta1^2*Var(Y1) + Var(zeta1) --> Var(zeta1) = 1 - beta1^2. This looks simple here, but it becomes extremely complicated when we take the variance of, say, the 6th equation in such a chain of recursive equations, because Var(zeta6) depends on all previous covariances between Y1, ..., Y5, which are themselves dependent on their respective previous covariances.
OK, I don't know if that makes things any clearer. Here are the main points:
The code for varzeta1(), ..., varzeta7() is copy-pasted from Mathematica and hence not R-optimized.
Mathematica is required because, as far as I know, R cannot handle symbolic calculations.
I could R-optimize "by hand" (which is extremely tedious)
I think the structure of the varzetaX() must be taken as given. The question therefore is: can I somehow use this function anyway?
One conceivable approach is to try to convince the CRAN maintainers that there's no easy way for you to fix the problem. This is a NOTE, not a WARNING; the CRAN repository policy says
In principle, packages must pass R CMD check without warnings or significant notes to be admitted to the main CRAN package area. If there are warnings or notes you cannot eliminate (for example because you believe them to be spurious) send an explanatory note as part of your covering email, or as a comment on the submission form
So, you could take a chance that your well-reasoned explanation (in the comments field on the submission form) will convince the CRAN maintainers. In the long run it would be best to find a way to simplify the computations, but it might not be necessary to do it before submission to CRAN.
This is a bit too long as a comment, but hopefully this will give you some ideas for optimising the code for the varzeta* functions; or at the very least, it might give you some food for thought.
There are a few things that confuse me:
All varzeta* functions have arguments beta, gamma and phi, which seem to be matrices. However, in varzeta1 you don't use beta, yet beta is the first function argument.
I struggle to link the details you give at the bottom of your post with the code for the varzeta* functions. You don't explain where the gamma and phi matrices come from, nor what they denote. Furthermore, seeing that beta are the model's parameter estimates, I don't understand why beta should be a matrix.
As I mentioned in my earlier comment, I would be very surprised if these expressions cannot be simplified. R can do a lot of matrix operations quite comfortably, there shouldn't really be a need to pre-calculate individual terms.
For example, you can use crossprod and tcrossprod to calculate cross products, and %*% implements matrix multiplication.
Secondly, a lot of mathematical operations in R are vectorised. I already mentioned that you can simplify
1 - gamma[1,1]^2 - gamma[1,2]^2 - gamma[1,3]^2 - gamma[1,4]^2 - gamma[1,5]^2
as
1 - sum(gamma[1, ]^2)
since the ^ operator is vectorised.
Perhaps more fundamentally, this seems somewhat of an XY problem to me where it might help to take a step back. Not knowing the full details of what you're trying to model (as I said, I can't link the details you give to the cSEM.DGP code), I would start by exploring how to solve the recursive SEM in R. I don't really see the need for Mathematica here. As I said earlier, matrix operations are very standard in R; analytically solving a set of recursive equations is also possible in R. Since you seem to come from the Mathematica realm, it might be good to discuss this with a local R coding expert.
If you must use those scary varzeta* functions (and I really doubt that), an option may be to rewrite them in C++ and then compile them with Rcpp to turn them into R functions. Perhaps that will avoid the C stack usage limit?

Computer Adaptive Testing 1PL Ability Calculation Math: How to implement?

Preamble:
I have been implementing my own CAT system. The resources that have helped me most are these:
An On-line, Interactive, Computer Adaptive Testing Tutorial, 11/98 -- A good explanation of how to pick a test question based on which one would return the most information. Fascinating idea, really. The equations are not illustrated with examples, however... but there is a simulation to play with. Unfortunately the simulation is down!
Computer-Adaptive Testing: A Methodology Whose Time Has Come -- This has similar equations, although it does not use IRT or the Newton-Raphson Method. It is also Rasch, not 3PL. It does, however, have a BASIC program that is far more explicit than the usual equations that are cited. I have converted portions of the program in order to get my own system to experiment with, but I would prefer to use 1PL and/or 3PL.
Rasch Dichotomous Model vs. One-parameter Logistic Model -- This clears some stuff up, but perhaps only makes me more dangerous at this stage.
Now, the question.
I want to be able to measure someone's ability level based on a series of questions that are rated at a 1PL difficulty level and of course the person's answers and whether or not they are correct.
I first need a function that calculates the probability of a correct response to a given item. This equation gives the probability function for 1PL.
Probability correct = e^(ability - difficulty) / (1+ e^(ability - difficulty))
I'll go with this one arbitrarily for now. Using an ability estimate of 0, we get the following probabilities:
-0.3 --> 0.574442516811659
-0.2 --> 0.549833997312478
-0.1 --> 0.52497918747894
0 --> 0.5
0.1 --> 0.47502081252106
0.2 --> 0.450166002687522
0.3 --> 0.425557483188341
This makes sense. A problem targeting their level is 50/50... and the questions are harder or easier depending on which direction you go. The harder questions have a smaller chance of coming out correct.
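As a minimal sketch, the 1PL formula above can be written in Python as follows. The function name p_correct is hypothetical, and the form 1/(1 + e^-(ability - difficulty)) is algebraically the same as the expression in the question; it is used here only because it is a little more compact.

```python
import math

def p_correct(ability, difficulty):
    """1PL probability of a correct response (hypothetical helper name)."""
    # Same as e^(a - d) / (1 + e^(a - d)), rearranged.
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Reproduce the table for an ability estimate of 0.
for d in (-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3):
    print(f"{d:+.1f} --> {p_correct(0.0, d):.6f}")
```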
Now... consider a test taker that has done five questions at this difficulty: -.1, 0, .1, .2, .1. Assume they got them all correct except the one that's at difficulty .2. Assuming an ability level of 0... I would want some equations to indicate that this person is slightly above average.
So... how to calculate that with 1PL? This is where it gets hard.
Looking at the equations on the various pages... I will start with an assumed ability level... and then gradually adjust it with each question after more or less like the following.
Starting Ability: B0 = 0
Ability after problem 1: B1 = B0 + [summations and function evaluated for item 1 at ability B0]
Ability after problem 2: B2 = B1 + [summations and functions evaluated for items 1-2 at ability B1]
Ability after problem 3: B3 = B2 + [summations and functions evaluated for items 1-3 at ability B2]
Ability after problem 4: B4 = B3 + [summations and functions evaluated for items 1-4 at ability B3]
Ability after problem 5: B5 = B4 + [summations and functions evaluated for items 1-5 at ability B4]
And so on.
Just reading papers on this, this is the gist of what the algorithm should be doing. But there are so many different ways to do this. The behaviour of my code is clearly wrong as I get division by zero errors... so this is where I get lost. I've messed with information functions and done derivatives, but my college level math is not cutting it.
Can someone explain to me how to do this part? The literature I've read is short on examples and the descriptions of the math appear incomplete to me. I suppose I'm asking for how to do this with a 3PL model that assumes that c is always zero and a is always 1.7 (or maybe -1.7-- whatever works.) I was trying to get to 1PL somehow anyway.
Edit: A visual guide to item response theory is the best explanation of how to do this I've seen so far, but the text gets confusing at the most critical point. I'm closer to getting this, but I'm still not understanding something. Also... the pattern of summations and functions isn't in this text like I expected.
How to do this:
This is an inefficient solution, but it works and is reasonably intuitive.
The last link I mentioned in the edit explains this.
Start with a probability function, a set of question difficulties, and a corresponding set of evaluations, i.e., whether or not each answer was correct.
With that, I can get a series of functions, one per question, that each give the chance of the test taker giving that exact response. Now... multiply all of those functions together.
We now have a big mess! But it's a single function in terms of the unknown ability variable that we want to find.
Next... run a slew of numbers through this function. Whatever returns the maximum value is the test taker's ability level. This can be used to either determine the standard error or to pick the next question for computer adaptive testing.
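The brute-force search described above might look roughly like this in Python. This is only a sketch under stated assumptions: the function names, the search range of -4 to 4 and the grid resolution are all made up for illustration, and the example data is the five-question case from the question.

```python
import numpy as np

def p_correct(ability, difficulty):
    """1PL probability of a correct response (hypothetical helper name)."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

def likelihood(ability, difficulties, correct):
    """Product of the per-question chances of the observed responses."""
    p = p_correct(ability, np.asarray(difficulties, dtype=float))
    correct = np.asarray(correct, dtype=bool)
    # Use p for a correct answer and (1 - p) for an incorrect one.
    return float(np.prod(np.where(correct, p, 1.0 - p)))

def estimate_ability(difficulties, correct, lo=-4.0, hi=4.0, steps=8001):
    """Try every ability on a grid and keep the one with the largest product."""
    grid = np.linspace(lo, hi, steps)  # assumed search range and resolution
    values = [likelihood(theta, difficulties, correct) for theta in grid]
    return float(grid[int(np.argmax(values))])

# Five questions from the question text; only the 0.2 item was missed.
difficulties = [-0.1, 0.0, 0.1, 0.2, 0.1]
correct = [True, True, True, False, True]
theta_hat = estimate_ability(difficulties, correct)
print(theta_hat)
```

Note that if every answer is correct (or every answer is wrong) the product has no interior maximum, so a search like this simply returns the edge of the grid.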

Function for returning a list of points on a Bezier curve at equal arclength

Someone somewhere has had to solve this problem. I can find many a great website explaining this problem and how to solve it. While I'm sure they are well written and make sense to math whizzes, that isn't me. And while I might understand in a vague sort of way, I do not understand how to turn that math into a function that I can use.
So I beg of you, if you have a function that can do this, in any language, (sure even fortran or heck 6502 assembler) - please help me out.
I'd prefer an analytical solution to an iterative one.
EDIT: Meant to specify that its a cubic bezier I'm trying to work with.
What you're asking for is the inverse of the arc length function. So, given a curve B, you want a function Linv(len) that returns a t between 0 and 1 such that the arc length of the curve between 0 and t is len.
If you had this function your problem is really easy to solve. Let B(0) be the first point. To find the next point, you'd simply compute B(Linv(w)) where w is the "equal arclength" that you refer to. To get the next point, just evaluate B(Linv(2*w)) and so on, until Linv(n*w) becomes greater than 1.
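To make the B(Linv(k*w)) idea concrete, here is a rough Python sketch in which Linv is approximated by a lookup table rather than derived analytically. This is an assumption of the sketch, not the method from the answer: the curve is sampled densely, chord lengths are accumulated, and the table is inverted by linear interpolation. The function names and the default sample count are hypothetical.

```python
import numpy as np

def bezier(p, t):
    """Evaluate a cubic Bezier with four 2D control points p at parameter(s) t."""
    p = np.asarray(p, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    return ((1 - t) ** 3 * p[0] + 3 * (1 - t) ** 2 * t * p[1]
            + 3 * (1 - t) * t ** 2 * p[2] + t ** 3 * p[3])

def equal_arclength_points(p, w, samples=2000):
    """Return points B(Linv(0)), B(Linv(w)), B(Linv(2w)), ... along the curve."""
    ts = np.linspace(0.0, 1.0, samples + 1)
    pts = bezier(p, ts)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate(([0.0], np.cumsum(seg)))  # approximate arc length at each ts
    targets = np.arange(0.0, cum[-1] + 1e-12, w)   # 0, w, 2w, ... up to the total length
    t_at = np.interp(targets, cum, ts)             # tabulated approximation of Linv
    return bezier(p, t_at)

# Example: points every 0.2 units of arc length along an arch-shaped curve.
curve = [(0, 0), (0, 1), (1, 1), (1, 0)]
points = equal_arclength_points(curve, 0.2)
```

The accuracy depends on the sample count, so this trades exactness for simplicity; it stops once the next multiple of w would exceed the total length.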
I've had to deal with this problem recently. I've come up with, or come across a few solutions, none of which are satisfactory to me (but maybe they will be for you).
Now, this is a bit complicated, so let me just give you the link to the source code first:
http://icedtea.classpath.org/~dlila/webrevs/perfWebrev/webrev/raw_files/new/src/share/classes/sun/java2d/pisces/Dasher.java. What you want is in the LengthIterator class. You shouldn't have to look at any other parts of the file. There are a bunch of methods that are defined in another file. To get to them just cut out everything from /raw_files/ to the end of the URL. This is how you use it. Initialize the object on a curve. Then to get the parameter of a point with arc length L from the beginning of the curve just call next(L) (to get the actual point just evaluate your curve at this parameter, using deCasteljau's algorithm, or zneak's suggestion). Every subsequent call of next(x) moves you a distance of x along the curve compared to your last position. next returns a negative number when you run out of curve.
Explanation of code: I needed a t value such that B(0) to B(t) would have length LEN (where LEN is known). I simply flattened the curve: subdivide the curve recursively until each sub-curve is close enough to a line (you can test for this by comparing the length of the control polygon to the length of the line joining the end points). You can compute the length of each sub-curve as (controlPolyLength + endPointsSegmentLen)/2. Add all these lengths to an accumulator, and stop the recursion when the accumulator value is >= LEN.
Now, call the last sub-curve C and let [t0, t1] be its domain. You know that the t you want satisfies t0 <= t < t1, and you know the length from B(0) to B(t0) - call this value L0t0. So, now you need to find a t such that C(0) to C(t) has length LEN-L0t0. This is exactly the problem we started with, but on a smaller scale. We could use recursion, but that would be horribly slow, so instead we just use the fact that C is a very flat curve.
We pretend C is a line, and compute the point at t using P=C(0)+((LEN-L0t0)/length(C))*(C(1)-C(0)). This point doesn't actually lie on the curve because it is on the line C(0)->C(1), but it's very close to the point we want. So, we just solve Bx(t)=Px and By(t)=Py. This is just finding cubic roots, which has a closed-form solution, but I just used Newton's method. Now we have the t we want, and we can just compute C(t), which is the actual point.
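The flattening step can be sketched in Python like this. It is not a port of the linked Dasher.java; it only illustrates the recursive subdivision and the (controlPolyLength + endPointsSegmentLen)/2 estimate, here used to measure the whole curve. The function names, the tolerance and the depth cap are assumptions of the sketch.

```python
import math

def _dist(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

def _mid(a, b):
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)

def split(p):
    """Split a cubic Bezier at t = 0.5 with de Casteljau's algorithm."""
    p0, p1, p2, p3 = p
    a, b, c = _mid(p0, p1), _mid(p1, p2), _mid(p2, p3)
    d, e = _mid(a, b), _mid(b, c)
    f = _mid(d, e)  # the point on the curve at t = 0.5
    return (p0, a, d, f), (f, e, c, p3)

def arc_length(p, tol=1e-6, depth=0):
    """Approximate arc length by subdividing until each piece is nearly flat."""
    poly = _dist(p[0], p[1]) + _dist(p[1], p[2]) + _dist(p[2], p[3])
    chord = _dist(p[0], p[3])
    # Flat enough when the control polygon is barely longer than the chord;
    # the depth cap (an assumption here) guards against runaway recursion.
    if poly - chord <= tol or depth >= 30:
        return (poly + chord) / 2.0
    left, right = split(p)
    return arc_length(left, tol, depth + 1) + arc_length(right, tol, depth + 1)

print(arc_length(((0, 0), (0, 1), (1, 1), (1, 0))))
```

To find the t for a given LEN, the same recursion would keep a running total and stop in the sub-curve where the total first reaches LEN, as the explanation above describes.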
I should mention that a few months ago I skimmed through a paper that had another solution to this that found an approximation to the natural parameterization of the curve. The author has posted a link to it here: Equidistant points across Bezier curves

Resources