Applying the sigma (summation) function in R

I am trying to replicate a graph from an example on the Danish data set used in the text Non-Life Insurance Mathematics.
I want to create the following new variable from my data set so I can plot the graph. My biggest challenge is how to sum (sigma) W over j, given that the sum runs from the max of two values to the min of two values. I don't have the faintest idea how to do it in R. I guess I still have a lot to learn about how to do operations in R.
I would appreciate it if someone could give me a useful tip on how to go about it.
Below is the equation in question. I couldn't reproduce the sigma sign, so I used the literal interpretation (sum):
1/λ(i) = 1/(2m + 1) * sum of W[j] for j = max(1, i−m) to min(n, i+m), with m = 50.

Try this
m = 50
total = 0
for (j in seq(max(1, i-m), min(n, i+m))) {
  total = total + W[j]
}
total = total / (2 * m + 1)
lambda = 1 / total
or this
m = 50
lambda = 1 / (sum(W[max(1,i-m) : min(n,i+m)]) / (2 * m + 1))
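
For comparison, here is a minimal sketch of the same windowed sum in Python, computing λ(i) for a single i or for every i at once. The function names `lambda_estimate` and `lambda_all` are made up for illustration, W is assumed to be a plain list of numbers, and i is kept 1-based to match the formula. Like the formula, it always divides by 2m + 1, even near the ends where the window is shorter.

```python
def lambda_estimate(W, i, m=50):
    """Return lambda(i), where 1/lambda(i) = (1/(2m+1)) * sum of W[j]
    for j = max(1, i-m) .. min(n, i+m). The index i is 1-based."""
    n = len(W)
    lo = max(1, i - m)
    hi = min(n, i + m)
    # Python lists are 0-based and slices exclude the end,
    # so W[lo-1:hi] covers the 1-based range lo..hi inclusive.
    window_sum = sum(W[lo - 1:hi])
    return (2 * m + 1) / window_sum


def lambda_all(W, m=50):
    """Apply lambda_estimate to every position i = 1..n."""
    return [lambda_estimate(W, i, m) for i in range(1, len(W) + 1)]
```

Calling `lambda_all(W)` gives one value per observation, which is what you would plot against i.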

Math Function: Compounding: Periods Required to Get 100% APY

I'm trying to find or derive a function for Google Sheets, that will return the number of periods (eg days) required to reach a specified APY (eg 100%), given the interest rate per period.
I started with a basic APY function:
r = rate per period
n = number of periods
APY = (1 + r) ^ n - 1
Example:
r = 5% (per period of a day)
n = 14.21 (number of periods, ie days)
APY = (1 + 5%) ^ 14.21 - 1
= 100.03%
I'm stuck trying to reverse the function, so I can determine n (the number of periods), if the APY is given as 100%.
Any suggestions would be most appreciated.
You need the logarithm to base (1+r) to invert the power function:
n = log(APY + 1) / log(1 + r)
Examples:
n = log(1.0003 + 1) / log(1 + 0.05) ≈ 14.210
n = log(1 + 1) / log(1 + 0.05) ≈ 14.207
Should not matter which log you use, the log-10 or the natural log (ln), as long as you use the same function both times.
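
A quick way to check the inversion is to compute n and plug it back into the original APY formula. The sketch below does that in Python; the function name `periods_to_reach` is made up for illustration. In Google Sheets the same expression should be writable with the built-in LN function, for example =LN(1+APY)/LN(1+r) with the two values taken from cells.

```python
import math


def periods_to_reach(apy, rate_per_period):
    """Number of periods n such that (1 + r)^n - 1 == apy.
    Both arguments are fractions, e.g. 1.0 for 100% and 0.05 for 5%."""
    return math.log(1 + apy) / math.log(1 + rate_per_period)


n = periods_to_reach(1.0, 0.05)
print(round(n, 3))                  # 14.207
print(round(1.05 ** n - 1, 6))      # 1.0, i.e. 100% APY
```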

How to solve equation with rotation and translation matrices?

I am working on a computer vision task and have this equation:
R0*c + t0 = R1*c + t1 = Ri*c + ti = ... = Rn*c + tn ,
n is about 20 (but can be more if needed)
where each pair R,t (rotation matrix and translation vector in 3D) is the result of the i-th measurement and is known, and the vector c is what I want to know.
I've got a result with Ceres Solver. It's good that it can handle outliers, but I think it's overkill for this task.
So what methods should I use for these two situations:
1. With outliers
2. Without outliers
To handle outliers you can use RANSAC:
* In each iteration randomly pick i,j (a "sample") and solve c:
Ri*c + ti = Rj*c + tj
- Set Y = Ri*c + ti
* Apply to a larger population:
- Select S={k} for which ||Rk*c + tk - Y||<e
e ~ 3*RMS of errors without outliers
- Find optimal c for all k equations (with least mean square)
- Give it a "grade": size of S
* After a few iterations, use the optimal c found for the maximum "grade".
* Number of iterations: log(1-p)/log(1-w^2)
[https://en.wikipedia.org/wiki/Random_sample_consensus]
p = 0.999 (for example; it is the required probability that at least one sample contains no outliers)
w is the assumed fraction of non-outliers (nonoutliers/n).
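
The least-squares step (and the whole "without outliers" case) can be sketched as follows. This is only an illustration under stated assumptions: it subtracts measurement 0 from every other measurement to get the linear system (Ri - R0)*c = t0 - ti and solves it through the normal equations. The helper names (`rot_z`, `rot_x`, `solve3`, `estimate_c`) and the synthetic data are made up. Note that Ri - Rj is always singular for two rotation matrices, so a single pair cannot pin down c on its own; the rotations need at least two different relative axes.

```python
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def mat_vec(M, v):
    return [sum(M[r][k] * v[k] for k in range(3)) for r in range(3)]


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def estimate_c(rotations, translations):
    """Least-squares c for the stacked system (Ri - R0) c = t0 - ti, i >= 1."""
    R0, t0 = rotations[0], translations[0]
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    for Ri, ti in zip(rotations[1:], translations[1:]):
        for row in range(3):
            a = [Ri[row][k] - R0[row][k] for k in range(3)]
            rhs = t0[row] - ti[row]
            for p in range(3):
                Atb[p] += a[p] * rhs
                for q in range(3):
                    AtA[p][q] += a[p] * a[q]
    return solve3(AtA, Atb)


# Synthetic, noise-free data: every measurement maps true_c to the same point Y.
true_c = [1.0, -2.0, 0.5]
Y = [3.0, 4.0, 5.0]
rotations = [rot_z(0.3), rot_x(0.7), rot_z(1.1), rot_x(-0.4)]
translations = [[Y[k] - mat_vec(R, true_c)[k] for k in range(3)] for R in rotations]
c_hat = estimate_c(rotations, translations)
```

Inside RANSAC, the same `estimate_c` could be run on the selected set S only. With real, noisy data a library least-squares routine would probably be a better choice than hand-rolled normal equations.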

Need to calculate the percentage of distribution

I have a set of numbers for a given set of attributes:
red = 4
blue = 0
orange = 2
purple = 1
I need to calculate the distribution percentage. Meaning, how diverse is the selection? Is it 20% diverse? Is it 100% diverse (meaning an even distribution of say 4,4,4,4)?
I'm trying to create a sexy percentage that approaches 100% the more the individual values average to the same value, and a lower value the more they get lopsided.
Has anyone done this?
Here is the PHP conversion of the below example. For some reason it's not producing 1.0 with a 4,4,4,4 example.
$arrayChoices = array(4,4,4,4);
$sum = 0;
foreach($arrayChoices as $p)
$sum += $p;
print "sum: ".$sum."<br>";
$pArray = array();
foreach($arrayChoices as $rec)
{
print "p vector value: ".$rec." ".$rec / $sum."\n<br>";
array_push($pArray,$rec / $sum);
}
$total = 0;
foreach($pArray as $p)
if($p > 0)
$total = $total - $p*log($p,2);
print "total = $total <br>";
print round($total / log(count($pArray),2) *100);
Thanks in advance!
A simple, if rather naive, scheme is to sum the absolute differences between your observations and a perfectly uniform distribution
red = abs(4 - 7/4) = 9/4
blue = abs(0 - 7/4) = 7/4
orange = abs(2 - 7/4) = 1/4
purple = abs(1 - 7/4) = 3/4
for a total of 5.
A perfectly even spread will have a score of zero which you must map to 100%.
Assuming you have n items in c categories, a perfectly uneven spread will have a score of
(c-1)*n/c + 1*(n-n/c) = 2*(n-n/c)
which you should map to 0%. For a score d, you might use the linear transformation
100% * (1 - d / (2*(n-n/c)))
For your example this would result in
100% * (1 - 5 / (2*(7-7/4))) = 100% * (1 - 10/21) ~ 52%
Better yet (although more complicated) is the Kolmogorov–Smirnov statistic with which you can make mathematically rigorous statements about the probability that a set of observations have some given underlying probability distribution.
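
The absolute-difference score and the linear mapping to a percentage can be sketched in a few lines of Python. The function name `evenness_percent` is made up for illustration, and the sketch assumes at least two categories and a non-zero total (otherwise the denominator is zero).

```python
def evenness_percent(counts):
    """Map the summed absolute deviation from a uniform spread to 0..100%.
    Assumes len(counts) >= 2 and sum(counts) > 0."""
    n = sum(counts)          # total number of items
    c = len(counts)          # number of categories
    uniform = n / c          # count per category if perfectly even
    d = sum(abs(x - uniform) for x in counts)
    worst = 2 * (n - n / c)  # score when everything is in one category
    return 100.0 * (1 - d / worst)


print(round(evenness_percent([4, 0, 2, 1]), 1))   # 52.4
print(evenness_percent([4, 4, 4, 4]))             # 100.0
print(evenness_percent([16, 0, 0, 0]))            # 0.0
```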
One possibility would be to base your measure on entropy. The uniform distribution has maximum entropy, so you could create a measure as follows:
1) Convert your vector of counts to P, a vector of proportions
(probabilities).
2) Calculate the entropy function H(P) for your vector of
probabilities P.
3) Calculate the entropy function H(U) for a vector of equal
probabilities which has the same length as P. (This turns out
to be H(U) = -log(1.0 / length(P)), so you don't actually
need to create U as a vector.)
4) Your diversity measure would be 100 * H(P) / H(U).
Any set of equal counts yields a diversity of 100. When I applied this to your (4, 0, 2, 1) case, the diversity was 68.94. Any vector with all but one element having counts of 0 has diversity 0.
ADDENDUM
Now with source code! I implemented this in Ruby.
def relative_entropy(v)
# Sum all the values in the vector v, convert to decimal
# so we won't have integer division below...
sum = v.inject(:+).to_f
# Divide each value in v by sum, store in new array p
pvals = v.map{|value| value / sum}
# Build a running total by calculating the entropy contribution for
# each p. Entropy is zero if p is zero, in which case total is unchanged.
# Finally, scale by the entropy equivalent of all proportions being equal.
pvals.inject(0){|total,p| p > 0 ? (total - p*Math.log2(p)) : total} / Math.log2(pvals.length)
end
# Scale these by 100 to turn into a percentage-like measure
relative_entropy([4,4,4,4]) # => 1.0
relative_entropy([4,0,2,1]) # => 0.6893917467430877
relative_entropy([16,0,0,0]) # => 0.0

How to calculate log(sum of terms) from its component log-terms

(1) The simple version of the problem:
How to calculate log(P1+P2+...+Pn), given log(P1), log(P2), ..., log(Pn), without taking the exp of any terms to get the original Pi. I don't want to get the original Pi because they are super small and may cause numeric computer underflow.
(2) The long version of the problem:
I am using Bayes' Theorem to calculate a conditional probability P(Y|E).
P(Y|E) = P(E|Y)*P(Y) / P(E)
I have a thousand probabilities multiplying together.
P(E|Y) = P(E1|Y) * P(E2|Y) * ... * P(E1000|Y)
To avoid computer numeric underflow, I used log(p) and calculate the summation of 1000 log(p) instead of calculating the product of 1000 p.
log(P(E|Y)) = log(P(E1|Y)) + log(P(E2|Y)) + ... + log(P(E1000|Y))
However, I also need to calculate P(E), which is
P(E) = sum of P(E|Y)*P(Y)
log(P(E)) does not equal the sum of log(P(E|Y)*P(Y)). How should I get log(P(E)) without computing the terms P(E|Y)*P(Y) themselves (they are extremely small numbers) and adding them?
You can use
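log(P1+P2+...+Pn) = log(P1[1 + P2/P1 + ... + Pn/P1])
= log(P1) + log(1 + P2/P1 + ... + Pn/P1)
which works with any Pi factored out, not just P1. So factoring out maxP = max_i Pi results in
log(P1+P2+...+Pn) = log(maxP) + log(P1/maxP + P2/maxP + ... + Pn/maxP)
where all the ratios are at most 1. Each ratio can be computed from the logs alone as Pi/maxP = exp(log(Pi) - log(maxP)), so the tiny Pi themselves are never formed.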
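
This trick is often called log-sum-exp. A minimal Python sketch, with the made-up function name `log_sum_exp`, takes the list of log(Pi) values and only ever exponentiates differences from the largest one, so the largest term becomes exp(0) = 1 and nothing important underflows:

```python
import math


def log_sum_exp(log_terms):
    """Return log(P1 + ... + Pn) given [log(P1), ..., log(Pn)]."""
    m = max(log_terms)
    if m == float("-inf"):
        return m  # every Pi is zero
    # exp(x - m) is at most 1, and equals 1 for the largest term
    return m + math.log(sum(math.exp(x - m) for x in log_terms))


# exp(-1000) underflows to 0.0 on its own, yet the log of the sum is fine
print(log_sum_exp([-1000.0, -1001.0]))
```

For the Bayes problem, each term would be log(P(E|Y)) + log(P(Y)) for one value of Y, and the result is log(P(E)).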

Geometric sequence puzzler

The following problem has been puzzling me for a couple of days (nb: this is not homework).
There exist two geometric sequences that sum to 9. The value of their second term (t2) is 2.
(1) Find the common ratio (r)
(2) Find the first element (t1) of each
The answers to (1) are 2/3 and 1/3 and the answers to (2) are 3 and 6 respectively. Unfortunately, I can't understand how these were derived.
In tackling (1) I've tried to apply algebraic substitution to solve for r as follows:
t2 = t1*r; since t2 = 2 we have:
t1 = 2/r
The equation for calculating the sum (S) of a sequence that converges to a limit is given by:
S = t1 / (1 - r)
So, I tried to plug my value of t1 into S and solve for r as follows:
9 = (2/r) / (1-r)
9(1-r) = 2/r
2/9 = r(1-r)
Unfortunately, from this point I get stuck. I need to eliminate one of the r's but I can't seem to be able to.
Next, I thought to solve for r using the formula that sums the first 2 terms (S2) of the sequence:
S2 = (t1 (1-r^2)) / (1-r)
t1 + 2 = (t1 (1-r^2)) / (1-r)
but expanding this out I again run into the same problem (can't eliminate one of the r's).
So I have 2 questions:
What am I doing wrong when deriving r?
Once I have one of its values, how do I derive the other?
2/9 = r(1-r)
Unfortunately, from this point I get
stuck. I need to eliminate one of the
r's but I can't seem to be able to.
You need to learn to factorise!
2/9 = r(1-r)
2/9 = r - r^2
2 = 9r - 9r^2
9r^2 - 9r + 2 = 0
(3r)^2 - 3(3r) + 2 = 0
to make it easier, let R = 3r
R^2 - 3R + 2 = 0
(R - 1)(R - 2) = 0
so 3r - 1 = 0, or 3r - 2 = 0
i.e. r = 1/3 or r = 2/3.
And your first term is 2/(1/3) = 6, or 2/(2/3) = 3
QED!
2/9 = r (1 - r)
Rewrite this as ax^2 + bx + c = 0 and use the quadratic formula to solve it:
2/9 = r - r^2
r^2 - r + 2/9 = 0
Using the quadratic formula, the roots are:
[1 ± √(1 - 8/9)] / 2
= (1 ± 1/3) / 2
= 1/2 ± 1/6
= 1/3 or 2/3
Edit: Aw shoot, I spent way too long figuring out how to write plus/minus and square root. :-P
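
Both answers can be checked numerically. The sketch below, with the made-up function name `geometric_solutions`, substitutes t1 = t2/r into S = t1/(1 - r) to get S*r^2 - S*r + t2 = 0, solves it with the quadratic formula, and recovers t1 for each root. It assumes the discriminant is non-negative, which holds for S = 9 and t2 = 2.

```python
import math


def geometric_solutions(total=9.0, second_term=2.0):
    """Return [(r, t1), ...] for sequences with t1*r = second_term
    and t1 / (1 - r) = total."""
    # total*r^2 - total*r + second_term = 0
    a, b, c = total, -total, second_term
    disc = b * b - 4 * a * c
    roots = [(-b - math.sqrt(disc)) / (2 * a),
             (-b + math.sqrt(disc)) / (2 * a)]
    return [(r, second_term / r) for r in roots]


for r, t1 in geometric_solutions():
    # each line shows r, t1 and the infinite sum t1 / (1 - r)
    print(round(r, 4), round(t1, 4), round(t1 / (1 - r), 4))
```

The two rows come out as r = 1/3 with t1 = 6 and r = 2/3 with t1 = 3, each summing to 9.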
