Convert a series of values into a relative 1-10 scale

Let's say I have a few sets of values like this:
Height (in inches):
John 72.3
Peter 64.5
Frank 78.5
Susan 65.8
Judy 69.0
Mary 72.7
Weight (in pounds):
John 212
Peter 232
Frank 140
Susan 355
Judy 105
Mary 76
Age (in seconds since birth):
John 662256000
Peter 1292976000
Frank 977616000
Susan 1229904000
Judy 599184000
Mary 283824000
What's the best way to convert these values into a 1-10 scale relative to the other values?
I want to be able to say John is a 6/10 on height, 10/10 on weight, and a 3/10 on age (made up values) for example.
One issue I'd like to be able to avoid is having extreme values on either side distort the system too much. A very heavy or tall person shouldn't distort the entire scale.

In R, you can use quantile to find the deciles of the data and then findInterval to find the interval in which each observation lies.
x <- rnorm(100)
findInterval(x, quantile(x, seq(0, 1, length = 11)), rightmost.closed = TRUE)

In R,
heightRank <- rank(height)
will give you the rank of each item. If there are 10 items, the ranks go from 1 to 10. You can then scale that to a 1-10 range:
heightRank <- heightRank / max(heightRank) * 10
Strictly speaking, that runs from 10/n up to 10 rather than from 0. Although, now that I look at your question, you asked for the "best way". The best way to scale depends on what you want to accomplish, so you would need to add more detail to your question to really pin that down.

Isn't it simply:
y = (x-min)/(max-min)*9+1
Perhaps with some rounding using
sprintf '%.0f'
use strict;
use warnings;
use List::MoreUtils qw( minmax );

my %people = (
    John  => { height => 72.3, weight => 212, age =>  662256000 },
    Peter => { height => 64.5, weight => 232, age => 1292976000 },
    Frank => { height => 78.5, weight => 140, age =>  977616000 },
    Susan => { height => 65.8, weight => 355, age => 1229904000 },
    Judy  => { height => 69.0, weight => 105, age =>  599184000 },
    Mary  => { height => 72.7, weight =>  76, age =>  283824000 },
);

# Linearly map $x from [$min, $max] onto [1, 10].
sub scale {
    my ($min, $max, $x) = @_;
    return ($x - $min) / ($max - $min) * 9 + 1;
}

my ($min_height, $max_height) = minmax( map $_->{height}, values %people );
my ($min_weight, $max_weight) = minmax( map $_->{weight}, values %people );
my ($min_age,    $max_age   ) = minmax( map $_->{age},    values %people );

for my $name (keys %people) {
    my $person = $people{$name};
    printf("%-6s height: %2.0f/10 weight: %2.0f/10 age: %2.0f/10\n",
        "$name:",
        scale($min_height, $max_height, $person->{height}),
        scale($min_weight, $max_weight, $person->{weight}),
        scale($min_age,    $max_age,    $person->{age}),
    );
}
Output:
Susan: height: 2/10 weight: 10/10 age: 9/10
John: height: 6/10 weight: 5/10 age: 4/10
Mary: height: 6/10 weight: 1/10 age: 1/10
Judy: height: 4/10 weight: 2/10 age: 4/10
Peter: height: 1/10 weight: 6/10 age: 10/10
Frank: height: 10/10 weight: 3/10 age: 7/10

If you want your sample to be equally distributed among the buckets 1, 2, ..., 10, then I suggest you use quantiles. In R:
> relative.scale <- function(x) {
+   percentiles <- quantile(x, probs = seq(0, 0.9, 0.1))
+   sapply(x, function(v) sum(percentiles <= v))
+ }
> x <- runif(100)
> s <- relative.scale(x)
> table(s)
s
1 2 3 4 5 6 7 8 9 10
10 10 10 10 10 10 10 10 10 10
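Applied to the question's height data, the bucketing looks like this (a quick sketch; the scores in the comment assume R's default type-7 quantile interpolation):
height <- c(John = 72.3, Peter = 64.5, Frank = 78.5,
            Susan = 65.8, Judy = 69.0, Mary = 72.7)
relative.scale(height)
#  John Peter Frank Susan  Judy  Mary
#     7     1    10     3     5     9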

Related

a web component (using lit-element) for rating stars

It can be used to rate something.
The rating is already available (e.g. the average of ratings provided by a large number of users); we just need to depict the current rating using stars (consider values up to 1 decimal place in the case of fractional ratings).
Here is a vanilla Web Component version from my DEV.to post
For a Lit version, you only have to add some extra lines of code...
customElements.define("star-rating", class extends HTMLElement {
  // Setting .rating resizes the colored overlay rect to rate/stars percent.
  set rating(rate) {
    if (!String(rate).includes("%")) rate = Number(rate) / this.stars * 100 + "%";
    this.querySelector(":nth-child(2)").setAttribute("width", rate); // 2nd rect
  }
  connectedCallback() {
    let {bgcolor, stars, nocolor, color, rating} = this.attributes;
    let repeat = (count, func) => Array(count).fill().map(func);
    this.stars = ~~stars.value || 5;
    // One 100x100 viewBox unit per star: a "nocolor" background rect, a
    // "color" overlay rect (resized by the rating setter), a star-shaped
    // cutout path per star, and two invisible half-star hit areas per star
    // so hovering selects ratings in steps of 0.5.
    this.innerHTML = `<svg viewBox="0 0 ${this.stars*100} 100" style=cursor:pointer>` +
      `<rect height=100 fill=${nocolor.value} width=100% />` +
      `<rect height=100 fill=${color.value} />` +
      repeat(this.stars, (i, n) => `<path fill=${bgcolor.value} d="m${ n*100 } 0h102v100h-102v-100m91 42a6 6 90 00-4-10l-22-1a1 1 90 01-1 0l-8-21a6 6 90 00-11 0l-8 21a1 1 90 01-1 1l-22 1a6 6 90 00-4 10l18 14a1 1 90 010 1l-6 22a6 6 90 008 6l19-13a1 1 90 011 0l19 13a6 6 90 006 0a6 6 90 002-6l-6-22a1 1 90 010-1z"/>`) +
      repeat(this.stars * 2, (i, n) => `<rect x=${ n*50 } n=${n} opacity=0 width=50 height=100 ` +
        ` onclick="this.closest('star-rating').dispatchEvent(new Event('click'))" ` +
        ` onmouseover="this.closest('star-rating').rating=${(n+1)/2}"/>`) +
      "</svg>";
    this.rating = rating.value;
  }
});
<star-rating stars=5 rating="3.2"
bgcolor="green" nocolor="grey" color="gold"></star-rating>
<br>
<star-rating stars=7 rating="50%"
bgcolor="rebeccapurple" nocolor="beige" color="goldenrod"></star-rating>

R: an algorithm for missing data imputation

I have a matrix M[i,j] with a large number of NAs, representing the population stock in a number of urban areas [i,] over the period 2000-2013 [,j]. I would like to fill in the missing data assuming that population grows at a constant rate.
I would like to run a loop which, for each row i of the matrix M
1. Computes the average growth rate using all the non-missing observations
2. Fills in the gaps using imputed values
Generating an example dataset:
city1=c(10,40,NA,NA,320,640);
city2=c(NA,300,NA,1200,2400,NA);
city3=c(NA,NA,4000,8000,16000,32000);
mat=rbind(city1,city2,city3)
Since the average growth rate is 4 for city1, 3 for city2, and 2 for city3, the corresponding result matrix should be:
r.city1=c(10,40,80,160,320,640);
r.city2=c(100,300,600,1200,2400,4800);
r.city3=c(1000,2000,4000,8000,16000,32000);
r.mat=rbind(r.city1,r.city2,r.city3)
Do you have any idea of how I could go about this?
Best,
Clément
Simply approximate the growth rate.
You have a 1D array p[n] per city, so start with that (p as population).
The index i = {0, 1, 2, ..., n-1} represents time in some constant step.
So if the growth rate is constant (call it m), then
p[1]=p[0]*m
p[2]=p[0]*(m^2)
p[3]=p[0]*(m^3)
p[i]=p[0]*(m^i)
So now just guess or approximate m and minimize the distance from each known point.
As a starting point you can take two consecutive known values:
m0=p[i+1]/p[i];
and then loop around this value, minimizing the error for all known values at once.
You can also use my approx class (in C++) if you want.
If you want to be more precise, use a dynamic growth rate,
for example cubic or spline interpolation,
and add curve fitting or approximation as well...
Here is a C++ example (simple, ugly, slow, imprecise...):
#include <math.h> // for fabs()

const int N=3; // cities
const int n=6; // years
int p[N][n]=
    {
     10,  40,   -1,   -1,  320,  640, // city 1 ... -1 = NA
     -1, 300,   -1, 1200, 2400,   -1, // city 2
     -1,  -1, 4000, 8000,16000,32000, // city 3
    };
void estimate()
    {
    int i,I,*q;
    float m,m0,m1,dm,d,e,mm;
    for (I=0;I<N;I++) // all cities
        {
        q=p[I]; // pointer to actual city
        // avg growth rate (float division, q holds ints)
        for (m0=0.0,m1=0.0,i=1;i<n;i++)
         if ((q[i]>0)&&(q[i-1]>0))
          { m0+=float(q[i])/float(q[i-1]); m1++; }
        if (m1<0.5) continue; // skip this city if avg m not found
        m0/=m1;
        // find m more closely on interval <0.5*m0,2.0*m0>
        m1=2.0*m0; m0=0.5*m0; dm=(m1-m0)*0.001;
        for (m=-1.0,e=0.0;m0<=m1;m0+=dm)
            {
            // find first valid number
            for (mm=-1,i=0;i<n;i++)
             if (q[i]>0) { mm=q[i]; break; }
            // compute abs error for current m0
            for (d=0.0;i<n;i++,mm*=m0)
             if (q[i]>0) d+=fabs(mm-q[i]);
            // remember the best solution
            if ((m<0.0)||(e>d)) { m=m0; e=d; }
            }
        // now compute missing data using m
        // ascending
        for (mm=-1,i=0;i<n;i++) if (q[i]>0) { mm=q[i]; break; }
        for (;i<n;i++,mm*=m) if (q[i]<0) q[i]=mm;
        // descending
        for (mm=-1,i=0;i<n;i++) if (q[i]>0) { mm=q[i]; break; }
        for (;i>=0;i--,mm/=m) if (q[i]<0) q[i]=mm;
        }
    }
Result:
// input
[ 10 40 NA NA 320 640 ]
[ NA 300 NA 1200 2400 NA ]
[ NA NA 4000 8000 16000 32000 ]
// output
[ 10 40 52 121 320 640 ]
[ 150 300 599 1200 2400 4790 ]
[ 1000 2000 4000 8000 16000 32000 ]
You can add some rounding and tweak it a bit.
You could also compute the error in both ascending and descending order to be safer.
You could further enhance this by starting from a value surrounded by valid values instead of the first valid entry,
or by computing each NA gap separately.
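Since the question asked for R, here is a minimal R sketch of the same constant-growth-rate idea. The function name impute_growth is my own; it estimates the rate as the geometric mean of the annualized ratios between consecutive known values, so its fill-in values can differ from the C++ output above:
impute_growth <- function(row) {
  known <- which(!is.na(row))
  if (length(known) < 2) return(row)       # need two known values for a rate
  i <- head(known, -1); j <- tail(known, -1)
  rates <- (row[j] / row[i])^(1 / (j - i)) # annualized ratios across each gap
  m <- exp(mean(log(rates)))               # geometric mean growth rate
  gaps <- which(is.na(row))
  row[gaps] <- row[known[1]] * m^(gaps - known[1]) # project from first known value
  row
}

mat <- rbind(city1 = c(10, 40, NA, NA, 320, 640),
             city2 = c(NA, 300, NA, 1200, 2400, NA),
             city3 = c(NA, NA, 4000, 8000, 16000, 32000))
r.mat <- t(apply(mat, 1, impute_growth))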

mathematics behind modulo behavior

Preamble
This question is not about the behavior of (P)RNGs and rand(). It's about what happens when values uniformly distributed over a power-of-two range are reduced with the modulo operator.
Introduction
I knew that one should not use modulo % to convert a value from a range to another, for example to get a value between 0 and 5 from the rand() function: there will be a bias. It's explained here https://bitbucket.org/haypo/hasard/src/ebf5870a1a54/doc/common_errors.rst?at=default and in this answer Why do people say there is modulo bias when using a random number generator?
But today, after investigating some code which looked wrong, I made a tool to demonstrate the behavior of modulo: https://gitorious.org/modulo-test/modulo-test/trees/master and found that it's not clear enough.
A die is only 3 bits
I checked with 6 values in range 0..5. Only 3 bits are needed to code those values.
$ ./modulo-test 10000 6 3
interations = 10000, range = 6, bits = 3 (0x00000007)
[0..7] => [0..5]
theorical occurences 1666.67 probability 0.16666667
[ 0] occurences 2446 probability 0.24460000 ( +46.76%)
[ 1] occurences 2535 probability 0.25350000 ( +52.10%)
[ 2] occurences 1275 probability 0.12750000 ( -23.50%)
[ 3] occurences 1297 probability 0.12970000 ( -22.18%)
[ 4] occurences 1216 probability 0.12160000 ( -27.04%)
[ 5] occurences 1231 probability 0.12310000 ( -26.14%)
minimum occurences 1216.00 probability 0.12160000 ( -27.04%)
maximum occurences 2535.00 probability 0.25350000 ( +52.10%)
mean occurences 1666.67 probability 0.16666667 ( +0.00%)
stddev occurences 639.43 probability 0.06394256 ( 38.37%)
With 3 bits of input, the results are indeed awful, but behave as expected. See answer https://stackoverflow.com/a/14614899/611560
Increasing the number of input bits
What puzzled me was that increasing the number of input bits changed the results.
Don't forget to increase the number of iterations, i.e. the number of samples, otherwise the results are likely to be wrong (see Wrong statistics below).
Let's try with 4 bits:
$ ./modulo-test 20000 6 4
interations = 20000, range = 6, bits = 4 (0x0000000f)
[0..15] => [0..5]
theorical occurences 3333.33 probability 0.16666667
[ 0] occurences 3728 probability 0.18640000 ( +11.84%)
[ 1] occurences 3763 probability 0.18815000 ( +12.89%)
[ 2] occurences 3675 probability 0.18375000 ( +10.25%)
[ 3] occurences 3721 probability 0.18605000 ( +11.63%)
[ 4] occurences 2573 probability 0.12865000 ( -22.81%)
[ 5] occurences 2540 probability 0.12700000 ( -23.80%)
minimum occurences 2540.00 probability 0.12700000 ( -23.80%)
maximum occurences 3763.00 probability 0.18815000 ( +12.89%)
mean occurences 3333.33 probability 0.16666667 ( +0.00%)
stddev occurences 602.48 probability 0.03012376 ( 18.07%)
Let's try with 5 bits:
$ ./modulo-test 40000 6 5
interations = 40000, range = 6, bits = 5 (0x0000001f)
[0..31] => [0..5]
theorical occurences 6666.67 probability 0.16666667
[ 0] occurences 7462 probability 0.18655000 ( +11.93%)
[ 1] occurences 7444 probability 0.18610000 ( +11.66%)
[ 2] occurences 6318 probability 0.15795000 ( -5.23%)
[ 3] occurences 6265 probability 0.15662500 ( -6.03%)
[ 4] occurences 6334 probability 0.15835000 ( -4.99%)
[ 5] occurences 6177 probability 0.15442500 ( -7.34%)
minimum occurences 6177.00 probability 0.15442500 ( -7.34%)
maximum occurences 7462.00 probability 0.18655000 ( +11.93%)
mean occurences 6666.67 probability 0.16666667 ( +0.00%)
stddev occurences 611.58 probability 0.01528949 ( 9.17%)
Let's try with 6 bits:
$ ./modulo-test 80000 6 6
interations = 80000, range = 6, bits = 6 (0x0000003f)
[0..63] => [0..5]
theorical occurences 13333.33 probability 0.16666667
[ 0] occurences 13741 probability 0.17176250 ( +3.06%)
[ 1] occurences 13610 probability 0.17012500 ( +2.08%)
[ 2] occurences 13890 probability 0.17362500 ( +4.18%)
[ 3] occurences 13702 probability 0.17127500 ( +2.77%)
[ 4] occurences 12492 probability 0.15615000 ( -6.31%)
[ 5] occurences 12565 probability 0.15706250 ( -5.76%)
minimum occurences 12492.00 probability 0.15615000 ( -6.31%)
maximum occurences 13890.00 probability 0.17362500 ( +4.18%)
mean occurences 13333.33 probability 0.16666667 ( +0.00%)
stddev occurences 630.35 probability 0.00787938 ( 4.73%)
Question
Please explain why the results differ when the number of input bits changes (with the sample count increased accordingly). What is the mathematical reasoning behind this?
Wrong statistics
In the previous version of the question, I showed a test with 32 bits of input and only 1000000 iterations, i.e. 10^6 samples, and said I was surprised to get correct results.
That was so wrong I'm ashamed of it: you need many times more samples than the 2^32 possible values of the generator to have confidence in the per-value counts. Here 10^6 is way too small compared to 2^32. Bonus for people able to explain this in mathematical/statistical language.
Here are the wrong results:
$ ./modulo-test 1000000 6 32
interations = 1000000, range = 6, bits = 32 (0xffffffff)
[0..4294967295] => [0..5]
theorical occurences 166666.67 probability 0.16666667
[ 0] occurences 166881 probability 0.16688100 ( +0.13%)
[ 1] occurences 166881 probability 0.16688100 ( +0.13%)
[ 2] occurences 166487 probability 0.16648700 ( -0.11%)
[ 3] occurences 166484 probability 0.16648400 ( -0.11%)
[ 4] occurences 166750 probability 0.16675000 ( +0.05%)
[ 5] occurences 166517 probability 0.16651700 ( -0.09%)
minimum occurences 166484.00 probability 0.16648400 ( -0.11%)
maximum occurences 166881.00 probability 0.16688100 ( +0.13%)
mean occurences 166666.67 probability 0.16666667 ( +0.00%)
stddev occurences 193.32 probability 0.00019332 ( 0.12%)
I still have to read and re-read Zed Shaw's excellent article "Programmers Need To Learn Statistics Or I Will Kill Them All".
In essence, you're doing:
(rand() & 7) % 6
Let's assume that rand() is uniformly distributed on [0; RAND_MAX], and that RAND_MAX+1 is a power of two. It is clear that rand() & 7 can evaluate to 0, 1, ..., 7, and that the outcomes are equiprobable.
Now let's look at what happens when you take the result modulo 6.
0 and 6 map to 0;
1 and 7 map to 1;
2 maps to 2;
3 maps to 3;
4 maps to 4;
5 maps to 5.
This explains why you get twice as many zeroes and ones as you get the other numbers.
The same thing is happening in the second case. However, the excess becomes a much smaller fraction of the total, making its contribution hard to distinguish from sampling noise.
To summarize, if you have an integer uniformly distributed on [0; M-1], and you take it modulo N, the result will be biased towards zero unless M is divisible by N.
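You can verify this counting argument exhaustively for small bit widths, for instance with a quick R sketch that enumerates every input value instead of sampling:
table(0:(2^3 - 1) %% 6)  # 3 bits: 0 and 1 occur twice, 2..5 once
table(0:(2^4 - 1) %% 6)  # 4 bits: 0..3 occur three times, 4 and 5 twice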
rand() (or some other PRNG) produces values in the interval [0 .. RAND_MAX]. You want to map these to the interval [0 .. N-1] using the remainder operator.
Write
(RAND_MAX+1) = q*N + r
with 0 <= r < N.
Then for each value in the interval [0 .. N-1] there are
q+1 values of rand() that are mapped to that value if the value is smaller than r
q values of rand() that are mapped to it if the value is >= r.
Now, if q is small, the relative difference between q and q+1 is large, but if q is large (2^32 / 6, for example), the difference cannot easily be measured.
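For the 32-bit case there is no need to enumerate; computing q and r directly shows how tiny the bias becomes (a quick R sketch):
M <- 2^32; N <- 6
q <- M %/% N; r <- M %% N              # r residues receive q+1 inputs, the rest q
c(q = q, r = r, relative.excess = 1/q) # about 1.4e-9, lost in sampling noise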

Negative & Positive Percentage Calculation

Let's say I have 3 pairs of numbers and I want the % difference for each.
30 - 60
94 - 67
10 - 14
I want a function that calculates the percentage difference between each pair of numbers; most importantly, it has to support positive and negative percentages.
Example:
30 - 60 : +100%
94 - 67 : -36% ( just guessing )
10 - 14 : +40%
Thanks
This is pretty basic math.
% difference from x to y is 100*(y-x)/x
The important issue here is whether one of your numbers is a known reference, for example, a theoretical value.
With no reference number, use the percent difference as
100*(y-x)/((x+y)/2)
The important distinction here is dividing by the average, which symmetrizes the definition.
From your example, though, it seems that you might want percent error; that is, you are treating your first number as the reference and want to know how the other deviates from it. Then the equation, where x is the reference number, is:
100*(y-x)/x
See, e.g., Wikipedia for a brief discussion of this.
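To make the distinction concrete, here is a small R sketch of both definitions (the function names are mine):
percent_error      <- function(x, y) 100 * (y - x) / x             # x is the reference
percent_difference <- function(x, y) 100 * (y - x) / ((x + y) / 2) # symmetric version
percent_error(30, 60)       #  100
percent_error(94, 67)       # -28.72
percent_difference(94, 67)  # -33.54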
for x - y the percentage is (y-x)/x*100
Simple math:
function differenceAsPercent($number1, $number2) {
return number_format(($number2 - $number1) / $number1 * 100, 2);
}
echo differenceAsPercent(30, 60); // 100.00
echo differenceAsPercent(94, 67); // -28.72
echo differenceAsPercent(10, 14); // 40.00
If the percentage is needed for a voting system, then Andrey Korolyov is the only one who answered correctly.
Example
10 votes for 1 vote against = 90%
10 votes for 5 votes against = 50%
10 votes for 3 votes against = 70%
100 votes for 1 vote against = 99%
1000 votes for 1 vote against = 99.9%
1 votes for 10 vote against = -90%
5 votes for 10 votes against = -50%
3 votes for 10 votes against = -70%
1 votes for 100 vote against = -99%
1 votes for 1000 vote against = -99.9%
function perc(a,b){
console.log( (a > b) ? (a-b)/a*100 : (b - a)/b*-100);
}
$c = ($a > $b) ? ($a-$b)/$a*100 : ($b-$a)/$b*-100; // PHP equivalent; signs match the JS above
In Ukraine children learn these math calculations at the age of 12 :)

Mathematical mind boggler, very confusing possible new mathematical breakthrough [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 12 years ago.
I'm not trying to make a joke here, but I am very confused. I've been trying to figure this out for about 6 hours straight now, with about 20 notepads and 15 calculators open, and I can't crunch it; I always end up with too much excess.
Let me explain the variables we have to work with.
Say we got
2566 min points / 2566 max points
0 min xp / 4835 max xp
There are 2 types of jobs that use both variables (points and XP):
Job (1) subtracts 32 points per click and adds 72 xp per click.
Job (2) subtracts 10 points per click and adds 14 xp per click.
I'm trying to figure out how to calculate the excess properly, so that I waste the minimum number of Job 1's while still having enough points to do as many Job 2's as possible and still reach max XP.
That's the thing: I don't want to run Job 1's until there are no points left, because in doing so the Job 1's will exceed the maximum XP (4835) and I will never get to do any Job 2's.
I want to fit in the maximum possible number of Job 2's and then, with the proper calculation, reach or overshoot the max XP of 4835 with Job 1's, so I always reach max XP. Essentially I need to reach the 4835 max XP to be able to continue completing jobs. With that in mind, I want to give priority to Job 2's and use Job 1's only to reach the necessary 4835 max XP, which resets the points from min back to max so I can redo the whole process. I am trying to automate this.
Here are my equations:
amountOfJob1s = (minPoints / 32)
amountOfJob2s = (minPoints / 10)
excessXP = (amountOfJob1s * 72) - maxXP
if excessXP < 0 then break
Results
mustDoJob1s = ???
mustDoJob2s = ???
Thank you if anyone can help me figure this out so I can put a good equation here; I'd appreciate it.
Either this is not mathematically possible or I just can't crunch it; I believe I have enough variables.
Let job1 be the number of Job 1's and job2 the number of Job 2's. We are left with two equations and two unknowns:
job1 * 32 + job2 * 10 = 2566
job1 * 72 + job2 * 14 = 4835
So:
job1 = 45.683...
job2 = 110.411...
Since job1 has the higher XP/point ratio and you want to go over 4835 XP, round job1 up, then compute job2 and round it down.
job1 = 46
job1 * 32 + job2 * 10 = 2566
job2 = 109.4
job2 = 109
Check:
job1 * 32 + job2 * 10 = 2562 points
job1 * 72 + job2 * 14 = 4838 xp
Done.
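A quick sanity check of that rounded solution, in R:
a <- 46; b <- 109
c(points = 32*a + 10*b, xp = 72*a + 14*b)  # 2562 points, 4838 xp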
Two unknowns is hardly a 'new mathematical breakthrough' :)
I assume you want to get as much "XP" as possible, while spending no more than 2566 "points" by "clicking" an integer number of times {n1, n2} on each of two "jobs". Here is the answer in Mathematica:
In[8]:= Maximize[{72 n1 + 14 n2, n1 >= 0, n2 >= 0,
32 n1 + 10 n2 <= 2566}, {n1, n2}, Integers]
Out[8]= {5760, {n1 -> 80, n2 -> 0}}
Or, maybe you need to spend exactly 2566 points? Then the best you can do is:
In[9]:= Maximize[{72 n1 + 14 n2, n1 >= 0, n2 >= 0,
32 n1 + 10 n2 == 2566}, {n1, n2}, Integers]
Out[9]= {5714, {n1 -> 78, n2 -> 7}}
Is this what you wanted?
Let a be the number of Job 1 and b the number of Job 2.
XP = 72 a + 14 b
P = 32 a + 10 b
You appear to want to solve for a and b, such that XP <= 4835, P <= 2566 and b is as large as possible.
72 a + 14 b <= 4835
32 a + 10 b <= 2566
b will be largest when a = 0, i.e.
b <= 4835 ÷ 14, => b <= 345
b <= 2566 ÷ 10, => b <= 256
As b must be at most both 345 and 256, it must be at most 256.
Substitute back in:
72 a + 14 × 256 <= 4835, => a <= ( 4835 - 14 × 256 ) ÷ 72, => a <= 17
32 a + 10 × 256 <= 2566, => a <= ( 2566 - 10 × 256 ) ÷ 32, => a <= 0
so a = 0, XP is 3584 and points used is 2560.
Alternatively, you can solve for the closest satisfaction of the two inequalities
72 a + 14 b <= 4835 (1)
32 a + 10 b <= 2566 (2)
b <= ( 2566 - 32 a ) ÷ 10 (3) rearrange 2
72 a <= 4835 - 1.4 ( 2566 - 32 a ) (4) subst 3 into 1
27.2 a <= 1242.6
a <= 45.68
so choose a = 45 as the largest integer solution, giving b = 112, XP is 4808, points used is 2560
For either of these, there's no computer programming required; if the constants associated with the two jobs change, then the formulas change.
For harder to solve examples, the relevant area of mathematics is called linear programming
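For a problem this small you can also brute-force the integer search; here is a sketch in R that maximizes the number of Job 2's under both budgets, reproducing the a = 0, b = 256 solution above:
grid <- expand.grid(a = 0:80, b = 0:256)  # a = Job 1 clicks, b = Job 2 clicks
ok   <- with(grid, 32*a + 10*b <= 2566 & 72*a + 14*b <= 4835)
feasible <- grid[ok, ]
feasible[which.max(feasible$b), ]         # a = 0, b = 256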
