Best way to find least standard deviation - math

I have a spreadsheet in which each number is the verse count of one paragraph of a book.
I manually distribute the paragraphs, in sequence, across days according to their verse counts, so the spreadsheet looks something like this:
Verses Day
5 1
6 1
3 1
10 2
8 3
4 3
2 3
6 4
3 4
10 5
3 5
2 6
5 6
10 7
= 2,7080128015
By summing the verses for each day (7 days in this case), I compute the standard deviation of the daily totals and try to reduce it to get a more even distribution of paragraphs.
The question is: what is the best way to find the least standard deviation?
I thought of using brute force to generate all possible combinations, but that does not scale as the number of paragraphs grows.
EDIT: The standard deviation is based on the total number of verses per day, and the days are assigned sequentially. Day 1 has a total of 14 verses, day 2 has 10, and so on.
1 14
2 10
3 14
4 9
5 13
6 7
7 10
= 2,7080128015
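As a quick check of the figure above, here is a minimal Python sketch that rebuilds the daily totals from the (verses, day) table and computes their sample standard deviation. The helper name `day_totals` is made up for illustration, and it assumes the spreadsheet uses the sample (n - 1) formula, which is what reproduces 2,7080128015.

```python
from statistics import stdev

# (verses, day) pairs copied from the table in the question
rows = [(5, 1), (6, 1), (3, 1), (10, 2), (8, 3), (4, 3), (2, 3),
        (6, 4), (3, 4), (10, 5), (3, 5), (2, 6), (5, 6), (10, 7)]

def day_totals(rows):
    """Sum the verse counts of all paragraphs assigned to each day."""
    totals = {}
    for verses, day in rows:
        totals[day] = totals.get(day, 0) + verses
    return [totals[day] for day in sorted(totals)]

totals = day_totals(rows)
print(totals)                    # [14, 10, 14, 9, 13, 7, 10]
print(round(stdev(totals), 10))  # sample standard deviation of the daily totals
```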

Since the total number of verses and the number of days are constant, you want to minimize
$$\sum_i \left(\text{avg verse count} - \text{verse count of day } i\right)^2$$
where avg verse count is a constant: simply the total number of verses divided by the number of days.
This problem can be solved with a dynamic program over the days. Let us build the partial solution function f(d, p) that gives us the minimal sum of squares for distributing paragraphs 0 through p over d days. We are interested in the last value of this function, i.e. the one for all days and the last paragraph.
We can build the function incrementally. Calculating f(1, p) for any p is straightforward: paragraphs 0 through p all go on the first day, so we just take the difference between their verse total and the average and square it. Then, for every larger number of days d, we can calculate
$$f(d, p) = \min_{i < p} \left[ f(d - 1, i) + \Bigl(\text{avg verse count} - \sum_{j = i + 1}^{p} \text{verse count of paragraph } j\Bigr)^2 \right]$$
That means, we check the solutions for one day less and fill up the current day with the paragraphs between the previous day's end paragraph and p. While we calculate this function, we keep a pointer to the chosen minimum element (as usual for a dynamic program). When we are done calculating the entire function, we just follow the pointers back to the start, which will give us the partitioning.
The algorithm has a running time of O(d * p^2), where d is the number of days and p is the number of paragraphs.
Example Code
Here is some example C# code that implements the above algorithm:
using System;
using System.Linq;

public static class Program
{
    struct Entry
    {
        public double minCost;   // best sum of squares found so far
        public int predecessor;  // last paragraph of the previous day
    }

    public static void Main()
    {
        //input data
        int[] versesPerParagraph = { 5, 6, 3, 10, 8, 4, 2, 6, 3, 10, 3, 2, 5, 10 };
        int days = 7;

        //calculate constants
        double avgVerses = (double)versesPerParagraph.Sum() / days;

        //set up DP table (f(d,p))
        int paragraphs = versesPerParagraph.Length;
        Entry[,] dp = new Entry[days, paragraphs];

        //initialize table
        int verseCount = 0;
        for (int p = 0; p < paragraphs; ++p)
        {
            verseCount += versesPerParagraph[p];
            double diff = avgVerses - verseCount;
            dp[0, p].minCost = diff * diff;
            dp[0, p].predecessor = -1;
        }

        //run dynamic program
        for (int d = 1; d < days; ++d)
        {
            for (int p = d; p < paragraphs; ++p)
            {
                verseCount = 0;
                dp[d, p].minCost = double.MaxValue;
                for (int i = p; i >= d; --i)
                {
                    verseCount += versesPerParagraph[i];
                    double diff = avgVerses - verseCount;
                    double cost = dp[d - 1, i - 1].minCost + diff * diff;
                    if (cost < dp[d, p].minCost)
                    {
                        dp[d, p].minCost = cost;
                        dp[d, p].predecessor = i - 1;
                    }
                }
            }
        }

        //reconstruct the partitioning
        {
            int p = paragraphs - 1;
            for (int d = days - 1; d >= 0; --d)
            {
                int predecessor = dp[d, p].predecessor;
                //calculate number of verses, just to show them
                verseCount = 0;
                for (int i = predecessor + 1; i <= p; ++i)
                    verseCount += versesPerParagraph[i];
                Console.WriteLine($"Day {d} ranges from paragraph {predecessor + 1} to {p} and has {verseCount} verses.");
                p = predecessor;
            }
        }
    }
}
The output is:
Day 6 ranges from paragraph 13 to 13 and has 10 verses.
Day 5 ranges from paragraph 10 to 12 and has 10 verses.
Day 4 ranges from paragraph 9 to 9 and has 10 verses.
Day 3 ranges from paragraph 6 to 8 and has 11 verses.
Day 2 ranges from paragraph 4 to 5 and has 12 verses.
Day 1 ranges from paragraph 2 to 3 and has 13 verses.
Day 0 ranges from paragraph 0 to 1 and has 11 verses.
This partitioning gives a standard deviation of 1.15.
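As a cross-check, the same recurrence can be sketched compactly in Python. This is an illustrative port under a few assumptions, not the code that produced the output above: the function name `balance` is made up, it uses prefix sums to get each day's total, and it requires every day to receive at least one paragraph.

```python
from statistics import stdev

def balance(verses, days):
    """Split verses into `days` consecutive groups, minimizing the sum of
    squared differences between each group's total and the average."""
    n = len(verses)
    avg = sum(verses) / days
    prefix = [0]
    for v in verses:
        prefix.append(prefix[-1] + v)

    inf = float("inf")
    # cost[d][p]: best cost for the first p paragraphs spread over d days
    cost = [[inf] * (n + 1) for _ in range(days + 1)]
    cut = [[0] * (n + 1) for _ in range(days + 1)]
    cost[0][0] = 0.0
    for d in range(1, days + 1):
        for p in range(d, n + 1):
            for i in range(d - 1, p):  # day d holds paragraphs i .. p-1
                c = cost[d - 1][i] + (avg - (prefix[p] - prefix[i])) ** 2
                if c < cost[d][p]:
                    cost[d][p] = c
                    cut[d][p] = i

    # follow the stored cut points back to recover the daily totals
    totals, p = [], n
    for d in range(days, 0, -1):
        i = cut[d][p]
        totals.append(prefix[p] - prefix[i])
        p = i
    return cost[days][n], totals[::-1]

best_cost, totals = balance([5, 6, 3, 10, 8, 4, 2, 6, 3, 10, 3, 2, 5, 10], 7)
print(best_cost, totals)         # 8.0 [11, 13, 12, 11, 10, 10, 10]
print(round(stdev(totals), 2))   # 1.15
```

The daily totals match the C# output, and the sample standard deviation of those totals is the 1.15 quoted above.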

Related

Prime factorization of factorial

Is it possible to find the prime factors of a factorial without actually calculating the factorial?
My point here is to find the prime factors of a factorial, not of a big number. Your algorithm should skip the step of calculating the factorial and derive the prime factors of n! directly, where n <= 4000.
Calculating the factorial and finding its prime divisors is pretty easy, but my program crashes when the input is greater than n=22. Therefore I thought it would be pretty convenient to do the whole process without having to calculate the factorial.
function decomp(n) {
    var primeFactors = [];
    var fact = 1;
    for (var i = 2; i <= n; i++) {
        fact = fact * i;
    }
    while (fact % 2 === 0) {
        primeFactors.push(2);
        fact = fact / 2;
    }
    var sqrtFact = Math.sqrt(fact);
    for (var i = 2; i <= sqrtFact; i++) {
        while (fact % i === 0) {
            primeFactors.push(i);
            fact = fact / i;
        }
    }
    return primeFactors;
}
I don't expect any code or links; examples and a brief outline are enough.
Let's consider an example: 10! = 2^8 * 3^4 * 5^2 * 7^1. I computed that by computing the factors of each number from 2 to 10:
2: 2
3: 3
4: 2,2
5: 5
6: 2,3
7: 7
8: 2,2,2
9: 3,3
10: 2,5
Then I just counted each factor. There are eight 2's (1 in 2, 2 in 4, 1 in 6, 3 in 8, and 1 in 10), four 3's (1 in 3, 1 in 6, and 2 in 9), two 5's (1 in 5, and 1 in 10), and one 7 (in 7).
In terms of writing a program, just keep an array of counters (it only needs to be as large as the largest n you want to handle, since no prime factor of n! exceeds n) and, for each number from 2 to n, add the count of its factors to the array of counters.
Does that help?
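The counting idea above can be sketched in Python roughly as follows. This is only an illustration: the function name `factorial_prime_factors` is made up, each number is factored by plain trial division, and a `Counter` stands in for the array of counters.

```python
from collections import Counter

def factorial_prime_factors(n):
    """Return {prime: exponent} for n! without ever computing n! itself."""
    counts = Counter()
    for k in range(2, n + 1):
        m = k
        p = 2
        # factor k by trial division and tally each prime factor found
        while p * p <= m:
            while m % p == 0:
                counts[p] += 1
                m //= p
            p += 1
        if m > 1:  # whatever is left is itself a prime factor of k
            counts[m] += 1
    return dict(counts)

print(factorial_prime_factors(10))  # {2: 8, 3: 4, 5: 2, 7: 1}
```

Because only numbers up to n are ever factored, no intermediate value gets anywhere near the size of n!, so n = 4000 is not a problem.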

Modify Dijkstra's algorithm with some conditions

Given an undirected graph with costs on its edges, find the shortest path from a given node A to a node B, with one extra rule. We start at time t = 0, and for every node we are given a list of timestamps at which we cannot pass through that node: during those times we can't do anything and have to wait until they have passed. In the original statement you are a prisoner who can teleport between cells; teleporting along an edge takes time equal to the edge's cost, and the blocked times are the timestamps at which a guardian is in the cell with you. Find the minimum time needed to escape the prison.
What I tried:
I tried to modify Dijkstra's algorithm like this: where the normal algorithm finds the minimum time for a node, I check whether a guardian is in that node at that time. It didn't work. Any other ideas?
int checkGuardian(int min, int ind, List *guardians)
{
    for (List iter = guardians[ind]; iter; iter = iter->next)
        if (min == iter->value.node)
            return min + iter->value.node;
    return 0;
}

void dijkstra(Graph G, int start, int end, List *guardians)
{
    Multiset H = initMultiset();
    int *parent = (int *)malloc(G->V * sizeof(int));
    for (int i = 0; i < G->V; ++i)
    {
        G->distance[i] = INF;
        parent[i] = -1;
    }
    G->distance[start] = 0;
    H = insert(H, make_pair(start, 0));
    while (!isEmptyMultiset(H))
    {
        Pair first = extractMin(H);
        for (List iter = G->adjList[first.node]; iter; iter = iter->next)
            if (G->distance[iter->value.node] > G->distance[first.node] + iter->value.cost
                + checkGuardian(G->distance[first.node] + iter->value.cost, iter->value.node, guardians))
            {
                G->distance[iter->value.node] = G->distance[first.node] + iter->value.cost
                    + checkGuardian(G->distance[first.node] + iter->value.cost, iter->value.node, guardians);
                H = insert(H, make_pair(iter->value.node, G->distance[iter->value.node]));
                parent[iter->value.node] = first.node;
            }
    }
    printf("%d\n", G->distance[end]);
    printPath(parent, end);
    printf("%d\n", end);
}
with these structures:
typedef struct graph
{
    int V;
    int *distance;
    List *adjList;
} *Graph;

typedef struct list
{
    int size;
    Pair value;
    struct list *tail;
    struct list *next;
    struct list *prev;
} *List;

typedef struct multiset
{
    Pair vector[MAX];
    int size;
    int capacity;
} *Multiset;

typedef struct pair
{
    int node, cost;
} Pair;
The input gives the number of nodes, the number of edges and the start node. Each of the next (number of edges) lines describes an edge: its two endpoints and the cost associated with it. Each of the next (number of nodes) lines starts with the character "N" if you can't escape from that cell or "Y" if you can, followed by the number of timestamps at which a guardian is in that cell, followed by the timestamps themselves.
For this input:
6 7 1
1 2 5
1 4 3
2 4 1
2 3 8
2 6 4
3 6 2
1 5 10
N 0
N 4 2 3 4 7
Y 0
N 3 3 6 7
N 3 10 11 12
N 3 7 8 9
I would expect this output:
12
1 4 2 6 3
But I get this output:
10
1 4 2 6 3
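One possible way to model the waiting rule, sketched in Python rather than as a fix to the C code above, is to run Dijkstra on arrival times and, when a node is taken off the queue, push its departure time forward while a guardian is present. Everything here is an assumption made for illustration: the function name `earliest_escape`, the dictionary-based input layout, and the reading that guardians block departure from a cell and that reaching a "Y" cell counts as escaping immediately.

```python
import heapq

def earliest_escape(num_nodes, edges, start, exits, guards):
    """Dijkstra on arrival times; leaving a cell is delayed while it is guarded."""
    adj = {v: [] for v in range(1, num_nodes + 1)}
    for u, v, cost in edges:
        adj[u].append((v, cost))
        adj[v].append((u, cost))
    arrival = {v: float("inf") for v in adj}
    arrival[start] = 0
    parent = {start: None}
    heap = [(0, start)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival[u]:
            continue  # stale queue entry
        if u in exits:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return t, path[::-1]
        blocked = guards.get(u, set())
        while t in blocked:  # wait out consecutive guarded timestamps
            t += 1
        for v, cost in adj[u]:
            if t + cost < arrival[v]:
                arrival[v] = t + cost
                parent[v] = u
                heapq.heappush(heap, (t + cost, v))
    return None

# sample input from the question
edges = [(1, 2, 5), (1, 4, 3), (2, 4, 1), (2, 3, 8), (2, 6, 4), (3, 6, 2), (1, 5, 10)]
guards = {2: {2, 3, 4, 7}, 4: {3, 6, 7}, 5: {10, 11, 12}, 6: {7, 8, 9}}
escape_time, path = earliest_escape(6, edges, start=1, exits={3}, guards=guards)
print(escape_time, path)
```

On the sample this gives an escape time of 12. The route it reports may differ from 1 4 2 6 3, because going 1 2 6 3 also takes 12 and ties are broken by whichever route is found first.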

Number of action per year. Combinatorics question

I'm writing a diploma thesis about vaccines. There is a region, its population and 12 months. There is an array of 12 values, each from 0 to 1 with a step of 0.01. Each value says which fraction of the population we should vaccinate in that month.
For example, take array = [0.1,0,0,0,0,0,0,0,0,0,0,0]. That means that we should vaccinate 0.1 of the region's population in the first month only.
Another array = [0, 0.23,0,0,0,0,0,0, 0.02,0,0,0]. It means that we should vaccinate 0.23 of the region's population in the second month and 0.02 of the region's population in the 9th month.
So the question is: how can I generate (using 3 loops) 12 (months) * 12 (number of vaccinations) * 100 (number of steps from 0 to 1) = 14_400 arrays that together cover every version of these combinations?
For now I have this code:
for (int month = 0; month < 12; month++) {
    for (double step = 0; step <= 1; step += 0.01) {
        double[] arr = new double[12];
        arr[month] = step;
    }
}
I need to add a third loop that will vary the number of vaccinations per year.
I have no idea how to write it.
I'm not sure whether this is understandable.
I hope you get it; otherwise ask me, please.
You have 101 variants for the first month: 0.00, 0.01, ..., 1.00.
And 101 variants for the second month, with the same values.
That gives 101*101 possible combinations for two months.
Continuing, for all 12 months you have 101^12 variants ~ 10^24.
It is not possible to generate and store so many combinations (at least in the current decade).
If the step is larger than 0.01, the combination count might be manageable. The general formula is P=N^M, where N is the number of variants per month and M is the number of months.
You can traverse all combinations by representing every integer in the range 0..P-1 in the base-N numeral system. Or make a digit counter:
fill array D[12] with zeros
repeat
    increment element at the last index by step value
    if it reaches the limit, make it zero
        and increment element at the next index
until the first element reaches the limit
It is similar to counting 08, 09: here we cannot increment 9, so we make 10, and so on.
s = 1
m = 3
mx = 3
l = [0]*m
i = 0
while i < m:
    print([x/3 for x in l])
    i = 0
    l[i] += s
    while (i < m) and l[i] > mx:
        l[i] = 0
        i += 1
        if i < m:
            l[i] += s
Python code prints 64 ((mx/s+1)^m=4^3) variants like [0.3333, 0.6666, 0.0]
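The same enumeration can also be written with the standard library's `itertools.product`, which yields the combinations lazily instead of storing them. This is a small sketch under assumptions: the helper name `schedules` and its parameters are invented for illustration, and the fractions are built as `k / steps` to avoid accumulating floating-point error from repeated addition.

```python
from itertools import islice, product

def schedules(months, steps):
    """Lazily yield every tuple of `months` fractions taken from 0, 1/steps, ..., 1."""
    levels = [k / steps for k in range(steps + 1)]
    return product(levels, repeat=months)

# small case matching the counter above: 4 levels, 3 months
print(sum(1 for _ in schedules(3, 3)))  # 64

# the full problem size, P = N^M with N = 101 and M = 12
print(101 ** 12)

# the first few full-size schedules can still be looked at one by one
for schedule in islice(schedules(12, 100), 3):
    print(schedule)
```

Iterating lazily avoids the storage problem, but walking through all 101^12 tuples is still out of reach, so a coarser step remains necessary in practice.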

Expressing Natural Number by sum of Triangular numbers

Triangular numbers count the objects that can be arranged in the shape of a triangle.
For example, 1, 3, 6, 10, 15, ... are triangular numbers.
   o
  o o
 o o o
o o o o
is the shape of the n=4 triangular number.
What I have to do: a natural number N is given, and I have to print
every way of expressing N as a sum of triangular numbers.
if N = 4
output should be
1 1 1 1
1 3
3 1
else if N = 6
output should be
1 1 1 1 1 1
1 1 1 3
1 1 3 1
1 3 1 1
3 1 1 1
3 3
6
I have searched for a few hours and couldn't find an answer.
Please help.
(I am not sure whether this helps, but I found that
if T(k) is the number of such sums when N is k, with T(0) = 1, then
T(k) = T(k-1) + T(k-3) + T(k-6) + .... + T(k-p) while (k-p) >= 0
and p is a triangular number.)
Here's Code for k=-1(Read comments below)
#include <iostream>
#include <vector>
using namespace std;

long TriangleNumber(int index);
void PrintTriangles(int index);

vector<long> triangleNumList(450); //(450 power raised by 2 is about 200,000)
vector<long> storage(100001);

int main() {
    int n, p;
    for (int i = 0; i < 450; i++) {
        triangleNumList[i] = i * (i + 1) / 2;
    }
    cin >> n >> p;
    cout << TriangleNumber(n);
    if (p == 1) {
        //PrintTriangles();
    }
    return 0;
}

long TriangleNumber(int index) {
    int iter = 1, out = 0;
    if (index == 1 || index == 0) {
        return 1;
    }
    else {
        if (storage[index] != 0) {
            return storage[index];
        }
        else {
            while (triangleNumList[iter] <= index) {
                storage[index] = ( storage[index] + TriangleNumber(index - triangleNumList[iter]) ) % 1000000;
                iter++;
            }
        }
    }
    return storage[index];
}

void PrintTriangles(int index) {
    // What Algorithm?
}
Here is some recursive Python 3.6 code that prints the sums of triangular numbers that total the inputted target. I prioritized simplicity of code in this version. You may want to add error-checking on the input value, counting the sums, storing the lists rather than just printing them, and wrapping the entire routine into a function. Setting up the list of triangular numbers could also be done in fewer lines of code.
Your code saved time but worsened memory usage by "memoizing" the triangular numbers (storing and reusing them rather than always calculating them when needed). You could do the same to the sum lists, if you like. It is also possible to make this more in the dynamic programming style: find the sum lists for n=1 then for n=2 etc. I'll leave all that to you.
""" Given a positive integer n, print all the ways n can be expressed as
the sum of triangular numbers.
"""

def print_sums_of_triangular_numbers(prefix, target):
    """Print sums totalling to target, each after printing the prefix."""
    if target == 0:
        print(*prefix)
        return
    for tri in triangle_num_list:
        if tri > target:
            return
        print_sums_of_triangular_numbers(prefix + [tri], target - tri)

n = int(input('Value of n ? '))

# Set up list of triangular numbers not greater than n
triangle_num_list = []
index = 1
tri_sum = 1
while tri_sum <= n:
    triangle_num_list.append(tri_sum)
    index += 1
    tri_sum += index

# Print the sums totalling to n
print_sums_of_triangular_numbers([], n)
Here are the printouts of two runs of this code:
Value of n ? 4
1 1 1 1
1 3
3 1
Value of n ? 6
1 1 1 1 1 1
1 1 1 3
1 1 3 1
1 3 1 1
3 1 1 1
3 3
6
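If only the number of sums is needed, the recurrence mentioned in the question (sum the counts for n minus each triangular number that fits) can be filled in bottom-up, in the dynamic programming style suggested above. A minimal sketch, with the function name `count_triangular_sums` chosen for illustration and the empty sum counted as the single way to reach 0:

```python
def count_triangular_sums(n):
    """Count the ordered sums of triangular numbers that total n."""
    triangles = []
    k = 1
    while k * (k + 1) // 2 <= n:
        triangles.append(k * (k + 1) // 2)
        k += 1

    ways = [0] * (n + 1)
    ways[0] = 1  # the empty sum is the one way to reach 0
    for total in range(1, n + 1):
        # the first term of a sum can be any triangular number that fits
        ways[total] = sum(ways[total - t] for t in triangles if t <= total)
    return ways[n]

print(count_triangular_sums(4))  # 3
print(count_triangular_sums(6))  # 7
```

The counts agree with the number of lines printed in the two runs above.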

Finding X & Y based off of Index

Good day all
I am having a math issue. It may be due to lack of sleep, but I am totally drawing a blank.
I need to find the x and y coordinates based on the index.
So I know the width of the grid, the height and the index, but I don't know the X and Y coordinates. I need to build a formula to get that data.
For example, I know the index is 9. Through a formula I need to be able to get the number 5 for X and 2 for Y.
int numOfRows = 4
int numOfCols = 5
int index = 13
int X = ?
int Y = ?
//perform math magic
x = 4
y = 3
It is very simple:
public static void foo(int i) {
    int x = i % 5 + 1; // column, counted from 1 (5 is the number of columns)
    int y = i / 5 + 1; // row, counted from 1 (integer division)
}
It gets much easier if you start counting with 0:
  |  0  1  2  3  4
------------------
 0|  0  1  2  3  4
 1|  5  6  7  8  9
 2| 10 11 12 13 14
 3|...
 4|
Let a be the number in the grid and numberOfCols the number of columns (5 in this example).
In that case, it's plain to see that
the row number is a / numberOfCols (without remainder) and
the column number is a modulo numberOfCols.
You can reduce your case to this case by adding 1 to the resulting row/col numbers.
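For illustration, the same arithmetic in Python, where `divmod` returns the quotient and the remainder in one call. The function name `index_to_xy` and the `one_based` flag are made up for this sketch.

```python
def index_to_xy(index, num_cols, one_based=True):
    """Convert a row-major grid index into (x, y) = (column, row)."""
    row, col = divmod(index, num_cols)  # row = index // num_cols, col = index % num_cols
    offset = 1 if one_based else 0
    return col + offset, row + offset

print(index_to_xy(13, 5))                   # (4, 3)
print(index_to_xy(13, 5, one_based=False))  # (3, 2)
```

Note that only the number of columns is needed; the number of rows does not enter the formula.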
