Popularity formula (using votes and age) - math

I need to create a simple formula for determining the popularity of an item based on votes and age.
Here is my current formula, which needs some work:
30 / (days between post date and now) * (vote count) = weighted vote
Whenever a vost is cast for an item it checks if its weighted vote is > 300. If an item has a weighted vote more than 300 then it is promoted to the front page.
The problem is that this formula makes it very hard for older items to be promoted.
30 / 1 day * 10 votes = 300 (promoted)
30 / 5 days * 15 votes = 90 (not promoted)
30 / 30 days * 30 votes = 30 (not promoted)
30 / 80 days * 40 votes = 15 (not promoted)
How can I alter the formula to make it relatively easier for older items to be promoted (IE. make the above four weighted values fairly close together)?

Just get a graph drawing program (maybe excel, maybe matlab, maybe GNUplot) and experiment with the formula until you feel it looks right.
There's no right or wrong with these things.

Toss a logarithm on the amount of time it's been since the item was posted. Tweak the base and the constants involved. That'll take you most of the way there.

Related

Determination of threshold values to group variable in ranges

I have, let's say, 60 empirical realizations of PPR. My goal is to create PPR vector with average values of empirical PPR. This average values depend on what upper and lower limit of TTM i take - so I can take TTM from 60 to 1 and calculate average and in PPR vector put this one average number from row 1 to 60 or I can calculate average value of PPR from TTT >= 60 and TTM <= 30 and TTM > 30 and TTM <= 1 and these two calculated numbers put in my vector accordingly to TTM values. Finaly I want to obtain something like this on chart (x-axis is TTM, green line is my empirical PPR and black line is average based on significant changes during TTM). I want to write an algorithm which will help me find the best TTM thresholds to fit the best black line to green line.
TTM PPR
60 0,20%
59 0,16%
58 0,33%
57 0,58%
56 0,41%
...
10 1,15%
9 0,96%
8 0,88%
7 0,32%
6 0,16%
Can you please help me if you know any statistical method which might be applicable in this case or base idea for an algorithm which I could implement in VBA/R ?
I have used Solver and GRG Nonlinear** to deal with it but I believe that there is something more proper to be utilized.
** with Solver I had the problem that it found optimal solution - ok, but I re-run Solver and it found me new solution (with a little bit different values of TTM) and value of target function was lower that on first time (so, was the first solution really optimal ?)
I think this is what you want. The next step would be including a method that can recognize the break points. I am sure you need to define two new parameters, one as the sensitivity and one as the minimum number of points in a sample to be accepted to be categorized as a section (between two break points including start and end point)
Please hit the checkmark next to this answer if you are happy with it.
You can download the Excel file from here:
http://www.filedropper.com/statisticspatternchange

Choco solver shift scheduling

i m a total beginner in Choco Solver. I want to make a simple shift scheduler.
i have set integer variables like this
IntVar day1 = model.intVar("day1", new int[] {0,1,2,3,4,5});
where 0 , 1,...5 is a reference ID to an employee.
I have a total of 30 variables,(one for every day of the month) since this a monthly based shift schedule.
I have set up constraints, that do not allow e.g. not be on shift for two days in a row.
My question is,
how can i set up a constraint, such that each employer has a minimum of 5 shifts ie. each value in the domain appears at least 5 times in all 30 variables ?
Thank you!
There are several ways of doing this. Give a look at model.globalCardinality and model.count, these constraints enable to count the number of times a value is used by a set of variables.
http://choco-solver.org/apidocs/org/chocosolver/solver/constraints/IConstraintFactory.html
For instance, model.count(3, vars, model.intVar(5,10)).post(); means that between 5 and 10 variables in vars should be equal to 3, so employee 3 should do between 5 and 10 shifts.

Cost Optimization across Different Suppliers for a Product

I've this following optimization problem. A company produces a product, say Big A. To produce this product, it requires 5 processes. (Please find the detail table below). For each process, there are number of supplier that supply raw material for that particular process. E.g. For process 1, there are 3 supplier 1,2 & 3.
The constrain for the CEO of this company,say C, is that for each process the CEO has to purchase supplies from Supplier 1 first, then for additional supplies from 2nd Supplier and so on.
The optimization problem is C wants 700 units for total material to produce for 1 unit of Big A then how will he do it at minimum cost. How the optimization will change if the amount of units require increases to 1500 units.
I'll be grateful if I get the solution of this answer. But if somebody can suggest me some reference regarding this problem it will be a great help too. I'm mainly using R software here.
Process Supplier Cost Units Cumm_Cost Cumm_Unit
1 1 10 100 10 100
1 2 20 110 30 210
1 3 10 200 40 410
2 1 20 100 20 100
2 2 30 150 50 250
2 3 10 150 60 400
3 1 40 130 40 130
3 2 30 140 70 270
3 3 50 120 120 390
4 1 20 120 20 120
4 2 40 120 60 240
4 3 20 180 80 420
5 1 30 180 30 180
5 2 10 160 40 320
5 3 30 140 70 460
Regards,
I will start by solving the specific problem that you have posted and then will demonsrate how to formulate the problem more abstractively. For simplicity, I will use Excel's Solver add-in to solve the problem, but any configuration of a modeling language (such as AIMMS, AMPL, LINGO, OPL, MOSEL and numerous others) with a solver (CPLEX, GUROBI, GLPK, CBC and numerous others) can be used. If you would like to use R, there exists an lpSolve package that calls the lpSolve solver (which is not the best one in the word to be honest, but it is free of charge).
Note that for "real" (large scale) integer problems, the commercial solvers CPLEX, GUROBI and XPRESS perform a lot better than others. The first completely free solver that performs decently in most tests (including Hans Mittelman's page) is CBC. CBC can be hooked up in excel and solve the built-in solver model without restrictions in the number of constraints or variables, by using this add-in. Therefore, assuming that most CPU is going to be spent by the optimization algorithm, using CBC/OpenSolver seems like an efficient choice.
SPREADSHEET SETUP
I follow some conventions for convenience:
Decision variable cells are marked Green.
Constraints are marked red.
Data are marked grey.
Objective function is marked blue.
First, lets augment the table you presented as follows:
The added columns explained briefly:
Selected?: equals 1 if the (Process, Supplier) combo is allowed to produced a positive quantity, zero otherwise.
Quantity: the quantity produced, defined for each (Process, Supplier) combo.
Max Quantity?: Equals 1 if the Suppliers produces the maximum amount of units for that particular Process.
Quantity UB: equals Units * Selected?. This makes the upper bound either equal to Units, when the Supplier is allowed to produce this Process, or zero otherwise.
Quantity LB: equals Units * Max Quantity?. This is to ensure that whenever the Max Quantity? column is 1, the produced quantity will be equal to Units.
Selection: For the 1st supplier, it equals 0. For the 2nd and 3rd suppliers, it equals the Max Quantity? of the previous supplier (row) minus the Selected? of the current supplier (row).
A screenshot with formulas:
There exist two more constraints:
There must be at least one item produced from each process and
The total number of items should be 700 (or later 1,500).
Here is their setup:
and here are the formulas:
In brief, we use SUMIF to sum the quantities that are specific to each supplier, which we are going to constrain to be more than 1 item for each process.
To finish the spreadsheet setup, we need to calculate the objective function, namely the cost of the allocation. This is easily done by taking the SUMPRODUCT of columns Quantity and Cost. Note that the cumulative quantities are derived data and not very useful in the current context.
After the above steps, the spreadsheet looks like below:
SOLVER MODEL
For the solver model we need to declare
The Objective
The Decisions
The Constraints
The Solver (and tweak some parameters if necessary).
For ease of exposition, I have given each range the name of its header. The solver model looks as follows:
It should all be explanatory, except possibly the Selected >= 0 part. The column selected equals the difference between the binary max Quantity of the previous supplier minus the Selected of the current supplier. Selected >= 0 => max Quantity of previous supplier >= Selected of current supplier. Therefore, if the previous supplier does not produce at max quantity (binary = 0), the current supplier cannot produce.
Then we need to make sure that the solver setting are OK:
and solve the problem.
Solution for req = 700 :
As we see, the model tries to avoid procedures 3 and 5 as much as possible, and satisfies the constraint "at least 1 item per process" by picking up exactly 1 item for processes 3 and 5. The objective function value is 11,710.
Solution for req = 1,500 :
Here we need more capacity, but yet process 3 seems expensive and the model tries to avoid it by allocating whatever is necessary (just 1 unit to supplier 1).
I hope this helps. The spreadsheet can be downloaded here. I include the definition of the mathematical model below, in case you would like to transfer it to another language.
MATHEMATICAL FORMULATION
A formal definition of your problem is as follows.
SETS:
PARAMETERS:
Decisions:
Objective:
Constraints:
Constraint explanation:
C1: A supplier cannot produce anything from a process if he has not been allocated to that process.
C2: If a supplier's maximum indicator is set to 1, then the production variable should be the maximum possible.
C3: We cannot select supplier s for process p if we have not produced the max quantity available from the previous supplier s_[-1].
C4: We need to produce at least 1 item from each process.
C5: the total production from all processes and suppliers should equal the required amount.
Looks like you should look at the simplex algorithm (or some existing implementation of it).
Wikipedia has a fairly nice description of the algorithm, http://en.wikipedia.org/wiki/Simplex_algorithm

Calculate the max samples with ramp up

I got this math problem. I am trying to calculate the max amount of samples when the response time is zero. My test has 3 samples (HTTP Request). The total test wait time is 11 seconds. The test is run for 15 minutes and 25 seconds. The ramp up is 25 seconds, this means that for every second 2 users are created till we reach 50.
Normally you have to wait for the server to respond, but I am trying to calculate the max amount of samples (this means response time is zero.) How do i do this. I can't simply do ((15 * 60 + 25) / 11) * 50. Because of the ramp up.
Any ideas?
EDIT:
Maybe I should translate this problem into something generic and not specific to JMeter So consider this (maybe it will make sense to me aswel ;)).
50 people are walking laps around the park. Each lap takes exactly 11 seconds to run. We got 15 minutes and 25 seconds to walk as many as possible laps. We cannot start all at the sametime but we can start 2 every second (25seconds till we are all running). How many laps can we run?
What i end up doing was manually adding it all up...
Since it takes 25s to get up to full speed, only 2 people can walk for 900s and 2 people can walk for 901s and 2 people can walk for 902s all the way to total of 50 people..
Adding that number together should give me my number i think.
If I am doing something wrong or based on wrong conclusion I like to hear your opinion ;). Or if somebody can see a formula.
Thanks in advance
I have no idea about jmeter, but I do understand your question about people running round the park :-).
If you want an exact answer to that question which ignores partial laps round the park, you'll need to do (in C/java terminology) a for loop to work it out. This is because to ignore partial laps it's necessary to round down the number of possible laps, and there isn't a simple formula that's going to take the rounding down into account. Doing that in Excel, I calculate that 4012 complete laps are possible by the 50 people.
However, if you're happy to include partial laps, you just need to work out the total number of seconds available (taking account of the ramp up), then divide by the number of people starting each second, and finally divide by how many seconds it takes to run the lap. The total number of seconds available is an arithmetic progression.
To write down the formula that includes partial laps, some notation is needed:
T = Total number of seconds (i.e. 900, given that there are 15 minutes)
P = number of People (i.e. 50)
S = number of people who can start at the Same time (i.e. 2)
L = time in seconds for a Lap (i.e. 11)
Then the formula for the total number of laps, including partial laps is
Number of Laps = P * (2 * T - (P/S - 1)) / (2*L)
which in this case equals 4036.36.
Assume we're given:
T = total seconds = 925
W = walkers = 50
N = number of walkers that can start together = 2
S = stagger (seconds between starting groups) = 1
L = lap time = 11
G = number of starting groups = ceiling(W/N) = 25
Where all are positive, W and N are integers, and T >= S*(G-1) (i.e. all walkers have a chance to start). I am assuming the first group start walking at time 0, not S seconds later.
We can break up the time into the ramp period:
Ramp laps = summation(integer i, 0 <= i < G, N*S*(G-i-1)/L)
= N*S*G*(G-1)/(2*L)
and the steady state period (once all the walkers have started):
Steady state laps = W * (T - S*(G-1))/L
Adding these two together and simplifying a little, we get:
Laps = ( N*S*G*(G-1)/2 + W*(T-S*(G-1)) ) / L
This works out to be 4150 laps.
There is a closed form solution if you're only interested in full laps. If that's the case, just let me know.

Simple Steering Behaviour: Explain This Line

I am reading the book Programming Game AI by Example, and he gives code for
a steering behaviour which causes the entity to decelerate so that it arrives
gracefully at a target. After calculating dist, the distance from target to
source he then (essentially) does this
double speed = dist/deceleration;
I just cannot understand where this comes from however, am I just missing something
really obvious? It is not listed as a known error in the book so I am guessing it
is correct.
If there was some physical truth to this, the units would have match up on either side.
From what I understand, this is akin to Zeno's paradoxes where you are trying to reach something, but you never get there because you always only travel one nth of the remaining distance.
Suppose
the simulation proceeds at intervals of one second at a time.
deceleration = 5
distance = 1000 meters
With these initial conditions, speed will be set to 200 meters per second. Because the simulation proceeds at intervals of one second, we will travel exactly 200 meters (i.e. one fifth of the remaining distance), and end up at a distance of 800 meters from the target. The new speed is determined to be: 160 meters per second
Here is what happens in the first 30 seconds:
The last 30 seconds:
The last 10 seconds:
Observations
Within the first 30 seconds, we travel roughly 998 meters
Within the first 50 seconds, we cover 999.985 meters
Within the last 10 seconds, we cover only ~1.2cm
As you can see, you get almost there very quickly, but it takes a long time to get close.
Plots by WolframAlpha
Maybe there is something missing in your calculation. For a constant accelaration (or decelleration), and ignoring initial condictions, the speed is
v = a * t
and the distance is
d = a * t^2 / 2
If you eliminate t in both equations you get
v = a * sqrt(2 * d / a)

Resources