This page shows the official example of the roll-period function.
What function is used? (Given N as the roll period.)
With a simple moving average, the first N values should be NA, but they are not.
There are 99 values, so I thought that setting the roll period to 99 would give a straight line, but it does not.
When I set it to 50 or 60, it seems that only the values after the first 50 change.
Does anyone know what the function is, or how I can find out?
It's a trailing average.
So if there are 100 values and you set 50 as the roll period, then:
the first value will just be the first value
the second will be the average of the first two
...
the 50th will be the average of the first 50
the 51st will be the average of values 2..51
...
the 100th will be the average of values 51..100
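That logic can be sketched as follows (a minimal illustration; `trailing_average` is a hypothetical name, not dygraphs' actual implementation):

```python
def trailing_average(values, roll_period):
    """Average each value with up to roll_period - 1 preceding values;
    the window is simply truncated at the start of the series."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - roll_period + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

data = list(range(10, 110, 10))  # 10, 20, ..., 100
print(trailing_average(data, 5))
```

With a roll period equal to the series length, only the very last point is the average of all values, which is why the plot never becomes a straight line.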
I have, let's say, 60 empirical realizations of PPR. My goal is to create a PPR vector containing average values of the empirical PPR. These averages depend on which upper and lower limits of TTM I take: I can take TTM from 60 to 1, calculate one average, and put that single number in rows 1 to 60 of the vector; or I can calculate the average of PPR for TTM between 60 and 31 and separately for TTM between 30 and 1, and put those two numbers in the vector according to the TTM values. Finally, I want to obtain something like this on a chart (the x-axis is TTM, the green line is my empirical PPR, and the black line is the average based on significant changes over TTM). I want to write an algorithm that finds the TTM thresholds which fit the black line to the green line as well as possible.
TTM   PPR
60    0.20%
59    0.16%
58    0.33%
57    0.58%
56    0.41%
...
10    1.15%
9     0.96%
8     0.88%
7     0.32%
6     0.16%
Can you please help me if you know any statistical method applicable in this case, or a basic idea for an algorithm I could implement in VBA/R?
I have used Solver with GRG Nonlinear** to deal with it, but I believe there is something more appropriate.
** With Solver the problem was that it found an "optimal" solution, but when I re-ran Solver it found a new solution (with slightly different TTM values) whose target-function value was lower than the first time (so was the first solution really optimal?).
I think this is what you want. The next step would be adding a method that can recognize the break points. You would need to define two new parameters: one for the sensitivity, and one for the minimum number of points a segment must contain to be accepted as a section (between two break points, including the start and end points).
Please hit the checkmark next to this answer if you are happy with it.
You can download the Excel file from here:
http://www.filedropper.com/statisticspatternchange
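The underlying idea can be sketched as a brute-force search for the breakpoints that minimise the squared error of a piecewise-constant fit (illustrative Python, not the contents of the Excel file; `sse_for_breaks` and `best_breaks` are hypothetical names):

```python
import itertools

def sse_for_breaks(ppr, breaks):
    """Sum of squared errors when ppr is approximated by the mean
    of each segment defined by the break indices."""
    edges = [0] + list(breaks) + [len(ppr)]
    sse = 0.0
    for lo, hi in zip(edges, edges[1:]):
        seg = ppr[lo:hi]
        mean = sum(seg) / len(seg)
        sse += sum((x - mean) ** 2 for x in seg)
    return sse

def best_breaks(ppr, n_breaks, min_len=2):
    """Exhaustively try all breakpoint positions (fine for ~60 points).
    min_len plays the role of the minimum-points-per-section parameter."""
    candidates = range(min_len, len(ppr) - min_len + 1)
    return min(itertools.combinations(candidates, n_breaks),
               key=lambda b: sse_for_breaks(ppr, b))

ppr = [0.20, 0.16, 0.33, 0.58, 0.41, 1.15, 0.96, 0.88, 0.32, 0.16]
print(best_breaks(ppr, 1))
```

For 60 observations and one or two breakpoints the exhaustive search is cheap, and unlike a GRG Nonlinear run it cannot get stuck in a local optimum.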
I have googled and keep ending up with formulas which are too slow. I suspect if I split the formula in steps (creating calculated columns), I might see some performance gain.
I have a table having some numeric columns along with some which would end up as slicers. The intention is to have 10th, 25th, 50th, 75th and 90th percentile over some numeric columns for the selected slicer.
This is what I have for the 10th Percentile over the column "Total Pd".
TotalPaid10thPercentile :=
MINX(
    FILTER(
        VALUES(ClaimOutcomes[Total Pd]),
        CALCULATE(
            COUNTROWS(ClaimOutcomes),
            ClaimOutcomes[Total Pd] <= EARLIER(ClaimOutcomes[Total Pd])
        ) > COUNTROWS(ClaimOutcomes) * 0.1
    ),
    ClaimOutcomes[Total Pd]
)
It takes several minutes and still no data shows up. I have around 300K records in this table.
I figured out a way to break the calculation down into a series of steps, which yielded a pretty fast solution.
For calculating the 10th percentile on Amount Paid in the table Data, I followed the textbook formula below:
1. Calculate the ordinal rank of the 10th-percentile element:
10ptOrdinalRank:=0.10*(COUNTX('Data', [Amount Paid]) - 1) + 1
This may come out as a decimal (fractional) number, such as 112.45.
2. Compute the decimal part:
10ptDecPart:=[10ptOrdinalRank] - TRUNC([10ptOrdinalRank])
3. Compute the ordinal rank of the element just below (floor):
10ptFloorElementRank:=FLOOR([10ptOrdinalRank], 1)
4. Compute the ordinal rank of the element just above (ceiling):
10ptCeilingElementRank:=CEILING([10ptOrdinalRank], 1)
5. Compute the element corresponding to the floor rank:
10ptFloorElement:=MAXX(TOPN([10ptFloorElementRank], 'Data', [Amount Paid], 1), [Amount Paid])
6. Compute the element corresponding to the ceiling rank:
10ptCeilingElement:=MAXX(TOPN([10ptCeilingElementRank], 'Data', [Amount Paid], 1), [Amount Paid])
7. Interpolate to get the percentile value:
10thPercValue:=[10ptFloorElement] + [10ptDecPart]*([10ptCeilingElement]-[10ptFloorElement])
I found the performance to be much faster than some other solutions I found on the net. I hope it helps someone in the future.
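For reference, the same rank-and-interpolate scheme can be sketched outside DAX (illustrative Python; `percentile_interp` is a hypothetical name):

```python
import math

def percentile_interp(values, p):
    """Linear interpolation between closest ranks, mirroring the stepwise
    DAX measures: ordinal rank = p * (n - 1) + 1, 1-based."""
    ordered = sorted(values)
    rank = p * (len(ordered) - 1) + 1          # may be fractional, e.g. 112.45
    floor_rank = math.floor(rank)
    frac = rank - floor_rank
    floor_elem = ordered[floor_rank - 1]       # 1-based rank -> 0-based index
    ceil_elem = ordered[min(floor_rank, len(ordered) - 1)]
    return floor_elem + frac * (ceil_elem - floor_elem)

print(percentile_interp([15, 20, 35, 40, 50], 0.10))  # -> 17.0
```

This matches the usual "linear" percentile definition, so the stepwise DAX measures can be checked against it on a small sample.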
I think I am mostly there, but I can't figure out the remaining piece of how to code this properly.
I begin with a single column of 15 values. I want to create two new columns with the 'previous' containing the average of the previous two values, and the 'future' creating the average of the next two values.
My code is failing because it is INCLUSIVE of the current row's values.
For example, row 3 (value 30) should have a 'previous' value of 15 ((10+20)/2) and a 'future' value of 45 ((40+50)/2). Instead it returns 25 and 35, because the 30 is being included with the 20 or the 40 when the averages are computed.
I am also stuck on how to just display the previous value.
Can anyone tell me how to avoid this problem?
I am using filter, but I don't know whether there is a better way to do this.
values=data.frame(col=c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150))
values$previous2 = filter(values$col, rep(1/2,2),sides=1)
values$future2 = filter(values$col, rep(1/2,2),sides=2)
values$last = #should be the previous value - ex 2nd row should be 10
values
For returning the previous value, try:
values$last = c(NA, values[-nrow(values), 1])
or the lag function could be used as well, I believe.
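For comparison, the exclusive previous/next averages can be sketched in Python (illustrative only; `neighbor_averages` is a hypothetical helper, with None standing in for R's NA at the edges):

```python
def neighbor_averages(col):
    """previous2[i] = mean of the two values before i,
    future2[i] = mean of the two values after i,
    last[i] = the single previous value; None where the window is incomplete."""
    n = len(col)
    previous2 = [(col[i - 2] + col[i - 1]) / 2 if i >= 2 else None for i in range(n)]
    future2 = [(col[i + 1] + col[i + 2]) / 2 if i + 2 < n else None for i in range(n)]
    last = [col[i - 1] if i >= 1 else None for i in range(n)]
    return previous2, future2, last

col = list(range(10, 160, 10))  # 10, 20, ..., 150
previous2, future2, last = neighbor_averages(col)
print(previous2[2], future2[2], last[1])  # -> 15.0 45.0 10
```

The key point is that each window is shifted off the current index entirely, which is what `sides=1`/`sides=2` in `stats::filter` does not do.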
I would like to unit test the time-writing software used at my company. To do this, I want to create sets of random numbers that add up to a defined value.
I want to be able to control these parameters:
The min and max value of each generated number
The count n of generated numbers
The sum of the generated numbers
For example, in 250 days a person worked 2000 hours. The 2000 hours have to be randomly distributed over the 250 days. The maximum time spent per day is 9 hours and the minimum is 0.25.
I worked my way through this SO question and found the method
diff(c(0, sort(runif(249)), 2000))
This results in 1 big number and 249 small numbers (runif draws from 0..1 by default, so the final difference is huge). That's why I would like to be able to set the min and max of the generated numbers, but I don't know where to start.
You will have no problem meeting any two out of your three constraints, but all three might be a problem. As you note, the standard way to generate N random numbers that add to a sum is to generate N-1 random numbers in the range of 0..sum, sort them, and take the differences. This is basically treating your sum as a number line, choosing N-1 random points, and your numbers are the segments between the points.
But this might not be compatible with constraints on the numbers themselves. For example, what if you want 10 numbers that add to 1000, but each has to be less than 100? That won't work. Even if you have ranges that are mathematically possible, forcing compliance with all the constraints might mean sacrificing uniformity or other desirable properties.
I suspect the only way to do this is to keep the sum constraint, the N constraint, do the standard N-1, sort, and diff thing, but restrict the resolution of the individual randoms to your desired minimum (in other words, instead of 0..100, maybe generate 0..10 times 10).
Or, instead of generating N-1 uniformly random points along the line, generate a random sample of points along the line within a similar low-resolution constraint.
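One further sketch, using a different technique from the low-resolution idea above: start from an equal split (which satisfies the sum exactly) and randomly redistribute mass between slots while respecting the bounds (illustrative Python; no claim of uniformity, as the answer warns):

```python
import random

def constrained_sum(n, total, lo, hi, shuffles=10000):
    """Generate n numbers in [lo, hi] that add up to total.
    Starts feasible, then moves random amounts between random pairs;
    the sum is preserved by construction at every step."""
    if not n * lo <= total <= n * hi:
        raise ValueError("infeasible constraints")
    xs = [total / n] * n
    for _ in range(shuffles):
        i, j = random.sample(range(n), 2)
        # Largest amount movable from xs[i] to xs[j] without breaking bounds.
        room = min(xs[i] - lo, hi - xs[j])
        delta = random.uniform(0, room)
        xs[i] -= delta
        xs[j] += delta
    return xs

days = constrained_sum(250, 2000, 0.25, 9)
print(len(days), round(sum(days), 6))  # -> 250 2000.0
```

This trades statistical elegance for robustness: all three constraints hold exactly, which is usually what a unit test needs.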
I have an object in "R" called p_int. This is a list of 1599 peak intensity numbers.
Within every 8 values of this list is a monoisotopic peak. This peak is the most abundant (largest peak value) compared to the other 7 peaks.
Therefore what I'd like to do is write a loop which processes p_int in batches of 8.
So it will take the first 8 values, find the largest value and add this to a new object called "m_iso".
It will then continue, looking at values 9-16, 17-24, 25-32 etc.
Any advice or code in helping me achieve such a loop would be greatly appreciated.
Thanks,
Stephen.
By 1599 do you actually mean 1600? Because 1599 is not evenly divisible by 8. I'm going to assume this is true and offer the following:
m_iso <- sapply(split(p_int,rep(1:200,each=8)),max)
Or:
m_iso <- apply(matrix(p_int,nrow=8),2,max)
This will give you a vector of maximum values for each set of eight observations.
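The same batching idea in Python, under the same assumption that the length is a multiple of eight (`batch_max` is a hypothetical name):

```python
def batch_max(values, batch_size=8):
    """Maximum of each consecutive, non-overlapping batch of batch_size values."""
    assert len(values) % batch_size == 0, "length must be a multiple of batch_size"
    return [max(values[i:i + batch_size])
            for i in range(0, len(values), batch_size)]

p_int = [1, 9, 3, 2, 5, 4, 8, 7,   6, 2, 11, 3, 1, 0, 4, 5]
print(batch_max(p_int))  # -> [9, 11]
```

Like the matrix approach in R, this avoids an explicit loop index over 9-16, 17-24, and so on.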