What I am basically trying to do is find the top 10:
df_priceusd[order(df_priceusd$pop,decreasing=T)[1:10],]
Now I want to take the prices of those top 10 and calculate their mean, so I get one mean value for all 10.
Can I somehow incorporate this:
mean(df_priceusd$mean_priceusd)
into my other code?
Or should I approach this from another angle?
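For example, a minimal way to combine the two snippets (assuming the mean_priceusd column holds the prices):
# subset to the 10 most popular rows, then average their prices
top10 <- df_priceusd[order(df_priceusd$pop, decreasing = TRUE)[1:10], ]
mean(top10$mean_priceusd)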
I have, let's say, 60 empirical realizations of PPR. My goal is to create a PPR vector holding average values of the empirical PPR. These average values depend on which upper and lower limits of TTM I take: I can take TTM from 60 down to 1, calculate one average, and put that single number into rows 1 to 60 of the PPR vector; or I can calculate one average of PPR for TTM from 60 down to 31 and another for TTM from 30 down to 1, and put these two numbers into my vector according to the TTM values. Finally, I want to obtain something like this on a chart (the x-axis is TTM, the green line is my empirical PPR, and the black line is the average based on significant changes over TTM). I want to write an algorithm that helps me find the TTM thresholds for which the black line best fits the green line.
TTM PPR
60 0,20%
59 0,16%
58 0,33%
57 0,58%
56 0,41%
...
10 1,15%
9 0,96%
8 0,88%
7 0,32%
6 0,16%
Can you please suggest any statistical method that might be applicable in this case, or a basic idea for an algorithm that I could implement in VBA/R?
I have used Solver with GRG Nonlinear** to deal with this, but I believe there is something more suitable.
** With Solver the problem was that it found an optimal solution - fine, but when I re-ran Solver it found a new solution (with slightly different TTM values) whose target-function value was lower than the first time (so was the first solution really optimal?)
I think this is what you want. The next step would be to include a method that can recognize the break points. You will need to define two new parameters: one for the sensitivity and one for the minimum number of points a sample must contain to be accepted as a section (between two break points, including the start and end points).
You can download the Excel file from here:
http://www.filedropper.com/statisticspatternchange
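If you would rather start in R, one possible approach (a sketch, not necessarily what the Excel file does) is to fit a regression tree with TTM as the only predictor: each split is a TTM threshold, and each leaf mean is one level of the piecewise-constant black line. A data frame d with numeric columns TTM and PPR is assumed.
library(rpart)
d <- d[order(d$TTM), ]  # keep TTM ordered so the lines draw cleanly
# minbucket plays the role of the minimum points per section, cp the sensitivity
fit <- rpart(PPR ~ TTM, data = d,
             control = rpart.control(minbucket = 5, cp = 0.01))
d$avg <- predict(fit, d)  # piecewise-constant average PPR per TTM band
plot(d$TTM, d$PPR, type = "l", col = "green", xlab = "TTM", ylab = "PPR")
lines(d$TTM, d$avg, col = "black")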
On this page there is the official example of the roll periods function.
What is the function used (given a roll period N)?
With a simple moving average, the first N values should be NA, but they are not.
There are 99 values, so if I set 99 as the roll period, I thought I would get a straight line, but I do not.
When I set 50 and 60, it seems that only the values after the first 50 change.
Does anyone know the function, or how I can find it?
It's a trailing average.
So if there are 100 values and you set 50 as the roll period, then:
the first value will just be the first value
the second will be the average of the first two
...
the 50th will be the average of the first 50
the 51st will be the average of values 2..51
...
the 100th will be the average of values 51..100
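In R terms, a minimal sketch of that trailing average (x is the numeric series, N the roll period), based only on the behaviour described above:
trailing_avg <- function(x, N) {
  # average of the last N values seen so far (fewer at the start)
  sapply(seq_along(x), function(i) mean(x[max(1, i - N + 1):i]))
}
# trailing_avg(x, 50)[50]  is mean(x[1:50])
# trailing_avg(x, 50)[100] is mean(x[51:100])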
I would like to unit test the time-writing software used at my company. To do this, I need to create sets of random numbers that add up to a defined value.
I want to be able to control the following parameters:
The min and max value of each generated number
The number n of generated numbers
The sum of the generated numbers
For example, in 250 days a person worked 2000 hours. The 2000 hours have to be randomly distributed over the 250 days. The maximum time spent per day is 9 hours and the minimum is 0.25 hours.
I worked my way through this SO question and found the method
diff(c(0, sort(runif(249)), 2000))
This results in 1 big number and 249 small numbers. That's why I would like to be able to set a min and max for the generated numbers, but I don't know where to start.
You will have no problem meeting any two out of your three constraints, but all three might be a problem. As you note, the standard way to generate N random numbers that add to a sum is to generate N-1 random numbers in the range of 0..sum, sort them, and take the differences. This is basically treating your sum as a number line, choosing N-1 random points, and your numbers are the segments between the points.
But this might not be compatible with constraints on the numbers themselves. For example, what if you want 10 numbers that add to 1000, but each has to be less than 100? That won't work. Even if you have ranges that are mathematically possible, forcing compliance with all the constraints might mean sacrificing uniformity or other desirable properties.
I suspect the only way to do this is to keep the sum constraint, the N constraint, do the standard N-1, sort, and diff thing, but restrict the resolution of the individual randoms to your desired minimum (in other words, instead of 0..100, maybe generate 0..10 times 10).
Or, instead of generating N-1 uniformly random points along the line, generate a random sample of points along the line within a similar low-resolution constraint.
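If it helps, here is one rough sketch in R that keeps all three constraints by starting every value at the exact mean and then shifting amounts between random pairs within the bounds. As noted above, this sacrifices uniformity; the function name and step count are placeholders.
rand_fixed_sum <- function(n, total, lo, hi, steps = 20 * n) {
  stopifnot(total >= n * lo, total <= n * hi)  # otherwise no solution exists
  x <- rep(total / n, n)                       # start at the exact mean
  for (k in seq_len(steps)) {
    ij <- sample(n, 2)                         # pick two entries
    room <- min(x[ij[1]] - lo, hi - x[ij[2]])  # largest shift keeping both in [lo, hi]
    delta <- runif(1, 0, room)
    x[ij[1]] <- x[ij[1]] - delta               # the sum is unchanged
    x[ij[2]] <- x[ij[2]] + delta
  }
  x
}
# hours <- rand_fixed_sum(250, 2000, 0.25, 9); sum(hours)  # 2000 (up to floating point)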
I have a number of fishing boat tracks, and I'm trying to detect a certain pattern in their movement using R. In doing so I have reached a point where I have discarded all points of the track where the desired pattern is not occurring within a given time window, and I'm left with the remaining georeferenced points. These points have a score value associated, which measures the 'intensity' of the desired pattern.
track_1[1:10,]:
LAT LON SCORE
1 32.34855 -35.49264 80.67
2 31.54764 -35.58691 18.14
3 31.38293 -35.25243 46.70
4 31.21447 -35.25830 22.65
5 30.76365 -35.38881 11.93
6 30.75872 -35.54733 22.97
7 30.60261 -35.95472 35.98
8 30.62818 -36.27024 31.09
9 31.35912 -35.73573 14.97
10 31.15218 -36.38027 37.60
The code below produces the same data:
track_1 <- data.frame(
  LAT   = c(32.34855, 31.54764, 31.38293, 31.21447, 30.76365,
            30.75872, 30.60261, 30.62818, 31.35912, 31.15218),
  LON   = c(-35.49264, -35.58691, -35.25243, -35.25830, -35.38881,
            -35.54733, -35.95472, -36.27024, -35.73573, -36.38027),
  SCORE = c(80.67, 18.14, 46.70, 22.65, 11.93,
            22.97, 35.98, 31.09, 14.97, 37.60))
Because some of these points occur geographically close to each other I need to 'pool' their scores together. Hence, I now need a way to throw this data into some kind of a spatial grid and cumulatively sum the scores of all points that fall in the same cell of the grid. This would allow me to find in what areas a given fishing boat exhibits the pattern I'm after the most (and this is not just about time spent in one place). Ultimately, the preferred output would contain lat and lon for every grid cell (center), and the sum of all scores on each cell. In addition, I would also like to be able to adjust the sizing of the grid cells.
I've looked around and all I can find either does not preserve the georeferenced information, is very inefficient, or performs binning of data. There may already be some answers out there, but it might be the case that I'm not able to recognize them since I'm a bit out of my league on this stuff. Can someone please point me to some direction (package, function, etc.)? Any guidance will be greatly appreciated.
Take your lat/lon coordinates, and multiply them by the inverse of your desired grid cell edge lengths, measured in degrees. The result will be a pair of floating point numbers whose integer part identifies the grid cell in question. Take the floor of these and you have two numbers describing the cell, which you could paste to form a single string. You may add that as a new factor column of your data frame. Then you can perform operations based on that factor, like summarizing values.
Example:
latScale <- 2 # one cell for every 0.5 degrees of latitude
lonScale <- 2 # likewise for longitude
# label each point with the grid cell it falls into
track_1$cell <- factor(with(track_1,
    paste(floor(LAT*latScale), floor(LON*lonScale), sep='.')))
# one row per cell: mean position of its points and the summed score
library(plyr)
ddply(track_1, .(cell), summarize,
    LAT=mean(LAT), LON=mean(LON), SCORE=sum(SCORE))
If you want to, you can use weighted.mean instead of mean. If you don't like these factors, you can put more effort in making them nice (e.g. by using compass directions instead of signs), or drop them altogether and use a pair of integer columns instead.
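For instance, the weighted.mean variant mentioned above would just swap the position summary and leave everything else unchanged:
ddply(track_1, .(cell), summarize,
    LAT=weighted.mean(LAT, SCORE), LON=weighted.mean(LON, SCORE), SCORE=sum(SCORE))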
Sorry, I am very weak at using R but very interested in it!
Description of my data: I have raw data collected from a lattice design (4 reps, 44 blocks, 5 plots per block). 220 entries were used; they are classified into three groups (FS = 200 entries, PC = 6 entries, and TC = 14 entries).
I would like to get the simple mean and the mean square of each group (FS, PC and TC), as well as the mean square of the error.
Looking forward to your kind help,
Thanks
I think you could go a long way with the aggregate function, like
aggregate(Data$Values, list(Data$Groups), FUN=mean)
for your mean etc.
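If you also need the mean squares, one common route (a sketch, assuming the same Values and Groups column names as above) is a one-way ANOVA; a full lattice analysis would also include replicate and block terms.
fit <- aov(Values ~ Groups, data = Data)
anova(fit)  # the "Mean Sq" column gives the group and error (residual) mean squares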