In OpenRefine I have a data set and I would like to round each number up to the closest value that can be divided by 5 (divisible by 5).
For example:
1.35 would be 1.50
1.70 would be 2.00
I have looked into the documentation but couldn't figure out how to achieve that.
If your numbers are all integers, you could do:
floor((value+4)/5)*5
If it's possible that you'll get floating point numbers too, you could modify it along the lines of:
floor((value+4.999)/5)*5
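To check the arithmetic outside OpenRefine, here is the same idea as a Python sketch (the helper name is mine, not part of GREL); math.ceil expresses "round up to a multiple" directly, without the +4 / +4.999 fudge terms:

```python
import math

def ceil_to_multiple(value, step):
    # Round `value` up to the nearest multiple of `step`,
    # e.g. step=5 for integers, step=0.5 for the question's examples.
    return math.ceil(value / step) * step
```

With step 0.5 this reproduces the question's examples: 1.35 -> 1.5 and 1.70 -> 2.0.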
Related
I was wondering how I would convert Excel's exclusive percentile rank function to R. I found a technique here, which goes like this:
true_df <- data.frame(some_column= c(24516,7174,13594,33838,40000))
percentilerank <- function(x) {
  rx <- rle(sort(x))
  # count of observations strictly smaller than each distinct value
  smaller <- cumsum(c(0, rx$lengths))[seq_along(rx$lengths)]
  # count of observations strictly larger than each distinct value
  larger <- rev(cumsum(c(0, rev(rx$lengths))))[-1]
  rxpr <- smaller / (smaller + larger)
  rxpr[match(x, rx$values)]
}
dfr<-percentilerank(true_df$some_column)
#output which is similar to =PERCENTRANK.INC and NOT =PERCENTRANK.EXC
#[1] 0.50 0.00 0.25 0.75 1.00
But that is the equivalent of =PERCENTRANK.INC in R. According to the info popup in Excel, =PERCENTRANK.INC takes (array, x, [significance - optional]) and returns the percentage rank inclusive of the first (0%) and last (100%) values in the array.
=PERCENTRANK.EXC is similar to its counterpart, but it returns the percentage rank exclusive of the first and last values in the array, meaning never 0% or 100%.
Here is a small example using Excel to show difference:
When I apply the above R function, it gives me output matching the PERCENTRANK.INC($A$32:$A$36,A32) column. How can I achieve the exclusive version? I'm new to R.
Using dplyr:
library(dplyr)
x <- true_df$some_column
# inclusive: percent_rank computes (rank - 1)/(n - 1)
percent_rank(x)
# exclusive: padding with -Inf and Inf shifts every rank up by one and
# grows the denominator to n + 1, giving rank/(n + 1); then drop the
# two padding results
percent_rank(c(-Inf, Inf, x))[-(1:2)]
I messed around with the code and got this:
true_df <- data.frame(some_column = c(24516, 7174, 13594, 33838, 40000))
percentilerank <- function(x) {
  rx <- rle(sort(x))
  # 1-based rank: count of strictly smaller observations, plus one
  smaller <- cumsum(c(1, rx$lengths))[seq_along(rx$lengths)]
  # n minus the count of strictly smaller observations,
  # so that smaller + larger is always n + 1
  larger <- rev(cumsum(c(0, rev(rx$lengths))))[seq_along(rx$lengths)]
  rxpr <- smaller / (smaller + larger)
  rxpr[match(x, rx$values)]
}
dfr <- percentilerank(true_df$some_column)
# output now matches =PERCENTRANK.EXC
# [1] 0.5000000 0.1666667 0.3333333 0.6666667 0.8333333
Since 0% and 100% are excluded, the exclusive rank is rank/(n + 1) rather than (rank - 1)/(n - 1). Seeding the smaller counts with 1 instead of 0 turns them into ranks, and keeping the full larger counts (dropping the trailing 0 instead of the leading total) makes every denominator n + 1.
This is how to replicate PERCENTRANK.EXC with other native Excel formulas:
= Round(Rank/(N + 1) - 0.05%, 3)
Maybe that will help someone.
The 3 corresponds to the default significance level in PERCENTRANK.EXC; change as needed.
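Putting the thread together: PERCENTRANK.EXC of an exact array member is just its 1-based rank divided by N + 1. A small Python sketch (my own illustration; it handles exact matches only, whereas Excel also interpolates for values between array members):

```python
def percent_rank_exc(values, x):
    # Exclusive percentile rank: 1-based rank of x divided by n + 1,
    # so results stay strictly between 0 and 1.
    rank = sorted(values).index(x) + 1
    return rank / (len(values) + 1)

data = [24516, 7174, 13594, 33838, 40000]
print([round(percent_rank_exc(data, v), 7) for v in data])
# -> [0.5, 0.1666667, 0.3333333, 0.6666667, 0.8333333]
```

The result matches the =PERCENTRANK.EXC output quoted above for the same five values.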
I'm trying to find the value which is the closest to 0.5 but not bigger than that, and I want to print another value from the same row.
For example, I have a table named 'exp' like this:
num possibility
1 0.16
2 0.43
5 0.64
4 0.12
3 0.76
.
.
.
And I'm trying to find: which possibility is the closest to, yet smaller than, 0.5?
The answer is the second row, which contains 'num == 2' and 'possibility == 0.43'.
But how can I find this with code?
I'm also trying to calculate the '+-2' range of the 'num' whose possibility is the closest to and smaller than '0.5'.
The num will surely be '5' and the range will be '3~7'.
But how can I do this all at once with linked code?
And what if I have too many tables, exp1, exp2, exp3, exp4..., to do the same work? How can I do this automatically?
I tried things:
exp[which.min(exp$possibility - 0.5 < 0) - 1, 1]
x < exp[which.min(exp$possibility - 0.5 < 0) - 1, 1] + 2
& x > exp[which.min(exp$possibility - 0.5 < 0) - 1, 1] - 2
This is my best attempt, but I don't know why adding '< 0' inside the 'which.min' call makes a difference, making it function like 'ifelse', or how to find the 'closest smaller one' without putting '- 1' after the 'which.min' result.
Actually, what I most want to know is whether there are simpler and more useful tools.
Please help.
You can try something like this. You can adjust the 3 below to get more or fewer matches. Also, you could put this in a function and use lapply to iterate over all columns.
f <- data.frame(a = 1:10, b = runif(10))
cutoff <- 0.5
z <- f$b - cutoff
z <- ifelse(z > 0, 99, z)  # sentinel: push values above 0.5 out of contention
z <- abs(z)
z1 <- order(z)[1:3]        # indices of the 3 values closest to (and below) the cutoff
f$b[z1]
In your first expression (and similarly for the second one), exp$possibility - 0.5 < 0 produces a boolean vector, so what gets fed into which.min is a bunch of ones and zeros (TRUE and FALSE), and taking the min of those is not what you want.
which possibility is the closest to and smaller than 0.5?
There are many ways to achieve this. One is to first set the values larger than 0.5 to NA, which is what the ifelse does, then find the maximum of the rest with which.max:
exp$possibility[which.max(ifelse(exp$possibility > 0.5, NA, exp$possibility))]
And I'm trying to calculate the '+-2' range of 'num' whose possibility
is the most closest and smaller than '0.5' The num will surely be '5'
and the range will be '3~7'.
You can store the number in a variable first ...
my.num <- exp[which.max(ifelse(exp$possibility > 0.5, NA, exp$possibility)), "num"]
... and subsequently retrieve the rows in that range by
exp[exp$num >= (my.num - 2) & exp$num <= (my.num + 2), ]
or replace my.num with the first expression if you really want a one-liner.
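The same selection logic can be sketched in plain Python for clarity (the names are mine): filter to the rows below the threshold, then take the maximum.

```python
def closest_below(rows, threshold=0.5):
    # rows: (num, possibility) pairs; return the row whose possibility
    # is the largest value still below the threshold.
    candidates = [(num, p) for num, p in rows if p < threshold]
    return max(candidates, key=lambda row: row[1])

exp = [(1, 0.16), (2, 0.43), (5, 0.64), (4, 0.12), (3, 0.76)]
num, p = closest_below(exp)
print(num, p)               # -> 2 0.43
window = (num - 2, num + 2) # the question's +-2 range around num
```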
I have two vectors: one containing the means of my list, one the standard deviations.
I want both numbers rounded to the same spot, and the rounding should depend on the standard deviation: round to two significant figures of it if those are 24 or below, and to one if they would be 25 or bigger.
Here are examples, as this is really confusing:
2.30344 0.01223 -> 2.303 0.012
304.57231 1.35234 -> 304.6 1.3
204.43953 3.35234 -> 204 3
I know of the round function, where I can specify the digits, which I would have to apply to both. I also know of the signif function, where I could request two digits, but how can I check whether the first two digits are then smaller than 25? And how can I figure out to which digit signif decided to round?
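One way to implement the stated rule, sketched in Python (the helper is my own; "first two digits" is read as the first two significant digits of the standard deviation):

```python
import math

def round_pair(mean, sd):
    # Decide precision from the standard deviation: two significant
    # figures if its first two digits are <= 24, otherwise one.
    exponent = math.floor(math.log10(abs(sd)))
    first_two = int(abs(sd) / 10 ** (exponent - 1))  # e.g. 0.01223 -> 12
    digits = 2 if first_two <= 24 else 1
    # Decimal place that keeps `digits` significant figures of sd,
    # applied to mean and sd alike so both stop at the same spot.
    decimals = digits - 1 - exponent
    return round(mean, decimals), round(sd, decimals)

print(round_pair(2.30344, 0.01223))    # -> (2.303, 0.012)
print(round_pair(204.43953, 3.35234))  # -> (204.0, 3.0)
```

Borderline halves may differ slightly from hand-rounding because of float representation and round-half-to-even behavior.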
I have found quite a lot of answers about rounding decimals, but I need a simple formula for LibreOffice Calc that lets me draw up estimates ending in the nearest 7: for example, a quote of 1372 should be rounded down to 1367, while 1375 becomes 1377. What would be a really simple formula that does not involve coding or macros?
For now, the solution I found is this one:
=(ROUND(I25/10)+0,7)*10
The problem with this is that it does not round to the nearest 7 but always up to 7, so for example 362,00 becomes 367,00 and not 357,00 as intended.
Edit: this resolves the issue above; hope this helps:
=(ROUND((I25-10)/10)+0,7)*10
Subtracting 10 from the total I25 before the ROUND corrects the result: 362 becomes 35,2, which rounds to 35, plus 0,7 gives 35,7, and finally 357 as intended. For upper values, say 365, rounding 35,5 gives 36, plus 0,7 we get 367,00, again the nearest 7 as intended!
You could use the default ROUND() function while "shifting" the values by 3:
=ROUND(A1+3;-1)-3
In other words: add your offset "3" to the initial value, ROUND() it to the nearest multiple of ten, and subtract the offset again.
But this will round 1372 to 1377, since it's a "shifted" 1375 which will round up to 1380 (see Matthew Strawbridge's comment to the question).
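The shifted ROUND() trick translates directly; here it is as a Python sketch for checking (rounding halves away from zero like Calc's ROUND(), rather than Python's round-half-to-even):

```python
import math

def nearest_ending_in_7(x):
    # Shift by 3 so multiples of 10 line up with numbers ending in 7,
    # round to the nearest ten (halves up, like Calc's ROUND), shift back.
    return math.floor((x + 3) / 10 + 0.5) * 10 - 3

print(nearest_ending_in_7(1372))  # -> 1377 (the "shifted 1375" case)
print(nearest_ending_in_7(1364))  # -> 1367
```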
This question already has answers here:
Round up from .5
(7 answers)
Closed 6 years ago.
It seems there is an error in the round function. Below I would expect it to return 3, but it returns 2.
round(2.5)
# 2
Other values, such as 1.5 and 3.5, return 2 and 4 as we expect.
Any explanation?
This behaviour is explained in the help file of the ?round function:
Note that for rounding off a 5, the IEC 60559 standard is expected to
be used, ‘go to the even digit’. Therefore round(0.5) is 0 and
round(-1.5) is -2. However, this is dependent on OS services and on
representation error (since e.g. 0.15 is not represented exactly, the
rounding rule applies to the represented number and not to the printed
number, and so round(0.15, 1) could be either 0.1 or 0.2).
round( .5 + 0:10 )
#### [1] 0 2 2 4 4 6 6 8 8 10 10
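The same rule is easy to reproduce outside R; Python's built-in round() also follows IEC 60559 round-half-to-even, so it produces the identical pattern:

```python
# Each half rounds to the nearest even integer, matching R's
# round(.5 + 0:10) output above.
halves = [n + 0.5 for n in range(11)]
print([round(v) for v in halves])
# -> [0, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10]
```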
Another relevant email exchange by Greg Snow: R: round(1.5) = round(2.5) = 2?:
The logic behind the round-to-even rule is that we are trying to represent an underlying continuous value and, if x comes from a truly continuous distribution, then the probability that x == 2.5 is 0, and the 2.5 was probably already rounded once from any value between 2.45 and 2.54999999999999... If we use the round-up-on-0.5 rule that we learned in grade school, then the double rounding means that values between 2.45 and 2.50 will all round to 3 (having been rounded first to 2.5). This will tend to bias estimates upwards.
To remove the bias we need to either go back to before the rounding to 2.5 (which is often impossible or impractical), or just round up half the time and round down half the time (or better would be to round proportionally to how likely we are to see values below or above 2.5 rounded to 2.5, but that will be close to 50/50 for most underlying distributions). The stochastic approach would be to have the round function randomly choose which way to round, but deterministic types are not comfortable with that, so "round to even" was chosen (round to odd should work about the same) as a consistent rule that rounds up and down about 50/50.
If you are dealing with data where 2.5 is likely to represent an exact
value (money for example), then you may do better by multiplying all
values by 10 or 100 and working in integers, then converting back only
for the final printing. Note that 2.50000001 rounds to 3, so if you
keep more digits of accuracy until the final printing, then rounding
will go in the expected direction, or you can add 0.000000001 (or
other small number) to your values just before rounding, but that can
bias your estimates upwards.
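The scaling workaround from the email can be sketched as a small helper (my own function, with the same caveat: it is only reliable when the scaled value is exactly representable, e.g. money in cents):

```python
import math

def round_half_up(x, digits=0):
    # Grade-school rounding: halves always go up, unlike round-half-even.
    m = 10 ** digits
    return math.floor(x * m + 0.5) / m

print(round_half_up(2.5))  # -> 3.0, where R's round(2.5) gives 2
print(round_half_up(5.5))  # -> 6.0
```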
When I was in college, a professor of Numerical Analysis told us that the rounding rule you describe is the correct one. You shouldn't always round (integer).5 up, because it is equally distant from (integer) and (integer + 1). To minimize the error of the sum (or of the average, or whatever), half of those cases should be rounded up and the other half rounded down. The R programmers seem to share my professor's opinion.