What explains the 1 decimal place rounding of x.x5 in R?

I'm looking for an explanation of how 1 decimal place rounding works for a sequence like this in R:
seq(1.05, 2.95, by = .1)
In high school I was taught to round .5 up, i.e. 2.05 becomes 2.1. But when rounding to 1 decimal place, R rounds 2.05 to 2.
Round up from .5
The following rounding function, from the Stack Overflow answer linked above, consistently produces the high-school rounding:
round2 <- function(x, n) {
  posneg <- sign(x)     # remember the sign
  z <- abs(x) * 10^n    # shift the target digit to the left of the decimal point
  z <- z + 0.5          # add 0.5 so that truncation rounds halves up
  z <- trunc(z)
  z <- z / 10^n         # shift back
  z * posneg            # restore the sign
}
This code compares R's rounding with the function above:
data.frame(
  Number        = seq(1.05, 2.95, by = .1),
  Popular.Round = round2(seq(1.05, 2.95, by = .1), 1),
  R.Round       = round(seq(1.05, 2.95, by = .1), 1)
)
With R rounding, 1.05 is rounded up to 1.1 whereas 2.05 is rounded down to 2.0. Likewise, 1.95 is rounded up to 2.0 and 2.95 is rounded up to 3.0.
If the rule is "round to even", why does 2.95 round up to 3, an odd number?
Is there a better response than "just deal with it" when asked about this behavior?

Too long to read? Scroll down for the short answer.
This was an interesting study for me personally. According to the documentation for ?round:
Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE
754’) is expected to be used, ‘go to the even digit’. Therefore
round(0.5) is 0 and round(-1.5) is -2. However, this is dependent on
OS services and on representation error (since e.g. 0.15 is not
represented exactly, the rounding rule applies to the represented
number and not to the printed number, and so round(0.15, 1) could be
either 0.1 or 0.2).
Rounding to a negative number of digits means rounding to a power of
ten, so for example round(x, digits = -2) rounds to the nearest
hundred.
For signif the recognized values of digits are 1...22, and non-missing
values are rounded to the nearest integer in that range. Complex
numbers are rounded to retain the specified number of digits in the
larger of the components. Each element of the vector is rounded
individually, unlike printing.
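You can see the representation-error point from that note directly by printing 0.15 with more digits. The output below is from a typical IEEE 754 double-precision platform; the exact digits and the result of round(0.15, 1) may differ elsewhere.
sprintf("%.20f", 0.15)
# [1] "0.14999999999999999445"
round(0.15, 1)
# [1] 0.1   (here, because the stored value is slightly below 0.15)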
Firstly, you asked: "If it is round to even, why is it 3, i.e. an odd number?" To be clear, the round-to-even rule applies when rounding off a 5, i.e. when the value is exactly halfway between the two candidates. If you run round(2.5) or round(3.5), R returns 2 and 4, respectively.
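For instance, at the integer level (these are the documented values; x.5 with a small integer part is exactly representable, so representation error does not interfere):
round(0.5)    # 0  -- the tie goes to the even digit
round(1.5)    # 2
round(2.5)    # 2
round(3.5)    # 4
round(-1.5)   # -2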
If you go here, https://stat.ethz.ch/pipermail/r-help/2008-June/164927.html, then you see this response:
The logic behind the round-to-even rule is that we are trying to
represent an underlying continuous value, and if x comes from a truly
continuous distribution then the probability that x == 2.5 is 0; the
2.5 was probably already rounded once from some value between 2.45 and
2.54999999999999... If we use the round-up-on-0.5 rule that we learned
in grade school, then the double rounding means that values between
2.45 and 2.50 will all round to 3 (having been rounded first to 2.5).
This will tend to bias estimates upwards. To remove the bias we need
to either go back to before the rounding to 2.5 (which is often
impossible or impractical), or just round up half the time and round
down half the time (or better, round proportionally to how likely we
are to see values below or above 2.5 rounded to 2.5, but that will be
close to 50/50 for most underlying distributions). The stochastic
approach would be to have the round function randomly choose which way
to round, but deterministic types are not comfortable with that, so
"round to even" was chosen (round to odd should work about the same)
as a consistent rule that rounds up and down about 50/50.
If you are dealing with data where 2.5 is likely to represent an exact
value (money for example), then you may do better by multiplying all
values by 10 or 100 and working in integers, then converting back only
for the final printing. Note that 2.50000001 rounds to 3, so if you
keep more digits of accuracy until the final printing, then rounding
will go in the expected direction, or you can add 0.000000001 (or
other small number) to your values just before rounding, but that can
bias your estimates upwards.
Short answer: if you always round 5s upward, your rounded data are biased upward. If you round to even instead, the rounded data stay balanced on average.
Let's test this using your data:
round2 <- function(x, n) {
  posneg <- sign(x)
  z <- abs(x) * 10^n
  z <- z + 0.5
  z <- trunc(z)
  z <- z / 10^n
  z * posneg
}
x <- data.frame(
  Number        = seq(1.05, 2.95, by = .1),
  Popular.Round = round2(seq(1.05, 2.95, by = .1), 1),
  R.Round       = round(seq(1.05, 2.95, by = .1), 1)
)
> mean(x$Popular.Round)
[1] 2.05
> mean(x$R.Round)
[1] 2.02
Using a bigger sample:
x <- data.frame(
  Number        = seq(1.05, 6000, by = .1),
  Popular.Round = round2(seq(1.05, 6000, by = .1), 1),
  R.Round       = round(seq(1.05, 6000, by = .1), 1)
)
> mean(x$Popular.Round)
[1] 3000.55
> mean(x$R.Round)
[1] 3000.537
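To isolate the bias without floating-point representation getting in the way, here is a minimal sketch comparing the two rules on exact halves (values of the form k + 0.5 with small k are exactly representable in binary):
halves <- seq(0.5, 99.5, by = 1)            # 0.5, 1.5, 2.5, ..., 99.5
mean(floor(halves + 0.5)) - mean(halves)    # round half up: bias of +0.5
mean(round(halves)) - mean(halves)          # round half to even: bias of 0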

Related

Is there an R function to round UP a number one decimal point?

I have a dataset of numbers with several significant figures. For each number I need both the value rounded up to the nearest tenth and the value rounded down to the nearest tenth.
Here is an example with some random numbers similar to the ones I'm working with:
x <- c(0.987, 1.125, 0.87359, 1.2)
high_rounded <- round(x, digits = 1)
low_rounded <- high_rounded - 0.1
and then I need to be able to use both the high_rounded and low_rounded variables in further analyses. The way the code is written right now, it only works when the number happens to be rounded up; when it needs to be rounded down, it gives the wrong result. The round() function rounds to the nearest value, but I am not able to tell it to always round up or always round down.
I have also tried:
ceiling(x)
But this only rounds up to the nearest integer, and I need to round to the nearest tenth.
How about ceiling(x*10)/10 ... ?
You can use round_any from the plyr package
plyr::round_any(x <- c(0.987, 1.125, 0.87359, 1.2), accuracy = 0.1, f = ceiling)
[1] 1.0 1.2 0.9 1.2
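A base-R alternative along the lines of the comment above (a sketch; the usual floating-point caveats apply for values that already sit exactly on a tenth): scale by 10, apply ceiling() or floor(), then scale back.
x <- c(0.987, 1.125, 0.87359, 1.2)
high_rounded <- ceiling(x * 10) / 10   # always round up to the nearest tenth
low_rounded  <- floor(x * 10) / 10     # always round down to the nearest tenth
high_rounded
# [1] 1.0 1.2 0.9 1.2
low_rounded
# [1] 0.9 1.1 0.8 1.2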

cut function produces uneven first break

I'm exploring the use of the cut function and am trying to cut the following basic vector into 10 breaks. I'm able to do it, but I'm confused as to why the first break starts at -0.01 rather than 0:
test_vec <- 0:10
test_vec2 <- cut(test_vec, breaks = 10)
test_vec2
yields:
(-0.01,1] (-0.01,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]
Why does this produce two instances of (-0.01,1], and why doesn't the lowest break start at 0?
tl;dr to get what you might want, you'll probably need to specify breaks explicitly, and include.lowest=TRUE:
cut(x, breaks = 0:10, include.lowest = TRUE)
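With the vector from the question, this keeps 0 inside the first interval (output as printed in a typical R session):
test_vec <- 0:10
cut(test_vec, breaks = 0:10, include.lowest = TRUE)
#  [1] [0,1] [0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]
# Levels: [0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]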
The issue is probably this, from the "Details" of ?cut:
When ‘breaks’ is specified as a single number, the range of the
data is divided into ‘breaks’ pieces of equal length, and then the
outer limits are moved away by 0.1% of the range to ensure that
the extreme values both fall within the break intervals.
Since the range is (0,10), the outer limits are (-0.01, 10.01); as #Onyambu suggests, the results are asymmetric because the value at 0 lies on the left-hand boundary (not included) whereas the value at 10 lies on the right-hand boundary (included).
The (apparent) asymmetry is due to formatting; if you follow the code below (the core of base:::cut.default()), you'll see that the top break is actually at 10.01, but it gets formatted as "10" because the default number of digits is 3 ...
x <- 0:10
breaks <- 10
dig <- 3
nb <- as.integer(breaks+1)
dx <- diff(rx <- range(x, na.rm = TRUE))
breaks <- seq.int(rx[1L], rx[2L], length.out = nb)
breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + dx/1000)
ch.br <- formatC(0 + breaks, digits = dig, width = 1L)
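Running that snippet shows that the top break really is 10.01, but formatC() turns it back into "10" (output from a typical session; spacing may differ):
breaks
#  [1] -0.01  1.00  2.00  3.00  4.00  5.00  6.00  7.00  8.00  9.00 10.01
ch.br
#  [1] "-0.01" "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"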

Force zeros with percent formatting

I'd like to display a data frame column as percentages with 4 decimal places:
scales::percent(1:3/12345)
"0.0081%" "0.0162%" "0.0243%"
This shows each value as a percent to 4 decimal places.
But if I try e.g.
scales::percent(c(1:3/12345, 0.9), accuracy = 4)
[1] "0%" "0%" "0%" "88%"
I lose the values for the first 3. I'd like those to show as
"0.0081%", "0.0162%" "0.0243%".
How can I force the same number of digits while formatting as percent? I always want 4 digits to the right of the decimal, even if they are all zero.
You can do:
scales::percent(c(1:3/12345, 0.9), accuracy = 0.0001)
[1] "0.0081%" "0.0162%" "0.0243%" "90.0000%"
The accuracy argument can be counterintuitive: it specifies the number to round to, in percent units, so to get more decimal places in the output you pass a smaller number. Each decimal place in accuracy then corresponds to a decimal place in the output.
To illustrate further, if you want an output with one decimal place:
scales::percent(c(1:3/12345, 0.9), accuracy = 0.1)
[1] "0.0%" "0.0%" "0.0%" "90.0%"
while if you want three decimal places:
scales::percent(c(1:3/12345, 0.9), accuracy = 0.001)
[1] "0.008%" "0.016%" "0.024%" "90.000%"

Limit result of subtraction to a minimum of zero

I have a vector x whose values range from 0 to 1, e.g. x <- c(0, 0.5, 1). I'm subtracting, say, 0.5 from x:
x - 0.5
The result of x - 0.5 will range from -0.5 to 0.5. However, I want to constrain the minimum of the result to 0, i.e. the new range will be 0 to 0.5, and any previously negative values will be set to 0.
Is there a simple way of doing this? I've looked for "constrain" and "limit" and such. I assume I could probably bash it into shape with if or filtering but I was hoping there was an elegant function that hasn't surfaced in my searches.
See ?pmax.
pmax(0, x - 0.5)
I.e. pick whichever is larger -- that would be zero if x < 0.5.
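A quick check with the vector from the question:
x <- c(0, 0.5, 1)
pmax(0, x - 0.5)
# [1] 0.0 0.0 0.5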

Looks like a simple graphing problem

I have a control to which I need to add the ability to apply varying degrees of acuteness (or sensitivity). The problem is best illustrated with an image:
Graph http://img87.imageshack.us/img87/7886/control.png
As you can see, I have X and Y axes that both have arbitrary limits of 100 - that should suffice for this explanation. At present my control follows the red line (linear behaviour), but I would like to add the ability to use the other three curves (or more), i.e. a more sensitive setting would ignore the linear mapping and follow one of the other curves. The starting point will always be 0, and the end point will always be 100.
I know that an exponential is too steep, but can't seem to figure a way forward. Any suggestions please?
The curves you have illustrated look a lot like gamma correction curves. The idea there is that the minimum and maximum of the range stays the same as the input, but the middle is bent like you have in your graphs (which I might note is not the circular arc which you would get from the cosine implementation).
Graphically, it looks like the standard gamma-correction curves (figure omitted; source: wikimedia.org).
So, with that as the inspiration, here's the math...
If your x values ranged from 0 to 1, the function is rather simple:
y = f(x, gamma) = x ^ gamma
Add an xmax value for scaling (i.e. x = 0 to 100), and the function becomes:
y = f(x, gamma) = ((x / xmax) ^ gamma) * xmax
or alternatively:
y = f(x, gamma) = (x ^ gamma) / (xmax ^ (gamma - 1))
You can take this a step further if you want to add a non-zero xmin.
When gamma is 1, the mapping is perfectly linear (y = x). If gamma is less than 1, the curve bends upward; if gamma is greater than 1, it bends downward. The reciprocal of gamma converts a value back to the original: x = f(y, 1/gamma) = f(f(x, gamma), 1/gamma).
Just adjust the value of gamma according to your own taste and application needs. Since you're wanting to give the user multiple options for "sensitivity enhancement", you may want to give your users choices on a linear scale, say ranging from -4 (least sensitive) to 0 (no change) to 4 (most sensitive), and scale your internal gamma values with a power function. In other words, give the user choices of (-4, -3, -2, -1, 0, 1, 2, 3, 4), but translate that to gamma values of (5.06, 3.38, 2.25, 1.50, 1.00, 0.67, 0.44, 0.30, 0.20).
Coding that in C# might look something like this:
public class SensitivityAdjuster {

    public SensitivityAdjuster() { }

    public SensitivityAdjuster(int level) {
        SetSensitivityLevel(level);
    }

    private double _Gamma = 1.0;

    public void SetSensitivityLevel(int level) {
        _Gamma = Math.Pow(1.5, level);
    }

    public double Adjust(double x) {
        return (Math.Pow((x / 100), _Gamma) * 100);
    }
}
To use it, create a new SensitivityAdjuster, set the sensitivity level according to user preferences (either via the constructor or the SetSensitivityLevel method; -4 to 4 would probably be reasonable level values), and call Adjust(x) to get the adjusted output value. If you want a wider or narrower range of reasonable levels, reduce or increase the 1.5 in SetSensitivityLevel. And of course the 100 represents your maximum x value.
I propose a simple formula that (I believe) captures your requirement. In order to have a full "quarter circle", which is your extreme case, you would use (1-cos((x*pi)/(2*100)))*100.
What I suggest is that you take a weighted average between y=x and y=(1-cos((x*pi)/(2*100)))*100. For example, to have very close to linear (99% linear), take:
y = 0.99*x + 0.01*[(1-cos((x*pi)/(2*100)))*100]
Or more generally, say the level of linearity is L, and it's in the interval [0, 1], your formula will be:
y = L*x + (1-L)*[(1-cos((x*pi)/(2*100)))*100]
EDIT: I changed cos(x/100) to cos((x*pi)/(2*100)), because for the cos result to be in the range [1,0] X should be in the range of [0,pi/2] and not [0,1], sorry for the initial mistake.
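As a concrete illustration of that blend, here is a sketch in R (the rest of this page uses R, though the original question is language-agnostic; the function name blend is just for illustration):
# Weighted blend of the linear map and the cosine curve on [0, xmax].
blend <- function(x, L, xmax = 100) {
  curve_part <- (1 - cos((x * pi) / (2 * xmax))) * xmax  # the non-linear extreme
  L * x + (1 - L) * curve_part                           # L = 1 is fully linear
}
blend(c(0, 50, 100), L = 0.99)   # very close to linear, per the example above
The endpoints are preserved for any L, since both the linear map and the cosine curve pass through (0, 0) and (xmax, xmax).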
You're probably looking for something like polynomial interpolation. A quadratic/cubic/quartic interpolation ought to give you the sorts of curves you show in the question. The differences between the three curves you show could probably be achieved just by adjusting the coefficients (which indirectly determine steepness).
The graph of y = x^p for x from 0 to 1 will do what you want as you vary p from 1 (which will give the red line) upwards. As p increases the curve will be 'pushed in' more and more. p doesn't have to be an integer.
(You'll have to scale to get 0 to 100 but I'm sure you can work that out)
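Working that scaling out, a quick R sketch of the scaled power curve (an assumption-level example, not code from the original answer):
# y = (x / xmax)^p * xmax for a few values of p; p = 1 is the straight red line,
# larger p pushes the curve further in.
xmax <- 100
x <- seq(0, xmax, by = 10)
sapply(c(1, 2, 3), function(p) (x / xmax)^p * xmax)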
I vote for Rax Olgud's general idea, with one modification:
y = alpha * x + (1-alpha)*(f(x/100)*100)
where f(0) = 0, f(1) = 1, f(x) is superlinear, but I don't know where this "quarter circle" idea came from or why 1-cos(x) would be a good choice.
I'd suggest f(x) = x^k where k = 2, 3, 4, 5, whatever gives you the desired degree of steepness for alpha = 0. Pick a fixed value for k, then vary alpha to choose your particular curve.
For problems like this, I will often take a few points from the curve and run them through a curve-fitting program. There are a bunch of them out there, several with free trials.
I've learned a lot by trying different models. Often you can get a pretty simple expression to come close to your curve.
