Numeric Matching / Extracting with Hard Coded Values in R - r

I'm having trouble understanding numeric matching and indexing in R.
Suppose I create a data frame such as:
options(digits = 3)
x <- seq(from = 0, to = 5, by = 0.10)
TestDF <- data.frame(x = x, y = dlnorm(x))
and I want to compare a hard-coded value against my y column:
> TestDF[TestDF$y == 0.0230,]$x
numeric(0)
However, if I compare against the value taken straight out of the data frame (for an x value of 4.9, y prints as 0.0230), the match works:
> TestDF[TestDF$y == TestDF[50,]$y,]$x
[1] 4.9
Does this have to do with exact matching? If I limit the output to 3 digits, is 0.0230000 no longer the same as the original value in y that I'm comparing against? If so, is there a way around it when I do need to extract values based on rounded, hard-coded values?

You can use the round() function to reduce the number of decimal digits to the scale you want to compare at. See below.
set.seed(1L)
x <- seq(from = 0, to = 5, by = 0.10)
TestDF <- data.frame(x = x, y = dlnorm(x))
constant <- 0.023
TestDF[ with(TestDF, round(y, 3) == constant), ]
# x y
# 50 4.9 0.02302884

You can compare the rounded y with the stated value:
> any(TestDF$y == 0.0230)
[1] FALSE
> any(round(TestDF$y, 3) == 0.0230)
[1] TRUE
I'm not certain you grok the meaning of the digits option. ?options says this about digits:
digits: controls the number of significant digits to print when printing numeric values.
Note the words "to print": this setting only affects how the values are printed, not how they are stored.
You generated a set of reals, none of which are exactly 0.0230. This has nothing to do with exact matching. The value you indicated should be 0.0230 is actually stored as
> with(TestDF, print(y[50], digits = 22))
[1] 0.02302883835550340041465
regardless of the digits setting in options, because that setting only affects the printed value. And the issue is not floating-point fuzz either: even with the small fudge allowed by all.equal(), the recommended way to compare numeric values, y[50] and 0.0230 are still not equal:
> with(TestDF, all.equal(0.0230, y[50]))
[1] "Mean relative difference: 0.001253842"
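If the aim is to pick out rows whose y is close to a hard-coded value, one option is to compare within an explicit tolerance rather than test for equality. A minimal sketch, assuming a tolerance of half a unit in the third decimal place; the names target and tol are illustrative and not from the question:

```r
x <- seq(from = 0, to = 5, by = 0.10)
TestDF <- data.frame(x = x, y = dlnorm(x))

target <- 0.023   # hard-coded value to look for
tol <- 5e-4       # assumed tolerance: half a unit in the 3rd decimal place

# rows whose y lies within tol of the target
hits <- TestDF[abs(TestDF$y - target) < tol, ]
hits$x

# alternatively, the single row whose y is closest to the target
nearest <- TestDF[which.min(abs(TestDF$y - target)), ]
nearest$x
```

The tolerance form can return zero or several rows, while which.min() always returns exactly one, so which one fits depends on whether "no match" is a meaningful outcome for you.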

Related

R: round up from .6

How can I round up starting at .6 (not at .5)?
For example, round(53.51245, 4) returns 53.5125, but I want 53.5124.
How can I specify the threshold (that is, round up only from .6)?
I'm not sure if this is a duplicate of the post linked to in the comments (but the post may certainly be relevant). From what I understand OP would like to "round" values up or down if they are >= 0.6 or < 0.6, respectively. (The linked post refers to the number of digits a number should be rounded to, which is a different issue.)
In response to OP's question, here is an option where we define a custom function my.round:
my.round <- function(x, digits = 4, val = 0.6) {
  z <- x * 10^digits
  z <- ifelse(signif(z - trunc(z), 1) >= val, trunc(z + 1), trunc(z))
  z / 10^digits
}
Then
x <- 53.51245
my.round(x, 4)
#[1] 53.5124
x <- 53.51246
my.round(x, 4)
#[1] 53.5125
my.round is vectorised, so we could have done
my.round(c(53.51245, 53.51246, 53.51246789), digits = 4)
#[1] 53.5124 53.5125 53.5125
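Note that my.round first reduces the dropped fraction to one significant digit with signif(), so a fraction such as 0.57 is treated as 0.6 and rounded up. If the threshold should apply to the raw fraction instead, a variant along these lines may work. This is only a sketch: round_from and its tol argument are hypothetical, and it assumes non-negative input.

```r
# Hypothetical helper: round up once the dropped fraction reaches `val`.
# Assumes x >= 0; `tol` absorbs floating-point error in x * 10^digits.
round_from <- function(x, digits = 4, val = 0.6, tol = 1e-8) {
  z <- x * 10^digits
  frac <- z - floor(z)                          # the part that would be dropped
  (floor(z) + (frac >= val - tol)) / 10^digits  # add 1 only at or above the threshold
}

round_from(c(53.51245, 53.51246, 53.51246789), digits = 4)
# 53.5124 53.5125 53.5125
round_from(1.57, digits = 0)
# 1
```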

Losing precision in dataframe while changing column datatype [duplicate]

I have a number, for example 1.128347132904321674821 that I would like to show as only two decimal places when output to screen (or written to a file). How does one do that?
x <- 1.128347132904321674821
EDIT:
The use of:
options(digits=2)
Has been suggested as a possible answer. Is there a way to specify this within a script for one-time use? When I add it to my script it doesn't seem to do anything different and I'm not interested in a lot of re-typing to format each number (I'm automating a very large report).
--
Answer: round(x, digits=2)
Background: Some answers suggested on this page (e.g., signif, options(digits=...)) do not guarantee that a certain number of decimals are displayed for an arbitrary number. I presume this is a design feature in R whereby good scientific practice involves showing a certain number of digits based on principles of "significant figures". However, in many domains (e.g., APA style, business reports) formatting requirements dictate that a certain number of decimal places are displayed. This is often done for consistency and standardisation purposes rather than being concerned with significant figures.
Solution:
The following code shows exactly two decimal places for the number x.
format(round(x, 2), nsmall = 2)
For example:
format(round(1.20, 2), nsmall = 2)
# [1] "1.20"
format(round(1, 2), nsmall = 2)
# [1] "1.00"
format(round(1.1234, 2), nsmall = 2)
# [1] "1.12"
A more general function is as follows where x is the number and k is the number of decimals to show. trimws removes any leading white space which can be useful if you have a vector of numbers.
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
E.g.,
specify_decimal(1234, 5)
# [1] "1234.00000"
specify_decimal(0.1234, 5)
# [1] "0.12340"
Discussion of alternatives:
The formatC answers and sprintf answers work fairly well. But they will show negative zeros in some cases which may be unwanted. I.e.,
formatC(c(-0.001), digits = 2, format = "f")
# [1] "-0.00"
sprintf(-0.001, fmt = '%#.2f')
# [1] "-0.00"
One possible workaround to this is as follows:
formatC(as.numeric(as.character(round(-.001, 2))), digits = 2, format = "f")
# [1] "0.00"
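The same idea can be wrapped in a small helper that keeps sprintf's fixed-decimal formatting but removes the sign when every printed digit is zero. A sketch only: fmt_fixed is a made-up name, not an existing function.

```r
# Hypothetical helper: fixed decimals via sprintf, with "-0.00" turned into "0.00"
fmt_fixed <- function(x, k = 2) {
  out <- sprintf(paste0("%.", k, "f"), x)  # e.g. "%.2f"
  sub("^-(0(\\.0+)?)$", "\\1", out)        # drop the sign only if all digits are zero
}

fmt_fixed(c(-0.001, 1.2, -3.14159))
# "0.00" "1.20" "-3.14"
```

Genuinely negative results such as "-3.14" keep their sign because the pattern only matches strings made up entirely of zeros.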
You can format a number x to as many decimal places as you wish. Here x is a number with many decimal places, and we want to show 8 of them:
x = 1111111234.6547389758965789345
y = formatC(x, digits = 8, format = "f")
# [1] "1111111234.65473890"
Here format="f" gives fixed-point notation (xxx.xxx), and digits specifies the number of digits after the decimal point. By contrast, to display an integer you would use format="d" (much like sprintf).
You can try my package formattable.
> # devtools::install_github("renkun-ken/formattable")
> library(formattable)
> x <- formattable(1.128347132904321674821, digits = 2, format = "f")
> x
[1] 1.13
The good thing is, x is still a numeric vector and you can do more calculations with the same formatting.
> x + 1
[1] 2.13
Even better, the digits are not lost, you can reformat with more digits any time :)
> formattable(x, digits = 6, format = "f")
[1] 1.128347
For 2 decimal places, assuming that you want to keep trailing zeros:
sprintf(5.5, fmt = '%#.2f')
which gives
[1] "5.50"
As @mpag mentions below, R can sometimes give unexpected values with this and with the round method; e.g. sprintf(5.5550, fmt='%#.2f') gives 5.55, not 5.56, because 5.555 has no exact binary representation and the stored value is slightly below 5.555.
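The 5.55 result can be checked by printing more digits than R shows by default. A small sketch (the variable name stored is illustrative):

```r
x <- 5.555

# Print many more digits than usual to see the value that is actually stored
stored <- sprintf("%.20f", x)
stored

# sprintf rounds the stored value, not the literal that was typed
sprintf("%.2f", x)
# "5.55"
```

Because the stored double lies just below 5.555, rounding it to two decimals correctly yields 5.55; the surprise comes from the decimal literal, not from sprintf.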
Something like this:
options(digits=2)
Definition of digits option :
digits: controls the number of digits to print when printing numeric values.
If you prefer significant digits to fixed digits, then the signif command might be useful:
> signif(1.12345, digits = 3)
[1] 1.12
> signif(12.12345, digits = 3)
[1] 12.1
> signif(12345.12345, digits = 3)
[1] 12300
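Check the functions prettyNum and format.
To keep trailing zeros, use the # flag in sprintf, e.g. sprintf(x, fmt='%#.4g'). The number in a %g format counts significant digits, so printing 123.124 as 123.1240 needs '%#.7g'.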
The function formatC() can be used to format a number to two decimal places (with digits = 2 and format = "f"). It gives two decimal places even when the resulting values include trailing zeros.
I'm using this variant to force printing of k decimal places:
# format numeric value to k decimal places
formatDecimal <- function(x, k) format(round(x, k), trim = TRUE, nsmall = k)
Note that numeric objects in R are stored with double precision, which gives you (roughly) 16 decimal digits of precision - the rest will be noise. I grant that the number shown above is probably just for an example, but it is 22 digits long.
Looks to me like it would be something like
library(tutoR)
format(1.128347132904321674821, 2)
Per a little online help.
If you just want to round a number or a vector, simply use
round(data, 2)
This returns data rounded to 2 decimal places.
I wrote this function, which could be improved but seems to work well in corner cases. For example, for 0.9995 the top-voted answer gives 1.00, which is incorrect if you want the extra digits cut off rather than rounded. I fall back to that solution when the number has no decimals.
round_correct <- function(x, digits, chars = TRUE) {
  if (grepl(x = x, pattern = "\\.")) {
    # the number has a decimal point: cut the string after `digits` decimals
    y <- as.character(x)
    pos <- grep(unlist(strsplit(x = y, split = "")), pattern = "\\.", value = FALSE)
    if (chars) {
      return(substr(x = y, start = 1, stop = pos + digits))
    }
    return(
      as.numeric(substr(x = y, start = 1, stop = pos + digits))
    )
  } else {
    # no decimal point: pad with zeros up to `digits` decimals
    return(
      format(round(x, digits), nsmall = digits)
    )
  }
}
Example:
round_correct(10.59648, digits = 2)
[1] "10.59"
round_correct(0.9995, digits = 2)
[1] "0.99"
round_correct(10, digits = 2)
[1] "10.00"
Here's my approach, covering values from units to millions.
The digits parameter lets me adjust the minimum number of significant digits (integer + decimals). You could adjust the decimal rounding inside first.
library(dplyr)  # provides if_else()
number <- function(number) {
  result <- if_else(
    abs(number) < 1000000,
    format(
      number, digits = 3,
      big.mark = ".",
      decimal.mark = ","
    ),
    paste0(
      format(
        number / 1000000,
        digits = 3,
        drop0trailing = TRUE,
        big.mark = ".",
        decimal.mark = ","
      ),
      "MM"
    )
  )
  # result <- paste0("$", result)
  return(result)
}
library(dplyr)
# round the numbers
df <- df %>%
mutate(across(where(is.numeric), .fns = function(x) {format(round(x, 2), nsmall = 2)}))
Here I am changing all numeric values to have only 2 decimal places. Note that format() returns character strings, so the affected columns are no longer numeric. If you need a different number of decimal places:
# round the numbers for k decimal places
df <- df %>%
mutate(across(where(is.numeric), .fns = function(x) {format(round(x, k), nsmall = k)}))
Replace k with the desired number of decimal places.
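Since the thread title is about a column changing datatype, it may help to see the difference between rounding and formatting side by side. A base R sketch on a made-up data frame (df, rounded and shown are illustrative names):

```r
# Small illustrative data frame (made up for this sketch)
df <- data.frame(id = c("a", "b"), value = c(1.128347, 2.5),
                 stringsAsFactors = FALSE)
num_cols <- vapply(df, is.numeric, logical(1))

# round() keeps the columns numeric; only the stored values change
rounded <- df
rounded[num_cols] <- lapply(rounded[num_cols], round, digits = 2)

# format() produces text for display; the columns become character
shown <- df
shown[num_cols] <- lapply(shown[num_cols],
                          function(col) format(round(col, 2), nsmall = 2))

sapply(rounded, class)
sapply(shown, class)
```

Rounding is the safer choice while further calculations are still needed; formatting is best left to the final output step.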

R: Change Vector Output to Several Ranges

I am using Jenks Natural Breaks via the BAMMtools package to segment my data in RStudio Version 1.0.153. The output is a vector that shows where the natural breaks occur in my data set, as such:
[1] 14999 41689 58415 79454 110184 200746
I would like to take the output above and create the ranges inferred by the breaks. Ex: 14999-41689, 41690-58415, 58416-79454, 79455-110184, 110185-200746
Are there any functions that I can use in R Studio to accomplish this? Thank you in advance!
Input data
x <- c(14999, 41689, 58415, 79454, 110184, 200746)
If you want the ranges as characters you can do
y <- x; y[1] <- y[1] - 1 # First range given in question doesn't follow the pattern. Adjusting for that
paste(head(y, -1) + 1, tail(y, -1), sep = '-')
#[1] "14999-41689" "41690-58415" "58416-79454" "79455-110184" "110185-200746"
If you want a list of the actual sets of numbers in each range you can do
seqs <- Map(seq, head(y, -1) + 1, tail(y, -1))
You can definitely create your own function that produces the exact output you're looking for, but you can use the cut function that will give you something like this:
# example vector
x = c(14999, 41689, 58415, 79454, 110184, 200746)
# use the vector and its values as breaks
ranges = cut(x, x, dig.lab = 6)
# see the levels
levels(ranges)
#[1] "(14999,41689]" "(41689,58415]" "(58415,79454]" "(79454,110184]" "(110184,200746]"
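One detail worth knowing when using cut() this way: intervals are open on the left by default, so a value equal to the smallest break falls outside every interval. A short sketch of the difference that include.lowest = TRUE makes (the variable names are illustrative):

```r
breaks <- c(14999, 41689, 58415, 79454, 110184, 200746)

# Default: intervals are open on the left, so the smallest break maps to NA
default_bins <- cut(breaks, breaks, dig.lab = 6)

# include.lowest = TRUE closes the first interval on the left
closed_bins <- cut(breaks, breaks, dig.lab = 6, include.lowest = TRUE)

is.na(default_bins[1])
# TRUE
levels(closed_bins)[1]
# "[14999,41689]"
```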

return the index of a vector when the difference between the index and value satisfies a condition in r

I have been having trouble phrasing this question, so if anyone can edit it up to standard that would be great.
I have a vector that looks like this:
x <- c(1, 2, 5)
How do I return the last index at which the value of the vector equals its position (that is, where the difference between the value and the position is 0)?
In this case, I would like to have
2
because for the third element the difference between the value and its position, x[3] - 3, is greater than 0.
As a side note, this is part of a larger function, where the vector 'x' was built as a vector of values that satisfy a condition (being outside of a range). In this example, the vector 'x' was built as the indexes of the vector
y <- c(1, -0.544099347708607, 0.0330854828196116, 0.126862586350202, -0.189999318205021, 0.0709946572904202, -0.0290039765997793, 0.12201693346217, -0.120410983904152, 0.0974094609584081, -0.119147919464352, 0.0154264136176002, 0.115102403861495, -0.145980255860186, 0.116998886386955, -0.137041816761002, 0.114352714471954, 0.0228895094121642, -0.0679735427311049, 0.0350071153004831, -0.0145366468920295)
which are outside of the range (-0.18, 0.18):
plot.ts(y)
abline(h = 0.18)
abline(h = -0.18)
You can use the Position function:
Position(function(x) {x == 0}, x - seq_along(x), right = TRUE)
See http://stat.ethz.ch/R-manual/R-devel/library/base/html/funprog.html for more functions.
Or, as @Frank said below,
Position(`!`, x - seq_along(x), right = TRUE)
This works because `!` treats 0 as FALSE and any other number as TRUE, so !0 is TRUE.
I think the simplest approach is to test equality, not test for the difference being zero:
tail(which(x==seq_along(x)),1)
# 2
Here is another approach:
index <- 1:length(x)
max(which(x - index == 0))
#[1] 2
Or as the other Frank points out, you could test for equality instead of the difference being 0.
max(which(x == index))
One can also try this
tail(which((1:length(x)-x)==0),1)
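Putting the pieces together, the whole step from y to the answer can be sketched as below. The y here is a shortened, rounded copy of the vector in the question, and last_fixed_point is a hypothetical helper name; it returns NA when no position matches, a case where max(which(...)) would give -Inf with a warning.

```r
# Shortened, rounded copy of the question's y (illustrative only)
y <- c(1, -0.544, 0.033, 0.127, -0.190, 0.071, -0.029)
x <- which(abs(y) > 0.18)   # positions outside (-0.18, 0.18)

# Hypothetical helper: last index i with x[i] == i, or NA if there is none
last_fixed_point <- function(x) {
  hits <- which(x == seq_along(x))
  if (length(hits) == 0L) NA_integer_ else max(hits)
}

last_fixed_point(x)          # 2
last_fixed_point(c(3, 4))    # NA
```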
