I have a character vector, dna, whose elements are equal-length strings of 0s and 1s, and a second vector, flip, of the same size and shape. Wherever flip has a 1, I want to flip the corresponding character in dna, so 0 becomes 1 and 1 becomes 0. I would like to do this without looping, for speed, because my real dataset is large.
Below is some sample data:
#input
dna = c('0101010100', '1010101010', '1010101011')
flip = c('0100000001', '0000000000', '1000000000')
#requested answer
dna_flipped = c('0001010101', '1010101010', '0010101011')
#first element: second and 10th characters are flipped
#second element: nothing is changed
#third element: first character is changed
#try loop solution
flip_split = lapply(strsplit(flip, ''), function(x) which(x == '1'))
for (i in seq_along(dna)) {
  for (j in seq_along(flip_split[[i]])) {
    k = flip_split[[i]][j]
    substring(dna[[i]], k, k) = as.character(abs(1 - as.integer(substring(dna[[i]], k, k))))
  }
}
How can this be done without a loop?
I think the logic you are describing is equivalent to a logical XOR. The difficult part is applying it to character strings. The following should work, and it is at least vectorized within each element, so you don't need to iterate over individual characters:
unname(unlist(Map(function(a, b) {
paste(as.numeric(xor(as.numeric(charToRaw(a)) - 48 == 1,
as.numeric(charToRaw(b)) - 48 == 1)), collapse = "")
}, a = dna, b = flip)))
#> [1] "0001010101" "1010101010" "0010101011"
Or, perhaps more efficiently, as Ritchie Sacramento points out:
unname(unlist(Map(function(a, b) {
rawToChar(as.raw(as.numeric(xor(as.numeric(charToRaw(a)) - 48 == 1,
as.numeric(charToRaw(b)) - 48 == 1)) + 48))
}, a = dna, b = flip)))
#> [1] "0001010101" "1010101010" "0010101011"
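If every string contains only the characters 0 and 1, the per-element Map() call can arguably be avoided altogether. The following is a minimal sketch, not part of the answers above: flip_bits is a hypothetical helper name, and it assumes each dna string has the same length as its flip counterpart. It concatenates all strings, XORs the character codes in one pass with bitwXor(), and cuts the result back into pieces with substring().

```r
dna  <- c("0101010100", "1010101010", "1010101011")
flip <- c("0100000001", "0000000000", "1000000000")

# Hypothetical helper: XOR all characters at once, then split back per element.
flip_bits <- function(dna, flip) {
  stopifnot(length(dna) == length(flip), all(nchar(dna) == nchar(flip)))
  a <- utf8ToInt(paste(dna, collapse = ""))    # character codes: 48 = "0", 49 = "1"
  b <- utf8ToInt(paste(flip, collapse = ""))
  flipped <- intToUtf8(bitwXor(a - 48L, b - 48L) + 48L)  # one long string
  ends   <- cumsum(nchar(dna))                 # end position of each original string
  starts <- ends - nchar(dna) + 1L
  substring(flipped, starts, ends)             # vectorized over starts/ends
}

dna_flipped <- flip_bits(dna, flip)
dna_flipped
#> [1] "0001010101" "1010101010" "0010101011"
```

Whether this is actually faster than the Map() versions on a large dataset would need to be benchmarked.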
How can I round up starting at .6 (not at .5)?
For example, round(53.51245, 4) returns 53.5125, but I want 53.5124.
How can I specify the cutoff, so that values are rounded up only from .6?
I'm not sure whether this is a duplicate of the post linked in the comments (though that post may certainly be relevant). As I understand it, OP would like to "round" values up when the remaining fraction is >= 0.6 and down when it is < 0.6. (The linked post refers to the number of digits a number should be rounded to, which is a different issue.)
In response to OP's question, here is an option where we define a custom function my.round:
my.round <- function(x, digits = 4, val = 0.6) {
z <- x * 10^digits
z <- ifelse(signif(z - trunc(z), 1) >= val, trunc(z + 1), trunc(z))
z / 10^digits
}
Then
x <- 53.51245
my.round(x, 4)
#[1] 53.5124
x <- 53.51246
my.round(x, 4)
#[1] 53.5125
my.round is vectorised, so we could have done
my.round(c(53.51245, 53.51246, 53.51246789), digits = 4)
#[1] 53.5124 53.5125 53.5125
I have a number, for example 1.128347132904321674821 that I would like to show as only two decimal places when output to screen (or written to a file). How does one do that?
x <- 1.128347132904321674821
EDIT:
The use of:
options(digits=2)
Has been suggested as a possible answer. Is there a way to specify this within a script for one-time use? When I add it to my script it doesn't seem to do anything different and I'm not interested in a lot of re-typing to format each number (I'm automating a very large report).
--
Answer: round(x, digits=2)
Background: Some answers suggested on this page (e.g., signif, options(digits=...)) do not guarantee that a certain number of decimals are displayed for an arbitrary number. I presume this is a design feature in R whereby good scientific practice involves showing a certain number of digits based on principles of "significant figures". However, in many domains (e.g., APA style, business reports) formatting requirements dictate that a certain number of decimal places are displayed. This is often done for consistency and standardisation purposes rather than being concerned with significant figures.
Solution:
The following code shows exactly two decimal places for the number x.
format(round(x, 2), nsmall = 2)
For example:
format(round(1.20, 2), nsmall = 2)
# [1] "1.20"
format(round(1, 2), nsmall = 2)
# [1] "1.00"
format(round(1.1234, 2), nsmall = 2)
# [1] "1.12"
A more general function is as follows where x is the number and k is the number of decimals to show. trimws removes any leading white space which can be useful if you have a vector of numbers.
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
E.g.,
specify_decimal(1234, 5)
# [1] "1234.00000"
specify_decimal(0.1234, 5)
# [1] "0.12340"
Discussion of alternatives:
The formatC answers and sprintf answers work fairly well. But they will show negative zeros in some cases which may be unwanted. I.e.,
formatC(c(-0.001), digits = 2, format = "f")
# [1] "-0.00"
sprintf(-0.001, fmt = '%#.2f')
# [1] "-0.00"
One possible workaround to this is as follows:
formatC(as.numeric(as.character(round(-.001, 2))), digits = 2, format = "f")
# [1] "0.00"
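A possibly simpler way to avoid the negative zero, offered as a sketch rather than a tested recommendation: adding 0 to the rounded value turns an IEEE 754 negative zero into a positive one, so sprintf() no longer prints the minus sign. The helper name fmt_no_neg_zero is made up for illustration.

```r
# Hypothetical helper: fixed number of decimals without "-0.00".
fmt_no_neg_zero <- function(x, k = 2) {
  # round() can yield -0 for small negative inputs; -0 + 0 evaluates to +0
  sprintf(paste0("%.", k, "f"), round(x, k) + 0)
}

res <- fmt_no_neg_zero(c(-0.001, 0.001, -1.5))
res
# [1] "0.00"  "0.00"  "-1.50"
```

Genuinely negative values such as -1.5 keep their sign; only values that round to zero are affected.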
You can format a number, say x, to as many decimal places as you wish. Here x is a number with many decimal places. Suppose we wish to show 8 decimal places of this number:
x = 1111111234.6547389758965789345
y = formatC(x, digits = 8, format = "f")
# [1] "1111111234.65473890"
Here format="f" gives fixed-point notation in the usual form, say xxx.xxx, and digits specifies the number of digits after the decimal point. By contrast, if you wanted to display an integer you would use format="d" (much like sprintf).
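To make the difference between the two format codes concrete, here is a short illustrative sketch. The input values are arbitrary examples, not taken from the question.

```r
# format = "f": fixed-point, digits = number of decimals
f_two  <- formatC(3.14159, digits = 2, format = "f")
# format = "d": integer display
d_int  <- formatC(42L, format = "d")
# width and flag pad the integer with leading zeros
d_pad  <- formatC(42L, width = 6, flag = "0", format = "d")
# big.mark adds a thousands separator
f_mark <- formatC(1234567.891, digits = 2, format = "f", big.mark = ",")

c(f_two, d_int, d_pad, f_mark)
# [1] "3.14"         "42"           "000042"       "1,234,567.89"
```

Note that formatC() always returns character strings, so the result is meant for display, not for further arithmetic.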
You can try my package formattable.
> # devtools::install_github("renkun-ken/formattable")
> library(formattable)
> x <- formattable(1.128347132904321674821, digits = 2, format = "f")
> x
[1] 1.13
The good thing is, x is still a numeric vector and you can do more calculations with the same formatting.
> x + 1
[1] 2.13
Even better, the digits are not lost, you can reformat with more digits any time :)
> formattable(x, digits = 6, format = "f")
[1] 1.128347
For 2 decimal places, assuming that you want to keep trailing zeros:
sprintf(5.5, fmt = '%#.2f')
which gives
[1] "5.50"
As @mpag mentions below, it seems R can sometimes give unexpected values with this and the round method, e.g. sprintf(5.5550, fmt='%#.2f') gives 5.55, not 5.56.
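The 5.555 case is most likely a consequence of binary floating point rather than a bug in sprintf: 5.555 has no exact double representation, and the nearest double lies slightly below it. A small sketch to inspect this (the exact trailing digits printed may vary by platform):

```r
x <- 5.555
stored  <- sprintf("%.20f", x)  # the value actually stored, slightly below 5.555
rounded <- sprintf("%.2f", x)   # rounds the stored value, hence "5.55"

stored
rounded
# [1] "5.55"
```

Because the stored value is just under the halfway point between 5.55 and 5.56, rounding it to two decimals gives 5.55.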
Something like that :
options(digits=2)
Definition of digits option :
digits: controls the number of digits to print when printing numeric values.
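Two points may help here, shown as a hedged sketch: the digits option counts significant digits, not decimal places, and it can be set temporarily inside a script and then restored. The variable names are illustrative only.

```r
x <- 1.128347132904321674821

old <- options(digits = 2)    # options() returns the previous settings
shown_small <- format(x)      # "1.1": two significant digits, not two decimals
shown_large <- format(123.456)  # "123"
options(old)                  # restore the earlier setting

# One-off alternative that leaves the global option untouched
one_off <- format(x, digits = 2)  # "1.1"
```

This is why options(digits = 2) does not produce a fixed two decimal places; format(round(x, 2), nsmall = 2) or sprintf("%.2f", x) is closer to that requirement.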
If you prefer significant digits to fixed digits then, the signif command might be useful:
> signif(1.12345, digits = 3)
[1] 1.12
> signif(12.12345, digits = 3)
[1] 12.1
> signif(12345.12345, digits = 3)
[1] 12300
Check functions prettyNum, format
To have trailing zeros (123.1240 for example), use sprintf(x, fmt='%#.4g')
The function formatC() can be used to format a number to two decimal places. Two decimal places are given by this function even when the resulting values include trailing zeros.
I'm using this variant to force printing K decimal places:
# format numeric value to K decimal places
formatDecimal <- function(x, k) format(round(x, k), trim = TRUE, nsmall = k)
Note that numeric objects in R are stored with double precision, which gives you (roughly) 16 decimal digits of precision - the rest will be noise. I grant that the number shown above is probably just for an example, but it is 22 digits long.
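The precision limit can be checked directly. This is a small sketch using base R only; the increment 1e-17 is an arbitrary value chosen to be smaller than the spacing between neighbouring doubles near 1.

```r
x <- 1.128347132904321674821

eps <- .Machine$double.eps   # 2.220446e-16, spacing of doubles just above 1
unchanged <- (x + 1e-17 == x)  # TRUE: a change this small cannot be represented
all_digits <- sprintf("%.22f", x)  # digits past roughly the 16th are representation noise

unchanged
# [1] TRUE
```

So typing 22 digits does no harm, but only about the first 16 are actually stored.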
Looks to me like it would be something like
library(tutoR)
format(1.128347132904321674821, 2)
Per a little online help.
if you just want to round a number or a list, simply use
round(data, 2)
Then data will be rounded to 2 decimal places.
I wrote this function, which could be improved but seems to work well in corner cases. For example, for 0.9995 the top-voted answer gives 1.00, which is incorrect. I fall back on that solution when the number has no decimals.
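round_correct <- function(x, digits, chars = TRUE) {
  if (grepl(x = x, pattern = "\\.")) {
    y <- as.character(x)
    # position of the decimal point in the character representation
    pos <- grep(unlist(strsplit(x = y, split = "")), pattern = "\\.", value = FALSE)
    # note: this truncates (does not round) to `digits` decimal places
    if (chars) {
      return(substr(x = y, start = 1, stop = pos + digits))
    }
    return(
      as.numeric(substr(x = y, start = 1, stop = pos + digits))
    )
  } else {
    # no decimal point: fall back on format(round())
    return(
      format(round(x, digits), nsmall = digits)
    )
  }
}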
Example:
round_correct(10.59648, digits = 2)
[1] "10.59"
round_correct(0.9995, digits = 2)
[1] "0.99"
round_correct(10, digits = 2)
[1] "10.00"
Here's my approach, covering units up to millions.
The digits parameter lets me adjust the minimum number of significant digits (integer + decimals). You could adjust the decimal rounding inside first.
library(dplyr)  # provides if_else()
number <- function(number) {
  result <- if_else(
    abs(number) < 1000000,
    format(
      number, digits = 3,
      big.mark = ".",
      decimal.mark = ","
    ),
    paste0(
      format(
        number / 1000000,
        digits = 3,
        drop0trailing = TRUE,
        big.mark = ".",
        decimal.mark = ","
      ),
      "MM"
    )
  )
  # result <- paste0("$", result)
  return(result)
}
library(dplyr)
# round the numbers
df <- df %>%
mutate(across(where(is.numeric), .fns = function(x) {format(round(x, 2), nsmall = 2)}))
Here I am changing all numeric values to have only 2 decimal places. If you need a different number of decimal places:
# round the numbers for k decimal places
df <- df %>%
mutate(across(where(is.numeric), .fns = function(x) {format(round(x, k), nsmall = k)}))
Replace the k with the desired number of decimal places
I'm working with variables resembling the data val values created below:
# data --------------------------------------------------------------------
data("mtcars")
val <- c(mtcars$wt, 10.55)
I'm cutting this variable in the following manner:
# Cuts --------------------------------------------------------------------
cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)
which produces the following results:
> table(res)
res
[ 1, 2) [ 2, 3) [ 3, 4) [ 4, 5) [ 5, 6)       6       7       8       9 [10,11]
      4       8      16       1       3       0       0       0       0       1
In the created output I would like to change the following:
I'm not interested in creating groups with one value. Ideally, I would like each group to have at least 3 or 4 values. Paradoxically, I can live with groups having 0 values, as those will be dropped later on when merging with my real data.
Any changes to the cutting mechanism have to work on a variable with integer values.
The cuts have to be pretty. I'm trying to avoid something like 1.23 - 2.35, even if those values would be the most sensible considering the distribution.
In effect, what I'm trying to achieve is this: make more or less even, pretty groups; if a really tiny group results, bump it together with the next group; and do not worry about empty groups.
Full code
For convenience, the full code is available below:
# Libs --------------------------------------------------------------------
Vectorize(require)(package = c("scales", "Hmisc"),
character.only = TRUE)
# data --------------------------------------------------------------------
data("mtcars")
val <- c(mtcars$wt, 10.55)
# Cuts --------------------------------------------------------------------
cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)
What I've tried
First approach
I tried to play with the eps.correct = 0 value in the pretty_breaks like in the code:
cut_breaks <- pretty_breaks(n = cuts, eps.correct = 0)(variable)
but none of the values gets me anywhere close.
Second approach
I've also tried using the m= 5 argument in the cut2 function but I keep on arriving at the same result.
Comment replies
My breaks function
I tried the mybreaks function, but I would have to put some work into it to get nice cuts for more bizarre variables. Broadly speaking, pretty_breaks cuts well for me; it's just the tiny groups that occur from time to time that are not desired.
> set.seed(1); require(scales)
> mybreaks <- function(x, n, r=0) {
+ unique(round(quantile(x, seq(0, 1, length=n+1)), r))
+ }
> x <- runif(n = 100)
> pretty_breaks(n = 5)(x)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> mybreaks(x = x, n = 5)
[1] 0 1
You could use the quantile() function as a relatively easy way to get similar numbers of observations in each of your groups.
For example, here's a function that takes a vector of values x, a desired number of groups n, and a desired rounding off point r for the breaks, and gives you suggested cut points.
mybreaks <- function(x, n, r=0) {
unique(round(quantile(x, seq(0, 1, length=n+1)), r))
}
cut_breaks <- mybreaks(val, 5)
res <- cut(val, cut_breaks, include.lowest=TRUE)
table(res)
[2,3] (3,4] (4,11]
8 16 5
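The quantile approach gives balanced groups but not necessarily evenly spaced, pretty edges. As an alternative sketch (not from the answer above), one could keep the pretty breaks and drop one edge of any non-empty bin holding fewer than min_n values, which merges it with the next bin, or with the previous one if it is the last bin. merge_small_bins and min_n are hypothetical names; the integer breaks 1:11 stand in for the pretty_breaks() output, and base cut() is used instead of cut2().

```r
val <- c(mtcars$wt, 10.55)

# Hypothetical helper: merge non-empty bins with fewer than min_n values
# into a neighbouring bin by removing one of their edges. Empty bins are ignored.
merge_small_bins <- function(x, breaks, min_n = 3) {
  repeat {
    counts <- as.vector(table(cut(x, breaks, include.lowest = TRUE, right = FALSE)))
    small <- which(counts > 0 & counts < min_n)
    if (length(small) == 0 || length(breaks) <= 2) break
    i <- small[1]
    # remove the upper edge of bin i, or its lower edge when it is the last bin
    drop <- if (i < length(counts)) i + 1 else i
    breaks <- breaks[-drop]
  }
  breaks
}

new_breaks <- merge_small_bins(val, breaks = 1:11, min_n = 3)
new_breaks
# [1]  1  2  3  4 11
counts <- table(cut(val, new_breaks, include.lowest = TRUE, right = FALSE))
# counts are 4, 8, 16 and 5 for [1,2), [2,3), [3,4) and [4,11]
```

All remaining edges come from the original pretty breaks, so the cuts stay round numbers, and the approach works equally on integer-valued variables. The merge direction and the min_n threshold are design choices that may need adjusting for other data.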