Formatting negative numbers with brackets - r

I would like to format my negative numbers in "Accounting" format, i.e. with brackets.
For example, I would like to format -1000000 as (1,000,000).
I know how to introduce the thousands separator:
prettyNum(-1000000, big.mark=",",scientific=F)
However, I am not sure how to introduce the brackets. I would like to apply the formatting to a whole vector, with only the negative numbers affected. Note that after introducing the thousands separator, the vector of numbers becomes a character vector, for example:
"-50,000" "50,000" "-50,000" "-49,979" "-48,778" "-45,279" "-41,321"
Any ideas? Thanks.

You can try my package formattable, which has a built-in function accounting that applies accounting format to a numeric vector.
> # devtools::install_github("renkun-ken/formattable")
> library(formattable)
> accounting(c(123456,-23456,-789123456))
[1] 123,456.00 (23,456.00) (789,123,456.00)
You can print the numbers as integers:
> accounting(c(123456,-23456,-789123456), format = "d")
[1] 123,456 (23,456) (789,123,456)
These numbers work with arithmetic calculations:
> money <- accounting(c(123456,-23456,-789123456), format = "d")
> money
[1] 123,456 (23,456) (789,123,456)
> money + 5000
[1] 128,456 (18,456) (789,118,456)
It also works in data.frame printing:
> data.frame(date = as.Date("2015-01-01") + 1:10,
             balance = accounting(cumsum(rnorm(10, 0, 100000))))
         date      balance
1  2015-01-02  (21,929.80)
2  2015-01-03 (246,927.59)
3  2015-01-04 (156,210.85)
4  2015-01-05 (135,122.80)
5  2015-01-06 (199,713.06)
6  2015-01-07  (91,938.03)
7  2015-01-08  (34,600.47)
8  2015-01-09   147,165.57
9  2015-01-10   180,443.31
10 2015-01-11   251,141.04

Another way, without regex:
x <- c(-50000, 50000, -50000, -49979, -48778, -45279, -41321)
x.comma <- prettyNum(abs(x), big.mark=',')
ifelse(x >= 0, x.comma, paste0('(', x.comma, ')'))
# [1] "(50,000)" "50,000" "(50,000)" "(49,979)" "(48,778)" "(45,279)" "(41,321)"
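The same idea can be wrapped in a small helper. This is only a minimal sketch, not part of the answer above: the name format_accounting and its digits argument are made up for illustration, and it uses formatC instead of prettyNum so that the number of decimals can be fixed.

```r
# Hypothetical helper: accounting-style formatting in base R.
format_accounting <- function(x, digits = 0) {
  # Format the magnitudes with a thousands separator and fixed decimals
  body <- formatC(abs(x), format = "f", digits = digits, big.mark = ",")
  # Wrap only the negative entries in brackets
  ifelse(x < 0, paste0("(", body, ")"), body)
}

format_accounting(c(-1000000, 50000, 0))
# [1] "(1,000,000)" "50,000"      "0"
```

The result is a character vector, so it is best applied at the very end, once all arithmetic is done.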

Here's an approach that gives the leading spaces:
x <- c(-10000000, -4444, 1, 333)
num <- gsub("^\\s+|\\s+$", "", prettyNum(abs(x), big.mark=",", scientific=FALSE))
num[x < 0] <- sprintf("(%s)", num[x < 0])
# right-justify to the width of the longest entry ("%0" with strings is platform-dependent)
sprintf(paste0("%", max(nchar(num)), "s"), num)
## [1] "(10,000,000)" "     (4,444)" "           1" "         333"

A very easy approach is using paste0 and sub. Here's a simple function for this:
my.format <- function(num){
  ind <- grepl("-", num)
  num[ind] <- paste0("(", sub("-", "", num[ind]), ")")
  num
}
> num <- c("-50,000", "50,000", "-50,000", "-49,979", "-48,778", "-45,279", "-41,321")
> my.format(num)
[1] "(50,000)" "50,000" "(50,000)" "(49,979)" "(48,778)" "(45,279)" "(41,321)"
If you want to reverse the situation, let's say, you have a vector like this:
num2 <- my.format(num)
and you want to replace (·) by -, then try
sub(")", "", sub("\\(", "-", num2))
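If the goal of reversing is to get numbers back for further calculation, something along these lines might work. It is a rough sketch under the assumption that the strings use "," as the thousands separator and brackets for negatives; parse_accounting is a made-up name.

```r
# Hypothetical helper: turn accounting-style strings back into numbers.
parse_accounting <- function(s) {
  # An entry is negative when it is wrapped in brackets
  negative <- grepl("^\\(.*\\)$", s)
  # Drop brackets and thousands separators before converting
  value <- as.numeric(gsub("[(),]", "", s))
  ifelse(negative, -value, value)
}

parsed <- parse_accounting(c("(50,000)", "50,000", "(1,234.5)"))
# parsed holds -50000, 50000 and -1234.5 as a numeric vector
```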

Maybe this?:
a <- prettyNum(-1000000, big.mark=",", scientific=FALSE)
paste("(", sub("-", "", a), ")", sep = "")

Related

Add a column on a dataframe

I have an R problem if you can help.
x <- data.frame("LocationCode" = c("ESC3","RIECAA6","SJHMAU","RIE104","SJH11","SJHAE","RIEAE1","WGH54","RIE205","GSBROB"), "HospitalNumber" = c("701190923R","2905451068","700547389X","AN11295201","1204541612","104010665","800565884R","620063158W","600029720K","1112391223"),"DisciplineName" = c("ESC Biochemistry", "RIE Haematology","SJH Biochemistry","RIE Biochemistry","SJH Biochemistry","WGH Biochemistry","ESC Biochemistry","WGH Biochemistry","SJH Biochemistry","RIE Haematology"))
From the data frame above, I want to add a new column (CRN) containing the "HospitalNumber" values that consist of 9 digits followed by 1 letter (e.g. 701190923R), and another column (TIT) containing the remaining values, which do not meet the first criterion.
You can do this in base R using the following code:
# Identify cases which match 9 digits then one letter
CRMMatch <- grepl("^\\d{9}[[:alpha:]]$", as.character(x$HospitalNumber))
#Create columns from Hospital number among the matches or those that do not match
x$CRN[CRMMatch] <- as.character(x$HospitalNumber)[CRMMatch]
x$TIT[!CRMMatch] <- as.character(x$HospitalNumber)[!CRMMatch]
# clean up by removing the variable created of matches
rm(CRMMatch)
A dplyr version could be
library(dplyr)
x <- x %>%
  mutate(CRN = if_else(grepl("^\\d{9}[[:alpha:]]$", as.character(HospitalNumber)),
                       as.character(HospitalNumber), NA_character_),
         TIT = if_else(!grepl("^\\d{9}[[:alpha:]]$", as.character(HospitalNumber)),
                       as.character(HospitalNumber), NA_character_))
You can find the positions of the matching values with stringr:
library(stringr)
str_which(x$HospitalNumber,"[:digit:][:alpha:]")
and you get:
> str_which(x$HospitalNumber,"[:digit:][:alpha:]")
[1] 1 3 7 8 9
Then you know which positions you need and which you don't.
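Note that the pattern "[:digit:][:alpha:]" only asks for a digit followed by a letter somewhere in the string, which happens to be enough for this sample. A rough base-R sketch of the difference from the anchored 9-digits-plus-letter pattern used in the other answers (the ID "12A" is invented purely for illustration):

```r
ids <- c("701190923R", "2905451068", "12A", "104010665")

# Loose: a digit immediately followed by a letter, anywhere in the string
loose <- grepl("[[:digit:]][[:alpha:]]", ids)

# Strict: exactly 9 digits followed by exactly one letter, nothing else
strict <- grepl("^[[:digit:]]{9}[[:alpha:]]$", ids)

data.frame(ids, loose, strict)
```

Here "12A" passes the loose test but fails the strict one, so the anchored pattern is the safer choice when the data may contain other ID shapes.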
Quite similar to Kerry Jackson's approach but using ifelse in base R. I have also converted your x$HospitalNumber from factor to character from the start, assuming that this is what you really want:
x[2] <- as.character( x[ , 2 ] )
x$CRN <- ifelse( grepl( "^\\d{9}[[:alpha:]]$", x$HospitalNumber) , x$HospitalNumber, "" )
x$TIT <- ifelse( x$CRN != "", "", x$HospitalNumber )
gives you
> x
   LocationCode HospitalNumber   DisciplineName        CRN        TIT
1          ESC3     701190923R ESC Biochemistry 701190923R
2       RIECAA6     2905451068  RIE Haematology            2905451068
3        SJHMAU     700547389X SJH Biochemistry 700547389X
4        RIE104     AN11295201 RIE Biochemistry            AN11295201
5         SJH11     1204541612 SJH Biochemistry            1204541612
6         SJHAE      104010665 WGH Biochemistry             104010665
7        RIEAE1     800565884R ESC Biochemistry 800565884R
8         WGH54     620063158W WGH Biochemistry 620063158W
9        RIE205     600029720K SJH Biochemistry 600029720K
10       GSBROB     1112391223  RIE Haematology            1112391223

R - How to change values with decimals into a different form

I have a variable that takes values of the form x.1, x.2 or x.3 currently with x being any number followed by the decimal point.
I would like to convert x.1 to x.333, x.2 to x.666 and x.3 to x.999 or in this case I would assume it would be rounded up to the whole number.
Context: running regression analysis containing a variable of innings pitched (baseball pitchers) which currently have data values of the .1, .2, .3 form above.
Help would be much appreciated!
You can use x %% 1 to get the fractional part of a number in R. Then just multiply that by 3.333 and add the result back on to the integer part of your number to get total innings pitched.
x <- 2.3
as.integer(x) + (x %% 1 * 3.333)
[1] 2.9999
(Use 3.333 instead of 0.333 to move the decimal.)
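Since .1 and .2 in innings pitched stand for one and two outs, an exact conversion divides the outs by 3 rather than multiplying by 3.333. A minimal sketch, assuming the values really are in that notation; ip_to_innings is a made-up name, and round() guards against floating-point noise in the fractional part:

```r
# Hypothetical helper: convert innings-pitched notation (x.1, x.2) to true innings.
ip_to_innings <- function(ip) {
  whole <- trunc(ip)                 # completed innings
  outs  <- round((ip - whole) * 10)  # digit after the decimal point = outs recorded
  whole + outs / 3
}

innings <- ip_to_innings(c(5.1, 6.2, 7.0, 2.3))
# 5.1 becomes 5 + 1/3, 6.2 becomes 6 + 2/3, 7.0 stays 7, and 2.3 rolls over to 3
```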
Depending on the exact context, it could be nice to keep the component parts -- if that's the case, I would be a little verbose and utilize tidyr and dplyr:
library(tidyr)
library(dplyr)
vec <- c("123.1", "456.2", "789.3")
df <- data.frame(vec)
df %>%
  separate(vec, into = c("before_dot", "after_dot"), remove = FALSE, convert = TRUE) %>%
  mutate(after_dot_times_333 = after_dot * 333,
         new_var = paste(before_dot, after_dot_times_333, sep = "."))
#     vec before_dot after_dot after_dot_times_333 new_var
# 1 123.1        123         1                 333 123.333
# 2 456.2        456         2                 666 456.666
# 3 789.3        789         3                 999 789.999
Alternatively, you could accomplish this in one line:
sapply(strsplit(vec, "\\."), function(x) paste(x[1], as.numeric(x[2]) * 333, sep = "."))

Replace specific characters in a variable in data frame in R

I want to replace every ",", "-", ")", "(" and space with "." in the variable DMA.NAME of the example data frame. I referred to three posts and tried their approaches, but all failed:
Replacing column values in data frame, not included in list
R replace all particular values in a data frame
Replace characters from a column of a data frame R
Approach 1
> shouldbecomeperiod <- c$DMA.NAME %in% c("-", ",", " ", "(", ")")
c$DMA.NAME[shouldbecomeperiod] <- "."
Approach 2
> removetext <- c("-", ",", " ", "(", ")")
c$DMA.NAME <- gsub(removetext, ".", c$DMA.NAME)
c$DMA.NAME <- gsub(removetext, ".", c$DMA.NAME, fixed = TRUE)
Warning message:
In gsub(removetext, ".", c$DMA.NAME) :
argument 'pattern' has length > 1 and only the first element will be used
Approach 3
> c[c == c(" ", ",", "(", ")", "-")] <- "."
Sample data frame
> df
      DMA.CODE                  DATE                   DMA.NAME count
111         22 8/14/2014 12:00:00 AM               Columbus, OH     1
112         23 7/15/2014 12:00:00 AM Orlando-Daytona Bch-Melbrn     1
79          18 7/30/2014 12:00:00 AM        Boston (Manchester)     1
99          22 8/20/2014 12:00:00 AM               Columbus, OH     1
112.1       23 7/15/2014 12:00:00 AM Orlando-Daytona Bch-Melbrn     1
208         27 7/31/2014 12:00:00 AM       Minneapolis-St. Paul     1
I know the problem: gsub uses only the first element of the pattern vector. The other two approaches compare each whole value against the listed characters instead of searching within each value for them.
You can use the POSIX character classes [:punct:] and [:space:] inside a bracket expression ([...]) like this:
df <- data.frame(
DMA.NAME = c(
"Columbus, OH",
"Orlando-Daytona Bch-Melbrn",
"Boston (Manchester)",
"Columbus, OH",
"Orlando-Daytona Bch-Melbrn",
"Minneapolis-St. Paul"),
stringsAsFactors=F)
##
> gsub("[[:punct:][:space:]]+","\\.",df$DMA.NAME)
[1] "Columbus.OH" "Orlando.Daytona.Bch.Melbrn" "Boston.Manchester." "Columbus.OH"
[5] "Orlando.Daytona.Bch.Melbrn" "Minneapolis.St.Paul"
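If only the characters listed in the question should be replaced, and other punctuation such as the "." in "St." should be left alone, they can be spelled out in the bracket expression instead. A small sketch on a few of the sample names (the vector name dma is arbitrary):

```r
dma <- c("Columbus, OH", "Boston (Manchester)", "Minneapolis-St. Paul")

# "-" comes first inside [...] so that it is taken literally
each <- gsub("[-,() ]", ".", dma)
each
# [1] "Columbus..OH"         "Boston..Manchester."  "Minneapolis.St..Paul"

# Adding "+" collapses a run of listed characters into a single "."
runs <- gsub("[-,() ]+", ".", dma)
runs
# [1] "Columbus.OH"          "Boston.Manchester."   "Minneapolis.St..Paul"
```

The existing "." in "St." is kept here, which is why the last entry still shows two dots.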
If your data frame is big, you might want to look at this fast function from the stringi package. It replaces every character of a given class with another string. Here the class is L (letters, inside {}), and the capital P (before {}) takes the complement of that set, so every non-letter character matches. merge = TRUE indicates that consecutive matches should be merged into a single one.
require(stringi)
stri_replace_all_charclass(df$DMA.NAME, "\\P{L}",".", merge=T)
## [1] "Columbus.OH" "Orlando.Daytona.Bch.Melbrn" "Boston.Manchester." "Columbus.OH"
## [5] "Orlando.Daytona.Bch.Melbrn" "Minneapolis.St.Paul"
And some benchmarks:
x <- sample(df$DMA.NAME, 1000, T)
gsubFun <- function(x){
  gsub("[[:punct:][:space:]]+","\\.",x)
}
striFun <- function(x){
  stri_replace_all_charclass(x, "\\P{L}",".", T)
}
require(microbenchmark)
microbenchmark(gsubFun(x), striFun(x))
Unit: microseconds
       expr      min        lq   median        uq       max neval
 gsubFun(x) 3472.276 3511.0015 3538.097 3573.5835 11039.984   100
 striFun(x)  877.259  893.3945  907.769  929.8065  3189.017   100

r ifelse date not adding days

I need to compute a condition over a date column in R. A table would be:
PIL_final1<-data.frame( prior_day1_cart=c(4,8),
prior_day1_comp=c('2014-06-03','2014-06-07'),
dia_lim_23_cart=c('201-07-30','201-07-30') )
PIL_final1$prior_day1_comp<-as.Date(PIL_final1$prior_day1_comp, format='%Y-%m-%d')
PIL_final1$dia_lim_23_cart<-as.Date(PIL_final1$dia_lim_23_cart, format='%Y-%m-%d')
So I use ifelse:
PIL_final1$llamar_dia<-ifelse(PIL_final1$prior_day1_cart+6>23,
PIL_final1$dia_lim_23_cart ,
PIL_final1$prior_day1_comp+6)
But I get:
> PIL_final1
  prior_day1_cart prior_day1_comp dia_lim_23_cart llamar_dia
1               4      2014-06-03      0201-07-30      16230
2               8      2014-06-07      0201-07-30      16234
And if I do:
> PIL_final1$prior_day1_comp+6
[1] "2014-06-09" "2014-06-13"
I get the right results.
How can I do the ifelse and get the date? thanks.
Also if I try this, I still get a number (although different):
> PIL_final1$llamar_dia<-ifelse(PIL_final1$prior_day1_cart+6>23,
+ PIL_final1$dia_lim_23_cart ,
+ as.Date(PIL_final$prior_day1_comp+6,format="%Y-%m-%d"))
> PIL_final1
  prior_day1_cart prior_day1_comp dia_lim_23_cart llamar_dia
1               4      2014-06-03      0201-07-30      16376
2               8      2014-06-07      0201-07-30      16377
Edit:
Also if I do this:
> as.Date(ifelse(PIL_final1$prior_day1_cart+6>23, PIL_final1$dia_lim_23_cart ,
+ PIL_final1$prior_day1_comp+6), format="%Y-%m-%d", origin="1970-01-01")
[1] "2014-06-09" "2014-06-13"
I get the right results, but if I replace the ifelse with the vector result, I get the wrong dates:
> PIL_final1$llamar_dia<-ifelse(PIL_final1$prior_day1_cart+6>23,
+ PIL_final1$dia_lim_23_cart ,
+ PIL_final$prior_day1_comp+6)
> as.Date(PIL_final1$llamar_dia, format="%Y-%m-%d", origin="1970-01-01")
[1] "2014-11-02" "2014-11-03"
From ?ifelse:
The mode of the result may depend on the value of test (see the examples) [...] Sometimes it is better to use a construction such as
ifelse(test, yes, no) ~~ (tmp <- yes; tmp[!test] <- no[!test]; tmp)
Applying this :
dat$d3 <- with(dat, {
  test <- x + 6 > 23
  tmp <- d2                      # the "yes" values
  tmp[!test] <- (d1 + 6)[!test]  # the "no" values wherever test is FALSE
  tmp
})
dat
  x         d1         d2         d3
1 4 2014-06-03 0201-07-30 2014-06-09
2 8 2014-06-07 0201-07-30 2014-06-13
Maybe you should modify this to handle missing values in test.
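Note that I changed the variable names, since yours are long to type and a real source of errors. The data, with both columns converted to Date as in the question:
dat <- data.frame(x = c(4, 8),
                  d1 = c('2014-06-03', '2014-06-07'),
                  d2 = c('201-07-30', '201-07-30'))
dat$d1 <- as.Date(dat$d1, format = '%Y-%m-%d')
dat$d2 <- as.Date(dat$d2, format = '%Y-%m-%d')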
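The construction quoted from ?ifelse can be turned into a small reusable function that also copes with missing values in test. This is only a sketch under the assumption that test, yes and no all have the same length; safe_ifelse is a made-up name, and the dates below are illustrative rather than the asker's data.

```r
# Hypothetical helper: like ifelse(), but keeps the class of `yes` (e.g. Date).
safe_ifelse <- function(test, yes, no) {
  stopifnot(length(yes) == length(test), length(no) == length(test))
  out <- yes
  pick_no <- which(!test)      # which() drops NA, so the assignment is safe
  out[pick_no] <- no[pick_no]
  out[is.na(test)] <- NA       # an unknown test gives an unknown result
  out
}

d1 <- as.Date(c("2014-06-03", "2014-06-07"))
d2 <- as.Date(c("2014-07-30", "2014-07-30"))
x  <- c(4, 20)

res <- safe_ifelse(x + 6 > 23, d2, d1 + 6)
res
# [1] "2014-06-09" "2014-07-30"
```

Because the result starts out as a copy of yes, its Date class is preserved instead of being reduced to the underlying day count.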

Represent numeric value with typical dollar amount format

I have a data frame storing dollar amounts. It looks like this:
> a
   cost
1 1e+05
2 2e+05
I would like it to be displayed like this:
> a
      cost
1 $100,000
2 $200,000
How can I do that in R?
You can assign an S3 class to the column and define print and format methods for it:
DF <- data.frame(cost=c(1e4, 2e5))
# assign a class
class(DF$cost) <- c("money", class(DF$cost))
# S3 print method for the class
print.money <- function(x, ...) {
  print.default(paste0("$", formatC(as.numeric(x), format="f", digits=2, big.mark=",")))
}
# format method, which is necessary for formatting in a data.frame
format.money <- function(x, ...) {
  paste0("$", formatC(as.numeric(x), format="f", digits=2, big.mark=","))
}
DF
#         cost
#1  $10,000.00
#2 $200,000.00
This will get you everything except the commas:
> sprintf("$%.2f", seq(100,100000,by=10000)/7)
[1] "$14.29" "$1442.86" "$2871.43" "$4300.00" "$5728.57" "$7157.14" "$8585.71" "$10014.29" "$11442.86" "$12871.43"
Getting those is pretty complicated, as shown in these questions:
How can I format currency with commas in C?
How to format a number from 1123456789 to 1,123,456,789 in C?
Luckily, this is implemented in the scales package:
library('scales')
> dollar_format()(c(100, 0.23, 1.456565, 2e3))
## [1] "$100.00" "$0.23" "$1.46" "$2,000.00"
> dollar_format()(c(1:10 * 10))
## [1] "$10" "$20" "$30" "$40" "$50" "$60" "$70" "$80" "$90" "$100"
> dollar(c(100, 0.23, 1.456565, 2e3))
## [1] "$100.00" "$0.23" "$1.46" "$2,000.00"
> dollar(c(1:10 * 10))
## [1] "$10" "$20" "$30" "$40" "$50" "$60" "$70" "$80" "$90" "$100"
> dollar(10^(1:8))
## [1] "$10" "$100" "$1,000" "$10,000" "$100,000" "$1,000,000" "$10,000,000" "$100,000,000"
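For comparison, the commas can also be had without any extra package, since formatC in base R accepts a big.mark argument. A minimal sketch on the same values as the sprintf example (the variable names are arbitrary):

```r
amounts <- seq(100, 100000, by = 10000) / 7

# Fixed two decimals, "," as thousands separator, "$" pasted in front
dollars <- paste0("$", formatC(amounts, format = "f", digits = 2, big.mark = ","))
head(dollars, 2)
# [1] "$14.29"    "$1,442.86"
```

The result is a plain character vector, so unlike a formatted numeric class it cannot be used in further arithmetic.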
You can use the currency() function from the formattable package. With OP's example
a <- data.frame(cost = c(1e+05, 2e+05))
a
cost
1 1e+05
2 2e+05
library(formattable)
a$cost <- currency(a$cost, digits = 0L)
a
cost
1 $100,000
2 $200,000
By default, 2 digits after the decimal point are shown. This has been overridden using the digits parameter to meet OP's expectations.
The benefit of formattable is that numbers are still numbers even with a format attached, e.g.,
a$cost2 <- 2 * a$cost
a
cost cost2
1 $100,000 $200,000
2 $200,000 $400,000
A very simple way is
library(priceR)
values <- c(1e5, 2e5)
format_dollars(values)
# [1] "$100,000" "$200,000"
Notes
Add decimal places with format_dollars(values, 2) i.e.
"$100,000.00" "$200,000.00"
For other currencies use format_currency(values, "€") which gives "€100,000" "€200,000" etc

Resources