I have a data frame storing dollar amounts. It looks like this:
> a
cost
1 1e+05
2 2e+05
I would like it to be shown like this:
> a
cost
1 $100,000
2 $200,000
How to do that in R?
DF <- data.frame(cost=c(1e4, 2e5))
#assign a class
class(DF$cost) <- c("money", class(DF$cost))
#S3 print method for the class
print.money <- function(x, ...) {
print.default(paste0("$", formatC(as.numeric(x), format="f", digits=2, big.mark=",")))
}
#format method, which is necessary for formatting in a data.frame
format.money <- function(x, ...) {
paste0("$", formatC(as.numeric(x), format="f", digits=2, big.mark=","))
}
DF
# cost
#1 $10,000.00
#2 $200,000.00
This will get you everything except the commas:
> sprintf("$%.2f", seq(100,100000,by=10000)/7)
[1] "$14.29" "$1442.86" "$2871.43" "$4300.00" "$5728.57" "$7157.14" "$8585.71" "$10014.29" "$11442.86" "$12871.43"
Getting those is pretty complicated, as shown in these questions:
How can I format currency with commas in C?
How to format a number from 1123456789 to 1,123,456,789 in C?
Luckily, this is implemented in the scales package:
library('scales')
> dollar_format()(c(100, 0.23, 1.456565, 2e3))
## [1] "$100.00" "$0.23" "$1.46" "$2,000.00"
> dollar_format()(c(1:10 * 10))
## [1] "$10" "$20" "$30" "$40" "$50" "$60" "$70" "$80" "$90" "$100"
> dollar(c(100, 0.23, 1.456565, 2e3))
## [1] "$100.00" "$0.23" "$1.46" "$2,000.00"
> dollar(c(1:10 * 10))
## [1] "$10" "$20" "$30" "$40" "$50" "$60" "$70" "$80" "$90" "$100"
> dollar(10^(1:8))
## [1] "$10" "$100" "$1,000" "$10,000" "$100,000" "$1,000,000" "$10,000,000" "$100,000,000"
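If adding a package is not an option, the commas can also come from base R: formatC() accepts a big.mark argument, which the format method in the first answer already relies on. A minimal sketch, where fmt_dollar is a hypothetical helper name and not part of any package:

```r
# Hypothetical helper (not from any package): dollar strings using base R only.
fmt_dollar <- function(x, digits = 0) {
  # format = "f" avoids scientific notation; big.mark inserts the thousands separator
  paste0("$", formatC(x, format = "f", digits = digits, big.mark = ","))
}

fmt_dollar(c(1e5, 2e5))
# [1] "$100,000" "$200,000"
fmt_dollar(1234.5, digits = 2)
# [1] "$1,234.50"
```

Like dollar(), this returns a character vector, so it is best applied only at the point of display.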
You can use the currency() function from the formattable package. With OP's example
a <- data.frame(cost = c(1e+05, 2e+05))
a
cost
1 1e+05
2 2e+05
library(formattable)
a$cost <- currency(a$cost, digits = 0L)
a
cost
1 $100,000
2 $200,000
By default, 2 digits after the decimal point are shown. This has been overruled using the digits parameter to meet OP's expectations.
The benefit of formattable is that numbers are still numbers even with a format attached, e.g.,
a$cost2 <- 2 * a$cost
a
cost cost2
1 $100,000 $200,000
2 $200,000 $400,000
A very simple way is
library(priceR)
values <- c(1e5, 2e5)
format_dollars(values)
# [1] "$100,000" "$200,000"
Notes
Add decimal places with format_dollars(values, 2) i.e.
"$100,000.00" "$200,000.00"
For other currencies use format_currency(values, "€") which gives "€100,000" "€200,000" etc
Related
In an input data structure
data.frame(stockname = c("google", "amazon"), open= c(30, 40), close = c(32, 48))
How can I convert the integer in every cell to a percentage of the grand total, i.e. the sum of all numeric values in the data frame?
The numeric values of the input data sum to 150, so the expected output data frame is
data.frame(stockname = c("google", "amazon"), open= c(20%, 26.7%), close = c(21.3%, 32%))
This can be done using prop.table:
df[-1] <- prop.table(df[-1])
df
# stockname open close
# 1 google 0.2000000 0.2133333
# 2 amazon 0.2666667 0.3200000
If you're interested in formatting the output as well, look at sprintf. Starting with the source data again, try:
df[-1] <- sprintf("%.1f%%", unlist(prop.table(df[-1])) * 100)
df
# stockname open close
# 1 google 20.0% 21.3%
# 2 amazon 26.7% 32.0%
Perhaps this is what you are after
df[-1] <- df[-1] / sum(df[-1])
which gives
> df
stockname open close
1 google 0.2000000 0.2133333
2 amazon 0.2666667 0.3200000
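The answers above assume df already exists. A self-contained sketch of both steps, starting from the question's data frame and selecting the numeric columns by type instead of by position (num_cols and total are names chosen here for illustration):

```r
df <- data.frame(stockname = c("google", "amazon"),
                 open = c(30, 40), close = c(32, 48))

# Pick the numeric columns by type, so the name column can sit anywhere
num_cols <- sapply(df, is.numeric)
total <- sum(df[num_cols])   # grand total over all numeric cells: 150

# Replace each numeric column with a percentage string of the grand total
df[num_cols] <- lapply(df[num_cols],
                       function(col) sprintf("%.1f%%", 100 * col / total))
df
```

After this step the columns are character vectors, so keep a numeric copy if further arithmetic is needed.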
I have a variable that takes values of the form x.1, x.2 or x.3 currently with x being any number followed by the decimal point.
I would like to convert x.1 to x.333, x.2 to x.666 and x.3 to x.999 or in this case I would assume it would be rounded up to the whole number.
Context: running regression analysis containing a variable of innings pitched (baseball pitchers) which currently have data values of the .1, .2, .3 form above.
Help would be much appreciated!
You can use x %% 1 to get the fractional part of a number in R. Then just multiply that by 3.333 and add the result back on to the integer part of your number to get total innings pitched.
x <- 2.3
as.integer(x) + (x %% 1 * 3.333)
[1] 2.9999
(Use 3.333 instead of 0.333 to move the decimal.)
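A vectorised variant of the same idea, as a sketch: it assumes the digit after the decimal point counts thirds of an inning (the x.1, x.2, x.3 convention from the question) and uses exact thirds rather than the 3.333 multiplier. ip_to_decimal is a hypothetical name.

```r
# Hypothetical helper: innings-pitched notation (x.1, x.2, x.3) to decimal innings.
ip_to_decimal <- function(ip) {
  whole  <- trunc(ip)                  # completed innings
  thirds <- round((ip - whole) * 10)   # round() absorbs floating-point noise
  whole + thirds / 3                   # x.3 becomes the next whole inning
}

ip_to_decimal(c(2.1, 2.2, 2.3, 5))
```

With exact thirds, x.3 comes out as a whole number instead of x.9999, which matches the rounding the question expects.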
Depending on the exact context, it could be nice to keep the component parts -- if that's the case, I would be a little verbose and utilize tidyr and dplyr:
library(tidyr)
library(dplyr)
vec <- c("123.1", "456.2", "789.3")
df <- data.frame(vec)
df %>%
separate(vec, into = c("before_dot", "after_dot"), remove = FALSE, convert = TRUE) %>%
mutate(after_dot_times_333 = after_dot * 333,
new_var = paste(before_dot, after_dot_times_333, sep = "."))
# vec before_dot after_dot after_dot_times_333 new_var
# 1 123.1 123 1 333 123.333
# 2 456.2 456 2 666 456.666
# 3 789.3 789 3 999 789.999
Alternatively, you could accomplish this in one line:
sapply(strsplit(vec, "\\."), function(x) paste(x[1], as.numeric(x[2]) * 333, sep = "."))
I need to compute a condition over a date column in R. A table would be:
PIL_final1<-data.frame( prior_day1_cart=c(4,8),
prior_day1_comp=c('2014-06-03','2014-06-07'),
dia_lim_23_cart=c('201-07-30','201-07-30') )
PIL_final1$prior_day1_comp<-as.Date(PIL_final1$prior_day1_comp, format='%Y-%m-%d')
PIL_final1$dia_lim_23_cart<-as.Date(PIL_final1$dia_lim_23_cart, format='%Y-%m-%d')
So I use ifelse:
PIL_final1$llamar_dia<-ifelse(PIL_final1$prior_day1_cart+6>23,
PIL_final1$dia_lim_23_cart ,
PIL_final1$prior_day1_comp+6)
But I get:
> PIL_final1
prior_day1_cart prior_day1_comp dia_lim_23_cart llamar_dia
1 4 2014-06-03 0201-07-30 16230
2 8 2014-06-07 0201-07-30 16234
And if I do:
> PIL_final1$prior_day1_comp+6
[1] "2014-06-09" "2014-06-13"
I get the right results.
How can I do the ifelse and get the date? thanks.
Also if I try this, I still get a number (although different):
> PIL_final1$llamar_dia<-ifelse(PIL_final1$prior_day1_cart+6>23,
+ PIL_final1$dia_lim_23_cart ,
+ as.Date(PIL_final$prior_day1_comp+6,format="%Y-%m-%d"))
> PIL_final1
prior_day1_cart prior_day1_comp dia_lim_23_cart llamar_dia
1 4 2014-06-03 0201-07-30 16376
2 8 2014-06-07 0201-07-30 16377
Edit:
Also if I do this:
> as.Date(ifelse(PIL_final1$prior_day1_cart+6>23, PIL_final1$dia_lim_23_cart ,
+ PIL_final1$prior_day1_comp+6), format="%Y-%m-%d", origin="1970-01-01")
[1] "2014-06-09" "2014-06-13"
I get the right results, but if I first store the ifelse result in the column and then convert that column, I get the wrong dates:
> PIL_final1$llamar_dia<-ifelse(PIL_final1$prior_day1_cart+6>23,
+ PIL_final1$dia_lim_23_cart ,
+ PIL_final$prior_day1_comp+6)
> as.Date(PIL_final1$llamar_dia, format="%Y-%m-%d", origin="1970-01-01")
[1] "2014-11-02" "2014-11-03"
From ?ifelse:
The mode of the result may depend on the value of test (see the examples) ... Sometimes it is better to use a construction such as
ifelse(test, yes, no) ~~ (tmp <- yes; tmp[!test] <- no[!test]; tmp)
Applying this :
dat$d3 <-
with(dat,{
test <- x+6>23
#yes: d2, no: d1+6
tmp <- d2; tmp[!test] <- (d1+6)[!test]; tmp
})
dat
x d1 d2 d3
1 4 2014-06-03 0201-07-30 2014-06-09
2 8 2014-06-07 0201-07-30 2014-06-13
Maybe you should modify this to handle missing values in test.
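Note that I changed the variable names, since yours are long to type and a real source of errors. The data, with both date columns converted to Date:
dat <- data.frame( x=c(4,8),
d1=as.Date(c('2014-06-03','2014-06-07'), format='%Y-%m-%d'),
d2=as.Date(c('201-07-30','201-07-30'), format='%Y-%m-%d') )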
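The construction from ?ifelse can be wrapped in a small function that also handles missing values in test. This is only a sketch: date_ifelse is a hypothetical name, and the example dates are made up for illustration.

```r
# Hypothetical helper: like ifelse(), but keeps the Date class of `yes`/`no`.
date_ifelse <- function(test, yes, no) {
  n   <- length(test)
  out <- rep(yes, length.out = n)      # start from `yes`; rep() keeps the Date class
  no  <- rep(no,  length.out = n)
  pick_no <- !is.na(test) & !test      # positions where `no` should win
  out[pick_no] <- no[pick_no]
  out[is.na(test)] <- NA               # a missing test gives a missing date
  out
}

x  <- c(4, 8, NA, 20)
d1 <- as.Date(c("2014-06-03", "2014-06-07", "2014-06-10", "2014-06-11"))
d2 <- as.Date("2014-07-30")

res <- date_ifelse(x + 6 > 23, d2, d1 + 6)
res
```

Because the result is built by subassignment into a Date vector, it never passes through ifelse() and so never drops to plain numbers.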
I would like to format my negative numbers in "Accounting" format, i.e. with brackets.
For example, I would like to format -1000000 as (1,000,000).
I know the way of introducing thousands-separator:
prettyNum(-1000000, big.mark=",",scientific=F)
However, I am not sure how to introduce the brackets. I would like to be able to apply the formatting to a whole vector, but I want only the negative numbers to be affected. Note that after introducing the thousands separator, the vector of numbers is now a character vector, for example:
"-50,000" "50,000" "-50,000" "-49,979" "-48,778" "-45,279" "-41,321"
Any ideas? Thanks.
You can try my package formattable which has a built-in function accounting to apply accounting format to numeric vector.
> # devtools::install_github("renkun-ken/formattable")
> library(formattable)
> accounting(c(123456,-23456,-789123456))
[1] 123,456.00 (23,456.00) (789,123,456.00)
You can print the numbers as integers:
> accounting(c(123456,-23456,-789123456), format = "d")
[1] 123,456 (23,456) (789,123,456)
These numbers work with arithmetic calculations:
> money <- accounting(c(123456,-23456,-789123456), format = "d")
> money
[1] 123,456 (23,456) (789,123,456)
> money + 5000
[1] 128,456 (18,456) (789,118,456)
It also works in data.frame printing:
> data.frame(date = as.Date("2015-01-01") + 1:10,
balance = accounting(cumsum(rnorm(10, 0, 100000))))
date balance
1 2015-01-02 (21,929.80)
2 2015-01-03 (246,927.59)
3 2015-01-04 (156,210.85)
4 2015-01-05 (135,122.80)
5 2015-01-06 (199,713.06)
6 2015-01-07 (91,938.03)
7 2015-01-08 (34,600.47)
8 2015-01-09 147,165.57
9 2015-01-10 180,443.31
10 2015-01-11 251,141.04
Another way, without regex:
x <- c(-50000, 50000, -50000, -49979, -48778, -45279, -41321)
x.comma <- prettyNum(abs(x), big.mark=',')
ifelse(x >= 0, x.comma, paste0('(', x.comma, ')'))
# [1] "(50,000)" "50,000" "(50,000)" "(49,979)" "(48,778)" "(45,279)" "(41,321)"
Here's an approach that gives the leading spaces:
x <- c(-10000000, -4444, 1, 333)
num <- gsub("^\\s+|\\s+$", "", prettyNum(abs(x), big.mark=",", scientific=FALSE))
num[x < 0] <- sprintf("(%s)", num[x < 0])
sprintf(paste0("%", max(nchar(num)), "s"), num)
## [1] "(10,000,000)" "     (4,444)" "           1" "         333"
A very easy approach is using paste0 and sub. Here's a simple function for this:
my.format <- function(num){
ind <- grepl("-", num)
num[ind] <- paste0("(", sub("-", "", num[ind]), ")")
num
}
> num <- c("-50,000", "50,000", "-50,000", "-49,979", "-48,778", "-45,279", "-41,321")
> my.format(num)
[1] "(50,000)" "50,000" "(50,000)" "(49,979)" "(48,778)" "(45,279)" "(41,321)"
If you want to reverse the situation, let's say, you have a vector like this:
num2 <- my.format(num)
and you want to replace (·) by -, then try
sub(")", "", sub("(", "-", num2, fixed=TRUE), fixed=TRUE)
Maybe this?:
a <- prettyNum(-1000000, big.mark=",", scientific=FALSE)
paste("(", sub("-", "", a), ")", sep = "")
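The pieces from the answers above (abs(), a thousands separator, brackets only for negatives) can be combined into one vectorised function that starts from the numbers rather than from already formatted strings. A sketch in base R; acct_format is a hypothetical name:

```r
# Hypothetical helper: accounting-style strings, brackets for negative values only.
acct_format <- function(x, digits = 0) {
  # format = "f" avoids scientific notation such as 1e+06
  body <- formatC(abs(x), format = "f", digits = digits, big.mark = ",")
  ifelse(x < 0, paste0("(", body, ")"), body)
}

acct_format(c(-1000000, 50000, -49979))
# [1] "(1,000,000)" "50,000"      "(49,979)"
```

The result is a character vector, so unlike formattable's accounting() it cannot be used in further arithmetic.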
I have those 2 lists :
> FHM_CS
$X3
[1] 100
$X5
[1] 100
$X7
[1] 54.23706 63.48137 51.04026 60.14302 70.39396 56.59812 75.41480 88.26871 70.96976 54.20140 63.43252
[12] 51.00868 60.10348 70.33980 56.56310 75.36522 88.20079 70.92585
$X9
[1] 38.63259 27.74551 21.17788 100.00000 73.08030 55.78148 85.86148 38.56665 27.71148 21.15804
[11] 72.99067 55.72924 85.78107
$XAS
[1] 0
$XPW
[1] 49.07016 40.02288 23.87023 100.00000 89.30224 53.26115 69.98929 0.00000
and
> FHM_CD
$X3
[1] 14.8840750 17.7316138 6.1164435 0.0000000 1.1435141 14.8904265 17.7375474 6.1241709 1.1506441
[10] 14.6751282 17.5364297 5.8621689 0.9089743
$X5
[1] 74.41660 76.74417 80.95828 58.58119 62.34946 69.17199 57.25100 61.14029 68.18193 74.38872 76.72114
[12] 80.94284 58.53606 62.31217 69.14699 57.20442 61.10180 68.15613 74.34258 76.68302 80.91730 58.46136
[23] 62.25047 69.10565 57.12732 61.03811 68.11346
$X7
[1] 66.30768 60.56507 68.29355 49.37678 40.74842 52.36058 36.42026 25.58356 40.16773 66.32983 60.59541
[12] 68.31317 49.41006 40.79401 52.39005 36.46206 25.64082 40.20475
$X9
[1] 66.14771 75.68765 81.44262 55.01738 67.69397 75.34112 51.61251 65.24862 73.47461 66.20550 75.71747
[12] 81.46000 55.09417 67.73359 75.36421 51.69510 65.29125 73.49945
$XAS
[1] 25.62701 45.29201 44.51013 0.00000 17.99168 16.81963
$XPW
[1] 7.344758 24.428011 54.927770 0.000000 4.637824 43.124615 39.752560 100.000000
I would like to make a "clustered jittered plot" with one group for every element of both lists. For example: X3 from FHM_CS right next to X3 from FHM_CD, and so on for every element of the lists.
I was thinking of using qplot from ggplot2 with geom="jitter", but I would also like to add a horizontal bar in each group to show the mean of that list element.
It would be something like this, except that I would like to add the mean of every list element as a horizontal red bar (with its value if possible) and the clustering part (e.g. FHM_CS in blue and FHM_CD in red).
So how do I convert those lists to a data frame, and how do I plot this from there?
To create a data.frame, you can do it crudely like this:
df <- data.frame(values=unlist(FHM_CS, use.names=FALSE), tag=rep(names(FHM_CS), times=sapply(FHM_CS, length)))
But for usage with ggplot2, we should merge everything into a single dataframe:
df.CS <- data.frame(values=unlist(FHM_CS, use.names=FALSE), tag=rep(names(FHM_CS), times=sapply(FHM_CS, length)), class='CS', stringsAsFactors=TRUE)
df.CD <- data.frame(values=unlist(FHM_CD, use.names=FALSE), tag=rep(names(FHM_CD), times=sapply(FHM_CD, length)), class='CD', stringsAsFactors=TRUE)
my.data <- rbind(df.CS, df.CD)
Edit: Alternatively, as Michele found, use melt:
library(reshape2)
df.CD <- data.frame(melt(FHM_CD), class='CD')
df.CS <- data.frame(melt(FHM_CS), class='CS')
## Except now, instead of `tag` and `values`, we have `L1` and `value`.
my.data <- rbind(df.CD, df.CS)
my.data$tag <- factor(my.data$L1)
my.data$values <- my.data$value
End of edit
Then to plot, as you want (I was lazy and didn't enter much data):
library(ggplot2)
ggplot(my.data, aes(x=interaction(tag, class), y=values)) + geom_point(position=position_jitter())
Now let's try to add the horizontal bars. I would use faceting, which gives the following:
ggplot(my.data, aes(x=tag, y=values)) + geom_point(position=position_jitter()) + stat_summary(fun.y='mean', geom='errorbarh', aes(xmin=as.integer(tag)-0.3, xmax=as.integer(tag)+0.3), height=0) + facet_grid(.~class)
Edit 2
First manually create the interaction vector:
my.data$it <- with(my.data, interaction(tag, class, sep=' - ', lex.order=TRUE))
Then we plot as previously.
ggplot(my.data, aes(x=it, y=values)) + geom_point(position=position_jitter()) + stat_summary(fun.y='mean', geom='errorbarh', aes(xmin=as.integer(it)-0.3, xmax=as.integer(it)+0.3, height=0, colour=class))
Of course, you might want to edit the arguments to position_jitter() to squeeze the points closer together.
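To print the mean values next to the bars, it helps to have them in a data frame of their own. A base R sketch using stack() as another way to flatten a named list, and aggregate() for the group means; the toy lists below are made-up stand-ins for FHM_CS and FHM_CD, and all toy_* names are hypothetical:

```r
# Toy stand-ins for FHM_CS and FHM_CD (made-up numbers, not the asker's data)
toy_CS <- list(X3 = c(100, 50), X5 = c(10, 20, 30))
toy_CD <- list(X3 = c(0, 10),   X5 = 40)

# stack() turns a named list into two columns: values and ind (the list name)
toy <- rbind(cbind(stack(toy_CS), class = "CS"),
             cbind(stack(toy_CD), class = "CD"))

# One mean per tag/class combination
toy_means <- aggregate(values ~ ind + class, data = toy, FUN = mean)
toy_means
```

A data frame like toy_means could then be supplied as the data of a separate text or segment layer in the plot.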
lst1 <- list("A", "B", "C")
lst2 <- list(rnorm(1000), rnorm(1000), rnorm(1000))
library(ggplot2)
library(reshape2)
df <- merge(melt(lst1, value.name="id"), melt(lst2), by="L1")
# L1 is just an output from melt.list method and represents the list items' index
> head(df)
L1 id value
1 1 A 2.0216986
2 1 A 1.4856589
3 1 A -0.2204599
4 1 A 0.6514056
5 1 A 0.3035737
6 1 A 0.8371660
qplot(id, value, data=df, geom="jitter")