How to add commas every 3 digits using R? [duplicate]

I'd like to format numbers with both thousands separator and specifying the number of decimals. I know how to do these separately, but not together.
For example, I use format per this for the decimals:
FormatDecimal <- function(x, k) {
return(format(round(as.numeric(x), k), nsmall=k))
}
FormatDecimal(1000.64, 1) # 1000.6
And for thousands separator, formatC:
formatC(1000.64, big.mark=",") # 1,001
These don't play nicely together though:
formatC(FormatDecimal(1000.64, 1), big.mark=",")
# 1000.6, since no longer numeric
formatC(round(as.numeric(1000.64), 1), nsmall=1, big.mark=",")
# Error: unused argument (nsmall=1)
How can I get 1,000.6?
Edit: This differs from this question which asks about formatting 3.14 as 3,14 (was flagged as possible dup).

format not formatC:
format(round(as.numeric(1000.64), 1), nsmall=1, big.mark=",") # 1,000.6

formatC(1000.64, format="f", big.mark=",", digits=1)
(Sorry if I'm missing something.)
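The formatC call above can be wrapped in a small helper. This is only a sketch: the name format_number and its arguments are made up for illustration, and it assumes the input is numeric or coercible with as.numeric.

```r
# Hypothetical helper (not from the answers above): fixed number of
# decimals plus a thousands separator, via formatC with format = "f".
format_number <- function(x, k, big_mark = ",") {
  formatC(as.numeric(x), format = "f", digits = k, big.mark = big_mark)
}

format_number(1000.64, 1)
# [1] "1,000.6"

# formatC is vectorized, so whole vectors can be formatted in one call
format_number(c(1234567.891, 0.5), 2)
# [1] "1,234,567.89" "0.50"
```

The result is a character vector, so it is suitable for display or labels but not for further arithmetic.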

The scales package has a label_comma function:
scales::label_comma(accuracy = .1)(1000.64)
[1] "1,000.6"
It takes additional arguments if you want to use something other than a comma as the thousands separator or another character instead of the decimal point (see below).
Note: the output of label_comma(...) is a function to make it easier to use in ggplot2 arguments, hence the additional parentheses notation. This could be helpful if you're using the same format repeatedly:
my_comma <- scales::label_comma(accuracy = .1, big.mark = ".", decimal.mark = ",")
my_comma(1000.64)
[1] "1.000,6"
my_comma(c(1000.64, 1234.56))
[1] "1.000,6" "1.234,6"
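For comparison, roughly the same European-style output can be produced in base R without scales. This is a sketch that assumes formatC's big.mark and decimal.mark arguments behave as documented; the variable name eu is just illustrative.

```r
# Base R sketch: one decimal, "." as thousands separator, "," as decimal mark
eu <- formatC(c(1000.64, 1234.56), format = "f", digits = 1,
              big.mark = ".", decimal.mark = ",")
eu
# [1] "1.000,6" "1.234,6"
```

Unlike label_comma, this returns the formatted strings directly rather than a reusable function.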

formattable provides comma:
library(formattable)
comma(1000.64, digits = 1) # 1,000.6
comma provides an elementary interface to formatC.

Related

How to transform long names into shorter (two-part) names

I have a character vector of long names, each consisting of several words joined by dots.
x <- c("Duschekia.fruticosa..Rupr...Pouzar",
"Betula.nana.L.",
"Salix.glauca.L.",
"Salix.jenisseensis..F..Schmidt..Flod.",
"Vaccinium.minus..Lodd...Worosch")
The names differ in length, but only the first two words of each name are important.
My goal is to get names of up to 7 characters: the first 3 characters of each of the first two words, with a dot as the separator between them.
These examples are very close to my request, but I do not know how to apply these code variations to my case.
R How to remove characters from long column names in a data frame and
how to append names to " column names" of the output data frame in R?
What should I do to get output names that look like this?
x <- c("Dus.fru",
"Bet.nan",
"Sal.gla",
"Sal.jen",
"Vac.min")
Any help would be appreciated.
You can do the following:
gsub("(\\w{1,3})[^\\.]*\\.(\\w{1,3}).*", "\\1.\\2", x)
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
First we match up to 3 word characters (\\w{1,3}), then skip anything that is not a dot ([^\\.]*), match a dot (\\.), and then again match up to 3 word characters (\\w{1,3}). Finally, .* matches everything that comes after that. We then keep only the two parenthesized capture groups and separate them with a dot: \\1.\\2.
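To see what each capture group actually picks up, the groups can be extracted explicitly with regexec and regmatches. This is a sketch for inspection only; the variable names (pattern, groups, first, second, short) are illustrative, and the pattern is the same one as above with the dot class written as [^.].

```r
x <- c("Duschekia.fruticosa..Rupr...Pouzar",
       "Betula.nana.L.",
       "Salix.glauca.L.",
       "Salix.jenisseensis..F..Schmidt..Flod.",
       "Vaccinium.minus..Lodd...Worosch")

pattern <- "(\\w{1,3})[^.]*\\.(\\w{1,3}).*"

# regexec() records the full match plus each capture group;
# regmatches() returns them as one character vector per input string:
# element 1 is the full match, elements 2 and 3 are the two groups.
groups <- regmatches(x, regexec(pattern, x))

first  <- vapply(groups, `[`, character(1), 2)  # capture group 1
second <- vapply(groups, `[`, character(1), 3)  # capture group 2

short <- paste(first, second, sep = ".")
short
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
```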
Split on dot, substring 3 characters, then paste back together:
sapply(strsplit(x, ".", fixed = TRUE), function(i){
paste(substr(i[ 1 ], 1, 3), substr(i[ 2], 1, 3), sep = ".")
})
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
Here is a less elegant solution than kath's, but one that is a bit easier to read if you are not an expert in regex.
# Your data
x <- c("Duschekia.fruticosa..Rupr...Pouzar",
"Betula.nana.L.",
"Salix.glauca.L.",
"Salix.jenisseensis..F..Schmidt..Flod.",
"Vaccinium.minus..Lodd...Worosch")
# A function that takes three characters from first two words and merges them
cleaner_fun <- function(ugly_string) {
words <- strsplit(ugly_string, "\\.")[[1]]
short_words <- substr(words, 1, 3)
new_name <- paste(short_words[1:2], collapse = ".")
return(new_name)
}
# Testing the function (USE.NAMES = FALSE stops sapply from using the input strings as names)
sapply(x, cleaner_fun, USE.NAMES = FALSE)
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"

Create character vector of filenames in numeric sequence with leading zeroes

I need to generate 100 file names.
How would you generate the corresponding character vector files in R containing 100 file names: plot01.png, plot02.png, plot03.png, ..., plot99.png, plot100.png? Notice that the numbers of the first 9 files start with 0.
The obvious but very inefficient solution is to write out a vector with 100 file names by hand. I'm trying to figure out a more efficient way to create this character vector.
A concise option is paste0("plot", sprintf("%02d.png", 1:100)):
[1] "plot01.png" "plot02.png" "plot03.png" "plot04.png" ...
Another approach, which takes more characters to write but may be easier to follow, is string padding with str_pad from the stringr package:
library(stringr)
paste0("plot", str_pad(1:100, width = 2, side = "left", pad = "0"), ".png")
Combine paste0 and formatC:
paste0("plot", formatC(1:100, flag = "0", width = 2), ".png")
# [1] "plot01.png" "plot02.png" "plot03.png" "plot04.png" "plot05.png" ...
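The prefix and extension can also go straight into the sprintf format string, which avoids the extra paste step. A minimal sketch; n and files are illustrative names.

```r
n <- 100

# %02d pads to at least two digits with leading zeros;
# numbers wider than two digits (100) are printed in full.
files <- sprintf("plot%02d.png", seq_len(n))

head(files, 3)
# [1] "plot01.png" "plot02.png" "plot03.png"
tail(files, 2)
# [1] "plot99.png"  "plot100.png"
```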

How to display numeric columns in an R dataframe without scientific notation ('e+07')

I have an R dataframe with one column containing a string of numbers, but I would like to treat them as a factor (mainly to stop R shortening the numbers using e+04 etc.). One workaround I have found is to edit the csv file the data is taken from, add a dummy entry that has a word in the desired column, and then reimport it. How do I get this effect using R functions without messing around with the csv?
To clarify, my dataframe looks like this:
pNum,Condition,numberEntered
1,2,5.0970304e+07
I want to change the data type of numberEntered from numeric to factor and get rid of the pesky e+07.
As Joshua said, it is a printing issue, not a storage issue. You can change the way all numbers are printed by adjusting the scipen option (see getOption("scipen")).
x <- c(1, 2, 509703045845, 0.0001)
print(x)
options(scipen = 50)
print(x)
Alternatively, you may wish to change the way just those numbers are formatted. (This converts them to character.) It is worth getting to know format and formatC. To get you started, compare
format(x)
format(x, digits = 10)
format(x, digits = 3)
format(x, digits = 3, scientific = 5)
format(x, trim = TRUE, digits = 3, scientific = 5)
formatC(x)
formatC(x, format = "fg")
formatC(x, format = "fg", flag = "+")
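Applied to the data frame from the question, a display-only column can be added while the numeric column stays untouched. This is a sketch; the data frame is rebuilt by hand here and the column names numberEntered_chr and numberEntered_sep are made up for illustration.

```r
df <- data.frame(pNum = 1, Condition = 2, numberEntered = 5.0970304e+07)

# scientific = FALSE forces fixed notation; the result is character
df$numberEntered_chr <- format(df$numberEntered, scientific = FALSE)
df$numberEntered_chr
# [1] "50970304"

# Optionally add a thousands separator for readability
df$numberEntered_sep <- format(df$numberEntered, scientific = FALSE, big.mark = ",")
df$numberEntered_sep
# [1] "50,970,304"
```

Keeping the original numeric column means calculations still work, and the character columns are used only for output.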
Sorry to say, but you've been spending time trying to fix a problem that doesn't exist. Use str to check the types of data in your data.frame and you'll see that numberEntered is num and it isn't being "shortened". The only issue is the number of significant digits being printed.
options(digits=7)
(x <- data.frame(pNum=1,Condition=2,numberEntered=509703045845))
options(digits=10)
x
You can use options(digits=22) to set it to print the maximum number of significant digits. See ?options for more information.
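A quick way to convince yourself that only the display changes is to alter the print options and check the stored value afterwards. A sketch, with illustrative variable names (old, shown):

```r
x <- data.frame(pNum = 1, Condition = 2, numberEntered = 509703045845)

class(x$numberEntered)
# [1] "numeric"

# Temporarily lower the number of significant digits used for display,
# capture the formatted text, then restore the previous setting.
old <- options(digits = 3)
shown <- format(x$numberEntered)
options(old)

# The stored value is unaffected by how it was displayed
x$numberEntered == 509703045845
# [1] TRUE
```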
I would advise against storing floating-point numbers as factors, but you can still do it. I have also included several other options.
> txt <- "pNum,Condition,numberEntered
+ 1,2,5.0970304e+07"
> dat <- read.csv(textConnection(txt),colClasses=c("integer","integer","factor"))
> dat
  pNum Condition numberEntered
1    1         2 5.0970304e+07
> dat[,3]
[1] 5.0970304e+07
Levels: 5.0970304e+07
> dat <- read.csv(textConnection(txt),colClasses=c("integer","integer","character"))
> dat[,3]
[1] "5.0970304e+07"
> dat <- read.csv(textConnection(txt),colClasses=c("integer","integer","numeric"))
> dat[,3]
[1] 50970304
> print.numeric <- function(...) formatC(...,format="f")
> print(dat[,3])
[1] "50970304.0000"
