How to remove leading "0." in a numeric R variable

How can one concisely change a numeric R variable (keeping it numeric) so that, e.g.,
"-0.34" becomes simply "-.34"?

Only when you output a numeric value do you have to choose a concrete representation (i.e., how the number should be formatted). You cannot change a numeric variable from "-0.34" to "-.34"; both are representations for the same number.
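A quick check of the claim that both spellings are the same number. This is a minimal sketch; the variable names are only illustrative.

```r
# Both literals parse to the same double, so the stored value has no
# "leading zero" to remove; only the printed text can differ.
a <- -0.34
b <- -.34
identical(a, b)
# [1] TRUE

# A string written without the leading zero converts back to the same number.
identical(as.numeric("-.34"), a)
# [1] TRUE
```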
However, when you output a value, you can choose how it should be formatted. I don't know of any built-in way to leave off the leading "0", but you can always remove it manually:
> sub("^(-?)0\\.", "\\1.", sprintf("%.2f", -0.34))
[1] "-.34"
You can define a function for convenience, e.g.,
numformat <- function(val) { sub("^(-?)0\\.", "\\1.", sprintf("%.2f", val)) }

In addition to the existing answers, I wanted to mention that the package weights has a function rd() which can be used to "round numbers to text with no leading zero". Of course, the result is not numeric but character.
library("weights")
rd(-0.341, digits=2)
[1] "-.34"

I needed to show numbers to 3 decimal places.
If you want to print to an arbitrary number of decimal places and you don't want to add another package (i.e., the weights package above), then this function (adapted from @stefan's answer) seems to work:
numformat <- function(x, digits = 2) {
  ncode <- paste0("%.", digits, "f")
  sub("^(-?)0\\.", "\\1.", sprintf(ncode, x))
}
So for example:
> numformat(-.232, 2)
[1] "-.23"
> numformat(-.232, 3)
[1] "-.232"
> numformat(-.232, 4)
[1] "-.2320"
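A possible variant, sketched under the assumption that you prefer not to build the format string by hand: formatC() takes the number of decimals directly via its digits argument. The name numformat2 is hypothetical.

```r
# Hypothetical variant of numformat(): formatC() handles the precision.
numformat2 <- function(x, digits = 2) {
  # format = "f" gives fixed notation with exactly `digits` decimals
  s <- formatC(x, format = "f", digits = digits)
  # drop a "0" directly before the decimal point at the start of the
  # string, keeping an optional leading minus sign
  sub("^(-?)0\\.", "\\1.", s)
}

numformat2(c(-0.232, 0.5, 12.5), 3)
# [1] "-.232" ".500" "12.500"
```

Values of magnitude 1 or more are left untouched because the pattern is anchored at the start of the string.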

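In addition to @stefan's nice answer, I stumbled upon the following code, which accomplishes the same thing but keeps all the decimal places of R's default number-to-text conversion instead of rounding to a fixed number. The pattern is anchored at the start of the string so that zeros elsewhere (as in "10.5") are left alone:
f <- function(X1) sub("^(-?)0\\.", "\\1.", X1)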
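Why anchoring matters: gsub() applies the pattern everywhere in the string, so an unanchored "0\\." also rewrites a zero in the middle of a number. A small comparison (the input vector is illustrative):

```r
x <- c("0.25", "-0.5", "10.5")

# Unanchored: every "0." is rewritten, including the one inside "10.5"
gsub("0\\.", ".", x)
# [1] ".25" "-.5" "1.5"

# Anchored: only a zero at the start (after an optional minus) is dropped
sub("^(-?)0\\.", "\\1.", x)
# [1] ".25" "-.5" "10.5"
```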

If it's for reporting in R Markdown, I use the apa() function from the MOTE package: apa(-0.34, 2, FALSE) returns -.34 in my documents (the third argument turns off the leading zero).

Related

Why can't I combine Reduce with paste when using "*" as a character?

I'm trying to get the output "1*2*4*5" from (function(x) Reduce(paste0(toString("*")),x))(c(1,2,4,5)), but no matter how I manipulate Reduce, paste0, and the asterisks, I'm either getting error messages or the asterisks being treated as multiplication (giving 40). Where am I going wrong?
Reduce uses a function with two arguments to which it applies the previous result and the next element of the vector. Therefore, you need a function of both x and y:
Reduce(function(x,y)paste0(x,"*",y),c(1,2,4,5))
#[1] "1*2*4*5"
As an aside, you can supply an initial value through the init argument; it is used as x together with the first element of the vector:
Reduce(function(x,y)paste0(x,"*",y),c(1,2,4,5), init = 0)
#[1] "0*1*2*4*5"
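To see the two-argument mechanics step by step, accumulate = TRUE keeps every intermediate result. unlist() is used here only to get a plain character vector regardless of how Reduce() simplifies its output.

```r
steps <- Reduce(function(x, y) paste0(x, "*", y), c(1, 2, 4, 5),
                accumulate = TRUE)
# each element is the previous result pasted with the next input
unlist(steps)
# [1] "1" "1*2" "1*2*4" "1*2*4*5"
```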
One thing you may have tried was this:
Reduce(paste0("*"),c(1,2,4,5))
#[1] 40
This applies the multiplication operator to x and y, because paste0("*") evaluates to "*".
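The reason is that Reduce() passes its first argument through match.fun(), so a character string is treated as the name of a function to look up. A short check:

```r
# match.fun("*") returns the multiplication function itself
identical(match.fun("*"), `*`)
# [1] TRUE

# so this is ((1 * 2) * 4) * 5
Reduce("*", c(1, 2, 4, 5))
# [1] 40
```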
Another base R option is to use paste within gsub, e.g.,
x <- 1:5
gsub("\\s","*",Reduce(paste,x))
which gives
> gsub("\\s","*",Reduce(paste,x))
[1] "1*2*3*4*5"
KISS method:
(with improvements as suggested by @nicola)
bar <- as.character(1:5)
paste0(bar, collapse = "*")  # paste0() has no sep argument; collapse joins the elements
#[1] "1*2*3*4*5"

Convert superscripted numbers from string into scientific notation (from Unicode, UTF8)

I imported a vector of p-values from an Excel table. The numbers are given as superscripted Unicode strings. After hours of trying I still struggle to convert them into numbers.
See example below. Simple conversion with as.numeric() doesn't work. I also tried to use Regex to capture the superscripted numbers, but it turned out that each superscripted number has a distinct Unicode code, for which there is no translation.
test <- c("0.0126", "0.000289", "4.26x10⁻¹⁴", "6.36x10⁻⁴⁸",
"4.35x10⁻⁹", "0.115", "0.0982", "0.000187", "0.0484", "0.000223")
as.numeric(test)
Does somebody know of an R-package which could do the translation painlessly, or do I have to translate the codes one by one into digits?
This kind of formatting is not very portable, but here is one possible solution:
test <- c("0.0126", "0.000289", "4.26x10⁻¹⁴", "6.36x10⁻⁴⁸",
"4.35x10⁻⁹", "0.115", "0.0982", "0.000187", "0.0484",
"0.000223")
library(utf8)
library(stringr)
# normalize, ie everything to "normal text"
testnorm <- utf8_normalize(test, map_case = TRUE, map_compat = TRUE)
# replace exponent part
# \\N{Minus Sign} is the unicode name of the minus sign symbol
# (see [ICU regex](http://userguide.icu-project.org/strings/regexp))
# it is necessary because the "-" is not a plain text minus sign...
testnorm <- str_replace_all(testnorm, "x10\\N{Minus Sign}", "e-")
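# convert to numeric: strings such as "4.26e-14" are valid number syntax,
# so as.numeric() is enough and eval(parse()) is not needed
p_vals <- as.numeric(testnorm)
# format() formats the vector as a whole, so every element gets the
# decimal places needed by the smallest ("e-48") value
format(p_vals, digits = 2, scientific = FALSE)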
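If you would rather avoid the utf8 and stringr dependencies, a base R sketch is to map the superscript characters to ASCII with chartr() and then turn "x10" into R's "e" exponent marker. This assumes a UTF-8 R session and that the exponent is always written as "x10" followed by superscript characters; the helper name superscript_to_numeric is hypothetical.

```r
# Hypothetical helper; assumes a UTF-8 session and "x10" + superscripts.
superscript_to_numeric <- function(x) {
  # superscript digits 0-9 and the superscript minus, as Unicode escapes ...
  sup <- "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207b"
  # ... and their ASCII counterparts, position by position
  asc <- "0123456789-"
  x <- chartr(sup, asc, x)               # e.g. "4.26x10-14"
  x <- sub("x10", "e", x, fixed = TRUE)  # e.g. "4.26e-14"
  as.numeric(x)
}

test <- c("0.0126", "4.26x10\u207b\u00b9\u2074", "4.35x10\u207b\u2079")
superscript_to_numeric(test)
```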

Converting internationally formatted strings to numeric

I have a file with internationally formatted numbers (i.e., strings) including units of measurement. In this case the decimal mark is "," and the thousands separator is "." (i.e., German number format).
a <- c('2.200.222 €',
' 180.109,3 €')
or
b <- c('28,42 m²',
'47,70 m²')
I'd like to convert these strings efficiently to numeric. I've tried to filter out numbers by codes like
require(stringr)
str_extract(a, pattern='[0-9]+.[0-9]+.[0-9]+')
str_extract(b, pattern='[0-9]+,[0-9]+')
However, this seems too error-prone, and I guess there must be a more standardized way. So here's my question: is there a function, package, or something else that can handle this problem?
Thank you very much!
Here is a function that uses gsub to deal with the sample data you posted:
x <- c('2.200.222 €', ' 180.109,3 €', '28,42 m²', '47,70 m²')
strip <- function(x){
z <- gsub("[^0-9,.]", "", x)
z <- gsub("\\.", "", z)
gsub(",", ".", z)
}
as.numeric(strip(x))
[1] 2200222.00 180109.30 28.42 47.70
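It works like this:
First strip out every character that is not a digit, comma, or period (currency symbols, units, spaces).
Then strip out all periods (the thousands separators).
Finally, convert commas to periods so that as.numeric() recognizes the decimal mark.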
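The same idea can be parameterized for other conventions, e.g. English-style "1,234.5". This is a sketch, not a general locale parser. The function name parse_marked is hypothetical, and it assumes a minus sign is the only non-digit character that should survive besides the decimal mark.

```r
# Hypothetical generalization of strip(); marks are matched literally.
parse_marked <- function(x, decimal = ",", grouping = ".") {
  x <- gsub(grouping, "", x, fixed = TRUE)  # drop thousands separators
  x <- gsub(decimal, ".", x, fixed = TRUE)  # decimal mark -> "."
  x <- gsub("[^0-9.-]", "", x)              # drop units, currency, spaces
  as.numeric(x)
}

parse_marked(c("2.200.222 \u20ac", " 180.109,3 \u20ac", "28,42 m\u00b2"))
# [1] 2200222.0  180109.3      28.4
parse_marked("1,234.5 USD", decimal = ".", grouping = ",")
# [1] 1234.5
```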

What do the %op% operators in R mean? For example "%in%"?

I tried to do this simple search but couldn't find anything on the percent (%) symbol in R.
What does %in% mean in the following code?
time(x) %in% time(y) where x and y are matrices.
How do I look up help on %in% and similar functions that follow the %stuff% pattern, as I cannot locate the help file?
Related questions:
What does eg %+% do? in R
The R %*% operator
What does %*% mean in R
What does %||% do in R?
What does %>% mean in R
I didn't think GSee's or Sathish's answers went far enough because "%" does have meaning all by itself and not just in the context of the %in% operator. It is the mechanism for defining new infix operators by users. It is a much more general issue than the virtues of the %in% infix operator or its more general prefix ancestor match. It could be as simple as making a pairwise "s"(um) operator:
`%s%` <- function(x,y) x + y
Or it could be more interesting, say making a second derivative operator:
`%DD%` <- function(expr, nam="x") { D(D( bquote(.(expr)), nam), nam) }
expression(x^4) %DD% "x"
# 4 * (3 * x^2)
The %-character also has importance in the parsing of Date, date-time, and C-type format functions like strptime, formatC and sprintf.
Since that was originally written we have seen the emergence of the magrittr package with the dplyr elaboration that demonstrates yet another use for %-flanked operators.
So the most general answer is that % symbols are handled specially by the R parser. Since the parser is used to process plotmath expressions, you will also see extensive options for graphics annotations at the ?plotmath help page.
%op% denotes an infix binary operator. There are several built-in operators using %, and you can also create your own.
(A single % sign isn't a keyword in R. You can see a list of keywords on the ?Reserved help page.)
How do I get help on binary operators?
As with anything that isn't a standard variable name, you have to enclose the term in quotes or backquotes.
?"%in%"
?`%in%`
Credit: GSee's answer.
What does %in% do?
As described on the ?`%in%` help page (which is actually the ?match help page, since %in% is really just an infix version of match):
[%in%] returns a logical vector indicating if there is a match or not for its left operand
It is most commonly used with categorical variables, though it can be used with numbers as well.
c("a", "A") %in% letters
## [1] TRUE FALSE
1:4 %in% c(2, 3, 5, 7, 11)
## [1] FALSE TRUE TRUE FALSE
Credit: GSee's answer, Ari's answer, Sathish's answer.
How do I create my own infix binary operators?
These are functions, and can be defined in the same way as any other function, with a couple of restrictions.
It's a binary operator, so the function must take exactly two arguments.
Since the name is non-standard, it must be written with quotes or backquotes.
For example, this defines a matrix power operator.
`%^%` <- function(x, y) matrixcalc::matrix.power(x, y)
matrix(1:4, 2) %^% 3
Credit: BondedDust's answer, Ari's answer.
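Another small sketch of the same mechanism: a negated %in%. The operator name %notin% is hypothetical (Hmisc ships a similar %nin%, mentioned below). The left operand becomes x and the right operand becomes table.

```r
# Hypothetical operator: TRUE where x is NOT found in table
`%notin%` <- function(x, table) !(x %in% table)

c("a", "z") %notin% letters[1:3]
# [1] FALSE TRUE
```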
What other % operators are there?
In base R:
%/% and %% perform integer division and modular division respectively, and are described on the ?Arithmetic help page.
%o% gives the outer product of arrays.
%*% performs matrix multiplication.
%x% performs the Kronecker product of arrays.
In ggplot2:
%+% replaces the data frame in a ggplot.
%+replace% modifies theme elements in a ggplot.
%inside% (internal) checks for values in a range.
%||% (internal) provides a default value in case of NULL values. This function also appears internally in devtools, reshape2, roxygen2 and knitr. (In knitr it is called %n%.)
In magrittr:
%>% pipes the left-hand side into an expression on the right-hand side.
%<>% pipes the left-hand side into an expression on the right-hand side, and then assigns the result back into the left-hand side object.
%T>% pipes the left-hand side into an expression on the right-hand side, which it uses only for its side effects, returning the left-hand side.
%,% builds a functional sequence.
%$% exposes columns of a data.frame or members of a list.
In data.table:
%between% checks for values in a range.
%chin% is like %in%, optimised for character vectors.
%like% checks for regular expression matches.
In Hmisc:
%nin% returns the opposite of %in%.
In devtools:
%:::% (internal) gets a variable from a namespace passed as a string.
In sp:
%over% performs a spatial join (e.g., which polygon corresponds to some points?)
In rebus:
%R% concatenates elements of a regex object.
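The base R operators listed above can be tried directly. Note that %/% rounds toward negative infinity, so x == (x %/% y) * y + x %% y holds for negative numbers too.

```r
7 %/% 2                       # integer division
# [1] 3
7 %% 2                        # remainder
# [1] 1
-7 %/% 2                      # floor(-3.5)
# [1] -4
-7 %% 2                       # so that (-4) * 2 + 1 == -7
# [1] 1
1:2 %o% 1:3                   # outer product: a 2 x 3 matrix
diag(2) %x% matrix(1, 2, 2)   # Kronecker product: a 4 x 4 matrix
matrix(1:4, 2) %*% c(1, 1)    # matrix multiplication: row sums 4 and 6
```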
More generally, you can find all the operators in all the packages installed on your machine using:
library(magrittr)
ip <- installed.packages() %>% rownames
(ops <- setNames(ip, ip) %>%
  lapply(
    function(pkg)
    {
      rdx_file <- system.file("R", paste0(pkg, ".rdx"), package = pkg)
      if(file.exists(rdx_file))
      {
        rdx <- readRDS(rdx_file)
        fn_names <- names(rdx$variables)
        fn_names[grepl("^%", fn_names)]
      }
    }
  ) %>%
  unlist
)
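If you only need the operators in base R, or in the packages currently attached, a lighter sketch without magrittr or reading package index files is:

```r
# Operators defined in the base package itself
ops_base <- ls("package:base", pattern = "^%")
ops_base

# Operators visible on the current search path (base plus attached packages)
utils::apropos("^%")
```

The exact list depends on your R version and attached packages, so no output is shown here.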
Put quotes around it to find the help page. Either of these work
> help("%in%")
> ?"%in%"
Once you get to the help page, you'll see that
‘%in%’ is currently defined as
‘"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0’
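The quoted definition can be checked directly: match() returns positions (or NA), and comparing with nomatch = 0 turns that into TRUE/FALSE. The vectors here are illustrative.

```r
x   <- c(1, 5, 9)
tbl <- c(5, 6, 7, 8, 9)

match(x, tbl)                    # position of each x in tbl, NA if absent
# [1] NA 1 5
match(x, tbl, nomatch = 0) > 0   # the definition quoted above
# [1] FALSE TRUE TRUE
x %in% tbl
# [1] FALSE TRUE TRUE
```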
Since time is a generic, I don't know what time(X2) returns without knowing what X2 is. But, %in% tells you which items from the left hand side are also in the right hand side.
> c(1:5) %in% c(3:8)
[1] FALSE FALSE TRUE TRUE TRUE
See also, intersect
> intersect(c(1:5), c(3:8))
[1] 3 4 5
More generally, %foo% is the syntax for a binary operator. Binary operators in R are really just functions in disguise, and take two arguments (the one before and the one after the operator become the first two arguments of the function).
For example:
> `%in%`(1:5,4:6)
[1] FALSE FALSE FALSE TRUE TRUE
While %in% is defined in base R, you can also define your own binary operator:
`%hi%` <- function(x,y) cat(x,y,"\n")
> "oh" %hi% "my"
oh my
%in% is a value-matching operator; it can be used to find and subset all elements that share the same name or value, for example in a named vector, matrix, or data frame.
Example 1: subsetting by a repeated name
set.seed(133)
x <- runif(5)
names(x) <- letters[1:5]
x[c("a", "d")]
# a d
# 0.5360112 0.4231022
Now change the name of "d" to "a":
names(x)[4] <- "a"
If you try to extract both "a" elements by name with the previous subscript, it will not work: a character subscript only matches the first element with that name, so the result contains element [1] twice instead of elements [1] and [4].
x[c("a", "a")]
# a a
# 0.5360112 0.5360112
So, you can extract the two "a"s from different positions in a variable by using the %in% binary operator.
names(x) %in% "a"
# [1] TRUE FALSE FALSE TRUE FALSE
#assign it to a variable called "vec"
vec <- names(x) %in% "a"
#extract the values of two "a"s
x[vec]
# a a
# 0.5360112 0.4231022
Example 2: Subsetting multiple values from a column
Refer to this site for an example.
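A minimal sketch of the column case, with a made-up data frame: %in% builds a logical vector that selects every row whose value is in the set.

```r
df <- data.frame(id = 1:5, grp = c("a", "b", "c", "a", "d"))

keep <- df$grp %in% c("a", "d")   # TRUE where grp is "a" or "d"
df[keep, ]
#   id grp
# 1  1   a
# 4  4   a
# 5  5   d
```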

How would you write a wrapper function or class to format numbers as percent, currency, etc. in R?

In a previous question, I asked whether a convenient wrapper exists inside base R to format numbers as percentages.
This elicited three responses:
Probably not.
Such a wrapper would be too narrow to be useful. It is better that useRs learn how to use existing tools, such as sprintf, which can format numbers in a highly flexible way.
Such a wrapper is problematic, anyway, since you lose the ability to perform calculations on the object.
Still, in my view the sprintf function is just a little bit too obfuscated for the R beginner to learn (except if they come from a C background). Perhaps a better solution is to modify format or prettyNum to have options for adding prefixes and suffixes, so you could easily create percents, currencies, degrees, etc.
Question:
How would you design a function, class or set of functions to elegantly deal with formatting numbers as percentages, currencies, degrees, etc?
I would probably keep things very simple. format() is generally useful for most basic formatting needs. I would extend that with a simple wrapper that allowed arbitrary prefix and suffix strings. Here is a simple version:
formatVal <- function(x, prefix = "", suffix = "", sep = "", collapse = NULL,
...) {
x <- format(x, ...)
x <- paste(prefix, x, suffix, sep = sep, collapse = collapse)
x
}
If I were doing this for real, I would probably not have the collapse argument in the definition of formatVal(), but instead process it out of ..., but for illustration I kept the above function simple.
Using:
set.seed(1)
m <- runif(5)
some simple examples of usage
> formatVal(m*100, suffix = "%")
[1] "26.55087%" "37.21239%" "57.28534%" "90.82078%" "20.16819%"
> formatVal(m*100, suffix = "%", digits = 2)
[1] "27%" "37%" "57%" "91%" "20%"
> formatVal(m*100, suffix = "%", digits = 2, nsmall = 2)
[1] "26.55%" "37.21%" "57.29%" "90.82%" "20.17%"
> formatVal(m, prefix = "£")
[1] "£0.2655087" "£0.3721239" "£0.5728534" "£0.9082078" "£0.2016819"
> formatVal(m, prefix = "£", digits = 1)
[1] "£0.3" "£0.4" "£0.6" "£0.9" "£0.2"
> formatVal(m, prefix = "£", digits = 1, nsmall = 2)
[1] "£0.27" "£0.37" "£0.57" "£0.91" "£0.20"
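For comparison, a sketch of the same prefix/suffix idea built on formatC(), which fixes the number of decimals per element and can insert thousands separators via big.mark. The helper names fmt_percent and fmt_currency and their defaults are hypothetical.

```r
# Hypothetical helpers; names and defaults are illustrative only.
fmt_percent <- function(x, digits = 1) {
  # scale proportions to percent, fixed decimals, then add the suffix
  paste0(formatC(100 * x, format = "f", digits = digits), "%")
}

fmt_currency <- function(x, prefix = "\u00a3", digits = 2) {
  # fixed decimals with "," between groups of thousands, then the prefix
  paste0(prefix, formatC(x, format = "f", digits = digits, big.mark = ","))
}

fmt_percent(c(0.2655087, 0.05))
# [1] "26.6%" "5.0%"
fmt_currency(1234567.891)
# [1] "£1,234,567.89"
```

As with formatVal(), the result is character, so any arithmetic has to happen before formatting.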
Alternatively, store the formatting information as attributes and give the object a class with its own print method:
print.formatted <- function(x, ...)
{
  # scale, round to the stored precision, then add prefix and suffix
  fmt <- paste0("%.", attr(x, "precision"), "f")
  print(paste0(attr(x, "prefix"), sprintf(fmt, unclass(x) * attr(x, "scaleFactor")), attr(x, "suffix")))
  invisible(x)
}
as.percent <- function(x, precision = 3)
{
  class(x) <- c(class(x), "formatted")
  attr(x, "scaleFactor") <- 100
  attr(x, "prefix") <- ""
  attr(x, "suffix") <- "%"
  attr(x, "precision") <- precision
  return(x)
}
as.currency <- function(x, prefix = "£")
{
  class(x) <- c(class(x), "formatted")
  attr(x, "scaleFactor") <- 1
  attr(x, "prefix") <- prefix
  attr(x, "suffix") <- ""
  attr(x, "precision") <- 2
  return(x)
}
as.percent(runif(3))
[1] "21.585%" "12.396%" "37.744%"
x <- as.currency(rnorm(3,500,100))
x
[1] "£381.93" "£339.49" "£521.74"
2*x
[1] "£763.86" "£678.98" "£1043.48"
I think this could be done through attributes, e.g. let v <- 3.4. If it is pounds Sterling, we could use something like:
attributes(v)<-list(style = "descriptor", type = "currency", category = "pound")
If it is a percentage:
attributes(v)<-list(style = "descriptor", type = "proportion", category = "percentage")
Then, a special print method would be necessary. One could also incorporate a translation method, e.g. to convert from GBP to USD (pounds to dollars), centimeters to inches, etc.
The descriptor is essentially my view on a reserved kind of flag for indicating special handling for the given number. This could later extend to text strings, such as addresses and names. For other numbers, such as phone numbers, there may be special decompositions into country code, intra-country area/regional codes, all the way down to extensions.
Such a package may be akin to ggplot for data types - special methods for storing, transforming, and printing things within types?
Such a system might ensure that dimensions are correct when multiplying values. That has real utility in a lot of applications.
To my knowledge, the only widespread handling of units in R is for bytes (bytes, KB, MB, etc.) and time (hours, seconds, etc.). Even so, the handling, while simple, isn't obvious: I still have to tell print() which units to use. For instance, if I want to print an object's size in KB, I can't simply calculate object.size(v)/1024, because the output is reported in fractions of a byte rather than KB; I have to use print(object.size(v), units = "Kb").
ggplot2 has a bunch of functions for formatting common specific cases. These would be ideal, but for two things: they aren't really general enough, and you shouldn't really have to load ggplot2 (with all its dependencies) to get at such functions. You could try contacting Hadley to get the signatures changed to pass more things to format, and have them moved to a lower-level package (plyr maybe, or their own package, ggtools?).
