Change default number formatting in R

Is there a way to change the default number formatting in R so that numbers will print a certain way without repeatedly having to use the format() function? For example, I would like to have
> x <- 100000
> x
[1] 100,000
instead of
> x <- 100000
> x
[1] 100000

Well, if you want to save keystrokes, binding the relevant R function to a predefined key combination is fast and simple in any of the popular text editors.
Aside from that, I suppose you can always just write a small formatting function to wrap your expression in; so for instance:
fnx <- function(x) print(formatC(x, format="d", big.mark=","), quote=FALSE)
> 567 * 43245
[1] 24519915
> fnx(567 * 43245)
[1] 24,519,915
R has several utility functions that will do this. I prefer "formatC" because it's a little more flexible than 'format' and 'prettyNum'.
In my function above, I wrapped the formatC call in a call to 'print' in order to remove the quotes (") from the output, which I don't like (I prefer to look at 100,000 rather than "100,000").
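For comparison, here is a small sketch of the three functions mentioned above applied to the same value. The variable names are only illustrative. Note that format and prettyNum are given scientific=FALSE, since a value like 100000 may otherwise be rendered as 1e+05.

```r
x <- 100000

# Three ways to get a thousands separator; each returns a character string
a  <- formatC(x, format = "d", big.mark = ",")
b  <- format(x, big.mark = ",", scientific = FALSE)
c_ <- prettyNum(x, big.mark = ",", scientific = FALSE)

# quote = FALSE drops the surrounding quotes when printing
print(c(formatC = a, format = b, prettyNum = c_), quote = FALSE)
```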

I don't know how to change the default (in fact, I would advise against it because including the comma makes it a character).
You can do this:
> prettyNum(100000, big.mark=",", scientific=FALSE)
[1] "100,000"
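If you want auto-printing with separators for particular objects only, one option is an S3 class with its own print method. This is a sketch rather than a change to R's global default, and the class name comma and its constructor are hypothetical names chosen for this example.

```r
# Hypothetical helper: tag a numeric vector with a "comma" class
comma <- function(x) structure(x, class = c("comma", class(x)))

# print method: format with thousands separators and show without quotes
print.comma <- function(x, ...) {
  print(format(unclass(x), big.mark = ",", scientific = FALSE), quote = FALSE)
  invisible(x)
}

x <- comma(100000)
x   # auto-printing dispatches to print.comma
```

The value stays numeric underneath, so the comma only appears when the object is printed.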

Related

Can you create an R function that calls using a prefix and suffix (operating like brackets)?

I have read about prefix functions and infix functions on Hadley Wickham's Advanced R website. I would like to know if there is any way to define functions that are called by placing a prefix and suffix around a single argument, so that the prefix and suffix operate like brackets. Is there any way to create a function like this, and if so, how do you do it?
An example: Suppose you have an object char that is a character string. You want to create a function that is called on a character string using the prefix _# and suffix #_, and that adds five dashes to the front of the character string. If programmed successfully, it would operate as shown below.
char
[1] "Hello"
_#char#_
[1] "-----Hello"
There is a way to do this as long as your special operator takes a particular form, namely .%_% char %_%. (with both dots included). This works because the parser will interpret the dot as a variable name. If we use non-standard evaluation, we don't need the dot to actually exist, and we only need to use it as a marker for opening and closing our special operator. So we can do something like this:
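`%_%` <- function(a, b)
{
  # Exactly one of the two unevaluated arguments must be the "." marker
  if((deparse(match.call()$a) != ".") +
     (deparse(match.call()$b) != ".") != 1)
    stop("Unrecognised SPECIAL")
  # Opening marker: tag the string so the closing call can recognise it
  if(deparse(match.call()$a) == ".")
    return(`attr<-`(b, "prepped", TRUE))
  # Closing marker: a is the tagged result of the opening call
  if(isTRUE(attr(a, "prepped")))
    return(paste0("-----", a))
  stop("Unrecognised SPECIAL")
}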
.%_% "hello" %_%.
#> [1] "-----hello"
However, this is a weird thing to do in R. It's not idiomatic and uses more keystrokes than a simple function call would. It would also very likely cause unpredictable problems in places where non-standard evaluation is used. This is really just a demo to show that it can be done. Not that it should be done.
Writing a simple function seems like a more R-like solution. If terseness is a priority, then maybe something like
._ <- function(x) paste0("-----", x)
._("hello")
# [1] "-----hello"
Or if you wanted something more bracket-like
.. <- structure(list(NULL), class="dasher")
`[.dasher` <- function(a, x) paste0("-----", x)
..["hello"]
# [1] "-----hello"
Another way to use a custom class would be to redefine the - operator to paste that value in front of the string. For example
literal <- function(x) {class(x)<-"literal"; x}
`-.literal` <- function(e1, e2) {literal(paste0("-", unclass(e1)))}
print.literal <- function(x) print(unclass(x))
Then you can do
val <- literal("hello")
-----val
# [1] "-----hello"
---val
# [1] "---hello"
So here the number of - you type is the number you get in the output.
You can get creative/weird with syntax, but you need to make sure whatever symbols you come up with can be parsed by the parser otherwise you are out-of-luck.

How to match any character existing between a pattern and a semicolon

I am trying to get anything existing between sample_id= and ; in a vector like this:
sample_id=10221108;gender=male
tissue_id=23;sample_id=321108;gender=male
treatment=no;tissue_id=98;sample_id=22
My desired output would be:
10221108
321108
22
How can I get this?
I've been trying several things like this, but I don't find the way to do it correctly:
clinical_data$sample_id<-c(sapply(myvector, function(x) sub("subject_id=.;", "\\1", x)))
You could use sub with a capture group to isolate that which you are trying to match:
out <- sub("^.*\\bsample_id=(\\d+).*$", "\\1", x)
out
[1] "10221108" "321108" "22"
Data:
x <- c("sample_id=10221108;gender=male",
"tissue_id=23;sample_id=321108;gender=male",
"treatment=no;tissue_id=98;sample_id=22")
Note that the actual output above is character, not numeric. But, you may easily convert using as.numeric if you need to do that.
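A minimal sketch of that conversion, reusing the data and pattern from above (ids_chr and ids_num are just illustrative names). One caveat: sub returns a string unchanged when the pattern does not match, so such entries would become NA with a warning.

```r
x <- c("sample_id=10221108;gender=male",
       "tissue_id=23;sample_id=321108;gender=male",
       "treatment=no;tissue_id=98;sample_id=22")

# Capture the digits after sample_id=, then convert from character to numeric
ids_chr <- sub("^.*\\bsample_id=(\\d+).*$", "\\1", x)
ids_num <- as.numeric(ids_chr)
ids_num
```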
Edit:
If you are unsure that the sample IDs would always be just digits, here is another version you may use to capture any content following sample_id:
out <- sub("^.*\\bsample_id=([^;]+).*$", "\\1", x)
out
You could try the str_extract function from the stringr package.
If your data has one record per element (one per line), you can do:
str_extract(x, "(?<=\\bsample_id=)\\d+") # target a series of digits that is preceded by sample_id=; the + captures all of the digits
This extracts just the number from each line. If your data is all collected in a single string, it becomes a tad more difficult, because you have to tell the extraction to continue after it has found its first match. The code would look something like this:
str_extract_all(x, "(?<=sample_id=)\\d+")
This code will extract all of the numbers you're looking for, and the output will be a list. From there you can manipulate the list as you see fit.
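If you would rather avoid the extra package, roughly the same thing can be sketched in base R with gregexpr and regmatches. This is a swapped-in base R analogue, not stringr itself, and the variable all_in_one is a made-up example of records packed into a single string.

```r
# Assumption: every record has been pasted into one long string
all_in_one <- paste("sample_id=10221108;gender=male",
                    "tissue_id=23;sample_id=321108;gender=male",
                    "treatment=no;tissue_id=98;sample_id=22")

# perl = TRUE is needed for the lookbehind (?<=...)
m   <- gregexpr("(?<=sample_id=)\\d+", all_in_one, perl = TRUE)
ids <- regmatches(all_in_one, m)   # a list with one element per input string
unlist(ids)
```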

Regular expression to find function calls in a function body

Please consider the body of read.table as a text file, created with the following code:
sink("readTable.txt")
body(read.table)
sink()
Using regular expressions, I'd like to find all function calls of the form foo(a, b, c) (but with any number of arguments) in "readTable.txt". That is, I'd like the result to contain the names of all called functions in the body of read.table. This includes nested functions of the form
foo(a, bar(b, c)). Reserved words (return, for, etc) and functions that use back-ticks ('=='(), '+'(), etc) can be included since I can remove them later.
So in general, I'm looking for the pattern text( or text ( then possible nested functions like text1(text2(, but skipping over the text if it's an argument, and not a function. Here's where I'm at so far. It's close, but not quite there.
x <- readLines("readTable.txt")
regx <- "^(([[:print:]]*)\\(+.*\\))"
mat <- regexpr(regx, x)
lines <- regmatches(x, mat)
fns <- gsub(".*( |(=|(<-)))", "", lines)
head(fns, 10)
# [1] "default.stringsAsFactors()" "!missing(text))"
# [3] "\"UTF-8\")" "on.exit(close(file))" "(is.character(file))"
# [6] "(nzchar(fileEncoding))" "fileEncoding)" "\"rt\")"
# [9] "on.exit(close(file))" "\"connection\"))"
For example, in [9] above, the calls are there, but I do not want file in the result. Ideally it would be on.exit(close(.
How can I go about improving this regular expression?
If you've ever tried to parse HTML with a regular expression you know what a nightmare it can be. It's always better to use some HTML parser and extract info that way. I feel the same way about R code. The beauty of R is that it's functional and you can inspect any function via code.
Something like
call.ignore <- c("[[", "[", "&", "&&", "|", "||", "==", "!=",
                 "-", "+", "*", "/", "!", ">", "<", ":")
find.funcs <- function(f, descend=FALSE) {
  if (is.function(f)) {
    # start from the body of the function
    return(find.funcs(body(f), descend=descend))
  } else if (is(f, "name") || is.atomic(f)) {
    # symbols and constants are not calls
    return(character(0))
  }
  v <- list()
  if (is(f, "call") && !(deparse(f[[1]]) %in% call.ignore)) {
    v[[1]] <- deparse(f)
    if (!descend) return(v[[1]])
  }
  # recurse into every element of the call
  v <- append(v, lapply(as.list(f), find.funcs, descend=descend))
  unname(do.call(c, v))
}
could work. Here we iterate over each object in the function looking for calls, ignoring those you don't care about. You would run it on a function like
find.funcs(read.table)
# [1] "default.stringsAsFactors()"
# [2] "missing(file)"
# [3] "missing(text)"
# [4] "textConnection(text, encoding = \"UTF-8\")"
# [5] "on.exit(close(file))"
# [6] "is.character(file)"
# ...
You can set the descend= parameter to TRUE if you want to look in calls to functions for other functions.
I'm sure there are plenty of packages that make this easier, but I just wanted to show how simple it really is.
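If you only need the names rather than the full calls, a rougher shortcut in base R is to take every name that appears in the body and drop the ones that are used as variables. This is a sketch of a different technique from find.funcs, and the toy function f below is hypothetical.

```r
# Hypothetical toy function, used only for illustration
f <- function(x) {
  y <- mean(x)
  paste0("m=", round(y, 1))
}

# all.names() lists every symbol (functions included);
# all.vars() lists only the symbols used as variables
called <- setdiff(all.names(body(f)), all.vars(body(f)))
called
```

The result also contains syntax functions such as `{` and `<-`, and a name that is used both as a variable and as a function is dropped, so treat it as an approximation.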
Recursive Regex in Perl Mode
In the general case, I am sure you're aware of the hazards of trying to match such constructions: what if your file contains things like if() that you don't want to match?
That being said, I believe this recursive regex fits the requirements as I understand them
[a-z]+(\((?:`[()]|[^()]|(?1))*\))
See demo.
I'm not completely up to scratch on R syntax, but something like this should work, and you can tweak the function name and arguments to suit your needs:
grepl("[a-z]+(\\((?:`[()]|[^()]|(?1))*\\))", subject, perl=TRUE)
Explanation
[a-z]+ matches the letters before the opening parenthesis
( starts Group 1
\( matches an opening parenthesis
(?: starts a non-capture group that will be repeated. The non-capture group matches several possibilities:
`[()] matches a backtick followed by ( or )
|[^()] OR match one character that is not a parenthesis
|(?1) OR match the pattern defined by the Group 1 parentheses (recurse)
)* close non-capture group, repeat zero or more times
\) matches a closing parenthesis
) ends Group 1
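To pull out the matches rather than just test for them, something along these lines should work in R. It is only a sketch: the subject string is a made-up example, and hits and fn_names are illustrative names.

```r
subject <- "x <- foo(a, bar(b, c)) + baz(1)"
pattern <- "[a-z]+(\\((?:`[()]|[^()]|(?1))*\\))"

# perl = TRUE enables the recursive (?1) construct
hits <- regmatches(subject, gregexpr(pattern, subject, perl = TRUE))[[1]]
hits

# Keep only the part before the first opening parenthesis
fn_names <- sub("\\(.*$", "", hits)
fn_names
```

Matches do not overlap, so the nested bar(b, c) is consumed as part of the foo(...) match rather than reported separately. Also, [a-z]+ does not cover names containing dots, digits, or capitals, so the character class would need widening for real R code.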

R - Plot: How to format in base-10 scientific notation and put it in text, mtext, title etc. functions?

I have a numeric variable, say K=3.5e-5 (its value is calculated in my script). I want to write this value somewhere in my plot (in the title, as text in the plot, etc.) as:
K_{root} = 3.5 10^{-5} cm /d
I have tried the functions bquote and substitute, and neither worked.
Let's put the question in examples. I have tried the following:
1)
png("exp_1.png")
kroot = 3.5e-5
plot(1:10,1:10,
text(4,9,bquote(italic(K[root])~"="~.(kroot)~"cm/d")))
dev.off()
Try my favorite function, paste().
plot(1:10,1:10,
text(4,9,gsub("e",paste("K[root]=",format(k,scientific=TRUE),"cm/d",sep=" "),replacement=" 10^")))
You can replace the "e" here using the function gsub. I've edited my answer to include this.
The output:
> k=.0000035
> k
[1] 3.5e-06
> gsub("e",paste("K[root]=",format(k,scientific=TRUE),"} cm/d",sep=" "),replacement=" 10^{ ")
[1] "K[root]= 3.5 10^{ -06 } cm/d"
You can remove the extra spaces around { -06 } by using the function substr, if it's important, or simply leave out the curly brackets in the gsub statement.
I try to avoid using paste inside expressions. There is generally a cleaner way to approach this:
expon <- floor(log10(kroot)) # returns -5
mantis <- kroot*10^(-1*expon ) # returns 3.5
plot(1:10,1:10,
text(4,9,substitute( italic(K[root]) == mantis %.% pten* expon ~cm/d,
list(expon=expon, mantis=mantis, pten=" 10^")))
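The same idea can also be sketched with bquote, which splices computed values into a plotmath expression via .( ) and lets 10^ act as a real superscript. The name lab and the rounding with signif are choices made for this example, not part of the original answer.

```r
kroot <- 3.5e-5

expon  <- floor(log10(abs(kroot)))   # exponent: -5
mantis <- kroot / 10^expon           # mantissa: about 3.5

# .() inserts the computed values; %*% draws a multiplication sign
lab <- bquote(italic(K[root]) == .(signif(mantis, 3)) %*% 10^.(expon) ~ cm/d)
lab
```

After plot(1:10, 1:10), the label can be passed on with text(4, 9, lab), and the same object should work in mtext(lab) or title(main = lab). Use %.% instead of %*% if you prefer a centered dot.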

R: Using ellipsis with a function that takes a arbitrary number of arguments

Many times, I find myself typing the following
print(paste0(val1,',',val2,',',val3)) to print the output from a function with variables separated by commas.
It is handy when I want to generate a CSV file from the output.
I was wondering if I can write a function in R that does this for me. After many attempts, I could only get as far as the following.
ppc <- function(string1,string2,...) print(paste0(string1,',',string2,',',...,))
It works for at most three arguments.
> ppc(1,2,3)
[1] "1,2,3"
> ppc(1,2,3,4)
[1] "1,2,34"
ppc(1,2,3,4) should have given "1,2,3,4". How can I correct my function? I somehow believe that this is possible in R.
You don't need to write your own function. You can do this with paste.
paste(1:3,collapse=",")
# [1] "1,2,3"
Or, in case you insist on a ppc() function:
ppc <- function(...) paste(...,sep=",")
ppc(1,2,3,4)
# [1] "1,2,3,4"
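The difference between sep and collapse matters once vectors are involved: sep joins separate arguments, while collapse joins the elements of a single vector. A variant that handles both cases might look like this (ppc2 is a hypothetical name, used so it does not clash with ppc above):

```r
# Flatten all arguments into one vector, then join its elements
ppc2 <- function(...) paste(c(...), collapse = ",")

ppc2(1, 2, 3, 4)
ppc2(1:3, "a")

# Why the original attempt gave "1,2,34": paste0() puts no separator
# between the values captured by ...
paste0(1, ",", 2, ",", 3, 4)
```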
