R NAMESPACE export pattern using Perl-style (non-consuming) regular expressions

I want to export all functions from an R package called myPackageId that do not start with a period and do not start with the string "myPackageId_".
Functions matching the second pattern are generated automatically by Rcpp in C code as "RcppExport SEXP myPackageId_cFunctionname" and should not be exported by the package.
I found a solution using a non-consuming regular expression (lookahead):
exportPattern("(?=^[^\\.])(?=^(?!myPackageId_))")
This works with R grep with option perl=TRUE. However, default R grep with extended RE and R CMD INSTALL complain about an invalid pattern.
words <- c(".test","test","myPackageId_test")
grep("(?=^[^\\.])(?=^(?!myPackageId_))",words, perl=TRUE)
grep("(?=^[^\\.])(?=^(?!myPackageId_))",words)
In the above example, the word "test" would be accepted, whereas the other words ".test" and "myPackageId_test" would not be accepted.
Expected inputs are all valid R names. These are the usual words composed of ASCII characters without whitespace. In R also the period, ".", can start a name.
Is there a pattern that I can use with grep option perl=FALSE to achieve the same goal?
Or can I tell R somehow in the NAMESPACE file to use the perl variant with grep?
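One lookahead-free way to express "does not start with a period and does not start with myPackageId_" in an extended regular expression is to enumerate every point at which a name can diverge from the prefix. The sketch below is not from the original post: build_export_pattern is a hypothetical helper, and it assumes the prefix contains only letters, digits, and underscores, so that no character needs escaping.

```r
# Hypothetical helper: builds an ERE that accepts names which neither start
# with "." nor with `prefix`. Assumes `prefix` holds only letters, digits, "_".
build_export_pattern <- function(prefix) {
  chars <- strsplit(prefix, "")[[1]]
  n <- length(chars)
  alts <- character(n)
  # Case 0: the first character is neither "." nor the prefix's first character.
  alts[1] <- paste0("[^.", chars[1], "]")
  # Case k: the name shares the first k prefix characters, then either ends
  # or continues with a character that differs from the prefix.
  for (k in seq_len(n - 1)) {
    lead <- paste(chars[seq_len(k)], collapse = "")
    alts[k + 1] <- paste0(lead, "($|[^", chars[k + 1], "])")
  }
  paste0("^(", paste(alts, collapse = "|"), ")")
}

pat <- build_export_pattern("myPackageId_")
words <- c(".test", "test", "myPackageId_test", "myPackageId", "my.func")
keep <- grepl(pat, words)  # default engine, perl = FALSE
words[keep]
```

The generated string contains no backslashes, so it could be pasted into an exportPattern() directive as is. Whether a given R version accepts it in NAMESPACE is worth confirming with R CMD check.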

Related

Does R 4.0.0 make it possible to define foo"(...)" operators, similar to the newly introduced r"(...)" syntax?

R 4.0.0 brings in a new syntax for raw strings:
r"(raw string here can contain anything except the closing sequence)"
But this same construct in R 3.x.x produced a syntax error:
Error: unexpected string constant in "r"(asdasd)""
Does this mean that the parser was changed in R 4.0.0?
And if so, does R 4.0.0 provide a mechanism to define custom functions like foo"()"?
No, that's not possible at the moment (nor would I anticipate it becoming possible anytime soon).
Here's the NEWS item:
There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.
https://cran.r-project.org/doc/manuals/r-devel/NEWS.html
Then from ?Quotes:
Raw character constants are also available using a syntax similar to
the one used in C++: r"(...)" with ... any character
sequence, except that it must not contain the closing sequence
)". The delimiter pairs [] and {} can also be
used, and R can be used in place of r. For additional
flexibility, a number of dashes can be placed between the opening quote
and the opening delimiter, as long as the same number of dashes appear
between the closing delimiter and the closing quote.
https://github.com/wch/r-source/blob/trunk/src/library/base/man/Quotes.Rd
Here's the (git mirror of the SVN patch of the) commit where this functionality was added:
https://github.com/wch/r-source/commit/8b0e58041120ddd56cd3bb0442ebc00a3ab67ebc
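The variants described in ?Quotes can be tried out directly. This is a small sketch that needs R 4.0.0 or later; the sample strings are made up for illustration.

```r
# Requires R >= 4.0.0; older versions stop with a syntax error.
path_raw <- r"(C:\Users\me\data.csv)"
path_esc <- "C:\\Users\\me\\data.csv"
identical(path_raw, path_esc)

# Square brackets as delimiters, with the upper-case R prefix.
regex_raw <- R"[\d+\.\d+]"

# Dashes let the body contain the sequence )" itself.
tricky <- r"--(say ")" here)--"
cat(tricky, "\n")
```

Only the built-in r and R prefixes are recognized by the parser; as the answer says, there is no hook for defining a prefix of your own.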

Extract decimal numbers from string in Sparklyr

I've been trying to extract decimal numbers from strings in sparklyr, but it does not work with the regular syntax you would normally use outside of Spark.
I have tried using regexp_extract but it returns empty strings.
regexp_extract($170.5M, "[[:digit:]]+\\.*[[:digit:]]*")
I'm trying to get 170.5 as a result.
You could use regexpr from base R
v <- "$170.5M"
regmatches(v, regexpr("\\d*\\.\\d", v))
# [1] "170.5"
You may use
regexp_extract(col_value, "[0-9]+(?:[.][0-9]+)?")
Or
regexp_extract(col_value, "\\p{Digit}+(?:\\.\\p{Digit}+)?")
Your [[:digit:]]+\.*[[:digit:]]* regex does not work because regexp_extract expects a Java-compatible regex pattern, and that engine does not support POSIX character classes in the [:classname:] syntax. You can write the digit POSIX character class as \p{Digit}; see the Java regex documentation.
See regexp_extract documentation:
Extract a specific(idx) group identified by a java regex, from the specified string column.
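The bracket-based pattern can be sanity-checked locally before it is sent to Spark. This sketch uses base R with perl = TRUE as a stand-in. PCRE is not the Java engine that regexp_extract uses, so it only approximates the behavior, and the sample values are made up for illustration.

```r
v <- c("$170.5M", "$3M", "n/a")
pattern <- "[0-9]+(?:[.][0-9]+)?"

# regexpr() returns -1 where nothing matches; keep NA for those elements.
m <- regexpr(pattern, v, perl = TRUE)
extracted <- rep(NA_character_, length(v))
extracted[m > 0] <- regmatches(v, m)

as.numeric(extracted)
```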

stargazer and omit regular expressions

I am trying to use regular expressions to omit some variables in stargazer. I finally found a working regex, but it's using the Perl standard. This doesn't work for the base regex in R, though regexpr in R can take a perl=T option. Given that you wrap the regex for variable sets to omit in "", you can't really pass it this option. Any ideas on how to use perl regex with stargazer?
An example of the regex I would like to use is
placed.ind2*(?:(?!:switchind).)*$
applied to these 4 strings:
placed.ind2PROF SERVICES
placed.ind2TRANSPORT
placed.ind2PROF SERVICES:switchind2TRUE
placed.ind2TRANSPORT:switchind2TRUE
I would like the first two to be selected, but not the last two.
Starting from version 4.0 (on CRAN now), you can run stargazer with the argument perl=TRUE to allow for Perl-compatible regular expressions in your other arguments.
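Before handing the pattern to stargazer, it can help to check with grepl which names it selects. This sketch uses the four strings from the question. The commented stargazer call is only illustrative, and fit is a hypothetical model object.

```r
vars <- c(
  "placed.ind2PROF SERVICES",
  "placed.ind2TRANSPORT",
  "placed.ind2PROF SERVICES:switchind2TRUE",
  "placed.ind2TRANSPORT:switchind2TRUE"
)
pattern <- "placed.ind2*(?:(?!:switchind).)*$"

# The negative lookahead needs the PCRE engine, hence perl = TRUE.
selected <- grepl(pattern, vars, perl = TRUE)
vars[selected]

# stargazer(fit, omit = pattern, perl = TRUE)  # fit: hypothetical model
```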

Paste "25 \%" in R for further processing in LaTeX

I want a character variable in R that takes the value of, let's say, "a" and appends " \%", so that LaTeX later renders a % sign.
Usually I'd do something like:
a <- 5
paste(a,"\%")
but this fails.
Error: '\%' is an unrecognized escape in character string starting "\%"
Any ideas? A workaround would be to define another command giving the %-sign in LaTeX, but I'd prefer a solution within R.
As in many other languages, certain characters in R strings take on a different meaning when they are escaped. One example is \n, which means newline rather than n. When you write \%, R tries to interpret it as an escape sequence and fails because no such sequence exists. Escape the backslash itself, so that it is just a backslash:
paste(a, "\\%")
You can read about escape sequences in ?Quotes.
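A quick way to confirm that the string really holds a single backslash is to compare print(), which shows the escaped form, with cat(), which writes the raw characters. A minimal sketch:

```r
a <- 5
s <- paste(a, "\\%")

print(s)       # [1] "5 \\%"  (escaped display)
cat(s, "\n")   # 5 \%         (what LaTeX will receive)
nchar(s)       # 4: "5", space, backslash, "%"
```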
You can also look at the latexTranslate function from the Hmisc package, which escapes special characters in strings to make them LaTeX-compatible:
R> latexTranslate("You want to give me 100$ ? I agree 100% !")
[1] "You want to give me 100\\$ ? I agree 100\\% !"

using R to copy files

As part of a larger task performed in R under Windows, I would like to copy selected files between directories. Is it possible to issue, from within R, a command like cp patha/filea*.csv pathb (notice the wildcard, for extra spice)?
I don't think there is a direct way (shy of shelling-out), but something like the following usually works for me.
flist <- list.files("patha", "^filea.+[.]csv$", full.names = TRUE)
file.copy(flist, "pathb")
Notes:
I purposely split this into two steps; they can be combined.
Note the regular expression: R uses true regexes rather than wildcards, and list.files takes the path and the file pattern as two separate arguments.
Note the ^ and $ (beginning/end of string) in the regex. This is a common gotcha, as these anchors are implicit in wildcard-type patterns but required with regexes (otherwise file names that match the wildcard pattern but also start and/or end with additional text would be selected as well).
In the Windows world, people will typically add the ignore.case = TRUE argument to list.files, in order to emulate the fact that directory searches are case insensitive with this OS.
R's glob2rx() function provides a convenient way to convert wildcard patterns to regular expressions. For example fpattern = glob2rx('filea*.csv') returns a different but equivalent regex.
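The notes above can be combined into one small run. This sketch works in a throwaway directory under tempdir(); the directory and file names are made up for illustration.

```r
# Set up a scratch source and destination directory with a few empty files.
root <- tempfile("copydemo")
src <- file.path(root, "patha")
dst <- file.path(root, "pathb")
dir.create(src, recursive = TRUE)
dir.create(dst)
invisible(file.create(file.path(
  src, c("filea1.csv", "filea2.csv", "fileb1.csv", "filea1.csv.bak")
)))

# Translate the wildcard into an anchored regex.
fpattern <- utils::glob2rx("filea*.csv")

# ignore.case = TRUE mimics Windows' case-insensitive directory listing.
flist <- list.files(src, pattern = fpattern, full.names = TRUE,
                    ignore.case = TRUE)
ok <- file.copy(flist, dst)
copied <- sort(list.files(dst))
copied
```

Because the generated regex is anchored at both ends, filea1.csv.bak is left behind even though it contains "filea1.csv".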
You can
use system() to fire off a command as if it was on shell, incl globbing
use list.files() aka dir() to do the globbing / regexp matching yourself and then copy the files individually
use file.copy on individual files as shown in mjv's answer
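A further option, not mentioned in the answers above, is base R's Sys.glob(), which expands wildcards without a regex and without shelling out. A minimal sketch in a scratch directory, with made-up file names:

```r
# Scratch directories so the example does not touch real files.
base_dir <- tempfile("globdemo")
from_dir <- file.path(base_dir, "patha")
to_dir <- file.path(base_dir, "pathb")
dir.create(from_dir, recursive = TRUE)
dir.create(to_dir)
invisible(file.create(file.path(from_dir, c("filea1.csv", "fileb1.csv"))))

# Sys.glob() expands the wildcard and returns the matching paths.
matches <- Sys.glob(file.path(from_dir, "filea*.csv"))
done <- file.copy(matches, to_dir)
list.files(to_dir)
```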

Resources