stargazer and omit regular expressions - r

I am trying to use regular expressions to omit some variables in stargazer. I finally found a working regex, but it's using the Perl standard. This doesn't work for the base regex in R, though regexpr in R can take a perl=T option. Given that you wrap the regex for variable sets to omit in "", you can't really pass it this option. Any ideas on how to use perl regex with stargazer?
An example of the regex I would like to use is
placed.ind2*(?:(?!:switchind).)*$
applied to these 4 strings:
placed.ind2PROF SERVICES
placed.ind2TRANSPORT
placed.ind2PROF SERVICES:switchind2TRUE
placed.ind2TRANSPORT:switchind2TRUE
I would like the first two to be selected, but not the last two.

Starting from version 4.0 (on CRAN now), you can run stargazer with the argument perl=TRUE to allow for Perl-compatible regular expressions in your other arguments.
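Before passing a pattern to stargazer, you can check it in base R. The sketch below runs the asker's Perl-style pattern through `grepl()` with `perl = TRUE`, the same switch the answer describes for stargazer. It confirms that the tempered token `(?:(?!:switchind).)*` keeps the interaction terms out. The stargazer call is left as a comment because `model` is a hypothetical fitted model, not something from the question.

```r
# The asker's pattern: (?:(?!:switchind).)* consumes characters one at a time
# but refuses to step onto ":switchind", so interaction terms cannot match.
pattern <- "placed.ind2*(?:(?!:switchind).)*$"

vars <- c("placed.ind2PROF SERVICES",
          "placed.ind2TRANSPORT",
          "placed.ind2PROF SERVICES:switchind2TRUE",
          "placed.ind2TRANSPORT:switchind2TRUE")

# perl = TRUE switches base R to PCRE, which supports the (?!...) lookahead;
# the default TRE engine rejects this pattern as invalid.
matched <- grepl(pattern, vars, perl = TRUE)
print(matched)
# [1]  TRUE  TRUE FALSE FALSE

# With stargazer >= 4.0 the same string can go into `omit` along with
# perl = TRUE ("model" is a hypothetical fitted model):
# stargazer::stargazer(model, omit = pattern, perl = TRUE)
```

Only the first two names match, which is the selection the question asks for.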

Related

Number after captured group in regex

I want to write a simple RegEx to add leading zeros to my R code. The simplest way is to find (\s)\.(\d) and replace it with \10.\2. But it doesn't work in R, as it apparently thinks it's the 10th captured group rather than the 1st followed by a literal 0. According to this question RStudio uses PCRE, but none of the methods for PCRE (or any other engine) described here work in the RStudio find & replace feature. Is it possible to put a number after a captured group without leaving RStudio?
As a work-around, you can use lookarounds here:
Search for: (?<=\s)\.(?=\d)
Replace with: 0.
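The same fix can be applied from R code rather than the RStudio dialog. This is a minimal sketch with two made-up lines of code as input. The lookaround version needs `perl = TRUE`. For comparison, it also shows that R's own `sub()`/`gsub()` read only single-digit backreferences (`\\1` to `\\9`), so in R code `"\\10"` means group 1 followed by a literal 0. This applies to R's functions, not to RStudio's find & replace.

```r
code <- c("x <- .5", "y <- c(1, .25)")

# (?<=\s) and (?=\d) are zero-width, so only the dot is consumed and the
# replacement needs no backreference at all; lookarounds require PCRE.
fixed_look <- gsub("(?<=\\s)\\.(?=\\d)", "0.", code, perl = TRUE)

# In R replacement strings backreferences are one digit long, so "\\10"
# is group 1 followed by "0", not a reference to group 10.
fixed_backref <- gsub("(\\s)\\.(\\d)", "\\10.\\2", code)

# Both give c("x <- 0.5", "y <- c(1, 0.25)")
```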

Regular Expressions, containing specific words

I need to specify that my user name has to start with one of two words, followed by a backslash. It can only accept the words cat or dog at the beginning, followed by a backslash, for example:
cat\something
dog\something
Both are accepted. What's the regular expression for this? I've tried some regular expressions but I haven't figured it out yet.
The solution is:
^cat\\.*|^dog\\.*
https://regex101.com/ is a great tool for testing and evaluating regular expressions.
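If the pattern is used from R, each backslash has to be doubled again inside the string literal, so the regex `\\` becomes `"\\\\"`. The sketch below assumes R as the host language, which the question does not state, and uses a few made-up user names. It tests the answer's pattern alongside an equivalent form that groups the two words under a single anchor.

```r
# In R source, "cat\\something" is the 11-character string cat\something.
users <- c("cat\\something", "dog\\something",
           "bird\\something", "cat/something", "catalog\\x")

# "\\\\" in an R string is the regex \\, which matches one literal backslash.
ok <- grepl("^(cat|dog)\\\\", users)

# The answer's pattern written as an R string; the trailing .* does not
# change a yes/no test.
ok_answer <- grepl("^cat\\\\.*|^dog\\\\.*", users)

# Both give c(TRUE, TRUE, FALSE, FALSE, FALSE)
```

`"catalog\\x"` is rejected because the backslash must come immediately after the word. Use `writeLines(users)` to see the strings without R's escaping.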

Extract decimal numbers from string in Sparklyr

I've been trying to extract decimal numbers from strings in sparklyr, but it does not work with the regular syntax you would normally use outside of Spark.
I have tried using regexp_extract but it returns empty strings.
regexp_extract($170.5M, "[[:digit:]]+\\.*[[:digit:]]*")
I'm trying to get 170.5 as a result.
You could use regexpr from base R (note that this pattern requires a decimal point and keeps only one digit after it):
v <- "$170.5M"
regmatches(v, regexpr("\\d*\\.\\d", v))
# [1] "170.5"
You may use
regexp_extract(col_value, "[0-9]+(?:[.][0-9]+)?")
Or
regexp_extract(col_value, "\\p{Digit}+(?:\\.\\p{Digit}+)?")
Your [[:digit:]]+\.*[[:digit:]]* regex does not work because regexp_extract expects a Java-compatible regex pattern, and that engine does not support POSIX character classes in the [:classname:] syntax. Use the Java form of the POSIX class instead, \p{Digit}; see the Java regex documentation.
See regexp_extract documentation:
Extract a specific(idx) group identified by a java regex, from the specified string column.
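Running `regexp_extract` needs a Spark connection. As a local sanity check, the first answer's pattern `[0-9]+(?:[.][0-9]+)?` can be tried in base R with `perl = TRUE`, assuming PCRE and Java treat this simple syntax the same way. The second pattern is not tested here because `\p{Digit}` is a Java-specific name. The sample strings other than `"$170.5M"` are made up.

```r
v <- c("$170.5M", "$42M", "12.75 units")

# One or more digits, optionally followed by a dot and more digits.
# (?:...) is a non-capturing group; perl = TRUE selects the PCRE engine.
amounts <- regmatches(v, regexpr("[0-9]+(?:[.][0-9]+)?", v, perl = TRUE))
# amounts is c("170.5", "42", "12.75")

# The base R answer's "\\d*\\.\\d" needs a dot and keeps one decimal digit:
# "$42M" has no match (regmatches drops it) and "12.75 units" gives "12.7".
short <- regmatches(v, regexpr("\\d*\\.\\d", v))
# short is c("170.5", "12.7")
```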

Match everything up until first instance of a colon

Trying to code up a Regex in R to match everything before the first occurrence of a colon.
Let's say I have:
time = "12:05:41"
I'm trying to extract just the 12. My strategy was to do something like this:
grep(".+?(?=:)", time, value = TRUE)
But I'm getting the error that it's an invalid Regex. Thoughts?
Your regex looks fine, but grep is the wrong tool here, and you are missing perl=TRUE: the default regex engine does not support lookaheads such as (?=:), which is why you get the error.
I would recommend using :
stringr::str_extract( time, "\\d+?(?=:)")
grep works differently from how it is used here: it is good for finding which elements of a vector match a pattern and filtering on that, but it cannot pull a value out of a string.
If you want to use Base R you can also go for sub:
sub("^(\\d+?)(?=:)(.*)$","\\1",time, perl=TRUE)
Also, you may split the string using strsplit and take the first piece, like below:
strsplit(time, ":")[[1]][1]
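The sketch below puts the three base R points side by side, using a second made-up time to show that they are vectorized. With `perl = TRUE`, `grep()` no longer errors, but it still returns whole elements. `regexpr()` plus `regmatches()` returns only the matched text. A lookahead-free `sub()` works with the default engine.

```r
time <- c("12:05:41", "7:30:00")

# grep() reports which elements match; value = TRUE returns the whole
# element, never just the matched part.
whole <- grep(".+?(?=:)", time, value = TRUE, perl = TRUE)
# whole is c("12:05:41", "7:30:00")

# regexpr() + regmatches() extract the matched text (PCRE lookahead).
hours_look <- regmatches(time, regexpr("^[^:]+(?=:)", time, perl = TRUE))

# No lookahead needed: capture everything before the first colon and
# replace the whole string with that capture.
hours_sub <- sub("^([^:]*):.*$", "\\1", time)

# hours_look and hours_sub are both c("12", "7")
```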

Conditional regular expressions in stringr

I'm wondering how to implement a conditional regular expression in R. It seems that this can be implemented in Perl:
(?(if)then|else)
However, I'm having trouble figuring out how to implement this in R. As a simple example, let's say I have the following strings:
c('abcabd', 'abcabe')
I would like the regular expression to match "bd" if it is there and "bc" otherwise, then replace it with "zz". Thus, I would like the strings above to be:
c('abcazz', 'azzabe')
I have tried this using both sub and str_replace, neither of which seems to work. It seems that my syntax might be wrong in sub:
sub('b(?(?=d)d|c)', 'zz', c('abcabe','abcabd'), perl=TRUE)
[1] "azzabe" "azzabd"
The logic is "match b; if followed by d, match d, otherwise match c". With str_replace, I get an error:
str_replace(c('abcabe','abcabd'), regex('b(?(?=d)d|c)'), 'zz')
Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
Use of regexp feature that is not yet implemented. (U_REGEX_UNIMPLEMENTED)
I primarily use stringr so would prefer a solution using str_replace but open to solutions using sub.
You are close, but your condition only looks at the character right after the first b, so the first bc is always chosen. Test for bd anywhere ahead instead, and match bd or bc in full:
(?(?=.*bd)bd|bc)
You don't even need conditional regex:
^(.*)bd|bc
R code:
sub('^(.*)bd|bc', '\\1zz', c('abcabe','abcabd'))
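Both answers can be checked with base R on the asker's strings, in the order the question uses for the desired output. The conditional pattern needs `perl = TRUE`. It cannot go through `str_replace`, because stringr's ICU engine reports conditionals as unimplemented. The conditional-free version runs with the default engine.

```r
x <- c("abcabd", "abcabe")

# PCRE conditional: at each position, if ".*bd" lies ahead, match "bd",
# otherwise match "bc".
cond <- sub("(?(?=.*bd)bd|bc)", "zz", x, perl = TRUE)

# Conditional-free: greedy ^(.*)bd keeps the prefix in group 1; when only
# "bc" matches, group 1 did not participate and contributes nothing.
plain <- sub("^(.*)bd|bc", "\\1zz", x)

# Both give c("abcazz", "azzabe")
```

Because `(.*)` is greedy, the second pattern replaces the last `bd` if a string contains more than one.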
