regex out a text from a line in R - r

I have line like this:
x<-c("System configuration: lcpu=96 mem=196608MB ent=16.00")
I need to the the value equal to ent and store it in val object in R
I am doing this not not seem to be working. Any ideas?
val<-x[grep("[0-9]$", x)]

use sub:
val <- sub('^.* ent=([[:digit:]]+)', '\\1', x)

If ent is always at the end then:
sub(".*ent=", "", x)
If not try strapplyc in the gsubfn package which returns only the portion of the regular expression within parentheses:
library(gsubfn)
strapplyc(x, "ent=([.0-9]+)", simplify = TRUE)
Also it could be converted to numeric at the same time using strapply :
strapply(x, "ent=([.0-9]+)", as.numeric, simplify = TRUE)

Using rex may make this type of task a little simpler.
Note this solution correctly includes . in the capture, as does G. Grothendieck's answer.
x <- c("System configuration: lcpu=96 mem=196608MB ent=16.00")
library(rex)
val <- as.numeric(
re_matches(x,
rex("ent=",
capture(name = "ent", some_of(digit, "."))
)
)$ent
)
#>[1] 16

Related

Extract a part of string and join to the start

I have a list
temp_list <- c("Temp01_T1", "Temp03_T1", "Temp04_T1", "Temp11_T6",
"Temp121_T6", "Temp99_T8")
I want to change this list as follows
output_list <- c("T1_Temp01", "T1_Temp03", "T1_Temp04", "T6_Temp11",
"T6_Temp121", "T8_Temp99")
Any leads would be appreciated
Base R answer using sub -
sub('(\\w+)_(\\w+)', '\\2_\\1', temp_list)
#[1] "T1_Temp01" "T1_Temp03" "T1_Temp04" "T6_Temp11" "T6_Temp121" "T8_Temp99"
You can capture the data in two capture groups, one before underscore and another one after the underscore and reverse them using backreference.
This should work.
sub("(.*)(_)(.*)", "\\3\\2\\1", temp_list)
You capture the three groups, one before the underscore, one is the underscore and one after and then you rearrange the order in the replacement expression.
python approach: splitting on '_', reversing order with -1 step index slicing, and joining with '_':
['_'.join(i.split('_')[::-1]) for i in temp_list]
You can use str_split() from stringr and then use sapply() to paste it in the order you want if it is always seperated by a _
x <- stringr::str_split(temp_list, "_")
sapply(x, function(x){paste(x[[2]],x[[1]],sep = "_")})
A trick with paste + read.table
> do.call(paste, c(rev(read.table(text = temp_list, sep = "_")), sep = "_"))
[1] "T1_Temp01" "T1_Temp03" "T1_Temp04" "T6_Temp11" "T6_Temp121"
[6] "T8_Temp99"

Replace multiple strings comprising of a different number of characters with one gsubfn()

Here Replace multiple strings in one gsub() or chartr() statement in R? it is explained to replace multiple strings of one character at in one statement with gsubfn(). E.g.:
x <- "doremi g-k"
gsubfn(".", list("-" = "_", " " = ""), x)
# "doremig_k"
I would however like to replace the string 'doremi' in the example with ''. This does not work:
x <- "doremi g-k"
gsubfn(".", list("-" = "_", "doremi" = ""), x)
# "doremi g_k"
I guess it is because of the fact that the string 'doremi' contains multiple characters and me using the metacharacter . in gsubfn. I have no idea what to replace it with - I must confess I find the use of metacharacters sometimes a bit difficult to udnerstand. Thus, is there a way for me to replace '-' and 'doremi' at once?
You might be able to just use base R sub here:
x <- "doremi g-k"
result <- sub("doremi\\s+([^-]+)-([^-]+)", "\\1_\\2", x)
result
[1] "g_k"
Does this work for you?
gsubfn::gsubfn(pattern = "doremi|-", list("-" = "_", "doremi" = ""), x)
[1] " g_k"
The key is this search: "doremi|-" which tells to search for either "doremi" or "-". Use "|" as the or operator.
Just a more generic solution to #RLave's solution -
toreplace <- list("-" = "_", "doremi" = "")
gsubfn(paste(names(toreplace),collapse="|"), toreplace, x)
[1] " g_k"

How would I replace all but the last period with underscore?

How would I replace all but the last period with underscore?
x <- "foo.foo.foo.foo.f"
# "foo_foo_foo_foo.f"
Maybe this is helpful
library(stringi)
stri_replace_last(str = stri_replace_all(str = x,regex = "\\.",replacement = "\\_"),regex = "\\_",replacement = "\\.")
#Richard Scriven's comment worked best for me:
gsub("\\.(?=[^.]*\\.)", "_", x, perl = TRUE)
A PCRE option would be
gsub("(\\.[^.]*)$(*SKIP)(*FAIL)|\\.", "_", x, perl = TRUE)
#[1] "foo_foo_foo_foo.f"
A slightly different approach, but should do what you need it to:
library(stringr)
x <- "foo.foo.foo.foo.f"
x_split <- str_split(x, "\\.")[[1]]
x_new <- paste(x_split[-length(x_split)], collapse = "_")
x_new <- paste(x_new, x_split[length(x_split)], sep = ".")
x_new
# "foo_foo_foo_foo.f"
It will always treat the last split differently and will generalise to any possible text between the periods.
You could probably avoid the use of the stringr package if you wanted to (it just wraps stringi and base R string functions with a common interface).

How to convert CamelCase to not.camel.case in R

In R, I'd like to convert
c("ThisText", "NextText")
to
c("this.text", "next.text")
This is the reverse of this SO question, and the same as this one but with dots in R rather than underscores in PHP.
Not clear what the entire set of rules is here but we have assumed that
we should lower case any upper case character after a lower case one and insert a dot between them and also
lower case the first character of the string if succeeded by a lower case character.
To do this we can use perl regular expressions with sub and gsub:
# test data
camelCase <- c("ThisText", "NextText", "DON'T_CHANGE")
s <- gsub("([a-z])([A-Z])", "\\1.\\L\\2", camelCase, perl = TRUE)
sub("^(.[a-z])", "\\L\\1", s, perl = TRUE) # make 1st char lower case
giving:
[1] "this.text" "next.text" "DON'T_CHANGE"
You could do this also via the snakecase package:
install.packages("snakecase")
library(snakecase)
to_snake_case(c("ThisText", "NextText"), sep_out = ".")
# [1] "this.text" "next.text"
Github link to package: https://github.com/Tazinho/snakecase
You can replace all capitals with themselves and a preceeding dot with gsub, change everything tolower, and the substr out the initial dot:
x <- c("ThisText", "NextText", "LongerCamelCaseText")
substr(tolower(gsub("([A-Z])","\\.\\1",x)),2,.Machine$integer.max)
[1] "this.text" "next.text" "longer.camel.case.text"
Using stringr
x <- c("ThisText", "NextText")
str_replace_all(string = x,
pattern = "(?<=[a-z0-9])(?=[A-Z])",
replacement = ".") %>%
str_to_lower()
OR
x <- c("ThisText", "NextText")
str_to_lower(
str_replace_all(string = x,
pattern = "(?<=[a-z0-9])(?=[A-Z])",
replacement = ".")
)

Regular Expression: replace the n-th occurence

does someone know how to find the n-th occurcence of a string within an expression and how to replace it by regular expression?
for example I have the following string
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
and I want to replace the 5th occurence of '-' by '|'
and the 7th occurence of '-' by "||" like
[1] aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa
How do I do this?
Thanks,
Florian
(1) sub It can be done in a single regular expression with sub:
> sub("(^(.*?-){4}.*?)-(.*?-.*?)-", "\\1|\\3||", txt, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(2) sub twice or this variation which calls sub twice:
> txt2 <- sub("(^(.*?-){6}.*?)-", "\\1|", txt, perl = TRUE)
> sub("(^(.*?-){4}.*?)-", "\\1||", txt2, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(3) sub.fun or this variation which creates a function sub.fun which does one substitute. it makes use of fn$ from the gsubfn package to substitute n-1, pat, and value into the sub arguments. First define the indicated function and then call it twice.
library(gsubfn)
sub.fun <- function(x, pat, n, value) {
fn$sub( "(^(.*?-){`n-1`}.*?)$pat", "\\1$value", x, perl = TRUE)
}
> sub.fun(sub.fun(txt, "-", 7, "||"), "-", 5, "|")
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(We could have modified the arguments to sub in the body of sub.fun using paste or sprintf to give a base R solution but at the expense of some additional verbosity.)
This can be reformulated as a replacement function giving this pleasing sequence:
"sub.fun<-" <- sub.fun
tt <- txt # make a copy so that we preserve the input txt
sub.fun(tt, "-", 7) <- "||"
sub.fun(tt, "-", 5) <- "|"
> tt
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(4) gsubfn Using gsubfn from the gsubfn package we can use a particularly simple regular expression (its just "-") and the code has quite a straight forward structure. We perform the substitution via a proto method. The proto object containing the method is passed in place of a replacement string. The simplicity of this approach derives fron the fact that gsubfn automatically makes a count variable available to such methods:
library(gsubfn) # gsubfn also pulls in proto
p <- proto(fun = function(this, x) {
if (count == 5) return("|")
if (count == 7) return("||")
x
})
> gsubfn("-", p, txt)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
UPDATE: Some corrections.
UPDATE 2: Added a replacement function approach to (3).
UPDATE 3: Added pat argument to sub.fun.
An alternative possibility is using Hadley's stringr package which builds the basis for the function I wrote:
require(stringr)
replace.nth <- function(string, pattern, replacement, n) {
locations <- str_locate_all(string, pattern)
str_sub(string, locations[[1]][n, 1], locations[[1]][n, 2]) <- replacement
string
}
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
txt.new <- replace.nth(txt, "-", "|", 5)
txt.new <- replace.nth(txt.new, "-", "||", 7)
txt.new
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa-aaa||aaa-aaa"
One way to do this is to use gregexpr to find the positions of the -:
posns <- gregexpr("-",txt)[[1]]
And then pasting together the relevant pieces and separators:
paste0(substr(txt,1,posns[5]-1),"|",substr(txt,posns[5]+1,posns[7]-1),"||",substr(txt,posns[7]+1,nchar(txt)))
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

Resources