How to convert CamelCase to not.camel.case in R

In R, I'd like to convert
c("ThisText", "NextText")
to
c("this.text", "next.text")
This is the reverse of this SO question, and the same as this one but with dots in R rather than underscores in PHP.

It is not clear what the entire set of rules is here, but we have assumed that
we should lower-case any upper-case character that follows a lower-case one and insert a dot between them, and also
lower-case the first character of the string if it is followed by a lower-case character.
To do this we can use perl regular expressions with sub and gsub:
# test data
camelCase <- c("ThisText", "NextText", "DON'T_CHANGE")
s <- gsub("([a-z])([A-Z])", "\\1.\\L\\2", camelCase, perl = TRUE)
sub("^(.[a-z])", "\\L\\1", s, perl = TRUE) # make 1st char lower case
giving:
[1] "this.text" "next.text" "DON'T_CHANGE"
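The two steps above can be wrapped in a small helper. This is only a sketch under the same assumed rules; the name camel_to_dotted is hypothetical and does not come from any package:

```r
# Hypothetical helper (the name is an assumption): CamelCase -> dotted lower case
camel_to_dotted <- function(x) {
  # insert a dot between a lower-case and an upper-case letter and
  # lower-case the upper-case letter via \\L
  s <- gsub("([a-z])([A-Z])", "\\1.\\L\\2", x, perl = TRUE)
  # lower-case the first character when it is followed by a lower-case one
  sub("^(.[a-z])", "\\L\\1", s, perl = TRUE)
}

res <- camel_to_dotted(c("ThisText", "LongerCamelCaseText", "DON'T_CHANGE"))
print(res)
```

Strings that never have a lower-case letter followed by an upper-case one, such as "DON'T_CHANGE", pass through unchanged.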

You could also do this via the snakecase package:
install.packages("snakecase")
library(snakecase)
to_snake_case(c("ThisText", "NextText"), sep_out = ".")
# [1] "this.text" "next.text"
Github link to package: https://github.com/Tazinho/snakecase

You can replace all capitals with themselves and a preceding dot with gsub, change everything with tolower, and then substr out the initial dot:
x <- c("ThisText", "NextText", "LongerCamelCaseText")
substr(tolower(gsub("([A-Z])","\\.\\1",x)),2,.Machine$integer.max)
[1] "this.text" "next.text" "longer.camel.case.text"
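Note that dropping the first character assumes every input starts with a capital letter. A possible variation, sketched here as an assumption rather than part of the answer above, removes the leading dot only when there is one, so inputs like "thisText" keep their first letter:

```r
x <- c("ThisText", "thisText")

# put a dot in front of every capital, then lower-case everything
dotted <- tolower(gsub("([A-Z])", ".\\1", x))

# strip a leading dot only if present (instead of always dropping character 1)
res <- sub("^\\.", "", dotted)
print(res)
```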

Using stringr:
library(stringr)
x <- c("ThisText", "NextText")
str_replace_all(string = x,
pattern = "(?<=[a-z0-9])(?=[A-Z])",
replacement = ".") %>%
str_to_lower()
OR
x <- c("ThisText", "NextText")
str_to_lower(
str_replace_all(string = x,
pattern = "(?<=[a-z0-9])(?=[A-Z])",
replacement = ".")
)
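The same lookbehind/lookahead pattern should also work in base R when perl = TRUE is set, since the pattern matches the empty position between a lower-case letter or digit and an upper-case letter. A minimal sketch, assuming no extra packages are available:

```r
x <- c("ThisText", "NextText", "LongerCamelCaseText")

# insert a dot at every zero-width position between [a-z0-9] and [A-Z],
# then lower-case the whole string
res <- tolower(gsub("(?<=[a-z0-9])(?=[A-Z])", ".", x, perl = TRUE))
print(res)
```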

Related

How to extract specific symbol from text in R

I'm setting up a new project in R and want to extract specific symbols from text
X <- c("amazing tiny phone ^_^","so cute!!! <3")
I would like to extract ^_^ and <3 from X in R
Thank you!
More straightforward
X = c("amazing tiny phone ^_^","so cute!!! <3","^_^ and :) are my fav symbols")
patt=c("=d" ,"<3" , ":o" , ":(" ,
":)" , "(y)" , ":*" , "^_^", ":d" ,";)" , ":'(")
variable = sapply(X,function(x){
i = which(patt%in%strsplit(x," ")[[1]])
if (length(i)>0){
paste(patt[i],collapse=" ")
} else{NA}
})
names(variable)=NULL
> variable
[1] "^_^" "<3" ":) ^_^"
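A variation on the same token-matching idea, sketched below, keeps the symbols in the order they appear in the text and returns NA when nothing matches. The function name extract_symbols and the shortened symbol list are assumptions made for illustration:

```r
# Hypothetical helper: keep whitespace-separated tokens found in `symbols`
extract_symbols <- function(text, symbols) {
  vapply(strsplit(text, " ", fixed = TRUE), function(tokens) {
    hits <- tokens[tokens %in% symbols]  # preserves order of appearance
    if (length(hits) > 0) paste(hits, collapse = " ") else NA_character_
  }, character(1))
}

X <- c("amazing tiny phone ^_^", "so cute!!! <3",
       "^_^ and :) are my fav symbols", "no symbols here")
patt <- c("<3", ":)", "^_^", ";)")
res <- extract_symbols(X, patt)
print(res)
```

Because it splits on single spaces, a symbol glued to a word (for example "phone^_^") would not be found by this approach.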
@GraemeForst: A generalization could be achieved using groupings and lookaheads:
group <- "[\\^\\_\\<\\>3\\:\\(\\)\\;]"
pat <- sprintf(".*[\\s\\b](%s+)(?!\\1)", group)
group defines the character grouping. Basically all symbols we want to extract.
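pat defines our matching pattern. The [\\s\\b] is meant to say that a possible match must be preceded by either a blank or a boundary (note that inside a character class PCRE treats \\b as a backspace, so in practice only the whitespace alternative applies). The (?!\\1) is a negative lookahead that rejects a match immediately followed by a repeat of the captured text.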
Here is a demo:
X <- c("amazing tiny phone ^_^","so cute!!! <3", "I like pizza :)", "hello beautiful ;)")
gsub(pat, "\\1", grep(pat, X, value = TRUE, perl = TRUE), perl = TRUE)
# [1] "^_^" "<3" ":)" ";)"
This can be further refined and generalized. A very simple step one can add is to extend the grouping.
Old Answer
You can use regex for this:
# create the pattern to be extracted
pat = ".*(\\^\\_\\^).*|.*(\\<3).*" # escape special characters with "\\" and ".*" to specify there may be text before/after
# extract
gsub(pat, "\\1\\2", grep(pat, X, value = TRUE, perl = TRUE), perl = TRUE)
# [1] "^_^" "<3"
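An alternative that avoids the surrounding ".*" is to locate the match with regexpr and pull it out with regmatches. This is a sketch for the two symbols from the question only; strings without a match are simply dropped from the result:

```r
X <- c("amazing tiny phone ^_^", "so cute!!! <3", "nothing here")

# first occurrence of either ^_^ or <3 in each string
m <- regexpr("\\^_\\^|<3", X)

# regmatches() returns only the strings that actually matched
res <- regmatches(X, m)
print(res)
```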

Replace multiple strings comprising a different number of characters with one gsubfn()

In "Replace multiple strings in one gsub() or chartr() statement in R?" it is explained how to replace multiple single-character strings in one statement with gsubfn(). E.g.:
x <- "doremi g-k"
gsubfn(".", list("-" = "_", " " = ""), x)
# "doremig_k"
I would however like to replace the string 'doremi' in the example with ''. This does not work:
x <- "doremi g-k"
gsubfn(".", list("-" = "_", "doremi" = ""), x)
# "doremi g_k"
I guess this is because the string 'doremi' contains multiple characters while I am using the metacharacter . in gsubfn. I have no idea what to replace it with; I must confess I sometimes find metacharacters a bit difficult to understand. So, is there a way for me to replace '-' and 'doremi' at once?
You might be able to just use base R sub here:
x <- "doremi g-k"
result <- sub("doremi\\s+([^-]+)-([^-]+)", "\\1_\\2", x)
result
[1] "g_k"
Does this work for you?
gsubfn::gsubfn(pattern = "doremi|-", list("-" = "_", "doremi" = ""), x)
[1] " g_k"
The key is the search pattern "doremi|-", which tells gsubfn to search for either "doremi" or "-". The "|" is the or operator.
Just a more generic version of @RLave's solution:
toreplace <- list("-" = "_", "doremi" = "")
gsubfn(paste(names(toreplace),collapse="|"), toreplace, x)
[1] " g_k"
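If gsubfn is not available, a similar result can be sketched in base R by applying the replacements one after another with fixed = TRUE. Unlike the single gsubfn call, this is sequential, so it is only equivalent as long as no replacement value contains one of the other search strings:

```r
x <- "doremi g-k"
toreplace <- c("-" = "_", "doremi" = "")

# apply each literal replacement in turn
result <- x
for (old in names(toreplace)) {
  result <- gsub(old, toreplace[[old]], result, fixed = TRUE)
}
print(result)
```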

How to replace the certain character in certain position in the string?

My question is how to replace a character that is at a certain position. For example:
str <- c("abcdccc","hijklccc","abcuioccc")
#I want to replace character "c" which is in position 3 to "X" how can I do that?
#I know the function gsub and substr, but the only idea I have got so far is
#use if() to make it. How can I do it quickly?
#ideal result
>str
"abXdccc" "hijklccc" "abXuioccc"
It's a bit awkward, but you can replace a single character dependent on that single character's value like:
ifelse(substr(str,3,3)=="c", `substr<-`(str,3,3,"X"), str)
#[1] "abXdccc" "hijklccc" "abXuioccc"
If you are happy to overwrite the value, you could do it a bit cleaner:
substr(str[substr(str,3,3)=="c"],3,3) <- "X"
str
#[1] "abXdccc" "hijklccc" "abXuioccc"
I wonder if you can use a regex lookahead here to get what you are after.
str <- c("abcdccc","hijklccc","abcuioccc")
gsub("(^.{2})(?=c).(.*$)", "\\1X\\2", str, perl = TRUE)
Or using a positive lookbehind as suggested by thelatemail
sub("(?<=^.{2})c", "X", str, perl = TRUE)
What this is doing is looking to match the letter c which is after any two characters from the start of the string. The c is replaced with X.
(?<= is the start of positive lookbehind
^.{2} means any two characters from the start of the string
)c is the last part which says it has to be a c after the two characters
[1] "abXdccc" "hijklccc" "abXuioccc"
If you want to read up more about regex being used (link)
Additionally a generalised function:
switch_letter <- function(x, letter, position, replacement) {
stopifnot(position > 1)
pattern <- paste0("(?<=^.{", position - 1, "})", letter)
sub(pattern, replacement, x, perl = TRUE)
}
switch_letter(str, "c", 3, "X")
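The regex version above requires position > 1. A regex-free sketch that also handles position 1 can be built on substr; the name replace_at is hypothetical and used only for illustration:

```r
# Hypothetical helper: replace `letter` with `replacement` only where it
# sits exactly at `position`
replace_at <- function(x, letter, position, replacement) {
  hit <- substr(x, position, position) == letter
  substr(x[hit], position, position) <- replacement
  x
}

str <- c("abcdccc", "hijklccc", "abcuioccc")
res3 <- replace_at(str, "c", 3, "X")
res1 <- replace_at(c("cab", "abc"), "c", 1, "X")
print(res3)
print(res1)
```

Since substr<- overwrites in place, the replacement is assumed to be a single character, matching the question.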
This should work too:
str <- c("abcdefg","hijklnm","abcuiowre")
a <- strsplit(str[1], "")[[1]]
a[3] <- "X"
a <- paste(a, collapse = '')
str[1] <- a
How about this idea:
c2Xon3 <- function(x){sprintf("%s%s%s",substring(x,1,2),gsub("c","X",substring(x,3,3)),substring(x,4,nchar(x)))}
str <- c("abcdccc","hijklccc","abcuioccc")
strNew <- sapply(str,c2Xon3 )
This should work
str <- c("abcdefg","hijklnm","abcuiowre")
for (i in 1:length(str))
{
if (substr(str[i],3,3)=='c') {
substr(str[i], 3, 3) <- "X"
}
}
You can just use ifelse with substring and paste0, i.e.
ifelse(substr(str, 3, 3) == 'c', paste0(substring(str, 1, 2),'X', substring(str, 4)), str)
#[1] "abXdccc" "hijklccc" "abXuioccc"

Regular Expression: replace the n-th occurrence

Does someone know how to find the n-th occurrence of a string within an expression and how to replace it using a regular expression?
For example, I have the following string
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
and I want to replace the 5th occurrence of '-' by '|'
and the 7th occurrence of '-' by "||" like
[1] aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa
How do I do this?
Thanks,
Florian
(1) sub It can be done in a single regular expression with sub:
> sub("(^(.*?-){4}.*?)-(.*?-.*?)-", "\\1|\\3||", txt, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(2) sub twice or this variation which calls sub twice, first replacing the 7th occurrence with "||" and then the 5th with "|":
> txt2 <- sub("(^(.*?-){6}.*?)-", "\\1||", txt, perl = TRUE)
> sub("(^(.*?-){4}.*?)-", "\\1|", txt2, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(3) sub.fun or this variation, which creates a function sub.fun that does one substitution. It makes use of fn$ from the gsubfn package to substitute n-1, pat, and value into the sub arguments. First define the indicated function and then call it twice.
library(gsubfn)
sub.fun <- function(x, pat, n, value) {
fn$sub( "(^(.*?-){`n-1`}.*?)$pat", "\\1$value", x, perl = TRUE)
}
> sub.fun(sub.fun(txt, "-", 7, "||"), "-", 5, "|")
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(We could have modified the arguments to sub in the body of sub.fun using paste or sprintf to give a base R solution but at the expense of some additional verbosity.)
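The base R variant mentioned in the parenthetical might look roughly like this, using sprintf to build the pattern. The name sub_nth is hypothetical, and the sketch assumes pat is a simple regex that is safe to embed and that value contains no backslashes:

```r
# Hypothetical base R counterpart of sub.fun: replace the n-th match of `pat`
sub_nth <- function(x, pat, n, value) {
  # skip n-1 matches lazily, then match the n-th one outside the capture
  pattern <- sprintf("(^(.*?%s){%d}.*?)%s", pat, n - 1, pat)
  sub(pattern, paste0("\\1", value), x, perl = TRUE)
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
res <- sub_nth(sub_nth(txt, "-", 7, "||"), "-", 5, "|")
print(res)
```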
This can be reformulated as a replacement function giving this pleasing sequence:
"sub.fun<-" <- sub.fun
tt <- txt # make a copy so that we preserve the input txt
sub.fun(tt, "-", 7) <- "||"
sub.fun(tt, "-", 5) <- "|"
> tt
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(4) gsubfn Using gsubfn from the gsubfn package we can use a particularly simple regular expression (it's just "-") and the code has quite a straightforward structure. We perform the substitution via a proto method. The proto object containing the method is passed in place of a replacement string. The simplicity of this approach derives from the fact that gsubfn automatically makes a count variable available to such methods:
library(gsubfn) # gsubfn also pulls in proto
p <- proto(fun = function(this, x) {
if (count == 5) return("|")
if (count == 7) return("||")
x
})
> gsubfn("-", p, txt)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
UPDATE: Some corrections.
UPDATE 2: Added a replacement function approach to (3).
UPDATE 3: Added pat argument to sub.fun.
An alternative possibility is using Hadley's stringr package which builds the basis for the function I wrote:
require(stringr)
replace.nth <- function(string, pattern, replacement, n) {
locations <- str_locate_all(string, pattern)
str_sub(string, locations[[1]][n, 1], locations[[1]][n, 2]) <- replacement
string
}
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
txt.new <- replace.nth(txt, "-", "||", 7) # replace the 7th first so the 5th keeps its position
txt.new <- replace.nth(txt.new, "-", "|", 5)
txt.new
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
One way to do this is to use gregexpr to find the positions of the -:
posns <- gregexpr("-",txt)[[1]]
And then pasting together the relevant pieces and separators:
paste0(substr(txt,1,posns[5]-1),"|",substr(txt,posns[5]+1,posns[7]-1),"||",substr(txt,posns[7]+1,nchar(txt)))
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
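The same match positions can also be used with the replacement form of regmatches, which avoids the manual substr bookkeeping. A minimal sketch, assuming the separators to change are known by index:

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

m <- gregexpr("-", txt)             # positions of every "-"
seps <- regmatches(txt, m)[[1]]     # the matched separators, in order
seps[5] <- "|"                      # new text for the 5th occurrence
seps[7] <- "||"                     # new text for the 7th occurrence

out <- txt
regmatches(out, m) <- list(seps)    # write all separators back at once
print(out)
```

Because every index refers to the original string, the order of the two assignments does not matter here.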

regex out a text from a line in R

I have line like this:
x<-c("System configuration: lcpu=96 mem=196608MB ent=16.00")
I need to get the value of ent and store it in a val object in R.
I am doing this, but it does not seem to be working. Any ideas?
val<-x[grep("[0-9]$", x)]
use sub:
val <- sub('^.* ent=([[:digit:]]+)', '\\1', x)
If ent is always at the end then:
sub(".*ent=", "", x)
If not try strapplyc in the gsubfn package which returns only the portion of the regular expression within parentheses:
library(gsubfn)
strapplyc(x, "ent=([.0-9]+)", simplify = TRUE)
Also it could be converted to numeric at the same time using strapply :
strapply(x, "ent=([.0-9]+)", as.numeric, simplify = TRUE)
Using rex may make this type of task a little simpler.
Note this solution correctly includes . in the capture, as does G. Grothendieck's answer.
x <- c("System configuration: lcpu=96 mem=196608MB ent=16.00")
library(rex)
val <- as.numeric(
re_matches(x,
rex("ent=",
capture(name = "ent", some_of(digit, "."))
)
)$ent
)
#>[1] 16
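If more than one field of the line is of interest, one option is to pull out every key=value pair at once in base R. This is only a sketch; it assumes keys are lower-case letters and values are plain numbers, as in the example line, so the "MB" unit after mem is ignored:

```r
x <- "System configuration: lcpu=96 mem=196608MB ent=16.00"

# all "key=number" pairs in the line
pairs <- regmatches(x, gregexpr("[a-z]+=[0-9.]+", x))[[1]]
parts <- strsplit(pairs, "=", fixed = TRUE)

# named numeric vector: names are the keys, values are the numbers
vals <- setNames(as.numeric(vapply(parts, `[`, character(1), 2)),
                 vapply(parts, `[`, character(1), 1))

val <- vals[["ent"]]
print(val)
```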
