In R, I'd like to convert
c("ThisText", "NextText")
to
c("this.text", "next.text")
This is the reverse of this SO question, and the same as this one but with dots in R rather than underscores in PHP.
It is not clear what the entire set of rules is here, but we have assumed that
we should lower-case any upper-case character that follows a lower-case one, inserting a dot between them, and also
lower-case the first character of the string if it is followed by a lower-case character.
To do this we can use perl regular expressions with sub and gsub:
# test data
camelCase <- c("ThisText", "NextText", "DON'T_CHANGE")
s <- gsub("([a-z])([A-Z])", "\\1.\\L\\2", camelCase, perl = TRUE)
sub("^(.[a-z])", "\\L\\1", s, perl = TRUE) # make 1st char lower case
giving:
[1] "this.text" "next.text" "DON'T_CHANGE"
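For reuse, the two substitutions above can be wrapped in a small helper. This is only a sketch under the same assumed rules; the name to_dot_case is hypothetical and not part of any package.

```r
# Hypothetical helper wrapping the two substitutions shown above
to_dot_case <- function(x) {
  # insert a dot between a lower-case and an upper-case letter,
  # lower-casing the upper-case one via \\L
  s <- gsub("([a-z])([A-Z])", "\\1.\\L\\2", x, perl = TRUE)
  # lower-case the first character only when a lower-case letter follows it
  sub("^(.[a-z])", "\\L\\1", s, perl = TRUE)
}

dotted <- to_dot_case(c("ThisText", "LongerCamelCaseText", "DON'T_CHANGE"))
dotted
# [1] "this.text"              "longer.camel.case.text" "DON'T_CHANGE"
```

Strings without a lower-case-to-upper-case transition, such as "DON'T_CHANGE", pass through untouched.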
You could do this also via the snakecase package:
install.packages("snakecase")
library(snakecase)
to_snake_case(c("ThisText", "NextText"), sep_out = ".")
# [1] "this.text" "next.text"
Github link to package: https://github.com/Tazinho/snakecase
You can replace all capitals with themselves and a preceding dot with gsub, change everything with tolower, and then use substr to drop the initial dot:
x <- c("ThisText", "NextText", "LongerCamelCaseText")
substr(tolower(gsub("([A-Z])","\\.\\1",x)),2,.Machine$integer.max)
[1] "this.text" "next.text" "longer.camel.case.text"
Using stringr:
library(stringr) # also makes the %>% pipe available
x <- c("ThisText", "NextText")
str_replace_all(string = x,
pattern = "(?<=[a-z0-9])(?=[A-Z])",
replacement = ".") %>%
str_to_lower()
OR
x <- c("ThisText", "NextText")
str_to_lower(
str_replace_all(string = x,
pattern = "(?<=[a-z0-9])(?=[A-Z])",
replacement = ".")
)
I'm setting up a new project in R, and want to extract specific symbols from text
X <- c("amazing tiny phone ^_^","so cute!!! <3")
I would like to extract ^_^ and <3 from X in R
Thank you!
More straightforward
X = c("amazing tiny phone ^_^","so cute!!! <3","^_^ and :) are my fav symbols")
patt=c("=d" ,"<3" , ":o" , ":(" ,
":)" , "(y)" , ":*" , "^_^", ":d" ,";)" , ":'(")
variable = sapply(X,function(x){
i = which(patt%in%strsplit(x," ")[[1]])
if (length(i)>0){
paste(patt[i],collapse=" ")
} else{NA}
})
names(variable)=NULL
> variable
[1] "^_^" "<3" ":) ^_^"
@GraemeForst A generalization could be achieved using groupings and lookaheads:
group <- "[\\^\\_\\<\\>3\\:\\(\\)\\;]"
pat <- sprintf(".*[\\s\\b](%s+)(?!\\1)", group)
group defines the character grouping. Basically all symbols we want to extract.
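pat defines our matching pattern. The [\\s\\b] is meant to say that prior to a possible match there must be either a blank or a boundary (note that inside a character class PCRE reads \\b as a backspace, so in practice only the whitespace part applies). And (?!\\1) is a negative lookahead saying that the captured match must not be immediately followed by a repeat of itself.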
Here is a demo:
X <- c("amazing tiny phone ^_^","so cute!!! <3", "I like pizza :)", "hello beautiful ;)")
gsub(pat, "\\1", grep(pat, X, value = TRUE, perl = TRUE), perl = TRUE)
# [1] "^_^" "<3" ":)" ";)"
This can be further refined and generalized. A very simple step one can add is to extend the grouping.
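As an alternative to extending the character grouping, one could match a fixed list of known symbols and pull out every occurrence with gregexpr and regmatches. The sketch below assumes a small illustrative list called emoticons (not from the original answer) and quotes each entry with \\Q...\\E so that characters such as ^ and ) are taken literally.

```r
X <- c("amazing tiny phone ^_^", "so cute!!! <3",
       "^_^ and :) are my fav symbols", "no symbols here")

# Illustrative list of symbols to look for (an assumption, extend as needed)
emoticons <- c("^_^", "<3", ":)", ";)")

# Quote each symbol literally and join the alternatives with "|"
pat_all <- paste0("\\Q", emoticons, "\\E", collapse = "|")

# One character vector of matches per input string, in order of appearance
found <- regmatches(X, gregexpr(pat_all, X, perl = TRUE))
found[[3]]
# [1] "^_^" ":)"
```

Strings without any listed symbol yield character(0), which is easy to test for with lengths(found) == 0.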
Old Answer
You can use regex for this:
# create the pattern to be extracted
pat = ".*(\\^\\_\\^).*|.*(\\<3).*" # escape special characters with "\\" and ".*" to specify there may be text before/after
# extract
gsub(pat, "\\1\\2", grep(pat, X, value = TRUE, perl = TRUE), perl = TRUE)
# [1] "^_^" "<3"
In "Replace multiple strings in one gsub() or chartr() statement in R?" it is explained how to replace multiple one-character strings in one statement with gsubfn(). E.g.:
x <- "doremi g-k"
gsubfn(".", list("-" = "_", " " = ""), x)
# "doremig_k"
I would however like to replace the string 'doremi' in the example with ''. This does not work:
x <- "doremi g-k"
gsubfn(".", list("-" = "_", "doremi" = ""), x)
# "doremi g_k"
I guess this is because the string 'doremi' contains multiple characters while I am using the metacharacter . in gsubfn. I have no idea what to replace it with; I must confess I sometimes find metacharacters a bit difficult to understand. Thus, is there a way for me to replace '-' and 'doremi' at once?
You might be able to just use base R sub here:
x <- "doremi g-k"
result <- sub("doremi\\s+([^-]+)-([^-]+)", "\\1_\\2", x)
result
[1] "g_k"
Does this work for you?
gsubfn::gsubfn(pattern = "doremi|-", list("-" = "_", "doremi" = ""), x)
[1] " g_k"
The key is this search: "doremi|-" which tells to search for either "doremi" or "-". Use "|" as the or operator.
Just a more generic version of @RLave's solution:
toreplace <- list("-" = "_", "doremi" = "")
gsubfn(paste(names(toreplace),collapse="|"), toreplace, x)
[1] " g_k"
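If gsubfn is not available, roughly the same effect can be had in base R by applying the replacements one after another. This is a sketch, not the approach from the answers above; it treats every name in toreplace as a literal string (fixed = TRUE), and later replacements see the result of earlier ones.

```r
x <- "doremi g-k"
toreplace <- c("-" = "_", "doremi" = "")

result <- x
for (from in names(toreplace)) {
  # fixed = TRUE: treat `from` as plain text, not as a regular expression
  result <- gsub(from, toreplace[[from]], result, fixed = TRUE)
}
result
# [1] " g_k"
```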
My question is how to replace a character at a certain position. For example:
str <- c("abcdccc","hijklccc","abcuioccc")
#I want to replace character "c" which is in position 3 to "X" how can I do that?
#I know the function gsub and substr, but the only idea I have got so far is
#use if() to make it. How can I do it quickly?
#ideal result
>str
"abXdccc" "hijklccc" "abXuioccc"
It's a bit awkward, but you can replace a single character dependent on that single character's value like:
ifelse(substr(str,3,3)=="c", `substr<-`(str,3,3,"X"), str)
#[1] "abXdccc" "hijklccc" "abXuioccc"
If you are happy to overwrite the value, you could do it a bit cleaner:
substr(str[substr(str,3,3)=="c"],3,3) <- "X"
str
#[1] "abXdccc" "hijklccc" "abXuioccc"
I wonder if you can use a regex lookahead here to get what you are after.
str <- c("abcdccc","hijklccc","abcuioccc")
sub("^(.{2})(?=c).(.*)$", "\\1X\\2", str, perl = TRUE) # the lookahead checks for "c", the following "." consumes it
Or using a positive lookbehind as suggested by thelatemail
sub("(?<=^.{2})c", "X", str, perl = TRUE)
What this is doing is looking to match the letter c which is after any two characters from the start of the string. The c is replaced with X.
(?<= is the start of positive lookbehind
^.{2} means any two characters from the start of the string
)c is the last part which says it has to be a c after the two characters
[1] "abXdccc" "hijklccc" "abXuioccc"
If you want to read up more about regex being used (link)
Additionally a generalised function:
switch_letter <- function(x, letter, position, replacement) {
stopifnot(position > 1)
pattern <- paste0("(?<=^.{", position - 1, "})", letter)
sub(pattern, replacement, x, perl = TRUE)
}
switch_letter(str, "c", 3, "X")
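One caveat with building the pattern by pasting: the letter is interpreted as a regular expression, so a value such as "." would match any character. A possible variant, sketched here under the hypothetical name switch_literal, quotes the letter with PCRE's \\Q...\\E so it is matched literally (this assumes the letter itself does not contain "\\E").

```r
# Hypothetical variant of switch_letter that matches `literal` as plain text
switch_literal <- function(x, literal, position, replacement) {
  stopifnot(position > 1)
  # \\Q...\\E makes PCRE treat everything in between literally
  pattern <- paste0("(?<=^.{", position - 1, "})\\Q", literal, "\\E")
  sub(pattern, replacement, x, perl = TRUE)
}

switched <- switch_literal(c("ab.d", "abcd"), ".", 3, "X")
switched
# [1] "abXd" "abcd"
```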
This should work too:
str <- c("abcdefg","hijklnm","abcuiowre")
a <- strsplit(str[1], "")[[1]]
a[3] <- "X"
a <- paste(a, collapse = '')
str[1] <- a
How about this idea:
c2Xon3 <- function(x) {
  # keep characters 1-2, turn a "c" at position 3 into "X", keep the rest
  sprintf("%s%s%s", substring(x, 1, 2), sub("c", "X", substring(x, 3, 3)), substring(x, 4))
}
str <- c("abcdccc","hijklccc","abcuioccc")
strNew <- sapply(str,c2Xon3 )
This should work
str <- c("abcdefg","hijklnm","abcuiowre")
for (i in 1:length(str))
{
if (substr(str[i],3,3)=='c') {
substr(str[i], 3, 3) <- "X"
}
}
You can just use ifelse with substring and paste0, i.e.
ifelse(substr(str, 3, 3) == 'c', paste0(substring(str, 1, 2),'X', substring(str, 4)), str)
#[1] "abXdccc" "hijklccc" "abXuioccc"
Does someone know how to find the n-th occurrence of a string within an expression and how to replace it using a regular expression?
For example, I have the following string
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
and I want to replace the 5th occurrence of '-' by '|'
and the 7th occurrence of '-' by "||" like
[1] aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa
How do I do this?
Thanks,
Florian
(1) sub It can be done in a single regular expression with sub:
> sub("(^(.*?-){4}.*?)-(.*?-.*?)-", "\\1|\\3||", txt, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(2) sub twice or this variation which calls sub twice:
> txt2 <- sub("(^(.*?-){6}.*?)-", "\\1||", txt, perl = TRUE)
> sub("(^(.*?-){4}.*?)-", "\\1|", txt2, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(3) sub.fun or this variation which creates a function sub.fun which does one substitution. It makes use of fn$ from the gsubfn package to substitute n-1, pat, and value into the sub arguments. First define the indicated function and then call it twice.
library(gsubfn)
sub.fun <- function(x, pat, n, value) {
fn$sub( "(^(.*?-){`n-1`}.*?)$pat", "\\1$value", x, perl = TRUE)
}
> sub.fun(sub.fun(txt, "-", 7, "||"), "-", 5, "|")
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(We could have modified the arguments to sub in the body of sub.fun using paste or sprintf to give a base R solution but at the expense of some additional verbosity.)
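A base R version along those lines might look like the sketch below. The name sub_nth is hypothetical, and pat is assumed to be a regular expression that is safe to paste into the pattern as-is.

```r
# Hypothetical base R counterpart of sub.fun, built with sprintf
sub_nth <- function(x, pat, n, value) {
  stopifnot(n >= 1)
  # skip n-1 occurrences of pat lazily, then match the n-th one
  regex <- sprintf("^((?:.*?%s){%d}.*?)%s", pat, n - 1, pat)
  sub(regex, paste0("\\1", value), x, perl = TRUE)
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
# replace the 7th occurrence first so the 5th keeps its position
replaced <- sub_nth(sub_nth(txt, "-", 7, "||"), "-", 5, "|")
replaced
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
```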
This can be reformulated as a replacement function giving this pleasing sequence:
"sub.fun<-" <- sub.fun
tt <- txt # make a copy so that we preserve the input txt
sub.fun(tt, "-", 7) <- "||"
sub.fun(tt, "-", 5) <- "|"
> tt
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
(4) gsubfn Using gsubfn from the gsubfn package we can use a particularly simple regular expression (it's just "-") and the code has quite a straightforward structure. We perform the substitution via a proto method. The proto object containing the method is passed in place of a replacement string. The simplicity of this approach derives from the fact that gsubfn automatically makes a count variable available to such methods:
library(gsubfn) # gsubfn also pulls in proto
p <- proto(fun = function(this, x) {
if (count == 5) return("|")
if (count == 7) return("||")
x
})
> gsubfn("-", p, txt)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
UPDATE: Some corrections.
UPDATE 2: Added a replacement function approach to (3).
UPDATE 3: Added pat argument to sub.fun.
An alternative possibility is using Hadley's stringr package which builds the basis for the function I wrote:
require(stringr)
replace.nth <- function(string, pattern, replacement, n) {
locations <- str_locate_all(string, pattern)
str_sub(string, locations[[1]][n, 1], locations[[1]][n, 2]) <- replacement
string
}
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
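# replace the 7th occurrence first: replacing the 5th first would shift the count of the remaining dashes
txt.new <- replace.nth(txt, "-", "||", 7)
txt.new <- replace.nth(txt.new, "-", "|", 5)
txt.new
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"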
One way to do this is to use gregexpr to find the positions of the -:
posns <- gregexpr("-",txt)[[1]]
And then pasting together the relevant pieces and separators:
paste0(substr(txt,1,posns[5]-1),"|",substr(txt,posns[5]+1,posns[7]-1),"||",substr(txt,posns[7]+1,nchar(txt)))
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
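The same gregexpr positions can also be fed to the regmatches<- replacement function, which avoids the manual substr bookkeeping. The following is a sketch for a single input string; the helper name replace_occurrences and its named-vector interface are assumptions made for illustration.

```r
# Hypothetical helper: `replacements` is a character vector whose names are
# the occurrence numbers to replace, e.g. c("5" = "|", "7" = "||")
replace_occurrences <- function(x, pattern, replacements) {
  m <- gregexpr(pattern, x, fixed = TRUE)   # positions of every occurrence
  pieces <- regmatches(x, m)[[1]]           # the matched substrings
  idx <- as.integer(names(replacements))
  pieces[idx] <- unname(replacements)       # swap in the chosen occurrences
  regmatches(x, m) <- list(pieces)          # write them back into the string
  x
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
out <- replace_occurrences(txt, "-", c("5" = "|", "7" = "||"))
out
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
```

Because all occurrences are located before anything is replaced, the occurrence numbers always refer to the original string.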
I have a line like this:
x<-c("System configuration: lcpu=96 mem=196608MB ent=16.00")
I need to get the value of ent and store it in a val object in R.
I am doing this, but it does not seem to be working. Any ideas?
val<-x[grep("[0-9]$", x)]
use sub:
val <- sub('^.* ent=([[:digit:].]+).*$', '\\1', x) # capture digits and the decimal point, drop everything else
If ent is always at the end then:
sub(".*ent=", "", x)
If not try strapplyc in the gsubfn package which returns only the portion of the regular expression within parentheses:
library(gsubfn)
strapplyc(x, "ent=([.0-9]+)", simplify = TRUE)
Also it could be converted to numeric at the same time using strapply :
strapply(x, "ent=([.0-9]+)", as.numeric, simplify = TRUE)
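Without extra packages, a similar extraction can be sketched in base R with regexpr and regmatches. The lookbehind (?<=ent=) is one possible way to anchor the match; the second part, which collects every key=number pair into a named vector, goes beyond the question and is included only as an illustration.

```r
x <- "System configuration: lcpu=96 mem=196608MB ent=16.00"

# Match the digits and dots that directly follow "ent="
m <- regexpr("(?<=ent=)[0-9.]+", x, perl = TRUE)
val <- as.numeric(regmatches(x, m))
val
# [1] 16

# Illustration: collect every key=number pair into a named numeric vector
pairs <- regmatches(x, gregexpr("[a-z]+=[0-9.]+", x))[[1]]
kv <- do.call(rbind, strsplit(pairs, "=", fixed = TRUE))
settings <- setNames(as.numeric(kv[, 2]), kv[, 1])
```

Note that the unit suffix in mem=196608MB is dropped by the second pattern, since only digits and dots are captured.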
Using rex may make this type of task a little simpler.
Note this solution correctly includes . in the capture, as does G. Grothendieck's answer.
x <- c("System configuration: lcpu=96 mem=196608MB ent=16.00")
library(rex)
val <- as.numeric(
re_matches(x,
rex("ent=",
capture(name = "ent", some_of(digit, "."))
)
)$ent
)
#>[1] 16