Erase comma and apostrophe in character R - r

I want to remove the comma and the apostrophe but the point of the following character. After that pass to numeric
I have this:
characterExample <- "234'564,900.99"
I want 234564900.99
I try the following but I can't:
result <- gsub("[:punct:].","", characterExample)

Another option is to explicitly remove the characters you want to remove:
gsub("[',]", "", characterExample)
#[1] "234564900.99"
``

An option is to not match the digits or the . by using ^ within the square bracket
gsub("[^0-9.]+","", characterExample)
#[1] "234564900.99"
Or another option is to make use of SKIP/FAIL for the ., while matching the rest of the punct
gsub("(\\.)(*SKIP)(*F)|[[:punct:]]+", "", characterExample, perl = TRUE)
#[1] "234564900.99"
NOTE: Both solutions make sure that it matches any punct characters other than the . and replace with blank ("")

It can also use the pipe symbol like this:
#Code
gsub(",|'","", characterExample)
Output:
gsub(",|'","", characterExample)
[1] "234564900.99"

Related

Extract all text after last occurrence of a special character

I have the string in R
BLCU142-09|Apodemia_mejicanus
and I would like to get the result
Apodemia_mejicanus
Using the stringr R package, I have tried
str_replace_all("BLCU142-09|Apodemia_mejicanus", "[[A-Z0-9|-]]", "")
# [1] "podemia_mejicanus"
which is almost what I need, except that the A is missing.
You can use
sub(".*\\|", "", x)
This will remove all text up to and including the last pipe char. See the regex demo. Details:
.* - any zero or more chars as many as possible
\| - a | char (| is a special regex metacharacter that is an alternation operator, so it must be escaped, and since string literals in R can contain string escape sequences, the | is escaped with a double backslash).
See the R demo online:
x <- c("BLCU142-09|Apodemia_mejicanus", "a|b|c|BLCU142-09|Apodemia_mejicanus")
sub(".*\\|", "", x)
## => [1] "Apodemia_mejicanus" "Apodemia_mejicanus"
We can match one or more characters that are not a | ([^|]+) from the start (^) of the string followed by | in str_remove to remove that substring
library(stringr)
str_remove(str1, "^[^|]+\\|")
#[1] "Apodemia_mejicanus"
If we use [A-Z] also to match it will match the upper case letter and replace with blank ("") as in the OP's str_replace_all
data
str1 <- "BLCU142-09|Apodemia_mejicanus"
You can always choose to _extract rather than _remove:
s <- "BLCU142-09|Apodemia_mejicanus"
stringr::str_extract(s,"[[:alpha:]_]+$")
## [1] "Apodemia_mejicanus"
Depending on how permissive you want to be, you could also use [[:alpha:]]+_[[:alpha:]]+ as your target.
I would keep it simple:
substring(my_string, regexpr("|", my_string, fixed = TRUE) + 1L)

Selecting a specific letters from a character after a specific symbol

Need to take out all the characters before "(" and combine them with ";"
stringr::word(pilist, 2, sep = '(\\s*|\\')
pilist = "pi1(tag1,tag2);pi2(tag3,tag4,tag5);"
I expect the output as
"pi1;pi2"
You can try this pattern
\([^)]+\)|;+$
Regex Demo
Note:- Use escape character as \\ or \ depending on your regex engine
If you are string has the exact same structure as shown, we can remove everything which comes between round brackets and the trailing ; using gsub
gsub("\\(.*?\\)|;$", "", pilist)
#[1] "pi1;pi2"
However, following your description it can also be done by extracting the words which we want instead of removing. Using str_extract_all
paste0(stringr::str_extract_all(pilist, "(\\w+)(?=\\(.*\\))")[[1]], collapse = ";")
#[1] "pi1;pi2"

capitalize the first letter of two words separated by underscore using stringr

I have a string like word_string. What I want is Word_String. If I use the function str_to_title from stringr, what I get is Word_string. It does not capitalize the second word.
Does anyone know any elegant way to achieve that with stringr? Thanks!
Here is a base R option using sub:
input <- "word_string"
output <- gsub("(?<=^|_)([a-z])", "\\U\\1", input, perl=TRUE)
output
[1] "Word_String"
The regex pattern used matches and captures any lowercase letter [a-z] which is preceded by either the start of the string (i.e. it's the first letter) or an underscore. Then, we replace with the uppercase version of that single letter. Note that the \U modifier to change to uppercase is a Perl extension, so we must use sub in Perl mode.
Can also use to_any_case from snakecase
library(snakecase)
to_any_case(str1, "title", sep_out = "_")
#[1] "Word_String"
data
str1 <- "word_string"
This is obviously overly complicating but another base possibility:
test <- "word_string"
paste0(unlist(lapply(strsplit(test, "_"),function(x)
paste0(toupper(substring(x,1,1)),
substring(x,2,nchar(x))))),collapse="_")
[1] "Word_String"
You could first use gsub to replace "_" by " " and apply the str_to_title function
Then use gsub again to change it back to your format
x <- str_to_title(gsub("_"," ","word_string"))
gsub(" ","_",x)

Remove part of a string until a character is found R

I have a regex problem or somewhat regex related problem...
I have strings that look like this:
"..........))))..)))))))"
"....))))))))...)).))))..))"
"......))))...)))...)))))"
I want to remove the initial dot sequence, so that I only get the string starting by the first occurence of ")" symbol. Say, the output would be somthing like:
"))))..)))))))"
"))))))))...)).))))..))"
"))))...)))...)))))"
I assume it would be somewhat similar to a lookahead regex but cannot figure out the correct one...
Any help?
Thanks
We match for 0 or more dots (\\.*) from the start (^) of the string and replace it with blank
sub("^\\.*", "", v1)
#[1] "))))..)))))))" "))))))))...)).))))..))" "))))...)))...)))))"
If it needs to start from ), then as above match 0 or more dots till the first ) and replace with the )
sub("^\\.*\\)", ")", v1)
#[1] "))))..)))))))" "))))))))...)).))))..))" "))))...)))...)))))"
data
v1 <- c("..........))))..)))))))", "....))))))))...)).))))..))", "......))))...)))...)))))")
You can simply remove dots from the beginning of the line (marked in the regex by ^) until you reach a non-dot character:
a <- "..........))))..)))))))"
b <- "....))))))))...)).))))..))"
c <- "......))))...)))...)))))"
sub("^\\.*", "", a) # "))))..)))))))"
sub("^\\.*", "", b) # "))))))))...)).))))..))"
sub("^\\.*", "", c) # "))))...)))...)))))"
The way your question is worded, the goal isn't to remove just . from the beginning, but any symbol until the first ) is encountered. So this answer is a more general solution.
stringr::str_extract("..........))))..)))))))","\\).*$")
Alternatively, if you want to stick with base R, you could use sub/gsub like this:
gsub("[^\\)]*(\\).*$)","\\1","..........))))..)))))))")
sub("[^\\)]*","","..........))))..)))))))")

How to take only that part of a string which occurs before a pattern of 2 dots?

I used a code of regular expressions which only took stuff before the 2nd occurrence of a dot. The following is the code:-
colnames(final1)[i] <- gsub("^([^.]*.[^.]*)..*$", "\\1", colnames(final)[i])
But now i realized i wanted to take the stuff before the first occurrence of a pattern of 2 dots.
I tried
gsub(",.*$", "", colnames(final)[i]) (changed the , to ..)
gsub("...*$", "", colnames(final)[i])
But it didn't work
The example to try on
KC1.Comdty...PX_LAST...USD......Comdty........
converted to
KC1.Comdty.
or
"LIT.US.Equity...PX_LAST...USD......Comdty........"
to
"LIT.US.Equity."
Can anyone suggest anything?
Thanks
We could use sub to match 2 or more dots followed by other characters and replace it with blank
sub("\\.{2,}.*", "", str1)
#[1] "KC1.Comdty" "LIT.US.Equity"
The . is a metacharacter implying any character. So, we need to escape (\\.) to get the literal meaning of the character
data
str1 <- c("KC1.Comdty...PX_LAST...USD......Comdty.......", "LIT.US.Equity...PX_LAST...USD......Comdty........")
Another solution with strsplit:
str1 <- c("KC1.Comdty...PX_LAST...USD......Comdty.......", "LIT.US.Equity...PX_LAST...USD......Comdty........")
sapply(strsplit(str1, "\\.{2}\\w"), "[", 1)
# [1] "KC1.Comdty." "LIT.US.Equity."
To also include the dot at the end with #akrun's answer, one can do:
sub("\\.{2}\\w.*", "", str1)
# [1] "KC1.Comdty." "LIT.US.Equity."

Resources