I'm trying to find the right regex for grepl to check whether a string contains only digits [0-9] and the special character "-".
For example:
str1="00-25" #TRUE
str2="0a-2" #FALSE
I have tried
grepl("[^[:digit:]|-]",str2)
#[1] TRUE
thoughts?
You want to check whether the string contains only digits and "-".
To define the set of allowed characters, use a character class, written with "[]":
[0-9-]
Next, you want every character of the string to belong to that class; in other words, the match must start at the beginning of the string (^) and finish at its end ($):
^[0-9-]$
That pattern only matches a string of exactly one character. Since the variable holds one or more characters, add the "+" quantifier:
grepl("^[0-9-]+$",str)
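As a quick check, the sketch below runs the anchored pattern on the two strings from the question. It also shows why the original attempt returned TRUE: an unanchored negated class only asks whether at least one character falls outside the set. The helper name only_digits_and_dash is made up for illustration.

```r
str1 <- "00-25"
str2 <- "0a-2"

# Hypothetical helper: TRUE only if every character is a digit or "-"
only_digits_and_dash <- function(x) grepl("^[0-9-]+$", x)

res_ok  <- only_digits_and_dash(str1)  # every character is in the class
res_bad <- only_digits_and_dash(str2)  # "a" is not in the class

# The original attempt asks a different question: "is there at least one
# character that is NOT a digit, '|' or '-'?" (inside [] the "|" is literal)
attempt <- grepl("[^[:digit:]|-]", c(str1, str2))

print(res_ok)
print(res_bad)
print(attempt)
```

So the attempt was effectively the logical opposite of the desired test, with a stray "|" allowed as well.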
Related
I have a very simple issue with replacing the second occurrence of a string with a new string.
Let's say we have this string:
string <- c("A12A32")
and we want to replace the second A with B. A12B32 is the expected output.
Following this relevant post:
How to replace second or more occurrences of a dot from a column name
I tried,
replace_second_A <- sub("(\\A)\\A","\\1B", string)
print(replace_second_A)
[1] "A12A32"
It seems the second A is not changed. Why?
Note that .*? matches the shortest string until the next A:
string <- "A12A32"
sub("(A.*?)A", "\\1B", string)
## [1] "A12B32"
First, there is no need to escape the letter A using backslashes. They are only required to escape special characters that have other meanings e.g. "." means "any character", "\\." means "period".
Second, your regular expression "(\\A)\\A" reads "match A followed by another A, keeping the first A for reuse." You don't have two consecutive "A", they are separated by digits.
So this works ("\\d+" means "match 1 or more digits"):
sub("(A\\d+)A","\\1B", "A12A32")
[1] "A12B32"
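To compare the patterns suggested above, here is a small sketch that runs them side by side. The second input, "A1xA32", is a made-up string with a non-digit between the two As, and the negated-class pattern "(A[^A]*)A" is an additional variant not taken from the answers.

```r
string <- "A12A32"
other  <- "A1xA32"  # made-up input: a non-digit sits between the two As
inputs <- c(string, other)

# Lazy match: shortest run of any characters up to the next A
lazy <- sub("(A.*?)A", "\\1B", inputs)

# Digits only between the two As
digits <- sub("(A\\d+)A", "\\1B", inputs)

# Variant (an assumption, not from the answers): anything except A in between
negcls <- sub("(A[^A]*)A", "\\1B", inputs)

print(lazy)
print(digits)
print(negcls)
```

The digit-based pattern leaves "A1xA32" untouched because "x" is not a digit, so it is only a good choice when the text between the two As is known to be numeric.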
I have column values in my data frame like my_name_is_khan, hello|this|is|it and so on. How do I use str_extract to do this? When I use str_extract, this is what I get. When I do not know exactly the character length before the first special char (- or |), what do I do?
str_extract("my-name-is-khan", pattern = "[a-z]{1,6}")
[1] "my"
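One possible reading of the question is "keep everything before the first special character, whatever its length". A minimal sketch under that assumption follows. It uses base R's sub() so it runs without stringr, and it treats "_" as a separator too, which is a guess based on the first example value.

```r
x <- c("my-name-is-khan", "hello|this|is|it", "my_name_is_khan")

# Delete everything from the first "-", "|" or "_" to the end of the string.
# A leading "-" inside [] is literal, so no escaping is needed.
first_part <- sub("[-|_].*$", "", x)

print(first_part)
```

Because the pattern stops at the first separator, no fixed length such as {1,6} has to be known in advance.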
I have a CSV file where numeric values are stored in a way like this:
+000000000000000000000001101.7100
The number above is 1101.71. This string is always the same length, so the number of zeroes before the actual number depends on the number's length.
How can I drop the + and all 0s before the actual number so I can then convert it to numeric easily?
If it is of fixed width, then substring will be a faster option
as.numeric(substring(str1, nchar(str1)-8))
#[1] 1101.71
But if we don't know how many 0s there will be at the beginning, another option is sub, where we match a + at the start (^) of the string followed by zero or more 0s (0*) and replace the match with an empty string (""):
as.numeric(sub("^\\+0*", "", str1))
#[1] 1101.71
Note that we escape the + because it is a metacharacter meaning "one or more".
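For reference, both approaches can be run on the value from the question as below. The second input, a value smaller than 1, is made up to show what happens when every integer digit is a zero.

```r
str1 <- "+000000000000000000000001101.7100"

# Fixed width: keep the last 9 characters ("1101.7100")
fixed_width <- as.numeric(substring(str1, nchar(str1) - 8))

# Variable width: strip the leading "+" and any run of zeros after it
stripped <- as.numeric(sub("^\\+0*", "", str1))

# Made-up value below 1: the result of sub() is ".5000", which as.numeric accepts
small <- as.numeric(sub("^\\+0*", "", "+000000000000000000000000000.5000"))

print(fixed_width)
print(stripped)
print(small)
```

Note that the substring version would silently cut digits off a number with more than four integer digits, so it is only safe when the position of the number really is fixed.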
I may be missing an important point, but my best attempt would be:
1) read the values as character
2) use substr to get rid of the first character, namely the plus sign
3) convert the column with as.numeric; this way we safely lose any leading zeroes (as.integer would also drop the decimal part)
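The three steps above can be sketched as follows. The inline CSV text and the column name value are invented for the example, and as.numeric is used for the conversion so that the fractional part is kept.

```r
# Made-up two-row CSV with a single column called "value"
csv_text <- "value\n+000000000000000000000001101.7100\n+000000000000000000000000042.5000"

# 1) read the values as character so nothing is converted prematurely
df <- read.csv(text = csv_text, colClasses = "character")

# 2) drop the first character (the plus sign)
df$value <- substr(df$value, 2, nchar(df$value))

# 3) convert to numeric; the leading zeros disappear in the conversion
df$value <- as.numeric(df$value)

print(df$value)
```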
What regular expression can retrieve (e.g. with sub()) the characters before the second period? Given a character vector like:
v <- c("m_s.E1.m_x.R1PE1", "m_xs.P1.m_s.R2E12")
I would like to have returned this:
[1] "m_s.E1" "m_xs.P1"
> sub( "(^[^.]+[.][^.]+)(.+$)", "\\1", v)
[1] "m_s.E1" "m_xs.P1"
Now to explain it: the first and third bracket pairs "[ ]" are negated character classes that match any character except a period, and the "+" after each lets them match one or more such characters. The [.] in between therefore matches only the first period, and the second period terminates the first group. Parenthesis pairs let you specify partial sections (groups) of the matched characters, and there are two here. The second group is any character (the period symbol) repeated one or more times up to the end of the string, $. The "\\1" specifies that only the first group is returned as the value.
The ^ operator means different things inside and outside the square-brackets. Outside it refers to the length-zero beginning of the string. Inside at the beginning of a character class specification, it is the negation operation.
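To make the two groups visible, the sketch below pulls them out with regexec() and regmatches() instead of replacing anything. It also contrasts the two meanings of ^ on a made-up string, "am".

```r
v <- c("m_s.E1.m_x.R1PE1", "m_xs.P1.m_s.R2E12")
pattern <- "(^[^.]+[.][^.]+)(.+$)"

# Each list element holds: full match, group 1, group 2
groups <- regmatches(v, regexec(pattern, v))
first  <- vapply(groups, function(g) g[2], character(1))
second <- vapply(groups, function(g) g[3], character(1))

print(first)
print(second)

# Outside brackets ^ anchors to the start; inside it negates the class
starts_with_m <- grepl("^m", "am")   # "am" does not start with "m"
has_non_m     <- grepl("[^m]", "am") # "a" is a character other than "m"
```

The second group starts at the second period, which is why discarding it leaves exactly the text before that period.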
This is a good use case for "character classes" which are described in the help page found by typing:
?regex
Not a regex, but the qdap package has beg2char (beginning of string to the nth character) to handle this:
library(qdap)
beg2char(v, ".", 2)
## [1] "m_s.E1" "m_xs.P1"
I want a function to return TRUE if a string contains only letters, and FALSE otherwise.
I had a hard time finding a solution for this problem using R even though there are many answer pages for other languages.
We can use grepl. We match letters [A-Za-z] from the start (^) to the end ($) of the string.
grepl('^[A-Za-z]+$', str1)
#[1] TRUE FALSE
data
str1 <- c('Azda', 'A123Zda')
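Since the question asks for a function, the pattern can be wrapped as in the sketch below. The name is_letters_only and the two extra test strings are made up for illustration. Only unaccented ASCII letters count as letters here, which is an assumption about what the asker needs.

```r
# Hypothetical wrapper: TRUE if the string consists solely of ASCII letters
is_letters_only <- function(x) grepl("^[A-Za-z]+$", x)

# Original data plus two made-up edge cases: an empty string and a space
str2 <- c("Azda", "A123Zda", "", "abc def")
result <- is_letters_only(str2)

print(result)
```

The "+" quantifier requires at least one letter, so an empty string yields FALSE; changing it to "*" would make the empty string pass.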