I have a data frame, and i want to find especial characters so i use:
example$bb <- ifelse(grepl("*****", example$aa)==T, 1, 0)
But R says :
Error in grepl("*****", example$aa :
invalid regular expression, reason 'Invalid use of repetition operators'
Any suggestion?
How to do i write the symbol *****?
* is a meta character, use the escape meta character / to search for it
grepl('/*', '***')
[1] TRUE
Related
I have a simple character string:
y <- "Location 433900E 387200N, Lat 53.381 Lon -1.490, 131 metres amsl"
When I perform regex capture on it:
stringr::str_extract(r'Lat(.*?)\,', y)
I get this error:
>Error: malformed raw string literal at line 1
why?
With R's raw strings (introduced in version 4.0.0), you need to use either ( or [ or { with the quotes, e.g.,
r'{Lat(.*?)\,}'
This is documented at ?Quotes (and in the release notes):
Raw character constants are also available using a syntax similar to the one used in C++: r"(...)" with ... any character sequence, except that it must not contain the closing sequence )". The delimiter pairs [] and {} can also be used, and R can be used in place of r."
So I am trying to split my string based on all the punctuations and space wherever they occur in the string (hence the + sign) except for on "#" & "/" because I don't want it to split #n/a which it does. I did search a lot on this problem but can't get to the solution. Any suggestions?
t<-"[[:punct:][:space:]]+"
bh <- tolower(strsplit(as.character(a), t)[[1]])
I have also tried storing the following to t but it also gives error
t<-"[!"\$%&'()*+,\-.:;<=>?#\[\\\]^_`{|}~\\ ]+"
Error: unexpected input in "t<-"[!"\"
One alternate is to substitute #n/a but I want to know how to do it without having to do that.
You may use a PCRE regex with a lookahead that will restrict the bracket expression pattern:
t <- "(?:(?![#/])[[:punct:][:space:]])+"
bh <- tolower(strsplit(as.character(a), t, perl=TRUE)[[1]])
The (?:(?![#/])[[:punct:][:space:]])+ pattern matches 1 or more repetitions of any punctuation or whitespace that is not # and / chars.
See the regex demo.
If you want to spell out the symbols you want to match inside a bracket expression you may fix your other pattern like
t <- "[][!\"$%&'()*+,.:;<=>?#\\\\^_`{|}~ -]+"
Note that ] must be right after the opening [, [ inside the expression does not need to be escaped, - can be put unescaped at the end, a \ should be defined with 4 backslashes. $ does not have to be escaped.
Using RODBC package I pulled data from access to R.
One of the names in table in access is : "JED_credit\debit (local)"
When I'm trying to refer to the cell in R I get:
JED_credit\debit (local)
"Error: unexpected input in "JED_credit\"
I think it is not the recommended way of defining variables, but it is working by using backward ticks.
> `var` <- 'test'
> var
[1] "test"
> `var/bla` <- 'test'
> `var/bla`
[1] "test"
> `var()bla` <- 'test'
> `var()bla`
[1] "test"
> `var\bla` <- 'test'
> `var\bla`
[1] "test"
There are a few characters sequences in almost all languages which have a special meaning. For example \n stands for a newline character.
In your regular string, the parser treats \d in JED_credit\debit as a special character sequence rather than a part of the string. To make it a part of the string, you need to escape the back slash with another\ thus making your variable name as JED_credit\\debit.
Answer improvements are welcome.
Here is the string:
> raw.data[27834,1]
[1] "\xff$GPGGA"
I have tried advice from the following two questions, but with no luck:
How to escape a backslash in R?
How to escape backslashes in R string
Does anyone have a different solution from the above questions that might help? The ideal solution would be to remove the "\xff" portion, but for any combination of letters.
There is no backslash in that string. The displayed backslash is an escape marker. This and other features about entry and display of "special situations" are described in the ?Quotes help page.. You've been given one regex rather elliptical approach to removal. Here are a couple of other approaches .... only some of which actually succeed because the \ff is the first "character" and it's not really legal as an R character:
s <- "\xff$GPGGA"
strsplit(s, "")
#[[1]]
#[1] NA
Warning message:
In strsplit(s, "") : input string 1 is invalid in this locale
substr(s, 1,1)
#Error in substr(s, 1, 1) : invalid multibyte string at '<ff>$GP<47>GA'
gsub('.*([^A-Za-z].*)', '\\1',"\xff$GPGGA")#[1]
#[1] "$GPGGA"
?Quotes
gsub('\xff', '',"\xff$GPGGA")#[1]
#[1] "$GPGGA"
I think the reason that the regex functions don't choke on that string is that regex is actually a system mediated process whereas strsplit and substr are internal R functions.
#RichardScriven posts an example and when I tried to replicated it, I get yet a different example that shows the mapping to displayed characters is system specific. I'm on OSX 10.10.1 (Yosemite)>
cat('\xff')
ˇ
(I left off the octothorpe (#) that I would normally out in.)
Here is the string:
> raw.data[27834,1]
[1] "\xff$GPGGA"
I have tried advice from the following two questions, but with no luck:
How to escape a backslash in R?
How to escape backslashes in R string
Does anyone have a different solution from the above questions that might help? The ideal solution would be to remove the "\xff" portion, but for any combination of letters.
There is no backslash in that string. The displayed backslash is an escape marker. This and other features about entry and display of "special situations" are described in the ?Quotes help page.. You've been given one regex rather elliptical approach to removal. Here are a couple of other approaches .... only some of which actually succeed because the \ff is the first "character" and it's not really legal as an R character:
s <- "\xff$GPGGA"
strsplit(s, "")
#[[1]]
#[1] NA
Warning message:
In strsplit(s, "") : input string 1 is invalid in this locale
substr(s, 1,1)
#Error in substr(s, 1, 1) : invalid multibyte string at '<ff>$GP<47>GA'
gsub('.*([^A-Za-z].*)', '\\1',"\xff$GPGGA")#[1]
#[1] "$GPGGA"
?Quotes
gsub('\xff', '',"\xff$GPGGA")#[1]
#[1] "$GPGGA"
I think the reason that the regex functions don't choke on that string is that regex is actually a system mediated process whereas strsplit and substr are internal R functions.
#RichardScriven posts an example and when I tried to replicated it, I get yet a different example that shows the mapping to displayed characters is system specific. I'm on OSX 10.10.1 (Yosemite)>
cat('\xff')
ˇ
(I left off the octothorpe (#) that I would normally out in.)