How to extract substring in R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a string like the one below:
x <- "Supplier will initially respond to High Priority incidents. Supplier will subsequently update EY every 60 minutes or at an interval EY specifies. Reporting and Response times will be capture in ServiceNow which, save in respect of manifest error, will be conclusive proof of the time period taken."
I want to extract the two words after the word "every".
How can this be achieved in R?

We can use str_extract_all with a regex lookbehind ((?<=every\\s)) followed by a pattern that matches two words
library(stringr)
unlist(str_extract_all(x, "(?<=every\\s)(\\w+\\s+\\w+)"))
#[1] "60 minutes"
Or using base R
regmatches(x, gregexpr("(?<=every\\s)(\\w+\\s+\\w+)", x, perl = TRUE))[[1]]
#[1] "60 minutes"

Something like this in base R:
Split the string into words, find the index of the word "every", and then select the next two words after that index.
wordsplit <- unlist(strsplit(x, " ", fixed = TRUE))
indx <- grep("\\bevery\\b", wordsplit)
wordsplit[(indx+1):(indx +2)]
#[1] "60" "minutes"
Or, as @DavidArenburg suggested, we can also use match instead of grep
wordsplit[match("every", wordsplit) + 1:2]
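The split-and-index idea above can be wrapped in a small helper. This is only a sketch: the function name words_after and its n argument are hypothetical, and it assumes the keyword appears as a whole whitespace-separated token (so "every," with trailing punctuation would not match).

```r
# Hypothetical helper: return up to n words following the first
# occurrence of `keyword` in `text`.
words_after <- function(text, keyword, n = 2) {
  words <- strsplit(text, "\\s+")[[1]]    # split on runs of whitespace
  i <- match(keyword, words)              # index of first exact match, or NA
  if (is.na(i)) return(character(0))      # keyword not present
  idx <- i + seq_len(n)                   # positions of the next n words
  words[idx[idx <= length(words)]]        # drop positions past the end
}

words_after("Supplier will subsequently update EY every 60 minutes or at an interval EY specifies.", "every")
#[1] "60" "minutes"
```

Returning an empty character vector when the keyword is missing avoids the NA indices that wordsplit[NA + 1:2] would otherwise produce.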


R regular expression to parse call option code [closed]

I have a call option code in the form of:
.TSLA181012C100
I'd like to parse it to pull out the 18, 10 and 12. However, I'm not quite sure how to do that as the letters after the period can be of variable length and so can the numbers after the C.
Is there a regex way to find the "C" from the right and get the 6 digits to the left of that?
We can try using sub here for a base R option:
code <- ".TSLA181012C100"
num1 <- sub("^\\.[A-Z]+(\\d{2})\\d{4}C.*", "\\1", code)
num1
num2 <- sub("^\\.[A-Z]+\\d{2}(\\d{2})\\d{2}C.*", "\\1", code)
num2
num3 <- sub("^\\.[A-Z]+\\d{4}(\\d{2})C.*", "\\1", code)
num3
[1] "18"
[1] "10"
[1] "12"
This regex should work: (\d{2}){3}C or simply \d+C
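Note that a repeated group such as (\d{2}){3} keeps only its last repetition, so it cannot return the three pairs separately. If all three are wanted in one step, one possible sketch uses three separate capture groups with base R's regexec and regmatches. The trailing \\d+$ is an assumption that the code always ends with "C" followed by digits, which anchors the match to the "C" nearest the right end.

```r
code <- ".TSLA181012C100"

# Three two-digit groups immediately before the final "C<digits>".
m <- regmatches(code, regexec("(\\d{2})(\\d{2})(\\d{2})C\\d+$", code))[[1]]

# m[1] is the full match; the capture groups follow it.
parts <- m[-1]
parts
#[1] "18" "10" "12"
```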

In R - how do I replace all letters in a string with other letters? [closed]

I need to anonymize names but in a very specific way so that the format of the entire string is still the same (spaces, hyphens, periods are preserved) but all the letters are scrambled. I want to consistently replace say all A's with C's, all D's with Z's, and so on. How would I do that?
We can use chartr
chartr('AD', 'CZ', str1)
#[1] "CZ,ZC. C"
data
str1 <- c('AD,DA. C')
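To scramble every letter rather than just two, the same chartr call can be given a full substitution alphabet. The following is a sketch only: the seed, the random permutation, and the function name anonymize are illustrative assumptions, and a random permutation may happen to map some letter to itself.

```r
set.seed(42)                      # assumption: fixed seed so the mapping is reproducible
shuffled <- sample(LETTERS)       # random permutation of A-Z

# Build "old" and "new" alphabets covering upper and lower case.
old <- paste(c(LETTERS, letters), collapse = "")
new <- paste(c(shuffled, tolower(shuffled)), collapse = "")

# Hypothetical wrapper: letters are replaced consistently,
# while spaces, hyphens, and periods pass through unchanged.
anonymize <- function(s) chartr(old, new, s)

anonymize("Anne-Marie O. Smith")
```

Because chartr translates all characters in a single pass, mappings do not chain: a rule A to C followed by C to Q leaves an original "A" as "C", whereas sequential gsub calls would turn it into "Q".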
Maybe use gsub?
string <- "ABCDEFG"
string <- gsub('A', 'C', string)
string <- gsub('D', 'Z', string)
string
[1] "CBCZEFG"

R - extract number from string(special solution) [closed]

I have a string like this: x <- "avd_1xx_2xx_3xx"
I need to extract the numbers from x and put them in new variables:
num1 <- 1xx
num2 <- 2xx
num3 <- 3xx
However, I can't predict the number of digits in each number.
For instance, x could be "avd_1_2_3" or "avd_11_21_33" or the like.
Could you suggest a solution?
Thanks
We can use str_extract from stringr. To extract multiple matches we use str_extract_all, which returns a list of length 1 (as we have a single element in 'x'). To extract the list element, we can use [[ i.e. [[1]].
library(stringr)
str_extract_all(x, "\\d+[a-z]*")[[1]]
#[1] "1xx" "2xx" "3xx"
A similar option using base R would be regmatches/gregexpr
regmatches(x, gregexpr("\\d+[a-z]*", x))[[1]]
#[1] "1xx" "2xx" "3xx"
The pattern we match is one or more digits (\\d+) followed by zero or more lowercase letters ([a-z]*).
It is better to keep it as a vector rather than having multiple objects in the global environment.
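As a quick check that the same pattern copes with a varying number of digits, here is a small base R sketch. The wrapper name extract_numbers is hypothetical; the inputs are the two alternative strings mentioned in the question.

```r
# Hypothetical wrapper around the regmatches/gregexpr call shown above.
extract_numbers <- function(s) {
  regmatches(s, gregexpr("\\d+[a-z]*", s))[[1]]
}

extract_numbers("avd_1_2_3")
#[1] "1" "2" "3"
extract_numbers("avd_11_21_33")
#[1] "11" "21" "33"

# Kept as one vector; convert to integer when the pieces are purely numeric.
nums <- as.integer(extract_numbers("avd_11_21_33"))
nums[2]
#[1] 21
```

Individual values are then reached by index (nums[1], nums[2], ...) instead of through separately named variables.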

How can get some specific part from a column? [closed]

I have a CSV file containing IPC data.
It looks like this:
year  date  applicant  ...  ipc number
1978  1/1   noel            A43B 13/20
1979  2/2   liam            B06C 14/20
1980  3/3   chris           D01E 01/30
...
For example,
I need 'A43B', 'B06C', 'D01E' from the ipc number column,
but not 'A43B 13/20', 'B06C 14/20', 'D01E 01/30'.
Could you please let me know how to deal with this?
Ad hoc I think there are two possibilities:
1. As Pascal wrote go for strsplit and sapply
sapply(
strsplit(c("A43B 13/20", "B06C 14/20", "D01E 01/30"),split = " "),
"[[", 1)
2. Use regular expressions and the function gsub
gsub(pattern = " [0-9]{2}\\/[0-9]{2}", replacement = "", c("A43B 13/20", "B06C 14/20","D01E 01/30"))
We can use sub. We match one or more whitespace characters (\\s+) followed by zero or more characters (.*) up to the end ($) of the string and replace the match with ''.
sub('\\s+.*$', '', str1)
#[1] "A43B" "B07C" "D01E"
data
str1 <- c('A43B 13/20', 'B07C 14/20', 'D01E 01/30')
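Since the question concerns a column read from a CSV file, the same sub call can be applied to a data frame column. This is a sketch under assumptions: the data frame ipc and the column names ipc_number and ipc_class are hypothetical stand-ins for whatever read.csv produces from the real file.

```r
# Hypothetical stand-in for the data read from the CSV file.
ipc <- data.frame(
  year = c(1978, 1979, 1980),
  applicant = c("noel", "liam", "chris"),
  ipc_number = c("A43B 13/20", "B06C 14/20", "D01E 01/30"),
  stringsAsFactors = FALSE
)

# Drop everything from the first run of whitespace onward.
ipc$ipc_class <- sub("\\s+.*$", "", ipc$ipc_number)
ipc$ipc_class
#[1] "A43B" "B06C" "D01E"
```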

Extracting number from a non-consistent string in R [closed]

I have my data as below
Idle|Idle|Idle|Idle|Idle|Idle|Idle
Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle
Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle
I want to extract the numbers in between (a phone number encoded as ASCII codes), which are
(56|55|49|50|53|48|54|52 for the 2nd line and 49|51|48|50|50|49|50|57|56|57|56 for the 3rd line),
convert each code to a digit between 0 and 9, and concatenate the digits as a string/number in a new column named phone_number in the same data set.
The 2nd row of the new column should be 87125064 and the 3rd row should be 13022129898.
In ASCII, 48 represents 0 and 57 represents 9.
Please help
Thanks,
Here's an approach with regular expressions:
res <- sapply(regmatches(x, gregexpr("^(?:Idle\\|)*\\K\\d+(?=\\|)|\\G(?!^)\\|\\K\\d+",
x, perl = TRUE)),
function(x) paste(as.integer(x) - 48, collapse = ""))
# [1] "" "87125064" "13022129898"
If you want to exclude the empty strings, you can use the following command:
res[as.logical(nchar(res))]
# [1] "87125064" "13022129898"
Here x is this vector:
x <- c("Idle|Idle|Idle|Idle|Idle|Idle|Idle",
"Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle",
"Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle")
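For readers who prefer to avoid the \K and \G constructs, a split-based alternative is sketched below. It is not the approach used above: the helper name decode_first_run is hypothetical, and it assumes the phone number is always the first uninterrupted run of numeric tokens in a line, returning NA when a line has none.

```r
x <- c("Idle|Idle|Idle|Idle|Idle|Idle|Idle",
       "Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle",
       "Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle")

# Hypothetical helper: decode the first run of numeric tokens in one line.
decode_first_run <- function(line) {
  tokens <- strsplit(line, "|", fixed = TRUE)[[1]]
  is_num <- grepl("^\\d+$", tokens)
  if (!any(is_num)) return(NA_character_)      # no numeric tokens at all
  start <- which(is_num)[1]                    # first numeric token
  # First non-numeric token after the run starts marks where the run ends.
  after <- which(!is_num & seq_along(tokens) > start)
  end <- if (length(after)) after[1] - 1 else length(tokens)
  # ASCII codes 48..57 correspond to the digits 0..9.
  paste(as.integer(tokens[start:end]) - 48L, collapse = "")
}

phone_number <- vapply(x, decode_first_run, character(1), USE.NAMES = FALSE)
phone_number
#[1] NA            "87125064"    "13022129898"
```

The later run 69|86|65 in the third line is ignored because only the first run is decoded.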
