R regular expression to parse call option code [closed] - r

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a call option code in the form of:
.TSLA181012C100
I'd like to parse it to pull out the 18, 10 and 12. However, I'm not quite sure how to do that as the letters after the period can be of variable length and so can the numbers after the C.
Is there a regex way to find the "C" from the right and get the 6 digits to the left of that?

We can try using sub here for a base R option:
code <- ".TSLA181012C100"
num1 <- sub("^\\.[A-Z]+(\\d{2})\\d{4}C.*", "\\1", code)
num1
#[1] "18"
num2 <- sub("^\\.[A-Z]+\\d{2}(\\d{2})\\d{2}C.*", "\\1", code)
num2
#[1] "10"
num3 <- sub("^\\.[A-Z]+\\d{4}(\\d{2})C.*", "\\1", code)
num3
#[1] "12"

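This regex should work: (\d{2}){3}C or simply \d+C. Both match the digits immediately before the C. To pull out the three pairs separately, write each pair as its own group, (\d{2})(\d{2})(\d{2})C, because a repeated group such as (\d{2}){3} keeps only its last repetition. In an R string, double each backslash (\\d).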
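As a minimal sketch of doing the extraction in one pass, each two-digit pair can be given its own capture group and read back with base R's regexec and regmatches. The names m and parts are illustrative only, and the pattern assumes the six digits sit directly before the C:

```r
code <- ".TSLA181012C100"

# One capture group per two-digit pair, anchored on the "C" that follows them
m <- regmatches(code, regexec("([0-9]{2})([0-9]{2})([0-9]{2})C", code))[[1]]

# m[1] is the whole match ("181012C"); the remaining elements are the groups
parts <- m[-1]
parts
# [1] "18" "10" "12"
```

Keeping the three values in one character vector avoids running a separate sub call per field.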

Related

Remove everything that doesn't begin with pattern in R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a vector myvec. In each value, I want to delete every underscore-separated piece that doesn't begin with "NAC", along with the "_" itself.
myvec = c("NAC1001_09ADAA", "TI09AA_NAC02111", "NACT10099_099AD")
Result I want:
NAC1001, NAC02111, NACT10099
What do I need to do for this?
We can use str_extract
library(stringr)
str_extract(myvec, '(?<=\\b|_)NACT?\\d+')
#[1] "NAC1001" "NAC02111" "NACT10099"
Or with sub from base R
sub(".*(NACT?\\d+).*", "\\1", myvec)
Split on underscore "_", then keep the one that starts with "N":
sapply(strsplit(myvec, "_"), function(i) i[ startsWith(i, "N") ])
# [1] "NAC1001" "NAC02111" "NACT10099"
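A variation on the split approach, as a sketch only: testing for the full "NAC" prefix rather than just "N" is a little stricter. The names pieces and kept are illustrative. Flattening with unlist assumes every element contains exactly one piece starting with "NAC"; otherwise the result no longer lines up with myvec.

```r
myvec <- c("NAC1001_09ADAA", "TI09AA_NAC02111", "NACT10099_099AD")

# Split every element on the literal underscore and flatten into one vector
pieces <- unlist(strsplit(myvec, "_", fixed = TRUE))

# Keep only the pieces that begin with the full prefix "NAC"
kept <- pieces[startsWith(pieces, "NAC")]
kept
# [1] "NAC1001"   "NAC02111"  "NACT10099"
```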

How can I extract a substring in R? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have this string, for instance:
str1 = "UNCID_999277.TCGA-CV-7254-01A-11R-2016-07.111118_UNC11-SN627_0167_AD09WDACXX_TAGCTT.txt"
I would like to extract this substring, for instance:
TCGA-CV-7254
I tried something like this:
gsub(pattern = "(*.)(TCGA*)(.*)",
replacement = "\\2",
x = nameArq)
But it returns:
[1] "UNCID_999277TCGA"
Thanks for any help!
You almost had it. In the first parentheses, the period needs to come first (this means "repeat any character any number of times"). You also need some unique endpoint for the second part of your regex.
gsub(pattern = "(.*)(TCGA.*4)(.*)",
replacement = "\\2",
x = str1)
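Relying on a literal "4" as the endpoint only works for this particular string. A sketch of a less fragile alternative, assuming the wanted ID is always "TCGA" followed by two hyphen-separated alphanumeric fields (that format is an assumption, not something the question states), is to extract the match directly instead of replacing around it:

```r
str1 <- "UNCID_999277.TCGA-CV-7254-01A-11R-2016-07.111118_UNC11-SN627_0167_AD09WDACXX_TAGCTT.txt"

# regexpr() locates the first match; regmatches() returns the matched text.
# [[:alnum:]]+ stops at the next hyphen, so only two fields follow "TCGA".
id <- regmatches(str1, regexpr("TCGA-[[:alnum:]]+-[[:alnum:]]+", str1))
id
# [1] "TCGA-CV-7254"
```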

R - extract number from string(special solution) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a string like this: x <- "avd_1xx_2xx_3xx"
I need to extract the numbers from x and put them in new variables:
num1 <- 1xx
num2 <- 2xx
num3 <- 3xx
However, I can't predict the number of digits in each number;
for instance, x could be "avd_1_2_3" or "avd_11_21_33" or similar.
Could you give me some solutions?
Thanks
We can use str_extract_all from stringr, which extracts every match (str_extract returns only the first). It returns a list of length 1 (as we have a single element in 'x'). To extract the list element, we can use [[ i.e. [[1]].
library(stringr)
str_extract_all(x, "\\d+[a-z]*")[[1]]
#[1] "1xx" "2xx" "3xx"
A similar option using base R would be regmatches/gregexpr
regmatches(x, gregexpr("\\d+[a-z]*", x))[[1]]
#[1] "1xx" "2xx" "3xx"
The pattern we match is one or more digits (\\d+) followed by zero or more lower case letters ([a-z]*).
It is better to keep it as a vector rather than having multiple objects in the global environment.
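To illustrate keeping the results together, here is a small sketch that assumes the "xx" in the question stands for further digits, as in the asker's own examples. It collects the numbers for several inputs in one list and converts them to integers; the name nums is illustrative.

```r
x <- c("avd_1_2_3", "avd_11_21_33")

# gregexpr() finds every run of digits; regmatches() returns one character
# vector per input string, which is then converted to integers
nums <- lapply(regmatches(x, gregexpr("[0-9]+", x)), as.integer)

nums[[2]]
# [1] 11 21 33
```

Individual values are then available by index, for example nums[[2]][1], without creating num1, num2 and num3 as separate objects.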

R how to cast string to a vector [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Improve this question
I am still a beginner in R and I can't find an answer to my question:
I use a string:
string1="c('T-shirt', 'Polo', 'Pull')"
And I need my object string1 to be a vector.
You can evaluate the expression in the string using
eval(parse(text=string1))
result:
[1] "T-shirt" "Polo" "Pull"
I am not sure what you want the final output to be. If you want string1 to be a vector of strings, the right syntax should be
string1 <- c("T-shirt", "Polo", "Pull")
Please clarify if you want a different output
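You can do it either way:
eval(parse(text=string1))
or
# remove the leading "c(", the trailing ")" and the quotes, then split on commas
stripped <- gsub("^c\\(|\\)$|'", "", string1)
items <- strsplit(stripped, ", *")[[1]]
items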
This could be done with str_extract_all, without using eval(parse()).
library(stringr)
str_extract_all(string1, "(?<=')[[:alpha:]-]+")[[1]]
#[1] "T-shirt" "Polo" "Pull"
data
string1="c('T-shirt', 'Polo', 'Pull')"
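For readers without stringr, a base R sketch of the same idea is to match each single-quoted chunk and then strip the quotes. It assumes the items never contain a single quote themselves; the names quoted and items are illustrative.

```r
string1 <- "c('T-shirt', 'Polo', 'Pull')"

# Match every run that starts and ends with a single quote
quoted <- regmatches(string1, gregexpr("'[^']*'", string1))[[1]]

# Drop the surrounding quotes to leave the bare items
items <- gsub("'", "", quoted, fixed = TRUE)
items
# [1] "T-shirt" "Polo"    "Pull"
```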

Extracting number from a non-consistent string in R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have my data as below
Idle|Idle|Idle|Idle|Idle|Idle|Idle
Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle
Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle
I want to extract the numbers in between, which are a phone number in ASCII codes
(56|55|49|50|53|48|54|52 for the 2nd line and 49|51|48|50|50|49|50|57|56|57|56 for the 3rd line),
convert them to digits between 0 and 9, and concatenate them as a string/number in a new column called phone_number in the same data set.
The 2nd row of the new column should be 87125064 and the 3rd row should be 13022129898.
In ASCII format 48 represents 0 and 57 represents 9
Please help
Thanks,
Here's an approach with regular expressions:
res <- sapply(regmatches(x, gregexpr("^(?:Idle\\|)*\\K\\d+(?=\\|)|\\G(?!^)\\|\\K\\d+",
                                     x, perl = TRUE)),
              function(x) paste(as.integer(x) - 48, collapse = ""))
# [1] "" "87125064" "13022129898"
If you want to exclude the empty strings, you can use the following command:
res[as.logical(nchar(res))]
# [1] "87125064" "13022129898"
Here x is this vector:
x <- c("Idle|Idle|Idle|Idle|Idle|Idle|Idle",
"Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle",
"Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle")

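If the regular expression above is hard to follow, the same result can be sketched without it: split each line on "|", take the first unbroken run of numeric fields, and turn the ASCII codes into characters with intToUtf8. The helper name decode_first_run is hypothetical, and the sketch assumes the phone number is always the first numeric run on the line.

```r
x <- c("Idle|Idle|Idle|Idle|Idle|Idle|Idle",
       "Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle",
       "Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle")

# Hypothetical helper: decode the first run of numeric fields in one line
decode_first_run <- function(line) {
  fields <- strsplit(line, "|", fixed = TRUE)[[1]]
  is_num <- grepl("^[0-9]+$", fields)
  if (!any(is_num)) return("")          # nothing but "Idle"

  runs  <- rle(is_num)                  # consecutive TRUE/FALSE stretches
  first <- which(runs$values)[1]        # index of the first numeric stretch
  end   <- cumsum(runs$lengths)[first]
  start <- end - runs$lengths[first] + 1

  # ASCII codes 48..57 map to the characters "0".."9"
  intToUtf8(as.integer(fields[start:end]))
}

phone_number <- vapply(x, decode_first_run, character(1), USE.NAMES = FALSE)
phone_number
# [1] ""            "87125064"    "13022129898"
```

The later run 69|86|65 on the third line is ignored because only the first numeric stretch is decoded.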