R - extract number from string(special solution) [closed] - r

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a string like this: x <- "avd_1xx_2xx_3xx"
I need to extract the numbers from the string x and put them in new variables:
num1 <- 1xx
num2 <- 2xx
num3 <- 3xx
However, I can't predict the number of digits in each number.
For instance, x could be "avd_1_2_3" or "avd_11_21_33" or similar.
Could you give me some solutions?
Thanks

We can use str_extract from stringr. To extract multiple matches we use str_extract_all, which returns a list of length 1 (as we have a single element in 'x'). To extract the list element, we can use [[ i.e. [[1]].
library(stringr)
str_extract_all(x, "\\d+[a-z]*")[[1]]
#[1] "1xx" "2xx" "3xx"
A similar option using base R would be regmatches/gregexpr
regmatches(x, gregexpr("\\d+[a-z]*", x))[[1]]
#[1] "1xx" "2xx" "3xx"
The pattern matches one or more digits (\\d+) followed by zero or more lowercase letters ([a-z]*).
It is better to keep it as a vector rather than having multiple objects in the global environment.
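As a minimal sketch of why the vector form is convenient: individual pieces can be picked out by position, and the digits alone can be converted to numbers. The names parts, y and nums are illustrative and not part of the original answer.

```r
x <- "avd_1xx_2xx_3xx"

# All matches stay together in one character vector
parts <- regmatches(x, gregexpr("[0-9]+[a-z]*", x))[[1]]
parts[2]
#[1] "2xx"

# Digits only, converted to numeric; the number of digits does not matter
y <- "avd_11_21_33"
nums <- as.numeric(regmatches(y, gregexpr("[0-9]+", y))[[1]])
nums
#[1] 11 21 33
```

Indexing with parts[1], parts[2], ... then plays the role of num1, num2, ... without creating separate objects.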

Related

Compare a character vector by regex with another character vector [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have two character vectors and want to compare them, keeping only the elements that share the same character pattern, here the country code.
a<-c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b<-c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv", "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")
#wished result:
result_a<-c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv")
result_b<-c("nutr_needs_AFG_pop.csv", "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")
I thought about extracting the codes first and then comparing the strings:
a_ISO<-str_sub(a, start=10, end = -5) #subset just ISO name
b_ISO<-str_sub(b, start =12, end = -9 ) #subset just ISO name
dif1<-setdiff(a, b) # get difference (order is important)
dif2<-setdiff(b,a) # get difference
dif<-c(dif1,dif2) # selection which to remove
But from here I don't know how to compare a and b with dif. So basically: how do I compare a character vector by regex with another character vector?
I think you should extract the codes with a more general regex approach rather than by position. It is also easier to subset the elements you want to keep with intersect() than to determine the ones to drop with setdiff():
Extract the three-character code with a regex:
index_a<-stringr::str_extract(a, "[A-Z]{3}")
index_b<-stringr::str_extract(b, "[A-Z]{3}")
Then subset the vectors with intersect() and base indexing:
intersect_ab<-intersect(index_a, index_b)
result_a<-a[index_a %in% intersect_ab]
result_b<-b[index_b %in% intersect_ab]
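That said, your approach also works once setdiff() is applied to the extracted codes rather than to the full file names, with one additional final step:
result_a<-a[!a_ISO %in% setdiff(a_ISO, b_ISO)]
result_b<-b[!b_ISO %in% setdiff(b_ISO, a_ISO)]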
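For reference, the regex-plus-intersect() idea can also be sketched in base R only. This assumes every file name contains a run of three capital letters; regmatches() silently drops elements without a match, which would misalign the vectors. The names code_a, code_b and common are illustrative.

```r
a <- c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b <- c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv",
       "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")

# First run of three capital letters in each file name
code_a <- regmatches(a, regexpr("[A-Z]{3}", a))
code_b <- regmatches(b, regexpr("[A-Z]{3}", b))

# Codes present in both vectors
common <- intersect(code_a, code_b)

# Keep only the file names whose code is shared
result_a <- a[code_a %in% common]
result_b <- b[code_b %in% common]
result_a
#[1] "nutr_sup_AFG.csv" "nutr_sup_ARE.csv" "nutr_sup_ARG.csv"
```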

remove quotes from string on only numeric values in r [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Given the string, how can I get the desired outcome?
('9','','','','','','','','31.23','testing7'),('10','','','','','','','','31.23','testing10')
Desired Output
(9,'','','','','','','',31.23,'testing7'),(10,'','','','','','','',31.23,'testing10')
Try the following regex:
s <- "('9','','','','','','','','31.23','testing7'),('10','','','','','','','','31.23','testing10')"
gsub("'(-?\\d+(?:[\\.,]\\d+)?)'", x = s, replacement = "\\1")
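Regex explanation:
' matches a literal single quote character
() is a capture group
(?:) is a non-capturing group
-? matches an optional minus sign (? means zero or one time)
\\d+ matches a digit one or more times
[\\.,] matches a decimal point or a comma
\\1 in the replacement refers to capture group 1
This allows for negative numbers and decimals.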
Output
"(9,'','','','','','','',31.23,'testing7'),(10,'','','','','','','',31.23,'testing10')"
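A small check of the claim about negative numbers and decimals, as a sketch on made-up input. The string s2 is hypothetical, and perl = TRUE is added here only to make the handling of the non-capturing group explicit; the original call does not use it.

```r
# Hypothetical input: a negative number, plain text, mixed text, a decimal
s2 <- "('-5','abc','12a','0.5')"

# Quotes are removed only around values that are entirely numeric
out <- gsub("'(-?\\d+(?:[\\.,]\\d+)?)'", "\\1", s2, perl = TRUE)
out
#[1] "(-5,'abc','12a',0.5)"
```

Values such as '12a' keep their quotes because the pattern requires the closing quote directly after the digits.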

How to split a list in R? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have an R list where all of the values are in the first position (i.e. list[1]), while I want the values to be spread evenly throughout the list (list[1] contains one value, list[2] contains the next, etc.). I have been trying unsuccessfully for a while to split the values in that one position into separate values (each value is a string of characters separated by spaces) but nothing has worked.
Below is an illustration of the sort of situation I am in.
Say "test" is the name of a list in R. Test is an object of length 1, and if you enter test[1] in the console, the output is thousands of values formatted like so:
[1] "value1" "value2" "value3" ... etc.
Now I want to somehow split the contents of list[1] so that each separated character string is in a separate position, so test[1] is "value1", test[2] is "value2", etc. I have looked around for and attempted many purported solutions to this sort of issue (recent example here: List to integer or double in R) but nothing has worked for me so far.
Here's a simple way:
l1 <- list(l1 = round(rnorm(100, 0, 5), 0))
v <- unlist(l1)
l2 <- as.list(v)
The length of l1 is 1 and the length of l2 is 100. Is this what you are after?
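The same idea applied to character values like those in the question, as a minimal sketch. The objects test, test2, one_string and test3 are illustrative. The second part covers the other possible reading of the question, where a single string holds all values separated by spaces; that reading is an assumption.

```r
# Case 1: one list element holding a character vector
test <- list(c("value1", "value2", "value3"))
length(test)
#[1] 1
test2 <- as.list(unlist(test))
length(test2)
#[1] 3
test2[[2]]
#[1] "value2"

# Case 2 (assumed): one string with values separated by spaces
one_string <- list("value1 value2 value3")
test3 <- as.list(strsplit(one_string[[1]], " ", fixed = TRUE)[[1]])
test3[[3]]
#[1] "value3"
```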

How to extract substring in R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a string like the below
x <- "Supplier will initially respond to High Priority incidents. Supplier will subsequently update EY every 60 minutes or at an interval EY specifies. Reporting and Response times will be capture in ServiceNow which, save in respect of manifest error, will be conclusive proof of the time period taken."
I want to extract 2 words after the word "every".
How can this be achieved in R?
We can use str_extract_all with a regex lookbehind ((?<=every\\s)) followed by a pattern for two words
library(stringr)
unlist(str_extract_all(x, "(?<=every\\s)(\\w+\\s+\\w+)"))
#[1] "60 minutes"
Or using base R
regmatches(x, gregexpr("(?<=every\\s)(\\w+\\s+\\w+)", x, perl = TRUE))[[1]]
#[1] "60 minutes"
Something like this also works in base R:
split the string into words, find the index of the word "every", and then select the next two words after that index.
wordsplit <- unlist(strsplit(x, " ", fixed = TRUE))
indx <- grep("\\bevery\\b", wordsplit)
wordsplit[(indx+1):(indx +2)]
#[1] "60" "minutes"
Or, as @DavidArenburg suggested, we can also use match instead of grep
wordsplit[match("every", wordsplit) + 1:2]
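The split-and-index approach can be wrapped in a small helper that also copes with a missing keyword. This is only a sketch; the function name words_after and its arguments are hypothetical. Because match() compares whole tokens, a keyword with attached punctuation (for example "every,") would not be found.

```r
# Return the n words that follow the first occurrence of `word` in `text`
words_after <- function(text, word, n = 2) {
  w <- unlist(strsplit(text, " ", fixed = TRUE))
  i <- match(word, w)                  # index of first exact match, NA if absent
  if (is.na(i)) return(character(0))
  out <- w[i + seq_len(n)]
  out[!is.na(out)]                     # drop positions past the end of the text
}

words_after("update EY every 60 minutes or at an interval", "every")
#[1] "60"      "minutes"
words_after("no keyword in this text", "every")
#character(0)
```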

How can get some specific part from a column? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a CSV file with IPC data.
It looks like this:
year date applicant ... ipc number
1978 1/1 noel A43B 13/20
1979 2/2 liam B06C 14/20
1980 3/3 chris D01E 01/30
...
For example,
I need 'A43B', 'B06C', 'D01E' from the ipc number column,
but not 'A43B 13/20', 'B06C 14/20', 'D01E 01/30'.
Could you please let me know how to deal with it?
Offhand, I think there are two possibilities:
1. As Pascal wrote, use strsplit and sapply
sapply(
strsplit(c("A43B 13/20", "B06C 14/20", "D01E 01/30"),split = " "),
"[[", 1)
2. Use regular expressions and the function gsub
gsub(pattern = " [0-9]{2}\\/[0-9]{2}", replacement = "", c("A43B 13/20", "B06C 14/20","D01E 01/30"))
We can use sub. We match one or more spaces (\\s+) followed by zero or more characters (.*) up to the end ($) of the string and replace the match with ''.
sub('\\s+.*$', '', str1)
#[1] "A43B" "B07C" "D01E"
data
str1 <- c('A43B 13/20', 'B07C 14/20', 'D01E 01/30')
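Since the question is about a column read from a CSV file, here is a sketch of the same sub() call applied to a data frame column. The data frame df and the column names ipc_number and ipc_class are assumptions; the actual names depend on the file's header.

```r
# Hypothetical data frame standing in for the result of read.csv()
df <- data.frame(
  year = c(1978, 1979, 1980),
  applicant = c("noel", "liam", "chris"),
  ipc_number = c("A43B 13/20", "B06C 14/20", "D01E 01/30"),
  stringsAsFactors = FALSE
)

# Drop everything from the first whitespace onward and store it in a new column
df$ipc_class <- sub("\\s+.*$", "", df$ipc_number)
df$ipc_class
#[1] "A43B" "B06C" "D01E"
```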
