Writing a regex for extracting numbers from a string [duplicate] - r

This question already has answers here:
Extracting numbers from vectors of strings
(12 answers)
Extract all numbers from a single string in R
(4 answers)
Closed 1 year ago.
I was thinking I could use str_extract_all or something else in the tidyverse, but I am not sure how, because what I get back from my string is not correct.
This is the string:
string1 <- "12, 47, 48 The integers numbers are also interesting: 189 2036 314 \',\' is a separator, so please extract these numbers 125,789,1450 and also these 564,90456. 7890$ per month "

We can use str_extract_all to extract multiple instances of one or more digits (\\d+). The output is a list of length 1, so we extract the list element with [[1]]:
library(stringr)
str_extract_all(string1, "\\d+")[[1]]
Output
[1] "12" "47" "48" "189" "2036" "314" "125" "789" "1450" "564" "90456" "7890"

For a base R option, we can use regmatches along with gregexpr:
regmatches(string1, gregexpr("\\d+", string1))[[1]]
[1] "12" "47" "48" "189" "2036" "314" "125" "789" "1450" "564" "90456" "7890"
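Both approaches return a list with one element per input string, which is why the single-string case needs [[1]]. A minimal base R sketch, assuming you may have several strings and want numbers rather than character values (the two strings below are made up for illustration):

```r
# Two example strings, made up for illustration
strings <- c("125,789,1450 and 564,90456", "7890$ per month")

# gregexpr + regmatches gives a list: one character vector of matches per input string
matches <- regmatches(strings, gregexpr("\\d+", strings))

# Convert every list element from character to numeric
nums <- lapply(matches, as.numeric)
nums[[1]]  # c(125, 789, 1450, 564, 90456): the commas are non-digits, so they split the runs
nums[[2]]  # 7890
```

Because \\d+ only matches runs of digits, "125,789,1450" comes back as three separate numbers rather than one.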

Related

Gsub the first two digits of a date

I want to reshape a year such as '1984' into '84' in my dataset. I just want to remove the first two digits ('19' or '20') so that only the last two digits remain.
I've tried the following:
gsub('19+', '', year)
gsub('20+', '', year)
This code also deletes years such as 1919 or 2020 completely, which is not what I want.
What code can I try while using gsub?
The pattern 19+ matches a 1 followed by one or more 9s, and 20+ matches a 2 followed by one or more 0s. Because gsub replaces every match in the string, both halves of 1919 and 2020 are matched and removed; the patterns would also match inside strings such as 19999919 or 200.
You might use a pattern that matches either 19 or 20 and captures the last 2 digits in a capture group.
In the replacement, refer to the first capture group with \\1, and put word boundaries (\\b) around the pattern to prevent the digits from being part of a larger string.
gsub('\\b(?:19|20)(\\d\\d)\\b', '\\1', "1984")
Output
[1] "84"
A broader pattern matches any 2 leading digits instead of only 19 or 20:
gsub('\\b\\d{2}(\\d{2})\\b', '\\1', "1984")
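To see the difference described above, here is a small sketch comparing the question's pattern with the capture-group pattern on a made-up vector of years. perl = TRUE is added only as a precaution for the (?:...) syntax; it is an assumption on my part that you may want it, not something the answer requires.

```r
# Made-up example years
years <- c("1984", "1919", "2020", "2005")

# Question's pattern: every "1" followed by one or more "9"s is removed, anywhere in the string
gsub("19+", "", years)
# gives "84" "" "2020" "2005"

# Capture-group pattern: keep only the last two digits of a four-digit year starting with 19 or 20
gsub("\\b(?:19|20)(\\d\\d)\\b", "\\1", years, perl = TRUE)
# gives "84" "19" "20" "05"
```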
Using ^ to anchor the match at the beginning of the string:
gsub("^19|^20", "", year)
# [1] "19" "28" "37" "46" "55" "64" "73" "82" "91" "00" "09" "18"
Alternatively, using substring:
substring(year, 3)
# [1] "19" "28" "37" "46" "55" "64" "73" "82" "91" "00" "09" "18"
Data:
year <- seq(1919, 2021, 9)
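The ^ anchor is what keeps 1919 and 2020 from being wiped out: only the pair at the start of the string can match. A quick sketch on the two problem years (the vector is made up for illustration):

```r
y <- c("1919", "2020")

gsub("^19|^20", "", y)  # anchored: only the leading pair is removed, giving "19" "20"
gsub("19|20", "", y)    # unanchored: every occurrence is removed, giving "" ""
substring(y, 3)         # position-based, no regex: "19" "20"
```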

Convert from string format "c(\"4\", \"5\", \"7\", \"8\", \"9\", \"10\")" to character [duplicate]

This question already has answers here:
Evaluate expression given as a string
(8 answers)
Closed 3 years ago.
How can I convert the string:
a <- "c(\"4\", \"5\", \"7\", \"8\", \"9\", \"10\")"
to a vector of values: 4,5,7,8,9,10 ?
The not-so-likeable eval(parse()) can be handy here:
as.integer(eval(parse(text = a)))
#[1] 4 5 7 8 9 10
Or maybe you want to keep them as characters as your title indicates.
eval(parse(text = a))
#[1] "4" "5" "7" "8" "9" "10"
Depending on how complicated the string is, you could also extract all the digits from it:
stringr::str_extract_all(a, "\\d+")[[1]]
Or in base R
regmatches(a, gregexpr("\\d+", a))[[1]]
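If you would rather not evaluate the string as code at all, a swapped-in alternative is to strip the c( ... ) wrapper and the quotes and then split on commas. This is a sketch that assumes the string always has exactly this c("...", "...") shape:

```r
a <- "c(\"4\", \"5\", \"7\", \"8\", \"9\", \"10\")"

# Remove the leading "c(", the trailing ")" and every double quote
inner <- gsub('^c\\(|\\)$|"', "", a)
# Split "4, 5, 7, 8, 9, 10" on a comma followed by optional whitespace
vals <- strsplit(inner, ",\\s*")[[1]]
vals              # character: "4" "5" "7" "8" "9" "10"
as.integer(vals)  # integer: 4 5 7 8 9 10
```

Unlike eval(parse()), this never runs the contents of the string, which matters if the string comes from an untrusted source.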

Finding last digits in text using stringr [duplicate]

This question already has an answer here:
stringr extract full number from string
(1 answer)
Closed 3 years ago.
I am trying to use stringr to find all the digits that occur at the end of the text.
For example
x <- c("Africa-123-Ghana-2", "Oceania-123-Sydney-200")
and the stringr operation should return
"2 200"
I believe there might be multiple methods, but what would be the best code for this?
Thanks.
You could use
sub(".*-(\\d+)$", "\\1", x)
#[1] "2" "200"
Or
stringr::str_extract(x, "\\d+$")
Or
stringi::stri_extract_last_regex(x, "\\d+")
We can use regexpr/regmatches in base R to match one or more digits (\\d+) at the end ($) of the string
regmatches(x, regexpr("\\d+$", x))
#[1] "2" "200"
Or, with sub, we match everything up to and including the last non-digit character and replace it with blank ("")
sub(".*\\D+", "", x)
#[1] "2" "200"
Or using strsplit
sapply(strsplit(x, "-"), tail, 1)
#[1] "2" "200"
Or using stringr with str_match
library(stringr)
str_match(x, "(\\d+)$")[,1]
#[1] "2" "200"
Or with str_remove
str_remove(x, ".*\\D+")
#[1] "2" "200"
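These approaches agree on the example, but they behave differently when a string does not end in digits. A sketch with a made-up third element:

```r
# The third element is made up: it has no trailing digits
x2 <- c("Africa-123-Ghana-2", "Oceania-123-Sydney-200", "Asia-Tokyo")

# sub() leaves a non-matching string unchanged
sub(".*-(\\d+)$", "\\1", x2)
# gives "2" "200" "Asia-Tokyo"

# regmatches() drops non-matches, so the result is shorter than x2
regmatches(x2, regexpr("\\d+$", x2))
# gives "2" "200"

# One way to keep the positions aligned: return NA where there are no trailing digits
ifelse(grepl("\\d+$", x2), sub(".*\\D", "", x2), NA)
# gives "2" "200" NA
```

stringr::str_extract(x2, "\\d+$") also returns NA for the non-matching element, so it keeps the result the same length as the input.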

Separate data of complex coding into 2 or more columns [duplicate]

This question already has answers here:
split character data into numbers and letters
(8 answers)
Closed 5 years ago.
I have a dataset that looks like this:
I would like to split the Cloud column into two columns, one for the letters and another for only the numbers of each code. The problem is that some rows contain a combination of two or three codes (OVC32, for example, is one code).
Any help on how I can split this into just two columns is much appreciated.
You can separate the letters and numbers in "Cloud" like this:
library(stringr)
Cloud <- c("BKN130", "FEW090 SCT120 BKN150", "FEW200", "BKN140", "BKN120 BKN190")
Cloud_Letter <- gsub("[[:digit:]]","",Cloud)
Cloud_Letter
[1] "BKN" "FEW SCT BKN" "FEW" "BKN"
[5] "BKN BKN"
Cloud_Number <- str_extract_all(Cloud, "\\d+")
Cloud_Number
[[1]]
[1] "130"
[[2]]
[1] "090" "120" "150"
[[3]]
[1] "200"
[[4]]
[1] "140"
[[5]]
[1] "120" "190"
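The question asks for exactly two columns, while str_extract_all returns a list. A base R sketch, assuming a space-separated string per row is an acceptable shape for each column (the data frame and column names are chosen for illustration):

```r
Cloud <- c("BKN130", "FEW090 SCT120 BKN150", "FEW200", "BKN140", "BKN120 BKN190")

# One character vector of letter runs and one of digit runs for each row
letter_parts <- regmatches(Cloud, gregexpr("[[:alpha:]]+", Cloud))
number_parts <- regmatches(Cloud, gregexpr("[[:digit:]]+", Cloud))

# Collapse each row's pieces back into one string so every row has one value per column
clouds <- data.frame(
  Cloud = Cloud,
  Cloud_Letter = sapply(letter_parts, paste, collapse = " "),
  Cloud_Number = sapply(number_parts, paste, collapse = " "),
  stringsAsFactors = FALSE
)
clouds
```

Keeping the numbers as strings preserves leading zeros such as "090"; convert them only if you do not need that.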

Changing a column of a dataframe in R

I have a data frame in R with a column containing values such as "s1-112", "s10-112", "s3656-112", etc. Now I want to change the values to only the part after "s" and before "-112", that is, the number after s. Is there a way?
You could use gsub here
x<-c("s1-112", "s10-112", "s3656-112")
gsub("s(.*)-112", "\\1", x)
# [1] "1" "10" "3656"
Or (using @MrFlick's data)
library(stringr)
str_extract(x, "\\d+(?=-)")
#[1] "1" "10" "3656"
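Since the question is about changing a column of a data frame, here is a sketch applying the same idea in place. The data frame df and its column id are hypothetical names, and the anchored pattern assumes every value starts with "s" followed by digits and a "-":

```r
# Hypothetical data frame; "df" and "id" are made-up names
df <- data.frame(id = c("s1-112", "s10-112", "s3656-112"), stringsAsFactors = FALSE)

# Keep only the digits between the leading "s" and the first "-"
df$id <- sub("^s(\\d+)-.*$", "\\1", df$id)
# Optionally store them as numbers instead of strings
df$id <- as.integer(df$id)
df$id  # 1 10 3656
```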
