This question already has answers here:
Extracting a string between other two strings in R
(4 answers)
Closed 2 years ago.
I wish to extract the decimal value in the string without the % sign. So in this case, I want the numeric 0.45
x <- "document.write(GIC_annual[\"12-17 MTH\"][\"99999.99\"]);0.450%"
str_extract(x, "^;[0-9.]")
My attempt fails. Here's my thinking.
Begin the extraction at the semicolon ^;
Grab any numbers between 0 and 9.
Include the decimal point
You also have this option:
stringr::str_extract(x, "\\d\\.\\d{1,}(?=%)")
[1] "0.450"
So basically the lookahead checks whether a % follows; if it does, the digits before it are captured and the % itself is left out of the match.
Details
\\d digit;
\\. dot;
\\d digit;
{1,} applies to the preceding \\d, so it matches one or more digits after the dot;
(?=%) looks ahead and checks whether a % follows; if there is one, the matched number is returned without the %
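The same lookahead can also be used without stringr. A minimal sketch in base R, assuming x is the string from the question; perl = TRUE is needed because the default regex engine does not support lookarounds:

```r
x <- "document.write(GIC_annual[\"12-17 MTH\"][\"99999.99\"]);0.450%"

# regexpr() locates the first match; regmatches() extracts it.
# "99999.99" is skipped because it is not directly followed by a %.
m <- regexpr("\\d\\.\\d+(?=%)", x, perl = TRUE)
rate_chr <- regmatches(x, m)
rate_chr
# [1] "0.450"
```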
Since you don't want the semicolon in the output, put it in a lookbehind.
stringr::str_extract(x, "(?<=;)[0-9]\\.[0-9]+")
#[1] "0.450"
In base R, using sub:
sub('.*;([0-9]\\.[0-9]+).*', '\\1', x)
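All of the options above return a character string, while the question asks for the numeric 0.45. A short sketch of the extra conversion step, assuming the value is always followed by a % sign (the names rate_chr and rate_num are just illustrative):

```r
x <- "document.write(GIC_annual[\"12-17 MTH\"][\"99999.99\"]);0.450%"

# Keep only the number between the semicolon and the % sign.
rate_chr <- sub(".*;([0-9]+\\.[0-9]+)%.*", "\\1", x)

# Convert the extracted text to a number.
rate_num <- as.numeric(rate_chr)
rate_num
# [1] 0.45
```

Note that sub returns its input unchanged when the pattern does not match, so as.numeric would then give NA with a warning.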
This question already has answers here:
Remove part of string after "."
(6 answers)
Extract string before "|" [duplicate]
(3 answers)
Closed 1 year ago.
I'm trying to extract matches preceding a pattern in R. Let's say that I have a vector consisting of the following elements:
my_vector
> [1] "ABCC12|94160" "ABCC13|150000" "ABCC1|4363" "ACTA1|58"
[5] "ADNP2|22850" "ADNP|23394" "ARID1B|57492" "ARID2|196528"
I'm looking for a regular expression to extract all characters preceding the "|". The expected result must be something like this:
my_new_vector
> [1] "ABCC12" "ABCC13" "ABCC1" "ACTA1"
and so on.
I have already tried using stringr functions and regular expressions based on lookarounds, but I failed.
I would really appreciate your advice and help in solving my issue.
Thanks in advance!
We could use trimws and specify whitespace as a regex that matches the | (a metacharacter, so it is escaped with \\) followed by zero or more characters (.*):
trimws(my_vector, whitespace = "\\|.*")
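The whitespace argument of trimws is only available in R 3.6.0 and later. Two alternatives that should work on older versions too, sketched here on the first four elements of my_vector from the question (by_sub and by_split are illustrative names):

```r
my_vector <- c("ABCC12|94160", "ABCC13|150000", "ABCC1|4363", "ACTA1|58")

# Option 1: delete the first "|" and everything after it.
by_sub <- sub("\\|.*", "", my_vector)

# Option 2: split on a literal "|" (fixed = TRUE, so no escaping is needed)
# and keep the first piece of each element.
by_split <- vapply(strsplit(my_vector, "|", fixed = TRUE), `[`, character(1), 1)

by_sub
# [1] "ABCC12" "ABCC13" "ABCC1"  "ACTA1"
```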
This question already has answers here:
remove leading 0s with stringr in R
(3 answers)
Closed 2 years ago.
I'm trying to remove the 0 that appears at the beginning of some observations for Zipcode in the following table:
I think the sub function is probably my best choice but I only want to do the replacement for observations that begin with 0, not all observations like the following does:
data_individual$Zipcode <-sub(".", "", data_individual$Zipcode)
Is there a way to condition this so it only removes the first character if the Zipcode starts with 0? Maybe grepl for those that begin with 0 and generate a dummy variable to use?
We can specify ^0+ as the pattern, i.e. one or more 0s at the start (^) of the string, instead of . (which in regex matches any character, so sub(".", ...) drops the first character of every value):
data_individual$Zipcode <- sub("^0+", "", data_individual$Zipcode)
Or with tidyverse
library(stringr)
data_individual$Zipcode <- str_remove(data_individual$Zipcode, "^0+")
Another option without regex would be to convert to numeric, as numeric values don't keep leading 0s (assuming all zipcodes contain only digits). Note that the column then becomes numeric rather than character:
data_individual$Zipcode <- as.numeric(data_individual$Zipcode)
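To see the difference between the approaches, here is a small sketch on a made-up vector (zip is hypothetical sample data, not the table from the question):

```r
zip <- c("01234", "90210", "00501", "10001")

# Only zeros at the start are removed; zeros elsewhere are untouched.
stripped <- sub("^0+", "", zip)
stripped
# [1] "1234"  "90210" "501"   "10001"

# Converting to numeric drops the leading zeros as well,
# but the result is no longer a character vector.
as_num <- as.numeric(zip)
as_num
# [1]  1234 90210   501 10001
```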
This question already has answers here:
remove repeated character between words
(4 answers)
Closed 3 years ago.
I have this text:
F <- "hhhappy birthhhhhhdayyy"
and I want to remove the repeated characters. I tried this code
https://stackoverflow.com/a/11165145/10718214
and it works, but I only need to collapse a character when it is repeated more than 2 times; if it is repeated exactly 2 times, I want to keep it.
so the output that I expect is
"happy birthday"
any help?
Try using gsub, with the pattern (.)\\1{2,}:
F <- ("hhhappy birthhhhhhdayyy")
gsub("(.)\\1{2,}", "\\1", F)
[1] "happy birthday"
Explanation of regex:
(.) match and capture any single character
\\1{2,} then match that same character two or more additional times (so three or more in total)
We replace with just the single matching character. The quantity \\1 represents the first capture group in gsub.
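A quick check that doubled letters survive while longer runs collapse. This is only a sketch; squeeze_runs is a hypothetical helper name wrapping the gsub call above:

```r
# Collapse any run of 3 or more identical characters to a single character.
squeeze_runs <- function(s) gsub("(.)\\1{2,}", "\\1", s)

squeeze_runs("hhhappy birthhhhhhdayyy")
# [1] "happy birthday"

# "aa" (2 repeats) is kept, "bbb" and "cccc" are collapsed.
squeeze_runs("aabbbcccc")
# [1] "aabc"
```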
This question already has answers here:
Complete word matching using grepl in R
(3 answers)
Closed 4 years ago.
Whenever a single English letter appears on its own as a word, I want it to be joined to the preceding text.
gsub('(.*)\\s+([a-zA-Z]{1})', "\\1\\2", 'Anti-Candida a ингибинов')
Anti-Candidaa ингибинов
For the example below, it should return 'Anti-Candida am ингибинов' as 'am' is of length 2.
gsub('(.*)\\s+([a-zA-Z]{1})', "\\1\\2", 'Anti-Candida am ингибинов')
You can use this regex:
\W+([a-zA-Z])\b
replace with \\1. The trick here is to match a word boundary after the single letter.
Your regex will work as well, if you just add that \b at the end.
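In R the backslashes have to be doubled inside the string. A minimal sketch of the suggestion applied to both inputs from the question (merge_single_letter is a hypothetical helper name):

```r
# Remove the non-word characters in front of a letter only when that
# letter is immediately followed by a word boundary, i.e. stands alone.
merge_single_letter <- function(s) gsub("\\W+([a-zA-Z])\\b", "\\1", s)

merge_single_letter("Anti-Candida a ингибинов")
# expected: "Anti-Candidaa ингибинов"

merge_single_letter("Anti-Candida am ингибинов")
# expected: unchanged, because "a" is followed by "m", not a word boundary
```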
This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Use gsub remove all string before first white space in R
(4 answers)
Closed 5 years ago.
To begin with: yes, similar questions already exist here, but the suggested solutions don't work as they should, at least for me.
I'd like to remove everything (letters, numbers, any combination of characters) before the first semicolon, and remove that semicolon too.
So we have some strings:
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
Let's use gsub() with . and *, which together mean any character occurring 0 or more times:
gsub(".*;", "", x)
gsub(".*;", "", y)
And as a result I get:
[1] "GEF2"
[1] "3DR"
But I'd like to have:
[1] "ABC;GEF2"
[1] "EER;3DR"
Why did it 'catch' the second occurrence of the semicolon instead of the first?
You could use
gsub("[^;]*;(.*)", "\\1", x)
# [1] "ABC;GEF2"
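As for the why: .* is greedy, so it matches as many characters as possible and then backtracks only as far as the last semicolon. Using [^;]* instead stops at the first semicolon because it cannot match a semicolon at all. Two shorter variants, sketched with the strings from the question (x_rest and y_rest are illustrative names):

```r
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"

# Anchored negated character class: everything up to and including
# the first semicolon is removed.
x_rest <- sub("^[^;]*;", "", x)
x_rest
# [1] "ABC;GEF2"

# Lazy quantifier: .*? matches as few characters as possible.
# sub() (not gsub()) is used so only the first match is replaced.
y_rest <- sub(".*?;", "", y, perl = TRUE)
y_rest
# [1] "EER;3DR"
```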