how to find the element in between two elements in a character vector created by an rtf document [duplicate] - r

This question already has answers here:
Extracting a string between other two strings in R
(4 answers)
Closed 1 year ago.
I have an object created from an rtf document using the code: sample_doc <- read_rtf("sample.doc") (I had to use read_rtf because the document is actually an RTF file).
I know that somewhere in the document there are two phrases, apple and orange (each one an element of the character vector), and that there must be an element in between them. I just want to extract that in-between element. What should I do?
Thanks!

You can use a positive lookbehind and lookahead to target the pattern in between. This regex should give you what you need:
(?<=orange)(.*)(?=apple)
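A regex like this only applies once the text is a single string, whereas the question describes a character vector. As a minimal sketch, assuming each phrase is exactly one element of the vector and both are present, you could look up the two positions and index between them. The sample_doc below is a made-up stand-in for the real file. The second part applies a lookaround regex to the collapsed text; base R needs perl = TRUE for lookbehind, and the order of the two words in the pattern has to match their order in the document.

```r
# Hypothetical stand-in for the result of read_rtf("sample.doc")
sample_doc <- c("intro", "apple", "the element we want", "orange", "outro")

# Positions of the two marker elements (exact match, first occurrence of each)
i <- match("apple", sample_doc)
j <- match("orange", sample_doc)

# Everything strictly between the markers, whichever one comes first
lo <- min(i, j)
hi <- max(i, j)
between <- if (hi - lo > 1) sample_doc[(lo + 1):(hi - 1)] else character(0)
print(between)

# Regex variant on the collapsed text; lookarounds require perl = TRUE
txt <- paste(sample_doc, collapse = " ")
m <- regmatches(txt, regexpr("(?<=apple)(.*)(?=orange)", txt, perl = TRUE))
between_rx <- trimws(m)
print(between_rx)
```

If the phrases are only part of an element rather than the whole element, grep("apple", sample_doc, fixed = TRUE) could replace match() to find the positions.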

Related

Replace last characters of a string with its entire elements [duplicate]

This question already has answers here:
Extracting the last n characters from a string in R
(15 answers)
Closed 3 years ago.
I have an element in my dataframe which I want to modify.
I have a column with the following type of values
https://mns-xyz-eu.abc.com/ccs/proposal?action=view&proposalId=12345
I want to replace the entire string with just its last 5 characters, i.e.
replace the entire character string with 12345 in this case.
How do I achieve this?
Thanks a lot.
One option is a positive lookbehind with stringr::str_extract:
library(stringr)
str_extract('https://mns-xyz-eu.abc.com/ccs/proposal?action=view&proposalId=12345',
'(?<=proposalId\\=)\\d+')
# Simpler option: take the first run of digits in the string
str_extract('https://mns-xyz-eu.abc.com/ccs/proposal?action=view&proposalId=12345', '\\d+')
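If you would rather avoid a package, here is a rough base R sketch of the same idea. It assumes the ID is always the last 5 characters, or alternatively that it always follows proposalId= at the end of the string; the variable names are just for illustration.

```r
url <- "https://mns-xyz-eu.abc.com/ccs/proposal?action=view&proposalId=12345"

# Literally the last n characters of the string
n <- 5
last_n <- substring(url, nchar(url) - n + 1)

# Or: drop everything up to and including "proposalId="
id <- sub(".*proposalId=", "", url)

print(last_n)
print(id)
```

Both substring() and sub() are vectorised, so the same calls should work on a whole data frame column, e.g. df$col <- sub(".*proposalId=", "", df$col), where df and col are placeholder names.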

R: Regular Expression for Twitter hashtags? [duplicate]

This question already has answers here:
What characters are allowed in twitter hashtags?
(6 answers)
Closed 4 years ago.
I'm trying to come up with a regular expression that matches Twitter hashtags. Twitter hashtags have the following rules:
1) They cannot contain spaces.
2) They cannot contain punctuation.
3) They cannot start with or use only numbers.
This is what I've come up with so far, but it still has issues with spaces and punctuation characters:
"#{1}[^0-9]*[^[::punct::]\\s]*?[A-z0-9]*?"
Would appreciate any help with this. Thanks!
Your regex looks a bit complicated; you only need to match the #, then a letter, and then alphanumeric characters.
You also don't need a quantifier for a single character. This should work:
#[a-zA-Z]\w*
If you don't want to allow underscores (they are legal characters in tweets), use this instead:
#[a-zA-Z][\da-zA-Z]*
It looks like the real spec for a hashtag, however, is that underscores and numbers are valid anywhere as long as there's at least one letter.
So this would be better:
#\w*[a-zA-Z]\w*
This regex captures only valid hashtags:
(#[a-zA-Z]+[\w]?)(?:\s|$)
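To try one of these patterns from R, remember that backslashes have to be doubled inside an R string. Here is a minimal sketch using the "at least one letter" pattern from above on a made-up tweet; the tweet text is purely illustrative.

```r
# Made-up example text
tweet <- "Loving #rstats and #R4DS but not #123 or #2fast"

# "#\w*[a-zA-Z]\w*" written as an R string (backslashes doubled)
pattern <- "#\\w*[a-zA-Z]\\w*"

# gregexpr() finds all matches; regmatches() pulls them out of the string
tags <- regmatches(tweet, gregexpr(pattern, tweet, perl = TRUE))[[1]]
print(tags)
```

Here #123 is skipped because it contains no letter, while #2fast is kept because this pattern allows a leading digit as long as a letter follows.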

Trying to validate two different format in one regular expression [duplicate]

This question already has answers here:
How to validate phone numbers using regex
(43 answers)
Closed 5 years ago.
I want to validate these formats in one regular expression in asp.net:
XX-XXXXXXX or XXX-XX-XXXX
These have to be numeric only, with no characters other than the "-".
Is this possible? I've been trying without any success so I want to ask the experts.
Thanks,
Pune
The following should work given your requirements.
"(^\d{2}-\d{7}$)|(^\d{3}-\d{2}-\d{4}$)"
Try something like this:
/^([0-9]{2}-[0-9]{7}|[0-9]{3}-[0-9]{2}-[0-9]{4})$/
[0-9] means any character from 0 to 9.
{X} means X times.
| means "or".
- means a literal "-".
( and ) delimit a group, so that ^ and $ apply to both alternatives.
^ and $ anchor the match to the beginning and the end of the input.
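A quick way to sanity-check a pattern like this is to run it against a few inputs. The sketch below does that in R rather than ASP.NET, with made-up test values, and uses three digits at the start of the second alternative so that it matches XXX-XX-XXXX.

```r
# XX-XXXXXXX or XXX-XX-XXXX, digits only
pattern <- "^([0-9]{2}-[0-9]{7}|[0-9]{3}-[0-9]{2}-[0-9]{4})$"

# Made-up inputs: the first two should pass, the rest should fail
inputs <- c("12-3456789", "123-45-6789", "12-34-5678", "123-456789", "12-345678a")

valid <- grepl(pattern, inputs)
print(data.frame(input = inputs, valid = valid))
```

The pattern itself uses no R-specific syntax, so the same expression should carry over to .NET's Regex class unchanged.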

Double quotes within character strings [duplicate]

This question already has answers here:
Double quotes not escaped in R
(1 answer)
Get indices of all character elements matches in string in R
(1 answer)
Closed 5 years ago.
I want to do two things:
1) I want to create a character string with a double quote inside. An example in R would look like follows:
x <- 'vjghvbh"kljnj"kjbn"jk'
[1] "vjghvbh\"kljnj\"kjbn\"jk"
Question 1: How could I create such a character string without the backslash inside?
I tried to use gsub(), but unfortunately that didn't work. I also found some sources, which suggested cat(), but that just prints my character, but does not store it in x.
2) Let's assume that I solved Question 1. Then my character would look like follows:
[1] "vjghvbh"kljnj"kjbn"jk"
Now I need to find the positions of the double quotes. Based on this thread I tried gregexpr(). However, this also did not work, since I was not able to specify the pattern.
Question 2: How could I find the position of the double quotes within my character string?
The result in R should look like this:
[1] 8 14 19
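For what it's worth, the backslashes are only how print() displays the string; they are not stored in it. A minimal sketch, assuming the same x as in the question: nchar() shows that no extra characters are present, writeLines() shows the raw contents, and gregexpr() with fixed = TRUE finds the quote positions without any escaping.

```r
x <- 'vjghvbh"kljnj"kjbn"jk'

# The string has 21 characters; the backslashes shown by print() are not among them
n_chars <- nchar(x)
print(n_chars)

# writeLines() shows the string as it is actually stored
writeLines(x)

# Positions of every double quote; fixed = TRUE treats the pattern literally
pos <- as.integer(gregexpr('"', x, fixed = TRUE)[[1]])
print(pos)
```

as.integer() is only there to drop the match.length attributes that gregexpr() attaches to its result.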

Selecting the nth character within a loop using R [duplicate]

This question already has answers here:
how to replace nth character of a string in a column in r
(3 answers)
Closed 2 years ago.
For context, I am writing code in R that picks out the most common character from a list of strings: the most common character in the first position of each string, then in the second position, and so on. To start, I am running a loop within a loop to save each character to a list for later use.
I am trying to use the head function to select out each character along the string, which of course is giving me the first character, first two characters, and so on when what I want is the first, second, third, etc. character to be saved to the list.
Here is my code so far:
Store <- list()
for (j in (1:SequenceNumber)){
  SequenceLength <- length(Sequences[[j]])
  for (i in (1:SequenceLength)){
    Store[[length(Store)+1]] <- head(Sequences[[j]], n=i)
  }
}
So in summary, I am wondering what (probably extremely simple) solution there might be to select the nth element only within a loop using R.
I have tried looking around for a solution, but can only find results selecting out a specified range (for example, the first five results), instead of the nth result.
To get the Nth letter in a string use substring. For example, the 5th letter in Chicago:
> substring("Chicago", 5, 5)
[1] "a"
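Applied to the loop from the question, this could look roughly like the sketch below. It assumes Sequences is a list of single strings (the values here are made up) and uses nchar() rather than length() to get the number of characters. The last part is one possible way to get the most common character per position, assuming all strings have the same length.

```r
# Hypothetical data: a list of equal-length strings
Sequences <- list("ACGT", "ACGA", "TCGA")

# Store each character separately, one list entry per character
Store <- list()
for (j in seq_along(Sequences)) {
  for (i in seq_len(nchar(Sequences[[j]]))) {
    Store[[length(Store) + 1]] <- substring(Sequences[[j]], i, i)
  }
}

# Alternative without loops: one row per string, one column per position
chars <- do.call(rbind, strsplit(unlist(Sequences), ""))

# Most common character in each column (ties go to the first in sorted order)
most_common <- apply(chars, 2, function(col) names(which.max(table(col))))
print(most_common)
```

The strsplit() route avoids growing a list inside a loop, which tends to be slow in R for long inputs.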
