Double quotes within character strings [duplicate] - r

This question already has answers here:
Double quotes not escaped in R
(1 answer)
Get indices of all character elements matches in string in R
(1 answer)
Closed 5 years ago.
I want to do two things:
1) I want to create a character string with a double quote inside. An example in R looks as follows:
x <- 'vjghvbh"kljnj"kjbn"jk'
[1] "vjghvbh\"kljnj\"kjbn\"jk"
Question 1: How can I create such a character string without the backslashes inside?
I tried gsub(), but that didn't work. I also found some sources that suggested cat(), but that only prints the string and does not store it in x.
2) Let's assume that I solved Question 1. Then my string would look as follows:
[1] "vjghvbh"kljnj"kjbn"jk"
Now I need to find the positions of the double quotes. Based on this thread I tried gregexpr(). However, this also did not work, since I was not able to specify the pattern.
Question 2: How could I find the position of the double quotes within my character string?
The result in R should look like this:
[1] 8 14 19
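A minimal sketch of one way to approach both questions, assuming only base R. The backslashes are added by print() when it displays the string; they are not stored in x, which nchar() and writeLines() can confirm. The variable name quote_pos is introduced here for illustration only.

```r
x <- 'vjghvbh"kljnj"kjbn"jk'

# print() escapes the quotes for display; the stored string has no backslashes
print(x)
# writeLines() shows the raw contents of the string
writeLines(x)
# 21 characters: the three double quotes count once each, no backslashes
n_chars <- nchar(x)

# fixed = TRUE treats the pattern as a literal double quote, not a regex
quote_pos <- as.integer(gregexpr('"', x, fixed = TRUE)[[1]])
quote_pos
```

With the example string this should return the positions 8, 14 and 19 requested above.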

Related

how to find the element in between two elements in a character vector created by an rtf document [duplicate]

This question already has answers here:
Extracting a string between other two strings in R
(4 answers)
Closed 1 year ago.
I have an object created from an rtf document using the code: sample_doc <- read_rtf("sample.doc") (I had to use read_rtf because the document is actually an rtf).
I know somewhere in the document there are two phrases (an element in the character vector) apple and orange and that there must be an element in between them. I just want to extract that in-between element. What should I do?
Thanks!
You can use a positive lookbehind and lookahead to target the pattern in between. This regex should give you what you need (swap the two words if apple comes first):
(?<=orange)(.*)(?=apple)
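As a hedged illustration in R, here are two sketches. The data below are hypothetical stand-ins, not the real output of read_rtf. The first treats apple and orange as separate elements of a character vector, as the question describes; the second applies a lookaround regex to a single string, which needs perl = TRUE in base R.

```r
# Hypothetical stand-in for the character vector returned by read_rtf()
sample_doc <- c("intro", "apple", "the element in between", "orange", "outro")

# Case 1: the markers are whole elements, so work with their indices
start <- which(sample_doc == "apple")[1]
end <- which(sample_doc == "orange")[1]
between <- sample_doc[seq(start + 1, length.out = end - start - 1)]

# Case 2: the markers sit inside one string, so use lookbehind/lookahead
txt <- "apple the middle part orange"
m <- regmatches(txt, regexpr("(?<=apple)(.*)(?=orange)", txt, perl = TRUE))
middle <- trimws(m)
```

Using seq() with length.out returns an empty result, rather than counting backwards, when the two markers are adjacent.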

Replace last characters of a string with its entire elements [duplicate]

This question already has answers here:
Extracting the last n characters from a string in R
(15 answers)
Closed 3 years ago.
I have an element in my dataframe which I want to modify.
I have a column with the following type of values
https://mns-xyz-eu.abc.com/ccs/proposal?action=view&proposalId=12345
I want to replace the entire string with just the last 5 characters (i.e)
Replace the entire character string with 12345 in this case.
How do I achieve this?
Thanks a lot.
One option is a positive lookbehind with stringr::str_extract:
library(stringr)
# Extract the digits that follow "proposalId="
str_extract('https://mns-xyz-eu.abc.com/ccs/proposal?action=view&proposalId=12345',
'(?<=proposalId=)\\d+')
# Simpler option: take the first run of digits (works here because the URL has no earlier digits)
str_extract('https://mns-xyz-eu.abc.com/ccs/proposal?action=view&proposalId=12345', '\\d+')
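For comparison, a base R sketch that avoids the stringr dependency. It assumes the id is always the last five characters, or alternatively that it follows "proposalId=" at the end of the URL. The names url, last_five and id are illustrative only.

```r
url <- "https://mns-xyz-eu.abc.com/ccs/proposal?action=view&proposalId=12345"

# Take the last 5 characters by position
last_five <- substr(url, nchar(url) - 4, nchar(url))

# Or drop everything up to and including "proposalId="
id <- sub(".*proposalId=", "", url)
```

Both substr() and sub() are vectorised, so the same calls can be applied to a whole data frame column.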

Trying to validate two different format in one regular expression [duplicate]

This question already has answers here:
How to validate phone numbers using regex
(43 answers)
Closed 5 years ago.
I want to validate these formats in one regular expression in asp.net:
XX-XXXXXXX or XXX-XX-XXXX
These have to be numeric only no characters except the "-".
Is this possible? I've been trying without any success so I want to ask the experts.
Thanks,
Pune
The following should work given your requirements.
"(^\d{2}-\d{7}$)|(^\d{3}-\d{2}-\d{4}$)"
Try something like this:
/^([0-9]{2}-[0-9]{7}|[0-9]{3}-[0-9]{2}-[0-9]{4})$/
[0-9] means any character from 0 to 9.
{X} means X times.
| means "or".
- means a literal "-".
( and ) delimit a group, here so that ^ and $ apply to both alternatives.
^ and $ anchor the beginning and the end of the match.
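A quick way to sanity-check a pattern for XX-XXXXXXX or XXX-XX-XXXX is to run it against a few inputs. The sketch below does this in R rather than ASP.NET, purely as an illustration; the test values are made up, and the same pattern text should carry over to .NET's Regex class.

```r
# Anchored pattern: either 2 digits, dash, 7 digits or 3-2-4 digits
pattern <- "^([0-9]{2}-[0-9]{7}|[0-9]{3}-[0-9]{2}-[0-9]{4})$"

# Hypothetical test inputs: two valid formats, two invalid ones
candidates <- c("12-3456789", "123-45-6789", "12-34-5678", "1234567890")
is_valid <- grepl(pattern, candidates)
is_valid
```

Only the first two inputs should be accepted; the third has the wrong group lengths and the fourth has no dashes.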

Selecting the nth character within a loop using R [duplicate]

This question already has answers here:
how to replace nth character of a string in a column in r
(3 answers)
Closed 2 years ago.
For context, I am writing code in R that selects the most common character from a list of strings: it determines the most common character in the first position of each string, then in the second, and so on. To start, I am running a loop within a loop to save each character to a list for later use.
I am trying to use the head function to select out each character along the string, which of course is giving me the first character, first two characters, and so on when what I want is the first, second, third, etc. character to be saved to the list.
Here is my code so far:
Store <- list()
for (j in (1:SequenceNumber)){
  SequenceLength <- length(Sequences[[j]])
  for (i in (1:SequenceLength)){
    Store[[length(Store)+1]] <- head(Sequences[[j]], n=i)
  }
}
So in summary, I am wondering what (probably extremely simple) solution there might be to select the nth element only within a loop using R.
I have tried looking around for a solution, but can only find results selecting out a specified range (for example, the first five results), instead of the nth result.
To get the Nth letter in a string use substring. For example, the 5th letter in Chicago:
> substring("Chicago", 5, 5)
[1] "a"
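Building on that, here is a rough sketch of the overall goal (the most common character at each position) without nested loops. It assumes the sequences are plain character strings; the example data and the names chars, n and most_common are hypothetical.

```r
# Hypothetical input: three short sequences
Sequences <- c("ACGT", "ACGA", "TCGA")

# Split each string into a vector of single characters
chars <- strsplit(Sequences, "")

# Only compare positions that exist in every sequence
n <- min(nchar(Sequences))

most_common <- sapply(seq_len(n), function(i) {
  column <- sapply(chars, `[`, i)   # the i-th character of every sequence
  names(which.max(table(column)))   # most frequent value; ties go to the first in sorted order
})
most_common
```

Inside the original loop, Sequences[[j]][i] (for a vector of single characters) or substring(Sequences[[j]], i, i) (for a string) would select just the i-th character instead of the first i.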

How to edit names using regular expression in R? [duplicate]

This question already has answers here:
Find a word before one of two possible separators
(4 answers)
Closed 8 years ago.
I have names like the following. I just want to keep the part before the dot.
>name
uc001aaa.3
uc001aac.4
uc001aae.4
uc001aah.4
uc001aai.1
uc001aak.3
uc001aal.1
uc001aam.4
uc001aaq.2
uc001aar.2
How can I implement this using regex or sub in R ?
I thought this would certainly be a duplicate, but despite the number of gsub questions I can't easily find one (e.g. https://stackoverflow.com/questions/23844473/exclude-a-pattern-in-all-collumn-names-in-r). Update: ironically, the closest one is a question the OP asked a few days ago, How to trim the column name of the matrix? ...
Anyway,
gsub("\\.[0-9]$","",name)
does what you want;
\\. specifies a literal . character (one backslash is required to specify that . is literal rather than meaning "any character"; the second is required to protect the first!). As @MatthewLundberg points out you could also use [.] here (. is interpreted literally, rather than as "any character", within the range brackets []).
[0-9] means "a single character in the range 0-9" (not, as you seem to think, the first 9 characters of the string)
$ means "end of string"
So this will remove a dot plus a single digit from the end of every string. It doesn't matter how many characters are before the dot. On the other hand, if you might have multi-digit values, e.g. foo.123, you would need "\\.[0-9]+$" instead (the + means "one or more of the preceding pattern").
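To make the difference between the two patterns concrete, a small sketch on a shortened, partly hypothetical version of the name vector (foo.123 is added to show the multi-digit case):

```r
name <- c("uc001aaa.3", "uc001aac.4", "foo.123")

# Single digit only: "foo.123" is left untouched because ".123" is not dot + one digit
single <- sub("\\.[0-9]$", "", name)

# One or more digits: also strips ".123"
multi <- sub("\\.[0-9]+$", "", name)

# Same as above, with [.] instead of \\. for the literal dot
bracket <- sub("[.][0-9]+$", "", name)
```

Since the pattern is anchored with $, it can match at most once per string, so sub() and gsub() behave identically here.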
Here is a strsplit method, which separates the string on . characters, and keeps the first portion:
sapply(strsplit(name, '[.]'), '[', 1)
## [1] "uc001aaa" "uc001aac" "uc001aae" "uc001aah" "uc001aai" "uc001aak" "uc001aal" "uc001aam" "uc001aaq" "uc001aar"
I'm using the regular expression [.] to match a literal dot rather than \\. because I find it more readable. (It also helps if you have multiple levels of interpretation, but that's not an issue here.)
