As the title says, I'm trying to figure out how to convert the spaces between words to %20.
For example,
> y <- "I Love You"
How can I make y become "I%20Love%20You", so that:
> y
[1] "I%20Love%20You"
Thanks a lot.
gsub() is one option:
R> gsub(pattern = " ", replacement = "%20", x = y)
[1] "I%20Love%20You"
Another option would be URLencode():
y <- "I love you"
URLencode(y)
[1] "I%20love%20you"
The function curlEscape() from the package RCurl gets the job done.
library('RCurl')
y <- "I love you"
curlEscape(urls=y)
[1] "I%20love%20you"
I like URLencode(), but be aware that it sometimes does not work as expected if your URL already contains a %20 together with a real space: when the string already holds a %-escape, URLencode() with the default repeated = FALSE returns it unchanged, and the repeated = TRUE option over-encodes the escapes that are already there.
In my case, I needed to run both URLencode() and gsub() consecutively to get exactly what I needed, like so:
a = "already%20encoded%space/a real space.csv"
URLencode(a)
#returns: "encoded%20space/real space.csv"
#note the spaces that are not transformed
URLencode(a, repeated=TRUE)
#returns: "encoded%2520space/real%20space.csv"
#note the %2520 in the first part
gsub(" ", "%20", URLencode(a))
#returns: "encoded%20space/real%20space.csv"
In this particular example, gsub() alone would have been enough, but URLencode() is of course doing more than just replacing spaces.
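As a small illustration (with a made-up file name b), URLencode() with reserved = TRUE also percent-encodes characters such as & and ?, which a plain gsub() on spaces leaves untouched:
b <- "a file & a question?.csv"
URLencode(b, reserved = TRUE)
#returns: "a%20file%20%26%20a%20question%3F.csv"
gsub(" ", "%20", b)
#returns: "a%20file%20&%20a%20question?.csv"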
I have strings that look like these:
> ABCD.1:f_HJK
> ABFD.1:f_HTK
> CJD:f_HRK
> QQYP.2:f_HDP
So basically, the first part is always a string, optionally followed by a '.' and a number, and then there is always a ':' followed by another string.
I would like to remove the '.' + number part, when it is present, using R.
I know regular expressions could be useful, but I have no idea how to apply them in this context. I know I can substitute the '.' with gsub, but not how to include the number and the ':' in the pattern.
Thank you for your help.
Does this work:
v <- c('ABCD.1:f_HJK','ABFD.1:f_HTK','CJD:f_HRK','QQYP.2:f_HDP')
v
[1] "ABCD.1:f_HJK" "ABFD.1:f_HTK" "CJD:f_HRK" "QQYP.2:f_HDP"
gsub('([A-Z]{,4})(\\.\\d)?(:.*)','\\1\\3',v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
You could also use any of the following, depending on the structure of your string.
If there is no other period followed by numbers in the string:
sub("\\.\\d+", "", v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
If you are only interested in the first pattern matched.
sub("^([A-Z]+)\\.\\d+:", "\\1:", v)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
Same as above, but invoking Perl regexes (\K keeps the matched prefix), i.e. no capture groups:
sub("^[A-Z]+\\K\\.\\d+", "", v, perl = TRUE)
[1] "ABCD:f_HJK" "ABFD:f_HTK" "CJD:f_HRK" "QQYP:f_HDP"
If I understood your explanation correctly, this should do the trick:
gsub("(\\.\\d+)", "", string)
I have a string printed out like this:
"\"Jenna and Alex were making cupcakes.\", \"Jenna asked Alex whether all were ready to be frosted.\", \"Alex said that\", \" some of them \", \"were.\", \"He added\", \"that\", \"the rest\", \"would be\", \"ready\", \"soon.\", \"\""
(The "\" wasn't there. R just automatically prints it out.)
I would like to calculate how many non-empty segments there are in this string. In this case the answer should be 11.
I tried to convert it to a vector, but R ignores the quotation marks, so I still ended up with a vector of length 1.
I don't know whether I need to extract those segments first and then count them, or whether there's an easier way to do it.
If it's the former case, which regular expression function best suits my need?
Thank you very much.
You can use scan to convert your large string into a vector of individual ones, then use nchar to count the lengths. Assuming your large string is x:
y <- scan(text=x, what="character", sep=",", strip.white=TRUE)
Read 12 items
sum(nchar(y)>0)
[1] 11
I assume a segment is defined as anything between . or ,. One option using strsplit:
length(grep("\\w+", trimws(strsplit(str, split=",|\\.")[[1]])))
#[1] 11
Note: trimws is not mandatory in the statement above. I have included it so that you can also get the value of each segment, just by adding the value = TRUE argument to grep, as shown after the data below.
Data:
str <- "\"Jenna and Alex were making cupcakes.\", \"Jenna asked Alex whether all were ready to be frosted.\", \"Alex said that\", \" some of them \", \"were.\", \"He added\", \"that\", \"the rest\", \"would be\", \"ready\", \"soon.\", \"\""
strsplit might be one possibility?
txt <- "Jenna and Alex were making cupcakes., Jenna asked Alex whether all were ready to be frosted.,
Alex said that, some of them , were., He added, that, the rest, would be, ready, soon.,"
a <- strsplit(txt, split=",")
length(a[[1]])
[1] 11
If the quote characters (shown escaped as \") are part of the text, it doesn't really change a lot, except for the last element, which would contain "\"". By filtering that out, the result is the same:
txt <- "\"Jenna and Alex were making cupcakes.\", \"Jenna asked Alex whether all
were ready to be frosted.\", \"Alex said that\", \" some of them \",
\"were.\", \"He added\", \"that\", \"the rest\", \"would be\", \"ready\", \"soon.\", \"\""
a <- strsplit(txt, split=", \"")
length(a[[1]][a[[1]] != "\""])
[1] 11
This is an absurd idea, but it does work:
txt <- "\"Jenna and Alex were making cupcakes.\", \"Jenna asked Alex whether all were ready to be frosted.\", \"Alex said that\", \" some of them \", \"were.\", \"He added\", \"that\", \"the rest\", \"would be\", \"ready\", \"soon.\", \"\""
Txt <-
read.csv(text = txt,
header = FALSE,
colClasses = "character",
na.strings = c("", " "))
sum(!vapply(Txt, is.na, logical(1)))
Looking for some guidance on how to replace a curly apostrophe with a straight apostrophe in an R list of character vectors.
The reason I'm replacing the curly apostrophes: later in the script, I check each list item to see if it's found in a dictionary (using qdapDictionary) to ensure it's a real word and not garbage. The dictionary uses straight apostrophes, so words with curly apostrophes are being "rejected."
A sample of the code I have currently follows. In my test list, item #6 contains a curly apostrophe, and item #2 has a straight apostrophe.
Example:
list_TestWords <- as.list(c("this", "isn't", "ideal", "but", "we", "can’t", "fix", "it"))
func_ReplaceTypographicApostrophes <- function(x) {
gsub("’", "'", x, ignore.case = TRUE)
}
list_TestWords_Fixed <- lapply(list_TestWords, func_ReplaceTypographicApostrophes)
The result: no change. Item 6 is still using a curly apostrophe. See the output below.
list_TestWords_Fixed
[[1]]
[1] "this"
[[2]]
[1] "isn't"
[[3]]
[1] "ideal"
[[4]]
[1] "but"
[[5]]
[1] "we"
[[6]]
[1] "can’t"
[[7]]
[1] "fix"
[[8]]
[1] "it"
Any help you can offer will be most appreciated!
This might work: gsub("[\u2018\u2019\u201A\u201B\u2032\u2035]", "'", x)
I found it over here: http://axonflux.com/handy-regexes-for-smart-quotes
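Applied to the list from the question with lapply(), a quick sketch:
lapply(list_TestWords, function(w) gsub("[\u2018\u2019\u201A\u201B\u2032\u2035]", "'", w))
# element [[6]] is now "can't"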
You might be running up against a bug in R on Windows. Try using utf8::as_utf8 on your input. Alternatively, this also works:
library(utf8)
list_TestWords <- as.list(c("this", "isn't", "ideal", "but", "we", "can’t", "fix", "it"))
lapply(list_TestWords, utf8_normalize, map_quote = TRUE)
This will replace the following characters with ASCII apostrophe:
U+055A ARMENIAN APOSTROPHE
U+2018 LEFT SINGLE QUOTATION MARK
U+2019 RIGHT SINGLE QUOTATION MARK
U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
U+FF07 FULLWIDTH APOSTROPHE
It will also convert your text to composed normal form (NFC).
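For a single string, the quote mapping looks like this (a small check, assuming the utf8 package is loaded as above):
utf8_normalize("can\u2019t", map_quote = TRUE)
# [1] "can't"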
I see a problem in the gsub call you originally posted:
gsub("/’", "/'", x, ignore.case = TRUE)
You are prefixing the curly single quote with a forward slash. I don't know why you are doing this. I could speculate that you are trying to escape the quote characters, but this is having the side effect that your pattern is now trying to match a forward slash followed by a quote. As this never occurs in your text, no replacements are being made. You should be doing this:
gsub("’", "'", x, ignore.case = TRUE)
With the slash removed, the gsub call replaces the curly apostrophe as you expect.
I was about to say the same thing.
Try using str_replace from the stringr package; you won't need to use slashes.
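For example (a minimal sketch, with the curly apostrophe written as \u2019):
library(stringr)
str_replace("can\u2019t", "\u2019", "'")
# [1] "can't"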
I was facing a similar problem and somehow none of the solutions worked for me, so I devised an indirect way: identify the apostrophe by its surrounding characters and replace it with the required format.
gsub("(\\w)(\\W)(\\w\\s)", "\\1'\\3","sid’s bicycle")
[1] "sid's bicycle"
Hope it helps someone.
I am trying to get the text between two words in a sentence.
For example, the sentence is:
x <- "This is my first sentence"
Now I want the text between This and first, which is "is my".
I have tried various R functions like grep, grepl, pmatch, and str_split. However, I could not get exactly what I want.
This is the closest what I have reached with gsub.
gsub(".*This\\s*|first*", "", x)
The output it gives is
[1] "is my sentence"
In reality, what I need is only
[1] "is my"
Any help would be appreciated.
You need .* at the end as well, to match the remaining characters after 'first':
gsub('^.*This\\s*|\\s*first.*$', '', x)
#[1] "is my"
Another approach using rm_between from the qdapRegex package.
library(qdapRegex)
rm_between(x, 'This', 'first', extract=TRUE)[[1]]
# [1] "is my"
Since this question is used as a reference, I'll add some possible solutions to build a complete overview. Both are based on a look-ahead/look-behind regex pattern.
base R
regmatches( x, gregexpr("(?<=This ).*(?= first)", x, perl = TRUE ) )
stringr
stringr::str_extract_all( x, "(?<=This ).+(?= first)" )
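Both return a list whose first element holds the extracted text:
#[[1]]
#[1] "is my"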
I'm trying to add single quotes to a string but don't see how to do it. For instance, I would like to replace ABC with 'ABC'.
I have played with paste, cat, and print, but don't see how to do it.
Any solution?
Thanks,
Vincent
Maybe use sQuote?
sQuote("ABC")
# [1] "'ABC'"
This (like its sibling dQuote) is frequently used to put quotes around some message or other text that's being printed to the console:
cat("ABC", "\n")
# ABC
cat(sQuote("ABC"), "\n")
# 'ABC'
Do note (as is documented in ?sQuote) that, depending on the type of quotes needed for your task, you may need to first reset options("useFancyQuotes"). To ensure that the function decorates your text with simple upright ASCII quotes, for example, do the following:
options(useFancyQuotes = FALSE)
sQuote("ABC")
# [1] "'ABC'"
Just use paste:
R> paste("'", "ABC", "'", sep="")
[1] "'ABC'"
or the newer variant:
R> paste0("'", "ABC", "'")
[1] "'ABC'"
Extending @vodka's answer:
s <- c("cat", "dog")
a <- "'"
mapply(paste0, a, s, a)
Using Reduce and paste0
Reduce(paste0,list("'","a","'"))
[1] "'a'"
Maybe I'm missing something:
s <- "cat"
a <- "'"
paste(a,s,a,sep="")
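which prints:
[1] "'cat'"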