R keep double quotes from list to String

I have a list that looks like this
[1] "SCOPUS_ID:84942789431" "SCOPUS_ID:84928151617" "SCOPUS_ID:84939229259" "SCOPUS_ID:84946407175"
[5] "SCOPUS_ID:84933039513" "SCOPUS_ID:84942789431" "SCOPUS_ID:84942607254" "SCOPUS_ID:84948165954"
[9] "SCOPUS_ID:84926379258" "SCOPUS_ID:84946771354" "SCOPUS_ID:84944223683" "SCOPUS_ID:84942789431"
[13] "SCOPUS_ID:84939169499" "SCOPUS_ID:84947104346" "SCOPUS_ID:84948764343" "SCOPUS_ID:84938075139"
[17] "SCOPUS_ID:84946196118" "SCOPUS_ID:84930820238" "SCOPUS_ID:84947785321" "SCOPUS_ID:84933496680"
[21] "SCOPUS_ID:84942789431"
I want to use the function toString but keep the double quotes, so that the result looks like this:
[1] " \"SCOPUS_ID:84942789431\", \"SCOPUS_ID:84928151617\", ... "

I'll admit that I'm fairly confused by what you're asking for, but I think this is what you want:
x <- c("SCOPUS_ID:84942789431", "SCOPUS_ID:84928151617", "SCOPUS_ID:84939229259")
paste('"', x, '"', sep = "", collapse = ", ")
# [1] "\"SCOPUS_ID:84942789431\", \"SCOPUS_ID:84928151617\", \"SCOPUS_ID:84939229259\""
I know you said you didn't want to use paste because it'll take 2-3 seconds, but I can't think of an alternative that gives you what you want right now. I'm sure others will have suggestions.
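One possible alternative, offered as a minimal sketch rather than a claim about speed: wrap each element in quotes with sprintf and let toString do the joining. The variable names ids and quoted are illustrative only, and unlist() is there on the assumption that the input really is a list rather than a character vector.

```r
# Hypothetical sample input; unlist() flattens a list into a character vector.
ids <- unlist(list("SCOPUS_ID:84942789431", "SCOPUS_ID:84928151617"))

# Wrap every element in double quotes, then join with ", ".
quoted <- toString(sprintf('"%s"', ids))

# print() shows the quotes escaped as \", cat() shows the actual characters.
print(quoted)
cat(quoted, "\n")
```

The backslashes in the printed form are only how R displays a string that contains double quotes; they are not part of the string itself.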

Related

concat a SPLIT variable in R

I've been trying to split a string in R and then join it back together, but none of the tricks have worked for what I need.
!!!Important !!! My question is not a duplicate:
saving a split result into a variable and then pasting, collapsing etc is not the same as just paste a vector like this
paste(c("bla", "bla"), collapse = " ")
> paste(c("The","birch", "canoe"), collapse = ' ')
[1] "The birch canoe"
> paste(s, collapse=" ")
[1] "c(\"The\", \"birch\", \"canoe\", \"slid\", \"on\", \"the\", \"smooth\", \"planks.\")"
Here's the code:
I take pre-saved sentences in R
sentences[1]
and split it
s <- str_split(sentences[1], " ")
this is what I get:
[1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."
Now when I try to join this back together I get backslashes
toString(s)
"c(\"The\", \"birch\", \"canoe\", \"slid\", \"on\", \"the\", \"smooth\", \"planks.\")"
paste produces the same result:
> paste(s)
[1] "c(\"The\", \"birch\", \"canoe\", \"slid\", \"on\", \"the\", \"smooth\", \"planks.\")"
I tried using str_split_fixed and wrapping the result in a vector, but toString joins the sentence back together with commas, even if I ask it not to.
v <- as.vector(str_split_fixed(sentences[1], " ", 5))
toString(v, sep="")
[1] "The, birch, canoe, slid, on the smooth planks."
I thought str_split_i or str_split_1 might solve it, since according to the documentation they should, but this is what I get when I try to use them:
"could not find function "str_split_1" "
Are there any other ways to join a string back together after splitting it, without producing commas or backslashes?
See the difference between:
s <- list(c("The" , "birch" , "canoe" , "slid" , "on" , "the" , "smooth" , "planks."))
paste(s[1], collapse = " ")
#[1] "c(\"The\", \"birch\", \"canoe\", \"slid\", \"on\", \"the\", \"smooth\", \"planks.\")"
and
paste(s[[1]], collapse = " ")
#[1] "The birch canoe slid on the smooth planks."
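This is because [[ extracts the character vector itself, whereas [ keeps the output as a list. Pasting a list element deparses it, which is where the c(...) and the escaped quotes come from.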
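The same round trip can be sketched with base R's strsplit, which also returns a list with one element per input string. The sentence is typed in by hand here so the example does not depend on stringr; the variable names are illustrative.

```r
sentence <- "The birch canoe slid on the smooth planks."

# strsplit() returns a list of length 1 holding the vector of words.
s <- strsplit(sentence, " ", fixed = TRUE)
class(s)        # "list"

# Extract the character vector with [[ before collapsing.
words <- s[[1]]
rejoined <- paste(words, collapse = " ")
print(rejoined)
```

unlist(s) would work equally well here, since the list holds only one vector.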

Use gsub to replace curly apostrophe with straight apostrophe in R list of character vectors

Looking for some guidance on how to replace a curly apostrophe with a straight apostrophe in an R list of character vectors.
The reason I'm replacing the curly apostrophes - later in the script, I check each list item, to see if it's found in a dictionary (using qdapDictionary) to ensure it's a real word and not garbage. The dictionary uses straight apostrophes, so words with the curly apostrophes are being "rejected."
A sample of the code I have currently follows. In my test list, item #6 contains a curly apostrophe, and item #2 has a straight apostrophe.
Example:
list_TestWords <- as.list(c("this", "isn't", "ideal", "but", "we", "can’t", "fix", "it"))
func_ReplaceTypographicApostrophes <- function(x) {
  gsub("’", "'", x, ignore.case = TRUE)
}
list_TestWords_Fixed <- lapply(list_TestWords, func_ReplaceTypographicApostrophes)
The result: No change. Item 6 still using curly apostrophe. See output below.
list_TestWords_Fixed
[[1]]
[1] "this"
[[2]]
[1] "isn't"
[[3]]
[1] "ideal"
[[4]]
[1] "but"
[[5]]
[1] "we"
[[6]]
[1] "can’t"
[[7]]
[1] "fix"
[[8]]
[1] "it"
Any help you can offer will be most appreciated!
This might work: gsub("[\u2018\u2019\u201A\u201B\u2032\u2035]", "'", x)
I found it over here: http://axonflux.com/handy-regexes-for-smart-quotes
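Applied to a list like the one in the question, that pattern might be used as follows. This is a sketch that assumes the strings are UTF-8; the curly apostrophe is written as the escape \u2019 so the script does not depend on the source file's encoding, and the helper name replace_curly is made up for illustration.

```r
# Shortened version of the test list; "\u2019" is the curly apostrophe.
test_words <- list("this", "isn't", "can\u2019t")

# Hypothetical helper: map several typographic quote characters to a straight apostrophe.
replace_curly <- function(x) {
  gsub("[\u2018\u2019\u201A\u201B\u2032\u2035]", "'", x)
}

fixed_words <- lapply(test_words, replace_curly)
print(unlist(fixed_words))
```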
You might be running up against a bug in R on Windows. Try using utf8::as_utf8 on your input. Alternatively, this also works:
library(utf8)
list_TestWords <- as.list(c("this", "isn't", "ideal", "but", "we", "can’t", "fix", "it"))
lapply(list_TestWords, utf8_normalize, map_quote = TRUE)
This will replace the following characters with ASCII apostrophe:
U+055A ARMENIAN APOSTROPHE
U+2018 LEFT SINGLE QUOTATION MARK
U+2019 RIGHT SINGLE QUOTATION MARK
U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
U+FF07 FULLWIDTH APOSTROPHE
It will also convert your text to composed normal form (NFC).
I see a problem in your call to gsub:
gsub("/’", "/'", x, ignore.case = TRUE)
You are prefixing the curly single quote with a forward slash. I don't know why you are doing this. I could speculate that you are trying to escape the quote characters, but this is having the side effect that your pattern is now trying to match a forward slash followed by a quote. As this never occurs in your text, no replacements are being made. You should be doing this:
gsub("’", "'", x, ignore.case = TRUE)
Follow the link below for a demo which shows that using the above gsub calls works as you expect.
Demo
Was about to say the same thing.
Try using str_replace from the stringr package; you will not need any slashes.
I was facing a similar problem, and somehow none of the solutions worked for me. So I devised an indirect way of doing it: identify the apostrophe by its position between word characters and replace it with the required character.
gsub("(\\w)(\\W)(\\w\\s)", "\\1'\\3","sid’s bicycle")
[1] "sid's bicycle"
Hope it helps someone.

What is the equivalent to VBAs "&" in R?

In Excel (and Excel VBA) it is really helpful to join text and variables using "&":
a = 5
msgbox "The value is: " & a
will give
"The value is: 5"
How can I do this in R? I know there is a way to use "paste". However I wonder if there isn't any trick to do it as simple as in Excel VBA.
Thanks in advance.
This blog post suggests defining your own concatenation operator, similar to what VBA (and JavaScript) have, while retaining the power of paste:
"%+%" <- function(...) paste0(...)
"Concatenate hits " %+% "and this."
# [1] "Concatenate hits and this."
I am not a big fan of this solution though because it kind of obscures what paste does under the hood. For instance, is it intuitive to you that this would happen?
"Concatenate this string " %+% "with this vector: " %+% 1:3
# [1] "Concatenate this string with this vector: 1"
# [2] "Concatenate this string with this vector: 2"
# [3] "Concatenate this string with this vector: 3"
In JavaScript, for instance, this would give you Concatenate this string with this vector: 1,2,3, which is quite different. I cannot speak for Excel, but you should consider whether this solution is more confusing than it is useful.
If you need a JavaScript-like solution, you can also try this:
"%+%" <- function(...) {
  dots <- list(...)
  dots <- rapply(dots, paste, collapse = ",")
  paste(dots, collapse = "")
}
"Concatenate this string " %+% "with this string."
# [1] "Concatenate this string with this string."
"Concatenate this string " %+% "with this vector: " %+% 1:3
# [1] "Concatenate this string with this vector: 1,2,3"
But I haven't tested it extensively, so be on the lookout for unexpected results.
Another possibility is to use sprintf:
a <- 5
cat(sprintf("The value is %d\n",a))
## The value is 5
The %d denotes integer formatting (%f would give "The value is 5.000000"). The \n denotes a newline at the end of the string.
sprintf() can be more convenient than paste or paste0 when you want to put together a lot of pieces. Note that a literal percent sign has to be written as %%, e.g.
sprintf("The value of a is %f (95%% CI: {%f,%f})",
        a_est, a_lwr, a_upr)
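A runnable version of that call, as a small sketch: the three numbers are made up purely for illustration, and %.2f is used instead of %f only to keep the output short.

```r
# Made-up values standing in for an estimate and its confidence limits.
a_est <- 1.5
a_lwr <- 1.0
a_upr <- 2.0

# %% produces a literal percent sign; %.2f prints two decimal places.
msg <- sprintf("The value of a is %.2f (95%% CI: {%.2f,%.2f})",
               a_est, a_lwr, a_upr)
cat(msg, "\n")
# The value of a is 1.50 (95% CI: {1.00,2.00})
```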

removing everything after first 'backslash' in a string

I have a vector like below
vec <- c("abc\edw\www", "nmn\ggg", "rer\qqq\fdf"......)
I want to remove everything from the first backslash onwards, like below
newvec <- c("abc","nmn","rer")
Thank you.
My original vector is as below (only the head)
[1] "peoria ave\nste \npeoria"
[2] "wood dr\nphoenix"
[3] "central ave\nphoenix"
[4] "southern ave\nphoenix"
[5] "happy valley rd\nste \nglendaleaz "
[6] "the americana at brand\n americana way\nglendale"
The problem is that my original csv file does not contain backslashes, but when I read it in, backslashes appear. The original csv file is as below
[1] "peoria ave [2] "wood dr
nste nphoenix"
npeoria"
As you can see, the parts are actually separated by "ENTER" (line breaks), but when I read the file into R using read.csv() they show up as backslashes.
another solution :
sub("\\\\.*", "", x)
vec <- c("abc\\edw\\www", "nmn\\ggg", "rer\\qqq\\fdf")
sub("([^\\\\])\\\\.*","\\1", vec)
[1] "abc" "nmn" "rer"
strsplit(vec, "\\\\") should do the job.
To select the first piece of the first element use [[1]][1]; for the second piece, [[1]][2].
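Going by the edit to the question, the separators in the real data look like newline characters, which R prints as \n, rather than literal backslashes. A minimal sketch covering both cases, using fixed-string splitting so no regular-expression escaping is needed (the variable names and the two sample addresses are taken or adapted from the question):

```r
# Case 1: literal backslashes (written "\\" in R source code).
vec <- c("abc\\edw\\www", "nmn\\ggg", "rer\\qqq\\fdf")
before_backslash <- vapply(strsplit(vec, "\\", fixed = TRUE),
                           function(p) p[1], character(1))

# Case 2: embedded newlines, printed by R as \n.
addr <- c("peoria ave\nste \npeoria", "wood dr\nphoenix")
first_line <- vapply(strsplit(addr, "\n", fixed = TRUE),
                     function(p) p[1], character(1))

print(before_backslash)
print(first_line)
```

cat(addr[1]) shows the line breaks as they really are, which is a quick way to check which case applies.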

How to Convert "space" into "%20" with R

As the title says, I'm trying to figure out how to convert the spaces between words to %20.
For example,
> y <- "I Love You"
How to make y = I%20Love%20You
> y
[1] "I%20Love%20You"
Thanks a lot.
Another option would be URLencode():
y <- "I love you"
URLencode(y)
[1] "I%20love%20you"
gsub() is one option:
R> gsub(pattern = " ", replacement = "%20", x = y)
[1] "I%20Love%20You"
The function curlEscape() from the package RCurl gets the job done.
library('RCurl')
y <- "I love you"
curlEscape(urls=y)
[1] "I%20love%20you"
I like URLencode() but be aware that sometimes it does not work as expected if your url already contains a %20 together with a real space, in which case not even the repeated option of URLencode() is doing what you want.
In my case, I needed to run both URLencode() and gsub consecutively to get exactly what I needed, like so:
a = "encoded%20space/real space.csv"
URLencode(a)
#returns: "encoded%20space/real space.csv"
#note the spaces that are not transformed
URLencode(a, repeated=TRUE)
#returns: "encoded%2520space/real%20space.csv"
#note the %2520 in the first part
gsub(" ", "%20", URLencode(a))
#returns: "encoded%20space/real%20space.csv"
In this particular example, gsub() alone would have been enough, but URLencode() is of course doing more than just replacing spaces.
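To illustrate what "more than just replacing spaces" can mean, here is a short sketch comparing gsub() with URLencode() on a string that contains a reserved character. The input string is made up, and the behaviour shown assumes a version of R whose URLencode() has the reserved and repeated arguments.

```r
u <- "a&b c"   # made-up input with a reserved character and a space

spaces_only  <- gsub(" ", "%20", u, fixed = TRUE)  # touches spaces only
encoded      <- URLencode(u)                       # leaves reserved characters alone
encoded_full <- URLencode(u, reserved = TRUE)      # also encodes "&"

# With the default repeated = FALSE, a string that already contains
# a %xx sequence is returned unchanged, real spaces included.
mixed <- "encoded%20space/real space.csv"
unchanged <- URLencode(mixed)

print(c(spaces_only, encoded, encoded_full, unchanged))
```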

Resources