How to insert backslash followed by single quote using paste0 in R?

I'm trying to separate the elements in a vector with \' and a comma using paste0. For example:
test_vector = c("test1", "test2", "test3")
I would like to use paste0 to generate the following output:
\'test1\', \'test2\', \'test3\'
Because the backslash is itself the escape character,
paste0(test_vector, collapse = "\', \'")
generates the following:
"test1', 'test2', 'test3"

How about
(x <- paste0("\\'", test_vector, "\\'", collapse = ", "))
# [1] "\\'test1\\', \\'test2\\', \\'test3\\'"
We can check the actual string with cat() (the doubled backslash only appears in the printed representation; the string itself contains a single backslash before each quote).
cat(x)
# \'test1\', \'test2\', \'test3\'
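An alternative sketch (assuming the same test_vector) builds the same string with sprintf(), quoting each element before collapsing:
# Wrap each element in \' ... \', then join with ", "
cat(paste(sprintf("\\'%s\\'", test_vector), collapse = ", "))
# \'test1\', \'test2\', \'test3\'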

Related

Get rid of extra sep in the paste function in R

I am trying to get rid of the extra sep in the paste function in R.
It looks easy but I cannot find a non-hacky way to fix it. Assume l1-l3 are lists:
l1 = list(a=1)
l2 = list(b=2)
l3 = list(c=3)
l4 = list(l1, l2 = l2, l3 = l3)
Note that the first element of l4 is not named. Now I want to add a constant to the names like below:
names(l4) = paste('Name', names(l4), sep = '.')
Here is the output:
names(l4)
[1] "Name." "Name.l2" "Name.l3"
How can I get rid of the . in the first output (Name.)?
We can use trimws (from R 3.6.0 onward, the whitespace argument lets us specify a custom character to trim):
trimws(paste('Name',names(l4),sep = '.'), whitespace = "\\.")
#[1] "Name" "Name.l2" "Name.l3"
Or with sub to match the . at the end ($) of the string and replace it with blank (""). Since . is a metacharacter that matches any character, we escape it with \\ to get the literal dot:
sub("\\.$", "", paste('Name',names(l4),sep = '.'))
Alternatively, we can avoid adding the trailing . in the first place by pasting the prefix only when the name is non-empty:
ifelse(nzchar(names(l4)), paste("Name", names(l4), sep="."), "Name")
#[1] "Name" "Name.l2" "Name.l3"

R - Construct a string with double quotations

I basically need the outcome (a string) to contain double quotation marks, hence the need for the escape character. Preferably solved with base R, without extra packages.
I have tried sQuote, shQuote and noquote. They just manipulate the quotation marks, not the escape character.
My list:
power <- "test"
myList <- list("power" = power)
I subset the content using:
myList
myList$power
Expected outcome (a string with following content):
" \"power\": \"test\" "
Using package glue:
library(glue)
glue(' "{names(myList)}": "{myList}" ')
"power": "test"
Another option using shQuote
paste(shQuote(names(myList), type = "cmd"),
      shQuote(unlist(myList), type = "cmd"),
      sep = ": ")
# [1] "\"power\": \"test\""
I'm not sure I understand what you expect. Is this what you want?
myList <- list("power" = "test")
stringr::str_remove_all(
  as.character(jsonlite::toJSON(myList, auto_unbox = TRUE)),
  "[\\{|\\}]")
# [1] "\"power\":\"test\""
If you want some spaces:
x <- stringr::str_remove_all(
  as.character(jsonlite::toJSON(myList, auto_unbox = TRUE)),
  "[\\{|\\}]")
paste0(" ", x, " ")

Replace multiple strings consisting of different numbers of characters with one gsubfn()

Here, in Replace multiple strings in one gsub() or chartr() statement in R?, it is explained how to replace multiple one-character strings in a single statement with gsubfn(). E.g.:
x <- "doremi g-k"
gsubfn(".", list("-" = "_", " " = ""), x)
# "doremig_k"
I would however like to replace the string 'doremi' in the example with ''. This does not work:
x <- "doremi g-k"
gsubfn(".", list("-" = "_", "doremi" = ""), x)
# "doremi g_k"
I guess this is because the string 'doremi' contains multiple characters while I am using the metacharacter . in gsubfn. I have no idea what to replace it with - I must confess I sometimes find the use of metacharacters a bit difficult to understand. So, is there a way for me to replace '-' and 'doremi' at once?
You might be able to just use base R sub here:
x <- "doremi g-k"
result <- sub("doremi\\s+([^-]+)-([^-]+)", "\\1_\\2", x)
result
[1] "g_k"
Does this work for you?
gsubfn::gsubfn(pattern = "doremi|-", list("-" = "_", "doremi" = ""), x)
[1] " g_k"
The key is the search pattern "doremi|-", which says to match either "doremi" or "-"; "|" is the regex "or" operator.
Just a more generic version of @RLave's solution -
toreplace <- list("-" = "_", "doremi" = "")
gsubfn(paste(names(toreplace),collapse="|"), toreplace, x)
[1] " g_k"

String replacement using sub function

I am attempting to extract the names of NBA players from a column in a database. However, the format of the names in the names column is the following:
"LeBron James\\jamesle01"
I used the following regex expression inside a sub function to attempt to keep only the name portion:
sub("([A-Z]\\w+\\s*-*'*[a-z]*\\s*\\.*|[A-Z]\\.\\s*)\\*\\*[a-z]*\\d*\\d*", replacement = "\\1", x = nba_salaries$Names)
The expression is meant to take into account for unusual names that contain more than just alphanumeric characters (e.g. Michael Kidd-Gilchrist, De'Andre Jordan, Luc Mbah a Moute, etc.)
However, when I run the following,
head(nba_salaries$Names)
The names end up being in the same format.
I have used regexr.com to ensure that the regex expression captures the strings properly.
How about this: you can split the text on the "\\" string and then take only the first element:
text <- c( "LeBron James\\jamesle01", "Michael Jordan\\jamesle01" )
sapply( strsplit( text, "\\\\" ), "[", 1 )
Which gives
[1] "LeBron James" "Michael Jordan"
To explain: "[" is a function*, which is being called within sapply. So we pass the result of strsplit as the X in sapply, and apply the [ function to it with the argument 1 to take the 1st element. Here's another way to put it:
text <- strsplit( text, "\\\\" )
This will output a list, with each list element containing a vector, where the first element is the text before the "\\" string, and the second element contains any text after it. Then we use the "[" function*, passing the parameter 1, to take the first element of each of those vectors:
text <- sapply( X = text, FUN = "[", 1 )
Edit to add: I personally like using the magrittr pipe for things like this, just to make it a little more readable:
library( magrittr )
text <- strsplit( x = text, split = "\\\\" ) %>%
  sapply( FUN = "[", 1 )
the "[" function is the function called when you subset with []. eg: vector[1:3] or in this case vector[1] (thanks #MathewLundberg for the suggestion here)

Avoid spaces in column names being replaced with a period (".") when using read.csv()

I am using R to do some data pre-processing, and here is the problem I am faced with: I read the data in using read.csv(filename, header=TRUE), and the spaces in the variable names became ".". For example, a variable named Full Code became Full.Code in the generated data frame. After the processing, I use write.xlsx(filename) to export the results, but the variable names have been changed. How can I address this problem?
Besides, in the output .xlsx file the first column becomes an index (i.e., 1 to N), which is not what I am expecting.
If you set check.names=FALSE in read.csv when you read the data in, then the names will not be changed and you will not need to edit them before writing the data back out. This of course means that you would need to quote the column names (with backticks in some cases) or refer to the columns by location rather than name while editing.
To get spaces back in the names, do this (right before you export - R does let you have spaces in variable names, but it's a pain):
# A simple regular expression to replace dots with spaces
# This might have unintended consequences, so be sure to check the results
names(yourdata) <- gsub(x = names(yourdata),
                        pattern = "\\.",
                        replacement = " ")
To drop the first-column index, just add row.names = FALSE to your write.xlsx(). That's a common argument for functions that write out data in tabular format (write.csv() has it, too).
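A minimal round-trip sketch of both suggestions, assuming hypothetical file names and the write.xlsx() from the xlsx package (which accepts row.names):
library(xlsx)
dat <- read.csv("input.csv", header = TRUE, check.names = FALSE)  # keeps "Full Code" as a name
# ... processing on dat ...
write.xlsx(dat, "output.xlsx", row.names = FALSE)                 # no index column in the output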
Here's a function (sorry, I know it could be refactored) that makes nice column names even if there are multiple consecutive dots and trailing dots:
makeColNamesUserFriendly <- function(ds) {
  # FIXME: Repetitive.
  # Convert any number of consecutive dots to a single space.
  names(ds) <- gsub(x = names(ds),
                    pattern = "(\\.)+",
                    replacement = " ")
  # Drop the trailing spaces.
  names(ds) <- gsub(x = names(ds),
                    pattern = "( )+$",
                    replacement = "")
  ds
}
Example usage:
ds <- makeColNamesUserFriendly(ds)
Just to add to the answers already provided, here is another way of replacing the "." (or any other kind of punctuation) in column names, using a regex with the stringr package:
require("stringr")
colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")
For example try:
data <- data.frame(variable.x = 1:10, variable.y = 21:30, variable.z = "const")
colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")
and
colnames(data)
will give you
[1] "variable x" "variable y" "variable z"
