Remove specific string - r

I would like to remove this character
c("
I use this
df <- gsub("c/(/"", " ", df$text)
But I receive this error:
Error: unexpected string constant in "inliwc <- gsub("c/(/"", ""
What can I do?

You need to escape the round bracket as well as the quote, which can be done as follows:
temp <- 'this is ac(" string'
gsub("c\\(\"", "", temp)
#OR use single quotes in gsub
#gsub('c\\("', "", temp)
#[1] "this is a string"
A faster way is to use fixed = TRUE, which matches the pattern as a literal string, so nothing needs to be escaped:
gsub('c("', "", temp, fixed = TRUE)
You can also use sub if only the first occurrence of the pattern needs to be replaced.
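A small base R sketch of the two points above. It uses the `temp` string from the question plus a made-up string `two` that contains the pattern twice. `fixed = TRUE` matches literally, and `sub()` stops after the first match while `gsub()` replaces all of them:

```r
temp <- 'this is ac(" string'
# fixed = TRUE treats 'c("' as plain text, so neither "(" nor the quote needs escaping
gsub('c("', "", temp, fixed = TRUE)
# [1] "this is a string"

# Made-up string with two occurrences of the pattern
two <- 'ac(" and bc(" here'
sub('c("', "", two, fixed = TRUE)   # only the first occurrence is removed
# [1] "a and bc(\" here"
gsub('c("', "", two, fixed = TRUE)  # every occurrence is removed
# [1] "a and b here"
```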

The opening round bracket is a regex metacharacter; in R, its literal use needs to be escaped using \\:
text <- "c("
text <- gsub("c\\(", "", text)
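If the text to remove comes from a variable rather than being typed by hand, escaping each metacharacter manually gets tedious. A minimal sketch, assuming a hypothetical helper `escape_regex()` (not a base R function) that puts a backslash in front of every regex metacharacter so the result can be passed to the default engine:

```r
# Hypothetical helper (not part of base R): backslash-escape every regex metacharacter
escape_regex <- function(x) {
  # PCRE class of metacharacters: . | ( ) ^ { } + $ * ? [ ] and the backslash itself
  gsub("([.|()\\^{}+$*?\\[\\]\\\\])", "\\\\\\1", x, perl = TRUE)
}

temp <- 'this is ac(" string'
escape_regex('c("')
# [1] "c\\(\""
gsub(escape_regex('c("'), "", temp)
# [1] "this is a string"
```

For a single fixed string, `fixed = TRUE` is simpler; a helper like this is mainly useful when the literal part is combined with real regex syntax such as anchors.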

We can also use sub
sub('c[()]"', '', temp)
#[1] "this is a string"
data
temp <- 'this is ac(" string'

Related

Delete string parts within delimiter

I have a string as"dfgdf" sa"2323":
a <- "as\"dfgdf\" sa\"2323\""
The delimiter (the same for the start and the end) here is ". What I want is a string where everything within the delimiters is deleted, but not the delimiters themselves. So the end result should look like as"" sa""
You could match " and then discard what was matched so far using \K.
Then use a negated character class that matches any character except " or whitespace, and use a lookahead to assert a " to the right.
Use perl=TRUE to enable Perl-compatible regular expressions (needed for \K and the lookahead).
a <- "as\"dfgdf\" sa\"2323\""
gsub('"\\K[^"\\s]+(?=")', "", a, perl=TRUE)
Output
[1] "as\"\" sa\"\""
Here is another base R option using paste0 + strsplit. Note that it appends "" after every piece, so it assumes the string ends with a quoted segment:
s <- paste0(paste0(unlist(strsplit(a, '"\\w+"')), '""'), collapse = "")
which gives
> s
[1] "as\"\" sa\"\""
> cat(s)
as"" sa""
Here is one option with regex lookarounds: match a word (\\w+) that is preceded by a double quote and followed by one, and replace it with blank ("")
cat(gsub('(?<=")\\w+(?=")', "", a, perl = TRUE), "\n")
#as"" sa""
Or without regex lookaround
cat(gsub('"\\w+"', '""', a), "\n")
#as"" sa""
I also found a way with the stringr library:
library(stringr)
a <- "as\"dfgdf\" sa\"2323\""
result <- str_replace_all(a, "\".*?\"", "\"\"")
cat(result)
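The answers above behave differently when a quoted segment contains a space or punctuation. A quick base R check with a made-up string `a2`: `[^"\\s]+` and `\\w+` will not cross the space, while a negated class `[^"]*` (like the non-greedy `.*?` in the stringr answer) empties every quoted segment:

```r
a  <- "as\"dfgdf\" sa\"2323\""
a2 <- 'x"a b" y"c"'  # made-up: the first quoted segment contains a space

# Neither of these can match "a b", so that segment is left untouched
gsub('"\\K[^"\\s]+(?=")', "", a2, perl = TRUE)
# [1] "x\"a b\" y\"\""
gsub('"\\w+"', '""', a2)
# [1] "x\"a b\" y\"\""

# [^"]* accepts anything except a quote, so every quoted segment is emptied
gsub('"[^"]*"', '""', a2)
# [1] "x\"\" y\"\""
gsub('"[^"]*"', '""', a)
# [1] "as\"\" sa\"\""
```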

How to throw out spaces and underscores only from the beginning of the string?

I want to ignore the spaces and underscores in the beginning of a string in R.
I can write something like
txt <- gsub("^\\s+", "", txt)
txt <- gsub("^\\_+", "", txt)
But I think there could be a more elegant solution. This is what I tried:
txt <- " 9PM 8-Oct-2014_0.335kwh "
txt <- gsub("^[\\s+|\\_+]", "", txt)
txt
The output should be "9PM 8-Oct-2014_0.335kwh ". But my code gives " 9PM 8-Oct-2014_0.335kwh ".
How can I fix it?
You could put only \s and the underscore in a character class and use a quantifier to repeat it one or more times.
^[\s_]+
For example:
txt <- gsub("^[\\s_]+", "", txt, perl=TRUE)
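Or, as @Tim Biegeleisen points out in the comments, since only the first occurrence needs to be replaced you could use sub instead (keep the ^ anchor so that only a run at the start of the string is removed):
txt <- sub("^[\\s_]+", "", txt, perl=TRUE)
Or using a POSIX character class, which also works without perl=TRUE:
txt <- sub("^[[:space:]_]+", "", txt)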
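A base R sketch built on the POSIX class above; `strip_edges()` is a hypothetical helper name. It also shows why the `^` anchor matters with `sub()`: without it, the first run of spaces or underscores is removed wherever it occurs, not only at the start.

```r
txt <- " 9PM 8-Oct-2014_0.335kwh "

# Hypothetical helper: remove spaces/underscores at the start, and optionally at the end
strip_edges <- function(x, both = FALSE) {
  pattern <- if (both) "^[[:space:]_]+|[[:space:]_]+$" else "^[[:space:]_]+"
  gsub(pattern, "", x)
}
strip_edges(txt)
# [1] "9PM 8-Oct-2014_0.335kwh "
strip_edges(txt, both = TRUE)  # the inner "_" is kept because it is not at either edge
# [1] "9PM 8-Oct-2014_0.335kwh"

# Without ^, sub() removes the first run wherever it occurs
sub("[[:space:]_]+", "", "9PM_x")
# [1] "9PMx"
sub("^[[:space:]_]+", "", "9PM_x")
# [1] "9PM_x"
```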
The stringr package offers some task-specific functions with helpful names. In your original question you say you would like to remove whitespace and underscores from the start of your string, but in a comment you imply that you also wish to remove the same characters from the end of the string. To that end, I'll include a few different options.
Given string s <- " \t_blah_ ", which contains whitespace (spaces and tabs) and underscores:
library(stringr)
# Remove whitespace and underscores at the start.
str_remove(s, "^[\\s_]+")
# [1] "blah_ "
# Remove whitespace and underscores at the start and end (anchored, so any in the middle are kept).
str_remove_all(s, "^[\\s_]+|[\\s_]+$")
# [1] "blah"
In case you're looking to remove whitespace only – there are, after all, no underscores at the start or end of your example string – there are a couple of stringr functions that will help you keep things simple:
# `str_trim` trims whitespace (anything matched by \s, such as spaces and tabs) from either or both sides.
str_trim(s, side = "left")
# [1] "_blah_ "
str_trim(s, side = "right")
# [1] " \t_blah_"
str_trim(s, side = "both") # This is the default.
# [1] "_blah_"
# `str_squish` trims both ends and reduces repeated whitespace inside the string to a single space.
s <- " \t_blah \t blah_ "
str_squish(s)
# [1] "_blah blah_"
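If you would rather stay in base R, `trimws()` covers the `str_trim()` cases, and a one-line helper approximates `str_squish()`. `squish()` is a hypothetical name, and it uses `[[:space:]]` rather than stringr's Unicode-aware whitespace definition, so treat it as a sketch:

```r
s <- " \t_blah_ "
# Base R trimws(); by default it trims spaces, tabs, CR and LF
trimws(s, which = "left")
# [1] "_blah_ "
trimws(s, which = "right")
# [1] " \t_blah_"
trimws(s)
# [1] "_blah_"

# Hypothetical helper approximating str_squish(): trim both ends, then collapse inner runs
squish <- function(x) gsub("[[:space:]]+", " ", trimws(x))
squish("  a \t b  ")
# [1] "a b"
```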
The same pattern [\\s_]+ will also work in base R's sub or gsub, with minor modifications such as adding perl=TRUE, if that's your jam (see Thefourthbird's answer).
You can use stringr as:
txt <- " 9PM 8-Oct-2014_0.335kwh "
library(stringr)
str_trim(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
Or the trimws in Base R
trimws(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
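Note that `str_trim()` and `trimws()` with default settings remove only whitespace, so they would leave leading underscores in place. In R 3.6.0 and later, `trimws()` also accepts a `whitespace` argument (a regex matching one character to trim), which can include the underscore. A short sketch; the second input is made up:

```r
txt <- " 9PM 8-Oct-2014_0.335kwh "
# Trim spaces and underscores from the left only; the inner "_" is untouched
trimws(txt, which = "left", whitespace = "[[:space:]_]")
# [1] "9PM 8-Oct-2014_0.335kwh "
# Default which = "both"
trimws("__ x __", whitespace = "[[:space:]_]")
# [1] "x"
```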

Need to trim the last character of a string only if it is a blank or "."

I have a large vector of words read from an Excel file. Some of those records end with a space or a "." (period). Only in those cases do I need to trim those characters.
Example:
"depresion" "tristeza."
"nostalgia" "preocupacion."
"enojo." "soledad "
"frustracion" "desesperacion "
"angustia." "desconocidos."
Notice some words end normal without "." or " ".
Is there a way to do that?
I have this
substr(conceptos, 1, nchar(conceptos)-1))
to test for the last character (conceptos is this long vector)
Thanks for any advice,
We can use sub to match zero or more . or spaces at the end of the string ($) and replace them with blank ("")
sub("(\\.| )*$", "", v1)
#[1] "depresion" "tristeza" "nostalgia" "preocupacion" "enojo"
#[6] "soledad" "frustracion" "desesperacion"
#[9] "angustia" "desconocidos"
data
v1 <- c("depresion","tristeza.","nostalgia","preocupacion.",
"enojo.","soledad ","frustracion","desesperacion ",
"angustia.","desconocidos.")
Regular expressions are good for this:
library(stringr)
x = c("depresion", "tristeza.", "nostalgia", "preocupacion.",
"enojo.", "soledad ", "frustracion", "desesperacion ",
"angustia.", "desconocidos.")
x_replaced = str_replace(x, "(\\.|\\s)$", "")
The pattern (\\.|\\s)$ will match a . or any whitespace that occurs right at the end of the string.
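The two patterns above differ when a word ends in more than one of these characters: `(\\.| )*$` removes every trailing dot and space, while `(\\.|\\s)$` removes only the final character. A base R sketch with a made-up vector `v` to show the difference:

```r
# Made-up vector: "angustia. " ends with BOTH a dot and a space
v <- c("tristeza.", "soledad ", "angustia. ", "nostalgia")
sub("(\\.| )*$", "", v)   # strips all trailing dots/spaces
# "tristeza" "soledad" "angustia" "nostalgia"
sub("(\\.|\\s)$", "", v)  # strips only the very last character
# "tristeza" "soledad" "angustia." "nostalgia"
```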
Try this, a vectorised R version of the idea: trim trailing spaces, then drop a final "." if there is one:
x <- trimws(conceptos, which = "right")
ifelse(endsWith(x, "."), substr(x, 1, nchar(x) - 1), x)

Removing punctuation between two words

I have a data frame (df) and I would like to remove punctuation.
However, there is an issue with a dot between two words and at the end of a word, like this:
test.
test1.test2
I use this to remove the punctuation:
library(tm)
removePunctuation(df)
and the result I take is this:
test
test1test2
but I would like to take this as result:
test
test1 test2
How is it possible to have a space between two words in the removing process?
You can use chartr for single character substitution:
chartr(".", " ", c("test1.test2"))
# [1] "test1 test2"
@akrun suggested trimws to remove the trailing space left behind by "test.":
str <- c("test.", "test1.test2")
trimws(chartr(".", " ", str))
# [1] "test" "test1 test2"
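`chartr()` translates character by character (the i-th character of `old` becomes the i-th character of `new`), so several punctuation marks can be mapped to spaces in one call. A small sketch with a made-up second string:

```r
# "." -> " " and "_" -> " ", applied to every element
chartr("._", "  ", c("test1.test2", "a_b.c"))
# "test1 test2" "a b c"
```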
We can use gsub to replace the . with a white space and remove the trailing/leading spaces (if any) with trimws.
trimws(gsub('[.]', ' ', str1))
#[1] "test" "test1 test2"
NOTE: In regex, . by itself matches any character. So we should either put it inside square brackets ([.]), escape it (\\.), or use the option fixed=TRUE:
trimws(gsub('.', ' ', str1, fixed=TRUE))
data
str1 <- c("test.", "test1.test2")
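To illustrate the note above: an unescaped `.` matches every character, so everything becomes a space. The last line is a sketch of my own, not from the answers, that is closer in spirit to what the question wanted from `removePunctuation()`: each run of punctuation becomes a single space.

```r
str1 <- c("test.", "test1.test2")
# Unescaped ".": every character is replaced, so trimws() leaves empty strings
trimws(gsub(".", " ", str1))
# "" ""
trimws(gsub(".", " ", str1, fixed = TRUE))
# "test" "test1 test2"
# Any run of punctuation -> one space, then trim the ends
trimws(gsub("[[:punct:]]+", " ", str1))
# "test" "test1 test2"
```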
You can also use strsplit:
a <- "test."
b <- "test1.test2"
do.call(paste, as.list(strsplit(a, "\\.")[[1]]))
[1] "test"
do.call(paste, as.list(strsplit(b, "\\.")[[1]]))
[1] "test1 test2"
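The `do.call()` wrapper is not strictly needed, because `paste()` with `collapse` joins the pieces directly. A sketch of that variant, plus a vectorised version using `vapply()`:

```r
b <- "test1.test2"
# strsplit() returns the pieces; paste(collapse = " ") joins them with spaces
paste(strsplit(b, ".", fixed = TRUE)[[1]], collapse = " ")
# "test1 test2"

# Vectorised over several strings (strsplit drops the empty piece after a trailing ".")
vapply(strsplit(c("test.", "test1.test2"), ".", fixed = TRUE),
       paste, character(1), collapse = " ")
# "test" "test1 test2"
```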

remove all line breaks (enter symbols) from the string using R

How to remove all line breaks (enter symbols) from the string?
my_string <- "foo\nbar\rbaz\r\nquux"
I've tried gsub("\n", "", my_string), but it doesn't work, because new line and line break aren't equal.
You need to strip \r and \n to remove carriage returns and new lines.
x <- "foo\nbar\rbaz\r\nquux"
gsub("[\r\n]", "", x)
## [1] "foobarbazquux"
Or
library(stringr)
str_replace_all(x, "[\r\n]" , "")
## [1] "foobarbazquux"
I just wanted to note that if you want to insert spaces where the newlines were, the best option is to use the following:
gsub("\r?\n|\r", " ", x)
which will insert only one space regardless whether the text contains \r\n, \n or \r.
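To see why the alternation matters when inserting spaces, compare it with the character-class pattern from the first answer: a Windows-style `\r\n` counts as two separate matches for `[\r\n]`, so it produces two spaces.

```r
x <- "foo\nbar\rbaz\r\nquux"
# Each of \r and \n is replaced separately, so "\r\n" becomes two spaces
gsub("[\r\n]", " ", x)
# "foo bar baz  quux"
# "\r\n" is consumed as one line break by the first alternative
gsub("\r?\n|\r", " ", x)
# "foo bar baz quux"
```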
I have had success with the following, although it removes only \n, so any \r characters are left in place:
gsub("\\\n", "", x)
With stringr::str_remove_all
library(stringr)
str_remove_all(my_string, "[\r\n]")
# [1] "foobarbazquux"
