I have a string x, shown below. I want to remove the leading c(" and the trailing \nLOC so that I am left with only Abc,xyz.
x<-"c(\"Abc, xyz\\nLOC"
This is what I tried. It works, but is there a shorter way of doing it?
library(stringr)
x <- str_replace_all(x, "[^[:alnum:]]", " ")
x <- str_replace_all(x, "c ", "")
x <- str_replace_all(x, "nLOC", "")
With only a single example, it's hard to know what to generalize... You could do it all in one big pattern,
str_replace_all(x, "[^[:alnum:],]|^c|nLOC", "")
[1] "Abc,xyz"
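For readers without stringr, the same single-pattern idea can be sketched in base R with gsub(). This is only a sketch that assumes the input always has the shape shown in the question; the name cleaned is introduced here for illustration.

```r
# Input string from the question (contains a literal backslash followed by n)
x <- "c(\"Abc, xyz\\nLOC"

# One alternation with three branches:
#   [^[:alnum:],]  any character that is neither alphanumeric nor a comma
#   ^c             a "c" at the very start of the string
#   nLOC           the literal text left over from "\nLOC"
cleaned <- gsub("[^[:alnum:],]|^c|nLOC", "", x)
print(cleaned)
```

Because ^c is anchored to the start of the string, the "c" inside "Abc" is left alone.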
Related
Does anybody know how I can replace "\" in R?
Other answers have posted something like:
l <- "1120190\neconomic"
gsub("\\", "", l, fixed=TRUE)
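But that didn't work in my case.
The \n in your string is a single newline character, not a backslash followed by an n, which is why searching for "\\" finds nothing to replace. If you want to replace the newline with a space, you can use the following: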
l <- "1120190\neconomic"
cat(l)
gsub("\n", " ", l, fixed=TRUE)
Note that the output would be:
1120190
economic
[1] "1120190 economic"
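To make the distinction concrete, here is a small sketch that compares a string holding a real newline with one holding a literal backslash followed by n. The variable names (l2, has_backslash, fixed_newline, fixed_literal) are made up for this illustration.

```r
# Case 1: "\n" in source code is ONE character, a newline
l <- "1120190\neconomic"
n1 <- nchar(l)                                  # 7 digits + 1 newline + 8 letters
has_backslash <- grepl("\\", l, fixed = TRUE)   # is there a backslash at all?
fixed_newline <- gsub("\n", " ", l, fixed = TRUE)

# Case 2: "\\n" in source code is TWO characters, a backslash and an n
l2 <- "1120190\\neconomic"
n2 <- nchar(l2)
fixed_literal <- gsub("\\n", " ", l2, fixed = TRUE)

print(c(n1, n2))
print(has_backslash)
print(fixed_newline)
print(fixed_literal)
```

Checking nchar() or using cat() on the data first tells you which of the two cases you are actually dealing with.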
Is there any effective way to remove punctuation in text but keeping hyphenated expressions, such as "accident-prone"?
I used the following function to clean my text
clean.text = function(x)
{
# remove rt
x = gsub("rt ", "", x)
# remove at
x = gsub("@\\w+", "", x)
x = gsub("[[:punct:]]", "", x)
x = gsub("[[:digit:]]", "", x)
# remove http
x = gsub("http\\w+", "", x)
x = gsub("[ |\t]{2,}", "", x)
x = gsub("^ ", "", x)
x = gsub(" $", "", x)
x = str_replace_all(x, "[^[:alnum:][:space:]'-]", " ")
return(x)
}
and applied it to a hyphenated expression, which returned
my_text <- "accident-prone"
new_text <- clean.text(my_text)
new_text
[1] "accidentprone"
while my desired output is
"accident-prone"
I consulted this thread, but it did not work for my situation. There must be some regex detail that I haven't figured out. I would really appreciate it if someone could enlighten me on this.
Putting my two cents in, you could use (*SKIP)(*FAIL) with perl = TRUE and remove any non-word characters:
data <- c("my-test of #$%^&*", "accident-prone")
(gsub("(?<![^\\w])[- ](?=\\w)(*SKIP)(*FAIL)|\\W+", "", data, perl = TRUE))
Resulting in
[1] "my-test of" "accident-prone"
See a demo on regex101.com.
Here the idea is to match what you want to keep
(?<![^\\w])[- ](?=\\w)
# a whitespace or a dash between two word characters
# or at the very beginning of the string
let these fail with (*SKIP)(*FAIL) and put what you want to be removed on the right side of the alternation, in this case
\W+
effectively removing any non-word-characters not between word characters.
You'd need to provide more examples for testing though.
The [:punct:] character class includes the dash, so gsub("[[:punct:]]", "", x) removes it. You could write out an alternate character class that omits the dash. You do need to pay special attention to the placement of the square brackets and to escape the double quote and the backslash:
(test <- gsub("[]!\"#$%&'()*+,./:;<=>?@[\\^_`{|}~]", "", "my-test of #$%^&*") )
[1] "my-test of "
The ?regex (help page) advises against using ranges. I investigated whether there might be any simplification using my local ASCII sequence of punctuation, but it quickly became obvious that was not the way to go for other reasons. There were 5 separate ranges, and the "]" was in the middle of one of them so there would have been 7 ranges to handle in addition to the "]" which needs to come first.
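Instead of listing every punctuation character, a shorter variant is to negate the set of characters you want to keep. This is a different technique from the two answers above and only a sketch: it assumes that letters, digits, whitespace and the hyphen are the only things worth keeping, and the third test string is made up for illustration.

```r
data <- c("my-test of #$%^&*", "accident-prone", "it's a well-known fact!")

# Remove everything that is NOT alphanumeric, whitespace, or a hyphen.
# The hyphen is placed last inside the brackets so it is taken literally.
kept <- gsub("[^[:alnum:][:space:]-]", "", data)

# Trim whitespace left behind at the ends of each string
kept <- trimws(kept)
print(kept)
```

Note that this also keeps hyphens that are not between two word characters, so it is less strict than the (*SKIP)(*FAIL) pattern.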
I want to replace a white space with ONE backslash and a whitespace like this:
"foo bar" --> "foo\ bar"
I found how to replace with multiple backslashes but wasn't able to adapt it to a single backslash.
I tried this so far:
x <- "foo bar"
gsub(" ", "\\ ", x)
# [1] "foo bar"
gsub(" ", "\\\ ", x)
# [1] "foo bar"
gsub(" ", "\\\\ ", x)
# [1] "foo\\ bar"
However, none of these outcomes satisfies my needs. I need the replacement to dynamically create file paths which contain folders with names like
/some/path/foo bar/foobar.txt.
To use them in shell commands via system(), white spaces have to be escaped with a \, giving
/some/path/foo\ bar/foobar.txt.
Do you know how to solve this one?
Your problem is a confusion between the content of a string and its representation. When you print out a string in the ordinary way in R you will never see a single backslash (unless it's denoting a special character, e.g. print("y\n")). If you use cat() instead, you'll see only a single backslash.
x <- "foo bar"
y <- gsub(" ", "\\\\ ", x)
print(y)
## [1] "foo\\ bar"
cat(y,"\n") ## string followed by a newline
## foo\ bar
There are 8 characters in the string; 6 letters, one space, and the backslash.
nchar(y) ## 8
For comparison, consider \n (newline character).
z <- gsub(" ", "\n ", x)
print(z)
## [1] "foo\n bar"
cat(z,"\n")
## foo
## bar
nchar(z) ## 8
If you're constructing file paths, it might be easier to use forward slashes instead - forward slashes work as file separators in R on all operating systems (even Windows). Or check out file.path(). (Without knowing exactly what you're trying to do, I can't say more.)
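If the goal is only to pass a path with spaces to system(), one option that avoids manual backslash escaping is to build the path with file.path() and quote it with base R's shQuote(). This is a sketch under the assumption that the command is run by a POSIX-style shell; the ls -l command and the variable names are placeholders, not part of the question.

```r
# Build the path from its components (folder name contains a space)
dir_name <- "foo bar"
path <- file.path("/some/path", dir_name, "foobar.txt")

# Quote the whole path for a POSIX shell instead of escaping each space.
# type = "sh" is set explicitly so the result is the same on every platform.
quoted <- shQuote(path, type = "sh")

# Hypothetical command; replace "ls -l" with whatever you need to run
cmd <- paste("ls -l", quoted)
cat(cmd, "\n")

# system(cmd)  # not run here, since the path is only an example
```

Single-quoting the whole argument makes the shell treat the space as part of the file name, which achieves the same effect as escaping it with a backslash.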
To replace a space with one backslash and a space, you do not even need a regular expression. Use your first attempt, gsub(" ", "\\ ", x), with fixed=TRUE:
> x <- "foo bar"
> res <- gsub(" ", "\\ ", x, fixed=TRUE)
> cat(res, "\n")
foo\ bar
See an online R demo
The cat function displays the "real", literal backslashes.
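As a quick check that the fixed=TRUE version and the four-backslash regex version produce the same string, here is a short sketch; res_fixed and res_regex are names introduced for this comparison.

```r
x <- "foo bar"

# Literal replacement: "\\ " in source is one backslash plus a space
res_fixed <- gsub(" ", "\\ ", x, fixed = TRUE)

# Regex replacement: the backslash must itself be escaped in the replacement,
# hence four backslashes in the source code
res_regex <- gsub(" ", "\\\\ ", x)

same <- identical(res_fixed, res_regex)
len <- nchar(res_fixed)   # 6 letters + 1 backslash + 1 space

print(same)
print(len)
writeLines(res_fixed)     # shows the string content, like cat()
```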
In Excel (and Excel VBA) it is really helpful to connect text and variables using "&":
a = 5
msgbox "The value is: " & a
will give
"The value is: 5"
How can I do this in R? I know there is a way to use "paste". However I wonder if there isn't any trick to do it as simple as in Excel VBA.
Thanks in advance.
This blog post suggests defining your own concatenation operator, which is similar to what VBA (and Javascript) has, but it retains the power of paste:
"%+%" <- function(...) paste0(...)
"Concatenate hits " %+% "and this."
# [1] "Concatenate hits and this."
I am not a big fan of this solution though because it kind of obscures what paste does under the hood. For instance, is it intuitive to you that this would happen?
"Concatenate this string " %+% "with this vector: " %+% 1:3
# [1] "Concatenate this string with this vector: 1"
# [2] "Concatenate this string with this vector: 2"
# [3] "Concatenate this string with this vector: 3"
In Javascript for instance, this would give you Concatenate this string with this vector: 1,2,3, which is quite different. I cannot speak for Excel, but you should think about whether this solution is not more confusing to you than it is useful.
If you need a Javascript-like solution, you can also try this:
"%+%" <- function(...) {
dots = list(...)
dots = rapply(dots, paste, collapse = ",")
paste(dots, collapse = "")
}
"Concatenate this string " %+% "with this string."
# [1] "Concatenate this string with this string."
"Concatenate this string " %+% "with this vector: " %+% 1:3
# [1] "Concatenate this string with this vector: 1,2,3"
But I haven't tested it extensively, so be on the lookout for unexpected results.
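If you would rather not define an operator at all, plain paste0() covers both behaviours, depending on whether you collapse the vector first. A minimal sketch, with variable names chosen only for illustration:

```r
a <- 5

# Direct equivalent of the VBA line: "The value is: " & a
msg <- paste0("The value is: ", a)

v <- 1:3

# paste0() is vectorised: one output string per element of v
per_element <- paste0("Value: ", v)

# Collapse the vector first to get a single string instead
joined <- paste0("Values: ", paste(v, collapse = ","))

print(msg)
print(per_element)
print(joined)
```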
Another possibility is to use sprintf:
a <- 5
cat(sprintf("The value is %d\n",a))
## The value is 5
The %d denotes integer formatting (%f would give "The value is 5.000000"). The \n denotes a newline at the end of the string.
sprintf() can be more convenient than paste or paste0 when you want to put together a lot of pieces, e.g.
sprintf("The value of a is %f (95%% CI: {%f,%f})",
a_est,a_lwr,a_upr)
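Here is a runnable version of that idea with made-up numbers (the values of a_est, a_lwr and a_upr are hypothetical). Note that a literal percent sign inside a sprintf() format has to be written as %%, and that %.2f limits the output to two decimals.

```r
# Hypothetical estimate and confidence limits, for illustration only
a_est <- 1.5
a_lwr <- 1.2
a_upr <- 1.8

# %% prints a literal percent sign; %.2f prints two decimal places
msg <- sprintf("The value of a is %.2f (95%% CI: {%.2f, %.2f})",
               a_est, a_lwr, a_upr)
print(msg)

# sprintf() is vectorised over its arguments, like paste()
items <- sprintf("Item %d", 1:3)
print(items)
```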
I am trying to replace strings in R in a large number of texts.
Essentially, this reproduces the format of the data from which I try to delete the '\n' parts.
document <- as.list(c("This is \\na try-out", "And it \\nfails"))
I can do this with a loop and gsub but it takes forever. I looked at this post for a solution. So I tried: temp <- apply(document, 2, function(x) gsub("\\n", " ", fixed=TRUE)). I also used lapply, but it also gives an error message. I can't figure this out, help!
Use lapply if you want to return a list. Note that x must be passed to gsub inside the anonymous function:
document <- as.list(c("This is \\na try-out", "And it \\nfails"))
temp <- lapply(document, function(x) gsub("\\n", " ", x, fixed=TRUE))
##[[1]]
##[1] "This is  a try-out"
##
##[[2]]
##[1] "And it  fails"
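Since gsub() is already vectorised over character vectors, a loop or lapply is not strictly needed when every list element is a single string. The sketch below assumes exactly that structure; the second pattern, which also swallows the spaces around the literal \n, is an extra suggestion rather than something asked for in the question.

```r
document <- as.list(c("This is \\na try-out", "And it \\nfails"))

# Flatten the list of single strings into a character vector
docs <- unlist(document)

# One vectorised call replaces the literal backslash-n in every element
cleaned <- gsub("\\n", " ", docs, fixed = TRUE)
print(cleaned)

# Regex variant: also absorb whitespace around the literal \n,
# so that only a single space remains
tidy <- gsub("\\s*\\\\n\\s*", " ", docs)
print(tidy)
```

If a list is required afterwards, as.list(cleaned) converts the result back.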