Replace two dots in a string with gsub - r

I'm trying to use the following code to replace two dots with a single one:
test<-"test..1"
gsub("\\..", ".", test, fixed=TRUE)
and getting:
[1] "test..1"
I tried several combinations of escape strings, including brackets [] with no success.
What am I doing wrong?

If you are going to use fixed = TRUE, use the (non-interpreted) character .:
> gsub("..", ".", test, fixed = TRUE)
Otherwise, within regular expressions (fixed = FALSE), . has a special meaning (any character) so you'll want to prefix it with a backslash to mean "the dot character":
> gsub("\\.\\.", ".", test)
> gsub("\\.{2}", ".", test)
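As a quick check of the two modes described above, the following sketch runs each variant on the same input. The vector name results is only for illustration. The last two entries show why the original attempt returned the input unchanged and what happens when the dots are left unescaped in regex mode.

```r
test <- "test..1"

results <- c(
  # fixed = TRUE: the pattern is literal text, so ".." means two dots
  fixed_literal = gsub("..", ".", test, fixed = TRUE),
  # regex mode: each dot is escaped so it matches a literal dot
  regex_escaped = gsub("\\.\\.", ".", test),
  # regex mode with a quantifier: exactly two literal dots
  regex_counted = gsub("\\.{2}", ".", test),
  # the original attempt: with fixed = TRUE the backslash is literal too,
  # so the pattern looks for backslash-dot-dot and finds nothing
  original_try = gsub("\\..", ".", test, fixed = TRUE),
  # regex mode without escaping: every pair of characters becomes one dot
  regex_unescaped = gsub("..", ".", test)
)
print(results)
```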

Related

R RegEx gsub() Equivalent of "Line Operations>Remove Empty Lines (Containing Blank Characters)" in CSV file

I have a CSV file with several columns: Tweet, date, etc. The spaces in some Tweets are causing blank lines and undesired truncated lines.
What works:
1. Using Notepad++'s function "Line Operations>Remove Empty Lines (Containing Blank Characters)"
2. Search and replace: \r with nothing.
However, I need to do this for a large number of files, and I can't manage to find a regular expression with gsub() in R that will do what the Notepad++ function does.
Note that replacing ^[ \t]*$\r?\n with nothing and then \r with nothing, as suggested here, does work in Notepad++, but it does not work with gsub() in R.
I have tried the following code:
tx <- readLines("tweets.csv")
subbed <-gsub(pattern = "^[ \\t]*$\\r?\\n", replace = "", x = tx)
subbed <-gsub(pattern = "\r", replace = "", x = subbed)
writeLines(subbed, "output.csv")
This is the input:
This is the desired output:
You may use
library(readtext)
tx <- readtext("tweets.csv")
subbed <- gsub("(?m)^\\h*\\R?", "", tx$text, perl=TRUE)
subbed <- gsub("\r", "", subbed, fixed=TRUE)
writeLines(trimws(subbed), "output.csv")
The readtext library reads the file into a single variable, so all line break characters are kept.
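The attempt in the question cannot match because readLines() returns one element per line with the line terminators already stripped, so a pattern that requires \n never finds one. Below is a minimal sketch using a made-up in-memory string (raw) in place of the file contents. It applies the regex from the answer and, for comparison, a base R alternative that splits into lines and drops the blank ones. The sample data and variable names are assumptions for illustration.

```r
# Hypothetical stand-in for the file contents, with CRLF line endings
raw <- paste0("tweet one,2020-01-01\r\n",
              "   \r\n",
              "tweet two,2020-01-02\r\n",
              "\r\n",
              "tweet three,2020-01-03")

# Approach from the answer: work on the whole text as a single string
subbed <- gsub("(?m)^\\h*\\R?", "", raw, perl = TRUE)
subbed <- gsub("\r", "", subbed, fixed = TRUE)

# Base R alternative: split into lines, then keep the non-blank ones
lines <- strsplit(raw, "\r?\n")[[1]]
kept <- lines[!grepl("^[ \t]*$", lines)]

cat(subbed, "\n")
print(kept)
```

Note that the answer's pattern also strips leading horizontal whitespace from non-blank lines, whereas the line-filtering variant leaves kept lines untouched.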

Replace slash with a single backslash in R

This is probably trivial, but I have failed to find any question referring to this exact issue.
My issue does not have to do with coming up with a suitable regex, it has to do with accurately specifying the replacement part.
x = "file_path/file_name.txt" - this is what I have
# "file_path\file_name.txt" - this is what I want
Here is what I tried:
library(stringr)
str_detect(string = x, pattern = "/") # returns TRUE, as expected
#str_replace_all(string = x, pattern = "/", replacement = "\") # fails, R believes I'm escaping the quote in the replacement
str_replace_all(string = x, pattern = "/", replacement = "\\") # this results in "file_pathfile_name.txt", missing the backslash altogether
str_replace_all(string = x, pattern = "/", replacement = "\\\\") # this results in "file_path\\file_name.txt", which is not what I want
Any help would be greatly appreciated.
The solution is to escape the escape character, which means four backslashes in the end.
cat(gsub('/', '\\\\', "file_path/file_name.txt"))
Note the difference between print(), which displays the string with the escape character itself escaped, and cat(), which writes the plain string:
str_replace_all(string = x, pattern = "/", replacement = "\\\\")
cat(str_replace_all(string = x, pattern = "/", replacement = "\\\\"))
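To confirm that the result really contains a single backslash, and that the doubled one is only how print() displays it, a short base R sketch follows. It uses gsub as in the answer above; the variable name y is just for illustration.

```r
x <- "file_path/file_name.txt"

# Four backslashes in source: R's parser halves them to two,
# and the regex replacement syntax halves them again to one
y <- gsub("/", "\\\\", x)

print(y)       # displayed with the backslash escaped
cat(y, "\n")   # written as the plain string

# The slash was swapped for exactly one character
print(nchar(x) == nchar(y))
print(substr(y, 10, 10) == "\\")
```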

Gsub every element after a keyword in R

I'd like to remove all elements of a string after a certain keyword.
Example :
this.is.an.example.string.that.I.have
Desired Output :
This.is.an.example
I've tried using gsub('string', '', list) but that only removes the word string. I've also tried using the gsub('^string', '', list) but that also doesn't seem to work.
Thank you.
The following simple sub may help you here.
sub("\\.string.*","",variable)
Explanation: how sub is used
sub(regex_to_replace_text_in_variable, new_value, variable)
Difference between sub and gsub:
sub: replaces only the first match found in each string.
gsub: performs the same substitution, but on ALL matches found.
From help page of R:
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
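The difference between the two functions is easiest to see on a string with several matches. A small sketch, with made-up variable names:

```r
s <- "a.b.c"

first_only <- sub("\\.", "-", s)    # only the first dot is replaced
all_dots   <- gsub("\\.", "-", s)   # every dot is replaced

# For the question's case a single match is enough, so sub suffices:
# ".string" and everything after it is removed
variable  <- "this.is.an.example.string.that.I.have"
truncated <- sub("\\.string.*", "", variable)

print(c(first_only, all_dots, truncated))
```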
You can try this positive lookbehind regex:
S <- 'this.is.an.example.string.that.I.have'
gsub('(?<=example).*', '', S, perl=TRUE)
# 'this.is.an.example'
You can use strsplit. Here you split your string after a keyword and retain the first part of the string.
x <- "this.is.an.example.string.that.I.have"
strsplit(x, '(?<=example)', perl=TRUE)[[1]][1]
[1] "this.is.an.example"
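If the keyword may contain regex metacharacters, one option is to avoid regular expressions altogether. The sketch below defines a hypothetical helper, truncate_after (not part of any package), which locates the keyword with a fixed search and keeps everything up to and including it. Strings without the keyword are returned unchanged.

```r
# Hypothetical helper: keep the text up to and including the first
# occurrence of `keyword`; leave strings without the keyword as they are
truncate_after <- function(x, keyword) {
  # as.integer() drops the attributes regexpr attaches to its result
  pos <- as.integer(regexpr(keyword, x, fixed = TRUE))
  ifelse(pos == -1L, x, substr(x, 1, pos + nchar(keyword) - 1))
}

S <- c("this.is.an.example.string.that.I.have", "no.keyword.here")
print(truncate_after(S, "example"))
```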

Set number of arguments programmatically

I have the following string:
test <- "C:\\Users\\stefanj\\Documents\\Automation_Desk\\script.R"
I am separating the string on the backslash characters with the following code:
pdf_path_long <- unlist(strsplit(test, "\\\\",
fixed = FALSE, perl = FALSE, useBytes = FALSE))
What I want to do is:
pdf_path_short <- file.path(pdf_path_long[1], pdf_path_long[2], ...)
Problem is:
I know how to count the elements in pdf_path_long with length(pdf_path_long), but I don't know how to pass them to file.path, as the number of elements will vary based on the length of the path.
You can use gsub directly on test (no need for a strsplit call) to change the separators. With fixed=TRUE the pattern is matched literally, so the backslash only needs R's string-level escaping ("\\") rather than the regex form ("\\\\"). You will get the same output as with file.path:
pdf_path_short <- gsub("\\", "/", test, fixed=TRUE)
pdf_path_short
# "C:/Users/stefanj/Documents/Automation_Desk/script.R"
Of course, you can change the replacement part with whatever separator you need.
Note: you can also check normalizePath function:
normalizePath(test, "/", mustWork=FALSE)
#[1] "C:/Users/stefanj/Documents/Automation_Desk/script.R"
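For the question as literally asked, passing a variable number of elements to file.path, base R's do.call can call a function with its arguments taken from a list. A minimal sketch, reusing the question's test string; the names parts and via_do_call are made up for illustration:

```r
test <- "C:\\Users\\stefanj\\Documents\\Automation_Desk\\script.R"

# Split on the literal backslash; fixed = TRUE avoids regex escaping
parts <- unlist(strsplit(test, "\\", fixed = TRUE))

# do.call passes each list element as a separate argument,
# so this works for any number of path components
via_do_call <- do.call(file.path, as.list(parts))

# Same result as the gsub approach above
via_gsub <- gsub("\\", "/", test, fixed = TRUE)

print(via_do_call)
print(identical(via_do_call, via_gsub))
```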

