This is probably trivial, but I have failed to find any question referring to this exact issue.
My issue does not have to do with coming up with a suitable regex, it has to do with accurately specifying the replacement part.
x = "file_path/file_name.txt" # this is what I have
# "file_path\file_name.txt" - this is what I want
Here is what I tried:
library(stringr)
str_detect(string = x, pattern = "/") # returns TRUE, as expected
#str_replace_all(string = x, pattern = "/", replacement = "\") # fails, R believes I'm escaping the quote in the replacement
str_replace_all(string = x, pattern = "/", replacement = "\\") # this results in "file_pathfile_name.txt", missing the backslash altogether
str_replace_all(string = x, pattern = "/", replacement = "\\\\") # this results in "file_path\\file_name.txt", which is not what I want
Any help would be greatly appreciated.
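The solution is to escape the escape character, which means writing 4 backslashes ('\\\\') in the replacement. R's string parser reduces these to two, and the replacement engine reduces those two to one literal backslash.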
cat(gsub('/', '\\\\', "file_path/file_name.txt"))
Note the difference between the two ways of displaying the result: print() (the default at the console) shows the string with the escape character itself escaped, while cat() writes the plain string.
str_replace_all(string = x, pattern = "/", replacement = "\\\\")
cat(str_replace_all(string = x, pattern = "/", replacement = "\\\\"))
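One way to confirm that the result really contains a single backslash is to count characters, since print() displays two. A minimal sketch in base R (the variable names x and y are only illustrative):

```r
x <- "file_path/file_name.txt"

# Four backslashes in the source -> two in the R string -> one literal
# backslash once the regex replacement has been processed.
y <- gsub("/", "\\\\", x)

# One character was swapped for one character, so the length is unchanged.
nchar(x) == nchar(y)  # TRUE

print(y)      # escaped display: the backslash is shown doubled
cat(y, "\n")  # actual contents: a single backslash
```

If nchar() agrees, the doubled backslash is only an artifact of how print() quotes strings.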
I have a simple R script:
file1 <- read.csv2("D:/Home/file1.csv", strip.white = TRUE, header = FALSE)
file2 <- read.csv2("D:/Home/file2.csv", strip.white = TRUE, header = FALSE)
df <- merge(file1, file2, by.x = c(2), by.y = c(1))
df2 <- data.frame(new_col = paste('"', df$V2, '#', df$V1, '#', df$V2.y, '",', sep = ""))
write.table(df2, append = FALSE, file = outFile, sep = "#", quote = FALSE, row.names = FALSE, col.names = FALSE)
File 1 is like this:
100;folder/path/myfile.mp3
101;folder/path/anotherfile.mp3
102;folder/path/finalfile.mp3
File 2 is like this:
folder\path\myfile;64
folder\path\anotherfile;58
folder\path\finalfile;34
So my script merges file 1 with file 2 based on the path column (second column in file 1 and 1st column in file 2). It does this fine if both files have forward slashes for each row.
The problem is that file 1 has forward slashes and file 2 has backslashes so the merge isn't working.
How do I make it so that the merge will work given that they both use different slashes? In other words, how can I convert all of file2 to use forward slashes prior to the merge? I need the final result to use forward slashes, not backslashes.
I have looked through lots of other questions and answers and replacing backslashes to forward slashes has been asked before but only on strings. I can't find a question asking how to replace every slash in the whole source CSV file. So I don't believe this is a duplicate.
Many thanks.
This should work:
file2$column = gsub(pattern = "\\\\", replacement = "/", x = file2$column)
Replace column in my code with whatever the column name is.
Another regex could be the following.
x <- 'a\\b\\c'
gsub('[\\]', '/', x)
#[1] "a/b/c"
Or, using argument fixed = TRUE,
gsub('\\', '/', x, fixed = TRUE)
#[1] "a/b/c"
Now it's a matter of applying the above to the column(s) of the dataframe.
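As a rough sketch of that last step, the data frame below is built inline from the File 2 sample rows, and the column names V1 and V2 are assumed to be what read.csv2(header = FALSE) would assign. The second variant, which touches every character column, is an assumption about what "the column(s)" might mean in practice.

```r
# Stand-in for file2 as read by read.csv2(..., header = FALSE);
# V1/V2 are the assumed default column names.
file2 <- data.frame(
  V1 = c("folder\\path\\myfile", "folder\\path\\anotherfile", "folder\\path\\finalfile"),
  V2 = c(64, 58, 34),
  stringsAsFactors = FALSE
)

# Convert a single, known column.
file2$V1 <- gsub("\\", "/", file2$V1, fixed = TRUE)

# Or convert every character column, leaving numeric columns untouched.
is_chr <- vapply(file2, is.character, logical(1))
file2[is_chr] <- lapply(file2[is_chr], gsub,
                        pattern = "\\", replacement = "/", fixed = TRUE)

file2$V1
```

After this, the path column uses forward slashes and can be passed to merge() as before.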
I'd like to remove all elements of a string after a certain keyword.
Example :
this.is.an.example.string.that.I.have
Desired Output :
this.is.an.example
I've tried using gsub('string', '', list) but that only removes the word string. I've also tried using the gsub('^string', '', list) but that also doesn't seem to work.
Thank you.
Here a simple sub will do:
sub("\\.string.*","",variable)
Explanation: sub is called as
sub(regex_to_replace_text_in_variable,new_value,variable)
Difference between sub and gsub:
sub: replaces only the first match in each element of the input.
gsub: performs the same substitution, but on ALL matches found rather than only the first.
From help page of R:
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
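To make the first-match versus all-matches distinction concrete, here is a small illustration in base R. The string s is made up for the example; variable holds the string from the question.

```r
s <- "a.b.c"

# sub() replaces only the first match ...
first_only <- sub(".", "-", s, fixed = TRUE)    # "a-b.c"

# ... while gsub() replaces every match.
all_matches <- gsub(".", "-", s, fixed = TRUE)  # "a-b-c"

# For the question's task either works, because the pattern
# "\\.string.*" can match at most once per string.
variable <- "this.is.an.example.string.that.I.have"
trimmed <- sub("\\.string.*", "", variable)     # "this.is.an.example"
```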
You can try this positive lookbehind regex
S <- 'this.is.an.example.string.that.I.have'
gsub('(?<=example).*', '', S, perl=TRUE)
# 'this.is.an.example'
You can use strsplit. Here you split your string after a key word, and retain the first part of the string.
x <- "this.is.an.example.string.that.I.have"
strsplit(x, '(?<=example)', perl=TRUE)[[1]][1]
# [1] "this.is.an.example"
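If the keyword might contain regex metacharacters, a version without any regex may be easier to reason about. The helper below, keep_through, is hypothetical (not from any package); it is a sketch that uses regexpr with fixed = TRUE and substr to keep everything up to and including the keyword.

```r
# Hypothetical helper: keep each string up to and including the first
# occurrence of `keyword`; strings without the keyword are returned as-is.
keep_through <- function(x, keyword) {
  pos <- regexpr(keyword, x, fixed = TRUE)
  start <- as.integer(pos)                 # -1 where there is no match
  end <- start + attr(pos, "match.length") - 1L
  out <- x
  found <- start > 0
  out[found] <- substr(x[found], 1L, end[found])
  out
}

keep_through("this.is.an.example.string.that.I.have", "example")
# "this.is.an.example"
```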
I have the following string:
test <- "C:\\Users\\stefanj\\Documents\\Automation_Desk\\script.R"
I am separating the string on the backslash characters with the following code:
pdf_path_long <- unlist(strsplit(test, "\\\\",
fixed = FALSE, perl = FALSE, useBytes = FALSE))
What I want to do is:
pdf_path_short <- file.path(pdf_path_long[1], pdf_path_long[2], ...)
Problem is:
I know how to count the elements in pdf_path_long with length(pdf_path_long), but I don't know how to pass them to file.path, as the number of elements will vary based on the length of the path.
You can use gsub directly on test (no need for a strsplit call) to change the separators. With fixed=TRUE the pattern is treated as a literal string, so the backslash needs only R's string escaping ("\\") and no additional regex escaping. You will get the same output as with file.path:
pdf_path_short <- gsub("\\", "/", test, fixed=TRUE)
pdf_path_short
# "C:/Users/stefanj/Documents/Automation_Desk/script.R"
Of course, you can change the replacement part with whatever separator you need.
Note: on Windows, you can also check the normalizePath function:
normalizePath(test, "/", mustWork=FALSE)
#[1] "C:/Users/stefanj/Documents/Automation_Desk/script.R"
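If you do want to keep the strsplit step, a vector of any length can be passed to file.path with do.call. This is a sketch of that alternative, not the approach of the answer above; parts is an illustrative name, and fsep = "/" is spelled out only to make the separator explicit.

```r
test <- "C:\\Users\\stefanj\\Documents\\Automation_Desk\\script.R"

# Split on the literal backslash.
parts <- strsplit(test, "\\", fixed = TRUE)[[1]]

# do.call() spreads the list elements out as separate arguments,
# so the number of path components does not need to be known in advance.
pdf_path_short <- do.call(file.path, c(as.list(parts), fsep = "/"))

# Equivalent for this purpose:
pdf_path_alt <- paste(parts, collapse = "/")

pdf_path_short
# "C:/Users/stefanj/Documents/Automation_Desk/script.R"
```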
I've written a function, which extracts coordinate numbers from strings. E.g. "E 10,9598 °" will be 10.9598.
extract_coordinates <- function(x) {
coord <- gsub(x = x, pattern = "[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]",
replacement = "")
coord <- gsub(x = coord, pattern = "°", replacement = "")
coord <- gsub(x = coord, pattern = "[[:space:]]", replacement = "")
coord <- gsub(x = coord, pattern = ",", replacement = ".")
as.numeric(coord)
}
When I run devtools::check() though this will give me a warning because "°" is a non-ascii character. I tried using the unicode "U+00B0" as a pattern in gsub but that doesn't work.
How do I have to change my code, so there is no warning anymore?
You can use utf8ToInt("°") to get the code point for the \uxxxx escape, and then use that escape in the R code (charToRaw() returns the UTF-8 bytes, which are not the same thing as the code point). For example, I have code that uses ã in the word Não. To get through devtools::check(), this is needed:
as.hexmode(utf8ToInt("ã")) # "e3", so the escape is \u00e3
Then, Não becomes N\u00e3o in my code, and problem solved.
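Applied to the function in the question, the degree sign can be written as the escape "\u00b0" (U+00B0, the code point the question mentions), which keeps the source file pure ASCII. The rewrite below is only a sketch: it also swaps the long letter list for [[:alpha:]] and uses [[:space:]] for whitespace, which are my substitutions rather than part of the original function.

```r
extract_coordinates <- function(x) {
  # Drop letters and whitespace.
  coord <- gsub("[[:alpha:][:space:]]", "", x)
  # Drop the degree sign, written as an ASCII-only escape.
  coord <- gsub("\u00b0", "", coord, fixed = TRUE)
  # Use a decimal point instead of a decimal comma.
  coord <- gsub(",", ".", coord, fixed = TRUE)
  as.numeric(coord)
}

extract_coordinates("E 10,9598 \u00b0")
# 10.9598
```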
I'm trying to use the following code to replace two dots for only one:
test<-"test..1"
gsub("\\..", ".", test, fixed=TRUE)
and getting:
[1] "test..1"
I tried several combinations of escape strings, including brackets [] with no success.
What am I doing wrong?
If you are going to use fixed = TRUE, use the (non-interpreted) character .:
> gsub("..", ".", test, fixed = TRUE)
Otherwise, within regular expressions (fixed = FALSE), . has a special meaning (any character) so you'll want to prefix it with a backslash to mean "the dot character":
> gsub("\\.\\.", ".", test)
> gsub("\\.{2}", ".", test)
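For comparison, here are the variants side by side, including what goes wrong when the dot is left unescaped in regex mode. The last line, which collapses a run of any length, is an extra beyond what was asked.

```r
test <- "test..1"

# Original attempt: with fixed = TRUE the pattern is the literal text
# backslash-dot-dot, which does not occur, so nothing changes.
unchanged <- gsub("\\..", ".", test, fixed = TRUE)  # "test..1"

# Literal match of two dots.
fixed_ok <- gsub("..", ".", test, fixed = TRUE)     # "test.1"

# Regex match of two dots.
regex_ok <- gsub("\\.{2}", ".", test)               # "test.1"

# Unescaped dots in regex mode match ANY two characters.
any_pair <- gsub("..", ".", test)                   # "...1"

# Collapse a run of dots of any length into one.
collapsed <- gsub("\\.+", ".", "test....1")         # "test.1"
```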