R - Concatenating strings with escape characters - r

I am trying to create a multi-line string in R out of several other strings and adding line breaks with /n. When I use paste I do not get a new line i simply get "\n" concatenated in with my strings.
x <- "hello"
y <- "world
paste(x,"\n",y)
# [1] "hello \n world"
I would line to get something along the lines of:
hello
world
I know that cat would output the correct answer to the console but I would like to store this multi-line string in a variable.
Any help would be greatly appreciated. Thank you.

Related

R: remove every word that ends with ".htm"

I have a df = desc with a variable "value" that holds long text and would like to remove every word in that variable that ends with ".htm" . I looked for a long time around here and regex expressions and cannot find a solution.
Can anyone help? Thank you so much!
I tried things like:
library(stringr)
desc <- str_replace_all(desc$value, "\*.htm*$", "")
But I get:
Error: '\*' is an unrecognized escape in character string starting ""\*"
This regex:
Will Catch all that ends with .htm
Will not catch instances with .html
Is not dependent on being in the beginning / end of a string.
strings <- c("random text shouldbematched.htm notremoved.html matched.htm random stuff")
gsub("\\w+\\.htm\\b", "", strings)
Output:
[1] "random text notremoved.html random stuff"
I am not sure what exactly you would like to accomplish, but I guess one of those is what you are looking for:
words <- c("apple", "test.htm", "friend.html", "remove.htm")
# just replace the ".htm" from every string
str_replace_all(words, ".htm", "")
# exclude all words that contains .htm anywhere
words[!grepl(pattern = ".htm", words)]
# exlude all words that END with .htm
words[substr(words, nchar(words)-3, nchar(words)) != ".htm"]
I am not sure if you can use * to tell R to consider any value inside a string, so I would first remove it. Also, in your code you are setting a change in your variable "value" to replace the entire df.
So I would suggest the following:
desc$value <- str_replace(desc$value, ".htm", "")
By doing so, you are telling R to remove all .htm that you have in the desc$value variable alone. I hope it works!
Let's assume you have, as you say, a variable "value" that holds long text and you want to remove every word that ends in .html. Based on these assumptions you can use str_remove all:
The main point here is to wrap the pattern into word boundary markers \\b:
library(stringr)
str_remove_all(value, "\\b\\w+\\.html\\b")
[1] "apple and test2.html01" "the word must etc. and as well" "we want to remove .htm"
Data:
value <- c("apple test.html and test2.html01",
"the word friend.html must etc. and x.html as well",
"we want to remove .htm")
To achieve what you want just do:
desc$value <- str_replace(desc$value, ".*\\.htm$", "")
You are trying to escape the star and it is useless. You get an error because \* does not exist in R strings. You just have \n, \t etc...
\. does not exist either in R strings. But \\ exists and it produces a single \ in the resulting string used for the regular expression. Therefore, when you escape something in a R regexp you have to escape it twice:
In my regexp: .* means any chars and \\. means a real dot. I have to escape it twice because \ needs to be escape first from the R string.

Deleting everything after after a certain string in R

I have some data in an object called all_lines that is a character class in R (as a result of reading into R a PDF file). My objective: to delete everything before a certain string and delete everything after another string.
The data looks like this and it is stored in the object all_lines
class(all_lines)
"character"
[1] LABORATORY Research Cover Sheet
[2] Number 201111EZ Title Maximizing throughput"
[3] " in Computers
[4] Start Date 01/15/2000
....
[49] Introduction
[50] Some text here and there
[51] Look more text
....
[912] Citations
[913] Author_1 Paper or journal
[914] Author_2 Book chapter
I want to delete everything before the string 'Introduction' and everything after 'Citations'. However, nothing I find seems to do the trick. I have tried the following commands from these posts: How to delete everything after a matching string in R and multiple on-line R tutorials on how to do just this. Here are some commands that I have tried and all I get is the string 'Introduction' deleted in the all_lines with everything else returned.
str_remove(all_lines, "^.*(?=(Introduction))")
sub(".*Introduction", "", all_lines)
gsub(".*Introduction", "", all_lines)
I have also tried to delete everything after the string 'Citations' using the same commands, such as:
sub("Citations.*", "", all_lines)
Am I missing anything? Any help would really be appreciated!
It looks like your variable is vector of character strings. One element per line in the document.
We can use the grep() function here to locate the lines containing the desired text. I am assuming only 1 line contains "Introduction" and only 1 line contains "Citations"
#line numbers containing the start and end
Intro <- grep("Introduction", all_lines)
Citation <- grep("Citations", all_lines)
#extract out the desired portion.
abridged <- all_lines[Intro:Citation]
You may need to add 1 or substract 1 if you would like to actually remove the "Introduction" or "Citations" line.
Assuming you can accept a single string as output, you could collapse the input into a single string and then use gsub():
all_lines <- paste(all_lines, collapse = " ")
output <- gsub("^.*?(?=\\bIntroduction\\b)|(?<=\\bCitations\\b).*$", "", all_lines)

In R, Is there a way to break long string literals? [duplicate]

I want to split a line in an R script over multiple lines (because it is too long). How do I do that?
Specifically, I have a line such as
setwd('~/a/very/long/path/here/that/goes/beyond/80/characters/and/then/some/more')
Is it possible to split the long path over multiple lines? I tried
setwd('~/a/very/long/path/here/that/goes/beyond/80/characters/and/
then/some/more')
with return key at the end of the first line; but that does not work.
Thanks.
Bah, comments are too small. Anyway, #Dirk is very right.
R doesn't need to be told the code starts at the next line. It is smarter than Python ;-) and will just continue to read the next line whenever it considers the statement as "not finished". Actually, in your case it also went to the next line, but R takes the return as a character when it is placed between "".
Mind you, you'll have to make sure your code isn't finished. Compare
a <- 1 + 2
+ 3
with
a <- 1 + 2 +
3
So, when spreading code over multiple lines, you have to make sure that R knows something is coming, either by :
leaving a bracket open, or
ending the line with an operator
When we're talking strings, this still works but you need to be a bit careful. You can open the quotation marks and R will read on until you close it. But every character, including the newline, will be seen as part of the string :
x <- "This is a very
long string over two lines."
x
## [1] "This is a very\nlong string over two lines."
cat(x)
## This is a very
## long string over two lines.
That's the reason why in this case, your code didn't work: a path can't contain a newline character (\n). So that's also why you better use the solution with paste() or paste0() Dirk proposed.
You are not breaking code over multiple lines, but rather a single identifier. There is a difference.
For your issue, try
R> setwd(paste("~/a/very/long/path/here",
"/and/then/some/more",
"/and/then/some/more",
"/and/then/some/more", sep=""))
which also illustrates that it is perfectly fine to break code across multiple lines.
Dirk's method above will absolutely work, but if you're looking for a way to bring in a long string where whitespace/structure is important to preserve (example: a SQL query using RODBC) there is a two step solution.
1) Bring the text string in across multiple lines
long_string <- "this
is
a
long
string
with
whitespace"
2) R will introduce a bunch of \n characters. Strip those out with strwrap(), which destroys whitespace, per the documentation:
strwrap(long_string, width=10000, simplify=TRUE)
By telling strwrap to wrap your text to a very, very long row, you get a single character vector with no whitespace/newline characters.
For that particular case there is file.path :
File <- file.path("~",
"a",
"very",
"long",
"path",
"here",
"that",
"goes",
"beyond",
"80",
"characters",
"and",
"then",
"some",
"more")
setwd(File)
The glue::glue function can help. You can write a string on multiple lines in a script but remove the line breaks from the string object by ending each line with \\:
glue("some\\
thing")
something
I know this post is old, but I had a Situation like this and just want to share my solution. All the answers above work fine. But if you have a Code such as those in data.table chaining Syntax it becomes abit challenging. e.g. I had a Problem like this.
mass <- files[, Veg:=tstrsplit(files$file, "/")[1:4][[1]]][, Rain:=tstrsplit(files$file, "/")[1:4][[2]]][, Roughness:=tstrsplit(files$file, "/")[1:4][[3]]][, Geom:=tstrsplit(files$file, "/")[1:4][[4]]][time_[s]<=12000]
I tried most of the suggestions above and they didn´t work. but I figured out that they can be split after the comma within []. Splitting at ][ doesn´t work.
mass <- files[, Veg:=tstrsplit(files$file, "/")[1:4][[1]]][,
Rain:=tstrsplit(files$file, "/")[1:4][[2]]][,
Roughness:=tstrsplit(files$file, "/")[1:4][[3]]][,
Geom:=tstrsplit(files$file, "/")[1:4][[4]]][`time_[s]`<=12000]
There is no coinvent way to do this because there is no operator in R to do string concatenation.
However, you can define a R Infix operator to do string concatenation:
`%+%` = function(x,y) return(paste0(x,y))
Then you can use it to concat strings, even break the code to multiple lines:
s = "a" %+%
"b" %+%
"c"
This will give you "abc".
This will keep the \n character, but you can also just wrap the quote in parentheses. Especially useful in RMarkdown.
t <- ("
this is a long
string
")
I do this all of the time. Use the paste0() function.
Rootdir = "/myhome/thisproject/part1/"
Subdir = "subdirectory1/subsubdir2/"
fullpath = paste0( Rootdir, Subdir )
fullpath
> fullpath
[1] "/myhome/thisproject/part1/subdirectory1/subsubdir2/"

use newline as collapse parameter in paste and store result via lapply [duplicate]

How do I use the new line character in R?
myStringVariable <- "Very Nice ! I like";
myStringVariabel <- paste(myStringVariable, "\n", sep="");
The above code DOESN'T work
P.S There's significant challenges when googling this kind of stuff since the query "R new line character" does seem to confuse google. I really wish R had a different name.
The nature of R means that you're never going to have a newline in a character vector when you simply print it out.
> print("hello\nworld\n")
[1] "hello\nworld\n"
That is, the newlines are in the string, they just don't get printed as new lines. However, you can use other functions if you want to print them, such as cat:
> cat("hello\nworld\n")
hello
world
You can also use writeLines.
> writeLines("hello\nworld")
hello
world
And also:
> writeLines(c("hello","world"))
hello
world
Example on NewLine Char:
for (i in 1:5)
{
for (j in 1:i)
{
cat(j)
}
cat("\n")
}
Result:
1
12
123
1234
12345

Split code over multiple lines in an R script

I want to split a line in an R script over multiple lines (because it is too long). How do I do that?
Specifically, I have a line such as
setwd('~/a/very/long/path/here/that/goes/beyond/80/characters/and/then/some/more')
Is it possible to split the long path over multiple lines? I tried
setwd('~/a/very/long/path/here/that/goes/beyond/80/characters/and/
then/some/more')
with return key at the end of the first line; but that does not work.
Thanks.
Bah, comments are too small. Anyway, #Dirk is very right.
R doesn't need to be told the code starts at the next line. It is smarter than Python ;-) and will just continue to read the next line whenever it considers the statement as "not finished". Actually, in your case it also went to the next line, but R takes the return as a character when it is placed between "".
Mind you, you'll have to make sure your code isn't finished. Compare
a <- 1 + 2
+ 3
with
a <- 1 + 2 +
3
So, when spreading code over multiple lines, you have to make sure that R knows something is coming, either by :
leaving a bracket open, or
ending the line with an operator
When we're talking strings, this still works but you need to be a bit careful. You can open the quotation marks and R will read on until you close it. But every character, including the newline, will be seen as part of the string :
x <- "This is a very
long string over two lines."
x
## [1] "This is a very\nlong string over two lines."
cat(x)
## This is a very
## long string over two lines.
That's the reason why in this case, your code didn't work: a path can't contain a newline character (\n). So that's also why you better use the solution with paste() or paste0() Dirk proposed.
You are not breaking code over multiple lines, but rather a single identifier. There is a difference.
For your issue, try
R> setwd(paste("~/a/very/long/path/here",
"/and/then/some/more",
"/and/then/some/more",
"/and/then/some/more", sep=""))
which also illustrates that it is perfectly fine to break code across multiple lines.
Dirk's method above will absolutely work, but if you're looking for a way to bring in a long string where whitespace/structure is important to preserve (example: a SQL query using RODBC) there is a two step solution.
1) Bring the text string in across multiple lines
long_string <- "this
is
a
long
string
with
whitespace"
2) R will introduce a bunch of \n characters. Strip those out with strwrap(), which destroys whitespace, per the documentation:
strwrap(long_string, width=10000, simplify=TRUE)
By telling strwrap to wrap your text to a very, very long row, you get a single character vector with no whitespace/newline characters.
For that particular case there is file.path :
File <- file.path("~",
"a",
"very",
"long",
"path",
"here",
"that",
"goes",
"beyond",
"80",
"characters",
"and",
"then",
"some",
"more")
setwd(File)
The glue::glue function can help. You can write a string on multiple lines in a script but remove the line breaks from the string object by ending each line with \\:
glue("some\\
thing")
something
I know this post is old, but I had a Situation like this and just want to share my solution. All the answers above work fine. But if you have a Code such as those in data.table chaining Syntax it becomes abit challenging. e.g. I had a Problem like this.
mass <- files[, Veg:=tstrsplit(files$file, "/")[1:4][[1]]][, Rain:=tstrsplit(files$file, "/")[1:4][[2]]][, Roughness:=tstrsplit(files$file, "/")[1:4][[3]]][, Geom:=tstrsplit(files$file, "/")[1:4][[4]]][time_[s]<=12000]
I tried most of the suggestions above and they didn´t work. but I figured out that they can be split after the comma within []. Splitting at ][ doesn´t work.
mass <- files[, Veg:=tstrsplit(files$file, "/")[1:4][[1]]][,
Rain:=tstrsplit(files$file, "/")[1:4][[2]]][,
Roughness:=tstrsplit(files$file, "/")[1:4][[3]]][,
Geom:=tstrsplit(files$file, "/")[1:4][[4]]][`time_[s]`<=12000]
There is no coinvent way to do this because there is no operator in R to do string concatenation.
However, you can define a R Infix operator to do string concatenation:
`%+%` = function(x,y) return(paste0(x,y))
Then you can use it to concat strings, even break the code to multiple lines:
s = "a" %+%
"b" %+%
"c"
This will give you "abc".
This will keep the \n character, but you can also just wrap the quote in parentheses. Especially useful in RMarkdown.
t <- ("
this is a long
string
")
I do this all of the time. Use the paste0() function.
Rootdir = "/myhome/thisproject/part1/"
Subdir = "subdirectory1/subsubdir2/"
fullpath = paste0( Rootdir, Subdir )
fullpath
> fullpath
[1] "/myhome/thisproject/part1/subdirectory1/subsubdir2/"

Resources