Convert names as vector in R - r

I have more than 100 name-items on my R Script without " " and "," between them. I want to make a vector from them.
AWE XYA Name3 WERFS XYAGD ...... DSFSF
The vector should be
vec <- c("AWE", "XYA" ,"Name3" ,"WERFS" ,"XYAGD" ...... ,"DSFSF")
Instead of manually entering " " and ,. Is there a way to automate this?

If you want to do that from Rstudio, you have some solutions here.
You also have a Rstudio addin to put quotation mark around words:
remotes::install_github("hrbrmstr/hrbraddins")
See there or there. After putting quotation marks, you can do a find and replace after selecting the area in the script to transform " into ",

Assuming the file in which this is stored is called temp.R, you can use scan to get a character vector. This will also work if you have text (.txt) file.
vec <- scan('temp.R', what = "character", quiet = TRUE)

Related

R: remove every word that ends with ".htm"

I have a df = desc with a variable "value" that holds long text and would like to remove every word in that variable that ends with ".htm" . I looked for a long time around here and regex expressions and cannot find a solution.
Can anyone help? Thank you so much!
I tried things like:
library(stringr)
desc <- str_replace_all(desc$value, "\*.htm*$", "")
But I get:
Error: '\*' is an unrecognized escape in character string starting ""\*"
This regex:
Will Catch all that ends with .htm
Will not catch instances with .html
Is not dependent on being in the beginning / end of a string.
strings <- c("random text shouldbematched.htm notremoved.html matched.htm random stuff")
gsub("\\w+\\.htm\\b", "", strings)
Output:
[1] "random text notremoved.html random stuff"
I am not sure what exactly you would like to accomplish, but I guess one of those is what you are looking for:
words <- c("apple", "test.htm", "friend.html", "remove.htm")
# just replace the ".htm" from every string
str_replace_all(words, ".htm", "")
# exclude all words that contains .htm anywhere
words[!grepl(pattern = ".htm", words)]
# exlude all words that END with .htm
words[substr(words, nchar(words)-3, nchar(words)) != ".htm"]
I am not sure if you can use * to tell R to consider any value inside a string, so I would first remove it. Also, in your code you are setting a change in your variable "value" to replace the entire df.
So I would suggest the following:
desc$value <- str_replace(desc$value, ".htm", "")
By doing so, you are telling R to remove all .htm that you have in the desc$value variable alone. I hope it works!
Let's assume you have, as you say, a variable "value" that holds long text and you want to remove every word that ends in .html. Based on these assumptions you can use str_remove all:
The main point here is to wrap the pattern into word boundary markers \\b:
library(stringr)
str_remove_all(value, "\\b\\w+\\.html\\b")
[1] "apple and test2.html01" "the word must etc. and as well" "we want to remove .htm"
Data:
value <- c("apple test.html and test2.html01",
"the word friend.html must etc. and x.html as well",
"we want to remove .htm")
To achieve what you want just do:
desc$value <- str_replace(desc$value, ".*\\.htm$", "")
You are trying to escape the star and it is useless. You get an error because \* does not exist in R strings. You just have \n, \t etc...
\. does not exist either in R strings. But \\ exists and it produces a single \ in the resulting string used for the regular expression. Therefore, when you escape something in a R regexp you have to escape it twice:
In my regexp: .* means any chars and \\. means a real dot. I have to escape it twice because \ needs to be escape first from the R string.

Replace \" with ' in R

I'm working with CSV files and the problem is that some rows have columns containing \" inside. A simple example would be:
"Row 42"; "Some value"; "Description: \"xyz\""; "Anoher value"
As you can see, the third column contains that combination and when I use the read_csv method in R, the input format is messed up. One working solution is to open the CSV file in Notepad++ and simply replace \" with ', for example. However, I'd prefer to have this automated.
I'm able to replace the \" with ' by using
gsub('\\\\"', "\\\'", df)
However, I'm not able to write it in the original format. Whenever I read the CSV file with R, I lose the quotation marks indicating the columns. So, in other words, my current method outputs the following:
"Row 42; Some value; Description: 'xyz'; Anoher value"
The quotation marks before and after ; are missing.
It's almost fine, but when opening the preprocessed file with Excel, it doesn't recongize the columns. I think the most convenient solution would be to read the CSV file simply as one big string containing all the quotation marks, replacing the desired combination explained above and then write it again. However, I'm not able to read the file as one big string containing all the quotation marks.
Is there a way to read the CSV file with R containing all the quotation marks? Do you have any other solutions to achieve that?
Already tried read.table? It comes with the base installation of R.
Define sep=';' as the separator and use nothing as quotes, quotes=''. Then gsub the redundant quotes away and do trimws. This should fix your data.
x <- '"Row 42"; "Some value;" "Description: \"xyz\""; "Anoher value"'
tab <- read.table(text=x, sep=';', quote='')
tab[] <- lapply(tab, \(x) trimws(gsub(x, pat='\\"', rep='')))
tab
# V1 V2 V3 V4
# 1 Row 42 Some value Description: xyz Anoher value
In your case use read.table(file='<path to .csv file>', sep=';', quote='')
I found the solution, if anyone else faces the same problem:
data <- read_lines(inputFileName)
preprocessed <- gsub('\\\\"', "\\\'", data)
write_lines(preprocessed, outputFileName)

In R, Is there a way to break long string literals? [duplicate]

I want to split a line in an R script over multiple lines (because it is too long). How do I do that?
Specifically, I have a line such as
setwd('~/a/very/long/path/here/that/goes/beyond/80/characters/and/then/some/more')
Is it possible to split the long path over multiple lines? I tried
setwd('~/a/very/long/path/here/that/goes/beyond/80/characters/and/
then/some/more')
with return key at the end of the first line; but that does not work.
Thanks.
Bah, comments are too small. Anyway, #Dirk is very right.
R doesn't need to be told the code starts at the next line. It is smarter than Python ;-) and will just continue to read the next line whenever it considers the statement as "not finished". Actually, in your case it also went to the next line, but R takes the return as a character when it is placed between "".
Mind you, you'll have to make sure your code isn't finished. Compare
a <- 1 + 2
+ 3
with
a <- 1 + 2 +
3
So, when spreading code over multiple lines, you have to make sure that R knows something is coming, either by :
leaving a bracket open, or
ending the line with an operator
When we're talking strings, this still works but you need to be a bit careful. You can open the quotation marks and R will read on until you close it. But every character, including the newline, will be seen as part of the string :
x <- "This is a very
long string over two lines."
x
## [1] "This is a very\nlong string over two lines."
cat(x)
## This is a very
## long string over two lines.
That's the reason why in this case, your code didn't work: a path can't contain a newline character (\n). So that's also why you better use the solution with paste() or paste0() Dirk proposed.
You are not breaking code over multiple lines, but rather a single identifier. There is a difference.
For your issue, try
R> setwd(paste("~/a/very/long/path/here",
"/and/then/some/more",
"/and/then/some/more",
"/and/then/some/more", sep=""))
which also illustrates that it is perfectly fine to break code across multiple lines.
Dirk's method above will absolutely work, but if you're looking for a way to bring in a long string where whitespace/structure is important to preserve (example: a SQL query using RODBC) there is a two step solution.
1) Bring the text string in across multiple lines
long_string <- "this
is
a
long
string
with
whitespace"
2) R will introduce a bunch of \n characters. Strip those out with strwrap(), which destroys whitespace, per the documentation:
strwrap(long_string, width=10000, simplify=TRUE)
By telling strwrap to wrap your text to a very, very long row, you get a single character vector with no whitespace/newline characters.
For that particular case there is file.path :
File <- file.path("~",
"a",
"very",
"long",
"path",
"here",
"that",
"goes",
"beyond",
"80",
"characters",
"and",
"then",
"some",
"more")
setwd(File)
The glue::glue function can help. You can write a string on multiple lines in a script but remove the line breaks from the string object by ending each line with \\:
glue("some\\
thing")
something
I know this post is old, but I had a Situation like this and just want to share my solution. All the answers above work fine. But if you have a Code such as those in data.table chaining Syntax it becomes abit challenging. e.g. I had a Problem like this.
mass <- files[, Veg:=tstrsplit(files$file, "/")[1:4][[1]]][, Rain:=tstrsplit(files$file, "/")[1:4][[2]]][, Roughness:=tstrsplit(files$file, "/")[1:4][[3]]][, Geom:=tstrsplit(files$file, "/")[1:4][[4]]][time_[s]<=12000]
I tried most of the suggestions above and they didn´t work. but I figured out that they can be split after the comma within []. Splitting at ][ doesn´t work.
mass <- files[, Veg:=tstrsplit(files$file, "/")[1:4][[1]]][,
Rain:=tstrsplit(files$file, "/")[1:4][[2]]][,
Roughness:=tstrsplit(files$file, "/")[1:4][[3]]][,
Geom:=tstrsplit(files$file, "/")[1:4][[4]]][`time_[s]`<=12000]
There is no coinvent way to do this because there is no operator in R to do string concatenation.
However, you can define a R Infix operator to do string concatenation:
`%+%` = function(x,y) return(paste0(x,y))
Then you can use it to concat strings, even break the code to multiple lines:
s = "a" %+%
"b" %+%
"c"
This will give you "abc".
This will keep the \n character, but you can also just wrap the quote in parentheses. Especially useful in RMarkdown.
t <- ("
this is a long
string
")
I do this all of the time. Use the paste0() function.
Rootdir = "/myhome/thisproject/part1/"
Subdir = "subdirectory1/subsubdir2/"
fullpath = paste0( Rootdir, Subdir )
fullpath
> fullpath
[1] "/myhome/thisproject/part1/subdirectory1/subsubdir2/"

How to make R stop reading rows in a text file at a line containing a specific character?

For example, I want to read lines from the beginning of a text file up to a string with ";" symbol excluding this string.
Thanks a lot.
A very simple approach might be to read the contents of the using readLines:
content = readLines("data.txt")
And then split the character data on the ;:
split_content = strsplit(content, split = ";")
And then extract the first elememt, i.e. the text up to the semicolon:
first_element = lapply(split_content, "[[", 1]
The result is a list of all the text in the rows of the data file up to the semicolon.
Ps I'm not entirely sure about the last line...I can't check it as I've got no access to R right now.

Split code over multiple lines in an R script

I want to split a line in an R script over multiple lines (because it is too long). How do I do that?
Specifically, I have a line such as
setwd('~/a/very/long/path/here/that/goes/beyond/80/characters/and/then/some/more')
Is it possible to split the long path over multiple lines? I tried
setwd('~/a/very/long/path/here/that/goes/beyond/80/characters/and/
then/some/more')
with return key at the end of the first line; but that does not work.
Thanks.
Bah, comments are too small. Anyway, #Dirk is very right.
R doesn't need to be told the code starts at the next line. It is smarter than Python ;-) and will just continue to read the next line whenever it considers the statement as "not finished". Actually, in your case it also went to the next line, but R takes the return as a character when it is placed between "".
Mind you, you'll have to make sure your code isn't finished. Compare
a <- 1 + 2
+ 3
with
a <- 1 + 2 +
3
So, when spreading code over multiple lines, you have to make sure that R knows something is coming, either by :
leaving a bracket open, or
ending the line with an operator
When we're talking strings, this still works but you need to be a bit careful. You can open the quotation marks and R will read on until you close it. But every character, including the newline, will be seen as part of the string :
x <- "This is a very
long string over two lines."
x
## [1] "This is a very\nlong string over two lines."
cat(x)
## This is a very
## long string over two lines.
That's the reason why in this case, your code didn't work: a path can't contain a newline character (\n). So that's also why you better use the solution with paste() or paste0() Dirk proposed.
You are not breaking code over multiple lines, but rather a single identifier. There is a difference.
For your issue, try
R> setwd(paste("~/a/very/long/path/here",
"/and/then/some/more",
"/and/then/some/more",
"/and/then/some/more", sep=""))
which also illustrates that it is perfectly fine to break code across multiple lines.
Dirk's method above will absolutely work, but if you're looking for a way to bring in a long string where whitespace/structure is important to preserve (example: a SQL query using RODBC) there is a two step solution.
1) Bring the text string in across multiple lines
long_string <- "this
is
a
long
string
with
whitespace"
2) R will introduce a bunch of \n characters. Strip those out with strwrap(), which destroys whitespace, per the documentation:
strwrap(long_string, width=10000, simplify=TRUE)
By telling strwrap to wrap your text to a very, very long row, you get a single character vector with no whitespace/newline characters.
For that particular case there is file.path :
File <- file.path("~",
"a",
"very",
"long",
"path",
"here",
"that",
"goes",
"beyond",
"80",
"characters",
"and",
"then",
"some",
"more")
setwd(File)
The glue::glue function can help. You can write a string on multiple lines in a script but remove the line breaks from the string object by ending each line with \\:
glue("some\\
thing")
something
I know this post is old, but I had a Situation like this and just want to share my solution. All the answers above work fine. But if you have a Code such as those in data.table chaining Syntax it becomes abit challenging. e.g. I had a Problem like this.
mass <- files[, Veg:=tstrsplit(files$file, "/")[1:4][[1]]][, Rain:=tstrsplit(files$file, "/")[1:4][[2]]][, Roughness:=tstrsplit(files$file, "/")[1:4][[3]]][, Geom:=tstrsplit(files$file, "/")[1:4][[4]]][time_[s]<=12000]
I tried most of the suggestions above and they didn´t work. but I figured out that they can be split after the comma within []. Splitting at ][ doesn´t work.
mass <- files[, Veg:=tstrsplit(files$file, "/")[1:4][[1]]][,
Rain:=tstrsplit(files$file, "/")[1:4][[2]]][,
Roughness:=tstrsplit(files$file, "/")[1:4][[3]]][,
Geom:=tstrsplit(files$file, "/")[1:4][[4]]][`time_[s]`<=12000]
There is no coinvent way to do this because there is no operator in R to do string concatenation.
However, you can define a R Infix operator to do string concatenation:
`%+%` = function(x,y) return(paste0(x,y))
Then you can use it to concat strings, even break the code to multiple lines:
s = "a" %+%
"b" %+%
"c"
This will give you "abc".
This will keep the \n character, but you can also just wrap the quote in parentheses. Especially useful in RMarkdown.
t <- ("
this is a long
string
")
I do this all of the time. Use the paste0() function.
Rootdir = "/myhome/thisproject/part1/"
Subdir = "subdirectory1/subsubdir2/"
fullpath = paste0( Rootdir, Subdir )
fullpath
> fullpath
[1] "/myhome/thisproject/part1/subdirectory1/subsubdir2/"

Resources