Can R paste() output "\"? - r

As stated in the Intro to R manual,
paste("\\")
prints
[1] "\\"
Is it possible for paste to print out
[1] "\"
?
Update: I didn't want Gavin's nice answer to get stuck in the comments below, so I'll paste it here:
print(xtable(as.matrix("\\citep{citation}")), sanitize.text.function = function(x) {x})

You are confusing how something is stored and how it "prints".
You can use paste to combine a \ with something else. If you print the result, the printed representation shows \\ because print escapes the backslash. If you output it to a file or the screen using cat instead, you get the single \. For example:
> tmp <- paste( "\\", "cite{", sep="" )
> print(tmp)
[1] "\\cite{"
> cat(tmp, "\n")
\cite{
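A quick way to confirm this is to count characters. The following is a minimal sketch (the names s and n are only illustrative) showing that the stored string holds one backslash even though print displays two:

```r
# The R source text "\\" denotes a single backslash character.
s <- paste("\\")

n <- nchar(s)   # number of characters actually stored
print(n)

print(s)        # escaped representation, as the console shows it
cat(s, "\n")    # raw characters, as they would be written to a file
```

If nchar() reports 1, the doubled backslash is purely a display convention and nothing needs to be removed from the string.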

That is the printed representation of a single "\" in R. Clearly the right answer will depend on your end usage, but will something like this do:
> citations <- paste("title", 1:3, sep = "")
> cites <- paste("\\citep{", citations, "}", sep = "")
> writeLines(cites)
\citep{title1}
\citep{title2}
\citep{title3}
Using writeLines() you can output that to a file using something like:
> writeLines(cites, con = "cites.txt")
Resulting in the following file:
$ cat cites.txt
\citep{title1}
\citep{title2}
\citep{title3}

One way to do this is to use the write command, e.g.
> write("\\", file="")
\
Write is usually used to write to files, so you need to set file="" to get it to print to STDOUT.
The \ is repeated in the write command so that it doesn't escape the closing quotation mark.
I'm not sure if this is the correct way to do it, but it works for me.
Edit: Realised slightly too late that you were using the paste() command. Hopefully my answer still bears some relevance to your plight. Apologies.

Related

Create a string with special character in R

I have an issue creating a string with a special character. I have asked a similar question and I have also read answers to similar questions about my problem but I am not able to find the solution.
I want to create a string character with a special character. I have been trying with cat but I know it is only for printing, not for saving the string in a variable in R.
I want as a result this:
> cat("C:\\Users\\ppp\\ddd\\")
C:\Users\ppp\ddd\
and I have been trying with paste and collapse but without success:
> x = c("C:","Users","ppp","ddd")
> t <- paste0(x, collapse = '\n')
> t
[1] "C:\nUsers\nppp\nddd"
Are you sure you don't want
x = c("C:","Users","ppp","ddd")
t <- paste0(x, collapse = '/')
t
[1] "C:/Users/ppp/ddd"
R uses this format for setting working directories.
You can also do:
x = c("C:","Users","ppp","ddd")
t <- paste0(x, collapse = '\\')
t
[1] "C:\\Users\\ppp\\ddd"
Although this result looks wrong, it is only the printed representation: each \\ is a single backslash in the stored string. If you use the string in a shell() command in R to be interpreted by Windows, for example, it will be interpreted correctly.
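If the goal is to keep the backslash-separated path in a variable rather than just print it, a sketch along these lines may help. It assumes the path components from the question; the names p_back and p_fwd are only illustrative. The doubled backslash appears only in the source code and in print()'s output, not in the stored string.

```r
x <- c("C:", "Users", "ppp", "ddd")

# Join with a single backslash (written "\\" in R source) and add a trailing one.
p_back <- paste0(paste(x, collapse = "\\"), "\\")

# Forward-slash variant built with file.path().
p_fwd <- do.call(file.path, as.list(x))

cat(p_back, "\n")  # shows the stored characters without escaping
cat(p_fwd, "\n")
nchar(p_back)      # each backslash is counted once
```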
Not Answering... but
t <- paste0(x, collapse = '/')
"C:/Users/ppp/ddd" seems to work on windows.

Interchangeable simulating and writing data to a file

I'm experimenting with R and I try to interchangeably simulate and write data to a file. I tried out many variants for example:
connection<-file("file.txt", open="w")
for (i in 1:2){
X<-runif(3,0,1)
writeLines(as.character(X), con=connection, sep="\n")
}
close(connection)
But what I get is
0.442033957922831
0.0713443560525775
0.950616024667397
0.0807233764789999
0.186026858631521
0.658676357707009
instead of something like
0.442033957922831 0.0713443560525775 0.950616024667397
0.0807233764789999 0.186026858631521 0.658676357707009
Could you explain me what I'm doing wrong?
We can paste the elements of 'X' into a single string and then use sep='\n'; otherwise writeLines starts a new line after each element.
connection<-file("file.txt", open="w")
for (i in 1:2){
X<-runif(3,0,1)
writeLines(paste(X, collapse=" "), con=connection, sep="\n")
}
close(connection)
Instead of writing line by line in a for loop we can create the string once and write it in the text file in one-go.
We can use replicate to repeat the runif code n times, paste the numbers row-wise, and paste them again collapsing with a new line character.
temp <- paste0(apply(t(replicate(2, runif(3,0,1))), 1, paste, collapse = ' '),
collapse = '\n')
connection <- file("file.txt")
writeLines(temp, connection)
close(connection)
where temp is a character vector of length one that looks like this:
temp
#[1] "0.406911700032651 0.416268902365118 0.698520892066881\n0.96398281189613 0.834513065638021 0.655840792460367"
which looks like this in the text file:
cat(temp)
#0.406911700032651 0.416268902365118 0.698520892066881
#0.96398281189613 0.834513065638021 0.655840792460367
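Another way to get one row per line, sketched here under the assumption that all draws fit in memory, is to build a matrix first and paste each of its rows. A temporary file stands in for file.txt, and the names m, rows and out are illustrative.

```r
set.seed(1)                             # only to make the sketch reproducible
m <- matrix(runif(6, 0, 1), nrow = 2)   # 2 simulated rows of 3 values each

# Collapse each matrix row into one space-separated string.
rows <- apply(m, 1, paste, collapse = " ")

out <- tempfile(fileext = ".txt")
writeLines(rows, out)                   # writeLines also accepts a file name

readLines(out)                          # two lines, three numbers per line
```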

Writing to file in R one line after the other

I have the following piece of code to write to an R file one line at a time.
for (i in c(1:10)){
writeLines(as.character(i),file("output.csv"))
}
It just writes 10, presumably overwriting the previous lines. How do I make R append the new line to the existing output? append = TRUE does not work.
append = TRUE does work when using the function cat (instead of writeLines), but only if you give cat a file name, not a file object. Whether a file is appended to or overwritten is a property of the file object itself, i.e. it needs to be specified when the file is opened.
Thus both of these work:
f = file('filename', open = 'a') # open in "a"ppend mode
for (i in 1 : 10) writeLines(as.character(i), f) # writeLines needs character input
close(f)
for (i in 1 : 10) cat(i, '\n', file = 'filename', sep = '', append = TRUE)
Calling file manually is almost never necessary in R.
… but as the other answer shows, you can (and should!) avoid the loop anyway.
You won't need a loop. Use the newline escape character \n as the separator instead.
vec <- c(1:10)
writeLines(as.character(vec), file("output.csv"), sep="\n")
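To see the difference from the original loop, which handed writeLines a fresh file() connection on every iteration, it may help to compare it with a version that opens the connection once. This is only a sketch; a temporary file stands in for output.csv, and the names out and con are illustrative.

```r
out <- tempfile(fileext = ".csv")

# Open the connection once in write mode; successive writes go to the
# same open connection and therefore follow each other in the file.
con <- file(out, open = "w")
for (i in 1:10) {
  writeLines(as.character(i), con)
}
close(con)  # flush and release the connection

readLines(out)
```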

R RegEx gsub() Equivalent of "Line Operations>Remove Empty Lines (Containing Blank Characters)" in CSV file

I have a CSV file with several columns: Tweet, date, etc. The spaces in some Tweets are causing blank lines and undesired truncated lines.
What works:
1. Using Notepad++'s function "Line Operations>Remove Empty Lines (Containing Blank Characters)"
2. Search and replace: \r with nothing.
However, I need to do this for a large number of files, and I can't manage to find a Regular Expression with gsub() in R that will do what the Notepad++ function does.
Note that replacing ^[ \t]*$\r?\n with nothing and then \r with nothing, as suggested here, does work in Notepad++, but it does not work with gsub() in R.
I have tried the following code:
tx <- readLines("tweets.csv")
subbed <-gsub(pattern = "^[ \\t]*$\\r?\\n", replace = "", x = tx)
subbed <-gsub(pattern = "\r", replace = "", x = subbed)
writeLines(subbed, "output.csv")
This is the input: [image not preserved]
This is the desired output: [image not preserved]
You may use
library(readtext)
tx <- readtext("tweets.csv")
subbed <- gsub("(?m)^\\h*\\R?", "", tx$text, perl=TRUE)
subbed <- gsub("\r", "", subbed, fixed=TRUE)
writeLines(trimws(subbed), "output.csv")
The readtext library reads the file into a single variable, and thus all line break chars are kept.
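If an extra package is not an option, a base R sketch is possible as well. readLines() returns each line without its terminator, so a pattern containing \n has nothing to match. Dropping the blank elements instead may have a similar effect. The sample vector below is made up for illustration and stands in for the result of readLines("tweets.csv").

```r
# Made-up lines standing in for readLines("tweets.csv")
tx <- c("tweet one,2020-01-01", "   ", "", "tweet two,2020-01-02\r")

# Keep only elements that contain something other than whitespace.
kept <- tx[!grepl("^[[:space:]]*$", tx)]

# Remove any stray carriage returns.
kept <- gsub("\r", "", kept, fixed = TRUE)

kept
```

Whether this matches the Notepad++ result exactly depends on how the embedded line breaks appear in the real files, so it is worth checking on one file first.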

How to read \" double-quote escaped values with read.table in R

I am having trouble reading a file containing lines like the one below in R.
"_:b5507F4C7x59005","Fabiana D\"atri"
Any idea? How can I make read.table understand that \" is the escape of quote?
Cheers,
Alexandre
It seems to me that read.table/read.csv cannot handle escaped quotes.
...But I think I have an (ugly) work-around inspired by @nullglob;
First read the file WITHOUT a quote character.
(This won't handle embedded , as @Ben Bolker noted)
Then go through the string columns and remove the quotes:
The test file looks like this (I added a non-string column for good measure):
13,"foo","Fab D\"atri","bar"
21,"foo2","Fab D\"atri2","bar2"
And here is the code:
# Generate test file
writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
"21,\"foo2\",\"Fab D\\\"atri2\",\"bar2\"" ), "foo.txt")
# Read ignoring quotes
tbl <- read.table("foo.txt", as.is=TRUE, quote='', sep=',', header=FALSE, row.names=NULL)
# Go through and cleanup
for (i in seq_len(NCOL(tbl))) {
if (is.character(tbl[[i]])) {
x <- tbl[[i]]
x <- substr(x, 2, nchar(x)-1) # Remove surrounding quotes
tbl[[i]] <- gsub('\\\\"', '"', x) # Unescape quotes
}
}
The output is then correct:
> tbl
V1 V2 V3 V4
1 13 foo Fab D"atri bar
2 21 foo2 Fab D"atri2 bar2
On Linux/Unix (or on Windows with cygwin or GnuWin32), you can use sed to convert the escaped double quotes \" to doubled double quotes "" which can be handled well by read.csv:
p <- pipe(paste0('sed \'s/\\\\"/""/g\' "', FILENAME, '"'))
d <- read.csv(p, ...)
rm(p)
Effectively, the following sed command is used to preprocess the CSV input:
sed 's/\\"/""/g' file.csv
I don't call this beautiful, but at least you don't have to leave the R environment...
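The same substitution can also be sketched in base R without calling sed, assuming the file is small enough to read into memory. The lines below reuse the test data from the earlier answer; converted and tbl are illustrative names.

```r
# Two lines with backslash-escaped quotes, as readLines() would return them
lines <- c('13,"foo","Fab D\\"atri","bar"',
           '21,"foo2","Fab D\\"atri2","bar2"')

# Turn each \" into "" (a literal, non-regex replacement).
converted <- gsub('\\"', '""', lines, fixed = TRUE)

# read.csv treats a doubled quote inside a quoted field as one quote.
tbl <- read.csv(text = converted, header = FALSE, stringsAsFactors = FALSE)

tbl$V3
```

With a real file, lines would come from readLines() on that file rather than from a literal vector.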
My apologies ahead of time that this isn't more detailed -- I'm right in the middle of a code crunch.
You might consider using the scan() function. I created a simple sample file "sample.csv," which consists of:
V1,V2
"_:b5507F4C7x59005","Fabiana D\"atri"
Two quick possibilities are (with output commented so you can copy-paste to the command line):
test <- scan("sample.csv", sep=",", what='character',allowEscapes=TRUE)
## Read 4 items
test
##[1] "V1" "V2" "_:b5507F4C7x59005"
##[4] "Fabiana D\\atri\n"
or
test <- scan("sample.csv", sep=",", what='character',comment.char="\\")
## Read 4 items
test
## [1] "V1" "V2" "_:b5507F4C7x59005"
## [4] "Fabiana D\\atri\n"
You'll probably need to play around with it a little more to get what you want. And I see that you've already mentioned writeLines, so you may have already tried this. Either way, good luck!
I was able to get your example to work by setting the quote argument:
> read.csv('test.csv',quote="'",head=FALSE)
V1 V2
1 "_:b5507F4C7x59005" "Fabiana D\\"atri"
2 "_:b5507F4C7x59005" "Fabiana D\\"atri"
read_delim from package readr can handle escaped and doubled double quotes, using the arguments escape_double and escape_backslash.
For example, if our file escapes quotes by doubling them:
"quote""","hello"
1,2
then we use
read_delim(file, delim=',') # default escape_backslash=FALSE, escape_double=TRUE
If our file escapes quotes with a backslash:
"quote\"","hello"
1,2
we use
read_delim(file, delim=',', escape_double=FALSE, escape_backslash=TRUE)
As of newer R versions, readr::read_delim() is the correct answer.
data = read_delim(filename, delim = "\t", quote = "\"",
escape_backslash=TRUE, escape_double=FALSE,
# The columns depend on your data
col_names = c("timeStart", "posEnd", "added", "removed"),
col_types = "nncc"
)
This should be fine with read.csv(). Take a look at the help for ?read.csv - the option for specifying the quote is quote = "....". In this case, though, there may be a problem: it seems that read.csv() prefers to see matching quotes.
I tried the same with read.table("sample.txt", header = FALSE, as.is = TRUE), with your text in sample.txt, and it seems to work. When all else fails with read.csv(), I tend to back up to read.table() and specify the parameters carefully.
