How to convert line endings from Windows to Unix using R

I wonder how to convert line-ending characters from Windows to Unix using R.
I saw in another post that this is possible with the write() function, but when I try it, it doesn't work (it produces an empty file). I'd prefer to use write.table() instead, if possible.

Let's write some text:
library(readr)
text <- c("line one", "line two")
write_lines(text, file = "text.linux.txt", sep = "\n")
write_lines(text, file = "text.macos.txt", sep = "\r")
write_lines(text, file = "text.windows.txt", sep = "\r\n")
Similar options exist elsewhere: for example, write.table and write_csv take an eol argument that sets the end-of-line characters.
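One caveat worth illustrating: on Windows, a file opened in text mode typically has "\n" translated to "\r\n" on output, so eol = "\n" on its own may still produce Windows line endings. A minimal sketch of a common workaround, assuming you can hand write.table a connection opened in binary mode ("wb") instead of a file name. The data frame and output file below are made up for illustration.

```r
# Hypothetical example data; any data frame works the same way.
df <- data.frame(id = 1:2, name = c("a", "b"))
out <- tempfile(fileext = ".txt")

# Binary mode ("wb") writes bytes as given, with no newline translation.
con <- file(out, open = "wb")
write.table(df, con, sep = "\t", eol = "\n", quote = FALSE, row.names = FALSE)
close(con)

# Inspect the raw bytes: 0x0d is the carriage return ("\r").
bytes <- readBin(out, what = "raw", n = file.size(out))
has_cr <- any(bytes == as.raw(0x0d))
```

If has_cr is FALSE, the file contains Unix line endings only, regardless of the platform it was written on.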

Related

Writing to file in R one line after the other

I have the following piece of code to write to a file from R one line at a time.
for (i in c(1:10)){
writeLines(as.character(i),file("output.csv"))
}
It just writes 10, presumably overwriting the previous lines. How do I make R append each new line to the existing output? append = TRUE does not work.
append = TRUE does work when using the function cat (instead of writeLines), but only if you give cat a file name, not when you give it a file object. Whether a file is being appended to or overwritten is a property of the file object itself, i.e. it needs to be specified when the file is being opened.
Thus both of these work:
f = file('filename', open = 'a') # open in "a"ppend mode
for (i in 1 : 10) writeLines(as.character(i), f) # writeLines needs character input
close(f)
for (i in 1 : 10) cat(i, '\n', file = 'filename', sep = '', append = TRUE)
Calling file manually is almost never necessary in R.
… but as the other answer shows, you can (and should!) avoid the loop anyway.
You won't need a loop. Use the newline escape character \n as the separator instead.
vec <- c(1:10)
writeLines(as.character(vec), "output.csv", sep="\n")
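To make the point about open modes concrete, here is a small sketch, assuming a throwaway temporary file (the file name and contents are invented). It shows that a connection opened with "a" keeps what is already in the file, while one opened with "w" truncates it first.

```r
out <- tempfile(fileext = ".csv")

# "a" opens for appending; the file is created if it does not exist yet.
con <- file(out, open = "a")
for (i in 1:3) writeLines(as.character(i), con)
close(con)

# Reopening in "a" mode keeps the earlier lines and adds to them.
con <- file(out, open = "a")
writeLines("4", con)
close(con)
after_append <- readLines(out)

# Reopening in "w" mode truncates the file before writing.
con <- file(out, open = "w")
writeLines("only line", con)
close(con)
after_write <- readLines(out)
```

Here after_append holds four lines and after_write holds a single one, so the mode chosen when the connection is opened decides the outcome, not any argument of the writing function.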

R RegEx gsub() Equivalent of "Line Operations>Remove Empty Lines (Containing Blank Characters)" in CSV file

I have a CSV file with several columns: Tweet, date, etc. The spaces in some Tweets are causing blank lines and undesired truncated lines.
What works:
1. Using Notepad++'s function "Line Operations>Remove Empty Lines (Containing Blank Characters)"
2. Search and replace: \r with nothing.
However, I need to do this for a large number of files, and I can't manage to find a regular expression for gsub() in R that does what the Notepad++ function does.
Note that, as suggested here, replacing ^[ \t]*$\r?\n with nothing and then \r with nothing does work in Notepad++, but the same approach does not work with gsub() in R.
I have tried the following code:
tx <- readLines("tweets.csv")
subbed <-gsub(pattern = "^[ \\t]*$\\r?\\n", replace = "", x = tx)
subbed <-gsub(pattern = "\r", replace = "", x = subbed)
writeLines(subbed, "output.csv")
This is the input:
This is the desired output:
You may use
library(readtext)
tx <- readtext("tweets.csv")
subbed <- gsub("(?m)^\\h*\\R?", "", tx$text, perl=TRUE)
subbed <- gsub("\r", "", subbed, fixed=TRUE)
writeLines(trimws(subbed), "output.csv")
The readtext library reads the file into a single variable, so all line break characters are kept.
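The reason the original attempt fails is that readLines() returns each line without its terminator, so a pattern containing \n has nothing to match. A base-R sketch of the same idea, assuming the whole file fits in memory and reading it as one string with readChar(). The sample file content below is invented for illustration.

```r
# Hypothetical sample: one blank line with spaces/tab, one fully empty line.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw("a,b\r\n  \t\r\n\r\nc,d\r\n"), path)

# Read the entire file as a single string so line breaks are preserved.
txt <- readChar(path, nchars = file.size(path), useBytes = TRUE)

# Drop carriage returns first, then remove lines made only of spaces or tabs.
txt <- gsub("\r", "", txt, fixed = TRUE)
cleaned <- gsub("(?m)^[ \\t]*\\n", "", txt, perl = TRUE)
```

The (?m) flag makes ^ match at the start of every line rather than only at the start of the string, which is what lets the second gsub() find the blank lines.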

Is there a way to make R strings verbatim (not escaped)?

Typical example:
path <- "C:/test/path" # great
path <- "C:\\test\\path" # also great
path <- "C:\test\path"
Error: '\p' is an unrecognized escape in character string starting ""C:\test\p"
(of course - \t is actually an escape character.)
Is there any mark that can be used to treat the string as verbatim? Or can it be coded?
It would be really useful when copy/pasting path names in Windows...
R 4.0.0 introduces raw strings:
dir <- r"(c:\Program files\R)"
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html
https://blog.revolutionanalytics.com/2020/04/r-400-is-released.html
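A short sketch of how raw strings behave, assuming R 4.0.0 or later. The variable names and sample text are only illustrative.

```r
# Requires R >= 4.0.0. Backslashes inside r"(...)" are taken literally.
path <- r"(C:\test\path)"
same <- identical(path, "C:\\test\\path")

# Extra dashes around the delimiters help when the text itself
# contains a closing parenthesis followed by a double quote.
quoted <- r"---(say "(hi)" now)---"

# Convert to forward slashes, which R also accepts on Windows.
fwd <- gsub("\\", "/", path, fixed = TRUE)
```

The raw string and the escaped string are the same value; only the way they are typed in the source differs.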
You can use scan (but only in an interactive session, not in a sourced script).
For example:
path=scan(what="",allowEscapes=F,nlines=1)
C:\test\path
print(path)
Then select everything and run it (Ctrl+A, then Ctrl+Enter) to get the result.
This does not work inside a function or a sourced script:
{
path=scan(what="character",allowEscapes=F,nlines=1)
C:\test\path
print(path)
}
This throws an error.
Maybe readline() or scan(what = "character"); both work in the terminal, but not in a script or function:
1. readline():
> path <- readline()
C:\test\path #paste your path, ENTER
> path
[1] "C:\\test\\path"
2. scan(what = "character"):
> path = scan(what = "character")
1: C:\test\path #paste, ENTER
2: #ENTER
#Read 1 item
> path
[1] "C:\\test\\path"
EDIT:
Try this:
1. Define a function getWindowsPath():
> getWindowsPath <- function() #define function
{
return(scan(file = "clipboard", what = "character"))
}
2. Copy the Windows path using CTRL+C:
#CTRL+C: C:\test\path
> getWindowsPath()
#Read 1 item
[1] "C:\\test\\path"
If you are copying and pasting in Windows, you can set up a file connection to the clipboard and read from it with scan, with allowEscapes turned off. However, Windows allows spaces in file paths, and scan splits its input on whitespace, so you have to rejoin the pieces by wrapping the result in paste0 with collapse set to a single space.
x = file(description = "clipboard")
y = paste0(scan(file = x, what = "character", allowEscapes = FALSE), collapse = " ")
Unfortunately, this only works for the path currently in the clipboard, so it is not a solution if you are copying and pasting lots of paths into an R script. A workaround in that situation is to paste each path into a separate text file and save it. Then, in your main script, you could run the following:
y = paste0(scan(file = "path1.txt", what = "character", allowEscapes = FALSE), collapse = " ")
You would probably need one saved file for each path.
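For the saved-file workaround, readLines() may be a simpler alternative to scan, since it returns each line as is: backslashes are not treated as escapes and spaces are not split on. A minimal sketch, assuming the path was saved on the first line of a text file. The temporary file and the path written into it are stand-ins for a file you would create by hand.

```r
# Stand-in for a text file into which a Windows path was pasted and saved.
path_file <- tempfile(fileext = ".txt")
writeLines("C:\\Program Files\\R\\bin", path_file)

# Read the first line verbatim; no escape processing, no splitting on spaces.
win_path <- readLines(path_file, n = 1, warn = FALSE)
```

The doubled backslashes above are only needed because the stand-in file is created from R source; a file saved from a text editor would contain single backslashes and read back the same way.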

R doesn't append lines to file (using Cat or Write)

I tried some options from Stack Overflow (e.g. 1), but this also doesn't work, so maybe there is a mistake in my code:
fileConn<-file("outputR.txt")
for (i in 1:length(lines)){
line = lines[i]
fields = strsplit(line, "\t")[[1]]
id = fields[1]
goIDs = fields[2:length(fields)]
list = as.list(GOCCANCESTOR[goIDs])
text = paste(toString(id), ":", toString(goIDs))
cat(text, file=fileConn, append=TRUE, sep = "\n")
}
close(fileConn)
When I run this code, it keeps overwriting the data in the outputR.txt file.
Any suggestions to fix this problem?
The problem is that you are passing a file connection to cat; in that case the append argument has no effect. There are several options you could use. The easiest one is this:
First "create" the file, for example if you want to add a header:
header = "some header\n"
## if you don't want to use a header then leave the header blank
header = ""
cat(header, file="outputR.txt", append=FALSE, sep = "")
Notice the append = FALSE: this is necessary if you want to clear the file when it already exists. Otherwise, use append = TRUE.
Then you can write text to it using:
text = paste(toString(id), ":", toString(goIDs))
cat(text, "\n", file="outputR.txt", append=TRUE, sep = "") # "\n" is passed explicitly so each call ends its line
You have two options here:
1. Open the file in write mode:
lines <- c("aaaaa", "bbbb")
fileConn<-file("test.txt", "w")
for (i in 1:length(lines)){
line = lines[i]
cat(line, "\n", file=fileConn, sep = "") # append is ignored for connections; "\n" ends each line explicitly
}
close(fileConn)
2. Use the write function with the append argument:
lines <- c("aaaaa", "bbbb")
for (i in 1:length(lines)){
line = lines[i]
write(line,file="test2",append=TRUE)
}
As the help page for cat states:
append: logical. Only used if the argument file is the name of file (and not a connection or "|cmd"). If TRUE output will be appended to file; otherwise, it will overwrite the contents of file.
Thus, if you use a connection as the file argument, the value of the append argument is ignored.
Simply specify the file argument as the name of the file:
cat(text, file="outputR.txt", append=TRUE, sep = "\n")
Alternatively, you can open the file connection with the correct mode specified:
w+ - Open for reading and writing, truncating file initially.
fileConn <- file("outputR.txt", open = "w+")
for (i in 1:length(lines)){
text <- paste("my text in line", i)
cat(text, file = fileConn, sep = "\n")
}
close(fileConn)
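Another way around the append question is to build every output line first and write the file once. A sketch under stated assumptions: the input lines below are invented tab-separated records (an id followed by GO ids), format_record is a hypothetical helper, and the GOCCANCESTOR lookup from the question is left out.

```r
# Hypothetical input: an id, then one or more GO ids, separated by tabs.
lines <- c("gene1\tGO:0001\tGO:0002", "gene2\tGO:0003")

# Hypothetical helper: turn one input line into "id : go1, go2".
format_record <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  paste(fields[1], ":", toString(fields[-1]))
}

# Build all records in memory, then write them in a single call.
records <- vapply(lines, format_record, character(1), USE.NAMES = FALSE)
out <- tempfile(fileext = ".txt")
writeLines(records, out)
```

Because the file is written in one call, there is no connection mode or append flag to get wrong.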

Keep rows separate with write.table R

I'm trying to produce some files that have slightly unusual field separators.
require(data.table)
dset <- data.table(MPAN = c(rep("AAAA",1000),rep("BBBB",1000),rep("CCCC",1000)),
INT01 = runif(3000,0,1), INT02 = runif(3000,0,1), INT03 = runif(3000,0,1))
write.table(dset,"C:/testing_write_table.csv",
sep = "|",row.names = FALSE, col.names = FALSE, na = "", quote = FALSE, eol = "")
However, I'm finding that the rows are not being kept separate in the output file, e.g.
AAAA|0.238683722680435|0.782154920976609|0.0570344978477806AAAA|0.9250325632......
Would you know how to ensure the text file retains distinct rows?
Cheers
You are using the wrong eol argument. The end-of-line argument needs to be a line break:
This worked for me:
require(data.table)
dset <- data.table(MPAN = c(rep("AAAA",1000),rep("BBBB",1000),rep("CCCC",1000)),
INT01 = runif(3000,0,1), INT02 = runif(3000,0,1), INT03 = runif(3000,0,1))
write.table(dset,"C:/testing_write_table.csv", #save as .txt if you want to open it with notepad as well as excel
sep = "|",row.names = FALSE, col.names = FALSE, na = "", quote = FALSE, eol = "\n")
Using the line-break symbol '\n' as the end-of-line argument creates separate lines for me.
It turns out this was a UNIX/Windows encoding issue. So it was something of a red herring, but perhaps worth recording in case anyone else runs into this at first perplexing issue.
Windows Notepad sometimes struggles to render files generated on UNIX properly. A quick test to see whether this is the issue is to open the file in Windows WordPad instead; you may find that it renders properly there.
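Rather than relying on how an editor renders the file, the line endings can be checked directly from the bytes. A minimal sketch; has_crlf and to_lf are hypothetical helper names, and the sample file content is invented.

```r
# Hypothetical helper: TRUE if the file contains at least one "\r\n" pair.
has_crlf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  n >= 2 && any(bytes[-n] == as.raw(0x0d) & bytes[-1] == as.raw(0x0a))
}

# Hypothetical helper: rewrite a file with Unix ("\n") line endings.
to_lf <- function(src, dest) {
  lines <- readLines(src, warn = FALSE)   # accepts LF, CRLF, or CR as EOL
  con <- file(dest, open = "wb")          # binary mode: no newline translation
  on.exit(close(con))
  writeLines(lines, con, sep = "\n")
  invisible(dest)
}

# Invented sample file with Windows line endings.
win_file <- tempfile(fileext = ".csv")
writeBin(charToRaw("AAAA|0.1\r\nBBBB|0.2\r\n"), win_file)
unix_file <- to_lf(win_file, tempfile(fileext = ".csv"))
```

Calling has_crlf() on the source and on the converted file shows whether the conversion took effect.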
