Avoid blank line at end of file when using writeLines - r

In R: Is it possible to avoid having a blank line at the end of a text file generated by writeLines? If not, is there any other way of generating a text file from within R without having a blank line at the end?

There is no blank line.
R (correctly) ends each line with '\n' (or '\r\n' on Windows). In other words, the file consists of lines, and each line ends with a line break.
Unfortunately, there are many tools (especially on Windows) which treat such files incorrectly and display an extra line at the end. However, that’s a fault with these tools, not with R. Consequently, this shouldn’t be fixed in the R code.
As a hack to appease buggy tools, the only recourse is to set the sep argument of writeLines to the empty string, '', and insert the line breaks between lines manually (using paste).

I had exactly the same concern (different grid, though) and even your comment of the accept answer (by Konrad) did not work for me.
I found the answer here, and here is the full code:
fileConn = file("mytext.txt")
writeLines(c("line1", "line2", "line3"), sep="\n", fileConn)
#now connect to UNIX server and upload your file
library(ssh)
session=ssh_connect("user#server.com")
scp_upload(session, files="mytext.txt")
#Here is the trick, convert all the Windows extra chars to unix
ssh_exec_wait(session, command="dos2unix mytext.txt")
#Then start your Grid job
ssh_exec_wait(session, command="sbatch mytext.txt")
ssh_disconnect(session)

Related

R exports a blank last line in text file by "write.table()" [duplicate]

In R: Is it possible to avoid having a blank line at the end of a text file generated by writeLines? If not, is there any other way of generating a text file from within R without having a blank line at the end?
There is no blank line.
R (correctly) ends each line with '\n' (or '\r\n' on Windows). In other words, the file consists of lines, and each line ends with a line break.
Unfortunately, there are many tools (especially on Windows) which treat such files incorrectly and display an extra line at the end. However, that’s a fault with these tools, not with R. Consequently, this shouldn’t be fixed in the R code.
As a hack to appease buggy tools, the only recourse is to set the sep argument of writeLines to the empty string, '', and insert the line breaks between lines manually (using paste).
I had exactly the same concern (different grid, though) and even your comment of the accept answer (by Konrad) did not work for me.
I found the answer here, and here is the full code:
fileConn = file("mytext.txt")
writeLines(c("line1", "line2", "line3"), sep="\n", fileConn)
#now connect to UNIX server and upload your file
library(ssh)
session=ssh_connect("user#server.com")
scp_upload(session, files="mytext.txt")
#Here is the trick, convert all the Windows extra chars to unix
ssh_exec_wait(session, command="dos2unix mytext.txt")
#Then start your Grid job
ssh_exec_wait(session, command="sbatch mytext.txt")
ssh_disconnect(session)

R Dataframe from a Text file with 2 Byts Separator

if you can help with converting a big text:
sample of the text :
X1"II"ID_Sitze.x"II"Produktionsdatum.x"II"Herstellernummer.x"II"Werksnummer.x"II"Fehlerhaft.x"II"Fehlerhaft_Datum.x"II"Fehlerhaft_Fahrleistung.x"II"ID_Sitze.y"II"Produktionsdatum.y"II"Herstellernummer.y"II"Werksnummer.y"II"Fehlerhaft.y"II"Fehlerhaft_Datum.y"II"Fehlerhaft_Fahrleistung.y""1"II1II"K2LE1-109-1091-2"II2008-11-12II"109"II1091II1II2010-10-18II37080IINAIINAIINAIINAIINAIINAIINA"2"II2II"K2LE1-109-1091-1"II2008-11-12II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"3"II3II"K2LE1-109-1091-12"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"4"II4II"K2LE1-109-1091-5"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"5"II5II"K2LE1-109-1091-40"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"6"II6II"K2LE1-109-1091-15"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"7"II7II"K2LE1-109-1091-31"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"8"II8II"K2LE1-109-1091-6"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"9"II9II"K2LE1-109-1091-8"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"10"II10II"K2LE1-109-1091-25"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"11"II11II"K2LE1-109-1091-24"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"12"II12II"K2LE1-109-1091-36"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"13"II13II"K2LE1-109-1091-33"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"14"II14II"K2LE1-109-1091-42"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"15"II15II"K2LE1-109-1091-14"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"16"II16II"K2LE1-109-1091-21"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"17"II17II"K2LE1-109-1091-43"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"18"II18II"K2LE1-109-1091-44"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"19"II19II"K2LE1-109-1091-19"II2008-11-13II"109"II1091II1II2010-10-19II37
with separator "II" to a Dataframe.
i have used :
df_BSt7<-readLines("Komponente_K2LE1.txt")
df_BST7<-str_replace_all(df_BSt7,"II",",")
df_BST7<-read.table(df_BST7,sep = ",")
head(df_BST7)
but I am always getting an Error
could not allocate memory (206 Mb) in C function 'R_AllocStringBuffer'
and when i call head() I am getting
'"X1","ID_Sitze.x","Produktionsdatum.x","Herstellernummer.x","Werksnummer.x","Fehlerhaft.x","Fehlerhaft_Datum.x","Fehlerhaft_Fahrleistung.x","ID_Sitze.y","Produktionsdatum.y","Herstellernummer.y","Werksnummer.y","Fehlerhaft.y","Fehlerhaft_Datum.y","Fehlerhaft_Fahrleistung.y""1",1,"K2LE1-109-1091-2",2008-11-12,"109",1091,1,2010-10-18,37080,NA,NA,NA,NA,NA,NA,NA"2",2,"K2LE1-109-1091-1",2008-11-12,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"3",3,"K2LE1-109-1091-12",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"4",4,"K2LE1-109-1091-5",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"5",5,"K2LE1-109-1091-40",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"6",6,"K2LE1-109-1091-15",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"7",7,"K2LE1-109-1091-31",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"8",8,"K2LE1-109-1091-6",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"9",9,"K2LE1-109-1091-8",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"10",10,"K2LE1-109-109 [... abgeschnitten]
So, there are several possible problems, some might be specific to your examples.
Clean example data
First, let's take a look at your example data. In what you provide, there are no newlines, everything is on a single line. Is that the case in the original "Komponente_K2LE1.txt" file? If yes, we might need some more work to find where to add newlines (see below).
The first column name, X1, only has a quote on the right. It can't work without the quote on the left: "X1"IIID_Sitze.
The saved dataframe has 16 columns, I expect because there is a row number at the beginning of each row which is not in the header. So we can add an additional column header to have 16 of them:
"row_nb"II"X1"II"ID_Sitze.x"II"Produktionsdatum.x"II"Herstellernummer.x"II"Werksnummer.x"II"Fehlerhaft.x"II"Fehlerhaft_Datum.x"II"
Then we have a small problem with line 19 which is truncated, I assume it comes from your copy/paste and that's not a problem with the full file. So let's forget about it for now. So I have this text:
raw_lines <- '"row_nb"II"X1"II"ID_Sitze.x"II"Produktionsdatum.x"II"Herstellernummer.x"II"Werksnummer.x"II"Fehlerhaft.x"II"Fehlerhaft_Datum.x"II"Fehlerhaft_Fahrleistung.x"II"ID_Sitze.y"II"Produktionsdatum.y"II"Herstellernummer.y"II"Werksnummer.y"II"Fehlerhaft.y"II"Fehlerhaft_Datum.y"II"Fehlerhaft_Fahrleistung.y"
"1"II1II"K2LE1-109-1091-2"II2008-11-12II"109"II1091II1II2010-10-18II37080IINAIINAIINAIINAIINAIINAIINA
"2"II2II"K2LE1-109-1091-1"II2008-11-12II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"3"II3II"K2LE1-109-1091-12"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"4"II4II"K2LE1-109-1091-5"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"5"II5II"K2LE1-109-1091-40"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"6"II6II"K2LE1-109-1091-15"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"7"II7II"K2LE1-109-1091-31"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"8"II8II"K2LE1-109-1091-6"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"9"II9II"K2LE1-109-1091-8"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"10"II10II"K2LE1-109-1091-25"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"11"II11II"K2LE1-109-1091-24"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"12"II12II"K2LE1-109-1091-36"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"13"II13II"K2LE1-109-1091-33"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"14"II14II"K2LE1-109-1091-42"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"15"II15II"K2LE1-109-1091-14"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"16"II16II"K2LE1-109-1091-21"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"17"II17II"K2LE1-109-1091-43"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"18"II18II"K2LE1-109-1091-44"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA'
Now you are replacing "II" with "," and reading it with read.table(), which is perfectly correct, except that read.table() would assume you're giving a file name and throw an error as it can't open that connection (that file). To make it work you need this:
df_BST7<-str_replace_all(raw_lines,'II',",")
df_BST7 <- read.table(text = df_BST7,sep = ",")
So now that does run on my computer.
Side note, since you're already using the tidyverse, you could as well use that equivalent code instead:
df_BST7 <- str_replace_all(raw_lines,'II',",")
df_BST7 <- read_csv(df_BST7)
which could help with something later
The error message
Now the error message you get suggests it's a memory problem. I see 2 possibilities: the table is so big it can't fit in your computer's memory, or indeed your whole input table is on a single line, so that makes a very long line, which won't fit in memory.
Whole table too big
I don't think it's the problem here, but just in case, check how big the file on the disk is, and how much memory is free on your computer, and whether you could free up enough memory by just closing a few programs. Possibly you could save your modified text to disk and delete it from R's memory with rm(df_BSt7), then load it directly from disk into df_BST7. Since the raw text fits in memory, that should work. If memory is a challenge, you can replace read_csv() with read_csv_chunked() and process one chunk at a time.
All on one line
I think this is the most likely. Again, there are two possibilities.
Missing carriage return
Actually line breaks can be described in 2 ways, Unix-like systems (MacOS and GNU/Linux) use the symbol newline (\n), whereas Windows uses a pair of carriage return and newline (\r\n). I'm not sure how this could create problems inside R, but if your file was generated on a Unix-like system and you're trying to read it on Windows that's an explanation. Then the goal would become to replace \n with \r\n.
No line breaks at all
If there is absolutely no line break, neither \r nor \n, then we need to guess where they are. On a Unix system you could try awk or sed, but there are ways to do it in R. The following code should work, except the last column will need some cleaning up afterwards:
raw_lines2 <- str_remove_all(raw_lines2, "\r")
all_fields <- raw_lines2 %>%
str_split("II") %>%
unlist()
nb_lines <- (length(all_fields) - 1)/15
reconstruct_lines <- map_chr(0:(nb_lines-1), ~ paste(all_fields[(2+15*.):(16+15*.)], collapse = ",")) %>%
paste(collapse = "\n")
cat(reconstruct_lines)

fwrite(, append = TRUE) appends wrong way

I am facing a problem with the fwrite function from the DataTable package in R. In fact it appends the wrong way and I'd end up with something like:
**user ref version status Type DataExtraction**
user1 2.02E+11 1 Pending 1 No
user2 2.02E+11 1 Saved 2 No"user3" 2.01806E+11 1 Saved NB No
I am using the function as follows :
library(data.table)
fwrite(Save, "~/Downloads/Register.csv", append = TRUE, sep = ",", quote = TRUE)
Reproducible example:
fwrite(data.table(user="user3",
ref="204094093",
version="2",
status="Pending",
Type="1",DataExtraction="No"),
"~/Downloads/test.csv", sep = ",", append = FALSE)
fwrite(data.table(user="user3",
ref="204094093",
version="2",
status="Pending",
Type="1",DataExtraction="No"),
"~/Downloads/test.csv", sep = ",", append = TRUE)
I'm not sure if it isolates the problem, but it seems that if I manually change something in the .csv file (for instance rename DataExtraction to Extraction), the problem of appending in the wrong way occurs.
Does someone know what is going wrong?
Thanks!
When I run your example code I have no problems with the behavior - the file comes out as desired. Based on your comments about manually changing what is in the file, and what the undesired output looks like, here is what I believe is happening. When fwrite() (and many other similar IO functions) write to a file, each line has at the end of it a line break (in R, this is generally represented as \n). This is desired, so that subsequent lines of data indeed appear on subsequent lines of the file. Generally this will also mean that when you open a file in a text editor, there will be a blank line at the very end, since this reflects the line break in the last line that was written. (different editors handle this differently though). So, I suspect what is happening is that when you go in and manually edit the file in your editor, you are somehow losing that last line break. What this means is that when you go to write again using append, there is no line break at the end of the file, and therefore you get the undesired behavior of two lines of data on a single line of the file.
So, the solution would be to either find how to prevent your manual editing from deleting that last line break character. Barring that, there are ways to just write a single line break character to the file using R. E.g. with the cat() function.

Remove line feed in CSV using Unix script

I have a CSV file and I want to remove the all line feeds (LF or \n) which are all coming in between the double quotes alone.
Can you please provide me an Unix script to perform the above task. I have given the input and expected output below.
Input :
No,Status,Date
1,"Success
Error",1/15/2018
2,"Success
Error
NA",2/15/2018
3,"Success
Error",3/15/2018
Expected output:
No,Status,Date
1,"Success Error",1/15/2018
2,"Success Error NA",2/15/2018
3,"Success Error",3/15/2018
I can't write everything for you, as I am not sure about your system as well as which bash version that is running on it. But here are a couple of suggestions that you might want to consider.
https://www.unix.com/shell-programming-and-scripting/31021-removing-line-breaks-shell-variable.html
https://www.unix.com/shell-programming-and-scripting/19484-remove-line-feeds.html
How to remove carriage return from a string in Bash
https://unix.stackexchange.com/questions/57124/remove-newline-from-unix-variable
Remove line breaks in Bourne Shell from variable
https://unix.stackexchange.com/questions/254644/how-do-i-remove-newline-character-at-the-end-of-file
https://serverfault.com/questions/391360/remove-line-break-using-awk

Using com.opencsv.CSVReader on windows stops reading lines prematurely

I have two files that are identical except for the line ending codes. The one that uses the newline (linux/Unix)character works (reads all 550 rows of data) and the one that uses carriage return and line feed (Windows) stops returning lines after reading 269 lines. In both cases the data is read correctly up to the point where they stop.
If I run dos2unix on the file that fails, the resulting file works.
I would like to be able read CSV files regardless of their origin. If I could at least detect that the file is in the wrong format before reading part of the data that would be helpful
Even if I could tell at any time in the middle of reading the file that it was not going to work, I could output an error.
My current state of reading half the file and terminating with no error is dangerous.
The problem is that under the covers openCSV uses a BufferedReader which reads a line from the stream until it gets to the Systems line.seperator.
If you know beforehand what the line separator of the file is then in your application just do a System.setProperty("line.separator", newLine) where newLine is either "\n" or "\r\n" based on the file you are about to parse. Or you can pass that in as a parameter.
If you want to automatically detect the file character. Create a method that will take the file you want, create a BufferedReader and read a single line. If the last character is a '\r' then your system system uses "\n" but you want to set it to "\r\n". Else if line.contains("\n") returns true then you are on a system that uses "\r\n" and you want to set it to "\n". Otherwise the system and the file you are reading have compatible line feed characters.
Just note if you do change the system line feed character be sure to set it back after processing the file in case your program is processing multiple files.

Resources