In R: Is it possible to avoid having a blank line at the end of a text file generated by writeLines? If not, is there any other way of generating a text file from within R without having a blank line at the end?
There is no blank line.
R (correctly) ends each line with '\n' (or '\r\n' on Windows). In other words, the file consists of lines, and each line ends with a line break.
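You can check for yourself that the last line simply ends with a line break, and is not followed by an extra empty line, by looking at the raw bytes. Here is a minimal sketch using a temporary file (the file name is arbitrary):

```r
# Write three lines to a temporary file with the default sep = "\n"
path <- tempfile(fileext = ".txt")
writeLines(c("line1", "line2", "line3"), path)

# Read the file back as raw bytes
bytes <- readBin(path, what = "raw", n = file.size(path))

# The final byte is a line feed (0x0a), the terminator of "line3"
# (on Windows it is preceded by a carriage return, 0x0d)
print(tail(bytes, 2))

# Reading the file back gives 3 lines, not 4: there is no empty fourth line
print(length(readLines(path)))
```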
Unfortunately, there are many tools (especially on Windows) which treat such files incorrectly and display an extra line at the end. However, that’s a fault with these tools, not with R. Consequently, this shouldn’t be fixed in the R code.
As a hack to appease buggy tools, the only recourse is to set the sep argument of writeLines to the empty string, '', and insert the line breaks between lines manually (using paste).
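Here is a minimal sketch of that workaround. The lines are joined with `paste(..., collapse = "\n")` and written with `sep = ""`, so nothing follows the last line. The file is a temporary placeholder.

```r
lines <- c("line1", "line2", "line3")
path <- tempfile(fileext = ".txt")

# Put "\n" only *between* lines, then write with sep = "" so that
# writeLines() does not add a line break after the last line
writeLines(paste(lines, collapse = "\n"), path, sep = "")

bytes <- readBin(path, what = "raw", n = file.size(path))
# The last byte is now the "3" of "line3", not a line feed
print(rawToChar(tail(bytes, 1)))

# readLines() still recovers all three lines, but it warns about an
# "incomplete final line": that is exactly the non-standard layout
print(suppressWarnings(readLines(path)))
```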
I had exactly the same concern (different grid, though), and even your comment on the accepted answer (by Konrad) did not work for me.
I found the answer here, and here is the full code:
fileConn <- file("mytext.txt")
writeLines(c("line1", "line2", "line3"), fileConn, sep = "\n")
close(fileConn)
# Now connect to the Unix server and upload the file
library(ssh)
session <- ssh_connect("user@server.com")
scp_upload(session, files = "mytext.txt")
# Here is the trick: convert the Windows line endings (\r\n) to Unix ones (\n)
ssh_exec_wait(session, command = "dos2unix mytext.txt")
# Then start your grid job
ssh_exec_wait(session, command = "sbatch mytext.txt")
ssh_disconnect(session)
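Instead of running dos2unix on the server, you could avoid the Windows line endings in the first place. On Windows, R translates `\n` to `\r\n` only on connections opened in text mode, so a connection opened in binary mode (`"wb"`) keeps plain `\n`. This is a sketch, assuming the rest of the upload workflow stays as above. The temporary path stands in for "mytext.txt".

```r
# Open the connection in binary mode so "\n" is written as-is (LF only),
# even on Windows, where a text-mode connection would write "\r\n"
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeLines(c("line1", "line2", "line3"), con, sep = "\n")
close(con)

bytes <- as.integer(readBin(path, what = "raw", n = file.size(path)))
# No carriage returns (byte 13) on any platform; one line feed per line
print(sum(bytes == 13L))
print(sum(bytes == 10L))
```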
Can you help with converting a big text file?
A sample of the text:
X1"II"ID_Sitze.x"II"Produktionsdatum.x"II"Herstellernummer.x"II"Werksnummer.x"II"Fehlerhaft.x"II"Fehlerhaft_Datum.x"II"Fehlerhaft_Fahrleistung.x"II"ID_Sitze.y"II"Produktionsdatum.y"II"Herstellernummer.y"II"Werksnummer.y"II"Fehlerhaft.y"II"Fehlerhaft_Datum.y"II"Fehlerhaft_Fahrleistung.y""1"II1II"K2LE1-109-1091-2"II2008-11-12II"109"II1091II1II2010-10-18II37080IINAIINAIINAIINAIINAIINAIINA"2"II2II"K2LE1-109-1091-1"II2008-11-12II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"3"II3II"K2LE1-109-1091-12"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"4"II4II"K2LE1-109-1091-5"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"5"II5II"K2LE1-109-1091-40"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"6"II6II"K2LE1-109-1091-15"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"7"II7II"K2LE1-109-1091-31"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"8"II8II"K2LE1-109-1091-6"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"9"II9II"K2LE1-109-1091-8"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"10"II10II"K2LE1-109-1091-25"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"11"II11II"K2LE1-109-1091-24"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"12"II12II"K2LE1-109-1091-36"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"13"II13II"K2LE1-109-1091-33"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"14"II14II"K2LE1-109-1091-42"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"15"II15II"K2LE1-109-1091-14"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"16"II16II"K2LE1-109-1091-21"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"17"II17II"K2LE1-109-1091-43"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"18"II18II"K2LE1-109-1091-44"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA"19"II19II"K2LE1-109-1091-19"II2008-11-13II"109"II1091II1II2010-10-1
9II37
I want to convert it into a data frame, using "II" as the separator.
I have used:
df_BSt7<-readLines("Komponente_K2LE1.txt")
df_BST7<-str_replace_all(df_BSt7,"II",",")
df_BST7<-read.table(df_BST7,sep = ",")
head(df_BST7)
but I always get an error:
could not allocate memory (206 Mb) in C function 'R_AllocStringBuffer'
and when I call head(), I get:
'"X1","ID_Sitze.x","Produktionsdatum.x","Herstellernummer.x","Werksnummer.x","Fehlerhaft.x","Fehlerhaft_Datum.x","Fehlerhaft_Fahrleistung.x","ID_Sitze.y","Produktionsdatum.y","Herstellernummer.y","Werksnummer.y","Fehlerhaft.y","Fehlerhaft_Datum.y","Fehlerhaft_Fahrleistung.y""1",1,"K2LE1-109-1091-2",2008-11-12,"109",1091,1,2010-10-18,37080,NA,NA,NA,NA,NA,NA,NA"2",2,"K2LE1-109-1091-1",2008-11-12,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"3",3,"K2LE1-109-1091-12",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"4",4,"K2LE1-109-1091-5",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"5",5,"K2LE1-109-1091-40",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"6",6,"K2LE1-109-1091-15",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"7",7,"K2LE1-109-1091-31",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"8",8,"K2LE1-109-1091-6",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"9",9,"K2LE1-109-1091-8",2008-11-13,"109",1091,0,NA,0,NA,NA,NA,NA,NA,NA,NA"10",10,"K2LE1-109-109 [... truncated]
So, there are several possible problems, some of which might be specific to your example.
Clean example data
First, let's take a look at your example data. In what you provided, there are no newlines: everything is on a single line. Is that also the case in the original "Komponente_K2LE1.txt" file? If so, we will need some more work to find where to add the newlines (see below).
The first column name, X1, only has a quote on the right. It can't work without a quote on the left as well: "X1"II"ID_Sitze.x".
The saved data frame has 16 columns. I expect that's because each row starts with a row number that has no matching name in the header. So we can add an extra column name to get 16 of them:
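If the real file already has proper line breaks, both header fixes can be applied to the first line only, leaving the data untouched. This sketch works on two shortened in-memory lines. With the real file, `lines` would come from `readLines("Komponente_K2LE1.txt")`.

```r
# Two shortened example lines in the original format
# (the header starts with X1" and has no opening quote)
lines <- c(
  'X1"II"ID_Sitze.x"II"Produktionsdatum.x"',
  '"1"II1II"K2LE1-109-1091-2"II2008-11-12'
)

# Add the missing opening quote in front of X1
lines[1] <- sub('^X1"', '"X1"', lines[1])
# Prepend a name for the row-number column
lines[1] <- paste0('"row_nb"II', lines[1])

print(lines[1])
```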
"row_nb"II"X1"II"ID_Sitze.x"II"Produktionsdatum.x"II"Herstellernummer.x"II"Werksnummer.x"II"Fehlerhaft.x"II"Fehlerhaft_Datum.x"II"
Then there is a small problem with line 19, which is truncated. I assume that comes from your copy/paste and is not a problem in the full file, so let's ignore it for now. That leaves this text:
raw_lines <- '"row_nb"II"X1"II"ID_Sitze.x"II"Produktionsdatum.x"II"Herstellernummer.x"II"Werksnummer.x"II"Fehlerhaft.x"II"Fehlerhaft_Datum.x"II"Fehlerhaft_Fahrleistung.x"II"ID_Sitze.y"II"Produktionsdatum.y"II"Herstellernummer.y"II"Werksnummer.y"II"Fehlerhaft.y"II"Fehlerhaft_Datum.y"II"Fehlerhaft_Fahrleistung.y"
"1"II1II"K2LE1-109-1091-2"II2008-11-12II"109"II1091II1II2010-10-18II37080IINAIINAIINAIINAIINAIINAIINA
"2"II2II"K2LE1-109-1091-1"II2008-11-12II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"3"II3II"K2LE1-109-1091-12"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"4"II4II"K2LE1-109-1091-5"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"5"II5II"K2LE1-109-1091-40"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"6"II6II"K2LE1-109-1091-15"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"7"II7II"K2LE1-109-1091-31"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"8"II8II"K2LE1-109-1091-6"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"9"II9II"K2LE1-109-1091-8"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"10"II10II"K2LE1-109-1091-25"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"11"II11II"K2LE1-109-1091-24"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"12"II12II"K2LE1-109-1091-36"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"13"II13II"K2LE1-109-1091-33"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"14"II14II"K2LE1-109-1091-42"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"15"II15II"K2LE1-109-1091-14"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"16"II16II"K2LE1-109-1091-21"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"17"II17II"K2LE1-109-1091-43"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA
"18"II18II"K2LE1-109-1091-44"II2008-11-13II"109"II1091II0IINAII0IINAIINAIINAIINAIINAIINAIINA'
Next, you replace "II" with "," and read the result with read.table(). That is the right idea, except that read.table() assumes its first argument is a file name and throws an error because it can't open that file. To pass the text itself, use the text argument:
library(stringr)
df_BST7 <- str_replace_all(raw_lines, 'II', ",")
df_BST7 <- read.table(text = df_BST7, sep = ",", header = TRUE)
With that change, it runs on my computer.
Side note: since you're already using the tidyverse, you could just as well use this equivalent code instead:
library(readr)
library(stringr)
df_BST7 <- str_replace_all(raw_lines, 'II', ",")
df_BST7 <- read_csv(df_BST7)
which may come in handy later (see read_csv_chunked() below).
The error message
The error message you get suggests a memory problem. I see two possibilities: either the table is too big to fit in your computer's memory, or your whole input table is on a single line, and that one very long string won't fit in memory.
Whole table too big
I don't think that's the problem here, but just in case, check how big the file is on disk, how much memory is free on your computer, and whether closing a few programs would free up enough. You could also save your modified text to disk, delete the original from R's memory with rm(df_BSt7), and then load the modified text directly from disk into df_BST7. Since the raw text fits in memory, that should work. If memory is still tight, you can replace read_csv() with read_csv_chunked() and process one chunk at a time.
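Here is a rough sketch of chunked reading with readr's `read_csv_chunked()`. It assumes the converted comma-separated text has already been saved to disk. The file, column names and filter are made up for illustration.

```r
library(readr)

# Hypothetical stand-in for the converted CSV saved on disk
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value", paste0(1:25, ",", (1:25) * 2)), csv_path)

# Callback applied to each chunk: keep only the rows we need,
# so the full table never has to sit in memory at once
keep_large <- function(chunk, pos) {
  chunk[chunk$value > 30, ]
}

# Read 10 rows at a time; DataFrameCallback binds the kept rows together
filtered <- read_csv_chunked(
  csv_path,
  callback = DataFrameCallback$new(keep_large),
  chunk_size = 10,
  col_types = cols(id = col_integer(), value = col_integer())
)
print(nrow(filtered))
```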
All on one line
I think this is the most likely cause. Again, there are two possibilities.
Missing carriage return
Line breaks can be encoded in two ways: Unix-like systems (macOS and GNU/Linux) use a single newline character (\n), whereas Windows uses a carriage return followed by a newline (\r\n). I'm not sure how this could cause problems inside R, but if your file was generated on a Unix-like system and you're reading it on Windows, that could be the explanation. The goal would then be to replace \n with \r\n.
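To find out which case applies, you can count the carriage returns and line feeds in the file's raw bytes. In the sketch below, `count_line_endings()` is a hypothetical helper, and the demo file uses LF-only endings. For the real data, the path would be "Komponente_K2LE1.txt". Note that reading it whole as raw bytes needs as much memory as the file size.

```r
# Hypothetical helper: count carriage returns (\r, byte 13)
# and line feeds (\n, byte 10) in a file
count_line_endings <- function(path) {
  bytes <- as.integer(readBin(path, what = "raw", n = file.size(path)))
  c(CR = sum(bytes == 13L), LF = sum(bytes == 10L))
}

# Demo file with Unix (LF-only) line endings, written byte for byte
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("a,b\n1,2\n3,4\n"), path)

# CR == 0 and LF > 0: Unix endings; CR == LF: Windows endings;
# both 0: no line breaks at all
print(count_line_endings(path))
```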
No line breaks at all
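If there are no line breaks at all, neither \r nor \n, we need to work out where they should be. On a Unix system you could try awk or sed, but it can also be done in R. The following code should work, except that the last column will need some cleaning up afterwards, because each row number ends up glued to the last field of the previous line:
library(stringr)
library(purrr)
# raw_lines2 is the whole file as one string, with the "row_nb" header added as above;
# here it is simulated by removing the line breaks from raw_lines
raw_lines2 <- str_remove_all(raw_lines, "[\r\n]")
all_fields <- raw_lines2 %>%
  str_split("II") %>%
  unlist()
# After the first field ("row_nb"), each line contributes 15 fields,
# since its row number is fused with the previous line's last field
nb_lines <- (length(all_fields) - 1)/15
reconstruct_lines <- map_chr(0:(nb_lines-1), ~ paste(all_fields[(2+15*.):(16+15*.)], collapse = ",")) %>%
  paste(collapse = "\n")
cat(reconstruct_lines)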
I am facing a problem with the fwrite function from the data.table package in R: it appends incorrectly, and I end up with something like:
user ref version status Type DataExtraction
user1 2.02E+11 1 Pending 1 No
user2 2.02E+11 1 Saved 2 No"user3" 2.01806E+11 1 Saved NB No
I am using the function as follows:
library(data.table)
fwrite(Save, "~/Downloads/Register.csv", append = TRUE, sep = ",", quote = TRUE)
Reproducible example:
fwrite(data.table(user="user3",
ref="204094093",
version="2",
status="Pending",
Type="1",DataExtraction="No"),
"~/Downloads/test.csv", sep = ",", append = FALSE)
fwrite(data.table(user="user3",
ref="204094093",
version="2",
status="Pending",
Type="1",DataExtraction="No"),
"~/Downloads/test.csv", sep = ",", append = TRUE)
I'm not sure whether this isolates the problem, but it seems that if I manually change something in the .csv file (for instance, rename DataExtraction to Extraction), the incorrect appending occurs.
Does someone know what is going wrong?
Thanks!
When I run your example code, I see no problems: the file comes out as desired. Based on your comments about manually changing the file, and on what the undesired output looks like, here is what I believe is happening. When fwrite() (like many similar I/O functions) writes to a file, every line ends with a line break (in R, this is generally represented as \n). That is what makes subsequent rows of data appear on subsequent lines of the file. It also means that when you open the file in a text editor, you will often see an empty line at the very end, because the editor displays the line break after the last row (different editors handle this differently, though). So I suspect that when you manually edit the file in your editor, that final line break gets lost. Then, when you write again with append = TRUE, there is no line break at the end of the file, and the new row is glued onto the last existing one, giving you two rows of data on a single line.
So the solution is either to stop your manual editing from deleting that final line break or, failing that, to write a single line break character to the file from R before appending, e.g. with the cat() function.
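Here is a sketch of the second option. `ensure_trailing_newline()` is a hypothetical helper: it checks the last byte of the file and appends `"\n"` with `cat()` only when it is missing. The demo uses `cat()` for the appended row so that it runs with base R only.

```r
# Hypothetical helper: append "\n" to a file unless it already ends with one
ensure_trailing_newline <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) return(invisible(FALSE))
  # Read the file as raw bytes (simple, though it loads the whole file)
  bytes <- readBin(path, what = "raw", n = size)
  if (as.integer(bytes[size]) != 10L) {
    cat("\n", file = path, append = TRUE)
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

# Simulate a file whose final line break was lost during manual editing
path <- tempfile(fileext = ".csv")
cat("user,ref\nuser1,204094093", file = path)

ensure_trailing_newline(path)
# The appended row now starts on its own line
cat("user3,204094093\n", file = path, append = TRUE)
print(readLines(path))
```

With the code from the question, you would call `ensure_trailing_newline("~/Downloads/Register.csv")` just before `fwrite(Save, "~/Downloads/Register.csv", append = TRUE, sep = ",", quote = TRUE)`.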
I have a CSV file, and I want to remove all line feeds (LF or \n) that occur between double quotes, and only those.
Can you please provide a Unix script to perform this task? I have given the input and expected output below.
Input :
No,Status,Date
1,"Success
Error",1/15/2018
2,"Success
Error
NA",2/15/2018
3,"Success
Error",3/15/2018
Expected output:
No,Status,Date
1,"Success Error",1/15/2018
2,"Success Error NA",2/15/2018
3,"Success Error",3/15/2018
I can't write the whole script for you, since I don't know your system or which bash version it runs. But here are a few resources you might want to consider:
https://www.unix.com/shell-programming-and-scripting/31021-removing-line-breaks-shell-variable.html
https://www.unix.com/shell-programming-and-scripting/19484-remove-line-feeds.html
How to remove carriage return from a string in Bash
https://unix.stackexchange.com/questions/57124/remove-newline-from-unix-variable
Remove line breaks in Bourne Shell from variable
https://unix.stackexchange.com/questions/254644/how-do-i-remove-newline-character-at-the-end-of-file
https://serverfault.com/questions/391360/remove-line-break-using-awk
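This is not the Unix script that was asked for, but since the rest of this page uses R, here is a sketch of the same transformation in R. `read.csv()` already keeps a quoted field together when it spans several lines, so the line feeds only need replacing afterwards. The input is the example from the question. For a real file, you would use `read.csv("input.csv")`, where the file name is a placeholder.

```r
input <- 'No,Status,Date
1,"Success
Error",1/15/2018
2,"Success
Error
NA",2/15/2018
3,"Success
Error",3/15/2018'

# read.csv keeps a quoted field together even when it spans several lines
df <- read.csv(text = input, stringsAsFactors = FALSE)

# Replace the embedded line feeds (and any carriage returns) with spaces
df$Status <- gsub("\r?\n", " ", df$Status)

# Rebuild the lines, quoting only the Status column as in the expected output
out <- c("No,Status,Date", paste0(df$No, ',"', df$Status, '",', df$Date))
writeLines(out)
```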