'Incomplete final line' warning when trying to read a .csv file into R - r

I'm trying to read a .csv file into R, and upon using this command:
pheasant<-read.table(file.choose(),header=TRUE,sep=",")
I get this warning message:
"incomplete final line found by readTableHeader on 'C:\Documents and Settings..."
There are a couple of things I thought may have caused this warning, but unfortunately I don't know enough about R to diagnose the problem myself so I thought I'd post here in the hope someone else can diagnose it for me!
the .csv file was originally an Excel file, which I saved into .csv format
the file comprises three columns of data
each data column is of a differing length, i.e. there are a different number of values in each column
I want to compare the means (using t-test or equivalent depending on normal / not normal distribution) of two of the columns at a time, so for example, t-test between column 1 values and column 2 values, then a t-test of column 1 and column 3 values, etc.
Any help or suggestions would be seriously appreciated!

The message indicates that the last line of the file doesn't end with an End Of Line (EOL) character (linefeed (\n) or carriage return+linefeed (\r\n)). The original intention of this message was to warn you that the file may be incomplete; most datafiles have an EOL character as the very last character in the file.
The remedy is simple:
Open the file
Navigate to the very last line of the file
Place the cursor at the end of that line
Press return
Save the file
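For those who prefer not to edit files by hand, the same check and fix can be done from R itself. A minimal sketch ("test.csv" is a placeholder filename):

```r
path <- "test.csv"  # placeholder filename
# Read the file's raw bytes and look at the last one:
# 0x0a is "\n" (linefeed). If it is missing, append one.
bytes <- readBin(path, what = "raw", n = file.size(path))
if (tail(bytes, 1) != as.raw(0x0a)) {
  cat("\n", file = path, append = TRUE)
}
```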

The problem is easy to resolve:
the file MUST end with a newline, so the last line appears empty.
Say, if your content is
line 1,
line2
change it to
line 1,
line2
(empty line here)
I ran into this kind of problem today when trying to use R to read a JSON file with the command below:
json_data<-fromJSON(paste(readLines("json01.json"), collapse=""))
and I resolved it with the method above.

Are you really sure that you selected the .csv file and not the .xls file? I can only reproduce the error if I try to read in an .xls file. If I try to read in a .csv file or any other text file, it's impossible to recreate the error you get.
> Data <- read.table("test.csv",header=T,sep=",")
> Data <- read.table("test.xlsx",header=T,sep=",")
Warning message:
In read.table("test.xlsx", header = T, sep = ",") :
incomplete final line found by readTableHeader on 'test.xlsx'
readTableHead is the C function that gives the warning. It tries to read in the first n lines (by default the first 5) to determine the type of the data. The rest of the data is read in using scan(). So the problem is the format of the file.
One way of finding out is to set the working directory to the directory where the file is. That way you see the extension of the file you read in. I know that on Windows extensions are hidden by default, so you might believe a file is .csv when it isn't.
The next thing you should do is open the file in Notepad or WordPad (or another editor) and check that the format is equivalent to my file test.csv:
Test1,Test2,Test3
1,1,1
2,2,2
3,3,3
4,4,
5,5,
,6,
This file will give you the following data frame:
> read.table(testfile,header=T,sep=",")
Test1 Test2 Test3
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 NA
5 5 5 NA
6 NA 6 NA
The csv format saved by Excel separates all cells with a comma. Empty cells just don't have a value. read.table() can easily deal with this, and recognizes empty cells just fine.

Use readLines() (with warn = FALSE) to read the file into a character vector first.
After that use the text = option to read the vector into a data frame with read.table()
pheasant <- read.table(
text = readLines(file.choose(), warn = FALSE),
header = TRUE,
sep = ","
)

I realize that several answers have been provided, but no real fix yet.
The reason, as mentioned above, is an "End of Line" missing at the end of the CSV file.
While the real fix should come from Microsoft, the workaround is to open the CSV file with a text editor and add a line at the end of the file (i.e., press the Return key).
I use the Atom editor, but virtually any basic text editor will do.
In the meanwhile, please report the bug to Microsoft.
Question: it seems to me that it is an Office 2016 problem. Does anyone have the issue on a PC?

I solved this problem by changing the encoding in the read.table argument from fileEncoding = "UTF-16" to fileEncoding = "UTF-8".
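In code, that change looks like this (a sketch; the filename is a placeholder):

```r
# Reading a UTF-8 file with an explicit encoding; a mismatched
# fileEncoding (e.g. "UTF-16" for a UTF-8 file) can itself produce
# the "incomplete final line" warning.
df <- read.table("data.csv", header = TRUE, sep = ",",
                 fileEncoding = "UTF-8")
```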

I received the same message. My fix: I deleted all the additional sheets (tabs) from the original Excel file, eliminated non-numeric characters, resaved the file as comma delimited, and loaded it in R v2.15.0 using standard syntax:
filename<-read.csv("filename",header=TRUE)
As an additional safeguard, I closed the software and reopened before I loaded the csv.

In various European locales the comma serves as the decimal point, so the read.csv2 function should be used instead.
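A minimal sketch (assuming a semicolon-separated file with decimal commas, which is what Excel writes in those locales; the filename is a placeholder):

```r
# read.csv2() defaults to sep = ";" and dec = ",".
df <- read.csv2("data_eu.csv", header = TRUE)
# Equivalent long form with read.table():
df <- read.table("data_eu.csv", header = TRUE, sep = ";", dec = ",")
```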

I got this problem once when I had a single quote as part of the header. When I removed it (i.e. renamed the respective column header from Jimmy's data to Jimmys data), the function returned no warnings.

In my case, it was literally the final line. The issue was fixed by literally adding a blank row at the bottom of the CSV file.
FROM
cola,colb,colc
1,2,3
4,5,6
7,8,9
INTO
cola,colb,colc
1,2,3
4,5,6
7,8,9
(empty line here)
Look closely: there is an extra blank line at the very end. Just add that blank line and it will fix the issue.
NOTE
It seems that R's CSV parser expects the file to end with a newline character as the line terminator. This is known to programmers as the \n or \r\n line ending.

The problem that you're describing occurred for me when I renamed a .xlsx as .csv.
What fixed it for me was going "Save As" and then saving it as a .csv again.

To fix this issue through R itself, I just used read.xlsx(..) instead of read.csv(). Works like a charm! You do not even have to rename the file. Renaming an .xlsx to .csv is not a viable solution.
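The answer doesn't say which package provides read.xlsx; both xlsx and openxlsx export a function by that name. A sketch assuming openxlsx:

```r
library(openxlsx)
# Read the first worksheet of the .xlsx file directly;
# no .csv conversion needed.
df <- read.xlsx("data.xlsx", sheet = 1)
```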

Open the file in TextWrangler or Notepad++ and show the formatting; e.g. in TextWrangler you do Show Invisibles. That way you can see the newline and tab characters.
Often Excel will add all sorts of tabs in the wrong places, and not a last newline character, but you need to show the symbols to see this.

My workaround was to open the csv file in a text editor, remove the excess commas after the last value, then save the file. For example, for the following file:
Test1,Test2,Test3
1,1,1
2,2,2
3,3,3
4,4,
5,5,
,6,,
Remove the extra comma after the 6, then save the file.

I've experienced a similar problem; however, this appears to be a generic warning and may not in fact be related to the line-end character. In my case the warning appeared because the file I was using contained Cyrillic characters; once I replaced them with Latin characters the warning disappeared.

I tried different solutions, such as using a text editor to insert a new line and get the End Of Line character, as recommended in the top answer above. None of these worked, unfortunately.
The solution that did finally work for me was very simple: I copy-pasted the content of a CSV file into a new blank CSV file, saved it, and the problem was gone.

There is a quite simple solution (if it is indeed the final line that is causing trouble) where you don't need to open the file before reading it:
cat("\n", file = "your/File/Dir", append = TRUE)
Found this solution here.

Related

Warning message:In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'hola.csv'

Problem: in R I get Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'hola.csv'
To simplify I created a basic table in excel and I have saved it in all the .csv formats it offers (comma separated values, csv UTF8, MS2 csv etc) and the error persists in all of them. I'm working in mac 10.15 catalina, Excel version 16.29.1 (2019).
I changed the language of my laptop from Spain to Uk, selecting , for groups and . for decimals, as some people here suggested it may be due to some countries languages by default using semicolon instead of commas for csv. After this, as expected, csv are indeed created separated by commas, but I still get the warning.
As suggested, if I open the file in textedit and click enter at the end, saving it afterwards, R works perfectly and the error disappears, but it does not seem practical/efficient to do that every single time I want to open a csv. On the other hand it remains a mystery to me why working colleagues using mac UK configuration do not get this error (neither do I when I open csv they have created on their laptops).
Can it be the Excel version? Should I ignore the warning? (the table looks fine when opening it). thanks!
aq2<-read.csv("hola.csv")
That is a warning message generated because R's read.table expects the final line to include an end-of-line character (either \n or \r\n). It's almost always an unnecessary warning. Many programs, including Excel, will create files like that.
You should carefully read the warning message. It says incomplete final line found by readTableHeader. This refers to the last row of your .csv file and suggests that this line is incomplete for R to read. So what could be the problem? If you have a csv (= comma separated values) file, it might well be that each line has a certain formatting. Check whether this formatting is consistently applied throughout the file; such issues often pop up in hand-collected data. If you post an excerpt of your data using tail(aq2), we could have a look at the last lines and check the formatting to answer the issue in more depth. Ultimately, it is just a warning, not an error message. Still, it is important to understand warnings.
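One way to check whether the formatting is applied consistently is count.fields(), which reports how many sep-separated fields each line of the file contains. A sketch using the question's file:

```r
# Number of comma-separated fields on each line of the file.
n <- count.fields("hola.csv", sep = ",")
table(n)          # a single value means every line is consistent
which(n != n[1])  # line numbers of any deviating rows
```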

Fread unusual line ending causing error

I am attempting to download a large database of NYC taxi data, publicly available at the NYC TLC website.
library(data.table)
feb14 <- fread('https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2014-02.csv', header = T)
Executing the above code successfully downloads the data (which takes a few minutes), but then fails to parse due to an internal error. I have tried removing header = T as well.
Is there a workaround to deal with the "unusual line endings" in fread?
Error in fread("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2014-02.csv", :
Internal error. No eol2 immediately before line 3 after sep detection.
In addition: Warning message:
In fread("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2014-02.csv", :
Detected eol as \n\r, a highly unusual line ending. According to Wikipedia the Acorn BBC used this. If it is intended that the first column on the next row is a character column where the first character of the field value is \r (why?) then the first column should start with a quote (i.e. 'protected'). Proceeding with attempt to read the file.
It seems that the issue might be caused by the presence of a blank line between the header and the data in the original .csv file. Deleting that line from the .csv using Notepad++ fixed it for me.
Sometimes other options like read.csv/read.table behave differently, so you can always try those. (Maybe the source code tells why; I haven't looked into that.)
Another option is to use readLines() to read in such a file. As far as I know, no parsing/formatting is done there, so it is the most basic way to read a file.
Finally, a quick fix: use the option skip = ... in fread, or control the end with nrows = ....
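Combining those suggestions, one could read the raw lines, drop the offending blank line, and hand the cleaned text to fread() via its text argument (a sketch; assumes the file has been downloaded locally, and that your data.table version is recent enough to provide the text argument):

```r
library(data.table)
lines <- readLines("yellow_tripdata_2014-02.csv", warn = FALSE)
lines <- lines[nzchar(trimws(lines))]  # drop blank lines
dt <- fread(text = paste(lines, collapse = "\n"))
```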
There is something fishy with fread. data.table is the faster, more performance-oriented option for reading large files, but in this case its behavior is not optimal. You may want to raise this issue on GitHub.
I am able to reproduce the issue on the downloaded file even with nrows = 5 or nrows = 1, but only with the original file. If I copy-paste the first few rows and then try, the issue is gone. The issue also goes away if I read directly from the web with a small nrows. This is not even an encoding issue, hence my recommendation to raise an issue.
I tried reading the file using read.csv with 100,000 rows without an issue, in under 6 seconds.
feb14_2 <- read.csv("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2014-02.csv", header = T, nrows = 100000)
header = T is a redundant argument so would not make a difference for fread but is needed for read.csv.

Deal with escaped commas in CSV file?

I'm reading in a file in R using fread as such
test.set = fread("file.csv", header=FALSE, fill=TRUE, blank.lines.skip=TRUE)
Where my csv consists of 6 columns. An example of a row in this file is
"2014-07-03 11:25:56","61073a09d113d3d3a2af6474c92e7d1e2f7e2855","Securenet Systems Radio Playlist Update","Your Love","Fred Hammond & Radical for Christ","50fcfb08424fe1e2c653a87a64ee92d7"
However, certain rows are formatted in a particular way when there is a comma inside one of the cells. For instance,
"2014-07-03 11:25:59","37780f2e40f3af8752e0d66d50c9363279c55be6","Spotify","\"Hello\", He Lied","Red Box","b226ff30a0b83006e5e06582fbb0afd3"
produces an error of the sort
Expecting 6 cols, but line 5395818 contains text after processing all
cols. Try again with fill=TRUE. Another reason could be that fread's
logic in distinguishing one or more fields having embedded sep=','
and/or (unescaped) '\n' characters within unbalanced unescaped quotes
has failed. If quote='' doesn't help, please file an issue to figure
out if the logic could be improved.
As you can see, the value that is causing the error is "\"Hello\", He Lied", which I want to be read by fread as "Hello, He Lied". I'm not sure how to account for this, though - I've tried using fill=TRUE and quote="" as suggested, but the error still keeps coming up. It's probably just a matter of finding the right parameter(s) for fread; anyone know what those might be?
In read.table() from base R this issue is solvable, using the approach from "Import data into R with an unknown number of columns?".
In fread from data.table this is not possible.
An issue has been logged for this: https://github.com/Rdatatable/data.table/issues/2669
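As a workaround, one could preprocess the lines in R, converting the backslash-escaped quotes (\") into the doubled quotes ("") that CSV parsers conventionally expect, before handing the text to fread(). A sketch, not a definitive fix:

```r
library(data.table)
lines <- readLines("file.csv", warn = FALSE)
# Replace the literal two-character sequence \" with ""
# (CSV-style quote escaping).
lines <- gsub('\\"', '""', lines, fixed = TRUE)
dt <- fread(text = paste(lines, collapse = "\n"))
```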

read.csv in R doesn't import all rows from csv file

I have a comma separated dataset of around 10,000 rows. When doing read.csv, R created a dataframe with fewer rows than the original file: it excluded/rejected 200 rows.
When I open the csv file in Excel, the file looks okay. The file is well formatted for line delimiters and also field delimiters (as per parsing done by Excel).
I have identified the row numbers in my file which are getting rejected but I can't identify the cause by glancing over them.
Is there any way to look at logs or something which includes reason why R rejected these records?
The OP indicates that the problem is caused by quotes in the CSV file.
When the records in the CSV file are not quoted but a few records contain stray quotes, the file can be opened using the quote="" option in read.csv. This disables quote handling.
data <- read.csv(filename, quote="")
Another solution is to remove all quotes from the file, but this also modifies the data (your strings no longer contain any quotes) and will cause problems if your fields contain commas.
lines <- readLines(filename)
lines <- gsub('"', '', lines, fixed=TRUE)
data <- read.csv(textConnection(lines))
A slightly safer solution, which escapes (by doubling) only the quotes that are not directly before or after a comma, leaving the field-delimiting quotes intact:
lines <- readLines(filename)
lines <- gsub('([^,])"([^,])', '\\1""\\2', lines)
data <- read.csv(textConnection(lines))
I had the same issue, where the number of rows in the csv file and the number of rows read by read.csv() differed significantly. I used fread() from the data.table package in place of read.csv and it solved the problem.
The rejected records were due to the presence of double quotes in the csv file. I removed the double quotes in Notepad++ before reading the file into R. If you can suggest a better way to remove the double quotes in R (before reading the file), please leave a comment below.
Pointed out by Jan van der Laan; he deserves the credit.
To answer your last question: removing the doubled double quotes ("") before reading the csv file into R is probably best done as a preprocessing step, using a one-line sed command (covered in the Unix & Linux forum):
sed -i 's/""/"/g' test.csv

The woes of endless columns in .csv data in R

So I have a bunch of .csv files that were output by a simulation. I'm writing an R script to run through them and make a histogram of a column in each .csv file. However, the .csv is written in such a way that R does not like it. When I was testing it, I had been originally opening the files in Excel and apparently this changed the format to one R liked. Then when I went back to run the script on the entire folder I discovered that R doesn't like the format.
I was reading the data in as:
x <- read.csv("synch-imit-characteristics-2-tags-2-size-200-cost-0.1run-2-.csv", strip.white=TRUE)
Error in read.table(test, strip.white = TRUE, header = TRUE) :
more columns than column names
Investigating, I found that the original .csv file, which R does not like, looks different from the test one I had opened with Excel. I copied and pasted the first bit below after opening it in Notepad:
cost,0.1
mean-loyalty, mean-hospitality
0.9885449527316088, 0.33240076252915735
weight,1 of p1, 2 of p1,
However, in Notepad there is no apparent formatting. In fact, between rows there is no space at all, i.e. it is cost,0.1mean-loyalty,mean-hospitality0.988544, etc. So it is weird to me that when I copy and paste it from Notepad it gets the desired formatting as above. Anyway, moving on: after I had opened it in Excel it was transformed to this:
cost,0.1,,,,,,,,
mean-loyalty, mean-hospitality,,,,,,,,
0.989771257,0.335847092,,,,,,,,
weight,1 of p1, etc...
So it seems like the data originally has no separation between rows (though I don't know how Excel figures it out, or why copying and pasting does), and R doesn't pick up on this. Instead, it views it all as one row (and since I have 40,000+ rows, it doesn't have that many column names). I don't want to have to open and save every file in Excel. Is there a way to get R to read the data as desired?
Since when I copy and paste it from Notepad it has new lines for the rows, it seems like I just need R to know that commas separate columns on the same row and a return separates rows. I tried messing around with all the sep="" settings I could find, but I can't figure it out.
To first solve the Notepad issue:
You must have CR (carriage return, \r) characters between the lines, and no LF (\n) characters, causing Notepad to see it all as one line.
Some programs accept CR as a newline character; some don't.
You can, for example, use Notepad++ to replace all '\r' with '\n' or '\r\n', using Replace with the "Extended" option. First select View > Show Symbol > Show all characters, so you see what you are doing.
Finally, to get back to R:
(As it was pointed out, R can actually handle CR as a newline)
read.csv assumes that you have non-empty header names in the first row, but instead you have:
cost,0.1
while later in the data you have a row with more than just two columns:
weight,1 of p1, 2 of p1,
This means that not all columns have a header name (and I wonder if 0.1 was supposed to be a header name anyway).
The two solutions can be:
add a header including all columns, or
as pointed out in a comment, use header=F.
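Putting the second suggestion into code (a sketch using the question's filename):

```r
# header = FALSE prevents read.csv from using the short first row
# ("cost,0.1") as column names; fill = TRUE pads short rows with NA.
x <- read.csv("synch-imit-characteristics-2-tags-2-size-200-cost-0.1run-2-.csv",
              header = FALSE, fill = TRUE, strip.white = TRUE)
```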
