I'm new to R and I'm trying to read a TSV file that sometimes contains a "#" in the table. R stops reading when it comes across the "#" and gives me the error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 6227 did not have 6 elements
I looked at that line in the file and I found the "#". The data looks like this:
CM School Supply #1 Upland CA 3 8 Shopping
When I delete it, R can continue reading the table, but there are more "#"s in the file.
How should I set the arguments of read.table()? I searched for a solution everywhere but failed. I hope someone here can help me out. Thanks!
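You can completely turn off read.table()'s interpretation of comment characters by setting comment.char="" in your call to read.table(). By default comment.char is "#", so everything from a "#" to the end of the line is discarded as a comment; that is why the affected line ends up with fewer than 6 fields.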
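A minimal sketch of the difference, assuming a tab-separated file. The header names below are made up, and read.table(text = ...) stands in for reading from a file on disk:

```r
# Hypothetical two-line TSV: a header (column names made up) plus the
# data row from the question.
txt <- paste0("name\tcity\tstate\tx\ty\tcategory\n",
              "CM School Supply #1\tUpland\tCA\t3\t8\tShopping\n")

# Default comment.char = "#": everything after "#" is dropped, so the row
# has only 1 field instead of 6 and read.table() stops with an error.
res_default <- tryCatch(
  read.table(text = txt, sep = "\t", header = TRUE),
  error = function(e) e
)

# comment.char = "" disables comment handling, so "#" is ordinary text.
df <- read.table(text = txt, sep = "\t", header = TRUE,
                 comment.char = "", stringsAsFactors = FALSE)
print(df$name)
```

If the file genuinely uses "#" for comment lines as well, comment.char = "" will keep those lines too, so they would need to be skipped some other way (for example with skip).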
Related
I have a similar problem to this one: read.csv warning 'EOF within quoted string' prevents complete reading of file
That is, when I load a CSV file, R says:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
I can get rid of this warning by passing quote="" to read.csv.
But the main problem remains: only 22111 of the 689233 rows are read into R. I would like to try removing all special characters from the CSV to see whether that clears the problem.
I found this related question: How to remove specific special characters in R
But is there a way to do it within read.csv, that is, while the file is being read in?
Did you try fread from data.table? It is fast and may handle some common issues automatically. As you haven't provided any sample data, here is a silly example:
> fread('col1,col2\n5,"4\n3"')
col1 col2
1: 5 4\n3
It was indeed a special character: there was a → (arrow, hexadecimal value 0x1A) on line 22,112.
After deleting the arrow, the data loads normally!
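If you suspect a single stray byte such as 0x1A, one way to locate it (and drop it) from within R is to read the file as raw bytes. This is a sketch, not what the poster did: the temporary file and its contents are hypothetical, and the approach assumes the whole file fits in memory.

```r
# Build a small hypothetical CSV with a stray 0x1A (SUB) byte on line 3.
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("a,b\n1,2\n3,\x1a4\n5,6\n"), tmp)

# Read the file as raw bytes (assumes it fits in memory).
raw_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
sub_byte <- as.raw(0x1a)
newline <- as.raw(0x0a)

# Byte offsets of every 0x1A, converted to 1-based line numbers by
# counting the newlines that precede each one.
sub_pos <- which(raw_bytes == sub_byte)
line_no <- vapply(sub_pos,
                  function(p) sum(raw_bytes[seq_len(p - 1)] == newline) + 1L,
                  integer(1))
print(line_no)

# Drop the offending bytes and parse the cleaned text.
cleaned <- raw_bytes[raw_bytes != sub_byte]
df <- read.csv(text = rawToChar(cleaned))
```

Removing the byte only makes sense if it is noise; if it marks something meaningful in the source system, it is worth checking a few of the reported lines by hand first.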
Solution for DataTables CSV export with special characters:
Find the charset setting in
https://cdn.datatables.net/buttons/1.1.2/js/buttons.html5.js
or
https://cdn.datatables.net/buttons/1.1.2/js/buttons.html5.min.js
and change it from 'UTF-8' to 'UTF-8-BOM'.
I am trying to read a .csv file from a website into R as follows:
poll = read.csv("http://www.aec.gov.au/About_AEC/cea-notices/files/2013/prdelms.gaz.statics.130901.09.00.02.csv")
But then I got the warning message:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
Then I searched previous Stack Overflow questions and changed my code to:
poll = read.csv("http://www.aec.gov.au/About_AEC/cea-notices/files/2013/prdelms.gaz.statics.130901.09.00.02.csv", quote="")
This seemed to solve the problem: I got no warning, and a data frame of 8855 rows by 26 columns. My question is:
What did the original problem mean, and why did the second code fix it?
Thank you!
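Your file contains a " character, but by default read.csv() interprets " as a quote (quote = "\""). An unmatched quote makes scan() treat everything up to the next " as a single field, possibly across many lines; if no closing quote follows, the field runs to the end of the file, hence the "EOF within quoted string" warning, and the lines swallowed into that field never become rows of their own. Setting quote = "" disables quoting, so " is read as an ordinary character.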
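A small sketch of both behaviors with made-up data, where one field starts with a quote that is never closed:

```r
# Hypothetical CSV where one field starts with a stray double quote.
txt <- 'id,note\n1,"unclosed\n2,ok\n3,fine\n'

# Default quote = "\"": the stray quote opens a quoted field that never
# closes, so R signals a condition (a warning) instead of reading 3 rows.
res_default <- tryCatch(read.csv(text = txt),
                        warning = function(w) w,
                        error = function(e) e)

# quote = "" turns quoting off, so the " is kept as ordinary text.
df <- read.csv(text = txt, quote = "", stringsAsFactors = FALSE)
print(df)
```

Note that quote = "" also stops legitimately quoted fields (for example ones containing commas) from being parsed as single fields, so it is only a good fit when the file does not rely on quoting.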
I have very long text messages that contain "", commas and \n. When I read the file, the column containing this text is not read properly, just because the messages contain "" and "\n". I have used the following:
dat = read.csv("abc", header=F, sep=",", quote ="\"'", stringsAsFactors=FALSE, allowEscapes=T, flush=T, comment.char="")
With read.csv the file is read incorrectly, and when reading it with read.table I get an error:
dat = read.table("abc", header=F, sep=",", quote="\"'", stringsAsFactors=FALSE, allowEscapes=T, flush=T, comment.char="")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 38 did not have 20 elements
So my rows break inside the message column. I saved my file with eol='\r\r\n' and quote=T, but I didn't find any parameter to read it back in the same format.
I saved the file with:
write.table(z,file="abc",append=F,quote=T,sep=",",eol="\r\r\r\r\r\n",row.names=F,col.names=F)
In this example:
"In case you know,give some hint
lot of text.....
.................
---------------------------------------------------------------------------
\"thank you very much for your time
and your effort\"
---------------------------------------------------------------------------"
the row breaks after:
"In case you know,give some hint
lot of text.....
.................
---------------------------------------------------------------------------
\"thank you very much for your time
When reading, how can I use eol to retrieve the complete text message in the same column? I am not able to read the written file back, although the same file uploaded successfully to MySQL with a loading script. Any help in this direction is appreciated.
Thanks.
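One possible approach, which swaps the custom eol for the standard CSV convention rather than building on it: write the file with qmethod = "double" so embedded quotes are doubled, keep the default line ending, and read it back with read.csv(), which accepts quoted fields spanning several lines. A sketch with a hypothetical stand-in for z modeled on the example message:

```r
# Hypothetical data: one message with commas, embedded newlines and quotes.
msg <- paste("In case you know,give some hint",
             "lot of text.....",
             "\"thank you very much for your time",
             "and your effort\"", sep = "\n")
z <- data.frame(id = 1:2,
                message = c(msg, "a short, plain message"),
                stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".csv")
# qmethod = "double" writes embedded quotes as "" (CSV style) instead of \"
write.table(z, file = tmp, sep = ",", quote = TRUE, qmethod = "double",
            row.names = FALSE, col.names = TRUE)

# read.csv() expects exactly this style: quoted fields may span lines
# and contain doubled quotes.
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(identical(back$message, z$message))
```

write.csv() uses qmethod = "double" by default, so it is a shorter way to write the same file.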
A question closer to mine was asked and answered here.
My problem is fairly simple: I need to import a .tsv file into R, but I cannot, because some elements contain a \t, so I receive an error like:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 34 did not have 6 elements
One way to proceed would be to use gsub to replace the \ts. But the file is quite large, around 11 GB, and this pre-processing would probably be too much for my machine. Any idea about a possible shortcut here?
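Before rewriting an 11 GB file, it may be worth finding out which lines actually contain extra tabs. A minimal sketch using count.fields(), assuming the file uses neither quoting nor comment characters (hence quote = "" and comment.char = ""); the small file and the expected column count are hypothetical:

```r
# Hypothetical small TSV with 3 columns; line 3 has an embedded tab.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("a\tb\tc", "1\t2\t3", "4\tx\ty\t5", "6\t7\t8"), tmp)

expected <- 3L  # number of columns the file should have

# count.fields() returns the number of tab-separated fields on each line.
n <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

# Line numbers whose field count differs from the expected one.
bad <- which(n != expected)
print(bad)
```

count.fields() returns one integer per line, so the result stays small even for a very large file, although scanning 11 GB will still take a while. If only a handful of lines are affected, fixing just those may be cheaper than running gsub over everything.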
Some context: in the end I need to import the whole dataset into a SQL database; I could do that without this conversion, but at that point I would run into the same problem.
I have a tab-separated text document. I did notice a number of tabs after the data in the document and am unsure whether that is the issue here.
I have set the working directory:
setwd("D:/Classes/CSC/gmcar_price")
Then I attempt to read the table using
data=read.table("gmcar_price.txt", header=T)
But this error comes up:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 11 did not have 13 elements
Any idea what is going on here? I have looked at line 11 and all the data is there.
Edit:
This is the format of the data:
Price,Mileage,Make,Model,Trim,Type,Cylinder,Liter,Doors,Cruise,Sound,Leather
data=read.table("gmcar_price.txt", header=T, sep = "\t")
Thanks to shujaa, this solved the issue that I was having.
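The likely reason sep = "\t" helps: read.table()'s default sep = "" splits on any run of whitespace, so a value containing a space is counted as two fields. A sketch with made-up rows (the real contents of gmcar_price.txt are not shown in the question):

```r
# Made-up rows: the Trim values contain spaces.
txt <- paste0("Price\tMake\tTrim\n",
              "17000\tBuick\tSedan 4D\n",
              "18000\tBuick\tCX Sedan 4D\n")

# Default sep = "": any run of whitespace separates fields, so "Sedan 4D"
# becomes two fields and the column count no longer matches the header.
res_default <- tryCatch(read.table(text = txt, header = TRUE),
                        error = function(e) e)

# sep = "\t": only tabs separate fields, so spaces stay inside values.
ok <- read.table(text = txt, header = TRUE, sep = "\t",
                 stringsAsFactors = FALSE)
print(ok)
```

With sep = "\t", the trailing tabs mentioned in the question would show up as extra (empty) fields instead of being ignored, so they may still need attention if the column counts do not line up.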