Error in reading a massive file - R

I have a very large file in which one column contains long text messages. The messages include double quotes ("") and newlines (\n), and that column is not read back properly precisely because of those characters. I have used the following:
dat = read.csv("abc", header=F, sep=",", quote ="\"'", stringsAsFactors=FALSE, allowEscapes=T, flush=T, comment.char="")
read.csv reads the file incorrectly, and reading it with read.table gives an error:
dat = read.table("abc", header=F, sep=",", quote="\"'", stringsAsFactors=FALSE, allowEscapes=T, flush=T, comment.char="")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 38 did not have 20 elements
So my rows get broken at the message column. I saved my file with eol='\r\r\n' and quote=T, but when reading it back I couldn't find any parameter to read it in the same format.
I saved the file as:
write.table(z,file="abc",append=F,quote=T,sep=",",eol="\r\r\r\r\r\n",row.names=F,col.names=F)
In this example:
"In case you know,give some hint
lot of text.....
.................
---------------------------------------------------------------------------
\"thank you very much for your time
and your effort\"
---------------------------------------------------------------------------"
it breaks after:
"In case you know,give some hint
lot of text.....
.................
---------------------------------------------------------------------------
\"thank you very much for your time
While reading, how can I use eol in order to retrieve the complete text message in the same column? I am not able to read the written file back, though the file loads successfully into MySQL with a loading script. Any help in this direction is appreciated.
Thanks.
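One approach that may help here (a minimal round-trip sketch with made-up data, not the poster's file): write with quote=TRUE and qmethod="double" so embedded quotes are doubled rather than backslash-escaped, keep the default eol, and read back with the default quote character so embedded newlines stay inside the quoted field.
z = data.frame(id = 1:2,
               msg = c("line one\nline two, with a comma and a \"quote\"",
                       "plain text"),
               stringsAsFactors = FALSE)
# quote=TRUE wraps character fields; qmethod="double" writes "" for embedded quotes
write.table(z, file = "abc", quote = TRUE, sep = ",", qmethod = "double",
            eol = "\n", row.names = FALSE, col.names = FALSE)
# the default quote="\"" lets scan() keep newlines that fall inside quoted fields
dat = read.csv("abc", header = FALSE, stringsAsFactors = FALSE)
# dat$V2[1] then holds the complete multi-line message in a single cell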

Related

Reading CSV with multi-line columns in R

A dataset I am trying to read oddly contains a whole lot of multi-line text in one column. read.csv("the_ill_formated_file.csv") is able to read some of it, with a number of columns mixed up for some rows, then throws a warning message:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
fread("the_ill_formated_file.csv") is unable to read it at all throwing up this error message
Error in fread("the_ill_formated_file.csv") :
Internal error. No eol2 immediately before line 30, 'p' instead
In addition: Warning message:
In fread("the_ill_formated_file.csv") :
Detected eol as \n\r, a highly unusual line ending. According to Wikipedia the Acorn BBC used this. If it is intended that the first column on the next row is a character column where the first character of the field value is \r (why?) then the first column should start with a quote (i.e. 'protected'). Proceeding with attempt to read the file.
The following is a snippet of how the file is formatted:
"comment_id", "comment", "post_date", "reply_count", "reply_ids"
1001, "This comment is multi-line with
space between each line!
Quite a fancy format this one", "2015-08-16" , 3, "{1,2,3}"
1002, "This second row is all on a single line, which is the usual format read.csv/fread in R will expect it", "2015-08-17" , 0, "{}"
I got the same mixed-up columns when I opened it in Excel.
Thanks in advance for the assistance.
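If the snippet really is quoted like that, one sketch that may get it into R is to normalise the unusual \n\r record endings first and then let read.csv handle the quoted multi-line field (this assumes the quotes themselves are balanced):
txt = readLines("the_ill_formated_file.csv", warn = FALSE)
txt = gsub("\r", "", txt, fixed = TRUE)          # drop the stray carriage returns
dat = read.csv(text = paste(txt, collapse = "\n"),
               header = TRUE, stringsAsFactors = FALSE)
# the multi-line comment stays in one cell because it sits inside quotes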

How to remove special characters while loading a csv in R?

I have this similar problem: read.csv warning 'EOF within quoted string' prevents complete reading of file
That is, when I load a CSV, R says:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
I can get rid of this warning by passing quote="" to read.csv.
But the main problem still exists: only 22,111 of 689,233 rows in total are read into R. I would like to try removing all special characters from the CSV to see if this clears up the problem.
Related to this, I found: How to remove specific special characters in R
But is there a way to do it in read.csv, that is, at the point when I'm reading in the file?
Did you try fread from data.table? It can optimize the task and likely deal with some common issues. As you haven't provided any piece of data, I'm giving a silly example:
> fread('col1,col2\n5,"4\n3"')
col1 col2
1: 5 4\n3
It was indeed a special character. There was a → (arrow, hexadecimal value 0x1A) on line 22,112.
After deleting the arrow, I got the data to load normally!
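If editing the file by hand is not practical, the same cleanup can be sketched in R (the file name is a placeholder; the binary connection is there because, on Windows, 0x1A can be treated as end-of-file in text mode):
con = file("your_file.csv", open = "rb")     # binary mode so 0x1A is not taken as EOF
txt = readLines(con, warn = FALSE, skipNul = TRUE)
close(con)
txt = gsub("\x1a", "", txt, fixed = TRUE)    # strip the SUB (0x1A) control character
dat = read.csv(text = paste(txt, collapse = "\n"), stringsAsFactors = FALSE)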
Solution for DataTables CSV export with special characters:
Find the charset setting in
https://cdn.datatables.net/buttons/1.1.2/js/buttons.html5.js
or
https://cdn.datatables.net/buttons/1.1.2/js/buttons.html5.min.js
and change it from 'UTF-8' to 'UTF-8-BOM'.

Error reading .csv file from website into R

I am trying to read a .csv file from a website into R as follows:
poll = read.csv("http://www.aec.gov.au/About_AEC/cea-notices/files/2013/prdelms.gaz.statics.130901.09.00.02.csv")
But then I got the warning message:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
Then I searched previous Stack Overflow questions and changed my code to:
poll = read.csv("http://www.aec.gov.au/About_AEC/cea-notices/files/2013/prdelms.gaz.statics.130901.09.00.02.csv", quote="")
This seemed to solve the problem: I got no warning and ended up with an 8855 × 26 data set. My question is:
What did the original problem mean, and why did the second code fix it?
Thank you!
Your file contains a stray " symbol, which is normally interpreted as a quote character. This breaks the line that contains the symbol. You have to disable the use of this symbol as a quote, which is what quote="" does.
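A way to see this for yourself (a diagnostic sketch, not required for the fix): count.fields() reports how many fields each record has with and without quote handling, so the records swallowed by the unbalanced quote stand out.
url = "http://www.aec.gov.au/About_AEC/cea-notices/files/2013/prdelms.gaz.statics.130901.09.00.02.csv"
download.file(url, "prdelms.csv")
n_quoted   = count.fields("prdelms.csv", sep = ",")              # NA where a quote runs past the line end
n_unquoted = count.fields("prdelms.csv", sep = ",", quote = "")  # quote handling switched off
table(n_unquoted)          # every record should report the same field count
which(is.na(n_quoted))     # records affected by the stray quote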

R stops reading a table when coming across "#"

I'm new to R and I'm trying to read a TSV file that sometimes contains a "#" in the table. R just stops reading when it comes across the "#" and gives me the error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 6227 did not have 6 elements
I looked at that line in the file and I found the "#". The data looks like this:
CM School Supply #1 Upland CA 3 8 Shopping
When I delete it, R can continue reading the table, but I have more "#"s in the file...
How should I set the arguments in read.table()? I tried to search for a solution everywhere but failed... Hope someone here can help me out. Thanks!
You can completely turn off read.table()'s interpretation of comment characters (by default set to "#") by setting comment.char="" in your call to read.table().
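A minimal sketch of such a call for a tab-separated file like the one above (the file name and header setting are placeholders to adjust for your data):
dat = read.table("venues.tsv", header = FALSE, sep = "\t",
                 comment.char = "",        # do not treat "#" as a comment marker
                 quote = "",               # often sensible for raw TSV data as well
                 stringsAsFactors = FALSE)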

`read.table` error with a tab-separated file

I have a text document that is tab-separated. I did notice a bunch of tabs after the data in the text doc and am unsure if that is the issue here.
I have set the working directory:
setwd("D:/Classes/CSC/gmcar_price")
Then I attempt to read the table using
data=read.table("gmcar_price.txt", header=T)
But this error is coming up:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 11 did not have 13 elements
Any idea what is going on here? I have looked at line 11 and all the data is there.
Edit:
This is the format of the data:
Price,Mileage,Make,Model,Trim,Type,Cylinder,Liter,Doors,Cruise,Sound,Leather
data=read.table("gmcar.price.txt", header=T, sep = "\t")
Thanks to shujaa, this solved the issue that I was having.
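For anyone hitting the same error, a quick diagnostic sketch (not part of the accepted fix) is to compare field counts under whitespace and tab splitting before choosing the arguments:
n_ws  = count.fields("gmcar_price.txt")              # default whitespace splitting
n_tab = count.fields("gmcar_price.txt", sep = "\t")  # explicit tab splitting
table(n_ws)    # uneven counts explain "line 11 did not have 13 elements"
table(n_tab)   # a single consistent count confirms sep="\t" is the right separator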
