Missing lines when reading a CSV file into R

I am having trouble reading a CSV file into R. The file contains more than 10,000 lines, but only 4,977 lines are read into R, and there are no missing values in the file. My code is below:
mydata <- read.csv("12260101.csv", quote = "\"", skipNul = TRUE)  # read the file, skipping embedded NUL characters
write.csv(mydata, "check.csv")                                    # write what was read back out, to compare against the original

It's hard to say without seeing the CSV file. You might want to compare rows that aren't being imported with the imported ones.
I would try using the function read_csv() from the package readr or fread() from data.table.
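A minimal sketch of both approaches, using the file name from the question (this assumes the readr and data.table packages are installed):
library(readr)
library(data.table)
mydata_readr <- read_csv("12260101.csv")   # readr's parser is often more tolerant of stray quotes and embedded NULs
mydata_dt <- fread("12260101.csv")         # fread auto-detects the separator and quoting rules
nrow(mydata_readr)                         # compare the row counts against the expected 10,000+
nrow(mydata_dt)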

As other posters pointed out, this is hard to reproduce without an example. I had a similar issue with read.csv, but fread worked without any problems. It might be worth giving it a try.

Related

Writing to a CSV file producing errors

I am using R to analyze some text data. After doing some aggregation, I had a new dataframe I wanted to write to a CSV file so I can use it in other analyses. The dataframe looks correct in R, with only 2 columns of text data, but once I write the CSV and open it, the text is scattered across different columns. Here is the code I was using:
write.csv(new_df, "4.19 Group 1_agg by user try 2.csv")
I tried adding in an extra bit of code to specify that it should be using UTF-8, since I've heard this could be an encoding error, so the code then looked like this:
write.csv(new_df, "4.19 Group 1_agg by user try 2.csv", fileEncoding = "UTF-8")
I also tried reading the file back in differently (using fread instead of read.csv). Still, the CSV file looks wrong/messy in many places. The screenshots of what it should look like and what it currently looks like are not reproduced here.
Again, I think the error must be in writing the CSV file, because everything looks good in R when I check it using names() and head(). Any help is appreciated, thank you!
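For reference, a rough sketch of the read-back check mentioned above, using the same output file name (the encoding argument is optional):
library(data.table)
chk <- fread("4.19 Group 1_agg by user try 2.csv", encoding = "UTF-8")  # read the exported file back in
ncol(chk)   # should be 2 if the columns survived the round trip
head(chk)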

Reading in file gives empty rows and columns

Given a CSV file that contains extra commas (the file itself is not reproduced here): how can I read it so that the extra commas that are not part of the data are excluded?
It seems that the file is OK. Have you tried the right options for the arguments of your importing function?
Would you like to try read_delim() from the readr package?
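A minimal sketch, assuming the extra commas show up as empty trailing columns and rows (the file name here is a placeholder):
library(readr)
df <- read_delim("messy.csv", delim = ",", trim_ws = TRUE)
df <- df[, colSums(!is.na(df)) > 0]    # drop columns that are entirely NA, which is what trailing commas produce
df <- df[rowSums(!is.na(df)) > 0, ]    # drop rows that are entirely empty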

Troubles with importing data into R

I know it is a basic question and I have been looking for a specific answer for months.
Here is the deal:
Every time I try to import tables into R, there is a problem and they never get imported properly. I have done this with my own files and with files that I got from courses. I have tried using commas and semicolons, and I have used (header = TRUE, sep = ",", row.names = "id"), but it just won't work.
Here is what I mean (the screenshots I attached are not reproduced here). I am really getting desperate at being unable to complete this very simple task, which keeps me from getting on with the actual analysis.
Thank you very much in advance.
Like the first comment says, the problem is the separator. You can try
fread(file_name, sep = ";") # data.table package
read.csv(file_name, sep = ";") # utils package
It could be that you're importing European .csv files or that your Excel is set to some other language. You may also want to check on the decimal separator dec = "," which is common for European .csv files.
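For example, if the files really are European-style CSVs (semicolon separator, comma decimal), a sketch along these lines should work (file_name is a placeholder, as above):
read.csv2(file_name)                      # read.csv2 defaults to sep = ";" and dec = ","
fread(file_name, sep = ";", dec = ",")    # data.table equivalent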

fread changes the name of the first column in large csv files

I have a 2.3 GB csv file. When I read it using the fread function from R's data.table library, it adds a 'ï»¿' prefix to the first column.
So my data's first column was 'HistoryID', and after reading it through fread it becomes 'ï»¿HistoryID'. Other columns remain unaffected.
Is there a specific encoding that should be used to solve this problem?
When I read the data with the read.csv function, this problem is solved if I use 'UTF-8-BOM' encoding, but the same doesn't seem to work for fread.
According to the documentation on CRAN (R-data.html#Variations-on-read_002etable):
Byte order marks still cause problems with encoding, and can be dealt with like this:
it can be read on Windows by
read.table("intro.dat", fileEncoding = "UTF-8")
but on a Unix-alike might need
read.table("intro.dat", fileEncoding = "UTF-8-BOM")
Check section 2.1, "Variations on read.table".
It also seems to suggest that read.csv uses this trick.
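Applied to the question above, a hedged sketch (the file name is made up; the fread workaround just strips a leading BOM from the column name if one survives):
mydata <- read.csv("history.csv", fileEncoding = "UTF-8-BOM")   # the BOM is dropped, so names(mydata)[1] is "HistoryID"
library(data.table)
DT <- fread("history.csv")
setnames(DT, 1, sub("^\ufeff", "", names(DT)[1]))               # remove a leading byte order mark from the first column name, if present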

Importing data into R, which file format is the easiest?

I have a few datasets in the following formats: .asc, .wf1, .xls. I am indifferent about which one I use, as they are exactly the same. Could anyone please tell me which one of the file formats is easiest to import into R, and how this is done?
Save the .xls file to .txt or .csv; those are the easiest for R to read. Just be sure to have only one header line, or no header line at all. Try:
read.table("*.txt", header = TRUE)    # whitespace- or tab-delimited text with a header row
read.table("*.txt", header = FALSE)   # the same, with no header row
read.delim("*.txt", header = FALSE)   # tab-delimited text
read.csv("*.csv")                     # comma-separated values
etc.
Definitely not .xls. If .asc is some sort of fixed-width format, then that can be read in easily with read.csv or read.table.
Other formats that are easy to read include CSV (comma- or tab-separated text files) and DTA (Stata files, via read.dta in the foreign package).
Edit: @KarlOveHufthammer pointed out that .asc is most likely a fixed-width format, in which case read.fwf is the tool to use to read it into R. Note that FWF is a pain in the heiny to deal with, though, in that you have to have the column widths and names of every column stored somewhere else, then convert that to a format that read.fwf can use--and that's before problems like overlapping ranges.
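A minimal read.fwf sketch, with made-up widths and column names just to show the shape of the call:
dat <- read.fwf("data.asc",                               # hypothetical file name
                widths = c(8, 2, 5),                      # character width of each column (made up here)
                col.names = c("id", "group", "score"))    # column names, also made up
head(dat)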
