Issues importing a CSV in R

I'm trying to teach myself R (just started).
I decided to import 2 csv files to practice a join on them.
One file imported just fine, but the other one produces the warnings shown below.
Here is the link to the CSV file:
https://data.world/jonathankkizer/occupation-computerization
I used the following statement:
occupationforjoin <- read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", header=TRUE, sep=",")
Warning messages:
1: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv",
: line 1 appears to contain embedded nulls
2: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv",
: line 2 appears to contain embedded nulls
3: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv",
: line 3 appears to contain embedded nulls
4: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv",
: line 4 appears to contain embedded nulls
5: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv",
: line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string
7: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input
I found on StackOverflow that it could be due to encoding, so I used the suggested solution and executed this statement:
occupationforjoin <- read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", header=TRUE, sep=",", fileEncoding="UTF-16LE")
It gave me a different error message:
Error in read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", :
more columns than column names
I also tried using the read.csv function to no avail.
How do I fix this problem and import the data set successfully? None of the solutions that I found online (e.g., adding the skipNul = TRUE or comment.char = "" arguments) helped.
Update:
Here's a paste of the data set if you don't want to download the CSV file from data.world:
https://pastebin.com/SPEtWT6f

I finally found the solution!
I was going nuts; even my instructor didn't know how to fix it!
This statement works:
o <- read.csv("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/Occ.txt", header=TRUE, sep="\t", fileEncoding="UTF-16LE")
As I said in my original question, I tried fileEncoding="UTF-16LE" on its own and it didn't help. After asking the question, I tried sep="\t" on its own, and that didn't help either. Using both of them together did the trick!
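For anyone who wants to reproduce this without the original file, here is a minimal sketch. It assumes the file is tab-separated text stored as UTF-16LE, which is what the working statement implies; the temporary file, column names, and values are made up for illustration.

```r
# Build a small stand-in for the real file: tab-separated text stored as
# UTF-16LE. The column names and values are invented for this example.
txt <- "occupation\tprobability\nTelemarketers\t0.99\nDentists\t0.0044\n"
tmp <- tempfile(fileext = ".txt")
writeBin(iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]], tmp)

# Inspect the first bytes. For ASCII text in UTF-16LE every second byte
# is 00, which is what triggers the "embedded nul(s)" warnings when the
# file is read as if it used a single-byte encoding.
first_bytes <- readBin(tmp, what = "raw", n = 8)
print(first_bytes)

# Both arguments together: decode UTF-16LE, then split fields on tabs.
o <- read.csv(tmp, header = TRUE, sep = "\t", fileEncoding = "UTF-16LE")
str(o)
```

Each argument fixes a different problem, which may be why neither helped alone: fileEncoding deals with the null bytes, and sep deals with the field boundaries.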

Try the read_csv() function from the readr package.

Use dataframe <- read.csv("name_of_file.csv")
or
dataframe <- read.csv(file.choose())
Hope this works.

Related

Why is my txt file not being completely read with read.delim?

I am trying to read this large text file (~3 GB) into R, but unfortunately I am not able to load it fully. I'm missing a lot of rows: I get a data frame of ~700 thousand rows, while I know the file has at least 4-5 million.
The code I was initially using was as follows:
df<-read.delim("file.txt",quote = "",comment.char = "")
However, besides noticing that R wasn't loading all the rows, I was also receiving this warning:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
I searched online for a bit and found that I could address the warning by adding the skipNul = TRUE argument. When I included it in the read.delim call, the warning stopped showing, but the result is still missing a lot of rows and has the same number of rows as before.
I have loaded files of similar size in the past, so I'm not sure why this is happening.
If someone has any idea what might be causing the problem, I would be very thankful.

Reading csv file using R and RStudio

I am trying to read a csv file in R, but I am getting some errors.
This is what I have, and I have set the correct path:
mydata <- read.csv("food_poisioning.csv")
But I am getting this error:
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>Y'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
I believe I am getting this error because my CSV file is actually not separated by commas, but by spaces. This is what it looks like:
I tried using sep=" ", but it didn't work.
If you're having difficulty using read.csv() or read.table() (or writing other import commands), try using the "Import Dataset" button on the Environment panel in RStudio. It is useful especially when you are not sure how to specify the table format or when the table format is complex.
For your .csv file, use "From Text (readr)..."
A window will pop up and allow you to choose a file/URL to upload. You will see a preview of the data table after you select a file/URL. You can click on the column names to change the column class or even "skip" the column(s) you don't need. Use the Import Options to further manage your data.
Here is an example using CreditCard.csv from Vincent Arel-Bundock's GitHub projects:
You can also modify and/or copy and paste the code in Code Preview, or click Import to run the code when you are ready.
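As a complement to the import dialog: the <ff><fe> at the start of the error message is the UTF-16 little-endian byte order mark (BOM), which suggests the file is UTF-16 rather than plain ASCII. The file itself is not shown, so treat this as an assumption. Below is a rough sketch of checking for a BOM in base R and passing the result to read.csv(). guess_bom() is a hypothetical helper written for this example, and the sample file, its tab separator, and its contents are invented.

```r
# Guess a text encoding from a leading byte order mark, if there is one.
# guess_bom() is a hypothetical helper, not part of base R. Note that
# FF FE can also start a UTF-32LE file; that case is ignored here.
guess_bom <- function(path) {
  b <- as.integer(readBin(path, what = "raw", n = 3))
  if (length(b) >= 3 && all(b[1:3] == c(0xEF, 0xBB, 0xBF))) return("UTF-8")
  if (length(b) >= 2 && all(b[1:2] == c(0xFF, 0xFE))) return("UTF-16LE")
  if (length(b) >= 2 && all(b[1:2] == c(0xFE, 0xFF))) return("UTF-16BE")
  NA_character_
}

# Invented sample: a BOM followed by tab-separated UTF-16LE text.
txt <- "Year\tCases\n2017\t12\n2018\t9\n"
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xFF, 0xFE)),
           iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]),
         tmp)

enc <- guess_bom(tmp)
print(enc)

# Use the detected encoding; the separator still has to match the file.
mydata <- read.csv(tmp, header = TRUE, sep = "\t", fileEncoding = enc)
str(mydata)
```

If guess_bom() returns NA, the file has no BOM and the encoding has to be worked out some other way.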

Read only selected columns in R read.csv

I have encountered this strange xlsx file, which has a lot of empty columns: when I scroll right in Microsoft Excel, empty columns keep appearing. I want to read this file using read.csv, but it gives me the warning "In read.table(file = file, header = header, sep = sep, quote = quote, : line 1 appears to contain embedded nulls". I am guessing this is because of the empty columns. How can I solve this and read the file using read.csv? Thank you.
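One possible explanation, offered as an assumption since the file is not shown: an .xlsx file is a ZIP archive rather than text, so read.csv() sees binary data (including null bytes) whatever the columns contain. The sketch below checks for the ZIP signature and then shows how read.csv() can skip columns once the sheet has been exported to a real CSV. looks_like_zip() is a hypothetical helper, and both sample files are invented.

```r
# looks_like_zip() is a hypothetical helper: TRUE when the file starts with
# the two bytes "PK" that open a ZIP archive, which is what an .xlsx file is.
looks_like_zip <- function(path) {
  identical(readBin(path, what = "raw", n = 2), charToRaw("PK"))
}

# Invented stand-ins: a file that starts with a ZIP signature, and a CSV
# whose middle column is empty.
zip_like <- tempfile(fileext = ".xlsx")
writeBin(as.raw(c(0x50, 0x4B, 0x03, 0x04)), zip_like)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,,x", "2,,y"), csv_path)

print(looks_like_zip(zip_like))
print(looks_like_zip(csv_path))

# For a real workbook, a reader such as readxl::read_excel(path) is one
# option (requires the readxl package; not run here).

# After exporting to CSV, colClasses = "NULL" drops a column on import;
# NA keeps the column and lets read.csv() choose its type.
d <- read.csv(csv_path, colClasses = c(NA, "NULL", NA))
str(d)
```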

Importing data from csv File to R

I had trouble importing the data I need from .csv files into R.
So to check, I created a simple .csv in Excel with 2 columns and 3 rows. It reads like this in Notepad:
what,now
1,4
2,5
3,6
When I try to import this data into R
d <- read.csv("D:/Book1.csv")
it gives a warning message,
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'D:/Book1.csv'
and then when I view the data, it's some gibberish.
What do I do?
I was using a work PC, and the files were encrypted - which was the reason why importing data into R was not working. I bypassed it by copying data into a text document.
Thanks everyone!

Using read.csv when a data entry is a space (not blank!)

I am having a problem with using read.csv in R. I am trying to import a file that has been saved as a .csv file in Excel. Missing values are blank, but I have a single entry in one column which looks blank, but is in fact a space. Using the standard command that I have been using for similar files produces this error:
raw.data <- read.csv("DATA_FILE.csv", header=TRUE, na.strings="", encoding="latin1")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at ' floo'
I have tried a few variations, adding arguments to the read.csv() command such as na.strings=c(""," ") and strip.white=TRUE, but these result in the exact same error.
It is a similar error to what you get when you use the wrong encoding option, but I am pretty sure this shouldn't be a problem here. I have of course tried manually removing the space (in Excel), and this works, but as I'm trying to write generic code for a Shiny tool, this is not really optimal.
