I have encountered a strange .xlsx file which has a lot of empty columns, and when I scroll rightwards in Microsoft Excel, the empty columns keep appearing. I want to read this file using read.csv, but it gives me the warning In read.table(file = file, header = header, sep = sep, quote = quote, : line 1 appears to contain embedded nulls. I am guessing this is because of the empty columns. How can I solve this and read the file using read.csv? Thank you.
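A hedged sketch of the usual way out, not a confirmed answer: read.csv() parses plain text, so feeding it the binary .xlsx container is what typically produces these embedded-nulls warnings, regardless of the empty columns. Reading the workbook with readxl (package assumed installed; the filename is a placeholder) avoids the problem:

library(readxl)

# read the first sheet of the workbook directly instead of treating it as text
mydata <- read_excel("my_file.xlsx")  # placeholder filename

Alternatively, saving the sheet from Excel as a genuine .csv first lets read.csv() work as intended.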
I am trying to read a csv file in R, but I am getting some errors.
This is what I have (and I have set the correct path):
mydata <- read.csv("food_poisioning.csv")
But I am getting this error
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>Y'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
I believe I am getting this error because my csv file is actually not separated by commas, but has spaces. This is what it looks like:
I tried using sep=" ", but it didn't work.
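A hedged guess, offered as a sketch rather than a confirmed fix: the '<ff><fe>' bytes in the error message are the UTF-16LE byte-order mark, so the file probably needs that encoding declared, and sep = "" tells read.table() to split on any run of whitespace. The filename comes from the question; the rest is assumption.

# '<ff><fe>' is the UTF-16LE byte-order mark; fileEncoding= converts the
# text on input, and sep = "" splits on any run of whitespace
mydata <- read.table("food_poisioning.csv", header = TRUE,
                     sep = "", fileEncoding = "UTF-16LE")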
If you're having difficulty with read.csv() or read.table() (or with writing other import commands), try the "Import Dataset" button on the Environment panel in RStudio. It is especially useful when you are not sure how to specify the table format or when the format is complex.
For your .csv file, use "From Text (readr)..."
A window will pop up and allow you to choose a file/URL to upload. You will see a preview of the data table after you select a file/URL. You can click on the column names to change the column class or even "skip" the column(s) you don't need. Use the Import Options to further manage your data.
Here is an example using CreditCard.csv from Vincent Arel-Bundock's GitHub projects:
You can also modify and/or copy and paste the code in Code Preview, or click Import to run the code when you are ready.
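For reference, the generated code in Code Preview typically looks something like the sketch below. The URL is an assumption based on where the Rdatasets project hosts its CSV files; adjust it to whatever file you actually selected.

library(readr)

# code of the kind the Import Dataset dialog generates; the URL is assumed
CreditCard <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/CreditCard.csv")
View(CreditCard)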
I'm trying to teach myself R (just started).
I decided to import two csv files to practice a join on them.
One file imported just fine; the other one is giving the following errors:
Here is the csv file link:
https://data.world/jonathankkizer/occupation-computerization
I used the following statement:
occupationforjoin <- read.table(
    "C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv",
    header = TRUE, sep = ",")
Warning messages:
1: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 1 appears to contain embedded nulls
2: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 2 appears to contain embedded nulls
3: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 3 appears to contain embedded nulls
4: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 4 appears to contain embedded nulls
5: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string
7: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input
I found on StackOverflow that it could be due to encoding, so I applied the suggested solution and executed this statement:
occupationforjoin <- read.table(
    "C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv",
    header = TRUE, sep = ",", fileEncoding = "UTF-16LE")
It gave me a different error message:
Error in read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", :
  more columns than column names
I also tried using the read.csv function to no avail.
How do I fix this problem and import the data set successfully? None of the solutions I found online (e.g., the skipNul = TRUE or comment.char = "" parameters) helped.
UPD:
Here's a paste of the data set if you don't want to download the csv file from data.world:
https://pastebin.com/SPEtWT6f
I finally found the solution!
I was going nuts; even my instructor didn't know how to fix it!
This statement works:
o <- read.csv("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/Occ.txt", header = TRUE, sep = "\t", fileEncoding = "UTF-16LE")
Like I said in my original question, I tried using fileEncoding="UTF-16LE" on its own and it didn't help. After asking the question, I tried sep="\t" on its own, and it didn't help either. But using both of them did the trick! In hindsight that makes sense: Excel's "Unicode Text" export is UTF-16LE and tab-delimited, so the embedded nulls were the high bytes of the two-byte characters, and the real delimiter was a tab, not a comma.
Try the read_csv() function from the readr package.
Alternatively, with base read.csv(), use
dataframe <- read.csv("name_of_file.csv")
or
dataframe <- read.csv(file.choose())
Hope this works.
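A minimal sketch of the readr route (package assumed installed; the filename is the placeholder from above):

library(readr)

# read_csv() returns a tibble and reports parsing issues via problems()
dataframe <- read_csv("name_of_file.csv")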
I have two vectors of filenames, sample_files and actual_files. Each vector contains the filepaths of 4 files. In sample_files, each file is a .csv file and when I call lapply(sample_files, read.csv), I get a list with four dataframes as expected.
However, the files in actual_files are .txt files that are double-pipe delimited (blurgh), and I get the error below when calling lapply(actual_files, read.csv(sep="|")):
Error in read.table(file = file, header = header, sep = sep, quote = quote,
: argument "file" is missing, with no default
I tried reading in each file separately with read.csv(filename, sep="|") and it created the dataframe, no problem.
What am I doing wrong with the second lapply call to get that error?
My question is similar to this question but it hasn't been answered.
thank you!
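For reference, a sketch of the likely explanation: lapply(actual_files, read.csv(sep="|")) calls read.csv(sep="|") immediately, with no file argument, instead of handing the function to lapply, which is exactly the "argument "file" is missing" error. Extra arguments go after the function name (or inside an anonymous function):

# pass read.csv itself; lapply forwards sep = "|" through its ... argument
actual_data <- lapply(actual_files, read.csv, sep = "|")

# or, equivalently, wrap the call in an anonymous function
actual_data <- lapply(actual_files, function(f) read.csv(f, sep = "|"))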
I want to import a big csv file into R (approximately 14 million rows and 13 columns), so I tried to use fread with the following code:
library(data.table)

my_data <- fread(my_file,
                 sep = ";",
                 header = TRUE,
                 na.strings = c("", " ", "NA"),
                 quote = "",
                 fill = TRUE,
                 check.names = FALSE,
                 stringsAsFactors = FALSE)
However, I got the following error:
Error in fread(path_alertes_profil, sep = ";", header = TRUE, na.strings = c("", :
Expecting 13 cols, but line 18533 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=';' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.
Therefore I tried to import my file with the read_delim() function from the readr package, using the same parameters. It appeared to work, since the object showed up in the global environment (I'm working in RStudio), but it contains only 741629 rows instead of the 14+ million.
How can I solve this problem? (I tried to find a solution for the fread() error but didn't find any useful resource.)
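One way to narrow this down, sketched under the assumption that the file fits in memory: both symptoms (fread stopping at line 18533, read_delim silently truncating) usually point at a malformed row, often an unbalanced quote or a stray newline, so checking the field count of every line can locate it.

# count ';'-separated fields per line, ignoring quotes as in the fread call
n_fields <- count.fields(my_file, sep = ";", quote = "")
which(n_fields != 13)       # lines whose field count is not the expected 13

# then inspect the offending line(s) directly
lines <- readLines(my_file) # assumes enough RAM to hold the whole file
lines[18533]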
I am having a problem using read.csv in R. I am trying to import a file that was saved as a .csv file in Excel. Missing values are blank, but I have a single entry in one column which looks blank yet is in fact a space. Using the standard command that I have been using for similar files produces this error:
raw.data <- read.csv("DATA_FILE.csv", header=TRUE, na.strings="", encoding="latin1")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
invalid multibyte string at ' floo'
I have tried a few variations, adding arguments to the read.csv() command such as na.strings=c(""," ") and strip.white=TRUE, but these result in the exact same error.
The error is similar to what you get when you use the wrong encoding option, but I am pretty sure that shouldn't be the problem here. I have of course tried manually removing the space (in Excel), and this works, but since I'm trying to write generic code for a Shiny tool, that is not really optimal.
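A hedged suggestion rather than a confirmed fix: in read.csv(), encoding= only labels the strings as being in a given encoding, whereas fileEncoding= actually converts the file's text as it is read, and an "invalid multibyte string" error from type.convert() usually means the bytes were never converted. Swapping the argument may therefore help:

# fileEncoding= re-encodes the input; encoding= merely tags it, which can
# leave latin1 bytes that type.convert() rejects in a UTF-8 locale
raw.data <- read.csv("DATA_FILE.csv", header = TRUE,
                     na.strings = c("", " "),
                     strip.white = TRUE,
                     fileEncoding = "latin1")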