I have a .csv file that contains 285000 observations. When I try to import the dataset, I get the warning below and only 166000 observations are read in.
Joint <- read.csv("joint.csv", header = TRUE, sep = ",")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
When I added the quote argument, as follows:
Joint2 <- read.csv("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
And when I switched to read.table, it shows 483000 observations:
Joint <- read.table("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
What should I do to read the file properly?
I think the problem has to do with file encoding. There are a lot of special characters in the header.
If you know how your file is encoded, you can specify it using the fileEncoding argument to read.csv.
Otherwise you could try fread from data.table. It is able to read the file despite the encoding issues, and it will also be significantly faster for a data file this large.
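For example, a minimal sketch of both suggestions (the file name comes from the question; the UTF-8 encoding is only an assumption, so substitute whatever encoding your file actually uses):
# If the encoding is known, pass it to read.csv (UTF-8 here is an assumption):
Joint <- read.csv("joint.csv", header = TRUE, fileEncoding = "UTF-8")
# Or use data.table::fread, which copes better with messy files and is much faster:
library(data.table)
Joint <- fread("joint.csv", encoding = "UTF-8")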
I have a text file as my data. The text file is delimited using double pipes – i.e. “||”.
The data file looks like this:
Cat1||Cat2||Cat3\n
1||abc||23.5
When I load the data with read.table and sep = "||", I get a one-byte limitation error.
df <- read.table("data.txt",
                 header = FALSE,
                 sep = "||")
Error in scan(file, what = "", sep = sep, quote = quote, nlines = 1, quiet = TRUE, :
invalid 'sep' value: must be one byte
In Python, I can use '\|\|' as a separator to load the data.
Please help, how can I load this text file into an R data frame?
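A common workaround, which is not from the original post, is to collapse the two-character delimiter into a single character before parsing (the file name "data.txt" is taken from the question; header = TRUE matches the sample data, which starts with a header row):
# Replace the two-character "||" delimiter with a tab, then parse normally
raw <- readLines("data.txt")
df <- read.table(text = gsub("||", "\t", raw, fixed = TRUE),
                 header = TRUE, sep = "\t")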
I have a large data set (~20000 x 1). Not all the fields are filled; in other words, the data has missing values. Each feature is a string.
I have run the following code:
Input:
data <- read.csv("data.csv", header=TRUE, quote = "")
datan <- read.table("data.csv", header = TRUE, fill = TRUE)
Output of the second line:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 80 elements
Input:
datar <- read.csv("data.csv", header = TRUE, na.strings = NA)
Output:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
As I see it, I run into essentially four problems. Two of them are the error messages shown above. The third is that, even when no error message appears, the global environment window shows that not all of my rows are accounted for (roughly 14000 samples are missing), although the number of features is right. The fourth is that, again, not all of the samples are accounted for and the number of features is not correct either.
How can I solve this?
Try the argument comment.char = "" as well as quote. The hash (#) is otherwise read by R as a comment character and will cut the line short.
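A sketch of that suggestion, reusing the read.table call from the question (the column layout is unknown, so fill = TRUE is kept):
# comment.char = "" stops '#' from truncating lines; quote = "\"" keeps
# embedded commas inside quoted fields together
datan <- read.table("data.csv", header = TRUE, sep = ",",
                    quote = "\"", comment.char = "", fill = TRUE)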
Can you open the CSV using Notepad++? This will allow you to see 'invisible' characters and other non-printable characters; the file may not contain what you think it contains! Once the sourcing issue is resolved, you can choose the CSV file with a selector tool.
filename <- file.choose()           # pick the CSV interactively
data <- read.csv(filename, skip=1)  # skip the first line of the file
name <- basename(filename)          # keep just the file name, without the path
Or, hard-code the path, and read the data into R.
# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")
I'm trying to read a .csv file in R. Some rows of one column contain text with a comma, inside double quotes: "example, another example"
But R alters the rows (it adds rows) when I try to read it like this:
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
This one, which I found searching the internet, doesn't work either:
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
This command:
steekproef <- read.csv("steekproef.csv", header = T, sep =",", quote ="\"")
produces:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
and gives a data.frame: 1391160 obs. of 29 variables.
str(steekproef) gives no error, but shows:
'data.frame': 3103620 obs. of 29 variables:
The dataset actually has 29 columns and 3019438 rows.
I don't think the problem is caused by the "example, another example" values:
I created a test file in Excel and saved it as .csv. It looks like this in Notepad++:
test,num
"example, another example",1
"example, another example",2
example,3
example,4
I could import it without problems using
steekproef<- read.csv('steekproef.csv', header = T, sep = ',')
or
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Your first try:
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
gave me the error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
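If the row counts still don't match, a diagnostic that is not part of the original answer but often helps is count.fields, which reports how many fields each line splits into under a given quote setting:
# Lines reporting a different field count from the rest usually contain a
# stray quote or separator; table() summarises the counts per line
table(count.fields("steekproef.csv", sep = ",", quote = "\""))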
I'm loading a CSV file into R from the Quandl database.
The file is comma-delimited and the data looks as follows:
quandl code,name
WIKI/ACT,"Actavis, Inc."
WIKI/ADM,"Archer-Daniels-Midland Company"
WIKI/AEE,"Ameren Corporation"
...
...
I use the following code to load the data:
US.Stocks <-read.table(file=ABC,header=FALSE,sep=",")
However, I get the following error:
Error in read.table(data.frame(file = ABC, header = FALSE, :
'file' must be a character string or connection
Can someone please help me with what I'm doing wrong? I suspect I've not specified some parameter in the read.csv command?
Thanks,
Tom
You should use
read.csv(file = yourFilePath, header = TRUE)
but I think the problem is in your file path: maybe you are missing the file extension, and remember to wrap your file path in double quotes ("yourFilePath").
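For example (the path and file name here are only placeholders):
# The path must be passed as a quoted character string, including the extension
US.Stocks <- read.csv(file = "c:/your_path_here/quandl_codes.csv", header = TRUE)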
UPDATE:
read.csv is just a wrapper around read.table with some default parameters:
function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...) {
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
}
I am trying to read a .tsv (tab-separated value) file into R using a specific encoding. It's supposedly windows-1252. And it has a header.
Any suggestions for the code to put it into a data frame?
Something like this perhaps?
mydf <- read.table('thefile.txt', header=TRUE, sep="\t", fileEncoding="windows-1252")
str(mydf)
You can also use:
read.delim('thefile.txt', header= T, fileEncoding= "windows-1252")
Simply entering the command into your R console:
> read.delim
function (file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
reveals that read.delim is a packaged read.table command that already specifies tab as the separator. read.delim might be more convenient if you're working with a lot of .tsv files.
The difference between the two commands is discussed in more detail in this Stack question.
df <- read.delim("~/file_directory/file_name.tsv", header = TRUE)
works fine for a single .tsv file, because read.delim already treats the file as tab-separated, so there is no need for sep = "\t". fileEncoding = "windows-1252" could be used but is not always necessary.