R reading a tsv file using specific encoding

I am trying to read a .tsv (tab-separated value) file into R using a specific encoding. It's supposedly windows-1252. And it has a header.
Any suggestions for the code to put it into a data frame?

Something like this perhaps?
mydf <- read.table('thefile.txt', header=TRUE, sep="\t", fileEncoding="windows-1252")
str(mydf)

You can also use:
read.delim('thefile.txt', header = TRUE, fileEncoding = "windows-1252")
Simply entering the function name into your R console:
> read.delim
function (file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
reveals that read.delim is a packaged read.table command that already specifies tabs as your data's separator. read.delim might be more convenient if you're working with a lot of tsv files.
The difference between the two commands is discussed in more detail in this Stack question.

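df <- read.delim("~/file_directory/file_name.tsv", header = TRUE) works fine for a single .tsv file: read.delim already defaults to sep = "\t", so there is no need to pass it. fileEncoding = "windows-1252" can be added, but it only matters when the file contains non-ASCII characters and its encoding differs from your session's native encoding.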
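To make the effect of fileEncoding concrete, here is a minimal sketch. The sample file, its contents, and the variable names are hypothetical; the bytes are written by hand so that the file really is windows-1252 regardless of the session's locale. It assumes the session runs in a UTF-8 locale and that the platform's iconv recognises the name "windows-1252".

```r
# Build a tiny tab-separated file whose bytes are windows-1252 (hypothetical data).
# 0xEB is "ë" and 0xE1 is "á" in windows-1252.
tsv_path <- tempfile(fileext = ".tsv")
raw_bytes <- c(
  charToRaw("name\tcity\nZo"), as.raw(0xEB),
  charToRaw("\tM"), as.raw(0xE1),
  charToRaw("laga\n")
)
writeBin(raw_bytes, tsv_path)

# read.delim already uses sep = "\t"; fileEncoding re-encodes the input on read.
mydf <- read.delim(tsv_path, header = TRUE,
                   fileEncoding = "windows-1252",
                   stringsAsFactors = FALSE)
str(mydf)
```

Without fileEncoding, the same call would read the raw bytes as if they were in the native encoding, so the accented characters would come out garbled or invalid.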

Related

R write.table function inserts unwanted empty line at the bottom of my csv

I have this code:
write.table(df, file = f, append = F, quote = TRUE, sep = ";",
eol = "\n", na = "", dec = ".", row.names = FALSE,
col.names = TRUE, qmethod = c("escape", "double"))
where df is my data frame and f is a .csv file name.
The problem is that the resulting csv file has an empty line at the end.
When I try to read the file:
dd<-read.table(f,fileEncoding = "UTF-8",sep = ";",header = T,quote = "\"")
I get the following error:
incomplete final line found by readTableHeader
Is there something I am missing?
Thank you in advance.
UPDATE: I solved the problem by removing the fileEncoding = "UTF-8" argument from the read.table call:
dd<-read.table(f,sep = ";",header = T,quote = "\"")
but I can't explain the reason for this, since the default for write.table seems to be UTF-8 anyway (I checked this using an advanced text tool).
Any idea of why this is happening?
Thank you,

Import text file in r with more than one byte as separator

I have a text file as my data. The text file is delimited using double pipes – i.e. “||”.
The data files look like this:
Cat1||Cat2||Cat3\n
1||abc||23.5
When I load the data with read.table using sep = "||", I get an error saying the separator must be one byte.
df <- read.table("data.txt",
                 header = FALSE,
                 sep = "||")
Error in scan(file, what = "", sep = sep, quote = quote, nlines = 1, quiet = TRUE, :
invalid 'sep' value: must be one byte
In python, I can use '\|\|' as a separator to load the data.
Please help: how can I load the text file into an R data frame?
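One possible workaround, sketched here under assumptions rather than offered as the definitive answer: read the file as plain lines, replace the two-character delimiter with a single-byte one, and hand the result to read.table through its text argument. The sample lines below are hypothetical and stand in for readLines("data.txt"); the sketch also assumes the data itself contains no tab characters and that the first line is a header.

```r
# Stand-in for: lines <- readLines("data.txt")
lines <- c("Cat1||Cat2||Cat3",
           "1||abc||23.5")

# fixed = TRUE treats "||" literally, so the pipes are not read as regex alternation.
cleaned <- gsub("||", "\t", lines, fixed = TRUE)

# read.table accepts the cleaned lines directly via the text argument.
df <- read.table(text = cleaned, header = TRUE, sep = "\t",
                 stringsAsFactors = FALSE)
str(df)
```

Any single-byte character that never occurs in the data would work in place of the tab.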

read.csv warning 'EOF within quoted string' to read whole file

I have a .csv file that contains 285000 observations. When I tried to import the dataset, I got the warning below and only 166000 observations were read.
Joint <- read.csv("joint.csv", header = TRUE, sep = ",")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
When I coded with quote, as follows:
Joint2 <- read.csv("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
When I coded like that, it shows 483000 observations:
Joint <- read.table("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
What should I do to read the file properly?
I think the problem has to do with file encoding. There are a lot of special characters in the header.
If you know how your file is encoded you can specify using the fileEncoding argument to read.csv.
Otherwise you could try fread from the data.table package. It can often read such a file despite the encoding issues, and it will also be significantly faster for a data file this large.

Issues reading data as csv in R

I have a large data set of (~20000x1). Not all the fields are filled, in other words the data does have missing values. Each feature is a string.
I have done the following code runs:
Input:
data <- read.csv("data.csv", header=TRUE, quote = "")
datan <- read.table("data.csv", header = TRUE, fill = TRUE)
Output for the second code:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 80 elements
Input:
datar <- read.csv("data.csv", header = TRUE, na.strings = NA)
Output:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
I run into essentially four problems, as far as I can see. Two of them are the error messages shown above. The third is that when no error message appears, the global environment window shows that not all my rows are accounted for: about 14000 samples are missing, although the number of features is right. The fourth is that, again, not all the samples are accounted for and the number of features is also wrong.
How can I solve this??
Try the argument comment.char = "" as well as quote = "". By default read.table treats the hash (#) as a comment character and cuts the line short at that point; read.csv already defaults to comment.char = "".
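A small sketch of the comment-character effect, using a hypothetical three-line file written to a temporary path. It shows the same row read with read.table's default comment.char = "#" and again with comment handling switched off.

```r
# Hypothetical sample file in which one field contains a hash.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,label",
             "1,item #4",
             "2,plain"), tmp)

# Default comment.char = "#": everything from the hash onward is dropped.
with_default <- read.table(tmp, header = TRUE, sep = ",",
                           stringsAsFactors = FALSE)

# comment.char = "" keeps the hash as ordinary data.
no_comments <- read.table(tmp, header = TRUE, sep = ",",
                          comment.char = "", quote = "",
                          stringsAsFactors = FALSE)

print(with_default$label)
print(no_comments$label)
```

When the hash sits in the middle of a row with many columns, the truncated row ends up with too few fields, which is one way to arrive at "line N did not have M elements".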
Can you open the CSV using Notepad++? This will allow you to see 'invisible' characters and any other non-printable characters. That file may not contain what you think it contains! When you get the sourcing issue resolved, you can choose the CSV file with a selector tool.
filename <- file.choose()
data <- read.csv(filename, skip=1)
name <- basename(filename)
Or, hard-code the path, and read the data into R.
# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")

loading .csv file in R?

I'm loading a csv file into R from the Quandl database.
the file is comma delimited and the data looks as follows:
quandl code,name
WIKI/ACT,"Actavis, Inc."
WIKI/ADM,"Archer-Daniels-Midland Company"
WIKI/AEE,"Ameren Corporation"
...
...
i use the following code to load the data:
US.Stocks <-read.table(file=ABC,header=FALSE,sep=",")
however, i get the following error:
Error in read.table(data.frame(file = ABC, header = FALSE, :
'file' must be a character string or connection
Can someone please help me with what I'm doing wrong? I suspect I've not specified some parameter correctly in the read.csv command.
thanks
Tom
you should use
read.csv(file = yourFilePath, header = TRUE)
but I think the problem is in your file path. Maybe you are missing the file extension, and remember to wrap your file path in double quotes ("yourfilepath").
UPDATE:
read.csv is just a wrapper around read.table with some default parameters:
function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...) {
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
}
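As an illustration of the advice above, here is a sketch that passes the path as a character string. The temporary file and the name csv_path are hypothetical stand-ins for the downloaded Quandl file; the rows are the ones quoted in the question.

```r
# Write the sample rows from the question to a temporary file (hypothetical path).
csv_path <- tempfile(fileext = ".csv")
writeLines(c('quandl code,name',
             'WIKI/ACT,"Actavis, Inc."',
             'WIKI/ADM,"Archer-Daniels-Midland Company"',
             'WIKI/AEE,"Ameren Corporation"'), csv_path)

# 'file' must be a character string (or a connection), not a bare symbol or data frame.
US.Stocks <- read.csv(file = csv_path, header = TRUE,
                      stringsAsFactors = FALSE)
str(US.Stocks)
```

Because read.csv keeps quote = "\"" by default, the comma inside "Actavis, Inc." stays within one field. The header "quandl code" is turned into the syntactic name quandl.code unless check.names = FALSE is passed.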
