Read CSV with timestamp into R: define colClasses in read.table

I'm trying to read a table (a .csv file, 120K rows x 21 columns), assigning classes to the columns with:
read.table(file = "G1to21jan2015.csv",
header = TRUE,
colClasses = c (rep("POSICXct", 6),
rep("numeric", 2),
rep("POSICXct", 2),
"numeric",
NULL,
"numeric",
NULL,
rep("character", 2),
rep("numeric", 5))
)
I get the following error:
Error in read.table(file = "G1to21jan2015.csv", header = TRUE, colClasses = c(rep("POSICXct", :
more columns than column names
I've confirmed that the CSV has 21 columns, and so (I believe) does my request.
If I remove the second argument, header = TRUE, I get a different error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 19 elements
Note
I'm using POSICXct for data in the format 1/5/2015 15:00:00 (m/d/Y H:M:S), numeric for data like 1559, NULL for empty columns that I want to skip, and character for text.

For an unconventional date-time format, one can import the column as character (step 1) and then convert it with strptime (step 2).
step 1
df <- read.table(file = "data.csv",
header = TRUE,
sep = "," ,
dec = "." ,
colClasses = "character",
comment.char = ""
)
step 2
strptime(df$v1, "%m/%d/%Y %H:%M:%S")
Here v1 is the name of the column to convert (in this case a date-time in the unconventional format 12/13/2014 15:16:17). Note %Y for the four-digit year and %S for the seconds.
Notes
The sep argument is necessary because read.table defaults to sep = "" (any whitespace).
With read.csv there is no need for the sep argument, which defaults to ",".
Using comment.char = "" (when possible) improves reading time.
Useful info at http://cran.r-project.org/doc/manuals/r-release/R-data.pdf
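The two steps above can be sketched on a small in-memory table. The column names (when, value, empty, label) and the sample rows are hypothetical, and the time zone is set to UTC only to keep the result reproducible. The sketch also uses the quoted string "NULL" to skip an empty column. An unquoted NULL inside c() is simply dropped, which would explain why the 21-entry colClasses in the question ended up with 19 elements.

```r
# Hypothetical sample data: a date-time column, a number, an empty column, text
csv_text <- "when,value,empty,label
1/5/2015 15:00:00,1559,,a
12/13/2014 15:16:17,42,,b"

# c() silently drops unquoted NULLs, so this vector has 2 entries, not 3
n_classes <- length(c("numeric", NULL, "character"))

# Step 1: read the date-time column as character; "NULL" (quoted) skips a column
df <- read.csv(text = csv_text,
               colClasses = c("character", "numeric", "NULL", "character"))

# Step 2: convert the character column to POSIXct with an explicit format
df$when <- as.POSIXct(df$when, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

str(df)
```

as.POSIXct is used here instead of strptime because strptime returns a POSIXlt object, which is less convenient to store in a data frame column.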

Related

Warning message in R when using colClasses when reading csv files

I am using lapply to read a list of files. The files have multiple rows and columns, and I am interested in the first row of the first column. The code I am using is:
lapply(file_list, read.csv,sep=',', header = F, col.names=F, nrow=1, colClasses = c('character', 'NULL', 'NULL'))
The first row has three columns but I am only reading the first one. From other posts on Stack Overflow I found that the way to do this is to use colClasses = c('character', 'NULL', 'NULL'). While this approach works, I would like to understand the underlying issue that causes the following warning message, and hopefully prevent it from popping up:
"In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 3"
It's to let you know that you're keeping just one of the three columns in the data: the column count read.table expects (1) differs from the number of columns it actually read (3). Note that your NULL is in quotation marks.
An example:
write.csv(data.frame(fi=letters[1:3],
fy=rnorm(3,500,1),
fo=rnorm(3,50,2))
,file="a.csv",row.names = F)
write.csv(data.frame(fib=letters[2:4],
fyb=rnorm(3,5,1),
fob=rnorm(3,50,2))
,file="b.csv",row.names = F)
file_list=list("a.csv","b.csv")
lapply(file_list, read.csv,sep=',', header = F, col.names=F, nrow=1, colClasses = c('character', 'NULL', 'NULL'))
Which results in:
[[1]]
FALSE.
1 fi
[[2]]
FALSE.
1 fib
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 3
Which is the same as if you used:
lapply(file_list, read.csv,sep=',', header = F, col.names=F,
nrow=1, colClasses = c('character', 'asdasd', 'asdasd'))
But the warning goes away (and you get the rest of the row as a result) if you do:
lapply(file_list, read.csv,sep=',', header = F, col.names=F,
nrow=1, colClasses = c( 'character',NULL, NULL))
You can see where errors and warnings come from in source code for a function by entering, for example, read.table directly without anything following it, then searching for your particular warning within it.
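For comparison, here is a minimal sketch of reading only the first cell with the quoted "NULL" class, which ?read.table documents as the way to skip a column (an unquoted NULL is dropped by c() before read.csv ever sees it). The sample text is hypothetical. One reading of the warning, offered as an assumption rather than a confirmed diagnosis, is that col.names = F supplies a single column name while colClasses describes three columns, so the sketch leaves col.names out.

```r
# Hypothetical file contents: three columns, we only want row 1 of column 1
csv_text <- "fi,500.2,49.1
fib,5.3,51.7"

# Quoted "NULL" skips columns 2 and 3; col.names is left at its default
first_cell <- read.csv(text = csv_text, header = FALSE, nrows = 1,
                       colClasses = c("character", "NULL", "NULL"))

print(first_cell)
```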

How to avoid factors in R when reading csv data

I have data in a csv file. When I read it, the columns come in as factors, which I cannot use for any computation.
I used
as.numeric(df$variablename) but it returns a completely different set of values for the variable.
original data in the variable: 2961,488,632,
as.numeric output: 1,8,16
When reading data using read.table you can specify:
how your data is separated (sep =),
what the decimal point is (dec =),
what NA values look like (na.strings =),
that you do not want strings converted to factors (stringsAsFactors = F).
In your case you could use something like:
read.table("mycsv.csv", header = TRUE, sep = ",", dec = ".", stringsAsFactors = F,
na.strings = c("", "-"))
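The symptom in the question can be reproduced in a few lines. Calling as.numeric on a factor returns the internal integer codes of its levels rather than the values printed on screen. If the data has already been read as a factor, converting through character first recovers the numbers. The vector below is a hypothetical stand-in built from the values quoted in the question.

```r
# A factor whose labels look like numbers (hypothetical stand-in for the column)
f <- factor(c("2961", "488", "632"))

# as.numeric() on a factor gives the level codes, not the labels
codes <- as.numeric(f)

# Converting to character first gives the numbers that were in the file
values <- as.numeric(as.character(f))

print(codes)
print(values)
```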
In addition to the answer by Cettt, there's also colClasses.
If you know in advance what data types the columns in your csv file have, you can specify them. This stops R from "guessing" the data type, and lets you know when something isn't right rather than silently deciding the column must be a string. E.g. if your 4-column csv file has columns that are text, factor, integer, and numeric, you can use
read.table("mycsv.csv", header = T, sep = ",", dec = ".",
colClasses=c("character", "factor", "integer", "numeric"))
Edited to add:
As pointed out by gersht, the issue is likely some non-number in the numbers column. Often, this is how the value NA was coded. Specifying colClasses causes R to give an error message when it encounters any such "not numeric or NA" value, so you can easily see the issue. If it's a non-default coding of NA, use the argument na.strings = c("NA", "YOUR NA VALUE"). If it's another issue, you'll likely have to fix the file before importing. For example:
read.table(sep=",",
colClasses=c("character", "numeric"),
text="
cat,11
canary,12
dog,1O") # NB not a 10; it's a 1 and a capital-oh.
gives
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
scan() expected 'a real', got '1O'
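The other case mentioned above, a non-default coding of NA, can be sketched the same way. The "-" marker for a missing value is a hypothetical choice for illustration. With it listed in na.strings, the numeric column reads cleanly.

```r
# Hypothetical data where a missing number is written as "-"
csv_text <- "cat,11
canary,-
dog,10"

pets <- read.table(text = csv_text, sep = ",",
                   colClasses = c("character", "numeric"),
                   na.strings = c("NA", "-"))

str(pets)
```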

read.csv warning 'EOF within quoted string' to read whole file

I have a .csv file that contains 285000 observations. When I try to import the dataset, I get the following warning and only 166000 observations are read.
Joint <- read.csv("joint.csv", header = TRUE, sep = ",")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
When I added the quote argument, as follows, I got an error:
Joint2 <- read.csv("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
When I used read.table instead, like this, it shows 483000 observations:
Joint <- read.table("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
What should I do to read the file properly?
I think the problem has to do with file encoding. There are a lot of special characters in the header.
If you know how your file is encoded you can specify using the fileEncoding argument to read.csv.
Otherwise you could try to use fread from data.table. It is able to read the file despite the encoding issues. It will also be significantly faster for reading such a large data file.

Error Reading a CSV File in R

I am trying to read a bunch of files from http://www.ercot.com/gridinfo/load/load_hist. All the files are read properly with read.csv except for the last one, the file for 2017. When I attempt to read that file with read.csv I get the following error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
scan() expected 'a real', got '"8'
However, I have checked with Excel and there is no "8 or 8 value in the file. The error message seems clear, but I can't find the "8 or 8, and I have the same issue even if I read 0 rows (with the nrows argument of read.csv).
hold2 <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)))
hold2 <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)), nrows=0)
Also, in the last row of the file there are values that do not respect the format of the rest of the file. I would like to skip the last line, but there is no argument in read.csv to do this. Is there any workaround? I am thinking of using something like:
hold2 <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)), nrows=nrow(read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep="")))-1)
Any thoughts on how best to do this? Thanks
Using the readr package
> df <- readr::read_csv("~/Desktop/native_load_2017.csv")
Parsed with column specification:
cols(
`Hour Ending` = col_character(),
COAST = col_number(),
EAST = col_number(),
FWEST = col_number(),
NORTH = col_number(),
NCENT = col_number(),
SOUTH = col_number(),
SCENT = col_character(),
WEST = col_number(),
ERCOT = col_number()
)
You can see that the SCENT column is being parsed as character (due to the difference in format of the values in the last row that you noted). Below, specifying the first column as character and the default as col_number() reads the file (note that col_number() handles the commas and decimal points present in the columns you had as double).
options(digits=7)
df <- readr::read_csv("~/Desktop/native_load_2017.csv", col_types = readr::cols(
`Hour Ending` = readr::col_character(),
.default = readr::col_number())
)
sapply(df, class)
#df[complete.cases(df),] # to remove the last row if needed
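If base R is preferred, one possible workaround for skipping the last line is to read the file as text, drop the final line, and pass the rest to read.csv. The sketch below is an assumption-laden illustration, not the layout of the real file: the sample lines, the column subset, and the quoted numbers with thousands separators are all hypothetical, and the in-memory raw_lines vector stands in for readLines() on the real file.

```r
# Hypothetical lines standing in for readLines("native_load_2017.csv")
raw_lines <- c('Hour Ending,COAST,EAST',
               '01/01/2017 01:00,"8,521.33",911.02',
               '01/01/2017 02:00,"8,403.10",905.77',
               'total,n/a,n/a')

# Drop the malformed last line before parsing
kept <- head(raw_lines, -1)

# Read everything as character so quoted values such as "8,521.33" survive
load_df <- read.csv(text = kept, colClasses = "character", check.names = FALSE)

# Strip the thousands separators, then convert the value columns to numeric
value_cols <- setdiff(names(load_df), "Hour Ending")
load_df[value_cols] <- lapply(load_df[value_cols],
                              function(x) as.numeric(gsub(",", "", x)))

str(load_df)
```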

R - read.csv add rows when loading the dataset

I'm trying to read a .csv in R. Some rows of one column contain text with a comma, within double quotes: "example, another example"
But R alters the rows (it adds rows) when I try to read it like this:
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
This one, which I found by searching the internet, doesn't work either:
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
It produces this warning:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
and gives a data.frame with 1391160 obs. of 29 variables.
str(steekproef) gives no error, but shows
'data.frame': 3103620 obs. of 29 variables:
The dataset has 29 columns and 3019438 rows
I don't think that the problem is caused by the "example, another example":
I created a testfile in Excel and saved it as .csv. It looks like this in Notepad++:
test,num
"example, another example",1
"example, another example",2
example,3
example,4
I could import it without problems using
steekproef <- read.csv('steekproef.csv', header = T, sep = ',')
or
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Your first try,
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
gave me this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
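The test above can be reproduced without creating a file by passing the same five lines through the text argument. This is only a sketch of the answerer's check, not a diagnosis of the original 3-million-row file. With the default quote setting, the quoted commas stay inside one field. The quote = "" call is wrapped in tryCatch so the script keeps running whatever that call does. The answer above reports a duplicate row.names error for it.

```r
# The same test data as shown in Notepad++ above
csv_text <- 'test,num
"example, another example",1
"example, another example",2
example,3
example,4'

# Default quoting: the comma inside the double quotes is part of the field
ok <- read.csv(text = csv_text, header = TRUE, sep = ",")
print(dim(ok))

# quote = "": quotes are ordinary characters, so the first rows split into 3 fields
no_quote <- tryCatch(
  read.csv(text = csv_text, header = TRUE, quote = "", sep = ","),
  error = function(e) conditionMessage(e)
)
print(no_quote)
```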

Resources