Error Reading a CSV File in R

I am trying to read a bunch of files from http://www.ercot.com/gridinfo/load/load_hist. All the files are read properly with read.csv except for the last one, the file for 2017. When I attempt to read that file with read.csv I get the following error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
scan() expected 'a real', got '"8'
However, I have checked in Excel and there is no "8 or 8 value in the file. The error message seems clear, but I cannot find the offending value, and I get the same error even when reading 0 rows (via the nrows argument of read.csv).
hold2 <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)))
hold2 <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)), nrows=0)
Also, the last row of the file contains values that do not match the format of the rest of the file. I would like to skip that last line, but read.csv has no argument to do this. Is there any workaround? I am thinking of using something like:
hold2 <- read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""), header=TRUE, sep=",", dec = ".", colClasses=c("character",rep("double",9)), nrows=nrow(read.csv(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep="")))-1)
Any thoughts on how best to do this? Thanks

Using the readr package
> df <- readr::read_csv("~/Desktop/native_load_2017.csv")
Parsed with column specification:
cols(
  `Hour Ending` = col_character(),
  COAST = col_number(),
  EAST = col_number(),
  FWEST = col_number(),
  NORTH = col_number(),
  NCENT = col_number(),
  SOUTH = col_number(),
  SCENT = col_character(),
  WEST = col_number(),
  ERCOT = col_number()
)
>
You can see the SCENT column is being parsed as character (due to the different format of the values in the last row that you noted). Below, specifying the first column as character and the default as col_number() reads the file correctly (note: col_number() handles the commas and decimal points present in the columns you had declared as double).
options(digits=7)
df <- readr::read_csv("~/Desktop/native_load_2017.csv", col_types = cols(
  `Hour Ending` = col_character(),
  .default = col_number())
)
sapply(df, class)
#df[complete.cases(df),] # to remove the last row if needed
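If you prefer to stay in base R, the asker's "skip the last line" idea can also be done by reading the raw lines first and dropping the final one before parsing. A minimal sketch, with inline data standing in for the real file (the column names and values here are hypothetical; with the real file you would use readLines on its path instead):

```r
# Inline stand-in for the CSV; with the real file:
# lines <- readLines(paste(PATH, "\\CSV\\", "native_load_2017.csv", sep=""))
lines <- c("Hour Ending,COAST,EAST",
           "01/01/2017 1:00,7762.71,1165.34",
           "01/01/2017 2:00,7531.39,1126.34",
           "Total,\"8,000,000\",junk")   # malformed final row, as in the 2017 file

# Drop the last line, then parse the remaining text directly
dat <- read.csv(text = paste(head(lines, -1), collapse = "\n"),
                colClasses = c("character", "double", "double"))
nrow(dat)   # 2 -- the malformed last line never reaches the parser
```

This avoids reading the file twice, which the nrows trick in the question would require.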

Related

How to avoid factors in R when reading csv data

I have data in a csv file. When I read it in, the columns are imported as factor levels, with which I cannot do any computation.
I used
as.numeric(df$variablename), but it returns a completely different set of values for the variable.
original data in the variable: 2961,488,632,
as.numeric output: 1,8,16
When reading data using read.table you can specify:
how your data is separated: sep = ","
what the decimal point is: dec = "."
what NA values look like: na.strings = c("", "-")
that you do not want to convert strings to factors: stringsAsFactors = FALSE
In your case you could use something like:
read.table("mycsv.csv", header = TRUE, sep = ",", dec = ".", stringsAsFactors = FALSE,
           na.strings = c("", "-"))
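As an aside, the "completely different set of data" the asker saw from as.numeric is the factor's internal level codes, not the data; converting via as.character first recovers the original values. A small sketch:

```r
f <- factor(c("2961", "488", "632"))
as.numeric(f)                 # 1 2 3 -- internal level codes, not the data
as.numeric(as.character(f))   # 2961 488 632 -- the actual values
```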
In addition to the answer by Cettt, there's also colClasses.
If you know in advance what data types the columns of your csv file have, you can specify them. This stops R from "guessing" what the data type is, and lets you know when something isn't right, rather than silently deciding it must be a string. E.g., if your 4-column csv file has columns that are text, factor, integer, and numeric, you can use
read.table("mycsv.csv", header = TRUE, sep = ",", dec = ".",
           colClasses = c("character", "factor", "integer", "numeric"))
Edited to add:
As pointed out by gersht, the issue is likely some non-number in the numbers column. Often, this is how the value NA was coded. Specifying colClasses causes R to give an error message when it encounters any such "not numeric or NA" value, so you can easily see the issue. If it's a non-default coding of NA, use the argument na.strings = c("NA", "YOUR NA VALUE"). If it's another issue, you'll likely have to fix the file before importing. For example:
read.table(sep = ",",
           colClasses = c("character", "numeric"),
           text = "
cat,11
canary,12
dog,1O") # NB not a 10; it's a 1 and a capital-oh.
gives
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
scan() expected 'a real', got '1O'
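And if the stray token is really just a code for missing data, the na.strings route mentioned above handles it without editing the file. A sketch treating the "1O" token as an NA code (purely illustrative; your NA code will differ):

```r
dat <- read.table(sep = ",",
                  colClasses = c("character", "numeric"),
                  na.strings = "1O",
                  text = "
cat,11
canary,12
dog,1O")
dat$V2   # 11 12 NA -- the bad token became NA instead of raising an error
```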

Issues reading data as csv in R

I have a large data set (~20000 x 1). Not all the fields are filled; in other words, the data has missing values. Each feature is a string.
I have run the following code:
Input:
data <- read.csv("data.csv", header=TRUE, quote = "")
datan <- read.table("data.csv", header = TRUE, fill = TRUE)
Output for the second code:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 80 elements
Input:
datar <- read.csv("data.csv", header = TRUE, na.strings = NA)
Output:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
I run into essentially four problems, as far as I can see. Two of them are the error messages stated above. The third is that, even when no error is produced, the global environment window shows that not all my rows are accounted for: roughly 14,000 samples are missing, although the number of features is right. The fourth is that, again, not all the samples are accounted for, and the number of features is not correct either.
How can I solve this?
Try the argument comment.char = "" as well as quote. The hash (#) is being read by R as a comment and will cut the line short.
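The effect is easy to reproduce with read.table, whose default is comment.char = "#" (read.csv already defaults to ""). A sketch with inline data:

```r
txt <- "id note\n1 good\n2 #tagged\n3 fine"

# With the default comment.char = "#", line 2 is cut short and the read fails:
try(read.table(text = txt, header = TRUE))

# Disabling comment handling keeps all three rows:
dat <- read.table(text = txt, header = TRUE, comment.char = "")
nrow(dat)   # 3
```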
Can you open the CSV in Notepad++? This will allow you to see 'invisible' and other non-printable characters; that file may not contain what you think it contains! When you get the sourcing issue resolved, you can choose the CSV file with a selector tool:
filename <- file.choose()
data <- read.csv(filename, skip=1)
name <- basename(filename)
Or, hard-code the path, and read the data into R.
# Read CSV into R
MyData <- read.csv(file="c:/your_path_here/Data.csv", header=TRUE, sep=",")

csv files error in R

When trying to read a local csv file, I'm getting the error
Error in xts(dat, order.by = as.Date(rownames(dat), "%m/%d/%Y")) :
'order.by' cannot contain 'NA', 'NaN', or 'Inf'
I'm trying out the example from https://rpubs.com/mohammadshadan/288218, which is the following:
tmp_file <- "test.csv"
# Create dat by reading tmp_file
dat <- read.csv(tmp_file,header=FALSE)
# Convert dat into xts
xts(dat, order.by = as.Date(rownames(dat), "%m/%d/%Y"))
# Read tmp_file using read.zoo
dat_zoo <- read.zoo(tmp_file, index.column = 0, sep = ",", format = "%m/%d/%Y")
# Convert dat_zoo to xts
dat_xts <- as.xts(dat_zoo)
The thing is, when I read the file from the server as in the example, it somehow works, but not when I try with a local csv file, even if it has the same contents as the file on the web.
I have tried creating the csv file with Notepad, Notepad++ and Excel, with no luck.
Any idea what I'm missing? I have also tried using read.table instead of read.csv, with the same results...
File can be found at: https://ufile.io/zfqje
If header=TRUE, I get the following warnings:
Warning messages: 1: In read.table(file = file, header = header, sep =
sep, quote = quote, : incomplete final line found by
readTableHeader on 'test.csv'
2: In read(file, ...) : incomplete
final line found by readTableHeader on 'test.csv'
The problem is the header=FALSE argument in read.csv.
read.csv will use the first column as the row names if there is a header and the first data row contains one fewer field than the number of columns. When header = FALSE, it doesn't create the row names.
Here is an example of the problem:
dat <- read.csv(text = "a,b
1/02/2015,1,3
2/03/2015,2,4", header = F)
as.Date(rownames(dat), "%m/%d/%Y")
#> [1] NA NA NA
By removing header = F, the problem is fixed:
dat <- read.csv(text = "a,b
1/02/2015,1,3
2/03/2015,2,4")
as.Date(rownames(dat), "%m/%d/%Y")
#> [1] "2015-01-02" "2015-02-03"

Read csv with timestamp to R. Define colClass in table.read

I'm trying to read a table (.CSV, ~120K x 21) into R, assigning classes to the columns with:
read.table(file = "G1to21jan2015.csv",
           header = TRUE,
           colClasses = c(rep("POSICXct", 6),
                          rep("numeric", 2),
                          rep("POSICXct", 2),
                          "numeric",
                          NULL,
                          "numeric",
                          NULL,
                          rep("character", 2),
                          rep("numeric", 5))
           )
I get the following error:
Error in read.table(file = "G1to21jan2015.csv", header = TRUE, colClasses = c(rep("POSICXct", :
more columns than column names
I've confirmed that the csv has 21 columns, and so (I believe) does my request.
By removing the second argument, header = TRUE, I get a different error, though:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 19 elements
Note
I'm using POSICXct to read data in the format 1/5/2015 15:00:00 (i.e. m/d/Y H:M), numeric to read data like 1559, NULL for empty columns that I want to skip, and character for text.
For an unconventional date-time format, one can import the column as character (step 1) and then coerce it with strptime (step 2).
step 1
df <- read.table(file = "data.csv",
                 header = TRUE,
                 sep = ",",
                 dec = ".",
                 colClasses = "character",
                 comment.char = ""
                 )
step 2
strptime(df$v1, "%m/%d/%Y %H:%M")
with v1 being the name of the column to coerce (in this case, a date-time in the unconventional format 12/13/2014 15:16:17).
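Putting both steps together on a self-contained example (inline text stands in for the file here, and the format string includes the seconds present in the sample value):

```r
# Step 1: import everything as character (no guessing, no factors)
df <- read.table(text = "v1,v2\n12/13/2014 15:16:17,1559\n12/14/2014 08:00:00,1600",
                 header = TRUE, sep = ",", dec = ".",
                 colClasses = "character", comment.char = "")

# Step 2: coerce the date-time column with strptime
ts <- strptime(df$v1, "%m/%d/%Y %H:%M:%S")
format(ts[1], "%Y-%m-%d %H:%M")   # "2014-12-13 15:16"
```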
Notes
Using the sep argument is necessary since read.table defaults to sep = "".
When using read.csv there is no need for the sep argument, which already defaults to ",".
Setting comment.char = "" (when possible) improves reading time.
Useful info at http://cran.r-project.org/doc/manuals/r-release/R-data.pdf

Numeric variables converted to factors when reading a CSV file

I'm trying to read a .csv file into R where all the columns are numeric. However, they get converted to factor every time I import the file.
Here's a sample of what my CSV looks like:
This is my code:
options(StringsAsFactors=F)
data<-read.csv("in.csv", dec = ",", sep = ";")
As you can see, I set dec to "," and sep to ";". Still, all the vectors that should be numeric are factors!
Can someone give me some advice? Thanks!
Your NA strings in the csv file, N/A, are interpreted as character and then the whole column is converted to character. If you have stringsAsFactors = TRUE in options or in read.csv (default), the column is further converted to factor. You can use the argument na.strings to tell read.csv which strings should be interpreted as NA.
A small example:
df <- read.csv(text = "x;y
N/A;2,2
3,3;4,4", dec = ",", sep = ";")
str(df)
df <- read.csv(text = "x;y
N/A;2,2
3,3;4,4", dec = ",", sep = ";", na.strings = "N/A")
str(df)
Update following comment
Although not apparent from the sample data provided, there is also a problem with instances of '$' concatenated to the numbers, e.g. '$3,3'. Such values will be interpreted as character, and then the dec = "," doesn't help us. We need to replace both the '$' and the ',' before the variable is converted to numeric.
df <- read.csv(text = "x;y;z
N/A;1,1;2,2$
$3,3;5,5;4,4", dec = ",", sep = ";", na.strings = "N/A")
df
str(df)
df[] <- lapply(df, function(x){
  x2 <- gsub(pattern = "$", replacement = "", x = x, fixed = TRUE)
  x3 <- gsub(pattern = ",", replacement = ".", x = x2, fixed = TRUE)
  as.numeric(x3)
}
)
df
str(df)
You could have gotten your original code to work, actually: there's a tiny typo ('stringsAsFactors', not 'StringsAsFactors'). The options command won't complain about the wrong name, but it just won't work. When done correctly, the columns are read as character instead of factor, and you can then convert them to whatever format you want.
I just had this same issue and tried all the fixes in this and other duplicate posts; none worked all that well. I ended up fixing it on the Excel side: highlight all the columns in your source file (in Excel), right-click, choose Format Cells, and select 'Number'. It will then import perfectly well (as long as there are no non-numeric characters below the header).
