Dealing with an NA value in a data-importing loop in R

I'm importing several hundred files into a single data frame so that I can analyze it afterwards, using:
files.pet <- sort(list.files(pattern = '1998[0-9][0-9][0-9][0-9].pet'), decreasing = FALSE)
all_data.pet <- NA
for (pet.atual in files.pet) {
  data.atual <- read.table(file = pet.atual,
                           header = FALSE,
                           sep = ",",
                           quote = "\"",
                           comment.char = ";")
  data.atual <- cbind(data.atual, Desig = pet.atual)
  all_data.pet <- rbind(all_data.pet, data.atual)
}
This runs fine until it reaches one file, which gives this error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 399 did not have 9 elements
That line has an NA value in one of its columns. Is there a way to tell the loop to ignore this and keep importing? Or should I just erase or replace the NA in that row?
Also, while I'm asking, can anyone give me some insight into the meaning of:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
I read around but had little luck understanding what it actually means.
Thanks a lot! (Sorry if the questions are pretty obvious, but I'm new to R.)
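One possible way to keep the loop going, sketched under a few assumptions: fill = TRUE pads short rows with NA instead of stopping, and skipNul = TRUE skips NUL bytes. The "embedded nul(s)" warning means the file contains zero bytes, which often points to a UTF-16 or partly binary file. The demo files and the read_pet helper below are made up for illustration; only the read.table arguments come from the question.

```r
# Demo directory with two hypothetical files named like the question's pattern.
demo_dir <- tempfile("pet_demo")
dir.create(demo_dir)
writeLines(c("1,2,3", "4,5,6"), file.path(demo_dir, "19980101.pet"))
writeLines(c("7,8,9", "10,11"), file.path(demo_dir, "19980102.pet"))  # short row

files.pet <- sort(list.files(demo_dir, pattern = "^1998[0-9]{4}\\.pet$",
                             full.names = TRUE))

# Hypothetical helper: read one file and tag each row with its source file.
read_pet <- function(path) {
  dat <- read.table(path, header = FALSE, sep = ",", quote = "\"",
                    comment.char = ";",
                    fill = TRUE,     # pad short rows with NA instead of failing
                    skipNul = TRUE)  # skip embedded NUL bytes
  dat$Desig <- basename(path)
  dat
}

# Read everything first, then bind once (avoids the leading NA row that
# rbind() onto an NA starting value would create).
all_data.pet <- do.call(rbind, lapply(files.pet, read_pet))
```

This assumes every file has the same number of columns within its first few lines; if one file is wider than another, rbind() will still stop and that file needs a closer look.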

Related

Problems w. read.csv, error message: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'a real', got

I get a repeated error message (below) when trying to import a CSV file with data. There were no problems in previous years when using exactly the same R script and the same read.csv command. I get the impression this is a common problem, and the usual advice is to use read.csv rather than scan, but as I already do this I am stuck and would be grateful for information.
Here is the script:
# Read in all individual data for the year to be updated
Idata <- read.csv("Exp3.csv", sep = ";", header = T,
                  colClasses = c("numeric", rep("character", 4), rep("factor", 8), "numeric",
                                 "factor", rep("numeric", 11), "factor"))
Here is the error message:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
scan() expected 'a real', got '63991,21.1074,Ischnura,elegans,06/20/21,HojeA14,1074,0,mature,1,1073,blue,,0,androchrome,,,,,,,,,,,2021,KP'
Would be grateful for any help!
Check the separator argument. In read.csv it's coded as ; but the data is comma-separated.
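A small sketch of that check, using made-up data rather than the real Exp3.csv: count both candidate separators in the header line and pass the more frequent one to read.csv. The column names and the four-column colClasses vector are assumptions for the demo.

```r
# Hypothetical comma-separated content standing in for the real file.
csv_text <- paste0("id,genus,species,year\n",
                   "63991,Ischnura,elegans,2021\n",
                   "63992,Ischnura,elegans,2021\n")

# With a real file this would be: first_line <- readLines("Exp3.csv", n = 1)
first_line <- strsplit(csv_text, "\n", fixed = TRUE)[[1]][1]

# Count how often each candidate separator occurs in the header.
count_char <- function(ch, x) {
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}
n_comma     <- count_char(",", first_line)
n_semicolon <- count_char(";", first_line)
sep <- if (n_comma > n_semicolon) "," else ";"

Idata <- read.csv(text = csv_text, sep = sep, header = TRUE,
                  colClasses = c("numeric", "factor", "factor", "numeric"))
```

With the wrong separator the whole line lands in the first column, which is why scan() complains that it expected a number and got the entire row.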

CSV data read into R via read.table stops creating new rows and splitting values on "," after a certain number of lines

When I read my CSV data into R, the first rows are read normally, but after a certain point the data stops being split on "," and a lot of data goes missing.
Here is how I load my data into R.
CODATA <- read.table( file.choose("CO2 Emissions per country.cvs"), header = TRUE, sep = "," )
I get this error and these warnings:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 3 elements
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
number of items read is not a multiple of the number of columns
Then the value is...
"Cote dIvoire,0.42,0.44,0.44,0.43,0.45,0.49,0.46,0.51,0.45,0.4,0.33,0.32,0.38,0.33,0.29,0.28,0.26,0.25,0.23,0.21,0.21,0.2,0.21,0.21,0.22,0.25,0.3,0.3,0.4,0.37,0.36,0.36,0.29,0.31,0.32,0.32,0.3,0.34,0.31,0.29\nNigeria,0.1,0.12,0.15,0.16,0.18,0.22,0.27,0.3,0.32,0.35,0.39,0.42,0.41,0.37,0.38,0.35,0.35,0.36,0.36,0.3,0.34,0.4,0.36,0.29,0.28,0.3,0.34,0.29,0.31,0.34,0.38,0.39,0.36,0.36,0.4,0.35,0.32,0.33,0.27,0.29\nKenya,0.28,0.28.....(and so on)
where the values haven't been separated. The data is meant to start a new line with each country. It reads the previous 100 or so countries as normal up to Cote dIvoire.
Is there any way to fix this by changing the code that loads the file, without editing the CSV file itself?
Thank you for any help given.
Your best option is to check the CSV file again for any problems. You could also try CODATA <- read.csv("CO2 Emissions per country.csv") rather than read.table.
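A likely reason read.csv helps here, assuming the file spells the country as Cote d'Ivoire: read.table treats both " and ' as quote characters by default, so a lone apostrophe opens a quoted string that swallows the following lines, while read.csv only uses the double quote. The three-column excerpt below is invented for the demonstration.

```r
# Hypothetical excerpt; the apostrophe in the first data row is an assumption.
csv_text <- paste0("Country,X1990,X1991\n",
                   "Cote d'Ivoire,0.42,0.44\n",
                   "Nigeria,0.1,0.12\n",
                   "Kenya,0.28,0.28\n")

# read.csv only treats the double quote as a quoting character.
via_read_csv <- read.csv(text = csv_text)

# read.table also works once the apostrophe is removed from the quote set.
via_read_table <- read.table(text = csv_text, header = TRUE, sep = ",",
                             quote = "\"")
```

Either call keeps one row per country; which one to prefer is mostly a matter of taste once the quote set is right.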

R - Loop through directory throws error but I do not know where (try and catch)

I have a loop that is supposed to read all files matching the provided regex.
However, some files evidently don't have the correct number of columns in every row, so the loop crashes.
I now want to find out which files cause these errors. There are hundreds of files, but only a few cause the error.
In Java I would wrap the call in a try-catch clause and print the file names in order to find them, take a look, and erase or change them. I can't get that to work in R, though:
# PATH WITH ALL FILES
files <- list.files(path = "/Users/Test/Trackingpoint",
                    pattern = "Trackingpoint.*\\.csv\\.gz", full.names = TRUE, recursive = FALSE)
Trackingpoint_Tables <-
  tryCatch({
    lapply(files, function(x) {
      a <- read.table(gzfile(x), sep = "\t", header = TRUE)
    })
  }, warning = function(w) {
    print(w)
  }, error = function(e) {
    print(e)
  })
As you know, what I have in w and e is not the file itself but the condition object. How can I print the file's name, and any other information from the file itself?
I want my code to ignore the errors and just proceed, but to tell me where the error occurs (in which file).
Right now, it only says:
<simpleError in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE, fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip, multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes, flush = flush, encoding = encoding, skipNul = skipNul): line 24610 did not have 44 elements>
A simple change from read.table to read.csv, together with fill=TRUE, was sufficient.
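If the goal is still to find out which files are malformed, one option is to move tryCatch inside the function passed to lapply, where the file name is in scope. This is a minimal sketch with two temporary demo files; read_one and the file names are hypothetical, and the real code would pass gzfile(path) as in the question.

```r
# Hypothetical helper: read one file; on failure report the name and return NULL.
read_one <- function(path) {
  tryCatch(
    read.table(path, sep = "\t", header = TRUE),
    error = function(e) {
      message("Failed to read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Two demo files: one well formed, one with a row that is too short.
demo_dir <- tempfile("tracking_demo")
dir.create(demo_dir)
good <- file.path(demo_dir, "good.tsv")
bad  <- file.path(demo_dir, "bad.tsv")
writeLines(c("a\tb", "1\t2", "3\t4"), good)
writeLines(c("a\tb", "1\t2", "3"), bad)

files <- c(good, bad)
results <- lapply(files, read_one)
names(results) <- basename(files)

# Names of the files that could not be read.
failed <- names(results)[vapply(results, is.null, logical(1))]
```

Because the handler runs per file, one bad file no longer aborts the whole lapply, and the failed vector lists what to inspect afterwards.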

read.csv warning 'EOF within quoted string' to read whole file

I have a .csv file that contains 285000 observations. When I tried to import the dataset, I got the warning below, and only 166000 observations were read.
Joint <- read.csv("joint.csv", header = TRUE, sep = ",")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
When I disabled quoting with quote = "", as follows, I got an error:
Joint2 <- read.csv("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
When I used read.table like this, it shows 483000 observations:
Joint <- read.table("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
What should I do to read the file properly?
I think the problem has to do with file encoding. There are a lot of special characters in the header.
If you know how your file is encoded, you can specify it with the fileEncoding argument to read.csv.
Otherwise you could try fread from data.table. It is able to read the file despite the encoding issues, and it will also be significantly faster for such a large data file.
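Separately from the encoding suggestion, and not something the answer proposed, count.fields() from base R can help locate the lines where quoting goes wrong. The sketch below uses an invented five-line file in which a stray double quote (an inch mark) opens a quoted string on one line and another one closes it on the next; lines swallowed by an open quote should show up as NA in the quoted count.

```r
# Hypothetical file with stray double quotes inside unquoted text fields.
path <- tempfile(fileext = ".csv")
writeLines(c('id,text',
             '1,ok',
             '2,5" pipe',
             '3,another 5" pipe',
             '4,fine'),
           path)

# Field counts per line with and without quote handling.
with_quotes    <- count.fields(path, sep = ",", quote = "\"")
without_quotes <- count.fields(path, sep = ",", quote = "")

# Lines that start a multi-line quoted string are reported as NA.
suspect_lines <- which(is.na(with_quotes))
```

Comparing the two vectors on the real file would show whether the row count changes because of unbalanced quotes (NA entries) or because of commas inside quoted text (counts above the number of header fields when quote = "").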

R - read.csv add rows when loading the dataset

I'm trying to read a .csv file in R. Some rows of one column contain text with a comma, within double quotes: "example, another example"
But R alters the rows (it adds rows) when I try to read it like this:
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
This version, which I found by searching online, doesn't work either:
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Running
steekproef <- read.csv("steekproef.csv", header = T, sep =",", quote ="\"")
gives this warning:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
This gives a data.frame with 1391160 obs. of 29 variables.
str(steekproef) gives no error, but shows:
'data.frame': 3103620 obs. of 29 variables:
The dataset has 29 columns and 3019438 rows
I don't think that the problem is caused by the "example, another example" values:
I created a testfile in Excel and saved it as .csv. It looks like this in Notepad++:
test,num
"example, another example",1
"example, another example",2
example,3
example,4
I could import it without problems using
steekproef <- read.csv('steekproef.csv', header = T, sep = ',')
or
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Your first attempt,
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
gave me this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
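The test above can be reproduced roughly as follows. The temporary path stands in for steekproef.csv, and the explanation of the error (the quoted commas add a third field, so the first column is taken as row names) is my reading of read.table's behaviour rather than something stated in the answer.

```r
# Recreate the small test file from the answer in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c('test,num',
             '"example, another example",1',
             '"example, another example",2',
             'example,3',
             'example,4'),
           path)

# Default quoting: the comma inside the double quotes stays within one field.
ok <- read.csv(path, header = TRUE, sep = ",")

# Quoting disabled: capture the error message instead of stopping.
res <- tryCatch(
  read.csv(path, header = TRUE, quote = "", sep = ","),
  error = function(e) conditionMessage(e)
)
```

Here ok has four rows and two columns, while res holds the error text, which suggests that quote = "" is the wrong direction when fields legitimately contain commas.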
