R - read.csv adds rows when loading the dataset

I'm trying to read a .csv file in R. Some rows of one column contain text with a comma inside double quotes: "example, another example"
But R changes the number of rows (it adds rows) when I read the file like this:
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
This variant, which I found by searching the internet, doesn't work either:
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Running
steekproef <- read.csv("steekproef.csv", header = T, sep =",", quote ="\"")
produces this warning:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
and gives a data.frame with 1391160 obs. of 29 variables.
str(steekproef) gives no error but shows
'data.frame': 3103620 obs. of 29 variables:
The dataset has 29 columns and 3019438 rows.

I don't think the problem is caused by the "example, another example" values.
I created a test file in Excel and saved it as .csv. It looks like this in Notepad++:
test,num
"example, another example",1
"example, another example",2
example,3
example,4
I could import it without problems using
steekproef <- read.csv('steekproef.csv', header = T, sep = ',')
or
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Your first try,
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
gave me this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
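Since a small test file with quoted commas imports fine, a likely culprit in the large file is an unbalanced double quote somewhere in the data, which is what "EOF within quoted string" usually points to. A minimal sketch for locating such lines, using only base R: it counts the double-quote characters on each physical line and flags lines with an odd count. The sample file and the variable names (path, n_quotes, suspect) are made up for illustration; for the real file you would point readLines at steekproef.csv.

```r
# Hypothetical sample file: the third line has an opening quote but no closing one.
path <- tempfile(fileext = ".csv")
writeLines(c(
  'test,num',
  '"example, another example",1',
  '"broken, no closing quote,2',
  'example,3'
), path)

lines <- readLines(path)

# Count the double-quote characters on each line.
n_quotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))

# An odd count means a quote is opened but not closed on that line.
suspect <- which(n_quotes %% 2 == 1)
suspect
```

Note that a field which legitimately spans several lines inside quotes will also be flagged, so the result is a list of lines to inspect by hand, not a list of certain errors.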

Related

Merge files in dataframe with new column name. Error

I have a directory with txt files tab-delimited. The size of each one is around 200MB. What I want is to merge all files adding one extra column with the filename.
The code that I have used:
all_txt <- rbindlist(mapply(
  c,
  (
    list.files(
      path = "./",
      pattern = "*.vcf.gz.hg38_multianno.txt",
      full.names = TRUE
    ) %>%
      lapply(
        read.table,
        header = TRUE,
        sep = "\t",
        encoding = "UTF-8"
      )
  ),
  (
    list.files(
      path = "./",
      pattern = "*.txt",
      full.names = TRUE
    ) %>%
      basename() %>%
      as.list()
  ),
  SIMPLIFY = FALSE
),
fill = T)
When it starts, I get the following warnings and then an error:
Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
number of items read is not a multiple of the number of columns
Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
number of items read is not a multiple of the number of columns
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 254480 did not have 145 elements
How can I identify the file that doesn't have 145 elements?
All files should contain 145 columns.
Thanks!
As @islem mentioned, switching to data.table::fread solved the issue.
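To answer the "which file is it?" part directly, one option is to check the field count of every line in every file before reading anything. This is only a sketch in base R with two tiny made-up files and 3 columns standing in for the real 145; the names dir, expected, bad_lines and merged are illustrative. It uses count.fields with quoting and comment handling switched off so that each physical line is counted as is.

```r
# Hypothetical demo directory: one well-formed file and one with a short line.
dir <- tempfile("txt_demo")
dir.create(dir)
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5\t6"), file.path(dir, "good.txt"))
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5"),    file.path(dir, "bad.txt"))

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
expected <- 3  # would be 145 for the files in the question

# For each file, find the line numbers whose field count differs from expected.
bad_lines <- lapply(files, function(f) {
  n <- count.fields(f, sep = "\t", quote = "", comment.char = "")
  which(n != expected)
})
names(bad_lines) <- basename(files)
bad_lines <- bad_lines[lengths(bad_lines) > 0]
print(bad_lines)

# Merge the clean files, adding the file name as an extra column.
good <- files[!basename(files) %in% names(bad_lines)]
merged <- do.call(rbind, lapply(good, function(f) {
  cbind(read.delim(f, quote = ""), file = basename(f))
}))
```

Reading with quote = "" is an assumption that the files contain no quoted fields that need special handling; if they do, the counts have to be taken with the same quote setting used for reading.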

How can I read from a text file in R?

How can I read from a text file? I have the following data in a text file-
A,B,C,D
E,F,G,H
I am trying to choose the file interactively:
read.delim(file.choose(), sep=",")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
7: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
I wish to simply read the data and store it in a variable.
Just use read.csv:
your_df <- read.csv(file="/path/to/your_file.csv", header=FALSE)
your_df
V1 V2 V3 V4
1 A B C D
2 E F G H
Setting header=FALSE tells read.csv that your input CSV file does not have a leading header row with column names.
Alternatively, install and attach data.table, then use fread:
fread(file.choose(), sep = ",")
Your error could be due to encoding issues. If so, specify the right encoding:
fread(file.choose(), sep = ",", encoding = "INSERT YOUR ENCODING")
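One common cause of "embedded nulls" warnings is a file saved as UTF-16, where every ASCII character is followed by a zero byte. Whether that applies to the file in the question is an assumption; the sketch below just shows the idea in base R by writing a small UTF-16LE file and reading it back with the fileEncoding argument. The path and the choice of "UTF-16LE" are illustrative.

```r
# Write the two sample rows as UTF-16LE (assumed encoding, for illustration only).
path <- tempfile(fileext = ".txt")
con <- file(path, open = "w", encoding = "UTF-16LE")
writeLines(c("A,B,C,D", "E,F,G,H"), con)
close(con)

# Tell read.csv how the file is encoded so the zero bytes are decoded, not read as data.
your_df <- read.csv(path, header = FALSE, fileEncoding = "UTF-16LE")
your_df
```

With an interactively chosen file the same call would take file.choose() in place of path.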

read.table and drop last few lines (R)

I'm trying to convert stuff from a text file into a data frame in R. I want to skip the first three lines, and then only read until the line above *End using read.table like so:
df <- read.table("file.txt", sep = ",", dec = ".", skip = 3, nrows = length("file.txt")-2)
but I get this error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 8 did not have 4 elements
data looks like this:
*Keyword
*Node
$ Node,X,Y,Z
1,977201.91822656,3678881.46362572,0
2,977200.22079647,3678888.57347347,0
3,977198.87254619,3678898.82239956,0
4,977191.95056633,3679152.85114021,0
*End
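The call in the question fails because length("file.txt") is the length of a one-element character vector, i.e. 1, so nrows becomes -1 and read.table reads to the end of the file, where the *End line has only one field. A minimal sketch of one way around this, assuming the data always ends at a line that is exactly *End: read the raw lines first, cut out the part between the three header lines and *End, and hand that to read.table via its text argument. The temporary file and the column names are taken from the sample data; end_idx and body are illustrative names.

```r
# Recreate the sample file from the question in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "*Keyword",
  "*Node",
  "$ Node,X,Y,Z",
  "1,977201.91822656,3678881.46362572,0",
  "2,977200.22079647,3678888.57347347,0",
  "3,977198.87254619,3678898.82239956,0",
  "4,977191.95056633,3679152.85114021,0",
  "*End"
), path)

lines <- readLines(path)

# Position of the terminating marker (assumed to be a line that is exactly "*End").
end_idx <- which(lines == "*End")[1]

# Keep everything after the three header lines and before the marker.
body <- lines[4:(end_idx - 1)]

df <- read.table(text = body, sep = ",", dec = ".",
                 col.names = c("Node", "X", "Y", "Z"))
df
```

If the number of data rows is known up front, skip = 3 together with a literal nrows value does the same job without the readLines step.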

read.csv warning 'EOF within quoted string' to read whole file

I have a .csv file that contains 285000 observations. When I try to import the dataset, I get the following warning and only 166000 observations are read.
Joint <- read.csv("joint.csv", header = TRUE, sep = ",")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
When I set the quote argument as follows, I get an error:
Joint2 <- read.csv("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
When I use read.table instead, it shows 483000 observations:
Joint <- read.table("joint.csv", header = TRUE, sep = ",", quote="", fill= TRUE)
What should I do to read the file properly?
I think the problem has to do with file encoding. There are a lot of special characters in the header.
If you know how your file is encoded, you can specify it using the fileEncoding argument to read.csv.
Otherwise you could try to use fread from data.table. It is able to read the file despite the encoding issues. It will also be significantly faster for reading such a large data file.
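Independently of the encoding question, the "more columns than column names" error with quote="" has a simple explanation that can be checked with count.fields: once quoting is disabled, a comma inside a quoted value counts as a separator, so those lines have more fields than the header. A small sketch with a made-up three-line file (the file contents and the names n_off and n_on are illustrative):

```r
# Hypothetical file: one value contains a comma inside double quotes.
path <- tempfile(fileext = ".csv")
writeLines(c('id,name', '1,"Smith, John"', '2,Jane'), path)

# Fields per line with quoting disabled: the quoted comma splits the value.
n_off <- count.fields(path, sep = ",", quote = "")

# Fields per line with double quotes honoured: every line has two fields.
n_on <- count.fields(path, sep = ",", quote = "\"")

rbind(quote_off = n_off, quote_on = n_on)
```

Running count.fields on the real file with quote = "" and tabulating the result is a quick way to see how many lines are affected.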

Dealing with NA value in data importing loop in R

I'm importing several hundred files into a single data frame in order to analyze it afterwards, using:
files.pet <- sort(list.files(pattern = '1998[0-9][0-9][0-9][0-9].pet'), decreasing = FALSE)
all_data.pet <- NA;
for (pet.atual in files.pet) {
  data.atual <-
    read.table(file = pet.atual,
               header = FALSE,
               sep = ",",
               quote = "\"",
               comment.char = ";");
  data.atual <- cbind(data.atual, Desig = pet.atual)
  all_data.pet <- rbind(all_data.pet, data.atual)
}
This runs fine until it reaches one file, which gives this error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 399 did not have 9 elements
That line has an NA value in one of the columns. Is there a way to tell the loop to ignore this and keep importing? Or should I just erase or replace the NA in that row?
Also, while I'm asking, can anyone give me some insight into the meaning of:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
I read around but had little luck understanding what it actually means.
Thanks a lot! (Sorry if the questions are pretty obvious, but I'm new to R.)
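One way to keep the loop going past a malformed file is to wrap each read in tryCatch and collect the results in a list, binding them once at the end. This is a sketch under assumptions, not a drop-in fix: the two tiny .pet files are made up, with 3 columns standing in for the real 9, and read_one and parts are illustrative names. Starting from a list also avoids the leading NA row that all_data.pet <- NA introduces.

```r
# Hypothetical demo files: the second one has a line with too few fields.
dir <- tempfile("pet_demo")
dir.create(dir)
writeLines(c("1,2,3", "4,5,6"), file.path(dir, "19980101.pet"))
writeLines(c("1,2,3", "4,5"),   file.path(dir, "19980102.pet"))

files.pet <- sort(list.files(dir, pattern = "^1998[0-9]{4}\\.pet$",
                             full.names = TRUE))

# Read one file; on error, report the file name and return NULL instead of stopping.
read_one <- function(f) {
  tryCatch({
    d <- read.table(f, header = FALSE, sep = ",",
                    quote = "\"", comment.char = ";")
    cbind(d, Desig = basename(f))
  }, error = function(e) {
    message("Skipping ", basename(f), ": ", conditionMessage(e))
    NULL
  })
}

parts <- lapply(files.pet, read_one)
parts <- Filter(Negate(is.null), parts)   # drop the files that failed
all_data.pet <- do.call(rbind, parts)
all_data.pet
```

If the short lines should be kept and padded with NA instead of skipped, passing fill = TRUE to read.table is the alternative to consider.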
