I'm trying to read the contents of a text file into a data frame in R. I want to skip the first three lines and then read only up to the line above *End, using read.table like so:
df <- read.table("file.txt", sep = ",", dec = ".", skip = 3, nrows = length("file.txt")-2)
but I get this error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 8 did not have 4 elements
The data looks like this:
*Keyword
*Node
$ Node,X,Y,Z
1,977201.91822656,3678881.46362572,0
2,977200.22079647,3678888.57347347,0
3,977198.87254619,3678898.82239956,0
4,977191.95056633,3679152.85114021,0
*End
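One possible reading of the error: length("file.txt") is the length of a one-element character vector, i.e. 1, so nrows becomes -1, and read.table ignores negative values of nrows and reads to the end of the file, including the *End line. A minimal sketch of one way around that, assuming the data block always ends at the first line starting with *End (the temporary file and the column names are stand-ins for illustration, not part of the original question):

```r
# Hypothetical stand-in for file.txt, mirroring the layout in the question
path <- tempfile(fileext = ".txt")
writeLines(c("*Keyword", "*Node", "$ Node,X,Y,Z",
             "1,977201.91822656,3678881.46362572,0",
             "2,977200.22079647,3678888.57347347,0",
             "3,977198.87254619,3678898.82239956,0",
             "4,977191.95056633,3679152.85114021,0",
             "*End"), path)

# Locate the terminator line, then read only the rows between the
# three skipped lines and the line above *End
lines   <- readLines(path)
end_idx <- grep("^\\*End", lines)[1]
df <- read.table(path, sep = ",", dec = ".", skip = 3,
                 nrows = end_idx - 1 - 3,
                 col.names = c("Node", "X", "Y", "Z"))
```

Reading the file twice keeps the sketch simple; for a large file, read.table(text = lines[4:(end_idx - 1)], ...) would reuse the lines already in memory.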
I have a directory of tab-delimited txt files, each around 200 MB. I want to merge all the files, adding one extra column with the file name.
The code that I have used:
all_txt <- rbindlist(mapply(
  c,
  (
    list.files(
      path = "./",
      pattern = "*.vcf.gz.hg38_multianno.txt",
      full.names = TRUE
    ) %>%
      lapply(
        read.table,
        header = TRUE,
        sep = "\t",
        encoding = "UTF-8"
      )
  ),
  (
    list.files(
      path = "./",
      pattern = "*.txt",
      full.names = TRUE
    ) %>%
      basename() %>%
      as.list()
  ),
  SIMPLIFY = FALSE
),
fill = T)
When it starts, I get the following warnings and then an error:
Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
number of items read is not a multiple of the number of columns
Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
number of items read is not a multiple of the number of columns
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 254480 did not have 145 elements
How can I identify the file that doesn't have 145 elements?
All files should contain 145 columns.
thanks!
As @islem mentioned, switching to data.table::fread solved the issue.
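For the original question of finding which file is malformed, base R's count.fields can report the number of fields on every line without building a data frame. A minimal sketch, using two tiny hypothetical files with 3 expected columns in place of the real 145-column files (the file names and the expected count are placeholders):

```r
# Two small hypothetical tab-delimited files; the second has a short row
demo_dir <- tempfile("demo")
dir.create(demo_dir)
writeLines(c("a\tb\tc", "1\t2\t3"), file.path(demo_dir, "good.txt"))
writeLines(c("a\tb\tc", "1\t2"),    file.path(demo_dir, "bad.txt"))

files    <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
expected <- 3   # would be 145 for the files in the question

# Keep the files in which any line has a field count other than expected
bad <- Filter(function(f) {
  n <- count.fields(f, sep = "\t", quote = "", comment.char = "")
  any(is.na(n) | n != expected)
}, files)
basename(bad)
```

Quoting and comment handling are switched off here so that every tab counts; if the real files rely on quoted fields, passing the same quote argument as in the read.table call would be the closer check. For the extra file-name column, data.table's rbindlist also accepts an idcol argument that records the name of each list element.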
How can I read from a text file? I have the following data in a text file:
A,B,C,D
E,F,G,H
I am trying to choose the file interactively:
read.delim(file.choose(), sep=",")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
  line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
  line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
  line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
  line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote, :
  line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  EOF within quoted string
7: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  embedded nul(s) found in input
I wish to simply read the data and store it in a variable.
Just use read.csv:
your_df <- read.csv(file="/path/to/your_file.csv", header=FALSE)
your_df
  V1 V2 V3 V4
1  A  B  C  D
2  E  F  G  H
Setting header = FALSE tells read.csv that your input CSV file does not have a leading header row with column names.
Install and attach data.table, then use fread:
fread(file.choose(), sep = ",")
Your error could be due to encoding issues, in which case specify the right encoding:
fread(file.choose(), sep = ",", encoding = "INSERT YOUR ENCODING")
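A common cause of "embedded nul" warnings is a file saved as UTF-16 (for example as "Unicode" text on Windows), where every ASCII character is followed by a zero byte. Whether that applies to the file in the question is an assumption. The sketch below builds a small UTF-16LE file from the sample data and reads it with the fileEncoding argument of read.csv:

```r
# Write the sample data as UTF-16LE bytes to a temporary file (illustrative only)
path  <- tempfile(fileext = ".txt")
utf16 <- iconv("A,B,C,D\nE,F,G,H\n", from = "UTF-8", to = "UTF-16LE",
               toRaw = TRUE)[[1]]
writeBin(utf16, path)

# The raw bytes contain zero bytes, which is what the warnings point at
has_nul <- any(readBin(path, "raw", n = 100) == as.raw(0))

# Declaring the encoding lets R convert the bytes before parsing
dat <- read.csv(path, header = FALSE, fileEncoding = "UTF-16LE")
```

With a real file, the same call would take file.choose() in place of path, provided the file really is UTF-16LE.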
I'm importing several hundred files into a single data frame in order to analyze them afterwards, using:
files.pet <- sort(list.files(pattern = '1998[0-9][0-9][0-9][0-9].pet'), decreasing = FALSE)
all_data.pet <- NA;
for (pet.atual in files.pet) {
  data.atual <-
    read.table(file = pet.atual,
               header = FALSE,
               sep = ",",
               quote = "\"",
               comment.char = ";");
  data.atual <- cbind(data.atual, Desig = pet.atual)
  all_data.pet <- rbind(all_data.pet, data.atual)
}
This runs fine until it reaches one file that gives this error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 399 did not have 9 elements
That line has an NA value in one of the columns. Is there a way to tell the loop to ignore this and keep importing? Or should I just erase or replace the NA in that row?
Also, while I'm asking, can anyone give me some insight into the meaning of:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
I read around but had little luck understanding what it actually means.
Thanks a lot! (Sorry if the questions are pretty obvious, but I'm new to R.)
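One way to keep importing when a row is short is the fill argument of read.table, which pads missing trailing fields with NA instead of stopping. A minimal sketch with two tiny hypothetical .pet files, the second of which has a short row (file names and contents are made up for illustration):

```r
# Hypothetical input: two small files, the second with a row missing a field
pet_dir <- tempfile("pet")
dir.create(pet_dir)
writeLines(c("1,2,3", "4,5,6"), file.path(pet_dir, "19980101.pet"))
writeLines(c("7,8,9", "10,11"), file.path(pet_dir, "19980102.pet"))

files.pet <- sort(list.files(pet_dir, pattern = "^1998[0-9]{4}\\.pet$",
                             full.names = TRUE))

# Read each file with fill = TRUE, tag it with its file name,
# and bind everything once at the end
pieces <- lapply(files.pet, function(f) {
  d <- read.table(f, header = FALSE, sep = ",", quote = "\"",
                  comment.char = ";", fill = TRUE)
  cbind(d, Desig = basename(f))
})
all_data.pet <- do.call(rbind, pieces)
```

Collecting the pieces in a list also avoids the leading NA row that comes from starting with all_data.pet <- NA. Note that with fill = TRUE read.table decides the number of columns from the first five lines of each file, so a file whose early lines are all short may still need a col.names argument. As for the second question, "embedded nul(s)" means the input contains zero bytes, which plain text files normally do not.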
I'm trying to read a .csv in R. Some rows of one column contain text with a comma inside double quotes: "example, another example"
But R alters the rows (it adds rows) when I try to read it like this:
steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ',')
This variant, which I found by searching the internet, doesn't work either:
steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Running
steekproef <- read.csv("steekproef.csv", header = T, sep =",", quote ="\"")
produces this warning:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
and gives a data.frame with 1391160 obs. of 29 variables.
str(steekproef) gives no error but shows
'data.frame': 3103620 obs. of 29 variables:
The dataset has 29 columns and 3019438 rows.
I don't think that the problem is caused by the "example, another example":
I created a testfile in Excel and saved it as .csv. It looks like this in Notepad++:
test,num
"example, another example",1
"example, another example",2
example,3
example,4
I could import it without problems using
steekproef<- read.csv('steekproef.csv', header = T, sep = ',')
or steekproef <- read.csv('steekproef.csv', header = T, quote = "\"", sep = ',')
Your first try, steekproef <- read.csv('steekproef.csv', header = T, quote = "", sep = ','), gave me this error: Error in read.table(file = file, header = header, sep = sep, quote = quote,: duplicate 'row.names' are not allowed
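The effect of the quote argument on the test file above can be checked with count.fields, which reports how many fields each line splits into. A small sketch that recreates the test file in a temporary location (the path is a stand-in):

```r
# Recreate the test file shown above
path <- tempfile(fileext = ".csv")
writeLines(c('test,num',
             '"example, another example",1',
             '"example, another example",2',
             'example,3',
             'example,4'), path)

# With quoting disabled, the comma inside the quotes splits the field
n_noquote <- count.fields(path, sep = ",", quote = "")

# With double quotes honoured, every line has two fields
n_quote <- count.fields(path, sep = ",", quote = "\"")

# The read.csv defaults already treat " as the quote character
steekproef <- read.csv(path, header = TRUE)
```

On the real file, lines where count.fields returns NA or an unexpected count are the ones to inspect; NA marks a line inside a quoted string that continues onto the next line, which is one way an "EOF within quoted string" warning can arise.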
I'm trying to read a table (a .csv file, 120K rows by 21 columns), assigning classes to the columns with:
read.table(file = "G1to21jan2015.csv",
           header = TRUE,
           colClasses = c (rep("POSICXct", 6),
                           rep("numeric", 2),
                           rep("POSICXct", 2),
                           "numeric",
                           NULL,
                           "numeric",
                           NULL,
                           rep("character", 2),
                           rep("numeric", 5))
)
I get the following error:
Error in read.table(file = "G1to21jan2015.csv", header = TRUE, colClasses = c(rep("POSICXct", :
more columns than column names
I've confirmed that the csv has 21 columns, and so (I believe) does my request.
After removing the second argument, header = TRUE, I get a different error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 19 elements
Note
I'm using POSICXct to read data in the format 1/5/2015 15:00:00 (m/d/Y H:M), numeric to read data like 1559, NULL for columns which are empty and which I want to skip, and character for text.
For an unconventional date-time format, one can import the column as character (step 1) and then coerce it with strptime (step 2).
step 1
df <- read.table(file = "data.csv",
                 header = TRUE,
                 sep = ",",
                 dec = ".",
                 colClasses = "character",
                 comment.char = ""
)
step 2
strptime(df$v1, "%m/%d/%Y %H:%M:%S")
v1 being the name of the column to coerce (in this case a date-time in the unconventional format 12/13/2014 15:16:17). Note that %Y matches a four-digit year, whereas %y matches only two digits.
Notes
Specifying sep is necessary because read.table defaults to sep = "" (any whitespace).
When using read.csv there is no need to use the sep argument, which defaults to ",".
Using comment.char = "" (when possible) improves reading time.
Useful info at http://cran.r-project.org/doc/manuals/r-release/R-data.pdf
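Two details of the colClasses vector in the question may also matter. c() silently drops a bare NULL, so that vector has 19 entries rather than 21; to skip a column, the string "NULL" is used instead. The class name is also spelled POSIXct. The sketch below uses a small made-up four-column file (the column names and values are placeholders) to show both points together with the character-then-convert approach:

```r
# A bare NULL disappears inside c(), shortening the vector
length(c("numeric", NULL, "character"))   # 2, not 3

# Hypothetical four-column file with an empty third column
csv <- tempfile(fileext = ".csv")
writeLines(c("when,n,empty,txt",
             "1/5/2015 15:00:00,1559,,abc"), csv)

# "NULL" as a string skips the column; the date-time is read as text first
df <- read.table(csv, header = TRUE, sep = ",",
                 colClasses = c("character", "numeric", "NULL", "character"))

# Convert afterwards with an explicit format (UTC chosen only for the example)
df$when <- as.POSIXct(df$when, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
```

Converting after import keeps the format string explicit, which is useful here because the m/d/Y layout is not one that as.POSIXct recognises by default.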