I want to import a triangular dataset (33 elements on first line, 32 on the second line, 31 on third line,...)
I tried:
Xij=read.table( file=file.choose(), header=FALSE)
which gives me the error: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 2 did not have 9 elements
Can somebody please help me solve this?
Many thanks in advance!
You could do the following:
lines <- readLines(file.choose())
data <- strsplit(lines, ' ')
You will have the list of lines in 'data', and you can create a data frame according to your needs. E.g.:
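n <- length(data)
m <- max(lengths(data)) # longest row; using the last (shortest) row would truncate the others
for(i in 1:n) {
  data[[i]] <- as.numeric(data[[i]])
  length(data[[i]]) <- m # pad shorter rows with NA
}
df <- data.frame(matrix(unlist(data), nrow=n, byrow=TRUE))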
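As a small self-contained sketch of the same padding idea, the snippet below uses a made-up four-line triangular input (not the asker's file) and relies on the fact that indexing past the end of a vector returns NA. The names rows, padded and df are only illustrative.

```r
# Hypothetical triangular input: 4, 3, 2 and 1 values per line.
lines <- c("1 2 3 4", "5 6 7", "8 9", "10")

# Split each line on runs of spaces and convert the pieces to numbers.
rows <- lapply(strsplit(trimws(lines), " +"), as.numeric)

# Width of the widest row.
m <- max(lengths(rows))

# r[seq_len(m)] returns NA for positions beyond the end of r,
# so every row comes back with exactly m entries.
padded <- t(sapply(rows, function(r) r[seq_len(m)]))
df <- as.data.frame(padded)
print(df)
```

This avoids the explicit loop, at the cost of assuming every field is numeric.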
I am new to R and trying to learn how to read the text below. I am using
data <- read.table("myvertices.txt", stringsAsFactors=TRUE, sep=",")
hoping to convey that the "FID..." should be associated with the comma separated numbers below them.
The error I get is:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 13 did not have 2 elements
How would I read the following format
into something like
FID001 -120.9633 51.8496
FID001 -121.42749 52.293
FID001 -121.25453 52.3195
FID002 -65.4794 47.69011
FID002 -65.4797 47.0401
FID003 -65.849 47.5215
FID003 -65.467 47.515
Here is a possible way to achieve this:
data <- read.table("myvertices.txt") # Read as-is.
fid1 <- c(grep("^FID", data$V1), nrow(data) +1) # Get the row numbers containing "FID.."
df1 <- diff(x = fid1, lag = 1) # Calculate the length+1 rows to read
listdata <- lapply(seq_along(df1),
                   function(n) cbind(FID = data$V1[fid1[n]],
                                     read.table("myvertices.txt", # re-read only the rows of this block
                                                skip = fid1[n],
                                                nrows = df1[n] -1,
                                                sep = ",")))
data2 <- do.call(rbind, listdata) # Combine all the read tables into a single data frame.
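If re-reading the file once per block is too slow, a single-pass variant is possible. The sketch below is only an illustration under assumptions: the input lines are invented to resemble the layout described in the question (a line starting with "FID" followed by comma-separated pairs), and the column names lon and lat are hypothetical.

```r
# Hypothetical input, assumed to mirror the layout described in the question.
lines <- c("FID001",
           "-120.9633,51.8496",
           "-121.42749,52.293",
           "FID002",
           "-65.4794,47.69011")

# Flag the header lines, then number the blocks with a running count.
is_header <- grepl("^FID", lines)
block <- cumsum(is_header)

# Parse all coordinate lines in one call.
coords <- read.table(text = lines[!is_header], sep = ",",
                     col.names = c("lon", "lat"))

# Look up the header that belongs to each coordinate line.
ids <- lines[is_header]
result <- data.frame(FID = ids[block[!is_header]], coords)
print(result)
```

With real data you would replace the lines vector with readLines("myvertices.txt").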
I am trying to use read.table on data in the clipboard to get around a DRM-ed file, but I cannot understand why it does not work, as shown below. I added comment.char="" and it didn't help.
t1 = read.table( "clipboard", header=T, sep="\t" )
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 23938 did not have 23 elements
t1_0 = readLines( "clipboard" )
t2 = sapply( t1_0, function(x) strsplit(x, "\t") )
table( sapply(t2, length ) )
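One thing that may be worth checking, assuming the strsplit count above shows the same number of fields on every line: read.table treats both " and ' as quote characters by default, so an apostrophe inside a field can merge several fields or lines into one. The sketch below uses invented data, not the clipboard contents, and only shows that quote = "" turns quote handling off.

```r
# Hypothetical tab-separated data with an apostrophe inside a field.
lines <- c("id\tname\tcity",
           "1\tO'Brien\tCork",
           "2\tSmith\tLeeds")

# Splitting on tabs shows the same field count on every line.
print(table(lengths(strsplit(lines, "\t", fixed = TRUE))))

# quote = "" disables quote handling, so the apostrophe is read literally.
t1 <- read.table(text = lines, header = TRUE, sep = "\t",
                 quote = "", comment.char = "")
print(t1)
```

Whether this is the cause here is a guess; it depends on what the clipboard actually contains.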
This is my data as "*.txt"
ar ?,35.30,2.60, ?,42.4,24.2,47.1,69
arn 1.23,27.00,3.59,122, 0.0,40.0,40.0,30
be 1.24,26.50,2.90,121,16.0,20.7,29.7,72
bi1 1.07,29.10,3.10,114,44.0, 2.6,26.3,68
bi2 1.08,43.70,2.40,105,32.6, 5.8,10.7,42
bie 1.39,29.50,2.78,126,14.0, 0.0,50.0,78
bn 1.31,26.30,2.10,119,15.7,15.7,30.4,72
bo 1.27,27.60,3.50,116,16.8,23.0,35.2,69
by 1.11,32.60,2.90,113,15.8,15.8,15.0,57
Then it fails when I run
r2 <- read.table('StoneFlakes.txt',header=TRUE,na.strings='?')
The error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 9 elements
Can anyone help me?
text <- readLines('StoneFlakes.txt')
text <- gsub(",", " ", text)
read.table(textConnection(text), header=TRUE, na.strings='?')
#instead of using textConnection you can also use
read.table(text=text, header=TRUE, na.strings='?')
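To see why this works, here is a self-contained sketch using two of the data lines from the question. The header line is an assumption, since the post does not show it. The file mixes two separators (a space after the ID, commas elsewhere), so turning every comma into a space leaves plain whitespace-separated data, which is what read.table expects by default.

```r
# Two data lines from the question plus an assumed header line.
raw <- c("ID LBI,RTI,WDI,FLA,PSF,FSF,ZDF1,PROZD",  # assumption: header not shown in the post
         "ar ?,35.30,2.60, ?,42.4,24.2,47.1,69",
         "arn 1.23,27.00,3.59,122, 0.0,40.0,40.0,30")

# Replace every comma with a space so only whitespace separates fields.
clean <- gsub(",", " ", raw, fixed = TRUE)

# "?" marks a missing value in this data set.
r2 <- read.table(text = clean, header = TRUE, na.strings = "?")
print(r2)
```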
I am playing around with R. I want to create my dictionary from a txt file. I have 2 .txt files as:
To load these 2 files in R, I am doing the following:
txt_files = list.files(pattern = '*.txt');
data = lapply(txt_files, read.table, sep = ",")
#here I get error
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 2 did not have 2 elements
In addition: Warning message:
In FUN(c("1.txt", "2.txt")[[1L]], ...) :
incomplete final line found by readTableHeader on '1.txt'
dict <- c(data)
#dict <- c("sky","blue","bright","sun") // original dictionary, want to replace this by above method
docs <- c(D1 = "The sky is blue.", D2 = "The sun is bright.", D3 = "The sun in the sky is bright.")
dd <- Corpus(VectorSource(docs))
dtm <- DocumentTermMatrix(dd, control = list(weighting = weightTfIdf,dictionary = dict))
I am getting the following error:
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic
Can anybody tell me what I am doing wrong?
I don't think you should use read.table for those irregular data files. Why not just use readLines() instead?
txt_files <- list.files(pattern = "\\.txt$") # pattern is a regular expression, not a glob
data <- lapply(txt_files, readLines)
dict <- gsub(",$","", unlist(data))
docs <- c(D1 = "The sky is blue.", D2 = "The sun is bright.", D3 = "The sun in the sky is bright.")
dd <- Corpus(VectorSource(docs))
dtm <- DocumentTermMatrix(dd,
control = list(weighting = weightTfIdf,dictionary = dict))
Note we had to remove the trailing comma ourselves with this method, but that's pretty easy.
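The "'x' must be atomic" error comes from passing a list of data frames as the dictionary; a dictionary needs to be a plain character vector. The sketch below checks that the readLines() route produces one. The file contents are an assumption (one term per line with a trailing comma), written to a temporary directory so the example runs on its own without the tm package.

```r
# Assumed file layout: one term per line, each followed by a comma.
dict_dir <- tempfile("dict_")
dir.create(dict_dir)
writeLines(c("sky,", "blue,"), file.path(dict_dir, "1.txt"))
writeLines(c("bright,", "sun,"), file.path(dict_dir, "2.txt"))

# Read every .txt file as raw lines instead of as a table.
txt_files <- list.files(dict_dir, pattern = "\\.txt$", full.names = TRUE)
data <- lapply(txt_files, readLines)

# Flatten the list and strip the trailing comma from each term.
dict <- sub(",$", "", unlist(data))

print(dict)
print(is.atomic(dict))
```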
I have a file with 22268 rows by 2521 columns. I try to read it in with this line of code:
file <- read.table(textfile, skip=2, header=TRUE, sep="\t", fill=TRUE, blank.lines.skip=FALSE)
But I only get 13024 rows by 2521 columns, along with the following warning:
Warning message: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : number of items read is not a multiple of the number of columns
I also used this command to see what rows had an incorrect number of columns:
x <-count.fields(textfile, sep="\t", skip=2)
incorrect <- which(x != 2521)
and got back a list of about 20 rows that were incorrect.
Is there a way to fill these rows with NA values?
I thought that is what the "fill" parameter does in the read.table function, but it doesn't appear so.
Is there a way to ignore these rows that are identified in the "incorrect" variable?
You can use readLines() to input the data, then find the offending rows.
con <- file("path/to/file.csv", "rb")
rawContent <- readLines(con) # one character string per line of the file
close(con) # close the connection to the file, to keep things tidy
Then take a look at rawContent.
To find the rows with an incorrect number of columns, for example:
expectedColumns <- 2521
delim <- "\t"
indxToOffenders <-
  sapply(rawContent, function(x) # for each line in rawContent
    lengths(regmatches(x, gregexpr(delim, x, fixed = TRUE))) + 1 != expectedColumns # fields = delimiters + 1; compare with expectedColumns
  ) # TRUE marks a line with the wrong number of fields
Then to read in your data:
myDataFrame <- read.csv(text = rawContent[!indxToOffenders], header=??, sep=delim) # replace ?? with TRUE or FALSE; text= parses the kept lines directly
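The same filtering can be sketched with count.fields(), which the question already uses. The example below is a minimal illustration on an invented four-line file with three expected columns, not the 2521-column file; quote = "" and blank.lines.skip = FALSE are choices made here so that the counts line up one-to-one with the output of readLines().

```r
# Hypothetical tab-separated file: the third line is one field short.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5", "6\t7\t8"), tmp)
expected <- 3

rawContent <- readLines(tmp)

# Count the fields on every line, keeping blank lines so indices match rawContent.
n_fields <- count.fields(tmp, sep = "\t", quote = "",
                         comment.char = "", blank.lines.skip = FALSE)
good <- n_fields == expected

# Parse only the well-formed lines.
kept <- read.table(text = rawContent[good], header = TRUE, sep = "\t")
print(which(!good))
print(kept)
```

Dropping lines this way discards data, so it is worth inspecting rawContent[!good] before deciding.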