I want to import a triangular dataset (33 elements on the first line, 32 on the second, 31 on the third, ...).
I tried:
Xij <- read.table(file = file.choose(), header = FALSE)
which gives me the error: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 2 did not have 9 elements
Can somebody please help me solve this?
Many thanks in advance!
You could do the following:
lines <- readLines(file.choose())
data <- strsplit(lines, ' ')
You will have the list of lines in 'data', and you can create a data frame according to your needs. E.g.:
n <- length(data)        # number of lines
m <- max(lengths(data))  # width of the widest line (the first one, in your layout)
for (i in 1:n) {
  data[[i]] <- as.numeric(data[[i]])
  length(data[[i]]) <- m # pad the shorter lines with trailing NAs
}
df <- data.frame(matrix(unlist(data), nrow = n, byrow = TRUE))
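For what it's worth, read.table() can also do the padding itself via fill = TRUE. A minimal sketch, which works here because the first line is the widest and read.table() determines the column count from the first few lines of the file:
Xij <- read.table(file.choose(), header = FALSE, fill = TRUE) # short rows are padded with NA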
I am new to R and trying to learn how to read the text below. I am using
data <- read.table("myvertices.txt", stringsAsFactors=TRUE, sep=",")
hoping to convey that each "FID..." label should be associated with the comma-separated numbers below it.
The error I get is:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 13 did not have 2 elements
How would I read the following format
FID001:
-120.9633,51.8496
-121.42749,52.293
-121.25453,52.3195
FID002:
-65.4794,47.69011
-65.4797,47.0401
FID003:
-65.849,47.5215
-65.467,47.515
into something like
FID001 -120.9633 51.8496
FID001 -121.42749 52.293
FID001 -121.25453 52.3195
FID002 -65.4794 47.69011
FID002 -65.4797 47.0401
FID003 -65.849 47.5215
FID003 -65.467 47.515
Here is a possible way to achieve this:
data <- read.table("myvertices.txt")             # read as-is: one field per line
fid1 <- c(grep("^FID", data$V1), nrow(data) + 1) # row numbers of the "FID.." labels, plus a sentinel past the end
df1 <- diff(x = fid1, lag = 1)                   # size of each block: the label row plus its coordinate rows
listdata <- lapply(seq_along(df1),
                   function(n) cbind(FID = data$V1[fid1[n]],
                                     read.table("myvertices.txt",
                                                skip = fid1[n],
                                                nrows = df1[n] - 1,
                                                sep = ",")))
data2 <- do.call(rbind, listdata)                # combine all the read tables into a single data frame
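Note that the FID values keep the trailing colon from the file ("FID001:"). To match the desired output exactly, strip it afterwards:
data2$FID <- sub(":$", "", data2$FID) # "FID001:" -> "FID001"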
I am trying to read.table data from the clipboard to get around a DRM-ed file, but I cannot understand why it does not work, as shown below. Adding comment.char = "" didn't help.
t1 <- read.table("clipboard", header = TRUE, sep = "\t")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 23938 did not have 23 elements
t1_0 <- readLines("clipboard")
t2 <- sapply(t1_0, function(x) strsplit(x, "\t"))
table(sapply(t2, length))
23
406799
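Since every one of the 406,799 lines splits into exactly 23 tab-separated fields, the line lengths themselves are fine; a likely culprit is an unbalanced quote character somewhere in the data, which makes scan() swallow several physical lines as one field. A sketch worth trying, with quoting disabled as well as comments:
t1 <- read.table("clipboard", header = TRUE, sep = "\t",
                 quote = "", comment.char = "")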
This is my data, as a *.txt file:
ID LBI RTI WDI FLA PSF FSF ZDF1 PROZD
ar ?,35.30,2.60, ?,42.4,24.2,47.1,69
arn 1.23,27.00,3.59,122, 0.0,40.0,40.0,30
be 1.24,26.50,2.90,121,16.0,20.7,29.7,72
bi1 1.07,29.10,3.10,114,44.0, 2.6,26.3,68
bi2 1.08,43.70,2.40,105,32.6, 5.8,10.7,42
bie 1.39,29.50,2.78,126,14.0, 0.0,50.0,78
bn 1.31,26.30,2.10,119,15.7,15.7,30.4,72
bo 1.27,27.60,3.50,116,16.8,23.0,35.2,69
by 1.11,32.60,2.90,113,15.8,15.8,15.0,57
Then it failed when I ran:
r2 <- read.table('StoneFlakes.txt',header=TRUE,na.strings='?')
The error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 9 elements
Can anyone help me?
The file mixes two separators: whitespace after the ID column and commas between the numeric fields, so the default whitespace splitting sees the wrong number of fields. Converting the commas to spaces makes every line uniformly whitespace-delimited:
text <- readLines('StoneFlakes.txt')
text <- gsub(",", " ", text)
read.table(textConnection(text), header=TRUE, na.strings='?')
# instead of using textConnection() you can also pass the lines directly:
read.table(text = text, header = TRUE, na.strings = "?")
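With the sample above, all nine columns now line up with the header, and the ? entries (for example the LBI and FLA values of row ar) come in as NA.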
I am playing around with R. I want to create my dictionary from txt files. I have two .txt files:
#1.txt
sky,
sun
#2.txt
blue,
bright
To load these 2 files in R, I am doing the following:
library(tm)
txt_files = list.files(pattern = '*.txt');
data = lapply(txt_files, read.table, sep = ",")
# here I get an error
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 2 did not have 2 elements
In addition: Warning message:
In FUN(c("1.txt", "2.txt")[[1L]], ...) :
incomplete final line found by readTableHeader on '1.txt'
dict <- c(data)
# dict <- c("sky","blue","bright","sun") # original dictionary; want to replace this with the above method
docs <- c(D1 = "The sky is blue.", D2 = "The sun is bright.", D3 = "The sun in the sky is bright.")
dd <- Corpus(VectorSource(docs))
dtm <- DocumentTermMatrix(dd, control = list(weighting = weightTfIdf,dictionary = dict))
I am getting the following error:
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic
Can anybody tell me what I am doing wrong?
I don't think you should use read.table() for those irregular data files. Why not just use readLines() instead?
txt_files <- list.files(pattern = "\\.txt$") # pattern is a regex, not a glob
data <- lapply(txt_files, readLines)
dict <- gsub(",$", "", unlist(data))         # drop the trailing commas
docs <- c(D1 = "The sky is blue.", D2 = "The sun is bright.", D3 = "The sun in the sky is bright.")
dd <- Corpus(VectorSource(docs))
dtm <- DocumentTermMatrix(dd,
                          control = list(weighting = weightTfIdf, dictionary = dict))
inspect(dtm)
Note we had to remove the trailing commas ourselves with this method, but that's pretty easy.
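The unlist() step is also what fixes your second error: c(data) leaves you with a list, while DocumentTermMatrix() needs an atomic character vector as its dictionary, which is exactly what the "'x' must be atomic" message was complaining about.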
I have a file that has 22268 rows by 2521 columns. When I try to read it in with this line of code:
file <- read.table(textfile, skip=2, header=TRUE, sep="\t", fill=TRUE, blank.lines.skip=FALSE)
I only get 13024 rows by 2521 columns read in, plus the following warning:
Warning message: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : number of items read is not a multiple of the number of columns
I also used this command to see what rows had an incorrect number of columns:
x <- count.fields(textfile, sep = "\t", skip = 2)
incorrect <- which(x != 2521)
and got back a list of about 20 rows that were incorrect.
Is there a way to fill these rows with NA values?
I thought that is what the fill parameter of read.table() does, but apparently not.
OR
Is there a way to ignore these rows that are identified in the "incorrect" variable?
You can use readLines() to read in the data, then find the offending rows.
con <- file("path/to/file.csv", "rb")
rawContent <- readLines(con) # one element per line of the file
close(con)                   # close the connection to the file, to keep things tidy
Then take a look at rawContent, for example:
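length(rawContent)  # total number of lines read
head(rawContent, 2) # eyeball the first couple of raw lines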
To find the rows with an incorrect number of columns, for example:
expectedColumns <- 2521
delim <- "\t"
indxToOffenders <-
  sapply(rawContent, function(x)                      # for each line in rawContent
    # a line with expectedColumns fields contains expectedColumns - 1 delimiters
    length(gregexpr(delim, x)[[1]]) != expectedColumns - 1
  )
Then to read in your data:
myDataFrame <- read.csv(text = rawContent[!indxToOffenders], # logical index: drop the offenders
                        header = TRUE,                       # or FALSE, if no header row survived
                        sep = delim)
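Alternatively, since count.fields() already gave you a per-line field count, you can reuse it to drop the bad rows. A sketch, assuming the header row itself also has 2521 fields (otherwise keep it explicitly):
x <- count.fields(textfile, sep = "\t", skip = 2) # field count per non-skipped line
lines <- readLines(textfile)[-(1:2)]              # drop the same two skipped lines
keep <- lines[x == 2521]                          # keep only the complete rows
df <- read.table(text = keep, sep = "\t", header = TRUE)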