Read a portion of a table in R

In R, I am trying to only read the first four columns of a text file. Is there a way to do this? This is some example code:
example <- read.table("example.txt", header=TRUE)
When I run this, I get the following error message:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 9 elements
This is why I only want to read in the first four columns.
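One possible approach, sketched here on a made-up ragged file (not from the original thread), is to let read.table() pad the short lines with fill = TRUE and then keep only the first four columns with ordinary indexing:

```r
# Made-up ragged file for illustration: the first line has 5 fields,
# the second has 9, which is the kind of mismatch that triggers the
# "line 1 did not have 9 elements" error.
tf <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4 5",
             "1 2 3 4 5 6 7 8 9"), tf)

# fill = TRUE pads short lines with NA instead of raising an error;
# indexing then keeps just the first four columns.
x <- read.table(tf, fill = TRUE)[, 1:4]
```

When every line has the same number of fields, another option is to set colClasses = "NULL" for the unwanted columns so they are never parsed at all.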

Related

read.csv.sql() with missing values

I am trying to filter a very large file with real-time data: more than 93,000 rows x 109 columns.
A lot of the columns have missing values, so when I try to use the read.csv.sql() function from the sqldf package, I get the following error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 109 elements
Is there any workaround such as fill=TRUE for read.csv that can be used with read.csv.sql()?
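One workaround, assuming SQL-style filtering is the goal rather than read.csv.sql() specifically, is to read the file with read.csv(fill = TRUE), which pads the ragged rows with NA, and then run sqldf() on the resulting data frame. A sketch on a made-up file (the column names are invented for illustration):

```r
library(sqldf)

# Made-up ragged CSV: the second data row is missing its last field.
tf <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5"), tf)

# fill = TRUE pads the short row with NA instead of raising an error.
df <- read.csv(tf, fill = TRUE)

# sqldf() can filter the in-memory data frame with ordinary SQL.
result <- sqldf("SELECT a, b FROM df WHERE c IS NOT NULL")
```

Note the trade-off: this loads the whole file into memory first, so it gives up the main benefit of read.csv.sql() on very large files.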

Error in read.fasta

While using the seqinr package to read a FASTA file and get the length of the sequences with the code below, the following error is shown:
library("seqinr")
ncrna <- read.fasta(file = "ncrna_noncode_v3.fasta")
length(ncrna)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 85 did not have 14 elements
Please suggest a possible solution for this error.
You can use the Biostrings package from Bioconductor.
library(Biostrings)
ncrna <- readDNAStringSet("ncrna_noncode_v3.fasta")
# use `width` to see the length of each sequence
str(width(ncrna))
BTW, please show part of your fasta file.

"EOF within quoted string" error from scan while reading a CSV file

I am trying to read a CSV file into R. I tried:
data <- read.csv(file="train.csv")
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
But, this reads in only a small percentage of the total observations. Then I tried removing quotes:
data <- read.csv(file="train.csv",quote = "",sep = ",",header = TRUE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
Since the data is text, it seems there is some issue with the delimiter.
It is difficult to share the entire data set as it is huge. I tried going to the line where the error occurs, but there seems to be no non-printable character there. I also tried other readers like fread(), but to no avail.
I have encountered this before. It can be very tricky. Try a specialized CSV reader:
library(readr)
data <- read_csv(file="train.csv")
This should do it.

Large two-column file for read.table in R: how to automatically ditch lines that give "line X did not have 2 elements"?

I have a large file consisting of two columns which I want to read in. While doing read.table I encounter this:
> x <- read.table('lindsay.csv')
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 493016 did not have 2 elements
Unfortunately that line is only 2% of the way into the file, so finding all the lines with a bit of corruption is really hard. Is there a way to use read.table and automatically skip the lines that do not have 2 elements?
For starters, use read.csv() or pass sep="," to read.table() if you are in fact working with a comma-delimited file.
x <- read.csv('lindsay.csv')
OR
x <- read.table('lindsay.csv', sep=",")
If that still doesn't work, you really should find out what is special about those lines and preprocess the text to fix them. This may mean either removing them, or correcting mistakes, or something else I haven't imagined.
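If the corrupt lines really must be dropped programmatically, one sketch (on a made-up file, assuming whitespace-separated fields as in the original read.table call) is to locate them with count.fields() and re-read only the good lines:

```r
# Made-up file: the third line has 3 fields instead of 2.
tf <- tempfile()
writeLines(c("1 2",
             "3 4",
             "5 6 7",
             "8 9"), tf)

# count.fields() reports the number of fields on each line,
# so lines with the wrong count can be filtered out before parsing.
n_fields <- count.fields(tf)
good <- readLines(tf)[n_fields == 2]
x <- read.table(text = good)
```

This reads the file twice, which is the price of skipping the bad lines without external preprocessing.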

Trouble reading a tab-delimited file in R

I am trying to read a tab-delimited file using read.table, but I am not able to read the file:
table<- read.table("/Users/Desktop/R-test/HumanHT-12_V4_0_R2_15002873_B.txt",
header =FALSE, sep = "\t",
comment.char="#", check.names=FALSE)
When I run the code I have this error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 2 elements
What am I doing wrong when reading the table?
I am not very familiar with R, so any help would be really useful.
I am very familiar with this type of file: It is a GEO platform data for Microarray analysis.
As baptiste proposed above, the best way is to skip the first few lines with skip=9. You can also replace read.table(..., sep="\t") with read.delim(...), which uses tab as its default separator. Then you will have your table with suitable column names; note that the column names should be in the first line after the skipped ones.
If you are really interested in the first 9 lines, you can read them with readLines() and keep them alongside your table like this:
foo = read.delim("/Users/Desktop/R-test/HumanHT-12_V4_0_R2_15002873_B.txt", skip = 9)
bar = readLines("/Users/Desktop/R-test/HumanHT-12_V4_0_R2_15002873_B.txt", n = 9)
baz = list(foo, bar)