Error in read.fasta - r

While using the seqinr package to read a FASTA file and get the length of the sequences with the code below, I get the following error:
library("seqinr")
ncrna <- read.fasta(file = "ncrna_noncode_v3.fasta")
length(ncrna)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 85 did not have 14 elements
Please suggest a possible solution for this error.

You can use the Biostrings package from Bioconductor.
library(Biostrings)
ncrna <- readDNAStringSet("ncrna_noncode_v3.fasta")
# use `width` to see the length of each sequence
str(width(ncrna))
BTW, please show part of your fasta file.
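To share the relevant part of the file, a minimal base R sketch can print the lines around the one named in the error (line 85 here). The helper name `show_lines` is hypothetical, and a small temporary file stands in for the real FASTA file.

```r
# Hypothetical helper: return lines `from`..`to` of a text file.
show_lines <- function(path, from, to) {
  lines <- readLines(path, n = to, warn = FALSE)
  to <- min(to, length(lines))
  if (from > to) return(character(0))
  lines[from:to]
}

# Toy stand-in for the real FASTA file (assumption: not the asker's data).
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA"), tmp)

snippet <- show_lines(tmp, 3, 5)
print(snippet)
```

For the file in the question, the equivalent call would be something like show_lines("ncrna_noncode_v3.fasta", 80, 90).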

Related

read.csv.sql() with missing values

I am trying to filter a very large file with real-time data - more than 93000 rows x 109 columns.
A lot of the columns have missing values so when I try to use the read.csv.sql() function from sqldf package, I get the following error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 109 elements
Is there any workaround such as fill=TRUE for read.csv that can be used with read.csv.sql()?
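One possible fallback, sketched here in base R, sidesteps read.csv.sql() rather than configuring it: count.fields() shows which lines are short, and read.csv() (which defaults to fill = TRUE) pads them with NA so the filtering can be done in R. The toy file and the filter condition are assumptions for illustration only.

```r
# Toy CSV with one short row (assumption: stands in for the real file).
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "7,8,9"), tmp)

# Number of fields on each line; short lines stand out.
n_fields <- count.fields(tmp, sep = ",")
short_rows <- which(n_fields < max(n_fields))

# read.csv() defaults to fill = TRUE, so short rows are padded with NA.
dat <- read.csv(tmp)

# Filter in R instead of in SQL (hypothetical condition).
filtered <- subset(dat, a > 1)
```

Whether this is practical for the full 93000 x 109 file depends on available memory.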

Reading PISA data into R - read.table error

I am trying to read data from the PISA 2012 study (http://pisa2012.acer.edu.au/downloads.php) into R using the read.table function. This is the code I tried:
pisa <- read.table("pisa2012.txt", sep = "")
Unfortunately, I keep getting the following error message:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
: line 2 did not have 184 elements
I have tried to set
header = T
but then get the following error message
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
:line 1 did not have 184 elements
Lastly, this is what the .txt file looks like ...
http://postimg.org/image/4u9lqtxqd/
Thanks for your help!
You can see from the first line that you'll need some sort of control file to delimit the individual variables. From working with PISA in other environments, I know the first three columns correspond to the ISO 3-letter country code (e.g., ALB). What follows are numbers and letters that need to be separated in a meaningful way. You could use the codebook for this (https://pisa2012.acer.edu.au/downloads/M_stu_codebook.pdf), but that is a real bear for every single variable. Why not download the data in SPSS or SAS format and import that? Not a 'slick' solution, but without a control file, you'd have a lot of manual work to do.
I just read the files using the readr package. You will need: the readr package, the TXT file, the SAScii package, and the associated SAS file.
So, let's say you want to read the student files. Then you will need the following files: INT_STU12_DEC03.txt and INT_STU12_DEC03.sas.
##################### READING STUDENT DATA ###################
library(readr)   # read_fwf(), fwf_widths()
library(SAScii)  # parse.SAScii()
## Loading the dictionary (variable names and widths) from the SAS file
dic_student <- parse.SAScii(sas_ri = 'INT_STU12_SAS.sas')
## Creating the positions to read_fwf
student <- read_fwf(file = 'INT_STU12_DEC03.txt', col_positions = fwf_widths(dic_student$width), progress = TRUE)
colnames(student) <- dic_student$varname
OBS 1: As I'm using Linux, I needed to delete the first lines from the SAS file and change the encoding to UTF-8.
OBS 2: The deleted lines were:
libname M_DEC03 "C:\XXX";
filename STU "C:\XXX\INT_STU12_DEC03.txt";
options nofmterr;
OBS 3: The dataset takes about 1 GB, so you will need enough RAM.
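The same idea (a dictionary of variable names and widths driving a fixed-width read) can be tried on a small scale with base R's read.fwf(). This is only a sketch: the toy file, the variable names, and the widths are made up, and the dictionary merely mimics the varname and width columns used above.

```r
# Toy fixed-width file: 3-char country code, 2-char school id, 1-char grade.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ALB019", "ALB028", "ARG157"), tmp)

# Hypothetical dictionary with the two columns the answer relies on.
dic <- data.frame(varname = c("CNT", "SCHOOLID", "GRADE"),
                  width = c(3, 2, 1),
                  stringsAsFactors = FALSE)

# Split each line by the widths and name the columns from the dictionary.
# Reading everything as character keeps leading zeros such as "01".
toy <- read.fwf(tmp, widths = dic$width, col.names = dic$varname,
                colClasses = "character")
```

For a file of about 1 GB, read_fwf() from readr is likely the better choice; read.fwf() is shown only because it needs no extra packages.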

In Scan EOF error while reading CSV file

I am trying to read a CSV file into R. I tried:
data <- read.csv(file="train.csv")
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
But, this reads in only a small percentage of the total observations. Then I tried removing quotes:
data <- read.csv(file="train.csv",quote = "",sep = ",",header = TRUE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
Since the data is text, it seems there is some issue with the delimiter.
It is difficult to share the entire data set as it is huge. I tried going to the line where the error occurs, but there seem to be no non-printable characters there. I also tried other readers like fread(), but to no avail.
I have encountered this before, and it can be very tricky. Try a specialized CSV reader:
library(readr)
data <- read_csv(file="train.csv")
This should do it.
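One plausible reason the quote = "" attempt fails with "more columns than column names" is that commas inside quoted text are then treated as separators. A small base R sketch with made-up data shows the effect using count.fields():

```r
# Toy CSV where one quoted field contains a comma (assumption: made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,text", "1,\"hello, world\"", "2,plain"), tmp)

# With quoting enabled, every line has two fields.
with_quotes <- count.fields(tmp, sep = ",", quote = "\"")

# With quoting disabled, the embedded comma splits the second line in three.
no_quotes <- count.fields(tmp, sep = ",", quote = "")

print(with_quotes)
print(no_quotes)
```

Running count.fields() on the real file with both settings can help locate the lines where the two counts disagree.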

read.csv() warning: unable to read a csv file in R

I was trying to read a csv file in R and read.csv gives me a warning and consequently stops reading from there on. I think it's something related to an extra quote being there. How can I resolve this?
(csv file put on a public share below for access)
> scoresdf = read.csv('http://aftabubuntu.cloudapp.net/trainDataEnglish.csv')
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
I got the same error on read.csv. I managed to get it working with the rio package:
library(rio)
dat <- import("http://aftabubuntu.cloudapp.net/trainDataEnglish.csv")
and the readr package:
library(readr)
dat <- read_csv("http://aftabubuntu.cloudapp.net/trainDataEnglish.csv")
and the data.table package:
library(data.table)
dat <- fread("http://aftabubuntu.cloudapp.net/trainDataEnglish.csv")
Try
url <- 'http://aftabubuntu.cloudapp.net/trainDataEnglish.csv'
scoresdf = read.csv(url,quote="")
As you suspected, there is indeed an embedded quotation mark somewhere within your document.
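The effect of an unbalanced quotation mark can be reproduced on a toy input. This is a minimal base R sketch with made-up data: the stray quote on one line makes read.csv() treat the rest of the input as a single quoted field, while quote = "" reads every line.

```r
# Toy CSV with a stray double quote on the seventh line (made-up data).
csv_lines <- c("id,comment",
               "1,fine", "2,fine", "3,fine", "4,fine", "5,fine",
               "6,say \"hi", "7,fine", "8,fine")

# Collect warnings instead of printing them.
warns <- character(0)
bad <- withCallingHandlers(
  read.csv(text = csv_lines),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Disabling quote handling keeps the quote as a literal character.
good <- read.csv(text = csv_lines, quote = "")

print(warns)
print(c(bad = nrow(bad), good = nrow(good)))
```

Note that quote = "" also disables legitimate quoting, so fields that contain commas inside quotes will be split.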

Read a portion of a table

In R, I am trying to only read the first four columns of a text file. Is there a way to do this? This is some example code:
example <- read.table("example.txt", header=TRUE)
When I run this, I get the following error message:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 9 elements
This is why I only want to read in the first four columns.
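One way to read only the first four columns, sketched in base R, is to pass "NULL" in colClasses for every column that should be skipped. The toy file below is an assumption; if the real file has rows of differing lengths, fill = TRUE may also be needed.

```r
# Toy whitespace-separated file with six columns (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b c d e f",
             "1 2 3 4 5 6",
             "7 8 9 10 11 12"), tmp)

keep <- 4                          # number of leading columns to keep
n_cols <- max(count.fields(tmp))   # widest line in the file

# NA lets read.table() guess the type; "NULL" drops the column entirely.
col_classes <- c(rep(NA, keep), rep("NULL", n_cols - keep))

first_four <- read.table(tmp, header = TRUE, colClasses = col_classes)
print(names(first_four))
```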
