Problem extracting specific transcript sequences from genome - r

I'm working on Linux in R to get specific transcript sequences from the genome. I am able to import the genome, extract transcript annotations (tx), and get transcript sequences (tx_seq). I am also able to import genomic annotations for the transcripts I'm interested in as GRanges (uorf_annotations) and convert them to transcript coordinates (uorf_tx_annotations).
The libraries I have loaded are
#BiocManager::install(version="3.9")
#BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
#BiocManager::install("S4Vectors")
#BiocManager::install(c("GenomicFeatures", "Biostrings", "GenomicRanges"))
library(broom)
library(BSgenome.Hsapiens.UCSC.hg19)
library(Biostrings)
library(GenomicRanges)
library(GenomicFeatures)
library(rtracklayer)
library(tidyverse)
library(plyranges)
options(scipen=999)
The chunk of code that works is
uorf_tx_annotations <- mapToTranscripts(uorf_annotations, tx) %>%
print()
I get stuck when I try and get the transcript sequences from uorf_tx_annotations. The code I am using for this step is
uorf_transcript_seq <- extractTranscriptSeqs(genome, uorf_tx_annotations) %>%
print()
and the error I am getting is "Error in extractTranscriptSeqs(genome, uorf_tx_annotations) : failed to extract the exon ranges from 'transcripts' with exonsBy(transcripts, by="tx", ...)"
I use extractTranscriptSeqs earlier on to get tx_seq, so I'm not sure what's wrong here. I thought that everything I needed was in GRanges, but in case uorf_annotations wasn't in the GRanges I tried (inspired from documentation at https://rdrr.io/bioc/GenomicFeatures/man/extractTranscriptSeqs.html)
uorf_tx_annotations <- mapToTranscripts(uorf_annotations, tx) %>%
exonsBy(by = "tx") %>%
print()
but this just introduces a new error: "Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘exonsBy’ for signature ‘"GRanges"’"

Related

R "unable to find an inherited method for function" type error in migrateR, when making mvmt objects from ltraj objects

I have been following the migrateR vignette while trying to create animal movement models.
So far I made a list of ltraj objects from the adehabitat package and have worked with that:
set.coordinates <- function(x){coordinates(x) <- c("Long","Lat"); x}
set.proj4string <- function(x){proj4string(x) <- CRS("+proj=longlat +datum=WGS84"); x}
Base_ind <- lapply(Base_ind, set.coordinates)
Base_ind <- lapply(Base_ind, set.proj4string)
make.ltraj <- function(q){as.ltraj(xy=coordinates(q), date=q$TelemDate, id=q$AID,
burst = q$AID, infolocs = q[4])}
Base_ind_tr <- lapply(Base_ind, make.ltraj)
make.nsd <- function(x){mvmtClass(x)}
Base.nsd <- lapply(Base_ind_tr, make.nsd)
After this the error appears:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘burst’ for signature ‘"ltraj", "missing"’
My ltraj objects look the same as in the vignette:
Type of the traject: Type II (time recorded)
Time zone: UTC *
Irregular traject. Variable time lag between two locs
Characteristics of the bursts:
id burst nb.reloc NAs date.begin date.end
1 An1 An1 1323 0 2021-05-26 11:02:26 2021-10-24 17:00:38
infolocs provided. The following variables are available:
[1] "elev"
burst exists as an attribute of the ltraj objects.
I have also gotten the error for individual animals and for all of them in a single data frame.
I have a vague idea of how this sort of error occurs, but I am not sure how to solve it.

export data frame to tiff format

I am using night-time light satellite data. I calibrated two satellite datasets for the same year against each other, and for that I converted the TIFF file to a data frame. Now I need to export the data frame back to TIFF format. Below is the code I tried, but it raises an error:
library (sp)
library (raster)
library (rgdal)
writeRaster(NTL_new2, "E:/phd/data/calliberation test/rstudio/test.tif", format="GTiff", overwrite=TRUE)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘writeRaster’ for signature ‘"data.frame", "character"’
Can anyone guide me on how to go about this?
That is the type of message you get when you apply an S4 generic function to an object of a class for which no S4 method is defined (or at least none has been attached to the current R session).
Here's an example using the raster package (for spatial raster data), which is chock full of S4 functions.
library(raster)
## raster::rotate() is an S4 function with just one method, for "Raster" class objects
isS4(rotate)
# [1] TRUE
showMethods(rotate)
# Function: rotate (package raster)
# x="Raster"
## Let's see what happens when we pass it an object that's *not* of class "Raster"
x <- 1:10
class(x)
# [1] "integer"
rotate(x)
# Error in (function (classes, fdef, mtable) :
# unable to find an inherited method for function ‘rotate’ for signature ‘"integer"’
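The same failure can be reproduced without any spatial package. The sketch below uses only the methods package that ships with R; the generic `describe` and its single method are hypothetical names made up for illustration, not part of any library.

```r
library(methods)

# Hypothetical S4 generic with a single method, defined for "numeric" only
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "numeric", function(x) paste("numeric of length", length(x)))

# Dispatch succeeds: "integer" inherits from "numeric" for S4 dispatch
ok <- describe(1:3)

# Dispatch fails: no method exists for "data.frame", so the error is caught
# and its message is stored instead of stopping the script
msg <- tryCatch(
  describe(data.frame(a = 1)),
  error = function(e) conditionMessage(e)
)

# List the signatures that do have methods
showMethods("describe")
```

Checking `class()` on the object and `showMethods()` on the function, as here, is usually enough to see which conversion is missing before the call.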

Sentiment Analysis Of A Dataset With Multiple NewsPaper Articles

I'm trying to call get_nrc_sentiment in R but getting the following error:
Error in get_nrc_sentiment(Test) : Data must be a character vector.
Can anyone see what I'm doing wrong?
library("RDSTK")
library("readr")
library("qdap")
library("syuzhet")
library("ggplot2")
library(readxl)
Test <- read_excel("Test.xlsx")
View(Test)
scores = get_nrc_sentiment(Test) # throws the error
I suspect that the Test.xlsx file you are reading in has multiple columns. In that case, the Test object is not a character vector but a data frame, and passing a data frame to get_nrc_sentiment() causes the error. You can run class(Test) to see what kind of R object it is.
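A minimal sketch of that check in base R, assuming the spreadsheet has a column holding the article text. The data frame below is a stand-in for the result of read_excel() (which returns a tibble, a kind of data frame), and the column names `headline` and `text` are hypothetical.

```r
# Stand-in for the imported spreadsheet: a data frame with several columns.
# The column names `headline` and `text` are hypothetical.
Test <- data.frame(
  headline = c("Markets rally", "Storm warning"),
  text = c("Stocks rose sharply today.", "Heavy rain is expected tonight."),
  stringsAsFactors = FALSE
)

class(Test)            # a data frame, not a character vector
is.character(Test)     # FALSE

# Pull out the one column that holds the article text
articles <- Test$text
is.character(articles) # TRUE
```

With the real data, the call would then presumably be get_nrc_sentiment(Test$text), with `text` replaced by the actual column name.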

Web Scraping (in R) - readHTMLTable error

I have a file called Schedule.csv, which is structured as follows:
URLs
http://www.basketball-reference.com/friv/dailyleaders.cgi?month=10&day=27&year=2015
http://www.basketball-reference.com/friv/dailyleaders.cgi?month=10&day=28&year=2015
I am trying to use the explanation provided in the following question to scrape the html tables but it isn't working: How to scrape HTML tables from a list of links
My current code is as follows:
library(XML)
schedule<-read.csv("Schedule.csv")
stats <- list()
for(i in seq_along(schedule))
{
print(i)
total <- readHTMLTable(schedule[i])
n.rows <- unlist(lapply(total, function(t) dim(t)[1]))
stats[[i]] <- as.data.frame(total[[which.max(n.rows)]])
}
I get an error when I run this code as follows:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"data.frame"’
If I manually type the URL's in a vector as per below I get exactly what I want when I run the readHTMLTable code.
schedule<-c("http://www.basketball-reference.com/friv/dailyleaders.cgi?month=10&day=27&year=2015","http://www.basketball-reference.com/friv/dailyleaders.cgi?month=10&day=28&year=2015")
Can someone please explain to me why the read.csv is not giving me a usable vector of information to input into the readHTMLTable function?
read.csv returns a data.frame, which is what ends up in schedule. You want to access it by rows, but seq_along and schedule[i] work along the columns of a data frame.
In your case you can do:
for (i in seq_len(nrow(schedule))) {
total <- readHTMLTable(schedule[i, 1])
# ... rest of the loop body as before
}
As I understand it, you want the first column of your data.frame; otherwise change the , 1] or use column names.
Also notice that in R versions before 4.0.0 read.csv reads a text column as a factor by default, so you may prefer to read it as character:
schedule<-read.csv("Schedule.csv", as.is = TRUE)
Another alternative, if your file has a single column, is to use readLines; then you can keep your loop as it was:
schedule<-readLines("Schedule.csv")
stats <- list()
for(i in seq_along(schedule))
{
print(i)
total <- readHTMLTable(schedule[i])
...
But be careful with the column names, because the header line will be the first element of your schedule vector.
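The difference between the two ways of reading the file can be checked in base R without touching the network. This is a sketch under assumptions: the temporary file stands in for Schedule.csv, and the example.com URLs are placeholders.

```r
# Write a small stand-in for Schedule.csv to a temporary file.
# The example.com URLs are placeholders, not the real pages.
path <- tempfile(fileext = ".csv")
writeLines(c("URLs",
             "http://example.com/a",
             "http://example.com/b",
             "http://example.com/c"), path)

schedule <- read.csv(path, as.is = TRUE)

class(schedule)       # "data.frame"
seq_along(schedule)   # 1, because the data frame has one column
class(schedule[1])    # "data.frame": single-bracket indexing keeps the wrapper

# Extract the column as a plain character vector, then loop over its elements
urls <- schedule$URLs
for (i in seq_along(urls)) {
  print(urls[i])
}

# readLines() returns a character vector directly, header line included
lines <- readLines(path)
urls_from_lines <- lines[-1]  # drop the "URLs" header
```

Either `urls` or `urls_from_lines` is a character vector, so each element can be handed to readHTMLTable one at a time.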

Batch convert .csv files to .shp files in R

I am trying to convert a large number (>500) of text files into shapefiles. I can successfully convert a single .csv into a projected shapefile. And I can get lapply and 'for' loops to work when just loading, cleaning up, and exporting the text files. But the code fails when I add in steps to convert to shapefiles within the loops. Below are two ways I've tried tackling the problem and the associated error messages:
General processing/definitions-
library(rgdal)
library(sp)
crs.geo<-CRS("+proj=utm +zone=14 +ellps=GRS80 +datum=NAD83 +units=m +no_defs ") #define projection
coltype=c("character","character","character","numeric","numeric") #define data types for input .csv (x,y UTM coords are columns 4,5)
setwd("C:/.../testdata/out")
all.the.filenames <- list.files(pattern= "\\.csv") #create list of files to batch process
head(exampledata,2)
Point Location Time easting northing
1 Trackpoint 14 S 661117 3762441 12/1/2008 5:57:02 AM 661117 3762441
2 Trackpoint 14 S 661182 3762229 12/1/2008 5:58:02 AM 661182 3762229
Batch conversion with a 'for' loop
names <- substr(all.the.filenames, 1, nchar(all.the.filenames)-4) #clean up file names
for(i in names) {
filepath <- file.path("../out",paste(i,".csv",sep=""))
assign(i, read.table(filepath, colClasses=coltype, header=TRUE, sep=",", na.strings=c("NA","")))
coordinates(i) <- c(4,5) #coords in columns 4,5
proj4string(i) <- crs.geo
writeOGR(i,"C:/Users/Seth/Documents/testdata/out","*",driver="ESRI Shapefile") }
R returns this error message:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘coordinates<-’ for signature ‘"character"’
If I end the 'for' loop after the 'assign' line, it successfully imports all .csv files as separate objects in R. The problem seems to be that function 'coordinates' is not seeing the coords as numeric, and I get the same error message no matter how explicitly I try to define them as such (e.g., coordinates(i) <- c(as.numeric("easting","northing"))). Also, these lines of code work successfully when applied to a single .csv file; the problem arises when I subset within a loop.
Batch conversion using lapply
files.to.process <- lapply(all.the.filenames, function(x) read.csv(x, colClasses=coltype, header=TRUE))
lapply(files.to.process, function(c) coordinates(c)<-c("easting","northing"))
[[1]]
[1] "easting" "northing"
[[2]]
[1] "easting" "northing"
[[3]]
[1] "easting" "northing"
[[4]]
[1] "easting" "northing"
[[5]]
[1] "easting" "northing"
lapply(files.to.process, function(p) proj4string(p) <- crs.geo)
which returns the error message:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘proj4string<-’ for signature ‘"data.frame", "CRS"’
#Double-check if function 'coordinates' worked
class(files.to.process) == "SpatialPoints"
[1] FALSE
Conclusion/problem
With both approaches the problem seems to be in the 'coordinates' step to make a spatial object. What am I doing wrong in the loops? Thanks much for any help!
Seth H.
In your first attempt, the object i inside the loop is a character object. So,
coordinates(get(i))
would work better; I don't have a batch of csv files to test it on.
In the second attempt using lapply(), I'm not exactly sure what's going on, but
class(files.to.process)
should be "list", so what you want to do is
lapply(files.to.process,class)
and that will tell you whether the objects are of class SpatialPoints. I'm guessing they are data.frames, and you need one more step in between.
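Both symptoms can be reproduced in base R. The sketch below uses `names<-` as a stand-in for sp's `coordinates<-`, since both are replacement functions; the data frames and the object name `track1` are hypothetical.

```r
# Two small stand-ins for the imported .csv files (hypothetical values)
dfs <- list(
  data.frame(a = 1:2, b = 3:4),
  data.frame(a = 5:6, b = 7:8)
)

# The function body ends with an assignment, so each call returns the
# assigned value (a character vector) and the modified copy is discarded
wrong <- lapply(dfs, function(d) names(d) <- c("easting", "northing"))

# Returning the object explicitly keeps the modification
right <- lapply(dfs, function(d) {
  names(d) <- c("easting", "northing")
  d
})

# In a for loop over object names, the loop variable is only a string.
# Fetch the object with get(), modify the copy, then store it with assign()
track1 <- dfs[[1]]
for (i in "track1") {
  name_class <- class(i)   # "character"
  d <- get(i)
  names(d) <- c("easting", "northing")
  assign(i, d)
}
```

Applied to the question, the same pattern would mean ending the lapply function with the modified object after the coordinates and proj4string assignments, so that the list holds spatial objects rather than the assigned column names.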
