Reading circular read mapping (BAM) into R with readGAlignments()

I am trying to create a circular genome map in R using the ggbio package. I am new to ggbio and related packages like GenomicAlignments and GenomicRanges.
I exported my read mapping as a BAM file (with associated index file) and tried to use readGAlignmentsFromBam() to read in the file.
myreads <- readGAlignmentsFromBam("final.assembly", index = "final.assembly", use.names = TRUE)
But I always get
Warning message:
In GenomicRanges:::valid.GenomicRanges.seqinfo(x) :
GAlignments object contains 12006 out-of-bound ranges located on sequence
Consensus. Note that only ranges located on a non-circular sequence whose
length is not NA can be considered out-of-bound (use seqlengths() and
isCircular() to get the lengths and circularity flags of the underlying
sequences).
Which makes sense - it's a circular chromosome, so some reads will be outside of the "linear" reference sequence. The question is, how do I fix it? I've attempted adding isCircular = c(TRUE) as an argument, but that did not help. It would seem there is a flag somewhere (in the BAM file? in the R code?) that should be set which isn't, but I can't figure out where.
Apologies for not having a reproducible example, but this is a huge BAM file and I am not familiar enough with the file type to mock up the data.
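One thing that may be worth trying is to set the circularity flag on the object after it has been read, since the warning itself points at isCircular(). The sketch below is not from the original post: it uses a toy GRanges object (a made-up 100 bp sequence named "Consensus") as a stand-in for the real alignments, and it assumes the same isCircular() replacement works on a GAlignments object such as myreads. The warning raised while reading the BAM file would probably still appear once.

```r
# Hedged sketch with toy data; the real object would come from the BAM file.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  library(GenomicRanges)

  # A range that runs past the end of a 100 bp sequence.
  # Construction warns about an out-of-bound range, so the warning is muted here.
  gr <- suppressWarnings(
    GRanges("Consensus", IRanges(start = 95, width = 10),
            seqlengths = c(Consensus = 100))
  )

  # Mark the sequence as circular; wrapped ranges are then no longer
  # treated as out-of-bound by the validity check.
  isCircular(gr)["Consensus"] <- TRUE
  print(isCircular(gr))
}
```

For the real data the equivalent call would presumably be isCircular(myreads)["Consensus"] <- TRUE, run right after reading.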

Related

Difficulty opening a package data file of unknown type

I am trying to load the state map from the maps package into an R object. I am hoping it is a SpatialPolygonsDataFrame or something I can turn into one after I have inspected it. However I am failing at the first step – getting it into an R object. I do not know the file type.
I first tried to assign the map() output to an R object directly:
st_m <- maps::map(database = "state")
This draws the map, but str(st_m) appears to do nothing, unless it is just redrawing the same map.
Then I tried loading it as a dataset: st_m <- data("stateMapEnv", package="maps") but this just returns a string:
> str(stateMapEnv)
chr "R_MAP_DATA_DIR"
I opened the maps directory win-library/3.4/maps/mapdata/ and found what I think is the map file, “state.L”.
I tried reading it with scan and got an error message I do not understand:
scan(file = "D:/Documents/R/win-library/3.4/maps/mapdata/state.L")
Error in scan(file = "D:/Documents/R/win-library/3.4/maps/mapdata/state.L") :
scan() expected 'a real', got '#'
I then opened the file with Notepad++. It appears to be a binary or compressed file.
So I thought it might be an R data file with an unusual extension. But my attempt to load it returned a “bad magic number” error:
st_m <- load("D:/Documents/R/win-library/3.4/maps/mapdata/state.L")
Error in load("D:/Documents/R/win-library/3.4/maps/mapdata/state.L") :
bad restore file magic number (file may be corrupted) -- no data loaded
Observing that these responses have progressed from the unhelpful through the incomprehensible to the occult, I thought it best to seek assistance from the wizards of Stack Overflow.
This should be able to export the 'state' or any other maps dataset for you:
library(ggplot2)
state_dataset <- map_data("state")
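As an alternative sketch that stays within the maps package, map() can be asked not to draw anything, in which case it returns the underlying data as a list of class "map". The object name st_m follows the question; the guard around the call is only there so the snippet does not fail when maps is not installed.

```r
# Hedged sketch: get the state map as an R object instead of a plot.
if (requireNamespace("maps", quietly = TRUE)) {
  # plot = FALSE suppresses drawing; fill = TRUE returns closed polygons.
  st_m <- maps::map(database = "state", plot = FALSE, fill = TRUE)

  # Inspect the structure: coordinates in x and y (NA-separated polygons),
  # plus the bounding range and the polygon names.
  str(st_m, max.level = 1)
}
```

The resulting list can then be inspected with str() or passed on to whatever conversion function your spatial package provides.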

From POV-Ray to rgl

I've followed this tutorial to export a POV-Ray graphic to a STL file, through Meshlab. I've also tried the export to the OBJ format. Everything seems to work fine from the creation of the POV-Ray graphic to the exporting in Meshlab.
But then I've tried to render the graphic in R with the functions readSTL and readOBJ of the rgl package, and the problem is here.
In fact, the exported STL file is empty:
solid STL generated by MeshLab
endsolid vcg
So, of course, rgl::readSTL renders nothing in R.
The OBJ file is not empty, but it contains no faces (only vertices and vertex normals):
####
#
# OBJ File Generated by Meshlab
#
####
# Object blob.obj
#
# Vertices: 8437
# Faces: 0
#
####
vn -0.900372 -0.267658 -0.343060
v -4.000525 2.600000 -0.833225
......
After running rgl::readOBJ in R, the rendering is just a white scene with nothing in it. Even without faces, one would expect to get at least some points.
Maybe I made a mistake at some step of the procedure. Do you have any idea:
how to export to a non-empty STL file in Meshlab?
how to get the points with readOBJ in R?
how to get the faces in the OBJ file when exporting from Meshlab?
whether there is another way to go from POV-Ray to rgl, if possible preserving the colors?
Update
I've found a way to get the faces in the OBJ file: instead of running Screened Poisson Surface Reconstruction in Meshlab, as the tutorial says, I run Surface Reconstruction: Ball Pivoting.
But then rgl::readOBJ generates this error:
Error in order(vlinks[[i]][, 2]) : argument 1 is not a vector
The same procedure also produces a non-empty STL file. But then rgl::readSTL generates this error:
Error in matrix(NA, 3 * n, 3) : invalid 'nrow' value (too large or NA)
In addition: Warning message:
In matrix(NA, 3 * n, 3) : NAs introduced by coercion to integer range
OK, I've found the solution.
Follow the linked tutorial to create the file blob.asc in POV-Ray.
In Meshlab, open this file and run Surface Reconstruction: Ball Pivoting from the menu Filters -> Remeshing, Simplification and Reconstruction. Perhaps Screened Poisson Surface Reconstruction needs to be run first; I don't know.
Export the file as STL. Check "Binary Encoding" (a default option), because rgl::readSTL reads binary STL files only, not ASCII files.
In R, rgl::readSTL can now read the exported file, and this works.
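As a rough pre-check before calling rgl::readSTL, the encoding of an STL file can be guessed in base R from its first bytes: an ASCII STL starts with the keyword "solid", while a binary STL starts with an 80-byte header. The helper below, is_ascii_stl, is a hypothetical name and not part of rgl, and it is only a heuristic, since some exporters also write "solid" at the start of a binary header.

```r
# Hedged sketch: guess whether an STL file is ASCII by peeking at its first bytes.
# is_ascii_stl() is a hypothetical helper, not an rgl function.
is_ascii_stl <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  head_bytes <- readBin(con, what = "raw", n = 5L)
  # Compare raw bytes directly so embedded nul bytes cannot cause an error.
  identical(head_bytes, charToRaw("solid"))
}

# Toy files: an empty ASCII STL like the one shown in the question,
# and a minimal binary STL (80-byte header plus a zero triangle count).
ascii_file <- tempfile(fileext = ".stl")
writeLines(c("solid STL generated by MeshLab", "endsolid vcg"), ascii_file)

binary_file <- tempfile(fileext = ".stl")
con <- file(binary_file, "wb")
writeBin(raw(80), con)
writeBin(0L, con, size = 4L)
close(con)

print(is_ascii_stl(ascii_file))
print(is_ascii_stl(binary_file))
```

If the check reports an ASCII file, re-exporting from Meshlab with "Binary Encoding" ticked is the route described above.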

problems with multiDiv in paleotree package

I am trying to use the package paleotree to build LTT plots, but I get the following error when I try to input my trees.
a=read.tree(file.choose()) # to choose newick/nexus file
multiDiv(a)
Error in multiDiv(a) : Data of Unknown Type
Does paleotree only take objects of class 'multiPhylo'? I converted the input tree to class multiPhylo, but it still gives the same error. Can anyone suggest how to go about it?
I'm the author of package paleotree. I think what is going on here is that you are passing a single tree to multiDiv, which is set up for analyzing lists of objects, each of which is converted to a diversity curve. You probably want phyloDiv() instead. I can't be certain without knowing more about your data.
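A quick way to see which case applies is to check the class of what read.tree() returned. The sketch below uses ape with a made-up three-tip Newick string in place of the real file; a "phylo" object is a single tree (the phyloDiv() case), while a "multiPhylo" object is a list of trees (the multiDiv() case).

```r
# Hedged sketch with a toy tree; replace the text argument with your own file.
if (requireNamespace("ape", quietly = TRUE)) {
  a <- ape::read.tree(text = "((A:1,B:1):1,C:2);")
  print(class(a))       # a single tree

  # Combining trees with c() yields a list of trees.
  trees <- c(a, a)
  print(class(trees))
}
```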

How to save Variant Call Format (VCF) file to disk in R using VariantAnnotation Package

I've searched the web for this without much luck. More or less you always get to the example from the VariantAnnotation Package. And since this example works fine on my computer I have no idea why the VCF I created does not.
The problem: I want to determine the number and location of SNPs in selected genes. I have a large VCF file (over 5GB) that has info on all SNPs on all chromosomes for several mouse strains. Obviously my computer freezes if I try to do anything on the whole genome scale, so I first determined the genomic locations of the genes of interest on chromosome 1. I then used the VariantAnnotation package to extract only the data relating to those genes from the VCF file:
library(VariantAnnotation)
param <- ScanVcfParam(
  info = c("AC1", "AF1", "DP", "DP4", "INDEL", "MDV", "MQ", "MSD", "PV0", "PV1", "PV2", "PV3", "PV4", "QD"),
  geno = c("DP", "GL", "GQ", "GT", "PL", "SP", "FI"),
  samples = strain,
  fixed = "FILTER",
  which = gnrng
)
The code above is taken out of a function I wrote which takes strain as an argument. gnrng refers to a GRanges object containing genomic locations of my genes of interest.
vcf<-readVcf(file, "mm10",param)
This works fine and I get my vcf (dim: 21783 1), but when I try to save it, it won't work:
file.vcf<-tempfile()
writeVcf(vcf, file.vcf)
Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList
I even tried in parallel, doing the example from the package first and then substituting for my VCF file:
#This is the example:
out1.vcf<-tempfile()
in1<-readVcf(fl,"hg19")
writeVcf(in1,out1.vcf)
This works just fine, but if I only substitute my vcf for in1, I get the same error.
I hope I made myself clear... And any help will be greatly appreciated!! Thanks in advance!
Thanks for reporting this bug. The problem is fixed in version 1.9.47 (devel branch). The fix will be available in the release branch after April 14.
The problem was that you selectively imported 'FILTER' from the 'fixed' field but not 'ALT'. writeVcf() was throwing an error because there was no ALT value to write out. If you don't have access to the version with the fix, a workaround is to import the ALT field as well:
ScanVcfParam(fixed = c("ALT", "FILTER"))
You can see which values were imported with the fixed() accessor:
fixed(vcf)
Please report any bugs or problems on the Bioconductor mailing list Martin referenced. More Bioc users will see the question and you'll get help more quickly.
Valerie
Here's a reproducible example
library(VariantAnnotation)
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
param <- ScanVcfParam(fixed="FILTER")
writeVcf(readVcf(fl, "hg19", param=param), tempfile())
## Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList
The problem seems to be that writeVcf expects the object to have an 'ALT' field, so
param <- ScanVcfParam(fixed="ALT")
writeVcf(readVcf(fl, "hg19", param=param), tempfile())
succeeds.

Kindly check the R command

I am doing the following with the Cooccur library in R.
> fb<-read.table("Fb6_peaks.bed")
> f1<-read.table("F16_peaks.bed")
Everything is OK with the first two commands, and I can also display the data:
> fb
> f1
But when I run the next command, shown below,
> explore_pairs(c("fb", "f1"))
I get an error message:
Error in sum(sapply(tf1_s, score_sample, tf2_hits = tf2_s, hit_list = hit_l)) :
invalid 'type' (list) of argument
Could anyone suggest something?
Despite promising in the article they published over a year ago to release a version to the Bioconductor repository, the authors have still not delivered. The gz file attached to the article is not in a form that my installation recognizes. You really should be corresponding with the authors about this question.
The nature of the error message suggests that the function is expecting a different data class. You should look at the specification of the arguments in the help(explore_pairs) file. If it is expecting 2 matrices, then wrapping data.matrix around the arguments may solve the problem, but if it is expecting a class created by one of that package's functions, then you need to take the necessary steps to construct the right objects.
The help file for explore_pairs does exist (at least in the MAN directory) and says the first argument should be a character vector with further provisos:
\arguments{
\item{factornames}{an vector of character strings, each naming a GFF-like
data frame containing the binding profile of a DNA-binding factor.
There is also a load utility, load_GFF, which I assume is designed for creation of such files.
Try renaming the columns of your data frame:
names(fb)=c("seq","start","end")
Check the example datasets. The column names are as above. I set the names and it worked.
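To illustrate why the renaming matters: read.table() on a BED file without a header assigns the default column names V1, V2, V3, so a function that looks up columns by name will not find them. The sketch below uses two made-up BED lines in place of Fb6_peaks.bed; the target names seq, start and end are the ones given in the answer above.

```r
# Hedged sketch with toy data standing in for the real BED file.
bed_text <- "chr1\t100\t200\nchr1\t300\t450"
fb <- read.table(text = bed_text, sep = "\t", stringsAsFactors = FALSE)

print(names(fb))   # default names assigned by read.table: V1, V2, V3

# Rename the first three columns to the names used in the example datasets.
names(fb)[1:3] <- c("seq", "start", "end")
print(fb)
```

Using names(fb)[1:3] rather than names(fb) leaves any extra BED columns (score, strand, and so on) with their existing names.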
