cummeRbund Create Gene Set Error - r

I am having trouble creating a Gene Set using cummeRbund (R software used to analyze cufflinks, cuffdiff output).
I have been working from the cummeRbund manual that can be found here. The directions have worked up until the point of creating the gene sets.
Before creating the gene sets you need to create a vector of gene_ids to include. In the example they enclose each item in this list in quotation marks. I have created a gene_ids .txt file named OtoSCOPE_v7_list_oneline.txt; the first 4 entries in this list are shown below.
“Adcy1” “Bdp1” “Bsnd” “Cabp2”
Here is the create gene sets portion of the script that I have been using.
###################################
# Creating Gene Sets
###################################
#first created a vector of gene_ids that you want included in your gene set
base_dir <- "/Users/paulranum/Documents/cummeRbund"
otoscope_genes <- read.table(file.path(base_dir, "OtoSCOPE_v7_list_oneline.txt"), stringsAsFactors=FALSE)
data(cuff)
myGeneIds<-otoscope_genes
myGeneIds
myGenes<-getGenes(cuff, myGeneIds)
myGenes
When I run this I get the following output and errors.
> data(cuff)
Warning message:
In data(cuff) : data set 'cuff' not found
> myGeneIds<-otoscope_genes
> myGeneIds
V1 V2 V3 V4
1 “Adcy1” “Bdp1” “Bsnd” “Cabp2”
> myGenes<-getGenes(cuff, myGeneIds)
Error in rsqlite_send_query(conn@ptr, statement) :
cannot start a transaction within a transaction
> myGenes
Error: object 'myGenes' not found
From what I can tell there are two main issues going on.
1. It is not recognizing my data(cuff) command. cuff is the name of my CuffSet data file, and this file has worked for everything else. Is this not the correct data file?
2. The error after the myGenes<-getGenes(cuff, myGeneIds) command:
Error in rsqlite_send_query(conn@ptr, statement) : cannot start a transaction within a transaction
Thanks for reading; any help would be very much appreciated.
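One possible contributor, sketched here under assumptions rather than confirmed from the post: read.table() returns a data frame (here with columns V1 to V4), and the entries still carry the curly quotation marks from the text file, whereas the manual asks for a plain vector of gene_ids. The data frame below is a hypothetical stand-in for the contents of OtoSCOPE_v7_list_oneline.txt.

```r
# Hypothetical stand-in for what read.table() returned in the question:
# one row, four columns, each entry wrapped in curly quotes.
otoscope_genes <- data.frame(
  V1 = "\u201cAdcy1\u201d",
  V2 = "\u201cBdp1\u201d",
  V3 = "\u201cBsnd\u201d",
  V4 = "\u201cCabp2\u201d",
  stringsAsFactors = FALSE
)

# Flatten the data frame into a plain character vector.
raw_ids <- unlist(otoscope_genes, use.names = FALSE)

# Strip curly and straight double quotes so only the gene symbols remain.
myGeneIds <- gsub("\u201c|\u201d|\"", "", raw_ids)

print(myGeneIds)
```

The cleaned vector could then be passed to getGenes(cuff, myGeneIds). Note also that data() only loads datasets bundled with packages, so a CuffSet created from your own cuffdiff output (for example with readCufflinks()) would not be found by data(cuff).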

Related

Reading .h5ad file in R using Convert

I'm trying to read a .h5ad file in my RStudio.
I first converted the .h5ad file to .h5Seurat file using the Convert() function in library(SeuratDisk).
The code for my attempt can be found here:
> library(Seurat)
> library(SeuratDisk)
> Convert("train.h5ad", "train.h5Seurat")
Warning: Unknown file type: h5ad
Warning: 'assay' not set, setting to 'RNA'
Creating h5Seurat file for version 3.1.5.9900
Adding X as data
Adding X as counts
Adding meta.features from var
Adding X_Compartment_tSNE as cell embeddings for Compartment_tSNE
Adding X_tSNE as cell embeddings for tSNE
Adding layer counts as data in assay counts
Adding layer counts as counts in assay counts
> train_seurat <- LoadH5Seurat("train.h5Seurat")
Validating h5Seurat file
Error: Ambiguous assays
The data which I'm trying to read can be found here: https://drive.google.com/drive/folders/1cXYoKNU9qY0f1bbYNh2uykWG6juVJln7
To add, I tried:
> train_seurat <- LoadH5Seurat("train.h5Seurat", assays = "RNA")
But I faced the same issue. Trying to find something quick.
Kindly try the anndata library, but note that the result won’t be a Seurat object as you would want. It’ll be an anndata class object.

Error while using colSums with tab-delimited file

I'm new at R and I'm currently trying to get some statistical data from a file. It is a large set of data in a tab-delimited txt file. Importing the file was no problem, and all of the data is shown correctly as a table in RStudio. However, when I try to make any sort of calculation using colSums,
> colSums("Wages and salaries")
Error in colSums("Wages and salaries") : 'x' must be an array of at
least two dimensions
I receive the error
'x' must be an array of at least two dimensions.
"Wages and salaries" is the name of the column I'm trying to get the sum of.
Using V1 or any other column name that was created by R gives me another error
> colSums(V2)
Error in is.data.frame(x) : object 'V2' not found
The way I'm importing the file is
rm(list=ls())
filename <- read.delim("~/filename.txt", header=FALSE)
> is.data.frame(filename)
[1] TRUE
This gives me a matrix-like data table with rows and columns, the same way Excel would show me the data.
The reason I'm trying to get the sum of all of the numbers in a column is to later get the sum of several different columns.
I'm very new at R and I could not find an answer to my question, as most of the examples use just a very small set of data that was created in R.
In R you can access a column in 2 ways:
filename["Wages and salaries"]
or
filename$`Wages and salaries`
So, please try :
colSums(filename["Wages and salaries"])
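As a minimal sketch of the difference between the two access styles, assuming a small made-up data frame in place of the imported file (the column names and values below are hypothetical):

```r
# Hypothetical stand-in for the imported table; check.names = FALSE keeps
# the space in the column name instead of converting it to dots.
wages <- data.frame(
  "Wages and salaries" = c(100, 250, 75),
  "Bonuses" = c(10, 0, 5),
  check.names = FALSE
)

# Single-bracket indexing keeps a one-column data frame, which colSums() accepts.
total_one <- colSums(wages["Wages and salaries"])

# The $ form returns a plain numeric vector, so sum() is the matching function.
total_alt <- sum(wages$`Wages and salaries`)

# Several columns can be summed at once by passing a vector of names.
totals_several <- colSums(wages[c("Wages and salaries", "Bonuses")])

print(total_one)
print(totals_several)
```

Passing the bare string "Wages and salaries" to colSums() fails because it is just a character value, not the column of the data frame.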

How to call a variable from one script into another script with R?

Let me explain my problem: I have two R scripts. I create a variable in the first script that I would like to use in the second script. The problem is that creating this variable in the first script takes several lines of R, and I don't want to rewrite these lines in the second script.
Any help?
Edit : From a previous topic, I tried these lines in my second R script:
loadRData <- function(fileName) {
#loads an RData file, and returns it
load(fileName)
print(ls())
n <- readline(prompt="combs") #combs is the name of my variable
get(ls()[as.matrix(n)])
}
select_var <- loadRData('RHO_COR.R') #RHO_COR.R is the name of my first script , I execute this command in my second script
But I get an error message:
Error in load(fileName) :
bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘RHO_COR.R’ has magic number 'libra'
Use of save versions prior to 2 is deprecated
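The error arises because load() expects a binary file written by save(), whereas RHO_COR.R is a plain-text script. A minimal sketch of two common alternatives follows, assuming the first script defines a variable named combs; the temporary file and the contents of the script are hypothetical stand-ins.

```r
# Write a hypothetical first script to a temporary file so the example is
# self-contained; in practice this would be the existing RHO_COR.R.
first_script <- tempfile(fileext = ".R")
writeLines("combs <- combn(4, 2)", first_script)

# Option 1: run the script inside its own environment, then pull out the
# variable without cluttering the workspace of the second script.
script_env <- new.env()
source(first_script, local = script_env)
combs_from_source <- get("combs", envir = script_env)

# Option 2: have the first script save the object once with saveRDS(),
# and read it back in the second script with readRDS().
rds_path <- tempfile(fileext = ".rds")
saveRDS(combs_from_source, rds_path)
combs_from_rds <- readRDS(rds_path)

print(dim(combs_from_rds))
```

source() re-runs the first script each time, while saveRDS()/readRDS() avoids recomputation when the variable is expensive to build.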

Error reading hdf file in R

I have two hdf4 files namely file 1:"MYD04_L2.A2011001.2340.006.2014078044212.hdf" and file 2: "MYD04_L2.A2011031.mosaic.006.AOD_550_DT_DB_Combined.hdf". First one is raw data file with 72 sub-datasets and second one is the file I obtained after ordering (i.e. post-processed). For the first R code:
layer_name <- getSds("MYD04_L2.A2011001.2340.006.2014078044212.hdf",method="mrt")
layer_name$SDSnames[66:68]
[1] "AOD_550_Dark_Target_Deep_Blue_Combined"
[2] "AOD_550_Dark_Target_Deep_Blue_Combined_QA_Flag"
[3] "AOD_550_Dark_Target_Deep_Blue_Combined_Algorithm_Flag"
It works ok with method="gdal" as well. However, when I try to read file 2, a window pops up showing gdalinfo.exe has stopped working (method = "gdal"). The same kind of problem arises for mrt and it shows sdslist.exe has stopped working. I get following error message:
Error in sds[[i]] <- substr(sdsRaw[i], 1, 11) == "SDgetinfo: " :
attempt to select less than one element in integerOneIndex
Is the single layer the issue here? The first file has 72 sub-datasets and the second has only one (assumed from the file name, since I couldn't read it), so has R failed to read the data file because of that? Can anyone propose a solution for reading such data files? If the ncdf4 package with hdf4 enabled is the solution, can anyone explain, step by step, how I can enable hdf4 and build ncdf4 on the Windows platform?

How to save Variant Call Format (VCF) file to disk in R using VariantAnnotation Package

I've searched the web for this without much luck. More or less you always get to the example from the VariantAnnotation Package. And since this example works fine on my computer I have no idea why the VCF I created does not.
The problem: I want to determine the number and location of SNPs in selected genes. I have a large VCF file (over 5GB) that has info on all SNPs on all chromosomes for several mice strains. Obviously my computer freezes if I try to do anything on the whole genome scale, so I first determined genomic locations of genes of interest on chromosome 1. I then used the VariantAnnotation Package to get only the data relating to my genes of interest out of the VCF file:
library(VariantAnnotation)
param<-ScanVcfParam(
info=c("AC1","AF1","DP","DP4","INDEL","MDV","MQ","MSD","PV0","PV1","PV2","PV3","PV4","QD"),
geno=c("DP","GL","GQ","GT","PL","SP","FI"),
samples=strain,
fixed="FILTER",
which=gnrng
)
The code above is taken out of a function I wrote which takes strain as an argument. gnrng refers to a GRanges object containing genomic locations of my genes of interest.
vcf<-readVcf(file, "mm10",param)
This works fine and I get my vcf (dim: 21783 1), but when I try to save it, it won't work:
file.vcf<-tempfile()
writeVcf(vcf, file.vcf)
Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList
I even tried in parallel, doing the example from the package first and then substituting for my VCF file:
#This is the example:
out1.vcf<-tempfile()
in1<-readVcf(fl,"hg19")
writeVcf(in1,out1.vcf)
This works just fine, but if I only substitute in1 for my vcf I get the same error.
I hope I made myself clear... And any help will be greatly appreciated!! Thanks in advance!
Thanks for reporting this bug. The problem is fixed in version 1.9.47 (devel branch). The fix will be available in the release branch after April 14.
The problem was that you selectively imported 'FILTER' from the 'fixed' field but not 'ALT'. writeVcf() was throwing an error because there was no ALT value to write out. If you don't have access to the version with the fix, a work around would be to import the ALT field.
ScanVcfParam(fixed = c("ALT", "FILTER"))
You can see what values were imported with the fixed() accessor:
fixed(vcf)
Please report any bugs or problems on the Bioconductor mailing list Martin referenced. More Bioc users will see the question and you'll get help more quickly.
Valerie
Here's a reproducible example
library(VariantAnnotation)
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
param <- ScanVcfParam(fixed="FILTER")
writeVcf(readVcf(fl, "hg19", param=param), tempfile())
## Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList
The problem seems to be that writeVcf expects the object to have an 'ALT' field, so
param <- ScanVcfParam(fixed="ALT")
writeVcf(readVcf(fl, "hg19", param=param), tempfile())
succeeds.
