I'm trying to read a .h5ad file in RStudio.
I first converted the .h5ad file to an .h5Seurat file using the Convert() function from the SeuratDisk package.
Here is my attempt:
> library(Seurat)
> library(SeuratDisk)
> Convert("train.h5ad", "train.h5Seurat")
Warning: Unknown file type: h5ad
Warning: 'assay' not set, setting to 'RNA'
Creating h5Seurat file for version 3.1.5.9900
Adding X as data
Adding X as counts
Adding meta.features from var
Adding X_Compartment_tSNE as cell embeddings for Compartment_tSNE
Adding X_tSNE as cell embeddings for tSNE
Adding layer counts as data in assay counts
Adding layer counts as counts in assay counts
> train_seurat <- LoadH5Seurat("train.h5Seurat")
Validating h5Seurat file
Error: Ambiguous assays
The data which I'm trying to read can be found here: https://drive.google.com/drive/folders/1cXYoKNU9qY0f1bbYNh2uykWG6juVJln7
I also tried:
> train_seurat <- LoadH5Seurat("train.h5Seurat", assays = "RNA")
But I got the same error. I'm looking for a quick fix.
Try the anndata library, but note that the result won't be a Seurat object as you wanted. It will be an AnnData object.
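A minimal sketch of that suggestion, assuming the CRAN anndata package (a reticulate-based wrapper that also needs the Python anndata module) is installed. The helper name read_h5ad_safely is hypothetical and only guards against a missing package or file; read_h5ad() is the package's reader.

```r
# Hypothetical helper: returns the AnnData object, or NULL when the
# 'anndata' package or the file is not available.
read_h5ad_safely <- function(path) {
  if (!requireNamespace("anndata", quietly = TRUE) || !file.exists(path)) {
    return(NULL)
  }
  anndata::read_h5ad(path)
}

ad <- read_h5ad_safely("train.h5ad")
if (!is.null(ad)) {
  print(dim(ad$X))      # expression matrix, cells in rows and genes in columns
  print(head(ad$obs))   # per-cell metadata
  print(head(ad$var))   # per-gene metadata
}
```

The object returned here is an AnnData object, not a Seurat object, so Seurat functions will not accept it directly.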
I'm new to R and I'm trying to get some statistics from a file. It is a large data set in a tab-delimited .txt file. Importing the file was no problem, and all of the data is shown correctly as a table in RStudio. However, when I try to do any sort of calculation using colSums,
> colSums("Wages and salaries")
Error in colSums("Wages and salaries") : 'x' must be an array of at
least two dimensions
I get the error
'x' must be an array of at least two dimensions.
"Wages and salaries" is the name of the column I'm trying to sum.
Using V1 or any other column name that was created by R gives me a different error:
> colSums(V2)
Error in is.data.frame(x) : object 'V2' not found
The way I'm importing the file is
rm(list=ls())
filename <- read.delim("~/filename.txt", header=FALSE)
> is.data.frame(filename)
[1] TRUE
This gives me a matrix-like data table with rows and columns, the same way Excel would show the data.
I want the sum of all the numbers in a column so that I can later get the sum of several different columns.
I'm very new to R and could not find an answer to my question, as most examples use only a very small data set created within R.
In R you can access a column in 2 ways:
filename["Wages and salaries"]
or
filename$`Wages and salaries`
So please try:
colSums(filename["Wages and salaries"])
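A minimal sketch of both access styles on a small made-up file, not the original data. One assumption to flag: with header=FALSE the columns are named V1, V2, ... and the header text becomes the first data row, so this sketch reads the file with header=TRUE and check.names=FALSE to keep the name "Wages and salaries" intact. The "Bonuses" column and all numbers are invented for illustration.

```r
# Write a tiny tab-delimited file (made-up numbers, for illustration only)
path <- tempfile(fileext = ".txt")
writeLines(c("Wages and salaries\tBonuses",
             "100\t10",
             "200\t20",
             "300\t30"), path)

# header = TRUE uses the first row as column names;
# check.names = FALSE keeps the spaces in "Wages and salaries"
wages <- read.delim(path, header = TRUE, check.names = FALSE)

# Single-bracket indexing returns a one-column data frame, which colSums() accepts
one_col_total <- colSums(wages["Wages and salaries"])

# $ returns a plain vector, so use sum() rather than colSums()
vector_total <- sum(wages$`Wages and salaries`)

# Sums of several columns at once
all_totals <- colSums(wages[c("Wages and salaries", "Bonuses")])

print(one_col_total)
print(vector_total)
print(all_totals)
```

Note that colSums(V2) fails for a different reason: V2 is a column inside the data frame, not a standalone object, so it has to be reached through the data frame, for example filename["V2"].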
Here is my problem: I have two R scripts. I create a variable in the first script that I would like to use in the second script. Computing this variable in the first script takes several lines of R, and I don't want to rewrite those lines in the second script.
Any help?
Edit: Following a previous topic, I tried these lines in my second R script:
loadRData <- function(fileName) {
#loads an RData file, and returns it
load(fileName)
print(ls())
n <- readline(prompt="combs") #combs is the name of my variable
get(ls()[as.matrix(n)])
}
select_var <- loadRData('RHO_COR.R') #RHO_COR.R is the name of my first script , I execute this command in my second script
But I get an error and a warning message:
Error in load(fileName) :
bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘RHO_COR.R’ has magic number 'libra'
Use of save versions prior to 2 is deprecated
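The error occurs because load() expects a binary file written by save(), not an R script; the reported magic number 'libra' is presumably just the first characters of a library(...) call at the top of the script. A minimal sketch of two common alternatives follows: run the first script with source(), or store the object with saveRDS() and read it back with readRDS(). The script contents and temporary file names are stand-ins, since the real RHO_COR.R is not shown.

```r
# Stand-in for the first script (RHO_COR.R); its real contents are not shown
first_script <- tempfile(fileext = ".R")
writeLines("combs <- combn(4, 2)  # placeholder computation", first_script)

# Option 1: run the first script from the second one.
# source() evaluates the file; local = env keeps its variables in a separate environment.
env <- new.env()
source(first_script, local = env)
combs_from_source <- get("combs", envir = env)

# Option 2: save the object once, then read it back wherever it is needed.
rds_file <- tempfile(fileext = ".rds")
saveRDS(combs_from_source, rds_file)   # would go at the end of the first script
combs_from_rds <- readRDS(rds_file)    # would go in the second script

print(dim(combs_from_source))
print(identical(combs_from_source, combs_from_rds))
```

Option 1 reruns the whole first script each time, which is simple but can be slow. Option 2 avoids recomputation at the cost of keeping a file in sync with the script.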
I have two HDF4 files: file 1, "MYD04_L2.A2011001.2340.006.2014078044212.hdf", and file 2, "MYD04_L2.A2011031.mosaic.006.AOD_550_DT_DB_Combined.hdf". The first is a raw data file with 72 sub-datasets, and the second is the file I obtained after ordering (i.e. post-processed). For the first file, the R code is:
layer_name <- getSds("MYD04_L2.A2011001.2340.006.2014078044212.hdf",method="mrt")
layer_name$SDSnames[66:68]
[1] "AOD_550_Dark_Target_Deep_Blue_Combined"
[2] "AOD_550_Dark_Target_Deep_Blue_Combined_QA_Flag"
[3] "AOD_550_Dark_Target_Deep_Blue_Combined_Algorithm_Flag"
This also works with method="gdal". However, when I try to read file 2 with method="gdal", a window pops up saying that gdalinfo.exe has stopped working. The same kind of problem arises with method="mrt", where it says that sdslist.exe has stopped working. I get the following error message:
Error in sds[[i]] <- substr(sdsRaw[i], 1, 11) == "SDgetinfo: " :
attempt to select less than one element in integerOneIndex
Is the single layer the issue here? The first file has 72 sub-datasets and the second has only one (an assumption based on the file name, since I couldn't read it). Has R failed to read the data file for that reason? Can anyone propose a solution for reading such files? If the ncdf4 package with HDF4 enabled is the solution, can anyone explain, step by step, how to enable HDF4 and build ncdf4 on Windows?
I've searched the web for this without much luck. More or less you always get to the example from the VariantAnnotation Package. And since this example works fine on my computer I have no idea why the VCF I created does not.
The problem: I want to determine the number and location of SNPs in selected genes. I have a large VCF file (over 5 GB) that has information on all SNPs on all chromosomes for several mouse strains. My computer freezes if I try to do anything on the whole-genome scale, so I first determined the genomic locations of my genes of interest on chromosome 1. I then used the VariantAnnotation package to extract only the data relating to those genes from the VCF file:
library(VariantAnnotation)
param<-ScanVcfParam(
info=c("AC1","AF1","DP","DP4","INDEL","MDV","MQ","MSD","PV0","PV1","PV2","PV3","PV4","QD"),
geno=c("DP","GL","GQ","GT","PL","SP","FI"),
samples=strain,
fixed="FILTER",
which=gnrng
)
The code above is taken out of a function I wrote which takes strain as an argument. gnrng refers to a GRanges object containing genomic locations of my genes of interest.
vcf<-readVcf(file, "mm10",param)
This works fine and I get my vcf (dim: 21783 1), but when I try to save it, it fails:
file.vcf<-tempfile()
writeVcf(vcf, file.vcf)
Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList
I even tried in parallel, doing the example from the package first and then substituting for my VCF file:
#This is the example:
out1.vcf<-tempfile()
in1<-readVcf(fl,"hg19")
writeVcf(in1,out1.vcf)
This works just fine, but if I substitute my vcf for in1, I get the same error.
I hope I made myself clear... And any help will be greatly appreciated!! Thanks in advance!
Thanks for reporting this bug. The problem is fixed in version 1.9.47 (devel branch). The fix will be available in the release branch after April 14.
The problem was that you selectively imported 'FILTER' from the 'fixed' field but not 'ALT'. writeVcf() was throwing an error because there was no ALT value to write out. If you don't have access to the version with the fix, a workaround is to import the ALT field as well.
ScanVcfParam(fixed = c("ALT", "FILTER"))
You can see which values were imported with the fixed() accessor:
fixed(vcf)
Please report any bugs or problems on the Bioconductor mailing list Martin referenced. More Bioc users will see the question and you'll get help more quickly.
Valerie
Here's a reproducible example
library(VariantAnnotation)
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
param <- ScanVcfParam(fixed="FILTER")
writeVcf(readVcf(fl, "hg19", param=param), tempfile())
## Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList
The problem seems to be that writeVcf expects the object to have an 'ALT' field, so
param <- ScanVcfParam(fixed="ALT")
writeVcf(readVcf(fl, "hg19", param=param), tempfile())
succeeds.