I have a BAM file. Does anyone know how to convert a BAM file to a CSV file? I am trying to use R to open the BAM file, but I am not sure how to get the variables out of it. So far I have used the code below:
rm(list=ls())
#install bam packages
source("http://bioconductor.org/biocLite.R")
biocLite("Rsamtools",suppressUpdates=TRUE)
biocLite("RNAseqData.HNRNPC.bam.chr14",suppressUpdates=TRUE)
biocLite("GenomicAlignments",suppressUpdates=TRUE)
#load library
library(Rsamtools)
library(RNAseqData.HNRNPC.bam.chr14)
library(GenomicAlignments)
bamfile <- file.path("C:","Users","azzop","Desktop","I16-1144-01-esd_m1_CGCTCATT-AGGCGAAG_tophat2","accepted_hits.bam")
gal<-readGAlignments(bamfile)
gal
length(gal)
names(gal)
When I ran names(gal) it returned NULL, and I am not sure whether that is correct.
I would like to convert the BAM file to CSV so that the data are easier to read.
I would suggest converting the BAM file to BED and then reading the BED file into R.
You can convert BAM to BED using bedtools.
Something along these lines should work:
bamfile <- "C:/Users/azzop/Desktop/I16-1144-01-esd_m1_CGCTCATT-AGGCGAAG_tophat2/accepted_hits.bam"
# Run bedtools to convert BAM to BED (might take some time); shQuote protects paths containing spaces
system(paste("bedtools bamtobed -i", shQuote(bamfile), "> myBed.bed"))
library(data.table)
myData <- fread("myBed.bed")
Here I'm using the fread function from the data.table package to read the data quickly.
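If you would rather stay inside R, here is a minimal sketch that goes from BAM to CSV through a data frame. It assumes the Bioconductor packages Rsamtools, GenomicAlignments and BiocGenerics are installed; bam_to_csv and its argument names are made up for illustration. Note that names(gal) returning NULL is expected, because readGAlignments does not load read names by default (use.names = FALSE); the sketch requests them as an extra "qname" column instead.

```r
# Sketch: BAM -> data frame -> CSV without leaving R.
# Assumes Rsamtools, GenomicAlignments and BiocGenerics are installed.
# `bam_to_csv`, `bamfile` and `csvfile` are hypothetical names for illustration.
bam_to_csv <- function(bamfile, csvfile) {
  # Request the read name and mapping quality on top of the default fields
  param <- Rsamtools::ScanBamParam(what = c("qname", "mapq"))
  gal <- GenomicAlignments::readGAlignments(bamfile, param = param)
  # One row per alignment; the requested fields become extra columns
  df <- BiocGenerics::as.data.frame(gal)
  write.csv(df, csvfile, row.names = FALSE)
  invisible(df)
}
```

You would call it as bam_to_csv(bamfile, "accepted_hits.csv"). For a large BAM file the CSV will be big, so it may be worth filtering the data frame before writing it.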
How can I read in a .feather file from the web (e.g. GitHub) in R? I can read formats such as .csv or .dta directly from GitHub as raw files:
# CSV
coursedata <- read.csv(file = 'https://raw.githubusercontent.com/MarcoKuehne/seminars_in_applied_economics/main/Data/GF_2020.csv')
# DTA
library(haven)
soep <- read_dta("https://github.com/MarcoKuehne/seminars_in_applied_economics/blob/main/Data/soep_lebensz_en.dta?raw=true")
But the same approach fails for arrow and read_feather.
library(arrow)
digital <- read_feather("https://github.com/MarcoKuehne/seminars_in_applied_economics/blob/main/Data/Digital_Literacy_EN.feather?raw=true")
Is there a direct way or a nested command? Or am I required to download the file manually or programmatically as a temporary file?
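One programmatic route is to download the file to a temporary location first and read it from there. The sketch below is only an illustration of that idea: read_remote is a hypothetical helper name, and reader stands for any function that accepts a local file path.

```r
# Sketch: download a remote binary file to a temporary file, then read it.
# `read_remote` is a hypothetical helper; `reader` is any function that takes
# a local file path as its first argument (for example arrow::read_feather).
read_remote <- function(url, reader, ...) {
  tmp <- tempfile()
  # mode = "wb" keeps binary formats intact, which matters on Windows
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  # The temporary file lives in tempdir(), which R removes when the session ends
  reader(tmp, ...)
}
```

With arrow installed, the call would look like read_remote(url, arrow::read_feather), where url is the "?raw=true" link from the question.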
I have installed the latest version of R and am using RStudio. I am trying to convert a PGM image into a CSV file using a readImage function.
However, whenever I run
img <- readImage(file)
where file is the file path,
I get
Error in readImage(file) : could not find function "readImage"
Is there some other package I need to install, or am I using the function incorrectly?
You can use the magick package to read pgm files.
First, you need to do:
install.packages("magick")
Now you call
library(magick)
In my case, I have a pgm file in my R home directory, so I make the file path with:
file <- path.expand("~/cat.pgm")
Now I can read the image and convert it into a matrix of RGB strings by doing:
img <- image_read(file)
ras <- as.raster(img)
mat <- as.matrix(ras)
To write this to csv format, I can do:
write.csv(mat, "cat.csv", row.names = FALSE)
So now I have the image saved as a csv file. To read this back in, and prove it works, I can do:
cat_csv <- read.csv("cat.csv")
cat_ras <- as.raster(as.matrix(cat_csv))
plot(cat_ras)
Note though that the csv file is very large - 9MB, which is one of the reasons why it is rarely a good idea to store an image as csv.
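If the image is greyscale (as PGM files usually are), the csv can be made more compact by storing one number per pixel instead of a colour string. This is a rough sketch using only base R; hex_to_grey is a made-up helper name, and it assumes the red, green and blue channels are identical, so the red channel alone is enough.

```r
# Sketch: turn a matrix of hex colour strings (like as.matrix(as.raster(img)))
# into an integer matrix of grey levels 0-255.
# Assumes a greyscale image, so red == green == blue for every pixel.
# `hex_to_grey` is a hypothetical helper name.
hex_to_grey <- function(mat) {
  # col2rgb works on the flattened (column-major) vector of colours
  red <- col2rgb(as.vector(mat))["red", ]
  matrix(red, nrow = nrow(mat), ncol = ncol(mat))
}

# Tiny stand-in for a real image matrix
demo <- matrix(c("#000000", "#ffffff", "#808080", "#0a0a0aff"), nrow = 2)
grey <- hex_to_grey(demo)
write.csv(grey, file.path(tempdir(), "demo_grey.csv"), row.names = FALSE)
```

For a real image you would pass mat from above instead of demo.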
Created on 2022-02-05 by the reprex package (v2.0.1)
I have problems reading a file of fund performance into R. Are there any example files, so I know how to name the rows and columns? The data I have are: fund name (207 funds), year, month and performance. I have saved the file as csv, but R doesn't seem to understand the format. Thanks in advance! /Johanna
Use the following syntax:
setwd("D:/Your Directory")
# Load CSV data
fund <- read.csv(
file = "YourFile.csv",
quote = "\"")
#Peek data
head(fund)
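As for an example layout, here is a small sketch with made-up data: one row per fund and month, with a header row naming the columns. It also guesses at a common cause of "R doesn't understand the format": in many European locales, spreadsheet software saves csv files with ';' as separator and ',' as decimal mark, which read.csv2 handles. The column names and values are purely illustrative.

```r
# Sketch with made-up data: a "long" layout, one row per fund and month.
# Column names and values are illustrative, not taken from the original file.
csv_path <- file.path(tempdir(), "funds_example.csv")
writeLines(c(
  "fund;year;month;performance",
  "Fund A;2015;1;1,25",
  "Fund A;2015;2;-0,40",
  "Fund B;2015;1;0,80"
), csv_path)

# read.csv2 expects ';' as separator and ',' as decimal mark;
# use read.csv instead if the file uses ',' and '.'
fund <- read.csv2(csv_path, stringsAsFactors = FALSE)
str(fund)
```

If performance comes out as character rather than numeric, the separator or decimal mark is probably not what the reading function expected.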
How can one make a RasterBrick in R from several hdf5 files? Data are often provided in hdf5 format and have to be converted to a friendlier format for easy handling.
At the moment I know of the rhdf5 package, but I am unsure how to get from there to a RasterBrick.
source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
library("rhdf5")
library("raster")
You can access several hdf5 files on this link http://mirador.gsfc.nasa.gov/cgi-bin/mirador/cart.pl?C1=GPM_3IMERGHH&CGISESSID=fb3b45e091f081aba8823f3e3f85a7d9&LBT_THRESHOLD=4000000.
You can use two files for illustration.
Thanks!
AT.
One option is to use gdalUtils to convert the hdf5 files into GTiff. Once you have done that, you can read them into a stack. Here is some example code:
library(gdalUtils)
library(raster)
# List all the hdf5 files in the working directory
files <- list.files(path = ".", pattern = "\\.h5$", full.names = TRUE)
# Choose the band that you want using the sds[] option and write one GTiff per file
for (f in files) {
  sds <- get_subdatasets(f)
  gdal_translate(sds[1], dst_dataset = paste0(f, ".tif"))
}
# Read the GTiff files into a stack and convert it to a RasterBrick
b <- brick(stack(paste0(files, ".tif")))
I read a lot of files into R from zipped sources. I try to use the R function unz to read from zipped files because, unlike unzip, it does not leave any unzipped files on my hard disk.
However, this does not seem to work for zipped *.dta (Stata) files:
library(foreign)
temp <- tempfile()
download.file("http://databank.worldbank.org/data/download/WDI_csv.zip", temp)
wdi_unz <- read.csv(unz(temp, "WDI_Data.csv"))
unlink(temp)
temp <- tempfile()
download.file("http://www.rug.nl/research/ggdc/data/pwt/v80/pwt80.zip",temp)
pwt_unzip <- read.dta(unzip(temp, "pwt80.dta"))
pwt_unz <- read.dta(unz(temp, "pwt80.dta"))
unlink(temp)
Sorry for using the rather large World Development Indicators database (it's 40+ MB), but I did not find a better working example.
The code produces an error when reading pwt_unz, [edit: but not when reading pwt_unzip]. What is the problem there? Probably it has something to do with the return value of unz not being compatible with the input for read.dta?
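I think you need to check what read.dta accepts: its file argument is a file name or URL given as a character string, not a connection such as the one unz() returns.
Have a look here: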
http://stat.ethz.ch/R-manual/R-devel/library/foreign/html/read.dta.html
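If the goal is mainly to avoid leaving unzipped files behind, one workaround is to extract into a temporary directory and delete it straight after reading. This is just a sketch of that idea; read_zip_member is a hypothetical helper name, and reader is whatever function needs a real file path.

```r
# Sketch: extract one member of a zip archive into a temporary directory,
# read it with a function that needs a real file path, then delete the copy.
# `read_zip_member` is a hypothetical helper name.
read_zip_member <- function(zipfile, member, reader, ...) {
  exdir <- tempfile("unzipped_")
  # Remove the extracted copy when the function exits, even on error
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  # unzip() creates exdir if necessary and returns the extracted file path
  path <- unzip(zipfile, files = member, exdir = exdir)
  reader(path, ...)
}
```

For the example in the question this would be read_zip_member(temp, "pwt80.dta", foreign::read.dta).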