Use of R AllelicImbalance Package

I am a bioinformatics student and a fairly amateur R user. I have searched almost the whole internet without finding any examples of how this package works, apart from the documentation on the Bioconductor website. I am currently working with the 'AllelicImbalance' package from Bioconductor, which handles genomics data such as large BAM files to find specific allelic positions and then analyze them. I followed their tutorial by creating a BAM file, and then, to work with that file, I set my working directory to the folder where my files are saved.
The code I am running is:
> library(AllelicImbalance)
> searchArea<-GRanges(seqnames = c("17"), ranges = IRanges(79478301, 79478361))
> pathToFiles<-system.file("ERR009135.bam", package = "AllelicImbalance")
> reads<-impBamGAL(pathToFiles, searchArea, verbose=FALSE)
The last command fails with this error:
Error in .local(UserDir, ...) : No bam files found in
In addition: Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
path[1]="": The filename, directory name, or volume label syntax is incorrect
The samples I used are available here; they are paired-end FASTQ files: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR009/ERR009135
Downloading these files requires a stable and fast internet connection. I then converted them to BAM format on Linux, producing the file 'ERR009135.bam'.
I would like to know whether this is a general R syntax error or a more serious problem that requires me to modify my files, such as the BAM file (ERR009135.bam). Any suggestions or modifications would be deeply appreciated.
Thank you in advance!
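The empty `path[1]=""` in the warning suggests that `system.file()` returned an empty string. `system.file()` only looks inside an installed package's own folder, and it returns `""` when the requested file does not ship with that package, which is the case for a BAM file you created yourself. The `.local(UserDir, ...)` in the error also suggests that `impBamGAL()` expects a directory of BAM files rather than a single file path. A minimal sketch under those assumptions follows; `find_bam_dir` and the folder name `bam_dir` are hypothetical, and region queries on BAM files generally also need a matching `.bai` index (for example one made with `Rsamtools::indexBam()`).

```r
# system.file() only searches inside an installed package's folder and
# returns "" when the requested file is not shipped with that package.
# 'stats' is used here because it is always installed.
missing <- system.file("ERR009135.bam", package = "stats")
nchar(missing) == 0  # TRUE

# Hypothetical helper: check a folder of your own BAM files before handing
# it to impBamGAL(), so a wrong or empty path fails with a clear message.
find_bam_dir <- function(dir) {
  dir <- normalizePath(dir, mustWork = FALSE)
  bams <- list.files(dir, pattern = "\\.bam$", full.names = TRUE)
  if (length(bams) == 0) {
    stop("No .bam files found in '", dir, "'")
  }
  dir
}

# Usage, assuming ERR009135.bam (plus its .bai index) sits in a folder
# called 'bam_dir' inside the working directory:
# library(AllelicImbalance)
# searchArea <- GRanges(seqnames = "17", ranges = IRanges(79478301, 79478361))
# reads <- impBamGAL(find_bam_dir("bam_dir"), searchArea, verbose = FALSE)
```

To see what example data your installed version of the package actually bundles, `list.files(system.file("extdata", package = "AllelicImbalance"))` is a quick check.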

Related

How to load a sample.RData file in R?

I am trying to load the "sample.RData" file in R. The error it gives is:
Error in load("sample.RData") :
  bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘sample.RData’ has magic number 'RDX3'
  Use of save versions prior to 2 is deprecated
I have tried many suggested solutions, such as using save() instead of load(), but none worked. Please recommend a solution.
Are you trying to load like this:
load("your_directory/object_name.RData")
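The 'RDX3' magic number in the warning is the header that `save()` writes in R 3.5.0 and later, which default to serialization format version 3; R releases older than 3.5.0 cannot read that format and report a bad magic number. So two likely fixes are upgrading R on the machine that runs `load()`, or re-saving the objects on the machine that created the file with `version = 2`. A minimal sketch of the second route; the objects `x` and `label` are stand-ins for whatever sample.RData should contain.

```r
# Stand-in objects for the contents of sample.RData
x <- 1:5
label <- "demo"

f <- tempfile(fileext = ".RData")
# version = 2 writes the older 'RDX2' format that R < 3.5.0 can read;
# compress = FALSE is used only so the header can be inspected below
save(x, label, file = f, version = 2, compress = FALSE)

# An uncompressed binary save file starts with its magic number
readChar(f, 4, useBytes = TRUE)  # "RDX2"

# Load into a separate environment to see exactly which objects came back
e <- new.env()
loaded <- load(f, envir = e)
loaded  # "x" "label"
```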

Text mining with tm in R antiword error

So I'm rather new to R, and I'm learning how to mine text from this handy website: https://eight2late.wordpress.com/2015/05/27/a-gentle-introduction-to-text-mining-using-r/
I have my own set of .doc, .docx, and .xlsx files and I'm trying to mine them. They're located in a folder in my working directory called 'files', but I have already run into an error after writing only a few lines of code.
The code I have so far is:
library(tm)
library(readtext)
data = readtext('files')
At this point, after waiting for 25 seconds or so, I get the error:
Error: System call to 'antiword' failed (1): The Big Block Depot is damaged
and the code stops running there.
I have tried searching online for solutions, but it seems to be a fairly rare error, and I only found one possible solution at https://github.com/ropensci/antiword/issues/1, which did not work for me.
That solution suggested that one of my files was corrupt and recommended using the code
fixInNamespace(antiword, pos="package:antiword")
to change the error to a warning to not interrupt the reading of the files. I tried that, and at first it raised the error of
Error in as.environment(pos):
no item called "package:antiword" on the search list
After that, I loaded the antiword library with library(antiword) and changed the stop( to a warning(. However, when I ran the data = readtext('files') line again, it immediately raised the error
Error in is_windows() : could not find function "is_windows"
I'm at a loss here! Any help would be appreciated. Should I be using another package in this case?
I had the same problem with my code when I tried to read a .doc file into R, also using the readtext library. What helped me was converting the Word documents from .doc to .docx. When I ran the same code afterwards, it worked.
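Another way to keep one damaged file from stopping the whole import, without editing the antiword namespace, is to read the files one at a time and catch errors per file. A minimal sketch; `read_each_safely` is a hypothetical helper, and the commented usage assumes `readtext()` returns data frames with matching columns for every file type you read, which may not hold for .xlsx files.

```r
# Hypothetical helper: apply 'reader' to each path separately, turning a
# failure on one file into a warning so the remaining files are still read
read_each_safely <- function(paths, reader) {
  results <- lapply(paths, function(p) {
    tryCatch(reader(p), error = function(e) {
      warning("Skipping '", p, "': ", conditionMessage(e), call. = FALSE)
      NULL
    })
  })
  names(results) <- basename(paths)
  # Drop the files that failed
  Filter(Negate(is.null), results)
}

# Usage (assumption: every call returns a data frame with the same columns)
# library(readtext)
# paths <- list.files("files", pattern = "\\.(docx?|xlsx)$", full.names = TRUE)
# parts <- read_each_safely(paths, readtext)
# data <- do.call(rbind, parts)
```

The warnings then name each file that could not be read, which also points to the file worth converting from .doc to .docx.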

"Warning message: package ‘ XLConnect ’ is not available (for R version 3.3.3)"

I receive this message when I try to install "XLConnect" in R. I am trying to use some data from excel, and I don't know how else to load it into the program.
I'm completely new to R and programming, so any help is greatly appreciated!
This usually happens when there is a typo in the package name. The code below should work. My guess is that the capitalization might have been off when you tried it?
install.packages("XLConnect")
I'm reasonably sure that you had spaces on either side of the term "XLConnect". At least that is what your error message is telling us. If what Mallick suggested does not work, then edit your question to include exact copies of any error messages.
Other ways to transfer data from Excel to R include copying to the Clipboard or exporting as a '.csv' file. There must be hundreds of questions and answers about the Excel - to - R Highway Eternal Resurfacing Project. One more recent addition is the readxl package (function is read_excel) and that choice doesn't have the Java version dependencies that cause trouble for some XLConnect useRs.
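The CSV route mentioned above needs nothing beyond base R. A minimal sketch: a tiny CSV is written to a temporary file here to stand in for a sheet exported from Excel; with a real export you would pass its path instead. The readxl lines are commented out because they require installing that package, and the file name mydata.xlsx is hypothetical.

```r
# Stand-in for a sheet saved from Excel as .csv
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,3.5", "2,4.0"), csv_path)

# Base R: first row is used as column names
df <- read.csv(csv_path, header = TRUE)
str(df)

# Reading the .xlsx directly (needs install.packages("readxl")):
# library(readxl)
# df2 <- read_excel("mydata.xlsx", sheet = 1)
```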

Error in R Tesseract

I have the R Tesseract package working with the default eng.traineddata under OSX, but it simply won't find other languages.
trial <- ocr("test.png", engine = tesseract(language = "jpn", datapath="/Users/histmr/Library/R/3.3/library/tesseract/tessdata"))
Generates the error:
Failed loading language 'jpn'
Tesseract couldn't load any languages!
Error in tesseract_engine_internal(datapath, language) :
Unable to find training data for: jpn
I've checked with
tesseract_info()
$datapath
[1] "/Users/histmr/Library/R/3.3/library/tesseract/tessdata/"
$available
[1] "eng" "jpn"
$version
[1] "3.05.00"
Sometimes I get references to a "TESSDATA_PREFIX environment variable" but I don't know where that is. How can I get the correct directory path (I can see the file in the directory) or edit the "TESSDATA_PREFIX environment variable"?
The problem seems to occur with Japanese but NOT French
tesseract_download("fra")
french <- tesseract("fra")
Works fine! But
tesseract_download("jpn")
japanese <- tesseract("jpn")
Generates an error
The error message from tesseract_engine_internal(datapath, language) says that the language file, in your case jpn.traineddata, is not available in TESSDATA_PREFIX, the default path for storing all the trained language data. If you haven't set that path, you can open a terminal and type the command below.
export TESSDATA_PREFIX=/Users/histmr/Library/R/3.3/library/tesseract/tessdata/
Hope this helps.
One possible problem is multiple installs of Tesseract (I used Homebrew and MacPorts) creating multiple TESSDATA folders. Strangely, R was happier with a seemingly identical folder in a different location closer to the root, which is ordinarily hidden under OS X. I got things working with
export TESSDATA_PREFIX=/opt/local/share
I hope this helps
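The two answers above point TESSDATA_PREFIX at different folders, so a small check from inside R can show which candidate folder actually contains jpn.traineddata. Note that an `export` typed in Terminal only affects programs started from that shell session, so an R GUI launched from the Dock may not see it; `Sys.setenv()` sets the variable for the running R session instead. With Tesseract 3.x (`tesseract_info()` reports 3.05.00 here), TESSDATA_PREFIX is commonly described as the parent of the tessdata folder rather than the folder itself, which may explain why /opt/local/share worked; treat that as an assumption to verify. The helper `find_tessdata` is hypothetical, as is the second candidate path, which simply appends tessdata to the prefix from the last answer.

```r
# Hypothetical helper: keep only the candidate folders that actually
# contain <lang>.traineddata
find_tessdata <- function(candidates, lang) {
  hits <- file.exists(file.path(candidates, paste0(lang, ".traineddata")))
  candidates[hits]
}

# Folders from this thread; the second assumes a tessdata folder directly
# under the /opt/local/share prefix used in the last answer
candidates <- c(
  "/Users/histmr/Library/R/3.3/library/tesseract/tessdata",
  "/opt/local/share/tessdata"
)
find_tessdata(candidates, "jpn")

# Set the variable for this R session only, before creating the engine.
# Whether it should name the tessdata folder or its parent may depend on
# the Tesseract version (see above).
Sys.setenv(TESSDATA_PREFIX = "/opt/local/share")
Sys.getenv("TESSDATA_PREFIX")

# library(tesseract)
# japanese <- tesseract("jpn")
```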

Error in file(file, "rt") : cannot open the connection now occurring after running OK

I have tried to rerun some programs which worked perfectly well before, and most of them use the same call as below but now none of them work.
It could be a latent problem in the code that has come to life, or something in my overall environment may have changed.
I have tried restoring the complete directory from an archive made 2 months ago, when I was last using the programmes, and they still do not work.
After reading the previous answers I have tried
setwd("Documents/Paper1/ThirdDraft/DTW_DATA")
to make data and program folders peers but R will not let me change it.
I am an occasional user of R and not conversant with the environment so some hints / advice on a possible approach would be very helpful.
Thank you
The data files are in Documents/Paper1/ThirdDraft/DTW_DATA/Binned_Base_Data
and the working directory is /Users/briank/Documents/Paper1/ThirdDraft/DTW_DATA/DTW_R_Programmes
#
# Import Data
#
chan11Data <- read.csv("Documents/Paper1/ThirdDraft/DTW_DATA/Binned_Base_Data/Channel_11.csv",
                       header = TRUE, fill = TRUE)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'Documents/Paper1/ThirdDraft/DTW_DATA/Binned_Base_Data/Channel_11.csv': No such file or directory
If your working directory is
/Users/briank/Documents/Paper1/ThirdDraft/DTW_DATA/DTW_R_Programmes
Then R won't find this file
"Documents/Paper1/ThirdDraft/DTW_DATA/Binned_Base_Data/Channel_11.csv"
but it should be able to find
"/Users/briank/Documents/Paper1/ThirdDraft/DTW_DATA/Binned_Base_Data/Channel_11.csv"
or
"../Binned_Base_Data/Channel_11.csv"
You are mixing up absolute and relative file paths and, as a consequence, following coding practices that lead to exactly the kind of trouble you are facing now.
To be clear,
/Users/briank/Documents/Paper1/ThirdDraft/DTW_DATA/
is an absolute file path that starts at the root of your disk, whereas
Documents/Paper1/ThirdDraft/DTW_DATA
is a relative file path that will start in your working directory.
Your current predicament
Since everything that you are doing happens in the following directory, that directory should be your working directory:
/Users/briank/Documents/Paper1/ThirdDraft/DTW_DATA/
If you are using RStudio, which I recommend, then I suggest creating a Project in that directory. Opening the Project (e.g. DTW_DATA.Rproj) will automatically set the working directory to the path above (i.e. what you tried to do with setwd).
Now, within that directory, you seem to have two directories:
DTW_R_Programmes – which is where I suppose your R scripts are located
Binned_Base_Data – which is where your data, e.g. Channel_11.csv, are located
A possible fix
If you opened your R script by double-clicking it, it is likely that your working directory was set to DTW_R_Programmes -- in which case you need to "go back" one level to find your data, as in
../Binned_Base_Data/Channel_11.csv
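To see the difference between these paths concretely, here is a self-contained sketch that rebuilds the folder layout from the question under `tempdir()` and checks each path from the relevant working directory. The helper `exists_from` is hypothetical; it restores the previous working directory afterwards, so it does not change your session.

```r
# Throwaway copy of the layout from the question
root <- file.path(tempdir(), "DTW_DATA")
dir.create(file.path(root, "Binned_Base_Data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "DTW_R_Programmes"), showWarnings = FALSE)
writeLines(c("t,x", "1,0.5"), file.path(root, "Binned_Base_Data", "Channel_11.csv"))

# A relative path is resolved against getwd(); check one from a given
# working directory, then put the old working directory back
exists_from <- function(wd, path) {
  old <- setwd(wd)
  on.exit(setwd(old))
  file.exists(path)
}

scripts <- file.path(root, "DTW_R_Programmes")
exists_from(scripts, "Documents/Paper1/ThirdDraft/DTW_DATA/Binned_Base_Data/Channel_11.csv")  # FALSE
exists_from(scripts, "../Binned_Base_Data/Channel_11.csv")  # TRUE
exists_from(root, "Binned_Base_Data/Channel_11.csv")        # TRUE

# An absolute path works regardless of the working directory
file.exists(file.path(root, "Binned_Base_Data", "Channel_11.csv"))  # TRUE
```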
Another possible fix
Instead of using ../ as above, the solution I would like to recommend is to move all your R scripts to the root directory of your project, i.e. DTW_DATA. That should limit confusion in the future, and make your project more manageable.
What you want to have is
DTW_DATA/
    1_a_script.r
    2_another_script.r
    etc.
    Binned_Base_Data/
        Channel_11.csv
        Channel_12.csv
        etc.
Then, in your script(s), just indicate
d <- read.csv("Binned_Base_Data/Channel_11.csv")
… and enjoy the sheer simplicity of it.
If the first line of your #rstats script is "setwd(..." I will come into your lab and SET YOUR COMPUTER ON FIRE. -- Twitter
