chromoMap error - subscript out of bounds - r

I am new to this community and hope my post is correct.
I tried to run a test case on chromoMap using the chromosome file (chrom) and annotation file (anno) below
> read.table(chrom)
V1 V2 V3
1 7 1 1000
> read.table(anno)
V1 V2 V3 V4
1 An1 7 10 30
2 An2 7 15 40
Unfortunately, I constantly ran into an error shown below. I tried to have a look into the code, but was not able to figure out the problem.
> chromoMap(chrom,anno)
********************************** __ __ ************
** __**|__ * __* __ * __ __ * __ *| | |* __ * __ **
**|__**| |*| *|__|*| | |*|__|*| | |*|_ |*|__|**
***********************************************| **
*****************************************************
OUTPUT:
Number of Chromosome sets: 1
Number of Chromosomes in set 1 : 1
Processing data..
Number of annotations in data set 1 : 2
Error in temp.list[[inputData[[h]]$ch_name[i]]] : subscript out of bounds
Also the columns seemed to be recognized correctly.
> ncol(read.table(anno))
[1] 4
> ncol(read.table(chrom))
[1] 3
I know its a trivial problem but I am happy for any suggestions.
Thanks!

When using R objects as input, you need to pass it within a list like:
chromoMap(list(chrom),list(anno))
It allows passing the objects for multiple ploidy within a list. For instance, for ploidy = 2 ,you can use:
chromoMap(list(chrom1,chrom2),list(anno1,anno2))
Thanks!

Try to change your chromosome name from 7 to VII.

Related

Selecting columns in R depending on user input

I have a dataframe col_metadata in R that goes as:
sample | b | c | ...
____________________
S1 | 1 | 1 | ...
S2 | 1 | 2 | ...
S3 | 2 | 2 | ...
S4 | 3 | 3 | ...
I want to make a function that gives me samples that have given values in front of them. For eg.,
fun(b,c(1,2))
should return
S1 S2 S3
while
fun(c,c(2,3))
should return
S2 S3 S4
and so on. If the column would have been fixed (say, b), I could simply do:
col_metaData[col_metaData$b %in% inputList,]$sample
But since there can be many more columns(hence I can't use if-else), I was looking for a different method to do the same. Can someone please help me do this? Thanks...
I solved it. Just in case anyone comes looking for an answer, we can use this:
col_metaData[col_metaData[,b] %in% inputList,]$sample
Notice [,b] instead of $b.

Syntax error when using count in loop

I am trying to run a loop where I count the total in each file under the variable _merge, and then count certain outcomes of _merge, such as _merge=1 and so on. I then want to calculate percentages by dividing each instance of _merge by the total under _merge.
Below is my code:
/*define local list*/
local ward_names B C D E FN FS GS HE
/*loop for each dbase*/
foreach file of local ward_names {
use "../../../cleaning/sra/output/`file'_ward_CTS_Merged.dta", clear
count if _merge
local ward_count=r(N)
count if _merge==1
local count_master=r(N)
count if _merge==2
local count_using=r(N)
count if _merge==3
local count_match=r(N)
clear
set obs 1
g ward_count='ward_count'
g count_master=`count_master'
g count_using=`count_using'
g count_match=`count_match'
g ward= "`file'"
save "../temp/`file'_collapsed_diagnostics.dta", replace
clear
The code was running fine until I tried to add the total count for each ward file:
g ward_count='ward_count'
'ward_count' invalid name
Is this a syntax error or something more severe?
You need to use ` instead of ' when you refer to a local macro:
generate ward_count = `ward_count'
EDIT:
As per #NickCox's recommendation you can improve your code by using the tabulate command with its matcell() option to get the counts all at once:
tabulate _merge, matcell(A)
_merge | Freq. Percent Cum.
------------------------+-----------------------------------
master only (1) | 1 16.67 16.67
matched (3) | 5 83.33 100.00
------------------------+-----------------------------------
Total | 6 100.00
matrix list A
A[2,1]
c1
r1 1
r2 5
So you could then do the following:
generate count_master = A[1,1]
generate count_match = A[2,1]

How to print git history in rmarkdown?

I am writing an analysis report with rmarkdown and would like to have a "document versions" section in which I would indicate the different versions of the document and the changes made.
Instead of writing it down manually, I was thinking about using git history and inserting it automatically in the markdown document (formatting it in a table).
How can I do that? Is it possible?
Install git2r, https://github.com/ropensci/git2r then you can do stuff like:
> r = repository(".")
> cs = commits(r)
> cs[[1]]
[02cf9a0] 2017-02-02: uses Rcpp attributes instead of inline
So now I have a list of all the commits on this repo. You can get the info out of each commit and format as per your desire into your document.
> summary(cs[[1]])
Commit: 02cf9a0ff92d3f925b68853374640596530c90b5
Author: barryrowlingson <b.rowlingson#gmail.com>
When: 2017-02-02 23:03:17
uses Rcpp attributes instead of inline
11 files changed, 308 insertions, 151 deletions
DESCRIPTION | - 0 + 2 in 2 hunks
NAMESPACE | - 0 + 2 in 1 hunk
R/RcppExports.R | - 0 + 23 in 1 hunk
R/auxfunctions.R | - 1 + 1 in 1 hunk
R/skewt.r | - 0 + 3 in 1 hunk
R/update_params.R | - 1 + 1 in 1 hunk
R/update_params_cpp.R | -149 + 4 in 2 hunks
src/.gitignore | - 0 + 3 in 1 hunk
src/RcppExports.cpp | - 0 + 76 in 1 hunk
src/hello_world.cpp | - 0 + 13 in 1 hunk
src/update_params.cpp | - 0 +180 in 1 hunk
So if you just want the time and the commit message then you can grab it out of the object.
> cs[[3]]#message
[1] "fix imports etc\n"
> cs[[3]]#committer#when
2017-01-20 23:26:20
I don't know if there's proper accessor functions for these rather than using #-notation to get slots. Need to read the docs a bit more...
You can make a data frame from your commits this way:
> do.call(rbind,lapply(cs,function(cs){as(cs,"data.frame")}))
which converts the dates to POSIXct objects, which is nice. Creating a markdown table from the data frame should be trivial!
You can manually convert git log to markdown with pretty=format [1]
Something like
git log --reverse --pretty=format:'| %H | %s |'
This will output something like this:
| a8d5defb511f1e44ddea21b42aec9b03ee768253 | initial commit |
| fdd9865e9cf01bd53c4f1dc106ee603b0a730f48 | fix tests |
| 10b58e8dd9cf0b9bebbb520408f0b342df613627 | add Dockerfile |
| d039004e8073a20b5d6eab1979c1afa213b78fa3 | update README.md |
1: https://git-scm.com/docs/pretty-formats

How to eliminate nothing elements in a array (1D) in Julia?

I would like to know how I could eliminate nothing elements in a Julia array (1D) like the one below. It was built from reading a text file with lines with no relevant information mixed with lines with relevant information. "nothing" is type Void and I would like to clean the array of all of it.
nothing
nothing
nothing
nothing
nothing
" -16.3651\t 0.1678\t -4.6997\t -14.0152\t -2.6855\t -16.0294\t -7.8049\t -27.1912\t -5.0354\t -14.5187\t\r\n"
" -16.4490\t -1.0910\t -3.6087\t -12.6724\t -1.5945\t -14.7705\t -7.2174\t -25.2609\t -3.7766\t -14.3509\t\r\n"
" -16.4490\t -2.2659\t -2.4338\t -10.9100\t -0.5875\t -13.6795\t -6.7139\t -22.9950\t -2.9373\t -14.0991\t\r\n"
testvector[testvector.!=nothing] is also a very readable option.
benchmarking can help choose the most efficient code.
How are you reading that file?
You can filter out nothings from an array:
filter(x -> !is(nothing, x), [nothing, 42]) # => Any[42]
But you may want to clean your data first, with a tsv (tab separated values) file like this:
-16.3651 0.1678 -4.6997 -14.0152 -2.6855 -16.0294 -7.8049 -27.1912 -5.0354 -14.5187
-16.4490 -1.0910 -3.6087 -12.6724 -1.5945 -14.7705 -7.2174 -25.2609 -3.7766 -14.3509
-16.4490 -2.2659 -2.4338 -10.9100 -0.5875 -13.6795 -6.7139 -22.9950 -2.9373 -14.0991
Using readdlm:
julia> readdlm("data.tsv")
3x10 Array{Float64,2}:
-16.3651 0.1678 -4.6997 -14.0152 … -27.1912 -5.0354 -14.5187
-16.449 -1.091 -3.6087 -12.6724 -25.2609 -3.7766 -14.3509
-16.449 -2.2659 -2.4338 -10.91 -22.995 -2.9373 -14.0991
Using DataFrmaes.readtable:
julia> df = readtable("data.tsv");
julia> names!(df, [symbol(x) for x in 'A':'J'])
2x10 DataFrames.DataFrame
| Row | A | B | C | D | E | F | G |
|-----|---------|---------|---------|----------|---------|----------|---------|
| 1 | -16.449 | -1.091 | -3.6087 | -12.6724 | -1.5945 | -14.7705 | -7.2174 |
| 2 | -16.449 | -2.2659 | -2.4338 | -10.91 | -0.5875 | -13.6795 | -6.7139 |
| Row | H | I | J |
|-----|----------|---------|----------|
| 1 | -25.2609 | -3.7766 | -14.3509 |
| 2 | -22.995 | -2.9373 | -14.0991 |
one simple way is using filter! function to update your vector like this:
testvector=[fill(nothing,10) ; [1,2,3]];
# =>13-element Array{Any,1}:
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# nothing
# 1
# 2
# 3
filter!(x->x!=nothing, testvector)
# => 3-element Array{Any,1}:
# 1
# 2
# 3
thanks #Daniel Arndt
EDIT, Refer to this paragraph from Julia doc:
nothing is a special value that does not print anything at the
interactive prompt. Other than not printing, it is a completely normal
value and you can test for it programmatically.
I think all of the conditions below, reach us to the same result
x!=nothing
x!==nothing
!is(x,nothing)
!isa(x,Void)
typeof(x)!=Void
To add to the answers above, it appears:
filter(!isnothing, [nothing, 42])
is a working shorthand for filter(x -> !isnothing(x), [nothing, 42]), and will correctly return 42.
Dear All,
At the end, the code became this:
tmpFile=open(fileName)
tmp=readdlm(tmpFile);
ind=pmap(typeof,tmp[:,1]).!=SubString{ASCIIString}; # if the first column typeof is string, than pmap will return false, else, it return true. This will provide an index of valid/not valid rows.
tmpClean=tmp[ind,:]; # only valid rows will be used
If you may have any suggestion to improve it, I would appreciate it. Thank you for your help.

Replacement and non-matches with 'sub'

Months ago I ended up with a sub statement that originally worked with my input data. It has since stopped working causing me to re-examine my ugly process. I hate to share it but it accomplished several things at once:
active$id[grep("CIR",active$description)] <- sub(".*CIR0*(\\d+).*","\\1",active$description[grep("CIR",active$description)],perl=TRUE)
This statement created a new id column by finding rows that had an id embedded in the description column. The sub statement would find the number following a "CIR0" and populate the id column iff there was an id within a row's description. I recognize it is inefficient with the embedded grep subsetting either side of the assignment.
Is there a way to have a 'sub' replacement be NA or empty if the pattern does not match? I feel like I'm missing something very simple but ask for the community's assistance. Thank you.
Example with the results of creating an id column:
| name | id | description |
|------+-----+-------------------|
| a | 343 | Here is CIR00343 |
| b | | Didn't have it |
| c | 123 | What is CIR0123 |
| d | | CIR lacks a digit |
| e | 452 | CIR452 is next |
I was struggling with the same issue a few weeks ago. I ended up using the str_match function from the stringr package. It returns NA if the target string is not found. Just make sure you subset the result correctly. An example:
library(stringr)
str = "Little_Red_Riding_Hood"
sub(".*(Little).*","\\1",str) # Returns 'Little'
sub(".*(Big).*","\\1",str) # Returns 'Little_Red_Riding_Hood'
str_match(str,".*(Little).*")[1,2] #Returns 'Little'
str_match(str,".*(Big).*")[1,2] # Returns NA
I think in this case you could try using ifelse(), i.e.,
active$id[grep("CIR",active$description)] <- ifelse(match, replacement, "")
where match should evaluate to true if there's a match, and replacement is what that element would be replaced with in that case. Likewise, if match evaluates to false, that element's replaced with an empty string (or NA if you prefer).

Resources