Constructing an object using the genoset package in R - r

The genoset R package has a function for building a GenoSet by putting together several matrices and a RangedData object that specifies co-ordinates.
I have the following objects - three matrices, all with the same name, and a RangedData object of the following format (called locData).
space ranges |
<factor> <IRanges> |
cg00000957 1 [ 5937253, 5937253] |
cg00001349 1 [166958439, 166958439] |
cg00001583 1 [200011786, 200011786] |
cg00002028 1 [ 20960010, 20960010] |
cg00002719 1 [169396706, 169396706] |
cg00002837 1 [ 44513358, 44513358] |
When I try to create a GenoSet, though, I get the following error.
DMRSet=GenoSet(locData,Exprs,meth,unmeth,universe=NULL)
Error in .Call2("IRanges_from_integer", from, PACKAGE = "IRanges") :
cannot create an IRanges object from an integer vector with missing values.
What am I doing wrong? All the objects I'm putting together have the same rownames, except for the IRanges object itself, which I don't think has rownames since it isn't a matrix.
Additionally, the "column" of locData has non-integer characters.
Thank you!

It sounds like your "locData" may not be a RangedData. It can alternatively be a GRanges. Either way, you will want to name all of your arguments.
The underlying eSet class will be upset about unnamed arguments once you get past the locData trouble.
DMRSet=GenoSet(locData=locData,exprs=Exprs,meth=meth,unmeth=unmeth,universe=NULL)
Pete

Related

How to remove a sequence of array elements matching a pattern at the beginning and the end?

Suppose that I have a json file, in which the following pattern appears many times.
... [ ... ["X"], ... ,["Y"] ... ] ...
I want to remove everything between each pair of ["X"] and ["Y"]. How can I do it?
Assuming pairwise occurrences, and that "between" means excluding the border items, you could query the indices of the border items, and fetch everything in between.
Given the first array below as input, the filter that follows it produces the output shown last:
[
["A"], ["B"], ["X"], ["C"], ["D"], ["Y"], ["E"], ["F"],
["G"], ["H"], ["X"], ["I"], ["J"], ["Y"], ["K"], ["L"]
]
[
(
[[indices([["X"]]), indices([["Y"]])] | 0, transpose[][], infinite]
| _nwise(2)
)
as [$a, $b] | .[$a:$b+1][]
]
[["A"],["B"],["X"],["Y"],["E"],["F"],["G"],["H"],["X"],["Y"],["K"],["L"]]
If the border items may occur in any order, and not necessarily in equal amounts, before transposing the two lists of indices they would need to be filtered first to only contain mutually successive positions.
The question seems to have two components:
(1) Given an array, how can I delete all segments that are "bookended" by two values?
(2) Given a single JSON entity (aka document), how can I perform the above-mentioned deletion operation on all arrays, no matter where they occur within the document?
Here, I will offer an alternative to @pmf's solution for (1) and show how to apply it to an entire JSON entity.
Here's the alternative, which has the possible advantage that it doesn't make any strong assumptions about the occurrence of the $x and $y values, and allows for both interpretations regarding the removal of the bookends themselves:
# Input: an array
# Remove all stretches from $x to the next $y,
# removing both bookends too if and only if $bookends.
# Both bookends must be present for a stretch to be removed.
def remove_all_xy($x; $y; $bookends):
# The helper function removes a single stretch from $x to $y, if any
def r:
index($x) as $ix
| if $ix then .[$ix+1:] as $tail
| ($tail | index($y)) as $iy
| if $iy
then (if $bookends then 0 else 1 end) as $adjust
| .[:$ix + $adjust] + ($tail | .[1+$iy - $adjust:] | r)
else . end
else . end;
r;
Now let's say you decide on some function, foo($x;$y;$bookends), for performing the per-array operation. To apply it to the whole document,
you could write:
walk(if type == "array" then foo($x;$y;$bookends) else . end)
This might not be as efficient as possible, but in practice it should suffice. (If not, then simply adapt the standard walk.)
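For readers who want to cross-check the jq logic outside jq, here is a minimal Python sketch of the same two steps: a per-array removal and a bottom-up walk over a decoded JSON document. It is not part of the original answers. The names remove_all_xy and walk simply mirror the jq ones, and the sketch assumes that x and y are different values.

```python
def remove_all_xy(items, x, y, bookends):
    """Remove every stretch from x to the next y; drop x and y too if bookends.

    Assumes x != y. A stretch is removed only if both bookends are present.
    """
    out, i = [], 0
    while i < len(items):
        if items[i] == x and y in items[i + 1:]:
            j = items.index(y, i + 1)  # position of the closing bookend
            if not bookends:
                out += [x, y]  # keep the bookends, drop what lies between
            i = j + 1
        else:
            out.append(items[i])
            i += 1
    return out


def walk(doc, fn):
    """Apply fn to every list in a decoded JSON document, innermost first."""
    if isinstance(doc, dict):
        return {k: walk(v, fn) for k, v in doc.items()}
    if isinstance(doc, list):
        return fn([walk(v, fn) for v in doc])
    return doc


sample = [["A"], ["B"], ["X"], ["C"], ["D"], ["Y"], ["E"], ["F"],
          ["G"], ["H"], ["X"], ["I"], ["J"], ["Y"], ["K"], ["L"]]
kept = walk(sample, lambda a: remove_all_xy(a, ["X"], ["Y"], False))
dropped = walk(sample, lambda a: remove_all_xy(a, ["X"], ["Y"], True))
```

With bookends set to False, kept matches the output shown in the first answer. With bookends set to True, the ["X"] and ["Y"] items are removed as well.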

groupby an element with jq

I have the following json:
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d2"},"org":"TΙ UIH","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d3"},"org":"TΙ UIH","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d4"},"org":"AB KIO","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d5"},"org":"GH SVS","rc":{"$event":"17"}}
How could I achieve the following output (TSV)?
13 TΙ UIH 2
13 AB KIO 1
17 GH SVS 1
So far, from what I have searched, I have:
jq -sr 'group_by(.org)|.[]|[.[0].org, length]|@tsv'
How could I add one more group_by to achieve the desired result?
I was able to obtain the expected result from your sample JSON using the following :
group_by(.org, .rc."$event")[] | [.[0].rc."$event", .[0].org, length] | @tsv
You can try it on jqplay.org.
The modification of the group_by clause ensures we will have one entry per pair of .org/.rc.$event (without it we would only have one entry per .org, which might hide some .rc.$event values).
Then we add the .rc.$event to the array you create, just as you did with the .org, accessing the value of the first item of each group since we know they're all the same anyway.
To sort the result, you can put it in an array and use sort_by(.[0]) which will sort by the first element of the rows :
[group_by(.org, .rc."$event")[] | [.[0].rc."$event", .[0].org, length]] | sort_by(.[0])[] | @tsv
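To sanity-check the counts independently of jq, the same grouping on the composite key (.rc.$event, .org) can be sketched in Python. This is only an illustration and not part of the original answer. The variable names are made up, and the input lines are the four sample records from the question.

```python
import json
from collections import Counter

lines = """\
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d2"},"org":"TΙ UIH","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d3"},"org":"TΙ UIH","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d4"},"org":"AB KIO","rc":{"$event":"13"}}
{"us":{"$event":"5bbf4a4f43d8950b5b0cc6d5"},"org":"GH SVS","rc":{"$event":"17"}}
""".splitlines()

records = [json.loads(line) for line in lines]

# Count records per (rc.$event, org) pair, the same composite key as in group_by.
counts = Counter((r["rc"]["$event"], r["org"]) for r in records)

# One tab-separated row per group, sorted by the key.
rows = ["\t".join([rc, org, str(n)]) for (rc, org), n in sorted(counts.items())]
print("\n".join(rows))
```

The row order follows Python's string sort on the key, so it can differ from the order in the question's expected output while the counts stay the same.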

How to get average of last N numbers in a stream with static memory

I have a stream of numbers, and in every cycle I need to compute the average of the last N of them. This can, of course, be solved using an array where I store the last N numbers; in every cycle I shift it, add the new one, and compute the average.
N = 3
+---+-----+
| a | avg |
+---+-----+
| 1 | |
| 2 | |
| 3 | 2.0 |
| 4 | 3.0 |
| 3 | 3.3 |
| 3 | 3.3 |
| 5 | 3.7 |
| 4 | 4.0 |
| 5 | 4.7 |
+---+-----+
The first N-1 outputs (where there isn't yet enough data to compute the average) don't interest me much, so the results there may be anything or undefined.
My question is, can this be done without using an array, that is, with static amount of memory? If so, then how?
I'll do the coding myself - I just need to know the theory.
Thanks
Think of this as a black box containing some state. If you control the input stream, you can draw conclusions about the state. In your sliding-window, array-based approach, it is fairly obvious that if you feed a bunch of zeros into the algorithm after the original input, you get a bunch of averages with a decreasing number of non-zero values taken into account. The last one has just one original non-zero value, so if you multiply that by N you get the last input back. Using that and the second-to-last output, which accounts for two non-zero inputs, you can reconstruct the second-to-last input, and so on.
So essentially your algorithm needs to maintain sufficient state to reconstruct the last N elements of input, at least if you formulate it as an on-line algorithm. I don't think an off-line algorithm can do any better, except if you consider it reading the input multiple times, but I don't have as strong an argument for that.
Of course, in some theoretical models you can avoid the array and e.g. encode all the state into a single arbitrary length integer, but that's just cheating the theory, and doesn't make any difference in practice.
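Since the argument above says N values of state are unavoidable for an exact result, the practical goal is usually a fixed-size buffer with constant work per step, rather than shifting an array and re-summing it. A minimal Python sketch of that idea follows. The class name SlidingAverage is made up for illustration, and returning None until N values have arrived is an assumption based on the question's "may be anything/undefined".

```python
from collections import deque


class SlidingAverage:
    """Average of the last n values, using a fixed-size buffer and a running sum."""

    def __init__(self, n):
        self.n = n
        self.window = deque(maxlen=n)  # never holds more than n values
        self.total = 0.0

    def push(self, value):
        # Subtract the value that is about to be evicted, then add the new one.
        if len(self.window) == self.n:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        # Undefined (None here) until n values have been seen.
        return self.total / self.n if len(self.window) == self.n else None


avg = SlidingAverage(3)
results = [avg.push(x) for x in [1, 2, 3, 4, 3, 3, 5, 4, 5]]
```

Memory stays at N values regardless of stream length, and each step costs one subtraction, one addition and one division. With floating-point input the running sum can accumulate rounding error over a very long stream, so re-summing the buffer occasionally is a reasonable safeguard.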

Column data types classification in R

I have a database. How can I get the types of all of its columns and save them to a file? The types I need to distinguish are:
- Float
- Integer
- BigInteger
- String
My code is:
library(foreign)
library(memisc)
data <- read.spss("data.sav", use.value.labels = FALSE, max.value.labels = 100)
write.table(summary(data), "out.txt")
But, this code only distinguishes between two types of data... (numeric, String)
out sample:
Length Class Mode
SubsID 20582 -none- numeric
SubsID_RN 20582 -none- character
responseid 20582 -none- numeric
required output:
SubsID BigInteger
SubsID_RN String
responseid Integer
In R, the type system works differently from many other common languages. First off, everything in R is an object, and one of the basic object types is the vector. The type of the vector itself is defined by the data that it contains. There are six atomic vector types, which can be identified with the typeof function. In the R documentation you can find the following table:
+------------+----------+--------------+
| typeof | mode | storage.mode |
+------------+----------+--------------+
| logical | logical | logical |
| integer | numeric | integer |
| double | numeric | double |
| complex | complex | complex |
| character | character| character |
| raw | raw | raw |
+------------+----------+--------------+
As you can see, there is no difference between float and double or Integer and BigInteger. Also a String is just a character in R.
So in your case, if you want to know the specific basic type of each of your variables, you could use
lapply(data, typeof)
The R documentation has more information about objects and basic types:
http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Objects
You can get the class or the type of your columns like this:
lapply(your_data_frame, class)
lapply(your_data_frame, typeof)
There's no such thing as 'BigInteger' in R. See the data structures chapter of Hadley's adv-r for a more detailed explanation.

How to create a matrix with dynamic rows and columns in ASP.NET?

I have to make a control in ASP.NET that allows me to create a matrix. I have a list of strings (obtained from a method) that will be the rows (each string is one row), and I have another list of strings (obtained from another method) that will be the columns (each string is one column). After that, depending on the row-column intersection, I have to put an image in that position, something like this:
  | x   | y   | z   |
---------------------
a | OK  | OK  | BAD |
---------------------
b | OK  | BAD | OK  |
---------------------
c | BAD | BAD | BAD |
How can I achieve this? Thanks a lot in advance!
You can use nested Repeaters.
The outer repeater for rows, the inner one for columns/cells.
