R - apriori() support vs coverage?

I am using apriori() to find association rules and I am running into an issue:
| rule       | support      | confidence | lift     | coverage    |
|------------|--------------|------------|----------|-------------|
| {A} => {B} | 8.616999e-05 | 0.01502544 | 19.11896 | 0.005734940 |
| {A} => {C} | 8.944227e-05 | 0.01559602 | 49.05084 | 0.005734940 |
The manual states that Coverage:
Provides the generic function and the needed S4 method to calculate the coverage (support of the
left-hand-side) of rules.
For a small ruleset coverage is equal to support. Why does coverage differ from support for large rulesets?

From the man page for interestMeasure:
"coverage", cover, LHS-support Support of the left-hand-side of the rule, i.e., supp(X). A measure of to how often the rule can be applied. Range: [0, 1]
So, coverage is greater or (in some rare cases) equal to the support of the rule.
Sorry that this is somewhat confusing in the documentation, but, you know, we all love to maintain documentation. I will improve the documentation for coverage in the next release...
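If you want to verify the relationship yourself, here is a quick sketch in R (using the Groceries data set that ships with arules; the thresholds are arbitrary):

    library(arules)
    data("Groceries")

    rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5))

    # coverage = supp(LHS); support = supp(LHS & RHS) = coverage * confidence
    m <- interestMeasure(rules, c("support", "confidence", "coverage"),
                         transactions = Groceries)

    all(m$coverage >= m$support)                     # TRUE: coverage is never below support
    all.equal(m$support, m$coverage * m$confidence)  # TRUE: the identity above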


how to find the complexity of T(n) = T(3n/4) + T(n/3) + n^2

I was asked to find the asymptotic complexity of the given recurrence using a recursion tree, but I'm struggling to find the correct amount of work at each level.
Let's draw out the first two levels of the recursion tree:
                   +------------------+
                   |  Input size: n   |
                   |  Work done: n^2  |
                   +------------------+
                  /                    \
    +--------------------+    +--------------------+
    |  Input size: 3n/4  |    |  Input size: n/3   |
    | Work done: 9n^2/16 |    |  Work done: n^2/9  |
    +--------------------+    +--------------------+
Once we've done that, let's sum up the work done by each layer. That top layer does n^2 work. That next layer does

    (9/16)n^2 + (1/9)n^2 = (97/144)n^2

total work. Notice that the work done by this second level is (97/144)ths of the work done in the level just above it. If you expand out a few more levels of the recursion tree, you'll find that the next level does (97/144)^2 n^2 work, the level below that does (97/144)^3 n^2 work, and that more generally the work done by level l in the tree is (97/144)^l n^2. (Convince yourself of this - don't just take my word for it!)
From there, you can compute the total amount of work done by the recursion tree by summing up the work done per level across all the levels of the tree. As a hint, you're looking at the sum of a geometric sequence that decays from one term to the next - does this remind you of any of the cases of the Master Theorem?
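For completeness, here is where that hint leads (the closed form is my addition, not part of the original answer): the per-level work is a decaying geometric series, so

$$\sum_{l \ge 0} \left(\frac{97}{144}\right)^{l} n^2 \;=\; \frac{n^2}{1 - \frac{97}{144}} \;=\; \frac{144}{47}\, n^2,$$

and since the root level alone already does n^2 work, T(n) = Theta(n^2).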

Need I worry about bugs in old FF versions?

As per Paul Irish's advice I'm using border-box box sizing with a universal selector.
As part of my effort to make my site able to handle increased font sizes set by visually impaired users, I've set my header dimensions with em. It also has min-width and min-height set, but in pixels, in order that the heading block won't shrink below given minimums if the user shrinks the font.
The problem is I read that Firefox up until version 17 has a bug with border-box and min/max-height (also mentioned in this SO question). Given that I now have FF 30 on my machine, which I assume is common, should I be worrying about an FF browser bug from two years ago?
Just because browsershots.org and browserstack.com offer screenshots for old FF, should I worry about them? I hear about people on intranets being stuck with IE 6, 7 or 8, but might this apply to Firefox? I read that, by default, Firefox is set to automatically update itself.
(I don't think my site targets a particular enough audience to be able to hazard a guess about what they'd be using, or from where/with what they will be viewing.)
If I should implement a workaround for this bug, none of the suggestions I've seen involve simply setting box-sizing back to content-box only for the header and then adjusting its dimensions to suit. Can I simply do that?
It seems I may also be subject to an old FF background-image-not-displaying bug.
According to StatCounter, this is the use of old Firefox versions (July 2014):
| Version | 2.0 | 3.0 | 3.5 | 3.6 | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  | 16  |
|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| ‰       | 0.2 | 0.8 | 0.2 | 1.6 | 0.4 | 0.3 | 0.4 | 0.2 | 0.6 | 0.3 | 0.7 | 0.6 | 1.4 | 0.6 | 0.6 | 0.8 | 1.1 |
Summing them all, it's 10.8‰ = 1.08%.
It's up to you to decide whether it's worth implementing a workaround for that 1% or not.

create complex gremlin-java query

I have a model implemented in the Titan graph database with the relations presented below:
[A] ---(e1)---> [B] <---(e2)--- [C] ---(e3)---> [D]
    A:  prop:id
    e1: label:e1, prop:prop1
    B:  prop:number
    e2: label:e2
    C:  prop:id
    e3: label:e3
    D:  prop:number
A and C are "main vertices" (for example users); vertices B and D are "less important vertices" describing some data connected with the users.
The input for the query algorithm is the id property of vertex A.
I want to find all vertices D that are connected with A in the manner shown above. What's more, I want to remember the property prop1 of the edge e1 between A and B.
More precisely, I want to efficiently retrieve pairs (prop1, numberD), where prop1 is the property of the edge between A and B (if the edge has this property) and numberD is the number property of D.
I don't know how to efficiently implement this query.
It is easy to retrieve only vertices D (using GremlinPipes):
    pipe.start(startVertex)
        .outE("e1")
        .inV().hasProperty("number")
        .inE("e2")
        .outV().hasProperty("id")
        .outE("e3")
        .inV().hasProperty("number");
But problems occur when I also need to get the edges e1 and match them with the vertices D.
I tried to compute all these steps separately, but it seems to be very inefficient.
Do you have any suggestions how to implement this (maybe using several queries) using gremlin-java or gremlin-groovy?
Thanks!
Take a look at the Pattern Match Pattern described here:
https://github.com/tinkerpop/gremlin/wiki/Pattern-Match-Pattern
    t = new Table()   // com.tinkerpop.pipes.util.structures.Table
    startVertex.outE('e1').as('e')
               .inV().hasProperty('number').inE('e2')
               .outV().hasProperty('id')
               .outE('e3')
               .inV().hasProperty('number').as('d')
               .table(t)
               .iterate()   // exhaust the pipeline so the table gets filled
This should fill t with one row per match, each of the form
[e:e1, d:D]
From each of these rows, you can easily extract the properties you are interested in.
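For instance (a sketch assuming TinkerPop 2.x Groovy, where Table rows expose getColumn() and elements expose getProperty(); prop1 will simply come back null for e1 edges that don't carry it):

    t.each { row ->
        def e1Edge  = row.getColumn('e')
        def dVertex = row.getColumn('d')
        def prop1   = e1Edge.getProperty('prop1')    // null if this e1 edge has no prop1
        def numberD = dVertex.getProperty('number')
        println "${prop1} -> ${numberD}"
    }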

Qt QTableView top-to-bottom flow

I have a grid that's displayed using a control that inherits from QTableView. Right now the grid is filled left-to-right and, as things overflow, it goes to the next row, like this:
    +---+---+---+
    | 1 | 2 | 3 |
    +---+---+---+
    | 4 |   |   |
    +---+---+---+
    |   |   |   |
    +---+---+---+
but I want it to fill top-to-bottom first and then, as things overflow, go to the next column, like this:
    +---+---+---+
    | 1 | 4 |   |
    +---+---+---+
    | 2 |   |   |
    +---+---+---+
    | 3 |   |   |
    +---+---+---+
I'm mostly a .NET developer and this is pretty trivial with .NET WinForms controls, but how do I get QTableView to do this?
Thanks
The data displayed is a function of your model. The best way to change the behavior is to create a proxy QAbstractTableModel that swaps the rows and the columns. How complicated this will be depends on your current model. It almost sounds like you have a linear model and the view just presents it in a table-like fashion, in which case you're probably using the wrong model.
If you really do have a linear model, consider using QAbstractListModel and QListView. The QListView then has a flow property that will allow you to choose between left-to-right and top-to-bottom.
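For the QListView route, the relevant calls are just a couple of properties (a minimal, self-contained sketch; the QStringListModel and its contents are stand-ins for your real model):

    #include <QApplication>
    #include <QListView>
    #include <QStringListModel>

    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);

        // Placeholder data; in practice this would be your own list model.
        QStringListModel model(QStringList() << "1" << "2" << "3" << "4" << "5");

        QListView view;
        view.setModel(&model);
        view.setFlow(QListView::TopToBottom);   // fill downward first...
        view.setWrapping(true);                 // ...then wrap into a new column on overflow
        view.setResizeMode(QListView::Adjust);  // re-layout items when the view is resized
        view.show();

        return app.exec();
    }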

Translation of clustering problem to graph theory language

I have a rectangular planar grid, with each cell assigned some integer weight. I am looking for an algorithm to identify clusters of 3 to 6 adjacent cells with higher-than-average weight. These blobs should have approximately circular shape.
For my case the average weight of the cells not containing a cluster is around 6, and that for cells containing a cluster is around 6+4, i.e. there is a "background weight" somewhere around 6. The weights fluctuate with a Poisson statistic.
For a small background, greedy or seeded algorithms perform pretty well, but this breaks down if my cluster cells have weights close to the fluctuations in the background, i.e. they will tend to find a cluster even though there is nothing. Also, I cannot do a brute-force search looping through all possible configurations, because my grid is large (something like 1000x1000) and I plan to do this very often (10^9 times).
I have the impression there might exist ways to tackle this in graph theory. I have heard of vertex covers and cliques, but I am not sure how best to translate my problem into their language. I know that graph theory might have issues with the statistical nature of the input, but I would be interested to see what algorithms from there could find, even if they cannot identify every cluster.
Here is an example clipping: the framed region has on average 10 entries per cell; all other cells have on average 6. Of course the grid extends further.
    |  8|  8|  2|  8|  2|  3|
    |  6|  4|  3|  6|  4|  4|
            ===============
    |  8|  3|| 13|  7| 11||  7|
    | 10|  4|| 10| 12|  3||  2|
    |  5|  6|| 11|  6|  8|| 12|
            ===============
    |  9|  4|  0|  2|  8|  7|
For graph theory solutions there are a couple of sentences on Wikipedia, but you are probably best off posting on MathOverflow. This question might also be useful.
The traditional method in computing for solving these (and probably the best, considering its ubiquity) is raster analysis - well known in the world of GIS and remote sensing, so there are a number of tools that provide implementations. Keywords to use to find the one most suited to your problem would be raster, nearest neighbor, resampling, and clustering. The GDAL library is often the basis for other tools.
E.g. http://clusterville.org/spatialtools/index.html
You could try checking out the GDAL library and source code to see if you can use it in your situation, or to see how it is implemented.
For checking for circular shapes you could convert the remaining values to polygons and check the resultant features:
http://www.gdal.org/gdal_polygonize.html
I'm not sure I see a graph theory analogy, but you can speed things up by pre-computing an area integral. This feels like a multi-scale thing.
    A[i, j] = Sum[p[u, v], {u, 0, i}, {v, 0, j}];
Then the average brightness of the rectangular region with corners (a, b) and (c, d) is
(A[c,d] - (A[c,b] + A[a,d]) + A[a,b])/((c-a)(d-b))
Overflow is probably not your friend if you have big numbers in your cells.
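Here is a minimal sketch of that idea in Python/NumPy (the Poisson grid and the half-open rectangle convention are my assumptions, chosen to match the formula above):

    import numpy as np

    # Synthetic 1000x1000 grid with Poisson(6) background weights.
    p = np.random.poisson(6, size=(1000, 1000)).astype(np.int64)  # wide ints avoid overflow

    # Area integral (summed-area table): A[i, j] = sum of p[0..i, 0..j].
    A = p.cumsum(axis=0).cumsum(axis=1)

    def rect_mean(a, b, c, d):
        """Mean weight of the cells with row in (a, c] and column in (b, d]."""
        total = A[c, d] - A[a, d] - A[c, b] + A[a, b]
        return total / ((c - a) * (d - b))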
Use the union-find algorithm for clustering? It's very fast.
I guess the graph would result from considering each pair of neighboring high-valued cells as connected. Use the union-find algorithm to find all clusters, and accept all those above a certain size, perhaps with shape constraints too (e.g. based on average squared distance from the cluster center vs. cluster size). It's a trivial variation on the union-find algorithm to collect the statistics that you would need for this as you go (count, sum of x, sum of x^2, sum of y, sum of y^2).
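A minimal sketch of that approach (the threshold and the 4-neighborhood are assumptions on my part, and the statistics are collected in a second pass rather than during the unions):

    import numpy as np

    def find(parent, i):
        """Union-find root lookup with path halving."""
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def cluster_stats(grid, threshold):
        """Union neighboring above-threshold cells; return per-cluster statistics."""
        h, w = grid.shape
        idx = lambda r, c: r * w + c
        parent = list(range(h * w))
        high = grid > threshold

        for r in range(h):
            for c in range(w):
                if not high[r, c]:
                    continue
                for dr, dc in ((1, 0), (0, 1)):   # 4-neighborhood, look down/right only
                    nr, nc = r + dr, c + dc
                    if nr < h and nc < w and high[nr, nc]:
                        ra, rb = find(parent, idx(r, c)), find(parent, idx(nr, nc))
                        parent[ra] = rb           # union the two clusters

        # count, sum x, sum x^2, sum y, sum y^2 for each cluster root
        stats = {}
        for r in range(h):
            for c in range(w):
                if not high[r, c]:
                    continue
                s = stats.setdefault(find(parent, idx(r, c)), [0, 0, 0, 0, 0])
                s[0] += 1
                s[1] += c; s[2] += c * c
                s[3] += r; s[4] += r * r
        return stats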
If you are just looking for a way to translate your problem into a graph problem, here's what you could do.
From each point, look at all of its neighbors (this could be the 8 adjacent squares or the 4 adjacent ones, depending on what you want). Look for the neighbor with the maximum value; if it is larger than the current cell, connect the cell to that square, and if it is smaller, do nothing.
After this you will have a forest, or possibly a single tree (though I imagine that is unlikely). This is one possible translation of your matrix to a graph. I'm not sure if it is the most useful translation.
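As a concrete sketch of that construction (Python/NumPy, 8-neighborhood assumed; local maxima simply end up with no outgoing edge):

    import numpy as np

    def build_forest(grid):
        """Map each cell to its strictly larger, largest neighbor (if any)."""
        h, w = grid.shape
        parent = {}
        for r in range(h):
            for c in range(w):
                neighbors = [(r + dr, c + dc)
                             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                             if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w]
                best = max(neighbors, key=lambda rc: grid[rc])
                if grid[best] > grid[r, c]:
                    parent[(r, c)] = best   # edge toward the larger neighbor
        return parent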
