Querying association rules created by the Weka Apriori algorithm - associations

The Apriori algorithm produces a large number of rules. Is there any way to query/filter the resulting ruleset, for example to look for rules with specific items appearing in the antecedent, or rules of a specific size?

I can help you only partially.
When using the apriori algorithm in R (the arules package), you can add a parameter that limits the size of the output rules: maxlen.
In this case, for example, I want to obtain only rules of size 2:
rules <- apriori(orders,
                 parameter = list(supp = 0.01, conf = 0.5, maxlen = 2))
With this parameter we will obtain only rules such as:
[item1] -> [item2]
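For filtering after mining, whatever the tool, a ruleset is easy to query once it is exported as plain data. A minimal sketch in Python (a hypothetical rule representation as (antecedent, consequent) pairs of item sets, not Weka's or arules' API):

```python
def filter_rules(rules, must_contain=None, size=None):
    """Filter (antecedent, consequent) rule pairs.

    must_contain: items that must all appear in the antecedent.
    size: total number of items in the rule (antecedent + consequent).
    """
    out = []
    for antecedent, consequent in rules:
        if must_contain and not set(must_contain) <= set(antecedent):
            continue  # a required item is missing from the antecedent
        if size is not None and len(antecedent) + len(consequent) != size:
            continue  # rule has the wrong total size
        out.append((antecedent, consequent))
    return out

# Hypothetical mined rules:
rules = [({"milk"}, {"bread"}),
         ({"milk", "eggs"}, {"bread"}),
         ({"beer"}, {"chips"})]
print(filter_rules(rules, must_contain={"milk"}, size=2))
# → [({'milk'}, {'bread'})]
```

The same idea works on arules output via subset() in R, or on rules exported from Weka as text.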
Daniele

Related

How to use Simulated Annealing in R (GenSA) for a function with discrete variables with a few options without pattern?

I want to use Simulated Annealing. My objective function consists of multiple variables, and for some of them only a few values are possible. I saw the same question here on Stack:
How to use simulated annealing for a function with discrete parameters?, but there was no answer, only a reference to: How to put mathematical constraints with GenSA function in R.
I don't understand how to apply the advice from the second link to my situation (but I think the answer can be found there).
For example:
v <- c(50, 50, 25, 25)
lower <- c(0,0,0,20)
upper <- c(100,100,50,40)
out <- GenSA(v, lower = lower, upper = upper, fn = efficientFunction)
Assume that the fourth parameter, v[4], can only take values in {20,25,30,35,40}. They suggested the use of Lagrange multipliers, hence I was thinking of something like: lambda * ceil(v[4] / 5). Is this a good idea?
But what can I do if the sample space of a variable does not have a nice pattern, for example when the third parameter, v[3], can only take values in {0,21,33,89,100}? I don't understand how a Lagrange multiplier can help in this situation. Do I need to reshape my parameters so that they follow a pattern, or is there another option?
In case Lagrange multipliers are the only option, I'll end up with 8 of these formulations in my objective. It seems to me that there should be another option, but I don't know how!
With kind regards and thanks in advance,
Roos
With SA, you could start with a very simple neighbourhood scheme:
pick one of the parameters and change it by selecting a new valid setting, one above or one below the current one (assuming the valid settings are ordered, as seems to be the case here).
There are no Lagrange multipliers involved in SA as far as I know, but there are many variations, and maybe some constrained variants make use of them.
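The neighbourhood idea above can be sketched in a few lines of Python (a hand-rolled SA loop, not GenSA; all names and the toy objective are made up for illustration). The state is a list of indices into each parameter's list of allowed values, so irregular sets like {0,21,33,89,100} need no penalty terms at all:

```python
import math
import random

def anneal(domains, objective, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over per-parameter lists of allowed values.

    domains: one sorted list of allowed values per parameter.
    The move picks one parameter and shifts it to the adjacent allowed
    value (one index up or down), so no Lagrange multipliers are needed.
    """
    rng = random.Random(seed)
    idx = [rng.randrange(len(d)) for d in domains]   # current index per parameter
    current = [d[i] for d, i in zip(domains, idx)]
    best, best_val = list(current), objective(current)
    cur_val, t = best_val, t0
    for _ in range(steps):
        p = rng.randrange(len(domains))              # pick one parameter
        j = min(max(idx[p] + rng.choice([-1, 1]), 0), len(domains[p]) - 1)
        if j == idx[p]:
            continue                                 # move fell off the domain edge
        cand_idx = list(idx)
        cand_idx[p] = j
        cand = [d[i] for d, i in zip(domains, cand_idx)]
        cand_val = objective(cand)
        # Accept improvements always, worse moves with Metropolis probability.
        if cand_val < cur_val or rng.random() < math.exp((cur_val - cand_val) / t):
            idx, current, cur_val = cand_idx, cand, cand_val
            if cur_val < best_val:
                best, best_val = list(current), cur_val
        t *= cooling
    return best, best_val

# Toy run over the irregular domains from the question:
domains = [[0, 21, 33, 89, 100], [20, 25, 30, 35, 40]]
best, best_val = anneal(domains, lambda v: (v[0] - 33) ** 2 + (v[1] - 30) ** 2)
print(best, best_val)
```

Every candidate is a valid setting by construction, which is usually far simpler than penalising invalid ones.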

Is it possible to set initial values to use in optimisation?

I'm currently using SLSQP, and defining my design variables like so:
p.model.add_design_var('indeps.upperWeights', lower=np.array([1E-3, 1E-3, 1E-3]))
p.model.add_design_var('indeps.lowerWeights', upper=np.array([-1E-3, -1E-3, -1E-3]))
p.model.add_constraint('cl', equals=1)
p.model.add_objective('cd')
p.driver = om.ScipyOptimizeDriver()
However, it insists on trying [1, 1, 1] for both variables. I can't override with val=[...] in the component because of how the program is structured.
Is it possible to get the optimiser to accept some initial values instead of trying to set anything without a default value to 1?
By default, OpenMDAO initializes variables to a value of 1.0 (this tends to avoid the unintentional divide-by-zero that can happen when things are initialized to zero).
Specifying shape=... on an input or output results in the variable values being populated with 1.0.
Specifying val=... uses the given value as the default value.
But those are only the default values. Typically, when you run an optimization, you need to specify initial values of the variables for the problem at hand. This is done after setup, through the problem object.
The set_val and get_val methods on the problem also let the user convert units (using Newtons here as an example):
p.set_val('indeps.upperWeights', np.array([1E-3, 1E-3, 1E-3]), units='N')
p.set_val('indeps.lowerWeights', np.array([-1E-3, -1E-3, -1E-3]), units='N')
There's a corresponding get_val method to retrieve values in the desired units after optimization.
You can also access the problem object as though it were a dictionary, although doing so removes the ability to specify units (you get the variable values in their native units).
p['indeps.upperWeights'] = np.array([1E-3, 1E-3, 1E-3])
p['indeps.lowerWeights'] = np.array([-1E-3, -1E-3, -1E-3])

Recursion in FP-Growth Algorithm

I am trying to implement the FP-Growth (frequent pattern mining) algorithm in Java. I have built the tree, but have difficulties with the conditional FP-tree construction; I do not understand what the recursive function should do. Given a list of frequent items (in increasing order of frequency counts) - a header - and a tree (a list of Node class instances), what steps should the function take?
I have a hard time understanding this pseudocode. Are alpha and beta nodes in the tree, and what do the generate and construct functions do? I can do FP-Growth by hand, but find the implementation extremely confusing. If it helps, I can share my code for FP-tree generation. Thanks in advance.
alpha is the prefix that led to this specific prefix tree.
beta is the new prefix (of the tree to be constructed).
The generate line means something like: add to the result set the pattern beta with support anItem.support.
The construct function creates the new patterns from which the new tree is built.
An example of the construct function (bottom-up) would be something like:
function construct(Tree, anItem):
    conditional_pattern_base = empty list
    in Tree find all nodes with tag = anItem
    for each node found:
        support = node.support
        conditional_pattern = empty list
        while node.parent != root_node:
            conditional_pattern.append(node.parent)
            node = node.parent
        conditional_pattern_base.append((conditional_pattern, support))
    return conditional_pattern_base
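As a runnable sketch of that construct step, here is a plain-Python version (the question's code is in Java, but the logic carries over; the Node class and its field names here are hypothetical stand-ins for your own tree class):

```python
class Node:
    """Minimal FP-tree node: an item tag, a support count, and parent/children links."""
    def __init__(self, tag, support, parent=None):
        self.tag = tag
        self.support = support
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def find_nodes(root, tag):
    """Collect every node in the tree carrying the given item tag."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.tag == tag:
            found.append(node)
        stack.extend(node.children)
    return found

def construct(root, an_item):
    """Build the conditional pattern base for an_item, walking bottom-up."""
    conditional_pattern_base = []
    for node in find_nodes(root, an_item):
        support = node.support
        conditional_pattern = []
        while node.parent is not None and node.parent is not root:
            conditional_pattern.append(node.parent.tag)  # prefix item on the path
            node = node.parent
        if conditional_pattern:                          # skip empty prefixes
            conditional_pattern_base.append((conditional_pattern, support))
    return conditional_pattern_base

# Tiny example tree: root -> a(3) -> b(2) -> c(1), and root -> c(1)
root = Node(None, 0)
a = Node("a", 3, root)
b = Node("b", 2, a)
c1 = Node("c", 1, b)
c2 = Node("c", 1, root)
print(construct(root, "c"))  # → [(['b', 'a'], 1)]
```

The conditional FP-tree for an_item is then built from these (prefix, support) pairs exactly as the original tree was built from transactions, and the mining recurses on it.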

Decreasing support threshold for arules in R

I am working on association rules that are considered outliers. I noticed that arules does not show results for rules that have a support of less than .10. Is there any way to view rules that have a support of .1 (10%) or less?
I tried the following code to filter for rules with less than .1 support. I suspect rules with less than .1 support do not show up because there would be too many? In any case, here's the code I'm using to see rules with less than .1 support. By the way, this code works when I want to see rules with support greater than .1.
rulesb = rulesa[quality(rulesa)$support<0.1]
From the examples in ?apriori:
data("Adult")
## Mine association rules.
rules <- apriori(Adult,
parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
summary(rules)
Set supp to the value you like. Note that rules below the supp threshold passed to apriori are never generated in the first place, so filtering the result afterwards cannot recover them; you have to lower the threshold at mining time.
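To see why the mining threshold matters, here is a brute-force sketch in plain Python (made-up transactions, not the arules implementation): support is counted for every candidate itemset, and anything below min_support is dropped before you ever see it.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force frequent-itemset mining: keep every itemset whose
    support (fraction of transactions containing it) >= min_support."""
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= set(t))
            if count / n >= min_support:          # the threshold is applied here,
                result[candidate] = count / n     # at mining time, not afterwards
    return result

# Hypothetical shopping baskets:
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
print(frequent_itemsets(transactions, 0.5))
```

Rerunning with min_support = 0.1 returns the rarer itemsets too, which is exactly what lowering supp in the apriori() call does.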

How to find Confidence of association rule in Apriori algorithm

I am using the Apriori algorithm to identify the customer's frequent item sets. Based on the identified frequent item sets, I want to suggest items to the customer when the customer adds a new item to his shopping list. Assume one identified frequent set is [2,3,5]. My question is:
if the user has already added item 2 and item 5, I want to check the confidence of the rule to suggest item 3. For that, is it
confidence = support of (2,3,5) / support of (3)?
OR
confidence = support of (2,3,5) / support of (2,5)?
Which equation is correct? Please help!
If the association rule is (2,5) -> (3), then X = (2,5) and Y = (3). The confidence of an association rule is the support of (X U Y) divided by the support of X. Therefore, the confidence of the association rule is in this case the support of (2,5,3) divided by the support of (2,5).
Suppose A^B -> C. Then
confidence = support(A,B,C) / support(A,B)
i.e. the number of transactions in which all three items are present, divided by the number of transactions in which both A and B are present.
So the answer is confidence = support(2,5,3) / support(2,5).
If you just want the answer without any explanation:
confidence = support of (2,3,5)/ support (2,5) in your question is the answer.
What is your antecedent?
Stop treating equations as black boxes you need to look up. Understand them, or you will fail.
