How to find the confidence of an association rule in the Apriori algorithm

I am using the Apriori algorithm to identify a customer's frequent item sets. Based on the identified frequent item sets, I want to suggest items to the customer when the customer adds a new item to his shopping list. Assume one of my identified frequent sets is [2,3,5]. My question is:
If the user has already added item 2 and item 5, I want to check the confidence of the rule before suggesting item 3. For that, which equation is correct:
confidence = support of (2,3,5) / support of (3)?
OR
confidence = support of (2,3,5) / support of (2,5)?
Please help!

If the association rule is (2,5) -> (3), then X = (2,5) and Y = (3). The confidence of an association rule is the support of (X U Y) divided by the support of X. Therefore, the confidence of the association rule is in this case the support of (2,5,3) divided by the support of (2,5).

Suppose the rule is A ^ B -> C. Then
Confidence = support(A ^ B -> C) / support(A, B)
where support(A ^ B -> C) is the number of transactions in which all three items are present,
and support(A, B) is the number of transactions in which both A and B are present.
So the answer is confidence = support(2,5,3) / support(2,5).
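As a minimal sketch of this calculation, here is the rule confidence computed over a made-up transaction list (the item IDs follow the question's example; the transactions themselves are invented for illustration):

```python
# Hypothetical transaction data; item IDs match the question's example.
transactions = [
    {1, 2, 3, 5},
    {2, 3, 5},
    {1, 2, 5},
    {2, 5},
    {1, 3},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent:
    support(antecedent U consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule (2,5) -> (3): divide by the support of the antecedent (2,5).
conf = confidence({2, 5}, {3}, transactions)
```

Here support({2,3,5}) is 2/5 and support({2,5}) is 4/5, so the confidence of (2,5) -> (3) comes out to 0.5.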

If you just want the answer without any explanation:
confidence = support of (2,3,5) / support of (2,5) is the answer.

What is your antecedent?
Stop treating equations as black boxes you need to look up. Understand them, or you will fail.

Related

How would you optimize dividing bi variate data in R?

I'm not looking for a specific line of code - just built-in functions or common packages that may help me do the following; basically something like "write up some code and use this function". I'm stuck on how to actually optimize - should I use SGD?
I have two variables, X and Y. I want to separate Y into 4 groups so that the L2 loss, that is $\sum_j \sum_i (X_{ji} - \overline{X}_j)^2$ (summing the squared deviation of each Y-group's X values from that group's mean), is minimized subject to the constraint that there are at least n observations in each group.
How would one go about solving this? I'd imagine you can't do this with the optim function? Basically the algorithm needs to move 3 values around (there are 3 cutoff points for Y) until the L2 loss is minimized subject to each group having at least n observations.
Thanks
You could try optim and simply add a penalty if the constraints are not satisfied: since you minimise, add zero if all constraints are okay; otherwise a positive number.
If that does not work, since you only look for three cutoff points, I'd probably try a grid search, i.e. compute the objective function for different levels of the cutoff point; throw away those that violate the constraints, and then keep the best solution.
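The grid-search idea can be sketched as follows. This is a Python illustration (the question asked about R, but the logic carries over directly); the data and the minimum group size are made up for the example, and infeasible cutoffs are penalised with an infinite objective, as the penalty suggestion above describes:

```python
from itertools import combinations

# Made-up data: x values clustered in four groups, ordered by y.
x = [2.0, 2.1, 1.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9, 13.0, 13.1, 12.9]
y = list(range(len(x)))

def objective(x, y, cuts, min_size):
    """Within-group sum of squared deviations of x, grouping by y-cutoffs.
    Returns infinity (a penalty) when any group is smaller than min_size."""
    groups = [[] for _ in range(len(cuts) + 1)]
    for xi, yi in zip(x, y):
        g = sum(yi > c for c in cuts)        # index of yi's group
        groups[g].append(xi)
    if any(len(g) < min_size for g in groups):
        return float("inf")
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((v - m) ** 2 for v in g)
    return total

def grid_search(x, y, min_size=3):
    """Try every triple of distinct y-values as cutoffs; keep the best."""
    candidates = sorted(set(y))
    best_val, best_cuts = float("inf"), None
    for cuts in combinations(candidates, 3):
        val = objective(x, y, cuts, min_size)
        if val < best_val:
            best_val, best_cuts = val, cuts
    return best_val, best_cuts

best_val, best_cuts = grid_search(x, y)
```

The exhaustive search is only feasible because there are just three cutoffs; with k cutoffs over m candidate values it grows as C(m, k).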

Is it possible to represent 'average value' in programming?

Had a tough time thinking of an appropriate title, but I'm just trying to code something that can auto compute the following simple math problem:
The average value of a,b,c is 25. The average value of b,c is 23. What is the value of 'a'?
For us humans we can easily compute that the value of 'a' is 29, without the need to know b and c. But I'm not sure if this is possible in programming, where we code a function that takes in the average values of 'a,b,c' and 'b,c' and outputs 'a' automatically.
Yes, it is possible to do this. The reason for this is that you can model the sort of problem being described here as a system of linear equations. For example, when you say that the average of a, b, and c is 25, then you're saying that
a / 3 + b / 3 + c / 3 = 25.
Adding in the constraint that the average of b and c is 23 gives the equation
b / 2 + c / 2 = 23.
More generally, any constraint of the form "the average of the variables x1, x2, ..., xn is M" can be written as
x1 / n + x2 / n + ... + xn / n = M.
Once you have all of these constraints written out, solving for the value of a particular variable - or determining that many solutions exist - reduces to solving a system of linear equations. There are a number of techniques for this, with Gaussian elimination followed by back-substitution being a particularly common one (though often you'd just hand this to MATLAB or a linear algebra package and have it do the work for you).
There's no guarantee in general that, given a collection of equations, the computer can determine whether they have a solution or deduce the value of a variable, but this happens to be one of the nice cases where the shape of the constraints makes the problem amenable to exact solutions.
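A minimal sketch of the linear-equation view: each "average of some variables is M" fact becomes an equation in sum form (x1 + ... + xn = n*M), and one elimination step recovers a for the example above. The helper names here are invented for illustration:

```python
from fractions import Fraction

def average_constraint(varnames, mean):
    """Turn 'average of varnames is mean' into (coefficients, total):
    the sum form x1 + ... + xn = n * mean."""
    coeffs = {v: Fraction(1) for v in varnames}
    return coeffs, Fraction(mean) * len(varnames)

def subtract(eq1, eq2):
    """One elimination step: subtract equation eq2 from eq1."""
    coeffs = dict(eq1[0])
    for v, c in eq2[0].items():
        coeffs[v] = coeffs.get(v, Fraction(0)) - c
        if coeffs[v] == 0:
            del coeffs[v]
    return coeffs, eq1[1] - eq2[1]

# avg(a, b, c) = 25  and  avg(b, c) = 23
eq_abc = average_constraint(["a", "b", "c"], 25)
eq_bc = average_constraint(["b", "c"], 23)

# Eliminating b and c leaves: a = 75 - 46 = 29
coeffs, total = subtract(eq_abc, eq_bc)
```

Using Fraction keeps the arithmetic exact, which matters once the averages are no longer whole numbers.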
Alright, I have figured some things out. To answer the question in the title directly: it is possible to represent an average value in programming. One possible way is to create a list of map data structures which store the set of variables as the key (e.g. "a,b,c"), while the average value of the set is the value (e.g. 25).
Extract the key and split its string by comma, store it into a list, then multiply the average value by the size of the list to get the total (e.g. 25x3 and 23x2). With this, no semantic information is lost.
As for the context in which I asked this question, a more proper description of the problem is: "Given a set of average values of different combinations of variables, is it possible to find the value of each variable?" The answer to this is open. I can't figure it out, but below is an attempt at describing the logic flow if one were to code it out:
Match the lists (from Paragraph 2) against one another in all possible combinations to check whether a list contains all elements of another list. If so, subtract the lists (e.g. abc - bc) as well as the values (e.g. 75 - 46). If after subtracting only one variable remains in the collection, then we have found the value of that variable.
If there is more than one variable left, such as abcd - bc = ad, then store the values as a map data structure and repeat the process, until the subtraction count in a full iteration is 0 for all possible combinations (e.g. ac can't subtract bc). This is unfortunately not where it ends.
Further solutions may be found by combining the lists (e.g. ac + bd = abcd) to get more possible ways to subtract and arrive at the answer. When this is the case, you just don't know when to stop trying, and the list of combinations grows exponentially. Maybe someone with strong related mathematical theory may be able to prove that after a certain number of iterations further additions are useless and hence one should stop. Heck, it may even be possible that negative values are also helpful, which would contradict what I said earlier about 'ac' not being able to subtract 'bd' (to get a, c, -b, -d). This would give even more combinations to compute.
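The subtraction scheme described above (without the combining step) can be sketched like this; the function name and input format are invented for illustration, and termination follows from the fact that only finitely many variable subsets exist:

```python
def solve_by_subtraction(avg_facts):
    """avg_facts maps a tuple of variable names to their average value.
    Repeatedly subtract any known set from a known superset; return every
    single-variable value that can be derived this way."""
    # Convert averages to totals, e.g. avg(a,b,c) = 25 -> sum(a,b,c) = 75.
    facts = {frozenset(vs): avg * len(vs) for vs, avg in avg_facts.items()}
    changed = True
    while changed:
        changed = False
        for s1 in list(facts):
            for s2 in list(facts):
                # s2 is a strict subset of s1 and the difference is new.
                if s2 < s1 and (s1 - s2) not in facts:
                    facts[s1 - s2] = facts[s1] - facts[s2]
                    changed = True
    return {next(iter(s)): v for s, v in facts.items() if len(s) == 1}

values = solve_by_subtraction({("a", "b", "c"): 25, ("b", "c"): 23})
```

For the example in the question, subtracting sum(b,c) = 46 from sum(a,b,c) = 75 leaves a = 29.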
People with stronger computing science foundations may try what templatetypedef has suggested.

More info needed on number of nodes generated by Breadth First Search

I am new to AI and was going through Peter Norvig book. I've looked into this question already What is the number of nodes generated by breadth-first search?.
It says that if we apply goal test to each node when it is selected for expansion then we have nodes = 1 + b + b^2 + b^3 + ... + b^d + (b^(d+1) - b)
But what if my goal state is a leaf node at the final depth, so there is no depth at all after the goal? Then how can b^(d+1) be evaluated? E.g.: in a tree with max depth 3, if my goal lies at depth 3, how would I evaluate b^(3+1) when there is no 4th level at all? Please clear my doubt. Thanks in advance!
Note that the answer you linked mentions that this is the number of nodes that will be generated in the worst case.
Generated means that not all of those nodes are tested to see if they are the goal; they're simply generated and stored so that they can eventually be compared to the goal in case the goal is not found yet.
Worst case has two important implications. Try to visualize the Breadth-First Search going from left to right, then down one level, then left to right again, then down, etc. With worst case we assume that, on whatever depth level d the goal is located, the goal is the very last (rightmost) node. This means that all nodes to the left of it are compared to the goal node, and any successors/children of them are generated as well.
Now, I know that you said that in your case there are no nodes at a depth level below d, but the second implication of saying worst case is that we do assume there are basically infinitely many depth levels.
Indeed, for your case that equation is not entirely correct, but this is simply because you don't have the worst case. In your case, the search process would indeed not have to generate the last (b^(d+1) - b) nodes of the equation.
A final note on the terminology you used: you asked how b^(d+1) (for example, b^(3+1)) can be evaluated if there is no depth level below d = 3. There is no problem in mathematically evaluating that term. Even though in your case there is no depth level 4, we can still mathematically evaluate b^(3+1). In your case it would not make sense to do so, because the result would not be correct, but the term itself can be evaluated just fine.
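A quick sketch comparing the two counts discussed above (the function names are invented for the example):

```python
def worst_case_generated(b, d):
    """Worst case: the goal is the last node at depth d, so children of
    the already-expanded depth-d nodes are generated too:
    1 + b + b^2 + ... + b^d + (b^(d+1) - b)."""
    return sum(b ** i for i in range(d + 1)) + (b ** (d + 1) - b)

def generated_goal_at_last_level(b, d):
    """If the tree simply has no level below d, the extra (b^(d+1) - b)
    nodes are never generated: only the full levels 0..d are."""
    return sum(b ** i for i in range(d + 1))

# For b = 2, d = 3: 15 nodes in levels 0..3, plus 14 more in the worst case.
```

With b = 2 and d = 3 the worst case generates 29 nodes, while a tree that ends at depth 3 generates at most 15.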

Is a number sequence increasing or decreasing?

I'm asking from a non-programming point of view because I want to see the meaning - why is it that way?
There is a sequence in one book and its formula is (2n+3)/(6n-5). It is said to be decreasing, which can be seen from the obtained formula: -28/((6+1)(6n-5)). I see the formula works for every member, but how can I obtain that formula which determines whether the sequence is decreasing or increasing?
What you're interested in is the difference between two sequential elements, take for example n and (n+1).
The nth term is (2n+3)/(6n-5)
The (n+1)th term is (2n+5)/(6n+1)
Now, you can find the difference between these two terms:
f(n+1)-f(n) = (2n+5)/(6n+1) - (2n+3)/(6n-5)
Notice that, conceptually, the value is the Difference between one term and the next one.
This simplifies to the expression you wrote. Now, just to be pedantic, there is a small typo in the solution you gave, but it looks like an actual typo, not a misunderstanding or wrong answer. You have "(6+1)" where it should be "(6n+1)"
Now, when this value is positive, the sequence is increasing, and when it is negative the sequence is decreasing. This value, for example, will always be negative for n>5/6. There is a negative number in the numerator, and no way for the denominator to become negative to cancel it out.
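The algebra above can be checked numerically; this sketch assumes the sequence starts at n = 1:

```python
def f(n):
    """nth term of the sequence: (2n + 3) / (6n - 5)."""
    return (2 * n + 3) / (6 * n - 5)

def simplified_diff(n):
    """The simplified difference f(n+1) - f(n) = -28 / ((6n + 1)(6n - 5))."""
    return -28 / ((6 * n + 1) * (6 * n - 5))

# The simplified formula agrees with the direct difference, and every
# difference is negative, so the sequence is decreasing.
agrees = all(abs((f(n + 1) - f(n)) - simplified_diff(n)) < 1e-12
             for n in range(1, 100))
decreasing = all(f(n + 1) < f(n) for n in range(1, 100))
```

For n = 1, f(1) = 5 and f(2) = 1, so the difference is -4, matching -28/(7*1) from the simplified formula.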
Go to : http://www.wolframalpha.com/widgets/view.jsp?id=c44e503833b64e9f27197a484f4257c0
Under "derivative of" input your formula : (2*x+3)/(6*x-5)
Click "submit" button
Click the "Step-by-step solution" link
OPs question: How to get from (2*x+3)/(6*x-5) to -28/(5-6x)^2
Answer: Find the first derivative of (2*x+3)/(6*x-5)
How: Start with quotient rule for finding derivatives http://en.wikipedia.org/wiki/Quotient_rule to simplify that you'll need a few other rules http://en.wikipedia.org/wiki/Category:Differentiation_rules

How to calculate the likelihood that an element in a route traversing a probability graph is correct?

I have an asymmetric directed graph with a set of probabilities (so the likelihood that a person will move from point A to B, or point A to C, etc). Given a route through all the points, I would like to calculate the likelihood that each choice made in the route is a good choice.
As an example, suppose a graph of just 2 points.
//In a matrix, the probabilities might look like:
//        A     B
//  A  [  0    0.9 ]
//  B  [ 0.1    0  ]
So the probability of moving from A to B is 0.9 and from B to A is 0.1. Given the route A->B, how correct is the first point (A), and how correct is the second point (B).
Suppose I have a bigger matrix with a route that goes A->B->C->D. So, some examples of what I would like to know:
How likely is it that A comes before B,C, & D
How likely is it that B comes after A
How likely is it that C & D come after B
Basically, at each point, I want to know the likelihood that the previous points come before the current and also the likelihood that the following points come after. I don't need something that is statistically sound. Just an indicator that I can use for relative comparisons. Any ideas?
update: I see that this question is not useful to everyone but the answer is really useful to me so I've tried to make the description of the problem more clear and will include my answer shortly in case it helps someone.
I don't think that's possible efficiently. If there was an algorithm to calculate the probability that a point was in the wrong position, you could simply work out which position was least wrong for each point, and thus calculate the correct order. The problem is essentially the same as finding the optimal route.
The subsidiary question is what the probability is 'of', here. Can the probability be 100%? How would you know?
Part of the reason the travelling salesman problem is hard is that there is no way to know that you have the optimal solution except looking at all the solutions and finding that it is the shortest.
Replace the probability matrix (p) with -log(p); finding the shortest path in that matrix would then solve your problem.
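A minimal sketch of this -log trick, using a made-up probability table over points A..D: maximising a product of probabilities along a path is the same as minimising the sum of -log(p), so Dijkstra on the -log weights finds the most probable route.

```python
import heapq
from math import log

# Hypothetical transition probabilities between points (made up).
prob = {
    ("A", "B"): 0.9, ("B", "A"): 0.1,
    ("B", "C"): 0.8, ("C", "B"): 0.2,
    ("C", "D"): 0.7, ("D", "C"): 0.3,
    ("A", "C"): 0.2, ("A", "D"): 0.1,
    ("B", "D"): 0.3,
}

def most_probable_path(prob, start, goal):
    """Dijkstra on -log(p) edge weights: the shortest path in that graph
    is the path with the highest product of probabilities."""
    cost = {edge: -log(p) for edge, p in prob.items()}
    nodes = {u for u, _ in prob} | {v for _, v in prob}
    heap = [(0.0, start, [start])]
    done = set()
    while heap:
        d, u, path = heapq.heappop(heap)
        if u == goal:
            return path
        if u in done:
            continue
        done.add(u)
        for v in nodes - done:
            if (u, v) in cost:
                heapq.heappush(heap, (d + cost[(u, v)], v, path + [v]))
    return None

path = most_probable_path(prob, "A", "D")
```

Here the route A -> B -> C -> D (probability 0.9 * 0.8 * 0.7 = 0.504) beats the direct hop A -> D (0.1), so it is the one returned.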
After much thought, I came up with something that suits my needs. It still has the same problem, where getting an accurate answer would require checking every possible route. However, in my case, checking only the direct route and the first indirect routes is enough to give an idea of how "correct" my answer is.
First I need the confidence for each probability. This is a separate calculation and is contained in a separate matrix (that maps 1-to-1 to the probability matrix). I just take 1.0 - confidenceInterval for each probability.
If I have a route A->B->C->D, I calculate a "correctness indicator" for a point. It looks like I am getting some sort of average of a direct route and the first level of indirect routes.
Some examples:
Denote P(A,B) as probability that A comes before B
Denote C(A,B) as confidence in the probability that A comes before B
Denote P'(A,C) as the probability that A comes before C based on the indirect route A->B->C
At point B, likelihood that A comes before it:
indicator = P(A,B)*C(A,B)/C(A,B)
At point C, likelihood that A & B come before:
P(A,C) = P(A,B)*P(B,C)
C(A,C) = C(A,B)*C(B,C)
indicator = [P(A,C)*C(A,C) + P(B,C)*C(B,C) + P'(A,C)*C'(A,C)]/[C(A,C)+C(B,C)+C'(A,C)]
So this gives me an indicator that is always between 0 and 1, and takes the first level of indirect routes into account (from -> indirectPoint -> to). It seems to provide the rough estimation I was looking for. It is not a great answer, but it does provide some estimate, and since nothing else provides anything better, it is suitable.
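As a sketch of the indicator at point C for the route A -> B -> C, with made-up probabilities and confidences: the notation above overloads P(A,C), so this sketch treats the unprimed values as direct entries from the two matrices and the primed values as the indirect products via B.

```python
# Direct pairwise values from the (hypothetical) probability and
# confidence matrices, keyed by ordered pairs.
P = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.85}
C = {("A", "B"): 0.7, ("B", "C"): 0.6, ("A", "C"): 0.5}

# Indirect estimate for "A comes before C" through the route A -> B -> C.
P_ind = P[("A", "B")] * P[("B", "C")]   # P'(A,C)
C_ind = C[("A", "B")] * C[("B", "C")]   # C'(A,C)

# Confidence-weighted average of the direct and indirect evidence.
num = (P[("A", "C")] * C[("A", "C")]
       + P[("B", "C")] * C[("B", "C")]
       + P_ind * C_ind)
den = C[("A", "C")] + C[("B", "C")] + C_ind
indicator = num / den
```

Because it is a confidence-weighted average of probabilities, the indicator always stays between 0 and 1, as the paragraph above claims.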
