How to calculate Mean Average Rank (MAR) and Mean First Rank (MFR)? - formula

I encounter the terms MAR and MFR in almost all fault localization papers, but I cannot find a concrete example of how to compute them.
The definitions in the paper (Isolating Compiler Optimization Faults via Differentiating Finer-grained Options, SANER'22) are as follows:
• Mean First Rank (MFR): calculates the mean of the rank of the first buggy element (i.e., buggy file in our study) for each bug in the ranking list. The position of the first buggy element is focused on in MFR. Smaller is better for this metric.
• Mean Average Rank (MAR): calculates the mean of the average rank of all buggy elements (i.e., buggy files in our study) for each bug in the ranking list. The positions of all buggy elements are focused on in MAR. Smaller is better for this metric.
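Written as formulas, the two definitions amount to the following sketch. The notation is introduced here for illustration and is not taken from the paper: B is the set of bugs, E_b the set of buggy elements of bug b, and rank_b(e) the 1-based position of element e in bug b's ranking list.

```latex
\mathrm{MFR} = \frac{1}{|B|} \sum_{b \in B} \min_{e \in E_b} \mathrm{rank}_b(e),
\qquad
\mathrm{MAR} = \frac{1}{|B|} \sum_{b \in B} \frac{1}{|E_b|} \sum_{e \in E_b} \mathrm{rank}_b(e)
```

When every bug has exactly one buggy element, the minimum and the average over E_b are the same number, so MFR and MAR coincide.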
Now, let's assume that I have 5 different rankings for 5 bugs:
Bug 1: 1. a, 2. b, 3. c, 4. d, 5. e (buggy element is 'a', i.e., the first-ranked element)
Bug 2: 1. b, 2. a, 3. c, 4. d, 5. e (buggy element is 'a', i.e., the second-ranked element)
Bug 3: 1. c, 2. d, 3. a, 4. d, 5. e (buggy element is 'c', i.e., the first-ranked element)
Bug 4: 1. a, 2. e, 3. c, 4. b, 5. e (buggy element is 'a', i.e., the first-ranked element)
Bug 5: 1. a, 2. c, 3. d, 4. b, 5. e (buggy element is 'a', i.e., the first-ranked element)
Using the example above, can we calculate MFR and MAR? If so, how?
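Applying the definitions to the five rankings gives first ranks 1, 2, 1, 1 and 1, so MFR = 6/5 = 1.2. Because each bug has exactly one buggy element, each bug's average rank equals its first rank, so MAR is also 1.2. The Python sketch below computes both; the function names are hypothetical, and it assumes 1-based ranks and that the set of buggy elements of each bug is known. It also accepts bugs with several buggy elements, which is the case where MFR and MAR differ.

```python
from statistics import mean

def first_rank(ranking, buggy):
    """1-based position of the highest-ranked buggy element."""
    return min(ranking.index(e) + 1 for e in buggy)

def average_rank(ranking, buggy):
    """Mean 1-based position of all buggy elements."""
    return mean(ranking.index(e) + 1 for e in buggy)

def mfr(bugs):
    """Mean First Rank over (ranking, buggy_elements) pairs."""
    return mean(first_rank(r, b) for r, b in bugs)

def mar(bugs):
    """Mean Average Rank over (ranking, buggy_elements) pairs."""
    return mean(average_rank(r, b) for r, b in bugs)

# The five rankings from the question (list.index uses the first occurrence
# of a repeated element such as 'd' in bug 3).
bugs = [
    (["a", "b", "c", "d", "e"], {"a"}),
    (["b", "a", "c", "d", "e"], {"a"}),
    (["c", "d", "a", "d", "e"], {"c"}),
    (["a", "e", "c", "b", "e"], {"a"}),
    (["a", "c", "d", "b", "e"], {"a"}),
]
print(mfr(bugs), mar(bugs))  # 1.2 1.2

# Hypothetical bug with two buggy files at ranks 2 and 4:
# first rank 2, average rank 3.
extra = (["x", "y", "z", "w"], {"y", "w"})
print(first_rank(*extra), average_rank(*extra))  # 2 3
```

If a paper breaks ties or counts ranks differently, only first_rank and average_rank need to change.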

Related

State diagram, calculate the value after these functions

I have this picture of a state diagram and have to calculate the value of x after a few events. The events are e1-e2-e2-e2-e2.
x is 2 at the beginning.
The first event is e1, so I think x becomes 4 after that event.
Next is e2, and I was wondering: the exit action is x=x-1, so would it go to state B, because x is less than 4, or to C, because x was 4 but became 3 in the exit action?
And let's suppose it goes to B, x becomes 5, and we do e2 again. Would nothing happen, because the only possible transition requires x>5 and x is equal to 5?
Assuming that the guard between A and C is x>=4 (since there is no e defined), I made a small transition table:
So the final state should be B and x is 11.
In UML state machines, guards are evaluated while still in the original state. I.e., when receiving e2 the first time, x is 4 and thus you take the transition to C, under the assumption that e means x (otherwise it doesn't make sense). After you have decided to go to C, and thus leave A, you subtract 1 from x due to A's exit action. On entering C you add 3 due to C's entry action, so x is now 6. In C, the transition to B is triggered by e2 and is unguarded (the guard x>5 belongs to the transition from B to C). Then you receive the next e2 and transition to B, where you add 1, so x is now 7. When receiving the next e2, you check the guard on the transition to C, which requires x to be greater than 5; this holds, so you go to C and execute its entry action once again, and x is now 10. Then you get one more e2, so the state changes to B and its entry action is executed, thus x is 11.
So after the given events have been processed, x is 11 and the state machine is in state B.
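The walk-through can be replayed with a small Python simulation. The diagram itself is not available, so the transitions below are a hypothetical reconstruction from the answer's description: e1 is assumed to add 2 to x while staying in A (the question only says x goes from 2 to 4), A has exit action x-1, C has entry action x+3, B has entry action x+1, A to C on e2 is guarded by x>=4, B to C on e2 by x>5, and C to B on e2 is unguarded. Transitions the answer does not use, such as the A to B one the question mentions, are left out. The point it encodes is the ordering: guard first, then the source state's exit action, then the target state's entry action.

```python
def run(events, x=2):
    """Replay events on a hypothetical reconstruction of the diagram; return (state, x)."""
    state = "A"
    for ev in events:
        if state == "A" and ev == "e1":
            x += 2          # assumption: the question says x goes from 2 to 4 on e1
        elif state == "A" and ev == "e2" and x >= 4:
            # The guard is evaluated while still in A, before A's exit action runs.
            x -= 1          # exit action of A
            state = "C"
            x += 3          # entry action of C
        elif state == "C" and ev == "e2":
            state = "B"     # unguarded transition C -> B
            x += 1          # entry action of B
        elif state == "B" and ev == "e2" and x > 5:
            state = "C"
            x += 3          # entry action of C
        # Any other (state, event) combination is ignored in this sketch.
    return state, x

print(run(["e1", "e2", "e2", "e2", "e2"]))  # ('B', 11)
```

Evaluating the guard after the exit action instead would leave x at 3 when the guard x>=4 is checked, which is the reading the question was unsure about.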

How to calculate probability of a specific type of independent chance intersection

Let's say I have 4 events: A, B, C, and D. Each event occurs independently with a 25% chance.
It's possible that only 1 of these 4 events occurs: A, B, C, or D.
It's possible that 2 of these 4 events occur at once: (A,B), (A,C), (A,D), (B,C), or (B,D).
It's possible that 3 of these 4 events occur at once: (A,B,C) or (B,C,D).
It's possible that all 4 events occur at once: (A,B,C,D).
I understand that the probability of (A,B) or (A,B,C) happening would be calculated as P(A)*P(B) or P(A)*P(B)*P(C), respectively. But how do you determine the probability of getting any one of the pairs, or any one of the triples?
Is it as simple as saying there are 12 possible outcomes and, for instance, any one pair of two would be 5 of those outcomes so 5/12 = 41.67%? Is this consistent regardless of what A, B, C, and D's individual occurrence chances are?
Your event "any two" (exactly two of the four occur) is equivalent to one of the following happening:
A and B but neither C nor D
A and C but neither B nor D
A and D but neither B nor C
B and C but neither A nor D
B and D but neither A nor C
C and D but neither A nor B
Each of these events has probability (1/4)(1/4)(3/4)(3/4) = 9/256. There are six of them, and they are mutually exclusive, so the probability that any of them happens is 6 × 9/256 = 54/256.
For exactly three happening, we get four distinct events, each with probability (1/4)(1/4)(1/4)(3/4) = 3/256. The overall probability of any of these is 12/256.
The probabilities of all and none are 1/256 and 81/256, respectively. Each can occur only one way.
Finally, there are four ways for only one of the four events to happen, and each of these outcomes has probability 27/256. The total is thus 108/256.
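All of these cases are instances of the binomial formula: with n = 4 independent events that each occur with the same probability p = 1/4, the probability that exactly k of them occur is

```latex
P(\text{exactly } k) = \binom{4}{k} \left(\tfrac{1}{4}\right)^{k} \left(\tfrac{3}{4}\right)^{4-k},
\qquad
P(\text{exactly } 2) = 6 \cdot \frac{1}{16} \cdot \frac{9}{16} = \frac{54}{256}.
```

This shortcut only holds when all four probabilities are equal; with unequal probabilities each term has to be multiplied out separately, as in the lists above.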
81/256 + 108/256 + 54/256 + 12/256 + 1/256 = 256/256, as expected.
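The same table can be checked by brute force. This Python sketch (count_distribution is a hypothetical name) enumerates all 2^n occur/not-occur patterns with exact fractions, so it also works when the events have different probabilities, which is the case the question asks about at the end.

```python
from fractions import Fraction
from itertools import product

def count_distribution(probs):
    """Return [P(exactly k events occur) for k = 0..n] for independent events."""
    n = len(probs)
    dist = [Fraction(0)] * (n + 1)
    # Enumerate every occur/not-occur pattern; independence lets us multiply.
    for pattern in product([False, True], repeat=n):
        p = Fraction(1)
        for occurs, q in zip(pattern, probs):
            p *= q if occurs else 1 - q
        dist[sum(pattern)] += p
    return dist

dist = count_distribution([Fraction(1, 4)] * 4)
print([int(d * 256) for d in dist])  # [81, 108, 54, 12, 1]
```

The probability of exactly two events is 54/256, about 21%, not 5/12, because the 12 outcomes listed in the question are not equally likely.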

Julia: BLAS.gemm!() parameters

I want to use the BLAS package, but the meaning of the first two parameters of the gemm() function is not clear to me.
What do the parameters 'N' and 'T' represent?
BLAS.gemm!('N', 'T', lr, alpha, A, B, beta, C)
What is the difference between BLAS.gemm and BLAS.gemm! ?
According to the documentation:
gemm!(tA, tB, alpha, A, B, beta, C)
Update C as alpha * A * B + beta*C or the other three variants according to tA (transpose A) and tB. Returns the updated C.
Note: here, alpha and beta must be float type scalars. A, B and C are all matrices. It's up to you to make sure the matrix dimensions match.
Thus, the tA and tB parameters specify whether to apply the transpose operation to A or to B before multiplying. Select 'N' for no transpose and 'T' for transpose ('C', the conjugate transpose, is also accepted and differs from 'T' only for complex matrices). No transposed copy is made: the BLAS routine reads the matrix in transposed order, so choosing 'T' does not cost an extra allocation.
The difference between gemm!() and gemm() is that for gemm!() you need to have already allocated the matrix C. The ! is a "modify in place" signal. Consider the following illustration of their different uses:
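using LinearAlgebra  # BLAS is exported by the LinearAlgebra standard library
A = rand(5,5)
B = rand(5,5)
C = Matrix{Float64}(undef, 5, 5)  # preallocated output; Array(Float64, 5, 5) in old Julia versions
BLAS.gemm!('N', 'T', 1.0, A, B, 0.0, C)  # overwrites C with A * B'
D = BLAS.gemm('N', 'T', 1.0, A, B)       # allocates a new matrix holding A * B'
julia> C == D
true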
Each of these, in essence, performs the calculation C = A * B'. (Technically, gemm!() performs C = (0.0)*C + (1.0)*A * B'.)
The syntax of the modify-in-place gemm!() is therefore a bit unusual in some respects (unless you've already worked with a language like C, in which case it seems very intuitive): there is no explicit = sign, as you would usually have when calling a function and assigning its result in a high-level language like Julia.
As the illustration above shows, the outcome of gemm!() and gemm() in this case is identical, even though the syntax and procedure to achieve it differ. In practice, however, the performance difference between the two can be significant, depending on your use case. In particular, if you are going to perform that multiplication many times, replacing/updating the value of C each time, then gemm!() can be noticeably faster because it does not re-allocate memory on every call; allocation costs time both up front and later in garbage collection.
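To make the semantics concrete, here is a pure-Python sketch of what the two calls compute. It is neither BLAS nor Julia's API: gemm_inplace and gemm are hypothetical stand-ins for gemm! and gemm, and for readability op() builds the transposed matrix explicitly, which real BLAS does not do. It shows that 'N'/'T' only select op(A) and op(B), and that the in-place version overwrites a caller-supplied C while the other allocates a new result.

```python
def op(M, t):
    """op(M) as selected by a BLAS trans flag: 'N' = M itself, 'T' = transpose."""
    if t == "N":
        return M
    if t == "T":
        return [list(col) for col in zip(*M)]
    raise ValueError("trans flag must be 'N' or 'T' in this sketch")

def matmul(X, Y):
    """Plain triple-loop product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def gemm_inplace(tA, tB, alpha, A, B, beta, C):
    """Like gemm!: overwrite C with alpha*op(A)*op(B) + beta*C and return it."""
    P = matmul(op(A, tA), op(B, tB))
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            # As in BLAS, beta == 0 means the old contents of C are not read.
            old = 0.0 if beta == 0 else beta * C[i][j]
            C[i][j] = alpha * p + old
    return C

def gemm(tA, tB, alpha, A, B):
    """Like gemm: allocate and return a new matrix alpha*op(A)*op(B)."""
    return [[alpha * p for p in row] for row in matmul(op(A, tA), op(B, tB))]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]  # must exist before the in-place call
gemm_inplace("N", "T", 1.0, A, B, 0.0, C)
D = gemm("N", "T", 1.0, A, B)
print(C == D, C)  # True [[17.0, 23.0], [39.0, 53.0]]
```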

A more generalized expand.grid function?

expand.grid(a,b,c) produces all the combinations of the values in a,b, and c in a matrix - essentially filling the volume of a three-dimensional cube. What I want is a way of getting slices or lines out of that cube (or higher dimensional structure) centred on the cube.
So, suppose a, b and c are all odd-length vectors (so they have a centre), say of length 5. My hypothetical slice.grid function:
slice.grid(a,b,c,dimension=1)
returns a matrix of the coordinates of points along the three central lines. Almost equivalent to:
rbind(expand.grid(a[3],b,c[3]),
expand.grid(a,b[3],c[3]),
expand.grid(a[3],b[3],c))
almost, because it has the centre point repeated three times. Furthermore:
slice.grid(a,b,c,dimension=2)
should return a matrix equivalent to:
rbind(expand.grid(a,b,c[3]), expand.grid(a,b[3],c), expand.grid(a[3],b,c))
which is the three intersecting axis-aligned planes (with repeated points in the matrix at the intersections).
And then:
slice.grid(a,b,c,dimension=3)
is the same as expand.grid(a,b,c).
This isn't so bad with three parameters, but ideally I'd like to do this with N parameters passed to the function, e.g. slice.grid(a,b,c,d,e,f,dimension=4); it's unlikely I'd ever want a dimension greater than 3, though.
It could be done by doing expand.grid and then extracting those points that are required, but I'm not sure how to build that criterion. And I always have the feeling that this function exists tucked in some package somewhere...
[Edit] Right, I think I have the criterion figured out now: it's to do with how many times the central value appears in each row. If it's less than or equal to your dimension+1...
But generating the full matrix gets big quickly. It'll do for now.
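Assuming a, b and c each have odd length (so each has a centre), try this. It works by using seq_along to replace each of a, b and c with its index vector (1:3 for a length-3 vector), counting in each row how many coordinates sit at the centre index of their vector (2 for length 3), and keeping the rows where at least n - dimension of the n coordinates are central. It uses this as the index to select the appropriate rows from expand.grid(a, b, c):
slice.expand <- function(..., dimension = 1) {
  L <- lapply(list(...), seq_along)                     # index vectors 1:length(x)
  n <- length(L)
  centre <- sapply(L, function(i) (length(i) + 1) / 2)  # centre index of each odd-length vector
  g <- as.matrix(do.call(expand.grid, L))
  # count central coordinates per row; keep rows with at least n - dimension of them
  ix <- rowSums(sweep(g, 2, centre, "==")) >= (n - dimension)
  expand.grid(...)[ix, ]
}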
# test
a <- b <- c <- LETTERS[1:3]
slice.expand(a, b, c, dimension = 1)
slice.expand(a, b, c, dimension = 2)
slice.expand(a, b, c, dimension = 3)
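Filtering still builds the full grid first, which the question notes gets big quickly. As an illustration in Python, here is a sketch (slice_grid is a hypothetical name mirroring the question's slice.grid) that generates only the wanted points instead: every point with at most `dimension` off-centre coordinates lies in one of the axis-aligned slices spanned by `dimension` free axes, so it loops over those slices and skips points already emitted. It assumes odd-length inputs with hashable values.

```python
from itertools import combinations, product

def slice_grid(*vectors, dimension=1):
    """Grid points with at most `dimension` coordinates away from the centre.

    Assumes each vector has odd length, so vectors[i][len // 2] is its centre.
    Each point is returned once (the shared centre point is not repeated).
    """
    n = len(vectors)
    centres = [v[len(v) // 2] for v in vectors]
    seen = set()
    out = []
    # Each choice of `dimension` free axes spans one axis-aligned slice
    # through the centre; the other axes are pinned to their centre value.
    for free in combinations(range(n), min(dimension, n)):
        axes = [vectors[i] if i in free else [centres[i]] for i in range(n)]
        for point in product(*axes):
            if point not in seen:
                seen.add(point)
                out.append(point)
    return out

print(len(slice_grid(range(5), range(5), range(5), dimension=2)))  # 61
```

For three length-5 vectors this gives 13 points for dimension=1 (three lines sharing the centre), 61 for dimension=2 and all 125 for dimension=3. The row order differs from expand.grid's.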

Horizontal and Vertical Parity check codes

I was reading about horizontal and vertical parity check codes. One of the properties of these codes is that the final parity check (the lower right bit) is equal to the modulo-2 sum of the horizontal parity checks and also equal to the modulo-2 sum of the vertical parity checks.
I don't understand why this is true. I can see it in the examples, but I really can't come up with a formal or intuitive proof of it.
Any help/hints will be appreciated.
Thanks,
Chander
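Each row and column parity bit is the sum of that row's or column's bits modulo 2, and the final result is the sum of all the bits modulo 2. It does not matter whether you add them up by rows or by columns.
The rule is: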
((a mod c) + (b mod c)) mod c == (a+b) mod c
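Spelled out for an m × n block of data bits d_ij, with row checks r_i and column checks c_j (symbols introduced here for illustration), the rule gives the property directly, because a double sum can be taken in either order:

```latex
\begin{aligned}
r_i &= \Big(\sum_{j=1}^{n} d_{ij}\Big) \bmod 2, \qquad c_j = \Big(\sum_{i=1}^{m} d_{ij}\Big) \bmod 2,\\
\sum_{i=1}^{m} r_i &\equiv \sum_{i=1}^{m}\sum_{j=1}^{n} d_{ij} \equiv \sum_{j=1}^{n}\sum_{i=1}^{m} d_{ij} \equiv \sum_{j=1}^{n} c_j \pmod{2}.
\end{aligned}
```

So the corner bit, whether computed from the row checks or from the column checks, equals the parity of all the data bits.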
This is because every wrong bit affects the parity whether you look horizontally or vertically.
Think about your matrix of bits:
A B C D
E F G H
I J K L
M N O P
Now suppose some of these bits are transmitted incorrectly, so there is a total of y errors spread around the matrix, but you don't know where.
If you go by rows (i.e., compute the horizontal parity checks), each row's parity check flips if that row contains an odd number of errors and stays the same if it contains an even number. Since you do this for every row, every error is taken into account.
Finally, if you imagine correcting a wrong bit in one row and corrupting a bit in another row instead, the final result won't change, since you just remove one error from one row and add it to another.
Then think about doing the same by columns: you end up with exactly the same behaviour. The only difference is that the errors may be distributed differently among the columns, but adding the vertical parity checks together modulo 2 follows the same reasoning. Since the total number of errors is the same, it is either even or odd whether you count by rows or by columns, so the two modulo-2 sums agree.
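Both answers can be checked empirically. This Python sketch (the helper name is hypothetical) builds random bit matrices, verifies that the modulo-2 sum of the row checks equals the modulo-2 sum of the column checks and the parity of all bits, and then flips y random bits to confirm that the number of rows whose check flipped and the number of columns whose check flipped both have the same parity as y. It is a randomized sanity check, not a proof.

```python
import random

def checks(bits):
    """Row and column parity checks (sum mod 2) of a list-of-rows bit matrix."""
    rows = [sum(r) % 2 for r in bits]
    cols = [sum(c) % 2 for c in zip(*bits)]
    return rows, cols

random.seed(1)
for _ in range(1000):
    m, n = random.randint(1, 6), random.randint(1, 6)
    bits = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]
    rows, cols = checks(bits)
    total = sum(map(sum, bits)) % 2
    # Corner bit: same value from the row checks, the column checks, or all bits.
    assert sum(rows) % 2 == sum(cols) % 2 == total

    # Flip y distinct random bits and count which checks change.
    cells = [(i, j) for i in range(m) for j in range(n)]
    y = random.randint(0, len(cells))
    for i, j in random.sample(cells, y):
        bits[i][j] ^= 1
    rows2, cols2 = checks(bits)
    flipped_rows = sum(a != b for a, b in zip(rows, rows2))
    flipped_cols = sum(a != b for a, b in zip(cols, cols2))
    assert flipped_rows % 2 == flipped_cols % 2 == y % 2

print("all checks passed")
```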
