Code example
$(function () {
    (function (i, s, o, g, r, a, m) {
        i['GoogleAnalyticsObject'] = r;
        i[r] = i[r] || function () {
            (i[r].q = i[r].q || []).push(arguments)
        }, i[r].l = 1 * new Date();
        a = s.createElement(o),
        m = s.getElementsByTagName(o)[0];
        a.async = 1;
        a.src = g;
        m.parentNode.insertBefore(a, m)
    })(window, document, 'script', '//www.google-analytics.com/analytics.js', 'ga');

    ga('create', 'UA-[Site]-[Nr]', [SiteLabel]);
    ga('set', {
        'dimension1': [State],
        'dimension2': [Title],
        'dimension3': [LocationID],
        'dimension4': [Owner],
        'dimension5': [UserID],
        'dimension6': [Uri],
        'dimension7': [Exception]
    });
    ga('send', 'pageview');
});
Use case example
Use cases where [A, B, C, D] are different pages and S1, S2, … denote sessions.
#Use case 1
------------------
S1 : A B A B
S2 : A B A
S3 : B
#Use case 2
------------------
S1 : A B A C A D A
#Use case 3
------------------
S1 : A B C D
#Use case 4
------------------
S1 : A A A A A
Expected results:
Mode 1: Hit – value is applied to the single hit for which it has been set.
Mode 2: Session – value is applied to all hits in a single session.
#Use case 1
Mode 1:
A : 4 visits
B : 4 visits
Mode 2:
A : 2 visits
B : 3 visits
#Use case 2
Mode 1:
A : 4 visits
B : 1 visits
C : 1 visits
D : 1 visits
Mode 2:
A : 1 visits
B : 1 visits
C : 1 visits
D : 1 visits
#Use case 3
Mode 1:
A : 1 visits
B : 1 visits
C : 1 visits
D : 1 visits
Mode 2:
A : 1 visits
B : 1 visits
C : 1 visits
D : 1 visits
#Use case 4
Mode 1:
A : 5 visits
Mode 2:
A : 1 visits
Results I seem to get
Changing the scope does not seem to have any effect.
#Use case 1
A : 2 (7 page visits)
B : 1 (1 page visits)
#Use case 2
A : 1 (7 page visits)
#Use case 3
A : 1 (4 page visits)
#Use case 4
A : 1 (5 page visits)
Question
How can I get the expected results?
First, since you are setting custom dimensions, I believe [A, B, C, D] should represent different values of the custom dimensions, not pages.
When using hit-level custom dimensions you should be looking at pageviews instead of visits/sessions. So what you should really expect to see is:
#Use case 1
Mode 1:
A : 4 pageviews
B : 4 pageviews
Mode 2:
A : 2 visits
B : 3 visits
#Use case 2
Mode 1:
A : 4 pageviews
B : 1 pageviews
C : 1 pageviews
D : 1 pageviews
Mode 2:
A : 1 visits
B : 1 visits
C : 1 visits
D : 1 visits
#Use case 3
Mode 1:
A : 1 pageviews
B : 1 pageviews
C : 1 pageviews
D : 1 pageviews
Mode 2:
A : 1 visits
B : 1 visits
C : 1 visits
D : 1 visits
#Use case 4
Mode 1:
A : 5 pageviews
Mode 2:
A : 1 visits
If you redo your tests with this in mind the numbers should match.
Related
I am trying to treat ordinal data as a function of time in order to analyze whether there is a significant difference between two modalities.
Here is my data set:
num  Note  Semaine  Companion  mummies  mycosis
  1     2       13          0        1        2
  2     1       13          0        1        1
  3     4       13          0        1        1
  4     2       13          0        1        1
 31     5       13          1        2        1
  1     3       14          0        3        2
num is the number of the plant; there are 30 plants in each modality, so 60 plants in total.
Note is the grade given according to the amount of aphids found on the plant:
1 "absence"
2 "founder"
3 "founder + larvae"
4 "colony"
5 "winged colony"
Semaine is the week in which the observations were made
Companion is the modality
0 "without companion plant"
1 "with companion plant"
mummies is the grade given according to the amount of mummies found on the plant:
1 "absence"
2 "few"
3 "a lot"
I fitted an ordinal regression using the polr function, but time (Semaine) is only taken into account here as a fixed factor.
library(MASS)   # polr() comes from the MASS package
head(aphid, 2)
aphid$Note <- factor(aphid$Note)
e <- polr(Note ~ num + Semaine + Companion, data = aphid, Hess = TRUE)
n4 <- data.frame(num = 30, Semaine = 14, Companion = 0)
predict(e, n4, type = "probs")
exp(cbind(OR = coef(e), confint(e)))   # odds ratios with confidence intervals
I want to find a way to see whether there is a statistically significant difference in aphids, mummies and mycosis across the weeks between the two modalities. Is there a model that can do that?
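One possible direction, offered only as a sketch and not as a definitive answer, is to test the Companion term within the polr model already shown, using a likelihood-ratio test between nested fits:

library(MASS)

# Fit the model with and without the Companion term; anova() on two
# nested polr fits reports a likelihood-ratio test for the Companion effect.
e_full    <- polr(Note ~ num + Semaine + Companion, data = aphid, Hess = TRUE)
e_reduced <- polr(Note ~ num + Semaine, data = aphid, Hess = TRUE)
anova(e_reduced, e_full)

The same comparison could be repeated with the mummies or mycosis grades as the response.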
I have been stuck on this for a long time and couldn't find an answer elsewhere.
Below is my data:
Market Start Type(0 or 1)
A 1
A 2
A 4
A 6
A 10
A 2
B 2
B 4
B 6
B 8
B 4
B 9
C 1
C 4
C 7
C 3
C 9
C 11
C 12
And I want to complete the Type column based on the following conditions:
If Market is A and Start is 1,2,3, then Type is 1, otherwise 0
If Market is B and Start is 2,4,5, then Type is 1, otherwise 0
If Market is C and Start is 4,6,9, then Type is 1, otherwise 0
In Alteryx, I tried using the formula tool three times:
IIF ( [Market]="A" && ([Start] in (1,2,3),"1","0")
IIF ( [Market]="B" && ([Start] in (2,4,5),"1","0")
IIF ( [Market]="C" && ([Start] in (4,6,9),"1","0")
But the third IIF expression overwrites the previous two. Are there any other tools in Alteryx that can do what I want, or is there something wrong with my code?
Thanks in advance. Really appreciate it.
Your third expression evaluates to False and places a zero for any Market <> "C", overwriting the earlier results... try a single Formula tool with:
IF [Market]="A" THEN
IIF([Start] in (1,2,3),"1","0")
ELSEIF [Market]="B" THEN
IIF([Start] in (2,4,5),"1","0")
ELSEIF [Market]="C" THEN
IIF([Start] in (4,6,9),"1","0")
ENDIF
This should eliminate overlap.
A=c("f","t","t","f","t","f","f","f","t","f")
B=c("t","t","t","t","t","f","f","f","t","t")
class=c("+","+","+","-","+","-","-","-","-","-")
df=data.frame(A,B,class)
df
A B class
1 f t +
2 t t +
3 t t +
4 f t -
5 t t +
6 f f -
7 f f -
8 f f -
9 t t -
10 f t -
I partitioned on attributes A and B according to the class as follows:
{A}
[T , F]
/ \
------- -------
[3+,1-] [1+,5-]
{B}
[T , F]
/ \
------- -------
[4+,3-] [0+,3-]
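The formula referred to just below is not actually shown in the post; presumably it is the standard entropy together with the weighted (conditional) entropy of a split, restated here for reference:

H(S) = -\sum_{c \in \{+,\,-\}} p_c \log_2 p_c
\qquad
H(\text{class} \mid A) = \sum_{v \in \{t,\,f\}} \frac{|S_{A=v}|}{|S|} \, H(S_{A=v})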
Based on the above formula, I calculated the entropy with this R code.
1- for attribute A
t=table(A,class)
t
class
A - +
f 5 1
t 1 3
prop1=t[1,]/sum(t[1,])
prop1
- +
0.8333333 0.1666667
prop2=t[2,]/sum(t[2,])
prop2
- +
0.25 0.75
H1=-(prop1[1]*log2(prop1[1]))-(prop1[2]*log2(prop1[2]))
H1
0.6500224
H2=-(prop2[1]*log2(prop2[1]))-(prop2[2]*log2(prop2[2]))
H2
0.8112781
entropy=(table(A)[1]/length(A))*H1 +(table(A)[2]/length(A))*H2
entropy
0.7145247
2- for attribute B
t=table(B,class)
t
class
B - +
f 3 0
t 3 4
prop1=t[1,]/sum(t[1,])
prop1
- +
1 0
prop2=t[2,]/sum(t[2,])
prop2
- +
0.4285714 0.5714286
H1=-(prop1[1]*log2(prop1[1]))-(prop1[2]*log2(prop1[2]))
H1
NaN
H2=-(prop2[1]*log2(prop2[1]))-(prop2[2]*log2(prop2[2]))
H2
0.9852281
entropy=(table(B)[1]/length(B))*H1 +(table(B)[2]/length(B))*H2
entropy
NaN
When I calculate the entropy for attribute B, the result is NaN because of the zero count (log2(0) is -Inf, and 0 * -Inf gives NaN). In such a situation, how can I fix this error, or how can I make H1 give me zero instead of NaN?
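A common fix, offered here only as a sketch, is to apply the convention 0 * log2(0) = 0, for example with a small helper that zeroes out the terms with zero proportion:

# Entropy with the convention 0 * log2(0) = 0, so zero proportions
# contribute nothing instead of producing NaN.
H <- function(p) sum(ifelse(p == 0, 0, -p * log2(p)))

t <- table(B, class)
prop1 <- t[1, ] / sum(t[1, ])   # proportions for B = "f": (1, 0)
prop2 <- t[2, ] / sum(t[2, ])   # proportions for B = "t"
H(prop1)                        # 0 instead of NaN
entropy <- (table(B)[1] / length(B)) * H(prop1) + (table(B)[2] / length(B)) * H(prop2)
entropy

With this helper, H1 for B = "f" comes out as 0 and the weighted entropy is finite.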
I'm running the apriori algorithm like this:
rules <-apriori(dt)
inspect(rules)
where dt is my data.frame with this format:
> head(dt)
Cus T C B
1: C1 0 1 1
2: C2 0 1 0
3: C3 0 1 0
4: C4 0 1 0
5: C5 0 1 0
6: C6 0 1 1
The idea of the data set is to capture the customer and whether he/she bought three different items (T, C and B) on a particular purchase. For example, based on the information above, we can see that C1 bought C and B, customers C2 to C5 bought only C, and customer C6 bought C and B.
The output is the following:
lhs rhs support confidence lift
1 {} => {T=0} 0.90 0.9000000 1.0000000
2 {} => {C=1} 0.91 0.9100000 1.0000000
3 {B=0} => {T=0} 0.40 0.8163265 0.9070295
4 {B=0} => {C=1} 0.40 0.8163265 0.8970621
5 {B=1} => {T=0} 0.50 0.9803922 1.0893246
6 {B=1} => {C=1} 0.51 1.0000000 1.0989011
My questions are:
1) How can I get rid of rules where T, C or B are equal to 0? If you think about it, the rule {B=0} => {T=0} or even {B=1} => {T=0} doesn't really make sense.
2) I was reading about the Apriori algorithm, and in most of the examples each line represents an actual transaction, so in my case it should be something like:
C,B
C
C
C
C
C, B
instead of my sets of ones and zeros. Is that a requirement, or can I still work with my format?
Thanks
I'm not sure what the aim of the program is supposed to be, but the aim of the Apriori algorithm is, first, to extract frequent itemsets from a given data set, where a frequent itemset is a set of items that appears together sufficiently often in the data, and second, to generate association rules from those extracted frequent itemsets. An association rule looks, for example, like this:
B -> C
In the stated case this means that customers who bought B also buy C with a certain probability, where the probability is determined by the support and confidence levels of the Apriori algorithm. The support level regulates the number of frequent itemsets and the confidence level the number of association rules. Association rules above the confidence level are called strong association rules.
Against this backdrop I do not understand why the Apriori algorithm is used to determine whether a customer bought different articles; that could be answered with an if statement. The provided output also makes no sense in this context: the third line, for example, says that if a customer does not buy B then he does not buy T, with a support of 40% and a confidence of 81.6%. Apart from that, association rules do not have a support; only the association rule B -> C is correct, but its confidence value is wrong.
Nevertheless, if the aim is to generate the described association rules, the original Apriori algorithm cannot operate on input in this format:
> head(dt)
Cus T C B
1: C1 0 1 1
2: C2 0 1 0
3: C3 0 1 0
4: C4 0 1 0
5: C5 0 1 0
6: C6 0 1 1
For the uncustomized Apriori algorithm, the data set needs this format:
> head(dt)
C1: {B, C}
C2: {C}
C3: {C}
C4: {C}
C5: {C}
C6: {B, C}
I see two solutions: either reformat the input beforehand (as sketched above), or customize the Apriori algorithm to accept this format, which would arguably just move the input-format change inside the algorithm. To clarify why the stated input format is needed, here is the Apriori algorithm in a nutshell with the provided data:
Support level = 0.3
Confidence level = 0.3
Number of customers = 6
Total number of B's bought = 2
Total number of C's bought = 6
Support of B = 2 / 6 ≈ 0.33 >= 0.3 = support level
Support of C = 6 / 6 = 1 >= 0.3 = support level
Support of B, C = 2 / 6 ≈ 0.33 >= 0.3 = support level
-> Frequent itemsets = {B, C, BC}
-> Association rules = {B -> C}
Confidence of B -> C = 2 / 2 = 1 >= 0.3 = confidence level
-> Strong association rules = {B -> C}
Hope this helps.
So reading through this paper:
http://www.cs.nyu.edu/~mohri/pub/fla.pdf
I see that a weighted finite-state transducer (WFST) is defined over a semiring, and many operations on WFSTs can be expressed in terms of the "sum" and "product" of that semiring. For example, the composition of transducers T1 and T2 is:
(T_1 \circ T_2)(x, y) = \bigoplus_{z \in \Delta^*} T_1(x, z) \otimes T_2(z, y)
But I can't seem to find an explanation of how to compute a pure sum and product of WFSTs, and I am having trouble backing those operations out of the composition example above.
A demonstration on the example below would be much appreciated.
Format: state1 state2, input symbol : output symbol, transition probability
T1
0 1 a : b, 0.1
0 2 b : b, 0.2
2 3 b : b, 0.3
0 0 a : a, 0.4
1 3 b : a, 0.5
T2
0 1 b : a, 0.1
1 2 b : a, 0.2
1 1 a : d, 0.3
1 2 a : c, 0.4
Example taken from: How to perform FST (Finite State Transducer) composition
--------------- update ------------
Found the answer in this document: http://www.cs.nyu.edu/~mohri/pub/hwa.pdf, page 12.
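For readers who do not want to open the paper: to the best of my recollection (please verify against page 12 of the linked document), the sum (union) and product (concatenation) of two weighted transducers are defined pointwise and by splitting the strings, respectively:

(T_1 \oplus T_2)(x, y) = T_1(x, y) \oplus T_2(x, y)

(T_1 \otimes T_2)(x, y) = \bigoplus_{x = x_1 x_2,\; y = y_1 y_2} T_1(x_1, y_1) \otimes T_2(x_2, y_2)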