I made a change to my webpage on a given date.
Now I want to measure whether the change had a significant impact on usage. How can I do a two-sample t-test in Application Insights, splitting the time-series data on the change date and then comparing the two sets?
In KQL there is a built-in function for calculating Welch's t-test: welch_test().
Given a table T with a metric m and a change date d, you can calculate the test by aggregating the metric before and after the change:
T
| summarize m1 = avgif(m, Timestamp < d),
            v1 = varianceif(m, Timestamp < d),
            c1 = countif(Timestamp < d),
            m2 = avgif(m, Timestamp > d),
            v2 = varianceif(m, Timestamp > d),
            c2 = countif(Timestamp > d)
| extend pValue = welch_test(m1, v1, c1, m2, v2, c2)
Related
I need to run a 2-sample independent t-test, comparing Column1 to Column2. But Column1 is in DataframeA, and Column2 is in DataframeB. How should I do this?
Just in case relevant (feel free to ignore): I am a true beginner. My experience with R so far has been limited to running 2-sample matched t-tests within the same data frame by doing the following:
library(dplyr)
library(tidyr)

t.test(response ~ Column1,
       data = (Dataframe1 %>%
                 gather(key = "Column1", value = "response",
                        "Column1", "Column2")),
       paired = TRUE)
TL;DR
t_test_result = t.test(DataframeA$Column1, DataframeB$Column2, paired=TRUE)
Explanation
If the data is paired, I assume that both dataframes have the same number of observations (the same number of rows). You can check this with nrow(DataframeA) == nrow(DataframeB).
You can think of each column of a dataframe as a vector (an ordered list of values). The way you have used t.test is with a formula (y ~ x), which essentially says: given the dataframe specified in data, perform a t-test to assess the significance of the difference in means of the variable response between the paired groups defined by Column1.
Another way of thinking about this is to grab the data in data and separate it into two vectors: the vector of observations for the first group of Column1, and the one for the second group. Then, for each vector, you compute the mean and standard deviation and apply the appropriate formula, which gives you the t statistic and hence the p value.
Thus, you can just extract those two vectors separately and provide them as arguments to the t.test() function, as in the sketch below. I hope this was beginner-friendly enough ^^ otherwise let me know.
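To make that concrete, here is a minimal sketch with made-up data (the names long_data, response, and group are hypothetical, not from your setup) showing that extracting the two vectors from a long data frame and handing them to t.test() amounts to the same comparison:
set.seed(1)
# Hypothetical long-format data: one response column, one grouping column
long_data <- data.frame(
  response = c(rnorm(20, mean = 5), rnorm(20, mean = 6)),
  group = rep(c("Column1", "Column2"), each = 20)
)
# Separate the response into the two vectors described above
v1 <- long_data$response[long_data$group == "Column1"]
v2 <- long_data$response[long_data$group == "Column2"]
# Paired t-test on the two extracted vectors
t.test(v1, v2, paired = TRUE)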
EDIT: a few additions
(I was going to reply in the comments but realized I did not have space hehe)
Regarding what @Ashish did in order to turn it into a Welch's test, I'd say it was to set var.equal = FALSE. The paired parameter controls whether the t-test is run on paired samples or not, and since your data frames have unequal numbers of rows, I suspect the observations are not matched.
As for the Cohen's d effect size, you can check this Stats Exchange question, from which I copy the code:
For context, m1 and m2 are the groups' means (which you can get with m1 = mean(DataframeA$Column1)), s1 and s2 are the standard deviations (s2 = sd(DataframeB$Column2)), and n1 and n2 are the sample sizes (n2 = length(DataframeB$Column2)).
lx <- n1 - 1                 # degrees of freedom for group 1
ly <- n2 - 1                 # degrees of freedom for group 2
md <- abs(m1 - m2)           ## mean difference (numerator)
csd <- lx * s1^2 + ly * s2^2
csd <- csd / (lx + ly)
csd <- sqrt(csd)             ## common sd computation
cd <- md / csd               ## Cohen's d
This should work for you:
res = t.test(DataframeA$Column1, DataframeB$Column2, alternative = "two.sided", var.equal = FALSE)
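For a fully runnable version, here is a hedged sketch with simulated stand-ins for DataframeA and DataframeB (the data are made up purely for illustration) that wires the Welch test and the hand-computed Cohen's d together:
set.seed(42)
# Simulated stand-ins for the two dataframes in the question
DataframeA <- data.frame(Column1 = rnorm(40, mean = 10, sd = 2))
DataframeB <- data.frame(Column2 = rnorm(55, mean = 11, sd = 3))
# Welch's two-sample t-test (independent samples, unequal variances)
res <- t.test(DataframeA$Column1, DataframeB$Column2,
              alternative = "two.sided", var.equal = FALSE)
# Ingredients for Cohen's d
m1 <- mean(DataframeA$Column1); s1 <- sd(DataframeA$Column1); n1 <- nrow(DataframeA)
m2 <- mean(DataframeB$Column2); s2 <- sd(DataframeB$Column2); n2 <- nrow(DataframeB)
# Pooled standard deviation and Cohen's d
md <- abs(m1 - m2)
csd <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
cd <- md / csd
res$p.value
cd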
Related
I'm using Cohen's d (implemented using cohen.d() from the effsize package) as a measure of effect size in my dependent variable between two levels of a factor.
My code looks like this: cohen.d(d, f) where d is a vector of numeric values and f is a factor with two levels: "A" and "B".
Based on my understanding, the sign of Cohen's d is dependent on the order of means (i.e. factor levels) entered into the formula. However, my cohen.d() command returns a negative value (and negative CIs), even if I reverse the order of levels in f.
Here is a reproducible example:
library('effsize')
# Load in Chickweight data
a=ChickWeight
# Cohen's d requires two levels in factor f, so take the first two available in Diet
a = a[a$Diet %in% c(1, 2), ]
a$Diet = droplevels(a$Diet)
# Compute cohen's d with default order of Diet
d1 = a$weight
f1 = a$Diet
cohen1 = cohen.d(d1,f1)
# Re-order levels of Diet
a$Diet = relevel(a$Diet, ref=2)
# Re-compute cohen's d
d2 = a$weight
f2 = a$Diet
cohen2 = cohen.d(d2,f2)
# Compare values
cohen1
cohen2
Can anyone explain why this is the case, and/or if I'm doing something wrong?
Thanks in advance for any advice!
I'm not entirely sure what the reason for the behaviour in your example is (maybe someone else can comment here), but if you look at the examples under ?cohen.d, there are a few different ways of calling it:
treatment = rnorm(100,mean=10)
control = rnorm(100,mean=12)
d = (c(treatment,control))
f = rep(c("Treatment","Control"),each=100)
## compute Cohen's d
## treatment and control
cohen.d(treatment,control)
## data and factor
cohen.d(d,f)
## formula interface
cohen.d(d ~ f)
If you use the first example of cohen.d(treatment, control) and reverse that to cohen.d(control, treatment) you get the following:
cohen.d(treatment, control)
Cohen's d
d estimate: -1.871982 (large)
95 percent confidence interval:
inf sup
-2.206416 -1.537547
cohen.d(control, treatment)
Cohen's d
d estimate: 1.871982 (large)
95 percent confidence interval:
inf sup
1.537547 2.206416
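As a sanity check on the sign convention suggested by these outputs, the sign of d simply tracks the difference in means between the first and the second argument (divided by a pooled standard deviation):
# The sign of d follows mean(first argument) - mean(second argument)
mean(treatment) - mean(control)   # about -2, matching d < 0 above
mean(control) - mean(treatment)   # about +2, matching d > 0 above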
So using the two-vector method from the examples with your data, we can do:
a1 <- a[a$Diet == 1,"weight"]
a2 <- a[a$Diet == 2,"weight"]
cohen3a <- cohen.d(a1, a2)
cohen3b <- cohen.d(a2, a1)
I noticed that f in the ?cohen.d examples is not a factor but a character vector. I tried playing around with the cohen.d(d, f) method but didn't find a solution; I'd like to see if someone else has anything regarding that.
Related
I am trying to simulate monthly panels of data where one variable depends on lagged values of that variable in R. My solution is extremely slow. I need around 1000 samples of 2545 individuals, each of whom is observed monthly over many years, but the first sample took my computer 8.5 hours to construct. How can I make this faster?
I start by creating an unbalanced panel of people with different birth dates, monthly ages, and variables xbsmall and error that will be compared to determine the Outcome. All of the code in the first block is just data setup.
# Setup:
library(plyr)
# Would like to have 2545 people (nPerson).
# Instead use 4 for testing.
nPerson = 4
# Minimum and maximum possible ages and birth dates
AgeMin = 10
AgeMax = 50
BornMin = 1950
BornMax = 1963
# Person-specific characteristics
ind =
  data.frame(
    id = 1:nPerson,
    BornYear = floor(runif(length(1:nPerson), min=BornMin, max=BornMax+1)),
    BornMonth = ceiling(runif(length(1:nPerson), min=0, max=12))
  )
# Make an unbalanced panel of people over age 10 up to year 1986
# panel = ddply(ind, ~id, transform, AgeMonths = BornMonth)
panel = ddply(ind, ~id, transform, AgeMonths = (AgeMin*12):((1986-BornYear)*12 + 12-BornMonth))
# Set up some random variables to approximate the data generating process
panel$xbsmall = rnorm(dim(panel)[1], mean=-.3, sd=.45)
# Standard normal error for probit
panel$error = rnorm(dim(panel)[1])
# Placeholders
panel$xb = rep(0, dim(panel)[1])
panel$Outcome = rep(0, dim(panel)[1])
Now that we have data, here is the part that is slow (around a second on my computer for only 4 people, but hours for thousands of people). Each month, a person gets two draws (xbsmall and error) from two different normal distributions (these were done above), and Outcome == 1 if xbsmall > -error. However, if Outcome equals 1 in the previous month, then Outcome in the current month equals 1 if xbsmall + 4.47 > -error. I use xb = xbsmall + 4.47 in the code below (xb is the "linear predictor" in a probit model). I ignore the first month for each person for simplicity. For your information, this is simulating a probit DGP (but that is not necessary to know to solve the problem of computation speed).
# Outcome == 1 if and only if xb > -error
# The hard part: xb includes information about the previous month's outcome
start_time = Sys.time()
for(i in 1:nPerson){
  # Determine the range of monthly ages to loop over for this person
  AgeMonthMin = min(panel$AgeMonths[panel$id==i], na.rm=T)
  AgeMonthMax = max(panel$AgeMonths[panel$id==i], na.rm=T)
  # Loop over the monthly ages for this person and determine the outcome
  for(t in (AgeMonthMin+1):AgeMonthMax){
    # Indicator for whether Outcome was 1 last month
    panel$Outcome1LastMonth[panel$id==i & panel$AgeMonths==t] = panel$Outcome[panel$id==i & panel$AgeMonths==t-1]
    # xb = xbsmall + 4.47 if Outcome was 1 last month
    # Otherwise, xb = xbsmall
    panel$xb[panel$id==i & panel$AgeMonths==t] = with(panel[panel$id==i & panel$AgeMonths==t,], xbsmall + 4.47*Outcome1LastMonth)
    # Outcome == 1 if xb + error > 0
    panel$Outcome[panel$id==i & panel$AgeMonths==t] =
      ifelse(panel$xb[panel$id==i & panel$AgeMonths==t] > -panel$error[panel$id==i & panel$AgeMonths==t], 1, 0)
  }
}
end_time = Sys.time()
end_time - start_time
My thoughts for reducing computer time:
Something with cumsum()
Some wonderful panel data function that I do not know about
Find a way to make the t loop go through the same starting and ending points for each individual and then somehow use plyr::ddply() or dplyr::gather_by()
Iterative solution: make an educated guess about the value of Outcome at each monthly age (say, the mode) and somehow adjust values that do not match the previous month. This would work better in my real application because xbsmall has a very clear trend in age.
Do the simulation only for smaller samples and then estimate the effect of sample size on the values I need (the distributions of regression coefficient estimates not calculated here)
One approach is to use a split-apply-combine method. I take out the for(t in (AgeMonthMin+1):AgeMonthMax) loop and put the contents in a function:
generate_outcome <- function(x) {
  AgeMonthMin <- min(x$AgeMonths, na.rm = TRUE)
  AgeMonthMax <- max(x$AgeMonths, na.rm = TRUE)
  for (i in 2:(AgeMonthMax - AgeMonthMin + 1)){
    x$xb[i] <- x$xbsmall[i] + 4.47 * x$Outcome[i - 1]
    x$Outcome[i] <- ifelse(x$xb[i] > -x$error[i], 1, 0)
  }
  x
}
where x is a dataframe for one person. This allows us to simplify the panel$id==i & panel$AgeMonths==t construct. Now we can just do
out <- lapply(split(panel, panel$id), generate_outcome)
out <- do.call(rbind, out)
and all.equal(panel$Outcome, out$Outcome) returns TRUE. Computing 100 persons took 1.8 seconds using this method, compared to 1.5 minutes in the original code.
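If the remaining per-row loop is still too slow, the month-to-month recursion can also be expressed with Reduce(), carrying last month's Outcome along as the accumulated state. This is a sketch under the same assumptions as generate_outcome() above (rows within each person are already ordered by AgeMonths with no gaps); generate_outcome_reduce is a hypothetical name:
generate_outcome_reduce <- function(p) {
  n <- nrow(p)
  # Outcome[i] depends on Outcome[i - 1] only through the 4.47 shift,
  # so accumulate it month by month
  out <- Reduce(function(prev, i) as.numeric(p$xbsmall[i] + 4.47 * prev > -p$error[i]),
                2:n, init = p$Outcome[1], accumulate = TRUE)
  p$Outcome <- out
  # xb for months 2..n, leaving the first month at its placeholder value
  p$xb[2:n] <- p$xbsmall[2:n] + 4.47 * out[-n]
  p
}
out2 <- do.call(rbind, lapply(split(panel, panel$id), generate_outcome_reduce))
# all.equal(out$Outcome, out2$Outcome) should again be TRUE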
Related
How can I test for time differences in the parameters of my model if I have multiple panels?
For example, I use the stock dataset from RSiena:
d <- sienaDataCreate(sienaDependent(array(c(s501, s502, s503), dim=c(50, 50, 3))))
e <- getEffects(d)
e <- includeEffects(e, transTrip)
ans <- siena07(sienaAlgorithmCreate(nsub=4, n3=1000), data=d, effects=e, batch=TRUE)
I want to see whether the transTrip effect differs from period 1 to period 2.
All that is required is to pass ans to the sienaTimeTest() function:
tt <- sienaTimeTest(ans)
summary(tt)
This provides a nice test for whether there is any time heterogeneity in the parameters.
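If the test does indicate heterogeneity, the usual follow-up in the RSiena workflow is to add a time dummy for the affected effect and re-estimate. A hedged sketch, assuming includeTimeDummy() and its timeDummy argument behave as described in the RSiena manual (check the argument names against your installed version):
# Add a period dummy for transTrip (here for period 2) and re-estimate
e2 <- includeTimeDummy(e, transTrip, timeDummy = "2")
ans2 <- siena07(sienaAlgorithmCreate(nsub = 4, n3 = 1000),
                data = d, effects = e2, batch = TRUE)
summary(ans2)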
Related
In trying to figure out which of wilcox.test (from stats) and coin::wilcox_test is better to use, I have come across two issues.
1) The W statistic given by wilcox.test is different from that of coin::wilcox_test. Here's my output:
wilcox_test:
Exact Wilcoxon Mann-Whitney Rank Sum Test
data: data$variableX by data$group (yes, no)
Z = -0.7636, p-value = 0.4489
alternative hypothesis: true mu is not equal to 0
wilcox.test:
Wilcoxon rank sum test with continuity correction
data: data$variable by data$group
W = 677.5, p-value = 0.448
alternative hypothesis: true location shift is not equal to 0
I'm aware that there are actually two values for W and that the smaller one is usually reported. When wilcox.test is used with a comma (the two-vector form) instead of "~", I can get the other value, but this comes up as W = 834.5. From what I understand, coin::statistic() can return three different statistics ("linear", "standardized", and "test"), where "linear" is the normal W and "standardized" is just the W converted to a z-score. None of these match up to the W I get from wilcox.test, though (linear = 1055.5, standardized = 0.7636288, test = -0.7636288). Any ideas what's going on?
2) I like the options in wilcox_test for "distribution" and "ties.method", but it seems that you cannot apply a continuity correction like in wilcox.test. Am I right?
I encountered the same issue when trying to apply the Wendt formula to compute effect sizes using the coin package, and obtained aberrant r values because the linear statistic output by wilcox_test() is unadjusted.
A great explanation is already given here, so I will simply address how to obtain adjusted U statistics with the wilcox_test() function. Let's use the following data frame:
d <- data.frame(x = c(rnorm(n = 60, mean = 10, sd = 5),
                      rnorm(n = 30, mean = 16, sd = 5)),
                # make g an explicit factor, which coin::wilcox_test() expects
                g = factor(c(rep("a", times = 60), rep("b", times = 30))))
We can perform identical tests with wilcox.test() and wilcox_test():
library(coin)

w1 <- wilcox.test(formula = x ~ g, data = d)
w2 <- wilcox_test(formula = x ~ g, data = d)
Which will output two distinct statistics:
> w1$statistic
W
321
> w2@statistic@linearstatistic
[1] 2151
The values are indeed totally different (albeit the tests are equivalent).
To obtain the U statistic identical to that of wilcox.test(), you need to subtract from wilcox_test()'s output statistic the minimal value that the sum of the ranks of the reference sample can take, which is n1*(n1+1)/2.
Both commands take the first level in the factor of your grouping variable g as reference (which will by default be alphabetically ordered).
Then you can compute the smallest sum of the ranks possible for the reference sample:
n1 <- table(w2@statistic@x)[1]
And
w2@statistic@linearstatistic - n1*(n1+1)/2 == w1$statistic
should return TRUE.
Voilà.
It seems that one is reporting the Mann-Whitney U statistic and the other the Wilcoxon rank-sum statistic, which is defined in many different ways in the literature. The tests are essentially equivalent, so just look at the p-value. If you want the continuity correction in wilcox.test, just use the argument correct = TRUE.
Check https://stats.stackexchange.com/questions/79843/is-the-w-statistic-outputted-by-wilcox-test-in-r-the-same-as-the-u-statistic
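For completeness, a small sketch reusing the simulated data frame d from the earlier answer, showing where the continuity correction actually applies (correct = TRUE is wilcox.test()'s default and only matters when the normal approximation is used):
library(coin)
# Force the normal approximation so the continuity correction
# (correct = TRUE, the default) comes into play
wilcox.test(x ~ g, data = d, exact = FALSE, correct = TRUE)
# coin::wilcox_test with an asymptotic null distribution; it exposes
# no continuity-correction argument (note that g must be a factor)
wilcox_test(x ~ g, data = d, distribution = "asymptotic")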