I was wondering if the Kappa Statistic metric provided by WEKA is an inter-annotator agreement metric.
Is it similar to Cohen's Kappa or Fleiss' Kappa?
This slide is from the chapter 5 slides for Witten, et al.'s textbook.
https://www.cs.waikato.ac.nz/ml/weka/book.html
(Because I have modified many of the slides, the slide number will be different.)
I'm using it in my research. It is Cohen's Kappa.
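For reference, the kappa WEKA reports is computed from the confusion matrix of actual vs. predicted classes with the usual Cohen's kappa formula. A minimal sketch in R, with made-up counts purely for illustration:

# Toy confusion matrix: actual classes in rows, predicted classes in columns
conf <- matrix(c(50, 10,
                  5, 35), nrow = 2, byrow = TRUE)

po <- sum(diag(conf)) / sum(conf)                       # observed agreement
pe <- sum(rowSums(conf) * colSums(conf)) / sum(conf)^2  # chance agreement
(po - pe) / (1 - pe)                                    # Cohen's kappa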
I am trying to find influential observations in my logistic regression. Particularly, I am trying to plot Pregibon's delta beta statistics against the predicted probabilities to find these observations.
I could not find any package that would help me compute these statistics. Does anyone have any suggestions?
Here is more on Pregibon's delta beta statistics (basically Cook's D for logit): http://people.umass.edu/biep640w/pdf/5.%20%20Logistic%20Regression%202014.pdf
I have found the glmtoolbox package, which approximates Cook's D for GLMs, but I am not sure whether this is the correct approach.
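In case it helps, here is a rough sketch of computing delta-beta by hand from a fitted logistic glm, using the common formulation (squared Pearson residual times leverage over (1 - leverage)^2, as in the linked notes); mtcars is only stand-in data:

# Sketch: Pregibon's delta-beta computed by hand (mtcars as stand-in data)
fit <- glm(am ~ mpg, data = mtcars, family = binomial)

h     <- hatvalues(fit)                    # leverages
r     <- residuals(fit, type = "pearson")  # Pearson residuals
dbeta <- r^2 * h / (1 - h)^2               # delta-beta (Cook's D analogue for logit)

p_hat <- fitted(fit)                       # predicted probabilities
plot(p_hat, dbeta, xlab = "Predicted probability", ylab = "Pregibon's delta-beta")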
I have 5 raters who have rated 10 subjects. I've chosen to use Light's kappa to calculate inter-rater reliability because I have multiple raters. My issue is that when there is strong agreement between the raters, Light's kappa cannot be calculated due to lack of variability, and I've followed the updated post here, which suggests using the raters package in R when there is strong agreement.
My issue is that the raters package calculates Fleiss' kappa, which, from my understanding, is not suitable for inter-rater reliability where the same raters rate all subjects (as in my case). My question is: what type of kappa statistic should I be calculating in cases where there is strong agreement?
#install.packages("irr")
library(irr)
#install.packages('raters')
library(raters)
#mock dataset
rater1 <- c(1,1,1,1,1,1,1,1,0,1)
rater2 <- c(1,1,1,1,1,1,1,1,1,1)
rater3 <- c(1,1,0,1,1,0,1,1,1,1)
rater4 <- c(1,1,1,1,1,1,1,1,1,1)
rater5 <- c(1,1,1,1,1,1,0,1,1,1)
df <- data.frame(rater1, rater2, rater3, rater4, rater5)
#light's kappa
kappam.light(df)
# kappa using the raters package (df is already in memory, so data(df) is not needed)
# concordance() expects a subjects x categories table of counts rather than the
# raw ratings, so tabulate how many raters chose each category for each subject
counts <- t(apply(df, 1, function(x) table(factor(x, levels = c(0, 1)))))
concordance(counts, test = 'Normal')
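As an aside on why Light's kappa breaks down here: when two raters give identical, constant ratings (rater2 and rater4 both rated every subject 1), expected agreement is 1 and Cohen's kappa is 0/0, so that pairwise kappa, and hence the average, is undefined. This should be visible from a single pair:

# rater2 and rater4 agree perfectly but with zero variability, so kappa2()
# should return NaN, which propagates into kappam.light()'s average
kappa2(data.frame(rater2, rater4))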
I have 4 raters who have rated 10 subjects. Because I have multiple raters (and in my actual dataset these 4 raters have rated the 10 subjects on multiple variables), I've chosen to use Light's kappa to calculate inter-rater reliability. I've run the Light's kappa code shown below and included an example of my data.
My question is: why is the resulting kappa value (kappa = 0.545) fairly low even though the raters agree on almost all ratings? I'm not sure if there is some other way to calculate inter-rater reliability (e.g., pairwise combinations between raters?).
Any help is appreciated.
library(irr)  # for kappam.light() and kappa2()
subjectID <- c(1,2,3,4,5,6,7,8,9,10)
rater1 <- c(3,2,3,2,2,2,2,2,2,2)
rater2 <- c(3,2,3,2,2,2,2,2,2,2)
rater3 <- c(3,2,3,2,2,2,2,2,2,2)
rater4 <- c(3,2,1,2,2,2,2,2,2,2)
df <- data.frame(subjectID, rater1, rater2, rater3, rater4)
# exclude the subjectID column, which kappam.light() would otherwise treat as a rater
kappam.light(df[, -1])
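Regarding the pairwise idea mentioned above: Light's kappa is simply the mean of the pairwise Cohen's kappas, so listing them with irr::kappa2() shows where the averaged value comes from. A small sketch against the df defined above:

# Pairwise Cohen's kappas; Light's kappa is their mean
ratings  <- df[, -1]                     # rater columns only
pairs    <- combn(ncol(ratings), 2)      # all rater pairs
pairwise <- apply(pairs, 2, function(p) kappa2(ratings[, p])$value)
data.frame(rater_a = pairs[1, ], rater_b = pairs[2, ], kappa = pairwise)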
I have trouble interpreting the result I get from the segregation.test method in spatstat. I have three different point patterns A, B, and C, and I want to show that C and B are correlated whereas A and B are not. You can see the kernel estimates of intensity in this picture:
But when I compute this in R with the spatstat package I always get the same p-value, although the test statistic T is different. How is this possible? What does the test statistic T mean in this context? And why do I get exactly the same p-value?
I hope you can help me figure out what I did wrong with this Monte Carlo test.
The meaning of the test statistic T is clearly explained in the help file. Did you look at it?
?segregation.test
Under the null hypothesis of no segregation in the Monte Carlo test, the data pattern and the simulated patterns are exchangeable. The p-value is calculated from the rank of the observed pattern's test statistic out of the total number of patterns (observed plus simulated). In both cases you have presented, the observed data had the most extreme segregation statistic T, so the p-value is 1/26 = 0.03846.
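For illustration only, here is what such a call looks like; 'ants' is just a marked point pattern that ships with spatstat, not the asker's data. With nsim = 25 the smallest attainable p-value is 1/(25 + 1) = 0.03846:

library(spatstat)
# 'ants' is a bundled point pattern with two species as marks; with nsim = 25
# simulated relabellings, the most extreme observed T gives p = 1/26
segregation.test(ants, nsim = 25)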
To understand the details, look at the help file mentioned above and Chapters 10 and 14 of the spatstat book. (Unfortunately none of these are free sample chapters.)
Edit: The test statistic T is a measure of the degree of segregation. If the points are randomly labeled it tends to be close to 0 and if the marks are very well separated it tends to be numerically "large". Since there is no notion of "large" the Monte Carlo p-value is used to judge whether the observed T is so large that we should reject the null hypothesis of random labeling.
I want to know the goodness of fit when fitting a power law distribution in R using the poweRlaw package.
After estimate_xmin(), I had a p-value of 0.04614726, but bootstrap_p() returns another p-value of 0.
So why do these two p-values differ? And how can I judge whether it is a power law distribution?
Here is the plot produced when using poweRlaw for fitting: [poweRlaw fitting result]
You're getting a bit confused. One of the statistics that estimate_xmin returns is the Kolmogorov-Smirnov statistic (as described in Clauset, Shalizi, and Newman (2009)). This statistic is used to estimate the best cut-off value for your model, i.e. xmin. However, it doesn't tell you anything about the model fit.
Assessing model suitability is where the bootstrap_p() function comes in: it returns a p-value for the hypothesis that the data are consistent with a power law.
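For illustration, a minimal sketch of that workflow using the 'moby' word-frequency data that ships with poweRlaw (a stand-in for the asker's data):

library(poweRlaw)
data("moby")                      # word-frequency counts bundled with the package

m <- displ$new(moby)              # discrete power-law model
est <- estimate_xmin(m)           # est$gof is the KS distance, not a p-value
m$setXmin(est)

bs <- bootstrap_p(m, no_of_sims = 100, threads = 2)
bs$p                              # p-value: large values mean a power law is plausible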