OptimalCutoff Youden index calculation - r

After calculating the ROC curve for a dichotomous variable (a vs b), I want to calculate the optimal cutoff value to differentiate the two groups. The Youden index (J = sensitivity + specificity - 1) picks the cutoff that maximizes the sum of sensitivity and specificity.
Apparently, the package "OptimalCutpoints" should be able to do it. However, I get this strange error. Code inserted below:
library(pROC)
library(OptimalCutpoints)
df <- structure(list(value = c(1945.523629, 2095.549323, 2066.585153,
2445.878083, 2112.252632, 2115.92955, 2000.285032, 2224.611905,
1616.534694, 1668.017699, 1475.980978, 1940.849817, 1716.666667,
2153.284314, 2063.353635, 2163.070313, 1856.319149, 1499.986928,
2240.440449, 1869.083916, 1807.196078, 2025.603604, 1638.22973,
1781.602941, 2014.013809, 1906.027356, 2033.148718, 1923.403162,
1687.107744, 2632.280305, 1774.073084, 2196.162393, 2164.108659,
2055.031216, 2229.501425, 1273.872576, 2224.126126, 2006.858974,
1956.601942, 1808.214521, 1535.387136, 1382.15, 1596.69693, 1779.477273,
1577.174699, 1908.321526, 1833.124454, 1679.492978, 1777.31114,
1988.249023, 1736.75, 1985.68521, 1821.025974, 1745.325862, 1805.640777,
2326.821229, 1858.558824, 2025.622727, 2197.781321, 1475.685446,
2000.906423, 1714.749573, 1436.529412, 1981.15572, 1939.612779,
2007.679335, 2029.189536, 1644.298246, 1824.697342, 2281.990385,
2131.331776, 1143.722714, 1784.578076, 2143.131579, 982.4908457,
2217.021592, 1799.512346, 526.7047753, 1613.25, 951.9103079,
1006.241888, 1146.276835, 1651.474138, 1568.484778, 1938.867704,
792.5410822, 1602.037383, 1244.281863, 957.5739437, 819.6116071,
879.2128326, 1189.638632, 775.5525292, 1148.193333, 1130.812183,
902.34, 994.3302961), type = c("a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b",
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"
)), .Names = c("value", "type"), row.names = c(NA, -97L), class = "data.frame")
rocobj <- plot.roc(df$type, df$value, percent = TRUE, main="ROC", col="#1c61b6", add=FALSE)
optimal.cutpoint.Youden <- optimal.cutpoints(X = "value", status = "type", tag.healthy = 0, methods = "Youden",
data = df, pop.prev = NULL,
control = control.cutpoints(), ci.fit = FALSE, conf.level = 0.95, trace = FALSE)
summary(optimal.cutpoint.Youden)
plot(optimal.cutpoint.Youden)
Error: There are no healthy subjects in your dataset. Please review data and
variables. Prevalence must be a value higher than 0 and lower than 1.
I am probably missing something very obvious here. I tried to modify the code based on the package help file, but I cannot get rid of the error.
Thank you very much and my apologies for my R "skills"
PS: I understand the limitations of defining an "optimal cutoff" because it depends on how important your sensitivity is versus your specificity etc. I just want to have an idea of what value we would get using this technique.

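The problem is how you have defined the tag.healthy argument. It must be one of the values that actually occur in your status column, i.e. "a" or "b", but you have set it to 0. Because no row has type equal to 0, the function concludes that there are no healthy subjects.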
Hope this helps.
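As a cross-check on whatever the package reports, the Youden index can also be computed by hand in base R. The following is only a minimal sketch: the toy data, the helper name youden_cutoff and the assumption that higher values indicate the "positive" group are all illustrative and not taken from the question.

```r
# Toy data (hypothetical values, not the data from the question)
toy <- data.frame(
  value = c(10, 12, 14, 15, 18, 20, 3, 5, 6, 8, 9, 13),
  type  = c(rep("a", 6), rep("b", 6))
)

# Hypothetical helper: scan every observed value as a candidate cutoff and
# return the one with the largest J = sensitivity + specificity - 1.
# Assumes that values >= cutoff are classified as the positive group.
youden_cutoff <- function(value, type, positive) {
  cuts   <- sort(unique(value))
  is_pos <- type == positive
  j <- vapply(cuts, function(cut) {
    pred_pos <- value >= cut
    sens <- sum(pred_pos & is_pos) / sum(is_pos)
    spec <- sum(!pred_pos & !is_pos) / sum(!is_pos)
    sens + spec - 1
  }, numeric(1))
  best <- which.max(j)
  list(cutoff = cuts[best], youden = j[best])
}

res <- youden_cutoff(toy$value, toy$type, positive = "a")
print(res)
```

If the positive group has lower values instead, the comparison inside the helper would need to be flipped.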


Creating a treemap, based on count, using R

I would like to create a treemap based on the count of "names", but I am not sure how to do so. I would appreciate your help with this.
names <- c("A", "B", "B", "C", "D", "A", "A", "A", "A", "G", "B", "F", "F", "H")
names <- names %>% as.factor()
ggplot(names, aes(area= names, fill= names) + geom_treemap()
Many thanks
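library(dplyr)
library(ggplot2)
library(treemapify)  # provides geom_treemap()
names <- c("A", "B", "B", "C", "D", "A", "A", "A", "A", "G", "B", "F", "F", "H")
names <- data.frame(names)
# count() adds a column n holding the number of occurrences of each name
names <- names %>%
  count(names)
ggplot(names, aes(area = n, fill = names)) + geom_treemap()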

Adjust the multiple fills(color) of different label regions

Sorry to bother you again.
@teunbrand answered my question yesterday, but when I applied the answer to my real data it did not work.
Here is my earlier question on Stack Overflow: Can I adjust the fill(color) of different label regions when using ggh4x package
@teunbrand created a function: assign_strip_colours <- function(gt, index, colours){…}
I don't know what is wrong with my real data and code. There are 42 regions that need to be filled with different colors.
gt <- assign_strip_colours(gt, 1:42,rainbow(42))
Warning message: In gt$grobs[is_strips] <- strips : number of items to replace is not a multiple of replacement length
Does something need to be adjusted in assign_strip_colours <- function(gt, index, colours){…}?
I'm really new to ggplotGrob, so I would appreciate your help. Thanks.
Sample data and code:
structure(list(Name = 1:71, Disease = 72:142, Organ = c("A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A"), fill = c("a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a"
), mean =..., row.names = c(NA, 71L), class = "data.frame")
p1<-ggplot(data = data, aes(Name,mean, label = Name, fill=Organ)) +
geom_bar(position="dodge2", stat="identity",width = 0.85,color="black") +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),position = position_dodge(0.95), width = .2) +
# scale_alpha_manual(values = datamean_sd$Alpha) +
# scale_color_manual(name = "Organ", values = c("A"="#f15a24", "B"="#00FF00","C"="#7570B3","D"="#FF00FF","E"="#FFFF33","F"="#00F5FF","G"="#666666","H"="#7FC97F","I"="#BEAED4","J"="#A6D854"))+
# guides(
# colour = guide_legend(title.position = "right")
# )+
facet_nested(.~Organ+Disease, scales = "free_x", space = "free_x",switch = "x")+
## facet_wrap(strip.position="bottom") +
labs(title = "123", x = NULL, y = "value") +
rotate_x_text(angle = 45)+
scale_fill_manual(name = "Organ",values = unique(datamean_sd$Organ_fill))
p1
####
gt <- ggplotGrob(p1)
###############
assign_strip_colours <- function(gt, index, colours) {
if (length(index) != length(colours))
stop()
# Decide which strips to recolour, here: the first 3
is_strips <- which(startsWith(gt$layout$name, "strip-b"))[index]
# Extract strips
strips <- gt$grobs[is_strips]
# Loop over strips
strips <- mapply(function(strip, colour) {
# Find actual strip
is_strip <- strip$layout$name == "strip"
grob <- strip$grobs[is_strip][[1]]
# Find rectangle
is_rect <- which(vapply(grob$children, inherits, logical(1), "rect"))
# Change colour
grob$children[[is_rect]]$gp$fill <- colour
# Put back into strip
strip$grobs[is_strip][[1]] <- grob
return(strip)
}, strip = strips, colour = colours)
# Put strips back into gtable
gt$grobs[is_strips] <- strips
return(gt)
}
gt <- assign_strip_colours(gt, 1:42,rainbow(42))
grid::grid.newpage(); grid::grid.draw(gt)
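My bad, there should have been a SIMPLIFY = FALSE in the mapply() call, which I forgot earlier. Without it, mapply() tries to simplify the returned strips into a matrix, so the result no longer has one element per strip and the assignment back into gt$grobs triggers the warning.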
gt <- ggplotGrob(p1)
assign_strip_colours <- function(gt, index, colours) {
  if (length(index) != length(colours))
    stop("`index` and `colours` must have the same length")
  # Decide which strips to recolour: the bottom strips selected by `index`
  is_strips <- which(startsWith(gt$layout$name, "strip-b"))[index]
  # Extract strips
  strips <- gt$grobs[is_strips]
  # Loop over strips
  strips <- mapply(function(strip, colour) {
    # Find actual strip
    is_strip <- strip$layout$name == "strip"
    grob <- strip$grobs[is_strip][[1]]
    # Find rectangle
    is_rect <- which(vapply(grob$children, inherits, logical(1), "rect"))
    # Change colour
    grob$children[[is_rect]]$gp$fill <- colour
    # Put back into strip
    strip$grobs[is_strip][[1]] <- grob
    return(strip)
  }, strip = strips, colour = colours, SIMPLIFY = FALSE)  # keep one list element per strip
  # Put strips back into gtable
  gt$grobs[is_strips] <- strips
  return(gt)
}
gt <- assign_strip_colours(gt, 1:42,rainbow(42))
grid::grid.newpage(); grid::grid.draw(gt)
Created on 2021-04-11 by the reprex package (v1.0.0)
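To see why SIMPLIFY = FALSE matters, here is a small base R sketch that uses plain lists as stand-ins for the strip gtables. The objects strips, colours and recolour below are hypothetical and only mimic the shape of the problem; they are not part of ggh4x or gtable.

```r
# Stand-ins for two strips: each is a list with two elements
strips  <- list(list(name = "s1", fill = "white"),
                list(name = "s2", fill = "white"))
colours <- c("red", "blue")

# Hypothetical helper that changes one field and returns the whole "strip"
recolour <- function(strip, colour) {
  strip$fill <- colour
  strip
}

# Default SIMPLIFY = TRUE: equal-length results are collapsed into a matrix
simplified <- mapply(recolour, strip = strips, colour = colours)

# SIMPLIFY = FALSE: the result stays a list with one element per strip
kept <- mapply(recolour, strip = strips, colour = colours, SIMPLIFY = FALSE)

print(dim(simplified))  # a matrix, not a list of strips
print(length(kept))     # one entry per strip
```

Assigning the simplified matrix into a slot that expects one element per strip is what produces a length mismatch like the one in the warning.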
Data / plot construction:
library(ggplot2)
library(ggh4x)
data <- [Censored upon request]
p1<-ggplot(data = data, aes(Name,mean, label = Name, fill=Organ)) +
geom_bar(position="dodge2", stat="identity",width = 0.85,color="black") +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),position = position_dodge(0.95), width = .2) +
facet_nested(.~Organ+Disease, scales = "free_x", space = "free_x",switch = "x")+
theme_classic() +
theme(legend.position = "bottom",
legend.box = "horizontal",
plot.title = element_text(hjust = 0.5),
plot.margin = unit(c(5, 10, 20, 7), "mm"),
strip.background = element_rect(colour="black", fill="white"),
strip.text.x = element_text(size = 6, angle=0),
axis.text.x=element_text(size=8),
strip.placement = "outside"
) +
labs(title = "123", x = NULL, y = "value")

Frequency of a word in a list [duplicate]

How do I count how many times each word appears in R and return the one that appears most often?
a <- list(c("A", "A", "A", "A", "B", "B", "A", "B", "C", "C", "C", "A"))
The output should be "A".
Not sure if you really have a list or vector, but with a vector
a <-c("A", "A", "A", "A", "B", "B", "A", "B", "C", "C", "C", "A")
you can do
names(sort(table(a), decreasing=TRUE))[1]
to get the most common value
You can use sort with the decreasing=TRUE flag:
sort(table(list(c("A", "A", "A", "A", "B", "B", "A", "B", "C", "C", "C", "A"))),decreasing=TRUE)[1]
Output:
A
6
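If the input really is a list, as in the question, one possible variation is to flatten it first with unlist() and keep every value that reaches the maximum count, so that ties are not silently dropped. This is only a sketch; the variable names counts and most_common are illustrative.

```r
a <- list(c("A", "A", "A", "A", "B", "B", "A", "B", "C", "C", "C", "A"))

# Flatten the list into a character vector, then tabulate
counts <- table(unlist(a))

# Keep all values tied for the highest count (here there is only one)
most_common <- names(counts)[counts == max(counts)]
print(most_common)
```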

Efficient subsetting of a data.frame based on another jagged data.frame

I'm working on a project where I need to repeatedly subset a data.frame based on different combinations of attributes. Right now I'm subsetting the data.frame using the merge function as I don't know what the attributes input will be at run time, and this works. However, I'm wondering if there is a faster way to create the subsets.
require(data.table)
df <- structure(list(att1 = c("e", "a", "c", "a", "d", "e", "a", "d", "b", "a", "c", "a", "b", "e", "e", "c", "d", "d", "a", "e", "b"),
att2 = c("b", "d", "c", "a", "e", "c", "e", "d", "e", "b", "e", "e", "c", "e", "a", "a", "e", "c", "b", "b", "d"),
att3 = c("c", "b", "e", "b", "d", "d", "d", "c", "c", "d", "e", "a", "d", "c", "e", "a", "d", "e", "d", "a", "e"),
att4 = c("c", "a", "b", "a", "e", "c", "a", "a", "b", "a", "a", "e", "c", "d", "b", "e", "b", "d", "d", "b", "e")),
.Names = c("att1", "att2", "att3", "att4"), class = "data.frame", row.names = c(NA, -21L))
#create combinations of attributes
#attributes to search through
cnames <- colnames(df)
att_combos <- data.table()
for(i in 2:length(cnames)){
combos <- combn(cnames, i)
for(x in 1:ncol(combos)){
df_sub <- unique(df[,combos[1:nrow(combos), x]])
att_combos <- rbind(att_combos, df_sub, fill = T)
}
}
rm(df_sub, i, x, combos, cnames)
for(i in 1:nrow(att_combos)){
att_sub <- att_combos[i, ]
att_sub <- att_sub[, is.na(att_sub)==F, with = F]
#need to subset data.frame here - very slow on large data.frames
#anyway to speed this up?
df_subset_for_analysis <- merge(df, att_sub)
}
I would use data.table keys on the columns you want to subset on, and then generate a data.table (at runtime) with the combinations you are interested in, and then merge the two.
Here is an example with a single combination of attributes (simple_combinations) and one with multiple combinations of attributes (multiple_combinations):
require(data.table)
df <- structure(list(att1 = c("e", "a", "c", "a", "d", "e", "a", "d", "b", "a", "c", "a", "b", "e", "e", "c", "d", "d", "a", "e", "b"),
att2 = c("b", "d", "c", "a", "e", "c", "e", "d", "e", "b", "e", "e", "c", "e", "a", "a", "e", "c", "b", "b", "d"),
att3 = c("c", "b", "e", "b", "d", "d", "d", "c", "c", "d", "e", "a", "d", "c", "e", "a", "d", "e", "d", "a", "e"),
att4 = c("c", "a", "b", "a", "e", "c", "a", "a", "b", "a", "a", "e", "c", "d", "b", "e", "b", "d", "d", "b", "e")),
.Names = c("att1", "att2", "att3", "att4"), class = "data.frame", row.names = c(NA, -21L))
# Convert to data.table
dt <- data.table(df)
# Set key on the columns used for "subsetting"
setkey(dt, att1, att2, att3, att4)
# Simple subset on a single set of attributes
simple_combinations <- data.table(att1 = "d", att2 = "e", att3 = "d", att4 = "e")
setkey(simple_combinations, att1, att2, att3, att4)
# Merge to generate simple output subset (simple_combinations of att present in dt)
simple_subset <- merge(dt, simple_combinations)
# Complex (multiple) sets of attributes
multiple_combinations <- data.table(expand.grid(att1=c("d"), att2=c("c", "d", "e"),
att3 = c("d"), att4 = c("b", "e")))
setkey(multiple_combinations, att1, att2, att3, att4)
# Merge to generate output subset (multiple_combinations of att present in dt)
multiple_subset <- merge(dt, multiple_combinations)
The output is in simple_subset and multiple_subset.
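When the set of attribute columns is only known at run time, as in the question, a data.table join with on = may be an alternative to calling merge(), since the join columns can be passed as a character vector. The sketch below uses small made-up tables; dt, att_sub and subset_dt are illustrative names, and nomatch = NULL assumes a reasonably recent data.table version.

```r
library(data.table)

# Small made-up table standing in for the real data
dt <- data.table(att1 = c("a", "a", "b", "d"),
                 att2 = c("x", "y", "x", "y"),
                 att3 = c("p", "q", "p", "q"))

# One combination of attributes; its columns are not known in advance
att_sub <- data.table(att1 = "a", att2 = "y")

# Join on whatever columns att_sub happens to have;
# nomatch = NULL drops combinations that do not occur in dt
subset_dt <- dt[att_sub, on = names(att_sub), nomatch = NULL]
print(subset_dt)
```

Whether this is actually faster than merge() depends on the data, so it is worth timing both on a realistic sample.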

R: How do I produce a state transition matrix from a vector that represent states over discrete time steps? [closed]

I'm using R and need some help.
Background:
I video-recorded participants in a behavioural study. I then coded different aspects of their behaviour from the videos so that I now have one data frame per participant. The df has many unordered factors, each representing the discrete temporal sequence of the participant's states for one specific behavioural dimension (e.g. gaze direction). Each row holds the value for one second for that dimension. To simplify, let's assume one such vector might look like this:
p01.gaze = factor(x = c("a", "b", "b", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "a", "d", "d", "d", "a", "a", "a", "e", "e", "d", "e", "e", "a","a", "e", "a", "a", "a", "e", "e", "e", "e", "e", "e", "e", "e", "e", "e", "d", "b", "b", "b", "d", "d", "d", "d", "d", "d", "d", "b", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "d", "a", "b", "a", "d", "d", "a", "c", "e", "e", "e", "c", "c", "a", "e", "e", "a", "a", "a"))
Problem:
For each vector I want to construct a 'state transition matrix' by calculating the frequency of transitions (using counts or alternatively proportion) between all possible pairs of states. So the matrix would be:
p01.gaze.m = matrix(nrow=5, ncol=5, dimnames = list(c("a", "b", "c", "d", "e"), c("a", "b", "c", "d", "e")))
NOTES:
1) I'm new to programming and couldn't find the right functions. I did search thoroughly but didn't find appropriate solutions so any help would be welcome.
2) The function markovchainFit (package markovchain) sounded tempting but I don't think I want/need to fit a Markov Model to my data (because of implications and commitments I don't want to make).
3) The function count.transitions (package RDS) also sounded tempting but I couldn't figure out how to coerce my data into rds.data object.
Many thanks =]
moe
Use the markovchain package for your #1 & #3.
Here is some sample code for your data that shows counting state transitions, and then graphing the transition probability matrix:
library(markovchain)
p01.gaze = factor(x = c("a", "b", "b", "a", "a", "a",
"a", "a", "a", "a", "a", "a",
"a", "b", "b", "a", "d", "d",
"d", "a", "a", "a", "e", "e",
"d", "e", "e", "a","a", "e",
"a", "a", "a", "e", "e", "e",
"e", "e", "e", "e", "e", "e",
"e", "d", "b", "b", "b", "d",
"d", "d", "d", "d", "d", "d",
"b", "d", "d", "d", "d", "d",
"d", "d", "d", "d", "d", "d",
"d", "d", "d", "d", "d", "d",
"d", "d", "d", "d", "d", "d",
"d", "d", "d", "d", "d", "a",
"b", "a", "d", "d", "a", "c",
"e", "e", "e", "c", "c", "a",
"e", "e", "a", "a", "a"))
p01_gaze_tpm = createSequenceMatrix(p01.gaze, toRowProbs = TRUE)
p01_gaze_mc = as(p01_gaze_tpm, "markovchain")
plot(p01_gaze_mc, edge.arrow.size = 0.2)
This gives the following graph:
Once you upload sample data for your second problem, I will update my answer to address that as well.
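If fitting or constructing a Markov chain object is not wanted, the transition counts can also be obtained in base R by cross-tabulating each state with the state that follows it. This is a minimal sketch on a shortened, made-up sequence; the variable names are illustrative.

```r
# Shortened made-up sequence; fixing the levels keeps unused states in the matrix
gaze <- factor(c("a", "b", "b", "a", "a", "c"), levels = c("a", "b", "c"))

from <- head(gaze, -1)  # state at time t
to   <- tail(gaze, -1)  # state at time t + 1

# Rows: state left, columns: state entered
counts <- table(from = from, to = to)

# Row-wise proportions; rows with no outgoing transitions become NaN
probs <- prop.table(counts, margin = 1)

print(counts)
print(probs)
```

Applied to p01.gaze, the same two calls should give a 5 x 5 table over the states a to e, provided all five levels are present in the factor.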
