This is a part of my code:
a <- data.frame(X1 = c(9, 9, 9, 8, 9, 9, 8, 9, 8, 7),
X2 = c(8, 8, 6, 8, 6, 8, 9, 8, 8, 8),
X3= c(-3, -3, -3, -3, -3, -3, -2, -1, -3, -3),
X4= c(-5, -7, -5, -7, -7, -7, -7, -7, -7, -5),
X5= c(1, 1, -1, 1, 1, 1, 1, 1, 1, 1),
X6= c(9, 11, 11, 11, 11, 11, 9, 11, 11, 10),
X7= c(7, 8, 8, 7, 8, 8, 8, 8, 8, 6),
X8= c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0),
X9= c(25, 25, 25, 24, 25, 25, 24, 25, 25, 24))
cov=cov(a)
cov[5,1:5]<-0
cov[1:5,5]<-0
cov[5,5]<-1
cov
I am trying to write this part of the code in a more concise, professional way:
cov[5,1:5]<-0
cov[1:5,5]<-0
cov[5,5]<-1
I tried to do something like:
cov[5,1:5] & cov[1:5,5]<-0
but it does not work.
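cov[5, 1:4] <- cov[1:4, 5] <- 0
cov[5, 5] <- 1
# Alternatively, zero 1:5 in one go; this also zeroes cov[5, 5], so reset it to 1 afterwards
cov[5, 1:5] <- cov[1:5, 5] <- 0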
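A minimal sketch of why the chained form works: in R an assignment returns its right-hand side, so `x[i] <- y[j] <- 0` first assigns 0 to `y[j]` and then to `x[i]`. The `&` operator, by contrast, is logical AND and does not combine assignment targets. The 9 x 9 matrix `m` and the index `idx` below are stand-ins chosen for illustration, not the covariance matrix from the question.

```r
# Stand-in 9 x 9 matrix (an assumption; the question uses cov(a))
m <- matrix(seq_len(81), nrow = 9)
idx <- 1:5

# Chained assignment: the rightmost assignment runs first and its value (0)
# is then assigned to the row slice as well
m[5, idx] <- m[idx, 5] <- 0

# The diagonal element was zeroed too, so restore it
m[5, 5] <- 1

# Show the top-left corner; entries outside row/column 5 (and beyond idx) are untouched
m[1:6, 1:6]
```

The same pattern applies to the matrix in the question by swapping `m` for the covariance matrix.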
I use the ggplot2 and plotROC packages to draw ROC curves, but one of the curves runs in the opposite direction. How can I modify the plot so that both curves run in the same direction?
My code is as follows:
library("ggplot2")
library("plotROC")
Response <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0,
0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0)
len <- c(4, 7, 8, 10, 4, 10, 10, 10, 10, 10, 9, 8, 7, 7, 5, 4, 4, 4, 3, 3, 2,
2, 9, 11, 0.5, 10, 8, 5, 4, 10, 10, 9, 8, 8, 7, 5, 1, 12, 10, 11, 9,
10, 7, 10, 7, 12, 10, 11, 10, 4, 12, 7, 12, 14, 10, 9, 9, 7, 10, 2,
12, 12, 10, 16, 10, 9, 15, 10, 9, 5, 12, 12, 11, 6, 9.5, 9, 11, 3)
gc <- c(15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13,
13, 12, 12, 11, 10, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7,
7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 4, 3, 3, 3,3)
d1 <- data.frame(Response = Response, Predictor = len, group = "len")
d2 <- data.frame(Response = Response, Predictor = gc, group = "gc")
mydata <- rbind(d1, d2)
ggplot(mydata, aes(d = Response, m = Predictor, color = group, linetype = group, shape = group)) +
  geom_roc(n.cut = 0, show.legend = TRUE, labels = FALSE, size = 0.6) +
  geom_abline(size = 0.7, color = "grey", linetype = "dashed") +
  xlab("1 - Specificity") +
  ylab("Sensitivity")
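One plausible cause, offered as an assumption rather than a confirmed diagnosis: a ROC curve falls below the diagonal when larger predictor values go with the 0 class, and in the data above the 1 responses appear mostly among the low `gc` values. Negating such a predictor reverses its ordering and turns an AUC of A into 1 - A. The sketch below checks this with a rank-based AUC in base R; the helper `auc` and the toy vectors `response` and `score` are hypothetical and not part of plotROC.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive has a higher score than a randomly chosen negative
auc <- function(response, predictor) {
  r  <- rank(predictor)          # average ranks handle ties
  n1 <- sum(response == 1)
  n0 <- sum(response == 0)
  (sum(r[response == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical toy data in which larger scores go with response 0
response <- c(0, 0, 0, 1, 0, 1, 1, 1)
score    <- c(9, 8, 7, 6, 5, 4, 3, 2)

auc_raw     <- auc(response, score)    # below 0.5: curve under the diagonal
auc_negated <- auc(response, -score)   # above 0.5: curve over the diagonal
c(raw = auc_raw, negated = auc_negated)
```

For the data in the question this would amount to using `Predictor = -gc` when building `d2`, which flips that curve to the other side of the diagonal without changing how well `gc` separates the classes.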
I have equivalent data from 2019 and 2020. The proportions of diagnoses in 2020 look different from those in 2019, but I'd like to:
a) statistically test whether the populations are different.
b) determine which categories differ the most.
I've worked out I can do 'a' using:
chisq.test(diagnosis$count.2020, diagnosis$count.2019)
I don't know how to find out which categories differ the most between 2020 and 2019. Any help would be amazing, thanks!
diagnosis <- data.frame(mf_label = c("Audiovestibular", "Autonomic", "Cardiovascular",
"Cerebral palsy", "Cerebrovascular", "COVID", "Cranial nerves",
"CSF disorders", "Developmental", "Epilepsy and consciousness",
"Functional", "Head injury", "Headache", "Hearing loss", "Infection",
"Maxillofacial", "Movement disorders", "Muscle and NMJ", "Musculoskeletal",
"Myelopathy", "Neurodegenerative", "Neuroinflammatory", "Peripheral nerve",
"Plexopathy", "Psychiatric", "Radiculopathy", "Spinal", "Syncope",
"Toxic and nutritional", "Tumour", "Visual system"),
count.2019 = c(5, 0, 1, 1, 2, 0, 4, 3, 0, 7, 4, 0, 24, 0, 0, 2, 22, 3, 3, 0, 3, 18, 12, 0, 0, 2, 2, 0, 1, 4, 0),
count.2020 = c(5, 1, 1, 3, 28, 9, 11, 13, 1, 13, 30, 5, 68, 1, 1, 2, 57, 14, 5, 8, 16, 37, 27, 3, 13, 17, 3, 1, 8, 13, 11))
Your chi-squared test is not correct. You need to provide the counts as a table or matrix, not as two separate vectors. Because the expected values are very small for about half of the cells, you need to use simulation to estimate the p-value:
results <- chisq.test(diagnosis[, 2:3], simulate.p.value = TRUE)
The overall table is barely significant at .05. The chisq.test function returns a list that includes the original data (observed), the expected values (expected), the Pearson residuals (residuals), and the standardized residuals (stdres). The categories with the largest absolute standardized residuals are the ones that contribute most to the difference. The manual page (?chisq.test) describes these components and provides some citations for more details.
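As a rough sketch of part b), the standardized residuals can be sorted by absolute size to see which categories depart most from what independence would predict. The matrix below holds only six rows copied from the question's data to keep the example short, so its results will not match the full table; the names `counts`, `res` and `ranked` are illustrative.

```r
# Six categories taken from the question's data (a subset, for brevity)
counts <- matrix(
  c( 2, 28,
    24, 68,
    22, 57,
    18, 37,
     7, 13,
     4, 30),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Cerebrovascular", "Headache", "Movement disorders",
      "Neuroinflammatory", "Epilepsy and consciousness", "Functional"),
    c("count.2019", "count.2020")
  )
)

set.seed(1)  # the simulated p-value depends on the random seed
res <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)

# Standardized residuals: one row per category, one column per year.
# With two columns the two values in a row are equal and opposite.
ranked <- res$stdres[order(-abs(res$stdres[, "count.2020"])), ]
ranked
```

Values beyond roughly plus or minus 2 are commonly read as noteworthy, although with many categories some allowance for multiple comparisons is sensible.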
I am trying to use the gsub function on my data frame. In my data frame, I have phrases like "text.Democrat17_P" and many others with different numbers. My goal is to replace phrases like this with just "DEM".
I first wanted to test the gsub function with only one row before I replace every value in the data frame. However, when I ran my script, gsub seemed to disassemble my data frame and list numbers out instead.
My result looked like this:
[1] "c(14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16)"
[2] "c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4)"
[3] "c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1)"
[4] "c(2322, 2490, 2912, 3181, 3245, 2640, 3215, 4506, 3256, 2705, 2662, 5676, 7344, 2888, 2891, 4387, 9494, 2525, 3649, 1654, 2178, 2913, 2922, 3320, 7243, 5836, 6054, 6283, 4499, 5291, 4747, 2538, 5433, 5354, 5272, 4166, 3427, 5432, 4566, 5371, 5503, 4550, 1639, 2603, 3937, 2359, 1516, 1204, 826, 916, 1039, 1738, 2077, 874, 2495, 628, 582, 872, 2179, 682, 578, 476, 2207, 1178, 859, 1345, 1223, 2014, 438, 448, 1020, 879, 1117, 271, 210, 295, 233, 172, 77, 205, 3958)"
[5] "c(\"DEM\", \"text.Demminus14_P\", \"text.Demplus9_P\", \"text.Repminus11_O\", \"text.Repminus18_O\", \"text.Demminus12_P\", \"text.Repminus4_O\", \"text.Demminus12_P\", \"text.Repplus8_O\", \"text.Demminus4_P\", \"text.Demplus9_P\", \"text.Demminus20_P\", \"text.Repplus16_O\", \"text.Repminus10_O\", \"text.Repminus13_O\", \"text.Demplus18_P\", \"text.Repplus18_O\", \"text.Demplus1_P\", \"text.Repminus15_O\", \"text.Demminus11_P\", \"text.Repplus14_O\", \"text.Demminus8_P\", \"text.Repminus18_O\", \"text.Repminus13_O\", \"text.Demminus9_P\", \n\"text.Repminus13_O\", \"text.Repminus16_O\", \"text.Demminus9_P\", \"text.Repminus1_O\", \"text.Demplus15_P\", \"DEM\", \"text.Demminus1_P\", \"text.Repplus2_O\", \"text.Demminus18_P\", \"text.Repplus14_O\", \"text.Repminus20_O\", \"text.Repplus16_O\", \"text.Demplus2_P\", \"text.Repplus10_O\", \"text.Demminus18_P\", \"text.Repplus2_O\", \"text.Demminus15_P\", \"text.Repminus6_O\", \"text.Demminus19_P\", \"text.Repminus9_O\", \"text.Repplus15_O\", \"text.Repminus15_O\", \"text.Repminus8_O\", \"text.Repplus12_O\", \"text.Demminus19_P\", \n\"text.Repplus6_O\", \"text.Demplus13_P\", \"text.Demminus14_P\", \"text.Demminus5_P\", \"text.Demminus2_P\", \"text.Repplus1_O\", \"text.Repminus18_O\", \"text.Repplus14_O\", \"text.Demplus20_P\", \"text.Repplus6_O\", \"text.Repminus16_O\", \"text.Demminus19_P\", \"text.Demplus12_P\", \"text.Demminus12_P\", \"text.Demminus10_P\", \"text.Repplus5_O\", \"text.Demplus5_P\", \"text.Repplus17_O\", \"text.Repminus13_O\", \"text.Demplus3_P\", \"text.Demminus5_P\", \"text.Repminus10_O\", \"text.Repplus6_O\", \"text.Repplus16_O\", \"text.Repminus10_O\", \n\"text.Repplus1_O\", \"text.Demminus6_P\", \"text.Repplus5_O\", \"text.Demplus3_P\", \"text.Demplus3_P\", \"text.Repminus19_P\")"
Does anyone know why this happened, and how I can get my result to look like the original data frame, with rows and columns?
This is the code that I am using:
DDMB <- DDMBehavfin[1,]
DDMB
gsub(pattern = "text.Demminus17_P", replacement = "DEM", x = DDMB)
Could this have something to do with the data type of the columns? What can I change in my gsub call so that I get a regular-looking data frame instead of a messy result like this?
I first want to tackle why my result looks odd, before using gsub to replace all of the values.
Thank you for any help.
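A likely explanation: `gsub` operates on character vectors and coerces any other input with `as.character`. A data frame is a list of columns, so each column is collapsed into a single deparsed string such as `"c(14, 14, ...)"`, which matches the output above. A minimal sketch of one way around this is to apply `gsub` column by column and assign the result back with `df[] <-`, which keeps the data frame structure. The data frame `df` and the pattern `"^text\\.Dem.*_P$"` below are hypothetical stand-ins, not the asker's actual data or rule.

```r
# Hypothetical stand-in for the asker's data frame
df <- data.frame(
  id    = c(14, 14, 16),
  label = c("text.Demminus17_P", "text.Repplus8_O", "text.Demplus9_P"),
  stringsAsFactors = FALSE
)

out <- df
# Apply gsub to character columns only; other columns pass through unchanged.
# Assigning into out[] keeps the data frame's rows, columns and names.
out[] <- lapply(df, function(col) {
  if (is.character(col)) gsub("^text\\.Dem.*_P$", "DEM", col) else col
})

out
```

If only one column needs changing, `df$label <- gsub(pattern, "DEM", df$label)` does the same job more directly.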
I have the following simple data
data <- structure(list(status = c(9, 5, 9, 10, 11, 10, 8, 6, 6, 7, 10,
10, 7, 11, 11, 7, NA, 9, 11, 9, 10, 8, 9, 10, 7, 11, 9, 10, 9,
9, 8, 9, 11, 9, 11, 7, 8, 6, 11, 10, 9, 11, 11, 10, 11, 10, 9,
11, 7, 8, 8, 9, 4, 11, 11, 8, 7, 7, 11, 11, 11, 6, 7, 11, 6,
10, 10, 9, 10, 10, 8, 8, 10, 4, 8, 5, 8, 7), statusgruppe = c(0,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA, 0, 1, 0, 1,
0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1,
1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0,
1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0)), .Names = c("status",
"statusgruppe"), class = "data.frame", row.names = c(NA, -78L
))
from that I'd like to make a histogram:
ggplot(data, aes(status))+
geom_histogram(aes(y=..density..),
binwidth=1, colour = "black",
fill="white")+
theme_bw()+
scale_x_continuous("Status", breaks = c(min(data$status, na.rm = TRUE), median(data$status, na.rm = TRUE), max(data$status, na.rm = TRUE)), labels = c("Low", "Middle", "High")) +
scale_y_continuous("Percent", labels = scales::percent)
Now I'd like the bins to take their colour according to their value, e.g. bins with a value > 9 should be dark grey and everything else light grey.
I have tried fill=statusgruppe, scale_fill_grey(breaks=9), etc., but I can't get it to work. Any ideas?
Hopefully this should get you started:
ggplot(data, aes(status, fill = ..x..))+
geom_histogram(binwidth = 1) +
scale_fill_gradient(low = "black", high = "white")
ggplot(data, aes(status, fill = ..x.. > 9))+
geom_histogram(binwidth = 1) +
scale_fill_grey()
How about using fill=..count.. or fill=I(..count..>9) right after y=..density..? You have to tinker with the legend title and labels a bit, but it gets the coloring right.
EDIT:
It seems I misunderstood your question a bit. If you want to define color based on the x-coordinate, you can use the ..x.. automatic variable similarly.
What about scale_manual? Here's a link to Hadley's site. I've used this function to set an appropriate fill colour for a boxplot. Not sure if it'll work with a histogram, though...
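The two suggestions above can be combined. The sketch below assumes a recent ggplot2 (3.3.0 or later), where `after_stat(x)` is the current spelling of `..x..` and `scale_fill_manual` fills the role of the older `scale_manual`. The data frame `d` holds a few made-up status values, and the two grey shades are arbitrary choices.

```r
library(ggplot2)

# Made-up status values standing in for the question's data
d <- data.frame(status = c(4, 5, 6, 7, 7, 8, 8, 9, 9, 9, 10, 10, 11, 11, 11))

p <- ggplot(d, aes(status)) +
  # after_stat(x) is the bin centre, so the fill depends on the bin's position
  geom_histogram(aes(fill = after_stat(x > 9)),
                 binwidth = 1, colour = "black") +
  # Map the logical result to explicit light and dark greys
  scale_fill_manual(values = c("FALSE" = "grey80", "TRUE" = "grey30"),
                    name = "Status > 9") +
  theme_bw()

p
```

With `binwidth = 1` and integer data the bin centres fall on the integers, so the bins at 10 and 11 come out dark and the rest light.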