R - Making code more professional/efficient

This is a part of my code:
a <- data.frame(X1 = c(9, 9, 9, 8, 9, 9, 8, 9, 8, 7),
X2 = c(8, 8, 6, 8, 6, 8, 9, 8, 8, 8),
X3= c(-3, -3, -3, -3, -3, -3, -2, -1, -3, -3),
X4= c(-5, -7, -5, -7, -7, -7, -7, -7, -7, -5),
X5= c(1, 1, -1, 1, 1, 1, 1, 1, 1, 1),
X6= c(9, 11, 11, 11, 11, 11, 9, 11, 11, 10),
X7= c(7, 8, 8, 7, 8, 8, 8, 8, 8, 6),
X8= c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0),
X9= c(25, 25, 25, 24, 25, 25, 24, 25, 25, 24))
cov=cov(a)
cov[5,1:5]<-0
cov[1:5,5]<-0
cov[5,5]<-1
cov
I am trying to write this part of the code in a more professional way:
cov[5,1:5]<-0
cov[1:5,5]<-0
cov[5,5]<-1
I tried to do something like:
cov[5,1:5] & cov[1:5,5]<-0
but it does not work

cov[5,1:4] <- cov[1:4,5] <-0
cov[5,5] <- 1

cov[5,1:5] <- cov[1:5,5] <- 0
cov[5,5] <- 1
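The chained form works because assignment in R is right-associative and returns the assigned value, so both index ranges receive 0. Below is a minimal sketch checking that the short version gives the same matrix as the three-line original. It uses the built-in mtcars data as a stand-in (an assumption, not the data from the question), and the names m, step_by_step, and chained are made up for illustration:

```r
# Stand-in covariance matrix from a built-in dataset (not the question's data)
m <- cov(mtcars[, 1:6])

# Original three-step version
step_by_step <- m
step_by_step[5, 1:5] <- 0
step_by_step[1:5, 5] <- 0
step_by_step[5, 5] <- 1

# Chained version: the rightmost assignment runs first and its value (0)
# is then assigned to the left-hand target as well
chained <- m
chained[5, 1:4] <- chained[1:4, 5] <- 0
chained[5, 5] <- 1

identical(step_by_step, chained)
```

Note that rows and columns beyond the fifth are left untouched by both versions.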

Related

How to make the expected value of the difference in the values in paired data using ggplot2

I have paired data, shown below, and I want to plot the expected value of the difference in value (the column called value) within each pair. In every pair, one sibling has the disease and the other does not, as the data show. In other words, I want the expected difference in value between one sibling and his or her sibling.
The variables in the data are:
id = individual ID
familyID = family ID, identifying which individuals are siblings
status = 1 means disease and status = 0 means no disease
Any guidance is appreciated.
d <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
familyID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10),
status = c(0,1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
value = c(29,26, 39, 22.3, 24, 41, 29.7, 24, 25.9, 21, 29,24,26,29, 15.2, 11, 35, 15.4,16, 13.4)),
class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -20L))
I'm not certain if this is what you are looking for, but I used pivot_wider from tidyr to spread the values into two columns, those with status 0 and those with status 1. Then I used mutate to take the difference between the two columns, and plotted familyID against the newly created difference with ggplot. Note that I removed the id column for pivot_wider to work.
d <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
familyID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10),
status = c(0,1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
value = c(29,26, 39, 22.3, 24, 41, 29.7, 24, 25.9, 21, 29,24,26,29, 15.2, 11, 35, 15.4,16, 13.4)),
class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -20L))
library(dplyr)
library(tidyr)
library(ggplot2)
d%>%
select(-id)%>%
pivot_wider(values_from = value, names_from = status)%>%
mutate("Diff" = (`0`-`1`))%>%
ggplot()+
aes(as.character(familyID), Diff)+
geom_point()
You can group by familyID, then use summarize() from the dplyr package to find the differences.
Also note the conversion of id, familyID, and status to factors, which may make life easier because they are then treated as categories rather than as numbers.
library(dplyr)
library(forcats)
library(ggplot2)
d <- structure(list(id = as.factor(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)),
familyID = as.factor(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10)),
status = as.factor(c(0,1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1)),
value = c(29,26, 39, 22.3, 24, 41, 29.7, 24, 25.9, 21, 29,24,26,29, 15.2, 11, 35, 15.4,16, 13.4)),
class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -20L))
diffs <- group_by(d, familyID) %>%
summarize(., diff = (value[status == 0] - value[status == 1]))
Reordering the families by difference can help give a sense of the distribution of differences:
diffs$familyID <- fct_reorder(diffs$familyID, diffs$diff, .desc = TRUE)
ggplot(diffs, aes(x = familyID, y = diff)) +
geom_bar(stat="identity")
If you really have a lot of families you may want to display a summary of the differences.
One option is with a histogram (modifying binwidth can control how fine the bins are):
ggplot(diffs, aes(x = diff)) +
geom_histogram(binwidth = 3)
Similar to a histogram is a density plot:
ggplot(diffs, aes(x = diff)) +
geom_density()
Finally, a boxplot is also a familiar summary. Boxplots are mostly meant for comparing multiple groups, but they work okay with just one. I've added the individual points using the geom_jitter() function.
ggplot(diffs, aes(y = diff)) + #If using multiple groups add x=group inside the aes() function.
geom_boxplot() +
geom_jitter(aes(x = 0))
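If "expected value" is read as the mean of the within-family differences (an assumption about what the question intends), it can be estimated directly from the paired differences. The sketch below uses base R only and rebuilds the data from the question without the id column; the names paired, mean_diff, and ci are made up for illustration:

```r
# Data from the question (id column omitted)
d <- data.frame(
  familyID = rep(1:10, each = 2),
  status = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  value = c(29, 26, 39, 22.3, 24, 41, 29.7, 24, 25.9, 21,
            29, 24, 26, 29, 15.2, 11, 35, 15.4, 16, 13.4)
)

# Put the two siblings of each family side by side
paired <- merge(d[d$status == 0, ], d[d$status == 1, ],
                by = "familyID", suffixes = c("_0", "_1"))

# Difference: sibling without disease minus sibling with disease
paired$diff <- paired$value_0 - paired$value_1

# Sample mean of the differences as an estimate of the expected difference
mean_diff <- mean(paired$diff)

# One-sample t interval for the mean difference (assumes roughly normal differences)
ci <- t.test(paired$diff)$conf.int

mean_diff
ci
```

The mean could then be added to any of the plots above, for example as a horizontal reference line.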

Density plot of a vector shows tails before and after its minimum and maximum

I have the following vector:
v<-c(1, 1, 8, 3, 1, 9, 4, 21, 13, 13, 1, 1, 3, 10, 1, 13, 22, 1,
1, 4, 2, 1, 13, 1, 5, 1, 2, 1, 1, 2, 12, 10, 26, 15, 2, 9, 6,
5, 1, 3, 18, 2, 10, 2, 8, 9, 4, 1, 11, 4, 2, 12, 3, 14, 2, 1,
27, 3, 6, 2, 1, 1, 3, 16, 3, 36, 13, 9, 11, 10, 24, 2, 27, 4,
4, 2, 9, 1, 3, 13, 3, 1, 8, 5, 5, 15, 1, 1, 3, 1, 4, 14, 8, 1,
1, 2, 20, 1, 9, 3, 1, 2, 5, 14, 5, 11, 1, 3, 2, 9, 10, 21, 9,
1, 20, 5, 11, 23, 2, 1, 1, 2, 1, 7, 2, 9, 1, 19, 9, 9, 2, 15,
17, 8, 11, 17, 2, 14, 2, 8, 13, 1, 2, 9, 15, 25, 3, 8, 32, 4,
11, 1, 1, 2)
I would like to estimate its density in R with the density command. With a few lines of code:
d<-density(v)
df<-data.frame(x=d$x,y=d$y,stringsAsFactors = FALSE)
plot(df)
I obtained the following picture:
But the resulting plot doesn't add up: max(v) is 36 and min(v) is 1, yet the graph shows tails extending below 0 and beyond 40.
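The tails are expected with a kernel density estimate: each observation is smoothed by a kernel, and by default density() evaluates the estimate on a grid that extends cut * bw beyond the extremes of the data (cut defaults to 3). A minimal sketch of restricting the grid with the from and to arguments, using a shortened stand-in vector rather than the full data from the question:

```r
# Shortened stand-in for the vector in the question (an assumption for brevity)
v <- c(1, 1, 8, 3, 1, 9, 4, 21, 13, 13, 1, 1, 3, 10, 1, 13, 22, 36)

# Default: the evaluation grid runs past min(v) and max(v)
d_default <- density(v)
range(d_default$x)

# Restrict the evaluation grid to the observed range
d_trim <- density(v, from = min(v), to = max(v))
range(d_trim$x)

plot(d_trim)
```

Restricting the grid only hides the tails; the curve shown between from and to no longer integrates to 1. Whether that matters depends on what the plot is used for.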

The ROC curve is below the oblique line, how to correct it?

I use the ggplot2 and plotROC packages to draw ROC curves, but one of the curves runs in the opposite direction, below the diagonal. How can I modify the plot so that both curves run in the same direction?
My code is as follows:
library("plotROC")
Response <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0,
0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0)
len <- c(4, 7, 8, 10, 4, 10, 10, 10, 10, 10, 9, 8, 7, 7, 5, 4, 4, 4, 3, 3, 2,
2, 9, 11, 0.5, 10, 8, 5, 4, 10, 10, 9, 8, 8, 7, 5, 1, 12, 10, 11, 9,
10, 7, 10, 7, 12, 10, 11, 10, 4, 12, 7, 12, 14, 10, 9, 9, 7, 10, 2,
12, 12, 10, 16, 10, 9, 15, 10, 9, 5, 12, 12, 11, 6, 9.5, 9, 11, 3)
gc <- c(15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13,
13, 12, 12, 11, 10, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7,
7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 4, 3, 3, 3,3)
d1 <- data.frame(Response = Response, Predictor = len, group = "len")
d2 <- data.frame(Response = Response, Predictor = gc, group = "gc")
mydata <- rbind(d1, d2)
ggplot(mydata, aes(d = Response, m = Predictor, color = group, linetype = group, shape = group)) +
geom_roc(n.cut = 0, show.legend = TRUE, labels=FALSE, size = 0.6)+
geom_abline(size = 0.7, color = "grey", linetype = "dashed")+
xlab("1 - Specificity") +
ylab("Sensitivity")
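A curve below the diagonal usually means that low values of the predictor go with Response = 1, while the ROC construction assumes high values do. One possible fix, assuming that reversing the direction is appropriate for gc, is to negate that predictor before plotting (for example Predictor = -gc when building d2). The sketch below illustrates the effect with base R only; the auc helper and the small response/score vectors are made up for illustration:

```r
# Rank-based AUC (Mann-Whitney form); hypothetical helper, not part of plotROC
auc <- function(response, predictor) {
  r <- rank(predictor)
  n1 <- sum(response == 1)
  n0 <- sum(response == 0)
  (sum(r[response == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up data in which low scores go with response 1, as with gc
response <- c(0, 0, 0, 1, 0, 1, 1, 1)
score <- c(15, 14, 13, 12, 11, 9, 8, 7)

auc_raw <- auc(response, score)       # below 0.5: curve under the diagonal
auc_flipped <- auc(response, -score)  # mirrored above the diagonal

c(raw = auc_raw, flipped = auc_flipped)
```

Negating the predictor mirrors the curve about the diagonal, so the two AUC values sum to 1.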

Showing a subgroup or subdivision within a histogram bar

I have data as follows:
thevalues <- structure(c(9, 7, 9, 9, 9, 8, 9, 6, 4, 7, 9, 9, 9, 8, 7, 7, 9,
8, 8, 9, 5, 5, 8, 7, 5, 9, 9, 7, 7, 9, 8, 7, 8, 9, 4, 7, 9, 8,
6, 7, 7, 4, 8, 6, 9, 9, 8, 1, 9, 9, 9, 8, 9, 9, 6, 7, 4, 7, 9,
6, 6, 9, 9, 8, 6, 8, 7, 7, 7, 5, 9, 5, 7, 9, 8, 4, 9, 8, 8, 8,
5, 8, 1, 7, 7, 5, 6, 9, 5, 9, 6, 9, 6, 9, 9, 9, 8, 9, 9, 9, 9,
4, 6, 4, 8, 6, 8, 8, 7, 4, 6, 7, 4, 8, 8, 8, 7, 9, 3, 8, 8, 6,
9, 8, 8, 6, 5, 8, 3, 8, 6, 8, 7, 7, 6, 9, 5, 9, 8, 7, 9, 7, 9,
9, 8, 9, 6, 8, 9, 8, 6, 8, 9, 9, 9, 4, 8, 8, 5, 8, 7, 8, 8, 9,
9, 6, 8, 5, 9, 8, 7, 9, 9, 7, 6, 8, 7, 7, 8, 9, 6, 7, 8, 9, 7,
6, 6, 9, 7, 7, 8, 7, 7, 2, 4, 9, 9, 7, 7, 9, 7, 6, 9, 9, 8, 5,
5), label = NA_character_, class = c("labelled", "numeric"))
mistakes <- structure(c(0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1,
0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0), label = NA_character_, class = c("labelled", "numeric"))
I want to create a histogram of thevalues like so:
df <- data.frame(value = c(thevalues),
variable = rep(c("thevalues"), each = length(thevalues)))
ggplot(df, aes(value, fill = variable)) +
geom_density(aes(y = ..count..), size = 0.7, alpha = 0.1) +
geom_bar(position = "dodge") +
scale_fill_brewer(palette = "Set1") +
scale_x_continuous(breaks = c(1:9), labels = c(1:9)) +
theme(legend.title = element_blank(), legend.position = c(0.1, 0.85))
However, I would like to see the mistakes as part of these bars:
table(thevalues, mistakes)
         mistakes
thevalues  0  1
        1  1  1
        2  1  0
        3  1  1
        4  9  2
        5 10  4
        6 17  8 # The total height of the bar is 25, 8 have a different colour.
        7 24 16 # The total height of the bar is 40, 16 have a different colour.
        8 33 16 # The total height of the bar is 49, 16 have a different colour.
        9 49 14 # The total height of the bar is 63, 14 have a different colour.
Something like this:
EDIT:
The solution works perfectly, but I would really like to do this when there are two variables in the histogram:
thevalues_II <- structure(c(9, 9, 9, 8, 8, 9, 6, 9, 8, 8, 6, 9, 9, 9, 6, 7, 9,
7, 8, 9, 7, 9, 9, 8, 7, 9, 8, 7, 8, 9, 8, 9, 9, 9, 9, 7, 9, 7,
8, 9, 7, 7, 8, 4, 6, 9, 7, 7, 9, 9, 9, 8, 9, 8, 9, 9, 4, 8, 9,
8, 7, 9, 9, 8, 7, 8, 9, 8, 2, 7, 8, 8, 8, 8, 8, 6, 4, 9, 9, 8,
3, 7, 3, 8, 8, 9, 7, 9, 5, 6, 7, 8, 9, 8, 9, 9, 9, 9, 9, 9, 9,
7, 3, 7, 9, 7, 7, 7, 8, 8, 9, 9, 8, 8, 9, 6, 9, 9, 6, 7, 8, 7,
8, 9, 9, 7, 6, 8, 7, 9, 6, 5, 8, 8, 7, 9, 8, 9, 9, 7, 9, 7, 9,
8, 7, 9, 4, 8, 7, 7, 9, 9, 9, 9, 9, 4, 9, 9, 6, 7, 6, 7, 8, 9,
8, 9, 5, 9, 8, 8, 8, 9, 9, 6, 8, 8, 8, 8, 8, 8, 7, 8, 9, 9, 9,
7, 4, 8, 7, 7, 9, 8, 8, 7, 5, 8, 9, 8, 8, 9, 8, 5, 8, 9, 8, 9,
7), label = NA_character_, class = c("labelled", "numeric"))
df <- data.frame(value = c(thevalues, thevalues_II),
variable = rep(c("tax", "truth"), each = length(thevalues)))
ggplot(df, aes(value, fill = variable)) +
geom_density(aes(y = ..count..), size = 0.7, alpha = 0.3) +
geom_bar(position = "dodge") +
scale_fill_brewer(palette = "Set1") +
theme(legend.title = element_blank(), legend.position = c(0.1, 0.85))
I tried:
library(tidyverse)
mydf <- data.frame(thevalues, mistakes)
mycount <- count(mydf, thevalues, thevalues_II, mistakes)
ggplot() +
geom_col(data = mycount, aes(thevalues, thevalues_II, n, fill = as.character(mistakes))) +
geom_density(data = mydf, aes(thevalues, thevalues_II, y = ..count..), size = 0.7, alpha = 0.1) +
scale_fill_brewer(palette = "Set1") +
theme(legend.title = element_blank(), legend.position = c(0.1, 0.85))
But that does not work.
Try a summarising count first. Apologies again for lack of image - using online console with reduced facilities.
library(tidyverse)
mydf <- data.frame(thevalues, mistakes)
mycount <- count(mydf, thevalues, mistakes)
ggplot() +
geom_col(data = mycount, aes(thevalues, n, fill = as.character(mistakes))) +
geom_density(data = mydf, aes(thevalues, y = ..count..), size = 0.7, alpha = 0.1) +
scale_fill_brewer(palette = "Set1") +
theme(legend.title = element_blank(), legend.position = c(0.1, 0.85))
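The summarising count produces one row per combination of value and mistake, and stacking those counts gives bars whose total height equals the overall count for each value. A minimal base R sketch of the same idea, using short made-up vectors (vals and mist are stand-ins, not the data from the question):

```r
# Made-up stand-ins for thevalues and mistakes
vals <- c(9, 7, 9, 8, 9, 6, 7, 8, 9, 7)
mist <- c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0)

# Counts per (mistake, value) combination; rows become the stacked segments
tab <- table(mistake = mist, value = vals)

# The total bar height per value is the column sum
totals <- colSums(tab)
totals

# barplot() stacks the rows of a matrix by default (beside = FALSE)
barplot(tab, xlab = "value", ylab = "count", legend.text = TRUE)
```

For two variables, one option would be to compute such a count per variable and show the variables in separate panels, but that depends on how mistakes relates to the second variable, which the question does not state.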

Joining two weighted Graphs in R and keeping weight as sum

I have the same question as "how to merge two weighted graphs and sum weights".
Here is my R code for better understanding:
library(igraph)
g1 <- graph.full(10)
V(g1)$name <- letters[1:vcount(g1)]
E(g1)$weight <- 1
g3 <- graph.full(5)
V(g3)$name <- c("a", "b", "x", "y", "z")
E(g3)$weight <- 1
graph.union.by.name(g1, g3)
In the merged graph, edges that appear in both g1 and g3 (such as a - b) should have weight 2.
The dput of the graphs is:
> dput(g1)
structure(list(10, FALSE, c(1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 3,
4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 5, 6,
7, 8, 9, 6, 7, 8, 9, 7, 8, 9, 8, 9, 9), c(0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8), c(0, 1, 9,
2, 10, 17, 3, 11, 18, 24, 4, 12, 19, 25, 30, 5, 13, 20, 26, 31,
35, 6, 14, 21, 27, 32, 36, 39, 7, 15, 22, 28, 33, 37, 40, 42,
8, 16, 23, 29, 34, 38, 41, 43, 44), c(0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44), c(0, 0, 1, 3, 6, 10, 15, 21, 28, 36, 45),
c(0, 9, 17, 24, 30, 35, 39, 42, 44, 45, 45), list(c(1, 0,
1), structure(list(name = "Full graph", loops = FALSE), .Names = c("name",
"loops")), structure(list(name = c("a", "b", "c", "d", "e",
"f", "g", "h", "i", "j")), .Names = "name"), structure(list(
weight = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = "weight"))), class = "igraph")
> dput(g2)
structure(list(10, FALSE, c(1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 3,
4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 5, 6,
7, 8, 9, 6, 7, 8, 9, 7, 8, 9, 8, 9, 9), c(0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8), c(0, 1, 9,
2, 10, 17, 3, 11, 18, 24, 4, 12, 19, 25, 30, 5, 13, 20, 26, 31,
35, 6, 14, 21, 27, 32, 36, 39, 7, 15, 22, 28, 33, 37, 40, 42,
8, 16, 23, 29, 34, 38, 41, 43, 44), c(0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44), c(0, 0, 1, 3, 6, 10, 15, 21, 28, 36, 45),
c(0, 9, 17, 24, 30, 35, 39, 42, 44, 45, 45), list(c(1, 0,
1), structure(list(name = "Full graph", loops = FALSE), .Names = c("name",
"loops")), structure(list(name = c("a", "b", "c", "d", "e",
"f", "g", "h", "i", "j")), .Names = "name"), structure(list(
weight = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = "weight"))), class = "igraph")
Is this possible with igraph, or do I need a workaround?
This will be supported in the next version, until then here is a workaround:
mymerge <- function(g1, g2) {
e1 <- get.data.frame(g1, what="edges")
e2 <- get.data.frame(g2, what="edges")
e <- merge(e1, e2, by=c("from", "to"), all=TRUE)
newe <- data.frame(e[,c("from", "to"), drop=FALSE],
weight=rowSums(e[, c("weight.x", "weight.y")], na.rm=TRUE))
graph.data.frame(newe, directed=is.directed(g1))
}
mymerge(g1, g3)
# IGRAPH UNW- 13 54 --
# + attr: name (v/c), weight (e/n)
mymerge(g1, g3)["a", "b"]
# [1] 2
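The core of the workaround is an outer join of the two edge lists followed by a row-wise sum that treats missing weights as zero. A minimal sketch of just that step in base R, without igraph, on two small made-up edge lists (e1 and e2 are illustrative, not the graphs from the question):

```r
# Two made-up weighted edge lists sharing the edge a - b
e1 <- data.frame(from = c("a", "a", "b"), to = c("b", "c", "c"), weight = 1)
e2 <- data.frame(from = c("a", "a", "x"), to = c("b", "x", "y"), weight = 1)

# Outer join on the endpoints; unmatched edges get NA for the other weight
e <- merge(e1, e2, by = c("from", "to"), all = TRUE)

# Sum the two weight columns, counting NA as 0
e$weight <- rowSums(e[, c("weight.x", "weight.y")], na.rm = TRUE)

e[, c("from", "to", "weight")]
```

For undirected graphs this relies on both edge lists writing each edge's endpoints in the same order; an edge stored as a - b in one list and b - a in the other would not be matched by the join.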
