I have a rather small dataset resulting from a linkage between two different datasets. I would like to know how I can calculate specificity, sensitivity, and predictive values, and plot the ROC curve. This is the first time I'm using this kind of statistics in R, so I don't even know how to start.
Part of the data looks like this:
data <- data.frame(NMM_TOTAL = c(1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1),
CPAV_TOTAL = c(0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0),
SIH_NMM_TOTAL = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 0 , 0, 1, 1, 0, 1),
SIH_CPAV_TOTAL = c(1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1))
And the two way tables would be the combination of:
tab1 <- table(data$SIH_NMM_TOTAL, data$NMM_TOTAL)
tab2 <- table(data$SIH_CPAV_TOTAL, data$CPAV_TOTAL)
Where NMM_TOTAL and CPAV_TOTAL are the "gold standard". I don't know if any of this makes sense. Thanks in advance!
Note: 1 stands for positive and 0 for negative.
Let's work with tab1 to demonstrate specificity, sensitivity, and predictive values. Consider labeling the rows and columns of your tables to enhance clarity:
act <- data$SIH_NMM_TOTAL
ref <- data$NMM_TOTAL
table(act,ref)
Load this library
library(caret)
The input data needs to be factors
act <- factor(act)
ref <- factor(ref)
The commands look like this
sensitivity(act, ref)
specificity(act, ref)
posPredValue(act, ref)
negPredValue(act, ref)
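One caveat: these caret functions take the first factor level as the positive class by default, which here is "0". Since 1 denotes a positive in your data, it is safer to say so explicitly:
# declare which level is the positive/negative class
sensitivity(act, ref, positive = "1")
specificity(act, ref, negative = "0")
posPredValue(act, ref, positive = "1")
negPredValue(act, ref, negative = "0")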
ROC curve. The Receiver Operating Characteristic (ROC) curve is used to assess the accuracy of a continuous measurement for predicting a binary outcome, and it is not clear you can plot one from your data, since your test results are binary rather than continuous. Let me show you a simple example of how to generate one, drawn from https://cran.r-project.org/web/packages/plotROC/vignettes/examples.html
library(ggplot2)
library(plotROC)
set.seed(1)
D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, stringsAsFactors = FALSE)
head(test)
ggplot(test, aes(d = D, m = M1)) + geom_roc()
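If you also want the area under the curve, plotROC can compute it from the same plot object:
p <- ggplot(test, aes(d = D, m = M1)) + geom_roc()
calc_auc(p)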
Hi everyone!
I am trying to run a GWAS analysis in R on some very simple genetic data. It only contains the SNPs and one outcome variable (as well as an ID variable for each observation).
Everything I have found online includes chromosome and position data. I have that for the SNPs, but in a separate file. (My plan is to map the SNPs after the relevant ones have been selected).
How can I go about running a GWAS analysis on this data? Would I need to, or could I use another method to filter to only the most significant SNPs?
I tried this, but it didn't work, because my data is not a gData object.
# SNPs are in A/B notation, with 0 = AA, 1 = AB, and 2 = BB
library(statgenGWAS)
id <- c("person1", "person2", "person3", "person4", "person5", "person6", "person7", "person8", "person9", "person10")
snp1 <- c(0, 1, 2, 2, 1, 0, 0, 0, 1, 1)
snp2 <- c(2, 2, 2, 1, 1, 1, 0, 0, 0, 1)
snp3 <- c(0, 0, 2, 2, 0, 2, 1, 0, 2, 2)
diagnosis <- c(0, 1, 1, 0, 0, 1, 1, 0, 1, 1)
data <- as.data.frame(cbind(id, snp1, snp2, snp3, diagnosis))
gwas1a <- runSingleTraitGwas(gData = data,
traits = "diagnosis")
Any help here is appreciated.
Thank you!
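For what it's worth, here is a minimal sketch of how data like this might be assembled into the gData object that runSingleTraitGwas() expects: a numeric marker matrix (note that cbind() on id plus the SNPs coerces everything to character, so keep the markers numeric), a map with chromosome and position, and a phenotype data.frame whose first column is named genotype. The file snp_map.csv and its columns are hypothetical stand-ins for your separate map file.
library(statgenGWAS)
# numeric marker matrix: genotypes in rows, markers in columns
geno <- cbind(snp1, snp2, snp3)
rownames(geno) <- id
# hypothetical: load the separate map file; createGData() expects
# columns named chr and pos, with marker names as row names
map <- read.csv("snp_map.csv", row.names = 1)
# phenotype data.frame; the first column must be named "genotype"
pheno <- data.frame(genotype = id, diagnosis = diagnosis)
gData <- createGData(geno = geno, map = map, pheno = pheno)
gwas1a <- runSingleTraitGwas(gData = gData, traits = "diagnosis")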
I want to study mixture copulas for reliability analysis. However, I can't construct the RVineMatrix, so the probability integral transformation (PIT) cannot be performed: the copulas used in the h-functions to transform dependent variables into independent variables cannot be mixed copulas.
Here is my code:
copula1 <- mixCopula(list(claytonCopula(param = 1.75, dim = 2),
                          frankCopula(param = 0.718, dim = 2),
                          gumbelCopula(param = 1.58, dim = 2)),
                     w = c(0.4492, 0.3383, 0.2125))
copula2 <- mixCopula(list(frankCopula(param = 0.69, dim = 2),
                          gumbelCopula(param = 1.48, dim = 2),
                          claytonCopula(param = 1.9, dim = 2)),
                     w = c(0.3784, 0.3093, 0.3123))
copula3 <- mixCopula(list(frankCopula(param = 7.01, dim = 2),
                          claytonCopula(param = 0.75, dim = 2),
                          gumbelCopula(param = 1.7, dim = 2)),
                     w = c(0.4314, 0.2611, 0.3075))
copula4 <- mixCopula(list(gumbelCopula(param = 1.21, dim = 2),
                          claytonCopula(param = 0.89, dim = 2),
                          frankCopula(param = 3.62, dim = 2)),
                     w = c(0.3306, 0.2618, 0.4076))
.......
Matrix <- c(5, 4, 3, 2, 1,
            0, 4, 3, 2, 1,
            0, 0, 3, 2, 1,
            0, 0, 0, 2, 1,
            0, 0, 0, 0, 1)
Matrix <- matrix(Matrix, 5, 5)
family1 <- c(0, copula10, copula9, copula7, copula4,
             0, 0, copula8, copula6, copula3,
             0, 0, 0, copula5, copula2,
             0, 0, 0, 0, copula1,
             0, 0, 0, 0, 0)
family1 <- matrix(family1, 5, 5)
par <- c(0, 0.2, 0.5, 0.32, 0.50,
         0, 0, 0.5, 0.98, 0.5,
         0, 0, 0, 0.9, 0.5,
         0, 0, 0, 0, 0.39,
         0, 0, 0, 0, 0)
par <- matrix(par, 5, 5)
par2 <- matrix(0, 5, 5)
RVM <- RVineMatrix(Matrix = Matrix, family = family1,
                   par = par, par2 = par2,
                   names = c("V1", "V2", "V3", "V4", "V5"),
                   check.pars = TRUE)
So could you help me construct the RVineMatrix, or achieve this by some other means? Thanks!
There are some points you should be aware of:
You use mixCopula from the copula package. That gives you a mixture model of copulas, not a mixture of R-vine copulas.
Then you try to put the copulas generated by mixCopula into the R-vine copula model. This will not work, because the R-vine copula indexes copula families differently from the copula package: the RVineMatrix family matrix accepts only integer codes, where each number corresponds to a specific pair-copula family (see the illustration below).
So, to build a mixture of R-vine copula models, you should build a mixture of R-vine densities. There is a clustering package on GitHub, called vineclust, designed for vine copula clustering models. By the way, for a mixture of R-vine copulas you need, for two components, two family matrices, two parameter matrices, and two structure matrices.
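For reference, a minimal illustration of such integer family codes, as documented in VineCopula (0 = independence, 1 = Gaussian, 3 = Clayton, 4 = Gumbel, 5 = Frank, 13/14 = 180-degree rotated Clayton/Gumbel):
# a valid family matrix uses numbers, not copula objects
family <- matrix(c(0, 3, 4,
                   0, 0, 5,
                   0, 0, 0), nrow = 3, byrow = TRUE)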
An example of vine mixture from vineclust is:
library(vineclust)  # provides rvcmm() for sampling from vine copula mixtures
dims <- 3
obs <- c(500, 500)
RVMs <- list()
RVMs[[1]] <- VineCopula::RVineMatrix(
  Matrix = matrix(c(1, 3, 2, 0, 3, 2, 0, 0, 2), dims, dims),
  family = matrix(c(0, 3, 4, 0, 0, 14, 0, 0, 0), dims, dims),
  par = matrix(c(0, 0.8571429, 2.5, 0, 0, 5, 0, 0, 0), dims, dims),
  par2 = matrix(0, dims, dims))
RVMs[[2]] <- VineCopula::RVineMatrix(
  Matrix = matrix(c(1, 3, 2, 0, 3, 2, 0, 0, 2), dims, dims),
  family = matrix(c(0, 6, 5, 0, 0, 13, 0, 0, 0), dims, dims),
  par = matrix(c(0, 1.443813, 11.43621, 0, 0, 2, 0, 0, 0), dims, dims),
  par2 = matrix(0, dims, dims))
margin <- matrix(c('Normal', 'Gamma', 'Lognormal', 'Lognormal', 'Normal', 'Gamma'), 3, 2)
margin_pars <- array(0, dim=c(2, 3, 2))
margin_pars[,1,1] <- c(1, 2)
margin_pars[,1,2] <- c(1.5, 0.4)
margin_pars[,2,1] <- c(1, 0.2)
margin_pars[,2,2] <- c(18, 5)
margin_pars[,3,1] <- c(0.8, 0.8)
margin_pars[,3,2] <- c(1, 0.2)
x_data <- rvcmm(dims, obs, margin, margin_pars, RVMs)
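Fitting is then done with vineclust's own routine rather than with an RVineMatrix. A hedged sketch, assuming the vcmm() interface shown in the vineclust README (check the package documentation for the current arguments):
# fit a two-component vine copula mixture model to the simulated data
fit <- vcmm(data = x_data, total_comp = 2)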
Thank you for your time in advance. I am attempting to identify a method to calculate in-degree Bonacich Power Centrality in R. I'm a long-time UCINET user attempting to make the switch. In UCINET, this is done selecting Beta Centrality (Bonacich Power), and selecting "in-centrality" for the direction.
In R, it doesn't seem as though there is a way to calculate this using either the sna or the igraph package. Here it is for bonpow in sna:
bonpow(dat, g=1, nodes=NULL, gmode="digraph", diag=FALSE, tmaxdev=FALSE,
exponent=1, rescale=FALSE, tol=1e-07)
I do specify digraph, but I am not able to replicate the analysis in R.
Similarly, here it is for power_centrality in igraph:
power_centrality(graph, nodes = V(graph), loops = FALSE,
exponent = 1, rescale = FALSE, tol = 1e-07, sparse = TRUE)
Here, there does not seem to be a way to specify that the graph is directed (although you can specify it when defining the network). You can, however, specify direction when estimating betweenness centrality.
In neither case do I seem to be able to specify in-degree or out-degree power centrality. Any help is appreciated. Is there something either in these or in a different package that I may be overlooking?
I'm not sure what you mean by direction, since the original paper, it seems to me, does not deal with it. Now, a thing that is usually done with statistics calculated directly from the adjacency matrix is to "change the direction" by taking the transpose of that matrix (for example, when computing exposure in the netdiffuseR package we allow the user to compute "incoming" or "outgoing" exposure by just taking the transpose of the adjacency matrix). When you take the transpose, you are essentially flipping the directionality of the ties, i.e. i->j turns into j->i.
If that's what UCINET does (again, I'm not completely sure), then you can get the "incoming"/"outgoing" version by transposing the network. Here is a toy example:
# Loading the sna package (btw: igraph's implementation is a copy of
# sna's). I wrap it in suppressMessages to avoid the verbose
# startup output the package prints
suppressMessages(library(sna))
# This is a random graph I generated with 10 vertices
net <- structure(
c(0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1,
0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1,
0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0),
.Dim = c(10L, 10L)
)
# Here is the default
bonpow(net)
#> [1] -0.8921521 -0.7658658 -0.9165947 -1.4176664 -0.6151369 -0.7862345
#> [7] -0.9206684 -1.3565601 -1.0347335 -1.0062173
# Here I'm getting the transpose of the adjmat
net2 <- t(net)
# The output is different (as you can see)
bonpow(net2)
#> [1] -0.8969158 -1.1026305 -0.6336011 -0.7158869 -1.2960022 -0.9545159
#> [7] -1.1684592 -0.8845729 -1.0368018 -1.1190876
Created on 2019-11-20 by the reprex package (v0.3.0)
Using kernlab I've trained a model with code like the following:
my.model <- ksvm(result ~ f1+f2+f3, data=gold, kernel="vanilladot")
Since it's a linear model, I prefer at run-time to compute the scores as a simple weighted sum of the feature values rather than using the full SVM machinery. How can I convert the model to something like this (some made-up weights here):
> c(.bias=-2.7, f1=0.35, f2=-0.24, f3=2.31)
.bias f1 f2 f3
-2.70 0.35 -0.24 2.31
where .bias is the bias term and the rest are feature weights?
EDIT:
Here's some example data.
gold <- structure(list(result = c(-1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), f1 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1), f2 = c(13.4138113499447,
13.2216999857095, 12.964145772169, 13.1975227965938, 13.1031520152764,
13.59351759447, 13.1031520152764, 13.2700658838026, 12.964145772169,
13.1975227965938, 12.964145772169, 13.59351759447, 13.59351759447,
13.0897162110721, 13.364151238365, 12.9483051847806, 12.964145772169,
12.964145772169, 12.964145772169, 12.9483051847806, 13.0937231331592,
13.5362700880482, 13.3654209223623, 13.4356400945176, 13.59351759447,
13.2659406408724, 13.4228886221088, 13.5103065354936, 13.5642812689161,
13.3224757352068, 13.1779418771704, 13.5601730479315, 13.5457299603578,
13.3729010596517, 13.4823595997866, 13.0965264603473, 13.2710281801434,
13.4489887206797, 13.5132372154748, 13.5196188787197), f3 = c(0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0)), .Names = c("result",
"f1", "f2", "f3"), class = "data.frame", row.names = c(NA, 40L
))
To get the bias, just evaluate the model with a feature vector of all zeros. To get the coefficient of the first feature, evaluate the model with a feature vector with a "1" in the first position, and zeros everywhere else - and then subtract the bias, which you already know. I'm afraid I don't know R syntax, but conceptually you want something like this:
bias = my.model.eval([0, 0, 0])
f1 = my.model.eval([1, 0, 0]) - bias
f2 = my.model.eval([0, 1, 0]) - bias
f3 = my.model.eval([0, 0, 1]) - bias
To test that you did it correctly, you can try something like this:
assert(bias + f1 + f2 + f3 == my.model.eval([1, 1, 1]))
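In R, the probing idea might look like this (a sketch, assuming the model is fit as a classifier with type = "C-svc", as in the answer below, so that predict(..., type = "decision") returns the linear score):
# hypothetical probe helper: evaluate the decision value at a feature vector
probe <- function(f1, f2, f3)
  predict(my.model, data.frame(f1 = f1, f2 = f2, f3 = f3), type = "decision")
bias <- probe(0, 0, 0)
f1 <- probe(1, 0, 0) - bias
f2 <- probe(0, 1, 0) - bias
f3 <- probe(0, 0, 1) - bias
# the consistency check from above: for a linear model the effects add up
stopifnot(all.equal(as.numeric(bias + f1 + f2 + f3),
                    as.numeric(probe(1, 1, 1))))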
If I'm not mistaken, I think you're asking how to extract the W vector of the SVM, where W is defined as:
W = \sum_i y_i * \alpha_i * example_i
Ugh: I don't know the best way to write equations here, but this is just the sum of weight times support vector. After you calculate W, you can extract the "weight" for the feature you want.
Assuming this is correct, you'd:
Get the indices of your data that are the support vectors
Get their weights (alphas)
Calculate W
kernlab stores the support vector indices and their values in lists (so it works on multiclass problems, too); the list manipulation below is just to get at the real data. (You'll see that the lists returned by alpha and alphaindex have length 1 if you have a 2-class problem, which I'm assuming you do.)
my.model <- ksvm(result ~ f1+f2+f3, data=gold, kernel="vanilladot", type="C-svc")
alpha.idxs <- alphaindex(my.model)[[1]] # Indices of SVs in original data
alphas <- alpha(my.model)[[1]]
y.sv <- gold$result[alpha.idxs]
# for unscaled data
sv.matrix <- as.matrix(gold[alpha.idxs, c('f1', 'f2', 'f3')])
weight.vector <- (y.sv * alphas) %*% sv.matrix
bias <- b(my.model)
kernlab actually scales your data before fitting. You can get the weights in the scaled space like so (where, I would guess, the bias should be 0):
weight.vector <- (y.sv * alphas) %*% xmatrix(my.model)[[1]]
If I understood your question, this should get you what you're after.
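As a rough sanity check on the scaled variant, you can compare the hand-rolled scores against kernlab's own decision values (hedged: kernlab's sign and offset conventions may differ, hence a plot rather than an equality test):
w.scaled <- (y.sv * alphas) %*% xmatrix(my.model)[[1]]
f.manual <- xmatrix(my.model)[[1]] %*% t(w.scaled) - b(my.model)
f.kernlab <- predict(my.model, gold[alpha.idxs, ], type = "decision")
plot(f.manual, f.kernlab)  # should fall on a straight line if conventions match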