I am performing a meta-analysis of proportions using the metaprop function. I am looking at the prevalence of heart fibrosis in people living with HIV.
# mri$lgehivn <- number of people with HIV with fibrosis
# mri$hivnmri <- total number of people with HIV who have had CMR
lge.prop <- metaprop(event = mri$lgehivn,
n = mri$hivnmri,
subset = c(1:11, 13:16),
studlab = paper,
data = mri,
method = "Inverse",
sm = "PLOGIT",
random = TRUE,
hakn = FALSE,
pscale = 100,
digits = 1)
I am then passing this into a forest plot:
forest.meta(lge.prop,
rightcols=FALSE,
leftcols=c("studlab", "event", "n", "effect", "ci"),
leftlabs = c("Study", "Cases", "Total", "Prevalence", "95% C.I."),
xlim= c(0,110),
smlab = c("Prevalence of LGE (%)"),
digits = 1,
colgap.left = 1)
This then gives me the following forest plot:
Forest plot of meta analysis
I am trying to remove the line that reports the "Common effect model" and only show the random effect model.
Does anyone know the code for this?
Thank you!
I do not know if it is still relevant but I think that this was introduced with a newer version of the meta package.
When manually installing version 4.15-1, the common effect model was removed automatically.
Use the code below, which sets common = FALSE, before drawing the forest plot:
lge.prop <- metaprop(event = mri$lgehivn,
n = mri$hivnmri,
subset = c(1:11, 13:16),
studlab = paper,
data = mri,
method = "Inverse",
sm = "PLOGIT",
random = TRUE,
hakn = FALSE,
pscale = 100,
digits = 1,
common = FALSE)
I'm trying to create a plot in R that includes a table of the survival probabilities at specified points in time.
Currently the plot looks like the following:
R code for the plot using the survminer package:
ggsurvplot(fit,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Ideally I would like a table below the "Number at risk by time" to display the survival probabilities for each strata at times 250, 500, 750, and 1000.
I can retrieve the survival probabilities with the following code:
summary(fit, times=0:1000)
I made a function for that a while back. It takes a survfit object and a time sequence as arguments and returns the survival probabilities.
ConstruirTabela = function(a, sequencia = seq(250,1000,by=250)){
quebra=NULL
for(i in 1:(length(a$time)-1)){
if(a$time[i] > a$time[i+1]){
quebra = c(quebra,i)
}
}
quebra= c(quebra,length(a$time))
lsurv = list()
ltime = list()
previous = 0
for(i in 1:length(quebra)){
periodo = c((previous+1):quebra[i])
lsurv[[i]] = a$surv[periodo]
ltime[[i]] = a$time[periodo]
previous = quebra[i]
}
matriz=matrix(ncol=length(ltime),nrow=length(sequencia))
for(i in 1:length(sequencia)){
for(j in 1:length(ltime)){
indice = which.min(abs(ltime[[j]]-sequencia[i]))
matriz[i,j] = lsurv[[j]][indice]
}
}
retorno = as.data.frame(matriz)
f=strsplit(names(a$strata),"=")
names(retorno) = sapply(f, "[[", 2)
rownames(retorno) = as.character(sequencia)
return(retorno)}
It's probably not the best way to achieve this, but check if it works for you.
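As an alternative sketch, summary() on a survfit object can report the survival estimate at arbitrary times directly. It reads the step function at each requested time instead of picking the nearest observed time. The example assumes the lung data set shipped with the survival package as a stand-in for your data; fit_demo, s, tab and wide are illustrative names, not part of the original question.

```r
library(survival)

# Stand-in data: `lung` ships with the survival package (an assumption here;
# replace it with your own data and model).
fit_demo <- survfit(Surv(time, status) ~ sex, data = lung)

# extend = TRUE keeps a row for every requested time, even past the last
# observed time of a stratum.
times <- c(250, 500, 750, 1000)
s <- summary(fit_demo, times = times, extend = TRUE)

# Long table: one row per stratum and time point.
tab <- data.frame(strata = s$strata, time = s$time, surv = round(s$surv, 3))

# Wide table: one row per time, one column per stratum.
wide <- xtabs(surv ~ time + strata, data = tab)
print(wide)
```

Because the value at, say, 500 comes from the last step at or before 500, it never reflects an event that happened after that time, which a nearest-time lookup can do.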
Try this ggpubr library. Look at the very bottom of this page. It shows a graph with a text table.
Given a data frame containing mixed variables (i.e. both categorical and continuous) like,
digits = 0:9
# set seed for reproducibility
set.seed(17)
# function to create random string
createRandString <- function(n = 5000) {
a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
df <- data.frame(ID=c(1:10), name=sample(letters[1:10]),
studLoc=sample(createRandString(10)),
finalmark=sample(c(0:100),10),
subj1mark=sample(c(0:100),10),subj2mark=sample(c(0:100),10)
)
I perform unsupervised feature selection using the package FactoMineR
df.princomp <- FactoMineR::FAMD(df, graph = FALSE)
The variable df.princomp is a list.
Thereafter, to visualize the principal components I use
fviz_screeplot() and fviz_contrib() like,
#library(factoextra)
factoextra::fviz_screeplot(df.princomp, addlabels = TRUE,
barfill = "gray", barcolor = "black",
ylim = c(0, 50), xlab = "Principal Component",
ylab = "Percentage of explained variance",
main = "Principal Component (PC) for mixed variables")
factoextra::fviz_contrib(df.princomp, choice = "var",
axes = 1, top = 10, sort.val = c("desc"))
which gives the following Fig1
and Fig2
Explanation of Fig1: The Fig1 is a scree plot. A Scree Plot is a simple line segment plot that shows the fraction of total variance in the data as explained or represented by each Principal Component (PC). So we can see the first three PCs collectively are responsible for 43.8% of total variance. The question now naturally arises, "What are these variables?". This I have shown in Fig2.
Explanation of Fig2: This figure visualizes the contribution of rows/columns from the results of Principal Component Analysis (PCA). From here I can see that the variables name, studLoc and finalmark are the most important variables that can be used for further analysis.
Further analysis, where I'm stuck: To derive the contribution of the aforementioned variables name, studLoc and finalmark, I use the principal component variable df.princomp (see above) like df.princomp$quanti.var$contrib[,4] and df.princomp$quali.var$contrib[,2:3].
I've to manually specify the column indices [,2:3] and [,4].
What I want: I want to know how to do dynamic column index assignment, such that I do not have to manually code the column index [,2:3] in the list df.princomp?
I've already looked at the following similar questions 1, 2, 3 and 4 but cannot find a solution. Any help or suggestions to solve this problem would be appreciated.
Not sure if my interpretation of your question is correct, apologies if not. From what I gather you are using PCA as an initial tool to show you what variables are the most important in explaining the dataset. You then want to go back to your original data, select these variables quickly without manual coding each time, and use them for some other analysis.
If this is correct then I have saved the data from the contribution plot, filtered out the variables that have the greatest contribution, and used that result to create a new data frame with these variables alone.
digits = 0:9
# set seed for reproducibility
set.seed(17)
# function to create random string
createRandString <- function(n = 5000) {
a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
df <- data.frame(ID=c(1:10), name=sample(letters[1:10]),
studLoc=sample(createRandString(10)),
finalmark=sample(c(0:100),10),
subj1mark=sample(c(0:100),10),subj2mark=sample(c(0:100),10)
)
df.princomp <- FactoMineR::FAMD(df, graph = FALSE)
factoextra::fviz_screeplot(df.princomp, addlabels = TRUE,
barfill = "gray", barcolor = "black",
ylim = c(0, 50), xlab = "Principal Component",
ylab = "Percentage of explained variance",
main = "Principal Component (PC) for mixed variables")
#find the top contributing variables to the overall variation in the dataset
#here I am choosing the top 10 variables (although we only have 6 in our df).
#note you can specify which axes you want to look at with axes=, you can even do axes=c(1,2)
f<-factoextra::fviz_contrib(df.princomp, choice = "var",
axes = c(1), top = 10, sort.val = c("desc"))
#save data from contribution plot
dat<-f$data
#keep the variables whose contribution is higher than, say, 20
r<-rownames(dat[dat$contrib>20,])
#extract these from your original data frame into a new data frame for further analysis
new<-df[r]
new
#finalmark name studLoc
#1 53 b POTYQ0002N
#2 73 i LWMTW1195I
#3 95 d VTUGO1685F
#4 39 f YCGGS5755N
#5 97 c GOSWE3283C
#6 58 g APBQD6181U
#7 67 a VUJOG1460V
#8 64 h YXOGP1897F
#9 15 j NFUOB6042V
#10 81 e QYTHG0783G
Based on your comment, where you said you wanted to 'Find variables with value greater than 5 in Dim.1 AND Dim.2 and save these variables to a new data frame', I would do this:
#top contributors to both Dim 1 and 2
f<-factoextra::fviz_contrib(df.princomp, choice = "var",
axes = c(1,2), top = 10, sort.val = c("desc"))
#save data from contribution plot
dat<-f$data
#keep the variables whose contribution is higher than 5
r<-rownames(dat[dat$contrib>5,])
#extract these from your original data frame into a new data frame for further analysis
new<-df[r]
new
(This keeps all the original variables in our new data frame since they all contributed more than 5% to the total variance)
There are a lot of ways to extract contributions of individual variables to PCs. For numeric input, one can run a PCA with prcomp and look at $rotation (I spoke too soon and forgot you've got factors here, so prcomp won't work directly). Since you are using factoextra::fviz_contrib, it makes sense to check how that function extracts this information under the hood. Type factoextra::fviz_contrib at the console and read the function:
> factoextra::fviz_contrib
function (X, choice = c("row", "col", "var", "ind", "quanti.var",
"quali.var", "group", "partial.axes"), axes = 1, fill = "steelblue",
color = "steelblue", sort.val = c("desc", "asc", "none"),
top = Inf, xtickslab.rt = 45, ggtheme = theme_minimal(),
...)
{
sort.val <- match.arg(sort.val)
choice = match.arg(choice)
title <- .build_title(choice[1], "Contribution", axes)
dd <- facto_summarize(X, element = choice, result = "contrib",
axes = axes)
contrib <- dd$contrib
names(contrib) <- rownames(dd)
theo_contrib <- 100/length(contrib)
if (length(axes) > 1) {
eig <- get_eigenvalue(X)[axes, 1]
theo_contrib <- sum(theo_contrib * eig)/sum(eig)
}
df <- data.frame(name = factor(names(contrib), levels = names(contrib)),
contrib = contrib)
if (choice == "quanti.var") {
df$Groups <- .get_quanti_var_groups(X)
if (missing(fill))
fill <- "Groups"
if (missing(color))
color <- "Groups"
}
p <- ggpubr::ggbarplot(df, x = "name", y = "contrib", fill = fill,
color = color, sort.val = sort.val, top = top, main = title,
xlab = FALSE, ylab = "Contributions (%)", xtickslab.rt = xtickslab.rt,
ggtheme = ggtheme, sort.by.groups = FALSE, ...) + geom_hline(yintercept = theo_contrib,
linetype = 2, color = "red")
p
}
<environment: namespace:factoextra>
So it's really just calling facto_summarize from the same package. By analogy you can do the same thing, simply call:
> dd <- factoextra::facto_summarize(df.princomp, element = "var", result = "contrib", axes = 1)
> dd
name contrib
ID ID 0.9924561
finalmark finalmark 21.4149175
subj1mark subj1mark 7.1874438
subj2mark subj2mark 16.6831560
name name 26.8610132
studLoc studLoc 26.8610132
And that's the table corresponding to your figure 2. For PC2 use axes = 2 and so on.
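To pick variables by name rather than by column index, one option is to compare each contribution with the uniform reference value that fviz_contrib computes as 100/length(contrib). This is a minimal base R sketch: the contrib vector is copied from the output above, and keep is an illustrative name.

```r
# Contributions to the first dimension, copied from the facto_summarize()
# output shown above.
contrib <- c(ID = 0.9924561, finalmark = 21.4149175, subj1mark = 7.1874438,
             subj2mark = 16.6831560, name = 26.8610132, studLoc = 26.8610132)

# Reference value fviz_contrib() uses for a single axis: the contribution
# each variable would have if all contributed equally.
theo_contrib <- 100 / length(contrib)

# Names of the variables contributing more than the uniform share.
keep <- names(contrib)[contrib > theo_contrib]
print(keep)
```

The resulting character vector can then index the original data frame, for example df[keep], so no numeric column positions need to be hard-coded.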
Regarding "how to programmatically determine the column indices of the PCs", I'm not 100% sure I understand what you want, but if you just want to say for column "finalmark", grab its contribution to PC3 you can do the following:
library(tidyverse)
# make a tidy table of all column names in the original df with their contributions to all PCs
contribution_df <- map_df(set_names(1:5), ~factoextra::facto_summarize(df.princomp, element = "var", result = "contrib", axes = .x), .id = "PC")
# get the contribution of column 'finalmark' by name
contribution_df %>%
filter(name == "finalmark")
# get the contribution of column 'finalmark' to PC3
contribution_df %>%
filter(name == "finalmark" & PC == 3)
# or, just the numeric value of contribution
filter(contribution_df, name == "finalmark" & PC == 3)$contrib
BTW I think ID in your example is treated as numeric instead of factor, but since it's just an example I'm not bothering with it.
I'm trying to plot a K-Means cluster to analyze different categories of products based on their inventory average and sold quantity.
All values are non-negative and of the same measurement unit.
I don't know what I did wrong, but the results contain points with negative values. In fact, I believe none of the points shown in the plot are actual points from my data.
Here is my code:
reduced_dataset = dataset[1:20, 4:5]
# Using the elbow method to find the optimal number of clusters
wcss = vector()
for (i in 1:10) wcss[i] = sum(kmeans(reduced_dataset, i)$withinss)
plot(1:10,
wcss,
type = 'b',
main = paste('The Elbow Method'),
xlab = 'Number of clusters',
ylab = 'WCSS')
# As a result, number of clusters should be 2
# Fitting K-Means to the dataset
kmeans = kmeans(x = reduced_dataset, centers = 2)
y_kmeans = kmeans$cluster
# Visualising the clusters
library(cluster)
clusplot(reduced_dataset,
y_kmeans,
lines = 0,
shade = TRUE,
color = TRUE,
labels = 2,
plotchar = FALSE,
span = TRUE,
main = paste('Clusters of categories - NOT ON SALE'),
xlab = 'Average Sold Quantity',
ylab = 'Average Inventory')
dput(reduced_dataset):
structure(list(Avg_Sold_No_Promo = c(0.255722695, 1.139983236,
0.458651842, 0.784966698, 1.642746914, 0.115264798, 7.50338696,
0.487603306, 1.023373984, 0.956099815, 1.505901506, 0.253837072,
0.834963325, 0.880898876, 6.527699531, 11.54054054, 3.44077135,
0.750182882, 0.251033058, 1.875698324), Avg_Inventory_No_Promo =
c(6.068672335,
22.57865326, 9.00694927, 11.56137012, 28.47530864, 7.485981308,
170.9064352, 11.07438017, 22.80792683, 40.63863216, 41.73463573,
10.87603306, 35.87408313, 46.09213483, 185.5671362, 315.6015693,
165.1129477, 78.18032187, 9.65857438, 198.4385475)), .Names =
c("Avg_Sold_No_Promo",
"Avg_Inventory_No_Promo"), row.names = c(NA, 20L), class = "data.frame")
Can someone please help me?
The clusplot function does this automatically: it plots the data on its principal components rather than on the original variables.
This is called PCA, and it is also why the plot includes the line reporting the variability explained.
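A short sketch of why the coordinates go negative, assuming the plot shows principal component scores as described above: the scores are centered at zero, so some of them are negative even when every raw value is non-negative. Plotting the raw columns colored by cluster avoids the projection. The names km and scores are illustrative, and the seed is arbitrary.

```r
# Data from the question's dput() output.
reduced_dataset <- data.frame(
  Avg_Sold_No_Promo = c(0.255722695, 1.139983236, 0.458651842, 0.784966698,
    1.642746914, 0.115264798, 7.50338696, 0.487603306, 1.023373984,
    0.956099815, 1.505901506, 0.253837072, 0.834963325, 0.880898876,
    6.527699531, 11.54054054, 3.44077135, 0.750182882, 0.251033058,
    1.875698324),
  Avg_Inventory_No_Promo = c(6.068672335, 22.57865326, 9.00694927,
    11.56137012, 28.47530864, 7.485981308, 170.9064352, 11.07438017,
    22.80792683, 40.63863216, 41.73463573, 10.87603306, 35.87408313,
    46.09213483, 185.5671362, 315.6015693, 165.1129477, 78.18032187,
    9.65857438, 198.4385475)
)

set.seed(1)  # kmeans starts from random centers
km <- kmeans(reduced_dataset, centers = 2)

# Principal component scores are centered, so negative coordinates appear
# although the raw data is non-negative.
scores <- prcomp(reduced_dataset)$x
print(range(scores[, 1]))

# Plot the original variables instead, colored by cluster membership.
plot(reduced_dataset$Avg_Sold_No_Promo, reduced_dataset$Avg_Inventory_No_Promo,
     col = km$cluster, pch = 19,
     xlab = "Average Sold Quantity", ylab = "Average Inventory")
points(km$centers, pch = 4, cex = 2)  # cluster centers
```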
So I'm doing a meta-analysis using the metafor package in R. I am preparing figures for publication in a scientific journal and I would like to add p-values to my forest plots, but with scientific notation formatted as x10-04 rather than the standard e-04.
However, the argument ilab in the forest function does not accept expression class objects, only vectors.
Here is an example :
library(metafor)
data(dat.bcg)
## REM
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg,
measure = "RR",
slab = paste(author, year, sep = ", "), method = "REML")
# MADE UP PVALUES
set.seed(513)
p.vals <- runif(nrow(dat.bcg), 1e-6,0.02)
# Format p-values so only those below 0.01 are in scientific notation
p.vals <- ifelse(p.vals < 0.01,
format(p.vals,digits = 3,scientific = TRUE,trim = TRUE),
format(round(p.vals, 2), nsmall=2, trim=TRUE))
## Forest plot
forest(res, ilab = p.vals, ilab.xpos = 3, order = "obs", xlab = "Relative Risk")
I want the scientific notation of the p-values to be formatted as x10-04
All the answers to similar questions that I've seen suggest using expression(), but that gives Error in cbind(ilab) : cannot create a matrix from type 'expression', which makes sense because the help file on forest specifies that the ilab argument should be a vector.
Any ideas on how I can either fix this or work around it?
A hacky solution would be to
forest.rma <- edit(forest.rma)
Go to line 574 (the exact line number may differ between metafor versions) and change
## line 574
text(ilab.xpos[l], rows, ilab[, l], pos = ilab.pos[l],
to
text(ilab.xpos[l], rows, parse(text = ilab[, l]), pos = ilab.pos[l],
Then fix your p-values and plot:
p.vals <- gsub('e(.*)', '~x~10^{"\\1"}', p.vals)
forest(res, ilab = p.vals, ilab.xpos = 3, order = "obs", xlab = "Relative Risk")
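To see what the substitution produces before touching the plotting function, it can be run on a few strings. This is a small base R sketch with made-up values in the two formats the question's code generates; p_demo, p_math and exprs are illustrative names.

```r
# Made-up p-values in the two formats produced by the question's code.
p_demo <- c("1.23e-04", "5.60e-03", "0.01")

# Same substitution as above: the exponent part "e-04" becomes a plotmath
# superscript, and strings without an exponent are left unchanged.
p_math <- gsub('e(.*)', '~x~10^{"\\1"}', p_demo)
print(p_math)

# Each string has to parse as an R expression for text() to render it.
exprs <- parse(text = p_math)
print(length(exprs))
```

Quoting the exponent keeps its leading zero. If a true multiplication sign is preferred over the letter x, plotmath renders the %*% operator as one, so replacing ~x~ with %*% in the pattern is one possible variation.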
R version: 3.4.2
I'm using rugarch and rmgarch to specify and fit a DCC model to my data. The model is fitted successfully, but I'm unable to generate the plots. Here's a snippet of my code:
library(rugarch)
library(rmgarch)
da=read.table("d-msft3dx0113.txt",header=T)
MSFT.ret = da[,3]
GSPC.ret = da[,6]
MSFT.GSPC.ret = cbind(MSFT.ret,GSPC.ret)
garch11.spec = ugarchspec(mean.model = list(armaOrder = c(0,0)),
variance.model = list(garchOrder = c(1,1),
model = "sGARCH"),
distribution.model = "norm")
dcc.garch11.spec = dccspec(uspec = multispec( replicate(2, garch11.spec) ),
dccOrder = c(1,1),
distribution = "mvnorm")
dcc.fit = dccfit(dcc.garch11.spec, data = MSFT.GSPC.ret)
dcc.fcst = dccforecast(dcc.fit, n.ahead=100)
plot(dcc.fcst)
When I call for plot, I get this error:
plot(dcc.fcst)
Make a plot selection (or 0 to exit):
Conditional Mean Forecast (vs realized returns)
Conditional Sigma Forecast (vs realized |returns|)
Conditional Covariance Forecast
Conditional Correlation Forecast
EW Portfolio Plot with forecast conditional density VaR limits
Selection: 1
Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) :
plot.new has not been called yet
I then give it a new plot area:
plot.new()
plot(dcc.fcst)
Which gives me this unhelpful plot:
Selection1Plot
I have the same problem. I don't know why plot(dcc.fcst) does not work, so I extract the correlation and covariance manually. rcov and rcor are the two functions that extract what we need.
plot(rcov(dcc.fit)[1,2,], type = "l", col = "blue",
main = "Conditional Covariance", xlab = "Time",
ylab = "Covariance")
plot(rcor(dcc.fit)[1,2,], type = "l", col = "purple",
main = "Conditional Correlation", xlab = "Time",
ylab = "Correlation")