Add subgroups to a meta-analysis - metafor

I have recently been teaching myself how to perform a meta-analysis in R. My current project is looking at various functional outcomes following certain head and neck cancer surgeries. I have used the following code to produce a forest plot of results.
#Load packages
library(tidyverse)
library(meta)
library(metafor)
# Overall long-term feeding tube
data <- gastrostomy_overall
sapply(data,class)
data_all= subset(data, events_all >=0)
meta_all <- metaprop(events_all, total_all, studlab=Study, sm="PLOGIT", data=data_all, method="GLMM", method.tau="ML")
summary(meta_all)
forest.meta(meta_all, layout="RevMan5", xlab="Proportion", comb.r=T, comb.f=F, xlim = c(0,1), fontsize=10, digits=2, subgroup= surgery, subgroup.name = surgery)
The data I am using contains results from 28 different studies. I would like to subgroup the results and perform a meta-regression across surgery types (the "Surgery" column in the uploaded CSV file), but I have had difficulty producing code that works. How can I add subgroups and a meta-regression to the above code?
Thanks
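One possible direction, as a minimal sketch rather than a tested answer for the data above: in the meta package, subgroups are declared when the model is fitted (the subgroup argument of metaprop(), called byvar in meta versions before 5.0), and metareg() then fits a meta-regression on the same object. The data frame dat and its values are made up for illustration, the surgery labels "A" and "B" are hypothetical, and method = "Inverse" is swapped in for the GLMM so the toy example stays light.

```r
library(meta)

# Made-up data standing in for gastrostomy_overall (hypothetical values)
dat <- data.frame(
  Study      = paste("Study", 1:8),
  events_all = c(5, 12, 8, 20, 3, 15, 9, 11),
  total_all  = c(40, 60, 50, 90, 30, 70, 45, 80),
  Surgery    = rep(c("A", "B"), each = 4)   # hypothetical surgery types
)

# Subgroups are defined in the model call, not in forest()
meta_sub <- metaprop(events_all, total_all, studlab = Study, data = dat,
                     sm = "PLOGIT", method = "Inverse",
                     subgroup = Surgery)
summary(meta_sub)

# The forest plot picks up the subgroups from the fitted object
forest(meta_sub, xlim = c(0, 1))

# Meta-regression with surgery type as the moderator
reg <- metareg(meta_sub, ~ Surgery)
reg
```

With the real data, the same subgroup argument could presumably be added to the original metaprop() call, keeping method = "GLMM" and method.tau = "ML".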

Related

Using Amelia and decision trees

I have a panel dataset (countries and years) with a lot of missing data, so I've decided to use multiple imputation. The goal is to see the relationship between the proportion of women in management (managerial_value) and total fatal workplace injuries (total_fatal).
From what I've read online, Amelia is the best option for panel data so I used that like so:
library(Amelia)
amelia_data <- amelia(spdata, ts = "year", cs = "country", polytime = 1,
                      intercs = FALSE)
where spdata is my original dataset.
This imputation process worked, but I'm unsure of how to proceed with forming decision trees using the imputed data (an object of class 'amelia').
I originally tried creating a function (amelia2df) to turn each of the 5 imputed datasets into a data frame:
amelia2df <- function(amelia_data, which_imp = 1) {
  stopifnot(inherits(amelia_data, "amelia"), is.numeric(which_imp))
  imps <- amelia_data$imputations[[which_imp]]
  as.data.frame(imps)
}
one_amelia <- amelia2df(amelia_data, which_imp = 1)
two_amelia <- amelia2df(amelia_data, which_imp = 2)
three_amelia <- amelia2df(amelia_data, which_imp = 3)
four_amelia <- amelia2df(amelia_data, which_imp = 4)
five_amelia <- amelia2df(amelia_data, which_imp = 5)
where one_amelia is the data frame for the first imputed dataset, two_amelia is the second, and so on.
I then combined them using rbind():
total_amelia <- rbind(one_amelia, two_amelia, three_amelia, four_amelia, five_amelia)
And used the new combined dataset total_amelia to construct a decision tree:
library(rpart)
library(rpart.plot)
set.seed(300)
tree_data <- total_amelia
I_index <- sample(1:nrow(tree_data), size = 0.75*nrow(tree_data), replace=FALSE)
I_train <- tree_data[I_index,]
I_test <- tree_data[-I_index,]
fatal_tree <- rpart(total_fatal ~ managerial_value, I_train)
rpart.plot(fatal_tree)
fatal_tree
This "works" as in it doesn't produce an error, but I'm not sure that it is appropriately using the imputed data.
I found a couple resources explaining how to apply least squares, logit, etc., but nothing about decision trees. I'm under the impression I'd need the 5 imputed datasets to be combined into one data frame, but I have not been able to find a way to do that.
I've also looked into Zelig and bind_rows but haven't found anything that returns one data frame that I can then use to form a decision tree.
Any help would be appreciated!
As already indicated by @Noah, you should set up the multiple imputation workflow differently from how you currently do.
Multiple imputation is not really a tool to improve your results or to make them more correct.
It is a method that lets you quantify the uncertainty that the missing data introduce into your analysis.
All the different datasets created by multiple imputation are plausible imputations; because of the uncertainty, you don't know which one is correct.
You would therefore use multiple imputation the following way:
1. Create your m imputed datasets
2. Build your trees on each imputed dataset separately
3. Do your analysis on each tree separately
4. In your final paper, you can now state how much uncertainty is caused by the missing values/imputation
This means you get, e.g., 5 different analysis results for m = 5 imputed datasets. At first this looks confusing, but it enables you to give bounds between which the correct result probably lies. Or, if you get completely different results for each imputed dataset, you know there is too much uncertainty caused by the missing values to give reliable results.
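The workflow above could be sketched roughly as follows. This is only an illustration: the list imputations is a made-up stand-in for amelia_data$imputations (five completed copies of one dataset that differ only in the originally missing cells), and comparing predictions on a small grid is just one assumed example of an "analysis".

```r
library(rpart)

set.seed(300)

# Hypothetical stand-in for amelia_data$imputations
n <- 200
base <- data.frame(managerial_value = runif(n, 10, 50))
base$total_fatal <- 100 - 1.5 * base$managerial_value + rnorm(n, sd = 10)
missing_rows <- sample(n, 60)   # cells that were "missing" and got imputed
imputations <- lapply(1:5, function(i) {
  d <- base
  d$managerial_value[missing_rows] <- runif(length(missing_rows), 10, 50)
  d
})

# One tree per imputed dataset (the imputations are not stacked with rbind)
trees <- lapply(imputations, function(d) {
  rpart(total_fatal ~ managerial_value, data = d)
})

# Same analysis on every tree: predictions at a few values of the predictor
grid  <- data.frame(managerial_value = c(15, 30, 45))
preds <- sapply(trees, predict, newdata = grid)  # rows: grid points, columns: imputations

# Lowest and highest prediction per grid point across the five trees
t(apply(preds, 1, range))
```

With a real Amelia result, the same lapply() call would run over amelia_data$imputations directly, so the amelia2df() helper and the rbind() step are not needed.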

Plotting different mixture model clusters in the same curve

I have two sets of data, one representing a healthy data set having 4 variables and 11,000 points and another representing a faulty set having 4 variables and 600 points. I have used the R package mclust to obtain GMM clustering for each data set separately. What I want to do is to show both clusterings in the same frame so that I can study them at the same time. How can that be done?
I have tried joining both the datasets but the result I am obtaining is not what I want.
The code in use is:
library(mclust)
Dat4M <- Mclust(Dat3, G = 3)
Dat3 is where I am storing my dataset, Dat4M is where I store the result of Mclust. G = 3 is the number of Gaussian mixtures I want, which in this case is three. To plot the result, the following code is used:
plot(Dat4M)
The following is obtained when I apply the above code in my Healthy dataset:
The following is obtained when the above code is used on Faulty dataset:
Notice that in the density plot for the faulty data, the panel for CCD against CCA shows two separate density regions. I want to place these in the same panel as the healthy data and study the differences.
Any help on how to do this will be appreciated.
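One possible way to get both fits into a single frame, as a rough sketch only: fit each dataset separately, then draw both point clouds and the fitted component means on one base-graphics plot. The data frames healthy and faulty are made up, and only the two variables CCD and CCA are used; this does not reproduce the density panels of plot(Dat4M).

```r
library(mclust)

set.seed(1)

# Made-up stand-ins for the healthy and faulty data (two of the four variables)
healthy <- data.frame(CCD = rnorm(300, 0, 1), CCA = rnorm(300, 0, 1))
faulty  <- data.frame(CCD = rnorm(100, 3, 1), CCA = rnorm(100, 2, 1))

# Separate three-component fits, as in the question
fit_h <- Mclust(healthy, G = 3, verbose = FALSE)
fit_f <- Mclust(faulty,  G = 3, verbose = FALSE)

# Empty frame whose axes cover both datasets
plot(rbind(healthy, faulty), type = "n")
points(healthy, col = "steelblue", pch = 1)
points(faulty,  col = "firebrick", pch = 2)

# parameters$mean has variables in rows and components in columns
points(t(fit_h$parameters$mean), pch = 19, cex = 2, col = "navy")
points(t(fit_f$parameters$mean), pch = 17, cex = 2, col = "darkred")
legend("topleft", legend = c("healthy", "faulty"),
       col = c("steelblue", "firebrick"), pch = c(1, 2))
```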

Labeling the centroids of a PCoA based on betadisper() multivariate dispersions in R

I've used the function betadisper() in the vegan package to generate multivariate dispersions and plot those data in a PCoA. In this example I'll be looking at the difference between the sexes in a single species.
Load the original data. For our purposes this can legit be anything here. The data I'm using isn't special. Its feature measurements are from a bioacoustic dataset. I am walking through my process:
my_original_data = read.csv("mydata.csv", as.is = T, check.names = F)
#Just extract the numeric/quantitative data.
myData=my_original_data[, 13:107]
Based on previous research, we used an unsupervised randomForest to determine similarity within our original feature measurements:
require(randomForest)
full_urf = randomForest(myData, proximity=T, scale=TRUE, ntree=4999,importance = TRUE)
An index was then generated using the proximity matrix:
urf_dist_full = as.dist(1-full_urf$proximity)
A permutational MANOVA was run on the resulting index using the vegan package. The use of the pMANOVA was well researched and is the correct test for my purposes:
mod=adonis(formula = urf_dist_full ~ Sex * Age * Variant, data = my_original_data, permutations = 999, method = "euclidean")
my_original_data had qualitative factors, Sex, Age and Variant. I could have extracted them, but it seemed cleaner to keep them within the original dataset.
After running a few homogeneity tests, I want to plot the multivariate dispersions. To do this I have been using the betadisper function:
Sex=betadisper(urf_dist_full,my_original_data$Sex)
plot(Sex, main="Sex Multivariate Dispersions")
That plots this beauty:
How can I label the centroids as Male and Female? I also want to run this plot for the Variant category, but that has five levels rather than two, which really warrants labeling.
I've seen the boxplot() variant of this, but I like how the PCoA also shows clustering.
You can add labels to centroids like this:
ordilabel(scores(Sex, "centroids"))
where Sex is your betadisper result. If you do not want to use the original names of your centroids, you can change the names with:
ordilabel(scores(Sex, "centroids"), labels=c("A","B"))
You can use the identify-function:
A <- plot(Sex)
identify(A, "centroids")
Or look at the scores (this doesn't add labels to the plot, but shows you the centroid positions):
scores(Sex, 1:2, display = "centroids")
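For a version that can be run without the original data, here is a small sketch using the dune and dune.env datasets that ship with vegan. They only stand in for the bioacoustic data, and Management (four groups) plays the role of Sex or Variant.

```r
library(vegan)

# Example data bundled with vegan; a stand-in for the real measurements
data(dune)
data(dune.env)

# Dispersions of the four management groups around their centroids
disp <- betadisper(vegdist(dune), dune.env$Management)
plot(disp, main = "Management Multivariate Dispersions")

# Centroid coordinates on the first two PCoA axes, one row per group
cent <- scores(disp, display = "centroids")
cent

# Label each centroid with its group name
ordilabel(cent)
```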

Creating a dendrogram with the results from the multipatt function in the indicspecies package

I am getting familiar with the multipatt function in the indicspecies package. Thus far I have only seen summary being used to give a breakdown of the results. However, I would like a dendrogram, ideally with the names of the species that are more 'indicative' of my given community location.
Example from the package documentation:
library(indicspecies)
library(stats)
data(wetland) ## Loads species data
wetkm = kmeans(wetland, centers=3) ## Creates three clusters using kmeans
## Runs the combination analysis using IndVal.g as statistic
wetpt = multipatt(wetland, wetkm$cluster, control = how(nperm=999))
## Lists those species with significant association to one combination
summary(wetpt)
wetpt gives the raw results but I am not sure how to proceed to get a cluster plot out of this result. Can anyone offer any pointers?

Variable Clustering (varclus) Summary Tables

I am using varclus from the Hmisc package in R. Are there ways to produce summary tables from varclus in R like those in SAS (e.g. Output 100.1.2 and Output 100.1.3)? Basically, I would like to have the information that is contained in the plot in tabular or matrix form. For example: which variables are in which clusters (the cluster structure in SAS), the proportion of variance they explain, etc.
# varclus example in R using mtcars data
library(Hmisc)
mtc <- mtcars[,2:8]
mtcn <- data.matrix(mtc)
clust <- varclus(mtcn)
clust
plot(clust)
#cut_tree <- cutree(varclus(mtcn)$hclust, k=5) # This would show group membership, but only after I chose a cut point, which is not what I am after
