I am new to the survey package and have a puzzling problem. I created weights with the anesrake package and then built a survey design.
The problem appears when I use svyboxplot with a grouping variable:
it draws identical boxplots for every group, which is not correct.
When I instead subset the design to each category separately (there are 15 of them),
the values differ and each area gets a different boxplot.
Can anyone help me? I am desperate!
Here's a sample to test:
library(tidyverse)
col <- tibble(
name = c("seura 1", "seura 2", "seura 3", "seura 4", "seura 5", "seura 6", "seura 7", "seura 8", "seura 9"
, "seura 10", "seura 11", "seura 12"),
riistakeskus = c("Keski-Suomi","Keski-Suomi","Keski-Suomi","Keski-Suomi","Keski-Suomi","Satakunta","Satakunta",
"Satakunta","Uusimaa", "Uusimaa","Uusimaa","Uusimaa"),
hirvi_sarvisuositus = c(1,4,5,3,7,5,3,4,6,5,8,9),
weights = c(1.1461438,1.1461438,1.1461438,1.1461438,1.1461438,0.5107815,0.5107815,0.5107815,2.0461937,
2.0461937,2.0461937,2.0461937)
)
library(survey)
my_des1 <- svydesign(data = col, weights = ~weights, ids = ~1)
b <- svyboxplot(hirvi_sarvisuositus~factor(riistakeskus), my_des1, all.outliers = F, ylim = c(0,10))
svyboxplot(hirvi_sarvisuositus~1, subset(my_des1, riistakeskus == "Keski-Suomi"), ylim = c(0,10))
svyboxplot(hirvi_sarvisuositus~1, subset(my_des1, riistakeskus == "Satakunta"), ylim = c(0,10))
svyboxplot(hirvi_sarvisuositus~1, subset(my_des1, riistakeskus == "Uusimaa"), ylim = c(0,10))
I had the same problem and would like to add to Anthony's answer, but I cannot comment yet.
There is an error in survey:::svyboxplot.default, as Anthony indicates, but it does not seem to have anything to do with the number of data points. If svyby is called with keep.var = FALSE and FUN = svyquantile, it returns the overall quantiles instead of the group-specific quantiles.
Compare
svyby(~hirvi_sarvisuositus, ~riistakeskus, my_des1, svyquantile, ci = FALSE,
keep.var = FALSE, quantiles = c(0, 0.25, 0.5, 0.75, 1),
na.rm = TRUE)
with
svyquantile(~hirvi_sarvisuositus, my_des1,
quantiles = c(0, 0.25, 0.5, 0.75, 1),
na.rm = TRUE)
Note that svyquantile cannot compute the SE for some quantiles.
If you use keep.var=TRUE instead and try to extract the CIs, you get quantiles by group.
svyby(~hirvi_sarvisuositus, ~riistakeskus, my_des1, svyquantile,
quantiles = c(0, 0.25, 0.5, 0.75, 1), ci=TRUE, na.rm = TRUE,
keep.var = TRUE, vartype = "ci")
However, you can't change the svyquantile options when calling svyboxplot. This needs to be fixed in the package. You could build your boxplots yourself instead. A simple base R solution:
q <- svyby(~hirvi_sarvisuositus, ~riistakeskus, my_des1, svyquantile,
quantiles = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, ci=TRUE,
keep.var = TRUE,
vartype = "ci")
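# columns 2 to 6 hold the five requested quantiles for each group
boxstats <- q[,2:6]
# n = 100 is a placeholder group size (bxp uses n for varwidth)
bxp(list(stats=t(as.matrix(boxstats)),
n = c(100,100,100),
names = rownames(boxstats)))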
To prevent whiskers inside the box, you can change qrule to use a different way to calculate quantiles (e.g. qrule="hf7" for the quantile() default).
An alternative solution would be to use a weighted boxplot from ggplot2:
library(ggplot2)
ggplot(data=col, aes(y=hirvi_sarvisuositus, x=factor(riistakeskus), weight=weights)) +
geom_boxplot()
Please note that ggplot2 uses a slightly different estimation of the hinges, see help(geom_boxplot), which influences the results for low N.
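To check which set of boxes is plausible without relying on svyby at all, the group-wise weighted quantiles can also be computed by hand. The following is a minimal base R sketch, not the survey package's method: wquantile is a hypothetical helper that reads quantiles off the weighted empirical CDF, which is only one of several possible rules and will not necessarily match svyquantile exactly. The data are the sample values from the question.

```r
# Sample data from the question (values and weights copied from above)
riistakeskus <- rep(c("Keski-Suomi", "Satakunta", "Uusimaa"), times = c(5, 3, 4))
hirvi <- c(1, 4, 5, 3, 7, 5, 3, 4, 6, 5, 8, 9)
w <- rep(c(1.1461438, 0.5107815, 2.0461937), times = c(5, 3, 4))

# Hypothetical helper: quantiles read off the weighted empirical CDF.
# A simple rule chosen for illustration, not necessarily svyquantile's.
wquantile <- function(x, w, probs) {
  o <- order(x)
  x <- x[o]
  cw <- cumsum(w[o])
  cw <- cw / cw[length(cw)]  # cumulative weight share at each sorted value
  # smallest x whose cumulative share reaches p (small tolerance for rounding)
  sapply(probs, function(p) x[which(cw >= p - 1e-12)[1]])
}

probs <- c(0, 0.25, 0.5, 0.75, 1)
stats <- sapply(split(seq_along(hirvi), riistakeskus),
                function(i) wquantile(hirvi[i], w[i], probs))
stats  # one column of five box statistics per region

# Draw the boxes from the precomputed statistics, using the real group sizes
bxp(list(stats = stats,
         n = as.vector(table(riistakeskus)),
         names = colnames(stats)),
    ylim = c(0, 10))
```

Because the weights are constant within each region here, the per-group statistics should differ between regions, which makes it easy to spot output that repeats the overall quantiles.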
great reproducible example, thank you! this result especially looks silly
svyboxplot(hirvi_sarvisuositus~riistakeskus,my_des1,ylim=c(0,10))
i think this largely happens because svyquantile just needs more data points to get reasonable estimates..
if you look at the code inside survey:::svyboxplot.default you can find the line that produces all of the same quantile results
svyby(~hirvi_sarvisuositus, ~riistakeskus, my_des1, svyquantile, ci = FALSE,
keep.var = FALSE, quantiles = c(0, 0.25, 0.5, 0.75, 1),
na.rm = TRUE)
not sure if this is really a bug that the survey package author would want to fix.. perhaps consider using the ?bxp function if your use case has that small of a dataset?
I am performing a meta-analysis of proportions using metaprop function. I am looking at the prevalence of heart fibrosis in people living with HIV.
# mri$lgehivn: number of people with HIV who have fibrosis
# mri$hivnmri: total number of people with HIV who have had CMR
lge.prop <- metaprop(event = mri$lgehivn,
n = mri$hivnmri,
subset = c(1:11, 13:16),
studlab = paper,
data = mri,
method = "Inverse",
sm = "PLOGIT",
random = TRUE,
hakn = FALSE,
pscale = 100,
digits = 1)
I am then passing this into a forest plot:
forest.meta(lge.prop,
rightcols=FALSE,
leftcols=c("studlab", "event", "n", "effect", "ci"),
leftlabs = c("Study", "Cases", "Total", "Prevalence", "95% C.I."),
xlim= c(0,110),
smlab = c("Prevalence of LGE (%)"),
digits = 1,
colgap.left = 1)
This then gives me the following forest plot:
[Figure: forest plot of the meta-analysis]
I am trying to remove the line that reports the "Common effect model" and only show the random effect model.
Does anyone know the code for this?
Thank you!
I do not know if it is still relevant but I think that this was introduced with a newer version of the meta package.
When manually installing version 4.15-1, the common effect model was removed automatically.
Please use the code below before making the forest plot; common = FALSE drops the common effect model:
lge.prop <- metaprop(event = mri$lgehivn,
n = mri$hivnmri,
subset = c(1:11, 13:16),
studlab = paper,
data = mri,
method = "Inverse",
sm = "PLOGIT",
random = TRUE,
hakn = FALSE,
pscale = 100,
digits = 1,
common = FALSE)
I wrote some code to visualize my data:
library(forestplot)
test_data <- data.frame(coef=c(1.14, 0.31, 10.70),
low=c(1.01, 0.12, 1.14),
high=c(1.30, 0.83, 100.16),
boxsize=c(0.2, 0.2, 0.2))
row_names <- cbind(c("Variable", "Variable 1", "Variable 2", "So looooooong and nasty name of the variable"),
c("OR", test_data$coef), c("CI -95%", test_data$low), c("CI +95%", test_data$high) )
test_data <- rbind(rep(NA, 4), test_data)
forestplot(labeltext = row_names,
mean = test_data$coef, upper = test_data$high,
lower = test_data$low,
is.summary=c(TRUE, FALSE, FALSE, FALSE),
boxsize = test_data$boxsize,
zero = 1,
xlog = TRUE,
xlab = "OR (95% CI)",
col = fpColors(lines="black", box="black"),
title="My Happy Happy Title \n o happy happy title...\n",
ci.vertices = TRUE,
xticks = c(0.1, 1, 10, 100))
It gives the following forest plot:
I would like to:
1) expand the plot and reduce the font size of the text columns on the left for better readability
2) break "So looooooong and nasty name of the variable" so that the part starting with "name..." moves onto a second line, like:
"
So looooooong and nasty
name of the variable
"
However, when I write it as "/nSo.../n", I get an extra row of numbers in the "OR" and "CI" columns.
How can I fix this?
Three possibilities (one more than you asked for):
1) change the text size of the row labels with txt_gp.
2) cut the column spacing from the 6 mm default to half that value by passing colgap a grid unit() call. Fully understanding the options for forestplot requires understanding the grid plotting system.
3) add a "\n" to the loooong label. (I'm puzzled you didn't see that possibility, since you already had a "\n" in the title.)
row_names <- cbind(c("Variable", "Variable 1", "Variable 2", "So looooooong and \nnasty name of the variable"),
c("OR", test_data$coef), c("CI -95%", test_data$low), c("CI +95%", test_data$high) )
forestplot(labeltext = row_names,
mean = test_data$coef, upper = test_data$high,
lower = test_data$low,
is.summary=c(TRUE, FALSE, FALSE, FALSE),
boxsize = test_data$boxsize,
zero = 1, colgap = unit(3, "mm"), txt_gp=fpTxtGp(label= gpar(cex = 0.7),
title = gpar(cex = 1) ),
xlog = TRUE,
xlab = "OR (95% CI)",
col = fpColors(lines="black", box="black"),
title="My Happy Happy Title \n o happy happy title...\n",
ci.vertices = TRUE,
xticks = c(0.1, 1, 10, 100))
If I only used a cex of 0.7 in the call to gpar passed to 'label', it also affected the size of the title, so I needed to "reset" the 'cex' of the 'title' back to 1.
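The "\n" in option 3 was placed by hand. As a minimal sketch, base R's strwrap can pick the break points instead; wrap_label is a hypothetical helper and the width of 25 characters is an arbitrary assumption, so adjust it to your layout.

```r
# Hypothetical helper: wrap each label at roughly `width` characters
# and join the pieces with newlines so they stay in one table row
wrap_label <- function(x, width = 25) {
  vapply(x,
         function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

labels <- c("Variable", "Variable 1", "Variable 2",
            "So looooooong and nasty name of the variable")
wrapped <- wrap_label(labels)

# Short labels are unchanged; the long one now contains a line break
cat(wrapped[4], "\n")
```

The wrapped vector can then take the place of the first column passed to cbind when building row_names.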
Complete beginner at R here trying to perform nonmetric multidimensional scaling on a 95x95 matrix of similarities where 8 corresponds to very similar and 1 corresponds to very dissimilar. I also have an additional column (96th) signifying type and ranging from 0 to 1.
First I load the data:
dsimilarity <- read.table("d95x95matrix.txt",
header = TRUE,
row.names = paste0("Y", 1:95))
I convert the matrix of similarities into a matrix of dissimilarities, and exclude the 96th column:
ddissimilarity <- dsimilarity; ddissimilarity[1:95, 1:95] = 8 - ddissimilarity[1:95, 1:95]
Then I perform the nonmetric MDS using the smacofSym function from the smacof package:
ordinal.mds.results <- smacofSym(ddissimilarity[1:95, 1:95],
type = c("ordinal"),
ndim = 2,
ties = "primary",
verbose = T )
I create a new data frame (I'm following a guide and don't really know what's going on here):
mds.config <- as.data.frame(ordinal.mds.results$conf)
All well and good thus far (to my knowledge). At this point I create an xyplot of the data and get a good result using this code:
xyplot(D2 ~ D1, data = mds.config,
aspect = 1,
main = "Figure 1. MDS solution",
panel = function (x, y) {
panel.xyplot(x, y, col = "black")
panel.text(x, y-.03, labels = rownames(mds.config),
cex = .75)
},
xlab = "MDS Axis 1",
ylab = "MDS Axis 2",
xlim = c(-1.1, 1.1),
ylim = c(-1.1, 1.1))
Now I want to create a figure that incorporates the type stored in the 96th column and assigns different colors to observations of the two types. However, I can't quite figure out how to do so. Does anyone have any ideas about where I'm going wrong here?
xyplot(D2 ~ D1, data = mds.config ~ ddissimilarity[96:96, 96:96],
aspect = 1,
main = "Figure 1. MDS solution",
panel = function (x, y) {
panel.xyplot(x, y, col = "black")
panel.text(x, y-.03, labels = rownames(mds.config),
cex = .75)
},
xlab = "MDS Axis 1",
ylab = "MDS Axis 2",
xlim = c(-1.1, 1.1),
ylim = c(-1.1, 1.1),
group = "Type")
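One possible direction, sketched under stated assumptions rather than taken from the question: lattice expects the grouping variable to be a column of the data frame given to data, passed unquoted through the groups argument. The example below uses a toy stand-in for mds.config (classical MDS via stats::cmdscale on the built-in eurodist distances, not the ordinal smacofSym solution) and a made-up two-level Type column. With the real data, the column would presumably come from something like mds.config$Type <- factor(dsimilarity[, 96]).

```r
library(lattice)

# Toy stand-in for mds.config: classical MDS on a built-in distance object,
# NOT the ordinal smacofSym() configuration from the question
cfg <- as.data.frame(cmdscale(eurodist, k = 2))
names(cfg) <- c("D1", "D2")

# Hypothetical two-level type variable, one value per row
cfg$Type <- factor(rep(c("type 0", "type 1"), length.out = nrow(cfg)))

p <- xyplot(D2 ~ D1, data = cfg, groups = Type,
            aspect = 1, auto.key = TRUE,
            panel = function(x, y, ...) {
              # "..." carries groups and subscripts, so points are colored by Type
              panel.xyplot(x, y, ...)
              panel.text(x, y, labels = rownames(cfg), cex = 0.75, pos = 1)
            },
            xlab = "MDS Axis 1", ylab = "MDS Axis 2")
print(p)
```

The key differences from the attempt above are that data stays a plain data frame, the argument is groups, and the panel function no longer hard-codes col = "black".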
So I'm doing a meta-analysis using the metafor package in R. I am preparing figures for publication in a scientific journal and I would like to add p-values to my forest plots, but with scientific notation formatted as x10-04 rather than the standard e-04.
However, the ilab argument of the forest function does not accept expression objects, only vectors.
Here is an example :
library(metafor)
data(dat.bcg)
## REM
res <- rma(ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg,
measure = "RR",
slab = paste(author, year, sep = ", "), method = "REML")
# MADE UP PVALUES
set.seed(513)
p.vals <- runif(nrow(dat.bcg), 1e-6,0.02)
# Format p-values so only those below 0.01 use scientific notation
p.vals <- ifelse(p.vals < 0.01,
format(p.vals,digits = 3,scientific = TRUE,trim = TRUE),
format(round(p.vals, 2), nsmall=2, trim=TRUE))
## Forest plot
forest(res, ilab = p.vals, ilab.xpos = 3, order = "obs", xlab = "Relative Risk")
I want the scientific notation of the p-values to be formatted as x10-04.
All the answers to similar questions that I've seen suggest using expression(), but that gives Error in cbind(ilab) : cannot create a matrix from type 'expression', which makes sense because the help file for forest specifies that the ilab argument should be a vector.
Any ideas on how I can either fix this or work around it?
A hacky solution would be to edit the function itself:
forest.rma <- edit(forest.rma)
Go to line 574 and change
## line 574
text(ilab.xpos[l], rows, ilab[, l], pos = ilab.pos[l],
to
text(ilab.xpos[l], rows, parse(text = ilab[, l]), pos = ilab.pos[l],
Then reformat your p-values as plotmath strings and plot:
p.vals <- gsub('e(.*)', '~x~10^{"\\1"}', p.vals)
forest(res, ilab = p.vals, ilab.xpos = 3, order = "obs", xlab = "Relative Risk")
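To see what the gsub call produces before touching forest.rma, the substitution can be tried on its own. This is a minimal base R sketch with two made-up p-value strings (not results from the analysis above); it only shows the string rewrite and that parse() turns the result into plotmath expressions.

```r
# Made-up example strings in the two formats produced above
p.demo <- c("3.05e-04", "0.01")

# Rewrite the exponent part "e-04" as plotmath: ~x~10^{"-04"}
p.lab <- gsub('e(.*)', '~x~10^{"\\1"}', p.demo)
print(p.lab)

# parse() converts the strings into an expression vector that text() can draw
p.expr <- parse(text = p.lab)

# Quick visual check on an empty plot
plot.new()
text(0.5, c(0.7, 0.3), p.expr)
```

Strings without an exponent are parsed as plain numbers, so a label such as "0.10" would be drawn as 0.1; quoting such values inside the string is one way around that.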
I'm using heatmap to plot the leader for each of the respective pitching performance categories for some baseball data. My problem is that I need to reverse the "heat" of just one of the columns, because the best ERA is the lowest, not the highest. Here's the code. mlb2010 is data that was imported from a SQL database via RSQLite.
mlb10 <- sapply(2:length(mlb2010), function(i) {
mlb2010[, i] <- as.numeric(mlb2010[, i])
})
rc <- rainbow(nrow(mlb10), start = 0, end = .3)
cc <- rainbow(ncol(mlb10), start = 0, end = .3)
heatmap(mlb10, col = rev(heat.colors(256)), scale = "column",
Rowv = NULL, Colv = NA, RowSideColors = rc, ColSideColors = cc,
margins = c(5,10), labRow = c(mlb2010$team), labCol = names(al2010)[-1],
xlab = "Performance factors", ylab = "Team",
main = c("Relating Performance to Payroll", "2010 MLB Season"))
I have tried the revC argument in heatmap with no success. Is that what I should be using? Or does that reorder all of the columns, and not what is inside the column? I've also tried an sapply over the colors to no avail.
Any help would be greatly appreciated.
Per request from OP, posting the basics of the solution.
Just negate the ERA column to reverse its order, e.g. mlb2010$ERA <- -as.numeric(mlb2010$ERA) before building mlb10, then plot as in the post.
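A minimal sketch of the idea on a toy matrix; the team names and numbers below are invented purely for illustration and are not the 2010 data. Because heatmap is called with scale = "column", flipping the sign of one column reverses which rows get the hot colors in that column only.

```r
# Invented toy values, for illustration only
m <- cbind(ERA  = c(3.2, 4.1, 4.8),
           Wins = c(95, 81, 70))
rownames(m) <- c("Team A", "Team B", "Team C")

# Negate only the column where lower is better
m_plot <- m
m_plot[, "ERA"] <- -m_plot[, "ERA"]

# With column scaling, the lowest ERA now gets the same color as the most wins
heatmap(m_plot, Rowv = NA, Colv = NA, scale = "column",
        col = rev(heat.colors(256)), margins = c(5, 10))
```

The column label still reads ERA, so it may be worth noting in the caption that the color scale for that column is reversed.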