R: label variables based on their prefix

I'm new to R and trying to label several, but not all, of the variables in my data at the same time. Specifically, I want to label the variables starting with "pol". I tried to combine the select and the set_variable_labels commands in the following manner:
cp14 <- cp14 %>%
select(matches("pol")) %>%
set_variable_labels(cp14,
labels = "Interest in politics")
I would like all variables that include "pol" to be labelled as "Interest in politics". This however does not work. Any advice on how to do this in a similar or completely different manner is greatly appreciated.
My data looks something like this, but with many more variables:
structure(list(pol_interest_w1 = c(0.5, 0.5, 0.25, 0.25, 0.25,
0.5), pol_interest_w2 = c(0.5, 0.5, 0.25, NA, 0.25, 0.5), pol_interest_w3 = c(0.5,
0.5, 0.25, NA, 0, 0.5), pol_interest_w4 = c(0.5, 0.5, 0.25, NA,
0, 0.5), pol_interest_w5 = c(0.5, 0.5, 0.25, NA, 0, 0.5), pol_interest_w6 = c(0.5,
0.5, 0.25, NA, 0, 0.5), pol_interest_w7 = c(0.5, 0.5, 0.25, NA,
0.25, 0.5), new_col = c(0.75, 0.5, 0.25, NA, 0.25, 0.5)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))

You can do this in a couple of ways. For either solution, start by creating a vector of the variable names starting with "pol". (I use stringr::str_starts() here; you don’t want to use select(), as in your code, because it subsets the dataset to the matching columns and drops the rest.)
library(stringr)
library(labelled)
pol_vars <- names(cp14)[str_starts(names(cp14), "pol")]
Then, you can make a named list of labels, and pass it to the .labels argument of labelled::set_variable_labels().
pol_labels <- setNames(
as.list(rep("Interest in politics", length(pol_vars))),
pol_vars
)
cp14 <- set_variable_labels(cp14, .labels = pol_labels)
Alternatively, you could loop over the variable names and assign labels using labelled::var_label().
for (v in pol_vars) {
var_label(cp14[[v]]) <- "Interest in politics"
}
Both approaches yield the same result:
#> var_label(cp14)
$pol_interest_w1
[1] "Interest in politics"
$pol_interest_w2
[1] "Interest in politics"
$pol_interest_w3
[1] "Interest in politics"
$pol_interest_w4
[1] "Interest in politics"
$pol_interest_w5
[1] "Interest in politics"
$pol_interest_w6
[1] "Interest in politics"
$pol_interest_w7
[1] "Interest in politics"
$new_col
NULL
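If you would rather not load extra packages, the same idea can be sketched in base R. This relies on labelled storing a variable label in the "label" attribute of the column; the small data frame below is a made-up stand-in for cp14, not the real data.

```r
# Hypothetical stand-in for cp14 (base R only, no packages needed).
cp14 <- data.frame(
  pol_interest_w1 = c(0.5, 0.25),
  pol_interest_w2 = c(0.5, NA),
  new_col         = c(0.75, 0.5)
)

# Names that start with the prefix "pol".
pol_vars <- names(cp14)[startsWith(names(cp14), "pol")]

# Write the label into the "label" attribute of each matching column.
for (v in pol_vars) {
  attr(cp14[[v]], "label") <- "Interest in politics"
}

# Collect the labels per column; unlabelled columns give NULL.
labels <- lapply(cp14, attr, "label")
print(labels)
```

Use grepl("pol", names(cp14), fixed = TRUE) instead of startsWith() if you want every name that contains "pol" rather than only those that begin with it.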

Related

Change font of specific rows to bold in forestplot

I wrote a script using the "forestplot" package. I want to group the variables into certain categories, which I would like to show in bold in order to accentuate them. How can I adjust my script so that only certain rows are shown in bold, i.e. "Risk factor" / "OR (95% CI)", "Patient characteristics", "Medication history", "Comorbidities", "Surgical history" and "Other"? I have two columns and 18 rows. Can someone help me? I would be very grateful!
My script is as below:
tabletext <- cbind(
c("Risk factor" ,"Patient characteristics","Sex, male*", "Bmi (5 points)",
"Alcohol (5 units)", "Smoking*","Medication history",
"Steroid use", "Anticoagulant use*","Comorbidities",
"COPD GOLD 1/2", "COPD GOLD 3/4", "Other pulmonary disease",
"Surgical history",
"Previous colorectal surgery*",
"Previous abdominal surgery (other)","Other", "HIPEC*"),
c("OR (95% CI)",NA, "1.78 (1.20-2.68)", "1.15 (0.95-1.38)", "1.04 (0.94-1.14)",
"1.78 (1.11-2.80)", NA," 1.40 (0.68-2.67)", "1.55 (1.02-2.32)",NA,
"1.40 (0.70-2.61)", "1.56 (0.42-4.67)", "1.78 (0.63-4.28)",NA,
"1.61 (1.03-2.49)", "0.80 (0.47-1.32)",NA, "4.14 (2.14-7.73)"))
?fpTxtGp
require(forestplot)
forestplot(tabletext,
txt_gp = fpTxtGp(label = list(gpar(fontfamily = "Times",
fontface="bold"),
gpar(fontfamily = "",
col = "black"))),
df_c,new_page = TRUE,
boxsize = 0.2,
is.summary = c(rep(FALSE,32)),
clip = c(0,17),
xlab = 'Odds ratio with 95% confidence interval
* indicates significance',
xlog = FALSE,
zero = 1,
plotwidth=unit(12, "cm"),
colgap=unit(2, "mm"),
col = fpColors(box = "royalblue",
line = "darkblue",
summary = "royalblue"))
It's not clear what df_c is, so I just created it based on your tabletext matrix:
df_c <- data.frame(mean = c(NA, NA, 1.78, 1.15, 1.04, 1.78, NA, 1.4, 1.55,
NA, 1.4, 1.56, 1.78, NA, 1.61, 0.8, NA, 4.14),
lower = c(NA, NA, 1.2, 0.95, 0.94, 1.11, NA, 0.68, 1.02, NA, 0.7,
0.42, 0.63, NA, 1.03, 0.47, NA, 2.14),
upper = c(NA, NA, 2.68, 1.38,1.14, 2.8, NA, 2.67,2.32, NA,
2.61, 4.67, 4.28, NA, 2.49, 1.32, NA, 7.73))
From there, it's just a matter of adjusting the values passed to is.summary. Rows flagged TRUE are drawn as summary rows, which forestplot renders in bold by default:
forestplot(tabletext,
txt_gp = fpTxtGp(label = list(gpar(fontfamily = "Times"),
gpar(fontfamily = "",
col = "black"))),
df_c,new_page = TRUE,
boxsize = 0.2,
is.summary = c(TRUE, TRUE, rep(FALSE, 4),
TRUE, FALSE, FALSE, TRUE,
rep(FALSE, 3), TRUE, FALSE, FALSE, TRUE, FALSE),
clip = c(0,17),
xlab = 'Odds ratio with 95% confidence interval
* indicates significance',
xlog = FALSE,
zero = 1,
plotwidth=unit(12, "cm"),
colgap=unit(2, "mm"),
col = fpColors(box = "royalblue",
line = "darkblue",
summary = "royalblue"))
Which generates the following figure:
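Rather than typing the TRUE/FALSE vector by hand, it can probably be derived from the table itself. The sketch below assumes that a category heading is any row whose second column is NA, plus the column-header row; the or_col vector is just the second column of tabletext copied from the question.

```r
# Second column of the question's tabletext; NA marks a category heading row.
or_col <- c("OR (95% CI)", NA, "1.78 (1.20-2.68)", "1.15 (0.95-1.38)",
            "1.04 (0.94-1.14)", "1.78 (1.11-2.80)", NA, "1.40 (0.68-2.67)",
            "1.55 (1.02-2.32)", NA, "1.40 (0.70-2.61)", "1.56 (0.42-4.67)",
            "1.78 (0.63-4.28)", NA, "1.61 (1.03-2.49)", "0.80 (0.47-1.32)",
            NA, "4.14 (2.14-7.73)")

# Headings are the rows without an odds ratio, plus the header row itself.
is_heading <- is.na(or_col)
is_heading[1] <- TRUE

# Row numbers that would be passed as TRUE in is.summary.
print(which(is_heading))
```

With the full matrix at hand, is.na(tabletext[, 2]) gives the same starting point, and the result can be passed directly as is.summary = is_heading.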

qplot: Only graphing nodes below a threshold

I am trying to make a visual graph of a dissimilarity matrix. Using this site, I ran into the qgraph function from the package qgraph. Using the threshold flag, I am able to remove edges from my network above the supplied numerical value. This works beautifully, however, what if I only want to plot values below a certain threshold, not above?
For this, I came back to this site and read here: How to plot near-zero values with qgraph? to use the cut flag for this purpose. However, as the answer states, this flag will only "adjust the saturation so that everything above the cut point has the strongest color intensity, anything below the cut point, the saturation gets weaker."
What I would like to do is to plot only lines between the nodes that are below my cut value (or threshold), not anything else.
Here is some reproducible data:
Dist <- data.frame(Sample_1 = c(0.0, 0.245, 0.191, 0.78, 0.5),
Sample_2 = c(0.3, 0.0, 0.2, 0.99, 0.6),
Sample_3 = c(0.65, 0.45, 0.0, 0.05, 0.8),
Sample_4 = c(0.45, 0.06, 0.88, 0.0, 0.7),
Sample_5 = c(0.11, 0.79, 0.66, 0.37, 0.0),
row.names = c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5"))
Plotting the graph:
qgraph(Dist, layout = "circle", vsize = 5, color = c("cyan", "yellow", "pink", "green3", "gray"), labels = c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5"), label.cex = 3, cut = 0.2)
As you can see, anything above the cut = 0.2 is also plotted and darker.
I would like only values below the 0.2 threshold to be plotted. Is there any way to do this?
Thanks.
qgraph does not seem to have the ability to cut below a threshold, so we have to manipulate the input data.
Replacing the values above the threshold with 0 or NA should do it. Using NA results in the same output, but with a warning.
Dist <- data.frame(
Sample_1 = c(0.0, 0.245, 0.191, 0.78, 0.5),
Sample_2 = c(0.3, 0.0, 0.2, 0.99, 0.6),
Sample_3 = c(0.65, 0.45, 0.0, 0.05, 0.8),
Sample_4 = c(0.45, 0.06, 0.88, 0.0, 0.7),
Sample_5 = c(0.11, 0.79, 0.66, 0.37, 0.0),
row.names = c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
)
library(qgraph)
qgraph(
replace(Dist, Dist > 0.2, 0),
layout = "circle",
vsize = 5,
color = c("cyan", "yellow", "pink", "green3", "gray"),
labels = c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5"),
label.cex = 3
)
Created on 2020-04-06 by the reprex package (v0.3.0)
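To see which edges survive the filtering before plotting, the replaced matrix can be inspected in base R. This is only a sketch on the example data; the names threshold, kept and edges are mine. Note that the comparison is strict, so a value of exactly 0.2 is kept; use >= if you want strictly-below behaviour.

```r
Dist <- data.frame(
  Sample_1 = c(0.0, 0.245, 0.191, 0.78, 0.5),
  Sample_2 = c(0.3, 0.0, 0.2, 0.99, 0.6),
  Sample_3 = c(0.65, 0.45, 0.0, 0.05, 0.8),
  Sample_4 = c(0.45, 0.06, 0.88, 0.0, 0.7),
  Sample_5 = c(0.11, 0.79, 0.66, 0.37, 0.0),
  row.names = c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
)

threshold <- 0.2  # same cut-off as in the question

# Zero out everything above the threshold (same call as passed to qgraph).
kept <- as.matrix(replace(Dist, Dist > threshold, 0))

# List the remaining non-zero entries as row / column / weight.
idx <- which(kept != 0, arr.ind = TRUE)
edges <- data.frame(
  row    = rownames(kept)[idx[, "row"]],
  col    = colnames(kept)[idx[, "col"]],
  weight = kept[idx]
)
print(edges)
```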

Merge elements in forestplot and format a single element

I am creating a forestplot using the forestplot package in R, and am having trouble with a few things.
Questions:
Is it possible to merge two adjacent text elements?
Is it possible to modify either the font of a single text element or the font of an entire row?
My Code:
library(forestplot)
# creating text
text <- rbind(c('', 'N (%)', 'SRT', 'ART', 'HR [95% CI]'),
c('', '', '5 year survival %', '5 year survival %', ''),
c('Seminal Vesicle Involvement', '', '', '', ''),
c(' Yes', '10 (20%)', '94', '12', '0.73 [0.36, 1.50]'),
c(' No', '40 (80%)', '96', '10', '1.78 [0.73, 4.35]'),
c('Gender', '', '', '', ''),
c(' Male', '13 (22.5%)', '84', '22', '0.06 [-0.2, 0.86]'),
c(' Female', '37 (77.5%)', '93', '13', '1.89 [0.90, 6.67]'))
# creating the plot
forestplot(text,
mean = c(NA, NA, NA, 0.73, 1.78, NA, 0.06, 1.89),
lower = c(NA, NA, NA, 0.36, 0.73, NA, -0.2, 0.90),
upper = c(NA, NA, NA, 1.50, 4.35, NA, 0.86, 6.67),
is.summary=c(T, T, T, F, F, T, F, F),
lineheight = unit(0.9, "cm"),
graph.pos = 5,
graphwidth = unit(4, 'cm'),
xticks = c(-1, 0, 1, 2, 3, 4),
ci.vertices = T,
txt_gp = fpTxtGp(ticks = gpar(cex = 1),
xlab = gpar(cex = 1),
label = gpar(cex = 0.8),
summary = gpar(cex = 0.8)),
col=fpColors(box="black",
line="darkgrey",
summary="black",
zero='grey20',
axes='grey20'),
hrzl_lines = list("2" = gpar(lwd=1, col = "#000044")))
Output:
Desired:
I would like the two 5 year survival % text bits to be combined into 1 (and centered between the two headings above), and either just those elements or the whole row to be italic font.
I have tried using summary=list(gpar(...)) for the txt_gp option, but that only seems to be able to modify the whole column, and I have found nothing on merging cells at all.
If you make the colgap in forestplot much smaller than usual, you can split the text that is currently duplicated in row 2, columns 3 and 4, into two parts, so that the two cells read as one heading:
text[2, 4] <- 'survival % '
text[2, 3] <- '5 year '

forestplot(text,
mean = c(NA, NA, NA, 0.73, 1.78, NA, 0.06, 1.89),
lower = c(NA, NA, NA, 0.36, 0.73, NA, -0.2, 0.90),
upper = c(NA, NA, NA, 1.50, 4.35, NA, 0.86, 6.67),
is.summary=c(T, T, T, F, F, T, F, F),
lineheight = unit(0.9, "cm"),
graph.pos = 5,
graphwidth = unit(4, 'cm'),
xticks = c(-1, 0, 1, 2, 3, 4),
ci.vertices = T,
# added line: shrink the gap between columns
colgap=unit(.0011,"npc"),
txt_gp = fpTxtGp(ticks = gpar(cex = 1),
xlab = gpar(cex = 1),
label = gpar(cex = 0.8),
summary = gpar(cex = 0.8)),
col=fpColors(box="black",
line="darkgrey",
summary="black",
zero='grey20',
axes='grey20'),
hrzl_lines = list("2" = gpar(lwd=1, col = "#000044")))

Misplaced grid lines with xlog=TRUE - Forestplot - R

Please, I need some help about using the xlog=TRUE option.
It is requested to provide the mean, lower, upper, zero, grid and clip already as exponentials, but I find the package is drawing the grid lines at the exponential of the numbers I am already providing as exponentials. As a consequence, the grid lines are in the wrong place.
metaan <-
structure(list(
mean = c(NA, NA, NA, 0.27, 0.47, 0.33, 0.69, 0.86, 0.37, 0.08, 0.44, 0.54, 0.41, NA),
lower = c(NA, NA, NA, 0.13, 0.12, 0.19, 0.12, 0.54, 0.17, 0.03, 0.16, 0.06, 0.29, NA),
upper = c(NA, NA, NA, 0.58, 1.81, 0.60, 3.97, 1.36, 0.81, 0.21, 1.25, 4.50, 0.58, NA)),
.Names = c("mean", "lower", "upper"),
row.names = c(NA, -27L),
class = "data.frame")
tabletext<-cbind(
c("", "AB class", "", " Aminoglycosides", " B-lactams", " Cephalosporins", " Fenicoles", " Fluoroquinolones", " Multiresistance", " Sulphamides", " Tetracyclines", " Tri/Sulpha", " Subtotal", ""),
c("", "OR", "", "0.27", "0.47", "0.33", "0.69", "0.86", "0.37", "0.08", "0.44", "0.54", "0.41", ""),
c("", "n", "", "4", "3", "2", "3", "4", "2", "3", "4", "3", "5", ""))
xticks <- c(0.1, 0.25, 0.5, 1, 1.5, 2, 3)
forestplot(tabletext,
graph.pos = 3,
txt_gp = fpTxtGp(label = gpar(fontsize=10)),
hrzl_lines = list("3" = gpar(lty=1)),
zero = 1,
line.margin = .05,
mean = cbind(metaan[,"mean"]),
lower = cbind(metaan[,"lower"]),
upper = cbind(metaan[,"upper"]),
is.summary=c(FALSE, TRUE, rep(FALSE, 9)),
col=fpColors(box=c("blue"), summary=c("blue")),
grid = structure(0.41,
gp = gpar(lty = 2, col = "#CCCCFF")),
clip=c(0.1, 3),
xlog=T,
xticks=xticks,
xlab="Odds ratio")
The grid line is at the exponential of OR=0.41, instead of at OR=0.41
When I instead provide the log to get the grid line at the correct place (e.g. -0.39, which is log10(0.41)), I get an error message saying that all parameters should be provided as exponentials.
forestplot(tabletext,
graph.pos = 3,
txt_gp = fpTxtGp(label = gpar(fontsize=10)),
hrzl_lines = list("3" = gpar(lty=1)),
zero = 1,
line.margin = .05,
mean = cbind(metaan[,"mean"]),
lower = cbind(metaan[,"lower"]),
upper = cbind(metaan[,"upper"]),
is.summary=c(FALSE, TRUE, rep(FALSE, 9)),
col=fpColors(box=c("blue"), summary=c("blue")),
grid = structure(-0.39,
gp = gpar(lty = 2, col = "#CCCCFF")),
clip=c(0.1, 3),
xlog=T,
xticks=xticks,
xlab="Odds ratio")
Error in forestplot.default(tabletext, graph.pos = 3, txt_gp = fpTxtGp(label = gpar(fontsize = 10)), :
All argument values (mean, lower, upper, zero, grid and clip) should be provided as exponentials when using the log scale. This is an intentional break with the original forestplot function in order to simplify other arguments such as ticks, clips, and more.
I have tried including the grid numbers as lists, but I always encounter the same error message either if I provide the numbers as exponentials (grid misplaced) or as log (error message).
I am wondering what I am doing wrong and if there is any other way to get the grid lines in the correct place.
Thanks in advance,
Magda.
Solved. Updated to 1.8 package version from GitHub and got the correct figure. :)
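For anyone hitting the same symptom, a quick base R check shows where the misplaced line would end up and whether the installed package is at least the version mentioned above. This is only a sketch; the variable names are mine, and the version check is skipped when forestplot is not installed.

```r
or_value <- 0.41

# Position of the grid line if the value is exponentiated a second time.
misplaced <- exp(or_value)
# The two logarithms that could be meant by "the log" of the odds ratio.
natural_log <- log(or_value)
base10_log  <- log10(or_value)
print(round(c(misplaced = misplaced, natural_log = natural_log,
              base10_log = base10_log), 3))

# Report the installed forestplot version, if the package is available.
if (requireNamespace("forestplot", quietly = TRUE)) {
  installed <- utils::packageVersion("forestplot")
  status <- if (installed >= "1.8") "1.8 or later" else "older than 1.8"
  message("forestplot ", as.character(installed), " (", status, ")")
}
```

A grid line that is meant for OR = 0.41 but shows up near 1.5 matches the double exponentiation described in the question.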

Add value or percentage in contingency table plot

I would like to put, in each box of the plotted contingency table, the value obtained from the table.
The following image represents the contingency table
The following code is how to display the contingency table:
> svm.video.table2<-table(pred=svm.video.pred2, true= filteredDataFinal$rate)
> svm.video.table2
And this one is how to plot that table
plot(svm.video.table2)
An ad hoc approach would be:
text(x = 0.23, y = 0.55, "10")
text(x = 0.23, y = 0.67, "2")
text(x = 0.64, y = 0.94, "1")
text(x = 0.64, y = 0.45, "9")
text(x = 0.92, y = 0.44, "4")
PS: I generated the data to make your example reproducible with svm.video.table2 <- as.table(matrix(c(10, 1, 0, 2, 9, 0, 0, 0, 4), ncol = 3))
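Since the question also mentions percentages, the label strings can be built from the table instead of being typed in. This is a sketch on the generated example table; cell_labels is my own name, and the coordinates still have to be chosen by hand as above.

```r
# Example table as generated in the PS above.
svm.video.table2 <- as.table(matrix(c(10, 1, 0, 2, 9, 0, 0, 0, 4), ncol = 3))

# Share of each cell in the grand total, in percent, rounded to one decimal.
pct <- round(100 * prop.table(svm.video.table2), 1)

# One label per cell, combining count and percentage.
cell_labels <- matrix(
  paste0(svm.video.table2, " (", pct, "%)"),
  nrow = nrow(svm.video.table2),
  dimnames = dimnames(svm.video.table2)
)
print(cell_labels)
```

After plot(svm.video.table2), a call such as text(x = 0.23, y = 0.55, cell_labels[1, 1]) would then place the combined label instead of the bare count.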
