How to get part of column in df in italics? - r

I have a dataframe as follows:
number <- c(34,36,67,87,99)
mz <- c("m/z 565.45","m/z 577.45","m/z 65.49","m/z 394.22","m/z 732.43")
df <- data.frame(number, mz)
However, I want the m/z part in italics, but I cannot figure out how to.
df$mz <- gsub('m/z', italic('m/z'), df$mz)
This does not work, I get the error:
Error in italic("m/z") : could not find function "italic"
This also does not work:
df$mz <- gsub('m/z', expression(italic('m/z')), df$mz)
I don't get an error but I get literally 'italic('m/z')' in my dataframe.
Is there any way around this?
EDIT:
The reason I want to do this is because I'm going to use the df to make a plot and I need the m/z to be in italics in the plot

You should store mz as character strings, then parse at the time of plotting:
df$mz <- sub('m/z', 'italic(m/z)~', df$mz)
A base R plot would then look like this:
plot(1:5, df$number, xaxt = 'n', xlab = 'mz')
axis(1, at = 1:5, labels = parse(text = df$mz))
And a ggplot like this:
library(ggplot2)
ggplot(df, aes(factor(1:5), number)) +
  geom_point() +
  scale_x_discrete(labels = parse(text = df$mz), name = 'mz')
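As a quick check that the stored strings really are valid plotmath, the sketch below rebuilds the labels and parses them outside of any plot. The names labels_chr and labels_expr are illustrative and not part of the original answer.

```r
# Rebuild the label strings the same way as in the answer
mz <- c("m/z 565.45", "m/z 577.45", "m/z 65.49")
labels_chr <- sub("m/z", "italic(m/z)~", mz)

# parse() turns each string into one plotmath expression;
# "~" is plotmath's spacing operator between the italic part and the number
labels_expr <- parse(text = labels_chr)

print(labels_chr[1])
print(is.expression(labels_expr))
```

Because the data frame column stays a plain character vector, it can still be filtered or joined like any other column; only the plotting call needs the parsed version.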

Related

How to store custom ggplot aliases as `list` objects in .R script and source them along required label vectors

I want to streamline my work with ggplot2 given certain settings I repeat across plots. To this end, I want to:
set aliases for those ggplot2 settings (assigned to individual list objects)
store them all in an .R script
source() the .R script
call the aliases when creating a plot with ggplot
However, this workflow fails when I set aliases for ggplot functions that call vectors. For example, scale_x_discrete() has an argument called labels, which takes a character vector of strings. This vector can be set in-place (e.g., scale_x_discrete(labels = c("a", "b", "c"))) or otherwise call a vector object stored in the environment:
labels_for_x <- c("a", "b", "c")
scale_x_discrete(labels = labels_for_x)
Therefore, my problem happens when I want to set an "alias" for a ggplot function that relies on an additional, separate vector.
Demonstrating the problem with example
Step 1 -- set alias for scale_x_discrete in a new .R script
library(tidyverse)
mygg_scale_x_relabel <-
scale_x_discrete() %>%
list
Works!
But what if:
mygg_scale_x_relabel <-
scale_x_discrete(labels = labels_for_barplot) %>%
list
Error in check_breaks_labels(breaks, labels) : object 'labels_for_barplot' not found
So I can do:
labels_for_barplot <- c()
mygg_scale_x_relabel <-
scale_x_discrete(labels = labels_for_barplot) %>%
list
And this runs OK.
Step 2 -- open a new .R/.Rmd script and source the previous .R script with alias
source(file = ...)
Step 3 -- in the new script, generate a bar plot
library(ggplot2)
df <- data.frame(trt = c("col_1", "col_2", "col_3"), outcome = c(2.3, 1.9, 3.2))
p <-
ggplot(df, aes(trt, outcome)) +
geom_col()
p
Step 4 -- (PROBLEM HAPPENS HERE) -- call the scale_x_discrete alias
labels_for_barplot <- c(col_1 = "new_lab_col_1",
col_2 = "new_lab_col_2",
col_3 = "new_lab_col_3")
p +
mygg_scale_x_relabel
All x labels are now gone! This is because I set labels_for_barplot <- c() (meaning an empty vector) in my aliases script. But why can't this be overridden by assigning an updated vector after the script is sourced?
When you assign a value to mygg_scale_x_relabel, R evaluates scale_x_discrete(labels = labels_for_barplot) straight away, so the scale stores whatever value labels_for_barplot has at that moment. It won't change when labels_for_barplot changes later.
The best way to do what you want is with a function. The function won't evaluate until you call it.
mygg_scale_x_relabel <- function(labels_for_barplot) {
  scale_x_discrete(labels = labels_for_barplot) %>%
    list()
}
labels_for_barplot <- c(col_1 = "new_lab_col_1",
col_2 = "new_lab_col_2",
col_3 = "new_lab_col_3")
p +
mygg_scale_x_relabel(labels_for_barplot)
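The difference between immediate and delayed evaluation can be reproduced without ggplot2. In this minimal sketch, eager_alias and lazy_alias are hypothetical names standing in for the list-based alias and the function-based alias.

```r
labels_for_barplot <- c()

# Evaluated immediately: the list captures the current (empty) value
eager_alias <- list(labels = labels_for_barplot)

# Evaluated only when called: the variable is looked up at call time
lazy_alias <- function() list(labels = labels_for_barplot)

# Reassign the labels after both aliases have been defined
labels_for_barplot <- c(col_1 = "new_lab_col_1")

print(is.null(eager_alias$labels))  # the eager alias still holds the empty value
print(lazy_alias()$labels)          # the lazy alias sees the new labels
```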
However, if you really wanted the syntax you've described, you can use active bindings. This is not the way things should be done generally, though.
rlang::env_bind_active(
rlang::global_env(),
mygg_scale_x_relabel = function() scale_x_discrete(labels = labels_for_barplot) %>% list()
)
labels_for_barplot <- c(col_1 = "new_lab_col_1",
col_2 = "new_lab_col_2",
col_3 = "new_lab_col_3")
p + mygg_scale_x_relabel
labels_for_barplot <- c(col_1 = "a",
col_2 = "b",
col_3 = "c")
p + mygg_scale_x_relabel

Changing the title of individual plots within a panel of traceplots in ggplot2/rstan to greek letters with subscript

I create traceplots of my stanfit objects via mcmc_trace.
I want to rename the titles of the traceplots.
I already managed to change the title, but I don't know how I can rename the plots to greek letters with subscripts similar to the expression function.
array <- as.array(fit)
array[1,1,1:3]
dimnames(array)[[3]][1:3] <- c("alpha1", "alpha2", "alpha3")
trace <- mcmc_trace(array, pars = c("alpha1", "alpha2", "alpha3"))
I want to replace alpha1 with expression(gamma[0]), but it doesn't work.
Okay, so after a little digging, it seems that bayesplot::mcmc_trace doesn't have an option for Greek lettering.
However, the very similar traplot function from the mcmcplots package has an option greek = TRUE.
I did a random example for you to see:
library(mcmcplots)
nc <- 3; nr <- 1000
pnames <- c(paste('alpha[', 1:2, ']', sep = ''), paste('gamma[1]', sep = ''))
means <- rpois(nc, 20)
fakemcmc <- coda::as.mcmc.list(
lapply(1:3, function(i) coda::mcmc(matrix(rnorm(nc*nr, rep(means, each=nr)),
nrow=nr, dimnames=list(NULL,pnames)))))
traplot(fakemcmc, greek = TRUE)
This produces trace plots whose titles show the parameter names as Greek letters with subscripts (output image not included).
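Names such as alpha1 carry no subscript information, whereas the pnames in the example use the bracketed form alpha[1]. A small sketch, assuming the parameter names consist of a lowercase word followed by digits, converts one form into the other with base R. The helper name to_plotmath is hypothetical.

```r
# Hypothetical helper: "alpha1" -> "alpha[1]"
# Assumes names look like <lowercase letters><digits>; other names pass through unchanged
to_plotmath <- function(x) sub("^([a-z]+)([0-9]+)$", "\\1[\\2]", x)

old_names <- c("alpha1", "alpha2", "gamma0")
new_names <- to_plotmath(old_names)
print(new_names)

# Each converted name is a valid plotmath expression
parsed <- parse(text = new_names)
```

The converted names could then be assigned to the dimnames of the draws before calling a plotting function that interprets bracketed Greek names, such as traplot with greek = TRUE.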

Extract specific lines and make a list of those in R

I have a file, from which I want to extract the number after segsites: and make a histogram with bins. I've written some code that checks if a line begins with the word "segsites", then extracts that line and puts it in a data frame.
However, it's not doing what it's supposed to. It extracts some numbers but they do not correspond to the values I have in the file.
I've attached a screenshot to show what the file looks like. It's an example and not the actual file.
library(dplyr)
library(ggplot2)
txt <- readLines("file.msOut")
lns <- (data.frame((beg=which(grepl("segsites:",txt)))))
output <- cut(lns, breaks = seq(0,1000, by= 100), labels = c("<100","100-200","200-300","300-400","400-500",
"600-700","700-800,800-900","900-100"))
table(output) %>%
as.data.frame() %>%
ggplot(aes(x = output, y = Freq)) +
geom_col()
Sample data from txt
Using regex and supposing txt contains the data from the image
txt <- c('segsites: 10','test')
as.numeric(gsub('\\D', '', grep('segsites\\:', txt, value = TRUE), perl = TRUE))
# [1] 10
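To carry the extracted numbers through to the binned counts the question asks for, a sketch along these lines may help. The sample lines are made up for illustration, since the real file is not shown. Note that cut() needs a numeric vector rather than a data frame, and that custom labels, if supplied, must match the number of intervals (ten here), which is why the default labels are used.

```r
# Made-up sample lines standing in for readLines("file.msOut")
txt <- c("//", "segsites: 10", "positions: 0.1 0.2",
         "//", "segsites: 250", "//", "segsites: 999")

# Keep only lines starting with "segsites:" and strip the prefix
seg_lines <- grep("^segsites:", txt, value = TRUE)
seg <- as.numeric(sub("^segsites:[[:space:]]*", "", seg_lines))

# Bin into ten intervals of width 100 and count values per bin
bins <- cut(seg, breaks = seq(0, 1000, by = 100))
counts <- table(bins)
print(counts)
```

The counts table can be passed to barplot(), or converted with as.data.frame() and plotted with geom_col() as in the question.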

How I can select the coordinate X and Y of R plot from Column Filter (R/Knime)?

So, I have this workflow :
I have selected 2 columns (Day and Temperature) from my file using ‘Column Filter’ and connected it to an ‘R plot’ node that I configured, but I obtain this:
The X axis shows the Row ID instead of the Day column; the Y axis is OK.
This is my code in R plot:
# Library
library(qcc)
library(readr)
library(Rserve)
Rserve(args = "--vanilla")
# Data column filter from CSV file imported
Test <- kIn
#Background color
qcc.options(bg.margin = "white", bg.figure = "gray95")
#R graph ranges of a continuous process variable
qcc(data = Test,
type = "R",
sizes = 5,
title = "Sample R Chart Title",
digits = 2,
plot = TRUE)
Here is my try (using KNIME's R, not the community contribution):
#install.packages("qcc")
library(qcc)
data <- knime.in
#Change the names to use Day instead of row keys
row.names(data) <- data$Day
#Using the updated data
plot(qcc(data = data,
type = "R",
sizes = 5,
title = "Sample R Chart Title",
digits = 2,
plot = TRUE))
With results like:
If you want to select the column for the X axis, just change the row.names assignment. (It can also come from knime.flow.in in case the column name is coming from a flow variable, but as I understand it is not the case for you.)
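The row.names assignment can be tried outside KNIME on a stand-in table. In this sketch the data frame and its values are invented for illustration, because knime.in only exists inside a KNIME R node.

```r
# Stand-in for knime.in (hypothetical values)
data <- data.frame(Day = c("Mon", "Tue", "Wed"),
                   Temperature = c(21.5, 22.1, 20.8),
                   stringsAsFactors = FALSE)

# Use the Day column as row names, replacing the default row keys
row.names(data) <- data$Day
print(row.names(data))
```

Row names of a data frame must be unique, so this only works when each Day value appears once.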

How can I produce report quality tables from R?

If I have the following dataframe called result
> result
     Name       CV      LCB       UCB
1  within 2.768443 1.869964  5.303702
2 between 4.733483 2.123816 18.551051
3   total 5.483625 3.590745 18.772389
> dput(result,"")
structure(list(Name = structure(c("within", "between", "total"
), .rk.invalid.fields = list(), .Label = character(0)), CV = c(2.768443,
4.733483, 5.483625), LCB = c(1.869964, 2.123816, 3.590745), UCB = c(5.303702,
18.551051, 18.772389)), .Names = c("Name", "CV", "LCB", "UCB"
), row.names = c(NA, 3L), class = "data.frame")
What is the best way to present this data nicely? Ideally I'd like an image file that can be pasted into a report, or possibly an HTML file to represent the table?
Extra points for setting number of significant figures.
I would use xtable. I usually use it with Sweave.
library(xtable)
d <- data.frame(letter=LETTERS, index=rnorm(52))
d.table <- xtable(d[1:5,])
print(d.table,type="html")
If you want to use it in a Sweave document, you would use it like so:
<<label=tab1,echo=FALSE,results=tex>>=
print(xtable(d, caption = "Here is my caption", label = "tab:one"), caption.placement = "top")
@
For the table aspect, the xtable package comes to mind as it can produce LaTeX output (which you can use via Sweave for professional reports) as well as html.
If you combine that in Sweave with fancy graphs (see other questions for ggplot examples) you are almost there.
library(ggplot2)
ggplot(result, aes(x = Name, y = CV, ymin = LCB, ymax = UCB)) + geom_errorbar() + geom_point()
ggplot(result, aes(x = Name, y = CV, ymin = LCB, ymax = UCB)) + geom_pointrange()
To set the significant figures, the easiest thing to do (for this sample data, mind you) would be to move Name to rownames and round the whole thing.
#Set the rownames equal to Name - assuming all unique
rownames(result) <- result$Name
#Drop the Name column so that round() can coerce
#result.mat to a matrix
result.mat <- result[ , -1]
round(result.mat, 2) #Where 2 = number of decimal places; use signif() instead to control significant figures.
This is not a terribly robust solution - non-unique Name values would break it, I think, as would other non-numeric columns. But for producing a table like your example, it does the trick.
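A variant that avoids both weaknesses is to leave Name in place and apply signif() only to the numeric columns. This is a sketch using the values from the question; the choice of 3 significant figures is an arbitrary example.

```r
result <- data.frame(
  Name = c("within", "between", "total"),
  CV   = c(2.768443, 4.733483, 5.483625),
  LCB  = c(1.869964, 2.123816, 3.590745),
  UCB  = c(5.303702, 18.551051, 18.772389),
  stringsAsFactors = FALSE
)

# Find the numeric columns and reduce each to 3 significant figures;
# non-numeric columns such as Name are left untouched
is_num <- vapply(result, is.numeric, logical(1))
result[is_num] <- lapply(result[is_num], signif, digits = 3)
print(result)
```

The rounded data frame can then be handed to xtable() as in the earlier answer.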
