R Legend Variable Substitution

I always desire to have my R code as flexible as possible; at present I have three (potentially more) curves to compare based on a parameter delta, but I don't want to hardcode the values of delta anywhere (or even how many values if I can avoid it).
I am trying to make a legend that involves both Greek and a variable substitution for the delta values, so each legend entry is of the form like 'delta = 0.01', where delta is Greek and 0.01 is determined by variable. Many different combinations of paste, substitute, bquote and expression have been tried, but always end up with some verbatim code leftover in the finished legend, OR fail to put 'delta' into symbolic form.
delta <- c(0.01,0.05,0.1)
plot(type="n", x=1:5, y=1:5) #the curves themselves are irrelevant
legend_text <- vector(length=length(delta)) #I don't think lists work either
for(i in 1:length(delta)){
legend_text[i] <- substitute(paste(delta,"=",D),list(D=delta[i]) )
}
legend(x="topleft", fill=rainbow(length(delta)), legend=legend_text)
Since legend=substitute(paste(delta,"=",D),list(D=delta[1])) works for a single entry, I've also tried doing a 'semi-hardcoded' version, fixing the length of delta:
legend(x="topleft", fill=rainbow(length(delta)),
legend=c(substitute(paste(delta,"=",A), list(A=delta[1])),
substitute(paste(delta,"=",B), list(B=delta[2])),
substitute(paste(delta,"=",C), list(C=delta[3])) )
)
but this has the same issues as before.
Is there a way I can do this, or do I need to change the code by hand with each update of delta?

Try using lapply() with as.expression() to generate your legend labels, and use bquote() to create the individual expressions:
legend_text <- as.expression(lapply(delta, function(d) {
    bquote(delta==.(d))
} ))
Note that with plotmath you need == to get an equals sign. Also no need for paste() since nothing is really a string here.
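For completeness, here is a minimal end-to-end sketch of the approach above. The curves (y = x^(1 + d)) and the colour vector cols are placeholders I made up purely so the legend has something to describe; only the legend_text construction comes from the answer.

```r
# Assumed example values; any numeric vector of any length works here.
delta <- c(0.01, 0.05, 0.1)
cols <- rainbow(length(delta))

# One plotmath call per delta value, collected into an expression vector.
legend_text <- as.expression(lapply(delta, function(d) bquote(delta == .(d))))

# Placeholder curves, only there to give the legend something to describe.
x <- seq(1, 5, length.out = 50)
plot(x, x, type = "n", ylim = c(1, 6))
for (i in seq_along(delta)) lines(x, x^(1 + delta[i]), col = cols[i])

# The legend length follows length(delta), so nothing is hardcoded.
legend("topleft", fill = cols, legend = legend_text)
```

Changing delta (values or length) should be the only edit needed; the legend entries and colours are derived from it.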

Related

How to create a variable list of bquote expressions in order to use sub/superscript in a R plot legend

I have a shiny app that takes a user-supplied number, user_input, and plots that many lines on a graph; the legend also has that many items in it. Currently for the legend names I have a loop that pastes together variables and adds them to a vector, legend_names:
legend_names = c()
for (num in 0:(user_input-1)) {
name = paste0("A", num, " ", percents[[num+1]], "%")
legend_names = append(legend_names, name)
}
This list is then applied to the legend with legend(x=1, legend_names)
However, in the name = paste0("A", num, " ", percents[[num+1]], "%") line, I would like the num variable to be a superscript. I have tried using expressions and bquote but can't seem to make it work in a vector. Is there any way that I can go about doing this? I apologize if my problem is poorly worded; it was hard to articulate well.
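One possible way to do this, as a rough sketch: build one bquote() call per legend entry with lapply() and convert the list with as.expression(), the same pattern as in the legend question above. The values of user_input and percents below are invented for illustration; in the app they would come from the user and the data.

```r
# Hypothetical stand-ins for the app's inputs.
user_input <- 3
percents <- c(12.5, 40, 47.5)

# A^.(num) renders num as a superscript; ~ adds a space before the percentage.
legend_names <- as.expression(lapply(seq_len(user_input), function(i) {
  bquote(A^.(i - 1) ~ .(paste0(percents[i], "%")))
}))

plot(1:10, type = "n")
legend("topleft", legend = legend_names, lty = 1, col = seq_len(user_input))
```

The key point is that legend() accepts an expression vector, so the entries do not have to be plain character strings.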

I want to use heatmap in my code but I am getting an error

heatmap(Web_Data$Timeinpage)
str(Web_Data)
heat = c(t(as.matrix(Web_Data$Timeinpage[,-1])))
heatmap(heat)
A few items to note here:
1) By wrapping the result in c(), as in c(t(as.matrix(Web_Data$Timeinpage[,-1]))), you are creating a single vector and not a matrix. You can see this by running is.matrix(c(t(as.matrix(Web_Data$Timeinpage[,-1])))). heatmap (I believe) is checking for a matrix because...
2) You need to provide a matrix with at least two rows and two columns for this function to work. Currently, you are only giving it one vector, the time. You will need to provide some other feature of interest, such as Continent, for it to work correctly.
3) If you intend to plot ONLY one field, you may consider using the image() function instead. (I included an example below.)
4) I find the heatmap function somewhat dated in look. You may want to consider other popular functions, such as ggplot2's geom_tile.
Below is an example code that should produce an output:
#fake data
Web_Data <- data.frame(Timeinpage = c(123,321,432,555,332,1221,2,43,0, NA,10, 44),
                       OTHER = rep(c("good", "bad"), 6))
#a numeric matrix with TWO rows built from my data frame. Notice I am not flattening with c() or transposing,
#the comma is dropped from [,-1] because a single column is a vector, and OTHER is converted
#to integer codes because heatmap() needs numeric input
heat <- rbind(Timeinpage = Web_Data$Timeinpage[-1],
              OTHER = as.integer(factor(Web_Data$OTHER[-1])))
#output
heatmap(heat)
#one field only (as.matrix() turns the vector into a single-column matrix)
heat2 <- as.matrix(sort(Web_Data$Timeinpage[-1])) #sorting as well
#output
image(heat2)
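To illustrate point 1 on a tiny made-up matrix (m and flat are names I chose for the illustration, not anything from the question): c() drops the dim attribute, so whatever is passed on is no longer a matrix.

```r
# A small 2 x 3 matrix used only for illustration.
m <- matrix(1:6, nrow = 2)

# c() strips the dimensions, leaving a plain vector.
flat <- c(t(m))

is.matrix(m)     # TRUE
is.matrix(flat)  # FALSE
dim(flat)        # NULL
```

That is why the original call fails: heatmap() is handed a plain vector rather than a two-dimensional numeric matrix.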

How to display subscripts and array elements simultaneously on an R plot

I have a coefficient array bees created in the following way:
gfit = lm(y_data ~ x_data);
bees = coef(gfit);
where bees[1]=0.123, bees[2]=4.56
A plot plot(x_data,y_data) is created. I'd like to add some text on this plot. The text should look like $b_0=0.123, b_1=4.56$ (how to add Latex symbols on StackOverflow?).
I tried the following command: text(3,15,expression(paste("b"[0],"=",bees[1])));, which turns out to be $b_0=bees_1$, i.e. the variable bees[1] is not interpreted properly.
How can I display the value of a variable by typing its name?
R doesn't have a LaTeX interpreter. You need to use ?plotmath. Try using bquote to get at the values of R objects; the example here assumes that (1,1) is in the range of your (undescribed) data. The .() function puts values pulled from the working environment into expressions:
text(1,1, bquote( list( b[0] == .(bees[1]) , b[1] == .(bees[2]) ) ) )
See the examples in ?bquote.
Writing formulas is a horrible mess in R. Only regexp is more write-only.
bees=c(0.12, 4.56)
plot(rnorm(100))
text(30,0,bquote(bees[1]== .(bees[1])))
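Putting the pieces together, a minimal sketch of the bquote() approach with a fitted model might look like this. The data are simulated here because the question does not show any, and rounding to two decimals is my own choice to keep the label short.

```r
# Simulated data standing in for the question's x_data and y_data.
set.seed(1)
x_data <- 1:20
y_data <- 0.5 + 2 * x_data + rnorm(20)

gfit <- lm(y_data ~ x_data)
bees <- round(unname(coef(gfit)), 2)  # intercept first, then slope

# .() splices the numeric values into the plotmath expression.
label <- bquote(list(b[0] == .(bees[1]), b[1] == .(bees[2])))

plot(x_data, y_data)
text(5, 35, label)
```

The coordinates passed to text() have to lie inside the plotted range, which is why they differ from the (1,1) used above.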

interpreting R code function

I would like to perform pathway enrichment analyses.
I have 21 lists of significant genes, and multiple types of pathways I would like to check (i.e. check for enrichment in KEGG pathways, GO terms, complexes etc.).
I found this example of code, on an old BioC post. However, I am having trouble adapting it for myself.
Firstly,
1- what does this mean? I don't know this multiple colon syntax.
hyperg <- Category:::.doHyperGInternal
2 - I don't understand how this line works. hyperg.test is a function that needs 3 variables passed to it, correct? Is this line somehow passing genes.by.pathway, significant.genes, and all.geneIDs to hyperg.test?
pVals.by.pathway<-t(sapply(genes.by.pathway, hyperg.test, significant.genes, all.geneIDs))
Code that I would like to adapt
library(KEGGREST)
library(org.Hs.eg.db)
# created named list, length 449, eg:
# path:hsa00010: "Glycolysis / Gluconeogenesis"
pathways <- keggList("pathway", "hsa")
# make them into KEGG-style human pathway identifiers
human.pathways <- sub("path:", "", names(pathways))
# for demonstration, just use the first ten pathways
demo.pathway.ids <- head(human.pathways, 10)
demo.pathways <- setNames(keggGet(demo.pathway.ids), demo.pathway.ids)
genes.by.pathway <- lapply(demo.pathways, function(demo.pathway) {
    demo.pathway$GENE[c(TRUE, FALSE)]
})
all.geneIDs <- keys(org.Hs.eg.db)
# chose one of these for demonstration. the first (a whole genome random
# set of 100 genes) has very little enrichment, the second, a random set
# from the pathways themselves, has very good enrichment in some pathways
set.seed(123)
significant.genes <- sample(all.geneIDs, size=100)
#significant.genes <- sample(unique(unlist(genes.by.pathway)), size=10)
# the hypergeometric distribution is traditionally explained in terms of
# drawing a sample of balls from an urn containing black and white balls.
# to keep the arguments straight (in my mind at least), I use these terms
# here also
hyperg <- Category:::.doHyperGInternal
hyperg.test <-
    function(pathway.genes, significant.genes, all.genes, over=TRUE)
{
    white.balls.drawn <- length(intersect(significant.genes, pathway.genes))
    white.balls.in.urn <- length(pathway.genes)
    total.balls.in.urn <- length(all.genes)
    black.balls.in.urn <- total.balls.in.urn - white.balls.in.urn
    balls.pulled.from.urn <- length(significant.genes)
    hyperg(white.balls.in.urn, black.balls.in.urn,
           balls.pulled.from.urn, white.balls.drawn, over)
}
pVals.by.pathway <-
    t(sapply(genes.by.pathway, hyperg.test, significant.genes, all.geneIDs))
print(pVals.by.pathway)
The reason you are getting your error appears to be that you don't have the Category package from Bioconductor installed. I suspect this because of the triple colon operator :::. This operator is very similar to the double colon operator ::. Whereas :: gives access to exported objects from a package without loading it, ::: also gives access to non-exported objects (in this case the .doHyperGInternal function from Category, which the code assigns to hyperg). If you install the Category package the code runs without error.
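The difference between the two operators can be seen with a package that ships with R. This is only an illustration using stats (print.lm is a method that stats registers but does not export); it is not the Category code itself.

```r
# :: reaches exported objects without attaching the package.
exported <- stats::median

# print.lm lives in the stats namespace but is not exported.
"print.lm" %in% getNamespaceExports("stats")  # FALSE

# ::: reaches it anyway.
internal <- stats:::print.lm

# The same lookup with :: fails; capture the message instead of stopping.
res <- tryCatch(stats::print.lm, error = function(e) conditionMessage(e))
```

Because non-exported objects are internal to a package, they can change or disappear between versions, so code that depends on ::: is worth treating with some care.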
With regard to the sapply statement:
pVals.by.pathway<-t(sapply(genes.by.pathway, hyperg.test, significant.genes, all.geneIDs))
You can break this down into separate parts to understand it. First, sapply iterates over the elements of genes.by.pathway and passes each one as the first argument of hyperg.test. The arguments that follow (significant.genes and all.geneIDs) are passed on unchanged as the two additional parameters. Relying on position like this is a little unclear, and I personally recommend naming the parameters explicitly: it avoids surprises and removes the dependence on argument order. This is a little repetitive in this case, but it is a good way to avoid a silly bug (e.g. putting significant.genes after all.geneIDs).
Rewritten:
pVals.by.pathway <-
t(sapply(genes.by.pathway, hyperg.test, significant.genes=significant.genes, all.genes=all.geneIDs))
Once the iteration completes, sapply simplifies the output into a matrix with one column per pathway. Taking the transpose with t() makes the output more user-friendly, with one row per pathway.
Generally speaking, when trying to understand complex apply statements I find it best to break them apart into smaller parts and see what the objects themselves look like.
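Here is a toy version of the same pattern that runs without any Bioconductor packages. The function count_overlap and the gene sets are invented for the illustration; they only mimic the shape of hyperg.test and genes.by.pathway.

```r
# Hypothetical stand-in for hyperg.test: first argument varies, the rest are fixed.
count_overlap <- function(set, hits, universe) {
  c(overlap = length(intersect(set, hits)),
    size = length(set),
    universe = length(universe))
}

# Hypothetical stand-in for genes.by.pathway.
sets <- list(a = c("g1", "g2", "g3"), b = c("g3", "g4"))

# Each element of sets becomes `set`; the named arguments are passed along as-is.
res <- t(sapply(sets, count_overlap,
                hits = c("g2", "g3"),
                universe = paste0("g", 1:10)))
res
```

Without t(), the result would have one column per list element; the transpose gives one row per element, which is usually easier to read.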

strip panels lattice

My problem is how to set up the strips of my panels in the lattice framework.
testData<-data.frame(star=rnorm(1200),frame=factor(rep(1:12,each=100))
,n=factor(rep(rep(c(4,10,50),each=100),4))
,var=factor(rep(c("h","i","h","i"),each=300))
,stat=factor(rep(c("c","r"),each=600))
)
levels(testData$frame)<-c(1,7,4,10,2,8,5,11,3,9,6,12)# order of my frames
histogram(~star|factor(frame), data=testData
,as.table=T
,layout=c(4,3),type="density",breaks=20
,panel=function(x,params,...){
panel.grid()
panel.histogram(x,...,col=1)
panel.curve(dnorm(x,0,1), type="l",col=2)
}
)
What I'm looking for, is:
You should not need to add the factor call around items in the conditioning section of the formula when they are already factors. If you want to make a cross between two factors the interaction function is the best approach. It even has a 'sep' argument which will accept a new line character. This is the closest I can produce:
library(lattice)
library(latticeExtra) # provides useOuterStrips()
h <- histogram(~star | interaction(stat, var, sep="\n") + n, data=testData,
    as.table=TRUE, layout=c(4,3), type="density", breaks=20,
    panel=function(x, params, ...) {
        panel.grid()
        panel.histogram(x, ..., col=1)
        panel.curve(dnorm(x,0,1), type="l", col=2)
    })
plot(h)
useOuterStrips(h, strip.left = strip.custom(horizontal = FALSE),
    strip.lines=2, strip.left.lines=1)
I get an error when I put in three factors separately and then try to use useOuterStrips. It won't accept three separate conditioning factors. I've searched for postings on R-help, but the only perfectly on-point question got an untested suggestion, and when I tried it, it failed miserably.
