strip panels lattice - r

My problem is how to set up the strips for my panels with the lattice framework.
library(lattice)
testData<-data.frame(star=rnorm(1200),frame=factor(rep(1:12,each=100))
,n=factor(rep(rep(c(4,10,50),each=100),4))
,var=factor(rep(c("h","i","h","i"),each=300))
,stat=factor(rep(c("c","r"),each=600))
)
levels(testData$frame)<-c(1,7,4,10,2,8,5,11,3,9,6,12)# order of my frames
histogram(~star|factor(frame), data=testData
,as.table=T
,layout=c(4,3),type="density",breaks=20
,panel=function(x,params,...){
panel.grid()
panel.histogram(x,...,col=1)
panel.curve(dnorm(x,0,1), type="l",col=2)
}
)
What I'm looking for, is:

You should not need to add the factor call around items in the conditioning section of the formula when they are already factors. If you want to make a cross between two factors the interaction function is the best approach. It even has a 'sep' argument which will accept a new line character. This is the closest I can produce:
library(latticeExtra) # provides useOuterStrips()
h<-histogram(~star|interaction(stat, var, sep="\n") + n, data=testData ,
as.table=T ,layout=c(4,3), type="density", breaks=20 ,
panel=function(x,params,...){ panel.grid()
panel.histogram(x,...,col=1)
panel.curve(dnorm(x,0,1), type="l",col=2) } )
plot(h)
useOuterStrips(h,strip.left = strip.custom(horizontal = FALSE),
strip.lines=2, strip.left.lines=1)
I get an error when I try to put in three factors separately and then try to use useOuterStrips. It won't accept three separate conditioning factors. I've searched for postings in Rhelp, but the only perfectly on-point question got an untested suggestion and when I tried it failed miserably.

Related

R Legend Variable Substitution

I always desire to have my R code as flexible as possible; at present I have three (potentially more) curves to compare based on a parameter delta, but I don't want to hardcode the values of delta anywhere (or even how many values if I can avoid it).
I am trying to make a legend that involves both Greek and a variable substitution for the delta values, so each legend entry is of the form like 'delta = 0.01', where delta is Greek and 0.01 is determined by variable. Many different combinations of paste, substitute, bquote and expression have been tried, but always end up with some verbatim code leftover in the finished legend, OR fail to put 'delta' into symbolic form.
delta <- c(0.01,0.05,0.1)
plot(type="n", x=1:5, y=1:5) #the curves themselves are irrelevant
legend_text <- vector(length=length(delta)) #I don't think lists work either
for(i in 1:length(delta)){
legend_text[i] <- substitute(paste(delta,"=",D),list(D=delta[i]) )
}
legend(x="topleft", fill=rainbow(length(delta)), legend=legend_text)
Since legend=substitute(paste(delta,"=",D),list(D=delta[1])) works for a single entry, I've also tried doing a 'semi-hardcoded' version, fixing the length of delta:
legend(x="topleft", fill=rainbow(length(delta)),
legend=c(substitute(paste(delta,"=",A), list(A=delta[1])),
substitute(paste(delta,"=",B), list(B=delta[2])),
substitute(paste(delta,"=",C), list(C=delta[3])) )
)
but this has the same issues as before.
Is there a way I can do this, or do I need to change the code by hand with each update of delta?
Try using lapply() with as.expression() to generate your legend labels, and use bquote() to create the individual expressions:
legend_text <- as.expression(lapply(delta, function(d) {
bquote(delta==.(d))
} ))
Note that with plotmath you need == to get an equals sign. Also no need for paste() since nothing is really a string here.
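For completeness, here is a minimal sketch of how the answer's legend_text might slot into the plot from the question. It reuses delta and the empty placeholder plot from the question and assumes nothing else; the number of legend entries follows length(delta), so nothing is hardcoded.

```r
delta <- c(0.01, 0.05, 0.1)

# One plotmath expression per value: .(d) is replaced by the number,
# while the bare symbol delta is left for plotmath to draw as a Greek letter.
legend_text <- as.expression(lapply(delta, function(d) bquote(delta == .(d))))

plot(1:5, 1:5, type = "n")  # the curves themselves are irrelevant here
legend("topleft", fill = rainbow(length(delta)), legend = legend_text)
```

Changing delta (values or length) should then update the legend without further edits.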

How to do box plots on a range of variables

I have a data matrix with approximately one hundred variables and I want to do box plots of these variables. Doing them one by one is possible, but tedious. The code I use for my box plots is:
boxplot(myVar ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T,las=2, ylab='Counts', at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
I started doing them one by one, but realized there must be better options. The boxplot call will take only one variable at a time (I may be wrong), so I am looking for a way to get it done in one go. A for loop? Next, I would like to print the name of the current variable (= the colName) on the plot in order to keep them apart.
Appreciate suggestions.
Thank you.
jd
Why not try the following:
data(something)
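panel.bxp <- function(x, ...)
{
# save the user coordinates and restore them when the panel is done
usr <- par("usr"); on.exit(par(usr = usr))
# keep the y range, but set the x range to 0..2 so the box sits at x = 1
par(usr = c(0, 2, usr[3:4]))
boxplot(x, add=TRUE)
}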
Then, to run the function, you can try something like the following:
pairs(something, diag.panel = panel.bxp, text.panel = function(...){})
EDIT: There is also a nice link to an article here on R-bloggers which you might want to have a look at.
Being very new to R, I've tried to follow my 'old' thinking - making a for-loop. Here is what I came up with. Probably very primitive, and therefore, I'd appreciate comments/suggestions. Anyway: the loop:
for (i in 1:ncol(final)) {
#print(i)
c <- colnames(final)[i]
#print(c)
b <- final[,i]
#b <- t(b)
#dim(b)
#print(b)
exp <- data.frame(Group,Trt,Time,b)
#dim(exp)
#print(exp)
boxplot(b ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T, las=2, ylab='Counts',main=c, at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
}
The loop runs through the data matrix 'final' (48 rows x 67 cols). It picks up the column header, c, which is used in the boxplot call as the main title, and the data column, b. It then sets up the experiment using the Group, Trt, and Time factors established outside the loop, and calls boxplot.
This seems to do what I want. Oddly, RStudio does not allow more than about 25 plots to be stored in the plots pane, so I have to run this loop in a couple of rounds.
Anyway, sorry for answering my own question. Better solutions are greatly appreciated since my way is pretty amateurish, I suspect.
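One possible variation on the loop above, offered only as a sketch: iterate over the column names and send every plot to a single multi-page PDF, which sidesteps the limit on stored plots. The design factors and the count matrix below are synthetic stand-ins (2 groups x 2 treatments x 4 times x 3 replicates = 48 rows, 5 columns instead of 67), and the output file name is arbitrary.

```r
set.seed(1)

# Hypothetical design: Group varies fastest, then Trt, then Time.
design <- expand.grid(Group = c("A", "B"), Trt = c("ctl", "trt"),
                      Time = c(0, 1, 2, 4), rep = 1:3)

# Synthetic stand-in for the 'final' data matrix from the post.
final <- matrix(rpois(48 * 5, lambda = 20), nrow = 48,
                dimnames = list(NULL, paste0("var", 1:5)))

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)                      # one page per variable
for (nm in colnames(final)) {
  dat <- data.frame(design[c("Group", "Trt", "Time")], value = final[, nm])
  boxplot(value ~ Group * Trt * Time, data = dat,
          col = c("red", "blue"), las = 2, ylab = "Counts",
          main = nm,               # column name as the plot title
          at = c(1:4, 6:9, 11:14, 16:19))
}
dev.off()
```

Looping over names rather than indices also avoids reusing c and exp as variable names, which would otherwise mask the base functions of the same name.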

Custom function does not work in R 'ddply' function

I am trying to use a custom function inside 'ddply' in order to create a new variable (NormViability) in my data frame, based on values of a pre-existing variable (CelltiterGLO).
The function is meant to create a rescaled (%) value of 'CelltiterGLO' based on the mean 'CelltiterGLO' values at a specific sub-level of the variable 'Concentration_nM' (0.01).
So if the mean of 'CelltiterGLO' at 'Concentration_nM'==0.01 is set as 100, I want to rescale all other values of 'CelltiterGLO' over the levels of other variables ('CTSC', 'Time_h' and 'ExpType').
The normalization function is the following:
normalize.fun = function(CelltiterGLO) {
idx = Concentration_nM==0.01
jnk = mean(CelltiterGLO[idx], na.rm = T)
out = 100*(CelltiterGLO/jnk)
return(out)
}
and this is the code I try to apply to my dataframe:
library("plyr")
df.bis=ddply(df,
.(CTSC, Time_h, ExpType),
transform,
NormViability = normalize.fun(CelltiterGLO))
The code runs, but when I double-check (with aggregate or tapply) whether the mean of 'NormViability' equals 100 at 'Concentration_nM'==0.01, I get different numbers instead of 100. However, if I subset my df by the two levels of the variable 'ExpType', the code returns the correct numbers on each separate subset. I tried making 'ExpType' either character or factor but got similar results. 'ExpType' has two levels/values, "Combinations" and "DoseResponse". I can't figure out why the code is not working on the entire df. I wonder if this is because the two levels of 'ExpType' do not contain the same number of levels for all the other variables, e.g. one of the levels of 'Time_h' is missing for the level "Combinations" of 'ExpType'.
Thanks very much for your help, and I apologize in advance if the answer is already present on Stack Overflow and I was not able to find it.
Michele
I (the OP) found out that the function was missing one variable in the arguments, that was used in the statements. Simply adding the variable Concentration_nM to the custom function solved the problem.
THANKS
m.
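A minimal sketch of the fix described above, on made-up data. The column names come from the question, but the values, the number of replicates and the group levels are invented for illustration. Passing Concentration_nM as an argument means the index is computed from the rows of the current piece, rather than from whatever object of that name happens to be visible elsewhere.

```r
library(plyr)
set.seed(1)

# Synthetic data frame with the columns named in the question.
df <- expand.grid(CTSC = c("a", "b"), Time_h = c(24, 48),
                  ExpType = c("Combinations", "DoseResponse"),
                  Concentration_nM = c(0.01, 1, 100), rep = 1:2)
df$CelltiterGLO <- runif(nrow(df), 50, 150)

# Both vectors are now explicit arguments, so they always refer to the same rows.
normalize.fun <- function(CelltiterGLO, Concentration_nM) {
  idx <- Concentration_nM == 0.01
  ref <- mean(CelltiterGLO[idx], na.rm = TRUE)
  100 * (CelltiterGLO / ref)
}

df.bis <- ddply(df, .(CTSC, Time_h, ExpType), transform,
                NormViability = normalize.fun(CelltiterGLO, Concentration_nM))

# Check: the group means at the reference concentration should all be 100.
chk <- aggregate(NormViability ~ CTSC + Time_h + ExpType,
                 data = subset(df.bis, Concentration_nM == 0.01), FUN = mean)
chk
```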

Specifying order of lattice plot panels

I have looked at the two similar questions on this topic but do not find the answer I'm looking for in either of them. The as.table argument changes the alphabetic sequence from starting in the lower left to starting in the upper left, but does nothing about the order of panels within the group.
The data (which are proprietary to my client) have station identifications that are a combination of letters and numbers. When there is a series of sites with the same initial letters within the group of all sites being plotted, they sort by first digit rather than the way we humans count. For example, SW-1, SW-10, SW-11, SW-2, SW-3. I would like them in the order SW-1, SW-2, SW-3, SW-10, SW-11. The code I use is:
xyplot(as.d$quant ~ as.d$sampdate | as.d$site, ylim=range(as.d$quant), xlim=range(as.d$sampdate),
main='Arsenic By Time', ylab='Concentration (mg/L)', xlab='Time')
I do not know how to attach a .pdf of the resulting plot but will do so if someone shows me how to do this.
There are a couple of points here.
The first is that in R, things like the order of factor levels are considered a property or attribute of the data rather than of the graph or analysis. Because of that, the plotting and analysis functions generally have no arguments for specifying the order; instead you specify the order in the data object itself, and then all plots and analyses use it.
To change the order you can specify the desired order using the factor function, or you can use functions like relevel and reorder to change the order of the levels of a factor. If you want the levels to be in the same order that they appear in the data then the unique function works well. For sorting with characters and numbers mixed the mixedsort function in the gtools package can be useful.
You need to specify the levels of that factor variable in the sequence you expect. The default is lexicographic, as you noticed:
xyplot(as.d$quant ~ as.d$sampdate | factor( as.d$site,
levels=1:length(unique(as.d$site))) ,
ylim=range(as.d$quant), xlim=range(as.d$sampdate),
main='Arsenic By Time', ylab='Concentration (mg/L)', xlab='Time')
Based on how the question currently stands, you might need:
require(gtools)
xyplot(as.d$quant ~ as.d$sampdate | factor( as.d$site,
levels=mixedsort( as.character(unique(as.d$site)) ) ) ,
ylim=range(as.d$quant), xlim=range(as.d$sampdate),
main='Arsenic By Time', ylab='Concentration (mg/L)', xlab='Time')
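If installing gtools is not an option, the same idea (store the desired order in the factor levels, then plot) can be sketched in base R. This is only an illustration under two assumptions: the site IDs all have the form PREFIX-NUMBER, as in the SW-1 ... SW-11 example, and the data frame below is a synthetic stand-in for the proprietary as.d.

```r
library(lattice)
set.seed(42)

# Synthetic stand-in for as.d with the column names used in the question.
sites <- c("SW-1", "SW-10", "SW-11", "SW-2", "SW-3")
as.d <- data.frame(
  site     = rep(sites, each = 6),
  sampdate = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 6),
                 times = length(sites)),
  quant    = runif(30, 0, 0.05)
)

# Split each ID into its text prefix and numeric suffix, then order by both.
ids    <- unique(as.d$site)
prefix <- sub("-[0-9]+$", "", ids)
number <- as.integer(sub("^.*-", "", ids))
as.d$site <- factor(as.d$site, levels = ids[order(prefix, number)])

levels(as.d$site)  # the order lattice will use for the panels

p <- xyplot(quant ~ sampdate | site, data = as.d, as.table = TRUE,
            main = "Arsenic By Time", ylab = "Concentration (mg/L)",
            xlab = "Time")
print(p)
```

Because the order lives in the data, any later lattice call conditioned on site picks it up without extra arguments.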

Understanding xyplot in R

I'm an R newbie and I'm trying to understand the xyplot function in lattice.
I have a dataframe:
df <- data.frame(Mean=as.vector(abc), Cycle=seq_len(nrow(abc)), Sample=rep(colnames(abc), each=nrow(abc)))
and I can plot it using
xyplot(Mean ~ Cycle, group=Sample, df, type="b", pch=20, auto.key=list(lines=TRUE, points=FALSE, columns=2), file="abc-quality")
My question is, what are Mean and Cycle? Looking at ?xyplot I can see that this is some kind of function and I understand they are coming from the data frame df, but I can't see them with ls() and >Mean gives Error: object 'Mean' not found. I tried to replicate the plot by substituting df[1] and df[2] for Mean and Cycle respectively thinking that these would be equal but that doesn't seem to be the case. Could someone explain what data types these are (objects, variables, etc) and if there is a generic way to access them (like df[1] and df[2])?
Thanks!
EDIT: xyplot works fine, I'm just trying to understand what Mean and Cycle are in terms of how they relate to df (column labels?) and if there is a way to put them in the xyplot function without referencing them by name, like df[1] instead of Mean.
These are simply references to columns of df.
If you'd like to access them by name without mentioning df every time, you could write with(df,{ ...your code goes here... }). The ...your code goes here... block can access the columns as simply Mean and Cycle.
A more direct way to get to those columns is df$Mean and df$Cycle. You can also reference them by position as df[,1] and df[,2], but I struggle to see why you would want to do that.
The reason your xyplot call works is that it implicitly does the equivalent of with(df), where df is your third argument to xyplot. Many R functions are like this; for example, lm(y~x,obs) would also correctly pick up columns x and y from the data frame obs.
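The equivalence described above can be checked with a small sketch. The data frame here is a toy stand-in (the abc matrix from the question is not available), and only the Mean and Cycle columns are used. It also hints at why substituting df[1] did not behave like Mean: df[1] is a one-column data frame, whereas df[[1]] and df$Mean are the underlying vector.

```r
library(lattice)

# Toy stand-in for the question's df.
df <- data.frame(Mean   = c(30, 32, 31, 28, 29, 27),
                 Cycle  = rep(1:3, times = 2),
                 Sample = rep(c("s1", "s2"), each = 3))

# Three ways of pointing xyplot at the same two columns.
p1 <- xyplot(Mean ~ Cycle, data = df)        # names looked up in df
p2 <- with(df, xyplot(Mean ~ Cycle))         # names looked up via with()
p3 <- xyplot(df[[1]] ~ df[[2]])              # columns referenced by position

class(df[1])    # a data frame with one column
class(df[[1]])  # the numeric vector itself
```

The three trellis objects hold the same x and y values; only the default axis labels differ, since those are taken from the expressions in the formula.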
You need to add , data=df to your call to xyplot():
xyplot(Mean ~ Cycle, data=df, # added data= argument
group=Sample, type="b", pch=20,
auto.key=list(lines=TRUE, points=FALSE, columns=2),
file="abc-quality")
Alternatively, you can with(df, ....) and place your existing call where I left the four dots.
