Specifying order of lattice plot panels - r

I have looked at the two similar questions on this topic but did not find the answer I'm looking for in either of them. The as.table argument changes the panel sequence from starting in the lower left to starting in the upper left, but it does nothing about the order of panels within the group.
The data (which are proprietary to my client) have station identifications that are a combination of letters and numbers. When there is a series of sites with the same initial letters within the group of all sites being plotted, they sort by first digit rather than the way we humans count. For example, SW-1, SW-10, SW-11, SW-2, SW-3. I would like them in the order SW-1, SW-2, SW-3, SW-10, SW-11. The code I use is:
xyplot(as.d$quant ~ as.d$sampdate | as.d$site, ylim=range(as.d$quant), xlim=range(as.d$sampdate),
main='Arsenic By Time', ylab='Concentration (mg/L)', xlab='Time')
I do not know how to attach a .pdf of the resulting plot but will do so if someone shows me how to do this.

There are a couple of points here.
First, in R things like the order of factor levels are considered a property or attribute of the data rather than a property of the graph or analysis. Because of that, there are generally no arguments in the plotting or analysis functions for specifying the order. Instead you specify that order in the data object itself, and then all plots and analyses use it.
To change the order you can specify the desired order using the factor function, or you can use functions like relevel and reorder to change the order of the levels of a factor. If you want the levels to be in the same order that they appear in the data then the unique function works well. For sorting with characters and numbers mixed the mixedsort function in the gtools package can be useful.
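As a minimal sketch of setting the level order in the data itself, the snippet below builds a factor whose levels follow the numeric part of the site ID. It uses only base R as a stand-in for mixedsort, and it assumes every ID has the form PREFIX-NUMBER. The sites vector is made-up illustration data, not the real station list.

```r
# Hypothetical site IDs, in the form PREFIX-NUMBER (an assumption)
sites <- c("SW-1", "SW-10", "SW-11", "SW-2", "SW-3", "SW-10", "SW-2")

ids    <- unique(sites)
prefix <- sub("-.*$", "", ids)               # text before the hyphen, e.g. "SW"
number <- as.numeric(sub("^.*-", "", ids))   # digits after the hyphen, e.g. 10

# Sort by prefix first, then by the number as a number rather than as text
ordered_ids <- ids[order(prefix, number)]

# The level order now lives in the data object
site_f <- factor(sites, levels = ordered_ids)
levels(site_f)
```

Using a factor rebuilt this way as the conditioning variable should make the panels follow that level order.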

You need to specify the levels of that factor variable in the sequence you expect. The default is lexicographic, as you noticed. If the site identifiers were plain integers, this would do it:
xyplot(as.d$quant ~ as.d$sampdate | factor( as.d$site,
levels=1:length(unique(as.d$site))) ,
ylim=range(as.d$quant), xlim=range(as.d$sampdate),
main='Arsenic By Time', ylab='Concentration (mg/L)', xlab='Time')
Based on how the question currently stands, you might need:
require(gtools)
xyplot(as.d$quant ~ as.d$sampdate | factor( as.d$site,
levels=mixedsort( as.character(unique(as.d$site)) ) ) ,
ylim=range(as.d$quant), xlim=range(as.d$sampdate),
main='Arsenic By Time', ylab='Concentration (mg/L)', xlab='Time')

Related

Creating a histogram from a subset created from the subset function

This is how I've retrieved my dataset, everything is good so far.
> mantis<-read.csv("mantis.csv")
> attach(mantis)
The dataset provides numerical data on body mass/length/claw strength/etc. of FEMALE and MALE mantises. The object is to create a histogram showing the body masses of ONLY female mantises. I created a subset;
> mantis_sub<-subset(mantis, Sex=="f",select="Body.Mass.g")
Then I tried;
> hist(mantis_sub)
Error in hist.default(mantis_sub) : 'x' must be numeric
I've searched this link;
Plot a histogram of subset of a data
...and I cannot figure out how to properly create this histogram. I am unfortunately not fluent enough in R to understand the solution and the textbook I'm using does not cover this.
This is because mantis_sub is a data frame (i.e. a table, here with the single column Body.Mass.g), not a vector of numbers, and hist needs a numeric vector.
You need to extract the column you want to plot. To do this, write mantis_sub$ followed by the column name. The dollar sign extracts that column from the mantis_sub table.
E.g. to do a histogram of the column named "Body.Mass.g":
hist(mantis_sub$Body.Mass.g)
If you want to do histograms of many columns automatically, keep those columns in the subset and loop through them, e.g. (the column names here are placeholders):
for (column in c("BodyMass", "ClawStrength")) {
hist(mantis_sub[[column]])
}
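To make the data frame versus vector distinction concrete, here is a small sketch. The toy mantis data frame is invented for illustration and only borrows the Sex and Body.Mass.g column names from the question.

```r
# Hypothetical stand-in for the contents of mantis.csv
mantis <- data.frame(
  Sex         = c("f", "m", "f", "m", "f"),
  Body.Mass.g = c(2.1, 1.4, 2.6, 1.2, 2.3)
)

# subset() with select= still returns a data frame, even for one column
mantis_sub <- subset(mantis, Sex == "f", select = "Body.Mass.g")
is.data.frame(mantis_sub)

# Extracting the column gives the numeric vector that hist() expects
female_mass <- mantis_sub$Body.Mass.g
is.numeric(female_mass)

h <- hist(female_mass, main = "Female body mass", xlab = "Body mass (g)")
```

An alternative that skips the intermediate data frame is to index the column directly, for example mantis$Body.Mass.g[mantis$Sex == "f"].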

How to convert character/factor to integer?

I know that has been asked quite frequently. However, by applying the previous advice I'm still confused about two things.
How to convert from multinomial values to integers?
How to get the integer back to the factor/character after the analysis?
library(car)
data(Prestige)
View(Prestige)
# here I convert directly from character which seems quite useless
Prestige$TYPE<-as.numeric(levels(Prestige$type))
# here I generate factors
Prestige$type<-as.factor(Prestige$type)
# and try to convert afterwards. doesnt work either
Prestige$TYPE<-as.numeric(levels(Prestige$type))
Basically, I would like to extract the three levels in type without renaming it manually.
A vector with class factor has an attribute called levels. The levels function acts on that attribute and not on the vector itself.
library(car)
data(Prestige)
length(Prestige$type) # 102
levels(Prestige$type) # Notice that this has length 3.
If you want the numeric values for the vector, use
as.numeric(Prestige$type)
What was bc is now 1, what was prof is now 2, and what was wc is now 3.
If you need to reconstitute the factor from those numeric codes, use
factor(as.numeric(Prestige$type), levels = 1:3, labels = c("bc", "prof", "wc"))
But as a general rule, it's better not to alter your factors unless you need to alter the categories. If you need the numerical codes under the data, make a new variable
Prestige$type_numeric <- as.numeric(Prestige$type)
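The round trip between a factor and its integer codes can be sketched on a small stand-in factor. The type vector below is made up for illustration and only reuses the three level names from Prestige$type.

```r
# Hypothetical stand-in for Prestige$type
type <- factor(c("prof", "bc", "wc", "prof", "bc"))

# Integer codes are positions in levels(type), which are sorted alphabetically
codes <- as.integer(type)

# Back to a factor: supply the codes as levels and the original names as labels
restored <- factor(codes, levels = seq_along(levels(type)), labels = levels(type))

# Back to plain character strings: index the levels by the codes
by_lookup <- levels(type)[codes]
```

Taking the labels from levels(type) rather than typing them by hand avoids having to rename the categories manually.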

how to transform columns of a data frame according to the values in a vector in R?

I am trying to normalize some columns of a data frame so they have the same mean. The solution I am now implementing works, but it feels like there should be a simpler way of doing this.
# we make a copy of women
w = women
# print out the col Means
colMeans(women)
height weight
65.0000 136.7333
# create a vector of factors to normalize with
factor = colMeans(women)/colMeans(women)[1]
# normalize the copy of women that we previously made
for(i in 1:length(factor)){w[,i] <- w[,i] / factor[i]}
#We achieved our goal to have same means in the columns
colMeans(w)
height weight
65 65
I can come up with the same thing easily using apply, but is there something easier, like just doing women/factor and getting the correct answer?
By the way, what is women/factor actually doing? Running
colMeans(women/factor)
height weight
49.08646 98.40094
does not give the same result.
You can use mapply too:
colMeans(mapply("/", women, factor))
Regarding what women/factor does: women is a data.frame with two columns, while factor is a numeric vector of length two. When you compute women/factor, R walks down the entries of women column by column and divides the first by factor[1], the second by factor[2], the third by factor[1] again, and so on. Because factor is shorter than women, R recycles it over and over.
You can see, for example, that every second entry of women[, 1]/factor equals the corresponding entry of women[, 1] (because factor[1] equals 1).
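The recycling order is easier to see on a tiny example. The data frame d and the vector v below are invented for illustration.

```r
# Hypothetical 3 x 2 data frame where every entry is 10
d <- data.frame(a = c(10, 10, 10), b = c(10, 10, 10))
v <- c(1, 2)

# v is recycled down the columns: 1, 2, 1 for column a, then 2, 1, 2 for column b
recycled <- d / v

# Repeating each element nrow(d) times lines v up with the columns instead
by_col <- d / rep(v, each = nrow(d))

recycled
by_col
```

In recycled the divisors alternate within each column, while in by_col the whole of column a is divided by 1 and the whole of column b by 2, which is the column-wise behaviour the normalization needs.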
One way of doing this is using sweep. By default this function subtracts a summary statistic from each row or column (chosen by the MARGIN argument), but you can also specify a different function to perform, in this case a division:
colMeans(sweep(women, 2, factor, '/'))
Also:
rowMeans(t(women)/factor)
#height weight
#65 65
Regarding your question:
"I can come up with the same thing easily using apply but is there something easier like just doing women/factor and get the correct answer? By the way, what is women/factor actually doing?"
women/factor ## is similar to
unlist(women)/rep(factor,nrow(women))
What you need is:
unlist(women)/rep(factor, each=nrow(women))
or
women/rep(factor, each=nrow(women))
In my solution, I didn't use rep because factor gets recycled as needed.
t(women) ##matrix
as.vector(t(women))/factor #will give same result as above
or just
t(women)/factor #preserve the dimensions for ?rowMeans
In short, column wise operations are happening here.

Custom function does not work in R 'ddply' function

I am trying to use a custom function inside 'ddply' in order to create a new variable (NormViability) in my data frame, based on values of a pre-existing variable (CelltiterGLO).
The function is meant to create a rescaled (%) value of 'CelltiterGLO' based on the mean 'CelltiterGLO' values at a specific sub-level of the variable 'Concentration_nM' (0.01).
So if the mean of 'CelltiterGLO' at 'Concentration_nM'==0.01 is set as 100, I want to rescale all other values of 'CelltiterGLO' over the levels of other variables ('CTSC', 'Time_h' and 'ExpType').
The normalization function is the following:
normalize.fun = function(CelltiterGLO) {
idx = Concentration_nM==0.01
jnk = mean(CelltiterGLO[idx], na.rm = T)
out = 100*(CelltiterGLO/jnk)
return(out)
}
and this is the code I try to apply to my dataframe:
library("plyr")
df.bis=ddply(df,
.(CTSC, Time_h, ExpType),
transform,
NormViability = normalize.fun(CelltiterGLO))
The code runs, but when I double-check (with aggregate or tapply) whether the mean of 'NormViability' equals 100 at 'Concentration_nM'==0.01, I do not get 100 but different numbers. However, if I subset my df by the two levels of the variable 'ExpType', the code returns the correct numbers on each separate subset. I tried making 'ExpType' either character or factor but got similar results. 'ExpType' has two levels/values, "Combinations" and "DoseResponse". I can't figure out why the code is not working on the entire df. I wonder if this is because the two levels of 'ExpType' do not contain the same number of levels for all the other variables, e.g. one of the levels of 'Time_h' is missing for the level "Combinations" of 'ExpType'.
Thanks very much for your help, and I apologize in advance if the answer is already on Stack Overflow and I was not able to find it.
Michele
I (the OP) found out that the function was missing one variable in its arguments that was used in its body. Simply adding the variable Concentration_nM to the custom function's arguments solved the problem.
THANKS
m.
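A minimal sketch of the fix described above, with Concentration_nM passed in as a second argument. The toy data frame is invented for illustration, and base R split/lapply stands in for the ddply call so that the sketch runs without extra packages.

```r
# Both vectors are passed in explicitly, so the function only sees
# the rows of the group it is called on
normalize.fun <- function(CelltiterGLO, Concentration_nM) {
  idx <- Concentration_nM == 0.01
  ref <- mean(CelltiterGLO[idx], na.rm = TRUE)
  100 * CelltiterGLO / ref
}

# Hypothetical data: two groups with different baseline readings
toy <- data.frame(
  ExpType          = rep(c("Combinations", "DoseResponse"), each = 4),
  Concentration_nM = rep(c(0.01, 0.01, 1, 10), times = 2),
  CelltiterGLO     = c(50, 70, 30, 15, 200, 220, 105, 42)
)

# Base R stand-in for ddply(..., transform, ...): normalize within each group
pieces <- lapply(split(toy, toy$ExpType), function(g) {
  g$NormViability <- normalize.fun(g$CelltiterGLO, g$Concentration_nM)
  g
})
toy_norm <- do.call(rbind, pieces)

# Mean of the normalized baseline rows in each group
baseline <- toy_norm$Concentration_nM == 0.01
check <- tapply(toy_norm$NormViability[baseline], toy_norm$ExpType[baseline], mean)
check
```

With plyr, the corresponding call would pass both columns inside transform, i.e. NormViability = normalize.fun(CelltiterGLO, Concentration_nM).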

strip panels lattice

My problem is labelling the strips of my panels with the lattice framework.
testData<-data.frame(star=rnorm(1200),frame=factor(rep(1:12,each=100))
,n=factor(rep(rep(c(4,10,50),each=100),4))
,var=factor(rep(c("h","i","h","i"),each=300))
,stat=factor(rep(c("c","r"),each=600))
)
levels(testData$frame)<-c(1,7,4,10,2,8,5,11,3,9,6,12)# order of my frames
histogram(~star|factor(frame), data=testData
,as.table=T
,layout=c(4,3),type="density",breaks=20
,panel=function(x,params,...){
panel.grid()
panel.histogram(x,...,col=1)
panel.curve(dnorm(x,0,1), type="l",col=2)
}
)
What I'm looking for is: [figure of the desired layout not included]
You should not need to add the factor call around items in the conditioning section of the formula when they are already factors. If you want to make a cross between two factors the interaction function is the best approach. It even has a 'sep' argument which will accept a new line character. This is the closest I can produce:
library(lattice)
library(latticeExtra)  # provides useOuterStrips
h <- histogram(~ star | interaction(stat, var, sep = "\n") + n, data = testData,
  as.table = TRUE, layout = c(4, 3), type = "density", breaks = 20,
  panel = function(x, params, ...) {
    panel.grid()
    panel.histogram(x, ..., col = 1)
    panel.curve(dnorm(x, 0, 1), type = "l", col = 2)
  })
plot(h)
useOuterStrips(h, strip.left = strip.custom(horizontal = FALSE),
  strip.lines = 2, strip.left.lines = 1)
I get an error when I try to put in three factors separately and then use useOuterStrips. It won't accept three separate conditioning factors. I've searched for postings on R-help, but the only perfectly on-point question got an untested suggestion, and when I tried it, it failed miserably.
