I am doing a project on Functional Data Analysis, where I am trying to plot spaghetti plots for height. I am using xyplot from the lattice library. Why is the y-axis wrapped in xyplot?
Here I am plotting data for only one individual. If I plot the whole data set, it looks like a block of thick lines.
My code in R is:
xyplot(height ~ age|sex, p_data, type="l", group=id)
Resulting in:
Without seeing p_data it's hard to say, but based upon the axis labelling I would guess that height is being treated as a factor variable.
Run is.factor(p_data$height), and if the answer is TRUE then try
p_data$height <- as.numeric(levels(p_data$height))[p_data$height]
and repeat your plot. If this doesn't work then at least give us some idea of what the p_data dataframe looks like.
@Joe has put you on the right path. The issue is almost certainly that the height variable is being treated as a factor (categorical variable) rather than a continuous, numeric variable.
For example, I can replicate a similar problem via:
library(lattice)  # provides xyplot()
p_data <- data.frame(height=c(96,72,100,45),age=1:4,sex=c("m","f","f","m"),id=1)
p_data$height <- factor(p_data$height,levels=p_data$height)
# it's all out of order cap'n!
p_data$height
#[1] 96 72 100 45
#Levels: 96 72 100 45
# same plot call as you are using
xyplot(height ~ age|sex, p_data, type="l", group=id)
If you fix it up like so:
p_data$height <- as.numeric(as.character(p_data$height))
...then the same call gives an appropriate result:
xyplot(height ~ age|sex, p_data, type="l", group=id)
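To see why the as.character() step matters, here is a minimal sketch (the vector height_f is made up for illustration). Calling as.numeric() directly on a factor returns the internal level codes rather than the original values:

```r
# Hypothetical factor holding heights that were read in as text
height_f <- factor(c("96", "72", "100", "45"))

# Direct conversion returns the level codes, because the levels
# are sorted as text: "100" "45" "72" "96"
codes <- as.numeric(height_f)

# Either of these recovers the actual heights
values_a <- as.numeric(as.character(height_f))
values_b <- as.numeric(levels(height_f))[height_f]

print(codes)     # 4 3 1 2
print(values_a)  # 96 72 100 45
```

Both conversions give the same result; the levels() form just does the text-to-number conversion once per level instead of once per row.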
I have a data frame data_2 and wish to create a Bland-Altman plot to compare the differences between the data in the columns alog1 vs. dig1.
Please help with the function for this and how to execute this. Would the function be barplot()?
Thanks for your time.
Another name for a Bland-Altman plot is a Tukey mean-difference plot. (I have nothing against Bland and Altman, but I think 'mean-difference' is more descriptive.) Note that this is different from a boxplot (observe the pictures on the two Wikipedia pages). The mean-difference plot is simply a regular scatterplot, except that instead of plotting x versus y, you plot the difference x-y against the mean of x and y (in your case, alog1 and dig1). Probably the easiest way to make this is to form these two new variables first, and then plot them as you would any other scatterplot. Here is some sample code:
mn <- (data_2$alog1 + data_2$dig1)/2
dif <- data_2$alog1 - data_2$dig1
plot(mn, dif)
If you wanted to add arguments to customize your plot, you could do that just as you normally would, for example:
plot(mn, dif, main="Bland-Altman plot", xlab="mean of alog1 & dig1",
ylab="difference between alog1 & dig1")
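Bland-Altman plots usually also show the mean difference and the 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences). A sketch of that follows. Since data_2 was not posted, the data frame below is simulated and its numbers are invented purely for illustration:

```r
# Simulated stand-in for data_2 (assumption: two numeric columns alog1 and dig1)
set.seed(1)
data_2 <- data.frame(alog1 = rnorm(50, mean = 100, sd = 10))
data_2$dig1 <- data_2$alog1 + rnorm(50, mean = 1, sd = 3)

mn  <- (data_2$alog1 + data_2$dig1) / 2   # mean of the two measurements
dif <- data_2$alog1 - data_2$dig1         # difference between them

bias <- mean(dif)                          # average disagreement
loa  <- bias + c(-1.96, 1.96) * sd(dif)    # 95% limits of agreement

plot(mn, dif, main = "Bland-Altman plot",
     xlab = "mean of alog1 & dig1",
     ylab = "difference between alog1 & dig1",
     ylim = range(dif, loa))
abline(h = bias, lty = 1)   # solid line at the mean difference
abline(h = loa,  lty = 2)   # dashed lines at the limits of agreement
```

Setting ylim from both dif and loa keeps the dashed lines inside the plotting region even when no point falls outside them.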
This is for research I am doing for my Masters Program in Public Health.
I am plotting two variables against each other, a standard x-y scatterplot, and drawing a predicted line over the top. What I get is a very odd-looking mix of points and boxplots, with an x-axis that is only half labelled, and I don't understand why, since I never call a boxplot function. My understanding is that calling plot() should draw only the points.
The data I am plotting looks like this
TOTAL.LACE | DAYS.TO.FAILURE
9 | 15
16 | 7
... | ...
TOTAL.LACE ranges from 0 to 19 and DAYS.TO.FAILURE from 0 to 30.
My code is as follows, maybe it is something before the plot but I don't think it is:
# To control the type of symbol we use we will use psymbol, it takes
# value 1 and 2
psymbol <- unique(FAILURE + 1)
# Build a test frame that will predict values of the lace score due to
# a patient being in a state of failure
library(survival)  # provides Surv() and survreg()
test <- survreg(Surv(time = DAYS.TO.FAILURE, event = FAILURE) ~ TOTAL.LACE,
                dist = "logistic")
pred <- predict(test, type="response")  # produces numbers from about 14 to 23
summary(pred)
ord <- order(TOTAL.LACE)
tl_ord <- TOTAL.LACE[ord]
pred_ord <- pred[ord]
plot(TOTAL.LACE, DAYS.TO.FAILURE, pch=unique(psymbol))  # produces goofy graph
lines(tl_ord, pred_ord)  # this produces the line, not boxplots
Here is the resulting picture
I'm not too sure how to proceed from here. This is an offshoot of another problem I had with the same data set (at this link). I don't understand why boxplots are being drawn: I did not call the boxplot() command, so I don't know why they appear along with the points. When I issue plot(DAYS.TO.FAILURE, TOTAL.LACE), I get only points on the resulting plot, as I expected, but when I swap what is plotted on x and y, the boxplots show up, which is unexpected to me.
Here is a link to sample data that will hopefully help in reproducing the problem, as pointed out by @DWin et al.: Some Sample Data
Thank you,
Since you don't have a reproducible example, it is a little hard to provide an answer that deals with your situation. Here I generate some vaguely similar-looking data:
set.seed(4)
TOTAL.LACE <- rep(1:19, each=1000)
zero.prob <- rbinom(19000, size=1, prob=.01)
DAYS.TO.FAILURE <- rpois(19000, lambda=15)
DAYS.TO.FAILURE <- ifelse(zero.prob==1, DAYS.TO.FAILURE, 0)
And here is the plot:
First, some of the categories are not printed on the x-axis because the labels don't fit. When you have so many categories, you have to display them in a smaller font to make them all fit. To do this, use cex.axis and set the value below 1 (you can read more about this here):
boxplot(DAYS.TO.FAILURE~TOTAL.LACE, cex.axis=.8)
As to the question of why your plot is "goofy" or "funky-looking", it is a bit hard to say, because those terms are rather nebulous. My guess is that you need to more clearly understand how boxplots work, and then understand what these plots are telling you about the distribution of your data. In a boxplot, the midline of the box is the 50th percentile of your data, while the bottom and top of the box are the 25th and 75th percentiles. Typically, the 'whiskers' will extend out to the furthest datapoint that is at most 1.5 times the inter-quartile range beyond the ends of the box. In your case, for the first 9 TOTAL.LACEs, more than 75% of your data are 0's, so there is no box and thus no whiskers are possible. Everything beyond the whisker limits is plotted as an individual point. I don't think your plots are "funky" (although I'll admit I have no idea what you mean by that), I think your data may be "funky" and your boxplots are representing the distributions of your data accurately according to the rules by which boxplots are constructed.
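The rules above can be checked numerically with boxplot.stats(), which returns the numbers a boxplot is drawn from. This is a small sketch on an invented vector in which 80% of the values are 0, so it only illustrates the mechanism and is not your data:

```r
# Invented example: 80 zeros plus the values 1 to 20
x <- c(rep(0, 80), 1:20)

s <- boxplot.stats(x)   # default whisker coefficient is 1.5

# s$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$stats)

# s$out holds every value beyond the whisker limits, drawn as individual points
print(length(s$out))
```

Because the 25th, 50th and 75th percentiles are all 0, the inter-quartile range is 0, the box and whiskers collapse onto a single line, and every non-zero value is drawn as a separate point.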
In the future (and I mean this politely), it will help you get more useful and faster answers if you can write questions that are more clearly specified, and contain a reproducible example.
Update: Thanks for providing more information. I gather by "funky" you mean that it is a boxplot, rather than a typical scatterplot. The thing to realize is that plot() is a generic function that will call different methods depending on what you pass to it. If you pass simple continuous data, it will produce a scatterplot, but if you pass continuous data and a factor, then it will produce a boxplot, even if you don't call boxplot explicitly. Consider:
plot(TOTAL.LACE, DAYS.TO.FAILURE)
plot(as.factor(TOTAL.LACE), DAYS.TO.FAILURE)
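Evidently, TOTAL.LACE (the variable on the x-axis) has been converted to a factor without your meaning to; is.factor(TOTAL.LACE) will confirm this, and as.numeric(as.character(TOTAL.LACE)) will convert it back. Separately, psymbol <- unique(FAILURE + 1) keeps only the distinct symbol codes, so pch=unique(psymbol) recycles them across the points instead of matching each point to its own FAILURE value. Although I haven't had time to try this, I suspect eliminating that line of code and using pch=(FAILURE + 1) will accomplish your goals.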
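The dispatch behaviour described in this update can be reproduced on simulated data. This is only a sketch: the three vectors below are invented stand-ins that reuse your variable names, and lace_factor / lace_numeric are hypothetical helper names:

```r
# Invented stand-in data using the variable names from the question
set.seed(4)
TOTAL.LACE      <- sample(0:19, 200, replace = TRUE)
DAYS.TO.FAILURE <- rpois(200, lambda = 15)
FAILURE         <- rbinom(200, size = 1, prob = 0.3)

# Numeric x: plot() draws a scatterplot, one symbol per point (1 or 2)
plot(TOTAL.LACE, DAYS.TO.FAILURE, pch = FAILURE + 1)

# Factor x: the same generic plot() call draws one boxplot per level
lace_factor <- as.factor(TOTAL.LACE)
plot(lace_factor, DAYS.TO.FAILURE)

# Converting the factor back to numbers restores the scatterplot
lace_numeric <- as.numeric(as.character(lace_factor))
plot(lace_numeric, DAYS.TO.FAILURE, pch = FAILURE + 1)
```

Running is.factor() or str() on the real variables before plotting is a quick way to find out which of the two cases applies.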
Let's say I have the following dataset
bodysize=rnorm(20,30,2)
bodysize=sort(bodysize)
survive=c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat=as.data.frame(cbind(bodysize,survive))
I'm aware that the glm plot function has several nice plots to show you the fit,
but I'd nevertheless like to create an initial plot with:
1) raw data points
2) the logistic curve
3) predicted points
4) aggregate points for a number of predictor levels
library(Hmisc)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
All fine up to here.
Now I want to plot the observed survival rates for given levels of the predictor (body size):
dat$bd<-cut2(dat$bodysize,g=5,levels.mean=T)
AggBd<-aggregate(dat$survive,by=list(dat$bd),data=dat,FUN=mean)
plot(AggBd,add=TRUE)
#Doesn't work
I've tried to match AggBd to the dataset used for the model, and all sorts of other things, but I simply can't plot the two together. Is there a way around this?
I basically want to superimpose the last plot on the same axes.
Besides this specific task, I often wonder how to superimpose different plots that show different variables but have a similar scale/range on two-dimensional plots. I would really appreciate your help.
The first column of AggBd is a factor, so you need to convert its levels to numeric before you can add the points to the plot:
AggBd$size <- as.numeric (levels (AggBd$Group.1))[AggBd$Group.1]
To add the points to the existing plot, use points:
points (AggBd$size, AggBd$x, pch = 3)
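Putting the pieces together, here is a self-contained sketch of the whole figure. Two things are my own substitutions rather than anything from the question: base R's cut() with quantile breaks stands in for Hmisc::cut2(), and the seed is arbitrary. The names breaks, bin and agg are hypothetical:

```r
# Data as in the question (seed added so the sketch is reproducible)
set.seed(42)
bodysize <- sort(rnorm(20, 30, 2))
survive  <- c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat <- data.frame(bodysize, survive)

g <- glm(survive ~ bodysize, family = binomial, data = dat)

# Five equal-count bins; base R substitute for cut2(..., g = 5)
breaks  <- quantile(dat$bodysize, probs = seq(0, 1, by = 0.2))
dat$bin <- cut(dat$bodysize, breaks = breaks, include.lowest = TRUE)

# Mean body size and observed survival rate within each bin, both numeric
agg <- aggregate(cbind(bodysize, survive) ~ bin, data = dat, FUN = mean)

plot(dat$bodysize, dat$survive,
     xlab = "Body size", ylab = "Probability of survival")
curve(predict(g, data.frame(bodysize = x), type = "response"), add = TRUE)
points(dat$bodysize, fitted(g), pch = 20)     # predicted points
points(agg$bodysize, agg$survive, pch = 3)    # aggregated observed rates
```

Because the bin means are computed as numbers from the start, no factor-to-numeric conversion is needed before calling points().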
You are best off specifying your y-axis explicitly. You could also use par(new=TRUE) to draw a second plot over the first:
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
#then
par(new=TRUE)
#
plot(AggBd$Group.1,AggBd$x,pch=30)
Obviously, remove or change the axis ticks to prevent overlap, e.g.:
plot(AggBd$Group.1,AggBd$x,pch=30,xaxt="n",yaxt="n",xlab="",ylab="")
giving:
I have a matrix of ChIP-seq results like this for 26000 genes:
  LncRNA_ID        LncRNA_Name   Control_Raw_TagCount  ICLIP_EZH2_Raw_TagCount
1 AK092525                                      47908                   194887
2 ENST00000423879  RP11-12M5.1                  10794                    90146
3 AF318349                                       5514                    61617
4 ENST00000506392  CTC-313D10.1                   288                    40880
5 ENST00000438080  RP11-177A2.4                 25005                    37380
6 AK123756                                        800                    35469
I want to plot the count densities of both samples, control and EZH2 (columns 3 and 4), in order to compare them. I am using R and I am very confused, mainly because I can't plot them as histograms: I get a figure with only one bar rather than all the bars I am expecting, and the same happens if I try a boxplot. This is probably a very silly question, but I am a bit desperate.
ezh2<-data$ICLIP_EZH2_Raw_TagCount
control<-data$Control_Raw_TagCount
hist(ezh2)# not working, i can't see distribution at all
Do you have any idea how to do it?
Thanks in advance
Box plot, where the two columns are stuck together and then split along the groups:
N <- length(d$Control_Raw_TagCount)
x <- c(d$Control_Raw_TagCount, d$ICLIP_EZH2_Raw_TagCount)
group <- rep(c("Control_Raw_TagCount", "ICLIP_EZH2_Raw_TagCount"), c(N, N))
boxplot(x ~ group)
Here I've assumed the data name is d, so adjust that to your data frame's name. If you want something like hollow histograms (see pg26 of OpenIntro Statistics), the histPlot function in the openintro package will do the trick using the arguments probability=TRUE, hollow=TRUE:
# install.packages("openintro")
library(openintro)
histPlot(d$Control_Raw_TagCount, probability=TRUE, hollow=TRUE)
histPlot(d$ICLIP_EZH2_Raw_TagCount, probability=TRUE, hollow=TRUE,
         add=TRUE, lty=3, border='red')
If the vertical scale isn't right, add a ylim argument to the first call to histPlot (e.g. ylim=c(0,0.05)).
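A histogram that collapses into a single bar is typical of heavily right-skewed counts, where a few very large values stretch the axis. One option that is not part of the answer above, so treat it as my assumption about what may help, is to compare the densities of log-transformed counts in base graphics. The data frame d below is simulated, since the real data were not posted:

```r
# Simulated skewed counts standing in for the real data frame d
set.seed(7)
d <- data.frame(
  Control_Raw_TagCount    = round(rlnorm(1000, meanlog = 5, sdlog = 2)),
  ICLIP_EZH2_Raw_TagCount = round(rlnorm(1000, meanlog = 6, sdlog = 2))
)

# log10(count + 1) keeps zero counts finite
log_control <- log10(d$Control_Raw_TagCount + 1)
log_ezh2    <- log10(d$ICLIP_EZH2_Raw_TagCount + 1)

dens_control <- density(log_control)
dens_ezh2    <- density(log_ezh2)

# Shared ylim so neither curve is clipped
plot(dens_control, ylim = range(dens_control$y, dens_ezh2$y),
     main = "Tag count densities", xlab = "log10(raw tag count + 1)")
lines(dens_ezh2, lty = 3, col = "red")
legend("topright", legend = c("Control", "EZH2"),
       lty = c(1, 3), col = c("black", "red"), bty = "n")
```

The same transformed vectors can be passed to hist() or boxplot() if bars or boxes are preferred over density curves.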
Suppose I ran a factor analysis and got 5 relevant factors. Now I want to graphically represent the loadings of these factors on the variables. Can anybody please tell me how to do it? I can do it with 2 factors, but not when there are more than 2.
The 2-factor plot is given in "Modern Applied Statistics with S", Fig 11.13. I want to create a similar graph but with more than 2 factors. Here is a snapshot of the figure mentioned above:
The x and y axes are the 2 factors.
Regards,
Ari
Beware: this is not the answer you are looking for and it might also be incorrect; it is just my subjective opinion.
I think you have run into the problem of sketching several dimensions on a two-dimensional screen or sheet of paper.
I would say there is little sense in plotting the loadings of more factors or PCs, but if you really insist: display the first two (based on eigenvalues) or extract only 2 factors. You could also reduce the dimension by other methods (e.g. MDS).
Displaying the loadings of 3 factors in a 3-dimensional graph would hardly be clear, let alone more factors.
UPDATE: I had a dream about trying to be more ontopic :)
You could easily show projections of each pair of factors, as @joran pointed out (I am not dealing with rotation here):
f <- factanal(mtcars, factors=3)
pairs(f$loadings)
This way you could show even more factors and be able to tweak the plot also, e.g.:
f <- factanal(mtcars, factors=5)
pairs(f$loadings, col=1:ncol(mtcars), upper.panel=NULL, main="Factor loadings")
par(xpd=TRUE)
legend('topright', bty='n', pch='o', col=1:ncol(mtcars), attr(f$loadings, 'dimnames')[[1]], title="Variables")
Of course, you could also add rotation vectors by customizing the lower triangle, or show them in the upper one and attach the legend on the right or below, etc.
Or just plot the variables on a 3D scatterplot if you have no more than 3 factors:
library(scatterplot3d)
f <- factanal(mtcars, factors=3)
scatterplot3d(as.data.frame(unclass(f$loadings)), main="3D factor loadings", color=1:ncol(mtcars), pch=20)
Note: in my humble opinion, variable names should not be put on the plots as labels but might go in a separate legend, especially with 3D plots.
It looks like there's a package for this:
http://factominer.free.fr/advanced-methods/multiple-factor-analysis.html
Comes with sample code, and multiple factors. Load the FactoMineR package and play around.
Good overview here:
http://factominer.free.fr/docs/article_FactoMineR.pdf
Graph from their webpage:
You can also look at the factor analysis object and see if you can't extract the values and plot them manually using ggplot2 or base graphics.
As daroczig mentions, each set of factor loadings gets its own dimension. So plotting in five dimensions is not only difficult, but often inadvisable.
You can, though, use a scatterplot matrix to display each pair of factor loadings. Using the example you cite from Venables & Ripley:
#Reproducing factor analysis from Venables & Ripley
#Note I'm only doing three factors, not five
library(MASS)  # provides eqscplot()
data(ability.cov)
ability.FA <- factanal(covmat = ability.cov, factors = 3, rotation = "promax")
load <- loadings(ability.FA)
rot <- ability.FA$rotmat
#Pairs of factor loadings to plot
ind <- combn(1:3,2)
par(mfrow = c(2,2))
nms <- row.names(load)
#Loop over pairs of factors and draw each plot
for (i in 1:3){
eqscplot(load[,ind[1,i]],load[,ind[2,i]],xlim = c(-1,1),
ylim = c(-0.5,1.5),type = "n",
xlab = paste("Factor",as.character(ind[1,i])),
ylab = paste("Factor",as.character(ind[2,i])))
text(load[,ind[1,i]],load[,ind[2,i]],labels = nms)
arrows(c(0,0),c(0,0),rot[ind[,i],ind[,i]][,1],
rot[ind[,i],ind[,i]][,2],length = 0.1)
}
which for me results in the following plot:
Note that I had to play a little with the x and y limits, as well as various other fiddly bits. Your data will be different and will require different adjustments. Also, plotting each pair of factor loadings with five factors will make for a rather busy collection of scatterplots.
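If the rotation arrows are not needed, the same pairwise idea can be sketched more compactly with pairs() and a custom panel function that writes the variable names at their loadings. This uses mtcars with three factors, as in the other answer, rather than your data, and the names L and factor_pairs are hypothetical:

```r
# Three-factor model on a built-in data set (default varimax rotation)
f <- factanal(mtcars, factors = 3)

# Strip the "loadings" class to get a plain variables-by-factors matrix
L <- unclass(f$loadings)

# Each column of factor_pairs is one pair of factors that gets a panel
factor_pairs <- combn(ncol(L), 2)

pairs(L,
      labels = paste("Factor", seq_len(ncol(L))),
      upper.panel = NULL,                     # draw each pair only once
      panel = function(x, y, ...) {
        text(x, y, labels = rownames(L), cex = 0.7)
      },
      main = "Factor loadings")
```

With k factors there are choose(k, 2) panels, so five factors give ten panels; that is still readable, but it gets crowded quickly.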