This is a follow-up to a previously asked, related question:
data and code are here error message when ploting subjects at risk with survplot
When trying to plot the subjects at risk below the survplot, the table either overlaps with the labels of the x-axis or does not appear on the plot (in the example below one line is missing; totalps=4). How can I solve this issue?
From the documentation of the survplot command, I understand that I may have to reset the margins of the plot with the par command (e.g. par(mar=c(5,4,4,2)+.1)). I don't understand how to include this par command in survplot.
Furthermore, there is considerable space between the lines of the table on the subjects at risk. Is there any direct way to reduce this space?
Here is the code for the plot:
library(rms)
pdf("plot1.pdf")
survplot(KM.Duration.totalps[-1],
xlab="duration in months", ylab="survival prob",
conf="none",
label.curves=list(method="arrow", cex=0.8),
time.inc=12,
col=c(1:4),
levels.only = FALSE,
n.risk=TRUE,
y.n.risk = -0.3, cex.n.risk = 0.6
)
dev.off()
Simply reading the help page:
sep.n.risk
multiple of upper y limit - lower y limit for separating lines of text containing number of subjects at risk. Default is .056*(ylim[2]-ylim[1]).
And some par functions are best used with par just prior to the plot call (but after the pdf() call), so:
pdf(...)
par(mar=c(7,4,4,2)+.1) # adds two lines to default space along bottom margin
survplot(...
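The ordering described above (open the device, call par(), then plot) can be sketched in base R. This is only an illustration under stated assumptions: plot() and mtext() stand in for survplot() and its at-risk table, because the real call needs the rms package and the KM.Duration.totalps object from the question, and the output path is a hypothetical temporary file.

```r
# Sketch only: plot() and mtext() stand in for survplot() and its at-risk table.
out_file <- tempfile(fileext = ".pdf")    # hypothetical output path

pdf(out_file)                             # 1. open the device
par(mar = c(7, 4, 4, 2) + 0.1)            # 2. then enlarge the bottom margin
bottom_lines <- par("mar")[1]             # margin size in text lines (7.1)

plot(1:10, xlab = "duration in months", ylab = "survival prob")   # 3. then plot
# Margin lines 4 to 6 sit below the x-axis title, which is drawn on line 3.
mtext(c("group 1", "group 2", "group 3"), side = 1, line = 4:6,
      adj = 0, cex = 0.6)
dev.off()
```

In the real plot, the survplot() call from the question would replace the plot() and mtext() lines, and the sep.n.risk argument quoted from the help page could be given a value smaller than its default to tighten the rows of the table.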
You set the margins before you plot, like so:
par(mar=c(0,0,0,0))
plot(c(1:10))
will give you a plot with no margins. par(mar=c(1,2,3,4)) will give you a margin of one text line at the bottom, two on the left, three on the top, and four on the right.
If you want to specify the margins in inches, use par(mai=c(x,x,x,x)). The default for R is that an output device is 7 by 7 inches, although depending on the device (including ones I've written), that might be a little fuzzy.
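As a quick check of how the inch-based margins relate to the device size, the sketch below queries par() on a pdf device. The temporary file is a hypothetical output path, and the one-inch margins are chosen only to make the arithmetic easy to follow.

```r
# Sketch: inspect device, margin and plot-region sizes (all in inches).
pdf(tempfile(fileext = ".pdf"))    # pdf() defaults to a 7 x 7 inch page
par(mai = c(1, 1, 1, 1))           # one inch of margin on every side
plot(1:10)

device_in <- par("din")            # device size: 7 x 7
plot_in   <- par("pin")            # plot region: 7 - (1 + 1) = 5 inches each way
margin_ln <- par("mar")            # the same margins expressed in text lines
dev.off()
```

Reading par("mar") after setting par(mai=...) is a convenient way to see what a given size in inches corresponds to in text lines on the current device.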
I have performed a PCA using the prcomp function (from R's stats package, used here alongside the FactoMineR package) on quite a substantial dataset of 3000 x 500.
I have tried plotting the main Principal Components that cover up to 100% of cumulative variance proportion with a fviz_eig plot. However, this is a very large plot due to the large dimensions of the dataset. Is there any way in R to split a plot into multiple plots using a for loop or any other way?
Here is a visual of my plot, which only covers 80% of the variance because the plot is so large. Could I split this plot into two plots?
[Image: Large Dataset Visualisation]
I have tried splitting the plot up using a for loop...
for(i in data[1:20]) {
fviz_eig(data, addlabels = TRUE, ylim = c(0, 30))
}
But this doesn't work.
Edited Reproducible example:
This is only a small reproducible example using an already available dataset in R but I used a similar method for my large dataset. It will show you how the plot actually works.
# Already existing data in R.
install.packages("boot")
library(boot)
library(factoextra) # provides fviz_eig()
data(frets)
frets
dataset_pca <- prcomp(frets)
dataset_pca$x
fviz_eig(dataset_pca, addlabels = TRUE, ylim = c(0, 100))
However, my large dataset has a lot more PCs than this one (possibly 100 or more to cover up to 100% of the cumulative variance proportion), which is why I would like a way to split the single plot into multiple plots for better visualisation.
Update:
I have done what was suggested by @G5W below...
data <- prcomp(data, scale = TRUE, center = TRUE)
POEV = data$sdev^2 / sum(data$sdev^2)
barplot(POEV, ylim=c(0,0.22))
lines(0.7+(0:10)*1.2, POEV, type="b", pch=20)
text(0.7+(0:10)*1.2, POEV, labels = round(100*POEV, 1), pos=3)
barplot(POEV[1:40], ylim=c(0,0.22), main="PCs 1 - 40")
text(0.7+(0:6)*1.2, POEV[1:40], labels = round(100*POEV[1:40], 1),
pos=3)
and I have now got a graph as follows...
[Image: Graph]
But I am finding it difficult getting the labels to appear above each bar. Can someone help or suggest something for this please?
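One likely cause, assuming POEV has more entries than the hard-coded position vectors 0.7+(0:10)*1.2 and 0.7+(0:6)*1.2 provide, is that the x positions no longer match the bars. barplot() returns the midpoint of every bar it draws, so the labels can be placed from that return value instead. A minimal sketch, with the built-in USArrests data standing in for the large dataset:

```r
# Sketch with a built-in dataset standing in for the large one.
pca  <- prcomp(USArrests, scale. = TRUE, center = TRUE)
poev <- pca$sdev^2 / sum(pca$sdev^2)       # proportion of explained variance

# barplot() returns the x coordinate of each bar's midpoint.
mids <- barplot(poev, ylim = c(0, 1.1 * max(poev)),
                names.arg = seq_along(poev), main = "PCs 1 - 4")
text(mids, poev, labels = round(100 * poev, 1), pos = 3)   # one label per bar
```

Because mids always has one entry per bar, the same two lines work unchanged for POEV[1:40] or any other subset.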
I am not 100% sure what you want as your result,
but I am 100% sure that you need to take more control over
what is being plotted, i.e. do more of it yourself.
So let me show an example of doing that. The frets data
that you used has only 4 dimensions so it is hard to illustrate
what to do with more dimensions, so I will instead use the
nuclear data - also available in the boot package. I am going
to start by reproducing the type of graph that you displayed
and then altering it.
library(boot)
data(nuclear)
N_PCA = prcomp(nuclear)
plot(N_PCA)
The basic plot of a prcomp object is similar to the fviz_eig
plot that you displayed but has three main differences. First,
it is showing the actual variances - not the percent of variance
explained. Second, it does not contain the line that connects
the tops of the bars. Third, it does not have the text labels
that tell the heights of the boxes.
Percent of Variance Explained. The return from prcomp contains
the raw information. str(N_PCA) shows that it has the standard
deviations, not the variances - and we want the proportion of total
variation. So we just create that and plot it.
POEV = N_PCA$sdev^2 / sum(N_PCA$sdev^2)
barplot(POEV, ylim=c(0,0.8))
This addresses the first difference from the fviz_eig plot.
Regarding the line, you can easily add that if you feel you need it,
but I recommend against it. What does that line tell you that you
can't already see from the barplot? If you are concerned about too
much clutter obscuring the information, get rid of the line. But
just in case you really want it, you can add the line with
lines(0.7+(0:10)*1.2, POEV, type="b", pch=20)
However, I will leave it out as I just view it as clutter.
Finally, you can add the text with
text(0.7+(0:10)*1.2, POEV, labels = round(100*POEV, 1), pos=3)
This is also somewhat redundant, but particularly if you change
scales (as I am about to do), it could be helpful for making comparisons.
OK, now that we have the substance of your original graph, it is easy
to separate it into several parts. For my data, the first two bars are
big so the rest are hard to see. In fact, PC's 5-11 show up as zero.
Let's separate out the first 4 and then the rest.
barplot(POEV[1:4], ylim=c(0,0.8), main="PC 1-4")
text(0.7+(0:3)*1.2, POEV[1:4], labels = round(100*POEV[1:4], 1),
pos=3)
barplot(POEV[5:11], ylim=c(0,0.0001), main="PC 5-11")
text(0.7+(0:6)*1.2, POEV[5:11], labels = round(100*POEV[5:11], 4),
pos=3, cex=0.8)
Now we can see that even though PC 5 is much smaller than any of 1-4,
it is a good bit bigger than 6-11.
I don't know what you want to show with your data, but if you
can find an appropriate way to group your components, you can
zoom in on whichever PCs you want.
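If no natural grouping suggests itself, the split can also be done mechanically in a loop, which is what the question originally asked for. The sketch below is one possible way, not the only one: mtcars is a stand-in dataset with 11 components, and per_plot is a hypothetical chunk size to adjust.

```r
# Sketch: draw the scree plot in chunks of `per_plot` components.
pca  <- prcomp(mtcars, scale. = TRUE)      # 11 components; stand-in data
poev <- pca$sdev^2 / sum(pca$sdev^2)

per_plot <- 4                              # hypothetical chunk size
chunks   <- split(seq_along(poev), ceiling(seq_along(poev) / per_plot))

for (idx in chunks) {
  top  <- 1.15 * max(poev[idx])            # rescale the y-axis for each chunk
  mids <- barplot(poev[idx], ylim = c(0, top), names.arg = idx,
                  main = sprintf("PC %d-%d", min(idx), max(idx)))
  text(mids, poev[idx], labels = round(100 * poev[idx], 2), pos = 3, cex = 0.8)
}
```

Rescaling the y-axis per chunk keeps the small components readable, at the cost of making the panels not directly comparable by bar height, which is where the text labels help.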
Data Background: I have a large data frame (50,000 values, 10,000 when removing NAs) for a single chromosome. I am trying to plot a fixation index (Y-range: 0-1)(data$'N:S') across chromosomal positions (X-range: 0-250,000,000)(data$'Pos'). I used a program (popoolation 2) to calculate sliding window averages for a window size of 50,000 and a step size of 10,000, resulting in my data. However, in R this is too noisy and it comes out looking like a blob. When I zoom in by changing the x-axis so that the ticks are 500,000 apart, you can see the trends nicely. I think I can fix this on a large chromosomal stage by increasing the area of the x-axis and finding a way to simplify the data.
Currently I have: All my data plotted, simple mean, StandDevs (color coded)
I am trying to figure out two things.
1. Is there a way to physically lengthen the X-axis? I don't want to change the markers on it or what it displays; I want to make the actual length longer. (For example, if I had a graph on a piece of paper that showed an x-axis of 1-10 over a 2" area, I would want to increase the area to 5", not change the defined limits to, say, 1-100. So, not the xlim argument.)
2. Simplify the data in some way. I was thinking the easiest would be a smoothed or rolling mean across the data. When I use rollmean() or smooth(), it separates my data from the x-axis, so it only extends to the 8,000 points, and when I plot it, it doesn't go across the whole chromosomal graph with the rest of my data. Someone mentioned there may be a way to instead randomly sample the data to simplify it?
2B. If I get a trendline to work, can I color code it so that the parts of it that are 1 or 2 standard deviations above the mean are a different color, if I mute my actual background data and remove its color?
R Code
Image 1-Plotting All Positions
plot(data$'Pos',data$'N:S', ylim=c(0,0.5), col=data$Colour)
Image 3-I tried both
lines(smooth(datatest$`N:S`), type="l", col = "blue", lwd = 1)
and
library(zoo) # rollmean() comes from the zoo package
rolling = rollmean(datatest$`N:S`, 9)
lines(rolling, type="b", col = "purple", lwd = 1)
Image 2-Plotting a Nice Subsection-- why I want to extend X-axis
plot(data$'Pos',data$'N:S', ylim=c(0,0.5), xlim=c(163000000,165000000), col=data$Colour)
Notes:
If it matters, my graph has colored points due to color coded regions related to means and Standard Dev.
data$Colour[data$'N:S'>=data_SD1above]="orange"
Also, the only difference between data and datatest was that datatest had NA values removed.
Image 1: All Positions-Messy
Image 2: Zoomed In to see trends
Image 3: All positions with the two attempted trendlines
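On point 2, a probable reason the trendlines stop short is that lines() receives only the smoothed values, so they are drawn against their index (1, 2, 3, ...) rather than against the chromosomal positions. The sketch below keeps the smoothed vector the same length as the positions and passes both to the plotting call; it also colours the trend by a threshold, as asked in 2B. It runs on simulated data, so pos and fst are hypothetical stand-ins for data$Pos and data$`N:S`, and stats::filter() is swapped in for rollmean() to stay within base R (rollmean() with fill = NA should give an equivalent same-length result).

```r
# Sketch on simulated data; `pos` and `fst` are hypothetical stand-ins
# for data$Pos and data$`N:S`.
set.seed(1)
pos <- seq(0, by = 1e4, length.out = 2000)
fst <- pmin(1, pmax(0, 0.1 + 0.05 * sin(pos / 2e6) + rnorm(2000, sd = 0.05)))

k    <- 9                                              # window size in points
roll <- as.numeric(stats::filter(fst, rep(1 / k, k), sides = 2))
# `roll` has the same length as `pos`; the first and last 4 values are NA.

cutoff <- mean(fst) + sd(fst)                          # 1 SD above the mean
n      <- length(pos)
high   <- roll[-1] >= cutoff & !is.na(roll[-1])        # flag per line segment

plot(pos, fst, ylim = c(0, 0.5), col = "grey80", pch = 20)
# Draw the trend segment by segment so each piece can take its own colour.
segments(pos[-n], roll[-n], pos[-1], roll[-1],
         col = ifelse(high, "orange", "blue"), lwd = 2)
```

A larger k gives a smoother line; segments with an NA endpoint are simply not drawn, which is why the trend starts and ends a few points inside the data.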
So it seems that you want to resize the width of the graph for the visualization.
If you use RStudio, there is an output option which changes the width and height of the graph.
If you use the console, you can save your plot with a given width and height, for example:
png("mychromosome.png", width=1000, height=300)
plot(..blah..blah..)
dev.off()
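The same idea works with other devices. As a sketch, assuming a vector format is acceptable, pdf() takes its width and height in inches rather than pixels; the file path and the simulated points below are hypothetical placeholders for the real data.

```r
# Sketch: a wide, short page stretches the x-axis without touching xlim.
wide_file <- tempfile(fileext = ".pdf")    # hypothetical output path
pdf(wide_file, width = 20, height = 4)     # pdf() sizes are in inches
page_in <- par("din")                      # 20 x 4
plot(seq(0, 250e6, length.out = 500), runif(500, 0, 0.5),
     xlab = "position", ylab = "N:S", pch = 20)
dev.off()
```

The axis limits and tick labels stay the same; only the physical length of the axis changes with the page width.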
I hope it will help you.
I am trying to arrange 3 plots together. All 3 plots have the same y axis scale, but the third plot has a longer x axis than the other two. I would like to arrange the first two plots side by side in the first row and then place the third plot on the second row aligned to the right. Ideally I would like the third plot's x values to align with plot 2 for the full extent of plot 2 and then continue on below plot one. I have seen some other postings about using the layout function to reach this general configuration (Arrange plots in a layout which cannot be achieved by 'par(mfrow ='), but I haven't found anything on fine tuning the plots so that the scales match. Below is a crappy picture that should be able to get the general idea across.
I thought you could do this by using par("plt"), which returns the coordinates of the plot region as a fraction of the total figure region, to programmatically calculate how much horizontal space to allocate to the bottom plot. But even when using this method, manual adjustments are necessary. Here's what I've got for now.
First, set the plot margins to be a bit thinner than the default. Also, las=1 rotates the y-axis labels to be horizontal, and xaxs="i" (default is "r") sets automatic x-axis padding to zero. Instead, we'll set the amount of padding we want when we create the plots.
par(mar=c(3,3,0.5,0.5), las=1, xaxs="i")
Some fake data:
dat1=data.frame(x=seq(-5000,-2500,length=100), y=seq(-0.2,0.6,length=100))
dat2=data.frame(x=seq(-6000,-2500,length=100), y=seq(-0.2,0.6,length=100))
Create a layout matrix:
# Coordinates of plot region as a fraction of the total figure region
# Order c(x1, x2, y1, y2)
pdim = par("plt")
# Constant padding value for left and right ends of x-axis
pad = 0.04*diff(range(dat1$x))
# If total width of the two top plots is 2 units, then the width of the
# bottom right plot is:
p3w = diff(pdim[1:2]) * (diff(range(dat2$x)) + 2*pad)/(diff(range(dat1$x)) + 2*pad) +
2*(1-pdim[2]) + pdim[1]
# Create a layout matrix with 200 "slots"
n=200
# Adjustable parameter for fine tuning to get top and bottom plot lined up
nudge=2
# Number of slots needed for the bottom right plot
l = round(p3w/2 * n) - nudge
# Create layout matrix
layout(matrix(c(rep(1:2, each=0.5*n), rep(4:3,c(n - l, l))), nrow=2, byrow=TRUE))
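To see what that layout matrix looks like, here is the same construction with 10 slots instead of 200. The values of n and l are picked by hand for illustration rather than computed from par("plt").

```r
# Sketch: the same construction with 10 slots instead of 200.
n <- 10      # slots per row
l <- 6       # slots given to the bottom-right plot (plot 3)
m <- matrix(c(rep(1:2, each = 0.5 * n), rep(4:3, c(n - l, l))),
            nrow = 2, byrow = TRUE)
print(m)
# Row 1: five slots for plot 1, then five for plot 2.
# Row 2: four slots for plot 4, then six for plot 3.
```

Only three plots are drawn afterwards, so the region numbered 4 stays empty and acts as the blank space at the bottom left.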
Now create the graphs: The two calls to abline are just to show us whether the graphs' x-axes line up. If not, we'll change the nudge parameter and run the code again. Once we've got the layout we want, we can run all the code one final time without the calls to abline.
# Plot first two graphs
with(dat1, plot(x,y, xlim=range(dat1$x) + c(-pad,pad)))
with(dat1, plot(x,y, xlim=range(dat1$x) + c(-pad,pad)))
abline(v=-5000, xpd=TRUE, col="red")
# Lower right plot
plot(dat2, xaxt="n", xlim=range(dat2$x) + c(-pad,pad))
abline(v=-5000, xpd=TRUE, col="blue")
axis(1, at=seq(-6000,-2500,500))
Here's what we get with nudge=2. Note the plots are lined up, but this is also affected by the pixel size of the saved plot (for png files), and I adjusted the size to get the upper and lower plots exactly lined up.
I would have thought that casting all the quantities in ratios that are relative to the plot area (by using par("plt")) would have both ensured that the upper and lower plots lined up and that they would stay lined up regardless of the number of pixels in the final image. But I must be missing something about how base graphics work or perhaps I've messed up a calculation (or both). In any case, I hope this helps you get the plot layout you wanted.
I want to present percentages over a 24h period in 15 min intervals as a bar plot.
When I use barplot(), the labels for those timepoints are more or less randomly chosen by R (depending on how I format the window. I know it's not random, but it's not what I want either). I would rather have them evenly spaced at 1 h intervals (that is every 4th bar).
I have searched extensively on this and know I can add labels later with axis(), but I have not found a way to set which bars are labeled and which are left blank.
So here is an example. Sorry for the long lines:
x<-sample(1:100,96)
Labels<-c("09","09:15","09:30","09:45","10:00","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")
names(x)<-Labels
barplot(x)
I do not think you can force R to show every label if it does not have enough space. But at least if you want to add the labels every 1h, the following code should work:
x<-sample(1:100,96)
Labels<-c("09","09:15","09:30","09:45","10","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")
b=barplot(x,axes = FALSE)
axis(2)
axis(1,at=c(b[seq(1,length(Labels),4)],b[length(b)]+diff(b)[1]),labels = c(Labels[seq(1,length(Labels),4)],"09"))
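A variation on the code above, offered as a sketch: the 96 labels can be generated rather than typed out, and the bar midpoints returned by barplot() can be indexed directly for the hourly ticks. The start date is arbitrary and only the time of day is used; las and cex.axis are optional tweaks to make the labels fit.

```r
# Sketch: build the 96 labels instead of typing them, then label every 4th bar.
set.seed(42)
x <- sample(1:100, 96)

start  <- as.POSIXct("2000-01-01 09:00", tz = "UTC")   # the date is arbitrary
times  <- seq(start, by = "15 min", length.out = 96)
Labels <- format(times, "%H:%M")

b      <- barplot(x, axes = FALSE)      # b holds the bar midpoints
hourly <- seq(1, length(Labels), by = 4)
axis(2)
axis(1, at = b[hourly], labels = Labels[hourly], las = 2, cex.axis = 0.7)
```

axis() may still skip labels that would overlap, so rotating them with las = 2 or shrinking them with cex.axis is often needed on a narrow window.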
I have performed a multidimensional cluster analysis in MATLAB. For each cluster, I have calculated the mean and covariance (assuming conditional independence).
I have chosen two or three dimensions out of my raw data and plotted them in a scatter or scatter3 plot.
Now I would like to add the cluster means and the corresponding standard deviations to the same plot.
In other words, I want to add some data points with error bars to a scatter plot.
This question is almost what I want. But I would be ok with bars instead of boxes and I wonder if in that case there is a built-in way to do it with less effort.
Any suggestions on how to do that?
Once you realize that line segments will probably suffice for your purpose (and may be less ugly than the usual error bars with the whiskers, depending on the number of points), you can do something pretty simple (which applies to probably any plotting package, not just MATLAB).
Just plot a scatter, then write a loop to plot all line-segments you want corresponding to error bars (or do it in the opposite order like I did with error bars first then the scatter plot, depending if you want your dots or your error bars on top).
Here is the simple MATLAB code, along with an example figure showing error bars in two dimensions (sorry for the boring near-linearity):
As you can see, you can plot error bars for each axis in different colors to aid in visualization.
function scatterError(x, y, xe, ye, varargin)
%Brandon Barker 01/20/2014
nD = length(x);
%Make these defaults later:
dotColor = [1 0.3 0.3]; % conservative pink
yeColor = [0, 0.4, 0.8]; % bright navy blue
xeColor = [0.35, 0.35, 0.35]; % not-too-dark grey
dotSize = 23;
figure();
set(gcf, 'Position', get(0,'Screensize')); % Maximize figure.
set(gca, 'FontSize', 23);
hold all;
for i = 1:nD
plot([(x(i) - xe(i)) (x(i) + xe(i))], [y(i) y(i)], 'Color', xeColor);
plot([x(i) x(i)], [(y(i) - ye(i)) (y(i) + ye(i))], 'Color', yeColor);
end
scatter(x, y, dotSize, repmat(dotColor, nD, 1));
set(gca, varargin{:});
axis square;
With some extra work, it wouldn't be too hard to add whiskers to your error bars if you really want them.
If you are not too picky about what the graph looks like and are looking for performance, a builtin function is indeed often a good choice.
My first thought would be to try using a boxplot; it has quite a lot of options, so probably one combination of them will give you the result you need.
Sidenote: At first sight the answer you referred to does not look very inefficient so you may have to manage your expectations when it comes to achievable speedups.