How to create an x-axis break on a density plot in R?

I am trying to do a density plot of a dataset that has a wide range.
data=c(-10,-20,-20,-18,-17,1000,10000, 500, 500, 500, 500000)
plot(density(data))
As you can see in the figure, we cannot see much.
Is there a way to make one or more axis breaks on the x axis to better visualise the distribution of the data? Or is there a way to plot certain ranges of the data in several graphs and then paste them together?
Thanks a lot!

There is a gap.plot() function in the plotrix package, but I think it has some problems (see How to plot “multiple” curves with a break through y-data-range in R?). I recommend drawing two plots side by side instead, each covering an x range of the same width.
## use small inner margins and relatively large outer margins (to write the labels)
old.par <- par(mfrow = c(1, 2), mar = rep(0.5, 4), oma = c(4, 4, 1, 1))
## left panel: x from -1000 to 29000 (width 30000)
plot(density(data), xlim = c(-1000, 29000), main = "", bty = "c")
abline(v = par("usr")[2], lty = 2)  # dashed line marking the break
## right panel: same width (30000) as the left one, so the scales are not misleading
plot(density(data), xlim = c(471000, 501000), main = "", yaxt = "n", bty = "]")
abline(v = par("usr")[1], lty = 2)  # dashed line at the left edge of this panel
par(old.par)
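The two-panel idea extends to any number of windows. The sketch below wraps it in a small helper; the name plot_density_breaks() is hypothetical (it is not from plotrix or base R), and it assumes you want every panel to share one density estimate and one y scale. Extra arguments are passed to density(): since its default grid of 512 points is spread over the whole data range (about 500000 wide here), passing a larger n gives a smoother curve inside each narrow window.

```r
# Hypothetical helper (not part of any package): draw one density estimate
# across several x windows placed side by side, with dashed lines at the breaks.
plot_density_breaks <- function(x, windows, ...) {
  d <- density(x, ...)              # one estimate, so every panel is consistent
  ylim <- c(0, max(d$y))            # shared y scale for all panels
  old.par <- par(mfrow = c(1, length(windows)), mar = rep(0.5, 4),
                 oma = c(4, 4, 1, 1))
  on.exit(par(old.par))             # restore the graphics settings afterwards
  for (i in seq_along(windows)) {
    plot(d, xlim = windows[[i]], ylim = ylim, main = "",
         yaxt = if (i == 1) "s" else "n")   # y axis only on the first panel
    if (i > 1) abline(v = par("usr")[1], lty = 2)                # left break
    if (i < length(windows)) abline(v = par("usr")[2], lty = 2)  # right break
  }
  invisible(d)
}

data <- c(-10, -20, -20, -18, -17, 1000, 10000, 500, 500, 500, 500000)
# Two windows of equal width (30000), as in the answer above; n goes to density()
d <- plot_density_breaks(data, list(c(-1000, 29000), c(471000, 501000)), n = 2^14)
```

Keeping the windows the same width preserves the point made in the answer: equal x spans keep the panels visually comparable.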

Related

Plotting in R using plot function

I am trying to plot a few graphs using loops. Here are the details.
First, I have a function that calculates the y variable (called effect, for the vertical axis):
effect<- function (x, y){
exp(-0.35*log(x)
+0.17*log(y)
-0.36*sqrt(log(x)*log(y)/100))
}
Now I run the following code and use par(new = TRUE) to draw all the lines in the same graph. I use axes = FALSE and xlab = "" to get a plot without labels, so that the labels are not redrawn on every pass through the loop, which looks ugly.
for (lev in seq(exp(8), exp(10), length.out = 5)) {
  x <- seq(exp(1), exp(10), length.out = 20)
  prc <- effect(lev, x)
  plot(x, prc, xlim = c(0, max(x) * 1.05), ylim = c(0.0, 0.3),
       type = "o", xlab = "", ylab = "", pch = 16,
       col = "dark blue", lwd = 2, cex = 1, axes = FALSE)
  label <- as.integer(lev)  # the level, written next to its curve
  text(max(x) * 1.03, max(prc), label)
  par(new = TRUE)  # draw the next curve on top of this one
}
Finally, I repeat the plot command, this time with the xlab and ylab options:
plot(x, prc, xlab = "X-label", ylab = "effect",
xlim = c(0,max(x)*1.05), ylim = c(0,0.3),
type="l", col ='blue')
I have several other plots along similar lines, using more complex equations. I have two questions:
Is there a better option to get the same plot with smoother lines?
Is there an easier way, with fewer lines of code, to achieve the same result, placing the text (level) for each line on the right with a white background behind it?
I found working with the plot function tedious and time-consuming, so in the end I used ggplot2. There was plenty of help available online, which I used.
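For readers who want to stay in base R, here is a sketch addressing both questions. It is not the asker's solution (they switched to ggplot2). Smoother lines come from evaluating effect() on a finer grid (200 points instead of 20) and drawing with type = "l"; matplot() draws all curves in one call, so par(new = TRUE) is no longer needed. Base text() has no background option, so the sketch assumes a white rect() sized with strwidth()/strheight() is an acceptable substitute.

```r
effect <- function(x, y) {
  exp(-0.35 * log(x) + 0.17 * log(y) - 0.36 * sqrt(log(x) * log(y) / 100))
}
levs <- seq(exp(8), exp(10), length.out = 5)
x <- seq(exp(1), exp(10), length.out = 200)    # 200 points instead of 20: smoother curves
prc <- sapply(levs, function(l) effect(l, x))  # 200 x 5 matrix, one column per level

# One call draws every curve together with labelled axes
matplot(x, prc, type = "l", lty = 1, lwd = 2, col = "darkblue",
        xlim = c(0, max(x) * 1.1), ylim = c(0, 0.3),
        xlab = "X-label", ylab = "effect")

# Level labels at the right-hand end of each curve, on a white box
labs <- as.character(as.integer(levs))
lx <- rep(max(x) * 1.05, length(labs))
ly <- prc[nrow(prc), ]       # y value at the last x for each curve
w <- strwidth(labs)          # label sizes in user coordinates
h <- strheight(labs)
rect(lx - 0.6 * w, ly - 0.8 * h, lx + 0.6 * w, ly + 0.8 * h,
     col = "white", border = NA)
text(lx, ly, labs)
```

If two curves end very close together, their labels can still overlap; nudging ly by hand is the simplest fix in base graphics.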

Plot a piecewise regression in two steps, in the same plot

I'm trying to plot two curves in the same graph, but it doesn't work. I want to plot the function f(x) = 3x + 2 if x<=3 and f(x) = 2x-0.5x^2 if x>3 on the interval [0,6]. I thought I had to do
curve(3*x+2, 0,3)
and
curve(2*x-0.5*x^2,3,6, add = TRUE)
What could I do to plot such function?
Use xlim and ylim in the first curve to set the limits of the plot.
curve(3*x+2, 0,3, xlim = c(0, 6), ylim = c(-5, 12))
curve(2*x-0.5*x^2,3,6, add = TRUE)
As the second curve still gets cut off a little (it reaches -6 at x = 6), you might want to use c(-7, 12) for the y limits.
Another option, if you want the two pieces connected, and one that removes the need to set limits manually, is to encode both functions in one with ifelse:
curve(ifelse(x <= 3, 3 * x + 2, 2 * x - 0.5 * x^2), 0, 6, ylab = "f(x)")
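Alternatively, you can store the points that curve() computes and then draw both pieces together. curve() still draws each piece as it is called, but the final plot() call uses limits that cover both:
c1 <- curve(3*x+2, 0, 3)
c2 <- curve(2*x-0.5*x^2, 3, 6)
plot(c1, type = "l", xlim = c(0, 6), ylim = range(c1$y, c2$y))
lines(c2, col = "red")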
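A sketch combining the two ideas above, assuming you want the jump at x = 3 shown as a genuine break (the ifelse() version joins 11 and 1.5 with a vertical line) and you do not want to pick limits by hand: compute both pieces first, open an empty plot whose limits are the combined ranges, then add each piece with lines().

```r
# Both pieces of f(x) from the question
x1 <- seq(0, 3, length.out = 100)
x2 <- seq(3, 6, length.out = 100)
y1 <- 3 * x1 + 2              # f(x) for x <= 3
y2 <- 2 * x2 - 0.5 * x2^2     # f(x) for x > 3

# Empty plot whose limits cover both pieces, then add each piece separately
plot(NA, xlim = range(x1, x2), ylim = range(y1, y2), xlab = "x", ylab = "f(x)")
lines(x1, y1)
lines(x2, y2)
```

Because the y limits come from range(y1, y2), they become c(-6, 11) automatically and neither piece is cut off.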

Rescale axis in grouped scatter plots with use of axes breaks

Some of the graphs I'm drawing in R are of this type:
http://graphpad.com/faq/images/1352-1(1).gif
Their data have outliers that are far out of range, and I can't just exclude them. I tried the axis.break() function from plotrix, but it doesn't rescale the y axis; it just places a break mark on the axis. The purpose is to show the medians for both groups, the data points, and the outliers all in one plot frame. At the moment the points that lie far from the majority take up a large chunk of space, and the majority of points are squashed together, so the differences between them are hard to see. Here is the code:
https://gist.github.com/9bfb05dcecac3ecb7491
Any suggestions would be helpful.
Thanks
Unfortunately the code you link to isn't self-contained, but possibly the code you have for gap.plot() there doesn't work as you expect because you are setting ylim to cover the full data range rather than the plotted sections only. Consider the following plot:
As you can see, the y axis has tick marks every 50 pg/ml, but there is a gap between 175 and 425. So the data range (to the nearest 50) is c(0, 500), but the range of the y axis is c(0, 250); the tick marks at 200 and 250 are simply labelled as 450 and 500.
This plot was produced using the following modified version of your code:
## made up data
GRO.Controls <- c(25, 40:50, 60, 150)
GRO.Breast <- c(70, 80:90, 110, 500)
##Scatter plot for both groups
library(plotrix)
gap.plot(jitter(rep(0,length(GRO.Controls)),amount = 0.2), GRO.Controls,
gap = c(175,425), xtics = -2, # no xtics visible
ytics = seq(0, 500, by = 50),
xlim = c(-0.5, 1.5), ylim = c(0, 250),
xlab = "", ylab = "Concentrations (pg/ml)", main = "GRO(P=0.0010)")
gap.plot(jitter(rep(1,length(GRO.Breast)),amount = 0.2), GRO.Breast,
gap = c(175, 425), col = "blue", add = TRUE)
##Adds x- variable (groups) labels
mtext("Controls", side = 1, at= 0.0)
mtext("Breast Cancer", side = 1, at= 1.0)
##Adds median lines for each group
segments(-0.25, median(GRO.Controls), 0.25, median(GRO.Controls), lwd = 2.0)
segments(0.75, median(GRO.Breast), 1.25, median(GRO.Breast), lwd = 2.0,
col = "blue")
You could use gap.plot(), which is easy to find by following the link on the axis.break help page. There is a worked example there.
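The same arithmetic can be done by hand in base R, which makes the link between ylim and the gap explicit. This is a sketch, not a copy of what gap.plot() does internally: gap_transform() is a hypothetical helper implementing the rule described above (values above the gap are shifted down by its width, so 450 and 500 land at 200 and 250, and ylim = c(0, 250) suffices). The dotted line is a stand-in; plotrix::axis.break() could draw the usual break symbol instead.

```r
# Hypothetical helper: map data values onto a y axis with a gap removed.
# Values above the gap are shifted down by the gap width; values inside it become NA.
gap_transform <- function(y, gap) {
  ifelse(y > gap[1] & y < gap[2], NA, ifelse(y >= gap[2], y - diff(gap), y))
}

gap <- c(175, 425)
GRO.Controls <- c(25, 40:50, 60, 150)   # made-up data from the answer above
GRO.Breast <- c(70, 80:90, 110, 500)
grp <- rep(0:1, c(length(GRO.Controls), length(GRO.Breast)))

plot(jitter(grp, amount = 0.2), gap_transform(c(GRO.Controls, GRO.Breast), gap),
     col = c("black", "blue")[grp + 1], xlim = c(-0.5, 1.5), ylim = c(0, 250),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "Concentrations (pg/ml)")

# Label ticks with the original values, skipping those that fall inside the gap
ticks <- seq(0, 500, by = 50)
ticks <- ticks[ticks <= gap[1] | ticks >= gap[2]]
axis(2, at = gap_transform(ticks, gap), labels = ticks)
axis(1, at = 0:1, labels = c("Controls", "Breast Cancer"))
abline(h = gap[1], lty = 3)   # both 175 and 425 map to this height
```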

Plotting multiple curves same graph and same scale

This is a follow-up to this question.
I want to plot multiple curves on the same graph so that the new curves use the same y-axis scale as the first curve.
Notice the following example:
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
# first plot
plot(x, y1)
# second plot
par(new = TRUE)
plot(x, y2, axes = FALSE, xlab = "", ylab = "")
That actually plots both sets of values on the same coordinates of the graph (because I'm hiding the new y-axis that would be created with the second plot).
My question then is how to maintain the same y-axis scale when plotting the second graph.
(The typical method is to call plot just once to set up the limits, covering the combined range of all series, and then to use points and lines to add the separate series.) To call plot multiple times with par(new = TRUE), you need to make sure that your first plot has a ylim that accommodates all series (in other situations you may need the same strategy for xlim):
# first plot
plot(x, y1, ylim=range(c(y1,y2)))
# second plot: must use the same ylim
par(new = TRUE)
plot(x, y2, ylim=range(c(y1,y2)), axes = FALSE, xlab = "", ylab = "")
The following code does the task more compactly. By default matplot uses the column numbers as point symbols; the second call, with pch = 1, gives you typical R open-circle points:
matplot(x, cbind(y1,y2))
matplot(x, cbind(y1,y2), pch=1)
points or lines come in handy if y2 is generated later, or if the new data do not share the same x but should still go into the same coordinate system.
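The "typical method" mentioned above, calling plot once and then adding series, looks like this in a minimal sketch. It assumes you are happy with an empty frame (type = "n") whose limits come from the combined ranges; every later points() or lines() call then shares the same scale automatically.

```r
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)

# Set up the coordinate system once, covering every series ...
plot(range(x), range(y1, y2), type = "n", xlab = "x", ylab = "y")
# ... then add each series without redrawing the axes
points(x, y1, pch = 16)
lines(x, y2, col = "red")
```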
As your ys share the same x, you can also use matplot:
matplot (x, cbind (y1, y2), pch = 19)
(Without pch, matplot plots the column numbers of the y matrix instead of dots.)
You aren't very clear about what you want here, and I think DWin's answer is technically correct given your example code. I think what you really want is this:
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
# first plot
plot(x, y1,ylim = range(c(y1,y2)))
# Add points
points(x, y2)
DWin's solution works under the implicit assumption (based on your example code) that you wanted the second set of points overlaid on the original scale. That's why his image looks like the points are plotted at 1, 101, etc. Calling plot a second time isn't what you want; you want to add to the existing plot using points. So the above code on my machine produces this:
But DWin's main point about using ylim is correct.
My solution is to use ggplot2, which takes care of these things automatically. The main thing is to arrange the data in long format, with one column saying which series each value belongs to:
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
df <- data.frame(x=rep(x,2), y=c(y1, y2), class=c(rep("y1", 5), rep("y2", 5)))
Then use ggplot2 to plot it
library(ggplot2)
ggplot(df, aes(x=x, y=y, color=class)) + geom_point()
This is saying plot the data in df, and separate the points by class.
The plot generated is
I'm not sure what you want, but I'll use lattice.
library(lattice)
x = rep(x,2)
y = c(y1,y2)
fac.data = as.factor(rep(1:2,each=5))
df = data.frame(x=x,y=y,z=fac.data)
# this creates a data frame with a factor variable, z, that tells me which data I have (y1 or y2)
Then, just plot
xyplot(y ~x|z, df)
# or maybe
xyplot(x ~y|z, df)
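Conditioning with |z draws each series in its own panel. If the goal is instead both series in one panel on a shared y scale (closer to the original question), lattice's groups argument does that. A sketch, rebuilding the same long data frame so it runs on its own:

```r
library(lattice)
y1 <- c(100, 200, 300, 400, 500)
y2 <- c(1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)
df <- data.frame(x = rep(x, 2), y = c(y1, y2), z = factor(rep(1:2, each = 5)))

# groups = z overlays both series in a single panel with one y scale;
# auto.key = TRUE adds a legend for the groups
p <- xyplot(y ~ x, data = df, groups = z, auto.key = TRUE)
print(p)
```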

Lattice: multiple plots in one window?

I'm trying to put multiple lattice plots in one window using levelplot by setting par(mfrow=c(2,1)) but it seems to be ignoring this.
Is there a particular function for setting multiple plots in lattice?
The lattice package is built on the grid package and loads the grid namespace when lattice is loaded, but it does not attach it. So to use the grid.layout function, you need to attach grid explicitly with library(grid). Another alternative, probably easier, is the grid.arrange function in the gridExtra package:
install.packages("gridExtra")
require(gridExtra) # also loads grid
require(lattice)
x <- seq(pi/4, 5 * pi, length.out = 100)
y <- seq(pi/4, 5 * pi, length.out = 100)
r <- as.vector(sqrt(outer(x^2, y^2, "+")))
grid <- expand.grid(x=x, y=y)
grid$z <- cos(r^2) * exp(-r/(pi^3))
plot1 <- levelplot(z~x*y, grid, cuts = 50, scales=list(log="e"), xlab="",
ylab="", main="Weird Function", sub="with log scales",
colorkey = FALSE, region = TRUE)
plot2 <- levelplot(z~x*y, grid, cuts = 50, scales=list(log="e"), xlab="",
ylab="", main="Weird Function", sub="with log scales",
colorkey = FALSE, region = TRUE)
grid.arrange(plot1,plot2, ncol=2)
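For completeness, here is a sketch of the grid.layout() route mentioned above, without gridExtra. It assumes the built-in cars data as stand-in plots; the key pieces are a grid layout viewport, one child viewport per cell, and print(..., newpage = FALSE) so each lattice plot draws into the current viewport instead of starting a new page.

```r
library(grid)
library(lattice)
p1 <- xyplot(dist ~ speed, data = cars)
p2 <- histogram(~ speed, data = cars)

grid.newpage()
# A 1 x 2 layout: one cell per lattice plot
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))
plots <- list(p1, p2)
for (i in seq_along(plots)) {
  pushViewport(viewport(layout.pos.row = 1, layout.pos.col = i))
  print(plots[[i]], newpage = FALSE)  # draw into the current cell
  upViewport()                        # back to the layout viewport
}
```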
Lattice draws with grid rather than base graphics, so it ignores base par() settings such as mfrow; I just avoid par() when plotting with lattice.
To place multiple lattice plots on a single page, create (but don't plot) the lattice/trellis plot objects, then call print once for each plot. For each print call, pass in (i) the plot; (ii) more = TRUE, on every call except the last one (with two plots, only on the first); and (iii) position, which gives the location of the plot on the page as a vector of four numbers, c(xmin, ymin, xmax, ymax): the x-y coordinates of the plot's lower left-hand corner followed by those of its upper right-hand corner, on a 0-1 scale.
Much easier to show than to tell:
library(lattice)
data(AirPassengers) # a dataset supplied with base R
AP = AirPassengers # re-bind to save some typing
# split the AP data set into two pieces
# so that we have unique data for each of the two plots
w1 = window(AP, start=c(1949, 1), end=c(1952, 1))
w2 = window(AP, start=c(1952, 1), end=c(1960, 12))
px1 = xyplot(w1)
px2 = xyplot(w2)
# arrange the two plots vertically
print(px1, position=c(0, .6, 1, 1), more=TRUE)
print(px2, position=c(0, 0, 1, .4))
This is simple to do once you read ?print.trellis. Of particular interest is the split parameter. It may seem complicated at first sight, but it's quite straightforward once you understand what it means. From the documentation:
split: a vector of 4 integers, c(x,y,nx,ny), that says to position the current plot at the x,y position in a regular array of nx by ny plots. (Note: this has origin at top left)
You can see a couple of implementations by running example(print.trellis), but here's one that I prefer:
library(lattice)
# Data
w <- as.matrix(dist(Loblolly))
x <- as.matrix(dist(HairEyeColor))
y <- as.matrix(dist(rock))
z <- as.matrix(dist(women))
# Plot assignments
pw <- levelplot(w, scales = list(draw = FALSE)) # "scales..." removes axes
px <- levelplot(x, scales = list(draw = FALSE))
py <- levelplot(y, scales = list(draw = FALSE))
pz <- levelplot(z, scales = list(draw = FALSE))
# Plot prints
print(pw, split = c(1, 1, 2, 2), more = TRUE)
print(px, split = c(2, 1, 2, 2), more = TRUE)
print(py, split = c(1, 2, 2, 2), more = TRUE)
print(pz, split = c(2, 2, 2, 2), more = FALSE) # more = FALSE is redundant
The code above gives you this figure:
As you can see, split takes four values. The last two give the size of the grid (similar to what mfrow does), and the first two position your plot within that nx by ny grid.
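The split values in the answer above follow a simple pattern, so they can be generated in a loop. A sketch with a hypothetical helper, print_grid() (not part of lattice), that fills the grid row by row from the top-left corner, matching split's top-left origin, and passes more = TRUE on every call except the last. Stand-in plots from the built-in women and cars data are used; the four levelplots from the answer would work the same way.

```r
library(lattice)
# Hypothetical helper: print a list of trellis objects on one page, filling an
# nx-by-ny grid row by row from the top-left corner (split's origin is top left).
print_grid <- function(plots, nx) {
  ny <- ceiling(length(plots) / nx)
  splits <- vector("list", length(plots))
  for (i in seq_along(plots)) {
    col <- (i - 1) %% nx + 1    # column within the row
    row <- (i - 1) %/% nx + 1   # row, counted from the top
    splits[[i]] <- c(col, row, nx, ny)
    # more = TRUE keeps the page open for every plot except the last
    print(plots[[i]], split = splits[[i]], more = i < length(plots))
  }
  invisible(splits)
}

plots <- list(xyplot(weight ~ height, data = women),
              xyplot(dist ~ speed, data = cars),
              histogram(~ height, data = women),
              histogram(~ speed, data = cars))
s <- print_grid(plots, nx = 2)
```

With four plots and nx = 2, the generated split vectors are the same four used in the answer: c(1, 1, 2, 2), c(2, 1, 2, 2), c(1, 2, 2, 2) and c(2, 2, 2, 2).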
