fake data
set.seed( 123)
x<-rnorm(1000, mean=60,sd=20)
y <- exp(-10 + .95*log(x^3)) + rnorm(1000,mean=1,sd=1)
df <- data.frame(x,y)
cls.x <- quantile(df$x, seq(.1, .9, by=.1))
df$x.class <- findInterval(df$x, cls.x)
df$x.class <- as.factor(df$x.class)
head(df)
Neither of the following works:
plot(df$x,df$y,col=3)
par(new=T)
boxplot(y~x.class, data=df,xlab="",ylab="",xaxt="n")
nor this
boxplot(y~x.class, data=df,xlab="",ylab="",xaxt="n")
points(df$x,df$y,col=3)
Using ggplot, the closest I get is using something like
library(ggplot2)
ggplot(df,aes(x.class,y))+geom_boxplot() + geom_point()
Unfortunately, it does not show the real variability along the x-axis.
I tried the jitter option, but I was not able to force the plot to use the real variability of the x variable.
Any suggestion would be very much appreciated.
Ps: I am aware of the bplot.xy() function in Rlab, however, that function does not allow me to change colours of the boxplot, or plot the dots first.
library(Rlab)
bplot.xy( x,y, N=10)
points( x,y, pch=".", col=3, cex=3)
Your df$x varies from 3 to 124, whereas your x-axis is from 1 to 10. In base graphics, you can do this:
plot(jitter(as.integer(df$x.class)), df$y, col=3, type="p", xlab = "", ylab = "", xaxt = "n")
boxplot(y~x.class, data=df,xlab="",ylab="",xaxt="n", add = TRUE)
I added jitter to help break out the distribution. You might also try pch=16 to make the dots solid, and perhaps use transparency (e.g., col="#aaffaa22" for the dots).
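If the goal is to keep the real x scale rather than the class index, one possible variation is to draw the scatter plot first and then place each box at the median x of its class through the at= argument of boxplot(). This is only a sketch: the data below are illustrative stand-ins (not the df from the question), and the names cls and centres as well as the boxwex value are assumptions chosen for the example.

```r
# Illustrative data; the variable names here are assumptions, not the original df.
set.seed(1)
x <- runif(500, min = 5, max = 120)
y <- 0.05 * x + rnorm(500)

# Decile classes of x, as in the question (findInterval gives classes 0..9).
breaks <- quantile(x, seq(0.1, 0.9, by = 0.1))
cls <- findInterval(x, breaks)

# One x position per class: the median x of the points in that class.
centres <- as.numeric(tapply(x, cls, median))

# Dots first (solid, semi-transparent), then boxes on the same x scale.
plot(x, y, pch = 16, col = "#00AA0044")
boxplot(y ~ cls, at = centres, add = TRUE, boxwex = 5, axes = FALSE)
```

Because the boxes are added to an existing plot, the x-axis keeps the units of x, and boxwex is then a box width in those same units, so it may need tuning for other data.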
Is the following what you want?
library(ggplot2)
ggplot(df, aes(x, y, fill = x.class)) +
geom_point(alpha = 0.10) +
geom_boxplot(alpha = 0.50)
I am trying to add vertical lines to a time series plot I made in base R plot(data1,type = 'l',lwd = 1.5, family = "A", ylab ="", xlab = "", main = ""). This plot has a total of 5 plots inside of it. There are two x-axes that are the same (see current plot)
When adding vlines with abline(v=c(27,87, 167, 220, 280, 329), lty=2) I get this result
Is there a way to get the lines drawn inside each of the plots, so it looks something like this but with dashed lines?
Or if you know of a better way to plot this in ggplot that would be fantastic as well. Thank you so much in advance.
Here is a toy example of using ggplot to put in vlines.
library(tidyverse)
iris2 <- iris %>% pivot_longer(cols=Sepal.Length:Petal.Length)
ggplot(iris2, aes(x = Petal.Width, y = value)) +
geom_line() +
facet_wrap(~name, scales="free_y", ncol=2) +
geom_vline(xintercept=c(.25, .75, 1.25, 1.75, 2.25),
linetype='dashed', col = 'blue')
For a ts object, plot calls plot.ts. This has a panel= argument that takes a function defining what is drawn in each panel. Since you want a grid as well as the usual lines, you can do this quite easily:
panel <- function(...) {grid(col=1, ny=NA, lty=1); lines(...)}
plot(z, panel=panel)
grid usually uses the axTicks and you can define number of cells, see ?grid.
This also works with abline.
panel <- function(...) {abline(v=seq(1961, 1969, 2), col=1,lty=1); lines(...)}
plot(z, panel=panel)
Data:
set.seed(42)
z <- ts(matrix(rnorm(500), 100, 5), start = c(1961, 1), frequency = 12)
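Since the question asked for dashed lines, here is a small sketch of the same panel= idea with lty = 2. The positions in marks and the name dashed_panel are made up for illustration; plot.ts passes each series to the panel function as its first argument, which is why the function takes x explicitly.

```r
set.seed(42)
z <- ts(matrix(rnorm(500), 100, 5), start = c(1961, 1), frequency = 12)

# Hypothetical positions for the vertical lines, in the time units of z.
marks <- c(1962.5, 1964, 1966.25)

# Panel function: dashed vertical lines first, then the series on top.
dashed_panel <- function(x, ...) {
  abline(v = marks, lty = 2, col = "grey40")
  lines(x, ...)
}

plot(z, panel = dashed_panel, main = "")
```

Drawing the lines inside the panel function keeps them clipped to each panel, unlike a single abline() call issued after the whole figure has been drawn.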
I'm trying to create a scatterplot with marginal histograms as in this question.
My data are two (numeric) variables which share seven discrete (somewhat) logarithmically-spaced levels.
I've successfully done this with the help of ggMarginal in the ggExtra package, however I'm not happy with the outcome as when plotting the marginal histograms using the same data as for the scatterplots, things don't line up.
As can be seen below, the histogram bars are biased a little to the right or left of the datapoints themselves.
library(ggExtra)
library(ggplot2)
x <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,7,12,18,12,7,3))
y <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,1,13,28,13,1,3))
d <- data.frame("x" = x,"y" = y)
p1 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(), type = "histogram")
A possible solution for this may be change the variables used in the histograms into factors, so they are nicely aligned with the scatterplot axes.
This works well when creating histograms using ggplot:
p2 <- ggplot(data.frame(lapply(d, as.factor)), aes(x = x)) + geom_bar()
However, when I try to do this using ggMarginal, I do not get the desired result - it appears that the ggMarginal histogram is still treating my variables as numeric.
p3 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(),
x = as.factor(x), y = as.factor(y), type = "histogram")
How can I ensure my histogram bars are centred over the data points?
I'm absolutely willing to accept an answer which does not involve use of ggMarginal.
I am not sure whether it is a good idea to repeat here the answer I gave to the question you mentioned, but I do not yet have the reputation to comment, so please let me know if it is not.
I found a package, ggpubr, that seems to work very well for this problem and offers several ways to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, it mentions in relation with ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
library(ggpubr)
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
If you are willing to give base plotting a try, here is a function:
scatterWithHists <- function(x, y, histCols=c("lightblue","lightblue"), lhist=20, xlim=range(x), ylim=range(y), ...){
## set up layout and graphical parameters
layMat <- matrix(c(1,4,3,2), ncol=2)
layout(layMat, widths=c(5/7, 2/7), heights=c(2/7, 5/7))
ospc <- 0.5 # outer space
pext <- 4 # par extension down and to the left
bspc <- 1 # space between scatter plot and bar plots
par. <- par(mar=c(pext, pext, bspc, bspc), oma=rep(ospc, 4)) # plot parameters
## barplot and line for x (top)
xhist <- hist(x, breaks=seq(xlim[1], xlim[2], length.out=lhist), plot=FALSE)
par(mar=c(0, pext, 0, 0))
barplot(xhist$density, axes=FALSE, ylim=c(0, max(xhist$density)), space=0, col=histCols[1])
## barplot and line for y (right)
yhist <- hist(y, breaks=seq(ylim[1], ylim[2], length.out=lhist), plot=FALSE)
par(mar=c(pext, 0, 0, 0))
barplot(yhist$density, axes=FALSE, xlim=c(0, max(yhist$density)), space=0, col=histCols[2], horiz=TRUE)
## overlap
dx <- density(x)
dy <- density(y)
par(mar=c(0, 0, 0, 0))
plot(dx, col=histCols[1], xlim=range(c(dx$x, dy$x)), ylim=range(c(dx$y, dy$y)),
lwd=4, type="l", main="", xlab="", ylab="", yaxt="n", xaxt="n", bty="n"
)
points(dy, col=histCols[2], type="l", lwd=3)
## scatter plot
par(mar=c(pext, pext, 0, 0))
plot(x, y, xlim=xlim, ylim=ylim, ...)
}
Just do:
scatterWithHists(x,y, histCols=c("lightblue","orange"))
And you get:
If you absolutely want to use ggMarginal, then look up its xparams and yparams arguments. The documentation says you can use them to send additional arguments to the x-margin and y-margin plots. I was only successful in sending trivial things like color, but maybe sending something like xlim would help.
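On the alignment problem itself, one package-free way to think about it is that a bar is centred over a data level when the bin edges sit halfway between neighbouring levels. The sketch below applies that idea to the x variable from the question using base R only; lev, mid and breaks are names introduced here for illustration.

```r
# x as defined in the question: seven log-spaced levels with repeats.
x <- rep(log10(c(1, 2, 3, 4, 5, 6, 7)), times = c(3, 7, 12, 18, 12, 7, 3))

lev <- sort(unique(x))                        # the seven distinct levels
mid <- (head(lev, -1) + tail(lev, -1)) / 2    # midpoints between neighbours

# Extend by half a gap at both ends so every level lies strictly inside a bin.
breaks <- c(lev[1] - (mid[1] - lev[1]),
            mid,
            lev[7] + (lev[7] - mid[6]))

h <- hist(x, breaks = breaks, plot = FALSE)

# Draw counts as rectangles (bins have unequal widths) and mark the levels.
plot(range(breaks), c(0, max(h$counts)), type = "n", xlab = "x", ylab = "count")
rect(head(breaks, -1), 0, tail(breaks, -1), h$counts, col = "lightblue")
points(lev, rep(0, length(lev)), pch = 16)
```

The same breaks vector could be reused for the marginal bars of a layout-based scatter plot, so that each bar lines up with its column of points.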
I have two Poisson processes:
n <- 100
x <- seq(0, 10, length = 1000)
y1 <- cumsum(rpois(1000, 1 / n))
y2 <- -cumsum(rpois(1000, 1 / n))
I would like to plot them in one plot and expect that y1 lies above x-axis and y2 lies below x-axis. I tried the following code:
plot(x, y1)
par(new = TRUE)
plot(x, y2, col = "red",
axes = FALSE,
xlab = '', ylab = '',
xlim = c(0, 10), ylim = c(min(y2), max(y1)))
but it did not work. Can someone please tell me how to fix this? (I am working with R for my code)
Many thanks in advance
How about
plot(x,y1, ylim=range(y1,y2), type="l")
lines(x, y2, col="red")
I would suggest trying to avoid multiple calls to plot with par(new=TRUE). That is usually very messy. Here we use lines() to add to an existing plot. The only catch is that the x and y limits won't change based on the new data, so we use ylim in the first plot() call to set a range appropriate for all the data.
Or, if you don't want to worry about limits (as MrFlick mentioned) or the number of lines, you could also tidy up your data using melt and plot it with ggplot:
df <- data.frame(x, y1, y2)
library(reshape2)
library(ggplot2)
mdf <- melt(df, "x")
ggplot(mdf, aes(x, value, color = variable)) +
geom_line()
I have a for loop that creates a set of beta densities and looks something like this in my script.
x = seq(0,1,0.01)
alpha <- c(1,3,5,5,1,1,2,5,5,2)
beta <- c(2,4,15,5,1,1,2,5,5,5)
color <- c("blue","green","pink","gray","brown","yellow","purple","skyblue","plum","tan")
plot(x,dbeta(x, alpha[1], beta[1]) / sum(x), type="l", col= color[1], xlab="x-axis", ylab="y-axis")
for(i in 2:10){
lines(x,dbeta(x, alpha[i], beta[i]), type="l", col= color[i], pch="i")
}
I now want to create a legend at the bottom of the plot containing the color of each line and the corresponding values from the alpha/beta vectors. How do I achieve this? All my attempts have failed so far...
Not base graphics, but ggplot was made for this.
params = data.frame(alpha,beta)
gg <- do.call(rbind,lapply(1:nrow(params),function(i)
cbind(i,x,y=dbeta(x,params[i,]$alpha,params[i,]$beta))))
gg <- data.frame(gg)
library(ggplot2)
ggplot(gg,aes(x,y,color=factor(i))) +
geom_line(size=1)+
scale_color_manual("Alpha,Beta",
values=color,
labels=paste0("(",params$alpha,", ",params$beta,")"))+
theme(legend.position="bottom")
Note that with your definitions of alpha and beta, plots 4, 8, and 9 are identical, and plots 5 and 6 are identical.
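If you would rather stay in base graphics, a legend can be pushed below the plot region with legend(). The following is a rough sketch, not a polished solution: it reuses alpha, beta and color from the question, while dens, labs, the enlarged bottom margin and the inset value are assumptions that will probably need tuning for your device size.

```r
x <- seq(0, 1, 0.01)
alpha <- c(1, 3, 5, 5, 1, 1, 2, 5, 5, 2)
beta  <- c(2, 4, 15, 5, 1, 1, 2, 5, 5, 5)
color <- c("blue", "green", "pink", "gray", "brown",
           "yellow", "purple", "skyblue", "plum", "tan")

# One column of densities per (alpha, beta) pair.
dens <- sapply(seq_along(alpha), function(i) dbeta(x, alpha[i], beta[i]))
labs <- paste0("(", alpha, ", ", beta, ")")

op <- par(mar = c(9, 4, 2, 1))            # extra room below the axis
matplot(x, dens, type = "l", lty = 1, col = color,
        xlab = "x-axis", ylab = "density")
# xpd = TRUE allows drawing outside the plot region; a negative inset
# moves the legend below it.
legend("bottom", inset = c(0, -0.55), xpd = TRUE, ncol = 5, bty = "n",
       legend = labs, col = color, lty = 1, title = "(alpha, beta)")
par(op)
```

matplot() picks a y range that covers all curves, which avoids the clipping you get when the first plot() call only sees one density.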
I want to compare two curves. Is it possible in R to draw a plot and then draw another plot over it? How?
Thanks.
With base R, you can plot one curve and then add the second curve with the lines() function. Here's a quick example:
x <- 1:10
y <- x^2
y2 <- x^3
plot(x,y, type = "l")
lines(x, y2, col = "red")
Alternatively, if you wanted to use ggplot2, here are two methods - one plots different colors on the same plot, and the other generates separate plots for each variable. The trick here is to "melt" the data into long format first.
library(ggplot2)
library(reshape2)  # provides melt()
df <- data.frame(x, y, y2)
df.m <- melt(df, id.var = "x")
qplot(x, value, data = df.m, colour = variable, geom = "line")
qplot(x, value, data = df.m, geom = "line")+ facet_wrap(~ variable)
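To see what "melting" into long format means without installing anything, here is a small base R sketch that builds a similar long table with stack() and plots one line per variable. The names wide and long and the column renaming are choices made for this illustration; melt() produces a comparable result in one call.

```r
x <- 1:10
wide <- data.frame(x = x, y = x^2, y2 = x^3)

# stack() turns the y and y2 columns into (values, ind) pairs;
# x is repeated once per stacked column.
long <- data.frame(x = rep(wide$x, times = 2), stack(wide[c("y", "y2")]))
names(long) <- c("x", "value", "variable")

# Empty frame covering all values, then one line per level of 'variable'.
plot(value ~ x, data = long, type = "n")
vars <- levels(long$variable)
for (i in seq_along(vars)) {
  lines(value ~ x, data = long[long$variable == vars[i], ], col = i)
}
legend("topleft", legend = vars, col = seq_along(vars), lty = 1)
```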
Using lattice package:
require(lattice)
x <- seq(-3,3,length.out=101)
xyplot(dnorm(x) + sin(x) + cos(x) ~ x, type = "l")
There have been some solutions for you already. If you stay with the base package, you should get acquainted with the functions plot(), lines(), abline(), points(), polygon(), segments(), rect(), box(), arrows(), ... Take a look at their help files.
You should see a plot from the base package as a pane with the coordinates you gave it. On that pane, you can draw a whole set of objects with the above-mentioned functions. They allow you to construct a graph as you want. You should remember, though, that unless you play with the par settings like Dr. G showed, every call to plot() gives you a new pane. Also take into account that things can be plotted over other things, so think about the order in which you draw them.
See eg:
set.seed(100)
x <- 1:10
y <- x^2
y2 <- x^3
yse <- abs(runif(10,2,4))
plot(x,y, type = "n") # type="n" only plots the pane, no curves or points.
# plots the area between both curves
polygon(c(x,sort(x,decreasing=T)),c(y,sort(y2,decreasing=T)),col="grey")
# plot both curves
lines(x,y,col="purple")
lines(x, y2, col = "red")
# add the points to the first curve
points(x, y, col = "black")
# adds some lines indicating the standard error
segments(x,y,x,y+yse,col="blue")
# adds some flags indicating the standard error
arrows(x,y,x,y-yse,angle=90,length=0.1,col="darkgreen")
This gives you :
Have a look at par
> ?par
> plot(rnorm(100))
> par(new=T)
> plot(rnorm(100), col="red")
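One caveat with par(new=TRUE): each plot() call chooses its own axis limits, so the two point clouds above are not on a common y scale. A hedged sketch of a workaround is to compute a shared ylim first and suppress the second set of axes; a, b and yl are names invented for this example.

```r
set.seed(1)
a <- rnorm(100)
b <- rnorm(100, mean = 2)

yl <- range(a, b)                  # one y range covering both series

plot(a, ylim = yl, ylab = "value")
par(new = TRUE)                    # next plot() draws over the current one
plot(b, ylim = yl, col = "red", axes = FALSE, xlab = "", ylab = "")
```

With identical limits the overlay is equivalent to calling points(b, col = "red") after the first plot, which is usually the simpler route.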
ggplot2 is a great package for this sort of thing:
install.packages('ggplot2')
require(ggplot2)
library(reshape2)  # provides melt()
x <- 1:10
y1 <- x^2
y2 <- x^3
df <- data.frame(x = x, curve1 = y1, curve2 = y2)
df.m <- melt(df, id.vars = 'x', variable.name = 'curve')
# now df.m is a data frame with columns 'x', 'curve', 'value'
ggplot(df.m, aes(x,value)) + geom_line(aes(colour = curve)) +
geom_point(aes(shape=curve))
You get the plot coloured by curve, with different point marks for each curve and a nice legend, all painlessly and without any additional work:
Draw multiple curves at the same time with the matplot function. Do help(matplot) for more.
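As a minimal illustration of that suggestion, the sketch below puts the two curves used elsewhere in this thread into the columns of a matrix and hands it to matplot(); the matrix name curves and the colour choices are assumptions made for the example.

```r
x <- 1:10
curves <- cbind(square = x^2, cube = x^3)   # one column per curve

cols <- c("black", "red")
# matplot() draws every column against x and sizes the y-axis to fit them all.
matplot(x, curves, type = "l", lty = 1, col = cols, ylab = "value")
legend("topleft", legend = colnames(curves), col = cols, lty = 1)
```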