I have a for loop that creates a set of beta densities and looks something like this in my script.
x = seq(0,1,0.01)
alpha <- c(1,3,5,5,1,1,2,5,5,2)
beta <- c(2,4,15,5,1,1,2,5,5,5)
color <- c("blue","green","pink","gray","brown","yellow","purple","skyblue","plum","tan")
plot(x,dbeta(x, alpha[1], beta[1]) / sum(x), type="l", col= color[1], xlab="x-axis", ylab="y-axis")
for(i in 2:10){
lines(x,dbeta(x, alpha[i], beta[i]), type="l", col= color[i], pch="i")
}
I now want to create a legend at the bottom of the plot containing the color of each line and the corresponding values from the alpha/beta vectors. How do I achieve this? All my attempts have failed so far...
Not base graphics, but ggplot was made for this.
params = data.frame(alpha,beta)
gg <- do.call(rbind,lapply(1:nrow(params),function(i)
cbind(i,x,y=dbeta(x,params[i,]$alpha,params[i,]$beta))))
gg <- data.frame(gg)
library(ggplot2)
ggplot(gg,aes(x,y,color=factor(i))) +
geom_line(size=1)+
scale_color_manual("Alpha,Beta",
values=color,
labels=paste0("(",params$alpha,", ",params$beta,")"))+
theme(legend.position="bottom")
Note that with your definitions of alpha and beta, plots 4, 8, and 9 are identical, and plots 5 and 6 are identical.
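For readers who would rather stay in base graphics, here is a minimal sketch of one way to do it. It reuses the vectors from the question; the matrix `dens`, the vector `labels`, the enlarged bottom margin, and the negative `inset` are illustrative choices (assumptions, not part of the original code) and will likely need tuning for a given device size.

```r
x     <- seq(0, 1, 0.01)
alpha <- c(1, 3, 5, 5, 1, 1, 2, 5, 5, 2)
beta  <- c(2, 4, 15, 5, 1, 1, 2, 5, 5, 5)
color <- c("blue", "green", "pink", "gray", "brown",
           "yellow", "purple", "skyblue", "plum", "tan")

# one column of density values per (alpha, beta) pair
dens   <- sapply(seq_along(alpha), function(i) dbeta(x, alpha[i], beta[i]))
labels <- paste0("(", alpha, ", ", beta, ")")

# enlarge the bottom margin and allow drawing outside the plot region
op <- par(mar = c(9, 4, 2, 1), xpd = TRUE)
# matplot() picks a y-range that fits all curves at once
matplot(x, dens, type = "l", lty = 1, col = color,
        xlab = "x-axis", ylab = "y-axis")
# a negative vertical inset pushes the legend below the axis
legend("bottom", inset = c(0, -0.5), legend = labels, col = color,
       lty = 1, ncol = 5, title = "(alpha, beta)", cex = 0.8)
par(op)
```

Because the curves are drawn in one `matplot()` call, the y-axis covers all ten densities, which the `plot()` plus `lines()` loop does not guarantee.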
# fake data
set.seed( 123)
x<-rnorm(1000, mean=60,sd=20)
y <- exp(-10 + .95*log(x^3)) + rnorm(1000,mean=1,sd=1)
df <- data.frame(x,y)
cls.x <- quantile(df$x, seq(.1, .9, by=.1))
df$x.class <- findInterval(df$x, cls.x)
df$x.class <- as.factor(df$x.class)
head(df)
Neither of the following works:
plot(df$x,df$y,col=3)
par(new=T)
boxplot(y~x.class, data=df,xlab="",ylab="",xaxt="n")
nor this
boxplot(y~x.class, data=df,xlab="",ylab="",xaxt="n")
points(df$x,df$y,col=3)
Using ggplot, the closest I get is using something like
library(ggplot2)
ggplot(df,aes(x.class,y))+geom_boxplot() + geom_point()
Unfortunately, it does not show the real variability along the x-axis.
I tried with the jitter option, but I was not able to force the plot to use the real variability of the X-variable
Any suggestion would be very much appreciated.
Ps: I am aware of the bplot.xy() function in Rlab, however, that function does not allow me to change colours of the boxplot, or plot the dots first.
library(Rlab)
bplot.xy( x,y, N=10)
points( x,y, pch=".", col=3, cex=3)
Your df$x varies from 3 to 124, whereas your x-axis is from 1 to 10. In base graphics, you can do this:
plot(jitter(as.integer(df$x.class)), df$y, col=3, type="p", xlab = "", ylab = "", xaxt = "n")
boxplot(y~x.class, data=df,xlab="",ylab="",xaxt="n", add = TRUE)
I added jitter to help break out the distribution. You might also try pch=16 to make the dots solid, and perhaps use transparency (e.g., col="#aaffaa22" for the dots).
Is the following what you want?
library(ggplot2)
ggplot(df, aes(x, y, fill = x.class)) +
geom_point(alpha = 0.10) +
geom_boxplot(alpha = 0.50)
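If the goal is to keep the points at their real x values and draw each box at a representative position for its bin, one possible base-graphics sketch is to pass the bin medians to the `at` argument of `boxplot()`. The name `mids` and the choice of the median as the box position are assumptions made for illustration.

```r
set.seed(123)
x  <- rnorm(1000, mean = 60, sd = 20)
y  <- exp(-10 + .95 * log(x^3)) + rnorm(1000, mean = 1, sd = 1)
df <- data.frame(x, y)
cls.x <- quantile(df$x, seq(.1, .9, by = .1))
df$x.class <- findInterval(df$x, cls.x)

# median x of each bin: where the corresponding box will be drawn
mids <- as.numeric(tapply(df$x, df$x.class, median))

# points first, on the real x scale
plot(df$x, df$y, col = 3, pch = 16, cex = 0.5, xlab = "x", ylab = "y")
# boxes on top, positioned at the bin medians; axes are not redrawn
boxplot(y ~ x.class, data = df, at = mids, add = TRUE,
        axes = FALSE, outline = FALSE)
```

When `boxwex` is not given, the box width is derived from the spacing of the `at` values, so the narrow central bins get narrow boxes.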
I want to plot a 3D plot using R. My data set is independent, which means the values of x, y, and z are not dependent on each other. The plot I want is given in this picture:
This plot was drawn by someone using MATLAB. How can I make the same kind of plot using R?
Since you posted your image file, it appears you are not trying to make a 3d scatterplot, rather a 2d scatterplot with a continuous color scale to indicate the value of a third variable.
Option 1: For this approach I would use ggplot2
library(ggplot2)
# make data
mydata <- data.frame(x = rnorm(100, 10, 3),
y = rnorm(100, 5, 10),
z = rpois(100, 20))
ggplot(mydata, aes(x,y)) + geom_point(aes(color = z)) + theme_bw()
Which produces:
Option 2: To make a 3d scatterplot, use the cloud function from the lattice package.
library(lattice)
# make some data
x <- runif(20)
y <- rnorm(20)
z <- rpois(20, 5) / 5
cloud(z ~ x * y)
I usually do these kinds of plots with the base plotting functions and some helper functions for the color levels and color legend from the sinkr package (you need the devtools package to install from GitHub).
Example:
#library(devtools)
#install_github("marchtaylor/sinkr")
library(sinkr)
# example data
grd <- expand.grid(
x=seq(nrow(volcano)),
y=seq(ncol(volcano))
)
grd$z <- c(volcano)
# plot
COL <- val2col(grd$z, col=jetPal(100))
op <- par(no.readonly = TRUE)
layout(matrix(1:2,1,2), widths=c(4,1), heights=4)
par(mar=c(4,4,1,1))
plot(grd$x, grd$y, col=COL, pch=20)
par(mar=c(4,1,1,4))
imageScale(grd$z, col=jetPal(100), axis.pos=4)
mtext("z", side=4, line=3)
par(op)
Result:
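If installing a GitHub package is not an option, the colour mapping itself can be sketched with base R only. This is a rough stand-in, not the sinkr implementation: the palette `pal`, the index vector `idx`, and the choice of 100 colour levels are assumptions for illustration, and no colour legend is drawn.

```r
set.seed(1)
mydata <- data.frame(x = rnorm(100, 10, 3),
                     y = rnorm(100, 5, 10),
                     z = rpois(100, 20))

# 100-step palette from low (blue) to high (red)
pal <- colorRampPalette(c("blue", "cyan", "yellow", "red"))(100)
# bin z into 100 equal-width intervals; labels = FALSE returns the bin index
idx <- cut(mydata$z, breaks = 100, labels = FALSE)

plot(mydata$x, mydata$y, col = pal[idx], pch = 19, xlab = "x", ylab = "y")
```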
I'm trying to create a scatterplot with marginal histograms as in this question.
My data are two (numeric) variables which share seven discrete (somewhat) logarithmically-spaced levels.
I've successfully done this with the help of ggMarginal in the ggExtra package, however I'm not happy with the outcome as when plotting the marginal histograms using the same data as for the scatterplots, things don't line up.
As can be seen below, the histogram bars are biased a little to the right or left of the datapoints themselves.
library(ggExtra)
library(ggplot2)
x <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,7,12,18,12,7,3))
y <- rep(log10(c(1,2,3,4,5,6,7)), times=c(3,1,13,28,13,1,3))
d <- data.frame("x" = x,"y" = y)
p1 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(), type = "histogram")
A possible solution for this may be to change the variables used in the histograms into factors, so they are nicely aligned with the scatterplot axes.
This works well when creating histograms using ggplot:
p2 <- ggplot(data.frame(lapply(d, as.factor)), aes(x = x)) + geom_histogram()
However, when I try to do this using ggMarginal, I do not get the desired result - it appears that the ggMarginal histogram is still treating my variables as numeric.
p3 <- ggMarginal(ggplot(d, aes(x,y)) + geom_point() + theme_bw(),
x = as.factor(x), y = as.factor(y), type = "histogram")
How can I ensure my histogram bars are centred over the data points?
I'm absolutely willing to accept an answer which does not involve use of ggMarginal.
I'm not sure whether it is a good idea to repeat here the answer I gave to the question you mentioned, but I don't yet have the right to comment; please let me know if it isn't.
I've found the package (ggpubr) that seems to work very well for this problem and it considers several possibilities to display the data.
The link to the package is here, and in this link you will find a nice tutorial to use it. For completeness, I attach one of the examples I reproduced.
I first installed the package (it requires devtools)
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
For the particular example of displaying different histograms for different groups, it mentions in relation with ggExtra: "One limitation of ggExtra is that it can’t cope with multiple groups in the scatter plot and the marginal plots. In the R code below, we provide a solution using the cowplot package." In my case, I had to install the latter package:
install.packages("cowplot")
And I followed this piece of code:
library(ggpubr)
# Scatter plot colored by groups ("Species")
sp <- ggscatter(iris, x = "Sepal.Length", y = "Sepal.Width",
color = "Species", palette = "jco",
size = 3, alpha = 0.6)+
border()
# Marginal density plot of x (top panel) and y (right panel)
xplot <- ggdensity(iris, "Sepal.Length", fill = "Species",
palette = "jco")
yplot <- ggdensity(iris, "Sepal.Width", fill = "Species",
palette = "jco")+
rotate()
# Cleaning the plots
sp <- sp + rremove("legend")
yplot <- yplot + clean_theme() + rremove("legend")
xplot <- xplot + clean_theme() + rremove("legend")
# Arranging the plot using cowplot
library(cowplot)
plot_grid(xplot, NULL, sp, yplot, ncol = 2, align = "hv",
rel_widths = c(2, 1), rel_heights = c(1, 2))
Which worked fine for me:
If you are willing to give base plotting a try, here is a function:
scatterWithHists <- function(x, y, histCols=c("lightblue","lightblue"), lhist=20, xlim=range(x), ylim=range(y), ...){
## set up layout and graphical parameters
layMat <- matrix(c(1,4,3,2), ncol=2)
layout(layMat, widths=c(5/7, 2/7), heights=c(2/7, 5/7))
ospc <- 0.5 # outer space
pext <- 4 # par extension down and to the left
bspc <- 1 # space between scatter plot and bar plots
par. <- par(mar=c(pext, pext, bspc, bspc), oma=rep(ospc, 4)) # plot parameters
## barplot and line for x (top)
xhist <- hist(x, breaks=seq(xlim[1], xlim[2], length.out=lhist), plot=FALSE)
par(mar=c(0, pext, 0, 0))
barplot(xhist$density, axes=FALSE, ylim=c(0, max(xhist$density)), space=0, col=histCols[1])
## barplot and line for y (right)
yhist <- hist(y, breaks=seq(ylim[1], ylim[2], length.out=lhist), plot=FALSE)
par(mar=c(pext, 0, 0, 0))
barplot(yhist$density, axes=FALSE, xlim=c(0, max(yhist$density)), space=0, col=histCols[2], horiz=TRUE)
## overlap
dx <- density(x)
dy <- density(y)
par(mar=c(0, 0, 0, 0))
plot(dx, col=histCols[1], xlim=range(c(dx$x, dy$x)), ylim=range(c(dx$y, dy$y)),
lwd=4, type="l", main="", xlab="", ylab="", yaxt="n", xaxt="n", bty="n"
)
points(dy, col=histCols[2], type="l", lwd=3)
## scatter plot
par(mar=c(pext, pext, 0, 0))
plot(x, y, xlim=xlim, ylim=ylim, ...)
}
Just do:
scatterWithHists(x,y, histCols=c("lightblue","orange"))
And you get:
If you absolutely want to use ggMarginal then look up xparams and yparams. It says you can send additional arguments to x-margin and y-margin using those. I was only successful in sending trivial things like color. But maybe sending something like xlim would help.
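Since the data take only seven discrete levels, another option is to skip binning altogether and draw one bar per level at that level's exact position. The following is a base-graphics sketch under that assumption; the helper names (`xv`, `cx`, `yv`, `cy`) and the layout proportions are illustrative, and the marginal panels show raw counts rather than densities.

```r
x <- rep(log10(1:7), times = c(3, 7, 12, 18, 12, 7, 3))
y <- rep(log10(1:7), times = c(3, 1, 13, 28, 13, 1, 3))

# distinct levels and the number of observations at each level
xv <- sort(unique(x)); cx <- tabulate(match(x, xv), nbins = length(xv))
yv <- sort(unique(y)); cy <- tabulate(match(y, yv), nbins = length(yv))
xlim <- range(x); ylim <- range(y)

op <- par(no.readonly = TRUE)
# panel 1: top margin, panel 2: right margin, panel 3: scatter plot
layout(matrix(c(1, 0, 3, 2), ncol = 2, byrow = TRUE),
       widths = c(4, 1), heights = c(1, 4))

# top: vertical bars at the exact x levels, same xlim as the scatter plot
par(mar = c(0, 4, 1, 0))
plot(xv, cx, type = "h", lwd = 8, lend = 1, xlim = xlim,
     ylim = c(0, max(cx)), axes = FALSE, xlab = "", ylab = "")

# right: horizontal bars at the exact y levels, same ylim as the scatter plot
par(mar = c(4, 0, 0, 1))
plot(cy, yv, type = "n", xlim = c(0, max(cy)), ylim = ylim,
     axes = FALSE, xlab = "", ylab = "")
segments(0, yv, cy, yv, lwd = 8, lend = 1)

# main scatter plot
par(mar = c(4, 4, 0, 0))
plot(x, y, xlim = xlim, ylim = ylim, pch = 19)
par(op)
```

The bars line up with the points because each marginal panel shares its axis limits and the relevant margins with the scatter plot.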
I am trying to plot 4 ecdf functions on one plot but can't seem to figure out the proper syntax.
If I have 4 functions "A, B, C, D", what would be the proper syntax in R to plot them on the same chart with different colors? Thanks!
Here is one way (for three of them, works for four the same way):
set.seed(42)
ecdf1 <- ecdf(rnorm(100)*0.5)
ecdf2 <- ecdf(rnorm(100)*1.0)
ecdf3 <- ecdf(rnorm(100)*2.0)
plot(ecdf3, verticals=TRUE, do.points=FALSE)
plot(ecdf2, verticals=TRUE, do.points=FALSE, add=TRUE, col='brown')
plot(ecdf1, verticals=TRUE, do.points=FALSE, add=TRUE, col='orange')
Note that I am using the fact that the third has the widest range, and use that to initialize the canvas. Otherwise you need xlim=c(...).
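For four samples where no single one is known to span the widest range, one option is to compute a common x-range up front and pass it as xlim to the first call. A minimal sketch; the sample definitions, the list names A to D, and the colours are made up for illustration.

```r
set.seed(42)
samples <- list(A = rnorm(100) * 0.5,
                B = rnorm(100),
                C = rnorm(100) * 2,
                D = rnorm(100) + 1)
ecdfs <- lapply(samples, ecdf)
cols  <- c("black", "brown", "orange", "steelblue")

# common x-range over all samples, so no curve is clipped
xlim <- range(unlist(samples))

plot(ecdfs[[1]], verticals = TRUE, do.points = FALSE,
     xlim = xlim, col = cols[1], main = "ECDFs")
for (i in 2:length(ecdfs)) {
  plot(ecdfs[[i]], verticals = TRUE, do.points = FALSE,
       add = TRUE, col = cols[i])
}
legend("bottomright", legend = names(ecdfs), col = cols, lty = 1)
```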
The package latticeExtra provides the function ecdfplot.
library(lattice)
library(latticeExtra)
set.seed(42)
vals <- data.frame(r1=rnorm(100)*0.5,
r2=rnorm(100),
r3=rnorm(100)*2)
ecdfplot(~ r1 + r2 + r3, data=vals, auto.key=list(space='right'))
Here is an approach using ggplot2 (using the ecdf objects from [Dirk's answer](https://stackoverflow.com/a/20601807/1385941)):
library(ggplot2)
# create a data set containing the range you wish to use
d <- data.frame(x = c(-6,6))
# create a list of calls to `stat_function` with the colours you wish to use
ll <- Map(f = stat_function, colour = c('red', 'green', 'blue'),
fun = list(ecdf1, ecdf2, ecdf3), geom = 'step')
ggplot(data = d, aes(x = x)) + ll
A simpler way is to use ggplot and have the variable that you want to plot as a factor. In the example below, I have Portfolio as a factor and plot the distribution of Interest Rates by Portfolio.
# select a palette
myPal <- c( 'royalblue4', 'lightsteelblue1', 'sienna1')
# plot the Interest Rate distribution of each portfolio
# make an ecdf of each category in Portfolio which is a factor
g2 <- ggplot(mortgage, aes(x = Interest_Rate, color = Portfolio)) +
scale_color_manual(values = myPal) +
stat_ecdf(lwd = 1.25, geom = "line")
g2
You can also set geom = "step", geom = "point" and adjust the line width lwd in the stat_ecdf() function. This gives you a nice plot with the legend.
I have following data and plot:
pos <- rep(1:2000, 20)
xv =c(rep(1:20, each = 2000))
# colrs <- unique(xv)
colrs <- xv # edits
yv =rnorm(2000*20, 0.5, 0.1)
xv = lapply(unique(xv), function(x) pos[xv==x])
to.add = cumsum(sapply(xv, max) + 1000)
bp <- c(xv[[1]], unlist(lapply(2:length(xv), function(x) xv[[x]] + to.add[x-1])))
plot (bp,yv, pch = "*", col = colrs)
I have a few issues with this plot that I could not figure out.
(1) I want to use a different color for each group, or two alternating colors for successive groups (i.e. xv), but when I tried the color function the result turned out to be a mixture of colors. I also need to highlight some points (for example, bp 4000 to 4500 in blue).
(2) Instead of bp positions I want to put a tick mark and label with the group.
Thank you, appreciate your help.
Edits: with the help of the following answer (using a slightly different approach that also works when the groups have unequal sizes) I could get a similar plot. One question about colors remains: what if I want to use two alternating colors for alternate groups?
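You can solve your colour issue by repeating the colour index once for every point plotted in each group (this assumes colrs <- unique(xv), i.e. one entry per group, as in the commented-out line of the question), like so: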
plot (bp,yv, pch = "*", col = rep(colrs,each=2000))
The default colour palette (see ?palette or palette() ) will wrap around itself and you might want to specify your own to get 20 distinct colours.
To relabel the x axis, try plotting without the axis and then specifying the points and labels manually.
plot (bp,yv, pch = "*", col = rep(colrs,each=2000),xaxt="n")
axis(1,at=seq(1000,58000,3000),labels=1:20)
If you are trying to squeeze a lot of labels in there, you might have to shrink the text (cex.axis) or spin the labels 90 degrees (las=2).
plot (bp,yv, pch = "*", col = rep(colrs,each=2000),xaxt="n")
axis(1,at=seq(1000,58000,3000),labels=1:20,cex.axis=0.7,las=2)
Result:
One way is you could use a nested ifelse.
I'm still learning R, but one way it could be done would look something like:
plot(whatev$x, whatev$y, col=ifelse(xv<2000,"red",ifelse(2000<xv & xv<4000,"yellow","blue")))
You could nest as many of these as you want to have specificity on the colors and the intervals. The ifelse command is of form ifelse(TEST, True, False).
A simpler way would be to use the unique groups in xv to assign rainbow colors.
colrs=rainbow(length(unique(xv))) #Or colrs=rainbow(length(xv)) if xv is unique.
plot(whatev$x, whatev$y, col=colrs[as.integer(factor(xv))]) # look up each point's group colour
I hope I got all that right. I'm still learning R myself.
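For the follow-up about two alternating colours plus a highlighted range, here is a self-contained base-graphics sketch. It rebuilds equivalent data with a per-point group vector `grp` instead of reusing the question's objects; `grp`, `cols`, `centers`, and the specific colours are assumptions made for illustration.

```r
set.seed(1)
n.groups <- 20
n.per    <- 2000
grp <- rep(seq_len(n.groups), each = n.per)   # group of every point
pos <- rep(seq_len(n.per), n.groups)          # position within the group
yv  <- rnorm(n.groups * n.per, 0.5, 0.1)
bp  <- pos + (grp - 1) * 3000                 # same spacing as in the question

# alternate two colours by group parity, then overwrite the highlighted range
cols <- ifelse(grp %% 2 == 1, "grey40", "orange")
cols[bp >= 4000 & bp <= 4500] <- "blue"

# one tick per group, placed at the middle of the group's bp range
centers <- tapply(bp, grp, function(v) mean(range(v)))

plot(bp, yv, pch = "*", col = cols, xaxt = "n", xlab = "group")
axis(1, at = centers, labels = names(centers), cex.axis = 0.7, las = 2)
```

Building the colour vector per point, rather than per group, also works when the groups have unequal sizes.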
I'm going to go out on a limb and guess that your real data are something like 2000 values of things from 20 different groups. For instance, heights of 2000 plants of 20 different species. In such a case, you might want to look at the dotplot() function (or as illustrated below, dotplot.table()) in the lattice package.
Generate matrix of hypothetical values:
set.seed(1)
myY <- sapply( seq_len(20), function(x) rnorm(2000, x^(1/3)))
Transpose matrix to get groups as rows
myY <- t(myY)
Provide names of groups to matrix:
dimnames(myY)[[1]]<-paste("group", seq_len(nrow(myY)))
Load lattice package
library(lattice)
Generate dotplot
dotplot(myY, horizontal = FALSE, panel = function(x, y, horizontal, ...) {
panel.dotplot(x = x, y = y, horizontal = horizontal, jitter.x = TRUE,
col = seq_len(20)[x], pch = "*", cex = 1.5)
}, scales = list(x = list(rot = 90))
)
Which looks like (with unfortunate y-axis labeling):
Seeing that @JohnCLK is requesting a way of colouring by values on the x axis, I tried these demos in ggplot2. Each uses a dummy variable that is coded based on values or ranges to be highlighted in the other variables.
So, first set up the data, as in the question:
pos <- rep(1:2000, 20)
xv <- c(rep(1:20, each = 2000))
yv <- rnorm(2000*20, 0.5, 0.1)
xv <- lapply(unique(xv), function(x) pos[xv==x])
to.add <- cumsum(sapply(xv, max) + 1000)
bp <- c(xv[[1]], unlist(lapply(2:length(xv), function(x) xv[[x]] + to.add[x-1])))
Then load ggplot2, prepare a couple of utility functions, and set the default theme:
library("ggplot2")
make.png <- function(p, fName) {
png(fName, width=640, height=480, units="px")
print(p)
dev.off()
}
make.plot <- function(df) {
p <- ggplot(df,
aes(x = bp,
y = yv,
colour = highlight))
p <- p + geom_point()
    p <- p + theme(legend.position = "none")  # opts() is defunct in current ggplot2
return(p)
}
theme_set( theme_bw() )
Draw a plot which highlights values in a defined range on the vertical axis:
# highlight a horizontal band
df <- data.frame(cbind(bp, yv))
df$highlight <- 0
df$highlight[ df$yv >= 0.4 & df$yv < 0.45 ] <- 1
p <- make.plot(df)
print(p)
make.png(p, "demo_horizontal.png")
Next draw a plot which highlights values in a defined range on the x axis, a vertical band:
# highlight a vertical band
df$highlight <- 0
df$highlight[ df$bp >= 38000 & df$bp < 42000 ] <- 1
p <- make.plot(df)
print(p)
make.png(p, "demo_vertical.png")
And finally draw a plot which highlights alternating vertical bands, by x value:
# highlight alternating bands
library("gtools")
alt.band.width <- 2000
df$highlight <- as.integer(df$bp / alt.band.width)
df$highlight <- ifelse(odd(df$highlight), 1, 0)
p <- make.plot(df)
print(p)
make.png(p, "demo_alternating.png")
Hope this helps; it was good practice anyway.
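The alternating-band flag can also be computed without the gtools dependency, using integer division and modulo from base R. A small sketch on a toy `bp` vector (the values are made up for illustration):

```r
bp <- c(0, 1999, 2000, 3999, 4000, 6500)
alt.band.width <- 2000

# band index via integer division; its parity gives the 0/1 highlight flag
highlight <- (bp %/% alt.band.width) %% 2
print(highlight)
```

Bands 0 and 2 (bp below 2000, and 4000 up to 5999) get flag 0, while bands 1 and 3 get flag 1.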