I'm trying to make a hexbin representation of data in several categories. The problem is that faceting these bins seems to make all of them different sizes.
set.seed(1) #Create data
bindata <- data.frame(x=rnorm(100), y=rnorm(100))
fac_probs <- dnorm(seq(-3, 3, length.out=26))
fac_probs <- fac_probs/sum(fac_probs)
bindata$factor <- sample(letters, 100, replace=TRUE, prob=fac_probs)
library(ggplot2) #Actual plotting
library(hexbin)
ggplot(bindata, aes(x=x, y=y)) +
geom_hex() +
facet_wrap(~factor)
Is it possible to set something to make all these bins physically the same size?
As Julius says, the problem is that hexGrob doesn't get the information about the bin sizes, and guesses it from the differences it finds within the facet.
Obviously, it would make sense to hand dx and dy to a hexGrob -- not having the width and height of a hexagon is like specifying a circle by center without giving the radius.
Workaround:
The resolution strategy works if the facet contains two adjacent hexagons that differ in both x and y. So, as a workaround, I'll manually construct a data.frame containing the x and y center coordinates of the cells, the factor for faceting, and the counts:
In addition to the libraries specified in the question, I'll need
library (reshape2)
and also bindata$factor actually needs to be a factor:
bindata$factor <- as.factor (bindata$factor)
Now, calculate the basic hexagon grid
h <- hexbin (bindata, xbins = 5, IDs = TRUE,
xbnds = range (bindata$x),
ybnds = range (bindata$y))
Next, we need to calculate the counts depending on bindata$factor
counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- melt (counts)
colnames (counts) <- c ("ID", "factor", "counts")
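For readers who want to see what this step produces, here is a minimal base R sketch of the same idea: one row per (cell ID, factor level) pair, including the zero counts. It is not the hexTapply/melt route used above. The cell IDs and factor values are made up, and as.data.frame on a table stands in for melt.

```r
# Made-up cell IDs (as hexbin would assign them) and a grouping factor
cell <- c(3, 3, 5, 7, 7, 7)
fac  <- factor(c("a", "b", "a", "a", "a", "b"))

# Cross-tabulate cell ID against factor level, then go to long format
counts <- as.data.frame(table(ID = cell, factor = fac),
                        responseName = "counts")

# table() turns the IDs into a factor; convert them back to integers
counts$ID <- as.integer(as.character(counts$ID))
counts
```

Every cell appears once per factor level, so cells that are empty for a given level get a count of 0, which is what the merged data.frame further down also contains.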
As we have the cell IDs, we can merge this data.frame with the proper coordinates:
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
Here's what the data.frame looks like:
> head (hexdf)
ID factor counts x y
1 3 e 0 -0.3681728 -1.914359
2 3 s 0 -0.3681728 -1.914359
3 3 y 0 -0.3681728 -1.914359
4 3 r 0 -0.3681728 -1.914359
5 3 p 0 -0.3681728 -1.914359
6 3 o 0 -0.3681728 -1.914359
Plotting this with ggplot (use the command below) yields the correct bin sizes, but the figure looks a bit odd: 0-count hexagons are drawn, but only where some other facet has this bin populated. To suppress the drawing, we can set the counts there to NA and make the na.value completely transparent (it defaults to grey50):
hexdf$counts [hexdf$counts == 0] <- NA
ggplot(hexdf, aes(x=x, y=y, fill = counts)) +
geom_hex(stat="identity") +
facet_wrap(~factor) +
coord_equal () +
scale_fill_continuous (low = "grey80", high = "#000040", na.value = "#00000000")
yields the figure at the top of the post.
This strategy works as long as the binwidths are correct without faceting. If the binwidths are set very small, the resolution may still yield dx and dy that are too large. In that case, we can supply hexGrob with two adjacent bins (differing in both x and y) with NA counts for each facet.
dummy <- hgridcent (xbins = 5,
xbnds = range (bindata$x),
ybnds = range (bindata$y),
shape = 1)
dummy <- data.frame (ID = 0,
factor = rep (levels (bindata$factor), each = 2),
counts = NA,
x = rep (dummy$x [1] + c (0, dummy$dx/2),
nlevels (bindata$factor)),
y = rep (dummy$y [1] + c (0, dummy$dy ),
nlevels (bindata$factor)))
An additional advantage of this approach is that we can delete all the rows with 0 counts already in counts, in this case reducing the size of hexdf by roughly 3/4 (122 rows instead of 520):
counts <- counts [counts$counts > 0 ,]
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
hexdf <- rbind (hexdf, dummy)
The plot looks exactly the same as above, but you can visualize the difference with na.value not being fully transparent.
More about the problem
The problem is not unique to faceting but always occurs if too few bins are occupied, so that no "diagonally" adjacent bins are populated.
Here's a series of more minimal data that shows the problem:
First, I trace hexBin so that I capture all center coordinates of the hexagonal grid that ggplot2:::hexBin uses, as well as the object returned by hexbin:
trace (ggplot2:::hexBin, exit = quote ({trace.grid <<- as.data.frame (hgridcent (xbins = xbins, xbnds = xbnds, ybnds = ybnds, shape = ybins/xbins) [1:2]); trace.h <<- hb}))
Set up a very small data set:
df <- data.frame (x = 3 : 1, y = 1 : 3)
And plot:
p <- ggplot(df, aes(x=x, y=y)) + geom_hex(binwidth=c(1, 1)) +
coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) +
geom_point (data = df, col = "red") # data pts
str (trace.h)
Formal class 'hexbin' [package "hexbin"] with 16 slots
..@ cell : int [1:3] 3 5 7
..@ count : int [1:3] 1 1 1
..@ xcm : num [1:3] 3 2 1
..@ ycm : num [1:3] 1 2 3
..@ xbins : num 2
..@ shape : num 1
..@ xbnds : num [1:2] 1 3
..@ ybnds : num [1:2] 1 3
..@ dimen : num [1:2] 4 3
..@ n : int 3
..@ ncells: int 3
..@ call : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
..@ xlab : chr "x"
..@ ylab : chr "y"
..@ cID : NULL
..@ cAtt : int(0)
I repeat the plot, leaving out data point 2:
p <- ggplot(df [-2,], aes(x=x, y=y)) + geom_hex(binwidth=c(1, 1)) + coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p
p + geom_point (data = trace.grid, size = 4) + geom_point (data = df, col = "red")
str (trace.h)
Formal class 'hexbin' [package "hexbin"] with 16 slots
..@ cell : int [1:2] 3 7
..@ count : int [1:2] 1 1
..@ xcm : num [1:2] 3 1
..@ ycm : num [1:2] 1 3
..@ xbins : num 2
..@ shape : num 1
..@ xbnds : num [1:2] 1 3
..@ ybnds : num [1:2] 1 3
..@ dimen : num [1:2] 4 3
..@ n : int 2
..@ ncells: int 2
..@ call : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
..@ xlab : chr "x"
..@ ylab : chr "y"
..@ cID : NULL
..@ cAtt : int(0)
Note that the results from hexbin are on the same grid: the cell numbers did not change (cell 5 is simply no longer populated and thus not listed), and the grid dimensions and ranges did not change either. But the plotted hexagons did change dramatically.
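The effect can be reproduced without ggplot2. The following is a minimal sketch, assuming the hexagon size is guessed from the smallest non-zero gap between the center coordinates that are actually present. The helper smallest_gap is a simplified stand-in for that guess, not the ggplot2 code, and the grid is hypothetical.

```r
# Simplified stand-in for the size guess: smallest non-zero gap between
# the distinct values present, or 1 if there is only one distinct value.
smallest_gap <- function(v) {
  u <- sort(unique(v))
  if (length(u) < 2) return(1)
  min(diff(u))
}

# Hypothetical hexagonal grid: 3 rows of 3 centers, bin width 1;
# odd rows are shifted sideways by half a bin width.
grid <- expand.grid(col = 0:2, row = 0:2)
grid$x <- grid$col + (grid$row %% 2) * 0.5
grid$y <- grid$row * 0.8

gap_full <- smallest_gap(grid$x)        # every cell occupied

# Keep only the even rows: no diagonally adjacent cells remain
sparse <- grid[grid$row %% 2 == 0, ]
gap_sparse <- smallest_gap(sparse$x)

c(full = gap_full, sparse = gap_sparse)
```

With all cells present the gap in x is half a bin width. Once the shifted rows are gone the gap doubles, so a size derived from it doubles too, even though the underlying grid is unchanged.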
Also notice that hgridcent forgets to return the center coordinates of the first cell (lower left).
Though it gets populated:
df <- data.frame (x = 1 : 3, y = 1 : 3)
p <- ggplot(df, aes(x=x, y=y)) + geom_hex(binwidth=c(0.5, 0.8)) +
coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) +
geom_point (data = df, col = "red") + # data pts
geom_point (data = as.data.frame (hcell2xy (trace.h)), shape = 1, size = 6)
Here, the rendering of the hexagons cannot possibly be correct - they do not belong to one hexagonal grid.
I tried to replicate your solution with the same data set using lattice hexbinplot. Initially, it gave me an error xbnds[1] < xbnds[2] is not fulfilled. This error was due to wrong numeric vectors specifying range of values that should be covered by the binning. I changed those arguments in hexbinplot, and it somehow worked. Not sure if it helps you to solve it with ggplot, but it's probably some starting point.
library(lattice)
library(hexbin)
hexbinplot(y ~ x | factor, bindata, xbnds = "panel", ybnds = "panel", xbins=5,
layout=c(7,3))
EDIT
Although rectangular bins with stat_bin2d() work just fine:
ggplot(bindata, aes(x=x, y=y, group=factor)) +
facet_wrap(~factor) +
stat_bin2d(binwidth=c(0.6, 0.6))
There are two source files that we are interested in: stat-binhex.r and geom-hex.r, mainly hexBin and hexGrob functions.
As @Dinre mentioned, this issue is not really related to faceting. What we can see is that binwidth is not ignored and is used in a special way in hexBin; this function is applied to every facet separately. After that, hexGrob is applied to every facet. To be sure, you can inspect them with e.g.
trace(ggplot2:::hexGrob, quote(browser()))
trace(ggplot2:::hexBin, quote(browser()))
Hence this explains why sizes differ - they depend on both binwidth and the data of each facet itself.
It is difficult to keep track of the process because of various coordinates transforms, but notice that the output of hexBin
data.frame(
hcell2xy(hb),
count = hb@count,
density = hb@count / sum(hb@count, na.rm=TRUE)
)
always seems to look quite ordinary, and that hexGrob is responsible for drawing the hex bins and hence for the distortion, i.e. it contains the polygonGrob call. When there is only one hex bin in a facet, there is a more serious anomaly. In hexGrob, the hexagon dimensions are computed as
dx <- resolution(x, FALSE)
dy <- resolution(y, FALSE) / sqrt(3) / 2 * 1.15
In ?resolution we can see
Description
The resolution is is the smallest non-zero distance between adjacent
values. If there is only one unique value, then the resolution is
defined to be one.
For this reason (resolution(x, FALSE) == 1 and resolution(y, FALSE) == 1), the x coordinates of polygonGrob of the first facet in your example are
[1] 1.5native 1.5native 0.5native -0.5native -0.5native 0.5native
and, if I am not wrong, in this case native units are like npc, so they should be between 0 and 1. That is, in the case of a single hex bin the hexagon goes out of range because of resolution(). This function is also the reason for the distortion that @Dinre mentioned, even when there are up to several hex bins.
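The quoted coordinates can be reproduced by hand. This is a minimal sketch, assuming the single bin is centered at x = 0.5 and that the six vertices sit at offsets of dx times (1, 1, 0, -1, -1, 0) from the center. Both assumptions are inferred from the values quoted above, not taken from the ggplot2 source.

```r
# One occupied bin means one distinct center coordinate,
# so the resolution falls back to 1 (see the help text quoted above).
center_x <- 0.5                      # assumed center of the single bin
dx <- 1                              # fallback value for one unique x

# Assumed x offsets of the six hexagon vertices, in units of dx
offsets <- c(1, 1, 0, -1, -1, 0)

vertex_x <- center_x + dx * offsets
vertex_x          # 1.5 1.5 0.5 -0.5 -0.5 0.5
range(vertex_x)   # extends well beyond the 0..1 interval
```

The hexagon is two units wide regardless of the requested binwidth, which is consistent with the out-of-range drawing described above.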
So for now there does not seem to be an option to have hex bins of equal size. A temporary (and, for a large number of factors, very inconvenient) solution could begin with something like this:
library(gridExtra)
set.seed(2)
bindata <- data.frame(x = rnorm(100), y = rnorm(100))
fac_probs <- c(10, 40, 40, 10)
bindata$factor <- sample(letters[1:4], 100,
replace = TRUE, prob = fac_probs)
binwidths <- list(c(0.4, 0.4), c(0.5, 0.5),
c(0.5, 0.5), c(0.4, 0.4))
plots <- mapply(function(w,z){
ggplot(bindata[bindata$factor == w, ], aes(x = x, y = y)) +
geom_hex(binwidth = z) + theme(legend.position = 'none')
}, letters[1:4], binwidths, SIMPLIFY = FALSE)
do.call(grid.arrange, plots)
I also did some fiddling around with the hex plots in 'ggplot2', and I was able to consistently produce significant bin distortion when a factor's population was reduced to 8 or below. I can't explain why this is happening without digging down into the package source (which I am reluctant to do), but I can tell you that sparse factors seem to consistently wreck the hex bin plotting in 'ggplot2'.
This suggests to me that the size and shape of a particular hex bin in 'ggplot2' is related to a calculation that is unique to each facet, instead of doing a single calculation for the group and plotting the data afterwards. This is somewhat reinforced by the fact that I can reproduce the distortion in any given facet by plotting only that single factor, like so:
ggplot(bindata[bindata$factor=="e",], aes(x=x, y=y)) +
geom_hex()
This feels like something that should be elevated to the package maintainer, Hadley Wickham (h.wickham at gmail.com). This info is publicly available from CRAN.
Update: I sent an email to Hadley Wickham asking if he would take a look at this question, and he confirmed that this behavior is indeed a bug.
Related
I am trying to create a Manhattan plot for the following results:
  CHR       SNP     BP A1 TEST NMISS   BETA    SE    L95   U95    STAT      P
1   1 rs3131972 742584  A  ADD    80 -2.234 2.977 -8.068 3.601 -0.7503 0.4554
4   1 rs1048488 750775  C  ADD    80 -2.234 2.977 -8.068 3.601 -0.7503 0.4554
I am using the code :
manhattan(subset(GWASresults, CHR == 1), xlim = c(1,4))
However I keep getting the error:
Error in `[.data.frame`(x, c(key.ord, criteria)) : undefined columns selected.
What have I done wrong? I am very new to coding and R.
Why have you given the limit xlim = c(1,4)?
There's some problem with the BP value in your data.
I am getting the plot from the original data
manhattan(subset(gwasResults, CHR == 1), xlim=c(1,4))
The x axis in the plot represents the BP value. If you want to set a limit for x, its upper bound should be beyond the maximum value of BP in your data set.
manhattan(subset(gwasResults, CHR == 1))
With your data:
manhattan(subset(gwasResults, CHR == 1), xlim=c(1, 900000))
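To see how the limits relate to the data, it helps to count how many BP values fall inside a given x range. This is a small base R sketch that uses only the two rows printed in the question; in_window is a hypothetical helper, not part of any package.

```r
# The two BP values shown in the question
bp <- c(742584, 750775)

# Hypothetical helper: how many values lie inside the limits?
in_window <- function(v, lim) sum(v >= lim[1] & v <= lim[2])

range(bp)                       # what the x axis has to cover
in_window(bp, c(1, 4))          # no point lies between 1 and 4
in_window(bp, c(1, 900000))     # both points are inside
```

Running range() on the real BP column before choosing xlim avoids picking limits that exclude every point.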
I am trying to determine the width at half height of a density plot, and I found the following code in a previous post:
d <- ggplot(A0, aes(DIAMETER)) +
geom_density()
xmax <- d$x[d$y==max(d$y, na.rm = TRUE)]
x1 <- d$x[d$x < xmax][which.min(abs(d$y[d$x < xmax]-max(d$y)/2))]
x2 <- d$x[d$x > xmax][which.min(abs(d$y[d$x > xmax]-max(d$y)/2))]
FWHM <- x2-x1
When I execute it, though, I get the following warning message related to the function max():
Warning message:
In max(d$y, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
I looked around a bit and saw that this could be due to the presence of NA values in my dataset, but that is not the case (structure of the data frame below). Does someone know how I could fix this problem? Thanks in advance!
str(A0)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 387 obs. of 3 variables:
$ SAMPLE : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
$ TIME : Factor w/ 3 levels "0","24","72": 1 1 1 1 1 1 1 1 1 1 ...
$ DIAMETER: num 13.57 3.76 10.67 14.74 4.2 ...
I think you're possibly misinterpreting the code you found. A ggplot object simply doesn't have a member called y. Suppose we make a density plot like this:
library(ggplot2)
set.seed(69)
d <- ggplot(data = data.frame(x = rnorm(100)), aes(x)) + geom_density()
d
We can replicate your error like this:
xmax <- d$x[d$y==max(d$y, na.rm = TRUE)]
#> Warning message:
#> In max(d$y, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
And if we try to look at d$y we get:
d$y
#> NULL
If you want to work with the underlying data from a plot, you need to use ggplot_build, like this:
p <- ggplot_build(d)
Now p contains a list of data frames, one for each plot layer, so in our case we can do:
df <- p$data[[1]]
Now df is a data frame containing our x and y coordinates, so we can run your algorithm
xmax <- df$x[df$y==max(df$y, na.rm = TRUE)]
x1 <- df$x[df$x < xmax][which.min(abs(df$y[df$x < xmax]-max(df$y)/2))]
x2 <- df$x[df$x > xmax][which.min(abs(df$y[df$x > xmax]-max(df$y)/2))]
FWHM <- x2-x1
FWHM
#> [1] 2.100466
and we can see this calculation is correct by plotting a segment between the x1 and x2 values at half the maximum height:
d + geom_segment(aes(x = x1, xend = x2, y = max(df$y)/2, yend = max(df$y)/2),
linetype = 2, colour = "red")
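If the plot itself is not needed, the same calculation can be done on the output of stats::density(), which returns the x and y vectors directly. This is a sketch of an alternative, not the route used above. It assumes the default bandwidth and grid of density(), so the number it gives need not match the ggplot-based value exactly.

```r
set.seed(69)
x <- rnorm(100)

dens <- density(x)              # list with equally long $x and $y vectors
peak <- which.max(dens$y)       # index of the highest point
half <- dens$y[peak] / 2        # half of the maximum height

# Indices to the left and to the right of the peak
left  <- seq_len(peak - 1)
right <- seq(peak + 1, length(dens$x))

# On each side, take the grid point whose height is closest to half maximum
x1 <- dens$x[left][which.min(abs(dens$y[left] - half))]
x2 <- dens$x[right][which.min(abs(dens$y[right] - half))]

fwhm <- x2 - x1
fwhm
```

Using which.max gives a single peak index even if the maximum is tied, which the comparison with == in the original snippet does not guarantee.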
Using the following code:
library("ggplot2")
require(zoo)
args <- commandArgs(TRUE)
input <- read.csv(args[1], header=F, col.names=c("POS","ATT"))
id <- args[2]
prot_len <- nrow(input)
manual <- prot_len/100 # 4.3
att_name <- "Entropy"
att_zoo <- zoo(input$ATT)
att_avg <- rollapply(att_zoo, width = manual, by = manual, FUN = mean, align = "left")
autoplot(att_avg, col="att1") + labs(x = "Positions", y = att_name, title="")
With data:
> str(input)
'data.frame': 431 obs. of 2 variables:
$ POS: int 1 2 3 4 5 6 7 8 9 10 ...
$ ATT: num 0.652 0.733 0.815 1.079 0.885 ...
I do:
I would like to load input2, which has a different length (and therefore a different x-axis), and overlay the two curves in the same plot. By overlay I mean that I want the two curves drawn at the same plot size, so I will "ignore" the overlapping axis labels and titles. I would like to compare the shapes regardless of the length of the input.
First I tried generating a toy input2 by changing the value of manual, so that I have att_avg2 in which manual equals e.g. 7. Between the original autoplot and the new autoplot-2 I added par(new=TRUE), but this does not give my expected output. Any hints on how to do this? Maybe it's better to convert att_avg from a zoo series to a data.frame and not use autoplot? Thanks
UPDATE, response to G. Grothendieck:
If I do:
[...]
att_zoo <- zoo(input$ATT)
att_avg <- rollapply(att_zoo, width = manual, by = manual, FUN = mean, align = "left") #manual=4.3
att_avg2 <- rollapply(att_zoo, width = 7, by = 7, FUN = mean, align = "left")
autoplot(cbind(att_avg, att_avg2), facet=NULL) +
labs(x = "Positions", y = att_name, title="")
I get
and a warning message:
Removed 1 rows containing missing values (geom_path).
par is used with classic graphics, not for ggplot2. If you have two zoo series just cbind or merge the series together and autoplot them using facet=NULL:
library(zoo)
library(ggplot2)
z1 <- zoo(1:3) # length 3
z2 <- zoo(5:1) # length 5
autoplot(cbind(z1, z2), facet = NULL)
Note: The question omitted input2 so there could be some additional considerations from aspects not shown.
I am using the package "lomb" to calculate Lomb-Scargle Periodograms, a method for analysing biological time series data. The package does create a plot if you tell it to do so. However, the plots are not too nice (compared to ggplot2 plots). Therefore, I would like to plot the results with ggplot. However, I do not know how to access the function for the curve plotted...
This is a sample code for a plot:
TempDiff <- runif(4033, 3.0, 18) # just generate random numbers
Time2 <- seq(1,4033) # Time vector
Rand.LombScargle <- randlsp(repeats=10, TempDiff, times = Time2, from = 12, to = 36,
type = c("period"), ofac = 10, alpha = 0.01, plot = T,
trace = T, xlab="period", main = "Lomb-Scargle Periodogram")
I have also tried to find out something about the function looking into the function randlsp itself, but could not really find anything that seemed useful to me there...
getAnywhere(randlsp)
A single object matching ‘randlsp’ was found
It was found in the following places
package:lomb
namespace:lomb
with value
function (repeats = 1000, x, times = NULL, from = NULL, to = NULL,
type = c("frequency", "period"), ofac = 1, alpha = 0.01,
plot = TRUE, trace = TRUE, ...)
{
if (is.ts(x)) {
x = as.vector(x)
}
if (!is.vector(x)) {
times <- x[, 1]
x <- x[, 2]
}
if (plot == TRUE) {
op <- par(mfrow = c(2, 1))
}
realres <- lsp(x, times, from, to, type, ofac, alpha, plot = plot,
...)
realpeak <- realres$peak
pks <- NULL
if (trace == TRUE)
cat("Repeats: ")
for (i in 1:repeats) {
randx <- sample(x, length(x))
randres <- lsp(randx, times, from, to, type, ofac, alpha,
plot = F)
pks <- c(pks, randres$peak)
if (trace == TRUE) {
if (i/10 == floor(i/10))
cat(i, " ")
}
}
if (trace == TRUE)
cat("\n")
prop <- length(which(pks >= realpeak))
p.value <- prop/repeats
if (plot == TRUE) {
mx = max(c(pks, realpeak)) * 1.25
hist(pks, xlab = "Peak Amplitude", xlim = c(0, mx), main = paste("P-value: ",
p.value))
abline(v = realpeak)
par(op)
}
res = realres[-(8:9)]
res = res[-length(res)]
res$random.peaks = pks
res$repeats = repeats
res$p.value = p.value
class(res) = "randlsp"
return(invisible(res))
}
Any idea will be appreciated!
Best,
Christine
PS: Here an example of the plot with real data.
The key to getting ggplot graphs out of any returned object is to convert the data that you need into some sort of data.frame. To do this, you can look at what kind of object your returned value is and see what sort of data you can immediately extract into a data.frame.
str(Rand.LombScargle) # get the data type and structure of the returned value
List of 12
$ scanned : num [1:2241] 12 12 12 12 12 ...
$ power : num [1:2241] 0.759 0.645 0.498 0.341 0.198 ...
$ data : chr [1:2] "times" "x"
$ n : int 4033
$ type : chr "period"
$ ofac : num 10
$ n.out : int 2241
$ peak : num 7.25
$ peak.at : num [1:2] 24.6908 0.0405
$ random.peaks: num [1:10] 4.99 9.82 7.03 7.41 5.91 ...
$ repeats : num 10
$ p.value : num 0.3
- attr(*, "class")= chr "randlsp"
In the case of randlsp, it's a list, which is what statistical functions usually return. Most of this information can also be obtained from ?randlsp.
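As a general pattern, the elements of such a list that hold one value per scanned point can be collected into a data.frame. This is a minimal sketch on a mock list shaped like the str() output above; the numbers are invented and nothing here calls the lomb package.

```r
# Mock result shaped like the str() output above; values are invented
res <- list(
  scanned = seq(12, 36, length.out = 5),
  power   = c(0.76, 0.65, 0.50, 0.34, 0.20),
  n       = 4033L,
  type    = "period",
  peak    = 7.25
)

# Flag the numeric elements that have one value per scanned point
per_point <- vapply(
  res,
  function(el) is.numeric(el) && length(el) == length(res$scanned),
  logical(1)
)

# Those elements line up row by row, so they form a data.frame
curve_df <- as.data.frame(res[per_point])
str(curve_df)
```

Scalars such as the peak or the p-value do not fit into this table; they are better passed to the plot separately, for example as intercepts or in the title.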
It looks as if Rand.LombScargle$scanned and Rand.LombScargle$power contain most of what is needed for the first graph.
There is also a horizontal line on the Periodogram, but it doesn't correspond to anything that was returned by randlsp. Looking at the source code that you provided, it looks as if the Periodogram is actually generated by lsp().
LombScargle <- lsp( TempDiff, times = Time2, from = 12, to = 36,
type = c("period"), ofac = 10, alpha = 0.01, plot = F)
str(LombScargle)
List of 12
$ scanned : num [1:2241] 12 12 12 12 12 ...
$ power : num [1:2241] 0.759 0.645 0.498 0.341 0.198 ...
$ data : chr [1:2] "Time2" "TempDiff"
$ n : int 4033
$ type : chr "period"
$ ofac : num 10
$ n.out : int 2241
$ alpha : num 0.01
$ sig.level: num 10.7
$ peak : num 7.25
$ peak.at : num [1:2] 24.6908 0.0405
$ p.value : num 0.274
- attr(*, "class")= chr "lsp"
I am guessing that, based on this data, the line is indicating the significance level LombScargle$sig.level
Putting this together, we can create our data to pass to ggplot from lsp:
lomb.df <- data.frame(period=LombScargle$scanned, power=LombScargle$power)
# use the data frame to set up the line plot
g <- ggplot(lomb.df, aes(period, power)) + geom_line() +
labs(y="normalised power", title="Lomb-Scargle Periodogram")
# add the sig.level horizontal line
g + geom_hline(yintercept=LombScargle$sig.level, linetype="dashed")
For the histogram, it looks like this is based on the vector Rand.LombScargle$random.peaks from randlsp:
rpeaks.df <- data.frame(peaks=Rand.LombScargle$random.peaks)
ggplot(rpeaks.df, aes(peaks)) +
geom_histogram(binwidth=1, fill="white", colour="black") +
geom_vline(xintercept=Rand.LombScargle$peak, linetype="dashed") +
xlim(c(0,12)) +
labs(title=paste0("P-value: ", Rand.LombScargle$p.value),
x="Peak Amplitude",
y="Frequency")
Play around with these graphs to get them looking to your taste.
I have an R data.frame:
> str(trainTotal)
'data.frame': 1000 obs. of 41 variables:
$ V1 : num 0.299 -1.174 1.192 1.573 -0.613 ...
$ V2 : num -1.227 0.332 -0.414 -0.58 -0.644 ...
etc.
$ V40 : num 0.101 -1.818 2.987 1.883 0.408 ...
$ Class: int 1 0 0 1 0 1 0 1 1 0 ...
and I would like to draw a 3D scatter plot of Class "0" in blue and Class "1" in red according to V13, V5, and V24.
V13, V5, V24 are the top variables when sorted by scaled variance, so my intuition tells me the 3D visualization could be interesting. Not sure if that makes sense.
How can I plot this with R ?
Edit:
I have tried the following:
install.packages("Rcmdr")
library(Rcmdr)
scatter3d(x=trainTotal[[13]], y= trainTotal[[5]], z= trainTotal[[24]], point.col = as.numeric(as.factor(trainTotal[,41])), size = 10)
which gives me this plot:
I am not sure how to read this plot.
I would prefer to see only dots of two colors, for a start.
Maybe something like this? Using scatterplot3d.
library(scatterplot3d)
#random data
DF <- data.frame(V13 = sample(1:100, 10, T), V5 = sample(1:100, 10, T), V24 = sample(1:100, 10, T), class = sample(0:1, 10, T))
#plot
scatterplot3d(x = DF$V13, y = DF$V5, z = DF$V24, color = c("blue", "red")[as.factor(DF$class)], pch = 19)
This gives:
In scatterplot3d there is also an angle argument for different views.
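The colour argument above relies on indexing a colour vector with a factor, which picks colours by the factor's integer codes. This is a small base R sketch of just that mapping, with made-up class labels:

```r
# Made-up class labels, as in the Class column (0 or 1)
class <- c(0, 1, 1, 0)

# as.factor() orders the levels as "0", "1", so the integer codes are 1 and 2;
# indexing with a factor uses those codes to pick from the palette
palette <- c("blue", "red")
point_cols <- palette[as.factor(class)]
point_cols
```

Class 0 maps to the first colour and class 1 to the second. With more classes, the palette needs one entry per factor level, in level order.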
Perspective issues mean that static 3d plots are mostly horrible and misleading. If you really want a 3d scatterplot, it's best to draw one where you can view it from different angles. The rgl package allows this.
EDIT: I've updated the plot to use colours, in this case picked using the colorspace package, though you can define them however you like. Specifying attributes for points is described on the ?rgl.material help page.
library(rgl)
library(colorspace)
n_points <- 50
n_groups <- 5
some_data <- data.frame(
x = seq(0, 1, length.out = n_points),
y = runif(n_points),
z = rnorm(n_points),
group = gl(n_groups, n_points / n_groups)
)
colors <- rainbow_hcl(n_groups)
with(some_data, points3d(x, y, z, color = colors[group], size = 7))
axes3d()