I have four columns of data in R that look like this:
x y z group
The group column holds categorical values, so it is a discrete set of values, whereas the other three columns are continuous.
I want to make a 3d plot in R with x, y, and z, where the color of each dot is given by "group". I also want a legend for this plot. How can I do this? I don't have a particular preference for the actual colors. I suppose rainbow(length(unique(group))) should do fine.
Here is an example using scatterplot3d, based on the example in the vignette:
library(scatterplot3d)
# some basic dummy data
DF <- data.frame(x = runif(10),
y = runif(10),
z = runif(10),
group = factor(sample(letters[1:3], 10, replace = TRUE))) # factor, so as.numeric() and levels() work below
# create the plot, you can be more adventurous with colour if you wish
s3d <- with(DF, scatterplot3d(x, y, z, color = as.numeric(group), pch = 19))
# add the legend using `xyz.convert` to locate it
# juggle the coordinates to get something that works.
legend(s3d$xyz.convert(0.5, 0.7, 0.5), pch = 19, yjust=0,
legend = levels(DF$group), col = seq_along(levels(DF$group)))
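The question suggested rainbow(length(unique(group))) for the colours. A minimal sketch of that variant follows, assuming group is stored as a factor. The names pal and pt_col are illustrative and not part of scatterplot3d, and the plot is only drawn if the package is installed.

```r
# Illustrative data; `group` is a factor so that levels() is defined
set.seed(42)
DF <- data.frame(x = runif(10), y = runif(10), z = runif(10),
                 group = factor(sample(letters[1:3], 10, replace = TRUE),
                                levels = letters[1:3]))

# One rainbow colour per level, then look up each row's colour by its level
pal    <- setNames(rainbow(nlevels(DF$group)), levels(DF$group))
pt_col <- unname(pal[as.character(DF$group)])

# Draw only if scatterplot3d is available
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  s3d <- with(DF, scatterplot3d::scatterplot3d(x, y, z, color = pt_col, pch = 19))
  legend("topright", legend = names(pal), col = pal, pch = 19, title = "group")
}
```

Keeping the palette as a named vector means the legend and the points are guaranteed to use the same level-to-colour mapping.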
Or you could use lattice and cloud, in which case you can construct the legend with the key argument:
library(lattice)
cloud(z~x+y, data = DF, pch= 19, col.point = as.numeric(DF$group),
key = list(points = list(pch = 19, col = seq_along(levels(DF$group))),
text = list(levels(DF$group)), space = 'top', columns = nlevels(DF$group)))
Related
I plotted a 3d scatter plot in R using the scatter3d function.
Now I want to label every dot in the 3d scatter plot, so that every point has its ID next to it, i.e. "1", "2", etc.
Here is what I tried:
library("car")
library("rgl")
scatter3d(geometry[,1],geometry[,2],geometry[,3] , surface=FALSE, labels = rownames(geometry), id.n=nrow(geometry))
This tutorial says that adding the arguments labels=rownames(geometry), id.n=nrow(geometry) should display a label on every dot, but that did not work.
EDIT:
I uploaded the coordinate file here, you can read it like this
geometry = read.csv("geometry.txt",sep = " ")
colnames(geometry) = c("x","y","z")
EDIT:
Actually, even the example from the tutorial does not label the points and does not produce the plot displayed. There is probably something wrong with the package.
scatter3d(x = sep.l, y = pet.l, z = sep.w,
surface=FALSE, labels = rownames(iris), id.n=nrow(iris))
I can give you a quick fix if you are willing to use a function other than scatter3d. This can be achieved with the plot3d and text3d functions from rgl. Here is a basic code block showing how it can be implemented; you can customize it to your needs.
plot3d(geometry[,1],geometry[,2],geometry[,3])
text3d(geometry[,1],geometry[,2],geometry[,3],rownames(geometry))
points3d(geometry[,1],geometry[,2],geometry[,3], size = 5)
After much messing around I got it (I also have the method for plot_ly if you're interested):
test2 <- cbind(dataSet[,paste(d)],set.final$Groups,test)
X <- test2[,1]
Y <- test2[,2]
Z <- test2[,3]
# 3D scatter plot with every point labelled (surface = FALSE hides the regression plane)
scatter3d(x = X, y = Y, z = Z, groups = test2$`set.final$Groups`,
grid = FALSE, fit = "linear",ellipsoid = FALSE, surface=FALSE,
surface.col = c("green", "blue", "red"),
#showLabels(x = x, y = y, z = z, labels=test2$test, method="identify",n = nrow(test2), cex=1, col=carPalette()[1], location=c("lr"))
#labels = test2$test,
id=list(method = "mahal", n = length(test2$test), labels = test2$test)
#id.n=nrow(test2$test)
)
#identify3d(x = X, y = Y, z = Z, labels = test2$test, n = length(test2$test), plot = TRUE, adj = c(-0.1, 0.5), tolerance = 20, buttons = c("right"))
rglwidget()
Problem: I am trying to reproduce a round filled 2d contour plot in R using plotly (have tried ggplot2 also but plotly seemed to be easier).
Data: Sample data download link -
https://drive.google.com/file/d/10Mr5yWVReQckPI6TKLY_vzPT8zWiijKl/view?usp=sharing
The data to be plotted for contour is in a column format and typically called z variable, there is x and y data also available for all values of z. A simple dataframe would look like this:
Please ignore the repeat common x and y as I have truncated decimals. The data has about 25000 rows.
Approach: I first use the akima package to interpolate the z values over a regular grid of x and y. This puts the z column data on an xy grid so that it can be plotted in 2d with contours.
Expected outcome:
Code used:
dens <- akima::interp(x = dt$`Xvalue(mm)`,
y = dt$`Yvalue(mm)`,
z = dt$Values,
duplicate = "mean",
xo=seq(min(dt$`Xvalue(mm)`), max(dt$`Xvalue(mm)`), length = 10),
yo=seq(min(dt$`Yvalue(mm)`), max(dt$`Yvalue(mm)`), length = 10))
plot_ly(x = dens$x,
y = dens$y,
z = dens$z,
colors = c("blue","grey","red"),
type = "contour")
Actual outcome:
Help Needed:
To refine the edges of the actual outcome plot so that it closely matches the expected outcome image.
Many thanks in advance for your comments and help.
I found that I could increase the resolution of the z matrix returned by akima::interp() from the default 40x40 grid by using the nx and ny arguments.
Then, in plot_ly(), adding contours = list(coloring = 'fill', showlines = FALSE) hides the contour lines and gets the output close to my expected outcome.
So the working code looks like this:
dens <- akima::interp(x = dt$`Xvalue(mm)`,
y = dt$`Yvalue(mm)`,
z = dt$Values,
nx = 50,
ny = 50,
duplicate = "mean",
xo=seq(min(dt$`Xvalue(mm)`), max(dt$`Xvalue(mm)`), length = 50),
yo=seq(min(dt$`Yvalue(mm)`), max(dt$`Yvalue(mm)`), length = 50))
plot_ly(x = dens$x,
y = dens$y,
z = dens$z,
colors = c("blue","grey","red"),
type = "contour",
contours = list(coloring = 'fill', showlines = FALSE))
Plotly contour plot reference was very helpful in this case:
https://plot.ly/r/reference/#contour
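To see the effect of nx and ny in isolation, here is a small sketch on synthetic data. The data frame pts is a hypothetical stand-in for dt (points scattered inside a unit disc), and the block only runs the interpolation if akima is installed.

```r
# Synthetic scattered points inside a unit disc (hypothetical stand-in for `dt`)
set.seed(1)
n     <- 500
r     <- sqrt(runif(n))
theta <- runif(n, 0, 2 * pi)
pts   <- data.frame(x = r * cos(theta), y = r * sin(theta))
pts$z <- pts$x^2 + pts$y^2

if (requireNamespace("akima", quietly = TRUE)) {
  # Same data, two grid resolutions
  coarse <- akima::interp(pts$x, pts$y, pts$z, nx = 10, ny = 10, duplicate = "mean")
  fine   <- akima::interp(pts$x, pts$y, pts$z, nx = 50, ny = 50, duplicate = "mean")
  print(dim(coarse$z))  # dimensions of the coarse grid
  print(dim(fine$z))    # dimensions of the fine grid
}
```

With the default settings, grid cells outside the convex hull of the input points come back as NA, so a finer grid should follow a round outline more closely.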
I have been struggling to rescale the length of the loadings (arrows) in a ggplot2/ggfortify PCA plot. I have looked around extensively for an answer, and the only information I have found either codes new biplot functions or refers to entirely different PCA packages (ggbiplot, factoextra), neither of which addresses the question I would like answered:
Is it possible to scale/change size of PCA loadings in ggfortify?
Below is the code I have to plot a PCA using stock R functions, as well as the code to plot a PCA using autoplot/ggfortify. You'll notice that in the stock R plots I can scale the loadings by simply multiplying by a scalar (*20 here) so my arrows aren't cramped in the middle of the PCA plot. Using autoplot...not so much. What am I missing? I'll move to another package if necessary but would really like to have a better understanding of ggfortify.
On other sites I have found, the graph axes limits never seem to exceed +/- 2. My graph goes +/- 20, and the loadings sit staunchly near 0, presumably at the same scale as graphs with smaller axes. I would still like to plot PCA using ggplot2, but if ggfortify won't do it then I need to find another package that will.
#load data geology rocks frame
georoc <- read.csv("http://people.ucsc.edu/~mclapham/earth125/data/georoc.csv")
#load libraries
library(ggplot2)
library(ggfortify)
geo.na <- na.omit(georoc) #remove NA values
geo_matrix <- as.matrix(geo.na[,3:29]) #create matrix of continuous data in data frame
pca.res <- prcomp(geo_matrix, scale = T) #perform PCA using correlation matrix (scale = T)
summary(pca.res) #return summary of PCA
#plotting in stock R
plot(pca.res$x, col = c("salmon","olivedrab","cadetblue3","purple")[factor(geo.na$rock.type)], pch = 16, cex = 0.2) #factor() so the index is an integer code even if rock.type was read as character
#make legend
legend("topleft", c("Andesite","Basalt","Dacite","Rhyolite"),
col = c("salmon","olivedrab","cadetblue3","purple"), pch = 16, bty = "n")
#add loadings and text
arrows(0, 0, pca.res$rotation[,1]*20, pca.res$rotation[,2]*20, length = 0.1)
text(pca.res$rotation[,1]*22, pca.res$rotation[,2]*22, rownames(pca.res$rotation), cex = 0.7)
#plotting PCA
autoplot(pca.res, data = geo.na, colour = "rock.type", #plot results, name using original data frame
loadings = T, loadings.colour = "black", loadings.label = T,
loadings.label.colour = "black")
The data comes from an online file from a class I'm taking, so you could just copy this if you have the ggplot2 and ggfortify packages installed. Graphs below.
R plot of what I want ggplot to look like
What ggplot actually looks like
Edit:
Adding reproducible code below.
library(dplyr) #provides %>% and select()
iris.res <-
iris %>%
select(Sepal.Length:Petal.Width) %>%
as.matrix(.) %>%
prcomp(., scale = F)
autoplot(iris.res, data = iris, size = 4, col = "Species", shape = "Species",
x = 1, y = 2, #components 1 and 2
loadings = T, loadings.colour = "grey50", loadings.label = T,
loadings.label.colour = "grey50", loadings.label.repel = T) + #loadings are arrows
geom_vline(xintercept = 0, lty = 2) +
geom_hline(yintercept = 0, lty = 2) +
theme_bw() + #complete theme first, otherwise it resets the aspect ratio
theme(aspect.ratio = 1)
This answer is probably long after the OP needs it, but I'm offering it because I have been wrestling with the same issue for a while, and maybe I can save someone else the same effort.
# Load ggplot2 and the data
library(ggplot2)
iris <- data.frame(iris)
# Do PCA
PCA <- prcomp(iris[,1:4])
# Extract PC axes for plotting
PCAvalues <- data.frame(Species = iris$Species, PCA$x)
# Extract loadings of the variables
PCAloadings <- data.frame(Variables = rownames(PCA$rotation), PCA$rotation)
# Plot
ggplot(PCAvalues, aes(x = PC1, y = PC2, colour = Species)) +
geom_segment(data = PCAloadings, aes(x = 0, y = 0, xend = (PC1*5),
yend = (PC2*5)), arrow = arrow(length = unit(1/2, "picas")),
color = "black") +
geom_point(size = 3) +
annotate("text", x = (PCAloadings$PC1*5), y = (PCAloadings$PC2*5),
label = PCAloadings$Variables)
To increase the arrow length, multiply the loadings used for xend and yend in the geom_segment call (by 5 in this example). With a bit of trial and error, you can work out what number to use.
To place the labels at the arrow tips, multiply the loadings by the same value in the annotate call.
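If trial and error gets tedious, the multiplier can also be derived from the data. The following is only a sketch of one possible heuristic, not something ggplot2 or ggfortify provides: the names fill and arrow_scale are made up for illustration, and the 0.8 is an arbitrary choice.

```r
# PCA on the four numeric iris columns, as in the answer above
PCA      <- prcomp(iris[, 1:4])
scores   <- PCA$x[, 1:2]
loadings <- PCA$rotation[, 1:2]

# Heuristic (an assumption): choose the largest multiplier such that no arrow
# extends beyond `fill` times the largest absolute score on either axis
fill <- 0.8
arrow_scale <- fill * min(
  max(abs(scores[, 1])) / max(abs(loadings[, 1])),
  max(abs(scores[, 2])) / max(abs(loadings[, 2]))
)

# Pre-scaled loadings: PC1 and PC2 can be used directly as xend and yend
PCAloadings <- data.frame(Variables = rownames(loadings), loadings * arrow_scale)
```

Because the loadings are scaled once up front, the same PCAloadings columns can feed both the geom_segment and the annotate calls, which keeps the arrows and their labels aligned.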
I have two columns of data, f.delta and g.delta, that I would like to show in a single scatter plot in R.
Here is how I am doing it.
plot(f.delta~x, pch=20, col="blue")
points(g.delta~x, pch=20, col="red")
The problem is this: the values of f.delta vary from 0 to -7; the values of g.delta vary from 0 to 10.
When the plot is drawn, the y axis extends from 1 to -7. So while all the f.delta points are visible, any g.delta point that has y>1 is cut off from view.
How do I stop R from automatically setting the y limits from the data values? I have tried various combinations of yaxt, yaxp, ylims, without success.
Any suggestion will be greatly appreciated.
Thanks,
Anjan
In addition to Gavin's excellent answer, I also thought I'd mention that another common idiom in these cases is to create an empty plot with the correct limits and then to fill it in using points, lines, etc.
Using Gavin's example data:
with(df,plot(range(x),range(f.delta,g.delta),type = "n"))
points(f.delta~x, data = df, pch=20, col="blue")
points(g.delta~x, data = df, pch=20, col="red")
The type = "n" causes plot to create only the empty plotting window, based on the range of x and y values we've supplied. Then we use points for both columns on this existing plot.
You need to tell R what the limits of the data are and pass that as argument ylim to plot() (note the argument is ylim not ylims!). Here is an example:
set.seed(1)
df <- data.frame(f.delta = runif(10, min = -7, max = 0),
g.delta = runif(10, min = 0, max = 10),
x = rnorm(10))
ylim <- with(df, range(f.delta, g.delta)) ## compute y axis limits
plot(f.delta ~ x, data = df, pch = 20, col = "blue", ylim = ylim)
points(g.delta ~ x, data = df, pch = 20, col = "red")
Which produces
If my X axis is time and my Y is numeric data, how can I add a point at an arbitrary Y value (say 500) wherever a data point exists?
I am overlaying using lines on top of other plots.
Just add more points, with the same x values as your previous data, and a fixed y value.
So you have something like:
dfr <- data.frame(x = sample(100, 10, replace = TRUE), y = runif(10))
with(dfr, plot(x, y))
and you want to add
points(dfr$x, rep.int(0.5, 10), col = "blue")
Having time for x values shouldn't affect anything.
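As a rough illustration with an actual time axis, here is the same idea using POSIXct timestamps. The data are made up, and marker_y is a hypothetical name for the fixed y value (500, as in the question).

```r
# Hypothetical time series: ten irregular daily timestamps with numeric values
set.seed(1)
dfr <- data.frame(
  time = as.POSIXct("2024-01-01", tz = "UTC") + sort(sample(0:29, 10)) * 86400,
  y    = runif(10, min = 400, max = 600)
)

# Draw the series as a line
plot(y ~ time, data = dfr, type = "l")

# One marker at y = 500 for every timestamp present in the data
marker_y <- rep(500, nrow(dfr))
points(dfr$time, marker_y, col = "blue", pch = 19)
```

If 500 lies outside the range of the plotted data, the markers will be clipped unless ylim is widened in the plot call.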