Plot3d only plots first two principal components - r

I am currently trying to plot the colours of different subgroups of a large dataset. I have separated the data into 6 subgroups with 6 colours. However, my plot3d function only plots the first two principal components.
Here is an example of the plot.
Here is the code. I have run a PCA on my dataset and originally only wanted to show the first 3 principal components, but I have tried plotting all the principal components to make sure the problem isn't to do with the data.
PCA_Model <- prcomp(t(Input_dataset), center = T, scale=F)
samples_names <- row.names(PCA_Model$rotation)
# Bind sample names to their subgroup
pca_matrix <- cbind(samples_names, "Subgroup"=labeled_subgroup, stringsAsFactors=FALSE)
# Link dataframe to color
colours <- as.character(factor(pca_matrix[,"Subgroup"], levels = paste0("C", 1:6),labels = c("blue",
"red", "yellow", "green", "black", "white")))
plot3d(PCA_Model$x[,1:440], col=colours)
The dataset is very diverse so should show all subgroups. Any help would be much appreciated!

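You may be using the wrong plotting function. Try scatter3D() from the latest version of the plot3D package:
library(plot3D)
# Fit the PCA model
PCA_Model <- prcomp(dplyr::select(iris, -Species), center = TRUE, scale. = FALSE)
# Plot the first three components
scatter3D(x = PCA_Model$x[,1], y = PCA_Model$x[,2], z = PCA_Model$x[,3],
          # colvar picks the colour of each point from the factor codes;
          # col supplies the palette those codes map onto
          colvar = as.integer(iris$Species),
          col = c("blue", "red", "green"))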

I think you get something odd from feeding characters into the col option of plot3d. Below is an example of how to supply the colours: create a colour vector first, name it after your factor levels, and then index it by those levels. Adjust the script below for 6 colours:
library(rgl)
library(RColorBrewer)
pca = prcomp(iris[,-5])$x
COLS = brewer.pal(3,"Set1")
names(COLS) = levels(iris$Species)
plot3d(pca,col=COLS[as.character(iris$Species)])
I used snapshot3d() to capture the image, and the axis labels seem quite squished
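For the six-subgroup case in the question, the same named-lookup idea might look like the sketch below. The data are simulated and the names features, subgroup, cols and point_cols are made up for illustration; only the labels C1 to C6 come from the question. It keeps just the first three score columns, and it swaps "white" for another colour on the assumption that the rgl window uses its default white background.

```r
# Simulated stand-in for the asker's data: 60 samples x 10 features (hypothetical)
set.seed(1)
features <- matrix(rnorm(60 * 10), nrow = 60,
                   dimnames = list(paste0("S", 1:60), NULL))
subgroup <- rep(paste0("C", 1:6), each = 10)

pca <- prcomp(features, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:3]   # keep only PC1 to PC3 for the 3D plot

# Named lookup: one colour per subgroup label
cols <- c(C1 = "blue", C2 = "red", C3 = "gold", C4 = "green3",
          C5 = "black", C6 = "purple")
point_cols <- unname(cols[subgroup])   # index by label, not by position

# Draw only when a display is available and rgl is installed
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(scores, col = point_cols, size = 5)
}
```

Because the lookup is by name, the colours stay attached to the right subgroup even if the samples are reordered.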

Related

Issues with colour in plots [duplicate]

I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code:
data <- iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
This is all well and good, but how do I know which factor level has been given which colour?
data<-iris
plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
legend(7, 4.3, levels(data$Species), col = seq_len(nlevels(data$Species)), pch = 1)
should do it for you. But I prefer ggplot2 and would suggest that for better graphics in R.
The palette() function tells you the colours and their order when col = somefactor. It can also be used to set the colours. The output below is from an R version before 4.0.0; R 4.0.0 and later ship a different default palette.
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
In order to see that in your graph you could use a legend.
legend('topright', legend = levels(iris$Species), col = 1:3, cex = 0.8, pch = 1)
You'll notice that I only specified the new colours with 3 numbers. This will work like using a factor. I could have used the factor originally used to colour the points as well. This would make everything logically flow together... but I just wanted to show you can use a variety of things.
You could also be specific about the colours. Try ?rainbow for starters and go from there. You can specify your own or have R do it for you. As long as you use the same method for each you're OK.
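As a rough sketch of "use the same method for each", the snippet below builds one named colour vector with rainbow() and passes that same vector to both the points and the legend. The name species_cols is made up for illustration.

```r
# One colour per level, named after the levels
species_cols <- rainbow(nlevels(iris$Species))
names(species_cols) <- levels(iris$Species)

# Points: look up each observation's colour by its level name
plot(iris$Sepal.Length, iris$Sepal.Width,
     col = species_cols[as.character(iris$Species)], pch = 1)

# Legend: reuse the very same vector, so the two cannot drift apart
legend("topright", legend = names(species_cols),
       col = species_cols, pch = 1, cex = 0.8)
```

This does not depend on the current palette() setting, so the result should look the same whatever the session's default palette is.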
Like Maiasaura, I prefer ggplot2. The transparent reference manual is one of the reasons.
However, this is one quick way to get it done.
require(ggplot2)
data(diamonds)
qplot(carat, price, data = diamonds, colour = color)
# example taken from Hadley's ggplot2 book
And because someone famous said that plot-related posts are not complete without the plot, here's the result:
Here are a couple of references:
the qplot.R example (it uses the same diamonds dataset as above, but crops the data first to get better performance),
the book: http://ggplot2.org/book/
the manual: http://docs.ggplot2.org/current/
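Newer ggplot2 releases mark qplot() as deprecated, so here is a hedged sketch of the same plot written with ggplot() and an explicit aesthetic mapping. The object names p and my_cols and the choice of the "Viridis" palette are illustrative assumptions, not part of the original answer.

```r
library(ggplot2)

# Same data and mapping as the qplot() call, spelled out with aes()
p <- ggplot(diamonds, aes(x = carat, y = price, colour = color)) +
  geom_point(alpha = 0.5) +
  labs(x = "Carat", y = "Price", colour = "Diamond colour")

# Optional: choose the colours yourself, one per factor level (D to J)
my_cols <- setNames(hcl.colors(nlevels(diamonds$color), "Viridis"),
                    levels(diamonds$color))
p <- p + scale_colour_manual(values = my_cols)

print(p)
```

The legend is generated from the colour mapping, so it always matches the points.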
There are two ways that I know of to color plot points by factor and then also have a corresponding legend automatically generated. I'll give examples of both:
Using ggplot2 (generally easier)
Using R's built-in plotting functionality in combination with the colorRampPalette function (trickier, but many people prefer/need R's built-in plotting facilities)
For both examples, I will use the ggplot2 diamonds dataset. We'll be using the numeric columns diamonds$carat and diamonds$price, and the factor/categorical column diamonds$color. You can load the dataset with the following code if you have ggplot2 installed:
library(ggplot2)
data(diamonds)
Using ggplot2 and qplot
It's a one liner. Key item here is to give qplot the factor you want to color by as the color argument. qplot will make a legend for you by default.
qplot(
  x = carat,
  y = price,
  data = diamonds,
  color = diamonds$color # color by factor color (I know, confusing)
)
Your output should look like this:
Using R's built in plot functionality
Using R's built in plot functionality to get a plot colored by a factor and an associated legend is a 4-step process, and it's a little more technical than using ggplot2.
First, we will make a colorRampPalette function. colorRampPalette() returns a new function that generates a vector of colors. In the snippet below, calling color_pallet_function(5) would return a vector of 5 colors on a scale from red to orange to blue:
color_pallet_function <- colorRampPalette(
  colors = c("red", "orange", "blue"),
  space = "Lab" # Lab space is approximately perceptually uniform, giving smoother ramps
)
Second, we need to make a list of colors, with exactly one color per diamond color. This is the mapping we will use both to assign colors to individual plot points, and to create our legend.
num_colors <- nlevels(diamonds$color)
diamond_color_colors <- color_pallet_function(num_colors)
Third, we create our plot. This is done just like any other plot you've likely done, except we refer to the list of colors we made as our col argument. As long as we always use this same list, our mapping between colors and the levels of diamonds$color will be consistent across our R script.
plot(
  x = diamonds$carat,
  y = diamonds$price,
  xlab = "Carat",
  ylab = "Price",
  pch = 20, # solid dots increase the readability of this data plot
  col = diamond_color_colors[diamonds$color]
)
Fourth and finally, we add our legend so that someone reading our graph can clearly see the mapping between the plot point colors and the actual diamond colors.
legend(
  x = "topleft",
  legend = paste("Color", levels(diamonds$color)), # for readability of legend
  col = diamond_color_colors,
  pch = 19, # solid circle; pch = 20 is the same shape, just smaller
  cex = .7 # scale the legend to look attractively sized
)
Your output should look like this:
Nifty, right?
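One detail worth spelling out: indexing a colour vector with a factor uses the factor's integer codes, not its labels. The sketch below uses iris rather than diamonds so that it needs only base R. The names pal_fun, level_cols and shuffled are made up for illustration. It shows that code-based and name-based lookup agree while the level order matches the colour order, and stop agreeing once the levels are reordered.

```r
pal_fun <- colorRampPalette(c("red", "orange", "blue"), space = "Lab")
level_cols <- setNames(pal_fun(nlevels(iris$Species)), levels(iris$Species))

by_code <- level_cols[iris$Species]                # factor index: integer codes
by_name <- level_cols[as.character(iris$Species)]  # character index: names
same_when_aligned <- identical(by_code, by_name)   # TRUE

# Reverse the level order: the codes now point at different positions
shuffled <- factor(iris$Species, levels = rev(levels(iris$Species)))
same_when_shuffled <- identical(
  unname(level_cols[shuffled]),
  unname(level_cols[as.character(shuffled)])
)                                                  # FALSE
```

Building the colour vector from levels() of the same factor that is later used for indexing, as the steps above do, keeps the two in step.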
The col argument in the plot function assigns colors automatically to a vector of integers. If you convert iris$Species to numeric, you get a vector of 1s, 2s and 3s, so you can apply this as:
plot(iris$Sepal.Length, iris$Sepal.Width, col=as.numeric(iris$Species))
Suppose you want red, blue and green instead of the default colors, then you can simply adjust it:
plot(iris$Sepal.Length, iris$Sepal.Width, col=c('red', 'blue', 'green')[as.numeric(iris$Species)])
You can probably see how to further modify the code above to get any unique combination of colors.
The lattice library is another good option. Here I've added a legend on the right side and jittered the points because some of them overlapped.
library(lattice)
xyplot(Sepal.Width ~ Sepal.Length, groups = Species, data = iris,
       auto.key = list(space = "right"),
       jitter.x = TRUE, jitter.y = TRUE)

R plot3d coloring

I am using plot3d() from rgl to make 3D scatter plots, using three columns, of samples in a data frame. Furthermore, I am using a fourth column colorby from my data frame (where each sample takes values, say, 10, 11, 12 as factors/levels) to color the points in the plot.
When using plot() to make 2D plots, I first set palette(rainbow(3)) and then col = colorby within plot() followed by legend(). The plot, colors and legend work fine.
However, when I repeat the last part for plot3d(), the colours get mixed up: the levels are not assigned the same colours as they are in plot(). Moreover, if I use legend3d("topright", legend = levels(colorby), col = rainbow(3)) to create a legend, it looks the same as the 2D legend, but the coloring is clearly wrong in the 3D plot.
Where am I going wrong?
This looks correct to me:
df <- data.frame(x=1:9, y=9:1, z=101:109, colorby=gl(3,3))
palette(rainbow(3))
plot(x~y, df, col = df$colorby)
library(rgl)
with(df, plot3d(x,y,z, col = colorby))
legend3d("topright", legend = levels(df$colorby), col = levels(df$colorby), pch=19)
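If the 2D and 3D colours still disagree, one way to take palette() out of the picture is to turn the factor into explicit colour strings once and hand the same strings to every plotting call. This is a minimal sketch under that assumption; the level labels 10, 11, 12 follow the question, and level_cols and point_cols are illustrative names.

```r
df <- data.frame(x = 1:9, y = 9:1, z = 101:109,
                 colorby = gl(3, 3, labels = c("10", "11", "12")))

# One explicit colour per level, looked up by level name
level_cols <- setNames(rainbow(nlevels(df$colorby)), levels(df$colorby))
point_cols <- unname(level_cols[as.character(df$colorby)])

# 2D plot and legend use the explicit colours
plot(y ~ x, df, col = point_cols, pch = 19)
legend("topright", legend = names(level_cols), col = level_cols, pch = 19)

# 3D plot and legend reuse exactly the same vectors
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(df$x, df$y, df$z, col = point_cols, size = 8)
  rgl::legend3d("topright", legend = names(level_cols),
                col = level_cols, pch = 19)
}
```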

R: distinguish some points with a different color

I have around 20,000 points in my scatter plot. I have a list of interesting points and want to show those points in the scatter plot in a different color. Is there a simple way to do it? Thank you.
Further explanation,
I have a matrix consisting of 20,000 rows, let's say R1 to R20000, and 4 columns, let's say A, B, C, and D. Each row has its own row name. I want to make a scatter plot of A against C. That is easy with plot(data$A,data$C).
On the other hand, I have a list of row names and want to see where those points are in the scatter plot. Let's say R1, R3, R5, R10, R20, R25.
I just want to give R1, R3, R5, R10, R20, R25 a different color from the other points in the scatter plot. Sorry if my explanation is not clear.
If your data is in a simple form, then it is easy to do. For example:
# Make some toy data
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
# List of indices (or a logical vector) defining your interesting points
is.interesting <- sample(1000, 30)
# Create vector/column of colours
dat$col <- "lightgrey"
dat$col[is.interesting] <- "red"
# Plot
with(dat, plot(x, y, col = col, pch = 16))
Without a reproducible example, it's hard to say anything more specific.
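Since the question identifies the interesting points by row name rather than by position, here is a small sketch of the same idea using %in% on the row names. The toy data and the names dat, wanted and hit are assumptions for illustration; the column names A and C and the row labels come from the question.

```r
# Toy data with row names R1..R200 (stand-in for the 20,000-row matrix)
set.seed(42)
dat <- data.frame(A = rnorm(200), C = rnorm(200),
                  row.names = paste0("R", 1:200))

wanted <- c("R1", "R3", "R5", "R10", "R20", "R25")
hit <- rownames(dat) %in% wanted   # logical vector, TRUE for interesting rows

# Draw everything in grey first, then the interesting points on top
plot(dat$A, dat$C, col = "lightgrey", pch = 16, xlab = "A", ylab = "C")
points(dat$A[hit], dat$C[hit], col = "red", pch = 16)
```

Plotting the highlighted points in a second pass keeps them from being hidden under the grey ones, which matters with 20,000 points.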

Legend of a raster map with categorical data

I would like to plot a raster containing 4 different values (1) with a categorical text legend describing the categories such as 2 but with colour boxes:
I've tried using legend such as :
legend( 1,-20,legend = c("land","ocean/lake", "rivers","water bodies"))
but I don't know how to associate one value to the displayed color. Is there a way to retrieve the colour displayed with 'plot' and to use it in the legend?
The rasterVis package includes a Raster method for levelplot(), which plots categorical variables and produces an appropriate legend:
library(raster)
library(rasterVis)
## Example data
r <- raster(ncol=4, nrow=2)
r[] <- sample(1:4, size=ncell(r), replace=TRUE)
r <- as.factor(r)
## Add a landcover column to the Raster Attribute Table
rat <- levels(r)[[1]]
rat[["landcover"]] <- c("land","ocean/lake", "rivers","water bodies")
levels(r) <- rat
## Plot
levelplot(r, col.regions=rev(terrain.colors(4)), xlab="", ylab="")
By default, the colours used in a raster plot are generated by rev(terrain.colors()) (see ?raster::plot). You can use this to re-create that sequence of 4 colours for your legend, or choose your own sequence of colours:
my_col = rev(terrain.colors(n = 4))
# my_col = c('beige','red','green','blue')
First plot the map using the colour sequence. legend = FALSE gets rid of the standard colour bar:
plot(my_raster, legend = FALSE, col = my_col)
Add a custom legend to the bottom left. Use the fill argument to generate coloured boxes:
legend(x='bottomleft', legend = c("land", "ocean/lake", "rivers", "water bodies"), fill = my_col)

Customizing metaMDS() plot

I have created a NMDS plot using the 'vegan' package, like this:
y=metaMDS(data,type="p")
plot(y)
Now I have this NMDS with a good spread of my points. However, I would like to customise the graphics of the plot. I would like to give the points a different colour depending on a categorical variable in my dataset (called 'regio'), which has two values (1 or 2).
Is this possible? And if so, how?
Best,
Koen
The easiest way is to use the grouping variable regio to index into a vector of colours you want to plot with. E.g., (untested as I don't have your data...)
colvec <- c("red","blue")
plot(y, type = "n")
points(y, display = "sites", col = colvec[data$regio])
## or
text(y, display = "sites", col = colvec[data$regio])
## depending on how you want to represent the sample scores
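Indexing colvec by position works when regio holds the numbers 1 and 2. If it might be stored as a factor or as text, naming the colours by value is a little more robust. The sketch below uses plain base graphics with made-up coordinates standing in for the site scores, so it runs without vegan. The names scores_xy and site_cols and the data are hypothetical.

```r
set.seed(7)
regio <- sample(c(1, 2), 20, replace = TRUE)              # hypothetical grouping column
scores_xy <- cbind(NMDS1 = rnorm(20), NMDS2 = rnorm(20))  # stand-in for site scores

# Colours keyed by the value of regio rather than by position
colvec <- c("1" = "red", "2" = "blue")
site_cols <- unname(colvec[as.character(regio)])

plot(scores_xy, type = "n")
points(scores_xy, col = site_cols, pch = 19)
legend("topright", legend = paste("regio", names(colvec)),
       col = colvec, pch = 19)
```

With a real ordination, the same site_cols vector could be passed as col to the points() or text() calls shown above.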
