How to add more points to an existing pairs plot in R?

par(mfrow=c(1,2))
Trigen <- data.frame(OTriathlon$Gender,OTriathlon$Swim,OTriathlon$Bike,OTriathlon$Run)
colnames(Trigen) <- c("Gender","Swim","Bike","Run")
res <- split(Trigen[,2:4],Trigen$Gender)
pairs(res$Male, pch="M", col = 4)
points(res$Female, pch ="F", col= 2)
Basically, I want to customize the pairs plot so that the plot symbol and color of each data point represent gender.
I tried a few things in the code, but the problem I am facing is that I can't add the female points to the existing plot. After running the points() line, the plot stays the same and doesn't get updated.

There is no need to call points() several times, because you can pass the factor directly as the color. Example:
plot(iris[,c(2,3)], col=iris$Species)
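Applied to the pairs plot in the question, the same idea means passing per-point pch and col vectors to a single pairs() call instead of calling points() afterwards. A minimal sketch, assuming a simulated data frame as a hypothetical stand-in for OTriathlon (which is not available here) with the same column names:

```r
# Hypothetical stand-in for OTriathlon: the real data set is not included here
set.seed(1)
Trigen <- data.frame(
  Gender = factor(rep(c("Female", "Male"), each = 20)),
  Swim   = runif(40, 18, 30),
  Bike   = runif(40, 55, 75),
  Run    = runif(40, 30, 45)
)

# Factor levels sort alphabetically ("Female", "Male"), so code 1 = Female, 2 = Male
g    <- as.integer(Trigen$Gender)
syms <- c("F", "M")   # plotting symbol per gender
cols <- c(2, 4)       # red for Female, blue for Male, as in the question

# One pairs() call draws both groups in every panel
pairs(Trigen[, c("Swim", "Bike", "Run")], pch = syms[g], col = cols[g])
```

Because the symbol and color vectors are indexed by the same factor codes, each point keeps its gender's styling in every panel.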

Related

Mismatch in legend/color in R plot

I created several plots in R. Occasionally, the colors of the points in the plot do not match the colors shown in the legend. (I can't attach images yet because of my reputation.) The first two graphs use a black/red color scheme, but the third chart automatically uses green/black while the legend stays black/red. I cannot understand why this happens.
How can I prevent this from happening?
I know it's possible to assign colors explicitly, but I am struggling to find a clear way to do this.
Code:
plot(rank, abundance, pch=16, col=type, cex=0.8)
legend(60,50,unique(type),col=1:length(type),pch=16)
plot(rank, abundance, pch=16, col=Origin, cex=0.8)
legend(60,50,unique(Origin),col=1:length(Origin),pch=16)
The third plot is where the colors don't match:
plot(rank, abundance, pch=16, col=Lifecycle, cex=0.8)
legend(60,50,unique(Lifecycle),col=1:length(Lifecycle),pch=16)
The data frame looks like this:
Plant  rank  abundance  Lifecycle  Origin  type
X      1     23         Perennial  Native  Weedy
Y      2     10         Annual     Exotic  Ornamental
Z      3     9          Perennial  Native  Ornamental
First, I create some fake data.
# factor() is needed in R >= 4.0, where data.frame() no longer converts strings to factors
df <- data.frame(rank = 1:10, abundance = runif(10,10,100),
Lifecycle = factor(sample(c('Perennial', 'Annual'), 10, replace=TRUE)))
Then I explicitly say what colors I want my points to be.
cols=c('dodgerblue', 'plum')
Then I plot, using the factor df$Lifecycle to color points.
plot(df$rank, df$abundance, col = cols[df$Lifecycle], pch=16)
Indexing cols with the factor df$Lifecycle uses the factor's integer codes, which follow the order of its levels (alphabetical by default). Therefore, in the legend, we just need to sort the unique df$Lifecycle values and pair them with our color vector (cols).
legend(5, 40, sort(unique(df$Lifecycle)), col=cols, pch=16, bty='n')
Hopefully this helps.
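In the question's code, the legend labels come from unique(), which lists categories in order of first appearance, and col=1:length(...) counts rows rather than categories, while col=Lifecycle (assuming Lifecycle is a factor) colors points by level order. When those orders differ, the legend is likely to disagree with the points. A sketch with made-up data that shows the two orders and uses a named color vector, so each category is tied to its color by name rather than position:

```r
# Made-up data in which "Perennial" happens to appear first
df <- data.frame(
  rank = 1:6,
  abundance = c(23, 10, 9, 8, 5, 2),
  Lifecycle = factor(c("Perennial", "Annual", "Perennial", "Annual", "Annual", "Perennial"))
)

# unique() keeps order of first appearance; the factor's codes follow levels()
unique_order <- as.character(unique(df$Lifecycle))  # "Perennial" "Annual"
level_order  <- levels(df$Lifecycle)                 # "Annual" "Perennial"

# Named color vector: each point looks up its color by category name
cols <- c(Annual = "dodgerblue", Perennial = "plum")
pt_cols <- unname(cols[as.character(df$Lifecycle)])

plot(df$rank, df$abundance, col = pt_cols, pch = 16)
# The legend reuses the same names and colors, so it cannot drift out of sync
legend("topright", legend = names(cols), col = cols, pch = 16, bty = "n")
```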

Add points to pairs plot?

Is there any way for me to add some points to a pairs plot?
For example, I can plot the Iris dataset with pairs(iris[1:4]), but I wanted to execute a clustering method (for example, kmeans) over this dataset and plot its resulting centroids on the plot I already had.
It would also help if there were a way to plot the whole data and the centroids together in a single pairs plot, with the centroids drawn differently. The idea is to call pairs(rbind(iris[1:4], centers)) (where centers holds the three centroids) but draw the last three rows of this matrix differently, for example by changing cex or pch. Is that possible?
You give the solution yourself in the last paragraph of your question. Yes, you can use pch and col in the pairs function.
pairs(rbind(iris[1:4], kmeans(iris[1:4],3)$centers),
pch=rep(c(1,2), c(nrow(iris), 3)),
col=rep(c(1,2), c(nrow(iris), 3)))
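Another option is to use a panel function. pairs() calls the panel function once per off-diagonal panel, row by row, so a global counter i is used to look up which pair of variables is currently being drawn:
cl <- kmeans(iris[1:4],3)
# All (x, y) variable pairs in the order pairs() draws the panels (row by row, diagonal skipped)
idx <- subset(expand.grid(x=1:4,y=1:4),x!=y)
i <- 1  # panel counter; reset it to 1 before redrawing the plot
pairs(iris[1:4],bg=cl$cluster,pch=21,
panel=function(x, y,bg, ...) {
points(x, y, pch=21,bg=bg)
# overlay the centroids for the variable pair shown in this panel
points(cl$centers[,idx[i,'x']],cl$centers[,idx[i,'y']],
cex=4,pch=10,col='blue')
i <<- i +1
})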
But I think it is safer and easier to use the lattice splom() function, which also generates the legend automatically.
cl <- kmeans(iris[1:4],3)
library(lattice)
splom(iris[1:4],groups=cl$cluster,pch=21,
panel=function(x, y,i,j,groups, ...) {
panel.points(x, y, pch=21,col=groups)
panel.points(cl$centers[,j],cl$centers[,i],
pch=10,col='blue')
},auto.key=TRUE)
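The question also asked about drawing the centroids with a different cex. A sketch of the rbind approach that keeps all per-row styling (color, symbol, size) inside a custom panel function; the variable names and the chosen symbols and sizes are illustrative assumptions:

```r
set.seed(1)   # kmeans() starts from random centres
cl <- kmeans(iris[1:4], centers = 3)

# Stack the 150 observations and the 3 centroids into one matrix
all_rows  <- rbind(as.matrix(iris[1:4]), cl$centers)
is_center <- rep(c(FALSE, TRUE), c(nrow(iris), nrow(cl$centers)))

# Per-row styling: points colored by cluster, centroids as large stars in their cluster's color
pt_col <- c(cl$cluster, seq_len(nrow(cl$centers)))
pt_pch <- ifelse(is_center, 8, 1)
pt_cex <- ifelse(is_center, 2.5, 0.8)

# The panel receives x and y in row order, so the styling vectors line up with them
pairs(all_rows,
      panel = function(x, y, ...) points(x, y, col = pt_col, pch = pt_pch, cex = pt_cex))
```

Unlike the counter-based panel above, this panel function does not need to know which variable pair it is drawing, because the centroids are ordinary rows of the data.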

How to colour every sub group of points?

I am trying to plot a set of points in R with the plot command, and I would like to group them by color. For example, I have 9 points: the first three should be red, the next three blue, ...
You just have to provide a vector of the respective colors:
plot(1:9, 1:9, col = rep(c("red", "blue", "black"), each = 3))
Normally, though, you shouldn't build this vector by hand; create it from a grouping variable instead.
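Following that advice, a sketch that derives the colors from a grouping variable; grp and palette_by_group are hypothetical names standing in for whatever grouping your data has:

```r
# Hypothetical grouping variable: which sub group each of the 9 points belongs to
grp <- factor(rep(c("a", "b", "c"), each = 3))
palette_by_group <- c(a = "red", b = "blue", c = "black")

# Look up each point's color from its group name
pt_cols <- unname(palette_by_group[as.character(grp)])
plot(1:9, 1:9, col = pt_cols, pch = 16)
```

If the groups change, only grp needs updating; the color vector follows automatically.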
Without more detailed requirements, sample code and some example data it's difficult to determine exactly what you're looking for, but perhaps the colour aesthetic in ggplot2 is what you need.
# load required library.
library(ggplot2)
# create some data.frame with numbers and colours.
p <- data.frame(x=1:9, y=1:9, c=rep(c("red","blue","green"), each=3))
# plot; scale_colour_identity() treats the strings in p$c as literal colours
# (without it, ggplot2 maps them to its default palette)
ggplot(p, aes(x, y, colour=c)) + geom_point() + scale_colour_identity()
Hope that helps.

Plot With Blocks

I have been searching for hours, but I can't find a function that does this.
How do I generate a plot like the one in my example image (not reproduced here)?
Let's say I have an array x1 = c(2,13,4) and y2 = c(5,23,43). I want to create 3 blocks with heights spanning 2-5, 13-23, ...
How would I approach this problem? I'm hoping that I could be pointed in the right direction as to what built-in function to look at?
I have not used your data because you say you are working with an array, but you gave us two vectors. Moreover, the ranges you showed us overlap (13-23 lies inside 4-43), so if you chart three bars on top of each other, you only see two.
Based on the little image you provided, you want to plot three ranges for each individual or date. With time series, this kind of chart is commonly used to show the min/max range, the standard deviation band and the current data.
The trick is to chart the series as layers. The first series is the one with the largest range (the beige band in this example). In the following example, I chart an empty plot first and I add three layers of rectangles, one for beige, one for gray and one for red.
#Create data.frame with three random ranges per id
set.seed(1)
n <- 100
df <- data.frame(id = 1:n,
                 beige.min = runif(n)*10,    beige.max = 60+runif(n)*10,
                 gray.min  = 25+runif(n)*10, gray.max  = 40+runif(n)*10,
                 red.min   = 35-runif(n)*10, red.max   = 35+runif(n)*10)
#Create chart
plot(range(df$id), range(df[,-1]), type="n", xlab="id", ylab="value") #blank chart spanning the data
rect(df$id-0.5, df$beige.min, df$id+0.5, df$beige.max, col="beige", border=NA) #first layer
rect(df$id-0.5, df$gray.min, df$id+0.5, df$gray.max, col="gray", border=NA) #second layer
rect(df$id-0.5, df$red.min, df$id+0.5, df$red.max, col="darkred", border=NA) #third layer
It's not entirely clear what you want based on the png, but based on what you've written:
x1 <- c(2,13,4)
y2 <- c(5,23,43)
foo <- data.frame(id=1:3, x1, y2)
library(ggplot2)
ggplot(data=foo) + geom_rect(aes(ymin=x1, ymax=y2, xmin=id-0.4, xmax=id+0.4))
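For the question's own two vectors, the rect() approach from the first answer also works without ggplot2. A base-R sketch, assuming each pair (x1[i], y2[i]) is the lower and upper end of block i, drawn at x position i:

```r
x1 <- c(2, 13, 4)    # lower end of each block (from the question)
y2 <- c(5, 23, 43)   # upper end of each block
id <- seq_along(x1)  # one x position per block

# Empty canvas large enough for all blocks
plot(NA, type = "n", xlim = c(0.5, length(id) + 0.5), ylim = range(x1, y2),
     xaxt = "n", xlab = "block", ylab = "value")
axis(1, at = id)

# One rectangle per block, 0.8 units wide, spanning x1[i] to y2[i]
rect(xleft = id - 0.4, ybottom = x1, xright = id + 0.4, ytop = y2,
     col = "gray", border = NA)
```

Placing each block at its own x position avoids the overlap problem mentioned above; stacking them at the same position would require drawing the widest range first.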

Circling a particular box in R boxplot

Is it possible to circle a particular box in a boxplot in R? The assumption here is that I know beforehand which of the boxes it is that I have to highlight.
I heartily second @csgillespie's suggestion to just make it a different color.
That said, I played around a bit, and this is what I came up with (using @Marc's data):
df <- data.frame(s1=rnorm(100), s2=rnorm(100, mean=2), s3=rnorm(100, mean=-2))
Plot the boxplot and keep the stats for plotting the ellipse:
foo <- boxplot(df, border=c(8,8,1), lwd=c(1,1,3))
Set semimajor and semiminor axes:
aa <- 0.5
bb <- foo$stats[4,3]-foo$stats[2,3]
Plot a parameterized ellipse around the third box:
tt <- seq(0,2*pi,by=.01)
lines(3+aa*cos(tt),foo$stats[3,3]+bb*sin(tt))
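Both suggestions, a distinct fill color and the ellipse, can be packaged so the highlighted box is chosen by index. A sketch assuming the box index is known beforehand, as in the question; circle_box() and highlight are hypothetical names, not base R functions:

```r
set.seed(1)
df <- data.frame(s1 = rnorm(100), s2 = rnorm(100, mean = 2), s3 = rnorm(100, mean = -2))
highlight <- 3   # index of the box to emphasise, assumed known beforehand

# Option 1: give the highlighted box its own fill color
fills <- ifelse(seq_along(df) == highlight, "tomato", "white")
stats <- boxplot(df, col = fills)$stats   # rows: whisker, hinge, median, hinge, whisker

# Option 2: circle it. circle_box() is a hypothetical helper, not part of base R.
circle_box <- function(stats, i, aa = 0.5, ...) {
  bb <- stats[4, i] - stats[2, i]          # vertical semi-axis: the box's IQR
  tt <- seq(0, 2 * pi, length.out = 200)   # angles for the parametric ellipse
  lines(i + aa * cos(tt), stats[3, i] + bb * sin(tt), ...)  # centred on the median
  invisible(NULL)
}
circle_box(stats, highlight, col = "red", lwd = 2)
```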
If you want a somewhat hand-drawn look and can do some interactive work (for example, creating a presentation where one slide shows just the plot and the next slide adds the circle around the box of interest), you can:
1. Use the locator function to click on points that surround the part of the plot you're interested in. You might want to set type='l' so you can see the shape you are making (but you will then need to recreate the plot without the added lines).
2. Pass the return value from step 1 to the xspline function, along with other options.
example:
boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
tmp <- locator(type='l') # click on plot around box of interest
boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
xspline(tmp, open=FALSE, border='red', lwd=3)
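For a script that has to run non-interactively, the locator() step can be replaced by hand-picked control points. A sketch in which the coordinates in tmp are assumed values chosen around the third box (spray C) rather than clicked:

```r
boxplot(count ~ spray, data = InsectSprays, col = "lightgray")

# Hand-picked control points around the third box, standing in for locator()
tmp <- list(x = c(2.55, 3, 3.45, 3.45, 3, 2.55),
            y = c(-0.5, -1, -0.5, 7.5, 8.5, 7.5))

# shape = 1 gives a smooth closed curve that approximates the control points
xspline(tmp, open = FALSE, shape = 1, border = "red", lwd = 3)

# draw = FALSE returns the computed outline instead of drawing it
pts <- xspline(tmp, open = FALSE, shape = 1, draw = FALSE)
```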
