I have around 20,000 points in my scatter plot. I have a list of interesting points and want to show those points in the scatter plot in a different color. Is there a simple way to do it? Thank you.
Further explanation,
I have a matrix consisting of 20,000 rows, say R1 to R20000, and 4 columns, say A, B, C, and D. Each row has its own row name. I want to make a scatter plot of A against B, which is easy with plot(data$A,data$B).
I also have a list of row names whose positions in the scatter plot I want to check, say R1,R3,R5,R10,R20,R25.
I just want to give R1,R3,R5,R10,R20,R25 a different color from the other points in the scatter plot. Sorry if my explanation is not clear.
If your data is in a simple form, then it is easy to do. For example:
# Make some toy data
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
# List of indices (or a logical vector) defining your interesting points
is.interesting <- sample(1000, 30)
# Create vector/column of colours
dat$col <- "lightgrey"
dat$col[is.interesting] <- "red"
# Plot
with(dat, plot(x, y, col = col, pch = 16))
Without a reproducible example, it's hard to say anything more specific.
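For the row-name case described in the question, the same idea might look like the sketch below. It assumes the data is a matrix with row names R1 to R20000 and columns A to D; the values are made up, and `interesting` and `hit` are hypothetical names. Note that a matrix is indexed with data[, "A"] rather than data$A.

```r
# Toy stand-in for the asker's matrix: 20000 rows named R1..R20000, columns A-D
set.seed(42)
data <- matrix(rnorm(20000 * 4), ncol = 4,
               dimnames = list(paste0("R", 1:20000), c("A", "B", "C", "D")))

# Row names of the points to highlight (taken from the question)
interesting <- c("R1", "R3", "R5", "R10", "R20", "R25")

# TRUE for rows whose name is in the list
hit <- rownames(data) %in% interesting

# Draw all points in grey, then overplot the interesting ones so they stay visible
plot(data[, "A"], data[, "B"], col = "lightgrey", pch = 16)
points(data[hit, "A"], data[hit, "B"], col = "red", pch = 16)
```

Drawing the highlighted points last keeps them from being buried under the other 20,000 points.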
I made a plot with 1000000 points using plot() and the data from object z = data.frame(x,y). I have a separate data frame with a column of 1000 rows (values) df$v. All elements from df$v intersect with z[,"x"].
I want to plot all these values (points) with a different color, let's say green. I know how to do it one by one, e.g. for the value 582251 from df$v:
plot(z$x, z$y,.....)
points(z[z[,"x"]==582251,],col="green", pch=19, cex=0.3)
Is there any way to do it for the whole df$v using points()?
I would create another vector just for colors. This can be added to your existing data.frame z using z = cbind(z, 'col'='black'). This will produce a new column in z. Then you replace the black with green for desired points using z$col[z$x %in% df$v] = 'green'. Then run the following command
plot(z$x, z$y, col = z$col)
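If you would rather keep the points() approach from the question, a single call with %in% should cover all of df$v. This is a minimal sketch on made-up data; only the names z, df and v come from the question, and `sel` is a hypothetical helper variable.

```r
# Toy data standing in for z and df (names from the question, values made up)
set.seed(1)
z  <- data.frame(x = 1:1000, y = rnorm(1000))
df <- data.frame(v = sample(z$x, 50))

# Rows of z whose x value appears anywhere in df$v
sel <- z$x %in% df$v

plot(z$x, z$y, pch = 19, cex = 0.3)
# One points() call colours every matching row at once
points(z$x[sel], z$y[sel], col = "green", pch = 19, cex = 0.3)
```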
Here is some data to work with.
df <- data.frame(x1=c(234,543,342,634,123,453,456,542,765,141,636,3000),x2=c(645,123,246,864,134,975,341,573,145,468,413,636))
If I plot these data, it will produce a simple scatter plot with an obvious outlier:
plot(df$x2,df$x1)
Then I can always write the code below to remove the y-axis outlier(s).
plot(df$x2,df$x1,ylim=c(0,800))
So my question is: is there a way to exclude obvious outliers in scatterplots automatically, the way outline=F does for boxplots? To my knowledge, outline=F doesn't work with scatterplots.
This is relevant because I have hundreds of scatterplots and I want to exclude all obvious outlying data points without setting ylim(...) for each individual scatterplot.
You could write a function that returns the index of what you define as an obvious outlier. Then use that function to subset your data before plotting.
Here all observations with "a" exceeding 5 * median of "a" are excluded.
df <- data.frame(a = c(1,3,4,2,100), b=c(1,3,2,4,2))
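f <- function(x){
  # Logical index: TRUE for rows where "a" exceeds 5 * median of "a"
  x$a > 5*median(x$a)
}
# Negate the logical index; df[-which(...),] would drop every row when nothing is flagged
with(df[!f(df),], plot(b, a))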
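The cutoff is up to you. As one alternative, here is a sketch that borrows the boxplot convention (anything more than 1.5 * IQR beyond the quartiles); `is_outlier` and its argument `k` are hypothetical names, and the rule is just one possible definition of "obvious outlier".

```r
# Flag values more than k * IQR below the first or above the third quartile
is_outlier <- function(v, k = 1.5) {
  q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

df <- data.frame(a = c(1, 3, 4, 2, 100), b = c(1, 3, 2, 4, 2))

# Keep only the rows that are not flagged, then plot
keep <- !is_outlier(df$a)
with(df[keep, ], plot(b, a))
```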
There is no easy yes/no option to do what you are looking for (the question of defining what is an "obvious outlier" for a generic scatterplot is potentially quite problematic).
That said, it should not be too difficult to write a reasonable function to give y-axis limits from a set of data points. If we take "obvious outlier" to mean a point with y value significantly above or below the bulk of the sample (which could be justified assuming a sufficient distribution of x values), then you could use something like:
ybounds <- function(y){ # y is the response variable in the dataframe
  # Use the argument y, not a fixed column, so the function works for any data frame
  bounds <- quantile(y, probs=c(0.05, 0.95), type=3, names=FALSE)
  return(bounds + c(-1,1) * 0.1 * (bounds[2]-bounds[1]) )
}
Then plot each dataframe with plot(df$x, df$y, ylim=ybounds(df$y))
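With hundreds of scatterplots, the same limits function can be applied in a loop. The sketch below is self-contained and uses assumptions throughout: `trimmed_ylim` is a hypothetical helper that follows the quantile idea above, and `frames` is a made-up list of data frames with columns x and y.

```r
# Hypothetical helper: y-limits from the 5% and 95% quantiles,
# padded on each side by 10% of the distance between them
trimmed_ylim <- function(y, probs = c(0.05, 0.95), pad = 0.1) {
  b <- quantile(y, probs = probs, type = 3, names = FALSE, na.rm = TRUE)
  b + c(-1, 1) * pad * (b[2] - b[1])
}

# Toy list of data frames, each with one planted outlier in y
set.seed(2)
frames <- lapply(1:3, function(i) data.frame(x = runif(50), y = c(rnorm(49), 1000)))

# Same call for every data frame, no hand-set ylim
for (d in frames) {
  plot(d$x, d$y, ylim = trimmed_ylim(d$y))
}
```

Points outside the limits are simply not drawn; they are still in the data, so this hides outliers rather than removing them.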
Is there any way for me to add some points to a pairs plot?
For example, I can plot the Iris dataset with pairs(iris[1:4]), but I wanted to execute a clustering method (for example, kmeans) over this dataset and plot its resulting centroids on the plot I already had.
It would also help if there were a way to plot the whole data set and the centroids together in a single pairs plot, with the centroids drawn differently. The idea is to plot pairs(rbind(iris[1:4],centers)) (where centers holds the three centroids) but draw the last three rows of this matrix differently, for example by changing cex or pch. Is it possible?
You give the solution yourself in the last paragraph of your question. Yes, you can use pch and col in the pairs function.
pairs(rbind(iris[1:4], kmeans(iris[1:4],3)$centers),
pch=rep(c(1,2), c(nrow(iris), 3)),
col=rep(c(1,2), c(nrow(iris), 3)))
Another option is to use a panel function:
cl <- kmeans(iris[1:4],3)
idx <- subset(expand.grid(x=1:4,y=1:4),x!=y)
i <- 1
pairs(iris[1:4],bg=cl$cluster,pch=21,
panel=function(x, y,bg, ...) {
points(x, y, pch=21,bg=bg)
points(cl$centers[,idx[i,'x']],cl$centers[,idx[i,'y']],
cex=4,pch=10,col='blue')
i <<- i +1
})
But I think it is safer and easier to use the lattice splom function. The legend is also generated automatically.
cl <- kmeans(iris[1:4],3)
library(lattice)
splom(iris[1:4],groups=cl$cluster,pch=21,
panel=function(x, y,i,j,groups, ...) {
panel.points(x, y, pch=21,col=groups)
panel.points(cl$centers[,j],cl$centers[,i],
pch=10,col='blue')
},auto.key=TRUE)
Imagine we have 7 categories (e.g. religion), and we would like to plot them not in a linear way, but in clusters that are automatically chosen to be nicely aligned. Here the individuals within groups have the same response, but should not be plotted on one line (which happens when plotting ordinal data).
So to sum it up:
automatically using available graph space
grouping without order, spread around canvas
individuals remain visible; no overlapping
would be nice to have the individuals within groups to be bound by some (invisible) circle
Are there any packages designed for this purpose? What are keywords I need to look for?
Example data:
religion <- sample(1:7, 100, T)
# No overlap here, but I would like to see the group part come out more.
plot(religion)
After assigning coordinates to the center of each group, you can use wordcloud::textplot to avoid overlapping labels.
# Data
n <- 100
k <- 7
religion <- sample(1:k, n, TRUE)
names(religion) <- outer(LETTERS, LETTERS, paste0)[1:n]
# Position of the groups
x <- runif(k)
y <- runif(k)
# Plot
library(wordcloud)
textplot(
x[religion], y[religion], names(religion),
xlim=c(0,1), ylim=c(0,1), axes=FALSE, xlab="", ylab=""
)
Alternatively, you can build a graph with a clique (or a tree) for each group, and use one of the many graph-layout algorithms in igraph.
library(igraph)
A <- outer( religion, religion, `==` )
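# Newer igraph names for graph.adjacency() and minimum.spanning.tree();
# A * 1 turns the logical matrix into 0/1, diag = FALSE drops self-loops
g <- graph_from_adjacency_matrix(A * 1, mode = "undirected", diag = FALSE)
plot(g)
plot(mst(g))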
In the image you linked, each point has three numbers associated with it: the coordinates x and y, and the group (color). If you only have one piece of information for each individual, you can do something like this:
set.seed(1)
centers <- data.frame(religion=1:7, cx=runif(7), cy=runif(7))
eps <- 0.04
data <- within(merge(data.frame(religion=sample(1:7, 100, T)), centers),
{
x <- cx+rnorm(length(cx),sd=eps)
y <- cy+rnorm(length(cy),sd=eps)
})
with(data, plot(x,y,col=religion, pch=16))
Note that I'm creating random centers for each group and also adding small random displacements around these centers for each observation. You'll have to play around with the parameter eps, and maybe set the centers manually, if you want to pursue this path.
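As one way of setting the centers manually, they can be spaced evenly on a circle so that no two groups land on top of each other. This is only a sketch: the circle layout, the radius of 1 and the value of eps are my assumptions, not something the question asked for.

```r
set.seed(1)
k <- 7
angle <- 2 * pi * (seq_len(k) - 1) / k

# Centers evenly spaced on a circle of radius 1 instead of random positions
centers <- data.frame(religion = 1:k, cx = cos(angle), cy = sin(angle))
eps <- 0.08  # spread within a group, small compared with the gap between centers

# Attach the center of its group to every individual, then jitter around it
data <- merge(data.frame(religion = sample(1:k, 100, TRUE)), centers)
data$x <- data$cx + rnorm(nrow(data), sd = eps)
data$y <- data$cy + rnorm(nrow(data), sd = eps)

# asp = 1 keeps the circle round; axes are dropped since they carry no meaning
plot(data$x, data$y, col = data$religion, pch = 16, asp = 1,
     axes = FALSE, xlab = "", ylab = "")
```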
I am trying to plot a set of points in R with the plot command, and I would like to group them by colour. E.g., I have 9 points: the first three should be red, the next three blue, ...
You just have to provide a vector of the respective colors:
plot(1:9, 1:9, col = c(rep("black", 3), rep("blue", 3), rep("red", 3)))
Under normal circumstances, though, you shouldn't do this manually; create the vector of colours from a grouping variable instead.
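A minimal sketch of building the colours from a grouping variable; the group labels and the name `palette_by_group` are made up for illustration.

```r
# Grouping variable: three groups of three points
group <- factor(rep(c("a", "b", "c"), each = 3))

# One colour per group, looked up by group name
palette_by_group <- c(a = "red", b = "blue", c = "green")
cols <- palette_by_group[as.character(group)]

plot(1:9, 1:9, col = cols, pch = 16)
```

If the grouping changes, only `group` or the palette needs editing, not the plot call.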
Without more detailed requirements, sample code and some example data it'll be difficult to determine exactly what you're looking for, but perhaps the col parameter for qplot is what you need.
# load required library.
require(ggplot2)
# create some data.frame with numbers and colours.
p <- data.frame(x=1:9, y=1:9, c=rep(c("red","blue","green"), each=3))
# plot.
qplot(x, y, col=c, data=p)
Hope that helps.