I'm producing a plot like this:
library(cluster)  # provides agnes()
library(ggplot2)
data.dist = matrix(
c(10, -10, 10, -10, 10, -10, 10, -10, 10),
nrow=3,
ncol=3,
byrow = TRUE)
hc <- agnes(dist(data.dist), method = "ward", diss = TRUE)
cluster <- cutree(hc, k=2)
xy <- data.frame(cmdscale(dist(data.dist)), factor(cluster))
names(xy) <- c("x", "y", "cluster")
xy$model <- rownames(xy)
ggplot(xy, aes(x, y)) + geom_point(aes(colour=cluster), size=3)
Which gives me:
However, let's say I want to attach another covariate, say a binary variable c(1, 0, 1) to the data and display all 1 using one symbol (say an X) and all 0 using another symbol (say a dot). How can I accomplish this?
xy<-data.frame(x=rnorm(3),y=rnorm(3),cluster=as.factor(c(1,0,1)),another=as.factor(c(1,1,0)) )
ggplot(xy, aes(x, y,shape=another)) + geom_point(aes(colour=cluster), size=3)
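To get exactly the symbols asked for (an X for 1, a dot for 0), the shape mapping can be combined with scale_shape_manual(). The following is a minimal sketch, not the only way to do it. The data frame and the column name covariate are made up for illustration; shape codes 4 and 16 are the standard R plotting symbols for a cross and a filled circle.

```r
library(ggplot2)

# Illustrative data: three points, a cluster label, and the binary covariate c(1, 0, 1)
xy <- data.frame(
  x = c(-1, 0, 1),
  y = c(0.5, -0.5, 0),
  cluster = factor(c(1, 2, 1)),
  covariate = factor(c(1, 0, 1))   # hypothetical column name
)

# Colour still encodes the cluster; shape encodes the covariate.
# Shape 4 draws an X, shape 16 draws a filled dot.
p <- ggplot(xy, aes(x, y)) +
  geom_point(aes(colour = cluster, shape = covariate), size = 3) +
  scale_shape_manual(values = c("0" = 16, "1" = 4))

print(p)
```

Naming the entries of values after the factor levels keeps the assignment explicit, so it does not depend on the order of the levels.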
I am plotting a density in base R and then in ggplot2.
When I use base R the plot comes out alright, but in ggplot2 the margins are cut out.
This is the plot in base R:
library(tidyverse)
library(mvtnorm)
library(reshape2)
#>
#> Attaching package: 'reshape2'
#> The following object is masked from 'package:tidyr':
#>
#> smiths
sd <- 1 / 2
# sigma
s1 <- sd^2
mu1 <- c(0, 0)
sigma1 <- matrix(c(s1^2, 0, 0, s1^2), nrow = 2)
# first two vectors
x.points <- seq(-3, 3, length.out = 100)
y.points <- seq(-3, 3, length.out = 100)
# the third vector is a density
z <- matrix(0, nrow = 100, ncol = 100)
z[] <- dmvnorm(expand.grid(x.points, y.points), mean = mu1, sigma = sigma1)
contour(x.points, y.points, z, xlim = range(-3, 3), ylim = c(-3, 3), nlevels = 5, drawlabels = TRUE)
And this is the plot in ggplot2:
df <- reshape2::melt(z)
df <- transform(
df,
x = x.points[Var1],
y = y.points[Var2]
)
ggplot(df, aes(x, y)) +
geom_contour(aes(z = value)) +
xlim(-3,3) +
ylim(-3,3) +
theme_classic()
Created on 2021-04-08 by the reprex package (v0.3.0)
When I started working on the plots, both plots were coming out well. I was running par(pty = "s") (unfortunately if I include the par command in the reprex(), something goes wrong and there is no plot.) The par command was working and giving me a square plot for both base R and ggplot2. Then I added a line and some points to the ggplot2 plot:
points <- data.frame(
x = c(0, 1, 1.5, 1, 0),
y = c(-3, -2.5, 0, 2.5, 3)
)
ggplot(df, aes(x, y)) +
geom_contour(aes(z = value)) +
xlim(-3,3) +
ylim(-3,3) +
geom_path(data = points, mapping = aes(x = x, y = y)) +
geom_point(data = points, mapping = aes(x = x, y = y)) +
theme_classic()
After I added the points, the ggplot2 plot started cutting out the margins.
I have tried adding dev.new(width=10, height=10) following this advice, but of course it just opens a new graphics device and, in addition, the margins are still cut. I have also tried resetting the graphics device with dev.off() and restarting the R session.
The issue was the classic theme, theme_classic(): it looks like, no matter the type of plot, theme_classic() draws only two sides of the box around the plot (the bottom and left axis lines). Switching to theme_bw() gives the full box:
library(tidyverse)
library(mvtnorm)
library(reshape2)
#>
#> Attaching package: 'reshape2'
#> The following object is masked from 'package:tidyr':
#>
#> smiths
sd <- 1 / 2
# sigma
s1 <- sd^2
mu1 <- c(0, 0)
sigma1 <- matrix(c(s1^2, 0, 0, s1^2), nrow = 2)
# first two vectors
x.points <- seq(-3, 3, length.out = 100)
y.points <- seq(-3, 3, length.out = 100)
# the third vector is a density
z <- matrix(0, nrow = 100, ncol = 100)
z[] <- dmvnorm(expand.grid(x.points, y.points), mean = mu1, sigma = sigma1)
df <- reshape2::melt(z)
df <- transform(
df,
x = x.points[Var1],
y = y.points[Var2]
)
ggplot(df, aes(x, y)) +
geom_contour(aes(z = value)) +
xlim(-3,3) +
ylim(-3,3) +
theme_bw()
Created on 2021-04-08 by the reprex package (v0.3.0)
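If the rest of theme_classic() is wanted, one option is to keep it and add the panel border back by hand. This is a sketch under assumptions: the surface below is a simple stand-in for the dmvnorm() density, and coord_fixed() is used here to get a square panel, since par(pty = "s") only applies to base graphics.

```r
library(ggplot2)

# Stand-in surface on the same [-3, 3] grid (not the original dmvnorm() density)
df <- expand.grid(x = seq(-3, 3, length.out = 50),
                  y = seq(-3, 3, length.out = 50))
df$value <- exp(-(df$x^2 + df$y^2) / 2)

p <- ggplot(df, aes(x, y)) +
  geom_contour(aes(z = value)) +
  # fixed 1:1 aspect ratio with explicit limits gives a square panel
  coord_fixed(xlim = c(-3, 3), ylim = c(-3, 3)) +
  theme_classic() +
  # draw all four sides of the box on top of the classic theme
  theme(panel.border = element_rect(fill = NA, colour = "black"))

print(p)
```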
I am using the k-means algorithm in R in order to separate variables. I would like to plot the results in ggplot, which I was able to manage,
however the results seem to be different in ggplot and in cluster::clusplot.
So I wanted to ask what I am missing: for example, I know that the scaling is different, but I was wondering why, when using clusplot, all variables are inside the bounds, and when using ggplot they are not.
Is it just because of the scaling?
So are the two results below exactly the same?
library(cluster)
library(ggfortify)
x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)
A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)
autoplot(kmeans(x, centers = 3, nstart = 50, iter.max = 500), data = x, frame.type = 'norm')
For me, I get the same plot using either clusplot or ggplot. But to use ggplot, you first have to run a PCA on your data in order to get the same plot as clusplot, which plots the data on its first two principal components. Maybe that is where your issue is.
Here, with your example, I did:
x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)
A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)
library(ggplot2)
# PCA scores, combined with the k-means cluster assignment
pca_x = princomp(x)
x_cluster = data.frame(pca_x$scores,A$cluster)
ggplot(x_cluster, aes(x = Comp.1, y = Comp.2, color = as.factor(A.cluster), fill = as.factor(A.cluster))) + geom_point() +
stat_ellipse(type = "t",geom = "polygon",alpha = 0.4)
The plot using clusplot
And the one using ggplot:
Hope this helps you figure out why your plots differ.
I am trying to create a scatterplot that is summarized by hexagon bins of counts. I would like the user to be able to define the count breaks for the color scale. I have this working, using scale_fill_manual(). Oddly, however, it only works sometimes. In the MWE below, using the given seed value, if xbins=10, there are issues resulting in a plot as follows:
However, if xbins=20 or 40, for example, the plot doesn't seem to have problems:
My MWE is as follows:
library(ggplot2)
library(hexbin)
library(RColorBrewer)
set.seed(1)
xbins <- 20
x <- abs(rnorm(10000))
y <- abs(rnorm(10000))
minVal <- min(x, y)
maxVal <- max(x, y)
maxRange <- c(minVal, maxVal)
buffer <- (maxRange[2] - maxRange[1]) / (xbins / 2)
h <- hexbin(x = x, y = y, xbins = xbins, shape = 1, IDs = TRUE,
xbnds = maxRange, ybnds = maxRange)
hexdf <- data.frame (hcell2xy(h), hexID = h@cell, counts = h@count)
my_breaks <- c(2, 4, 6, 8, 20, 1000)
clrs <- brewer.pal(length(my_breaks) + 3, "Blues")
clrs <- clrs[3:length(clrs)]
hexdf$countColor <- cut(hexdf$counts, breaks = c(0, my_breaks, Inf),
labels = rev(clrs))
ggplot(hexdf, aes(x = x, y = y, hexID = hexID, fill = countColor)) +
scale_fill_manual(values = levels(hexdf$countColor)) +
geom_hex(stat = "identity") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
coord_fixed(xlim = c(-0.5, (maxRange[2] + buffer)),
ylim = c(-0.5, (maxRange[2] + buffer))) +
theme(aspect.ratio=1)
My goal is to tweak this code so that the plot does not have problems (where suddenly certain hexagons are different sizes and shapes than the rest) regardless of the value assigned to xbins. However, I am puzzled what may be causing this problem for certain xbins values. Any advice would be greatly appreciated.
EDIT:
I am updating the example code after taking into account comments by @bdemarest and @Axeman. I followed the most popular answer in the link @Axeman recommends, and believe it is more useful when you are working with scale_fill_continuous() on an integer vector. Here, I am working with scale_fill_manual() on a factor vector. As a result, I am still unable to get this to work. Thank you.
library(ggplot2)
library(hexbin)
library(RColorBrewer)
library(reshape2)  # provides melt()
set.seed(1)
xbins <- 10
x <- abs(rnorm(10000))
y <- abs(rnorm(10000))
minVal <- min(x, y)
maxVal <- max(x, y)
maxRange <- c(minVal, maxVal)
buffer <- (maxRange[2] - maxRange[1]) / (xbins / 2)
bindata = data.frame(x=x,y=y,factor=as.factor(1))
h <- hexbin(bindata, xbins = xbins, IDs = TRUE, xbnds = maxRange, ybnds = maxRange)
counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- melt (counts)
colnames (counts) <- c ("factor", "ID", "counts")
counts$factor =as.factor(counts$factor)
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
my_breaks <- c(2, 4, 6, 8, 20, 1000)
clrs <- brewer.pal(length(my_breaks) + 3, "Blues")
clrs <- clrs[3:length(clrs)]
hexdf$countColor <- cut(hexdf$counts, breaks = c(0, my_breaks, Inf), labels = rev(clrs))
ggplot(hexdf, aes(x = x, y = y, fill = countColor)) +
scale_fill_manual(values = levels(hexdf$countColor)) +
geom_hex(stat = "identity") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
coord_cartesian(xlim = c(-0.5, maxRange[2]+buffer), ylim = c(-0.5, maxRange[2]+ buffer)) + theme(aspect.ratio=1)
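You can set the colors directly in the geom, outside aes(), instead of going through a fill scale:
ggplot(hexdf, aes(x = x, y = y)) +
# countColor is a factor whose levels are the hex colour codes; pass them as plain strings
geom_hex(stat = "identity", fill = as.character(hexdf$countColor))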
Suppose I have the following data:
coef <- list(c(47, 2, 0, 0),
c(7, 42, -8, 0),
c(78, -71, 43, -7))
my_data <- data.frame("x" = rep(1:4, times = 20))
cols <- c("1", "2", "3")
I would like to plot a function for each vector of coefficients in the object coef.
However, using a for loop does not work.
library(ggplot2)
g1 <- ggplot(data = my_data, aes(x = x))
for(i in 1:3){
my_fun <- function(x){
sum(as.vector(outer(x, 0:3, FUN="^")) * coef[[i]])
}
my_fun <- Vectorize(my_fun)
g1 <- (g1 + stat_function(fun = my_fun, aes(col = cols[i]), lwd = 1.2))
print(g1)
}
The result is supposed to look (somehow) like this:
Is there a way to use lapply instead of the loop? Or can I modify the for-loop to fix this problem?
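One possible lapply-based version is sketched below. The likely reason the loop misbehaves is lazy evaluation: both my_fun and aes(col = cols[i]) look up i only when the plot is drawn, by which time i is 3 for every layer. The helper make_poly() is a hypothetical name introduced here; it fixes the coefficients at the moment the function is created. linewidth assumes ggplot2 3.4 or later (lwd in the original).

```r
library(ggplot2)

coef <- list(c(47, 2, 0, 0),
             c(7, 42, -8, 0),
             c(78, -71, 43, -7))
my_data <- data.frame(x = rep(1:4, times = 20))
cols <- c("1", "2", "3")

# Hypothetical helper: returns a vectorized cubic b[1] + b[2]*x + b[3]*x^2 + b[4]*x^3
# with the coefficient vector b captured when the function is built.
make_poly <- function(b) {
  force(b)
  function(x) as.vector(outer(x, 0:3, FUN = "^") %*% b)
}

# Each call of the anonymous function has its own environment, so every
# layer sees its own value of i when the plot is finally rendered.
layers <- lapply(seq_along(coef), function(i) {
  stat_function(fun = make_poly(coef[[i]]),
                aes(colour = cols[i]),
                linewidth = 1.2)
})

# A list of layers can be added to a plot in one step
g1 <- ggplot(my_data, aes(x = x)) + layers
print(g1)
```

The matrix product replaces the sum() plus Vectorize() combination: outer() builds one row of powers per x value, so multiplying by the coefficient vector evaluates the polynomial for all x at once.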
I wanted to make a graph using facet_wrap and plot it on different pages in a pdf file. I've read so many options, and this works:
R + ggplot: plotting over multiple pages
but only when you have the same rows in each page.
I have this demo data to try explain my case:
A <- data.frame(TIME = rep(c(0, 5, 10, 15, 30, 45, 60), 5))
A$C <- (1 - exp(-0.2*A$TIME))
A$ID <- rep(1:5, each = 7)
A$R <- rnorm(35, mean = 1, sd = 0.01)
A$C2 <- A$C*A$R
Pages <- 5
A2 <- A[c(1,4:8,10:22,24:35),]
So, I have ID with different number of observations. I tried to make a vector with the number of observation in each ID (I want an ID per page), but it doesn't work.
library(plyr)  # provides ddply() and .()
nrws <- ddply(A2, .(ID), "nrow")
nsamp <- nrws[,2]
pdf("Test.pdf")
for (i in seq(Pages))
{
slice = seq(((i-1)*nsamp[i]),(i*nsamp[i]))
slice2 = slice[!(slice > nrow(A2))]
A3 = A2[slice2,]
p1 <- ggplot(A3, aes(x = TIME, y = C2)) +
geom_line(size = 0.5) +
geom_point(size = 1) +
facet_wrap(~ID)
print(p1)
}
dev.off()
Could you help me?
Thanks in advance,
Nacho
I think you were overthinking trying to calculate your "slices". Maybe you want this?
Not entirely sure. If you only want one ID per page you don't need facet_wrap, and you will probably need to set the scale explicitly to keep it the same from page to page.
library(plyr)
A <- data.frame(TIME = rep(c(0, 5, 10, 15, 30, 45, 60), 5))
A$C <- (1 - exp(-0.2*A$TIME))
A$ID <- rep(1:5, each = 7)
A$R <- rnorm(35, mean = 1, sd = 0.01)
A$C2 <- A$C*A$R
Pages <- 5
A2 <- A[c(1,4:8,10:22,24:35),]
nrws <- ddply(A2, .(ID), "nrow")
nsamp <- nrws[,2]
pdf("Test.pdf")
for (i in seq(Pages))
{
# slice = seq(((i-1)*nsamp[i]),(i*nsamp[i]))
# slice2 = slice[!(slice > nrow(A2))]
# A3 = A2[slice2,]
A3 = A2[A2$ID==i,]
p1 <- ggplot(A3, aes(x = TIME, y = C2)) +
geom_line(size = 0.5) +
geom_point(size = 1) +
facet_wrap(~ID)
print(p1)
}
dev.off()
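The remark about dropping facet_wrap and fixing the scale could look roughly like this. It is a sketch, not a drop-in replacement: the limits are taken from the full data set so every page shares the same axes, and the output path (a file in tempdir()) and the set.seed() call are assumptions added so the example is reproducible.

```r
library(ggplot2)

set.seed(1)  # assumption: only to make the random noise reproducible
A <- data.frame(TIME = rep(c(0, 5, 10, 15, 30, 45, 60), 5))
A$C  <- 1 - exp(-0.2 * A$TIME)
A$ID <- rep(1:5, each = 7)
A$C2 <- A$C * rnorm(35, mean = 1, sd = 0.01)
A2 <- A[c(1, 4:8, 10:22, 24:35), ]

# Common axis limits computed once from all IDs
x_lim <- range(A2$TIME)
y_lim <- range(A2$C2)

# One plot per ID; split() handles the unequal number of rows per ID
plots <- lapply(split(A2, A2$ID), function(d) {
  ggplot(d, aes(x = TIME, y = C2)) +
    geom_line() +
    geom_point() +
    coord_cartesian(xlim = x_lim, ylim = y_lim) +
    ggtitle(paste("ID", d$ID[1]))
})

# Each print() inside an open pdf device starts a new page
out_file <- file.path(tempdir(), "Test_fixed_scales.pdf")  # hypothetical path
pdf(out_file)
for (p in plots) print(p)
dev.off()
```

Splitting by ID avoids any row-index arithmetic, so the number of observations per ID no longer matters.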