Visualising evolution of SOM learning in 2-D - r

I am relatively new to data visualisation using R. However, I would like to use R to provide a visual demonstration of how a self-organising map (SOM) learns.
I wondered if someone could help me reproduce this type of example in R, or direct me to reproducible code?
Even some good pointers would be very helpful. I can't find anything like this in the SOM documentation or the R visualisation documentation.
Any help would be greatly appreciated.

rm(list = ls())
while(!is.null(dev.list()))dev.off()
library(dplyr)
library(kohonen)
library(ggplot2)
nx <- 20; ny <- 20
som.input <- as.matrix(expand.grid(x = seq(-1, 1, length.out = nx), y = seq(-1, 1, length.out = ny))) %>%
scale
x.grid <- 10; y.grid <- 10
sgd <- somgrid(x.grid, y.grid,'hexagonal')
som.output <- list()
epoch <- 100
initial.matrix <- matrix(rnorm(x.grid * y.grid * ncol(som.input), 0, .1), nrow = x.grid * y.grid)
training.alpha <- seq(1, .01, length.out = epoch)
som.output[[1]] <- som(som.input, sgd, rlen = 1, init = initial.matrix, alpha = training.alpha[1], mode = 'online')
for(a in 2:epoch) {
som.output[[a]] <- som(som.input, sgd, rlen = 1, init = som.output[[a-1]]$codes[[1]], alpha = training.alpha[a], mode = 'online')
}
no.picture <- 16
index <- as.integer(seq(1, epoch, length.out = no.picture))
som.codes <- lapply(index, function(input) {
codes <- som.output[[input]]$codes[[1]] %>%
scale(attr(som.input, 'scaled:center'), attr(som.input, 'scaled:scale'))
cbind(
data.frame(codes * (input / epoch), index = input),
expand.grid(column = 1:y.grid, rows = 1:x.grid)
)
})
som.codes <- do.call(rbind, som.codes)
ggplot() +
geom_point(aes(x, y), as.data.frame(som.input), color = 'red', size = 1.1) +
geom_point(aes(x, y), som.codes, size = 1.9) +
geom_path(aes(x, y, group = column), as.data.frame(som.codes)) +
geom_path(aes(x, y, group = rows), as.data.frame(som.codes)) +
facet_wrap(index ~ .) +
theme_bw() +
coord_equal() +
xlab('') + ylab('')
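To make it clearer what each one-epoch call to som() above is doing conceptually, here is a minimal base-R sketch of one online training pass. It is not the kohonen package's implementation: the helper names som_epoch and qerr, the Gaussian neighbourhood, the rectangular grid coordinates and the learning-rate and radius schedules are all assumptions chosen for illustration.

```r
# Illustrative sketch only; som_epoch() and qerr() are hypothetical helpers,
# not functions from the kohonen package.
som_epoch <- function(codes, data, grid_xy, alpha, radius) {
  for (i in sample(nrow(data))) {
    x <- data[i, ]
    # best matching unit: the codebook vector closest to the sample
    bmu <- which.min(rowSums(sweep(codes, 2, x)^2))
    # Gaussian neighbourhood, measured on the map grid rather than in data space
    g2 <- rowSums(sweep(grid_xy, 2, grid_xy[bmu, ])^2)
    h <- exp(-g2 / (2 * radius^2))
    # move every unit towards the sample, weighted by its neighbourhood value
    codes <- codes + alpha * h * sweep(-codes, 2, x, FUN = "+")
  }
  codes
}

# mean distance from each sample to its nearest codebook vector
qerr <- function(codes, data) {
  mean(apply(data, 1, function(x) sqrt(min(rowSums(sweep(codes, 2, x)^2)))))
}

set.seed(42)
data <- as.matrix(expand.grid(x = seq(-1, 1, length.out = 10),
                              y = seq(-1, 1, length.out = 10)))
grid_xy <- as.matrix(expand.grid(col = 1:5, row = 1:5))
codes0 <- matrix(rnorm(nrow(grid_xy) * 2, 0, 0.1), ncol = 2)

epochs <- 30
alphas <- seq(0.5, 0.01, length.out = epochs)  # assumed learning-rate schedule
radii <- seq(2, 0.5, length.out = epochs)      # assumed neighbourhood schedule
codes <- codes0
for (e in seq_len(epochs)) {
  codes <- som_epoch(codes, data, grid_xy, alphas[e], radii[e])
}
c(before = qerr(codes0, data), after = qerr(codes, data))
```

Storing codes after each epoch, as the kohonen-based code does with som.output, gives the snapshots needed for the faceted plot.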

Related

Plot a discontinuous function in R without connecting a "jump"

I'd like to plot a discontinuous function without connecting a jump. For example, in the following plot, I'd like to delete the line connecting (0.5, 0.5) and (0.5, 1.5).
f <- function(x){
(x < .5) * (x) + (x >= .5) * (x + 1)
}
ggplot()+
geom_function(fun = f)
Edit: I'm looking for a solution that works even if the discontinuity is not at a round number, say pi/10.
You could write a little wrapper function which finds discontinuities in the given function and plots them as separate groups:
plot_fun <- function(fun, from = 0, to = 1, by = 0.001) {
x <- seq(from, to, by)
groups <- cut(x, c(-Inf, x[which(abs(diff(fun(x))) > 0.1)], Inf))
df <- data.frame(x, groups, y = fun(x))
ggplot(df, aes(x, y, group = groups)) +
geom_line()
}
This allows
plot_fun(f)
plot_fun(floor, 0, 10)
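To see what the grouping step inside plot_fun does, the jump detection can be run on its own in base R. This is a small sketch using the same 0.1 threshold as above; the variable names jumps and groups are only for illustration.

```r
f <- function(x) (x < .5) * x + (x >= .5) * (x + 1)

x <- seq(0, 1, by = 0.001)
# indices i where f changes by more than 0.1 between x[i] and x[i + 1]
jumps <- which(abs(diff(f(x))) > 0.1)
# split x at those points; each interval becomes its own line group
groups <- cut(x, c(-Inf, x[jumps], Inf))

length(jumps)
table(groups)
```

Because the threshold is absolute, a very steep but continuous function on a coarse grid would also be split, so by and the threshold may need tuning together.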
This answer is based on Allan Cameron's answer, but depicts the jump using open and closed circles. Whether the function is right or left continuous is controlled by an argument.
library("ggplot2")
plot_fun <- function(fun, from = 0, to = 1, by = 0.001, right_continuous = TRUE) {
x <- seq(from, to, by)
tol_vertical <- 0.1
y <- fun(x)
idx_break <- which(abs(diff(y)) > tol_vertical)
x_break <- x[idx_break]
y_break_l <- y[idx_break]
y_break_r <- y[idx_break + 1]
groups <- cut(x, c(-Inf, x_break, Inf))
df <- data.frame(x, groups, y = fun(x))
plot_ <- ggplot(df, aes(x, y, group = groups)) +
geom_line()
# add open and closed points showing jump
dataf_l <- data.frame(x = x_break, y = y_break_l)
dataf_r <- data.frame(x = x_break, y = y_break_r)
shape_open_circle <- 1
# this is the default of shape, but might as well specify.
shape_closed_circle <- 19
shape_size <- 4
if (right_continuous) {
shape_l <- shape_open_circle
shape_r <- shape_closed_circle
} else {
shape_l <- shape_closed_circle
shape_r <- shape_open_circle
}
plot_ <- plot_ +
geom_point(data = dataf_l, aes(x = x, y = y), group = NA, shape = shape_l, size = shape_size) +
geom_point(data = dataf_r, aes(x = x, y = y), group = NA, shape = shape_r, size = shape_size)
return(plot_)
}
Here's the OP's original example:
f <- function(x){
(x < .5) * (x) + (x >= .5) * (x + 1)
}
plot_fun(f)
Here's Allan's additional example using floor, which shows multiple discontinuities:
plot_fun(floor, from = 0, to = 10)
And here's an example showing that the function does not need to be piecewise linear:
f_curved <- function(x) ifelse(x > 0, yes = 0.5*(2-exp(-x)), no = 0)
plot_fun(f_curved, from = -1, to = 5)
You can insert everything inside an ifelse:
f <- function(x){
ifelse(x==0.5,
NA,
(x < .5) * (x) + (x >= .5) * (x + 1))
}
ggplot()+
geom_function(fun = f)

How to get a list of eigenvectors in R

When I do the below code on my data, since there are 35 variables the resulting plot is almost useless because of all the overlap. I can't seem to find anywhere that would give me the list of data that's used to make the plot. For instance, I have a factor called avg_sour that has a direction of about 272 degrees and a magnitude of 1. That's one of the few I can actually see. If I had this data in a table, however, I could see clearly what I'm looking for without having to zoom in and out every time. Add to that the fact that this is for a presentation, so I need to be able to make this visible quickly, without them looking at multiple things--but I think I could get away with a crowded graph and a table that explained the crowded portion. Seems like it ought to be simple, but...I'm afraid I haven't found it yet. Any ideas? I can use any package I can find.
ggbiplot(xD4PCA,obs.scale = .1, var.scale = 1,
varname.size = 3, labels.size=6, circle = T, alpha = 0, center = T)+
scale_x_continuous(limits=c(-2,2)) +
scale_y_continuous(limits=c(-2,2))
If your xD4PCA is from prcomp function, then $rotation gives you eigenvectors. See prcomp function - Value.
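As a sketch of the table asked for in the question, the first two columns of $rotation can be turned into a data frame with a direction and length per variable. The example uses iris as a stand-in for the real data; the object name ir.pca, the column names and the 0 to 360 degree convention are assumptions for illustration. Note that ggbiplot rescales the arrows before drawing them, so the lengths here are those of the raw eigenvector components, not necessarily the lengths seen in the plot.

```r
# iris is only a stand-in; replace ir.pca with your own prcomp object
ir.pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

load <- ir.pca$rotation[, 1:2]  # eigenvector components on PC1 and PC2
tab <- data.frame(
  variable  = rownames(load),
  PC1       = load[, 1],
  PC2       = load[, 2],
  # direction in degrees, counter-clockwise from the positive PC1 axis
  angle_deg = (atan2(load[, 2], load[, 1]) * 180 / pi) %% 360,
  magnitude = sqrt(rowSums(load^2)),
  row.names = NULL
)
tab[order(tab$angle_deg), ]
```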
You may manually choose and add arrows from xD4PCA$rotation[,1:2]
I worked on this with the sample data ir.pca, which is just a simple prcomp object built from the iris data. Everything below is based on the source code of ggbiplot.
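pcobj <- ir.pca # change here with your prcomp object
obs.scale <- 0.1 # same value as passed to ggbiplot() below
circle.prob <- 0.69 # ggbiplot()'s default for the correlation circle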
nobs.factor <- sqrt(nrow(pcobj$x) - 1)
d <- pcobj$sdev
u <- sweep(pcobj$x, 2, 1 / (d * nobs.factor), FUN = '*')
v <- pcobj$rotation
choices = 1:2
choices <- pmin(choices, ncol(u))
df.u <- as.data.frame(sweep(u[,choices], 2, d[choices]^obs.scale, FUN='*'))
v <- sweep(v, 2, d^1, FUN='*')
df.v <- as.data.frame(v[, choices])
names(df.u) <- c('xvar', 'yvar')
names(df.v) <- names(df.u)
df.u <- df.u * nobs.factor
r <- sqrt(qchisq(circle.prob, df = 2)) * prod(colMeans(df.u^2))^(1/4)
v.scale <- rowSums(v^2)
df.v <- r * df.v / sqrt(max(v.scale))
df.v$varname <- rownames(v)
df.v$angle <- with(df.v, (180/pi) * atan(yvar / xvar))
df.v$hjust = with(df.v, (1 - 1.5 * sign(xvar)) / 2)
theta <- c(seq(-pi, pi, length = 50), seq(pi, -pi, length = 50))
circle <- data.frame(xvar = r * cos(theta), yvar = r * sin(theta))
df.v <- df.v[1:2,] # change here like df.v[1:2,]
ggbiplot::ggbiplot(ir.pca,obs.scale = .1, var.scale = 1,
varname.size = 3, labels.size=6, circle = T, alpha = 0, center = T, var.axes = FALSE)+
scale_x_continuous(limits=c(-2,2)) +
scale_y_continuous(limits=c(-2,2)) +
geom_segment(data = df.v, aes(x = 0, y = 0, xend = xvar, yend = yvar),
arrow = arrow(length = unit(1/2, 'picas')),
color = muted('red')) +
geom_text(data = df.v,
aes(label = rownames(df.v), x = xvar, y = yvar,
angle = angle, hjust = hjust),
color = 'darkred', size = 3)
ggbiplot::ggbiplot(ir.pca)+
scale_x_continuous(limits=c(-2,2)) +
scale_y_continuous(limits=c(-2,2)) +
geom_path(data = circle, color = muted('white'),
size = 1/2, alpha = 1/3)
Original one (having all four variables)
Edited one (only the first two variables selected)

How to print a plot based on function arguments in R?

I am trying to build a savings calculator. Eventually, I want to create an RShiny app, but before I do that, I want to make sure the code is perfect. To do what I want to do, I have to use three chunks, which are:
ks <- function (x) { number_format(accuracy = 1,
scale = 1/1000,
big.mark = ",")(x) }
savings <- function(years,apr,initial,investment) {
value <- numeric(years + 1)
value[1] <- initial
for (i in 1:years) value[i + 1] <- (value[i] + investment) * apr
data.frame(year = 0:years, value)
}
savings(45.02,1.07,45000,15000)
ggplot(data=savings(45,1.07,45000,15000),aes(x=year,y=value))+geom_line()+ scale_x_continuous(breaks = seq(0, 100, by = 5)) +
scale_y_continuous(labels = ks, breaks = seq(0, 400000000, by = 250000))+labs(x="Year",y="Value (thousands)")
I want to produce the ggplot as part of the "savings" function but I do not know how to integrate it.
You can save the dataframe in an object and use it in ggplot
library(ggplot2)
savings <- function(years,apr,initial,investment) {
value <- numeric(years + 1)
value[1] <- initial
for (i in 1:years) value[i + 1] <- (value[i] + investment) * apr
df <- data.frame(year = 0:years, value)
ggplot(data=df,aes(x=year,y=value))+ geom_line() +
scale_x_continuous(breaks = seq(0, 100, by = 5)) +
scale_y_continuous(labels = ks, breaks = seq(0, 400000000, by = 250000)) +
labs(x="Year",y="Value (thousands)")
}
savings(45.02,1.07,45000,15000)
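Since the goal is to make sure the numbers are right before moving to Shiny, one possible sanity check is to compare the loop against a closed form of the same recurrence, value[n] = initial * apr^n + investment * apr * (apr^n - 1) / (apr - 1), which holds when apr is not 1. The function names savings_loop and savings_closed are hypothetical, and this sketch returns only the values, not the plot.

```r
# same recurrence as in savings(), returning just the values
savings_loop <- function(years, apr, initial, investment) {
  value <- numeric(years + 1)
  value[1] <- initial
  for (i in seq_len(years)) value[i + 1] <- (value[i] + investment) * apr
  value
}

# closed form of the recurrence; assumes apr != 1
savings_closed <- function(years, apr, initial, investment) {
  n <- 0:years
  initial * apr^n + investment * apr * (apr^n - 1) / (apr - 1)
}

all.equal(savings_loop(45, 1.07, 45000, 15000),
          savings_closed(45, 1.07, 45000, 15000))
```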

gam plots with ggplot

I need to create some gam plots in ggplot. I can do them with the general plot function, but am unsure how to do with ggplot. Here is my code and plots with the regular plot function. I'm using the College data set from the ISLR package.
train.2 <- sample(dim(College)[1],2*dim(College)[1]/3)
train.college <- College[train.2,]
test.college <- College[-train.2,]
gam.college <- gam(Outstate~Private+s(Room.Board)+s(Personal)+s(PhD)+s(perc.alumni)+s(Expend)+s(Grad.Rate), data=train.college)
par(mfrow=c(2,2))
plot(gam.college, se=TRUE,col="blue")
See update below old answer.
Old answer:
There is an implementation of GAM plotting using ggplot2 in voxel library. Here is how you would go about it:
library(ISLR)
library(mgcv)
library(voxel)
library(tidyverse)
library(gridExtra)
data(College)
set.seed(1)
train.2 <- sample(dim(College)[1],2*dim(College)[1]/3)
train.college <- College[train.2,]
test.college <- College[-train.2,]
gam.college <- gam(Outstate~Private+s(Room.Board)+s(Personal)+s(PhD)+s(perc.alumni)+s(Expend)+s(Grad.Rate), data=train.college)
vars <- c("Room.Board", "Personal", "PhD", "perc.alumni","Expend", "Grad.Rate")
map(vars, function(x){
p <- plotGAM(gam.college, smooth.cov = x) #plot customization goes here
g <- ggplotGrob(p)
}) %>%
{grid.arrange(grobs = (.), ncol = 2, nrow = 3)}
This works, after a bunch of warnings of this form: In plotGAM(gam.college, smooth.cov = x) :
There are one or more factors in the model fit, please consider plotting by group since plot might be unprecise
To compare to the plot.gam:
par(mfrow=c(2,3))
plot(gam.college, se=TRUE,col="blue")
You might also want to plot the observed values:
map(vars, function(x){
p <- plotGAM(gam.college, smooth.cov = x) +
geom_point(data = train.college, aes_string(y = "Outstate", x = x ), alpha = 0.2) +
geom_rug(data = train.college, aes_string(y = "Outstate", x = x ), alpha = 0.2)
g <- ggplotGrob(p)
}) %>%
{grid.arrange(grobs = (.), ncol = 3, nrow = 2)}
Or per group (especially important if you used the by argument, i.e. an interaction, in gam):
map(vars, function(x){
p <- plotGAM(gam.college, smooth.cov = x, groupCovs = "Private") +
geom_point(data = train.college, aes_string(y = "Outstate", x = x, color= "Private"), alpha = 0.2) +
geom_rug(data = train.college, aes_string(y = "Outstate", x = x, color= "Private" ), alpha = 0.2) +
scale_color_manual("Private", values = c("#868686FF", "#0073C2FF")) +
theme(legend.position="none")
g <- ggplotGrob(p)
}) %>%
{grid.arrange(grobs = (.), ncol = 3, nrow = 2)}
Update, 08. Jan. 2020.
I currently think the package mgcViz offers superior functionality compared to the voxel::plotGAM function. An example using the above data set and models:
library(mgcViz)
viz <- getViz(gam.college)
print(plot(viz, allTerms = T), pages = 1)
Plot customization is similar to ggplot2 syntax:
trt <- plot(viz, allTerms = T) +
l_points() +
l_fitLine(linetype = 1) +
l_ciLine(linetype = 3) +
l_ciBar() +
l_rug() +
theme_grey()
print(trt, pages = 1)
This vignette shows many more examples.
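If neither helper package is available, a rough alternative is to build the data for one smooth by hand with predict(..., type = "terms", se.fit = TRUE) from mgcv and hand the resulting data frame to ggplot2. The sketch below uses simulated data rather than College, and the names dat, fit and smooth_df as well as the plus or minus 2 standard error band are assumptions for illustration.

```r
library(mgcv)

set.seed(1)
n <- 200
# simulated stand-in data; replace with your own data frame and model formula
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + 0.5 * dat$x2 + rnorm(n, sd = 0.3)
fit <- gam(y ~ s(x1) + s(x2), data = dat)

# per-term contributions and their standard errors on the training data
pr <- predict(fit, type = "terms", se.fit = TRUE)
smooth_df <- data.frame(
  x   = dat$x1,
  fit = pr$fit[, "s(x1)"],
  se  = pr$se.fit[, "s(x1)"]
)
smooth_df$lower <- smooth_df$fit - 2 * smooth_df$se
smooth_df$upper <- smooth_df$fit + 2 * smooth_df$se
smooth_df <- smooth_df[order(smooth_df$x), ]

# With ggplot2 loaded, one panel could then be drawn as:
# ggplot(smooth_df, aes(x, fit)) +
#   geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
#   geom_line()
head(smooth_df)
```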

Manually assigning colors with scale_fill_manual only works for certain hexagon sizes

I am trying to create a scatterplot that is summarized by hexagon bins of counts. I would like the user to be able to define the count breaks for the color scale. I have this working, using scale_fill_manual(). Oddly, however, it only works sometimes. In the MWE below, using the given seed value, if xbins=10, there are issues resulting in a plot as follows:
However, if xbins=20 or 40, for example, the plot doesn't seem to have problems:
My MWE is as follows:
library(ggplot2)
library(hexbin)
library(RColorBrewer)
set.seed(1)
xbins <- 20
x <- abs(rnorm(10000))
y <- abs(rnorm(10000))
minVal <- min(x, y)
maxVal <- max(x, y)
maxRange <- c(minVal, maxVal)
buffer <- (maxRange[2] - maxRange[1]) / (xbins / 2)
h <- hexbin(x = x, y = y, xbins = xbins, shape = 1, IDs = TRUE,
xbnds = maxRange, ybnds = maxRange)
hexdf <- data.frame (hcell2xy(h), hexID = h@cell, counts = h@count)
my_breaks <- c(2, 4, 6, 8, 20, 1000)
clrs <- brewer.pal(length(my_breaks) + 3, "Blues")
clrs <- clrs[3:length(clrs)]
hexdf$countColor <- cut(hexdf$counts, breaks = c(0, my_breaks, Inf),
labels = rev(clrs))
ggplot(hexdf, aes(x = x, y = y, hexID = hexID, fill = countColor)) +
scale_fill_manual(values = levels(hexdf$countColor)) +
geom_hex(stat = "identity") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
coord_fixed(xlim = c(-0.5, (maxRange[2] + buffer)),
ylim = c(-0.5, (maxRange[2] + buffer))) +
theme(aspect.ratio=1)
My goal is to tweak this code so that the plot does not have problems (where suddenly certain hexagons are different sizes and shapes than the rest) regardless of the value assigned to xbins. However, I am puzzled what may be causing this problem for certain xbins values. Any advice would be greatly appreciated.
EDIT:
I am updating the example code after taking into account comments by @bdemarest and @Axeman. I followed the most popular answer in the link @Axeman recommends, and believe it is more useful when you are working with scale_fill_continuous() on an integer vector. Here, I am working with scale_fill_manual() on a factor vector. As a result, I am still unable to get this to work. Thank you.
library(ggplot2)
library(hexbin)
library(RColorBrewer)
set.seed(1)
xbins <- 10
x <- abs(rnorm(10000))
y <- abs(rnorm(10000))
minVal <- min(x, y)
maxVal <- max(x, y)
maxRange <- c(minVal, maxVal)
buffer <- (maxRange[2] - maxRange[1]) / (xbins / 2)
bindata = data.frame(x=x,y=y,factor=as.factor(1))
h <- hexbin(bindata, xbins = xbins, IDs = TRUE, xbnds = maxRange, ybnds = maxRange)
counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- reshape2::melt (counts) # melt() is assumed to come from the reshape2 package
colnames (counts) <- c ("factor", "ID", "counts")
counts$factor =as.factor(counts$factor)
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
my_breaks <- c(2, 4, 6, 8, 20, 1000)
clrs <- brewer.pal(length(my_breaks) + 3, "Blues")
clrs <- clrs[3:length(clrs)]
hexdf$countColor <- cut(hexdf$counts, breaks = c(0, my_breaks, Inf), labels = rev(clrs))
ggplot(hexdf, aes(x = x, y = y, fill = countColor)) +
scale_fill_manual(values = levels(hexdf$countColor)) +
geom_hex(stat = "identity") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
coord_cartesian(xlim = c(-0.5, maxRange[2]+buffer), ylim = c(-0.5, maxRange[2]+ buffer)) + theme(aspect.ratio=1)
You can define the colors directly in the geom instead of through the scale, which modifies the scale of the plot:
ggplot(hexdf, aes(x = x, y = y)) +
geom_hex(stat = "identity",fill =hexdf$countColor)
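For reference, the colour assignment itself can be checked without any plotting. This small base-R sketch shows how cut() maps counts onto colour labels with the same breaks as above; the hex codes in demo_cols are hand-picked placeholders, not the output of brewer.pal(), and the sample counts are made up.

```r
my_breaks <- c(2, 4, 6, 8, 20, 1000)
# seven placeholder colours, one per interval defined by c(0, my_breaks, Inf)
demo_cols <- c("#C6DBEF", "#9ECAE1", "#6BAED6", "#4292C6",
               "#2171B5", "#08519C", "#08306B")

counts <- c(1, 2, 3, 7, 25, 5000)
# intervals are right-closed: (0,2], (2,4], (4,6], (6,8], (8,20], (20,1000], (1000,Inf]
count_color <- cut(counts, breaks = c(0, my_breaks, Inf), labels = demo_cols)
data.frame(counts, count_color)
```

Because the factor levels are already colour strings, the column can be passed to a fill argument after as.character(), or mapped with scale_fill_identity() in ggplot2.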

Resources