How can I make a 3D plot from stacked 2D plots? - r

I want to plot as below. I tried to search several packages and plot functions but I couldn't find a solution.
My data has four columns.
ID F R M
1 2 3 4
2 4 6 7
...
I want to see the relationship between M and R with respect to each F value (1, 2, 3, ...). So, I'd like F along the x-axis, R along the y-axis, and M as the z-axis as in the below graph.
Thanks.

You can do this kind of thing with lattice cloud plots, using panel.3dpolygon from latticeExtra.
library(latticeExtra)
# generating random data
d <- data.frame(x=rep(1:40, 7), y=rep(1:7, each=40),
                z=c(sapply(1:7, function(x) runif(40, 10*x, 10*x+20))))
# define the panel function
f <- function(x, y, z, groups, subscripts, ...) {
  colorz <- c('#8dd3c7', '#ffffb3', '#bebada', '#fb8072', '#80b1d3',
              '#fdb462', '#b3de69')
  sapply(sort(unique(groups), decreasing=TRUE), function(i) {
    zz <- z[subscripts][groups==i]
    yy <- y[subscripts][groups==i]
    xx <- x[subscripts][groups==i]
    panel.3dpolygon(c(xx, rev(xx)), c(yy, yy),
                    c(zz, rep(-0.5, length(zz))),
                    col=colorz[i], ...)
  })
}
# plot
cloud(z~x+y, d, groups=y, panel.3d.cloud=f, scales=list(arrows=FALSE))
I'm sure I don't need to loop over groups in the panel function, but I always forget the correct incantation for subscripts and groups to work as intended.
As others have mentioned in comments, this type of plot might look snazzy, but can obscure data.
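To apply the answer above to data shaped like the question's (ID, F, R, M), the columns have to be mapped onto x, y and z. The panel function also appears to assume two things: that the group values can index the colour vector (1, 2, 3, ...), and that x is in increasing order within each group, so the polygon outline is traced in order. Here is a minimal sketch of that preparation on made-up data. The names dat and d2 are hypothetical, and using F as the grouping axis is my reading of the question.

```r
set.seed(1)
# Made-up stand-in for the asker's data: three F values, ten rows each
dat <- data.frame(ID = 1:30,
                  F  = rep(c(2, 4, 7), each = 10),
                  R  = sample(1:100, 30),
                  M  = runif(30, 0, 10))

# Sort by group, then by R, so each slice is traced left to right
dat <- dat[order(dat$F, dat$R), ]

# Map to the names used by the answer's code; the factor() step turns
# arbitrary F values into consecutive integers 1, 2, 3, ...
d2 <- data.frame(x = dat$R,
                 y = as.integer(factor(dat$F)),
                 z = dat$M)

# With latticeExtra loaded and f defined as in the answer, the call would be:
# cloud(z ~ x + y, d2, groups = y, panel.3d.cloud = f, scales = list(arrows = FALSE))
```

The colour vector in f has seven entries, so more than seven distinct F values would need a longer palette.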

Related

Way to progressively overlap line plots in R

I have a for loop from which I call a function grapher() which extracts certain columns from a dataframe (position and w, both continuous variables) and plots them. My code changes the Y variable (called w here) each time it runs and so I'd like to plot it as an overlay progressively. If I run the grapher() function 4 times for example, I'd like to have 4 plots where the first plot has only 1 line, and the 4th has all 4 overlain on each other (as different colours).
I've already tried points() as suggested in other posts, but for some reason it only generates a new graph.
grapher <- function(){
  position.2L <- data[data$V1=='2L', 'V2']
  w.2L <- data[data$V1=='2L', 'w']
  plot(position.2L, w.2L)
  points(position.2L, w.2L, col='green')
}
# example of my for loop #
for (t in 1:200){
  #code here changes the 'w' variable each iteration of 't'
  if (t%%50==0){
    grapher()
  }
}
Not knowing any details about your situation I can only assume something like this might be applicable.
# Example data set
d <- data.frame(V1=rep(1:2, each=6), V2=rep(1:6, 2), w=rep(1:6, each=2))
# Prepare the matrix we will write to.
n <- 200
m <- matrix(d$w, nrow(d), n)
# Loop progressively adding more noise to the data
set.seed(1)
for (i in 2:n) {
m[,i] <- m[,i-1] + rnorm(nrow(d), 0, 0.05)
}
# We can now plot the matrix, selecting the relevant rows and columns
matplot(m[d$V1 == 1, seq(1, n, by=50)], type="o", pch=16, lty=1)
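In the question's grapher(), plot() is called on every invocation, and plot() always starts a new graph, which is why points() appears to produce a fresh plot each time. An alternative that stays close to the original loop is to keep every recorded state of w and redraw all of them at each checkpoint. This is a minimal sketch under assumed data: position, w and snapshots are hypothetical names, and the rnorm() update is only a placeholder for whatever changes w in the real code.

```r
set.seed(42)
position <- 1:20                      # fixed x values
w <- rep(0, length(position))         # y values that change every iteration
snapshots <- NULL                     # one column per recorded state of w

for (t in 1:200) {
  w <- w + rnorm(length(position), 0, 0.1)   # placeholder update of w
  if (t %% 50 == 0) {
    snapshots <- cbind(snapshots, w)         # remember the current state
    # Redraw everything recorded so far: 1 line, then 2, then 3, then 4
    matplot(position, snapshots, type = "l", lty = 1,
            col = seq_len(ncol(snapshots)),
            xlab = "position", ylab = "w")
  }
}
```

Each call to matplot() produces one complete plot, so the four checkpoints give four graphs, with one more line in each than in the one before.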

How to add labels to original data given clustering result using hclust

Just say I have some unlabeled data which I know should be clustered into six categories, like for example this dataset:
library(tidyverse)
ts <- read_table(url("http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data"), col_names = FALSE)
If I create an hclust object with a sample of 60 from the original dataset like so:
n <- 10
s <- sample(1:100, n)
idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
ts.samp <- ts[idx,]
observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
# compute DTW distances
library(dtw)#Dynamic Time Warping (DTW)
distMatrix <- dist(ts.samp, method= 'DTW')
# hierarchical clustering
hc <- hclust(distMatrix, method='average')
I know that I can then add the labels to the dendrogram for viewing like this:
observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
plot(hc, labels=observedLabels, main="")
However, I would like to add the correct labels to the initial data frame that was clustered. So for ts.samp I would like to add an extra column with the label of the cluster that each observation has been assigned to.
It would seem that ts.samp$cluster <- hc$label should add the cluster to the data frame, however hc$label returns NULL.
Can anyone help with extracting this information?
You need to define a level where you cut your dendrogram, this will form the groups.
Use:
labels <- cutree(hc, k = 3) # you set the number of k that's more appropriate, see how to read a dendrogram
ts.samp$grouping <- labels
Let's look at the dendrogram in order to find the best number for k:
plot(hc, main="")
abline(h=500, col = "red") # cut at height 500 forms 2 groups
abline(h=300, col = "blue") # cut at height 300 forms 3/4 groups
It looks like either 2 or 3 might be good. You need to find the largest jump in the vertical lines (Height).
Draw a horizontal line at that height and count the clusters it forms.
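cutree() returns one integer per observation, in the same order as the rows passed to dist(), so the result can be assigned straight to a new column. Here is a small sketch on the built-in iris data, used purely as a stand-in for ts.samp. The names samp and hc_demo are hypothetical, and plain Euclidean distance replaces DTW so that no extra package is needed.

```r
set.seed(1)
# Stand-in sample: 30 rows of iris instead of the asker's time series
samp <- iris[sample(nrow(iris), 30), ]

# Hierarchical clustering on the four numeric columns
hc_demo <- hclust(dist(samp[, 1:4]), method = "average")

# Cut into k groups; the vector lines up with the rows of samp
samp$cluster <- cutree(hc_demo, k = 3)

# Cross-tabulate against the known labels to see how well they agree
print(table(observed = samp$Species, cluster = samp$cluster))

# Cutting at a height instead of a fixed k is also possible
by_height <- cutree(hc_demo, h = 2)
```

For the question's data, where six categories are expected, k = 6 would be the natural starting point. The cluster numbers themselves are arbitrary, so a cross-tabulation like the one above is needed to match them to the observed labels.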

How to combine custom panels with splom() (or xyplot() or pairs())

I'm having trouble combining heterogeneous panels with lattice package tools. I tried splom(), pairs(), and xyplot(), but unsuccessfully so far. Suppose I have a simple 3-column time series as an xts object:
library(xts)
S = as.xts(apply(matrix(rnorm(300), ,3), 2, cumsum), Sys.Date()+1:100)
Diagonal panels (top left to bottom right or diag(5) format) need to show 3 density plots, one for each series.
Upper triangular panels need to show latticeExtra::densityplot (or equivalently panel.densityplot) for the three series. The order doesn't matter for now; I'll work it out later.
Lower triangular panels need to show horizontal box plots. I suppose panel.bwplot would work, but could not successfully tame it.
Here is a skeleton of what may work, but I'll be thankful for any successful version.
library(lattice); library(latticeExtra)
splom(as.data.frame(S),
      upper.panel = function(x, y, ...) {
        panel.abline() # temporary placeholder
      },
      diag.panel = function(x, ...) {
        yrng <- current.panel.limits()$ylim
        d <- density(x, na.rm=TRUE)
        d$y <- with(d, yrng[1] + 0.95 * diff(yrng) * y / max(y))
        panel.lines(d)
        diag.panel.splom(x, ...)
      },
      lower.panel = function(x, y, ...) {
        panel.abline() # temporary placeholder
      },
      pscale=0, as.matrix = TRUE
)

Plotting a large number of time series with xyplot

Here is a minimal example of the type of data I'm struggling to plot. The curves are drawn from two processes:
library("lattice")
x0<-matrix(NA,350,76)
for(i in 1:150) x0[i,]<-arima.sim(list(order=c(1,0,0),ar=0.9),n=76)
for(i in 151:350) x0[i,]<-arima.sim(list(order=c(1,0,0),ar=-0.9),n=76)
I'd like to plot them as line plots in a lattice made of two boxes. The box located above
would contain the first 150 curves (in orange) and the box below should display
the next 200 curves (which should be in blue). I don't need a
label or legend. I've tried to use the example shown on the man-page:
aa<-t(x0)
colnames(aa)<-c(paste0("v0_",1:150),paste0("v1_",1:200))
aa<-as.ts(aa)
xyplot(aa,screens=list(v0_="0","1"),col=list(v0_="orange",v1_="blue"),auto.key=FALSE)
but somehow it doesn't work.
This works without additional factors (although agstudy's solution is less of a hack than this one):
# This is equivalent to your for-loops, use whatever you prefer
x0 <- do.call(rbind, lapply(1:350, function(i) {
  arima.sim(list(order=c(1,0,0), ar=ifelse(i <= 150, 0.9, -0.9)), n=76)
}))
plotStuff <- function(indices, ...) {
  plot.new()
  plot.window(xlim=c(1, ncol(x0)), ylim=range(x0[indices,]))
  box()
  for (i in indices)
    lines(x0[i,], ...)
}
par(mfrow=c(2,1), mar=rep(1,4)) # two rows, reduced margin
plotStuff(1:150, col="orange")
plotStuff(151:350, col="blue")
You should put your data in the long format like this:
      Var1   Var2     value group
1        1   v0_1 2.0696016    v0
2        2   v0_1 1.3954414    v0
...
26599   75 v1_200 0.3488131    v1
26600   76 v1_200 0.2957114    v1
For example using reshape2 :
library(reshape2)
aa.m <- melt(aa)
aa.m$group <- gsub('^(v[0-9])_(.*)','\\1',aa.m$Var2)
xyplot(value~Var1|group,data=aa.m,type='l',groups=group)
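The long format can also be built with base R alone, which makes it easier to see what each column holds. The sketch below uses a tiny made-up matrix (aa_small, a hypothetical stand-in for aa) and then derives one colour per series from its group prefix. Using groups = Var2 rather than groups = group is my own variation, on the assumption that each series should be drawn as its own line. The lattice call is skipped if the package is not installed.

```r
set.seed(1)
# Small stand-in for 'aa': rows are time points, columns are series
aa_small <- matrix(rnorm(5 * 4), nrow = 5,
                   dimnames = list(NULL, c("v0_1", "v0_2", "v1_1", "v1_2")))

# Long format: one row per (time point, series) pair
long <- data.frame(
  Var1  = as.vector(row(aa_small)),
  Var2  = factor(colnames(aa_small)[as.vector(col(aa_small))]),
  value = as.vector(aa_small)
)
long$group <- sub("_.*$", "", as.character(long$Var2))

# One colour per series, in the order of the factor levels of Var2
cols <- ifelse(grepl("^v0_", levels(long$Var2)), "orange", "blue")

if (requireNamespace("lattice", quietly = TRUE)) {
  # as.table = TRUE puts the first group (v0) in the upper panel
  print(lattice::xyplot(value ~ Var1 | group, data = long, groups = Var2,
                        type = "l", col = cols,
                        layout = c(1, 2), as.table = TRUE))
}
```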

Plotting Two Factors on the same Graph

Say I have two factors and I want to graph them on the same plot, both factors have the same levels.
s1 <- c(rep("male",20), rep("female", 30))
s2 <- c(rep("male",10), rep("female", 40))
s1 <- factor(s1, levels=c("male", "female"))
s2 <- factor(s2, levels=c("male", "female"))
I would have thought that using the table function would have produced the correct result for graphing, but this is what it outputs:
table(s1, s2)
        s2
s1       male female
  male     10     10
  female    0     30
So really two questions: what is the table function doing to get this result, and what other function can I use to create a graph with 2 series using factors with the same levels?
Also if it is a factor I'm using barplot2 in the gplots package to graph it.
You can achieve slightly more detailed results with the lattice package:
s1 <- factor(c(rep("male",20), rep("female", 30)))
s2 <- factor(c(rep("male",10), rep("female", 40)))
D <- data.frame(s1, s2)
library(lattice)
histogram(~s1+s2, D, col = c("pink", "lightblue"))
Or if you want males/females side by side for easier comparison:
t1 <- table(s1)
t2 <- table(s2)
barchart(cbind(t1, t2), stack = FALSE, horizontal = FALSE)
From ?table:
‘table’ uses the cross-classifying factors to build a contingency
table of the counts at each combination of factor levels.
When you do table(s1,s2) what happens is that the function considers s1 and s2 as paired results. Effectively it tells you that if you were to take cbind(s1,s2) then there would be 10 rows of male-male, 10 of male-female and so on.
To understand this consider a very trivial example:
a <- c("M","M","F","F")
b <- c("F","F","M","M")
table(a,b)
   b
a   F M
  F 0 2
  M 2 0
What you should do is:
t1 <- table(s1)
t2 <- table(s2)
barplot(cbind(t1,t2), beside=TRUE, col=c("lightblue", "salmon"))
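To make the difference concrete, here is a short self-contained sketch using the factors from the question. It contrasts the two-way contingency table with the two separate one-way counts, then combines the latter into a matrix that barplot() can draw side by side. The name counts is hypothetical.

```r
s1 <- factor(c(rep("male", 20), rep("female", 30)), levels = c("male", "female"))
s2 <- factor(c(rep("male", 10), rep("female", 40)), levels = c("male", "female"))

# Two-way table: counts of (s1, s2) pairs, position by position
paired <- table(s1, s2)

# One-way tables: counts of each level within each factor separately
counts <- cbind(s1 = table(s1), s2 = table(s2))
print(counts)

# One group of bars per factor, one bar per level
barplot(counts, beside = TRUE, col = c("lightblue", "salmon"),
        legend.text = rownames(counts))
```

The columns of counts are the two series and the rows are the shared levels, which is the layout barplot() expects when beside = TRUE.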
Two options producing slightly different forms of plots are
plot(s1, s2)
and
plot(table(s1,s2))
The former is a spineplot, a special case of the mosaic plot, which is what the plot method for table objects produces (the second example). See ?spineplot and ?mosaicplot for more details; you can use these functions directly, rather than the generic plot(), if you wish.
Also take a look at the mosaic() function in the vcd package on CRAN by Meyer et al.
table() is producing the contingency table for the two factors.
Hmm.. I don't think creating a contingency table is what Cameron was looking for. If I understood him correctly, I think he wanted to create a data frame with two variables in it, where s1 and s2 seem to be vectors of the same size (length(s1)==length(s2)).
In this case, he would simply need to create a "table" (I think he meant data.frame) using:
df = data.frame(s1=s1, s2=s2);
And then plot the 2 series in the same plot.
So as for the second question of plotting these things, I'd use matplot. For example:
matplot(1:10, data.frame(a=rnorm(10), b=rnorm(10)), type="l", lty=1, lwd=1, col=c("blue","red"))
Given that he has his data of 2 vectors organized in a single data.frame named "df", he can just do something like:
matplot(df, type="l", lty=1, lwd=1, col=c("blue","red"))
Hope this helps.
