I'm looking for an efficient way to plot time, x, y and z with different colors for different objects, to view the proximity of the objects over time.
plot3D::lines3D works with add = TRUE, but it is not very elegant. Here is sample code that works:
library(plot3D)
data$object_id <- factor(data$object_id)
ids <- levels(data$object_id)
for (tr in ids) {
sel <- data$object_id == tr
# the first object opens the plot; later objects are added to it
lines3D(data$x[sel], data$y[sel], data$z[sel], add = (tr != ids[1]), col = which(ids == tr))
}
Example data:
data <- data.frame(object_id = c(1, 1, 2, 2), t = c(0, 1, 0, 1), x = c(0, 1, 1, 0), y = c(0, 1, 1, 0), z = c(0, 1, 1, 0))
Desired result: the path traced by each object up to a given time, along with an arrow that indicates its current heading (determined by joining the last 2 known positions).
At t = 0, this should yield nothing, or just points. At t = 1, it should yield 2 lines (one over the other) in different colors, one color per object.
The 2D equivalent is ggplot2::geom_path, which does all the heavy lifting through the group aesthetic, joining the points of each path by the grouping variable.
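Below is a minimal base-R sketch of the geom_path-style grouping. It assumes the example data above, with the position columns named x, y and z. paths_up_to() is a hypothetical helper and not part of plot3D. It keeps only the positions up to time t_now, splits them by object_id, and computes each object's heading from its last two known positions. The result could then be drawn with plot3D's lines3D() (or points3D() while an object has only one position) and arrows3D(). That plotting step is not shown here.

```r
# Example data from the question (position columns x, y, z)
data <- data.frame(object_id = c(1, 1, 2, 2), t = c(0, 1, 0, 1),
                   x = c(0, 1, 1, 0), y = c(0, 1, 1, 0), z = c(0, 1, 1, 0))

# Hypothetical helper: for each object, its path up to t_now and the heading
# vector from the second-to-last to the last known position (NULL if only one).
paths_up_to <- function(data, t_now) {
  d <- data[data$t <= t_now, ]
  d <- d[order(d$object_id, d$t), ]
  # split() plays the role of geom_path's group aesthetic
  lapply(split(d, d$object_id), function(p) {
    xyz <- as.matrix(p[, c("x", "y", "z")])
    n <- nrow(xyz)
    heading <- if (n >= 2) xyz[n, ] - xyz[n - 1, ] else NULL
    list(path = xyz, heading = heading)
  })
}

res0 <- paths_up_to(data, 0)  # one point per object, no heading yet
res1 <- paths_up_to(data, 1)  # one two-point path per object
```

The returned list is named by object id. For this data, res1[["2"]]$heading is c(x = -1, y = -1, z = -1), and it would supply the direction of that object's arrow.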
I have some simple tree-like edge data (e.g. the data below) with the following characteristics:
there is a root node (0)
all non-root nodes have exactly one parent, but 0-to-many children (including 1)
there is a time t associated with each edge (or equivalently each unique node in i.fr)
we can compute dt as below, if helpful
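For reference, dt can be recomputed from t. The snippet below assumes, as in the sample data, that each edge's time is compared with the time of its parent edge, i.e. the edge whose i.to equals this edge's i.fr.

```r
tree <- data.frame(t    = c(0, 1, 1, 4, 5, 7),
                   i.fr = c(0, 1, 1, 2, 3, 5),
                   i.to = c(1, 2, 3, 4, 5, 6))

# Row index of each edge's parent edge: the edge that ends where this one starts
parent_edge <- match(tree$i.fr, tree$i.to)
# Time elapsed since the parent edge; NA for the edge leaving the root
tree$dt <- tree$t - tree$t[parent_edge]
tree$dt
```

This reproduces the dt column of the sample data (NA 1 1 3 4 2).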
I want to plot these data as a tree, with time along one dimension, so that edge lengths are proportional to dt (e.g. sketch below). How can I do this in R?
I explored the ape, data.tree and ggtree packages, but none of them seems to provide an interface for creating tree objects from edge lists, and I think my data (with 1-child nodes) are rejected by some types of trees.
Sample Data
tree = data.frame(
t = c( 0, 1, 1, 4, 5, 7),
i.fr = c( 0, 1, 1, 2, 3, 5),
i.to = c( 1, 2, 3, 4, 5, 6),
dt = c(NA, 1, 1, 3, 4, 2))
Fake phylo
library(ape)
fake.phylo = list(
edge = cbind(tree$i.fr,tree$i.to),
tip.label = c('4','6'),
Nnode = 5,
edge.length = tree$dt)
class(fake.phylo) = 'phylo'
phylo.tree = as.phylo(fake.phylo) # works 😈
plot(phylo.tree) # (!) tree badly conformed; cannot plot. Check the edge matrix.
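ape's documented phylo format numbers the tips 1 to Ntip, gives the root the number Ntip + 1, and numbers the remaining internal nodes after that. An edge list that uses node 0 is therefore a likely cause of the "badly conformed" error. The sketch below relabels the edge list into that convention. relabel_for_ape() is a hypothetical helper. I have not checked whether plot.phylo then accepts the single-child nodes; if it does not, ape's collapse.singles() can remove them.

```r
# Hypothetical helper: renumber nodes into ape's convention
# (tips 1..Ntip, root Ntip + 1, other internal nodes after that).
relabel_for_ape <- function(i.fr, i.to) {
  tips <- setdiff(i.to, i.fr)                  # nodes without children
  root <- setdiff(i.fr, i.to)                  # the node without a parent
  internal <- setdiff(unique(c(i.fr, i.to)), c(tips, root))
  new_id <- c(tips, root, internal)            # new label = position in new_id
  list(edge = cbind(match(i.fr, new_id), match(i.to, new_id)),
       tip.label = as.character(tips),         # keep the original ids as labels
       Nnode = length(root) + length(internal))
}

tree <- data.frame(t = c(0, 1, 1, 4, 5, 7), i.fr = c(0, 1, 1, 2, 3, 5),
                   i.to = c(1, 2, 3, 4, 5, 6), dt = c(NA, 1, 1, 3, 4, 2))
relabelled <- relabel_for_ape(tree$i.fr, tree$i.to)
relabelled$edge
```

The result could replace the edge, tip.label and Nnode entries of fake.phylo. The NA edge length on the edge leaving node 0 would still need a value.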
Desired Result
Here is an option using ggraph; I have no idea how well this generalises for trees with more than one split.
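library(ggraph) # ggraph(), geom_node_*(), geom_edge_link(), circle()
library(igraph) # graph_from_data_frame()
library(dplyr) # %>% and select()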
tree %>%
select(i.fr, i.to) %>%
graph_from_data_frame(directed = TRUE) %>%
ggraph() +
geom_node_point() +
geom_edge_link(arrow = arrow(length = unit(4, 'mm')), end_cap = circle(3, 'mm')) +
geom_node_label(aes(label = name)) +
theme_minimal() +
scale_y_continuous(
limits = c(0, max(tree$dt, na.rm = TRUE)),
breaks = c(0:(max(tree$dt, na.rm = TRUE))),
labels = rev(c(0:(max(tree$dt, na.rm = TRUE)))),
minor_breaks = NULL,
position = "left") +
scale_x_continuous(expand = c(0.1, 0.1), breaks = NULL) +
labs(x = "", y = "Time")
You can also add coord_fixed(), which gives a narrower version:
# ... Same as before +
coord_fixed()
With help from here, a reasonably full-featured solution is:
library('ggplot2')
.tip.pos <<- 0
.recurse.tree = function(ii,par=0,gen=0){
# recursively walk the tree and extract the following rows:
# index (ordered by tree search), generation, position, n direct children, n total children
b.par = ii[,1]==par
i.chi = ii[b.par,2]
n.chi = length(i.chi)
if (n.chi > 0){
mat.chi = matrix(nrow=5,unlist(lapply(i.chi,function(i){
.recurse.tree(ii=ii[!b.par,,drop=FALSE],par=i,gen=gen+1)
})))
par.pos = mean(range(mat.chi[3,mat.chi[1,] %in% i.chi])) # midpoint of direct children
mat.par.chi = matrix(nrow=5,c(par,gen,par.pos,n.chi,ncol(mat.chi),mat.chi))
} else {
.tip.pos <<- .tip.pos + 1
mat.par = matrix(nrow=5,c(par,gen,.tip.pos,0,0))
}
}
plot.tree = function(tree,...){
# plot a transmission tree vs time
.tip.pos <<- 0 # reset the tip counter so repeated calls start again at 1
tree.data = .recurse.tree(as.matrix(tree[c('par','chi')]))
tree = rbind(c(-1,-1,0,NA),tree) # append dummy root
tree = tree[match(tree.data[1,],tree$chi),] # reorder to match tree.data
tree$gen = factor(tree.data[2,]) # generation
tree$pos = tree.data[3,] # position
tree$child.direct = tree.data[4,] # n direct children
tree$child.total = tree.data[5,] # n total children
pc.map = match(tree$par,tree$chi) # lookup index for chi -> par
tree$pos.par = tree$pos[pc.map] # parent position
tree$t.par = tree$t[pc.map] # parent t
g = ggplot(tree) +
geom_segment(aes_string(y='t.par',x='pos.par',xend='pos',yend='t'),alpha=.5) +
geom_point(aes_string(x='pos',y='t',...)) +
scale_x_continuous(labels=NULL,breaks=NULL) + labs(x='')
g # return the plot explicitly
}
tree = data.frame(
t = c( 0, 1, 1, 4, 5, 7),
par = c( 0, 1, 1, 2, 3, 5),
chi = c( 1, 2, 3, 4, 5, 6),
dt = c(NA, 1, 1, 3, 4, 2))
g = plot.tree(tree,color='gen')
ggsave('Rplots.png', plot = g, width = 4, height = 4)
Result
Extending
The code above can be easily modified to give some nice results, e.g. with bigger data, custom aes, and ggMarginal:
My data are not a time series, but they have sequential properties.
Consider one sample:
data1 = matrix(rnorm(10, 0, 1), nrow = 1)
label1 = rnorm(1, 0, 1)
label1 is a function of data1, but the data matrix is not a time series. I suppose the label depends not just on one data sample but also on older samples, which are naturally ordered in time (not sampled randomly); in other words, the data samples depend on one another.
I have a batch of examples, say, 16.
With that, I want to understand how to design an RNN/LSTM model that memorizes all 16 examples in the batch to construct its internal state. I am especially confused by the seq_len parameter, which, as I understand it, is specifically the length of the time series used as input to the network, which is not my case.
The following code (taken from a time-series example) only confuses me, because I don't see how my task fits in.
library(mxnet)
rm(symbol)
symbol <- rnn.graph.unroll(seq_len = 5,
num_rnn_layer = 1,
num_hidden = 50,
input_size = NULL,
num_embed = NULL,
num_decode = 1,
masking = F,
loss_output = "linear",
dropout = 0.2,
ignore_label = -1,
cell_type = "lstm",
output_last_state = F,
config = "seq-to-one")
graph.viz(symbol, type = "graph", direction = "LR",
graph.height.px = 600, graph.width.px = 800)
train.data <- mx.io.arrayiter(
data = matrix(rnorm(100, 0, 1), ncol = 20)
, label = rnorm(20, 0, 1)
, batch.size = 20
, shuffle = F
)
Sure, you can treat the ordered samples as time steps and apply an LSTM. Also check out this example, which might be relevant to your case: https://github.com/apache/incubator-mxnet/tree/master/example/multivariate_time_series
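This is a base-R sketch of what "treating them as time steps" could look like. It assumes the rows of a matrix are the naturally ordered samples and slices them into overlapping windows of `steps` consecutive samples. Each window is labelled with the label of its last sample, and the window length plays the role of seq_len. make_windows() is a hypothetical helper. The array layout that mx.io.arrayiter expects for the unrolled graph is not verified here, so the dimensions may need permuting.

```r
# Hypothetical helper: slice ordered samples (rows of x) into overlapping
# windows of `steps` consecutive samples; each window gets the label of its
# last sample.
make_windows <- function(x, y, steps) {
  n <- nrow(x)
  n_win <- n - steps + 1
  out <- array(NA_real_, dim = c(ncol(x), steps, n_win))  # feature x step x window
  for (w in seq_len(n_win)) {
    rows <- w:(w + steps - 1)
    out[, , w] <- t(x[rows, , drop = FALSE])
  }
  list(data = out, label = y[steps:n])
}

set.seed(1)
x <- matrix(rnorm(16 * 10), nrow = 16)  # 16 ordered samples, 10 features each
y <- rnorm(16)                          # one label per sample
win <- make_windows(x, y, steps = 5)
dim(win$data)                           # 10 features, 5 steps, 12 windows
```

With 16 samples and steps = 5 there are 12 windows. A longer window lets the model see more of the older samples that the label is assumed to depend on.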
An assignment has tasked us with creating a series of variables: normal1, normal2, normal3, chiSquared1, chiSquared2, t and F. They are defined as follows:
library(tibble)
Normal.Frame <- data_frame(normal1 = rnorm(5000, 0, 1),
normal2 = rnorm(5000, 0, 1),
normal3 = rnorm(5000, 0, 1),
chiSquared1 = normal1^2,
chiSquared2 = normal2^2,
F = sum(chiSquared1/chiSquared2),
t = sum(normal3/sqrt(chiSquared1 )))
We then have to make histograms of the distributions for normal1, chiSquared1 and 2, t, and F, which is simple enough for normal1 and the chiSquared variables, but when I try to plot F and t, the plot space is blank.
Our lecturer recommended limiting F to the range 0 to 10, and t to -5 to 5. To do this, I use:
HistT <- hist(Normal.Frame$t, xlim = c(-5, 5))
HistF <- hist(Normal.Frame$F, xlim = c(0, 10))
As I mentioned, this yields blank plots.
Your t and F are defined with sum(), so each is a single value (which the tibble then repeats in every row). If that value is outside your xlim range, the histogram will be empty. If you remove the sum() calls, t and F are computed element-wise and you should get the desired results.
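Here is a sketch of the fix in base R. It uses plain vectors instead of the tibble, and the names F_stat and t_stat are chosen here to avoid masking base R's F and t(). Without sum(), each ratio is computed element-wise. A t distribution with 1 df and an F(1, 1) distribution are heavy-tailed. A few extreme values therefore stretch the breaks that hist() computes from the full data, while xlim only crops the axis afterwards. Restricting the values to the lecturer's ranges before calling hist() gives a more readable histogram.

```r
set.seed(123)
n <- 5000
normal1 <- rnorm(n, 0, 1)
normal2 <- rnorm(n, 0, 1)
normal3 <- rnorm(n, 0, 1)
chiSquared1 <- normal1^2                  # chi-squared, 1 df
chiSquared2 <- normal2^2                  # chi-squared, 1 df
F_stat <- chiSquared1 / chiSquared2       # element-wise: F(1, 1), no sum()
t_stat <- normal3 / sqrt(chiSquared1)     # element-wise: t with 1 df, no sum()

# Keep only the values inside the suggested ranges, then plot
HistT <- hist(t_stat[t_stat > -5 & t_stat < 5], breaks = 50, xlim = c(-5, 5))
HistF <- hist(F_stat[F_stat < 10], breaks = 50, xlim = c(0, 10))
```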
I have a data frame (its exact contents are not important).
I am using it to plot some point curves, like below:
#EXP <- 3 (example)
#EXP_VEC <- c(1:EXP)
for (i in 1:EXP)
{
gg2_plot[i] <- ggplot(subset(gg2,Ei == EXP_VEC[i] ),aes(x=hours, y=variable, fill = Mi)) + geom_point(aes(fill = Mi,color = Mi),size = 3)
}
As you can see, EXP_VEC = c(1, 2, 3, ...), depending on user input (e.g. if the user inputs 2, then EXP_VEC = c(1, 2)).
The data frame has Ei = 1, 2, 3, 4, ...
Now I have to plot each of these Ei values, depending on the user input.
Consider EXP = 3, so EXP_VEC = c(1, 2, 3).
The for loop should then produce three plots, for Ei = 1, Ei = 2 and Ei = 3.
If the for loop I have written worked, that would be it.
But the for loop does not work, and I can't use aes_string because the variable i is outside aes().
For example, consider the following dataset:
dd<-data.frame(
Ei = c(1L, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
Mi = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2),
hours = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
variable = c(0.1023488, 0.1254325, 0.1523245, 0.1225425, 0.1452354,
0.1853324, 0.1452369, 0.1241241, 0.0542232, 0.8542154, 0.021542,
0.2541254))
As you can see, I have two sets of Ei. I want to plot Ei = 1 first and then, beside that plot, Ei = 2.
So I thought of saving the plots for Ei = 1 and Ei = 2 in two separate variables and then combining them with some kind of "cascade" function, which I have yet to find.
How do I do it?
Is there an easy way to do this with ggplot alone, without any loop?
If not, how can I use the value of i inside my for loop?
I would do something like this:
library(ggplot2)
plot_exp <-
function(i){
dat <- subset(gg2,Ei == i )
if (nrow(dat) > 0)
ggplot(dat,aes(x=hours, y=variable, fill = Mi)) +
geom_point(aes(color = Mi),size = 3)
}
ll <- lapply(seq_len(EXP), plot_exp)
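ll is a list of ggplot objects. As for the loop itself, aes() is not the problem, because i is only used inside subset(). The loop fails because single brackets, gg2_plot[i] <- ..., replace a sub-list rather than storing one object. Initialising gg2_plot <- list() and assigning with gg2_plot[[i]] <- ... should work.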
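This sketch addresses the "without any loop" part of the question with facet_wrap(). It assumes the dd example data above and that EXP holds the user's input. All rows with Ei up to EXP are selected in one step, and facet_wrap(~ Ei) draws one panel per Ei side by side. The ggplot2 part is wrapped in requireNamespace() only so that the snippet still runs where ggplot2 is not installed.

```r
dd <- data.frame(
  Ei = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Mi = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2),
  hours = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  variable = c(0.1023488, 0.1254325, 0.1523245, 0.1225425, 0.1452354,
               0.1853324, 0.1452369, 0.1241241, 0.0542232, 0.8542154, 0.021542,
               0.2541254))
EXP <- 2                                # user input
sel <- dd[dd$Ei %in% seq_len(EXP), ]    # Ei = 1, ..., EXP in a single subset

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One panel per Ei, laid out side by side; Mi treated as a discrete colour
  p <- ggplot(sel, aes(x = hours, y = variable, colour = factor(Mi))) +
    geom_point(size = 3) +
    facet_wrap(~ Ei, nrow = 1) +
    labs(colour = "Mi")
  print(p)
}
```

If separate plot objects are preferred, the list ll from the answer above could also be placed side by side. One option, assuming the gridExtra package is installed, is gridExtra::grid.arrange(grobs = Filter(Negate(is.null), ll), nrow = 1).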
I have two sets of points that I want to show with smoothScatter(), using a separate color for each distribution, but the add argument does not seem to work. For example:
X1<-rnorm(1000, mean = -2, sd = 1)
Y1<-rnorm(1000, mean = -2, sd = 1)
X2<-rnorm(1000, mean = 2, sd = 1)
Y2<-rnorm(1000, mean = 2, sd = 1)
smoothScatter(X1,Y1,col="green",colramp=colorRampPalette(c("white", "green")));
smoothScatter(X2,Y2,col="green",colramp=colorRampPalette(c("white", "red")),add=T);
Is it possible?
Use transparency in the color ramp of the second call, for example colramp = colorRampPalette(c(rgb(1, 1, 1, 0), rgb(1, 0, 0, 1)), alpha = TRUE).
This makes the low-density background of the second plot transparent, so the first plot shows through.
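This sketch shows the complete two-layer plot. The second call uses the transparent-to-red ramp suggested above. The xlim and ylim limits are computed from both data sets. Without them, the first call sets the axes to the range of X1 and Y1, and most of the second cloud would likely fall outside the plot. The seed is arbitrary.

```r
set.seed(1)
X1 <- rnorm(1000, mean = -2, sd = 1)
Y1 <- rnorm(1000, mean = -2, sd = 1)
X2 <- rnorm(1000, mean = 2, sd = 1)
Y2 <- rnorm(1000, mean = 2, sd = 1)

# Common axis limits so the second cloud is not clipped by the first plot
xlim <- range(X1, X2)
ylim <- range(Y1, Y2)

green_ramp <- colorRampPalette(c("white", "green"))
# Fully transparent at low density, opaque red at high density
red_ramp <- colorRampPalette(c(rgb(1, 1, 1, 0), rgb(1, 0, 0, 1)), alpha = TRUE)

smoothScatter(X1, Y1, colramp = green_ramp, xlim = xlim, ylim = ylim)
smoothScatter(X2, Y2, colramp = red_ramp, add = TRUE)
```

Only the second ramp needs an alpha channel. The first plot is drawn onto a blank device, so its opaque white background hides nothing.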