I have a signal with dimensions (samples, height, width, channels) and a kernel with dimensions (filter_height, filter_width, in_channels, out_channels), which I use in a convolution operation as:
tf.nn.conv2d(signal, kernel, strides=[1, 1, 1, 1], padding="VALID")
Here width = filter_width and channels = in_channels, because the signal is text. What I want to check is whether convolving a multichannel signal is the same as concatenating its channels along the width dimension and convolving the result as a single-channel signal (like in the next figure). However, the outputs differ even though the kernel values are shared. To explain better, here is some code:
import tensorflow as tf
sess = tf.InteractiveSession()
samples = 2
n = 10
We = 5
filter_size = 4
num_filters = 20
embedding1 = tf.Variable(tf.truncated_normal([samples,n,We,1], stddev=0.1))
embedding2 = tf.Variable(tf.truncated_normal([samples,n,We,1], stddev=0.1))
multi_X = tf.concat([embedding1, embedding2], 3)#shape=(2, 10, 5, 2)
concat_X = tf.concat([embedding1, embedding2], 2)#shape=(2, 10, 10, 1)
W1 = tf.Variable(tf.truncated_normal([filter_size, We, 1, num_filters], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([filter_size, We, 1, num_filters], stddev=0.1))
multi_W = tf.concat([W1, W2], 2)#shape=(4, 5, 2, 20)
concat_W = tf.concat([W1, W2], 1)#shape=(4, 10, 1, 20)
multi_conv = tf.nn.conv2d(multi_X, multi_W, strides=[1, 1, 1, 1], padding="VALID")#shape=(2, 7, 1, 20)
concat_conv = tf.nn.conv2d(concat_X, concat_W, strides=[1, 1, 1, 1], padding="VALID")#shape=(2, 7, 1, 20)
init_op = tf.global_variables_initializer()
sess.run(init_op)  # initialize the variables before evaluating either convolution
#sess.run(multi_conv) == sess.run(concat_conv) should be True
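As a sanity check on the arithmetic only, here is a minimal pure-Python sketch (no TensorFlow) for one sample and one filter, using the same sizes as above. The helper names rand_matrix and valid_conv are made up for illustration. It assumes the convolution is a VALID cross-correlation, which is what tf.nn.conv2d computes. Both variants add up the same products, just in a different order, so any difference should be at the level of floating-point rounding. That suggests comparing with a tolerance rather than with ==.

```python
import random

random.seed(0)
n, We, filter_size = 10, 5, 4  # sizes from the question; one sample, one filter

def rand_matrix(rows, cols):
    # hypothetical helper: a rows x cols list of small random values
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def valid_conv(x, w):
    # VALID cross-correlation with a kernel as wide as x:
    # one output value per vertical offset
    fh, fw = len(w), len(w[0])
    return [
        sum(x[r + i][j] * w[i][j] for i in range(fh) for j in range(fw))
        for r in range(len(x) - fh + 1)
    ]

x1, x2 = rand_matrix(n, We), rand_matrix(n, We)
w1, w2 = rand_matrix(filter_size, We), rand_matrix(filter_size, We)

# multichannel: each channel is filtered by its own kernel slice, then summed
multi = [a + b for a, b in zip(valid_conv(x1, w1), valid_conv(x2, w2))]

# concatenated: join signal and kernel along the width, single channel
concat_x = [r1 + r2 for r1, r2 in zip(x1, x2)]
concat_w = [r1 + r2 for r1, r2 in zip(w1, w2)]
concat = valid_conv(concat_x, concat_w)

max_diff = max(abs(a - b) for a, b in zip(multi, concat))
print(len(multi), max_diff)
```

If the TensorFlow outputs differ by much more than rounding error, the cause is likely elsewhere (for example, how the values are compared) rather than in the two layouts themselves.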
I have some simple tree-like edge-data (e.g. data below) with the following characteristics:
there is a root node (0)
all non-root nodes have exactly one parent, but 0-to-many children (including 1)
there is a time t associated with each edge (or equivalently each unique node in i.fr)
we can compute dt as below, if helpful
I want to plot these data as a tree, with time along one dimension, so that edge lengths are proportional to dt (e.g. sketch below). How can I do this in R?
I explored the ape, data.tree, and ggtree packages, but none seems to provide an interface for creating tree objects from edge lists, and I think my data (with 1-child nodes) are rejected by some tree types?
Sample Data
tree = data.frame(
t = c( 0, 1, 1, 4, 5, 7),
i.fr = c( 0, 1, 1, 2, 3, 5),
i.to = c( 1, 2, 3, 4, 5, 6),
dt = c(NA, 1, 1, 3, 4, 2))
Fake phylo
fake.phylo = list(
edge = cbind(tree$i.fr,tree$i.to),
tip.label = c('4','6'),
Nnode = 5,
edge.length = tree$dt)
class(fake.phylo) = 'phylo'
phylo.tree = as.phylo(fake.phylo) # works 😈
plot(phylo.tree) # (!) tree badly conformed; cannot plot. Check the edge matrix.
Desired Result
Here is an option using ggraph; I have no idea how well this generalises for trees with more than one split.
library(dplyr)   # %>% and select()
library(igraph)  # graph_from_data_frame()
library(ggraph)  # ggraph(), geom_node_*(), geom_edge_link(); attaches ggplot2
tree %>%
  select(i.fr, i.to) %>%
  graph_from_data_frame(directed = TRUE) %>%
  ggraph() +
  geom_node_point() +
  geom_edge_link(arrow = arrow(length = unit(4, 'mm')), end_cap = circle(3, 'mm')) +
  geom_node_label(aes(label = name)) +
  theme_minimal() +
  scale_y_continuous(
    limits = c(0, max(tree$dt, na.rm = TRUE)),
    breaks = c(0:(max(tree$dt, na.rm = TRUE))),
    labels = rev(c(0:(max(tree$dt, na.rm = TRUE)))),
    minor_breaks = NULL,
    position = "left") +
  scale_x_continuous(expand = c(0.1, 0.1), breaks = NULL) +
  labs(x = "", y = "Time")
You can also force coord_fixed() which gives you a narrower version
# ... Same as before +
coord_fixed()
With help from here, a reasonably full-featured solution is:
.tip.pos <<- 0
.recurse.tree = function(ii,par=0,gen=0){
  # recursively walk the tree and extract the following rows:
  # index (ordered by tree search), generation, position, n direct children, n total children
  b.par = ii[,1]==par
  i.chi = ii[b.par,2]
  n.chi = length(i.chi)
  if (n.chi > 0){
    # recurse into each child and bind the resulting columns together
    mat.chi = matrix(nrow=5,unlist(lapply(i.chi,function(i){
      .recurse.tree(ii,par=i,gen=gen+1)
    })))
    par.pos = mean(range(mat.chi[3,mat.chi[1,] %in% i.chi])) # midpoint of direct children
    mat.par.chi = matrix(nrow=5,c(par,gen,par.pos,n.chi,ncol(mat.chi),mat.chi))
    return(mat.par.chi)
  } else {
    # tip: take the next free position
    .tip.pos <<- .tip.pos + 1
    mat.par = matrix(nrow=5,c(par,gen,.tip.pos,0,0))
    return(mat.par)
  }
}
library(ggplot2)
plot.tree = function(tree,...){
  # plot a transmission tree vs time
  tree.data = .recurse.tree(as.matrix(tree[c('par','chi')]))
  tree = rbind(c(-1,-1,0,NA),tree) # append dummy root
  tree = tree[match(tree.data[1,],tree$chi),] # reorder to match tree.data
  tree$gen = factor(tree.data[2,]) # generation
  tree$pos = tree.data[3,] # position
  tree$child.direct = tree.data[4,] # n direct children
  tree$child.total = tree.data[5,] # n total children
  pc.map = match(tree$par,tree$chi) # lookup index for chi -> par
  tree$pos.par = tree$pos[pc.map] # parent position
  tree$t.par = tree$t[pc.map] # parent t
  g = ggplot(tree) +
    geom_segment(aes_string(y='t.par',x='pos.par',xend='pos',yend='t'),alpha=.5) +
    geom_point(aes_string(x='pos',y='t',...)) +
    scale_x_continuous(labels=NULL,breaks=NULL) + labs(x='')
  return(g)
}
tree = data.frame(
t = c( 0, 1, 1, 4, 5, 7),
par = c( 0, 1, 1, 2, 3, 5),
chi = c( 1, 2, 3, 4, 5, 6),
dt = c(NA, 1, 1, 3, 4, 2))
g = plot.tree(tree,color='gen')
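For anyone who wants to check the layout logic by hand, here is a minimal sketch of the same recursion in Python. It is not a port of the plotting code. The function name layout and its return format are made up for illustration. It assumes the same rule as the R code: tips take positions 1, 2, ... in depth-first order, and each parent sits at the midpoint of the range of its direct children.

```python
def layout(edges, root=0):
    """Return {node: (generation, position)} for a tree given as (parent, child) pairs."""
    children = {}
    for par, chi in edges:
        children.setdefault(par, []).append(chi)

    out = {}
    next_tip = [0]  # mutable counter, plays the role of .tip.pos

    def walk(node, gen):
        kids = children.get(node, [])
        if not kids:
            # tip: take the next free position
            next_tip[0] += 1
            pos = float(next_tip[0])
        else:
            # internal node: midpoint of the range of its direct children
            kid_pos = [walk(k, gen + 1) for k in kids]
            pos = (min(kid_pos) + max(kid_pos)) / 2
        out[node] = (gen, pos)
        return pos

    walk(root, 0)
    return out

# (par, chi) pairs from the sample data above
edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 5), (5, 6)]
positions = layout(edges)
for node in sorted(positions):
    print(node, positions[node])
```

Nodes with a single child simply inherit that child's position, which is why this kind of data is no problem here.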
The code above can be easily modified to give some nice results, e.g. with bigger data, custom aes, and ggMarginal:
I'm new to R and trying to isolate the best performing features from a data set of 247 columns (246 variables + 1 outcome), and 800 or so rows (where each row is one person's data) to create a predictive model.
I'm using caret to do RFE with lmFuncs. I need to use linear regression since the target variable is continuous.
I use the following to split into test/training data (which hasn't evoked errors)
inTrain <- createDataPartition(data$targetVar, p = .8, list = F)
train <- data[inTrain, ]
test <- data[-inTrain, ]
The resulting test and train sets look consistent, e.g. X and Y contain the same number of samples and all columns are the same length.
My control parameters are as follows (also runs without error)
control = rfeControl(functions = lmFuncs, method = "repeatedcv", repeats = 5, verbose = F, returnResamp = "all")
But when I run RFE I get an error message saying
Error in rfe.default(train[, -1], train[, 1], sizes = c(10, 15, 20, 25, 30), rfeControl = control) :
there should be the same number of samples in x and y
My code for RFE is as follows, with the target variable in first column
rfe_lm_profile <- rfe(train[, -1], train[, 1], sizes = c(10, 15, 20, 25, 30), rfeControl = control)
I've looked through various forums, but nothing seems to work.
This Google Groups thread suggests using an older version of caret, which I tried, but I got the same X/Y error: https://groups.google.com/g/rregrs/c/qwcP0VGn4ag?pli=1
Others suggest converting the target variable to a factor or matrix. This hasn't helped, and evokes
Warning message:
In createDataPartition(data$EBI_SUM, p = 0.8, list = F) :
Some classes have a single record
when partitioning the data into test/train, and the same X/Y sample error if you try to carry out RFE.
Mega thanks in advance :)
Here's the dput for the target variable (EBI_SUM) and a couple of variables
data <- structure(list(TargetVar = c(243, 243, 243, 243, 355, 355), Dosing = c(2,
2, 2, 2, 2, 2), `QIDS_1 ` = c(1, 1, 3, 1, 1, 1), `QIDS_2 ` = c(3,
3, 2, 3, 3, 3), `QIDS_3 ` = c(1, 2, 1, 1, 1, 2)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
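The column names in your data object should not contain spaces (note the trailing space in `QIDS_1 ` and the others). Rebuilding it as a plain data.frame also means that train[, 1] returns a vector rather than a one-column tibble: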
data <- data.frame(
TargetVar = c(243, 243, 243, 243, 355, 355),
Dosing = c(2, 2, 2, 2, 2, 2),
QIDS_1 = c(1, 1, 3, 1, 1, 1),
QIDS_2 = c(3, 3, 2, 3, 3, 3),
QIDS_3 = c(1, 2, 1, 1, 1, 2)
)
inTrain <- createDataPartition(data$TargetVar, p = .8, list = F)
train <- data[inTrain, ]
test <- data[-inTrain, ]
control <- rfeControl(functions = lmFuncs, method = "repeatedcv", repeats = 5, verbose = F, returnResamp = "all")
rfe_lm_profile <- rfe(train[, -1], train[, 1], sizes = c(10, 15, 20, 25, 30), rfeControl = control)
I'm looking for an efficient way to plot time, x, y, z with different colors for different objects - to view proximity of the objects over time.
plot3D::lines3D works with add = TRUE, but it is not very elegant. Here's some sample code that works:
data$object_id <- factor(data$object_id)
for(tr in unique(data$object_id)) {
  lines3D(data$x[data$object_id == tr], data$y[data$object_id == tr], data$z[data$object_id == tr], add = T, col = data$object_id[data$object_id == tr])
}
Example data:
data <- data.frame(object_id = c(1, 1, 2, 2), t = c(0, 1, 0, 1), x = c(0, 1, 1, 0), y = c(0, 1, 1, 0), altitude = c(0, 1, 1, 0))
Desired result: path traced by different objects at a given time along with an arrow that indicates the current direction of heading (determined by joining the last 2 known positions).
At time t = 0, this should yield nothing or should yield points. At t = 1, this should yield 2 lines (one over the other) of different colors: one color for each object.
The 2D equivalent is ggplot2::geom_path, which does all the heavy lifting via the group parameter, joining the paths by the grouping variable.
I was trying to produce the following 7 x 4 matrix in R:
m = matrix(c(seq(25, 1, by = -4),
seq(26, 2, by = -4),
seq(27, 3, by = -4),
seq(28, 4, by = -4)), nrow = 7, ncol = 4)
But I'm wondering whether I could achieve the same matrix with more efficient R code than what I used above?
Here's a solution:
m <- matrix(rev(c(1:28)),nrow=7,ncol=4,byrow = TRUE)[,rev(1:4)]
And this one is even faster:
m <- matrix(28:1,nrow=7,ncol=4,byrow = TRUE)[,4:1]
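To see why the row-wise fill with reversed columns gives the same result, here is a small language-independent check of the index arithmetic, sketched in Python. The variable names are made up for illustration. It builds the target from its defining formula (column c counts down from 25 + c in steps of 4) and compares it with 28..1 laid out row by row with each row reversed.

```python
rows, cols = 7, 4

# target: column c holds 25 + c, 21 + c, ..., 1 + c (the four seq() calls)
target = [[25 + c - 4 * r for c in range(cols)] for r in range(rows)]

# alternative: fill 28..1 by row, then reverse the column order
values = list(range(28, 0, -1))
by_row = [values[r * cols:(r + 1) * cols][::-1] for r in range(rows)]

same = target == by_row
print(same)
print(target[0], target[-1])
```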
m = matrix(c(rep(seq(25, 1, by = -4),4) + rep(c(0:3),each=7) ), nrow = 7, ncol = 4)
Not sure if you would call this more efficient...
When one performs a logit regression in R, it is possible to obtain coefficients after the optimization algorithm has converged (or not) with coefficients() function:
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
              family=binomial(logit), data=menarche)
coefficients(glm.out)
## (Intercept)         Age
##  -21.226395    1.631968
Is there a way to obtain coefficients for each step of the optimization algorithm to trace its steps?
The internals of glm.fit have changed (see comment from #John), so use this instead. It does not rely on line positions in the internals. Instead it intercepts each call to cat in glm.fit and prints coefold whenever the iteration message is printed, so although it still depends on the internals it should be a bit less fragile. This worked for me in R 4.1 and 4.2.
trace(glm.fit, quote(cat <- function(...) {
  base::cat(...)  # still print the original deviance/iteration message
  if (...length() >= 3 && identical(..3, " Iterations - ")) print(coefold)
}))
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
family=binomial(logit), data=menarche,
control = glm.control(trace = TRUE))
Previous solution
The control= argument with the value shown causes the deviance to print and the trace statement will cause the coefficient values to print:
trace(glm.fit, quote(print(coefold)), at = list(c(22, 4, 8, 4, 19, 3)))
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
family=binomial(logit), data=menarche,
control = glm.control(trace = TRUE))
The output will look like this:
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
Deviance = 27.23412 Iterations - 1
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] -20.673652 1.589536
Deviance = 26.7041 Iterations - 2
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] -21.206854 1.630468
Deviance = 26.70345 Iterations - 3
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] -21.226370 1.631966
Deviance = 26.70345 Iterations - 4
To remove the trace use:
untrace(glm.fit)
Note that in the trace call, coefold is the name of a variable used internally in glm.fit source code and the numbers used refer to statement numbers in the source code and so either could need to be changed if glm.fit source changes. I am using "R version 3.2.2 Patched (2015-10-19 r69550)".
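To illustrate what "coefficients at each step" means, independent of glm.fit internals, here is a minimal sketch in Python of Newton-Raphson for a two-parameter binomial logit model. For the logit link this is equivalent to the IRLS iteration. The function name newton_logit_trace and the data are made up for illustration (this is not the menarche data). The starting values are zeros, which is not how glm.fit initializes, so the iterates would not match R's trace even on the same data.

```python
import math

def newton_logit_trace(x, successes, totals, max_iter=25, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson and return every iterate."""
    b0, b1 = 0.0, 0.0
    trace = [(b0, b1)]
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, successes, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p       # score contribution
            w = ni * p * (1.0 - p)    # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # solve the 2x2 system H d = g for the Newton step d
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        trace.append((b0, b1))
        if abs(d0) + abs(d1) < tol:
            break
    return trace

# made-up grouped data: centred covariate, successes out of 20 trials per group
x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
successes = [1, 3, 8, 13, 17, 19]
totals = [20] * 6

trace = newton_logit_trace(x, successes, totals)
for step, (b0, b1) in enumerate(trace):
    print(step, b0, b1)
```

Collecting the iterates in a list like this is the same information the trace() call above prints, just gathered by the fitting loop itself.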