MXNet: sequence length in LSTM with non-sequence data (R)

My data are not a time series, but they have sequential properties.
Consider one sample:
data1 = matrix(rnorm(10, 0, 1), nrow = 1)
label1 = rnorm(1, 0, 1)
label1 is a function of data1, but the data matrix is not a time series. I suppose the label is a function not just of one data sample but also of older samples, which are naturally ordered in time (not sampled randomly); in other words, the data samples depend on one another.
I have a batch of examples, say, 16.
Given that, I want to understand how I can design an RNN/LSTM model that will memorize all 16 examples from the batch to construct its internal state. I am especially confused by the seq_len parameter, which, as I understand it, refers specifically to the length of the time series used as input to the network, which is not my case.
The following piece of code (taken from a time-series example) only confuses me further, because I don't see how my task fits into it.
rm(symbol)
symbol <- rnn.graph.unroll(seq_len = 5,
                           num_rnn_layer = 1,
                           num_hidden = 50,
                           input_size = NULL,
                           num_embed = NULL,
                           num_decode = 1,
                           masking = F,
                           loss_output = "linear",
                           dropout = 0.2,
                           ignore_label = -1,
                           cell_type = "lstm",
                           output_last_state = F,
                           config = "seq-to-one")

graph.viz(symbol, type = "graph", direction = "LR",
          graph.height.px = 600, graph.width.px = 800)

train.data <- mx.io.arrayiter(
  data = matrix(rnorm(100, 0, 1), ncol = 20)
  , label = rnorm(20, 0, 1)
  , batch.size = 20
  , shuffle = F
)

Sure, you can treat them as time steps and apply an LSTM. Also check out this example, as it might be relevant for your case: https://github.com/apache/incubator-mxnet/tree/master/example/multivariate_time_series
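For concreteness, here is a minimal sketch (not a definitive recipe) of what that could look like under one set of assumptions: univariate inputs arranged as in the time-series snippet above, i.e. one column per sequence and one row per step, so that the 16 consecutive samples become the 16 time steps of a single sequence (seq_len = 16) and each window gets one label (config = "seq-to-one"). With 10 features per sample you would additionally need a feature dimension, along the lines of the multivariate_time_series example linked above.

# Sketch only: univariate inputs, one column per 16-step window, one label per window
seq_len <- 16
n_seq <- 64                                                      # hypothetical number of windows
windows <- matrix(rnorm(seq_len * n_seq, 0, 1), nrow = seq_len)  # one window per column
labels <- rnorm(n_seq, 0, 1)                                     # one label per window

symbol <- rnn.graph.unroll(seq_len = seq_len,
                           num_rnn_layer = 1,
                           num_hidden = 50,
                           input_size = NULL,
                           num_embed = NULL,
                           num_decode = 1,
                           masking = F,
                           loss_output = "linear",
                           dropout = 0.2,
                           ignore_label = -1,
                           cell_type = "lstm",
                           output_last_state = F,
                           config = "seq-to-one")

train.data <- mx.io.arrayiter(
  data = windows
  , label = labels
  , batch.size = 16
  , shuffle = F
)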

Related

How to create a function within function

I am having trouble writing a single function to impute my generated missing values. I have a data-generating function, a missing-value-generating function and an imputation function, but how can I combine them into one function?
# generate data
data <- function(n, alpha, kappa, miu) {
  X = rvm(n, alpha, kappa)
  delta = rvm(n, 0, kappa)
  epsilon = rvm(n, 0, kappa)
  x = (X + delta) %% (2 * pi)
  Y = (alpha + X) %% (2 * pi)
  y = (Y + epsilon) %% (2 * pi)
  sample = cbind(x, y)
  return(sample)
}
# generate missing values
misVal <- ampute(data = data(10, 0.7854, 5, 0), prop = 0.25, bycases = FALSE)
# impute missing values
impData <- mice(misVal, m = 5, maxit = 50, meth = 'pmm', seed = 500)
summary(impData)
Combine all three functions into one large custom function...
nameYourFunction <- function(n, alpha, kappa, miu) {
  X = rvm(n, alpha, kappa)
  delta = rvm(n, 0, kappa)
  epsilon = rvm(n, 0, kappa)
  x = (X + delta) %% (2 * pi)
  Y = (alpha + X) %% (2 * pi)
  y = (Y + epsilon) %% (2 * pi)
  sample = cbind(x, y)
  # generate missing values
  misVal <- ampute(data = sample, prop = 0.25, bycases = FALSE)
  # impute missing values
  impData <- mice(misVal, m = 5, maxit = 50, meth = 'pmm', seed = 500)
  return(impData)
}
Then to run...
final_data <- nameYourFunction(n = 10, alpha = 0.7854, kappa = 5, miu = 0)
summary(final_data)
Obviously you may want to rename the function to your own preference.
If you wanted something more flexible, such as being able to easily supply arguments to the other functions called within nameYourFunction, you would add them to the argument list in the first line of code. It might end up looking more like...
nameYourFunction <- function(n,alpha,kappa,miu,prop,m,maxit,meth,seed){...}
Then supplying those values to the function call like...
final_data <- nameYourFunction(n = 10, alpha = 0.7854, kappa = 5, miu = 0, prop = 0.25, m = 5, maxit = 50, meth = 'pmm', seed = 500)
You would then remove the hard-coded values from within the custom function. I would probably recommend against this, though, as that is a lot of arguments to keep track of!
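If you do go the flexible route, one way to keep the argument list manageable is to give the extra arguments default values, so callers only override what they need. Below is a sketch under the same assumptions as above (rvm() available, e.g. from the CircStats package, and ampute()/mice() from the mice package); note that ampute() returns a mads object, so the sketch passes its $amp element on to mice().

library(mice)  # ampute() and mice(); rvm() assumed available, e.g. from CircStats

nameYourFunction <- function(n, alpha, kappa, miu,
                             prop = 0.25, m = 5, maxit = 50,
                             meth = "pmm", seed = 500) {
  # generate data (miu is accepted but unused, as in the original function)
  X = rvm(n, alpha, kappa)
  delta = rvm(n, 0, kappa)
  epsilon = rvm(n, 0, kappa)
  x = (X + delta) %% (2 * pi)
  Y = (alpha + X) %% (2 * pi)
  y = (Y + epsilon) %% (2 * pi)
  sample = cbind(x, y)
  # generate missing values; the amputed data live in the $amp element
  misVal <- ampute(data = sample, prop = prop, bycases = FALSE)
  # impute missing values
  impData <- mice(misVal$amp, m = m, maxit = maxit, meth = meth, seed = seed)
  return(impData)
}

# Defaults are used unless you override them:
final_data <- nameYourFunction(n = 10, alpha = 0.7854, kappa = 5, miu = 0)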

Can't pass variable into function in R

I am trying to fit a list of data frames and I can't figure out why I can't define conc and t0 outside of the function. If I do it like this, I get the error:
'Error in nls.multstart::nls_multstart(y ~ fit_drx_mono(assoc_time, t0, : There must be as many parameter starting bounds as there are parameters'
conc <- 5e-9
t0 <- 127

nls.multstart::nls_multstart(y ~ fit_mono(assoc_time, t0, conc, kon, koff, ampon, ampoff),
                             data = data_to_fit,
                             iter = 100,
                             start_lower = c(kon = 1e4, koff = 0.00001, ampon = 0.05, ampoff = 0),
                             start_upper = c(kon = 1e7, koff = 0.5, ampon = 0.6, ampoff = 0.5),
                             lower = c(kon = 0, koff = 0, ampon = 0, ampoff = 0))
When I specify the values inside the function call, everything works as it is supposed to, and I don't understand why.
It turned out I cannot pass data = data_to_fit, because then the function looks for variables only in that data frame. Once I defined every variable outside of the function and dropped the data argument, it worked.
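A sketch of that resolution, assuming y and assoc_time are columns of data_to_fit and fit_mono() is already defined: pull the response and predictor into the calling environment, define conc and t0 there as well, and drop the data argument so nls_multstart() resolves every name from the environment rather than from the data frame.

conc <- 5e-9
t0 <- 127
y <- data_to_fit$y                    # assumed column names in data_to_fit
assoc_time <- data_to_fit$assoc_time

nls.multstart::nls_multstart(y ~ fit_mono(assoc_time, t0, conc, kon, koff, ampon, ampoff),
                             iter = 100,
                             start_lower = c(kon = 1e4, koff = 0.00001, ampon = 0.05, ampoff = 0),
                             start_upper = c(kon = 1e7, koff = 0.5, ampon = 0.6, ampoff = 0.5),
                             lower = c(kon = 0, koff = 0, ampon = 0, ampoff = 0))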

How do I graph a Bayesian Network with instantiated nodes using bnlearn and graphviz?

I am trying to graph a Bayesian Network (BN) with instantiated nodes using the libraries bnlearn and Rgraphviz. My workflow is as follows:
After creating a data frame with random data (the data I am actually using is obviously not random), I discretise the data, structure-learn the directed acyclic graph (DAG), fit the data to the DAG and then plot the DAG. I also plot a DAG showing the posterior probabilities of each node.
#rm(list = ls())
library(bnlearn)
library(Rgraphviz)

# Generating random dataframe
data_clean <- data.frame(a = runif(min = 0, max = 100, n = 1000),
                         b = runif(min = 0, max = 100, n = 1000),
                         c = runif(min = 0, max = 100, n = 1000),
                         d = runif(min = 0, max = 100, n = 1000),
                         e = runif(min = 0, max = 100, n = 1000))

# Discretising the data into 3 bins
bins <- 3
data_discrete <- discretize(data_clean, breaks = bins)

# Creating factors for each bin in the data
lv <- c("low", "med", "high")
for (i in names(data_discrete)) {
  levels(data_discrete[, i]) = lv
}

# Structure learning the DAG from the training set
whitelist <- matrix(c("a", "b",
                      "b", "c",
                      "c", "e",
                      "a", "d",
                      "d", "e"),
                    ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))
bn.hc <- hc(data_discrete, whitelist = whitelist)

# Plotting the DAG
dag.hc <- graphviz.plot(bn.hc, layout = "dot")

# Fitting the data to the structure
fitted <- bn.fit(bn.hc, data = data_discrete, method = "bayes")

# Plotting the DAG with posteriors
graphviz.chart(fitted, type = "barprob", layout = "dot")
The next thing I do is to manually change the distributions in the bn.fit object, assigned to fitted, and then plot a DAG that shows the instantiated nodes and the updated posterior probability of the response variable e.
# Manually instantiating
fitted_evidence <- fitted

cpt.a = matrix(c(1, 0, 0), ncol = 3, dimnames = list(NULL, lv))

cpt.c = c(1, 0, 0,
          0, 1, 0,
          0, 0, 1)
dim(cpt.c) <- c(3, 3)
dimnames(cpt.c) <- list("c" = lv, "b" = lv)

cpt.b = c(1, 0, 0,
          0, 1, 0,
          0, 0, 1)
dim(cpt.b) <- c(3, 3)
dimnames(cpt.b) <- list("b" = lv, "a" = lv)

cpt.d = c(0, 0, 1,
          0, 1, 0,
          1, 0, 0)
dim(cpt.d) <- c(3, 3)
dimnames(cpt.d) <- list("d" = lv, "a" = lv)

fitted_evidence$a <- cpt.a
fitted_evidence$b <- cpt.b
fitted_evidence$c <- cpt.c
fitted_evidence$d <- cpt.d

# Plotting the DAG with instantiation and posterior for response
graphviz.chart(fitted_evidence, type = "barprob", layout = "dot")
This is the result I get, but my actual BN is much larger, with many more arcs, and it would be impractical to change the bn.fit object manually.
I would like to find out whether there is a way to plot a DAG with instantiation without changing the bn.fit object by hand. Is there a workaround or a function that I am missing?
I think/hope I have read the documentation for bnlearn thoroughly. I appreciate any feedback and would be happy to change anything in the question if I have not conveyed my thoughts clearly enough.
Thank you.
How about using cpdist to draw samples from the posterior given the evidence? You can then estimate the updated parameters by running bn.fit on the cpdist samples, and plot as before.
An example:
set.seed(69184390) # for sampling
# Your evidence vector
ev <- list(a = "low", b="low", c="low", d="high")
# draw samples
updated_dat <- cpdist(fitted, nodes=bnlearn::nodes(fitted), evidence=ev, method="lw", n=1e6)
# refit : you'll get warnings over missing levels
updated_fit <- bn.fit(bn.hc, data = updated_dat)
# plot
par(mar=rep(0,4))
graphviz.chart(updated_fit, type = "barprob", layout = "dot")
Note I used bnlearn::nodes as nodes is masked by a dependency of Rgraphviz. I tend to load bnlearn last.

data manipulation - R

I am struggling with data manipulation in R. My dataset consists of the variables type (5 levels), intensity (3 levels) and damage (continuous). I want to calculate the mean damage (damage1, damage2 and damage3 separately) with respect to intensity and type. In other words, I want to summarize the average damage by type and intensity. I have created this small reproducible example of my data:
type <- sample(seq(from = 1, to = 5, by = 1), size = 50, replace = TRUE)
intensity <- sample(seq(from = 1, to = 3, by = 1), size = 50, replace = TRUE)
damage1 <- sample(seq(from = 1, to = 50, by = 1), size = 50, replace = TRUE)
damage2 <- sample(seq(from = 1, to = 200, by = 1), size = 50, replace = TRUE)
damage3 <- sample(seq(from = 1, to = 500, by = 1), size = 50, replace = TRUE)
dat <- cbind(type, intensity, damage1, damage2, damage3)
Then, to manipulate the data, I have used the pipe operator %>%, but my commands do not seem to work:
dat <- as.data.frame(dat)
dat %>%
  filter(type == 1) %>%
  group_by(intensity, damage) %>%
  summarise(mean_damage = mean(Value))
I have read about multiple useful functions here:
efficient reshaping using data tables
manipulating data tables
Do Faster Data Manipulation using These 7 R Packages
But I wasn't able to make any progress here. My questions are:
What is wrong with my code?
Am I even going in the right direction here?
Is there an alternative way to do this?
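As a sketch of one possible approach (assuming dplyr and the column names from the reproducible example above): group by both type and intensity, then take the mean of each damage column, rather than grouping by damage itself.

library(dplyr)

dat <- as.data.frame(dat)
dat %>%
  group_by(type, intensity) %>%
  summarise(mean_damage1 = mean(damage1),
            mean_damage2 = mean(damage2),
            mean_damage3 = mean(damage3))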

Regarding the argument of d-dimensional copula function in R

I have a simple question about R. This is simple code to generate random variables from a bivariate Clayton copula with normally distributed margins. How could I do this neatly if I had d identically distributed margins, without having to write c("norm", "norm", "norm", ...) and so on?
myMvd1 <- mvdc(copula = archmCopula(family = "clayton", param = 2),
               margins = c("norm", "norm"),
               paramMargins = list(list(mean = 0, sd = 1),
                                   list(mean = 0, sd = 1)))
You can use rep:
d <- 5
mvdc(copula = archmCopula(family = "clayton", param = 2),
     margins = rep("norm", d),
     paramMargins = rep(list(list(mean = 0, sd = 1)), d))
(And not knowing what this is about, I am not sure if param should be 2 or d.)
You can do something like this:
matrix(rMvdc(d * nRow, myMvd1), nRow, d)
