I am trying to reproduce the example given in Goodfellow, I. et al.: Generative Adversarial Nets.
The pseudocode is given on page 4 as "Algorithm 1". I am trying to rebuild it with the neuralnet package in R:
library(neuralnet)
train_iter <- 10
steps <- 1
m <- 100
# initialize D and G
z <- sort(runif(m))
x <- sort(rnorm(m))
data <- cbind(z, x)
D <- neuralnet( , data = data, hidden = 11) # unclear how to define formula
G <- neuralnet(x ~ z, data = data, hidden = 11)
for (i in 1:train_iter) {
  for (k in 1:steps) {
    z <- sort(runif(m))
    x <- sort(rnorm(m))
    data <- cbind(z, x)
    # discriminator loss: -(log D(x) + log(1 - D(G(z))))
    err_fct_d <- function(x, z) {
      -(log(compute(D, x)$net.result) + log(1 - compute(D, compute(G, z)$net.result)$net.result))
    }
    D <- neuralnet( , data = data, hidden = 11, err.fct = err_fct_d, startweights = D$weights) # unclear how to define formula
  }
  z <- sort(runif(m))
  data <- cbind(z, x)
  # generator loss: log(1 - D(G(z)))
  err_fct_g <- function(x, z) {
    log(1 - compute(D, compute(G, z)$net.result)$net.result)
  }
  G <- neuralnet(x ~ z, data = data, hidden = 11, err.fct = err_fct_g, startweights = G$weights)
}
My questions
My first question is whether it is possible to use the neuralnet package with these customized error functions in the above way at all. My second question concerns the discriminator network: I don't know how to train it, i.e. how to define the formula part of the neuralnet function.
Unfortunately, this does not work out of the box, because err.fct has to be an analytically differentiable function and calling compute() inside it prevents that.
Edit: I contacted the author of the package and he wrote:
I checked your GAN example and I think that is not possible with neuralnet because the way error functions are handled is not flexible enough. [...]
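Regarding the formula question on its own (ignoring the custom-loss limitation described above), one way to think of the discriminator is as an ordinary binary classifier on real versus generated samples. A rough sketch of that framing follows; it is not a working GAN and not the training scheme from the paper, just an illustration of what a formula for D could look like:
# Sketch only: this sidesteps the GAN losses (which neuralnet cannot express,
# per the author's reply) and shows one possible formula for D.
library(neuralnet)
m <- 100
z <- sort(runif(m))
x <- sort(rnorm(m))
G <- neuralnet(x ~ z, data = data.frame(z, x), hidden = 11)
fake <- as.vector(compute(G, data.frame(z = z))$net.result)
d_data <- data.frame(value = c(x, fake),
                     label = c(rep(1, m), rep(0, m)))  # 1 = real, 0 = generated
D <- neuralnet(label ~ value, data = d_data, hidden = 11,
               linear.output = FALSE)  # logistic output, so D returns a probability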
Related
I generated some data in R:
n <- 1000; p <- 30
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
beta <- c(rep(1, 10), rep(0, 10), rep(-2, 10))
y <- X %*% beta + rnorm(1000)
Next, I want to run a stepwise regression of y on the columns of X, from 1 to 30. First I include only the intercept, then the intercept and column one, then add column two, column three, and so on. I wrote the following code:
model <- lm(y ~ 1)
for (i in 1:30) {
  model <- update(model, ~ . + X[, i])
  print(model)
}
What I see in the output is that for each iteration, the regression is of y on an intercept and X[, i], i.e. the i-th column of X, and not on all the previous columns as well, even though I'm updating at every step. For example, when i = 4, the model is a regression of y on an intercept and X[, 4], not on all of columns 1, 2, 3, 4. Why does this happen?
Try this
model <- lm(y ~ 1)
for (i in 1:30) {
  model <- update(model, ~ . + X[, 1:i])
  print(model)
}
Your proposed code doesn't work because of how R sees the formula: R updates the formula before it evaluates i.
The source code for the relevant update method can be viewed by running update.default at the command line. You'll see that after some error checking it runs call$formula <- update(formula(object), formula.), which calls the update.formula() function. update.formula() sees that you want to add the term X[, i] to the formula and does that. But update.formula() doesn't evaluate i at this point; it relies on "lazy evaluation". This can be seen more clearly if we expand out the loop.
form <- y ~ 1
form
#> y ~ 1
i <- 1
form <- update.formula(form, ~. +X[, i])
form
#> y ~ X[, i]
i <- 2
form <- update.formula(form, ~. +X[, i])
form
#> y ~ X[, i]
The formula is being updated with the symbol X[, i] and then simplified to remove the duplicate symbol. This lazy evaluation is useful because it means that I don't need to actually define what X or y are for the above code to run. R trusts that I'll create appropriate objects before I try to use them.
After update() has updated the formula, it eval()'s the updated call. At this time i is evaluated and its current value is used. So in fact, this loop below gives the exact same output as your loop even though it doesn't try to change the formula at all. Each time lm() runs it looks for the current value of i to use.
for (i in 1:30) {
  model <- lm(y ~ X[, i])
  print(model)
}
To achieve your desired effect you can build the formula programmatically outside the lm() call, without using update(). Like so:
n <- 1000; p <- 30
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
beta <- c(rep(1, 10), rep(0, 10), rep(-2, 10))
y <- X %*% beta + rnorm(n)
xnames <- paste0("X", 1:ncol(X))
colnames(X) <- xnames
dat <- data.frame(y, X)
for (i in 1:30) {
  form <- as.formula(paste0("y ~ ", paste(xnames[1:i], collapse = " + ")))
  model <- lm(form, data = dat)
  print(model)
}
EDIT:
After reading this post, https://notstatschat.rbind.io/2022/06/23/getting-strings-into-code-in-base-r/, I found that an alternative way to perform the formula manipulation is to use bquote(). This has the advantage that the model summary contains the correct formula.
model <- lm(y ~ 1, data = dat)  # start again from the intercept-only model
for (i in 1:30) {
  model <- eval(bquote(update(model, ~ . + .(as.name(xnames[[i]])))))
  print(model)
}
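To see what bquote() is building (a small illustration using the xnames defined above), you can print the call before evaluating it; the value of i is substituted in as a column name rather than left as the symbol X[, i]:
i <- 3
bquote(update(model, ~ . + .(as.name(xnames[[i]]))))
#> update(model, ~. + X3)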
I'm creating a neural network and I want to use the torch library for its autograd functionality. I can convert my data to a torch_tensor, but as soon as I add that tensor to a list of other tensors, they seem to lose their torch properties (which are needed to calculate the gradient at the end of the feedforward loop).
The reason I want to put the tensors in a list is that I want the number of hidden layers and the number of neurons per hidden layer to be customizable by the end user. Previously, before I started using torch, I accomplished that by making a list of all the separate weight matrices, the number of which is determined by a user-provided variable.
This is the main learning function for my neural network:
train <- function(x, y, hidden = 4, layers = 3, rate = 0.01, iterations = 10000) {
  d <- ncol(x) + 1
  x <- torch_tensor(x)
  Wn <- list()
  Wn[[1]] <- torch_randn(d, hidden[1], requires_grad = T)
  if (layers > 1) {
    for (j in 2:layers) {
      Wn[[j]] <- torch_randn(hidden[j-1] + 1, hidden[j], requires_grad = T)
    }
  }
  Wn[[layers + 1]] <- torch_randn(hidden[length(hidden)] + 1, 1, requires_grad = T)
  for (i in 1:iterations) {
    ff <- feedforward(x, Wn)
    Wn <- backpropagate(y, Wn, ff, learn_rate = rate)
  }
  return(Wn)
}
My feedforward function looks like this:
feedforward <- function(x, Wn) {
  h <- list(x)
  for (k in 1:length(Wn)) {
    Zn <- cbind(1, h[[k]]) %*% Wn[[k]]
    h[[k + 1]] <- relu(Zn)
  }
  return(h)
}
with relu <- function(x) { max(0,x) }
Is there any way of making this work? Or should I try to find a different approach for the feedforward function?
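For what it's worth, storing torch tensors in a plain R list does not by itself strip their autograd information; what breaks the graph is applying base R operations such as cbind(), %*% and max() to them. A minimal sketch (my own illustration, not the code above) using the torch equivalents torch_cat(), torch_matmul() and nnf_relu():
library(torch)
x <- torch_randn(10, 2)
y <- torch_randn(10, 1)
# weights stored in a plain R list still carry requires_grad
Wn <- list(
  torch_randn(3, 4, requires_grad = TRUE),  # bias + 2 inputs -> 4 hidden units
  torch_randn(5, 1, requires_grad = TRUE)   # bias + 4 hidden -> 1 output
)
Wn[[1]]$requires_grad                        # TRUE: the list is not the problem
bias <- torch_ones(10, 1)
h1  <- nnf_relu(torch_matmul(torch_cat(list(bias, x), dim = 2), Wn[[1]]))
out <- torch_matmul(torch_cat(list(bias, h1), dim = 2), Wn[[2]])
loss <- torch_mean((out - y)^2)
loss$backward()
Wn[[1]]$grad                                 # gradients reach the list-stored weights
So rather than a different data structure, swapping cbind()/%*%/max() in feedforward() for their torch counterparts may be enough.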
The example is from the rootSolve package:
We have this function:
gradient(f, x, centered = FALSE, pert = 1e-8, ...)
in which f is a function and x is the data input in the form of a vector.
Now the following is an instance of the code being run:
logistic <- function(x, times) {
  with(as.list(x), {
    N <- K / (1 + (K - N0)/N0 * exp(-r*times))
    return(c(N = N))
  })
}
# parameters for the US population from 1900
x <- c(N0 = 76.1, r = 0.02, K = 500)
# Sensitivity function: SF: dfi/dxj at
# output intervals from 1900 to 1950
SF <- gradient(f = logistic, x, times = 0:50)
My question is: how does the code know to use times in its routine? times is not defined globally, and it is not one of gradient()'s named arguments either. Is it possible to pass inputs to a function that are not named in its definition? Does ... play a role here?
... is just a way of accepting extra arguments and passing them on to another function.
Simple example:
power.function <- function(x,power) { x^power }
apply.function <- function(f, data, ...) { f(data, ...) }
sample <- c(1,2,3)
apply.function (power.function, sample, power = 3)
# which is the same as
apply.function (power.function, sample, 3)
produces
> apply.function (power.function, sample, 3)
[1] 1 8 27
EDIT
To make it crystal clear, if you look at the source of rootSolve::gradient you'll see the definition as
function (f, x, centered = FALSE, pert = 1e-08, ...)
and further down the call to
reff <- f(x, ...)
which is the same as described above in the example.
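Applied to the question's call, times = 0:50 is captured by gradient()'s ... and forwarded, so that internal line effectively evaluates the model function as:
# what `reff <- f(x, ...)` amounts to for the call in the question
logistic(x, times = 0:50)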
I have a system of ODEs and some data, and I am using the R packages deSolve and FME to fit the parameters of the ODE system to the data. I get a singular matrix result when I fit the full parameter set. So I went back and looked at the collinearity of the parameters using a collinearity index cut-off of 20, as suggested in the FME package documentation, and then picked a few models with subsets of parameters to fit. When I run modFit I get this error:
Error in approx(xMod, yMod, xout = xDat) :
need at least two non-NA values to interpolate
Can anyone enlighten me as to a fix for this? Everything else is working fine, so this is not a coding problem.
Here is a minimal working example (removing r = 2 in modFit creates the error, which I can fix in the minimal working example but not in my actual problem, so I doubt the minimal working example tells the whole story):
## =======================================================================
## Now suppose we do not know K and r and they are to be fitted...
## The "observations" are the analytical solution
## =======================================================================
# You need these packages
library('deSolve')
library('FME')
## logistic growth model
TT <- seq(1, 100, 2.5)
N0 <- 0.1
r <- 0.5
K <- 100
## analytical solution
Ana <- cbind(time = TT, N = K/(1 + (K/N0 - 1) * exp(-r*TT)))
time <- 0:100
parms <- c(r = r, K = K)
x <- c(N = N0)
logist <- function(t, x, parms) {
  with(as.list(parms), {
    dx <- r * x[1] * (1 - x[1]/K)
    list(dx)
  })
}
## Run the model with initial guess: K = 10, r = 2
parms["K"] <- 10
parms["r"] <- 2
init <- ode(x, time, logist, parms)
## FITTING algorithm uses modFit
## First define the objective function (model cost) to be minimised
## more general: using modFit
Cost <- function(P) {
  parms["K"] <- P[1]
  parms["r"] <- P[2]
  out <- ode(x, time, logist, parms)
  return(modCost(out, Ana))
}
(Fit <- modFit(p = c(K = 10, r = 2), f = Cost))
summary(Fit)
I think the problem is in your Cost function. If you don't provide both K and r, the cost function overwrites the start value of r with NA (because P[2] doesn't exist). You can test this:
Cost <- function(P) {
  parms["K"] <- P[1]
  parms["r"] <- P[2]
  print(parms)
  # out <- ode(x, time, logist, parms)
  # return(modCost(out, Ana))
}
Cost(c(K = 10, r = 2))
Cost(c(K = 10))
This function works:
Cost <- function(P) {
  parms[names(P)] <- P
  out <- ode(x, time, logist, parms)
  return(modCost(out, Ana))
}
The vignette FMEdyna is very helpful: https://cran.r-project.org/web/packages/FME/vignettes/FMEdyna.pdf. See page 14 on how to specify the objective (cost) function.
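With the name-based assignment, fitting only a subset of the parameters also works, because any parameter you leave out keeps its current value in parms instead of being overwritten with NA. For example (a sketch using the objects from the example above):
parms["r"] <- r                          # keep r fixed at its true value (0.5)
FitK <- modFit(p = c(K = 10), f = Cost)  # fit only K
summary(FitK)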
I'm trying to recover the R matrix from the QR decomposition used in biglm. For this I took a portion of the code in vcov.biglm and put it into a function, like so:
qr.R.biglm <- function(object, ...) {
  # Return the qr.R matrix from a biglm object
  object$qr <- .Call("singcheckQR", object$qr)
  p <- length(object$qr$D)
  R <- diag(p)
  R[row(R) > col(R)] <- object$qr$rbar
  R <- t(R)
  R <- sqrt(object$qr$D) * R
  dimnames(R) <- list(object$names, object$names)
  return(R)
}
More specifically, I'm trying to get the same result as using qr.R from the base package, which is used on QR decompositions of class "qr" such as those contained in the lm class (lm$qr). The code for the base function is as follows:
qr.R <- function(qr, complete = FALSE) {
  if (!is.qr(qr))
    stop("argument is not a QR decomposition")
  R <- qr$qr
  if (!complete)
    R <- R[seq.int(min(dim(R))), , drop = FALSE]
  R[row(R) > col(R)] <- 0
  R
}
I manage to get the same result for a sample regression, except for the signs.
x <- as.data.frame(matrix(rnorm(100 * 10), 100, 10))
y <- seq.int(1, 100)
fit.lm <- lm("y ~ .", data = cbind(y, x))
R.lm <- qr.R(fit.lm$qr)
library(biglm)
fmla <- as.formula(paste("y ~ ", paste(colnames(x), collapse = "+")))
fit.biglm <- biglm(fmla, data = cbind(y, x))
R.biglm <- qr.R.biglm(fit.biglm)
Comparing both, it's clear that the absolute values match, but not the signs.
mean(abs(R.lm) - abs(R.biglm) < 1e-6)
[1] 1
mean(R.lm - R.biglm < 1e-6)
[1] 0.9338843
I can't quite figure out why this is. I would like to get the same R matrix from biglm as I get from lm.
The difference between the two R matrices is that biglm apparently performs its rotations such that R's diagonal elements are all positive, while lm (or, really, the routines it calls) imposes no such constraint. (There should be no numerical advantage to one strategy or the other, so the difference is just one of convention, AFAICT.)
You can make lm's results identical to biglm's by imposing that additional constraint yourself. I'd use a reflection matrix that multiplies rows by either 1 or -1, such that the diagonal elements all end up positive:
## Apply the necessary reflections
R.lm2 <- diag(sign(diag(R.lm))) %*% R.lm
## Show that they did the job
mean(R.lm2 - R.biglm < 1e-6)
# [1] 1
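As a further sanity check (my own addition, assuming no column pivoting occurred, which holds for this full-rank example), both triangular factors should reproduce the cross-product of the model matrix, which is why only the signs of whole rows can differ:
mm <- model.matrix(fit.lm)
max(abs(crossprod(R.lm)    - crossprod(mm)))   # effectively zero
max(abs(crossprod(R.biglm) - crossprod(mm)))   # effectively zero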