Using two different sigmoid functions in neuralnet in R

I am trying to replicate a study (https://www.sciencedirect.com/science/article/pii/S0957417410011711).
In the study they use two different transfer functions, one for the hidden layer and one for the output layer. On page 5314 they write: "A tangent sigmoid transfer function was selected on the hidden layer. On the other hand, a logistic sigmoid transfer function was used on the output layer."
I'm using package "neuralnet" in R.
In order to have a tangent sigmoid transfer function for the hidden layer I can use the code:
act.fct = 'tanh'
But this creates a problem: either (A) I end up with the same function for the output layer, or (B) I set linear.output = T, which gives me a linear output rather than a sigmoid function. Is there any way to use a different function for the output layer?
Likewise, if I use act.fct = 'logistic' I get a logistic sigmoid transfer function throughout the entire network, which is correct for the output layer but wrong for the hidden layer. Again, that only takes me halfway.
I have a crude alternative that I'd prefer not to use: it should be possible to pass err.fct a customized error function that takes the linear output and runs it through the desired sigmoid before computing the error, and then run the output of the compute command through the same sigmoid separately. But that seems like a hassle and I will likely mess up somewhere along the way. Any proper/better solution for this?
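For concreteness, the crude workaround I have in mind would look something like this (untested; the formula, data frame and column names are made up, and I am assuming neuralnet can symbolically differentiate a custom err.fct written only with exp()):
library(neuralnet)

# custom squared error: squash the linear network output through a logistic
# sigmoid before comparing it to the target
err_logistic_sse <- function(x, y) (1 / 2) * (y - 1 / (1 + exp(-x)))^2

nn <- neuralnet(y ~ x1 + x2, data = train_df,   # hypothetical formula/data
                hidden = 5,
                act.fct = "tanh",               # tangent sigmoid in the hidden layer
                err.fct = err_logistic_sse,
                linear.output = TRUE)

# apply the same logistic sigmoid to the raw output of compute()
raw  <- compute(nn, test_df[, c("x1", "x2")])$net.result
pred <- 1 / (1 + exp(-raw))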

It doesn't seem like the R package neuralnet supports using a different activation function for the output layer than for the hidden layers. Check out the keras package, which solves this for you:
library(keras)

model <- keras_model_sequential()
model %>%
  layer_dense(units = 100, activation = 'tanh') %>%   # tangent sigmoid hidden layer
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 1, activation = 'sigmoid')       # logistic sigmoid output layer
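To train it you would then compile and fit as usual; a minimal sketch (the loss, optimizer, epochs and the x/y training objects are placeholders, not part of the original answer):
model %>% compile(
  loss = "binary_crossentropy",   # placeholder; pick a loss that matches the study
  optimizer = optimizer_adam())
model %>% fit(x, y, epochs = 100, batch_size = 32)  # x, y: your predictors and targets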

Related

Custom Weight Regularization in Keras

I am attempting to implement a custom regularization method in Keras for R which will discourage negative weights during training. I have found supporting documentation for this in Python, just not for R.
In this method, I would like to identify the negative weights and then apply regularization to those weights specifically. My current attempt is defined as
l1l2_reg <- function(weight_matrix) {
  neg <- which(weight_matrix < 0, arr.ind = TRUE)
  return(0.0001 * sum(sum(weight_matrix[neg]^2)) + sum(sum(abs(weight_matrix[neg]^2))))
}
I use this within my model as
reconstruct <- bottleneck %>%
  layer_dense(units = input_size, activation = "linear",
              kernel_regularizer = l1l2_reg,
              name = "reconstruct")
When the model is run, I am met with the error message
Error: Discrete value supplied to continuous scale
I believe this is occurring because the function is not correctly locating the weights, but I am unsure how to fix it. Based on the code above, it should identify the indices of the negative weights and then return the regularization based on those, but clearly my implementation is flawed. I primarily use MATLAB, so my implementation may be skewed towards that as well.
What is the correct way to implement this in R?
For most custom functions passed to Keras (in both Python and R), you generally have to stick to TensorFlow operations. In this case, which() and subsetting with an integer array via [neg] need to be replaced with their TensorFlow equivalents: tf$where() and tf$gather_nd(). Or you can take a different approach altogether and use tf$maximum(), as in the example below.
(The [ method for tensors today doesn't yet accept a list of arbitrary integer indices, but rather slice specs; in R, see ?`[.tensorflow.tensor` for details.)
(sum(), abs(), ^ and * are R generics which automatically dispatch to the TensorFlow methods tf$reduce_sum(), tf$abs(), tf$pow() and tf$multiply() when called with a tensor.)
You can update your l1l2_reg like this (note that the actual calculation is slightly different from what you wrote, to match the usual meaning of "l1" and "l2"):
library(tensorflow)
library(keras)
neg_l1l2_reg <- function(weight_matrix) {
  # keep only the magnitude of the negative weights (positive weights become 0)
  x <- tf$maximum(tf$zeros_like(weight_matrix), -weight_matrix)
  l1 <- sum(abs(x)) * 0.0001   # sum()/abs() dispatch to tf$reduce_sum()/tf$abs()
  l2 <- sum(x ^ 2) * 0.0001
  l1 + l2
}
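You would then pass it as the regularizer in the layer, mirroring the code from the question (the layer sizes and names are the question's own placeholders):
reconstruct <- bottleneck %>%
  layer_dense(units = input_size, activation = "linear",
              kernel_regularizer = neg_l1l2_reg,
              name = "reconstruct")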

Flux.jl model always outputs 1.0 after adding Sigmoid activation function

My original issue was that I wanted my model to output only values between 0 and 1 so I can map them back to my categorical image labels (see Flux.jl restrict variables between 0 and 1). So I decided to add a sigmoid activation function as follows:
σ = sigmoid
model = Chain(
    resnet[1:end-2],
    Dense(2048, 1000),
    Dense(1000, 256),
    Dense(256, 2, σ),  # we get 2048 features out, and we have 2 classes
);
However, now my model only outputs 1.0. Any ideas as to why or if I am using the activation function wrong?
Consider using an activation function for your hidden layers as well, since multiple linear layers (Dense layers without a non-linear activation function) are equivalent to a single linear layer. If your categories are exclusive (dog or cat, but not both) and cover all your cases (it will always be a dog or a cat and never, e.g., an ostrich), then the probabilities should sum to one and a softmax is more appropriate for the last layer. The softmax function is generally used together with the crossentropy loss function.
model = Chain(
    resnet[1:end-2],
    Dense(2048, 1000, σ),
    Dense(1000, 256, σ),
    Dense(256, 2),
    softmax
);
For better numerical stability and accuracy, it is recommended to replace crossentropy with logitcrossentropy (in which case the final softmax is not necessary).

How to use different activations in output layer in Keras in R

I want to combine several types of activations in the output layer in the Keras interface for R. I also want to use different loss functions for different outputs. Let's say I want the first two neurons to be linear with MSE loss, the next two neurons sigmoid with BCE loss, and the last output relu with MAE loss. So far I have this, and it is not working:
model <- keras_model_sequential()
model %>% layer_dense(units = 120, activation = "selu",
                      input_shape = dim(X)[2])  # this is the hidden layer, this works fine
model %>% layer_dense(units = 120, activation = as.list(c(rep("linear", 2),
                      rep("sigmoid", 2), "relu")))  # output layer which is not working
model %>% compile(loss = as.list(c(rep("mean_squared_error", 2),
                  rep("binary_crossentropy", 2), "mean_absolute_error")),  # problem here?
                  optimizer = optimizer_adam(lr = 0.001), metrics = "mae")
and after this I fit the model with model %>% fit(...).
The error is the following:
Error in py_call_impl(callable, dots$args, dots$keywords) :
  ValueError: When passing a list as loss, it should have one entry per model outputs.
  The model has 1 outputs, but you passed loss=['mean_squared_error', 'mean_squared_error', ...
Any help is appreciated.
EDIT: only rewrote the code so that it is more readable.
I think that if you want to have multiple outputs (each with its own activation and loss), you need to use the functional API rather than the sequential one - see some examples here: https://keras.rstudio.com/articles/functional_api.html
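A minimal sketch of what that could look like for the setup described above (untested; the layer sizes, output names and the commented fit() call are illustrative, not taken from the linked article):
library(keras)

inputs <- layer_input(shape = dim(X)[2])
hidden <- inputs %>% layer_dense(units = 120, activation = "selu")

out_linear  <- hidden %>% layer_dense(units = 2, activation = "linear",  name = "out_linear")
out_sigmoid <- hidden %>% layer_dense(units = 2, activation = "sigmoid", name = "out_sigmoid")
out_relu    <- hidden %>% layer_dense(units = 1, activation = "relu",    name = "out_relu")

model <- keras_model(inputs = inputs, outputs = list(out_linear, out_sigmoid, out_relu))

model %>% compile(
  loss = list(out_linear = "mean_squared_error",
              out_sigmoid = "binary_crossentropy",
              out_relu = "mean_absolute_error"),
  optimizer = optimizer_adam(lr = 0.001),
  metrics = "mae")

# fit() then takes one target per output:
# model %>% fit(X, list(y_linear, y_sigmoid, y_relu), epochs = 10)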

Box Cox transformation in R, apply to a column

I have skewed data that I need to normalize in order to run a t-test, and I am struggling to find an implementation of the Box-Cox transformation that takes a specified lambda. I tried a log transform, but for a few data points it does not work very well.
I come from Python where there is this function:
from scipy.special import boxcox
>>> boxcox([1, 4, 10], 2.5)
array([  0.        ,  12.4       , 126.09110641])
where 2.5 is the specified lambda. This function can then be applied to a whole column.
I would like to find its equivalent in R, but so far I have only found the boxcox function in the MASS package, which gives me the best lambda parameter; I cannot seem to find a way to apply any lambda I want.
You can try the boxcox function from the EnvStats package (see here).
There you can specify lambda:
library(EnvStats)
boxcox(1:10, lambda = 2.5, optimize = FALSE)
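If you only need the transformed values for a fixed lambda, a small base-R helper (my own sketch, not part of the answer above) also works and reproduces the scipy output shown in the question:
# Box-Cox transform for a fixed lambda: (x^lambda - 1) / lambda, or log(x) when lambda == 0
boxcox_transform <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

boxcox_transform(c(1, 4, 10), 2.5)
# [1]   0.00000  12.40000 126.09111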

Using glmnet to predict a continuous variable in a dataset

I have this data set: wbh
I wanted to use the R package glmnet to determine which predictors would be useful in predicting fertility. However, I have been unable to do so, most likely because I do not fully understand the package. The fertility variable is SP.DYN.TFRT.IN. I want to see which predictors in the data set give the most predictive power for fertility. I wanted to use LASSO or ridge regression to shrink the number of coefficients, and I know this package can do that; I'm just having some trouble implementing it.
I know there are no code snippets, which I apologize for, but I am rather lost on how I would code this out.
Any advice is appreciated.
Thank you for reading.
Here is an example of how to run glmnet:
library(glmnet)
library(tidyverse)
df is the data set you provided.
select the y variable:
y <- df$SP.DYN.TFRT.IN
select the numerical variables:
df %>%
  select(-SP.DYN.TFRT.IN, -region, -country.code) %>%
  as.matrix() -> x
select the factor variables and convert them to dummy variables:
df %>%
  select(region, country.code) %>%
  model.matrix(~ . - 1, .) -> x_train
run the model(s). Several parameters can be tweaked here; I suggest checking the documentation. Below I just run 5-fold cross-validation to determine the best lambda:
cv_fit <- cv.glmnet(x, y, nfolds = 5) #just with numeric variables
cv_fit_2 <- cv.glmnet(cbind(x ,x_train), y, nfolds = 5) #both factor and numeric variables
par(mfrow = c(2,1))
plot(cv_fit)
plot(cv_fit_2)
best lambda:
cv_fit$lambda[which.min(cv_fit$cvm)]
coefficients at the best lambda:
coef(cv_fit, s = cv_fit$lambda[which.min(cv_fit$cvm)])
equivalent to:
coef(cv_fit, s = "lambda.min")
After running coef(cv_fit, s = "lambda.min"), all features shown with a . in the resulting table are dropped from the model. This corresponds to the lambda marked by the left vertical dashed line in the plots.
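To actually generate predictions of fertility at that lambda (a small addition of mine, not part of the original answer), call predict() on the cv.glmnet object:
# predictions at the best lambda; newx would normally be held-out data
pred <- predict(cv_fit, newx = x, s = "lambda.min")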
I suggest reading the linked documentation - elastic nets are quite easy to grasp if you know a bit about linear regression, and the package is quite intuitive. I also suggest reading ISLR, at least the part on L1/L2 regularization, and these videos: 1, 2, 3, 4, 5, 6 - the first three are about estimating model performance via test error and the last three are about the question at hand. This one shows how to implement these models in R. By the way, the people in the videos invented the LASSO and wrote glmnet.
Also check out the glmnetUtils package, which provides a formula interface and other nice things such as built-in mixing parameter (alpha) selection. Here is the vignette.
