Extra weight values in neural network (nnet for R package) - r

I'm attempting to reverse engineer in Excel how the nnet package works, using some simple input data. Here are the steps I've taken:
Import dummy data:
test <- read.csv('dataScaled.csv',header=TRUE,sep = ",")
Train the network:
anntrain <- nnet(Price ~ Sqft + Bedrooms + Bathrooms,test[1:650,],size=2, maxit=5000,linout=TRUE)
Grab the weights of the ANN:
anntrain$wts
This outputs:
[1] -2.12443010 6.68900321 0.85338018 -0.73329823 -3.95336239 7.91917321
[7] -5.38893137 4.05941771 -0.02062346 0.26584364 0.32881035
Grab fitted values of trained network:
anntrain$fitted.values
This outputs what I believe to be the scaled Price predictions of the trained network for each of the 650 transactions I trained it on above.
Verify the fitted values by recalculating them from the above weights with the sigmoid function.
My confusion is that it outputs 11 weight values. If I only have 3 inputs, 2 hidden nodes and 1 output, shouldn't that equate to only 8 weights? What are the 3 extra weights for?

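Every layer except the output layer has a bias (Why use a bias/threshold?). A bias acts like an extra node whose output is always 1, with its own weight into each node of the next layer. With 3 inputs, 2 hidden nodes and 1 output you therefore have (3+1)*2 + (2+1)*1 = 11 weights: the 8 you expected plus 3 bias weights (one for each hidden node and one for the output).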
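To recompute a fitted value by hand, the weight count and a forward pass can be sketched in base R as below. This is a minimal sketch, not nnet's own code. It assumes the weights are ordered as summary(anntrain) prints them (bias then inputs for hidden node 1, the same for hidden node 2, then bias and hidden nodes for the output), and that the output is linear because linout=TRUE was used. The helper names sigmoid and forward are made up for illustration.

```r
# Weights as printed in the question (anntrain$wts)
wts <- c(-2.12443010, 6.68900321, 0.85338018, -0.73329823,
         -3.95336239, 7.91917321, -5.38893137, 4.05941771,
         -0.02062346, 0.26584364, 0.32881035)

n_in <- 3; n_hidden <- 2; n_out <- 1
# Each hidden node gets n_in weights plus a bias; the output gets n_hidden weights plus a bias
n_weights <- (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper: forward pass for one row of scaled inputs (Sqft, Bedrooms, Bathrooms).
# ASSUMED ordering: wts[1:4] -> hidden node 1, wts[5:8] -> hidden node 2, wts[9:11] -> output.
forward <- function(x, wts) {
  W_hidden <- matrix(wts[1:8], nrow = 2, byrow = TRUE)  # one row per hidden node: bias, i1, i2, i3
  w_out <- wts[9:11]                                     # bias, h1, h2
  h <- sigmoid(W_hidden %*% c(1, x))                     # the constant 1 is the bias "input"
  sum(w_out * c(1, h))                                   # linear output, no sigmoid
}

n_weights
forward(c(0, 0, 0), wts)
```

Comparing forward() on the first training row with anntrain$fitted.values[1] is a quick way to check whether the assumed ordering matches the trained network.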

Related

R: How to check which model of an ensemble algorithm has been selected to perform regression?

I am using the R package machisplin (it's not on CRAN) to downscale a satellite image. According to the description of the package:
The machisplin.mltps function simultaneously evaluates different combinations of the six algorithms to predict the input data. During model tuning, each algorithm is systematically weighted from 0-1 and the fit of the ensembled model is evaluated. The best performing model is determined through k-fold cross validation (k=10) and the model that has the lowest residual sum of squares of test data is chosen. After determining the best model algorithms and weights, a final model is created using the full training dataset.
My question is how can I check which model out of the 6 has been selected for the downscaling? To put it differently, when I export the downscaled image, I would like to know which algorithm (out of the 6) has been used to perform the downscaling.
Here is the code:
library(MACHISPLIN)
library(raster)
library(gbm)
evi = raster("path/evi.tif") # covariate
ntl = raster("path/ntl_1600.tif") # raster to be downscaled
##convert one of the rasters to a point dataframe to sample. Use any raster input.
ntl.points<-rasterToPoints(ntl,
                           fun = NULL,
                           spatial = FALSE)
##subset only the x and y data
ntl.points<- ntl.points[,1:2]
##Extract values to points from rasters
RAST_VAL<-data.frame(extract(ntl, ntl.points))
##merge sampled data to input
InInterp<-cbind(ntl.points, RAST_VAL)
#run an ensemble machine learning thin plate spline
interp.rast<-machisplin.mltps(int.values = InInterp,
                              covar.ras = evi,
                              smooth.outputs.only = T,
                              tps = T,
                              n.cores = 4)
#set negative values to 0
interp.rast[[1]]$final[interp.rast[[1]]$final <= 0] <- 0
writeRaster(interp.rast[[1]]$final,
            filename = "path/ntl_splines.tif")
I viewed all the output parameters (please refer to Example 2 in the package description) but I couldn't find anything relevant to my question.
I have posted a question on GitHub as well. From here you can download my images.
I think this is a misunderstanding. MACHISPLIN isn't testing 6 algorithms and returning one of them. It tries many weighted ensembles of the 6 algorithms and returns the best ensemble. In other words,
what you get is the best combination of the 6 algorithms, not one algorithm chosen out of the 6.
The result is something like "a model which is 20% algo1, 10% algo2, etc." and not "algo1 is the best and was chosen".
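The idea of weighting algorithms instead of picking one can be illustrated with a toy example in base R. This is only a sketch of the general technique with two made-up models and simulated data. It is not MACHISPLIN's code, and it uses a single train/test split where the package description mentions 10-fold cross validation.

```r
set.seed(1)
# Simulated data (illustrative only)
x <- seq(0, 1, length.out = 60)
y <- 2 * x + x^2 + rnorm(60, sd = 0.2)
dat <- data.frame(x = x, y = y)

# Simple split: odd rows for training, even rows for testing
train_rows <- seq(1, 60, by = 2)
train_dat <- dat[train_rows, ]
test_dat <- dat[-train_rows, ]

# Two stand-in "algorithms"
fit_a <- lm(y ~ x, data = train_dat)
fit_b <- lm(y ~ I(x^2), data = train_dat)
pred_a <- predict(fit_a, newdata = test_dat)
pred_b <- predict(fit_b, newdata = test_dat)

# Weight model a from 0 to 1 (model b gets the remainder) and score each blend
w_grid <- seq(0, 1, by = 0.1)
rss <- sapply(w_grid, function(w) {
  blend <- w * pred_a + (1 - w) * pred_b
  sum((test_dat$y - blend)^2)   # residual sum of squares on test data
})

best_w <- w_grid[which.min(rss)]
best_w   # share given to model a in the best ensemble
```

The selected result is a pair of weights, so asking which single model was chosen only has an answer when one weight happens to be 1.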

Implementing Longitudinal Random Forest with LongituRF package in R

I have some high-dimensional repeated measures data, and I am interested in fitting a random forest model to investigate the suitability and predictive utility of such models. Specifically, I am trying to implement the methods in the LongituRF package. The methods behind this package are detailed here:
Capitaine, L., et al. Random forests for high-dimensional longitudinal data. Stat Methods Med Res (2020) doi:10.1177/0962280220946080.
Conveniently the authors provide some useful data generating functions for testing. So we have
install.packages("LongituRF")
library(LongituRF)
Let's generate some data with DataLongGenerator() which takes as arguments n=sample size, p=number of predictors and G=number of predictors with temporal behavior.
my_data <- DataLongGenerator(n=50,p=6,G=6)
my_data is a list of what you'd expect: Y (response vector),
X (matrix of fixed-effects predictors), Z (matrix of random-effects predictors),
id (vector of sample identifiers) and time (vector of time measurements). To fit a random forest model, simply run
model <- REEMforest(X=my_data$X,Y=my_data$Y,Z=my_data$Z,time=my_data$time,
id=my_data$id,sto="BM",mtry=2)
This takes about 50 seconds here, so bear with me.
So far so good. Now I'm clear about all the parameters here except for Z. What is Z when I go to fit this model on my actual data?
Looking at my_data$Z.
dim(my_data$Z)
[1] 471 2
head(my_data$Z)
[,1] [,2]
[1,] 1 1.1128914
[2,] 1 1.0349287
[3,] 1 0.7308948
[4,] 1 1.0976203
[5,] 1 1.3739856
[6,] 1 0.6840415
Each row of Z looks like an intercept term (i.e. 1) followed by a value drawn from a uniform distribution with runif().
The documentation of REEMforest() indicates that "Z [matrix]: A Nxq matrix containing the q predictor of the random effects." How is this matrix to be specified when using actual data?
My understanding is that traditionally Z is simply a one-hot (binary) encoding of the group variables (e.g. as described here), so Z from DataLongGenerator() should be an n x G (471 x 6) sparse matrix, no?
Clarity on how to specify the Z parameter with actual data would be appreciated.
EDIT
My specific example is as follows. I have a response variable (Y). Samples (identified with id) were randomly assigned to an intervention (I, intervention or no intervention). There is a high-dimensional set of features (X). Features and response were measured at two timepoints (Time, baseline and endpoint). I am interested in predicting Y using X and I. I am also interested in extracting which features were most important for predicting Y (the same way Capitaine et al. did with HIV in their paper).
I will call REEMforest() as follows
REEMforest(X=cbind(X,I), Y=Y, time=Time, id=id)
What should I use for Z?
When the function DataLongGenerator() creates Z, it fills a matrix with random uniform data. The actual code is
Z <- as.matrix(cbind(rep(1, length(f)), 2 * runif(length(f))))
Here length(f) is the number of rows in each of the generated elements. In your example, you used 50 participants, 6 predictors, and 6 predictors with temporal behavior. That led to a length of 471.
From what I can gather, since this function is designed to simulate longitudinal data, this is a simulation of random effects on that data. If you were working with real data, I think it would be a lot easier to understand.
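The generator line quoted above can be run on its own to see what Z looks like. The sketch below reproduces it for 471 rows (the row count from the question) and then builds a Z for a made-up real-data case. The second part is an assumption, not something the LongituRF documentation states: it uses the common mixed-model convention of a column of ones for a random intercept and the measurement time for a random slope. The names n_obs, Z_sim, time_points and Z_real are hypothetical.

```r
set.seed(42)

# Part 1: what DataLongGenerator() does for Z, per the line quoted above
n_obs <- 471                      # number of rows in the question's my_data$Z
Z_sim <- as.matrix(cbind(rep(1, n_obs), 2 * runif(n_obs)))
dim(Z_sim)                        # 471 rows, 2 columns
range(Z_sim[, 2])                 # values lie between 0 and 2

# Part 2 (ASSUMPTION): a Z for real data with a random intercept and a random
# slope on time. Here: 5 hypothetical subjects measured at baseline (0) and endpoint (1).
time_points <- rep(c(0, 1), times = 5)
Z_real <- cbind(1, time_points)   # column 1: intercept, column 2: time
dim(Z_real)                       # one row per observation, q = 2 columns
```

With only a random intercept, Z would reduce to a single column of ones, one row per observation.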
While this example doesn't use RE-EM forests, I thought it was pretty clear, because it uses tangible elements as an example. You can read about random effects in section 1.2.2 Fixed v. Random Effects. https://ademos.people.uic.edu/Chapter17.html#32_fixed_effects
Look at section 3.2 to see examples of random effects that you could intentionally model if you were working with real data.
Another example: You're running a cancer drug trial. You've collected patient demographics on a weekly basis: weight, temperature, and a CBC panel and different groups of drug administration: 1 unit per day, 2 units per day, and 3 units per day.
In traditional regression, you'd model these variables to determine how accurately the model identifies the outcome. The fixed effects account for the explained variance, or R^2. So if R^2 is .86 (86%), then 14% is unexplained. That unexplained variance, the gap between a perfect fit and what the model determined was the outcome, could be noise caused by an interaction.
Let's say the patients who had really low white blood cell counts and were overweight responded far better to the treatment. Or perhaps the patients with red hair responded better; that's not in your data. In terms of longitudinal data, let's say that the relationship (the interaction relationship) only appears after some measure of time passes.
You can try to model different relationships to evaluate the random interactions in the data. I think you'd be better off with one of the many ways to evaluate interactions systematically than a random attempt to identify random effects, though.
EDITED I started to write this in the comments with @JustGettinStarted, but it was too much.
Without the background: the easiest way to achieve this would be to run something like REEMtree::REEMtree(), setting the random effects argument to random = ~1 | time / id. After it runs, extract the random effects it has calculated. You can do it like this:
library(dplyr) # mutate, arrange, left_join, row_number
library(tidyr) # separate
# fit is the object returned by REEMtree::REEMtree()
data2 <- data %>% mutate(oOrder = row_number()) %>% # identify original order of the data
  arrange(time, id) %>%
  mutate(zOrder = row_number()) # because the random effects will be in order by time then id
extRE <- data.frame(time = attributes(fit$RandomEffects[2][["id"]])[["row.names"]]) %>%
  separate(col = time,
           into = c("time", "id"),
           sep = "\\/") %>%
  mutate(Z = fit$RandomEffects[[2]] %>% unlist(),
         id = as.integer(id),
         time = time) # set data type to match dataset for time
data2 <- data2 %>% left_join(extRE) %>% arrange(oOrder) # return to original order
Z = cbind(rep(1, times = nrow(data2)), data2$Z)
Alternatively, I suggest that you start with the random generation of random effects. The random-effects you start with are just a jumping-off point. The random effects at the end will be different.
No matter how many ways I tried to use LongituRF::REEMforest() with real data, I ran into errors. I had an uninvertible matrix failure every time.
I noticed that the data generated by DataLongGenerator() comes in order by id, then time. I tried to order the data (and Z) that way, but it didn't help. When I extracted all the functionality out of the package LongituRF, I used the MERF (mixed-effects random forest) function with no problems. Even in the research paper, that method was solid. Just thought it was worth mentioning.

Find the nearest neighbor using caret

I'm fitting a k-nearest neighbor model using R's caret package.
library(caret)
set.seed(0)
y = rnorm(20, 100, 15)
predictors = matrix(rnorm(80, 10, 5), ncol=4)
data = data.frame(cbind(y, predictors))
colnames(data)=c('Price', 'Distance', 'Cost', 'Tax', 'Transport')
I left one observation as the test data and fit the model using the training data.
id = sample(nrow(data)-1)
train = data[id, ]
test = data[-id,]
knn.model = train(Price~., method='knn', train)
predict(knn.model, test)
When I display knn.model, it tells me it uses k=9. I would love to know which 9 observations are actually the "nearest" to the test observation. Besides manually calculating the distances, is there an easier way to display the nearest neighbors?
Thanks!
When you use knn, you group points that are near each other based on the independent variables. Normally, this is done using train(Price~., method='knn', train), so that the model chooses the best prediction based on some criterion (also taking the dependent variable into account). Since I have not checked whether the R object stores the predicted price for each of the training values, I just used the trained model to predict the expected price (that is, where the expected price is located in the space).
At the end, the dependent variable is just a representation of all the other variables in a common space, where the price associated is assumed to be similar since you cluster based on proximity.
As a summary of steps, you need to calculate the following:
Get the predicted value for each of the training data points by predicting over them.
Calculate the distance between the training data and your observation of interest (in absolute value, since you care only about the size of the distance, not its sign).
Take the indexes of the N smallest distances (e.g. N = 9). With these you can get the corresponding observations.
TestPred<-predict(knn.model, newdata = test)
TrainPred<-predict(knn.model, train)
Nearest9neighbors<-order(abs(TestPred-TrainPred))[1:9]
train[Nearest9neighbors,]
Price Distance Cost Tax Transport
15 95.51177 13.633754 9.725613 13.320678 12.981295
7 86.07149 15.428847 2.181090 2.874508 14.984934
19 106.53525 16.191521 -1.119501 5.439658 11.145098
2 95.10650 11.886978 12.803730 9.944773 16.270416
4 119.08644 14.020948 5.839784 9.420873 8.902422
9 99.91349 3.577003 14.160236 11.242063 16.280094
18 86.62118 7.852434 9.136882 9.411232 17.279942
11 111.45390 8.821467 11.330687 10.095782 16.496562
17 103.78335 14.960802 13.091216 10.718857 8.589131
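For comparison, the neighbors can also be found by computing distances in predictor space directly. This is a different approach from the steps above, which compare predicted prices. The sketch below uses base R only and rests on two assumptions: that the fitted model used plain Euclidean distance, and that no preProcess scaling was applied in train(). It rebuilds the question's data so that it runs on its own.

```r
# Rebuild the data and split from the question
set.seed(0)
y <- rnorm(20, 100, 15)
predictors <- matrix(rnorm(80, 10, 5), ncol = 4)
data <- data.frame(cbind(y, predictors))
colnames(data) <- c('Price', 'Distance', 'Cost', 'Tax', 'Transport')
id <- sample(nrow(data) - 1)
train <- data[id, ]
test <- data[-id, ]

# Euclidean distance from the single test row to every training row,
# using the predictor columns only (ASSUMES unscaled predictors)
pred_cols <- setdiff(names(train), "Price")
test_point <- unlist(test[1, pred_cols])
diffs <- sweep(as.matrix(train[, pred_cols]), 2, test_point)  # subtract test row from each training row
dists <- sqrt(rowSums(diffs^2))

k <- 9                               # the k reported by knn.model in the question
nearest <- order(dists)[1:k]         # positions of the k smallest distances
cbind(train[nearest, ], dist = dists[nearest])
```

If the model was trained with centering and scaling, the same transformation would have to be applied to train and test before computing the distances.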

h2o deep learning weights and normalization

I'm exploring h2o via the R interface and I'm getting a weird weight matrix. My task is as simple as they get: given x,y compute x+y.
I have 214 rows with 3 columns. The first column(x) was drawn uniformly from (-1000, 1000) and the second one(y) from (-100,100). I just want to combine them so I have a single hidden layer with a single neuron.
This is my code:
library(h2o)
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
train <- h2o.importFile(path = "/home/martin/projects/R NN Addition/addition.csv")
model <- h2o.deeplearning(1:2,3,train, hidden = c(1), epochs=200, export_weights_and_biases=T, nfolds=5)
print(h2o.weights(model,1))
print(h2o.weights(model,2))
and the result is
> print(h2o.weights(model,1))
x y
1 0.5586579 0.05518193
[1 row x 2 columns]
> print(h2o.weights(model,2))
C1
1 1.802469
For some reason the weight value for y is 0.055 - 10 times lower than for x. So, in the end the neural net would compute x+y/10. However, h2o.predict actually returns the correct values (even on a test set).
I'm guessing there's a preprocessing step that's somehow scaling my data. Is there any way I can reproduce the actual weights produced by the model? I would like to be able to visualize some pretty simple neural networks.
Neural networks perform best if all the input features have mean 0 and standard deviation 1. If the features have very different standard deviations, neural networks perform very poorly. Because of that, h2o does this normalization for you. In other words, before even training your net it computes the mean and standard deviation of every feature, and replaces the original values with (x - mean) / stddev. In your case the stddev of the second feature is 10x smaller than that of the first, so after normalization its values end up being 10x more important in terms of how much they contribute to the sum, and the weights heading to the hidden neuron need to cancel that out. That's why the weight for the second feature is 10x smaller.
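The effect of standardization on the weights can be checked without h2o. The sketch below simulates data with the same ranges as in the question, standardizes both inputs, and fits a plain linear model to x + y. A linear model stands in for the one-neuron network here, which is an assumption made to keep the example small. Since x + y = sd(x)*zx + sd(y)*zy + constant, the fitted weights should come out as the two standard deviations, in a ratio of roughly 1 to 10.

```r
set.seed(123)
n <- 214
x <- runif(n, -1000, 1000)   # same ranges as in the question
y <- runif(n, -100, 100)
target <- x + y

# Standardize the inputs: (value - mean) / stddev
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Linear stand-in for the network, fitted on the standardized inputs
fit <- lm(target ~ zx + zy)
w <- coef(fit)

w[c("zx", "zy")]                         # equal to sd(x) and sd(y)
weight_ratio <- unname(w["zy"] / w["zx"])
weight_ratio                             # close to 0.1
sd(y) / sd(x)                            # the same value
```

To apply weights learned on standardized inputs to raw data, each weight has to be divided by that feature's standard deviation, and the bias has to absorb the means.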

Storing the mean square resulted from CVlm

I am trying to create a script in R for automatically assessing the predictive power of various possible linear models. As a quality indicator for each model, I use the overall mean square from a cross-validation, which I run with the function CVlm from the package DAAG. My question is: how can I retrieve the value of the overall mean square resulting from CVlm in an automated way (without having to read the textual output of CVlm)?
For example the following code from http://maths-people.anu.edu.au/~johnm/r-book/3edn/scripts/reg1.R
houseprices.lm <- lm(sale.price ~ area, data=houseprices)
CVlm(houseprices, houseprices.lm, plotit=TRUE)
has an output in the form
fold 1
Observations in test set: ...
fold 2
Observations in test set: ...
Overall ms
2023
How can I access/store the value of ms (2023) of each run?
You have to store the result of CVlm in a variable and access the ms attribute:
houseprices.lm <- lm(sale.price ~ area, data=houseprices)
cv <- CVlm(houseprices, houseprices.lm, plotit=TRUE)
attr(cv, "ms")
# [1] 3934
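For the automated comparison described in the question, the same attr() pattern can be applied in a loop. The sketch below does not call DAAG, so that it runs in base R. Instead it uses a hypothetical stand-in, cv_lm(), that performs a simple k-fold cross-validation and attaches the overall mean square as an "ms" attribute, mirroring how the value is read from the CVlm result above. The dataset (mtcars) and the candidate formulas are placeholders.

```r
# Hypothetical stand-in for CVlm: k-fold CV that stores the overall mean square in attr "ms"
cv_lm <- function(data, formula, m = 3, seed = 29) {
  set.seed(seed)
  fold <- sample(rep(seq_len(m), length.out = nrow(data)))  # random fold label per row
  response <- all.vars(formula)[1]
  cv_pred <- numeric(nrow(data))
  for (k in seq_len(m)) {
    fit <- lm(formula, data = data[fold != k, ])             # train on the other folds
    cv_pred[fold == k] <- predict(fit, newdata = data[fold == k, ])
  }
  out <- cbind(data, cvpred = cv_pred)
  attr(out, "ms") <- mean((data[[response]] - cv_pred)^2)    # overall mean square
  out
}

# Candidate models (placeholders)
formulas <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)

# Collect the mean square of every model without reading printed output
ms_values <- sapply(formulas, function(f) attr(cv_lm(mtcars, f), "ms"))
names(ms_values) <- sapply(formulas, function(f) paste(deparse(f), collapse = ""))
ms_values
names(which.min(ms_values))   # formula with the lowest cross-validated mean square
```

With DAAG loaded, the call inside sapply() would be replaced by the CVlm call from the answer above, keeping the attr(cv, "ms") lookup unchanged.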
