I am trying to plot my neural network, and I am wondering how I can round the weights to 3 digits.
library(neuralnet)
set.seed(0)
x = matrix(rnorm(100, 0, 5), ncol=4)
y = rnorm(25, 100, 20)
data = data.frame(y, x)
nn.model = neuralnet(y~., data, linear.output=T, stepmax = 1e+06)
plot(nn.model)
I've tried mapply(round), but it doesn't work on the nested lists that a neuralnet model generates. Any suggestion is appreciated!
Like this:
# round the weight matrices of the first repetition in place
nn.model$weights[[1]] <- lapply(nn.model$weights[[1]], function(x) round(x, 3))
plot(nn.model)
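If you fitted the model with rep greater than 1, the same idea extends to every repetition (a sketch, assuming the usual list-of-lists structure of nn.model$weights):
nn.model$weights <- lapply(
  nn.model$weights,                          # one element per repetition
  function(w) lapply(w, round, digits = 3)   # round each layer's weight matrix
)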
I need to plot statistical power vs. the number of replicates; in this case the number of replicates (n) is 3, but I can't figure out how to produce the plot.
This is what I have:
library(car)
n <- 3
nsims <- 1000
p <- coef <- vector()
for (i in 1:nsims) {
  treat <- rnorm(n, mean = 460, sd = 110)
  cont <- rnorm(n, mean = 415, sd = 110)
  df <- data.frame(
    y = c(treat, cont),
    x = rep(c("treat", "cont"), each = n)
  )
  model <- glm(y ~ x, data = df)
  p[i] <- Anova(model)$P
  coef[i] <- coef(model)[2]
}
hist(p, col = 'skyblue')
sum(p < 0.05)/nsims
Can someone help me plot this?
Also, I need to calculate the mean of the coefficients using only models where p < 0.05. This simulates the following process: if you perform the experiment and p > 0.05, you report 'no effect', but if p < 0.05 you report 'significant effect'. I'm not sure how to set that up from what I have.
Would I just do this?
mean(coef)
But I don't know how to include only those with p < 0.05.
Thank you!
Disclaimer: I spend a decent amount of time simulating experiments for work, so I have strong opinions on this.
If this is just for a study assignment, then what you have is fine; if you are planning to go further with this kind of work, I recommend adding the tidyverse to your arsenal.
Encapsulating functionality
First, I put a single iteration into a function, which decouples its logic from the result subsetting (the encapsulation).
sim <- function(n) {
  treat <- rnorm(n, 460, 110)
  cont <- rnorm(n, 415, 110)
  data <- data.frame(y = c(treat, cont), x = rep(c("treat", "cont"), each = n))
  model <- glm(y ~ x, data = data)
  p <- car::Anova(model)$P
  coef <- coef(model)[2]
  data.frame(n, p, coef)
}
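A single call returns one row of results, so the function is easy to sanity-check on its own:
sim(3)  # one-row data.frame with columns n, p, and coef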
Now we can simulate
nsims <- 1000
sims <- do.call(
  rbind,
  # n is now a function parameter (unlike in the original post), so we can vary it
  lapply(
    rep(c(3, 5, 10, 20, 50, 100), each = nsims),
    sim
  )
)
# Aggregations
power_smry <- aggregate(p ~ n, sims, function(x) {mean(x < 0.05)})
coef_smry <- aggregate(coef ~ n, sims[sims$p < 0.05, ], mean)
# Plots
plot(p ~ n, data = power_smry)
plot(coef ~ n, data = coef_smry)
If you do this in the tidyverse, here is one possible approach:
library(tidyverse)

crossing(
  n = c(3, 5, 10, 20, 50, 100)
  # Add any number of other inputs here that you want to explore (like lift).
) %>%
  rowwise() %>%
  # This looks complicated but will be less so if you have multiple
  # varying hyperparameters defined in crossing.
  mutate(results = list(bind_rows(rerun(nsims, sim(n))))) %>%
  pull(results) %>%
  bind_rows() %>%
  group_by(n) %>%
  # The more metrics you want to summarize in different ways,
  # the easier this gets compared to base R.
  summarize(
    power = mean(p < 0.05),
    coef = mean(coef[p < 0.05])
  )
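One caveat: rerun() has since been deprecated in purrr, so on recent versions you can swap that mutate() step for its map() equivalent (same output):
# drop-in replacement for the rerun() step above
mutate(results = list(bind_rows(map(seq_len(nsims), ~ sim(n)))))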
I have a vector of values, and I want to call a function on each value:
values = 1:10
rnorm(100, mean=values, sd=1)
mean = values recycles the sequence (1,2,3,4,5,6,7,8,9,10) across the 100 draws. How can I get a matrix where each set of 100 observations uses a single element of my vector as the mean?
i.e.:
rnorm(100, mean=1, sd=1)
rnorm(100, mean=2, sd=1)
rnorm(100, mean=3, sd=1)
rnorm(100, mean=4, sd=1)
# ...
It's not clear from your question, but I took it that you wanted a single matrix with 10 rows and 100 columns. That being the case, you can do:
matrix(rnorm(1000, rep(1:10, each = 100)), nrow = 10, byrow = TRUE)
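A quick sanity check on the orientation: the row means should land close to 1 through 10.
m <- matrix(rnorm(1000, rep(1:10, each = 100)), nrow = 10, byrow = TRUE)
rowMeans(m)  # approximately 1, 2, ..., 10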
Or modify akrun's answer by using sapply instead of lapply.
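For example, sapply() simplifies the result to a 100 x 10 matrix, one column per mean:
sapply(1:10, function(i) rnorm(100, mean = i, sd = 1))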
An option is lapply from base R:
lapply(1:10, function(i) rnorm(100, mean = i, sd = 1))
Or Map from base R:
Map(function(i) rnorm(100, mean = i, sd = 1), 1:10)
Using map, I can apply a function to each value of the vector values:
library(purrr)
values = 1:10
map_dfc(
  .x = values,
  .f = ~ rnorm(100, mean = .x, sd = 1)
)
In this case I will have a 100 x 10 data.frame.
This looks to me like it has an exponential trend, but I'm not completely sure how to approach it.
Using the forecast package:
library(forecast)
no_diffs_to_stationary <- ndiffs(df$px)
df$stationary_series <- c(rep(NA, no_diffs_to_stationary),
                          diff(df$px, differences = no_diffs_to_stationary))
mean(df$stationary_series, na.rm = TRUE)
sd(df$stationary_series, na.rm = TRUE)
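As a quick check (assuming the same df as below), running ndiffs() on the differenced series should suggest no further differencing:
ndiffs(na.omit(df$stationary_series))  # 0 means no further differencing is suggested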
Data:
x <- seq(0, 20, length.out=1000)
df <- data.frame(x = x, px = dexp(x, rate=0.65))
My aim is to plot the bias-variance decomposition of a cubic smoothing spline for varying degrees of freedom.
First I simulate a test set (matrix) and a train set (matrix). Then I iterate over 100 simulations, and within each one I vary the degrees of freedom of the smoothing spline.
The output I get with the code below does not show any trade-off. What am I doing wrong when calculating the bias/variance?
For reference, the right panel of this figure (slide 14) shows the trade-off that I would expect (source).
rm(list = ls())
library(SimDesign)
set.seed(123)
n_sim <- 100
n_df <- 40
n_sample <- 100
mse_temp <- matrix(NA, nrow = n_sim, ncol = n_df)
var_temp <- matrix(NA, nrow = n_sim, ncol = n_df)
bias_temp <- matrix(NA, nrow = n_sim, ncol = n_df)
# Train data -----
x_train <- runif(n_sample, -0.5, 0.5)
f_train <- 0.8*x_train+sin(6*x_train)
epsilon_train <- replicate(n_sim, rnorm(n_sample,0,sqrt(2)))
y_train <- replicate(n_sim,f_train) + epsilon_train
# Test data -----
x_test <- runif(n_sample, -0.5, 0.5)
f_test <- 0.8*x_test+sin(6*x_test)
epsilon_test <- replicate(n_sim, rnorm(n_sample,0,sqrt(2)))
y_test <- replicate(n_sim,f_test) + epsilon_test
for (mc_iter in seq(n_sim)) {
  for (df_iter in seq(n_df)) {
    cspline <- smooth.spline(x_train, y_train[, mc_iter], df = df_iter + 1)
    cspline_predict <- predict(cspline, x_test)
    mse_temp[mc_iter, df_iter] <- mean((y_test[, mc_iter] - cspline_predict$y)^2)
    var_temp[mc_iter, df_iter] <- var(cspline_predict$y)
    # bias_temp[mc_iter, df_iter] <- bias(cspline_predict$y, f_test)^2
    bias_temp[mc_iter, df_iter] <- mean((replicate(n_sample, mean(cspline_predict$y)) - f_test)^2)
  }
}
mse_spline <- apply(mse_temp, 2, FUN = mean)
var_spline <- apply(var_temp, 2, FUN = mean)
bias_spline <- apply(bias_temp, 2, FUN = mean)
par(mfrow=c(1,3))
plot(seq(n_df),mse_spline, type = 'l')
plot(seq(n_df),var_spline, type = 'l')
plot(seq(n_df),bias_spline, type = 'l')
Actually, I think your code works; it's just the small sample size. You hit the region of overfitting very fast, so everything in the plot is squeezed against the left border, in the area of few degrees of freedom. If you increase n_sample, you should see the expected relation.
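For example, keeping the script above unchanged except for the sample size:
n_sample <- 1000  # was 100; with more data the expected trade-off should show up in the plots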
I am using the following code to generate data, and I am estimating regression models across a list of variables (covar1 and covar2). I have also created confidence intervals for the coefficients and merged them together.
I have been examining all sorts of examples here and on other sites, but I can't seem to accomplish what I want. I want to stack the results for each covar into a single data frame, labeling each cluster of results by the covar it is attributable to (i.e., "covar1" and "covar2"). Here is the code for generating data and results using lapply:
##creating a fake dataset (N=1000, 500 at treated, 500 at control group)
#outcome variable
outcome <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 70, sd = 10))
#running variable
running.var <- seq(0, 1, by = .0001)
running.var <- sample(running.var, size = 1000, replace = T)
##Put negative values for the running variable in the control group
running.var[1:500] <- -running.var[1:500]
#treatment indicator (just a binary variable indicating treated and control groups)
treat.ind <- c(rep(0,500), rep(1,500))
#create covariates
set.seed(123)
covar1 <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 50, sd = 20))
covar2 <- c(rnorm(500, mean = 10, sd = 20), rnorm(500, mean = 10, sd = 30))
data <- data.frame(cbind(outcome, running.var, treat.ind, covar1, covar2))
data$treat.ind <- as.factor(data$treat.ind)
#Bundle the covariates names together
covars <- c("covar1", "covar2")
#loop over them using a convenient feature of the "as.formula" function
models <- lapply(covars, function(x) {
  regres <- lm(as.formula(paste(x, " ~ running.var + treat.ind", sep = "")), data = data)
  ci <- confint(regres, level = 0.95)
  regres_ci <- cbind(summary(regres)$coefficient, ci)
})
names(models) <- covars
print(models)
Any nudge in the right direction, or a link to a post I just haven't come across, is greatly appreciated.
You can use do.call, where the second argument is a list:
do.call(rbind, models)
I made a (possible) improvement to your lapply function. This way you can save the estimated parameters and the variable names in a data.frame:
models <- lapply(covars, function(x) {
  regres <- lm(as.formula(paste(x, " ~ running.var + treat.ind", sep = "")), data = data)
  ci <- confint(regres, level = 0.95)
  regres_ci <- data.frame(covar = x, param = rownames(summary(regres)$coefficient),
                          summary(regres)$coefficient, ci)
})
do.call(rbind,models)
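The covar column then labels each cluster of results, so you can filter the stacked data frame directly, for example:
results <- do.call(rbind, models)
subset(results, covar == "covar1")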