I am using the R programming language. On some bigger data, I tried the following code (to make a decision tree):
#load library
library(rpart)
#generate data
a = rnorm(100, 7000000, 10)
b = rnorm(100, 5000000, 5)
c = rnorm(100, 400000, 10)
group <- sample( LETTERS[1:2], 100, replace=TRUE, prob=c(0.5,0.5) )
group_1 <- sample( LETTERS[1:4], 100, replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25) )
d = data.frame(a,b,c, group, group_1)
d$group = as.factor(d$group)
d$group_1 = as.factor(d$group_1)
#fit model
tree <- rpart(group ~ ., d)
#visualize results
plot(tree)
text(tree, use.n=TRUE, minlength = 0, xpd=TRUE, cex=.8)
In the visual output, the numbers are displayed in scientific notation (e.g. 4.21e+06). Is there a way to disable this?
I consulted this previous answer on Stack Overflow: How to disable scientific notation?
I then tried the following command: options(scipen=999)
But this did not seem to fix the problem.
Can someone please tell me what I am doing wrong?
Thanks
I think the labels.rpart function has scientific notation hard-coded in: it uses a private function called formatg to do the formatting using sprintf() with a %g format, and that function ignores options(scipen). You can override this by replacing formatg with a better function. Here's a dangerous way to do that:
oldformatg <- rpart:::formatg
assignInNamespace("formatg", format, "rpart")
which replaces formatg with the standard format function. This will have dangerous side effects, so afterwards you should change it back using
assignInNamespace("formatg", oldformatg, "rpart")
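If you need this repeatedly, a safer pattern along the same lines is to confine the override to a single plotting call and restore the original automatically with on.exit(). This is just a sketch; the helper name plot_tree_fixed is my own:
# hypothetical helper: swap formatg out only for the duration of one plot
plot_tree_fixed <- function(tree) {
  old <- rpart:::formatg
  assignInNamespace("formatg", format, "rpart")
  on.exit(assignInNamespace("formatg", old, "rpart"))  # always restore the original
  plot(tree)
  text(tree, use.n = TRUE, minlength = 0, xpd = TRUE, cex = .8)
}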
A better solution would be to rescale your data. rpart switches to scientific notation only for big numbers, so you could divide the bad numbers by something like 1000 or 1000000, and describe them as being in different units. For your example, this works for me:
library(rpart)
#generate data
set.seed(123)
a = rnorm(100, 7000000, 10)/1000
b = rnorm(100, 5000000, 5)/1000
c = rnorm(100, 400000, 10)/1000
group <- sample( LETTERS[1:2], 100, replace=TRUE, prob=c(0.5,0.5) )
group_1 <- sample( LETTERS[1:4], 100, replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25) )
d = data.frame(a,b,c, group, group_1)
d$group = as.factor(d$group)
d$group_1 = as.factor(d$group_1)
#fit model
tree <- rpart(group ~ ., d)
#visualize results
plot(tree)
text(tree, use.n=TRUE, minlength = 0, xpd=TRUE, cex=.8)
Let's say I have a dataset like this:
library(tibble)
data <- tibble(date = sample(seq(as.Date("2006-01-01"),
                                 as.Date("2019-01-01"), by = "day"),
                             10000, replace = TRUE),
               treatment = sample(c(0, 1), 10000, replace = TRUE),
               after = ifelse(date > as.Date("2015-03-01"), 1, 0),
               score = rnorm(10000) + ifelse(treatment * after == 1, 0.2, 0))
and I am doing a difference-in-differences analysis:
did <- lm(score~treatment+after+treatment*after, data=data)
summary(did)
How can I make a plot with placebo tests?
Just use the plot_model function from sjPlot.
library(tibble)
library(ggplot2)  # for ylim()
data <- tibble(date = sample(seq(as.Date("2006-01-01"),
                                 as.Date("2019-01-01"), by = "day"),
                             10000, replace = TRUE),
               treatment = sample(c(0, 1), 10000, replace = TRUE),
               after = ifelse(date > as.Date("2015-03-01"), 1, 0),
               score = rnorm(10000) + ifelse(treatment * after == 1, 0.2, 0))
did <- lm(score ~ treatment + after + treatment * after, data = data)
summary(did)
sjPlot::plot_model(did, vline = 'black', show.values = TRUE) + ylim(-.25, .5)
vline adds a vertical reference line at x = 1;
show.values controls whether the estimate values are printed on the plot.
You can check the details of the arguments to plot_model here.
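For the placebo tests themselves, one common approach is to re-estimate the same model on pre-treatment data with a fake cutoff and plot it the same way. This is a sketch, not from the original answer; the 2011 cutoff and object names are my own:
# placebo: pretend treatment started in 2011, using only pre-2015 data
placebo_data <- subset(data, date < as.Date("2015-03-01"))
placebo_data$after <- ifelse(placebo_data$date > as.Date("2011-03-01"), 1, 0)
placebo_did <- lm(score ~ treatment + after + treatment * after, data = placebo_data)
# the interaction estimate should now be near zero
sjPlot::plot_model(placebo_did, show.values = TRUE) + ylim(-.25, .5)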
I am using the R programming language. I am trying to replicate the plots from the following stackoverflow post using the "mlr" library: R: multiplot for plotLearnerPrediction ggplot objects of MLR firing errors in RStudio
(I am also using this site here: https://www.analyticsvidhya.com/blog/2016/08/practicing-machine-learning-techniques-in-r-with-mlr-package/)
First, I created the data for this exercise ("response variable" is the response, all other variables are the predictors)
#load libraries
library(mlr)
library(gridExtra)
library(ggplot2)
library(rpart)
#create data
a = rnorm(1000, 10, 10)
b = rnorm(1000, 10, 5)
c = rnorm(1000, 5, 10)
d <- sample( LETTERS[1:3], 1000, replace=TRUE, prob=c(0.2, 0.6, 0.2) )
response_variable <- sample( LETTERS[1:2], 1000, replace=TRUE, prob=c(0.3, 0.7) )
data <- data.frame(a, b, c, d, response_variable)
data$d = as.factor(data$d)
data$response_variable = as.factor(data$response_variable)
From here, I tried to follow the "mlr" part of the tutorial (only with the "decision tree" and the "random forest" algorithm):
task <- makeClassifTask(data = data, target = "response_variable")
learners = list(
  "classif.randomForest",
  "classif.rpart")
p1<-plotLearnerPrediction(learner = learners[[1]], task = task)
p2<-plotLearnerPrediction(learner = learners[[2]], task = task)
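To combine the two ggplot objects into a single figure, as the linked post does, I believe gridExtra's grid.arrange can be used (a sketch; the nrow = 1 layout is just one choice):
# arrange both learner-prediction plots side by side
gridExtra::grid.arrange(p1, p2, nrow = 1)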
Can someone please tell me if the plots I have produced are as intended?
Thanks
Yes, they are as intended. To see this, you can run the same commands on toy data in which the response actually depends on the predictors. From this, you will see that the classification is correct. The only thing is that in your data the response has absolutely nothing to do with the predictors, so the classification sucks (in fact, it seems to be predicting everything as "B").
library(dplyr)
a = rnorm(100, 10, 10)
b = rnorm(100, 10, 5)
data <- data.frame(a, b)
data = mutate(data, response_variable = ifelse(a > mean(a) | b < mean(b), "A", "B"))
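You can then rerun the same commands from the question on this toy data (a sketch; p_toy is just a name I picked):
# refit and plot exactly as in the question, now on data with real signal
data$response_variable = as.factor(data$response_variable)
task <- makeClassifTask(data = data, target = "response_variable")
p_toy <- plotLearnerPrediction(learner = "classif.rpart", task = task)
p_toy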
I am trying to plot my neural network and I am wondering how can I round the weights to 3 digits.
library(neuralnet)
set.seed(0)
x = matrix(rnorm(100, 0, 5), ncol=4)
y = rnorm(25, 100, 20)
data = data.frame(y, x)
nn.model = neuralnet(y~., data, linear.output=T, stepmax = 1e+06)
plot(nn.model)
I've tried mapply(round), but it doesn't work on the nested lists that a neuralnet model generates. Any suggestion is appreciated!
Like this:
nn.model$weights[[1]] <- lapply(nn.model$weights[[1]], function(x) round(x, 3))
plot(nn.model)
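Note that, as far as I can tell, nn.model$weights is a list with one element per training repetition, so the [[1]] above rounds only the first. To round every repetition (a small extension of the same idea):
# round the weight matrices of all repetitions to 3 digits
nn.model$weights <- lapply(nn.model$weights,
                           function(rep) lapply(rep, round, 3))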
I am using the following code to generate data, and I am estimating regression models across a list of variables (covar1 and covar2). I have also created confidence intervals for the coefficients and merged them together.
I have been examining all sorts of examples here and on other sites, but I can't seem to accomplish what I want. I want to stack the results for each covar into a single data frame, labeling each cluster of results by the covar it is attributable to (i.e., "covar1" and "covar2"). Here is the code for generating data and results using lapply:
##creating a fake dataset (N=1000, 500 at treated, 500 at control group)
#outcome variable
outcome <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 70, sd = 10))
#running variable
running.var <- seq(0, 1, by = .0001)
running.var <- sample(running.var, size = 1000, replace = T)
##Put negative values for the running variable in the control group
running.var[1:500] <- -running.var[1:500]
#treatment indicator (just a binary variable indicating treated and control groups)
treat.ind <- c(rep(0,500), rep(1,500))
#create covariates
set.seed(123)
covar1 <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 50, sd = 20))
covar2 <- c(rnorm(500, mean = 10, sd = 20), rnorm(500, mean = 10, sd = 30))
data <- data.frame(cbind(outcome, running.var, treat.ind, covar1, covar2))
data$treat.ind <- as.factor(data$treat.ind)
#Bundle the covariates names together
covars <- c("covar1", "covar2")
#loop over them using a convenient feature of the "as.formula" function
models <- lapply(covars, function(x){
  regres <- lm(as.formula(paste(x, " ~ running.var + treat.ind", sep = "")), data = data)
  ci <- confint(regres, level = 0.95)
  regres_ci <- cbind(summary(regres)$coefficient, ci)
})
names(models) <- covars
print(models)
Any nudge in the right direction, or link to a post I just haven't come across, is greatly appreciated.
You can use do.call, where the second argument is a list (as here):
do.call(rbind, models)
I made a (possible) improvement to your lapply function. This way you can save the estimated parameters and the variable names in a data.frame:
models <- lapply(covars, function(x){
  regres <- lm(as.formula(paste(x, " ~ running.var + treat.ind", sep = "")), data = data)
  ci <- confint(regres, level = 0.95)
  regres_ci <- data.frame(covar = x,
                          param = rownames(summary(regres)$coefficient),
                          summary(regres)$coefficient, ci)
})
do.call(rbind,models)
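The stacked result keeps the (uniquified) coefficient rownames from each model; since those names are already stored in the param column, you may want to drop them (a minor follow-up, assuming the code above):
results <- do.call(rbind, models)
rownames(results) <- NULL  # coefficient names already live in the param column
results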
I am using RSNNS to make a model with the QuickProp algorithm. Here's my neural network:
library(RSNNS)
mydata1 <- read.csv("-1-5_rand1.csv")
mydata <- mydata1[1:151, ]
test_set <- mydata1[152:168, ]
test_set1 <- test_set[c(-7)]
a <- SnnsRObjectFactory()
input <- mydata[c(-7)]
output <- mydata[c(7)]
b <- splitForTrainingAndTest(input, output, ratio = 0.22)
a <- mlp(b$inputsTrain, b$targetsTrain, size = 9, maxit = 650, learnFunc = "Quickprop", learnFuncParams = c(0.01, 2.5, 0.0001, 0, 0), updateFunc = "Topological_Order",
updateFuncParams = c(0.0), hiddenActFunc = "Act_TanH", computeError=TRUE, initFunc = "Randomize_Weights", initFuncParams = c(-1,1),
shufflePatterns = TRUE, linOut = FALSE, inputsTest = b$inputsTest, targetsTest = b$targetsTest)
I am predicting using test set as:
predictions <- predict(a, test_set1)
Is it possible to in RSNNS to predict after every 50 cycles using test set instead of predicting after 650 cycles?
The answer is that you can't do it with the high-level interface, but you can with the low-level interface. Have a look, e.g., at the mlp_irisSnnsR.R demo that is included in RSNNS.
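If you want to stay with the high-level interface, a crude workaround is to refit from scratch with an increasing maxit and predict at each checkpoint. This is a sketch under the assumption that refitting with the same seed reproduces the same initial weights and pattern order, so the fit at maxit = k matches the state after k cycles:
# predict on the test set every 50 cycles by refitting with a larger maxit;
# wasteful, but avoids the low-level SnnsR interface
checkpoint_preds <- lapply(seq(50, 650, by = 50), function(cycles) {
  set.seed(42)  # same seed so each refit follows the same trajectory
  model <- mlp(b$inputsTrain, b$targetsTrain, size = 9, maxit = cycles,
               learnFunc = "Quickprop",
               learnFuncParams = c(0.01, 2.5, 0.0001, 0, 0),
               hiddenActFunc = "Act_TanH", linOut = FALSE)
  predict(model, test_set1)
})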