Loop in R not appending all the values

I have a test data set of 88 observations, and I have built a model that I am using to predict on this new data set.
Here is the twist: I am adding a new column to it and trying to store the predictions in a data frame.
The columns of my training and test data all match.
When I execute the loop below, I do not get the desired output.
#creating an EMI vector
em <- c(10000,20000,30000,40000,50000,60000,70000,80000,90000,100000)
#my dataframe where i want to store predictions
v <- c()
v <- data.frame(v)
for(i in em){
  newdata$EMI.Amount <- i
  prediction <- predict(rf, newdata, type="response")
  kl <- table(prediction)
  v <- rbind(v, kl)
}
I am only getting predictions for the last EMI value in the vector em, i.e. for 100,000.
I want the output for each value in em, i.e. the predictions for the binary class at each EMI amount, stored in the data frame.

You don't need to run a loop.
Instead I would simply run the following code:
newdata$EMI.Amount <- em
v <- predict(rf, newdata, type = "response")
I hope that solves your problem.

I am not so sure how your code works as it is not reproducible. Maybe you would like something like this?
#creating an EMI vector
em <- c(10000,20000,30000,40000,50000,60000,70000,80000,90000,100000)
#my dataframe where i want to store predictions
v <- c()
v <- data.frame(v)
newdata$EMI.Amount <- em
prediction <- predict(rf, newdata, type = "response")
kl <- table(prediction)
v <- rbind(v, kl)
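If you do want one row of results per EMI amount, here is a minimal sketch (not taken from the answers above, and assuming rf, newdata, and em are defined as in the question) that keeps the loop but collects each class-count table in a list and combines them at the end:
results <- vector(mode = "list", length = length(em))
for(k in seq_along(em)){
  newdata$EMI.Amount <- em[k]
  prediction <- predict(rf, newdata, type = "response")
  results[[k]] <- table(prediction)            #class counts for this EMI value
}
v <- as.data.frame(do.call(rbind, results))    #one row per EMI value
v$EMI.Amount <- em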

Related

How do I loop different percentages of missing values using MCAR?

Using the Cleveland data from the UCI data repository, I want to generate missing values in the data in order to apply some imputation techniques.
heart.ds <- read.csv(file.choose())
head(heart.ds)
attach(heart.ds)
sum(is.na(heart.ds))
str(heart.ds)
#Changing Appropriate Variables to Factors
heart.ds$sex<-as.factor(heart.ds$sex)
heart.ds$cp<-as.factor(heart.ds$cp)
heart.ds$fbs<-as.factor(heart.ds$fbs)
heart.ds$exang<-as.factor(heart.ds$exang)
heart.ds$restecg<-as.factor(heart.ds$restecg)
heart.ds$slope<-as.factor(heart.ds$slope)
heart.ds$thal<-as.factor(heart.ds$thal)
heart.ds$target<-as.factor(heart.ds$target)
str(heart.ds)
Now I want to generate missing values using the MCAR mechanism. Below is the loop code:
p = c(0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1)
hd_mcar = rep(0, length(heart.ds)) #to generate empty bins of 10 different percentages of missingness using the MCAR package
for(i in 1:length(p)){
  hd_mcar[i] <- delete_MCAR(heart.ds, p[i]) #to generate 10 different percentages of missingness using the MCAR package
}
The problem is that after running the above code I do not get the generated data back in their original form, i.e. as data frames with the original variables and rows.
But when I use only one missingness percentage, I get the expected result; below is the code for a single missingness percentage:
#Missing Completely at Random(MCAR)
hd_mcar <- delete_MCAR(heart.ds, 0.05)
sum(is.na(hd_mcar))
I need help solving this looping problem. Thank you.
Now I want to apply MICE and other imputation methods such as Hmisc, Amelia, mi, and missForest inside the loop, but I get the error "Error: Data should be a matrix or data frame".
The code below is for MICE only:
#1. Method (MICE)
mice_mcar[[i]] <- mice(hd_mcar, m=ip, method = c("pmm","logreg","polyreg","pmm","pmm","logreg",
                                                 "polyreg","pmm","logreg","pmm","polyreg","pmm",
                                                 "polyreg","logreg"), maxit = 20)
#Diagnostic check
summary(heart.ds$age)
mice_mcar$imp$age
#Finding the means of the imputations
app1 <- apply(mice_mcar$imp$age, MARGIN = 2, FUN = mean)
min1 <- abs(app1-mean(heart.ds$age))
#Selecting the minimum index
sm1 <- which(min1==min(min1))
#Selecting final imputation
final_clean_hd_mcar =mice::complete(mice_mcar,sm1)
mice.mcar = final_clean_hd_mcar
How do I go about making this fit into the loop so that it works properly?
Your problem was this line:
hd_mcar = rep(0, length(heart.ds)) #to generate empty bins of 10 different percentages of missingness using the MCAR package
You are creating a vector here rather than a list. You can't assign a data frame to an element of a vector without coercing it into something that is not a data frame. You want to do this:
p <- c(0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1)
hd_mcar <- vector(mode = "list", length = length(p))
for(i in 1:length(p)){
  hd_mcar[[i]] <- delete_MCAR(heart.ds, p[i]) #to generate 10 different percentages of missingness using the MCAR package
}
Note that because it's a list now, hd_mcar[[i]] uses the [[ rather than [ subscript.
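To address the second part of the question, here is a rough sketch (an assumption, not a tested solution) of how the imputation could go inside the same loop. The error "Data should be a matrix or data frame" suggests the whole list hd_mcar was passed to mice(), so the key change is to pass each element hd_mcar[[i]] instead and to store the fitted objects in another list:
library(mice)
mice_mcar <- vector(mode = "list", length = length(p))
for(i in 1:length(p)){
  hd_mcar[[i]]   <- delete_MCAR(heart.ds, p[i])
  #impute the i-th generated data set; m = 5 and maxit = 20 are placeholder settings
  mice_mcar[[i]] <- mice(hd_mcar[[i]], m = 5, maxit = 20, printFlag = FALSE)
}
#the i-th completed data set can then be extracted with mice::complete(mice_mcar[[i]])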

Is there a way to get the index of a list in R without match or which

I am trying to detect anomalies in the iris dataset by normalising the data into iris_norm, splitting that into a training and testing set, and then using the knn function to find anomalies. I can extract those anomalies from the normalised iris_test set, but not from the actual iris set. Is there a way for me to use the indexes of the values in 'actual' as the indexes in iris? Here is my code:
library(gmodels)
library(class)
library(tidyverse)
# STEP 1: Import your dataset, look at a summary
summary(iris)
# STEP 2: Generate a random number to split the dataset.
ran <- sample(1:nrow(iris), 0.9 * nrow(iris))
# The normalization function is created
nor <-function(x) {(x -min(x))/(max(x)-min(x))}
# Run normalisation on predictor columns
iris_norm <- as.data.frame(lapply(iris[,c(1,2,3,4)], nor))
##extract training set
iris_train <- iris_norm[ran,]
##extract testing set
iris_test <- iris_norm[-ran,]
# Extract 5th column of train dataset because it will be used as
#'cl' argument in knn function.
iris_target_category <- iris[ran,5]
##extract 5th column of test dataset to measure the accuracy
iris_test_category <- iris[-ran,5]
##run knn function
pr <- knn(iris_train,iris_test,cl=iris_target_category,k=15)
##create confusion matrix
tab <- table(pr,iris_test_category)
##this function divides the correct predictions by total number of predictions
#that tells us how accurate the model is.
accuracy <- function(x){sum(diag(x)/(sum(rowSums(x)))) * 100}
accuracy(tab)
#create a cross table to see where the wrong predictions are
mytab <- CrossTable(iris_test_category, pr, FALSE)
#anomaly indexes
anomalies_index <- which(iris_test_category != pr)
# get the anomaly values
anomaly_value1 <- iris_test[iris_test_category != pr, "Sepal.Length"]
anomaly_value2 <- iris_test[iris_test_category != pr, "Sepal.Width"]
anomaly_value3 <- iris_test[iris_test_category != pr, "Petal.Length"]
anomaly_value4 <- iris_test[iris_test_category != pr, "Petal.Width"]
anomalies <- data.frame(anomaly_value1, anomaly_value2,
                        anomaly_value3, anomaly_value4)
actual <- iris_test[anomalies_index,]
print(anomalies)
print(actual)
I found the solution a few minutes later; all I had to do was:
actual_index <- as.numeric(rownames(actual))
iris[actual_index,]
and I was able to extract the correct values
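This works because subsetting a data frame keeps the original row names of iris, so the row names of 'actual' map straight back to positions in the full data. An equivalent sketch (not from the original post) avoids row names entirely by reusing the same -ran split on the original data:
#iris_test was built as iris_norm[-ran,], so the matching rows of the original data are:
iris_original_test <- iris[-ran, ]
iris_original_test[anomalies_index, ]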

Calculate Errors using loop function in R

I have two data matrices with the same dimensions. I want to extract the same series of columns from each, treat each series as a vector, and then calculate different error measures, for example the mean absolute error (mae), mean percentage error (mape) and root mean square error (rmse). My data matrices are quite large, so I will explain with a small example and calculate these errors manually:
mat1 <- matrix(6:75, ncol=10, byrow=T)
mat2 <- matrix(30:99, ncol=10, byrow=T)
mat1_seri1 <- as.vector(mat1[,c(1+(0:4)*2)])
mat1_seri2 <- as.vector(mat1[,c(2+(0:4)*2)])
mat2_seri1 <- as.vector(mat2[,c(1+(0:4)*2)])
mat2_seri2 <- as.vector(mat2[,c(2+(0:4)*2)])
mae1 <- mean(abs(mat1_seri1-mat2_seri1))
mae2 <- mean(abs(mat1_seri2-mat2_seri2))
For mape
mape1<- mean(abs(mat1_seri1-mat2_seri1)/mat1_seri1)*100
mape2<- mean(abs(mat1_seri2-mat2_seri2)/mat1_seri2)*100
Similarly, I calculate rmse from its formula. Since my data matrices are large, doing this manually is quite time-consuming. Is it possible to do this with a loop that outputs the error terms (mae, mape, rmse) for each series separately?
I'm not sure if this is what you are looking for, but here is a function that could automate the process; maybe there is also a better way:
fn <- function(m1, m2) {
  stopifnot(dim(m1) == dim(m2))
  mat1_seri1 <- as.vector(m1[, (1:ncol(m1))[(1:ncol(m1))%%2 != 0]])
  mat1_seri2 <- as.vector(m1[, (1:ncol(m1))[!(1:ncol(m1))%%2]])
  mat2_seri1 <- as.vector(m2[, (1:ncol(m2))[(1:ncol(m2))%%2 != 0]])
  mat2_seri2 <- as.vector(m2[, (1:ncol(m2))[!(1:ncol(m2))%%2]])
  mae1 <- mean(abs(mat1_seri1-mat2_seri1))
  mae2 <- mean(abs(mat1_seri2-mat2_seri2))
  mape1 <- mean(abs(mat1_seri1-mat2_seri1)/mat1_seri1)*100
  mape2 <- mean(abs(mat1_seri2-mat2_seri2)/mat1_seri2)*100
  setNames(as.data.frame(matrix(c(mae1, mae2, mape1, mape2), ncol = 4)),
           c("mae1", "mae2", "mape1", "mape2"))
}
fn(mat1, mat2)
fn(mat1, mat2)
  mae1 mae2    mape1    mape2
1   24   24 92.62581 86.89572
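The question also asks for rmse; a small extension in the same style (an addition, not part of the original answer) could be dropped into fn() next to the mae and mape lines:
#root mean square error for a pair of series
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
#inside fn() this would become, for example:
#rmse1 <- rmse(mat1_seri1, mat2_seri1)
#rmse2 <- rmse(mat1_seri2, mat2_seri2)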

R: looped variable assignment, augmenting variable calculation each time

I am trying to calculate a regression variable based on a range of variables in my data set. I would like the regression variable (e.g. Threshold1) to be calculated from a different variable in each iteration of the regression.
The aim is to collect SSR values for each threshold range and thereby identify the ideal threshold based on the data.
Data (df) variables: Yield, Prec, Price, 0C, 1C, 2C, 3C, 4C, 5C, 6C, 7C, 8C, 9C, 10C
Each loop calculates "thresholds" by selecting a different "b" each time.
a <- df$0C
b <- df$1C
Threshold1 <- (a-b)
Threshold2 <- (b)
Where "b" would be changing in each loop, ranging from 1C to 9C.
Each individual threshold set (1 and 2) should be used to run a regression, and save the SSR for comparison with the subsequent regression utilizing thresholds based on a new "b" value (ranging from 1C TO 9C)
Regression:
reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
In each loop of the regression, I vary the components used to calculate the thresholds. My current approach is centered around the following code:
df <- read.csv("Data.csv",header=TRUE)
names(df)
varlist <- names(df)[9:19]   #the 0C-9C columns
ssr.vec <- matrix(,21,1)
for(i in 1:length(varlist)){
  a <- df$0C
  b <- df$[i]
  Threshold1 <- (a-b)
  Threshold2 <- (b)
  reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
  r2 <- summary(reg)$r.squared
  ssr.vec[i,] <- c(varlist,r2)
}
colnames(ssr.vec) <- c("varlist","r2")
I am failing to achieve the desired result with the above approach.
Thank you.
I can spot quite a few mistakes. You need to add the variables of interest (Threshold1 and Threshold2) to the data frame used in the regression. I also think you need to select varlist[i], not varlist, when filling ssr.vec. Your ssr.vec needs two columns, but you declared it with only one. You also cannot use something like df$[i] to extract a column. Why is the matrix of length 21? Finally, rename the columns to C0, ..., C9 rather than 0C, ..., 9C, since names starting with a digit cannot be used with the $ operator directly.
For future reference, solve the simple errors before asking a question, and include error messages in your post!
This should do the job:
df <- read.csv("Data.csv",header=TRUE)
names(df)[8:19] = paste0("C",0:10)
varlist <- names(df)[9:19]
ssr.vec <- matrix(,21,2)
for(i in 1:length(varlist)){
  a <- df$C0
  b <- df[,i+9]
  df$Threshold1 <- (a-b)
  df$Threshold2 <- (b)
  reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
  r2 <- summary(reg)$r.squared
  ssr.vec[i,] <- c(varlist[i],r2)
}
colnames(ssr.vec) <- c("varlist","r2")
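For reference, here is a more compact sketch of the same idea (a rewrite, not part of the original answer), assuming the columns have already been renamed to C0-C10 and that Yield, Price, and prec exist in df:
b_cols  <- paste0("C", 1:9)                      #the candidate "b" columns
results <- data.frame(b = b_cols, r2 = NA_real_)
for(k in seq_along(b_cols)){
  df$Threshold1 <- df$C0 - df[[b_cols[k]]]
  df$Threshold2 <- df[[b_cols[k]]]
  reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2), data=df)
  results$r2[k] <- summary(reg)$r.squared
}
results                                          #one r-squared per "b" column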

Random sample from given bivariate discrete distribution

Suppose I have a bivariate discrete distribution, i.e. a table of probability values P(X=i,Y=j), for i=1,...,n and j=1,...,m. How do I generate a random sample (X_k,Y_k), k=1,...,N from such a distribution? Maybe there is a ready-made R function like:
sample(100, prob=biprob)
where biprob is a 2-dimensional matrix?
One intuitive way to sample is the following. Suppose we have a data.frame
dt=data.frame(X=x,Y=y,P=pij)
where x and y come from
expand.grid(x=1:n,y=1:m)
and pij are the P(X=i,Y=j).
Then we get our sample (Xs,Ys) of size N, the following way:
set.seed(1000)
Xs <- sample(dt$X,size=N,prob=dt$P)
set.seed(1000)
Ys <- sample(dt$Y,size=N,prob=dt$P)
I use set.seed() to simulate the "bivariateness". Intuitively I should get something similar to what I need, but I am not sure this is the correct way. Hence the question :)
Another way would be Gibbs sampling, since the conditional distributions are easy to compute.
I tried googling, but nothing really relevant came up.
You are almost there. Assuming you have the data frame dt with the x, y, and pij values, just sample the rows!
dt <- expand.grid(X=1:3, Y=1:2)
dt$p <- runif(6)
dt$p <- dt$p / sum(dt$p) # get fake probabilities
idx <- sample(1:nrow(dt), size=8, replace=TRUE, prob=dt$p)
sampled.x <- dt$X[idx]
sampled.y <- dt$Y[idx]
It's not clear to me why you should care that it is bivariate. The probabilities sum to one and the outcomes are discrete, so you are just sampling from a categorical distribution. The only difference is that you are indexing the observations using rows and columns rather than a single position. This is just notation.
In R, you can therefore easily sample from your distribution by reshaping your data and sampling from a categorical distribution. Sampling from a categorical can be done using rmultinom and using which to select the index, or, as Aniko suggests, using sample to sample the rows of the reshaped data. Some bookkeeping can take care of your exact case.
Here's a solution:
library(reshape)
# Reshape data to long format.
data <- matrix(data = c(.25,.5,.1,.4), nrow=2, ncol=2)
pmatrix <- melt(data)
# Sample categorical n times.
rcat <- function(n, pmatrix) {
  rows <- which(rmultinom(n, 1, pmatrix$value) == 1, arr.ind=TRUE)[,'row']
  indices <- pmatrix[rows, c('X1','X2')]
  colnames(indices) <- c('i','j')
  rownames(indices) <- seq(1, nrow(indices))
  return(indices)
}
rcat(3, pmatrix)
This returns 3 random draws from your matrix, reporting the i and j of the rows and columns:
  i j
1 1 1
2 2 2
3 2 2
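As a quick sanity check (not part of the original answer), you can draw a larger sample with rcat() and compare the empirical cell frequencies with the normalised target probabilities:
draws     <- rcat(10000, pmatrix)
empirical <- table(draws$i, draws$j) / nrow(draws)           #observed cell frequencies
target    <- matrix(pmatrix$value / sum(pmatrix$value), nrow = 2)
round(empirical - target, 3)                                 #differences should be near zero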
