R: Error: new columns would leave holes after existing columns

When running this code, I get the following error:
Error in `[<-.data.frame`(`*tmp*`, , i, value = list(x = 0.0654882985934691, :
new columns would leave holes after existing columns
I am trying to populate a data.frame with j columns, one per iteration i of the for loop posted below. The result should look something like this (Excel example for convenience only):
The aim is to store the output of the loop in such a way that I can get the average of each column at a later stage.
What can be done to achieve this?
library(plyr)
library(forecast)
library(vars)
x <- rnorm(70)
y <- rnorm(70)
dx <- cbind(x,y)
dx <- as.ts(dx)
# Forecast Accuracy
j = 12 #Forecast horizon
k = nrow(dx)-j #length of minimum training set
prediction <- data.frame()
for (i in 1:j) {
  trainingset <- window(dx, end = k+i-1)
  testset <- window(dx, start = k+i, end = k+j)
  fit <- VAR(trainingset, p = 2)
  fcast <- forecast(fit, h = j-i+1)
  fcastmean <- do.call('cbind', fcast[['mean']])
  fcastmean <- as.data.frame(fcastmean)
  prediction[,i] <- rbind(fcastmean[,1])
}
Edit
As per the comment below, I have edited the above code to specify the first variable of fcastmean.
The error I get has however changed as a result, now being:
Error in `[<-.data.frame`(`*tmp*`, , i, value = c(-0.316529962287372, :
replacement has 1 row, data has 0
Edit 2
Below is the minimum replicable version without any packages as requested in the comments. I believe that should be equivalent in terms of the question posed.
x <- rnorm(70)
y <- rnorm(70)
dx <- cbind(x,y)
dx <- as.ts(dx)
j = 12
k = nrow(dx)-j
prediction <- matrix(NA,j,j)
for (i in 1:j) {
  fcast <- as.matrix(1:(j-i+1))
  fcastmean <- fcast
  prediction[,i] <- (fcastmean)
}

For your new example, try
sapply(1:j, function(i) `length<-`(1:(j-i+1), j))
The result is
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2 2 2 2 2 NA
[3,] 3 3 3 3 3 3 3 3 3 3 NA NA
[4,] 4 4 4 4 4 4 4 4 4 NA NA NA
[5,] 5 5 5 5 5 5 5 5 NA NA NA NA
[6,] 6 6 6 6 6 6 6 NA NA NA NA NA
[7,] 7 7 7 7 7 7 NA NA NA NA NA NA
[8,] 8 8 8 8 8 NA NA NA NA NA NA NA
[9,] 9 9 9 9 NA NA NA NA NA NA NA NA
[10,] 10 10 10 NA NA NA NA NA NA NA NA NA
[11,] 11 11 NA NA NA NA NA NA NA NA NA NA
[12,] 12 NA NA NA NA NA NA NA NA NA NA NA
`length<-`(x, j) pads x with NA until it reaches a length of j.
You can replace 1:(j-i+1) with whatever function of i you want. In the OP's original example, I am guessing something like this will work (untested):
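sapply(1:j, function(i){
  trainingset <- window(dx, end = k+i-1)
  # testset <- window(dx, start = k+i, end = k+j)
  # ^ this isn't actually used...
  fit <- VAR(trainingset, p = 2)
  fcast <- forecast(fit, h = j-i+1)
  # fcast$mean is a list with one series per variable; take the first
  # variable (as in the question's edit) and pad it with NA to length j
  `length<-`(as.numeric(fcast$mean[[1]]), j)
})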
function(i){...} is called an anonymous function and can be written like any other.
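Since the stated aim is to average each column later, here is a minimal base-R sketch of that last step, using the package-free example from Edit 2. The helper name pad_to is hypothetical and only wraps `length<-`; the NA padding is skipped by passing na.rm = TRUE to colMeans.

```r
j <- 12

# Hypothetical helper: pad a vector with NA up to length n
pad_to <- function(v, n) `length<-`(v, n)

# One column per iteration i, each holding j - i + 1 values plus NA padding
prediction <- sapply(1:j, function(i) pad_to(1:(j - i + 1), j))

# Column averages, ignoring the NA padding
col_means <- colMeans(prediction, na.rm = TRUE)
col_means
```

With real forecasts, the 1:(j - i + 1) placeholder would be replaced by the forecast vector for iteration i.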

Related

How do I loop correctly?

Here is the data below. I'm not sure which type of looping I should be using, but here is what I am looking to do: If, for row 1, there is a 6 present, then for column 7 we have "Yes", if there is no 6 present, then column 7 has "No". Ignore columns 8 & 9.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 6 1 1 6 1 NA NA NA
[2,] 5 5 5 5 5 5 NA NA NA
[3,] 1 1 6 1 1 6 NA NA NA
[4,] 5 5 5 5 5 5 NA NA NA
[5,] 6 1 1 6 1 1 NA NA NA
[6,] 5 5 5 5 5 5 NA NA NA
[7,] 1 6 1 1 6 1 NA NA NA
[8,] 5 5 5 5 5 5 NA NA NA
[9,] 1 1 6 1 1 6 NA NA NA
[10,] 5 5 5 5 5 5 NA NA NA
Here is the code that I have.
# b and n must be defined before they are used to size the matrix
b <- 10
n <- 6
data.matrix <- matrix(data=NA,nrow = b, ncol = n+3)
for (i in 1:b)
{
  data.matrix[,1:n] <- sample(6,n,replace=T)
}
Side Note: I keep getting this error
"the condition has length > 1 and only the first element will be used"
Here is a solution using apply:
a[,7] <- apply(a, 1, function(x) ifelse(max(x,na.rm = T) == 6,"YES","NO"))
where a is the input data.frame/tibble. As commented above, if you have matrix, then convert it to data.frame and perform this operation.
Here is a solution with apply and which:
res <- apply(data.matrix, 1, function(x) {
  x[[7]] <- length(which(x == 6)) > 0
  x
})
res <- t(res)
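A loop is not strictly needed here. The following is a minimal vectorised sketch, assuming the dice rolls sit in a numeric matrix; the names rolls, has_six and result are made up for illustration, and the seed is arbitrary. As a side note, the "the condition has length > 1" warning usually means a whole vector was passed to if(), which only accepts a single TRUE or FALSE; ifelse() is the vectorised counterpart.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
b <- 10
n <- 6

# b rows of n dice rolls each
rolls <- matrix(sample(6, b * n, replace = TRUE), nrow = b, ncol = n)

# TRUE for every row that contains at least one 6
has_six <- rowSums(rolls == 6) > 0

# Store the mixed numeric/character result in a data.frame,
# because a matrix can only hold one type
result <- data.frame(rolls, has_six = ifelse(has_six, "Yes", "No"))
result
```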

Populate elements of a matrix using values from a column

I am trying to populate a matrix using the values from a specific column (Dependent). In the example below in row 1 the Dependent value is 3 which will indicate a 1 in the 3rd column. Row 4 has a Dependent value of 2 so a 1 is put in column 2. I have considered using a for loop but was interested if there is a more elegant way of solving the problem.
Project  Dependent  1  2  3  4
1        3                1
2
3
4        2             1
5        4                   1
Thanks in advance!
For
Project <- 1:5
Dependent <- c(3, 0, 0, 2, 4)
df <- data.frame(Project, Dependent)
Create a matrix
m = matrix(nrow = max(df$Project), ncol = max(df$Dependent))
and populate it using a 2-column matrix of row and column vectors as indexes
m[as.matrix(df)] = 1
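To make the indexing step explicit, here is a small self-contained sketch of the same idea. It drops the rows with a Dependent of 0 by hand before indexing; R ignores rows of an index matrix that contain a zero anyway, so the filter is only there for readability. The name idx is introduced for illustration.

```r
Project <- 1:5
Dependent <- c(3, 0, 0, 2, 4)

m <- matrix(NA, nrow = max(Project), ncol = max(Dependent))

# Two-column index matrix: each row is a (row, column) pair to set
idx <- cbind(Project, Dependent)[Dependent > 0, , drop = FALSE]
m[idx] <- 1
m
```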
Here is what you described; hope it helps.
Project<-1:5
Dependent<-c(3,0,0,2,4)
df<-data.frame(Project,Dependent)
df
Project Dependent
1 1 3
2 2 0
3 3 0
4 4 2
5 5 4
s<-matrix(NA, nrow = nrow(df), ncol = nrow(df))
for(i in 1:length(df$Dependent)) {
  # only rows with a non-zero Dependent get a 1
  if (df$Dependent[i] > 0) s[i,df$Dependent[i]]<-1
}
s
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA 1 NA NA
[2,] NA NA NA NA NA
[3,] NA NA NA NA NA
[4,] NA 1 NA NA NA
[5,] NA NA NA 1 NA

Produce a triangular matrix of integers increasing by 1

I am trying to produce a matrix of variable dimensions of the form below (i.e. integers increasing by 1 at a time, with a lower triangle of NAs)
NA 1 2 3 4
NA NA 5 6 7
NA NA NA 8 9
NA NA NA NA 10
NA NA NA NA 11
I have used the below code
sample_vector <- c(1:(total_nodes^2))
sample_matrix <- matrix(sample_vector, nrow=total_nodes, byrow=FALSE)
sample_matrix[lower.tri(sample_matrix, diag = TRUE)] <- NA
However the matrix I get with this method is of the form:
NA 2 3 4 5
NA NA 8 9 10
NA NA NA 14 15
NA NA NA NA 20
NA NA NA NA 25
How about this
total_nodes <- 5
sample_matrix <- matrix(NA, nrow=total_nodes, ncol=total_nodes)
sample_matrix[lower.tri(sample_matrix)]<-1:sum(lower.tri(sample_matrix))
sample_matrix <- t(sample_matrix)
sample_matrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 1 2 3 4
# [2,] NA NA 5 6 7
# [3,] NA NA NA 8 9
# [4,] NA NA NA NA 10
# [5,] NA NA NA NA NA
I'm using the diag function to construct a matrix and upper.tri to turn it into a "target" as well as a logical indexing tool (note that this fills the upper triangle column by column rather than row by row):
upr5 <- upper.tri(diag(5))
upr5
upr5[upr5] <- 1:sum(upr5)
upr5[upr5==0] <- NA # would otherwise have been zeroes
upr5
[,1] [,2] [,3] [,4] [,5]
[1,] NA 1 2 4 7
[2,] NA NA 3 5 8
[3,] NA NA NA 6 9
[4,] NA NA NA NA 10
[5,] NA NA NA NA NA
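For a variable dimension, the transpose trick from the first answer can be wrapped in a small function. This is only a sketch, and the function name upper_tri_by_row is made up here: it fills the lower triangle column by column, then transposes, so the numbers run row by row across the upper triangle.

```r
# Hypothetical helper: n x n matrix with 1, 2, 3, ... filled row by row
# above the diagonal and NA everywhere else
upper_tri_by_row <- function(n) {
  m <- matrix(NA_integer_, nrow = n, ncol = n)
  m[lower.tri(m)] <- seq_len(n * (n - 1) / 2)  # column-wise fill of the lower triangle
  t(m)                                         # transposing turns it into a row-wise upper fill
}

res <- upper_tri_by_row(5)
res
```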

How to replace non diagonal elements of matrix by row?

I would like to replace non diagonal elements of matrix with a
sequence of numbers.
I managed to write this:
mat[outer(1:nrow(mat), 1:nrow(mat), function(i,j) j!=i)] <- 1:182
But it fills the number by column. I would not like to use the
transpose function as I have specific row name which I would like to
keep.
Example
So if I have a matrix m
m <- matrix(NA, nrow=5, ncol=5, dimnames=list(letters[1:5], NULL))
m
# [,1] [,2] [,3] [,4] [,5]
# a NA NA NA NA NA
# b NA NA NA NA NA
# c NA NA NA NA NA
# d NA NA NA NA NA
# e NA NA NA NA NA
How can I add a sequence to the non-diagonals while keeping the rownames of the original matrix: expected output
# [,1] [,2] [,3] [,4] [,5]
# a NA 1 2 3 4
# b 5 NA 6 7 8
# c 9 10 NA 11 12
# d 13 14 15 NA 16
# e 17 18 19 20 NA
We can try
mat[lower.tri(mat, diag=FALSE)|upper.tri(mat, diag=FALSE)] <- 1:182
Or
mat[!diag(ncol(mat))] <- 1:182
Using a small example in OP's post
m[!diag(ncol(m))] <- 1:20
out <- t(m)
dimnames(out) <- rev(dimnames(out))
Used rev from @user20650's comments.
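If transposing is to be avoided altogether, one possible alternative is to index with explicit (row, column) pairs sorted by row. This is a sketch rather than the method used above; the name idx is introduced for illustration. The row names stay untouched because the matrix is only assigned into, never reshaped.

```r
m <- matrix(NA, nrow = 5, ncol = 5, dimnames = list(letters[1:5], NULL))

# All off-diagonal positions as (row, col) pairs
idx <- which(row(m) != col(m), arr.ind = TRUE)

# Sort the pairs by row, then by column, so the sequence runs along rows
idx <- idx[order(idx[, 1], idx[, 2]), ]

m[idx] <- 1:20
m
```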

randomly delete up to 3 elements per row

I would like to randomly delete up to three elements per row of a data set containing five columns. Below is R code I thought would do it, but it allows up to all five elements in a row to be deleted. This seems basic, but I cannot find the error. Thank you for any advice.
set.seed(1234)
# create matrix to contain flags identifying elements to be deleted
delete.these <- matrix(0, nrow=10, ncol=5)
for(i in 1:nrow(delete.these)) {
  # for each row randomly select the order of the columns
  # to be tested for deletion
  rcols <- sample(5, 5, replace = FALSE)
  for(j in 1:ncol(delete.these)) {
    # select a random draw
    delete.it <- runif(1,0,1)
    # if random draw is below specified threshold and fewer than three
    # elements have already been deleted from the row then delete element
    if((delete.it <= 0.7) & sum(delete.these[i,1:5] <= 2)) { delete.these[i,rcols[j]] = 1}
    if((delete.it > 0.7) | sum(delete.these[i,1:5] >= 3)) { delete.these[i,rcols[j]] = 0}
  }
}
delete.these
Instead of using runif() try drawing the indices directly
delete.these <- matrix(0, nrow=10, ncol=5)
for (i in 1:NROW(delete.these)){
  delete.these[i,sample.int(5,sample.int(4,1)-1)] <- 1
}
delete.these
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 0 0
[2,] 0 0 0 0 0
[3,] 0 1 0 1 1
[4,] 0 1 1 0 1
[5,] 1 0 1 0 0
[6,] 0 0 0 0 0
[7,] 1 0 1 0 0
[8,] 0 1 0 1 1
[9,] 0 1 1 0 0
[10,] 1 0 1 0 1
By the way, your code doesn't work because of a misplaced parenthesis.
sum(delete.these[i,1:5] <= 2)
should be instead
sum(delete.these[i,1:5]) <= 2
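For reference, here is a sketch of the original loop with the parenthesis moved. It assumes the intent is to flag at most three elements per row; the second if from the question is dropped because a cell that is never flagged simply stays 0. The variable max_per_row is added only to check the cap.

```r
set.seed(1234)
delete.these <- matrix(0, nrow = 10, ncol = 5)

for (i in seq_len(nrow(delete.these))) {
  # random order in which the columns of this row are tested
  rcols <- sample(5, 5, replace = FALSE)
  for (j in seq_len(ncol(delete.these))) {
    delete.it <- runif(1)
    # flag the element only while fewer than three are flagged in this row
    if (delete.it <= 0.7 && sum(delete.these[i, ]) <= 2) {
      delete.these[i, rcols[j]] <- 1
    }
  }
}

max_per_row <- max(rowSums(delete.these))
max_per_row
```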
It would be easier (and much faster) to delete with a two column-matrix as an argument to [<-. You did not propose a test case but I will:
dfrm <- data.frame(a1=rnorm(20), a2=rnorm(20),a3=rnorm(20),
a4=rnorm(20),a5=rnorm(20))
dfrm[ matrix( c( rep(1:20,each=3),
replicate(20, {sample(5, 3)} ) ), ncol=2) ] <- NA
> dfrm
a1 a2 a3 a4 a5
1 NA 0.70871541 NA NA -0.6922827
2 1.9846227 1.70592512 NA NA NA
3 0.2684487 NA 0.0008968694 NA NA
4 NA NA 0.5546355410 0.07399188 NA
5 NA 0.82324761 -0.0410918599 NA NA
6 NA NA -1.0715205164 NA -0.1683819
7 0.0933059 NA NA NA 1.3129301
8 NA 0.79382695 0.1877369725 NA NA
9 0.3124101 NA NA -1.22087347 NA
10 -0.1657043 NA NA 1.36626832 NA
11 NA -0.06095247 -0.9622792102 NA NA
12 NA -1.29243386 -1.2133819819 NA NA
13 -0.0886702 NA NA 0.37495775 NA
14 1.0812527 -1.54215156 NA NA NA
15 NA -0.24765627 NA 0.81374405 NA
16 NA 0.21307051 NA NA -0.6825013
17 -0.4129100 NA NA NA -0.9844177
18 NA 1.95881167 0.7977172969 NA NA
19 NA NA 0.0953287645 NA 1.7067591
20 NA NA -0.1057690912 0.73408897 NA
This is assuming that by "delete" you meant set to missing. If the intent were something else you will need to supply a test case and clarify.
This nested sampling strategy will provide a variable number of rows (one to three) in the indexing matrix per row of the target matrix. lapply is used instead of sapply so that the result is always a list, even if every row happens to draw the same number of columns:
idx <- lapply(1:20, function(x) {n<- sample(1:5, sample(1:3,1))
matrix( c(rep(x,length(n)), n), ncol=2) }) # list
idx <- do.call(rbind, idx) # now a 2 col matrix
dfrm[ idx] <- NA
> dfrm
a1 a2 a3 a4 a5
1 -0.048776740 NA 1.1879195 -0.23142932 -3.6185891
2 NA 0.4613289 -0.4532400 -0.85891682 -2.2034714
3 NA NA 1.1191833 1.12545821 NA
4 0.646399767 -0.7126735 2.9474470 0.36358070 NA
5 -0.630929314 1.3770828 NA NA 1.3987857
6 NA NA NA 1.06680025 0.4445383
7 0.484728630 NA 0.7382064 NA 0.9838159
8 -1.558031074 1.1630888 NA NA NA
9 -0.968887379 -0.7330051 NA 0.04621124 -0.9785049
10 0.935436533 NA NA -1.07365274 NA
11 NA 0.2529093 NA -1.38643245 -1.3389529
12 NA -0.2639166 -0.2301257 NA NA
13 2.026646586 -0.2452684 NA -0.30346521 NA
14 0.522717033 NA NA 1.25870278 NA
15 NA NA -0.9934046 -0.89009964 -0.8403772
16 NA NA 0.0987765 -0.98608109 1.4646301
17 NA 0.7693064 -0.9326388 -0.16240266 NA
18 -0.005393965 NA NA NA -0.8111057
19 NA 1.6241122 -1.1376916 0.15812435 NA
20 NA NA NA 0.71059666 0.5170046
