Reading England and Wales Charity Commission bcp files in R


I'm trying to read the .bcp files provided at https://register-of-charities.charitycommission.gov.uk/register/full-register-download in R. I have tried the approaches from previously answered questions here, but readChar does not read everything in every file; in particular, it breaks on extract_charity.bcp.
So I turned to readBin and tried to read extract_charity.bcp like this:
library(stringr)
b <- readBin("extract_charity.bcp", "character", n = 300000, size = NA_integer_,
endian = .Platform$endian)
c<- paste0(b, collapse = "" ) #put it back as one large character string
d<- str_locate_all(c, "\\*\\#\\#\\*\\d") #find row breaks followed by a digit
e <- d[[1]]
flags <- e[,1]
f <- c()
f[1] <- substr(c, 1, flags[1]-1)
for (i in 2:length(flags)) f[i]<- substr(c, flags[i-1]+4, flags[i]-1) #removes row breaks
export <- matrix(nrow = 372432, ncol = 18)
exportF <- matrix(nrow = 0, ncol = 18)
for (j in 1:length(flags)) {
new_row <- str_split( f[j], "\\#\\*\\*\\#" )[[1]] #removes column breaks
if (length(new_row)==18) { export[j, ] <- new_row #if correct number of columns
} else { print(j)
exportF <- rbind(exportF, new_row) }}
However, there are 49 errors, all of the same type: a strange character string is inserted at various places across the table. Currently it is "P`j[Ÿ ", but when I run the script again it is "°Tj[Ÿ ". Because the string is different on every run, I cannot hard-code it in the script, and trying to remove it with a replacement fails:
str_replace_all(c, problem, "")
Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
Missing closing bracket on a bracket expression. (U_REGEX_MISSING_CLOSE_BRACKET)
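
The error message suggests that the stray string is being interpreted as a regular expression, and the "[" in it opens a bracket expression that is never closed. A minimal sketch of literal (non-regex) replacement in base R follows; the sample text and the stray string here are made up for illustration, not taken from the real file:

```r
# Made-up text containing a stray string with a regex metacharacter ("[")
txt <- "Alpha TrustP`j[ of England"
bad <- "P`j["

# fixed = TRUE makes gsub treat `bad` as a literal string, not a regex
clean <- gsub(bad, "", txt, fixed = TRUE)
print(clean)
```

With stringr, wrapping the pattern in fixed(), as in str_replace_all(txt, fixed(bad), ""), should have the same effect.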

Just to let the world know, this can be done.
In the first pass, the file is parsed and the rows that fail to split into 18 columns are stored in exportF. The offending string is identified from those rows and deleted from the raw input. In the second pass, the cleaned input parses correctly.
This is a mess, but it works, and it is pretty fast, too.
library(stringr)
library(stringi)
b <- readBin("extract_charity.bcp", "character", n = 300000, size = NA_integer_)
c<- paste0(b, collapse = "" )
tt<- str_locate_all(c, "\\*\\#\\#\\*\\d")
e <- tt[[1]]
flags <- e[,1]
f <- c()
f[1] <- substr(c, 1, flags[1]-1)
for (i in 2:length(flags)) {
f[i]<- substr(c, flags[i-1]+4, flags[i]-1)
}
export <- matrix(nrow = length(flags), ncol = 18)
exportF <- matrix(nrow = 0, ncol = 18)
for (j in 1:length(flags)) {
new_row <- str_split( f[j], "\\#\\*\\*\\#" )[[1]]
if (length(new_row)==18) { export[j, ] <- new_row
} else {print(flags[j])
exportF <- rbind(exportF, new_row) }}
# go through the first failed row and extract the offending string from it
problem <- str_sub(as.character(exportF[1,8]), 5, 10)
# check that the string really occurs in the failed row and in the raw input
# (fixed() makes stringr match it literally rather than as a regex)
str_detect(exportF[1,8], fixed(problem))
str_detect(c, fixed(problem))
# remove it from every chunk; fixed = TRUE avoids regex errors such as
# "Missing closing bracket" when the string contains characters like "["
r <- gsub(problem, "", b, fixed = TRUE)
any(str_detect(r, fixed(problem)))  # should be FALSE
#now go again but with clean data
r<- paste0(r, collapse = "" )
tt<- str_locate_all(r, "\\*\\#\\#\\*\\d")
e <- tt[[1]]
flags <- e[,1]
f <- c()
f[1] <- substr(r, 1, flags[1]-1)
for (i in 2:length(flags)) {
f[i]<- substr(r, flags[i-1]+4, flags[i]-1)
}
#g<- str_split(f[372432], "\\#\\*\\*\\#")[[1]]
# size the result by the number of row breaks found, as in the first pass
export <- matrix(nrow = length(flags), ncol = 18)
exportF <- matrix(nrow = 0, ncol = 18)
for (j in 1:length(flags)) {
new_row <- str_split( f[j], "\\#\\*\\*\\#" )[[1]]
if (length(new_row)==18) { export[j, ] <- new_row
} else {print(flags[j])
exportF <- rbind(exportF, new_row) }}
write.csv(export, "extract_charity2021.csv", row.names = F)
Leaving it here for either my future self or anyone else who needs to do this.
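
The splitting step above can also be sketched without explicit loops. The example below is only an illustration under simplifying assumptions: the input string is invented, it has 3 columns instead of 18, and it splits on every row break rather than only on row breaks followed by a digit, so it is not a drop-in replacement for the script above.

```r
# Invented sample using the same row break (*##*) and column break (#**#)
raw <- "1#**#Alpha Trust#**#2001*##*2#**#Beta Fund#**#1999*##*"

# fixed = TRUE treats the delimiters literally, so no regex escaping is needed
rows   <- strsplit(raw, "*##*", fixed = TRUE)[[1]]
fields <- strsplit(rows, "#**#", fixed = TRUE)

# keep only rows with the expected number of columns (3 in this toy example)
ok  <- lengths(fields) == 3
tab <- do.call(rbind, fields[ok])
print(tab)
```

Rows where ok is FALSE play the role of exportF in the script above and can be inspected separately.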

Related

rowSums - 'x' must be an array of at least two dimensions

This is really hard to explain but basically I have a dataset where people completed a wordsearch task. By using the following code I indexed the letters of the wordsearch by finding their numbers in the descriptions. I put them into a matrix so that I can use them to index from the dataset later:
number <- "#101"
wordsearch <- matrix(rep(0, times = 16 * ncol(data)), nrow = 16, ncol = ncol(data))
for (i in 1:9){
for (j in 1:ncol(data)){
wordsearch[i,j] <- grepl(number, data[1, j], fixed = T)
}
number <- paste("#10", (i+1), sep = "")
}
number <- "#110"
for (i in 10:15) {
for (j in 34:217){
wordsearch[i,j] <- grepl(number, data[1, j], fixed = T)
}
number <- paste("#1", (i+1), sep = "")
}
number <- "#3"
for (j in 1:ncol(data)){
wordsearch[16,j] <- grepl(number, data[1, j], fixed = T)
}
This part works perfectly. Next, I want to sum the number of letters people found for each word, create a new column for each word, and add those columns to the dataset. At first I got the error 'x must be numeric', so I ran data[is.na(data)] <- 0
Then I ran the following code:
col <- seq(261, by = 1, length.out = 16)
for (i in 1:16){
d2[, col[i]] <- rowSums(d2[, wordsearch[i, ] == 1])
}
I just did exactly this with another dataset and it worked fine, but now I'm getting "'x' must be an array of at least two dimensions". Can someone help?
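
One common cause of this error, which may or may not be what is happening here, is a selection that matches exactly one column: the subset is then simplified to a plain vector, and rowSums no longer has two dimensions to work with. A minimal sketch with a made-up data frame, using drop = FALSE to keep the subset two-dimensional:

```r
# Made-up data; `keep` stands in for wordsearch[i, ] == 1 matching one column
d2   <- data.frame(a = 1:3, b = 4:6, c = 7:9)
keep <- c(FALSE, TRUE, FALSE)

one_col <- d2[, keep]          # simplified to a plain vector
print(is.null(dim(one_col)))   # no dimensions left

# rowSums(one_col) would fail; drop = FALSE keeps a one-column data frame
sums <- rowSums(d2[, keep, drop = FALSE])
print(sums)
```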

homals package for Nonlinear PCA in R: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent

I am trying to implement NLPCA (Nonlinear PCA) on a data set using the homals package in R but I keep on getting the following error message:
Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent
The data set I use can be found in the UCI ML Repository and it's called dat when imported in R: https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29
Here is my code (some code is provided once the data set is downloaded):
nlpcasouthgerman <- homals(dat, rank=1, level=c('nominal','numerical',rep('nominal',2),
'numerical','nominal',
rep('ordinal',2), rep('nominal',2),
'ordinal','nominal','numerical',
rep('nominal',2), 'ordinal',
'nominal','ordinal',rep('nominal',3)),
active=c(FALSE, rep(TRUE, 20)), ndim=3, verbose=1)
I am trying to predict the first attribute, therefore I set it to be active=FALSE.
The output looks like this (skipped all iteration messages):
Iteration: 1 Loss Value: 0.000047
Iteration: 2 Loss Value: 0.000044
...
Iteration: 37 Loss Value: 0.000043
Iteration: 38 Loss Value: 0.000043
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
I don't understand why this error comes up. I have used the same code on some other data set and it worked fine so I don't see why this error persists. Any suggestions about what might be going wrong and how I could fix this issue?
Thanks!

The error seems to come from code in the homals function that generates NAs. For your data this happens with the number_credits levels: the line sort(as.numeric((rownames(clist[[i]])))) converts the level names to numbers, and the check that follows is meant to catch any resulting NA. It fails here because one of the levels does convert to a number, and sort() drops NA values by default, so the result is shorter than the set of row names instead of containing an NA.
So you either have to modify the homals function to handle this edge case or change the problematic factor levels. This might be worth filing as a bug report with the package maintainer.
As a work-around in your case you could do something like:
levels(dat$number_credits)[1] <- "_1"
and the function should run without problems.
Edit:
I think one solution would be to change one line of code in the homals function, but there is no guarantee that this works as intended. It is better to submit a bug report to the package author/maintainer; see https://cran.r-project.org/web/packages/homals/ for the address.
Using rnames <- as.numeric(rownames(clist[[i]]))[order(as.numeric(rownames(clist[[i]])))] instead of rnames <- sort(as.numeric((rownames(clist[[i]])))) keeps the NAs in the result (sort() drops them by default), which allows the code that follows to detect them. I am not sure why the author did not try to preserve the factor levels outright.
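
The difference between the two expressions can be seen on a small vector. The level names below are invented for illustration and are not the actual number_credits levels:

```r
# Invented level names: two convert to numbers, two do not
lev  <- c("10", "2", "4-5", ">= 6")
nums <- suppressWarnings(as.numeric(lev))   # non-numeric names become NA

sorted  <- sort(nums)          # NAs are silently dropped
ordered <- nums[order(nums)]   # NAs are kept, placed last

print(sorted)
print(ordered)
print(anyNA(sorted))    # the NA check cannot fire on this version
print(anyNA(ordered))   # the NA check fires on this version
```

Because the sorted version is shorter than the original set of row names, assigning it as row names would trigger the dimnames length error.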
Anyway, you could run a modified function in your local environment, which requires explicitly calling the internal (non-exported) homals functions, as shown below. It is not necessarily the best approach, but it would help you out in a pinch.
homals <- function (data, ndim = 2, rank = ndim, level = "nominal", sets = 0,
active = TRUE, eps = 0.000001, itermax = 1000, verbose = 0) {
dframe <- data
name <- deparse(substitute(dframe))
nobj <- nrow(dframe)
nvar <- ncol(dframe)
vname <- names(dframe)
rname <- rownames(dframe)
for (j in 1:nvar) {
dframe[, j] <- as.factor(dframe[, j])
levfreq <- table(dframe[, j])
if (any(levfreq == 0)) {
newlev <- levels(dframe[, j])[-which(levfreq == 0)]
}
else {
newlev <- levels(dframe[, j])
}
dframe[, j] <- factor(dframe[, j], levels = sort(newlev))
}
varcheck <- apply(dframe, 2, function(tl) length(table(tl)))
if (any(varcheck == 1))
stop("Variable with only 1 value detected! Can't proceed with estimation!")
active <- homals:::checkPars(active, nvar)
rank <- homals:::checkPars(rank, nvar)
level <- homals:::checkPars(level, nvar)
if (length(sets) == 1)
sets <- lapply(1:nvar, "c")
if (!all(sort(unlist(sets)) == (1:nvar))) {
print(cat("sets union", sort(unlist(sets)), "\n"))
stop("inappropriate set structure !")
}
nset <- length(sets)
mis <- rep(0, nobj)
for (l in 1:nset) {
lset <- sets[[l]]
if (all(!active[lset]))
next
jset <- lset[which(active[lset])]
for (i in 1:nobj) {
if (any(is.na(dframe[i, jset])))
dframe[i, jset] <- NA
else mis[i] <- mis[i] + 1
}
}
for (j in 1:nvar) {
k <- length(levels(dframe[, j]))
if (rank[j] > min(ndim, k - 1))
rank[j] <- min(ndim, k - 1)
}
x <- cbind(homals:::orthogonalPolynomials(mis, 1:nobj, ndim))
x <- homals:::normX(homals:::centerX(x, mis), mis)$q
y <- lapply(1:nvar, function(j) homals:::computeY(dframe[, j], x))
sold <- homals:::totalLoss(dframe, x, y, active, rank, level, sets)
iter <- pops <- 0
repeat {
iter <- iter + 1
y <- homals:::updateY(dframe, x, y, active, rank, level, sets,
verbose = verbose)
smid <- homals:::totalLoss(dframe, x, y, active, rank, level,
sets)/(nobj * nvar * ndim)
ssum <- homals:::totalSum(dframe, x, y, active, rank, level, sets)
qv <- homals:::normX(homals:::centerX((1/mis) * ssum, mis), mis)
z <- qv$q
snew <- homals:::totalLoss(dframe, z, y, active, rank, level,
sets)/(nobj * nvar * ndim)
if (verbose > 0)
cat("Iteration:", formatC(iter, digits = 3, width = 3),
"Loss Value: ", formatC(c(smid), digits = 6,
width = 6, format = "f"), "\n")
r <- abs(qv$r)/2
ops <- sum(r)
aps <- sum(La.svd(crossprod(x, mis * z), 0, 0)$d)/ndim
if (iter == itermax) {
stop("maximum number of iterations reached")
}
if (smid > sold) {
warning(cat("Loss function increases in iteration ",
iter, "\n"))
}
if ((ops - pops) < eps)
break
else {
x <- z
pops <- ops
sold <- smid
}
}
ylist <- alist <- clist <- ulist <- NULL
for (j in 1:nvar) {
gg <- dframe[, j]
c <- homals:::computeY(gg, z)
d <- as.vector(table(gg))
lst <- homals:::restrictY(d, c, rank[j], level[j])
y <- lst$y
a <- lst$a
u <- lst$z
ylist <- c(ylist, list(y))
alist <- c(alist, list(a))
clist <- c(clist, list(c))
ulist <- c(ulist, list(u))
}
dimlab <- paste("D", 1:ndim, sep = "")
for (i in 1:nvar) {
if (ndim == 1) {
ylist[[i]] <- cbind(ylist[[i]])
ulist[[i]] <- cbind(ulist[[i]])
clist[[i]] <- cbind(clist[[i]])
}
options(warn = -1)
# Here is the line that I changed in the code:
# rnames <- sort(as.numeric((rownames(clist[[i]]))))
rnames <- as.numeric(rownames(clist[[i]]))[order(as.numeric(rownames(clist[[i]])))]
options(warn = 0)
if ((any(is.na(rnames))) || (length(rnames) == 0))
rnames <- rownames(clist[[i]])
if (!is.matrix(ulist[[i]]))
ulist[[i]] <- as.matrix(ulist[[i]])
rownames(ylist[[i]]) <- rownames(ulist[[i]]) <- rownames(clist[[i]]) <- rnames
rownames(alist[[i]]) <- paste(1:dim(alist[[i]])[1])
colnames(clist[[i]]) <- colnames(ylist[[i]]) <- colnames(alist[[i]]) <- dimlab
colnames(ulist[[i]]) <- paste(1:dim(as.matrix(ulist[[i]]))[2])
}
names(ylist) <- names(ulist) <- names(clist) <- names(alist) <- colnames(dframe)
rownames(z) <- rownames(dframe)
colnames(z) <- dimlab
dummymat <- as.matrix(homals:::expandFrame(dframe, zero = FALSE, clean = FALSE))
dummymat01 <- dummymat
dummymat[dummymat == 2] <- NA
dummymat[dummymat == 0] <- Inf
scoremat <- array(NA, dim = c(dim(dframe), ndim), dimnames = list(rownames(dframe),
colnames(dframe), paste("dim", 1:ndim, sep = "")))
for (i in 1:ndim) {
catscores.d1 <- do.call(rbind, ylist)[, i]
dummy.scores <- t(t(dummymat) * catscores.d1)
freqlist <- apply(dframe, 2, function(dtab) as.list(table(dtab)))
cat.ind <- sequence(sapply(freqlist, length))
scoremat[, , i] <- t(apply(dummy.scores, 1, function(ds) {
ind.infel <- which(ds == Inf)
ind.minfel <- which(ds == -Inf)
ind.nan <- which(is.nan(ds))
ind.nael <- which((is.na(ds) + (cat.ind != 1)) ==
2)
ds[-c(ind.infel, ind.minfel, ind.nael, ind.nan)]
}))
}
disc.mat <- apply(scoremat, 3, function(xx) {
apply(xx, 2, function(cols) {
(sum(cols^2, na.rm = TRUE))/nobj
})
})
result <- list(datname = name, catscores = ylist, scoremat = scoremat,
objscores = z, cat.centroids = clist, ind.mat = dummymat01,
loadings = alist, low.rank = ulist, discrim = disc.mat,
ndim = ndim, niter = iter, level = level, eigenvalues = r,
loss = smid, rank.vec = rank, active = active, dframe = dframe,
call = match.call())
class(result) <- "homals"
result
}

R script (including multiple functions and loop) is very slow on a specific part

The code below takes ages to run. I have already tried several things, such as limiting the number of loops, using ifelse statements, and declaring as much as possible outside the loop. However, it still takes a very long time.
What would be a good way to restructure this part of my code to improve its processing speed?
Are there some things I'm not seeing?
z <- 625
numDays <- 365
k <- numDays * 96
#To estimate the size of the list
df <- rmarkovchain(n=365, object = mcList, t0= "home", include.t0 = TRUE)
allTheCars <- rep(list(df), z)
#an example of df below:
Locations <- c("Home", "Bakery", "Grocery", "Home-Bakery", "Home-Grocery", "Bakery-Home", "Bakery-Grocery", "Grocery-Home", "Grocery-Bakery")
Iteration <- rep(seq(1:96), 365)
df <- data.frame(Iteration, sample(Locations, k, replace = TRUE))
#The loop takes a huge amount of time
for(y in 1:z){
df <- rmarkovchain(n=365, object = mcList, t0= "Home", include.t0 = TRUE)
df$Begin <- 0
df[1,3] <- b
df$Still <- ifelse(df$values == "Home", 1, 0)
df$KM <- vlookup(df$values, averageDistance, lookup_column = 1, result_column = 2)
df$Load <- ifelse(df$Still == 1, cp, 0)
df$costDistance <- df$KM * 0.21
df$End <- 0
df[is.na(df)] <- 0
df$reduce <- rep(seq(1:97), numDays)
df <- df %>% filter(reduce != 97)
df$Load <- ifelse(df$reduce <= 69 | df$reduce >= 87, df$Load, 0)
for(i in 1:k) {
df[i,3] <- ifelse(df[i,3] < b, pmin(df[i,3] + df[i,6], b), df[i,3])
df[i,8] <- df[i,3] - df[i,7]
j <- i + 1
df[j,3] <- df[i,8]
}
allTheCars[[y]] <- df  # store into the list preallocated above
}
EDIT:
After Minem's suggestion to look at profvis, I found that the second for-loop takes by far the most time. It now looks like this:
for(i in 1:k) {
mainVector <- df[i,3]
extra <- df[i,6]
subtractingVector <- df[i,7]
mainVector <- ifelse(mainVector < b, pmin(mainVector + extra, b), mainVector )
newMain <- mainVector - subtractingVector
j <- i + 1
df[j,3] <- newMain
}
Extracting the first three values takes some time, and the last line, which writes the calculated value back into the data frame, costs the most time. Is there any way to improve on this?
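
Since writing single cells into a data frame is the expensive step, one possible approach is to run the recurrence on plain numeric vectors and copy the results back once at the end. The sketch below is an assumption-laden illustration rather than a tested replacement: b, load_vec and cost_vec are made-up stand-ins for the upper bound, df[, 6] (Load) and df[, 7] (costDistance).

```r
# Made-up stand-ins for the question's objects
b <- 100                        # upper bound for the running value
k <- 1000                       # number of time steps
set.seed(1)
load_vec <- runif(k, 0, 5)      # stands in for df[, 6]
cost_vec <- runif(k, 0, 5)      # stands in for df[, 7]

begin   <- numeric(k + 1)       # stands in for df[, 3]
end_vec <- numeric(k)           # stands in for df[, 8]
begin[1] <- b

for (i in seq_len(k)) {
  cur <- begin[i]
  if (cur < b) cur <- min(cur + load_vec[i], b)   # top up, capped at b
  begin[i]     <- cur
  end_vec[i]   <- cur - cost_vec[i]               # subtract the cost
  begin[i + 1] <- end_vec[i]                      # carry over to the next step
}
```

The results could then be written back in one step, for example df$Begin <- begin[seq_len(k)] and df$End <- end_vec, instead of updating the data frame k times.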
Edit 2:
Reproducible example for all
library(dplyr)
library(markovchain)
library(expss)
matrixExample <- matrix(sample(runif(81, min = 0 , max =1), replace = FALSE ), nrow = 9, ncol = 9)
mcListLoop <- rep(list(matrixExample), 96)
mcList <- new("markovchainList", markovchains = mcListLoop)
distance <- runif(9, min = 5, max =10)
Locations <- c("Home", "Bakery", "Grocery", "Home-Bakery", "Home-Grocery", "Bakery-Home", "Bakery-Grocery", "Grocery-Home", "Grocery-Bakery")
averageDistance <- data.frame(cbind(distance, Locations))

Error in strsplit(word, NULL) : non-character argument with spell checker

I am trying to write a spelling checker in R that corrects spelling mistakes in a word or a document.
Correcting a single word with this R code works very well:
> Correct("speling", dtm = counts)
$l4
[1] "spelling"
But when I try to correct a document, I get this error:
> CorrectDocument("the quick bruwn fowx jumpt ovre tha lasy dog", dtm = counts)
Error in strsplit(word, NULL) : non-character argument
# This is a text processing function, which I
# borrowed from a CMU Data mining course professor.
strip.text <- function(txt) {
# remove apostrophes (so "don't" -> "dont", "Jane's" -> "Janes", etc.)
txt <- gsub("'","",txt)
# convert to lowercase
txt <- tolower(txt)
# change other non-alphanumeric characters to spaces
txt <- gsub("[^a-z0-9]"," ",txt)
# replace digits with spaces
txt <- gsub("[0-9]+"," ",txt)
# split and make one vector
txt <- unlist(strsplit(txt," "))
# remove empty words
txt <- txt[txt != ""]
return(txt)
}
# Words within 1 transposition.
Transpositions <- function(word = FALSE) {
N <- nchar(word)
if (N > 2) {
out <- rep(word, N - 1)
word <- unlist(strsplit(word, NULL))
# Permutations of the letters
perms <- matrix(c(1:(N - 1), 2:N), ncol = 2)
reversed <- perms[, 2:1]
trans.words <- matrix(rep(word, N - 1), byrow = TRUE, nrow = N - 1)
for(i in 1:(N - 1)) {
trans.words[i, perms[i, ]] <- trans.words[i, reversed[i, ]]
out[i] <- paste(trans.words[i, ], collapse = "")
}
}
else if (N == 2) {
out <- paste(word[2:1], collapse = "")
}
else {
out <- paste(word, collapse = "")
}
return(out)
}
# Single letter deletions.
Deletes <- function(word = FALSE) {
N <- nchar(word)
word <- unlist(strsplit(word, NULL))
out <- list()
for(i in 1:N) {
out[i] <- paste(word[-i], collapse = "")
}
return(out)
}
# Single-letter insertions.
Insertions <- function(word = FALSE) {
N <- nchar(word)
out <- list()
for (letter in letters) {
out[[letter]] <- rep(word, N + 1)
for (i in 1:(N + 1)) {
out[[letter]][i] <- paste(substr(word, i - N, i - 1), letter,
substr(word, i, N), sep = "")
}
}
out <- unlist(out)
return(out)
}
# Single-letter replacements.
Replaces <- function(word = FALSE) {
N <- nchar(word)
out <- list()
for (letter in letters) {
out[[letter]] <- rep(word, N)
for (i in 1:N) {
out[[letter]][i] <- paste(substr(word, i - N, i - 1), letter,
substr(word, i + 1, N + 1), sep = "")
}
}
out <- unlist(out)
return(out)
}
# All Neighbors with distance "1"
Neighbors <- function(word) {
neighbors <- c(word, Replaces(word), Deletes(word),
Insertions(word), Transpositions(word))
return(neighbors)
}
# Probability as determined by our corpus.
Probability <- function(word, dtm) {
# Number of words, total
N <- length(dtm)
word.number <- which(names(dtm) == word)
count <- dtm[word.number]
pval <- count/N
return(pval)
}
# Correct a single word.
Correct <- function(word, dtm) {
neighbors <- Neighbors(word)
# If it is a word, just return it.
if (word %in% names(dtm)) {
out <- word
}
# Otherwise, check for neighbors.
else {
# Which of the neighbors are known words?
known <- which(neighbors %in% names(dtm))
N.known <- length(known)
# If there are no known neighbors, including the word,
# look farther away.
if (N.known == 0) {
print(paste("Having a hard time matching '", word, "'...", sep = ""))
neighbors <- unlist(lapply(neighbors, Neighbors))
}
# Then filter out non-words.
neighbors <- neighbors[which(neighbors %in% names(dtm))]
N <- length(neighbors)
# If we found some neighbors, find the one with the highest
# p-value.
if (N > 1) {
P <- 0*(1:N)
for (i in 1:N) {
P[i] <- Probability(neighbors[i], dtm)
}
out <- neighbors[which.max(P)]
}
# If no neighbors still, return the word.
else {
out <- word
}
}
return(out)
}
# Correct an entire document.
CorrectDocument <- function(document, dtm) {
by.word <- unlist(strsplit(document, " "))
N <- length(by.word)
for (i in 1:N) {
by.word[i] <- Correct(by.word[i], dtm = dtm)
}
corrected <- paste(by.word, collapse = " ")
return(corrected)
}
words <- scan("http://norvig.com/big.txt", what = character())
words <- strip.text(words)
counts <- table(words)
Correct("speling", dtm = counts)
#---correct a document
CorrectDocument("the quick bruwn fowx jumpt ovre tha lasy dog", dtm = counts)
Any idea please?
Thank you

The function Correct has a bug: Neighbors returns a list rather than a character vector, so you should add an unlist. That is, the lines:
Correct <- function(word, dtm) {
neighbors <- Neighbors(word)
should be changed to:
Correct <- function(word, dtm) {
neighbors <- unlist(Neighbors(word))
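
To see why the missing unlist leads to the strsplit error, it may help to look at how lists spread through c() and through element assignment. The following is a small sketch with made-up words, independent of the spell checker itself:

```r
# c() with even one list argument returns a list, not a character vector
mixed <- c("speling", list("spelling"))
print(is.list(mixed))

# Assigning a list element into a character vector turns the vector into a list
words <- c("the", "quick")
words[1] <- mixed[2]
print(is.list(words))

# strsplit() then rejects the remaining elements, because they are list items
msg <- tryCatch(strsplit(words[2], NULL),
                error = function(e) conditionMessage(e))
print(msg)
```

This mirrors what happens in CorrectDocument: the first corrected word comes back as a list, by.word becomes a list, and the next call to Correct passes a list element down to strsplit.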
EDIT:
Here's a function that corrects the lines of a document file (overwriting it):
CorrectDocumentFile <- function(file,dtm){
# read the file lines
textLines <- unlist(readLines(file))
# for each line not empty or blank, correct the text
for(i in which(!grepl("^\\s*$",textLines))){
line <- textLines[[i]]
textLines[i] <- CorrectDocument(line,dtm)
}
# overwrite the file with the correction
writeLines(textLines, file)
}
Usage:
CorrectDocumentFile(file="fileToBeCorrected.txt", dtm=counts)

If error in loop create vector of "n" and continue

I have a loop in R that tests every possible ARIMA order combination under specific conditions and checks the lags. However, during the loop this error occurs:
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [1]
When this error occurs, I want the loop to create a vector of "n" that is put into a matrix with the rest of the models. I have tried tryCatch, but for some reason this stops the rest of the iterations from running.
Here is my code:
N<- c(155782.7, 159463.7, 172741.1, 204547.2, 126049.3, 139881.9, 140747.3, 251963.0, 182444.3, 207780.8, 189251.2, 318053.7, 230569.2, 247826.8, 237019.6, 383909.5, 265145.5, 264816.4, 239607.0, 436403.1, 276767.7, 286337.9, 270022.7, 444672.9, 263717.2, 343143.9, 271701.7)
aslog<-"n"
library(gtools)
library(forecast)
a<-permutations(n=3,r=6,v=c(0:2),repeats.allowed=TRUE)
a<-a[ifelse((a[,1]+a[,4]>2|a[,2]+a[,5]>2|a[,3]+a[,6]>2),FALSE,TRUE),]
namWA<-matrix(0,ncol=1,nrow=length(a[,1]))
namWS<-matrix(0,ncol=1,nrow=length(a[,1]))
Arimafit<-matrix(0,ncol=length(N),nrow=length(a[,1]),byrow=TRUE)
tota<-matrix(0,ncol=1,nrow=length(a[,1]))
totb<-matrix(0,ncol=1,nrow=length(a[,1]))
for(i in 1:length(a[,1])){
namWA[i]<-paste("orderWA",i,sep=".")
assign(namWA[i],a[i,c(1:3)])
namWS[i]<-paste("orderWS",i,sep=".")
assign(namWS[i],a[i,c(4:6)])
ArimaW1 <- Arima(N, order= a[i,c(1:3)], seasonal=list(order=a[i,c(4:6)]),method="ML")
if(aslog=="y"){Arimafit[i,]<-c(exp(fitted(ArimaW1)))}else{Arimafit[i,]<-c(fitted(ArimaW1))}
nnn<-c(N)
arimab<-c(Arimafit[i,])
fullres<-nnn-arimab
v<-acf(fullres,plot=FALSE)
w<-pacf(fullres,plot=FALSE)
if(v$acf[2]>0.4|v$acf[2]<(-0.4)|v$acf[3]>0.4|v$acf[3]<(-0.4)|v$acf[4]>0.4|v$acf[4]<(-0.4)|v$acf[5]>0.4|v$acf[5]<(-0.4)|v$acf[6]>0.4|v$acf[6]<(-0.4)|v$acf[7]>0.4|v$acf[7]<(-0.4)|w$acf[1]>0.4|w$acf[1]<(-0.4)|w$acf[2]>0.4|w$acf[2]<(-0.4)|w$acf[3]>0.4|w$acf[3]<(-0.4)|w$acf[4]>0.4|w$acf[4]<(-0.4)|w$acf[5]>0.4|w$acf[5]<(-0.4)|w$acf[6]>0.4|w$acf[6]<(-0.4))
tota[i]<-"n" else{
tota[i]<-sum(abs(v$acf[2:7]))
totb[i]<-sum(abs(w$acf[1:6]))}
}
I tried doing
ArimaW1<-tryCatch(Arima(N, order= a[i,c(1:3)], seasonal=list(order=a[i,c(4:6)]),method="ML"),error=function(e) NULL)
and this gave another error
Error in Arimafit[i, ] <- c(fitted(ArimaW1)) :
number of items to replace is not a multiple of replacement length
Then I tried:
ArimaW1<-tryCatch(Arima(N, order= a[i,c(1:3)], seasonal=list(order=a[i,c(4:6)]),method="ML"),error=function(e) matrix("n",ncol=length(Arimafit[1,])))
but this gave an error:
Error: $ operator is invalid for atomic vectors
It also gave a matrix with all the fitted ARIMA values up to iteration 68; after that, every value is 0.0.
Is there a way to get the loop to continue iterating, filling a vector with a placeholder value that goes into the matrix Arimafit just like the results of the iterations that do work, so that I can carry on with the rest of the code?

I just found a way to do what I wanted. This may help other people, so I won't delete the question; I'll just post the solution :)
library(gtools)
a<-permutations(n=3,r=6,v=c(0:2),repeats.allowed=TRUE)
a<-a[ifelse((a[,1]+a[,4]>2|a[,2]+a[,5]>2|a[,3]+a[,6]>2),FALSE,TRUE),]
namWA<-matrix(0,ncol=1,nrow=length(a[,1]))
namWS<-matrix(0,ncol=1,nrow=length(a[,1]))
Arimafit<-matrix(0,ncol=length(N),nrow=length(a[,1]),byrow=TRUE)
tota<-matrix(0,ncol=1,nrow=length(a[,1]))
totb<-matrix(0,ncol=1,nrow=length(a[,1]))
arimaerror<-matrix(0,ncol=length(N),nrow=1)
for(i in 1:length(a[,1])){
namWA[i]<-paste("orderWA",i,sep=".")
assign(namWA[i],a[i,c(1:3)])
namWS[i]<-paste("orderWS",i,sep=".")
assign(namWS[i],a[i,c(4:6)])
ArimaW1 <- try(Arima(N, order= a[i,c(1:3)], seasonal=list(order=a[i,c(4:6)]),method="ML"))
if(is(ArimaW1,"try-error"))
ArimaW1<-arimaerror else
ArimaW1<-ArimaW1
arimafitted<-try(fitted(ArimaW1))
if(is(arimafitted,"try-error"))
fitarima<-arimaerror else
fitarima<-arimafitted
if(aslog=="y"){Arimafit[i,]<-c(exp(fitarima))}else{Arimafit[i,]<-c(fitarima)}
nnn<-c(N)
arimab<-c(Arimafit[i,])
fullres<-nnn-arimab
v<-acf(fullres,plot=FALSE)
w<-pacf(fullres,plot=FALSE)
if(v$acf[2]>0.4|v$acf[2]<(-0.4)|v$acf[3]>0.4|v$acf[3]<(-0.4)|v$acf[4]>0.4|v$acf[4]<(-0.4)|v$acf[5]>0.4|v$acf[5]<(-0.4)|v$acf[6]>0.4|v$acf[6]<(-0.4)|v$acf[7]>0.4|v$acf[7]<(-0.4)|w$acf[1]>0.4|w$acf[1]<(-0.4)|w$acf[2]>0.4|w$acf[2]<(-0.4)|w$acf[3]>0.4|w$acf[3]<(-0.4)|w$acf[4]>0.4|w$acf[4]<(-0.4)|w$acf[5]>0.4|w$acf[5]<(-0.4)|w$acf[6]>0.4|w$acf[6]<(-0.4))
tota[i]<-"n" else{
tota[i]<-sum(abs(v$acf[2:7]))
totb[i]<-sum(abs(w$acf[1:6]))}
}
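
The same idea can also be sketched with tryCatch, returning a placeholder row of the right length whenever a fit fails. To keep the example self-contained, fit_model below is a hypothetical stand-in for the Arima call, and NA is used as the placeholder instead of "n" so that the matrix stays numeric:

```r
# Hypothetical stand-in for a model fit that fails for some orders
fit_model <- function(order) {
  if (sum(order) > 3) stop("non-finite finite-difference value")
  rep(sum(order), 5)   # pretend these are the fitted values
}

orders <- rbind(c(0, 1, 0), c(2, 2, 2), c(1, 0, 1))
fits   <- matrix(NA_real_, nrow = nrow(orders), ncol = 5)

for (i in seq_len(nrow(orders))) {
  # on error, fall back to a row of NAs and carry on with the next order
  fits[i, ] <- tryCatch(fit_model(orders[i, ]),
                        error = function(e) rep(NA_real_, 5))
}
print(fits)
```

The important detail is that the error handler returns something with the same length as a successful result, so the assignment into the matrix row never fails.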

Here is a further adaptation of the code for what I wanted to achieve:
a <- permutations(n = 3, r = 6, v = c(0:2), repeats.allowed = TRUE)
a <- a[ifelse((a[, 1] + a[, 4] > 2 | a[, 2] + a[, 5] > 2 | a[, 3] + a[, 6] > 2),
FALSE, TRUE), ]
Arimafit <- matrix(0,
ncol = length(Data.new),
nrow = length(a[, 1]),
byrow = TRUE)
totb <- matrix(0, ncol = 1, nrow = length(a[, 1]))
arimaerror <- matrix(0, ncol = length(Data.new), nrow = 1)
for (i in 1:length(a[, 1])){
ArimaData.new <- try(Arima(Data.new,
order = a[i, c(1:3)],
seasonal = list(order = a[i, c(4:6)]),
method = "ML"),
silent = TRUE)
if (is(ArimaData.new, "try-error")){
ArimaData.new <- arimaerror
} else {
ArimaData.new <- ArimaData.new
}
arimafitted <- try(fitted(ArimaData.new), silent = TRUE)
if (is(arimafitted, "try-error")){
fitarima <- arimaerror
} else {
fitarima <- arimafitted
}
if (as.log == "log"){
Arimafit[i, ] <- c(exp(fitarima))
Datanew <- c(exp(Data.new))
} else {
if (as.log == "sqrt"){
Arimafit[i, ] <- c((fitarima)^2)
Datanew <- c((Data.new)^2)
} else {
Arimafit[i, ] <- c(fitarima)
Datanew <- c(Data.new)
}
}
data <- c(Datanew)
arima.fits <- c(Arimafit[i, ])
fullres <- data - arima.fits
v <- acf(fullres, plot = FALSE)
w <- pacf(fullres, plot = FALSE)
if (v$acf[2]>0.4|v$acf[2]<(-0.4)|v$acf[3]>0.4|v$acf[3]<(-0.4)|v$acf[4]>0.4|v$acf[4]<(-0.4)|v$acf[5]>0.4|v$acf[5]<(-0.4)|v$acf[6]>0.4|v$acf[6]<(-0.4)|v$acf[7]>0.4|v$acf[7]<(-0.4)|w$acf[1]>0.4|w$acf[1]<(-0.4)|w$acf[2]>0.4|w$acf[2]<(-0.4)|w$acf[3]>0.4|w$acf[3]<(-0.4)|w$acf[4]>0.4|w$acf[4]<(-0.4)|w$acf[5]>0.4|w$acf[5]<(-0.4)|w$acf[6]>0.4|w$acf[6]<(-0.4)){
totb[i] <- "n"
} else {
totb[i] <- sum(abs(w$acf[1:4]))
}
j <- match(min(totb), totb)
order.arima <- a[j, c(1:3)]
order.seasonal.arima <- a[j, c(4:6)]
}
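
The long threshold condition used in the loops above can be written more compactly with abs() and any(). This is a sketch on simulated residuals (res is made up), meant to show an equivalent form of the check rather than to change the selection logic:

```r
# Simulated residuals standing in for fullres
set.seed(42)
res <- rnorm(27)

v <- acf(res, plot = FALSE)
w <- pacf(res, plot = FALSE)

# TRUE if any of the checked autocorrelations or partial autocorrelations
# lies outside the interval [-0.4, 0.4]
too_large <- any(abs(c(v$acf[2:7], w$acf[1:6])) > 0.4)
print(too_large)
```

Inside the loop this could be used as if (too_large) { ... } else { ... } in place of the chained comparisons.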
