Chess PGN: Error in paste... result would exceed 2^31-1 bytes

I'm using the bigchess library in R to read in a 3 GB PGN file; see the source here: https://github.com/rosawojciech/bigchess
Reading it in like so:
df <- read.pgn(paste0(path, file), add.tags = c("UTCDate", "UTCTime", "WhiteElo", "BlackElo", "WhiteRatingDiff", "BlackRatingDiff", "WhiteTitle", "BlackTitle","TimeControl", "Termination"), n.moves = T, extract.moves = -1, stat.moves = T, big.mode = F, quiet = F, ignore.other.games = F)
I get the following error:
Error in paste(subset(r2, tmp1 == "Movetext", select = c(tmp2))[, 1], :
  result would exceed 2^31-1 bytes
Based on internet searches, this happens because R cannot store a single character vector in memory that is larger than 2^31-1 bytes (about 2 GB). Is there a way to override this limit, or otherwise deal with the issue?
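One possible way to deal with it, sketched under the assumption that you can split the file outside R (the chunk_*.pgn names below are hypothetical): cut the 3 GB PGN into pieces on game boundaries, read each piece with read.pgn, and row-bind the results, so that no single character vector ever has to hold the whole file. bigchess also ships a read.pgn.ff variant intended for big files (backed by the ff package), which may be worth trying first.
library(bigchess)

tags <- c("UTCDate", "UTCTime", "WhiteElo", "BlackElo",
          "WhiteRatingDiff", "BlackRatingDiff",
          "WhiteTitle", "BlackTitle", "TimeControl", "Termination")

# hypothetical chunk files, produced beforehand by splitting the big PGN on
# game boundaries (a blank line before each [Event ...] header)
pieces <- list.files(path, pattern = "^chunk_.*\\.pgn$", full.names = TRUE)

df <- do.call(rbind, lapply(pieces, function(p)
  read.pgn(p, add.tags = tags, n.moves = TRUE, extract.moves = -1,
           stat.moves = TRUE, big.mode = FALSE, quiet = FALSE,
           ignore.other.games = FALSE)))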

Related

Using the try and tryCatch functions on .WAV files

Trying to cut a bunch of audio (.WAV) files into smaller samples in R. For this example, I'm using a loop to cut out 1 minute samples at 140 minutes.
For some files, the recording ends before 140 minutes due to an error in the recording device. When this occurs, an error appears and the loop stops. I'm trying to make the loop continue by using the try or tryCatch function, but I keep getting errors.
The code is as follows:
for (i in 1:length(AR_CD288)) {
  CUT_AR288_5 <- try({readWave(AR_CD288[i], from = 140, to = 141, units = "minutes")})
  FILE.OUT_AR288_5 <- sub("\\.wav$", "_140.wav", AR_CD288)
  OUT.PATH_AR288_5 <- file.path("New files", basename(FILE.OUT_AR288_5))
  writeWave(CUT_AR288_5, extensible = FALSE, filename = OUT.PATH_AR288_5[i])
}
I get the following two errors from the code:
Error in readBin(con, int, n = N, size = bytes, signed = (bytes != 1), :
  invalid 'n' argument
Error in writeWave(CUT_AR288_5, extensible = FALSE, filename = OUT.PATH_AR288_5[i]) :
  'object' needs to be of class 'Wave' or 'WaveMC'
The loop still saves some samples into the "New files" directory; however, once the loop reaches a file shorter than 140 minutes, the loop stops.
I am very stuck! Any help would be greatly appreciated.
Cheers.
When I use try, I always do one (or both) of:
check the return value to see if it inherits "try-error", indicating that the command failed; or
add try(., silent = TRUE), indicating that I don't care if it succeeded (but this implies that I will not use its return value, either).
Try this:
for (i in seq_along(AR_CD288)) {
  CUT_AR288_5 <- try({
    readWave(AR_CD288[i], from = 140, to = 141, units = "minutes")
  }, silent = TRUE)
  if (!inherits(CUT_AR288_5, "try-error")) {
    FILE.OUT_AR288_5 <- sub("\\.wav$", "_140.wav", AR_CD288)
    OUT.PATH_AR288_5 <- file.path("New files", basename(FILE.OUT_AR288_5))
    writeWave(CUT_AR288_5, extensible = FALSE, filename = OUT.PATH_AR288_5[i])
  }
}
Three notes:
I changed 1:length(.) to seq_along(.); the latter is more resilient in automated use when it is feasible that the vector might be length 0. For example, if AR_CD288 can ever be length 0, intuitively we expect 1:length(AR_CD288) to return nothing, so that the for loop will not run; unfortunately, it resolves to 1:0, which returns a vector of length 2, and that will often fail (depending on whatever code is operating in the loop). The use of seq_along(.) will always return a vector of length 0 for an empty input, which is what we need; a small illustration follows. (Alternatively and equivalently, seq_len(length(AR_CD288)), though that's really what seq_along is intended to do.)
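For illustration, here is the difference on an empty vector:
x <- character(0)   # empty input
1:length(x)         # 1:0 -> c(1, 0): the loop body would run twice
seq_along(x)        # integer(0): the loop body does not run at all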
If you do not add silent=TRUE (or explicitly add silent=FALSE), then you will get an error message indicating that the command failed. Unfortunately, the error message may not indicate which i failed, so you may be left in the dark as far as fixing or removing the errant file. You may prefer to add an else to the if (inherits(.,"try-error")) clause so that you can provide a clearer error, such as
if (inherits(CUT_AR288_5, "try-error")) {
  warning("'readWave' failed on ", sQuote(AR_CD288[i]), call. = FALSE)
} else {
  FILE.OUT_AR288_5 <- sub("\\.wav$", "_140.wav", AR_CD288)
  # ...
}
(Note that I put the "things worked" code in the else clause here ... I find it odd to do if (!...) {} else {}; it seems like a double negation :-).
The choice to wrap one function or the whole block depends on your needs: I tend to prefer knowing exactly where things fail, so the will-possibly-fail functions are often individually wrapped with try so that I can react (or log/message) accordingly. If you don't need that resolution of error detection, then you can certainly wrap the whole code block in a sense:
for (i in seq_along(AR_CD288)) {
  ret <- try({
    CUT_AR288_5 <- readWave(AR_CD288[i], from = 140, to = 141, units = "minutes")
    FILE.OUT_AR288_5 <- sub("\\.wav$", "_140.wav", AR_CD288)
    OUT.PATH_AR288_5 <- file.path("New files", basename(FILE.OUT_AR288_5))
    writeWave(CUT_AR288_5, extensible = FALSE, filename = OUT.PATH_AR288_5[i])
  }, silent = TRUE)
  if (inherits(ret, "try-error")) {
    # do or log something
  }
}
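An equivalent using tryCatch (since the question's title mentions it) would wrap the block and react to the error in a handler; the warning message here is my own addition, not part of the answer above:
for (i in seq_along(AR_CD288)) {
  tryCatch({
    CUT_AR288_5 <- readWave(AR_CD288[i], from = 140, to = 141, units = "minutes")
    FILE.OUT_AR288_5 <- sub("\\.wav$", "_140.wav", AR_CD288)
    OUT.PATH_AR288_5 <- file.path("New files", basename(FILE.OUT_AR288_5))
    writeWave(CUT_AR288_5, extensible = FALSE, filename = OUT.PATH_AR288_5[i])
  }, error = function(e) {
    warning("'readWave' failed on ", sQuote(AR_CD288[i]), ": ",
            conditionMessage(e), call. = FALSE)
  })
}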

Skip or exit command if R goes over specified memory limit

I would like to run a block of code that skips or exits a command if R goes over a specified memory limit at any time. To illustrate with a related example, the following code skips to the next iteration of the for loop if the code block takes more than a specified time limit. It will print: '1', 'skip', '2'.
params = c(1, 4, 2)
for (i in params) {
  tryCatch(
    expr = {
      evalWithTimeout({
        Sys.sleep(i)
        print(i)
      },
      timeout = 3) # go to next iteration if block takes more than 3 seconds
    },
    TimeoutException = function(x) cat("skip")
  )
}
I would like to do something similar, but skip or exit a command if R goes over a memory limit instead. For example, how can I make the following code print: '1', NOTHING, '2'? Note that the second matrix, with 1000 rows, should be skipped before it is fully built. Also, I will not know the size of the matrix/object that needs to be skipped beforehand; I will only know the memory_limit.
unknown = matrix(rnorm(1000*1000), ncol = 1000, nrow = 1000) # unknown object
memory_limit = object.size(unknown) - 100000 # known memory limit that happens to be just under the object
##Evaluate_in_memory_limit##{
  print(nrow(matrix(rnorm(1*1), ncol = 1, nrow = 1)))
  print(nrow(unknown)) # this should be skipped
  print(nrow(matrix(rnorm(2*2), ncol = 2, nrow = 2)))
  limit = memory_limit
}
An idea:
You could calculate the size of the vector (matrix) beforehand, if you know its length in advance.
For vectors, the formula should be:
integers: 40 + 4 * n bytes
numeric: 40 + 8 * n bytes
Check with e.g.
sapply((1:3)^10, function(n) object.size(numeric(n)))
# or for matrix
sapply((1:3)^10, function(n) object.size(matrix(numeric(n))))
Then use system('free') on Unix to determine the free memory.
Create your elements in a for loop and use the check in an if condition to next the loop in case the used memory would exceed what is available; a rough sketch follows.
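A rough sketch of that idea, as my own illustration (it assumes a Unix system where free -k reports the free memory in kilobytes in the fourth field of the Mem: line, and the 40-byte header is only an approximation that varies by platform):
free_bytes <- function() {
  mem <- system("free -k", intern = TRUE)[2]      # the "Mem:" line
  fields <- strsplit(gsub(" +", " ", mem), " ")[[1]]
  as.numeric(fields[4]) * 1024                    # "free" column, in bytes
}

memory_limit <- 8e6        # known limit in bytes
dims <- c(1, 1000, 2)      # matrix sizes to attempt

for (n in dims) {
  predicted <- 40 + 8 * n * n   # approximate size of a numeric n x n matrix
  if (predicted > memory_limit || predicted > free_bytes()) next  # skip before building
  print(nrow(matrix(rnorm(n * n), ncol = n, nrow = n)))
}
# prints 1, silently skips the 1000 x 1000 matrix, prints 2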

Torch nn: current error is always nan

I've written the following code:
require 'nn'
require 'cunn'

file = torch.DiskFile('train200.data', 'r')
size = file:readInt()
inputSize = file:readInt()
outputSize = file:readInt()

dataset = {}
function dataset:size() return size end;
for i = 1, dataset:size() do
  local input = torch.Tensor(inputSize)
  for j = 1, inputSize do
    input[j] = file:readFloat()
  end
  local output = torch.Tensor(outputSize)
  for j = 1, outputSize do
    output[j] = file:readFloat()
  end
  dataset[i] = {input:cuda(), output:cuda()}
end

net = nn.Sequential()
hiddenSize = inputSize * 2
net:add(nn.Linear(inputSize, hiddenSize))
net:add(nn.Tanh())
net:add(nn.Linear(hiddenSize, hiddenSize))
net:add(nn.Tanh())
net:add(nn.Linear(hiddenSize, outputSize))

criterion = nn.MSECriterion()
net = net:cuda()
criterion = criterion:cuda()

trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.02
trainer.maxIteration = 100
trainer:train(dataset)
It should work (at least I think so), and it does work correctly when inputSize = 20. But when inputSize = 200, the current error is always nan. At first I thought the file-reading part was incorrect; I rechecked it several times, but it works fine. I also found that a learning rate that is too small or too large can sometimes cause this, so I tried learning rates from 0.00001 up to 0.8, but the result is still the same. What am I doing wrong?
Thanks,
Igor

Setting a loop in R

I have already discussed a similar question in the following post:
How to set a for-loop in R
Each file's contents are as follows:
FILE_1.FASTA
>>TTBK2_Hsap ,(CK1/TTBK)
MSGGGEQLDILSVGILVKERWKVLRKIGGGGFGEIYDALDMLTRENVALKVESAQQPKQVLKMEVAVLKKLQGKDHVCRFIGCGRNDRFNYVVMQLQGRNLADLRRSQSRGTFT
FILE_2.FASTA
>>TTBK2_Hsap ,(CK1/TTBK)
MSGGGEQLDILSVGILVKERWKVLRKIGGGGFGEIYDALDMLTRENVALKVESAQQPKQVLKMEVAVLKKLQGKDHVCRFIGCGRNDRFNYVVMQLQGRNLADLRRSQSRGTFT
However, there is another package in R which works like this:
extractAPAAC(x, props = c("Hydrophobicity", "Hydrophilicity"), lambda = 30,
             w = 0.05, customprops = NULL)
I tried creating a function to run it for a number of file sequences, and the program looks like this:
read_and_extract <- function(fasta) {
  seq <- readFASTA(fasta)[[1]]
  return(extractAPAAC(seq, props = c("Hydrophobicity", "Hydrophilicity"),
                      lambda = 30, w = 0.05, customprops = NULL))
}
setwd("H:\\CC")
fasta_files <- dir(pattern = "[.]fasta$")
aa_comp <- vapply(fasta_files, read_and_extract, rep(pi, 80))
write.csv(aa_comp, file = "C:\\Users\\PAAC.csv")
This program shows an error:
Error: unexpected ',' in "w = 0.05,"
But I have given w = 0.05 as the default value; could anyone tell me where the actual problem is?

R package VLMC dies if state space size exceeds 27

I am using VLMC to fit some Markov models, and it dies as soon as the alphabet size reaches 28.
I thought this was due to using a single letter in the alphabet by default, but it behaves the same with code1char = FALSE. This is true for me on real data as well as in this fake example.
library(VLMC)
# works fine
ins <- sample(seq(1, 27, 1), 50000, replace = T)
vlmc(ins, dump = 1, threshold.gen = 2, debug = TRUE)
# core dump
ins <- sample(seq(1, 28, 1), 50000, replace = T)
vlmc(ins, dump = 1, threshold.gen = 2, debug = TRUE)
Any ideas?
The seg fault looks like this, BTW. It looks to me like the alphabet after z is being mapped to NA, which is causing an array-bounds issue.
library(VLMC)
sc <- 10
amp <- 13
x <- round(amp * sin(seq(0, 2 * sc * pi, 0.01)))
x <- amp + x + rpois(NROW(x), 1)
length(table(x))
length(x)
vlmc(x, dump = 1, threshold.gen = 2, debug = TRUE)
vlmc: Alpha = 'abcdefghijklmnopqrstuvwxyzNANANANANA' ; |X| = 31
vlmc: ctl.dump = 4 11
vlmc: n = |data| = 6284, cutoff{prune} = 21.8865, threshold{gen} = 2
vlmc: |alphabet| = 31, alphabet = abcdefghijklmnopqrstuvwxyzNA
generating...
*** caught segfault ***
address 0x0, cause 'memory not mapped'
Traceback:
1: .C("vlmc_p", data = Data, n = n, threshold.gen = as.integer(threshold.gen), cutoff.prune = as.double(cutoff.prune), alpha.len = as.integer(alpha.len), alpha = as.character(Alpha), debug = as.integer(as.logical(debug)), dump.flags = as.integer(c(dump, ctl.dump)), size = integer(4), PACKAGE = "VLMC")
2: vlmc(x, dump = 1, threshold.gen = 2, debug = TRUE)
As the maintainer of VLMC,
I can tell you that one of the longest-standing TODO entries for VLMC has been to raise the currently built-in limit of 26 on the maximal alphabet size.
Of course it is a bug that I don't give an error message in the case of a larger alphabet, but rather pass things to C and do not check there.
The next version of VLMC will not seg.fault for this anymore.
However, I'm not yet sure I'll find the time to allow a considerably larger alphabet....
Of course I'd happily accept patches ... it's free open source software.
Best regards,
Martin Maechler, ETH Zurich
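In the meantime, a possible interim workaround (my own sketch, not part of the maintainer's answer): recode the series so that the alphabet stays at or below the built-in limit, for example by binning the values before fitting:
library(VLMC)
sc <- 10
amp <- 13
x <- round(amp * sin(seq(0, 2 * sc * pi, 0.01)))
x <- amp + x + rpois(NROW(x), 1)

# collapse the 31-state series into at most 26 bins (at the cost of some resolution)
x_binned <- cut(x, breaks = 26, labels = FALSE)
length(table(x_binned)) # <= 26, within the built-in limit
vlmc(x_binned, dump = 1, threshold.gen = 2, debug = TRUE)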