Unexpected maximum possible values for terra::writeRaster according to datatype

When using terra::writeRaster, the maximum possible values allowed for writing depend on the datatype (INT1U, INT2S, INT2U...).
The documentation states that "When writing integer values the lowest available value (given the datatype) is used [to store NA, I suppose] for signed types, and the highest value is used for unsigned values." This should give the following ranges for unsigned types:
INT1U : 0-254 (2^8-1, minus one for NA storage)
INT2U : 0-65,534 (2^16-1, minus one for NA storage)
INT4U : 0-4,294,967,294 (2^32-1, minus one for NA storage)
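The expected limits above follow from the bit width: an unsigned n-bit type holds 0 to 2^n - 1, and reserving the highest value for NA leaves 2^n - 2 as the largest writable value; for signed types it is the lowest value that is reserved. A minimal sketch that computes these documented expectations, assuming the default NA flag (the helper name `expected_int_range` is hypothetical and is not part of terra):

```r
# Hypothetical helper (not a terra function): the range a value must fall in
# to survive writeRaster(), assuming the default NA flag described in the
# documentation (lowest value for signed types, highest for unsigned types).
expected_int_range <- function(datatype) {
  bits <- c(INT1U = 8, INT2S = 16, INT2U = 16, INT4S = 32, INT4U = 32)[[datatype]]
  if (grepl("S$", datatype)) {
    # signed: -2^(bits-1) is reserved for NA
    c(min = -2^(bits - 1) + 1, max = 2^(bits - 1) - 1)
  } else {
    # unsigned: 2^bits - 1 is reserved for NA
    c(min = 0, max = 2^bits - 2)
  }
}

expected_int_range("INT1U")  # min 0, max 254
expected_int_range("INT2U")  # min 0, max 65534
expected_int_range("INT4U")  # min 0, max 4294967294
```

These are the values the documentation implies; the question below is precisely that the observed maxima on this machine were lower.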
However, for unsigned datatypes INT2U and INT4U, the maxima I observed on my machine do not fit these expectations:
INT2U : 65,532
INT4U : 4,294,967,292
Why these unexpected maximum values? I ask because, for writing safe code, it matters to know these maximum values exactly before writing files.
I am working under Windows 10. Here are a couple of code lines that I used to check:
library(terra)
# terra version 1.3.4
# Warning message:
# package ‘terra’ was built under R version 4.0.5
r <- rast(ncols=1, nrows=2)
values(r) <- c(65532,65533)
writeRaster(r,"test.tif",wopt=list(datatype="INT2U"))
t <- rast("test.tif")
values(t)
#      lyr.1
# [1,] 65532
# [2,]   NaN

With the development version I now get the expected result
library(terra)
r <- rast(ncols=1, nrows=4)
values(r) <- 65533:65536
# 2-byte unsigned integer
x <- writeRaster(r,"test.tif", datatype="INT2U", overwrite=TRUE)
values(x)
# lyr.1
#[1,] 65533
#[2,] 65534
#[3,] NaN
#[4,] NaN
x <- writeRaster(r,"test.tif", datatype="INT2U", NAflag=0, overwrite=TRUE)
values(x)
#[1,] 65533
#[2,] 65534
#[3,] 65535
#[4,] NaN
# 4-byte unsigned integer
values(r) <- 4294967293:4294967296
x <- writeRaster(r,"test.tif", datatype="INT4U", overwrite=TRUE)
values(x)
# lyr.1
#[1,] 4294967293
#[2,] 4294967294
#[3,] NaN
#[4,] NaN
x <- writeRaster(r,"test.tif", datatype="INT4U", NAflag=0, overwrite=TRUE)
values(x)
# lyr.1
#[1,] 4294967293
#[2,] 4294967294
#[3,] 4294967295
#[4,] NaN


Error in as.Date.numeric() : 'origin' must be supplied

I have received a paper in which the authors included the R files for their empirical results. However, I have some problems while trying to run their code:
data <- vni$R[198:length(vni$R)]; date <- vni$Date[198:length(vni$R)]
l <- length(data)
rw_length <- 52 # 52 weeks (~ 1 year)
bound <- vector()
avr <- vector()
for (i in (rw_length+1):l) {
AVR.test <- AutoBoot.test(data[(i-rw_length):i],nboot=2000,"Normal",c(0.025, 0.975))
bound <- append(bound, AVR.test$CI.stat)
avr <- append(avr, AVR.test$test.stat)
}
lower <- bound[seq(1, length(bound), 2)]
upper <- bound[seq(2, length(bound), 2)]
results <- matrix(c(date[(rw_length+1):l],data[(rw_length+1):l],avr,upper, lower),ncol=5, dimnames = list(c(),c("Date", "Return", "AVR", "Upper", "Lower")))
And I get the following error:
Error in as.Date.numeric(e) : 'origin' must be supplied
for the line results <- matrix(c(date[(rw_length+1):l],data[(rw_length+1):l],avr,upper, lower),ncol=5, dimnames = list(c(),c("Date", "Return", "AVR", "Upper", "Lower")))
My dataset is:
Date P R
1 2001-03-23 259.60 0.0000000000
2 2001-03-30 269.30 0.0366840150
3 2001-04-06 284.69 0.0555748690
4 2001-04-13 300.36 0.0535808860
5 2001-04-20 317.76 0.0563146260
...
935 2019-02-15 950.89 0.0454163960
936 2019-02-22 988.91 0.0392049380
937 2019-03-01 979.63 -0.0094283770
Could you please help me with that issue?
Thanks a lot!
Everything in a matrix must be the same class. You often see this when there's a string among numbers, where the whole matrix becomes character:
m <- matrix(0, nr=2, nc=2)
m
# [,1] [,2]
# [1,] 0 0
# [2,] 0 0
m[1] <- "a"
m
# [,1] [,2]
# [1,] "a" "0"
# [2,] "0" "0"
In this case, you have Date (first column) and numeric (all others? no idea what AutoBoot is). Because the first element passed to c() is a Date, c() dispatches to its Date method, which calls as.Date() on every element; the non-Date (numeric) elements are therefore converted with as.Date(), which needs an origin for plain numbers.
matrix(c(Sys.Date(), 1.1))
# Error in as.Date.numeric(e) : 'origin' must be supplied
I suggest that trying to store this in a matrix is therefore fundamentally flawed. If you want to store a Date object among numbers, you have two options:
Store it as a data.frame, where each column can have its own class.
Pre-convert the "Date" data to numeric and store it as a number. This means that if/when you need the dates to be of class Date again, you'll need to call as.Date(..., origin="1970-01-01").
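Both options can be sketched in base R. The values below are just the first three rows of the poster's dataset, used for illustration; the object names are made up for this example:

```r
# Illustrative values taken from the first rows of the poster's dataset
dates   <- as.Date(c("2001-03-23", "2001-03-30", "2001-04-06"))
returns <- c(0.0000000000, 0.0366840150, 0.0555748690)

# Option 1: a data.frame keeps each column's own class
results_df <- data.frame(Date = dates, Return = returns)
str(results_df)

# Option 2: store the dates as numbers (days since 1970-01-01) in a matrix...
results_mat <- cbind(Date = as.numeric(dates), Return = returns)
# ...and convert them back to Date when needed
back <- as.Date(results_mat[, "Date"], origin = "1970-01-01")
```

Applied to the poster's code, option 1 would replace the matrix(...) call with data.frame(Date = date[(rw_length+1):l], Return = data[(rw_length+1):l], AVR = avr, Upper = upper, Lower = lower).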

Ipfp function - how to specify the target.list argument

Can someone help me understand what the bolded sentence below means? It is from the R documentation for the Ipfp function (https://www.rdocumentation.org/packages/mipfp/versions/3.1/topics/Ipfp).
"target.list - A list of dimensions of the marginal target constrains in target.data. Each component of the list is an array whose cells indicate which dimension the corresponding margin relates to."
"target.data - A list containing the data of the target marginal tables. Each component of the list is an array storing a margin. The list order must follow the ordering defined in target.list. Note that the cells of the arrays must be non-negative."
As an example, let's say I have this table:
seed.ex <- array(1,dim=c(3,4))
seed.ex
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 1 1 1 1
[3,] 1 1 1 1
And these targets stored for the 'target.data':
target.row <- c(50,50,100)
target.col <- c(50,50,50,50)
tgt.data.ex <- list(target.row, target.col)
How then should I specify the 'target.list'?
tgt.list.ex <- list(?,?)
That will then go into the Ipfp function...
res.ex <- Ipfp(seed.ex, tgt.list.ex, tgt.data.ex, print = TRUE, iter = 1000)
I'm pretty sure that the answer is list(1,2). I've never used this package, but both of the examples that were using two-dimensional seed had that as the answer. Here's my test:
tgt.list.ex <- list(1,2)
> library(mipfp)
Loading required package: cmm
Loading required package: Rsolnp
Loading required package: numDeriv
> res.ex <- Ipfp(seed.ex, tgt.list.ex, tgt.data.ex, print = TRUE, iter = 1000)
Margins consistency checked!
... ITER 1
stoping criterion: 24
... ITER 2
stoping criterion: 0
Convergence reached after 2 iterations!
> res.ex
Call:
Ipfp(seed = seed.ex, target.list = tgt.list.ex, target.data = tgt.data.ex,
print = TRUE, iter = 1000)
Method: ipfp - convergence: TRUE
Estimates:
[,1] [,2] [,3] [,4]
[1,] 12.5 12.5 12.5 12.5
[2,] 12.5 12.5 12.5 12.5
[3,] 25.0 25.0 25.0 25.0
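Why list(1, 2) works for a two-dimensional seed: each entry says which dimension the corresponding margin in target.data belongs to, so 1 says target.row constrains the row sums (dimension 1) and 2 says target.col constrains the column sums (dimension 2). A minimal base-R sketch of plain two-dimensional iterative proportional fitting reproduces the estimates above; the helper `ipf_2d` is hypothetical and is not mipfp's implementation:

```r
# Hypothetical helper: two-dimensional IPF that alternately rescales
# dimension 1 (rows) and dimension 2 (columns) to match their targets.
ipf_2d <- function(seed, row_target, col_target, iter = 1000, tol = 1e-10) {
  x <- seed
  for (k in seq_len(iter)) {
    # fit the margin on dimension 1: scale row i by row_target[i] / rowSums(x)[i]
    x <- x * (row_target / rowSums(x))
    # fit the margin on dimension 2: scale column j by col_target[j] / colSums(x)[j]
    x <- sweep(x, 2, col_target / colSums(x), `*`)
    # after the column step, stop once the row margins also match
    if (max(abs(rowSums(x) - row_target)) < tol) break
  }
  x
}

seed.ex <- array(1, dim = c(3, 4))
est <- ipf_2d(seed.ex, c(50, 50, 100), c(50, 50, 50, 50))
est
```

With a uniform seed, the first row step already gives rows of 12.5, 12.5 and 25, whose column sums are all 50, so the fit stops after one pass.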

R memory allocation with cbind

I'm taking a large (dense) network matrix and converting it to an edgelist. Yet, when I do so, the memory allocated in R seems crazy. In my case, I have a 12MB matrix (1259 x 1259) that when converted to an edgelist (i, j, w) is taking up 71MB of memory! I'm using the igraph package to perform the operations, but I don't think it is related to that.
Here is what I'm doing with made up data.
library(igraph)
A <- matrix(runif(25), 5, 5)
A <- A %*% t(A)
diag(A) <- 0
I made the matrix symmetric and diagonal 0 because that is what my data looks like, but I don't think it matters for this question.
Here I use igraph:
# using igraph here
adj <- graph.adjacency(as.matrix(A),weighted=TRUE)
object.size(A) # 400 bytes
object.size(adj) # 2336 bytes
I get that the igraph adj object will be bigger. That isn't the issue.
el <- get.edgelist(adj)
class(el) # "matrix"
object.size(el) # 520 bytes
w <- E(adj)$weight
class(w) # "numeric"
object.size(w) # 200 bytes
# expect something ~720 bytes
adj_w <- cbind(el,w)
class(adj_w) # "matrix"
object.size(adj_w) # 1016 bytes
Why is the memory on adj_w so much larger? It doesn't even seem to be linear since here, the original to final is 400 bytes to 1016 bytes but in my (bigger) data it is 12MB to 71MB.
FYI: I'm using RStudio locally on a Macbook Pro with the latest versions (just installed it all last week).
adj_w is larger because cbind added column names (a dimnames attribute). Remove them and you're back to the correct size.
head(adj_w)
# w
# [1,] 1 2 1.189969
# [2,] 1 3 1.100843
# [3,] 1 4 0.805436
# [4,] 1 5 1.001632
# [5,] 2 1 1.189969
# [6,] 2 3 1.265916
object.size(adj_w)
# 1016 bytes
attributes(adj_w)
# $dim
# [1] 20 3
#
# $dimnames
# $dimnames[[1]]
# NULL
#
# $dimnames[[2]]
# [1] "" "" "w"
#
#
adj_w2 <- adj_w
dimnames(adj_w2) <- NULL
object.size(adj_w2)
# 680 bytes
To avoid the automatic column name addition, you can first convert your vector to a matrix...
adj_w3 <- cbind(el, matrix(w))
object.size(adj_w3)
# 680 bytes
...or, alternatively, pass the deparse.level = 0 argument to cbind.
adj_w4 <- cbind(el, w, deparse.level = 0)
object.size(adj_w4)
# 680 bytes
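The same effect can be reproduced without igraph. In this sketch, `el` and `w` are made-up stand-ins for the get.edgelist() output and the edge weights; it also shows that base R's unname() drops the dimnames after the fact:

```r
# Made-up stand-ins for get.edgelist() output and the edge weights
el <- cbind(c(1, 1, 2), c(2, 3, 3))
w  <- c(0.5, 1.2, 0.8)

with_names <- cbind(el, w)                     # default deparse.level = 1
no_names   <- cbind(el, w, deparse.level = 0)

colnames(with_names)          # "" "" "w"
is.null(dimnames(no_names))   # TRUE
# the extra dimnames list is what makes the first object bigger
object.size(with_names) > object.size(no_names)  # TRUE
# unname() removes the dimnames afterwards
identical(unname(with_names), no_names)          # TRUE
```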

Gibbs Sampler (Albert and Chib) for Binary Probit (rbprobitGibbs) A precision matrix

Presently, I am working through the above in the RStudio help file, which contains the following sample:
##
## rbprobitGibbs example
##
if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=2000} else {R=10}
set.seed(66)
simbprobit = function(X,beta) {
## function to simulate from binary probit including x variable
y=ifelse((X%*%beta+rnorm(nrow(X)))<0,0,1)
list(X=X,y=y,beta=beta)
}
nobs=200
X=cbind(rep(1,nobs),runif(nobs),runif(nobs))
beta=c(0,1,-1)
nvar=ncol(X)
simout=simbprobit(X,beta)
Data1=list(X=simout$X,y=simout$y)
Mcmc1=list(R=R,keep=1)
out=rbprobitGibbs(Data=Data1,Mcmc=Mcmc1)
summary(out$betadraw,tvalues=beta)
if(0){
## plotting example
plot(out$betadraw,tvalues=beta)
}
When I step through the code, I don't see anywhere that the A matrix is set. It is only when I reach this line:
out=rbprobitGibbs(Data=Data1,Mcmc=Mcmc1)
that I see the A matrix displayed in the output. I understand it has to be a k × k matrix, where betabar is a k × 1 matrix.
Prior Parms:
betabar
# [1] 0 0 0
A
# [,1] [,2] [,3]
# [1,] 0.01 0.00 0.00
# [2,] 0.00 0.01 0.00
# [3,] 0.00 0.00 0.01
So I can understand how A gets its dimensions; however, what is not clear to me is how the values in A are set to 0.01. I am trying to figure out how I can allow a user calling the rbprobitGibbs function to set the precision via A to whatever they like. I can see where A is output, but how are its values derived from some input? Does anyone have any suggestions? TIA.
UPDATE:
Here is the output produced, but as far as I can determine it is identical whether I use prior = list(rep(0,3), .2*diag(3)) or not:
> out
$betadraw
[,1] [,2] [,3]
[1,] 0.3565099 0.6369436 -0.9859025
[2,] 0.4705437 0.7211755 -1.1955608
[3,] 0.1478930 0.6538157 -0.6989660
[4,] 0.4118663 0.7910846 -1.3919411
[5,] 0.0385419 0.9421720 -0.7359932
[6,] 0.1091359 0.7991905 -0.7731041
[7,] 0.4072556 0.5183280 -0.7993501
[8,] 0.3869478 0.8116237 -1.2831395
[9,] 0.8893555 0.5448905 -1.8526630
[10,] 0.3165972 0.6484716 -0.9857531
attr(,"class")
[1] "bayesm.mat" "mcmc"
attr(,"mcpar")
[1] 1 10 1
It gets this value from a scaling constant applied to the prior precision matrix. In the source, you will note that if you do not supply a prior precision, it generates a k × k identity matrix and multiplies it by .01 (BayesmConstant.A), which is the 0.01 you see on the diagonal. Nothing fancy here. These scaling parameters for all of the various functions in bayesm can be found in the ./bayesm/R/bayesmConstants.R file.
if (is.null(Prior$A)) {
A = BayesmConstant.A * diag(nvar)
}
If you would like to supply your own constant, say .2, you could do so as follows: Prior = list(betabar = rep(0,k), A = .2*diag(k)), or even introduce some relational information into the prior.
Very late to the party, but I ran across this same issue and just figured it out. In order to change the A matrix and prior matrix you have to name them as well since all of your other input variables are named.
For example your code should be,
rbprobitGibbs(Data=Data1, Prior=list(betabar=betabar1, A=A1), Mcmc=Mcmc1)
If you do that, you are able to set your own values for betabar and A.
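Why naming matters follows from the is.null(Prior$A) check quoted above: `$` looks list elements up by name, so on an unnamed list Prior$A is NULL and the default is used silently. A base-R sketch of that lookup; `resolve_prior` is hypothetical, mimics only the quoted check, and assumes a default constant of 0.01 and a zero betabar as shown in the printed output (it is not bayesm's code):

```r
# Hypothetical mimic of the default-prior check (not bayesm's implementation)
resolve_prior <- function(Prior, nvar, const_A = 0.01) {
  betabar <- if (is.null(Prior$betabar)) rep(0, nvar) else Prior$betabar
  A <- if (is.null(Prior$A)) const_A * diag(nvar) else Prior$A
  list(betabar = betabar, A = A)
}

# Unnamed list: Prior$A is NULL, so the 0.01 * I default wins
unnamed <- resolve_prior(list(rep(0, 3), 0.2 * diag(3)), nvar = 3)
# Named list: the user-supplied precision matrix is picked up
named <- resolve_prior(list(betabar = rep(0, 3), A = 0.2 * diag(3)), nvar = 3)

unnamed$A[1, 1]  # 0.01
named$A[1, 1]    # 0.2
```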

Mystified by qr.Q(): what is an orthonormal matrix in "compact" form?

R has a qr() function, which performs QR decomposition using either LINPACK or LAPACK (in my experience, the latter is 5% faster). The main object returned is a matrix "qr" whose upper triangle contains the matrix R (i.e. the entries qr[upper.tri(qr, diag = TRUE)], which qr.R() returns as a matrix). So far so good. The lower triangular part of qr contains Q "in compact form". One can extract Q from the qr decomposition by using qr.Q(). I would like to find the inverse of qr.Q(). In other words, I have Q and R, and would like to put them in a "qr" object. R is trivial but Q is not. The goal is to apply qr.solve() to it, which is much faster than solve() on large systems.
Introduction
R uses the LINPACK dqrdc2 routine (a modified version of dqrdc) by default, or the LAPACK DGEQP3 routine when specified, for computing the QR decomposition. Both routines compute the decomposition using Householder reflections. An m x n matrix A is decomposed into an m x n economy-size matrix Q with orthonormal columns and an n x n upper triangular matrix R as \(A = QR\), where Q can be computed as the product of t Householder reflection matrices, with t being the lesser of m-1 and n: \(Q = H_1 H_2 \cdots H_t\).
Each reflection matrix \(H_i\) can be represented by a length-(m-i+1) vector. For example, \(H_1\) requires a length-m vector for compact storage. All but one entry of this vector is placed in the first column of the lower triangle of the input matrix (the diagonal is used by the R factor). Therefore, each reflection needs one more scalar of storage, and this is provided by an auxiliary vector (called $qraux in the result from R's qr).
The compact representation used is different between the LINPACK and LAPACK routines.
The LINPACK Way
A Householder reflection is computed as \(H_i = I - v_i v_i^T / p_i\), where I is the identity matrix, \(p_i\) is the corresponding entry in $qraux, and \(v_i\) is as follows:
\(v_i[1..i-1] = 0\),
\(v_i[i] = p_i\),
\(v_i[i+1..m] = A[i+1..m, i]\) (i.e., a column of the lower triangle of A after calling qr)
LINPACK Example
Let's work through the example from the QR decomposition article at Wikipedia in R.
The matrix being decomposed is
> A <- matrix(c(12, 6, -4, -51, 167, 24, 4, -68, -41), nrow=3)
> A
[,1] [,2] [,3]
[1,] 12 -51 4
[2,] 6 167 -68
[3,] -4 24 -41
We do the decomposition, and the most relevant portions of the result are shown below:
> Aqr = qr(A)
> Aqr
$qr
[,1] [,2] [,3]
[1,] -14.0000000 -21.0000000 14
[2,] 0.4285714 -175.0000000 70
[3,] -0.2857143 0.1107692 -35
[snip...]
$qraux
[1] 1.857143 1.993846 35.000000
[snip...]
This decomposition was done (under the covers) by computing two Householder reflections and multiplying them by A to get R. We will now recreate the reflections from the information in $qr.
> p = Aqr$qraux # for convenience
> v1 <- matrix(c(p[1], Aqr$qr[2:3,1]))
> v1
[,1]
[1,] 1.8571429
[2,] 0.4285714
[3,] -0.2857143
> v2 <- matrix(c(0, p[2], Aqr$qr[3,2]))
> v2
[,1]
[1,] 0.0000000
[2,] 1.9938462
[3,] 0.1107692
> I = diag(3) # identity matrix
> H1 = I - v1 %*% t(v1)/p[1] # I - v1*v1^T/p[1]
> H2 = I - v2 %*% t(v2)/p[2] # I - v2*v2^T/p[2]
> Q = H1 %*% H2
> Q
[,1] [,2] [,3]
[1,] -0.8571429 0.3942857 0.33142857
[2,] -0.4285714 -0.9028571 -0.03428571
[3,] 0.2857143 -0.1714286 0.94285714
Now let's verify the Q computed above is correct:
> qr.Q(Aqr)
[,1] [,2] [,3]
[1,] -0.8571429 0.3942857 0.33142857
[2,] -0.4285714 -0.9028571 -0.03428571
[3,] 0.2857143 -0.1714286 0.94285714
Looks good! We can also verify that QR equals A.
> R = qr.R(Aqr) # extract R from Aqr$qr
> Q %*% R
[,1] [,2] [,3]
[1,] 12 -51 4
[2,] 6 167 -68
[3,] -4 24 -41
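The manual steps above generalize to any m x n input. A sketch, assuming the LINPACK storage convention described earlier (\(v_i[i] = p_i\), the rest of \(v_i\) below the diagonal of $qr); `linpack_Q` is a hypothetical helper for illustration, and qr.Q() remains the supported way to get Q:

```r
# Hypothetical helper: rebuild Q from a LINPACK-style qr() result using
# H_i = I - v_i v_i^T / p_i, with v_i[i] = qraux[i] and the remaining
# entries of v_i taken from column i below the diagonal of $qr.
linpack_Q <- function(qrobj) {
  qr_mat <- qrobj$qr
  p <- qrobj$qraux
  m <- nrow(qr_mat)
  n <- ncol(qr_mat)
  Q <- diag(m)
  for (i in seq_len(min(m - 1, n))) {
    if (p[i] == 0) next  # a zero qraux entry means no reflection for this column
    v <- numeric(m)
    v[i] <- p[i]
    v[(i + 1):m] <- qr_mat[(i + 1):m, i]
    Q <- Q %*% (diag(m) - v %*% t(v) / p[i])
  }
  Q[, seq_len(min(m, n)), drop = FALSE]  # economy-size Q, as qr.Q() returns
}

A <- matrix(c(12, 6, -4, -51, 167, 24, 4, -68, -41), nrow = 3)
all.equal(linpack_Q(qr(A)), qr.Q(qr(A)))

set.seed(1)
B <- matrix(rnorm(15), nrow = 5)  # a tall 5 x 3 example
all.equal(linpack_Q(qr(B)), qr.Q(qr(B)))
```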
The LAPACK Way
A Householder reflection is computed as \(H_i = I - p_i v_i v_i^T\), where I is the identity matrix, \(p_i\) is the corresponding entry in $qraux, and \(v_i\) is as follows:
\(v_i[1..i-1] = 0\),
\(v_i[i] = 1\),
\(v_i[i+1..m] = A[i+1..m, i]\) (i.e., a column of the lower triangle of A after calling qr)
There is another twist when using the LAPACK routine in R: column pivoting is used, so the decomposition is solving a different, related problem: AP = QR, where P is a permutation matrix.
LAPACK Example
This section does the same example as before.
> A <- matrix(c(12, 6, -4, -51, 167, 24, 4, -68, -41), nrow=3)
> Bqr = qr(A, LAPACK=TRUE)
> Bqr
$qr
[,1] [,2] [,3]
[1,] 176.2554964 -71.1694118 1.668033
[2,] -0.7348557 35.4388886 -2.180855
[3,] -0.1056080 0.6859203 -13.728129
[snip...]
$qraux
[1] 1.289353 1.360094 0.000000
$pivot
[1] 2 3 1
attr(,"useLAPACK")
[1] TRUE
[snip...]
Notice the $pivot field; we will come back to that. Now we generate Q from the information in Bqr.
> p = Bqr$qraux # for convenience
> v1 = matrix(c(1, Bqr$qr[2:3,1]))
> v1
[,1]
[1,] 1.0000000
[2,] -0.7348557
[3,] -0.1056080
> v2 = matrix(c(0, 1, Bqr$qr[3,2]))
> v2
[,1]
[1,] 0.0000000
[2,] 1.0000000
[3,] 0.6859203
> H1 = I - p[1]*v1 %*% t(v1) # I - p[1]*v1*v1^T
> H2 = I - p[2]*v2 %*% t(v2) # I - p[2]*v2*v2^T
> Q = H1 %*% H2
> Q
[,1] [,2] [,3]
[1,] -0.2893527 -0.46821615 -0.8348944
[2,] 0.9474882 -0.01602261 -0.3193891
[3,] 0.1361660 -0.88346868 0.4482655
Once again, the Q computed above agrees with the R-provided Q.
> qr.Q(Bqr)
[,1] [,2] [,3]
[1,] -0.2893527 -0.46821615 -0.8348944
[2,] 0.9474882 -0.01602261 -0.3193891
[3,] 0.1361660 -0.88346868 0.4482655
Finally, let's compute QR.
> R = qr.R(Bqr)
> Q %*% R
[,1] [,2] [,3]
[1,] -51 4 12
[2,] 167 -68 6
[3,] 24 -41 -4
Notice the difference? QR is A with its columns permuted given the order in Bqr$pivot above.
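The LAPACK steps can be wrapped the same way, together with undoing the column permutation. A sketch, assuming the LAPACK convention described above (\(v_i[i] = 1\), \(p_i\) taken from $qraux as the scalar tau); `lapack_Q` is a hypothetical helper. In this example the last $qraux entry is 0, so the third reflection is the identity and including it in the loop is harmless:

```r
# Hypothetical helper: rebuild Q from a LAPACK qr() result using
# H_i = I - tau_i v_i v_i^T, with v_i[i] = 1 and tau_i = qraux[i].
lapack_Q <- function(qrobj) {
  qr_mat <- qrobj$qr
  tau <- qrobj$qraux
  m <- nrow(qr_mat)
  n <- ncol(qr_mat)
  k <- min(m, n)
  Q <- diag(m)
  for (i in seq_len(k)) {
    v <- numeric(m)
    v[i] <- 1
    if (i < m) v[(i + 1):m] <- qr_mat[(i + 1):m, i]
    Q <- Q %*% (diag(m) - tau[i] * v %*% t(v))
  }
  Q[, seq_len(k), drop = FALSE]
}

A <- matrix(c(12, 6, -4, -51, 167, 24, 4, -68, -41), nrow = 3)
Bqr <- qr(A, LAPACK = TRUE)
Q <- lapack_Q(Bqr)
all.equal(Q, qr.Q(Bqr))

# QR reproduces A[, pivot]; order(pivot) undoes the permutation
QR <- Q %*% qr.R(Bqr)
all.equal(QR, A[, Bqr$pivot])
A_back <- QR[, order(Bqr$pivot)]
all.equal(A_back, A)
```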
I have researched the same problem the OP asks about, and I don't think it is possible. Basically, the OP's question is whether, given the explicitly computed Q, one can recover H1 H2 ... Ht. I do not think this is possible without computing the QR from scratch, but I would also be very interested to know whether there is such a solution.
I have a similar issue to the OP's, but in a different context: my iterative algorithm needs to mutate the matrix A by adding columns and/or rows. The first time, the QR is computed using DGEQRF and is thus stored in the compact LAPACK format. After A is mutated, e.g. with new rows, I can quickly build a new set of reflectors or rotators that annihilate the non-zero elements of the lowest diagonal of my existing R and build a new R. But then I have a set of H1_old H2_old ... Hn_old and H1_new H2_new ... Hn_new (and similarly tau's), which can't be combined into a single compact QR representation. The two possibilities I have (and maybe the OP has the same two) are:
Always maintain Q and R explicitly and separately, both after the first factorization and after every update, at the cost of extra flops but keeping the required memory well bounded.
Stick to the compact LAPACK format, but every time a new update comes in, keep a list of all these mini-sets of update reflectors. At the point of solving the system, one would compute a big Q'*c, i.e. H1_u3*H2_u3*...*Hn_u3*H1_u2*H2_u2*...*Hn_u2*H1_u1*H2_u1...*Hn_u1*H1*H2*...*Hn*c, where ui is the QR update number. This is potentially a lot of multiplications to do and memory to keep track of, but definitely the fastest way.
The long answer from David basically explains what the compact QR format is but not how to get to this compact QR format having the explicit computed Q and R as input.
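For the first option, the explicit factors can be used to solve a system directly, without packing them back into a compact qr object (so this sidesteps, rather than answers, the question of recovering the reflectors). Since A[, pivot] = QR with Q having orthonormal columns, A x = b reduces to the triangular system R z = Q^T b followed by x[pivot] = z. A minimal base-R sketch, assuming A has full column rank; `solve_from_QR` is a hypothetical helper:

```r
# Hypothetical helper: solve A x = b from explicitly stored factors,
# assuming A[, pivot] = Q %*% R and A has full column rank.
solve_from_QR <- function(Q, R, b, pivot = seq_len(ncol(R))) {
  z <- backsolve(R, crossprod(Q, b))  # R z = Q^T b (R is upper triangular)
  x <- numeric(length(z))
  x[pivot] <- z                       # undo the column permutation
  x
}

A <- matrix(c(12, 6, -4, -51, 167, 24, 4, -68, -41), nrow = 3)
b <- c(1, 2, 3)

Bqr <- qr(A, LAPACK = TRUE)                        # pivoted LAPACK factors
x_lapack <- solve_from_QR(qr.Q(Bqr), qr.R(Bqr), b, Bqr$pivot)
Aqr <- qr(A)                                       # LINPACK factors
x_linpack <- solve_from_QR(qr.Q(Aqr), qr.R(Aqr), b, Aqr$pivot)

all.equal(x_lapack, solve(A, b))
all.equal(x_linpack, solve(A, b))
```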
