Rcpp submat from a big sparse matrix

I am trying to multiply a vector by a subset of a very big sparse matrix (see the script below), but it fails to compile when using sourceCpp, reporting the error: no matching function for call to ‘arma::SpMat<double>::submat(arma::uvec&, arma::uvec&)’. I would much appreciate it if someone could do me a favour.
#include <RcppArmadillo.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
double myf(sp_mat X, vec g, uvec xi){
  double u = g(xi).t() * X.submat(xi, xi) * g(xi);
  return u;
}

So, as @RalfStubner mentioned, submatrix access for sparse matrices is restricted to contiguous blocks, so X.submat(xi, xi) with an arbitrary index vector is not available. That said, the access pattern here is symmetric because the same index vector is used for rows and columns. In that case it makes sense to fall back to the standard (row, column) element accessor, and the summation reduces to a single loop over the diagonal entries X(xi[i], xi[i]). (Note that this reduction matches the full quadratic form g(xi)' X(xi, xi) g(xi) only when the off-diagonal entries of the selected submatrix are zero, as they are in the test data below.)
#include <RcppArmadillo.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
double submat_multiply(const arma::sp_mat& X,
                       const arma::vec& g, const arma::uvec& xi){

  // Add an assertion
  if(X.n_rows != g.n_elem) {
    Rcpp::stop("Dimension mismatch: X has %s rows but g has %s elements.",
               X.n_rows, g.n_elem);
  }

  // Reduction
  double summed = 0;
  for (unsigned int i = 0; i < xi.n_elem; ++i) {
    // Retrieve indexing element
    arma::uword index_at_i = xi(i);
    // Add components together
    summed += g(index_at_i) * X(index_at_i, index_at_i) * g(index_at_i);
  }

  // Return result
  return summed;
}
Another approach, potentially more costly, would be to extract the diagonal of the sparse matrix and convert it to a dense vector. From there, apply an element-wise multiplication and sum.
#include <RcppArmadillo.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
double submat_multiply_v2(const arma::sp_mat& X,
                          const arma::vec& g, const arma::uvec& xi){

  // Add an assertion
  if(X.n_rows != g.n_elem) {
    Rcpp::stop("Dimension mismatch: X has %s rows but g has %s elements.",
               X.n_rows, g.n_elem);
  }

  // Copy sparse diagonal to dense vector
  arma::vec x_diag(X.diag());

  // Obtain the subset
  arma::vec g_sub = g.elem(xi);

  // Perform element-wise multiplication and then sum.
  double summed = arma::sum(g_sub % x_diag.elem(xi) % g_sub);

  // Return result
  return summed;
}
Test code:
# Sparse matrix
library(Matrix)
i <- c(1,4:8,10); j <- c(2, 9, 6:10); x <- 7 * (1:7)
X <- sparseMatrix(i, j, x = x)
X
# 10 x 10 sparse Matrix of class "dgCMatrix"
#
# [1,] . 7 . . . . . . . .
# [2,] . . . . . . . . . .
# [3,] . . . . . . . . . .
# [4,] . . . . . . . . 14 .
# [5,] . . . . . 21 . . . .
# [6,] . . . . . . 28 . . .
# [7,] . . . . . . . 35 . .
# [8,] . . . . . . . . 42 .
# [9,] . . . . . . . . . .
# [10,] . . . . . . . . . 49
# Vector
g <- 1:10
# Indices (zero-based, as expected on the C++/Armadillo side)
xi <- c(0, 3, 4, 9)
# Above function
submat_multiply(X, g, xi)
# [1] 4900
submat_multiply_v2(X, g, xi)
# [1] 4900
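As a quick cross-check, the same quantity can be computed directly in R on a dense copy of the selected submatrix; note the + 1 that converts the zero-based indices in xi back to R's one-based indexing. This evaluates the full quadratic form, which coincides with the diagonal-only reduction for this particular test matrix.
# Base-R cross-check of the 4900 result (indices shifted to one-based)
ri <- xi + 1
Xs <- as.matrix(X[ri, ri])   # small dense copy of the selected submatrix
drop(t(g[ri]) %*% Xs %*% g[ri])
# [1] 4900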

Related

Why isn't column-wise operation much faster than row-wise operation (as it should be) for a matrix in R

Consider the following functions, which store values row-wise and column-wise.
#include <Rcpp.h>
using namespace Rcpp;
const int m = 10000;
const int n = 3;
// [[Rcpp::export]]
SEXP rowWise() {
  SEXP A = Rf_allocMatrix(INTSXP, m, n);
  int* p = INTEGER(A);
  int i, j;
  for (i = 0; i < m; i++) {
    for (j = 0; j < n; j++) {
      p[m * j + i] = j;
    }
  }
  return A;
}
// [[Rcpp::export]]
SEXP columnWise() {
  SEXP A = Rf_allocMatrix(INTSXP, n, m);
  int* p = INTEGER(A);
  int i, j;
  for (j = 0; j < m; j++) {
    for (i = 0; i < n; i++) {
      p[n * j + i] = i;
    }
  }
  return A;
}
/*** R
library(microbenchmark)
gc()
microbenchmark(
rowWise(),
columnWise(),
times = 1000
)
*/
The above code yields
Unit: microseconds
         expr    min     lq     mean  median      uq       max neval
    rowWise() 12.524 18.631 64.24991 20.4540 24.8385 10894.353  1000
 columnWise() 11.803 19.434 40.08047 20.9005 24.1585  8590.663  1000
Assigning values row-wise is faster (or at least no slower) than assigning them column-wise, which is counter-intuitive to what I believed.
However, it does seem to depend, almost magically, on the values of m and n. So I guess my question is: why is columnWise not much faster than rowWise?
The dimension (shape) of the matrix has an impact.
When we do a row-wise scan of a 10000 x 3 integer matrix A, we can still cache effectively. For simplicity of illustration, I assume that each column of A is aligned to a cache line.
--------------------------------------
A[1, 1] A[1, 2] A[1, 3] M M M
A[2, 1] A[2, 2] A[2, 3] H H H
. . . . . .
. . . . . .
A[16,1] A[16,2] A[16,3] H H H
--------------------------------------
A[17,1] A[17,2] A[17,3] M M M
A[18,1] A[18,2] A[18,3] H H H
. . . . . .
. . . . . .
A[32,1] A[32,2] A[32,3] H H H
--------------------------------------
A[33,1] A[33,2] A[33,3] M M M
A[34,1] A[34,2] A[34,3] H H H
. . . . . .
. . . . . .
A 64-byte cache line can hold 16 integers. When we access A[1, 1], a full cache line is filled, that is, A[1, 1] to A[16, 1] are all loaded into cache. When we scan a row A[1, 1], A[1, 2], A[1, 3], a 16 x 3 block of the matrix is loaded into cache, which is much smaller than the cache capacity (32 KB). While we have a cache miss (M) for each element in the 1st row, when we start to scan the 2nd row, we have a cache hit (H) for every element. So we have a periodic pattern as such:
[3 Misses] -> [45 Hits] -> [3 Misses] -> [45 Hits] -> ...
That is, we have on average a cache miss ratio of 3 / 48 = 1 / 16 = 6.25%. In fact, this equals the cache miss ratio when we scan A column-wise, where we have the following periodic pattern:
[1 Miss] -> [15 Hits] -> [1 Miss] -> [15 Hits] -> ...
Now try a 5000 x 5000 matrix. In that case, after reading the first row, 16 x 5000 elements have been fetched into cache, which is much larger than the cache capacity, so eviction has already kicked out A[1, 1] to A[16, 1] (most caches use a "least recently used" (LRU) line replacement policy). When we come back to scan the 2nd row, we have to fetch A[2, 1] from RAM again. So a row-wise scan gives a cache miss ratio of 100%. In contrast, a column-wise scan only has a cache miss ratio of 1 / 16 = 6.25%. In this example, we will observe that the column-wise scan is much faster.
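A rough R-level illustration of the same effect (R matrices are stored column-major, so writing down a column touches contiguous memory while writing across a row is strided; absolute timings will vary by machine):
# Column-wise writes are contiguous, row-wise writes are strided
m <- matrix(0L, nrow = 5000, ncol = 5000)
system.time(for (j in seq_len(ncol(m))) m[, j] <- j)  # contiguous: fast
system.time(for (i in seq_len(nrow(m))) m[i, ] <- i)  # strided: noticeably slower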
In summary, with a 10000 x 3 matrix, we get the same cache performance whether we scan it by row or by column. I don't see that rowWise is faster than columnWise in the median times reported by microbenchmark. Their execution times may not be exactly equal, but the difference is too small to be of concern.
For a 5000 x 5000 matrix, rowWise is much slower than columnWise.
Thanks for verification.
Remark
The "golden rule" that we should ensure sequential memory access in the innermost loop is a general guideline for efficiency, but don't interpret it too narrowly.
In fact, if you treat the columns of A as vectors x, y, z and consider the element-wise addition (i.e., the row-wise sum of A), z[i] = x[i] + y[i], are we not getting sequential access for all of these vectors? Doesn't this fall under the "golden rule"? Scanning a 10000 x 3 matrix by row is no different from alternately reading three vectors sequentially, and this is very efficient.
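For instance, the element-wise addition in that remark is just ordinary vector arithmetic (a small R illustration; the object names are made up):
A <- matrix(rnorm(10000 * 3), ncol = 3)
x <- A[, 1]; y <- A[, 2]
z <- x + y   # every vector is traversed sequentially, exactly as the "golden rule" asks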

R equivalent of the Matlab spy function

In Matlab, there is a function named spy that displays the structure of a sparse matrix. It creates a plot of the matrix's dimensions in which each entry with a nonzero value is coloured. Is there an equivalent function in R?
image() from Matrix is one option.
library(Matrix)
# Example from ?Matrix:::sparseMatrix
i <- c(1,3:8); j <- c(2,9,6:10); x <- 7 * (1:7)
A <- sparseMatrix(i, j, x = x)
print(A)
##8 x 10 sparse Matrix of class "dgCMatrix"
##[1,] . 7 . . . . . . . .
##[2,] . . . . . . . . . .
##[3,] . . . . . . . . 14 .
##[4,] . . . . . 21 . . . .
##[5,] . . . . . . 28 . . .
##[6,] . . . . . . . 35 . .
##[7,] . . . . . . . . 42 .
##[8,] . . . . . . . . . 49
image(A)
Getting the output of spy() into R takes a little more work.
In MATLAB (2011b):
spy()
h = gcf;
axObj = get(h, 'Children');
datObj = get(axObj, 'Children');
xdata = get(datObj,'XData');
ydata = get(datObj,'YData');
spyMat = [xdata; ydata];
csvwrite('spydat.csv',spyMat);
And in R:
library(Matrix)
# csvwrite() writes no header row, so read it back without one
spyData <- as.matrix(read.csv("spydat.csv", header = FALSE))
spyMat <- t(sparseMatrix(spyData[1, ], spyData[2, ]))
image(spyMat)
A simple function that duplicates the Matlab spy() function in R, based on above ideas, is:
library(Matrix)
spy <- function(w){
  # Get indices of entries not equal to zero
  inds <- which(w != 0, arr.ind = TRUE)
  # Create a sparse matrix with ones where w is non-zero
  A <- sparseMatrix(inds[, 1], inds[, 2], x = rep(1, nrow(inds)))
  # Plot it
  image(A)
}
This may be useful for some applications.
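As a quick check, applying the helper to a dense copy of the example matrix A defined above should reproduce the image(A) plot shown earlier:
spy(as.matrix(A))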

R apply() function slow for large row sizes while using %in% or == operators?

I am trying to figure out why apply functions (with multiple %in% and == operators inside) get terribly slow for very large row counts.
A relevant discussion is in this link; however, for my particular case, I believe vectorization alone might not solve my problem (am I correct to assume this?):
apply() is slow - how to make it faster or what are my alternatives?
I am adding the code to generate a representative data for my problem and the associated benchmarking results.
set.seed(123)
# Representative data
data <- matrix(rnorm(25000*1000),byrow=T,ncol=1000,nrow=25000)
tmp_data <- data
# Discretizing the data
data[tmp_data <=-1] = -2
data[tmp_data >= 1] = 2
data[tmp_data > -1 & tmp_data < 1] = 0
rm(tmp_data)
rownames(data) <- paste("Gene",c(1:nrow(data)),sep="_")
colnames(data) <- paste("Sample",c(1:ncol(data)),sep="_")
# Pair combination of any 2000 sampled rownames
gene_sample <- rownames(data)[sample(c(1:nrow(data)), 2000, replace=F)]
gene_pairs <- t(combn(gene_sample,2))
# Different size of rows to be generated for speed testing
test_size = c(500, 1000, 5000, 10000, 20000, 50000, 100000)
time_measure <- list()
for (i in 1:length(test_size)) {
  sample_rows <- sample(nrow(gene_pairs), test_size[i], replace = FALSE)
  gene_pairs_sel <- gene_pairs[sample_rows, ]
  proc_time <- system.time(
    # The actual analysis I perform within the apply function.
    # The aim is to calculate a co-occurrence score and something like a mutual
    # information score for the co-occurrence of 2's in the given gene pairs
    # (i.e. per row) from the main data. Simply put: all the common samples
    # having 2's for both row names (the pair of genes).
    result <- t(apply(gene_pairs_sel, 1, function(y){
      pat1 <- data[rownames(data) %in% y[1], , drop = FALSE]
      pat1 <- colnames(pat1[, which(pat1 == 2)])
      pat2 <- data[rownames(data) %in% y[2], , drop = FALSE]
      pat2 <- colnames(pat2[, which(pat2 == 2)])
      common_pat <- intersect(pat1, pat2)
      if (length(common_pat) > 0) {
        mis <- round((length(common_pat)/ncol(data)) * log2((length(common_pat)/ncol(data))/((length(pat1)/ncol(data)) * (length(pat2)/ncol(data)))), 3)
        co_occur <- round(length(common_pat)/ncol(data), 3)
      } else {
        mis <- 0
        co_occur <- 0
      }
      return(c(sort(as.character(y[c(1, 2)])), co_occur, mis))
    }))
  )
  time_measure[[i]] <- proc_time
}
names(time_measure) <- paste("For_row_size_of",test_size,sep="_")
## time_measure ##
$For_row_size_500
user system elapsed
2.569 0.000 2.571
$For_row_size_1000
user system elapsed
5.000 0.000 5.001
$For_row_size_5000
user system elapsed
25.498 0.212 25.715
$For_row_size_10000
user system elapsed
50.271 0.124 50.389
$For_row_size_20000
user system elapsed
100.942 0.012 100.956
$For_row_size_50000
user system elapsed
250.760 0.356 251.134
$'For_row_size_1e+05'
user system elapsed
496.655 0.712 497.410
As you can all see, the computing time grows steadily (roughly linearly) with the number of rows!
The row sizes that I am dealing with are at least 3 times bigger than the maximum size (i.e. nrow = 100000) that I have used here for benchmarking. The whole analysis, with ~500 matrices of large and varying row sizes (> 100000), is taking me far too long to compute. Is there any way to speed this up substantially by some manipulation of (or within) apply()?
I was hoping for a solution without having to resort to parallelization, as I am not too familiar with R packages like snow or multicore, but I am not averse to using them if need be :-)
Help is much appreciated!
Regards
Ashwin
For some data frame of gene pairs
sample_rows <- sample(nrow(gene_pairs),test_size[i],replace=FALSE)
df <- data.frame(gene1=gene_pairs[sample_rows, 1],
                 gene2=gene_pairs[sample_rows, 2],
                 stringsAsFactors=FALSE)
The focus is on data values equal to 2 so let's get that out of the way
data2 = data == 2
We need the number of samples of gene 1 and gene 2
df$n1 <- rowSums(data2[df$gene1,])
df$n2 <- rowSums(data2[df$gene2,])
and the number of times genes 1 and 2 co-occur
df$n12 <- rowSums(data2[df$gene1,] & data2[df$gene2,])
The statistics are then
df$co_occur <- df$n12 / ncol(data)
tmp <- df$n1 * df$n2 / (ncol(data) * ncol(data))
df$mis <- df$co_occur * log2(df$co_occur / tmp)
There is no need for an explicit loop. As a slightly modified function we might have
cooccur <- function(data, gene1, gene2) {
  data <- data == 2
  x1 <- rowSums(data)[gene1] / ncol(data)
  x2 <- rowSums(data)[gene2] / ncol(data)
  x12 <- rowSums(data[gene1,] & data[gene2,]) / (ncol(data)^2)
  data.frame(gene1=gene1, gene2=gene2,
             co_occur=x12, mis=x12 * log2(x12 / (x1 * x2)))
}
If there are very many rows in df, then it would make sense to process these in groups of say 500000. This still scales linearly, but is about 25x faster (e.g., about 3s for 10000 rows) than the original implementation. There are probably significant further space / time speed-ups to be had, particularly by treating the data matrix as sparse. No guarantees that I've accurately parsed the original code.
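A minimal chunking sketch along those lines (it assumes the df and cooccur objects defined above; the chunk size is just the value suggested here):
chunk_size <- 500000
groups <- split(seq_len(nrow(df)), (seq_len(nrow(df)) - 1) %/% chunk_size)
res <- do.call(rbind,
               lapply(groups, function(i) cooccur(data, df$gene1[i], df$gene2[i])))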
This can be optimized a little by looking up the character-based row index once and using the integer index instead, i1 <- match(gene1, rownames(data)), etc., but the main memory and speed limitation is the calculation of x12. It's relatively easy to implement this in C, using the inline package. We might as well go for broke and use multiple cores if available
library(inline)
xprod <- cfunction(c(data="logical", i1="integer", i2="integer"), "
    const int n = Rf_length(i1),
        nrow = INTEGER(Rf_getAttrib(data, R_DimSymbol))[0],
        ncol = INTEGER(Rf_getAttrib(data, R_DimSymbol))[1];
    const int *d = LOGICAL(data),
        *row1 = INTEGER(i1),
        *row2 = INTEGER(i2);
    SEXP result = PROTECT(Rf_allocVector(INTSXP, n));
    memset(INTEGER(result), 0, sizeof(int) * n);
    int *sum = INTEGER(result);
    for (int j = 0; j < ncol; ++j) {
        const int j0 = j * nrow - 1;
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            sum[i] += d[j0 + row1[i]] * d[j0 + row2[i]];
    }
    UNPROTECT(1);
    return result;
", cxxargs="-fopenmp -O3", libargs="-lgomp")
A more optimized version is then
cooccur <- function(data, gene1, gene2) {
  data <- (data == 2)[rownames(data) %in% c(gene1, gene2), , drop=FALSE]
  n2 <- ncol(data)^2
  i1 <- match(gene1, rownames(data))
  i2 <- match(gene2, rownames(data))
  x <- rowSums(data)
  x_12 <- x[i1] * x[i2] / n2
  x12 <- xprod(data, i1, i2) / n2
  data.frame(gene1=gene1, gene2=gene2,
             co_occur=x12, mis=x12 * log2(x12 / x_12))
}
This handles 1,000,000 gene pairs for me in about 2 s. It still scales linearly with the number of gene pairs; the OpenMP parallel evaluation isn't supported under the clang compiler, and this seems like one of those relatively rare situations where my code, on my processor, benefited substantially from rearrangement to localize data access.
Here is what I see when I Rprof your code: half the time is spent in the %in% function. The output below shows the hierarchy of the function calls. The code ran for 23 seconds, and all of that time was within "FUN", which is within the "apply" call.
0 23.2 root
1. 23.2 "source"
2. . 23.2 "withVisible"
3. . . 23.2 "eval"
4. . . . 23.2 "eval"
5. . . . . 23.2 "system.time"
6. . . . . . 23.2 "t"
7. . . . . . . 23.2 "apply"
8. . . . . . . . 23.1 "FUN"
9. . . . . . . . . 11.7 "%in%" ##half the time is here
10. . . . . . . . . . 10.9 "match"
11. . . . . . . . . . . 0.0 "rownames"
9. . . . . . . . . 0.5 "colnames"
10. . . . . . . . . . 0.4 "is.data.frame"
11. . . . . . . . . . . 0.3 "which"
12. . . . . . . . . . . . 0.2 "=="
10. . . . . . . . . . 0.0 "NCOL"
9. . . . . . . . . 0.3 "intersect"
10. . . . . . . . . . 0.3 "unique"
11. . . . . . . . . . . 0.0 "unique.default"
11. . . . . . . . . . . 0.0 "match"
10. . . . . . . . . . 0.0 "as.vector"
9. . . . . . . . . 0.3 "sort"
10. . . . . . . . . . 0.1 "sort.default"
11. . . . . . . . . . . 0.1 "sort.int"
12. . . . . . . . . . . . 0.0 "any"
12. . . . . . . . . . . . 0.0 "is.na"
9. . . . . . . . . 0.1 "c"
6. . . . . . 0.0 "gc"

MPI_Gather of columns

I have an array which is split up by columns between the processes for my calculation. Afterwards I want to gather this array in one process (0).
Each process has its columns saved in array A, process 0 has an array F for collecting the data. The F-array is of size n*n, each process has part_size columns, so the local arrays A are n*part_size. Columns are sent to alternating processes - c0 goes to p0, c1 to p1, c2 to p0 again and so on.
I created new datatypes for sending and receiving the columns.
On all processes:
MPI_Type_vector(n, 1, part_size, MPI::FLOAT, &col_send);
MPI_Type_commit(&col_send);
On process 0:
MPI_Type_vector(n, 1, n, MPI::FLOAT, &col_recv);
MPI_Type_commit(&col_recv);
Now I would like to gather the array as follows:
MPI_Gather(&A, part_size, col_send, &F, part_size, col_recv, 0, MPI::COMM_WORLD);
However the result is not as expected. My example has n = 4 and two processes. As a result the values from p0 should be in columns 0 and 2 of F and p1 should be stored in 1 and 3. Instead both columns of p0 are stored in 0 and 1, while the values of p1 are not there at all.
0: F[0][0]: 8.31786
0: F[0][1]: 3.90439
0: F[0][2]: -60386.2
0: F[0][3]: 4.573e-41
0: F[1][0]: 0
0: F[1][1]: 6.04768
0: F[1][2]: -60386.2
0: F[1][3]: 4.573e-41
0: F[2][0]: 0
0: F[2][1]: 8.88266
0: F[2][2]: -60386.2
0: F[2][3]: 4.573e-41
0: F[3][0]: 0
0: F[3][1]: 0
0: F[3][2]: -60386.2
0: F[3][3]: 4.573e-41
I'll admit that I'm out of ideas on this one. I obviously misunderstood how Gather or Type_vector works and saves their values. Could someone point me in the right direction? Any help would be much appreciated.
The problem that I see is that the datatype created with MPI_Type_vector() has extent going from the first to the last item. For example:
The extent for your col_recv datatype is between > and < (I hope this representation of the mask is clear enough):
>x . . .
x . . .
x . . .
x<. . .
That is 13 MPI_FLOAT items (must be read by row, that's C ordering).
Receiving two of them will lead to:
>x . . .
x . . .
x . . .
x y . .
. y . .
. y . .
. y . .
That clearly is not what you want.
To let MPI_Gather() properly skip data on the receiver, you need to set the extent of col_recv to exactly ONE element. You can do this by using MPI_Type_create_resized():
>x<. . .
x . . .
x . . .
x . . .
so that receiving successive blocks gets correctly interleaved:
x y . .
x y . .
x y . .
x y . .
However receiving two columns instead of one will lead to:
x x y y
x x y y
x x y y
x x y y
That again is not what you want, even if closer.
Since you want interleaved columns, you need to create a more complex datatype, capable of describing all the columns, again with an extent of one element:
The stride between columns is measured in units of the previously defined column type's extent (one element), not its size (4 elements):
>x<. x .
x . x .
x . x .
x . x .
Receiving one of them per process, you'll get what you want:
x y x y
x y x y
x y x y
x y x y
You can do this with MPI_Type_create_darray() as well, since it allows creating datatypes suitable for the block-cyclic distribution used by ScaLAPACK; your layout is a 1D special case of it.
I have also tried it. Here is working code for two processes:
#include <mpi.h>
#include <stdio.h>

#define N 4
#define NPROCS 2
#define NPART (N/NPROCS)

int main(int argc, char **argv) {
    float a_send[N][NPART];
    float a_recv[N][N] = {0};
    MPI_Datatype column_send_type;
    MPI_Datatype column_recv_type;
    MPI_Datatype column_send_type1;
    MPI_Datatype column_recv_type1;
    MPI_Datatype matrix_columns_type;
    MPI_Datatype matrix_columns_type1;

    MPI_Init(&argc, &argv);
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Fill the local columns with rank- and position-dependent values */
    for(int i=0; i<N; ++i) {
        for(int j=0; j<NPART; ++j) {
            a_send[i][j] = my_rank*100+10*(i+1)+(j+1);
        }
    }

    /* One column of the local N x NPART array, resized to an extent of one float */
    MPI_Type_vector(N, 1, NPART, MPI_FLOAT, &column_send_type);
    MPI_Type_commit(&column_send_type);
    MPI_Type_create_resized(column_send_type, 0, sizeof(float), &column_send_type1);
    MPI_Type_commit(&column_send_type1);

    /* One column of the full N x N array, resized to an extent of one float */
    MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_recv_type);
    MPI_Type_commit(&column_recv_type);
    MPI_Type_create_resized(column_recv_type, 0, sizeof(float), &column_recv_type1);
    MPI_Type_commit(&column_recv_type1);

    /* NPART interleaved columns (stride NPROCS), again with an extent of one float */
    MPI_Type_vector(NPART, 1, NPROCS, column_recv_type1, &matrix_columns_type);
    MPI_Type_commit(&matrix_columns_type);
    MPI_Type_create_resized(matrix_columns_type, 0, sizeof(float), &matrix_columns_type1);
    MPI_Type_commit(&matrix_columns_type1);

    MPI_Gather(a_send, NPART, column_send_type1, a_recv, 1, matrix_columns_type1, 0, MPI_COMM_WORLD);

    if (my_rank==0) {
        for(int i=0; i<N; ++i) {
            for(int j=0; j<N; ++j) {
                printf("%4.0f ", a_recv[i][j]);
            }
            printf("\n");
        }
    }

    MPI_Finalize();
}

Two rectangles on a plane

For the last few days I have been trying to solve this problem. I even have a solution, but I can't figure it out. Can someone help me?
Here is the problem:
You are given two rectangles on a plane.
The centers of both rectangles are located in the origin of coordinates
(meaning the center of the rectangle's symmetry).
The first rectangle's sides are parallel to the coordinate axes:
the length of the side that is parallel to the Ox axis, equals w,
the length of the side that is parallel to the Oy axis, equals h.
The second rectangle can be obtained by rotating the first rectangle
relative to the origin of coordinates by angle α.
Example:
http://i.imgur.com/qi1WQVq.png
Your task is to find the area of the region which belongs to both
given rectangles. This region is shaded in the picture.
Input
The first line contains three integers w, h, α (1 ≤ w, h ≤ 10^6; 0 ≤ α ≤ 180). Angle α is given in degrees.
Output
In a single line print a real number — the area of the region which belongs to both given rectangles.
The answer will be considered correct if its relative or absolute error doesn't exceed 10^-6.
Sample test(s)
input
1 1 45
output
0.828427125
input
6 4 30
output
19.668384925
Here is a possible implementation:
<?php
list($w, $h, $alphaInt) = explode(' ', '34989 23482 180');

if ($alphaInt == 0 || $alphaInt == 180) {
    $res = $h * $w;
}
else if ($alphaInt == 90) {
    $res = $h * $h;
}
else {
    if ($alphaInt > 90) $alphaInt = 180 - $alphaInt;
    $alpha = $alphaInt / 180.0 * M_PI;
    //echo '$alpha:' . $alpha . "\n";

    $cos = cos($alpha);
    $sin = sin($alpha);
    //echo '$cos: ' . $cos . "\n";
    //echo '$sin: ' . $sin . "\n";

    $c = $w / 2 * $cos + $h / 2 * $sin - $w / 2;
    //echo '$c: ' . $c . "\n";
    $r1 = $c / $cos;
    $r2 = $c / $sin;
    //echo '$r1: ' . $r1 . "\n";
    //echo '$r2: ' . $r2 . "\n";

    $c = $w / 2 * $sin + $h / 2 * $cos - $h / 2;
    //echo '$c: ' . $c . "\n";
    $r3 = $c / $cos;
    $r4 = $c / $sin;
    //echo '$r3: ' . $r3 . "\n";
    //echo '$r4: ' . $r4 . "\n";

    if ($r1 < 0 || $r2 < 0 || $r3 < 0 || $r4 < 0) {
        $res = $h * $h / $sin; //$res = $w * $w / $cos;
    }
    else {
        $res = $h * $w - $r1 * $r2 - $r3 * $r4;
    }
}
echo '$res: ' . $res . "\n";
Small alpha
When w*sin(alpha) < h*(1 + cos(alpha)) (i.e., before the vertices of the new rectangle meet the vertices of the old one for the first time), the area of the intersection is the area of the original rectangle (w * h) minus 4 triangles (2 pairs of identical ones). Let the bigger triangle have hypotenuse a and the smaller one hypotenuse b; then the area is
A = w * h - a*a*cos(alpha)*sin(alpha) - b*b*cos(alpha)*sin(alpha)
The sides of the original rectangle satisfy a system of equations:
a + a * cos(alpha) + b * sin(alpha) = w
a * sin(alpha) + b + b * cos(alpha) = h
Using the half-angle formulas,
a * cos(alpha/2) + b * sin(alpha/2) = w/(2*cos(alpha/2))
a * sin(alpha/2) + b * cos(alpha/2) = h/(2*cos(alpha/2))
thus, solving this 2x2 system (its determinant is cos^2(alpha/2) - sin^2(alpha/2) = cos(alpha)),
a^2 + b^2 = ((w^2 + h^2) - 2*w*h*sin(alpha)) / (2*(1 + cos(alpha))*cos(alpha)^2)
and
A = h * w - ((w^2 + h^2) - 2*w*h*sin(alpha)) * sin(alpha) / (2*(1 + cos(alpha))*cos(alpha))
(checking against the samples: w = h = 1, alpha = 45° gives 0.828427, and w = 6, h = 4, alpha = 30° gives 19.668385)
Bigger alpha
When alpha is bigger (but still alpha < pi/2), the intersection is a parallelogram (actually, a rhombus) whose two altitudes equal h and whose four sides equal h/sin(alpha); the area is thus h*h/sin(alpha) (yes, it does not depend on w!).
Other alpha
Use symmetry to reduce alpha to [0;pi/2] and use one of the two cases above.
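Here is a minimal R sketch of the case analysis above (the helper name is made up), checked against the two sample answers quoted in the problem statement:
rect_intersection_area <- function(w, h, alpha_deg) {
  if (alpha_deg > 90) alpha_deg <- 180 - alpha_deg   # "Other alpha": use symmetry
  alpha <- alpha_deg * pi / 180
  if (w * sin(alpha) >= h * (1 + cos(alpha))) {
    # "Bigger alpha": the intersection is a rhombus with altitude h
    h * h / sin(alpha)
  } else {
    # "Small alpha": rectangle minus four corner triangles
    # (alpha = 90 with w < h is a degenerate case not covered by this formula)
    w * h - ((w^2 + h^2) - 2 * w * h * sin(alpha)) * sin(alpha) /
      (2 * (1 + cos(alpha)) * cos(alpha))
  }
}
rect_intersection_area(1, 1, 45)   # 0.8284271
rect_intersection_area(6, 4, 30)   # 19.66838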
You might try describing both rectangles as solution sets of a system of four linear inequalities each.
The set of points in their intersection is the set of solutions to both sets of linear inequalities.
You want the area of that set of solutions. You can find all points at which at least two of your eight inequalities are tight. Filter out those that don't satisfy all of the inequalities. Then take their convex hull using Graham's scan and compute the area using the surveyor's formula.
This method works to find the intersection of any two convex polygons. Slightly modified, it generalises (in the form of Fourier-Motzkin elimination and the double description method for computing the intersection and determinants for volume calculation) to convex polyhedra in any dimension.
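A rough R sketch of that approach, under the stated assumptions (all names are made up; the hull step uses R's built-in chull(), a Graham-type scan, and the area uses the surveyor's/shoelace formula):
# Each rectangle is {p : A p <= b}; the intersection stacks all 8 inequalities.
rect_halfplanes <- function(w, h, alpha = 0) {
  R <- matrix(c(cos(alpha), -sin(alpha),
                sin(alpha),  cos(alpha)), 2, 2, byrow = TRUE)
  A <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1)) %*% t(R)  # rotated constraints
  list(A = A, b = c(w/2, w/2, h/2, h/2))
}

intersection_area <- function(w, h, alpha) {
  hp1 <- rect_halfplanes(w, h, 0)
  hp2 <- rect_halfplanes(w, h, alpha)
  A <- rbind(hp1$A, hp2$A)
  b <- c(hp1$b, hp2$b)
  pts <- NULL
  for (i in 1:(nrow(A) - 1)) {
    for (j in (i + 1):nrow(A)) {
      if (abs(det(A[c(i, j), ])) < 1e-12) next      # parallel boundary lines
      p <- solve(A[c(i, j), ], b[c(i, j)])          # point where both are tight
      if (all(A %*% p <= b + 1e-9)) pts <- rbind(pts, p)  # keep feasible vertices
    }
  }
  hull <- pts[chull(pts), , drop = FALSE]           # convex hull of the vertices
  x <- hull[, 1]; y <- hull[, 2]
  abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2  # shoelace formula
}

intersection_area(1, 1, 45 * pi / 180)   # 0.8284271
intersection_area(6, 4, 30 * pi / 180)   # 19.66838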
