Average every n elements of axis 1 in a 3D numpy ndarray

I have a 3D numpy array of shape (900, 10, 54).
And I want to average the values of every two elements into one, for axis 1.
Expected outcome would have shape: (900,5,54).

One way to do this is with numpy array slicing:
import numpy as np
x = np.random.rand(900, 10, 54)
y = (x[:, ::2, :] + x[:, 1::2, :]) / 2
Another approach:
If you want to average a variable number n of consecutive elements along axis 1 (n was 2 above), you can use reshape and mean, provided n divides x.shape[1].
n = 2
y = x.reshape(x.shape[0], x.shape[1] // n, n, x.shape[2]) # shape = (900, 5, 2, 54)
y = y.mean(axis=2) # shape = (900, 5, 54)
This averages every n consecutive rows of each inner matrix in your 3D array x.
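As a quick check that the two approaches agree, here is a minimal sketch. The helper name block_mean_axis1 is made up for illustration, and it assumes n divides the length of axis 1.

```python
import numpy as np

def block_mean_axis1(x, n):
    """Average every n consecutive elements along axis 1 of a 3D array.

    Hypothetical helper; assumes x.shape[1] is a multiple of n.
    """
    if x.shape[1] % n != 0:
        raise ValueError("n must divide x.shape[1]")
    # Split axis 1 into (groups, n), then average over the new length-n axis.
    return x.reshape(x.shape[0], x.shape[1] // n, n, x.shape[2]).mean(axis=2)

x = np.random.rand(900, 10, 54)
y_reshape = block_mean_axis1(x, 2)
y_slice = (x[:, ::2, :] + x[:, 1::2, :]) / 2
```

Both results should have shape (900, 5, 54) and match up to floating-point rounding.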


How to generate an elliptical cylinder, populate it with randomly distributed points, and measure instances of overlap between those points in R

I would like to determine the probability that a randomly distributed object of Type A occupies or touches (overlaps) the same space as any randomly distributed object of Type B when populated inside an elliptical cylinder. I would then like to loop this simulation many times to generate a more reliable probability value.
I am able to draw the elliptical cylinder using the shape package:
library(shape)
emptyplot(c(-5, 5), c(-15, 15), main = "filled elliptic cylinder")
filledcylinder(rx = 9, ry = 5, len= 2, angle = 00, col = "white",
lcol = "black", lcolint = "grey")
I do not know how to add points (i.e. objects A and B) to this graph. However, I suspect graphical expression is not the way to go with this task (though I find visualising helpful). I suspect a better approach will be to create a function to describe the elliptical cylinder, similar to the cone in the following example, and run the simulation without graphical output:
# Create a function to describe a cone
cone <- function(x, y){
sqrt(x ^ 2 + y ^ 2)
}
# prepare variables.
x <- y <- seq(-1, 1, length = 30)
z <- outer(x, y, cone)
# plot as a 3D surface for visual reference (even though I actually want a volume)
persp(x, y, z,
main="Perspective Plot of a Cone",
zlab = "Height",
theta = 30, phi = 15,
col = "orange", shade = 0.4)
Sadly I do not know how to do this for my elliptical cylinder. I am aware of the parameters for describing an elliptical cylinder from the following source:
https://mathworld.wolfram.com/EllipticCylinder.html
Unfortunately, I do not understand much of it. I hope the dimensions given in my filledcylinder can act as a guide. Ultimately the dimension values do not matter, what matters is the code structure into which values can be entered.
As for the objects:
Let there be 50 Type A objects and 50 Type B objects of size x=0.4, y=0.4, z=0.4 (same units as in my graphical elliptical cylinder example).
All objects are to be distributed at random within the volume of the elliptical cylinder, with the exception that objects of Type A cannot overlap with another object of Type A, and objects of Type B cannot overlap with other objects of Type B. Type A objects may overlap with Type B objects.
I would like to output the number of Type A objects that overlap with any Type B object in the given volume, this number as a percentage of total Type A objects, and as a percentage of total all objects for each run of the simulation.
I do not know how to even start to do this.
If you can help, I'm afraid statistics, geometry and non-basic R expressions will need to be explained as if to a (not particularly bright) child.
Thank you very very much for your time!
An implementation with heavily commented code for explanations. This assumes the A- and B-type objects must be entirely within the elliptical cylinder.
library(data.table)
rObj <- function(rx, ry, h, n, dims, eps = 2) {
# Function to create a random sample (by rejection) of non-overlapping
# rectangular prism objects inside an elliptical cylinder whose ellipse is
# centered at x = 0, y = 0 and whose height ranges from -dims[3]/2 to h -
# dims[3]/2. The objects have dimensions (x, y, z) = dims, and all edges are
# parallel or orthogonal to each of the x, y, or z axes.
# INPUTS:
# rx: length of the ellipse
# ry: width of the ellipse
# h: height of the elliptical cylinder
# n: number of non-overlapping objects to return
# dims: dimensions of the rectangular prism objects (vector of length 3)
# eps: oversampling factor
# OUTPUT: a data.table with 3 columns and n rows. Each row gives the
# coordinates of the centroid of a sampled object
dt <- data.table()
while(nrow(dt) < n) {
# increase oversampling if it is not the first pass
if (nrow(dt)) eps <- eps*2
rho <- sqrt(runif(eps*n))
phi <- runif(eps*n, 0, 2*pi)
dt <- data.table(
# sample object centroids
# see https://stackoverflow.com/questions/5529148/algorithm-calculate-pseudo-random-point-inside-an-ellipse
# First, uniformly sample on an ellipse centered on x = 0, y = 0,
# with xlength = rx - dims[1] and ylength = ry - dims[2]
# (any object with a centroid outside of this ellipse will stick out of
# the elliptical cylinder, although some with a centroid within the
# smaller ellipse will still stick out of the elliptical cylinder).
x = (rx - dims[1])/2*rho*cos(phi),
y = (ry - dims[2])/2*rho*sin(phi),
# uniformly sample centroid heights
z = runif(eps*n, 0, h - dims[3])
)[
# remove objects that stick out of bounds
# The ellipse satisfies (x/(rx/2))^2 + (y/(ry/2))^2 = 1, which is the
# same as (x/rx)^2 + (y/ry)^2 = 0.25. Taking advantage of symmetry, add
# half of the x and y dimensions of the objects to the absolute value of
# x and y (the object corner furthest from the center of the ellipse) and
# check if the result satisfies the standard equation.
((abs(x) + dims[1]/2)/rx)^2 + ((abs(y) + dims[2]/2)/ry)^2 < 0.25
][
# remove objects that overlap a previously placed object
# Since each rectangular prism object is oriented with the x, y, z axes,
# two objects overlap if they are closer than their lengths in each
# dimension.
tabulate(
sequence((.N - 1L):1, 2:.N)[ # row numbers (always keep the first row)
(dist(x) < dims[1]) & (dist(y) < dims[2]) & (dist(z) < dims[3])
],
.N
) == 0L
]
}
dt[1:n] # keep the first n objects
}
# function to get pairwise distances between two vectors
dist2 <- function(x, y) abs(outer(x, y, "-"))
fsim <- function(rx, ry, h, nA, nB, dimA, dimB, nreps, eps = 2) {
# function to simulate placement of A and B rectangular prism objects inside
# an elliptical cylinder and count the number of A-type objects that
# intersect at least one B-type object. All object edges are parallel or
# orthogonal to each of the x, y, or z axes.
# INPUTS:
# rx: length of the ellipse
# ry: width of the ellipse
# h: height of the elliptical cylinder
# nA: number of non-overlapping A-type objects to return
# nB: number of non-overlapping B-type objects to return
# dimA, dimB: dimensions of the A- and B-type rectangular prism objects (vectors of length 3)
# nreps: the number of replications to simulate
# eps: oversampling factor when randomly sampling non-overlapping objects
# by rejection
# OUTPUT: vector of length "nreps" giving the number of A-type objects that
# intersect at least one B-type object for each replication
dims <- rowMeans(cbind(dimA, dimB)) # average dimensions of the A and B objects
out <- integer(nreps) # initialize the output vector
# repeat the simulation "nreps" times
for (i in 1:nreps) {
# get the coordinates of the A- and B-type objects' centroids
A <- rObj(rx, ry, h, nA, dimA, eps)
B <- rObj(rx, ry, h, nB, dimB, eps)
# count the number of A-type objects that intersect at least one B-type
# object
out[i] <- sum(rowSums((dist2(A$x, B$x) < dims[1])*(dist2(A$y, B$y) < dims[2])*(dist2(A$z, B$z) < dims[3])) != 0L)
}
out
}
Time 10K simulation replications:
system.time(overlaps <- fsim(9, 5, 2, 50L, 50L, rep(0.4, 3), rep(0.4, 3), 1e4L))
#> user system elapsed
#> 27.19 0.25 27.67
mean(overlaps)
#> [1] 18.7408
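The overlap test used in fsim can be illustrated outside R as well. The following NumPy sketch is not the answer's code; the function name count_overlapping and the sample coordinates are made up. It applies the same rule: two axis-aligned boxes of equal size overlap when their centroids are closer than the box length in every dimension.

```python
import numpy as np

def count_overlapping(a_centres, b_centres, dims):
    """Count A boxes that overlap at least one B box.

    a_centres, b_centres: arrays of shape (nA, 3) and (nB, 3) with centroids.
    dims: box edge lengths (length 3), assumed identical for A and B here.
    """
    # Pairwise absolute centroid differences, shape (nA, nB, 3).
    diff = np.abs(a_centres[:, None, :] - b_centres[None, :, :])
    # A pair overlaps only if it is closer than dims along x, y and z.
    overlaps = (diff < np.asarray(dims)).all(axis=2)
    return int(overlaps.any(axis=1).sum())

a = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
b = np.array([[0.3, 0.3, 0.3], [2.0, 2.0, 2.0]])
n_hits = count_overlapping(a, b, [0.4, 0.4, 0.4])
```

Here only the first A box is within 0.4 of a B box on all three axes, so n_hits is 1.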
One approach to get an approximate answer to this problem is to discretize things. Set up a volume as a 3 dimensional array of zeros, then randomly generate the parameters of your shapes one at a time.
For each generated shape, find all the elements of the array that would be inside the shape. If any locations would be outside the cylinder or overlap a shape of the same type, try again. Once you have a legal shape, mark those array entries (e.g. 1 for type A, 2 for type B). Do all type A first, then all type B, and keep count of the times when shape B occupies a space that was previously marked for shape A.
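A rough sketch of this discretized approach is given here in Python rather than R. Everything in it is an assumption made for illustration: the function name simulate_once, the voxel size, the semi-axes 4.5 and 2.5 (half of the 9 and 5 used in the question), and the restriction of cubes to grid-aligned positions.

```python
import numpy as np

def simulate_once(rng, rx=4.5, ry=2.5, h=2.0, n_each=50, size=0.4, step=0.1):
    """One run of the voxel approach; returns the number of A cubes hit by B."""
    k = int(round(size / step))              # voxels per cube edge
    nx = int(round(2 * rx / step))
    ny = int(round(2 * ry / step))
    nz = int(round(h / step))
    # Voxel centres in x and y, used to flag cells inside the ellipse.
    xc = (np.arange(nx) + 0.5) * step - rx
    yc = (np.arange(ny) + 0.5) * step - ry
    inside = (xc[:, None] / rx) ** 2 + (yc[None, :] / ry) ** 2 <= 1.0
    grids = {t: np.zeros((nx, ny, nz), dtype=bool) for t in "AB"}
    boxes = {"A": [], "B": []}
    for t in "AB":                            # place all A first, then all B
        while len(boxes[t]) < n_each:
            i = rng.integers(0, nx - k + 1)
            j = rng.integers(0, ny - k + 1)
            l = rng.integers(0, nz - k + 1)
            sl = (slice(i, i + k), slice(j, j + k), slice(l, l + k))
            if not inside[sl[0], sl[1]].all():
                continue                      # sticks out of the cylinder
            if grids[t][sl].any():
                continue                      # overlaps a cube of the same type
            grids[t][sl] = True
            boxes[t].append(sl)
    # An A cube counts if any of its voxels is also marked for type B.
    return int(sum(bool(grids["B"][sl].any()) for sl in boxes["A"]))

count = simulate_once(np.random.default_rng(0))
```

Calling simulate_once in a loop and averaging the counts gives an approximate overlap rate; a smaller step improves the approximation at the cost of memory and time.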

Find the common area between two graphs with multiple intersection points

I have the following simulated data for two variables. I created the density plots as follows:
set.seed(1)
x1=density(rnorm(100,0.5,3))
x2=density(rnorm(100,1,3))
plot(x1)
lines(x2)
Is there a function I can use to find the common area of these two graphs in R?
Do I need to integrate between the intersection points?
Thank you
If you set the sequence both densities use for x values to be identical, you can use pmin on the y values. (Call str(x1) to see how they're stored.) For instance, to see how it works:
set.seed(1)
x1 <- density(rnorm(100,0.5,3), from = -10, to = 10, n = 501)
x2 <- density(rnorm(100,1,3), from = -10, to = 10, n = 501)
plot(x2, main = 'Density intersection')
lines(x1)
polygon(x1$x, pmin(x1$y, x2$y), 20, col = 'dodgerblue')
Taking the integral means just multiplying each pmin times the increment in the x sequence and summing the lot:
sum(pmin(x1$y, x2$y) * diff(x1$x[1:2]))
#> [1] 0.896468
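The same pointwise-minimum idea can be sanity-checked in Python. This sketch swaps the kernel density estimates for the exact normal densities that the samples were drawn from (an assumption made so the example is self-contained), which also allows a comparison against the closed-form overlap for two normals with equal standard deviation.

```python
import math
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution evaluated on an array x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Shared x grid for both curves, wide enough that the tails are negligible.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
f1 = normal_pdf(x, 0.5, 3.0)
f2 = normal_pdf(x, 1.0, 3.0)

# Riemann sum of the pointwise minimum, as in the R one-liner.
overlap = float(np.sum(np.minimum(f1, f2)) * dx)

# Closed form for equal sd: 2 * Phi(-|m1 - m2| / (2 * sd)) = 1 + erf(z / sqrt(2)).
z = -abs(0.5 - 1.0) / (2 * 3.0)
closed_form = 1.0 + math.erf(z / math.sqrt(2))
```

The numerical and closed-form values should agree closely. The estimate from the sampled data differs from both because kernel density estimates of 100 points are noisy.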

How to plot Daubechies psi and phi wavelet functions in R?

The analysis with wavelets seems to be carried out as a discrete transform via matrix multiplication. So it is not surprising, I guess, that when plotting, for example, D4, the R package wmtsa returns the plot:
require(wmtsa)
filters <- wavDaubechies("d4")
plot(filters)
The question is how to go from this discretized plot to the plot in the Wikipedia entry:
Please note that I'm not interested in generating these curves precisely with wmtsa. Any other package will do - I don't have Matlab or Mathematica. But I wonder if the way to go is to start with translating this Mathematica chunk of code in this paper into R, rather than using built-in functions:
WaveletTransform.m
c[k_] := c[k] = Daubechies[4][[k+1]];
phi[1] = (1+Sqrt[3])/2 // N;
phi[2] = (1-Sqrt[3])/2 // N;
phi[x_ /; x<=0 || x>=3] := 0
phi[x_?NumberQ] := phi[x] =
N[Sqrt[2]] Sum[c[k] phi[2x-k],{k,0,3}];
In order to plot the wavelet and scaling function all you need are the four numbers shown in the first two plots. I'll focus on plotting the scaling function.
Integer shifts of the scaling function, 𝜑, form an orthonormal basis of the subspace V_0 of the multiresolution analysis. We also have that V_{-1} ⊆ V_0 and that 𝜑(x/2) ∈ V_{-1}. Using this gives us the identity
𝜑(x/2) = 2 ∑_{k ∈ ℤ} h_k 𝜑(x-k)
(the factor of 2 goes with the normalization of the h_k used here, which sum to 1).
Now we just need the values of h_k. For the Daubechies wavelet these are the values shown in the discrete plot you gave (and zero for every other value of k). For an exact value of the h_k, first let 𝜇 = (1+sqrt(3))/2. Then we have that
h_0 = 𝜇/4
h_1 = (1+𝜇)/4
h_2 = (2-𝜇)/4
h_3 = (1-𝜇)/4
and h_k = 0 otherwise.
Using these two things we are able to plot the function using what is known as the cascade algorithm. First notice that 𝜑(0) = 𝜑(0/2) = 2(h_0𝜑(0) + h_1𝜑(0-1) + h_2𝜑(0-2) + h_3𝜑(0-3)). The only way this equation can hold is if 𝜑(0) = 𝜑(-1) = 𝜑(-2) = 𝜑(-3) = 0. Extending this will show that for x ≤ 0 we have that 𝜑(x) = 0. Furthermore, a similar argument can show that 𝜑(x) = 0 for x ≥ 3.
Thus, we only need to worry about x = 1 and x = 2 to find non-zero values of 𝜑 for integer values of x. If we put x = 2 into the identity for 𝜑(x/2) we get that 𝜑(1) = 2(h_0𝜑(2) + h_1𝜑(1)). Putting x = 4 into the identity gives us that 𝜑(2) = 2(h_2𝜑(2) + h_3𝜑(1)).
We can rewrite the above two equations as a matrix multiplied by a vector equals a vector. In fact, it will be in the form v = Av (v is the same vector on both sides). This means that v is an eigenvector of the matrix A with eigenvalue 1. But v = (𝜑(1), 𝜑(2)) and so by finding this eigenvector using the standard methods we will be able to find the values of 𝜑(1) and 𝜑(2).
In fact, this gives us that 𝜑(1) = (1+sqrt(3))/2 and 𝜑(2) = (1-sqrt(3))/2 (this is where those values in the Mathematica code sample come from). Also note that we need to specifically choose the eigenvector scaled so that 𝜑(1) + 𝜑(2) = 1 for this algorithm to work, so you must use those values for 𝜑(1) and 𝜑(2) even though you could rescale the eigenvector.
Now we can find the values of 𝜑(1/2), 𝜑(3/2), and 𝜑(5/2). For example, 𝜑(1/2) = 2h_0𝜑(1) and 𝜑(3/2) = 2(h_1𝜑(2) + h_2𝜑(1)).
With these values, you can then find the values of 𝜑(1/4), 𝜑(3/4), and so on. Continuing this process will give you the value of 𝜑 for all dyadic rationals (rational numbers of the form k/2^j).
The same process can be used to find the wavelet function. You only need to use the four different values shown in the first plot rather than the four shown in the second plot.
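The eigenvector step can be checked numerically. This is a small sketch, assuming the two-scale relation carries the same factor of 2 that the Python implementation in this answer applies (2*phi) and using the h_k values listed above. The matrix name A and the normalization by the sum of the entries are choices made for the illustration.

```python
import numpy as np

mu = (1 + np.sqrt(3)) / 2
h = np.array([mu / 4, (1 + mu) / 4, (2 - mu) / 4, (1 - mu) / 4])

# v = A v with v = (phi(1), phi(2)):
#   phi(1) = 2 * (h0 * phi(2) + h1 * phi(1))
#   phi(2) = 2 * (h2 * phi(2) + h3 * phi(1))
A = 2 * np.array([[h[1], h[0]],
                  [h[3], h[2]]])

eigvals, eigvecs = np.linalg.eig(A)
idx = np.argmin(np.abs(eigvals - 1))   # pick the eigenvalue closest to 1
v = np.real(eigvecs[:, idx])
v = v / v.sum()                        # rescale so phi(1) + phi(2) = 1

expected = np.array([(1 + np.sqrt(3)) / 2, (1 - np.sqrt(3)) / 2])
```

After rescaling, v should match the starting values (1+sqrt(3))/2 and (1-sqrt(3))/2 used to seed the cascade.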
I recently implemented this in Python. An R implementation will be fairly similar.
import numpy as np
import matplotlib.pyplot as plt

def cascade_algorithm(j: int):
    mu = (1 + np.sqrt(3))/2
    h_k = np.array([mu/4, (1+mu)/4, (2-mu)/4, (1-mu)/4])
    # Array to store all the values of phi: row 0 holds x, row 1 holds phi(x).
    phi_vals = np.zeros((2, 3*2**j+1), dtype=np.float64)
    for i in range(3*2**j+1):
        phi_vals[0][i] = i/(2**j)
    # np.bool and np.int were removed from recent NumPy; use the builtins.
    calced_vals = np.zeros((3*2**j+1), dtype=bool)
    # Input values for 1 and 2.
    phi_vals[1][1*2**j] = (1+np.sqrt(3))/2
    phi_vals[1][2*2**j] = (1-np.sqrt(3))/2
    # We now know the values for 0, 1, 2, and 3.
    calced_vals[0] = True
    calced_vals[1*2**j] = True
    calced_vals[2*2**j] = True
    calced_vals[3*2**j] = True
    # Now calculate for all the dyadic rationals.
    for k in range(1, j+1):
        for l in range(1, 3*2**k):
            x = l/(2**k)
            if calced_vals[int(x*2**j)] != True:
                calced_vals[int(x*2**j)] = True
                two_x = 2*x
                which_k = np.array([0, 1, 2, 3], dtype=int)
                # Keep only the shifts k for which 2x - k lies inside (0, 3).
                which_k = ((two_x - which_k > 0) & (two_x - which_k < 3))
                phi = 0
                for n, _ in enumerate(which_k):
                    if which_k[n] == True:
                        phi += h_k[n]*phi_vals[1][int((two_x-n)*2**j)]
                phi_vals[1][int(x*2**j)] = 2*phi
    return phi_vals

phi_vals = cascade_algorithm(10)
plt.plot(phi_vals[0], phi_vals[1])
plt.show()
If you just want to plot the graphs, then you can use the package "wavethresh" to plot for example the D4 with the following commands:
draw.default(filter.number=4, family="DaubExPhase", enhance=FALSE, main="D4 Mother", scaling.function = F) # mother wavelet
draw.default(filter.number=4, family="DaubExPhase", enhance=FALSE, main="D4 Father", scaling.function = T) # father wavelet
Notice that the mother wavelet and the father wavelets will be plotted depending on the variable "scaling.function". If true, then it plots the father wavelet (scaling), else it plots the mother wavelet.
If you want to generate it yourself, without packages, I'd suggest following the Daubechies-Lagarias algorithm, in this paper. It is not hard to implement.

TensorFlow: Take L2 norm over multiple dimensions

I have a TensorFlow placeholder with 4 dimensions representing a batch of images. Each image is 32 x 32 pixels, and each pixel has 3 color channels. The first dimension represents the number of images.
X = tf.placeholder(tf.float32, [None, 32, 32, 3])
For each image, I would like to take the L2 norm of all the image's pixels. Thus, the output should be a tensor with one dimension (i.e. one value per image). tf.norm() accepts an axis parameter, but it only lets me specify up to two axes over which to take the norm, when I would like to take the norm over axes 1, 2, and 3. How do I do this?
n = tf.norm(X, ord=2, axis=0) # n.get_shape() is (?, ?, 3), not (?)
n = tf.norm(X, ord=2, axis=[1,2,3]) # ValueError
You do not need the flattening that was suggested in the other answer. If you read the documentation carefully, you will see:
axis: If axis is None (the default), the input is considered a vector
and a single vector norm is computed over the entire set of values in
the tensor, i.e. norm(tensor, ord=ord) is equivalent to
norm(reshape(tensor, [-1]), ord=ord)
Example:
import tensorflow as tf
import numpy as np
c = tf.constant(np.random.rand(3, 2, 3, 6))
d = tf.norm(c, ord=2)
with tf.Session() as sess:
    print(sess.run(d))
I tried Salvador's answer but it looks like that returns one number for the whole minibatch instead of one number per image. So it looks like we may be stuck with doing the norm per dimension.
import tensorflow as tf
import numpy as np
batch = tf.constant(np.random.rand(3, 2, 3, 6))
x = tf.norm(batch, axis=3)
x = tf.norm(x, axis=2)
x = tf.norm(x, axis=1)
with tf.Session() as sess:
    result = sess.run(x)
    print(result)
This might introduce a small amount of numerical instability but in theory it's the same as taking the norm of the whole image at once.
You might also think about only taking the norm over the x and y axes so that you get one norm per channel. There's a reason why that's supported by tensorflow and this isn't.
You can compute the L2-norm by yourself like this:
tf.sqrt(tf.reduce_sum(tf.pow(images,2), axis=(1,2,3)))
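The arithmetic behind these answers can be checked without TensorFlow. The following NumPy sketch (the array name images and the batch size of 4 are assumptions) compares the direct sum-of-squares formula, a flatten-then-norm version, and the nested per-axis norms.

```python
import numpy as np

images = np.random.rand(4, 32, 32, 3)

# Direct formula: square, sum over every non-batch axis, take the root.
direct = np.sqrt(np.sum(images ** 2, axis=(1, 2, 3)))

# Flatten each image to a vector and take an ordinary vector 2-norm.
flat = np.linalg.norm(images.reshape(images.shape[0], -1), axis=1)

# Nested norms, one axis at a time, as in the answer that chains tf.norm.
nested = np.linalg.norm(
    np.linalg.norm(np.linalg.norm(images, axis=3), axis=2), axis=1
)
```

All three arrays have shape (4,) and agree up to rounding, because the square root of an inner norm is undone by the squaring in the next one.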

Levelplot in R on irregular grid

I would like to create a 2D levelplot in R where the x and y coordinates are from an irregular grid without using interpolation. The grid is given below:
grid<-cbind(seq(from=0.05,to=0.5,by=0.05),seq(from=0.05,to=0.5,by=0.05))
grid<-rbind(grid,cbind(seq(from=0.0,to=0.95,by=0.05),seq (from=0.05,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.9,by=0.05),seq (from=0.1,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.85,by=0.05),seq(from=0.15,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.75,by=0.05),seq(from=0.25,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.80,by=0.05),seq(from=0.20,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.70,by=0.05),seq(from=0.30,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.65,by=0.05),seq(from=0.35,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.60,by=0.05),seq(from=0.40,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.55,by=0.05),seq(from=0.45,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.50,by=0.05),seq(from=0.50,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.40,by=0.05),seq(from=0.60,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.45,by=0.05),seq(from=0.55,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=.35,by=0.05),seq(from=0.65,to=1,by=0.05)))
grid<-rbind(grid,cbind(seq(from=0,to=0.30,by=0.05),seq(from=0.70,to=1,by=0.05)))
x=grid[,1]
y=grid[,2]
The Z-values are stored in another vector. I have tried to use the image-function, but without any luck. For instance, if I try
image(x,y,height.vals)
where
height.vals=matrix(runif(dim(grid)[1]),nrow=dim(grid)[1],ncol=1)
I get an error message saying that x and y should be increasing.
One could use the akima function interp, but then I get interpolated data.
Looks like you have points on a 20 x 20 grid. So, you can create a 20 x 20 matrix and fill it with the values from height.vals.
With a little bit of tweaking, you can turn the x and y values into indices of the matrix and use those indices to assign height.vals to the appropriate places in the matrix.
# Turn the x and y values into integers.
# R doesn't take 0 as an index, so add 1 to the x values to get rid of the 0s
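# round() guards against floating-point error in seq() values before truncating to integer
inds <- cbind(x = as.integer(round(20*x + 1)), y = as.integer(round(20*y)))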
# create the 20 x 20 matrix
m <- matrix(nrow = 20, ncol = 20)
# fill the matrix with height.vals based on the indices
m[inds] <- height.vals
Then, you can use m as an input to functions like image, filled.contour, and lattice::levelplot
image(m)
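For comparison, the same index trick can be sketched in NumPy. The sample coordinates and values here are made up, and rounding is used before converting to integers because multiples of 0.05 are not exact in floating point.

```python
import numpy as np

# A few illustrative grid points (x in 0..0.95, y in 0.05..1, step 0.05).
x = np.array([0.0, 0.05, 0.15, 0.95])
y = np.array([0.05, 0.10, 0.30, 1.00])
vals = np.array([1.0, 2.0, 3.0, 4.0])

# Convert coordinates to zero-based indices; rint avoids truncation errors.
ix = np.rint(20 * x).astype(int)        # 0..19
iy = np.rint(20 * y).astype(int) - 1    # 0..19

# Cells without data stay NaN, so nothing is interpolated.
m = np.full((20, 20), np.nan)
m[ix, iy] = vals
```

The matrix m can then be passed to a plotting function that skips NaN cells, for example matplotlib's imshow.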
