What happens when you merge branches in Keras with different shapes? (keras-2)

Below is the partial code in Keras. I am trying to understand what add does: why is the output of the Add layer (None, 38, 300) when it adds two inputs with different shapes?
image_model = Input(shape=(2048,))
x = Dense(units=EMBEDDING_DIM, activation="relu")(image_model)  # (None, 300)
x = BatchNormalization()(x)
language_model = Input(shape=(MAX_CAPTION_SIZE,))
y = Embedding(input_dim=VOCABULARY_SIZE, output_dim=EMBEDDING_DIM)(language_model)  # (None, 38, 300)
y = Dropout(0.5)(y)
merged = add([x, y])  # (None, 38, 300)
merged = LSTM(256, return_sequences=False)(merged)
merged = Dense(units=VOCABULARY_SIZE)(merged)
merged = Activation("softmax")(merged)

Why is the output of the Add layer (None, 38, 300) when adding two inputs with different shapes here?
It's a technique called broadcasting. You can find more details here: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
In the example below, the first input, of shape (16,), is broadcast along the second input's extra dimension of size 2 (shape (2, 16)), so that the element-wise addition can happen.
import keras
import numpy as np

input1 = keras.layers.Input(shape=(16,))      # (None, 16)
input2 = keras.layers.Input(shape=(2, 16))    # (None, 2, 16)
added = keras.layers.Add()([input1, input2])  # broadcast to (None, 2, 16)

model = keras.models.Model(inputs=[input1, input2], outputs=added)
output = model.predict([np.ones((1, 16)), np.ones((1, 2, 16))])
print(output.shape)
print(output)
(1, 2, 16)
[[[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]]
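The same thing happens in your model: x has shape (None, 300) and y has shape (None, 38, 300), so x is repeated along the caption axis. Here is a plain NumPy sketch of the same broadcast (the sizes 38 and 300 are inferred from the output shape you reported):
import numpy as np

x = np.ones((1, 300))      # image branch: (batch, EMBEDDING_DIM)
y = np.ones((1, 38, 300))  # language branch: (batch, MAX_CAPTION_SIZE, EMBEDDING_DIM)

# x is treated as (1, 1, 300) and repeated along the middle axis.
print((x + y).shape)  # (1, 38, 300)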

Related

Why are Shapely 'difference' and 'intersection' returning unexpected result?

I have defined three Shapely linestrings and three Shapely polygons, which overlap/intersect each other in various places as shown in the annotated image below.
My understanding is that a shapely 'difference' operation should return the parts of the lines that are outside of the polygons. I'm not sure why, but when I perform a 'difference' operation, it seems to be keeping part of a line that is within one of the polygons. This is shown in the following plot where I have compared the original polygons to the output of the difference operation.
Note that, similarly, if I run an 'intersection' operation, it is missing this small segment. Can anyone explain why this is the case? Code to generate everything shown above is as follows:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, LineString
#Define lines and polygons:
linkID = ['1', '2', '3']
link_geom = [LineString([(0, 0), (10, 10)]),
             LineString([(10, 10), (20, 10)]),
             LineString([(20, 10), (25, 15)])]
gdf_links = gpd.GeoDataFrame({'linkID': linkID, 'geometry': link_geom})
polyID = ['100', '200', '300']
poly_geom = [Polygon([(2, 1), (2, 3), (4, 3), (4, 1)]),
             Polygon([(15, 7), (15, 13), (18, 13), (18, 7)]),
             Polygon([(19, 7), (19, 13), (21, 13), (21, 7)])]
gdf_poly = gpd.GeoDataFrame({'polyID': polyID, 'geometry': poly_geom})
links = gdf_links.unary_union
polys = gdf_poly.unary_union
#Show plot of lines and polygons together:
gpd.GeoSeries([links, polys]).plot(cmap='tab10')
#Split links at polygons, keeping segments that are outside of polygons:
difference = gdf_links.difference(gdf_poly).reset_index(drop=True)
#Plot resulting 'difference' vs original polygons:
diff = difference.unary_union
gpd.GeoSeries([diff, polys]).plot(cmap='tab10')
You are performing the 'difference' operation row by row on your three separate lines and polygons: the first line is cropped by the first box only, the second line by the second box only, and the third line by the third box only. The solution is to difference the lines against the joined polygon dataset, so they are all cropped against all boxes. To do that, change the line:
difference = gdf_links.difference(gdf_poly).reset_index(drop=True)
to:
difference = gdf_links.difference(polys).reset_index(drop=True)
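To illustrate, here is a minimal sketch with toy geometries (not the question's data): GeoSeries.difference with another GeoSeries aligns the two by index and differences row by row, while differencing against a single unioned geometry applies that geometry to every row.
import geopandas as gpd
from shapely.geometry import LineString, Polygon

lines = gpd.GeoSeries([LineString([(0, 0), (4, 0)]),
                       LineString([(0, 2), (4, 2)])])
boxes = gpd.GeoSeries([Polygon([(1, -1), (1, 3), (2, 3), (2, -1)]),
                       Polygon([(3, -1), (3, 3), (3.5, 3), (3.5, -1)])])

# Row-wise: line 0 is clipped by box 0 only, line 1 by box 1 only.
print(lines.difference(boxes).length.tolist())               # [3.0, 3.5]

# Against the union: every line is clipped by every box.
print(lines.difference(boxes.unary_union).length.tolist())   # [2.5, 2.5]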

How may I fill a 3D array in R?

I have a 3D array as below:
prob = array(0, c(7, 7, 7))
Now, I need to refill it with random numbers as below:
pop = sample(1:100, 7**3, replace=TRUE)
pop = pop/sum(pop)
If I simply assign the value to it, all the dimensions of prob are removed:
prob = pop
print(dim(prob))
The output of the print is:
> print(dim(prob))
NULL
So apparently prob = pop erases the dimensions. How can I assign the data but keep the 3D dimensions?
You can perform subset assignment as follows:
prob[] = pop
This will replace the values but preserve dimensions and other attributes.
However, this seems unnecessary in your case: why assign after the fact, when you can initialise?
pop = sample(1 : 100, 7 ** 3, replace = TRUE)
prob = array(pop / sum(pop), c(7, 7, 7))
There’s no need to pre-assign prob as a zero array, and in fact I’d consider that an anti-pattern: in general you should treat variables as read-only, unless there are specific reasons to reassign/modify them (and there rarely are).

Word mapping for 2D word embedding

For my Master's thesis, I created a Word2Vec model. I wanted to show this image to clarify the result. But how does the mapping work to display the words in this 2D space?
Each word is represented by a 300-dimensional vector. How are those vectors mapped onto this 2D image? What are the x and y axes?
Code:
w2v_model.build_vocab(documents)
words = w2v_model.wv.vocab.keys()
vocab_size = len(words)
print("Vocab size", vocab_size)
w2v_model.train(documents, total_examples=len(documents), epochs=W2V_EPOCH)

tokenizer = Tokenizer()
tokenizer.fit_on_texts(df_train.text)
vocab_size = len(tokenizer.word_index) + 1
print("Total words", vocab_size)

x_train = pad_sequences(tokenizer.texts_to_sequences(df_train.text), maxlen=SEQUENCE_LENGTH)
x_test = pad_sequences(tokenizer.texts_to_sequences(df_test.text), maxlen=SEQUENCE_LENGTH)

labels = df_train.target.unique().tolist()
labels.append(NEUTRAL)

encoder = LabelEncoder()
encoder.fit(df_train.target.tolist())
y_train = encoder.transform(df_train.target.tolist())
y_test = encoder.transform(df_test.target.tolist())
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

embedding_matrix = np.zeros((vocab_size, W2V_SIZE))
for word, i in tokenizer.word_index.items():
    if word in w2v_model.wv:
        embedding_matrix[i] = w2v_model.wv[word]
print(embedding_matrix.shape)

embedding_layer = Embedding(vocab_size, W2V_SIZE, weights=[embedding_matrix], input_length=SEQUENCE_LENGTH, trainable=False)
There are a couple of approaches.
The first is to use PCA (principal components analysis), and plot the first component on the x-axis, the second component on the y-axis (and throw away the other components).
You don't say which library you are using to generate your word vectors, and it might come with its own PCA function. But sklearn has one: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
(https://machinelearningmastery.com/develop-word-embeddings-python-gensim/ has some ready-made code showing making the vectors with gensim, then plotting them with that function.)
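Here is a minimal sketch of that approach, assuming a trained gensim-style w2v_model as in your code (the 200-word subset is only to keep the plot legible):
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

words = list(w2v_model.wv.vocab.keys())[:200]  # plot a subset for legibility
vectors = [w2v_model.wv[w] for w in words]     # each vector has W2V_SIZE (300) dims

coords = PCA(n_components=2).fit_transform(vectors)  # shape (200, 2)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=4)
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y), fontsize=6)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
The x and y axes are then the first two principal components: the two directions along which the 300-dimensional vectors vary the most.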
The other approach you could try is just to plot the first two dimensions of your word vectors. This is defensible because the dimensions of a word vector should carry roughly equal weight, i.e. any two of the 300 dimensions should give you about as much information as any other two.
But using PCA is the more normal approach for visualization.

How to plot Daubechies psi and phi wavelet functions in R?

The analysis with wavelets seems to be carried out as a discrete transform via matrix multiplication. So it is not surprising, I guess, that when plotting, for example, D4, the R package wmtsa returns the plot:
require(wmtsa)
filters <- wavDaubechies("d4")
plot(filters)
The question is how to go from this discretized plot to the plot in the Wikipedia entry:
Please note that I'm not set on generating these curves with wmtsa specifically; any other package will do, as I don't have Matlab or Mathematica. But I wonder whether the way to go is to start by translating this chunk of Mathematica code from this paper into R, rather than using built-in functions:
WaveletTransform.m
c[k_] := c[k] = Daubechies[4][[k+1]];
phi[1] = (1+Sqrt[3])/2 // N;
phi[2] = (1-Sqrt[3])/2 // N;
phi[x_ /; x <= 0 || x >= 3] := 0
phi[x_?NumberQ] := phi[x] =
    N[Sqrt[2]] Sum[c[k] phi[2x-k], {k,0,3}];
In order to plot the wavelet and scaling function all you need are the four numbers shown in the first two plots. I'll focus on plotting the scaling function.
Integer shifts of the scaling function, φ, form an orthonormal basis of the subspace V0 of the multiresolution analysis. We also have that V-1 ⊆ V0 and that φ(x/2) ∈ V-1. Using this gives us the identity
φ(x/2) = 2 ∑k∈ℤ hk φ(x − k)
Now we just need the values of hk. For the Daubechies wavelet these are (up to a constant normalisation factor) the values shown in the discrete plot you gave, and zero for every other value of k. For an exact value of the hk, first let μ = (1 + √3)/2. Then we have that
h0 = μ/4
h1 = (1 + μ)/4
h2 = (2 − μ)/4
h3 = (1 − μ)/4
and hk = 0 otherwise. (With this normalisation ∑k hk = 1, which is why the identity above carries the factor of 2; the Python code below multiplies each computed value by 2 for the same reason.)
Using these two things we are able to plot the function using what is known as the cascade algorithm. First notice that φ(0) = φ(0/2) = 2(h0 φ(0) + h1 φ(−1) + h2 φ(−2) + h3 φ(−3)). The only way this equation can hold is if φ(0) = φ(−1) = φ(−2) = φ(−3) = 0. Extending this shows that φ(x) = 0 for x ≤ 0, and a similar argument shows that φ(x) = 0 for x ≥ 3.
Thus, we only need to worry about x = 1 and x = 2 to find the non-zero values of φ at the integers. Putting x = 2 into the identity for φ(x/2) gives φ(1) = 2(h0 φ(2) + h1 φ(1)). Putting x = 4 into the identity gives φ(2) = 2(h2 φ(2) + h3 φ(1)).
We can rewrite these two equations as a matrix multiplied by a vector equalling a vector, in fact in the form v = Av (the same vector v on both sides). This means that v is an eigenvector of the matrix A with eigenvalue 1. But v = (φ(1), φ(2)), so by finding this eigenvector with standard methods we obtain the values of φ(1) and φ(2).
In fact, this gives φ(1) = (1 + √3)/2 and φ(2) = (1 − √3)/2 (this is where those values in the Mathematica code sample come from). Note that an eigenvector is only determined up to scale: the cascade algorithm needs the one normalised so that φ(1) + φ(2) = 1, so you must use exactly those values even though you could rescale the eigenvector.
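As a quick check, here is a small NumPy sketch (not part of the original code) that recovers φ(1) and φ(2) as the eigenvalue-1 eigenvector of that 2x2 system:
import numpy as np

mu = (1 + np.sqrt(3)) / 2
h = np.array([mu/4, (1 + mu)/4, (2 - mu)/4, (1 - mu)/4])  # h0..h3

# phi(1) = 2*(h1*phi(1) + h0*phi(2));  phi(2) = 2*(h3*phi(1) + h2*phi(2))
A = 2 * np.array([[h[1], h[0]],
                  [h[3], h[2]]])

vals, vecs = np.linalg.eig(A)
v = vecs[:, np.argmin(np.abs(vals - 1))]  # eigenvector for eigenvalue 1
v = v / v.sum()                           # normalise so phi(1) + phi(2) = 1
print(v)                                  # [ 1.3660254 -0.3660254]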
Now we can find the values of φ(1/2), φ(3/2), and φ(5/2). For example, φ(1/2) = 2 h0 φ(1) and φ(3/2) = 2(h1 φ(2) + h2 φ(1)).
With these values, you can then find φ(1/4), φ(3/4), and so on. Continuing this process gives you the value of φ at every dyadic rational (a rational number of the form k/2^j).
The same process can be used to find the wavelet function. You only need to use the four values shown in the first plot rather than the four shown in the second plot.
I recently implemented this in Python. An R implementation would be fairly similar.
import numpy as np
import matplotlib.pyplot as plt

def cascade_algorithm(j: int):
    mu = (1 + np.sqrt(3))/2
    h_k = np.array([mu/4, (1+mu)/4, (2-mu)/4, (1-mu)/4])
    # Array to store all the values of phi: row 0 holds x, row 1 holds phi(x).
    phi_vals = np.zeros((2, 3*2**j+1), dtype=np.float64)
    for i in range(3*2**j+1):
        phi_vals[0][i] = i/(2**j)
    calced_vals = np.zeros((3*2**j+1), dtype=bool)
    # Input the known values for phi(1) and phi(2).
    phi_vals[1][1*2**j] = (1+np.sqrt(3))/2
    phi_vals[1][2*2**j] = (1-np.sqrt(3))/2
    # We now know the values for 0, 1, 2, and 3.
    calced_vals[0] = True
    calced_vals[1*2**j] = True
    calced_vals[2*2**j] = True
    calced_vals[3*2**j] = True
    # Now calculate phi for all the dyadic rationals, level by level.
    for k in range(1, j+1):
        for l in range(1, 3*2**k):
            x = l/(2**k)
            if not calced_vals[int(x*2**j)]:
                calced_vals[int(x*2**j)] = True
                two_x = 2*x
                # Which of the four terms h_k*phi(2x-k) have 0 < 2x-k < 3.
                which_k = np.array([0, 1, 2, 3], dtype=int)
                which_k = ((two_x - which_k > 0) & (two_x - which_k < 3))
                phi = 0
                for n, _ in enumerate(which_k):
                    if which_k[n]:
                        phi += h_k[n]*phi_vals[1][int((two_x-n)*2**j)]
                # The factor of 2 from the refinement identity.
                phi_vals[1][int(x*2**j)] = 2*phi
    return phi_vals

phi_vals = cascade_algorithm(10)
plt.plot(phi_vals[0], phi_vals[1])
plt.show()
If you just want to plot the graphs, then you can use the package "wavethresh" to plot, for example, D4 with the following commands:
draw.default(filter.number=4, family="DaubExPhase", enhance=FALSE, main="D4 Mother", scaling.function = FALSE) # mother wavelet
draw.default(filter.number=4, family="DaubExPhase", enhance=FALSE, main="D4 Father", scaling.function = TRUE) # father wavelet
Which function is plotted depends on the argument "scaling.function": if TRUE, it plots the father (scaling) wavelet; otherwise it plots the mother wavelet.
If you want to generate it yourself, without packages, I'd suggest you follow the Daubechies-Lagarias algorithm, described in this paper. It is not hard to implement.

TensorFlow: Take L2 norm over multiple dimensions

I have a TensorFlow placeholder with 4 dimensions representing a batch of images. Each image is 32 x 32 pixels, and each pixel has 3 color channels. The first dimension represents the number of images.
X = tf.placeholder(tf.float32, [None, 32, 32, 3])
For each image, I would like to take the L2 norm over all of the image's pixels. Thus, the output should be a tensor with one dimension (i.e. one value per image). tf.norm() (documentation) accepts an axis parameter, but it only lets me specify up to two axes over which to take the norm, when I would like to take the norm over axes 1, 2, and 3. How do I do this?
n = tf.norm(X, ord=2, axis=0) # n.get_shape() is (?, ?, 3), not (?)
n = tf.norm(X, ord=2, axis=[1,2,3]) # ValueError
You do not need the flattening suggested in the other answer. If you read the documentation carefully, you will see:
axis: If axis is None (the default), the input is considered a vector and a single vector norm is computed over the entire set of values in the tensor, i.e. norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
Example:
import tensorflow as tf
import numpy as np

c = tf.constant(np.random.rand(3, 2, 3, 6))
d = tf.norm(c, ord=2)  # axis=None: a single norm over every value in the tensor
with tf.Session() as sess:
    print(sess.run(d))
I tried Salvador's answer, but it returns a single number for the whole minibatch instead of one number per image. So it looks like we are stuck with taking the norm one axis at a time.
import tensorflow as tf
import numpy as np

batch = tf.constant(np.random.rand(3, 2, 3, 6))
x = tf.norm(batch, axis=3)  # collapse the last axis...
x = tf.norm(x, axis=2)      # ...then the next...
x = tf.norm(x, axis=1)      # ...leaving one value per batch element
with tf.Session() as sess:
    result = sess.run(x)
    print(result)
This might introduce a small amount of numerical instability but in theory it's the same as taking the norm of the whole image at once.
You might also think about only taking the norm over the x and y axes so that you get one norm per channel. There's a reason why that's supported by tensorflow and this isn't.
You can compute the L2 norm yourself like this:
tf.sqrt(tf.reduce_sum(tf.pow(images, 2), axis=(1, 2, 3)))
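As a quick sanity check, here is a NumPy stand-in (not TensorFlow) confirming that the iterated per-axis norm and the explicit sqrt-of-sum-of-squares give the same per-image values:
import numpy as np

images = np.random.rand(4, 32, 32, 3)

# One L2 norm per image, computed two ways.
flat = np.sqrt((images ** 2).sum(axis=(1, 2, 3)))
iterated = np.linalg.norm(np.linalg.norm(np.linalg.norm(images, axis=3), axis=2), axis=1)

print(np.allclose(flat, iterated))  # True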
