text/speech domain discriminator network training - generative-adversarial-network

As part of my project, I want to train an ASR system using teacher-student learning from the encoder of a TTS system, so I have a TTS system that takes audio as input, not text. I am trying to implement a paper for this purpose, and I can't understand the authors' meaning in the following section.
The description is: "The text/speech domain discriminator takes each frame of spectral or character embeddings as input and is a 4-layer FC neural network (256→512→512→2)."
The student network takes audio and the teacher takes text as input, and both output embeddings of shape (#minibatch, #chars of input, #embedding dim=512).
Given this description, what is the input to the discriminator? Is its input size (#minibatch, #chars of input, #embedding dim=512) or (#minibatch, #embedding dim=512)?
I have attached an image of the training diagram for better understanding.
I would be very thankful if somebody could help me.
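For what it's worth, "takes each frame ... as input" is usually implemented by flattening the batch and time axes so that every frame (or character) embedding is scored independently. Below is a minimal PyTorch-style sketch of that reading; the layer widths follow the quoted 256→512→512→2 description, while the 512-dimensional input and everything else are my assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

# Hypothetical frame-level discriminator; layer sizes follow the quoted
# "256 -> 512 -> 512 -> 2" description, the 512-dim input is an assumption.
discriminator = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 2),   # 2 logits: "came from text" vs. "came from speech"
)

embeddings = torch.randn(8, 37, 512)   # (#minibatch, #chars of input, #embedding dim)
frames = embeddings.reshape(-1, 512)   # (#minibatch * #chars, 512): one row per frame
logits = discriminator(frames)         # (#minibatch * #chars, 2)
```

Under this interpretation the discriminator itself only ever sees (#frames, 512) at a time; the (#minibatch, #chars, 512) tensor is just flattened before it is fed in.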

Related

AES-based pseudo-random function

I'm reading the paper "Concerto: A High Concurrency Key-Value Store with Integrity".
For memory checking, they say they use an AES-based pseudo-random function, which takes two inputs and outputs a hash.
My question is: how do I implement this AES-based pseudo-random function in C or Go? Is it just using AES to encrypt? Can someone give me an example implementation?
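I can't speak to the exact construction Concerto uses, but one standard way to get a keyed pseudo-random function out of AES is AES-CMAC: it takes a key and an arbitrary-length message (for example, your two inputs concatenated) and returns a 16-byte pseudo-random tag. Here is a minimal sketch, shown in Python with the cryptography package for brevity; the function name and input encoding are my own, and the paper itself presumably uses an optimized native implementation:

```python
from cryptography.hazmat.primitives import cmac
from cryptography.hazmat.primitives.ciphers import algorithms

def aes_prf(key: bytes, addr: bytes, value: bytes) -> bytes:
    """Hypothetical two-input PRF: PRF_key(addr, value) via AES-CMAC.

    Each input is length-prefixed before concatenation so that two
    different input pairs can never collide just by shifting bytes.
    """
    c = cmac.CMAC(algorithms.AES(key))
    c.update(len(addr).to_bytes(4, "big") + addr)
    c.update(len(value).to_bytes(4, "big") + value)
    return c.finalize()   # 16-byte pseudo-random output

tag = aes_prf(b"\x00" * 16, b"address-123", b"value-456")
```

Plain AES encryption of a single block is also a PRF (technically a pseudo-random permutation), but a CMAC-style construction is the usual choice once the input is longer than one block.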

Neural Network predicting Plain text for respective Encrypted text

Suppose I have a function called encrypt(string), which takes a string, divides it into k blocks, and encrypts it in some fashion. I used this function to encrypt 1000-2000 sample texts and collected their respective encrypted texts.
Then I trained a neural network on that data, i.e. the encrypted text and its respective plain text. My question is: if I feed thousands of examples to the neural network, is it possible that it will eventually reverse-engineer the entire encryption logic, so that for any encrypted text it returns the corresponding plain text with high accuracy?
Thanks ✌️

How to use Core ML to analyse device motion values

I would like to implement a Core ML app to analyze device motion. I'm recording device-motion values for some time and capturing the details into a JSON file. Now I want to analyze the x, y, z values and then report how the user is using the device.
Use Turi Create. It has an Activity Classification module that makes this very easy: https://apple.github.io/turicreate/docs/userguide/activity_classifier/
To learn more about this in detail, check out the book Machine Learning by Tutorials (disclaimer: I'm a co-author but did not write the chapters on activity classification).
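To make that suggestion concrete, a minimal Turi Create sketch might look like the following. The column names, the JSON loading step, and the prediction window are assumptions about your data; the general flow is: load the sensor rows, train the activity classifier, and export a Core ML model you can drop into your app.

```python
import pandas as pd
import turicreate as tc

# Assumed layout: one row per sample with x/y/z motion values,
# an 'activity' label, and a 'session_id' grouping each recording.
df = pd.read_json("device_motion.json")   # hypothetical file name and format
data = tc.SFrame(df)

model = tc.activity_classifier.create(
    data,
    session_id="session_id",
    target="activity",
    features=["x", "y", "z"],
    prediction_window=50,   # roughly 1 s of data at 50 Hz (assumption)
)

model.export_coreml("ActivityClassifier.mlmodel")   # use this model in your iOS app
```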

In Seq2Seq tasks, could a bidirectional RNN (LSTM, GRU) be the decoder?

In the decoder part of Seq2Seq, prediction works like language modeling: given an input word and the hidden state, predict the next word.
How could bidirectional information be used in this mechanism?
Also, would we still have to generate the words of the sentence one after another with a bidirectional RNN decoder?
Thank you.

Using multiple QR codes to encode a binary image

I'm increasingly looking at using QR codes to transmit binary information, such as images, since it seems whenever I demo my app, it's happening in situations where the WiFi or 3G/4G just doesn't work.
I'm wondering if it's possible to split a binary file up into multiple parts to be encoded by a series of QR codes?
Would this be as simple as splitting up a text file, or would some sort of complex data coherency check be required?
Yes, you could convert any arbitrary file into a series of QR codes, something like Books2Barcodes.
The standard way of encoding data too big to fit in one QR code is with the "Structured Append Feature" of the QR code standard.
Alas, I hear that most QR encoders or decoders -- such as zxing -- currently do not (yet) support generating or reading such a series of barcodes that use the structured append feature.
QR codes already have a pretty strong internal error correction.
If you are lucky, perhaps splitting up your file with the "split" utility into pieces small enough to fit into an easily-readable QR code, then later scanning them in (hopefully) the right order and using "cat" to re-assemble them, might be adequate for your application.
You surely can store a lot of data in a QR code; it can hold 2953 bytes of data, which is nearly twice the size of a standard TCP/IP packet originating on an Ethernet network, so it's pretty powerful.
You will need to define some header for each QR code that describes its position in the stream required to rebuild the data. It'll be something like filename chunk 12 of 96, though encoded in something better than plain text. (Eight bytes for filename, one byte each for chunk number and total number of chunks -- a maximum of 256 QR codes, one simple ten-byte header, still leaving 2943 bytes per code.)
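As a rough illustration of that header idea, here is a short Python sketch. The field sizes follow the paragraph above (8 bytes of filename, 1 byte chunk index, 1 byte total count); generating and scanning the actual QR codes is left to whatever library you use, and the exact layout is only an assumption:

```python
CHUNK_PAYLOAD = 2943   # 2953-byte QR capacity minus the 10-byte header

def make_chunks(filename: str, data: bytes):
    """Split data into QR-sized payloads, each prefixed with a 10-byte header."""
    name = filename.encode("ascii")[:8].ljust(8, b"\x00")
    chunks = [data[i:i + CHUNK_PAYLOAD] for i in range(0, len(data), CHUNK_PAYLOAD)]
    # Keeping both counters to one byte limits us to 255 chunks here;
    # squeezing out the 256th would need a total-minus-one encoding.
    assert len(chunks) <= 255, "too many chunks for a 1-byte counter"
    return [name + bytes([i, len(chunks)]) + chunk for i, chunk in enumerate(chunks)]

def reassemble(scanned: list[bytes]) -> bytes:
    """Rebuild the file from scanned payloads, regardless of scan order."""
    ordered = sorted(scanned, key=lambda p: p[8])   # byte 8 is the chunk index
    return b"".join(p[10:] for p in ordered)
```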
You will probably also want to use some form of forward error correction, such as erasure codes, to encode enough redundant data that mis-reads of individual QR codes, or entirely missing QR codes, can be handled transparently. While you may be able to take an existing library, such as one for Reed-Solomon codes, to fix mis-reads within a QR code, handling entirely missing QR codes may take significantly more effort on your part.
Using erasure codes will of course reduce the amount of data you can transmit -- instead of all 753,408 bytes (256 * 2943), you will only have 512k or 384k or even less available to your final images -- depending upon what code rate you choose.
I think it is theoretically possible, and about as simple as splitting up a text file. However, you probably need to design some kind of header so you know the data is multi-part and so the different parts can be merged together correctly, regardless of the order of scanning.
I am assuming that the QR reader library returns raw binary data, and that it will be your job to convert it to whatever form you want.
If you want automated creation and transmission, see
gre/qrloop: Encode a big binary blob to a loop of QR codes
maxg0/displaysocket.js: DisplaySocket.js - a JavaScript library for sending data from one device to another via QR codes using only a display and a camera
Note - I haven't used either.
See also: How can I publish data from a private network without adding a bidirectional link to another network - Security StackExchange
