In the decoder part of seq2seq, the task is like language modeling: given an input word and the hidden state, predict the next word.
How could bidirectional information be used in this mechanism?
Also, with a bidirectional RNN decoder, would we still have to generate the sentence one word at a time?
Thank you.
As part of my project, I want to train an ASR system using teacher-student learning from the encoder of a TTS system. So, unlike the TTS teacher, my system takes audio as input, not text. I am trying to implement a paper for this purpose and I can't understand the authors' meaning in this section.
The description is: "The text/speech domain discriminator takes each frame of spectral or character embeddings as input and is a 4-layer FC neural network (256→512→512→2)".
The student network gets audio and the teacher gets text as input, and both output embeddings of shape (#minibatch, #chars of input, #embedding dim=512).
Given this description, what is the input to the discriminator? Is its input size (#minibatch, #chars of input, #embedding dim=512) or (#minibatch, #embedding dim=512)?
I have attached the image of the training diagram for better understanding.
I would be very thankful if somebody could help me.
I'm reading the paper "Concerto: A High Concurrency Key-Value Store with Integrity".
For memory checking, they say they use an AES-based pseudo-random function, which takes two inputs and outputs a hash.
My question is: how do I implement this AES-based pseudo-random function in C or Go? Is it just using AES to encrypt? Can someone give me an example implementation?
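For what it's worth, a common way to build a PRF from AES is to use the block cipher itself: pack the two inputs into one 16-byte block and encrypt it under a secret key, treating the ciphertext as the pseudo-random output. A minimal Go sketch, assuming the two inputs are 64-bit values packed into a single AES-128 block (the paper may lay out its inputs differently):

```go
package main

import (
	"crypto/aes"
	"encoding/binary"
	"fmt"
)

// prf packs the two 64-bit inputs into one 16-byte block and encrypts it
// under a fixed key; the ciphertext serves as the pseudo-random output
// ("hash"). The input layout and key handling are assumptions, not taken
// from the Concerto paper.
func prf(key []byte, a, b uint64) [16]byte {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		panic(err)
	}
	var in, out [16]byte
	binary.BigEndian.PutUint64(in[:8], a)
	binary.BigEndian.PutUint64(in[8:], b)
	block.Encrypt(out[:], in[:])
	return out
}

func main() {
	key := []byte("0123456789abcdef") // 16-byte AES-128 key
	fmt.Printf("%x\n", prf(key, 42, 7))
}
```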
I want to generate speech on an Arduino using code. I can generate simple tones and music on an Arduino, but I need to output words like "right", "left", etc. through an Arduino speaker. I found some methods using WAV files, but they require an external memory card reader. Is there a method that uses only the Arduino and a speaker?
Typical recorded sound (such as WAV files) requires much larger amounts of memory than is available on-chip on an Arduino.
It is possible to use an encoding and data rate that minimises the memory requirement - at the expense of audio quality. For example, generally acceptable speech-band audio can be obtained using non-linear (companded) 8-bit PCM at a 3 kHz sample rate. If that is then differentially encoded to 4-bit samples (so that each sample is not the PCM code, but the difference in level from the previous sample), you can get about one second of audio in 1.5 Kbytes. You would have to do some off-line processing of the original audio to encode it in this manner before storing the resulting data in the Arduino flash memory. You will also have to implement the necessary decoding and linearisation.
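The off-line encoding step can run on a PC in any language. A minimal Go sketch, assuming signed 8-bit PCM input, deltas clamped to the signed 4-bit range, and two deltas packed per byte (the packing layout is my own choice; companding and the matching Arduino-side decoder are not shown):

```go
package main

import "fmt"

// encodeDelta4 turns 8-bit PCM samples into 4-bit deltas (two per byte),
// clamping each delta to the range a signed 4-bit value can hold (-8..+7).
func encodeDelta4(pcm []int8) []byte {
	prev := int8(0)
	nibbles := make([]int8, 0, len(pcm))
	for _, s := range pcm {
		d := int16(s) - int16(prev)
		if d > 7 {
			d = 7
		} else if d < -8 {
			d = -8
		}
		prev += int8(d) // track the value the decoder will reconstruct
		nibbles = append(nibbles, int8(d))
	}
	out := make([]byte, 0, (len(nibbles)+1)/2)
	for i := 0; i < len(nibbles); i += 2 {
		b := byte(nibbles[i]) & 0x0F
		if i+1 < len(nibbles) {
			b |= (byte(nibbles[i+1]) & 0x0F) << 4
		}
		out = append(out, b)
	}
	return out
}

func main() {
	samples := []int8{0, 3, 6, 10, 5, -2, -9}
	fmt.Printf("%x\n", encodeDelta4(samples))
}
```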
Another possibility is to use synthesised rather than recorded speech. This technique uses recorded phonemes (components of speech) rather than whole words, and you then build words from these components. The results are generally somewhat robotic and unnatural (modern speech synthesis can in fact be very convincing, but not with the resources available on an Arduino - think 1980s Speak-and-Spell).
Although it can be rather efficient, phoneme speech synthesis requires different phoneme sets for different natural languages. For a limited vocabulary, it may be possible to encode only the subset of phonemes actually used.
You can hear a recording of the kind of speech that can be generated by a simple phoneme speech generator at http://nsd.dyndns.org/speech/. This page discusses a 1980s GI-SP0256 speech chip driven by an Arduino rather than speech generated by the Arduino itself, but it gives you an idea of what might be achieved - the GI-SP0256 managed with just 2Kb of ROM, so the Arduino could probably implement something similar directly. The difficulty perhaps is in obtaining the necessary phoneme set. You could possibly record your own and encode them as above. Each word or phrase would then simply be a list of phonemes and delays to be output, as sketched below.
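Purely to illustrate the data structure (not Arduino code - a real build would store the encoded phoneme samples in flash and feed them to the output pin), a hypothetical word table might look like this in Go:

```go
package main

import "fmt"

// A sketch of the "list of phonemes and delays" representation. The
// phoneme IDs are invented for illustration; in a real build they would
// index pre-encoded phoneme samples stored in flash.
type step struct {
	phoneme int // index into the stored phoneme set
	delayMs int // pause after the phoneme, in milliseconds
}

// "left" spelled as a hypothetical phoneme sequence.
var left = []step{{12, 0}, {4, 10}, {29, 0}, {7, 50}}

func play(word []step) {
	for _, s := range word {
		// Stand-in for the real output routine (PWM/DAC playback).
		fmt.Printf("phoneme %d, then wait %d ms\n", s.phoneme, s.delayMs)
	}
}

func main() { play(left) }
```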
The eSpeak project might be a good place to start. It is probably too large for an Arduino, and the whole text-to-speech translation is unnecessary, but it converts text to phonemes, so you could do that part off-line (on a PC), then load the phonemes and the replay code onto the Arduino. It may still be too large, of course.
I'm increasingly looking at using QR codes to transmit binary information, such as images, since it seems whenever I demo my app, it's happening in situations where the WiFi or 3G/4G just doesn't work.
I'm wondering if it's possible to split a binary file up into multiple parts to be encoded by a series of QR codes?
Would this be as simple as splitting up a text file, or would some sort of complex data coherency check be required?
Yes, you could convert any arbitrary file into a series of QR codes, something like Books2Barcodes.
The standard way of encoding data too big to fit in one QR code is with the "Structured Append Feature" of the QR code standard.
Alas, I hear that most QR encoders and decoders -- such as zxing -- do not (yet) support generating or reading a series of barcodes that uses the structured append feature.
QR codes already have a pretty strong internal error correction.
If you are lucky, perhaps splitting up your file with the "split" utility into pieces small enough to fit into an easily-readable QR code, then later scanning them in (hopefully) the right order and using "cat" to re-assemble them, might be adequate for your application.
You surely can store a lot of data in a QR code; it can hold up to 2953 bytes of binary data (version 40 with low error correction), which is nearly twice the size of a standard TCP/IP packet originated on an Ethernet network, so it's pretty powerful.
You will need to define some header for each QR code that describes its position in the stream required to rebuild the data. It'll be something like filename chunk 12 of 96, though encoded in something better than plain text. (Eight bytes for the filename, one byte each for the chunk number and the total number of chunks -- a maximum of 256 QR codes, one simple ten-byte header, still leaving 2943 bytes per code.)
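That header and the chunking step might look like the following Go sketch (the exact layout -- an 8-byte padded name plus two count bytes -- is just one way to spend the ten bytes):

```go
package main

import "fmt"

// chunkHeader builds the ten-byte header described above: an 8-byte
// (truncated or zero-padded) file name, a one-byte chunk index, and a
// one-byte total chunk count. The exact layout is an assumption.
func chunkHeader(name string, index, total byte) [10]byte {
	var h [10]byte
	copy(h[:8], name)
	h[8] = index
	h[9] = total
	return h
}

// splitWithHeaders breaks data into payloads of at most 2943 bytes and
// prefixes each with its header, yielding the blobs to encode as QR
// codes. The one-byte count caps this at 256 codes, as noted above.
func splitWithHeaders(name string, data []byte) [][]byte {
	const payload = 2943
	total := byte((len(data) + payload - 1) / payload)
	var chunks [][]byte
	for i := byte(0); len(data) > 0; i++ {
		n := payload
		if len(data) < n {
			n = len(data)
		}
		h := chunkHeader(name, i, total)
		chunks = append(chunks, append(h[:], data[:n]...))
		data = data[n:]
	}
	return chunks
}

func main() {
	blobs := splitWithHeaders("photo.jpg", make([]byte, 7000))
	fmt.Println(len(blobs), "QR payloads") // 3
}
```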
You will probably also want to use some form of forward error correction, such as erasure codes, to encode sufficient redundant data so that mis-reads of individual QR codes, or entirely missing QR codes, can be handled transparently. While you may be able to use an existing library, such as one for Reed-Solomon codes, to fix mis-reads within a QR code, handling entirely missing QR codes may take significantly more effort on your part.
Using erasure codes will of course reduce the amount of data you can transmit -- instead of all 753,408 bytes (256 * 2943), you will only have 512k or 384k or even less available to your final images -- depending upon what code rate you choose.
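As a sketch of what that erasure-coding step could look like, here is one using the klauspost/reedsolomon Go library (my choice of library and shard counts, not something the answer above prescribes); with 8 data and 4 parity shards, any 4 of the 12 resulting codes can be lost, at a code rate of 2/3:

```go
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	enc, err := reedsolomon.New(8, 4) // 8 data + 4 parity shards
	if err != nil {
		panic(err)
	}

	file := make([]byte, 20000) // the binary file to transmit

	shards, err := enc.Split(file) // 8 data shards + 4 empty parity shards
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil { // fill the parity shards
		panic(err)
	}
	// Each shard would now get a header (as above) and become one QR code.

	// On the receiving side, mark unread shards as nil and reconstruct:
	shards[2], shards[7] = nil, nil
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}
	fmt.Println("recovered all", len(shards), "shards")
}
```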
I think it is theoretically possible, and as simple as splitting up a text file. However, you would probably need to design some kind of header so the reader knows the data is multi-part, and so the different parts can be merged together correctly regardless of the order of scanning.
I am assuming that the QR reader library returns raw binary data, and it will be your job to convert it to whatever form you want.
If you want automated creation and transmission, see
gre/qrloop: Encode a big binary blob to a loop of QR codes
maxg0/displaysocket.js: DisplaySocket.js - a JavaScript library for sending data from one device to another via QR codes using only a display and a camera
Note - I haven't used either.
See also: How can I publish data from a private network without adding a bidirectional link to another network - Security StackExchange
I want to convert sound from a mic to binary and match it against a database (a type of voice identification program), but I have no idea how to get sound from the mic directly so that I can convert it to binary. Also, is it possible or not? Please guide me.
See this:
http://www.dotnetspider.com/resources/4967-How-record-voice-from-microphone.aspx
You're not going to be able to identify voices by doing a binary comparison on sound data. The binary of a particular sound will not be identical to an imitation of that sound, unless it is literally the same file, because of minor variations in just about everything. You'll need to do some signal processing to make a fuzzy comparison of the data. You can read about signal processing on Wikipedia.
You will probably find it easier to use a third party library to process the sound for you. Something like this might be a good start.
You're looking at two very distinct problems here.
The first is pretty technical: Getting sound from the microphone into a digital waveform. How you do this exactly depends on the OS and API you're using (on Windows, you're probably looking at DirectX audio or, if available, ASIO). Typically, this is how you'd proceed:
Set up a recording buffer for the microphone, with suitable parameters (number of channels, physical input on the sound card, sample rate, bit depth, buffer size)
Start the recording. This usually involves pointing the sound library to a callback function to process the recorded buffer.
In the callback, read the buffer, convert it to a suitable format, and append it to the audio file of your choice; a sketch of this step follows the list. (You could also record to RAM only, but longer recordings may exceed available storage.)
Store the recorded audio in a suitable database field (some kind of binary blob)
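For the "suitable format" in step 3, even a hand-rolled WAV container will do. A minimal Go sketch, assuming the 16-bit mono PCM samples have already been captured into a slice (the capture itself still needs an OS-specific API or a binding such as PortAudio):

```go
package main

import (
	"encoding/binary"
	"os"
)

// writeWAV wraps already-captured 16-bit mono PCM samples in a minimal
// WAV container so they can be stored as a file or a database blob.
func writeWAV(path string, samples []int16, sampleRate uint32) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	dataLen := uint32(len(samples) * 2)
	hdr := []interface{}{
		[]byte("RIFF"), uint32(36 + dataLen), []byte("WAVE"),
		[]byte("fmt "), uint32(16),
		uint16(1),      // PCM format
		uint16(1),      // mono
		sampleRate,     // samples per second
		sampleRate * 2, // byte rate (2 bytes per sample)
		uint16(2),      // block align
		uint16(16),     // bits per sample
		[]byte("data"), dataLen,
	}
	for _, v := range hdr {
		if err := binary.Write(f, binary.LittleEndian, v); err != nil {
			return err
		}
	}
	return binary.Write(f, binary.LittleEndian, samples)
}

func main() {
	// Half a second of silence at 8 kHz, just to show the call.
	if err := writeWAV("out.wav", make([]int16, 4000), 8000); err != nil {
		panic(err)
	}
}
```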
This is the easy part, though; the harder part is matching a chunk of audio data against other chunks. A naïve approach would be to try to find exact matches, but that won't help you much, because the chance that you find one is practically zero - recording equipment, even the best, introduces a bit of random noise, and recording setups vary slightly whether you want them to or not. So even if you had someone say something twice, perfectly identically, you'd still see differences in the recorded audio.
What you need to do, then, is find certain typical characteristics of the waveform. Things you could look for are:
Overall amplitude shape
Base frequencies
Selected harmonics (formants)
Extracting these is non-trivial and involves pretty heavy math; and then you'll have to condense them into some sort of fingerprint, and find a way to compare fingerprints with some fuzziness (so that a near-match is good enough, rather than requiring exact matches). Finding the right parameters and comparison algorithms isn't easy, and it takes a lot of tweaking and testing; your best bet is to go find a library that does this for you.
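Purely to illustrate the shape of that fuzzy comparison (real systems use much richer features), here is a toy Go sketch that reduces a signal to per-window RMS values - the "overall amplitude shape" above - and compares two such fingerprints within a tolerance:

```go
package main

import (
	"fmt"
	"math"
)

// fingerprint reduces a signal to per-window RMS values, a crude stand-in
// for the "overall amplitude shape" feature.
func fingerprint(samples []float64, window int) []float64 {
	var fp []float64
	for i := 0; i+window <= len(samples); i += window {
		sum := 0.0
		for _, s := range samples[i : i+window] {
			sum += s * s
		}
		fp = append(fp, math.Sqrt(sum/float64(window))) // RMS of the window
	}
	return fp
}

// similar reports whether two fingerprints agree within tol on average,
// so a near-match passes where an exact binary comparison would fail.
func similar(a, b []float64, tol float64) bool {
	if len(a) != len(b) {
		return false
	}
	diff := 0.0
	for i := range a {
		diff += math.Abs(a[i] - b[i])
	}
	return diff/float64(len(a)) < tol
}

func main() {
	x := []float64{0.1, 0.2, 0.3, 0.2, 0.1, 0.0, -0.1, -0.2}
	y := []float64{0.11, 0.19, 0.31, 0.2, 0.09, 0.01, -0.1, -0.21}
	fmt.Println(similar(fingerprint(x, 4), fingerprint(y, 4), 0.05)) // true
}
```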