I want to generate speech on an Arduino in code. I can generate simple tones and music, but I need to output words like "right", "left", etc. through a speaker. I found some methods that use WAV files, but they require an external memory card reader. Is there a way to do this using only the Arduino and a speaker?
Typical recorded sound (such as WAV files) requires much more memory than is available on-chip on an Arduino.
It is possible to use an encoding and data rate that minimise the memory requirement, at the expense of audio quality. For example, generally acceptable speech-band audio can be obtained using non-linear (companded) 8-bit PCM at a 3 kHz sample rate. If that is then differentially encoded to 4-bit samples (so that each sample is not the PCM code but the difference in level from the previous sample), you get about 1 second of audio in 1.5 KB (3000 samples x 4 bits = 12,000 bits = 1500 bytes). You would have to do some off-line processing of the original audio to encode it in this manner before storing the resulting data in the Arduino flash memory. You will also have to implement the necessary decoding and linearisation on the Arduino.
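As a rough illustration of the off-line step, here is a minimal Python sketch of 4-bit differential encoding. It is not a tested Arduino toolchain: the function names, the mid-scale starting value of 128 and the two-nibbles-per-byte packing are assumptions made for the example, and companding is assumed to have been done already, so the functions simply operate on whatever 8-bit codes they are given.

```python
def dpcm_encode(samples, start=128):
    """Pack 8-bit samples (0..255) as signed 4-bit differences, two per byte.

    'start' is an assumed initial predictor value shared with the decoder.
    """
    nibbles = []
    predicted = start
    for sample in samples:
        # Clamp the step to what a signed 4-bit value can hold (-8..7).
        diff = max(-8, min(7, sample - predicted))
        # Track what the decoder will reconstruct so errors do not accumulate.
        predicted = max(0, min(255, predicted + diff))
        nibbles.append(diff & 0x0F)
    if len(nibbles) % 2:
        nibbles.append(0)  # pad with a zero step to fill the last byte
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def dpcm_decode(data, count, start=128):
    """Rebuild 'count' 8-bit samples from data produced by dpcm_encode."""
    out = []
    value = start
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            diff = nibble - 16 if nibble >= 8 else nibble  # sign-extend 4 bits
            value = max(0, min(255, value + diff))
            out.append(value)
    return out[:count]
```

One second of 3 kHz audio (3000 samples) encodes to 1500 bytes this way. Steps larger than -8..7 are clipped, so fast-changing signals are distorted; that is part of the quality trade-off described above. The decoder is small enough to port to C for the Arduino side.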
Another possibility is to use synthesised rather than recorded speech. This technique uses recorded phonemes (components of speech) rather than whole words, and you then build words from these components. The results are generally somewhat robotic and unnatural (modern speech synthesis can in fact be very convincing, but not with the resources available on an Arduino - think 1980's Speak-and-Spell).
Although it can be rather efficient, phoneme speech synthesis requires different phoneme sets for different natural languages. For a limited vocabulary, it may be possible to encode only the subset of phonemes actually used.
You can hear a recording of the kind of speech that can be generated by a simple phoneme speech generator at http://nsd.dyndns.org/speech/. This page discusses a 1980's GI-SP0256 speech chip driven by an Arduino rather than speech generated by the Arduino, but it gives you an idea of what might be achieved - the GI-SP0256 managed with just 2Kb ROM - the Arduino could probably implement something similar directly. The difficulty perhaps is in obtaining the necessary phoneme set. You could possibly record your own and encode them as above. Each word or phrase would then simply be a list of phonemes and delays to be output.
The eSpeak project might be a good place to start - it is probably too large for Arduino, and the whole text to speech translation unnecessary, but it converts text to phonemes, so you could do that part off-line (on a PC), then load the phonemes and the replay code to the Arduino. It may still be too large of course.
Related
I have been working on this project and the code has gotten so huge that the microcontroller's flash memory is full, so I want to know if there is any way I can connect an external EEPROM or any other memory device that would give me more program memory.
Thanks in advance!
The only 8-bit PICs that can use external program memory are high-end parts in the PIC18F series - all 64-pin or more.
If a substantial portion of your code size consists of text or other data (rather than actual code), you could store the data on an external SPI or I2C EEPROM. This would be much slower than having the data internally, and less convenient to use - you'd have to manually send an address and then read bytes from the external chip, you couldn't just access the data as an array.
The 16F877 is a rather old chip - you can certainly find ones with more capacity these days. A quick search on Microchip's part selector turns up several 16F chips with twice the program memory, such as the 16F1789. If you'd be willing to switch to the more powerful 18F series, you could double the program memory yet again - 18F4620, for example.
I'm working on a project with an Arduino, and I'd like to be able to save some data persistently. I'm already using an Ethernet shield, which has a MicroSD reader.
The data I'm saving will be incredibly small. At the moment, I'll just be saving 3 bytes at a time. What I'd really like is a way to open the SD card for writing starting at byte x and then write y bytes of data. When I want to read it back, I just read y bytes starting at byte x.
However, all the code I've seen involves working with a filesystem, which seems like an unneeded overhead. I don't need this data to be readable on any other system, storage space isn't an issue, and there's no other data on the card to worry about. Is there a way to just write binary data directly to an SD card?
It is possible to write raw binary data to an SD card. Most people do this using the 4-pin SPI interface supported by the SD card. Unfortunately, data isn't byte-addressed, but block-addressed (block size usually 512 bytes).
This means if you wanted to write 4 bytes at byte 516, you'd have to read in block 0x00000001 (the second block), and then calculate an offset, write your data, then write the entire block back. (I can't say that this limitation applies to the SD interface using more pins, I have no experience with it)
This complication is why a lot of people opt for using libraries that include "unneeded overhead".
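The read-modify-write cycle described above can be sketched in a few lines. This is a PC-side Python illustration only, not SD card driver code: `FakeBlockDevice` is a hypothetical in-memory stand-in for the card, and `write_bytes` / `read_bytes` are names invented for the example.

```python
BLOCK_SIZE = 512  # typical SD block size, as mentioned above


class FakeBlockDevice:
    """Hypothetical in-memory stand-in for an SD card: whole blocks only."""

    def __init__(self, block_count):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(block_count)]

    def read_block(self, index):
        return self.blocks[index]

    def write_block(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block write must be exactly one block long")
        self.blocks[index] = bytes(data)


def write_bytes(dev, address, payload):
    """Write payload at a byte address using read-modify-write per block."""
    while payload:
        index, offset = divmod(address, BLOCK_SIZE)
        chunk = payload[:BLOCK_SIZE - offset]       # part that fits this block
        block = bytearray(dev.read_block(index))    # read the whole block
        block[offset:offset + len(chunk)] = chunk   # patch the bytes we want
        dev.write_block(index, block)               # write the whole block back
        address += len(chunk)
        payload = payload[len(chunk):]


def read_bytes(dev, address, length):
    """Read 'length' bytes starting at a byte address."""
    out = bytearray()
    while length > 0:
        index, offset = divmod(address, BLOCK_SIZE)
        take = min(length, BLOCK_SIZE - offset)
        out += dev.read_block(index)[offset:offset + take]
        address += take
        length -= take
    return bytes(out)
```

Writing 4 bytes at byte 516 touches block 1 at offset 4, matching the example above; a write that straddles a block boundary costs two full read-modify-write cycles.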
With that said, I've had to do this in the past, because I needed a way of logging data that was robust in the face of power failures. I found the following resource very helpful:
http://elm-chan.org/docs/mmc/mmc_e.html
You'll probably find it easier to make your smaller writes to a memory buffer, and dump them to the SD card when you have a large enough amount of data to make it worthwhile.
If you look around, you'll find plenty of open-source code dealing with the SD SPI interface to make use of directly, or as reference to implement your own system.
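As a sketch of the buffering idea, assuming some block-write routine already exists, small records can be accumulated in RAM and written out one full block at a time. `BufferedBlockWriter` and its `write_block(index, data)` callback are hypothetical names for this example, not part of any SD library.

```python
BLOCK_SIZE = 512


class BufferedBlockWriter:
    """Collects small records and emits them as whole 512-byte blocks."""

    def __init__(self, write_block, first_block=0):
        self._write_block = write_block  # assumed callback: (index, data)
        self._next_block = first_block
        self._buffer = bytearray()

    def append(self, record):
        """Queue a record; write a block whenever a full one is available."""
        self._buffer += record
        while len(self._buffer) >= BLOCK_SIZE:
            self._write_block(self._next_block, bytes(self._buffer[:BLOCK_SIZE]))
            del self._buffer[:BLOCK_SIZE]
            self._next_block += 1

    def flush(self):
        """Write any leftover bytes, zero-padded to a full block."""
        if self._buffer:
            padded = bytes(self._buffer).ljust(BLOCK_SIZE, b"\x00")
            self._write_block(self._next_block, padded)
            self._next_block += 1
            self._buffer.clear()
```

The trade-off is that anything still in the buffer is lost if power fails before it is written, so how often to flush depends on how much data you can afford to lose.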
I'm increasingly looking at using QR codes to transmit binary information, such as images, since it seems whenever I demo my app, it's happening in situations where the WiFi or 3G/4G just doesn't work.
I'm wondering if it's possible to split a binary file up into multiple parts to be encoded by a series of QR codes?
Would this be as simple as splitting up a text file, or would some sort of complex data coherency check be required?
Yes, you could convert any arbitrary file into a series of QR codes,
something like Books2Barcodes.
The standard way of encoding data too big to fit in one QR code is with the "Structured Append Feature" of the QR code standard.
Alas, I hear that most QR encoders or decoders -- such as zxing -- currently do not (yet) support generating or reading such a series of barcodes that use the structured append feature.
QR codes already have a pretty strong internal error correction.
If you are lucky, perhaps splitting up your file with the "split" utility
into pieces small enough to fit into an easily readable QR code,
then later scanning them in (hopefully) the right order and using "cat" to re-assemble them,
might be adequate for your application.
You surely can store a lot of data in a QR code; it can store 2953 bytes of data, which is nearly twice the size of a standard TCP/IP packet originated on an Ethernet network, so it's pretty powerful.
You will need to define some header for each QR code that describes its position in the stream required to rebuild the data. It'll be something like filename chunk 12 of 96, though encoded in something better than plain text. (Eight bytes for the filename and one byte each for the chunk number and the total number of chunks gives a simple ten-byte header and a maximum of 256 QR codes, still leaving 2943 bytes per code.)
You will probably also want to use some form of forward error correction, such as erasure codes, to add enough redundant data that mis-reads of individual QR codes, or QR codes that are missing entirely, are handled transparently. While you may be able to take an existing library (for Reed-Solomon codes, say) to fix mis-reads within a QR code, handling missing QR codes entirely may take significantly more effort on your part.
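A minimal Python sketch of such a ten-byte header might look like this. The layout details are assumptions for illustration: the name is padded to eight ASCII bytes, the chunk index is zero-based, and the last byte stores the total minus one so that 256 chunks fit in a single byte. Turning each chunk into an actual QR image is left to whatever encoder you use.

```python
import struct

QR_CAPACITY = 2953                        # bytes in the largest QR code
HEADER = struct.Struct("<8sBB")           # name, chunk index, last chunk index
PAYLOAD_SIZE = QR_CAPACITY - HEADER.size  # 2943 bytes of data per code


def split_into_chunks(name, data):
    """Return a list of byte strings, each small enough for one QR code."""
    total = max(1, -(-len(data) // PAYLOAD_SIZE))  # ceiling division
    if total > 256:
        raise ValueError("data needs more than 256 QR codes")
    tag = name.encode("ascii")[:8].ljust(8, b"\x00")
    return [HEADER.pack(tag, i, total - 1)
            + data[i * PAYLOAD_SIZE:(i + 1) * PAYLOAD_SIZE]
            for i in range(total)]


def reassemble(chunks):
    """Rebuild the data from scanned chunks, in any order, duplicates allowed."""
    parts = {}
    expected = None
    for chunk in chunks:
        _tag, index, last = HEADER.unpack(chunk[:HEADER.size])
        expected = last + 1
        parts[index] = chunk[HEADER.size:]
    if expected is None or len(parts) != expected:
        raise ValueError("some chunks are missing")
    return b"".join(parts[i] for i in range(expected))
```

Because every chunk carries its own index, the codes can be scanned in any order and a code scanned twice is harmless. This sketch does not check that all chunks share the same name, and it only detects missing chunks; it cannot recover them.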
Using erasure codes will of course reduce the amount of data you can transmit -- instead of all 753,408 bytes (256 * 2943), you will only have 512k or 384k or even less available to your final images -- depending upon what code rate you choose.
I think it is theoretically possible and as simple as splitting up a text file. However, you probably need to design some kind of header to know that the data is multi-part and to make sure different parts can be merged together correctly regardless of the order of scanning.
I am assuming that the QR reader library returns raw binary data, and you will have the job of converting it to whatever form you want.
If you want automated creation and transmission, see
gre/qrloop: Encode a big binary blob to a loop of QR codes
maxg0/displaysocket.js: DisplaySocket.js - a JavaScript library for sending data from one device to another via QR codes using only a display and a camera
Note - I haven't used either.
See also: How can I publish data from a private network without adding a bidirectional link to another network - Security StackExchange
I want to convert sound from a microphone to binary and match it against a database (a kind of voice identification program), but I have no idea how to get sound directly from the microphone so that I can convert it to binary. Is this possible at all? Please guide me.
See this:
http://www.dotnetspider.com/resources/4967-How-record-voice-from-microphone.aspx
You're not going to be able to identify voices by doing a binary comparison on sound data. The binary of a particular sound will not be identical to an imitation of that sound unless it is literally the same file, because of minor variations in just about everything. You'll need to do some signal processing to do a fuzzy comparison of the data. You can read about signal processing on Wikipedia.
You will probably find it easier to use a third party library to process the sound for you. Something like this might be a good start.
You're looking at two very distinct problems here.
The first is pretty technical: Getting sound from the microphone into a digital waveform. How you do this exactly depends on the OS and API you're using (on Windows, you're probably looking at DirectX audio or, if available, ASIO). Typically, this is how you'd proceed:
* Set up a recording buffer for the microphone, with suitable parameters (number of channels, physical input on the sound card, sample rate, bit depth, buffer size).
* Start the recording. This usually involves pointing the sound library to a callback function to process the recorded buffer.
* In the callback, read the buffer, convert it to a suitable format, and append it to the audio file of your choice. (You could also record to RAM only, but longer recordings may exceed available storage.)
* Store the recorded audio in a suitable database field (some kind of binary blob).
This is the easy part though; the harder part is matching a chunk of audio data against other chunks. A naïve approach would be to try and find exact matches, but that won't help you much, because the chance that you find one is practically zero - recording equipment, even the best, introduces a bit of random noise, and recording setups vary slightly whether you want to or not, so even if you'd have someone say something twice, perfectly identical, you'd still see differences in the recorded audio.
What you need to do, then, is find certain typical characteristics of the waveform. Things you could look for are:
* Overall amplitude shape
* Base frequencies
* Selected harmonics (formants)
Extracting these is non-trivial and involves pretty severe math; and then you'll have to condense them into some sort of fingerprint, and find a way to compare them with some fuzziness (so that a near-match is good enough, rather than requiring exact matches). Finding the right parameters and comparison algorithms isn't easy, and it takes a lot of tweaking and testing; your best bet is to go find a library that does this for you.
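To make the difference between exact and fuzzy matching concrete, here is a toy Python sketch. It is nowhere near a voice identification system: the "recordings" are synthetic sine tones with a little random noise, the fingerprint is just overall loudness (RMS) plus a base frequency estimated by autocorrelation, and every name and the 10% tolerance are assumptions chosen for the example.

```python
import math
import random


def synth_tone(freq_hz, seconds=0.25, rate=8000, noise=0.02, seed=0):
    """Synthetic stand-in for a recording: a sine tone plus random noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * freq_hz * n / rate) + rng.uniform(-noise, noise)
            for n in range(int(seconds * rate))]


def rms(samples):
    """Overall loudness of the signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, rate, fmin=80, fmax=400):
    """Estimate the base frequency from the strongest autocorrelation lag."""
    best_lag, best_score = None, float("-inf")
    for lag in range(rate // fmax, rate // fmin + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag


def fingerprint(samples, rate=8000):
    return (rms(samples), estimate_pitch(samples, rate))


def similar(fp_a, fp_b, tolerance=0.1):
    """Fuzzy match: every feature must agree to within the relative tolerance."""
    return all(abs(a - b) <= tolerance * max(abs(a), abs(b))
               for a, b in zip(fp_a, fp_b))
```

Two takes of the same 200 Hz tone with different noise differ sample by sample, so an exact comparison fails, yet their fingerprints match within the tolerance, while a 120 Hz tone does not. Real voices need far richer features (formants and the like), which is why a dedicated library is the practical route.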
Say you have a conference room and meetings take place at arbitrary, impromptu times. You would like to keep an audio record of all meetings. To make it as easy to use as possible, no action would be required on the part of meeting attendees; they just know that when they have a meeting in a specific room they will have a record of it.
Obviously just recording nonstop would be inefficient as it would be a waste of data storage and a pain to sift through.
I figure there are two basic ways to go about it.
* Recording simply starts and stops according to sound level thresholds.
* Recording is continuous, but split into X minute blocks. Blocks found to contain no content are discarded.
I like the second way better because I feel there is less risk for losing data because of late starts, or triggers failing.
I would like to implement in Python, and on Windows if possible.
Implementation suggestions?
Bonus considerations that probably deserve their own questions:
best audio format and compression for this purpose
any way of determining how many speakers are present, assuming identification is unrealistic
This is one of those projects where the path is going to be defined more by what's on hand for ready reuse.
You'll probably find it easier to record continuously and save the data off in chunks (for example, hour-long pieces).
The format is going to depend on what you have in the form of recording tools and audio processing libraries. You may even find that you use two: one format, like PCM-encoded WAV, for recording and processing, and compressed MP3 for storage.
Once you have an audio stream, you'll need to access it in a PCM form (list of amplitude values). A simple averaging approach will probably be good enough to detect when there is a conversation. Typical tuning attributes:
* Average energy level to trigger
* Amount of time you need to be at the energy level or below to identify stop and start (I recommend two different values)
* Size of analysis window for averaging
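A minimal sketch of that averaging approach, using the three tuning attributes above, could look like the following. It works on a plain list of PCM amplitudes scaled to -1.0..1.0 and uses the mean absolute amplitude per window as a simple stand-in for "energy"; the function name and the default values are assumptions to be tuned against real recordings.

```python
def find_speech_segments(samples, rate, window_s=0.5, trigger=0.05,
                         start_hold_s=1.0, stop_hold_s=3.0):
    """Return (start_seconds, end_seconds) pairs where the room was active.

    trigger      -- average level a window must reach to count as loud
    start_hold_s -- how long it must stay loud before a segment starts
    stop_hold_s  -- how long it must stay quiet before a segment ends
    window_s     -- size of the analysis window used for averaging
    """
    window = int(window_s * rate)
    start_needed = max(1, round(start_hold_s / window_s))
    stop_needed = max(1, round(stop_hold_s / window_s))
    n_windows = len(samples) // window

    segments = []
    active = False
    run = 0          # consecutive windows contradicting the current state
    seg_start = 0
    for w in range(n_windows):
        chunk = samples[w * window:(w + 1) * window]
        loud = sum(abs(s) for s in chunk) / window >= trigger
        if not active:
            run = run + 1 if loud else 0
            if run >= start_needed:
                active = True
                seg_start = (w - run + 1) * window  # back-date to first loud window
                run = 0
        else:
            run = run + 1 if not loud else 0
            if run >= stop_needed:
                active = False
                seg_end = (w - run + 1) * window    # end where the quiet began
                segments.append((seg_start / rate, seg_end / rate))
                run = 0
    if active:
        segments.append((seg_start / rate, n_windows * window / rate))
    return segments
```

Using a longer hold time for stopping than for starting keeps short pauses in conversation from splitting one meeting into many segments.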
As for number of participants, unless you find a library that does this, I don't see an easy solution. I've used speech recognition engines before and also done a reasonable amount of audio processing and I haven't seen any 'easy' ways to do this. If you were to look, search out universities doing speech analysis research. You may find some prototypes you can modify to give your software some clues.
I think you'll have difficulty doing this entirely in Python. You're talking about doing frequency/amplitude analysis of MP3 files. You would have to open up the file and look for a volume threshold, then cut out the portions that go below that threshold. Figuring out how many speakers are present would require very advanced signal processing.
A cursory Google search turned up nothing for me. You might have better luck looking for an off-the-shelf solution.
As an aside- there may be legal complications to having a recorder running 24/7 without letting people know.