I am working on a project that involves speech/audio/voice comparison. It will be used to judge the winner in mimicry competitions. In practice, I need to capture the user's speech and compare it with the original audio file, then return a percentage match. I need to develop this in R.
I have already tried the voice-related packages in R (tuneR, audio, seewave), but I have not been able to find any information on comparison.
I would appreciate pointers on where I can find information relevant to this work, what the best way to handle this type of problem is, and what the prerequisites are for this kind of audio processing.
Basically, the best features to use for speech/voice comparison are MFCCs (mel-frequency cepstral coefficients).
Some software can extract these coefficients for you, for example Praat.
You can also look for a library that extracts them.
[Edit: I've found in tuneR documentation that it has a function to extract MFCC - search for the function melfcc()]
After you've extracted these features, you can use Machine Learning (SVM, RandomForests or something like that) to develop a classifier.
I have a seminar that I've presented about Speaker Recognition Systems, take a look at it, it may be helpful. (Seminar)
If you have time and interest, you could also read:
Authors: Kinnunen, T., & Li, H. (2010)
Paper: An overview of text-independent speaker recognition: From features to supervectors
After you get a feature vector for each audio sample (with MFCC and/or other features), then you'll need to compare pairs of feature vectors (Features from A versus Features from B):
You could try to use the Absolute Difference between these feature vectors:
abs(feature vector from A - feature vector from B)
The result of the operation above is a feature vector in which every element is >= 0 and which has the same size as the A (or B) feature vector.
You could also test the element-wise multiplication between A and B features:
(A1*B1, A2*B2, ... , An*Bn)
Then you need to label each feature vector
(1 if person A == person B and 0 if person A != person B).
Usually the absolute difference performs better than the multiplication feature vector, but you can append both vectors and test the performance of the classifier using both the abs diff and the multiplication features at the same time.
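As a minimal sketch of the pair features described above (plain Python, with made-up numbers standing in for real MFCC vectors; the function name `pair_features` is only illustrative):

```python
def pair_features(a, b):
    """Combine two equally sized feature vectors into one pair vector.

    The result is the absolute difference followed by the element-wise
    product, i.e. both candidate representations appended together.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    abs_diff = [abs(x - y) for x, y in zip(a, b)]
    product = [x * y for x, y in zip(a, b)]
    return abs_diff + product


# Hypothetical feature vectors for recordings A and B.
features_a = [1.0, -2.0, 0.5]
features_b = [0.5, 1.0, 0.5]

pair = pair_features(features_a, features_b)
label = 1  # 1 if person A == person B, 0 otherwise
```

Each labelled pair vector would then be one training row for the classifier.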
In traditional Simplex Algorithm notation, we have x at the current basis selection B as so:
$x_B = A_B^{-1} b - A_B^{-1} A_N x_N$. How can I compute the $A_B^{-1} A_N$ term inside a separator in SCIP, or at least iterate over its columns?
I see three helpful methods: getLPColsData, getLPRowsData, getLPBasisInd. I'm just not sure exactly what data those methods represent, particularly the last one, with its negative row indexes. How do I use those to get the value I want?
Do those methods return the same data no matter what LP algorithm is used? Or do I need to account for dual vs primal? How does the use of the "revised" algorithm play into my calculation?
Update: I discovered getLPBInvARow and getLPBInvRow, which seem much closer to what I'm after. I don't yet understand their results; they seem to have more or fewer dimensions than expected. I'm still trying to understand how to use them to get the rays away from the corner.
You are correct that getLPBInvRow or getLPBInvARow are the methods you want. getLPBInvARow directly returns a row of the simplex tableau, but it is no more efficient than calling getLPBInvRow and doing the multiplication yourself, since the LP solver also needs to compute the actual tableau first.
I suggest you look into either sepa_gomory.c or sepa_gmi.c for examples of how to use these methods. In what way do they include fewer dimensions than expected? They both return sparse vectors.
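To illustrate "doing the multiplication yourself", here is a small sketch in plain Python. It does not call SCIP at all: the matrix `A`, the inverse basis `binv` and the helper `tableau_row` are hypothetical stand-ins, and the real SCIP methods return data in SCIP's own layout, which this sketch does not model.

```python
from fractions import Fraction as F


def tableau_row(binv_row, a_matrix):
    """Multiply one row of B^-1 by A to get the matching tableau row."""
    n_rows = len(a_matrix)
    n_cols = len(a_matrix[0])
    return [
        sum(binv_row[i] * a_matrix[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]


# Toy constraint matrix: two structural columns followed by two slack columns.
A = [[1, 1, 1, 0],
     [1, 3, 0, 1]]

# Inverse of the basis formed by columns 0 and 1, i.e. B = [[1, 1], [1, 3]].
binv = [[F(3, 2), F(-1, 2)],
        [F(-1, 2), F(1, 2)]]

rows = [tableau_row(r, A) for r in binv]

# Keeping only the nonbasic columns gives the A_B^-1 A_N term.
nonbasic = [2, 3]
binv_an = [[row[j] for j in nonbasic] for row in rows]
```

The basic columns of each tableau row come out as unit vectors, which is a quick sanity check that the right basis was used.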
Is there any (simple) random generation function that can work without variable assignment? Most functions I read look like this current = next(current). However currently I have a restriction (from SQLite) that I cannot use any variable at all.
Is there a way to generate a number sequence (for example, from 1 to max) with only n (current number index in the sequence) and seed?
Currently I am using this:
cast(((1103515245 * Seed * ROWID + 12345) % 2147483648) / 2147483648.0 * Max as int) + 1
with max being 47, ROWID being n. However for some seed, the repeat rate is too high (3 unique out of 47).
In my requirements, repetition is ok as long as it's not too much (<50%). Is there any better function that meets my need?
The question has sqlite tag but any language/pseudo-code is ok.
P.S.: I have tried linear congruential generators with several a/c/m triplets and Seed * ROWID as the seed, but that does not work well; it is even worse.
EDIT: I currently use this one, but I do not know where it's from. The rate looks better than mine:
((((Seed * ROWID) % 79) * 53) % "Max") + 1
I am not sure if you still have the same problem but I might have a solution for you.
What you could do is use pseudo-random M-sequence generators based on shift registers. You just have to pick a primitive polynomial of high enough order, and you don't really need to store any variables.
For more info you can check the wiki page
All you would need to code is the shift equation for the primitive polynomial, and from what I checked in an online editor it should be very easy to do. I think the easiest way for you is to work in binary and use PRBS sequences; depending on how many elements you have, you can choose your sequence length. For example, this is the implementation for a length of 2^15 - 1 = 32767 (PRBS15), with the primitive polynomial taken from the wiki page (there you can find primitive polynomials all the way up to PRBS31, which would be 2^31 - 1 = 2.1475e+09).
Basically what you need to do is:
SELECT (((ROWID << 1) | (((ROWID >> 14) & 1) <> ((ROWID >> 13) & 1))) & 0x7fff)
The beauty of this approach is that if you pick a PRBS whose period is longer than your largest ROWID value, every random index you get is unique. Very simple. :)
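For checking the idea outside SQLite, here is a hedged Python sketch of one shift-register step for the PRBS15 polynomial x^15 + x^14 + 1. The function names are illustrative; the step XORs bits 14 and 13 of the state to produce the new low bit.

```python
def prbs15_step(state):
    """Advance a 15-bit shift register for x^15 + x^14 + 1 by one step."""
    new_bit = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | new_bit) & 0x7FFF


def cycle_length_from(start=1):
    """Count how many steps it takes to return to the starting state."""
    state = prbs15_step(start)
    count = 1
    while state != start:
        state = prbs15_step(state)
        count += 1
    return count


cycle_length = cycle_length_from(1)

# Applying one step to distinct non-zero 15-bit inputs gives distinct outputs.
outputs = {prbs15_step(n) for n in range(1, 48)}
```

Note that the all-zero state maps to itself, so 0 should not be used as an input.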
If you need help searching for primitive polynomials, you can look at my GitHub repo, which deals with exactly that: finding primitive polynomials and unique m-sequences. It is currently written in Matlab, but I plan to rewrite it in Python in the next few days.
Cheers!
What about using a good hash function and mapping the result into the [1...max] range?
Something along these lines (in pseudocode). sha1 was added to SQLite 3.17.
sha1(ROWID) % Max + 1
Or use any external C code for hash (murmur, chacha, ...) as shown here
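The same idea can be tried out in Python with the standard `hashlib` module. This is only a sketch: mixing the seed into the hashed string and using the first 8 bytes of the digest are assumptions of mine, not part of the pseudocode above.

```python
import hashlib


def hashed_index(seed, n, max_value):
    """Map (seed, n) to a pseudo-random integer in [1, max_value]."""
    digest = hashlib.sha1(f"{seed}:{n}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % max_value + 1


values = [hashed_index(seed=42, n=n, max_value=47) for n in range(1, 48)]
unique_ratio = len(set(values)) / len(values)
```

A hash does not guarantee uniqueness, so `unique_ratio` is worth checking against the "less than 50% repetition" requirement for the seeds you care about.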
A linear congruential generator with appropriately-chosen parameters (a, c, and modulus m) will be a full-period generator, such that it cycles pseudorandomly through every integer in its period before repeating. Although you may have tried this idea before, have you considered that m is equivalent to max in your case? For a list of parameter choices for such generators, see L'Ecuyer, P., "Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure", Mathematics of Computation 68(225), January 1999.
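Whether a given (a, c, m) triplet is full-period can be checked with the Hull-Dobell conditions. The sketch below is a small illustration with a toy modulus of 16, not a recommendation of specific parameters; the helper names are made up.

```python
from math import gcd


def prime_factors(m):
    """Return the set of prime factors of m (trial division)."""
    factors, p = set(), 2
    while p * p <= m:
        while m % p == 0:
            factors.add(p)
            m //= p
        p += 1
    if m > 1:
        factors.add(m)
    return factors


def is_full_period(a, c, m):
    """Hull-Dobell conditions for x -> (a * x + c) % m to have period m."""
    if gcd(c, m) != 1:
        return False
    if any((a - 1) % p for p in prime_factors(m)):
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True


def lcg_cycle(a, c, m, start=0):
    """List the states visited from `start` until the sequence repeats."""
    seen, x = [], start
    while x not in seen:
        seen.append(x)
        x = (a * x + c) % m
    return seen


cycle = lcg_cycle(a=5, c=3, m=16)
```

With a = 5, c = 3, m = 16 the cycle visits every value from 0 to 15 exactly once before repeating.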
Note that there are some practical issues to implementing this in SQLite, especially if your SQLite version supports only 32-bit integers and 64-bit floating-point numbers (with 52 bits of precision). Namely, there may be a risk of:
overflow if an intermediate multiplication exceeds 32 bits for integers, and
precision loss if an intermediate multiplication results in a greater-than-52-bit number.
Also, consider why you are creating the random number sequence:
Is the sequence intended to be unpredictable? In that case, a linear congruential generator alone is not enough, and you should generate unique identifiers by other means, such as by combining unique numbers with cryptographically random numbers.
Will the numbers generated this way be exposed in any way to end users? If not, there is no need to obfuscate them by "shuffling" them.
Also, depending on the SQLite API you're using (for your programming language), there may be a way to write a custom function to convert the seed and ROWID to a random unique number. The details, however, depend heavily on the specific SQLite API. Another answer shows an example for Perl.
I am confused. Semantically we can construct 2^(2^n) Boolean functions, but I read in Morris Mano's Digital Electronics that we can construct 2^2n combinations of minterms/maxterms. How?
Samsamp, could you point to a specific place in the book or, even better, provide an exact quote? In the copy I found on the Internet I could not find such a claim after a quick glance. The closest thing I found is:
Since the function can be either 1 or 0 for each minterm, and since there
are 2^n minterms, one can calculate the possible functions that can be formed with n variables to be 2^2^n.
which looks OK to me.
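The count in the quote can be confirmed by brute force for small n: list the 2^n minterms, then enumerate every way of assigning a 0 or 1 output to each of them. A short Python sketch (the function name is illustrative):

```python
from itertools import product


def count_boolean_functions(n):
    """Count distinct truth tables over n Boolean variables."""
    minterms = list(product([0, 1], repeat=n))          # 2^n input rows
    tables = product([0, 1], repeat=len(minterms))      # one output bit per row
    return sum(1 for _ in tables)


counts = {n: count_boolean_functions(n) for n in range(1, 4)}
```

For n = 3 this gives 256 = 2^(2^3), whereas 2^(2*3) would only be 64, so the two readings of the exponent really do differ.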
We are trying to create an intelligent chatbot for customer service. We have a corpus of customer service questions and answers, with the intention of each conversation flagged. We are exploring the use of deep learning to train our models, but we have run into a couple of issues:
1 - How to do feature engineering to train models on text data. Specifically, how do you turn language into vectors?
2 - How to use non-word features as input for the intent recognition deep learning classifier? How do you accommodate e.g. client product names?
3 - How to choose a neural network architecture for Deep Learning with text input?
4 - How can we deal with situations where we do not have enough data? Use Bayesian techniques?
Cool, great start!
Before you jump into implementation, I would suggest learning some of the basics first.
Anyway, here are answers to your questions.
Feature engineering: as the name suggests, your data may contain things that reduce the accuracy of your model, such as words in mixed upper and lower case, digits, special characters, or lines ending in a special character. Cleaning these up usually improves accuracy, but whether it is required depends on what type of data you have.
Language into vectors: whatever the language, in your case it is ultimately text. We can give each word or character a vector representation, either as a one-hot vector or with pre-built methods such as word2vec or GloVe.
One-hot vector: say you have 100 distinct words in your training dataset. Create a k-dimensional vector for each word, where k is the total number of words. Sort the words alphabetically and, based on their sorted order, give each word a vector with a 1 at its index position and 0 everywhere else.
ex: [1 0 0 0 0 ....] - word1
[0 1 0 0 0 ....] - word2
[0 0 0 0 0 ...1] - word100
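A minimal Python sketch of that one-hot construction, using three made-up words instead of 100 (the function name and the sample words are illustrative only):

```python
def one_hot_vocabulary(words):
    """Map each distinct word to a one-hot vector, in sorted word order."""
    vocab = sorted(set(words))
    size = len(vocab)
    return {
        word: [1 if i == position else 0 for i in range(size)]
        for position, word in enumerate(vocab)
    }


vectors = one_hot_vocabulary(["refund", "order", "cancel", "order"])
```

Here `vectors["cancel"]` is `[1, 0, 0]` because "cancel" sorts first, and repeated words share a single vector.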
Non-word features: follow the same rule as for word features.
Client product names: create one-hot vectors for them, as they are not usually used in ordinary text and have no everyday meaning.
How to choose a NN: it depends on what you want to achieve. Neural networks can be used in many ways and for many purposes.
Not enough data: again, it depends on your data. If your data has common patterns and future data is likely to show the same patterns, it is still okay to use a NN. Otherwise I don't recommend using one.
Good luck!
Some additions to the previous answer from Achyuta nanda sahoo. (Numbering according to your questions)
As he said, use pretrained word embeddings (fastText, word2vec).
You can find pretrained models here, for example:
https://github.com/facebookresearch/fastText/blob/master/docs/pretrained-vectors.md
In particular, you can find client product names using Named Entity Recognition. You can, for example, start with the following repo:
https://github.com/guillaumegenthial/tf_ner
You can start with simple question-answer matching based on cosine similarity, as done here:
https://github.com/sachinbiradar9/Question-Answer-Selection
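To give a feel for cosine-similarity matching, here is a small self-contained Python sketch over bag-of-words counts. It is not taken from the linked repo; the sample questions and function names are invented for illustration, and a real system would likely use embeddings rather than raw word counts.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(question, known_questions):
    """Return the known question most similar to the incoming one."""
    return max(known_questions, key=lambda q: cosine_similarity(question, q))


known = [
    "how do i reset my password",
    "where is my order",
    "how do i cancel my order",
]
match = best_match("reset password please", known)
```

The answer stored alongside `match` would then be returned to the customer.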
Even if you do not have enough data initially, you can start with a deep neural net by pretraining on a large dataset that comes from a similar question-answering data distribution. There should be plenty of websites where you can find such question-answering scenarios ready for scraping :-)
Best
I read about MapReduce at http://en.wikipedia.org/wiki/MapReduce and understood the example of how to get the count of a "word" in many "documents". However, I did not understand the following passage:
Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values. This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.
Can someone elaborate on the difference (MapReduce framework vs. the map and reduce combination)? In particular, what does reduce do in functional programming?
Thanks a great deal.
The main difference would be that MapReduce is apparently patentable. (Couldn't help myself, sorry...)
On a more serious note, the MapReduce paper, as I remember it, describes a methodology of performing calculations in a massively parallelised fashion. This methodology builds upon the map / reduce construct which was well known for years before, but goes beyond into such matters as distributing the data etc. Also, some constraints are imposed on the structure of data being operated upon and returned by the functions used in the map-like and reduce-like parts of the computation (the thing about data coming in lists of key/value pairs), so you could say that MapReduce is a massive-parallelism-friendly specialisation of the map & reduce combination.
As for the Wikipedia comment on the function being mapped in the functional programming's map / reduce construct producing one value per input... Well, sure it does, but here there are no constraints at all on the type of said value. In particular, it could be a complex data structure like perhaps a list of things to which you would again apply a map / reduce transformation. Going back to the "counting words" example, you could very well have a function which, for a given portion of text, produces a data structure mapping words to occurrence counts, map that over your documents (or chunks of documents, as the case may be) and reduce the results.
In fact, that's exactly what happens in this article by Phil Hagelberg. It's a fun and supremely short example of a MapReduce-word-counting-like computation implemented in Clojure with map and something equivalent to reduce (the (apply + (merge-with ...)) bit -- merge-with is implemented in terms of reduce in clojure.core). The only difference between this and the Wikipedia example is that the objects being counted are URLs instead of arbitrary words -- other than that, you've got a counting words algorithm implemented with map and reduce, MapReduce-style, right there. The reason why it might not fully qualify as being an instance of MapReduce is that there's no complex distribution of workloads involved. It's all happening on a single box... albeit on all the CPUs the box provides.
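For readers who prefer Python to Clojure, a rough analogue of that idea might look like the sketch below: map each document to a word-count structure, then reduce by merging the structures. It is not a translation of the linked article; the documents and the `count_words` helper are made up.

```python
from collections import Counter
from functools import reduce


def count_words(text):
    """Map step: one document in, one word-to-count structure out."""
    return Counter(text.lower().split())


documents = ["the cat sat", "the dog sat", "the end"]

# Reduce step: merge the per-document counters into one.
totals = reduce(lambda acc, counts: acc + counts,
                map(count_words, documents),
                Counter())
```

The value produced by map here is itself a data structure, which is the point made above about there being no constraint on its type.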
For in-depth treatment of the reduce function -- also known as fold -- see Graham Hutton's A tutorial on the universality and expressiveness of fold. It's Haskell based, but should be readable even if you don't know the language, as long as you're willing to look up a Haskell thing or two as you go... Things like ++ = list concatenation, no deep Haskell magic.
Using the word count example, the original functional map() would take a set of documents, optionally distribute subsets of that set, and for each document emit a single value representing the number of words (or a particular word's occurrences) in the document. A functional reduce() would then add up the global counts for all documents, one for each document. So you get a total count (either of all words or a particular word).
In MapReduce, the map would emit a (word, count) pair for each word in each document. A MapReduce reduce() would then add up the count of each word in each document without mixing them into a single pile. So you get a list of words paired with their counts.
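The contrast between the two styles can be sketched in single-process Python. The `shuffle` step below stands in for what a real MapReduce framework does between the phases (grouping values by key); all names and sample documents are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single (key, total) pair."""
    return key, sum(values)


documents = ["the cat sat", "the dog sat", "the end"]

# MapReduce style: one result per key.
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Plain functional style: one value per document, folded into a single total.
total_words = sum(map(lambda doc: len(doc.split()), documents))
```

`word_counts` keeps the words separate, while `total_words` collapses everything into one number.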
MapReduce is a framework built around splitting a computation into parallelizable mappers and reducers. It builds on the familiar idiom of map and reduce - if you can structure your tasks such that they can be performed by independent mappers and reducers, then you can write it in a way which takes advantage of a MapReduce framework.
Imagine a Python interpreter which recognized tasks which could be computed independently, and farmed them out to mapper or reducer nodes. If you wrote
reduce(lambda x, y: x+y, map(int, ['1', '2', '3']))
or
sum([int(x) for x in ['1', '2', '3']])
you would be using functional map and reduce methods in a MapReduce framework. With current MapReduce frameworks, there's a lot more plumbing involved, but it's the same concept.