I'm using vosk for speech recognition. Does anyone know where the vocabulary dictionary is located and how to edit it to add or remove words?
Some Background on my project:
I'm working on a Linguistic AI project. I needed a speech recognition engine to convert spoken words into text. I started using CMUSphinx. PocketSphinx to be more precise. I like pocketsphinx but I was told that it is obsolete and that vosk is much better. However, pocketsphinx is very easy to use in terms of creating dictionaries from scratch and switching between different dictionaries on the fly programmatically.
I'm trying to move over to vosk as a speech recognizer, and it does seem to decode speech much faster and more accurately. But thus far I haven't been able to find any information on how to modify the vocabulary dictionary. The ability to modify the contents of the dictionary is of paramount importance in my Linguistic AI project, so if anyone can point to information on how to modify the vosk dictionary I would be very grateful. There is very little information on vosk to be found, especially in the way of tutorials or detailed instructions.
Thank you.
Edited to Add:
Here's the GitHub page for the vosk API that I'm referring to:
https://github.com/alphacep/vosk-api
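For context, here is a minimal sketch of one way to limit what vosk recognizes at run time without editing any dictionary file. It relies on the optional third argument of the Python `KaldiRecognizer` constructor, which takes a JSON-encoded list of phrases. Whether this fits the project is an assumption. As far as I know it only works with models that build their decoding graph dynamically (the small models), and it does not add words that are missing from the model's own lexicon. The helper names `build_grammar` and `make_recognizer` and the model directory are hypothetical.

```python
import json


def build_grammar(phrases, allow_unknown=True):
    """Return the JSON phrase list that KaldiRecognizer accepts as a grammar."""
    items = [p.lower().strip() for p in phrases if p.strip()]
    if allow_unknown:
        items.append("[unk]")  # lets the recognizer report out-of-grammar speech
    return json.dumps(items)


def make_recognizer(model_dir, phrases, sample_rate=16000):
    """Create a recognizer limited to `phrases` (model_dir is a placeholder path)."""
    # Imported lazily so the helper above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    return KaldiRecognizer(model, sample_rate, build_grammar(phrases))
```

Switching vocabularies programmatically would then amount to creating a new recognizer from the same model with a different phrase list.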
Does anyone know how to generate an 11x11 module QR-code? Alternatively, it could also be a 13x13 code, but preferably 11x11.
I know the codes exist, but I can't figure out how to go about generating my own.
The code doesn't have to store much information. It would be nice with a tiny image, but it wouldn't need more than a simple text message.
I have used online code generators in the past, but most of them require a paid subscription to customize and make codes, and as my project isn't sponsored, I don't have the budget to pay for one.
Thanks in advance for all answers
I'm looking for an online tool where my team and I could collaborate on creating graphs.
The purpose is to bind related words, and generate the adjacency list. For example,
Foo----Bar----Brool
        |_____Lol
will generate the following list:
Foo,[Bar]
Bar,[Foo,Brool,Lol]
Brool,[Bar]
Lol,[Bar]
The idea is to allow people to collaborate simply using graph visualization, without diving through the adjacency list directly.
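To make the target output concrete, here is a small sketch of how the adjacency list above could be derived from the edges drawn in the diagram. The function name and the edge representation (pairs of node names) are assumptions for illustration only.

```python
from collections import defaultdict


def adjacency_list(edges):
    """Build an undirected adjacency list, keeping first-seen order."""
    adjacency = defaultdict(list)
    for a, b in edges:
        if b not in adjacency[a]:
            adjacency[a].append(b)
        if a not in adjacency[b]:
            adjacency[b].append(a)
    return dict(adjacency)


# The three edges from the diagram: Foo-Bar, Bar-Brool, Bar-Lol.
edges = [("Foo", "Bar"), ("Bar", "Brool"), ("Bar", "Lol")]
for node, neighbours in adjacency_list(edges).items():
    print(f"{node},[{','.join(neighbours)}]")
```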
There is one service which I believe is being designed to let people collaborate on creating a graph: Graph Commons. The site's slogan says:
Collaborative 'network mapping' platform and knowledge base of relationships
Unfortunately, at the moment you can only sign up for a beta invitation on the website, and it is not clear from the website what the creation/editing mechanism will be.
You could use the yfiles library to build a graph editor online, but I've never used it and I don't know if you can manage multiple sessions (hence allowing direct collaboration). But, for instance, if you use graphity, which is an implementation of the yfiles flex library, and save a file on dropbox, then each collaborator has access to that file, and you can set up a rudimentary collaborative graph tool. Maybe.
It would be great to use tools like LucidChart or Draw.io, but they don't let you export a graph file (e.g. GraphML, from which you could then get an edge list with other programs like Gephi). Those tools only let you export images and vector graphics. Draw.io exports XML, but not GraphML.
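As a rough illustration of why a GraphML export would be enough, the sketch below pulls an edge list out of a GraphML document using only the Python standard library. The sample document and the function name `edge_list` are made up for the example. The namespace is the standard GraphML one.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

# A tiny hand-written GraphML document (illustrative only).
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="Foo"/>
    <node id="Bar"/>
    <node id="Brool"/>
    <edge source="Foo" target="Bar"/>
    <edge source="Bar" target="Brool"/>
  </graph>
</graphml>
"""


def edge_list(graphml_text):
    """Return (source, target) pairs for every <edge> element."""
    root = ET.fromstring(graphml_text)
    return [(e.get("source"), e.get("target")) for e in root.iter(GRAPHML_NS + "edge")]


print(edge_list(SAMPLE))
```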
I believe Linkurious lets you edit your graph. Again, I've never used it, and I don't know whether it can manage multiple sessions (and hence collaboration), but I would check it out. Edit: Linkurious enterprise edition (see pricing) is designed to handle multiple user sessions.
What about building something with vis.js? The library has the ability to «listen for changes in the data» using a DataSet component. Have a look at this example.
I'm sorry that I don't have a real answer, but since your question is very relevant these days, and the right tools should come out sooner or later (if they don't exist already), I wanted to share these thoughts. I hope they help. Please post when you find a solution!
Within a DWT Template Building Block, we can use a few "free" variables such as ##Component.Title## or ##Component.ID## as well as built-in DWT functions.
I didn't realize we can also get a component's schema description with ##Component.Description## or ##Description##.
The out-of-the-box Default Dreamweaver Component Design has a good set of examples, along with the Tridion Cook book's iteration example, and SDL Live Content.
How else could I find other allowed built-in DWT functions and variables, programmatic or otherwise?
In other words, I wouldn't have thought ##Description## was even available in DWT without seeing an example first (not that I have a use for it yet).
Edit (June 8, 2013): I did find additional information on SDL Live Content (requires login). We can of course use available Package variables as described in the documentation.
Researching a bit, I found that the Tridion.ContentManager.config file contains a node that references the Dreamweaver mediator type:
<mediator matchMIMEType="text/x-tcm-dreamweaver" type="Tridion.ContentManager.Templating.Dreamweaver.DreamweaverMediator" />
This namespace can be found inside Tridion.ContentManager.Templating.dll
Decompiling is the best way to find out what is inside and learn something. Since it is .NET code, that will not be a problem; there are many good free tools available. Lately I've been using JustDecompile.
I did not go too deep into the code, but I can see that there is a TridionObjectSource class, with a number of Constants for reserved words, like:
ReservedNameTitle
ReservedNameDescription
Searching for where these constants are used in the code can help you better understand what they do and how the Dreamweaver Mediator works internally.
Seems like an interesting learning exercise
I take it that you've searched the documentation for the answer and come up empty. I suggest that you go to the relevant part of the LiveContent documentation and add a comment. This will reach the documentation team directly, and I'm sure they'll be very interested to hear of a feature that isn't properly covered. With a bit of luck they'll update it, and you'll have done us all a favour.
I have some sympathy for the "help yourself" approach too, but if you find a feature by your own analysis of the software, and it gets removed in a later release, you won't have a leg to stand on to complain about this. So help Tridion to get the feature documented, and then it's there to use with confidence.
I'd like to use the xref information from a GPS Ada project to generate lists of the variables defined for each package spec and body. I need to exclude any variables defined inside of subprograms.
I can see this information in GPS's "Project View" which shows the literals, package, pragmas, types, and variables defined in each file. However, the information is not selectable for cut/paste. How do I generate this in text form?
GPS is customised using Python. The provided scripts are in {installation}/share/gps/library; it looks as though unused_entities.py might be a good start. Or, there's a chapter on "Customizing and Extending GPS" in the GPS documentation.
[Edit]
Or, even better, look at the example globals.py in {installation}/share/examples/gps/python. A quick poke through the documentation (accessed in GPS via Help/Python extensions) suggests you're looking for GPS.Entities e where e.category() is "object".
Since you mention GPS, have you tried Tools->Documentation->Generate project?
This will generate html, with hyperlinks etc, similar to Javadoc.
SciTools' Understand product can extract this information, although it's rather pricey. Though if you're working with a mound of legacy code, it's well worth the money--it has saved my bacon on more than one occasion.
I'm looking at requiring my team to document their code more thoroughly for some major upcoming projects and to make life a little less painful, I am steering towards XML documentation generators such as Sandcastle, Doxygen or Box Live Documenter.
What are the key considerations I should keep in mind when evaluating the best option and what experiences have led you to a particular decision?
For me the key considerations would be:
Fully automated: Can it be set up in such a way that pretty much no outside work is required to create or edit the documentation?
Fully styled: Can the documentation be fully styled so that it looks great in a wiki or pdf after it’s generated? I should be able to change colors, font sizes, layouts, etc.
Good Filtering: Can I select only the items I want to be generated? I should be able to filter the namespaces, file types, classes, etc.
Customization: Can I include headers, footers, custom elements, etc.?
I found Doxygen could do all of this. Our workflow is as follows:
Developer makes a change to the code
They update the documentation tags right above the code they just changed
We click a generate button
Doxygen will then extract all the XML documentation from the code, filter it to only include the classes and methods we want, and apply the CSS styling we’ve pre-made for it. Our end result is an internal wiki that looks the way we want, and doesn’t require editing.
Extra: We have all our projects in various git repositories. We pull all these down to one root folder and generate the docs from this root folder.
I would be interested to know how others are automating even further.
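The pull-then-generate step described above could be scripted along these lines. This is only a sketch under stated assumptions: every repository is an immediate subdirectory of the root folder, a Doxygen configuration file named `Doxyfile` lives in that root, and both `git` and `doxygen` are on the PATH. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def find_repos(root):
    """Return the immediate subdirectories of `root` that are git checkouts."""
    return sorted(p for p in Path(root).iterdir() if (p / ".git").exists())


def refresh_and_generate(root, doxyfile="Doxyfile"):
    """Pull every repository under `root`, then run Doxygen once from `root`."""
    for repo in find_repos(root):
        # --ff-only avoids creating merge commits in an unattended run.
        subprocess.run(["git", "-C", str(repo), "pull", "--ff-only"], check=True)
    # Doxygen reads its input paths and filters from the configuration file.
    subprocess.run(["doxygen", doxyfile], cwd=root, check=True)
```

Running `refresh_and_generate("/path/to/root")` from a scheduled job or CI task would replace the manual "click a generate button" step.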
Who is paying for the documentation and why? (is the system stable enough, does it add enough value)
Who is going to read it, and why is she not using a more effective communication channel? (The usual valid reason is distance in time or place.)
Who is going to keep it up to date?
When are you going to destroy it? (Automatically if it hasn't been read or updated in the past three months?)
I mostly prefer better code to make my life less painful, over more documentation, but I like scenario & unit tests and a high level architecture description.
[edit] Documentation costs time and money to write and keep up to date. JavaDoc style documentation has a serious detrimental effect on the amount of code simultaneously visible and might be a good idea for the developers using the code, but not for those writing it.