bing speech api speech to text duration statistics - azure-cognitive-services

I have a .wav file that contains a single sentence.
Can the Bing Speech API analyze .wav files and convert the speech to text? What I would like to know is whether the Bing Speech API can also give me statistics such as the following example:
input .wav file: "This is linda"
output: word: this duration: 5ms
word: is duration: 2ms
word: linda duration: 5ms
pause duration: 2ms
total duration : 12ms

Related

Google translate text to speech and apostrophes

I am using the Google API to translate a sentence. Once it is translated, I pass the result of the translation to the Google text-to-speech API.
Translation and text to speech work pretty well in general. However, I have a problem with apostrophes. For example:
1) Translation result: I & # 3 9 ; m tired (Note: I had to separate the characters with spaces because it was shown as "I´m tired" in the preview.)
2) Text to speech result says: "I and hash thirty nine m tired" (or something similar)
What kind of encoding do I need to use in the first step to get the output string right (i.e. I´m tired)?
The program is in Python. I include an extract here:
def tts_translated_text(self, input_text, input_language):
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()
    input_text = input_text.encode("utf-8")
    # Set the text input to be synthesized
    synthesis_input = texttospeech.types.SynthesisInput(text=input_text)
    voice = texttospeech.types.VoiceSelectionParams(
        language_code=input_language,
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.LINEAR16)
    response = client.synthesize_speech(synthesis_input, voice, audio_config)
    # The response's audio_content is binary.
    with open('output.wav', 'wb') as out:
        # Write the response to the output file.
        out.write(response.audio_content)
Thanks in advance,
Ester
I finally found what was wrong. The Google Translate API returns the string with HTML encoding, whereas Google Text-To-Speech expects plain UTF-8 text.
I am forced to use Python 2.7, so I did the following:
import HTMLParser
translated_text = HTMLParser.HTMLParser().unescape(translated_text_html)
where translated_text_html is the string returned by the translation API call.
In Python 3 it should be:
import html
translated_text = html.unescape(translated_text_html)
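As a quick check of the unescaping step, here is a minimal sketch that uses only the Python 3 standard library. The helper name unescape_translation is made up for illustration, and the sample string is the one from the question. It assumes the translation result contains HTML character references such as &#39;.

```python
import html


def unescape_translation(translated_text_html):
    """Turn HTML character references (e.g. &#39;) back into plain characters."""
    return html.unescape(translated_text_html)


if __name__ == "__main__":
    # Sample string from the question, as returned by the translation step.
    print(unescape_translation("I&#39;m tired"))  # prints: I'm tired
```

The unescaped string can then be passed to the text-to-speech call in place of the raw translation result.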

Deleting whole lines of a text file when starting with certain words in R

I have a .txt file that contains multiple newspaper articles. Each article has a headline, the author name, etc. I want to read the whole .txt file into R and remove every line that starts with certain words, together with the 5 lines that follow it. I think gsub with a regular expression might be the solution, but I do not know how to define it so that not only the line containing these words is deleted, but also the next 5 lines.
Edit:
The .txt file consists of 200 Washington Post articles. Each article ends with:
lydia.depillis#washpost.com
LOAD-DATE: July 14, 2013
LANGUAGE: ENGLISH
PUBLICATION-TYPE: Web Publication
Copyright 2013 Washingtonpost.Newsweek Interactive Company, LLC d/b/a Washington
Post Digital
All Rights Reserved
4 of 200 DOCUMENTS
Washington Post Blogs
In the Loop
June 28, 2013 Friday 3:08 PM EST
Whenever an e-mail address appears, I want to delete everything up to the line where a date appears, so that there is a smooth transition to the next article. I want to run a sentiment analysis and thus don't need these lines.
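The question asks for R. The sketch below shows the same line-filtering idea in Python instead, so it would need translating. It rests on several assumptions: the footer starts at a line that consists of a single e-mail-like token (with "@" or, as in the sample above, "#"), it ends at the first line that starts with a month name followed by "day, year", and that date line is dropped too. The function name and both patterns are illustrative only.

```python
import re

# Assumption: the footer starts at a line that is a single e-mail-like token.
EMAIL_LINE = re.compile(r"^\S+[@#]\S+\.\w+\s*$")

# Assumption: the footer ends at a line starting with "Month day, year".
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_LINE = re.compile(r"^(?:%s) \d{1,2}, \d{4}\b" % MONTHS)


def strip_article_footers(lines):
    """Drop every block from an e-mail line through the next date line."""
    kept = []
    skipping = False
    for line in lines:
        if not skipping and EMAIL_LINE.match(line):
            skipping = True   # footer begins; drop the e-mail line itself
            continue
        if skipping:
            if DATE_LINE.match(line):
                skipping = False  # the date line is dropped as well
            continue
        kept.append(line)
    return kept
```

Note that "LOAD-DATE: July 14, 2013" does not end the skipped block, because the date pattern is anchored at the start of the line.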

CSS for setting up figcaption format for Fig. Chapter number-Figure number

I'm editing an ePub and I need to set up the figures so that the caption has the format:
+=========+
| image |
+=========+
Fig. 8-1: My Image
I'm familiar with using CSS counters to reset a counter, increment it, and to insert it into this format:
Fig. 1: My Image
However, I've spent half a day looking through Google and a host of other sites trying to see how to do this with no success.
Any help/pointers/etc. appreciated!

unwanted "NUL" characters string after reading bytes of a mjpg TCP stream

I'm trying to record a JPEG image sent by an Ethernet camera in an MJPG stream.
The images I obtain with my Borland C++ application (VSPCIP) are sometimes "corrupted".
Here is an example of a "corrupted" JPEG frame:
it has 21690 characters (for a 640x480 JPEG image), and among them there is a run of 5045 consecutive characters with the value "NUL" (displayed as NUL in Notepad++).
And because I stop reading bytes when I reach the "content-length" specified in the MJPG header, the bytes that follow are cut off.
Two things:
- I would first like to discard these corrupted frames: how can I (quickly) detect a run of, let's say, more than 50 (or directly 5000 or 5045) consecutive "NUL" characters?
- I have to find out why my application adds this run of consecutive "NUL" characters.
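For the first point, the check itself amounts to a substring search for a block of zero bytes. Below is a minimal sketch in Python rather than Borland C++. The function name and the default threshold of 50 are assumptions taken from the numbers in the question, and the threshold should be tuned, since valid JPEG data can contain short runs of zero bytes.

```python
def has_nul_run(frame: bytes, min_run: int = 50) -> bool:
    """Return True if the frame contains at least min_run consecutive NUL bytes."""
    # A run of min_run or more zero bytes always contains this exact pattern.
    return b"\x00" * min_run in frame


if __name__ == "__main__":
    # Synthetic frame: JPEG start/end markers around a long run of NUL bytes.
    bad = b"\xff\xd8" + b"\x00" * 5045 + b"\xff\xd9"
    print(has_nul_run(bad))  # prints: True
```

The same test in C++ could be a loop that counts consecutive zero bytes and stops as soon as the count reaches the threshold.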

How can I convert MathType equation into MathML format?

I want to convert MathType equations saved in GIF format to MathML. First, I opened these GIF files and saved them in MathType 6.7. As a result, MathML text is appended to the end of the GIF files. However, when I extracted the MathML text from these GIF files using a Perl script, I found some garbled characters in the MathML text, as in the following line:
<mn>xxx</mn>
In the above line, a garbled character is inserted before the 'mn' tag. Is this a MathType bug? How can I work around this problem? I have uploaded my test GIF files here: http://ubuntuone.com/p/1352/
Update:
I tried to paste the full block of MathML here, but the formatting of the MathML text got messed up, so I pasted the MathML on GitHub instead: https://gist.github.com/1068723.
There is a garbled character in the seventh line of MathML text: "  ?#x00A0;".
The original GIF file which doesn't contain MathML text: http://ubuntuone.com/p/13Ba/
Perl script that extracts MathML from GIF image generated by MathType: https://gist.github.com/1068749
Thanks,
thinkhy
Thanks thinkhy. It could be that you are extracting the data incorrectly (we haven't looked at your script yet). Only one of your GIFs had MathML: the one whose file name starts with 106R. In that one, if you just grab all the bytes from the first bit that looks like MathML until the end, you do periodically get odd bytes in there, mostly 255's except the last one. (This, however, doesn't appear to be the junk character you're seeing.) The reason for the 255's is that the MathML is distributed over multiple comment records, each of which starts with a count of the bytes in the record. From the MathType SDK (free download; link below):
GIF Image Files
MathML text is embedded into a GIF file as an Application Extension Record, which consists of a 14-byte header (Application Extension Descriptor), followed by the MTEF data. The header contains:
Byte Introducer = 0x21;
Byte ExtensionLabel = 0xFF;
Byte BlockSize = 0x0B;
Byte ApplicationId[8] = "MathType";
Byte AuthenticationCode[3] = "003";
The data follows this header and is written as a series of blocks each containing 255 bytes or less. Each block starts with a single byte count followed by the data. The end is marked as a block with length 0.
The header is unique enough that the easiest way to extract the data might be to scan the file for the 14-byte header, then expect the MathML data blocks to follow. Properly decoding the GIF records isn't that hard either, but obviously requires you read the GIF specification.
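The header-scan approach described above could look roughly like the sketch below, written in Python rather than Perl. It builds the 14-byte header from the field values quoted from the SDK, then joins the length-prefixed blocks until the zero-length terminator. The function name is made up, and the code assumes the header bytes occur only once in the file. It has not been run against real MathType output.

```python
# 14-byte Application Extension header, built from the SDK excerpt:
# Introducer 0x21, ExtensionLabel 0xFF, BlockSize 0x0B, "MathType", "003".
HEADER = b"\x21\xff\x0b" + b"MathType" + b"003"


def extract_mathtype_payload(gif_bytes: bytes) -> bytes:
    """Return the embedded data with the per-block length bytes removed."""
    start = gif_bytes.find(HEADER)
    if start < 0:
        raise ValueError("no MathType application extension found")
    pos = start + len(HEADER)
    chunks = []
    while True:
        if pos >= len(gif_bytes):
            raise ValueError("file ended before the terminating block")
        size = gif_bytes[pos]  # one count byte per block (0..255)
        pos += 1
        if size == 0:          # a zero-length block marks the end
            break
        if pos + size > len(gif_bytes):
            raise ValueError("block runs past the end of the file")
        chunks.append(gif_bytes[pos:pos + size])
        pos += size
    return b"".join(chunks)
```

Because the count bytes are skipped rather than copied, the 255's that show up when grabbing the raw bytes should no longer appear in the extracted text.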
You may already be using the SDK, but you didn't say whether you were or not, so here's the link: http://www.dessci.com/en/reference/sdk/.
