Cannot upload speech dataset because "Failed" - microsoft-cognitive

So I am trying to upload a dataset to the Microsoft Cognitive Services Speech portal for custom models.
I have been doing this for about a year without issue; however, now I am getting "Failed" with the detail "Failed to upload data. Please check your data format and try to upload again." ... very useful.
So does anyone know what could be causing the issue, apart from the points below, which I have already checked?
Filesize is 1.3 GB (zipped) / 1.8 GB (unzipped), which is below the 2 GB limit for "Max acoustic dataset file size for Data Import" as specified in https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-services-quotas-and-limits#model-customization
The Trans.txt file is a properly formatted 1.3 MB UTF-8 (with BOM) text file with tab-separated filename / text values, as specified in https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-and-train
All entries in the Trans.txt file are present in the directory
All files in the directory have an associated entry in the Trans.txt file
All files are WAV files in the specified format.
All of the above has been working for a year; the only thing that really changes is the size of the zip file, which is still below the limits.
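For what it's worth, the cross-checks listed above (tab-separated Trans.txt entries matching the WAV files on disk) are easy to script. A minimal sketch in Python, assuming Trans.txt sits next to the WAV files; the helper name check_dataset is mine:

```python
import pathlib

def check_dataset(dataset_dir, trans_name="Trans.txt"):
    """Cross-check Trans.txt entries against the WAV files on disk."""
    root = pathlib.Path(dataset_dir)
    # utf-8-sig transparently strips the BOM the portal expects.
    lines = (root / trans_name).read_text(encoding="utf-8-sig").splitlines()
    listed = set()
    for n, line in enumerate(lines, 1):
        parts = line.split("\t")
        if len(parts) != 2:
            print(f"line {n}: expected exactly one tab, got {len(parts) - 1}")
            continue
        listed.add(parts[0])
    # Files referenced but missing on disk, and files on disk but unlisted.
    on_disk = {p.name for p in root.glob("*.wav")}
    return sorted(listed - on_disk), sorted(on_disk - listed)
```

Both returned lists should be empty for a valid dataset; it won't catch server-side failures like the one here, but it rules out the format checks quickly.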
On the off-chance someone from MS sees this, the dataset ID is: 7a3f240c-5eb7-4942-8e0f-7efa1b808eee
Related feedback post: https://feedback.azure.com/forums/932041-azure-cognitive-services/suggestions/42375118-actionable-error-messaging-in-speech-portal

After contacting MS support, it appears something broke server-side related to file size, even though we are within the limits. They are working on a fix.

Parsing FB-Purity's Firefox idb (Indexed Database API) object_data blob from Linux bash

From a Linux bash script, I want to read the structured data stored by a particular Firefox add-on called FB-Purity.
I have found a folder called .mozilla/firefox/b8eab5j0.default/storage/default/moz-extension+++37a9788c-671d-4cae-ba5c-fbdb8788499a^userContextId=4294967295/ that contains a .metadata file containing the string moz-extension://37a9788c-671d-4cae-ba5c-fbdb8788499a, a URL which, when opened in Firefox, shows the add-on's details, so I am fairly sure that this folder belongs to the add-on.
That folder contains an idb directory, which sounds like the Indexed Database API, a W3C standard apparently used by Firefox since last year to store add-on data.
The idb folder only contains an empty folder and an SQLite file.
The SQLite file, unfortunately, does not contain much structured application data, but the object_data table contains a 95KB blob which probably contains the real structured data:
INSERT INTO `object_data` VALUES (1,'0pmegsjfoetupsf.742612367',NULL,NULL,
X'e08b0d0403000101c0f1ffe5a201000400ffff7b00220032003100380035003000320022003a002
2005300610074006f0072007500200055007205105861006e00690022002c00220036003100350036
[... 95KB ...]
00780022007d00000000000000');
Question: Any clue what this blob's format is? How to extract it (using command line or any library or Linux tool) to JSON or any other readable format?
Well, I had a fun day today figuring this out and ended up creating a Python tool that can read the data from these IndexedDB database files and print it (and maybe more at some point): moz-idb-edit
To answer the technical parts of the question first:
Both the key (name) and the data (value) use a proprietary Mozilla format whose only documentation, at this time, appears to be its source code.
The keys use a special just-for-this use-case encoding whose rough description is available in mozilla-central/dom/indexedDB/Key.cpp – the file also contains the only known implementation. Its unique selling point appears to be the fact that it is relatively compact while being compatible with all the possible index types websites may throw at you as well as being in the correct binary sorting order by default.
The values are stored using SpiderMonkey's internal StructuredClone representation that is also used when moving values between processes in the browser. Again there are no docs to speak of but one can read the source code which fortunately is quite easy to understand. Before being added to the database however the generated binary is compressed on-the-fly using Google's Snappy compression which “does not aim for maximum compression [but instead …] aims for very high speeds and reasonable compression” – probably not a bad idea considering that we're dealing with wasteful web content here.
To locate the correct indexedDB file for an extension's local storage data, one needs to resolve the extension's static ID to a so-called “internal UUID” whose value is different in every browser profile instance (to make tracking based on installed add-ons a lot harder). The mapping table for this is stored as a pref (“extensions.webextensions.uuids”) in prefs.js. The IDB path then is ${MOZ_PROFILE}/storage/default/moz-extension+++${EXT_UUID}^userContextId=4294967295/idb/3647222921wleabcEoxlt-eengsairo.sqlite
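As a rough illustration of that lookup, here is a minimal Python sketch; the function name is mine, and the unescaping of the pref value is deliberately simplified (it assumes the serialized JSON contains no escaped backslashes):

```python
import json
import pathlib
import re

def idb_path_for_extension(profile_dir, ext_id):
    """Resolve an extension's static ID to its per-profile IDB file path."""
    prefs = (pathlib.Path(profile_dir) / "prefs.js").read_text(encoding="utf-8")
    # The UUID mapping is a JSON object serialized into a string pref.
    m = re.search(
        r'user_pref\("extensions\.webextensions\.uuids",\s*"(.*)"\);', prefs)
    if m is None:
        raise LookupError("extensions.webextensions.uuids not found in prefs.js")
    # Crude unescape: assumes the JSON contains no escaped backslashes.
    uuids = json.loads(m.group(1).replace('\\"', '"'))
    uuid = uuids[ext_id].strip("{}")  # strip braces if the profile stores them
    return (pathlib.Path(profile_dir) / "storage" / "default"
            / f"moz-extension+++{uuid}^userContextId=4294967295"
            / "idb" / "3647222921wleabcEoxlt-eengsairo.sqlite")
```

The moz-idb-edit tool does this resolution for you; the sketch just shows where the pieces come from.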
For all practical intents and purposes you can read the value of a single storage key of any extension by downloading the project mentioned above. Basic usage is:
$ ./moz-idb-edit --extension "${EXT_ID}" --profile "${MOZ_PROFILE}" "${STORAGE_KEY}"
Where ${EXT_ID} is the extension's static ID (check its manifest.json file or look in about:support#extensions-tbody if you're unsure), ${MOZ_PROFILE} is the Firefox profile directory (also in about:support) and ${STORAGE_KEY} is the name of the key you'd like to query (unfortunately querying all keys is not supported yet).
Writing data is not currently supported either.
I'll update this answer as I implement more features (or drop me an issue on the project page!).

RStudio converted code to gibberish, froze, and saved it as gibberish

I had a small but important R file that I have been working on for a few days.
I created and uploaded a list of about 1,000 IDs to SQL Server the other day, and today I was repeating the process with a different type of ID. I save the file frequently, and after having added a couple of lines and saved, I ran the sqlSave() statement to upload the new IDs.
RStudio promptly converted all of my code to gibberish and froze (see screen shot).
After letting it try to finish for several minutes I closed RStudio and reopened it. It automatically re-opened my untitled text files where I had a little working code, but didn't open my main code file.
When I tried to open it, I was informed that the file is 55 megabytes and thus too large to open. Indeed, I confirmed that it really is 55 MB now, and when opening it in an external text editor I see the same gibberish as in this screenshot.
Is there any hope of recovering my code?
I suppose low memory must be to blame. The object and command I was executing at the time were not resource-intensive; however, a few minutes before that I did retrieve an overly large dataframe from SQL Server.
You overwrote your code with a binary representation of your objects with this line:
save.image('jive.R')
save.image saves your R objects (the workspace), not your R script file. To save your script, just click "File -> Save". To save your objects, you would have to write them to a different file.

Efficiency of reading file names of a directory in ASP.NET

How efficient is reading the names of files in a directory in ASP.NET?
Background: I want to update pictures on a web server automatically and deploy them in advance. E.g. until 1 April I want to pick 'image1.png'; after 1 April, 'image2.png'. To achieve this I have to map every image name to a date which indicates whether that image should be picked or not.
In order to avoid a mapping between file name and date in a separate file or database, the idea is to put the date in the file name. Iterating over the directory and parsing the dates lets me find my file.
E.g.:
image_2013-01-01.png
image_2013-04-30.png
The second one will be picked from May to eternity if no image with a later date will be dropped.
So I wonder how this solution impacts the speed of a website, assuming fewer than 20 files.
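The selection logic itself is tiny. Here is a minimal, language-agnostic sketch in Python (the ASP.NET version would do the same over Directory.GetFiles); pick_image and the exact filename pattern are assumptions drawn from the example above:

```python
import datetime
import re

def pick_image(filenames, today=None):
    """Pick the file whose embedded date is the latest one not after today."""
    today = today or datetime.date.today()
    best_name, best_date = None, None
    for name in filenames:
        m = re.fullmatch(r"image_(\d{4})-(\d{2})-(\d{2})\.png", name)
        if not m:
            continue  # ignore files that do not match the naming scheme
        d = datetime.date(*map(int, m.groups()))
        if d <= today and (best_date is None or d > best_date):
            best_name, best_date = name, d
    return best_name
```

With files dated 2013-01-01 and 2013-04-30, the first is picked through April and the second from the end of April onward, matching the behaviour described above.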
If you are using something like Directory.GetFiles, that is one call to the OS.
This will access the disk to get the listing.
For fewer than 20 files this will be very quick. However, since this data is unlikely to change very often, consider caching the name of your image.
You could store it in the application context to share it among all users of your site.

ASP.NET FileUpload

Greetings!
I am using the ASP.NET FileUpload control to allow users to upload text files to our web server. Everything works great in terms of saving the file where we want it, etc., using the control's SaveAs() method.
But we were caught off guard by one seemingly simple caveat: the original timestamps of the uploaded file, the date created and date last modified, are lost. Both become the actual date and time when the file is saved to the server.
My question is: is there any way to retain the original timestamp by setting some attributes that I am not aware of yet, or is it possible to read the metadata of the file to get its original timestamp?
Any insight and suggestions are greatly appreciated.
John
Unless the file format being uploaded itself contains this data, then no.
When a file is uploaded to a web server, the binary data for the file is sent to the server, not the "file" as it is represented in the filesystem. You don't, for example, know that your file is coming from a compatible filesystem; you only get its data. Hence, the metadata is inaccessible.

Does DB2 OS/390 BLOB support .docx file

An ASP.NET app inserts a Microsoft Word 2007 .docx file into a row of a DB2 OS/390 BLOB table. A different VB.NET app gets the DB2 OS/390 BLOB data and kicks off Microsoft Word to open the .docx file, but Microsoft Word pops up a message that the data is corrupted. Word will let you repair the data so the file can be viewed, but it is extra steps and users complain.
I've seen some examples where .docx can be converted to .doc but they only talk about stripping out the text. Some of our .docx have pictures in them.
Any ideas?
I see that this question is 10 months old. I hope it's not too late to be helpful.
Neither DB2 nor any other database offering a BLOB data type would know that the data came from a .docx file, or do anything that would cause Word to complain. The data is supposed to be an exact copy of whatever data you pass to it.
Similarly, the Word document does not "know" that it has been copied to a BLOB object and then back.
Therefore, the problem is almost certainly with your handling of the BLOB data, in one or both of your programs.
Please run your first program to copy the .docx file into the database, then run the second one to read it back out. Then use a byte-by-byte tool to compare the two files. One way to do this would be to open a command window and type:
fc /b Doc1.docx Doc2.docx
If you have access to some better compare tools, by all means use them... but make sure that it looks at EVERY BYTE, not just the printable characters.
Obviously, you ARE going to find differences, or else Microsoft Word wouldn't give you errors on the second one when the first one is just fine. Once you see what the differences are, hopefully you will understand what is going wrong and how to fix them.
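If fc is not available, a byte-by-byte comparison is a few lines in any language. A minimal Python sketch (the function name is mine) that reports the first differing offset and both file lengths:

```python
def first_difference(path_a, path_b):
    """Report the first differing byte offset between two files, like fc /b."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset, len(a), len(b)
    if len(a) != len(b):
        # No differing byte in the common prefix: one file is a prefix
        # of the other (e.g. trailing padding was appended on read-back).
        return min(len(a), len(b)), len(a), len(b)
    return None, len(a), len(b)  # files are identical
```

If the offset equals the shorter file's length, the difference is pure trailing padding, which is exactly the symptom described below.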
I had a similar problem several years ago (I was storing graphics, but it's the same basic problem). It turns out that the document size was being affected - I would store 8005 bytes into the BLOB object, and when I read it back out I was getting 8192 bytes. NUL (0) bytes were being appended to the end of the data.
My solution at the time was to append an "X" to the end of the BLOB data when I wrote it to the database. Then, when I read it back, I would search for the very last "X" in the data and remove it, along with any data after it. That way, I could recover the original data. What I should have done was store the data length in the database along with the BLOB data. Then you could truncate the file to that size, eliminating the corruption.
If appended NUL bytes aren't your problem, then you'll need to do something else to fix the problem. But you don't have a clue until you know what changed. Something did.
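The truncate-to-stored-length fix described above can be sketched like this (Python; the name is mine, and the NUL-stripping fallback is lossy if the real data legitimately ends in NUL bytes):

```python
def unpad_blob(data, stored_length=None):
    """Recover the original bytes from a NUL-padded BLOB read-back.

    If the true length was stored alongside the BLOB, truncate to it;
    otherwise fall back to stripping trailing NUL bytes (unsafe if the
    real data legitimately ends in NULs).
    """
    if stored_length is not None:
        return data[:stored_length]
    return data.rstrip(b"\x00")
```

Storing the length in a separate column next to the BLOB is the robust variant; the strip-trailing-NULs fallback is only a diagnostic shortcut.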
