My text to speech voice results never sound as good as on the IBM Demo page (2) - watson-text-to-speech

When I submit a text-to-speech conversion using curl, I get an OK-sounding audio file, but it is a bit robotic and nasal. This demo page, however, sounds terrific, and I can never get such high-quality results. I do not specify the voice to use, so it uses some default.
https://www.ibm.com/demos/live/tts-demo/self-service/home
What is the above page doing differently than me?
My curl command is this:
$ curl -u "apikey:api-removed" --header "Content-Type: application/json" --header "Accept: audio/ogg" -d "#Greeting_Script.txt" --output greeting.ogg --dump-header "logfile.txt" "url-removed"
Redgar Tech replied
"If you had seen on the demo page, you were using a neural enhanced DNN version of the voices. Here, you are using their regular voice with no perfection and training."
However this link
https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices
says
"If you omit the optional voice parameter from a synthesis request, the service uses en-US_MichaelV3Voice by default"
I omitted the optional voice parameter from my synthesis request (see above) and yet I did NOT get results that used the neural enhanced voice of en-US_MichaelV3Voice.
So I tried adding the voice parameter for en-US_MichaelV3Voice and now the result is the clear neural enhanced version, same as the demo page provides.
So that means the documentation that states omitting the optional voice parameter defaults to en-US_MichaelV3Voice is incorrect. I think it may default to en-US_MichaelVoice, which is not the neural enhanced version.
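For reference, the change was simply to append the voice query parameter to the synthesize URL (same placeholders as in the command above):
$ curl -u "apikey:api-removed" --header "Content-Type: application/json" --header "Accept: audio/ogg" -d "@Greeting_Script.txt" --output greeting.ogg --dump-header "logfile.txt" "url-removed?voice=en-US_MichaelV3Voice"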

I have confirmed that if I omit the optional voice parameter from a synthesis request, the service uses en-US_MichaelVoice by default. The evidence is in the log file:
session-name: EIHRWWSDMRCEZXKA-en-US_MichaelVoice
This means the information at this link
https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices
that states "If you omit the optional voice parameter from a synthesis request, the service uses en-US_MichaelV3Voice by default." is incorrect.
When I did add the voice parameter for en-US_MichaelV3Voice, the log file contained this line:
session-name: FIPYVOXYBMNRSQZQ-en-US_MichaelV3Voice

Related

Anti-Scraping bypass?

Hello,
I'm working on a scraper for this page: https://www.dirk.nl/
I'm trying to get the 'row-wrapper' div class in the Scrapy shell.
If I enter response.css('row-wrapper'), it gives me some random results; I think an anti-scraping system is involved. I need the hrefs from this class.
Any opinions on how I can move forward?
We would need a bit more data, like the response you receive and any code if it's already set up.
But from the looks of it, it could be several things (from a 429 response blocking the request because of rate limiting, to the site's internal API/XHR calls causing data not to be rendered on page load, etc.).
Before fetching any website for scraping purposes, try curl, Postman, or Insomnia to see what type of response you are going to receive. Some servers and website architectures require certain cookies and headers while others don't. You simply have to do this research so you can make your scraping workflow efficient.
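For example, a minimal first check from the command line is a header-only request, which shows the status code and response headers before you write any spider code (the site may of course treat a HEAD request differently than a full GET):
curl -sI https://www.dirk.nl/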
I ran curl https://www.dirk.nl/ and it returned data generated by the Nuxt framework. In this case that data is unusable, since Nuxt uses its own functionality to render it.
Instead, the best solution would be not to take the HTML-based data but the API content data.
Something like this:
curl 'https://content-api.dirk.nl/misc/specific/culios.aspx?action=GetRecipe' \
-H 'accept: application/json, text/plain, */*' \
--data-raw '{"id":"11962"}' \
--compressed
Will return:
{"id":11962,"slug":"Muhammara kerstkrans","title":"Muhammara kerstkrans","subtitle":"", ...Rest of the data
I don't understand the language, but from my basic understanding this would be an API route for recipes.
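If you only need a few fields from that JSON on the command line, one option is to pipe the same request through jq (assuming jq is installed; the field names below are taken from the sample output above):
curl -s 'https://content-api.dirk.nl/misc/specific/culios.aspx?action=GetRecipe' \
-H 'accept: application/json, text/plain, */*' \
--data-raw '{"id":"11962"}' \
--compressed | jq '{id, title, slug}'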

Linkedin get user feed

I am trying to get my LinkedIn feed using this API:
https://linkedin.api-docs.io/v1.0/feed/42Hm9SaY2p2CGwPzp
I am trying to use the request "GET /voyager/api/feed/updates" with this shell code:
curl --request GET \
--url https://www.linkedin.com/voyager/api/feed/updates \
--data '{}'
But I get this response: "CSRF check failed". I understand why LinkedIn responds this way, but how do I avoid it?
You are missing headers; see the API docs here: https://linkedin.api-docs.io/v1.0/feed and an explanation of how to get the headers here: https://towardsdatascience.com/using-browser-cookies-and-voyager-api-to-scrape-linkedin-via-python-25e4ae98d2a8
The API docs are a bit outdated and the data output format might be different; this is at least true for messaging/conversations, not sure about the feed.
Regarding the headers, I suggest trying apify.com and extracting them in real time from a browser instance (run Puppeteer, log in to LinkedIn, get the headers, save them).
Phantombuster will not allow you to use your own code, so it is not very useful here.
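For illustration only, a rough sketch of what the curl call might look like once the cookies and CSRF header are copied from a logged-in browser session (the values below are placeholders; the Voyager API expects the csrf-token header to match the JSESSIONID cookie value):
curl 'https://www.linkedin.com/voyager/api/feed/updates' \
-H 'cookie: li_at=PLACEHOLDER; JSESSIONID="ajax:1234567890"' \
-H 'csrf-token: ajax:1234567890'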

Using R to call a Web Service: Send data and get the result table back in R

http://snomedct.t3as.org/ This is a web service that will analyse English clinical text, and report any concepts that can be detected.
For example: "I have a headache." It will identify headache as a Symptom.
Now what I would like to do is send the sentence to the web service through R, and get the table back from the web page into R for further analysis.
If we take their example curl command-line:
curl -s --request POST \
-H "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "The patient had a stroke." \
http://snomedct.t3as.org/snomed-coder-web/rest/v1.0/snomedctCodes
that can be translated to httr pretty easily.
The -s means "silent" (no progress meter or error messages) so we don't really have to translate that.
Any -H means to add a header to the request. This particular Content-Type header can be handled better with the encode parameter to httr::POST.
The --data-urlencode parameter says to URL encode that string and put it in the body of the request.
Finally, the URL is the resource to call.
library(httr)
result <- POST("http://snomedct.t3as.org/snomed-coder-web/rest/v1.0/snomedctCodes",
body="The patient had a stroke.",
encode="form")
Since you don't do this regularly, you can wrap the POST call with with_verbose() to see what's going on (look that up in the httr docs).
There are a ton of nuances that one should technically handle after this (like checking the HTTP status code with stop_for_status(), warn_for_status(), or even just status_code()), but for simplicity let's assume the call works (this one is their example, so it does work and returns a 200 HTTP status code, which is A Good Thing).
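If you want to sanity-check the endpoint outside of R first, the same curl call can be reduced to printing only the status code (a quick sketch: -o /dev/null discards the body and -w prints the code, which should be 200 here):
curl -s -o /dev/null -w "%{http_code}\n" \
-H "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "The patient had a stroke." \
http://snomedct.t3as.org/snomed-coder-web/rest/v1.0/snomedctCodes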
By default, that web service is returning JSON, so we need to convert it to an R object. While httr does built-in parsing, I like to use the jsonlite package to process the result:
dat <- jsonlite::fromJSON(content(result, as="text"), flatten=TRUE)
The fromJSON function takes a few parameters that are intended to help shape JSON into a reasonable R data structure (many APIs return horrible JSON and/or XML). This API would fit into the "horrible" category. The data in dat is pretty gnarly and further decoding of it would be a separate SO question.

HTTP Post Multipart Tool for testing

Does anyone know a little test tool (like Poster / RestTool for Firefox) that is able to upload a file and send a text body within the same POST request (multipart)?
It is not a Firefox add-on, but what I can really recommend is the curl tool. It fits perfectly when playing around with RESTful HTTP APIs because it is very close to the HTTP protocol. Because it is CLI based, it is more flexible than a graphical add-on (e.g. you can mail sample calls around or document your API with them).
E.g. doing a multipart request with curl would be:
# with '-v' verbose switch you see some headers
# with '-F' you add the individual multipart fields
# with '@' you reference a file
curl -v -F myPartName1=@file1.txt -F myPartName2=@file2.txt http://host.com/your/multipart/endpoint
# if the server needs it, you can also pass a Content-Type for individual files
... -F "myPartName1=@file1.txt;type=text/plain" ...
What kind of multipart does the server side expect (e.g. multipart/form-data or multipart/mixed)?
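To match the original question (one plain text value plus one file upload in the same request), a minimal sketch against the same placeholder endpoint would be:
# 'message' is sent as an ordinary text part, 'upload' as a file part
curl -v -F "message=hello, this is the text body" -F "upload=@file1.txt;type=text/plain" http://host.com/your/multipart/endpoint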
Is there a reason why it has to be a Firefox add-on? I have seen people using RestClient, but I never saw it working with multipart.
You can use the Firefox Poster add-on to send HTTP POSTs with multipart:
Select "Parameters" tab
Enter the multipart "Name" and "Value"
Press "Add/Change"
Select "Content to Send" tab
Press "Body from Parameters"
Enter your URL and User Auth, as required
Press"POST"
For Chrome/Chromium there is the excellent Postman app/extension: http://www.getpostman.com/ .
For a brief visual tutorial you can check: https://stackoverflow.com/a/16022213/1667104 .
I like to include http://aminus.net/wiki/Okapi in most of my HTTP projects these days.
Firefox has a few:
Rest Easy
Rest Client
LiveHTTPHeaders
and Poster, as mentioned earlier by @joff

Translating from cURL to straight HTTP requests

What would the following cURL command look like as a generic (without cURL) http request?
feedUri="https://www.someservice.com/feeds\
?prettyprint=true"
curl $feedUri --silent \
--header "GData-Version: 2"
For example, how could such an HTTP request be expressed in the browser address bar? Particularly, how do I express the --header information if I were to just type out the plain HTTP request?
I don't know of any browser that lets you specify header information in the address bar. I believe there are plug-ins that let you do this, but I don't have any experience with them.
Here is one for firefox that looks promising:
https://addons.mozilla.org/en-US/firefox/addon/967
Basically what you want to do is not a standard browser feature.
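For reference, the raw request that curl command sends is essentially the following (assuming HTTP/1.1; curl also adds its own User-Agent and Accept: */* headers by default):
GET /feeds?prettyprint=true HTTP/1.1
Host: www.someservice.com
GData-Version: 2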
