How to find original URL behind the translate.goog link? - google-translate

[A] https://dash-bootstrap-components.opensource.faculty.ai/docs/components/navbar/
[B] https://translate.google.com/website?sl=auto&tl=en&u=https://dash-bootstrap-components.opensource.faculty.ai/docs/components/navbar/
[C] https://hz2cpxsvz5qtrxmqge7znoupci--opensource-faculty.translate.goog/docs/components/navbar/
[A] is the original link, [B] is the URL used to create a Google Translate link, and [C] is the resulting Google Translate link. [B] redirects to [C].
Question:
I have only link [C], as text, and I would like to recover the original link [A] from it. How can I get [A] from the Google Translate link [C]? Maybe there is an API that returns the original link.
What I found is that if I send an HTTP request to [C], the response contains an iframe, and the iframe's src attribute carries the original URL as a query parameter. I would need to parse the response and extract [A] from it. This is a fallback in case I don't find anything better, and I'm still looking for a better solution.

In the [C] response there is <base href="https://dash-bootstrap-components.opensource.faculty.ai/docs/components/navbar/"> which contains the original URL.
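
A minimal sketch of reading that base tag with Python's standard library, assuming the [C] response still contains a <base href="..."> element as described above. The function name extract_base_href is made up for illustration, and fetching the page itself (for example with requests.get(...).text) is left out so the snippet runs offline.

```python
from html.parser import HTMLParser
from typing import Optional


class _BaseHrefParser(HTMLParser):
    """Remembers the href of the first <base> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def extract_base_href(html_text: str) -> Optional[str]:
    """Return the <base href> value, or None if the page has no such tag."""
    parser = _BaseHrefParser()
    parser.feed(html_text)
    return parser.base_href


# Stand-in for the HTML returned by the translate.goog page.
sample = (
    '<html><head><base href="https://dash-bootstrap-components.'
    'opensource.faculty.ai/docs/components/navbar/"></head></html>'
)
print(extract_base_href(sample))
```

If the page has no base tag, the function returns None, so the iframe-based fallback from the question can still be tried.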

Related

Extract URL stream from webpage?

I am trying to extract direct HTTP, RTSP, MP stream URLs or something that I can work with from this website:
http://kamery.ovanet.cz/
Any idea how to do it? I've tried F12 in Chrome, but wasn't successful.
Thank you.
"Any idea how to do it, I've tried F12 using Chrome, but wasn't successful."
You first have to open the Network tab of Chrome's Developer Tools, then:
In the small filter text box, type: .m3u8
Select a camera. Multiple .m3u8 files should show up as they load.
Choose a playlist.m3u8 that does not have ?auth=b64%3A etc. in the URL.
Try testing this link (it is an M3U8 playlist):
https://stream.ovanet.cz/ovanet/f89ab1fdfeba19d51fa26dd491edcbaa/camera97/playlist.m3u8
You can try the above link with an online player such as HLS Stream Tester.
Edit : Regarding comments...
"The URL link works for just a while and then it stops. Can I solve it somehow?"
Yes, their links are time-limited and will expire so you need a page refresh to get the newest version of the M3U8 link.
If you want to automate the process then you will have to write code that achieves the following:
Choose a camera ID number from this listing. Notice that the "cam_id" is followed by the related location's "title". Example is "cam_id":19.
Using the camera's ID number, make a request to this URL:
https://stream.ovanet.cz/player/api/embed-live.js?stream=camera19&_=XXXX
Note: XXXX is the current time as a UNIX timestamp (e.g. 1659717867824).
This will return some text. Part of that text is the URL with the Base64 "auth" that you need for a correct request. You get it by finding the start position of return " and extracting (via string functions) everything from there up to the first following occurrence of ";.
Request this newly extracted link, which looks like this example:
https://stream.ovanet.cz/ovanet/camera19/playlist.m3u8?auth=b64%3AY2FtZXJhMTk6OjoxMjg0MzQzMTU3OjE2NTk3MjE1NjAwODc6MTY1OTcyNTE2MDo5Mi40MC4xOTAuMjc6MDdiZGUxN2MwMGY1Nzg5YTkwZjlkMmY3YjdmM2JmYTM%3D
There will be a redirect to another URL with the same M3U8 file name but in a different location, so you need to find out this new URL. It contains the folder name you need.
Eg: the folder name is: /f3630f4527ce8ddd69bf27580737d66f/ from: https://stream.ovanet.cz/ovanet/f3630f4527ce8ddd69bf27580737d66f/camera19/playlist.m3u8
The final URL is basically (using chunklist.m3u8 seems to work better):
https://stream.ovanet.cz/ovanet/f3630f4527ce8ddd69bf27580737d66f/camera19/chunklist.m3u8
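
The string-handling parts of the steps above can be sketched in Python as follows. This is only a sketch under the assumptions stated in the answer: the embed-live.js text contains return "<url>"; and the redirected URL ends in playlist.m3u8. The helper names are invented for illustration, and the network requests themselves are left out.

```python
import re
import time
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# URL pattern taken from the answer; cam_id and ts are filled in below.
EMBED_URL = (
    "https://stream.ovanet.cz/player/api/embed-live.js"
    "?stream=camera{cam_id}&_={ts}"
)


def build_embed_url(cam_id: int, now_ms: Optional[int] = None) -> str:
    """Build the embed-live.js URL; the timestamp is in milliseconds."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    return EMBED_URL.format(cam_id=cam_id, ts=ts)


def extract_auth_url(js_text: str) -> Optional[str]:
    """Return the text between 'return "' and the first following '";'."""
    match = re.search(r'return "(.*?)";', js_text)
    return match.group(1) if match else None


def to_chunklist_url(redirected_url: str) -> str:
    """Swap the file name of the redirected URL for chunklist.m3u8."""
    parts = urlsplit(redirected_url)
    folder = parts.path.rsplit("/", 1)[0]
    return urlunsplit(
        (parts.scheme, parts.netloc, folder + "/chunklist.m3u8", "", "")
    )


print(build_embed_url(19, 1659717867824))
print(to_chunklist_url(
    "https://stream.ovanet.cz/ovanet/"
    "f3630f4527ce8ddd69bf27580737d66f/camera19/playlist.m3u8"
))
```

With an HTTP client that follows redirects, the redirected URL would typically be whatever final URL the client reports after requesting the extracted auth link. Because the links expire, the whole sequence would have to be repeated whenever the stream stops.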

What is the # in this post request for?

I'm using fiddler to observe a redirect request and the request looks like this:
https://example.com/api/dothis#code=123456&name=abcdef
I'm not familiar with what the hashtag (#) is for. Could I get an explanation?
If it is an HTTP POST request to https://example.com/api/dothis#code=123456&name=abcdef, I think it is due to (bad) backend design: the /api/dothis endpoint would parse the string after # and extract the code and name parameters.
As for the hash sign (#) itself, it is normally used to indicate an anchor within HTML content; please refer to Fragment Identifier for more information. Going by Wikipedia's description, it is unlikely that #code=123456&name=abcdef works as a fragment identifier in that sense.
# is also used in Single Page Applications to navigate between application UIs. But that is not the case here, as the HTTP method in this question is POST.
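
To see how the URL from the question breaks down, here is a small Python sketch using urllib.parse. As a general point about URLs (not something confirmed for this particular API), HTTP clients normally keep the fragment on the client side and do not include it in the request sent to the server, so the parameters after # are usually read by client-side code. The function name split_fragment_params is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def split_fragment_params(url: str):
    """Split a URL into the part before '#' and the fragment's key=value pairs."""
    parts = urlsplit(url)
    # Everything except the fragment.
    without_fragment = parts._replace(fragment="").geturl()
    # Treat the fragment like a query string: code=...&name=...
    params = {key: values[0] for key, values in parse_qs(parts.fragment).items()}
    return without_fragment, params


target, params = split_fragment_params(
    "https://example.com/api/dothis#code=123456&name=abcdef"
)
print(target)
print(params)
```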

Python 3.5 requests for crawling

I have a coding problem regarding web crawling with Python 3.5.
I try to use 'requests.get' to extract the real link from 'http://www.baidu.com/link?url=ePp1pCIHlDpkuhgOrvIrT3XeWQ5IRp3k0P8knV3tH0QNyeA042ZtaW6DHomhrl_aUXOaQvMBu8UmDjySGFD2qCsHHtf1pBbAq-e2jpWuUd3'. An example of the code is like below:
import requests
response = requests.get('http://www.baidu.com/link?url=ePp1pCIHlDpkuhgOrvIrT3XeWQ5IRp3k0P8knV3tH0QNyeA042ZtaW6DHomhrl_aUXOaQvMBu8UmDjySGFD2qCsHHtf1pBbAq-e2jpWuUd3')
c = response.url
I expected 'c' to be 'caifu.cnstock.com/fortune/sft_jj/tjj_yndt/201605/3787477.htm'. (I removed http:// from the link as I can't post two links in one question.)
However, it doesn't work and keeps returning the same link that I put in.
Can anyone help with this? Many thanks in advance.
#
Thanks a lot to Charlie.
I have found the solution. I first use .content.decode to read the response body, but that is mixed up with a lot of irrelevant info. I then use re.findall to extract the redirect URL from the body, which should be the first quoted URL in the response. Then I use requests.get to retrieve the page. Below is the code:
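import re
import requests
# url holds the Baidu link from the question.
rep1 = requests.get(url)
cont = rep1.content.decode('utf-8')
# Collect every double-quoted string; the first one is the redirect target.
extract_cont = re.findall('"([^"]*)"', cont)
redir_url = extract_cont[0]
rep = requests.get(redir_url)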
You may consider looking into the response headers for a 'location' header.
response.headers['location']
You may also consider looking at the response history, which contains a response object for each step in a chain of redirects.
response.history
Your sample URL doesn't redirect at the HTTP level; the response is a 200, and the page then uses a JavaScript window.location change. The requests library does not follow this type of redirect.
<script>window.location.replace("http://caifu.cnstock.com/fortune/sft_jj/tjj_yndt/201605/3787477.htm")</script>
<noscript><META http-equiv="refresh" content="0;URL='http://caifu.cnstock.com/fortune/sft_jj/tjj_yndt/201605/3787477.htm'"></noscript>
If you know you will always be using this one service, you could parse the response, maybe using regex.
If you don't know what service will always be used and also want to handle every possible situation, you might need to instantiate a WebKit instance or something and somehow try to determine when it finally finishes. I'm sure there's a page load complete event which you could use, but you still might have pages that do a window.location change after the page is loaded using a timer. This will be very heavyweight and still not cover every conceivable type of redirect.
I recommend starting by writing a special handler for each type of edge case and falling back on a default handler that just looks at response.url. As new edge cases come up, write new handlers. It's kind of a 'trial and error' approach.
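
A rough sketch of that handler approach, assuming only the two patterns visible in the sample response above (window.location.replace and a meta refresh). The handler names and regular expressions are illustrative and will not cover every page; the functions take the response text and the final response.url as plain strings so they can be tried without a network call.

```python
import re
from typing import Callable, List, Optional

# Matches window.location.replace("...") with single or double quotes.
JS_REDIRECT = re.compile(
    r"""window\.location\.replace\(\s*["']([^"']+)["']\s*\)"""
)
# Matches <meta http-equiv="refresh" content="0;URL='...'">.
META_REFRESH = re.compile(
    r"""http-equiv=["']?refresh["']?[^>]*?URL=['"]?([^'">\s]+)""",
    re.IGNORECASE,
)


def from_js_redirect(html: str) -> Optional[str]:
    match = JS_REDIRECT.search(html)
    return match.group(1) if match else None


def from_meta_refresh(html: str) -> Optional[str]:
    match = META_REFRESH.search(html)
    return match.group(1) if match else None


# Edge-case handlers, tried in order; add new ones as new cases come up.
HANDLERS: List[Callable[[str], Optional[str]]] = [
    from_js_redirect,
    from_meta_refresh,
]


def resolve_target(html: str, fallback_url: str) -> str:
    """Return the first redirect target a handler finds, else the fallback."""
    for handler in HANDLERS:
        target = handler(html)
        if target:
            return target
    return fallback_url


body = (
    '<script>window.location.replace("http://caifu.cnstock.com/'
    'fortune/sft_jj/tjj_yndt/201605/3787477.htm")</script>'
)
print(resolve_target(body, "http://www.baidu.com/link"))
```

With requests, the two arguments would be response.text and response.url.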

How to show different content based on the path in Racket web servlets?

I'm trying to follow the tutorial on the Racket guide on simple web apps, but can't get one, basic, basic thing.
How can you have a servlet serve different content based on the request URL? Despite my scouring, even the huge blog example was one big file and everything handled with huge get query strings behind my back. How can I do anything based on URLs? Clojure's Noir framework puts this basic feature big up front on the home page (defpage) but how to do this with Racket?
The URL is part of the request structure that the servlet receives as an argument. You can get the URL by calling request-uri, then you can look at it to do whatever you want. The request also includes the HTTP method, headers, and so on.
But that's pretty low-level. A better solution is to use dispatch-rules to define a mapping from URL patterns to handler functions. Here's an example from the docs:
(define-values (blog-dispatch blog-url)
  (dispatch-rules
   [("") list-posts]
   [("posts" (string-arg)) review-post]
   [("archive" (integer-arg) (integer-arg)) review-archive]
   [else list-posts]))
Make your main servlet handler blog-dispatch. The URL http://yoursite.com/ will be handled by calling (list-posts req), where req is the request structure. The URL http://yoursite.com/posts/a-funny-story will be handled by calling (review-post req "a-funny-story"). And so on.

Is it possible to use a search string url with Google Calendar

Would anybody know if it is possible to search calendar events with a URL string?
For example, with Gmail you could use:
https://mail.google.com/mail/?shva=1#search/Search+Term
to search for the words 'search' and 'term'
Any help or advice greatly appreciated.
Cheers
Noel
This is how you can search Google Calendar using a query parameter in the url:
https://www.google.com/calendar/render?q=TERM
where TERM is the string you're searching for.
So, you can search using a GET request.
Using Chrome's dev tools, it appears as though Google Calendar searches are performed using a POST request, so you won't be able to pass search terms into urls to fetch a response (which would be a GET request).
Update: Looks like a GET request will still return results, but the response is some form of JSON.
Here is the URL (I x'd out my specific info); it looks like it's not meant for what you want to use it for:
https://www.google.com/calendar/search?ctz=America/New_York&hl=en&as_tp=basic&as_myuids=xxx&as_otheruids=xxx&as_q=kai%20mallea&as_cal=xxx;xxx&secid=xxx
A generalization of the current answer is:
https://calendar.google.com/calendar/b/0/render?q=TERM
where 0 is the index of the desired Google account (if you have multiple accounts).
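
A small Python sketch for building that URL with the search term properly encoded. It uses the URL pattern from the answer above as-is; whether Google Calendar still honors the q parameter is not verified here, and the function name calendar_search_url is made up for illustration.

```python
from urllib.parse import urlencode


def calendar_search_url(term: str, account_index: int = 0) -> str:
    """Build the render?q=TERM URL for the given Google account index."""
    base = f"https://calendar.google.com/calendar/b/{account_index}/render"
    # urlencode escapes spaces and special characters in the search term.
    return base + "?" + urlencode({"q": term})


print(calendar_search_url("kai mallea"))
print(calendar_search_url("team meeting", account_index=1))
```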
