How to get the request messages from a website - web-scraping

I want to get the source video link for each video on tiktok.com. In Chrome's network panel I can see each request, including the link to the source video (the request URL). How would I write a program that intercepts these requests and adds each request URL to a list of some sort? Thanks.

I suggest you read more about web scraping.
To get the video links, you can send the relevant request yourself and then extract the part you want from the response with a regex or a similar tool,
for example, video link ...
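The regex approach can be sketched roughly like this. It assumes the response text contains absolute video URLs whose path ends in .mp4, which may not hold for TikTok's actual pages; the function name extract_video_links and the sample markup are hypothetical, for illustration only.

```python
import re

# Assumption: video links appear in the page source as absolute http(s) URLs
# whose path ends in ".mp4", optionally followed by a query string.
VIDEO_URL_RE = re.compile(r'https?://[^\s"\'<>]+?\.mp4(?:\?[^\s"\'<>]*)?')


def extract_video_links(page_source: str) -> list[str]:
    """Return the unique video URLs found in page_source, in order of appearance."""
    # dict.fromkeys removes duplicates while keeping the first-seen order.
    return list(dict.fromkeys(VIDEO_URL_RE.findall(page_source)))


if __name__ == "__main__":
    # Hypothetical markup, only to show the call.
    sample = '<video src="https://example.com/v/123.mp4?sig=abc"></video>'
    for link in extract_video_links(sample):
        print(link)
```

In practice page_source would be the body of whatever response you fetched. If the links are only produced by JavaScript at runtime, they will not be in the raw HTML and this will find nothing.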
If TikTok offers an official feature for this, it will be documented in the official source:
https://developers.tiktok.com/
A library or framework may already have been written for this.
Searching for "tiktok api" turns up existing repositories you can use:
https://www.google.com/search?q=tiktok+api

Related

Is there an endpoint to batch get urn:li:digitalmediaAsset in the LinkedIn API?

We are making a rest/posts?author={MY_ORG} request against the LinkedIn API (version 202211). Some of the posts returned contain content referenced with urn:li:digitalmediaAsset, for which we need the download URL.
When I encounter urn:li:image or urn:li:video I can do a BATCH get to fetch additional details about the assets. I'd like to do the same thing for urn:li:digitalmediaAsset. I haven't seen an endpoint for that - does it exist?
I understand that I can use a projection here, but I'd like to align with the code that I have for images and videos if the endpoint exists. In other words, I am looking for an alternative to using projections.

How to download embedded videos on Instagram that have multiple media network requests? (ffmpeg / curl)

I was wondering whether it is still possible to download the stories on Instagram that have multiple media network requests nowadays. Previously, I could simply download them using the Firefox Media Page Info View. This is still possible for some content such as reels or by using ffmpeg / curl. I have posted the network behaviour when I load a story on Instagram and a Reel on Instagram. I could not figure out which of the network requests is the relevant one (assuming it is the request with the biggest size) and when downloading it, the VLC player does not recognise the video.
Instagram Reel Network Request
Instagram Stories Network Request
When using the Firefox Media Page Info View, it is possible to download the content from the reel directly whereas the mp4 media file is greyed out.
I tried passing the URL copied from the network request to ffmpeg:
ffmpeg -i <network_videomanifest_url> -c copy <output_path>
Copying the corresponding curl request also only works for the Reel, where the HTTP response status code is 206, and not for the Story, where it is 200.

How do I upload a video to YouTube via an HTTP request?

I've been trying to figure this out for hours now. The official documentation says I need to make a POST request to https://www.googleapis.com/upload/youtube/v3/videos with a Content-Type header set to video/* or application/octet-stream (I've used the latter). It turns out that if I just POST a buffer of a video file to that URL, it works. But the documentation also says I can specify a whole bunch of options about the video (title, description, tags, etc.), and it says to attach that information to the request body! I'm confused about how I'm supposed to send both the video bytes and the options in the same request. Maybe it's not supposed to be the same request, but the documentation doesn't mention using multiple requests.
Uploading videos with the YouTube API is done using a protocol that Google calls the "Resumable Uploads Protocol". Google uses this protocol across its APIs (e.g. Drive, YouTube) and recommends it in the following scenarios:
Uploading large files.
Unreliable network connection.
The full details of how to use the "Resumable Uploads Protocol" with the YouTube API can be found at https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol.
The following is a simplified set of steps:
Create a resumable upload session by sending a POST request to the insert API endpoint.
Read the resumable session URI from the Location header of the response to the above request.
Upload the video by sending a PUT request with binary video data as body to the resumable session URI.
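The three steps above can be sketched roughly as follows, using only the Python standard library. This is a simplified illustration and not a complete client: it assumes you already have a valid OAuth 2.0 access token, it does not handle interrupted uploads, and the header names and the uploadType/part query parameters are written as I understand them from the guide linked above, so check them there. The function names are hypothetical. The point for the question is that the video options travel as JSON in the first request and the video bytes in the second.

```python
import json
import urllib.request

# Endpoint from the question; the query parameters are taken from Google's
# resumable upload guide as I understand it (treat them as assumptions).
UPLOAD_URL = ("https://www.googleapis.com/upload/youtube/v3/videos"
              "?uploadType=resumable&part=snippet,status")


def build_session_request(access_token: str, metadata: dict,
                          video_size: int) -> urllib.request.Request:
    """Step 1: POST the video options (title, description, ...) as JSON."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json; charset=UTF-8",
        "X-Upload-Content-Length": str(video_size),
        "X-Upload-Content-Type": "video/*",
    }
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(UPLOAD_URL, data=body, headers=headers, method="POST")


def build_upload_request(session_uri: str, video_bytes: bytes) -> urllib.request.Request:
    """Step 3: PUT the raw video bytes to the resumable session URI."""
    headers = {"Content-Type": "video/*", "Content-Length": str(len(video_bytes))}
    return urllib.request.Request(session_uri, data=video_bytes, headers=headers, method="PUT")


def upload_video(access_token: str, metadata: dict, video_bytes: bytes) -> bytes:
    """Run all three steps and return the raw body of the final response."""
    session_request = build_session_request(access_token, metadata, len(video_bytes))
    with urllib.request.urlopen(session_request) as response:
        session_uri = response.headers["Location"]  # Step 2
    with urllib.request.urlopen(build_upload_request(session_uri, video_bytes)) as response:
        return response.read()
```

A call might look like upload_video(token, {"snippet": {"title": "My video", "description": "..."}, "status": {"privacyStatus": "private"}}, video_bytes), where token and video_bytes are values you supply yourself.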

How can I find the URL that downloads a file?

I am developing a web scraper and I need to download a .pdf file from a page. I can get the file name from the html tag, but can't find the complete url (or request body) that downloads the file.
I have tried to sniff the traffic with the Chrome and Firefox network tools and with Wireshark, with no success. I can see the page make a POST request to the exact same URL as the page itself, and I can't understand why this happens. My guess is that the filename is being sent inside the POST request body, but I can't find that information in those tools either. If I could see the variable name in the body, I could create a copy of the request and then get the file.
How can I get that information?
Here is the website I am talking about: http://www2.trt8.jus.br/consultaprocesso/formulario/ProcessoConjulgado.aspx?sDsTelaOrigem=ListarProcessos.aspx&iNrInstancia=1&sFlTipo=T&iNrProcessoVaraUnica=126&iNrProcessoUnica=1267&iNrProcessoAnoUnica=2010&iNrRegiaoUnica=8&iNrJusticaUnica=5&iNrDigitoUnica=24&iNrProcesso=1267&iNrProcessoAno=2010&iNrProcesso2a=0&iNrProcessoAno2a=0
EDIT: for those seeking to do something similar, take a look at this website: http://curl.trillworks.com/
It converts a cURL command into Python requests code. Very useful.
The POST data used for the request is encoded content generated by ASP.NET. It contains various state/session information of the page that the link is on. This makes it difficult to directly scrape for the URL.
You can examine the request by exporting a HAR file from the Network tab in Chrome DevTools.
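Once you have the exported HAR file, which is plain JSON, something like the following can list the POST requests and their bodies. It is a minimal sketch that relies on the standard HAR layout (log, entries, request, postData); the function names and the file name in the usage note are made up for illustration.

```python
import json


def post_requests_from_har(har: dict) -> list[dict]:
    """Collect the URL and body of every POST request recorded in a HAR export."""
    found = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if request.get("method") != "POST":
            continue
        post_data = request.get("postData", {})
        found.append({
            "url": request.get("url"),
            "mime_type": post_data.get("mimeType"),
            "body": post_data.get("text"),  # raw body as recorded by the browser
        })
    return found


def load_har(path: str) -> dict:
    """Read a HAR file exported from the browser's Network tab."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, post_requests_from_har(load_har("page.har")) returns one dictionary per POST request, and the "body" value is where form fields such as __VIEWSTATE would show up.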
The __EVENTVALIDATION data is used to ensure events raised on the client originate from the controls rendered on the page from the server.
You might be able to achieve what you want by first requesting the page the link is on, then extracting the required POST data from the response (which contains the page state and the embedded request for the file), and then making a new request with this information. This assumes the server doesn't expire any sessions in the meantime.
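As a rough sketch of the extraction step, the hidden form fields can be pulled out of the page HTML and re-encoded as a POST body. This assumes the page is a standard ASP.NET Web Forms page whose link triggers a postback through the __EVENTTARGET and __EVENTARGUMENT fields. The control name you pass as event_target is something you would have to read from the link on the real page, and the names HiddenInputParser and build_postback_body are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_body(page_html: str, event_target: str,
                        event_argument: str = "") -> bytes:
    """Build a form-encoded POST body that replays the page state plus one event."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    fields = dict(parser.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    # Assumption: the link posts back by filling in these two fields.
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode("ascii")
```

The resulting bytes would then be sent as the body of a POST to the same URL as the page, in the same session (cookies included) that fetched the page.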

How to find HTTP POST Data sent to a CGI Page?

I searched Google for a good number of hours. Maybe I searched for the wrong keywords.
Here is what I want to do.
I'm posting data to a website, which then makes an HTTP POST request and returns a .CGI webpage. I want to know the parameters the web page uses to send that HTTP POST request, so that I can link directly from my own webpage to the final .CGI webpage by having the user enter the data on my webpage.
How do I achieve it?
Usually the POST body is piped into STDIN, so the CGI program can just read it like a normal file.
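A minimal sketch of a CGI script doing that, assuming the form is submitted as application/x-www-form-urlencoded and the server sets the CONTENT_LENGTH environment variable as CGI servers normally do. The function name read_post_fields is just for illustration.

```python
import os
import sys
from urllib.parse import parse_qs


def read_post_fields(environ=None, stream=None) -> dict:
    """Read CONTENT_LENGTH bytes of POST body and decode them as form fields."""
    environ = os.environ if environ is None else environ
    length = int(environ.get("CONTENT_LENGTH") or 0)
    if length <= 0:
        return {}  # no body was sent
    # The web server pipes the request body into the script's standard input.
    stream = sys.stdin.buffer if stream is None else stream
    raw = stream.read(length)
    return parse_qs(raw.decode("utf-8"), keep_blank_values=True)


if __name__ == "__main__":
    fields = read_post_fields()
    # A CGI response starts with headers, then a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    for name, values in fields.items():
        print(f"{name} = {values}")
```

Printing the fields like this on the receiving side shows exactly which parameter names the form sends.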
