Scraping / extracting YouTube channel URLs - r

I am trying to collect data from YouTube and would ideally like to get the channel URLs of as many YouTubers as possible who are active on one particular day and who come from a specific country.
The website Channel Crawler lets you collect large numbers of YouTube channels, but it only has about 5 million channels in total and is therefore not a complete representation. Most scraping tools I found let you scrape based on a URL, but have no way of finding the URLs in the first place.
Basically, before collecting more data, I need the channel URLs of the videos in order to actually scrape the data.
Does anybody have any recommendations for websites or methods? Not to scrape actual channel information, but just to collect as many channel URLs as possible?
Any advice is welcome!
Thank you
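One approach worth sketching: the YouTube Data API v3's search.list endpoint accepts regionCode, publishedAfter and publishedBefore, so you can page through videos uploaded on a given day in a given country and collect the distinct channel URLs behind them. A minimal sketch in Python, assuming you have an API key (YOUR_API_KEY is a placeholder):

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: create one in the Google Cloud Console
SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"

def channel_urls_for_day(day_start, day_end, region="DE", query=""):
    """Collect distinct channel URLs from videos published in [day_start, day_end)."""
    urls, page_token = set(), None
    while True:
        params = {
            "key": API_KEY,
            "part": "snippet",
            "type": "video",
            "regionCode": region,            # ISO 3166-1 country code
            "publishedAfter": day_start,     # e.g. "2021-06-01T00:00:00Z"
            "publishedBefore": day_end,      # e.g. "2021-06-02T00:00:00Z"
            "maxResults": 50,
            "q": query,
        }
        if page_token:
            params["pageToken"] = page_token
        data = requests.get(SEARCH_URL, params=params).json()
        for item in data.get("items", []):
            urls.add("https://www.youtube.com/channel/" + item["snippet"]["channelId"])
        page_token = data.get("nextPageToken")
        if not page_token:
            return urls

print(len(channel_urls_for_day("2021-06-01T00:00:00Z", "2021-06-02T00:00:00Z")))
```

Note that search.list caps each query at a few hundred results and every call costs quota, so to get anywhere near completeness you would have to shard the day across many query terms or categories and merge the channel sets.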

Related

Is it possible to scrape multiple data points from multiple URLs with data on different pages into a CSV?

I'm trying to build a directory on my website and want to get that data from SERPs. The sites from my search results could have data on different pages.
For example, I want to build a directory of adult sports leagues in the US. I gather the league URLs from my SERPs. Then, from that list, I want to search those individual URLs for: name of league, location, sports offered, contact info, description, etc.
Each website will have that info in different places, obviously. But I'd like to be able to get the data I'm looking for (which not every site will have) and put that in a CSV and then use it to build the directory on my website.
I'm not a coder, but I'm trying to find out whether this is even feasible given my limited understanding of data scraping. I would appreciate any feedback!
I've looked at some data-scraping software and put up requests on Fiverr with no response.
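It is feasible, and the core loop is small. A rough sketch of the pipeline in Python, assuming you already have the list of league URLs from your search results; the name/email/phone heuristics are purely illustrative, since every site structures this information differently:

```python
import csv
import re
import requests
from bs4 import BeautifulSoup

urls = ["https://example-league.com"]  # hypothetical: your SERP-derived URL list

with open("directory.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "name", "email", "phone"])
    writer.writeheader()
    for url in urls:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        text = soup.get_text(" ", strip=True)
        # crude heuristics -- not every site will yield every field
        email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
        phone = re.search(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}", text)
        writer.writerow({
            "url": url,
            "name": soup.title.string.strip() if soup.title and soup.title.string else "",
            "email": email.group() if email else "",
            "phone": phone.group() if phone else "",
        })
```

Fields like sports offered or a description usually need per-site extraction rules or a manual pass; most no-code scraping tools are wrapping a loop like this one.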

Is there a way to simulate a user search via the YouTube API?

I am trying to collect some data for a pet study. I would be collecting some metadata on the video suggestions based on a search. I was wondering if it is possible to do the following using the YouTube API (Python or R):
1. Input a search keyword and get the results.
2. Choose one of the videos randomly and see the list of video suggestions.
3. Choose one of the suggested videos randomly.
4. Repeat this n times.
Is there a way to emulate this entire process? I think web scraping could be an option, but I am not really sure how I would go about it, so any pointers to get me started would be amazing.
Also, is it possible to have no history, i.e. an option to erase all the cookies from the previous attempt (steps 1 through 4) and start afresh? (More like an option to run this in incognito mode.)
TIA for your suggestions.
AFAIK Google tracks the computer you are using in such a way that you can't escape their filter bubble. Even through Tor, YouTube might prefer content related to the exit node's IP location (and so its language), or to any previous YouTube search done by you (through this exit node), by another user of the exit node, or by any computer using the same IP as the exit node...
The YouTube Data API v3 offers a way to retrieve suggestions via part=suggestions with Videos: list, by authenticating with OAuth (so results might not be neutral). You can get the initial videos via Search: list with the q filter. Web scraping is also doable and leaves you less tracked; my open-source YouTube operational API is able to web-scrape search results, for instance.
Note that a French person claims to have built such a neutral graph of French YouTube suggestions.
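To make the walk concrete, here is a minimal Python sketch of steps 1 through 4 against the Data API. It assumes an API key, and it uses search.list's relatedToVideoId parameter for the suggestion step; that parameter has since been removed from the API, at which point web scraping (e.g. the operational API mentioned above) becomes the fallback:

```python
import random
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"

def search_videos(**extra):
    """Return video IDs from one search.list call."""
    params = {"key": API_KEY, "part": "snippet", "type": "video", "maxResults": 25}
    params.update(extra)
    items = requests.get(SEARCH_URL, params=params).json().get("items", [])
    return [it["id"]["videoId"] for it in items]

def random_walk(keyword, n=5):
    video = random.choice(search_videos(q=keyword))   # steps 1 and 2
    path = [video]
    for _ in range(n):                                # steps 3 and 4, n times
        video = random.choice(search_videos(relatedToVideoId=video))
        path.append(video)
    return path

print(random_walk("data science"))
```

Because this goes through an API key rather than a logged-in browser session, there are no cookies or watch history to clear between runs, though as noted above the returned suggestions may still differ from what a real browser would show.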

Web scraping ranked volume data from OpenSea stats pages

I'm trying to get rankings data for NFT collections sorted by their highest all-time volume. It seems that the OpenSea API does not currently support ranked lists as an endpoint. As a workaround, I'm looking at web scraping to fetch the all-time volume rankings from https://opensea.io/rankings?sortBy=total_volume.
However, I am having difficulty fetching data for any entry past the first 100 items, i.e. page 2 of the rankings and onwards. The OpenSea URL does not change when I click through the list of ranks at the bottom of the page (101-201).
Any ideas on how I could automate web scraping for ranks past the first 100 entries?
I'd appreciate any help here, and thanks in advance!
Have you checked out this library, which does the scraping for you under the hood? I have tested some endpoints and it appears to return data: https://github.com/dcts/opensea-scraper
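If you would rather drive the page yourself, browser automation gets past the fact that the URL never changes: the pagination arrows update the list via JavaScript, so you click them in a real browser and re-read the rows. A hedged Playwright sketch; the two selectors are illustrative guesses, since OpenSea changes its markup often, so inspect the live page and adjust:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://opensea.io/rankings?sortBy=total_volume")
    for _ in range(5):  # first 5 pages of ~100 rows each
        page.wait_for_selector("a[href^='/collection/']")   # hypothetical selector
        rows = page.locator("a[href^='/collection/']")
        for i in range(rows.count()):
            print(rows.nth(i).get_attribute("href"))
        page.click("button[aria-label='Next page']")        # hypothetical selector
        page.wait_for_timeout(2000)                         # let the new rows render
    browser.close()
```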

Tracking a search that leads to a sale in GA

This seems really basic, but I am struggling with it.
We have a client who runs a travel website.
They have a few different search bars, e.g. Flights, Hotels, Car hire.
I am trying to track the performance of each: "What % of people who ran a Flight search completed a sale?" Same for Hotels and for Car hire.
Any ideas for the best way to get this info in GA?
Many thanks
There are a few ways to get this information, each with their pros and cons. The options that I see immediately available are segments and goals.
Segments are great because they are retrospective and generally more flexible, with the ability to be changed if you find your criteria isn't quite right. You create a segment that specifies sessions that go through your search results pages, etc.
Then you can create another segment for the booking confirmation page, and for any other intermediary steps that you'd like to report on. The main con of segments is that you can only pull in four at a time; if you have more, pull them four at a time and copy and paste the data into an Excel sheet or Google sheet. Segments can also be pulled via the Core Reporting API and Data Studio, which makes them great for automating into dashboards.
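As an illustration of that automation, a minimal Python sketch pulling one segment through the Reporting API v4; the key file path, view ID and segment ID are placeholders you would look up in your own GA property:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "keyfile.json",  # placeholder service-account key with read access to the view
    scopes=["https://www.googleapis.com/auth/analytics.readonly"])
analytics = build("analyticsreporting", "v4", credentials=creds)

response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "XXXXXXXX",                        # placeholder view ID
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "yesterday"}],
        "metrics": [{"expression": "ga:sessions"},
                    {"expression": "ga:transactions"}],
        "dimensions": [{"name": "ga:segment"}],      # required when segmenting
        "segments": [{"segmentId": "gaid::XXXXXX"}], # your flight-search segment
    }]
}).execute()
print(response)
```

Dividing ga:transactions by ga:sessions for each search segment gives exactly the "% of people who ran a Flight search and completed a sale" figure.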
Goals are cool because they pull into the default reports and basically track sessions through a particular page, event or sequence. The main con I see, and the reason I don't use them, is that they only start tracking from the time you create them, and if you change the configuration it does not affect historical data, so your data can get messed up quickly if you don't have sandbox GA views or sandbox goals for testing before putting a change into a dedicated goal slot. You can also only have 10 or 20 goals depending on your plan, and once data is tracked against a goal you can't remove or clear it.

eBay API: Get all items currently in auction

For a university project (Big Data lecture), I'd like to analyze auctions on eBay. So far I haven't been able to find reliable information on whether it's possible to get all current auctions on eBay via their API. I only need the auction title and the current price, and I am aware that this is a huge load of data, but I'm just curious.
I don't think it's possible, in part because of the huge amount of data, and perhaps also because I don't think eBay wants people downloading data en masse like that. Doing so might allow people to do data mining and market research from a vantage point that is too publicly revealing for them.
If you're willing to settle for a large segment of data, look into eBay's Large Merchant Services and their LMS API.
For your research project, you should be able to make sense of an even smaller subset of data by just pulling from eBay's Finding API in a few automated large chunks.
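As a concrete starting point, a hedged Python sketch against the Finding API's findItemsAdvanced call, filtered to auction-format listings and returning just the title and current price. You would need your own App ID, and since the API caps pagination at roughly 10,000 items per query, "all auctions" is out of reach; you would shard by keywords or categories instead:

```python
import requests

APP_ID = "YOUR_APP_ID"  # placeholder: from the eBay developer program
URL = "https://svcs.ebay.com/services/search/FindingService/v1"

def auction_chunk(keywords, page=1):
    """One page of auction-format listings as (title, current price) pairs."""
    params = {
        "OPERATION-NAME": "findItemsAdvanced",
        "SERVICE-VERSION": "1.0.0",
        "SECURITY-APPNAME": APP_ID,
        "RESPONSE-DATA-FORMAT": "JSON",
        "keywords": keywords,
        "itemFilter(0).name": "ListingType",
        "itemFilter(0).value": "Auction",
        "paginationInput.entriesPerPage": 100,
        "paginationInput.pageNumber": page,
    }
    data = requests.get(URL, params=params).json()
    items = data["findItemsAdvancedResponse"][0]["searchResult"][0].get("item", [])
    return [(i["title"][0],
             i["sellingStatus"][0]["currentPrice"][0]["__value__"])
            for i in items]

for title, price in auction_chunk("laptop"):
    print(price, title)
```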
