Is there a way to simulate a user search on the YouTube API? - web-scraping

I am trying to collect some data for a pet study: metadata on the video suggestions that follow from a search. I was wondering if it is possible to do the following using the YouTube API (Python or R):
1. Input a search keyword and get the results.
2. Choose one of the videos randomly and see the list of video suggestions.
3. Choose one of the suggested videos randomly.
4. Repeat this "n" times.
Is there a way to emulate this entire process? I think web scraping could be an option, but I am not really sure how I would go about it. Any pointers to get me started would be amazing.
Also, is it possible to have no history, as in an option to erase all the cookies from the previous attempt (steps 1 through 4) and start afresh? (More like an option to run this in Incognito Mode.)
TIA for your suggestions.

AFAIK Google tracks the computer you are using in such a way that you can't escape their filter bubble. Even through Tor, YouTube might prefer some content related to the exit node IP location (and so language), to any previous YouTube search done by you (through this exit node), or to searches by another user of the exit node or of any computer sharing the exit node's IP...
The YouTube Data API v3 can retrieve suggestions with part=suggestions on Videos: list, which requires authenticating with OAuth (so results might not be neutral). You can get the initial videos from Search: list using the q filter. Web scraping is also doable and leaves you less tracked; my open-source YouTube operational API is able to web-scrape search results, for instance.
Note that a French person claims to have built such a neutral graph of French YouTube suggestions.
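For what it's worth, the loop from the question (search, pick a random result, follow a random suggestion, repeat n times) can be sketched independently of how the data is fetched. In this minimal sketch `random_walk`, `search_fn` and `suggest_fn` are hypothetical names of my own: the two callables stand in for whatever API call or scraper you end up using, and starting each walk from a fresh session (no cookies) would be their job.

```python
import random
from typing import Callable, List, Optional


def random_walk(
    keyword: str,
    n_steps: int,
    search_fn: Callable[[str], List[str]],
    suggest_fn: Callable[[str], List[str]],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the video IDs visited during one walk.

    search_fn and suggest_fn are hypothetical placeholders: plug in whatever
    actually fetches search results and suggestions (API call or scraper).
    """
    rng = rng or random.Random()
    results = search_fn(keyword)        # step 1: search for the keyword
    if not results:
        return []
    path = [rng.choice(results)]        # step 2: pick one result at random
    for _ in range(n_steps):            # steps 3-4: follow suggestions n times
        suggestions = suggest_fn(path[-1])
        if not suggestions:
            break                       # dead end: nothing suggested
        path.append(rng.choice(suggestions))
    return path
```

Passing a seeded `random.Random` makes a walk reproducible, which helps when comparing runs made with and without history.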

Related

Has anyone displayed a Salesforce Dashboard component on WordPress site? If so, how?

I work for a nonprofit which helps disabled military veterans. All our participants register with us, and Salesforce is the repository for their registrations. We have dashboard components in Salesforce Lightning which total up the number of active participants we have. I would like to display one such component on our WordPress site but have never done anything like that before. I was hoping to find someone who has done something like that and can offer some direction on how to go about it.
I tried looking up WordPress plugins which integrate with Salesforce. Most seem to be geared towards sending registrations back and forth but not displaying information. From a little bit of research, it seems like coding might need to be involved. Maybe a REST API with a POST option which will send the data through an HTTP URI? But my understanding is that this would require WordPress to act as an API. I am sure there are gaps in my logic.
I don't have extensive programming experience but am willing to learn. I have taken a few Java and JavaScript classes in school.
I have not attempted this yet. I am just looking for feedback and direction.
A few options here, in no specific order...
Do WordPress users have real Salesforce accounts or is their data simply stored in SF? Ask your Salesforce admin if there's a "customer community" configured (if your SF org is really old he might refer to it as customer portal). Communities offer a nice way of exposing SF to people who don't need full SF user licenses. Think of collaborating with real SF users on "My Cases", viewing reports & dashboards... But for this you'd really need people logged in to SF, so it won't work if you want just something anonymous. Some more info
Another option might be using Sites (Visualforce pages that expose SF data to guest users). Think of displaying a product catalog, FAQ, web-to-lead form or some other generic "contact us" page that's anonymous. So if you have an SF developer (or an admin with good copy-paste skills) you could use some Visualforce charts. They can be 100% coded (like this) or fed data from a report (like this), so it's simpler for an admin to change the report filters without really writing code. Not sure if the simple route will work on a Site; there are some old answers that say "No", so you might have to try it out. Worst case you'd need Apex code (or JavaScript) to query SF for results and display them. Then display that SF Site page as an <iframe> in WordPress.
A slight twist on the Sites option - do you use Chatter (a bit like Twitter inside SF)? There's a way to take a snapshot of a report when a milestone has been met and post it to Chatter ("congrats for hitting X participants"), and to embed feeds on Visualforce pages too. Docs
What SF edition are you on (Group/Professional/Enterprise...)? If you have API access to Salesforce you could query the info yourself from WordPress and display it using whatever charting library is easiest for you (Google Charts, Flot...). There are tons of examples of how to connect to SF from PHP (or maybe you could cannibalize a WP plugin). Technically it's one POST message to log in to SF and one GET to run a query (something as simple as SELECT COUNT() FROM Contact WHERE isActive__c = true?)
That'd be more or less everything in terms of pulling data out of Salesforce. I mean, if you have API access enabled you can slice & dice it how you want, extract data with raw PHP code or use some middleware, but the overall idea doesn't change. Write queries yourself or use the "Analytics API" to access report results (so your administrator has the power to change it without coding)...
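To make the "one GET to run a query" part concrete, here is a rough sketch (in Python rather than PHP, purely for illustration) of building the REST query URL and reading the count from the decoded JSON reply. The instance URL and the API version `v58.0` are placeholders, `build_query_url` and `extract_count` are hypothetical helper names, and the real request would also need an `Authorization: Bearer <token>` header obtained from the login POST.

```python
from urllib.parse import urlencode


def build_query_url(instance_url: str, soql: str, api_version: str = "v58.0") -> str:
    """Build the REST 'query' resource URL for a SOQL statement.

    instance_url and api_version are placeholders for your own org's values.
    """
    base = instance_url.rstrip("/")
    return f"{base}/services/data/{api_version}/query?{urlencode({'q': soql})}"


def extract_count(response_json: dict) -> int:
    """For a SELECT COUNT() query the count is reported in 'totalSize'."""
    return int(response_json["totalSize"])


# isActive__c is the example field from the answer above; yours may differ.
soql = "SELECT COUNT() FROM Contact WHERE isActive__c = true"
url = build_query_url("https://example.my.salesforce.com", soql)
```

The number returned by `extract_count` is what you would then hand to the charting library on the WordPress side.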
So how about pushing? SF could notify you about current participants count. At scheduled intervals or even realtime. That'd be "just" raw data though, you'd have to write visualisation yourself.
Plenty of options here:
workflow rules (code-free): these send an XML message to a specified URL, so you'd need a WP page that can "capture" the result. The message could be sent on creation of a new record or update of an existing one. It won't give you totals, only data related to that particular record, so you'd have to build a kind of +1 / -1 counter... Or if you use a report + analytic snapshot (helper object to store report results) and have a workflow on that - that could be really close to what's needed.
scheduled Apex job to run some queries and send the results to you. Again - you'd need a WP URL that can be called from SF
if there's a CometD plugin for WordPress you should look at Salesforce Streaming API, Platform Events or (newer and even simpler to configure) Change Data Capture. Basically you "subscribe" to a topic (a SF query) and whenever SF data changes and SF decides it'd change the results of the query - it'd push the results to you. It's almost realtime. There's too much to write about them here; it's perhaps best to click through some trailheads - SF self-paced training courses:
https://trailhead.salesforce.com/en/content/learn/modules/api_basics/api_basics_streaming
https://trailhead.salesforce.com/en/content/learn/modules/change-data-capture
https://trailhead.salesforce.com/en/content/learn/modules/platform_events_basics

Scrape all google search result for a specific name

I think this question has been answered here before, but I could not find the topic. I am a newbie in web scraping. I have to develop a script that takes all the Google search results for a specific name. It should then grab the related data for that name and, if more than one match is found, group the data by name.
All I know is that Google has some kind of restriction on scraping and provides a Custom Search API. I have not used that API yet, but I am hoping to get all the result links for a query from it. However, I could not work out the ideal process for scraping the information from those links. Any tutorial link or suggestion is very much appreciated.
You should have said a bit more about what you have tried; it does not sound like you even attempted to solve it yourself.
Anyway, if you are still on it:
You can scrape Google in two ways: one is allowed, one is not.
a) Use their API; you can get around 2k results a day.
You can up it to around 3k a day for 2000 USD/year. You can up it further by getting in contact with them directly.
You will not get accurate ranking positions from this method. If you only need a low number of requests and are mainly interested in getting some websites for a keyword, that's the choice.
Starting point would be here: https://code.google.com/apis/console/
b) You can scrape the real search results
That's the only way to get the true ranking positions, for SEO purposes or to track website positions. It also lets you get a large number of results, if done right.
You can Google for code, the most advanced free (PHP) code I know is at http://scraping.compunect.com
However, there are other projects and code snippets.
You can start off at 300-500 requests per day and this can be multiplied by using multiple IPs. Look at the linked article if you want to go that route; it explains it in more detail and is quite accurate.
That said, if you choose route b) you break Google's terms, so either do not accept them or make sure you are not detected. If Google detects you, your script will be banned by IP/captcha. Not getting detected should be a priority.
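For route a), a minimal sketch of what the requests might look like, assuming the Custom Search JSON API endpoint with its `key`, `cx`, `q`, `num` and `start` parameters. The API key and engine ID are placeholders you get from the console, `page_urls` and `extract_links` are hypothetical helper names, and the fetching itself is left out.

```python
from typing import Dict, List
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def page_urls(query: str, api_key: str, cx: str, pages: int = 3) -> List[str]:
    """One request URL per page of 10 results; 'start' is 1-based."""
    urls = []
    for page in range(pages):
        params = {
            "key": api_key,        # placeholder: your API key
            "cx": cx,              # placeholder: your search engine ID
            "q": query,
            "num": 10,             # results per page
            "start": 1 + 10 * page,
        }
        urls.append(f"{ENDPOINT}?{urlencode(params)}")
    return urls


def extract_links(response_json: Dict) -> List[str]:
    """Pull result links out of one decoded response.

    'items' is treated as optional so an empty result page yields [].
    """
    return [item["link"] for item in response_json.get("items", [])]
```

The links collected this way are what you would then download and parse yourself to group the data by name.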

How to Stream Through Large Amounts of Twitter Data?

I'll be working on a project that will require a live output of the number of tweets users have tagged with a hashtag on Twitter, as well as the tweets themselves. Something along the lines of MTV's Twitter Tracker: http://vma-twittertracker.mtv.com/live/#buzz.
What intrigued me about this site is how they can constantly make API calls to Twitter without breaching the request limit.
I'd appreciate if anyone could guide me on the most effective way to accomplish this. From the research I've carried out thus far, I presume I will need to use Twitter's Streaming API.
Since there is a chance that the number of tweets output to my page could be in their thousands (AJAX loaded) along with stats on number of retweets/favourites, what would be the most scalable approach within my .NET site? Any examples or guidance would be appreciated.
Check out Linq2Twitter. It is a great wrapper around the Twitter API, and provides two mechanisms that will help you:
There is a search function that allows you to search for hash tags, etc., which will limit the amount of data you are getting back.
You have the option to specify getting all the data since a certain tweet ID. You can therefore search the feed incrementally: in each subsequent call, search from the ID you left off on.
I have used this many times to search the public feed and have not had any issues to date. I think the search function is key to not requesting too much. Good luck!
You can look into the Storm framework. Below are a few links for further reference:
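The "search from the ID you left off on" idea is not specific to any one wrapper. A language-neutral sketch of it in Python, where `poll_new_tweets` and `search_fn` are hypothetical names of mine and `search_fn` stands in for whatever wrapper call performs the actual search:

```python
from typing import Callable, Dict, List, Optional, Tuple


def poll_new_tweets(
    search_fn: Callable[[str, Optional[int]], List[Dict]],
    query: str,
    since_id: Optional[int] = None,
) -> Tuple[List[Dict], Optional[int]]:
    """Fetch tweets newer than since_id and return (tweets, new_since_id).

    search_fn is a hypothetical stand-in for the real search call; it is
    expected to return dicts that each carry a numeric 'id'.
    """
    tweets = search_fn(query, since_id)
    if tweets:
        # Remember the newest ID so the next call only asks for later tweets.
        since_id = max(t["id"] for t in tweets)
    return tweets, since_id
```

Calling this on a timer and feeding the returned ID into the next call keeps each request small, which is what helps with the rate limit.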
http://storm-project.net/
https://github.com/nathanmarz/storm
Thanks for all your responses.
It looks like sites that display a lot of Twitter stats/data use approved third-party providers that have direct access to Twitter's Firehose API.
I have managed to get in contact with an approved provider to supply us with the feeds of data required (and it ain't cheap!).

Connecting track names and artist names with Spotify URIs

I've found related questions (like this one), but nothing that directly answers my question: I need a direct way to turn an artist name and track name into a Spotify link, just like Spotify does for the local file list (some are links, some are not, I assume because Spotify doesn't have those tracks).
How can I turn something like artist:'Francolin' and track name:'Hospital Song' into a Spotify URI without searching for it (which will return multiple results, and I don't know which one to use)? How does the Spotify local files list do it?
The local files list in the Spotify client makes URLs like this:
spotify:local:Coldplay:Mylo+Xyloto:Paradise:277 (spotify:local:ARTIST:ALBUM:TRACK:LENGTH_IN_SECONDS). You can verify this by right-clicking a local file in your list that hasn't been linked to a Spotify track and choosing "Copy Spotify URI".
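As a small illustration of that format, here is a sketch that assembles such a URI from its four fields. `local_uri` is a hypothetical helper, and the escaping is an assumption on my part: `quote_plus` reproduces the `+` for spaces seen in the example, but how the client treats other special characters is not confirmed here.

```python
from urllib.parse import quote_plus


def local_uri(artist: str, album: str, track: str, length_seconds: int) -> str:
    """Assemble spotify:local:ARTIST:ALBUM:TRACK:LENGTH_IN_SECONDS.

    quote_plus is an assumed escaping: it turns spaces into '+' and
    percent-encodes ':' so the field separators stay unambiguous.
    """
    parts = [
        quote_plus(artist),
        quote_plus(album),
        quote_plus(track),
        str(int(length_seconds)),
    ]
    return "spotify:local:" + ":".join(parts)


print(local_uri("Coldplay", "Mylo Xyloto", "Paradise", 277))
# prints spotify:local:Coldplay:Mylo+Xyloto:Paradise:277
```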
When playing the track, the client resolves it without using the backend at all - it searches its own local list of known files and plays whichever matches it closest.
When linking to a "real" Spotify track, the client asks the backend to do the dirty work. There isn't a web API for this (it's in libSpotify though), but basically the backend does a few heuristics to the data* then chooses the track that matches the given data (including length) the closest.
*Basically, the track metadata is stripped to a simpler form when searching, and the album has less weighting since an artist may release the same track on multiple albums.
I ran into the same problem as you, but I don't think there's a direct way to convert it. Instead I just run a search with "artist:'$artist' title:'$title'", which should be very accurate, and use the first result in the array of results.

Drupal 7 Click tracking / traffic per node

I'd like to track the number of times a node shows up in search results, the number of times it's clicked on in a search result, and the number of times it is displayed. I'd like all of this information to be readily available - perhaps stored in the content type? I could use some help determining the best practice here.
Alternatively, could I use Google Analytics to track all of this somehow, such that I could access the results from a custom-designed module? (Can anyone point me in a direction on that?)
Ultimately I want users to look at a node and be able to see how much traffic came from in-site linking, off-site linking, and how many times users followed a link from the node to an external website.
Can I export google analytic data for each node somehow to drupal?
Thanks!
I don't know of any existing Drupal module that can accomplish that, but some study of Google Analytics can get you there.
"the number of times it's clicked on in a search result, and the number of times it is displayed."
This can be done by inserting event trackers.
"Ultimately I want uses to look at a node and be able to see how much traffic came from in-site linking, off-site linking, and how many times users followed a link from the node to an external website."
This is largely already built into GA, and you can use conversions if you need some stats that are a bit more complex.
"Can I export google analytic data for each node somehow to drupal"
GA can export data as CSV, and you can automate that (via the API) to feed the data back to Drupal.
You might want to ask on the Webmasters site if you decide to go with GA.
So how is what you're looking for any different than - http://drupal.org/project/click
Other than that there isn't yet a Drupal 7 version of this module?
An alternative to click for Drupal 7 click thru external link tracking/logging might be: pop_links
(currently a development release in Drupal 7)
