Extract leads from Google Maps using Python - web-scraping

There's a Chrome extension called "Lead Extractor" that, when you search for e.g. "Barbers in New York", lets you download a CSV file listing all the merchants with address, opening hours, review count, phone number, etc. (for a better understanding, click HERE). Is there any way to do what this extension does using Python?
I tried to look around but I wasn't able to find much. Thank you! I am a novice and I automated the process with PyAutoGUI, but the process is time-consuming, and sometimes things get messed up and the cursor ends up roaming around aimlessly without downloading anything.
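One alternative to GUI automation (not mentioned in the original question) is Google's official Places API. Below is a minimal sketch, assuming you have a Places API key (YOUR_KEY is a placeholder): Text Search returns names, addresses and review counts, while phone numbers need a second Place Details call per result.

```python
import csv
import requests

API_KEY = "YOUR_KEY"  # placeholder -- requires a Google Places API key
SEARCH = "https://maps.googleapis.com/maps/api/place/textsearch/json"
DETAILS = "https://maps.googleapis.com/maps/api/place/details/json"

# Text Search: one request returns up to 20 matching places
results = requests.get(
    SEARCH, params={"query": "Barbers in New York", "key": API_KEY}
).json()["results"]

with open("leads.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "address", "rating", "reviews", "phone"])
    for place in results:
        # Phone numbers live behind a per-place Details request
        details = requests.get(DETAILS, params={
            "place_id": place["place_id"],
            "fields": "formatted_phone_number",
            "key": API_KEY,
        }).json()["result"]
        writer.writerow([
            place["name"],
            place.get("formatted_address", ""),
            place.get("rating", ""),
            place.get("user_ratings_total", ""),
            details.get("formatted_phone_number", ""),
        ])
```

Note that Text Search caps out at 20 results per page (up to 60 via next_page_token), so it won't match a scraper's coverage for big queries, but it also won't wander off like a scripted cursor.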

Related

Is there a way to simulate a user search on the YouTube API?

I am trying to collect some data for a pet study. I would be collecting some metadata on the video suggestions based on a search. I was wondering if it is possible to do the following using the YouTube API (Python or R):
1) Input a search keyword and get the results.
2) Choose one of the videos randomly and see the list of video suggestions.
3) Choose one of the suggested videos randomly.
4) Repeat this "n" times.
Is there a way to emulate this entire process? I think web scraping could be an option, but I am not really sure how I would go about it, so any pointers would be amazing and would get me started.
Also, is it possible to have no history, as in an option to erase all the cookies from the previous attempt (steps 1 through 4) and start afresh (more like running in Incognito mode)?
TIA for your suggestions.
AFAIK Google tracks the computer you are using in such a way that you can't escape its filter bubble. Even through Tor, YouTube might prefer content related to the exit node's IP location (and so its language), or to any previous YouTube search done by you (through this exit node), by another user of the exit node, or by any computer using the same IP as the exit node.
The YouTube Data API v3 can retrieve suggestions via part=suggestions with Videos: list, by authenticating with OAuth (so results might not be neutral). You can get the initial videos with Search: list and its q filter. Web scraping is also doable if you want to be tracked less; my open-source YouTube operational API is able to web-scrape search results, for instance.
Note that a French person claims to have built such a neutral French YouTube suggestions graph.
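To make the first step concrete, here is a minimal sketch of Search: list using the official google-api-python-client (YOUR_API_KEY is a placeholder). As noted above, the suggestion-following steps have no neutral public endpoint, so only steps 1-2 are sketched:

```python
import random
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")  # placeholder key

# Step 1: search for a keyword and collect the resulting video IDs
response = youtube.search().list(
    part="snippet", q="soccer highlights", type="video", maxResults=25
).execute()
videos = [item["id"]["videoId"] for item in response["items"]]

# Step 2: pick one at random; steps 3-4 (following suggestions) are the
# hard part, since part=suggestions requires the video owner's OAuth
video_id = random.choice(videos)
print("Starting video:", video_id)
```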

What is an acceptable way to print from app maker?

I'm trying to figure out the best way to print from App Maker. I have a guest management app and I need a way to print out guest passes from App Maker. I have some ideas, but I'm not sure what would work or what the accepted best practice is. These will print on a 4x6 thermal printer. Any working examples would be greatly appreciated. I've only managed to get option two below to work, and only without CSS formatting.
1) Open the guest info in a page fragment and print it. This would need to print the page fragment as displayed; I'm unsure whether this is possible in App Maker.
2) Create an HTML page by passing in the guest's information, open the page in a new tab, and use window.print().
3) Use a mail merge of sorts on a document in Google Drive and print it with Cloud Print. I'm worried that the lag time might make this slow, and I'm also unsure whether it's doable.
Thank You
It sounds like you want to manually print them.
You could have a 'Visitor Card' template saved as either a Google Doc or a Google Sheet. When the visitor signs in, duplicate that document, write your visitor's information to the appropriate cells/positions, then save it as a PDF in a Google Drive location.
All of the above can be automated, so you'll simply have to have a tab open at the Drive location and then print the file once it's saved. The saving process shouldn't take any longer than 5-10 seconds, which I think is a reasonable timeframe.
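For illustration, here is a rough sketch of that duplicate-fill-export flow using the Drive and Docs REST APIs from Python (in App Maker itself you'd use server-side Apps Script instead). The service account file, template ID, and the {{name}} placeholder are all assumptions for the sketch:

```python
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build

# Assumed: a service account JSON and a template Doc containing "{{name}}"
creds = Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/documents"],
)
TEMPLATE_ID = "YOUR_TEMPLATE_DOC_ID"  # placeholder

drive = build("drive", "v3", credentials=creds)
docs = build("docs", "v1", credentials=creds)

# 1) Duplicate the template
copy = drive.files().copy(
    fileId=TEMPLATE_ID, body={"name": "Guest pass - Jane Doe"}
).execute()

# 2) Fill in the visitor's details
docs.documents().batchUpdate(
    documentId=copy["id"],
    body={"requests": [{
        "replaceAllText": {
            "containsText": {"text": "{{name}}", "matchCase": True},
            "replaceText": "Jane Doe",
        }
    }]},
).execute()

# 3) Export the filled-in copy as a PDF, ready to print
pdf_bytes = drive.files().export(
    fileId=copy["id"], mimeType="application/pdf"
).execute()
with open("guest_pass.pdf", "wb") as f:
    f.write(pdf_bytes)
```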
Another option, which is more technical, is to send a print request directly to your printer. This would depend on your printer and your technical capabilities, though.

Retrieve a number from each page of a paginated website

I have a list of approx. 36,000 URLs, ranging from https://www.fff.fr/la-vie-des-clubs/1/infos-cles to https://www.fff.fr/la-vie-des-clubs/36179/infos-cles (a few of those pages return 404 errors).
Each of those pages contains a number (the number of teams the soccer club contains). In the HTML file, the number appears as <p class="number">5</p>.
Is there a reasonably simple way to compile an Excel or CSV file with the URL and the associated number of teams as a field?
I've tried looking into PhantomJS, but my method took 10 seconds to open a single webpage and I don't really want to spend 100 hours doing this. I was not able to figure out how (or whether it was at all possible) to use scraping tools such as import.io to do this.
Thanks!
For the goal you want to achieve, I can see two solutions:
Code it in Java: Jsoup + any CSV library
The 36,000+ URLs can be downloaded easily in a few minutes.
Use a tool like Portia from scrapinghub.com
Portia is a WYSIWYG tool that quickly helps you create your project and run it. They offer a free plan which can handle the 36,000+ links.
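A third option, not in the original answer: since the number sits in a static <p class="number"> tag, no browser rendering should be needed at all, and plain Python with requests and BeautifulSoup avoids the PhantomJS overhead. A minimal sketch, assuming the value really is present in the raw HTML:

```python
import csv
import requests
from bs4 import BeautifulSoup

with open("teams.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "teams"])
    for club_id in range(1, 36180):
        url = f"https://www.fff.fr/la-vie-des-clubs/{club_id}/infos-cles"
        resp = requests.get(url, timeout=10)
        if resp.status_code != 200:
            continue  # skip the club IDs that return 404
        tag = BeautifulSoup(resp.text, "html.parser").find("p", class_="number")
        if tag is not None:
            writer.writerow([url, tag.get_text(strip=True)])
```

Each plain HTTP request takes a fraction of a second instead of 10, and wrapping the loop body in a concurrent.futures.ThreadPoolExecutor should bring the total runtime down to well under an hour.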

Connecting track names and artist names with Spotify Uris

I've found related questions (like this one), but nothing that directly answers my question: I need a direct way to turn an artist name and track name into a Spotify link, just like Spotify does for the local file list (some are links, some are not; I assume because Spotify doesn't have those tracks).
How can I turn something like artist 'Francolin' and track name 'Hospital Song' into a Spotify URI without searching for it (which will return multiple results, and I don't know which one to use)? How does the Spotify local files list do it?
The local files list in the Spotify client makes URLs like this:
spotify:local:Coldplay:Mylo+Xyloto:Paradise:277 (spotify:local:ARTIST:ALBUM:TRACK:LENGTH_IN_SECONDS). You can verify this by right-clicking a local file in your list that hasn't been linked to a Spotify track and choosing "Copy Spotify URI".
When playing the track, the client resolves it without using the backend at all: it searches its own local list of known files and plays whichever matches most closely.
When linking to a "real" Spotify track, the client asks the backend to do the dirty work. There isn't a web API for this (it's in libSpotify, though), but basically the backend applies a few heuristics to the data* and then chooses the track that matches the given data (including length) the closest.
*Basically, the track metadata is stripped to a simpler form when searching, and the album has less weighting since an artist may release the same track on multiple albums.
I ran into the same problem as you, but I don't think there's a direct way to convert it. Instead I just run a search with "artist:'$artist' track:'$title'" (note that the Web API's field filter for the track name is track:, not title:), which should be very accurate, and just use the first result in the array of results.
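As a sketch of that approach in Python with the spotipy library (the client ID/secret are placeholders; Francolin's "Hospital Song" is the example from the question):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Placeholder credentials -- register an app at developer.spotify.com
sp = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(
    client_id="YOUR_ID", client_secret="YOUR_SECRET"))

# Field filters narrow the search enough that the first hit is usually right
results = sp.search(q="artist:Francolin track:Hospital Song",
                    type="track", limit=1)
items = results["tracks"]["items"]
if items:
    print(items[0]["uri"])  # e.g. spotify:track:...
else:
    print("No match on Spotify")
```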

Using Yahoo! Pipes

Have you used pipes.yahoo.com to quickly and easily do... anything? I've recently created a quick mashup of StackOverflow tags (via rss) so that I can browse through new questions in fields I like to follow.
This has been around for some time, but I've just recently revisited it and I'm completely impressed with its ease of use. It's almost to the point where I could set up a pipe and then give a client privileges to go in and edit feed sources... and I didn't have to write more than a few lines of code.
So, what other practical uses can you think of for pipes?
It's nice for aggregating feeds, yes, but the other handy thing to do is filtering the feeds. A while back, I created a feed for Digg (before Digg fell into the Fark pit of despair). I didn't care about the overwhelming Apple and Ubuntu news, so I filtered those keywords out of Technology, which I then combined with the Science and World & Business feeds.
Anyway, you can do a lot more than just combine things. If you wanted to be smart about it, you could set up per-subfeed and whole-feed filters to give granular or overarching filtering abilities as the news changes and you get bored with one topic or another.
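For comparison, here's roughly what that combine-and-filter pipe does, expressed in Python with the feedparser and feedgen libraries (the feed URLs and blocked keywords below are placeholders, not the actual Digg feeds):

```python
import feedparser
from feedgen.feed import FeedGenerator

# Placeholder feed URLs and keywords -- substitute your own
FEEDS = [
    "http://example.com/technology.rss",
    "http://example.com/science.rss",
]
BLOCKED = ("apple", "ubuntu")

fg = FeedGenerator()
fg.title("Filtered news")
fg.link(href="http://example.com/", rel="alternate")
fg.description("Combined feeds, minus the topics I'm bored with")

for url in FEEDS:
    for entry in feedparser.parse(url).entries:
        title = entry.get("title", "")
        # Whole-feed keyword filter, like a Pipes Filter module
        if any(word in title.lower() for word in BLOCKED):
            continue
        item = fg.add_entry()
        item.title(title)
        item.link(href=entry.get("link", ""))

print(fg.rss_str(pretty=True).decode())
```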
The one thing I have really used Y! Pipes for (rather than just playing around with it) is to clean up item titles, then merge and finally de-dupe the feeds I got from querying multiple blog search engines with the same search term. This is something I've done in several very different contexts, e.g. for my own ego surfing, and in another case for the planet site set up by some conference's organisers to keep an eye on their conference's buzz. Highly recommended.
You can do tons of things with pipes. For example for sites like digg or reddit, you can make one to bypass the site and go directly to the linked article (rewriting the RSS).
I also like to filter webcomics' feeds to keep just the comics, and then mix them all into one feed.
I've taken the liberty of copying your pipe and rearranging it a bit so that it's easier to add and remove tags:
Yahoo Pipe: StackOverflow Merge Tags
Tags are now listed in a string builder, so to add a tag you just have to hit the + button on the string builder and type in the tag preceded by a slash.
Well, Pipes is really fast and useful.
Other effective uses might be:
1) combine many feeds into one, then sort, filter and translate it.
2) geocode your favorite feeds and browse the items on an interactive map.
3) power widgets/badges on your web site.
4) grab the output of any Pipe as RSS, JSON, KML, or other formats.
This is by no means a comprehensive list.
One of my favorite things to do with Yahoo! Pipes is to aggregate multiple craigslist feeds into a single feed. You can make a feed out of any category or search criteria on craigslist. I live in a university town and am always on the lookout for tickets to sporting events, for example. I have a half-dozen craigslist searches all being combined into a single feed via Yahoo! Pipes. This works a lot better for me than simply monitoring the entire "Tickets" category; it filters out most of the tickets I am not interested in. Yes, this is another aggregating-feeds example, but the craigslist usage is quite valuable, with the ability to aggregate feeds that are themselves based upon searches.
I've used Pipes to translate blogs into English. I would have liked to use it to fetch the full text for blogs which only provide a summary of the content in the feed, but unfortunately they don't provide any input which fetches the content from a parameterizable source :-(.
Just stumbled on this while looking for ways to connect Excel to Pipes. A bit necromancer-ish, but here goes.
One thing I've done, is take an HTML page (science data) which has links to tons of CSV files for a bunch of Army Corps measurement stations. Each station has a big table of datafiles, all organized individually by month and year. I use YQL to parse out and organize the links to the individual CSV files in a way that Pipes can read them. Then, I use that as input into a Pipe, which has a user input for "Station" and "Date."
Using this, I can go to the Pipes page, type in those values and get the values only for a specific station and date, rather than have to find the station on a website, find the year and month in a big table, click the link, open the CSV file, and find the values for a day within that month's worth of data. I can even change the pipe to specify the hour, and the parameter, and then get a single value returned.
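In Python terms, the pipe is doing something like the following sketch (the URL scheme and column names here are entirely hypothetical, since the real Army Corps layout isn't shown):

```python
import csv
import io
import requests

# Hypothetical URL scheme and column names -- the real site publishes one
# CSV per station per month, which is what the YQL step navigates
station, wanted_date = "ST001", "2013-07-04"
url = f"https://example.org/corps-data/{station}/2013-07.csv"

reader = csv.DictReader(io.StringIO(requests.get(url, timeout=10).text))
for row in reader:
    if row["date"] == wanted_date:
        print(row["value"])  # the single reading the Pipe would return
```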
Now, I wish I could figure out how to program Excel so that I can use "=yahoo_function(station, datetime)" to place that value automatically into a cell, given the values of other columns!
