I need to scrape a list of URLs obtained from a Google search, using the Apify platform.
My plan is to start from a Google Search Scraper Actor task. However, I don't think it can be used to scrape anything other than the Google search results (maybe I'm wrong?). Therefore I need to pass its output to another Actor task, e.g. a Web Scraper or a Puppeteer Scraper.
But I can't seem to find the documentation on chaining Actors. How should I proceed?
Update:
I found How to pass data from crawler to actor, and setting an ACTOR.RUN.SUCCEEDED webhook on the Run task API endpoint of the second actor seems to work (that is, the second actor is launched).
However, I can't find how to pass the first actor's dataset to the second actor: since the Start URLs field is mandatory, I guess I should set it to the dataset, but the dataset link is different for each run…
You can chain multiple actor runs either via the Metamorph feature, or using Webhooks.
Metamorph
Metamorph allows you to run an actor and while the actor is running, "morph" it into a different actor with a custom input. The original actor will be stopped and replaced by the second one, but both will use the same storages, have the same run ID and will be displayed as a single actor run in the Apify app. You can use metamorph multiple times in a single run.
You can find the documentation for Metamorph here.
Webhooks
Webhooks allow you to call an arbitrary API endpoint once an actor reaches a given status, for example SUCCEEDED. You can use this to call the Run Actor API to start another actor. You can set a custom payload for the webhook. However, at the moment, passing output directly as the webhook payload is not supported, so you'll need to pass the ID of the key value store or dataset where your results are stored and read them from there.
See the Webhooks docs here.
For example, to get the IDs of both key value store and dataset of the original actor, you would configure a payload like this:
{
"datasetId": {{resource.defaultDatasetId}},
"keyValueStoreId": {{resource.defaultKeyValueStoreId}}
}
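As a rough sketch of how the receiving side could use that payload, the Python below turns the datasetId it receives into a URL for the dataset's items. It assumes the Apify API v2 dataset items endpoint; the function name, the example IDs and the optional token argument are made up for illustration.

```python
import json
from urllib.parse import urlencode


def dataset_items_url(webhook_body: str, token: str = "") -> str:
    """Build the API URL for the items of the dataset named in the payload."""
    payload = json.loads(webhook_body)
    dataset_id = payload["datasetId"]
    params = {"format": "json"}
    if token:
        # Assumption: private datasets need the API token as a query parameter.
        params["token"] = token
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


# Hypothetical payload, as it would look after the template variables are filled in.
example_body = '{"datasetId": "abc123", "keyValueStoreId": "kv456"}'
print(dataset_items_url(example_body))
```

Because the ID arrives with every webhook call, the second actor does not need a fixed dataset link.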
Passing data from Google Search Scraper to Web Scraper
The task is not trivial because the Google Search output format is not compatible with the Web Scraper input format. The best way to do this is to create an intermediary actor that uses the output from Google Search Scraper to produce an input for Web Scraper and then metamorph into it. So the final flow is:
Google Search Scraper --webhook--> Output Processor Actor --metamorph--> Web Scraper.
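The conversion step in the Output Processor Actor could look roughly like the Python sketch below. It assumes each Google Search Scraper dataset item holds an organicResults list whose entries have a url field, and that Web Scraper accepts a startUrls list of {"url": ...} objects; check both against the actors' current input and output schemas. The function name and sample data are invented for illustration.

```python
def to_web_scraper_input(search_items):
    """Collect unique result URLs from search-result pages into a startUrls list."""
    seen = set()
    start_urls = []
    for page in search_items:
        # Assumption: one dataset item per results page, with "organicResults".
        for result in page.get("organicResults", []):
            url = result.get("url")
            if url and url not in seen:
                seen.add(url)
                start_urls.append({"url": url})
    return {"startUrls": start_urls}


# Made-up sample shaped like the assumed search output.
sample = [
    {"organicResults": [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]},
    {"organicResults": [{"url": "https://example.com/a"}, {"title": "no url here"}]},
]
print(to_web_scraper_input(sample))
```

The returned dictionary would then be merged with the rest of the Web Scraper input (page function and so on) before the metamorph call.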
Related
I want to create a smart contract and launch it for an ICO. I am also creating a website where people can buy my token. How can I check, live, how many tokens have been sold, so that I can show a live progress bar with the percentage of tokens already sold?
Or is there a way I can monitor the token sale process in the smart contract?
A token contract is no different than any other smart contract. There are no special built in Solidity features or logic associated with them. They are just regular smart contracts that follow a specification.
So, if you want access to the number of tokens sold, you code that into your contract. While tokens sold is not part of the standard ERC20/ERC721 interface, nothing prevents you from adding a constant function to retrieve this information. In fact, if you're using the basic Zeppelin Crowdsale contract, you can calculate it from the public state variables weiRaised and rate: rate is the number of token units a buyer gets per wei, so tokens sold is weiRaised * rate. (Chances are you should be creating your own Crowdsale subcontract, so it's better to add the functionality you want there.)
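For the live progress bar, the arithmetic on the website side might look like this Python sketch. It assumes rate means token units per wei, as in the basic Zeppelin Crowdsale, so the product of the two public variables gives the tokens sold; tokens_for_sale is a hypothetical figure you would supply yourself, and the values would be read from the contract with whatever web3 client you use.

```python
def tokens_sold(wei_raised: int, rate: int) -> int:
    """Token units sold, assuming a fixed rate of token units per wei."""
    return wei_raised * rate


def percent_sold(wei_raised: int, rate: int, tokens_for_sale: int) -> float:
    """Percentage of the sale allocation sold so far, capped at 100."""
    if tokens_for_sale <= 0:
        raise ValueError("tokens_for_sale must be positive")
    return min(100.0, 100 * tokens_sold(wei_raised, rate) / tokens_for_sale)


# Illustrative numbers only: 5 ETH raised, 1000 token units per wei,
# 20000 tokens (18 decimals) on sale.
print(percent_sold(5 * 10**18, 1000, 20000 * 10**18))
```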
We can use the Etherscan Developer API to review transactions against a given contract address and find out the total supply or number of items available for sale.
There is a lot you can do with the Etherscan Developer API. For example, here's one URL that pulls data through Ethereum Mainnet -> Etherscan -> JSON parser -> Shields.io and renders the number of Su Squares remaining for sale as an image:
Source: https://img.shields.io/badge/dynamic/json.svg?label=Su+Squares+available&url=https%3A%2F%2Fapi.etherscan.io%2Fapi%3Fmodule%3Daccount%26action%3Dtokenbalance%26contractaddress%3D0xE9e3F9cfc1A64DFca53614a0182CFAD56c10624F%26address%3D0xE9e3F9cfc1A64DFca53614a0182CFAD56c10624F%26tag%3Dlatest%26apikey%3DYourApiKeyToken&query=%24.result
^ I don't know if SO is going to cache the image here. But that URL is a live URL which pulls the number of Su Squares available hot off the blockchain.
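The same Etherscan query can be built and parsed without Shields.io. The Python sketch below reuses only the parameters visible in the URL above (module, action, contractaddress, address, tag, apikey) and reads the result field that the badge's $.result query points at. The function names are made up, the sample response is invented, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def token_balance_url(contract: str, holder: str, api_key: str = "YourApiKeyToken") -> str:
    """Build the Etherscan tokenbalance query used in the badge URL."""
    params = {
        "module": "account",
        "action": "tokenbalance",
        "contractaddress": contract,
        "address": holder,
        "tag": "latest",
        "apikey": api_key,
    }
    return "https://api.etherscan.io/api?" + urlencode(params)


def parse_balance(response_text: str) -> int:
    """Extract the balance from the JSON body (the "result" field)."""
    return int(json.loads(response_text)["result"])


address = "0xE9e3F9cfc1A64DFca53614a0182CFAD56c10624F"
print(token_balance_url(address, address))
# Invented response body, only to show the parsing step.
print(parse_balance('{"status": "1", "message": "OK", "result": "42"}'))
```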
How can I write a procedure, in any programming language, that makes a call to a page and gets the data back over the network? In other words, a simulator of standard user activity. For example, every day a user searches Google for some keywords and gets a list of websites that match the search. Is it possible to simulate this with a program that calls Google with fixed keywords and retrieves the results? And can the same logic be used on other sites without any hardcoded part in the code?
I need to get a goal name using google analytics API. I'd like to display this name along with some dimensions such as ga:goalCompletionsAll, ga:goalValueAll but I'm unable to.
I have done some research, and all I could find are the explanations in "Not getting Goal name using Google Analytics gapi", but I'm using ColdFusion and HTTP requests to make the API call.
I know that I need to use the Management API to get the goal names and the Core Reporting API for the other dimensions. I've made the API calls for both and looked at both responses, but I'm unable to connect the two results, i.e. the goal names and the dimensions.
Kindly assist and thanks in advance
The Reporting API doesn't return the name of the goal. You will need to go through the Management API.
goals.list returns a list of goals for the authenticated user. You can then look up a goal by its number, for example goal 1, to find its name.
Note: Remember that goal names can change over time, so you can't really store them.
You should have two lists: the metrics you are requesting and the results of goals.list. Currently there are only XX goal columns for metrics; this may change in the future. You will need to inspect your metrics to find out which goal number each one refers to. Depending on what your application allows, you can end up with several goals selected in one request.
You want to look at goal.id and goal.name. Goal id is the number.
My application is in C#, so I can't really share how I am handling this.
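A language-neutral way to picture the join is the Python sketch below. It assumes the goals.list response has an items array whose entries carry id and name, and that per-goal metrics follow the ga:goalXXCompletions / ga:goalXXValue naming pattern, so the digits in the metric name are the goal id. The helper names and sample data are invented; the same logic should translate to ColdFusion structs.

```python
import re

# Matches per-goal metrics such as ga:goal1Completions or ga:goal12Value.
GOAL_METRIC = re.compile(r"^ga:goal(\d+)[A-Za-z]+$")


def goal_names_by_id(goals_list_response):
    """Map goal id (as a string) to goal name from a goals.list response."""
    return {str(g["id"]): g["name"] for g in goals_list_response.get("items", [])}


def goal_name_for_metric(metric, names):
    """Return the goal name for a per-goal metric, or None if it has no goal number."""
    match = GOAL_METRIC.match(metric)
    if not match:
        return None
    return names.get(match.group(1))


# Made-up goals.list response with just the fields used here.
goals = {"items": [{"id": "1", "name": "Newsletter signup"}, {"id": "2", "name": "Checkout"}]}
names = goal_names_by_id(goals)
print(goal_name_for_metric("ga:goal2Completions", names))
print(goal_name_for_metric("ga:goalCompletionsAll", names))
```

Aggregate metrics such as ga:goalCompletionsAll carry no goal number, so they map to no single goal name.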
I am working on a school project in R where I am attempting to map where the most popular youtube videos are posted around the world. I am able to get the data for the 50 most popular videos, but am having trouble understanding how to use pageToken.
The current get request I am using is with the following:
https://www.googleapis.com/youtube/v3/videos?part=snippet%2CrecordingDetails&chart=mostPopular&maxResults=50&key={api_key}
Is it possible to retrieve more than 50 results using "pageToken"? (I am unfamiliar with how this works.)
Any help would be appreciated thanks!
Videos: list
pageToken string The pageToken parameter identifies a specific
page in the result set that should be returned. In an API response,
the nextPageToken and prevPageToken properties identify other pages
that could be retrieved.
Note: This parameter is supported for use in conjunction with the
myRating parameter, but it is not supported for use in conjunction
with the id parameter.
So the response to the first request should contain a nextPageToken value. If you send that with the next request as
&pageToken=api_pageToken
it should give you the next batch of rows.
Note: I am not an R programmer, so I can't help with the code for a loop that checks the results for page tokens.
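The loop itself is short in any language. Here is a Python sketch (not R) of the idea: request a page, collect its items, and repeat with nextPageToken until the response no longer contains one. The request parameters are the ones from the question; the function names and the max_pages safety limit are assumptions added for illustration, and the same structure should carry over to an R while loop.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def fetch_page(api_key, page_token=None):
    """Request one page of most-popular videos; pageToken is added when given."""
    params = {
        "part": "snippet,recordingDetails",
        "chart": "mostPopular",
        "maxResults": 50,
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    with urlopen(f"{API_URL}?{urlencode(params)}") as response:
        return json.load(response)


def collect_all(get_page, max_pages=10):
    """Follow nextPageToken from page to page and gather every item."""
    items, token = [], None
    for _ in range(max_pages):  # hypothetical cap so the loop always ends
        page = get_page(token)
        items.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if not token:
            break
    return items


# Usage (needs a real key and network access):
# videos = collect_all(lambda token: fetch_page("YOUR_API_KEY", token))
```

Passing the page-fetching function in as an argument keeps the loop easy to test with canned responses.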
I've found related questions (like this one), but nothing that directly answers my question: I need a direct way to turn an artist name and track name into a Spotify link, just like Spotify does for the local file list (some are links, some are not, I assume because Spotify doesn't have those tracks).
How can I turn something like artist:'Francolin' and track name:'Hospital Song' into a Spotify uri without searching for it (which will return multiple results, and I don't know which one to use). How does the Spotify local files list do it?
The local files list in the Spotify client makes URLs like this:
spotify:local:Coldplay:Mylo+Xyloto:Paradise:277 (spotify:local:ARTIST:ALBUM:TRACK:LENGTH_IN_SECONDS). You can verify this by right-clicking a local file in your list that hasn't been linked to a Spotify track and choosing "Copy Spotify URI".
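A small Python sketch of that format is below. The only encoding rule visible in the example is that a space becomes "+", so using quote_plus (which also percent-encodes characters such as ":") is an assumption about how the client handles everything else; the function name is made up.

```python
from urllib.parse import quote_plus


def local_uri(artist: str, album: str, track: str, length_seconds: int) -> str:
    """Build spotify:local:ARTIST:ALBUM:TRACK:LENGTH_IN_SECONDS."""
    # Assumption: fields are form-encoded, so spaces turn into "+".
    parts = [quote_plus(artist), quote_plus(album), quote_plus(track), str(length_seconds)]
    return "spotify:local:" + ":".join(parts)


print(local_uri("Coldplay", "Mylo Xyloto", "Paradise", 277))
```

Note that this only produces a local-file URI; it does not resolve to a catalogue track.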
When playing the track, the client resolves it without using the backend at all - it searches its own local list of known files and plays whichever matches it closest.
When linking to a "real" Spotify track, the client asks the backend to do the dirty work. There isn't a web API for this (it's in libSpotify though), but basically the backend does a few heuristics to the data* then chooses the track that matches the given data (including length) the closest.
*Basically, the track metadata is stripped to a simpler form when searching, and the album has less weighting since an artist may release the same track on multiple albums.
I ran into the same problem as you, but I don't think there's a direct way to convert it. Instead I just run a search with "artist:'$artist' title:'$title'", which should be very accurate, and use the first result in the array of results.