To further a personal project of mine, I have been pondering how to count the number of results for a user-specified word on Twitter. I have used their API extensively, but have not been able to come up with an efficient, or even halfway practical, way to count the occurrences of a particular word. The actual results are not critical, just the overall count. I'll keep scratching my head. Any ideas or direction pointing would be most appreciated.
e.g. http://search.twitter.com/search?q=tomatoes
I'm able to go back about a week. I start my search with the parameters that Adam posted, then key off of the smallest id in the set of search results, like so:
http://search.twitter.com/search.atom?lang=en&q=iphone&rpp=100&max_id=
where max_id = the min(id) of the 100 results I just pulled.
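As a sketch of that loop in R (hedged: the search.twitter.com endpoint above belongs to the old v1 Search API, which has since been retired, so treat this as the pattern rather than working code; the current search API takes the same max_id parameter):

    library(httr)
    library(jsonlite)

    count_word <- function(word) {
      base <- "http://search.twitter.com/search.json"  # historical endpoint
      max_id <- NULL
      total <- 0
      repeat {
        params <- list(q = word, lang = "en", rpp = 100)
        if (!is.null(max_id)) params$max_id <- max_id
        res <- fromJSON(content(GET(base, query = params), as = "text"))
        ids <- res$results$id
        if (is.null(ids) || length(ids) == 0) break
        total <- total + length(ids)
        # key off the smallest id just pulled, minus one to avoid repeats;
        # modern 64-bit ids would need id_str plus an integer64 type
        max_id <- min(ids) - 1
      }
      total
    }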
I'm on .NET, but I made a recursive function that calls the search query again and again until I no longer find the word "page=" in the result.
I'm working with internal site search terms from Google Analytics in Google Data Studio. I need to count how many times users searched for specific terms on the website. The problem is that the data is case-sensitive and users often misspell words when they search, so those won't get tallied by a normal count function. For example, "careers", "Careers", "cAREERS", and "carers" are all different searches. What formula can I use to easily count how many times users searched for the different terms?
First add a calculated field that applies LOWER to the search term. Then add a field with a CASE WHEN to correct each likely spelling error.
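A minimal sketch of those two calculated fields, assuming your dimension is called Search Term (adjust the names to your report):

    lower_term:   LOWER(Search Term)

    clean_term:
    CASE
      WHEN lower_term = "carers"  THEN "careers"
      WHEN lower_term = "carrers" THEN "careers"
      ELSE lower_term
    END

Counting on clean_term instead of the raw search term then tallies all the variants together.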
Another route would be to create a "sounds like" field. In BigQuery there is a nice function for this, SOUNDEX. Data Studio does not offer anything like that, but you can build a crude version with regular expressions: take the first character of the word, then only the vowels of the word, removing duplicated vowels first.
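If the raw data also lives in BigQuery, the SOUNDEX route is a short query there (a sketch; the table and column names below are hypothetical):

    SELECT SOUNDEX(search_term) AS sound_key,
           COUNT(*) AS searches
    FROM `project.dataset.site_search`
    GROUP BY sound_key
    ORDER BY searches DESC;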
I'm admittedly new to Access, so forgive me if there is a simple solution. I have had difficulty searching for an answer, as I'm not sure how this needs to be worded...
I have a database containing hundreds of thousands of invoices. I have a query that searches for invoices of the same amount on the same day by the same vendor (with different invoice numbers).
I group by these criteria, as well as a count > 1, to display possible duplicates. I'd like to see each record displayed, but the query only shows the first invoice number, since if I were to group by invoice number as well, each count would be 1 and nothing would get pulled.
I'm sure there is a better way of doing this to achieve the results that I want, but I am at a loss. If further information is required to assist, I'll provide what I can.
Thank you.
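One common approach (a sketch only; the table and field names here are guesses, so adjust them to your schema) is to keep the grouped query as a subquery and join it back to the invoice table, so every row in each duplicate group is listed:

    SELECT i.InvoiceNumber, i.Vendor, i.InvoiceDate, i.Amount
    FROM Invoices AS i
    INNER JOIN (
        SELECT Vendor, InvoiceDate, Amount
        FROM Invoices
        GROUP BY Vendor, InvoiceDate, Amount
        HAVING COUNT(*) > 1
    ) AS dup
    ON (i.Vendor = dup.Vendor)
       AND (i.InvoiceDate = dup.InvoiceDate)
       AND (i.Amount = dup.Amount)
    ORDER BY i.Vendor, i.InvoiceDate, i.Amount;

The join keeps the aggregation (which finds the duplicate groups) separate from the listing, which is why every invoice number in a group survives.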
I'm seeing strange behaviour with Bing Web Search.
I have a search query "hawkers" OR "hawkersco" OR "#hawkersco" OR "#hawkers" OR "www.hawkersco.com" with market = 'es-ES', safeSearch = Strict and responseFilter = webPages.
So I expect the results to contain at least one of these words and to be Spanish posts. In fact I get mostly posts in English, and they don't contain these keywords...
If I search for these keywords one by one, without the OR operator, I get the Spanish posts I expected.
Can someone explain why this happens, and how to build the search query to get the expected results?
Check the specification for the Bing Web Search API. Possibly this might be as simple as changing market to mkt (since you listed all the other parameters as used). And that means you should have a value for setLang as well.
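A hedged sketch of that call in R with httr, using the documented v7 parameter names (the subscription key is a placeholder):

    library(httr)

    res <- GET(
      "https://api.cognitive.microsoft.com/bing/v7.0/search",
      add_headers(`Ocp-Apim-Subscription-Key` = "YOUR_KEY"),
      query = list(
        q = '"hawkers" OR "hawkersco" OR "#hawkersco" OR "#hawkers" OR "www.hawkersco.com"',
        mkt = "es-ES",             # mkt, not market
        setLang = "es",
        safeSearch = "Strict",
        responseFilter = "webPages"
      )
    )
    pages <- content(res)$webPages$value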
You're not getting Spanish posts at all?
In that case, see here.
Bing results are based on relevance, regardless of market or language. If a result is deemed relevant, it will rank higher than results in the selected language and will appear in the results. Freshness also affects the results, in that you need relevant (popular) sites in your language for them to attain sufficient relevance in the selected time period. You cannot rely on Bing returning a single language exclusively with the settings as they are.
How can I use the TwitterR package for R to get more than 100 search results?
Although I would prefer an example in R (since the project currently uses R), I could just as easily use Java, so an example that searches Twitter in Java to get 200 search results may suffice.
I don't know if this is even possible. I do not remember seeing a "page number" that you can specify when you search (the Google API supports that). I think that with Twitter you can specify the minimum Tweet ID, which allows you to get only newer tweets, but I don't expect that to work as desired in this case (unless the search results are somehow ordered by date/time rather than relevance to the search term).
I'm getting tweets this way.
someTweets <- searchTwitter("#EroticBroadway", n=500)
The n argument tells it how many tweets to cap it at. If there aren't that many tweets it won't return 500 though.
From the docs:
n: The maximum number of tweets to return
There is also a time limit on the Twitter API search:
The Search API is not a complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets.
The t gem doesn't have that.
You have to jump through more hoops, which has some (a lot of) disadvantages, but I've used TAGS to collect hundreds of tweets (even thousands over the course of time...) and then read them into R as a CSV.
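Reading the exported TAGS archive back into R is then a one-liner (the file name is whatever you exported from the spreadsheet):

    tweets <- read.csv("tags_archive.csv", stringsAsFactors = FALSE)
    nrow(tweets)  # how many tweets were collected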
I think this question has been answered here before, but I could not find the desired topic. I am a newbie at web scraping. I have to develop a script that will take all the Google search results for a specific name. It will then grab the related data for that name, and if more than one result is found, the data will be grouped according to the names.
All I know is that Google has some kind of restriction on scraping, and they provide a Custom Search API. I still have not used that API, but I hope to get all the result links corresponding to a query from it. However, I could not work out what the ideal process is for scraping the information from those links. Any tutorial link or suggestion is very much appreciated.
You should have provided a bit more about what you have been doing; it does not sound like you even tried to solve it yourself.
Anyway, if you are still on it:
You can scrape Google in two ways: one is allowed, one is not.
a) Use their API; you can get around 2k results a day.
You can up it to around 3k a day for 2000 USD/year. You can up it more by getting in contact with them directly.
You will not be able to get accurate ranking positions from this method. If you only need a lower number of requests and are mainly interested in getting some websites for a keyword, that's the choice.
Starting point would be here: https://code.google.com/apis/console/
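A hedged sketch of a Custom Search API call in R (key and cx are placeholders you get from that console; each call returns up to 10 items, paged with the start parameter):

    library(httr)

    res <- GET(
      "https://www.googleapis.com/customsearch/v1",
      query = list(
        key = "YOUR_API_KEY",
        cx  = "YOUR_SEARCH_ENGINE_ID",
        q   = "some name",
        start = 1                  # 1, 11, 21, ... to page through results
      )
    )
    links <- sapply(content(res)$items, function(it) it$link)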
b) You can scrape the real search results
That's the only way to get the true ranking positions, for SEO purposes or to track website positions. It also allows you to get a large number of results, if done right.
You can Google for code; the most advanced free (PHP) code I know of is at http://scraping.compunect.com
However, there are other projects and code snippets.
You can start off at 300-500 requests per day, and this can be multiplied by using multiple IPs. Look at the linked article if you want to go that route; it explains it in more detail and is quite accurate.
That said, if you choose route b) you break Google's terms, so either do not accept them or make sure you are not detected. If Google detects you, your script will be blocked by IP ban/captcha. Not getting detected should be a priority.