Extracting a Twitter video with R

I have a video in my local folder.
What I want to do is use rtweet or any other R package to search Twitter for this specific video and pull the data as a data frame.
Is there any way of doing this?

There's no way to do this via, for example, video fingerprinting; Twitter search does not support that. If you knew a specific URL from which that video is shared, you could search for it in the API.
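For illustration, a minimal rtweet sketch of that URL-based search; the status URL below is a placeholder, you need an authenticated rtweet session, and Twitter shortens links, so an exact-URL query may need adjusting.

library(rtweet)

# Placeholder: a URL at which you know the video is shared
video_url <- "https://twitter.com/some_user/status/1234567890"

# search_tweets() returns a data frame (tibble) of matching statuses
tweets <- search_tweets(q = video_url, n = 100)
head(tweets)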

Related

Is there a way in R to extract data from a website using Microsoft Power BI

I am working on a project related to COVID travel restrictions and want to use data from
https://migration.iom.int/, in particular the data on country travel restrictions (click the tab at the bottom right once the page has loaded). My usual rvest approach to web scraping does not seem to work for this site. Any suggestions on possible ways to extract data from it?
The data is from JS files like this one: https://migration.iom.int/sites/all/themes/fmp/pages/heatmap/js/heatmap_2020-07-23.js
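One possible angle, sketched below: fetch that .js file directly and pull the embedded JSON out of it. The assumption that the data sits in a single bracketed JSON block (and its exact structure) is mine and needs checking against the actual file.

library(httr)
library(jsonlite)

js_url <- "https://migration.iom.int/sites/all/themes/fmp/pages/heatmap/js/heatmap_2020-07-23.js"
js_text <- content(GET(js_url), as = "text", encoding = "UTF-8")

# Assumption: the restriction data is embedded as one [...] block in the file
json_chunk <- regmatches(js_text, regexpr("\\[.*\\]", js_text))
restrictions <- fromJSON(json_chunk)
str(restrictions)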

How to search the internet for pages containing specified terms and store the results in a data table, from within R, using OpenSearch

I am setting up a database of certain events that occurred in the past, and I need to search the internet for a number of terms to retrieve as many pages as possible that contain terms related to the happenings I want to document.
First I looked into achieving this using Google's "Custom Search API", after reading this question:
Need to access Google Custom search api through R
I did manage to get a JSON of search results through the browser, but not through R, so I moved on.
When I saw that the Custom Search API was using OpenSearch, and found the rOpenSearch package for R, I wanted to try going down this path:
http://terradue.github.io/rOpenSearch/
After reading through the documentation, I found that it only provides examples of searching sites that publish OpenSearch descriptions. As I need to search as many websites as possible, it seems I would need an OpenSearch description for a search engine like Google, but I can't find one anywhere.
Is there any way to search the internet via R using OpenSearch, and collect the results in a data table?
If you know of a better solution to my problem, I'd appreciate if you could point me in another direction.
If I read this correctly, you are looking for something called web scraping via R.
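As a minimal illustration of that web-scraping route in R (the URL and the "a" selector below are placeholders; real search engines usually require an API key or actively block automated queries):

library(rvest)

page_url <- "https://example.com/search?q=my+search+terms"  # placeholder results page
page <- read_html(page_url)

# Collect every link on the page into a data frame
links <- html_elements(page, "a")
results <- data.frame(
  title = html_text2(links),
  url   = html_attr(links, "href"),
  stringsAsFactors = FALSE
)
head(results)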

Scrape data from a website that ranks first on an EAN/UPC Google search

I am curious whether the following automation would be feasible:
search Google for a UPC/EAN code number (e.g. 8710103703631)
scrape and parse data (depending on what is available) from the first-ranked page concerning the product:
Name
Brand
Model
Picture
Description
Just trying to understand how complicated this might be.
Thank you!
Lookup EAN/UPC codes via API
There are some free web APIs which (reverse-)look up barcodes (EAN/UPC) or provide additional information.
For example, ean-search.org offers a REST API that is queried by EAN and delivers XML (e.g. it provides a link to Amazon for your sample "Philips Sonicare").
Benefit of using an API: ready-to-use data, no scraping needed.
Web scraping for search results
Of course, you can also use search engines (like Google, DuckDuckGo, etc.) to search for the barcode, using your favorite web-scraping library in whatever programming language you prefer:
JSoup (in Java): see this question
Scrapy or BeautifulSoup (in Python): see this question
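Since the rest of this thread is R-centric, here is a hedged R sketch of the API route. The endpoint, parameters and XML node name below are assumptions, so check the provider's documentation (most of these APIs also require a token).

library(httr)
library(xml2)

ean <- "8710103703631"

# Assumed endpoint and parameters -- verify against the ean-search.org docs
resp <- GET("https://api.ean-search.org/api",
            query = list(op    = "barcode-lookup",
                         ean   = ean,
                         token = "YOUR_API_TOKEN"))
doc <- read_xml(content(resp, as = "text", encoding = "UTF-8"))

# The node name "//name" is an assumption as well
xml_text(xml_find_first(doc, "//name"))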

Import.io - Can it replace Kimonolabs

I use Kimonolabs right now for scraping data from websites that share the same goal. To keep it simple, let's say these websites are online shops selling stuff (actually they are job websites with online application possibilities, but technically they look a lot like webshops).
This works great. For each website a scraper API is created that goes through the available advanced search page to crawl all product URLs. Let's call this API the 'URL list'. Then a 'product API' is created for the product detail page that scrapes all necessary elements, e.g. the title, product text and specs like the brand, category, etc. The product API is set to crawl daily using all the URLs gathered in the 'URL list'.
Then the gathered information for all products is fetched from the Kimonolabs JSON endpoint by our own service.
However, Kimonolabs will shut down its service at the end of February 2016 :-(. So I'm looking for an easy alternative. I've been looking at import.io, but I'm wondering:
Does it support automatic updates (letting the API scrape hourly/daily/etc)?
Does it support fetching all product URLs from a paginated advanced search page?
I'm tinkering around with the service. Basically, it seems to extract data via the same easy process as Kimonolabs. It's just unclear to me whether paginating the URLs needed for the product API, and automatically keeping everything up to date, are supported.
Any import.io users here who can advise whether import.io is a useful alternative for this? Maybe even give some pointers in the right direction?
Look into Portia. It's an open source visual scraping tool that works like Kimono.
Portia is also available as a service and it fulfills the requirements you have for import.io:
automatic updates, by scheduling periodic jobs to crawl the pages you want, keeping your data up-to-date.
navigation through pagination links, based on URL patterns that you can define.
Full disclosure: I work at Scrapinghub, the lead maintainer of Portia.
Maybe you want to give Extracty a try. It's a free web scraping tool that allows you to create endpoints that extract any information and return it in JSON. It can easily handle paginated searches.
If you know a bit of JS you can write CasperJS endpoints and integrate any logic that you need to extract your data. It has a similar goal to Kimonolabs and can solve the same problems (if not more, since it's programmable).
If Extracty does not solve your needs you can checkout these other market players that aim for similar goals:
Import.io (as you already mentioned)
Mozenda
Cloudscrape
TrooclickAPI
FiveFilters
Disclaimer: I am a co-founder of the company behind Extracty.
I'm not particularly fond of Import.io, but it seems to me it allows pagination through bulk input URLs. Read here.
So far there has not been much progress in getting a whole website through the API:
Chain more than one API/Dataset: It is currently not possible to fully automate the extraction of a whole website with Chain API.
For example, if I want data that is found within category pages or paginated lists, I first have to create a list of URLs, run Bulk Extract, save the result as an import data set, and then chain it to another Extractor. Once it is set up, I would like to be able to do this in one click, more automatically.
P.S. If you are somewhat familiar with JS you might find this useful.
Regarding automatic updates:
This is a beta feature right now. I'm testing this for myself after migrating from Kimonolabs. You can enable it for your own APIs by appending &bulkSchedule=1 to your API URL. You will then see a "Schedule" tab. In the "Configure" tab select "Bulk Extract" and add your URLs; after this, the scheduler will run daily or weekly.

Is there a good R API for accessing Google Docs?

I'm using R for data analysis, and I'm sharing some data with collaborators via Google Docs. Is there a simple interface that I can use to move an R data.frame object to and from a Google Docs spreadsheet? If not, is there a similar API in other languages?
There are two packages:
RGoogleDocs on Omegahat: the package allows you to get a list of the documents and details about each of them, download the contents of a document, remove a document, and upload a document, even binary files.
RGoogleData on RForge: provides R access to Google services through the Google supported Java API. Currently the R interface only supports Google Docs and Spreadsheets.
As of 2015, there is now the googlesheets package. It is the best option out there for analyzing and editing Google Sheets data in R. Not only can it pull data from Google Sheets, but you can edit the data in Google Sheets, create new sheets, etc.
The GitHub link above has a readme with usage details; there's also a vignette for getting started, or you can find the official documentation on CRAN.
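A short sketch of typical googlesheets usage; the sheet title is a placeholder, and the first call walks you through Google authentication in the browser.

library(googlesheets)

ss <- gs_title("my-collaboration-sheet")  # register a sheet you have access to
df <- gs_read(ss)                         # pull it into a data frame

# Writing back also works, e.g. appending a row
gs_add_row(ss, input = df[1, ])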
This may partially answer the question, or help others who want to begin by only downloading from public Google spreadsheets: http://blog.revolutionanalytics.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html#
I had a problem with certificates, and instead of figuring that out, I used the option ssl.verifypeer=FALSE. E.g.:
getURL("https://<googledocs URL for sharing CSV>", ssl.verifypeer = FALSE)
I put up a Github project to demonstrate how to use RGoogleDocs to read from a Google Spreadsheet. I have not yet been able to write to cells, but the read path works great.
Check out the README at https://github.com/hammer/google-spreadsheets-to-r-dataframe
I just wrote another package to download Google Docs spreadsheets. It's much simpler than the alternatives, since it just requires the URL (and that 'share by link' is enabled).
Try it:
install.packages('gsheet')
library(gsheet)
gsheet2tbl('docs.google.com/spreadsheets/d/1I9mJsS5QnXF2TNNntTy-HrcdHmIF9wJ8ONYvEJTXSNo')
More detail is here: https://github.com/maxconway/gsheet
Since R itself is relatively limited when it comes to execution flow control, I suggest using an API for a high-level programming language provided by Google.
There you can pick whichever language you are most familiar with.
I, for one, always use Python templates to give R a little more flexibility, so that would be a good combination.
For the task of exporting data from R to Google Docs, the first thing that comes to my mind would be to save it to CSV, then parse it and talk to Google Docs with one of the given languages.
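A tiny sketch of that first step (the upload to Google Docs itself would then happen with one of the clients mentioned above):

# Write the data frame out as CSV; the file can then be uploaded to Google Docs
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
write.csv(df, "shared_data.csv", row.names = FALSE)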
