R: LinkedIn scraping using rvest

Using rvest package, I am trying to scrape data from my LinkedIn profile.
These attempts:
library(rvest)
url = "https://www.linkedin.com/profile/view?id=AAIAAAFqgUsBB2262LNIUKpTcr0cF_ekoX9ZJh0&trk=nav_responsive_tab_profile"
li = read_html(url)
html_nodes(li, "#experience-316254584-view span.field-text")
html_nodes(li, xpath='//*[@id="experience-610617015-view"]/p/span/text()')
don't find any nodes:
#> {xml_nodeset (0)}
Q: How to return just the text?
#> "Quantitative hedge fund manager selection for $650m portfolio of alternative investments"
EDIT:
LinkedIn has an API; however, for some reason the code below returns only the first two positions under experience and no other items (such as education or projects). Hence the scraping approach.
library("Rlinkedin")
auth = inOAuth(application_name, consumer_key, consumer_secret)
getProfile(auth, connections = FALSE, id = NULL) # returns very limited data
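For the extraction step itself, here is a minimal sketch run against an inline HTML fragment rather than the live profile. The fragment is made up for illustration (only the id, the class name and the sentence come from the question). It assumes the fetched page really contains such a node, which may not hold if the server returns a login page or builds the profile with JavaScript. html_text() is what turns matched nodes into plain strings.

```r
library(rvest)

# Hypothetical fragment standing in for the profile page (assumption:
# the real markup nests the span inside a <p> under the experience div).
fragment <- '<div id="experience-610617015-view"><p><span class="field-text">Quantitative hedge fund manager selection for $650m portfolio of alternative investments</span></p></div>'
page <- read_html(fragment)

# CSS selector: select the span, then extract its text.
css_text <- html_text(html_nodes(page, "#experience-610617015-view span.field-text"))

# XPath equivalent: attributes are addressed with @id, not #id.
xpath_text <- html_text(html_nodes(page, xpath = '//*[@id="experience-610617015-view"]/p/span'))

print(css_text)
print(xpath_text)
```

If the same selectors return an empty node set on the live URL, inspecting the downloaded HTML (for example with as.character(li)) shows whether the expected markup was served at all.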

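You are making things unnecessarily difficult. All you need to do is issue a GET request to https://api.linkedin.com/v1/people/~?format=json after obtaining an OAuth 2.0 token from LinkedIn, and send that token with the request. In R, you can do this using httr and jsonlite:
library(httr)
library(jsonlite)
# access_token is a placeholder for the OAuth 2.0 token string you obtained
resp <- GET('https://api.linkedin.com/v1/people/~?format=json', add_headers(Authorization = paste('Bearer', access_token)))
linkedin <- fromJSON(content(resp, as = 'text'))
position <- linkedin$headline
You must have the 'r_basicprofile' member permission on your OAuth token.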

Related

How to scrape Local Storage KEY/VALUES with R or Python (RVEST, HTTR, XHR, or something like that)

I've been trying to scrape the data on this page: https://data.anbima.com.br/debentures?page=1&size=2000&.
I could easily get the table using rvest, bs4, etc.
However, I found that the JSON file that feeds the table contains other useful complementary information.
I then found that the XHR link shown in the browser's inspection panel gives access to that JSON file.
I had been using this link for several months, but in the last few weeks the link (https://data.anbima.com.br/debentures-bff/debentures?page=0&size=2000&field=&order=&) started to require an authorization code (a token). The issue is that this token changes periodically, or according to some other criterion.
I explored a little more and figured out that the token is generated by JavaScript and stored in Local Storage somewhere in the page. I need this token to include it in the request headers.
My simple question is: How can I scrape that value with R or Python?
[Image: Local Storage values]
library(httr)
library(rlist)
library(jsonlite)
library(dplyr)
library(tidyverse)
library(V8)
# Fetch the HTML page and check the response
resp <- GET("https://data.anbima.com.br/debentures?page=1&size=1499")
http_type(resp)
http_error(resp)
# Query parameters for the JSON endpoint behind the table
query <- list(
  page = "0",
  size = "1470",
  field = "",
  order = ""
)
URL <- "https://data.anbima.com.br/debentures-bff/debentures"
# The Authorization value is the token copied by hand from Local Storage
resp <- GET(
  URL,
  # add_headers(Referer = "https://data.anbima.com.br/debentures?page=1&size=1470&"),
  add_headers(Authorization = "03AGdBq25HDdu4v2AzEjXJ_twI97EMrFlaNIcs3IuDHWzTFIp2mCXBqPaQPikuK7VRS3D7IC2v5briUdxPK3LpMPqrb1NoBqcXuI8gUkFdgVyNlObIdNzwQpVjcYASaW9N_gDx-M0SclFK54dDXHyRI7UVPAEQryV-1YSF6ebdJbY4BDr_eXRgMYe6UcK_Uh0YdfU1pMlcuU8O5dXKoRA-9GcX_AeaUxAUo5Mo_hQEGb0IPkPxojvEfgHvFdK0SQ4wgnmnJ0pcieO3h2exnJY1QxQd9sqqkfzdbGLaaCC7eNeWzXRAO3Yd9HtUciMclK612LfEm_ut89rtw8hSzlX3ZY6Vmo6zTvPT0WlMUrGLZ7syDEoDJKCi5xv6CSNgdAxqqqudEltDPUB7"),
  query = query
)
js <- fromJSON(content(resp, as = "text"))[[1]]

R Moving Yahoo Fantasy API from Oauth 1 to Oauth2

It appears that Yahoo has discontinued support for OAuth 1.0. I am trying to update my R code to use OAuth 2.0 with the httr package, and I am stumped. I am able to get a token, but I am unable to use the token to query the API.
I continue to get "You are not authorized to view this league".
options("httr_oob_default" = TRUE)
library(httr)
b_url <- "https://fantasysports.yahooapis.com" # base URL
# Create endpoint
yahoo <- httr::oauth_endpoint(
  authorize = "https://api.login.yahoo.com/oauth2/request_auth",
  access = "https://api.login.yahoo.com/oauth2/get_token",
  base_url = b_url
)
# Create app
yahoo_app <- httr::oauth_app("yahoo", key = cKey, secret = cSecret, redirect_uri = "oob")
# Open browser to obtain the authorization code
httr::BROWSE(httr::oauth2.0_authorize_url(yahoo, yahoo_app, scope = "fspt-r",
  redirect_uri = yahoo_app$redirect_uri))
# Code = zp6v82a
# Create token
yahoo_token <- httr::oauth2.0_access_token(yahoo, yahoo_app, code = "zp6v82a")
Now I'm not sure where to go from here.
If there is any advice or an easier method, please let me know. I am an amateur coder, so please be gentle.
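An OAuth 2.0 token is normally presented to an API as a Bearer value in the Authorization header. The sketch below assumes that the list returned by oauth2.0_access_token() carries an access_token element and that Yahoo accepts the token this way. The helper names bearer_header() and yahoo_get() are hypothetical, as is the league path in the usage comment. This does not by itself explain the "not authorized" message, which may also depend on which leagues the authorizing account can see.

```r
library(httr)

# Build the Authorization header from a token list
# (assumption: the token list has an `access_token` element).
bearer_header <- function(token) {
  stopifnot(!is.null(token$access_token))
  add_headers(Authorization = paste("Bearer", token$access_token))
}

# Hypothetical helper: GET a path under the base URL with the token attached.
# `format = "json"` is an assumption about what the API accepts.
yahoo_get <- function(path, token, base_url = "https://fantasysports.yahooapis.com") {
  GET(modify_url(base_url, path = path), bearer_header(token), query = list(format = "json"))
}

# Offline check of the header with a dummy token (no request is sent).
hdr <- bearer_header(list(access_token = "abc"))
print(hdr$headers[["Authorization"]])

# Usage (not run; the path is a placeholder):
# resp <- yahoo_get("fantasy/v2/league/<league_key>", yahoo_token)
# status_code(resp)
```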

How to get latest news posted on twitter by a website

I am using R and I need to retrieve the few most recent posts from a Twitter user (@ExpressNewsPK) using the twitteR package. I have created an account and have an access token, etc. I have used the following commands to extract the tweets:
setup_twitter_oauth(consumerkey,consumersecret,accesstoken,accesssecret)
express_news_tweets <- searchTwitter("@ExpressNewsPK", n = 10, lang = "en" )
However, the posts that are returned aren't the most recent ones from this user. Where have I made a mistake?

I think searchTwitter searches for the search string provided (here @ExpressNewsPK). So instead of giving tweets by @ExpressNewsPK, it gives tweets that are directed to @ExpressNewsPK.
To get tweets from @ExpressNewsPK, there is a function named userTimeline, which returns tweets from a particular user.
So after you are done with setup_twitter_oauth, you can try
userTimeline("ExpressNewsPK")
Read more about it at ?userTimeline

When you use searchTwitter(), you call the Twitter Search API. The Search API returns only a sample of the tweet history.
What you really need to do is call the Twitter Streaming API. Using it, you'll be able to download tweets in near real time. You can read more about the Streaming API here: https://dev.twitter.com/streaming/overview

How to Mine Data from a Facebook Group using R?

I am using the Rfacebook package to do data mining.
For example, to get a Facebook page we do
fb_page <- getPage(page="facebook", token=fb_oauth)
In my case it is a private group and the URL is something like this. So how do I get the page info for a group? I tried the following but got this error:
my_page <- getPage(page="group/222568978569", token=my_oauth)
Error in callAPI(url = url, token = token) : Unknown path components: /posts

This isn't a reproducible example because the page you mention doesn't exist. But I suggest that you check that this group has authorized your app. Note that after the introduction of version 2.0 of the Graph API, only friends/groups who are using the application that you used to generate the token to query the API will be returned.

How can I get a users from user public Twitter list by Twitter API

My problem is with the "cursor" in the Twitter API. I know how to get users from a public Twitter list, but I get only the first 20 users. As I read in the Twitter API docs, there is a "cursor" parameter. With the cursor I can get the next chunk of data, but I'm completely stuck on how to do it.
My script is:
# package checking
require(httr)
require(jsonlite)
# authorization
myapp = oauth_app("TT_APIR", key="xxxxxxx", secret="xxxxxxx")
sig=sign_oauth1.0(myapp, token="xxxxxxx",token_secret="xxxxxx")
# Get the members of Twitter's list "official-twitter-accts"
listMembersGet= GET("https://api.twitter.com/1.1/lists/members.json?slug=official-twitter-accts&owner_screen_name=twitter", sig)
listMembersContent = content(listMembersGet)
listMembers=jsonlite::fromJSON(toJSON(listMembersContent))
# end of script
Can anyone help?
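Cursoring in the v1.1 API works by passing cursor=-1 for the first page and then feeding each response's next_cursor_str back in until it comes back as "0". Below is a minimal sketch of that loop. collect_cursored() and fetch_page are hypothetical names, and the fake pages only stand in for real responses so the loop can be run offline.

```r
# Follow Twitter-style cursors until the API reports there are no more pages.
# `fetch_page` is a hypothetical function: it takes a cursor string and
# returns a list with `users` and `next_cursor_str`.
collect_cursored <- function(fetch_page, max_pages = 100) {
  cursor <- "-1"  # -1 requests the first page
  users <- list()
  for (i in seq_len(max_pages)) {
    page <- fetch_page(cursor)
    users <- c(users, page$users)
    cursor <- page$next_cursor_str
    if (is.null(cursor) || cursor == "0") break  # "0" marks the last page
  }
  users
}

# Stand-in for the real request (made-up data, two pages).
fake_pages <- list(
  "-1" = list(users = list("a", "b"), next_cursor_str = "10"),
  "10" = list(users = list("c"), next_cursor_str = "0")
)
all_users <- collect_cursored(function(cursor) fake_pages[[cursor]])
print(length(all_users))

# With httr, fetch_page could look like this (not run; `sig` as in the script):
# fetch_page <- function(cursor) content(GET(
#   "https://api.twitter.com/1.1/lists/members.json", sig,
#   query = list(slug = "official-twitter-accts",
#                owner_screen_name = "twitter", cursor = cursor)))
```

The max_pages cap is a design choice to keep a misbehaving cursor from looping forever; rate limits on the real endpoint may call for a pause between requests as well.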
