I'm using R to successfully make API calls. For each individual call I need to alter one or two distinguishing IDs (in the code below, activity_id and/or name_id). The code works fine, but I'd now like to automate this process instead of manually changing the IDs for each call. I'm wondering if there's a way to loop this using a data frame or list to store the relevant IDs.
I've searched across Stack Overflow, but I'm yet to find or execute an appropriate solution.
Any help would be appreciated.
Thanks,
JPC
library(httr)
library(jsonlite)
library(tidyr)

apiKey <- "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI"

example <- GET("url?activity_id=b5cb9359-f0e5-4939-9be3-fc95f8bc7d6b&name_id=f1e17fa6-c40c-4810-9c43-60939e2a9a99",
               add_headers(Authorization = paste("Bearer", apiKey)))
example <- content(example, "text")
example <- fromJSON(example, flatten = TRUE)
example <- unnest(example, data)
write.csv(example, "example.csv", row.names = FALSE)
We could write a function that dynamically generates the URL with sprintf, based on the activity_id and name_id passed in:
get_data <- function(activity_id, name_id) {
  url <- sprintf('url?activity_id=%s&name_id=%s', activity_id, name_id)
  example <- httr::GET(url, httr::add_headers(Authorization = paste("Bearer", apiKey)))
  example <- httr::content(example, "text")
  example <- jsonlite::fromJSON(example, flatten = TRUE)
  example <- tidyr::unnest(example, data)
  return(example)
}
and then call it using Map:

out <- Map(get_data, activity_vec, name_vec)

Here activity_vec and name_vec are vectors of the respective IDs. This returns a list of dataframes in out, which can be combined into one dataframe if needed before writing to csv.
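For example, a minimal sketch of the combine-and-write step (assuming out is the list returned by Map above; the output file name is illustrative):

# Stack the list of dataframes row-wise and write a single csv
combined <- do.call(rbind, out)
write.csv(combined, "example_all.csv", row.names = FALSE)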
If only name_id is changing, we can do:
get_data <- function(name_id) {
  url <- sprintf('url?activity_id=b5cb9359-f0e5-4939-9be3-fc95f8bc7d6&name_id=%s', name_id)
  example <- httr::GET(url, httr::add_headers(Authorization = paste("Bearer", apiKey)))
  example <- httr::content(example, "text")
  example <- jsonlite::fromJSON(example, flatten = TRUE)
  example <- tidyr::unnest(example, data)
  return(example)
}
out <- lapply(name_vec, get_data)
I'm using R to download data from an API that uses a key. I've downloaded the data for AK into a df called officials, and I would like to download the data for the remaining states, using rbind to add each state to the df officials. But the format of the API call requires the state abbreviation without quotes: that is, stateId=AK, not stateId="AK". Is there a way to do this? I tried the code below and then realized my error in the GET command specifying stateId: my code inserts "AL", not AL.
states <- c("AL","AR","AZ","CA","CO","CT")

for (i in 1:length(states)) {
  temp_raw <- GET("http://api.votesmart.org/Officials.getByOfficeTypeState?key=xxx&officeTypeId=L&stateId=states[i]&o=JSON")
  my_content <- content(temp_raw, as = 'text')
  my_content2 <- fromJSON(my_content)
  temp_officials <- my_content2$candidate$candidate
  officials2022 <- rbind(officials2022, temp_officials)
}
Try this variation, using paste0 to combine the strings into the URL.
Also, notice the simplified way to perform a for loop over states, where each abbreviation is directly available as i.
Edit: forgot the GET
states <- c("AL","AR","AZ","CA","CO","CT")

for (i in states) {
  temp_raw <- GET(paste0("http://api.votesmart.org/Officials.getByOfficeTypeState?key=xxx&officeTypeId=L&stateId=", i, "&o=JSON"))
  ...
}
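For completeness, a sketch of the full loop body under the same assumptions as the question (officials2022 starts as an empty data frame, and the parsing steps are unchanged from the original):

library(httr)
library(jsonlite)

states <- c("AL","AR","AZ","CA","CO","CT")
officials2022 <- data.frame()  # start empty so the first rbind works

for (i in states) {
  temp_raw <- GET(paste0("http://api.votesmart.org/Officials.getByOfficeTypeState?key=xxx&officeTypeId=L&stateId=", i, "&o=JSON"))
  my_content <- content(temp_raw, as = 'text')
  my_content2 <- fromJSON(my_content)
  temp_officials <- my_content2$candidate$candidate
  officials2022 <- rbind(officials2022, temp_officials)
}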
Maybe I'm asking for something too simple, but I can't solve this problem.
I want to create a script that recursively enters the folders present in a base_folder, opens a specific file whose name is always the same (w3nu), and selects a precise value: the email of the subject, taken from the Response column after filtering on the corresponding entry in the Question.Key column.
I want my script to repeat in the same way for all the folders present in the base folder.
Finally, I want to merge all the emails into a new dataframe.
I have created this script, but it does not work:
library(tidyverse)

base_folder <- "data/raw/exp_1_participants/sbj"

files <- list.files(base_folder, recursive = TRUE, full.names = TRUE)
demo_email <- files[str_detect(files, "w3nu")]

email_extraction <- function(demo_email){
  demo_email <- read.csv(task, header = T)
  demo_email <- demo_email %>%
    filter(Question.Key == "respondent-email") %>%
    select(Response)
}

email_list_jolly <- vector(mode = "list", length = length(demo_email))

for (i in 1:length(email_list_jolly)) {
  email_list_jolly[[i]] <- email_extraction(demo_email[i])
}

email_list_stud <- cbind(email_list_jolly)

write.csv(email_list_stud, 'data/cleaned/email_list_stud.csv')
Can you help me? Thanks.
From comments:
It looks like you haven't defined task within the script shown above, but you're telling read.csv to find it. Did you mean to pass demo_email to read.csv instead? task is probably a stray object in your workspace.
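A minimal corrected sketch, assuming the only fix needed is to read the file path that was passed in, and stacking the per-file results with do.call(rbind, ...):

library(tidyverse)

email_extraction <- function(path){
  read.csv(path, header = TRUE) %>%
    filter(Question.Key == "respondent-email") %>%
    select(Response)
}

# Apply to every matched file and stack the results into one dataframe
email_list_stud <- do.call(rbind, lapply(demo_email, email_extraction))
write.csv(email_list_stud, 'data/cleaned/email_list_stud.csv', row.names = FALSE)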
I am trying to build a data frame with book id, title, author, rating, collection, start and finish date from the LibraryThing api with my personal data. I am able to get a nested list fairly easily, and I have figured out how to build a data frame with everything but the dates (perhaps in not the best way but it works). My issue is with the dates.
The list I'm working with normally has 20 elements, but it includes the startfinishdates element only if I added dates to the book in my account. This causes two issues:
1) If it were always there, I could extract it like everything else; it would be NA most of the time, and I could use cbind to line it up correctly with the other information.
2) When I extract it by name and get an object with fewer elements, I don't have a way to join it back to everything else (it doesn't have the book id).
Ultimately, I want to build this data frame, so an answer that tells me how to pull out the book id and associate it with each startfinishdate (so I can join on book id) is acceptable; I would just add that to the code I have.
I'm also open to learning a better approach from the jump and re-designing the entire thing as I have not worked with lists much in R and what I put together was after much trial and error. I do want to use R though, as ultimately I am going to use this to create an R Markdown page for my web site (for instance, a plot that shows finish dates of books).
You can run the code below and get the data (no api key required).
library(jsonlite)
library(tidyverse)
library(assertr)

data <- fromJSON("http://www.librarything.com/api_getdata.php?userid=cau83&key=392812157&max=450&showCollections=1&responseType=json&showDates=1")
books.lst <- data$books

# create df from json: pull the item-th element from every book
create.df <- function(item){
  df <- map_df(.x = books.lst, ~.x[[item]])
  df2 <- t(df)
  return(df2)
}

ids     <- create.df(1)
titles  <- create.df(2)
ratings <- create.df(12)
authors <- create.df(4)

# need to get the book id when I build the date dfs
startdates.df  <- map_df(.x = books.lst, ~.x$startfinishdates) %>% select(started_stamp, started_date)
finishdates.df <- map_df(.x = books.lst, ~.x$startfinishdates) %>% select(finished_stamp, finished_date)
collections.df <- map_df(.x = books.lst, ~.x$collections)

# from assertr: creates a vector of same length as df with all values concatenated
collections.v <- col_concat(collections.df, sep = ", ")

# assemble df
books.df <- as.data.frame(cbind(ids, titles, authors, ratings, collections.v))
names(books.df) <- c("ID", "Title", "Author", "Rating", "Collections")
books.df <- books.df %>% mutate(ID = as.character(ID), Title = as.character(Title),
                                Author = as.character(Author), Rating = as.character(Rating),
                                Collections = as.character(Collections))
This approach stays outside the tidyverse meta-package; using base R, you can make it work with the following code.
Map applies the user-defined function to each element of data$books (provided as the second argument) and extracts the required fields for your data.frame. Reduce then takes all the individual dataframes and merges (reduces) them into a single data.frame, booksdf.
library(jsonlite)

data <- fromJSON("http://www.librarything.com/api_getdata.php?userid=cau83&key=392812157&max=450&showCollections=1&responseType=json&showDates=1")

booksdf <- Reduce(function(x, y) rbind(x, y),
  Map(function(x){
    lenofelements <- length(x)
    # a book normally has 20 elements; startfinishdates is present only when there are more
    if(lenofelements > 20){
      if(!is.null(x$startfinishdates$started_date)){
        started_date <- x$startfinishdates$started_date
      } else {
        started_date <- NA
      }
      if(!is.null(x$startfinishdates$started_stamp)){
        started_stamp <- x$startfinishdates$started_stamp
      } else {
        started_stamp <- NA
      }
      if(!is.null(x$startfinishdates$finished_date)){
        finished_date <- x$startfinishdates$finished_date
      } else {
        finished_date <- NA
      }
      if(!is.null(x$startfinishdates$finished_stamp)){
        finished_stamp <- x$startfinishdates$finished_stamp
      } else {
        finished_stamp <- NA
      }
    } else {
      started_stamp <- NA
      started_date <- NA
      finished_stamp <- NA
      finished_date <- NA
    }
    book_id <- x$book_id
    title <- x$title
    author <- x$author_fl
    rating <- x$rating
    collections <- paste(unlist(x$collections), collapse = ",")
    return(data.frame(ID = book_id, Title = title, Author = author, Rating = rating,
                      Collections = collections, Started_date = started_date, Started_stamp = started_stamp,
                      Finished_date = finished_date, Finished_stamp = finished_stamp))
  }, data$books))
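As a usage note, the pairwise Reduce(rbind, ...) step can equivalently be written as do.call(rbind, <list of data frames>), which stacks the whole list in a single call; either form produces the same booksdf here.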
I have a list of locally saved html files. I want to extract multiple nodes from each html file and save the results in a vector, then combine them into a dataframe. I have a piece of code for one node, which works (see below), but it seems quite long and inefficient if I apply it to ~20 variables. Also, something really strange happens with the saving to the vector (XXX_name): it starts with the last observation and then continues with the first, second, .... Do you have any suggestions for simplifying the code / making it more efficient?
# Extracts name variable and stores in a vector
XXX_name <- c()
for (i in 1:216) {
  XXX_name <- c(XXX_name, name)
  mydata <- read_html(files[i], encoding = "latin-1")
  reads_name <- html_nodes(mydata, 'h1')
  name <- html_text(reads_name)
  #print(i)
  #print(name)
}
Many thanks!
You can put the workings inside a function, then apply that function to each of your variables with map. (The strange ordering in your loop comes from appending name before it is computed: each iteration stores the name scraped on the previous iteration, and the first iteration picks up whatever name was already in your workspace.)
First, create the function:
library(rvest)

read_names <- function(var, node) {
  mydata <- read_html(files[var], encoding = "latin-1")
  reads_name <- html_nodes(mydata, node)
  html_text(reads_name)
}
Then we create a df with all possible combinations of inputs and apply the function to that:
library(tidyverse)
inputs <- crossing(var = 1:216, node = vector_of_nodes)
output <- map2(inputs$var, inputs$node, read_names)
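For instance, with a hypothetical vector_of_nodes (the selectors below are illustrative assumptions), the output can be reshaped into one dataframe:

# Illustrative only: CSS selectors for the fields you want to scrape
vector_of_nodes <- c("h1", ".author", ".date")

inputs <- crossing(var = 1:216, node = vector_of_nodes)
output <- map2(inputs$var, inputs$node, read_names)

# One row per file/node pair; collapse multiple matches per node into one string
results <- inputs %>%
  mutate(value = map_chr(output, ~ paste(.x, collapse = "; ")))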
I’ve got a df that consists of Twitter handles that I wish to scrape on a regular basis.
df=data.frame(twitter_handles=c("#katyperry","#justinbieber","#Cristiano","#BarackObama"))
My Methodology
I would like to run a for loop that loops over each of the handles in my df and creates multiple dataframes:
1) By using the rtweet library, I would like to gather tweets using the search_tweets function.
2) Then I would like to merge the new tweets with the existing tweets for each dataframe, and use the unique function to remove any duplicate tweets.
3) For each dataframe, I'd like to add a column with the name of the Twitter handle used to obtain the data. For example: For the database of tweets obtained using the handle #BarackObama, I'd like an additional column called Source with the handle #BarackObama.
4) In the event that the API returns 0 tweets, I would like Step 2) to be ignored. Very often, when the API returns 0 tweets, I get an error as it attempts to merge an empty dataframe with an existing one.
5) Finally, I would like to save the results of each scrape to the different dataframe objects. The name of each dataframe object would be its Twitter handle, in lower case and without the #.
My Desired Output
My desired output would be four dataframes: katyperry, justinbieber, cristiano & barackobama.
My Attempt
library(rtweet)
library(ROAuth)

# Accessing Twitter API using my Twitter credentials
key <- "yKxxxxxxxxxxxxxxxxxxxxxxx"
secret <- "78EUxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
setup_twitter_oauth(key, secret)

# Dataframe of Twitter handles
df = data.frame(twitter_handles = c("#katyperry","#justinbieber","#Cristiano","#BarackObama"))

# Setting up the query
query <- as.character(df$twitter_handles)
query <- unlist(strsplit(query, ","))
tweets.dataframe = list()

# Loop through the twitter handles & store the results as individual dataframes
for (i in 1:length(query)) {
  result <- search_tweets(query[i], n = 10000, include_rts = FALSE)
  # Strip tweets that contain RTs
  tweets.dataframe <- c(tweets.dataframe, result)
  tweets.dataframe <- unique(tweets.dataframe)
}
However, I have not been able to figure out how to make the for loop skip the concatenation step when the API returns 0 tweets for a given handle.
Also, my for loop does not return 4 dataframes in my environment, but stores the results as a large list.
I identified a post that addresses a problem very similar to the one I face, but I find it difficult to adapt to my question.
Your inputs would be greatly appreciated.
Edit: I have added Step 3) in My Methodology, in case you are able to help with that too.
tweets.dataframe = list()

# Loop through the twitter handles & store the results as individual dataframes
for (i in 1:length(query)) {
  result <- search_tweets(query[i], n = 10, include_rts = FALSE)
  if (nrow(result) > 0) { # only if result has data
    tweets.dataframe <- c(tweets.dataframe, list(result))
  }
}

# tweets.dataframe is now a list where each element is a data frame containing
# the results from an individual query; for example...
tweets.dataframe[[1]]

# to combine them into one data frame
do.call(rbind, tweets.dataframe)
In response to a reply:
twitter_handles <- c("#katyperry","#justinbieber","#Cristiano","#BarackObama")

# Loop through the twitter handles & store the results as individual dataframes
for (handle in twitter_handles) {
  result <- search_tweets(handle, n = 15, include_rts = FALSE)
  result$Source <- handle
  df_name <- substring(handle, 2)
  if (exists(df_name)) {
    assign(df_name, unique(rbind(get(df_name), result)))
  } else {
    assign(df_name, result)
  }
}
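As a design note, many R users prefer collecting results in a named list rather than creating variables with assign, since a list is easier to iterate over afterwards. A minimal sketch of that alternative, under the same assumptions as the loop above (it also lower-cases the names, matching the desired output):

results <- list()

for (handle in twitter_handles) {
  result <- search_tweets(handle, n = 15, include_rts = FALSE)
  if (nrow(result) > 0) {                      # skip handles that return 0 tweets
    result$Source <- handle
    nm <- tolower(substring(handle, 2))        # e.g. "barackobama"
    results[[nm]] <- unique(rbind(results[[nm]], result))
  }
}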