Using R - My for loop only works on first iteration
I have a loop that, on each iteration, makes at least 1 and at most 2 API GET requests.
I have a list of movies I loop through. For each movie I do a GET request to get the movie's corresponding ID in the API database, then I do a second request to get the reviews using that movie ID.
When I run the code it works for the first movie, but all the other movies remain empty, even when the first movie has no reviews.
Here is the code:
for(i in 1:9964) {
  id_url <- paste(url, search_title, key, COPY$Film[i], sep = "")
  #GET request for the movie
  api_call_id <- GET(id_url)
  #make readable
  read <- rawToChar(api_call_id$content)
  #turn the JSON into an object
  JSON <- fromJSON(read, flatten = TRUE)
  if(is.null(JSON$results$id[1])){
    break;
  } else {
    #get the movie id from the JSON
    id <- JSON$results$id[1]
  }
  #url to get the movie's ratings
  raiting_url <- paste(url, raiting, key, id, sep = "")
  #call
  api_call_raiting <- GET(raiting_url)
  #readable
  read <- rawToChar(api_call_raiting$content)
  #json
  JSON <- fromJSON(read, flatten = TRUE)
  if(is.null(JSON$rottenTomatoes)){
    #set column value
    COPY[i, 'rTomatoes'] <- "No Review"
  } else {
    #set column value
    COPY[i, 'rTomatoes'] <- JSON$rottenTomatoes
  }
}
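For comparison, here is a minimal sketch of the same loop body in which a movie with no search results is skipped with next instead of ending the whole loop with break; it assumes the httr/jsonlite setup and the url, search_title, raiting, key and COPY objects used above, and the "Not Found" placeholder value is only illustrative:

for (i in seq_len(nrow(COPY))) {
  #look up the movie id
  id_url <- paste(url, search_title, key, COPY$Film[i], sep = "")
  JSON <- fromJSON(rawToChar(GET(id_url)$content), flatten = TRUE)
  if (is.null(JSON$results$id[1])) {
    COPY[i, 'rTomatoes'] <- "Not Found"  #illustrative placeholder
    next  #move on to the next movie instead of leaving the loop
  }
  id <- JSON$results$id[1]
  #look up the ratings for that id
  raiting_url <- paste(url, raiting, key, id, sep = "")
  JSON <- fromJSON(rawToChar(GET(raiting_url)$content), flatten = TRUE)
  COPY[i, 'rTomatoes'] <- if (is.null(JSON$rottenTomatoes)) "No Review" else JSON$rottenTomatoes
}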
Related
Web Scraping: How do I return specific user input forms in python?
I'm having trouble with the forms returning an exact match for the user input. Emphasoft developer challenge:

Taking a list of tax form names (ex: "Form W-2", "Form 1095-C"), search the website and return some informational results. Specifically, you must return the "Product Number", the "Title", and the maximum and minimum years the form is available for download.

Taking a tax form name (ex: "Form W-2") and a range of years (inclusive, 2018-2020 should fetch three years), download all PDFs available within that range.

import json
import os
import sys

import requests
from bs4 import BeautifulSoup

URL = 'https://apps.irs.gov/app/picklist/list/priorFormPublication.html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&{param.strip}&isDescending=false'


def get_forms(list_tax_form: list):
    """
    function to get response from irs.gov with all forms content
    :param list_tax_form: list of form names that we want to get info about
    :return: dict with form name, form title
    """
    response_list = []  # list for all responses of form names
    with requests.session() as session:
        for param in list_tax_form:
            request_params = {'value': param,
                              'criteria': 'formNumber',
                              'submitSearch': 'Find',
                              }
            res = session.get(URL, params=request_params).content
            response_list.append(res)
    return response_list


def parse_responses(list_tax_form: list):
    """
    function to get all form names, titles, years from previous func return
    :param list_tax_form: list of form names that we want to get info about
    :return: list of form names, titles, years
    """
    responses = get_forms(list_tax_form)
    # empty lists to fill them with the received information for all names, years, and titles
    td_form_name, td_form_title, td_form_rev_year = [], [], []
    for response in responses:
        soup = BeautifulSoup(response, 'lxml')
        td_name = soup.find_all('td', {'class': 'LeftCellSpacer'})
        td_title = soup.find_all('td', {'class': 'MiddleCellSpacer'})
        td_rev_year = soup.find_all('td', {'class': 'EndCellSpacer'})
        td_form_name.extend(td_name)
        td_form_title.extend(td_title)
        td_form_rev_year.extend(td_rev_year)
    return td_form_name, td_form_title, td_form_rev_year


def format_responses(list_tax_form: list):
    """
    function to format all responses for all forms we got! 1 Task
    :param list_tax_form: list of form names that we want to get info about
    :return: formatted names, links, years
    """
    td_names, td_titles, td_years = parse_responses(list_tax_form)
    names = [name.text.strip() for name in td_names]
    links = [link.find('a')['href'] for link in td_names]
    titles = [title.text.strip() for title in td_titles]
    years = [int(year.text.strip()) for year in td_years]
    set_names = set(names)
    final_dict = []
    # loop to create dictionary of result information with years of tax form available to download
    for name in set_names:
        max_year = 0
        min_year = max(years)
        dict1 = {'form_number': name}
        for index, p_name in enumerate(names):
            if p_name == name:
                if years[index] > max_year:
                    max_year = years[index]
                elif years[index] < min_year:
                    min_year = years[index]
                dict1['form_title'] = titles[index]
        dict1['max_year'] = max_year
        dict1['min_year'] = min_year
        final_dict.append(dict1)
    print(json.dumps(final_dict, indent=2))
    return names, links, years


def download_files(list_tax_form):
    """
    2 Task
    Module to download pdf files of form_name that input from user.
    :param list_tax_form: list of form names that we want to get info about
    :return: message to user of successful create file or either
    """
    names, links, years = format_responses(list_tax_form)
    form_name = input('enter form name: ')
    if form_name in names:
        print('form exists. enter years range')
        form_year1 = int(input('start year to analysis: '))
        form_year2 = int(input('end year to analysis: '))
        try:
            os.mkdir(form_name)
        except FileExistsError:
            pass
        # indices to define names range in list of all tax form names
        r_index = names.index(form_name)  # index of first form_name mention on list
        l_index = names.index(form_name)  # index of last form_name mention on list
        for name in names:
            if name == form_name:
                r_index += 1
        years = years[l_index:r_index]
        if form_year1 < form_year2:
            range_years = range(form_year1, form_year2 + 1)
            for year in range_years:
                if year in years:
                    link = links[years.index(year)]
                    form_file = requests.get(link, allow_redirects=True)
                    open(f'{form_name}/{form_name}_{str(year)}.pdf', 'wb').write(form_file.content)
            print(f'files saved to {form_name}/ directory!')
    else:
        print('input correct form name!')


if __name__ == '__main__':
    tax_list = sys.argv[1:]  # form names
    download_files(tax_list)

When this file is run, it displays other, unrelated results (ex: "Form W-2" should not also return "Form W-2 P"). How can I resolve this issue so that only the forms the user specified are displayed?
The New York Times API with R
I'm trying to get articles' information using The New York Times API. The csv file I get doesn't reflect my filter query. For example, I restricted the source to 'The New York Times', but the file I got contains other sources also. I would like to ask why the filter query doesn't work. Here's the code.

if (!require("jsonlite")) install.packages("jsonlite")
library(jsonlite)

api = "apikey"
nytime = function () {
  url = paste('http://api.nytimes.com/svc/search/v2/articlesearch.json?',
              '&fq=source:',("The New York Times"),'AND type_of_material:',("News"),
              'AND persons:',("Trump, Donald J"),
              '&begin_date=','20160522&end_date=','20161107&api-key=',api,sep="")
  #get the total number of search results
  initialsearch = fromJSON(url,flatten = T)
  maxPages = round((initialsearch$response$meta$hits / 10)-1)
  #try with the max page limit at 10
  maxPages = ifelse(maxPages >= 10, 10, maxPages)
  #create an empty data frame
  df = data.frame(id=as.numeric(),source=character(),type_of_material=character(), web_url=character())
  #save search results into the data frame
  for(i in 0:maxPages){
    #get the search results of each page
    nytSearch = fromJSON(paste0(url, "&page=", i), flatten = T)
    temp = data.frame(id=1:nrow(nytSearch$response$docs),
                      source = nytSearch$response$docs$source,
                      type_of_material = nytSearch$response$docs$type_of_material,
                      web_url=nytSearch$response$docs$web_url)
    df=rbind(df,temp)
    Sys.sleep(5) #sleep for 5 seconds
  }
  return(df)
}
dt = nytime()
write.csv(dt, "trump.csv")

Here's the csv file I got.
It seems you need to put the () inside the quotes, not outside. Like this:

url = paste('http://api.nytimes.com/svc/search/v2/articlesearch.json?',
            '&fq=source:',"(The New York Times)",'AND type_of_material:',"(News)",
            'AND persons:',"(Trump, Donald J)",
            '&begin_date=','20160522&end_date=','20161107&api-key=',api,sep="")

https://developer.nytimes.com/docs/articlesearch-product/1/overview
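A possible refinement on the same idea (a sketch, not part of the answer above): build the fq filter as one string and URL-encode it before pasting it into the request URL, so that spaces and quotes in the field values are escaped; the quoting inside the parentheses follows the Article Search documentation linked above.

fq <- 'source:("The New York Times") AND type_of_material:("News") AND persons:("Trump, Donald J")'
url <- paste0('http://api.nytimes.com/svc/search/v2/articlesearch.json?',
              '&fq=', URLencode(fq, reserved = TRUE),
              '&begin_date=20160522&end_date=20161107&api-key=', api)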
How to check if subset is empty in R
I have a set of data with weight with time (t), I need to identify outliers of weight for every time (t), after which I need to send a notification email. I'm using bloxplot($out) to identify the outliers, it seems to work, but I'm not sure if: It's the correct way to use the boxplot? I can't detect if the boxplot has no outlier or if its empty (or maybe, I'm using a wrong technique) Or possibly the subset itself is empty (could be the root cause) For now, I just need to trap the empty subset and check if out variable is empty or not. Below is my R script code: #i am a comment, and the compiler doesn't care about me #load our libraries library(ggplot2) library(mailR) #some variables to be used later from<-"" to<-"" getwd() setwd("C:\\Temp\\rwork") #read the data file into a data(d) variable d<-read.csv("testdata.csv", header=TRUE) #file #get the current time(t) t <-format(Sys.time(),"%H") #create a subset of d based on t sbset<-subset(d,Time==t) #identify if outlier exists then send an email report out<-boxplot(sbset$weight)$out if(length(out)!=0){ #create a boxplot of the subset boxplot(sbset$weight) subject = paste("Attention: An Outlier is detected for Scheduled Job Run on Hour ",t) message = toString(out) #sort(out) }else{ subject = paste("No Outlier Identified") message = "" } email<-send.mail(from=from, to=to, subject=subject, body=message, html=T, smtp=list(host.name = "smtp.gmail.com", port = 465, user.name = from, passwd = "", #password of sender email ssl = TRUE), authenticate=TRUE, send=TRUE) DATA weight,Time,Chick,x 42,0,1,1 51,2,1,1 59,4,1,1 64,6,1,1 76,8,1,1 93,10,1,1 106,12,1,1 125,14,1,1 149,16,1,1 171,18,1,1 199,20,1,1 205,21,1,1 40,0,2,1 49,2,2,1 58,4,2,1 72,6,2,1 84,8,2,1 103,10,2,1 122,12,2,1 138,14,2,1 162,16,2,1 187,18,2,1 209,20,2,1 215,21,2,1 43,0,3,1 39,2,3,1 55,4,3,1 67,6,3,1 84,8,3,1 99,10,3,1 115,12,3,1 138,14,3,1 163,16,3,1 187,18,3,1 198,20,3,1 202,21,3,1 42,0,4,1 49,2,4,1 56,4,4,1 67,6,4,1 74,8,4,1 87,10,4,1 102,12,4,1 108,14,4,1 136,16,4,1 154,18,4,1 160,20,4,1 157,21,4,1 41,0,5,1 42,2,5,1 48,4,5,1 60,6,5,1 79,8,5,1 106,10,5,1 141,12,5,1 164,14,5,1 197,16,5,1 199,18,5,1 220,20,5,1 223,21,5,1 41,0,6,1 49,2,6,1 59,4,6,1 74,6,6,1 97,8,6,1 124,10,6,1 141,12,6,1 148,14,6,1 155,16,6,1 160,18,6,1 160,20,6,1 157,21,6,1 41,0,7,1 49,2,7,1 57,4,7,1 71,6,7,1 89,8,7,1 112,10,7,1 146,12,7,1 174,14,7,1 218,16,7,1 250,18,7,1 288,20,7,1 305,21,7,1 42,0,8,1 50,2,8,1 61,4,8,1 71,6,8,1 84,8,8,1 93,10,8,1 110,12,8,1 116,14,8,1 126,16,8,1 134,18,8,1 125,20,8,1 42,0,9,1 51,2,9,1 59,4,9,1 68,6,9,1 85,8,9,1 96,10,9,1 90,12,9,1 92,14,9,1 93,16,9,1 100,18,9,1 100,20,9,1 98,21,9,1 41,0,10,1 44,2,10,1 52,4,10,1 63,6,10,1 74,8,10,1 81,10,10,1 89,12,10,1 96,14,10,1 101,16,10,1 112,18,10,1 120,20,10,1 124,21,10,1 43,0,11,1 51,2,11,1 63,4,11,1 84,6,11,1 112,8,11,1 139,10,11,1 168,12,11,1 177,14,11,1 182,16,11,1 184,18,11,1 181,20,11,1 175,21,11,1 41,0,12,1 49,2,12,1 56,4,12,1 62,6,12,1 72,8,12,1 88,10,12,1 119,12,12,1 135,14,12,1 162,16,12,1 185,18,12,1 195,20,12,1 205,21,12,1 41,0,13,1 48,2,13,1 53,4,13,1 60,6,13,1 65,8,13,1 67,10,13,1 71,12,13,1 70,14,13,1 71,16,13,1 81,18,13,1 91,20,13,1 96,21,13,1 41,0,14,1 49,2,14,1 62,4,14,1 79,6,14,1 101,8,14,1 128,10,14,1 164,12,14,1 192,14,14,1 227,16,14,1 248,18,14,1 259,20,14,1 266,21,14,1 41,0,15,1 49,2,15,1 56,4,15,1 64,6,15,1 68,8,15,1 68,10,15,1 67,12,15,1 68,14,15,1 41,0,16,1 45,2,16,1 49,4,16,1 51,6,16,1 57,8,16,1 51,10,16,1 54,12,16,1 42,0,17,1 51,2,17,1 61,4,17,1 72,6,17,1 83,8,17,1 89,10,17,1 98,12,17,1 103,14,17,1 113,16,17,1 
123,18,17,1 133,20,17,1 142,21,17,1 39,0,18,1 35,2,18,1 43,0,19,1 48,2,19,1 55,4,19,1 62,6,19,1 65,8,19,1 71,10,19,1 82,12,19,1 88,14,19,1 106,16,19,1 120,18,19,1 144,20,19,1 157,21,19,1 41,0,20,1 47,2,20,1 54,4,20,1 58,6,20,1 65,8,20,1 73,10,20,1 77,12,20,1 89,14,20,1 98,16,20,1 107,18,20,1 115,20,20,1 117,21,20,1 40,0,21,2 50,2,21,2 62,4,21,2 86,6,21,2 125,8,21,2 163,10,21,2 217,12,21,2 240,14,21,2 275,16,21,2 307,18,21,2 318,20,21,2 331,21,21,2 41,0,22,2 55,2,22,2 64,4,22,2 77,6,22,2 90,8,22,2 95,10,22,2 108,12,22,2 111,14,22,2 131,16,22,2 148,18,22,2 164,20,22,2 167,21,22,2 43,0,23,2 52,2,23,2 61,4,23,2 73,6,23,2 90,8,23,2
Your first use of boxplot is unnecessarily creating a plot; you can use out <- boxplot.stats(sbset$weight)$out for a little efficiency. You are interested in the presence of rows, but length(sbset) will return the number of columns. I suggest instead nrow or NROW.

if (NROW(out) > 0) {
  boxplot(sbset$weight)
  # ...
} else {
  # ...
}
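Putting the two checks together, a minimal sketch using the objects from the question (d, t, sbset); the subject line for the empty-subset case is only illustrative:

sbset <- subset(d, Time == t)

if (nrow(sbset) == 0) {
  #no rows for this hour, so there is nothing to test
  subject <- paste("No data found for hour", t)  #illustrative wording
  message <- ""
} else {
  out <- boxplot.stats(sbset$weight)$out  #outliers without drawing a plot
  if (length(out) > 0) {
    boxplot(sbset$weight)
    subject <- paste("Attention: An Outlier is detected for Scheduled Job Run on Hour ", t)
    message <- toString(out)
  } else {
    subject <- paste("No Outlier Identified")
    message <- ""
  }
}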
RSelenium - Storing data in an array
I'm extracting event descriptions from a list of events on my website. Each event is an href link which goes to another page where we can find the image and the description of the event. I'm trying to store the image URL and the description of all events in an array, so I used the code below at the end of my loop, but I only get the image and the description of the last event looped:

m<-c(images_of_events)
n<-c( description_of_events)

cc<-remDr$findElement(using = "css", "[class = '_24er']")
cc<-remDr$getPageSource()
page_events<-read_html(cc[[1]][1])
links_events_data=html_nodes(page_events,'._24er > table > tbody > tr > td > div> div._4dmk > a ')
events_urls<-html_attr(links_events_data,"href")

#the loop over each event
for (i in events_urls) {
  remDr$navigate(paste("localhost://www.mywebsite",i,sep=""))
  #get image
  imagewebElem <- remDr$findElement(using = "class", "scaledImageFitWidth")
  images_of_events<-imagewebElem$getElementAttribute("src")
  descriptionwebElem <-remDr$findElement(using = "css", "[class = '_63ew']")
  descriptionwebElem <-remDr$getPageSource()
  page_event_description<-read_html(descriptionwebElem[[1]][1])
  events_desc =html_nodes(page_event_description,'._63ew > span')
  description_of_events= html_text(events_desc)
  m<-c(images_of_events)
  n<-c( description_of_events)
}
To save values in an array in R you have to either

1) create the array/data.frame first, dta <- data.frame(m=c(), n=c()), and then save into it with dta[i,1] <- images_of_events and dta[i,2] <- description_of_events, where i is a numeric iterator, or

2) create the array/data.frame and use rbind to add values, like dta <- rbind(dta, data.frame(m = images_of_events, n = description_of_events))
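A sketch of option 2 applied to the loop from the question (selectors and variable names are the ones used there; the description spans are collapsed into a single string so each event stays on one row):

events <- data.frame(m = character(), n = character(), stringsAsFactors = FALSE)

for (i in events_urls) {
  remDr$navigate(paste("localhost://www.mywebsite", i, sep = ""))
  imagewebElem <- remDr$findElement(using = "class", "scaledImageFitWidth")
  images_of_events <- imagewebElem$getElementAttribute("src")[[1]]
  page_event_description <- read_html(remDr$getPageSource()[[1]][1])
  events_desc <- html_nodes(page_event_description, '._63ew > span')
  description_of_events <- paste(html_text(events_desc), collapse = " ")
  #append one row per event instead of overwriting m and n
  events <- rbind(events,
                  data.frame(m = images_of_events, n = description_of_events,
                             stringsAsFactors = FALSE))
}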
R Language: getCommentReplies() error:
Despite reading the existing answers, I still don't know how to fix this problem. I am trying to extract comments for each post in the 1st phase, which it is doing successfully, and then in the 2nd phase, for each comment, extract the corresponding replies for that comment (i.e. in my program when i=1 [1st post] AND when j=1 [1st comment]). However, by the time getCommentReplies() tries to extract the very first reply for the very first comment of the first post, it throws up the following error:

Error in data.frame(from_id = json$from$id, from_name = json$from$name, : arguments imply differing number of rows: 0, 1

My program:

load ("fb_oauth")
fb_page_no_nullz<-getPage(page="gtbank", token=fb_oauth,n=130, since= '2018/3/10', until= '2018/3/12',feed=TRUE,api = 'v2.11')
#Extract THE LATEST n posts excluding Null rows from the page into variable/vector fb_page
no_of_rows=na.omit(nrow(fb_page_no_nullz)) #Count the number of rows without NULLS and store in var no_of_rows
i=1
all_comments<-NULL
while (i<=no_of_rows) {
  postt <- getPost(post=fb_page_no_nullz$id[i], n=200, token=fb_oauth, comments = TRUE, likes=FALSE, api= "v2.11" ) #Extract N comments for each post
  no_of_row_c=na.omit(nrow(postt$comments))
  if(no_of_row_c!=0) { #If there are no comments for this post then pick the next post.
    comment_details<-postt$comments[,1:7]
    comment_details$from_id<-comment_details$from_name<-NULL # This line removes the columns from_id AND from_name from the data frame
    j=1
    while (j<=no_of_row_c) {
      repl<-NULL
      repl<-getCommentReplies(comment_details$id[i],token=fb_oauth,n=200,replies=TRUE,likes=FALSE,n.replies=100)
      j=j+1
    }
  }
  #all_comments$from_id<-all_comments$from_name<-NULL # This line removes the columns from_id AND from_name from the data frame
  all_comments<-rbind(all_comments,comment_details) # Cumulatively append all comments for all posts into the data frame all_comments
  i=i+1
}
#allPC<-merge(all_comments,fb_page_no_nullz, by.x= substr(c("id"),1,14), by.y=substr(c("id"),14,30),all.x = TRUE)
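For reference, a minimal, untested sketch of the inner call with the comment indexed by j rather than i, wrapped in tryCatch so a comment whose replies can no longer be fetched is skipped instead of stopping the run (the getCommentReplies arguments are the ones used above):

repl <- tryCatch(
  getCommentReplies(comment_details$id[j], token = fb_oauth, n = 200,
                    replies = TRUE, likes = FALSE, n.replies = 100),
  error = function(e) NULL  #skip this comment if its replies cannot be retrieved
)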