Whenever I try to get a post with a lot of comments from Facebook with Rfacebook's getPost function, I get the following error:
Error in while (n.l < n.likes & length(content$data) > 0 & !is.null(url <- content$paging$`next`)) { :
Argument has length 0
The code I'm trying to run looks like this:
post <- getPost(post = "Post-ID", token = token, n = 200)
I've also tried playing around with the function's different arguments, but nothing so far has worked. Does anyone have an idea what could have caused this error? Any help is greatly appreciated!
Here's the link to the documentation of the getPost function: https://www.rdocumentation.org/packages/Rfacebook/versions/0.6.15/topics/getPost
I have a way that attacks your problem from a slightly different angle.
Instead of starting from the post ID, you could work from the 'Page' side, which is also an easier way of getting the post ID.
Step 1:
See which page the post is on; then you can extract the post, making sure to use the time parameters. For example:
"If you want to extract a post from the Nike FB page that has a massive number of comments, which happened to fall on June 6th, 2016:"
nike_posts <- getPage("nike", token = fboauth, n=100000, since = '2016/06/05', until = '2016/06/07')
Step 2:
You will then have a data frame of posts, say 7 observations for that time window (they may post multiple times a day).
If the post you are looking for is observation #3, extract the comments with:
Comments <- getPost(nike_posts$id[3], token = fboauth, n = 10000, comments = TRUE, likes = FALSE, n.likes = 1, n.comments = 100000)
To convert this output to a data frame:
library(plyr)
Comments <- ldply(Comments, data.frame)
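Note that, if I recall the return structure correctly, getPost() returns a list whose elements (post, comments, and optionally likes) are each already data frames, so if you only need the comments you can often skip the conversion:

comment_df <- Comments$comments  # the comments element should already be a data frame
head(comment_df)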
I am trying to create a function which can return a data.frame consisting of data from various customers. This is done through an API call, but the challenge is handling the data types that the API call returns. I have done this before for one customer at a time with no problem, but things turn difficult when I want to loop over multiple customers.
The far-from-finished code looks like this:
vClients = c("1234", "4321")

clientTransactions <- function(vClients) {
  # Create a vector of URL strings:
  urlClients = paste0("http:///api/clients/", vClients, "/transactions?")
  # Use the URLs in API calls to get data for each client
  for (i in 1:length(urlClients)) {
    getClients = httr::GET(urlClients[i], accept_json()) # Returns a list [10](S3:Response)
    getClients_content = httr::content(getClients[i], as = "text") # This step is wrong
    getClients_json = jsonlite::fromJSON(getClients_content[i]) # This is where I want to end up with a data.frame per client
  }
}
Once again, I know the function is far from complete. But in order to get any further, I need to understand how to loop over the "getClients" object. For a single customer we have:
getClients = paste0("http:///api/clients/1234/transactions?")
getClients_content = httr::GET(getClients, accept_json())
getClients_content
Response [http://api/clients/1234/transactions?]
Date: 2022-02-16 10:02
Status: 200
Content-Type: application/json; charset=utf-8
Size: 801 kB
But when I try to do it for the multiple web URLs in urlClients, it does not work. For example, if I write getClients[i], it just returns the web URL, not the response object.
So I guess my question is: how can I deal with this list of S3 responses in my function and loop?
Any ideas or other thoughts would be much appreciated.
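One way this kind of loop is often structured is to call GET() once per URL inside lapply() and parse each response immediately; the response object itself is not meant to be subsetted with [i]. A rough sketch, untested against this API and assuming each endpoint returns a JSON array that fromJSON() can flatten into a data frame:

library(httr)
library(jsonlite)

clientTransactions <- function(vClients) {
  # host elided here, as in the question
  urlClients <- paste0("http:///api/clients/", vClients, "/transactions?")
  dfs <- lapply(urlClients, function(u) {
    resp <- GET(u, accept_json())                      # one response object per client
    txt <- content(resp, as = "text", encoding = "UTF-8")
    fromJSON(txt)                                      # assumed to yield a data.frame
  })
  do.call(rbind, dfs)                                  # combine into one data.frame
}

transactions <- clientTransactions(c("1234", "4321"))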
Each day, I get an email with the quantities of fruit sold on a particular day. The structure of the email is as below:
Date of report:,04-JAN-2022
Time report produced:,5-JAN-2022 02:04
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2
I'm trying to build some code in R that will search through my emails, find all emails with a particular subject, iterate through each email to find the variables I'm looking for, and place the values in a data frame with the "Date of report" in a date column.
With the assistance of people in the community, I was able to achieve the desired result in Python. However as my project has developed, I need to now achieve the same result in R if at all possible.
Unfortunately, I'm quite new to R and therefore if anyone has any advice on how to take this forward I would greatly appreciate it.
For those interested, my Python code is below:
import win32com.client
import pandas as pd

#PREP THE STUFF
Fruit_1 = "Apples"
Fruit_2 = "Pears"

searchf = [
    Fruit_1,
    Fruit_2
]

#DEF THE STUFF
def get_report_vals(report, searches):
    dct = {}
    for line in report:
        term, *value = line
        if term.casefold().startswith('date'):
            dct['date'] = pd.to_datetime(value[0])
        elif term in searches:
            dct[term] = float(value[0])
    if len(dct.keys()) != len(searches):
        dct.update({x: None for x in searches if x not in dct})
    return dct

#DO THE STUFF
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)

results = []
for message in messages:
    if message.subject == 'FRUIT QUANTITIES':
        if Fruit_1 in message.body and Fruit_2 in message.body:
            data = [line.strip().split(",") for line in message.body.split('\n')]
            results.append(get_report_vals(data, searchf))
    else:
        pass

fruit_vals = pd.DataFrame(results)
fruit_vals.columns = map(str.upper, fruit_vals.columns)
I'm probably going about this the wrong way, but I'm trying to mirror the steps I took in Python to achieve the same result in R. So, for example, I create some variables to hold the fruit sales I'm searching for, create a vector to store the searchables, and then, when I create an equivalent 'get_vals' function, I create an empty vector.
library(RDCOMClient)

Fruit_1 <- "Apples"
Fruit_2 <- "Pears"

## Create vector to store searchables
searchf <- c(Fruit_1, Fruit_2)

## Create object for Outlook
OutApp <- COMCreate("Outlook.Application")
outlookNameSpace <- OutApp$GetNameSpace("MAPI")
search <- OutApp$AdvancedSearch("Inbox", "urn:schemas:httpmail:subject = 'FRUIT QUANTITIES'")
inbox <- outlookNameSpace$Folders(6)$Folders("Inbox")
emails <- inbox$Items()

vec <- c()
for (i in seq_len(emails$Count())) {
  subject <- emails$Item(i)$Subject()
  ## match on the subject text (the COM search object itself cannot be used as a pattern)
  if (grepl("FRUIT QUANTITIES", subject)[1]) {
    text <- emails$Item(i)$Body()
    print(text)
    break
  }
}
read.table could be a good start for get_report_vals.
The code below outputs the result as a list; exception handling still needs to be implemented:
report <- "
Date of report:,04-JAN-2022
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2
"
get_report_vals <- function(report, searches) {
  data <- read.table(text = report, sep = ",")
  colnames(data) <- c('key', 'value')

  # find date
  date <- data[grepl("date", data$key, ignore.case = TRUE), "value"]

  # transform dataframe to list
  lst <- split(data$value, data$key)

  # output result as list
  c(list(date = date), lst[searches])
}
get_report_vals(report,c('Lemons','Oranges'))
$date
[1] "04-JAN-2022"
$Lemons
[1] "4"
$Oranges
[1] "2"
The results of various reports can then be concatenated in a data.frame using rbind:
rbind(get_report_vals(report,c('Lemons','Oranges')),get_report_vals(report,c('Lemons','Oranges')))
date Lemons Oranges
[1,] "04-JAN-2022" "4" "2"
[2,] "04-JAN-2022" "4" "2"
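With several report strings (say, one per email body), the same pattern scales with lapply(); a small sketch, where reports is a hypothetical list of such strings:

reports <- list(report, report)  # hypothetical: one string per email body
do.call(rbind, lapply(reports, get_report_vals, c('Lemons', 'Oranges')))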
The code now functions as intended. The function was written quite a bit differently from those recommended:
library(stringr)
library(dplyr)
library(janitor)

get_vals <- function(email) {
  body <- email$Body()
  date <- str_extract(body, "\\d{2}-[:alpha:]{3}-\\d{4}") %>%
    as.character()
  data <- read.table(text = body, sep = ",", skip = 9, strip.white = TRUE) %>%
    row_to_names(1) %>%
    mutate("Date" = date)
  return(data)
}
In addition I've written this to bind the rows together:
info <- sapply(results, get_vals, simplify = FALSE) %>%
  bind_rows()
This may not be what you are expecting to get as an answer, but I must state it here to help other readers avoid such mistakes in the future.
Unfortunately, your Python code is not well written. For example, I've noticed the following code, where you iterate over all items in a folder and check the subject and message body for keywords:
for message in messages:
    if message.subject == 'FRUIT QUANTITIES':
        if Fruit_1 in message.body and Fruit_2 in message.body:
You need to use the Find/FindNext or Restrict methods of the Items class instead. So, you don't need to iterate over all items in a folder. Instead, you get only items that correspond to your conditions. Read more about these methods in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
You may combine all your search criteria into a single query. So, you just need to iterate over found items and extract the data.
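Since the rest of this thread uses R, a minimal RDCOMClient sketch of the Restrict approach might look like this (the filter string is an assumption based on the subject used above):

library(RDCOMClient)

OutApp <- COMCreate("Outlook.Application")
inbox <- OutApp$GetNameSpace("MAPI")$GetDefaultFolder(6)  # 6 = olFolderInbox

## Restrict() returns only the matching items, so no full-folder loop is needed
found <- inbox$Items()$Restrict("[Subject] = 'FRUIT QUANTITIES'")
for (i in seq_len(found$Count())) {
  body <- found$Item(i)$Body()
  ## ... parse body as above ...
}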
You may also find the AdvancedSearch method helpful. The key benefits of using the AdvancedSearch method in Outlook are:
The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The Restrict and Find/FindNext methods can be applied to a particular Items collection (see the Items property of the Folder class in Outlook).
Full support for DASL queries (custom properties can be used for searching too). You can read more about this in the Filtering article in MSDN. To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
You can stop the search process at any moment using the Stop method of the Search class.
See Advanced search in Outlook programmatically: C#, VB.NET for more information.
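For completeness, a rough RDCOMClient sketch of AdvancedSearch (the search runs asynchronously, so the fixed sleep below is a crude placeholder for waiting on completion; the scope and filter strings are taken from the question):

library(RDCOMClient)

OutApp <- COMCreate("Outlook.Application")
search <- OutApp$AdvancedSearch("Inbox", "urn:schemas:httpmail:subject = 'FRUIT QUANTITIES'")
Sys.sleep(5)                 # crude wait; the search completes in the background
results <- search$Results()
results$Count()              # number of matching items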
I keep getting a bare "Error:" returned from my code, but have no idea why, when the same code works fine with a similar dataset.
I've changed the observation variable, changed the time restrictions, and searched for similar problems online.
library(rerddap)
CalPoly = info("HABs-CalPoly", url= "http://erddap.sccoos.org/erddap/")
CalPoly_Data = tabledap(CalPoly,
fields = c('Ceratium','Cochlodinium', 'Dinophysis_spp', 'Gymnodinium_spp','time'),
'time>=2008-08-15T00:00:00Z', 'time<=2019-05-26T05:35:00Z')
It should return a data table, but I just keep getting "Error:".
This similar code does work, though, and I have no idea why:
CalCOFI = info('siocalcofiHydroCasts')
calcofi.df <- tabledap(CalCOFI,
fields = c('cst_cnt', 'date', 'year', 'month', 'julian_date', 'julian_day', 'rpt_line', 'rpt_sta', 'cruz_num', 'intchl', 'intc14', 'time'),
'time>=1984-01-01T00:00:00Z', 'time<=2014-04-17T05:35:00Z')
Resolved the issue!
I initially set the url correctly for the info argument; I didn't realize I had to set the url again for the tabledap argument. I hadn't realized the default is https://upwell.pfeg.noaa.gov/erddap/.
Five hours later, but at least it is resolved!
The code now works:
CalPoly_Data = tabledap(CalPoly, fields = c('Temp','time'),'time>=2008-08-15T07:00:00Z', 'time<=2019-05-26T07:00:00Z', url = "http://erddap.sccoos.org/erddap/")
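For completeness, the same fix applied to the original call would presumably look like this (untested sketch, names taken from the question):

CalPoly_Data = tabledap(CalPoly,
                        fields = c('Ceratium', 'Cochlodinium', 'Dinophysis_spp', 'Gymnodinium_spp', 'time'),
                        'time>=2008-08-15T00:00:00Z', 'time<=2019-05-26T05:35:00Z',
                        url = "http://erddap.sccoos.org/erddap/")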
This might look simple, but I don't know how to do it.
This is the information:
So I got the cumulative total using this function:
CumulativeTotal = CALCULATE(
SUM(vnxcritical[Used Space GB]),
FILTER(ALL(Datesonly[Date]),
Datesonly[Date] <= MAX(Datesonly[Date])))
But what I need is the difference between the dates: between the first date and the second, the difference would be 210. I need another column with that information. Does anyone know the formula to do that?
OK, so I used this:
IncrmentalValueTEST =
VAR CurrDate = MAX(vnxcritical[Date])
VAR PrevDate = CALCULATE(LASTDATE(vnxcritical[Date]), vnxcritical[Date] < CurrDate)
RETURN SUM(vnxcritical[Used Space GB]) -
CALCULATE(SUM(vnxcritical[Used Space GB]), vnxcritical[Date] = PrevDate)
And this is the result:
OK, so this is my data table:
You can see all the dates I have for now. This is a capacity report for different EMC storage arrays and different pools. The idea is to be able to review the incremental space used over a given period of time.
I already tried another idea to get this, but the result was the same. I used this:
Diferencia =
VAR Day = MAX(Datesonly[Month])
VAR Month = MAX(Datesonly[Year])
RETURN
    SUM('Used Space'[used_mb])
    - CALCULATE(
        SUM('Used Space'[used_mb]),
        FILTER(ALL(Datesonly[Date]), Datesonly[Date] <= MAX(Datesonly[Date]))
    )
But the result is the same: "47753152401".
I'm using visual filters and other things to get a minimal view, because there are only 5 weekly reports and the SQL database has more than 150,000 rows.
And this is the relationship I made with a table containing only dates, in order to invoke the function in a better way, but the result is the same.
Try something along these lines:
IncrmentalValue =
VAR CurrDate = MAX(Datesonly[Date])
VAR PrevDate = CALCULATE(LASTDATE(Datesonly[Date]), Datesonly[Date] < CurrDate)
RETURN SUM(vnxcritical[Used Space GB]) -
CALCULATE(SUM(vnxcritical[Used Space GB]), Datesonly[Date] = PrevDate)
First, calculate the current date and then find the previous date by taking the last date that occurred before it. Then take the difference between the current value and the previous value.
I am using the Rfacebook package to scrape a list of public pages that are of interest for my research question. The authentication works properly, and I can get data frames of all public posts, reactions to the posts, and comments made on these posts.
However, I'm running into an issue when I try to extract the replies to comments under the public posts. This is the code I'm using:
BSBKB <-getPage("bersenbrueckerkreisblatt", token = my_OAuth, feed = TRUE, reactions = TRUE,verbose = TRUE, n = 1000)
#Getting comments for Post No.4
Comments <- getPost(BSBKB$id[4],token = my_OAuth, reactions = TRUE, n =180,likes=TRUE)
#Getting replies to comment No.4 under Post No.4
replies <- getCommentReplies(Comments$comments$id[4], token = my_OAuth, n = 500, replies = FALSE, likes= TRUE)
This code throws the following Error:
Error in data.frame(from_id = json$from$id, from_name = json$from$name, : arguments imply differing number of rows: 0, 1
Strangely enough, the same error occurs when I try to run the example code from the ?getCommentReplies help page:
## Not run:
## See examples for fbOAuth to know how token was created.
## Getting information about Facebook's Facebook Page
load("fb_oauth")
fb_page <- getPage(page="facebook", token=my_OAuth)
## Getting information and likes/comments about most recent post
post <- getPost(post=fb_page$id[1], n=2000, token=my_OAuth)
## Downloading list of replies to first comment
replies <- getCommentReplies(comment_id=post$comments$id[1], token=my_OAuth)
## End(Not run)
Resulting in:
Error in data.frame(from_id = json$from$id, from_name = json$from$name, :
arguments imply differing number of rows: 0, 1
Is this a systematic error in the package, a recent change in the API or did I make a mistake somewhere? Any suggestions on how to work around this and to extract comment replies (and reactions to them ideally) would be great!
The source code of the getCommentReplies function is published on GitHub: https://github.com/yanturgeon/R_Script/blob/master/getCommentReplies_dev.R
Load this code in your own environment, but before you do, comment out the line:
out[["reply"]] <- replyDataToDF(content)
The result will then still be a list, not a data frame.
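A rough sketch of that workaround (the file name is hypothetical, and the modified function may still depend on Rfacebook internals, so treat this as untested):

## local copy of the GitHub file in which the line
##   out[["reply"]] <- replyDataToDF(content)
## has been commented out
source("getCommentReplies_dev.R")

replies <- getCommentReplies(comment_id = post$comments$id[1], token = my_OAuth)
str(replies)  # the replies now come back as a raw list, not a data frame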