Using GET in a loop in R

I am using the following code: I create a list of first names, generate an API link for each name, and then try to capture the data from each link.
mydata$NameGenderURL2 <- paste("https://gender-api.com/get?name=", mydata$firstname, "&key=suZrzhrNJRvrkWFXAG", sep = "")

mynamegenderfunction <- function(x){
  GET(url = mydata$NameGenderURL2[x])
  this.raw.content <- genderdata$content
  this.raw.content <- rawToChar(genderdata$content)
  this.content <- fromJSON(this.raw.content)
  name1[x] <- this.content$name
  gender1[x] <- this.content$gender
}

namelist <- mydata$firstname[1:100]
genderdata <- lapply(namelist, mynamegenderfunction)
Oddly enough I receive the following message:
Error in curl::curl_fetch_memory(url, handle = handle) :
Could not resolve host: NA
I tried another API and got the same issue. Any suggestions?
Here is a data sample:
namesurl
https://api.genderize.io/?name=kaan
https://api.genderize.io/?name=Joan
https://api.genderize.io/?name=homeblitz
https://api.genderize.io/?name=Flatmax
https://api.genderize.io/?name=BRYAN
https://api.genderize.io/?name=James
https://api.genderize.io/?name=Dion
https://api.genderize.io/?name=Flintu
https://api.genderize.io/?name=Adriana
The output that I need is the gender for each link, which would be Male/Female, or null.
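For what it's worth, lapply(namelist, mynamegenderfunction) passes the names themselves (e.g. "kaan") into the function, and indexing the unnamed vector mydata$NameGenderURL2 by a character value returns NA, which is why curl complains "Could not resolve host: NA". Below is a minimal sketch of one way to restructure the loop, iterating over the URLs directly and returning the parsed values (it assumes httr and jsonlite are loaded and that the API returns name and gender fields, as the original code expects):

library(httr)
library(jsonlite)

mynamegenderfunction <- function(url){
  resp <- GET(url = url)                                  # fetch one URL
  this.content <- fromJSON(rawToChar(resp$content))       # parse the JSON body
  data.frame(name   = this.content$name,
             gender = if (is.null(this.content$gender)) NA_character_ else this.content$gender,
             stringsAsFactors = FALSE)
}

genderdata <- lapply(mydata$NameGenderURL2[1:100], mynamegenderfunction)
genderdf   <- do.call(rbind, genderdata)                  # one row per name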

Related

Page limit using rvest

I'm having an issue when using rvest to scrape 466 pages from a wiki. Each page represents a metric that I need further information about. I have the following code, which loops through each link (loaded from a CSV file) and extracts the information I need from an HTML table on each page.
Metrics <- read.csv("C:\\Users\\me\\Documents\\WebScraping\\LONMetrics.csv")
Metrics$Theme <- as.character(paste0(Metrics$Theme))
Metrics$Metric <- as.character(paste0(Metrics$Metric))
Metrics$URL <- as.character(paste0(Metrics$URL))
n = nrow(Metrics)
i = 1
while (i <= n) {
  webPage <- read_html(Metrics$URL[i])
  pageTable <- html_table(webPage)
  Metrics$Definition[i] <- pageTable[[1]]$X2[1]
  Metrics$Category[i] <- pageTable[[1]]$X2[2]
  Metrics$Calculation[i] <- pageTable[[1]]$X2[3]
  Metrics$UOM[i] <- pageTable[[1]]$X2[4]
  Metrics$ExpectedTrend[i] <- pageTable[[1]]$X2[6]
  Metrics$MinTech[i] <- pageTable[[1]]$X2[7]
  i = i + 1
}
The problem I'm having is that it stops returning data after 32 pages, giving this error:
Error in read_connection_(x, n) :
Evaluation error: Failure when receiving data from the peer
I'm wondering what the cause may be and how to get around this apparent limitation.
Thanks.
Rob
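One thing worth trying is pacing the requests and retrying a page that fails instead of letting the whole loop die. Below is a minimal sketch of that idea (the retry count and Sys.sleep() delays are arbitrary choices, not anything rvest requires):

library(rvest)

read_html_retry <- function(url, tries = 3, pause = 5){
  for (k in seq_len(tries)) {
    page <- tryCatch(read_html(url), error = function(e) NULL)
    if (!is.null(page)) return(page)
    Sys.sleep(pause)                  # wait before retrying the same URL
  }
  stop("failed to read ", url, " after ", tries, " attempts")
}

# inside the existing loop, replace read_html() with:
# webPage <- read_html_retry(Metrics$URL[i])
# Sys.sleep(1)                        # small delay between pages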

jsonlite suddenly returning error: "Failure when receiving data from the peer"

Suddenly, over the weekend, my code is no longer working.
When I run it, I receive the following message:
Error in parse_con(txt, bigint_as_char) :
Failure when receiving data from the peer
The code is the following:
raiz <- "https://olinda.bcb.gov.br/olinda/servico/Expectativas/versao/v1/odata/"
tipo <- "ExpectativaMercadoMensais?%24format=json&%24select="
indicador <- "Indicador,Data,DataReferencia,Mediana,numeroRespondentes"
restricao <- "&%24orderby=Data%20desc&%24filter=Indicador%20eq%20'IPCA'&%24top=10"
library("jsonlite")
jsonlite::fromJSON(paste0(raiz,tipo,indicador,restricao), simplifyVector = FALSE)
There is a problem with the download method jsonlite uses to read the website. Use readLines instead.
raiz <- "https://olinda.bcb.gov.br/olinda/servico/Expectativas/versao/v1/odata/"
tipo <- "ExpectativaMercadoMensais?%24format=json&%24select="
indicador <- "Indicador,Data,DataReferencia,Mediana,numeroRespondentes"
restricao <- "&%24orderby=Data%20desc&%24filter=Indicador%20eq%20'IPCA'&%24top=10"
library("jsonlite")
web <- readLines(paste0(raiz,tipo,indicador,restricao), warn = FALSE)
df <- jsonlite::fromJSON(web, simplifyVector = FALSE)
I didn't fully understand your query, but here is one that works:
web <- readLines("https://olinda.bcb.gov.br/olinda/servico/Expectativas/versao/v1/odata/ExpectativasMercadoInflacao12Meses?$format=json", warn = FALSE)
df <- fromJSON(web)
df$value
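If you want to keep the original URL-building approach, the same workaround can be wrapped in a small helper (just a sketch; the helper name is illustrative):

fetch_olinda_json <- function(url){
  txt <- paste(readLines(url, warn = FALSE), collapse = "")   # download with base R connections instead of curl
  jsonlite::fromJSON(txt, simplifyVector = FALSE)
}

df <- fetch_olinda_json(paste0(raiz, tipo, indicador, restricao))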

Error in file(con, "rb"): cannot open the connection (external hard drive, R)

I have the following block of code:
# Obtain records from all patients
patientDir <- sort(list.dirs(path = "sample_images", full.names = TRUE, recursive = FALSE))
dataframes <- list()
i = 1
while (i < 19) {
  # Strip the patient out
  patient <- coreHist(patientDir[i])
  print("1")
  setwd("/Volumes/HUGE storage drive/")
  exists <- file.exists(patientDir[i])
  print(exists)
  # Extract the relevant information from the patient
  dicom <- readDICOM(patientDir[i])
  dicomdf <- dicomTable(dicom$hdr)
  patient_id <- dicomdf$`0010-0020-PatientID`[1]
  print("2")
  # Normalize their VX's
  sum <- sum(patient$histData$finalFreq)
  print("3")
  # Create the new VX's
  patient$histData$finalFreq_scaled <- (patient$histData$finalFreq/sum)
  print("4")
  # Add their ID
  patient$histData$patientid <- patient_id
  print("5")
  # Keep only the important columns
  patient$histData <- patient$histData[c("patientid", "Var1", "finalFreq_scaled")]
  print("6")
  # Add these dataframes to a list for better recall afterwards
  dataframes[[i]] <- patient$histData
  print("7")
  # Additional code to transpose and merge dataframes
  if (i == 1) {
    wide_df <- patient$histData
  } else {
    wide_df <- rbind(wide_df, patient$histData)
  }
  print("8")
  print(paste(c("Patient", i), sep = "", collapse = "-"))
  i = i + 1
}
However, after a (seemingly random) number of iterations, the code fails right after the line "print("1")" with the following error:
Error in file(con, "rb") : cannot open the connection
The working directory is set to an external hard drive, as the "sample_images" folder is 62 GB. I thought perhaps there was a connection timeout between RStudio and my external hard drive, so I tried to "remain active" on my computer; I've also tried resetting the working directory after each iteration to make sure it can find the file.
When it fails on a certain patient, I check manually to see if that file does indeed exist, and it does. Any thoughts?
I'm actually not sure why the error was happening, but to fix it I simply added a "try" statement:
dicom <- NULL       # reset before the retry loop so it runs for every patient
attempt <- 1
while (is.null(dicom) && attempt <= 3) {
  attempt <- attempt + 1
  try(
    dicom <- readDICOM(patientDir[i])
  )
}
This did indeed work.
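If you prefer an explicit error handler, a tryCatch-based version of the same retry reports which patient failed and pauses before trying the drive again (a sketch; the pause length is an arbitrary choice):

dicom <- NULL
attempt <- 1
while (is.null(dicom) && attempt <= 3) {
  dicom <- tryCatch(readDICOM(patientDir[i]),
                    error = function(e){
                      message("attempt ", attempt, " failed for ", patientDir[i])
                      NULL
                    })
  attempt <- attempt + 1
  if (is.null(dicom)) Sys.sleep(2)    # give the external drive a moment before retrying
}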

"object not found" when running a function in R

I have created the following function:
FilterIndi <- function(infile, name, date){
  sub_file <- infile[, c("NUMBER", "CREATE_DTTM_NEW", name)]
  sub_file <- subset(sub_file, name == 1)
  library(data.table)
  sub_file <- setDT(sub_file)[, .SD[which.max(CREATE_DTTM_NEW)], NUMBER]
  sub_file$date <- sub_file$CREATE_DTTM_NEW
  sub_file$CREATE_DTTM_NEW <- NULL
  library(dplyr) # to do left_join
  Unique <- left_join(Unique, sub_file, by = c("NUMBER" = "NUMBER"))
  Unique$name[is.na(Unique$name)] <- 0
  return(Unique)
}
FilterIndi(allfile, pde, pde_date )
pde is a column in the data frame allfile, but I get the following error:
Error in '[.data.frame'(infile, c("NUMBER", "CREATE_DTTM_NEW", :
object 'pde' not found
I can't figure out how to make it work.
Can someone please help me? Thanks a lot in advance.
EDIT: I have attached an image of allfile:
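For what it's worth, the error is most likely lazy evaluation at work: the first line of the function evaluates name, which means R tries to evaluate the argument pde, and there is no object called pde in the workspace. Passing the column names as strings and looking them up programmatically avoids this. A minimal sketch of the function rewritten along those lines (it still relies on a data frame called Unique existing in the calling environment, as the original does):

FilterIndi <- function(infile, name, date){
  library(data.table)
  library(dplyr)
  sub_file <- infile[, c("NUMBER", "CREATE_DTTM_NEW", name)]
  sub_file <- sub_file[sub_file[[name]] == 1, ]                 # filter on the column named by 'name'
  sub_file <- setDT(sub_file)[, .SD[which.max(CREATE_DTTM_NEW)], NUMBER]
  setnames(sub_file, "CREATE_DTTM_NEW", date)                   # rename the timestamp column to the supplied name
  Unique <- left_join(Unique, sub_file, by = "NUMBER")          # 'Unique' is assumed to exist, as in the original
  Unique[[name]][is.na(Unique[[name]])] <- 0
  return(Unique)
}

# called with the column names quoted:
Unique <- FilterIndi(allfile, "pde", "pde_date")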

Unknown error on Facebook API through R

I'm trying to download all the posts from a Facebook page through Rfacebook, but when the page has a high number of posts (over 400 or so), the script stops at the line where I call getPage, returning the error
"Error in callAPI(url = url, token = token) : An unknown error has occurred."
library(Rfacebook)
library(stringr)
load("fb_oauth")
token=fb_oauth
page<-getPage("bicocca", token, n = 100000, since = NULL, until = NULL, feed = TRUE)
noSpaceMsg<-str_replace_all(page$message, "[\r\n]" , "")
output<-as.data.frame(cbind(page$from_name,page$id, noSpaceMsg, page$created_time, page$type, page$link, page$likes_count, page$comments_count, page$shares_count))
colnames(output)<-c("username","msgid", "message", "created_time", "type", "link", "likes", "comments", "shares")
write.csv(output, "bicocca.csv", row.names=FALSE)
Where is the problem? How can I fix it?
It seems to be a problem with the API, not with the R package. When I try to do the query in the Graph API Explorer, I get an error too. No idea why.
One way around this is to query month by month, wrapping the getPage function in a try command:
page <- 'bicocca'
dates <- seq(as.Date("2010/10/01"), as.Date("2015/04/20"), by = "month")
n <- length(dates) - 1
df <- list()
for (i in 1:n){
  cat(as.character(dates[i]), " ")
  try(df[[i]] <- getPage(page, token, since = dates[i], until = dates[i + 1]))
  cat("\n")
}
df <- do.call(rbind, df)
This will not give you all the posts, but probably most of them.
