I am using the following code: I create a list of first names, generate an API link for each name, and then try to capture the data from each link.
mydata$NameGenderURL2 <- paste("https://gender-api.com/get?name=", mydata$firstname, "&key=suZrzhrNJRvrkWFXAG", sep = "")

mynamegenderfunction <- function(x){
  GET(url = mydata$NameGenderURL2[x])
  this.raw.content <- genderdata$content
  this.raw.content <- rawToChar(genderdata$content)
  this.content <- fromJSON(this.raw.content)
  name1[x] <- this.content$name
  gender1[x] <- this.content$gender}

namelist <- mydata$firstname[1:100]
genderdata <- lapply(namelist, mynamegenderfunction)
Oddly enough I receive the following message:
>Error in curl::curl_fetch_memory(url, handle = handle) :
>Could not resolve host: NA
I tried another API and got the same issue. Any suggestions?
Here is a data sample:
namesurl
https://api.genderize.io/?name=kaan
https://api.genderize.io/?name=Joan
https://api.genderize.io/?name=homeblitz
https://api.genderize.io/?name=Flatmax
https://api.genderize.io/?name=BRYAN
https://api.genderize.io/?name=James
https://api.genderize.io/?name=Dion
https://api.genderize.io/?name=Flintu
https://api.genderize.io/?name=Adriana
The output that I need is the gender for each link, which would be: Male/Female, or Null.
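One likely cause, offered as a sketch rather than a definitive fix: lapply(namelist, ...) passes each first name (a character string) into the function, so mydata$NameGenderURL2[x] indexes the URL vector by a name rather than a position and returns NA, which is exactly the host that curl then fails to resolve. The function also never keeps the GET() result. A version that loops over indices and collects the parsed responses (resp and genderdf are hypothetical names) could look roughly like this:

library(httr)
library(jsonlite)

mynamegenderfunction <- function(i){
  # keep the response instead of discarding it
  resp <- GET(url = mydata$NameGenderURL2[i])
  this.content <- fromJSON(rawToChar(resp$content))
  # return one row per name so the results can be bound together afterwards
  data.frame(name = this.content$name,
             gender = this.content$gender,
             stringsAsFactors = FALSE)
}

genderdata <- lapply(1:100, mynamegenderfunction)
genderdf <- do.call(rbind, genderdata)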
I'm having an issue when using rvest to scrape 466 pages from a wiki. Each page represents a metric that I need further information about. I have the following code, which loops through each link (loaded from a CSV file) and extracts the information I need from an HTML table on each page.
Metrics <- read.csv("C:\\Users\\me\\Documents\\WebScraping\\LONMetrics.csv")
Metrics$Theme <- as.character(paste0(Metrics$Theme))
Metrics$Metric <- as.character(paste0(Metrics$Metric))
Metrics$URL <- as.character(paste0(Metrics$URL))

n = nrow(Metrics)
i = 1
while (i <= n) {
  webPage <- read_html(Metrics$URL[i])
  pageTable <- html_table(webPage)
  Metrics$Definition[i] <- pageTable[[1]]$X2[1]
  Metrics$Category[i] <- pageTable[[1]]$X2[2]
  Metrics$Calculation[i] <- pageTable[[1]]$X2[3]
  Metrics$UOM[i] <- pageTable[[1]]$X2[4]
  Metrics$ExpectedTrend[i] <- pageTable[[1]]$X2[6]
  Metrics$MinTech[i] <- pageTable[[1]]$X2[7]
  i = i+1
}
The problem I'm having is that it stops returning data after 32 pages, giving the following error:
Error in read_connection_(x, n) :
Evaluation error: Failure when receiving data from the peer
I'm wondering what the cause may be and how to get around this apparent limitation.
Thanks.
Rob
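I can't confirm the cause from the error alone, but "Failure when receiving data from the peer" usually points to the server dropping or throttling rapid back-to-back requests rather than to a limit in R. One workaround, as a sketch only: pause between pages and retry the ones that fail (read_with_retry is a hypothetical helper, and the attempt count and pause lengths are arbitrary):

library(rvest)

read_with_retry <- function(url, attempts = 3, pause = 5) {
  for (k in seq_len(attempts)) {
    page <- tryCatch(read_html(url), error = function(e) NULL)
    if (!is.null(page)) return(page)
    Sys.sleep(pause)  # wait before retrying in case the server is throttling
  }
  NULL
}

for (i in seq_len(nrow(Metrics))) {
  webPage <- read_with_retry(Metrics$URL[i])
  if (is.null(webPage)) next  # skip pages that still fail after all retries
  pageTable <- html_table(webPage)
  Metrics$Definition[i] <- pageTable[[1]]$X2[1]
  # ... fill the remaining columns exactly as in the original loop
  Sys.sleep(1)  # small delay between pages
}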
Suddenly, over the weekend, my code is no longer working.
When I run it, I receive the following message:
Error in parse_con(txt, bigint_as_char) :
Failure when receiving data from the peer
The code is the following:
raiz <- "https://olinda.bcb.gov.br/olinda/servico/Expectativas/versao/v1/odata/"
tipo <- "ExpectativaMercadoMensais?%24format=json&%24select="
indicador <- "Indicador,Data,DataReferencia,Mediana,numeroRespondentes"
restricao <- "&%24orderby=Data%20desc&%24filter=Indicador%20eq%20'IPCA'&%24top=10"
library("jsonlite")
jsonlite::fromJSON(paste0(raiz,tipo,indicador,restricao), simplifyVector = FALSE)
There is a problem with the GET function that jsonlite uses to read the website. Use readLines instead.
raiz <- "https://olinda.bcb.gov.br/olinda/servico/Expectativas/versao/v1/odata/"
tipo <- "ExpectativaMercadoMensais?%24format=json&%24select="
indicador <- "Indicador,Data,DataReferencia,Mediana,numeroRespondentes"
restricao <- "&%24orderby=Data%20desc&%24filter=Indicador%20eq%20'IPCA'&%24top=10"
library("jsonlite")
web <- readLines(paste0(raiz,tipo,indicador,restricao), warn = FALSE)
df <- jsonlite::fromJSON(web, simplifyVector = FALSE)
I didn't fully understand your query, but here is one that works:
web <- readLines("https://olinda.bcb.gov.br/olinda/servico/Expectativas/versao/v1/odata/ExpectativasMercadoInflacao12Meses?$format=json", warn = FALSE)
df <- fromJSON(web)
df$value
I have the following code block:
# Obtain records from all patients
patientDir <- sort(list.dirs(path = "sample_images", full.names = TRUE, recursive = FALSE))
dataframes <- list()
i = 1
while(i < 19){
  # Strip the patient out
  patient <- coreHist(patientDir[i])
  print("1")
  setwd("/Volumes/HUGE storage drive/")
  exists <- file.exists(patientDir[i])
  print(exists)
  # Extract the relevant information from the patient
  dicom <- readDICOM(patientDir[i])
  dicomdf <- dicomTable(dicom$hdr)
  patient_id <- dicomdf$`0010-0020-PatientID`[1]
  print("2")
  # Normalize their VX's
  sum <- sum(patient$histData$finalFreq)
  print("3")
  # Create the new VX's
  patient$histData$finalFreq_scaled <- (patient$histData$finalFreq/sum)
  print("4")
  # Add their ID
  patient$histData$patientid <- patient_id
  print("5")
  # Keep only the important columns
  patient$histData <- patient$histData[c("patientid", "Var1", "finalFreq_scaled")]
  print("6")
  # Add these dataframes to a list for better recall afterwards
  dataframes[[i]] <- patient$histData
  print("7")
  # Additional code to transpose and merge dataframes
  if(i == 1){
    wide_df <- patient$histData
  } else {
    wide_df <- rbind(wide_df, patient$histData)
  }
  print("8")
  print(paste(c("Patient", i), sep = "", collapse = "-"))
  i = i+1
}
However, after a (seemingly random) number of iterations, the code fails right after the line "print("1")" with the following error:
Error in file(con, "rb") : cannot open the connection
The working directory is set to an external hard drive, as the "sample_images" folder is 62 GB. I thought perhaps there was a connection timeout between RStudio and my external hard drive, so I tried to remain active on my computer; I've also tried resetting the working directory after each iteration to make sure it can find the file.
When it fails on a certain patient, I check manually to see if that file does indeed exist, and it does. Any thoughts?
I'm actually not sure why the error was happening, but to fix it I simply added a "try" statement:
dicom <- NULL  # reset so the retry loop actually runs
attempt <- 1
while(is.null(dicom) && attempt <= 3){
  attempt <- attempt + 1
  try(
    dicom <- readDICOM(patientDir[i])
  )
}
This did indeed work.
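If you want to see why a particular read fails rather than retrying silently, a tryCatch variant of the same retry loop (a sketch, not part of the original fix, and assuming readDICOM comes from oro.dicom as the dicomTable call suggests) keeps the error message:

library(oro.dicom)

dicom <- NULL
attempt <- 1
while (is.null(dicom) && attempt <= 3) {
  dicom <- tryCatch(
    readDICOM(patientDir[i]),
    error = function(e) {
      message("Attempt ", attempt, " failed: ", conditionMessage(e))
      NULL  # leave dicom NULL so the loop tries again
    }
  )
  attempt <- attempt + 1
}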
I have created the following function:
FilterIndi <- function(infile, name, date){
  sub_file <- infile[, c("NUMBER", "CREATE_DTTM_NEW", name)]
  sub_file <- subset(sub_file, name == 1)
  library(data.table)
  sub_file <- setDT(sub_file)[, .SD[which.max(CREATE_DTTM_NEW)], NUMBER]
  sub_file$date <- sub_file$CREATE_DTTM_NEW
  sub_file$CREATE_DTTM_NEW <- NULL
  library(dplyr) # to do left_join
  Unique <- left_join(Unique, sub_file, by = c("NUMBER" = "NUMBER"))
  Unique$name[is.na(Unique$name)] <- 0
  return(Unique)
}
FilterIndi(allfile, pde, pde_date )
pde is a column in the data frame allfile, but I get the following error:
Error in '[.data.frame'(infile, c("NUMBER", "CREATE_DTTM_NEW", :
object 'pde' not found
I can't figure out how to make it work.
Can someone please help me? Thanks a lot in advance.
EDIT: I have attached an image of allfile.
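A hedged sketch of one likely fix: inside the function, name is used as a value in c(...), so R tries to evaluate an object called pde in the calling environment; since pde exists only as a column of allfile, you get "object 'pde' not found". Passing the column names as strings and indexing with [[ ]] avoids that. This sketch assumes Unique already exists in the calling environment, as in the original function:

library(data.table)
library(dplyr)

FilterIndi <- function(infile, name, date){
  sub_file <- infile[, c("NUMBER", "CREATE_DTTM_NEW", name)]
  sub_file <- sub_file[sub_file[[name]] == 1, ]    # filter on the named column
  sub_file <- setDT(sub_file)[, .SD[which.max(CREATE_DTTM_NEW)], NUMBER]
  sub_file[[date]] <- sub_file$CREATE_DTTM_NEW     # new column named by the string in date
  sub_file$CREATE_DTTM_NEW <- NULL
  Unique <- left_join(Unique, sub_file, by = "NUMBER")
  Unique[[name]][is.na(Unique[[name]])] <- 0
  return(Unique)
}

FilterIndi(allfile, "pde", "pde_date")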
I'm trying to download all the posts from a Facebook page through Rfacebook, but when the page has a high number of posts (over 400 or so), the script stops at the line where I call getPage, returning the error:
"Error in callAPI(url = url, token = token) : An unknown error has occurred."
library(Rfacebook)
library(stringr)
load("fb_oauth")
token=fb_oauth
page<-getPage("bicocca", token, n = 100000, since = NULL, until = NULL, feed = TRUE)
noSpaceMsg<-str_replace_all(page$message, "[\r\n]" , "")
output<-as.data.frame(cbind(page$from_name,page$id, noSpaceMsg, page$created_time, page$type, page$link, page$likes_count, page$comments_count, page$shares_count))
colnames(output)<-c("username","msgid", "message", "created_time", "type", "link", "likes", "comments", "shares")
write.csv(output, "bicocca.csv", row.names=FALSE)
Where is the problem? How can I fix it?
It seems to be a problem with the API, not with the R package. When I try to do the query in the Graph API Explorer here, I get an error too. No idea why.
One way around this is to query month by month, wrapping the getPage function in a try command:
page <- 'bicocca'
dates <- seq(as.Date("2010/10/01"), as.Date("2015/04/20"), by="month")
n <- length(dates)-1
df <- list()
for (i in 1:n){
  cat(as.character(dates[i]), " ")
  try(df[[i]] <- getPage(page, token, since=dates[i], until=dates[i+1]))
  cat("\n")
}
df <- do.call(rbind, df)
This will not give you all the posts, but probably most of them.