I have to run a lot of queries on Scopus, so I need to automate the process. I have loaded the "rscopus" package and wrote this code:
test <- generic_elsevier_api(query = "industry",
                             type = "abstract",
                             search_type = "scopus",
                             api_key = myLabel,
                             headers = NULL,
                             content_type = "content",
                             root_http = "https://api.elsevier.com",
                             http_end = NULL,
                             verbose = TRUE,
                             api_key_error = TRUE)
My goal is to obtain the number of results for a particular query. In this example, if I search for "industry", I want to get back the number of search hits, like this:
query      occurrence
industry   1789
How could I do this?
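For reference, the Scopus Search API reports the total hit count in the opensearch:totalResults field of its JSON response. A minimal sketch, assuming the search endpoint is used (type = "search" rather than "abstract") and that the parsed response follows the standard "search-results" payload:
library(rscopus)

# Sketch: count the hits for a query via the Scopus Search API.
# Assumes the parsed JSON exposes the standard "search-results" payload.
res <- generic_elsevier_api(query = "industry",
                            type = "search",
                            search_type = "scopus",
                            api_key = myLabel)
total <- res$content$`search-results`$`opensearch:totalResults`
data.frame(query = "industry", occurrence = as.integer(total))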
I am trying to download sequence data from E. coli samples within the state of Washington - it's about 1283 sequences, which I know is a lot. The problem that I am running into is that entrez_search and/or entrez_fetch seem to be pulling the wrong data. For example, the following R code does pull 1283 IDs, but when I use entrez_fetch on those IDs, the sequence data I get is from chickens and corn and things that are not E. coli:
search <- entrez_search(db = "biosample",
                        term = "Escherichia coli[Organism] AND geo_loc_name=USA:WA[attr]",
                        retmax = 9999, use_history = TRUE)
Similarly, I tried pulling the sequence from one sample manually as a test. When I search for the accession number SAMN30954130 on the NCBI website, I see metadata for an E. coli sample. When I use this code, I see metadata for a chicken:
library(XML)  # for xmlToList()

search <- entrez_search(db = "biosample",
                        term = "SAMN30954130[ACCN]",
                        retmax = 9999, use_history = TRUE)
fetch_test <- entrez_fetch(db = "nucleotide",
                           id = search$ids,
                           rettype = "xml")
fetch_list <- xmlToList(fetch_test)
The issue here is that you are using a Biosample UID to query the Nucleotide database. However, the UID is then interpreted as a Nucleotide UID, so you get a sequence record unrelated to your original Biosample query.
What you need to use in this situation is entrez_link, which uses a UID to link records between two databases.
For example, your Biosample accession SAMN30954130 has the Biosample UID 30954130. You link that to Nucleotide like this:
nuc_links <- entrez_link(dbfrom='biosample', id=30954130, db='nuccore')
And you can get the corresponding Nucleotide UID(s) like this:
nuc_links$links$biosample_nuccore
[1] "2307876014"
And then:
fetch_test <- entrez_fetch(db = "nucleotide",
                           id = 2307876014,
                           rettype = "xml")
This is covered in the section "Finding cross-references" of the rentrez tutorial.
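Putting it together for the original Washington E. coli search, a sketch (fetching in batches is advisable for ~1283 records; the 20-ID subset below is only for illustration):
library(rentrez)

# 1. Search Biosample as before
search <- entrez_search(db = "biosample",
                        term = "Escherichia coli[Organism] AND geo_loc_name=USA:WA[attr]",
                        retmax = 9999, use_history = TRUE)

# 2. Link the Biosample UIDs to their Nucleotide records
links <- entrez_link(dbfrom = "biosample", id = search$ids, db = "nuccore")
nuc_ids <- links$links$biosample_nuccore

# 3. Fetch the linked sequences (batch large ID sets; first 20 shown here)
fetch_test <- entrez_fetch(db = "nucleotide", id = nuc_ids[1:20], rettype = "xml")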
I want to get the timeline (10 tweets, for example) of a list of profiles, but I want to get only the tweets that contain a specific string.
profiles <- c("a", "b", "c", "d")
keyword <- "apple"

tweets <- get_timeline(
  user = profiles,
  q = , # I DON'T KNOW WHAT TO PUT HERE TO GET TWEETS THAT CONTAIN keyword:
        # can't use grepl() because the vector should be tweets... maybe with
        # an if statement, but I can't find the syntax
  n = 10,
  since_id = NULL,
  max_id = NULL,
  home = FALSE,
  parse = TRUE,
  check = TRUE,
  retryonratelimit = NULL,
  verbose = TRUE,
  token = NULL)
You can use q = "a OR b OR c OR d OR apple". I recommend reading the official API documentation on which logical operators are available and how to use them. Note that rtweet doesn't use the Twitter API v2 yet (only for the streaming endpoints, as of release 1.1.0).
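If the goal is specifically to keep only keyword-matching tweets from each profile's timeline, another option is to filter client-side after fetching. A sketch, assuming the data frame returned by get_timeline() has its usual text column:
library(rtweet)

profiles <- c("a", "b", "c", "d")
keyword <- "apple"

# Fetch more than 10 tweets per profile, since filtering happens afterwards
tweets <- get_timeline(user = profiles, n = 100)
tweets_with_keyword <- tweets[grepl(keyword, tweets$text, ignore.case = TRUE), ]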
I am currently trying to download a particular series from the Direction Of Trade Statistics at the IMF in order to calculate trade volumes between countries. There is an R package, imfr, that does a fantastic job at this. However, when going for one particular set, I run into problems.
This code works just fine and gets me the full data series I am interested in for the given countries:
library(imfr)

# Get the list of IMF datasets
imf_ids()

# I am interested in direction of trade ("DOT"), so check the list of codes in the data structure
imf_codelist(database_id = "DOT")

# I want the export and import data between countries FOB, so "TXG_FOB_USD" and "TMG_FOB_USD"
imf_codes("CL_INDICATOR_DOT")

# Works nicely for exports:
data_list_exports <- imf_data(database_id = "DOT", indicator = c("TXG_FOB_USD"),
                              country = c("US", "JP", "KR"),
                              start = "1995",
                              return_raw = TRUE,
                              freq = "A")

# However, the same code does not work for imports:
data_list_imports <- imf_data(database_id = "DOT", indicator = c("TMG_FOB_USD"),
                              country = c("US", "JP", "KR"),
                              start = "1995",
                              return_raw = TRUE,
                              freq = "A")
This returns an empty series, and I did not understand why. So I thought maybe the US is not in the dataset (although that seems unlikely):
library(httr)
library(jsonlite)
library(magrittr)  # for the %>% pipe

# Look at the API endpoint that provides the data structure behind a dataset
result <- httr::GET("http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/DOT") %>%
  httr::content(as = "parsed")

structure_url <- "http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/DOT"
raw_data <- jsonlite::fromJSON(structure_url)
test <- raw_data$Structure$CodeLists
However, the result indicates that the US is indeed in the data. So what if I just don't specify a country? That does download data, but only for the first 60 countries because of rate limits. When doing the same with an httr::GET I hit the rate limit directly and get an error back.
data_list_imports <- imf_data(database_id = "DOT", indicator = c("TMG_FOB_USD"),
                              start = "1995",
                              return_raw = TRUE,
                              freq = "A")
Does anybody have an idea what I am doing wrong? I am really at a loss and just hope it is a typo somewhere...
Thanks and all the best!
This kind of answers the question:
cjyetman over at GitHub gave me the following hint:
You can use the print_url = TRUE argument to see the actual API call.
With...
imf_data(database_id = "DOT", indicator = c("TMG_FOB_USD"),
         country = c("US", "JP", "KR"),
         start = "1995",
         return_raw = TRUE,
         freq = "A",
         print_url = TRUE)
you get...
http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/DOT/.US+JP+KR.TMG_FOB_USD?startPeriod=1995&endPeriod=2021
which does not return any data.
But if you add "AU" as a country to that list, you do get data with...
http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/DOT/.AU+US+JP+KR.TMG_FOB_USD?startPeriod=1995&endPeriod=2021
So I guess either there is something wrong currently with their API, or they actually do not have data for specifically that indicator for those countries with that frequency, etc.
This does indeed work, and it makes apparent that either there is truly missing data in the API, or I am simply looking for data where there is none. Since the original goal was to look at trade volumes, I have since found out that the import value is usually used with the CIF value, not FOB.
Hence the correct indicator for the API call would have been the following:
library(imfr)

data_list_imports <- imf_data(database_id = "DOT", indicator = c("TMG_CIF_USD"),
                              country = c("US", "JP", "KR"),
                              start = "1995",
                              return_raw = TRUE,
                              freq = "A")
I wanted to loop over multiple stations to capture weather data, using the code below:
library(rwunderground)

sample_df <- data.frame(airportid = c("K6A2", "KAPA", "KASD", "KATL", "KBKF",
                                      "KBKF", "KCCO", "KDEN", "KFFC", "KFRG"),
                        stringsAsFactors = FALSE)

history_range(set_location(airport_code = sample_df$airportid),
              date_start = "20170815", date_end = "20170822",
              limit = 10, no_api = FALSE, use_metric = FALSE, key = get_api_key(),
              raw = FALSE, message = TRUE)
It won't work.
Currently, you are passing the entire vector (multiple character values) into the history_range call. Simply use lapply to pass the vector values iteratively, which will even return a list of history_range() return objects. Below, a defined function passes the parameter; extend the function as needed to perform other operations.
capture_weather_data <- function(airport_id) {
  data <- history_range(set_location(airport_code = airport_id),
                        date_start = "20170815", date_end = "20170822",
                        limit = 10, no_api = FALSE, use_metric = FALSE,
                        key = get_api_key(),
                        raw = FALSE, message = TRUE)
  write.csv(data, paste0("/path/to/output/", airport_id, ".csv"))
  return(data)
}
data_list <- lapply(sample_df$airportid, capture_weather_data)
Also, you can name each item in the list after the corresponding airport_id value:
data_list <- setNames(data_list, sample_df$airportid)
data_list$K6A2 # 1st ITEM
data_list$KAPA # 2nd ITEM
data_list$KASD # 3rd ITEM
...
In fact, with sapply (a wrapper around lapply) you can generate the list and name each item in the same call, but the input vector must be a character type (not factor):
data_list <- sapply(as.character(sample_df$airportid), capture_weather_data,
                    simplify = FALSE, USE.NAMES = TRUE)
names(data_list)
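And if a single table is wanted afterwards, the named list can be row-bound (a sketch, assuming each history_range() return is a data frame with matching columns):
all_data <- do.call(rbind, data_list)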
I think this history_range function that you brought up, from the rwunderground package as I understand it, requires a Weather Underground API key. I went to the site and even signed up for one, but the email validation process needed to get a key (https://www.wunderground.com/weather/api) doesn't seem to be working correctly at the moment.
Instead, I went to the CRAN mirror (https://github.com/cran/rwunderground/blob/master/R/history.R) and, from what I understand, the function accepts only one string as the set_location argument. The example provided in the documentation is:
history(set_location(airport_code = "SEA"), "20130101")
So what you should do as a "loop" instead is:
airports <- sample_df$airportid
results <- vector("list", length(airports))

for (i in seq_along(airports)) {
  results[[i]] <- history_range(
    set_location(airport_code = airports[[i]]),
    date_start = "20170815", date_end = "20170822",
    limit = 10, no_api = FALSE, use_metric = FALSE,
    key = get_api_key(),
    raw = FALSE, message = TRUE)
}
If this doesn't work, let me know. (Ack, somebody also gave another answer to this question while I was typing this up.)
I am using the package bigQueryR, and in particular the function bqr_list_tables to get a list of all tables in a dataset in Google Big Query.
My issue is that I only get 50 tables back, when I would ideally like all of them so I can regex out the ones I want programmatically.
bqr_list_tables only takes two arguments, the datasetId and the projectId. Is there any way to accomplish this while restricting myself to this package?
I'm using version ‘0.2.0’ of bigQueryR
Edit:
Instead of installing the latest GitHub version, I just used the following code directly from the repo (https://github.com/cloudyr/bigQueryR/blob/master/R/tables.R); no issues, and it works as prescribed.
bqr_list_tables <- function(projectId, datasetId, maxResults = 1000, pageToken = "") {
  l <- googleAuthR::gar_api_generator("https://www.googleapis.com/bigquery/v2",
                                      "GET",
                                      path_args = list(projects = projectId,
                                                       datasets = datasetId,
                                                       tables = ""),
                                      pars_args = list(maxResults = maxResults,
                                                       pageToken = pageToken),
                                      data_parse_function = parse_bqr_list_tables)
  out <- l(path_arguments = list(projects = projectId,
                                 datasets = datasetId))
  out
}

parse_bqr_list_tables <- function(x) {
  d <- x$tables
  data.frame(id = d$id,
             projectId = d$tableReference$projectId,
             datasetId = d$tableReference$datasetId,
             tableId = d$tableReference$tableId,
             stringsAsFactors = FALSE)
}
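Usage then looks like this (the project and dataset IDs below are hypothetical placeholders):
library(bigQueryR)
bqr_auth()  # authenticate first

# List up to 1000 tables in one call, then regex out the ones you want
tables <- bqr_list_tables(projectId = "my-project", datasetId = "my_dataset")
grep("^my_prefix_", tables$tableId, value = TRUE)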