I wanted to loop over multiple stations to capture weather data using the code below:
library(rwunderground)
sample_df <- data.frame(airportid = c("K6A2",
"KAPA",
"KASD",
"KATL",
"KBKF",
"KBKF",
"KCCO",
"KDEN",
"KFFC",
"KFRG"),
stringsAsFactors = FALSE)
history_range(set_location(airport_code =sample_df$airportid), date_start = "20170815", date_end = "20170822",
limit = 10, no_api = FALSE, use_metric = FALSE, key = get_api_key(),
raw = FALSE, message = TRUE)
It won't work.
Currently, you are passing the entire vector (multiple character values) into a single history_range call. Use lapply to pass the values one at a time; it returns a list of the history_range() results. The code below wraps the call in a function that takes the airport id as its parameter. Extend the function as needed to perform other operations.
capture_weather_data <- function(airport_id) {
data <- history_range(set_location(airport_code=airport_id),
date_start = "20170815", date_end = "20170822",
limit = 10, no_api = FALSE, use_metric = FALSE, key = get_api_key(),
raw = FALSE, message = TRUE)
write.csv(data, paste0("/path/to/output/", airport_id, ".csv"))
return(data)
}
data_list <- lapply(sample_df$airportid, capture_weather_data)
Also, name each item in the list after the corresponding airport_id character value:
data_list <- setNames(data_list, sample_df$airportid)
data_list$K6A2 # 1st ITEM
data_list$KAPA # 2nd ITEM
data_list$KASD # 3rd ITEM
...
In fact, with sapply (a wrapper around lapply) you can generate the list and name each item in the same call, but the input vector must be of character type (not factor):
data_list <- sapply(as.character(sample_df$airportid), capture_weather_data,
simplify=FALSE, USE.NAMES=TRUE)
names(data_list)
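A minimal sketch of the naming behaviour, using a stand-in function so it runs without an API key. `fake_fetch` is hypothetical and only mimics the shape of a per-station result; it is not part of rwunderground.

```r
# Stand-in for capture_weather_data(); hypothetical, no network access needed
fake_fetch <- function(airport_id) {
  data.frame(airport = airport_id, temp = NA_real_, stringsAsFactors = FALSE)
}

ids <- c("K6A2", "KAPA", "KASD")

# simplify = FALSE keeps a list; USE.NAMES = TRUE names items by the input strings
data_list <- sapply(ids, fake_fetch, simplify = FALSE, USE.NAMES = TRUE)
names(data_list)

# Stack the per-station results into one data frame
all_data <- do.call(rbind, data_list)
```

Because every element has the same columns, do.call(rbind, ...) is enough to combine them afterwards.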
I think this history_range function that you brought up, from the rwunderground package as I understand, requires a weather underground API key. I went to the site and even signed up for it, but the email validation process in order to get a key (https://www.wunderground.com/weather/api) doesn't seem to be working correctly at the moment.
Instead I went to the CRAN mirror (https://github.com/cran/rwunderground/blob/master/R/history.R) and from what I understand, the function accepts only one string as set_location argument. The example provided in the documentation is
history(set_location(airport_code = "SEA"), "20130101")
So what you should be doing as a "loop", instead, is
airport_ids <- sample_df$airportid
results <- vector("list", length(airport_ids))
for(i in seq_along(airport_ids)){
results[[i]] <- history_range(
set_location(airport_code = airport_ids[i]),
date_start = "20170815", date_end = "20170822",
limit = 10, no_api = FALSE, use_metric = FALSE,
key = get_api_key(),
raw = FALSE, message = TRUE)
}
If this doesn't work, let me know. (Ack, somebody also gave another answer to this question while I was typing this up.)
I have a data frame with observations on YouTube video_ids. When used in an API call, these ids allow me to fetch data on certain videos that I use to enrich my dataset.
First I created a list of unique video_ids with the below script. This returns a large list of 6350 unique elements.
video_ids <- list();
index <- 1
for(i in unique(df$video_id)){
video_ids[[index]] <- list(
video_id = i
)
index <- index + 1
}
The API documentation asks for a comma-separated list of video ids. I did that by using unlist(video_ids), which returns a large vector. I cannot use this vector in the API call, because it is way too long.
The maximum amount of ids I can process in one API call is 50.
library(httr)
api_key = "xxxx"
process_ids = unlist(video_ids[1:50]) #pass the first 50 elements of the video_ids list
url <- modify_url("https://www.googleapis.com/youtube/v3/videos",
query = list(
"part" = "snippet",
"id" = paste(process_ids, collapse=","),
"key" = api_key)
)
output <- content(GET(url), as = "parsed", type = "application/json")
What is the best approach for this in R? Can I loop through my list of 6350 elements by 50 items each loop, removing these items from the list when the loop completes?
My current script below loops through each video id in the list and fetches the data I need from the output of the API response. This works, but is very slow and requires a lot of loops / API calls (6350 loops). It can't be the most efficient way to approach this.
result <- list();
index <- 1
for (id in video_ids) {
api_key = "xxxx"
url <- modify_url("https://www.googleapis.com/youtube/v3/videos",
query = list(
"part" = "snippet",
"id" = paste(id, collapse=","),
"key" = api_key)
)
output <- content(GET(url), as = "parsed", type = "application/json")
#Adds what I need from the output to a list called result
for(t in output$items){
result[[index]] <- list(
video_id = t$id,
channel_id = t$snippet$channelId
)
}
index <- index + 1
}
You can try the following:
Split the video ids into groups of 50 and pass each group to the API.
vec = unlist(video_ids)
result <- lapply(split(vec, ceiling(seq_along(vec)/50)), function(x) {
url <- modify_url("https://www.googleapis.com/youtube/v3/videos",
query = list(
"part" = "snippet",
"id" = paste(x, collapse=","),
"key" = api_key))
content(GET(url), as = "parsed", type = "application/json")
})
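To see what the split does without calling the API, here is a small sketch on made-up ids. The ids and the `mock_output` structure are assumptions for illustration; the mock is only shaped like the `output$items` used in the question.

```r
# Hypothetical ids standing in for unlist(video_ids)
vec <- sprintf("id%04d", 1:6350)

# The group index changes every 50 elements: 1,1,...,1,2,2,...
chunks <- split(vec, ceiling(seq_along(vec) / 50))
length(chunks)          # number of API calls needed
range(lengths(chunks))  # size of the smallest and largest batch

# Comma-separated id string for each request
id_strings <- vapply(chunks, paste, character(1), collapse = ",")

# Mock of one parsed response (made-up values)
mock_output <- list(items = list(
  list(id = "id0001", snippet = list(channelId = "chanA")),
  list(id = "id0002", snippet = list(channelId = "chanB"))
))

# Flatten the items of one response into a data frame
items_df <- do.call(rbind, lapply(mock_output$items, function(t) {
  data.frame(video_id = t$id, channel_id = t$snippet$channelId,
             stringsAsFactors = FALSE)
}))
```

The same flattening step could be applied to each element of `result` and the pieces combined with another do.call(rbind, ...).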
I am using WikipediR to query revision ids to see if the very next edit is a 'rollback' or an 'undo'.
I am interested in the tag and revision comment to identify if the edit was undone/rolled back.
My code for a single revision id is:
library(WikipediR)
wp_diff<- revision_diff("en", "wikipedia", revisions = "883987486", properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
I then convert the output of this to a df using the code
library(dplyr)
library(tibble)
diff <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors=FALSE))
This works great for a single revision id.
I am wondering how I would loop or map over a vector of many revision IDs.
I tried
vec <- c("883987486","911412795")
for (i in 1:length(vec)){
wp_diff[i]<- revision_diff("en", "wikipedia", revisions = i, properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
}
But this creates the error
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
When I try to convert the output list to a dataframe.
Does anybody have any suggestions? I am not sure how to proceed.
Thanks.
Try the following code:
# Make a function
make_diff_df <- function(rev){
wp_diff <- revision_diff("en", "wikipedia", revisions = rev,
properties = c("tags", "comment"),
direction = "next", clean_response = TRUE,
as_wikitext = TRUE)
DF <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors=FALSE))
# Define the names of the DF
names(DF) <- c("pageid","ns","title","revisions.diff.from",
"revisions.diff.to","revisions.diff..",
"revisions.comment","revisions..mw.rollback.")
return(DF)
}
vec <- c("883987486","911412795")
# Use do.call and lapply with the function
do.call("rbind",lapply(vec,make_diff_df))
Note that you have to fix the names of the DF inside the make_diff_df function so that the "rbind" inside do.call can work. The names for the 2 revisions in the example are pretty similar but not identical.
Hope this can help
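As a small illustration of why the names have to be fixed, here is a sketch with two made-up one-row data frames (the column names and values are invented, not actual WikipediR output):

```r
# Two one-row data frames whose last column name differs (made-up values)
a <- data.frame(pageid = 1, comment = "undo", tag.x = "mw-undo")
b <- data.frame(pageid = 2, comment = "revert", tag.y = "mw-rollback")

# rbind() on data frames matches columns by name, so this attempt fails
failed <- inherits(try(rbind(a, b), silent = TRUE), "try-error")

# Giving both the same names lets them stack
common <- c("pageid", "comment", "tag")
names(a) <- common
names(b) <- common
stacked <- rbind(a, b)
```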
I am trying to use the gmapsdistance package in R to calculate the journey time by public transport between a list of postcodes (origin) and a single destination postcode.
The output for a single query is:
$Time
[1] 5352
$Distance
[1] 34289
$Status
[1] "OK"
I actually have 2.5k postcodes to use but whilst I troubleshoot it I have set the iterations to 10. london1 is a dataframe containing a single column with 2500 postcodes in 2500 rows.
This is my attempt so far:
results <- for(i in 1:10) {
gmapsdistance::set.api.key("xxxxxx")
gmapsdistance::gmapsdistance(origin = "london1[i]"
destination = "WC1E 6BT"
mode = "transit"
dep_date = "2017-04-18"
dep_time = "09:00:00")}
When I run this loop I get
results <- for(i in 1:10) {
+ gmapsdistance::set.api.key("xxxxxx")
+ gmapsdistance::gmapsdistance(origin = "london1[i]"
+ destination = "WC1E 6BT"
Error: unexpected symbol in:
" gmapsdistance::gmapsdistance(origin = "london1[i]"
destination"
mode = "transit"
dep_date = "2017-04-18"
dep_time = "09:00:00")}
Error: unexpected ')' in " dep_time = "09:00:00")"
My questions are:
1) How can I fix this?
2) How do I need to format this so the output is a dataframe or matrix containing the origin postcode and journey time?
Thanks
There are a few things going on here:
"london1[i]" needs to be london1[i, 1] (no quotes, otherwise the literal string is sent as the origin)
you need to separate your arguments with commas
I get an error when using, e.g., "WC1E 6BT"; I found it necessary to replace the space with a dash, like "WC1E-6BT"
the loop needs to explicitly assign values to elements of results, because a for loop itself returns nothing
So your code would look something like:
library(gmapsdistance)
## some example data
london1 <- data.frame(postCode = c('WC1E-7HJ', 'WC1E-6HX', 'WC1E-7HY'))
## make an empty list to be filled in
results <- vector('list', 3)
for(i in 1:3) {
set.api.key("xxxxxx")
## fill in your results list
results[[i]] <- gmapsdistance(origin = london1[i, 1],
destination = "WC1E-6BT",
mode = "transit",
dep_date = "2017-04-18",
dep_time = "09:00:00")
}
It turns out you don't need a loop (and probably shouldn't use one) when using gmapsdistance (see the help doc). It accepts a vector of origins, and the output from multiple inputs also helps in quickly formatting your results into a data.frame:
set.api.key("xxxxxx")
temp1 <- gmapsdistance(origin = london1[, 1],
destination = "WC1E-6BT",
mode = "transit",
dep_date = "2017-04-18",
dep_time = "09:00:00",
combinations = "all")
The above returns a list of data.frame objects, one each for Time, Distance and Status. You can then easily make those into a data.frame containing everything you might want:
res <- data.frame(origin = london1[, 1],
destination = 'WC1E-6BT',
do.call(data.frame, lapply(temp1, function(x) x[, 2])))
lapply(temp1, function(x) x[, 2]) extracts the needed column from each data.frame in the list, and do.call puts them back together as columns in a new data.frame object.
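The reshaping step can be tried without an API key on a mock of that list. The structure below is an assumption based on the description above (one data.frame each for Time, Distance and Status, with the origin in column 1 and the value in column 2); the column names and numbers are made up.

```r
# Mock of the list returned for several origins; values and column names are made up
temp1 <- list(
  Time     = data.frame(or = c("WC1E-7HJ", "WC1E-6HX"), dest = c(300, 420)),
  Distance = data.frame(or = c("WC1E-7HJ", "WC1E-6HX"), dest = c(500, 800)),
  Status   = data.frame(or = c("WC1E-7HJ", "WC1E-6HX"), dest = c("OK", "OK"))
)

# Take the second column of each element and bind them side by side;
# the list names (Time, Distance, Status) become the column names
res <- data.frame(origin = temp1$Time[, 1],
                  destination = "WC1E-6BT",
                  do.call(data.frame, lapply(temp1, function(x) x[, 2])))
names(res)
```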
In R, I need to return two objects from a function:
myfunction <- function()
{
a.data.frame <- read.csv(file = input.file, header = TRUE, sep = ",", dec = ".")
index.hash <- get_indices_function(colnames(a.data.frame))
alist <- list("a.data.frame" = a.data.frame, "index.hash" = index.hash)
return(alist)
}
But the objects returned from myfunction both come back as a list, not as a data.frame and a hash.
Any help would be appreciated.
You can only return one object from an R function; this is consistent with pretty much every other language I've used. However, you'll note that the objects retain their original structure within the list, so alist[[1]] and alist[[2]] should be the data frame and hash respectively, and are structured as data frames and hashes. Once you've returned them from the function, you can split them out into separate objects if you want :).
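A quick sketch of that unpacking, with toy stand-ins: the data frame is built inline instead of read from a file, and an environment plays the role of the hash (an assumption, since the question does not say which hash implementation is used).

```r
# Toy version of the function from the question
myfunction <- function() {
  a.data.frame <- data.frame(x = 1:3, y = c("a", "b", "c"))
  index.hash <- new.env()  # an environment used as a simple hash
  for (nm in colnames(a.data.frame)) {
    assign(nm, match(nm, colnames(a.data.frame)), envir = index.hash)
  }
  list(a.data.frame = a.data.frame, index.hash = index.hash)
}

out <- myfunction()
class(out)               # the container is a list
class(out$a.data.frame)  # the element is still a data.frame

# Pull the pieces out into their own objects
df  <- out$a.data.frame
idx <- out$index.hash
get("y", envir = idx)    # column position stored under the key "y"
```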
You can use a structure.
return (structure(class = "myclass",
list(data = data.frame,
type = anytype,
page.content = page.content.as.string.vector,
knitr = knitr)))
Then you can access your data with
values <- myfunction(...)
values$data
values$type
values$page.content
values$knitr
and so on.
A working example from my package:
sju.table.values <- function(tab, digits=2) {
if (!inherits(tab, "ftable")) tab <- ftable(tab)
tab.cell <- round(100*prop.table(tab),digits)
tab.row <- round(100*prop.table(tab,1),digits)
tab.col <- round(100*prop.table(tab,2),digits)
tab.expected <- as.table(round(as.array(margin.table(tab,1)) %*% t(as.array(margin.table(tab,2))) / margin.table(tab)))
# -------------------------------------
# return results
# -------------------------------------
invisible (structure(class = "sjutablevalues",
list(cell = tab.cell,
row = tab.row,
col = tab.col,
expected = tab.expected)))
}
tab <- table(sample(1:2, 30, TRUE), sample(1:3, 30, TRUE))
# show expected values
sju.table.values(tab)$expected
# show cell percentages
sju.table.values(tab)$cell
Is there a way to make matching values at scale more programmatic? Basically what I want to do is add a bunch of columns for value lookups onto a dataframe, but I don't want to write the match[] argument every time. It seems like this would be a use case for mapply but I can't quite figure out how to use it here. Any suggestions?
Here's the data:
data <- data.frame(
region = sample(c("northeast","midwest","west"), 50, replace = T),
climate = sample(c("dry","cold","arid"), 50, replace = T),
industry = sample(c("tech","energy","manuf"), 50, replace = T))
And the corresponding lookup tables:
lookups <- data.frame(
orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3")
)
So now what I want to do is: First add a column to "data" that's called "reg_lookups" and it will match the region to its appropriate value in "lookups". Do the same for "climate_lookups" and so on.
Right now, I've got this mess:
data$reg_lookup <- lookups$look_val[match(data$region, lookups$orig_val)]
data$clim_lookup <- lookups$look_val[match(data$climate, lookups$orig_val)]
data$indus_lookup <- lookups$look_val[match(data$industry, lookups$orig_val)]
I've tried using a function to do this, but the function doesn't seem to work, so then applying that to mapply is a no-go (plus I'm confused about how the mapply syntax would work here):
match_fun <- function(df, newval, df_look, lookup_val, var, ref_val) {
df$newval <- df_look$lookup_val[match(df$var, df_look$ref_val)]
return(df)
}
data2 <- match_fun(data, reg_2, lookups, look_val, region, orig_val)
I think you're just trying to do this:
data <- merge(data,lookups[1:3,],by.x = "region",by.y = "orig_val",all.x = TRUE)
data <- merge(data,lookups[4:6,],by.x = "climate",by.y = "orig_val",all.x = TRUE)
data <- merge(data,lookups[7:9,],by.x = "industry",by.y = "orig_val",all.x = TRUE)
But it would be much better to store the lookups in separate data frames. That way you can control the names of the new columns more easily. It would also allow you to do something like this:
lookups1 <- split(lookups,rep(1:3,each = 3))
colnames(lookups1[[1]]) <- c('region','reg_lookup')
colnames(lookups1[[2]]) <- c('climate','clim_lookup')
colnames(lookups1[[3]]) <- c('industry','indus_lookup')
do.call(cbind,mapply(merge,
x = list(data[,1,drop = FALSE],data[,2,drop = FALSE],data[,3,drop = FALSE]),
y = lookups1,
MoreArgs = list(all.x = TRUE),
SIMPLIFY = FALSE))
and you should be able to wrap that do.call bit in a function.
I used data[,1,drop = FALSE] in order to preserve them as one column data frames.
The way you structure mapply calls is to pass named arguments as lists (the x = and y = parts). I wanted to be sure to preserve all the rows from data, so I passed all.x = TRUE via MoreArgs (note the capital M; a lowercase moreArgs would be treated as just another argument to vectorize over), so that gets passed each time merge is called. Finally, I need to stitch them all together myself, so I turned off SIMPLIFY.
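One caveat worth checking: merge() sorts its result by the key columns by default, so pieces merged separately and then cbind-ed may not line up row by row. As an alternative sketch (not the approach above), the original match() lines can be generalized with lapply, which leaves the row order of data untouched. The name `new_cols` and the set.seed call are additions for illustration.

```r
set.seed(1)
data <- data.frame(
  region   = sample(c("northeast", "midwest", "west"), 50, replace = TRUE),
  climate  = sample(c("dry", "cold", "arid"), 50, replace = TRUE),
  industry = sample(c("tech", "energy", "manuf"), 50, replace = TRUE),
  stringsAsFactors = FALSE)
lookups <- data.frame(
  orig_val = c("northeast", "midwest", "west", "dry", "cold", "arid",
               "tech", "energy", "manuf"),
  look_val = c("dir1", "dir2", "dir3", "temp1", "temp2", "temp3",
               "job1", "job2", "job3"),
  stringsAsFactors = FALSE)

# Names are the source columns, values are the new column names
new_cols <- c(region = "reg_lookup", climate = "clim_lookup",
              industry = "indus_lookup")

# One match() per source column, assigned to the new columns in one step
data[unname(new_cols)] <- lapply(names(new_cols), function(col) {
  lookups$look_val[match(data[[col]], lookups$orig_val)]
})
head(data)
```

Adding another lookup column then only means adding one entry to `new_cols`.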