post request with httr

I am new to httr. I am trying to use this geocoding api: https://geo.api.gouv.fr/adresse. I want to pass a csv file directly from R, as given in their example:
curl -X POST -F data=@search.csv -F columns=adresse -F columns=postcode https://api-adresse.data.gouv.fr/search/csv
The example csv is here: https://adresse.data.gouv.fr/exemples/search.csv
I tried, without specifying the columns:
library(httr)
test <- POST("https://api-adresse.data.gouv.fr/search/csv/",
body = "data = @search.csv")
> test
Response [https://api-adresse.data.gouv.fr/search/csv/]
Date: 2021-02-09 21:27
Status: 400
Content-Type: application/json; charset=utf-8
Size: 66 B
Or
test <- POST("https://api-adresse.data.gouv.fr/search/csv/",
body = "data = @search.csv",
content_type("application/json"))
But I still get a 400 status. Specifying the whole file path did not work either. How does this work? I would like to get the JSON and read it into R.
Thanks in advance!

I am not sure whether you can ask for JSON back, but here is how you might do this with httr:
library(httr)
r <- POST(url = "https://api-adresse.data.gouv.fr/search/csv",
body = list(data = upload_file("search.csv"),
columns = "adresse",
columns = "postcode"))
content(r)
# # A tibble: 4 x 20
# nom adresse postcode city latitude longitude result_label result_score result_type result_id
# <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr>
# 1 Écol~ 6 Rue ~ 54600 Vill~ 48.7 6.15 6 Rue Alber~ 0.96 housenumber 54578_00~
# 2 Écol~ 6 Rue ~ 54500 Vand~ 48.7 6.15 6 Rue d’Aqu~ 0.96 housenumber 54547_00~
# 3 Écol~ 31 Rue~ 54180 Heil~ 48.6 6.21 31 Rue d’Ar~ 0.96 housenumber 54257_00~
# 4 Écol~ 1 bis ~ 54250 Cham~ 48.7 6.16 1 bis Rue d~ 0.95 housenumber 54115_01~
# # ... with 10 more variables: result_housenumber <chr>, result_name <chr>, result_street <lgl>,
# # result_postcode <dbl>, result_city <chr>, result_context <chr>, result_citycode <dbl>,
# # result_oldcitycode <lgl>, result_oldcity <lgl>, result_district <lgl>
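Passing a named list as body makes httr send a multipart form, which is what each -F flag does in the curl example; repeated columns names become repeated form fields. A minimal sketch of that idea follows. build_geocode_body() and geocode_csv() are hypothetical helper names, not part of httr, and the sketch assumes the endpoint keeps returning CSV as in the output above.

```r
library(httr)

# Hypothetical helper: build the multipart body from a file path and column names.
# Each entry named "columns" becomes its own form field, like a repeated -F flag.
build_geocode_body <- function(path, columns) {
  c(
    list(data = upload_file(path)),
    setNames(as.list(columns), rep("columns", length(columns)))
  )
}

# Hypothetical wrapper: send the request and fail loudly on a 4xx/5xx status.
geocode_csv <- function(path, columns,
                        url = "https://api-adresse.data.gouv.fr/search/csv/") {
  r <- POST(url, body = build_geocode_body(path, columns), encode = "multipart")
  stop_for_status(r)
  content(r)
}

# Usage (needs network access and a local search.csv):
# geocode_csv("search.csv", c("adresse", "postcode"))
```

Calling stop_for_status() turns a silent 400 like the one in the question into an R error, which tends to make a malformed body easier to spot.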

Related

Webscrape Table from FantasyLabs

I am trying to scrape historical DFS NFL ownership data from fantasylabs.com using RSelenium. I am able to navigate to the page and even highlight the element I am trying to scrape, but I get an error when I try to read it into a table.
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"webElement"’
I have looked up the error but cannot find the cause. I am essentially trying to follow this Stack Overflow example for this web scraping problem. Could someone help me understand why I cannot scrape this table and what I could do differently?
Here is my full code:
library(RSelenium)
library(XML)
library(RCurl)
# start the Selenium server
rdriver <- rsDriver(browser = "chrome",
chromever = "106.0.5249.61"
)
# creating a client object and opening the browser
obj <- rdriver$client
# navigate to the url
appURL <- 'https://www.fantasylabs.com/nfl/contest-ownership/?date=10112022'
obj$navigate(appURL)
obj$findElement(using = 'xpath', '//*[@id="ownershipGrid"]')$highlightElement()
tableElem <- obj$findElement(using = 'xpath', '//*[@id="ownershipGrid"]')
projTable <- readHTMLTable(tableElem, header = TRUE, tableElem$getElementAttribute("outerHTML")[[1]])
dvpCTable <- projTable[[1]]
dvpCTable

library(tidyverse)
library(httr2)
"https://www.fantasylabs.com/api/contest-ownership/1/10_12_2022/4/75377/0/" %>%
request() %>%
req_perform() %>%
resp_body_json(simplifyVector = TRUE) %>%
as_tibble
#> # A tibble: 43 x 4
#> Prope~1 $Fant~2 $Posi~3 $Play~4 $Team $Salary $Actu~5 Playe~6 SortV~7 Fanta~8
#> <int> <int> <chr> <chr> <chr> <int> <dbl> <int> <lgl> <int>
#> 1 50882 1376298 TE Albert~ "DEN" 2800 NA 50882 NA 1376298
#> 2 51124 1376299 TE Andrew~ "DEN" 2500 1.7 51124 NA 1376299
#> 3 33781 1385590 RB Austin~ "LAC" 7500 24.3 33781 NA 1385590
#> 4 55217 1376255 QB Brett ~ "DEN" 5000 NA 55217 NA 1376255
#> 5 2409 1376309 QB Chase ~ "LAC" 4800 NA 2409 NA 1376309
#> 6 40663 1385288 WR Courtl~ "DEN" 6100 3.4 40663 NA 1385288
#> 7 50854 1376263 RB Damare~ "DEN" 4000 NA 50854 NA 1376263
#> 8 8580 1376342 WR DeAndr~ "LAC" 3600 4.7 8580 NA 1376342
#> 9 8472 1376304 D Denver~ "DEN" 2500 7 8472 NA 1376304
#> 10 62112 1376262 RB Devine~ "" 4000 NA 62112 NA 1376262
#> # ... with 33 more rows, 34 more variables:
#> # Properties$`$5 NFL $70K Flea Flicker [$20K to 1st] (Mon-Thu)` <dbl>,
#> # $Average <dbl>, $Volatility <lgl>, $GppGrade <chr>, $MyExposure <lgl>,
#> # $MyLeverage <lgl>, $MyLeverage_rnk <lgl>, $MediumOwnership_pct <lgl>,
#> # $PlayerId_rnk <int>, $PlayerId_pct <dbl>, $FantasyResultId_rnk <int>,
#> # $FantasyResultId_pct <dbl>, $Position_rnk <lgl>, $Position_pct <lgl>,
#> # $Player_Name_rnk <lgl>, $Player_Name_pct <lgl>, $Team_rnk <lgl>, ...
Created on 2022-11-03 with reprex v2.0.2
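As for the error in the question: readHTMLTable() has no method for an RSelenium webElement, so the element's HTML has to be pulled out as a string and parsed first. Below is a small offline sketch of that step, using rvest rather than XML. The HTML string is made up and stands in for tableElem$getElementAttribute("outerHTML")[[1]]; the sketch also assumes the grid is rendered as a real <table>, which may not hold for the FantasyLabs page.

```r
library(rvest)

# Made-up HTML standing in for the outerHTML string returned by RSelenium.
html <- paste0(
  "<table id='ownershipGrid'>",
  "<tr><th>Player</th><th>Salary</th></tr>",
  "<tr><td>Player A</td><td>2800</td></tr>",
  "<tr><td>Player B</td><td>7500</td></tr>",
  "</table>"
)

# Parse the string into a document, select the table by id, convert to a tibble.
page <- read_html(html)
proj_table <- html_table(html_element(page, "#ownershipGrid"))
proj_table
```

If the page builds its grid from JSON, as the API call above suggests, requesting that JSON directly is likely the more robust route.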

Scrap webpage that requires button click

I am trying to scrape data from the link below. I need to click the CSV button on the page and download the CSV file it provides.
library(netstat)
library(RSelenium)
url <- "https://gtr.ukri.org/search/project?term=%22climate+change%22+OR+%22climate+crisis%22&fetchSize=25&selectedSortableField=&selectedSortOrder=&fields=pro.gr%2Cpro.t%2Cpro.a%2Cpro.orcidId%2Cper.fn%2Cper.on%2Cper.sn%2Cper.fnsn%2Cper.orcidId%2Cper.org.n%2Cper.pro.t%2Cper.pro.abs%2Cpub.t%2Cpub.a%2Cpub.orcidId%2Corg.n%2Corg.orcidId%2Cacp.t%2Cacp.d%2Cacp.i%2Cacp.oid%2Ckf.d%2Ckf.oid%2Cis.t%2Cis.d%2Cis.oid%2Ccol.i%2Ccol.d%2Ccol.c%2Ccol.dept%2Ccol.org%2Ccol.pc%2Ccol.pic%2Ccol.oid%2Cip.t%2Cip.d%2Cip.i%2Cip.oid%2Cpol.i%2Cpol.gt%2Cpol.in%2Cpol.oid%2Cprod.t%2Cprod.d%2Cprod.i%2Cprod.oid%2Crtp.t%2Crtp.d%2Crtp.i%2Crtp.oid%2Crdm.t%2Crdm.d%2Crdm.i%2Crdm.oid%2Cstp.t%2Cstp.d%2Cstp.i%2Cstp.oid%2Cso.t%2Cso.d%2Cso.cn%2Cso.i%2Cso.oid%2Cff.t%2Cff.d%2Cff.c%2Cff.org%2Cff.dept%2Cff.oid%2Cdis.t%2Cdis.d%2Cdis.i%2Cdis.oid%2Ccpro.rtpc%2Ccpro.rcpgm%2Ccpro.hlt&type=#/csvConfirm"
I am struggling to implement that using Selenium. Here is the code I have so far.
rD <- rsDriver(port= free_port(), browser = "chrome", chromever = "106.0.5249.21", check = TRUE, verbose = TRUE)
remote_driver <- rD[["client"]]
remDr <- rD$client
remDr$navigate(url)
webElem <- remDr$findElement(using = "css", "content gtr-body d-flex flex-column ng-scope")
webElem$clickElement()

You can often just record the network log and see which request is sent when you hit the download button. In Chrome, right-click the page, choose Inspect, then open the Network tab. In this case only one request is sent.
Right-click that request and choose "Copy as cURL" to see the whole request, or just copy the URL, since the cookies and headers are not necessary here. I wrote a quick function around the task of querying the site:
dl_ukri <- function(query,
destfile = paste0(query, ".csv"),
size = 25L,
quiet_download = FALSE) {
url <- paste0(
"https://gtr.ukri.org/search/project/csv?term=",
urltools::url_encode(query),
"&selectedFacets=&fields=acp.d,is.t,prod.t,pol.oid,acp.oid,rtp.t,pol.in,prod.i,per.pro.abs,acp.i,col.org,acp.t,is.d,is.oid,cpro.rtpc,prod.d,stp.oid,rtp.i,rdm.oid,rtp.d,col.dept,ff.d,ff.c,col.pc,pub.t,kf.d,dis.t,col.oid,pro.t,per.sn,org.orcidId,per.on,ff.dept,rdm.t,org.n,dis.d,prod.oid,so.cn,dis.i,pro.a,pub.orcidId,pol.gt,rdm.i,rdm.d,so.oid,per.fnsn,per.org.n,per.pro.t,pro.orcidId,pub.a,col.d,per.orcidId,col.c,ip.i,pro.gr,pol.i,so.t,per.fn,col.i,ip.t,ff.oid,stp.i,so.i,cpro.rcpgm,cpro.hlt,col.pic,so.d,ff.t,ip.d,dis.oid,ip.oid,stp.d,rtp.oid,ff.org,kf.oid,stp.t&type=&selectedSortableField=score&selectedSortOrder=DESC"
)
curl::curl_download(url, destfile, quiet = quiet_download)
}
Testing this with your original search:
dl_ukri('"climate change" OR "climate crisis"', destfile = "test.csv")
readr::read_csv("test.csv")
#> Rows: 5894 Columns: 25
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (23): FundingOrgName, ProjectReference, LeadROName, Department, ProjectC...
#> dbl (2): AwardPounds, ExpenditurePounds
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 5,894 × 25
#> FundingOrgN…¹ Proje…² LeadR…³ Depar…⁴ Proje…⁵ PISur…⁶ PIFir…⁷ PIOth…⁸ PI OR…⁹
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 ESRC ES/W00… Univer… School… Fellow… Thew Harriet Christ… http:/…
#> 2 AHRC AH/W00… Univer… Arts L… Resear… Scott Peter Manley <NA>
#> 3 AHRC 2609218 Queen … Drama Studen… <NA> <NA> <NA> <NA>
#> 4 UKRI MR/V02… Univer… Politi… Fellow… Spaiser Viktor… <NA> http:/…
#> 5 MRC MC_PC_… Univer… <NA> Intram… Alessi Dario Renato <NA>
#> 6 AHRC 1948811 Royal … School… Studen… <NA> <NA> <NA> <NA>
#> 7 EPSRC 2688399 Brunel… Chemic… Studen… <NA> <NA> <NA> <NA>
#> 8 ESRC ES/T01… Univer… Social… Resear… Walker Cather… Louise http:/…
#> 9 AHRC AH/X00… Queen … Drama Resear… Herita… Paul <NA> http:/…
#> 10 ESRC 2272756 Univer… Sch of… Studen… <NA> <NA> <NA> <NA>
#> # … with 5,884 more rows, 16 more variables: StudentSurname <chr>,
#> # StudentFirstName <chr>, StudentOtherNames <chr>, `Student ORCID iD` <chr>,
#> # Title <chr>, StartDate <chr>, EndDate <chr>, AwardPounds <dbl>,
#> # ExpenditurePounds <dbl>, Region <chr>, Status <chr>, GTRProjectUrl <chr>,
#> # ProjectId <chr>, FundingOrgId <chr>, LeadROId <chr>, PIId <chr>, and
#> # abbreviated variable names ¹FundingOrgName, ²ProjectReference, ³LeadROName,
#> # ⁴Department, ⁵ProjectCategory, ⁶PISurname, ⁷PIFirstName, ⁸PIOtherNames, …
Created on 2022-10-17 with reprex v2.0.2
Voilà. I also played around with the fetchSize=25 parameter from the original URL, but it does not seem to do anything, so I just omitted it.
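The only part of the download URL that changes between searches is the encoded term. As a rough illustration of what that encoding step produces, here is a base R sketch that uses utils::URLencode() in place of urltools::url_encode(). encode_term() and build_search_url() are hypothetical helpers; the sketch builds only the term part, so the fields and sort parameters from the function above would still need to be appended, and the shortened URL is untested against the site.

```r
# Hypothetical helper: percent-encode a search term, including reserved characters.
encode_term <- function(query) {
  utils::URLencode(query, reserved = TRUE)
}

# Hypothetical helper: base URL plus the encoded term only.
build_search_url <- function(query,
                             base = "https://gtr.ukri.org/search/project/csv") {
  paste0(base, "?term=", encode_term(query))
}

encode_term('"climate change"')
build_search_url('"climate change" OR "climate crisis"')
```

Spaces come out as %20 here rather than the + seen in the browser URL; both forms are commonly accepted in query strings, but that is an assumption about this server.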

R task, web scraping

Here is my solution for the task so far. I get an error and cannot find the reason. Can anyone help with it?
Data download
1.1 Collect links
Data on the Stack Overflow user survey is available on the Stack Overflow website. Create a web scraper that collects the links to the survey files. Select only the links to the surveys from 2017 to 2021.
lst_nodes <- "https://insights.stackoverflow.com/survey/" %>%
read_html() %>%
html_nodes(".js-download-link")
lst_url <- lst_nodes[1:5] %>%
html_attr("href")
print(lst_url)
Complete the function to download the data files from the URLs that you extracted.
fun_download <- function(url) {
year <- # extract year from url
zip_file <- paste0("file_", year, ".zip")
zip_dir <- paste0("dir_", year)
download.file(url, zip_file)
unzip(zip_file, exdir = zip_dir, files = "survey_results_public.csv")
out <- read_csv(file.path(zip_dir, "survey_results_public.csv"), col_types = cols(.default = "c")) %>%
mutate(Year = year, ResponseId = row_number())
return(out)
year <- sub(".*[^0-9]([0-9]+)\\.zip$", "\\1", lst_url)
}
Apply the function to the URLs that you extracted and generate a data frame that contains the data from all surveys.
Save the data frame. Note: The read_csv command in the function seems to keep the downloaded csv files locked after reading. So once you tried to open the csv files, you cannot delete them. To overcome this lock, restart the R session.
Best to save the data so that you have to run the download and importing only once.
alldf <- lapply(lst_url, fun_download)
That is all I have done so far, but something seems to be wrong.

My suggestion to use year <- sub(.) needs to be put in context of the function itself, using its url only. This works.
fun_download <- function(url) {
stopifnot(length(url) == 1L) # just a safeguard
year <- sub(".*[^0-9]([0-9]+)\\.zip$", "\\1", url)
zip_file <- paste0("file_", year, ".zip")
zip_dir <- paste0("dir_", year)
download.file(url, zip_file)
unzip(zip_file, exdir = zip_dir, files = "survey_results_public.csv")
out <- readr::read_csv(file.path(zip_dir, "survey_results_public.csv"), col_types = readr::cols(.default = "c")) %>%
dplyr::mutate(
Year = year,
ResponseId = dplyr::row_number()
)
return(out)
}
fun_download(lst_url[[1]])
# trying URL 'https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2021.zip'
# Content type 'application/zip' length 8825103 bytes (8.4 MB)
# downloaded 8.4 MB
# # A tibble: 83,439 x 49
# ResponseId MainBranch Employment Country US_State UK_Country EdLevel Age1stCode LearnCode YearsCode YearsCodePro DevType
# <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 I am a deve~ Independen~ Slovakia NA NA Seconda~ 18 - 24 y~ Coding Bo~ NA NA Develop~
# 2 2 I am a stud~ Student, f~ Netherl~ NA NA Bachelo~ 11 - 17 y~ Other onl~ 7 NA NA
# 3 3 I am not pr~ Student, f~ Russian~ NA NA Bachelo~ 11 - 17 y~ Other onl~ NA NA NA
# 4 4 I am a deve~ Employed f~ Austria NA NA Master?~ 11 - 17 y~ NA NA NA Develop~
# 5 5 I am a deve~ Independen~ United ~ NA England Master?~ 5 - 10 ye~ Friend or~ 17 10 Develop~
# 6 6 I am a stud~ Student, p~ United ~ Georgia NA Bachelo~ 11 - 17 y~ Other onl~ NA NA NA
# 7 7 I code prim~ I prefer n~ United ~ New Ham~ NA Seconda~ 11 - 17 y~ Other onl~ 3 NA NA
# 8 8 I am a stud~ Student, f~ Malaysia NA NA Bachelo~ 11 - 17 y~ School;On~ 4 NA NA
# 9 9 I am a deve~ Employed p~ India NA NA Bachelo~ 18 - 24 y~ Coding Bo~ 6 4 Develop~
# 10 10 I am a deve~ Employed f~ Sweden NA NA Master?~ 11 - 17 y~ School 7 4 Data sc~
# # ... with 83,429 more rows, and 37 more variables: OrgSize <chr>, Currency <chr>, CompTotal <chr>, CompFreq <chr>,
# # LanguageHaveWorkedWith <chr>, LanguageWantToWorkWith <chr>, DatabaseHaveWorkedWith <chr>, DatabaseWantToWorkWith <chr>,
# # PlatformHaveWorkedWith <chr>, PlatformWantToWorkWith <chr>, WebframeHaveWorkedWith <chr>, WebframeWantToWorkWith <chr>,
# # MiscTechHaveWorkedWith <chr>, MiscTechWantToWorkWith <chr>, ToolsTechHaveWorkedWith <chr>, ToolsTechWantToWorkWith <chr>,
# # NEWCollabToolsHaveWorkedWith <chr>, NEWCollabToolsWantToWorkWith <chr>, OpSys <chr>, NEWStuck <chr>, NEWSOSites <chr>,
# # SOVisitFreq <chr>, SOAccount <chr>, SOPartFreq <chr>, SOComm <chr>, NEWOtherComms <chr>, Age <chr>, Gender <chr>,
# # Trans <chr>, Sexuality <chr>, Ethnicity <chr>, Accessibility <chr>, MentalHealth <chr>, SurveyLength <chr>, ...
From here, use lapply(., fun_download) to produce a list of frames.
list_of_frames <- lapply(lst_url, fun_download)
# trying URL 'https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2021.zip'
# Content type 'application/zip' length 8825103 bytes (8.4 MB)
# downloaded 8.4 MB
# trying URL 'https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2020.zip'
# Content type 'application/zip' length 9908290 bytes (9.4 MB)
# downloaded 9.4 MB
# trying URL 'https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2019.zip'
# Content type 'application/zip' length 18681322 bytes (17.8 MB)
# downloaded 17.8 MB
# trying URL 'https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2018.zip'
# Content type 'application/zip' length 20022841 bytes (19.1 MB)
# downloaded 19.1 MB
# trying URL 'https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2017.zip'
# Content type 'application/zip' length 9576818 bytes (9.1 MB)
# downloaded 9.1 MB
And a terse summary to show what they hold:
lapply(list_of_frames, function(z) z[1:2, 1:4])
# [[1]]
# # A tibble: 2 x 4
# ResponseId MainBranch Employment Country
# <int> <chr> <chr> <chr>
# 1 1 I am a developer by profession Independent contractor, freelancer, or self-employed Slovakia
# 2 2 I am a student who is learning to code Student, full-time Netherlands
# [[2]]
# # A tibble: 2 x 4
# Respondent MainBranch Hobbyist Age
# <chr> <chr> <chr> <chr>
# 1 1 I am a developer by profession Yes NA
# 2 2 I am a developer by profession No NA
# [[3]]
# # A tibble: 2 x 4
# Respondent MainBranch Hobbyist OpenSourcer
# <chr> <chr> <chr> <chr>
# 1 1 I am a student who is learning to code Yes Never
# 2 2 I am a student who is learning to code No Less than once per year
# [[4]]
# # A tibble: 2 x 4
# Respondent Hobby OpenSource Country
# <chr> <chr> <chr> <chr>
# 1 1 Yes No Kenya
# 2 3 Yes Yes United Kingdom
# [[5]]
# # A tibble: 2 x 4
# Respondent Professional ProgramHobby Country
# <chr> <chr> <chr> <chr>
# 1 1 Student Yes, both United States
# 2 2 Student Yes, both United Kingdom
If you need to assign names (such as the URL used to derive each dataset), then perhaps this, which adds a $url field to each frame.
list_of_frames <- Map(function(x, u) transform(x, url = u), list_of_frames, lst_url)
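The task also asks for a single data frame covering all surveys. One possible way to get there, sketched on two toy frames rather than the real downloads, is dplyr::bind_rows(), which stacks frames and fills columns missing from a given year with NA. The toy frames and their values are invented for illustration; with the real data you would pass list_of_frames instead.

```r
library(dplyr)

# Toy stand-ins for two survey years with partly different column names,
# mirroring ResponseId (2021) versus Respondent (earlier years).
frame_2021 <- tibble(ResponseId = 1:2, Year = "2021",
                     Country = c("Slovakia", "Netherlands"))
frame_2020 <- tibble(Respondent = c("1", "2"), Year = "2020",
                     Hobbyist = c("Yes", "No"))
toy_frames <- list(frame_2021, frame_2020)

# Stack the frames; columns absent from a frame are filled with NA.
all_surveys <- bind_rows(toy_frames)
all_surveys

# With the real data, something like:
# all_surveys <- bind_rows(list_of_frames)
# saveRDS(all_surveys, "all_surveys.rds")
```

Reading every column as character, as the function above does, helps here because bind_rows() refuses to combine columns of incompatible types.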
Data
library(rvest)
lst_nodes <- read_html("https://insights.stackoverflow.com/survey/") %>%
html_nodes(".js-download-link")
lst_url <- html_attr(lst_nodes [1:5], "href")
lst_url
# [1] "https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2021.zip"
# [2] "https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2020.zip"
# [3] "https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2019.zip"
# [4] "https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2018.zip"
# [5] "https://info.stackoverflowsolutions.com/rs/719-EMH-566/images/stack-overflow-developer-survey-2017.zip"

Apply a function to every case of a tibble

This is my second post here on Stack Overflow.
I have a function called bw_test with several arguments, like this:
bw_test <- function(localip, remoteip, localspeed, remotespeed , duracion =30,direction ="both"){
comando <- str_c("ssh usuario@", localip ," /tool bandwidth-test direction=", direction," remote-tx-speed=",remotespeed,"M local-tx-speed=",localspeed,"M protocol=udp user=usuario password=mipasso duration=",duracion," ",remoteip)
resultado <- system(comando,intern = T,ignore.stderr = T)
# resultado pulls a vector like this from the SSH server:
# head(resultado)
#[1] " status: connecting\r" " tx-current: #0bps\r" " tx-10-second-average: 0bps\r"
#[4] " tx-total-average: 0bps\r" " rx-current: #0bps\r" " rx-10-second-average: 0bps\r"
resultado %<>%
replace("\r","") %>%
tail(17) %>%
trimws("both") %>%
as_tibble %>%
mutate(local=localip, remote=remoteip) %>%
separate(value,sep=":", into=c("parametro","valor")) %>%
head(15)
resultado$valor %<>%
trimws() %>%
str_replace("Mbps","") %>% str_replace("%","") %>% str_replace("s","")
resultado %<>%
spread(parametro,valor)
resultado %<>%
mutate(`tx-percentaje`=as.numeric(resultado$`tx-total-average`)/localspeed) %>%
mutate(`rx-percentaje`=as.numeric(resultado$`rx-total-average`)/remotespeed)
return(resultado)
}
This function returns a tibble like this one:
A tibble: 1 x 19
local remote `connection-cou… direction duration `local-cpu-load` `lost-packets` `random-data` `remote-cpu-loa…
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 192.… 192.1… 1 both 4 13 0 no 12
# … with 10 more variables: `rx-10-second-average` <chr>, `rx-current` <chr>, `rx-size` <chr>,
# `rx-total-average` <chr>, `tx-10-second-average` <chr>, `tx-current` <chr>, `tx-size` <chr>,
# `tx-total-average` <chr>, `tx-percentaje` <dbl>, `rx-percentaje` <dbl>
So, when I call the function inside rbind, I get the result of every run in one tibble:
rbind(bw_test("192.168.105.10" ,"192.168.105.18", 75,125),
bw_test("192.168.133.11","192.168.133.9", 5 ,50),
bw_test("192.168.254.251","192.168.254.250", 25,150))
For this example, my results are:
# A tibble: 3 x 19
local remote `connection-cou… direction duration `local-cpu-load` `lost-packets` `random-data` `remote-cpu-loa…
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 192.… 192.1… 20 both 28 63 232 no 48
2 192.… 192.1… 20 both 29 4 0 no 20
3 192.… 192.1… 20 both 29 15 0 no 22
# … with 10 more variables: `rx-10-second-average` <chr>, `rx-current` <chr>, `rx-size` <chr>,
# `rx-total-average` <chr>, `tx-10-second-average` <chr>, `tx-current` <chr>, `tx-size` <chr>,
# `tx-total-average` <chr>, `tx-percentaje` <dbl>, `rx-percentaje` <dbl>
My problem is applying the function to the rows of a tibble like this one:
aps <- tribble(
~name, ~ip, ~remoteip , ~bw_test, ~localspeed,~remotespeed,
"backbone_border_core","192.168.253.1", "192.168.253.3", 1,200,200,
"backbone_2_site2","192.168.254.251", "192.168.254.250", 1, 25,150
)
I was trying to use map, but I got:
map(c(aps$ip,aps$remoteip,aps$localspeed,aps$remotespeed), bw_test)
argument "remotespeed" is missing, with no default
I believe this is because c(aps$ip,aps$remoteip,aps$localspeed,aps$remotespeed) feeds in all values of aps$ip first, then all of aps$remoteip, and so on.
Am I using the right strategy? Is map a suitable approach?
What am I doing wrong?
How can I apply the function to every row to get the requested tibble?
I would appreciate your help.
Regards.

Try using pmap_df.
output <- purrr::pmap_df(list(aps$ip, aps$remoteip, aps$localspeed,
aps$remotespeed), bw_test)
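pmap walks over several vectors in parallel, so each call receives one value from every column, which is the row-wise behaviour the question asks for. The sketch below shows this with a stand-in function, since bw_test needs SSH access; fake_bw_test() and its ratio column are hypothetical and exist only for illustration. It uses pmap() plus dplyr::bind_rows() rather than pmap_df(), which should behave the same way here.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical stand-in for bw_test(): returns one row and never touches the network.
fake_bw_test <- function(localip, remoteip, localspeed, remotespeed) {
  tibble(local = localip, remote = remoteip, ratio = localspeed / remotespeed)
}

aps <- tribble(
  ~name, ~ip, ~remoteip, ~localspeed, ~remotespeed,
  "backbone_border_core", "192.168.253.1", "192.168.253.3", 200, 200,
  "backbone_2_site2", "192.168.254.251", "192.168.254.250", 25, 150
)

# Naming the list elements after the function's arguments matches them by name,
# which matters because the tibble column is called ip, not localip.
args <- list(
  localip = aps$ip,
  remoteip = aps$remoteip,
  localspeed = aps$localspeed,
  remotespeed = aps$remotespeed
)

# One call per row; the one-row results are stacked into a single tibble.
output <- bind_rows(pmap(args, fake_bw_test))
output
```

By contrast, map() over c(...) flattens everything into one long vector and passes a single value per call, which is why remotespeed was reported as missing.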

reading zip file directly with read_csv from readr producing weird results

I'm trying to read directly from a URL to grab a zip file that contains a pipe delimited text file. If I download the file, then use read_csv to read it from disk, I have no problems. But if I try to use read_csv to read the URL directly I get garbage in my resulting df. I can work around this by coding in a download then read. But it seems like it should work directly. Any clues on what's going on here?
library(readr)
url <- "https://www.rma.usda.gov/data/sob/sccc/sobcov_2018.zip"
df <- read_delim(url, delim='|',
col_names = c('year','stFips','stAbbr','coFips','coName',
'cropCd','cropName','planCd','planAbbr','coverCat',
'deliveryType','covLevel','policyCount','policyPremCount','policyIndemCount',
'unitsReportingPrem', 'indemCount','quantType', 'quantNet', 'companionAcres',
'liab','prem','subsidy','indem', 'lossRatio'))
#> Parsed with column specification:
#> cols(
#> .default = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning in rbind(names(probs), probs_f): number of columns of result is not
#> a multiple of vector length (arg 1)
#> Warning: 7908 parsing failures.
#> row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 1 year "" embedded null 'https://www.rma.usda.gov/data/sob… file 2 1 <NA> 25 columns 1 columns 'https://www.rma.usda.gov/data/sob… row 3 2 <NA> 25 columns 4 columns 'https://www.rma.usda.gov/data/sob… col 4 3 <NA> 25 columns 2 columns 'https://www.rma.usda.gov/data/sob… expected 5 4 year "" embedded null 'https://www.rma.usda.gov/data/sob…
#> ... ................. ... .......................................................................... ........ .......................................................................... ...... .......................................................................... .... .......................................................................... ... .......................................................................... ... .......................................................................... ........ ..........................................................................
#> See problems(...) for more details.
head(df)
#> # A tibble: 6 x 25
#> year stFips stAbbr coFips coName cropCd cropName planCd planAbbr
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 "PK\u00… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 "K\xe6\… "\xf5\x… "\xc5\… "\xfa\… <NA> <NA> <NA> <NA> <NA>
#> 3 "\xb0\x… "\xfd\x… <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 "j`/Q\x… "\x96\x… <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 5 "\xc0\x… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 6 "z\xe4\… "~y\xf5… <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> # ... with 16 more variables: coverCat <chr>, deliveryType <chr>,
#> # covLevel <chr>, policyCount <chr>, policyPremCount <chr>,
#> # policyIndemCount <chr>, unitsReportingPrem <chr>, indemCount <chr>,
#> # quantType <chr>, quantNet <chr>, companionAcres <chr>, liab <chr>,
#> # prem <chr>, subsidy <chr>, indem <chr>, lossRatio <chr>
If I download first, I get the following output:
> url <- './data/sobcov_2018.zip'
> df <- read_delim(url, delim='|',
+ col_names = c('year','stFips','stAbbr','coFips','coName',
+ 'cropCd','cropName','planCd','planAbbr','coverCat',
+ 'deliveryType','covLevel','policyCount','policyPremCount','policyIndemCount',
+ 'unitsReportingPrem', 'indemCount','quantType', 'quantNet', 'companionAcres',
+ 'liab','prem','subsidy','indem', 'lossRatio'))
Parsed with column specification:
cols(
.default = col_integer(),
stFips = col_character(),
stAbbr = col_character(),
coFips = col_character(),
coName = col_character(),
cropCd = col_character(),
cropName = col_character(),
planCd = col_character(),
planAbbr = col_character(),
coverCat = col_character(),
deliveryType = col_character(),
covLevel = col_double(),
quantType = col_character(),
lossRatio = col_double()
)
See spec(...) for full column specifications.
> head(df)
# A tibble: 6 x 25
year stFips stAbbr coFips coName cropCd cropName planCd planAbbr coverCat deliveryType covLevel
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 2018 02 AK 999 "All Other … 9999 "All Other C… 01 "YP … "A " RBUP 0.500
2 2018 02 AK 240 "Southeast … 9999 "All Other C… 90 "APH … "A " RBUP 0.500
3 2018 02 AK 240 "Southeast … 9999 "All Other C… 90 "APH … "A " RBUP 0.750
4 2018 02 AK 240 "Southeast … 9999 "All Other C… 90 "APH … "C " RCAT 0.500
5 2018 02 AK 240 "Southeast … 9999 "All Other C… 02 "RP … "A " RBUP 0.600
6 2018 02 AK 240 "Southeast … 9999 "All Other C… 02 "RP … "A " RBUP 0.750
# ... with 13 more variables: policyCount <int>, policyPremCount <int>, policyIndemCount <int>,
# unitsReportingPrem <int>, indemCount <int>, quantType <chr>, quantNet <int>, companionAcres <int>,
# liab <int>, prem <int>, subsidy <int>, indem <int>, lossRatio <dbl>
>

readr can handle only gz-compressed files as remote sources, since there are no analogues to base::gzcon() for other compression algorithms. See the related GitHub issue for a discussion and the improved documentation (also in ?readr::datasource).
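Until that changes, the download-then-read workaround mentioned in the question can be wrapped in a small function. This is only a sketch: read_remote_zip() is a hypothetical helper name, and it relies on readr recognising the local file as a zip archive from its .zip extension and reading the first file inside it.

```r
library(readr)

# Hypothetical helper: download a remote zip to a temporary file, then let
# readr decompress and parse the local copy. Extra arguments go to read_delim().
read_remote_zip <- function(url, ...) {
  tmp <- tempfile(fileext = ".zip")
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  read_delim(tmp, ...)
}

# Usage with the URL from the question (needs network access):
# df <- read_remote_zip(
#   "https://www.rma.usda.gov/data/sob/sccc/sobcov_2018.zip",
#   delim = "|", col_names = FALSE
# )
```

The mode = "wb" argument matters on Windows, where a text-mode download would corrupt the binary archive.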
