Fastest way to upload data via R to PostgreSQL 12 - r

I am using the following code to connect to a PostgreSQL 12 database:
con <- DBI::dbConnect(odbc::odbc(), driver, server, database, uid, pwd, port)
This connects me to a PostgreSQL 12 database on Google Cloud SQL. The following code is then used to upload data:
DBI::dbCreateTable(con, tablename, df)
DBI::dbAppendTable(con, tablename, df)
where df is a data frame I have created in R. The data frame consists of ~ 550,000 records totaling 713 MB of data.
Uploading by the above method took approximately 9 hours, at a rate of roughly 40 write operations/second. Is there a faster way to upload this data into my PostgreSQL database, preferably through R?

I've always found bulk-copy to be the best option, done external to R. The insert itself can be significantly faster; the trade-off is that (1) you first write the data to a file, in exchange for (2) a much shorter overall run-time.
Setup for this test:
win10 (2004)
docker
postgres:11 container, using port 35432 on localhost, simple authentication
a psql binary in the host OS (where R is running); this should be easy on Linux, and on Windows I grabbed the "zip" (not installer) file from https://www.postgresql.org/download/windows/ and extracted what I needed (a quick check that R can see the binary is shown right after this list)
I'm using data.table::fwrite to save the file because it's fast; even write.table and write.csv would still be much faster than DBI::dbWriteTable here, but with your size of data you will probably prefer something quick
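A quick sanity check from R that the psql binary is actually visible to the R session (assuming it is on the PATH):
Sys.which("psql")
# returns the full path to the binary, or "" if it cannot be found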
DBI::dbCreateTable(con2, "mt", mtcars)
DBI::dbGetQuery(con2, "select count(*) as n from mt")
# n
# 1 0
z1000 <- data.table::rbindlist(replicate(1000, mtcars, simplify=F))
nrow(z1000)
# [1] 32000
system.time({
DBI::dbWriteTable(con2, "mt", z1000, create = FALSE, append = TRUE)
})
# user system elapsed
# 1.56 1.09 30.90
system.time({
data.table::fwrite(z1000, "mt.csv")
URI <- sprintf("postgresql://%s:%s#%s:%s", "postgres", "mysecretpassword", "127.0.0.1", "35432")
system(
sprintf("psql.exe -U postgres -c \"\\copy %s (%s) from %s (FORMAT CSV, HEADER)\" %s",
"mt", paste(colnames(z1000), collapse = ","),
sQuote("mt.csv"), URI)
)
})
# COPY 32000
# user system elapsed
# 0.05 0.00 0.19
DBI::dbGetQuery(con2, "select count(*) as n from mt")
# n
# 1 64000
While this is a lot smaller than your data (32K rows, 11 columns, 1.3MB of data), a speedup from 30 seconds to less than 1 second cannot be ignored.
Side note: there is also a sizable difference between dbAppendTable (slow) and dbWriteTable. Comparing psql and those two functions:
z100 <- data.table::rbindlist(replicate(100, mtcars, simplify=F))
system.time({
data.table::fwrite(z100, "mt.csv")
URI <- sprintf("postgresql://%s:%s#%s:%s", "postgres", "mysecretpassword", "127.0.0.1", "35432")
system(
sprintf("/Users/r2/bin/psql -U postgres -c \"\\copy %s (%s) from %s (FORMAT CSV, HEADER)\" %s",
"mt", paste(colnames(z100), collapse = ","),
sQuote("mt.csv"), URI)
)
})
# COPY 3200
# user system elapsed
# 0.0 0.0 0.1
system.time({
DBI::dbWriteTable(con2, "mt", z100, create = FALSE, append = TRUE)
})
# user system elapsed
# 0.17 0.04 2.95
system.time({
DBI::dbAppendTable(con2, "mt", z100)
})
# user system elapsed
# 0.74 0.33 23.59
(I don't want to time dbAppendTable with z1000 above ...)
(For kicks, I ran it with replicate(10000, ...) and ran the psql and dbWriteTable tests again; they took 2 seconds and 372 seconds, respectively. Your choice :-) ... now I have over 650,000 rows of mtcars ... hrmph ... drop table mt ...)

I suspect that dbAppendTable results in an INSERT statement per row, which can take a long time for high numbers of rows.
However, you can generate a single INSERT statement for the entire data frame using the sqlAppendTable function and run it by using dbSendQuery explicitly:
res <- DBI::dbSendQuery(con, DBI::sqlAppendTable(con, tablename, df, row.names=FALSE))
DBI::dbClearResult(res)
For me, this was much faster: a 30 second ingest reduced to a 0.5 second ingest.
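If a single INSERT for all ~550,000 rows turns out to be too large for the server or for client memory, the same idea can be applied in chunks; a rough sketch (the chunk size of 10,000 is an arbitrary assumption, tune it to your data):
chunk_size <- 10000
idx <- split(seq_len(nrow(df)), ceiling(seq_len(nrow(df)) / chunk_size))
for (i in idx) {
  # build and run one INSERT per chunk instead of one per row
  res <- DBI::dbSendQuery(con, DBI::sqlAppendTable(con, tablename, df[i, , drop = FALSE], row.names = FALSE))
  DBI::dbClearResult(res)
}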

Related

sparklyr connecting to kafka streams/topics

I'm having difficulty connecting to and retrieving data from a kafka instance. Using python's kafka-python module, I can connect (using the same connection parameters), see the topic, and retrieve data, so the network is viable, there is no authentication problem, the topic exists, and data exists in the topic.
On R-4.0.5 using sparklyr-1.7.2, connecting to kafka-2.8
library(sparklyr)
spark_installed_versions()
# spark hadoop dir
# 1 2.4.7 2.7 /home/r2/spark/spark-2.4.7-bin-hadoop2.7
# 2 3.1.1 3.2 /home/r2/spark/spark-3.1.1-bin-hadoop3.2
sc <- spark_connect(master = "local", version = "2.4",
config = list(
sparklyr.shell.packages = "org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0"
))
system.time({
Z <- stream_read_kafka(
sc,
options = list(
kafka.bootstrap.servers="11.22.33.44:5555",
subscribe = "mytopic"))
})
# user system elapsed
# 0.080 0.000 10.349
system.time(collect(Z))
# user system elapsed
# 1.336 0.136 8.537
Z
# # Source: spark<?> [inf x 7]
# # … with 7 variables: key <lgl>, value <lgl>, topic <chr>, partition <int>, offset <dbl>, timestamp <dbl>, timestampType <int>
My first concern is that I'm not seeing data from the topic: I appear to be getting a frame of (meta)data about topics in general, and nothing is found. With this topic, there are 800 strings (json) of modest-to-small size. My second concern is that it takes almost 20 seconds to realize this (though I suspect that's a symptom of the larger connection problem).
For confirmation, this works:
cons = import("kafka")$KafkaConsumer(bootstrap_servers="11.22.33.44:5555", auto_offset_reset="earliest", max_partition_fetch_bytes=10240000L)
cons$subscribe("mytopic")
msg <- cons$poll(timeout_ms=30000L, max_records=99999L)
length(msg)
# [1] 1
length(msg[[1]])
# [1] 801
as.character( msg[[1]][[1]]$value )
# [1] "{\"TrackId\":\"c839dcb5-...\",...}"
(and those commands complete almost instantly, nothing like the 8-10sec lag above).
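One difference I notice: the kafka-python consumer above only sees the 800 messages because it explicitly asks for the earliest offsets. An untested sketch of passing what I believe is the equivalent option (startingOffsets, a standard option of the Spark Kafka source) to the sparklyr reader:
Z <- stream_read_kafka(
  sc,
  options = list(
    kafka.bootstrap.servers = "11.22.33.44:5555",
    subscribe = "mytopic",
    startingOffsets = "earliest"))   # guess: the default "latest" would explain seeing no historical data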
The kafka instance to which I'm connecting is using ksqlDB, though I don't think that's a requirement in order to need to use the "org.apache.spark:spark-sql-kafka-.." java package.
(Ultimately I'll be using stateless/stateful procedures on streaming data, including joins and window ops, so I'd like to not have to re-implement that from scratch on the simple kafka connection.)

DBI::dbWriteTable doesn't append beyond a specific number of rows and throws an exception on nvarchar(max)

I am currently working with the RStudio Professional Drivers.
I have also set all varchar columns to max in the SQL Server table, but it didn't help.
diag_test <- diag[1:1025]
system.time({
DBI::dbWriteTable(con, "urban_prodbi_diag", diag_test, append = TRUE)
})
Error in result_insert_dataframe(rs@ptr, values) :
  nanodbc/nanodbc.cpp:1587: HY000: The incoming tabular data stream (TDS)
  remote procedure call (RPC) protocol stream is incorrect. Parameter 5
  (""): The supplied length is not valid for data type nvarchar(max).
  Check the source data for invalid lengths. An example of an invalid
  length is data of nchar type with an odd length in bytes.
Timing stopped at: 0.049 0.002 0.135
It works with fewer rows though.
A few subset examples:
diag_test <- diagcopy[1:1024]
system.time({
DBI::dbWriteTable(con, "urban_prodbi_diag", diag_test, append = TRUE)
})
user system elapsed
0.042 0.000 0.109
diag_test <- diagcopy[1000:1025]
system.time({
odbc::dbWriteTable(con, "urban_prodbi_diag", diag_test, append = TRUE)
})
user system elapsed
0.014 0.000 0.074
The insert didn't give any error through the RODBC package on the same table schema and data frame.

Slow wordcloud in R

Trying to create a word cloud from a 300MB .csv file with text, but it's taking hours on a decent laptop with 16GB of RAM. Not sure how long this should typically take... but here's my code:
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
dfTemplate <- read.csv("CleanedDescMay.csv", header=TRUE, stringsAsFactors = FALSE)
template <- dfTemplate
template <- Corpus(VectorSource(template))
template <- tm_map(template, removeWords, stopwords("english"))
template <- tm_map(template, stripWhitespace)
template <- tm_map(template, removePunctuation)
dtm <- TermDocumentMatrix(template)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing=TRUE)
d <- data.frame(word = names(v), freq=v)
head(d, 10)
par(bg="grey30")
png(file="WordCloudDesc1.png", width=1000, height=700, bg="grey30")
wordcloud(d$word, d$freq, col=terrain.colors(length(d$word), alpha=0.9), random.order=FALSE, rot.per = 0.3, max.words=500)
title(main = "Top Template Words", font.main=1, col.main="cornsilk3", cex.main=1.5)
dev.off()
Any advice is appreciated!
Step 1: Profile
Have you tried profiling your full workflow yet with a small subset to figure out which steps are taking the most time? If not, that should be your first step (RStudio's profiling documentation is a good place to start).
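A minimal profiling sketch with profvis, reusing the objects from your code on a small subset (the 1,000-row cut-off is arbitrary):
library(profvis)
library(tm)
profvis({
  small <- dfTemplate[1:1000, , drop = FALSE]
  corp  <- Corpus(VectorSource(small))
  corp  <- tm_map(corp, removeWords, stopwords("english"))
  corp  <- tm_map(corp, stripWhitespace)
  corp  <- tm_map(corp, removePunctuation)
  dtm   <- TermDocumentMatrix(corp)
})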
If the tm_map() functions are taking a long time:
If I recall correctly, I found working with stringi to be faster than the dedicated corpus tools.
My workflow wound up looking like the following for the pre-cleaning steps. This could definitely be optimized further -- magrittr pipes %>% do contribute some additional processing time, but I feel that's an acceptable trade-off for the sanity of not having dozens of nested parentheses.
library(data.table)
library(stringi)
library(parallel)
library(magrittr)  # provides the %>% pipe used below
## This function handles the processing pipeline
textCleaner <- function(InputText, StopWords, Words, NewWords){
InputText %>%
stri_enc_toascii(.) %>%
toupper(.) %>%
stri_replace_all_regex(.,"[[:cntrl:]]"," ") %>%
stri_replace_all_regex(.,"[[:punct:]]"," ") %>%
stri_replace_all_regex(.,"[[:space:]]+"," ") %>% ## Replaces multiple spaces with
stri_replace_all_regex(.,"^[[:space:]]+|[[:space:]]+$","") %>% ## Remove leading and trailing spaces
stri_replace_all_regex(.,"\\b"%s+%StopWords%s+%"\\b","",vectorize_all = FALSE) %>% ## Stopwords
stri_replace_all_regex(.,"\\b"%s+%Words%s+%"\\b",NewWords,vectorize_all = FALSE) ## Replacements
}
## Replacement Words, I would normally read in a .CSV file
Replace <- data.table(Old = c("LOREM","IPSUM","DOLOR","SIT"),
New = c("I","DONT","KNOW","LATIN"))
## These need to be defined globally
GlobalStopWords <- c("AT","UT","IN","ET","A")
GlobalOldWords <- Replace[["Old"]]
GlobalNewWords <- Replace[["New"]]
## Generate some sample text
DT <- data.table(Text = stringi::stri_rand_lipsum(500000))
## Running Single Threaded
system.time({
DT[,CleanedText := textCleaner(Text, GlobalStopWords,GlobalOldWords, GlobalNewWords)]
})
# user system elapsed
# 66.969 0.747 67.802
The process of cleaning text is embarrassingly parallel, so in theory you should be able to get some big time savings with multiple cores.
I used to run this pipeline in parallel, but re-running it today, the communication overhead makes it take twice as long with 8 cores as it does single threaded. I'm not sure whether that was also the case for my original use, but I guess this serves as a good example of why reaching for parallelization before optimization can cause more trouble than it's worth.
## This function handles the cluster creation
## and exporting libraries, functions, and objects
parallelCleaner <- function(Text, NCores){
cl <- parallel::makeCluster(NCores)
clusterEvalQ(cl, library(magrittr))
clusterEvalQ(cl, library(stringi))
clusterExport(cl, list("textCleaner",
"GlobalStopWords",
"GlobalOldWords",
"GlobalNewWords"))
Text <- as.character(unlist(parallel::parLapply(cl, Text,
fun = function(x) textCleaner(x,
GlobalStopWords,
GlobalOldWords,
GlobalNewWords))))
parallel::stopCluster(cl)
return(Text)
}
## Run it Parallel
system.time({
DT[,CleanedText := parallelCleaner(Text = Text,
NCores = 8)]
})
# user system elapsed
# 6.700 5.099 131.429
If the TermDocumentMatrix(template) is the chief offender:
Update: Drew Schmidt and Christian Heckendorf also recently submitted an R package named ngram to CRAN that might be worth checking out: ngram Github Repository. It turns out I should have just tried it before explaining the really cumbersome process of building a command line tool from source -- it would have saved me a lot of time had it been around 18 months ago!
It is a good deal more memory intensive and not quite as fast -- my memory usage peaked around 31 GB so that may or may not be a deal-breaker for you. All things considered, this seems like a really good option.
For the 500,000 paragraph case, the ngram package clocks in at around 7 minutes of runtime:
#install.packages("ngram")
library(ngram)
library(data.table)
system.time({
ng1 <- ngram::ngram(DT[["CleanedText"]],n = 1)
ng2 <- ngram::ngram(DT[["CleanedText"]],n = 2)
ng3 <- ngram::ngram(DT[["CleanedText"]],n = 3)
pt1 <- setDT(ngram::get.phrasetable(ng1))
pt1[,Ngrams := 1L]
pt2 <- setDT(ngram::get.phrasetable(ng2))
pt2[,Ngrams := 2L]
pt3 <- setDT(ngram::get.phrasetable(ng3))
pt3[,Ngrams := 3L]
pt <- rbindlist(list(pt1,pt2,pt3))
})
# user system elapsed
# 411.671 12.177 424.616
pt[Ngrams == 2][order(-freq)][1:5]
# ngrams freq prop Ngrams
# 1: SED SED 75096 0.0018013693 2
# 2: AC SED 33390 0.0008009444 2
# 3: SED AC 33134 0.0007948036 2
# 4: SED EU 30379 0.0007287179 2
# 5: EU SED 30149 0.0007232007 2
You can try using a more efficient ngram generator. I use a command line tool called ngrams (available on GitHub) by Zheyuan Yu -- a partial implementation of Dr. Vlado Keselj's Text-Ngrams 1.6 -- that takes pre-processed text files off disk and generates a .csv output with ngram frequencies.
You'll need to build it from source yourself using make and then interface with it using system() calls from R, but I found it to run orders of magnitude faster while using a tiny fraction of the memory. Using it, I was able to generate 5-grams from ~700MB of text input in well under an hour; the resulting CSV with all the output was a 2.9 GB file with 93 million rows.
Continuing the example above, I have a folder, ngrams-master, in my working directory that contains the ngrams executable created with make.
writeLines(DT[["CleanedText"]],con = "ExampleText.txt")
system2(command = "ngrams-master/ngrams",args = "--type=word --n = 3 --in ExampleText.txt", stdout = "ExampleGrams.csv")
# ngrams have been generated, start outputing.
# Subtotal: 165 seconds for generating ngrams.
# Subtotal: 12 seconds for outputing ngrams.
# Total 177 seconds.
Grams <- fread("ExampleGrams.csv")
# Read 5917978 rows and 3 (of 3) columns from 0.160 GB file in 00:00:06
Grams[Ngrams == 3 & Frequency > 10][sample(.N,5)]
# Ngrams Frequency Token
# 1: 3 11 INTERDUM_NEC_RIDICULUS
# 2: 3 18 MAURIS_PORTTITOR_ERAT
# 3: 3 14 SOCIIS_AMET_JUSTO
# 4: 3 23 EGET_TURPIS_FERMENTUM
# 5: 3 14 VENENATIS_LIGULA_NISL
I think I may have made a couple of tweaks to get the output format how I wanted it; if you're interested, I can try to find the changes I made to generate .csv outputs that differ from the default and upload them to Github. (I did that project before I was familiar with the platform, so I don't have a good record of the changes I made -- live and learn.)
Update 2: I created a fork on Github, msummersgill/ngrams, that reflects the slight tweaks I made to output results in a .CSV format. If someone was so inclined, I have a hunch that this could be wrapped up in an Rcpp based package that would be acceptable for CRAN submission -- any takers? I honestly have no clue how Ternary Search Trees work, but they seem to be significantly more memory efficient and faster than any other N-gram implementation currently available in R.
Drew Schmidt and Christian Heckendorf also submitted an R package named ngram to CRAN; I haven't used it much personally, but it might be worth checking out as well: ngram Github Repository.
The Whole Shebang:
Using the same pipeline described above but with a size closer to what you're dealing with (ExampleText.txt comes out to ~274MB):
DT <- data.table(Text = stringi::stri_rand_lipsum(500000))
system.time({
DT[,CleanedText := textCleaner(Text, GlobalStopWords,GlobalOldWords, GlobalNewWords)]
})
# user system elapsed
# 66.969 0.747 67.802
writeLines(DT[["CleanedText"]],con = "ExampleText.txt")
system2(command = "ngrams-master/ngrams",args = "--type=word --n = 3 --in ExampleText.txt", stdout = "ExampleGrams.csv")
# ngrams have been generated, start outputing.
# Subtotal: 165 seconds for generating ngrams.
# Subtotal: 12 seconds for outputing ngrams.
# Total 177 seconds.
Grams <- fread("ExampleGrams.csv")
# Read 5917978 rows and 3 (of 3) columns from 0.160 GB file in 00:00:06
Grams[Ngrams == 3 & Frequency > 10][sample(.N,5)]
# Ngrams Frequency Token
# 1: 3 11 INTERDUM_NEC_RIDICULUS
# 2: 3 18 MAURIS_PORTTITOR_ERAT
# 3: 3 14 SOCIIS_AMET_JUSTO
# 4: 3 23 EGET_TURPIS_FERMENTUM
# 5: 3 14 VENENATIS_LIGULA_NISL
While the example may not be a perfect representation due to the limited vocabulary generated by stringi::stri_rand_lipsum(), the total run time of ~4.2 minutes using less than 8 GB of RAM on 500,000 paragraphs has been fast enough for the corpuses (corpi?) I've had to tackle in the past.
If wordcloud() is the source of the slowdown:
I'm not familiar with this function, but @Gregor's comment on your original post seems like it would take care of this issue.
library(wordcloud)
GramSubset <- Grams[Ngrams == 2][1:500]
par(bg="gray50")
wordcloud(GramSubset[["Token"]],GramSubset[["Frequency"]],color = GramSubset[["Frequency"]],
rot.per = 0.3,font.main=1, col.main="cornsilk3", cex.main=1.5)

base R faster than readr for reading multiple CSV files

There is a lot of documentation on how to read multiple CSVs and bind them into one data frame. I have 5000+ CSV files I need to read in and bind into one data structure.
In particular I've followed the discussion here: Issue in Loading multiple .csv files into single dataframe in R using rbind
The weird thing is that base R is much faster than any other solution I've tried.
Here's what my CSV looks like:
> head(PT)
Line Timestamp Lane.01 Lane.02 Lane.03 Lane.04 Lane.05 Lane.06 Lane.07 Lane.08
1 PL1 05-Jan-16 07:17:36 NA NA NA NA NA NA NA NA
2 PL1 05-Jan-16 07:22:38 NA NA NA NA NA NA NA NA
3 PL1 05-Jan-16 07:27:41 NA NA NA NA NA NA NA NA
4 PL1 05-Jan-16 07:32:43 9.98 10.36 10.41 10.16 10.10 9.97 10.07 9.59
5 PL1 05-Jan-16 07:37:45 9.65 8.87 9.88 9.86 8.85 8.75 9.19 8.51
6 PL1 05-Jan-16 07:42:47 9.14 8.98 9.29 9.04 9.01 9.06 9.12 9.08
I've created three methods for reading in and binding the data. The files are located in a separate directory which I define as:
dataPath <- "data"
PTfiles <- list.files(path=dataPath, full.names = TRUE)
Method 1: Base R
classes <- c("factor", "character", rep("numeric",8))
# build function to load data
load_data <- function(dataPath, classes) {
tables <- lapply(PTfiles, read.csv, colClasses=classes, na.strings=c("NA", ""))
do.call(rbind, tables)
}
#clock
method1 <- system.time(
PT <- load_data(dataPath, classes)
)
Method 2: read_csv
In this case I created a wrapper function for read_csv to use
#create wrapper function for read_csv
read_csv.wrap <- function(x) { read_csv(x, skip = 1, na=c("NA", ""),
col_names = c("tool", "timestamp", paste("lane", 1:8, sep="")),
col_types =
cols(
tool = col_character(),
timestamp = col_character(),
lane1 = col_double(),
lane2 = col_double(),
lane3 = col_double(),
lane4 = col_double(),
lane5 = col_double(),
lane6 = col_double(),
lane7 = col_double(),
lane8 = col_double()
)
)
}
##
# Same as method 1, just uses read_csv instead of read.csv
load_data2 <- function(dataPath) {
tables <- lapply(PTfiles, read_csv.wrap)
do.call(rbind, tables)
}
#clock
method2 <- system.time(
PT2 <- load_data2(dataPath)
)
Method 3: read_csv + dplyr::bind_rows
load_data3 <- function(dataPath) {
tables <- lapply(PTfiles, read_csv.wrap)
dplyr::bind_rows(tables)
}
#clock
method3 <- system.time(
PT3 <- load_data3(dataPath)
)
What I can't figure out is why the read_csv and dplyr methods are slower in elapsed time when they should be faster. The CPU time decreases, but why would the elapsed (wall-clock) time increase? What's going on here?
Edit - I added the data.table method as suggested in the comments
Method 4 data.table
library(data.table)
load_data4 <- function(dataPath){
tables <- lapply(PTfiles, fread)
rbindlist(tables)
}
method4 <- system.time(
PT4 <- load_data4(dataPath)
)
The data.table method is the fastest from a CPU standpoint, but the question still stands: what is going on with the read_csv methods that makes them so slow?
> rbind(method1, method2, method3, method4)
user.self sys.self elapsed
method1 0.56 0.39 1.35
method2 0.42 1.98 13.96
method3 0.36 2.25 14.69
method4 0.34 0.67 1.74
I would do that in the terminal (Unix). I would put all the files in the same folder, navigate to that folder in the terminal, then use the following command to create a single CSV file:
cat *.csv > merged_csv_file.csv
One observation regarding this method is that the header of each file will show up in the middle of the observations. To solve this:
Get just the header from the first file (here the header is a single line):
head -1 file1.csv > merged_csv_file.csv
Then append all the files, skipping the header line of each (tail -n +K starts printing at line K, so use K = number of header lines + 1):
tail -n +2 -q file*.csv >> merged_csv_file.csv
-n +2 makes tail print lines from the 2nd to the end, -q tells it not to print the file name as a header (see man tail), and >> appends to the file instead of overwriting it as > would.
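Back in R, the merged file can then be read with a single call, for example (NA strings as in the question):
PT <- data.table::fread("merged_csv_file.csv", na.strings = c("NA", ""))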
I might have found a related issue. I am reading in nested CSV data from some simulation output, where multiple columns have CSV formatted data as elements, which I need to unnest and reshape for analysis.
With simulations where I have many runs, this resulted in thousands of elements that needed to be parsed. Using map(., read_csv), this would take hours to transform. When I rewrote my script to apply read.csv in a lambda function, the operation completed in seconds.
I'm curious if there is some intermediate system I/O operation or error handling that creates a bottleneck you wouldn't run into with a single input file.
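For what it's worth, a rough sketch of the two approaches being compared, using a hypothetical nested data frame whose csv_column elements are CSV-formatted strings (the names are made up for illustration):
library(purrr)
library(readr)
# readr parser invoked once per element -- this was the slow path for me
parsed_readr <- map(nested$csv_column, ~ read_csv(I(.x), show_col_types = FALSE))
# base read.csv in an anonymous function -- completed in seconds on the same data
parsed_base <- lapply(nested$csv_column, function(x) read.csv(text = x, stringsAsFactors = FALSE))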

fast url query with R

I have to query a website 10,000 times and I am looking for a really fast way to do it with R.
As a template URL:
url <- "http://mutationassessor.org/?cm=var&var=7,55178574,G,A"
My code is:
library(XML)     # readHTMLTable()
library(gtools)  # smartbind()
url <- mydata$mutationassessorurl[1]
rawurl <- readHTMLTable(url)
Mutator <- data.frame(rawurl[[10]])
for(i in 2:27566) {
url <- mydata$mutationassessorurl[i]
rawurl <- readHTMLTable(url)
Mutator <- smartbind(Mutator, data.frame(rawurl[[10]]))
print(i)
}
Using microbenchmark I measured about 680 milliseconds per query. I was wondering if there is a faster way to do it!
Thanks
One way to speed up http connections is to leave the connection open between requests. The following example shows the difference it makes for httr. The first option is most similar to the default behaviour in RCurl.
library(httr)
test_server <- "http://had.co.nz"
# Return times in ms for easier comparison
timed_GET <- function(...) {
req <- GET(...)
round(req$times * 1000)
}
# Create a new handle for every request - no connection sharing
rowMeans(replicate(20,
timed_GET(handle = handle(test_server), path = "index.html")
))
## redirect namelookup connect pretransfer starttransfer
## 0.00 20.65 75.30 75.40 133.20
## total
## 135.05
test_handle <- handle(test_server)
# Re use the same handle for multiple requests
rowMeans(replicate(20,
timed_GET(handle = test_handle, path = "index.html")
))
## redirect namelookup connect pretransfer starttransfer
## 0.00 0.00 2.55 2.55 59.35
## total
## 60.80
# With httr, handles are automatically pooled
rowMeans(replicate(20,
timed_GET(test_server, path = "index.html")
))
## redirect namelookup connect pretransfer starttransfer
## 0.00 0.00 2.55 2.55 57.75
## total
## 59.40
Note the difference in the namelookup and connect times -- if you're sharing a handle you need to do each of these operations only once, which saves quite a bit of time.
There's quite a lot of intra-request variation -- on average the last two methods should be very similar.
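Applying the shared-handle idea to the loop in the question might look roughly like this (the [[10]] table index and the URL column come from the question as-is; the assumption that every page yields tables with identical columns is untested):
library(httr)
library(XML)
h <- handle("http://mutationassessor.org")
tables <- lapply(mydata$mutationassessorurl, function(u) {
  resp <- GET(u, handle = h)                                 # reuse one open connection
  as.data.frame(readHTMLTable(content(resp, as = "text"))[[10]])
})
Mutator <- do.call(rbind, tables)                            # bind once at the end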
