TIdHTTP slow downloads - http

I use TIdHTTP to download updates of my application. The install file is about 80 MB.
It works, but I noticed that the download speed is much slower than when the same link is downloaded directly in Google Chrome.
Why does this happen? Is there any setup I should do on TIdHTTP to speed up the download?
Nothing fancy in my code, I just use the Get() method like this:
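idh := TIdHTTP.Create(nil);
ssl := TIdSSLIOHandlerSocketOpenSSL.Create(idh); // owned (and freed) by idh
ssl.SSLOptions.Method := sslvSSLv23;
ssl.SSLOptions.SSLVersions := [sslvTLSv1, sslvTLSv1_1, sslvTLSv1_2];
idh.IOHandler := ssl; // attach the TLS handler so HTTPS URLs work
f := TFileStream.Create(localFileName, fmCreate);
try
  idh.Get(remoteFile, f); // stream the response body straight into the file
finally
  f.Free;
  idh.Free;
end;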

With TIdHTTP you can implement parallel downloading by launching two or more HTTP GET requests in different threads, each of which downloads a specific part (byte range) of the resource. However, this will only increase download speed if the system has enough CPU resources to execute the threads on different "cores".
See https://stackoverflow.com/a/9678441/80901 for some related information
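The split-into-parts technique can be sketched as follows. It is written in Python rather than Delphi, so it only illustrates the idea (one GET per byte range, sent with the HTTP Range header, each on its own thread), not TIdHTTP's API. It assumes the server honours range requests (answers 206 Partial Content) and that the total size is already known, for example from the Content-Length of a HEAD request; the function names are hypothetical.

```python
import concurrent.futures
import urllib.request


def split_ranges(total_size, parts):
    """Split bytes [0, total_size) into at most `parts` inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        return []
    chunk = -(-total_size // parts)  # ceiling division
    return [(start, min(start + chunk, total_size) - 1)
            for start in range(0, total_size, chunk)]


def fetch_range(url, start, end):
    """GET one byte range; fails if the server ignores the Range header."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:  # 206 Partial Content
            raise RuntimeError("server ignored the Range header")
        return resp.read()


def download_parallel(url, total_size, parts=4, fetch=fetch_range):
    """Fetch all ranges concurrently and return the reassembled body, in order."""
    ranges = split_ranges(total_size, parts)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, parts)) as pool:
        # pool.map yields results in the same order as `ranges`
        chunks = pool.map(lambda r: fetch(url, r[0], r[1]), ranges)
        return b"".join(chunks)
```

The `fetch` parameter exists so the range logic can be exercised without a network; in real use the default `fetch_range` is used and the result is written to the target file.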

Related

Uploading larger files with User-Agent python-requests/2.2.1 results in RemoteDisconnected

When I use the Python library requests to upload larger files, I get the error RemoteDisconnected('Remote end closed connection without response').
However, it works if I change the library's default User-Agent to something like "Mozilla/5.0".
Does anybody know the reason for this behaviour?
Edit: This only happens with the property X-Explode-Archive: true.
Is there any specific timeout pattern you could highlight in this case?
For example: does it time out after 60 seconds every time (or something of that sort)?
I would suggest checking the logs of every component configured in front of the Artifactory instance, such as the reverse proxy and the embedded Tomcat. As the issue is specific to large files, correlate the timeout pattern with the timeouts configured on each of these components; that should give a hint towards the cause.
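To answer the "does it always fail after N seconds?" question with numbers, a small timing helper can be wrapped around the failing upload. This is a hypothetical helper in plain Python, not part of requests or Artifactory; it simply records how long each attempt ran and which exception ended it.

```python
import time


def time_attempts(action, attempts=3):
    """Run `action` several times; record (elapsed seconds, outcome) for each attempt."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            action()
            outcome = "ok"
        except Exception as exc:  # e.g. requests.exceptions.ConnectionError
            outcome = type(exc).__name__
        results.append((round(time.monotonic() - start, 1), outcome))
    return results
```

In use, `action` would be a zero-argument function performing the upload (for instance a `requests.put` carrying the X-Explode-Archive header). If every failure lands near the same elapsed time, compare that number with the timeouts configured on the reverse proxy and Tomcat.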

Reading Incoming HTTP Response Bodies to CLI

There's a simple game that my friends and I play both in person and online. I developed a CLI that records our in-person games (I just type in each move), but I now want to use it to record our online games. All I need to do is pipe the HTTP response bodies being sent to my browser (Firefox) to my CLI. Unfortunately, I can't figure out how to do this.
Ideally, I'm looking for an Ubuntu package that I can run from the command line that will capture and return all HTTP response bodies from a specific endpoint. I've looked into tcpdump and some simple proxy servers, but I'm not sure they do what I want them to do.
Thanks for your help! Let me know if I need to provide any further information!
I used MITMProxy as ZachChilders recommended in the comments. I found it somewhat difficult to set up, so I'll include the steps I followed to get it up and running:
1) Install MITMProxy.
2) Configure Firefox.
3) Create Add On to parse body.
4) Stream data via Python to CLI (TODO).
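Step 3 could look roughly like the following mitmproxy addon script. It is a sketch: the endpoint prefix and the `my-game-cli` command are placeholders, and it relies on mitmproxy's addon interface (a `response(flow)` hook, `flow.request.pretty_url`, `flow.response.get_text()`). Writing each body to stdout and piping mitmdump into the CLI is one possible way to cover the TODO in step 4.

```python
"""mitmproxy addon sketch: print response bodies from one endpoint to stdout.

Assumed usage:  mitmdump -q -s capture_bodies.py | my-game-cli
"""
import sys

# Hypothetical endpoint prefix; replace it with the game's real API URL.
TARGET_PREFIX = "https://game.example.com/api/moves"


class CaptureBodies:
    def __init__(self, prefix=TARGET_PREFIX, out=None):
        self.prefix = prefix
        self.out = out if out is not None else sys.stdout

    def wants(self, url):
        """True if a request URL belongs to the endpoint we care about."""
        return url.startswith(self.prefix)

    def response(self, flow):
        # mitmproxy calls this hook once for every completed HTTP response.
        if not self.wants(flow.request.pretty_url):
            return
        body = flow.response.get_text(strict=False)  # decoded body, or None
        if body:
            self.out.write(body + "\n")
            self.out.flush()  # hand each body to the pipe immediately


addons = [CaptureBodies()]
```

Flushing after every body matters when piping: otherwise the CLI may only see moves once stdout's buffer fills.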

Non-blocking download using curl in R

I am writing some code where I download many pages from a web API, do some processing, and combine them into a data frame. The API takes ~30 seconds to respond to each request, so it would be convenient to send the request for the next page while doing the processing for the current page. I can do this using, e.g., mcparallel, but that seems like overkill. The curl package claims that it can make non-blocking connections, but this does not seem to work for me.
From vignette("intro", "curl"):
As of version 2.3 it is also possible to open connections in non-blocking mode. In this case readBin and readLines will return immediately with data that is available without waiting. For non-blocking connections we use isIncomplete to check if the download has completed yet.
con <- curl("https://httpbin.org/drip?duration=1&numbytes=50")
open(con, "rb", blocking = FALSE)
while (isIncomplete(con)) {
  buf <- readBin(con, raw(), 1024)
  if (length(buf))
    cat("received: ", rawToChar(buf), "\n")
}
close(con)
The expected result is that the open should return immediately, and then 50 asterisks should be progressively printed over 1 second as the results come in. For me, the open blocks for about a second, and then the asterisks are printed all at once.
Is there something else I need to do? Does this work for anyone else?
I am using R version 3.3.2, curl package version 3.1, and libcurl3 version 7.47.0 on Ubuntu 16.04 LTS. I have tried in RStudio and the command line R console, with the same results.
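The overlap the question is after (request page n+1 while page n is being processed) does not strictly require a non-blocking connection; a single background worker is enough. As a language-swapped illustration only, here is that pattern in Python; `fetch_page` and `process` are hypothetical stand-ins for the slow API call and the per-page processing.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(pages, fetch_page, process):
    """Fetch page i+1 in a background thread while page i is being processed."""
    results = []
    if not pages:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_page, pages[0])
        for i in range(len(pages)):
            body = pending.result()  # wait only for the page needed right now
            if i + 1 < len(pages):
                # start the next download before processing the current page
                pending = pool.submit(fetch_page, pages[i + 1])
            results.append(process(body))
    return results
```

With a ~30 second response time, total runtime then approaches the sum of the download times alone, as long as processing a page is faster than fetching one.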

Why Symfony3 so slow?

I installed the Symfony3 framework-standard-edition. When I open the home page (app.php, prod environment), it takes 300-400 ms to load.
This is my profiler information:
I also use PHP 7.
Why does it take so long?
You can try tuning Zend OPcache.
Here are some recommended settings:
opcache.revalidate_freq
Put simply, this is how often (in seconds) the code cache should expire and check whether your code has changed. 0 means PHP checks your code on every single request (which adds lots of stat syscalls). Set it to 0 in your development environment. In production it doesn't matter because of the next setting.
opcache.validate_timestamps
When this is enabled, PHP checks the file timestamps according to your opcache.revalidate_freq value.
When it's disabled, opcache.revalidate_freq is ignored and PHP files are NEVER checked for updated code. So, if you modify your code, the changes won't actually run until you restart or reload PHP (with PHP-FPM you can force a reload with kill -SIGUSR2).
Yes, this is a pain in the ass, but you should use it. Why? While you're updating or deploying code, new code files can get mixed with old ones, and the results are unpredictable. It's unsafe as hell.
opcache.max_accelerated_files
Controls how many PHP files, at most, can be held in memory at once. It's important that your project has FEWER FILES than whatever you set this to. For a codebase of ~6000 files, I use 8000 for opcache.max_accelerated_files (8000 is not itself prime; PHP rounds the configured value up to the next prime in an internal table).
You can run find . -type f -name "*.php" | wc -l to quickly count the PHP files in your codebase.
opcache.memory_consumption
The default is 64MB. You can use the function opcache_get_status() to see how much memory OPcache is consuming and whether you need to increase the amount.
opcache.interned_strings_buffer
A pretty neat setting with almost no documentation. PHP uses a technique called string interning to improve performance. For example, if you have the string "foobar" 1000 times in your code, internally PHP will store 1 immutable variable for this string and just use a pointer to it for the other 999 times you use it. Cool.
This setting takes it to the next level: instead of having a pool of these immutable strings for each SINGLE php-fpm process, it shares the pool across ALL of your php-fpm processes. It saves memory and improves performance, especially in big applications.
The value is set in megabytes, so set it to "16" for 16MB. The default is low, 4MB.
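Interning itself is not PHP-specific, and Python exposes it directly through `sys.intern`, which makes a compact illustration of the "one immutable copy, many pointers" idea. This is Python used purely for illustration; PHP does the equivalent internally, and the OPcache setting only controls the size of the shared buffer.

```python
import sys

# Two equal strings built at runtime are usually two separate objects...
a = "".join(["foo", "bar"])
b = "".join(["foo", "bar"])

# ...but interning maps both to one shared, immutable copy.
ia = sys.intern(a)
ib = sys.intern(b)

print(a == b, ia is ib)
```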
opcache.fast_shutdown
Another interesting setting with no useful documentation: "Allows for faster shutdown".
Oh okay. Like that helps me. What this actually does is provide a faster mechanism for calling the destructors in your code at the end of a single request, to speed up the response and recycle PHP workers so they're ready for the next incoming request sooner.
Set it to 1 to turn it on. (The directive was removed in PHP 7.2, where the fast shutdown sequence is always used.)
opcache.enable=1
opcache.memory_consumption=256
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=8000
opcache.validate_timestamps=0
opcache.revalidate_freq=0
opcache.fast_shutdown=1
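The config sets opcache.max_accelerated_files=8000, but the PHP manual documents that OPcache rounds this value up to the first prime in a fixed table (minimum 200). A small Python sketch of that arithmetic, plus a file count equivalent to the find command, helps pick a value; the helper names are hypothetical.

```python
import bisect
import os

# Prime table listed in the PHP manual for opcache.max_accelerated_files.
OPCACHE_PRIMES = [223, 463, 983, 1979, 3907, 7963, 16229, 32531,
                  65407, 130987, 262237, 524521, 1048793]


def effective_max_files(configured):
    """Slot count OPcache actually uses: the first table prime >= the configured value."""
    i = bisect.bisect_left(OPCACHE_PRIMES, max(configured, 200))  # 200 is the minimum
    return OPCACHE_PRIMES[min(i, len(OPCACHE_PRIMES) - 1)]


def count_php_files(root="."):
    """Count *.php files under root, like: find . -type f -name '*.php' | wc -l"""
    return sum(name.endswith(".php")
               for _, _, files in os.walk(root) for name in files)
```

So 8000 effectively becomes 16229 slots, while anything from 3908 to 7963 yields 7963; for ~6000 files either choice leaves headroom.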
I hope this helps improve your performance.
[EDIT]
You might also want to look at this answer:
Are Doctrine relations affecting application performance?
TheMrbikus, try some optimization with the following elements:
Use APC
Use Bootstrap files
Reference: http://symfony.com/doc/current/performance.html
Use the OPCache PHP7
Use Apache with PHP-FPM.
Sending e-mail and rendering forms can also slow a page down. To get a baseline, create a blank test controller and compare its response time.

how to use the example of scrapy-redis

I have read the example of scrapy-redis but still don't quite understand how to use it.
I have run the spider named dmoz and it works well. But when I start another spider named mycrawler_redis, it gets nothing.
Besides, I'm quite confused about how the request queue is set. I didn't find any code in the example project that illustrates the request queue setting.
And if spiders on different machines should share the same request queue, how can I do that? It seems that I should first make the slave machine connect to the master machine's Redis, but I'm not sure where to put the relevant code: in spider.py, or do I just type it on the command line?
I'm quite new to scrapy-redis and any help would be appreciated!
If the example spider is working and your custom one isn't, there must be something that you have done wrong. Update your question with the code, including all relevant parts, so we can see what went wrong.
Besides I'm quite confused about how the request queue is set. I didn't find any piece of code in the example-project which illustrate the request queue setting.
As far as your spider is concerned, this is done through the project settings. For example, if you want FIFO:
# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Don't cleanup redis queues, allows to pause/resume crawls.
SCHEDULER_PERSIST = True
# Schedule requests using a queue (FIFO).
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderQueue'
As far as the implementation goes, queuing is done via RedisSpider, which your spider must inherit from. You can find the code for enqueuing requests here: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/scheduler.py#L73
As for the connection, you don't need to manually connect to the redis machine, you just specify the host and port information in the settings:
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
And the connection is configured in connection.py: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/connection.py
Examples of usage can be found in several places, for instance: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/pipelines.py#L17
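For the multi-machine case, the settings from this answer can be combined into one settings.py that every machine uses; pointing all of them at the same Redis server is what makes them share a single queue. This is a sketch: the address 192.0.2.10 is a placeholder for the master's Redis, and the DUPEFILTER_CLASS line is an addition not in the answer (the class exists in scrapy-redis, but check it against the version you run).

```python
# settings.py shared by the master and all worker machines (sketch).
# Assumes the master's Redis listens on 192.0.2.10:6379 (hypothetical
# address) and is reachable from every worker.

# Store the request queue in Redis instead of in each process's memory.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests through Redis too, so all machines share one seen-set.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queues in Redis between runs, allowing pause/resume.
SCHEDULER_PERSIST = True

# FIFO ordering, as in the settings shown in the answer.
SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.SpiderQueue"

# Every machine points at the same Redis server, hence the same queue.
REDIS_HOST = "192.0.2.10"
REDIS_PORT = 6379
```

No connection code goes into spider.py; each machine just runs the same spider with these settings.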
