Symfony2 - Check server-to-server requests

I need to know when a request comes from a browser and when it comes from a server.
I have created an API and a listener for the onKernelRequest event, and I need to know what kind of request I received so I can execute one function or another.
How can I do this on Symfony 2.7?

A "server" is an HTTP client just as a browser is. They only handle your website's response differently. So there's no way to be sure who you are talking to. You can only check for a number of indicators.
You can examine the HTTP headers in the Request object. Your best bet would probably be the User-Agent header. But a non-browser could just as well fake the user agent header of an actual browser, so you'd only detect them if they want you to. And you'd have to prepare a list of user agents that you'd consider "servers".
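For illustration, here is a minimal sketch of that indicator check in Python (the thread is about Symfony/PHP, so this is only the idea; the agent list is a hypothetical assumption, not an authoritative one):
KNOWN_SERVER_AGENTS = ("curl", "python-requests", "guzzle")  # hypothetical list

def looks_like_server(user_agent):
    # A missing or matching User-Agent is only an indicator, never proof.
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in KNOWN_SERVER_AGENTS)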

Related

Track a client through HTTP requests

In the case of HTTP requests like HEAD / GET / POST etc., which information about the client is received by the server?
I know some of the info includes the client IP, which can be used to block a user in case of, let's say, too many requests.
Another useful piece of information would be the user agent, which is different for browsers, scripts, curl, Postman etc. (Of course the client can change the default by setting request headers, but that's alright.)
I want to know which other parameters can be used to identify a client (or define some of its properties). Does the server somehow get the MAC address?
So, is there a possibility that, just from the request, it is identifiable that this request is being made by a "bot" (Python or Java code, e.g.) vs a genuine user?
Assume there is no token or any such secret shared between client and server, so there is no session...each subsequent request is independent.
The technique you are describing is generally called fingerprinting - the article covers properties and techniques. Depending on the use there are many criticisms of it, as it bypasses a user's intention of being anonymous. In all cases it is a statistical technique - like most analytics.
Putting your domain behind a service like cloudflare might help prevent some of those bots from hitting your server. Other than a service like that, setting up a reCAPTCHA would block bots from accessing any pages behind it.
It would be hard to detect bots using solely HTTP because they can send you whatever headers they want. These services use other techniques to try and detect and filter out the bots, while allowing real users to access the site.
I don't think you can rely on any HTTP request header, because a client might not send it to the server, and/or there might be proxies between the client and the server that strip or alter the request headers.
If you just want to associate a unique ID to an HTTP request, you could generate an ID on your backend. For example, the JavaScript framework Hapi.js computes a request ID using this code:
new Date() + '-' + process.pid + '-' + Math.floor(Math.random() * 0x10000)
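A comparable sketch in Python, if you want to generate such an ID yourself (uuid4 is just one choice of unique token):
import uuid

request_id = uuid.uuid4().hex  # attach this to logs and responses for tracing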
You might not even need to generate an ID manually. For example, if your app is on AWS and there is an Application Load Balancer in front of your backend, the incoming request will have the custom header X-Amzn-Trace-Id.
As for distinguishing between requests made by human clients and bots, I think you could adopt a "time trap" approach like the one described in this answer about honeypots for spambots.
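A minimal sketch of such a time trap, assuming you can embed a render timestamp in the form (the threshold is an illustrative guess, not a recommendation):
import time

MIN_SECONDS = 3  # hypothetical: humans rarely fill out a form this fast

def render_form():
    # Embed this timestamp in a hidden field or server-side session.
    return {"form_rendered_at": time.time()}

def is_probably_bot(form_rendered_at):
    return time.time() - float(form_rendered_at) < MIN_SECONDS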
HTTP request headers are not a good way to track users of your site. This is because users can edit these headers and the server has no way to verify their authenticity. Also, in the case of the IP address, it can change during a session if, for example, a user is on a mobile network.
My suggestion is using a cookie with a unique, random id, given to the user the first time they land on a page of your site. Keep in mind that the user can still edit/remove this cookie, so it isn't a perfect method. If you can force the user to login, then you could track the user with their session token.
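A rough sketch of that cookie approach (framework-neutral Python; the cookie name and the plumbing are illustrative):
import uuid

def ensure_visitor_cookie(request_cookies, response_headers):
    # First-time visitors get a random id; returning visitors keep theirs.
    visitor_id = request_cookies.get("visitor_id")
    if visitor_id is None:
        visitor_id = uuid.uuid4().hex
        response_headers["Set-Cookie"] = (
            "visitor_id=" + visitor_id + "; HttpOnly; Secure")
    return visitor_id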

JMeter: The purpose of User-defined Cookies for requests

I'm just getting started with JMeter, and I'm trying to understand more about user-defined cookies.
What purpose do they fulfill when you add them with hard-coded values, for instance if you define a cookie called A and give it a value B for a certain domain in your HTTP sampler?
I'd be grateful for any information!
There could be several possible reasons:
Replay a user session without having to re-login (i.e. session hijacking), so you would be able to debug your test by running a single request rather than the whole sequence (open login page, log in, navigate somewhere, do something, etc.)
The value doesn't have to be hard-coded, it may come from correlation or calculation by the JSR223 Pre-Processor
Negative test scenarios (providing invalid cookie value to check for anticipated error)
You name it
Some web sites execute JavaScript on page load which adds/removes/updates HTTP cookies. For example, the set() method of the browser cookies API sets a cookie containing the specified cookie data; this is equivalent to the server issuing an HTTP Set-Cookie header in its response for a given URL.
JMeter doesn't execute JavaScript:
JMeter is not a browser, it works at protocol level. As far as web-services and remote services are concerned, JMeter looks like a browser (or rather, multiple browsers); however JMeter does not perform all the actions supported by browsers. In particular, JMeter does not execute the Javascript found in HTML pages.
So in JMeter you can manually manipulate HTTP cookies to match what your site's JavaScript would have done.

HEAD request returns different content-type [duplicate]

I would like to send requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different approaches, but still failed.
All other websites are fine.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to an http://httpbin.org endpoint, have it record the request, and then experiment.
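For example, httpbin echoes back exactly what it received, which makes the comparison easy (/anything is part of httpbin's documented API):
import requests

resp = requests.get("https://httpbin.org/anything", params={"q": "test"})
print(resp.json()["headers"])  # the headers httpbin actually saw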
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host: this must be set to the hostname you are contacting, so that the server can properly serve multiple sites from one address. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supplying credentials the same way the browser did).
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent. It looks like they are blacklisting Python; setting the User-Agent header to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
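A minimal sketch of the requests-html route (assuming the package is installed; render() downloads a Chromium build on first use):
from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://rent.591.com.tw", headers={"User-Agent": "Custom"})
r.html.render()  # executes the page's JavaScript in headless Chromium
print(r.html.find("title", first=True).text)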
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1, take that into account if you are trying to scrape data from this site.
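A sketch of requesting that endpoint directly (the response format is an assumption; inspect the real call in your browser's network tab first):
import requests

url = "https://rent.591.com.tw/home/search/rsList"
params = {"is_new_list": 1, "type": 1, "kind": 0, "searchtype": 1, "region": 1}
resp = requests.get(url, params=params, headers={"User-Agent": "Custom"})
print(resp.status_code, resp.headers.get("Content-Type"))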
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
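The usual shape of that GET-then-POST flow with requests (the URLs and field names here are hypothetical; bs4 is an extra dependency):
import requests
from bs4 import BeautifulSoup

session = requests.Session()  # carries cookies across the two requests
form_page = session.get("https://example.com/login")
token = BeautifulSoup(form_page.text, "html.parser").find(
    "input", {"name": "csrf_token"})["value"]
session.post("https://example.com/login",
             data={"user": "me", "password": "secret", "csrf_token": token})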
Last but not least, if a site is blocking scripts from making requests, they are probably either trying to enforce terms of service that prohibit scraping, or they have an API they'd rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some web scraping of links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're reading links from a file rather than using a hard-coded string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used
with open("filepath", 'w') as file:
links = file.read().splitlines()
for link in links:
response = requests.get(link)
In my case this was due to the fact that the website address had recently changed, and I had been given the old address. At least this changed the status code from 404 to 500, which, I think, is progress :)

Sending passwords through GET request

I have an app that requires users to enter database passwords. These passwords will not be saved on the server, and the server does not need to remember anything about the database after the request. It's my understanding that most web servers will log GET requests, and the browser can as well (does it do this even for fetch() requests?). I do not want to put databases at risk, but I also understand that you should not use a body in GET requests.
I am also not creating resources, so from what I understand, I should not be using a POST request. Is there a safe way to send a GET request with the password (over HTTPS) that makes sure it is not logged on the server? This would be an app that anyone could run, so I have no idea what their server configuration might be, and I couldn't just disable logging on one particular server.
so from what I also understand, I should not be using a POST request.
Your understanding is incorrect. POST is the appropriate verb when none of the other verbs make sense. From the spec:
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.
Put simply, that means POST does whatever the service says it does. It isn't safe, idempotent, or cacheable, so there are disadvantages to just using it for everything, but the intent is for it to be the catch-all verb.
You should not use GET because, as you mentioned, you should not include a body and URLs often get logged, which would expose your credentials.
If the client for your app is going to be a browser, you can just use Secure (HTTPS-only) cookies to handle the authentication flow. If you want to extend it or use it from any other type of client, you can use the Authorization HTTP header.
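A minimal sketch of that POST approach with Python's requests (the URL, field name, and token are illustrative); the password travels in the request body over TLS, which default access logs do not record:
import requests

resp = requests.post(
    "https://example.com/api/connect",            # hypothetical endpoint
    json={"db_password": "s3cret"},               # in the body, not the URL
    headers={"Authorization": "Bearer <token>"},  # optional, per the answer above
)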

HTTP Response before Request

My question might sound stupid, but I just wanted to be sure:
Is it possible to send an HTTP response before having the request for that resource?
Say for example you have an HTML page index.html that only shows a picture called img.jpg.
Now, if your server knows that a visitor will request the HTML file and then the jpg image every time:
Would it be possible for the server to send the image just after the HTML file to save time?
I know that HTTP is a synchronous protocol, so in theory it should not work, but I just wanted someone to confirm it (or not).
A recent post by Jacques Mattheij, referencing your very question, claims that although HTTP was designed as a synchronous protocol, the implementations were not. In practice, the browser (he doesn't specify which exactly) accepts answers to requests that have not been sent yet.
On the other hand, if you are looking for something less hacky, you could have a look at:
push techniques that allow the server to send content to the browser. The modern implementation that replaces long-polling/Comet "hacks" is WebSockets. You may want to have a look at socket.io as well.
Alternatively you may want to have a look at client-side routing. Some implementations combine this with caching techniques (as in derby.js, I believe).
If someone requests /index.html and you send two responses (one for /index.html and the other for /img.jpg), how do you know the recipient will get the two responses and know what to do with them before the second request goes in?
The problem is not really with the sending. The problem is with the receiver possibly getting unexpected data.
One other issue is that you're denying the client the ability to use HTTP caching tools like If-Modified-Since and If-None-Match (i.e. the client might not want /img.jpg to be sent because it already has a cached copy).
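For illustration, this is the revalidation the client would lose, sketched with Python's requests (the URL is illustrative):
import requests

url = "https://example.com/img.jpg"
first = requests.get(url)
etag = first.headers.get("ETag")  # assuming the server sends one

if etag:
    second = requests.get(url, headers={"If-None-Match": etag})
    print(second.status_code)  # 304 means "use your cached copy"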
That said, you can approximate the server-push benefits by using Comet techniques. But that is much more involved than simply anticipating incoming HTTP requests.
You'll get a better result by caching resources effectively, i.e. setting proper cache headers and configuring your web server for caching. You can also inline images using base64 encoding, if that's a specific concern.
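Inlining an image as a data URI looks like this, for example (a small Python sketch; the filename is illustrative):
import base64

with open("img.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")
data_uri = "data:image/jpeg;base64," + encoded  # goes into <img src="...">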
You can also look at long-polling JavaScript solutions.
You're looking for server push: it isn't available in HTTP. Protocols like SPDY have it, but you're out of luck if you're restricted to HTTP.
I don't think it is possible to mix HTML and image data in the same HTTP response. As for sending image data 'immediately', right after the first request, there is the concept of 'static resources', which could be of help (but it will require the client to create a new request for a specific resource).
There are a couple of interesting things mentioned in the article.
No it is not possible.
The first line of the request holds the resource being requested so you wouldn't know what to respond with unless you examined the bytes (at least one line's worth) of the request first.
No. HTTP is defined as a request/response protocol. One request: one response. Anything else is not HTTP, it is something else, and you would have to specify it properly and implement it completely at both ends.
