Is there any way to batch RDAP requests for domains? - networking

I want to check 500 domains... does RDAP support a way to do it in one API call, or do I have to query each domain individually?
$ for i in {1..10}; do curl https://rdap.verisign.com/com/v1/domain/chovy.com -s | jq -r '.events[] | select(.eventAction | contains("expiration")) | .eventDate'; done

No.
See RFC 7482:
3.1. Lookup Path Segment Specification
A simple lookup to determine if an object exists (or not)
RDAP is JSON over HTTPS in a REST fashion, so you query for one "object", be it a domain, a host, a contact, or something else (RDAP is used not only by domain name registries but also by IP ones).
There is search in RDAP (see section 3.2 in the above RFC), and there are various drafts about extensions (regular expressions, etc.) that would allow a query to return multiple results. However, no registries in production enable that, and you can easily imagine that very few ever will, especially through public access.
Please note:
- you do not need to run your queries sequentially for large batches; you can use threads or multiple processes, as sketched below
- if you are not careful in limiting your requests, you will at the very least get rate limited, if not banned outright for some time.
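For example, here is a minimal sketch of a throttled parallel batch, assuming a file domains.txt with one .com domain per line (the Verisign endpoint above only serves .com/.net; tune -P and the sleep to whatever the registry tolerates):
fetch() {
  expiry=$(curl -s "https://rdap.verisign.com/com/v1/domain/$1" |
    jq -r '.events[] | select(.eventAction | contains("expiration")) | .eventDate')
  echo "$1: $expiry"
  sleep 1   # crude per-worker throttle
}
export -f fetch
xargs -P 4 -I{} bash -c 'fetch "$@"' _ {} < domains.txt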

Related

How to implement discovery of the [corresponding] _file path_ for a [stored] resource, using HTTP?

We have an HTTP application which needs to communicate to the user the path of the resource the user has requested or modified; otherwise people struggle to later locate the corresponding file when accessing the backing storage through other means (e.g. a remote shell to a host where said storage is mounted with a known path prefix).
For example, the user agent would send an HTTP request with the following first line:
PUT /project/human-genome/files/meetings/owners-101190/report.txt
...meaning someone requested creation and storage of the resource at the corresponding URL.
Now, one could argue that the file path should be trivially inferable from the URL, and normally I'd be inclined to design URLs with that assumption in mind. In our case, however, there is generally no relationship that would let the client infer the path automatically, and the relationship (the "mapping", if you will) can change; this additional level of indirection makes such assumptions inapplicable.
In practice, it means the above URL can't be assumed to correspond to a file named report.txt stored in some folder with path like /project/human-genome/files/meetings/owners-....
Instead, we want to have the client be able to negotiate obtaining the path value related to the resource.
I have thought of the following solutions:
Return the path in a response header for at least HEAD requests. The header would probably be custom, something like Resource-Path. If using a custom header, there needs to be a specification of which responses for the resource carry the header. Is it only returned for HEAD requests? Doesn't that fly in the face of HTTP describing HEAD as GET-without-response-body, implying that the corresponding GET response shall also include the header? That is overhead and waste if the client didn't need the metadata. And if metadata is returned, how much of it, and in what representation? This option intuitively seems inferior to the second alternative:
Allocate a dedicated URL for metadata about the resource, including the "stored path", e.g. /metadata/project/human-genome/files/meetings/owners-101190/report.txt, with an optional suffix like :path or /path and/or URL query variables to specify which metadata to return. Whether it's a /metadata/ URL path prefix or a /metadata or :metadata suffix, this seems more "idiomatic", but I am not entirely convinced of the wisdom of representing metadata as a server resource too -- what would the semantics of requests like PUT .../metadata be?
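For illustration, here is roughly what each option could look like on the wire; api.example.org, the Resource-Path header, and the /metadata/ prefix are hypothetical names from this question, not any standard:
# Option 1: custom response header (Resource-Path is hypothetical)
curl -sI https://api.example.org/project/human-genome/files/meetings/owners-101190/report.txt
# HTTP/1.1 200 OK
# Resource-Path: /srv/storage/human-genome/meetings/owners-101190/report.txt

# Option 2: dedicated metadata resource (the /metadata/ prefix is hypothetical)
curl -s https://api.example.org/metadata/project/human-genome/files/meetings/owners-101190/report.txt
# {"path": "/srv/storage/human-genome/meetings/owners-101190/report.txt"}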
On top of the choices above, I half-suspect this is a solved problem and there is some RFC I should read that addresses it -- surely metadata about resources that doesn't fall into the categories HTTP covers with response headers (which normally pertain to the response, not the resource) is something a lot of applications and application servers routinely have to manage?
Is there a truly idiomatic approach here that can be the correct answer to this problem?

AWS Lambda, Caching {proxy+}

A simple ASP.NET AWS Lambda is uploaded and functioning with several GETs like:
{proxy+}
api/foo/bar?filter=value
api/foo/barlist?limit=value
with paths tested in Postman as:
//#####.execute-api.us-west-2.amazonaws.com/Prod/{proxy+}
Now I want to enable API caching, but when doing so only the first API call gets cached, and all other calls then incorrectly return that first cached value.
i.e. //#####.execute-api.us-west-2.amazonaws.com/Prod/api/foo/bar?filter=value == //#####.execute-api.us-west-2.amazonaws.com/Prod/api/foo/barlist?limit=value; in terms of the cache these return the same value, but they shouldn't.
How do you set up caching in API Gateway to correctly treat these as different requests, per both path and query string?
I believe you can't use {proxy+} because that is a resource/integration itself and that is where the caching is getting applied. Or you can (because you can cache any integration), but you get the result you're getting.
Note: I'll use the word "resource" a lot because I think of each item in API Gateway as the item in question, but I believe technically AWS documentation would say "integration", because it's not just the resource but the actual integration on said resource... and said resource has an integration and parameters, or what I'll go on to call query string parameters. Apologies to the terminology police.
Put another way: if you had two resources, GET foo/bar and GET foo/barlist, then you'd be able to set caching on either or both. It is at this resource level that caching exists (think not of the final URL path but of the actual resource configured in API Gateway). Unfortunately, it doesn't know to break {proxy+} out into an unlimited number of paths. Actually, it's method plus resource, so I believe you could have different cached results for GET /path and POST /path.
However. You can also choose the integration parameters as cache keys. This would mean that ?filter=value and ?limit=value would be two different cache keys with two different cached responses.
If foo/bar and foo/barlist have the same query string parameters (and you're still using {proxy+}), then you'll run into that duplicate issue again.
So you may wish to use foo?action=bar&filter=value and foo?action=barlist&filter=value in that case.
You'll need to configure this, of course, for each query string parameter, which may start to diminish the ease of the {proxy+} catch-all. Terraform.io is your friend.
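As a rough sketch of that per-parameter setup with the AWS CLI (the IDs and the filter name are placeholders, and you should verify the patch paths against your CLI version; the same settings map onto aws_api_gateway_method/aws_api_gateway_integration in Terraform):
# Declare the query string parameter on the method (value=false means "not required").
aws apigateway update-method \
  --rest-api-id abc123 --resource-id def456 --http-method GET \
  --patch-operations op=add,path=/requestParameters/method.request.querystring.filter,value=false
# Mark it as a cache key on the integration so ?filter=... varies the cache entry.
aws apigateway update-integration \
  --rest-api-id abc123 --resource-id def456 --http-method GET \
  --patch-operations op=add,path=/cacheKeyParameters/method.request.querystring.filter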
This is something I wish was a bit more automatic/smarter as well. I use {proxy+} a lot and it really creates challenges for using their caching.

Should "/users" and "/users/" point to the same (RESTful) resource?

They do in this and probably any other website, but I'm not sure I understand why.
A popular analogy compares RESTful resources to files in the file system: filename users wouldn't point to the same object as filename users/. The same goes for static web pages: in a static website, users would point to users.html and users/ to a different file, users/index.html.
Short answer: they may only identify the same resource if one redirects to the other.
URIs identify resources, but they do so differently depending on the response status code to a GET request in HTTP. If one returns a 3xx redirect to the other, then the two URIs identify the same resource. If the two each return a 2xx code, then the URIs identify different resources. They may return the same response to a GET request, but they are not therefore the same resource. The two resources may even map to the same handler to produce their reply, but they are not therefore the same resource. To quote Roy Fielding:
The resource is not the storage object. The resource is not a mechanism that the server uses to handle the storage object. The resource is a conceptual mapping -- the server receives the identifier (which identifies the mapping) and applies it to its current mapping implementation (usually a combination of collection-specific deep tree traversal and/or hash tables) to find the currently responsible handler implementation and the handler implementation then selects the appropriate action+response based on the request content.
So, should /users and /users/ return the same response? No. If one does not redirect to the other, then they should return different responses. This is not itself a constraint of REST, but it is a constraint that makes networked systems more scalable: information duplicated across multiple resources can get out of sync (especially in the presence of caches, which are a constraint of REST) and lead to race conditions. See Pat Helland's Apostate's Opinion for a complete discussion.
Finally, clients may break when attempting to resolve references relative to the given URI. The URI spec makes it clear that resolving the relative reference Jerry/age against /users/ results in /users/Jerry/age, while resolving it against /users (no trailing slash) results in /Jerry/age. It's amazing how much client code has been written to detect and correct the latter to behave like the former (and not always successfully).
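You can watch that resolution happen from the shell with Python's standard urljoin, which implements the RFC 3986 rules:
python3 -c 'from urllib.parse import urljoin; print(urljoin("/users/", "Jerry/age"))'
# /users/Jerry/age
python3 -c 'from urllib.parse import urljoin; print(urljoin("/users", "Jerry/age"))'
# /Jerry/age -- the last segment of the base is dropped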
For any collection (which /users/ often is), I find it best to always emit /users/ in URI's, redirect /users to /users/ every time, and serve the final response from the /users/ resource: this keeps entities from getting out of sync and makes relative resolution a snap on any client.
"filename users wouldn't point to the same object as filename users/."
That is not true. In most filesystems, you cannot have a file named users and a directory named users in the same parent directory, and cd users and cd users/ have the same result.
There are some nuances here: "users" represents one resource, while "users/" should represent a set of resources, or operations on all "users" resources... But there does not seem to be a standard for this issue.
There is another discussion on this; take a look here: https://softwareengineering.stackexchange.com/questions/186959/trailing-slash-in-restful-api
Technically they are not the same. But a request for /users will probably cause a redirect to /users/, which makes them semantically equal.
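An easy way to check what a given server actually does (example.com stands in for the site under test):
curl -sI https://example.com/users | grep -i -E '^(HTTP|location)'
# HTTP/1.1 301 Moved Permanently
# Location: https://example.com/users/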
In terms of JAX-RS @Path, both can be used for the same path.

Creating Riak bucket over cURL

I would like to be able to create a Riak bucket over cURL. I have been searching online and can't seem to find a way to do it. I know there are ways to do it easily with the drivers, but I need to be able to do it with cURL for the SaaS application I am working on.
You would do a PUT, passing the bucket properties you want as a JSON object, e.g.:
curl -v http://riak3.local:8098/riak/newBucket -X PUT -H Content-Type:application/json --data-binary '{"props":{"n_val":5}}'
The docs have more complete details.
One other thing: the important point to remember is that there is no way to 'create' a bucket explicitly with a call (via cURL or a client API).
You can only create custom buckets via the call above.
The reason for that is: buckets are simply a prefix to the keys. There is no object anywhere in the Riak system that keeps track of buckets -- no file somewhere, no variable in memory, nothing like that. That is why the simple "list buckets" command is so expensive: Riak literally has to go through every key in the cluster and assemble a list of buckets by looking at the key prefixes.
The only things that exist as actual objects are buckets with non-default settings, i.e. custom buckets. That's what the curl command above does -- it records some non-default settings for Riak to consult if a call ever comes in for that bucket.
Anyway, the moral of the story is: you don't need to create buckets in the normal course of operations. You can just start writing to them, and they will come into being (again, in the sense that keys with bucket prefixes will come into being, which means they can now be iterated over by the expensive 'list buckets' call).
You only have to issue the call above for custom buckets (and you don't want to do that too much either, as there is a practical limit to the number of custom buckets in a cluster, somewhere around 9000).
I also found that if you add a new object to a non-existent bucket, it will create that bucket on the fly.
Remember, buckets are automatically created when you add keys to them. There is no need to explicitly “create” a bucket (more on buckets and their properties further down the page.)
Bucket Properties and Operations
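As a minimal illustration against the same legacy HTTP interface used above (host, bucket, and key are placeholders), simply writing a key brings the bucket into being:
# PUT a key into a bucket that has never been referenced before...
curl -X PUT http://riak3.local:8098/riak/brandNewBucket/firstKey \
  -H 'Content-Type: text/plain' --data 'hello'
# ...and it is immediately readable; no separate "create bucket" call was made.
curl http://riak3.local:8098/riak/brandNewBucket/firstKey
# hello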

How to separate background HTTP requests

This is more of an issue of trying to understand how HTTP really works and then implementing it.
I need an HTTP analyzer that can separate main page requests from "background" requests in HTTP log data. The idea is to separate HTTP requests made by the user from those that happen automatically (using the term loosely) in the background. From my first impressions of the HTTP data I've seen, it seems that when I go to any normal website, a text/html object is fetched, followed by a lot of other objects like CSS, XML, JavaScript, images, etc.
Now, the problem is how to separate those "background" requests that the user is not actively generating. From what I know, these will mostly be ad fetches, redirections, and some Ajax-based things.
Does anyone have any ideas on this? Any experience, or perhaps resources you could point me to, to get started with this analysis?
There's no way to tell from the bare HTTP requests which were generated by specific user actions and which by other automated processes. The browser/client is the only party that has such knowledge, so you have to make it part of the picture, e.g. by implementing the analyzer as a browser plugin or by embedding an HTTP client in the analyzer itself.
If you're trying to create a generic tool to analyze traffic load, it's usually not meaningful to distinguish between traffic generated by a user's direct "clicks" and automated requests.
There's no direct and clean way to do this. However, you can get pretty close by filtering out requests for files that clearly are not "user" requests, like *.jpg. Furthermore, you can filter out everything that is not an HTTP 200 response (e.g., 301 and 302 redirects).
Try something along the lines of:
cat access.log \
  | grep -E -v '\.(gif|ico|png|jpg|jpeg|js|css) HTTP' \
  | grep 'HTTP/1.1" 200'
(the line breaks are escaped so the pipeline runs as written, and the dots are escaped so they match literal file extensions)
