Creating Riak bucket over cURL - http

I would like to be able to create a Riak bucket over cURL. I have been searching online and can't seem to find a way to do it. I know there are ways to do it easily with the drivers, but I need to be able to do it with cURL for the SaaS application I am working on.

You would do a PUT, passing the bucket properties you want as a JSON object, e.g.
curl -v http://riak3.local:8098/riak/newBucket -X PUT -H Content-Type:application/json --data-binary '{"props":{"n_val":5}}'
The docs have more complete details.

One other thing: the important point to remember is that there is no way to 'create' a bucket explicitly with a call (via cURL or a client API).
You can only create custom buckets via the call above.
The reason is that buckets are simply a prefix to the keys. There is no object anywhere in the Riak system that keeps track of buckets: no file somewhere, no variable in memory, or anything like that. That is why the simple "list buckets" command is so expensive: Riak literally has to go through every key in the cluster and assemble a list of buckets by looking at the key prefixes.
The only things that exist as actual objects are buckets with non-default settings, i.e. custom buckets. That's what the curl command above does -- it keeps track of some non-default settings for Riak to consult if a call ever comes in for that bucket.
Anyway, the moral of the story is: you don't need to create buckets in the normal course of operations. You can just start writing to them, and they will come into being (again, in the sense that keys with that bucket prefix will come into being, which means they can now be iterated over by the expensive 'list buckets' call).
You only have to issue the call above for custom buckets (and you don't want to do that too often either, as there is a practical limit to the number of custom buckets in a cluster, somewhere around 9000).
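If you want to confirm that the custom settings took effect, you can read the bucket properties back over the same HTTP interface. A minimal sketch in Python (assuming the requests library and the same riak3.local node as the curl example; on the older /riak/ HTTP API a plain GET on the bucket should return the props JSON):

import requests

BASE = "http://riak3.local:8098/riak"   # same node as the curl example above

# Set non-default properties; this is what makes it a "custom" bucket.
requests.put(
    BASE + "/newBucket",
    json={"props": {"n_val": 5}},
    headers={"Content-Type": "application/json"},
).raise_for_status()

# Read the properties back to confirm the settings were stored.
print(requests.get(BASE + "/newBucket").json()["props"]["n_val"])   # expect 5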

I also found that if you add a new object to a non-existing bucket, it will create that bucket on the fly.
Remember, buckets are automatically created when you add keys to them. There is no need to explicitly “create” a bucket (more on buckets and their properties further down the page.)
Bucket Properties and Operations
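For example, simply PUTting a value under a key whose bucket has never been set up is enough to bring that bucket into being (a rough sketch in Python, assuming a local Riak node on port 8098 and the requests library; bucket and key names are made up):

import requests

# "mybucket" has never been created anywhere; the PUT below is all it takes.
requests.put(
    "http://127.0.0.1:8098/riak/mybucket/mykey",
    data="hello",
    headers={"Content-Type": "text/plain"},
)

# The key (and therefore the bucket prefix) is immediately readable.
print(requests.get("http://127.0.0.1:8098/riak/mybucket/mykey").text)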

Related

Only download changed/updated documents from Firestore

being new to firestore, I am trying to keep the number of downloads of documents as small as possible. I figured that I could download documents only once and store them offline. If something is changed in the cloud, download a new copy and replace the offline version of that document (give relevant documents a timestamp of last-change and download when the local version is older). I haven't started with this yet but I am going to assume something like this must already exist, right?
I'm not sure where to start, and Google isn't giving me many answers, with the exception of enablePersistence() from the FirebaseFirestore instance. I have a feeling this is not what I am looking for, since it would be weird to artificially turn the network on and off every time I want to check for changes.
Am I missing something or am I about to discover an optimisation solution to this problem?
What you're describing isn't something that's built into Firestore. You're going to have to design and build it using the Firestore capabilities described in the documentation. Persistence is enabled by default, but that's not going to solve your problem, which is rather broadly stated here.
The bottom line is that neither Firestore nor its local cache have an understanding of optimizing the download of only documents that have changed, by whatever definition of "change" that you choose. When you query Firestore, it will, by default, always go to the server and download the full set of documents that match your query. If you want to query only the local cache, you can do that as well, but it will not consult the server or be aware of any changes. These capabilities are not sufficient for the optimizations you describe.
If you want to get only document changes since the last time you queried, you're going to have to design your data so that you can make such a query, perhaps by adding a timestamp field to each document, and use that to query for documents that have changed since the last time you made a query. You might also need to manage your own local cache, since the SDK's local cache might not be flexible enough for what you want.
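As a rough illustration of that timestamp approach (shown with the Python client library for brevity; the collection name "documents", the field name "lastChanged" and the stored last_sync value are all assumptions, and the same shape applies in the Android/iOS/web SDKs):

from google.cloud import firestore

db = firestore.Client()

def fetch_changed_docs(last_sync):
    # Only documents whose lastChanged timestamp is newer than the last sync
    # get downloaded; everything else stays in whatever local store you keep.
    query = db.collection("documents").where("lastChanged", ">", last_sync)
    return list(query.stream())

The catch is that every write has to update lastChanged reliably (for example with a server timestamp), otherwise the delta query will silently miss documents.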
I recommend that you read this blog post that describes in more detail how the local cache actually works, and how you might take advantage of it (or not).

Using SQLite as a file cache

My C++ application needs to support caching of files downloaded from the network. I started to write a native LRU implementation when someone suggested I look at using SQLite to store an ID, a file blob (typically audio files) and the add/modify datetimes for each entry.
I have a proof of concept working well for the simple case where one client is accessing the local SQLite database file.
However, I also need to support multiple access by different processes in my application as well as support multiple instances of the application - all reading/writing to/from the same database.
I have found a bunch of posts to investigate, but I wanted to ask the experts here too: is this a reasonable use case for SQLite, and if so, what features/settings should I dig deeper into in order to support my multiple-access case?
Thank you.
M.
Most filesystems are in effect databases too, and most store two or more timestamps for each file, i.e. the last modification and last access times, allowing implementation of an LRU cache. Using the filesystem directly will make just as efficient use of storage as any DB, and perhaps more so. The filesystem is also already geared toward efficient and relatively safe access by multiple processes (assuming you follow the rules and algorithms for safe concurrent access in a filesystem).
The main advantage of SQLite may be a slightly simpler support for sorting the list of records, though at the cost of using a separate query API. Of course a DB also offers the future ability of storing additional descriptive attributes without having to encode those in the filename or in some additional file(s).
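As a sketch of the filesystem-only approach (Python for brevity; the cache directory and size budget are made-up values, and st_mtime is used as the recency signal since atime is often disabled):

import os

CACHE_DIR = "/var/tmp/audio-cache"    # assumed location
MAX_BYTES = 500 * 1024 * 1024         # assumed budget: 500 MB

def evict_lru():
    entries = []
    for e in os.scandir(CACHE_DIR):
        if e.is_file():
            st = e.stat()
            entries.append((st.st_mtime, st.st_size, e.path))
    total = sum(size for _, size, _ in entries)
    # Oldest modification time first: drop the least recently written files.
    for _, size, path in sorted(entries):
        if total <= MAX_BYTES:
            break
        try:
            os.remove(path)           # another process may have beaten us to it
            total -= size
        except FileNotFoundError:
            pass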

Fill in NGINX cache from different service

I have an idea to populate the NGINX cache in an uncommon way, and I want to ask whether this is really possible to achieve.
The common way we are all used to is that a request hits the backend service and only then is the response cached in NGINX.
What I want to achieve is to build that native NGINX cache from a separate service. That means I want to manipulate the hashed keys that are stored in memory via some NGINX module, and also create the directory structure with the files that contain the cached payloads.
The questions would be:
Is this possible?
How to achieve this, what modules should I include into NGINX, etc.?
NGINX writes cached data to the filesystem using the algorithm described here: http://czerasz.com/2015/03/30/nginx-caching-tutorial/. What is actually stored in the first line of such a cached file? Everything from the second line onwards is payload, but there are some non-readable bytes written into the first line, and the cache does not work if this line is removed.
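For reference, my current understanding of the key-to-path mapping is roughly this (a Python sketch, assuming the default proxy_cache_key of $scheme$proxy_host$request_uri and levels=1:2):

import hashlib

def cache_file_path(cache_dir, scheme, proxy_host, request_uri):
    # nginx hashes the cache key with md5 and uses the tail of the hex digest
    # to build the levels=1:2 directory structure.
    key = scheme + proxy_host + request_uri
    digest = hashlib.md5(key.encode()).hexdigest()
    return "%s/%s/%s/%s" % (cache_dir, digest[-1], digest[-3:-1], digest)

print(cache_file_path("/var/cache/nginx", "http", "example.com", "/test"))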
Thanks in advance!

AWS Lambda, Caching {proxy+}

A simple ASP.NET AWS Lambda is uploaded and functioning with several GETs like:
{proxy+}
api/foo/bar?filter=value
api/foo/barlist?limit=value
with paths tested in Postman as:
//#####.execute-api.us-west-2.amazonaws.com/Prod/{proxy+}
Now I want to enable API caching, but when I do, only the first API call gets cached and all other calls incorrectly return that first cached value.
i.e. //#####.execute-api.us-west-2.amazonaws.com/Prod/api/foo/bar?filter=value == //#####.execute-api.us-west-2.amazonaws.com/Prod/api/foo/barlist?limit=value; as far as the cache is concerned these return the same thing, but they shouldn't.
How do you set up the caching in API Gateway to correctly see these as different requests, per both path and query?
I believe you can't use {proxy+}, because that is a resource/integration itself and that is where the caching is getting applied. Or rather, you can (because you can cache any integration), but then you get exactly the result you're getting.
Note: I'll use the word "resource" a lot because I think of each item in API Gateway as the item in question, but I believe technically the AWS documentation will say "integration", because it's not just the resource but the actual integration on said resource. And said resource has an integration and parameters, or what I'll go on to call query string parameters. Apologies to the terminology police.
Put another way, if you had two resources, GET foo/bar and GET foo/barlist, then you'd be able to set caching on either or both. It is at this resource level that caching exists (don't think so much of the final URL path, but of the actual resource configured in API Gateway). It doesn't know to break {proxy+} out into an unlimited number of paths, unfortunately. Actually it's method plus resource, so I believe you could have different cached results for GET /path and POST /path.
However. You can also choose the integration parameters as cache keys. This would mean that ?filter=value and ?limit=value would be two different cache keys with two different cached responses.
Should foo/bar and foo/barlist have the same query string parameters (and you're still using {proxy+}), then you'll run into that duplicate issue again.
So you may wish to do foo?action=bar&filter=value and foo?action=barlist&filter=value in that case.
You'll need to configure this, of course, for each query string parameter, so that may start to diminish the ease of the {proxy+} catch-all. Terraform.io is your friend.
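If you'd rather script it directly, something along these lines with boto3 might do it (a rough sketch only: the IDs are placeholders, I've assumed an ANY method on the {proxy+} resource and a "filter" query string, and the exact patch paths are worth double-checking against the API Gateway documentation):

import boto3

apigw = boto3.client("apigateway")

REST_API_ID = "abc123"     # placeholder
RESOURCE_ID = "def456"     # placeholder: the {proxy+} resource

# 1. Declare the query string on the method so it can be referenced as a cache key.
apigw.update_method(
    restApiId=REST_API_ID,
    resourceId=RESOURCE_ID,
    httpMethod="ANY",
    patchOperations=[{
        "op": "add",
        "path": "/requestParameters/method.request.querystring.filter",
        "value": "false",   # not a required parameter
    }],
)

# 2. Mark that parameter as a cache key on the integration.
apigw.update_integration(
    restApiId=REST_API_ID,
    resourceId=RESOURCE_ID,
    httpMethod="ANY",
    patchOperations=[{
        "op": "add",
        "path": "/cacheKeyParameters",
        "value": "method.request.querystring.filter",
    }],
)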
This is something I wish was a bit more automatic/smarter as well. I use {proxy+} a lot and it really creates challenges for using their caching.

Symfony2: slow write to S3 from EC2 instance

I have one micro instance and an S3 bucket for dev purposes within the same region. My program has typeahead functionality that works this way:
When the user types "lond", the URL is:
mys3.s3-website-us-east-1.amazonaws.com/typeahead/cities/lond.json
Because that returns a 404, the JavaScript then tries:
http://mydomain.com/typeahead/cities/lond.json
This will trigger the controller, which will find all cities starting with lond, write the result to S3 and return the JSON results.
So the next time someone types "lond", the results will be found on S3 as a static file and no controller action will be executed.
Now all this works, but the problem is that I have a very slow write speed from EC2 to S3. When I remove
$filesystem->write($filename, json_encode($results), true) ;
from my code, the response is ~0.7 seconds. With the write to S3, it goes to 2 seconds, which is hardly useful. The problem is bigger for fast typers, probably because of the quick succession of writes sent to S3.
I am using KnpGaufretteBundle.
I also tried
echo json_encode($results);
$filesystem->write(....);
die;
to send the output to the browser and then continue saving the file to S3, but it didn't work. The JS didn't get the response any earlier.
ob_start(), ob_end_flush().... and others didn't work either.
How can I solve it?
I was thinking of starting some background process that would upload the result (although I don't know how to do that), but I think it would just be too complicated.
The other idea is to use amazon_s3 service directly and skip KnpGaufretteBundle, but I would like to avoid that.
What you seem to be trying to do is use S3 to store cached data that is better stored in memcached or another memory-based store.
I'll give you a couple of other options. You could use both if you wanted, but one is more than enough.
Use memcached (or Redis, or another key-value store) with read-through caching. You will use one endpoint which, on a request, checks whether the value exists in the cache and reads through to get the result if it does not. At the same time, the result is stored in the cache.
Use a CloudFront distribution with a custom origin. The custom origin will be your website. With this option, the result will be cached at a CloudFront edge location for a duration specified by your headers. Each edge location may check the origin for the file if it does not have it. To prevent the custom origin from accidentally caching the wrong things, you may want to check for the CloudFront user agent and only allow it to access certain URLs.
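The read-through pattern in the first option looks roughly like this (sketched in Python with pymemcache for brevity; in your Symfony app the same shape applies with a memcached or Redis client, find_cities_starting_with stands in for your existing controller logic, and the 60-second TTL is an arbitrary choice):

import json
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))   # assumes a local memcached

def typeahead(prefix):
    key = "typeahead:cities:" + prefix
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit: no DB query, no S3 write
    results = find_cities_starting_with(prefix)   # placeholder for your controller logic
    cache.set(key, json.dumps(results), expire=60)
    return results

With a short TTL like this, bursts of fast typing hit memcached instead of triggering a slow S3 write on every keystroke.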
