I have a scenario where I need to connect to two buckets. Each bucket is in a different Couchbase cluster. Is it possible to configure a connection to two buckets that are in two different clusters?
It should be possible to define an additional Cluster bean, an associated Bucket bean, and a CouchbaseTemplate. It is certainly possible with 2 buckets within the same cluster, so you could start from there: http://docs.spring.io/spring-data/couchbase/docs/current/reference/html/#couchbase.repository.multibucket
The system I'm working on has multiple environments, each running in separate Azure regions. Our CosmosDB is replicated to these regions and multi-region writes are enabled. We're using the default consistency model (Session).
We have Azure Functions that use the CosmosDB trigger deployed in all three regions. Currently these use the same lease prefix, which means that only one function processes changes at any given time. I know that we can set each region to have a different lease prefix to enable concurrent processing, but I'd like to solidify my understanding before taking this step.
My question is about the behaviour of the change feed with regard to replication in this scenario. According to this link https://github.com/MicrosoftDocs/azure-docs/issues/42248#issuecomment-552207409 data is first converged on the primary region and then the change feed is updated.
Other resources I've read seem to suggest that each region has its own change feed, which will update upon replication. Also, the previous link recommends only running a change feed processor in the primary region in multi-master.
In an ideal world, I'd like change feed processors in each region to handle local writes quickly. These functions will make updates to CosmosDB and I also want to avoid issues with replication. My question is: what is the actual behavior in a multi-master configuration (and by extension the correct architecture)? Is it "safe" to use per-region change feed processors, or should we use a single processor in the primary region?
You cannot have per-region Change Feed Processors that only process the local changes, because the Change Feed in each region contains the local writes plus the replicated writes from every other region.
Technically you can use a single Change Feed Processor deployment connecting to one of the regions to process events from all the regions.
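To make the consequence concrete, here is a toy model in plain Python (not the Cosmos DB SDK). The region names, document ids and helper functions are all made up for illustration, and ordering and replication lag are ignored. It only shows that if every region's feed carries local plus replicated writes, independent processors in each region will each see every document.

```python
from collections import defaultdict

REGIONS = ["east", "west", "north"]  # hypothetical region names


def replicate(local_writes):
    """Return the feed seen in each region: its local writes plus the
    writes replicated from every other region (ordering not modelled)."""
    all_writes = [w for region in REGIONS for w in local_writes.get(region, [])]
    return {region: list(all_writes) for region in REGIONS}


def count_processing(feeds, processor_regions):
    """Count how often each document is handled when one independent
    processor (its own lease prefix) runs in each listed region."""
    handled = defaultdict(int)
    for region in processor_regions:
        for doc_id in feeds[region]:
            handled[doc_id] += 1
    return dict(handled)


local_writes = {"east": ["doc-1"], "west": ["doc-2"], "north": ["doc-3"]}
feeds = replicate(local_writes)

per_region = count_processing(feeds, REGIONS)   # one processor per region
single = count_processing(feeds, ["east"])      # one processor in one region
print(per_region)
print(single)
```

In this model the per-region setup handles every document three times, while the single deployment handles each one once, which is why separate lease prefixes per region lead to duplicate processing rather than "local only" processing.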
I am trying to implement DynamoDB autoscaling using Terraform but I am having a bit of difficulty in understanding the difference between aws_appautoscaling_target and aws_appautoscaling_policy.
Do we need both specified for the autoscaling group? Can someone kindly explain what each is meant for?
Thanks a ton!!
The aws_appautoscaling_target ties your policy to the DynamoDB table: it registers the table's capacity as a scalable target with minimum and maximum limits, and the aws_appautoscaling_policy describes how to scale within those limits. You can define a policy once and use it over and over (e.g. build a standard set of scaling policies for your organization to use); the target allows you to bind a policy to a resource. For DynamoDB autoscaling you need both.
An EC2 Auto Scaling group (ASG), on the other hand, doesn't have to have either a target or a policy. An ASG can scale EC2 instances in/out based on other triggers such as instance health (defined by EC2 health checks or LB health checks) or desired capacity. This allows a load-balanced application to replace bad instances when they are unable to respond to traffic and also to recover from failures to keep your cluster at the right size. You could add additional scaling policies to better react to demand. For example, if your cluster has 2 instances and they're at max capacity, a scaling policy can watch those instances, add more when needed, and then remove them when demand falls.
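For what it's worth, the two Terraform resources correspond to two separate Application Auto Scaling API calls, RegisterScalableTarget and PutScalingPolicy. The sketch below only builds the request parameters in Python so the split is visible; it does not call AWS. The table name, policy name, capacity limits and the 70% utilization figure are made-up values, and the helper functions are hypothetical.

```python
def scalable_target(table_name, min_capacity, max_capacity):
    """Parameters for RegisterScalableTarget: WHAT may be scaled and within which limits."""
    return {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy(target, policy_name, target_utilization):
    """Parameters for PutScalingPolicy: HOW to scale the already registered target."""
    return {
        "PolicyName": policy_name,
        # The policy points at the target through these three identifiers.
        "ServiceNamespace": target["ServiceNamespace"],
        "ResourceId": target["ResourceId"],
        "ScalableDimension": target["ScalableDimension"],
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    }


# Hypothetical table and values, for illustration only.
target = scalable_target("orders", min_capacity=5, max_capacity=100)
policy = target_tracking_policy(target, "orders-read-scaling", 70.0)

# With boto3 these dicts would be passed on roughly like this (not executed here):
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
print(target)
print(policy)
```

The same pattern would be repeated with the write dimension (dynamodb:table:WriteCapacityUnits and DynamoDBWriteCapacityUtilization) if write capacity should scale too.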
I currently have 2 separate microservices monitoring the same unpartitioned CosmosDB collection (let's call it MasterCollection).
I only ever want one microservice to process MasterCollection's change feed at any given time - the rationale for having 2 separate hosts monitor the same feed basically boils down to redundancy.
This is what I'm doing in code (note that only the first hostName param differs - every other param is identical):
microservice #1:
ChangeFeedEventHost host = new ChangeFeedEventHost("east", monitoredCollInfo, leaseCollInfo, feedOptions, feedHostOptions);
microservice #2:
ChangeFeedEventHost host = new ChangeFeedEventHost("west", monitoredCollInfo, leaseCollInfo, feedOptions, feedHostOptions);
My testing seems to indicate that this works (only one of them processes the changes), but I was wondering if this is a good practice?
The Change Feed Processor library has a load balancing mechanism that will share the load between them as explained here.
Sharing the load means that they will distribute the Leases among themselves. Each Lease represents a Partition Key Range in the collection.
In your scenario, where you are creating 2 hosts for the same monitored collection using the same lease collection, the effect is that they will share the load: each will hold half of the leases and process changes only for those Partition Key Ranges. So half of the partitions will be processed by the host named west and half by the one named east. If the collection is Single Partition, one of them will process all the changes while the other sits doing nothing.
If what you want is for both to process all the changes independently, you have multiple options:
Use a different lease collection for each Host.
Use the LeasePrefix option in the ChangeFeedHostOptions, which can be set on Host creation. That will let you share the lease collection between both hosts while they track their progress independently. Just keep in mind that the RU usage in the lease collection will rise depending on the amount of activity your main collection has.
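The difference between the two setups can be pictured with a small model in plain Python. This is not the Change Feed Processor library: the real library balances leases dynamically through lease acquisition and expiry, and the round-robin split, the prefix strings and both function names here are assumptions made for the sketch.

```python
def assign_leases(partition_ranges, hosts):
    """Split one shared set of leases among hosts (round-robin as a
    stand-in for the library's dynamic balancing)."""
    owned = {host: [] for host in hosts}
    for i, pk_range in enumerate(partition_ranges):
        owned[hosts[i % len(hosts)]].append(pk_range)
    return owned


def assign_with_prefixes(partition_ranges, hosts_by_prefix):
    """Each lease prefix forms an independent lease set, so every
    prefix group covers all partition key ranges on its own."""
    return {
        prefix: assign_leases(partition_ranges, hosts)
        for prefix, hosts in hosts_by_prefix.items()
    }


# Same lease collection, no prefix: the hosts share the work.
shared = assign_leases(["0", "1", "2", "3"], ["east", "west"])

# Single-partition collection: one host owns the only lease, the other idles.
single_partition = assign_leases(["0"], ["east", "west"])

# Different (hypothetical) prefixes: both hosts see every change.
independent = assign_with_prefixes(["0"], {"east-": ["east"], "west-": ["west"]})

print(shared)
print(single_partition)
print(independent)
```

For the redundancy scenario in the question, the second case is the relevant one: with an unpartitioned collection and a shared lease set, only one host is ever active, and the other would only pick up the lease if the first stopped renewing it.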
What is a bucket in Riak? I tried to check the documentation, but I was referred to bucket types and could not grasp the concept of a bucket in Riak.
Any explanation? What is it, and why is it used?
I don't think there is much more to it than "a bucket is a grouping mechanism for data with some configuration assigned to it".
Quoting official docs (emphasis mine):
Buckets are used to define a virtual keyspace for storing Riak
objects. They enable you to define non-default configurations over
that keyspace concerning replication properties and other parameters.
In certain respects, buckets can be compared to tables in relational
databases or folders in filesystems, respectively. From the standpoint
of performance, buckets with default configurations are essentially
“free,” while non-default configurations, defined using bucket types,
will be gossiped around the ring using Riak’s
cluster metadata subsystem.
And from Bucket Types:
Buckets are essentially a flat namespace in Riak. They allow the same
key name to exist in multiple buckets and enable you to apply
configurations across keys.
Bucket: In certain respects, buckets can be compared to tables in relational databases or folders in filesystems.
Bucket Types
Bucket types are a new feature in Riak 2.0.
Bucket types allow groups of buckets to share configuration details. This allows Riak users, and administrators, to manage bucket properties more efficiently than in the older configuration systems that were based on bucket properties
Let's dive in a little more:
The Using Bucket Types documentation covers the implementation, usage, and configuration of Bucket Types in great detail. Throughout the documentation there are code samples (e.g. Using Data Types), including code for creating the bucket types associated with each of the individual Riak Data Types.
Bucket types are a major improvement over the older system of bucket configuration. The ability to define a bucket configuration, and then change the configuration if necessary, for an entire group of buckets is a powerful new way to consider data modeling. In addition, bucket types are more reliable, as buckets that have a given type (or configuration) only have their properties changed when the type is changed. Previously, it was possible to change the properties of a bucket only through client requests.
In prior versions of Riak, bucket properties were altered by clients interacting with Riak…in contrast, bucket types are an operational concept. The riak-admin bucket-type interface enables Riak users to manage bucket configurations at an operational level, without recourse to the Riak clients.
In versions of Riak prior to 2.0, all queries were made to a bucket/key pair as in the following example:
curl http://localhost:8098/buckets/my_bucket/keys/my_key
Now in Riak 2.0 with the addition of bucket types, there is an additional namespace on top of buckets and keys. The same bucket name can be associated with completely different data if it is used in accordance with a different bucket type.
curl http://localhost:8098/types/type1/buckets/my_bucket/keys/my_key
curl http://localhost:8098/types/type2/buckets/my_bucket/keys/my_key
If a request is made to a bucket/key pair without a specified bucket type, default will be used in place of a bucket type. The following requests are identical.
curl http://localhost:8098/buckets/my_bucket/keys/my_key
curl http://localhost:8098/types/default/buckets/my_bucket/keys/my_key
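As a small illustration of the namespace layering, here is a Python helper that builds these URLs, following the /types/<type>/buckets/<bucket>/keys/<key> layout used in the type1 and type2 examples. The function name riak_key_url is made up for this sketch, and localhost:8098 is just the host and port from the curl examples.

```python
from urllib.parse import quote


def riak_key_url(bucket, key, bucket_type=None, host="localhost", port=8098):
    """Build the HTTP URL for a key, with or without an explicit bucket type."""
    base = f"http://{host}:{port}"
    bucket_part = f"/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    if bucket_type is None:
        # No type given: Riak falls back to the "default" bucket type.
        return base + bucket_part
    return f"{base}/types/{quote(bucket_type, safe='')}" + bucket_part


# Same bucket name and key, but different types address different data.
print(riak_key_url("my_bucket", "my_key"))
print(riak_key_url("my_bucket", "my_key", bucket_type="default"))
print(riak_key_url("my_bucket", "my_key", bucket_type="type1"))
```

The first two URLs refer to the same object; the third lives in a separate namespace even though the bucket name and key are identical.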
I'm new to non-PHP web applications and to NoSQL databases. I was looking for a smart solution matching my application requirements and was very surprised to learn that graph-based databases exist. I found Neo4j very nice and very suitable for my application, but as I've already written, I'm new to this and have some limitations in understanding how it works. I hope you guys can help me learn.
If I embed Neo4j in a servlet program, then the database access I create is shared among the different threads of that servlet, right? So I need to put database creation in the init() method and the shutdown in destroy(), right? And it will be thread safe. (Every dot is a "right?") But what if I want to create a database shared across the whole application?
I heard that graph databases in general rely on a relational low level. Is that true for Neo4j? If it is, then what I see is a high-level interface to the real persistence layer, so what is a Connection in this case? Are there techniques like connection pooling, or are these low-level things all managed by Neo4j?
In my application I need to link some objects to users and to many other classifications. Each of these objects has a unique id (a String). If someone asks to view some information about the object with id=QW, I need to load the vertex associated with object.QW. Is this an easy operation for graph databases?
If I need to manage authentication, I receive the pair (usr,pwd) and need to check whether this pair exists in my graph. Is this the same problem as before, or is there some better approach for managing authentication?
If you're coming from the PHP world, in most cases you're better off running Neo4j in server mode and accessing it either via REST directly or through a client driver like https://github.com/jadell/neo4jphp. If you still want to embed Neo4j in a servlet environment, the GraphDatabaseService is a shared component, maybe stored within the ServletContext. On a per-request (and therefore per-thread) basis you start and commit transactions.
Neo4j is a native graph database. The bare-metal persistence layer is written by the Neo4j dev team themselves and is optimized for navigating from one node to its neighbors as fast as possible. There are other graph databases out there that reuse other persistence technologies for their underlying persistence.
The best thing is to take the Neo4j online course at http://www.neo4j.org/learn/online_course.
See SecurityRules.
As Neo4j is a NoSQL graph database, you have to handle generation of the unique ID yourself, for example using a GUID (with 3.x, an auto-incremented property is also supported for a particular label), because the default id generated by Neo4j is unique but can be reallocated to another object once the object it was first assigned to is deleted.
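A minimal sketch of the GUID approach in Python: generate the identifier in the application and store it as an ordinary property, instead of relying on the internal node id. The property name uid and the helper functions are hypothetical, and nothing here talks to Neo4j; the resulting dict is just what you would send as node properties.

```python
import uuid


def new_object_id():
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def node_properties(kind, **props):
    """Build the property map for a new node, always with a fresh 'uid'."""
    return {**props, "kind": kind, "uid": new_object_id()}


user = node_properties("User", name="alice")
item = node_properties("Item", title="some object")
print(user)
print(item)
```

Lookups by this property are the "load the vertex for id=QW" case from the question, so it usually makes sense to index it.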
I am a .NET developer, and in my project I used the Neo4j REST API, which works well. I would suggest you go with that: as it is implemented using the async-await programming pattern, you can hand long-running operations to the DB and use your web server resources more efficiently.