I posted this question in the STM Community Forum as well, but you seldom get real help there.
I'm struggling to get some detailed information on how to determine the correct number for the max_attr_records value of the aci_gatt_add_serv() function.
I know that you need 1 for the service itself and then at least 2 for each characteristic, but what else requires an attribute record?
Let's say I have the following characteristic:
aci_gatt_add_char(hServiceConfiguration, UUID_TYPE_128, uuid, 6,
                  CHAR_PROP_NOTIFY | CHAR_PROP_READ | CHAR_PROP_WRITE,
                  ATTR_PERMISSION_ENCRY_WRITE,
                  GATT_NOTIFY_READ_REQ_AND_WAIT_FOR_APPL_RESP,
                  16, 0, &hCharTripConf);
What would be the resulting number of attribute records? I came up with 4, but I'm not sure if that's correct.
Furthermore, there seems to be a limit on the total number of attribute records, as I can only add a fairly limited number of services. How is this limit defined?
I tried to run a Gremlin query that adds a property to vertices through the Gremlin Console.
g.V().hasLabel("user").has("status", "valid").property(single, "type", "valid")
I constantly get this error:
org.apache.tinkerpop.gremlin.jsr223.console.RemoteException: Connection to server is no longer active
This error happens after the query has been running for one or two minutes.
I tried some simple queries like g.V().limit(10), and they work fine.
Since the affected vertex count is more than 4 million, I'm not sure if it is failing due to a timeout.
I also tried to split it into small batches:
g.V().hasLabel("user").has("status", "valid").hasNot("type").limit(200000).property(single, "type", "valid")
It succeeded for the first few batches and then started failing again.
Are there any recommendations for updating millions of vertices?
The precise approach you take may vary depending on the backend graph database and storage you are using, as well as the capacity of the hardware. The number of CPUs and, most importantly, the amount of memory where Gremlin Server is running will be a factor, as will the setting of the query timeout value.
To do this in Gremlin, if you had a way to identify distinct ranges of vertices easily, you could split the work across multiple threads, each doing batches of updates. If the example you show is representative of your actual need, that is likely not possible in this case.
Likewise, some graph databases provide a bulk-load capability that is often a good way to do large batch updates, but that is probably not an option here, as you essentially need a conditional update based on the current presence (or absence) of a property.
Without more information about your data model, hardware, etc., the best answer is probably to do two things:
Use smaller limits. Maybe try 5K or even just 1K at first and work up from there until you find a reliable sweet spot (a loop with that shape is sketched below).
Increase the query timeout settings.
You may need to experiment to find the sweet spot for your environment, as the capacity of the hardware will definitely play a role in situations like this, as will how you write your query.
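For illustration, here is a minimal sketch of that batching loop using the TinkerPop Java driver (assuming a remote GraphTraversalSource g is already configured; the batch size is just a starting point to tune, and the hasNot("type") filter comes straight from your own batched query):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality;

public class BatchUpdater {
    // Repeatedly updates one small batch until no unprocessed vertices remain.
    public static void updateAll(GraphTraversalSource g, int batchSize) {
        long updated;
        do {
            // hasNot("type") skips vertices already updated, so each
            // iteration picks up where the previous one left off.
            updated = g.V().hasLabel("user").has("status", "valid")
                    .hasNot("type")
                    .limit(batchSize)
                    .property(Cardinality.single, "type", "valid")
                    .count().next();
        } while (updated > 0);
    }
}

Start with something like updateAll(g, 1000) and raise the batch size while each call stays comfortably under the server's timeout.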
I successfully sent my application logs, which are in JSON format, to CloudWatch Logs using the CloudWatch Logs SDK, but I could not understand how to handle the constraints imposed by the endpoint.
Question 1: Documentation says
If you call PutLogEvents twice within a narrow time period using the
same value for sequenceToken, both calls may be successful, or one may
be rejected.
Now, what does "may" mean here? Is there no certain outcome?
Question 2:
The restriction is that 10,000 InputLogEvents are allowed in one batch. That is not too hard to handle in code, but there is a size constraint too: only 1 MB can be sent in one batch. Does that mean that every time I append an InputLogEvent to the batch, I need to calculate the size of the batch? Do I need to check both the number of InputLogEvents and the overall batch size when sending logs? Isn't that too cumbersome?
Question 3:
What happens if the batch hits the 1 MB limit partway through one of my InputLogEvents, say at its 100th character? I cannot simply send an incomplete log with just those 100 characters; I would have to take that InputLogEvent out of this batch entirely and send it as part of another batch?
Question 4:
With multiple Docker containers writing logs, there will be constant changes to the sequence token, and a lot of calls will fail because the sequence token keeps changing.
Question 5:
In the official POC they have not checked any of these constraints at all. Why is that?
PutBatchEvent POC
Am I thinking in the right direction?
Here's my understanding of how to use CloudWatch Logs. Hope this helps.
Question 1: I believe there is no guarantee, due to the nature of distributed systems: your two requests can land on the same cluster and one be rejected, or land on different clusters and both be accepted.
Question 2 & Question 3: For me, log events should always be small and put rapidly. Most logging frameworks help you configure this (batch size for AWS, number of lines for file logging, ...), so take a look at those frameworks. A sketch of doing the batching by hand follows below.
Question 4: Each of your containers (or any parallel application unit) should use and maintain its own sequenceToken, and each of them will get a separate log stream.
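Following up on Questions 2 and 3, here is a minimal hand-rolled batching sketch (AWS SDK for Java v1). The limits are taken from the PutLogEvents documentation: at most 10,000 events per batch and at most 1,048,576 bytes, where each event counts as its message's UTF-8 bytes plus 26 bytes of overhead. An event that would overflow the current batch is flushed into the next one whole, never truncated:

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.logs.model.InputLogEvent;

class LogBatcher {
    private static final int MAX_EVENTS = 10_000;          // hard cap per PutLogEvents call
    private static final int MAX_BATCH_BYTES = 1_048_576;  // 1 MiB cap per call
    private static final int PER_EVENT_OVERHEAD = 26;      // fixed per-event accounting cost

    private final List<InputLogEvent> batch = new ArrayList<>();
    private int batchBytes = 0;

    // Adds an event, flushing the current batch first if either limit would be exceeded.
    void add(InputLogEvent event) {
        int size = event.getMessage().getBytes(StandardCharsets.UTF_8).length + PER_EVENT_OVERHEAD;
        if (batch.size() + 1 > MAX_EVENTS || batchBytes + size > MAX_BATCH_BYTES) {
            flush(); // the event that didn't fit goes whole into the next batch
        }
        batch.add(event);
        batchBytes += size;
    }

    void flush() {
        if (batch.isEmpty()) {
            return;
        }
        // call PutLogEvents with 'batch' (and the stream's sequence token) here
        batch.clear();
        batchBytes = 0;
    }
}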
I've implemented a sensor scheduling problem using OptaPlanner 6.2 that has 1 hard constraint, 1 medium constraint, and 1 soft constraint. The trouble I'm having is that the solver satisfies only some of the hard constraints after 30 seconds or so, and then makes very little further progress on them with additional minutes until termination. I don't think the problem is over-constrained; I also don't know how to help the local search process significantly improve the score.
My problem is a scheduling one, where I precalculate, prior to solving, all possible times (intervals) at which a sensor can observe objects. I've modeled the problem as follows:
Hard constraint - no intervals can overlap
when
    $s1 : A( interval != null, $id : id, $doy : interval.doy, $interval : interval, $sensor : interval.getSensor() )
    exists A( id > $id, interval != null, $interval2 : interval, $interval2.getSensor() == $sensor, $interval2.getDoy() == $doy, $interval.getStartSlot() <= $interval2.getEndSlot(), $interval.getEndSlot() >= $interval2.getStartSlot() )
then
    scoreHolder.addHardConstraintMatch(kcontext, -10000);
Medium constraint - every assignment should have an Interval
when
    A( interval == null )
then
    scoreHolder.addMediumConstraintMatch(kcontext, -100);
Soft constraint - maximize a property/value in the Interval class
when
    $s1 : A( interval != null )
then
    scoreHolder.addSoftConstraintMatch(kcontext, -1 * $s1.getInterval().getSomeProperty());
A: the planning entity class; each instance is an assignment for a particular object (i.e. it has a member objectId that corresponds to one in the Interval class)
Interval: the planning variable class, holding all possible intervals (start time, stop time) for the sensors and objects
In A, I restrict access to Interval instances to just those for the object associated with that assignment. For my test case there are 40,000 or so Intervals, but only dozens for each object. There are about 1,100 instances of A (so dozens of possible Intervals for each one).
@PlanningVariable(valueRangeProviderRefs = {"intervalRange"},
        strengthComparatorClass = IntervalStrengthComparator.class, nullable = true)
public Interval getInterval() {
    return interval;
}

@ValueRangeProvider(id = "intervalRange")
public List<Interval> getPossibleIntervalList() {
    return task.getAvailableIntervals();
}
In my solution class:
// have tried commenting this out since the overall interval list does not apply to all A
@ValueRangeProvider(id = "intervalRangeAll")
public List<Interval> getIntervalList() {
    return intervals;
}

@PlanningEntityCollectionProperty
public List<A> getAList() {
    return AList;
}
I've reviewed the documentation and tried a lot of things. My problem is somewhat of a cross between the nurse rostering and course scheduling examples, which I have looked at. I am using the FIRST_FIT_DECREASING construction heuristic.
What I have tried:
Turning on and off nullable in the planning variable annotation for A.getInterval()
Late acceptance, Tabu, both.
Benchmarking; I wasn't seeing any obvious problems there.
Adding an IntervalChangeFactory as a moveListFactory, and restricting the custom ChangeMove based on whether the interval can be accepted or not (i.e. enforcing, or not, the hard constraint in IntervalChangeMove.isDoable).
Here is an example run, where most of the hard constraints are not satisfied, but the medium ones are:
[main] INFO org.optaplanner.core.impl.solver.DefaultSolver - Solving started: time spent (202), best score (0hard/-112500medium/0soft), environment mode (REPRODUCIBLE), random (WELL44497B with seed 987654321).
[main] INFO org.optaplanner.core.impl.constructionheuristic.DefaultConstructionHeuristicPhase - Construction Heuristic phase (0) ended: step total (1125), time spent (2296), best score (-9100000hard/0medium/-72608soft).
[main] INFO org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase - Local Search phase (1) ended: step total (92507), time spent (30000), best score (-8850000hard/0medium/-74721soft).
[main] INFO org.optaplanner.core.impl.solver.DefaultSolver - Solving ended: time spent (30000), best score (-8850000hard/0medium/-74721soft), average calculate count per second (5643), environment mode (REPRODUCIBLE).
So what I don't understand is why the hard constraints can't be dealt with by the search process. Also, my average calculate count per second has dropped below 10,000 due to all the tinkering I've done.
If it's not due to a score trap (see the docs; this is the first thing to fix), it's probably because the solver gets stuck in a local optimum and there are no moves that go from one feasible solution to another feasible solution except those that don't really change much. There are several ways to deal with that:
Add coarse-grained moves (but still leave the fine-grained moves such as ChangeMove in!). You can add generic coarse-grained moves (such as the pillar moves) or a custom Move; a rough sketch of one follows after this list. Don't start making smarter selectors that try to select only feasible moves; that's a bad idea, as it will kill your average calculate count or limit your search space. Just mix in coarse-grained moves (= diversification) to complement the fine-grained moves (= intensification).
A better domain model might help too. The Project Job Scheduling and Cheap Time scheduling examples have a domain model which naturally leads to a smaller search space while still allowing all feasible solutions.
Increase the Tabu list size or the Late Acceptance size, or use a hard constraint weight in the Simulated Annealing starting temperature. But I presume you've tried that with the benchmarker.
Turn on TRACE logging to see OptaPlanner's decision making. Only look at the part after it reaches the local optimum.
In the future, I'll also add Ruin&Recreate moves, which will require far less code than custom moves or a better domain model (but will be less efficient).
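To make the custom Move suggestion concrete, here is a rough sketch of a coarse-grained move for this model: swap the intervals of two A assignments. It is written against the 6.x Move interface (adjust imports and signatures to your exact version); the A/Interval names and getInterval()/setInterval() accessors follow the question, and a MoveListFactory like the asker's IntervalChangeFactory would be responsible for only pairing entities whose value ranges contain each other's intervals:

import java.util.Arrays;
import java.util.Collection;
import java.util.Objects;

import org.optaplanner.core.impl.heuristic.move.Move;
import org.optaplanner.core.impl.score.director.ScoreDirector;

public class IntervalSwapMove implements Move {

    private final A leftA;
    private final A rightA;

    public IntervalSwapMove(A leftA, A rightA) {
        this.leftA = leftA;
        this.rightA = rightA;
    }

    @Override
    public boolean isMoveDoable(ScoreDirector scoreDirector) {
        // Swapping two identical intervals would change nothing.
        return !Objects.equals(leftA.getInterval(), rightA.getInterval());
    }

    @Override
    public Move createUndoMove(ScoreDirector scoreDirector) {
        return new IntervalSwapMove(rightA, leftA);
    }

    @Override
    public void doMove(ScoreDirector scoreDirector) {
        Interval oldLeft = leftA.getInterval();
        Interval oldRight = rightA.getInterval();
        // Always notify the ScoreDirector around every planning variable change.
        scoreDirector.beforeVariableChanged(leftA, "interval");
        leftA.setInterval(oldRight);
        scoreDirector.afterVariableChanged(leftA, "interval");
        scoreDirector.beforeVariableChanged(rightA, "interval");
        rightA.setInterval(oldLeft);
        scoreDirector.afterVariableChanged(rightA, "interval");
    }

    @Override
    public Collection<? extends Object> getPlanningEntities() {
        return Arrays.asList(leftA, rightA);
    }

    @Override
    public Collection<? extends Object> getPlanningValues() {
        return Arrays.asList(leftA.getInterval(), rightA.getInterval());
    }
}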
Suppose I have a network of N nodes, each with a unique identity (e.g. a public key), communicating via a protocol with no central server (e.g. DHT, Kad). Each node stores a variable V. With reference to e-voting as an easy example, that variable could be the name of a candidate.
Now I want to execute an "aggregation" function on all V variables available in the network. With reference to e-voting example, I want to count votes.
My question is purely theoretical (I have to prove a statement; details at the end of the question), so please don't focus on e-voting and all of its security aspects. Do I have to say it again? Don't answer that "a node may have any number of identities by generating more keys", "IPs can be traced back", etc., because that's another matter.
Let's see the distributed aggregation only from the privacy point of view.
THE question
Is it possible, in the general case, for a node to compute a function of variables stored at other nodes without those values becoming associated with the nodes' identities? Have researchers designed such a privacy-aware distributed algorithm?
I'm only dealing with privacy aspects, not general security!
Current thoughts
My current answer is no. So I claim that a central server, which obtains all the Vs and processes them without storing them, is necessary, and that there are more legal than technical means to ensure that no individual node's data is either stored or retransmitted by the central server. I'm asking you to prove that statement false :)
In the e-voting example, I think it's impossible to count how many people voted for Alice and Bob without asking all the nodes, one by one, "Hey, who did you vote for?"
Real case
I'm doing research in the Personal Data Store field. Suppose you store your call log in the PDS and somebody wants to compute statistics about the phone calls (i.e. mean duration, number of calls per day, variance, standard deviation) without either aggregated or individual data about any single person being revealed (that is, nobody must learn whom I call, nor even my own mean call duration).
If a trusted broker exists, and everybody trusts it, that node can expose a double getMeanCallDuration() API that first invokes CallRecord[] getCalls() on every PDS in the network and then computes statistics over all rows. Without the central trusted broker, each PDS exposing double getMyMeanCallDuration() isn't statistically usable (the mean of the means isn't the mean of all: a node averaging 2 minutes over 3 calls and a node averaging 10 minutes over 1 call have an overall mean of 4 minutes, not 6) and, most importantly, reveals the identity of the single user.
Yes, it is possible. There is work that answers your question, given some assumptions. Check the following paper: Privacy, efficiency & fault tolerance in aggregate computations on massive star networks.
You can have some computation (for example, summing) over a group of nodes performed at another node without the participating nodes revealing any data to each other, or even to the node doing the computation. After the computation, everyone learns the result (but no one learns any individual data besides their own, which they already knew anyway). The paper describes the protocol and proves its security (and the protocol itself gives you the privacy level I just described).
As for protecting the identity of the nodes to unlink their value from their identity, that would be another problem. You could use anonymous credentials (check this: https://idemix.wordpress.com/2009/08/18/quick-intro-to-credentials/) or something alike to show that you are who you are without revealing your identity (in a distributed scenario).
The catch of this protocol is that you need a semi-trusted node to do the computation. A fully distributed protocol (for example, in a P2P network scenario) is not that easy, though. Not because of a lack of storage (you can have a DHT, for example), but because you need to replace that trusted or semi-trusted node with the network itself, and that is where the issues appear: who does the computation? Why that node and not another one? What if there is collusion? Etc.
How about having each node publish two values x and y, such that
x - y = v
Assuming that I can emit x and y independently, you can correctly compute the overall sum and mean, while every single message on its own is largely worthless.
So for the voting example and candidates X, Y, Z, I might have one identity publishing the vote
+2 -1 +3
and my second identity publishes the vote:
-2 +2 -3
But of course you cannot verify that I didn't vote multiple times anymore.
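To make the arithmetic concrete, here is a toy sketch of this splitting idea (with signs flipped so that v = x + y; the vote values and the two-identities setup are purely illustrative). Each published number alone is uniformly random, yet the grand total equals the true sum:

import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

public class AdditiveShares {
    private static final SecureRandom RNG = new SecureRandom();

    // Split v into two random-looking shares; (v - r) + r == v, and Java's
    // wrapping long arithmetic keeps the sum exact even on overflow.
    static long[] split(long v) {
        long r = RNG.nextLong();
        return new long[] { v - r, r };
    }

    public static void main(String[] args) {
        long[] votes = { 1, 0, 1, 1, 0 }; // each node's private value v
        List<Long> published = new ArrayList<>();
        for (long v : votes) {
            long[] shares = split(v);
            published.add(shares[0]); // emitted under one identity
            published.add(shares[1]); // emitted under a second identity
        }
        // Any single published number is worthless, but the total is exact.
        long total = published.stream().mapToLong(Long::longValue).sum();
        System.out.println("sum of votes = " + total); // prints 3
    }
}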
I would like to give customers a random-looking order number but use 0, 1, 2, ... in the backend. That way the customer gets a non-password-protected order status URL containing the encrypted order number, and they cannot look at other customers' order numbers by adding or subtracting 1. This might replace a scheme where random order keys are generated, checked for uniqueness among all previous orders, and regenerated until unique. When the web server gets a request to view an order, it decrypts the order number and retrieves the order.
To keep the URL short, what "good" encryption algorithm has the shortest block size? Is this scheme a good idea? (What if I were encrypting Apple, Inc. employee IDs to keep Steve Jobs from asking for Employee #0?)
Observe that all the package tracking websites allow you to track packages without authentication. It would be fine to limit the amount of information shown on the password-free order status page.
Most block ciphers use blocks larger than 32 bits, for security reasons.
However, I found one that is made specifically for what you are doing: Skip32.
You may consider using a GUID, but perhaps you have reasons to avoid that (say, your app is already done).
Edit:
Actually, if a GUID is permissible, that gives you a range of 128 bits, so you could easily use any other block cipher. The benefit of the larger space (at the cost of long ID strings) is much more protection against people guessing IDs. (Not that an order ID by itself should be a security token anyway...)
If your idea is that just knowing the order number (or URL) is enough to get information on the order then:
The order number space needs to be extremely large; otherwise attackers and/or customers will conceivably search it, to see what can be seen.
You should consider that an attacker may launch gradual probing from numerous machines, and may be patient.
Probing the order number space can be mitigated by rate limiting, but that's very hard to apply in a web environment: it's hard to distinguish your customers' accesses from an attacker's.
Consider also that the order number is not much of a secret; people could be sending it around in emails, and once it's out, it's impossible to retract.
So, for the convenience of one-click check-my-order-without-logging-in, you have created a permanent security risk.
Even if you make the order number space huge, you still have the problem that those URLs are floating around out there, maybe in possession of folks who shouldn't have gotten them.
It would be much much better to require a login session in order to see anything, then only show them the orders they're authorized to see. Then you don't have to worry about hiding order numbers or attackers guessing order numbers, because just the order number isn't enough information to access anything.
Recently I started using the Hashids set of small libraries. The idea is to encode a number or a list of numbers into a hashed-looking string, like:
12345 => "NkK9"
[683, 94108, 123, 5] => "aBMswoO2UB3Sj"
The libraries are implemented in popular programming languages by various authors. They are also cross-compatible, which means you can encode a number in Python and then decode it in JavaScript. Hashids supports salts, custom alphabets, and even exclusion of bad words.
Python:
hashids = Hashids(salt="this is my salt")
id = hashids.encode(683, 94108, 123, 5)
JS:
var hashids = new Hashids("this is my salt"),
numbers = hashids.decode("aBMswoO2UB3Sj");
This is not government-grade encryption, but it is totally sufficient for non-predictable permalinks on sharing sites.
Issues of whether you should actually be doing this aside, here's a very simple block cipher with a fixed key (since you only seem to need one permutation anyway).
static uint permute(uint id)
{
    // Three-round Feistel network over the two 16-bit halves of the id,
    // with fixed round keys 0x79b9, 0xf372 and 0x6d2b.
    uint R = id & 0xFFFF;
    uint L = (id >> 16) ^ (((((R >> 5) ^ (R << 2)) + ((R >> 3) ^ (R << 4))) ^ ((R ^ 0x79b9) + R)) & 0xFFFF);
    R ^= ((((L >> 5) ^ (L << 2)) + ((L >> 3) ^ (L << 4))) ^ ((L ^ 0xf372) + L)) & 0xFFFF;
    return ((L ^ ((((R >> 5) ^ (R << 2)) + ((R >> 3) ^ (R << 4))) ^ ((R ^ 0x6d2b) + R))) << 16) | R;
}
Skip32 is much better as far as 32-bit block ciphers go, but it's a bit heavyweight when three (long) lines would do. :-)
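Since the question's scheme needs to decrypt the number back out of the URL, here is the same permutation together with its inverse (just run the Feistel rounds in reverse order); a straight translation to Java, with explicit 16-bit masks in place of C#'s uint truncation and the round keys copied from the snippet above:

public class OrderPermutation {
    // Round function on a 16-bit half; k is the fixed round key.
    private static int f(int x, int k) {
        return (((x >>> 5) ^ (x << 2)) + ((x >>> 3) ^ (x << 4))) ^ ((x ^ k) + x);
    }

    static int permute(int id) {
        int r = id & 0xFFFF;
        int l = ((id >>> 16) ^ f(r, 0x79b9)) & 0xFFFF;
        r = (r ^ f(l, 0xf372)) & 0xFFFF;
        l = (l ^ f(r, 0x6d2b)) & 0xFFFF;
        return (l << 16) | r;
    }

    // Apply the rounds in reverse order to invert the permutation.
    static int unpermute(int p) {
        int r = p & 0xFFFF;
        int l = ((p >>> 16) ^ f(r, 0x6d2b)) & 0xFFFF;
        r = (r ^ f(l, 0xf372)) & 0xFFFF;
        l = (l ^ f(r, 0x79b9)) & 0xFFFF;
        return (l << 16) | r;
    }

    public static void main(String[] args) {
        int order = 7;
        int token = permute(order);
        System.out.printf("%d -> %08x -> %d%n", order, token, unpermute(token));
    }
}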
I prototyped this idea using Blowfish, a block cipher with 64-bit blocks.
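For reference, a minimal sketch of such a Blowfish prototype with the JCE (the 16-byte key here is a placeholder; a real deployment would load a secret key from configuration): encrypt the 64-bit order id as exactly one block, hex-encode it for the URL, and decrypt on the way back in:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class OrderToken {
    public static void main(String[] args) throws Exception {
        // Placeholder key for illustration only.
        SecretKeySpec key = new SecretKeySpec(
                "change-this-key!".getBytes(StandardCharsets.US_ASCII), "Blowfish");

        Cipher enc = Cipher.getInstance("Blowfish/ECB/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key);
        long orderId = 42L;
        byte[] block = ByteBuffer.allocate(8).putLong(orderId).array(); // exactly one 64-bit block
        byte[] token = enc.doFinal(block);
        System.out.println("/orders/" + String.format("%016x", ByteBuffer.wrap(token).getLong()));

        Cipher dec = Cipher.getInstance("Blowfish/ECB/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key);
        long back = ByteBuffer.wrap(dec.doFinal(token)).getLong();
        System.out.println("decrypted order id = " + back); // 42
    }
}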
I don't think this scheme is that great an idea. Why aren't you verifying that a user is logged in and has access to view the specified order?
If you REALLY want to just have all orders out there without any authentication, a GUID would be best.
Or you could come up with order numbers prefixed with something about the customer, like (PhoneNumber)(1...100).
To meet the requirement you could simply use a hash such as SHA-1 or MD5 on your indexes. These will provide the adequate security you require.
To bring down the size you can switch to a different encoding, such as Base64.
I'd also very strongly recommend using a salt; otherwise the hash values could easily be broken.
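A minimal sketch of this approach (the salt value is a hypothetical placeholder). One caveat: a hash is one-way, so unlike a block cipher the server cannot decrypt the token; it would have to store the token alongside the order (or recompute the hash from a candidate id) to resolve lookups:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class HashedOrderToken {
    private static final String SALT = "some-server-side-secret"; // hypothetical salt

    static String token(long orderId) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest((SALT + orderId).getBytes(StandardCharsets.UTF_8));
        // URL-safe Base64 keeps the token short; padding is dropped for a cleaner URL.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("/orders/" + token(12345L));
    }
}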