How to calculate mask from TCP port range - tcp

I'm trying to wrap my head around calculating a mask value which can denote L4 port range. Suppose the input range is 500-1000. Given the mask and lower value, how can the upper value be calculated?

Port masking typically follows the format: <port>/<mask>. An example of port masking can be found in the OvS ovs-ofctl manual:
https://www.openvswitch.org/support/dist-docs-2.5/ovs-ofctl.8.html
Excerpt:
...
tcp_src=port/mask
tcp_dst=port/mask
udp_src=port/mask
udp_dst=port/mask
sctp_src=port/mask
sctp_dst=port/mask
Bitwise match on TCP (or UDP or SCTP) source or destination
port. The port and mask are 16-bit numbers written in decimal
or in hexadecimal prefixed by 0x. Each 1-bit in mask requires
that the corresponding bit in port must match. Each 0-bit in
mask causes the corresponding bit to be ignored.
Bitwise matches on transport ports are rarely useful in isolation,
but a group of them can be used to reduce the number of
flows required to match on a range of transport ports. For
example, suppose that the goal is to match TCP source ports 1000
to 1999, inclusive. One way is to insert 1000 flows, each of
which matches on a single source port. Another way is to look
at the binary representations of 1000 and 1999, as follows:
01111101000
11111001111
and then to transform those into a series of bitwise matches
that accomplish the same results:
01111101xxx
0111111xxxx
10xxxxxxxxx
110xxxxxxxx
1110xxxxxxx
11110xxxxxx
1111100xxxx
which become the following when written in the syntax required
by ovs-ofctl:
tcp,tcp_src=0x03e8/0xfff8
tcp,tcp_src=0x03f0/0xfff0
tcp,tcp_src=0x0400/0xfe00
tcp,tcp_src=0x0600/0xff00
tcp,tcp_src=0x0700/0xff80
tcp,tcp_src=0x0780/0xffc0
tcp,tcp_src=0x07c0/0xfff0
Only Open vSwitch 1.6 and later supports bitwise matching on
transport ports.
Like the exact-match forms described above, the bitwise match
forms apply only when dl_type and nw_proto specify TCP or UDP or
SCTP.
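The matching rule quoted above can be checked mechanically. The following is a minimal Python sketch, not part of ovs-ofctl: the helper name `matches` is an assumption introduced for illustration, and the seven port/mask pairs are copied from the excerpt. It enumerates every 16-bit port and confirms that the seven bitwise matches together cover exactly 1000 to 1999.

```python
# The seven port/mask pairs from the ovs-ofctl excerpt.
MATCHES = [
    (0x03e8, 0xfff8),
    (0x03f0, 0xfff0),
    (0x0400, 0xfe00),
    (0x0600, 0xff00),
    (0x0700, 0xff80),
    (0x0780, 0xffc0),
    (0x07c0, 0xfff0),
]


def matches(candidate, port, mask):
    """True if candidate agrees with port on every 1-bit of mask."""
    return (candidate & mask) == (port & mask)


# Collect every 16-bit port accepted by at least one of the matches.
covered = [p for p in range(0x10000)
           if any(matches(p, port, mask) for port, mask in MATCHES)]
print(covered[0], covered[-1], len(covered))  # 1000 1999 1000
```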
The following python3 mask_range function returns a list of port masks given a start and end port number.
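# port_mask.py
LIMIT = 65535  # largest 16-bit port number, also the all-ones mask


def max_port(port, mask):
    """Return the highest port matched by port/mask."""
    xid = LIMIT - mask  # the bits the mask ignores
    nid = port & mask   # the bits the mask fixes
    return nid + xid


def port_mask(port, end):
    """Return the mask of the largest block that starts at port and does not pass end."""
    bit = 1
    mask = LIMIT
    test_mask = LIMIT
    net = port & LIMIT
    max_p = max_port(net, LIMIT)
    while net and max_p < end:
        net = port & test_mask
        if net < port:
            # Widening further would match ports below the start.
            break
        max_p = max_port(net, test_mask)
        if max_p <= end:
            mask = test_mask
        test_mask -= bit
        bit <<= 1
    return mask


def mask_range(start, end):
    """Return 'port/mask' strings that together cover start..end inclusive."""
    port_masks = []
    if end <= start or end > LIMIT:
        raise ValueError('expected start < end <= 65535')
    port = start
    while port <= end:
        mask = port_mask(port, end)
        port_masks.append(f'{hex(port)}/{hex(mask)}')
        port = max_port(port, mask) + 1
    return port_masks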
Example:
# test.py
from port_mask import mask_range

if __name__ == '__main__':
    masks = mask_range(1000, 1999)
    print(f'1000-1999: {masks}')
Outputs:
1000-1999: ['0x3e8/0xfff8', '0x3f0/0xfff0', '0x400/0xfe00', '0x600/0xff00', '0x700/0xff80', '0x780/0xffc0', '0x7c0/0xfff0']
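To go the other way, from a port/mask entry back to the range it covers, the upper value is the masked port with every ignored bit set to 1. The sketch below assumes prefix-style masks like the ones in the output above (1-bits followed by 0-bits); `upper_bound` and `expand` are names introduced here for illustration.

```python
PORT_BITS = 0xFFFF  # ports are 16-bit values


def upper_bound(port, mask):
    """Highest port matched by port/mask: fixed bits kept, ignored bits set."""
    return (port & mask) | (~mask & PORT_BITS)


def expand(entry):
    """Turn a 'port/mask' string in hex into its (lower, upper) range."""
    port_s, mask_s = entry.split('/')
    port, mask = int(port_s, 16), int(mask_s, 16)
    return port & mask, upper_bound(port, mask)


print(expand('0x3e8/0xfff8'))  # (1000, 1007)
print(expand('0x400/0xfe00'))  # (1024, 1535)
```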


Golang &(*(&v)) semantic

I faced an issue today and was able to find it and fix it quickly, but I do not fully understand why Go's semantics are like that.
I'm using Go 1.10.
package main

import "fmt"

type T struct {
    V int
}

var testT = T{}

func main() {
    t := &(*(&testT))
    t.V = 4
    fmt.Println(t, testT) // testT.V == t.V -> t == &testT

    t1 := &testT
    t2 := *t1
    t3 := &t2
    t3.V = 5
    fmt.Println(t3, testT) // t3.V == 5 and testT.V == 4 -> t3 != &testT
}
Output
&{4} {4}
&{5} {4}
I was expecting t not to be equal to &testT, i.e. to have the same semantics as t3. Instead, I see that the &(*(&testT)) sequence does not behave the same way as when I store the intermediate results in variables.
My question
What is the reason for that behaviour?
When you do this:
t1 := &testT
t2 := *t1
t3 := &t2
t3.V = 5
You take the address of testT, store it in t1. Then in the next line a new, distinct variable t2 is created which will have a different memory space and address than that of t1 or testT. Then t3 will store the address of this new, distinct variable, which is independent of t1 or testT.
When you do this:
t := &(*(&testT))
You take the address of testT, then you dereference the pointer (you get testT "back"), then you again take the address of this value which will be the address of testT, there is no new variable created. So t will point to testT.
This is normal and logical, nothing surprising is in it. Relevant section from the spec: Address operators:
For an operand x of pointer type *T, the pointer indirection *x denotes the variable of type T pointed to by x.
So &testT is the address of the variable testT, and *(&testT) will give you back the testT variable. Taking its address again will be identical to &testT.
What may suggest otherwise is taking the address of a composite literal. Spec: Composite literals:
Taking the address of a composite literal generates a pointer to a unique variable initialized with the literal's value.
When you take the address of a composite literal (e.g. &image.Point{}), that does create a new, anonymous variable under the hood, and the address of that anonymous variable will be the result of the expression. But taking the address of a variable does not create a new variable.

Hash collisions for golang built-in map and string keys?

I wrote this function to generate random unique id's for my test cases:
func uuid(t *testing.T) string {
    uidCounterLock.Lock()
    defer uidCounterLock.Unlock()
    uidCounter++
    //return "[" + t.Name() + "|" + strconv.FormatInt(uidCounter, 10) + "]"
    return "[" + t.Name() + "|" + string(uidCounter) + "]"
}

var uidCounter int64 = 1
var uidCounterLock sync.Mutex
In order to test it, I generate a bunch of values from it in different goroutines, send them to the main thread, which puts the result in a map[string]int by doing map[v] = map[v] + 1. There is no concurrent access to this map, it's private to the main thread.
var seen = make(map[string]int)
for v := range ch {
    seen[v] = seen[v] + 1
    if count := seen[v]; count > 1 {
        fmt.Printf("Generated the same uuid %d times: %#v\n", count, v)
    }
}
When I just cast the uidCounter to a string, I get a ton of collisions on a single key. When I use strconv.FormatInt, I get no collisions at all.
When I say a ton, I mean I just got 1115919 collisions for the value [TestUuidIsUnique|�] out of 2227980 generated values, i.e. 50% of the values collide on the same key. The values are not equal. I do always get the same number of collisions for the same source code, so at least it's somewhat deterministic, i.e. probably not related to race conditions.
I'm not surprised integer overflow in a rune would be an issue, but I'm nowhere near 2^31, and that wouldn't explain why the map thinks 50% of the values have the same key. Also, I wouldn't expect a hash collision to impact correctness, just performance, since I can iterate over the keys in a map, so the values are stored there somewhere.
In the output, all runes printed are 0xEFBFBD. It's the same number of bits as the highest valid unicode code point, but that doesn't really match either.
Generated the same uuid 2 times: "[TestUuidIsUnique|�]"
Generated the same uuid 3 times: "[TestUuidIsUnique|�]"
Generated the same uuid 4 times: "[TestUuidIsUnique|�]"
Generated the same uuid 5 times: "[TestUuidIsUnique|�]"
...
Generated the same uuid 2047 times: "[TestUuidIsUnique|�]"
Generated the same uuid 2048 times: "[TestUuidIsUnique|�]"
Generated the same uuid 2049 times: "[TestUuidIsUnique|�]"
...
What's going on here? Did the go authors assume that hash(a) == hash(b) implies a == b for strings? Or am I just missing something silly? go test -race isn't complaining either.
I'm on macOS 10.13.2, and go version go1.9.2 darwin/amd64.
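String conversion of an invalid rune returns a string containing the Unicode replacement character: "�" (U+FFFD, whose UTF-8 encoding is the bytes 0xEF 0xBF 0xBD). Counter values that are not valid code points, such as anything above 0x10FFFF, therefore all produce the same string.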
Use the strconv package to convert an integer to text.
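The collision count in the question can be reproduced with a little arithmetic. The Python sketch below only imitates Go's integer-to-string conversion rule (values that are not valid code points, including surrogates, become U+FFFD); the function name `go_string_of_int` is made up for illustration. It assumes the counter took the values 2 through 2227981, i.e. 2227980 calls with the increment happening before each use.

```python
REPLACEMENT = 0xFFFD


def go_string_of_int(v):
    """Imitate Go's string(int): invalid code points become U+FFFD."""
    invalid = v < 0 or v > 0x10FFFF or 0xD800 <= v <= 0xDFFF
    return chr(REPLACEMENT if invalid else v)


# Assumption: 2227980 calls, counter starting at 1 and incremented before use.
values = range(2, 2227980 + 2)
same = sum(1 for v in values if go_string_of_int(v) == "\ufffd")
print(same)  # 1115919
```

Under that assumption the total is made up of 2048 surrogates, U+FFFD itself, and every value above 0x10FFFF, which matches the 1115919 reported in the question. The map is behaving correctly: the keys really are equal strings.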

openresty: convert int64 to string

I am using openresty/1.7.7.2 with Lua 5.1.4. I receive an int64 in the request, and I have its string form saved in the DB (I can't change the DB schema or the request format). I am not able to match the two.
local i = 913034578410143848 --request
local p = "913034578410143848" -- stored in DB
print(p==tostring(i)) -- return false
print(i%10) -- return 0 ..this also doesn't work
Is there a way to convert int64 to string and vice versa if possible?
update:
I am getting i from a protobuf object. The proto file describes i as int64. I am using the pb4lua library for protobuf.
ngx.req.read_body()
local body = ngx.req.get_body_data()
local request, err = Request:load(body)
local i = request.id
Lua 5.1 stores numbers as doubles by default, so it cannot represent every integer value larger than 2^53 exactly.
Number literals are no exception, so you cannot just write
local i = 913034578410143848.
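The precision loss is a property of double-precision floats, not of Lua itself, so it can be shown in any language. A small Python sketch, using `float` as a stand-in for a Lua 5.1 number (the variable names are illustrative):

```python
n = 913034578410143848   # the id from the question
as_double = float(n)     # stand-in for what a Lua 5.1 number can hold

# Above 2^53 a double no longer has a slot for every integer.
print(int(as_double) == n)                 # False
print(float(2**53) == float(2**53 + 1))    # True

# Comparing the decimal text avoids the problem entirely.
print(str(n) == "913034578410143848")      # True
```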
LuaJIT, however, can represent int64 values as boxed values (cdata).
There are also Lua libraries for dealing with large numbers,
e.g. the bn library.
I do not know how your pb4lua handles this problem.
The lua-pb library, for example, uses LuaJIT boxed values.
It also provides a way to specify a user-defined callback that creates the int64 value.
First, I suggest figuring out the real type of your i value (use the type function).
Everything else depends on that.
If it is a number, then I think pb4lua simply loses some information.
Maybe it just returns a string, in which case you can compare it as a string.
If it provides LuaJIT cdata, then this is a basic function to convert a string
to a 64-bit integer value.
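local ffi = require("ffi")

local function to_jit_uint64(str)
    -- the first 9 digits fit in a Lua number without loss
    local v = tonumber(string.sub(str, 1, 9))
    v = ffi.new('uint64_t', v)
    if #str > 9 then
        -- shift by the number of remaining digits, then add them
        str = string.sub(str, 10)
        v = v * (10 ^ #str) + tonumber(str)
    end
    return v
end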

One processing conduit, 2 IO sources of the same type

In my GHC Haskell application utilizing stm, network-conduit and conduit, I have a strand for each socket which is forked automatically using runTCPServer. Strands can communicate with other strands through the use of a broadcasting TChan.
This showcases how I would like to set up the conduit "chain":
So, what we have here is two sources (each bound to helper conduits) which produce a Packet object which encoder will accept and turn into ByteString, then send out the socket. I've had a great amount of difficulty with the efficient (performance is a concern) fusing of the two inputs.
I would appreciate if somebody could point me in the right direction.
Since it would be rude of me to post this question without making an attempt, I'll put what I've previously tried here;
I've written/cherrypicked a function which (blocking) produces a Source from a TMChan (closeable channel);
-- | Takes a generic type of STM chan and, given read and close functionality,
--   returns a conduit 'Source' which consumes the elements of the channel.
chanSource
    :: (MonadIO m, MonadSTM m)
    => a                    -- ^ The channel
    -> (a -> STM (Maybe b)) -- ^ The read function
    -> (a -> STM ())        -- ^ The close/finalizer function
    -> Source m b
chanSource ch readCh closeCh = ConduitM pull
  where close     = liftSTM $ closeCh ch
        pull      = PipeM $ liftSTM $ readCh ch >>= translate
        translate = return . maybe (Done ()) (HaveOutput pull close)
Likewise, a function to transform a Chan into a sink;
-- | Takes a stream and, given write and close functionality, returns a sink
--   which will consume elements and broadcast them into the channel
chanSink
    :: (MonadIO m, MonadSTM m)
    => a                 -- ^ The channel
    -> (a -> b -> STM()) -- ^ The write function
    -> (a -> STM())      -- ^ The close/finalizer function
    -> Sink b m ()
chanSink ch writeCh closeCh = ConduitM sink
  where close  = const . liftSTM $ closeCh ch
        sink   = NeedInput push close
        write  = liftSTM . writeCh ch
        push x = PipeM $ write x >> return sink
Then mergeSources is straightforward; fork 2 threads (which I really don't want to do, but what the heck) which can put their new items into the one list which I then produce a source of;
-- | Merges a list of 'Source' objects, sinking them into a 'TMChan' and returns
--   a source which consumes the elements of the channel.
mergeSources
    :: (MonadIO m, MonadBaseControl IO m, MonadSTM m)
    => [Source (ResourceT m) a]             -- ^ The list of sources
    -> ResourceT m (Source (ResourceT m) a)
mergeSources sx = liftSTM newTMChan >>= liftA2 (>>) (fsrc sx) retn
  where push c s = s $$ chanSink c writeTMChan closeTMChan
        fsrc x c = mapM_ (\s -> resourceForkIO $ push c s) x
        retn c   = return $ chanSource c readTMChan closeTMChan
While I was successful in making these functions typecheck, I was unsuccessful in getting any use of them to typecheck;
-- | Helper which represents a conduit chain for each client connection
serverApp :: Application SessionIO
serverApp appdata = do
    use ssBroadcast >>= liftIO . atomically . dupTMChan >>= assign ssBroadcast
    -- appSource appdata $$ decoder $= protocol =$= encoder =$ appSink appdata
    mergsrc $$ protocol $= encoder =$ appSink appdata
  where chansrc = chanSource (use ssBroadcast) readTMChan closeTMChan
        mergsrc = mergeSources [appSource appdata $= decoder, chansrc]

-- | Structure which holds mutable information for clients
data SessionState = SessionState
    { _ssBroadcast :: TMChan Packet -- ^ Outbound packet broadcast channel
    }

makeLenses ''SessionState

-- | A transformer encompassing both SessionReader and SessionState
type Session m = ReaderT SessionReader (StateT SessionState m)

-- | Macro providing Session applied to an IO monad
type SessionIO = Session IO
I see this method as being flawed anyhow -- there are many intermediate lists and conversions. This can not be good for performance. Seeking guidance.
PS. From what I can understand, this is not a duplicate of "Fusing conduits with multiple inputs", as in my situation both sources produce the same type and I don't care from which source the Packet object is produced, as long as I'm not waiting on one while another has objects ready to be consumed.
PPS. I apologize for the usage (and therefore requirement of knowledge) of Lens in example code.
I don't know if it's any help, but I tried to implement Iain's suggestion and made a variant of mergeSources' that stops as soon as any of the channels does:
mergeSources' :: (MonadIO m, MonadBaseControl IO m)
              => [Source (ResourceT m) a] -- ^ The sources to merge.
              -> Int                      -- ^ The bound of the intermediate channel.
              -> ResourceT m (Source (ResourceT m) a)
mergeSources' sx bound = do
    c <- liftSTM $ newTBMChan bound
    mapM_ (\s -> resourceForkIO $
                    s $$ chanSink c writeTBMChan closeTBMChan) sx
    return $ sourceTBMChan c
(This simple addition is available here).
Some comments on your version of mergeSources (take them with a grain of salt; it may be that I misunderstood something):
Using ...TMChan instead of ...TBMChan seems dangerous. If the writers are faster than the reader, your heap will blow up. Looking at your diagram, it seems that this can easily happen if your TCP peer doesn't read data fast enough. So I'd definitely use ...TBMChan with a perhaps large but limited bound.
You don't need the MonadSTM m constraint. All STM stuff is wrapped into IO with
liftSTM = liftIO . atomically
Maybe this will help you slightly when using mergeSources' in serverApp.
Just a cosmetic issue, I found
liftSTM newTMChan >>= liftA2 (>>) (fsrc sx) retn
very hard to read due to its use of liftA2 on the (->) r monad. I'd say
do
    c <- liftSTM newTMChan
    fsrc sx c
    retn c
would be longer, but much easier to read.
Could you perhaps create a self-contained project where it would be possible to play with serverApp?
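The point about a bounded intermediate channel is not specific to Haskell. As a rough, language-neutral illustration (Python threads and `queue.Queue`, not the conduit or stm-chans API), the sketch below merges several sources through one bounded queue, so fast producers block instead of growing the heap. The names `merge_sources` and `pump` are made up here, and unlike mergeSources' it finishes only when all sources are exhausted rather than when the first one ends.

```python
import queue
import threading

_DONE = object()  # sentinel: one source has been exhausted


def merge_sources(sources, bound):
    """Yield items from all iterables in arrival order via a bounded queue."""
    q = queue.Queue(maxsize=bound)  # producers block while the queue is full

    def pump(src):
        try:
            for item in src:
                q.put(item)
        finally:
            q.put(_DONE)

    for src in sources:
        threading.Thread(target=pump, args=(src,), daemon=True).start()

    remaining = len(sources)
    while remaining:
        item = q.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item


merged = list(merge_sources([range(0, 5), range(100, 105)], bound=2))
print(sorted(merged))
```

The interleaving of the two sources depends on thread scheduling, which is why the example sorts the result before printing.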

Translate binary string to mathematical expression

I've been experimenting with genetic algorithms as of late and now I'd like to build mathematical expressions out of the genomes (put simply, it's to find an expression that matches a certain outcome).
I have genomes consisting of genes which are represented by bytes. One genome can look like this: {12, 127, 82, 35, 95, 223, 85, 4, 213, 228}. The length is not predefined (although it must fall in a certain range), and neither is the form it takes. That is, any entry can take any byte value.
Now the trick is to translate this to mathematical expressions. It's fairly easy to determine basic expressions, for example: pick the first 2 values and treat them as operands, pick the 3rd value as an operator (+, -, *, /, ^, mod), pick the 4th value as an operand and pick the 5th value as an operator again, working over the result of the 3rd operator over the first 2 operands (or just handle it as a postfix expression).
The complexity rises when you start allowing priority rules. Now when, for example, the entry under index 2 represents a '(', you're bound to have a ')' somewhere further on, except for entry 3, but not necessarily entry 4.
Of course the same goes for many things: you can't end up with an operator at the end, you can't end up with a loose number, etc.
Now I can make a HUGE switch statement (for example) taking in all the possibilities, but this will make the code unreadable. I was hoping someone out there knows a good strategy for taking this one on.
Thanks in advance!
** EDIT **
On request: The goal I'm trying to achieve is to make an application which can resolve a function for a set of numbers. As for the example I've given in the comment below: {4, 11, 30} and it might come up with the function (X ^ 3) + X
Belisarius in a comment gave a link to an identical topic: Algorithm for permutations of operators and operands
My code:
private static double ResolveExpression(byte[] genes, double valueForX)
{
    // following: https://stackoverflow.com/questions/3947937/algorithm-for-permutations-of-operators-and-operands/3948113#3948113
    Stack<double> operandStack = new Stack<double>();
    for (int index = 0; index < genes.Length; index++)
    {
        int genesLeft = genes.Length - index;
        byte gene = genes[index];
        bool createOperand;
        // only when there are enough possible operators left, possibly add operands
        if (genesLeft > operandStack.Count)
        {
            // only when there are at least 2 operands on the stack
            if (operandStack.Count >= 2)
            {
                // determine whether to create an operand by treating everything below 127 as an operand and the rest as an operator (better than / 2 due to 0 values)
                createOperand = gene < byte.MaxValue / 2;
            }
            else
            {
                // else we need an operand for sure since an operator is illegal
                createOperand = true;
            }
        }
        else
        {
            // false for sure since there are too many operands to complete otherwise
            createOperand = false;
        }
        if (createOperand)
        {
            operandStack.Push(GeneToOperand(gene, valueForX));
        }
        else
        {
            double left = operandStack.Pop();
            double right = operandStack.Pop();
            double result = PerformOperator(gene, left, right);
            operandStack.Push(result);
        }
    }
    // should be 1 operand left on the stack which is the ending result
    return operandStack.Pop();
}
private static double PerformOperator(byte gene, double left, double right)
{
    // There are 6 options currently supported, namely: +, -, *, /, ^ and log (math)
    int code = gene % 6;
    switch (code)
    {
        case 0:
            return left + right;
        case 1:
            return left - right;
        case 2:
            return left * right;
        case 3:
            return left / right;
        case 4:
            return Math.Pow(left, right);
        case 5:
            return Math.Log(left, right);
        default:
            throw new InvalidOperationException("Impossible state");
    }
}
private static double GeneToOperand(byte gene, double valueForX)
{
    // We only support numbers 0 - 9 and X
    int code = gene % 11; // Get a value between 0 and 10
    if (code == 10)
    {
        // 10 is a placeholder for x
        return valueForX;
    }
    else
    {
        return code;
    }
}
#endregion // Helpers
}
Use "post-fix" notation. That handles priorities very nicely.
Post-fix notation handles the "grouping" or "priority rules" trivially.
For example, the expression b**2-4*a*c, in post-fix is
b, 2, **, 4, a, *, c, *, -
To evaluate a post-fix expression, you simply push the values onto a stack and execute the operations.
So the above becomes something approximately like the following.
stack.push( b )
stack.push( 2 )
x, y = stack.pop(), stack.pop(); stack.push( y ** x )
stack.push( 4 )
stack.push( a )
x, y = stack.pop(), stack.pop(); stack.push( y * x )
stack.push( c )
x, y = stack.pop(), stack.pop(); stack.push( y * x )
x, y = stack.pop(), stack.pop(); stack.push( y - x )
To make this work, you need to partition your string of bytes into values and operators. You also need to check the "arity" of all your operators to be sure that the number of operators and the number of operands balance out. In this case, the number of binary operators + 1 is the number of operands. Unary operators don't require extra operands.
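The stack walk-through and the arity check above can be sketched as runnable Python. This is a minimal illustration restricted to binary operators; the names `BINARY_OPS`, `is_balanced` and `eval_postfix`, and the `env` dictionary for variable values, are assumptions introduced here.

```python
import operator

# Binary operators only, so a valid program has (operators + 1) operands.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
}


def is_balanced(tokens):
    """Check that each operator finds two operands and one value remains."""
    depth = 0
    for tok in tokens:
        if tok in BINARY_OPS:
            if depth < 2:
                return False
            depth -= 1  # two operands consumed, one result pushed
        else:
            depth += 1
    return depth == 1


def eval_postfix(tokens, env):
    """Evaluate a postfix token list; string operands are looked up in env."""
    if not is_balanced(tokens):
        raise ValueError("operators and operands do not balance")
    stack = []
    for tok in tokens:
        if tok in BINARY_OPS:
            right = stack.pop()  # the first value popped is the right operand
            left = stack.pop()
            stack.append(BINARY_OPS[tok](left, right))
        else:
            stack.append(env[tok] if isinstance(tok, str) else tok)
    return stack.pop()


tokens = ["b", 2, "**", 4, "a", "*", "c", "*", "-"]
print(eval_postfix(tokens, {"a": 1, "b": 5, "c": 6}))  # 5**2 - 4*1*6 = 1
```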
As ever with GA a large part of the solution is choosing a good representation. RPN (or post-fix) has already been suggested. One concern you still have is that your GA might throw up expressions which begin with operators (or mismatch operators and operands elsewhere) such as:
+,-,3,*,4,2,5,+,-
A (small) part of the solution would be to define evaluations for operand-less operators. For example one might decide that the sequence:
+
evaluates to 0, which is the identity element for addition. Naturally
*
would evaluate to 1. Mathematics may not have figured out what the identity element for division is, but APL has.
Now you have the basis of an approach which doesn't care if you get the right sequence of operators and operands, but you still have a problem when you have too many operands for the number of operators. That is, what is the interpretation of the following postfix sequence?
2,4,5,+,3,4,-
which (possibly) evaluates to
2,9,-1
Well, now you have to invent your own convention if you want to reduce this to a single value. But you could adopt the convention that the GA has created a vector-valued function.
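One possible reading of this convention, sketched in Python: an operator that finds too few operands fills the gap with its identity element, and whatever is left on the stack is returned as a vector. The fill rule (an existing operand is kept on the left, the identity goes on the right) and the name `eval_tolerant` are assumptions of this sketch, and division is left out because it has no obvious identity to fall back on.

```python
import operator

# (function, identity) pairs; the identity fills in for missing operands.
OPS = {
    "+": (operator.add, 0),
    "-": (operator.sub, 0),
    "*": (operator.mul, 1),
}


def eval_tolerant(tokens):
    """Evaluate postfix tokens without ever failing on a short stack.

    Returns the whole remaining stack, i.e. a vector of values.
    """
    stack = []
    for tok in tokens:
        if tok in OPS:
            fn, identity = OPS[tok]
            # Pop up to two operands, restoring their original order.
            args = [stack.pop() for _ in range(min(2, len(stack)))][::-1]
            args += [identity] * (2 - len(args))
            stack.append(fn(*args))
        else:
            stack.append(tok)
    return stack


print(eval_tolerant([2, 4, 5, "+", 3, 4, "-"]))     # [2, 9, -1]
print(eval_tolerant(["+"]), eval_tolerant(["*"]))   # [0] [1]
```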
EDIT: response to OP's comment ...
If a byte can represent either an operator or an operand, and if your program places no restrictions on where a genome can be split for reproduction, then there will always be a risk that the offspring represents an invalid sequence of operators and operands. Consider, instead of having each byte encode either an operator or an operand, a byte could encode an operator+operand pair (you might run out of bytes quickly so perhaps you'd need to use two bytes). Then a sequence of bytes might be translated to something like:
(plus 1)(plus x)(power 2)(times 3)
which could evaluate, following a left-to-right rule with a meaningful interpretation for the first term, to 3((x+1)^2)
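A hedged Python sketch of that pair encoding follows. The byte layout is invented for illustration (the operand is the gene modulo 11, the operator is chosen from the quotient), as are the names `decode` and `evaluate` and the choice of 0 as the starting accumulator. The useful property is that every byte decodes to a valid step, so crossover at any position still yields a well-formed expression.

```python
import operator

OPS = [operator.add, operator.sub, operator.mul, operator.pow]  # plus, minus, times, power
OPERANDS = list(range(10)) + ["x"]  # digits 0-9 and the variable x


def decode(gene):
    """Split one byte into an (operator, operand) pair; any byte is valid."""
    op_index, operand_index = divmod(gene, len(OPERANDS))
    return OPS[op_index % len(OPS)], OPERANDS[operand_index]


def evaluate(genes, x, start=0):
    """Apply each (operator, operand) pair left to right to an accumulator."""
    acc = start
    for gene in genes:
        op, operand = decode(gene)
        acc = op(acc, x if operand == "x" else operand)
    return acc


# (plus 1)(plus x)(power 2)(times 3) under this made-up layout
genes = [0 * 11 + 1, 0 * 11 + 10, 3 * 11 + 2, 2 * 11 + 3]
print(evaluate(genes, x=2))  # 3 * (2 + 1) ** 2 = 27
```

A real fitness function would also need to guard against very large powers and against raising zero to a negative power, which this sketch does not attempt.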
