Let's say you have an algebraic data type and you work with huge data structures. Would it be more efficient to write a writer function like:
val writer: out_channel -> mygadttype -> unit
That recursively visits the algebraic data type and writes every node, or something like:
val print: Format.formatter -> mygadttype -> unit
and then use Format.asprintf to write into a string, and finally write that string to the file.
I don't have an estimate of how big the data structure will be, but performance-wise (and memory-wise?), which would be more efficient? The goal is for the result to be human-readable, so no marshalling.
You can print directly to the file with Format by using Format.formatter_of_out_channel, so there is no need to go through an intermediate string. Without more information or benchmarking, it is unclear whether the overhead of Format will matter.
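For instance, a minimal sketch of that approach, assuming the print function from your question:

let write_to_file (path : string) (v : mygadttype) : unit =
  let oc = open_out path in
  let fmt = Format.formatter_of_out_channel oc in
  print fmt v;                  (* pretty-print straight to the channel *)
  Format.pp_print_flush fmt (); (* flush the formatter before closing *)
  close_out oc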
I'm calling a gRPC service with Python that responds with about a million objects through an iterator. At the moment I'm using a list comprehension to access the one attribute I need from each object:
stub = QueryStub(grpc_channel)
return [object.attribute_i_need for object in stub.ResponseMethod]
Accessing around a million attributes takes a while (around 2-3 minutes). Is there a way I can speed this up? I'm interested to know how people process such scenarios faster. I have also tried using list(stub.ResponseMethod) and [*stub.ResponseMethod] to unpack or retrieve the objects faster; however, these approaches take even longer, since the iterator objects carry a lot of other metadata I don't need and it all gets stored.
PS: I don't necessarily need to store the attributes in memory; accessing them faster is what I'm trying to achieve.
According to this documentation, I would say you need to try two things:
working with the asyncio API (if you're not already doing so), with something like:
async def run(stub: QueryStub) -> None:
    async for object in stub.ResponseMethod(empty_pb2.Empty()):
        print(object.attribute_i_need)
note that the Empty() is just because I do not know your API definition.
second would be to try the experimental feature SingleThreadedUnaryStream (if applicable to your case) by doing:
with grpc.insecure_channel(target='localhost:50051', options=[(grpc.experimental.ChannelOptions.SingleThreadedUnaryStream, 1)]) as channel:
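Putting that option together with your stub, a minimal synchronous client sketch might look like this (QueryStub, ResponseMethod, and attribute_i_need are taken from your question; fetch_attributes is an invented name, and the import of the generated QueryStub module is omitted):

import grpc
from google.protobuf import empty_pb2

def fetch_attributes(target: str = 'localhost:50051') -> list:
    # SingleThreadedUnaryStream is an experimental optimization for streams
    # consumed from a single thread.
    options = [(grpc.experimental.ChannelOptions.SingleThreadedUnaryStream, 1)]
    with grpc.insecure_channel(target=target, options=options) as channel:
        stub = QueryStub(channel)  # QueryStub comes from your generated *_pb2_grpc module
        # Consume the stream lazily, keeping only the attribute you need.
        return [o.attribute_i_need for o in stub.ResponseMethod(empty_pb2.Empty())]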
What I tried
I don't really know if this covers your use case (you can give me more info and I'll update), but here is what I tried:
I have a schema like:
service TestService {
  rpc AMethod(google.protobuf.Empty) returns (stream Test) {} // stream is optional, I tried with both
}
message Test {
  repeated string message = 1;
  repeated string message2 = 2;
  repeated string message3 = 3;
  repeated string message4 = 4;
  repeated string message5 = 5;
  repeated string message6 = 6;
  repeated string message7 = 7;
  repeated string message8 = 8;
  repeated string message9 = 9;
  repeated string message10 = 10;
  repeated string message11 = 11;
}
on the server side (with asyncio) I have
async def AMethod(self, request: empty_pb2.Empty, unused_context) -> AsyncIterable[Test]:
    test = Test()
    for i in range(10):
        test.message.append(randStr())
        # repeat the append for every other field, or not
    for i in range(1000000):
        yield test
where randStr creates a random string of length 10000 (totally arbitrary).
and on the client side (with SingleThreadedUnaryStream and asyncio)
async def run(stub: TesterStub) -> None:
    tests = stub.AMethod(empty_pb2.Empty())
    async for test in tests:
        print(test.message)
Benchmark
Note: This might vary depending on your machine
For the example with only one repeated field filled, I get an average (ran it 3 times) of 77 sec.
And with all the fields filled it takes really long, so I tried providing smaller strings (10 in length) and it still takes too long. I think the mix of repeated and stream is not a good idea. I also tried without stream and got an average (ran 3 times) of 45 sec.
My conclusion
This is really slow if all the repeated fields are filled with data, and it is ok-ish when only one is filled. But overall, I think asyncio helps.
Furthermore, this documentation explains that Protocol Buffers are not designed to handle large messages; however, they are great for handling individual messages within a large data set.
I would suggest that, if I got your schema right, you rethink the API design, because it seems suboptimal. But once again, I might not have understood the schema properly.
I would advise you to iterate over the object with a for loop if you aren't already doing so. But something needs to be said about that:
It is important to realize that everything you put in a loop gets executed on every iteration. The key to optimizing loops is to minimize what they do. Even operations that appear to be very fast will take a long time if they are repeated many times. Executing an operation that takes 1 microsecond a million times will take 1 second to complete.
Don't execute things like len(list) inside a loop or even in its starting condition.
example
a = [i for i in range(1000000)]
length = len(a)
for i in a:
    print(i - length)
is much much faster than
a = [i for i in range(1000000)]
for i in a:
    print(i - len(a))
You can also use techniques like loop unrolling (https://en.wikipedia.org/wiki/Loop_unrolling), a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size; this is an approach known as a space-time tradeoff.
Using functions like map, filter, etc. instead of explicit for loops can also provide some performance improvements.
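For example, here is a sketch of your attribute extraction using map and operator.attrgetter, reusing the stub from your snippet (so it is not runnable standalone):

import operator

get_attr = operator.attrgetter('attribute_i_need')
# map is lazy and pushes the attribute lookup into C-level machinery,
# so nothing extra is stored unless you materialize the result.
for value in map(get_attr, stub.ResponseMethod(empty_pb2.Empty())):
    print(value)  # or whatever per-value work you need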
I am parallelizing a certain dynamic programming problem using AVX2/SSE instructions.
In the main iteration of my calculation, I compute a column of a matrix where each cell is a structure of AVX2 registers (__m256i). I use values from the previous matrix column as input for calculating the current column. Columns can be big, so I keep an array of structures (on the stack), where each structure has two __m256i elements.
Structure:
struct Cell {
    __m256i first;
    __m256i second;
};
And then I have an array like this: Cell prevColumn[N]. N will typically be a few hundred.
I know that __m256i basically represents an AVX2 register, so I am wondering how I should think about this array and how it behaves, since N is much larger than 16 (the number of AVX registers). Is it good practice to create such an array, or is there some better approach I should use when storing a lot of __m256i values that are going to be reused soon after?
Also, is there any aligning I should be doing with these structures? I have read a lot about alignment, but I am still not sure how and when to do it exactly.
It's better to structure your code to do everything it can with a value before moving on. Small buffers that fit in L1 cache aren't going to be too bad for performance, but don't do that unless you need to.
I think it's more typical to write your code with buffers of int[] type rather than __m256i type, but I'm not sure. Either way works, and should get the compiler to generate efficient code. But the int[] way means less code has to be different for the SSE, AVX2, and AVX512 versions. And it might make it easier to examine things with a debugger, since your data sits in an array with a type that gets formatted nicely.
As I understand it, the load/store intrinsics are partly there as a cast between __m256i and int[], since AVX doesn't fault on unaligned access, it just slows down on cache-line boundaries. Assigning to/from an array of __m256i should work fine, and generate load/store instructions where needed, or otherwise generate vector instructions with memory source operands (for more compact code and fewer fused-domain uops).
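For the alignment question, a minimal C sketch (names and sizes are illustrative): an array of __m256i, or of structs of __m256i, is already 32-byte aligned by its type, while a plain int array needs explicit alignment before aligned loads and stores are safe on it.

#include <immintrin.h>

#define N 256  /* "a few hundred", as in the question */

struct Cell {
    __m256i first;   /* the struct inherits 32-byte alignment from __m256i */
    __m256i second;
};

void example(void) {
    struct Cell prevColumn[N];           /* aligned automatically */
    _Alignas(32) int data[8 * N] = {0};  /* plain ints need explicit alignment */

    __m256i v = _mm256_load_si256((const __m256i *)&data[0]);  /* aligned load */
    v = _mm256_add_epi32(v, v);
    _mm256_store_si256((__m256i *)&data[0], v);                /* aligned store */
    (void)prevColumn;
}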
I'm working on a compiler for a programming language, and I have a good representation of the source language in the form of an Abstract Syntax Tree (AST). I tried generating backend code directly by traversing the AST, but that's not working well. In general, think of mapping an AST for a C-like language down to an assembly-like language. I now think there is some phase I'm missing to go from the AST to backend code. The problem is that going from the AST to the next representation (say, 3-address code) is rough. It feels like there is a step in between that I'm missing.
[source lang] -> [lex/parse] -> [AST] -> [semantic analysis] -> [?] -> [backend code]
This is what I've come up with while thinking about it:
1) Transform the AST of the source language into an AST that represents the backend. This means I would need two different ASTs. Then, output the backend code from the transformed AST.
2) Transform the AST into a different type of data structure, and generate backend code based on this other structure. I'm not sure what this other structure would be.
3) Traverse the AST in a different way (from the way used to pretty-print it) to generate the backend code. This is how I tried doing it first, but it doesn't seem right; it feels like a hackish way to go about it.
What are my options to go from the AST to backend code?
Note that I'm not asking what kinds of representations the AST could be turned into, such as 3-address code. For example, I understand that a C addition like so:
x = a + b + c
could be turned into 3-address like so:
t1 = add a, b
t2 = add t1, c
set x, t2
I am asking for techniques/guidance/experience on how to do it.
To give an example of the type of answer that I'm looking for:
Question: What steps can I take to perform semantic analysis on the source lang?
Answer: Parse the language into an AST and traverse the AST to perform the semantic checks.
One can represent the program any way one likes; you can build compilers completely using trees. The purpose of a representation is to make it easy to collect certain kinds of facts; the representation serves to make collection/processing of those specific facts easier. Thus one expects the representation of the program to change at different stages of compilation, depending on what the compiler is trying to achieve in that stage.
One typical scheme is to translate programs through these representations to produce final code (a toy sketch of the first lowering step follows the list):
* ASTs
* Triples
* Abstract machine instructions
* Concrete machine instructions
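As a toy sketch of the AST-to-triples lowering, using your x = a + b + c example (the node classes and the lower helper are invented for illustration, not taken from any particular compiler):

from dataclasses import dataclass
from itertools import count

@dataclass
class Name:
    ident: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assign:
    target: str
    value: object

_temps = count(1)  # fresh temporary names: t1, t2, ...

def lower(node, triples):
    """Return the operand holding node's value, appending triples as we go."""
    if isinstance(node, Name):
        return node.ident
    if isinstance(node, BinOp):
        left = lower(node.left, triples)
        right = lower(node.right, triples)
        dest = f"t{next(_temps)}"
        triples.append((node.op, left, right, dest))
        return dest
    if isinstance(node, Assign):
        triples.append(("set", lower(node.value, triples), None, node.target))
        return node.target
    raise TypeError(node)

# x = a + b + c  lowers to:  t1 = add a, b ; t2 = add t1, c ; set x, t2
ast = Assign("x", BinOp("add", BinOp("add", Name("a"), Name("b")), Name("c")))
triples = []
lower(ast, triples)
print(triples)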
The fact that you don't seem to know this means you haven't done your homework. This is pretty well described in compiler books. Go read one.
I have a function that logs into a sensor via telnet/pexpect and acts as a data collector.
I don't want to rewrite the part that logs in, grabs the data, and parses relevant output out of it (pexpect). However, I need to do different things with this code and the data it gathers.
For example, I may need to:
Time until the first reading is returned
Take the average of a varying number of sensor readings
Return the status (which is one piece of data) or return the sensor reading (which is a separate piece of data) from the output
Ultimately, it should still log in and parse output the same way, and I want to use one code block for that part.
Higher up in the code, it's used on the spot: when I call it, I know what type of data I need to gather, and that's that. Constructing objects is too clumsy.
My usage has outstripped adding more arguments to a single function.
Any ideas?
This is such a common situation, I'm surprised you haven't already done what everyone else does.
Refactor your function to decompose it into smaller functions.
Functions are objects, and can be passed as arguments to other functions.
def step1():
    whatever

def step2():
    whatever

def step2_alternative():
    whatever

def original(args):
    step1()
    step2()

def revised(args, step2_choice):
    step1()
    step2_choice()
Now you can do this.
revised(args, step2)
revised(args, step2_alternative)
It's just OO programming with function objects.
Could you pass a data processing function to the function you described as an argument?
That may be more or less elegant, depending on your taste.
(Forgive me: I know nothing about pexpect, and I may even have misunderstood your question!)
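To make that concrete, here is a sketch of the callback approach for the sensor collector; login_and_read is a stand-in for the existing pexpect login/parse code, and the Reading shape and processing functions are invented:

import statistics
from collections import namedtuple

Reading = namedtuple("Reading", "value status")

def login_and_read(n_readings):
    # Stand-in for your existing pexpect login/parse code.
    return [Reading(value=20.0 + i, status="OK") for i in range(n_readings)]

def collect(process, n_readings=1):
    """Log in, gather raw readings once, and hand them to process."""
    return process(login_and_read(n_readings))

def average(readings):
    return statistics.mean(r.value for r in readings)

def status(readings):
    return readings[-1].status

# The caller picks the processing at the call site:
print(collect(average, n_readings=10))
print(collect(status))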
I read about MapReduce at http://en.wikipedia.org/wiki/MapReduce and understood the example of how to get the count of a "word" in many "documents". However, I did not understand the following line:
Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values. This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.
Can someone elaborate on the difference (the MapReduce framework vs. the map and reduce combination)? Especially, what does reduce do in functional programming?
Thanks a great deal.
The main difference would be that MapReduce is apparently patentable. (Couldn't help myself, sorry...)
On a more serious note, the MapReduce paper, as I remember it, describes a methodology for performing calculations in a massively parallelised fashion. This methodology builds upon the map/reduce construct, which was well known for years before, but goes beyond it into such matters as distributing the data, etc. Also, some constraints are imposed on the structure of the data being operated upon and returned by the functions used in the map-like and reduce-like parts of the computation (the thing about data coming in lists of key/value pairs), so you could say that MapReduce is a massive-parallelism-friendly specialisation of the map & reduce combination.
As for the Wikipedia comment on the function being mapped in the functional programming's map / reduce construct producing one value per input... Well, sure it does, but here there are no constraints at all on the type of said value. In particular, it could be a complex data structure like perhaps a list of things to which you would again apply a map / reduce transformation. Going back to the "counting words" example, you could very well have a function which, for a given portion of text, produces a data structure mapping words to occurrence counts, map that over your documents (or chunks of documents, as the case may be) and reduce the results.
In fact, that's exactly what happens in this article by Phil Hagelberg. It's a fun and supremely short example of a MapReduce-word-counting-like computation implemented in Clojure with map and something equivalent to reduce (the (apply + (merge-with ...)) bit -- merge-with is implemented in terms of reduce in clojure.core). The only difference between this and the Wikipedia example is that the objects being counted are URLs instead of arbitrary words -- other than that, you've got a counting words algorithm implemented with map and reduce, MapReduce-style, right there. The reason why it might not fully qualify as being an instance of MapReduce is that there's no complex distribution of workloads involved. It's all happening on a single box... albeit on all the CPUs the box provides.
For in-depth treatment of the reduce function -- also known as fold -- see Graham Hutton's A tutorial on the universality and expressiveness of fold. It's Haskell based, but should be readable even if you don't know the language, as long as you're willing to look up a Haskell thing or two as you go... Things like ++ = list concatenation, no deep Haskell magic.
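To make the counting-words description above concrete, here is a Python sketch of the "complex value per input" version, where docs is a stand-in list of documents:

from collections import Counter
from functools import reduce

docs = ["the cat sat", "the dog sat"]  # stand-in documents

# map: one complex value (a word -> count mapping) per document
per_doc = map(lambda d: Counter(d.split()), docs)

# reduce: merge the per-document mappings into a single one
totals = reduce(lambda a, b: a + b, per_doc, Counter())
print(totals)  # Counter({'the': 2, 'sat': 2, 'cat': 1, 'dog': 1})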
Using the word-count example, the original functional map() would take a set of documents, optionally distribute subsets of that set, and for each document emit a single value representing the number of words (or a particular word's occurrences) in the document. A functional reduce() would then add the per-document counts up into a global total. So you get a total count (either of all words or of a particular word).
In MapReduce, the map would emit a (word, count) pair for each word in each document. A MapReduce reduce() would then add up the count of each word in each document without mixing them into a single pile. So you get a list of words paired with their counts.
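A toy sketch of that per-key behaviour in plain Python (no framework, just the shape of the computation; docs is again a stand-in):

from collections import defaultdict

docs = ["the cat sat", "the dog sat"]  # stand-in documents

# map phase: emit a (word, 1) pair for every word in every document
pairs = [(word, 1) for doc in docs for word in doc.split()]

# shuffle: group pairs by key, as the framework would between phases
groups = defaultdict(list)
for word, n in pairs:
    groups[word].append(n)

# reduce phase: one reduction per key, never mixing the piles
totals = {word: sum(counts) for word, counts in groups.items()}
print(totals)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}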
MapReduce is a framework built around splitting a computation into parallelizable mappers and reducers. It builds on the familiar idiom of map and reduce - if you can structure your tasks such that they can be performed by independent mappers and reducers, then you can write it in a way which takes advantage of a MapReduce framework.
Imagine a Python interpreter which recognized tasks which could be computed independently, and farmed them out to mapper or reducer nodes. If you wrote
reduce(lambda x, y: x+y, map(int, ['1', '2', '3']))
or
sum([int(x) for x in ['1', '2', '3']])
you would be using functional map and reduce methods in a MapReduce framework. With current MapReduce frameworks, there's a lot more plumbing involved, but it's the same concept.