libxml2 taking huge memory (like 8 GB for a 600 MB doc) for parsing

I am using libxml2 to parse an XML doc, and I can see with valgrind that xmlParseFile takes something like 8 GB to parse a 600 MB file. I found on the libxml2 site that it usually takes about 4 times the size of the XML doc, but for me it takes a lot more than that.
Can anyone point out what the issue could be?

It depends on the structure of the document, but a factor of 10-20 is nothing unusual. The "4 times" quote probably refers to 32-bit systems. On a 64-bit system, the overhead is larger, typically about 120-150 bytes for every element and attribute, in addition to the actual character data.
For data this large, XML is often the wrong choice. But using one of libxml2's streaming parser interfaces (SAX or xmlreader), you should be able to parse such documents with much lower memory overhead.
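For illustration, here is a minimal sketch of such a streaming pass using the xmlreader interface; the file name and the element counting are placeholders, and real code would process each node instead of just counting.

// Minimal streaming pass over a large document with libxml2's xmlreader
// interface ("huge.xml" is a placeholder path).
// Assumed build line: g++ demo.cpp $(xml2-config --cflags --libs)
#include <libxml/xmlreader.h>
#include <cstdio>

int main() {
    xmlTextReaderPtr reader = xmlReaderForFile("huge.xml", NULL, 0);
    if (reader == NULL) {
        std::fprintf(stderr, "unable to open file\n");
        return 1;
    }
    long elements = 0;
    int ret = xmlTextReaderRead(reader);      // pulls one node at a time
    while (ret == 1) {
        if (xmlTextReaderNodeType(reader) == XML_READER_TYPE_ELEMENT)
            ++elements;                       // process the node here instead of counting
        ret = xmlTextReaderRead(reader);
    }
    xmlFreeTextReader(reader);
    xmlCleanupParser();
    std::fprintf(stderr, "saw %ld elements\n", elements);
    return ret == 0 ? 0 : 1;                  // ret < 0 means a parse error
}

Only the current node plus a small amount of reader state is kept in memory, which is why this scales to documents that would blow up the tree-based xmlParseFile.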

Related

Methods to discourage reverse engineering of an opencl kernel

I am preparing my opencl accelerated toolkit for release. Currently, I compile my opencl kernels into binaries targeted for a particular card.
Are there other ways of discouraging reverse engineering? Right now, I have many opencl binaries in my release folder, one for each kernel. Would it be better to splice these binaries into one single binary, or even add them into the host binary, and somehow read them in using a special offset?
OpenCL 2.0 and SPIR-V can be used for this, but they are not available on all platforms yet.
Encode the binaries. Keep the keys on a server and have clients request them at time of use. Of course the keys should be encoded too (using a variable value such as the server time, maybe). Then decode on the client to get the binary kernel.
I'm no encryption expert, but I would apply multiple algorithms multiple times to make it harder, in case any single one of them could be cracked within a few months (i.e. before the next version of your GPGPU software ships). Even a simple unknown scheme of your own, such as reversing the bit order of all the data (1st bit goes to the nth position, nth goes to 1st), should look hard enough to level-1 hackers.
Warning: some profiling tools can capture the kernel code at run time, so you may want to add many (maybe hundreds of) trivial kernels without a performance penalty to hide the real one in a crowded timeline, or disable profiling through kernel options, or add a deliberate error (maybe some broken events in the queues) and then restart, so the profiler cannot initiate.
Maybe you could obfuscate the final C99 code so it becomes unreadable by humans; if someone can still read it, they didn't need to hack anything in the first place.
Maybe most effectively: don't do anything special, just secure the rights to your genuine algorithm and state that in a txt file, so people can look but cannot dare copy it for money.
If the kernel can be rewritten as an "interpreted" version without a performance penalty, you can send bytecodes from the server to the client. When someone inspects it with a profiler, they see only the interpreter code, not the real working algorithm, because that depends on the bytecodes coming from the server as "data" (a buffer). For example, c=a+b becomes if(..) else if(...) else (case ...) and has no meaning without the data feed.
On top of all this, you can buy time against someone reverse engineering it: pick variable names that trigger their "selective perception" so they can't focus for a while, and develop a meaner version in the meantime. For example, c=a+b becomes bulletsToDevilsEar=breakDevilsLeg+goodGame.
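To make the "embed and decode at load time" idea above concrete, here is a hedged host-side sketch. The embedded byte array, the trivial XOR "decoding" and all names are placeholder assumptions, not a claim that XOR is real protection.

// Sketch: load an obfuscated, embedded kernel binary at run time.
// The embedded bytes and the XOR "decoding" are placeholders only.
#include <CL/cl.h>
#include <vector>

// Placeholder for the real embedded binary (e.g. generated with xxd -i).
static const unsigned char embedded_kernel[] = { /* ... obfuscated bytes ... */ 0 };
static const size_t embedded_kernel_size = sizeof embedded_kernel;

cl_program load_hidden_kernel(cl_context context, cl_device_id device,
                              unsigned char key) {
    // "Decode" in memory only; never write the clear form to disk.
    std::vector<unsigned char> decoded(embedded_kernel,
                                       embedded_kernel + embedded_kernel_size);
    for (auto &b : decoded) b ^= key;

    const unsigned char *bins[] = { decoded.data() };
    const size_t lens[] = { decoded.size() };
    cl_int binary_status = CL_SUCCESS, err = CL_SUCCESS;

    cl_program program = clCreateProgramWithBinary(
        context, 1, &device, lens, bins, &binary_status, &err);
    if (err != CL_SUCCESS) return nullptr;

    // The binary still needs a build step for the target device.
    if (clBuildProgram(program, 1, &device, "", nullptr, nullptr) != CL_SUCCESS) {
        clReleaseProgram(program);
        return nullptr;
    }
    return program;
}

Splicing all kernels into one blob and indexing them by offset would follow the same pattern, just with a small table of (offset, length) pairs in front of the bytes.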

Store a video in a SQLite database?

I'm working on an algorithm which needs very fast random access to video frames in a possibly long video (minimum 30 minutes). I am currently using OpenCV's VideoCapture to read my video, but the seeking functionality is either broken or very slow. The best I have found so far is using the MJPEG codec inside an MKV container, but it's not fast enough.
I can choose any video format or even create a new one. Storage space is not a problem (to some extent, of course). The only requirement is to get the fastest possible seeking time to any location in the video. Ideally, I would like to be able to access multiple frames simultaneously, taking advantage of my quad-core CPU.
I know that relational databases are very good at storing large volumes of data; they allow simultaneous read access and they're very fast when using indexes.
Is SQLite a good fit for my specific needs? I plan to store each video frame compressed in JPEG, and use an index on the frame number to access them quickly.
EDIT: for me a frame is just an image, not the entire video. A 30 min video @ 25 fps contains 30*60*25 = 45000 frames, and I want to be able to quickly get one of them using its number.
EDIT: For those who could be interested, I finally implemented a custom video container saving each frame in fixed-size blocks (consequently, the position of any frame can be computed directly!). The images are compressed with the turbojpeg library and file accesses are multi-threaded (to be NCQ-friendly). The bottleneck is no longer the HDD and I finally obtained much better performance :)
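For the curious, a minimal sketch of that fixed-block idea; the block size, the 4-byte length convention and the names are assumptions for illustration, not the actual container format.

// Every frame occupies exactly BLOCK_SIZE bytes, so the position of frame i
// is simply i * BLOCK_SIZE: no index lookup, no sequential decode.
#include <cstdint>
#include <fstream>
#include <vector>

constexpr std::uint64_t BLOCK_SIZE = 256 * 1024;  // must hold the largest JPEG frame

std::vector<char> read_frame(std::ifstream &file, std::uint64_t frame_index) {
    file.seekg(static_cast<std::streamoff>(frame_index * BLOCK_SIZE));
    // Convention assumed here: each block starts with the real JPEG size,
    // the rest of the block is padding.
    std::uint32_t jpeg_size = 0;
    file.read(reinterpret_cast<char *>(&jpeg_size), sizeof jpeg_size);
    std::vector<char> jpeg(jpeg_size);
    file.read(jpeg.data(), jpeg_size);
    return file ? jpeg : std::vector<char>{};     // hand this to turbojpeg to decode
}

Because each read is independent, several threads can call read_frame on their own file handles at once, which is what makes the approach NCQ-friendly.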
I don't think using SQLite (or any other database engine) is a good solution for your problem. A database is not a filesystem.
If what you need is very fast random access, then stick to the filesystem; it was designed for exactly this kind of usage and optimized with it in mind. As per your comment, a 5-hour video would require about 450k files; that's not a problem in my opinion. Certainly, directory listing will be a bit slow, but you will get the absolute fastest possible random access. And it will certainly be faster than SQLite because you're one level of abstraction lower.
And if you're really worried about directory listing times, you just have to organize your folder structure like a tree. That will get you longer paths, but fast listing.
Keep a high level perspective. The problem is that OpenCV isn't fast enough at seeking in the source video. This could be because
Codecs are not OpenCV's strength
The source video is not encoded for efficient seeking
Your machine has a lot of dedicated graphics hardware to leverage, but it does not have specialized capabilities for randomly seeking within a 17 GB dataset, be it a file, a database, or a set of files. The disk will take a few milliseconds per seek. It will be better with an SSD but still not great. Then you wait for the data to load into main memory. And you have to generate all that data in the first place.
Use ffmpeg, which should handle decoding very efficiently, perhaps even using the GPU. Here is a tutorial. (Disclaimer, I haven't used it myself.)
You might preprocess the video to add key frames. In principle this shouldn't require completely re-encoding, at least for MPEG, but I don't know much about specifics. MJPEG essentially turns all frames into keyframes, but you can find a middle ground and maybe seek 1.5x faster at a 2x size cost. But avoid hitting the disk.
As for SQLite, that is a fine solution to the problem of seeking within 17 GB of data. The notion that databases aren't optimized for random access is poppycock. Of course they are. A filesystem is a kind of database. Random access in 17 GB is slow because of hardware, not software.
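For illustration, a sketch of what that could look like with the SQLite C API; the table layout (frame_number INTEGER PRIMARY KEY, jpeg BLOB) and the names are assumptions.

// Sketch: one BLOB per JPEG frame, keyed by frame number. The implicit
// rowid index on INTEGER PRIMARY KEY makes lookups by frame number O(log n).
// Assumed schema: CREATE TABLE frames(frame_number INTEGER PRIMARY KEY, jpeg BLOB);
#include <sqlite3.h>
#include <vector>

std::vector<unsigned char> load_frame(sqlite3 *db, int frame_number) {
    std::vector<unsigned char> jpeg;
    sqlite3_stmt *stmt = nullptr;
    if (sqlite3_prepare_v2(db,
            "SELECT jpeg FROM frames WHERE frame_number = ?1",
            -1, &stmt, nullptr) != SQLITE_OK)
        return jpeg;
    sqlite3_bind_int(stmt, 1, frame_number);
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        const void *blob = sqlite3_column_blob(stmt, 0);
        int size = sqlite3_column_bytes(stmt, 0);
        jpeg.assign(static_cast<const unsigned char *>(blob),
                    static_cast<const unsigned char *>(blob) + size);
    }
    sqlite3_finalize(stmt);
    return jpeg;                         // decode with turbojpeg or similar
}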
I would recommend against using the filesystem for this task, because it's a shared resource synchronized with the rest of the machine. Also, creating half a million files (and deleting them when finished) will take a long time. That is not what a filesystem is specialized for. You can get around that, though, by storing several images to each file. But then you need some format to find the desired image, and then why not put them all in one file?
Indeed, (if going the 17 GB route) why not ignore the entire problem and put everything in virtual memory? VM is just as good at making the disk seek as SQLite or the filesystem. As long as the OS knows it's OK for the process to use that much memory, and you're using 64-bit pointers, it should be a fine solution, and the first thing to try.
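A rough sketch of that virtual-memory approach with POSIX mmap, reusing the fixed-block assumption from the question's edit; all names are placeholders.

// Sketch: map the whole frame container into the address space and let the
// OS page frames in on demand. Assumes the fixed-block layout sketched above.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct MappedVideo {
    const unsigned char *base = nullptr;
    size_t size = 0;
};

MappedVideo map_video(const char *path) {
    MappedVideo v;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return v;
    struct stat st;
    if (fstat(fd, &st) == 0) {
        void *p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            v.base = static_cast<const unsigned char *>(p);
            v.size = static_cast<size_t>(st.st_size);
        }
    }
    close(fd);   // the mapping stays valid after closing the descriptor
    return v;
}

// Frame i then lives at v.base + i * BLOCK_SIZE; the first access simply
// triggers a page fault and a disk read, with no explicit I/O code needed.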

Language without explicit memory alloc/dealloc AND without garbage collection

I was wondering if it is possible to create a programming language without explicit memory allocation/deallocation (like C, C++ ...) AND without garbage collection (like Java, C#...) by doing a full analysis at the end of each scope?
The obvious problem is that this would take some time at the end of each scope, but I was wondering if it has become feasible with all the processing power and multiple cores in current CPUs. Do such languages exist already?
I also was wondering if a variant of C++ where smart pointers are the only pointers that can be used, would be exactly such a language (or am I missing some problems with that?).
Edit:
Well after some more research apparently it's this: http://en.wikipedia.org/wiki/Reference_counting
I was wondering why this isn't more popular. The disadvantages listed there don't seem all that serious, and the overhead shouldn't be that large, in my opinion. A (non-interpreted, properly written from the ground up) language with C-family syntax and reference counting seems like a good idea to me.
The biggest problem with reference counting is that it is not a complete solution: it cannot collect cyclic structures. The overhead is incurred every time you set a reference; for many kinds of problems this adds up quickly and can be worse than just waiting for a GC later. (Modern GC is quite advanced and awesome - don't count it out like that!)
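To make the cycle problem concrete, here is a small C++ sketch using std::shared_ptr, which is reference counted: the two nodes keep each other's count above zero, so neither is ever freed.

// Reference counting cannot reclaim this: each Node holds the other alive.
#include <memory>

struct Node {
    std::shared_ptr<Node> other;   // strong (counted) reference
};

void leak_a_cycle() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->other = b;                  // b's count: 2
    b->other = a;                  // a's count: 2
}   // a and b go out of scope, counts drop to 1, never to 0 -> both leak.
    // Breaking the cycle requires std::weak_ptr or a tracing collector.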
What you are talking about is nothing special, and it shows up all the time. The C or C++ variant you are looking for is just plain regular C or C++.
For example, write your program normally, but constrain yourself not to use any dynamic memory allocation (no new, delete, malloc, or free, or any of their friends, and make sure your libraries do the same), and you have that kind of system. You figure out in advance how much memory you need for everything you could do, and declare that memory statically (either function-level static variables or global variables). The compiler takes care of all the accounting the normal way, nothing special happens at the end of each scope, and no extra computation is necessary.
You can even configure your runtime environment to use a statically allocated stack space (this one isn't really under the compiler's control; it's more a matter of the linker and the operating system environment). Just figure out how deep your function call chain goes and how much memory it uses (with a profiler or similar tool), and set it in your link options.
Without dynamic memory allocation (and thus no deallocation through either garbage collection or explicit management), you are limited to the memory you declared when you wrote the program. But that's ok, many programs don't need dynamic memory, and are already written that way. The real need for this shows up in embedded and real-time systems when you absolutely, positively need to know exactly how long an operation will take, how much memory (and other resources) it will use, and that the running time and the use of those resources can't ever change.
The great thing about C and C++ is that the language requires so little from the environment, and gives you the tools to do so much, that smart pointers, statically allocated memory, or even some special scheme that you dream up can be implemented. Requiring their use, and the constraints you put on yourself, just becomes a policy decision. You can enforce that policy with code auditing (use scripts to scan the source or object files and don't permit linking against the dynamic memory libraries).
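A tiny sketch of that "no dynamic allocation" style; the pool size and the names are made up for illustration.

// All storage is declared up front; nothing is ever new'd, malloc'd or freed.
#include <cstddef>

constexpr std::size_t MAX_READINGS = 1024;

struct ReadingBuffer {
    double values[MAX_READINGS];   // capacity fixed at compile time
    std::size_t count = 0;

    // Returns false instead of growing: the memory budget never changes.
    bool push(double v) {
        if (count == MAX_READINGS) return false;
        values[count++] = v;
        return true;
    }
};

// File-level static: lives in static storage, not on the heap or the stack.
static ReadingBuffer g_readings;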

Are binary protocols dead? [closed]

It seems like there used to be way more binary protocols because of the very slow internet speeds of the time (dialup). I've been seeing everything being replaced by HTTP and SOAP/REST/XML.
Why is this?
Are binary protocols really dead or are they just less popular? Why would they be dead or less popular?
You Just Can't Beat the Binary
Binary protocols will always be more space efficient than text protocols. Even as internet speeds drastically increase, so does the amount and complexity of information we wish to convey.
The text protocols you reference are outstanding in terms of standardization, flexibility and ease of use. However, there will always be applications where the efficiency of binary transport will outweigh those factors.
A great deal of information is binary in nature and will probably never be replaced by a text protocol. Video streaming comes to mind as a clear example.
Even if you compress a text-based protocol (e.g. with GZip), a general purpose compression algorithm will never be as efficient as a binary protocol designed around the specific data stream.
But Sometimes You Don't Have To
The reason you are seeing more text-based protocols is that transmission speeds and data storage capacity have indeed grown much faster than the data sizes involved for a wide range of applications. We humans find it much easier to work with text, so we designed our ubiquitous XML protocol around a text representation. Certainly we could have created XML as a binary protocol, if we really had to save every byte, and built common tools to visualize and work with the data.
Then Again, Sometimes You Really Do
Many developers are used to thinking in terms of multi-GB, multi-core computers. Even your typical phone these days puts my first IBM PC-XT to shame. Still, there are platforms such as embedded devices, that have rather strict limitations on processing power and memory. When dealing with such devices, binary may be a necessity.
A parallel with programming languages is probably very relevant.
While high-level languages are the preferred tools for most programming jobs, and have been made possible (in part) by increases in CPU speed and storage capacity, they haven't removed the need for assembly language.
In a similar fashion, non-binary protocols introduce more abstraction, more extensibility and are therefore the vehicle of choice particularly for application-level communication. They too have benefited from increases in bandwidth and storage capacity. Yet at lower level it is still impractical to be so wasteful.
Furthermore unlike with programming languages where there are strong incentives to "take the performance hit" in exchange for added simplicity, speed of development etc., the ability to structure communication in layers makes the complexity and "binary-ness" of lower layers rather transparent to the application level. For example so long as the SOAP messages one receives are ok, the application doesn't need to know that these were effectively compressed to transit over the wire.
Facebook, Last.fm, and Evernote use the Thrift binary protocol.
I rarely see this talked about, but binary protocols, block protocols especially, can greatly simplify the complexity of server architectures.
Many text protocols are implemented in such a way that the parser has no basis on which to infer how much more data is needed before a logical unit has been received (XML and JSON can at best tell you the minimum number of bytes needed to finish, but can't provide a meaningful estimate). This means that the parser may have to periodically cede to the socket-receiving code to retrieve more data. This is fine if your sockets are in blocking mode, not so easy if they're not. It generally means that all parser state has to be kept on the heap, not the stack.
If you have a binary protocol where very early in the receive process you know exactly how many bytes you need to complete the packet, then your receiving operations don't need to be interleaved with your parsing operations. As a consequence, the parser state can be held on the stack, and the parser can execute once per message and run straight through without pausing to receive more bytes.
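A hedged sketch of that kind of receive loop, for a hypothetical protocol whose messages start with a 4-byte big-endian length prefix; the wire format and helper names are assumptions.

// Receive loop for a length-prefixed binary protocol: after 4 bytes we know
// exactly how much to read, so parsing never has to pause mid-message.
#include <cstdint>
#include <vector>
#include <sys/socket.h>

// Reads exactly n bytes (looping over partial reads); false on error or EOF.
static bool recv_exact(int fd, void *buf, std::size_t n) {
    auto *p = static_cast<std::uint8_t *>(buf);
    while (n > 0) {
        ssize_t got = recv(fd, p, n, 0);
        if (got <= 0) return false;
        p += got;
        n -= static_cast<std::size_t>(got);
    }
    return true;
}

bool read_one_message(int fd, std::vector<std::uint8_t> &payload) {
    std::uint8_t header[4];
    if (!recv_exact(fd, header, sizeof header)) return false;

    // Big-endian 4-byte length prefix tells us the size of the whole payload.
    std::uint32_t length = (std::uint32_t(header[0]) << 24) |
                           (std::uint32_t(header[1]) << 16) |
                           (std::uint32_t(header[2]) << 8)  |
                            std::uint32_t(header[3]);

    payload.resize(length);
    return recv_exact(fd, payload.data(), length);
    // The parser can now run start-to-finish over `payload`, keeping its
    // state on the stack, exactly as described above.
}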
There will always be a need for binary protocols in some applications, such as very-low-bandwidth communications. But there are huge advantages to text-based protocols. For example, I can use Firebug to easily see exactly what is being sent and received from each HTTP call made by my application. Good luck doing that with a binary protocol :)
Another advantage of text protocols is that even though they are less space efficient than binary, text data compresses very well, so the data may be automatically compressed to get the best of both worlds. See HTTP Compression, for example.
Binary protocols are not dead. It is much more efficient to send binary data in many cases.
WCF supports binary encoding using TCP.
http://msdn.microsoft.com/en-us/library/ms730879.aspx
So far the answers all focus on space and time efficiency. No one has mentioned what I feel is the number one reason for so many text-based protocols: sharing of information. It's the whole point of the Internet and it's far easier to do with text-based, human-readable protocols that are also easily processed by machines. You rid yourself of language dependent, application-specific, platform-biased programming with text data interchange.
Link in whatever XML/JSON/*-parsing library you want to use, find out the structure of the information, and snip out the pieces of data you're interested in.
Some binary protocols I've seen in the wild for Internet applications:
Google Protocol Buffers, which are used for internal communications but also in, for example, Google Chrome bookmark syncing
Flash AMF, which is used for communication with Flash and Flex applications. Both Flash and Flex have the capability of communicating via REST or SOAP; however, the AMF format is much more efficient for Flex, as some benchmarks show
I'm really glad you have raised this question, as non-binary protocols have multiplied in usage manyfold since the introduction of XML. Ten years ago, you would see virtually everybody touting their "compliance" with XML-based communications. However, this approach, only one of several non-binary approaches, has many deficiencies.
One of the touted values, for example, was readability. But readability matters mainly for debugging, when a human needs to read the transaction. Text transfers are very inefficient compared with binary ones, because the XML itself is a binary stream that has to be translated by another layer into textual fragments ("tokens"), and then back into binary for the contained data.
Another value people found was extensibility. But extensibility can be easily maintained if a protocol version number for the binary stream is used at the beginning of the transaction. Instead of sending XML tags, one could send binary indicators. If the version number is an unknown one, then the receiving end can download the "dictionary" of this unknown version. This dictionary could, for example, be an XML file. But downloading the dictionary is a one time operation, instead of every single transaction!
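A hedged sketch of that "version number plus downloadable dictionary" idea; the header layout, the tag numbering and the helper names are invented purely for illustration.

// Sketch: each message carries a protocol version; tag meanings live in a
// dictionary keyed by that version, fetched once if the version is unknown.
#include <cstdint>
#include <map>
#include <string>

struct MessageHeader {
    std::uint16_t version;    // identifies which dictionary describes the tags
    std::uint16_t tag_count;  // number of (tag, value) pairs that follow
};

// tag id -> human-readable field name, one map per protocol version.
using Dictionary = std::map<std::uint16_t, std::string>;

static std::map<std::uint16_t, Dictionary> known_dictionaries;

static Dictionary download_dictionary(std::uint16_t /*version*/) {
    // Placeholder: a real client would fetch e.g. an XML dictionary file from
    // the server here, a one-time cost instead of per-message tag names.
    return {};
}

const Dictionary &dictionary_for(const MessageHeader &h) {
    auto it = known_dictionaries.find(h.version);
    if (it == known_dictionaries.end())
        it = known_dictionaries.emplace(h.version,
                                        download_dictionary(h.version)).first;
    return it->second;
}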
So efficiency could be kept together with extensibility, and very easily! There are a good number of "compiled XML" protocols out there which do just that.
Last, but not least, I have even heard people say that XML is a good way to overcome the little-endian vs. big-endian differences between binary systems, for example Sun computers vs. Intel computers. But this is incorrect: if both sides can accept XML (ASCII) in the right way, surely both sides can accept binary in the right way, since XML and ASCII are themselves transmitted as binary anyway.
Hope you find this interesting reading!
Binary protocols will continue to live wherever efficiency is required. Mostly, they will live at the lower levels, where hardware implementations are more common than software implementations. Speed isn't the only factor; the simplicity of implementation is also important. Making a chip process binary data messages is much easier than making it parse text messages.
Surely this depends entirely on the application? There have been two general types of example so far: XML/HTML-related answers and video/audio. One is designed to be 'shared', as noted by Jonathon, and the other to be efficient in its transfer of data (and without Matrix vision, 'reading' a movie would never be useful the way reading an HTML document is).
Ease of debugging is not a reason to choose a text protocol over a 'binary' one - the requirements of the data transfer should dictate that. I work in the Aerospace industry, where the majority of communications are high-speed, predictable data flows like altitude and radio frequencies, thus they are assigned bits on a stream and no human-readable wrapper is required. It is also highly efficient to transfer and, other than interference detection, requires no meta data or protocol processing.
So certainly I would say that they are not dead.
I would agree that people's choices are probably affected by the fact that they have to debug them, but will also heavily depend on the reliability, bandwidth, data type, and processing time required (and power available!).
They are not dead because they are the underlying layers of every communication system. Every major communication system's data link and network layers are based on some kind of "binary protocol".
Take the internet for example, you are now probably using Ethernet in your LAN, PPPoE to communicate with your ISP, IP to surf the web and maybe FTP to download a file. All of which are "binary protocols".
We are seeing this shift towards text-based protocols in the upper layers because they are much easier to develop and understand when compared to "binary protocols", and because most applications don't have strict bandwidth requirements.
Depends on the application...
I think in real-time environments (FireWire, USB, field buses...) there will always be a need for binary protocols.
Are binary protocols dead?
Two answers:
Let's hope so.
No.
At least a binary protocol is better than XML, which provides all the readability of a binary protocol combined with all the efficiency of less efficiency than a well-designed ASCII protocol.
Eric J's answer pretty much says it, but here's some more food for thought and facts. Note that the stuff below is not about media protocols (videos, images). Some items may be clear to you, but I keep hearing myths every day so here you go ...
There is no difference in expressiveness between a binary protocol and a text protocol. You can transmit the same information with the same reliability.
For every optimum binary protocol, you can design an optimum text protocol that takes just around 15% more space, and that protocol you can type on your keyboard.
In practice (with the practical protocols I see every day), the difference is often even less significant due to the static nature of many binary protocols.
For example, take a number that can become very large (e.g., in the 32-bit range) but is often very small. In binary, people usually model this as four bytes. In text, it's often done as a printed number followed by a colon. In this case, numbers below ten become two bytes and numbers below 100 three bytes. (You can of course claim that the binary encoding is bad and that you could use some size bits to make it more space efficient, but that's another thing that you have to document, implement on both sides, and be able to troubleshoot when it comes over your wire.)
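A small sketch of that size comparison, encoding the same 32-bit value as a fixed 4-byte binary field and as decimal text with a ':' delimiter (the delimiter choice is just an assumption).

// Encoding the same value both ways: a fixed 4-byte binary field vs. a
// decimal text field terminated by ':'.
#include <cstdint>
#include <string>

std::string encode_binary(std::uint32_t value) {
    std::string out(4, '\0');               // always exactly 4 bytes
    out[0] = char((value >> 24) & 0xFF);
    out[1] = char((value >> 16) & 0xFF);
    out[2] = char((value >> 8) & 0xFF);
    out[3] = char(value & 0xFF);
    return out;
}

std::string encode_text(std::uint32_t value) {
    return std::to_string(value) + ':';     // "7:" is 2 bytes, "4000000000:" is 11
}

// For small values the text form is actually shorter than the fixed binary
// field; a variable-length binary encoding can win, but needs extra spec and
// implementation work on both ends, which is the trade-off described above.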
For example, messages in binary protocols are often framed by length fields and/or terminators, while in text protocols, you just use a CRC.
In practice, the difference is often less significant due to required redundancy.
You want some level of redundancy, no matter whether it's binary or text. Binary protocols often leave no room for error: you have to document every bit you send 100% correctly, and since most of us are human, that rarely happens, and you can't read the stream well enough to safely conclude what is correct.
So in summary: binary protocols are theoretically more space and compute efficient, but in practice the difference is often smaller than you think, and the deal is often not worth it. I work in the Internet of Things area and have to deal nearly on a daily basis with custom, badly designed binary protocols which are really hard to troubleshoot, annoying to implement and not more space efficient. If you don't absolutely need to squeeze the last milliampere out of your battery and count microcontroller cycles (or transmit media), think twice.

How do you pre-allocate memory to a process in Solaris?

My problem is:
I have a Perl script which uses a lot of memory (expected behaviour because of caching). But I noticed that the more I cache, the slower it gets, and the process spends most of its time in sleep mode.
I thought pre-allocating memory to the process might speed up the performance.
Does someone have any ideas here?
Update:
I think I am not being very clear here. I will put the question more clearly:
I am not looking for ways of pre-allocating inside the Perl script. I don't think that would help me much here. What I am interested in is a way to tell the OS to allocate X amount of memory for my Perl script so that it does not have to compete with other processes coming in later.
Assume that I can't get away from the memory usage. Although I am exploring ways of reducing it too, I don't expect much improvement there.
FYI, I am working on a Solaris 10 machine.
What I gathered from your posting and comments is this:
Your program gets slow when memory use rises
Your program increasingly spends its time sleeping, not computing.
Most likely explanation: sleeping means waiting for a resource to become available, and in this case the resource most likely is memory. Use the vmstat 1 command to verify. Have a look at the sr column: if it consistently goes beyond ~150, the system is desperate to free pages to satisfy demand. This is accompanied by high activity in the pi, po and fr columns.
If this is in fact the case, your best choices are:
Upgrade system memory to meet demand
Reduce memory usage to a level appropriate for the system at hand.
Preallocating memory will not help. In either case, memory demand will exceed the available main memory at some point. The kernel will then have to decide which pages need to be in memory now and which pages may be cleared and reused for the more urgently needed pages. If all regularly needed pages (the working set) exceed the size of main memory, the system is constantly moving pages to and from secondary storage (swap). The system is then said to be thrashing and spends little time doing useful work. There is nothing you can do about this except adding memory or using less of it.
From a comment:
The memory limitations are not very severe but the memory footprint easily grows to GBs and when we have competing processes for memory, it gets very slow. I want to reserve some memory from OS so that thrashing is minimal even when too many other processes come. Jagmal
Let's take a different tack then. The problem isn't really with your Perl script in particular. Instead, all the processes on the machine are consuming too much memory for the machine to handle as configured.
You can "reserve" memory, but that won't prevent thrashing. In fact, it could make the problem worse because the OS won't know if you are using the memory or just saving it for later.
I suspect you are suffering the tragedy of the commons. Am I right that many other users are on the machine in question? If so, this is more of a social problem than a technical problem. What you need is someone (probably the System Administrator) to step in and coordinate all the processes on the machine. They should find the most extravagant memory hogs and work with their programmers to reduce the cost on system resources. Further, they ought to arrange for processes to be scheduled so that resource allocation is efficient. Finally, they may need to get more or improved hardware to handle the expected system load.
Some questions you might ask yourself:
are my data structures really useful for the task at hand?
do I really have to cache that much?
can I throw away cached data after some time?
If you still want to pre-allocate from within Perl itself, the language does offer a couple of knobs:

my @array;
$#array = 1_000_000; # pre-extend array to one million elements,
                     # http://perldoc.perl.org/perldata.html#Scalar-values

my %hash;
keys(%hash) = 8192;  # pre-allocate hash buckets
                     # (same documentation section)
Not being familiar with your code, I'll venture some wild speculation here [grin] that these techniques aren't going to offer new great efficiencies to your script, but that the pre-allocation could help a little bit.
Good luck!
-- Douglas Hunter
I recently rediscovered an excellent Randal L. Schwartz article that includes preallocating an array. Assuming this is your problem, you can test preallocating with a variation on that code. But be sure to test the result.
The reason the script gets slower with more caching might be thrashing. Presumably the reason for caching in the first place is to increase performance. So a quick answer is: reduce caching.
Now there may be ways to modify your caching scheme so that it uses less main memory and avoids thrashing. For instance, you might find that caching to a file or database instead of to memory can boost performance. I've found that file system and database caching can be more efficient than application caching and can be shared among multiple instances.
Another idea might be to alter your algorithm to reduce memory usage in other areas. For instance, instead of pulling an entire file into memory, Perl programs tend to work better reading line by line.
Finally, have you explored the Memoize module? It might not be immediately applicable, but it could be a source of ideas.
I could not find a way to do this yet.
But, I found out that (see this for details):
Memory allocated to lexicals (i.e. my() variables) cannot be reclaimed or reused even if they go out of scope. It is reserved in case the variables come back into scope. Memory allocated to global variables can be reused (within your program) by using undef() and/or delete().
So, I believe one possibility here could be to check whether I can reduce the total memory footprint of lexical variables at a given point in time.
It sounds like you are looking for limit or ulimit. But I suspect that will cause a script that goes over the limit to fail, which probably isn't what you want.
A better idea might be to share cached data between processes. Putting data in a database or in a file works well in my experience.
I hate to say it, but if your memory limitations are this severe, Perl is probably not the right language for this application. C would be a better choice, I'd think.
One thing you could do is use Solaris zones (containers).
You could put your process in a zone and allocate it resources like RAM and CPUs.
Here are two links to some tutorials :
Solaris Containers How To Guide
Zone Resource Control in the Solaris 10 08/07 OS
While it's not pre-allocating as you asked for, you may also want to look at the large page size options, so that when perl has to ask the OS for more memory for your program, it gets it in larger chunks.
See Solaris Internals: Multiple Page Size Support for more information on the difference this makes and how to do it.
Look at http://metacpan.org/pod/Devel::Size
You could also inline a C function to do the above.
As far as I know, you cannot allocate memory directly from Perl. You can get around this by writing an XS module, or by using an inline C function as I mentioned.
