Design of compression using OpenCL FPGA, memory allocation - opencl

I am trying to implement lossy compression algorithms using OpenCL. I have divided the work into two kernels: one implements the algorithm itself, and the other is used for encoding, meaning that I am concatenating differently sized bytes into the stream of compressed data. The problem is that I am getting a high initiation interval due to inefficient memory accesses, and the encoding part takes too much time. When I opened the HTML report, I got the following message:
stallable, 13 reads and 25 writes.
Reduce the number of write accesses or fix banking to make this memory system stall-free. Banking may be improved by using compile-time known indexing on lowest array dimension.
Banked on bits 0, 1 into 4 separate banks.
Private memory implemented in on-chip block RAM
My question is, how could I improve the allocation of the stream? The way I do it is as follows:
unsigned char __attribute__((numbanks(4),bankwidth(8))) out[outsize];
but it is inefficient. Is there any technique or way that I can use for better utilization?
The way I do the encoding is that I add bytes while monitoring the index of the last modified bit and byte, so I am XORing; and because sometimes I get more than one byte and sometimes less than one byte, I work byte by byte.
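For illustration, a minimal byte-by-byte bit-packing writer of that kind might look like the following in plain C; the struct, the MSB-first order, and the names are my assumptions, not the original kernel code. The stream is assumed to be zero-initialised, so ORing and XORing a new chunk into a partially filled byte are equivalent.

#include <stdint.h>

/* Illustrative bit-packing writer: appends the low `nbits` (1..32) of
   `code` into the output stream, tracking the last modified byte/bit. */
typedef struct {
    unsigned char *out;   /* output stream, assumed zero-initialised   */
    uint32_t byte_pos;    /* index of the byte currently being filled  */
    uint32_t bit_pos;     /* bits already used in that byte (0..7)     */
} bitwriter;

static void put_bits(bitwriter *bw, uint32_t code, uint32_t nbits)
{
    while (nbits > 0) {
        uint32_t room = 8 - bw->bit_pos;               /* free bits in current byte */
        uint32_t take = (nbits < room) ? nbits : room;
        uint32_t bits = (code >> (nbits - take)) & ((1u << take) - 1u);
        bw->out[bw->byte_pos] |= (unsigned char)(bits << (room - take));
        bw->bit_pos += take;
        nbits -= take;
        if (bw->bit_pos == 8) {                        /* byte is full, move on */
            bw->bit_pos = 0;
            bw->byte_pos++;
        }
    }
}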

Related

How to replace MPI_Pack_size if I need to send more than 2GB of data?

I want to send and receive more than 2 GB of data using MPI and I came across a lot of articles like the ones cited below:
http://blogs.cisco.com/performance/can-we-count-on-mpi-to-handle-large-datasets,
http://blogs.cisco.com/performance/new-things-in-mpi-3-mpi_count
talking about changes made starting with MPI 3.0 that allow sending and receiving bigger chunks of data.
Most of the functions now take an MPI_Count parameter instead of an int, but not all of them.
How can I replace
int MPI_Pack_size(int incount, MPI_Datatype datatype, MPI_Comm comm,
int *size)
in order to get the size of a larger buffer? (because here the size can only be at most 2GB)
The MPI_Pack routines (MPI_Pack, MPI_Unpack, MPI_Pack_size, MPI_Pack_external) are, as you see, unable to support more than 32 bits worth of data, due to the integer pointer used as a return value. I don't know why the standard did not provide MPI_Pack_x, MPI_Unpack_x, MPI_Pack_size_x, and MPI_Pack_external_x -- presumably an oversight? As Jeff suggests, it might have been done so because packing multiple gigs of data is unlikely to provide much benefit. Still, it breaks orthogonality not to have those...
A quality implementation (I do not know if MPICH is one of those) should return an error about the type being too big, allowing you to pack a smaller amount of data.
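If the size is only needed to allocate a buffer, one possible workaround (a sketch, assuming you also pack the data chunk by chunk, so the sum of per-chunk sizes is a valid upper bound) is to query MPI_Pack_size for chunks that each stay below the int limit and accumulate the results in an MPI_Count; pack_size_large and the chunk size below are illustrative names, not standard API:

#include <mpi.h>

/* Accumulate MPI_Pack_size over sub-INT_MAX chunks into an MPI_Count. */
static int pack_size_large(MPI_Count incount, MPI_Datatype datatype,
                           MPI_Comm comm, MPI_Count *size)
{
    const int chunk = 1 << 20;                 /* elements per query, well under INT_MAX */
    MPI_Count total = 0;

    while (incount > 0) {
        int n = (incount > chunk) ? chunk : (int)incount;
        int part = 0;
        int err = MPI_Pack_size(n, datatype, comm, &part);
        if (err != MPI_SUCCESS)
            return err;
        total += part;                         /* upper bound for packing this chunk */
        incount -= n;
    }
    *size = total;
    return MPI_SUCCESS;
}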

Read/write binary data on SD using Arduino

I'm working on a project with an Arduino, and I'd like to be able to save some data persistently. I'm already using an Ethernet shield, which has a MicroSD reader.
The data I'm saving will be incredibly small. At the moment, I'll just be saving 3 bytes at a time. What I'd really like is a way to open the SD card for writing starting at byte x and then write y bytes of data. When I want to read it back, I just read y bytes starting at byte x.
However, all the code I've seen involves working with a filesystem, which seems like an unneeded overhead. I don't need this data to be readable on any other system, storage space isn't an issue, and there's no other data on the card to worry about. Is there a way to just write binary data directly to an SD card?
It is possible to write raw binary data to an SD card. Most people do this using the 4-pin SPI interface supported by the SD card. Unfortunately, data isn't byte-addressed, but block-addressed (block size usually 512 bytes).
This means that if you wanted to write 4 bytes at byte 516, you'd have to read in block 0x00000001 (the second block), calculate an offset, write your data, and then write the entire block back. (I can't say whether this limitation applies to the SD interface that uses more pins; I have no experience with it.)
This complication is why a lot of people opt for using libraries that include "unneeded overhead".
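For illustration, the read-modify-write pattern described above might look roughly like this; sd_read_block and sd_write_block are hypothetical stand-ins for whatever raw block I/O your SPI SD driver provides:

#include <stdint.h>
#include <string.h>

#define SD_BLOCK_SIZE 512u

/* Hypothetical raw block I/O provided by the SD/SPI driver. */
extern int sd_read_block(uint32_t block, uint8_t buf[SD_BLOCK_SIZE]);
extern int sd_write_block(uint32_t block, const uint8_t buf[SD_BLOCK_SIZE]);

/* Store `len` bytes at absolute byte address `addr` (single block only). */
int sd_write_bytes(uint32_t addr, const uint8_t *data, uint16_t len)
{
    uint8_t block_buf[SD_BLOCK_SIZE];
    uint32_t block  = addr / SD_BLOCK_SIZE;    /* e.g. byte 516 -> block 1  */
    uint16_t offset = addr % SD_BLOCK_SIZE;    /* e.g. byte 516 -> offset 4 */

    if (offset + len > SD_BLOCK_SIZE)
        return -1;                             /* keep the sketch single-block */
    if (sd_read_block(block, block_buf) != 0)
        return -1;
    memcpy(block_buf + offset, data, len);     /* patch the bytes in place   */
    return sd_write_block(block, block_buf);   /* write the whole block back */
}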
With that said, I've had to do this in the past, because I needed a way of logging data that was robust in the face of power failures. I found the following resource very helpful:
http://elm-chan.org/docs/mmc/mmc_e.html
You'll probably find it easier to make your smaller writes to a memory buffer, and dump them to the SD card when you have a large enough amount of data to make it worthwhile.
If you look around, you'll find plenty of open-source code dealing with the SD SPI interface to make use of directly, or as reference to implement your own system.

Memory test operation without pointers in NXC on NXT?

I'm trying to write a memory test program for the NXT, since I have several with burned memory cells and would like to identify which NXTs are unusable. This program is intended to test each byte in memory for integrity by:
Allocating 64 bits to a Linear Feedback Shift Register randomizer
Adding another byte to a memory pointer
Writing random data to the selected memory cell
Verifying the data is read back correctly
However, I then discovered through these attempts that the NXT doesn't actually support pointer operations. Thus, I can't simply iterate the pointer byte and read its location to test.
How do I go about iterating over indexes in memory without pointers?
I think the problem is that you don't really get direct memory access in either NBC/NXC or RobotC.
From what I know, both run on an NXT firmware emulator, so the bad memory address[es] might change from your program's point of view (assuming the emulator does virtual memory).
To actually run bare metal, I would suggest using the NXTBINARY function of John Hansen's modified firmware, as described here:
http://www.tau.ac.il/~stoledo/lego/nxt-native/
The enhanced firmware can be found at:
http://bricxcc.sourceforge.net/test_releases/
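For reference, the pointer-based walking test that the question describes looks roughly like the following in ordinary C. It would have to run bare metal (for example via NXTBINARY), since NXC itself exposes no raw pointers, and the xorshift-style generator here is just an illustrative stand-in for the 64-bit LFSR:

#include <stdint.h>
#include <stddef.h>

/* 64-bit xorshift step, standing in for the LFSR randomizer. */
static uint64_t lfsr_next(uint64_t s)
{
    s ^= s << 13;
    s ^= s >> 7;
    s ^= s << 17;
    return s;
}

/* Walk [start, end) byte by byte: write a pseudo-random value, read it
   back, and return the first mismatching address (NULL if all cells pass). */
volatile uint8_t *mem_test(volatile uint8_t *start, volatile uint8_t *end,
                           uint64_t seed)
{
    volatile uint8_t *p;
    uint64_t s = seed ? seed : 1u;

    for (p = start; p < end; ++p) {
        s = lfsr_next(s);
        uint8_t pattern = (uint8_t)s;
        *p = pattern;                 /* write random data to the cell  */
        if (*p != pattern)            /* verify it reads back correctly */
            return p;
    }
    return NULL;
}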

What size of buffer is best for uploading a file to the internet?

I'm using the HTTP API provided by MS to upload video to YouTube, and I noticed that the total elapsed time differs with different buffer sizes. What size of buffer is best for uploading a file to the internet? Thanks in advance.
Try it out. It depends on your network speed and other settings. If there were one optimal size, it would have been preconfigured.
The right one?
TCP/IP has lots of self-tuning functionality built in (although by default window scaling is disabled). If you are seeing different behaviours with different application-level buffers, this is most likely due to anomalies within the application code. If the code is closed-source, you can only ever do black-box testing to find the optimal behaviour. However, at a guess, it sounds like the source reads are delayed until the buffer is empty - try using rotating buffers with a pre-fetch, e.g. (a rough sketch follows the steps below):
i) read X bytes into buffer 1
ii) start writing buffer 1 to the output in a separate thread
iii) read X bytes into buffer 2
iv) when the thread created in ii returns, swap the buffers around and repeat steps from ii
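A rough sketch of that rotating-buffer pattern using POSIX threads; send_to_network is a placeholder for whatever upload call the HTTP API actually exposes, and BUF_SIZE is only a starting point to tune:

#include <pthread.h>
#include <stdio.h>

#define BUF_SIZE (256 * 1024)          /* "X" bytes per read; tune empirically */

struct chunk {
    unsigned char data[BUF_SIZE];
    size_t len;
};

/* Placeholder for the actual upload call. */
extern size_t send_to_network(const unsigned char *data, size_t len);

static void *writer(void *arg)
{
    struct chunk *c = (struct chunk *)arg;
    send_to_network(c->data, c->len);   /* steps ii/iv: write in a separate thread */
    return NULL;
}

void upload(FILE *in)
{
    static struct chunk bufs[2];
    pthread_t tid;
    int cur = 0, busy = 0;

    for (;;) {
        /* steps i/iii: read the next chunk while the previous one is sending */
        bufs[cur].len = fread(bufs[cur].data, 1, BUF_SIZE, in);
        if (busy)
            pthread_join(tid, NULL);    /* wait for the previous write to finish */
        if (bufs[cur].len == 0)
            break;
        pthread_create(&tid, NULL, writer, &bufs[cur]);
        busy = 1;
        cur ^= 1;                       /* swap buffers and repeat */
    }
}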
C.

Tibco Rendezvous - size constraints

I am attempting to put a potentially large string into a rendezvous message and was curious about size constraints. I understand there is a physical limit (64mb?) to the message as a whole, but I'm curious about how some other variables could affect it. Specifically:
How big are the keys?
How the string is stored (in one field vs. multiple fields)
Any advice on any of the above topics or anything else that could be relevant would be greatly appreciated.
Note: I would like to keep the message as a raw string (as opposed to bytecode, etc).
From the Tibco docs on Very Large Messages:
Rendezvous software can transport very large messages; it divides them into small packets, and places them on the network as quickly as the network can accept them. In some situations, this behavior can overwhelm network capacity; applications can achieve higher throughput by dividing large messages into smaller chunks and regulating the rate at which it sends those chunks. You can use the performance tool to evaluate chunk sizes and send rates for optimal throughput.
This example sends one message consisting of ten million bytes. Rendezvous software automatically divides the message into packets and sends them. However, this burst of packets might exceed network capacity, resulting in poor throughput:
sender> rvperfm -size 10000000 -messages 1
In this second example, the application divides the ten million bytes into one thousand smaller messages of ten thousand bytes each, and automatically determines the batch size and interval to regulate the flow for optimal throughput:
sender> rvperfm -size 10000 -messages 1000 -auto
By varying the -messages and -size parameters, you can determine the optimal message size for your applications in a specific network. Application developers can use this information to regulate sending rates for improved performance.
As to actual limits, the AddString function takes a C-style ANSI string, so it is theoretically unbounded; but given the signature of AddOpaque,
tibrv_status tibrvMsg_AddOpaque(
    tibrvMsg    message,
    const char* fieldName,
    const void* value,
    tibrv_u32   size);
which takes a u32, it would seem sensible to state that the limit is likely to be 4 GB rather than 64 MB.
That said, using Tib to transfer such large packets is likely to be a serious performance bottleneck, as it may have to buffer significant amounts of traffic while it tries to get these sorts of messages to all consumers. By default, the rvd buffer is only 60 seconds, so you may find yourself suffering message loss if this is a significant amount of your traffic.
Message overhead within Tibco is largely as simple as:
the fixed cost associated with each message (the header)
all the fields (type info and the field id)
plus the cost of all variable-length aspects, including:
the send and receive subjects (effectively limited to 256 bytes each)
the field names; I can find no limit to the length of the field names in the docs, but the smaller they are the better - better still, don't use them at all and use the numerical identifiers
the array/string/opaque/user-defined variable-length fields in the message
Note: if you use nested messages, simply recurse the above.
In your case the payload overhead will be so vast in comparison to the names (so long as they are reasonable and simple) that there is little point attempting to optimize these at all.
You may find you can gain considerable efficiency on the wire and in the buffers if you transmit the strings in a compressed form, either through the use of an rvrd with compression enabled, or by changing your producer/consumer to use something fast but effective like deflate (or, if you're feeling esoteric, things like QuickLZ, FastLZ, LZO, etc., especially ones with fixed-memory-footprint compress/decompress engines).
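As a sketch of the producer-side option, you could deflate-compress the string with zlib and attach the result as an opaque field. The header path, the field name "payloadZ", and the error handling below are illustrative assumptions rather than anything mandated by the Rendezvous API:

#include <stdlib.h>
#include <string.h>
#include <zlib.h>
#include <tibrv/tibrv.h>

/* Compress `text` with zlib and add it as an opaque field; returns 0 on
   success, -1 on failure. */
int add_compressed_string(tibrvMsg msg, const char *text)
{
    uLong  src_len  = (uLong)strlen(text);
    uLongf dest_len = compressBound(src_len);          /* worst-case output size */
    Bytef *dest     = (Bytef *)malloc(dest_len);
    int    rc       = -1;

    if (dest == NULL)
        return -1;
    if (compress(dest, &dest_len, (const Bytef *)text, src_len) == Z_OK &&
        tibrvMsg_AddOpaque(msg, "payloadZ", dest, (tibrv_u32)dest_len) == TIBRV_OK)
        rc = 0;                                        /* dest_len is the compressed size */
    free(dest);
    return rc;
}

The consumer side would then need to inflate the opaque field back into a string before using it.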
You don't say which platform API you are targeting (.NET/Java/C++/C, for example), and this will colour things a little. On the wire, all string data will be 1 byte per character regardless of Java/.NET using UTF-16 by default; however, you will incur a significant translation cost placing strings into, and reading them out of, the message, because the underlying buffer cannot be reused in those cases and a copy (and compaction/expansion, respectively) must be performed.
If you stick to opaque byte sequences you will still have the copy overhead in the naive implementations possible through the managed wrapper APIs, but this will at least be less overhead if you have no need to work with the data as a native string.
The overall maximum size of a message is 64MB as was speculated in the OP. From the "Tibco Rendezvous Concepts" document:
Although the ability to exchange large data buffers is a feature of Rendezvous software, it is best not to make messages too large. For example, to exchange data up to 10,000 bytes, a single message is efficient. But to send files that could be many megabytes in length, we recommend using multiple send calls, perhaps one for each record, block or track. Empirically determine the most efficient size for the prevailing network conditions. (The actual size limit is 64 MB, which is rarely an appropriate size.)
