How to replace combinational memory with ASIC cell in Chisel - synthesis

I am trying to do ASIC synthesis for the Rocket processor, which is written in Chisel.
The flow automatically generates *.conf and *.behave_srams.v files, so I can easily replace SeqMem with ASIC SRAM. However, "Mem", which is a combinational memory, is always turned into registers. How can I replace Mem with an ASIC combinational memory or an ASIC register file?
Is there an option for this when generating Verilog?

Unfortunately, the current flow only supports replacing SeqMems. It would be nice to extend it to support combinational memories. For now, your best bet is to instantiate your ASIC combinational memories as black boxes directly in the Chisel code.

Related

Methods to discourage reverse engineering of an opencl kernel

I am preparing my OpenCL-accelerated toolkit for release. Currently, I compile my OpenCL kernels into binaries targeted at a particular card.
Are there other ways of discouraging reverse engineering? Right now, I have many OpenCL binaries in my release folder, one for each kernel. Would it be better to splice these binaries into one single binary, or even embed them in the host binary and read them in using a special offset?
OpenCL 2.0 and SPIR-V can be used for this, but they are not available on all platforms yet.
Encode the binaries. Keep the keys on a server and have clients request them at time of use. Of course, the keys should be encoded too (using a variable value such as the server time, maybe). Then decode on the client and use the result as the binary kernel.
I'm not an encoding pro, but I would apply multiple algorithms multiple times to make it harder, in case each one alone could be cracked within several months (the time you need to ship a new version of your GPGPU software, for example). Even a simple unknown algorithm of your own, such as reversing the order of the bits of all the data (1st bit goes to the nth position, nth goes to the 1st), should make it look hard to level-1 hackers.
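As a rough illustration of the bit-reversal idea, here is a minimal host-side sketch in Python. The function name reverse_bits is made up for this example, and this is only obfuscation, not encryption: anyone who guesses the scheme can undo it.

```python
# Lookup table: entry b holds byte b with its 8 bits in reverse order.
_BIT_REVERSED = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))


def reverse_bits(data: bytes) -> bytes:
    """Reverse the bit order of the whole buffer (first bit <-> last bit).

    Reversing every bit of the buffer is the same as reversing the byte
    order and then reversing the bits inside each byte.
    """
    return data[::-1].translate(_BIT_REVERSED)


# The transform is its own inverse, so the same call "decodes" the blob
# right before it is handed to the OpenCL runtime.
blob = b"\x01\x00kernel-binary-bytes"
scrambled = reverse_bits(blob)
restored = reverse_bits(scrambled)
```

Because the transform is self-inverse, the release folder only needs to ship the scrambled blobs, and the host applies the same function once before loading them.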
Warning: some profiling tools can get the kernel code at run time, so you could add many (maybe hundreds of) trivial kernels with no performance penalty to hide it in a crowded timeline, or disable profiling in the kernel options, or deliberately introduce an error (maybe some broken events in the queues) and then restart so the profiler cannot initialize.
Maybe you could obfuscate the final C99 code so that it becomes unreadable by humans. Anyone who can still read it doesn't need to hack it in the first place.
Maybe most effectively, don't do anything: just obtain copyright protection for your genuine algorithm and state it in a txt file, so people can look but won't dare copy it for money.
If the kernel can be rewritten into an "interpreted" version without a performance penalty, you can send bytecodes from the server to the client. When the client user checks the profiler, he/she sees only the interpreter code, not the real working algorithm, since that depends on the bytecodes the server supplies as "data" (a buffer). For example, c=a+b becomes if(...) else if(...) else(case ...) and has no meaning without a data feed.
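To make the interpreter idea concrete, here is a minimal sketch in Python rather than OpenCL C. The opcode names and the run function are hypothetical; a real kernel would read the bytecode from a buffer argument and would look like one big if/else chain.

```python
# Hypothetical opcodes; in the real setup these numbers would arrive from
# the server as plain data and never appear in the shipped kernel source.
PUSH_A, PUSH_B, ADD, MUL = range(4)


def run(bytecode, a, b):
    """Evaluate a tiny stack program; the algorithm lives in `bytecode`."""
    stack = []
    for op in bytecode:
        if op == PUSH_A:
            stack.append(a)
        elif op == PUSH_B:
            stack.append(b)
        elif op == ADD:
            y, x = stack.pop(), stack.pop()
            stack.append(x + y)
        elif op == MUL:
            y, x = stack.pop(), stack.pop()
            stack.append(x * y)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop()


# "c = a + b" expressed as data instead of code.
c = run([PUSH_A, PUSH_B, ADD], 2, 3)
```

Someone inspecting only the interpreter sees a generic dispatch loop; which computation it performs depends entirely on the bytecode fed in.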
On top of all these, you could buy time against people trying to reverse engineer it by picking variable names that trigger "selective perception", so he/she can't focus for a while. Meanwhile, you develop a meaner version. For example, c=a+b becomes bulletsToDevilsEar=breakDevilsLeg+goodGame

What's different between the normal memory object and OpenCL's pipe?

Pipes are one of OpenCL 2.0's new features, and the feature is demonstrated in the AMD APP SDK's producer/consumer example. I've read some articles about pipe use cases, and they all follow the producer/consumer pattern.
My question is: the same functionality can be achieved by creating a global memory object and passing the pointer to two kernel functions, given that OpenCL 2.0 provides shared virtual memory. So what's the difference between a pipe object and a global memory object? Or was it invented just for optimization?
The distinction is the same as between std::vector and std::queue.
One is useful to store data, while the other is useful to store packets.
Packets are indeed data, but it is much easier to handle them as small units rather than a big block.
Pipes in OpenCL allow you to consume these small packets in a kernel, without having to deal with the indexing + storing + pointers + forloops hell that would happen if you manually implement a pipe mechanism yourself in the kernel.
Pipes are useful, for example, when each work item can generate a variable number of outputs. Prior to OpenCL 2.0, this was difficult to handle.
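The packet-queue behaviour described above can be mimicked on the host as a rough analogy. This is a Python sketch, not OpenCL code: the deque stands in for the pipe, and the producer and consumer functions are hypothetical stand-ins for two kernels.

```python
from collections import deque


def producer(work_items, pipe):
    """Each work item emits a variable number of packets (here: item % 3)."""
    for item in work_items:
        for k in range(item % 3):
            pipe.append((item, k))  # no output index bookkeeping needed


def consumer(pipe):
    """Drain packets in FIFO order, one small unit at a time."""
    results = []
    while pipe:
        item, k = pipe.popleft()
        results.append(item * 10 + k)
    return results


pipe = deque()
producer(range(4), pipe)   # items 0 and 3 emit nothing, 1 emits one, 2 emits two
results = consumer(pipe)
```

With a plain global buffer, the producer would have to reserve output slots itself (for example with an atomic counter) and tell the consumer how many entries are valid; the queue hides that bookkeeping.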
Pipes may reside in faster memory (vendor specific); e.g., Altera recommends using pipes rather than global memory to exchange data between kernels.
Pipes are designed to transfer data from one kernel to one or more other kernels without the need to store data to, or load it from, global or host memory. On an FPGA device, a pipe is essentially a FIFO, so access to the data is much faster than going through DDR or host memory. This is probably one reason to use an FPGA as an accelerator.
Sometimes DDR is used to share data between kernels as well. One example is a SIMD kernel that wants to share data with a single-task kernel that has a requirement on the input data sequence, since pipes used in a SIMD way will run out of order.
Besides pipes, you can use Altera channels, which support more functions, but they are not portable to other OpenCL devices.
Hope this can help. :)

How to get kernel information

I want to get the following information about compiled OpenCL kernels: the list of argument types and the parameter order (if possible, with memory and access qualifiers). The kernels are built from source at application run time.
Actually, OpenCL 1.2 already has an appropriate function for such queries, clGetKernelArgInfo, but due to project restrictions I have to find a way to achieve this using pure OpenCL 1.0 without any extensions.
At present, I am thinking about three approaches:
write a simple ANSI C parser to get the kernel's signature directly from the OpenCL kernel source
use macros in the OpenCL code to mark kernel arguments for simple in-app parsing (by extending this idea)
define a list of the most likely combinations of kernel arguments using macros and helper classes (due to my project's constraints, it is possible to operate with 3-5 common argument types)
My question: are there any other ways to get info about a compiled kernel?
I want to use this info to reduce the amount of OpenCL boilerplate in client code by encapsulating calls to clCreateBuffer, clEnqueueWrite/Read, and clSetKernelArg in a small wrapper, which should check the provided params, allocate device-side pointers, copy data from/to the host, and so on.
The Khronos WebCL Validator gives you the equivalent of clGetKernelArgInfo, including all qualifiers.
The necessary downside is that it's a complete parser, based on Clang/LLVM. It takes roughly the same amount of time to run as a typical OpenCL compiler (not a coincidence), and adds around 10 megabytes to your executable size.
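If a full Clang-based parser is too heavy, the first approach from the question (a simple parser over the kernel source) can be sketched roughly as follows. This is a Python sketch under strong assumptions: kernels are declared as "__kernel void name(...)" with no attributes between the keywords, and the argument list contains no macros, parentheses, or function pointers. The function name parse_kernel_signatures is made up for the example.

```python
import re

# Matches "__kernel void name(arg, arg, ...)" (or "kernel void ...").
KERNEL_RE = re.compile(r"\b(?:__kernel|kernel)\s+void\s+(\w+)\s*\(([^)]*)\)")
ADDRESS_SPACES = {"__global", "global", "__local", "local",
                  "__constant", "constant", "__private", "private"}
ACCESS = {"__read_only", "read_only", "__write_only", "write_only",
          "__read_write", "read_write"}


def parse_kernel_signatures(source):
    """Return {kernel_name: [arg_info, ...]} in declaration order."""
    # Strip comments first so they cannot confuse the regex.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)
    source = re.sub(r"//[^\n]*", " ", source)
    kernels = {}
    for match in KERNEL_RE.finditer(source):
        name, arg_list = match.groups()
        args = []
        for raw in arg_list.split(","):
            tokens = raw.replace("*", " * ").split()
            if len(tokens) < 2:
                continue  # empty list or a lone "void"
            *quals, arg_name = tokens
            space = next((q.lstrip("_") for q in quals if q in ADDRESS_SPACES),
                         "private")  # unqualified arguments are private
            access = next((q.lstrip("_") for q in quals if q in ACCESS), None)
            ctype = " ".join(q for q in quals
                             if q not in ADDRESS_SPACES and q not in ACCESS)
            args.append({"name": arg_name, "type": ctype,
                         "address_space": space, "access": access})
        kernels[name] = args
    return kernels
```

A wrapper could then walk the returned list to decide, per argument, whether to call clCreateBuffer (global pointers) or pass the value directly to clSetKernelArg. Anything the preprocessor would change (macros, #ifdef blocks) is invisible to this sketch, which is the main reason a real parser such as the validator above is more robust.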

A way to change mcu program from the outside

We need to change controller code from the outside, as is done with industrial MCUs.
That is, you have an MCU with a program on it, and someone can program some "words" into it that determine how it works.
So, for example, you can program an MCU, not with a programmer but with some inputs over serial, to do simple things such as:
if input A==1
b=1
I wonder if there is a smart way to do that with simple software on the MCU that has many #defines for various commands and performs them according to values it gets from the outside (and saves for the rest of the program).
I wonder whether industrial programmers use that method, or whether every user programming actually loads code (.hex) onto the chip (with an internal programmer).
I prefer the simplest way (I wonder if it's predefined software).
Hopefully this answers your question. It sounds like the simplest version of your question is "How do I change the behavior of the MCU without an actual MCU programmer?" A couple of options come to mind.
1) Depending on the MCU you can have a bootloader that is essentially a small piece of code programmed in the MCU by a programmer that has the ability to reprogram other parts of the MCU. This doesn't require a programmer but involves some other form of letting the bootloader know what the new code is (USB, Serial, SD Card, etc). This will only work if the MCU has the ability to self flash.
2) Again, depending on MCU and scenario you could program a generic set of rules that carry out functionality based on the inputs given to the MCU. This could be in the form of IO pins, EEPROM, or a domain-specific script on an SD card that the MCU can read and interpret at runtime.
Both options depend on the MCU you are using and what hardware capabilities you have at your disposal. But you certainly have options other than reprogramming the end hardware with an actual programmer every time you want to make a change. Hopefully that helps.
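Option 2 can be sketched as a tiny rule interpreter. The sketch below is in Python for brevity (on a real MCU this would be C), and the rule syntax "IF A == 1 THEN B = 1" as well as the names parse_rule and apply_rules are assumptions made up for this example, not an existing format.

```python
def parse_rule(line):
    """Parse one rule of the assumed form: IF <input> == <value> THEN <output> = <value>."""
    tokens = line.split()
    if (len(tokens) != 8 or tokens[0] != "IF" or tokens[2] != "=="
            or tokens[4] != "THEN" or tokens[6] != "="):
        raise ValueError(f"bad rule: {line!r}")
    return (tokens[1], int(tokens[3]), tokens[5], int(tokens[7]))


def apply_rules(rules, inputs, outputs):
    """Run every stored rule once against the current input values."""
    for in_name, in_value, out_name, out_value in rules:
        if inputs.get(in_name) == in_value:
            outputs[out_name] = out_value
    return outputs


# Rules would arrive over serial and be kept in EEPROM; here they are a list.
rules = [parse_rule("IF A == 1 THEN B = 1")]
state_on = apply_rules(rules, {"A": 1}, {"B": 0})
state_off = apply_rules(rules, {"A": 0}, {"B": 0})
```

The firmware itself never changes; only the stored rule table does, which is why no programmer or bootloader is needed for this kind of update.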

mpi under the hood

I need to deliver a presentation on programming in MPI, and I need to add a segment on how MPI works under the hood. For example, what happens when I call MPI_Init?
Do you know of any good source where I can learn these details?
The MPI Spec contains the description of the knobs, sliders, and displays that are on the outside of the "black box" of each API.
The interior details of the black boxes will be implementation dependent...and will also depend on the interconnect (e.g. TCP, IBV, DAPL, etc), the OS (e.g. is the implementation using LSB, or native libraries, etc), and on many other factors to a lesser degree (e.g. message size thresholds will trigger different code paths, and so on). Using "strace" and "ltrace" on the a.out may provide some insight into the actual goings on inside the blackbox.
The best recommendation is to pick an open source implementation and examine the code to determine the internal details.
MPI is a specification, not a particular implementation. The observable behavior is given in the MPI spec. How it works under the hood depends on the particular implementation. If you'd like to take a look at an example implementation, you might be interested in looking at MPICH2 and browsing their source code.
Complement your study of the source code of an implementation of MPI with consideration of how you would implement MPI_Init on your platform of choice. MPI sits on top of already available O/S functionality. I don't mean to suggest that you can figure out how a particular version of MPI is implemented by this approach, but to suggest that you can learn better what is going on under the hood by tackling the problem from another angle.
MPI is only a spec. MPI spec is implemented by various groups and organizations. You will want to pick one implementation, say, MPICH, and you can find their design documentation. That will tell you how the MPI spec is implemented by that group.
If you just want to describe what happens when an application written in MPI is started, you can read about MPI and MPI programming. I highly recommend http://www.citutor.org
