Packed vs unpacked vectors in SystemVerilog

Looking at some code I'm maintaining in System Verilog I see some signals that are defined like this:
node [range_hi:range_lo]x;
and others that are defined like this:
node y[range_hi:range_lo];
I understand that x is defined as packed, while y is defined as unpacked. However, I have no idea what that means.
What is the difference between packed and unpacked vectors in System Verilog?
Edit: Responding to @Empi's answer, why should a hardware designer who's writing in SV care about the internal representation of the array? Are there any times when I shouldn't or can't use packed signals?

This article gives more details about this issue:
http://electrosofts.com/systemverilog/arrays.html, especially section 5.2.
A packed array is a mechanism for subdividing a vector into subfields which can be conveniently accessed as array elements. Consequently, a packed array is guaranteed to be represented as a contiguous set of bits. An unpacked array may or may not be so represented. A packed array differs from an unpacked array in that, when a packed array appears as a primary, it is treated as a single vector.
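As a rough illustration of the "single vector with subfields" idea, here is a minimal Python sketch (not SystemVerilog) that models a packed bit [3:0][7:0] value as one 32-bit integer. The helper names get_byte and set_byte are made up for this example; the point is only that every subfield lives inside the same contiguous word, so the value can be sliced or treated as a whole number.

```python
WIDTH = 8   # bits per subfield, as in [7:0]
COUNT = 4   # number of subfields, as in [3:0]

def get_byte(vec: int, idx: int) -> int:
    """Read subfield idx (0 = least significant byte) of the packed vector."""
    return (vec >> (idx * WIDTH)) & ((1 << WIDTH) - 1)

def set_byte(vec: int, idx: int, value: int) -> int:
    """Return a copy of vec with subfield idx replaced by value."""
    mask = ((1 << WIDTH) - 1) << (idx * WIDTH)
    return (vec & ~mask) | ((value << (idx * WIDTH)) & mask)

packed = 0x11223344
top = get_byte(packed, COUNT - 1)        # most significant subfield
patched = set_byte(packed, 0, 0xFF)      # replace the least significant subfield
as_whole = (packed + 1) & 0xFFFFFFFF     # arithmetic on the whole vector
```

An unpacked array, by contrast, would be closer to a Python list of separate values, where arithmetic on the list as a whole has no meaning.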

Before going into what exactly packed and unpacked arrays are, let's see how to tell them apart just from their declaration.
In a packed array, the dimensions are declared before the object name. For example:
bit [2:0][6:0] a;
In an unpacked array, the dimensions are declared after the object name. For example:
bit a[3];
A packed array is stored as one contiguous vector of bits, whereas an unpacked array need not be.
You can also declare and initialize an unpacked array like this:
reg unpacked_array [7:0] = '{0,0,0,0,0,0,0,1};
You can mix packed and unpacked dimensions to make a multidimensional memory. For example:
bit [3:0][7:0] a [2:0];
This declares 3 elements, each of which is a packed vector of 4 bytes (4*8 = 32 bits).

Packed arrays are mainly used for efficient memory usage. When we write [3:0][7:0] A [4:0], four slices of 8 bits each are packed together to form one 32-bit memory location. The dimension on the right-hand side means that there are 5 such 32-bit locations.

bit[3:0] a -> packed array
The packed array can be used as a full array (a='d1) or just part of an array (a[0]='b1)
bit a [3:0] -> unpacked array
The unpacked array cannot be used as a[0]='b1, it has to be used as full a={8{'b1}}

A good explanation is given at verification academy.
A packed array of n vectors can be imagined as an array of single row and n columns.
Whereas an unpacked array of "n" vectors and "m" unpacked dimensions can be imagined as a matrix/2D array of n columns and m rows.

Unpacked arrays will give you more compile time error checking than packed arrays.
I see unpacked arrays on the port definitions of modules for this reason. The compiler will error if the dimensions of the signal are not exactly the same as the port with unpacked arrays. With packed arrays it will normally just go ahead and wire things the best it can, not issuing an error.

bit a [3:0] -> unpacked array The unpacked array cannot be used as a[0]='b1, it has to be used as full a={8{'b1}}
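---> In the statement above, a[0] = 'b1; does work for an unpacked array. What does not work is assigning a vector value to the unpacked array as a whole: for example, with logic unpkd [8]; the assignment unpkd = 5'h7; is not allowed, while the same assignment works for a packed array.
--> Likewise, unpkd = unpkd + 2; won't work for an unpacked array but will work for a packed one.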


Max Array Length in Julia

I can create an array of a million elements like this:
Array(1:1_000_000)
Vector{Int64} with 1000000 elements
but if I try to create an array of a billion elements I get this:
Array(1:1_000_000_000)
Julia has exited.
Press Enter to start a new session.
Is Julia not able to handle a billion elements in an array or what am I doing wrong here?
You are creating an Array of Int64, each of which needs to be stored in memory:
julia> sizeof(3)
8
So at some point you're bound to run out of memory - this is not due to some inherent limit on the number of elements in an array, but rather the size of the overall array, which in turn depends on the size of each element. Consider:
julia> sizeof(Int8(3))
1
julia> [Int8(1) for _ in 1:1_000_000_000]
1000000000-element Array{Int8,1}:
1
1
1
⋮
1
1
1
so filling the array with a smaller data type (8-bit rather than 64-bit Integer) allows me to create an array with more elements.
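The arithmetic behind this answer can be sketched in a few lines. This is a Python illustration rather than Julia, the function names are invented for the example, and it only counts the raw element storage (any per-array overhead is ignored).

```python
def array_bytes(n_elements: int, bytes_per_element: int) -> int:
    """Raw storage needed for n_elements of a fixed-size element type."""
    return n_elements * bytes_per_element

def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30

n = 1_000_000_000
int64_total = array_bytes(n, 8)   # 8 bytes per element, as sizeof(3) reports
int8_total = array_bytes(n, 1)    # 1 byte per element, as sizeof(Int8(3)) reports

print(f"Int64: {to_gib(int64_total):.2f} GiB")  # about 7.45 GiB
print(f"Int8:  {to_gib(int8_total):.2f} GiB")   # about 0.93 GiB
```

So the billion-element Int64 array needs roughly 7.45 GiB for its elements alone, which is more than many machines have free, while the Int8 version needs under 1 GiB.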
While there is no limit on how big an Array can be in Julia, you are obviously limited by the available RAM (as mentioned in the other answer). Basically, you can assume that all your available system memory can be allocated to a Julia process. sizeof is a good way to calculate how much RAM you need.
However, if you actually do big array computing in Julia the above limit can be circumvented in many ways:
Use massive memory machines from a major cloud computing provider. I use Julia on AWS Linux and it works like a charm - you can have up to 4TB RAM on a virtual machine and 24TB RAM on a bare metal machine. While it is not a Julia solution, sometimes it is the easiest and cheapest way to go.
Sometimes your data is sparse - you do not actually use all of those memory cells. In such cases consider SparseArrays. In other cases your sparse data is structured in some specific way (e.g. non-zero values only on the diagonal); in that case use BandedMatrices.jl. It is worth noting that there is even a Julia package for infinite algebra. Basically, whatever you find at the Julia Matrices project is worth looking at.
You can use memory mapping - that means that most of your array is on disk and only some part is held in RAM. In this way you are limited by your disk space rather than by the RAM.
You can use DistributedArrays.jl and have a single huge Array hosted on several machines.
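To make the memory-mapping option a little more concrete, here is a minimal sketch of the idea using Python's standard mmap module instead of Julia. The function name write_and_read_mapped is hypothetical, and the temporary file stands in for whatever large data file you would really map. The operating system pages parts of the file in and out on demand, so the mapped region can be much larger than what is resident in RAM at any moment.

```python
import mmap
import os
import tempfile

def write_and_read_mapped(n_bytes: int, index: int, value: int) -> int:
    """Back an n_bytes 'array' with a file, write one byte through the map, read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, n_bytes)             # size the backing file
        with mmap.mmap(fd, n_bytes) as mm:    # map the whole file into the address space
            mm[index] = value                 # touches only the page containing index
            result = mm[index]
        return result
    finally:
        os.close(fd)
        os.remove(path)

print(write_and_read_mapped(1 << 20, 12345, 7))
```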
Hope it will be useful for you or other people trying to do big data algebra in Julia.

Deallocation of array target, but pointer still seems to have the values [duplicate]

I have a question related to one asked some years ago on Intel Developer Forum about the in-place reshaping of an array.
In short, the answer was that an array of a certain rank can be allocated, and a pointer created that refers to the same memory location (i.e. in-place), but with a different rank, e.g.:
use, intrinsic :: ISO_C_BINDING
integer, allocatable, target :: rank1_array(:)
integer, pointer :: rank3_array(:,:,:)
integer :: i
! Allocate rank1_array
allocate(rank1_array(24))
! Created rank3_pointer to rank1_array
call C_F_POINTER (C_LOC(rank1_array), rank3_array, [3,2,4])
! Now rank3_array is the same data as rank1_array, but a 3-dimension array with bounds (3,2,4)
My question now is: if I deallocate the original array rank1_array, why is the pointer rank3_array still associated, and why can it (seemingly) be used without a problem? Thus, if I append the following to the code segment above:
! initialise the allocated array
rank1_array = [(i, i=1,24)]
! then deallocate it
deallocate(rank1_array)
! now do stuff with the pointer
print *, associated(rank3_array)
rank3_array(2,2,1) = 99
print *, rank3_array
Compiling and running this program gives me the output
gfortran -Wall my_reshape.f90 -o my_reshape
./my_reshape
T
1 2 3 4 99 6 7 ... 23 24
If the memory of rank1_array was deallocated, why does rank3_array still function unless it is a copy of the original? Was the initial reshape then in-place or not? Would be very grateful if someone could explain this behaviour to me.
I'm using gfortran 6.1.0, if that is of interest.
Edit/Update:
As the accepted answer by @francescalus indicates, the real issue here is how I (incorrectly!) handled pointers in general and not the in-place reshape with C_F_POINTER in particular. The strange behaviour I saw was just a result of undefined behaviour due to the non-compliant Fortran code I wrote. Based on @francescalus's answer and comments, I did more reading online and thought it might be useful to give a link to a relevant section of a Fortran Reference Manual that very clearly explains how pointers and allocatable arrays should be handled.
That c_f_pointer is used instead of "normal" pointer assignment is not relevant to the problem, nor is the changing shape.
After the call to c_f_pointer the pointer rank3_array is pointer associated with the target rank1_array. There is no copy made.
When rank1_array is deallocated in the statement
deallocate(rank1_array)
this has an effect on the pointer which has rank1_array as a target. In particular, the pointer association status of rank3_array becomes undefined. (Whenever a pointer's target is deallocated except through the pointer, the pointer's association status becomes undefined.)
With the pointer of undefined association status the next part
print *, associated(rank3_array)
is not allowed. At this point the program is not a Fortran-compliant program (and the compiler needn't detect that) and the processor is allowed to print .TRUE. here if it wants to.
Equally, with
rank3_array(2,2,1) = 99
print *, rank3_array
rank3_array itself is undefined and those references are also not allowed. Again, any effect is available to the compiler.
Now, as in another answer on a similar topic: just because rank1_array has been deallocated that doesn't mean that the memory gets purged. Probably all that happens is some array descriptor for the first array has its status changed. It isn't the compiler's responsibility to do the same to all related pointers/descriptors. (And so the pointer's descriptor may indeed still say "associated".)
It's important to note, though: it may look like it's working, but I wouldn't advise betting your job on it.

Deleting some elements in an array in OpenCL kernel code

After compacting an array (putting the required elements from an input array into an output array) with a scan operation, there may be some empty space left in the output (compacted) array, contiguous and after the required elements. Is there a way to free this empty space in the OpenCL kernel code itself, without going back to the host just for the sake of deleting it?
For example, I have an input array of 100 elements, some greater than 50 and some less than 50. I want to store the numbers greater than 50 in a different array and do further processing only on the elements of that array. I don't know the size of this output array in advance, since I don't know how many numbers are actually greater than 50, so I declare it with size 100. After performing the scan I get the output array with all the elements greater than 50, but there may be a contiguous run of empty slots after them. How do we delete these slots? Is there a way of doing this in the kernel code itself, or do we have to go back to the host code for this?
How do we deal with such compacted arrays in further processing if we can't delete the remaining slots in the kernel code itself and we also don't want to go back to the host code?
There is no simple solution to your problem I'm afraid.
What I think you might do, is to have a counter of the elements in each array. You can increment the counter first locally with atomic_inc() and then globally with atomic_add().
This way at the end of your kernel execution the total number of elements in each array will be present.
You can also use the value returned by the atomic operation as an index into the array. This way you can write to the output without any "hole" in your array. However, I'm afraid you will probably lose some speed because of the heavy use of atomic operations.
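The counter idea can be sketched sequentially as follows. This is plain Python, not OpenCL, and the function name compact is made up for the example; it uses an exclusive prefix sum (scan) to compute each kept element's output slot, and returns the element count alongside a worst-case-sized buffer. Later stages then process only the first count entries, so the unused tail never has to be freed.

```python
from itertools import accumulate

def compact(values, keep):
    """Scan-based compaction: returns (buffer, count); only buffer[:count] is valid."""
    flags = [1 if keep(v) else 0 for v in values]
    inclusive = list(accumulate(flags))                       # inclusive prefix sum
    offsets = [inc - f for inc, f in zip(inclusive, flags)]   # exclusive prefix sum
    count = inclusive[-1] if inclusive else 0
    out = [None] * len(values)                                # worst-case sized output
    for v, f, off in zip(values, flags, offsets):
        if f:
            out[off] = v                                      # each kept element has a unique slot
    return out, count

out, count = compact([10, 60, 50, 75, 3, 99], lambda v: v > 50)
print(out[:count], count)
```

In a kernel, count would play the role of the global counter described above, and a follow-up kernel could be launched over count work-items instead of the full buffer length.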

Nullifying Pointers in Fortran

I am adding a module to a Fortran code, and ran across the following issue. I have a derived data type Info that contains several other variables, among them a 4D pointer array (it is a hydro code, so it is 3 spatial components and 1 variable component). To make my subroutine easier to read, I just make a pointer q and point to Info%q, as follows:
real,pointer::q(:,:,:,:)
q=>Info%q
...
some work on q
The question I am running into is: should I use deallocate(q) before nullify(q)? Or, since q is pointing to an array that is necessary elsewhere in the code, should I just use nullify?
Thanks for your help.
Only nullify! Otherwise the original pointer would be undefined and the array would no longer exist!

How Is MD5 generation dependent on file size?

Is there any efficiency analysis of how MD5 depends on the file size? Does it actually depend on the file size, or on the content of the file? For example, if I have a 500mb file of all blank spaces and a 500mb file with a movie in it, would MD5 take the same time to generate the hash code?
Any hashsum is, by definition, computed over every byte of what you're hashing. You have to read the file through a stream at the very least - more bytes take longer to traverse. However, I'd say (generally speaking) the bottleneck will indeed be reading the file, no matter what you're trying to do with it - not hashing it once you've read it.
Edit: I kinda misread the question. It will take exactly the same amount of time to hash two files of equal size. 500mb of spaces is 500mb of bytes which represent "space". That's still 8 bits of data per byte, same as any other file.
Because MD5 consists mostly of XOR, AND, OR and NOT operations, the speed is not dependent on a given bit containing a 1 or a 0.
From http://en.wikipedia.org/wiki/MD5:
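There are four possible functions F; a different one is used in each round:
F(B,C,D) = (B ∧ C) ∨ (¬B ∧ D)
G(B,C,D) = (B ∧ D) ∨ (C ∧ ¬D)
H(B,C,D) = B ⊕ C ⊕ D
I(B,C,D) = C ⊕ (B ∨ ¬D)
⊕, ∧, ∨, ¬ denote the XOR, AND, OR and NOT operations respectively.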
Hashes in general, MD5 included, do not have performance that depends on the content.
Here's a quick empirical test.
# dd if=/dev/urandom of=randomfile bs=1024 count=512000
# dd if=/dev/zero of=zerofile bs=1024 count=512000
# time md5 randomfile
MD5 (randomfile) = bb318fa1561b17e30d03b12e803262e4
real 0m2.753s
user 0m1.567s
sys 0m1.157s
# time md5 zerofile
MD5 (zerofile) = d8b61b2c0025919d5321461045c8226f
real 0m2.761s
user 0m1.567s
sys 0m1.168s
This is expected as per previous answers alluding to the bit manipulations used in the MD5 algorithm.
MD5, like most other hash algorithms, operates on blocks. For each 512-bit block of the input it performs the same operation and uses the output as part of the input for the next block.
The operation consists of the same basic operations (XOR, AND, NOT etc.). On all processors that I know, these operations will take the same time, no matter what the arguments are. So the time MD5 should take to process input should be linear in the number of 512-bit blocks in the input.
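A short Python sketch of the block-wise view described above, using the standard hashlib module. The helper names md5_block_count and md5_chunked are invented for this example. The block count follows from MD5's padding rule (one 0x80 byte, zero bytes, then an 8-byte length field), so it depends only on the input length and never on the content.

```python
import hashlib

BLOCK_BYTES = 64  # one 512-bit MD5 block

def md5_block_count(n_bytes: int) -> int:
    """Number of 512-bit blocks MD5 processes for an n_bytes input, padding included."""
    return (n_bytes + 8) // BLOCK_BYTES + 1

def md5_chunked(data: bytes, chunk_size: int = 1 << 16) -> str:
    """Hash data piece by piece, the way a file would be streamed from disk."""
    h = hashlib.md5()
    for start in range(0, len(data), chunk_size):
        h.update(data[start:start + chunk_size])
    return h.hexdigest()

spaces = b" " * 100_000
zeros = bytes(100_000)
# Same length means the same number of blocks, whatever the bytes are.
print(md5_block_count(len(spaces)), md5_block_count(len(zeros)))
print(md5_chunked(spaces) == hashlib.md5(spaces).hexdigest())
```

Feeding the data in chunks gives the same digest as hashing it in one call, which is why large files can be hashed without loading them into memory at once.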
