How to extract motion vectors and info on frame partition in HEVC HM 16.15 - codec

I am using the HEVC reference software, HM Encoder Version [16.15] (including RExt) on a [Mac OS X][GCC 4.2.1][64 bit] and would like to extract at encoder side:
1) the motion vectors for each block
2) the frame partition information, i.e. the size and location of each block in the frame to which a motion vector refers.
Does anybody have hints on which variables store this information for each coding unit? Thanks!

All you need is available in the TComDataCU class.
1) For motion information, there is the function getCUMvField(), which returns the CU's motion vector field. It's not easy to work with, though.
Basically, to access almost any of the PU/CU level syntax elements, you need to be able to work with the absolute index of that PU/CU. This unique index tells you exactly where your PU/CU is located in the CTU by pointing to the top-left 4x4 block of that part.
I remember that most of the time this index is stored in the variable uiAbsPartIdx.
If you get to know how to work with this index, then you will be able to get the block partitioning information at the CTU level. So for 2), my suggestion is that you go to the slice level, where there is a loop over CTUs (I think this is done in the compressSlice() function). After the compressCtu() function is called for each CTU (which means that all RDO decisions have been made and the CTU partitioning is decided), you put a loop over all uiAbsPartIdxs of the CTU and get their width and height. For example, if your CTU size is 128, then you will have 32*32=1024 unique 4x4 blocks in your CTU. The function for getting the width/height of the CU corresponding to a certain uiAbsPartIdx is pCtu->getWidth(uiAbsPartIdx).
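For illustration, here is a rough sketch of such a loop, using HM 16.x names as I remember them (getTotalNumPart(), g_auiZscanToRaster, getCUMvField(), etc.) - double-check them against your tree, and note that a real version should decide per prediction mode which lists to read:

    // Sketch: dump size, position and list-0 MV of every 4x4 part of a
    // finished CTU. Call it after compressCtu() inside compressSlice().
    Void dumpCtuInfo( TComDataCU* pCtu )
    {
      const UInt uiNumParts    = pCtu->getTotalNumPart(); // e.g. 256 for 64x64
      const UInt uiPartsPerRow = pCtu->getSlice()->getSPS()->getMaxCUWidth() >> 2;
      for( UInt uiAbsPartIdx = 0; uiAbsPartIdx < uiNumParts; uiAbsPartIdx++ )
      {
        // size of the CU covering this 4x4 block
        UInt uiW = pCtu->getWidth ( uiAbsPartIdx );
        UInt uiH = pCtu->getHeight( uiAbsPartIdx );
        // position: CTU origin plus the raster offset of this 4x4 block
        UInt uiRaster = g_auiZscanToRaster[ uiAbsPartIdx ];
        UInt uiX = pCtu->getCUPelX() + ( uiRaster % uiPartsPerRow ) * 4;
        UInt uiY = pCtu->getCUPelY() + ( uiRaster / uiPartsPerRow ) * 4;
        if( pCtu->isInter( uiAbsPartIdx ) )
        {
          TComMv cMv = pCtu->getCUMvField( REF_PIC_LIST_0 )->getMv( uiAbsPartIdx );
          printf( "(%u,%u) %ux%u mv=(%d,%d)\n", uiX, uiY, uiW, uiH,
                  cMv.getHor(), cMv.getVer() );
        }
      }
    }

Note that each CU is visited once per 4x4 block it covers, so the same CU will be printed many times; you can skip duplicates by only printing when uiAbsPartIdx points at the CU's own origin.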
I hope it was clear.

Find the first root and local maximum/minimum of a function

Problem
I want to find
The first root
The first local minimum/maximum
of a black-box function in a given range.
The function has the following properties:
It's continuous and differentiable.
It's a combination of constant and periodic functions. All periods are known.
(It's better if it can be done with weaker assumptions.)
What is the fastest way to get the root and the extremum?
Do I need more assumptions or bounds of the function?
What I've tried
I know I can use root-finding algorithm. What I don't know is how to find the first root efficiently.
It needs to be fast enough to run within a few milliseconds, with a precision of 1.0 over a range of 1.0e+8, which is the problem.
Since the range could be quite large and it should be precise enough, I can't brute-force it by checking all the possible subranges.
I considered the bisection method, but it's too slow for finding the first root if the function has only one root in the large range, as every subrange would have to be checked.
It's preferable if the solution is in java, but any similar language is fine.
Background
I want to calculate when an arbitrary celestial object reaches a certain height.
It's a configuration-defined virtual object, so I can't assume anything about the object.
It's not easy to get either analytical solution or simple approximation because various coordinates are involved.
I decided to find a numerical solution for this.
For a general black box function, this can't really be done. Any root finding algorithm on a black box function can't guarantee that it has found all the roots or any particular root, even if the function is continuous and differentiable.
The property of being periodic gives a bit more hope, but you can still have periodic functions with infinitely many roots in a bounded domain. Given that your function relates to celestial objects, this isn't likely to happen. Assuming your periodic functions are sinusoidal, I believe you can get away with checking subranges on the order of one-quarter of the shortest period (out of all the periodic components).
Maybe try Brent's Method on the shortest quarter period subranges?
Another approach would be to apply your root finding algorithm iteratively. If your range is (a, b), then apply your algorithm to that range to find a root at say c < b. Then apply your algorithm to the range (a, c) to find a root in that range. Continue until no more roots are found. The last root you found is a good candidate for your minimum root.
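As a minimal sketch of the quarter-period scan (assuming the components are roughly sinusoidal, and using plain bisection where Brent's method would converge faster; all names here are illustrative):

    // Scan [a, b] in steps of a quarter of the shortest known period; bisect
    // the first sub-interval where f changes sign. Roots that touch zero
    // without crossing it will be missed, as discussed above.
    #include <algorithm>
    #include <cmath>
    #include <functional>

    double firstRoot(const std::function<double(double)>& f,
                     double a, double b, double shortestPeriod,
                     double tol = 1.0)
    {
        const double step = shortestPeriod / 4.0;
        double lo = a, flo = f(a);
        while (lo < b) {
            double hi  = std::min(lo + step, b);
            double fhi = f(hi);
            if (flo == 0.0) return lo;            // landed exactly on a root
            if (flo * fhi < 0.0) {                // sign change: bisect it
                while (hi - lo > tol) {
                    double mid  = 0.5 * (lo + hi);
                    double fmid = f(mid);
                    if (flo * fmid <= 0.0) { hi = mid; }
                    else { lo = mid; flo = fmid; }
                }
                return 0.5 * (lo + hi);
            }
            lo = hi;
            flo = fhi;
        }
        return std::nan("");                      // no sign change found
    }

With a range of 1.0e+8 and tol = 1.0, the bisection part takes at most about 27 evaluations per bracket; the scan itself dominates if the shortest period is small.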
A black-box function over an arbitrary range? You cannot even be sure it has a continuous domain over that range. What kind of solutions are you looking for - natural numbers, integers, real numbers, complex? These are all questions that greatly impact the answer.
So the 1st thing should be determining what kind of number you accept as the result.
Second is having some kind of protection against limits of the function that will try to explode your calculations as it goes to plus or minus infinity.
Since we are touching the topic of limits: your candidate could edge towards zero and look like a solution but never actually touch 0 and become one. This depends on your margin of error - how close something has to be to be considered OK, good enough.
I think for this your SIMPLEST-TO-IMPLEMENT bet for real-number solutions (I assume those) is to take an interval and apply this divide-and-conquer algorithm:
Take the lower and upper borders and the middle value (or an approximate middle value for borders with infinitely many decimals)
Try to calculate the solution with all 3, with some kind of protection against infinities
Remember all 3 values in an array together with their results (3 pairs of values)
Remember the current best value (the one closest to the solution) in a separate variable (a pair of value and result for that value)
STEP FORWARD - repeat the above with the 1st-2nd value range and the 2nd-3rd value range
Take the new pair of value and result that is closest to the solution.
Clear the old value-result pairs and replace them with the new ones from this iteration, while remembering the best value-solution pair (overall)
Repeat the above for as much precision as you wish, and watch that memory explode with each iteration; keep in mind you are going to have exponential growth of values there. It can be further improved if you, say, take one interval and go as deep as you want, remember the best value-result pair, then delete all other memory and go to the next interval and dig deep - roughly as in the sketch below.
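A minimal sketch of that depth-first variant (illustrative names; like the scheme above, it keeps whichever probe is closest to zero, so it is a heuristic, not a guaranteed root finder):

    // Recursively split [lo, hi] at the midpoint, keep the probe whose |f| is
    // smallest, and descend only into the half containing that probe.
    #include <cmath>
    #include <functional>

    double closestToZero(const std::function<double(double)>& f,
                         double lo, double hi, int depth)
    {
        double mid  = 0.5 * (lo + hi);
        double best = mid, fbest = std::fabs(f(mid));
        for (double x : {lo, hi}) {
            double fx = std::fabs(f(x));
            if (fx < fbest) { best = x; fbest = fx; }
        }
        if (depth == 0) return best;
        return (best <= mid) ? closestToZero(f, lo, mid, depth - 1)
                             : closestToZero(f, mid, hi, depth - 1);
    }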

Tile Based A* Pathfinding, but with a bomb

I've written a simple A* path finding algorithm to quickly find a way through a tile based dungeon in which the tiles contain the information of walls.
An example of a dungeon (only 1 path for simplicity):
However, now I'd like to add a variable number of "bombs" to the algorithm, which would allow the path-finding to ignore 1 wall. With this it doesn't find the best paths anymore;
for example, with the use of only 1 bomb the generated path looks like the first image here:
Edit: actually it would look like this: https://i.stack.imgur.com/kPoAA.png
while the correct path would be the second image.
The problem is that "Closed Nodes" now interfere with possible paths. Any ideas of how to tackle this problem would be greatly appreciated!
Your "game state" will no longer only be defined by your location, but also by an integer representing the number of bombs you have left. If you're following the pseudocode of A* on wikipedia, this means you cannot simply implement the closedSet as a grid of booleans. It should probably be implemented as, for example, a hash map / hash set, where every entry holds the following data:
x coordinate
y coordinate
number of bombs left
By visiting a certain position in the search process, you'll no longer mark just that position as closed. You'll mark the combination of position + number of bombs left as closed. That way, if later on in the same search process you run into a position where you're at the same location, but have more bombs left, you will not ignore it as closed but will actually continue searching that possibility.
Note that, if the maximum possible number of bombs is relatively low, you could also implement the closedSet as an array of boolean grids, where you first index by number of bombs, then by x and y coordinates to find out if a specific position is closed or not.
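A minimal sketch of that keying in C++ (names are illustrative):

    // The closed set is keyed on (x, y, bombsLeft) rather than just (x, y).
    #include <cstddef>
    #include <functional>
    #include <unordered_set>

    struct State {
        int x, y, bombsLeft;
        bool operator==(const State& o) const {
            return x == o.x && y == o.y && bombsLeft == o.bombsLeft;
        }
    };

    struct StateHash {
        std::size_t operator()(const State& s) const {
            // simple hash combine; any reasonable mix will do
            std::size_t h = std::hash<int>()(s.x);
            h = h * 31 + std::hash<int>()(s.y);
            return h * 31 + std::hash<int>()(s.bombsLeft);
        }
    };

    std::unordered_set<State, StateHash> closedSet;
    // closedSet.insert({x, y, bombsLeft});  // close this (position, bombs) combo

With the array-of-grids alternative, closed[bombsLeft][y][x] plays the same role.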
Doesn't this mean that you just pretend to not have any walls at all?
Use A* to find the shortest path from start to end and then check how many walls you'd have to go through. If you have enough bombs, you can use the path. Otherwise, try the next-longest path, and so on.
By the way: you might want to check http://gamedev.stackexchange.com for questions like this one.
You need to tweak the cost function to charge something for a bomb, then run the algorithm normally, with an infinite cost for a second bomb. To get the bomb used approximately halfway, play around with the cost function; it should probably cost about the heuristic A-B distance times the cost of an empty tile. If you have two bombs, halve the cost, and of course the use of three bombs then costs infinite.
But don't expect very good results. A* isn't designed for that kind of optimisation.

What is Vector data structure

I know Vector in C++ and Java; it's like a dynamic array, but I can't find any general definition of the vector data structure. So what is a vector? Is a vector a general data structure (like array, stack, queue, tree, ...) or is it just a data type that depends on the language?
The word "vector" as applied to computer science/programming is borrowed from math, which can make the use confusing (even your question could be on multiple subjects).
The simplest example of vectors in math is the number line, used to teach elementary math (especially to help visualize negative numbers, subtraction of negative numbers, addition of negative numbers, etc).
A vector is a distance and direction from a point. This is why it can confuse the discussion, because a vector data structure COULD be three components, X, Y, Z, in a structure used in 3D graphics engines, or a 2D point (just X, Y). In that context, the subtraction of two such points results in a vector - the vector describes how far and in what direction to travel from one of the source operands to the other.
This applies to storage, like stl vectors or Java vectors, in that storage is represented as a distance from an address (where a memory address is similar to a point in space, or on a number line).
The concept is related to arrays, because an array could be the storage allocated for a vector, but I submit that the vector is a larger concept than the array. A vector must include the concept of distance from a starting point, and if you think of the beginning of an array as the starting point, the distance to the end of the array is its size.
So, the data structure representing a vector must include the size, whereas an array doesn't have storage to include the size; the size is assumed by the way the array is allocated. That is to say, if you dynamically allocate an array, there is no data structure storing the size of that array; the programmer must be assumed to know that size, or store it in some integer or long.
The vector data structure (say, the design of a vector class) DOES need to store the size, so at a minimum, there would be a starting point (the base of an array, or some address in memory) and a distance from that point indicating size.
That's really "RAM" oriented, though, in description, because there's one more point not yet described which must be part of the data describing the vector - the notion of element size. If a vector represents bytes, and memory storage is typically measured in bytes, an address and a distance (or size) would represent a vector of bytes, but nothing else - and that's a very machine level thinking. A higher thought, that of some structure, has it's own size - say, the size of a float or double, or of a structure or class in C++. Whatever the element size is, the memory required to store N of them requires that the vector data structure have some knowledge of WHAT it's storing, and how large that thing is. This is why you'd think in terms of "a vector of strings" or "a vector of points". A vector must also store an element size.
So, a basic vector data structure must have:
An address (the starting point)
An element size (each thing it stores is X bytes long)
A number of elements stored (how many elements times element size is 'minimum' storage size).
One important "assumption" made in this simple 3 item list of entries in the vector data structure is that the address is allocated memory, which must be freed at some point, and is to be guarded against access beyond the end of the vector.
That means there's something missing. In order to make a vector class work, there is a recognizable difference between the number of ITEMS stored in the vector and the amount of memory ALLOCATED for that storage. Typically, as you might realize from using the STL's vector, it may "know" it has room to store 10 items but currently only holds 2 of them.
So, a working vector class would ALSO have to store the amount of memory allocation. This would be how it could dynamically extend itself - it would now have sufficient information to expand storage automatically.
Thinking through just how you would make a vector class operate gives you the structure of data required to operate a vector class.
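Putting those pieces together, a minimal sketch of such a structure might look like this (illustrative names, error handling omitted):

    // The four pieces of data the discussion above calls for: a starting
    // address, the element size, the element count, and the allocated capacity.
    #include <cstdlib>
    #include <cstring>

    struct RawVector {
        void*       base;      // starting point: the allocated memory
        std::size_t elemSize;  // how large each stored thing is, in bytes
        std::size_t count;     // how many elements are currently stored
        std::size_t capacity;  // how many elements the allocation can hold
    };

    void rawVectorPush(RawVector* v, const void* elem)
    {
        if (v->count == v->capacity) {  // out of room: grow the allocation
            v->capacity = v->capacity ? v->capacity * 2 : 4;
            v->base = std::realloc(v->base, v->capacity * v->elemSize);
        }
        std::memcpy(static_cast<char*>(v->base) + v->count * v->elemSize,
                    elem, v->elemSize);
        v->count++;
    }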
It's an array with dynamically allocated space; every time you exceed this space, a new place in memory is allocated and the old array is copied to the new one. The old one is then freed.
Moreover, a vector usually allocates more memory than it needs, so it does not have to copy all the data every time a new element is added.
It may seem that lists are then much, much better, but that's not necessarily so. If you do not change your vector often (in terms of size), then the computer's cache memory works much better with vectors than with lists, because they are contiguous in memory. The disadvantage comes when you have a large vector that you need to expand: then you have to accept copying a large amount of data to another place in memory.
What's more, you can add new data to the end and to the front of a vector. Because vectors are array-like, every time you want to add an element to the beginning of the vector, the whole array has to be copied. Adding elements to the end of a vector is far more efficient. There's no such issue with linked lists.
A vector gives random access to its internally kept data, while lists, queues, and stacks do not.
Vectors are the same as dynamic arrays, with the ability to resize themselves automatically when an element is inserted or deleted. Vector elements are placed in contiguous storage so that they can be accessed and traversed using iterators. In vectors, data is inserted at the end.

Creating an efficient function to fit a dataset

Basically I have a large (could get as large as 100,000-150,000 values) data set of 4-byte inputs and their corresponding 4-byte outputs. The inputs aren't guaranteed to be unique (which isn't really a problem because I figure I can generate pseudo-random numbers to add or xor the inputs with so that they do become unique), but the outputs aren't guaranteed to be unique either (so two different sets of inputs might have the same output).
I'm trying to create a function that effectively models the values in my data-set. I don't need it to interpolate efficiently, or even at all (by this I mean that I'm never going to feed it an input that isn't contained in this static data-set). However it does need to be as efficient as possible. I've looked into interpolation and found that it doesn't really fit what I'm looking for. For example, the large number of values means that spline interpolation won't do since it creates a polynomial per interval.
Also, from my understanding, polynomial interpolation would be way too computationally expensive (n values means that the polynomial could include terms as high as pow(x,n-1); for x a 4-byte number and n=100,000 it's just not feasible). I've tried looking online for a while now, but I'm not very strong with math and must not know the right terms to search with, because I haven't come across anything similar so far.
I can see that this is not completely (to put it mildly) a programming question and I apologize in advance. I'm not looking for the exact solution or even a complete answer. I just need pointers on the topics that I would need to read up on so I can solve this problem on my own. Thanks!
TL;DR - I need a variant of interpolation that only needs to fit the initially given data-points, but which is computationally efficient.
Edit:
Some clarification - I do need the output to be exact and not an approximation. This is sort of an optimization of some research work I'm currently doing, and I need to have this look-up implemented without the actual bytes of the outputs being present in my program. I can't really say a whole lot about it at the moment, but I will say that for the purposes of my work, encryption (or compression or any other form of obfuscation) is not an option to hide the table. I need a mathematical function that can recreate the output so long as it has access to the input. I hope that clears things up a bit.
Here is one idea. Make your function be the sum (mod 2^32) of a linear function over all 4-byte integers, a piecewise linear function whose pieces depend on the value of the first bit, another piecewise linear function whose pieces depend on the value of the first two bits, and so on.
The actual output values appear nowhere, you have to add together linear terms to get them. There is also no direct record of which input values you have. (Someone could conclude something about those input values, but not their actual values.)
The various coefficients you need can be stored in a hash. Any lookups you do which are not found in the hash are assumed to be 0.
If you add a certain amount of random "noise" to your dataset before starting to encode it fairly efficiently, it would be hard to tell what your input values are, and very hard to tell what the outputs are even approximately without knowing the inputs.
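The evaluation side of this idea could look roughly like the sketch below (the keying by (prefix length, prefix bits) and all names are my own illustrative assumptions; fitting the coefficients is the part the scheme above describes):

    // f(x) = sum over all bit-prefixes of x of (a*x + b), computed mod 2^32
    // via uint32_t wrap-around. Missing coefficients count as zero.
    #include <cstdint>
    #include <unordered_map>

    struct Coeff { uint32_t a = 0, b = 0; };   // one linear term: a*x + b

    // key packs (prefix length, prefix bits) into one 64-bit value
    std::unordered_map<uint64_t, Coeff> coeffs;

    uint32_t evaluate(uint32_t x)
    {
        uint32_t sum = 0;
        for (int len = 0; len <= 32; ++len) {
            uint64_t prefix = (len == 0) ? 0 : (uint64_t)(x >> (32 - len));
            uint64_t key    = ((uint64_t)len << 32) | prefix;
            auto it = coeffs.find(key);
            if (it != coeffs.end())
                sum += it->second.a * x + it->second.b;  // wraps mod 2^32
        }
        return sum;
    }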
Since you didn't impose any restriction on the function (continuous, smooth, etc), you could simply do a piece-wise constant interpolation:
or a linear interpolation:
I assume you can figure out how to construct such a function without too much trouble.
EDIT: In light of your additional requirement that such a function should "hide" the data points...
For a piece-wise constant interpolation, the constant intervals should be randomized so as to not reveal where the data point is. So, for example, in the picture the intervals are centered about the data points they interpolate. Instead, you might want to do something like:
[0 , 0.3) -> 0
[0.3 , 1.9) -> 0.8
[1.9 , 2.1) -> 0.9
[2.1 , 3.5) -> 0.2
etc
Of course, this only hides the x-coordinate. To hide the y-coordinate as well, you can use a linear interpolation.
Simply make it so that the "pointy" part isn't where the data point is. Pick random x-values such that every adjacent data point has one of these x-values in between. Then interpolate such that the "pointy" part is at these x-values.
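A small sketch of how such a piece-wise constant function with randomized breakpoints might be looked up (illustrative names; the example values mirror the little table above):

    // Binary-search the sorted breakpoints to find which interval x falls in.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct StepFunction {
        std::vector<double> starts;  // sorted interval starts: 0, 0.3, 1.9, ...
        std::vector<double> values;  // values[i] holds on [starts[i], starts[i+1])

        double operator()(double x) const {
            auto it = std::upper_bound(starts.begin(), starts.end(), x);
            std::size_t i = (it == starts.begin()) ? 0
                          : static_cast<std::size_t>(it - starts.begin()) - 1;
            return values[i];
        }
    };

    // StepFunction f{{0.0, 0.3, 1.9, 2.1}, {0.0, 0.8, 0.9, 0.2}};
    // f(2.0) == 0.9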
I suggest a huge lookup table full of unused entries. It's the brute-force approach: an ordered table of outputs, indexed by every possible value of the input (not just the data set, but all other possible 4-byte values too).
Though all of your data would be there, you could fill the non-used inputs with random, arbitrary, or stochastic (random within potentially complex constraints) data. If you make it convincing, no one could pick your real data out of it. If a "real" function interpolated all your data, it would also "contain" all the information of your real data, and anyone with access to it could use it to generate an LUT as described above.
LUTs are lightning-fast but very memory-hungry. Your case is on the edge of feasibility, requiring 2^32 entries * 4 bytes = 16 gigabytes of RAM, which requires a 64-bit machine to run. That is just for the data, not the program, the operating system, or other data. It's better to have 24 GB, just to be sure. If you can afford it, they are the way to go.

fourier transform to transpose key of a wav file

I want to write an app to transpose the key a wav file plays in (for fun, I know there are apps that already do this)... my main understanding of how this might be accomplished is to
1) chop the audio file into very small blocks (say 1/10 of a second)
2) run an FFT on each block
3) phase shift the frequency space up or down depending on what key I want
4) use an inverse FFT to return each block to the time domain
5) glue all the blocks together
But now I'm wondering whether the transformed blocks would no longer be continuous when I try to glue them back together. Are there ideas on how I should do this to guarantee continuity, or am I just worrying about nothing?
Overlap the time samples for each block by half so that each block after the first consists of the last N/2 samples from the previous block and N/2 new samples. Be sure to apply some window to the samples before the transform.
After shifting the frequency, perform an inverse FFT and use the middle N/2 samples from each block. You'll need to adjust the final gain after the IFFT.
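As scaffolding for the block handling, a hedged sketch: it overlap-adds whole Hann-windowed blocks, a common variant of the keep-the-middle-N/2 approach described above; a naive O(N^2) DFT keeps it self-contained where a real program would use an FFT library (FFTW, KissFFT, ...); and note that shifting bins by a fixed amount is a linear frequency shift, not a true musical transposition:

    // 50%-overlap analysis/modify/resynthesis skeleton.
    #include <cmath>
    #include <complex>
    #include <cstddef>
    #include <vector>

    using cpx = std::complex<double>;
    static const double kPi = 3.14159265358979323846;

    // Naive DFT/IDFT, for illustration only.
    static std::vector<cpx> dft(const std::vector<cpx>& in, bool inverse)
    {
        const std::size_t n = in.size();
        const double sign = inverse ? 1.0 : -1.0;
        std::vector<cpx> out(n);
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t t = 0; t < n; ++t)
                out[k] += in[t] * std::polar(1.0, sign * 2.0 * kPi * k * t / n);
        if (inverse)
            for (auto& v : out) v /= (double)n;
        return out;
    }

    std::vector<double> shiftBlocks(const std::vector<double>& x,
                                    std::size_t N /*block size*/, int binShift)
    {
        const std::size_t hop = N / 2;                    // 50% overlap
        std::vector<double> y(x.size(), 0.0);
        for (std::size_t s = 0; s + N <= x.size(); s += hop) {
            std::vector<cpx> block(N);
            for (std::size_t i = 0; i < N; ++i) {
                double hann = 0.5 - 0.5 * std::cos(2.0 * kPi * i / N);
                block[i] = x[s + i] * hann;               // window, then transform
            }
            std::vector<cpx> spec = dft(block, false);
            std::vector<cpx> moved(N);                    // crude bin shift;
            for (std::size_t k = 0; k < N; ++k) {         // a real version must
                long src = (long)k - binShift;            // keep the spectrum
                if (src >= 0 && src < (long)N)            // conjugate-symmetric
                    moved[k] = spec[src];
            }
            std::vector<cpx> t = dft(moved, true);
            for (std::size_t i = 0; i < N; ++i)           // Hann at 50% overlap
                y[s + i] += t[i].real();                  // sums to unity
        }
        return y;
    }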
Of course, mixing the time samples with a sine wave and then low-pass filtering will provide the same shift in the time domain as well. The frequency of the mixer would be the desired frequency difference.
For speech you might want to look at PSOLA - this is a popular algorithm for pitch-shifting and/or time stretching/compression which is a little more sophisticated than the basic overlap-add method, but not much more complex.
If you need to process non-speech samples, e.g. music, then there are several possibilities, however the overlap-add FFT/modify/IFFT approach mentioned in other answers is probably the best bet.
Found this great article on the subject, for anyone trying it in the future!
You may have to find a zero-crossing between the blocks to glue the individual wavs back together. Otherwise you may find that you are getting clicks or pops between the blocks.
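A tiny sketch of that zero-crossing search (illustrative names):

    // Walk forward from a block boundary until the signal changes sign;
    // splicing at that index avoids a discontinuity (a click or pop).
    #include <cstddef>
    #include <vector>

    std::size_t nextZeroCrossing(const std::vector<double>& s, std::size_t from)
    {
        for (std::size_t i = from + 1; i < s.size(); ++i)
            if ((s[i - 1] <= 0.0) != (s[i] <= 0.0))
                return i;                       // sign changed between i-1 and i
        return s.empty() ? 0 : s.size() - 1;    // no crossing: fall back
    }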
