Implicit conversion from system array to ilnumerics array doesnt retain double precision - math

When I assign a system array of doubles to an ilnumerics double array, the values are rounded off to nearest integer. This happens particularly for only large arrays.
Is there any way in ILnumerics to specify up to how many decimals the rounding should occur?
The following screenshot shows the problem . Sample_pulsedata is double array of length 1860 which I am assigning to sample_ydata.

The elements are not really rounded. The effect rather comes from the way the elements are displayed in Visual Studios data tips. ILNumerics tries to find a common scale factor which allows to display all elements in an array aligned.
In your example - presumably - there exist large values at higher indices, which are not shown currently (scroll down in order to find them). These elements cause the scale factor to be 1/10^4. This is indicated in the first line, index [0]: '(:;:) 1e+004'. The 32.57 therefore must get rounded to 33 in order to fit into the 4 digits after the decimal point. '4' is a fixed value in ILNumerics and cannot easily get changed.
The actual values of the array elements are not affected, of course. You can use the Watch window to show only the interesting part of the array, without the rounding effect:
sample_ydata["0:13"]
Or, even better, use the ILNumerics Array Visualizer in order to visualize your data graphically. This not only gives a nice overview of the whole array but also prevents from such artefacts as you encountered.

Related

Determine if there exists a number in the array occurring k times

I want to create a divide and conquer algorithm (O(nlgn) runtime) to determine if there exists a number in an array that occurs k times. A constraint on this problem is that only a equality/inequality comparison method is defined on the objects of the array (i.e can't use <, >).
So I have tried a number of approaches including splitting the array into k pieces of equal size (approximately). The approach is similar to finding the majority item in an array, however in the majority case when you split the array, you know that one half must have a majority item if such an item exists. Any pointers or tips that one could provide to put me in the right direction ?
EDIT: To clear up a little, I am wondering whether the problem of finding the majority item by splitting the array in half and using a recursive solution can be extended to other situations where k may be n/4 or n/5 etc.
Maybe I should of phrased the question using n/k instead.
This is impossible. As a simple example of why this is impossible, consider an input with a length-n array, all elements distinct, and k=2. The only way to be sure no element appears twice is to compare every element against every other element, which takes O(n^2) time. Until you perform all possible comparisons, you cannot be sure that some pair you didn't compare isn't actually equal.

Convert RGBA{U8}(0.384,0.0,0.0,1.0) to Integer

I am using Images.jl in Julia. I am trying to convert an image into a graph-like data structure (v,w,c) where
v is a node
w is a neighbor and
c is a cost function
I want to give an expensive cost to those neighbors which have not the same color. However, when I load an image each pixel has the following Type RGBA{U8}(1.0,1.0,1.0,1.0), is there any way to convert this into a number like Int64 or Float?
If all you want to do is penalize adjacent pairs that have different color values (no matter how small the difference), I think img[i,j] != img[i+1,j] should be sufficient, and infinitely more performant than calling colordiff.
Images.jl also contains methods, raw and separate, that allow you to "convert" that image into a higher-dimensional array of UInt8. However, for your apparent application this will likely be more of a pain, because you'll have to choose between using a syntax like A[:, i, j] != A[:, i+1, j] (which will allocate memory and have much worse performance) or write out loops and check each color channel manually. Then there's always the slight annoyance of having to special case your code for grayscale and color, wondering what a 3d array really means (is it 3d grayscale or 2d with a color channel?), and wondering whether the color channel is stored as the first or last dimension.
None of these annoyances arise if you just work with the data directly in RGBA format. For a little more background, they are examples of Julia's "immutable" objects, which have at least two advantages. First, they allow you to clearly specify the "meaning" of a certain collection of numbers (in this case, that these 4 numbers represent a color, in a particular colorspace, rather than, say, pressure readings from a sensor)---that means you can write code that isn't forced to make assumptions that it can't enforce. Second, once you learn how to use them, they make your code much prettier all while providing fantastic performance.
The color types are documented here.
Might I recommend converting each pixel to greyscale if all you want is a magnitude difference.
See this answer for a how-to:
Converting RGB to grayscale/intensity
This will give you a single value for intensity that you can then use to compare.
Following #daycaster's suggestion, colordiff from Colors.jl can be used.
colordiff takes two colors as arguments. To use it, you should extract the color part of the pixel with color i.e. colordiff(color(v),color(w)) where v would be RGBA{U8(0.384,0.0,0.0,1.0) value.

What is Vector data structure

I know Vector in C++ and Java, it's like dynamic Array, but I can't find any general definition of Vector data structure. So what is Vector? Is Vector a general data structure(like arrray, stack, queue, tree,...) or it just a data type depending on language?
The word "vector" as applied to computer science/programming is borrowed from math, which can make the use confusing (even your question could be on multiple subjects).
The simplest example of vectors in math is the number line, used to teach elementary math (especially to help visualize negative numbers, subtraction of negative numbers, addition of negative numbers, etc).
The vector is a distance and direction from a point. This is why it can confuse the discussion, because a vector data structure COULD be three points, X,Y,Z, in a structure used in 3D graphics engines, or a 2D point (just X,Y). In that context, the subtraction of two such points results in a vector - the vector describes how far and in what direction to travel from one of the source operands to the other.
This applies to storage, like stl vectors or Java vectors, in that storage is represented as a distance from an address (where a memory address is similar to a point in space, or on a number line).
The concept is related to arrays, because arrays could be the storage allocated for a vector, but I submit that the vector is a larger concept than the array. A vector must include the concept of distance from a starting point, and if you think of the beginning of an array as the starting point, the distance to the end of the array is it's size.
So, the data structure representing a vector must include the size, whereas an array doesn't have storage to include the size, it's assumed by the way it's allocated. That is to say, if you dynamically allocate an array, there is no data structure storing the size of that array, the programmer must assume to know that size, or store it in a some integer or long.
The vector data structure (say, the design of a vector class) DOES need to store the size, so at a minimum, there would be a starting point (the base of an array, or some address in memory) and a distance from that point indicating size.
That's really "RAM" oriented, though, in description, because there's one more point not yet described which must be part of the data describing the vector - the notion of element size. If a vector represents bytes, and memory storage is typically measured in bytes, an address and a distance (or size) would represent a vector of bytes, but nothing else - and that's a very machine level thinking. A higher thought, that of some structure, has it's own size - say, the size of a float or double, or of a structure or class in C++. Whatever the element size is, the memory required to store N of them requires that the vector data structure have some knowledge of WHAT it's storing, and how large that thing is. This is why you'd think in terms of "a vector of strings" or "a vector of points". A vector must also store an element size.
So, a basic vector data structure must have:
An address (the starting point)
An element size (each thing it stores is X bytes long)
A number of elements stored (how many elements times element size is 'minimum' storage size).
One important "assumption" made in this simple 3 item list of entries in the vector data structure is that the address is allocated memory, which must be freed at some point, and is to be guarded against access beyond the end of the vector.
That means there's something missing. In order to make a vector class work, there is a recognizable difference between the number of ITEMS stored in the vector, and the amount of memory ALLOCATED for that storage. Typically, as you might realize from the use of vector from the STL, it may "know" it has room to store 10 items, but currently only has 2 of them.
So, a working vector class would ALSO have to store the amount of memory allocation. This would be how it could dynamically extend itself - it would now have sufficient information to expand storage automatically.
Thinking through just how you would make a vector class operate gives you the structure of data required to operate a vector class.
It's an array with dynamically allocated space, everytime you exceed this space new place in memory is allocated and old array is copied to the new one. Old one is freed then.
Moreover, vector usually allocates more memory, than it needs to, so it does not have to copy all the data, when new element is added.
It may seem, that lists then are much much better, but it's not necessarily so. If you do not change your vector often (in terms of size), then computer's cache memory functions much better with vectors, than lists, because they are continuus in memory space. Disadvantage is when you have large vector, that you need to expand. Then you have to agree to copy large amount of data to another space in memory.
What's more. You can add new data to the end and to the front of the vector. Because Vector's are array-like, then every time you want to add element to the beginning of the vector all the array has to be copied. Adding elements to the end of vector is far more efficient. There's no such an issue with linked lists.
Vector gives random access to it's internal kept data, while lists,queues,stacks do not.
Vectors are the same as dynamic arrays with the ability to resize
itself automatically when an element is inserted or deleted.
Vector elements are placed in contiguous storage so that they can be
accessed and traversed using iterators.
In vectors, data is inserted at the end.

Does a data type exist in any language that spans a parameter range 0-1?

I am often programming mathematical algorithms that assume a nondimensional parameter spans the continuous space from 0..1 inclusive. These algorithms could in theory benefit from maximum resolution over the parameter space and I've considered that it would be of use to expend the full 32 or 64 bits of precision over the parameter space, with none wasted for exponents or signs.
I imagine the methods would look similar to an unsigned integer divided by its maximum representable value. Does this exist already and if so where, if not, is there a compelling reason why?
Can't you simply do all calculations in integers from 0 to MAX_INT, keeping all the same formulas/algorithms/whatever and then use "unsigned integer divided by its maximum representable value" conversion as very final step before printing result to user (or otherwise outputting it - for example in intermediate logs)?
The representation doesn't make sense without algorithms. E.g. you could represent it as fixed point (i.e. 0..MAX_INT / MAX_INT) or floated point a mantissa and exponent (e.g. to have an ability to store a values like 1e-1000) or something custom (e.g. to have an ability to represent a number 1/π precisely). After it you have define algos to manipulate the numbers in such representations. So, in other words there is no silver bullet to cover all cases. Only you know your task and could choose the best solution.
Moreover, the continuous space is impossible to represent using computes, because the space has infinite number of elements, so it cannot be algorithmized.

How to watermark some vector data in an invisible way?

I have a some vector data that has been manually created, it is just a list of x,y values. The coordinate of the points is not perfectly accurate - it can be off by a few pixels and it won't make any perceivable difference.
So now I am looking for some way to watermark this data, so that if someone steal the vector data, I can prove that it's indeed been stolen. I'm looking for some method reliable enough that even if someone take my data and shift all the points by a some small amount, I can still prove that it's been stolen.
Is there any way to do that? I know it exists for bitmap data but how about vector data?
PS: the vector graphic itself is rather random - it cannot be copyrighted.
Is the set of points all you can work with? If, for example, you were dealing with SVG, you could export the file with a certain type of XML formatting, a <!-- generated by thingummy --> comment at the top, IDs generated according to such-and-such a pattern, extra attributes specifically yours, a particular style of applying translations, etc. Just like you can work out from a JPEG what is likely to have been used to create it, you can tell a lot about what produced an SVG file by observation.
On the vectors themselves, you could do something like consider them as an ordered sequence and apply offsets given by the values of two pseudo-random sequences, each starting from a known seed, for X and Y translation, in a certain range (such as [-1, 1]). Even if some points are modified, you should be able to build up an argument from how things match the sequence. How to distinguish precisely what has been shifted could do with a bit more consideration, too; if you were simply doing int(x) + random(-1, 1), then if someone just rounded all values your evidence would be lost. A better way of dealing with this would be to, while still rendering at the same screen size, multiply everything by some constant like 953 (an arbitrary near-1000 prime) and then adjust your values by something in that range (viz, [0, 952]). This base-953 system would be superior to a base-10 system because it's much (much much) harder to see what's happening. If the person changes the scaling, it would require a bit more analysis of values, but it should still be quite possible. I've got a gut feeling that that's where picking a prime number could be a bit helpful, but I haven't thought about it terribly much. If in danger or in doubt in such matters, pick a prime number for the sake of it... you may find out later there are benefits to it!
Combine a number of different techniques for best results, of course.

Resources