I have a double vector:
r = -50 + (50+50)*rand(10,1)
Ideally, I want all the numbers in the vector to be equal up to a tolerance of, say, 1e-4. I want to represent each r with a scalar, say s(r), whose value gives an idea of the quality of the vector. The vector is high quality if all of its elements are nearly equal. I can easily run a for loop like
for i=1:10
for j=i+1:10
% check equality of r(i) and r(j) up to the tolerance
end
end
But even then I cannot figure out what computation to do inside the nested for loops to assign a scalar representing the quality. Is there a better way, such that given any vector r of length n, I can quickly calculate a scalar representing the quality of the vector?
Your double-loop algorithm is somewhat slow, of order O(n^2), where n is the number of elements in the vector. Here is a quick way to measure the closeness of the vector elements, which can be done in order O(n), with just one pass through the elements.
Find the maximum and the minimum of the vector elements. Just use two variables to store the maximum and minimum so far and run once through all the elements. The difference between the maximum and the minimum is called the range of the values, a commonly accepted measure of dispersion of the values. If the values are exactly equal, the range is zero which shows perfect quality. If the range is below 1e-4 then the vector is of acceptable quality. The bigger the range, the worse the equality.
The code is obvious for just about any given language, so I'll leave that to you. If the fact that the range only really considers the two extreme values of the vector bothers you, you could use other measures of variation such as the interquartile range, variance, or standard deviation. But the range seems to best fit what you request.
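As a rough illustration of the one-pass range described above, here is a minimal Python sketch. The function names value_range and is_equal_like, and the default tolerance argument, are made up for this example and are not part of the original answer.

```python
def value_range(values):
    """Return max - min of a non-empty sequence, found in a single pass."""
    iterator = iter(values)
    try:
        lowest = highest = next(iterator)
    except StopIteration:
        raise ValueError("value_range() needs at least one element")
    for v in iterator:
        # Track the smallest and largest value seen so far.
        if v < lowest:
            lowest = v
        elif v > highest:
            highest = v
    return highest - lowest


def is_equal_like(values, tol=1e-4):
    """True when the range is below the tolerance (assumed 1e-4 by default)."""
    return value_range(values) < tol
```

The range itself can serve as the scalar s(r): zero means all elements are identical, and larger values mean worse quality.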
I have a vector that is filled with random numbers in the range [0,1]. I want to accept only those vectors in which each element deviates by at most 0.02 from its previous and its next element.
For example, I have the 3x1 vector below. It is acceptable, because the deviation of the 2nd element from the first and from the third element is not bigger than 0.02. The vector does not always consist of 3 rows; it could have more.
**Vector**
0.32957
0.33097
0.33946
This is what I tried:
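n = 4;
P = sort(rand(1, n), "ascend");      % sort returns a new vector, so assign it
L = 2;
while L <= n
  if P(L) - P(L-1) > 0.02            % gap to the previous element is too large
    P = sort(rand(1, n), "ascend");  % draw a new vector and restart the check
    L = 2;
  else
    L = L + 1;
  endif
endwhile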
Vectorize this!
isvalid=~any(diff(sort(a))>0.02);
sort(a): sort the vector, in case it is not sorted already
diff(): take the differences between adjacent elements
___ > 0.02: check whether each of those differences is bigger than what you accept
~any(): if any difference is bigger, return zero, i.e. "not valid".
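The same check can be sketched in Python without any libraries. The function name is_valid and the max_gap parameter are assumptions made for this example; the logic mirrors the sort, diff, compare and any steps listed above.

```python
def is_valid(values, max_gap=0.02):
    """True when no two neighbours in the sorted vector differ by more than max_gap."""
    ordered = sorted(values)
    # zip pairs each element with its successor, like diff() on the sorted vector.
    return not any(b - a > max_gap for a, b in zip(ordered, ordered[1:]))
```

For the vector from the question, is_valid([0.32957, 0.33097, 0.33946]) is True, since both gaps are well below 0.02.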
From your code, it seems that there may be more to the question than what you ask; this looks like an XY problem. You want to create a random vector that has the properties you describe. You seem to be using uniform random numbers, so let me propose a way to generate your vector such that your conditions are always true.
n = 100;                        % desired length (not named "length", which would shadow the built-in)
a = zeros(1, n);
a(1) = rand(1);                 % or any other way to generate a first value
a(2:n) = rand(1, n-1) * 0.02;   % random increments, never bigger than 0.02
a = cumsum(a);                  % cumulative sum turns the increments into values
This ensures that the vector never decreases and that consecutive elements never differ by more than 0.02.
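For readers outside MATLAB/Octave, the cumulative-sum construction might look like this in Python. The function name random_small_step_vector and its parameters are hypothetical, and it assumes n is at least 1.

```python
import random
from itertools import accumulate


def random_small_step_vector(n, max_step=0.02, seed=None):
    """Build n values whose consecutive differences lie in [0, max_step)."""
    rng = random.Random(seed)
    # First entry is the starting value; the rest are small non-negative steps.
    increments = [rng.random()] + [rng.random() * max_step for _ in range(n - 1)]
    # Running totals turn the steps into the final vector.
    return list(accumulate(increments))
```

Note that, as with the original snippet, the running total is not capped, so later values can exceed 1 for long vectors.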
Suppose the problem posed is as follows:
On Mars there lives a colony of worms. Each worm is represented as an element of a 1D array. The worms decide to eat each other, but any worm can eat only its nearest neighbour. Each worm has a preset amount of energy (i.e. the value of the element). On Mars, the laws dictate that when a worm i with energy x eats another worm with energy y, the i-th worm's final energy becomes x-y. A worm is allowed to have a negative energy level.
Find the maximum value of energy of the last standing worm.
Sample data:
0,-1,-1,-1,-1 has answer 4.
2,1,2,1 has answer 4.
What would be suitable logic to address this problem?
This problem has a surprisingly simple O(N) solution.
If any two members of the array have different signs, the answer is the sum of the absolute values of all elements.
To see why, imagine a single positive value in the array while all other elements are negative (as in the first sample). The best strategy is to keep this value positive and gradually eat all neighbors to increase it. The position of the positive value doesn't matter. The strategy is the same in the case of a single negative element.
In the more general case, if an array of size N has values of different signs, we can always reduce it to an array of size N-1 that still has different signs, because there must be a pair of neighbors with different signs, which we can combine to form a number of whichever sign we prefer.
For example with this array : [1,2,-5,4,-10]
we can combine (2,-5), (-5,4) or (4,-10). Let's combine (4,-10) to get [1,2,-5,-14]
We can only take (2,-5) now. So our array is now: [1,-7,-14]
Again only (1,-7) is possible. But this time we have to keep the combined value positive. So we are left with: [8,-14]
Final combining gives us 22, sum of all absolute values.
If all values have the same sign, our first move is to produce a value of the opposite sign by combining a neighbor pair at as little "cost" as possible. Intuitively, we don't want to waste two big numbers on this conversion. If we take a neighbor pair x,y, the combined value (of opposite sign) has magnitude abs(x-y). Since the result is simply the sum of absolute values, we can interpret this as "losing" abs(x) and abs(y) from the maximum possible output and "gaining" abs(x-y) instead. So the "cost" of using this pair for the sign conversion is abs(x)+abs(y)-abs(x-y). Since we need to minimise this cost, we choose the neighbor pair in the initial array that has the lowest such value.
So if we take the above array but now all values are positive [1,2,5,4,10]:
"cost" of converting (1,2) to -1 is 1+2-abs(-1)=2.
"cost" of converting (2,5) to -3 is 2+5-abs(-3)=4.
"cost" of converting (5,4) to -1 is 5+4-abs(-1)=8.
"cost" of converting (4,10) to -6 is 4+10-abs(-6)=8.
So, we take and convert pair (1,2) to -1. Then just sum absolute values of resultant array to get 20. Notice that this value is exactly 2 less than our previous example.
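The rule above can be condensed into a short Python sketch. The function name max_last_worm_energy is invented for this example, and two choices are assumptions not spelled out in the answer: a zero counts as either sign, and a single worm simply keeps its own energy.

```python
def max_last_worm_energy(energies):
    """Maximum energy of the last worm standing, in one pass over the array."""
    n = len(energies)
    if n == 0:
        raise ValueError("need at least one worm")
    if n == 1:
        return energies[0]  # assumption: a lone worm keeps its energy
    total = sum(abs(e) for e in energies)
    # Assumption: zero is treated as both non-negative and non-positive.
    has_non_negative = any(e >= 0 for e in energies)
    has_non_positive = any(e <= 0 for e in energies)
    if has_non_negative and has_non_positive:
        return total
    # All values share one sign: pay the cheapest neighbour-pair conversion cost.
    cost = min(abs(x) + abs(y) - abs(x - y)
               for x, y in zip(energies, energies[1:]))
    return total - cost
```

On the two samples from the question and the two arrays worked through above, this returns 4, 4, 22 and 20 respectively.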
I'm working on a string similarity algorithm, and was thinking about how to give a score between 0 and 1 when comparing two strings. The two variables for this function are the Levenshtein distance D (added, removed and changed characters) and the maximum length L of the two strings (but you could also take the average).
My initial algorithm was just 1-D/L, but this gave scores that were too high for short strings, e.g. 'tree' and 'bee' would get a score of 0.5, and scores that were too low for longer strings which have more in common even if half of the characters are different.
Now I'm looking for a mathematical function that can output a better score. I wasn't able to come up with one, so I sketched this height map of a 3D plot (L is x and D is y).
Does anyone know how to convert such a graph to an equation, whether I would be better off just creating a lookup table, or whether there is an existing solution?
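For reference, the baseline 1-D/L score mentioned in the question can be sketched in Python as follows. The function names levenshtein and similarity are chosen for this example, and returning 1.0 for two empty strings is an assumption, since the question does not cover that case.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via row-by-row DP."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Baseline score 1 - D/L, with L the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein(a, b) / longest
```

With this sketch, similarity("tree", "bee") gives 0.5, matching the example in the question.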
Hello, I am new to R and I can't find a way to do exactly what I want. I have a vector of numbers x, and what I want to do is sort it in increasing order and then sum the successive differences like this (let's say the vector has 100 numbers, for example):
[x(100)-x(99)]+[x(99)-x(98)]+[x(98)-x(97)]+[x(97)-x(96)]+...[x(2)-x(1)]
and then divide all that sum by the number of elements the vector has, in this case 100.
The only thing that I am able to do at the moment is order the vector with:
sort(nameOfTheVector)
Sorry for my bad English.
diff returns suitably lagged and iterated differences. In your case you want the default single lag. sum will return the sum of any arguments passed to it, so....
sum(diff(sort(nameOfTheVector))) / length(nameOfTheVector)
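A small Python sketch of the same computation may help to see what it produces. The function name mean_sorted_step is hypothetical. Because the sum of successive differences telescopes, the result is simply (max - min) divided by the number of elements.

```python
def mean_sorted_step(values):
    """Sum of differences between sorted neighbours, divided by the element count."""
    ordered = sorted(values)
    # Each term is x(i+1) - x(i); added together they collapse to max - min.
    total = sum(b - a for a, b in zip(ordered, ordered[1:]))
    return total / len(ordered)
```

For example, mean_sorted_step([4, 1, 3]) sorts to 1, 3, 4, sums the steps 2 and 1, and divides by 3 to give 1.0.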
What does the following code do:
rnorm(10, mean=2, sd=1:10)
The first number is from N(2,1)
The second number is from N(2,2)
The third number is from N(2,3)
etc...?
The first argument tells R how many random variates you want returned. In this case, it will give you back 10 values. Those values will be drawn from normal distributions with mean equal to 2. In addition, all 10 values will be drawn from distributions with different standard deviations, the first with SD=1, the second 2, ..., the 10th SD=10. Perhaps the thing to understand is that R, by its nature, is vectorized. That is, there is no such thing as a scalar, only a vector of length=1. (I recognize that that doesn't make a lot of sense within pure math, but it does in computer science.) As a result, arguments are often 'recycled' so that they will all match the length of the longest vector, i.e., you end up with a vector of 10 means, each equal to 2, to match your vector of 10 SDs. HTH.
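To make the recycling idea concrete, here is a rough Python imitation using only the standard library. The names recycle and rnorm_like are invented for this sketch and are not R or Python APIs; it only mimics the behaviour described above, with mean and sd passed as lists.

```python
import random
from itertools import cycle, islice


def recycle(values, n):
    """Repeat a list until it has exactly n entries, like R's argument recycling."""
    return list(islice(cycle(values), n))


def rnorm_like(n, mean, sd, seed=None):
    """Draw n normal variates, pairing the i-th mean with the i-th SD."""
    rng = random.Random(seed)
    means = recycle(mean, n)  # e.g. [2] becomes ten copies of 2
    sds = recycle(sd, n)
    return [rng.gauss(m, s) for m, s in zip(means, sds)]
```

A call such as rnorm_like(10, [2], list(range(1, 11))) would then correspond to the rnorm call in the question: ten draws, all with mean 2, with standard deviations 1 through 10.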