Best way to generate random indices into an array? - functional-programming

Good afternoon. I have a set of values from which I'd like to draw a random subset.
My first thought was this:
let getRandomIndices size count =
    if size >= count then
        let r = System.Random()
        r.GetValues(0,size) |> Seq.take count |> Seq.toList
    else
        [0..size-1]
However, r.GetValues(0,size) may generate the same value multiple times. How can I get distinct values? My first thought is to repeatedly store indices in a set until the set holds the desired number of elements, but this seems too procedural and not functional enough. Is there a better way?
Or should I start with [0..size-1] and remove random elements from it until it holds the desired number of indices?
I'm not really looking for the most efficient approach, but the most functional one. I am struggling to better grok the functional mindset.

If you sort a list of all the indices randomly, you can just take the first count number of elements in the list.
let getRandomIndices size count =
    if size >= count then
        let r = System.Random()
        [0..size-1] |> List.sortBy (fun _ -> r.Next()) |> List.take count
    else
        [0..size-1]
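For readers more comfortable with Python, here is a minimal sketch of the same shuffle-then-take idea. The function name and the optional rng parameter are illustrative assumptions, not part of the answer above; it uses an explicit shuffle rather than sorting by a random key.

```python
import random

def get_random_indices(size, count, rng=None):
    """Return `count` distinct indices in range(size), or all of them if count >= size."""
    if rng is None:
        rng = random.Random()  # hypothetical default; pass a seeded Random for reproducibility
    indices = list(range(size))
    if count >= size:
        return indices
    rng.shuffle(indices)       # random permutation of all indices
    return indices[:count]     # the first `count` entries are distinct by construction
```

Because every index appears exactly once in the shuffled list, no duplicate check is needed. The standard library's random.sample(range(size), count) gives the same kind of result in one call.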

Related

Find which sum of any numbers in an array equals amount

I have a customer who sends electronic payments but doesn't bother to specify which invoices they cover. I'm left guessing which ones, and I would rather not try every single combination manually. I need some sort of pseudo-code to do it, which I can then adapt, but I'm not sure I can come up with a good algorithm myself. I'm familiar with PHP, Bash, and Python, but I can adapt.
I would need an array with the following numbers: [357.15, 223.73, 106.99, 89.96, 312.39, 120.00]. Those are the amounts of the invoices. Then I would need to find a sum of any combination of two or more of those numbers that adds up to 596.57. Once found the program would need to tell me exactly which numbers it used to reach the sum so I can then know which invoices got paid.
This is very similar to the Subset Sum problem and can be solved with the typical brute-force method used for that problem. I have to do this often enough that I keep a simple template of this algorithm handy for when I need it. What is posted below is a slightly modified version (see note 1 at the end of this answer).
This has no restrictions on whether the values are integer or float. The basic idea is to iterate over the list of input values and keep a running list of every subset that sums to less than the target value (since there might be a later value in the inputs that will yield the target). It could be modified to handle negative values as well by removing the rule that only keeps candidate subsets if they sum to less than the target. In that case, you'd keep all subsets, and then search through them at the end.
import copy
def find_subsets(base_values, target):
    # Each entry is [known_attainable_value, [list, of, components]]
    possible_matches = [[0, []]]
    matches = []  # we'll return ALL subsets that sum to `target`
    for base_value in base_values:
        temp = copy.deepcopy(possible_matches)  # Can't modify in loop, so use a copy
        for possible_match in possible_matches:
            new_val = possible_match[0] + base_value
            if new_val <= target:
                # Build a new component list rather than appending to the shared one
                new_possible_match = [new_val, possible_match[1] + [base_value]]
                temp.append(new_possible_match)
                if new_val == target:
                    matches.append(new_possible_match[1])
        possible_matches = temp
    return matches
# Usage: find_subsets([list, of, input, values], target_sum)
This is a very inefficient algorithm and it will blow up quickly as the size of the input grows. The Subset Sum problem is NP-Complete, so you are not likely to find a generalized solution that will work in all cases and is efficient.
1: The way lists are being used here is kludgy. If the goal was to simply find any match, the nested lists could be replaced with a dictionary, and we could exit right away once a match is found. But doing that will cause intermediate subsets that sum to the same value to also map to the same dictionary slot, so only one subset with that sum is kept. Since we need to report all matching subsets (because the values represent checks and are presumably not fungible even if the dollar amounts are equal), a dictionary won't work.
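The dictionary variant mentioned in the note could look roughly like the sketch below. It assumes positive integer amounts (for example, cents) and returns only the first subset it finds; the name find_any_subset is hypothetical.

```python
def find_any_subset(base_values, target):
    """Return ONE list of values summing to `target`, or None.

    Assumes positive integers. Each reachable sum keeps a single
    component list, so alternative subsets with the same sum are lost.
    """
    reachable = {0: []}  # sum -> one list of components producing it
    for value in base_values:
        # Snapshot the items so sums added in this pass do not reuse `value`
        for known_sum, parts in list(reachable.items()):
            new_sum = known_sum + value
            if new_sum == target:
                return parts + [value]  # exit as soon as a match is found
            if new_sum < target and new_sum not in reachable:
                reachable[new_sum] = parts + [value]
    return None
```

The dictionary bounds the work by the number of distinct sums below the target instead of the number of subsets, which is the trade-off the note describes: faster, but unable to report every matching combination.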
You can use itertools.combinations(t,r) to list all combinations of r elements in array t.
So we loop on the possible values of r, then on the results of itertools.combinations:
import itertools
def find_sum(t, obj):
    t = [x for x in t if x <= obj]  # filter out elements which are too big
    for r in range(1, len(t)+1):  # loop on number of elements
        for subt in itertools.combinations(t, r):  # loop on combinations of r elements
            if sum(subt) == obj:
                return subt
    return None
find_sum([1,2,3,4], 6)
# (2, 4)
find_sum([1,2,3,4], 10)
# (1, 2, 3, 4)
find_sum([1,2,3,4], 11)
# None
find_sum([35715, 22373, 10699, 8996, 31239, 12000], 59657)
# None
Rounding errors:
The code above is meant to be used with integers, rather than floats.
To use with floats, replace the test sum(subt) == obj with the more forgiving test abs(sum(subt) - obj) < 0.01. The abs() matters: without it, every subset whose sum is below obj would pass the test.
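Another way to sidestep rounding errors, assuming the amounts are currency values with two decimal places, is to convert everything to integer cents before comparing. The sketch below is one possible arrangement; to_cents, find_sum_in_cents, and the min_items parameter are illustrative names, and min_items=2 reflects the question's "two or more" requirement.

```python
from itertools import combinations

def to_cents(amount):
    # round() absorbs the small binary floating-point error left by amount * 100
    return round(amount * 100)

def find_sum_in_cents(amounts, target, min_items=2):
    """Return the first combination of amounts whose exact cent total equals target."""
    cents = [to_cents(a) for a in amounts]
    goal = to_cents(target)
    for r in range(min_items, len(cents) + 1):
        # Combine indices so the original float amounts can be reported back
        for picked in combinations(range(len(cents)), r):
            if sum(cents[i] for i in picked) == goal:
                return [amounts[i] for i in picked]
    return None
```

Integer comparison is exact, so no tolerance has to be chosen, and two invoices with the same amount remain distinguishable because the search works on positions rather than values.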
Relevant documentation:
itertools.combinations

Delete all duplicated elements in a vector in Julia 1.1

I am trying to write a code which deletes all repeated elements in a Vector. How do I do this?
I already tried using unique and union but they both delete all the repeated items but 1. I want all to be deleted.
For example: let x = [1,2,3,4,1,6,2]. Using union or unique returns [1,2,3,4,6]. What I want as my result is [3,4,6].
There are lots of ways to go about this. One approach that is fairly straightforward and probably reasonably fast is to use countmap from StatsBase:
using StatsBase
function f1(x)
    d = countmap(x)
    return [ key for (key, val) in d if val == 1 ]
end
or as a one-liner:
[ key for (key, val) in countmap(x) if val == 1 ]
countmap creates a dictionary mapping each unique value from x to the number of times it occurs in x. The solution is then every key that maps to a val of 1, i.e. all elements of x that occur exactly once.
It might be faster in some situations to use sort!(x) and then construct an index for the elements of the sorted x that only occur once, but this will be messier to code, and the output will be in sorted order, which you may not want. Note that countmap returns a Dict, whose iteration order is not guaranteed, so the comprehension above may not return the elements in their original order either; if the order matters, filter x itself with [ v for v in x if d[v] == 1 ].
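The counting idea can also be sketched in Python (used here only for illustration; the answer itself is Julia). Filtering the original sequence against the counts keeps the input order. The function name drop_repeated is made up for this example.

```python
from collections import Counter

def drop_repeated(values):
    """Keep only the elements that occur exactly once, in their original order."""
    counts = Counter(values)  # value -> number of occurrences
    return [v for v in values if counts[v] == 1]

print(drop_repeated([1, 2, 3, 4, 1, 6, 2]))  # [3, 4, 6]
```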

Allocating Subarrays in Mergesort

What's happening, folks.
So, I've done a fair amount of research on merge sort, and in spite of getting the "gist" of it, I am still baffled by how one is supposed to store the subarrays in order to merge them back together—in other words, save them somewhere so that they would "know" each other, as you would otherwise—in classic recursive fashion—have all these independent function calls returning data that I would assume would go out of scope.
Here's what I first thought: create a new array named "subs" to store the subarrays in upon each division (I also considered using a closure to do this and would like to know whether this is advisable). But, as you proceed to the next division, what are you gonna do—replace each element in subs with its subarrays? Then, you would be facing more costly work, especially once you consider how you're gonna move things around in subs in order to ensure that each subarray has its own index.
Heh—I have a bad feeling that this might be a far cry from what's actually supposed to be done. I understand that this algorithm is a classic example of the divide-and-conquer approach, but it's just strange to me that one couldn't just cut to the chase by splitting the array into all of its elements right off the bat (after all, that's the base case, and what would be wrong with throwing in a greedy approach to solving the problem?).
Thanks!
EDIT:
Alright, so I figured it out.
To sum it up: I used indices to track where to place elements (and obviate the need for built-in list functions that may slow down runtime).
By using nested functions and a (hidden) pointer to the new array, I kept data in scope. An auxiliary array buffers data from the subarrays.
In retrospect, what I originally had in mind, which vaguely resembled insertion sort, was in fact bottom-up merge sort. Having previously questioned the efficiency and purpose of top-down merge sort, I now understand that breaking down the problem reduces the number of comparisons and moves, especially on larger lists, where insertion sort is less efficient. I did not attempt to implement my initial idea because I did not have a clear enough picture of recursion and how data is passed.
#!/usr/bin/env python3
def merge_sort(arr):
    def merge(*indices):  # indices = first, last, and pivot indices, respectively
        head, tail = indices[0], indices[1]
        pivot = indices[2]
        i = head
        j = pivot+1
        k = 0
        # Copy the smaller front element of the two sorted runs into aux
        while (i <= pivot and j <= tail):
            if new[i] <= new[j]:
                aux[k] = new[i]
                i += 1
                k += 1
            else:
                aux[k] = new[j]
                j += 1
                k += 1
        # Copy whatever remains of the left run, then of the right run
        while (i <= pivot):
            aux[k] = new[i]
            i += 1
            k += 1
        while (j <= tail):
            aux[k] = new[j]
            j += 1
            k += 1
        # Write the merged run back into its slot in new
        for x in range(head, tail+1):
            new[x] = aux[x-head]
    # end merge
    def split(a, *indices):  # indices = first and last indices, respectively
        head, tail = indices[0], indices[1]
        pivot = (head+tail) // 2  # integer division
        if head < tail:
            # Note: merge() works on `new` through indices and never reads these slices
            l_sub = a[head:pivot+1]
            r_sub = a[pivot+1:tail+1]
            split(l_sub, head, pivot)
            split(r_sub, pivot+1, tail)
            merge(head, tail, pivot)
    # end split
    new = arr  # sorts the caller's list in place
    aux = list(new)
    tail = len(new)-1
    split(new, 0, tail)
    return new
# end merge_sort
if __name__ == "__main__":
    loops = int(input().strip())
    for _ in range(loops):
        arr = list(map(int, input().strip().split(' ')))
        result = merge_sort(arr)
        print(result)
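For comparison, the bottom-up variant mentioned in the edit can be sketched as follows. This is not the poster's code, just one common way to write it: it treats every element as a sorted run of width 1 and merges neighbouring runs, doubling the width on each pass, so no recursion is needed. The function name is an assumption.

```python
def merge_sort_bottom_up(items):
    """Return a new sorted list using iterative (bottom-up) merge sort."""
    src = list(items)
    n = len(src)
    dst = [None] * n  # auxiliary buffer, swapped with src after each pass
    width = 1         # current length of the already-sorted runs
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)      # end of left run / start of right run
            hi = min(lo + 2 * width, n)   # end of right run
            i, j = lo, mid
            for k in range(lo, hi):
                # Take from the left run while it has the smaller (or equal) element
                if i < mid and (j >= hi or src[i] <= src[j]):
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
        src, dst = dst, src  # merged runs become the input of the next pass
        width *= 2
    return src
```

The pairing of runs by index is what lets the subarrays "know" each other: nothing has to be stored besides the two buffers and the current width.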

Asking for pairs of information on a x_mdialog for scilab

I'm creating a program to calculate the linear regression of data. The program should, when you start, ask for the number of pairs of values (x and y) to be used. And according to the number of pairs that you define, it should go asking for the data to apply the regression (on pairs of X, Y). The program must apply regression for all methods available.
I already have the code for the regressions but the problem that I have is that I don't know how to ask for the data (the pairs of x and y) and for x create a vector, and for y create a separate vector. Also, it can be from 3 pairs to infinite number of pairs.
There are many ways to accomplish what you want. One possible way is to use a for-loop:
Ask the user how many points they want to input, n_pairs, as you said.
Use a for-loop from 1 to n_pairs asking for inputs using x_mdialog.
At each iteration, evaluate the input and store the data.
Something like the following could work for you:
//ask how many pairs
n_pairs = x_mdialog("Data acquisition","How many points will you enter?","3");
n_pairs = evstr(n_pairs);
//initialise data
X_data = []; Y_data = [];
for i = 1 : n_pairs
    //acquire each pair
    pair = x_mdialog("Data acquisition",["X:","Y:"],["",""])
    //test for an empty result (cancelled dialog) first, before indexing into pair
    if pair == [] | pair(1) == "" | pair(2) == "" then
        //break loop in case of blank input
        break
    else
        //non-blank inputs are stored
        X_data(i) = evstr(pair(1));
        Y_data(i) = evstr(pair(2));
    end
end
//sort values according to X
[X_data,idx] = gsort(X_data,"r","i");
Y_data = Y_data(idx);
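The logic of the loop, independent of the dialog, can be sketched in Python as below. This is only an illustration of the collect, stop-on-blank, and sort-by-X steps; collect_pairs is a hypothetical name and the raw string pairs stand in for what the dialog would return.

```python
def collect_pairs(raw_pairs):
    """Turn (x_text, y_text) pairs into two lists, stopping at the first blank entry."""
    xs, ys = [], []
    for raw_x, raw_y in raw_pairs:
        if raw_x.strip() == "" or raw_y.strip() == "":
            break  # blank input ends data entry, as in the loop above
        xs.append(float(raw_x))
        ys.append(float(raw_y))
    # Sort by X and apply the same permutation to Y so the pairs stay aligned
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    return [xs[i] for i in order], [ys[i] for i in order]
```

Reordering Y with the index permutation obtained from X mirrors what gsort's second output is used for in the Scilab code.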

Elixir loop over a matrix

I have a list of elements and I am converting it into a list of lists using the Enum.chunk_every method.
The code is something like this:
matrix = Enum.chunk_every(list_1d, num_cols)
Now I want to loop over the matrix and access the neighbors
Simply if I have the list [1,2,3,4,5,6,1,2,3] it is converted to a 3X3 matrix like:
[[1,2,3], [4,5,6], [1,2,3]]
Now how do I loop over this matrix? And what if I want to access the neighbors of the elements? For example the neighbors of 5 are 2,4,6 and 2.
I can see that recursion is a way to go but how will that work here?
There are many ways to solve this, and I think that you should consider first what is your use case (size of the matrix, number of matrices, number of accesses...) and adapt your data structure accordingly.
Nevertheless, here is a simple implementation (in the Erlang shell; I'll let you adapt it to Elixir):
1> L = [[1,2,3], [4,5,6], [1,2,3]].
[[1,2,3],[4,5,6],[1,2,3]]
2> Get = fun(I,J,L) ->
try
V = lists:nth(I,lists:nth(J,L)),
{ok,V}
catch
_:_ -> {error,out_of_bound}
end
end.
#Fun<erl_eval.18.99386804>
3> Get(1,2,L).
{ok,4}
4> Get(2,3,L).
{ok,2}
5> Get(2,4,L).
{error,out_of_bound}
6> Neighbor = fun(I,J,L) ->
[ V || {I1,J1} <- [{I,J-1},{I-1,J},{I+1,J},{I,J+1}],
{ok,V} <- [Get(I1,J1,L)]
]
end.
#Fun<erl_eval.18.99386804>
7> Neighbor(2,2,L).
[2,4,6,2]
8> Neighbor(1,2,L).
[1,5,1]
9>
Remark: I like list comprehensions; you may prefer to use lists:map in this case. This code is not efficient, since it traverses the list 4 times to get the neighbors. Its only advantage is that it is straightforward, so it should be easy to read.
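The same bounds-checked neighbor lookup can be sketched in Python for comparison. Note the differences from the Erlang session: indices here are 0-based and passed as (row, col), and chunk_every is a small stand-in written for this example, not Elixir's Enum.chunk_every.

```python
def chunk_every(flat, num_cols):
    """Split a flat list into rows of num_cols elements."""
    return [flat[i:i + num_cols] for i in range(0, len(flat), num_cols)]

def in_bounds(matrix, row, col):
    # Explicit check: Python would otherwise accept negative indices silently
    return 0 <= row < len(matrix) and 0 <= col < len(matrix[row])

def neighbors(matrix, row, col):
    """Return the values above, left of, right of, and below (row, col)."""
    candidates = [(row - 1, col), (row, col - 1), (row, col + 1), (row + 1, col)]
    return [matrix[r][c] for r, c in candidates if in_bounds(matrix, r, c)]

matrix = chunk_every([1, 2, 3, 4, 5, 6, 1, 2, 3], 3)
print(neighbors(matrix, 1, 1))  # [2, 4, 6, 2]
```

Looping over the whole matrix is then a pair of nested loops over row and column indices, calling neighbors for each position.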
