I'm creating a new dictionary, say:
var a: [Int: Int] = []
And, I'd like to set the keys 0...n to 1.
I'm doing this, rather brutishly as:
for i in 0...n {
a[i] = 1
}
n is likely to be around 700000. And, this "initialization" takes hours.
I know I can do this to an array:
var z: [Int] = []
z = Array(0...700000)
and, in a few seconds I get a 700000 element array.
What's the right way to populate my dictionary? Thanks much.
I tried to run your code, slightly modified, like this:
let n = 700000
let start = NSDate()
var a: [Int: Int] = [:] //Dictionary<Int, Int>(minimumCapacity: n)
for i in 0..<n {
a[i] = 1
}
let stop = NSDate()
let dif = stop.timeIntervalSinceDate(start)
println(dif)
This runs for 6.7 seconds. If I replace the
[:]
with
Dictionary<Int, Int>(minimumCapacity: n)
it gets initialized in 3 seconds (debug build, no optimizations). Perhaps you are not on the latest build of Xcode?
Related
I'm learning functional programming with F#, and I want to write a function that will generate a sequence for me.
There is a some predetermined function for transforming a value, and in the function I need to write there should be two inputs - the starting value and the length of the sequence. Sequence starts with the initial value, and each following item is a result of applying the transforming function to the previous value in the sequence.
In C# I would normally write something like that:
public static IEnumerable<double> GenerateSequence(double startingValue, int n)
{
double TransformValue(double x) => x * 0.9 + 2;
yield return startingValue;
var returnValue = startingValue;
for (var i = 1; i < n; i++)
{
returnValue = TransformValue(returnValue);
yield return returnValue;
}
}
As I tried to translate this function to F#, I made this:
let GenerateSequence startingValue n =
let transformValue x =
x * 0.9 + 2.0
seq {
let rec repeatableFunction value n =
if n = 1 then
transformValue value
else
repeatableFunction (transformValue value) (n-1)
yield startingValue
for i in [1..n-1] do
yield repeatableFunction startingValue i
}
There are two obvious problems with this implementation.
First is that because I tried to avoid making a mutable value (analogy of returnValue variable in C# implementation), I didn't reuse values of former computations while generating sequence. This means that for the 100th element of the sequence I have to make additional 99 calls of the transformValue function instead of just one (as I did in C# implementation). This reeks with extremely bad performance.
Second is that the whole function does not seem to be written in accordance with Functional Programming. I am pretty sure that there are more elegant and compact implementation. I suspect that Seq.fold or List.fold or something like that should have been used here, but I'm still not able to grasp how to effectively use them.
So the question is: how to re-write the GenerateSequence function in F# so it would be in Functional Programming style and have a better performance?
Any other advice would also be welcomed.
The answer from #rmunn shows a rather nice solution using unfold. I think there are other two options worth considering, which are actually just using a mutable variable and using a recursive sequence expression. The choice is probably a matter of personal preference. The two other options look like this:
let generateSequenceMutable startingValue n = seq {
let transformValue x = x * 0.9 + 2.0
let mutable returnValue = startingValue
for i in 1 .. n do
yield returnValue
returnValue <- transformValue returnValue }
let generateSequenceRecursive startingValue n =
let transformValue x = x * 0.9 + 2.0
let rec loop value i = seq {
if i < n then
yield value
yield! loop (transformValue value) (i + 1) }
loop startingValue 0
I modified your logic slightly so that I do not have to yield twice - I just do one more step of the iteration and yield before updating the value. This makes the generateSequenceMutable function quite straightforward and easy to understand. The generateSequenceRecursive implements the same logic using recursion and is also fairly nice, but I find it a bit less clear.
If you wanted to use one of these versions and generate an infinite sequence from which you can then take as many elements as you need, you can just change for to while in the first case or remove the if in the second case:
let generateSequenceMutable startingValue n = seq {
let transformValue x = x * 0.9 + 2.0
let mutable returnValue = startingValue
while true do
yield returnValue
returnValue <- transformValue returnValue }
let generateSequenceRecursive startingValue n =
let transformValue x = x * 0.9 + 2.0
let rec loop value i = seq {
yield value
yield! loop (transformValue value) (i + 1) }
loop startingValue 0
If I was writing this, I'd probably go either with the mutable variable or with unfold. Mutation may be "generally evil" but in this case, it is a localized mutable variable that is not breaking referential transparency in any way, so I don't think it's harmful.
Your description of the problem was excellent: "Sequence starts with the initial value, and each following item is a result of applying the transforming function to the previous value in the sequence."
That is a perfect description of the Seq.unfold method. It takes two parameters: the initial state and a transformation function, and returns a sequence where each value is calculated from the previous state. There are a few subtleties involved in using Seq.unfold which the rather terse documentation may not explain very well:
Seq.unfold expects the transformation function, which I'll call f from now on, to return an option. It should return None if the sequence should end, or Some (...) if there's another value left in the sequence. You can create infinite sequences this way if you never return None; infinite sequences are perfectly fine since F# evaluates sequences lazily, but you do need to be careful not to ever loop over the entirely of an infinite sequence. :-)
Seq.unfold also expects that if f returns Some (...), it will return not just the next value, but a tuple of the next value and the next state. This is shown in the Fibonacci example in the documentation, where the state is actually a tuple of the current value and the previous value, which will be used to calculate the next value shown. The documentation example doesn't make that very clear, so here's what I think is a better example:
let infiniteFibonacci = (0,1) |> Seq.unfold (fun (a,b) ->
// a is the value produced *two* iterations ago, b is previous value
let c = a+b
Some (c, (b,c))
)
infiniteFibonacci |> Seq.take 5 |> List.ofSeq // Returns [1; 2; 3; 5; 8]
let fib = seq {
yield 0
yield 1
yield! infiniteFibonacci
}
fib |> Seq.take 7 |> List.ofSeq // Returns [0; 1; 1; 2; 3; 5; 8]
And to get back to your GenerateSequence question, I would write it like this:
let GenerateSequence startingValue n =
let transformValue x =
let result = x * 0.9 + 2.0
Some (result, result)
startingValue |> Seq.unfold transformValue |> Seq.take n
Or if you need to include the starting value in the sequence:
let GenerateSequence startingValue n =
let transformValue x =
let result = x * 0.9 + 2.0
Some (result, result)
let rest = startingValue |> Seq.unfold transformValue |> Seq.take n
Seq.append (Seq.singleton startingValue) rest
The difference between Seq.fold and Seq.unfold
The easiest way to remember whether you want to use Seq.fold or Seq.unfold is to ask yourself which of these two statements is true:
I have a list (or array, or sequence) of items, and I want to produce a single result value by running a calculation repeatedly on pairs of items in the list. For example, I want to take the product of this whole series of numbers. This is a fold operation: I take a long list and "compress" it (so to speak) until it's a single value.
I have a single starting value and a function to produce the next value from the current value, and I want to end up with a list (or sequence, or array) of values. This is an unfold operation: I take a small starting value and "expand" it (so to speak) until it's a whole list of values.
I would like to iterate over a list and occasionally delete items of said list. Below a toy example:
function delete_item!(myarray, item)
deleteat!(myarray, findin(myarray, [item]))
end
n = 1000
myarray = [i for i = 1:n];
for a in myarray
if a%2 == 0
delete_item!(myarray, a)
end
end
However I get error:
BoundsError: attempt to access 500-element Array{Int64,1} at index [502]
How can I fix it (as efficiently as possible)?
Additional information. The above seems like a silly example, in my original problem I have a list of agents which interact. Therefore I am not sure if iterating over a copy would be the best solution. For example:
#creating my agent
mutable struct agent <: Any
id::Int
end
function delete_item!(myarray::Array{agent, 1}, item::agent)
deleteat!(myarray, findin(myarray, [item]))
end
#having my list of agents
n = 1000
myarray = agent[agent(i) for i = 1:n];
#trying to remove agents from list while having them interact
for a in myarray
#agent does stuff
if a.id%2 == 0 #if something happens remove
delete_item!(myarray, a)
end
end
Unfortunately there is no single answer to this question as most efficient approach depends on the logic of the whole model (in particular do other agents' actions depend on the fact that some entry is actually deleted from an array).
In most cases the following approach should be the simplest (I am leaving findin which is inefficient but I understand that you may have duplicates in myarray in general):
n = 1000
myarray = [i for i = 1:n];
keep = trues(n)
for (i, a) in enumerate(myarray)
keep[i] || continue # do not process an agent that is marked for deletion
if a%2 == 0 # here application logic might also need to check keep in some cases
keep[findin(myarray, [a])] = false
end
end
myarray = myarray[keep]
If for some reason you really need to delete elements of myarray in each iteration here is how you can do it:
n = 1000
myarray = [i for i = 1:n];
i = 1
while i <= length(myarray)
a = myarray[i]
if a%2 == 0
todelete = findin(myarray, [a])
i -= count(x -> x < i, todelete) # if myarray has duplicates of a you have to move the counter back
deleteat!(myarray, todelete)
else
i += 1
end
end
In general the code you give will not be very fast (e.g. if you know myarray does not contain duplicates it can be much simpler - and I guess you can).
EDIT: Here is how you can implement both versions if you know you do not have duplicates (you can simply use agent's index - observe that we can also avoid unnecessary checks):
n = 1000
myarray = [i for i = 1:n];
keep = trues(n)
for (i, a) in enumerate(myarray)
if a%2 == 0 # here application logic might also need to check keep in some cases
keep[i] = false
end
end
myarray = myarray[keep]
If for some reason you really need to delete elements of myarray in each iteration here is how you can do it:
n = 1000
myarray = [i for i = 1:n];
i = 1
while i <= length(myarray)
a = myarray[i]
if a%2 == 0
deleteat!(myarray, i)
else
i += 1
end
end
I'm trying to write some code in a functional paradigm for practice. There is one case I'm having some problems wrapping my head around. I am trying to create an array of 5 unique integers from 1, 100. I have been able to solve this without using functional programming:
let uniqueArray = [];
while (uniqueArray.length< 5) {
const newNumber = getRandom1to100();
if (uniqueArray.indexOf(newNumber) < 0) {
uniqueArray.push(newNumber)
}
}
I have access to lodash so I can use that. I was thinking along the lines of:
const uniqueArray = [
getRandom1to100(),
getRandom1to100(),
getRandom1to100(),
getRandom1to100(),
getRandom1to100()
].map((currentVal, index, array) => {
return array.indexOf(currentVal) > -1 ? getRandom1to100 : currentVal;
});
But this obviously wouldn't work because it will always return true because the index is going to be in the array (with more work I could remove that defect) but more importantly it doesn't check for a second time that all values are unique. However, I'm not quite sure how to functionaly mimic a while loop.
Here's an example in OCaml, the key point is that you use accumulators and recursion.
let make () =
Random.self_init ();
let rec make_list prev current max accum =
let number = Random.int 100 in
if current = max then accum
else begin
if number <> prev
then (number + prev) :: make_list number (current + 1) max accum
else accum
end
in
make_list 0 0 5 [] |> Array.of_list
This won't guarantee that the array will be unique, since its only checking by the previous. You could fix that by hiding a hashtable in the closure between make and make_list and doing a constant time lookup.
Here is a stream-based Python approach.
Python's version of a lazy stream is a generator. They can be produced in various ways, including by something which looks like a function definition but uses the key word yield rather than return. For example:
import random
def randNums(a,b):
while True:
yield random.randint(a,b)
Normally generators are used in for-loops but this last generator has an infinite loop hence would hang if you try to iterate over it. Instead, you can use the built-in function next() to get the next item in the string. It is convenient to write a function which works something like Haskell's take:
def take(n,stream):
items = []
for i in range(n):
try:
items.append(next(stream))
except StopIteration:
return items
return items
In Python StopIteration is raised when a generator is exhausted. If this happens before n items, this code just returns however much has been generated, so perhaps I should call it takeAtMost. If you ditch the error-handling then it will crash if there are not enough items -- which maybe you want. In any event, this is used like:
>>> s = randNums(1,10)
>>> take(5,s)
[6, 6, 8, 7, 2]
of course, this allows for repeats.
To make things unique (and to do so in a functional way) we can write a function which takes a stream as input and returns a stream consisting of unique items as output:
def unique(stream):
def f(s):
items = set()
while True:
try:
x = next(s)
if not x in items:
items.add(x)
yield x
except StopIteration:
raise StopIteration
return f(stream)
this creates an stream in a closure that contains a set which can keep track of items that have been seen, only yielding items which are unique. Here I am passing on any StopIteration exception. If the underlying generator has no more elements then there are no more unique elements. I am not 100% sure if I need to explicitly pass on the exception -- (it might happen automatically) but it seems clean to do so.
Used like this:
>>> take(5,unique(randNums(1,10)))
[7, 2, 5, 1, 6]
take(10,unique(randNums(1,10))) will yield a random permutation of 1-10. take(11,unique(randNums(1,10))) will never terminate.
This is a very good question. It's actually quite common. It's even sometimes asked as an interview question.
Here's my solution to generating 5 integers from 0 to 100.
let rec take lst n =
if n = 0 then []
else
match lst with
| [] -> []
| x :: xs -> x :: take xs (n-1)
let shuffle d =
let nd = List.map (fun c -> (Random.bits (), c)) d in
let sond = List.sort compare nd in
List.map snd sond
let rec range a b =
if a >= b then []
else a :: range (a+1) b;;
let _ =
print_endline
(String.concat "\t" ("5 random integers:" :: List.map string_of_int (take (shuffle (range 0 101)) 5)))
How's this:
const addUnique = (ar) => {
const el = getRandom1to100();
return ar.includes(el) ? ar : ar.concat([el])
}
const uniqueArray = (numberOfElements, baseArray) => {
if (numberOfElements < baseArray.length) throw 'invalid input'
return baseArray.length === numberOfElements ? baseArray : uniqueArray(numberOfElements, addUnique(baseArray))
}
const myArray = uniqueArray(5, [])
I am having trouble figuring out how to get the length of a matrix within a matrix within a matrix (nested depth of 3). So what the code is doing in short is... looks to see if the publisher is already in the array, then it either adds a new column in the array with a new publisher and the corresponding system, or adds the new system to the existing array publisher
output[k][1] is the publisher array
output[k][2][l] is the system
where the first [] is the amount of different publishers
and the second [] is the amount of different systems within the same publisher
So how would I find out what the length of the third deep array is?
function reviewPubCount()
local output = {}
local k = 0
for i = 1, #keys do
if string.find(tostring(keys[i]), '_') then
key = Split(tostring(keys[i]), '_')
for j = 1, #reviewer_code do
if key[1] == reviewer_code[j] and key[1] ~= '' then
k = k + 1
output[k] = {}
-- output[k] = reviewer_code[j]
for l = 1, k do
if output[l][1] == reviewer_code[j] then
ltable = output[l][2]
temp = table.getn(ltable)
output[l][2][temp+1] = key[2]
else
output[k][1] = reviewer_code[j]
output[k][2][1] = key[2]
end
end
end
end
end
end
return output
end
The code has been fixed here for future reference: http://codepad.org/3di3BOD2#output
You should be able to replace table.getn(t) with #t (it's deprecated in Lua 5.1 and removed in Lua 5.2); instead of this:
ltable = output[l][2]
temp = table.getn(ltable)
output[l][2][temp+1] = key[2]
try this:
output[l][2][#output[l][2]+1] = key[2]
or this:
table.insert(output[l][2], key[2])
Given this algorithm, I would like to know if there exists an iterative version. Also, I want to know if the iterative version can be faster.
This some kind of pseudo-python...
the algorithm returns a reference to root of the tree
make_tree(array a)
if len(a) == 0
return None;
node = pick a random point from the array
calculate distances of the point against the others
calculate median of such distances
node.left = make_tree(subset of the array, such that the distance of points is lower to the median of distances)
node.right = make_tree(subset, such the distance is greater or equal to the median)
return node
A recursive function with only one recursive call can usually be turned into a tail-recursive function without too much effort, and then it's trivial to convert it into an iterative function. The canonical example here is factorial:
# naïve recursion
def fac(n):
if n <= 1:
return 1
else:
return n * fac(n - 1)
# tail-recursive with accumulator
def fac(n):
def fac_helper(m, k):
if m <= 1:
return k
else:
return fac_helper(m - 1, m * k)
return fac_helper(n, 1)
# iterative with accumulator
def fac(n):
k = 1
while n > 1:
n, k = n - 1, n * k
return k
However, your case here involves two recursive calls, and unless you significantly rework your algorithm, you need to keep a stack. Managing your own stack may be a little faster than using Python's function call stack, but the added speed and depth will probably not be worth the complexity. The canonical example here would be the Fibonacci sequence:
# naïve recursion
def fib(n):
if n <= 1:
return 1
else:
return fib(n - 1) + fib(n - 2)
# tail-recursive with accumulator and stack
def fib(n):
def fib_helper(m, k, stack):
if m <= 1:
if stack:
m = stack.pop()
return fib_helper(m, k + 1, stack)
else:
return k + 1
else:
stack.append(m - 2)
return fib_helper(m - 1, k, stack)
return fib_helper(n, 0, [])
# iterative with accumulator and stack
def fib(n):
k, stack = 0, []
while 1:
if n <= 1:
k = k + 1
if stack:
n = stack.pop()
else:
break
else:
stack.append(n - 2)
n = n - 1
return k
Now, your case is a lot tougher than this: a simple accumulator will have difficulties expressing a partly-built tree with a pointer to where a subtree needs to be generated. You'll want a zipper -- not easy to implement in a not-really-functional language like Python.
Making an iterative version is simply a matter of using your own stack instead of the normal language call stack. I doubt the iterative version would be faster, as the normal call stack is optimized for this purpose.
The data you're getting is random so the tree can be an arbitrary binary tree. For this case, you can use a threaded binary tree, which can be traversed and built w/o recursion and no stack. The nodes have a flag that indicate if the link is a link to another node or how to get to the "next node".
From http://en.wikipedia.org/wiki/Threaded_binary_tree
Depending on how you define "iterative", there is another solution not mentioned by the previous answers. If "iterative" just means "not subject to a stack overflow exception" (but "allowed to use 'let rec'"), then in a language that supports tail calls, you can write a version using continuations (rather than an "explicit stack"). The F# code below illustrates this. It is similar to your original problem, in that it builds a BST out of an array. If the array is shuffled randomly, the tree is relatively balanced and the recursive version does not create too deep a stack. But turn off shuffling, and the tree gets unbalanced, and the recursive version stack-overflows whereas the iterative-with-continuations version continues along happily.
#light
open System
let printResults = false
let MAX = 20000
let shuffleIt = true
// handy helper function
let rng = new Random(0)
let shuffle (arr : array<'a>) = // '
let n = arr.Length
for x in 1..n do
let i = n-x
let j = rng.Next(i+1)
let tmp = arr.[i]
arr.[i] <- arr.[j]
arr.[j] <- tmp
// Same random array
let sampleArray = Array.init MAX (fun x -> x)
if shuffleIt then
shuffle sampleArray
if printResults then
printfn "Sample array is %A" sampleArray
// Tree type
type Tree =
| Node of int * Tree * Tree
| Leaf
// MakeTree1 is recursive
let rec MakeTree1 (arr : array<int>) lo hi = // [lo,hi)
if lo = hi then
Leaf
else
let pivot = arr.[lo]
// partition
let mutable storeIndex = lo + 1
for i in lo + 1 .. hi - 1 do
if arr.[i] < pivot then
let tmp = arr.[i]
arr.[i] <- arr.[storeIndex]
arr.[storeIndex] <- tmp
storeIndex <- storeIndex + 1
Node(pivot, MakeTree1 arr (lo+1) storeIndex, MakeTree1 arr storeIndex hi)
// MakeTree2 has all tail calls (uses continuations rather than a stack, see
// http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!171.entry
// for more explanation)
let MakeTree2 (arr : array<int>) lo hi = // [lo,hi)
let rec MakeTree2Helper (arr : array<int>) lo hi k =
if lo = hi then
k Leaf
else
let pivot = arr.[lo]
// partition
let storeIndex = ref(lo + 1)
for i in lo + 1 .. hi - 1 do
if arr.[i] < pivot then
let tmp = arr.[i]
arr.[i] <- arr.[!storeIndex]
arr.[!storeIndex] <- tmp
storeIndex := !storeIndex + 1
MakeTree2Helper arr (lo+1) !storeIndex (fun lacc ->
MakeTree2Helper arr !storeIndex hi (fun racc ->
k (Node(pivot,lacc,racc))))
MakeTree2Helper arr lo hi (fun x -> x)
// MakeTree2 never stack overflows
printfn "calling MakeTree2..."
let tree2 = MakeTree2 sampleArray 0 MAX
if printResults then
printfn "MakeTree2 yields"
printfn "%A" tree2
// MakeTree1 might stack overflow
printfn "calling MakeTree1..."
let tree1 = MakeTree1 sampleArray 0 MAX
if printResults then
printfn "MakeTree1 yields"
printfn "%A" tree1
printfn "Trees are equal: %A" (tree1 = tree2)
Yes it is possible to make any recursive algorithm iterative. Implicitly, when you create a recursive algorithm each call places the prior call onto the stack. What you want to do is make the implicit call stack into an explicit one. The iterative version won't necessarily be faster, but you won't have to worry about a stack overflow. (do I get a badge for using the name of the site in my answer?
While it is true in the general sense that directly converting a recursive algorithm into an iterative one will require an explicit stack, there is a specific sub-set of algorithms which render directly in iterative form (without the need for a stack). These renderings may not have the same performance guarantees (iterating over a functional list vs recursive deconstruction), but they do often exist.
Here is stack based iterative solution (Java):
public static Tree builtBSTFromSortedArray(int[] inputArray){
Stack toBeDone=new Stack("sub trees to be created under these nodes");
//initialize start and end
int start=0;
int end=inputArray.length-1;
//keep memoy of the position (in the array) of the previously created node
int previous_end=end;
int previous_start=start;
//Create the result tree
Node root=new Node(inputArray[(start+end)/2]);
Tree result=new Tree(root);
while(root!=null){
System.out.println("Current root="+root.data);
//calculate last middle (last node position using the last start and last end)
int last_mid=(previous_start+previous_end)/2;
//*********** add left node to the previously created node ***********
//calculate new start and new end positions
//end is the previous index position minus 1
end=last_mid-1;
//start will not change for left nodes generation
start=previous_start;
//check if the index exists in the array and add the left node
if (end>=start){
root.left=new Node(inputArray[((start+end)/2)]);
System.out.println("\tCurrent root.left="+root.left.data);
}
else
root.left=null;
//save previous_end value (to be used in right node creation)
int previous_end_bck=previous_end;
//update previous end
previous_end=end;
//*********** add right node to the previously created node ***********
//get the initial value (inside the current iteration) of previous end
end=previous_end_bck;
//start is the previous index position plus one
start=last_mid+1;
//check if the index exists in the array and add the right node
if (start<=end){
root.right=new Node(inputArray[((start+end)/2)]);
System.out.println("\tCurrent root.right="+root.right.data);
//save the created node and its index position (start & end) in the array to toBeDone stack
toBeDone.push(root.right);
toBeDone.push(new Node(start));
toBeDone.push(new Node(end));
}
//*********** update the value of root ***********
if (root.left!=null){
root=root.left;
}
else{
if (toBeDone.top!=null) previous_end=toBeDone.pop().data;
if (toBeDone.top!=null) previous_start=toBeDone.pop().data;
root=toBeDone.pop();
}
}
return result;
}