Scala: defining a map via a list of integers - r

Coming to Scala from R (complete newbie).
Say I have two lists:
val index = List(0,0,1,0,1)
val a = List(1,2,3,4,5)
Then index can define a natural map from a to b: List[List[Int]]:
1 -> b(0)
2 -> b(0)
3 -> b(1)
4 -> b(0)
5 -> b(1)
So that:
b = List(List(1,2,4), List(3,5))
In general, given index and a, what is the most natural way to create b in Scala?
For example, in R I could write:
run_map <- function(index, a) {
    map <- function(data, i) {
        data[[i]] <- a[which(index == i)]
        data
    }
    Reduce(map, unique(index), list())
}
Cheers for any help!

If you want one list to contain the index and the other list to contain the values, you would zip them together:
val myMap = (a zip b).toMap
where a contains the index and b the values (index and a, respectively, in the question). Note that toMap keeps only the last value for each key.
However, if there are duplicate indexes, you would want to do the following:
for {
    (i, list) <- (a zip b) groupBy (_._1)
} yield (i, list map (_._2))
Here you zip the two lists, group the pairs by index, and then map over each group to keep only the values. The final mapping is needed because each grouped list still contains (index, value) tuples.

0) Given:
val index = List(0,0,1,0,1)
val a = List(1,2,3,4,5)
Here is an "iterative" way to arrive at the solution:
1) "Zip" 2 lists together to produce a list of pairs:
index.zip(a)
result:
List[(Int, Int)] = List((0,1), (0,2), (1,3), (0,4), (1,5))
2) Group by the first element of each pair in the list.
index.zip(a).groupBy(_._1)
result:
Map[Int,List[(Int, Int)]] = Map(1 -> List((1,3), (1,5)), 0 -> List((0,1), (0,2), (0,4)))
3) Remove the redundant index by projecting only the second element from each pair in the list v:
index.zip(a).groupBy(_._1).map{ case (i, v) => (i, v.map(_._2)) }
result:
Map[Int,List[Int]] = Map(1 -> List(3, 5), 0 -> List(1, 2, 4))
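For readers comparing languages, the same three steps (zip, group by index, keep only the values) can be sketched in Python. This is only an illustrative translation and not part of the original answers; the helper name group_by_index is hypothetical.

```python
from collections import defaultdict

def group_by_index(index, values):
    """Group values by the parallel index list (hypothetical helper)."""
    groups = defaultdict(list)
    # Step 1: zip pairs each index with its value.
    for i, v in zip(index, values):
        # Steps 2 and 3: group by the index and keep only the value.
        groups[i].append(v)
    return dict(groups)

index = [0, 0, 1, 0, 1]
a = [1, 2, 3, 4, 5]
grouped = group_by_index(index, a)
# To get the list-of-lists form from the question, order the groups by key.
b = [grouped[k] for k in sorted(grouped)]
```

Ordering the groups by key is an assumption about what b should look like; a map result, as in the Scala answers, carries no such order.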

Related

How to build the dictionary from two other dictionaries by some condition on their values

I am new to functional programming and cannot imagine how to build a new dictionary based on two other dictionaries with similar sets of keys. The new dictionary will have entries for all keys, but the values will be selected or computed based on some condition.
For example, having two dictionaries:
D1: [(1,100);(2,50);(3,150)]
D2: [(1,20);(2,30);(3,0);(4,10)]
and the condition being to take the average of the two values, the resulting dictionary will be
DR: [(1,60);(2,40);(3,75);(4,10)]
I need an implementation in F#.
Please could you give me some advice.
Viewing them as two (or more) lists of tuples that we concatenate makes it easier. The code below solves your specific problem. To generalise the process of aggregating a list of values into something specific, you would need to change averageBy to fold and provide a fold function instead of float. This assumes d1 and d2 match your example.
Seq.concat [ d1 ; d2 ]
|> Seq.map (|KeyValue|)
|> Seq.groupBy fst
|> Seq.map (fun (k, c) -> k, Seq.averageBy (snd >> float) c |> int)
|> dict
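The idea of generalising from an average to an arbitrary aggregation can be sketched as follows, in Python for comparison. This is a rough sketch under assumptions: the names merge_by_key and int_average are made up for illustration, and plain dicts stand in for the F# dictionaries.

```python
from collections import defaultdict

def merge_by_key(dicts, aggregate):
    """Collect the values for each key across all dicts, then aggregate them."""
    grouped = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            grouped[key].append(value)
    return {key: aggregate(values) for key, values in grouped.items()}

def int_average(values):
    # Average as a float, then convert back to int, as in the pipeline above.
    return int(sum(values) / len(values))

d1 = {1: 100, 2: 50, 3: 150}
d2 = {1: 20, 2: 30, 3: 0, 4: 10}

averaged = merge_by_key([d1, d2], int_average)
totals = merge_by_key([d1, d2], sum)  # swapping in a different aggregation
```

Only the aggregate argument changes between the two calls, which is the point of factoring it out.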
If you wanted to use an external library, you could do this using Deedle series, which has various operations for working with (time) series of data.
Here, you have two data series that have different keys. Deedle lets you zip series based on keys and handle the cases where one of the values is missing using the opt type:
#r "nuget:Deedle"
open Deedle
let s1 = series [(1,100);(2,50);(3,150)]
let s2 = series [(1,20);(2,30);(3,0);(4,10)]
Series.zip s1 s2
|> Series.mapValues (fun (v1, v2) ->
    ( (OptionalValue.defaultArg 0 v1) +
      (OptionalValue.defaultArg 0 v2) ) / 2)
This may not make sense if this is a thing that you need just in one or two places, but if you're working with key-value series of data more generally, it may be worth checking out.
Solution 1
From a functional perspective I would use a Map data structure instead of a dictionary. You can convert a dictionary to a Map like this:
let d1 = dict [(1,100);(2,50);(3,150)]
let m1 = Map [for KeyValue (key,value) in d1 -> key, value]
But I wouldn't use a Dictionary and convert it; I would use a Map directly.
let m1 = Map [(1,100);(2,50);(3,150)]
let m2 = Map [(1,20);(2,30);(3,0);(4,10)]
Next, you need a way to get all keys from both Maps. You can get the keys of a map with Map.keys but you need all the keys from both. You could get them by using a Set.
let keys = Set (Map.keys m1) + Set (Map.keys m2)
Adding two Sets gives you the Set.union of both. Once you have the keys, you can traverse them and try to get the value for each key from both maps. If you use Map.tryFind, you get an option. You can pattern match on both results at once.
let result = Map [
    for key in keys do
        match Map.tryFind key m1, Map.tryFind key m2 with
        | Some x, Some y -> key, (x + y) / 2
        | Some x, None   -> key, x
        | None  , Some y -> key, y
        | None  , None   -> failwith "Cannot happen"
]
This creates a new Map and binds it to result. If both lookups return Some, you compute the average; otherwise you just keep the one value that exists. Because you iterate over the keys of both Maps, the None, None case cannot happen: a key must always be in at least one of them.
After all of this, result will be:
Map [(1, 60); (2, 40); (3, 75); (4, 10)]
Again, here is the whole code at once:
let m1 = Map [(1,100);(2,50);(3,150)]
let m2 = Map [(1,20);(2,30);(3,0);(4,10)]
let keys = Set (Map.keys m1) + Set (Map.keys m2)
let result = Map [
    for key in keys do
        match Map.tryFind key m1, Map.tryFind key m2 with
        | Some x, Some y -> key, (x + y) / 2
        | Some x, None   -> key, x
        | None  , Some y -> key, y
        | None  , None   -> failwith "Cannot happen"
]
You can also inline the keys variable if you want.
Solution 2
When you have a Map, you can make use of the fact that adding a value to a Map always creates a new Map data structure. This lets you use Map.fold, which traverses one Map while using the other Map as the starting state.
With Map.change you can then read and change a value in one step. If a key is already present, you calculate the average; otherwise you just add the value.
let m1 = Map [(1,100);(2,50);(3,150)]
let m2 = Map [(1,20);(2,30);(3,0);(4,10)]
let result =
    (m1,m2) ||> Map.fold (fun state key y ->
        state |> Map.change key (function
            | Some x -> Some ((x + y) / 2)
            | None   -> Some y
        )
    )
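The fold idea (start from one map, walk the other, and build a new map at every step) can be sketched in Python with functools.reduce. This is a hedged illustration only: average_into is a hypothetical name, plain dicts stand in for F# Maps, and copying the dict at each step imitates immutability rather than being efficient.

```python
from functools import reduce

def average_into(state, item):
    """Fold step: return a new dict with item's value merged into state."""
    key, y = item
    merged = dict(state)  # copy, so the incoming state is never mutated
    # Average when the key already exists, otherwise just add the value.
    # Floor division matches F# integer division for non-negative values.
    merged[key] = (state[key] + y) // 2 if key in state else y
    return merged

m1 = {1: 100, 2: 50, 3: 150}
m2 = {1: 20, 2: 30, 3: 0, 4: 10}
result = reduce(average_into, m2.items(), m1)
```

Note that averaging pairwise like this is only a true average for two maps; folding a third map in would average against the previous average.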
Bonus: Adding Functions to Modules
It's sometimes sad that F# has so few functions on Map. But if you need them a lot, you can always add a union function to the module yourself. For example:
module Map =
    let union f map1 map2 =
        let keys = Set (Map.keys map1) + Set (Map.keys map2)
        Map [
            for key in keys do
                match Map.tryFind key map1, Map.tryFind key map2 with
                | Some x, Some y -> key, (f x y)
                | Some x, None   -> key, x
                | None  , Some y -> key, y
                | None  , None   -> failwith "Cannot happen"
        ]
let m1 = Map [(1,100);(2,50);(3,150)]
let m2 = Map [(1,20);(2,30);(3,0);(4,10)]
This way you get a Map.union, and you can specify a lambda function that is executed if a key is present in both maps; otherwise the value is used unchanged.
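A union that takes a combining function might be used like the Python sketch below. The name union_with and the use of plain dicts are assumptions made for illustration; sorting the keys only mimics the ordered F# Map and requires comparable keys.

```python
def union_with(f, map1, map2):
    """Union of two dicts; f combines the values when a key is in both."""
    result = {}
    for key in sorted(map1.keys() | map2.keys()):
        if key in map1 and key in map2:
            result[key] = f(map1[key], map2[key])
        elif key in map1:
            result[key] = map1[key]
        else:
            result[key] = map2[key]
    return result

m1 = {1: 100, 2: 50, 3: 150}
m2 = {1: 20, 2: 30, 3: 0, 4: 10}
averaged = union_with(lambda x, y: (x + y) // 2, m1, m2)
largest = union_with(max, m1, m2)
```

Passing a different function, such as max, reuses the same traversal for a different merge rule.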
There have been a couple of useful suggestions:
Group by keys with standard library functions from the Seq module, by user1981
Use a specialized library for dealing with data series, by Tomas Petricek
Use a map instead (a functional data structure based on comparison), by David Raab
To this I'd like to add
An imperative way, filling a combined dictionary by iterating through the keys of the source data structures, and finally
A query expression
An imperative way
The average calculation is hard-coded with the type int. You can still have generic keys, as their type does not figure in the function, except for the equality constraint required for dictionary keys. You could make the function generic for values too, by marking it inline, but that won't be a pretty sight as it will introduce a host of other constraints onto the type of values.
open System.Collections.Generic
let unionAverage (d1 : IDictionary<_,_>) (d2 : IDictionary<_,_>) =
    let d = Dictionary<_,_>()
    for k in Seq.append d1.Keys d2.Keys |> Seq.distinct do
        match d1.TryGetValue k, d2.TryGetValue k with
        | (true, v1), (true, v2) -> d.Add(k, (v1 + v2) / 2)
        | (true, v), _ | _, (true, v) -> d.Add(k, v)
        | _ -> failwith "Key not found"
    d
let d1 = dict[1, 100; 2, 50; 3, 150]
let d2 = dict[1, 20; 2, 30; 3, 0; 4, 10]
unionAverage d1 d2
A query expression
It operates on the same principle as the answer from user1981, but for re-usability the average function has been factored out. It expects an arbitrary number of #seq<KeyValuePair<_,_>> elements, which is just another way to represent dictionaries that are accessed through their enumerators.
As the query expression uses System.Linq.IGrouping under the hood, this is upcast to a regular sequence to reduce confusion. Then there's the conversion to float for Seq.average to operate on, because the type int does not have the required member DivideByInt.
module Dict =
    let unionByMany f src =
        query{
            for KeyValue(k, v) in Seq.concat src do
            groupValBy v k into group
            select (group.Key, f (group :> seq<_>)) }
        |> dict
Dict.unionByMany (Seq.averageBy float >> int) [d1; d2]
Dict.unionByMany Seq.sum [d1; d2]
Dict.unionByMany Seq.min [d1; d2]

In F# define zip function for two lists

Having trouble with a problem:
Define a function called zip that takes a pair (tuple) of equal length lists as a single parameter and returns a list of pairs. The first pair should contain the first element of each list, the second pair contains the second element of each list, and so on.
I have been stuck and am looking for advice on if I'm headed in the right direction or should try another approach.
It needs to be a single function definition without any nested functions, and it cannot use built-in functions!
What I have done is:
let rec zip (a , b) =
    if List.length a = 1 then List.head a , List.head b
    else zip (List.tail a , List.tail b)
when
> zip (["a"; "b"; "c"; "d"; "e"], [1; 2; 3; 4; 5]);;
is entered
val it : string * int = ("e", 5)
is returned.
The expected result should be
val it : (string * int) list = [("a", 1); ("b", 2); ("c", 3); ("d", 4); ("e", 5)]
Let's start with your original implementation:
let rec zip (a , b) =
    if List.length a = 1 then List.head a , List.head b
    else zip (List.tail a , List.tail b)
First of all, the type is wrong - this returns a tuple of values, not a list of tuples. What this does is that it iterates over the list (following the tails using List.tail) and when it reaches the end, it returns the only element of each of the lists, which is "e" and 5.
The first step to fixing this could be to add type annotations. This will force you to return a list in the then branch. If you have two singleton lists ["e"] and [5], you want to return ["e", 5]:
let rec zip (a:'a list , b:'b list) : list<'a * 'b> =
    if List.length a = 1 then [List.head a , List.head b]
    else zip (List.tail a , List.tail b)
This is still not right - in the else case, you are just looking at the tails, but you are ignoring the heads. You need to access the head and concatenate it to the list returned from your recursive call:
let rec zip (a:'a list , b:'b list) : list<'a * 'b> =
    if List.length a = 1 then [List.head a , List.head b]
    else (List.head a, List.head b) :: zip (List.tail a , List.tail b)
This works, but using if .. then .. else in this case is inelegant. The answer from Filipe shows how to do this better with pattern matching.
let rec zip (a, b) =
    match (a, b) with
    | ha :: ta, hb :: tb -> (ha, hb) :: zip (ta, tb)
    | _, _ -> []
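The same recursion (pair up the heads, recurse on the tails, stop when either list runs out) can be sketched in Python. This is just an illustration: zip_pairs is a hypothetical name, and slicing copies the lists at each step, so it is not how one would zip large lists in practice.

```python
def zip_pairs(lists):
    """Zip a pair of lists into a list of pairs, recursively."""
    a, b = lists
    # Base case: either list is exhausted.
    if not a or not b:
        return []
    # Pair the heads, then recurse on the tails.
    return [(a[0], b[0])] + zip_pairs((a[1:], b[1:]))

pairs = zip_pairs((["a", "b", "c", "d", "e"], [1, 2, 3, 4, 5]))
```

Unlike the if/then/else version with List.length a = 1, this base case also handles empty input.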

Transform List (Generator a) into Generator (List a)

Here is a simplified version of my problem: Generate a list of random values in which each consecutive value depends on the previous one.
For example, generate a list of random Int values in which each value establishes the minimum for the next step. Let's assume that the starting value is 0 and the maximum value is always currentValue + 5:
First step: Random.int 0 5 => 3
Next: Random.int 3 8 => 4
Next: Random.int 4 9 => 8
etc.
Here is my approach:
intGen : Int -> List (Rnd.Generator Int) -> List (Rnd.Generator Int)
intGen value list =
    if length list == 10 then
        list
    else
        let newValue = Rnd.int value (value + 5)
            newList = newValue :: list
        in intGen newValue newList
Let's transform it into Rnd.Generator (List Int):
listToGen : List (Rnd.Generator Int) -> Rnd.Generator (List Int)
listToGen list =
    foldr
        (Rnd.map2 (::))
        (Rnd.list 0 (Rnd.int 0 1))
        list
I don't like this part: (Rnd.list 0 (Rnd.int 0 1)). It generates the initial value of type Rnd.Generator (List Int), in which (Rnd.int 0 1) is never actually used but is needed for type checking. I would like to skip this part somehow or replace it with something more generic. Is this possible, or is my implementation wrong?
Here is one solution which uses andThen and map. The first parameter is the number of elements you want in the list. The second parameter is the starting value.
intGen : Int -> Int -> Rnd.Generator (List Int)
intGen num value =
    if num <= 0 then
        constant []
    else
        Rnd.int value (value + 5)
            |> Rnd.andThen (\i -> intGen (num-1) i
                |> Rnd.map (\rest -> i :: rest))
To match your example of a list of size 10 starting with 0 as the first low value, you would call this as intGen 10 0.
constant is a generator from elm-community/random-extra, or it can be defined simply like this (because it isn't exposed in the core Elm codebase):
constant : a -> Rnd.Generator a
constant a = Rnd.map (\_ -> a) (Rnd.int 0 1)
Regarding your example, I don't think you would want to use List (Rnd.Generator Int), because that implies a list of generators that aren't tied together in any way. That's why we need to use andThen to pull out the random value just generated, call intGen recursively with the count reduced by one, and then use map to put the list together.
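To make the role of andThen and map more concrete, here is a rough Python model of the same chaining. It is a sketch under a simplifying assumption: a "generator" is modeled as a plain function from a random.Random instance to a value, which is not how Elm implements generators. All names here (constant, rand_int, and_then, map_gen, int_gen) are illustrative.

```python
import random

# A "generator" is modeled as a function: random.Random -> value.

def constant(value):
    # Always produces the same value, ignoring the random source.
    return lambda rng: value

def rand_int(low, high):
    # Inclusive on both ends.
    return lambda rng: rng.randint(low, high)

def and_then(f, gen):
    # Run gen, then feed its result to f to choose the next generator.
    return lambda rng: f(gen(rng))(rng)

def map_gen(f, gen):
    # Transform the value a generator produces.
    return lambda rng: f(gen(rng))

def int_gen(num, value):
    if num <= 0:
        return constant([])
    return and_then(
        # i is the value just generated; it becomes the next minimum.
        lambda i: map_gen(lambda rest: [i] + rest, int_gen(num - 1, i)),
        rand_int(value, value + 5),
    )

values = int_gen(10, 0)(random.Random(42))
```

Each element of values is at least the previous one and at most the previous one plus 5, because the generated value is threaded into the recursive call.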

F# Split Function

I'm building a merge sort function and my split method is giving me a value restriction error. I'm using 2 accumulating parameters, the 2 lists resulting from the split, that I package into a tuple in the end for the return. However I'm getting a value restriction error and I can't figure out what the problem is. Does anyone have any ideas?
let split lst =
    let a = []
    let b = []
    let ctr = 0
    let rec helper (lst,l1,l2,ctr) =
        match lst with
        | [] -> []
        | x::xs -> if ctr%2 = 0 then helper(xs, x::l1, l2, ctr+1)
                   else
                       helper(xs, l1, x::l2, ctr+1)
    helper (lst, a, b, ctr)
    (a,b)
Any input is appreciated.
The code, as you have written it, doesn't really make sense. F# uses immutable values by default, therefore your function, as it's currently written, can be simplified to this:
let split lst =
    let a = []
    let b = []
    (a,b)
This is probably not what you want. In fact, due to immutable bindings, there is no value in predeclaring a, b and ctr.
Here is a recursive function that will do the trick:
let split lst =
    let rec helper lst l1 l2 ctr =
        match lst with
        | [] -> l1, l2 // return accumulated lists
        | x::xs ->
            if ctr%2 = 0 then
                helper xs (x::l1) l2 (ctr+1) // prepend x to list 1 and increment
            else
                helper xs l1 (x::l2) (ctr+1) // prepend x to list 2 and increment
    helper lst [] [] 0
Instead of using a recursive function, you could also solve this problem using List.fold. fold is a higher-order function that generalises the accumulation process we described explicitly in the recursive function above.
This approach is a bit more concise but very likely less familiar to someone new to functional programming, so I've tried to describe this process in more detail.
let split2 lst =
    /// Take a running total of each list and an index*value and return a new
    /// pair of lists with the supplied value prepended to the correct list
    let splitFolder (l1, l2) (i, x) =
        match i % 2 = 0 with
        | true -> x :: l1, l2 // return list 1 with x prepended and list 2
        | false -> l1, x :: l2 // return list 1 and list 2 with x prepended
    lst
    |> List.mapi (fun i x -> i, x) // map list of values to list of index*values
    |> List.fold (splitFolder) ([],[]) // fold over the list using the splitFolder function
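For comparison, the fold-based split can be sketched in Python with functools.reduce and enumerate. The names split and folder are illustrative only. As in the versions above, elements are prepended, so each resulting list comes out in reverse order.

```python
from functools import reduce

def split(lst):
    def folder(acc, indexed):
        l1, l2 = acc
        i, x = indexed
        # Even positions go to the first list, odd positions to the second.
        return ([x] + l1, l2) if i % 2 == 0 else (l1, [x] + l2)
    # enumerate plays the role of List.mapi; reduce plays the role of List.fold.
    return reduce(folder, enumerate(lst), ([], []))

halves = split([1, 2, 3, 4, 5])
```

For a merge sort, the reversed order is harmless, since both halves get sorted afterwards anyway.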

Easier way to generate a truth table

I want to create a list of lists in SML, which represents a truth table of the following form:
Example:
[
[("r",true),("p",true),("q",true)],
[("r",false),("p",false),("q",true)],
[("r",false),("p",true),("q",true)],
...
]
I think I could achieve this in two ways:
(1) with the cartesian product
(2) converting a truth table row index to binary, which would represent an encoded line in the list (e.g. 4 (decimal) is 100 (binary) => [("r",true),("p",false),("q",false)]), but I think this is too complicated and there is probably an easier way.
What would be the easiest way to go about this?
fun tt [] = [[]]
  | tt (x :: xs) =
    let
        val txs = tt xs
    in
        map (fn l => (x, true) :: l) txs @
        map (fn l => (x, false) :: l) txs
    end
- tt ["a", "b", "c"];
val it =
[[("a",true),("b",true),("c",true)],[("a",true),("b",true),("c",false)],
[("a",true),("b",false),("c",true)],[("a",true),("b",false),("c",false)],
[("a",false),("b",true),("c",true)],[("a",false),("b",true),("c",false)],
[("a",false),("b",false),("c",true)],[("a",false),("b",false),("c",false)]]
: (string * bool) list list
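The recursion above (build the table for the remaining names, then prefix every row once with true and once with false) can be sketched in Python. This is an illustrative translation with a hypothetical function name; it should produce rows in the same order as itertools.product over (True, False).

```python
from itertools import product

def truth_table(names):
    """All assignments of True/False to names, as lists of (name, value)."""
    if not names:
        return [[]]  # one empty row: the only assignment of zero variables
    first, rest = names[0], names[1:]
    tails = truth_table(rest)
    # Prefix each row of the smaller table with (first, True), then (first, False).
    return ([[(first, True)] + row for row in tails]
            + [[(first, False)] + row for row in tails])

table = truth_table(["a", "b", "c"])

# The Cartesian-product formulation mentioned in the question, for comparison.
via_product = [list(zip(["a", "b", "c"], row))
               for row in product([True, False], repeat=3)]
```

Both formulations give 2^n rows for n names, so either is fine for small tables.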

Resources