How to sum a specified column in an RDD

Here are two data files:
spark16/file1.txt
1,9,5
2,7,4
3,8,3
spark16/file2.txt
1,g,h
2,i,j
3,k,l
After joining, I have:
(1, ((9,5),(g,h)) )
(2, ((7,4),(i,j)) )
(3, ((8,3),(k,l)) )
I need to get the sum of (5,4,3) = 12.
I am stuck here:
val file1 = sc.textFile("data96/file1.txt").map(x=>(x.split(",")(0).toInt, (x.split(",")(1), x.split(",")(2).toInt)))
val file2 = sc.textFile("data96/file2.txt").map(x=>(x.split(",")(0).toInt, (x.split(",")(1), x.split(",")(2))))
val joined = file1.join(file2)
val sorted = joined.sortByKey()
val first = sorted.first
res4: (Int, ((String, Int), (String, String))) = (1,((9,5),(g,h)))
scala> joined.reduce(_._2._1._2 + _._2._1._2)
:34: error: type mismatch;
found : Int
required: (Int, ((String, Int), (String, String)))
joined.reduce(_._2._1._2 + _._2._1._2)
How can I get the sum of _._2._1._2?
Thank you very much.

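If this is what you have after the join:
(1, ((9,5),(g,h)) )
(2, ((7,4),(i,j)) )
(3, ((8,3),(k,l)) )
then select only the column you need and perform the reduce on it:
joined.map(_._2._1._2).reduce(_ + _)
This should give you the sum of 5, 4, 3, which is 12.
reduce must return the same type as the elements it is given. The joined RDD holds (Int, ((String, Int), (String, String))) tuples, so a function that returns an Int causes the type mismatch you saw.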
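To see why the map step matters, here is a minimal plain-Python sketch of the same two steps, with no Spark involved. The in-memory list `joined` is a hypothetical stand-in for the joined RDD, and `functools.reduce` plays the role of the RDD's reduce.

```python
from functools import reduce

# Hypothetical in-memory stand-in for the joined RDD:
# each record is (key, ((str, int), (str, str))).
joined = [
    (1, (("9", 5), ("g", "h"))),
    (2, (("7", 4), ("i", "j"))),
    (3, (("8", 3), ("k", "l"))),
]

# Step 1: project every record down to the numeric column
# (the Scala accessor _._2._1._2 corresponds to record[1][0][1]).
column = [record[1][0][1] for record in joined]

# Step 2: fold the plain integers. Inputs and result are all ints,
# so the reducing function has a consistent type.
total = reduce(lambda a, b: a + b, column)

print(column)  # [5, 4, 3]
print(total)   # 12
```

Reducing the nested tuples directly would need a function that takes two records and returns a record, which is not what a sum does.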
Hope this helps!

Related

filter max of N

Could it be possible to write in FFL a version of filter that stops filtering after the first negative match, i.e. the remaining items are assumed to be positive matches? More generally, a filter that stops after at most N negative matches.
Example:
removeMaxOf1([1,2,3,4], value>=2)
Expected Result:
[1,3,4]
This seems like something very difficult to write in a pure functional style. Maybe recursion or let could achieve it?
Note: the whole motivation for this question was hypothesizing about micro-optimizations, so performance is very relevant. I am also looking for something that is generally applicable to any data type, not just int.
I have recently added find_index to the engine which allows this to be done easily:
if(n = -1, list, list[:n] + list[n+1:])
where n = find_index(list, value<2)
where list = [1,2,3,4]
find_index returns the index of the first match, or -1 if no match is found. There is also find_index_or_die, which returns the index of the first match and asserts if none is found, for when you're absolutely certain there is an instance in the list.
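The find_index approach can be sketched in Python as follows. Both `find_index` and `remove_first` are hypothetical Python analogues written for illustration, not the engine's functions. The sketch assumes that when nothing matches, the list should come back unchanged.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def find_index(items: List[T], pred: Callable[[T], bool]) -> int:
    """Return the index of the first item satisfying pred, or -1."""
    for i, item in enumerate(items):
        if pred(item):
            return i
    return -1


def remove_first(items: List[T], pred: Callable[[T], bool]) -> List[T]:
    """Drop only the first item satisfying pred; keep everything else."""
    n = find_index(items, pred)
    if n == -1:
        return list(items)  # assumption: no match means nothing is removed
    return items[:n] + items[n + 1:]


print(remove_first([1, 2, 3, 4], lambda v: v >= 2))  # [1, 3, 4]
print(remove_first([1, 2, 3, 4], lambda v: v < 2))   # [2, 3, 4]
print(remove_first(["a", "b"], lambda s: s == "z"))  # ['a', 'b']
```

The scan stops at the first match and the rest of the list is copied by slicing, so this works for any element type.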
You could also implement something like this using recursion:
def filterMaxOf1(list ls, function(list)->bool pred, list result=[]) ->list
base ls = []: result
base not pred(ls[0]): result + ls[1:]
recursive: filterMaxOf1(ls[1:], pred, result + [ls[0]])
Of course recursion can! :D
filterMaxOf1(input, target)
  where filterMaxOf1 = def
    ([int] l, function f) -> [int]
      if(size(l) = 0,
        [],
        if(not f(l[0]),
          l[1:],
          flatten([
            l[0],
            recurse(l[1:], f)
          ])
        )
      )
  where input = [
    1, 2, 3, 4, ]
  where target = def
    (int i) -> bool
      i < 2
Some checks:
--> filterMaxOf1([1, ]) where filterMaxOf1 = [...]
[1]
--> filterMaxOf1([2, ]) where filterMaxOf1 = [...]
[]
--> filterMaxOf1([1, 2, ]) where filterMaxOf1 = [...]
[1]
--> filterMaxOf1([1, 2, 3, 4, ]) where filterMaxOf1 = [...]
[1, 3, 4]
This flavor loses some strong type safety, but is nearer to tail recursion:
filterMaxOf1(input, target)
  where filterMaxOf1 = def
    ([int] l, function f) -> [int]
      flatten(filterMaxOf1i(l, f))
  where filterMaxOf1i = def
    ([int] l, function f) -> [any]
      if(size(l) = 0,
        [],
        if(not f(l[0]),
          l[1:],
          [
            l[0],
            recurse(l[1:], f)
          ]
        )
      )
  where input = [
    1, 2, 3, 4, ]
  where target = def
    (int i) -> bool
      i < 2

SML: Look and Say Function

I'm having trouble with writing the look and say function recursively. It's supposed to take a list of integers and evaluate to a list of integers that "reads as spoken." For instance,
look_and_say([1, 2, 2]) = "one one two twos" = [1, 1, 2, 2]
and
look_and_say([2, 2, 2]) = "three twos" = [3, 2]
I'm having some difficulty figuring out how to add elements to the list (and keep track of that list) throughout my recursive calls.
Here's an auxiliary function I've written that should be useful:
fun helper(current : int, count : int, remainingList : int list) : int list =
if (current = hd remainingList) then
helper(current, count + 1, tl remainingList)
else
(* add count number of current to list *)
helper(hd remainingList, 1, tl remainingList);
And here's a rough outline for my main function:
fun look_and_say(x::y : int list) : int list =
if x = nil then
(* print *)
else
helper(x, 1, y);
Thoughts?
You seem to have the right idea, although it doesn't look as if your helper will ever terminate. Here's a way of implementing it without a helper.
fun look_and_say [] = []
  | look_and_say (x::xs) =
      case look_and_say xs of
          [] => [1,x]
        | a::b::L => if x=b then (a+1)::b::L
                     else 1::x::a::b::L
And here's a way of implementing it with your helper.
fun helper(current, count, remainingList) =
      if remainingList = [] then
        [count, current]
      else if current = hd remainingList then
        helper(current, count + 1, tl remainingList)
      else count::current::look_and_say(remainingList)
and look_and_say [] = []
  | look_and_say (x::y) = helper(x, 1, y)
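For comparison, the same run-counting idea can be sketched iteratively in Python. This is an illustrative translation, not part of the SML solution: it walks the list once, counts each run of equal values, and emits a (count, value) pair per run.

```python
from typing import List


def look_and_say(xs: List[int]) -> List[int]:
    """Encode each run of equal values as [count, value]."""
    out: List[int] = []
    i = 0
    while i < len(xs):
        current = xs[i]
        count = 1
        # Extend the run while the next element equals the current one.
        while i + count < len(xs) and xs[i + count] == current:
            count += 1
        out.extend([count, current])
        i += count  # jump to the start of the next run
    return out


print(look_and_say([1, 2, 2]))  # [1, 1, 2, 2]
print(look_and_say([2, 2, 2]))  # [3, 2]
```

The inner loop plays the role of the helper's `count + 1` recursion, and the outer loop corresponds to the call back into look_and_say when the run ends.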

Missing operand in for loop

I have this bubble sort procedure:
procedure Bubble_Sort (Data: in out List) is
sorted: Boolean := false;
last : Integer := Data'LAST;
temp : Integer;
begin
while (not (sorted)) loop
sorted := true;
for check in range Data'First..(last-1) loop
if Data(check) < Data(check+1) then
-- swap two elements
temp := Data(check);
Data(check) := Data(check+1);
Data(check+1) := temp;
-- wasn't already sorted after all
sorted := false;
end if;
end loop;
last := last - 1;
end loop;
end Bubble_sort;
I have defined 'Data' like this:
Unsorted : constant List := (10, 5, 3, 4, 1, 4, 6, 0, 11, -1);
Data : List(Unsorted'Range);
And the type definition of 'List' is:
type List is array (Index range <>) of Element;
On the line
for check in range Data'Range loop
I get a "missing operand" error. How can I solve this problem?
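Remove the range keyword:
for check in Data'Range loop
The range keyword is used to define ranges and subtypes (sometimes anonymous), which is not needed when you use the 'Range attribute. The same applies to explicit bounds: the loop in your procedure becomes for check in Data'First..(last-1) loop.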

Lookup function of Hash Table in SML

I was given an assignment to write a lookup function for this hashtable datatype in SML:
datatype 'a ht = table of (int * ('a list)) list;
which returns nil if the table is empty and/or the key doesn't exist in the table.
The function is supposed to have this type:
val lookup = fn : int -> 'a ht -> 'a list
but I don't know how to look in each bucket of a hashtable or how to display the value of the bucket for the key. I would appreciate some help on what kind of algorithm to use.
For example, the function should work like this:
-lookup 3 (table [(1, [2,3]), (2, [3,4,5]), (3, [4])]);
val it = [4] : int list
-lookup 4 (table [(1, [2,3]), (2, [3,4,5]), (3, [4])]);
val it = [] : int list
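Your hash table is a list of (key, bucket) pairs wrapped in the table constructor.
Because table is a constructor, it cannot also be used as a parameter name, so pattern match on it in the clauses of lookup instead:
fun lookup n (table []) = nil
  | lookup 1 (table ((_, bucket)::_)) = bucket
  | lookup n (table (_::rest)) = lookup (n-1) (table rest)
First, the third clause travels down the list of pairs towards the nth position, decrementing n on each step.
Second, when the first argument of lookup has been reduced to 1 in the recursion, the second clause gets the value (#2 of the pair) by pattern matching and returns it. This assumes your keys are ranked in order starting from 1, like in your examples.
If the list runs out first, the first clause returns nil. If the keys are not guaranteed to be in order, bind the key in the pattern and compare it with n instead of counting down.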
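As a language-neutral illustration, here is a short Python sketch of a lookup that compares keys instead of relying on their position. It models the table as a plain list of (key, bucket) pairs, and the names are hypothetical.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def lookup(key: int, table: List[Tuple[int, List[T]]]) -> List[T]:
    """Return the bucket stored under key, or an empty list if absent."""
    for k, bucket in table:
        if k == key:
            return bucket
    return []  # empty table or missing key


table = [(1, [2, 3]), (2, [3, 4, 5]), (3, [4])]
print(lookup(3, table))  # [4]
print(lookup(4, table))  # []
```

Comparing keys keeps the function correct when the keys are sparse or unsorted, at the cost of a linear scan over the pairs.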

Length of nested array lua

I am having trouble figuring out how to get the length of a matrix within a matrix within a matrix (nested depth of 3). In short, the code checks whether the publisher is already in the array. Then it either adds a new column to the array with the new publisher and the corresponding system, or adds the new system to the existing publisher's array.
output[k][1] is the publisher array
output[k][2][l] is the system
where the first [] is the amount of different publishers
and the second [] is the amount of different systems within the same publisher
So how would I find out what the length of the third deep array is?
function reviewPubCount()
local output = {}
local k = 0
for i = 1, #keys do
if string.find(tostring(keys[i]), '_') then
key = Split(tostring(keys[i]), '_')
for j = 1, #reviewer_code do
if key[1] == reviewer_code[j] and key[1] ~= '' then
k = k + 1
output[k] = {}
-- output[k] = reviewer_code[j]
for l = 1, k do
if output[l][1] == reviewer_code[j] then
ltable = output[l][2]
temp = table.getn(ltable)
output[l][2][temp+1] = key[2]
else
output[k][1] = reviewer_code[j]
output[k][2][1] = key[2]
end
end
end
end
end
end
return output
end
The code has been fixed here for future reference: http://codepad.org/3di3BOD2#output
You should be able to replace table.getn(t) with #t (table.getn is deprecated in Lua 5.1 and removed in Lua 5.2); instead of this:
ltable = output[l][2]
temp = table.getn(ltable)
output[l][2][temp+1] = key[2]
try this:
output[l][2][#output[l][2]+1] = key[2]
or this:
table.insert(output[l][2], key[2])
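The intended grouping can also be sketched in Python, where the length of the innermost list is just `len` and appending replaces the `#t + 1` index. The function `add_system` and the sample publisher and system names are hypothetical, and Python lists are 0-based, so Lua's output[k][2] corresponds to output[k][1] here.

```python
from typing import List


def add_system(output: List[list], publisher: str, system: str) -> None:
    """Append system to the publisher's entry, creating the entry if needed."""
    for entry in output:
        if entry[0] == publisher:
            entry[1].append(system)  # Lua: t[#t + 1] = v or table.insert(t, v)
            return
    # Publisher not seen yet: start a new [publisher, [systems]] entry.
    output.append([publisher, [system]])


output: List[list] = []
add_system(output, "pubA", "sys1")
add_system(output, "pubA", "sys2")
add_system(output, "pubB", "sys1")

print(len(output))        # number of publishers: 2
print(len(output[0][1]))  # systems under the first publisher: 2
```

Creating the inner list at the moment the publisher entry is created ensures there is always a list to append to.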

Resources