[Scala/Scalding]: map ID to name - dictionary

I am fairly new to Scalding and I am trying to write a scalding program that takes as input 2 datasets:
1) book_id_title: ('id,'title): contains the mapping between book ID and book title, Both are strings.
2) book_sim: ('id1, 'id2, 'sim): contains the similarity between pairs of books, identified by their IDs.
The goal of the scalding program is to replace each (id1, id2) in book_ratings with their respective titles by looking up the book_id_title table. However, I am not able to retrieve the title. I would appreciate it if someone could help with the getTitle() function below.
My scalding code is as follows:
// read in the mapping between book id and title from a csv file
val book_id_title =
Csv(book_file, fields=book_format)
.read
.project('id,'title)
// read in the similarity data from a csv file and map the ids to the titles
// by calling getTitle function
val result =
book_sim
.map(('id1, 'id2)->('title1, 'title2)) {
pair:(String,String)=> (getTitle(pair._1), getTitle(pair._2))
}
.write(out)
// function that searches for the id and retrieves the title
def getTitle(search_id: String) = {
val btitle =
book_id_title
.filter('id){id:String => id == search_id} // extract row matching the id
.project('title) // get the title
}
thanks

Hadoop is a batch processing system and there is no way to lookup data by index. Instead, you need to join book_id_title and book_sim by id, probably two times: for left and right ids. Something like:
book_sim.joinWithSmaller('id1->id, book_id_title).joinWithSmaller('id2->id, book_id_title)
I am not very familiar with the field-based API so consider the above as a pseudocode. You also need to add appropriate projections. Hopefully, it still gives you an idea.

Related

Create a perl hash from a db select

Having some trouble understanding how to create a Perl hash from a DB select statement.
$sth=$dbh->prepare(qq{select authorid,titleid,title,pubyear from books});
$sth->execute() or die DBI->errstr;
while(#records=$sth->fetchrow_array()) {
%Books = (%Books,AuthorID=> $records[0]);
%Books = (%Books,TitleID=> $records[1]);
%Books = (%Books,Title=> $records[2]);
%Books = (%Books,PubYear=> $records[3]);
print qq{$records[0]\n}
print qq{\t$records[1]\n};
print qq{\t$records[2]\n};
print qq{\t$records[3]\n};
}
$sth->finish();
while(($key,$value) = each(%Books)) {
print qq{$key --> $value\n};
}
The print statements work in the first while loop, but I only get the last result in the second key,value loop.
What am I doing wrong here. I'm sure it's something simple. Many thanks.
OP needs better specify the question and do some reading on DBI module.
DBI module has a call for fetchall_hashref perhaps OP could put it to some use.
In the shown code an assignment of a record to a hash with the same keys overwrites the previous one, row after row, and the last one remains. Instead, they should be accumulated in a suitable data structure.
Since there are a fair number of rows (351 we are told) one option is a top-level array, with hashrefs for each book
my #all_books;
while (my #records = $sth->fetchrow_array()) {
my %book;
#book{qw(AuthorID TitleID Title PubYear)} = #records;
push #all_books, \%book;
}
Now we have an array of books, each indexed by the four parameters.
This uses a hash slice to assign multiple key-value pairs to a hash.
Another option is a top-level hash with keys for the four book-related parameters, each having for a value an arrayref with entries from all records
my %books;
while (my #records = $sth->fetchrow_array()) {
push #{$books{AuthorID}}, $records[0];
push #{$books{TitleID}}, $records[1];
...
}
Now one can go through authors/titles/etc, and readily recover the other parameters for each.
Adding some checks is always a good idea when reading from a database.

Merging Value of one dictionary as key of another

I have 2 Dictionaries:
StatePops={'AL':4887871, 'AK':737438, 'AZ':7278717, 'AR':3013825}
StateNames={'AL':'Alabama', 'AK':'Alaska', 'AZ':'Arizona', 'AR':'Arkansas'}
I am trying to merge so the Value of StateNames is the Key for StatePops.
Ex.
{'Alabama': 4887871, 'Alaska': 737438, ...
I also have to display the name of states w/ population over 4million.
Any help is appreciated!!!
You have not specified in what programming language you want this problem to be solved.
Nonetheless, here is a solution in Python.
state_pops = {
'AL': 4887871,
'AK': 737438,
'AZ':7278717,
'AR':3013825
}
state_names = {
'AL':'Alabama',
'AK':'Alaska',
'AZ':'Arizona',
'AR':'Arkansas'
}
states = dict([([state_names[k],state_pops[k]]) for k in state_pops])
final = {k:v for k, v in states.items() if v > 4000000}
print(states)
print(final)
First, you can merge two dictionaries with the predefined dict python function in the states variable as such. Here, k is an iterator and it is used as index for state_names and state_pops.
Then, store the filtered dictionary in final where the states.items() is used to access the keys and values in states and type-cast it as a string with the str function.
There may be more simpler solutions but this is as far as I can optimize the problem.
Hope this helps.
Dictionary Keys cannot be changed in Python. You need to either add the modified key-value pair to the dictionary and then delete the old key, or you can create a new dictionary. I'd opt for the second option, i.e., creating a new dictionary.
myDict = {}
for i in StatePops:
myDict.update({StateNames[i] : StatePops[i]})
This outputs myDict as
{'Alabama': 4887871, 'Alaska': 737438, 'Arizona': 7278717, 'Arkansas': 3013825}

Query is unable to match parts after "/" or parts within "()" in the data

I have a search request written as
import sqlite3
conn = sqlite3.connect('locker_data.db')
c = conn.cursor()
def search1(teacher):
test = 'SELECT Name FROM locker_data WHERE Name or Email LIKE "%{0}%"'.format(teacher)
data1 = c.execute(test)
return data1
def display1(data1):
Display1 = []
for Name in data1:
temp1 = str(Name[0])
Display1.append("Name: {0}".format(temp1))
return Display1
def locker_searcher(teacher):
data = display1(search1(teacher))
return data
This allows me to search for the row containing "Mr FishyPower (Mr Swag)" or "Mr FishyPower / Mr Swag" with a search input of "FishyPower". However, when I try searching with an input of "Swag", I am then unable to find the same row.
In the search below, it should have given me the same search results.
The database is just a simple 1x1 sqlite3 database containing 'FishyPower / Mr Swag'
Search Error on 'Swag'
Edit: I technically did solve it by limiting the columns being searched to only 'Name' but I intended the code search both the 'Name' and 'Email' columns and output the results as long as the search in within either or both columns.
Edit2: SELECT Name FROM locker_data WHERE Email LIKE "%{0}%" or Name LIKE "%{0}%" was the right way to go.
I'm gonna guess that Mr. FishyPower's email address is something like mrFishyPower#something.com. The query is only comparing Email to teacher. If it was
WHERE Name LIKE "%{0}%"
OR Email LIKE "%{0}%"'
you would (probably) get the result you want.

How do I get a value from a dictionary when the key is a value in another dictionary in Lua?

I am writing some code where I have multiple dictionaries for my data. The reason being, I have multiple core objects and multiple smaller assets and the user must be able to choose a smaller asset and have some function off in the distance run the code with the parent noted.
An example of one of the dictionaries: (I'm working in ROBLOX Lua 5.1 but the syntax for the problem should be identical)
local data = {
character = workspace.Stores.NPCs.Thom,
name = "Thom", npcId = 9,
npcDialog = workspace.Stores.NPCs.Thom.Dialog
}
local items = {
item1 = {
model = workspace.Stores.Items.Item1.Main,
npcName = "Thom",
}
}
This is my function:
local function function1(item)
if not items[item] and data[items[item[npcName]]] then return false end
end
As you can see, I try to index the dictionary using a key from another dictionary. Usually this is no problem.
local thisIsAVariable = item[item1[npcName]]
but the method I use above tries to index the data dictionary for data that is in the items dictionary.
Without a ton of local variables and clutter, is there a way to do this? I had an idea to wrap the conflicting dictionary reference in a tostring() function to separate them - would that work?
Thank you.
As I see it, your issue is that:
data[items[item[npcName]]]
is looking for data[“Thom”] ... but you do not have such a key in the data table. You have a “name” key that has a “Thom” value. You could reverse the name key and value in the data table. “Thom” = name

In Biztalk mapper how to use split array concept

Required suggestion on below part.please any one give solution.
We have mapping from 850 to FlatFile
X12/PO1Loop1/PO1/PO109 and I need to map to field VALUE which is under record Option which is unbounded.
Split PO109 into substrings delimited by '.', foreach subsring after the first, create new Option with value=substring
So in input sample we have value like 147895632qwerqtyuui.789456123321456987
Similarly the field repeats under POLoop1.
So I need to split value based on (.) then pass a value to value field under option record(unbounded).
I tried using below code snippet
public string SplitValues(string strValue)
{
string[] arrValue = strValue.Split(".".ToCharArray());
foreach (string strDisplay in arrValue)
{
return strDisplay;
}
}
But it doesn't works, and I am not really familiar with the String methods and I am not sure if there's an easy way to do this. I have a String which contains couple of values delimited with "." .
So I need to separate values based on delimiter(.) and pass value to field.
How can I do this
As I mentioned, not too clear what is your objective, but I think you want to split a node that has some kind of delimiter into multiple nodes... if so, Try this: https://seroter.wordpress.com/2008/10/07/splitting-delimited-values-in-biztalk-maps/
He is doing exactly that. Given a node with a|b|c|d as value, output multiple nodes, each containing the value after splitted by |, so node1 = a, node2 = b, node3 = c, node4 = d.

Resources