How to migrate Presto `map` function to hive - dictionary

Presto's map() function is quite a bit easier to use than Hive's. A Presto map() invocation takes two arrays: the first for the keys, the second for the values.
A Hive map() takes a variable-length argument list of alternating keys and values.
Here is a query snippet that I need to migrate (backwards?) from Presto to Hive:
, map(
concat(map_keys(decision_feature_importance), array['id_queue', 'queue_disposition']),
concat(map_values(decision_feature_importance), array[CAST(id_queue AS VARCHAR), queue_disposition])) other_info
The core of it is that Presto's map() accepts two parallel arrays, and Hive objects rather strongly to that. What is the pattern to (reverse-?) migrate the map()?
There are several questions about zipping lists in Hive, e.g. "hive create map or key/value pair from two arrays". The answers are pretty complicated and may involve UDFs (which I am not able to create) or libraries such as Brickhouse (which I am not able to install, since this is a shared cluster for hundreds of users). They also cover only a portion of the problem here.
The following toy query shows how to build the Hive-format map entries from two parallel lists. Basically we need to zip the lists manually, since Hive has no such built-in function.
Hive partial equivalent
with mydata as (
select 1 id, map('key11','val11','key12','val12','key13','val13') as mymap
union all
select 2 id, map('key21','val21','key22','val22','key13','val13') as mymap
)
select split(concat_ws(',',collect_list(concat(key,',',value ))),',') keyval from (
select * from mydata lateral view outer explode (mymap) m
) d;
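To make the target shape of the manual zip concrete, here is a minimal Python sketch (not Hive) of the transformation: two parallel lists become the alternating key, value, key, value sequence that Hive's map() expects. The function name `interleave` and the sample data are illustrative assumptions, not part of the original query.

```python
from itertools import chain

def interleave(keys, values):
    """Zip two parallel lists into one alternating key, value, key, value list."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    # zip pairs the i-th key with the i-th value; chain flattens the pairs.
    return list(chain.from_iterable(zip(keys, values)))

# Illustrative data only: two existing map entries plus one appended column.
keys = ["key11", "key12", "id_queue"]
values = ["val11", "val12", "42"]
flat = interleave(keys, values)
print(flat)  # ['key11', 'val11', 'key12', 'val12', 'id_queue', '42']
```

The toy Hive query above produces the same kind of flat array with concat_ws and split, which only works when no key or value contains the delimiter.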


Julia Querying from SQLite

Following this tutorial https://www.youtube.com/watch?app=desktop&v=qUrtLJcehE0, I created a database called Movies. Within the database a table called movies was created and next an entry was also added,
using SQLite
db = SQLite.DB("Movies")
SQLite.execute(db,"CREATE TABLE IF NOT EXISTS movies(movie_id REAL,movie_name TEXT, location TEXT)")
SQLite.execute(db,"INSERT INTO movies (movie_id,movie_name,location) VALUES(1,'Avengers','USA')")
However now when I try to Query the entry as follows,
SQLite.Query(db, "SELECT * from movies")
I get this error: MethodError: no method matching SQLite.Query.(::SQLite.DB,::String).
Any ideas what I am doing wrong?
I don't know SQL, but I think you want to use SQLite.execute again not SQLite.Query. SQLite.Query is a struct not a function, and it doesn't have any documentation. I don't think you are meant to call it externally. Further documentation is here.
Method error means you are calling something with the wrong arguments. The SQLite.Query struct expects all of the following arguments:
struct Query
stmt::Stmt
status::Base.RefValue{Cint}
names::Vector{Symbol}
types::Vector{Type}
lookup::Dict{Symbol, Int}
end
The SQLite.execute function expects arguments in one of these forms:
SQLite.execute(db::SQLite.DB, sql, [params])
SQLite.execute(stmt::SQLite.Stmt, [params])
By convention in Julia, functions are all lowercase and types are capitalized.
To load a table using the SQLite package:
using SQLite
using DataFrames
# Make a connection
db = SQLite.DB("Movies")
# To find out all tables available in schema
tbls = SQLite.tables(db)
# To load a specific table (movies table from Movies.db)
q = "SELECT * FROM movies"
data = SQLite.DBInterface.execute(db,q)
# To get as a dataframe
df = DataFrames.DataFrame(data)

Multiple variables in return object of function in R. Want to run it for multiple argument cases

How do I retrieve outputs from objects in an array as described in the background?
I have a function in R that returns multiple variables. For example, if my function is called function_ABC, then:
a<-function_ABC (input_var)
gives a such that a$var1, a$var2, and a$var3 exist.
I have multiple cases to run, so I have put them in an array:
input_var <- c(1, 2, ...15)
To store the outputs, I declared var such that:
var <- c(v1, v2, v3, .... v15)
Then I run:
assign(v1[i],function(input_var(i)))
However, after that I am unable to access these variables as v1[1]$var1. I can access them as: v1$var1, or v3$var1, etc. But this means I need to write 15*3 commands to retrieve my output.
Is there an easier way to do this?
Push your whole input set into an array Arr[ ].
Open a multi threaded executor E of certain size N.
Using a for loop on the input array Arr[], submit your function calls as a Callable job to the executor E. While submitting each job, hold the reference to the FutureTask in another Array FTArr[ ].
When all the FutureTask jobs are executed, you may retrieve the output for each of them by running another for loop on FTArr[ ].
Note :
• Make sure to add a synchronized block in your func_ABC wherever it accesses shared resources, to avoid race conditions.
• Please refer to the below link, if you want to know more about the usage of a count-down-latch. A count-down-latch helps you to find out, when exactly, all the child threads have finished execution.
https://www.geeksforgeeks.org/countdownlatch-in-java/
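The steps in this answer describe a Java-style executor. As a rough illustration of the same pattern, here is a minimal Python sketch using the standard concurrent.futures module. `function_abc` is a hypothetical stand-in for the asker's function, and its three returned fields are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def function_abc(x):
    # Hypothetical stand-in for the real function: returns three named values.
    return {"var1": x, "var2": x * 2, "var3": x * 3}

def run_all(inputs, max_workers=4):
    """Submit one job per input and collect the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Keep a reference to each Future, as the answer suggests.
        futures = [executor.submit(function_abc, x) for x in inputs]
        # result() blocks until that particular job has finished.
        return [f.result() for f in futures]

results = run_all(range(1, 16))
print(results[0]["var1"], results[14]["var3"])  # 1 45
```

Because the results sit in a single list, each case is reachable by index (results[i]["var1"]) rather than through fifteen separately named variables.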

Running a loop and executing a function after each iteration of loop: r

I'm running a Google BigQuery script from RStudio.
I have one important parameterised variable, which needs to be replaced with values from a data frame:
health_tags<-read.csv('marker_tags.csv')
health_tags<-tail(health_tags, 7)
I have built a function which executes my query whilst adding the parameters to my variables.
query_details (MD2_date_start="2018-06-06",
MD2_date_end="2018-07-07",
Sterile_tag="7894")
So "query_details" is a function wrapping an API call, which fills in the details for BigQuery to run. How do I write a loop that replaces the value of "Sterile_tag" with each of the codes found in the health_tags CSV and runs the "query_details" function each time, until all iterations have completed?
You can use sapply where column should be the real name of your column:
sapply(health_tags$column, function(x) query_details (MD2_date_start="2018-06-06",
MD2_date_end="2018-07-07",
Sterile_tag=as.character(x)))

Prolog - How can I save results from recursive calls?

I am still trying to understand the Prolog logic and have stumbled upon a problem.
I am trying to save values found within recursive calls, to pass on or gather.
As such:
main([]) :- !.
main([H|Tail]) :- findall(X,something(_,_,X),R),
getValueReturn(R,H,Lin, Lout),
main(Tail).
% X is the Head from main
getValueReturn([H|Tail],X,Lin, Lout) :- subset(X, H) ->
findall(A,something(A,_,H),L1),
append(Lin,L1,Lout),
getValueReturn(Tail,X,Lout,L)
;
getValueReturn(Tail,X,Lin,Lout).
I would like to gather the results from findall in getValueReturn, combine them, and send them back to main, which can then use them.
How do I create and add to a list within getValueReturn?
Similarly, how can I save the list in my main for all recursive calls?
EDIT:
I edited the code above as per a comment reply, however when I run this through trace, the list deletes all elements when the empty list is found.
What am I doing wrong? This is the first time I try to use the concept of building a list through recursion.
You should post complete code that can be run, with example data. I have not tested this.
You need to pass L around on the top-level also. Using the same variable names for different parameters in adjacent procedures does not improve readability.
main([E|Es],L0,L) :-
findall(X,something(_,_,X),Rs),
getValueReturn(Rs,E,L0,L1),
main(Es,L1,L).
main([],L,L).
getValueReturn([R|Rs],E,L0,L) :-
( subset(E,R) ->
findall(A,something(A,_,R),New),
append(L0,New,L1),
getValueReturn(Rs,E,L1,L)
; getValueReturn(Rs,E,L0,L) ).
getValueReturn([],_,L,L).
A variable can only have one value in Prolog. In your code, for example, Lout is the output from append/3, an input to a recursive call of getValueReturn/4, and then also the output on the top-level. This is probably not going to do what you want.
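For readers more used to imperative languages, the accumulator threading in the answer's code (L0 in, L out) can be mirrored in a minimal Python sketch. The `FACTS` list is hypothetical data standing in for something/3, which the question never shows, and the function names are simply Python spellings of the predicates.

```python
# Hypothetical facts standing in for something(A, B, X); not from the original post.
FACTS = [
    ("a1", "b1", ("x", "y")),
    ("a2", "b2", ("y", "z")),
    ("a3", "b3", ("x", "y")),
]

def get_value_return(rs, e, acc):
    """Mirror of getValueReturn/4: acc plays L0, the return value plays L."""
    if not rs:                                       # getValueReturn([],_,L,L).
        return acc
    r, rest = rs[0], rs[1:]
    if set(e) <= set(r):                             # subset(E,R)
        new = [a for (a, _, x) in FACTS if x == r]   # findall(A,something(A,_,R),New)
        return get_value_return(rest, e, acc + new)  # append(L0,New,L1), then recurse
    return get_value_return(rest, e, acc)            # else branch: pass L0 on unchanged

def main(es, acc):
    """Mirror of main/3: the accumulator is handed from one element to the next."""
    if not es:                                       # main([],L,L).
        return acc
    rs = [x for (_, _, x) in FACTS]                  # findall(X,something(_,_,X),Rs)
    return main(es[1:], get_value_return(rs, es[0], acc))

print(main([("x",)], []))  # ['a1', 'a3', 'a1', 'a3']
```

The point of the sketch is that every call receives the list built so far and hands back the extended list. Nothing is overwritten in place, which is the role the fresh variables L0, L1, and L play in the Prolog version.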
I have found that the best way to do what I was trying to do was to use asserta/assertz when a result was found, and then gather these results later on.
Otherwise the code became overly complicated and did not function as intended.

How to create a collection in Julia?

This seems like a really basic question, but I can't find the answer. How do I create a collection in Julia? For example, I want to open a text file and parse each line to create an (iterable or otherwise) collection. Obviously I don't know how many elements there are in advance.
I can iterate through the lines like this
I = each_line(open(fileName,"r"))
state = start(I)
while !done(I, state)
(i, state) = next(I, state)
println(i)
end
But I don't know how to put each i into an array or other collection. I tried
map( i -> println(i), each_line(open(fileName,"r") ) )
But got the error
no method map(Function,EachLine)
You could do this:
lines = String[]
for line in each_line(open(fileName))
push!(lines, line)
end
And then lines contains the list of lines. You need the String in the first line to make the array extensible.
Standard collections and supported operations are mainly covered in the standard library documentation.
Specifically, the Deques section covers all of the operations supported by the 1d Array type (vector), including push! and pop! as well as insertion, resizing, etc.
Omar's answer is correct, and I will just add a small qualification: String[] creates a 1d array of Strings. The same constructor syntax may be used for example to create Int[], Float[], or even Any[] vectors. The latter type may hold objects of any type.
Depending on your Julia version, you may also be able to write collect(eachline(open("LICENSE.md"))) or [eachline(open("LICENSE.md"))...]. I think these won't work in 0.1.x versions but will work in newer 0.2 development versions (which are recommended at this point, as 0.2 is on its way soon).
