Groovy: Dynamically create nested / combine maps in code - dictionary

I'm trying to dynamically create a nested map like the one below in code:
def people = [
[name: 'Ash', age: '21', gender: 'm'],
[name: 'Jo', age: '22', gender: 'f'],
[name: 'etc.', age: '42', gender: 'f']
]
so that I can search it like this:
person = people.findAll {item ->
item.gender == 'm' &&
item.age == '21'}
My problem is that, while I can dynamically create one-dimensional maps in code, I don't know how to combine them into a nested structure like the one above. For example, suppose I have created two maps in code, name1 and name2. How do I add them to the people map so that they are nested as in the example above?
def people = [:]
def name1 = [name:'ash', age:'21', gender:'m']
def name2 = [name:'Jo', age:'22', gender:'f']
I've searched and tried many posts without success. The following is close, but does not work:
people.put((),(name1))
people.put((),(name2))

In your example, people is a list of maps, not a nested map.
So you can simply do:
def people = []
def name1 = [name:'ash', age:'21', gender:'m']
def name2 = [name:'Jo', age:'22', gender:'f']
Then:
people += name1
people += name2
Or define it in one line:
def people = [name1, name2]

Related

Web Scraping: How do I return specific user input forms in python?

I'm having trouble getting the search to return only the forms that exactly match the user input.
Emphasoft developer challenge:
Taking a list of tax form names (ex: "Form W-2", "Form 1095-C"),
search the website and return some informational results.
Specifically, you must return the "Product Number", the "Title", and
the maximum and minimum years the form is available for download.
Taking a tax form name (ex: "Form W-2") and a range of years
(inclusive, 2018-2020 should fetch three years), download all PDFs
available within that range.
import json
import os
import sys

import requests
from bs4 import BeautifulSoup

URL = 'https://apps.irs.gov/app/picklist/list/priorFormPublication.html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&{param.strip}&isDescending=false'


def get_forms(list_tax_form: list):
    """
    function to get response from irs.gov with all forms content
    :param list_tax_form: list of form names that we want to get info about
    :return: list of raw HTML responses, one per form name
    """
    response_list = []  # list for all responses of form names
    with requests.session() as session:
        for param in list_tax_form:
            request_params = {'value': param,
                              'criteria': 'formNumber',
                              'submitSearch': 'Find',
                              }
            res = session.get(URL, params=request_params).content
            response_list.append(res)
    return response_list


def parse_responses(list_tax_form: list):
    """
    function to get all form names, titles, years from previous func return
    :param list_tax_form: list of form names that we want to get info about
    :return: list of form names, titles, years
    """
    responses = get_forms(list_tax_form)
    # empty lists to fill with the received information for all names, years, and titles
    td_form_name, td_form_title, td_form_rev_year = [], [], []
    for response in responses:
        soup = BeautifulSoup(response, 'lxml')
        td_name = soup.find_all('td', {'class': 'LeftCellSpacer'})
        td_title = soup.find_all('td', {'class': 'MiddleCellSpacer'})
        td_rev_year = soup.find_all('td', {'class': 'EndCellSpacer'})
        td_form_name.extend(td_name)
        td_form_title.extend(td_title)
        td_form_rev_year.extend(td_rev_year)
    return td_form_name, td_form_title, td_form_rev_year


def format_responses(list_tax_form: list):
    """
    function to format all responses for all forms we got!
    1 Task
    :param list_tax_form: list of form names that we want to get info about
    :return: formatted names, links, years
    """
    td_names, td_titles, td_years = parse_responses(list_tax_form)
    names = [name.text.strip() for name in td_names]
    links = [link.find('a')['href'] for link in td_names]
    titles = [title.text.strip() for title in td_titles]
    years = [int(year.text.strip()) for year in td_years]
    set_names = set(names)
    final_dict = []
    # loop to create dictionary of result information with years of tax form available to download
    for name in set_names:
        max_year = 0
        min_year = max(years)
        dict1 = {'form_number': name}
        for index, p_name in enumerate(names):
            if p_name == name:
                if years[index] > max_year:
                    max_year = years[index]
                elif years[index] < min_year:
                    min_year = years[index]
                dict1['form_title'] = titles[index]
        dict1['max_year'] = max_year
        dict1['min_year'] = min_year
        final_dict.append(dict1)
    print(json.dumps(final_dict, indent=2))
    return names, links, years


def download_files(list_tax_form):
    """
    2 Task
    Module to download pdf files of form_name that input from user.
    :param list_tax_form: list of form names that we want to get info about
    :return: message to user of successful create file or either
    """
    names, links, years = format_responses(list_tax_form)
    form_name = input('enter form name: ')
    if form_name in names:
        print('form exists. enter years range')
        form_year1 = int(input('start year to analysis: '))
        form_year2 = int(input('end year to analysis: '))
        try:
            os.mkdir(form_name)
        except FileExistsError:
            pass
        # indices to define names range in list of all tax form names
        r_index = names.index(form_name)  # after the loop below: one past the last form_name mention
        l_index = names.index(form_name)  # index of first form_name mention on list
        for name in names:
            if name == form_name:
                r_index += 1
        years = years[l_index:r_index]
        if form_year1 < form_year2:
            range_years = range(form_year1, form_year2 + 1)
            for year in range_years:
                if year in years:
                    link = links[years.index(year)]
                    form_file = requests.get(link, allow_redirects=True)
                    open(f'{form_name}/{form_name}_{str(year)}.pdf', 'wb').write(form_file.content)
            print(f'files saved to {form_name}/ directory!')
    else:
        print('input correct form name!')


if __name__ == '__main__':
    tax_list = sys.argv[1:]  # form names
    download_files(tax_list)
For example, "Form W-2" should not return "Form W-2 P".
When I run this file, it displays other, unrelated results.
How can I restrict the output to exactly the forms the user requested?
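One way to get exact matches is to filter on the stripped form name with an equality check after parsing, since the site's search results also include related forms such as "Form W-2 P". The sketch below assumes the parsed cells have already been zipped into (form_number, title, year, link) tuples; summarize_exact and links_in_range are hypothetical helper names, not part of the script above, and the sample rows and example.com links are made up for illustration. It computes min_year with min(), avoiding the elif in format_responses, which skips the min_year update whenever a row also raises max_year. It also keeps each link paired with its own year, whereas download_files slices years but indexes the unsliced links list.

```python
def summarize_exact(rows, wanted_names):
    """Summarize only rows whose form name exactly equals a requested name.

    rows: iterable of (form_number, title, year, link) tuples (assumed shape).
    """
    wanted = {name.strip() for name in wanted_names}
    summary = {}
    for name, title, year, _link in rows:
        name = name.strip()
        if name not in wanted:
            # e.g. "Form W-2 P" is skipped when only "Form W-2" was requested
            continue
        entry = summary.setdefault(
            name, {"form_number": name, "form_title": title,
                   "min_year": year, "max_year": year})
        entry["min_year"] = min(entry["min_year"], year)
        entry["max_year"] = max(entry["max_year"], year)
    return list(summary.values())


def links_in_range(rows, form_name, start_year, end_year):
    """Return (year, link) pairs for an exact form name, years inclusive."""
    form_name = form_name.strip()
    return [
        (year, link)
        for name, _title, year, link in rows
        if name.strip() == form_name and start_year <= year <= end_year
    ]


# Hypothetical parsed rows, for illustration only
rows = [
    ("Form W-2", "title A", 2020, "https://example.com/a2020.pdf"),
    ("Form W-2 P", "title B", 2019, "https://example.com/b2019.pdf"),
    ("Form W-2", "title A", 2018, "https://example.com/a2018.pdf"),
    ("Form W-2", "title A", 2016, "https://example.com/a2016.pdf"),
]
print(summarize_exact(rows, ["Form W-2"]))
print(links_in_range(rows, "Form W-2", 2018, 2020))
```

Because each tuple carries its own link, the download step can iterate over links_in_range(...) directly instead of recomputing index offsets.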

Kivy: binding multiple buttons to individual functions

Hi, I am new to Kivy and have just started programming. I have a problem: I want to bind the on_release event of every button I create in a for loop, so that clicking each button takes the user to a different screen. Below is a small part of my code (edited to add more information):
# these are the images for the buttons
a = '_icons_/mcdonald2.png'
b = '_icons_/boostjuice.png'
c = '_icons_/duckrice.png'
d = '_icons_/subway_logo.png'
e = '_icons_/bakery.png'
f = '_icons_/mrbean.png'
# these are the names of the different screens
n1 = 'mcdonald_screen'
n2 = 'boost_screen'
n3 = 'duck_screen'
n4 = 'subway_screen'
n5 = 'bakery_screen'
n6 = 'mrbean_screen'
arraylist = [[a, n1], [b, n2], [c, n3], [d, n4], [e, n5], [f, n6]]
self.layout2 = GridLayout(rows=2, spacing=50, size_hint=(0.95, 0.5),
                          pos_hint={"top": .65, "x": 0}, padding=(90, 0, 50, 0))
for image in arraylist:
    self.image_outlet = ImageButton(
        size_hint=(1, 0.1),
        source=image[0])
    self.screen_name = image[1]
    self.image_outlet[0].bind(on_release=??)  # this is the part I want to change according to the different screen
    self.layout2.add_widget(self.image_outlet)
self.add_widget(self.layout2)
GUI = Builder.load_file("_kivy_/trying.kv")
class TRYINGApp(App):
    def build(self):
        return GUI

    def change_screen(self, screen_name):
        screen_manager = self.root.ids['screen_manager']
        screen_manager.current = screen_name
#kv file#
# all the various kv file screens
#: include _kivy_/variestime_screen.kv
#: include _kivy_/homescreen.kv
#: include _kivy_/mcdonaldscreen.kv
#: include _kivy_/firstpage.kv
#: include _kivy_/mrbeanscreen.kv
#: include _kivy_/boostscreen.kv
#: include _kivy_/duckscreen.kv
#: include _kivy_/subwayscreen.kv
#: include _kivy_/bakeryscreen.kv
GridLayout:
    cols: 1
    ScreenManager:
        id: screen_manager
        FirstPage:
            name: "first_page"
            id: first_page
        VariesTimeScreen:
            name: "variestime_screen"
            id: variestime_screen
        HomeScreen:
            name: "home_screen"
            id: home_screen
        McDonaldScreen:
            name: "mcdonald_screen"
            id: mcdonald_screen
        BoostScreen:
            name: "boost_screen"
            id: boost_screen
        DuckScreen:
            name: "duck_screen"
            id: duck_screen
        SubwayScreen:
            name: "subway_screen"
            id: subway_screen
        BakeryScreen:
            name: "bakery_screen"
            id: bakery_screen
        MrBeanScreen:
            name: "mrbean_screen"
            id: mrbean_screen
Your on_release binding can be something like this (partial comes from Python's functools module):
from functools import partial
self.image_outlet.bind(on_release=partial(self.change_screen, image[1]))
where change_screen is a method that you must define:
def change_screen(self, new_screen_name, button_instance):
    # some code to change to the screen with name new_screen_name
    pass
Note that I have removed the [0] from self.image_outlet (I suspect that was a typo). I can't determine what code should go in the new method, because you haven't provided enough information.
If you have a change_screen method in your App class, you can use that directly by referencing it in your on_release as:
self.image_outlet.bind(on_release=partial(App.get_running_app().change_screen, image[1]))
You will need to make a minor change to your change_screen method to handle additional args:
def change_screen(self, screen_name, *args):
    screen_manager = self.root.ids['screen_manager']
    screen_manager.current = screen_name
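Why partial rather than a plain lambda inside the loop? The sketch below shows the difference without needing Kivy. FakeButton is a hypothetical stand-in that only mimics the relevant behaviour: it stores on_release callbacks and calls them with the button instance. partial freezes the screen name at the moment the button is created, while a lambda that refers to the loop variable looks it up only when clicked, so every button ends up with the last value.

```python
from functools import partial


class FakeButton:
    """Hypothetical stand-in for a Kivy button: stores on_release callbacks
    and calls each one with the button instance when released."""

    def __init__(self):
        self._callbacks = []

    def bind(self, on_release):
        self._callbacks.append(on_release)

    def release(self):
        for callback in self._callbacks:
            callback(self)


visited = []


def change_screen(screen_name, *args):
    # *args absorbs the button instance passed along with the event
    visited.append(screen_name)


screens = ['mcdonald_screen', 'boost_screen', 'duck_screen']

# partial captures the current screen name for each button
good_buttons = []
for name in screens:
    button = FakeButton()
    button.bind(on_release=partial(change_screen, name))
    good_buttons.append(button)

# the lambda reads `name` only when called, after the loop has finished
bad_buttons = []
for name in screens:
    button = FakeButton()
    button.bind(on_release=lambda *args: change_screen(name))
    bad_buttons.append(button)

for button in good_buttons:
    button.release()
print(visited)  # ['mcdonald_screen', 'boost_screen', 'duck_screen']

visited.clear()
for button in bad_buttons:
    button.release()
print(visited)  # ['duck_screen', 'duck_screen', 'duck_screen']
```

If you prefer a lambda, binding the value as a default argument (lambda *args, n=name: change_screen(n)) has the same effect as partial.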

Groovy GroupBy field with and without white spaces

I have a list of invoices as below:
def invoices = [
'LEDES98BI V2',
'LINE|INVOICE_DATE|INVOICE_NUMBER|INVOICE_TOTAL',
'1|20150301|INV-Error_Test1|22',
'2|20150301|INV-Error_Test1|24',
'3|20150301|INV-Error_Test2|26',
'4|20150301|INV-Error_Test2|28,']
I am trying to group the above collection by INVOICE_NUMBER to get a map with INVOICE_NUMBER as keys and the lines as values. The code below does this:
def lines = invoices*.split('\\|').findAll{ it.size()>1 }
def heads = lines.first()
def invoiceMap = lines.tail().collect{ [heads, it].transpose().collectEntries() }.groupBy{ it.INVOICE_NUMBER }
If I print invoiceMap, I get the map I intended:
[INV-Error_Test1:[[LINE:1, INVOICE_DATE:20150301, INVOICE_NUMBER:INV-Error_Test1, INVOICE_TOTAL:22],
[LINE:2, INVOICE_DATE:20150301, INVOICE_NUMBER:INV-Error_Test1, INVOICE_TOTAL:24]],
INV-Error_Test2:[[LINE:3, INVOICE_DATE:20150301, INVOICE_NUMBER:INV-Error_Test2, INVOICE_TOTAL:26],
[LINE:4, INVOICE_DATE:20150301, INVOICE_NUMBER:INV-Error_Test2, INVOICE_TOTAL:28,]]
]
But if INVOICE_NUMBER contains whitespace in the invoices list, my code doesn't work. Can someone help me make my code work when INVOICE_NUMBER contains whitespace?
Use a proper CSV parser, rather than rolling your own.
@Grab('com.xlson.groovycsv:groovycsv:1.0')
import static com.xlson.groovycsv.CsvParser.parseCsv
def invoices = [
'LEDES98BI V2',
'LINE|INVOICE_DATE|INVOICE_NUMBER|INVOICE_TOTAL',
'1|20150301|INV-Error_Test1|22',
'2|20150301|INV-Error_Test1|24',
'3|20150301|INV-Error_Test2|26',
'4|20150301|INV-Error_Test2|28,']
def data = parseCsv(invoices.drop(1).join('\n'), separator:'|')
def invoiceMap = data.collect().groupBy { it.INVOICE_NUMBER }
Or with a space in the column title:
def invoices = [
'LEDES98BI V2',
'LINE|INVOICE_DATE|INVOICE NUMBER|INVOICE_TOTAL',
'1|20150301|INV-Error_Test1|22',
'2|20150301|INV-Error_Test1|24',
'3|20150301|INV-Error_Test2|26',
'4|20150301|INV-Error_Test2|28,']
def data = parseCsv(invoices.drop(1).join('\n'), separator:'|')
def invoiceMap = data.collect().groupBy { it.'INVOICE NUMBER' }
You just need to quote the property name, like this:
def invoiceMap = lines.tail().collect{ [heads, it].transpose().collectEntries() }.groupBy{ it.'INVOICE NUMBER' }

Spark: How to translate count(distinct(value)) to the DataFrame API

I'm trying to compare different ways to aggregate my data.
This is my input data with 2 elements (page,visitor):
(PAG1,V1)
(PAG1,V1)
(PAG2,V1)
(PAG2,V2)
(PAG2,V1)
(PAG1,V1)
(PAG1,V2)
(PAG1,V1)
(PAG1,V2)
(PAG1,V1)
(PAG2,V2)
(PAG1,V3)
Using a SQL query in Spark SQL with this code:
import sqlContext.implicits._
case class Log(page: String, visitor: String)
val logs = data.map(p => Log(p._1,p._2)).toDF()
logs.registerTempTable("logs")
val sqlResult= sqlContext.sql(
"""select page
,count(distinct visitor) as visitor
from logs
group by page
""")
val result = sqlResult.map(x=>(x(0).toString,x(1).toString))
result.foreach(println)
I get this output:
(PAG1,3) // PAG1 has been visited by 3 different visitors
(PAG2,2) // PAG2 has been visited by 2 different visitors
Now I would like to get the same result using DataFrames and their API, but I can't get the same output:
import sqlContext.implicits._
case class Log(page: String, visitor: String)
val logs = data.map(p => Log(p._1,p._2)).toDF()
val result = logs.select("page","visitor").groupBy("page").count().distinct
result.foreach(println)
In fact, that's what I get as output:
[PAG1,8] // just the simple page count for every page
[PAG2,4]
What you need is the DataFrame aggregation function countDistinct:
import sqlContext.implicits._
import org.apache.spark.sql.functions._
case class Log(page: String, visitor: String)
val logs = data.map(p => Log(p._1,p._2))
.toDF()
val result = logs.select("page","visitor")
.groupBy('page)
.agg('page, countDistinct('visitor))
result.foreach(println)
You can use the DataFrame groupBy command twice to do this. Here, df1 is your original input:
val df2 = df1.groupBy($"page",$"visitor").agg(count($"visitor").as("count"))
This command would produce the following result:
page visitor count
---- ------- -----
PAG2 V2      2
PAG1 V3      1
PAG1 V1      5
PAG1 V2      2
PAG2 V1      2
Then use the groupBy command again to get the final result.
df2.groupBy($"page").agg(count($"visitor").as("count"))
Final output:
page count
---- ----
PAG1 3
PAG2 2
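To see what count(distinct visitor) computes, and why the two-stage groupBy gives the same answer, here is a plain-Python analogue on the question's data using collections.Counter. It mirrors the logic only; it is not how Spark executes the query.

```python
from collections import Counter

# The (page, visitor) pairs from the question
logs = [
    ("PAG1", "V1"), ("PAG1", "V1"), ("PAG2", "V1"), ("PAG2", "V2"),
    ("PAG2", "V1"), ("PAG1", "V1"), ("PAG1", "V2"), ("PAG1", "V1"),
    ("PAG1", "V2"), ("PAG1", "V1"), ("PAG2", "V2"), ("PAG1", "V3"),
]

# Stage 1: one entry per distinct (page, visitor) pair, like groupBy(page, visitor)
pair_counts = Counter(logs)

# Stage 2: count the distinct pairs per page, like groupBy(page).agg(count(visitor))
distinct_visitors = Counter(page for page, _visitor in pair_counts)

# For contrast: the plain row count per page, like groupBy("page").count()
row_counts = Counter(page for page, _visitor in logs)

print(sorted(distinct_visitors.items()))  # [('PAG1', 3), ('PAG2', 2)]
print(sorted(row_counts.items()))         # [('PAG1', 8), ('PAG2', 4)]
```

The second count reproduces the [PAG1,8] / [PAG2,4] output from the question, which is why groupBy("page").count() alone does not answer it.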
I think in the newer versions of Spark it is easier. The following is tested with 2.4.0.
1. First, create a sample array.
val myArr = Array(
("PAG1","V1"),
("PAG1","V1"),
("PAG2","V1"),
("PAG2","V2"),
("PAG2","V1"),
("PAG1","V1"),
("PAG1","V2"),
("PAG1","V1"),
("PAG1","V2"),
("PAG1","V1"),
("PAG2","V2"),
("PAG1","V3")
)
2. Create a DataFrame.
val logs = spark.createDataFrame(myArr)
.withColumnRenamed("_1","page")
.withColumnRenamed("_2","visitor")
3. Now aggregate with the countDistinct Spark SQL function.
import org.apache.spark.sql.{functions => F}
logs.groupBy("page").agg(
F.countDistinct("visitor").as("visitor"))
.show()
4. Expected result:
+----+-------+
|page|visitor|
+----+-------+
|PAG1| 3|
|PAG2| 2|
+----+-------+
Use this if you want to display the distinct values of a column (display() is available in Databricks notebooks):
display(sparkDF.select('columnName').distinct())

How to recursively print a Python dictionary and its subdictionaries with whitespace alignment into columns

I want to create a function that can take a dictionary of dictionaries such as the following:
information = {
    "sample information": {
        "ID": 169888,
        "name": "ttH",
        "number of events": 124883,
        "cross section": 0.055519,
        "k factor": 1.0201,
        "generator": "pythia8",
        "variables": {
            "trk_n": 147,
            "zappo_n": 9001
        }
    }
}
and then print it in a neat way such as the following, with alignment of keys and values using whitespace:
sample information:
   ID:               169888
   name:             ttH
   number of events: 124883
   cross section:    0.055519
   k factor:         1.0201
   generator:        pythia8
   variables:
      trk_n:         147
      zappo_n:       9001
My attempt at the function is the following:
def printDictionary(
    dictionary = None,
    indentation = ''
    ):
    for key, value in dictionary.iteritems():
        if isinstance(value, dict):
            print("{indentation}{key}:".format(
                indentation = indentation,
                key = key
            ))
            printDictionary(
                dictionary = value,
                indentation = indentation + '   '
            )
        else:
            print(indentation + "{key}: {value}".format(
                key = key,
                value = value
            ))
It produces the output like the following:
sample information:
   name: ttH
   generator: pythia8
   cross section: 0.055519
   variables:
      zappo_n: 9001
      trk_n: 147
   number of events: 124883
   k factor: 1.0201
   ID: 169888
As shown, it successfully prints the dictionary of dictionaries recursively; however, it does not align the values into a neat column. What would be a reasonable way of doing this for dictionaries of arbitrary depth?
Try using the pprint module. Instead of writing your own function, you can do this:
import pprint
pprint.pprint(my_dict)
Be aware that this will print characters such as { and } around your dictionary and [] around your lists, but if you can ignore them, pprint() will take care of all the nesting and indentation for you.
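pprint handles nesting but does not line values up into a column. A two-pass approach does: first walk the dictionary to find the widest leaf label (indentation plus "key:"), then walk it again and pad every leaf label to that width. The sketch below is one way to do this in Python 3; format_dictionary and print_dictionary are names chosen for illustration, and indent=3 is an arbitrary default. Since Python 3.7, dicts keep insertion order, so keys also print in the order they were written.

```python
def _leaf_label_widths(dictionary, depth, indent):
    """Yield the printed width of every leaf 'key:' label, including indentation."""
    for key, value in dictionary.items():
        if isinstance(value, dict):
            yield from _leaf_label_widths(value, depth + 1, indent)
        else:
            yield depth * indent + len(f"{key}:")


def format_dictionary(dictionary, indent=3):
    """Return lines with nested keys indented and all leaf values in one column."""
    # First pass: the widest leaf label fixes the column where values start
    width = max(_leaf_label_widths(dictionary, 0, indent), default=0)
    lines = []

    def walk(d, depth):
        for key, value in d.items():
            label = " " * (depth * indent) + f"{key}:"
            if isinstance(value, dict):
                # Parent keys get their own line and do not affect the column
                lines.append(label)
                walk(value, depth + 1)
            else:
                # Pad every leaf label to the same width, then one space and the value
                lines.append(f"{label.ljust(width)} {value}")

    walk(dictionary, 0)
    return lines


def print_dictionary(dictionary, indent=3):
    print("\n".join(format_dictionary(dictionary, indent)))
```

Calling print_dictionary(information) on the dictionary from the question puts every value, at every depth, in the same column.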
