I'm having trouble getting the form search to return only exact matches for the user's input.
Emphasoft developer challenge:
1. Taking a list of tax form names (ex: "Form W-2", "Form 1095-C"), search the website and return some informational results. Specifically, you must return the "Product Number", the "Title", and the maximum and minimum years the form is available for download.
2. Taking a tax form name (ex: "Form W-2") and a range of years (inclusive: 2018-2020 should fetch three years), download all PDFs available within that range.
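Both tasks come down to grouping the scraped result rows by form name. Below is a minimal sketch, assuming each row of the results table has already been parsed into a name, title, revision year and PDF link. The FormRow type and the summarize_forms and select_downloads functions are hypothetical names for illustration, not part of the challenge or of any library:

```python
from typing import NamedTuple


class FormRow(NamedTuple):
    """One parsed row of the search-results table (hypothetical structure)."""
    name: str   # "Product Number", e.g. "Form W-2"
    title: str  # form title
    year: int   # revision year
    link: str   # URL of the PDF


def summarize_forms(rows):
    """Task 1: one dict per form name with its title and min/max year."""
    summary = {}
    for row in rows:
        # The first row seen for a name creates its entry; later rows only widen the year range.
        entry = summary.setdefault(row.name, {
            "form_number": row.name,
            "form_title": row.title,
            "min_year": row.year,
            "max_year": row.year,
        })
        entry["min_year"] = min(entry["min_year"], row.year)
        entry["max_year"] = max(entry["max_year"], row.year)
    return list(summary.values())


def select_downloads(rows, form_name, start_year, end_year):
    """Task 2: sorted (year, link) pairs for form_name within the inclusive year range."""
    low, high = sorted((start_year, end_year))
    return sorted(
        (row.year, row.link)
        for row in rows
        if row.name == form_name and low <= row.year <= high
    )
```

Sorting the two input years means a reversed range such as 2020-2018 selects the same inclusive set of years, and keeping each year next to its own link avoids having to keep separate year and link lists aligned by index.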
import json
import os
import sys

import requests
from bs4 import BeautifulSoup

# the search value and criteria are passed separately via `params` in get_forms()
URL = 'https://apps.irs.gov/app/picklist/list/priorFormPublication.html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&isDescending=false'
def get_forms(list_tax_form: list):
    """
    Get the search-results page from irs.gov for every requested form name.
    :param list_tax_form: list of form names that we want to get info about
    :return: list of raw HTML response bodies, one per form name
    """
    response_list = []  # raw HTML for every requested form name
    with requests.session() as session:
        for param in list_tax_form:
            request_params = {'value': param,
                              'criteria': 'formNumber',
                              'submitSearch': 'Find',
                              }
            res = session.get(URL, params=request_params).content
            response_list.append(res)
    return response_list
def parse_responses(list_tax_form: list):
    """
    Parse form names, titles and revision years out of the responses from get_forms().
    :param list_tax_form: list of form names that we want to get info about
    :return: three lists of <td> tags: form names, titles, years
    """
    responses = get_forms(list_tax_form)
    # empty lists to fill with the name, title and year cells of all responses
    td_form_name, td_form_title, td_form_rev_year = [], [], []
    for response in responses:
        soup = BeautifulSoup(response, 'lxml')
        td_name = soup.find_all('td', {'class': 'LeftCellSpacer'})
        td_title = soup.find_all('td', {'class': 'MiddleCellSpacer'})
        td_rev_year = soup.find_all('td', {'class': 'EndCellSpacer'})
        td_form_name.extend(td_name)
        td_form_title.extend(td_title)
        td_form_rev_year.extend(td_rev_year)
    return td_form_name, td_form_title, td_form_rev_year
def format_responses(list_tax_form: list):
    """
    Format the responses for all requested forms.
    1 Task
    :param list_tax_form: list of form names that we want to get info about
    :return: formatted names, links, years
    """
    td_names, td_titles, td_years = parse_responses(list_tax_form)
    names = [name.text.strip() for name in td_names]
    links = [link.find('a')['href'] for link in td_names]
    titles = [title.text.strip() for title in td_titles]
    years = [int(year.text.strip()) for year in td_years]
    set_names = set(names)
    final_dict = []
    # build one dict per form name with the years it is available for download
    for name in set_names:
        max_year = 0
        min_year = max(years)
        dict1 = {'form_number': name}
        for index, p_name in enumerate(names):
            if p_name == name:
                # check both bounds independently: with `elif`, a year that
                # raises max_year would never be compared against min_year
                if years[index] > max_year:
                    max_year = years[index]
                if years[index] < min_year:
                    min_year = years[index]
                dict1['form_title'] = titles[index]
        dict1['max_year'] = max_year
        dict1['min_year'] = min_year
        final_dict.append(dict1)
    print(json.dumps(final_dict, indent=2))
    return names, links, years
def download_files(list_tax_form):
    """
    2 Task
    Download the PDF files for the form name entered by the user.
    :param list_tax_form: list of form names that we want to get info about
    :return: None; prints a message telling the user whether the files were saved
    """
    names, links, years = format_responses(list_tax_form)
    form_name = input('enter form name: ')
    if form_name in names:
        print('form exists. enter years range')
        form_year1 = int(input('start year to analysis: '))
        form_year2 = int(input('end year to analysis: '))
        os.makedirs(form_name, exist_ok=True)
        # indices delimiting the (contiguous) rows of form_name in the result lists
        l_index = names.index(form_name)  # first row of form_name
        r_index = l_index + names.count(form_name)  # one past the last row
        years = years[l_index:r_index]
        links = links[l_index:r_index]  # keep links aligned with the sliced years
        start, end = sorted((form_year1, form_year2))  # inclusive, in either order
        for year in range(start, end + 1):
            if year in years:
                link = links[years.index(year)]
                form_file = requests.get(link, allow_redirects=True)
                with open(f'{form_name}/{form_name}_{year}.pdf', 'wb') as pdf:
                    pdf.write(form_file.content)
        print(f'files saved to {form_name}/ directory!')
    else:
        print('input correct form name!')
if __name__ == '__main__':
    tax_list = sys.argv[1:]  # form names
    download_files(tax_list)
For example, a search for "Form W-2" should not also return "Form W-2 P".
When I run this file, it also displays other, unrelated results.
How can I resolve this so that only the forms the user requested are displayed?
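One way to get exact matches is to compare each scraped form name with the name that was requested and drop everything else, since, as described above, the search also returns related forms such as "Form W-2 P". Below is a minimal sketch, assuming the rows have already been scraped into (form_name, title, year) tuples; normalize and exact_matches are hypothetical helper names, and treating names case-insensitively with collapsed whitespace is an assumption about what counts as "the same" form:

```python
import re


def normalize(name: str) -> str:
    """Collapse runs of whitespace and ignore case, so ' form  w-2 ' matches 'Form W-2'."""
    return re.sub(r"\s+", " ", name).strip().casefold()


def exact_matches(rows, requested):
    """Keep only rows whose form name equals one of the requested names.

    rows: iterable of (form_name, title, year) tuples scraped from the results table.
    requested: the form names the user asked for.
    """
    wanted = {normalize(name) for name in requested}
    return [row for row in rows if normalize(row[0]) in wanted]
```

In the question's code, this comparison would fit where the responses are parsed: pair each response with the name that produced it (for example with zip(list_tax_form, responses)) and keep a LeftCellSpacer cell only if its text equals that name, keeping or dropping the title and year cells of the same table row together with it so the three lists stay aligned.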
Recent questions on StackOverflow pertaining to Mixins in Raku have piqued my interest as to whether Mixins can be applied to replicate features present in other programming languages.
For example, in the R-programming language, elements of a vector can be given a name (i.e. an attribute), which is very convenient for data analysis. For an excellent example see: "How to Name the Values in Your Vectors in R" by Andrie de Vries and Joris Meys, who illustrate this feature using R's built-in islands dataset. Below is a more prosaic example (code run in the R-REPL):
> #R-code
> x <- 1:4
> names(x) <- LETTERS[1:4]
> str(x)
Named int [1:4] 1 2 3 4
- attr(*, "names")= chr [1:4] "A" "B" "C" "D"
> x
A B C D
1 2 3 4
> x[1]
A
1
> sum(x)
[1] 10
Below I try to replicate R's 'named-vectors' using the same islands dataset used by de Vries and Meys. While the script below runs and (generally, see #3 below) produces the desired/expected output, I'm left with three main questions, at bottom:
#Raku-script below;
put "Read in data.";
my $islands_A = <11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82>.split(","); #Area
my $islands_N = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>; #Name
"----".say;
put "Count elements (Area): ", $islands_A.elems; #OUTPUT 48
put "Count elements (Name): ", $islands_N.elems; #OUTPUT 48
"----".say;
put "Create 'named vector' array (and output):\n";
my @islands;
my $i=0;
for (1..$islands_A.elems) {
    @islands[$i] := $islands_A[$i] but $islands_N[$i].Str;
    $i++;
};
say "All islands (returns Area): ", @islands; #OUTPUT: returns 48 areas (above)
say "All islands (returns Name): ", @islands>>.Str; #OUTPUT: returns 48 names (above)
say "Islands--slice (returns Area): ", @islands[0..3]; #OUTPUT: (11506 5500 16988 2968)
say "Islands--slice (returns Name): ", @islands[0..3]>>.Str; #OUTPUT: (Africa Antarctica Asia Australia)
say "Islands--first (returns Area): ", @islands[0]; #OUTPUT: 11506
say "Islands--first (returns Name): ", @islands[0]>>.Str; #OUTPUT: (Africa)
put "Islands--first (returns Name): ", @islands[0]; #OUTPUT: Africa
put "Islands--first (returns Name): ", @islands[0]>>.Str; #OUTPUT: Africa
1. Is there a simpler way to write the Mixin loop ...$islands_A[$i] but $islands_N[$i].Str;? Can the loop be obviated entirely?
2. Can a named-vector or nvec wrapper be written around put that will return (name)\n(value) in the same manner that R does, even for single elements? Might Raku's Pair method be useful here?
3. Related to #2 above, calling put on the single-element @islands[0] returns the name Africa, not the Area value 11506. [Note this doesn't happen with the call to say.] Is there any simple code that can be implemented to ensure that put always returns the (numeric) value, or always returns the (Mixin) name, for slices of any length?
Is there a simpler way?
Yes, using the zip meta operator Z combined with infix but:
my @islands = $islands_A[] Z[but] $islands_N[];
Why don't you modify the array to change the format?
put calls .Str on the value it gets; say calls .gist.
If you want put to output some specific text, make sure that the .Str method outputs that text.
I don't think you actually want put to output that format, though. I think you want say to output that format.
That is because say is meant for humans to read, and you want its output to be nicer for humans.
When you have a question of “Can Raku do X”, the answer is invariably yes; it's just a matter of how much work it would be, and whether you would still call it Raku at that point.
The question you really want to ask is how easy it is to do X.
I went ahead and implemented something like what the link you provided describes.
Note that this was just a quick implementation that I created right before bed. So think of this as a first rough draft.
If I were actually going to do this for real, I would probably throw this away and start over after spending days learning enough R to figure out what it is actually doing.
class NamedVec does Positional does Associative {
    has @.names is List;
    has @.nums is List handles <sum>;
    has %!kv is Map;
    class Partial {
        has $.name;
        has $.num;
    }
    submethod TWEAK {
        %!kv := %!kv.new: @!names Z=> @!nums;
    }
    method from-pairlist ( +@pairs ) {
        my @names;
        my @nums;
        for @pairs -> (:$key, :$value) {
            push @names, $key;
            push @nums, $value;
        }
        self.new: :@names, :@nums
    }
    method from-list ( +@list ){
        my @names;
        my @nums;
        for @list -> (:$name, :$num) {
            push @names, $name;
            push @nums, $num;
        }
        self.new: :@names, :@nums
    }
    method gist () {
        my @widths = @!names».chars Zmax @!nums».chars;
        sub infix:<fmt> ( $str, $width is copy ){
            $width -= $str.chars;
            my $l = $width div 2;
            my $r = $width - $l;
            (' ' x $l) ~ $str ~ (' ' x $r)
        }
        (@!names Zfmt @widths) ~ "\n" ~ (@!nums Zfmt @widths)
    }
    method R-str () {
        chomp qq :to/END/
            Named num [1:@!nums.elems()] @!nums[]
            - attr(*, "names")= chr [1:@!names.elems()] @!names.map(*.raku)
            END
    }
    method of () {}
    method AT-POS ( $i ){
        Partial.new: name => @!names[$i], num => @!nums[$i]
    }
    method AT-KEY ( $name ){
        Partial.new: :$name, num => %!kv{$name}
    }
}
multi sub postcircumfix:<{ }> (NamedVec:D $v, Str:D $name){
    $v.from-list: callsame
}
multi sub postcircumfix:<{ }> (NamedVec:D $v, List \l){
    $v.from-list: callsame
}
my $islands_A = <11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82>.split(","); #Area
my $islands_N = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>;
# either will work
#my $islands = NamedVec.from-pairlist( $islands_N[] Z=> $islands_A[] );
my $islands = NamedVec.new( names => $islands_N, nums => $islands_A );
put $islands.R-str;
say $islands<Asia Africa Antarctica>;
say $islands.sum;
A named vector essentially combines a vector with a map from names to integer positions and allows you to address elements by name. Naming a vector alters the behavior of the vector, not that of its elements. So in Raku we need to define a role for an array:
role Named does Associative {
    has $.names;
    has %!index;
    submethod TWEAK {
        my $i = 0;
        %!index = map { $_ => $i++ }, $!names.list;
    }
    method AT-KEY($key) {
        with %!index{$key} { return-rw self.AT-POS($_) }
        else { self.default }
    }
    method EXISTS-KEY($key) {
        %!index{$key}:exists;
    }
    method gist() {
        join "\n", $!names.join("\t"), map(*.gist, self).join("\t");
    }
}
multi sub postcircumfix:<[ ]>(Named:D \list, \index, Bool() :$named!) {
    my \slice = list[index];
    $named ?? slice but Named(list.names[index]) !! slice;
}
multi sub postcircumfix:<{ }>(Named:D \list, \names, Bool() :$named!) {
    my \slice = list{names};
    $named ?? slice but Named(names) !! slice;
}
Mixing in this role gives you most of the functionality of an R named vector:
my $named = [1, 2, 3] but Named<first second last>;
say $named; # OUTPUT: «first␉second␉last␤1␉2␉3␤»
say $named[0, 1]:named; # OUTPUT: «first␉second␤1␉2␤»
say $named<last> = Inf; # OUTPUT: «Inf␤»
say $named<end>:exists; # OUTPUT: «False␤»
say $named<last end>:named; # OUTPUT: «last␉end␤Inf␉(Any)␤»
As this is just a proof of concept, the Named role doesn't handle the naming of non-existing elements well. It also doesn't support modifying a slice of names. It probably does support creating a pun that can be mixed into more than one list.
Note that this implementation relies on the undocumented fact that the subscript operators are multis. If you want to put the role and operators in a separate file, you probably want to apply the is export trait to the operators.
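The idea in the answer above, a plain sequence plus a separate name-to-position index, is not specific to Raku. For comparison only, here is a minimal, hypothetical Python analog; the NamedVector class is illustrative and is not part of Raku or of any Python library. Like the Named role, it attaches the names to the container rather than to the individual elements, so arithmetic such as sum still sees plain values:

```python
class NamedVector:
    """A list of values plus a name -> position index (hypothetical analog of an R named vector)."""

    def __init__(self, values, names):
        if len(values) != len(names):
            raise ValueError("values and names must have the same length")
        self.values = list(values)
        self.names = list(names)
        self._index = {name: i for i, name in enumerate(self.names)}

    def __getitem__(self, key):
        # Strings are looked up by name; anything else is treated as a position or slice.
        if isinstance(key, str):
            return self.values[self._index[key]]
        return self.values[key]

    def __setitem__(self, key, value):
        pos = self._index[key] if isinstance(key, str) else key
        self.values[pos] = value

    def __contains__(self, name):
        # Mirrors EXISTS-KEY: membership is tested against the names, not the values.
        return name in self._index

    def __iter__(self):
        return iter(self.values)

    def __len__(self):
        return len(self.values)

    def __str__(self):
        # Two tab-separated rows, names over values, similar to the role's gist method.
        return "\t".join(self.names) + "\n" + "\t".join(map(str, self.values))
```

For example, NamedVector([1, 2, 3, 4], list("ABCD")) can be indexed as v["A"] or v[0], and sum(v) adds only the values.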
It might not be the optimal way of doing it (or what you're specifically looking for), but as soon as I saw this particular problem's statement, the first thing that came to mind was Raku's allomorphs, which are types with two related values that are accessible separately depending on context.
my $areas = (11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82);
my $names = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>;
my @islands;
for (0..^$areas) -> \i {
    @islands[i] := IntStr.new($areas[i], $names[i]);
}
say "Areas: ", @islands>>.Int;
say "Names: ", @islands>>.Str;
say "Areas slice: ", (@islands>>.Int)[0..3];
say "Names slice: ", (@islands>>.Str)[0..3];
say "Areas first: ", (@islands>>.Int)[0];
say "Names first: ", (@islands>>.Str)[0];
I think I would just do something like this:
class MyRow {
    has Str $.island is rw;
    has Numeric $.area is rw;
    method Str {
        $!island;
    }
    method Numeric {
        +$!area;
    }
    # does Cool coercion of strings that look numeric
    submethod BUILD ( Numeric(Cool) :$!area, :$!island ) {
    };
}
class MyTable {
    has @.data;
    has MyRow @.rows is rw;
    has %!lookup;
    submethod TWEAK {
        @!rows = gather
            for @!data -> ( $island, $area ) {
                my $row = MyRow.new( :$island, :$area );
                %!lookup{ $island } = $row;
                take $row;
            }
    }
    method find_island( $island ) {
        return %!lookup{ $island };
    }
}
To set up a table:
my @raw = @island_names Z @island_areas;
my $table = MyTable.new( data => @raw );
Accessing the rows of the table by name:
my $row = $table.find_island('Africa');
say $row; # MyRow.new(island => "Africa", area => 11506)
Using the row element like a string gets you the name,
using it like a number gets you the area:
say ~$row; # Africa
say +$row; # 11506
One of the features here is that you can add more fields to your
rows, you're not constrained to just a value and a name.
The "find_island" method uses an internal %lookup hash to index
the rows by island name, but unlike a simple hash solution
there's no uniqueness constraint: if you have a duplicate island
name, "find_island" will locate the latest row in the set, but
the other row would still be there.
Caveat: I haven't thought much about how well this supports
dynamically adding more rows to the table.
I'm trying to chop out some lists from the following JSON using pljson:
my_json := json('{"Order":
{"no": 1, "batch": 2,"id": 3,"quantity": 10,
"sm_pack": [
{
"no": 10,
"id": 1010,
"quantity": 2
},
{
"no": 11,
"id": 1040,
"quantity": 8
}
],
"sm_size": [
{ ....etc etc
However, I can't get it to work.
I can print the data using this syntax:
v_myjson.path('Order.sm_pack').print;
v_myjson.path('Order.sm_pack[1].no').print;
But how can I assign all those different lists to variables for further processing? I tried different versions of "v_list := json_list(my_json.get('Order.sm_pack'))", "my_json.get('sm_pack')" and so on, but whatever I try, I get "NULL SELF", and I seem to have turned blind.
Printing JSON lists and objects differs from assigning them to variables so that you can manipulate them. Here is how that works with your example:
DECLARE
    obj   json      := json();
    obj_1 json      := json();
    arr   json_list := json_list();
    val   NUMBER;
BEGIN
    /*Create your object*/
    obj := json('{"Order":{"no":1,"batch":2,"id":3,"quantity":10,"sm_pack":[{"no":10,"id":1010,"quantity":2},{"no":11,"id":1040,"quantity":8}],"sm_size":[{"no":10,"id":1010,"quantity":2},{"no":11,"id":1040,"quantity":8}]}}');
    /*Assign object*/
    obj_1 := json(obj.get('Order'));
    /*Assign list from within the object*/
    arr := json_list(obj_1.get('sm_pack'));
    arr.print;
    --or
    arr := json_list(json(obj.get('Order')).get('sm_pack'));
    arr.print;
    /*Get a value from an object within the list (here the second element)*/
    val := json_ext.get_number(json(arr.get(2)), 'id');
    DBMS_OUTPUT.PUT_LINE(val);
END;
/
Notice that I used the get_number function because your values are not in quotes; for quoted (string) values, I would use get_string instead.
Hope that helps!