Why a list of pairs instead of hashes from map? [duplicate]

This is a bit of unexpected behavior that's likely to bite beginners. First, is this intended? Second, what other things does Raku use to guess which object to create? Does it start off thinking it's a Block or a Hash and change later, or does it decide at the end?
You can construct a Hash with braces and the fat arrow:
my $color-name-to-rgb = {
'red' => 'FF0000',
};
put $color-name-to-rgb.^name; # Hash
Using the other Pair notation creates a Hash too.
my $color-name-to-rgb = {
:red('FF0000'),
};
But, absent the fat arrow, I get a Block instead:
my $color-name-to-rgb = {
'red', 'FF0000',
};
put $color-name-to-rgb.^name; # Block
The Hash docs only mention that using $_ inside the braces creates a Block.
There are other ways to define a hash, but I'm asking about this particular bit of syntax and not looking for the workarounds I already know about.
$ perl6 -v
This is Rakudo version 2017.04.3 built on MoarVM version 2017.04-53-g66c6dda
implementing Perl 6.c.

When it's a Hash
Your question[1] and this answer only apply to braced blocks in term position[2].
Braced code that precisely follows the rule explained below constructs a Hash:
say WHAT { } # (Hash)
say WHAT { %foo } # (Hash)
say WHAT { %foo, ... } # (Hash)
say WHAT { foo => 42, ... } # (Hash)
say WHAT { :foo, ... } # (Hash)
say WHAT { key => $foo, ... } # (Hash)
The rule
If the block is empty, or contains just a list whose first element is a % sigil'd variable (e.g. %foo) or a literal pair (e.g. :bar), and it does not have a signature or include top level statements, it's a Hash. Otherwise it's a Block.
To force Block or Hash interpretation
To force a {...} term to construct a Block instead of a Hash, write a ; at the start i.e. { ; ... }.
To write an empty Block term, write {;}.
To write an empty Hash term, write {}.
To force a {...} term to construct a Hash instead of a Block, follow the rule (explained in detail in the rest of this answer), or write %(...) instead.
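For instance, these follow directly from the rules above:
say WHAT { ; :foo }     # (Block)
say WHAT {;}            # (Block)
say WHAT {}             # (Hash)
say WHAT %( 'a', 'b' )  # (Hash)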
An explicit signature means it's a Block
Some braced code has an explicit signature, i.e. it has explicit parameters such as $foo below. It always constructs a Block no matter what's inside the braces:
say WHAT { key => $foo, 'a', 'b' } # (Hash)
say WHAT -> $foo { key => $foo, 'a', 'b' } # (Block)
An implicit signature also means it's a Block
Some braced code has an implicit signature that is generated due to some explicit choice of coding within the block:
Use of a "pronoun" inside {...} means it's a Block with a signature (an implicit signature if it doesn't already have an explicit one). The pronouns are $_, @_, and %_.
This includes implied use of $_ inside {...} due to a .method call with no left hand side argument. In other words, even { .foo } has a signature ((;; $_? is raw)) due to .foo's lack of a left hand side argument.
Use of a "placeholder" variable (e.g. $^foo) also means it's a Block.
As with an explicit signature, if braced code has an implicit signature then it always constructs a Block no matter what's inside the braces:
say WHAT { key => $_ } # (Block)
say WHAT { key => 'value', .foo, .bar } # (Block)
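A placeholder variable has the same effect, e.g.:
say WHAT { key => $^foo } # (Block)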
Top level statements mean it's a Block
say WHAT { :foo; (do 'a'), (do 'b') } # (Block)
say WHAT { :foo, (do 'a'), (do 'b') } # (Hash)
The second line contains multiple statements but they're producing values within individual elements of a list that's the single top level expression.
A top level declaration of an identifier means it's a Block
A declaration is a statement, but I've included this section just in case someone doesn't realize that.
say WHAT { :foo, $baz, {my $bar} } # (Hash)
say WHAT { :foo, $baz, (my $bar) } # (Block)
The first line contains a Block as one of its values, and that Block contains a declaration (my $bar). But that declaration belongs to the inner {my $bar} Block, not the outer {...}. So the inner Block is just a value as far as the outer {...} is concerned, and thus that outer braced code is still interpreted as a Hash.
In contrast the second line declares a variable directly within the outer {...}. So it's a Block.
Still Blocks, not Hashes
Recall that, to be a Hash, the content of braced code must be a list that begins with either a % sigil'd variable or a literal pair. So these all produce Blocks:
my $bar = key => 'value';
say WHAT { $bar, %baz } # (Block)
say WHAT { |%baz } # (Block)
say WHAT { %#quux } # (Block)
say WHAT { 'a', 'b', key => $foo } # (Block)
say WHAT { Pair.new: 'key', $foo } # (Block)
Footnotes
[1] This "Hash or Block?" question is an example of DWIM design. In Raku culture, good DWIM design is considered a good thing. But every DWIM comes with corresponding WATs[3]. The key to good DWIM design is ensuring that, in general, WATs' barks are worse than their bites[4]; and that the barks are useful[5]; and that the net benefits of the DWIM are considered to far outweigh all the barking and biting.[6]
[2] A term is Raku's analog of a noun or noun phrase in English. It's a value.
Examples of braced blocks that are terms:
.say given { ... } # closure? hash?
say 42, { ... } # closure? hash?
Examples of braced blocks that are not terms:
if True { ... } # always a closure
class foo { ... } # always a package
put bar{ ... } # always a hash index
This answer only discusses braced blocks that are terms. For more details about terms, or more specifically "term position" (places in the grammar where a braced block will be interpreted as a term), see the comments below this answer.
[3] WAT refers to a dev's incredulous surprise when something seems crazy to them. It's known that, even for well designed DWIMs, for each one that works for most folk, most of the time, there are inevitably one or more related WATs that surprise some folk, some of the time, including some of the same folk who at other times benefit from the DWIM.
[4] The bite of the WATs related to this DWIM varies. It's typically a bark (error message) that makes the problem obvious. But it can also be much more obscure:
say { a => 42 }() ; # No such method 'CALL-ME' for invocant of type 'Hash' WAT? Oh.
say { a => $_ }<a> ; # Type Block does not support associative indexing. WAT? Oh.
say { a => $_, b => 42, c => 99 } .elems # 1 WAT?????
[5] A "bark" is an error message or warning in documentation. These can often be improved. cf Lock.protect({}) fails, but with surprising message.
[6] Community member opinions differ on whether DWIM design in general, or any given DWIM in particular, is worth it. cf my perspective vs Sam's answer to this question.

The preferred Perl6 way is to use %( ) to create hashes.
my $color-name-to-rgb = %(
'red', 'FF0000',
);
I would not recommend people use braces to create hashes, ever. If they want to make a hash then %( ) is the proper way to do it.
If you are coming from the Perl 5 world it's best to just get in the habit of using %( ) instead of { } when creating a Hash.

Related

Add an element to arrays that are values of a given key name (JSON transformation with jq)

I'm a jq newbie, and I'm trying to transform a JSON document (a Swagger spec). I want to add an element to the array value of the "parameters" keys:
{
  ...
  "paths": {
    "/great/endpoint1": {
      "get": {
        "parameters": []    <<--- add a value here
      }
    },
    "/great/endpoint2": {
      "post": {
        "parameters": []    <<--- and here too, etc.
  ....
The following jqplay almost works. It adds values to the right arrays, but it has the nasty side effect of also removing the "x-id" value from the root of the input JSON. It's probably because of a faulty if-condition. As the paths contain varying strings (the endpoint names), I don't know how to write a wildcard path expression to address them, which is why I have tried using walk instead:
https://jqplay.org/s/az56quLZa3
Since the sample data is incomplete, it's difficult to say exactly what you're looking for, but it looks like you should be using "parameters" in the call to walk:
walk(if type=="object" and has("parameters")
then .parameters += [{"extra": "value"}]
else . end)
If you want to restrict the walk to the top-level paths, you would preface the above with: .paths |=
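Putting the two together would look something like this (untested sketch; {"extra": "value"} is still just a placeholder for whatever parameter object you actually want to add):
.paths |= walk(if type=="object" and has("parameters")
               then .parameters += [{"extra": "value"}]
               else . end)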

"exponentially large number of cases" errors in latest Flow with common spread pattern

I frequently use the following pattern to create objects with null/undefined properties omitted:
const whatever = {
  something: true,
  ...(a ? { a } : null),
  ...(b ? { b } : null),
};
As of flow release v0.112, this leads to the error message:
Computing object literal [1] may lead to an exponentially large number of cases to reason about because conditional [2] and conditional [3] are both unions. Please use at most one union type per spread to simplify reasoning about the spread result. You may be able to get rid of a union by specifying a more general type that captures all of the branches of the union.
It sounds to me like this isn't really a type error, just that Flow is trying to avoid some heavier computation. This has led to dozens of flow errors in my project that I need to address somehow. Is there some elegant way to provide better type information for these? I'd prefer not to modify the logic of the code, I believe that it works the way that I need it to (unless someone has a more elegant solution here as well). Asking here before I resort to // $FlowFixMe for all of these.
Complete example on Try Flow
It's not as elegant to write, and I think Flow should handle the case that you've shown, but if you still want Flow to type check it you could try rewriting it like this:
/* @flow */
type A = {
  cat: number,
};
type B = {
  dog: string,
};
type Built = {
  something: boolean,
  a?: A,
  b?: B,
};
function buildObj(a?: ?A, b?: ?B): Built {
  const ret: Built = {
    something: true,
  };
  if (a) ret.a = a;
  if (b) ret.b = b;
  return ret;
}
Try Flow
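A call site would then look something like this (assuming the same A and B shapes as above):
const whatever: Built = buildObj({ cat: 1 }, null);
// a is included, b is simply omitted, and the result still satisfies Built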

Suppress empty result from 'many' in 'seq(p, many(p))' construct with parser combinators

I'm trying to build parser combinators following Hutton and Meijer, "Monadic Parser Combinators". My implementation is in PostScript, but I think my issue is general to combinator parsers and not my specific implementation.
As a small exercise, I'm using the parsers to recognize regular expressions.
(pc9.ps)run
/Dot (.) char def
/Meta (*+?) anyof def
/Character (*+?.|()) noneof def
/Atom //Dot
//Character plus def
/Factor //Atom //Meta maybe seq def
/Term //Factor //Factor many seq def
/Expression //Term (|) char //Term xthen many seq def
/regex { string-input //Expression exec ps } def
(abc|def|ghi) regex
quit
It's working, but the output has lots of [] empty arrays that really get in the way when I try to bind handlers to process the values.
$ gsnd -q -dNOSAFER pc9re2.ps
stack:
[[[[[97 []] [[98 []] [[99 []] []]]] [[[100 []] [[101 []] [[102 []]
[]]]] [[[103 []] [[104 []] [[105 []] []]]] []]]] null]]
These are happening whenever a seq sequencing combinator accepts the result from maybe or many (which uses maybe) that had zero occurrences.
What is the normal way of excluding this extra noise in the output with Parser Combinators?
github repo
Sigh. It seems I can just implement around it. I added special code in seq to detect an empty right-hand side and just discard it. On to other problems...
Edit: I encountered the same problem again in version 11 (and a half). Now I've got a better solution IMO:
https://groups.google.com/g/comp.lang.functional/c/MbJxrJSk8Mw/m/MoT3Dr0IAwAJ
Ugh. I think it wasn't even an X/Y problem. It was a "doctor, it hurts when I move my arm like this; ... so don't move your arm like that" problem.
I want the "result" part of the "reply" structure (using new terms following usage from the Parsec document) to be any of the usual PostScript types: integer, real, string, boolean, array, dictionary. But I also need some way to arbitrarily combine or concatenate two objects regardless of type. My then (aka seq) combinator needs to do this. So I made a hack-y function that does the combining. If it has two arrays, it composes the contents into a longer array. If it has one array and some other object, it extends the array by one and stuffs the object in the front or back as appropriate. If it has two non-array objects, it makes a new 2-element array to contain them.
So, instead of building xthen and thenx off of then and needing to cons, car, and cdr the stuff, I can write all 3 of these as a more general parameterized function.
sequence{ p q u }{
{ /p exec +is-ok {
next x-xs force /q exec +is-ok {
next x-xs 3 1 roll /u exec exch consok
}{
x-xs 3 2 roll ( after ) exch cons exch cons cons
} ifelse
} if } ll } #func
then { {append} sequence }
xthen { {exch pop} sequence }
thenx { {pop} sequence }
append { 1 index zero eq { exch pop }{
dup zero eq { pop }{
1 index type /arraytype eq {
dup type /arraytype eq { compose }{ one compose } ifelse
}{ dup type /arraytype eq { curry }{ cons } ifelse } ifelse } ifelse } ifelse }
(#func is my own non-standard extension to PostScript that takes a procedure body and a list of parameters and wraps the procedure with code that defines the arguments in a local dictionary. ll is my hack-y PostScript way of making lambdas with hard-patched parameters; it's short for load all literals.)
The code also treats executable arrays (i.e. PostScript procedures) as a non-array for the purpose of combining sequences of results. This allows the parser to be used as a syntax-directed compiler producing procedures as output.

idl: pass keyword dynamically to isa function to test structure read by read_csv

I am using IDL 8.4. I want to use the isa() function to determine the type of each field read by read_csv(). I want to use /number, /integer, /float and /string, since I need some fields to be float, others to be integer, and for others I don't care. I can do it like this, but it is not very readable:
str = read_csv(filename, header=inheader)
; TODO check header
if not isa(str.(0), /integer) then stop
if not isa(str.(1), /number) then stop
if not isa(str.(2), /float) then stop
I am hoping I can do something like
expected_header = ['id', 'x', 'val']
expected_type = ['/integer', '/number', '/float']
str = read_csv(filename, header=inheader)
if not array_equal(strlowcase(inheader), expected_header) then stop
for i=0l, n_elements(expected_type)-1 do begin
  if not isa(str.(i), expected_type[i]) then stop
endfor
The above doesn't work, as '/integer' is taken literally and I guess isa() is looking for a named structure. How can you do something similar?
Ideally I want to pick expected type based on header read from file, so that script still works as long as header specifies expected field.
EDIT:
My tentative solution is to write a wrapper for ISA(). Not very pretty, but it does what I wanted... if there is a cleaner solution, please let me know.
Also, read_csv is defined to return only one of long, long64, double and string, so I could write the function to test only within that limitation, but I just wanted to make it work in general so that I can reuse it for other similar cases.
function isa_generic, var, typ
  ; calls isa() http://www.exelisvis.com/docs/ISA.html with a keyword
  ; if 'n', test /number
  ; if 'i', test /integer
  ; if 'f', test /float
  ; if 's', test /string
  if typ eq 'n' then return, isa(var, /number)
  if typ eq 'i' then return, isa(var, /integer)
  if typ eq 'f' then return, isa(var, /float)
  if typ eq 's' then return, isa(var, /string)
  print, 'unexpected typename: ', typ
  stop
end
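With that wrapper, the check from the question becomes something like:
expected_type = ['i', 'n', 'f']
str = read_csv(filename, header=inheader)
for i = 0L, n_elements(expected_type) - 1 do begin
  if not isa_generic(str.(i), expected_type[i]) then stop
endfor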
IDL has some limited reflection abilities, which will do exactly what you want:
expected_types = ['integer', 'number', 'float']
expected_header = ['id', 'x', 'val']
str = read_csv(filename, header=inheader)
if ~array_equal(strlowcase(inheader), expected_header) then stop
foreach type, expected_types, index do begin
if ~isa(str.(index), _extra=create_struct(type, 1)) then stop
endforeach
It's debatable if this is really "easier to read" in your case, since there are only three cases to test. If there were 500 cases, it would be a lot cleaner than writing 500 slightly different lines.
This snippet uses some rather esoteric IDL features, so let me explain what's happening a bit:
expected_types is just a list of (string) keyword names in the order they should be used.
The foreach part iterates over expected_types, putting the keyword string into the type variable and the iteration count into index.
This is equivalent to using for index = 0, n_elements(expected_types) - 1 do and then using expected_types[index] instead of type, but the foreach loop is easier to read IMHO. Reference here.
_extra is a special keyword that can pass a structure as if it were a set of keywords. Each of the structure's tags is interpreted as a keyword. Reference here.
The create_struct function takes one or more pairs of (string) tag names and (any type) values, then returns a structure with those tag names and values. Reference here.
Finally, I replaced not (bitwise not) with ~ (logical not). This step, like foreach vs for, is not necessary in this instance, but can avoid headache when debugging some types of code, where the distinction matters.
--
Reflective abilities like these can do an awful lot, and come in super handy. They're work-horses in other languages, but IDL programmers don't seem to use them as much. Here's a quick list of common reflective features I use in IDL, with links to the documentation for each:
create_struct - Create a structure from (string) tag names and values.
n_tags - Get the number of tags in a structure.
_extra, _strict_extra, and _ref_extra - Pass keywords by structure or reference.
call_function - Call a function by its (string) name.
call_procedure - Call a procedure by its (string) name.
call_method - Call a method (of an object) by its (string) name.
execute - Run complete IDL commands stored in a string.
Note: Be very careful using the execute function. It will blindly execute any IDL statement you (or a user, file, web form, etc.) feed it. Never ever feed untrusted or web user input to the IDL execute function.
You can't access the keywords quite like that, but there is a typename parameter to ISA that might be useful. This is untested, but should work:
expected_header = ['id', 'x', 'val']
expected_type = ['int', 'long', 'float']
str = read_csv(filename, header=inheader)
if not array_equal(strlowcase(inheader), expected_header) then stop
for i = 0L, n_elements(expected_type) - 1L do begin
  if not isa(str.(i), expected_type[i]) then stop
endfor

Weird behaviour with struct constructors

I've written a basic Node struct in D, designed to be used as a part of a tree-like structure. The code is as follows:
import std.algorithm: min;
alias Number = size_t;
struct Node {
    private {
        Node* left, right, parent;
        Number val;
    }
    this(Number n) { val = n; }
    this(ref Node u, ref Node v) {
        this.left = &u;
        this.right = &v;
        val = min(u.val, v.val);
        u.parent = &this;
        v.parent = &this;
    }
}
Now, I wrote a simple function which is supposed to give me a Node (meaning a whole tree) with the argument array providing the leaves, as follows.
alias Number = size_t;
Node make_tree (Number[] nums) {
    if (nums.length == 1) {
        return Node(nums[0]);
    } else {
        Number half = nums.length/2;
        return Node(make_tree(nums[0..half]), make_tree(nums[half..$]));
    }
}
Now, when I try to run it through dmd, I get the following error message:
Error: constructor Node.this (ulong n) is not callable using argument types (Node, Node)
This makes no sense to me - why is it trying to call a one-argument constructor when given two arguments?
The problem has nothing to do with constructors. It has to do with passing by ref. The constructor that you're trying to use
this(ref Node u, ref Node v) {...}
accepts its arguments by ref. That means that they must be lvalues (i.e. something that can be on the left-hand side of an assignment). But you're passing it the result of a function call which does not return by ref (so, it's returning a temporary, which is an rvalue - something that can go on the right-hand side of an assignment but not the left). So, what you're trying to do is illegal. Now, the error message isn't great, since it's giving an error with regards to the first constructor rather than the second, but regardless, you don't have a constructor which matches what you're trying to do. At the moment, I can think of 3 options:
Get rid of the ref on the constructor's parameters. If you're only going to be passing it the result of a function call like you're doing now, having it accept ref doesn't help you anyway. The returned value will be moved into the function's parameter, so no copy will take place, and ref isn't buying you anything. Certainly, assigning the return values to local variables so that you can pass them to the constructor as it's currently written would lose you something, since then you'd be making unnecessary copies.
Overload the constructor so that it accepts either ref or non-ref. e.g.
void foo(ref Bar b) { ... }
void foo(Bar b) { foo(b); } //this calls the other foo
In general, this works reasonably well when you have one parameter, but it would be a bit annoying here, because you end up with an exponential explosion of function signatures as you add parameters. So, for your constructor, you'd end up with
this(ref Node u, ref Node v) {...}
this(ref Node u, Node v) { this(u, v); }
this(Node u, ref Node v) { this(u, v); }
this(Node u, Node v) { this(u, v); }
And if you added a 3rd parameter, you'd end up with eight overloads. So, it really doesn't scale beyond a single parameter.
Templatize the constructor and use auto ref. This essentially does what #2 does, but you only have to write the function once:
this()(auto ref Node u, auto ref Node v) {...}
This will then generate a copy of the function to match the arguments given (up to 4 different versions of it with the full function body in each rather than 3 of them just forwarding to the 4th one), but you only had to write it once. And in this particular case, it's probably reasonable to templatize the function, since you're dealing with a struct. If Node were a class though, it might not make sense, since templated functions can't be virtual.
So, if you really want to be able to pass by ref, then in this particular case, you should probably go with #3 and templatize the constructor and use auto ref. However, personally, I wouldn't bother. I'd just go with #1. Your usage pattern here wouldn't get anything from auto ref, since you're always passing it two rvalues, and your Node struct isn't exactly huge anyway, so while you obviously wouldn't want to copy it if you don't need to, copying an lvalue to pass it to the constructor probably wouldn't matter much unless you were doing it a lot. But again, you're only going to end up with a copy if you pass it an lvalue, since an rvalue can be moved rather than copied, and you're only passing it rvalues right now (at least with the code shown here). So, unless you're doing something different with that constructor which would involve passing it lvalues, there's no point in worrying about lvalues - or about the Nodes being copied when they're returned from a function and passed into the constructor (since that's a move, not a copy). As such, just removing the refs would be the best choice.
