I have JSON coming from an external application, formatted like so:
{
"ticket_fields": [
{
"url": "https://example.com/1122334455.json",
"id": 1122334455,
"type": "tagger",
"custom_field_options": [
{
"id": 123456789,
"name": "I have a problem",
"raw_name": "I have a problem",
"value": "help_i_have_problem",
"default": false
},
{
"id": 456789123,
"name": "I have feedback",
"raw_name": "I have feedback",
"value": "help_i_have_feedback",
"default": false
}
]
},
{
"url": "https://example.com/6677889900.json",
"id": 6677889900,
"type": "tagger",
"custom_field_options": [
{
"id": 321654987,
"name": "United States",
"raw_name": "United States",
"value": "location_123_united_states",
"default": false
},
{
"id": 987456321,
"name": "Germany",
"raw_name": "Germany",
"value": "location_456_germany",
"default": false
}
]
}
]
}
The end goal is to be able to get the data into a TSV in the sense that each object in the custom_field_options array is grouped by the parent ID (ticket_fields.id), and then transposed such that each object would be represented on a single line, like so:
Ticket Field ID	Name	Value
1122334455	I have a problem	help_i_have_problem
1122334455	I have feedback	help_i_have_feedback
6677889900	United States	location_123_united_states
6677889900	Germany	location_456_germany
I have been able to export the data successfully to TSV already, but it reads per-line, and without preserving order, like so:
Using jq -r '.ticket_fields[] | select(.type=="tagger") | [.id, .custom_field_options[].name, .custom_field_options[].value] | @tsv'
Ticket Field ID	Name	Name	Value	Value
1122334455	I have a problem	I have feedback	help_i_have_problem	help_i_have_feedback
6677889900	United States	Germany	location_123_united_states	location_456_germany
Each of the custom_field_options arrays in production may consist of any number of objects (not limited to 2 each). But I seem to be stuck on how to appropriately group or map these objects to their parent ticket_fields.id and to transpose the data in a clean manner. The select(.type=="tagger") is mentioned in the query as there are multiple values for ticket_fields.type which need to be filtered out.
Based on another answer on here, I did try variants of jq -r '.ticket_fields[] | select(.type=="tagger") | map(.custom_field_options |= from_entries) | group_by(.custom_field_options.ticket_fields) | map(map( .custom_field_options |= to_entries))' without success. Any assistance would be greatly appreciated!
You need two nested iterations, one in each array. Save the value of .id in a variable to access it later.
jq -r '
.ticket_fields[] | select(.type=="tagger") | .id as $id
| .custom_field_options[] | [$id, .name, .value]
| @tsv
'
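For readers who would rather post-process the JSON outside jq, the same two nested iterations can be sketched in Python. This is only an illustration and not part of the answer above: the function name options_to_tsv is hypothetical, and the sample document is trimmed from the question, with one made-up non-tagger field added to exercise the filter.

```python
import csv
import io


def options_to_tsv(doc):
    """Return one TSV row per option: parent field id, option name, option value."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for field in doc["ticket_fields"]:              # outer iteration
        if field.get("type") != "tagger":           # same filter as select(.type=="tagger")
            continue
        field_id = field["id"]                      # plays the role of `.id as $id`
        for option in field.get("custom_field_options", []):  # inner iteration
            writer.writerow([field_id, option["name"], option["value"]])
    return out.getvalue()


# Sample trimmed from the question; the "text" field is a made-up entry
# added only to show that non-tagger fields are skipped.
sample = {
    "ticket_fields": [
        {"id": 1122334455, "type": "tagger", "custom_field_options": [
            {"name": "I have a problem", "value": "help_i_have_problem"},
            {"name": "I have feedback", "value": "help_i_have_feedback"},
        ]},
        {"id": 1, "type": "text"},
        {"id": 6677889900, "type": "tagger", "custom_field_options": [
            {"name": "United States", "value": "location_123_united_states"},
            {"name": "Germany", "value": "location_456_germany"},
        ]},
    ]
}

print(options_to_tsv(sample), end="")
```

As in the jq filter, the parent id is captured once per field and repeated on every row produced by the inner loop, so any number of options per field is handled.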
I have trained a newsmap model in the Newsmap package for quanteda in R and am trying to export the large dictionary it constructed based on my corpus (not the seed dictionary).
I have tried this code, but it only gives me the 10 most associated terms per country in a list format, which I also fail to extract in order to form a dictionary object I can use in R.
Dict <-coef(model)
I would really appreciate any and all help!
You only need to extract the names of the coefficient vectors, passing the desired number of words as n.
> quanteda::dictionary(lapply(coef(model, n = 1000), FUN = names))
Dictionary object with 226 key entries.
- [bi]:
- burundi, burundi's, bujumbura, burundian, nkurunziza, uprona, msd, nduwimana, hutus, tutsi, radebe, drcongo, rapporteur, elderly, mushikiwabo, generation, kayumba, faustin, hutu, olga [ ... and 980 more ]
- [dj]:
- djibouti, djibouti's, djiboutian, western-led, pretty, photo, watkins, ask, entebbe, westerners, mujahideen, salvation, osprey, persistent, horn, afdb, donors, ismael, nevis, grenade [ ... and 980 more ]
- [er]:
- eritrea, eritreans, eritrean, keetharuth, issaias, eritrea's, binnie, sheila, somaliland, catania, mandeb, brutal, sicily's, lana, horn, lampedusa, aman, afdb, donors, monitoring [ ... and 980 more ]
- [et]:
- ethiopia, ethiopian, addis, ababa, addis, ababa, hailemariam, desalegn, ethiopians, maasho, ethiopia's, mandeb, igad, dibaba, genzebe, mesfin, bekele, spla, shrikesh, laxmidas [ ... and 980 more ]
- [ke]:
- kenya, kenyan, nairobi, nairobi, uhuru, lamu, mombasa, mpeketoni, kenyans, kws, nairobi's, akwiri, ruto, westgate, kenyatta's, mombasa, makaburi, kenyatta, kenya's, ol [ ... and 980 more ]
- [km]:
- comoros, mazen, emiratis, oil-rich, canterbury, lahiya, shoukri, gender, wadia, lombok, brisbane's, entire, christiana, blahodatne, everest's, culiacan, kamensk-shakhtinsky, protestants, pk-5, parwan [ ... and 980 more ]
[ reached max_nkey ... 220 more keys ]
Recent questions on StackOverflow pertaining to Mixins in Raku have piqued my interest as to whether Mixins can be applied to replicate features present in other programming languages.
For example, in the R-programming language, elements of a vector can be given a name (i.e. an attribute), which is very convenient for data analysis. For an excellent example see: "How to Name the Values in Your Vectors in R" by Andrie de Vries and Joris Meys, who illustrate this feature using R's built-in islands dataset. Below is a more prosaic example (code run in the R-REPL):
> #R-code
> x <- 1:4
> names(x) <- LETTERS[1:4]
> str(x)
Named int [1:4] 1 2 3 4
- attr(*, "names")= chr [1:4] "A" "B" "C" "D"
> x
A B C D
1 2 3 4
> x[1]
A
1
> sum(x)
[1] 10
Below I try to replicate R's 'named-vectors' using the same islands dataset used by de Vries and Meys. While the script below runs and (generally, see #3 below) produces the desired/expected output, I'm left with three main questions, at bottom:
#Raku-script below;
put "Read in data.";
my $islands_A = <11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82>.split(","); #Area
my $islands_N = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>; #Name
"----".say;
put "Count elements (Area): ", $islands_A.elems; #OUTPUT 48
put "Count elements (Name): ", $islands_N.elems; #OUTPUT 48
"----".say;
put "Create 'named vector' array (and output):\n";
my @islands;
my $i=0;
for (1..$islands_A.elems) {
    @islands[$i] := $islands_A[$i] but $islands_N[$i].Str;
    $i++;
};
say "All islands (returns Area): ", @islands; #OUTPUT: returns 48 areas (above)
say "All islands (returns Name): ", @islands>>.Str; #OUTPUT: returns 48 names (above)
say "Islands--slice (returns Area): ", @islands[0..3]; #OUTPUT: (11506 5500 16988 2968)
say "Islands--slice (returns Name): ", @islands[0..3]>>.Str; #OUTPUT: (Africa Antarctica Asia Australia)
say "Islands--first (returns Area): ", @islands[0]; #OUTPUT: 11506
say "Islands--first (returns Name): ", @islands[0]>>.Str; #OUTPUT: (Africa)
put "Islands--first (returns Name): ", @islands[0]; #OUTPUT: Africa
put "Islands--first (returns Name): ", @islands[0]>>.Str; #OUTPUT: Africa
1. Is there a simpler way to write the Mixin loop ...$islands_A[$i] but $islands_N[$i].Str;? Can the loop be obviated entirely?
2. Can a named-vector or nvec wrapper be written around put that will return (name)\n(value) in the same manner that R does, even for single elements? Might Raku's Pair method be useful here?
3. Related to #2 above, calling put on the single-element @islands[0] returns the name Africa, not the Area value 11506. [Note this doesn't happen with the call to say]. Is there any simple code that can be implemented to ensure that put always returns the (numeric) value or always returns the (Mixin) name for slices of any length?
Is there a simpler way?
Yes, using the zip metaoperator Z combined with infix but:
my @islands = $islands_A[] Z[but] $islands_N[];
Why don't you modify the array to change the format?
put calls .Str on the value it gets, say calls .gist
If you want put to output some specific text, make sure that the .Str method outputs that text.
I don't think you actually want put to output that format though. I think you want say to output that format.
That is because say is for humans to understand, and you want it nicer for humans.
When you have a question of “Can Raku do X” the answer is invariably yes; it's just a matter of how much work it would be, and whether you would still call it Raku at that point.
The question you really want to ask is how easy it is to do X.
I went and implemented something like what the link you provided talks about.
Note that this was just a quick implementation that I created right before bed. So think of this as a first rough draft.
If I were actually going to do this for-real, I would probably throw this away and start over after spending days learning enough R to figure out what it is actually doing.
class NamedVec does Positional does Associative {
  has @.names is List;
  has @.nums is List handles <sum>;
  has %!kv is Map;
  class Partial {
    has $.name;
    has $.num;
  }
  submethod TWEAK {
    %!kv := %!kv.new: @!names Z=> @!nums;
  }
  method from-pairlist ( +@pairs ) {
    my @names;
    my @nums;
    for @pairs -> (:$key, :$value) {
      push @names, $key;
      push @nums, $value;
    }
    self.new: :@names, :@nums
  }
  method from-list ( +@list ){
    my @names;
    my @nums;
    for @list -> (:$name, :$num) {
      push @names, $name;
      push @nums, $num;
    }
    self.new: :@names, :@nums
  }
  method gist () {
    my @widths = @!names».chars Zmax @!nums».chars;
    sub infix:<fmt> ( $str, $width is copy ){
      $width -= $str.chars;
      my $l = $width div 2;
      my $r = $width - $l;
      (' ' x $l) ~ $str ~ (' ' x $r)
    }
    (@!names Zfmt @widths) ~ "\n" ~ (@!nums Zfmt @widths)
  }
  method R-str () {
    chomp qq :to/END/
    Named num [1:@!nums.elems()] @!nums[]
    - attr(*, "names")= chr [1:@!names.elems()] @!names.map(*.raku)
    END
  }
  method of () {}
  method AT-POS ( $i ){
    Partial.new: name => @!names[$i], num => @!nums[$i]
  }
  method AT-KEY ( $name ){
    Partial.new: :$name, num => %!kv{$name}
  }
}
multi sub postcircumfix:<{ }> (NamedVec:D $v, Str:D $name){
$v.from-list: callsame
}
multi sub postcircumfix:<{ }> (NamedVec:D $v, List \l){
$v.from-list: callsame
}
my $islands_A = <11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82>.split(","); #Area
my $islands_N = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>;
# either will work
#my $islands = NamedVec.from-pairlist( $islands_N[] Z=> $islands_A[] );
my $islands = NamedVec.new( names => $islands_N, nums => $islands_A );
put $islands.R-str;
say $islands<Asia Africa Antarctica>;
say $islands.sum;
A named vector essentially combines a vector with a map from names to integer positions and allows you to address elements by name. Naming a vector alters the behavior of the vector, not that of its elements. So in Raku we need to define a role for an array:
role Named does Associative {
has $.names;
has %!index;
submethod TWEAK {
my $i = 0;
%!index = map { $_ => $i++ }, $!names.list;
}
method AT-KEY($key) {
with %!index{$key} { return-rw self.AT-POS($_) }
else { self.default }
}
method EXISTS-KEY($key) {
%!index{$key}:exists;
}
method gist() {
join "\n", $!names.join("\t"), map(*.gist, self).join("\t");
}
}
multi sub postcircumfix:<[ ]>(Named:D \list, \index, Bool() :$named!) {
my \slice = list[index];
$named ?? slice but Named(list.names[index]) !! slice;
}
multi sub postcircumfix:<{ }>(Named:D \list, \names, Bool() :$named!) {
my \slice = list{names};
$named ?? slice but Named(names) !! slice;
}
Mixing in this role gives you most of the functionality of an R named vector:
my $named = [1, 2, 3] but Named<first second last>;
say $named; # OUTPUT: «first␉second␉last␤1␉2␉3»
say $named[0, 1]:named; # OUTPUT: «first␉second␤1␉2»
say $named<last> = Inf; # OUTPUT: «Inf»
say $named<end>:exists; # OUTPUT: «False»
say $named<last end>:named; # OUTPUT: «last␉end␤Inf␉(Any)»
As this is just a proof of concept, the Named role doesn't handle the naming of non-existing elements well. It also doesn't support modifying a slice of names. It probably does support creating a pun that can be mixed into more than one list.
Note that this implementation relies on the undocumented fact that the subscript operators are multis. If you want to put the role and operators in a separate file, you probably want to apply the is export trait to the operators.
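To make the underlying idea concrete outside Raku (a vector combined with a map from names to integer positions), here is a minimal Python sketch. It is an illustration under stated assumptions, not a port of the role above: the class name NamedVector and its methods are hypothetical, and it makes no attempt to cover named slices.

```python
class NamedVector:
    """A list of values plus a dict mapping each name to its position."""

    def __init__(self, values, names):
        values, names = list(values), list(names)
        if len(values) != len(names):
            raise ValueError("values and names must have the same length")
        self.values = values
        self.names = names
        # With duplicate names, the last position wins.
        self._index = {name: pos for pos, name in enumerate(names)}

    def _pos(self, key):
        # Translate a name to its position; integer keys pass through unchanged.
        return self._index[key] if isinstance(key, str) else key

    def __getitem__(self, key):
        return self.values[self._pos(key)]

    def __setitem__(self, key, value):
        self.values[self._pos(key)] = value

    def __contains__(self, name):
        # `in` tests for a name, not for a value.
        return name in self._index

    def __iter__(self):
        return iter(self.values)

    def __len__(self):
        return len(self.values)

    def __str__(self):
        # Names on the first line, values on the second, tab-separated.
        return "\t".join(self.names) + "\n" + "\t".join(map(str, self.values))


x = NamedVector([1, 2, 3, 4], "ABCD")
print(x)
print(x["A"], x[0], sum(x))
```

Because the names live on the container rather than on the elements, aggregate operations such as sum see only the plain values, which mirrors the point that naming a vector alters the behavior of the vector and not of its elements.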
It might not be the most optimal way of doing it (or what you're specifically looking for) but as soon as I saw this particular problem's statement, the first thing that came to mind were Raku's allomorphs, which are types with two related values that are accessible separately depending on context.
my $areas = (11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82);
my $names = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>;
my @islands;
for (0..^$areas) -> \i {
    @islands[i] := IntStr.new($areas[i], $names[i]);
}
say "Areas: ", @islands>>.Int;
say "Names: ", @islands>>.Str;
say "Areas slice: ", (@islands>>.Int)[0..3];
say "Names slice: ", (@islands>>.Str)[0..3];
say "Areas first: ", (@islands>>.Int)[0];
say "Names first: ", (@islands>>.Str)[0];
I think I would just do something like this:
class MyRow {
has Str $.island is rw;
has Numeric $.area is rw;
method Str {
$!island;
}
method Numeric {
+$!area;
}
# does Cool coercion of strings that look numeric
submethod BUILD ( Numeric(Cool) :$!area, :$!island ) {
};
}
class MyTable {
has @.data;
has MyRow @.rows is rw;
has %!lookup;
submethod TWEAK {
@!rows = gather
for @!data -> ( $island, $area ) {
my $row = MyRow.new( :$island, :$area );
%!lookup{ $island } = $row;
take $row;
}
}
method find_island( $island ) {
return %!lookup{ $island };
}
}
To set up a table:
my @raw = @island_names Z @island_areas;
my $table = MyTable.new( data => @raw );
Accessing the rows of the table by name:
my $row = $table.find_island('Africa');
say $row; # MyRow.new(island => "Africa", area => 11506)
Using the row element like a string gets you the name,
using it like a number gets you the area:
say ~$row; # Africa
say +$row; # 11506
One of the features here is that you can add more fields to your
rows, you're not constrained to just a value and a name.
The "find_island" method uses an internal %lookup hash to index
the rows by island name, but unlike a simple hash solution
there's no uniqueness constraint: if you have a duplicate island
name, "find_island" will locate the latest row in the set, but
the other row would still be there.
Caveat: I haven't thought much about how well this supports
dynamically adding more rows to the table.
I am using the Google Maps API to get the latitude and longitude of an address:
geocoder.geocode({ 'address': address, bounds: map.getBounds() },
    function (results, status) {
        if (status == google.maps.GeocoderStatus.OK) {
            Lat = results[0].geometry.location.lat();
            Long = results[0].geometry.location.lng();
        }
    });
For the same address, I sometimes get Latitude=33.189967 and Longitude=-96.7333, which is the more correct value,
and other times I get an array of latitudes and longitudes, from which I pick the first; the value I then get is Latitude=41.920 and Longitude=83.41.
The address that I am currently using is '1550 South Custer Rd'.
Can someone please help?
I get 4 results for that string:
Found 4 results
[ 0 ]: 1550 South Custer Road, Monroe Charter Township, MI 48161, USA (41.9200564, -83.41902479999999)
[ 1 ]: 1550 South Custer Road, McKinney, TX 75070, USA (33.189967, -96.73350699999997)
[ 2 ]: 1550 South Custer Road, Spokane, WA 99223, USA (47.6389694, -117.34156009999998)
[ 3 ]: 1550 South Custer Road, Custer, MI 49405, USA (43.9296513, -86.21892639999999)
I don't know how you expect the Geocoder to know which of the answers is "more correct" as they all contain that exact string. Perhaps you need to include more information (like the town or the state), or process the results to determine the one that is in your area of interest.
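One way to "process the results" is to keep only the candidates that fall inside a bounding box around your area of interest. The Python sketch below is purely illustrative and does not call the Geocoder: the function name results_in_bounds is hypothetical, the results are the four listed above copied into plain dicts, and the box (a rough rectangle around north Texas) is an assumption chosen for the example.

```python
def results_in_bounds(results, south, west, north, east):
    """Keep only the results whose coordinates fall inside the bounding box.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    return [
        r for r in results
        if south <= r["lat"] <= north and west <= r["lng"] <= east
    ]


# The four geocoder results listed above, reduced to plain dicts.
results = [
    {"address": "1550 South Custer Road, Monroe Charter Township, MI 48161, USA",
     "lat": 41.9200564, "lng": -83.41902479999999},
    {"address": "1550 South Custer Road, McKinney, TX 75070, USA",
     "lat": 33.189967, "lng": -96.73350699999997},
    {"address": "1550 South Custer Road, Spokane, WA 99223, USA",
     "lat": 47.6389694, "lng": -117.34156009999998},
    {"address": "1550 South Custer Road, Custer, MI 49405, USA",
     "lat": 43.9296513, "lng": -86.21892639999999},
]

# Hypothetical area of interest: a rough box around north Texas.
matches = results_in_bounds(results, south=32.0, west=-98.0, north=34.0, east=-96.0)
for match in matches:
    print(match["address"])
```

The same filter can be written in the JavaScript callback by looping over results instead of always taking results[0]; adding the town or state to the address string remains the simpler fix when that information is available.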