Crossfilter reductio post cap

I am having trouble using reductio's post().cap() functionality. My dataset looks like this:
[{foo: 'one', bar: 'B', hits:10},
{foo: 'one', bar: 'B', hits:20},
{foo: 'two', bar: 'B', hits:50},
{foo: 'two', bar: 'B', hits:100},
{foo: 'one', bar: 'A', hits:150}.........]
What I am looking for is
[{key: 'B', value: {count: 4, sum: 180}},
{key: 'A', value: {count: 1, sum: 150}},
{key: 'others', value: {count: 7, sum: 60}}]
I have a bar dimension set up as
var barDim = ndx.dimension(function(d){ return d.bar; });
var barGroup = reductio().count(true).sum('hits')(barDim.group());
Thanks in advance!
reductio cap functionality

Unfortunately, in the comment thread above I was not familiar enough with the Reductio post API, as I don't use it myself. It doesn't currently respect group ordering, but it does provide its own API for controlling order. For example:
group.post().sortBy('value.sum', d3.descending).cap(3)()
Note that the ordering function here is d3.descending, which is available if you are using D3.js; otherwise, any comparator with the same API will work, as sketched below.
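For instance, a minimal stand-in comparator (a sketch, for pages that don't load D3; group stands for the reductio-wrapped group from the question):
function descending(a, b) {
  // negative when a should sort first, so larger values come first
  return b < a ? -1 : b > a ? 1 : 0;
}
group.post().sortBy('value.sum', descending).cap(3)();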
I also note that the sortBy API isn't documented. I will try to get it documented so that others can discover it.


How to add new special token to the tokenizer?

I want to build a multi-class classification model for which I have conversational data as input for the BERT model (using bert-base-uncased).
QUERY: I want to ask a question.
ANSWER: Sure, ask away.
QUERY: How is the weather today?
ANSWER: It is nice and sunny.
QUERY: Okay, nice to know.
ANSWER: Would you like to know anything else?
Apart from this I have two more inputs.
I was wondering if I should put special tokens in the conversation to make it more meaningful to the BERT model, like:
[CLS]QUERY: I want to ask a question. [EOT]
ANSWER: Sure, ask away. [EOT]
QUERY: How is the weather today? [EOT]
ANSWER: It is nice and sunny. [EOT]
QUERY: Okay, nice to know. [EOT]
ANSWER: Would you like to know anything else? [SEP]
But I am not able to add a new [EOT] special token.
Or should I use [SEP] token for this?
EDIT: steps to reproduce
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
print(tokenizer.all_special_tokens) # --> ['[UNK]', '[SEP]', '[PAD]', '[CLS]', '[MASK]']
print(tokenizer.all_special_ids) # --> [100, 102, 0, 101, 103]
num_added_toks = tokenizer.add_tokens(['[EOT]'])
model.resize_token_embeddings(len(tokenizer)) # --> Embedding(30523, 768)
tokenizer.convert_tokens_to_ids('[EOT]') # --> 30522
text_to_encode = '''QUERY: I want to ask a question. [EOT]
ANSWER: Sure, ask away. [EOT]
QUERY: How is the weather today? [EOT]
ANSWER: It is nice and sunny. [EOT]
QUERY: Okay, nice to know. [EOT]
ANSWER: Would you like to know anything else?'''
enc = tokenizer.encode_plus(
    text_to_encode,
    max_length=128,
    add_special_tokens=True,
    return_token_type_ids=False,
    return_attention_mask=False,
)['input_ids']
print(tokenizer.convert_ids_to_tokens(enc))
Result:
['[CLS]', 'query', ':', 'i', 'want', 'to', 'ask', 'a', 'question',
'.', '[', 'e', '##ot', ']', 'answer', ':', 'sure', ',', 'ask', 'away',
'.', '[', 'e', '##ot', ']', 'query', ':', 'how', 'is', 'the',
'weather', 'today', '?', '[', 'e', '##ot', ']', 'answer', ':', 'it',
'is', 'nice', 'and', 'sunny', '.', '[', 'e', '##ot', ']', 'query',
':', 'okay', ',', 'nice', 'to', 'know', '.', '[', 'e', '##ot', ']',
'answer', ':', 'would', 'you', 'like', 'to', 'know', 'anything',
'else', '?', '[SEP]']
As the intention of the [SEP] token was to act as a separator between two sentences, it fits your objective of using [SEP] to separate sequences of QUERY and ANSWER.
You could also try adding distinct tokens to mark the beginning and end of each turn: <BOQ> and <EOQ> for QUERY, and likewise <BOA> and <EOA> for ANSWER.
Sometimes, using existing tokens works much better than adding new ones to the vocabulary, since learning a new token embedding requires a large number of training iterations and a lot of data. However, if your application demands a new token, it can be added as follows:
num_added_toks = tokenizer.add_tokens(['[EOT]'], special_tokens=True)  # note special_tokens=True
model.resize_token_embeddings(len(tokenizer))
# The tokenizer has to be saved if it is to be reused
tokenizer.save_pretrained('<output_dir>')  # placeholder path
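As a quick sanity check (a sketch reusing the bert-base-uncased tokenizer from above), the added token should now survive tokenization as a single unit:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_tokens(['[EOT]'], special_tokens=True)
# '[EOT]' should come out as one token instead of '[', 'e', '##ot', ']'
print(tokenizer.tokenize('How is the weather today? [EOT]'))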

Returning multiple values from one step using 'select', 'by' in Gremlin

My graph schema looks like this:
(Location)<-[:INVENTOR_LOCATED_IN]-(Inventor)-[:INVENTOR_OF]->(Patent)
I'm trying to return multiple values from each step in the query path. Here's the query I have so far that runs correctly:
g.V().and(has('Location', 'city', textContains('Bloomington')), has('Location','state',textContains('IN'))).as('a').
bothE().bothV().hasLabel('Inventor').as('b').
bothE().bothV().has('Patent', 'title', textContains('Lid')).as('c').
select('a', 'b', 'c').
by('state').by('name_first').by('title').
fold();
What I would like to do is for each step return two node properties. I tried the following but it returns an error:
g.V().and(has('Location', 'city', textContains('Bloomington')), has('Location', 'state',textContains('IN'))).as('a').
bothE().bothV().hasLabel('Inventor').as('b').
bothE().bothV().has('Patent', 'title', textContains('Lid')).as('c').
select('a', 'b', 'c').
by('city', 'state').by('name_first', 'name_last').by('title', 'abstract').
fold();
Can anyone suggest syntax that will allow me to return multiple properties from each node in the path?
The by(key) is meant to be a sort of shorthand for values(key), which means that if you have more than one value you could do:
g.V().and(has('Location', 'city', textContains('Bloomington')), has('Location', 'state',textContains('IN'))).as('a').
bothE().bothV().hasLabel('Inventor').as('b').
bothE().bothV().has('Patent', 'title', textContains('Lid')).as('c').
select('a', 'b', 'c').
by(values('city', 'state').fold()).
by(values('name_first', 'name_last').fold()).
by(values('title', 'abstract').fold()).
fold()
You might also consider forms of elementMap(), valueMap(), or project() as alternatives. Since by() takes a Traversal you have a lot of flexibility.
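For example, a valueMap()-based variant of the same traversal (a sketch using the property keys from the question) would be:
g.V().and(has('Location', 'city', textContains('Bloomington')), has('Location', 'state', textContains('IN'))).as('a').
bothE().bothV().hasLabel('Inventor').as('b').
bothE().bothV().has('Patent', 'title', textContains('Lid')).as('c').
select('a', 'b', 'c').
by(valueMap('city', 'state')).
by(valueMap('name_first', 'name_last')).
by(valueMap('title', 'abstract')).
fold()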

Using dictionaries in SWI-Prolog

I'm working on a simple web service in Prolog and wanted to respond to my users with data formatted as JSON. A nice facility is reply_json_dict/1, which takes a dictionary and converts it into an HTTP response with a well-formatted JSON body.
My trouble is that building the response dictionary itself seems a little cumbersome. For example, when I return some data, I always have a data id but may or may not have data properties (possibly an unbound variable). At the moment I do the following:
OutDict0 = _{ id : DataId },
( nonvar(Props) -> OutDict1 = OutDict0.put(_{ attributes : Props }) ; OutDict1 = OutDict0 ),
reply_json_dict(OutDict1)
Which works fine, so output is { "id" : "III" } or { "id" : "III", "attributes" : "AAA" } depending whether or not Props is bound, but... I'm looking for an easier approach. Primarily because if I need to add more optional key/value pairs, I end up with multiple implications like:
OutDict0 = _{ id : DataId },
( nonvar(Props) -> OutDict1 = OutDict0.put(_{ attributes : Props }) ; OutDict1 = OutDict0 ),
( nonvar(Time) -> OutDict2 = OutDict1.put(_{ time : Time }) ; OutDict2 = OutDict1 ),
( nonvar(UserName) -> OutDict3 = OutDict2.put(_{ userName : UserName }) ; OutDict3 = OutDict2 ),
reply_json_dict(OutDict3)
And that seems just wrong. Is there a simpler way?
Cheers,
Jacek
Instead of messing with dictionaries, my recommendation in this case is to use a different predicate to emit JSON.
For example, consider json_write/2, which lets you emit JSON on the current output stream, as the HTTP libraries require.
Suppose your representation of data fields is the common Name(Value) notation that is used throughout the HTTP libraries for option processing:
Fields0 = [attributes(Props),time(Time),userName(UserName)],
Using the meta-predicate include/3, your whole example becomes:
:- use_module(library(http/json)).  % provides json_write/2

main :-
    Fields0 = [id(DataId), attributes(Props), time(Time), userName(UserName)],
    include(ground, Fields0, Fields),
    json_write(current_output, json(Fields)).
You can try it out yourself, by plugging in suitable values for the individual elements that are singleton variables in the snippet above.
For example, we can (arbitrarily) use:
Fields0 = [id(i9),attributes(_),time('12:00'),userName(_)],
yielding:
?- main.
{"id":"i9", "time":"12:00"}
true.
You only need to emit the suitable Content-Type header, and you have the same output that reply_json_dict/1 would have given you.
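Inside an HTTP handler that could look like this (a sketch; the handler name reply_data is a placeholder):
reply_data(_Request) :-
    Fields0 = [id(i9), attributes(_), time('12:00'), userName(_)],
    include(ground, Fields0, Fields),
    format("Content-type: application/json~n~n"),
    json_write(current_output, json(Fields)).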
You can do it in one step if you use a list to represent all values that need to go into the dict.
?- Props = [a,b,c], get_time(Time),
D0 = _{id:001},
include(ground, [props:Props,time:Time,user:UserName], Fs),
D = D0.put(Fs).
D0 = _17726{id:1},
Fs = [props:[a, b, c], time:1477557597.205908],
D = _17726{id:1, props:[a, b, c], time:1477557597.205908}.
This borrows the idea in mat's answer to use include(ground).
Many thanks mat and Boris for suggestions! I ended up with a combination of your ideas:
dict_filter_vars(DictIn, DictOut) :-
    findall(Key=Value, (get_dict(Key, DictIn, Value), nonvar(Value)), Pairs),
    dict_create(DictOut, _, Pairs).
Which I can then use as simply as:
DictWithVars = _{ id : DataId, attributes : Props, time : Time, userName : UserName },
dict_filter_vars(DictWithVars, DictOut),
reply_json_dict(DictOut)
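For example, with Time and UserName left unbound (illustrative output; the dict tag is a fresh variable):
?- dict_filter_vars(_{id:'III', attributes:'AAA', time:_Time, userName:_U}, Out).
Out = _1234{attributes:'AAA', id:'III'}.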

Is there a way to save nested entities in gcloud-python?

I'm trying to save an object into Cloud Datastore, the object contains a dictionary as a property value:
client = datastore.Client(project_id)
key = client.key('Config', 'config', 'Environment', 'env_name')
env = datastore.entity.Entity(key)
env['prop1'] = dict(foo='bar')
client.put(env)
but it raises
ValueError: Unknown protobuf attr type
Although I'm able to do so using gcloud-node.
Is it possible to save compound object using gcloud-python?
It sounds like you're interested in storing an embedded entity, which I believe is what gcloud-node does automagically.
I think you can do this by setting the field (prop1) to a datastore.Entity containing a sub-property (foo) set to 'bar'.
client = datastore.Client(project_id)
key = client.key('Config', 'config', 'Environment', 'env_name')
env = datastore.Entity(key)
env['prop1'] = datastore.Entity(key=client.key('EmbeddedKind'))
env['prop1']['foo'] = 'bar'
client.put(env)
When you get this back, it'll look like...
>>> client.get(env.key)
<Entity[{'kind': u'Config', 'name': u'config'}, {'kind': u'Environment', 'name': u'env_name'}] {u'prop1': <Entity[{'kind': u'EmbeddedKind'}] {u'foo': 'bar'}>}>

google charts datetime type

Is it possible to use literal strings when instantiating a datetime type? If so, does the - need to be a /? If that doesn't matter, please tell me what's wrong with this:
var data = new google.visualization.DataTable({cols:[{label: 'date', type: 'datetime'},
{label: 'power', type: 'number'}], rows: [{c: [{v:2007/12/01 00:12:00},{v:0}]},
{c: [{v:2007/12/01 01:12:00},{v:101}]}, {c: [{v:2007/12/01 02:12:00},{v:201}]},
{c: [{v:2007/12/01 03:12:00},{v:302}]}]});
(I already tried quoting literal datetimes.)
-Shawn
The bare 2007/12/01 00:12:00 literals are a JavaScript syntax error, so you have to pass datetime values in another form. In the object-literal constructor you can use a real Date object (or, in pure JSON, a string of the form 'Date(2007, 11, 1, 0, 12, 0)'). You can also build a Date from an epoch timestamp; try something like this:
new Date(epochMillisecondsGoHere);
Note that the Date constructor takes milliseconds since the epoch as a number, not a quoted string of seconds.
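Alternatively, here is a sketch of the DataTable from the question using Date objects directly (note that JavaScript months are 0-based, so December 2007 is new Date(2007, 11, ...)):
var data = new google.visualization.DataTable({
  cols: [{label: 'date', type: 'datetime'}, {label: 'power', type: 'number'}],
  rows: [{c: [{v: new Date(2007, 11, 1, 0, 12, 0)}, {v: 0}]},
         {c: [{v: new Date(2007, 11, 1, 1, 12, 0)}, {v: 101}]},
         {c: [{v: new Date(2007, 11, 1, 2, 12, 0)}, {v: 201}]},
         {c: [{v: new Date(2007, 11, 1, 3, 12, 0)}, {v: 302}]}]
});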
