Is there a way to save nested entities in gcloud-python? - google-cloud-datastore

I'm trying to save an object into Cloud Datastore, the object contains a dictionary as a property value:
client = datastore.Client(project_id)
key = client.key('Config', 'config', 'Environment', 'env_name')
env = datastore.entity.Entity(key)
env['prop1'] = dict(foo='bar')
client.put(env)
but it raises
ValueError: Unknown protobuf attr type
Although I'm able to do so using gcloud-node.
Is it possible to save a compound object using gcloud-python?

It sounds like you're interested in storing an embedded entity, which I believe is what gcloud-node does automagically.
I think you can do this by setting the field (prop1) to a datastore.Entity containing a sub-property (foo) set to 'bar'.
client = datastore.Client(project_id)
key = client.key('Config', 'config', 'Environment', 'env_name')
env = datastore.Entity(key)
env['prop1'] = datastore.Entity(key=client.key('EmbeddedKind'))
env['prop1']['foo'] = 'bar'
client.put(env)
When you get this back, it'll look like...
>>> client.get(env.key)
<Entity[{'kind': u'Config', 'name': u'config'}, {'kind': u'Environment', 'name': u'env_name'}] {u'prop1': <Entity[{'kind': u'EmbeddedKind'}] {u'foo': 'bar'}>}>
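For completeness, a minimal sketch of reading the nested value back, assuming the entity above was saved:
fetched = client.get(key)
# The embedded entity behaves like a mapping, so sub-properties are plain lookups
print(fetched['prop1']['foo'])  # 'bar'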

Related

Invalid type for parameter error when using put_item dynamodb

I want to write the data in a dataframe to a DynamoDB table:
item = {}
for row in datasource_archived_df_join_repartition.rdd.collect():
    item['x'] = row.x
    item['y'] = row.y
    client.put_item(TableName='tryfail',
                    Item=item)
but I'm getting this error:
Invalid type for parameter Item.x, value: 478.2, type: <type 'float'>, valid types: <type 'dict'>
Invalid type for parameter Item.y, value: 696- 18C 12, type: <type 'unicode'>, valid types: <type 'dict'>
Old question, but it still comes up high in a search and hasn't been answered properly, so here we go.
When putting an item into a DynamoDB table it must be a dictionary in a particular nested form that indicates to the database engine the data type of the value for each attribute. The form looks like the example below. The way to think of this is that an AttributeValue is not a bare variable value but a combination of that value and its type. For example, an AttributeValue for the AlbumTitle attribute below is the dict {'S': 'Somewhat Famous'}, where the 'S' indicates a string type.
response = client.put_item(
    TableName='Music',
    Item={
        'AlbumTitle': {             # <-- attribute
            'S': 'Somewhat Famous', # <-- attribute value with type string ('S')
        },
        'Artist': {
            'S': 'No One You Know',
        },
        'SongTitle': {
            'S': 'Call Me Today',
        },
        'Year': {
            'N': '2021'             # <-- note that numeric values are supplied as strings
        }
    }
)
In your case (assuming x and y are numbers) you might want something like this:
for row in datasource_archived_df_join_repartition.rdd.collect():
    item = {
        'x': {'N': str(row.x)},
        'y': {'N': str(row.y)}
    }
    client.put_item(TableName='tryfail', Item=item)
Two things to note here: first, each item corresponds to a row, so if you are putting items in a loop you must instantiate a new one with each iteration. Second, regarding the conversion of the numeric x and y into strings, the DynamoDB docs explain that the reason the AttributeValue dict requires this is "to maximize compatibility across languages and libraries. However, DynamoDB treats them as number type attributes for mathematical operations." For fuller documentation on the type system for DynamoDB, take a look at the DynamoDB developer guide, or read the Boto3 docs since you are using Python.
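As an aside: if you use Boto3's higher-level resource interface instead of the low-level client, the type wrapping is done for you. A minimal sketch, assuming the same DataFrame and an existing tryfail table; note that floats must be converted to Decimal, since DynamoDB numbers are exact:
import boto3
from decimal import Decimal

table = boto3.resource('dynamodb').Table('tryfail')
for row in datasource_archived_df_join_repartition.rdd.collect():
    # The Table resource serializes native Python types itself: Decimal -> N, str -> S
    table.put_item(Item={'x': Decimal(str(row.x)), 'y': row.y})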
The error message indicates you are using the wrong type; it looks like you need to use a dictionary when assigning values to item['x'] and item['y'], e.g.
item['x'] = {'value': row.x}
item['y'] = {'value': row.y}

using dictionaries in swi-prolog

I'm working on a simple web service in Prolog and wanted to respond to my users with data formatted as JSON. A nice facility is reply_json_dict/1, which takes a dictionary and converts it into an HTTP response with a well-formatted JSON body.
My trouble is that building the response dictionary itself seems a little cumbersome. For example, when I return some data, I have a data id but may or may not have data properties (possibly an unbound variable). At the moment I do the following:
OutDict0 = _{ id : DataId },
( nonvar(Props) -> OutDict1 = OutDict0.put(_{ attributes : Props }) ; OutDict1 = OutDict0 ),
reply_json_dict(OutDict1)
Which works fine, so output is { "id" : "III" } or { "id" : "III", "attributes" : "AAA" } depending whether or not Props is bound, but... I'm looking for an easier approach. Primarily because if I need to add more optional key/value pairs, I end up with multiple implications like:
OutDict0 = _{ id : DataId },
( nonvar(Props) -> OutDict1 = OutDict0.put(_{ attributes : Props }) ; OutDict1 = OutDict0 ),
( nonvar(Time) -> OutDict2 = OutDict1.put(_{ time : Time }) ; OutDict2 = OutDict1 ),
( nonvar(UserName) -> OutDict3 = OutDict2.put(_{ userName : UserName }) ; OutDict3 = OutDict2 ),
reply_json_dict(OutDict3)
And that seems just wrong. Is there a simpler way?
Cheers,
Jacek
Instead of messing with dictionaries, my recommendation in this case is to use a different predicate to emit JSON.
For example, consider json_write/2, which lets you emit JSON, also on current output as the HTTP libraries require.
Suppose your representation of data fields is the common Name(Value) notation that is used throughout the HTTP libraries for option processing:
Fields0 = [attributes(Props),time(Time),userName(UserName)],
Using the meta-predicate include/3, your whole example becomes:
main :-
    Fields0 = [id(DataId),attributes(Props),time(Time),userName(UserName)],
    include(ground, Fields0, Fields),
    json_write(current_output, json(Fields)).
You can try it out yourself, by plugging in suitable values for the individual elements that are singleton variables in the snippet above.
For example, we can (arbitrarily) use:
Fields0 = [id(i9),attributes(_),time('12:00'),userName(_)],
yielding:
?- main.
{"id":"i9", "time":"12:00"}
true.
You only need to emit the suitable Content-Type header, and you get the same output that reply_json_dict/1 would have given you.
You can do it in one step if you use a list to represent all values that need to go into the dict.
?- Props = [a,b,c], get_time(Time),
D0 = _{id:001},
include(ground, [props:Props,time:Time,user:UserName], Fs),
D = D0.put(Fs).
D0 = _17726{id:1},
Fs = [props:[a, b, c], time:1477557597.205908],
D = _17726{id:1, props:[a, b, c], time:1477557597.205908}.
This borrows the idea in mat's answer to use include(ground).
Many thanks mat and Boris for suggestions! I ended up with a combination of your ideas:
dict_filter_vars(DictIn, DictOut) :-
    findall(Key=Value, (get_dict(Key, DictIn, Value), nonvar(Value)), Pairs),
    dict_create(DictOut, _, Pairs).
Which then I can use as simple as that:
DictWithVars = _{ id : DataId, attributes : Props, time : Time, userName : UserName },
dict_filter_vars(DictWithVars, DictOut),
reply_json_dict(DictOut)

How to modify the avro key/value schema in a RDD map transformation

I'm trying to migrate some Hadoop MapReduce code to Spark and I have doubts about how to manage map and reduce transformations when the schema of either the key or the value changes from input to output.
I have avro files with Indicator records that I want to process somehow. I already have this code that works:
val myAvroJob = new Job()
myAvroJob.setInputFormatClass(classOf[AvroKeyInputFormat[Indicator]])
myAvroJob.setOutputFormatClass(classOf[AvroKeyOutputFormat[Indicator]])
myAvroJob.setOutputValueClass(classOf[NullWritable])
AvroJob.setInputValueSchema(myAvroJob, Schema.create(Schema.Type.NULL))
AvroJob.setInputKeySchema(myAvroJob, Indicator.SCHEMA$)
AvroJob.setOutputKeySchema(myAvroJob, Indicator.SCHEMA$)
val indicatorsRdd = sc.newAPIHadoopRDD(myAvroJob.getConfiguration,
classOf[AvroKeyInputFormat[Indicator]],
classOf[AvroKey[Indicator]],
classOf[NullWritable])
val myRecordOnlyRdd = indicatorsRdd.map(x => (doSomethingWith(x._1), NullWritable.get))
val indicatorPairRDD = new PairRDDFunctions(myRecordOnlyRdd)
indicatorPairRDD.saveAsNewAPIHadoopDataset(myAvroJob.getConfiguration)
But this code works because the schema of the input and output keys does not change; it is always Indicator. In Hadoop MapReduce you can define map or reduce functions and change the schema from input to output. In fact, I have map functions which process every Indicator record and generate a new SoporteCartera record. How can I do this in Spark? Is it possible within the same RDD, or do I have to define two different RDDs and pass from one to the other somehow?
Thanks for your help.
To answer my own question... the problem was that you cannot change the RDD type; you must define a different RDD. I solved it with the code below:
val myAvroJob = new Job()
myAvroJob.setInputFormatClass(classOf[AvroKeyInputFormat[SoporteCartera]])
myAvroJob.setOutputFormatClass(classOf[AvroKeyOutputFormat[Indicator]])
myAvroJob.setOutputValueClass(classOf[NullWritable])
AvroJob.setInputValueSchema(myAvroJob, Schema.create(Schema.Type.NULL))
AvroJob.setInputKeySchema(myAvroJob, SoporteCartera.SCHEMA$)
AvroJob.setOutputKeySchema(myAvroJob, Indicator.SCHEMA$)
val soporteCarteraRdd = sc.newAPIHadoopRDD(myAvroJob.getConfiguration,
classOf[AvroKeyInputFormat[SoporteCartera]],
classOf[AvroKey[SoporteCartera]],
classOf[NullWritable])
val indicatorsRdd = soporteCarteraRdd.map(x => (fromSoporteCarteraToIndicator(x._1), NullWritable.get))
val indicatorPairRDD = new PairRDDFunctions(indicatorsRdd)
indicatorPairRDD.saveAsNewAPIHadoopDataset(myAvroJob.getConfiguration)

Losing some z3c relation data on restart

I have the following code which is meant to programmatically assign relation values to a custom content type.
# imports needed for the snippet (standard Plone / z3c.relationfield setup)
from zope.component import getUtility
from zope.event import notify
from zope.intid.interfaces import IIntIds
from zope.lifecycleevent import ObjectModifiedEvent
from z3c.relationfield import RelationValue
from Products.CMFCore.utils import getToolByName

publications = # some data
intids = getUtility(IIntIds)
catalog = getToolByName(context, 'portal_catalog')
for pub in publications:
    if pub['custom_id']:
        results = catalog(custom_id=pub['custom_id'])
        if len(results) == 1:
            obj = results[0].getObject()
            measures = []
            for m in pub['measure']:
                if m in context.objectIds():
                    m_id = intids.getId(context[m])
                    relation = RelationValue(m_id)
                    measures.append(relation)
            obj.measures = measures
            obj.reindexObject()
            notify(ObjectModifiedEvent(obj))
Snippet of schema for custom content type
measures = RelationList(
    title=_(u'Measure(s)'),
    required=False,
    value_type=RelationChoice(
        title=_(u'Measure'),
        source=ObjPathSourceBinder(object_provides='foo.bar.interfaces.measure.IMeasure')),
)
When I run my script everything looks good. The problem is that when my template for the custom content type tries to call "pub/from_object/absolute_url", the value is blank, but only after a restart. Interestingly, I can get other attributes of pub/from_object after a restart, just not its URL.
from_object retrieves the referencing object from the relation catalog, but doesn't put the object back in its proper Acquisition chain. See http://docs.plone.org/external/plone.app.dexterity/docs/advanced/references.html#back-references for a way to do it that should work.
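For reference, a sketch along the lines of that page (hedged: the back_references helper and the findRelations query keys follow the linked documentation, so verify them against your Plone version):
from Acquisition import aq_inner
from zope.component import getUtility
from zope.intid.interfaces import IIntIds
from zc.relation.interfaces import ICatalog

def back_references(source_object, attribute_name):
    # Return the objects that reference source_object via attribute_name,
    # looked up through the relation catalog and the intid utility.
    catalog = getUtility(ICatalog)
    intids = getUtility(IIntIds)
    result = []
    for rel in catalog.findRelations(
            dict(to_id=intids.getId(aq_inner(source_object)),
                 from_attribute=attribute_name)):
        obj = intids.queryObject(rel.from_id)
        if obj is not None:
            result.append(obj)
    return result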

What's wrong with my filter query to figure out if a key is a member of a list(db.key) property?

I'm having trouble retrieving a filtered list from the Google App Engine datastore (using Python on the server side). My data entity is defined as follows:
class Course_Table(db.Model):
    course_name = db.StringProperty(required=True, indexed=True)
    ....
    head_tags_1 = db.ListProperty(db.Key)
So the head_tags_1 property is a list of keys (which are the keys to a different entity called Headings_1).
In the handler below I spin through my Course_Table entities to filter the courses that have a particular Headings_1 key as a member of the head_tags_1 property. However, it doesn't seem to retrieve anything, even though I know there is data there to fulfill the request, since it never displays the logs when I iterate through the results of my query (below). Any ideas of what I'm doing wrong?
def get(self, level_num, h_key):
    path = []
    if level_num == "1":
        q = Course_Table.all().filter("head_tags_1 =", h_key)
        for each in q:
            logging.info('going through courses with this heading name')
            logging.info("course name filtered is %s ", each.course_name)
MANY MANY THANK YOUS
I assume h_key is the key of a Headings_1 entity. Since head_tags_1 is a list, I believe what you need is the IN operator: https://developers.google.com/appengine/docs/python/datastore/queries
Note: your indentation inside the for loop does not seem correct.
My bad, apparently '=' on a list property already checks membership. Using = to check membership works for me; can you make sure h_key is really a datastore Key instance?
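If h_key comes straight from the URL route it is a string, not a key. A one-line sketch of the conversion, assuming h_key is a URL-safe encoded key (use db.Key.from_path('Headings_1', h_key) if it is a key name instead):
q = Course_Table.all().filter("head_tags_1 =", db.Key(h_key))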
Here is my example; the first get produces a result, while the second one does not.
import webapp2
from google.appengine.ext import db
class Greeting(db.Model):
    author = db.StringProperty()
    x = db.ListProperty(db.Key)

class C(db.Model):
    name = db.StringProperty()

class MainPage(webapp2.RequestHandler):
    def get(self):
        ckey = db.Key.from_path('C', 'abc')
        dkey = db.Key.from_path('C', 'def')
        ekey = db.Key.from_path('C', 'ghi')
        Greeting(author='xxx', x=[ckey, dkey]).put()
        x = Greeting.all().filter('x =', ckey).get()
        self.response.write(x and x.author or 'None')
        x = Greeting.all().filter('x =', ekey).get()
        self.response.write(x and x.author or 'None')

app = webapp2.WSGIApplication([('/', MainPage)],
                              debug=True)
