Cloudera Impala PARQUET_FALLBACK_SCHEMA_RESOLUTION

Is it possible to configure Cloudera Impala (5.12) to default to name instead of position for PARQUET_FALLBACK_SCHEMA_RESOLUTION?
My Parquet files don't always have the same set of columns, so we need Impala to look them up by name rather than by position, and it's a bit of a pain to do this in Hue for every session:
set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;

I'm afraid this isn't configurable on the Impala side.
case TImpalaQueryOptions::PARQUET_FALLBACK_SCHEMA_RESOLUTION: {
  if (iequals(value, "position") ||
      iequals(value, to_string(TParquetFallbackSchemaResolution::POSITION))) {
    query_options->__set_parquet_fallback_schema_resolution(
        TParquetFallbackSchemaResolution::POSITION);
  } else if (iequals(value, "name") ||
      iequals(value, to_string(TParquetFallbackSchemaResolution::NAME))) {
    query_options->__set_parquet_fallback_schema_resolution(
        TParquetFallbackSchemaResolution::NAME);
  } else {
    return Status(Substitute("Invalid PARQUET_FALLBACK_SCHEMA_RESOLUTION option: "
        "'$0'. Valid options are 'POSITION' and 'NAME'.", value));
  }
  break;
}
The Impala server doesn't set default query options; all options are set when the client session is established. So you need to configure whichever client you are using (see shell/impala_shell_config_defaults.py, for instance).
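For example, if your sessions come from a Python client, a minimal sketch with impyla could issue the SET right after connecting (the host, port, and table name here are placeholders):
# Minimal sketch using the impyla client; host, port and table are placeholders.
from impala.dbapi import connect

conn = connect(host='impalad-host', port=21050)
cur = conn.cursor()
# Session-level query option, applied once per connection.
cur.execute('SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name')
cur.execute('SELECT * FROM my_parquet_table LIMIT 10')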
However, you can still modify the code and recompile :)
In common/thrift/ImpalaInternalService.thrift:
struct TQueryOptions {
  ...
  // Determines how to resolve Parquet files' schemas in the absence of field IDs (which
  // is always, since field IDs are NYI). Valid values are "position" (default) and
  // "name".
  43: optional TParquetFallbackSchemaResolution parquet_fallback_schema_resolution = 0 <--- change it to 1
  ...
}

Thanks for the info, Amos.
I posted the same question on the Cloudera forums and they pointed me to a way to configure this through Cloudera Manager:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/PARQUET-FALLBACK-SCHEMA-RESOLUTION/m-p/62318#M3883

Related

DynamoDB/Amplify non-negative field and field validation on mutations

I am new to AWS in general; I am building a relatively simple application with Amplify, but I've used Google Firebase before. My question is: is there a way to set a constraint for a field to be non-negative? I have an application that does transactions and I don't want my balance to become negative. I just need a simple error/exception. Is it possible to set a field constraint in DynamoDB that says "this field should be >= 0"?
I also checked whether it was possible to do it in the Amplify-generated VTL resolver of my GraphQL mutation, and indeed it is possible to set some constraints. But somehow it allows the operation and crashes on the next one (when the balance in the DB is already < 0), as if it checks the condition before the update. I tried expressing something like "current_balance - transaction >= 0" but I couldn't get it to work.
So it seems that the only way is to create a custom Lambda resolver that does the various checks before submitting the mutation to DynamoDB. I haven't tried it yet, but I don't understand how I can check the current balance (stored in the DB) without doing a query.
More generally, is it even possible to validate fields (even with simple assertions like non-negative) on Amplify/DynamoDB? Would moving to another DB like Aurora help?
Thanks for your help
DynamoDB supports conditional updates, which apply an update only when the given condition is met. You can set the condition current_balance >= cost for your update.
However, the negative balance is not the main problem. What you should address is how to prevent other requests from updating the same current_balance at the same time; in short, race conditions on current_balance. To deal with that, you also need a conditional update whose condition is "current_balance = initial_balance". The initial_balance is, I guess, what you read from DynamoDB at the very beginning of the purchase process.
Sample VTL code
#set( $remaining_balance = $initial_balance - $transaction_cost )
#if( $remaining_balance < 0 )
  $util.error("Insufficient balance")
#end
{
  "version" : "2018-05-29",
  "operation" : "UpdateItem",
  "key" : { <your-dynamodb-key> },
  "update" : {
    "expression" : "SET current_balance = :remaining_balance",
    "expressionValues" : {
      ":remaining_balance" : $util.dynamodb.toNumberJson($remaining_balance)
    }
  },
  "condition" : {
    "expression" : "current_balance = :initial_balance",
    "expressionValues" : {
      ":initial_balance" : $util.dynamodb.toNumberJson($initial_balance)
    }
  }
}
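When the condition expression evaluates to false, DynamoDB rejects the whole UpdateItem with a ConditionalCheckFailedException, which surfaces as an error on the mutation, so concurrent requests can never drive the balance below zero.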

Can I create additional axes/dimensions in Premake?

Premake 5 gives you two functions to separate independent configuration variables in your projects: configurations and platforms. So for example, you might have:
configurations { "Debug", "Release" }
platforms { "Windows", "Linux" }
The documentation refers to these as axes, which is a good way to describe them, since you can have independent settings for each axis:
Really, platforms are just another set of build configuration names, providing another axis on which to configure your project.
But what if I want another axis? For example, the data types used for particular calculations:
calctypes { "Long", "Default", "Short" }
Can I create this new axis, and if so, how?
I think tags (a new feature due to be released in the next alpha build) might be what you're looking for. Here is an example from the pull request where they were implemented:
workspace 'foobar'
  configurations { 'release-std', 'debug-std', 'release-blz', 'debug-blz' }

  filter { 'configurations:*-std' }
    tags { 'use-std' }

  filter { 'configurations:*-blz' }
    tags { 'use-blz' }

project 'test'
  filter { 'tags:use-blz' }
    includedependencies { 'blz' }
    defines { 'USE_BLZ' }

  filter { 'tags:use-std' }
    defines { 'USE_STD' }
Update: If you would like to see how to add custom fields (e.g. defines, configurations, etc.), have a look at the api.register() calls in _premake_init.lua. To see how to enable filtering on one of these fields, have a look at this pull request.
While adding new fields is trivial and can be done anywhere, we need to do some work before it will be as simple to enable those fields for filtering.
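For a concrete picture, here is a minimal sketch of such a registration, modeled on the api.register() calls in _premake_init.lua; the calctypes field and its allowed values are the hypothetical axis from the question, not a built-in:
-- Sketch of registering a custom "calctypes" field, modeled on the
-- api.register() calls in _premake_init.lua (calctypes is hypothetical).
local api = premake.api

api.register {
    name = "calctypes",
    scope = "config",
    kind = "list:string",
    allowed = { "Long", "Default", "Short" },
}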

Meteor GroundDB granularity for offline/online syncing

Let's say that two users make changes to the same document while offline, but in different sections of the document. If user 2 goes back online after user 1, will the changes made by user 1 be lost?
In my database, each row contains a JS object, and one property of this object is an array. This array is bound to a series of check-boxes on the interface. What I would like is that if two users change those check-boxes, the latest change is kept for each check-box individually, based on the time when the change was made, not the time when the syncing occurred. Is GroundDB the appropriate tool to achieve this? Is there any way to add an event handler in which I can put some logic that would be triggered when syncing occurs, and that would take care of the merging?
The short answer is "yes", they can be lost: none of the Ground DB versions have conflict resolution, since the logic is custom and depends on the desired behaviour, e.g. whether you want to automate the resolution or involve the user.
The old Ground DB simply relied on Meteor's conflict resolution (latest data to the server wins). I'm guessing you can see some issues with that, depending on the order in which the clients come online.
Ground DB II doesn't have method resume; it's more or less just a way to cache data offline. It observes an observable source.
I guess you could create a middleware observer for GDB II, one that checks the local data before applying the update and updates the client and/or calls the server to update the server data. This way you would have a way to handle conflicts.
I seem to remember writing some code that supported "deletedAt"/"updatedAt" for some types of conflict handling, but again, a conflict handler should be custom for the most part (opening the door for reusable conflict handlers might be useful).
Knowing when data has been removed is especially tricky if you don't "soft"-delete via something like a "deletedAt" field.
The "rc" branch is currently the grounddb-caching-2016 version "2.0.0-rc.4".
I was thinking about something like this (mind you, it's not tested; written directly in SO):
// Create the grounded collection
foo = new Ground.Collection('test');
// Make it observe a source (it's aware of createdAt/updatedAt and
// removedAt entities)
foo.observeSource(bar.find());
bar.find() returns a cursor with an observe function, and our middleware should expose the same. Let's create a createMiddleware helper for it:
function createMiddleware(source, middleware) {
  const cursor = (typeof (source || {}).observe === 'function') ? source : source.find();
  return {
    observe: function(observerHandle) {
      // Meteor cursors fire added/changed/removed callbacks;
      // we map Meteor's "changed" onto the middleware's "updated" handler.
      const sourceObserverHandle = cursor.observe({
        added: doc => {
          middleware.added.call(observerHandle, doc);
        },
        changed: (doc, oldDoc) => {
          middleware.updated.call(observerHandle, doc, oldDoc);
        },
        removed: doc => {
          middleware.removed.call(observerHandle, doc);
        },
      });
      // Return the stop handle
      return sourceObserverHandle;
    }
  };
}
Usage:
foo = new Ground.Collection('test');

foo.observeSource(createMiddleware(bar.find(), {
  added: function(doc) {
    // Just pass it through
    this.added(doc);
  },
  updated: function(doc, oldDoc) {
    const fooDoc = foo.findOne(doc._id);
    // Example of a simple conflict handler:
    if (fooDoc && doc.updatedAt < fooDoc.updatedAt) {
      // Seems like the foo doc is newer? Let's update the server...
      // (we just use the regular bar, since that's the Meteor
      // collection and foo is the grounded data)
      bar.update(doc._id, fooDoc);
    } else {
      // Pass through
      this.updated(doc, oldDoc);
    }
  },
  removed: function(doc) {
    // Again, just pass through for now
    this.removed(doc);
  }
}));
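Note that the timestamp comparison only works if every client write actually maintains updatedAt. A one-line sketch, assuming a hypothetical checked field on the document:
// Sketch: stamp updatedAt on every write so the conflict handler
// above has something to compare ("checked" is a hypothetical field).
bar.update(docId, { $set: { checked: newValue, updatedAt: new Date() } });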

Is there a suitable hook for intercepting all POSTs to an OpenACS/AOLServer system?

I'd like to disable all POSTs to an OpenACS/AOLServer installation. Is there a good single place (a request hook or wrapper/middleware) to do this?
(Bonus points if the intercept can let a few URI patterns or logged-in users through.)
Yes, this is straightforward to do. You have a choice here: you can register a proc to run instead of all POSTs, or you can register a filter to run before the POST and filter out certain users or whatever. I think the filter is the better choice.
To do this you register your proc or filter using ns_register_proc or ns_register_filter (with preauth). Put the following code in a .tcl file under the tcl folder of an OpenACS package, or under the main AOLserver /web/servername/tcl directory.
Filter example:
ns_register_filter preauth POST / filter_posts

# AOLserver calls filter procs with the filter stage ("why") as an argument.
proc filter_posts { why } {
    set user_id [ad_verify_and_get_user_id]
    # A literal Tcl list; square brackets here would be command substitution
    set list_of_allowed_user_ids {21 567 8999}
    if { [lsearch -exact $list_of_allowed_user_ids $user_id] == -1 } {
        # This user isn't allowed, so redirect them
        ns_returnredirect "/register/"
        # Tell AOLserver to abort processing of this request
        return filter_return
    } else {
        # This user is allowed; tell AOLserver to continue
        return filter_ok
    }
}
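For the bonus part, a sketch of also letting a few URI patterns through; the patterns here are made-up examples:
# Sketch: whitelist some URI patterns before falling back to the
# user check above ("/api/*" and "/register/*" are example patterns).
proc filter_posts { why } {
    set url [ns_conn url]
    foreach pattern {/api/* /register/*} {
        if { [string match $pattern $url] } {
            return filter_ok
        }
    }
    # ... user-id check from the example above goes here ...
    ns_returnredirect "/register/"
    return filter_return
}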
Proc example:
ns_register_proc POST / handle_posts

proc handle_posts {} {
    ns_returnredirect "http://someotherwebsite.com"
}

Drupal: Modify node - hook_node_presave / hook_node_insert

I have a problem with Drupal 7. I have a content type named "server", which contains different fields:
hostname
CPU speed
...
The hostname field is entered manually. The other fields must be filled in programmatically.
So I specify a hostname, and a function must look up the information (CPU speed, ...) and fill in the empty fields.
But I can't manage to update my node. I tried the functions hook_node_presave and hook_node_insert. When I print the node before (1) and after (2) these functions run, I can see the difference. But when I access the node at http://localhost/drupal/?q=node/32, the modifications have disappeared.
Here is a part of my function :
function module_node_presave($node) {
  if ($node->type == 'server') {
    dpm($node); // (1)
    $node->field_server_cpu_speed[LANGUAGE_NONE][0]['value'] = 55;
    dpm($node); // (2)
  }
}
Can someone help me?
Thanks in advance,
BDR
Try the Computed Field module to create the dynamic fields, or set the values in hook_node_presave(): it runs just before the node is written to the database, so anything you set there is saved automatically, and no node_save() call is needed (calling node_save() inside hook_node_presave() would recurse):
function module_node_presave($node) {
  if ($node->type == 'server') {
    $node->field_server_cpu_speed[LANGUAGE_NONE][0]['value'] = 55;
  }
}
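To tie it back to the question, here is a sketch of filling the fields from the hostname in the same hook; field_hostname is an assumed field machine name and my_module_lookup_cpu_speed() a hypothetical helper, not a Drupal API:
// Sketch: fill the empty fields from the hostname before the node is saved.
// field_hostname is an assumed machine name; my_module_lookup_cpu_speed()
// is a hypothetical helper that resolves the speed for a hostname.
function module_node_presave($node) {
  if ($node->type == 'server') {
    $hostname = $node->field_hostname[LANGUAGE_NONE][0]['value'];
    if (empty($node->field_server_cpu_speed[LANGUAGE_NONE][0]['value'])) {
      $node->field_server_cpu_speed[LANGUAGE_NONE][0]['value'] =
          my_module_lookup_cpu_speed($hostname);
    }
  }
}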
