Varchar measure in OLAP cube

Must all measures be numeric? What about a varchar measure?

You cannot have real varchar measures. You can, however, create a calculated measure that has a varchar value assigned or calculated, but I believe that is not what you want.
What you can do is create a fact dimension (degenerate dimension) and use that field either as an attribute hierarchy or as a member property.
HTH,
Hrvoje

Related

DynamoDB schema for querying nearby coordinates

I'm struggling with the DynamoDB schema design for a table storing locations. The table will have [userId, lastUpdatedTime, locationGooglePlaceId, longitude, latitude, hideOnUI(bool)]
One of the main queries is: given the user's current location (x, y) as GPS coordinates, find nearby userIds based on their longitude and latitude.
The problem is how to design an index for this purpose. The table itself can have HASH key userId and SORT key lastUpdatedTime, but how would the GSI go? I can't seem to identify any partition key for an "equal" operation.
In SQL it would be something like:
select * from table
where x-c <= longitude and longitude < x+c
AND y-c <= latitude and latitude < y+c
Thanks
First of all, I am not sure DynamoDB is a good fit here. It may be better to use another database, since DynamoDB does not support complex indexes.
Nonetheless, here is a design that you can try.
Split your map into square blocks; every block has an id and a known position and size.
Then, if you have a location and want to find all nearby points, you can do the following.
Every point in your database is stored in the Points table with the following attributes:
BlockId (String, UUID, partition key) - id of the block this point belongs to
Latitude (Number, sort key) - latitude of the point
Longitude (Number) - simple attribute
Now, if you know which square the user's location is in and which squares are nearby, you can run the following query against each nearby square:
BlockId = <nearby_block_id>
Latitude between(y-c, y+c)
and use a filter expression on the Longitude attribute:
Longitude between(x-c, x+c)
It does not really matter whether you use latitude or longitude as the sort key here.
BETWEEN is a DynamoDB operator that can be used with sort keys or in filter expressions:
BETWEEN : Greater than or equal to the first value, and less than or
equal to the second value. AttributeValueList must contain two
AttributeValue elements of the same type, either String, Number, or
Binary (not a set type). A target attribute matches if the target
value is greater than, or equal to, the first element and less than,
or equal to, the second element. If an item contains an AttributeValue
element of a different type than the one provided in the request, the
value does not match. For example, {"S":"6"} does not compare to
{"N":"6"}. Also, {"N":"6"} does not compare to {"NS":["6", "2", "1"]}
The downside is that DynamoDB limits how much data can share a single partition key (an item collection is capped at 10 GB when the table has a local secondary index), so the number of points you can put in a single square is limited. You can work around this if your squares are small enough, or if they have variable sizes, with big squares for sparse areas and small squares for crowded areas, but that seems to be a non-trivial project.
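The block lookup described above can be sketched roughly as follows. This is only an illustration under assumptions: the blocks are a fixed-size grid in degrees, the block id is a "row:col" string rather than the UUID the answer mentions, and the helper names (block_id, blocks_to_query, query_params) are hypothetical. The last function only builds the parameters for a DynamoDB Query call; it does not contact DynamoDB.

```python
import math

BLOCK_SIZE_DEG = 0.01  # assumed edge length of a block, in degrees


def block_id(lat, lon, size=BLOCK_SIZE_DEG):
    """Return the id of the grid block that contains (lat, lon)."""
    return f"{math.floor(lat / size)}:{math.floor(lon / size)}"


def blocks_to_query(lat, lon, c, size=BLOCK_SIZE_DEG):
    """Ids of all blocks overlapping the square [lat-c, lat+c] x [lon-c, lon+c]."""
    rows = range(math.floor((lat - c) / size), math.floor((lat + c) / size) + 1)
    cols = range(math.floor((lon - c) / size), math.floor((lon + c) / size) + 1)
    return [f"{r}:{k}" for r in rows for k in cols]


def query_params(block, lat, lon, c, table="Points"):
    """Build Query parameters for one block: key condition on the
    partition key and sort key (Latitude), filter on Longitude."""
    return {
        "TableName": table,
        "KeyConditionExpression": "#b = :b AND #lat BETWEEN :lat_lo AND :lat_hi",
        "FilterExpression": "#lon BETWEEN :lon_lo AND :lon_hi",
        "ExpressionAttributeNames": {
            "#b": "BlockId", "#lat": "Latitude", "#lon": "Longitude",
        },
        "ExpressionAttributeValues": {
            ":b": {"S": block},
            ":lat_lo": {"N": str(lat - c)},
            ":lat_hi": {"N": str(lat + c)},
            ":lon_lo": {"N": str(lon - c)},
            ":lon_hi": {"N": str(lon + c)},
        },
    }
```

One Query is issued per id returned by blocks_to_query, so a smaller block size means more queries per search but fewer points scanned in each.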

Basic entity ID: integer or text?

I'm new to SQLite, teaching a class, and am wondering about entity ids. If you have only numbers in your id, should you choose integer, or does it ever make sense to choose text as the data type? Or are integers used only for values that can be used in calculations? What if the id contains a number and a letter (text?)? Thanks!
If your entity id is the primary key and auto-increments, it should be an integer; otherwise the id can be varchar or text, whichever you prefer.
As you said, if the ids contain only numbers, use integer, provided you are sure the id will always be a number.
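A small sketch of the difference, using Python's built-in sqlite3 module. The table and column names (student, locker, code) are made up for illustration. It shows that an INTEGER PRIMARY KEY is assigned automatically when omitted, while a TEXT key accepts letters but sorts character by character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE locker (code TEXT PRIMARY KEY, note TEXT)")

# The id column is omitted, so SQLite assigns 1, 2, ... itself.
conn.executemany("INSERT INTO student (name) VALUES (?)", [("Ann",), ("Bob",)])
auto_ids = [row[0] for row in conn.execute("SELECT id FROM student ORDER BY id")]

# Text ids can mix digits and letters, but "10" sorts before "9".
conn.executemany("INSERT INTO locker (code) VALUES (?)", [("9",), ("10",), ("A7",)])
text_order = [row[0] for row in conn.execute("SELECT code FROM locker ORDER BY code")]

print(auto_ids)    # [1, 2]
print(text_order)  # ['10', '9', 'A7']
```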

How do I define a dimension so that null values in the FK are not ignored when showing all values?

I am modeling an OLAP cube using Mondrian Schema Workbench and using Jaspersoft to present it. The cube is built upon a fact table with FKs to dimension tables.
Currently my fact table has nullable foreign keys to the dimensions, which I personally find interesting (and, as far as I know, it is just a styling decision whether to use nullable or non-nullable FKs: https://dba.stackexchange.com/questions/3512/fact-table-foreign-keys-null ).
The problem is that when selecting ALL States (State is a dimension in my design), I get only the records that have a state, not the records without one (in which the state id is null).
Is Mondrian capable of returning the rows that have no state id? How can I define that?
I think you'll have to go with non-nullable FKs and a none / n/a / unknown etc. member if you want the ALL member to refer to all facts.
If you later want to write queries that only consider rows with real dimension values, you can exclude the none member again.
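The effect of an explicit unknown member can be sketched with plain SQL, run here through Python's sqlite3 module. This is not Mondrian itself, only the join behaviour underneath it; the table names, the sample data and the choice of 0 as the unknown key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (state_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sales (
    amount   INTEGER NOT NULL,
    -- facts without a state point at the 'Unknown' member instead of NULL
    state_id INTEGER NOT NULL DEFAULT 0 REFERENCES state(state_id)
);
INSERT INTO state VALUES (0, 'Unknown'), (1, 'NY'), (2, 'CA');
INSERT INTO sales (amount, state_id) VALUES (10, 1), (20, 2);
INSERT INTO sales (amount) VALUES (5);
""")

# An inner join now keeps every fact row, so "all states" covers all facts.
all_total = conn.execute(
    "SELECT SUM(f.amount) FROM sales f JOIN state s ON s.state_id = f.state_id"
).fetchone()[0]

# Queries that want only real states exclude the unknown member again.
known_total = conn.execute(
    "SELECT SUM(f.amount) FROM sales f JOIN state s ON s.state_id = f.state_id "
    "WHERE s.state_id <> 0"
).fetchone()[0]

print(all_total, known_total)  # 35 30
```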

Can we show a string value as a measure in Mondrian OLAP?

I want to show a string value as one of the measure values. My fact table has an integer column and a string column, along with some foreign keys to other tables. I can show the integer value as a measure, but not the string value, because the Measure element in the cube schema (written in XML) requires an 'aggregator' attribute, which specifies the aggregate function for the measure. Of course I understand that string values can't be aggregated, but I want to show the string value at the lowest level of the hierarchy.
I read the following article. A figure (around the middle of the page) shows a cube that contains a string value as a measure value. But that is an example of a property value from a dimension table, so the string value is not stored in the fact table. I want to show a string value that is stored in the fact table.
A Simple Date Dimension for Mondrian Cubes
Does anyone have an idea how a string value can be shown as a measure? Or do I have to edit Mondrian's source code?
I have had the same problem and solved it by setting the aggregator attribute in the measure tag to max.
e.g.
<Measure name="Comment" datatype="String" column="comment" caption="Comment" aggregator="max"/>
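To see why a max aggregator can work for strings, here is a rough sketch using Python's sqlite3 module with a made-up fact table. It is not the SQL Mondrian actually generates, only the general idea: when a cell maps to a single fact row, MAX returns that row's string unchanged, while at higher levels it returns the alphabetically last string in the group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (order_id INTEGER, region TEXT, qty INTEGER, comment TEXT);
INSERT INTO fact VALUES
    (1, 'east', 3, 'late'),
    (2, 'east', 5, 'ok'),
    (3, 'west', 2, 'damaged');
""")

# Lowest level: one row per group, so MAX(comment) is simply the comment.
per_order = conn.execute(
    "SELECT order_id, MAX(comment) FROM fact GROUP BY order_id ORDER BY order_id"
).fetchall()

# Higher level: MAX picks the alphabetically last comment in each region.
per_region = conn.execute(
    "SELECT region, MAX(comment) FROM fact GROUP BY region ORDER BY region"
).fetchall()

print(per_order)   # [(1, 'late'), (2, 'ok'), (3, 'damaged')]
print(per_region)  # [('east', 'ok'), ('west', 'damaged')]
```

The second result is why the value is only meaningful at the lowest level of the hierarchy.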
Why does it need to be a measure?
If no aggregation would naturally be applied to it and you just want the string value, it is a dimension, not a measure. Trying to force it to be a measure is not the best approach.
I think the figure you reference is just showing a drillthrough, and that the only actual measure is Turnover. The report layout is slightly misleading in terms of dimensions and measures.
You can just use the fact table again in the schema as a dimension table if for some reason you don't want to split this out into a separate physical table.
Sounds like the string may be high cardinality to the integer, possibly 1:1. Depending upon the size of your cube, this might or might not be a performance challenge. But don't try to make it a measure.
Good luck!

Bounding Box in SQL database Geodata

I have a SQL table:
SuburbId int,
SuburbName varchar,
StateName varchar,
Postcode varchar,
Latitude Decimal,
Longitude Decimal
In my C# code I create a bounding box so I can search by distance.
My stored procedure to get the suburbs is:
CREATE PROCEDURE [dbo].[Lookup] (
@MinLat decimal(18,15),
@MaxLat decimal(18,15),
@MinLon decimal(18,15),
@MaxLon decimal(18,15)
)
AS
BEGIN
SELECT SuburbId, SuburbName, StateName, Latitude, Longitude
FROM SuburbLookup
WHERE (Latitude >= @MinLat AND Latitude <= @MaxLat AND Longitude >= @MinLon AND Longitude <= @MaxLon)
END
My question: this results in a Clustered Index Scan. Is there a more efficient way of doing this?
This type of query tends to perform quite poorly with the standard B-tree index. For better performance you can use the geography column type and add a spatial index.
Queries such as WHERE geography.STDistance(geography2) < number can use a spatial index.
A couple of links that should help. Of course, depending on the scope of your project, you may already have the best solution.
That said, if you care to, you can create a custom index in SQL Server for your locations.
Custom Indexing
Additionally, you could look into quadtrees and quadtiles. One technique calculates a key by interleaving the bits of the lat and lon values, so the pair can be represented as a single integer; truncating keys to a base level then shows how locations relate to each other.
see more here
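The interleaved key can be sketched roughly like this. It is a minimal illustration, assuming 16 bits per axis and a simple linear scaling of latitude and longitude; the function names are hypothetical and not taken from any particular library.

```python
def quantize(value, lo, hi, bits=16):
    """Scale value from [lo, hi] to an integer in [0, 2**bits - 1]."""
    scale = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * scale)


def interleave(x, y, bits=16):
    """Interleave bits: x fills the even positions, y the odd positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def quadtile_key(lat, lon, bits=16):
    """Single integer key for a lat/lon pair."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)


def parent_tile(key, levels):
    """Truncate a key by dropping its lowest 2*levels bits,
    giving the key of the enclosing, coarser tile."""
    return key >> (2 * levels)
```

Two locations fall in the same tile at a given level when their truncated keys are equal, so an ordinary index on the integer key can serve a range lookup for a tile. Points close to a tile edge can still have neighbours in adjacent tiles, which need to be checked separately.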
