Why does uuid() create very similar identifiers in MariaDB?

I am creating a code generator for MariaDB to create a database based on a given JSON.
That JSON also contains some initial data, so I loop over the data and insert it into the database.
Some columns have a uuid() default value.
Here's the result of my code inserting data into such a table:
Id,Guid,Key,Order
1,5c52e1db-6809-11ec-982c-0242c0a81003,New,
2,5c530e55-6809-11ec-982c-0242c0a81003,WaitingForBusinessResponse,
3,5c533551-6809-11ec-982c-0242c0a81003,WaitingForUserResponse,
4,5c536433-6809-11ec-982c-0242c0a81003,UnderInvestigation,
5,5c538ba5-6809-11ec-982c-0242c0a81003,Closed,
As you can see, the UUID values are very close to each other. The column has a unique index on it, so duplicate entries would not be allowed anyway. But values this similar are hard to track, and one might easily confuse them with each other.
Is there a way to change this behavior? I want to tell MariaDB to create UUID more randomly.

The simple answer is "no, you can't", unless you write your own uuid function that creates a UUID from truly random or pseudo-random numbers, as described in section 4.4 of RFC 4122.
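For illustration, here is what the section 4.4 algorithm amounts to. It is written in Python as application-side code, which is an assumption (the answer talks about a function inside MariaDB, but the steps are the same): you would generate the value yourself and insert it explicitly instead of relying on the uuid() column default. The function name `random_uuid` is hypothetical; Python's standard `uuid.uuid4()` already does the same thing, and the manual version only shows which bits are fixed.

```python
import os
import uuid


def random_uuid() -> uuid.UUID:
    """Build a version 4 (random) UUID following RFC 4122, section 4.4."""
    b = bytearray(os.urandom(16))  # start from 128 random bits
    b[6] = (b[6] & 0x0F) | 0x40    # high nibble of time_hi_and_version = 0100 (version 4)
    b[8] = (b[8] & 0x3F) | 0x80    # top two bits of clock_seq_hi_and_reserved = 10 (RFC 4122 variant)
    return uuid.UUID(bytes=bytes(b))


print(random_uuid())
print(random_uuid())
```

Apart from the six fixed bits, every character position can change between calls, so consecutive values no longer share long common groups.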
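The uuid() function of MariaDB (and MySQL) was implemented according to RFC 4122, but uses the algorithm for creating a time-based UUID (see section 4.2). A time-based (version 1) UUID is built from a timestamp, a clock sequence and a node identifier (typically a MAC address). Between calls made in quick succession only the first group, the low 32 bits of the timestamp, changes; the remaining timestamp groups, the clock sequence and the node stay the same, which is why your values look so alike.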
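To see this concretely, the Python sketch below parses the five Guid values from the question with the standard `uuid` module and prints the RFC 4122 fields of each one. It only inspects the values shown above; it does not call MariaDB.

```python
import uuid

# The five Guid values from the question's output.
VALUES = [
    "5c52e1db-6809-11ec-982c-0242c0a81003",
    "5c530e55-6809-11ec-982c-0242c0a81003",
    "5c533551-6809-11ec-982c-0242c0a81003",
    "5c536433-6809-11ec-982c-0242c0a81003",
    "5c538ba5-6809-11ec-982c-0242c0a81003",
]

uuids = [uuid.UUID(v) for v in VALUES]

for u in uuids:
    # Each field maps to one hyphen-separated group of the string;
    # the clock sequence and variant bits together form the fourth group.
    print(
        f"v{u.version} time_low={u.time_low:08x} time_mid={u.time_mid:04x} "
        f"time_hi_version={u.time_hi_version:04x} "
        f"clock_seq_and_variant={u.clock_seq_hi_variant:02x}{u.clock_seq_low:02x} "
        f"node={u.node:012x}"
    )

# Collect the fields whose value is identical across all five UUIDs.
constant = [
    name
    for name in ("time_mid", "time_hi_version", "clock_seq_hi_variant", "clock_seq_low", "node")
    if len({getattr(u, name) for u in uuids}) == 1
]
print("unchanged:", constant)
```

All five values report version 1, and every field except `time_low` is shared, so only the first eight hex digits carry the difference between rows.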
Since all algorithms (name-based, time-based, random) deliver a Universally Unique Identifier that is globally unique in space and time, I don't really understand why you want to change the algorithm from time-based to random.
Time-based UUIDs generated with uuidgen and MariaDB:
~$ uuidgen -t;uuidgen -t
a5d3c032-6865-11ec-bd1f-1740cb8be951
a5d42d24-6865-11ec-bd1f-1740cb8be951
~$ mariadb -e"select uuid()\G";mariadb -e"select uuid()\G"
*************************** 1. row ***************************
uuid(): 45aca397-683c-11ec-a913-d83bbf89f2e2
*************************** 1. row ***************************
uuid(): 45ad94dd-683c-11ec-a913-d83bbf89f2e2

Related

MariaDB - Inserting historical data into a system versioned (temporal) table

I have been tracking changes to some tables in MariaDB using a separate "changelog" table that is updated every time a record is updated. However, I have recently learned about temporal (system-versioned) tables in MariaDB, and I would like to switch to that method, as it is a much more elegant way of tracking changes. I'm wondering, however, whether there is a way to transfer my "changelog" table over to the newly system-versioned tables.
So I was hoping I could insert new rows with the specified values for the table, also specify the row_end and row_start columns, and have that not trigger the table to create another historical row. Is this possible? I tried simply running an "insert into (id, row_start, row_end, etc) values(x, y, z)", but that results in an unknown column "row_start" error.
Old question, but starting with 10.11, MariaDB allows direct insertion of historical data using a command-line option or system variable:
https://mariadb.com/kb/en/system-versioned-tables/#system_versioning_insert_history
system_versioning_insert_history
Description: Allows direct inserts into ROW_START and ROW_END columns if secure_timestamp allows changing timestamp.
Commandline: --system-versioning-insert-history[={0|1}]
Scope: Global, Session
Dynamic: Yes
Type: Boolean
Default Value: OFF
Introduced: MariaDB 10.11.0
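Before such inserts can be run, the changelog has to be turned into validity intervals. The Python below is a minimal sketch of that step, assuming a hypothetical changelog layout in which each entry stores the record id, the time of the change and the full row as it looked afterwards. Each version is treated as valid from its own change time until the next change of the same record; the last version of each record is returned separately as the current row, since it is not history.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Change:
    """Hypothetical changelog entry: the full row after one change."""
    record_id: int
    changed_at: datetime
    values: Tuple[Tuple[str, object], ...]


@dataclass(frozen=True)
class HistoryRow:
    """One closed version, ready to be inserted with explicit ROW_START/ROW_END."""
    record_id: int
    values: Tuple[Tuple[str, object], ...]
    row_start: datetime
    row_end: datetime


def to_history(changes: List[Change]) -> Tuple[List[HistoryRow], Dict[int, Change]]:
    """Split a changelog into closed history intervals and the current version per record."""
    history: List[HistoryRow] = []
    current: Dict[int, Change] = {}
    ordered = sorted(changes, key=lambda c: (c.record_id, c.changed_at))
    for record_id, group in groupby(ordered, key=lambda c: c.record_id):
        versions = list(group)
        # A version stays valid until the next version of the same record starts.
        for older, newer in zip(versions, versions[1:]):
            history.append(HistoryRow(record_id, older.values, older.changed_at, newer.changed_at))
        current[record_id] = versions[-1]
    return history, current
```

Each `HistoryRow` would then be inserted with its ROW_START and ROW_END values while system_versioning_insert_history is enabled for the session (and secure_timestamp permits changing the timestamp, as the description says). How best to load the current rows is left to the MariaDB documentation linked above.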

What is the point of Snowflake's Unique constraint?

Snowflake offers a Unique constraint but doesn't actually enforce it. I have an example below showing that with a test table.
What is the point? What value does the constraint add?
What workarounds do people use to avoid duplicates? I could run a query before every insert, but that seems wasteful.
CREATE OR REPLACE TABLE dbo.Test
(
"A" INT NOT NULL UNIQUE,
"B" STRING NOT NULL
);
INSERT INTO dbo.Test
VALUES (0, 'ABC');
INSERT INTO dbo.Test
VALUES (0, 'DEF');
SELECT *
FROM dbo.Test;
A, B
0, ABC
0, DEF
For one, Snowflake is not alone in this world. Data gets imported and exported, and while Snowflake does not enforce the constraints, some other systems might, and this way the constraints don't get lost while the data travels through Snowflake.
For another, the constraints are informational for data analysis tools, as already mentioned in the link Kirby provided.
Please remember that the check and the insert execute as separate, consecutive statements, so running a check before every insert will still get you duplicates under high concurrency. To avoid duplicates fully, you need to either run merges (which is admittedly going to be slower) or manually delete the "excess" rows after they have been loaded.
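The "delete the excess rows after loading" option boils down to a rule of keeping one row per key. The Python below is only a model of that rule, using the rows from the example above and assuming that the first-loaded row should win (which duplicate to keep is a business decision). In Snowflake itself you would express the same rule in SQL over the loaded table, for example with ROW_NUMBER() partitioned by the key column.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

Row = TypeVar("Row")


def keep_first_per_key(rows: Iterable[Row], key: Callable[[Row], Hashable]) -> List[Row]:
    """Return the rows in order, dropping any later row whose key was already seen."""
    seen = set()
    kept: List[Row] = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


loaded = [(0, "ABC"), (0, "DEF"), (1, "GHI")]
print(keep_first_per_key(loaded, key=lambda r: r[0]))
```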

Change the schema of a DynamoDB table: what is the best/recommended way?

What is the Amazon-recommended way of changing the schema of a large table in a production DynamoDB?
Imagine a hypothetical case where we have a table Person, with primary hash key SSN. This table may contain 10 million items.
Now the news comes that due to the critical volume of identity thefts, the government of this hypothetical country has introduced another personal identification: Unique Personal Identifier, or UPI.
We have to add a UPI column and change the schema of the Person table so that the primary hash key becomes UPI. For some time we want to support both the current system, which uses SSN, and the new system, which uses UPI, so both columns need to co-exist in the Person table.
What is the Amazon-recommended way to do this schema change?
There are a couple of approaches, but first you must understand that you cannot change the schema of an existing table. To get a different schema, you have to create a new table. You may be able to reuse your existing table, but the result would be the same as if you created a different table.
Lazy migration to the same table, without Streams. Every time you modify an entry in the Person table, create a new item in the Person table using UPI rather than SSN as the value of the hash key, and delete the old item keyed by SSN. This assumes that UPI draws from a different range of values than SSN: if SSN looks like XXX-XX-XXXX, then as long as UPI has a different number of digits than SSN, you will never have an overlap.
Lazy migration to the same table, using Streams. When Streams becomes generally available, you will be able to turn on a Stream for your Person table. Create a stream with the NEW_AND_OLD_IMAGES stream view type, and whenever you detect a change that adds a UPI to an existing person in the Person table, have a Lambda function remove the person keyed by SSN and add a person with the same attributes keyed by UPI. This approach has race conditions, which can be mitigated by adding an atomic version-counter attribute to the item and conditioning the DeleteItem call on that attribute.
Preemptive (scripted) migration to a different table, using Streams. Run a script that scans your table and adds a unique UPI to each Person-item in the Person table. Create a stream on Person table with the NEW_AND_OLD_IMAGES stream view type and subscribe a lambda function to that stream that writes all the new Persons in a new Person_UPI table when the lambda function detects that a Person with a UPI was changed or when a Person had a UPI added. Mutations on the base table usually take hundreds of milliseconds to appear in a stream as stream records, so you can do a hot failover to the new Person_UPI table in your application. Reject requests for a few seconds, point your application to the Person_UPI table during that time, and re-enable requests.
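The re-keying step of the lazy approaches, including the version-conditioned delete, can be sketched as follows. `InMemoryTable` and `migrate_item` are hypothetical names, and the dict-backed table is a stand-in for illustration only, not the boto3 API; against real DynamoDB the same idea would be a PutItem followed by a DeleteItem whose ConditionExpression checks the version attribute.

```python
from typing import Any, Dict, Optional


class ConditionFailed(Exception):
    """Raised when a conditional delete finds an unexpected version."""


class InMemoryTable:
    """Toy stand-in for a table with a single hash key (illustration only)."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}

    def put_item(self, key: str, item: Dict[str, Any]) -> None:
        self.items[key] = dict(item)

    def delete_item(self, key: str, expected_version: Optional[int] = None) -> None:
        current = self.items.get(key)
        if expected_version is not None and (
            current is None or current.get("version", 0) != expected_version
        ):
            raise ConditionFailed(key)
        self.items.pop(key, None)


def migrate_item(table: InMemoryTable, ssn: str, upi: str) -> None:
    """Re-key one Person item from SSN to UPI, guarded by its version counter."""
    old = table.items[ssn]
    version = old.get("version", 0)
    # Write the copy keyed by UPI first, so the person is never missing.
    table.put_item(upi, {**old, "UPI": upi, "version": version + 1})
    # Delete the SSN item only if nobody changed it in the meantime. If this
    # raises, the UPI copy is stale; retrying overwrites it with the newer data.
    table.delete_item(ssn, expected_version=version)
```

Writing the new item before deleting the old one means a failed condition leaves at worst a stale duplicate that a retry repairs, rather than a lost person.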
DynamoDB Streams enable us to migrate tables without any downtime. I've done this to great effect, and the steps I've followed are:
1. Create a new table (let us call it NewTable) with the desired key structure, LSIs and GSIs.
2. Enable DynamoDB Streams on the original table.
3. Associate a Lambda with the Stream, which pushes each record into NewTable. (This Lambda should trim off the migration flag added in Step 5.)
4. [Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has the attributes Primary Key and Migrated (see Step 5).
5. Scan the GSI created in the previous step (or the entire table) using the following filter:
FilterExpression = "attribute_not_exists(Migrated)"
Update each item in the table with a migrate flag (i.e. "Migrated": { "S": "0" }), which sends it to the DynamoDB Stream (use the UpdateItem API, to ensure no data loss occurs).
NOTE: You may want to increase the write capacity units on the table during the updates.
6. The Lambda will pick up all items, trim off the Migrated flag and push them into NewTable.
7. Once all items have been migrated, repoint the code to the new table.
8. Remove the original table and the Lambda function once you are happy that all is good.
Following these steps should ensure you have no data loss and no downtime.
I've documented this on my blog, with code to assist:
https://www.abhayachauhan.com/2018/01/dynamodb-changing-table-schema/
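The Lambda from steps 3 and 6 could look roughly like the Python sketch below. This is not the code from the blog post: the new table's key attribute (`UPI`), the table name `NewTable` and the handling of REMOVE events (which needs the old image, e.g. the NEW_AND_OLD_IMAGES view type) are assumptions. The record-processing logic takes the write functions as parameters so it can be exercised without AWS; only `lambda_handler` touches boto3.

```python
from typing import Any, Callable, Dict, List

# Assumption: the new table is keyed by the single attribute "UPI" (hypothetical).
NEW_KEY_ATTRS = ("UPI",)
# The temporary flag written during the scan-and-update step.
MIGRATION_FLAG = "Migrated"

Item = Dict[str, Dict[str, Any]]  # DynamoDB attribute-value JSON, e.g. {"UPI": {"S": "..."}}


def process_records(
    records: List[Dict[str, Any]],
    put_item: Callable[[Item], None],
    delete_item: Callable[[Item], None],
) -> None:
    """Replay DynamoDB stream records into the new table."""
    for record in records:
        event_name = record["eventName"]  # "INSERT", "MODIFY" or "REMOVE"
        images = record["dynamodb"]
        if event_name in ("INSERT", "MODIFY"):
            # Copy the new image without the migration flag.
            item = {k: v for k, v in images["NewImage"].items() if k != MIGRATION_FLAG}
            if all(k in item for k in NEW_KEY_ATTRS):  # skip items that lack the new key
                put_item(item)
        elif event_name == "REMOVE":
            old = images.get("OldImage", {})
            if all(k in old for k in NEW_KEY_ATTRS):
                delete_item({k: old[k] for k in NEW_KEY_ATTRS})


def lambda_handler(event, context):
    import boto3  # imported here so process_records can be tested offline

    client = boto3.client("dynamodb")
    process_records(
        event["Records"],
        put_item=lambda item: client.put_item(TableName="NewTable", Item=item),
        delete_item=lambda key: client.delete_item(TableName="NewTable", Key=key),
    )
```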
I'm using a variant of Alexander's third approach. Again, you create a new table that will be updated as the old table is updated. The difference is that you use code in the existing service to write to both tables while you're transitioning instead of using a lambda function. You may have custom persistence code that you don't want to reproduce in a temporary lambda function and it's likely that you'll have to write the service code for this new table anyway. Depending on your architecture, you may even be able to switch to the new table without downtime.
However, the nice part about using a lambda function is that any load introduced by additional writes to the new table would be on the lambda, not the service.
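The dual-write variant can be sketched as a small repository in the existing service. The class name, the dicts standing in for the two tables and the `read_from_new` switch are hypothetical; the field names follow the Person/SSN/UPI example from the question.

```python
from typing import Any, Dict, Optional

Store = Dict[str, Dict[str, Any]]  # stand-in for a table: key -> item


class DualWritePersonRepository:
    """During the transition, write every change to both the old and the new table."""

    def __init__(self, old: Store, new: Store, read_from_new: bool = False) -> None:
        self.old = old
        self.new = new
        self.read_from_new = read_from_new  # flip once the new table is backfilled

    def save(self, person: Dict[str, Any]) -> None:
        self.old[person["SSN"]] = dict(person)
        # Only people who already have a UPI can be stored in the UPI-keyed table.
        if "UPI" in person:
            self.new[person["UPI"]] = dict(person)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        store = self.new if self.read_from_new else self.old
        return store.get(key)
```

Because the service owns both writes, switching reads is a single flag change, and existing persistence code can be reused instead of being duplicated in a temporary Lambda.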
If the changes involve changing the partition key, you can add a new GSI (global secondary index). Moreover, you can always add new columns/attributes to DynamoDB without needing to migrate tables.

How to make values unique in Cassandra

I want to create a unique constraint in Cassandra.
I want every value in each of my columns to be unique within my column family.
For example:
name-rahul
phone-123
address-abc
Now I want to make sure that no row containing rahul, 123 or abc gets inserted again. Searching on DataStax, I found that I can achieve this for the partition key with IF NOT EXISTS, but I can't find a way to make all 3 values unique.
That is, if I try to insert
name-jacob
phone-123
address-qwe
this should also not be inserted into my database, because its phone column has the same value as the row with name-rahul.
The short answer is that constraints of any type are not supported in Cassandra. They are simply too expensive, as they must involve multiple nodes, which would defeat the purpose of having eventual consistency in the first place. If you only needed to make a single column unique, there could be a solution, but not for several unique columns. For the same reason, there is no isolation and no consistency (the C and I in ACID). If you really need to use Cassandra with this type of enforcement, you will need to build some kind of synchronization layer in your application that intercepts all requests to the database and makes sure that values are unique and all constraints are enforced. But this won't have anything to do with Cassandra.
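To make the "synchronization application layer" idea concrete, here is a minimal Python sketch of the check it would perform for the question's three columns. The `UniqueValueGuard` class is hypothetical and only protects writes made through a single process; with several application instances the used values would have to live in a shared, consistently updated store, which is exactly the expensive coordination the answer warns about.

```python
import threading
from typing import Dict, Iterable, Set


class UniqueValueGuard:
    """Single-process guard: reject a row if any constrained value was already used."""

    def __init__(self, columns: Iterable[str]) -> None:
        self._used: Dict[str, Set[object]] = {c: set() for c in columns}
        self._lock = threading.Lock()

    def try_claim(self, row: Dict[str, object]) -> bool:
        # The check and the claim happen under one lock, so two threads in this
        # process cannot both pass the check with the same value.
        with self._lock:
            if any(row[c] in used for c, used in self._used.items()):
                return False
            for c, used in self._used.items():
                used.add(row[c])
            return True


guard = UniqueValueGuard(["name", "phone", "address"])
print(guard.try_claim({"name": "rahul", "phone": "123", "address": "abc"}))
print(guard.try_claim({"name": "jacob", "phone": "123", "address": "qwe"}))
```

Only rows for which `try_claim` returns True would then be written to Cassandra.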
I know this is an old question and the existing answer is correct (you can't do constraints in C*), but you can solve the problem using batched creates. Create one or more additional tables, each with the constrained column as the primary key and then batch the creates, which is an atomic operation. If any of those column values already exist the entire batch will fail. For example if the table is named Foo, also create Foo_by_Name (primary key Name), Foo_by_Phone (primary key Phone), and Foo_by_Address (primary key Address) tables. Then when you want to add a row, create a batch with all 4 tables. You can either duplicate all of the columns in each table (handy if you want to fetch by Name, Phone, or Address), or you can have a single column of just the Name, Phone, or Address.

Generating Order Numbers - Keep unique across multiple machines - Unique string seed

I'm attempting to create an order number for customers to use. I will have multiple machines that do not have access to the same database (so I can't use primary keys to generate a unique ID).
I will have a unique string that I could use for a seed for some algorithm that will generate a unique looking alphanumeric ID # for the order number. I do not want to use this unique string as the order # because its contents would not be appropriate in appearance for a customer to use for order #.
Would it be possible to combine the use of a GUID & my unique string with some algorithm to create a unique order #?
Open to any suggestions.
If you have a relatively small number of machines and each one can have its own configuration file or setting, you can assign a letter to each machine (A, B, C...) and then append the letter to the order number, which could just be an auto-incrementing integer in each DB.
For example, with each database ID starting at 1000:
1001A // First order on database A
1001B // First order on database B
1001C // First order on database C
1002A // Second order on database A
1003A // Third order on database A
1004A // etc...
1002B
1002C
Your order table in each database would have an ID column (integer) and a "machine" identifier column (character A, B, C...), so if you ever needed to combine the DBs into one, each order would still be unique.
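The numbering scheme above can be sketched in Python as follows. `OrderNumberGenerator` is a hypothetical name, and its in-memory counter only models the per-database auto-increment column; in practice the integer should come from each machine's own database so it survives restarts.

```python
import itertools


class OrderNumberGenerator:
    """Per-machine counter with a machine-letter suffix: 1001A, 1002A, ..."""

    def __init__(self, machine: str, start: int = 1000) -> None:
        if len(machine) != 1 or not machine.isalpha():
            raise ValueError("machine identifier must be a single letter")
        self.machine = machine.upper()
        # Models the database auto-increment; the first number issued is start + 1.
        self._counter = itertools.count(start + 1)

    def next_order_number(self) -> str:
        return f"{next(self._counter)}{self.machine}"


a = OrderNumberGenerator("A")
b = OrderNumberGenerator("B")
print(a.next_order_number(), b.next_order_number(), a.next_order_number())
```

Two machines can issue the same integer, but never the same integer with the same letter, which is what keeps the combined numbers unique.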
Just use a plain GUID/UUID. Time-based (version 1) UUIDs take into account the MAC address of the network interface, which makes them unique to that machine.
http://en.wikipedia.org/wiki/Uuid
You can use IDs as a primary key if you generate the ID from a stored procedure (or perhaps, in Oracle, from a sequence).
What you have to do is make each machine generate IDs in a different range, e.g. machine A from 1 to 1,000,000, machine B from 1,000,001 to 2,000,000, etc.
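The range split can be computed from a machine index, as in this Python sketch. The index numbering (machine A = 0, B = 1, ...) and the name `id_range` are assumptions for illustration; each machine would still need to check that it stays inside its range.

```python
BLOCK_SIZE = 1_000_000


def id_range(machine_index: int, block_size: int = BLOCK_SIZE) -> range:
    """IDs reserved for one machine: index 0 -> 1..1,000,000, index 1 -> 1,000,001..2,000,000."""
    start = machine_index * block_size + 1
    return range(start, start + block_size)


for index, name in enumerate("AB"):
    r = id_range(index)
    print(name, r[0], r[-1])
```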
You say you have a unique string that would not be 'appropriate' to show to customers.
If it is merely inappropriate in appearance, and not security- or privacy-sensitive, you could just transform it somehow. A simple example would be ROT13.
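In Python, ROT13 is available as a standard text codec, so the transformation is a one-liner. The function name `disguise` is hypothetical. Note that ROT13 only rotates letters (digits and punctuation pass through unchanged) and is trivially reversible, which is why it only fits the non-sensitive case described above.

```python
import codecs


def disguise(unique_string: str) -> str:
    """ROT13-transform the letters of the string; anything else is left as-is."""
    return codecs.encode(unique_string, "rot13")


print(disguise("Order-ABC123"))  # Beqre-NOP123
```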
But in general I too would suggest using a UUID (version 4, which is random). The probability of generating duplicates is extremely low, and libraries are available for many programming languages.
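Since the question asks for an alphanumeric order number, a version 4 UUID can also be rendered more compactly. The Python sketch below encodes all 128 bits as base32, which is a presentation choice of this example rather than part of the answer: the result is 26 upper-case letters and digits, and it can be decoded back to the original UUID. The function name `new_order_number` is hypothetical.

```python
import base64
import uuid


def new_order_number() -> str:
    """Random (version 4) UUID rendered as 26 base32 characters (A-Z, 2-7)."""
    u = uuid.uuid4()
    # 16 bytes encode to 26 base32 characters plus 6 '=' padding characters.
    return base64.b32encode(u.bytes).decode("ascii").rstrip("=")


print(new_order_number())
```

Because no bits are dropped, the collision probability stays the same as for the UUID itself; shortening the string further would raise it.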
