I am making a SQLite database and want a way to add descriptions of the columns. For example, I may have a table measurements with columns size and weight, and want to add metadata on how the values in these columns were measured (e.g. "Measured from fingertip to fingertip", "Measured at 10 AM").
How should I organize this extra information? In a separate table metadata with columns table, column and description? Wouldn't it be difficult to retrieve? Can I in some way use the descriptions as ALIAS?
EDIT: I am aware that it is not possible to add metadata directly to columns, so the question is how to store the metadata in a way that makes it easy to append and access when running queries on the corresponding columns.
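For what it's worth, here is a minimal sketch of the separate-table idea using Python's sqlite3 module. The names column_metadata, table_name, column_name, describe_columns and select_with_descriptions are made up for illustration (table and column themselves are SQL keywords, so they are awkward as column names), and the measurement row is sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (size REAL, weight REAL);
    -- Hypothetical side table: one row per described column.
    CREATE TABLE column_metadata (
        table_name  TEXT NOT NULL,
        column_name TEXT NOT NULL,
        description TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );
    INSERT INTO column_metadata VALUES
        ('measurements', 'size',   'Measured from fingertip to fingertip'),
        ('measurements', 'weight', 'Measured at 10 AM');
    INSERT INTO measurements VALUES (1.82, 75.0);  -- sample row
""")

def describe_columns(conn, table):
    """Return {column_name: description} for one table."""
    rows = conn.execute(
        "SELECT column_name, description FROM column_metadata WHERE table_name = ?",
        (table,),
    )
    return dict(rows.fetchall())

def select_with_descriptions(conn, table):
    """Run SELECT * and pair each result column with its stored description."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    descriptions = describe_columns(conn, table)
    # cursor.description holds one tuple per result column; item 0 is its name.
    headers = [(col[0], descriptions.get(col[0], "")) for col in cursor.description]
    return headers, cursor.fetchall()

headers, rows = select_with_descriptions(conn, "measurements")
```

Retrieval is then one extra lookup per table rather than per query. The descriptions could also be spliced into the SELECT as quoted aliases, but long sentences make unwieldy column names, so keeping them alongside the result seems easier to work with.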
I'm curious how you handle cases where users want to exclude columns that hold sensitive data. I know explicitly listing the columns is an option, but what do you do when you have tables/views with 50-100+ columns?
Example: Say I have a customer table with 50 columns, and I want to exclude 15 of them from the select because they hold sensitive data. I know which columns are sensitive because I have a separate view that lists them. Are there any options for using this view to dynamically define the columns to select?
Appreciate any suggestions!
One possible solution I looked into was creating volatile tables and dropping the sensitive columns.
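One way to avoid typing the column list by hand is to generate it. The sketch below assumes SQLite (the question does not say which database is in use) and Python's sqlite3 module; the sensitive_columns lookup, the sample customer columns and the build_safe_select helper are all hypothetical stand-ins for the real table and view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER, name TEXT, ssn TEXT, card_number TEXT, city TEXT);
    -- Hypothetical lookup listing the sensitive columns per table.
    CREATE TABLE sensitive_columns (table_name TEXT, column_name TEXT);
    INSERT INTO sensitive_columns VALUES ('customer', 'ssn'), ('customer', 'card_number');
    INSERT INTO customer VALUES (1, 'Ada', '000-00-0000', '4111', 'London');
""")

def build_safe_select(conn, table):
    """Build a SELECT that names every column of `table` except the sensitive ones."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    all_columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    sensitive = {
        row[0]
        for row in conn.execute(
            "SELECT column_name FROM sensitive_columns WHERE table_name = ?", (table,)
        )
    }
    kept = [c for c in all_columns if c not in sensitive]
    column_list = ", ".join(f'"{c}"' for c in kept)
    return f'SELECT {column_list} FROM "{table}"'

query = build_safe_select(conn, "customer")
safe_rows = conn.execute(query).fetchall()
```

Other databases expose the column catalogue differently, so the PRAGMA call would need replacing there, but the shape of the approach (read the catalogue, subtract the sensitive list, build the statement) should carry over.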
I am new to OLAP. If I have two fact tables, can they share the same dimension table?
For example, if I have tables fact1 and fact2, can they both have a foreign key into a single Date dimension (dimDate) table? Or should I create a separate dimDate dimension table for each fact?
To me, and based on my research, I don't see any downside to sharing a dim table, but I wanted to check.
Thanks!
They can, and should.
That's the whole point of conformed dimensions: keeping the attributes in a single place, so as to avoid multiple versions of the truth coming from different fact tables.
So you would have a single date dimension, with all the necessary attributes for each fact table, which is then linked from each fact table that needs it.
The same goes for a customer dimension. If you have a sales fact table that needs customer info such as billing address, and a marketing fact table that holds info about campaigns each customer can benefit from, you would combine all those attributes in a single table. Some customers may not be referenced in the marketing fact table, others may not exist in the sales fact table, but all would exist in the single customer dimension, which is your single source of truth about who your customers are.
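To make the shared date dimension concrete, here is a small sketch in SQLite via Python's sqlite3 module. The table names fact1, fact2 and dimDate come from the question; the columns (date_key, amount, clicks and so on) and the rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE dimDate (
        date_key  INTEGER PRIMARY KEY,   -- e.g. 20240115
        full_date TEXT NOT NULL,
        year      INTEGER NOT NULL,
        month     INTEGER NOT NULL
    );
    -- Both fact tables point at the same dimension table.
    CREATE TABLE fact1 (
        date_key INTEGER NOT NULL REFERENCES dimDate(date_key),
        amount   REAL NOT NULL
    );
    CREATE TABLE fact2 (
        date_key INTEGER NOT NULL REFERENCES dimDate(date_key),
        clicks   INTEGER NOT NULL
    );
    INSERT INTO dimDate VALUES (20240115, '2024-01-15', 2024, 1),
                               (20240220, '2024-02-20', 2024, 2);
    INSERT INTO fact1 VALUES (20240115, 100.0), (20240220, 50.0);
    INSERT INTO fact2 VALUES (20240115, 7);
""")

# Each fact is rolled up through the same dimension, so "month" means the same thing in both.
amount_by_month = conn.execute("""
    SELECT d.year, d.month, SUM(f.amount)
    FROM fact1 f JOIN dimDate d USING (date_key)
    GROUP BY d.year, d.month ORDER BY d.year, d.month
""").fetchall()
clicks_by_month = conn.execute("""
    SELECT d.year, d.month, SUM(f.clicks)
    FROM fact2 f JOIN dimDate d USING (date_key)
    GROUP BY d.year, d.month ORDER BY d.year, d.month
""").fetchall()
```

Because both rollups use the same year and month attributes, the two result sets can be lined up side by side without any reconciliation step.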
Problem
Can't create a table with an index column that references multiple rows in a table. Picture example below of what I'm trying to create.
Overview
Imagine an (SQLite) table will hold stock dividend payments. The index column is set to the ticker symbols. However, each ticker symbol refers to multiple records, which are organized by a time stamp. The documentation on SQLite and about 15 other tutorials all seem to focus on indexing where there is always a 1:1 relationship between an index and a record. I would like to create an index with a 1:many relationship.
The lookup would find the appropriate stock by symbol, and then (probably) a secondary index on the dates in the first column. But I cannot find any examples where others have tried to set up this structure. Makes me think maybe I don't have the right approach, or this is just a special case.
I don't think your problem is actually a problem. Putting an index on a column doesn't mean it has to contain unique values. It's perfectly reasonable for values in an indexed column to repeat. Of course, there are diminishing returns. For example, if you have a million rows and only five different values in a column, an index on that column isn't really going to do much for you.
A good rule of thumb is to start with an index on the column(s) you're using in your WHERE clause. Then run the queries and see if you're getting satisfactory performance.
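As a rough illustration, here is a plain (non-UNIQUE) index over repeating ticker symbols, sketched with Python's sqlite3 module. The dividends table, its column names, the index name and the rows are all invented for the example, and the composite (symbol, pay_date) index is just one reasonable guess at what the lookup described in the question needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dividends (
        symbol   TEXT NOT NULL,
        pay_date TEXT NOT NULL,
        amount   REAL NOT NULL
    );
    -- A plain (non-UNIQUE) index: many rows may share one symbol.
    CREATE INDEX idx_dividends_symbol_date ON dividends (symbol, pay_date);
    -- Made-up sample rows; 'AAA' appears more than once.
    INSERT INTO dividends VALUES
        ('AAA', '2023-02-16', 0.23),
        ('AAA', '2023-05-18', 0.24),
        ('BBB', '2023-03-09', 0.68);
""")

query = (
    "SELECT pay_date, amount FROM dividends "
    "WHERE symbol = ? AND pay_date >= ? ORDER BY pay_date"
)
params = ("AAA", "2023-03-01")
rows = conn.execute(query, params).fetchall()

# EXPLAIN QUERY PLAN reports whether SQLite chose the index for this query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
uses_index = any("idx_dividends_symbol_date" in row[-1] for row in plan)
```

Checking the plan like this is a quick way to follow the rule of thumb above: index the WHERE columns, then confirm the query actually uses the index before measuring performance.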
I am trying to put a large data frame into a new table of a database. This can be done simply via:
dbWriteTable(conn=db,name="sometablename",value=my.data)
However, I want to specify the primary keys, foreign keys and column types (NUMERIC, TEXT and so on).
Is there anything I can do? Should I create a table with my columns first and then add the data frame to it?
RSQLite assumes your data.frame is already fully prepared before you write it to disk, and there is not much you can specify in the write call itself. So I see two options: set things up before writing the table, or fix them afterwards. I usually write the table from R to disk and then polish it by using dbGetQuery to alter the table's attributes. The only problem with this workflow is that SQLite has very limited support for altering tables.
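Given how little SQLite lets you alter afterwards, the "create the table first, then load the data" route from the question may be the easier one. The sketch below shows that order of operations in Python's sqlite3 module rather than R, since the SQL is the part that matters; the categories table, the column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    -- Declare keys and column types up front, before any data arrives.
    CREATE TABLE sometablename (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(category_id),
        reading     NUMERIC,
        note        TEXT
    );
    INSERT INTO categories VALUES (1, 'control');
""")

# Stand-in for the data frame: one tuple per row.
my_data = [(1, 1, 3.5, "first"), (2, 1, 4.25, "second")]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO sometablename VALUES (?, ?, ?, ?)", my_data)

# A row pointing at a missing category is refused by the declared foreign key.
try:
    with conn:
        conn.execute("INSERT INTO sometablename VALUES (3, 99, 1.0, 'orphan')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sometablename").fetchone()[0]
```

In R the equivalent would presumably be to run the CREATE TABLE statement first and then append the data frame to the existing table instead of letting the write call create it.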
I have values in a SQLite table* that contain a number of strings, of different lengths, joined by periods, something like this:
SomeApp.SomeNameSpace.InterestingString.NotInteresting
SomeApp.OtherNameSpace.WantThisOne.ReallyQuiteDull
SomeApp.OtherNameSpace.WantThisOne.AlsoDull
SomeApp.DifferentNameSpace.AlwaysWorthALook.LittleValue
I'd like to extract (in this case) the third period-delimited substring so I could write something like
SELECT interesting_string, COUNT(*)
FROM ( SELECT third_part_of_period_delimited_string(name) interesting_string )
GROUP BY interesting_string;
Obviously I can do this any number of ways programmatically; I'm wondering if there's any way to achieve this in a SQLite SELECT query?
* It's a SharpDevelop Profiler database, if anyone's curious
No.
You can, as you mention, work with the strings after you have selected them from the database. Or you can split them up into separate columns when they are stored.
If you do not have access to the code that is storing the data, you might want to consider reading the data in its entirety, splitting the strings and storing the split out tokens in separate columns in a new table. If the data is not too large, you might look at storing this table in a new memory database to give excellent performance.
Whether this is worthwhile depends on whether one pass to split the data strings can be made use of many times. If the data is constantly changing, then this scheme would probably not work well.
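Here is a rough sketch of that split-once approach using Python's sqlite3 module and an in-memory database. The sample strings are the ones from the question; the name_parts table, its part1..part4 columns and the split_name helper are invented for illustration, and in practice the names would be read from the original database rather than listed inline.

```python
import sqlite3

# Sample values from the question; normally these would be read from the source table.
names = [
    "SomeApp.SomeNameSpace.InterestingString.NotInteresting",
    "SomeApp.OtherNameSpace.WantThisOne.ReallyQuiteDull",
    "SomeApp.OtherNameSpace.WantThisOne.AlsoDull",
    "SomeApp.DifferentNameSpace.AlwaysWorthALook.LittleValue",
]

mem = sqlite3.connect(":memory:")
mem.execute(
    "CREATE TABLE name_parts (name TEXT, part1 TEXT, part2 TEXT, part3 TEXT, part4 TEXT)"
)

def split_name(name, width=4):
    """Split on periods, truncating or padding with None to exactly `width` parts."""
    parts = name.split(".")[:width]
    return parts + [None] * (width - len(parts))

# One pass over the data: store each token in its own column.
with mem:
    mem.executemany(
        "INSERT INTO name_parts VALUES (?, ?, ?, ?, ?)",
        [(n, *split_name(n)) for n in names],
    )

# The grouping from the question is now an ordinary query on part3.
counts = mem.execute("""
    SELECT part3 AS interesting_string, COUNT(*)
    FROM name_parts
    GROUP BY part3
    ORDER BY part3
""").fetchall()
```

The split happens once, and every later query is plain SQL against part3, which is where the scheme pays off if the data is queried many times.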