want to add record vertically in database - asp.net

I am building an attendance management system in which I have to record the time in and time out of an employee several times a day.
This can easily be achieved if I add a new row for every time in and time out.
But the problem is that the number of employees is very high, so
I want to add time in and time out horizontally rather than vertically.
How can that be done?
e.g.
id | Barcode | date | time in | time out | time in | time out | ....... | total time

I agree with the comments that you should keep the data in a vertical fashion, because it is a better design and makes use of the "relational" aspect of the database. Based on your description, the system has approximately 90,000 employees and maybe 5-10 rows per employee in the table (90k * 10), so you should consider an index on the employee id field (FK) to make your queries more efficient. With an index, you should be able to store many records without noticeable speed issues.
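A minimal sketch of the vertical layout with an index, using Python's built-in sqlite3 module so it can be run anywhere. The table name, column names, and sample rows are assumptions made for illustration; they are not from the question, and the same idea applies to SQL Server behind ASP.NET.

```python
import sqlite3

# Hypothetical schema: one row per time-in/time-out pair ("vertical" layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (
    id          INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL,
    work_date   TEXT    NOT NULL,   -- ISO date, e.g. 2024-01-15
    time_in     TEXT    NOT NULL,   -- ISO timestamp
    time_out    TEXT                -- NULL until the employee clocks out
);
-- Index on the lookup columns so per-employee queries can avoid a full scan.
CREATE INDEX ix_attendance_employee_date ON attendance (employee_id, work_date);
""")

sample_rows = [
    (1, "2024-01-15", "2024-01-15 09:00:00", "2024-01-15 12:00:00"),
    (1, "2024-01-15", "2024-01-15 13:00:00", "2024-01-15 17:30:00"),
    (2, "2024-01-15", "2024-01-15 10:00:00", "2024-01-15 11:15:00"),
]
conn.executemany(
    "INSERT INTO attendance (employee_id, work_date, time_in, time_out) "
    "VALUES (?, ?, ?, ?)",
    sample_rows,
)


def total_minutes(employee_id, work_date):
    """Sum the minutes of every completed in/out pair for one employee and day."""
    (minutes,) = conn.execute(
        """
        SELECT COALESCE(SUM(
            (CAST(strftime('%s', time_out) AS INTEGER)
             - CAST(strftime('%s', time_in) AS INTEGER)) / 60
        ), 0)
        FROM attendance
        WHERE employee_id = ? AND work_date = ? AND time_out IS NOT NULL
        """,
        (employee_id, work_date),
    ).fetchone()
    return minutes


print(total_minutes(1, "2024-01-15"))
```

The "total time" column from the horizontal layout becomes a query over the rows, so the number of in/out pairs per day does not have to be fixed in advance.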


What is the best strategy for optimizing aggregation queries over a long period of time/dataset in Azure Data Explorer (Kusto)?

I have some queries that look at aggregated data over a long period of time (180 days) for data that is recorded per second (example query below). The table's hot cache is 31 days, so the queries can take over a minute to return, which is not acceptable for the dashboards I want to display them on. What optimization strategies would you recommend? My thoughts so far are to either use an update policy to push the data for these tags into a separate table with a hot cache of 180 days, or to use a materialized view.
raw_table
| where TimeStamp between (now(-180d) .. now()) and TagName in ("Tag1","Tag2")
| extend Date = startofday(TimeStamp)
| summarize Value1=max(Value) by Date,TagName
| summarize Value1=sum(Value1) by Date
| project TagName="AggregatedData",Date,Value=Value1
"My thoughts so far are to either use an update policy to push the data for these tags into a separate table with a hot cache of 180 days, or to use a materialized view."
Both options you mentioned are appropriate (you can even combine them, if required).
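To illustrate why pre-aggregation helps here, the following is a rough Python simulation, not Kusto code. It keeps only the daily maximum per tag, which is the intermediate result of the query in the question, and answers the dashboard query from that small state instead of from per-second rows. The function names and sample readings are made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Pre-aggregated state: (date, tag) -> maximum value seen so far.
# This stands in for what a materialized view or a separate
# pre-aggregated table would hold.
daily_max = {}


def ingest(timestamp, tag, value):
    """Fold one per-second reading into the daily maximum for its tag."""
    key = (timestamp.date(), tag)
    if key not in daily_max or value > daily_max[key]:
        daily_max[key] = value


def aggregated_by_date(tags):
    """Dashboard query: sum the per-tag daily maxima for each date."""
    totals = defaultdict(float)
    for (date, tag), value in daily_max.items():
        if tag in tags:
            totals[date] += value
    return dict(totals)


readings = [
    (datetime(2024, 1, 1, 0, 0, 1), "Tag1", 5.0),
    (datetime(2024, 1, 1, 0, 0, 2), "Tag1", 7.0),
    (datetime(2024, 1, 1, 0, 0, 1), "Tag2", 2.0),
    (datetime(2024, 1, 2, 0, 0, 1), "Tag1", 4.0),
    (datetime(2024, 1, 2, 0, 0, 1), "Tag3", 100.0),
]
for reading in readings:
    ingest(*reading)

print(aggregated_by_date({"Tag1", "Tag2"}))
```

Because a maximum can be updated one record at a time, the state grows with the number of days and tags rather than with the number of per-second readings, which is what makes a 180-day window cheap to scan.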

Question using distinct and grouping on Windows Virtual Desktop session statistics

I need some help creating a Kusto query for filtering and grouping Windows Virtual Desktop statistics.
What I need: A chart that shows the total number of user sessions logged in to a WVD Host Pool.
Data available: WVD logs this info every x seconds to Log Analytics for each host. The interval is not exactly "every x seconds", but it is at least once every 3 minutes.
So I made this query for now:
WVDAgentHealthStatus
| where TimeGenerated > ago(3m)
| project SessionHostName, TimeGenerated, ActiveSessions, InactiveSessions, Totalsessions=(toint(ActiveSessions) + toint(InactiveSessions))
That makes results like this:
As you can see, some hosts are reported twice, some three times.
I need help with:
How to make this query usable in a chart, so that it shows "the number of sessions per host" in steps of 3 minutes, with the chart covering the last 8 hours.
I guess "where TimeGenerated" has to be "> ago(8h)", but it needs to group the data in sets of 3 minutes, then get distinct data per host.
I have no idea how to do this. I'm not that good at Kusto. Can anyone help me out?
You need to use the summarize operator:
WVDAgentHealthStatus
| where TimeGenerated > ago(8h)
| summarize arg_max(TimeGenerated, ActiveSessions) by bin(TimeGenerated, 3m), SessionHostName
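As a rough illustration of what binning by 3 minutes and keeping the latest record per host does, here is a small Python sketch. It is not Kusto; the function name and sample rows are assumptions, and it adds active and inactive sessions together as in the question's query.

```python
from datetime import datetime, timedelta


def latest_per_bin(records, bin_size=timedelta(minutes=3)):
    """Keep the most recent record per (time bin, host).

    records: iterable of (timestamp, host, active, inactive) tuples.
    Returns {(bin_start, host): (timestamp, total_sessions)}.
    """
    epoch = datetime(1970, 1, 1)
    latest = {}
    for ts, host, active, inactive in records:
        # Floor the timestamp to the start of its bin, similar to bin().
        bin_start = ts - ((ts - epoch) % bin_size)
        key = (bin_start, host)
        # Keep only the row with the largest timestamp, similar to arg_max().
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, active + inactive)
    return latest


records = [
    (datetime(2024, 1, 1, 10, 0, 30), "host-a", 4, 1),
    (datetime(2024, 1, 1, 10, 1, 45), "host-a", 5, 1),
    (datetime(2024, 1, 1, 10, 2, 0), "host-b", 2, 0),
    (datetime(2024, 1, 1, 10, 3, 10), "host-a", 6, 2),
]
result = latest_per_bin(records)
for (bin_start, host), (ts, total) in sorted(result.items()):
    print(bin_start, host, total)
```

Hosts that report two or three times inside one bin collapse to a single row, so each host contributes exactly one point per 3-minute step.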

BigQuery to Data Studio : Show reliable COUNT DISTINCT regardless of the selected period

In my BigQuery project I store event data integrated from Firebase. The granularity and dimensions are such that trying to present raw data in Data Studio quickly makes the report VERY slow (1-2 min per page/interaction).
I then started to think how I could create pre-aggregated tables in BigQuery to speed everything up, but quickly realised COUNT DISTINCT metrics would be a problem with this approach.
Let me explain:
SELECT user, date
FROM UNNEST([
STRUCT("Adam" AS user, "20190923" AS date),
("Bob", "20190923"),
("Carl", "20190923"),
("Adam", "20190924"),
("Bob", "20190924"),
("Adam", "20190925"),
("Carl", "20190925"),
("Bob", "20190926")
]) AS website_visits;
+------+----------+
| User | Date |
+------+----------+
| Adam | 20190923 |
| Bob | 20190923 |
| Carl | 20190923 |
| Adam | 20190924 |
| Bob | 20190924 |
| Adam | 20190925 |
| Carl | 20190925 |
| Bob | 20190926 |
+------+----------+
The above is a table of website visits.
Clearly, creating a pre-aggregated table like
SELECT date, COUNT(DISTINCT user) FROM website_visits GROUP BY date
has the limitation that the count cannot be aggregated further (let alone dynamically) to get a total: a SUM over the four days would return 8 unique users, which is not correct, as there are only 3 unique users.
In BigQuery, this is fixed by using HLL_COUNT, which despite the approximation works ok for me.
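The difference between summing daily counts and merging per-day summaries can be sketched in Python with the sample data above. Plain sets are used here as an exact stand-in for HLL sketches (an assumption made to keep the example runnable); the point is only that a mergeable summary per day can be combined for any date range, while a bare count cannot.

```python
from collections import defaultdict

visits = [
    ("Adam", "20190923"), ("Bob", "20190923"), ("Carl", "20190923"),
    ("Adam", "20190924"), ("Bob", "20190924"),
    ("Adam", "20190925"), ("Carl", "20190925"),
    ("Bob", "20190926"),
]

# Pre-aggregate per day, keeping a mergeable summary (a set of users)
# rather than a bare count.
users_by_date = defaultdict(set)
for user, date in visits:
    users_by_date[date].add(user)


def sum_of_daily_counts(dates):
    """What SUM over a pre-aggregated COUNT(DISTINCT user) would give."""
    return sum(len(users_by_date[d]) for d in dates)


def distinct_users(dates):
    """Merge the per-day summaries first, then count once."""
    merged = set()
    for d in dates:
        merged |= users_by_date[d]
    return len(merged)


all_dates = ["20190923", "20190924", "20190925", "20190926"]
print(sum_of_daily_counts(all_dates), distinct_users(all_dates))
```

For the four days this prints 8 and 3, matching the numbers in the question.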
Now to the big question:
How can I do the same so that the result is displayable in Data Studio?
HLL_COUNT.EXTRACT is not available as function in there, and in the reporting I always have to keep in mind that the date range is set by the user however (s)he likes so it's not possible to store a pre-aggregated result for ALL cases...
EDIT 1: APPROX_COUNT_DISTINCT
As per answer from Bobbylank, I tried to use APPROX_COUNT_DISTINCT.
However I found that this just seems to move the issue down the line. My fault for not explaining what's over there.
Although performance is acceptable, it does not seem possible to blend a data source with this calculated metric.
Example: After displaying the amount of unique users in the selected period (which now works), I'm also trying to display Average Revenue Per User (ARPU) in Data Studio like Firebase does.
To do this, I have to SUM(REVENUE) / APPROX_COUNT_DISTINCT(USER)
Clearly, REVENUE works ok with pre-aggregation and is available in the raw data. I tried then to blend the raw data with a table containing just user visits. However APPROX_COUNT_DISTINCT can't be used in the blended data definition as calculated metrics are not allowed.
Even trying to use the USER field as a metric with Count Distinct aggregation, despite returning the correct figures when showing revenue and user count separately, when I try to divide them the problem becomes aggregation (apply SUM or AVG to the field and basically the result will be AVG(REVENUE/USERS) for each day).
I also then tried to store REVENUE directly in the visits table, but was reminded by Data Studio that I can't mix dimensions and metrics in a calculated field.
APPROX_COUNT_DISTINCT might be more performance friendly for you?
https://support.google.com/datastudio/answer/9189108?hl=en
Otherwise the only way I can think of would be to pre-calculate several metrics (e.g. unique users on that day, 7-day cumulative, 14-day, etc.) as your customers require for each single day.
Or you could provide a 2 page report with both of these methods with the caveat that the first can be used over a time period but will be much slower?

Separate tables vs map lists - DynamoDB

I need your help. I am quite new to databases.
I'm trying to set up a table in DynamoDB to store info about TV shows. It seems pretty simple and straightforward, but I am not sure if what I am doing is correct.
So far I have this structure. I am trying to fit everything about the TV shows into one table. Seasons and episodes are contained within a list of maps within a list of maps.
Is this too much layering?
Would this present a problem in the future where some items are huge?
Should I separate some of these lists of maps to another table?
Shows table
Ideally, you should not put a potentially unbounded list in a single item in DynamoDB, because you could end up running into the item size limit of 400 KB. Also, if you were to read or write one episode of one show, you would consume capacity as if you were reading or writing all the episodes in the show.
Take a look at the adjacency list pattern. It’s a good choice because it will allow you to easily find the seasons in a show and the episodes in a season. You can also take a look at this slide deck. Part of the way through, it talks about hierarchical data, which is exactly what you’re dealing with.
If you can provide more information about your query patterns, I can give you more guidance on how to model your data in the table.
Update (2018-11-26)
Based on your comments, it sounds like you should use composite keys to establish hierarchical 1-N relationships.
By using a composite sort key of DataType:ItemId where ItemId is a different format depending on the data type, you have a lot of flexibility.
This approach will allow you to easily get the seasons in the show, get all episodes in all seasons, get all episodes in a particular season, or even get all episodes between season 1, episode 5 and season 2 episode 5.
hash_key | sort_key | data
----------|-----------------|----------------------------
SHOW_1234 | SHOW:SHOW_1234 | {name:"Some TV Show", ...
SHOW_1234 | SEASON:SE_01 | {descr:"In this season, the main character...
SHOW_1234 | EPISODE:S01_E01 | {...
SHOW_1234 | EPISODE:S01_E02 | {...
Here are the various key condition expressions for the queries I mentioned:
hash_key = "SHOW_1234" and begins_with(sort_key, "SEASON:") – gets all seasons
hash_key = "SHOW_1234" and begins_with(sort_key, "EPISODE:") – gets all episodes in all seasons
hash_key = "SHOW_1234" and begins_with(sort_key, "EPISODE:S02_") – gets all episodes in season 2
hash_key = "SHOW_1234" and sort_key between "EPISODE:S01_E05" and "EPISODE:S02_E05" – gets all episodes between season 1, episode 5 and season 2, episode 5
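The way these conditions select items can be simulated in plain Python, without DynamoDB, by keeping the sort keys of one partition in sorted order. This is only a sketch: the helper functions and the generated episode keys are made up for illustration, and it assumes zero-padded season and episode numbers as in the table above.

```python
from bisect import bisect_left, bisect_right

# Sort keys for one partition (hash_key = "SHOW_1234"), kept in sorted
# order the way items within a partition are ordered by sort key.
sort_keys = sorted(
    ["SHOW:SHOW_1234", "SEASON:SE_01", "SEASON:SE_02"]
    + [f"EPISODE:S{s:02d}_E{e:02d}" for s in (1, 2) for e in range(1, 11)]
)


def begins_with(prefix):
    """All sort keys starting with the given prefix."""
    return [k for k in sort_keys if k.startswith(prefix)]


def between(low, high):
    """Inclusive range on the sort key."""
    return sort_keys[bisect_left(sort_keys, low):bisect_right(sort_keys, high)]


print(begins_with("SEASON:"))
print(len(begins_with("EPISODE:S02_")))
print(between("EPISODE:S01_E05", "EPISODE:S02_E05"))
# Unpadded bounds compare as strings, so the range no longer matches
# the intended episodes:
print(between("EPISODE:S01_E5", "EPISODE:S02_E5"))
```

String comparison is why the padding matters: "E10" sorts before "E5", so range bounds have to use the same fixed-width format as the stored keys.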

using single field for IN/OUT time in table

We have biometric attendance software: you press your finger on the sensor and it stores your IN/OUT time in a SQL database. It's a desktop app, but my boss asked me to build the same thing as a web application in ASP.NET MVC. We currently have 2 tables:
tblEmployee
SerialId
EmpId
EmpName
DepId
tblAttendance
AtdId
EmpId (FK)
Time(IN/OUT)
Date
The problem is that when I come into the office it inserts a new record into the attendance table, and when I leave the office it inserts another whole record for the OUT time; it uses the same TIME field for both IN and OUT. So I am thinking of solving this by keeping a single record per day and only updating the IN/OUT time. Is that a good idea?
This looks like normal behavior to me, but if you still want to achieve what you have asked, you could change the table definition like this -
tblAttendance
AtdId
EmpId (FK)
Date --Just the date for every day
Time_IN
Time_OUT
Now you need to figure out whether each swipe is an IN or an OUT and update the corresponding value. Every IN would create a new row, which you can restrict to one row per DATE with a composite primary key - AtdId, EmpId (FK), Date.
Note - Over time, the above approach will slow down the punch IN/punch OUT process; it will take comparatively longer to register the swipes. Your current DB structure is better suited to long-term use. If you want to display the data in a particular format, you should design views for your requirements instead.
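As a sketch of the "keep the rows, shape them at read time" idea, the following Python function pairs up swipes stored one per row into IN/OUT pairs. It is illustrative only: the function name and sample data are made up, and it assumes swipes strictly alternate IN, OUT, IN, OUT within a day. In the real system the same pairing could live in a database view.

```python
from collections import defaultdict


def pair_punches(punches):
    """Turn one-row-per-swipe data into (time_in, time_out) pairs.

    punches: iterable of (emp_id, date, time) with ISO-formatted strings.
    Assumes swipes alternate IN, OUT, IN, OUT within a day; an unmatched
    final swipe yields a pair whose time_out is None.
    """
    by_day = defaultdict(list)
    for emp_id, date, time in punches:
        by_day[(emp_id, date)].append(time)

    result = {}
    for key, times in by_day.items():
        times.sort()  # ISO time strings sort chronologically
        result[key] = [
            (times[i], times[i + 1] if i + 1 < len(times) else None)
            for i in range(0, len(times), 2)
        ]
    return result


punches = [
    (7, "2024-01-15", "09:02:00"),
    (7, "2024-01-15", "12:30:00"),
    (7, "2024-01-15", "13:15:00"),
    (8, "2024-01-15", "08:55:00"),
    (8, "2024-01-15", "17:05:00"),
]
pairs = pair_punches(punches)
print(pairs[(7, "2024-01-15")])
print(pairs[(8, "2024-01-15")])
```

Writes stay as simple inserts, and the IN/OUT presentation is derived only when the data is displayed.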
