I m on a problem that I can t figure out. I m building an application in c++ builder 2009 and oracle 11g. I have some calculated data that depend on users age. What I want to do is to re-calculate these data every new year. I thought I could have a trigger to do this, but I don t know which event I should catch and I didn t find something in internet.
My table is :
ATHLETE (name, ......, birthdate, Max_heart_frequency)
Max_heart_frequency is the field that depends on age. In insertion I calculate athlete's age, but what about next year??????
Can anyone help????
How is the max_heart_frequence calculated?
If this is a simply formula, I would create a view that returns that information. No need to store values that can easily be calculated:
CREATE VIEW v_athlete
AS
select name,
case
-- younger than 20 years
when (MONTHS_BETWEEN(sysdate, birthday) / 12) < 20 then 180
-- younger than 40 years
when (MONTHS_BETWEEN(sysdate, birthday) / 12) < 40 then 160
-- younger than 60 years
when (MONTHS_BETWEEN(sysdate, birthday) / 12) < 60 then 140
-- everyone else
else 120
end as max_heart_frequency
from athlete
Then you only need to select from the view and it will always be accurate.
You can use oracle scheduler to run a procedure at specific intervals (can be minutes hours, daily, yearly etc .. any time span).
Check this linke: http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/schedover.htm
You have two options:
Have a stored procedure that calculates and updates the Max_Heart_Frequency of all the athletes every 01st Jan (using the yearly scheduling of a procedure)
Have a stored procedure that runs daily and calculates and updates the Max_Heart_Frequency of all the athletes every day (using the daily scheduling of a procedure)
If Max_Heart_Frequency changes over time because the user is getting older, why are you storing it in the table in the first place? Why not just call the function that computes the maximum heart rate at runtime when you need the value? Potentially, it may make sense to have a view on top of the Athlete table that adds the computed Max_Heart_Frequency column to hide from the callers that this is a computed column.
Related
I have a report with a parameter where the end user chooses a practice name that corresponds to a group of people. Most of these groups have fewer than 10 people, but a small number of them have as many as 150. When there are more than 15 people in a given group, they want separate graphs, each with no more than 15 people. So for most of the groups, we only need one graph. For a few, we need a lot of graphs.
Behind the scenes, I created a graph for each multiple of 15 people, and set them to only be visible if there are actually that many people in the group. This does what I need it to, but it makes the report super slow. As close as I can tell, behind the scenes when an end user runs the report it's still somehow rendering the hidden graphs and slowing it all to heck. (I did find this link which I think suggests this is a known bug.
I need to have one report where the end user selects the practice name, so I can't make two reports, "My practice is normal" and "My practice is ginormous". I thought maybe I could make a conditional sub-report split into those two reports based on the practice name parameter, but that doesn't appear to be possible; you can play around with visibility but I'm guessing that will still cause the invisible graph rendering problem and not help my speed.
Are there any other cool tips I can try to speed up my report, or is this just a case of too many graphs spoiling the broth?
The easiest way would be to generate a group number for every 15 people and then use a list control to repeat the chart for each group.
Here's a very quick example of this in action. I just used some sample data from one of the Adventure Works sample database.
Here's my query that returns every person in each selected department. Note that I have commented out the DELCAREs as these were just in there for testing.
--DECLARE #Department varchar(50) = ''
--DECLARE #chartMax int = 5
SELECT
GroupName, v.Department, v.FirstName, v.LastName
, ChartGroup = (ROW_NUMBER() OVER(PARTITION BY Department ORDER BY LastName, FirstName)-1) / #chartMax -- calc which chart number the person belongs to
, Salary = ((ABS(CHECKSUM(NewId())) % 100) * 500) + (ABS(CHECKSUM(NewId())) % 1000) + 10000 -- Just some random number to plot
FROM [HumanResources].[vEmployeeDepartment] v
WHERE Department IN (#Department)
ORDER BY Department
The key bit is the ChartGroup column
ChartGroup = (ROW_NUMBER() OVER(PARTITION BY Department ORDER BY LastName, FirstName)-1) / #chartMax
This will give the first 5 rows in each department a ChartGroup of 0 the next 15 1 and so on. I used 5 rather than 15 just so it's easier to demo.
Here's the dataset results
Now, in your report, add a List, set it's dataset property to your dataset containing your main data (the query above in my case).
Now edit the 'details' rowgroup properties and add a grouping by Practice and ChartGroup (Department and ChartGroup in this example)
In the list box's textbox, right-click then insert a chart.
Set the chart up as required, in my example, I used salary as the values on a pie chart and the employee names as the labels.
Here's the final design ..
Note that I set the department as a multi-value parameter and also set the number of persons per chart (chartMax) as a report parameter.
When I preview the report I get this for 'Engineering' which has 6 employees
Sales has 18 employees so we get this
.... and so on, it will generate a new chart for every 15 people or part thereof.
Considering performance as main importance, how would be the best design approach to follow in order to build a recommendation system in DynamoDb?
The system would need to store an url and the numbers of times that topic was 'liked', but my requirement includes the need of searches by daily, weekly, monthly and yearly, for e.g.:
Give me the top 10 of the week
Give me the top 10 of the month
I was thinking about to include the date and time information, so that the query could control it through this field, but not sure wether it is the good in terms of performance.
If the only data structure you had was a hash map, how would you solve this problem?
What if on top of that constraint, you could only update any key up to 1000 times per second, and read a key up to 3000 per second?
How often do you expect your items to get liked? Presumably there will be some that will be hot and liked a lot, while others would almost never get any likes.
How real_time does your system need to be? Can the system be eventually consistent (meaning, would it be ok if you only reported likes as of several minutes ago)?
Let's give this a shot
Disclaimer: this is very much a didactic exercise -- in practice you may want to explore an analytics product, or some other technologies than DynamoDB to accompish this task
Part 1. Representing an Item And Updating Like Counts
First, let's talk about your aggregation/analytics goals: you mentioned that you want to query for "top 10 of the week" or "top 10 of the month" but you didn't specify if that is supposed to mean "calendar week"/"calendar month", or "last 7 days"/"last 30 days".
I'm going to take it literally and assume that "top 10 of the week" means top 10 items from this week that started on the most recent Monday (or Sunday if you roll that way). Same for month: "top 10 of the month" means "top 10 items since the beginning of this month.
In this case, you will probably want to store, for each item:
a count of total all-time likes
a count of likes since the beginning of current month
a count of likes since the beginning of current week
current month number - needed to determine if we need to reset
current week number - needed to determine if we need to reset
And each week, reset the counts for the current week; And each month reset the counts for the current month.
In DynamoDB, this might be represented like so:
{
id: "<item-id>",
likes_all: <numeric>, // total likes of all time
likes_wk: <numeric>, // total likes for the current week
likes_mo: <numeric>, // total likes for the current month
curr_wk: <numeric>, // number of the current week of year, eg. 27
curr_mo: <numeric>, // number of the current month of year, eg. 6
}
Now, you can update the number of likes with an UpdateItem operation, with an UpdateExpression, like so:
dynamodb update-item \
--table-name <your-table-name> \
--key '{"id":{"S":"<item-id>"}}' \
--update-expression "SET likes_all = likes_all + :lc, likes_wk = likes_wk + :lc, likes_mo = likes_mo + :lc" \
--expression-attribute-values '{":lc": {"N":"1"}}' \
--return-values ALL_NEW
This gives you a simple atomic way to increment the counts and get back the updated values. Notice the :lc value can be any number (not just 1). This will come in handy below.
But there's a catch. You also need to be able to reset the counts if the week or month rolled over, so to do that, you can break the update into two operations:
update the total count (and get the most recent values back)
conditionally update the week and month counts
So, our update sequence becomes:
Step 1. update total count and read back the updated item:
dynamodb update-item \
--table-name <your-table-name> \
--key '{"id":{"S":"<item-id>"}}' \
--update-expression "SET likes_all = likes_all + :lc" \
--expression-attribute-values '{":lc": {"N":"1"}}' \
--return-values ALL_NEW
This updates the total count and gives us back the state of the item. Based on the values of the curr_wk and curr_mo, you will have to decide what the update looks like. You may be either incrementing, or setting an absolute value. Let's say we're in the case when the update is being performed after the week rolled over, but not the month. And let's say that the result of the update above looks like this:
{
id: "<item-id>",
likes_all: 1000, // total likes of all time
likes_wk: 70, // total likes for the current week
likes_mo: 150, // total likes for the current month
curr_wk: 26, // number of the week of last update
curr_mo: 6, // number of the month of year of last update
}
curr_wk is 6, but at the time of update, the actual current week should be 7.
Then your update query would look look like this:
dynamodb update-item \
--table-name <your-table-name> \
--key '{"id":{"S":"<item-id>"}}' \
--update-expression "SET curr_wk = 27, likes_wk = :lc, likes_mo = likes_mo + :lc" \
--condition-expression "curr_wk = :wk AND curr_mo = :mo" \
--expression-attribute-values '{":lc": {"N":"1"}, ":wk": {"N":"26"}, ":lc": {"N":"6"},}' \
--return-values ALL_NEW
The ConditionExpression ensures that we don't reset the likes twice, if two conflicting updates happen at the same time. In that case, one of the updates would fail and you'd have to switch the update back to an increment.
Part 2 - Keeping Track of Statistics
To take care of your statistics you need to keep track of most likes per week and per month.
You can keep a sorted list of hottest items per week and per month. You also can store these lists in Dynamo.
For example, let's say you want to keep track of top 3. You might store something like:
{
id: "item-stats",
week_top: ["item3:4000", "item2:2000", "item9:700"],
month_top: ["item2:100000", "item4:50000", "item3:12000"],
curr_wk: 26,
curr_mo: 6,
sequence: <optimistic-lock-token>
}
Whenever you perform an update for items, you would also update the statistics.
The algorithm for updating statistics will be similar to updating an item, except you can't just use update expressions. Instead you have to implement your own read-modify-write sequence using GetItem, PutItem and ConditionExpression.
First, you read the current values for the item-stats special item, including the value of the current sequence (this is important to detect clobbering)
Then, you figure out if the item(s) you've just updated counts for would make it into the Top-N weekly or monthly list. If so, you would update the week_top and/or month-top attributes and prepare a conditional PutItem request.
The PutItem request must include a conditional check that verifies the sequeuce of the item-stats is the same as what you read earlier. If not, you need to read the item again and re-compute the top-N lists, then attempt to put again.
Also, similar to the way the counts get reset for items, when an update happens you need to check and see if the weekly or monthly top needs to be reset as part of the update.
When you make the PutItem request, make sure to generate a new sequence value.
Part 3 - Putting It All Together
In Part 1 and Part 2 we figured out how to keep track of likes and keep track of statics but there are big problems with our approach: performance would be pretty bad with any kind of real-life scale; hot items would create problems for us; updating the Top-N stats would be a significant bottleneck.
To improve performance and achieve some scalability we'd want to get away from updating each item and the item-stats for every single "like".
We can achieve a good balance of performance and scalability using a combination of queues + dynamodb + compute resource.
create a queue to store pending likes
let "likes API" would enqueue a message tagging a post with a like, instead of applying them as they come
implement a queue consumer (could be a Lambda, or some other periodically running process) to pull messages off the queue and aggregate likes per item, then update items and the item-stats
By batching updates, we can get control over concurrency (and cost) at the expense of latency/eventual consistency.
We may end up with a limited number of queue consumers, each processing items in batches. In each batch, multiple item likes would be aggregated and a single update per item would be applied. Similarly, a single item-stats update would be applied per batch processor.
Depending on volume of incoming likes, you may need to spin up more processors.
I do a lot of day-of reporting and one of the things I'm trying to do is compare today's performance to another day. The trouble is I'm trying to comp. to that day to the last full hour. Essentially filter out all data in that comp day that happened after the last completed hour.
I've accomplished this in Tableau but I'd like for this report to be done in Data Studio. Is there a way of using functions to create a custom metric that returns the current hour? If I could get that I could easily use it as a filter for my report.
Thanks for any help.
Here's what the solution looks like in tableau:
IF [Session Hour (int)] <= [Current Hour]
THEN
[Revenue GA]
END
And:
DATEPART('hour', Now())-4
I was able to filter to the current hour by
a) making a calculated field to figure out the minutes difference between now and the data
minutes_before_now = datetime_diff(current_datetime(),the_time,MINUTE)
and b) filtering the data where this field was < 60
My advice would be to create a filter inside a date range, p.e.
And then you can choose the comparison range within the primary date range.
That's what I'd do, if I understood your question.
I have a dataset in Tableau that contains sales data listing each sale the company has had in the past year. Each customer has a unique ID, and many customers return. I'm trying to figure out how to create a calculated field in Tableau that gives a True | False answer to whether a given patient ID appears again in the dataset within a given timeframe (say, within six weeks of a given observation).
I've tried running it with LOOKUP, but I haven't been able to specify the increase in time effectively.
This would be how I would approach it. Create two date parameters, start and end.
Then create a filter calc for the date.
[date] >= [param_date_start]
and
[date] <= [param_date_end]
Place this in the filter shelf set to TRUE.
Then create another parameter for your customer id and you can test for the existence of a customer id as follows.
max([customer_id] = [param_customer_id])
This will return true if there is at least one record where the customer_id equals the param_customer_id for the chosen dimensions and date range.
You could use this same approach to define a computed set, say the set of all products that were purchased by the customer in the specified date range, or the set of all patients that received a particular service. Just use this formula on the condition tab when defining the set.
Let's say I have three basic models: a User, a Company, and a Visit. Every time a User goes to a Company, a Visit is recorded in this format (user_id, company_id, visit_date).
I'd like to be able to calculate the average time between visits for a company. Not visits overall, but specifically how long on average one of their customers waits before returning to the store.
For example, if one user visited on Tuesday, Wednesday, and Friday that gives one "gap" of one day, and one "gap" of two days => (1, 2). If another user visited on Monday and Friday, that gives one gap of 4 days => (4). If a third user visited only once, he should not be considered. The average time between user visits for the company is (1 + 2 + 4) / 3 = 2.333 days.
If I have thousands of users, taps, and companies and I want to calculate a single figure for each company, how should I go about this? I've only done basic MapReduce applications before and I can't figure out what my Map and Reduce steps would be to get this done. Can anyone help me figure out a MapReduce in pseudocode? Or is there some other method of distributed calculation I can reasonably perform? For the record, I'd like to perform this operation on my database every night.
The overly simplistic approach would be to have two job steps.
First job step has a mapper to write key values in the form "company:user" and "visit_date". In the example above, the mapper would write something like:
"user1:companyA" -> "2012/07/16"
"user1:comapnyA" -> "2012/07/17"
"user1:comapnyA" -> "2012/07/19"
"user2:comapnyA" -> "2012/07/15"
"user2:comapnyA" -> "2012/07/19"
...
This means that each call to the reducer will pass all of the visits by a single user to a single company. That means that one call to the reducer will pass in:
"user1:companyA" -> {2012/07/16, 2012/07/17, 2012/07/19}
and another call will pass in:
"user2:companyA" -> {2012/07/15, 2012/07/19}
I'm assuming the set of dates (passed in as an Iterable value) is easily managed as you sort it, figure out the gaps and write a record for each gap as a key value pair in the form "company" and "gap". For example, when passed:
"user1:companyA" -> {2012/07/16, 2012/07/17, 2012/07/19}
The first job's reducer will write to the context:
"companyA" -> 1
"compnayA" -> 2
The second job has a pass-through mapper that just passes the company/gap info on to the reducer. Each call to the reducer gives an Iterable value of gaps for a specific company. Iterate through the data to produce an average and write the key value pair in the form "company" and "average_gap".
If the original set of visits gets too big, we can talk about getting hadoop to do the sorting for you with some custom comparators.