I have searched and searched the web and the TD forums, and have not had luck locating the Metadata (column definitions) for the two Performance Usage Tables below.
pdcrinfo.dbqlobjtbl
pdcrinfo.dbqlogtbl
Does anyone happen to be familiar with these and where I can find that information?
Thank you!
The DBQL tables in PDCRDATA are mostly copies of the tables in DBC (apart from the added LOGDATE partitioning column), and in PDCRINFO there are 1:1 views that simply select all columns.
So all the metadata can be found in the manuals covering DBQL, e.g. for TD15.00: Tracking Query Behavior with Database Query Logging
As Dieter said, you can probably just focus on the DBC database, and the query below should suffice in most cases (in this example, hits for the past 12 hours):
Select distinct
UserName
, ObjectDatabaseName
, ObjectTableName
, StatementType
From DBC.DBQLogTbl dbqlogtbl
, DBC.DBQLObjTbl dbqlobjtbl
Where 1=1
and dbqlobjtbl.ProcID = dbqlogtbl.ProcID
and dbqlobjtbl.QueryID = dbqlogtbl.QueryID
and dbqlogtbl.CollectTimeStamp between
(dbqlobjtbl.CollectTimeStamp - Interval '12' Hour)
and
Current_Timestamp(0)
--
and ObjectTableName is not null
I'm pretty new to BigQuery/Firebase/GA and even SQL. (By the way, if you have good recommendations on where I can start learning, that would be great!)
But I have a main issue with BigQuery that needs solving right now. I'm trying all the sources I can get some info/tips from, and I hope this community will be one of them.
My issue is with Custom Definitions. We have them defined in Google Analytics, and we want to divide users by this definition and analyze them separately.
My question is: where/how can I find these custom definitions in BigQuery to filter my data? I have the normal fields, like user ID, timestamps etc., but I can't find these custom definitions.
I have been doing some research but still don't have a clear answer; if someone can give me some tips or maybe a solution, I would be forever in debt! xD
I got one solution from another community which looks like this, but I couldn't make it work; my BigQuery doesn't recognize customDimensions, as it says in the error.
select cd.* from table, unnest(customDimensions) cd
You can create your own custom function or stored procedure in BigQuery as per your requirements.
To apply a filter over fields like user ID and timestamps, you can simply apply a standard SQL filter as given below:
SELECT * FROM DATA WHERE USER_ID = 'User1' OR Timestamps = 'YYYY-MM-DDTHH:MM'
Moreover, unnest is used to split data on fields; do you have data that needs to be split?
I could help you more if you share what are you expecting from your SQL.
Your custom dimensions sit in arrays called customDimensions. These arrays are basically a list of structs - where each struct has 2 fields: key and value. So they basically look like this example: [ {key:1, value:'banana'}, {key:4, value:'yellow'}, {key:8, value:'premium'} ] where key is the index of the custom dimension you've set up in Google Analytics.
There are 3 customDimensions arrays! Two of them are nested within other arrays. If you want to work with those you really need to get proficient in working with arrays. E.g. the function unnest() turns arrays into table format on which you can run SQL.
customDimensions[]
hits[]
  customDimensions[]
  product[]
    customDimensions[]
Example 1 with subquery on session-scoped custom dimension 3:
select
fullvisitorid,
visitStartTime,
(select value from unnest(customDimensions) where key=3) cd3
from
ga_sessions_20210202
Example 2 with a lateral cross join - you're enlarging the table here - not ideal:
select
fullvisitorid,
visitStartTime,
cd.*
from ga_sessions_20210202 cross join unnest(customDimensions) as cd
All names are case-sensitive - in one of your screenshots you used a wrong name with an uppercase "C".
This page can help you up your array game: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays - just work through all the examples and play around in the query editor
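Conceptually, the subquery in Example 1 is just a keyed lookup in an array of structs. As a language-neutral sanity check, here is the same lookup over hypothetical rows in Python (names and data invented for illustration):

```python
# Hypothetical session rows, mimicking BigQuery's customDimensions
# arrays of {key, value} structs.
sessions = [
    {"fullvisitorid": "A", "customDimensions": [{"key": 1, "value": "banana"},
                                                {"key": 3, "value": "premium"}]},
    {"fullvisitorid": "B", "customDimensions": [{"key": 4, "value": "yellow"}]},
]

def cd(row, key):
    # Equivalent of: (select value from unnest(customDimensions) where key=3)
    return next((d["value"] for d in row["customDimensions"] if d["key"] == key), None)

for row in sessions:
    print(row["fullvisitorid"], cd(row, 3))
```

A missing key simply yields NULL (None here), which matches how the scalar subquery behaves when no array element has that key.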
I've been attempting to increase my knowledge by trying out some challenges. I've been going at this for a solid two weeks now and have finished most of the challenge, but this one part remains. The error is shown below; what am I not understanding?
Error in sqlite query: update users set last_browser= 'mozilla' + select sql from sqlite_master'', last_time= '13-04-2019' where id = '14'
edited for clarity:
I'm trying a CTF challenge and I'm completely new to this kind of thing so I'm learning as I go. There is a login page with test credentials we can use for obtaining many of the flags. I have obtained most of the flags and this is the last one that remains.
After I login on the webapp with the provided test credentials, the following messages appear: this link
The question for the flag is "What value is hidden in the database table secret?"
So, from the previous image, I have attempted to use SQL injection to obtain the value. This is done by using Burp Suite and attempting to inject through the user-agent.
I have gone through many variants of the injection attempt shown above. I'm struggling to find out where I am going wrong, especially since the second single quote is added automatically in the query. I've gone through the SQLite documentation and examples of SQL injection, but I cannot seem to understand what I am doing wrong or how to get this to work.
A subquery such as select sql from sqlite_master should be enclosed in brackets.
So you'd want
update users set last_browser= 'mozilla' + (select sql from sqlite_master), last_time= '13-04-2019' where id = '14';
Although I don't think that will achieve what you want, which isn't clear. A simple test shows that + performs numeric addition in SQLite (non-numeric strings are coerced to 0), so the result is just 0.
You may want a concatenation of the strings, so instead of + use ||, e.g.
update users set last_browser= 'mozilla' || (select sql from sqlite_master), last_time= '13-04-2019' where id = '14';
In which case you'd get 'mozilla' followed by the schema SQL from the first row of sqlite_master.
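To see the difference between + and || without a webapp in the way, here is a quick check against an in-memory SQLite database via Python's sqlite3 module:

```python
import sqlite3

# In SQLite, + coerces strings to numbers (non-numeric text becomes 0),
# while || is the string concatenation operator.
conn = sqlite3.connect(":memory:")
plus = conn.execute("select 'mozilla' + 'abc'").fetchone()[0]
concat = conn.execute("select 'mozilla' || 'abc'").fetchone()[0]
print(plus)    # 0
print(concat)  # mozillaabc
```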
Thanks for everyone's input, I've worked this out.
The sql query was set up like this:
update users set last_browser= '$user-agent', last_time= '$current_date' where id = '$id_of_user'
I edited the user-agent with Burp Suite to be:
Mozilla', last_browser=(select sql from sqlite_master where type='table' limit 0,1), last_time='13-04-2019
Iterating with that, I found all the tables, columns and flags. Rather time consuming, but I could not find a way to optimise it.
I'm trying to resolve the issue below:
I need to prepare a table that consists of 3 columns: user_id, month, and value.
Each of the over 200 users has different values of the parameters that determine the expected value, which are: LOB, CHANNEL, SUBSIDIARY. So I decided to store them in the table ASYSTENT_GOALS_SET. But I wanted to avoid multiplying rows, and thought it would be nice to put all the conditions into a piece of code that I would use in the "where" clause further on in the procedure.
So, as an example - instead of multiple rows:
I created such entry:
So far I have created a testing table ASYSTENT_TEST (where I collect month and value for a certain user), and I wrote a piece of a procedure using BULK COLLECT.
declare
  type test_row is record (
    month NUMBER,
    value NUMBER
  );
  type test_tab is table of test_row;
  BULK_COLLECTOR test_tab;
  p_lob varchar2(10) := 'GOSP';
  p_sub varchar2(14);
  p_ch  varchar2(10) := 'BR';
begin
  select subsidiary into p_sub from ASYSTENT_GOALS_SET where user_id = '40001001';
  execute immediate 'select mc, sum(ppln_wartosc) plan from prod_nonlife.mis_report_plans
    where report_id = (select to_number(value) from prod_nonlife.view_parameters where view_name=''MIS'' and parameter_name=''MAX_REPORT_ID'')
    and year=2017
    and month between 7 and 9
    and ppln_jsta_symbol in (:subsidiary)
    and dcs_group in (:lob)
    and kanal in (:channel)
    group by month order by month'
    bulk collect into BULK_COLLECTOR
    using p_sub, p_lob, p_ch;
  forall x in BULK_COLLECTOR.first .. BULK_COLLECTOR.last
    insert into ASYSTENT_TEST values BULK_COLLECTOR(x);
end;
So now, when the SUBSIDIARY column (varchar) in table ASYSTENT_GOALS_SET contains the string 12_00_00 (which is the code of one of the subsidiaries), everything works fine. But the problem arises when a user works in two subsidiaries, say 12_00_00 and 13_00_00. I have no clue how to write it down. Should the SUBSIDIARY column contain:
'12_00_00','13_00_00'
or
"12_00_00","13_00_00"
or maybe
12_00_00','13_00_00
I have tried a lot of options after digging through topics like "Dealing with single/escaped/double quotes".
Maybe I should change something in the execute immediate as well?
Or maybe my approach to this issue is completely wrong from the very beginning (hopefully not :) ).
I would be grateful for support.
I didn't create the table function described there, but that article inspired me to go back and try the regexp_substr function again.
I changed ppln_jsta_symbol in (:subsidiary) to:
ppln_jsta_symbol in (select regexp_substr((select subsidiary from ASYSTENT_GOALS_SET where user_id=''fake_num''), ''[^,]+'', 1, level) from dual
connect by regexp_substr((select subsidiary from ASYSTENT_GOALS_SET where user_id=''fake_num''), ''[^,]+'', 1, level) is not null)
Now it works like a charm! Thank you @Dessma very much for your time and suggestion!
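The connect by regexp_substr construction is, in effect, splitting the comma-separated SUBSIDIARY string into one row per token. A quick Python sketch of the same split, using a hypothetical stored value:

```python
# Hypothetical value stored in ASYSTENT_GOALS_SET.SUBSIDIARY
subsidiary = "12_00_00,13_00_00"

# regexp_substr(subsidiary, '[^,]+', 1, level) over increasing LEVEL
# yields one non-empty token per row; Python's split gives the same list.
tokens = [t for t in subsidiary.split(",") if t]
print(tokens)  # ['12_00_00', '13_00_00']
```

Each token then behaves like one row of the IN-list, which is why no extra quoting inside the stored string is needed.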
"I wanted to avoid multiplying rows and thought it would be nice to put all conditions as a part of the code that I would use in 'where' clause further in procedure"
This seems a misguided requirement. You shouldn't worry about the number of rows: databases are optimized for storing and retrieving rows.
What they are not good at is dealing with "multi-value" columns. As your own solution proves, it is not nice - it is very far from nice, in fact it is a total pain in the neck. From now on, every time anybody needs to work with subsidiary they will have to invoke a function. Adding, changing or removing a user's subsidiary is much harder than it ought to be. Also, there is no chance of enforcing data integrity, i.e. validating a subsidiary against a reference table.
Maybe none of this matters to you. But there are very good reasons why Codd mandated "no repeating groups" as a criterion of First Normal Form, the foundation step of building a sound data model.
The correct solution, industry best practice for almost forty years, would be to recognise that SUBSIDIARY exists at a different granularity to CHANNEL and so should be stored in a separate table.
We have started to use the updated System.Web.Providers provided in the Microsoft.AspNet.Providers.Core package from NuGet. We started to migrate our existing users and found performance slowing and then deadlocks occurring. This was with less than 30,000 users (much less than the 1,000,000+ we need to create). When we were calling the provider, it was from multiple threads on each server and there were multiple servers running this same process. This was to be able to create all the users we required as quickly as possible and to simulate the load we expect to see when it goes live.
The logs SQL Server generated for a deadlock contained the EF-generated SQL below:
SELECT
[Limit1].[UserId] AS [UserId]
, [Limit1].[ApplicationId] AS [ApplicationId]
, [Limit1].[UserName] AS [UserName]
, [Limit1].[IsAnonymous] AS [IsAnonymous]
, [Limit1].[LastActivityDate] AS [LastActivityDate]
FROM
(SELECT TOP (1)
[Extent1].[UserId] AS [UserId]
, [Extent1].[ApplicationId] AS [ApplicationId]
, [Extent1].[UserName] AS [UserName]
, [Extent1].[IsAnonymous] AS [IsAnonymous]
, [Extent1].[LastActivityDate] AS [LastActivityDate]
FROM
[dbo].[Users] AS [Extent1]
INNER JOIN [dbo].[Applications] AS [Extent2] ON [Extent1].[ApplicationId] = [Extent2].[ApplicationId]
WHERE
((LOWER([Extent2].[ApplicationName])) = (LOWER(@p__linq__0)))
AND ((LOWER([Extent1].[UserName])) = (LOWER(@p__linq__1)))
) AS [Limit1]
We ran the query manually and the execution plan showed it was performing a table scan, even though there is an underlying index. The reason for this is the use of LOWER([Extent1].[UserName]).
We looked at the provider code to see if we were doing something wrong, or if there was a way to either intercept or replace the database access code. We didn't see any options to do this, but we did find the source of the LOWER issue: .ToLower() is being called on both the column and the parameter.
return (from u in ctx.Users
join a in ctx.Applications on u.ApplicationId equals a.ApplicationId into a
where (a.ApplicationName.ToLower() == applicationName.ToLower()) && (u.UserName.ToLower() == userName.ToLower())
select u).FirstOrDefault<User>();
Does anyone know of a way that we change the behaviour of the provider to not use .ToLower() so allowing the index to be used?
You can create an index on lower(username) per Sql Server : Lower function on Indexed Column
ALTER TABLE dbo.users ADD LowerFieldName AS LOWER(username) PERSISTED
CREATE NONCLUSTERED INDEX IX_users_LowerFieldName_ ON dbo.users(LowerFieldName)
I was using the System.Web.Providers.DefaultMembershipProvider membership provider too but found that it was really slow. I changed to the System.Web.Security.SqlMembershipProvider and found it to be much faster (>5 times faster).
This tutorial shows you how to set up the SQL database that you need to use the SqlMembershipProvider http://weblogs.asp.net/sukumarraju/archive/2009/10/02/installing-asp-net-membership-services-database-in-sql-server-expreess.aspx
This database that is auto generated uses stored procedures which may or may not be an issue for your DB guys.
I have a custom log/transaction table that tracks my users' every action within the web application; it currently has millions of records and grows by the minute. In my application I need to implement some way of precalculating a user's activities/actions in SQL to determine whether other features/actions are available to the user within the application. For one example, before a page loads, I need to check if the user viewed a page X number of times.
(SELECT COUNT(*) FROM MyLog WHERE UserID = xxx and PageID = 123)
I am making several similar aggregate queries with joins to check other conditions, and the performance is poor. These checks occur on every page request, and the application can receive hundreds of requests per minute.
I'm looking for any ideas to improve the application performance through sql and/or application code.
This is a .NET 2.0 app and using SQL Server 2008.
Much thanks in advance!
Easiest way is to store the counts in a table by themselves. Then, when adding records (hopefully through an SP), you can simply increment the affected row in your aggregate table. If you are really worried about the counts getting out of whack, you can put a trigger on the detail table to update the aggregated table, however I don't like triggers as they have very little visibility.
Also, how up to date do these counts need to be? Can this be something that can be stored into a table once a day?
Querying a log table like this may be more trouble than it is worth.
As an alternative, I would suggest using something like memcache to store the value as needed. As long as you update the cache on each hit, it will be much faster than querying a large database table. Memcache has a built-in increment operator that handles this kind of thing.
This way you only need to query the db on the first visit.
Another alternative is to use a precomputed table, updating it as needed.
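A minimal sketch of that cache-on-first-visit pattern, with a plain dict standing in for memcache's incr and a stubbed-out DB call (both hypothetical):

```python
# Stand-in for memcache: a dict with increment semantics, populated from
# the database only on the first lookup (cache miss).
cache = {}
db_queries = 0

def db_count(user_id, page_id):
    # Pretend this runs the expensive COUNT(*) against MyLog.
    global db_queries
    db_queries += 1
    return 7  # hypothetical pre-existing view count

def record_hit(user_id, page_id):
    key = (user_id, page_id)
    if key not in cache:                  # first visit: one DB round trip
        cache[key] = db_count(user_id, page_id)
    cache[key] += 1                       # memcache's incr equivalent

record_hit(1, 123)
record_hit(1, 123)
print(cache[(1, 123)], db_queries)  # 9 1
```

With real memcache you would use its atomic incr and fall back to the DB only when the key is missing or has expired.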
Have you indexed MyLog on UserID and PageID? If not, that should give you some huge gains.
Todd, this is a tough one because of the number of operations you are performing.
Have you checked your indexes on that database?
Here's a stored procedure you can execute to help find missing indexes. I can't remember where I found this, but it helped me:
CREATE PROCEDURE [dbo].[SQLMissingIndexes]
@DBNAME varchar(100)=NULL
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SELECT
migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
* (migs.user_seeks + migs.user_scans) AS improvement_measure,
'CREATE INDEX [missing_index_'
+ CONVERT (varchar, mig.index_group_handle)
+ '_' + CONVERT (varchar, mid.index_handle)
+ '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
+ ' ON ' + mid.statement
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL
AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
migs.*,
mid.database_id,
mid.[object_id]
FROM
sys.dm_db_missing_index_groups mig
INNER JOIN
sys.dm_db_missing_index_group_stats migs
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid
ON mig.index_handle = mid.index_handle
WHERE
migs.avg_total_user_cost
* (migs.avg_user_impact / 100.0)
* (migs.user_seeks + migs.user_scans) > 10
AND
(@DBNAME = db_name(mid.database_id) OR @DBNAME IS NULL)
ORDER BY
migs.avg_total_user_cost
* migs.avg_user_impact
* (migs.user_seeks + migs.user_scans) DESC
END
I modified it a bit to accept a db name. If you don't provide a db name, it will run against all databases and give you suggestions on what fields need indexing.
To run it use:
exec DatabaseName.dbo.SQLMissingIndexes 'MyDatabaseName'
I usually put reusable SQL (sproc) code in a separate database called DBA, so that from any database I can say:
exec DBA.dbo.SQLMissingIndexes
As an example.
Edit
Just remembered the source, Bart Duncan.
Here is a direct link http://blogs.msdn.com/b/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
But remember I did modify it to accept a single db name.
We had the same problem beginning several years ago: we moved from SQL Server to OLAP cubes, and when that stopped working recently we moved again, to Hadoop and some other components.
OLTP (Online Transaction Processing) databases, of which SQL Server is one, are not very good at OLAP (Online Analytical Processing). This is what OLAP cubes are for.
OLTP provides good throughput when you're writing and reading many individual rows. It fails, as you just found, when doing many aggregate queries that require scanning many rows. Since SQL Server stores every record as a contiguous block on disk, scanning many rows means many disk fetches. The cache saves you for a while - as long as your table is small - but with tables of millions of rows the problem becomes evident.
Frankly, OLAP isn't that scalable either, and at some point (tens of millions of new records per day) you're going to have to move to a more distributed solution - either paid (Vertica, Greenplum) or free (HBase, Hypertable).
If neither is an option (e.g. no time or no budget), then for now you can alleviate the pain somewhat by spending more on hardware: you need very fast IO (fast disks, RAID) and as much RAM as you can get.