I asked a question a little over a week ago: "Firestore order by two fields".
The response I got said that:
"The API supports the capability you want, although I don't see an example in the documentation that shows it.
The ordering of the query terms is important. Suppose you have a collection of cities and the fields of interest are population (h1) and name (h2). To get the cities with population in range 1000 to 2000, ordered by name, the query would be:
citiesRef.orderBy("population").orderBy("name").startAt(1000).endAt(2000)
This query requires a composite index, which you can create manually in the console. Or as the documentation there indicates, the system will help you:"
But what this returns is not cities with a population between 1000 and 2000 ordered by name, but rather cities with population 1000 ordered by name, followed by cities with population 1001 ordered by name, followed by 1002 ordered by name, and so on up to 2000.
I'm wondering if there is a way to get all cities with a population between 1000 and 2000, ordered by name.
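To see why, here is a small plain-JavaScript simulation of what orderBy("population").orderBy("name").startAt(1000).endAt(2000) does. No Firestore is involved; the sample cities are hypothetical. The cursor values apply to the first orderBy field (population), so results are sorted by population first and name only breaks ties.

```javascript
// Hypothetical sample data standing in for documents in a "cities" collection.
const cities = [
  { name: "Dover", population: 1500 },
  { name: "Ames", population: 1500 },
  { name: "Cody", population: 1200 },
  { name: "Bend", population: 1800 },
  { name: "Elko", population: 2500 },
];

// Mimics orderBy("population").orderBy("name"): compare population first,
// and fall back to name only when populations are equal.
function compareByPopulationThenName(a, b) {
  if (a.population !== b.population) return a.population - b.population;
  return a.name.localeCompare(b.name);
}

// Mimics startAt(start).endAt(end): both cursors apply to the first
// orderBy field (population) and both bounds are inclusive.
function simulateCompoundQuery(docs, start, end) {
  return docs
    .filter((c) => c.population >= start && c.population <= end)
    .sort(compareByPopulationThenName);
}

const result = simulateCompoundQuery(cities, 1000, 2000);
console.log(result.map((c) => c.name).join(", "));
// Cody, Ames, Dover, Bend  (grouped by population, not alphabetical overall)
```

An overall alphabetical order would be Ames, Bend, Cody, Dover, which the compound ordering cannot produce.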
Thanks.
I think you're looking for where clauses on population:
citiesRef
.where("population", ">=", 1000)
.where("population", "<", 2000)
Because Cloud Firestore requires a query with an inequality filter on a field to be ordered by that field first, you won't be able to sort by name directly in the query. Instead, you'd need to sort client-side once you've fetched the data.
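A minimal sketch of the client-side sort, assuming the query results have already been turned into plain objects (for example with snapshot.docs.map((d) => d.data())); the sample data below is hypothetical.

```javascript
// Hypothetical plain objects, as returned by the population range query above.
const fetched = [
  { name: "Dover", population: 1500 },
  { name: "Cody", population: 1200 },
  { name: "Bend", population: 1800 },
  { name: "Ames", population: 1500 },
];

// Sort by name on the client. Copies the array first so the fetched
// results are left untouched.
function sortByName(docs) {
  return [...docs].sort((a, b) => a.name.localeCompare(b.name));
}

console.log(sortByName(fetched).map((c) => c.name));
// [ 'Ames', 'Bend', 'Cody', 'Dover' ]
```

Since the filter already limits the result set to the population range, sorting it in memory is usually cheap.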
I'm unsure how I should approach this problem.
My users can record many parts of their day, including activities, mood, health measurements (heart rate in bpm, glucose), exercise, and meals.
I originally thought that I should create one document per entry (i.e., one entry per day). However, data is rarely displayed to the user day by day; it is mostly shown month by month (in charts).
Should I model my Firestore DB around my views, or would it be better to just save each entry for each day and then query?
I think it will be more efficient in many parts of the app to have the entries grouped by month rather than by day.
Am I thinking about this correctly, or is there really no benefit? (i.e., maybe the amount of data transferred offsets the cost of unnecessary queries).
If you plan to save each entry for each day and then query to find the results, the more documents you have, the more documents your queries will read. That increases your reads per day, so it may be more expensive than storing the data month by month.
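A minimal sketch of the month-by-month layout, assuming each entry carries an ISO date string. The entry fields and the monthKey/groupByMonth helpers are hypothetical. Each month becomes one object that could be written as a single document (for example with ID "2019-11"), so drawing a month's chart costs one read instead of one per day.

```javascript
// Hypothetical daily entries; in the per-day model each would be its own document.
const entries = [
  { date: "2019-11-01", mood: 4, glucose: 5.4 },
  { date: "2019-11-02", mood: 3, glucose: 6.1 },
  { date: "2019-12-01", mood: 5, glucose: 5.0 },
];

// "2019-11-02" -> "2019-11": the ID of the monthly document the entry belongs to.
function monthKey(isoDate) {
  return isoDate.slice(0, 7);
}

// Groups entries into one object per month, keyed by day of month.
// Each object must stay under Firestore's 1 MiB document size limit.
function groupByMonth(list) {
  const months = {};
  for (const entry of list) {
    const key = monthKey(entry.date);
    const day = entry.date.slice(8, 10);
    if (!months[key]) months[key] = { days: {} };
    months[key].days[day] = entry;
  }
  return months;
}

const monthlyDocs = groupByMonth(entries);
console.log(Object.keys(monthlyDocs)); // [ '2019-11', '2019-12' ]
```

The trade-off is that every new entry rewrites the month's document, and a month of data must fit within the document size limit.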
To answer your last question:
Let's take an example: say you have a collection with 100 documents,
and you want to query 20 of them.
That only counts as 20 reads, not 100 as you might expect.
Just a reminder: with Firebase you can read up to 50k documents per day for free; beyond that limit it costs $0.06 per 100k reads.
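A rough cost sketch using the figures quoted above (50k free reads per day, $0.06 per 100k reads; check current pricing before relying on them). The 30-day month and the 5,000 chart views per day are hypothetical inputs, used only to compare one-document-per-day with one-document-per-month.

```javascript
// Figures quoted in the answer; current Firebase pricing may differ.
const FREE_READS_PER_DAY = 50000;
const USD_PER_100K_READS = 0.06;

// Estimated cost in USD for one day's worth of document reads.
function dailyReadCost(readsPerDay) {
  const billable = Math.max(0, readsPerDay - FREE_READS_PER_DAY);
  return (billable / 100000) * USD_PER_100K_READS;
}

// Reads needed to draw one month-long chart for one user:
// one document per day (assuming a 30-day month) versus one document per month.
const readsPerChartDaily = 30;
const readsPerChartMonthly = 1;

// Hypothetical load: 5,000 chart views per day.
const views = 5000;
console.log(dailyReadCost(views * readsPerChartDaily)); // 0.06 (150,000 reads)
console.log(dailyReadCost(views * readsPerChartMonthly)); // 0 (5,000 reads, within the free quota)
```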
I hope this helps.
Have a nice day!
After spending too much time in Power BI trying to figure out why my user count didn't match when querying userAgeBracket, I used https://ga-dev-tools.appspot.com/query-explorer/ and here is the output:
The start-date is 2019-11-01 and the end-date is 2019-11-30.
Without the date dimension (notice there are users aged 55-64 and 65+):
When adding the date dimension:
Notice there are now no users aged 55-64 or 65+.
How can I solve this?
As the documentation says:
Thresholds are applied to prevent anyone viewing a report from inferring the demographics or interests of individual users. When a report contains Age, Gender, or Interest Category (as a primary or secondary dimension, or as part of an applied segment), a threshold may be applied and some data may be withheld from the report. For example, if there are fewer than N instances of Gender=male in a report, then data for the male value may be withheld.
So in some cases you won't be able to get granular demographic data in GA reports.
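A toy model of why adding the date dimension can make age brackets disappear. This is not Google's actual algorithm: the real value of N is not published, and the THRESHOLD, user counts, and even daily split below are all made up. Splitting a monthly total into 30 daily rows makes each row's count smaller, so more rows fall below the threshold and are withheld.

```javascript
// Hypothetical threshold; Google Analytics does not publish the real N.
const THRESHOLD = 10;

// Toy thresholding: rows whose user count is below THRESHOLD are withheld.
function applyThreshold(rows) {
  return rows.filter((row) => row.users >= THRESHOLD);
}

// Hypothetical November totals per age bracket (no date dimension).
const monthly = [
  { age: "25-34", users: 600 },
  { age: "55-64", users: 45 },
];

// Simplification: spreads each bracket's users evenly across `days` daily rows,
// which is roughly what adding the date dimension does to the row counts.
function splitByDay(rows, days) {
  const out = [];
  for (const row of rows) {
    for (let d = 1; d <= days; d++) {
      out.push({ age: row.age, day: d, users: Math.round(row.users / days) });
    }
  }
  return out;
}

const withoutDate = applyThreshold(monthly).map((r) => r.age);
const withDate = [...new Set(applyThreshold(splitByDay(monthly, 30)).map((r) => r.age))];
console.log(withoutDate); // [ '25-34', '55-64' ]
console.log(withDate); // [ '25-34' ]
```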
I'm learning dimensional modeling and I'm trying to create a model. I was thinking about a social media platform on which users rate hotels. The platform has the following data:
hotel information: name and address
a user can rate hotels (1-5 points)
a user can write comments
the platform stores the date of each comment
a hotel can answer via a comment, and the platform stores the date of the answer
the platform stores the total number of ratings at each level (i.e., all 1-point ratings, all 2-point ratings, etc.)
the platform stores information about the user: sex, name, address, and the total number of votes he/she has made
First, I tried to decide which information belongs in a dimension and which in a fact table (here I also checked which facts are additive, semi-additive, or non-additive).
I realized my example is somewhat difficult, because for some items it's hard to decide whether they belong in a fact table or a dimension.
I would like some advice. Would you agree with my model?
This is how I would model it:
Hotel information -> hotel dimension
User rating -> additive fact, because I can aggregate it along all dimensions
User comment -> semi-additive? Because I can aggregate comments along the date dimension (I don't know if my argument is correct, but I know I would have new comments every day, which for me is a reason to store them in a fact table)
Answer as comment -> same handling as the user comments
Date of comment -> dimension
Total number of all votes (1/2/3/4/5) -> semi-additive facts: it makes no sense to aggregate them, since they are already totals, but I would get the average
User information (sex, name, address) -> user dimension
User information: total number of votes -> could be a dimension or a fact. It depends on how often it changes. If it changes often, I store it in a fact; if not, in the dimension
I still have two questions; I hope someone can help:
First question: should I create two date dimensions (one for comment dates and one for answer dates), or can I store both in one date dimension?
Second question: each user and each hotel has just one address. Are there arguments for separating the address into its own dimension? Can I create a 1:1 relationship between the user dimension and an address dimension?
For your model, it looks well considered, but here are some thoughts:
User comment (and answers to comments): these are events to be captured (with new ones each day, as you mention), so they are facts, with dimensions for the commenter, the type of comment, and the date, and the measure is at least a 'count', which is additive. But you don't want to store large text in a fact table, so you would put it in a dimension of its own that is 1:1 with the fact, for situations where you need to query on the comment itself.
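A minimal sketch of that shape in plain JavaScript, with all keys, field names, and rows hypothetical: the fact rows hold small dimension keys and a commentCount measure, the text sits in a separate structure keyed 1:1 by commentKey, and a hotel's answer is modelled as another comment with the hotel as commenter.

```javascript
// Hypothetical comment-text dimension: the potentially large text lives here,
// one row per comment (1:1 with the fact rows below).
const commentTextDim = new Map([
  [101, "Great breakfast, noisy street."],
  [102, "Thanks for staying with us!"],
  [103, "Room was spotless."],
]);

// Hypothetical comment facts: small dimension keys plus an additive count measure.
const commentFacts = [
  { commentKey: 101, commenterKey: "U1", hotelKey: "H7", dateKey: 20191101, commentType: "review", commentCount: 1 },
  { commentKey: 102, commenterKey: "H7", hotelKey: "H7", dateKey: 20191102, commentType: "answer", commentCount: 1 },
  { commentKey: 103, commenterKey: "U2", hotelKey: "H7", dateKey: 20191102, commentType: "review", commentCount: 1 },
];

// Because commentCount is additive, it can be summed along any dimension key.
function sumCommentsBy(facts, key) {
  const totals = {};
  for (const fact of facts) {
    totals[fact[key]] = (totals[fact[key]] || 0) + fact.commentCount;
  }
  return totals;
}

console.log(sumCommentsBy(commentFacts, "commentType")); // review: 2, answer: 1
// The text is looked up only when a query actually needs it.
console.log(commentTextDim.get(commentFacts[0].commentKey));
```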
Total Number of all votes (1/2/3/4/5): these are, as you say, already aggregates, mostly there for performance. Totals should be easy to compute from the raw data itself, so it may not be worthwhile to store them at all. You might also consider adding columns to the hotel dimension (e.g., hotel A has five '1' votes and four '2' votes) that you update as you go, for easy filtering and categorisation.
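A short sketch of deriving the per-level totals from raw rating facts instead of storing them; the fact rows and keys are hypothetical.

```javascript
// Hypothetical rating facts: one row per vote, with the 1-5 score as the measure.
const ratingFacts = [
  { hotelKey: "H7", userKey: "U1", score: 5 },
  { hotelKey: "H7", userKey: "U2", score: 4 },
  { hotelKey: "H7", userKey: "U3", score: 5 },
  { hotelKey: "H9", userKey: "U1", score: 1 },
];

// Counts votes at each rating level per hotel, computed from the raw facts.
function ratingLevelTotals(facts) {
  const totals = {};
  for (const { hotelKey, score } of facts) {
    if (!totals[hotelKey]) totals[hotelKey] = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
    totals[hotelKey][score] += 1;
  }
  return totals;
}

const totals = ratingLevelTotals(ratingFacts);
console.log(totals.H7); // H7: two 5-point votes, one 4-point vote
```

The same result could be written back into hotel dimension columns if you want them for filtering.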
User Information: total number of votes: it is factual information about a user (dimension), and it depends on whether you always just want to 'find it out' about a person or whether you are likely to use it to filter other information (e.g., show me all reviews for users who have made 10-20 votes). In that case you might store the total in the user dimension (and/or a banding, like 'number of reviews range' with 10-20, 20-30). You can update dimensions often if you need to, but you're right, it could still just live as a fact only.
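A minimal sketch of such a banding attribute, assuming hypothetical band labels. The bands are half-open ranges so that a user with exactly 20 votes lands in one band only (the 10-20, 20-30 example overlaps at the boundary).

```javascript
// Hypothetical bands for a "number of votes range" attribute on the user dimension.
// Each band is [min, max): min inclusive, max exclusive.
const VOTE_BANDS = [
  { label: "0-9", min: 0, max: 10 },
  { label: "10-19", min: 10, max: 20 },
  { label: "20-29", min: 20, max: 30 },
  { label: "30+", min: 30, max: Infinity },
];

function voteBand(totalVotes) {
  const band = VOTE_BANDS.find((b) => totalVotes >= b.min && totalVotes < b.max);
  return band ? band.label : "unknown";
}

// Refreshes the total and the band on a user dimension row when the vote count changes.
function refreshUserRow(userRow, totalVotes) {
  return { ...userRow, totalVotes, voteBand: voteBand(totalVotes) };
}

console.log(refreshUserRow({ userKey: "U1", name: "Ana" }, 20).voteBand); // 20-29
```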
As for date dimensions, if the 'grain' is 'day' then you only need one dimension, that you refer to from multiple facts.
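A small sketch of one day-grain date dimension referenced twice from the same fact (Kimball calls this a role-playing dimension). The keys, fields, and the answer fact are hypothetical.

```javascript
// Hypothetical day-grain date dimension, keyed by yyyymmdd integers.
const dateDim = new Map([
  [20191101, { isoDate: "2019-11-01", month: "2019-11", weekday: "Friday" }],
  [20191102, { isoDate: "2019-11-02", month: "2019-11", weekday: "Saturday" }],
]);

// Hypothetical answer fact with two date keys; both point at the same dateDim,
// which plays two roles: "comment date" and "answer date".
const answerFacts = [
  { answerKey: 1, hotelKey: "H7", commentDateKey: 20191101, answerDateKey: 20191102, answerCount: 1 },
];

function describeAnswerDates(fact) {
  const commented = dateDim.get(fact.commentDateKey);
  const answered = dateDim.get(fact.answerDateKey);
  return `${commented.weekday} -> ${answered.weekday}`;
}

console.log(describeAnswerDates(answerFacts[0])); // Friday -> Saturday
```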
As for addresses, you're right that there are arguments on both sides! Many people separate addresses into their own dimension, referenced from the other dimensions that use them. Kimball suggests you can do that behind the scenes if necessary, but prefers each dimension to have its own set of address columns (modelled as consistently as possible).
What I mean is: count how many clicks the website has had since the beginning, and get the visitors' country.
I've already tested the Google Analytics API, but it requires a date range. I want data from the beginning, but there is no "Since beginning" or "All data" option.
Will setting the start date to 2007 work? Any suggestions?
"Will setting the start date to 2007 work?"
Sure, why shouldn't it? The limits are the number of rows your query returns (10,000 per query; the number of rows depends on the number of distinct values for the selected combination of dimensions) and how often you run your query, not the timeframe selected.
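A sketch of such a request against the Core Reporting API v3 that the Query Explorer targets, with authentication omitted. "ga:XXXXXXXX" is a placeholder view ID, and ga:sessions is an assumption standing in for "clicks". The 10,000-row limit is handled by paging with the 1-based start-index parameter; with only ga:country as a dimension, a single page will normally be enough.

```javascript
// Assumptions: placeholder view ID, ga:sessions as the "clicks" metric, no auth shown.
const BASE_URL = "https://www.googleapis.com/analytics/v3/data/ga";
const MAX_ROWS_PER_QUERY = 10000;

function buildQueryUrl(startIndex) {
  const params = new URLSearchParams({
    ids: "ga:XXXXXXXX",
    "start-date": "2007-01-01",
    "end-date": "today",
    metrics: "ga:sessions",
    dimensions: "ga:country",
    "max-results": String(MAX_ROWS_PER_QUERY),
    "start-index": String(startIndex),
  });
  return `${BASE_URL}?${params}`;
}

// Given the totalResults reported by the first response, returns the
// start-index of every page needed to fetch all rows.
function pageStartIndexes(totalResults) {
  const starts = [];
  for (let i = 1; i <= totalResults; i += MAX_ROWS_PER_QUERY) starts.push(i);
  return starts;
}

console.log(pageStartIndexes(25000)); // [ 1, 10001, 20001 ]
console.log(buildQueryUrl(1));
```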
You can test this with the Query Explorer, where you can run your metric/dimension combinations without writing API code, so you can first check that you get the expected results.
I have the following dimension tables:
item (item_id,name,category)
Store(store_id,location,region,city)
Date(date_id,day,month,quarter)
customer(customer_id,name,address,member_card)
and the following fact table:
Sales(item_id,store_id,date_id,customer_id,unit_sold,cost)
My question is: if I want to find the average sales of a location for a month, should I add an average_sales column to the fact table? And if I want to find sales made using the membership card, should I add a corresponding field to the fact table?
My understanding so far is that only countable measures belong in the fact table, so I guess membership_card should not go in the fact table.
Please let me know if I am wrong.
No, you should not add an average sales column to your fact table, it is a calculated value, and is not at the same "grain" as the fact table.
Your sales fact table should be as granular as possible, so it should really be sales_order_line_items, one row per sales order line item.
You want to calculate the average sales of a given store for a given month...?
First, by "sales" do you mean "revenue" (total dollars in) or "quantity sold"?
Average daily revenue?
Average monthly revenue, by month?
If you have the store id, date, quantity sold (per line item) and unit price, then it's pretty easy to figure out.
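A minimal sketch of that calculation, assuming hypothetical line-item rows with a unitPrice field (the question's Sales table only has cost, so the price column is an assumption) and taking "average sales" to mean average daily revenue over the days that had sales.

```javascript
// Hypothetical line-item facts: one row per sales order line item.
const lineItems = [
  { storeId: 1, date: "2019-11-01", quantity: 2, unitPrice: 10 },
  { storeId: 1, date: "2019-11-01", quantity: 1, unitPrice: 5 },
  { storeId: 1, date: "2019-11-15", quantity: 3, unitPrice: 10 },
  { storeId: 2, date: "2019-11-01", quantity: 4, unitPrice: 10 },
  { storeId: 1, date: "2019-12-01", quantity: 1, unitPrice: 10 },
];

// Average daily revenue for one store in one month ("yyyy-mm"), averaged over
// days that had sales (divide by the number of days in the month instead if needed).
function averageDailyRevenue(items, storeId, month) {
  const revenueByDay = new Map();
  for (const item of items) {
    if (item.storeId !== storeId || !item.date.startsWith(month)) continue;
    const revenue = item.quantity * item.unitPrice;
    revenueByDay.set(item.date, (revenueByDay.get(item.date) || 0) + revenue);
  }
  if (revenueByDay.size === 0) return 0;
  let total = 0;
  for (const r of revenueByDay.values()) total += r;
  return total / revenueByDay.size;
}

console.log(averageDailyRevenue(lineItems, 1, "2019-11")); // 27.5
```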
You should not add aggregate columns to the same fact table. The measures in a fact table should all be at the same grain, so if you want aggregate metrics, build a separate fact table at the required grain.
So, I might have a fact aggregate table named F_LOC_MON_AGG which has the measures aggregated at location and month level.
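A sketch of building rows for an aggregate table like F_LOC_MON_AGG, assuming the detail facts have already been joined to the Store dimension (location) and the Date dimension (month); the rows and the revenue field are hypothetical. As a design choice, only additive measures (sums and a row count) are stored, and averages are derived at query time.

```javascript
// Hypothetical detail rows already joined to location and month.
const salesRows = [
  { location: "Downtown", month: "2019-11", unitSold: 2, revenue: 20 },
  { location: "Downtown", month: "2019-11", unitSold: 3, revenue: 30 },
  { location: "Airport", month: "2019-11", unitSold: 4, revenue: 40 },
  { location: "Downtown", month: "2019-12", unitSold: 1, revenue: 10 },
];

// Rolls detail rows up to the location x month grain.
function buildLocationMonthAggregate(rows) {
  const agg = new Map();
  for (const row of rows) {
    const key = `${row.location}|${row.month}`;
    const acc = agg.get(key) || { location: row.location, month: row.month, unitSold: 0, revenue: 0, rowCount: 0 };
    acc.unitSold += row.unitSold;
    acc.revenue += row.revenue;
    acc.rowCount += 1;
    agg.set(key, acc);
  }
  return [...agg.values()];
}

const aggRows = buildLocationMonthAggregate(salesRows);
const downtownNov = aggRows.find((r) => r.location === "Downtown" && r.month === "2019-11");
console.log(downtownNov.revenue / downtownNov.rowCount); // 25 (average revenue per line item)
```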
If you do not have aggregate tables, modern business intelligence tools such as OBIEE can do the aggregation at run time.
Vijay