I have a query that fetches data between two dates, using startAt (1 week ago) and endAt (now) on the last_visit field.
Then I loop through the results to discard users who don't have a profile picture.
The problem is that only around 20% of the users have a profile picture, so to get 100 users with profile pictures I have to query at least 500 people (I use limitToLast(500)).
Could I make this query more efficient by specifying something like SQL's WHERE profile_picture IS NOT NULL?
If I could fetch only the users that have a profile picture set, I could also use limitToLast(100).
Database looks like:
users: {
  {user_uid}: {
    profile_picture: null,
    last_visit: 123456789
  },
  {user_uid}: {
    profile_picture: 'example.com/pic.png',
    last_visit: 123456789
  }
}
If you're trying to exclude items that don't have a property, you need to query for the broadest range of values possible.
A simple example would be:
ref.orderByChild('profile_picture').startAt('!').endAt('~')
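This captures most values that consist of printable ASCII characters, since ! is the first printable non-space character (code 33) and ~ is the last printable character (code 126); nodes where profile_picture is null or missing fall outside the range. Be careful with these bounds, because they won't match values that begin with a character outside that range, such as most non-ASCII Unicode characters. Also note that a Realtime Database query can order by only one child, so this range cannot be combined with the startAt/endAt range on last_visit in the same query.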
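To make the effect of the '!' to '~' bounds concrete, here is a minimal sketch in plain Python. It does not use the Firebase SDK; it only imitates the string range comparison on an in-memory dict. The user ids, the third user and the helper name with_child_between are made up for illustration.

```python
# Plain-Python imitation of "order by child, startAt('!'), endAt('~')".
# Not the Firebase SDK: user ids and the helper name are hypothetical.
users = {
    "uid_a": {"profile_picture": None, "last_visit": 100},
    "uid_b": {"profile_picture": "example.com/pic.png", "last_visit": 200},
    "uid_c": {"profile_picture": "écran.example/pic.png", "last_visit": 300},
}


def with_child_between(nodes, child, start, end):
    """Keep nodes whose string child value v satisfies start <= v <= end."""
    return {
        key: node
        for key, node in nodes.items()
        if isinstance(node.get(child), str) and start <= node[child] <= end
    }


matched = with_child_between(users, "profile_picture", "!", "~")
print(sorted(matched))
```

In this sketch the user without a picture is dropped, and so is the user whose picture URL starts with a non-ASCII character, which is the caveat mentioned above.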
I have a requirement to find all users in a table that have the same Id, Email or Phone.
Right now the data looks like this:
Id //hash
Market //sort
Email //gsi
Phone //gsi
I want to be able to do a query and say:
Get all items that have matching Id, email or phone.
From the docs it seems that you can only do a single query based on keys or one index. And it seems that even if I was to combine phone and email into one column and GSI that column I would still be limited to a begin with filter expression, is this correct? Are there any alternatives?
it seems that you can only do a single query based on keys or one index
Yes.
if I was to combine phone and email into one [GSI] I would still be limited to a begin with filter expression, is this correct?
Essentially, yes. Query constraints apply equally to indexes and the table keys. You must specify one-and-only-one Partition Key value, and optionally a range of Sort Key values.
Are there any alternatives?
Overload the Partition Key and denormalise the data. Redefine the Partition Key column (renamed PK) to hold Id, Email and Phone values. Each record is (fully or partially) repeated 3 times, each time with a different PK type.
PK             Market  Id    More fields
Id-1           A       Id-1  foo
zaphod#42.com  A       Id-1  # foo or blank
13015552572    A       Id-1  # foo or blank
Querying PK = <something> AND Market > "" will return any matching id, email or phone number value.
If justified by your query patterns, repeat all fields 3x. Alternatively, use a hit on a truncated email/phone record to identify the Id, then query other fields using the Id.
There are different flavours of this pattern. For instance, you could also overload the Sort Key column (renamed to SK) with the Id value for Email and Phone records, which would permit multiple Ids per email/phone.
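As a rough illustration of the overloaded-PK pattern, the sketch below fans one logical record out into three items and looks them up by any of the three values. It is an in-memory stand-in written in plain Python, not DynamoDB or boto3; the Table class, the fan_out helper and the example email address are hypothetical.

```python
from collections import defaultdict


def fan_out(record):
    """Return one item per lookup value (Id, Email, Phone), all carrying the same Id."""
    return [
        {"PK": pk, "Market": record["Market"], "Id": record["Id"]}
        for pk in (record["Id"], record["Email"], record["Phone"])
    ]


class Table:
    """Toy table: items grouped by PK and returned sorted by Market (the sort key)."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def put(self, item):
        self._partitions[item["PK"]].append(item)

    def query(self, pk):
        # Exactly one partition key value per query, as in the answer above.
        return sorted(self._partitions.get(pk, []), key=lambda item: item["Market"])


table = Table()
record = {
    "Id": "Id-1",
    "Market": "A",
    "Email": "zaphod@example.com",  # made-up address for illustration
    "Phone": "13015552572",
}
for item in fan_out(record):
    table.put(item)

print(table.query("13015552572"))
```

Whichever of the three values is supplied, the hit carries the Id, which can then be used to fetch the remaining fields if they were not repeated on every item.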
I've been reading the DynamoDB docs and was unable to work out whether it makes sense to query a Global Secondary Index using the 'contains' operator.
My problem is as follows: my DynamoDB document has a list of embedded objects, and every object has a 'code' field which is unique:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index that would contain all entity.codes that are present in current db document separated by a comma. So the example above would look like:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
],
"entitiesGlobalSecondaryIndex":"entityCode1,entityCode2"
}
And then I would like to apply filter expression on entitiesGlobalSecondaryIndex something like: entitiesGlobalSecondaryIndex contains entityCode1.
Would this be efficient, or does a global secondary index not make sense here, so that DynamoDB would simply check the condition against every document, much like a scan?
Any help is very appreciated,
Thanks
The contains operator cannot be run on a partition key; a query's partition key condition only supports equality. To use range operators in the key condition (begins_with, BETWEEN, <, >, etc.) you need a range attribute, i.e. your Sort Key. Note that contains is not a key condition operator at all; it is only available in filter expressions.
You can very well set up a GSI with some value as its PK and this code as its SK. However, GSIs are replicas of the table and are eventually consistent, so there is a slight potential for the data in a GSI to lag behind that of the base table. If you don't run the query against this GSI very often, you're probably safe from that.
However, if you are trying to do this against the entire table at once, then it's no better than a scan.
If what you need is a specific Code to return all its documents at once, then you could do a GSI with that as the PK. If you add a date field as the SK of this GSI it would even be time sorted. If you query against that code in that index, you'll get every single one of them.
Since you may have multiple codes, if there aren't too many per document you could use a Sparse Index: if a document has an entity with code "AAAA", then it also has an attribute named AAAA (or AAAAflag or something similar). That attribute does not exist unless the entities contain that code. If you create a GSI on this AAAAflag attribute, it will only contain documents that contain that entity code and will ignore all documents where the attribute does not exist. This may work for you if you can also provide a good PK to keep the data well partitioned and if you don't have too many codes.
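The sparse index idea can be sketched as follows. This is plain Python over a list of dicts, not DynamoDB; the document ids, the codes and the helper names add_flags and sparse_index are assumptions made for the example.

```python
# Toy model of a sparse index: only items that carry the indexed attribute appear in it.
documents = [
    {"doc_id": "d1", "codes": ["AAAA", "BBBB"]},
    {"doc_id": "d2", "codes": ["BBBB"]},
    {"doc_id": "d3", "codes": []},
]


def add_flags(doc):
    """Return a copy of doc with one '<code>flag' attribute per entity code it contains."""
    item = dict(doc)
    for code in doc["codes"]:
        item[f"{code}flag"] = doc["doc_id"]
    return item


def sparse_index(items, attribute):
    """Items lacking the attribute are simply absent from the index."""
    return [item for item in items if attribute in item]


items = [add_flags(doc) for doc in documents]
aaaa_index = sparse_index(items, "AAAAflag")
print([item["doc_id"] for item in aaaa_index])
```

Each distinct code needs its own flag attribute and its own index in this scheme, which is why it only suits a small number of codes.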
Filter expressions, by the way, are different from all of the above. Filter expressions are run on the data that would be returned, after it has already been read out of the table. This is useful if you have a multi-access-pattern setup but don't want a particular call to get all the documents associated with a particular PK, in the interest of keeping the data your code works with concise. The query with a filter expression still retrieves everything matched by that query, but only presents what makes it past the filter.
If you are only querying against a particular PK at any given time and you want to know whether it contains any entities of x, then a filter expression would work perfectly. Of course, this is only per PK and not for your entire table.
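The difference between what a query reads and what a filter lets through can be shown with a small stand-in. This is plain Python, not the DynamoDB API; the query function, the PK values and the codes attribute are hypothetical.

```python
def query(items, pk, filter_fn=None):
    """Return (returned_items, items_read) for a query on one partition key.

    The filter is applied only after the partition has been read, so it
    reduces what is returned but not what is read.
    """
    read = [item for item in items if item["PK"] == pk]
    returned = [item for item in read if filter_fn is None or filter_fn(item)]
    return returned, len(read)


rows = [
    {"PK": "tenant-1", "codes": "entity1Code,entity2Code"},
    {"PK": "tenant-1", "codes": "entity3Code"},
    {"PK": "tenant-2", "codes": "entity1Code"},
]

hits, read_count = query(rows, "tenant-1", lambda item: "entity1Code" in item["codes"])
print(len(hits), read_count)
```

Here two items are read for the partition and one is returned, and the matching item in the other partition is never seen, which is the "per PK, not for the entire table" limitation described above.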
If all you need is numbers, then you could do a count attribute on the document, or a meta document on that partition that contains these values and could be queried directly.
Lastly, and I have no idea whether this would work, if your entities attribute is a map type you might be able to filter against the entity codes, maybe even with something like entities.code.contains(value) if it were an SK, but I do not know if this is possible.
Is it possible to query just part of the name and get the data
Like I have data 12345678
can I somehow just search for 1234?
data.whereEqualTo("data", 1234);
This is needed because the last digits change and the first ones don't.
Is it possible to query just part of the name and get the data Like I have data 12345678 can I somehow just search for 1234?
Sure it is. When it comes to Firestore, you can simply use .startAt() as seen in the following query:
db.collection("collName").orderBy("data").startAt(1234);
When it comes to the Realtime Database, you can use .startAt() too, but as seen below:
db.child("nodeName").orderByChild("data").startAt(1234);
But remember, both queries will return all elements that are greater than or equal to 1234, not only those that start with 1234. Since 12345678 is greater, it will be present in the result set, but so will unrelated values such as 5000.
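If only values that actually begin with 1234 are wanted, one option is to bound the range at both ends. The sketch below computes such bounds in plain Python, under two assumptions that are not in the original: either the numbers have a known fixed length, or the field is stored as a string. The helper names are made up; the "\uf8ff" upper bound is the usual Firebase idiom for string prefix ranges.

```python
def numeric_prefix_range(prefix, total_digits):
    """Inclusive (low, high) bounds of all total_digits-long numbers starting with prefix."""
    prefix_len = len(str(prefix))
    if prefix_len > total_digits:
        raise ValueError("prefix is longer than total_digits")
    scale = 10 ** (total_digits - prefix_len)
    low = prefix * scale
    return low, low + scale - 1


def string_prefix_range(prefix):
    """Inclusive (low, high) bounds for strings starting with prefix."""
    # "\uf8ff" is a very high code point, so it sorts after ordinary text.
    return prefix, prefix + "\uf8ff"


low, high = numeric_prefix_range(1234, 8)
print(low, high)
```

The two bounds would then be passed to startAt and endAt respectively, instead of using startAt alone.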
Users store their phone numbers in different formats, e.g. +1234567890, +1 (234) 567 890, etc.
I am trying to get a user record from the DB by phone number. It looks like I have to use the beberlei/DoctrineExtensions package for Doctrine to make a REGEXP query, but I don't understand how exactly to build the query. The code below doesn't work.
$query = $this->createQueryBuilder('user')
->where('REGEXP(user.phone, :regexp) = :phone')
->setParameter('phone', preg_replace("/[^0-9]/", "", $phone ))
->setParameter('regexp', "[0-9]");
I have done extensive work with parsing phone numbers. When you store a phone number, you store it as text. If you want to find a phone number, you use the LIKE clause. Also don't forget to enclose the phone number in quotes and % wildcards.
Example WHERE clause:
WHERE phonenumber LIKE '%1234567890%'
So clean up the number and then use the above method to search for it.
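Note that a LIKE pattern of contiguous digits only matches a stored value that also contains those digits contiguously, so a value saved as +1 (234) 567 890 would be missed. A minimal sketch of the clean-up step, assuming the same normalisation is applied both to the stored value (for example in a separate digits-only column) and to the search term, could look like this in plain Python. The function names and sample numbers are illustrative only.

```python
import re


def normalize_phone(raw):
    """Strip everything except digits, e.g. '+1 (234) 567 890' -> '1234567890'."""
    return re.sub(r"\D", "", raw)


def find_matching(stored_numbers, wanted):
    """Return the stored numbers whose digits equal the digits of wanted."""
    key = normalize_phone(wanted)
    return [number for number in stored_numbers if normalize_phone(number) == key]


stored = ["+1234567890", "+1 (234) 567 890", "+44 20 7946 0958"]
print(find_matching(stored, "1234567890"))
```

With a normalised column in the database, the lookup becomes a plain equality or LIKE comparison on that column.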
So I've got to design a table for clients with fields (Id, name, bla, bla, Phone numbers). The last field terrifies me as there is not only one number, but many. I see 3 ways to accomplish this task
1. The field is String. Anytime before an insert, the String array of phone numbers is encoded using a delimiter ';' and thereafter inserted as String.
2. The field is BLOB. The string array is directly stored (no idea if this is possible in sqlite).
3. Create another table for Phone numbers with field (ClientId, PhoneNumber).
What seems the best approach?
As it is bad practice to store multiple values in one field, the third option is the usual way to go.
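A minimal sketch of the third option, using Python's built-in sqlite3 module with an in-memory database, might look like the following. The table and column names (clients, phone_numbers, client_id, phone_number) and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(
    """
    CREATE TABLE clients (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE phone_numbers (
        client_id    INTEGER NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
        phone_number TEXT NOT NULL,
        PRIMARY KEY (client_id, phone_number)
    );
    """
)

# One client row, any number of phone rows pointing at it.
client_id = conn.execute("INSERT INTO clients (name) VALUES (?)", ("Alice",)).lastrowid
conn.executemany(
    "INSERT INTO phone_numbers (client_id, phone_number) VALUES (?, ?)",
    [(client_id, "+1234567890"), (client_id, "+1987654321")],
)

numbers = [
    row[0]
    for row in conn.execute(
        "SELECT phone_number FROM phone_numbers WHERE client_id = ? ORDER BY phone_number",
        (client_id,),
    )
]
print(numbers)
```

Adding or removing a number is then a single-row insert or delete, with no string splitting or re-encoding involved.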