Forgive me, but this is a super basic question; I couldn't really find a good explanation.
Say I am querying a collection looking for certain attributes, and I return some results that match.
Then I re-query looking for some different attributes, some of which will match the same results.
Is there a way to avoid re-downloading that data? Some of my results will already have been loaded onto the user's phone. Does that make sense? Basically, I'm looking for a way to cache data so I don't have to worry about pulling the same data twice.
I'm building my main project functionality right now, so this is a big decision for my project; I want an efficient and scalable solution. I use different APIs to fetch users' products, ultimately for one collection that displays product information in a table, with a possible merge by SKU/TITLE across the different sources.
I have thought of two approaches (in both approaches we add Meteor.userId() to each insert so each user has their own products):
1) Give each API its own collection and fetch its products into it. After (or in the middle of) the API query, where I insert into sourceXProducts, also run the merge-by-SKU logic and add only the fields I need to the main usersProducts collection. We keep the sourceXProducts collections around, so if we ever need anything we didn't include in usersProducts we can query for it there; we basically keep all the information possible (because it can come in handy).
source1Products = new Meteor.Collection('source1Products'); // raw results from API 1
source2Products = new Meteor.Collection('source2Products'); // raw results from API 2
usersProducts = new Meteor.Collection('usersProducts');     // merged per-user products
Pros: Honestly I'm not sure; it keeps things organized, and from the way I learned Meteor it seems to be a common pattern.
Cons: Meteor doesn't support collection joins in core yet, so I'd have to use a package such as meteor-publish-composite, which seems good, but that route might hurt performance.
2) Create one collection, insert everything the API response contains, and add an apiSource field so we can select user X's products from API X.
usersProducts = new Meteor.Collection('usersProducts');
Pros: No joins, possibly better performance.
Cons: Less organized, and it can become a very large collection; maybe that's not good for MongoDB.
3) Your ideas? :)
First, you should improve the question. You do not tell us anything precise about your schema: what entities do you have, what types of relations are there between them, what types of joins do you think you will be doing, and how often will you be doing them?
Second, you should rethink your schema and think in the terms of a non-relational database. I see many people coming from the SQL world who then simply design their schema in the same way. Wrong. MongoDB is not SQL, and you should not try to just reuse what you learned there. Start using features like subdocuments and arrays, which can solve many of the basic things you would do with joins in SQL. So, knowing your schema would help us help you design it. For example, see this answer, and especially the comments, for a discussion of a similar type of question to the one you are asking here.
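For example (just a sketch with made-up field names, not a schema I am prescribing): instead of joining a per-source products collection against a main one, each product document could embed its per-source data in an array of subdocuments:

{
  userId: "...",
  sku: "ABC-123",
  title: "Blue Widget",
  sources: [
    { api: "source1", price: 9.99, stock: 3 },
    { api: "source2", price: 9.49, stock: 7 }
  ]
}

Then a single find() on sku or userId returns everything, with no join needed.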
Third, have you evaluated the various solutions that already exist out there? There are many, but you have not shown us that you tried any of them or how they worked for you. What were their pros and cons, for you and your project?
Fourth, if you don't want to evaluate them all, you can just use peerlibrary:peerdb and peerlibrary:related. They are simply perfect. You should trust me: I am their author.
This is Entity Framework:

var department = _context.Departments
    // eagerly load all employees and, per employee, their contact types
    .Include(dep => dep.Employees.Select(emp => emp.ContactTypes))
    .SingleOrDefault(d => d.Id == departmentId);

Here I expect one department to be returned, containing all related employees and all contact types for each employee.
This is ServiceStack OrmLite:
I have no idea. When I look at the docs/samples: https://github.com/ServiceStack/ServiceStack.OrmLite
They write:
Right now the Expression support can satisfy most simple queries with a strong-typed API. For anything more complex (e.g. queries with table joins) you can still easily fall back to raw SQL queries as seen below.
I have seen there is a JoinSqlBuilder class, but I do not think it can return nested collections.
Maybe what I want is not possible, but maybe I can make a compromise: get all employees for the departmentId, then foreach over the employees in memory and fetch all contact types for each employeeId. Creating the hierarchy and assigning the lists would still be my job.
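Something like this sketch is what I mean (db is an open OrmLite IDbConnection; Department, Employee and ContactType are my own POCOs with DepartmentId/EmployeeId foreign keys; the collection properties would need [Ignore] so OrmLite doesn't try to persist them, and method names may differ between OrmLite versions):

var department = db.GetById<Department>(departmentId);
department.Employees = db.Select<Employee>(e => e.DepartmentId == departmentId);

foreach (var employee in department.Employees)
{
    // one query per employee: fine for small sets, an N+1 problem for large ones
    var employeeId = employee.Id;
    employee.ContactTypes = db.Select<ContactType>(c => c.EmployeeId == employeeId);
}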
But I hope there is a shorter solution.
What would also be fine is if the query, however it might look, returned an object (dynamic?) with three flat properties: Department, Employees, ContactTypes. Assigning those properties to my DTO would then be my job.
OK, please don't take this as a definitive answer; it's more just my take on the situation (I don't use ServiceStack very much), however...
When I first started using EF many years ago, I came across a similar situation where the references just would not load. Like you, I was faced with the likelihood of having to enumerate the individual collections myself and write a lot of extra code for an operation the ORM should be able to handle easily.
What I ended up doing was using AutoMapper, which basically reduced all the multiline loops I had everywhere to a single-line mapping statement.
Granted, I still had to do one mapping statement for each linked property, but it reduced the code I had to write and, more importantly, got me up and running until EF improved or I found a better way of doing things.
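Purely as an illustration of the shape of it (this uses AutoMapper's old static API, and EmployeeEntity/EmployeeDto are stand-ins for your own types, not code from my project):

// one map per linked type, declared once at startup
Mapper.CreateMap<EmployeeEntity, EmployeeDto>();
Mapper.CreateMap<ContactTypeEntity, ContactTypeDto>();

// this single line replaces a hand-rolled foreach that copied each property:
var employeeDtos = Mapper.Map<List<EmployeeDto>>(employeeEntities);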
Let me stress, I'm not proposing this as an answer, and it's a bit big for a comment; I'm simply suggesting shifting your thinking in a different direction, which may help a better solution come to the surface.
I'd like to learn about using catalogs correctly.
I have about 30 useful content types, about 50 indexes in catalog.xml, and about 45 metadata columns. Just three types account for most of the site's data, and I may need millions of those. I've been reading, and there's lots to do, but I want to get the basic configuration right before I begin all that.
This page told me that non-default indexes should not be added to the portal_catalog. I've even read people explaining how removing one or two of the default indexes makes a performance difference.
My question is: what are the rules for dividing up the indexes into different catalogs, and for selecting which catalog(s) index which type(s)?
So far I have created one additional catalog, used to hold all the indexes for my 'site-setup' objects (which I have arranged to no longer be indexed in portal_catalog). The site-setup indexes are used very often but modified more rarely than the others, so I thought it was correct to separate them from objects which are reindexed more often. I'm not sure that's the main consideration, though.
Another similar question (a good example of the kind of thing I want to solve): how would you handle something like secondary-workflow review_state variables? I give each workflow's review_state variable an index (and search on them quite often), but some of my workflows are used on only a few types. (My most prolific objects have secondary workflows...)
I'd be very grateful for advice!
Campbell
This won't cover everything, but I'll bring up some points.
Anything not in the portal_catalog won't work with collections, the folder_contents view, the getFolderContents method, search, collection portlets, related items (I think), or anything else that assumes you're using the portal_catalog.
I like to use an additional catalog when I need to be able to query the data but it only affects a subset of the content objects.
Use collective.indexing to speed up indexing operations.
Mount the catalogs on their own mount points so you can cache them differently from the rest of the site (so you can cache the whole catalog). Then you can even serve the catalogs from a dedicated zeoserver.
Also, if your content doesn't have to be cataloged by the portal_catalog (with all the constraints just listed), you may even want to think about whether you need it as a full-fledged (Archetypes|Dexterity) type in the first place. You can use the slimmer repoze.catalog to catalog arbitrary objects (which could be very simple data) for whatever your purpose is and get even more performance. Or better yet, look into Solr for indexing; it gives VERY good performance.
One more thing: depending on the type of data you're storing, you could even look into using a relational database as a data store. But I don't know what kind of queries, indexes, data, etc. you have...
30 different types seems like a lot, but I don't know what your use case is. Care to share? Perhaps there is a better way to do it.
Search is the most used feature on our website, and the search query is the most CPU-intensive, complex, and frequent query that executes on our DB, causing heavy CPU usage on the DB server. To reduce the load on the DB we have been looking at various caching strategies. For now, we intend to use the ASP.NET Cache.
The idea is to keep an in-memory DB of the most frequently/recently created/accessed objects in the cache and then query that in-memory DB using LINQ to come up with search results. My initial thought was to cache a List of the Users and then query or modify this List using LINQ. But given the complexities of multiple threads accessing or trying to modify a List, I was looking at other options.
That's when I thought that, instead of caching a List, I could cache the individual User objects with their Ids as keys and try to query the Cache. At http://msdn.microsoft.com/en-us/library/system.web.caching.cache.aspx I see that the Cache has an extension method AsQueryable, but I am not sure what that means. The Cache is a key-value store, so with AsQueryable will I be able to query the keys and get a set of User objects, or will I be able to query the User objects themselves and get my desired result?
Before you start this, you really need to have some measurability in place around it; there is no way to figure out whether your changes help or hurt without good, solid data on which to make that judgement. Performance, especially performance at scale, isn't something you can think or guess your way through. You have to know your way through it.
As for your solution, I think you might well make the problem worse, or at least create another problem here. Your database server is designed to handle arbitrary user queries across vast information sets efficiently. LINQ is awesome, but it is not really meant to be an ad-hoc search engine; it doesn't have the sort of indexing capabilities one really expects from a search engine. Just because something can be exposed as an IQueryable doesn't mean you should treat it that way. And even if you have a way to efficiently search the cache, you have another problem to get past: how do you identify what is most frequently used? And how do you keep the ASP.NET cache from ejecting things when it gets low on memory?
You would probably be better served here by:
Starting with some good old-fashioned database tuning: why are your queries so slow and expensive? Are you missing an index somewhere?
Looking at caching the results page output, especially if your search URLs are GET-able, as that is pretty easy to manage. This is a great short-term solution if the site is melting.
Looking at building the search bits properly. Using LIKE %whatever% is not a proper search; full-text indexes in your database are a good start, and something like Lucene.Net is probably better. See the sketch below.
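For example, once a full-text index exists, the query itself is straightforward. A hedged sketch: the Products table, Name column, and connectionString are made up, it assumes SQL Server full-text indexing is already set up, and it needs System and System.Data.SqlClient:

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT Id, Name FROM Products WHERE CONTAINS(Name, @term)", connection))
{
    // "widget*" is a prefix match against the full-text index,
    // far cheaper than LIKE '%widget%' scanning a large table
    command.Parameters.AddWithValue("@term", "\"widget*\"");
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine(reader.GetString(1));
    }
}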
No, it turns out I cannot use AsQueryable to query the User objects and get the desired result I was looking for. So I will be using a static List for the time being, though I know I will have to change that sooner rather than later.
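For anyone who lands here, the reason is that the Cache enumerates as plain key/value pairs (DictionaryEntry), not as your cached objects, so the closest you can get is casting and filtering those pairs yourself. A rough sketch (User and its Age property are placeholders for your own type; needs System.Collections, System.Linq, and System.Web):

// Cache implements the non-generic IEnumerable, so enumeration yields
// DictionaryEntry pairs rather than the cached User objects directly
var matchingUsers = HttpContext.Current.Cache
    .Cast<DictionaryEntry>()
    .Where(entry => entry.Value is User)
    .Select(entry => (User)entry.Value)
    .Where(user => user.Age > 30)   // placeholder predicate
    .ToList();

// note this is still a linear scan of the whole cache on every search,
// with none of the indexing a real search solution provides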
I know there are already some questions on this topic on the site...
I am just trying to understand whether it's safe to use the ASP.NET Profile Provider on a website with huge traffic.
The way I see it, it's laid out inefficiently. You store the property name (which is a string) and the property value (which is also a string). If you are just trying to store, say, age in the profile, you are unnecessarily storing the string "age" in the database over and over, whereas with a self-created table you could just add a column titled Age, with no redundancy.
(I am just trying to make sure I am not missing something about it, because I am fairly new to it.)
The profile provider uses an EAV (Entity-Attribute-Value) design deliberately, because profiles in general very commonly have a sparsely populated schema: there are many potential attributes, but only a few are used for any given single entity, and which few are used varies widely from one entity to the next.
Let's use a totally arbitrary example: say only one in ten of your users wants to provide their age. Making that a column now seems more like a waste, no?
But what if your application makes age mandatory? OK, that column gets populated for everyone. But what if you need to make a note in the profile that a user doesn't want to see some obscure dialog anymore? Do you really want a column for every single dialog in your application, recording whether each user wants to see it? Probably not. When you get into the little one-off details of an application of any significant scope, EAV actually becomes the more economical choice.
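To make that concrete, here is roughly how the same sparse data looks in each layout (the column and property names are illustrative, not the provider's actual schema):

EAV, one row per property a user actually set:

UserId | PropertyName      | PropertyValue
42     | Age               | 31
42     | HideObscureDialog | true
97     | HideObscureDialog | true

Wide table, one column per possible property, mostly NULL when sparse:

UserId | Age  | HideObscureDialog | HideExportDialog | ...
42     | 31   | true              | NULL             | ...
97     | NULL | true              | NULL             | ...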
In the general case, it scales quite well (far better than you probably think). In the specific case, it doesn't matter: as always, use what works and fix performance problems when they come up. Whatever the scalability limitations of the profile provider are, you'll know when you hit them. I guarantee two things: (1) you'll have to fix a lot of other performance problems you didn't expect before you have to fix that; and (2) if your site is getting enough traffic to break the profile provider, it's a good problem to have.
I agree with Rex M, unless you need to do things like sort all your users by age or run other procedures over aggregate profile data. In that case you could consider rolling your own. But for just storing properties that you access here and there on a user-by-user basis, Rex M is right.
I do know what you mean. Wouldn't it make sense to supplement the profile provider's table with another table that has columns for the mandatory fields? Or do you think the overhead of the join would make it not worth it?