TLB structure in Intel processors

I started from the Patterson & Hennessy book for the basic definitions and then followed the Intel programming reference documents for more information about the TLB.
From the Intel documents I learned the high-level design of the TLB, such as line size, associativity, and levels of caching. But I need a detailed explanation of how TLB caching works with respect to misses and replacement mechanisms in a modern CPU. Which pages move to the L2 TLB from the L1 TLB? How many pages can a single TLB entry address? How many entries are present in the TLB (in particular the DTLB)?
Any information or references would be of great help to me.
(If this is not the proper forum for this question, please suggest the right one)
Thank you.

The TLB can be called a translation cache, and its operation is much like that of on-chip caches: the trade-offs of exclusive vs. inclusive hierarchies, single- vs. multi-level designs, and private vs. shared structures are the same as for caches, and the same goes for associativity, page size, etc.
One TLB entry maps exactly one virtual page to one physical page, but the page size can vary: instead of 4 KB, a processor can use 2 MB or 1 GB pages, which are called superpages or huge pages, and a processor can also support several page sizes at once.
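To make the lookup concrete, here is a toy two-level TLB simulation in Python. The sizes, associativities, LRU policy, and fill-on-miss behavior are illustrative assumptions, not Intel's actual parameters; it only shows how a virtual address splits into a virtual page number plus offset, how each entry maps exactly one page, and how an L1 TLB miss falls back to the L2 TLB and then to a page walk.

    # Toy two-level, set-associative TLB (illustrative assumptions, not Intel's design).
    PAGE_SHIFT = 12          # 4 KB pages -> low 12 bits are the page offset

    class Tlb:
        def __init__(self, entries, ways):
            self.sets = entries // ways
            self.ways = ways
            # each set holds at most `ways` (vpn -> pfn) mappings, oldest first
            self.data = [dict() for _ in range(self.sets)]

        def lookup(self, vpn):
            s = self.data[vpn % self.sets]
            if vpn in s:
                pfn = s.pop(vpn)
                s[vpn] = pfn          # refresh LRU position
                return pfn
            return None

        def insert(self, vpn, pfn):
            s = self.data[vpn % self.sets]
            if len(s) >= self.ways:
                s.pop(next(iter(s)))  # evict the least recently used entry
            s[vpn] = pfn

    def translate(vaddr, l1, l2, page_table):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
        pfn = l1.lookup(vpn)
        if pfn is None:               # L1 TLB miss
            pfn = l2.lookup(vpn)
            if pfn is None:           # L2 TLB miss -> page walk
                pfn = page_table[vpn]
                l2.insert(vpn, pfn)
            l1.insert(vpn, pfn)       # fill L1 on the way back
        return (pfn << PAGE_SHIFT) | offset

    l1 = Tlb(entries=64, ways=4)      # toy sizes
    l2 = Tlb(entries=1024, ways=8)
    print(hex(translate(0x1234, l1, l2, page_table={0x1: 0x80})))   # -> 0x80234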
Since you are asking for references, see my survey paper on TLBs, which answers all of these questions and reviews 85+ papers. Specifically, section 2 of the paper references papers that discuss TLB designs in commercial processors.

Architectural choices for a CRM

I am looking for some guidance through the complexity of architectural choices before starting the development of a CMS, CRM, or ERP.
I was able to find this similar question: A CRM architecture (open source app)
But it seems rather old.
I have recently watched and read several talks and discussions about monoliths vs. distributed systems, the DDD philosophy, CQRS, event-driven design, etc.
And now I panic even more than before about the architectural choice, having taken into account the flaws of each (I think).
What I find unfortunate with all the examples of microservices and distributed systems that can easily be found on the net is that they always take e-commerce as an example (Customers, Orders, Products...). And for this kind of example there are several databases (in general, one NoSQL database per microservice).
I see the advantage (more or less): each context keeps a minimal representation of the data it needs.
But what if I want a single, relational database? I really think I need one. Having worked in a company producing a CRM (without access to the source code, but with access to the database structure), I could see the importance of the relational model: it is necessary for listings, reports, and following the links between entities within the CRM (a contact can have several companies and vice versa; each user has several actions and tasks, but each of their tasks can also be assigned to other users, or even be linked to other items such as "contact", "company", "publication", "calendarDate", etc.). And there can be a lot of records in each table (100,000+ rows), so the choice of indexes will be quite important, and transactions are omnipresent because there will be a lot of concurrent access to data records.
What I am telling myself is that if I choose a microservice system, there will be a lot of microservices to build, because there really would be a lot of different contexts, and a high probability of ending up with a bunch of different domain models. I would then have the impression of having to light every small bulb on a string of lights, with perhaps too many processes running simultaneously.
To try to be precise and not go in all directions, I have two questions:
Can we easily mix the DDD philosophy with a monolithic system, decoupling only a very small number of services (the ones that absolutely must be set apart, for various reasons)?
If so, could you point me to resources where I can learn a lot more about this?
Do we necessarily have to work with a multitude of databases, and must they necessarily be of the MongoDB/NoSQL kind?
I imagine the answer is no, but could you elaborate a little more, or point me to articles that will give clear enough answers?
Thank you in advance!
(It would be .NET Core, draft is here: https://github.com/Jin-K/simple-cms)
DDD works perfectly as an approach to designing your CRM. I used it in my last project (a web-based CRM) and it was exactly what I needed. In fact, if I had not used DDD it would have been impossible to manage. The CRM that I created (as the only architect and developer) was very complex and very custom. It integrates with many external systems (e.g. an email server and a phone-call system).
The first thing you should do is discover the main parts of your system. This is the hardest part, and you will probably get some of them wrong the first time. The good thing is that this is an iterative process, and it should stabilize before the system reaches production, because after that it is harder to refactor (you need to migrate data, and that is painful). These main parts are called Bounded Contexts (BCs) in DDD.
For each BC I created a module. I did not need microservices; a modular monolith was just perfect. I used Conway's Law to discover the BCs: I noticed that every department had common but also different needs from the CRM.
There were some generic BCs that were common to every department, like email receiving/sending, customer activity recording, task scheduling, and notifications. The behavior was almost the same for all departments.
The department-specific BCs had very different behavior for similar concepts. For example, the Sales department and the Data Processing department had different requirements for a Contract, so I created two Aggregates named Contract that shared the same ID but otherwise had different data and behavior. To keep them "synchronized" I used a Saga/Process manager. For example, when a Contract was activated (manually or after the first payment), a DataProcessingDocument was created, containing data based on the contract's content.
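As a rough sketch of that "two aggregates, one process manager" idea, here is some illustrative Python; the class and event names (SalesContract, ContractActivated, DataProcessingDocument) mirror the description above but are made up, not the actual CRM code.

    # Sketch: two aggregates share an ID; a process manager keeps them in sync.
    from dataclasses import dataclass, field

    @dataclass
    class ContractActivated:          # domain event raised by the Sales aggregate
        contract_id: str

    @dataclass
    class SalesContract:              # the Sales department's view of a Contract
        contract_id: str
        active: bool = False
        def activate(self):
            self.active = True
            return ContractActivated(self.contract_id)

    @dataclass
    class DataProcessingDocument:     # the Data Processing department's own aggregate,
        contract_id: str              # sharing the same ID but with different data/behavior
        steps: list = field(default_factory=list)

    class ContractProcessManager:
        """Reacts to domain events to keep the two bounded contexts in sync."""
        def __init__(self, documents):
            self.documents = documents            # repository of DataProcessingDocument
        def handle(self, event):
            if isinstance(event, ContractActivated):
                self.documents[event.contract_id] = DataProcessingDocument(event.contract_id)

    documents = {}
    manager = ContractProcessManager(documents)
    contract = SalesContract("C-1001")
    manager.handle(contract.activate())           # activation triggers document creation
    print(documents["C-1001"])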
Another important principle is to discover and respect the sources of truth. For example, the source of truth for received emails is the email server. The CRM should reflect this in its UI: it should be very clear that it is only a delayed reflection of what is happening on the email server; there may be received emails that are not shown in the CRM for technical reasons.
The source of truth for draft emails is the CRM, with its Email composer module. If a draft is not shown anymore, it means it has been deleted by a CRM user.
Where the CRM is not the source of truth, the code should have little or no behavior and the data should be mostly immutable. Here you could use CRUD, unless you have performance problems (e.g. millions of entries), in which case you could use CQRS.
> And there can be a lot of records in each table (100,000+ rows), so the choice of indexes will be quite important, and transactions are omnipresent because there will be a lot of concurrent access to data records.
CQRS helped me a lot in building a performant, responsive system. You do not have to use it for every module, just where you have a lot of data and/or different behavior for writes and reads. For example, for recording customer activity I used CQRS to get fast listings (so there I used CQRS purely for performance reasons).
I also used CQRS where I had a lot of different views/projections/interpretations of the same events.
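Here is a minimal sketch of that write/read split in Python; the event and model names (ActivityRecorded, ActivityCountProjection) are illustrative, not the CRM's actual code, and in a real system the "bus" would be an out-of-process message broker rather than a direct callback.

    # Sketch of CQRS: commands go to a write model that emits events;
    # projections build denormalized read models from those events.
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class ActivityRecorded:
        customer_id: str
        kind: str                       # e.g. "call", "email"

    class ActivityWriteModel:
        def __init__(self, publish):
            self.publish = publish
        def record_activity(self, customer_id, kind):
            # validate the command, then emit an event instead of updating views directly
            self.publish(ActivityRecorded(customer_id, kind))

    class ActivityCountProjection:
        """Keeps a per-customer activity count, ready for fast listings."""
        def __init__(self):
            self.counts = defaultdict(int)
        def handle(self, event):
            if isinstance(event, ActivityRecorded):
                self.counts[event.customer_id] += 1

    projection = ActivityCountProjection()
    write_model = ActivityWriteModel(publish=projection.handle)   # in-process "bus" for the sketch
    write_model.record_activity("customer-42", "call")
    write_model.record_activity("customer-42", "email")
    print(projection.counts["customer-42"])                       # -> 2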
> Do we necessarily have to work with a multitude of databases, and must they necessarily be of the MongoDB/NoSQL kind? I imagine the answer is no, but could you elaborate a little more, or point me to articles that will give clear enough answers?
Of course not. Use whatever works. I used MongoDB in 95% of cases and MySQL only for the Search module. It was easier to manage just a couple of database systems, and the performance/scalability/availability was good enough.
I hope these thoughts help you. Good luck!

Why do I have more flash (STM32F103RCT6) than the datasheet says?

I'm writing firmware for an STM32F103RCT6 microcontroller, which has 256 KB of flash according to the datasheet.
Because of a mistake of mine, I was writing some data at 0x0807F800, which according to the reference manual is the last page of a high-density device. (The reference manual makes no distinction between the different sizes of 'high-density devices' in the memory layout.)
The data that I wrote was read back with no errors, so I did some tests: I wrote and read back 512 KB of random data, compared the files, and they matched!
(Screenshot: the hashes of the two files match.)
I did some research but couldn't find similar experiences.
Is that extra flash reliable? Is this some kind of industry practice?
I would not recommend using this extra FLASH memory for anything that matters.
It is not guaranteed to be present on other chips with the same part number. If used in a product that would be a major problem. Even if a sample is successful now, the manufacturer could change the design or processes in the future and take it away.
While it might be perfectly fine on your chip, it could also be prone to corruption if there are weak memory cells.
A common practice in the semiconductor industry is to have several parts that share a common die design. After manufacturing, the dies are tested and sorted. A die might have a defect in a peripheral, so it is used as a part that doesn't have that peripheral. Alternatively, it might be perfectly good but used as a lesser part for business reasons (e.g., supply and demand).
Often the unused features are disabled by cutting traces, burning fuses, or special programming at the factory, but it is possible for extra features to be left intact if there are no negative effects and they are unlikely to be observed.
If this is only for one-off use or experimentation, and corruption is an acceptable condition, I don't really see a harm in using it.

How to store huge amounts of data in a database

I have a simple, basic question. Assume I have a large website like Facebook or Gmail. Such a site probably saves hundreds of gigabytes of information every day. My question is: how do these sites store that much information in their databases, given database capacity limits? Is there only one database? Is there only one server for the site? If there are multiple servers and databases, how do they communicate with each other?
They are clearly not using one computer...
The systems behind such large sites are very complex and distributed across data centers. See http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/
Take a look at this site for info on various architectures employed by those sites (and this site): http://highscalability.com/all-time-favorites/
Most of these sites have gone with a strategy called NoSQL: they don't use traditional relational databases, but instead have created their own persistence frameworks for their object models. This strategy works well at large scale because it drops a number of constraints that would seriously impact the performance of traditional database approaches. However, this generally comes at the cost of weaker consistency and reliability guarantees, which is usually considered acceptable for those sites' scenarios.
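To give a concrete picture of how data can be spread over many database servers, here is a small Python sketch of hash-based sharding, one common technique behind such systems; the server names and hashing scheme are made up for illustration and are not how any particular site actually does it.

    # Sketch: route each record to one of several database servers by hashing its key.
    import hashlib

    SHARDS = ["db-server-01", "db-server-02", "db-server-03", "db-server-04"]

    def shard_for(user_id: str) -> str:
        """Map a record key to one of the database servers deterministically."""
        digest = hashlib.sha1(user_id.encode()).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    def save_post(user_id: str, post: dict) -> None:
        server = shard_for(user_id)
        # In a real system this would open a connection to `server` and write there;
        # the application (or a routing layer) hides the split from the rest of the code.
        print(f"writing post for {user_id} to {server}")

    save_post("alice", {"text": "hello"})
    save_post("bob",   {"text": "world"})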
P.S. If your question is out of general interest, then no worries. If you're trying to build a highly scalable application, hold off and consider it for a moment: are you going to be serving a significant percentage of the population of the world, or are you writing a site for maybe a few thousand users? If it's the latter, you don't need Facebook-style scaling; invest your effort and resources elsewhere. If it's the former, start small and then evolve your system, bringing in investment and expertise as your user base grows.

OpenCL research/ academic papers

I'm about to start my honours project at university on OpenCL and how it can be used to improve modern game development. I know there are a couple of books out now (or coming soon) about learning OpenCL, but I was wondering if anyone knows of any good papers on OpenCL.
I've been looking but can't seem to find any. Part of my project requires a literature review and comparison, so any help with this would be appreciated.
I won't point you directly to any papers; instead, I'll give you a few hints on where to look for them.
Google Scholar, one of the best places on the web to search for papers on any subject. Searching for "opencl game development" turned up a few interesting results right on the first page; there are surely other valuable results on the following pages.
IEEE Xplore. IEEE is one of the de facto institutions for all things computing and electronics; their journals and conferences have many publications on OpenCL in particular and parallel processing in general. IEEE Xplore is their search engine; usually all articles are also indexed by Google Scholar, but they may be easier to find via IEEE Xplore.
ACM Digital Library. ACM is a large and important institution like IEEE, but with an even bigger focus on computing. You will find many papers on OpenCL there.
Google, Yahoo, Bing, etc. Sometimes, when everything else fails, normal search engines can go a long way. You may find information about ongoing projects, important game developers' blog posts, and so on. All of these can be valid references if nothing more formal is available (be sure to search really well before concluding there isn't).
You should favor articles published in scientific journals over: a) papers or extended abstracts published in conference proceedings; b) corporate articles, not peer-reviewed, usually found on the respective corporations' websites; c) articles published in general scientific magazines (e.g. Scientific American).
Sometimes you may not be given access to certain papers and will be asked to purchase them. Universities usually have subscriptions to many journals, so you may have better luck downloading the PDFs while accessing the web from inside your institution. If you have no luck, the authors sometimes put "preprint/unfinished" copies of their articles on their websites (sometimes they even put up the dubiously legal published copy). As a last resort you can always contact the authors directly; they will most likely send you the article by email (it is in their own interest).
Finally, to learn OpenCL, I found that a mixture of the reference manual, the quick reference card, and examples from the Intel, AMD, Nvidia, and IBM SDKs goes a long way. No doubt a book would help, though I can't recommend one because I haven't read any.
This probably isn't the answer you wanted, but believe me, it's the answer you need to do a good work.
Good luck!

Fractal based Image compression algorithm (and source code)

I am looking for a decent fractal-based compression algorithm for images. So far I have only found dead links to the FIF image format and dead links pointing to Iterated Systems Inc., which became MediaBin, which then became nothing as far as I can see.
The ANSI C source files (enc.c and dec.c) for PIFS (partitioned iterated function system) compression are available on the website of the book Fractal Image Compression: Theory and Application to Digital Images.
I think you should take a look at the Fiasco library. You can find an old article about it in Linux Journal.
The "nanocrunch.cpp" source code, implementing yet another variant of fractal image compression, was developed by "Boojum" and is the top answer to another Stack Overflow question: Twitter image encoding challenge.
