Sitecore Content Delivery Server Not Clearing Cache - asp.net

Our team has an issue where a publish from the content authoring server to the content delivery web database does not refresh the content delivery server's cache.
We have a content authoring server which has a master, core, and web database. We also have a content delivery server which has its own master, core, and web databases. We have two publishing options. One publishes to the content authoring web database. The other publishes to the content delivery server's web database.
My question(s):
1) How would the publish on the content authoring server know to clear the cache on the content delivery server?
In our site definition file we have defined two events named "publish:end" and "publish:end:remote". The method we've attached here is type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel", method="ClearCache"
2) What is the difference between "publish:end" and "publish:end:remote"?
We have separate site definition files for each environment. There is one for the content authoring server and one for the content delivery server. Since the publish to the content delivery server's web database occurs on the content authoring server - one would assume that it is using the events declared in the content authoring server's site definition file.
3) Can we add the 'content delivery sites' into the content authoring site definition, and then add them into the "publish:end" and "publish:end:remote" event declarations?
4) Does one need to add them to one or both?
5) What exactly does the content authoring site do when it pulls the list of sites in the "publish:end" and "publish:end:remote" configurations?

As mentioned by Ruud, the Sitecore Scaling guide will help you with a lot of your questions.
The first thing to look for is to see if you have EnableEventQueues set to true. That will make sure that your distributed nodes are looking for the events in the master database.
The second thing to look for is to make sure you have your instance names specified so that the content delivery node knows which events to look for. These settings are InstanceName and Publishing.PublishingInstance. Your authoring node will have the same value in both, but your delivery node should have the authoring node's name in the PublishingInstance setting.
Lastly, and this might just be me, it seems from your question that you have two core and two master databases? I'm not sure why your delivery node needs its own core and master databases, unless you are replicating. From my understanding, the EventQueue table in the Core database is what is used by all the instances to communicate events to each other. Make sure that in some way your delivery and authoring instances are able to 'share' the EventQueue table, either by using the same Core DB, or by replicating your authoring Core out to your delivery instance.
Between those three things and the Sitecore Scaling guide, you should be able to get your delivery instance to listen to the publishing events from authoring.
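To make the mechanism more concrete, here is a toy Python model of the event queue idea described above. It is not Sitecore code: the `EventQueue` and `Instance` classes, the `poll` method, and the instance names are all hypothetical. The sketch assumes that every instance reads a shared queue and treats an event raised by a different instance as the "remote" variant, which would also explain why instances need distinct names.

```python
from dataclasses import dataclass, field


@dataclass
class EventQueue:
    """Stand-in for a shared EventQueue table (hypothetical model)."""
    rows: list = field(default_factory=list)

    def publish(self, origin, name):
        # Each row records a sequence stamp and which instance raised the event.
        self.rows.append((len(self.rows) + 1, origin, name))


@dataclass
class Instance:
    name: str
    queue: EventQueue
    last_seen: int = 0
    html_cache: dict = field(default_factory=dict)

    def poll(self):
        """Handle events raised by other instances since the last poll."""
        handled = []
        for stamp, origin, event in self.queue.rows:
            if stamp <= self.last_seen:
                continue
            self.last_seen = stamp
            if origin == self.name:
                continue  # local event, already handled where it was raised
            if event == "publish:end":
                # Assumed remote counterpart of publish:end: clear the HTML cache.
                self.html_cache.clear()
                handled.append("publish:end:remote")
        return handled


queue = EventQueue()
authoring = Instance("authoring", queue)
delivery = Instance("delivery", queue, html_cache={"/home": "<html>stale</html>"})

queue.publish(origin="authoring", name="publish:end")
remote_events = delivery.poll()   # delivery sees the event as remote
local_events = authoring.poll()   # authoring skips its own event
```

In this model, if both instances were given the same name, the delivery instance would skip the event and its cache would never be cleared.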

When searching for solutions to this issue none of the standard advice worked for me.
Only when I followed the steps from this page did my cache start clearing correctly again: http://sitecoreskills.blogspot.co.uk/2015/08/sitecore-html-cache-doesnt-clear.html
The solution turned out to be completely unrelated to the HTML cache or the event queues.
Sitecore maintains a legacy Lucene index called __System. In our case that index was locked so it couldn't be updated. That somehow prevented the clearance of the HTML cache. The answer was to simply delete it!
It turns out that the __System index can be removed without any problems - Sitecore just recreates it afterwards.

After hours of troubleshooting, we tried changing the InstanceName in the ScalabilitySettings.config file, and the cache started to clear after a publish.
I believe you can change the InstanceName to any value and it will start working.

Related

How do I download an aspnetForm page with links

I'm trying to download a municipal planning plan together with all the relevant documents.
All documents can be reached from the following link
I've tried the following command (that worked well for other sites) and some variations without success.
wget -E -k -r -l 3 "http://www.mavat.moin.gov.il/MavatPS/Forms/SV4.aspx?tid=4&et=1&mp_id=ppnCWTcsST9gG0%2fa0ayWnjFyZ%2bo14s221Ujlpi7UvR4jIRAHLKhJ8lOLSkomZ%2fvlHk8b2T0oENpI6Wh2hKzxQJCw9BPJP8gav%2ftgiKlk5S0%3d"
I can't get the files for the same plan on their new site either:
https://mavat.iplan.gov.il/SV4/1/5000931297/310
I'd appreciate any help.
These days, and especially with .NET web sites, we don't use hyperlinks with a simple (full) path name to actual files on the web server. In fact, in most cases the web server is not even given rights to those folders (they are not exposed to Internet Services).
So no actual links in the form of a full URL to the documents exist.
What happens when you click on a button or button link is that the code behind on the web server runs (and that is code you don't have). Furthermore, that code behind can browse, read, and retrieve any file from any folder on the server or on other servers. But links from the web site don't exist, and it is not even possible to type in a URL that resolves to an actual file name on the server.
So the server-side code (not Internet Services) goes and grabs the document. In fact, the documents could be in a database, in which case the code behind pulls the binary data (which represents a valid PDF file) from the database. Or the code behind reads the file from disk and then STREAMS the file for a download.
This is often done for reasons of security: it means that no valid URL exists to get at a document.
It is not only done for security. From a developer's point of view, it is often better to retrieve rows from a database. Those rows hold the information you SEE rendered on that form. The web page is not static: the developer codes a pull of rows from a database and then simply "assigns" that data to some type of control, say a datagrid or listview. This assignment takes only one or two lines of code, and then the control and the web server render it.
Because the developer only assigns the result of a database query to the control, which then renders on the form, adding or removing documents only requires editing the database that drives the web page.
As a result, there are no direct links to the actual documents on the server. To retrieve a document, you would have to send the web site the exact command required.
You can hit F12 (most browsers support this) to put your browser into developer mode. If you do this, choose the select-element feature, and click on a PDF link, you get this:
<img src="../images/ft/file_PDF.gif" style="cursor:pointer"
onclick="openDoc('99000526871729',
'AABA7BE646E182B67DB1C15220E531DF36BBB591D8EEA7757435B2606C08E6F9')">
Note the openDoc call in the onclick handler above. That is the routine that has to run, and it in turn triggers the server-side code, to retrieve a document. There is thus NO link, and you are not going to be able to wire up or run your OWN web page that hits that server and runs the "onclick" routine.
However, the onclick DOES expose the internal database document numbers used to pull, read, and retrieve a given document. But you have no idea what the path name is or how the code gets the file; that requires running the server-side code (C# or VB.NET). As noted, that code grabs the file and then "streams" it when you click on a link to download.
For simple HTML-like pages, the kind you could write after a one-day HTML course, such web sites will have src= set to some path name that is a valid URL. These simple systems thus allow you to enter a URL to get a document; the documents are fully exposed to the web site, and a simple valid URL path name to each file exists. Not so with ASP.NET. As noted, this is not only done for security; it is also a better overall developer experience to write code that grabs the files as opposed to rendering full path link names to files.
There are many additional benefits. For example, the database that drives this likely has one or more settings that contain the path names to the documents. If they run out of storage, or want to move older files to a much slower, and of course much cheaper, storage system, they can move the files and update the path name columns in the database. The web site will continue to work, since it is NEVER using an exposed URL. And as noted, actual direct URLs don't exist, and the web server (IIS), as opposed to the code behind, will not even have rights to the files.
As a result, you will not be able to simply pull the web page and THEN extract the URLs to file names.
What you might be able to do is write code that loads the web page, scans all the event code stubs for the links, and then clicks on each button with web browser automation. But even that doesn't allow you to enter file names into the download prompts.
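As a rough illustration of the "scan the event code stubs" step, the Python sketch below pulls the two arguments out of every openDoc(...) call in a saved copy of the page. It does not download anything. Calling the two values a document id and a token is a guess, and what the server does with them is unknown.

```python
import re

# Matches openDoc('<first>', '<second>'), allowing whitespace or line breaks
# between the arguments, as in the snippet from the page.
OPEN_DOC = re.compile(r"openDoc\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def extract_open_doc_calls(html):
    """Return a list of (first_argument, second_argument) pairs."""
    return OPEN_DOC.findall(html)


sample = """<img src="../images/ft/file_PDF.gif" style="cursor:pointer"
onclick="openDoc('99000526871729',
'AABA7BE646E182B67DB1C15220E531DF36BBB591D8EEA7757435B2606C08E6F9')">"""

calls = extract_open_doc_calls(sample)
for doc_id, token in calls:
    print(doc_id, token)
```

The resulting pairs could then drive a browser automation tool that clicks each icon in turn.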
So what you ask is not easy, and likely not possible. The simple reason is that the site does not use simple HTML and static links to files, and it never actually exposes a direct link to files. Worse yet, the web server does not have or even allow a direct URL link to a file: such URLs don't exist, and the web site does not even have rights to the files (only the .NET code behind does, not Internet Services).
The code behind grabs the document and then "streams" the file to the browser when you click a link. The simple HTML coders of the past would create, say, a folder (usually a virtual folder) that points to the files on some server or folder. But with .NET, streaming is easier (and far more secure).
Modern development tools don't use old-fashioned ideas like URLs that directly retrieve a file; they are designed differently.
In some cases URLs are allowed or created, for reasons of sharing links. If you have a cute video or document, the designers of the system will often permit the use of parameters in the URL so you can share a link with someone else. This page has no such provisions. You can share a link to the page, but no actual URLs to documents, or even provisions to allow them, exist.
So this pretty much means that to retrieve a document, you have to go to that web page, and ONLY when you click on a document will the web site "stream" down that one particular document.

Sitecore auxiliary content database

Not sure if this is a typical Stack Overflow question (I'll remove it if suggested), but it may still help me understand the possible options here.
I would like to know if it's somehow possible to add a new content database beside the core, master and web databases (let's say for forms filled with data by web users, with a CRUD repository using the existing Sitecore API). It should be editable or read-only from the CMS, and visible for exports, reports or charts via the CMS using custom modules.
This DB should sit at the same level as the web database, and it's important that it follows the templates and functionality of the existing Sitecore databases.
The whole thing will be used as a Sitecore custom module (installation, integration, customization, management, and so on). Important: items stored in this database are pure data items.
I found only vague information on John West's Sitecore blog, so more than a direct solution, I am asking for references or examples of how to do this, or signals that it is against policy.
Best reference until now: http://intothecore.cassidy.dk/2009/05/working-with-multiple-content-databases.html article written by Mark Cassidy.
The reason you don't find much information on this is that it's very uncommon to add another database which is accessible to Sitecore, as per the John West blog. Note the date of that post, too. I'm not aware of your requirements, but I have never seen it done or found a need for it.
With user input data such as forms and comments, you have three data considerations: storage, access and reporting. In a scenario where you would like to store this data and access it in Sitecore, I would approach it as follows:
Storage of that data should be in the master database inside a bucket. Buckets were introduced in version 7.0 so that you can add virtually unlimited data to a Sitecore database. There was a buckets module which supported 6.3+, but it appears to no longer be downloadable: https://marketplace.sitecore.net/en/modules/sitecore_item_buckets.aspx. The code is out there though, and possibly Sitecore support would even provide it.
In a standard production environment (split content management and content delivery), if the master database is not accessible directly via a connection string, it can be made accessible by calling the Sitecore Web API or by creating a custom web service.
Requirements such as reporting and/or shared access to the data for other applications could possibly provide reasons to create a custom database but otherwise there is no reason not to store it in the master database.
You have to save the information filled in by the user in the master database so that you can modify or use it through the Sitecore API.
The users filling in the form may not have access to modify the Sitecore master DB, so you would have to either switch to a user with the least permissions required to make those changes (safer) or disable security for a while and perform those tasks (not recommended). Both of these are explained in http://www.nehemiahj.com/2012/03/how-to-use-securitydisabler-and_15.html
Then add the form as an item in the master DB. If the number of form items created this way is large, use Sitecore Buckets.

Drupal: How to share content between two Drupal sites?

I have two drupal 7 sites and I want them to share content of a certain content type. I want to have this content stored in an external database. How could I make this happen with a custom module?
You can set up a REST server.
Then you can use views to share the information you want.
Module: https://drupal.org/project/services
Maybe have a closer look at these modules:
Drupal Sync
Drupal Deploy
I haven't tried either of these so far, but I think they are what you are looking for.
Content (i.e. nodes) on a Drupal site is not stored in a single table. The same tables are used to store content of different content types. Some of the tables are created dynamically when you add fields to a content type, or when you change their settings. So you cannot share content between two sites by simply sharing the table(s) used to store it.
As a rule of thumb, you cannot achieve anything complex in Drupal by simply doing stuff at the database level. There is too much storage logic implemented in (PHP) code that cannot be ignored when accessing the DB. You should always base your solution on Drupal's API (and most of the time not the DB layer API, but the high-level APIs such as the Node and Field APIs).
That said, there is no core API to communicate between sites. I would use one of the sites as the canonical source of the shared content and the only site where it can be edited. Then somehow replicate this content on the second site. This can be done with the Services or RESTful Web Services module on the second site, and a custom module on the first site, used to push new and updated content to the second through a REST service.
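The push step could be sketched as follows. This is Python rather than the PHP a real Drupal module would use, and it only shows the selection logic: which nodes of the shared content type have changed since the last sync, and what request bodies would be sent. The `nid`, `type`, `title` and `changed` keys mirror Drupal 7 node properties; the function names, the sample data and the idea of tracking a `last_sync` timestamp are assumptions for illustration.

```python
import json


def nodes_to_push(nodes, content_type, last_sync):
    """Select nodes of the shared type that changed after the last sync."""
    return [
        n for n in nodes
        if n["type"] == content_type and n["changed"] > last_sync
    ]


def build_payloads(nodes):
    """Serialize each node as the JSON body of a hypothetical REST request."""
    return [json.dumps(n, sort_keys=True) for n in nodes]


# Sample data standing in for nodes on the canonical site.
nodes = [
    {"nid": 1, "type": "article", "title": "Old news", "changed": 100},
    {"nid": 2, "type": "article", "title": "Fresh news", "changed": 250},
    {"nid": 3, "type": "page", "title": "About", "changed": 300},
]

pending = nodes_to_push(nodes, content_type="article", last_sync=200)
payloads = build_payloads(pending)
```

Each payload would then be sent to whatever endpoint the Services or RESTful Web Services module exposes on the second site.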

How to index a web site

I'm asking on behalf of somebody, so I don't have too many details.
What options are available for indexing site content in an ASP.NET web site? I suspect SQL Server's Full Text index may be used if the page content is stored in the database. How would I index dynamic and static content if that content isn't stored in the DB, but in html and aspx pages themselves?
We purchased Karamasoft Ultimate Search several years ago. It is a search engine add-on for your web site. I like it because it is a simple tool that taught us about searching on our site. It is pretty inexpensive, and we knew we could buy something else later if we needed more or different features. We needed something that would give us searching without having to do a lot of programming.
Specifically, this tool is a web crawler. It runs on your web server, acts like an end user, and navigates through your site keeping a record of your web pages, so when a real user searches, they are told which pages have the content they want.
Keep in mind that it acts like an end user, so your dynamic data is indexed right along with the static stuff, because it indexes the final web page. We needed this feature and it is what appealed to us the most.
You can use a web crawler to crawl that site and add the content to a database which then is full text indexed. There are a number of web crawlers out there.
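A minimal sketch of the indexing half of that idea, in Python for brevity, might look like this. It assumes the crawler has already fetched the rendered HTML of each page (the `pages` dictionary is made-up sample data) and uses a small in-memory inverted index in place of a database full-text index.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; script and style contents are not filtered out."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in re.findall(r"[a-z0-9]+", page_text(html).lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Rendered output of a dynamic and a static page, as a crawler would see them.
pages = {
    "/default.aspx": "<html><body><h1>Welcome</h1><p>Product catalog</p></body></html>",
    "/about.html": "<html><body><p>About our product team</p></body></html>",
}
index = build_index(pages)
```

Because the index is built from the rendered output, it makes no difference whether a page came from an .aspx file or a static .html file.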
Lucene is a well known open source tool that would help you here. The main branch is Java based but there is a .Net port too.
Main site: http://lucene.apache.org/
.Net port: http://incubator.apache.org/lucene.net/
Having used several alternatives I would be loath to do anything other than Google Site Search.
The only reason I use SQL Full Text Search is to search through multiple columns. It's really hard to implement it in any effective manner.

What is the .MSPX file extension?

I've noticed a lot of Microsoft sites have the *.MSPX extension. While I'm very familiar with ASP.NET, I've not seen this extension before.
Does anyone know what this identifies?
A few internet searches led me to http://www.microsoft.com/backstage/bkst_column_46.mspx, but it was a dead link. Fortunately, it was archived on the Wayback Machine and you can read it here:
http://web.archive.org/web/20040803120105/http://www.microsoft.com/backstage/bkst_column_46.mspx
The .MSPX extension is part of the "Microsoft Network Project," which, according to the article above, is designed to give Microsoft's sites a consistent look-and-feel worldwide, as well as keep the design of the site separate from the content. Here's the gist of the article:
The presentation framework includes a custom Web handler built in ASP.NET. Pages that use the presentation framework have the .mspx filename extension, which is registered in Microsoft Internet Information Services (IIS) on the Web servers. When one of the Microsoft.com Web servers receives a request for an .mspx page, this custom Web handler intercepts that call and passes it to the framework for processing.
The framework first checks to see whether the result is cached. If it is, the page is rendered immediately. If the page is not cached, the handler looks up the URL for that page in the table of contents provided by the site owner (see below) to determine where the XML content for the page is stored. The framework then checks to see if the XML is cached, and either returns the cached content or retrieves the XML from the data store identified in the table of contents file.
Within the file that holds the content for the page, XML tags identify the content template to be used. The framework retrieves the appropriate template and uses a series of XSLTs to assemble the page, including the masthead, the footer, and the primary navigational column, finally rendering the content within the content pane.
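The lookup order in that description can be sketched as a toy Python function. Nothing here is Microsoft's code: the dictionaries stand in for the caches, the table of contents, and the data store, a format string stands in for the XSLT steps, and storing the rendered page back into the page cache is an assumption the article only implies.

```python
def render_page(url, page_cache, xml_cache, table_of_contents, data_store, templates):
    """Follow the lookup order described in the article (toy model)."""
    # 1. Serve the rendered page straight from the cache when possible.
    if url in page_cache:
        return page_cache[url]

    # 2. The table of contents says where the XML content for the URL lives.
    location = table_of_contents[url]

    # 3. Use cached XML if present, otherwise read it from the data store.
    if location not in xml_cache:
        xml_cache[location] = data_store[location]
    content = xml_cache[location]

    # 4. The content names its template; formatting stands in for the XSLTs.
    template = templates[content["template"]]
    html = template.format(body=content["body"])
    page_cache[url] = html  # assumed: the result is cached for later requests
    return html


page_cache, xml_cache = {}, {}
table_of_contents = {"/backstage/column.mspx": "store/column.xml"}
data_store = {"store/column.xml": {"template": "article", "body": "Hello"}}
templates = {"article": "<header/><main>{body}</main><footer/>"}

first = render_page("/backstage/column.mspx", page_cache, xml_cache,
                    table_of_contents, data_store, templates)
second = render_page("/backstage/column.mspx", page_cache, xml_cache,
                     table_of_contents, data_store, templates)
```

The second call returns at step 1 without touching the table of contents or the data store.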
I think it's an XML based template system that outputs HTML. I think it's internal to MS only.
Well, a little googling found this:
"The presentation framework includes a custom Web handler built in ASP.NET. Pages that use the presentation framework have the .mspx filename extension, which is registered in Microsoft Internet Information Services (IIS) on the Web servers. When one of the Microsoft.com Web servers receives a request for an .mspx page, this custom Web handler intercepts that call and passes it to the framework for processing."
I'd like to find out more info though.
I love you guys. I have also asked myself many times why MS uses .mspx and what it is at all. :)
At the time I couldn't find any information quickly and assumed it was just something on top of ASP.NET, or maybe not even that, because you could just as easily map the same ASP.NET handler DLL to .mspx too. ;)
But, surely, it could be anything, even a "special" CGI of its own (completely separate from ASP.NET) that processes the request with much better caching, easier editing, and so on.
The end of the story was that I came to the view that maybe it's not important to know what exactly .mspx is. :)
