How to index a web site - asp.net

I'm asking on behalf of somebody, so I don't have too many details.
What options are available for indexing site content in an ASP.NET web site? I suspect SQL Server's Full Text index may be used if the page content is stored in the database. How would I index dynamic and static content if that content isn't stored in the DB, but in html and aspx pages themselves?

We purchased Karamasoft Ultimate Search several years ago. It is a search engine add-on for your web site. I like it because it is a simple tool that taught us about searching on our own site. It is fairly inexpensive, and we knew we could buy something else later if we needed more or different features. We needed something that would give us search without a lot of programming.
Specifically, this tool is a web crawler. It runs on your web server, acts like an end user, and navigates through your site keeping a record of your web pages, so when a real user searches, they are shown the pages that have the content they want.
Keep in mind that it acts like an end user, so your dynamic data is indexed right along with the static content, because it indexes the final rendered web page. We needed this feature, and it is what appealed to us the most.

You can use a web crawler to crawl that site and add the content to a database which then is full text indexed. There are a number of web crawlers out there.
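As a rough illustration of what happens to crawled pages, here is a minimal JavaScript sketch of the indexing and lookup step. It assumes the pages have already been downloaded as HTML strings; the function names (extractText, buildIndex, search) and the sample URLs are made up for this example, and the regex-based tag stripping is a shortcut, not a real HTML parser. A database full-text index does the same job with far more sophistication.

```javascript
// Strip scripts, styles and tags to get the visible text of a rendered page.
// Regex stripping is a shortcut for illustration; a real crawler would use an HTML parser.
function extractText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Build an inverted index: word -> Set of page URLs containing that word.
function buildIndex(pages) {
  const index = new Map();
  for (const [url, html] of Object.entries(pages)) {
    const words = extractText(html).toLowerCase().match(/[a-z0-9]+/g) || [];
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(url);
    }
  }
  return index;
}

// Return the URLs of pages that contain every word of the query.
function search(index, query) {
  const words = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  if (words.length === 0) return [];
  let result = null;
  for (const word of words) {
    const urls = index.get(word) || new Set();
    result = result === null
      ? new Set(urls)
      : new Set([...result].filter(u => urls.has(u)));
  }
  return [...result].sort();
}

// Hypothetical crawl output: URL -> final HTML as an end user would receive it.
const crawledPages = {
  '/courses.aspx': '<html><body><h1>Courses</h1><p>Evening classes in welding</p></body></html>',
  '/about.html': '<html><body><script>var x = "welding";</script><p>About our college</p></body></html>',
};
const siteIndex = buildIndex(crawledPages);
console.log(search(siteIndex, 'welding classes'));
```

Because the index is built from the final HTML, it does not matter whether a page was static or generated by an .aspx page.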

Lucene is a well known open source tool that would help you here. The main branch is Java based but there is a .Net port too.
Main site: http://lucene.apache.org/
.Net port: http://incubator.apache.org/lucene.net/

Having used several alternatives I would be loath to do anything other than Google Site Search.
The only reason I use SQL Full Text Search is to search through multiple columns. It's really hard to implement it in any effective manner.

Related

Extend legacy site with another server-side programming platform best practice

The company I work for has a site developed 6-8 years ago by a team that was enthusiastic enough to use their own private PHP-based CMS. I have to put dynamic data from an intranet company database on this site within one week: 2-3 pages. I contacted the company's site administrator and she showed me the administrative part: the CMS only allows inserting HTML blocks and managing the site map (the site is deployed on a machine that is inside the company and is fully accessible and upgradeable).
I'm not a PHP guy and I don't want to dive into a legacy CMS engine that hardly anyone has ever heard of.
I also don't want to contact the development team, because I'm not sure they are still around and capable of extending this old site, and it would take too much time anyway.
I am about to deploy a helper ASP.NET site on IIS with the 2-3 required pages and reference it via an iframe from the existing site. The new pages will also allow downloading some dynamic content from the existing site.
Is this OK, and what are the pitfalls of the iframe approach?
This is the second "I'm stuck with a legacy CMS, and fixing it would be too hard" question I've seen here in the last day. I really don't see what the problem is -- I've done this in less than a day:
Pick any modern CMS and see what tools it provides for importing pages. Spend a little time learning how it stores pages. (I chose Wordpress).
Back up the CMS database.
Run a web-spider through the old system and dump all of the pages to disk as plain HTML.
For each page that you saved:
Run HTML Tidy on each HTML page to make it more uniform.
Run it through sed or perl or write a custom program (say, python with BeautifulSoup) to separate out the page content from the (no longer needed) navigational cruft.
Insert the content into a new CMS-managed page (ideally by inserting a new row in the CMS database).
Review the site and manually clean up anything that wasn't caught in the conversion.
A little bit of shell scripting can automate most of this -- just keep refining your scripts until you get most of it 'right'. If you back up the CMS database before you run your script, you can reset the site to 'empty' for each import.
(In my case, the site in question had been in use for ~10 years, with a succession of webmasters, each of whom used different tools and techniques for managing content, and it had been hacked a couple of times by spamvertisers.)
Admittedly, this isn't a science, and it may require you to learn some new tools. Go for it -- learning new stuff is good for you, and you won't have to keep that old server running for the next 10 years, just so you can wrap its content in an iframe.

I want to make an ASP.NET website in Blogger. Is it possible? What are the alternatives?

I have been thinking of creating a website with asp.net.
Is it possible to develop ASP.NET sites in Blogger? What I have seen is that you write something in plain text or HTML and it gets posted, but I want to write ASP.NET code using SQL Server in Blogger.
thx
There are a number of blog engines for asp.net. http://www.dotnetblogengine.net/ is one that I've heard is easy to use.
You can't write an asp.net application in blogger because blogger is a free service offered by Google, hosted on their own servers.
I'd assume you're referring to the way of inputting text. If that's the case, check out Textile.NET. This will simplify textual input and allow you to store values in the database which don't include any html.

Put ASP.NET on wordpress site

I work for a college and our main website has an ASP.NET based course information search which I created. This has become popular and our company facing website (training for companies) has asked for the same system on their website. I'm not involved in the day to day of either website but know theirs was made using Wordpress. Is it going to be possible for me to embed some ASP.NET code within some of the pages? Any articles on doing this?
EDIT:
The ASP.NET code that would appear in the actual markup is minimal; it's mainly a few asp:Literals. I did this on purpose to hide most of it from the website developer, to save myself hassle when something gets deleted by accident.
EDIT2: There was a suggestion to do it as a web service. Would this be possible, i.e. as a search box on the main page displaying the results underneath?
Since asking this question a long time ago and creating a less-than-ideal iframe solution, I have now found a great WordPress plugin called iframe-less:
http://wordpress.org/extend/plugins/iframe-less-plugin/
Basically you give it a URL and it builds the content of that page directly into your WordPress page. So far it seems to work really well.
I have needs similar to those of the original poster. I maintain a CRM and corporate site that runs on ASP.NET/SQL along with a separate WordPress (PHP) company blog. After using WordPress for a year, people here would love to be able to edit static content on our corporate site like we do in WordPress, so I am looking at possible ASP.NET/WordPress hybrid setups.
I am hearing good things about "Phalanger": http://www.php-compiler.net
It is a PHP Language Compiler for the .NET Framework, and you can run PHP code in .NET
It was also great to find out in this thread that you can have PHP and ASP.NET in the same IIS web site; it's another reasonable-sounding solution. If I had any reputation (I am new here) I'd give RickNZ a vote.
What you could do is create a web service on your ASP.NET application and then write a Wordpress plugin, that would read that service and display it in wordpress page.
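A browser-side variant of this idea (swapping the WordPress plugin for a small script on the page) might look like the sketch below. The endpoint URL /api/courses, the q query parameter, and the JSON shape (title, url) are all assumptions for illustration, not part of any existing service; the ASP.NET service would also need to be on the same origin as the WordPress page or send CORS headers.

```javascript
// Escape text before inserting it into HTML so results cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn the service's JSON results into an HTML list.
function renderResults(results) {
  if (results.length === 0) return '<p>No courses found.</p>';
  const items = results.map(r =>
    '<li><a href="' + escapeHtml(r.url) + '">' + escapeHtml(r.title) + '</a></li>');
  return '<ul>' + items.join('') + '</ul>';
}

// Query the (hypothetical) ASP.NET search service and show the results
// in the element placed underneath the search box.
async function searchCourses(query, targetElement) {
  const response = await fetch('/api/courses?q=' + encodeURIComponent(query));
  if (!response.ok) throw new Error('Search service returned ' + response.status);
  targetElement.innerHTML = renderResults(await response.json());
}
```

A server-side plugin, as suggested above, would make the same request from PHP and avoids the same-origin concern entirely.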
This wasn't ideal, but the solution I produced uses iframes, which are still in the HTML 5 spec (in fact they have some new attributes), so I think I am OK. Basically I make a page in WordPress with an iframe and some JavaScript on its onload to make the iframe resize automatically based on the content size, using the code below (the iframe has the id "frame" and a width of 100 percent).
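function autoIframe() {
    try {
        // Reading contentWindow.document only works when the iframe is on the same domain.
        var frame = document.getElementById('frame');
        var pageHeight = frame.contentWindow.document.body.scrollHeight;
        // Add 60px of extra room below the content.
        frame.height = pageHeight + 60;
    } catch (err) {
        // Report the failure without breaking the outer page.
        window.status = err.message;
    }
}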
This code resizes the iframe when the first content loads; if the content changes, it needs to be called again in some way. My solution was to call the function from the inner page using parent.autoIframe() each time a search was done.
p.s. The JavaScript will only work if the iframe and outer page are from the same domain (browsers block scripting across frames from different domains).
WordPress uses PHP and MySQL. I have successfully installed and run it under Windows 2008 with IIS 7. The new CGI stuff in IIS 7 results in pretty good performance, too.
You can of course run a separate but related ASP.NET-based site on the same server.
You can also run a mixed ASP.NET + PHP site. IIS directs incoming requests to a particular HttpHandler based on the extension of the URL, so there's no reason why you can't mix *.php & *.aspx.
In fact, you can also do things like write a .NET-based HttpModule that integrates with a PHP/IIS site, to do things like logging, centralized cookie management, HTTP header "adjusting", etc.
If you want to put ASP.NET controls in a *.php file, that's a different thing entirely. To do that, you would need to write an HttpHandler that understood how to parse such a file. Either that, or just use iframes....
Short answer: no, not easily. Wordpress is PHP - you can't just put some .net code on a PHP page.
Long answer: yes, if... if you are really keen to do this, and it's worth the time and effort, you can work around it by using some of the strategies suggested already, e.g. host the ASP.NET bit on a Windows server (or use Mono) and show it inside an iframe on the WordPress page.
Just bear in mind that this is not a common setup, and may be more difficult than simply creating or using some kind of WordPress plugin.
I am exploring http://sourceforge.net/projects/wordpressnet/ if it helps anyone ...
Also,
http://wpdotnet.com/ (related article : http://www.php-compiler.net/blog/2011/wordpress-on-net-4-0)
http://wordpress.org/support/topic/installing-to-a-net-server
I know it is an old post and I too do not like necroposting, but these resources may improve the existing content.
WordPress is a LAMP (Linux, Apache, MySQL, PHP) application and normally runs on Linux servers. I don't think you can integrate ASP.NET into WordPress. But of course you can provide a link to an ASP.NET application from WordPress.
No, this won't work. You cannot use ASP.NET on pages that are served by WordPress. You can use ASP.NET in the same web site as Wordpress, for example by having certain directories or certain pages serve ASP.NET content, while the rest of the site still serves WordPress content.
However, if the ASP.NET code you wish to use is very simple, why not do it in PHP instead? WordPress uses PHP, which is very similar to ASP.NET.
I am able to use both ASP.NET and WordPress on my host (Dinamo.net.tr) without using any plugin or iframe. They really can work together: you just upload your ASP.NET C# files and install WordPress alongside them.

What are some ways to support multiple websites with a single code base?

I'm writing a pretty straightforward ASP.NET MVC web app: only a couple of CRUD pages, some folders where clients can browse documents and just 3 or 4 roles. The website will be used in a B2B scenario, where every client will have their "own" website.
At this point, the only thing that will change in the website from client to client is the content (i.e. the documents and the rows of data they'll see). If this is the case, what's the best way to manage roles across all of my clients? I'm looking for the simplest possible solution because this is a proof of concept and I don't want to invest a lot of time right now.
What if it's not just the content that changes? Maybe some clients will want a few custom static pages. At this point, is my only option replicating the entire website? I'm leery of this because it'll become hard to maintain if I get a lot of clients.
I'd appreciate any help... I just don't want to shoot myself in the foot; I'm sure someone has done this before.
I create Virtual Directories in IIS for each client, all pointed back to the same folder where my ASP.NET code resides.
This allows me to support several dozen nearly-identical "web sites," each with their own database that is basically identical in form, only differs in data.
So, my site URLs look like:
http://mysite.com/clientacme/
http://mysite.com/clientbill/
http://mysite.com/clientcharlie/
There are two key implementation details I worked out for this:
I use the Virtual Directory folder name to determine which DSN my code reads from. This is accomplished by creating a simple static method that injects the folder name into a DSN string template. If you want to use the same database to store everyone's data, you can use the folder name as a default filter in your queries.
I store the settings for each web site (headers and footers, options, links to custom reports, etc.) in a simple "settings" table in each database (key, value) rather than in the web.config (which is shared). This allows me to extend the code base over time to customize the experience for each client without forking the code.
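The folder-name lookup from the first detail can be sketched as follows. This is JavaScript rather than the ASP.NET code the answer actually uses, and the connection string template, the server name, and the {client} placeholder are invented for the example.

```javascript
// Hypothetical template; "{client}" is replaced with the virtual directory name.
const DSN_TEMPLATE = 'Server=dbserver;Database=app_{client};Integrated Security=true;';

// Return the first path segment of a URL, i.e. the virtual directory name.
function clientFromUrl(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  if (segments.length === 0) throw new Error('No client folder in URL: ' + url);
  const client = segments[0].toLowerCase();
  // Only allow simple names so a crafted URL cannot alter the connection string.
  if (!/^[a-z0-9_]+$/.test(client)) throw new Error('Invalid client folder: ' + client);
  return client;
}

// Inject the folder name into the connection string template.
function dsnForUrl(url) {
  return DSN_TEMPLATE.replace('{client}', clientFromUrl(url));
}

console.log(dsnForUrl('http://mysite.com/clientacme/reports/list'));
```

Validating the folder name matters because it comes straight from the request URL; the same value could also serve as the default filter if all clients share one database.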
For user authentication, I use Basic authentication, and I keep usernames, passwords, and roles in a table in each database.
The important thing is that if you use different SQL Server databases for each client's content, you need to script any changes to your database tables, indexes, etc. and apply them across all databases at the same time (after testing of course). One simple way to do this is to maintain an Excel sheet with a table of database names and a big "SQL" cell at the top. Beside each database name, create a formula to "USE databasename;" and then concat the SQL code at the top.
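The same spreadsheet trick can be done in a few lines of script. The sketch below prefixes one block of SQL with a USE statement for each database; the database names and the ALTER TABLE statement are placeholders, and the function name is made up for this example.

```javascript
// Build one script that applies the same SQL to every client database.
function buildMigrationScript(databaseNames, sql) {
  return databaseNames
    // Bracket-quote each name (doubling any "]") so unusual names stay valid.
    // GO is the batch separator understood by sqlcmd and Management Studio.
    .map(name => 'USE [' + name.replace(/\]/g, ']]') + '];\n' + sql.trim() + '\nGO')
    .join('\n');
}

const migrationScript = buildMigrationScript(
  ['clientacme', 'clientbill'],
  'ALTER TABLE settings ADD updated_at DATETIME NULL;'
);
console.log(migrationScript);
```

Keeping the list of database names in one place means a new client only has to be added once before the next schema change goes out.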
I'm not sure if this answers your question completely, but as far as maintaining custom "static" pages goes, I found myself implementing a system on a client's MVC website where the client can create "Pages" from their admin control panel, and each Page has a collection of "PageContent" entities which consist of a Title and an HTML content field (populated using a WYSIWYG editor). Upon creating a page, the MVC application maps http://yoursite.com/Page/Page-Url-Specified-By-The-User to that page and renders its content there. Obviously, the pages are dynamic, but as far as the client can tell they have created a brand new custom page with little or no effort.
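The URL-to-page mapping can be pictured with a small sketch. It is JavaScript rather than MVC routing code, the in-memory Map stands in for the database, and the names (storedPages, renderPage) and sample data are hypothetical.

```javascript
// In-memory stand-in for the table of client-created pages (hypothetical data).
const storedPages = new Map([
  ['about-us', { contents: [{ title: 'About us', html: '<p>We sell widgets.</p>' }] }],
]);

// Resolve a request path such as "/Page/About-Us" to rendered HTML,
// or null when the path is not a page URL or no such page exists.
function renderPage(path) {
  const match = /^\/Page\/([^\/]+)\/?$/i.exec(path);
  if (!match) return null;
  const page = storedPages.get(decodeURIComponent(match[1]).toLowerCase());
  if (!page) return null;
  // Titles and content HTML are emitted as stored, as they come from the admin's editor.
  return page.contents.map(c => '<h2>' + c.title + '</h2>' + c.html).join('\n');
}

console.log(renderPage('/Page/About-Us'));
```

One route pattern serves every client-created page, which is why the client never needs a new deployment to add one.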

Using master pages with multiple entities

I'm beginning to plan a complete redesign of our department's intranet pages. As it stands, every department gets its own folder within the root. They all share the same look and feel but don't use CSS; everything in each file is straight-up static text. Basically, if a change has to be made to the header, every file must be changed. The number of files is somewhere in the hundreds. Since we're in the process of getting a new look and feel, I figured this would be the appropriate time to redesign the structure as a whole as well. My idea was to create a new C# web project to use ASP.NET master pages. Within that project, each page would use the master page. Since I know they like to make many minor cosmetic changes, master pages would make things much simpler and, quite frankly, I don't have time to manually edit a header 564 (random) times. The other aspect of this site is that the root would contain a documents folder with subfolders pertaining to each of our departments.
I guess my question is: has anyone tackled an issue like this, and could you shed some light on how you fixed it?
Also, would it be worth upgrading IIS and .NET to their latest version?
If you are already working in .NET 2.0, then you shouldn't need to do any upgrades, and there won't be any additional infrastructure cost.
I would highly recommend using Master pages, as they do make it painless to have a common look and feel for your entire site.
Another cool feature of Master pages is that you can nest them. This would let you have a common feel between all pages, and then each department would have its own Master page nested inside your top-level one.
I'll start with the cost question. You need to be using ASP.NET 2.0 or higher to take advantage of master pages. Technically the .NET framework is free, however Visual Studio is not. Visual Web Developer is free, but the license might be for non-commercial projects only. I'm not sure. IIS is also free, Windows Server 2008 is not. You are fine running on XP or Server 2000/2003. There isn't really any reason to upgrade.
I can't say that I have tackled a problem exactly like yours, but it sounds like what you need is a content management system. Some examples are the cuyahoga project, or Umbraco. These systems allow you to create a general look and feel, and store all content in a database or xml files and provide an online content editor, so the content of the pages can be managed by people that don't necessarily know HTML or Programming.
You can have a master-page hierarchy:
Master page for everyone
    Master page for department A
    Master page for department B
Whatever's common for everyone you set in the first master page.
Then you make a master page for department A pages - this master page's master page is the first one.
