Capture text from an Aspx page - asp.net

I am trying to come up with a neat way to generate JSON schema markup automatically on my aspx pages. The markup in question is FAQPage, but that's irrelevant.
I decided that I needed to scrape the content of the current page to find questions and answers. After a few false starts I came across the HtmlAgilityPack library, which lets me achieve what I want, but I've run into some issues.
The HtmlAgilityPack parser can be initialized in a number of ways, but the only one I could get to work for my scenario (scraping the current page) was to feed in a string.
First, I gave the relevant element an ID and a runat="server" attribute.
To get the string, I used HtmlTextWriter; here's the code:
static string ConvertControlToString(Control ctl)
{
    // Render the control into an in-memory writer and return the HTML.
    var sw = new StringWriter();
    using (var w = new HtmlTextWriter(sw))
    {
        ctl.RenderControl(w);
    }
    return sw.ToString();
}
Now, all that works fine - in most cases.
However, I'm running into edge cases where I use ScriptManager and UpdatePanels, and I suspect there will be more. The error is: ... must be inside a form control with a runat="server". Of course it is, but RenderControl doesn't realise it.
So, two questions:
Is there a way to feed the HtmlAgilityPack parser that doesn't require a string (and that won't loop)?
Is there a better way to scrape the text than Control.RenderControl() that won't cause errors?
Incidentally, I've found a solution to the problem I'm having but it involves manipulating each affected page, and that's not great.
So, thought I'd throw it out there and see if there are better workarounds or a better solution.

You can load HTML in a few different ways, but ultimately HTML is a string, so that is what the parser will operate on. I'm not sure what you mean by looping.
Rather than rendering controls as HTML and then parsing them, it might be better to let the entire page load and parse it after it has rendered. This allows your JavaScript and UpdatePanels to finish transforming the page before you parse the HTML.
The LoadFromBrowser method (I believe) loads the specified url in a headless browser, allows any javascript to run and then parses the resulting HTML: https://html-agility-pack.net/from-browser
If you need to attach authentication credentials there is a question addressing that here: HtmlAgilityPack and Authentication
Alternatively (keeping your existing code), you might try instantiating a new HtmlGenericControl with the tag "form", adding the control passed in to ConvertControlToString to it, and then rendering and parsing that, which may avoid your error. You may need to check that the control doesn't already contain a form tag. This approach doesn't address JavaScript or UpdatePanels, and I'm not 100% sure it would work.
// Wrap the control in a <form> element before rendering it.
HtmlGenericControl form = new HtmlGenericControl("form");
Control ctl = new Control(); // placeholder: use the control passed in to ConvertControlToString
form.Controls.Add(ctl);
var sw = new System.IO.StringWriter();
using (var w = new HtmlTextWriter(sw))
{
    form.RenderControl(w);
}
string s = sw.ToString();

Related

Programmatically Rendering an Umbraco Node

I'm using Umbraco 4.5.2 and I have a node with a number of child nodes. Each child node represents a fragment of HTML that will be rendered in a control. The control loops over all the child nodes and renders them.
For the moment I have a bit of a dirty hack going in order to get the thing going (still fairly new to Umbraco) but I'd rather do this better.
The code I have at the moment looks like this:
private string GetItemHtml(Node node)
{
    // Work out the URL of the HTML fragment
    string url = "http://" + Context.Request.Url.Host +
        ":" + Context.Request.Url.Port +
        node.Url;
    // Get the fragment by making a call to the page
    WebRequest req = WebRequest.Create(url);
    // Dispose the response, stream and reader once the fragment has been read
    using (WebResponse res = req.GetResponse())
    using (Stream stream = res.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        return reader.ReadToEnd();
    }
}
As you can see, it is really rather ugly. I'm hoping there is some way to get this without having to make many HTTP calls, even if it is looping back to the same server - it can't be very efficient.
You can use the API to achieve this: look at the umbraco.library.RenderTemplate method. It accepts two parameters: the first is the ID of the node to render and the second is the ID of the template to use when rendering the node.
This is probably much easier to build using XSLT in Umbraco. If you want to do something that is not possible in XSLT, you can create an XSLT extension function (implemented in C#, called from XSLT) to do that (see http://en.wikibooks.org/wiki/Umbraco/Create_xslt_exstension_like_umbraco.Library_in_C for more info).
For a sample XSLT that lists child pages, see BlogListPosts.xslt in the Umbraco blog package:
http://blog4umbraco.codeplex.com/SourceControl/changeset/view/54177#916032

How to pass data between pages without sessions in ASP.net MVC

I have an application in which I want to pass data between pages (views) without sessions. Specifically, I want to apply some settings to all the pages using the query string.
For example, if my link is "http://example.com?data=test1", then I want to append this query string to every link thereafter; if there is no query string, the normal flow applies.
I was wondering whether, when a query string arrives on any link in the web application, some application-level, user-specific property could be set and then used on subsequent pages.
Thanks,
Ashwani
You can get the query string using
Request.Url.Query
and append it to your links to the other pages.
Here is an idea of how you can find and change the links in your page:
public abstract class BasePage : System.Web.UI.Page
{
    protected override void Render(System.Web.UI.HtmlTextWriter writer)
    {
        System.IO.StringWriter stringWriter = new System.IO.StringWriter();
        System.Web.UI.HtmlTextWriter htmlWriter = new System.Web.UI.HtmlTextWriter(stringWriter);
        // Render the page into this buffer
        base.Render(htmlWriter);
        // Get the buffer as a string
        string html = stringWriter.ToString();
        // Manipulate the HTML and find all your links (hopefully only the links).
        // This simple Replace is only an illustration and will probably need fixing for real pages.
        // Request.Url.Query already starts with "?", so it is appended as-is, and only when present.
        if (Request.Url.Query.Length > 0)
        {
            html = html.Replace(".aspx", ".aspx" + Request.Url.Query);
        }
        writer.Write(html);
    }
}
I do not recommend it, however; I think you should find some other way that avoids manipulating all your links.
I don't understand what kind of data you are trying to pass, because the idea of trapping all links sounds odd to me.
Anyway, you may find TempData useful for passing data between redirects.
A final warning: be careful with TempData, as its behaviour changed a little between MVC 1 and 2:
ASPNET MVC2: TempData Now Persists

ASP.NET HTML control with clean output?

I am developing an ASP.NET web application at work, and I keep running into the same issue:
Sometimes I want to write HTML to the page from the code-behind. For example, I have a page, Editor.aspx, with output which varies depending on the value of a GET variable, "view." In other words, "Editor.aspx?view=apples" outputs different HTML than "Editor.aspx?view=oranges".
I currently output this HTML with StringBuilder. For example, if I wanted to create a list, I might use the following syntax:
myStringBuilder.AppendLine("<ul id=\"editor-menu\">");
myStringBuilder.AppendLine("<li>List Item 1</li>");
myStringBuilder.AppendLine("</ul>");
The problem is that I would rather use ASP.NET's List control for this purpose, because several lines of StringBuilder syntax hamper my code's readability. However, ASP.NET controls tend to output convoluted HTML, with unique IDs, inline styles, and the occasional block of inline JavaScript.
My question is, is there an ASP.NET control which merely represents a generic HTML tag? In other words, is there a control like the following example:
HTMLTagControl list = new HTMLTagControl("ul");
list.ID = "editor-menu";
HTMLTagControl listItem = new HTMLTagControl("li");
listItem.Text = "List Item 1";
list.AppendChild(listItem);
I know there are wrappers and such, but instead of taming ASP.NET complexity, I would rather start with simplicity from the get-go.
"Is there an ASP.NET control which merely represents a generic HTML tag?"
Yes, it's called HtmlGenericControl :)
It doesn't offer exactly the API you sketched, but you can get close easily:
HtmlGenericControl list = new HtmlGenericControl("ul");
list.ID = "editor-menu";
HtmlGenericControl listItem = new HtmlGenericControl("li");
listItem.InnerText = "List Item 1";
list.Controls.Add(listItem);
If you really need to get down to bare metal then you should use the HtmlTextWriter class instead of a StringBuilder as it is more custom tailored to pumping out raw HTML.
If you want to just assign the results of your stringbuilder to a blank control, you can use an <asp:Literal /> control for this.
LiteralControl has a constructor to which you can pass your HTML, which I think is better:
new LiteralControl(sb.ToString());

Multiple user controls and javascript

I include a JS file in a user control. The host page has multiple instances of the user control.
The JS file has a global variable that is used as a flag for a JS function. I need the scope of this variable be restricted to the user control. Unfortunately, when I have multiple instances of the control, the variable value is overwritten.
What's the recommended approach in a situation like this?
One option is to dynamically generate the JavaScript based on the ClientID of the user control; for example, you could dynamically generate the global variable.
Another option, and the one I would recommend, is to encapsulate the global variable and function within an object; your user control can then emit the JS to create an instance of that object (which can be dynamically named, letting you scope the object as you see fit).
Edit
I don't have a working code sample that I can share, but I have done this in a couple of different ways. The easiest method is to do this in the markup of your user control.
<script language='javascript'>
var <%=this.ClientID%>myObject=new myObject();
</script>
Assuming your control has a ClientID of myControl, this will create a variable named myControlmyObject.
Another way to do this would be to generate the script in the code-behind; you could register it using Page.ClientScript.RegisterStartupScript().
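As a rough sketch of the object-based option, the script file could define a small constructor so that every control instance owns its flag. The names ControlState, toggle, ctl00_uc1 and ctl00_uc2 are made up for illustration; in a real page the variable names would be built from the rendered ClientID.

```javascript
// Hypothetical constructor: each instance keeps its own flag instead of sharing a global.
function ControlState(clientId) {
  this.clientId = clientId; // the user control's ClientID, emitted by the server
  this.flag = false;        // per-instance replacement for the old global flag
}

// Flip this instance's flag and return the new value.
ControlState.prototype.toggle = function () {
  this.flag = !this.flag;
  return this.flag;
};

// In the real page these two lines would be emitted once per control instance,
// with the names built from <%= this.ClientID %>; the ids below are assumed.
var ctl00_uc1State = new ControlState("ctl00_uc1");
var ctl00_uc2State = new ControlState("ctl00_uc2");

ctl00_uc1State.toggle(); // only the first control's flag changes
```

Because the flag lives on the instance, toggling one control's state leaves every other instance untouched.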
I would recommend refactoring your code such that all the common JS logic is stored in one place, not in every UserControl. This will reduce the size of your page by a good margin.
You can pass in the id of the UserControl to the common JS method(s) to differentiate between the UserControls.
For the issue of limiting the scope of your 'UserControl' variable, you could store some sort of a Key/Value structure to keep your UserControl-specific value - the Key would be the UserControl clientID, and the value would be the variable that you're interested in.
For example:
var UCFlags = new Object();
//set the flag for UserControl1:
UCFlags["UC1"] = true;
//set the flag for UserControl2:
UCFlags["UC2"] = false;
To access them, you simply use the ClientID of the UserControl as the key into the UCFlags object:
myFlag = UCFlags["UC1"];
On the server-side, you can replace the constant strings "UC1" or "UC2" with
<%= this.ClientID %>
like this:
myFlag = UCFlags["<%= this.ClientID %>"];
You can still use the <%= this.ClientID %> syntax here even though the bulk of the JS is in a separate file; simply set
UCFlags["<%= this.ClientID %>"] = value;
before the call to embed the JS file.
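Putting those pieces together, the shared file might look something like the sketch below. The helper names setFlag and getFlag are assumptions for illustration, and the literal ids "UC1" and "UC2" stand in for whatever ClientID values the server actually renders.

```javascript
// Shared script: one registry object, one entry per user control instance.
var UCFlags = {};

// Hypothetical helper: record the flag for one control.
function setFlag(clientId, value) {
  UCFlags[clientId] = value;
}

// Hypothetical helper: read the flag for one control.
function getFlag(clientId) {
  // Unregistered controls default to false rather than undefined.
  return Object.prototype.hasOwnProperty.call(UCFlags, clientId)
    ? UCFlags[clientId]
    : false;
}

// Each control's markup would call setFlag("<%= this.ClientID %>", ...);
// literal ids stand in for the rendered ClientIDs here.
setFlag("UC1", true);
setFlag("UC2", false);
```

The common functions then take the ClientID as a parameter, so the logic is written once and only the key differs per control.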
Well, if you have to keep with the current solution, you could rename your global variable to something like the following code, which should be in the .ascx file for your control:
<script type='text/javascript'>
var <%= this.ClientID %>_name_of_global_variable;
</script>
Here "this" is the ASP.NET control. That way, each control has a unique variable name based on its client ID. Make sure you update the rest of your JavaScript to use this new naming convention. The drawback is that it looks messy, and the variable names can become very long depending on where the control is embedded in the page.
Does that make sense? It should take minimal javascript modification to get it working.
I ran into the same issue and the blog post below solved it. The solution is to take an object-oriented approach to the JavaScript:
Adding multiple .NET User Controls that use JavaScript to the same page

ASP.NET Localized web site -- updating on the fly

I think I have a solution to this, but is there a better way, or is this going to break on me?
I am constructing a localized web site using global/local resx files. It is a requirement that non-technical users can edit the strings and add new languages through the web app.
This seems easy enough -- I have a form to display strings and the changes are saved with code like this snippet:
string filename = MapPath("App_GlobalResources/strings.hu.resx");
XmlDocument xDoc = new XmlDocument();
xDoc.Load(filename);
// Select the <value> of the <data> element whose name attribute is 'PageTitle'
XmlNode xNode = xDoc.SelectSingleNode("//root/data[@name='PageTitle']/value");
xNode.InnerText = txtNewTitle.Text;
xDoc.Save(filename);
Is this going to cause problems on a busy site? If it causes a momentary delay for recompilation, that's no big deal. And realistically, this form won't see constant, heavy use. What does the community think?
I've used a similar method before for a very basic "CMS". The site wasn't massively used but it didn't cause me any problems.
I don't think changing a resx will cause a recycle.
We did something similar, but used a database to store the user modified values. We then provided a fallback mechanism to serve the overridden value of a localized key.
That said, I think your method should work fine.
Have you considered creating a Resource object? You would need to wrap your settings into a single object that all the client code would use. Something like:
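public class GuiResources
{
    private string _pageTitle;

    public GuiResources()
    {
        LoadConfiguration();
    }

    public string PageTitle
    {
        get { return _pageTitle; }
    }

    // Called once when the class is first created.
    private void LoadConfiguration()
    {
        // Load settings from the config section.
        // Reading AppSettings["PageTitle"] is only a placeholder for your own section.
        _pageTitle = System.Configuration.ConfigurationManager.AppSettings["PageTitle"];
    }
}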
You could make it a singleton or a provider, so that the object is loaded only once. You could also make it smart enough to look at the current thread's culture info so it knows which language to return.
Then in your web.config file you can create a custom section and set restartOnExternalChanges="true". That way, your app will pick up the changes when they are made.

Resources