I have about 50 pages of HTML, each with around 100-plus rows of data and all sorts of CSS styles. I want to read the HTML files, extract just the data (Name, Age, Class, Teacher) and store it in a database, but I am not able to read the HTML tags.
For example (whitespace added only to display it here):
<table class="table_100">
<tr>
<td class="col_1">
<span class="txt_student">Gauri Singh</span><br>
<span class="txt_bold">13</span><br>
<span class="txt_bold">VIII</span><br>
</td>
<td class="col_2">
<span class="txt_teacher">Praveen M</span><br>
<span class="txt_bold">3494</span><br>
<span class="txt_bold">3Star</span><br>
</td>
<td class="col_3">
</td>
</tr>
</table>
For .NET, you could try the Html Agility Pack.
You could "convert" HTML pages to XML documents with this:
using HtmlAgilityPack;

// Load the HTML page, then save it back out as well-formed XML
HtmlDocument doc = new HtmlDocument();
doc.Load(@"..\..\your_page.htm");
doc.OptionOutputAsXml = true;
doc.Save("your_page.xml");
Then just parse the result as an XML document.
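For the second step, here is a minimal sketch in Python that walks the converted file row by row. It uses the standard xml.etree.ElementTree module, not a .NET API. It assumes the saved XML keeps the table markup from the question with self-closed <br /> tags. XML_SAMPLE and rows_from_xml are hypothetical names. The real output may also add an XML declaration, a wrapper element or a namespace that you would need to account for.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of your_page.xml after conversion; the real file may differ.
XML_SAMPLE = """
<table class="table_100">
  <tr>
    <td class="col_1">
      <span class="txt_student">Gauri Singh</span><br />
      <span class="txt_bold">13</span><br />
      <span class="txt_bold">VIII</span><br />
    </td>
    <td class="col_2">
      <span class="txt_teacher">Praveen M</span><br />
      <span class="txt_bold">3494</span><br />
      <span class="txt_bold">3Star</span><br />
    </td>
    <td class="col_3">
    </td>
  </tr>
</table>
"""


def rows_from_xml(xml_text):
    """Yield one list of span texts per <tr>, in document order."""
    root = ET.fromstring(xml_text.strip())
    for tr in root.iter("tr"):
        # ".//span" matches spans at any depth below this row
        yield [span.text.strip() for span in tr.findall(".//span") if span.text]


for row in rows_from_xml(XML_SAMPLE):
    print(row)  # ['Gauri Singh', '13', 'VIII', 'Praveen M', '3494', '3Star']
```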
Use the Html Agility Pack. It provides an intuitive and robust .NET API for parsing and otherwise toying with HTML.
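If you would rather skip the XML round trip, any tolerant HTML parser can do the same job. The sketch below uses Python's standard html.parser module as a stand-in for the Html Agility Pack; it is not the HAP API. It keys on the CSS classes from the sample (col_1, col_2, txt_student, txt_teacher, txt_bold) and stores the result with sqlite3. Based only on the sample, it assumes that the two txt_bold spans in col_1 are Age and Class. The table and column names (students, class_name and so on) are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

SAMPLE_HTML = """
<table class="table_100">
<tr>
<td class="col_1">
<span class="txt_student">Gauri Singh</span><br>
<span class="txt_bold">13</span><br>
<span class="txt_bold">VIII</span><br>
</td>
<td class="col_2">
<span class="txt_teacher">Praveen M</span><br>
<span class="txt_bold">3494</span><br>
<span class="txt_bold">3Star</span><br>
</td>
<td class="col_3">
</td>
</tr>
</table>
"""


class StudentRowParser(HTMLParser):
    """Collects (td class, span class, text) triples for every <span> in each <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []           # one list of triples per <tr>
        self._td_class = ""      # class of the enclosing <td>
        self._span_class = None  # class of the open <span>, or None outside spans

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._td_class = cls
        elif tag == "span" and self.rows:
            self._span_class = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._span_class = None

    def handle_data(self, data):
        if self._span_class is not None and data.strip():
            self.rows[-1].append((self._td_class, self._span_class, data.strip()))


def to_record(triples):
    """Map one row to (name, age, class, teacher); None if the row lacks those spans."""
    def pick(td, span):
        return [text for t, s, text in triples if t == td and s == span]

    names = pick("col_1", "txt_student")
    bold = pick("col_1", "txt_bold")        # assumed to be [age, class]
    teachers = pick("col_2", "txt_teacher")
    if not names or len(bold) < 2 or not teachers:
        return None
    return (names[0], int(bold[0]), bold[1], teachers[0])  # int() fails on a non-numeric age


def load_into_db(html_text, conn):
    """Parse one page and insert its student rows; returns the number inserted."""
    parser = StudentRowParser()
    parser.feed(html_text)
    parser.close()
    records = [r for r in map(to_record, parser.rows) if r is not None]
    conn.execute("CREATE TABLE IF NOT EXISTS students "
                 "(name TEXT, age INTEGER, class_name TEXT, teacher TEXT)")
    conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")
load_into_db(SAMPLE_HTML, conn)
print(conn.execute("SELECT * FROM students").fetchall())
# [('Gauri Singh', 13, 'VIII', 'Praveen M')]
```

For the 50 pages you would call load_into_db once per file with the same connection. Matching on class names rather than on position keeps the extraction independent of the inline styling.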
I'm a newbie to Spring MVC. Here's my question.
I have an input view where a user can enter a name and some text, which are then displayed like this:
<tr>
<td>Name: </td>
<td>${product.name}</td>
</tr>
<tr>
<td>Text: </td>
<td>${product.text}</td>
</tr>
How can I make the two fields editable for the user? I'd like to place a link or something similar next to each field that the user can click to edit the entry.
Furthermore, I'd like the link to lead to the same page as the input page/form. How can I distinguish the two cases in my controller? I'm using SimpleFormController.
You can wrap each value in an <a href> tag, as follows:
<a href="edit-page-url">${product.text}</a>
Therefore, in your example above, you could use the following:
<tr>
<td>Name: </td>
<td><a href="edit-page-url">${product.name}</a></td>
</tr>
<tr>
<td>Text: </td>
<td><a href="edit-page-url">${product.text}</a></td>
</tr>
The link URLs must point to your actual edit page.
Alternative:
Another way to make the fields editable is to use the Spring <form> tags, as below:
<form:input path="name" placeholder="${product.name}" />
<form:input path="text" placeholder="${product.text}" />
How do I remove a newline from the start of a string?
What's happening is that I've been reviewing and debugging someone else's PHP code and converting it to ASP. He put HTML tags in a database table and simply echoed them; for example, a field contains HTML table tags like <thead>, <tbody>, <tr>, etc. I didn't want to continue that bad practice, so I handled them by first turning each <tr> into a <br /> and removing everything else. The problem is that the first <tr> creates a newline at the very start of the string, and I want to remove it. Another problem is that not all fields have HTML tags inside, so I would need something like if text.substring(0,1)="(I don't know what to put here)" then (maybe the Replace or Trim functions here). Any help, please?
Here's a sample field value. Pretty nasty indeed:
<table width="705" height="323" id="gradient-style" summary="Meeting Results">
<thead>
<tr>
<td width="238">Product</td>
<td width="610">TS2360 Tape<br />
Drive Express</td>
</tr>
</thead>
<tbody>
<tr>
<td>Machine/model, HVEC</td>
<td>3580<br />
S63, 3580S6X</td>
</tr>
<tr>
<td>Product strengths</td>
<td>Multi O/S<br />
Encryption & media<br />
partition capable<br />
LTFS support</td>
</tr>
</tbody>
</table>
So after turning the <tr>s into <br>s and removing the other HTML tags, the output became:
Product TS2360 Tape Drive Express
Machine/model, HVEC 3580 S63, 3580S6X
Product strengths Multi O/S Encryption & media partition capable LTFS support
(There should be a blank line before "Product" because of the <tr> in front of it, but it didn't show up in the block quote.)
Thanks in advance.
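In C#:
// Remove any leading CR/LF characters left over from the first <tr>
var cleanedFieldValue = someValueWithLineBreaks.TrimStart('\r', '\n');
The VB.NET version might look like this:
Dim cleanedFieldValue = someValueWithLineBreaks.TrimStart(ControlChars.Cr, ControlChars.Lf)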
Edit
It sounds as if you are trying to parse some HTML and then work with the result. I would recommend using the Html Agility Pack for that, and reading about the evils of trying to parse HTML with regular expressions.
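For comparison, here is a minimal Python sketch of the conversion the question describes. It uses the standard html.parser module instead of string replacement or regular expressions, and it is not the Html Agility Pack. Each <tr> becomes one line and the cells in it are joined with spaces. Lines are only joined between rows, so no leading newline is produced. Fields with no <tr> at all are treated as plain text: only their leading line breaks are removed, using str.lstrip, which is Python's counterpart to TrimStart. RowTextParser and field_to_text are hypothetical names.

```python
from html.parser import HTMLParser


class RowTextParser(HTMLParser):
    """Turns table markup into plain text: one line per <tr>, cells joined by spaces."""

    def __init__(self):
        super().__init__()
        self.lines = []   # finished lines, one per </tr>
        self._parts = []  # text fragments of the row being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "tr":
            self.lines.append(" ".join(self._parts))
            self._parts = []

    def handle_data(self, data):
        # Collapse the newlines and indentation that the stored markup contains
        text = " ".join(data.split())
        if text:
            self._parts.append(text)


def field_to_text(value):
    """Convert one stored field to plain text."""
    parser = RowTextParser()
    parser.feed(value)
    parser.close()
    if not parser.lines:
        # No <tr> found: treat it as plain text and drop leading line breaks
        return value.lstrip("\r\n")
    return "\n".join(parser.lines)


SAMPLE_FIELD = """<table width="705" height="323" id="gradient-style" summary="Meeting Results">
<thead>
<tr>
<td width="238">Product</td>
<td width="610">TS2360 Tape<br />
Drive Express</td>
</tr>
</thead>
<tbody>
<tr>
<td>Product strengths</td>
<td>Multi O/S<br />
Encryption & media<br />
partition capable<br />
LTFS support</td>
</tr>
</tbody>
</table>"""

print(field_to_text(SAMPLE_FIELD))
```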
I need to update a bit of text that's being rendered on a .aspx page. I've searched the source and DB tables, views, and stored procedures, and can't find it.
The bit of code that's generating the text looks like this:
<asp:PlaceHolder id="teamMemberTable" runat="server" />
I searched and couldn't find any references to teamMemberTable anywhere else in the code. Is it possible that the code generating that bit has been compiled into binary and doesn't exist in plaintext anymore?
Here is an example of the HTML output:
<span id="ctl00_rightContent_Repeater1_ctl01_Literal1" class="teamListName">
Team Number One
</span>
<table>
<tr>
<td class="teamListMember">Team Captian</td>
<td class="teamListPlayer">Jane Doe</td>
<td class="teamListStatus teamListStatusPaid">Paid</td>
</tr>
<tr>
<td class="teamListMember">Player 2</td>
<td class="teamListPlayer">John Q. Public</td>
<td class="teamListStatus teamListStatusNotPaid">Not Paid</td>
</tr>
</table>
Yes, it is possible that the code is in an assembly that has already been compiled and is no longer available as plain text. One option is to run a tool such as .NET Reflector or ILSpy, decompile all the assemblies in the app, and search the decompiled code for any references to "teamMemberTable".
Another possibility is that the control is being referenced by index instead of by name. For example, if the PlaceHolder control is in the page, it could be referenced as Page.Controls[5] and so you'd never see the name in the source code.
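Before decompiling everything, a quick byte search can narrow down which file to open. .NET metadata stores type and member names as UTF-8 and string literals as UTF-16, so the sketch below searches every file under a folder for both encodings. It is written in Python and uses only the standard library. Which folder you point it at (for example the site root, including bin) is an assumption, and find_identifier is a hypothetical name. A miss proves nothing if the control is reached by index, as in the Page.Controls[5] example.

```python
import os


def find_identifier(root_dir, identifier):
    """Return sorted paths of files under root_dir whose raw bytes contain
    identifier encoded as UTF-8/ASCII or as UTF-16LE."""
    needles = (identifier.encode("utf-8"), identifier.encode("utf-16-le"))
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()  # whole file in memory: fine for a site folder
            except OSError:
                continue  # unreadable or locked file: skip it
            if any(needle in data for needle in needles):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_identifier(".", "teamMemberTable"):
        print(hit)
```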
I want to generate a table like this:
<table>
<tbody>
<tr class="folder" style="-moz-user-select: none;">
<td><div><img src="folder.png"><span>home</span></div></td>
<td class="bytes">Folder</td>
</tr>
<tr class="folder hover" style="-moz-user-select: none;">
<td><div><img src="folder.png"><span>share</span></div></td>
<td class="bytes">Folder</td>
</tr>
</tbody>
</table>
I want to add the rows from the C# code-behind, depending on the number of entries.
Instead of adding elements to an HTML table, you should consider using a Repeater for data display, which would give you clean HTML (exactly as you want).
Then, on each click, you would do whatever you need to do in the code-behind and rebind the Repeater.
Hope that helps.
I agree with Sebastian: why not use a Repeater or DataList to bind the data? What source are you getting your data from? If you're pulling the data from a SQL table, here is a pretty good article to get you started:
http://msdn.microsoft.com/en-us/library/aa719636(v=vs.71).aspx
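A Repeater renders its ItemTemplate once per data item. The sketch below illustrates that idea in Python rather than ASP.NET; ROW_TEMPLATE and render_table are hypothetical names. It emits one row of the markup from the question per entry and HTML-escapes the values. The "hover" class on the second sample row presumably comes from client-side mouse-over styling, so the sketch leaves it out.

```python
from html import escape

# One copy of this template is emitted per entry, like a Repeater ItemTemplate
ROW_TEMPLATE = (
    '<tr class="folder" style="-moz-user-select: none;">'
    '<td><div><img src="folder.png"><span>{name}</span></div></td>'
    '<td class="bytes">{kind}</td>'
    '</tr>'
)


def render_table(entries):
    """Render one escaped row per (name, kind) pair inside <table><tbody>."""
    rows = "\n".join(
        ROW_TEMPLATE.format(name=escape(name), kind=escape(kind))
        for name, kind in entries
    )
    return "<table>\n<tbody>\n" + rows + "\n</tbody>\n</table>"


print(render_table([("home", "Folder"), ("share", "Folder")]))
```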
I have an HTML table in this format:
<table>
<th>
<td> td1 </td>
<td> td2 </td>
<td> td3 </td>
<td> td4 </td>
<td> td5 </td>
<td> td6 </td>
<td> td7 </td>
<td> td8 </td>
<td> td9 </td>
<td> td10 </td>
</th>
</table>
I need to parse the cells in each row of the table body. I looped through the rows in JavaScript, and to save the HTML content I'm using a WebMethod: saving reloads my page and I would lose my HTML table, so to avoid that I store the table in the session through a WebMethod, which is also called from my JavaScript. The issue is that my client-side script is sometimes skipped, so I'm not able to save my HTML content. So I thought of sending the whole HTML content in one script call and doing the parsing on the server side.
Now I need to know how to parse it on the server side. Can somebody help me parse it using XML?
I think you should try the HTML Agility Pack. From CodePlex:
What is exactly the Html Agility Pack (HAP)?
This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).