We are using a CRM database which is accessed from a web console front-end. One of the merge fields retrieved from the database is the site address, which is returned as a one-line string separated by commas instead of the address format used for mailing addresses.
That is, instead of:
45 Seafield Place
FORT WILLIAM
PH33 4XJ
...it inserts:
45 Seafield Place, FORT WILLIAM, PH33 4XJ
As it is a proprietary product, we have no access to the configuration of the web server and no ability to modify the PHP used to generate its pages. The templates for customer letters are fully customisable and are simple HTML, with an @media print CSS block to control the styling when letters are printed straight from the browser (similar to Google Docs).
As I cannot control the content or use JavaScript/jQuery to perform text replacement, is there any way CSS can replace each comma within an element of a given class with a <br> tag?
Unfortunately, the only way to achieve this is with JavaScript.
Since your server strips all JavaScript from the page, you have to 'trick' it by adding JavaScript that it will not strip.
I believe it can be done by putting your JavaScript code into an onload attribute of your body element. That way the server will not see it as a script, just as an attribute, and will leave it alone.
<body onload="var spans = document.querySelectorAll('span');spans[0].innerHTML=spans[0].innerHTML.replace(/,/g,'<br/>');">
<span>
45 Seafield Place, FORT WILLIAM, PH33 4XJ
</span>
</body>
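If you want to check the transformation outside the browser, the string manipulation the onload handler performs can be sketched in Python. This is a minimal sketch, not the answer's method: the function name commas_to_breaks is hypothetical, and unlike the one-line replace() it also trims the space after each comma and HTML-escapes each part.

```python
import html


def commas_to_breaks(address: str) -> str:
    """Turn 'a, b, c' into 'a<br/>b<br/>c' (hypothetical helper).

    Each comma-separated part is stripped of surrounding whitespace
    (including the newlines around the span's text) and HTML-escaped.
    """
    parts = [part.strip() for part in address.split(",")]
    return "<br/>".join(html.escape(part) for part in parts if part)
```

For example, commas_to_breaks("45 Seafield Place, FORT WILLIAM, PH33 4XJ") gives 45 Seafield Place<br/>FORT WILLIAM<br/>PH33 4XJ.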
UPDATE
You can even strip the JavaScript code yourself after executing it:
<body onload="var spans = document.querySelectorAll('span');spans[0].innerHTML=spans[0].innerHTML.replace(/,/g,'<br/>'); document.getElementsByTagName('body')[0].removeAttribute('onload');">
<span>
45 Seafield Place, FORT WILLIAM, PH33 4XJ
</span>
</body>
Let's say I want to scrape the Neo4j RefCard found at: https://neo4j.com/docs/cypher-refcard/current/
And I would like to fetch a 'code' example along with its styling. Here's my target. Notice that it has CSS treatment (font, color...):
...so in Neo4j I call the apoc.load.html procedure as shown here, and you can see it's no problem finding the content:
It returns a map with three keys: tagName, attributes, and text.
The text is the issue for me. It's stripped of all styling. I would like for it to let me know more about the styling of the different parts of this text.
The actual HTML in the webpage looks like the following image, with all of these span class tags: cm-string, cm-node, cm-atom, etc. Note that this was not generated by Neo4j's apoc.load.html procedure; it came straight from my Chrome browser's inspect console.
I don't need the actual fonts and colors, just the tag names.
I can see in the documentation that there is an optional config map you can supply, but there's no explanation of what can be configured there. It would be lovely if I could configure it to return, say, HTML rather than text.
The library that Neo4j uses for CSS selection here is jsoup.
So I am hoping either to keep the <span> tags or to extract their class names for each segment of text.
Could you not generate the HTML yourself from the properties in your object? It looks like they are all span tags with 3 different classes depending on whether you're using the property name, property value, or property delimiter.
That is probably how they are generating the HTML themselves.
Okay, two years later I revisited this question I posted, and did find a solution. I'll keep it short.
The APOC procedure apoc.load.html uses the scraping library Jsoup, which is not a full-fledged browser. When it visits a page it reads the HTML sent by the server but ignores any JavaScript. As a result, if a page uses JavaScript to insert or even just format its content, Jsoup will miss the HTML that the JavaScript would have generated had it run.
So I have just tried out the service at prerender.com. It's simple to use: you pass it a URL, and it fetches that page itself, executes the page's JavaScript as it does so, and returns the final result as static HTML.
So if I call prerender.com with apoc.load.html, the Jsoup library simply asks for the HTML, and this time it gets the fully rendered HTML. :)
You can try the following two queries and see the difference pre-rendering makes. The span tags on this page are rendered only by JavaScript, so if we call it asking for its span tags without pre-rendering, we get nothing back.
CALL apoc.load.html("https://neo4j.com/docs/cypher-refcard/current/", {target:".listingblock pre:contains(age: 38) span"}) YIELD value
UNWIND value.target AS spantags
RETURN spantags
...but if we call it via the prerender.com service, we get a bunch of span tags and their content.
CALL apoc.load.html("https://service.prerender.cloud/https://neo4j.com/docs/cypher-refcard/current/", {target:".listingblock pre:contains(age: 38) span"}) YIELD value
UNWIND value.target AS spantags
RETURN spantags
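Once the pre-rendered HTML is available, pairing each text segment with the class of the span around it (the cm-string, cm-node, cm-atom classes from the question) can also be done outside Neo4j. The sketch below swaps in Python's standard html.parser for Jsoup/APOC; SpanClassCollector, span_segments and the sample markup are assumptions for illustration, not the RefCard's actual HTML.

```python
from html.parser import HTMLParser


class SpanClassCollector(HTMLParser):
    """Collect (class, text) pairs for text that sits inside <span> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._stack = []     # class attribute of each currently open <span>
        self.segments = []   # (class or None, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text outside any span (punctuation, spaces) is skipped;
        # nested spans report the innermost span's class.
        if self._stack and data:
            self.segments.append((self._stack[-1], data))


def span_segments(markup):
    parser = SpanClassCollector()
    parser.feed(markup)
    parser.close()
    return parser.segments
```

For example, span_segments('<pre><span class="cm-node">(n)</span> <span class="cm-atom">age</span></pre>') returns [('cm-node', '(n)'), ('cm-atom', 'age')].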
I need to modify HTML coming from an external file (server side) before I render it, and inject a quote 'component' like this:
This component needs to be injected after the 2nd paragraph, and I'm planning to use HtmlAgilityPack. Any examples? Is the HtmlNode.InsertAfter() method a good choice once I have found the second paragraph, which should be trivial?
Another question: would it be possible to inject a Sitecore placeholder, or even a user control, that renders my quote instead of pure HTML? I feel it should be, but I'm not sure what a good approach would be.
Thanks
I can suggest two possible approaches here:
1) Use snippets with some customisation. Snippets allow users to insert pre-defined chunks of HTML into a rich text editor (RTE) field. You could have a pre-defined piece of HTML carrying some identifier to indicate that it needs custom processing (I would suggest a data-xxx style attribute, which would not conflict with any CSS or JavaScript). Then you could create a new renderField pipeline processor that detects the data-xxx attribute within the content of a rich text field (you would use HtmlAgilityPack for this) and replaces that snippet with the contents of your server-side file.
-or-
2) Split your text content into two separate chunks and have two instances of a "HtmlText" rendering within the placeholder, with a rendering for your quote text between them in the same placeholder.
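The marker idea in option 1 can be sketched as follows. This is Python with a regular expression standing in for HtmlAgilityPack and a Sitecore renderField processor; the attribute name data-quote-snippet and the load_snippet callback are hypothetical, and the regex assumes the marker element is empty and its attributes contain no '>'.

```python
import re
from typing import Callable

# Hypothetical marker element, e.g. <div data-quote-snippet="quote1"></div>.
# Group 1 is the tag name (used to match the closing tag), group 2 the snippet key.
_MARKER = re.compile(
    r"""<(\w+)\b[^>]*\sdata-quote-snippet=["']([^"']+)["'][^>]*>\s*</\1>"""
)


def expand_snippets(rich_text: str, load_snippet: Callable[[str], str]) -> str:
    """Replace each empty marker element with the HTML returned by load_snippet(key)."""
    return _MARKER.sub(lambda m: load_snippet(m.group(2)), rich_text)
```

In use, load_snippet would read your server-side file; with a dict of snippets you can pass snippets.__getitem__.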
I would advise that a rule to insert text after the second paragraph would be quite 'brittle', as it relies on content editors filling in the rich text field in a very precise way, e.g. always including two or more paragraphs and always breaking text with paragraphs; they might decide to use a load of line breaks instead to split their text. That said, if you did do this, you would create a new renderField pipeline processor.
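For completeness, the 'insert after the second paragraph' rule from the question can be sketched in Python with the standard html.parser, which reports where each </p> end tag starts; this stands in for HtmlAgilityPack's InsertAfter and inherits the brittleness described above. The function name insert_after_paragraph and the fallback of appending the snippet when there are fewer than n paragraphs are assumptions.

```python
from html.parser import HTMLParser


class _ParagraphEndFinder(HTMLParser):
    """Record the (line, column) where each </p> end tag starts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.ends = []

    def handle_endtag(self, tag):
        if tag == "p":  # tag names arrive lower-cased
            self.ends.append(self.getpos())


def _offset(text, line, col):
    """Convert a 1-based line and 0-based column into a string index."""
    start = 0
    for _ in range(line - 1):
        start = text.index("\n", start) + 1
    return start + col


def insert_after_paragraph(markup, snippet, n=2):
    finder = _ParagraphEndFinder()
    finder.feed(markup)
    finder.close()
    if len(finder.ends) < n:
        return markup + snippet  # fallback: not enough paragraphs, append at the end
    start = _offset(markup, *finder.ends[n - 1])
    end = markup.index(">", start) + 1  # just past the n-th </p>
    return markup[:end] + snippet + markup[end:]
```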
We did an upgrade from Tridion 5.3 to Tridion 2011 SP1.
In our existing content, in many places in RTF fields, we use HTML elements like <a name="top" id="top"></a>. When we publish a component/page from Tridion, these anchor <a> tags are converted to self-closing anchor tags: <a name="top" id="top" />. Because of this, a hyperlink is formed over the entire content of the RTF field, as the browser treats this tag as the start tag of an anchor <a>. When we check the page source in Firefox it says: "Self-closing syntax ("/>") used on non-void HTML element. Ignoring the slashes and treating as a start tag." To fix this we updated the existing content to <a name="top" id="top"> </a>, which works, but it is not a good solution. Are there any other ideas or configuration options so that these tags are not converted to self-closing tags?
I have a similar question about this here
I have posted my work around there. Hope it helps.
I am not sure what kind of templates you are using, but generally I post-process my output and look for any empty tags using an XSLT and the XSLT Mediator. When I find empty tags, I convert them to contain a space to prevent any issues in the browsers viewing the final content.
<div></div> or <div/>
will get converted to
<div> </div>
Whilst the first examples are technically valid XML, they do (as you have discovered) break several browsers.
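The post-processing idea can be sketched outside Tridion as well. This Python version swaps the XSLT Mediator for a regular expression: it expands self-closing syntax on non-void elements (such as the <a name="top" id="top" /> from the question) into an explicit start and end tag, and leaves void elements like <br/> alone. It assumes attribute values contain no '<' or '>', and it does not skip comments or CDATA sections. Unlike the answer, it emits an end tag without a space inside; that is enough for browsers, but if the output goes back through an XML serialiser afterwards, the empty element may be collapsed again, so run it as the last step.

```python
import re

# Void elements from the HTML specification: these may legitimately self-close.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})

# <name attrs /> : group 1 is the tag name, group 2 the attributes (possibly empty).
_SELF_CLOSING = re.compile(r"<([A-Za-z][\w:-]*)((?:\s[^<>]*?)?)\s*/>")


def expand_self_closing(markup: str) -> str:
    """Rewrite <x .../> as <x ...></x> for every non-void element x."""
    def repl(match: re.Match) -> str:
        name, attrs = match.group(1), match.group(2)
        if name.lower() in VOID_ELEMENTS:
            return match.group(0)  # keep <br/>, <img .../> etc. unchanged
        return f"<{name}{attrs}></{name}>"

    return _SELF_CLOSING.sub(repl, markup)
```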
Is there a way to select all <br> tags that follow a paragraph with a given class? i.e. <p class="myclass">This is a paragraph</p><br>
There may be other <br> in the HTML so I cannot use this:
br {display:none;}
and I cannot delete all <br> tags. If there is a way to select these particular <br> tags then I can use CSS.
There are about 700 pages and I do not want to go through each of them to make sure if the <br> is needed or not. I do know that it is not needed following a paragraph with the class of "myclass".
If there is no way to select these tags then I think that I can use BBEdit to do the search and replace using a regular expression. But I don't know how to write the RE that would work.
TIA,
Linda
p.myclass + br {display:none;}
This will select every <br> element that immediately follows a <p class="myclass"> element (class names in selectors are case-sensitive, so match the case used in your HTML). If you need anything more dynamic than that, you will need regex.
Assuming BBEdit is similar to TextWrangler, you could use the built-in Find dialog.
Go to Search > Find... (Command + F), enter </p><br> in "Search For" and </p> in "Replace With", and then use the "Multi-file search" option at the bottom of the window to choose your files. Note that this removes a <br> after any paragraph, not only after paragraphs with the class "myclass".
This isn't a regex, but since you said you're using BBEdit, which is made by Bare Bones and supposedly shares a lot with TextWrangler, it should work. (Otherwise, just download TW for free.) It even gives you a nice pop-up telling you what it found and replaced, in case you want to review it.
See this page for more info on BBEdit's search and other fun features.
Supposing you want to use regex to delete all <br> tags that follow a paragraph with a class named myclass:
Search for: (<p\b[^><]*\sclass\s*=\s*["']?myclass["']?[^><]*>.*?<\/p>\s*)<br\s*/?>
Replace with: $1
Note that you must ensure that all p tags in your HTML documents are properly closed.
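Before running a pattern like this over 700 files, it can be checked on a few samples. The Python sketch below uses the pattern with its "*" quantifiers written out, plus a tightened variant that is an addition, not part of the answer: because .*? can backtrack past the first </p>, the original can also remove a <br> that follows a later, unrelated paragraph, so the variant uses the tempered token (?:(?!</p>).)* to stop at the first </p>. Python's re writes the backreference as \1 rather than $1.

```python
import re

# The answer's pattern, with the "*" quantifiers written out.
ORIGINAL = re.compile(
    r"""(<p\b[^><]*\sclass\s*=\s*["']?myclass["']?[^><]*>.*?<\/p>\s*)<br\s*/?>"""
)

# Tightened variant (an assumption, not from the answer): the paragraph body may
# not contain "</p>", so a match can never run on into a following paragraph.
# DOTALL lets "." cross newlines inside a multi-line paragraph.
TIGHTENED = re.compile(
    r"""(<p\b[^><]*\sclass\s*=\s*["']?myclass["']?[^><]*>(?:(?!</p>).)*</p>\s*)<br\s*/?>""",
    re.DOTALL,
)


def strip_br_after_myclass(html, pattern=TIGHTENED):
    """Delete a <br> that directly follows a <p class="myclass">...</p> element."""
    return pattern.sub(r"\1", html)
```

On '<p class="myclass">A</p><p>B</p><br>', ORIGINAL removes the final <br> even though it follows the unclassed paragraph, while TIGHTENED leaves the string unchanged.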
I am using JSF to generate text and need newlines to make the text easier to read. I have an HTML version which works great; I hacked it together using <br/> (I'm not proud of that, but it works).
I would like to do the same for the plain text version such as inserting \n.
I am doing something like this:
<customTagLibrary:customTag>
<h:outputText value="Exception"/><br/><br/>
...
</customTagLibrary:customTag>
Instead of the <br/>, I want \n. What is the best way to do that?
Please keep in mind that I'm NOT using this to generate content that will be sent to the browser. This will be used to create email messages or (plain-text) attachments in emails.
Thanks,
Walter
If you use Facelets to render HTML, this did the trick for me:
<h:outputText value="&#10;" />
Why not simply wrap it in an HTML <pre> tag?
The h: prefix means html. So if you don't want html, don't use h: tags. Create your own tags or at least renderers for h: tags and let them output \n.
But my personal opinion is that it's better to use another templating technology for emails.
I'm assuming that your template XML strips whitespace. Unfortunately, EL doesn't let you express newlines in string literals, but you could bind to a string that did (<h:outputText value="#{applicationScope.foo.newline}" />). However, since you want to serve multiple markups, this would be a less than ideal approach.
To share JSF templates between different content types, you could 1) remove all markup specific tags from the template and 2) provide RenderKits which would provide a Renderer appropriate for the current markup. This would be the way to serve content using JSF's model-view-presenter design.
You may have to make some decisions about how you handle markup-specific attributes. The default render kit is geared towards rendering the HTML concrete components. Exactly what you do depends on your goals.
I am going to simply write a newline tag. It will detect whether it should output a <br/> or a \n. In my tag library, it would look like this:
<content:newline/>
Walter
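The markup-neutral template idea (one template containing a newline node, with a renderer chosen per content type) can be illustrated outside JSF. This Python sketch is only an analogy for the RenderKit and custom-tag approaches discussed above; Text, Newline and render are hypothetical names, not JSF APIs.

```python
import html
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Newline:
    """Markup-neutral line break, playing the role of <content:newline/>."""


Node = Union[Text, Newline]


def render(nodes: Iterable[Node], content_type: str) -> str:
    """Render the same node list as HTML or as plain text."""
    if content_type == "text/html":
        # HTML output: escape text and emit <br/> for each line break.
        return "".join(
            "<br/>" if isinstance(n, Newline) else html.escape(n.value) for n in nodes
        )
    if content_type == "text/plain":
        # Plain-text output (e-mail bodies, attachments): raw text and "\n".
        return "".join("\n" if isinstance(n, Newline) else n.value for n in nodes)
    raise ValueError(f"unsupported content type: {content_type}")
```

The template itself never mentions <br/> or \n; only the renderer decides, which is the same separation a per-markup RenderKit would provide.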