Converting a tweet link so that it is embedded in an ASP.NET app - asp.net

I want to convert a tweet URL such as https://twitter.com/LindseyGrahamSC/status/320202348263780352 so that it is automagically embedded. So far the best I have come up with is a short-code style syntax like [tweet:320202348263780352], which then produces the embedded tweet. But I'd really like to just paste the URL in and have it work, and I am stuck on how to achieve that. What I have so far:
var ctxTwitterContext = new TwitterContext(auth);
if (e.Location != ServingLocation.PostList && e.Location != ServingLocation.SinglePost)
    return;

// Find every [tweet:...] short code in the post body.
string tweetPattern = @"\[tweet:.*?\]";
MatchCollection matches = Regex.Matches(e.Body, tweetPattern);
for (int i = 0; i < matches.Count; i++)
{
    int length = "[tweet:".Length;
    string TweetID = matches[i].Value.Substring(length, matches[i].Value.Length - length - 1);

    // Ask Twitter's oEmbed endpoint (via LINQ to Twitter) for the embed HTML.
    var embeddedStatus =
        (from tweet in ctxTwitterContext.Status
         where tweet.Type == StatusType.Oembed &&
               tweet.ID == TweetID
         select tweet.EmbeddedStatus)
        .SingleOrDefault();

    string html = embeddedStatus.Html;
    e.Body = e.Body.Replace(matches[i].Value, html);
}

The status ID is always the last segment of the URL. So, you can do this:
string statusUrl = "https://twitter.com/LindseyGrahamSC/status/320202348263780352";
string tweetID = statusUrl.Substring(statusUrl.LastIndexOf('/') + 1);
Unless I'm misunderstanding something, you can then remove the regex code and the for loop entirely.
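Building on that idea, here is a minimal sketch of how the body could be scanned for pasted status URLs directly, so no short code is needed. The regex pattern and the helper name EmbedTweetUrls are my own assumptions rather than part of the original code; the oEmbed lookup is the same LINQ to Twitter query shown in the question.
// Hypothetical helper: replaces pasted twitter.com status URLs with embed HTML.
// Assumes ctxTwitterContext is an authenticated LINQ to Twitter context.
string EmbedTweetUrls(string body, TwitterContext ctxTwitterContext)
{
    // Capture the numeric status ID, i.e. the last URL segment.
    var urlPattern = new Regex(@"https?://twitter\.com/\w+/status/(\d+)");

    return urlPattern.Replace(body, match =>
    {
        string tweetID = match.Groups[1].Value;

        var embeddedStatus =
            (from tweet in ctxTwitterContext.Status
             where tweet.Type == StatusType.Oembed &&
                   tweet.ID == tweetID
             select tweet.EmbeddedStatus)
            .SingleOrDefault();

        // Fall back to the original URL if the lookup fails.
        return embeddedStatus != null ? embeddedStatus.Html : match.Value;
    });
}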

Related

Extracting text with iText does not work: encoding or encrypted text?

I have a PDF file that has the following security properties: printing: allowed; document assembly: NOT allowed; content copy: allowed; content copy for accessibility: allowed; page extraction: NOT allowed.
I tried to get the text with the sample code from the documentation, as follows:
pdftext.Text = null;
StringBuilder text = new StringBuilder();
PdfReader pdfReader = new PdfReader(filename);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
text.Append(System.Environment.NewLine);
text.Append("\n Page Number:" + page);
text.Append(System.Environment.NewLine);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
progressBar1.Value++;
}
pdftext.Text += text.ToString();
pdfReader.Close();
but the output text consists of lines of garbage such as "??? ? ???????\n?? ??? ? ";
it seems that the file is encrypted or that there is an encoding problem...
Note the results of the following checks:
var f = pdfReader.IsOpenedWithFullPermissions;  // -> FALSE
var f1 = pdfReader.IsEncrypted();               // -> FALSE
var f2 = pdfReader.ComputeUserPassword();       // -> NULL
var f3 = pdfReader.Is128Key();                  // -> FALSE
var f4 = pdfReader.HasUsageRights();            // -> FALSE
f, f1, f3 and f4 return FALSE, so it seems that the document is not encrypted,
...so I don't know whether this is an encoding problem or something related to encrypted strings...
Can someone help me?
Thanks in advance.
G.G.
Whenever you have trouble extracting text from a document using standard code, the first thing to do is to try to copy&paste the text from it using Adobe Acrobat Reader. Adobe Reader copy&paste implements text extraction according to the recommendations of the PDF specification, and if this fails, it usually means that the information required for text extraction is either missing from the document or broken (by accident or by design). To extract the text, one then either needs to customize the code to the particular PDF or resort to OCR.
In the case of the document at hand, Adobe Reader copy&paste results in garbage, too, just like extraction with iText. Thus, there is something fishy in the document.
Inspecting the document one finds that the fonts contain ToUnicode mappings like this:
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo <</Registry (Adobe) /Ordering (Identity) /Supplement 0>> def
/CMapName /F18 def
1 begincodespacerange <0000> <FFFF> endcodespacerange
44 beginbfrange
<20> <20> <0020>
<21> <21> <E0F9>
<22> <22> <E0F1>
<23> <23> <E0FA>
<24> <24> <E0F7>
<25> <25> <E0A3>
<26> <26> <E084>
<27> <27> <E097>
<28> <28> <E098>
<29> <29> <E09A>
<2A> <2A> <E08A>
<2B> <2B> <E099>
<2C> <2C> <E0A5>
<2D> <2D> <E086>
<2E> <2E> <E094>
<2F> <2F> <E0DE>
<30> <30> <E0A6>
<31> <31> <E096>
<32> <32> <E088>
<33> <33> <E082>
<34> <34> <E04C>
<35> <35> <E0A4>
<36> <36> <E0F6>
<37> <37> <E0F2>
<38> <38> <E0D8>
<39> <39> <E0AA>
<3A> <3A> <E06C>
<3B> <3B> <E087>
<3C> <3C> <E095>
<3D> <3D> <E0C4>
<3E> <3E> <E07E>
<3F> <3F> <E055>
<40> <40> <E089>
<41> <41> <E085>
<42> <42> <E083>
<43> <43> <E070>
<44> <44> <E0E6>
<45> <45> <E080>
<46> <46> <E0C8>
<47> <47> <E0F4>
<48> <48> <E062>
<49> <49> <E0F3>
<4A> <4A> <E04E>
<4B> <4B> <E05E>
endbfrange
endcmap CMapName currentdict /CMap defineresource pop end end
In other words, in case you are not familiar with this notation: the fonts claim that all their glyphs (with the exception of the space glyph at 0x20) represent characters U+E0xx from the Unicode private use area. As the name of that area indicates, there is no common meaning for characters with these values.
Thus, text extraction according to the PDF specification will return strings of characters with undefined meaning, with results like those you observed in iText and I saw in Adobe Reader.
Sometimes in such a situation one can still enforce proper text extraction by ignoring the ToUnicode map and using either the font Encoding or information inside the embedded font program.
Unfortunately it turns out that here the Encoding effectively contains the same information as does the ToUnicode map, e.g. for the same font as above
/Differences [ 32 /space /uniE0F9 /uniE0F1 /uniE0FA /uniE0F7 /uniE0A3 /uniE084 /uniE097 /uniE098
/uniE09A /uniE08A /uniE099 /uniE0A5 /uniE086 /uniE094 /uniE0DE /uniE0A6 /uniE096
/uniE088 /uniE082 /uniE04C /uniE0A4 /uniE0F6 /uniE0F2 /uniE0D8 /uniE0AA /uniE06C
/uniE087 /uniE095 /uniE0C4 /uniE07E /uniE055 /uniE089 /uniE085 /uniE083 /uniE070
/uniE0E6 /uniE080 /uniE0C8 /uniE0F4 /uniE062 /uniE0F3 /uniE04E /uniE05E ]
and the fonts turn out to be Type 3 fonts, i.e. there is no embedded font program; each glyph is defined as an individual small PDF canvas without further character information.
Thus, there is nothing to gain here either.
Actually these small PDF canvasses contain inlined bitmap graphics of the respective glyph which also is the cause of the poor graphical quality of the document (if you don't see that immediately, simply zoom in a bit and you'll see the ragged outlines of the glyphs).
By the way, such a construct usually means that the producer of the PDF explicitly wants to prevent text extraction.
If you happen to have to extract text from many such documents, you can try and determine a mapping from their U+E0xx characters to actually sensible Unicode characters and apply that mapping to your extracted text.
If all those fonts in all those documents happen to use the same U+E0xx codepoints for the same actual characters, you'll be able to do text extraction from those documents after investing a certain amount of initial work.
Otherwise do try OCR.
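As a rough illustration of that mapping idea, here is a minimal sketch. It assumes you have already worked out, by manual comparison (for example with the help of the comparison pages generated by the code below), which private-use code point corresponds to which real character; the dictionary entries shown are purely made-up placeholders, not values taken from the actual document.
// Hypothetical post-processing of the extracted text: remap private-use
// code points (U+E0xx) to the characters they visually represent.
// The mappings below are placeholders; the real table has to be built
// by inspecting the document.
static readonly Dictionary<char, char> PuaMap = new Dictionary<char, char>
{
    { '\uE085', 'A' },   // placeholder
    { '\uE083', 'B' },   // placeholder
    { '\uE070', 'C' }    // placeholder
};

static string RemapPrivateUse(string extracted)
{
    var sb = new StringBuilder(extracted.Length);
    foreach (char c in extracted)
    {
        char mapped;
        sb.Append(PuaMap.TryGetValue(c, out mapped) ? mapped : c);
    }
    return sb.ToString();
}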
The following code adds pages to a document which map the ToUnicode values to the characters shown:
void AddFontsTo(PdfReader reader, PdfStamper stamper)
{
int documentPages = reader.NumberOfPages;
for (int page = 1; page <= documentPages; page++)
{
// ignore inherited resources for now
PdfDictionary pageResources = reader.GetPageResources(page);
if (pageResources == null)
continue;
PdfDictionary pageFonts = pageResources.GetAsDict(PdfName.FONT);
if (pageFonts == null || pageFonts.Size == 0)
continue;
List<BaseFont> fonts = new List<BaseFont>();
List<string> fontNames = new List<string>();
HashSet<char> chars = new HashSet<char>();
foreach (PdfName key in pageFonts.Keys)
{
PdfIndirectReference fontReference = pageFonts.GetAsIndirectObject(key);
if (fontReference == null)
continue;
DocumentFont font = (DocumentFont) BaseFont.CreateFont((PRIndirectReference)fontReference);
if (font == null)
continue;
PdfObject toUni = PdfReader.GetPdfObjectRelease(font.FontDictionary.Get(PdfName.TOUNICODE));
CMapToUnicode toUnicodeCmap = null;
if (toUni is PRStream)
{
try
{
byte[] touni = PdfReader.GetStreamBytes((PRStream)toUni);
CidLocationFromByte lb = new CidLocationFromByte(touni);
toUnicodeCmap = new CMapToUnicode();
CMapParserEx.ParseCid("", toUnicodeCmap, lb);
}
catch
{
toUnicodeCmap = null;
}
}
if (toUnicodeCmap == null)
continue;
ICollection<int> mapValues = toUnicodeCmap.CreateDirectMapping().Values;
if (mapValues.Count == 0)
continue;
fonts.Add(font);
fontNames.Add(key.ToString());
foreach (int value in mapValues)
chars.Add((char)value);
}
if (fonts.Count == 0 || chars.Count == 0)
continue;
Rectangle size = (fonts.Count > 10) ? PageSize.A4.Rotate() : PageSize.A4;
PdfPTable table = new PdfPTable(fonts.Count + 1);
table.AddCell("Page " + page);
foreach (String name in fontNames)
{
table.AddCell(name);
}
table.HeaderRows = 1;
float[] widths = new float[fonts.Count + 1];
widths[0] = 2;
for (int i = 1; i <= fonts.Count; i++)
widths[i] = 1;
table.SetWidths(widths);
table.WidthPercentage = 100;
List<char> charList = new List<char>(chars);
charList.Sort();
foreach (char character in charList)
{
table.AddCell(((int)character).ToString("X4"));
foreach (BaseFont font in fonts)
{
table.AddCell(new PdfPCell(new Phrase(character.ToString(), new Font(font))));
}
}
stamper.InsertPage(reader.NumberOfPages + 1, size);
ColumnText columnText = new ColumnText(stamper.GetUnderContent(reader.NumberOfPages));
columnText.AddElement(table);
columnText.SetSimpleColumn(size);
while ((ColumnText.NO_MORE_TEXT & columnText.Go(false)) == 0)
{
stamper.InsertPage(reader.NumberOfPages + 1, size);
columnText.Canvas = stamper.GetUnderContent(reader.NumberOfPages);
columnText.SetSimpleColumn(size);
}
}
}
I applied it to your document like this:
string input = @"4700198773.pdf";
string output = @"4700198773-fonts.pdf";
using (PdfReader reader = new PdfReader(input))
using (FileStream stream = new FileStream(output, FileMode.Create, FileAccess.Write))
using (PdfStamper stamper = new PdfStamper(reader, stream))
{
AddFontsTo(reader, stamper);
}
The additional pages show, for each font on a given page, the ToUnicode code point in the first column and the glyph each font renders for it in the following columns.
Now you have to compare the outputs for the different fonts and pages of this document with each other, and with those of a representative selection of your files. If you find a consistent enough pattern, you can try the replacement approach outlined above.

How to get the last two sections of a URL

When the URL is http://www.example.com/services/product/Software.aspx, I need "product/Software.aspx".
So far I have just tried the code below:
string[] SplitUrls = Request.RawUrl.Split('/');
string CategorynQuery = SplitUrls[SplitUrls.Length - 2] + "/"
                      + SplitUrls[SplitUrls.Length - 1];
However, is there some other way to do this using functions such as IndexOf(), LastIndexOf(), etc., or any other function? Or is it possible using the Substring method?
Please note that the above URL is just an example; there are around 100 such URLs and I need the last two sections of each.
Try this, using LastIndexOf and Substring:
string str = "http://www.example.com/services/product/Software.aspx";
int lastIndexOfSlash = str.LastIndexOf('/');
int secondLastIndex = lastIndexOfSlash > 0 ? str.LastIndexOf('/', lastIndexOfSlash - 1) : -1;
// +1 so the leading '/' itself is not included in the result
string result = str.Substring(secondLastIndex + 1);
I am also checking that a first slash exists before looking for the second-to-last one; obviously you can alter this depending on your requirements :)
You can use the Uri class:
Uri uri = new Uri("http://myUrl/%2E%2E/%2E%2E");
string absoluteUri = uri.AbsoluteUri;
string pathAndQuery = uri.PathAndQuery;
Not too efficient but a little more elegant:
string url = "http://www.example.com/services/product/Software.aspx";
var splitted = url.Split('/').Reverse().Take(2).Reverse().ToList();
var str = string.Format("{0}/{1}", splitted[0], splitted[1]);
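Along the same lines, here is a minimal sketch using Uri.Segments; this is my own variation rather than code from the answers above, and it assumes the URL is absolute.
// Uri.Segments returns "/", "services/", "product/", "Software.aspx"
// for the example URL, so the last two segments are exactly what we need.
Uri uri = new Uri("http://www.example.com/services/product/Software.aspx");
string[] segments = uri.Segments;
string lastTwo = string.Concat(segments[segments.Length - 2], segments[segments.Length - 1]);
// lastTwo == "product/Software.aspx"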

Duplicate values in multiple textbox

How can I validate that the text in multiple textboxes is unique across all of them?
In ASP.NET it would look something like this, but it is not valid syntax:
bool hasNoDuplicate = (txtEmergencyName1.Text.Trim() <> txtEmergencyName2.Text.Trim() <> txtEmergencyName3.Text.Trim <> txtEmergencyName4.Text.Trim);
I am looking for an efficient approach, something like a lambda expression or something built into ASP.NET.
Since you're asking for a lambda, here's a LINQ approach:
var allTxt = new[] { txtEmergencyName1, txtEmergencyName2, txtEmergencyName3, txtEmergencyName4 };
var allText = allTxt.Select((txt, i) => new { Text = txt.Text.Trim(), Pos = i + 1 }).ToList();
bool hasNoDuplicate = !allText.Any(t => allText.Skip(t.Pos).Any(t2 => t.Text == t2.Text));
Put all relevant TextBoxes in a collection like an array and use Enumerable.Any. By skipping all before the current textbox you avoid checking the TextBoxes twice.
If all relevant TextBoxes are in a container control like a Panel, you could also use Enumerable.OfType to find them:
IEnumerable<TextBox> allTxt = this.EmergencyPanel.Controls.OfType<TextBox>();
Side note: it's premature optimization anyway to look for the most performant way to validate a handful of controls. This is not something you do constantly, and there are never millions of controls. Instead, you should look for the shortest or most readable approach.
You can use the AND (&&) and OR (||) operators as needed:
bool isDuplicate = (txtEmergencyName1.Text.Trim() == txtEmergencyName2.Text.Trim()
                 && txtEmergencyName2.Text.Trim() == txtEmergencyName3.Text.Trim());
This will set isDuplicate to true or false.
Edit 1
bool isDuplicate = (txtEmergencyName1.Text.Trim() == txtEmergencyName2.Text.Trim()
                 && txtEmergencyName2.Text.Trim() == txtEmergencyName3.Text.Trim()
                 && txtEmergencyName1.Text.Trim() == txtEmergencyName3.Text.Trim());
You could also do something like
var test = new TextBox();
var AlltBox = new List<TextBox>();
for (int i = 1; i <= 4; i++)
    AlltBox.Add((TextBox)this.FindControl("txtEmergencyName" + i));
bool exist = AlltBox.Any(tb => (tb.Text == test.Text) && tb.ID != test.ID);
but I don't know about the performance.
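For completeness, a shorter sketch of the check the question actually asks for (all four values unique), using Distinct with the textbox names from the question; this is my own variation rather than code from the answers above.
// Collect the trimmed values and compare the distinct count to the total count.
var names = new[]
{
    txtEmergencyName1.Text.Trim(),
    txtEmergencyName2.Text.Trim(),
    txtEmergencyName3.Text.Trim(),
    txtEmergencyName4.Text.Trim()
};
bool hasNoDuplicate = names.Distinct().Count() == names.Length;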

CKEditor to Send Emails with ASP.NET [vb] - Issues with Special Characters

I have a standard HTML page with a CKEditor on it, wrapped in a form.
The form submits (POSTs) to Send_Emails.aspx.
Send_Emails.aspx reads the content of the CKEditor into a variable:
Dim html As String = Request.Form("ck_content")
Then it sends an email.
Problem
Characters such as:
Â -> this seems to show up in place of blank spaces/carriage returns
â€™ -> this seems to show up in place of apostrophes
Can you recommend some methods to cleanse my POST data of these non-standard characters?
Thanks
I figured out how to strip unwanted characters by using this function:
function removeMSWordChars(str) {
    // Map "smart" punctuation code points to their plain ASCII equivalents.
    var myReplacements = new Array();
    myReplacements[8216] = 39;  // left single quote  -> '
    myReplacements[8217] = 39;  // right single quote -> '
    myReplacements[8220] = 34;  // left double quote  -> "
    myReplacements[8221] = 34;  // right double quote -> "
    myReplacements[8212] = 45;  // em dash            -> -
    for (var c = 0; c < str.length; c++) {
        var myCode = str.charCodeAt(c);
        if (myReplacements[myCode] != undefined) {
            var intReplacement = myReplacements[myCode];
            str = str.substr(0, c) + String.fromCharCode(intReplacement) + str.substr(c + 1);
        }
    }
    return str;
}
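If you prefer to do the cleanup server-side rather than in the browser, here is a minimal C# sketch of the same replacement idea (the question uses VB, so treat this purely as an illustration; the character mappings mirror the JavaScript above).
// Hypothetical server-side equivalent of removeMSWordChars:
// swap "smart" punctuation for plain ASCII before building the email.
static string RemoveMsWordChars(string input)
{
    var replacements = new Dictionary<char, char>
    {
        { '\u2018', '\'' },  // left single quote
        { '\u2019', '\'' },  // right single quote
        { '\u201C', '"' },   // left double quote
        { '\u201D', '"' },   // right double quote
        { '\u2014', '-' }    // em dash
    };

    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        char replacement;
        sb.Append(replacements.TryGetValue(c, out replacement) ? replacement : c);
    }
    return sb.ToString();
}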

ASP.NET StreamWriter - new line after x commas

I've got a JS array which is being written to a text file on the server using StreamWriter. This is the line that does it:
sw.WriteLine(Request.Form["seatsArray"]);
At the moment one line is being written out with the entire contents of the array on it. I want a new line to be written after every 5 commas. Example array:
BN,ST,A1,303,601,BN,ST,A2,303,621,BN,WC,A3,303,641,
Should output:
BN,ST,A1,303,601,
BN,ST,A2,303,621,
BN,WC,A3,303,641,
I know I could use a string replace, but I only know how to make this output a new line after every comma, not after a specified number of commas.
How can I get this to happen?
Thanks!
Well, here's the simplest answer I can think of:
string[] bits = Request.Form["seatsArray"].Split(',');
for (int i = 0; i < bits.Length; i++)
{
sw.Write(bits[i]);
sw.Write(",");
if (i % 5 == 4)
{
sw.WriteLine();
}
}
It's not terribly elegant, but it'll get the job done, I believe.
You may want this afterwards to finish off the current line, if necessary:
if (bits.Length % 5 != 0)
{
    sw.WriteLine();
}
I'm sure there are cleverer ways... but this is simple.
One question: are the values always three characters long? Because if so, you're basically just breaking the string up every 20 characters...
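Purely as an illustration of that idea, here is a sketch of fixed-width chunking. It only works under the assumption that every value is exactly three characters, so each group of five values plus commas is 20 characters; note that in the sample data some values are only two characters, so this would not apply as-is.
// Only valid if every value is exactly three characters long,
// making each group of five values (with commas) 20 characters.
string seats = Request.Form["seatsArray"];
for (int i = 0; i < seats.Length; i += 20)
{
    sw.WriteLine(seats.Substring(i, Math.Min(20, seats.Length - i)));
}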
Something like:
var input = "BN,ST,A1,303,601,BN,ST,A2,303,621,BN,WC,A3,303,641,";
var splitted = input.Split(',');
var cols = 5;
var rows = splitted.Length / cols;
var arr = new string[rows, cols];
for (int row = 0; row < rows; row++)
for (int col = 0; col < cols; col++)
arr[row, col] = splitted[row * cols + col];
I will try to find a more elegant solution, probably with some functional style.
Update: I just found out this is not actually what you need. With this you get a 2D array with 3 rows and 5 columns.
This, however, will give you 3 lines. They do not have a trailing ','. Do you want that? Do you always want to print it out, or do you want access to the individual lines?
var splitted = input.Split(new [] { ','}, StringSplitOptions.RemoveEmptyEntries);
var lines = from item in splitted.Select((part, i) => new { part, i })
group item by item.i / 5 into g
select string.Join(",", g.Select(a => a.part));
Or use this rather large piece of code. I have often needed a "Chunk" method, so it may be reusable. I do not know whether there is a built-in "Chunk" method; I couldn't find one.
public static class LinqExtensions
{
    public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
    {
        int i = 0;
        var curr = new List<T>();
        foreach (var x in xs)
        {
            curr.Add(x);
            if (++i % size == 0)
            {
                yield return curr;
                curr = new List<T>();
            }
        }
        // Yield any remaining items so a trailing partial chunk is not lost.
        if (curr.Count > 0)
            yield return curr;
    }
}
Usage:
var lines = input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                 .Chunks(5)
                 .Select(list => string.Join(",", list));
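To connect this back to the question, a small usage sketch (assuming sw is the StreamWriter from the question) that writes each chunked line out with the trailing comma shown in the desired output:
// Write one line per group of five values, keeping the trailing comma
// from the desired output format.
foreach (var line in lines)
{
    sw.WriteLine(line + ",");
}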
