Parsing a set of input files with JavaCC

I can use javacc to parse a single file:
BufferedReader br = new BufferedReader(new FileReader(
pathFile));
if (parser == null)
parser = new MaNouvGrammaire(br);
else
MaNouvGrammaire.ReInit(br);
My question is: how can I parse several input files?

Here is how you do it if you use the STATIC=false option.
Use a loop:
while( <there are more files> ) {
File pathFile = <next file> ;
BufferedReader br = new BufferedReader(new FileReader(pathFile));
MaNouvGrammaire parser = new MaNouvGrammaire(br);
parser.start(); // or whatever your start nonterminal is
}
With the STATIC = true option, I think it is something like what you had:
while( <there are more files> ) {
File pathFile = <next file> ;
BufferedReader br = new BufferedReader(new FileReader(pathFile));
if (parser == null)
parser = new MaNouvGrammaire(br);
else
MaNouvGrammaire.ReInit(br);
parser.start() ; // or whatever your start nonterminal is
}
In addition, make sure that parser is declared as a static member of the class initialized to null.
Both should work. Nonstatic parsers are slightly more straightforward to initialize, as seen above. Also non static parsers work with multithreaded uses and recursive uses (e.g. recursively reading include files).


In Kotlin, how do I read the entire contents of an InputStream into a String?

I recently saw code for reading entire contents of an InputStream into a String in Kotlin, such as:
// input is of type InputStream
val baos = ByteArrayOutputStream()
input.use { it.copyTo(baos) }
val inputAsString = baos.toString()
And also:
val reader = BufferedReader(InputStreamReader(input))
try {
val results = StringBuilder()
while (true) {
val line = reader.readLine()
if (line == null) break
results.append(line)
}
val inputAsString = results.toString()
} finally {
reader.close()
}
And even this that looks smoother since it auto-closes the InputStream:
val inputString = BufferedReader(InputStreamReader(input)).useLines { lines ->
val results = StringBuilder()
lines.forEach { results.append(it) }
results.toString()
}
Or slight variation on that one:
val results = StringBuilder()
BufferedReader(InputStreamReader(input)).forEachLine { results.append(it) }
val resultsAsString = results.toString()
Then this functional fold thingy:
val inputString = input.bufferedReader().useLines { lines ->
lines.fold(StringBuilder()) { buff, line -> buff.append(line) }.toString()
}
Or a bad variation which doesn't close the InputStream:
val inputString = BufferedReader(InputStreamReader(input))
.lineSequence()
.fold(StringBuilder()) { buff, line -> buff.append(line) }
.toString()
But they are all clunky and I keep finding newer and different versions of the same... and some of them never even close the InputStream. What is a non-clunky (idiomatic) way to read the InputStream?
Note: this question is intentionally written and answered by the author (Self-Answered Questions), so that the idiomatic answers to commonly asked Kotlin topics are present in SO.
Kotlin has a specific extension just for this purpose.
The simplest:
val inputAsString = input.bufferedReader().use { it.readText() } // defaults to UTF-8
And in this example, you could decide between bufferedReader() or just reader(). The call to the function Closeable.use() will automatically close the input at the end of the lambda's execution.
If you do this type of thing a lot, you could write this as an extension function:
fun InputStream.readTextAndClose(charset: Charset = Charsets.UTF_8): String {
return this.bufferedReader(charset).use { it.readText() }
}
Which you could then call easily as:
val inputAsString = input.readTextAndClose() // defaults to UTF-8
On a side note, all Kotlin extension functions that require knowing the charset already default to UTF-8, so if you require a different encoding you need to adjust the code above in calls to include encoding for reader(charset) or bufferedReader(charset).
Warning: You might see examples that are shorter:
val inputAsString = input.reader().readText()
But these do not close the stream. Make sure you check the API documentation for all of the IO functions you use to be sure which ones close and which do not. Usually, if they include the word use (such as useLines() or use()) they close the stream after. An exception is that File.readText() differs from Reader.readText() in that the former does not leave anything open and the latter does indeed require an explicit close.
See also: Kotlin IO related extension functions
【Method 1 | Manually Close Stream】
private fun getFileText(uri: Uri):String {
val inputStream = contentResolver.openInputStream(uri)!!
val bytes = inputStream.readBytes() //see below
val text = String(bytes, StandardCharsets.UTF_8) //specify charset
inputStream.close()
return text
}
inputStream.readBytes() requires you to close the stream manually: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io/java.io.-input-stream/read-bytes.html
【Method 2 | Automatically Close Stream】
private fun getFileText(uri: Uri): String {
return contentResolver.openInputStream(uri)!!.bufferedReader().use {it.readText() }
}
You can specify the charset inside bufferedReader(), default is UTF-8:
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io/java.io.-input-stream/buffered-reader.html
bufferedReader() is an upgraded version of reader() and is more versatile:
How exactly does bufferedReader() work in Kotlin?
use() can automatically close the stream when the block is done:
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io/use.html
An example that reads contents of an InputStream to a String
import java.io.File
import java.io.InputStream
import java.nio.charset.Charset
fun main(args: Array<String>) {
val file = File("input"+File.separator+"contents.txt")
var ins:InputStream = file.inputStream()
var content = ins.readBytes().toString(Charset.defaultCharset())
println(content)
}
For Reference - Kotlin Read File
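A quick solution that works well for converting an InputStream to a String (readAllBytes() requires Java 9 or later, and the stream still has to be closed separately):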
val convertedInputStream = String(inputStream.readAllBytes(), StandardCharsets.UTF_8)

Extracting text with iText does not work: encoding problem or encrypted text?

I have a PDF file that has the following security properties: printing: allowed; document assembly: NOT allowed; content copy: allowed; content copy for accessibility: allowed; page extraction: NOT allowed.
I try to get the text with code based on the documentation sample, as follows:
pdftext.Text = null;
StringBuilder text = new StringBuilder();
PdfReader pdfReader = new PdfReader(filename);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
text.Append(System.Environment.NewLine);
text.Append("\n Page Number:" + page);
text.Append(System.Environment.NewLine);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
progressBar1.Value++;
}
pdftext.Text += text.ToString();
pdfReader.Close();
but the output text consists of lines like "??? ? ???????\n?? ??? ? ".
It seems that either the file is encrypted or we have an encoding problem.
Note the results of the following calls:
var f = pdfReader.IsOpenedWithFullPermissions; // FALSE
var f1 = pdfReader.IsEncrypted(); // FALSE
var f2 = pdfReader.ComputeUserPassword(); // NULL
var f3 = pdfReader.Is128Key(); // FALSE
var f4 = pdfReader.HasUsageRights(); // FALSE
f, f1, f3 and f4 return FALSE, so it seems that the document is not encrypted.
So I don't know whether this is an encoding problem or something related to encrypted strings.
Can someone help me?
Thanks in advance.
G.G.
Whenever you have trouble extracting text from a document using standard code, the first thing to do is try and copy&paste the text from it using Adobe Acrobat Reader. Adobe Reader copy&paste implements text extraction according to the recommendations of the PDF specification, and if this fails, this usually means that the necessary information required for text extraction in the document are either missing or broken (by accident or by design). To extract the text, one either needs to customize the code specifically to the specific PDF or resort to OCR.
In case of the document at hand, Adobe Reader copy&paste does result in garbage, too, just like when extracting with iText. Thus, there is something fishy in the document.
Inspecting the document one finds that the fonts contain ToUnicode mappings like this:
/CIDInit /ProcSet
findresource begin 12 dict begin begincmap /CIDSystemInfo<</Registry(Adobe)
/Ordering(Identity)
/Supplement 0
>>
def
/CMapName/F18 def
1 begincodespacerange <0000> <FFFF> endcodespacerange
44 beginbfrange
<20> <20> <0020>
<21> <21> <E0F9>
<22> <22> <E0F1>
<23> <23> <E0FA>
<24> <24> <E0F7>
<25> <25> <E0A3>
<26> <26> <E084>
<27> <27> <E097>
<28> <28> <E098>
<29> <29> <E09A>
<2A> <2A> <E08A>
<2B> <2B> <E099>
<2C> <2C> <E0A5>
<2D> <2D> <E086>
<2E> <2E> <E094>
<2F> <2F> <E0DE>
<30> <30> <E0A6>
<31> <31> <E096>
<32> <32> <E088>
<33> <33> <E082>
<34> <34> <E04C>
<35> <35> <E0A4>
<36> <36> <E0F6>
<37> <37> <E0F2>
<38> <38> <E0D8>
<39> <39> <E0AA>
<3A> <3A> <E06C>
<3B> <3B> <E087>
<3C> <3C> <E095>
<3D> <3D> <E0C4>
<3E> <3E> <E07E>
<3F> <3F> <E055>
<40> <40> <E089>
<41> <41> <E085>
<42> <42> <E083>
<43> <43> <E070>
<44> <44> <E0E6>
<45> <45> <E080>
<46> <46> <E0C8>
<47> <47> <E0F4>
<48> <48> <E062>
<49> <49> <E0F3>
<4A> <4A> <E04E>
<4B> <4B> <E05E>
endbfrange
endcmap CMapName currentdict /CMap defineresource pop end end
I.e., if you are not into this, the fonts claim that all their glyphs (with the exception of the space glyph at 0x20) represent characters U+E0xx from the Unicode private use area. As the name of that area indicates, there is no common meaning of characters with these values.
Thus, text extraction according to the PDF specification will return strings of characters with undefined meaning with results as you observed in iText or I saw in Adobe Reader.
Sometimes in such a situation one can still enforce proper text extraction by ignoring the ToUnicode map and using either the font Encoding or information inside the embedded font program.
Unfortunately it turns out that here the Encoding effectively contains the same information as does the ToUnicode map, e.g. for the same font as above
/Differences [ 32 /space /uniE0F9 /uniE0F1 /uniE0FA /uniE0F7 /uniE0A3 /uniE084 /uniE097 /uniE098
/uniE09A /uniE08A /uniE099 /uniE0A5 /uniE086 /uniE094 /uniE0DE /uniE0A6 /uniE096
/uniE088 /uniE082 /uniE04C /uniE0A4 /uniE0F6 /uniE0F2 /uniE0D8 /uniE0AA /uniE06C
/uniE087 /uniE095 /uniE0C4 /uniE07E /uniE055 /uniE089 /uniE085 /uniE083 /uniE070
/uniE0E6 /uniE080 /uniE0C8 /uniE0F4 /uniE062 /uniE0F3 /uniE04E /uniE05E ]
and the fonts turn out to be Type3 fonts, i.e. there is no embedded font program but each glyph is defined as an individual PDF canvas without further character information.
Thus, nothing to gain here either.
Actually these small PDF canvasses contain inlined bitmap graphics of the respective glyph which also is the cause of the poor graphical quality of the document (if you don't see that immediately, simply zoom in a bit and you'll see the ragged outlines of the glyphs).
By the way, such a construct usually means that the producer of the PDF explicitly wants to prevent text extraction.
If you happen to have to extract text from many such documents, you can try and determine a mapping from their U+E0xx characters to actually sensible Unicode characters and apply that mapping to your extracted text.
If all those fonts in all those documents happen to use the same U+E0xx codepoints for the same actual characters, you'll be able to do text extraction from those documents after investing a certain amount of initial work.
Otherwise do try OCR.
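The remapping idea can be sketched roughly as follows. This is only an illustration in JavaScript, not part of the iText code in this answer: the function name remapPrivateUse is hypothetical, and the letters assigned in puaMap are invented placeholders. The real assignment has to be worked out by looking at the glyphs of the document's fonts.

```javascript
// Hypothetical mapping from private-use code points to real characters.
// The letters on the right are invented for illustration only.
const puaMap = new Map([
  [0xE0F9, 'A'],
  [0xE0F1, 'B'],
]);

// Replace every code point that has an entry in the mapping;
// leave everything else (including unmapped private-use characters) untouched.
function remapPrivateUse(text, mapping) {
  let out = '';
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += mapping.has(codePoint) ? mapping.get(codePoint) : ch;
  }
  return out;
}
```

Unmapped characters are deliberately passed through unchanged, so gaps in the table stay visible in the output instead of being silently dropped.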
The following code adds pages to a document which map the ToUnicode values to the characters shown:
void AddFontsTo(PdfReader reader, PdfStamper stamper)
{
int documentPages = reader.NumberOfPages;
for (int page = 1; page <= documentPages; page++)
{
// ignore inherited resources for now
PdfDictionary pageResources = reader.GetPageResources(page);
if (pageResources == null)
continue;
PdfDictionary pageFonts = pageResources.GetAsDict(PdfName.FONT);
if (pageFonts == null || pageFonts.Size == 0)
continue;
List<BaseFont> fonts = new List<BaseFont>();
List<string> fontNames = new List<string>();
HashSet<char> chars = new HashSet<char>();
foreach (PdfName key in pageFonts.Keys)
{
PdfIndirectReference fontReference = pageFonts.GetAsIndirectObject(key);
if (fontReference == null)
continue;
DocumentFont font = (DocumentFont) BaseFont.CreateFont((PRIndirectReference)fontReference);
if (font == null)
continue;
PdfObject toUni = PdfReader.GetPdfObjectRelease(font.FontDictionary.Get(PdfName.TOUNICODE));
CMapToUnicode toUnicodeCmap = null;
if (toUni is PRStream)
{
try
{
byte[] touni = PdfReader.GetStreamBytes((PRStream)toUni);
CidLocationFromByte lb = new CidLocationFromByte(touni);
toUnicodeCmap = new CMapToUnicode();
CMapParserEx.ParseCid("", toUnicodeCmap, lb);
}
catch
{
toUnicodeCmap = null;
}
}
if (toUnicodeCmap == null)
continue;
ICollection<int> mapValues = toUnicodeCmap.CreateDirectMapping().Values;
if (mapValues.Count == 0)
continue;
fonts.Add(font);
fontNames.Add(key.ToString());
foreach (int value in mapValues)
chars.Add((char)value);
}
if (fonts.Count == 0 || chars.Count == 0)
continue;
Rectangle size = (fonts.Count > 10) ? PageSize.A4.Rotate() : PageSize.A4;
PdfPTable table = new PdfPTable(fonts.Count + 1);
table.AddCell("Page " + page);
foreach (String name in fontNames)
{
table.AddCell(name);
}
table.HeaderRows = 1;
float[] widths = new float[fonts.Count + 1];
widths[0] = 2;
for (int i = 1; i <= fonts.Count; i++)
widths[i] = 1;
table.SetWidths(widths);
table.WidthPercentage = 100;
List<char> charList = new List<char>(chars);
charList.Sort();
foreach (char character in charList)
{
table.AddCell(((int)character).ToString("X4"));
foreach (BaseFont font in fonts)
{
table.AddCell(new PdfPCell(new Phrase(character.ToString(), new Font(font))));
}
}
stamper.InsertPage(reader.NumberOfPages + 1, size);
ColumnText columnText = new ColumnText(stamper.GetUnderContent(reader.NumberOfPages));
columnText.AddElement(table);
columnText.SetSimpleColumn(size);
while ((ColumnText.NO_MORE_TEXT & columnText.Go(false)) == 0)
{
stamper.InsertPage(reader.NumberOfPages + 1, size);
columnText.Canvas = stamper.GetUnderContent(reader.NumberOfPages);
columnText.SetSimpleColumn(size);
}
}
}
I applied it to your document like this:
string input = @"4700198773.pdf";
string output = @"4700198773-fonts.pdf";
using (PdfReader reader = new PdfReader(input))
using (FileStream stream = new FileStream(output, FileMode.Create, FileAccess.Write))
using (PdfStamper stamper = new PdfStamper(reader, stream))
{
AddFontsTo(reader, stamper);
}
The additional pages look like this:
Now you have to compare the outputs for the different fonts and pages of this document with each other and with those of a representative selection of files. If you find a good enough pattern, you can try this replacement approach.

Is it possible to compile a Less file to Less, compressed and without comments?

I have many Less files that are imported into a main.less file. Now I want to make a main.min.less file that includes the imported files, is compressed, and has no comments. The command I used is:
lessc main.less > main.min.less -x
This command compresses the file but does not remove the restricted comments (/*! comments */).
Keep in mind that I want to produce another .less file, not a .css file. Any ideas?
Because Less code is very similar to CSS code, you should be able to (re)use many methods of, for instance, clean-css.
Create a file called clean-less with the following content:
#!/usr/bin/env node
var path = require('path'),
CommentsProcessor = require('clean-css/lib/text/comments'),
fs = require('fs');
var args = process.argv.slice(1);
var input = args[1];
if (input && input != '-') {
input = path.resolve(process.cwd(), input);
}
var parseFile = function (e, data) {
var lineBreak = process.platform == 'win32' ? '\r\n' : '\n';
var commentsProcessor = new CommentsProcessor(0,false,lineBreak);
//single line comments
var regex = /\/\/ .*/;
data = data.replace(/\/{2,}.*/g,"");
/*
methods from clean css, see https://github.com/jakubpawlowicz/clean-css/
*/
var replace = function() {
if (typeof arguments[0] == 'function')
arguments[0]();
else
data = data.replace.apply(data, arguments);
};
//multi line comments
replace(function escapeComments() {
data = commentsProcessor.escape(data);
});
// whitespace inside attribute selectors brackets
replace(/\[([^\]]+)\]/g, function(match) {
return match.replace(/\s/g, '');
});
// line breaks
replace(/[\r]?\n/g, ' ');
// multiple whitespace
replace(/[\t ]+/g, ' ');
// multiple semicolons (with optional whitespace)
replace(/;[ ]?;+/g, ';');
// multiple line breaks to one
replace(/ (?:\r\n|\n)/g, lineBreak);
replace(/(?:\r\n|\n)+/g, lineBreak);
// remove spaces around selectors
replace(/ ([+~>]) /g, '$1');
// remove extra spaces inside content
replace(/([!\(\{\}:;=,\n]) /g, '$1');
replace(/ ([!\)\{\};=,\n])/g, '$1');
replace(/(?:\r\n|\n)\}/g, '}');
replace(/([\{;,])(?:\r\n|\n)/g, '$1');
replace(/ :([^\{\};]+)([;}])/g, ':$1$2');
process.stdout.write(data);
}
if (input != '-') {
fs.readFile(input, 'utf8', parseFile);
}
Then make the clean-less file executable on your system (chmod +x clean-less).
You will probably also need to run npm install clean-css, because the script requires it (path and fs are built into Node.js). After that you should be able to run:
./clean-less input.less > input-clean.less
The input-clean.less file will be more or less minified and will no longer contain comments.
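For the specific problem of the /*! ... */ comments that lessc -x leaves behind, a naive regular-expression pass can be sketched as below. This is an illustration only and not what clean-css does internally. The function name stripBlockComments is hypothetical, and the pattern will also remove comment-like text inside strings or url() values, so check the output before relying on it.

```javascript
// Remove every /* ... */ block comment, including "restricted" /*! ... */ ones.
// [\s\S]*? matches lazily across line breaks up to the first closing */.
function stripBlockComments(lessSource) {
  return lessSource.replace(/\/\*[\s\S]*?\*\//g, '');
}
```

The pass treats /*! and /* the same way, which is the behaviour asked for in the question.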
In the comments @seven-phases-max wonders whether minifying reduces client-side compile time. The client-side compiler loads the Less files via XMLHttpRequest, so reducing the number of bytes will make loading faster. I cannot say whether that is a bottleneck or not.
When I tried the above code with the navbar.less file from Bootstrap, I found:
8.0K navbar-clean.less
16K navbar.less

Batch for downloading most recent file (where filename changes on new version) from http website

I need a batch file for downloading files from an HTTP website (http://www.rarlab.com/download.htm).
From this website I only need the most recent version of the 32-bit and 64-bit English
program, which is always listed at the top of the page.
Problem 1: There are more than these two files available for download on the website.
Problem 2: The name of the file changes with every new version.
How can I download these two files (the most recent version) without knowing the exact file name
(and without first visiting the web page to find out the file name)?
Maybe I can use wget, curl or aria2 for that task, but I don't know the parameters/options.
Can anyone help me solve this problem?
(Please only batch solutions - no vbs, java, jscript, powershell etc.)
Thank you.
Sorry, I forgot to say that I use Windows 7 32-bit. I prefer batch because the script should be able to run on all Windows versions without having to download extra programs or resource kits for different Windows versions (such as PowerShell, which must be downloaded for Windows XP etc.), and because I only understand batch scripting.
Here's a batch + JScript hybrid script. I know you said no vbs, java, jscript, etc, but you're going to have an awfully hard time scraping HTML with pure batch. But this does meet your other criteria -- running on all Windows versions without having to rely on optional software (like powershell or .Net).* And with JScript's XMLHTTP object you don't even need a 3rd party app to fetch web content.
As for not understanding JScript, aside from a few proprietary ActiveX objects it's just like JavaScript. In case you aren't familiar with JavaScript or regular expressions, I added copious amounts of comments to help you out. Hopefully whatever I didn't bother commenting is pretty obvious what it does.
Update
The script now detects the system locale, matches it with a language on the WinRAR download page, and downloads that language release.
Anyway, save this with a .bat extension and run it as you would any other batch script.
@if (@a==@b) @end /*
:: batch script portion
@echo off
setlocal
set "url=http://www.rarlab.com/download.htm"
set /p "savepath=Location to save? [%cd%] "
if "%savepath%"=="" set "savepath=%cd%"
cscript /nologo /e:jscript "%~f0" "%url%" "%savepath%"
goto :EOF
:: JScript portion */
// populate translation from locale identifier hex value to WinRAR language label
// http://msdn.microsoft.com/en-us/library/dd318693.aspx
var abbrev={}, a=function(arr,val){for(var i=0;i<arr.length;i++)abbrev[arr[i]]=val};
a(['1401','3c01','0c01','0801','2001','4001','2801','1c01','3801','2401'],'Arabic');
a(['042b'],'Armenian');
a(['082c','042c'],'Azerbaijani');
a(['0423'],'Belarusian');
a(['0402'],'Bulgarian');
a(['0403'],'Catalan');
a(['7c04'],'Chinese Traditional');
a(['0c04','1404','1004','0004'],'Chinese Simplified');
a(['101a'],'Croatian');
a(['0405'],'Czech');
a(['0406'],'Danish');
a(['0813','0413'],'Dutch');
a(['0425'],'Estonian');
a(['040b'],'Finnish');
a(['080c','0c0c','040c','140c','180c','100c'],'French');
a(['0437'],'Georgian');
a(['0c07','0407','1407','1007','0807'],'German');
a(['0408'],'Greek');
a(['040d'],'Hebrew');
a(['040e'],'Hungarian');
a(['0421'],'Indonesian');
a(['0410','0810'],'Italian');
a(['0411'],'Japanese');
a(['0412'],'Korean');
a(['0427'],'Lithuanian');
a(['042f'],'Macedonian');
a(['0414','0814'],'Norwegian');
a(['0429'],'Persian');
a(['0415'],'Polish');
a(['0816'],'Portuguese');
a(['0416'],'Portuguese Brazilian');
a(['0418'],'Romanian');
a(['0419'],'Russian');
a(['7c1a','1c1a','0c1a'],'Serbian Cyrillic');
a(['181a','081a'],'Serbian Latin');
a(['041b'],'Slovak');
a(['0424'],'Slovenian');
a(['2c0a','400a','340a','240a','140a','1c0a','300a','440a','100a','480a','080a','4c0a','180a','3c0a','280a','500a','0c0a','040a','540a','380a','200a'],'Spanish');
a(['081d','041d'],'Swedish');
a(['041e'],'Thai');
a(['041f'],'Turkish');
a(['0422'],'Ukranian');
a(['0843','0443'],'Uzbek');
a(['0803'],'Valencian');
a(['042a'],'Vietnamese');
function language() {
var os = GetObject('winmgmts:').ExecQuery('select Locale from Win32_OperatingSystem');
var locale = new Enumerator(os).item().Locale;
// default to English if locale is not in abbrev{}
return abbrev[locale.toLowerCase()] || 'English';
}
function fetch(url) {
var xObj = new ActiveXObject("Microsoft.XMLHTTP");
xObj.open("GET",url,true);
xObj.setRequestHeader('User-Agent','XMLHTTP/1.0');
xObj.send('');
while (xObj.readyState != 4) WSH.Sleep(50);
return(xObj);
}
function save(xObj, file) {
var stream = new ActiveXObject("ADODB.Stream");
with (stream) {
type = 1; // binary
open();
write(xObj.responseBody);
saveToFile(file, 2); // overwrite
close();
}
}
// fetch the initial web page
var x = fetch(WSH.Arguments(0));
// make HTML response all one line
var html = x.responseText.split(/\r?\n/).join('');
// create array of hrefs matching *.exe where the link text contains system language
var r = new RegExp('<a\\s*href="[^"]+\\.exe(?=[^\\/]+' + language() + ')', 'g');
var anchors = html.match(r)
// use only the first two
for (var i=0; i<2; i++) {
// use only the stuff after the quotation mark to the end
var dl = '' + /[^"]+$/.exec(anchors[i]);
// if the location is a relative path, prepend the domain
if (dl.substring(0,1) == '/') dl = /.+:\/\/[^\/]+/.exec(WSH.Arguments(0)) + dl;
// target is path\filename
var target=WSH.Arguments(1) + '\\' + /[^\/]+$/.exec(dl)
// echo without a new line
WSH.StdOut.Write('Saving ' + target + '... ');
// fetch file and save it
save(fetch(dl), target);
WSH.Echo('Done.');
}
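To see what the link-matching regular expression does in isolation, here is a small sketch in plain JavaScript that can be run outside of Windows Script Host. The function name findInstallerLinks and the sample HTML are made up for illustration; the real download page may be laid out differently. The lookahead (?=[^\/]+LANGUAGE) only accepts an .exe link if the language label appears before the next slash, i.e. before the closing </a> tag.

```javascript
// Return the href values of all *.exe links whose link text contains `language`.
// `language` is assumed to be a plain label without regex metacharacters.
function findInstallerLinks(html, language) {
  const pattern = new RegExp('<a\\s*href="[^"]+\\.exe(?=[^\\/]+' + language + ')', 'g');
  // String.prototype.match returns null when nothing matches, so fall back to [].
  const anchors = html.match(pattern) || [];
  // Keep only the part after the opening quotation mark of the href.
  return anchors.map(function (anchor) {
    return anchor.slice(anchor.lastIndexOf('"') + 1);
  });
}

// Invented sample markup, one link per language/architecture.
const sampleHtml =
  '<a href="/rar/wrar-de.exe">German 32 bit</a>' +
  '<a href="/rar/wrar-en.exe">English 32 bit</a>' +
  '<a href="/rar/wrar-x64-en.exe">English 64 bit</a>';

console.log(findInstallerLinks(sampleHtml, 'English'));
```

Note the fallback to an empty array: the script above indexes anchors[i] directly, which would fail if the page layout changed and nothing matched.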
Update 2
Here's the same script with a few minor tweaks to have it also detect the architecture (32/64-bitness) of Windows, and only download one installer instead of two:
@if (@a==@b) @end /*
:: batch script portion
@echo off
setlocal
set "url=http://www.rarlab.com/download.htm"
set /p "savepath=Location to save? [%cd%] "
if "%savepath%"=="" set "savepath=%cd%"
cscript /nologo /e:jscript "%~f0" "%url%" "%savepath%"
goto :EOF
:: JScript portion */
// populate translation from locale identifier hex value to WinRAR language label
// http://msdn.microsoft.com/en-us/library/dd318693.aspx
var abbrev={}, a=function(arr,val){for(var i=0;i<arr.length;i++)abbrev[arr[i]]=val};
a(['1401','3c01','0c01','0801','2001','4001','2801','1c01','3801','2401'],'Arabic');
a(['042b'],'Armenian');
a(['082c','042c'],'Azerbaijani');
a(['0423'],'Belarusian');
a(['0402'],'Bulgarian');
a(['0403'],'Catalan');
a(['7c04'],'Chinese Traditional');
a(['0c04','1404','1004','0004'],'Chinese Simplified');
a(['101a'],'Croatian');
a(['0405'],'Czech');
a(['0406'],'Danish');
a(['0813','0413'],'Dutch');
a(['0425'],'Estonian');
a(['040b'],'Finnish');
a(['080c','0c0c','040c','140c','180c','100c'],'French');
a(['0437'],'Georgian');
a(['0c07','0407','1407','1007','0807'],'German');
a(['0408'],'Greek');
a(['040d'],'Hebrew');
a(['040e'],'Hungarian');
a(['0421'],'Indonesian');
a(['0410','0810'],'Italian');
a(['0411'],'Japanese');
a(['0412'],'Korean');
a(['0427'],'Lithuanian');
a(['042f'],'Macedonian');
a(['0414','0814'],'Norwegian');
a(['0429'],'Persian');
a(['0415'],'Polish');
a(['0816'],'Portuguese');
a(['0416'],'Portuguese Brazilian');
a(['0418'],'Romanian');
a(['0419'],'Russian');
a(['7c1a','1c1a','0c1a'],'Serbian Cyrillic');
a(['181a','081a'],'Serbian Latin');
a(['041b'],'Slovak');
a(['0424'],'Slovenian');
a(['2c0a','400a','340a','240a','140a','1c0a','300a','440a','100a','480a','080a','4c0a','180a','3c0a','280a','500a','0c0a','040a','540a','380a','200a'],'Spanish');
a(['081d','041d'],'Swedish');
a(['041e'],'Thai');
a(['041f'],'Turkish');
a(['0422'],'Ukranian');
a(['0843','0443'],'Uzbek');
a(['0803'],'Valencian');
a(['042a'],'Vietnamese');
function language() {
var os = GetObject('winmgmts:').ExecQuery('select Locale from Win32_OperatingSystem');
var locale = new Enumerator(os).item().Locale;
// default to English if locale is not in abbrev{}
return abbrev[locale.toLowerCase()] || 'English';
}
function fetch(url) {
var xObj = new ActiveXObject("Microsoft.XMLHTTP");
xObj.open("GET",url,true);
xObj.setRequestHeader('User-Agent','XMLHTTP/1.0');
xObj.send('');
while (xObj.readyState != 4) WSH.Sleep(50);
return(xObj);
}
function save(xObj, file) {
var stream = new ActiveXObject("ADODB.Stream");
with (stream) {
type = 1; // binary
open();
write(xObj.responseBody);
saveToFile(file, 2); // overwrite
close();
}
}
// fetch the initial web page
var x = fetch(WSH.Arguments(0));
// make HTML response all one line
var html = x.responseText.split(/\r?\n/).join('');
// get OS architecture (This method is much faster than the Win32_Processor.AddressWidth method)
var os = GetObject('winmgmts:').ExecQuery('select OSArchitecture from Win32_OperatingSystem');
var arch = /\d+/.exec(new Enumerator(os).item().OSArchitecture) * 1;
// get link matching *.exe where the link text contains system language and architecture
var r = new RegExp('<a\\s*href="[^"]+\\.exe(?=[^\\/]+' + language() + '[^<]+' + arch + '\\Wbit)');
var link = r.exec(html)
// use only the stuff after the quotation mark to the end
var dl = '' + /[^"]+$/.exec(link);
// if the location is a relative path, prepend the domain
if (dl.substring(0,1) == '/') dl = /.+:\/\/[^\/]+/.exec(WSH.Arguments(0)) + dl;
// target is path\filename
var target=WSH.Arguments(1) + '\\' + /[^\/]+$/.exec(dl)
// echo without a new line
WSH.StdOut.Write('Saving ' + target + '... ');
// fetch file and save it
save(fetch(dl), target);
WSH.Echo('Done.');

How to remove the full file path from YSOD?

In the YSOD below, the stacktrace (and the source file line) contain the full path to the source file. Unfortunately, the full path to the source file name contains my user name, which is firstname.lastname.
I want to keep the YSOD, as well as the stack trace including the filename and line number (it's a demo and testing system), but the username should vanish from the sourcefile path. Seeing the file's path is also OK, but the path should be truncated at the solution root directory.
(without me having to copy-paste the solution every time to another path before publishing it...)
Is there any way to accomplish this ?
Note: Custom error pages aren't an option.
Path is embedded in .pdb files, which are produced by the compiler. The only way to change this is to build your project in some other location, preferably somewhere near the build server.
Never mind, I found it out myself.
Thanks to Anton Gogolev's statement that the path is in the pdb file, I realized it is possible.
One can do a binary search-and-replace on the pdb file, and replace the username with something else.
I quickly tried using this:
https://codereview.stackexchange.com/questions/3226/replace-sequence-of-strings-in-binary-file
and it worked (on 50% of the pdb files).
So beware: the code snippet in the link seems to be buggy.
But the concept seems to work.
I now use this code:
public static void SizeUnsafeReplaceTextInFile(string strPath, string strTextToSearch, string strTextToReplace)
{
byte[] baBuffer = System.IO.File.ReadAllBytes(strPath);
List<int> lsReplacePositions = new List<int>();
System.Text.Encoding enc = System.Text.Encoding.UTF8;
byte[] baSearchBytes = enc.GetBytes(strTextToSearch);
byte[] baReplaceBytes = enc.GetBytes(strTextToReplace);
var matches = SearchBytePattern(baSearchBytes, baBuffer, ref lsReplacePositions);
if (matches != 0)
{
foreach (var iReplacePosition in lsReplacePositions)
{
for (int i = 0; i < baReplaceBytes.Length; ++i)
{
baBuffer[iReplacePosition + i] = baReplaceBytes[i];
} // Next i
} // Next iReplacePosition
} // End if (matches != 0)
System.IO.File.WriteAllBytes(strPath, baBuffer);
Array.Clear(baBuffer, 0, baBuffer.Length);
Array.Clear(baSearchBytes, 0, baSearchBytes.Length);
Array.Clear(baReplaceBytes, 0, baReplaceBytes.Length);
baBuffer = null;
baSearchBytes = null;
baReplaceBytes = null;
} // End Sub ReplaceTextInFile
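Replace firstname.lastname with something that has exactly the same number of characters (and therefore bytes, for ASCII names), for example "Poltergeist" for an 11-character user name. A replacement of a different length would shift the rest of the file and corrupt the pdb.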
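The code above relies on a SearchBytePattern helper that is not shown here. As a rough illustration of the same idea, here is a self-contained sketch in JavaScript (Node.js) instead of C#. The function name replaceBytesSameLength is hypothetical, and it assumes the text is stored as UTF-8, as in the code above; strings stored in another encoding inside the file would not be found.

```javascript
// Overwrite every occurrence of `search` in `buffer` with `replacement`, in place.
// Both strings must encode to the same number of bytes so no offsets shift.
function replaceBytesSameLength(buffer, search, replacement) {
  const searchBytes = Buffer.from(search, 'utf8');
  const replaceBytes = Buffer.from(replacement, 'utf8');
  if (searchBytes.length !== replaceBytes.length) {
    throw new Error('search and replacement must have the same byte length');
  }
  if (searchBytes.length === 0) return 0;

  let count = 0;
  let pos = buffer.indexOf(searchBytes);
  while (pos !== -1) {
    replaceBytes.copy(buffer, pos); // overwrite the match in place
    count++;
    pos = buffer.indexOf(searchBytes, pos + searchBytes.length);
  }
  return count; // number of occurrences replaced
}
```

A file would be processed by reading it with fs.readFileSync, calling the function on the resulting Buffer, and writing the Buffer back with fs.writeFileSync.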
Now I only need to figure out how to run the binary search and replace as a post-build action.
