Write less than greater than to text file in asp vbscript - asp-classic

I am having a very hard time trying to write out an XML file from ASP VBScript to a text file using the Scripting.FileSystemObject. The issue is the less-than and greater-than characters: in order to add these characters to variables in the code, I need to use &lt; and &gt;. This causes a problem when writing the text. The results look like this:
&lt;copyright&gt;request copyright&lt;/copyright&gt;
&lt;lastBuildDate&gt;10/26/2012&lt;/lastBuildDate&gt;
The proper format should be as such:
<copyright>request copyright</copyright>
<lastBuildDate>10/26/2012</lastBuildDate>
Is there some sort of trick to converting those segments while writing the text file, or do I need to do something a bit more extravagant?
Thanks in advance!

When writing in the TextStream, you could just surround your variables with two calls to Replace
TextStream.Write Replace(Replace(myString, "&lt;", "<"), "&gt;", ">")
This way the variables aren't altered, but the written out data uses the right characters.
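As a fuller illustration, here is a minimal end-to-end sketch of that idea, assuming a variable holding the entity-encoded markup and a placeholder output path:
Dim fso, ts, myString
Set fso = CreateObject("Scripting.FileSystemObject")
' The variable holds the markup in entity-encoded form
myString = "&lt;copyright&gt;request copyright&lt;/copyright&gt;"
' True = overwrite the file if it already exists
Set ts = fso.CreateTextFile("C:\feed.xml", True)
' Decode the entities only at write time, leaving the variable untouched
ts.Write Replace(Replace(myString, "&lt;", "<"), "&gt;", ">")
ts.Close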

Try this:
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
objStream.CharSet = "utf-8"   ' set the charset before writing so the text is encoded as UTF-8
objStream.Open
objStream.WriteText "testdata"
objStream.SaveToFile "C:\test.txt", 2   ' 2 = adSaveCreateOverWrite

Related

Exporting tweets text with multiple lines into csv [duplicate]

I need to generate a file for Excel; some of the values in this file contain multiple lines.
There's also non-English text in there, so the file has to be Unicode.
The file I'm generating now looks like this (in UTF-8, with non-English text mixed in and with a lot of lines):
Header1,Header2,Header3
Value1,Value2,"Value3 Line1
Value3 Line2"
Note the multi-line value is enclosed in double quotes, with a normal everyday newline in it.
According to what I found on the web this is supposed to work, but it doesn't, at least not with Excel 2007 and UTF-8 files; Excel treats the 3rd line as the second row of data, not as the second line of the first data row.
This has to run on my customer's machines and I have no control over their version of Excel, so I need a solution that will work with Excel 2000 and later.
Thanks
EDIT: I "solved" my problem by having two CSV options, one for Excel (Unicode, tab separated, no newlines in fields) and one for the rest of the world (UTF8, standard CSV).
Not what I was looking for but at least it works (so far)
You should have space characters at the start of fields ONLY where the space characters are part of the data. Excel will not strip off leading spaces. You will get unwanted spaces in your headings and data fields. Worse, the " that should be "protecting" that line-break in the third column will be ignored because it is not at the start of the field.
If you have non-ASCII characters (encoded in UTF-8) in the file, you should have a UTF-8 BOM (3 bytes, hex EF BB BF) at the start of the file. Otherwise Excel will interpret the data according to your locale's default encoding (e.g. cp1252) instead of utf-8, and your non-ASCII characters will be trashed.
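If the file is generated from classic ASP (this page's context), one hedged way to get that BOM is ADODB.Stream, which writes the EF BB BF prefix automatically when CharSet is "utf-8"; the file name and data below are placeholders:
Dim out
Set out = CreateObject("ADODB.Stream")
out.Type = 2                ' adTypeText
out.CharSet = "utf-8"       ' SaveToFile will emit the EF BB BF BOM for this charset
out.Open
out.WriteText "Header1,Header2,Header3" & vbCrLf
out.WriteText "Value1,Value2,""Value3 Line1" & vbLf & "Value3 Line2""" & vbCrLf
out.SaveToFile "C:\temp\export.csv", 2   ' 2 = adSaveCreateOverWrite
out.Close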
Following comments apply to Excel 2003, 2007 and 2013; not tested on Excel 2000
If you open the file by double-clicking on its name in Windows Explorer, everything works OK.
If you open it from within Excel, the results vary:
You have only ASCII characters in the file (and no BOM): works.
You have non-ASCII characters (encoded in UTF-8) in the file, with a UTF-8 BOM at the start: it recognises that your data is encoded in UTF-8, but it ignores the csv extension and drops you into the Text Import not-a-Wizard, with the unfortunate result that you get the line-break problem.
Options include:
Train the users not to open the files from within Excel :-(
Consider writing an XLS file directly ... there are packages/libraries available for doing that in Python/Perl/PHP/.NET/etc
After lots of tweaking, here's a configuration that works generating files on Linux, reading on Windows+Excel, though the embedded newline format is not according to the standard:
Newlines within a field need to be \n (and obviously quoted in double quotes)
End of record: \r\n
Make sure that you don't start a field with equals, otherwise it gets treated as a formula and truncated
In Perl, I used Text::CSV to do this as follows:
use Text::CSV;
open my $FO, ">:encoding(utf8)", $filename or die "Cannot create $filename: $!";
my $csv = Text::CSV->new({ binary => 1, eol => "\r\n" });
# for each row...:
$csv->print($FO, \@row);
Recently I had a similar problem. I solved it by importing an HTML file; the baseline example would be like this:
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<style>
<!--
br {mso-data-placement:same-cell;}
-->
</style>
</head>
<body>
<table>
<tr>
<td>first line<br/>second line</td>
<td style="white-space:normal">first line<br/>second line</td>
</tr>
</table>
</body>
</html>
I know, it is not a CSV, and might work differently for various versions of Excel, but I think it is worth a try.
I hope this helps ;-)
In Excel 365, while importing the file:
Data -> From Text/CSV -> select the file -> Transform Data.
In the Power Query Editor, on the right-hand side under "Query Settings", under APPLIED STEPS, on the "Source" row, click the settings icon.
In the line-break dropdown, select "Ignore line breaks inside quotes".
Then press OK -> File -> Close & Load.
It is worth noting that when a .CSV file has fields wrapped in double quotes which contain line breaks, Excel will not import the .CSV file properly if the .CSV file is written in UTF-8 format. Excel treats the line break as if it were CR/LF and begins a new line. The spreadsheet is garbled. That seems to be true even if semi-colons are used as field delimiters (instead of commas).
The problem can be resolved by using Windows Notepad to edit the .CSV file, using File > Save As... to save the file, and before saving the file, changing the file encoding from UTF-8 to ANSI. Once the file is saved in ANSI format, then I find that Microsoft Excel 2013 running on Windows 7 Professional will import the file properly.
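If re-saving by hand is impractical, a hedged script equivalent of that Notepad step (VBScript with ADODB.Stream; paths are placeholders) is to read the file as UTF-8 and rewrite it in the local ANSI code page:
Dim src, dst
Set src = CreateObject("ADODB.Stream")
src.Type = 2 : src.CharSet = "utf-8"
src.Open
src.LoadFromFile "C:\temp\in.csv"

Set dst = CreateObject("ADODB.Stream")
dst.Type = 2 : dst.CharSet = "windows-1252"   ' "ANSI" on a Western-European system; chars outside cp1252 are lost
dst.Open
dst.WriteText src.ReadText(-1)                ' -1 = adReadAll
src.Close
dst.SaveToFile "C:\temp\out.csv", 2           ' 2 = adSaveCreateOverWrite
dst.Close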
Newline inside a value seems to work if you use semicolon as separator, instead of comma or tab, and use quotes.
This works for me in both Excel 2010 and Excel 2000. However, surprisingly, it works only when you open the file as a new spreadsheet, not when you import it into an existing spreadsheet using the data import feature.
On a PC, ASCII character #10 (the line feed) is what you want for placing a newline within a value.
Once you get it into Excel, however, you need to make sure word wrap is turned on for the multi-line cells or the newline will appear as a square box.
This will not work if you try to import the file into EXCEL.
Associate the file extension csv with EXCEL.EXE so you will be able to invoke EXCEL by double-clicking the csv file.
Here I place some text, followed by the newline char, followed by some more text, and enclose the whole string in double quotes.
Do not use a CR since EXCEL will place part of the string in the next cell.
""text" + NL + "text""
When you invoke EXCEL, you will see this. You may have to auto size the height to see it all. Where the line breaks will depend on the width of the cell.
2
DATE
Here's the code in Basic
CHR$(34,"2", 10,"DATE", 34)
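For comparison, a hedged VBScript equivalent of that construction (the output path and surrounding fields are placeholders):
' Build a quoted field containing a bare line feed: "2<LF>DATE"
Dim field
field = Chr(34) & "2" & Chr(10) & "DATE" & Chr(34)

Dim fso
Set fso = CreateObject("Scripting.FileSystemObject")
With fso.CreateTextFile("C:\temp\multiline.csv", True)
    .WriteLine "before," & field & ",after"
    .Close
End With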
I found this and it has worked for me
$delimiter = ',';
$enc1 = '"';
$enc2 = '""';
Then, where you need to have stuff enclosed:
$myfile = ('/path/to/myfile.csv');
//erase any previous contents
$fp = fopen($myfile, 'w+');
fwrite($fp, $enc1 . 'Column Heading 1' . $enc1 . $delimiter );
//append to new file
$fp2 = fopen($myfile, 'a');
fwrite($fp2, $enc1 . 'Column Heading 2' . $enc1 . $delimiter );
.....
fwrite($fp2, $enc1 . 'Last Column Heading' . $enc1 . $delimiter. PHP_EOL );
Then, when you need to write something out - like HTML that includes the " character - you can do this:
fwrite($fp2, $enc2 . $myhtmlstring . $enc2 . $delimiter);
New lines (row endings) are produced by appending . PHP_EOL, as in the last fwrite above.
The end of the script prints out a link so that the user can download the file.
echo '<a href="/path/to/myfile.csv">Click here to download file</a>';
Test this; it fully works for me.
Put the following lines in an xxxx.csv file:
hola_x,="este es mi text1"&CHAR(10)&"I sigo escribiendo",hola_a
hola_y,="este es mi text2"&CHAR(10)&"I sigo escribiendo",hola_b
hola_z,="este es mi text3"&CHAR(10)&"I sigo escribiendo",hola_c
Open it with Excel.
In some cases it will open directly; otherwise you will need to use the text-to-columns conversion.
Expand the column width and hit the wrap-text button, or format the cells and activate wrap text.
And thanks for the other suggestions, but they did not work for me. I am in a pure Windows environment and did not want to play with Unicode or other funny things.
This way you are putting a formula from the CSV into Excel, and there may be many uses for this method of work.
(Note the = before the quotes.)
PS: In your suggestions please put some samples of the data, not only the code.
UTF files that contain a BOM will cause Excel to treat new lines literally even if the field is surrounded by quotes. (Tested in Excel 2008 for Mac.)
The solution is to make any new lines a carriage return (CHR 13) rather than a line feed.
putting "\r" at the end of each row actually had the effect of line breaks in excel, but in the .csv it vanished and left an ugly mess where each row was squashed against the next with no space and no line-breaks
For File Open only, the syntax is
,"one\n
two",...
The critical thing is that there is no space after the first ",". Normally spaces are fine, and are trimmed if the string is not quoted, but here they are nasty. It took me a while to figure that out.
It does not seem to matter if the line is ended with \n or \r\n.
Make sure you expand the formula bar so you can actually see the text in the cell (got me after a long day...)
Now of course, File Open will not support UTF-8 properly (unless one uses tricks).
Excel > Data > Get External Data > From Text
It can be set into UTF-8 mode (it is way down the list of encodings). However, in that case the new lines do not seem to work, and I know no way to fix that.
(One might think that after 30 years MS would get this stuff right.)
The way we do it (we use VB.Net) is to enclose the text containing new lines in Chr(34), which is the character representing the double quote, and to replace all CR+LF pairs with a bare LF.
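A hedged sketch of that transformation, written as VBScript for consistency with the rest of this page (the VB.Net version is nearly identical):
Function CsvField(text)
    ' Normalize CRLF to bare LF, then wrap the field in double quotes,
    ' doubling any embedded quotes as CSV requires.
    Dim t
    t = Replace(text, vbCrLf, vbLf)
    t = Replace(t, Chr(34), Chr(34) & Chr(34))
    CsvField = Chr(34) & t & Chr(34)
End Function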
Normally a new line is "\r\n". In my CSV, I replaced "\r" with an empty value.
Here is the code in JavaScript:
cellValue = cellValue.replace(/\r/g, "")
When I opened the CSV in MS Excel, it worked well: if a value has multiple lines, it stays within a single cell in the Excel sheet.
You can do the following: "\"Value3 Line1\nValue3 Line2\"". It works for me when generating a CSV file in Java.
Here is an interesting approach using JavaScript ...
// Note: .partial is not built into JavaScript; a minimal helper:
Function.prototype.partial = function (arg) {
    var fn = this;
    return function () { return fn.call(this, arg); };
};
String.prototype.csv = String.prototype.split.partial(/,\s*/);
var results = ("Mugan, Jin, Fuu").csv();
console.log(results[0] == "Mugan" &&
            results[1] == "Jin" &&
            results[2] == "Fuu",
            "The text values were split properly");
Printing an HTML newline <br/> into the content and opening it in Excel will work fine in any version of Excel.
You could use the keyboard shortcut Alt+Enter.
1. Select the cell you wish to edit.
2. Enter edit mode either by double-clicking it or by pressing F2.
3. Press Alt+Enter. This will create a new line in the cell.

Restrict user input to characters in IBM System i 00280 code page

We need to restrict user input in a classic ASP web site to the characters allowed by the 00280 code page of IBM System i.
Is there a way to do it in a sane way besides having a (JavaScript|VBScript) function checking every character of an input string against a string of allowed characters?
A basic classic ASP function I thought of:
Function CheckInput(text, replacement)
    Dim output : output = ""
    Dim haystack : haystack = "abcd.. " ' Insert here the allowed characters.
    Dim i : i = 0
    For i = 1 To Len(text)
        Dim needle : needle = Mid(text, i, 1)
        If InStr(haystack, needle) = 0 Then
            needle = replacement
        End If
        output = output & needle
    Next
    CheckInput = output
End Function
Would - in my function - a RegExp be overkill?
The short answer to your first question is: no. To your second question: RegEx might not help you here, because not all RegEx implementations in browsers will support the characters you need to test, and neither does the VBScript version of RegEx.
Even using the code approach you are proposing would need some very careful thought. In order to place the set of characters you want to support in a string literal, the codepage that you save the ASP file in would need to be one that covers all the characters needed; alternatively, you would need to use AscW to help you build a string containing those characters.
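For illustration, a hedged sketch of that route, building the haystack from code points with ChrW (the AscW counterpart) so the ASP source itself stays plain ASCII; the code points below are placeholders, not the actual 00280 repertoire:
Function AllowedChars()
    ' Build the allowed-character string from Unicode code points
    ' so the ASP source file needs no special encoding.
    Dim codes, i, s
    codes = Array(65, 66, 67, 224, 232, 236, 242, 249) ' placeholder code points
    s = ""
    For i = 0 To UBound(codes)
        s = s & ChrW(codes(i))
    Next
    AllowedChars = s
End Function
' Usage inside CheckInput: Dim haystack : haystack = AllowedChars()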
One slightly simpler approach would be to use JavaScript and have the page charset and codepage set to UTF-8. This would allow you to create a string literal containing any set of characters.
Since it is generally not considered secure to rely on browser validation, you should consider changing your IBM i (formerly OS/400) application interface to accept UCS-2 data, and perform any necessary validation and conversion at the server side.

Fix Special Characters in String

I've got a program that in a nutshell reads values from a SQL database and writes them to a tab-delimited text file.
The issue is that some of the values in the database have special characters (TM, dash, ellipsis, etc.). When written to the text file, the formatting is lost and they come across as junk ("â„¢" or "â€“", etc.).
When the value is viewed in the immediate window, before it is written to the txt file, everything looks fine. My guess is that this is an issue of encoding, but I'm not really sure how to proceed, where to look, or what to look for.
Is this ASCII or UTF-8? If it's one of those, how do I correct it before it's written to the text file?
Here's how I build the text file (where feedStr is a StringBuilder)
objReader = New StreamWriter(filePath)
objReader.Write(feedStr)
objReader.Close()
The default encoding for StreamWriter is UTF-8 (with no byte order mark). Your result file is OK; the question is what you open it in afterwards. If you open it in a UTF-8 capable text editor, the characters should look the way you want.
You can also write the text file in another encoding, for example iso-8859-1 (latin1):
objReader = New StreamWriter(filePath, false, Encoding.GetEncoding("iso-8859-1"))

ASP Readline non-standard Line Endings

I'm using the ASP Classic ReadLine() function of the File System Object.
All has been working great until someone made their import file on a Mac in TextEdit.
The line endings aren't the same, and ReadLine() reads in the entire file, not just 1 line at a time.
Is there a standard way of handling this? Some sort of page directive, or setting on the File System Object?
I guess that I could read in the entire file, and split on vbLF, then for each item, replace vbCR with "", then process the lines, one at a time, but that seems a bit kludgy.
I have searched all over for a solution to this issue, but the solutions are all along the lines of "don't save the file with Mac[sic] line endings."
Anyone have a better way of dealing with this problem?
There is no way to change the behaviour of ReadLine; it will only recognize CRLF as a line terminator. Hence the only simple solution is the one you have already described.
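For reference, a hedged sketch of that read-all-and-split approach (the file path is a placeholder):
Dim fso, content, lines, i, line
Set fso = CreateObject("Scripting.FileSystemObject")
' Read the whole file, split on LF, and strip stray CRs; this handles CRLF and
' LF-only endings (for CR-only files, you would split on vbCr instead).
content = fso.OpenTextFile("C:\temp\import.txt", 1).ReadAll   ' 1 = ForReading
lines = Split(content, vbLf)
For i = 0 To UBound(lines)
    line = Replace(lines(i), vbCr, "")
    ' ...process line here...
Next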
Edit
Actually there is another library that ought to be available out of the box on an ASP server that might offer some help. That is the ADODB library.
The ADODB.Stream object has a LineSeparator property that can be assigned 10 or 13 to override the default CRLF it would normally use. The documentation is patchy because it doesn't describe how this can be used with ReadText. You can get the ReadText method to return the next line from the stream by passing -2 as its parameter.
Take a look at this example:-
Dim sLine
Dim oStreamIn : Set oStreamIn = CreateObject("ADODB.Stream")
oStreamIn.Type = 2 ' adTypeText
oStreamIn.Open
oStreamIn.CharSet = "Windows-1252"
oStreamIn.LoadFromFile "C:\temp\test.txt"
oStreamIn.LineSeparator = 10 ' adLF (line feed)
Do Until oStreamIn.EOS
    sLine = oStreamIn.ReadText(-2) ' -2 = adReadLine
    ' Do stuff with sLine
Loop
oStreamIn.Close
Note that by default the CharSet is Unicode, so you will need to assign the correct CharSet for the file if it's not Unicode. I use the word "Unicode" in the sense that the documentation does, which actually means UTF-16. One advantage here is that ADODB.Stream can handle UTF-8, unlike the Scripting library.
BTW, I thought Macs used a CR for line endings? It's the Unix file format that uses LFs, isn't it?

Character Support Issue - How to Translate Higher ASCII Characters to Lower ASCII Characters

So I have an ASP.Net (vb.net) application. It has a textbox and the user is pasting text from Microsoft Word into it. So things like the long dash (charcode 150) are coming through as input. Other examples would be the smart quotes or accented characters. In my app I'm encoding them in xml and passing that to the database as an xml parameter to a sql stored procedure. It gets inserted in the database just as the user entered it.
The problem is that the app that reads this data doesn't like these characters, so I need to translate them into the lower-ASCII (7-bit, I think) character set. How do I do that? How do I determine what encoding they are in, so I can do something like the following? And would just requesting the ASCII equivalent translate them intelligently, or do I have to write some code for that?
Also, maybe it would be easier to solve this problem in the web page to begin with. When you copy a selection of characters from Word, it puts several formats on the clipboard; the straight-text one is the one I want. Is there a way to have the HTML textbox get that text when the user pastes into it? Do I have to set the encoding of the web page somehow?
System.Text.Encoding.ASCII.GetString(System.Text.Encoding.GetEncoding(1251).GetBytes(text))
Code from the app that encodes the input into xml:
Protected Function RequestStringItem( _
    ByVal strName As System.String) As System.String
    Dim strValue As System.String
    strValue = Me.Request.Item(strName)
    If Not (strValue Is Nothing) Then
        RequestStringItem = strValue.Trim()
    Else
        RequestStringItem = ""
    End If
End Function
' I get the input from the textboxes into an array like this
m_arrInsertDesc(intIndex) = RequestStringItem("txtInsertDesc" & strValue)
m_arrInsertFolder(intIndex) = RequestInt32Item("cboInsertFolder" & strValue)
' create xml file for inserts
strmInsertList = New System.IO.MemoryStream()
wrtInsertList = New System.Xml.XmlTextWriter(strmInsertList, System.Text.Encoding.Unicode)
' start document and add root element
wrtInsertList.WriteStartDocument()
wrtInsertList.WriteStartElement("Root")
' cycle through inserts
For intIndex = 0 To m_intInsertCount - 1
' if there is an insert description
If m_arrInsertDesc(intIndex).Length > 0 Then
' if the insert description is of the appropriate length
If m_arrInsertDesc(intIndex).Length <= 96 Then
' add element to xml
wrtInsertList.WriteStartElement("Insert")
wrtInsertList.WriteAttributeString("insertdesc", m_arrInsertDesc(intIndex))
wrtInsertList.WriteAttributeString("insertfolder", m_arrInsertFolder(intIndex).ToString())
wrtInsertList.WriteEndElement()
' if insert description is too long
Else
m_strError = "ERROR: INSERT DESCRIPTION TOO LONG"
Exit Function
End If
End If
Next
' close root element and document
wrtInsertList.WriteEndElement()
wrtInsertList.WriteEndDocument()
wrtInsertList.Close()
' when I add the xml as a parameter to the stored procedure I do this
cmdAddRequest.Parameters.Add("@insert_list", OdbcType.NText).Value = System.Text.Encoding.Unicode.GetString(strmInsertList.ToArray())
How big is the range of these input characters? 256? (Each char fits into a single byte.) If that's true, it wouldn't be hard to implement a 256-value lookup table. I haven't toyed with BASIC in years, but basically you'd DIM an array of 256 bytes and fill in the array with the translated values, i.e. the 'a'th byte would get 'a' (since it's OK as is) but the 150'th byte would get a hyphen.
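A hedged sketch of that lookup-table idea, written as VBScript for consistency with this page; the mappings below are examples for cp1252, not a complete table:
Dim lookup(255), i

' Identity mapping by default...
For i = 0 To 255
    lookup(i) = Chr(i)
Next

' ...then override the troublesome cp1252 positions (examples only)
lookup(150) = "-"        ' en dash -> hyphen
lookup(151) = "-"        ' em dash -> hyphen
lookup(145) = "'"        ' left single quote
lookup(146) = "'"        ' right single quote
lookup(147) = Chr(34)    ' left double quote
lookup(148) = Chr(34)    ' right double quote

Function Translate(text)
    Dim j, s
    s = ""
    For j = 1 To Len(text)
        s = s & lookup(Asc(Mid(text, j, 1)))
    Next
    Translate = s
End Function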
This seems to work for converting the long dash to a short dash and smart quotes to regular quotes, as my HTML page has the following content type. But it converts all the accented characters to question marks, which is not what the Text version of the clipboard has. So I'm closer; I just think I have the target encoding wrong.
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
System.Text.Encoding.ASCII.GetString(System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(m_arrFolderDesc(intIndex)))
Edit: I found the correct target encoding for my purposes, which is 1252.
System.Text.Encoding.GetEncoding(1252).GetString(System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(m_arrFolderDesc(intIndex)))
If you convert to a non-unicode character set, you will lose some characters in the process. If the legacy app reading the data doesn't need to do any string transformations, you might want to consider using UTF-7, and converting it back once it gets back into the unicode world - this will preserve all special characters.
