UNH 2.16 HTML stripper released
Don Hawkinson has just
released the UNH 2.16 HTML stripper. UNH is a fully functional OS/2 utility that strips HTML codes from
saved Web pages while leaving some formatting intact. It can optionally write the URLs
it finds to a second output file. Hawkinson, author of CCA, DH-Grep-PM, PMStripper, Pastry
Box, and DH_ClipSave/2, has uploaded the archive UNH216.ZIP (37,31K) onto Pete Norloff's
BBS.
UNH is an OS/2 command-line utility that strips HTML codes from
files saved from WebX or other web browsers. If it is executed without any
arguments, the following message is displayed:
usage: unh file1 file2 <file3>
file1 == html file
file2 == stripped text output file
file3 == URLs from html source file - optional
UNH does not check for the existence of the output file and will overwrite any
existing file. UNH is HPFS aware.
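Because UNH overwrites its output without asking, a caller may want to guard against that before invoking it. The following is only a minimal sketch in Python, not part of UNH: the helper names `build_unh_command` and `run_unh` are hypothetical, the executable is assumed to be reachable as `unh`, and the optional URL file is assumed to be overwritten in the same way as the text file.

```python
import os
import subprocess


def build_unh_command(html_file, text_file, url_file=None, allow_overwrite=False):
    """Build the argument list 'unh file1 file2 <file3>'.

    Hypothetical helper: refuses to continue if an output file already
    exists, since UNH itself would silently overwrite it.
    """
    outputs = [text_file]
    if url_file:
        outputs.append(url_file)
    if not allow_overwrite:
        for path in outputs:
            if os.path.exists(path):
                raise FileExistsError(f"{path} already exists; unh would overwrite it")
    return ["unh", html_file] + outputs


def run_unh(html_file, text_file, url_file=None, allow_overwrite=False):
    """Run UNH (assumed to be on the PATH) after the overwrite check."""
    cmd = build_unh_command(html_file, text_file, url_file, allow_overwrite)
    return subprocess.run(cmd, check=True)
```

Calling `run_unh("page.htm", "page.txt", "urls.txt")` would then fail early if either output file is already present, instead of losing its contents.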
UNH does not attempt to recreate the format of the Web page. UNH does not attempt
to force any format on the output text, nor does it attempt to remove any existing
text format. While the layout of tables and lists is lost during stripping, the data
is placed on separate lines for legibility.
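The general idea of dropping tags, keeping the text, and collecting URLs separately can be illustrated with a short Python sketch. This is not UNH's or PMStripper's actual logic, which is not described here; it uses Python's standard `html.parser` module, and the class name, the function name, and the set of tags that start a new line are all assumptions made for illustration. Unlike UNH, the sketch trims whitespace and drops blank lines to keep the output simple.

```python
from html.parser import HTMLParser

# Assumed set of tags that begin a new output line (illustrative only).
LINE_BREAK_TAGS = {"br", "p", "div", "li", "tr", "td", "th",
                   "h1", "h2", "h3", "h4", "h5", "h6"}


class TextAndLinkExtractor(HTMLParser):
    """Collect text content and href URLs while discarding the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []  # pieces of output text
        self.urls = []    # href values found in <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in LINE_BREAK_TAGS:
            self.chunks.append("\n")
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    """Return (stripped_text, urls) for an HTML string."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.chunks).splitlines()]
    text = "\n".join(line for line in lines if line)
    return text, parser.urls
```

List items and table cells end up on their own lines, so the list or table layout is lost but each piece of data stays readable. The text and the URL list could then be written to two separate files, mirroring UNH's `file2` and `file3` arguments.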
UNH has a filter which translates any embedded NULL characters to spaces. I have
no idea why anyone would use NULL characters on a web page, but I have encountered
at least one Web site that has done this.
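A filter of this kind is simple to express. The Python sketch below is an illustration of the idea only, not UNH's own code, and the function name `nulls_to_spaces` is hypothetical.

```python
def nulls_to_spaces(data: bytes) -> bytes:
    """Replace every embedded NUL byte (0x00) with an ASCII space."""
    return data.replace(b"\x00", b" ")
```

Working on raw bytes means the filter can run before any decoding or tag stripping takes place.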
This program is free, but the author retains all rights. See the license.txt
file for further information.
The command-line utility UNH.EXE uses the same logic as PMStripper to strip the
HTML codes from files.
CONTACT AUTHOR:
Don Hawkinson
dwhawk@southwind.net
http://www2.southwind.net/~dwhawk
@Macarlo, Inc.
@Macarlo's Shareware & Web