UNH 2.16 HTML stripper released

Don Hawkinson has just released the UNH 2.16 HTML stripper. UNH is a fully functional OS/2 utility that strips HTML codes from saved Web pages while leaving some formatting intact. It will optionally write the URLs it finds to a second output file. Hawkinson, author of CCA, DH-Grep-PM, PMStripper, Pastry Box, and DH_ClipSave/2, has uploaded the archive UNH216.ZIP (37,31K) onto Pete Norloff's BBS.


UNH is an OS/2 command-line utility that strips HTML codes from files saved from the WebX or other web browsers. If it is run without any arguments, it displays the following message:


usage: unh file1 file2 <file3>
file1 == html file
file2 == stripped text output file
file3 == URLs from html source file - optional


UNH does not check whether the output file already exists, and will overwrite any existing file. UNH is HPFS aware.
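Because UNH overwrites its output files without asking, a caller may want to check for them first. The following is a minimal sketch of such a wrapper, assuming UNH.EXE is on the path as "unh". The function name run_unh and its runner parameter are hypothetical and are not part of the UNH package.

```python
import os
import subprocess


def run_unh(html_file, text_file, url_file=None, runner=subprocess.run):
    """Invoke unh only if none of the output files already exist.

    run_unh and runner are illustrative names, not part of UNH itself.
    """
    outputs = [text_file] + ([url_file] if url_file else [])
    for path in outputs:
        if os.path.exists(path):
            # UNH would silently overwrite this file, so stop here instead.
            raise FileExistsError(f"refusing to overwrite {path}")
    # Argument order follows the usage message: file1 file2 <file3>
    cmd = ["unh", html_file] + outputs
    return runner(cmd, check=True)
```

The runner parameter exists only so the check can be exercised without the real executable; in normal use it can be left at its default.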

UNH does not attempt to recreate the format of the Web page. It does not force any format on the output text, nor does it remove any text formatting that already exists. While the layout of tables and lists is lost during stripping, the data items are placed on separate lines for legibility.
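The general idea of dropping tags, keeping text, starting a new line for list and table items, and collecting URLs can be sketched as follows. This is an illustration built on Python's standard html.parser module, not UNH's actual logic. The names TagStripper, BREAK_TAGS, and strip_html are hypothetical, and the set of tags that trigger a line break is an assumption.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect text and link targets; an illustration, not UNH's code."""

    # Assumed set of tags whose content should start on a new line.
    BREAK_TAGS = {"br", "p", "li", "tr", "td", "th"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []  # pieces of output text, in document order
        self.urls = []   # href/src values found in the source

    def handle_starttag(self, tag, attrs):
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)

    def handle_data(self, data):
        # Text is passed through unchanged; no layout is imposed on it.
        self.parts.append(data)


def strip_html(source):
    """Return (stripped_text, urls) for an HTML string."""
    parser = TagStripper()
    parser.feed(source)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts), parser.urls
```

The sketch makes no attempt to skip script or style content; it only shows how list and table items end up on separate lines once their tags are removed.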

UNH has a filter which translates any embedded NULL characters to spaces. I have no idea why anyone would use NULL characters on a web page, but I have encountered at least one Web site that has done this.
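A filter of this kind amounts to a single byte substitution. The sketch below assumes the page is handled as raw bytes; the function name nulls_to_spaces is hypothetical and is not taken from UNH.

```python
def nulls_to_spaces(data: bytes) -> bytes:
    """Replace every embedded NUL byte (0x00) with an ASCII space."""
    return data.replace(b"\x00", b" ")
```

Substituting a space rather than deleting the byte keeps the length of the data unchanged, so words that were separated only by a NUL do not run together.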

This program is free, but the author retains all rights. See the license.txt file for further information.

The command-line utility UNH.EXE uses the same logic as PMStripper to strip the HTML codes from files.


CONTACT AUTHOR:

Don Hawkinson
dwhawk@southwind.net
http://www2.southwind.net/~dwhawk


@Macarlo, Inc.
@Macarlo's Shareware & Web