html2sgml — convert html to sgml according to linuxdoc.dtd
html2sgml is a file converter that converts html-files to sgml-files according to linuxdoc.dtd. It will output a file with the same name as the specified file, but with the ending html changed to sgml.
It will not work on every html-file, because html is such a free format. It is tuned to work well with html produced by the Applix HTML editor. If it finds an Applix Words file in the same directory and with the same name as the specified file, it will include any footnotes from the aw-file in the produced sgml-file.
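The naming rules described above can be sketched as follows. This is only an illustration, not code taken from html2sgml: the function name derived_paths is hypothetical, and the .aw extension for the Applix Words file is an assumption based on the term "aw-file".

```python
from pathlib import PurePosixPath


def derived_paths(html_file: str):
    """Return (sgml output path, expected Applix Words path) for an html-file.

    Hypothetical helper: both paths keep the directory and base name of the
    input and only swap the ending. The ".aw" ending is an assumption.
    """
    source = PurePosixPath(html_file)
    return source.with_suffix(".sgml"), source.with_suffix(".aw")


if __name__ == "__main__":
    sgml_path, aw_path = derived_paths("docs/guide.html")
    print(sgml_path)
    print(aw_path)
```

Note that the sketch replaces whatever ending the file has; how html2sgml itself treats files that do not end in html is not stated here.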
html2sgml will also try to convert all included images of type gif to postscript.
By default html2sgml produces a document of type article. To change it to book you can use the script mkbook. It also fills in a dummy name. If there is a title tag in the html-file, it will use that as the title of the sgml-file. To change this you have to hand edit the sgml-file.
If there is more than one H1 tag, H1 is used as the top-level section: everything marked H1 will become a sect in sgml, H2 will become sect1, and so forth. If there is only one H1 or none, H2 will be used as the top level instead. If there are no H* tags, then the document is broken by design :-)
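The heading rule above can be sketched like this. It is a minimal illustration of the stated mapping, not the actual html2sgml source; the function names are hypothetical. What happens to a single H1 when H2 is the top level is not described here, so the sketch returns None for that case.

```python
import re


def count_h1(html: str) -> int:
    """Count opening H1 tags, ignoring case."""
    return len(re.findall(r"<h1\b", html, flags=re.IGNORECASE))


def sect_name(level: int, h1_count: int):
    """Map an html heading level (1 for H1, 2 for H2, ...) to a section tag.

    H1 is the top level only when the document has more than one H1;
    otherwise H2 is the top level.
    """
    top = 1 if h1_count > 1 else 2
    depth = level - top
    if depth < 0:
        # A lone H1 above the top level: handling is not specified, so
        # this sketch maps it to nothing.
        return None
    return "sect" if depth == 0 else f"sect{depth}"


if __name__ == "__main__":
    page = "<H1>One</H1><h2>Sub</h2><h1>Two</h1>"
    n = count_h1(page)
    for level in (1, 2, 3):
        print(f"H{level} -> {sect_name(level, n)}")
```

The sketch does not check whether the resulting nesting depth is allowed by linuxdoc.dtd.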
The resulting sgml-file can then be processed by sgml-tools (formerly linuxdoc-sgml) to produce various other file formats, e.g. latex, info, rtf.
html2sgml should work fine with straight html, that is, html where no special layout formatting has been done. For example, it can handle html table tags, but it cannot handle them well if they are used to produce layout.
It works best with Applix html. You can either write directly in Applix Words or import a document into Applix Words. Try to use predefined styles for your document; you can create heading1, heading2, pre, quote and so forth. Open Applix HTML and use File->Import words document. You will then get the chance to tell Applix which html-tags your defined styles should match, e.g. heading1 -> html_h1. Then use Format -> HTML document setting, where you can fill in the title; here you can also select the option to export Applix images as gif files. This is worth doing, because html2sgml can convert the gif files to ps-files, which can be used when/if converting to latex.
html2sgml is still under development and will most probably contain bugs. It also contains some features. Not all possible html and sgml tags are implemented. Unimplemented html tags will show up in the sgml-file, where you have to hand edit them away. Some sgml tags are also unsupported. More specifically: no math tags are implemented. You can check the resulting sgml-file with the command sgmlcheck to discover any leftover tags.
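As a rough aid for the hand editing mentioned above, leftover html tags can be located with a small script before running sgmlcheck. This is a sketch only: the function name and the list of tag names are hypothetical, and the list contains just a few html elements assumed not to exist in linuxdoc.dtd. sgmlcheck remains the real test.

```python
import re

# Illustrative only: a few html element names assumed not to be
# linuxdoc elements. Extend the set to suit your own documents.
HTML_ONLY_TAGS = {"font", "center", "div", "span"}


def leftover_html_tags(sgml_text: str):
    """Return (line_number, tag_name) pairs for suspected leftover html tags."""
    hits = []
    for lineno, line in enumerate(sgml_text.splitlines(), start=1):
        # Match both opening and closing tags and capture the element name.
        for match in re.finditer(r"</?([A-Za-z][A-Za-z0-9]*)", line):
            name = match.group(1).lower()
            if name in HTML_ONLY_TAGS:
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    sample = "<sect>Intro\n<p>Hello <FONT size=2>world</FONT>\n"
    for lineno, name in leftover_html_tags(sample):
        print(f"line {lineno}: <{name}>")
```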
I have concentrated on making it work in English and in Swedish. This means that there are a lot of characters that probably will not work OK, especially when converting Applix footnotes. Look in the source and try to put in the missing characters if you have any problems. And please send the new improved version to me.
Peter Antman (firstname.lastname@example.org)
sgml2latex(1), sgml2html(1), sgml2txt(1), sgml2info(1), sgml2rtf(1), sgml2lyx(1)