Cover Notes

By The Rectifier

Using this JavaScript Manual

For portability, this 'edition' of the manual is made available with all files zipped into one JS_MAN.ZIP file. To view it, unzip all files into an empty directory and do a Netscape Navigator (Ver 2.0 or later) 'Open File' on !_INDEX.HTM. Please distribute it in this zipped form; merely copying the files risks losing some, and you won't know until you hit a broken link.

In the original manual, all files except one used DOS 8.3 short filenames. For maximum compatibility with older Zip utilities, this distribution has that file (COLORS.CLASS) renamed to COLORS.CLA. Only one small item in the manual won't work unless you rename that file back to COLORS.CLASS (eg REN COLORS.CLA COLORS.CLASS at a DOS prompt). The requirement for this long filename (LFN) is built into Netscape Navigator for Win95, which requires Java applet files to have the suffix ".CLASS".

Foreword

24th August, 1997: Something rather surprising just happened, and it has added considerably to my cynicism regarding the wonders of the HTML format. It all began when I attempted to download a large HTML document (this 'JavaScript Guide') from Netscape's www.netscape.com. I'm studying Java and related networkish things, and wanted a copy of the JavaScript manual on my own PC for convenient browsing. I also wanted to be able to archive it myself, so I could always count on having it: intact, entire, and free of the time and money costs of downloading. This foreword is a detailed record of the events that followed, and a final summary of my comments and conclusions. You may wish to skip the boring details and read the summary only. If you want the whole story, read on.

Naturally, like most HTML objects of any real size and complexity, the JavaScript manual turned out to consist of numerous separate pieces (files), with no way to obtain the complete work other than manually following all the links and saving each document separately. As usual, the browser's (Netscape 3.3 Gold) cache is useless for permanent archiving, since all the files there are hash-named, mixed in with unrelated documents, and liable to be purged. I am aware that there are browsers around that will automatically fetch all the links they find. I will be looking into these, and into how (if?) they differentiate between links to other parts of the same document and links elsewhere. But for my current purposes, what's important is how the web and HTML present to the majority of users. Since Netscape and MS Explorer are by far the predominant browsers, it's most interesting to me to see what can be achieved using them alone.

Anyway, I downloaded the JavaScript manual file by file, doing 'Save as..' for each one. Problems encountered:

#1. It's a document utilising JavaScript and dynamic frames. Netscape won't allow editing or saving of the master frame file. So I found that file in the cache and saved it (renamed) manually. As for possible reasons _why_ it won't allow this, more later.

#2. The document includes many small image files, such as the capital letters at the start of chapters and various examples of screens. There is no simple way to be sure of saving all the components.

#3. When you do a 'Save as..' in Netscape, the HTML file is actually written out in a corrupted form. The 'End Of Line' (EOL) codes in the original HTML source are generally a two-byte 0x0D, 0x0A sequence (ie CR, LF). This is the normal EOL form for DOS-originated files. When Netscape writes out HTML files, it prefixes another 0x0D (CR) code, resulting in EOLs that are three bytes long and contain _two_ CRs. This not only increases the file size, but also makes such files display with a blank line between every text line in typical editors. If this is an accidental bug, it is an astonishing one. Perhaps the text editors they use at Netscape don't show the extra CRs? Some editors don't. But surely someone would have bothered to look at the files with a hex-display editor? Personally, I think this is a deliberate 'feature'. More on this later.
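The repair itself is trivial once you know what to look for. Here is a minimal sketch of the transform, in JavaScript itself, purely to illustrate the byte logic; browser JavaScript can't actually read or write local files, so the string here merely stands in for a file's contents:

    // Collapse Netscape's doubled CRs (0x0D 0x0D 0x0A) back into
    // normal DOS line endings (0x0D 0x0A).
    function fixEOLs(text) {
        var out = "";
        for (var i = 0; i < text.length; i++) {
            // Drop a CR that is immediately followed by CR, LF.
            if (text.charAt(i) == "\r" &&
                text.charAt(i + 1) == "\r" &&
                text.charAt(i + 2) == "\n") {
                continue;
            }
            out += text.charAt(i);
        }
        return out;
    }

    // For example, fixEOLs("one\r\r\ntwo\r\r\n") returns "one\r\ntwo\r\n".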
#4. The cache files are uncorrupted (ie they are byte-for-byte images of the originals), but of course they have meaningless filenames. The correspondence between hashed filename and true filename is maintained in a database file called FAT.DB in the cache, but this is predominantly a binary file. There seems to be no way to get Netscape to tell you the correspondences, or to generate correctly named copies of its cache contents in another directory. You might think it would have been possible to use correctly named files in the cache _unless_ there was a name conflict. But no, and as for why, more later.

At this point I decided to exit Netscape while I pottered about in the cache directory. So I checked the 'Options: Network: Cache' settings, to be sure they didn't suggest files would be wiped on exit. Looked OK: no 'clear cache' control set, plenty of disk cache room, and 'refresh: never'. I exited Netscape. All the .HTM files in the cache vanished! Arrrgh! Nearly an hour of downloading wiped.

Logged on again, and spent another tedious interval downloading the entire document set again. This time, before exiting Netscrape, I used Explorer to create a new subdirectory under the cache directory, and copied all the files in the cache to this location. Hah! Let it delete its cache now, I thought! I left the Explorer windows open on my copy directory and the original cache, and exited Netscape. Sounds of disk access, and _both_ sets of files disappeared from the windows. Sheeeit! What is going on here? Not being able to tell Netscape to _keep_ its cache is pretty stupid, but for it to go _hunting_ for copies of its cache, and kill them, is _perverse_!

So, time to get serious. I entered Netscape and wiped the entire cache (which by now contained only .GIF files anyway, plus the database file). Verified it was empty with Explorer. (Interesting question: why does Netscape wipe the .HTM files, but not the .GIF files?) Downloaded the whole thing again, watching the files appear in the cache. Naturally, most .HTMs had one or more .GIFs associated. This will be fun to patch together - not! Then, with Netscape still online:

- Copied the whole cache to a different hard disk, C:\N1 (using Explorer).
- Zipped the content of C:\N1 to N1.ZIP and copied two images of the zip to A:
- Tried to use my old XTreeGold (well, I trust it more than Win 95) to copy the whole cache to C:\N2, but got a "sharing violation". It turns out Win 95 won't let XTree look at the FAT.DB file. It copied the rest OK.
- Used XTree to copy all files (less FAT.DB) to CACHE\TEMP.
- Used Explorer to copy all files to CACHE\TEMP2.

Then I exited Netscape. In the TEMP and TEMP2 subdirectories of CACHE, _all_ the files got deleted, as did all the .HTM files in CACHE. So Netscape wipes not only its cache, but its subdirectories too. The other copies (on C:) were untouched.

I had been considering the faint possibility that Win95 might be keeping some record of what got copied where, and using this to allow a 'delete all copies' facility. I'm sure the copyright enforcement types would love such a thing, and Microsoft is certainly partial to building 'establishment' type features into their software. However, the survival of the Explorer-generated copies dismissed that paranoid thought, thankfully.

I guess it was the annoyance of having to download the whole thing three times over that decided me to go to the trouble of binding this manual up properly for the convenience of others, and to write this foreword about the experience. Also the irony of Netscape's own manual being so difficult to capture. You'd think they would want people to read it, and would want as many copies in circulation as possible. To put it up on their site, but in a form that makes it most difficult to actually read off-line, is really absurd. A contradiction begging to be 'rectified'.
From then on, the creation of this stand-alone copy of the JavaScript manual involved mainly restoring the true names of the files, so that the inter-file links would work. To do this I opened the FAT.DB file in a binary editor and extracted the associations between the hashed cache filenames and the true file paths/names. Fortunately these were all embedded in the binary trash in plain ASCII form (the first sketch below shows the gist of that harvesting). Then I removed superfluous files, such as the many instances of small .GIFs used for capital letters in the text, and modified the JavaScript frame-control files that handle the contents/index switching buttons. I also flattened all references to other directories, and all absolute URLs.

I added the front cover and this foreword, and in the process found that there is a bug in the JavaScript files controlling the appearance of Netscape's own JavaScript manual! What's more, the bug seems to be fairly fundamental to the language itself. Cool. What an advertisement.

To see the bug: open the manual in Navigator, with an un-maximised window. Click the 'Show Index' button in the manual's button bar. The same button now says 'Show Contents'. Then maximise the window. Note that the button now says 'Show Index' again! Click it. Now it says 'Show Contents' again, and you still see the index in the left frame.

What's going on here is that Navigator re-executes the scripts every time it refreshes the window, and the scripts re-initialise their state variables every time they run. The button bar's appearance is controlled by those state variables, but the actual frame content is controlled by some temporary images of the HTML script files in the cache, and these get updated (overwritten) only when you click one of the buttons. Hence the recorded state and the true state get out of sync during window refreshes (the second sketch below reproduces the pattern). In Netscape's original manual this fault is present but easily missed. I only noticed it when I added the 'Book Cover' stuff, which really shows up the bug, since the buttons change dramatically when you maximise. I've been trying to find a way around this, and have a few ideas, but so far everything I've tried has been thwarted by fundamental aspects of JavaScript.
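First, the name harvesting. It amounts to what the Unix 'strings' utility does: scan a binary file and keep any run of printable ASCII. A minimal sketch in JavaScript, with the raw FAT.DB bytes assumed to be available as a string; the database format itself is undocumented, so this is a scavenger, not a parser:

    // Harvest printable-ASCII runs of at least minLen characters from
    // binary data. The hashed-name/true-name pairs show up among them.
    function extractStrings(data, minLen) {
        var results = [];
        var run = "";
        for (var i = 0; i < data.length; i++) {
            var c = data.charCodeAt(i);
            if (c >= 32 && c <= 126) {
                run += data.charAt(i);        // printable: extend the run
            } else {
                if (run.length >= minLen) {   // binary byte: close the run
                    results[results.length] = run;
                }
                run = "";
            }
        }
        if (run.length >= minLen) {           // keep a run ending at EOF
            results[results.length] = run;
        }
        return results;
    }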
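Second, the essence of the state bug, reduced to a sketch. The names here (toggleLeftFrame, toggleBtn, the 'nav' frame, the two .HTM files) are hypothetical stand-ins for whatever the real manual uses; the point is in the comments:

    // This top-level assignment re-runs, and so re-initialises the
    // variable, every time Navigator refreshes the window (e.g. on a
    // maximise) - while the frame keeps whatever it last displayed.
    var showingIndex = false;

    function toggleLeftFrame() {
        showingIndex = !showingIndex;
        // The button label faithfully tracks the state variable...
        document.forms[0].toggleBtn.value =
            showingIndex ? "Show Contents" : "Show Index";
        // ...but the frame content is only changed here, on a click,
        // so after a window refresh the two can disagree.
        parent.frames["nav"].location.href =
            showingIndex ? "INDEX.HTM" : "CONTENTS.HTM";
    }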

General Summary

Consider these 'information loss' causing features of HTML/Netscape:

1. The browser's default discarding of true filenames, both in the cache and in the 'Save as..' operation.
   Result: document structures fall apart when transported.
   Comment: The browser knows the true filename. Refusal to inform the user, especially during 'Save', must be a deliberate choice by Netscape.

2. No standard in HTML for specifying a file's preferred filename and relative path in the HTML header.
   Result: this is the underlying cause of feature 1 above.
   Comment: I cannot believe that this never occurred to the drafters of the HTML standard. One must ask why they chose not to include it.

3. The continued inability of the browser to archive complete sets of associated HTML/picture files as securely bound volumes.
   Result: bits of documents get lost.
   Comment: You'd think this would be an obvious design goal. Unless it was decided to deliberately _not_ facilitate such user objectives.

4. No relational attributes on links (eg 'that is part of this' vs 'that is not part of this').
   Result: no way to automatically obtain all the components of some whole (eg a manual) while fetching no unrelated files.
   Comment: This is another thing that any competent designer would leave out _only_ if the objective was to obstruct the use of HTML as a vector for distribution of freely copyable information in serious-sized chunks.

5. The corruption of EOL codes in HTML files by the 'Save as..' action.
   Result: CR,LF codes become CR,CR,LF. Files get longer. Lines in the HTML source become double-spaced in text editors.
   Comment: Yet another fatal flaw in the 'copy' process. How odd. Beautifully chosen, too. Very few people use hex editors, and many text editors completely ignore such variants of EOL codes, so few people would even realise that the files are being corrupted. Note also that this 'bug' has persisted through several releases of Navigator.

6. The refusal of Netscape Navigator (3.1 and 3.3) to save or edit frame-control or JavaScript scripts in HTML files.
   Result: these files get lost during saves of documents. Since these are the _key_ files for many pages, such documents are effectively destroyed by copying.
   Comment: There is no good reason for this refusal. They are plain text files like any other HTML file. Inability to save them _has_ to be a deliberate choice by Netscape. Now why would they choose to make such lynchpin files uncopyable? Gee, is this clear enough yet?

7. HTML does not include any standard for compression of the content of its own files, let alone the idea of allowing compressed sets of files, or even trees of files. We've had such tools (eg PKzip) since way back in the early days of DOS, but the supposed universal standard for the presentation of documents on the Internet does not allow compression?
   Result: storage and transmission of large documents is wasteful of memory and bandwidth.
   Comment: But of course! If such a compression standard were developed, it would be really hard to justify omitting the capability to also hold file sets and directory trees. But if those were included, all the careful work already put into crippling HTML as a vector for data exchange could be lost!

The only possible conclusion is that Netscape management have a considered policy of moulding HTML into a format that allows information providers to obstruct the free and reliable exchange of 'their' information. Whether or not you think this objective is reasonable, the results are very harmful to all uses of HTML.
Specifically, HTML has evolved without any trace of features that might allow the preservation of file sets as complete and intact units through the process of network transmission. Hence even information providers who _want_ to make their information available in useful and copyable form are seriously handicapped. The combination of HTML and browser provides a medium that allows the remote display of flashy, eye-catching web sites, but strongly obstructs the capture and storage of such page sets in complete and working form by the recipient. This is exactly the same sort of old-world, 'information as property' thinking that is behind the music and film industry's attempts to cripple the development of versatile bulk data storage media. (See the articles on information ownership, the DVD farce, etc, in the fascinating and free electronic book 'Evergreen', by Guy Dunphy.)

There is one other aspect of HTML that should be considered. This is its complete failure to address the need to capture true likenesses of existing physical documents. There are millions of precious and historic books out there, all of them growing ever more brittle and yellowed. Many of them are free of copyright restrictions, despite the strenuous efforts of powerful groups of copyright holders to have governments legislate to extend the stifling grip of copyright to cover _all_ the heritage of mankind. There are sections in Evergreen devoted to the urgent need to create a data standard for the efficient capture of such public domain works in electronic form. Project Gutenberg is a good start, but its efforts suffer from the lack of any such standard more stable than plain ASCII.

Here it is only relevant to point out that HTML is completely unsuitable for such use. This 'standard', into which so much information is being translated with so much effort, is neither stable nor capable of performing what should be the most important use for any pretender to the title of 'universal document format'. In its obsession with logical reformatability, HTML denies that some document-capture applications _demand_ the ability to accurately represent the true, graphical appearance of the original: warts, defects, fixed layout, and all. The preservation of the historical works of humanity, in a digitally preservable, easily reproduced and archived form, must be the prime objective of any such standard. A standard that does not achieve this must eventually be discarded in favour of one that does. HTML must either be extended to allow such a use, or be superseded.