Cover Notes
By The Rectifier
Using this JavaScript Manual
For portability, this 'edition' of the manual is made available with
all files zipped into one JS_MAN.ZIP file. To view it, unzip all files
into an empty directory and do a Netscape Navigator (Ver 2.0 or later)
'Open File' on !_INDEX.HTM.
Please distribute it in this zipped form. Merely copying the files risks
losing some, and you won't know till you hit a broken link.
In the original manual, all files except one used DOS 8.3 short filenames.
For maximum compatibility with older Zip utilities, this distribution
has that file (COLORS.CLASS) renamed to COLORS.CLA.
There is only one small item in the manual that won't work unless you
rename that file back to COLORS.CLASS. The requirement for this LFN
(long filename) is built into Netscape Navigator for Win95, which insists
that Java applet files carry the suffix ".CLASS".
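If you do want that one item working, the rename is a one-liner at a
DOS prompt, run from the directory you unzipped into:

    ren COLORS.CLA COLORS.CLASS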
Foreword
24th August, 1997: Something rather surprising just happened that has
added considerably to my cynicism regarding the wonders of the html format.
It all began when I attempted to download a large html document (this
'JavaScript Guide') from Netscape's www.netscape.com. I'm studying Java
and related networkish things, and wanted a copy of the JavaScript manual
on my own PC for convenient browsing. And I wanted to be able to archive it
myself, so I could always count on having it - intact, entire, and free of
the time and money costs of downloading.
This foreword is a detailed record of events that followed, and a final
summary of my comments and conclusions. You may wish to skip the boring
details, and read the summary only.
If you want the whole story, read on.
Naturally, like most html objects of any real size and complexity, the
JavaScript manual turned out to consist of numerous separate pieces (files),
with no way to obtain the complete work other than manually following all the
links and saving each document separately. As usual, the browser's (Netscape
3.3 Gold) cache is useless for permanent archiving, since all files there are
hash-named, mixed in with unrelated documents, and liable to be purged.
I am aware that there are browsers around that will automatically fetch all
the links they find. I will be looking into these, and into how (or whether)
they differentiate between links to other parts of the same document and
links elsewhere.
But for my current purposes what's important is how the web and html present
themselves to the majority of users. Since Netscape and MS Explorer are by far
the predominant browsers, it's most interesting to me to see what can be
achieved using them alone.
Anyway, I downloaded the JavaScript manual file by file, doing 'Save as..'
for each file.
Problems encountered:-
#1. It's a document utilising JavaScript and dynamic frames. Netscape
won't allow editing or saving of the master frame file.
So I found that file in the cache, and saved it (renamed) manually.
As for possible reasons _why_ it won't allow this, more later.
#2. Discovered that the document includes many small image files, such as
for the capital letters at the start of chapters, and various examples
of screens. No simple way to ensure I saved all the components.
#3. When you do a 'Save as..' in Netscape, the html file is actually written
out in a corrupted form. The 'End Of Line' (EOL) codes in the original
html source are generally a two byte 0x0D, 0x0A sequence (ie CR, LF).
This is the normal EOL form for DOS originated files.
When Netscape writes out html files, it prefixes every EOL with an extra
0x0D (CR) code, resulting in EOLs that are three bytes long and have
_two_ CRs.
This not only increases the file size, but also makes such files
display with a blank line between every text line in typical editors.
If this is an accidental bug it is an astonishing one - perhaps the
text editors they use at Netscape don't show the extra CRs? Some editors
don't. But surely someone would have bothered to look at the files with
a hex-display editor?
Personally, I think this is a deliberate 'feature'. More on this later.
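Incidentally, the damage is mechanical and easy to reverse once you know
it is there. A minimal repair sketch in JavaScript - assuming a
command-line runtime with file access, such as Node.js and its 'fs'
module (browser JavaScript has no file I/O, so Navigator itself is no
help here):

    // Collapse Navigator's doubled CRs back to normal CR,LF EOLs.
    // The file to repair is named on the command line; a '.fixed'
    // copy is written so the original is left untouched.
    var fs = require('fs');
    var file = process.argv[2];
    var text = fs.readFileSync(file, 'latin1');   // latin1 = byte-exact
    var fixed = text.replace(/\r\r\n/g, '\r\n');  // 0D,0D,0A -> 0D,0A
    fs.writeFileSync(file + '.fixed', fixed, 'latin1');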
#4. The cache files are uncorrupted (ie they are byte-for-byte images of
the originals), but of course they have meaningless filenames.
The correspondence between hashed filename and true file name is
maintained in a database file called FAT.DB in the cache, but this is
predominantly a binary file. There seems to be no way to get Netscape
to tell you the correspondences, or to generate correctly named copies
of its cache contents in another dir.
You may think that it would have been possible to use correctly named
files in the cache _unless_ there was a name conflict. But no, and as
for why, more later.
At this point I decided to exit Netscape while I pottered in the cache
directory. So I checked the 'Options: Network: Cache' settings to be sure
they didn't suggest files would be wiped on exit. Looked OK - there is no
'clear cache' control set, plenty of disk cache room, and 'refresh: never'.
I exited Netscape. All the .HTM files in the cache vanished!
Arrrgh! Nearly an hour of downloading wiped. Logged on again, and spent
another tedious interval downloading the entire document set again.
This time, before exiting Netscrape, I used Explorer to create a new subdir
under the cache dir, and copied all files in the cache to this location.
Hah! Let it delete its cache now, I thought! I left the explorer windows
open on my copy dir and the original cache, and exited Netscape.
Sounds of disk access, and _both_ sets of files disappeared from the windows.
Sheeeit! What is going on here?
Not being able to tell Netscape to _keep_ its cache is pretty stupid, but
for it to go _hunting_ for copies of its cache, and kill them, is _perverse_!
So, time to get serious.
I entered Netscape. Wiped the entire cache (which now only contained .GIF
files anyway, plus the database file.) Verified it was empty with Explorer.
(Interesting question: why does Netscape wipe the .HTM files, but not the
.GIF files?)
Downloaded the whole thing again (watching files appear in the cache.)
Naturally, most .HTMs had one or more .GIFs associated. This will be fun to
patch together - not!
Then, with Netscape still online:
- Copied all cache to a different hard disk C:\N1 (using Explorer).
- Zipped the content of C:\N1 to N1.ZIP and copied two copies of the zip to A:
- Tried to use my old XtreeGold (well, I trust it more than Win 95) to
copy all the cache to C:\N2, but got a "sharing violation". Turns out
Win95 won't let Xtree look at the FAT.DB file.
It copied the rest OK.
- Used Xtree to copy all files (less FAT.DB) to CACHE\TEMP.
- Used Explorer to copy all files to CACHE\TEMP2.
Then exited Netscape.
In the TEMP and TEMP2 subdirs of CACHE, _all_ files got deleted, as did
all .htm files in CACHE. So Netscape wipes not only its cache, but its
subdirs.
The other copies (on C:) were untouched.
I had been considering the faint possibility that Win95 might be keeping
some record of what got copied where, and using this to allow a 'delete
all copies' facility. I'm sure the copyright enforcement types would love
such a thing, and Microsoft is certainly partial to building 'establishment'
type features into their software. However, the survival of the
Explorer-generated copies dispelled that paranoid thought, thankfully.
I guess it was the annoyance of having to download the whole thing three
times over that decided me to go to the trouble of binding this manual up
properly for the convenience of others, and write this foreword about the
experience. Also the irony of Netscape's own manual being so difficult to
capture. You'd think they would want people to read it, and would want as
many copies to be in circulation as possible. To put it up on their site,
but in a form that makes it most difficult to actually read off-line, is
really absurd. A contradiction begging to be 'rectified'.
From then on the creation of this stand-alone copy of the JavaScript
manual involved mainly restoring the true names of the files so that
inter-file links would work. To do this I edited the FAT.DB file in a
binary editor, to extract the associations between the hashed cache
file names and the true file path/names. Fortunately these were all
embedded in the binary trash in plain ASCII form.
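For anyone repeating the exercise: a binary editor is not strictly
necessary. Since the useful entries are plain ASCII, any tool that
harvests printable-character runs from FAT.DB will surface them. A
sketch of the idea, again assuming Node.js (the FAT.DB record layout is
undocumented, so this just dumps candidate strings for pairing up by eye):

    // Dump printable-ASCII runs (8+ chars) from FAT.DB, 'strings'-style.
    // Hashed cache names and true URLs both appear as such runs.
    var fs = require('fs');
    var buf = fs.readFileSync('FAT.DB');
    var runs = buf.toString('latin1').match(/[\x20-\x7e]{8,}/g) || [];
    runs.forEach(function (s) { console.log(s); });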
Then I removed superfluous files such as the many instances of small
.GIFs used for capital letters in the text, and modified the JavaScript
frame control files that handled the content/index switching buttons.
Also flattened all references to other directories, and absolute URLs.
I added the front cover and this foreword, and in the process found
that there is a bug in the JavaScript files controlling the appearance
of Netscape's own JavaScript manual! What's more, the bug seems to be
fairly fundamental to the language itself. Cool. What an advertisement.
To see the bug: Open the manual in Navigator, with an un-maximised
window. Click the 'Show Index' button in the manual's buttonbar.
The same button now says 'Show Contents'. Then maximise the window.
Note that the button now says 'Show Index' again! Click it. Now it
says 'Show Contents' again, and you still see the index in the left frame.
What's going on here is that Navigator re-executes the scripts every
time it refreshes the window, and the scripts reset their state vars
every time they run. The button-bar's appearance is controlled by those
state variables. But the actual frame content is controlled by some
temporary images of the html script files in the cache, and these get
updated (overwritten) only when you click one of the buttons. Hence
the nominal state (the variables) and the true state (the frames) get
out of sync during window refreshes.
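In outline, the failure pattern looks something like this. A minimal
sketch only - the file names, variable names and markup here are my own
invention for illustration, not Netscape's actual code:

    <!-- Buttonbar frame page. Navigator re-parses this whole page on a
         window re-layout (eg maximise), so the button re-renders with
         its default label and the script's state var resets. -->
    <FORM>
    <INPUT TYPE="button" NAME="btn" VALUE="Show Index"
           onClick="toggle(this)">
    </FORM>
    <SCRIPT LANGUAGE="JavaScript">
    var showingIndex = false;   // nominal state: wiped on every re-run!
    function toggle(btn) {
      showingIndex = !showingIndex;
      // True state: the document actually loaded in the left frame.
      // That survives re-layouts; the variable above does not.
      parent.frames[0].location = showingIndex ? "index.htm" : "toc.htm";
      btn.value = showingIndex ? "Show Contents" : "Show Index";
    }
    </SCRIPT>

Click once, and both the label and the left frame show the index.
Maximise, and the page re-parses: showingIndex snaps back to false and
the button redraws as 'Show Index', but the left frame still holds the
index. Label and reality have parted company.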
In Netscape's original manual, this fault is present but easily missed.
I only noticed it when I added the 'Book Cover' stuff, which really
shows up the bug, since the buttons change dramatically when you maximise.
I've been trying to find a way around this, and have a few ideas, but
so far everything I've tried has been thwarted by fundamental
aspects of JavaScript.
General Summary
Consider these information-losing 'features' of html/Netscape:
1 Browser's default discarding of true file names, both in the cache
and in the 'Save as..' operation.
Result: document structures fall apart when transported.
Comment: The browser knows the true filename. Refusal to inform the
user, especially during 'Save', must be a deliberate choice by Netscape.
2 No standard in html for specifying a file's preferred filename and
relative path in the html header.
Result: This is the underlying cause of feature 1 above.
Comment: I cannot believe that this never occurred to the drafters of
the html standard. One must ask why they chose not to include this
(a hypothetical example follows below).
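To be concrete about how little it would have taken, something like the
following header tag would suffice. Purely hypothetical syntax - nothing
of the kind exists in any html standard:

    <!-- Hypothetical tag, for illustration only: -->
    <META NAME="suggested-filename"
          CONTENT="handbook/javascript/contents.htm">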
3 Continued inability of browser to archive complete sets of associated
html/picture files as securely bound volumes.
Result: bits of documents get lost.
Comment: You'd think this would be an obvious design goal. Unless it
was decided to deliberately _not_ facilitate such user objectives.
4 No relational attributes on links (eg 'that is part of this' vs 'that
is not part of this').
Result: No way to automatically obtain all components of some whole (eg
a manual) without also dragging in unrelated files.
Comment: This is another thing that any competent designer would leave
out _only_ if the objective was to obstruct the use of html as a vector
for distribution of freely copyable information in serious-sized chunks.
5 The corruption of EOL codes in html files by the 'Save as..' action.
Result: CR,LF codes become CR,CR,LF. Files get longer. Lines in the
html source become double spaced in text editors.
Comment: Yet another fatal flaw in the 'copy' process. How odd.
Beautifully chosen too. Very few people use hex editors, and many text
editors completely ignore such variants of EOL codes. So few people
would even realise that the files are being corrupted. Note also that
this 'bug' has been persistent through several releases of Navigator.
6 Refusal of Netscape Navigator (3.1 and 3.3) to save or edit frame
control or JavaScript scripts in html files.
Result: These files get lost during saves of documents. Since these
are the _key_ files for many pages, this means such documents are
effectively destroyed by copying.
Comment: There is no good reason for this refusal. They are plain text
files like any other html file. Inability to save them _has_ to be
due to deliberate choice by Netscape. Now why would they choose to
make such lynchpin files uncopyable? Gee, is this clear enough yet?
7 Html does not include any standard for compression of the content of
its own files. Let alone the idea of allowing compressed sets of files
or even trees of files. We've had such tools (eg PKzip) since way back
in the early days of DOS, but the supposed universal standard for the
presentation of documents on the internet does not allow compression?
Result: storage and transmission of large documents is wasteful of
memory and bandwidth.
Comment: But of course! If such a compression standard was developed,
it would be really hard to justify the omission of capabilities to
also hold file sets and directory trees. But if they were included,
all the careful work already put into crippling html as a vector for
data exchange could be lost!
The only possible conclusion is that Netscape management have a
considered policy of moulding html into a format that allows
information providers to obstruct the free and reliable exchange
of 'their' information.
Whether or not you think this objective is reasonable, the results
are very harmful to all uses of html. Specifically, html has evolved
without any trace of features that might allow the preservation of
file sets as complete and intact units through the process of network
transmission. Hence even information providers who _want_ to make
their information available in useful and copyable form, are seriously
handicapped.
The combination of html and browser provides a medium that allows the
remote display of flashy, eye-catching web sites, but strongly obstructs
the capture and storage of such page sets in complete and working form
by the recipient.
This is exactly the same sort of old-world, 'information as property'
thinking that is behind the music and film industry's attempts to
cripple the development of versatile bulk data storage media. (See the
articles on information ownership, the DVD farce, etc, in the fascinating
and free electronic book: 'Evergreen', by Guy Dunphy.)
There is one other aspect of html that should be considered.
This is the complete failure of html to address the need to capture
true likenesses of existing physical documents. There are millions of
precious and historic books out there, all of them growing ever more
brittle and yellowed. Many of them are free of copyright restrictions,
despite the strenuous efforts of powerful groups of copyright holders
to have governments legislate to extend the stifling grip of copyright
to cover _all_ the heritage of mankind.
There are sections in Evergreen devoted to the urgent need to create a
data standard for the efficient capture of such public domain works in
an electronic form. Project Gutenberg is a good start, but suffers in its
efforts from the lack of any such standard more stable than plain ASCII.
Here it is only relevant to point out that html is completely unsuitable
for such use. This 'standard', into which so much information is being
translated, with so much effort, is neither stable nor capable of
performing what should be the most important use for any pretender to
the title of 'universal document format'. In its obsession with logical
reformattability, html denies that some document capture applications
_demand_ the ability to accurately represent the true, graphical
appearance of the original; warts, defects, fixed layout, and all.
The preservation of the historical works of humanity, in a digitally
preservable, easily reproduced and archived form, must be the prime
objective of any such standard. A standard that does not achieve this,
must eventually be discarded in favour of one that does.
Html must either be extended to allow such a use, or be superseded.