Editor: Franck Portaneri <franck@langbox.com> Last Update: Jan 18th, 1999
[Not every item is filled in. The important thing is to know who covers what and to understand what is still uncovered. I have added the names of the people found in the Mozilla.org schedule. If I have missed someone or made a mistake, please let me know...]
Specifications: The main support is common to Arabic and Hebrew because of the Bi-Di
(Bi-Directionality) nature of both languages.
Of course, the charsets are not the same, and neither is the final rendering stage, which is
more complex for Arabic because of the "glyph shaping determination". So, this part
of the document is split into two sections, Arabic and Hebrew:
Arabic specific :
There are several charsets commonly used on the web for the Arabic/Hebrew languages. We
have decided to support the following:
ISIRI 3342 : (Anoosh, any URL in mind?) It is a Farsi codeset, not yet adopted by ISO but adopted by the Iranian Group of Normalization. It is also used on the Web with the PMosaic browser. It is the current 8-bit standard for Farsi. The Farsi language cannot be handled by ISO 8859-6 alone.
We have decided to use ISO 8859-6 as the Mail Charset since it is the de-facto standard common to all platforms.
There are two types of host operating systems:
On pure Latin systems (no Bi-Di support in the OS), the Bi-Di process must be done by Mozilla to display HTML documents correctly, but the whole operating system GUI will behave in Latin only (for <select...>, <textarea...> or <input...> fields in forms, or for dialog boxes such as Edit/Find in Page...).
The fontset must also be provided by Mozilla here.
On Arabic-enabled systems, the Bi-Di rendering process is already done within the XDrawString() (Unix X11) or TextOut() (Windows) functions, and there is a risk that the Bi-Di process is performed twice on the same string. This is not correct and will give garbled output. So there are two options here:
The advantage of using an Arabic OS is that all GUI widgets and keyboard input will also work properly in Arabic. The system Arabic fonts can be used, or new fonts can be added, provided they follow the same fontset as the system's.
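The double-processing risk described above can be illustrated with a small sketch. It is only a toy model under stated assumptions: `reorder_rtl`, `os_draw` and `app_draw` are hypothetical names, the "Bi-Di pass" is a plain string reversal that is valid only for a purely right-to-left string, and `os_draw` merely simulates what XDrawString() or TextOut() would do on an Arabic-enabled system.

```python
def reorder_rtl(text):
    """Toy stand-in for a Bi-Di pass: reverse a purely right-to-left string."""
    return text[::-1]


def os_draw(text, os_has_bidi):
    """Simulate the OS text call; return the characters in on-screen (left to right) order."""
    return reorder_rtl(text) if os_has_bidi else text


def app_draw(logical, os_has_bidi, app_reorders):
    """Simulate the application: optionally reorder, then hand the string to the OS."""
    prepared = reorder_rtl(logical) if app_reorders else logical
    return os_draw(prepared, os_has_bidi)


logical = "\u0633\u0644\u0627\u0645"   # four Arabic letters in typing (logical) order
expected_on_screen = logical[::-1]      # first typed letter ends up rightmost

# Exactly one layer reorders: the screen order is correct in both cases.
print(app_draw(logical, os_has_bidi=True, app_reorders=False) == expected_on_screen)
print(app_draw(logical, os_has_bidi=False, app_reorders=True) == expected_on_screen)

# Both layers reorder: the two passes cancel out and the text is drawn in typing order.
print(app_draw(logical, os_has_bidi=True, app_reorders=True) == logical)
```

The point of the sketch is that exactly one layer should reorder a given string: either the logical text is handed to a Bi-Di-aware OS call, or the application reorders it and draws through a path that does not reorder again.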
Posted by Catalin Rotaru on Sep 28, 1998
http://people.netscape.com/cata/i18n/index.html
It is strongly recommended to look at these diff files (diff between Communicator 4.04 and a Win-32 only Bi-Di implementation). They are great for understanding the effort involved in implementing a Bi-Di Mozilla.
See the Frank Tang doc : How To Add Additional Charset : http://www.mozilla.org/docs/refList/i18n/addcharset.html
At this stage, I think that the best solution would be to define a Mozilla-specific
API that could later be implemented using system-specific libraries (UNIX CTL, Arabic
Windows, Arabic MacOS...).
Here is a draft for an API definition proposal.
If we cannot use an existing Arabic system library (on a pure Latin operating system, for example), then the API must be implemented from scratch or from existing public code (if it exists and is usable).
New: 15-Jan-1999 : Dov Grobgeld <dov@imagic.weizmann.ac.il> announces the first alpha version of FriBidi, a Free BiDi library that adheres closely to the Unicode BiDi algorithm. See http://imagic.weizmann.ac.il/~dov/freesw/FriBidi for more info.
However, on such systems, the GUI side (dialog boxes, text input forms...) will behave only in Latin (no dual keyboard management).
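To give an idea of what a from-scratch implementation has to do for the Arabic "glyph shaping determination", here is a minimal sketch. It is not the proposed Mozilla API: `form`, `joins_forward`, `joins_backward` and `shape` are hypothetical names, and the joining behaviour is inferred from whether a Unicode presentation-form name exists for the letter, which is a shortcut rather than the real joining tables. Diacritics and the lam-alef ligature are ignored.

```python
import unicodedata


def form(ch, kind):
    """Return the presentation form of ch (kind is ISOLATED, FINAL, INITIAL or MEDIAL), or None."""
    try:
        return unicodedata.lookup(f"{unicodedata.name(ch)} {kind} FORM")
    except (KeyError, ValueError):
        return None  # no such form: the character does not take this shape


def joins_forward(ch):
    """True if the letter can connect to the letter that follows it."""
    return form(ch, "INITIAL") is not None


def joins_backward(ch):
    """True if the letter can connect to the letter that precedes it."""
    return form(ch, "FINAL") is not None


def shape(text):
    """Replace each Arabic letter in logical-order text by its contextual form."""
    kinds = {
        (True, True): "MEDIAL",
        (True, False): "FINAL",
        (False, True): "INITIAL",
        (False, False): "ISOLATED",
    }
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        link_prev = bool(prev) and joins_forward(prev) and joins_backward(ch)
        link_next = bool(nxt) and joins_forward(ch) and joins_backward(nxt)
        # Characters without presentation forms (Latin, digits, spaces) pass through.
        out.append(form(ch, kinds[(link_prev, link_next)]) or ch)
    return "".join(out)


# BEH, ALEF, BEH in typing order.
for glyph in shape("\u0628\u0627\u0628"):
    print(unicodedata.name(glyph))
```

The same letter BEH comes out in two different shapes here because ALEF connects only to the letter before it, which is why shaping has to look at both neighbours of each character.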
This part should determine whether Mozilla Arabic support expects all the RTL/LTR management to be done as:
But this point should be in accordance with the HTML 4.0 definition. Please send your feedback; this is really an open subject that needs more input and discussion...
The API function calls must be embedded within the Mozilla
source tree to get the Bi-Di and Arabic support built in. This is a complex part where the
following issues must be taken into account:
Hebrew specific :
This part has been directly created from the Dotan Dimet document: "A Proposal For Preliminary Hebrew Support In Mozilla" (URL??), to which I made some light modifications (Dotan, please send me your comments).
There are several charsets commonly used on the web for the Arabic/Hebrew languages. We have decided to support the following:
We have decided to use ISO 8859-8 as the Mail Charset since it is the standard on all platforms for data exchange (RFC 1555).
By Dotan Dimet (Email: dotan@usa.net ) (Modified by Franck Portaneri <franck@langbox.com> - Dotan, any comments???):
1 - Support of Hebrew Visual : This means adding support for "visual" display of the iso-8859-8 charset.
Currently, most Hebrew-language documents on the internet use the webfont or visual
encoding to display Hebrew. The Visual encoding method does not rely on the OS or
windowing environment for Hebrew support. In fact, it actively ignores such support by
requiring the user to install special fonts and the page creator to write the Hebrew text
in reverse (when using an application with Hebrew support) and to use HTML tags such as
PRE and NOBR to handle line-breaking. Despite the hassle, this lowest-common-denominator
de-facto standard is in such wide use that it has been ratified officially, and Israeli
standards bodies have determined that the following META tag should be used to label such
pages:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-8">
Mozilla doesn't recognize this tag. Or rather, when it sees it, it sets the encoding to "Western (iso-8859-1)" and treats the Hebrew text as a standard (Western) 8-bit character set, without applying any Bi-Di algorithm. However, if the special "web fonts" are chosen for this encoding, the pages will be readable.
Problems with this method include line-breaking (must be controlled by HTML tags, must not be done automatically by the display), printing (on systems with Hebrew support the Bi-Di algorithm kicks in, reversing text), and font choice (the limited selection of special web fonts is rather ugly).
The two big advantages of this method are that it should work on systems without any built-in Hebrew support, and that it is the de-facto standard.
The suggestion is to add support for this charset to the user interface. Instead of overriding the "Western" encoding, the user should have a separate entry for "iso-8859-8 (visual)" where he can install his web fonts. A good improvement to this would be to bypass font/language association, and let the user use any installed Hebrew fonts to view pages. This in fact is what the Hebrew version of Internet Explorer allows you to do. You'll still need to install fonts if your system has no Hebrew support (and you'll still probably see the page title and any form elements messed up), but if you have a Hebrew-aware system, you'll get more choice.
The second level of this "Visual" support should be to make it available on Hebrew operating systems, either by disabling the system Bi-Di rendering in the TextOut (or equivalent) function, or by performing a reverse transformation on the Visual line to get back the logical (Implicit) one and letting the OS render it correctly (though this is a little tricky and resource-consuming).
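The reverse transformation mentioned above (Visual line back to logical order) can be sketched as follows. This is a deliberately simplified model, not the Unicode Bi-Di algorithm: it assumes a right-to-left line, treats only strong left-to-right characters and European digits as LTR, and lets everything else (spaces, punctuation) follow the RTL direction. The names `is_ltr` and `visual_to_logical` are hypothetical.

```python
import unicodedata


def is_ltr(ch):
    """Simplification: Latin-like letters and European digits keep left-to-right order."""
    return unicodedata.bidirectional(ch) in ("L", "EN")


def visual_to_logical(line):
    """Reverse the whole line, then put each LTR run back into reading order."""
    out, run = [], []
    for ch in line[::-1]:
        if is_ltr(ch):
            run.append(ch)              # collect the (currently reversed) LTR run
        else:
            out.extend(reversed(run))   # flush the run in its original order
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


# A visual line as stored in an iso-8859-8 (visual) page: "abc" on the left,
# followed by a Hebrew word whose letters are stored right to left.
visual = "abc \u05dd\u05d5\u05dc\u05e9"
logical = visual_to_logical(visual)
print([unicodedata.name(c) for c in logical[:4]])
```

Under these assumptions the function is its own inverse, so the same routine maps logical order to visual order, and applying it twice returns the input unchanged. Lines with several LTR words separated by spaces would need the full algorithm to keep the words in the right order.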
2. - Support Hebrew Implicitly: This means adding support for the logical or "implicit" interpretation of iso-8859-8. Documents written with this method are not reversed when viewed with applications that DON'T have Hebrew support; they are shown in the input order. The charset tag used should be "iso-8859-8-i", and the Bi-Di algorithm should be used to present this text. It consists of support for codes that implicitly set the text's direction (e.g. Latin, digit or punctuation mark characters are considered LTR ("Left-To-Right") characters, while Hebrew characters are considered RTL ("Right-To-Left")). In fact, the Implicit coding represents and stores the exact sequence of keys pressed by the user when writing the text. Support for this encoding is necessary for text editing.
On operating systems with Hebrew support, this implicit support is already there, and the Hebrew text will be displayed correctly; but without Bi-Di support within Mozilla, text selection for cut/paste operations and mouse pointing will not work properly. Here too, we should take care that the Bi-Di process is not performed twice on the same line (in Mozilla and in the OS TextOut (or equivalent) functions).
On standard (English) operating systems, if you use a font that the system knows is Hebrew to look at some text in the browser, it will be displayed the way it was written (and then cannot be read correctly).
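The idea that each character implicitly carries a direction can be illustrated with Python's `unicodedata` module, which exposes the Unicode bidirectional class of a character. The sketch below is an assumption-laden simplification: `strong_direction` and `directional_runs` are hypothetical names, and characters without a strong direction simply inherit the direction of the preceding text, which is much cruder than the real Bi-Di resolution rules. Note that Unicode classifies digits and most punctuation as weak or neutral rather than strongly left-to-right.

```python
import unicodedata
from itertools import groupby


def strong_direction(ch):
    """Return 'LTR' or 'RTL' for strongly directional characters, else None."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "LTR"
    if bidi in ("R", "AL"):      # Hebrew letters are 'R', Arabic letters are 'AL'
        return "RTL"
    return None                  # digits, spaces, punctuation: decided by context


def directional_runs(text, base="RTL"):
    """Split logical-order text into (direction, substring) runs."""
    current = base
    tagged = []
    for ch in text:
        current = strong_direction(ch) or current   # neutrals inherit the last direction
        tagged.append((current, ch))
    return [
        (direction, "".join(ch for _, ch in group))
        for direction, group in groupby(tagged, key=lambda pair: pair[0])
    ]


# A Hebrew word, a space, then a Latin word, in typing order.
for direction, run in directional_runs("\u05e9\u05dc\u05d5\u05dd abc"):
    print(direction, len(run))
```

A renderer would then lay out the runs according to the base direction of the line and reverse only the RTL runs for display.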
3 - The Fiddly Bits: These include support for tricky directionality codes, HTML 4 stuff, CSS(?), Forms, and Javascript.
4- The support of Hebrew Explicit: This is really an optional case. Apparently, it is not really used for Web documents, unless someone can explain otherwise or give some input here. It consists of support for codes that explicitly set the text's direction (codes that exist in iso-8859-8 and Unicode, as well as those in HTML 4) and that should be included to force specific nested LTR or RTL sub-strings within a line. The Bi-Di algorithm should attempt to interpret these codes and bypass the implicit ordering of characters to render its output text. The charset tag used could be "iso-8859-8-e".
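The three charset labels discussed in items 1, 2 and 4 share the same byte-to-character table and differ only in how the text is ordered. A possible way to dispatch on them is sketched below. The table `HEBREW_CHARSETS` and the functions `parse_charset` and `decode_hebrew` are hypothetical and only illustrate the idea; `iso8859_8` is the name of Python's standard codec for ISO 8859-8.

```python
import re

# Hypothetical table: charset label -> (Python codec name, ordering of the stored text).
HEBREW_CHARSETS = {
    "iso-8859-8": ("iso8859_8", "visual"),
    "iso-8859-8-i": ("iso8859_8", "implicit"),
    "iso-8859-8-e": ("iso8859_8", "explicit"),
}


def parse_charset(content_value):
    """Extract the charset label from a Content-Type value, or return None."""
    match = re.search(r"charset\s*=\s*([^\s;\"']+)", content_value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def decode_hebrew(raw, content_value):
    """Decode raw bytes and report which ordering the renderer should assume."""
    label = parse_charset(content_value)
    if label not in HEBREW_CHARSETS:
        raise ValueError(f"not a Hebrew charset label: {label!r}")
    codec, ordering = HEBREW_CHARSETS[label]
    return raw.decode(codec), ordering


text, ordering = decode_hebrew(b"\xf9\xec\xe5\xed", "text/html; charset=iso-8859-8")
print(ordering, len(text))
```

Keeping the ordering separate from the codec means that a visual page can be drawn as stored, while an implicit or explicit page is passed through the Bi-Di step before display.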
XFE fonts: http://www.langbox.com/AraMosaic/mozilla/fontXFE
(See README file)
To be determined ...