Submitted by kratib on Sun, 17/04/2005 - 12:10.
( categories: Programming )

Hi all,

I'm trying to use preg_replace for some Arabic text normalization. Specifically, I want to convert every end-of-word 'ي' to 'ى'. So I used the following:

preg_replace('/ي\b/', 'ى', $subject);

but it's not replacing anything, even though the same approach works with Latin characters. I'm using UTF-8. I've read elsewhere that there's a problem with this, but I have no experience with PCRE.

Has anybody encountered this before?

TIA, Karim


Submitted by Alaa on Sun, 17/04/2005 - 13:02.

Until recently, the PCRE functions did not support UTF-8 at all. Now they do, but you have to add the u modifier to every regular expression to indicate that the text is Unicode.

As for the non-PCRE functions, they don't support any form of multibyte encoding; you have to use the multibyte (mbstring) extension instead, with its mb_-prefixed functions.

cheers, Alaa


http://www.manalaa.net "i`m feeling for the 2nd time like alice in wonderland reading el wafd"


Submitted by kratib on Sun, 17/04/2005 - 17:05.

Thanks Alaa,

mb_ereg_replace did the trick - the POSIX syntax is just more annoying than PCRE...

As for /u, it didn't work, but then I did not pursue it very far.

Thanks again, K.

Submitted by alienbrain on Mon, 18/04/2005 - 00:55.

If I were you, I'd do my best to get it working with preg_replace and the /u modifier, because the mbstring extension is not available on many PHP installations.
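
For what it's worth, here is a minimal sketch of the /u approach. One possible reason /u alone did not help is that, on builds where \b only recognizes ASCII word characters, there is no word boundary after an Arabic letter even in UTF-8 mode. The sketch avoids \b and instead uses a negative lookahead on Unicode letter and mark properties. The function name normalize_final_yeh is made up for illustration, and the code assumes both the script file and the input are valid UTF-8.

```php
<?php
// Hypothetical helper (not from the thread): replace word-final
// 'ي' (U+064A) with 'ى' (U+0649) in a UTF-8 string.
function normalize_final_yeh($text)
{
    // /u makes PCRE treat pattern and subject as UTF-8.
    // (?![\p{L}\p{M}]) means "not followed by a letter or a combining mark",
    // which stands in for \b here. Assumption: a yeh that carries a
    // diacritic is left untouched, even at the end of a word.
    $result = preg_replace('/ي(?![\p{L}\p{M}])/u', 'ى', $text);

    // preg_replace returns null on error (for example, invalid UTF-8 input);
    // fall back to the unchanged text in that case.
    return $result === null ? $text : $result;
}

echo normalize_final_yeh('علي في بيتي'), "\n";
```

The yeh in the middle of the last word is followed by a letter, so it is left alone; only the word-final ones are rewritten.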
