History log of /PHP_TRUNK/ext/intl/breakiterator/breakiterator_methods.cpp
Revision Date Author Comments
d0cb715 19-Sep-2014 Johannes Schlüter <johannes@php.net> s/PHP 5/PHP 7/
3234480 27-Aug-2014 Anatol Belski <ab@php.net> first show to make 's' work with size_t
c3e3c98 25-Aug-2014 Anatol Belski <ab@php.net> master renames phase 1
063079b 19-Aug-2014 Anatol Belski <ab@php.net> ported ext/intl, bugfixes to go
63d3f0b 19-Aug-2014 Anatol Belski <ab@php.net> basic macro replacements, all at once
9fb8c16 29-Jun-2014 Xinchen Hui <laruence@php.net> Fixed temporarily un-expected object re-init
b6e9c76 27-Jun-2014 Xinchen Hui <laruence@php.net> Refactoring ext/intl (only compilable now, far from finished :<)
8aff7f0 28-Mar-2013 Nikita Popov <nikic@php.net> Fix "passing NULL to non-pointer argument" warnings in intl

The second argument to spprintf is a size_t (maximum length).
8d264db 27-Jun-2012 Felipe Pena <felipensp@gmail.com> - Fixed build
77daa34 22-Jun-2012 Gustavo André dos Santos Lopes <cataphract@php.net> BreakIterator::getPartsIterator: new optional arg

Can take one of:
* IntlPartsIterator::KEY_SEQUENTIAL (keys are 0, 1, ...)
* IntlPartsIterator::KEY_LEFT (keys are left boundaries)
* IntlPartsIterator::KEY_RIGHT (keys are right boundaries)

The default is IntlPartsIterator::KEY_SEQUENTIAL (the previous behavior).
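To make the three key modes concrete, here is a small self-contained C++ sketch, not the extension's code. It assumes the boundaries have already been computed as sorted byte offsets into the UTF-8 text (including 0 and the text length); the `KeyMode` enum and `partsWithKeys()` function are hypothetical names that only mirror the IntlPartsIterator constants.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names: KeyMode mirrors IntlPartsIterator::KEY_SEQUENTIAL,
// KEY_LEFT and KEY_RIGHT, but is not part of the PHP or ICU API.
enum class KeyMode { Sequential, Left, Right };

// Returns (key, part) pairs for the text between consecutive boundaries.
// `bounds` are sorted byte offsets that include 0 and text.size().
std::vector<std::pair<long, std::string>>
partsWithKeys(const std::string& text, const std::vector<long>& bounds, KeyMode mode) {
    std::vector<std::pair<long, std::string>> parts;
    for (std::size_t i = 1; i < bounds.size(); ++i) {
        const long left = bounds[i - 1];
        const long right = bounds[i];
        assert(0 <= left && left <= right && right <= static_cast<long>(text.size()));

        long key = 0;
        switch (mode) {
            case KeyMode::Sequential: key = static_cast<long>(i - 1); break;  // 0, 1, 2, ...
            case KeyMode::Left:       key = left;  break;                     // left boundary
            case KeyMode::Right:      key = right; break;                     // right boundary
        }
        parts.emplace_back(key, text.substr(static_cast<std::size_t>(left),
                                            static_cast<std::size_t>(right - left)));
    }
    return parts;
}
```

With the boundaries {0, 2, 3, 8} for "Hi there", sequential keys are 0, 1, 2, while right-boundary keys are 2, 3, 8 for the parts "Hi", " " and "there".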
0a7ae87 22-Jun-2012 Gustavo André dos Santos Lopes <cataphract@php.net> Added IntlCodePointBreakIterator.

Objects of this class can be instantiated with


The method does not take a locale, as it would not make sense in this

This class has one additional method:

long IntlCodePointBreakIterator::getLastCodePoint()

which returns either -1 or the last code point we moved over, if any
(and discounting any movement before the last call to
IntlBreakIterator::first() or IntlBreakIterator::last()).
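The getLastCodePoint() behaviour described above can be modelled with a toy C++ class. This is only an illustration under stated assumptions: the class name is hypothetical, the input is assumed to be valid UTF-8, boundaries are byte offsets, and -1 serves both as the "no code point" value and as the DONE marker (ICU's BreakIterator::DONE is -1). Resetting the last code point to -1 when a move fails is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy model of IntlCodePointBreakIterator::getLastCodePoint().
// Assumes valid UTF-8; every code point start is a boundary (byte offset).
class ToyCodePointIterator {
public:
    static constexpr long DONE = -1;  // same value as ICU's BreakIterator::DONE

    explicit ToyCodePointIterator(std::string text) : text_(std::move(text)) {}

    // Repositioning forgets any earlier movement.
    long first() { pos_ = 0; last_ = -1; return 0; }
    long last()  { pos_ = text_.size(); last_ = -1; return static_cast<long>(pos_); }

    long next() {
        if (pos_ >= text_.size()) { last_ = -1; return DONE; }  // assumption: failed move resets
        const std::size_t len = seqLen(static_cast<unsigned char>(text_[pos_]));
        assert(pos_ + len <= text_.size() && "valid UTF-8 assumed");
        last_ = decodeAt(pos_, len);
        pos_ += len;
        return static_cast<long>(pos_);
    }

    long previous() {
        if (pos_ == 0) { last_ = -1; return DONE; }
        std::size_t start = pos_ - 1;
        // Step back over continuation bytes (10xxxxxx) to the lead byte.
        while (start > 0 && (static_cast<unsigned char>(text_[start]) & 0xC0) == 0x80) --start;
        last_ = decodeAt(start, pos_ - start);
        pos_ = start;
        return static_cast<long>(pos_);
    }

    // -1, or the last code point moved over since first()/last().
    long getLastCodePoint() const { return last_; }

private:
    static std::size_t seqLen(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }

    long decodeAt(std::size_t at, std::size_t len) const {
        assert(len >= 1 && len <= 4 && "valid UTF-8 assumed");
        static const unsigned char leadMask[] = {0, 0x7F, 0x1F, 0x0F, 0x07};
        long cp = static_cast<unsigned char>(text_[at]) & leadMask[len];
        for (std::size_t i = 1; i < len; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(text_[at + i]) & 0x3F);
        return cp;
    }

    std::string text_;
    std::size_t pos_ = 0;
    long last_ = -1;
};
```

For "a\xC3\xA9" ("aé"), two calls to next() report 'a' and then U+00E9; after last(), getLastCodePoint() is back to -1 until previous() moves over "é" again.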
87dd026 10-Jun-2012 Gustavo André dos Santos Lopes <cataphract@php.net> Remove trailing space
a4925fa 09-Jun-2012 Gustavo André dos Santos Lopes <cataphract@php.net> Replaced zend_parse_method_params with plain zpp
afed66b 09-Jun-2012 Gustavo André dos Santos Lopes <cataphract@php.net> BreakIter: Removed getAvailableLocales/getHashCode
f5b4216 30-May-2012 Gustavo André dos Santos Lopes <cataphract@php.net> BreakIterator and RuleBasedBreakIterator added

This commit adds wrappers for the classes BreakIterator and
RuleBasedBreakIterator. The C++ ICU classes are described here:

Additionally, a tutorial is available at:

This implementation wraps UTF-8 text in a UText. The text is
iterated without any copying or conversion to UTF-16. There is
also no validation that the input is actually UTF-8; where there
are malformed sequences, the UText will simply yield U+FFFD.
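A rough C++ sketch of the lenient decoding this relies on: well-formed UTF-8 is decoded to code points and anything malformed becomes U+FFFD. This is not how ICU's UText is implemented; the function name is hypothetical, and the replacement policy (one U+FFFD per offending byte) is a simplifying assumption, since ICU may group malformed bytes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper. Decodes UTF-8, replacing malformed input with U+FFFD.
// Simplified policy (assumption): on any error, emit one U+FFFD and skip a
// single byte; ICU may group malformed bytes differently.
std::vector<char32_t> decodeUtf8Lenient(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0, min = 0;
        if (b < 0x80)                { out.push_back(b); ++i; continue; }  // ASCII
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray continuation or invalid lead
        assert(len >= 2 && len <= 4);

        bool ok = i + len <= s.size();  // truncated sequence at end of input?
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values above U+10FFFF.
        if (ok && (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)) ok = false;

        if (ok) { out.push_back(cp); i += len; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}
```

For example, the bytes "a\xFF" "b" decode to 'a', U+FFFD, 'b', and a truncated three-byte sequence "\xE2\x82" yields two U+FFFD under this policy.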

The class BreakIterator cannot be instantiated directly (it has a
private constructor). It provides the interface exposed by the ICU
abstract class of the same name. The PHP class is not abstract
because we may use it to wrap native subclasses of BreakIterator
for which there is no specific wrapper. This class includes methods to
move the iterator position to the beginning (first()), to the
end (last()), forward (next()), backwards (previous()), to the
boundary preceding a certain position (preceding()) and following
a certain position (following()), and to obtain the current position
(current()). next() can also be used to advance or move back an
arbitrary number of boundaries.
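The navigation methods listed above can be illustrated with a toy C++ model that works over a precomputed, sorted list of boundary offsets, whereas ICU computes boundaries from its rules as it goes. The class name is hypothetical; DONE mirrors ICU's BreakIterator::DONE (-1), and where the iterator is left after a failed move is a simplification of this sketch rather than documented ICU behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model: boundaries are precomputed and sorted, whereas
// ICU derives them from break rules on demand.
class ToyBreakIterator {
public:
    static constexpr long DONE = -1;  // mirrors ICU's BreakIterator::DONE

    explicit ToyBreakIterator(std::vector<long> bounds) : b_(std::move(bounds)) {
        assert(!b_.empty() && std::is_sorted(b_.begin(), b_.end()));
    }

    long current() const { return b_[i_]; }
    long first() { i_ = 0; return current(); }
    long last()  { i_ = b_.size() - 1; return current(); }

    long next() {
        if (i_ + 1 >= b_.size()) return DONE;  // position left unchanged (simplification)
        return b_[++i_];
    }

    long previous() {
        if (i_ == 0) return DONE;
        return b_[--i_];
    }

    // Moves n boundaries forward (n > 0) or backward (n < 0);
    // returns DONE if an end of the text is passed.
    long next(long n) {
        long r = current();
        for (; n > 0 && r != DONE; --n) r = next();
        for (; n < 0 && r != DONE; ++n) r = previous();
        return r;
    }

    // First boundary strictly after offset, or DONE (then parked at last()).
    long following(long offset) {
        auto it = std::upper_bound(b_.begin(), b_.end(), offset);
        if (it == b_.end()) { last(); return DONE; }
        i_ = static_cast<std::size_t>(it - b_.begin());
        return current();
    }

    // Last boundary strictly before offset, or DONE (then parked at first()).
    long preceding(long offset) {
        auto it = std::lower_bound(b_.begin(), b_.end(), offset);
        if (it == b_.begin()) { first(); return DONE; }
        i_ = static_cast<std::size_t>(it - b_.begin()) - 1;
        return current();
    }

private:
    std::vector<long> b_;
    std::size_t i_ = 0;
};
```

For boundaries {0, 2, 3, 8}, following(3) returns 8 and preceding(3) returns 2, since in this sketch both look strictly after or before the offset; next(2) from 2 lands on 8, and next(-2) from there goes back to 2.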

BreakIterator also exposes other native methods:
getAvailableLocales(), getLocale() and factory methods to build
several predefined types of BreakIterators: createWordInstance()
for word boundaries, createCharacterInstance() for locale
dependent notions of "characters", createSentenceInstance() for
sentences, createLineInstance() for line-breaking opportunities and
createTitleInstance() for title-casing breaks. These factories
currently return RuleBasedBreakIterators whose rule sets are found
by name in the ICU data, observing the passed locale (although
the locale is taken into consideration, there are very few
exceptions to the root rules).

The clone and compare_object PHP object handlers are also
implemented, though the comparison does not yield meaningful results
when used with >, <, >= and <=.

Note that BreakIterator is an iterator only in the sense of the
first 'Iterator' in 'IteratorIterator', i.e., it does not
implement the Iterator interface. The reason is that there is
no sensible implementation for Iterator::key(). Using it for
an ordinal of the current boundary is not feasible because
we are allowed to move to any boundary at any time. If we were
to determine the current ordinal when last() is called, we'd
have to traverse the whole input text to find out how many
breaks there were before. Therefore, BreakIterator implements
only Traversable. It can be wrapped in an IteratorIterator,
but the usual warnings apply.
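The cost argument can be made concrete with a small C++ sketch. It uses a hypothetical stand-in rule (a boundary after every space, plus both ends of the text) in place of ICU's rules; `nextBoundary()` and `boundaryOrdinal()` are illustrative names. Without a cache, the ordinal of a boundary can only be found by calling next() from the start and counting, so asking for it right after last() walks the entire text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for rule-driven next(): a boundary after every
// space and at the end of the text; returns -1 (DONE) past the end.
long nextBoundary(const std::string& text, long pos) {
    assert(pos >= 0);
    if (pos >= static_cast<long>(text.size())) return -1;
    const std::size_t space = text.find(' ', static_cast<std::size_t>(pos));
    return space == std::string::npos ? static_cast<long>(text.size())
                                      : static_cast<long>(space) + 1;
}

// Ordinal of boundary `pos` (0 for the start of the text), or -1 if `pos`
// is not a boundary. With nothing cached, the only way to get it is to
// walk forward from the start, one nextBoundary() call per boundary.
long boundaryOrdinal(const std::string& text, long pos) {
    long ordinal = 0;
    long b = 0;
    while (b < pos) {
        b = nextBoundary(text, b);
        if (b == -1 || b > pos) return -1;  // walked past pos: not a boundary
        ++ordinal;
    }
    return ordinal;
}
```

For "a bc d" the boundaries are 0, 2, 5 and 6, so the ordinal of the last boundary is 3, and obtaining it takes three calls to nextBoundary().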

Finally, I added a convenience method to BreakIterator:
getPartsIterator(). This provides an IntlIterator, backed
by the BreakIterator PHP object (i.e. moving the pointer or
changing the text in BreakIterator affects the iterator
and also moving the iterator affects the backing BreakIterator),
which allows traversing the text between consecutive boundaries.
This iterator uses the original text to retrieve the text
between two positions, not the code points returned by the
wrapping UText. Therefore, if the text includes invalid code
unit sequences, these invalid sequences will be in the output
of this iterator, not U+FFFD code points.

The class RuleBasedBreakIterator exposes a constructor that allows
building an iterator from arbitrary compiled or non-compiled
rules. The form of these rules is described in the tutorial linked
above. The rest of the methods allow retrieving the rules
(getRules() and getCompiledRules()), a hash code of the rule set
(hashCode()) and the rule statuses (getRuleStatus() and

Because the RuleBasedBreakIterator constructor may return parse
errors, I reuse the UParseError-to-text function that was in the
transliterator files. Therefore, I moved that function to

common_enum.cpp was also changed, mainly to expose previously
static functions. This avoided code duplication when implementing
the BreakIterator iterator and the IntlIterator returned by