You invoke the getWordInstance method to instantiate a BreakIterator that detects word boundaries:

    BreakIterator wordIterator = BreakIterator.getWordInstance(currentLocale);

You'll want to create such a
BreakIterator when your application needs to perform operations on individual words. These operations might be common word-processing functions, such as selecting, cutting, pasting, and copying. Or, your application may search for words, and it must be able to distinguish entire words from simple strings.

When a
BreakIterator analyzes word boundaries, it differentiates between words and characters that are not part of words. These characters, which include spaces, tabs, punctuation marks, and most symbols, have word boundaries on both sides.

The example that follows, which is from the program
BreakIteratorDemo, marks the word boundaries in some text. The program creates the
BreakIterator and then calls the markBoundaries method:

    Locale currentLocale = new Locale("en", "US");
    BreakIterator wordIterator = BreakIterator.getWordInstance(currentLocale);
    String someText = "She stopped.  " +
                      "She said, \"Hello there,\" and then went on.";
    markBoundaries(someText, wordIterator);

The
markBoundaries method is defined in BreakIteratorDemo.java. This method marks boundaries by printing carets (^) beneath the target string. In the code that follows, notice the while loop where markBoundaries scans the string by calling the next method:

    static void markBoundaries(String target, BreakIterator iterator) {
        // One marker slot per character, plus one for the boundary at the end of the text
        StringBuffer markers = new StringBuffer();
        markers.setLength(target.length() + 1);
        for (int k = 0; k < markers.length(); k++) {
            markers.setCharAt(k, ' ');
        }

        iterator.setText(target);
        int boundary = iterator.first();
        // next() returns BreakIterator.DONE once the last boundary has been passed
        while (boundary != BreakIterator.DONE) {
            markers.setCharAt(boundary, '^');
            boundary = iterator.next();
        }

        System.out.println(target);
        System.out.println(markers);
    }

The output of the
markBoundaries method follows. Note where the carets (^) occur in relation to the punctuation marks and spaces:

    She stopped.  She said, "Hello there," and then went on.
    ^  ^^      ^^ ^  ^^   ^^^^    ^^    ^^^^  ^^   ^^   ^^ ^^

The
BreakIterator class makes it easy to select words from within text. You don't have to write your own routines to handle the punctuation rules of various languages; the BreakIterator class does this for you.

The
extractWords method in the following example extracts and prints words for a given string. Note that this method uses Character.isLetterOrDigit to avoid printing "words" that consist only of spaces or punctuation.

    static void extractWords(String target, BreakIterator wordIterator) {
        wordIterator.setText(target);
        int start = wordIterator.first();
        int end = wordIterator.next();

        while (end != BreakIterator.DONE) {
            // Each pair of adjacent boundaries delimits one segment of the text
            String word = target.substring(start, end);
            if (Character.isLetterOrDigit(word.charAt(0))) {
                System.out.println(word);
            }
            start = end;
            end = wordIterator.next();
        }
    }

The
BreakIteratorDemo program invokes extractWords, passing it the same target string used in the previous example. The extractWords method prints out the following list of words:

    She
    stopped
    She
    said
    Hello
    there
    and
    then
    went
    on
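
The lesson mentions that an application searching for words must distinguish entire words from simple strings. The following is a minimal sketch of that idea, not part of BreakIteratorDemo: the class WordBoundarySketch and the helpers collectWords and containsWholeWord are hypothetical names introduced here for illustration. The sketch reuses the extractWords loop but collects the words in a list, then checks whether a query matches one of them exactly.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical class; not part of BreakIteratorDemo.
class WordBoundarySketch {

    // Hypothetical helper: same traversal as extractWords, but returns the words
    // in a list instead of printing them.
    static List<String> collectWords(String target, Locale locale) {
        BreakIterator wordIterator = BreakIterator.getWordInstance(locale);
        wordIterator.setText(target);

        List<String> words = new ArrayList<>();
        int start = wordIterator.first();
        int end = wordIterator.next();
        while (end != BreakIterator.DONE) {
            String segment = target.substring(start, end);
            // Skip segments that start with a space or punctuation character
            if (Character.isLetterOrDigit(segment.charAt(0))) {
                words.add(segment);
            }
            start = end;
            end = wordIterator.next();
        }
        return words;
    }

    // Hypothetical helper: true only if query matches an entire word of target
    // (case-sensitive), not just a substring of one.
    static boolean containsWholeWord(String target, String query, Locale locale) {
        return collectWords(target, locale).contains(query);
    }

    public static void main(String[] args) {
        String text = "She said, \"Hello there,\" and then went on.";

        // "the" occurs inside "there" and "then", but never as an entire word
        System.out.println(text.contains("the"));                          // true
        System.out.println(containsWholeWord(text, "the", Locale.US));     // false
        System.out.println(containsWholeWord(text, "there", Locale.US));   // true
    }
}
```

A plain substring test such as String.contains reports a match for "the" here, while the boundary-based check does not. Building the whole list on every query is wasteful for large texts; it is done here only to keep the sketch short.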
Copyright 1995-2004 Sun Microsystems, Inc. All rights reserved.