leonzhx

Chapter 13. Strings -- Thinking in Java


1) Objects of the String class are immutable. Every method in the String class that appears to modify a String actually creates and returns a brand new String object containing the modification. The original String is left untouched.

 

2) The '+' and '+=' operators for String are the only overloaded operators in Java, and Java does not allow programmers to overload any others.

 

3) When you write a toString( ) method, you can generally rely on the compiler to build the result in a reasonable fashion, as long as the operations are simple ones it can figure out on its own (it uses a StringBuilder or String to hold intermediate results). If looping is involved, for example when you repeatedly use += on a String, you should explicitly use a StringBuilder in your toString( ) to hold the intermediate result, for performance reasons. StringBuilder was introduced in Java SE5. Before that, Java used StringBuffer, which ensured thread safety and so was significantly more expensive.
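The loop case above can be sketched like this. It is a minimal example, and the class `Inventory` is hypothetical: the list is joined with a single explicit StringBuilder instead of `+=`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class, used only to show a loop-built toString().
class Inventory {
    private final List<String> items;

    Inventory(List<String> items) {
        this.items = items;
    }

    @Override
    public String toString() {
        // One explicit StringBuilder for the whole loop. Writing "result += item"
        // instead may make the compiler build a new intermediate object per iteration.
        StringBuilder result = new StringBuilder("[");
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                result.append(", ");
            }
            result.append(items.get(i));
        }
        return result.append("]").toString();
    }

    public static void main(String[] args) {
        System.out.println(new Inventory(Arrays.asList("apple", "pear", "plum")));
        // prints [apple, pear, plum]
    }
}
```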

 

4) Do not use this when forming the String returned by a toString( ) method. In an expression such as "address: " + this, the compiler tries to convert this to a String, and it does so by calling toString( ), which produces a recursive call. If you really do want to print the address of the object, call the Object.toString( ) method, which does just that: instead of saying this, say super.toString( ).
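The recursion trap and its fix, as a small sketch. The class name `Node` is hypothetical. The broken version is left commented out because running it would end in a StackOverflowError.

```java
// Hypothetical class showing the recursive toString() trap and its fix.
class Node {
    @Override
    public String toString() {
        // Wrong: "Node address: " + this calls this.toString() again,
        // recursing until a StackOverflowError is thrown.
        // return "Node address: " + this;

        // Right: Object.toString() yields ClassName@hexHashCode without recursion.
        return "Node address: " + super.toString();
    }

    public static void main(String[] args) {
        // Prints something like "Node address: Node@1b6d3586"; the hash part varies.
        System.out.println(new Node());
    }
}
```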

 

5) Here are some of the basic methods available for String objects:
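The original table of methods is an image that did not survive. As a stand-in, here is a sketch that exercises a few everyday java.lang.String methods. It is not the book's full list. Note that every call returns a new value and `s` itself never changes.

```java
// Exercises a few everyday java.lang.String methods. Each call returns a
// new value; the original String s is never modified.
class StringMethods {
    public static void main(String[] args) {
        String s = "  Thinking in Java  ";
        String t = s.trim();                                        // "Thinking in Java"
        System.out.println(t.length());                             // 16
        System.out.println(t.charAt(0));                            // T
        System.out.println(t.indexOf("in"));                        // 2
        System.out.println(t.lastIndexOf("in"));                    // 9
        System.out.println(t.substring(12));                        // Java
        System.out.println(t.startsWith("Think"));                  // true
        System.out.println(t.contains("in J"));                     // true
        System.out.println(t.replace('a', 'o'));                    // Thinking in Jovo
        System.out.println(t.equalsIgnoreCase("THINKING IN JAVA")); // true
        System.out.println(String.valueOf(42).concat("!"));         // 42!
        System.out.println(s.length());                             // 20: s is unchanged
    }
}
```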

 

 

 

6) Java SE5 introduced the format( ) method, available to PrintStream or PrintWriter objects, which includes System.out. The format( ) method is modeled after C’s printf( ). There’s even a convenience printf( ) method that you can use if you’re feeling nostalgic, which just calls format( ).
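A minimal comparison of plain concatenation, format( ) and printf( ). The class name is hypothetical. In an English-language default locale, all three lines print `Row 1: [5 5.332542]`.

```java
class SimpleFormat {
    public static void main(String[] args) {
        int x = 5;
        double y = 5.332542;
        // The old way: string concatenation
        System.out.println("Row 1: [" + x + " " + y + "]");
        // The new way: format() and printf() behave identically
        System.out.format("Row 1: [%d %f]%n", x, y);
        System.out.printf("Row 1: [%d %f]%n", x, y);
    }
}
```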

 

7) All of Java’s new formatting functionality is handled by the Formatter class in the java.util package. You can think of Formatter as a translator that converts your format string and data into the desired result. When you create a Formatter object, you tell it where you want this result to go by passing that information to the constructor. The constructor is overloaded to take a range of output locations, but the most useful are PrintStreams, OutputStreams, and Files.
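A sketch of Formatter with two destinations. One is a PrintStream (System.out). The other is a StringBuilder, which works through the Formatter(Appendable) constructor so the result can be returned. The helper `position` is hypothetical.

```java
import java.util.Formatter;

class FormatterDemo {
    // Formatter writing into a StringBuilder (any Appendable works),
    // so the result can be returned instead of printed.
    static String position(String name, int x, int y) {
        StringBuilder sb = new StringBuilder();
        try (Formatter f = new Formatter(sb)) {
            f.format("%s is at (%d,%d)", name, x, y);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Formatter writing straight to a PrintStream. Flush, but do not close it:
        // closing this Formatter would also close System.out.
        Formatter out = new Formatter(System.out);
        out.format("%s is at (%d,%d)%n", "Tommy", 0, 0);
        out.flush();
        System.out.println(position("Terry", 4, 8)); // Terry is at (4,8)
    }
}
```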

 

8) The general syntax of a format specifier is:

%[argument_index$][flags][width][.precision]conversion

 

    a. width controls the minimum size of a field. It is applicable to all of the data conversion types and behaves the same with each. The Formatter guarantees that a field is at least a certain number of characters wide by padding it with spaces if necessary. By default the data is right-justified, but including a '-' in the flags section makes it left-justified.

    b. precision has a different meaning for different types. For Strings, the precision specifies the maximum number of characters from the String to print. For floating point numbers, precision specifies the number of decimal places to display (the default is 6), rounding if there are too many or adding trailing zeroes if there are too few. Since integers have no fractional part, precision isn’t applicable to them, and you’ll get an exception (IllegalFormatPrecisionException) if you use precision with an integer conversion type.
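Width, precision and the '-' flag together, in a receipt-style row. This is a sketch: `Receipt.row` is a hypothetical helper, and Locale.ROOT is passed so the decimal separator is always '.'. The last call shows precision being rejected for an integer conversion.

```java
import java.util.IllegalFormatPrecisionException;
import java.util.Locale;

class Receipt {
    // %-15.15s : left-justified ('-' flag), width 15, at most 15 chars of the String
    // %5d      : right-justified integer, width 5 (precision not allowed here)
    // %10.2f   : width 10, two decimal places, rounded
    static String row(String item, int qty, double price) {
        return String.format(Locale.ROOT, "%-15.15s %5d %10.2f", item, qty, price);
    }

    public static void main(String[] args) {
        System.out.println(row("Jack's Magic Beans", 4, 4.25)); // item name cut to 15 chars
        System.out.println(row("Princess Peas", 3, 5.1));       // item name padded to 15 chars
        try {
            String.format("%.2d", 5); // precision on an integer conversion
        } catch (IllegalFormatPrecisionException e) {
            System.out.println("Caught " + e.getClass().getSimpleName());
        }
    }
}
```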

 

9) These are the conversions you’ll come across most frequently:
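The original table of conversions is an image that did not survive. Here is a sketch exercising conversion characters that java.util.Formatter supports: d, c, b, s, f, e, x, h and %%. The method name `sample` is hypothetical, and Locale.ROOT fixes the decimal separator.

```java
import java.util.Locale;

class Conversions {
    static String sample() {
        // d: decimal integer   c: character          b: boolean
        // s: String            f: decimal float      e: scientific notation
        // x: hex integer       h: hash code in hex   %%: a literal percent sign
        return String.format(Locale.ROOT,
                "d:%d c:%c b:%b s:%s f:%f e:%e x:%x h:%h %%",
                42, 'A', true, "text", 3.14159, 314.159, 255, "hi");
    }

    public static void main(String[] args) {
        System.out.println(sample());
        // d:42 c:A b:true s:text f:3.141590 e:3.141590e+02 x:ff h:d01 %

        // %b prints "true" for any non-null argument, even a zero:
        System.out.println(String.format("%b", 0)); // true
    }
}
```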



 

10) The following conversions cannot be applied to the corresponding data types; if you try, a java.util.IllegalFormatConversionException is thrown at run time:

 char: d, f, e, x

 byte, short, int: f, e

 BigInteger: c, f, e

 float, double: d, c, x

 Object: d, c, f, e, x

Rule of thumb: a character is not a number, and an integer is not a floating-point number. An integer (byte, short, int) can, however, be formatted as a character with %c, while a BigInteger cannot.
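A quick way to check a few of these combinations is a small probe that reports whether a specifier and argument pair is rejected. The helper `rejected` is hypothetical.

```java
import java.math.BigInteger;
import java.util.IllegalFormatConversionException;

class BadConversions {
    // Returns true when formatting arg with the given specifier is rejected.
    static boolean rejected(String spec, Object arg) {
        try {
            String.format(spec, arg);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejected("%d", 'a'));                  // true: char is not a number
        System.out.println(rejected("%f", 7));                    // true: int is not floating point
        System.out.println(rejected("%c", 65));                   // false: int 65 prints as 'A'
        System.out.println(rejected("%c", new BigInteger("65"))); // true
        System.out.println(rejected("%x", 1.5));                  // true
        System.out.println(rejected("%e", new Object()));         // true
    }
}
```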

 

11) String.format( ) is a static method which takes all the same arguments as Formatter’s format( ) but returns a String.
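A minimal sketch of String.format( ) used to build a message without printing it. The helper `label` and its parameters are hypothetical.

```java
class FormatLabel {
    // Hypothetical helper: builds a message String without printing anything.
    static String label(int transactionId, int queryId, String message) {
        return String.format("(t%d, q%d) %s", transactionId, queryId, message);
    }

    public static void main(String[] args) {
        System.out.println(label(3, 7, "Write failed")); // (t3, q7) Write failed
    }
}
```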

 

12) A regular expression is a way to describe strings in general terms, so that you can say, "If a string has these things in it, then it matches what I’m looking for."

 

13) In regular expressions, a digit is described by \d. In Java, however, a backslash inside a String literal must itself be escaped, so you write "\\d". To express a literal backslash in a regular expression, you need "\\\\" in Java.

 

14) To say that a number might or might not be preceded by a minus sign, you put in the minus sign followed by a question mark: -?. To indicate "one or more of the preceding expression," you use a '+'. So to say, "possibly a minus sign, followed by one or more digits," you write: "-?\\d+".
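The escaping rules from 13) and the "-?\\d+" pattern from 14), checked with String.matches( ) in a short sketch. The class name is hypothetical.

```java
class IntegerMatch {
    public static void main(String[] args) {
        System.out.println("-1234".matches("-?\\d+")); // true
        System.out.println("5678".matches("-?\\d+"));  // true
        System.out.println("+911".matches("-?\\d+"));  // false: '+' is not allowed
        // A literal backslash: the regex is \\ so the Java literal is "\\\\".
        System.out.println("C:\\temp".matches("C:\\\\temp")); // true
    }
}
```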

 

15) In regular expressions, parentheses group an expression, and the vertical bar '|' means OR. So "(-|\\+)?" means that this part of the string may be either a '-' or a '+' or nothing (because of the '?'). Because the '+' character has special meaning in regular expressions, it must be escaped with a backslash, written '\\' in a Java String, to appear as an ordinary character in the expression.
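Extending the previous pattern with the grouped, optional sign from 15). This is a minimal sketch, and the constant name `SIGNED` is hypothetical.

```java
class SignedIntegerMatch {
    static final String SIGNED = "(-|\\+)?\\d+"; // optional '-' or '+', then digits

    public static void main(String[] args) {
        System.out.println("+911".matches(SIGNED));  // true
        System.out.println("-1234".matches(SIGNED)); // true
        System.out.println("911".matches(SIGNED));   // true
        System.out.println("*911".matches(SIGNED));  // false
    }
}
```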

 

16) A useful regular expression tool built into String is split( ), which means, "Split this string around matches of the given regular expression." An overloaded version of String.split( ) allows you to limit the number of splits that occur.

 

17) \W ("\\W" in Java) means a non-word character (the lowercase version, \w, means a word character).
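split( ) with a plain space, with \W+ and with a limit, in one sketch. The sample string is made up for illustration.

```java
import java.util.Arrays;

class Splitting {
    public static void main(String[] args) {
        String s = "one, two,three four";
        // Split on single spaces only
        System.out.println(Arrays.toString(s.split(" ")));       // [one,, two,three, four]
        // \W+ : one or more non-word characters
        System.out.println(Arrays.toString(s.split("\\W+")));    // [one, two, three, four]
        // Limit of 2: at most two pieces, the rest stays unsplit
        System.out.println(Arrays.toString(s.split("\\W+", 2))); // [one, two,three four]
    }
}
```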

 

18) The final regular expression tool built into String is replacement. You can replace either the first occurrence (replaceFirst( )) or all occurrences (replaceAll( )).
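Both replacement methods side by side, in a small sketch with a made-up input string.

```java
class Replacing {
    public static void main(String[] args) {
        String s = "found the shrubbery, then the herring";
        System.out.println(s.replaceFirst("f\\w+", "located"));
        // located the shrubbery, then the herring
        System.out.println(s.replaceAll("shrubbery|herring", "banana"));
        // found the banana, then the banana
    }
}
```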

 

19) A complete list of constructs for building regular expressions can be found in the JDK documentation for the Pattern class in package java.util.regex. The following are the special character forms in a Java String.
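The table of special character forms is an image that did not survive. The sketch below covers only a few forms that java.util.regex.Pattern documents: \t, \n and \xhh. Each regex backslash is doubled inside the Java literal.

```java
class SpecialChars {
    public static void main(String[] args) {
        // \t in the regex is a tab; the Java literal needs "\\t"
        System.out.println("a\tb".matches("a\\tb"));             // true
        // \x41 in the regex is the character with hex code 0x41, i.e. 'A'
        System.out.println("A".matches("\\x41"));                // true
        // \n in the regex is a newline
        System.out.println("line1\nline2".split("\\n").length); // 2
    }
}
```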

 


20) The power of regular expressions begins to appear when you are defining character classes. Here are some typical ways to create character classes, and some predefined classes:
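The character-class table is also missing. Here is a sketch of common class forms that Pattern supports: sets, negation, ranges, intersection, and the predefined \s, \d, \w and '.' classes.

```java
class CharClasses {
    public static void main(String[] args) {
        System.out.println("b".matches("[abc]"));        // true: any one of a, b, c
        System.out.println("d".matches("[^abc]"));       // true: anything except a, b, c
        System.out.println("Q".matches("[a-zA-Z]"));     // true: ranges
        System.out.println("e".matches("[a-z&&[def]]")); // true: intersection
        System.out.println(" ".matches("\\s"));          // true: whitespace
        System.out.println("7".matches("\\d"));          // true: a digit, [0-9]
        System.out.println("_".matches("\\w"));          // true: a word character, [a-zA-Z_0-9]
        System.out.println("x".matches("."));            // true: any character except line terminators
    }
}
```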



 

  

 

21)  A quantifier describes the way that a pattern absorbs input text:
    a. Greedy: Quantifiers are greedy unless otherwise altered. A greedy expression matches as much of the input as it possibly can. A common mistake is to assume that a pattern will match only the first possible group of characters, when it is actually greedy and will keep going until it has matched the largest possible string.
    b. Reluctant: Specified with a question mark, this quantifier matches the minimum number of characters necessary to satisfy the pattern. Also called lazy, minimal matching, non-greedy, or ungreedy.
    c. Possessive: Specified with a trailing '+' (as in X++), this is a more advanced form. When Thinking in Java was written it was available only in Java; some other engines, such as PCRE, have since added it. As a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, and thus prevent backtracking. They can be used to prevent a regular expression from running away and also to make it execute more efficiently.
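The three quantifier kinds applied to the same input. In this sketch, `firstMatch` is a hypothetical helper that returns the first match or null. The possessive `.++` swallows the final '>' and cannot give it back, so it finds nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Quantifiers {
    // Returns the first match of regex in input, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        System.out.println(firstMatch("<.+>", input));  // <a><b>  greedy: longest match
        System.out.println(firstMatch("<.+?>", input)); // <a>     reluctant: shortest match
        System.out.println(firstMatch("<.++>", input)); // null    possessive: no backtracking
    }
}
```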



 

 

22) The interface called CharSequence establishes a generalized definition of a character sequence abstracted from the CharBuffer, String, StringBuffer, or StringBuilder classes.
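Because String, StringBuilder, StringBuffer and CharBuffer all implement CharSequence, one method written against the interface accepts all of them. This is a minimal sketch, and `countDigits` is hypothetical.

```java
import java.nio.CharBuffer;

class CharSequenceDemo {
    // Works for any CharSequence: String, StringBuilder, StringBuffer, CharBuffer.
    static int countDigits(CharSequence cs) {
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (Character.isDigit(cs.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b2"));                        // 2
        System.out.println(countDigits(new StringBuilder("123")));      // 3
        System.out.println(countDigits(new StringBuffer("no digits"))); // 0
        System.out.println(countDigits(CharBuffer.wrap("x9")));         // 1
    }
}
```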

 

23) To compile regular expression objects, you import java.util.regex.Pattern (and java.util.regex.Matcher), then compile a regular expression using the static Pattern.compile( ) method. This produces a Pattern object based on its String argument. You use the Pattern by calling its matcher( ) method, passing the string that you want to search. The matcher( ) method produces a Matcher object, which has a set of operations to choose from (you can see all of these in the JDK documentation for java.util.regex.Matcher).
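The compile, matcher and find sequence as a minimal sketch. The helper `findAll` is hypothetical. It collects each match together with its start position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFind {
    // Returns every match of regex in input as "text@startIndex".
    static List<String> findAll(String regex, String input) {
        Pattern p = Pattern.compile(regex); // compile once
        Matcher m = p.matcher(input);       // bind the pattern to the input
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group() + "@" + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "Java 5, Java 8 and Java 17"));
        // [5@5, 8@13, 17@24]
    }
}
```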

 

24) Pattern also has a static method: static boolean matches(String regex, CharSequence input) to check whether regex matches the entire input CharSequence, and a split( ) method that produces an array of String that has been broken around matches of the regex.
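Pattern.matches( ) must cover the whole input, while Pattern.split( ) (optionally with a limit) breaks the input around matches. A short sketch with made-up inputs:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternStatics {
    public static void main(String[] args) {
        // matches() must cover the whole input
        System.out.println(Pattern.matches("[a-z]+", "hello"));  // true
        System.out.println(Pattern.matches("[a-z]+", "hello1")); // false
        String input = "This!!unusual use!!of exclamation!!points";
        Pattern bangs = Pattern.compile("!!");
        System.out.println(Arrays.toString(bangs.split(input)));
        // [This, unusual use, of exclamation, points]
        System.out.println(Arrays.toString(bangs.split(input, 3)));
        // [This, unusual use, of exclamation!!points]
    }
}
```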

 

25) A Matcher object is generated by calling Pattern.matcher( ) with the input string as an argument. The Matcher object is then used to access the results, using methods that evaluate the success or failure of different types of matches.

matches( ) succeeds only if the pattern matches the entire input string, while lookingAt( ) succeeds if the input string, starting at the beginning, matches the pattern. Both succeed only if the regular expression starts matching at the very beginning of the input, but lookingAt( ) succeeds even if only the first part of the input matches.

find( ) locates the regular expression anywhere in the input and can be used to discover multiple pattern matches in the CharSequence to which it is applied. find( ) is like an iterator, moving forward through the input string. A second version of find( ) takes an integer argument giving the character position at which to begin the search; this version resets the search position to the value of the argument.

An existing Matcher object can be applied to a new character sequence using reset( ). reset( ) without any arguments sets the Matcher to the beginning of the current sequence.
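One Matcher exercised with matches( ), lookingAt( ), find(int) and reset(CharSequence), as a minimal sketch with made-up input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethods {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\w+").matcher("Evening is full");
        System.out.println(m.matches());   // false: the spaces are not word characters
        System.out.println(m.lookingAt()); // true: "Evening" matches at the start
        if (m.find(8)) {                   // reset, then search from index 8
            System.out.println(m.group()); // is
        }
        m.reset("fix the rug");            // reuse the Matcher on new input
        StringBuilder words = new StringBuilder();
        while (m.find()) {
            words.append(m.group()).append(' ');
        }
        System.out.println(words.toString().trim()); // fix the rug
    }
}
```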

 

26) Groups are regular expressions set off by parentheses that can be called up later with their group number. Group 0 indicates the whole expression match, group 1 is the first parenthesized group. The Matcher object has methods to give you information about groups:
public int groupCount( ) returns the number of groups in this matcher’s pattern. Group 0 is not included in this count.

public String group( ) returns group 0 (the entire match) from the previous match operation (find(), for example).
public String group(int i) returns the given group number from the previous match operation. If the match was successful but the specified group failed to match any part of the input string, null is returned.
public int start(int group) returns the start index of the group found in the previous match operation.
public int end(int group) returns the index of the last character, plus one, of the group found in the previous match operation.
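The group methods listed above, applied to a simple e-mail-like pattern. The pattern and input are illustrative only. The second matcher shows an optional group that took no part in a successful match and therefore yields null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Groups {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("(\\w+)@(\\w+)\\.com").matcher("mail bob@example.com now");
        if (m.find()) {
            System.out.println(m.groupCount());              // 2 (group 0 not counted)
            System.out.println(m.group());                   // bob@example.com
            System.out.println(m.group(1));                  // bob
            System.out.println(m.group(2));                  // example
            System.out.println(m.start(1) + "-" + m.end(1)); // 5-8
        }
        // The optional group (a)? takes no part in matching "b", so group(1) is null.
        Matcher opt = Pattern.compile("(a)?b").matcher("b");
        if (opt.matches()) {
            System.out.println(opt.group(1));                // null
        }
    }
}
```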

 

27) Following a successful matching operation, start( ) returns the start index of the previous match, and end( ) returns the index of the last character matched, plus one. Invoking either start( ) or end( ) following an unsuccessful matching operation (or before attempting a matching operation) produces an IllegalStateException.
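start( ) and end( ) after a successful find( ), and the IllegalStateException before any match and after a failed one, in a short sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartEnd {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("b+").matcher("abbbc");
        try {
            m.start(); // no match attempted yet
        } catch (IllegalStateException e) {
            System.out.println("start() before matching: " + e.getClass().getSimpleName());
        }
        if (m.find()) {
            System.out.println(m.start() + " " + m.end()); // 1 4
        }
        if (!m.find()) { // there is no second run of b's
            try {
                m.end();
            } catch (IllegalStateException e) {
                System.out.println("end() after a failed find()");
            }
        }
    }
}
```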

 

28) An alternative compile( ) method accepts flags that affect matching behavior:

Pattern Pattern.compile(String regex, int flag)

where flag is drawn from among the following Pattern class constants:


 

The behavior of most of the flags can also be obtained by inserting an embedded flag expression, such as (?i) for CASE_INSENSITIVE or (?m) for MULTILINE, into your regular expression before the place where you want the mode to take effect. You can combine the effect of these and other flags through a bitwise OR ('|') operation.
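CASE_INSENSITIVE and MULTILINE combined with '|', compared with the equivalent embedded (?im) form. In this sketch the sample text and the helper `linesStartingWithJava` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Flags {
    static final String TEXT =
        "java has regex\nJava has regex\n" +
        "JAVA has pretty good regular expressions\n" +
        "Regular expressions are in Java";

    static List<String> linesStartingWithJava(Pattern p) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(TEXT);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Flags combined with bitwise OR
        Pattern viaFlags = Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        // The same modes as embedded flag expressions: (?i) and (?m)
        Pattern embedded = Pattern.compile("(?im)^java");
        System.out.println(linesStartingWithJava(viaFlags)); // [java, Java, JAVA]
        System.out.println(linesStartingWithJava(embedded)); // [java, Java, JAVA]
    }
}
```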

 

29) Regular expressions are especially useful to replace text. Here are the available methods:
replaceFirst(String replacement) replaces the first matching part of the input string with replacement.
replaceAll(String replacement) replaces every matching part of the input string with replacement.
appendReplacement(StringBuffer sbuf, String replacement) performs step-by-step replacements into sbuf, rather than replacing only the first one or all of them. This is a very important method, because it allows you to call methods and perform other processing in order to produce the replacement. With this method, you can programmatically pick apart the groups and create powerful replacements. appendReplacement( ) also allows you to refer to captured groups directly in the replacement string by saying "$g", where ‘g’ is the group number.
appendTail(StringBuffer sbuf) is invoked after one or more invocations of the appendReplacement( ) method in order to copy the remainder of the input string.
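A sketch of the appendReplacement( ) and appendTail( ) loop, where each replacement is computed in code (here by upper-casing vowels), followed by a "$g" group reference in replaceAll( ). The helper `upperVowels` is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VowelUpper {
    // Upper-cases every vowel, computing each replacement in code.
    static String upperVowels(String input) {
        Matcher m = Pattern.compile("[aeiou]").matcher(input);
        StringBuffer sbuf = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sbuf, m.group().toUpperCase(Locale.ROOT));
        }
        m.appendTail(sbuf); // copy whatever follows the last match
        return sbuf.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperVowels("regular expression")); // rEgUlAr ExprEssIOn
        // $g refers to captured group g in a replacement string
        System.out.println("John Smith".replaceAll("(\\w+) (\\w+)", "$2, $1")); // Smith, John
    }
}
```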

 

30) The Scanner class, added in Java SE5, relieves much of the burden of scanning input. The Scanner constructor can take just about any kind of input object, including a File, an InputStream, a String, or a Readable, which is an interface introduced in Java SE5 to describe "something that has a read(CharBuffer) method." (BufferedReader falls into this category.)

With Scanner, the input, tokenizing, and parsing are all ensconced in various kinds of "next" methods. A plain next( ) returns the next String token, and there are "next" methods for all the primitive types (except char) as well as for BigDecimal and BigInteger. All of the "next" methods block, meaning they return only after a complete data token is available for input. There are also corresponding "hasNext" methods that return true if the next input token is of the correct type.

One of the assumptions made by Scanner is that an IOException signals the end of input, so these exceptions are swallowed by the Scanner. However, the most recent exception is available through the ioException( ) method, so you can examine it if necessary.

By default, a Scanner splits input tokens along whitespace, but you can also specify your own delimiter pattern in the form of a regular expression. Besides useDelimiter( ) for setting the delimiter pattern, there is also delimiter( ), which returns the current Pattern being used as a delimiter.

When you use next( ) with a specific pattern, that pattern is matched against the next input token. The result (of type MatchResult) is made available by the match( ) method, and it works just like regular expression matching with a Matcher.
There’s one caveat when scanning with regular expressions: the pattern is matched against the next input token only, so if your pattern contains a delimiter it will never be matched.
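A sketch touching each Scanner feature described above: typed "next" methods, a custom delimiter, next(pattern) with match( ), and the delimiter caveat. The inputs and the helper `sumCsv` are hypothetical. useLocale(Locale.US) is an added assumption so that nextDouble( ) expects '.' as the decimal separator.

```java
import java.util.Locale;
import java.util.Scanner;
import java.util.regex.MatchResult;

class ScannerDemo {
    // Sums comma-separated integers using a custom delimiter pattern.
    static int sumCsv(String input) {
        Scanner in = new Scanner(input).useDelimiter("\\s*,\\s*");
        int sum = 0;
        while (in.hasNextInt()) {
            sum += in.nextInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("Sir Robin\n22 1.61803");
        in.useLocale(Locale.US);           // fix the decimal separator to '.'
        String name = in.nextLine();       // Sir Robin
        int age = in.nextInt();            // 22
        double favorite = in.nextDouble(); // 1.61803
        System.out.println(name + ", " + age + ", " + favorite);

        System.out.println(sumCsv("12, 42, 78, 99")); // 231

        // next(pattern) plus match() expose the groups of the matched token
        Scanner log = new Scanner("58.27.82.161@02/10/2005");
        log.next("(\\d+[.]\\d+[.]\\d+[.]\\d+)@(\\d{2}/\\d{2}/\\d{4})");
        MatchResult r = log.match();
        System.out.println(r.group(1) + " on " + r.group(2)); // 58.27.82.161 on 02/10/2005

        // Caveat: tokens never contain the delimiter, so this pattern cannot match
        System.out.println(new Scanner("Sir Robin").hasNext("Sir Robin")); // false
    }
}
```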

 

 
