Language Guide

 

1.   The SearchRequest message definition below specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type:

 

 

message SearchRequest {

  required string query = 1;

  optional int32 page_number = 2;

  optional int32 result_per_page = 3;

} 
 

In the above example, all the fields are scalar types : two integers and a string. However, you can also specify composite types for your fields, including enumerations and other message types.

 

2.   Each field in the message definition has a unique numbered tag. These tags are used to identify your fields in the message binary format, and should not be changed once your message type is in use. Note that tags with values in the range 1 through 15 take one byte to encode, including the identifying number and the field's type (you can find out more about this in Protocol Buffer Encoding). Tags in the range 16 through 2047 take two bytes. So you should reserve the tags 1 through 15 for very frequently occurring message elements. The smallest tag number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911. You also cannot use the numbers 19000 through 19999 (FieldDescriptor::kFirstReservedNumber through FieldDescriptor::kLastReservedNumber), as they are reserved for the Protocol Buffers implementation; the protocol buffer compiler will complain if you use one of these reserved numbers in your .proto.
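The byte counts above follow from how a field's key is written: the field number is shifted left by three bits, combined with a three-bit wire type, and encoded as a varint. The following is a minimal Java sketch of that arithmetic, not code from the protobuf library; the class and method names (FieldTagSketch, tagSize, isUsableFieldNumber) are made up for illustration.

```java
// Hypothetical helper, not part of the protobuf runtime.
final class FieldTagSketch {
    static final int MIN_FIELD_NUMBER = 1;
    static final int MAX_FIELD_NUMBER = (1 << 29) - 1; // 536,870,911

    /** True if n is within 1..2^29-1 and outside the reserved range 19000-19999. */
    static boolean isUsableFieldNumber(int n) {
        if (n < MIN_FIELD_NUMBER || n > MAX_FIELD_NUMBER) {
            return false;
        }
        return n < 19000 || n > 19999;
    }

    /** Number of bytes the key (field number << 3 | wire type) occupies as a varint. */
    static int tagSize(int fieldNumber) {
        // The wire type only fills the low 3 bits, so it never changes the size.
        long key = ((long) fieldNumber) << 3;
        int size = 1;
        while (key >= 0x80) { // each varint byte carries 7 payload bits
            key >>>= 7;
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 15, 16, 2047, 2048}) {
            System.out.println("field " + n + ": " + tagSize(n) + " byte(s)");
        }
        System.out.println("19500 usable? " + isUsableFieldNumber(19500));
    }
}
```

Under these assumptions, field numbers 1 through 15 need one byte and 16 through 2047 need two, which matches the ranges quoted in the note.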

 

3.   You specify that message fields are one of the following:

  a)   required : a well-formed message must have exactly one of this field.

  b)   optional : a well-formed message can have zero or one of this field.

  c)   repeated : this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.

 

4.   For historical reasons, repeated fields of basic numeric types aren't encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding:

repeated int32 samples = 4 [packed=true];
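To see why the packed form is smaller, it helps to count bytes. The sketch below is an illustration only, assuming the standard wire layout: an unpacked repeated int32 writes one key per element, while a packed field writes one key, one length, and then the element varints back to back. The class and method names are hypothetical and do not come from the protobuf library.

```java
// Hypothetical size calculator, not part of the protobuf runtime.
final class PackedSizeSketch {
    /** Bytes needed to write value as a varint (value treated as unsigned 64-bit). */
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** Unpacked: every element carries its own key (wire type 0). */
    static int unpackedSize(int fieldNumber, int[] values) {
        int keySize = varintSize(((long) fieldNumber) << 3);
        int total = 0;
        for (int v : values) {
            total += keySize + varintSize((long) v); // int32 is sign-extended to 64 bits
        }
        return total;
    }

    /** Packed: one key (wire type 2), one length prefix, then the raw varints. */
    static int packedSize(int fieldNumber, int[] values) {
        if (values.length == 0) {
            return 0; // an empty repeated field contributes nothing to the message
        }
        int payload = 0;
        for (int v : values) {
            payload += varintSize((long) v);
        }
        int keySize = varintSize((((long) fieldNumber) << 3) | 2);
        return keySize + varintSize(payload) + payload;
    }

    public static void main(String[] args) {
        int[] samples = {1, 2, 3, 4, 5};
        System.out.println("unpacked: " + unpackedSize(4, samples) + " bytes");
        System.out.println("packed:   " + packedSize(4, samples) + " bytes");
    }
}
```

For five small values in field 4, the sketch counts 10 bytes unpacked against 7 bytes packed; the saving grows with the number of elements.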

 

5.   Multiple message types can be defined in a single .proto file. This is useful if you are defining multiple related messages.

 

6.   To add comments to your .proto files, use // syntax.

 

7.   When you run the protocol buffer compiler on a .proto, the compiler generates the code in your chosen language that you'll need to work with the message types you've described in the file, including getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream. For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder class for creating message class instances.

 

8.   A scalar message field can have one of the following types:

.proto Type | Notes | Java Type
double | | double
float | | float
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | long
uint32 | Uses variable-length encoding. | int
uint64 | Uses variable-length encoding. | long
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | long
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | int
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | long
sfixed32 | Always four bytes. | int
sfixed64 | Always eight bytes. | long
bool | | boolean
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | String
bytes | May contain any arbitrary sequence of bytes. | ByteString
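The notes in the table can be checked with a little arithmetic. The sketch below is a hand-written illustration, not the protobuf runtime: it assumes the usual varint layout (7 payload bits per byte, negative int32 values sign-extended to 64 bits) and the ZigZag mapping used by sint32. All names in it are hypothetical.

```java
// Hypothetical size helpers, not part of the protobuf runtime.
final class ScalarEncodingSketch {
    /** Bytes needed to write value as a varint (value treated as unsigned 64-bit). */
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** int32: negative values are sign-extended to 64 bits before encoding. */
    static int int32Size(int v) {
        return varintSize((long) v);
    }

    /** uint32: the 32 bits are read as an unsigned quantity. */
    static int uint32Size(int v) {
        return varintSize(v & 0xFFFFFFFFL);
    }

    /** ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... */
    static int zigZag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    /** sint32: ZigZag first, then varint. */
    static int sint32Size(int v) {
        return varintSize(zigZag32(v) & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println("int32  -1      : " + int32Size(-1) + " bytes");
        System.out.println("sint32 -1      : " + sint32Size(-1) + " bytes");
        System.out.println("uint32 2^28    : " + uint32Size(1 << 28) + " bytes");
        System.out.println("uint32 2^28 - 1: " + uint32Size((1 << 28) - 1) + " bytes");
    }
}
```

Under these assumptions -1 costs ten bytes as an int32 but one byte as a sint32, and a uint32 reaches five bytes at 2^28, which is the point where the four-byte fixed32 becomes the smaller choice.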

 

9.   When a message is parsed, if it does not contain an optional element, the corresponding field in the parsed object is set to the default value for that field. The default value can be specified as part of the message description:

optional int32 result_per_page = 3 [default = 10];
 

 

If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string. For bools, the default value is false. For numeric types, the default value is zero. For enums, the default value is the first value listed in the enum's type definition.

 

10.   A field with an enum type can only have one of a specified set of constants as its value (if you try to provide a different value, the parser will treat it like an unknown field):

 

enum Corpus {
  UNIVERSAL = 0;
  WEB = 1;
  IMAGES = 2;
}
 

Enumerator constants must be in the range of a 32-bit integer. Since enum values use varint encoding on the wire, negative values are inefficient and thus not recommended. You can also use an enum type declared in one message as the type of a field in a different message, using the syntax MessageType.EnumType . When you run the protocol buffer compiler on a .proto that uses an enum , the generated code will have a corresponding enum .

 

11.   You can use other message types as field types. You can use definitions from other .proto   files by importing them:

import "myproject/other_protos.proto";
 

 

The protocol compiler searches for imported files in a set of directories specified on the protocol compiler command line using the -I/--proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the --proto_path flag to the root of your project and use fully qualified names for all imports.

 

12.   Extensions let you declare that a range of field numbers in a message are available for third-party extensions. Other people can then declare new fields for your message type with those numeric tags in their own .proto files without having to edit the original file:

message Foo {

  // ...

  extensions 100 to 199;

}
 

This says that the range of field numbers [100, 199] in Foo is reserved for extensions. Other users can now add new fields to Foo in their own .proto files that import your .proto , using tags within your specified range:

 

 

extend Foo {

  optional int32 bar = 126;

} 
 

13.   You can specify that your extension range goes up to the maximum possible field number using the max keyword:

 

message Foo {

  extensions 1000 to max;

} 
 

 

14.   You can declare extensions in the scope of another type:

message Baz {

  extend Foo {

    optional int32 bar = 126;

  }

  ...

}
 

Declaring an extend block nested inside a message type does not imply any relationship between the outer type and the extended type. In particular, the above example does not mean that Baz is any sort of subclass of Foo . All it means is that the symbol bar is declared inside the scope of Baz ; it's simply a static member.

 

 

15.   It's very simple to update message types without breaking any of your existing code. Just remember the following rules:

  a)   Don't change the numeric tags for any existing fields.

  b)   Any new fields that you add should be optional or repeated.

  c)   Non-required fields can be removed, as long as the tag number is not used again in your updated message type (it may be better to rename the field instead, perhaps adding the prefix "OBSOLETE_ ", so that future users of your .proto can't accidentally reuse the number).

  d)   A non-required field can be converted to an extension and vice versa, as long as the type and number stay the same.

  e)   int32 , uint32 , int64 , uint64 , and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn't fit in the corresponding type, you will get the same effect as if you had cast the number to that type (e.g. if a 64-bit number is read as an int32 , it will be truncated to 32 bits).

  f)   sint32 and sint64 are compatible with each other but are not compatible with the other integer types.

  g)   string and bytes are compatible as long as the bytes are valid UTF-8.

  h)   Embedded messages are compatible with bytes if the bytes contain an encoded version of the message.

  i)   fixed32 is compatible with sfixed32, and fixed64 with sfixed64.

  j)   optional is compatible with repeated . Given serialized data of a repeated field as input, clients that expect this field to be optional will take the last input value if it's a primitive type field or merge all input elements if it's a message type field.

  k)   Changing a default value is generally OK, as long as you remember that default values are never sent over the wire.
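Rules e) and f) can be illustrated without the protobuf library. The sketch below assumes that a varint field arrives as a 64-bit value and that a reader expecting int32 simply casts it, as rule e) describes; it also assumes the ZigZag mapping for sint32. The class and method names are hypothetical.

```java
// Hypothetical illustration of rules e) and f), not part of the protobuf runtime.
final class WireCompatSketch {
    /** Rule e): reading a wider varint as int32 behaves like a cast (keeps the low 32 bits). */
    static int readAsInt32(long wireValue) {
        return (int) wireValue;
    }

    /** ZigZag encoding used by sint32 writers. */
    static int zigZagEncode32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    /** ZigZag decoding used by sint32 readers. */
    static int zigZagDecode32(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    public static void main(String[] args) {
        // A 64-bit value that does not fit in 32 bits is truncated.
        System.out.println(readAsInt32(0x100000005L));

        // Rule f): a sint32 writer and an int32 reader disagree about the value.
        int onTheWire = zigZagEncode32(-1);
        System.out.println("read as sint32: " + zigZagDecode32(onTheWire));
        System.out.println("read as int32:  " + readAsInt32(onTheWire));
    }
}
```

The second half shows why sint32 cannot be swapped for int32: the bytes still parse, but the reader recovers a different number.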

 

16.   You can add an optional package specifier to a .proto file to prevent name clashes between protocol message types:

package foo.bar;

message Open { ... }
 

You can then use the package specifier when defining fields of your message type:

 

    

message Foo {
  ...
  required foo.bar.Open open = 1;
  ...
}
 

 

In Java , the package is used as the Java package, unless you explicitly provide an option java_package in your .proto file.

 

17.   Type name resolution in the protocol buffer language works like: first the innermost scope is searched, then the next-innermost, and so on, with each package considered to be "inner" to its parent package. A leading '.' (for example, .foo.bar.Baz) means to start from the outermost scope instead.
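The inner-to-outer search can be sketched in a few lines of Java. This is a simplified model and not the compiler's actual implementation: it assumes a flat set of fully qualified type names and tries the whole name at each scope, whereas the real compiler has additional rules for dotted names. The class name, method name, and the sample type names are made up for illustration.

```java
import java.util.Set;

// Hypothetical, simplified model of the scope search; not protoc's real algorithm.
final class NameResolutionSketch {
    /**
     * Resolves name as seen from scope (a dotted package or message path).
     * Returns the fully qualified name, or null if nothing matches.
     */
    static String resolve(String name, String scope, Set<String> knownTypes) {
        if (name.startsWith(".")) {
            // A leading '.' means: start from the outermost scope.
            String full = name.substring(1);
            return knownTypes.contains(full) ? full : null;
        }
        String current = scope;
        while (true) {
            String candidate = current.isEmpty() ? name : current + "." + name;
            if (knownTypes.contains(candidate)) {
                return candidate;
            }
            if (current.isEmpty()) {
                return null; // the outermost scope was searched without a match
            }
            int dot = current.lastIndexOf('.');
            current = dot < 0 ? "" : current.substring(0, dot); // step out one level
        }
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("foo.bar.Baz", "foo.Baz", "Open");
        System.out.println(resolve("Baz", "foo.bar.Msg", known));
        System.out.println(resolve(".foo.Baz", "foo.bar.Msg", known));
        System.out.println(resolve("Open", "foo.bar", known));
    }
}
```

In this model the innermost match wins, so "Baz" seen from foo.bar.Msg resolves to foo.bar.Baz even though foo.Baz also exists, and the leading dot is the way to bypass that search.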

 

18.   If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a  .proto file and the protocol buffer compiler will generate service interface code and stubs in your chosen language:


 

service SearchService {

  rpc Search (SearchRequest) returns (SearchResponse);

}
 

The protocol compiler will then generate an abstract interface called SearchService and a corresponding "stub" implementation. The stub forwards all calls to an RpcChannel , which in turn is an abstract interface that you must define yourself in terms of your own RPC system.

The generated code may be undesirable as it is not tied to any particular RPC system, and thus requires more levels of indirection than code tailored to one system. If you do NOT want this code to be generated, you can specify "option java_generic_services = false;" in the .proto file. In releases after 2.4.0 the option defaults to false, because generic services are deprecated. RPC systems based on .proto-language service definitions should provide plugins to generate code appropriate for the system.

 

19.   Individual declarations in a .proto file can be annotated with a number of options. Options do not change the overall meaning of a declaration, but may affect the way it is handled in a particular context. The complete list of available options is defined in google/protobuf/descriptor.proto. Some options are file-level options, meaning they should be written at the top-level scope, not inside any message, enum, or service definition. Some options are message-level options, meaning they should be written inside message definitions. Options can also be written on fields, enum types, enum values, service types, and service methods.

 

20.   java_package (file option): The package you want to use for your generated Java classes. If no explicit java_package option is given in the .proto file, then by default the proto package (specified using the "package" keyword in the .proto file) will be used:

option java_package = "com.example.foo";
 

 

 

21.   java_outer_classname (file option): The class name for the outermost Java class (and hence the file name) you want to generate. If no explicit java_outer_classname is specified in the .proto file, the class name will be constructed by converting the .proto file name to camel-case (so foo_bar.proto becomes FooBar.java ):

option java_outer_classname = "Ponycopter";
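The default naming rule can be approximated as follows. This is a rough Java sketch of the camel-case conversion described above, covering only letters and underscores; the real compiler may treat other characters and name clashes differently. The class and method names are hypothetical.

```java
// Hypothetical approximation of the default outer class name; not protoc's code.
final class OuterClassNameSketch {
    /** Converts a .proto file name such as "foo_bar.proto" to "FooBar". */
    static String defaultOuterClassName(String protoFileName) {
        // Drop any directory part and the ".proto" suffix.
        String base = protoFileName.substring(protoFileName.lastIndexOf('/') + 1);
        if (base.endsWith(".proto")) {
            base = base.substring(0, base.length() - ".proto".length());
        }
        StringBuilder out = new StringBuilder();
        boolean upperNext = true; // the first letter is capitalized
        for (char c : base.toCharArray()) {
            if (c == '_') {
                upperNext = true; // an underscore starts a new word
                continue;
            }
            out.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(defaultOuterClassName("foo_bar.proto"));
        System.out.println(defaultOuterClassName("myproject/other_protos.proto"));
    }
}
```

Setting java_outer_classname explicitly avoids depending on this derived name at all.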
 

 

 

22.   optimize_for (file option): Can be set to SPEED , CODE_SIZE , or LITE_RUNTIME . This affects the Java code generators in the following ways:

  a)   SPEED (default): The protocol buffer compiler will generate code for serializing, parsing, and performing other common operations on your message types. This code is extremely highly optimized.

  b)   CODE_SIZE : The protocol buffer compiler will generate minimal classes and will rely on shared, reflection-based code to implement serialization, parsing, and various other operations. The generated code will thus be much smaller than with SPEED , but operations will be slower. Classes will still implement exactly the same public API as they do in SPEED mode.

  c)   LITE_RUNTIME : The protocol buffer compiler will generate classes that depend only on the "lite" runtime library (libprotobuf-lite instead of libprotobuf ). The lite runtime is much smaller than the full library (around an order of magnitude smaller) but omits certain features like descriptors and reflection. The compiler will still generate fast implementations of all methods as it does in SPEED mode. Generated classes will only implement the MessageLite interface in each language, which provides only a subset of the methods of the full Message interface.

 

23.   packed (field option): If set to true on a repeated field of a basic numeric type, a more compact encoding will be used. However, note that prior to version 2.3.0, parsers that received packed data when not expected would ignore it. Therefore, it was not possible to change an existing field to packed format without breaking wire compatibility. In 2.3.0 and later, this change is safe, as parsers for packable fields will always accept both formats:

repeated int32 samples = 4 [packed=true];
 

 

 

24.   Protocol Buffers even allow you to define and use your own options. Since options are defined by the messages defined in google/protobuf/descriptor.proto (like FileOptions or FieldOptions ), defining your own options is simply a matter of extending those messages:

import "google/protobuf/descriptor.proto";

 

extend google.protobuf.MessageOptions {

  optional string my_option = 51234;

}

 

message MyMessage {

  option (my_option) = "Hello world!";

}
 

The option name must be enclosed in parentheses to indicate that it is an extension. We can now read the value of my_option in Java:

 

String value = MyProtoFile.MyMessage.getDescriptor().getOptions()

  .getExtension(MyProtoFile.myOption); 
 

 

25.   If you want to use a custom option in a package other than the one in which it was defined, you must prefix the option name with the package name, just as you would for type names:

 

// foo.proto
import "google/protobuf/descriptor.proto";
package foo;
extend google.protobuf.MessageOptions {
  optional string my_option = 51234;
}

// bar.proto
import "foo.proto";
package bar;
message MyMessage {
  option (foo.my_option) = "Hello world!";
}
 

 

 

26.   Since custom options are extensions, they must be assigned field numbers like any other field or extension. The range 50000-99999 is reserved for internal use within individual organizations. To obtain globally unique field numbers, please send a request to protobuf-global-extension-registry@google.com. Simply provide your project name (e.g. Objective-C plugin) and your project website (if available). Usually you only need one extension number. You can declare multiple options with only one extension number by putting them in a sub-message:

 

message FooOptions {

  optional int32 opt1 = 1;

  optional string opt2 = 2;

}

 

extend google.protobuf.FieldOptions {

  optional FooOptions foo_options = 1234;

}

 

// usage:

message Bar {

  optional int32 a = 1 [(foo_options.opt1) = 123, (foo_options.opt2) = "baz"];

  // alternative aggregate syntax (uses TextFormat):

  optional int32 b = 2 [(foo_options) = { opt1: 123 opt2: "baz" }];

} 
 

 

27.   To generate the Java code you need to work with the message types defined in a .proto file, you need to run the protocol buffer compiler protoc on the .proto:

protoc --proto_path=IMPORT_PATH --java_out=DST_DIR   path/to/file.proto
 

 

  a)   IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times; they will be searched in order. -I=IMPORT_PATH can be used as a short form of --proto_path .

  b)   --java_out generates Java code in DST_DIR . As an extra convenience, if the DST_DIR ends in .zip or .jar , the compiler will write the output to a single ZIP-format archive file with the given name. .jar outputs will also be given a manifest file as required by the Java JAR specification. Note that if the output archive already exists, it will be overwritten; the compiler is not smart enough to add files to an existing archive.

  c)   You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.
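The requirement that every input file live under one of the IMPORT_PATHs can be pictured as a path computation. The sketch below is an assumption about the idea, not protoc's implementation: it takes the first import directory that contains the file and uses the remaining relative path as the canonical name. The class name, method name, and sample paths are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical model of deriving a canonical name; not protoc's real code.
final class CanonicalNameSketch {
    /**
     * Returns the file's path relative to the first import directory that
     * contains it, using '/' separators, or null if no directory contains it.
     */
    static String canonicalName(String file, List<String> importPaths) {
        Path target = Paths.get(file).toAbsolutePath().normalize();
        for (String dir : importPaths) { // directories are searched in order
            Path root = Paths.get(dir).toAbsolutePath().normalize();
            if (target.startsWith(root)) {
                return root.relativize(target).toString().replace('\\', '/');
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> importPaths = List.of("third_party", "src");
        System.out.println(canonicalName("src/myproject/foo.proto", importPaths));
        System.out.println(canonicalName("elsewhere/foo.proto", importPaths));
    }
}
```

Under this model, src/myproject/foo.proto with --proto_path=src has the canonical name myproject/foo.proto, which is also the string other files would use to import it.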
