拒做PB Boy！教你巧用 Protobuf 反射来优化代码

2020-12-01

作者：iversonluo，腾讯 WXG 应用开发工程师

有些后台同学将自己称为 SQL Boy，因为负责的业务主要是对数据库进行增删改查。经常和 Proto 打交道的同学，是不是也会叫自己 PB Boy？因为大部分工作也是对 Proto 进行 SET 和 GET。面对大量重复且丑陋的代码，除了宏是否有更好的解决方法？本文结合 PB 反射给出了我在运营系统开发工作中的一些代码优化实践。

一、背景

Protobuf(下文称为 PB)是一种常见的数据序列化方式，常常用于后台微服务之间传递数据。

笔者目前主要的工作都是和表单打交道，而表单一般涉及到大量的数据输入，表单调用方一般将数据格式化为 JSON 后传给 CGI，而 CGI 和后台服务、后台服务之前会用 PB 传递数据。

在写代码时，经常会遇到一些丑陋的、圈复杂度较高、较难维护的关于 PB 的使用代码：

对字段的必填校验硬编码在代码中：如果需要变更校验规则，则需要修改代码；
一个字段一个 if 校验，圈复杂度较高：对传进来的字段每个字段都进行多种规则校验，例如长度，XSS，正则校验等，一个校验一个 if 代码，代码圈复杂度很高；
想要获取 PB 中所有的非空字段，形成一个 map<string,string>，需要大量的 if 判断和重复代码；
在后台服务间传递数据，由于模块由不同的人开发，导致相同字段的命名不一样，从一个 PB 中挑选一部分内容到另外一个 PB 中，需要大量的 GET 和 SET 代码。

是否可以有方法解决上面的几个问题呢？

答案是使用PB 反射。

二、PB 反射的使用

反射的一般定义如下：计算机程序在运行时可以访问、检测和修改它本身状态或行为。

protobuf 的类图如下：

从上图我们可以看出，Message 类继承于 MessageLite 类，业务一般自定义的 Person 类继承于 Message 类。

Descriptor 类和 Reflection 类都聚合于 Message，是弱依赖的关系。

类名类描述Descriptor对 Message 进行描述，包括 message 的名字、所有字段的描述、原始 proto 文件内容等FieldDescriptor对 Message 中单个字段进行描述，包括字段名、字段属性、原始的 field 字段等Reflection提供了动态读和写 message 中单个字段能力

所以一般使用 PB 反射的步骤如下：

1. 通过Message获取单个字段的FieldDescriptor
2. 通过Message获取其Reflection
3. 通过Reflection来操作FieldDescriptor，从而动态获取或修改单个字段

获取 Descript、Reflection 的函数：

const google::protobuf::Reflection* pReflection = pMessage->GetReflection();
const google::protobuf::Descriptor* pDescriptor = pMessage->GetDescriptor();

获取 FieldDescriptor 的函数：

const google::protobuf::FieldDescriptor * pFieldDesc = pDescriptor->FindFieldByName(id);

下面分别介绍上面的三个类。

2.1 类 Descriptor 介绍

类 Descriptor 主要是对 Message 进行描述，包括 message 的名字、所有字段的描述、原始 proto 文件内容等，下面介绍该类中包含的函数。

首先是获取自身信息的函数：

const std::string & name() const; // 获取message自身名字
int field_count() const; // 获取该message中有多少字段
const FileDescriptor* file() const; // The .proto file in which this message type was defined. Never nullptr.

在类 Descriptor 中，可以通过如下方法获取类 FieldDescriptor：

const FieldDescriptor* field(int index) const; // 根据定义顺序索引获取，即从0开始到最大定义的条目
const FieldDescriptor* FindFieldByNumber(int number) const; // 根据定义的message里面的顺序值获取（option string name=3，3即为number）
const FieldDescriptor* FindFieldByName(const string& name) const; // 根据field name获取
const FieldDescriptor* Descriptor::FindFieldByLowercaseName(const std::string & lowercase_name)const; // 根据小写的field name获取
const FieldDescriptor* Descriptor::FindFieldByCamelcaseName(const std::string & camelcase_name) const; // 根据驼峰的field name获取

其中FieldDescriptor* field(int index)和FieldDescriptor* FindFieldByNumber(int number)这个函数中index和number的含义是不一样的，如下所示：

message Student{
  optional string name = 1;
  optional string gender = 2;
  optional string phone = 5;
}

其中字段phone，其index为 5，但是其number为 2。

同时还有一个我们在调试中经常使用的函数：

std::string Descriptor::DebugString(); // 将message转化成人可以识别出的string信息

2.2 类 FieldDescriptor 介绍

类 FieldDescriptor 的作用主要是对 Message 中单个字段进行描述，包括字段名、字段属性、原始的 field 字段等。

其获取获取自身信息的函数：

const std::string & name() const; // Name of this field within the message.
const std::string & lowercase_name() const; // Same as name() except converted to lower-case.
const std::string & camelcase_name() const; // Same as name() except converted to camel-case.
CppType cpp_type() const; //C++ type of this field.

其中cpp_type()函数是来获取该字段是什么类型的，在 PB 中，类型的类目如下：

enum FieldDescriptor::Type {
  TYPE_DOUBLE = = 1,
  TYPE_FLOAT = = 2,
  TYPE_INT64 = = 3,
  TYPE_UINT64 = = 4,
  TYPE_INT32 = = 5,
  TYPE_FIXED64 = = 6,
  TYPE_FIXED32 = = 7,
  TYPE_BOOL = = 8,
  TYPE_STRING = = 9,
  TYPE_GROUP = = 10,
  TYPE_MESSAGE = = 11,
  TYPE_BYTES = = 12,
  TYPE_UINT32 = = 13,
  TYPE_ENUM = = 14,
  TYPE_SFIXED32 = = 15,
  TYPE_SFIXED64 = = 16,
  TYPE_SINT32 = = 17,
  TYPE_SINT64 = = 18,
  MAX_TYPE = = 18
}

类 FieldDescriptor 中还可以判断字段是否是必填，还是选填或者重复：

bool is_required() const; // 判断字段是否是必填
bool is_optional() const; // 判断字段是否是选填
bool is_repeated() const; // 判断字段是否是重复值

类 FieldDescriptor 中还可以获取单个字段的index或者tag:

int number() const; // Declared tag number.
int index() const; //Index of this field within the message's field array, or the file or extension scope's extensions array.

类 FieldDescriptor 中还有一个支持扩展的函数，函数如下：

// Get the FieldOptions for this field.  This includes things listed in
// square brackets after the field definition.  E.g., the field:
//   optional string text = 1 [ctype=CORD];
// has the "ctype" option set.  Allowed options are defined by FieldOptions in
// descriptor.proto, and any available extensions of that message.
const FieldOptions & FieldDescriptor::options() const

具体关于该函数的讲解在 2.4 章。

2.3 类 Reflection 介绍

该类提供了动态读、写 message 中单个字段能力。

读单个字段的函数如下：

// 这里由于篇幅，省略了一部分代码，后面的代码部分也有省略，有需要的可以自行阅读源码。
int32 GetInt32(const Message & message, const FieldDescriptor * field) const

std::string GetString(const Message & message, const FieldDescriptor * field) const

const Message & GetMessage(const Message & message, const FieldDescriptor * field, MessageFactory * factory = nullptr) const // 读取单个message字段

写单个字段的函数如下：

void SetInt32(Message * message, const FieldDescriptor * field, int32 value) const

void SetString(Message * message, const FieldDescriptor * field, std::string value) const

获取重复字段的函数如下：

int32 GetRepeatedInt32(const Message & message, const FieldDescriptor * field, int index) const

std::string GetRepeatedString(const Message & message, const FieldDescriptor * field, int index) const

const Message & GetRepeatedMessage(const Message & message, const FieldDescriptor * field, int index) const

写重复字段的函数如下：

void SetRepeatedInt32(Message * message, const FieldDescriptor * field, int index, int32 value) const

void SetRepeatedString(Message * message, const FieldDescriptor * field, int index, std::string value) const

void SetRepeatedEnumValue(Message * message, const FieldDescriptor * field, int index, int value) const // Set an enum field's value with an integer rather than EnumValueDescriptor. more..

新增重复字段设计如下：

void AddInt32(Message * message, const FieldDescriptor * field, int32 value) const

void AddString(Message * message, const FieldDescriptor * field, std::string value) const

另外有一个较为重要的函数，其可以批量获取字段描述并将其放置到 vector 中：

void Reflection::ListFields(const Message & message, std::vector< const FieldDescriptor * > * output) const

2.4 options 介绍

PB 允许在 proto 中自定义选项并使用选项。在定义 message 的字段时，不仅可以定义字段内容，还可以设置字段的属性，比如校验规则，简介等，结合反射，可以实现丰富丰富多彩的应用。

下面来介绍下：

import "google/protobuf/descriptor.proto";

extend google.protobuf.FieldOptions {
  optional uint32 attr_id              = 50000; //字段id
  optional bool is_need_encrypt        = 50001 [default = false]; // 字段是否加密,0代表不加密，1代表加密
  optional string naming_conventions1  = 50002; // 商户组命名规范
  optional uint32 length_min           = 50003  [default = 0]; // 字段最小长度
  optional uint32 length_max           = 50004  [default = 1024]; // 字段最大长度
  optional string regex                = 50005; // 该字段的正则表达式
}

message SubMerchantInfo {
  // 商户名称
  optional string merchant_name = 1 [
    (attr_id) = 1,
    (is_encrypt) = 0,
    (naming_conventions1) = "company_name",
    (length_min) = 1,
    (length_max) = 80,
    (regex.field_rules) = "[a-zA-Z0-9]"
  ];

使用方法如下：

#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

std::string strRegex = FieldDescriptor->options().GetExtension(regex);

uint32 dwLengthMinp = FieldDescriptor->options().GetExtension(length_min);

bool bIsNeedEncrypt = FieldDescriptor->options().GetExtension(is_need_encrypt);

三、PB 反射的进阶使用

第二章给出了 PB 反射，以及具体的使用细节，在本章中，作者结合自己日常的代码，给出 PB 反射一些使用场景。并且以开发一个表单系统为例，讲一下 PB 反射在开发表单系统中的进阶使用。

3.1 获取 PB 中所有非空字段

在业务中，经常会需要获取某个 Message 中所有非空字段，形成一个 map<string,string>，使用 PB 反射写法如下：

#include "pb_util.h"

#include <sstream>

namespace comm_tools {
int PbToMap(const google::protobuf::Message &message,
            std::map<std::string, std::string> &out) {
#define CASE_FIELD_TYPE(cpptype, method, valuetype)                            
  case google::protobuf::FieldDescriptor::CPPTYPE_##cpptype: {                 
    valuetype value = reflection->Get##method(message, field);                 
    std::ostringstream oss;                                                    
    oss << value;                                                              
    out[field->name()] = oss.str();                                            
    break;                                                                     
  }

#define CASE_FIELD_TYPE_ENUM()                                                 
  case google::protobuf::FieldDescriptor::CPPTYPE_ENUM: {                      
    int value = reflection->GetEnum(message, field)->number();                 
    std::ostringstream oss;                                                    
    oss << value;                                                              
    out[field->name()] = oss.str();                                            
    break;                                                                     
  }

#define CASE_FIELD_TYPE_STRING()                                               
  case google::protobuf::FieldDescriptor::CPPTYPE_STRING: {                    
    std::string value = reflection->GetString(message, field);                 
    out[field->name()] = value;                                                
    break;                                                                     
  }

  const google::protobuf::Descriptor *descriptor = message.GetDescriptor();
  const google::protobuf::Reflection *reflection = message.GetReflection();

  for (int i = 0; i < descriptor->field_count(); i++) {
    const google::protobuf::FieldDescriptor *field = descriptor->field(i);
    bool has_field = reflection->HasField(message, field);

    if (has_field) {
      if (field->is_repeated()) {
        return -1; // 不支持转换repeated字段
      }

      const std::string &field_name = field->name();
      switch (field->cpp_type()) {
        CASE_FIELD_TYPE(INT32, Int32, int);
        CASE_FIELD_TYPE(UINT32, UInt32, uint32_t);
        CASE_FIELD_TYPE(FLOAT, Float, float);
        CASE_FIELD_TYPE(DOUBLE, Double, double);
        CASE_FIELD_TYPE(BOOL, Bool, bool);
        CASE_FIELD_TYPE(INT64, Int64, int64_t);
        CASE_FIELD_TYPE(UINT64, UInt64, uint64_t);
        CASE_FIELD_TYPE_ENUM();
        CASE_FIELD_TYPE_STRING();
      default:
        return -1; // 其他异常类型
      }
    }
  }

  return 0;
}
} // namespace comm_tools

通过上面的代码，如果需要在 proto 中增加字段，不再需要修改原来的代码。

3.2 将字段校验规则放置在 Proto 中

后台服务接收到前端传来的字段后，会对字段进行校验，比如必填校验，长度校验，正则校验，xss 校验等，这些规则我们常常会硬编码在代码中。但是随着后台字段的增加，校验规则代码会变得越来越多，越来越难维护。如果我们把字段的定义和校验规则和定义放在一起，这样是不是更好的维护？

示例 proto 如下：

syntax = "proto2";

package student;

import "google/protobuf/descriptor.proto";

message FieldRule{
    optional uint32 length_min = 1; // 字段最小长度
    optional uint32 id         = 2; // 字段映射id
}

extend google.protobuf.FieldOptions{
    optional FieldRule field_rule = 50000;
}

message Student{
    optional string name   =1 [(field_rule).length_min = 5, (field_rule).id = 1];
    optional string email = 2 [(field_rule).length_min = 10, (field_rule).id = 2];
}

然后我们自己实现 xss 校验，必填校验，长度校验，选项校验等代码。

示例校验最小长度代码如下：

#include <IOStream>
#include "student.pb.h"
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

using namespace std;
using namespace student;
using namespace google::protobuf;

bool minLengthCheck(const std::string &strValue, const uint32_t &dwLenthMin) {
    return strValue.size() < dwLenthMin;
}

int allCheck(const google::protobuf::Message &oMessage){
    const auto *poReflect = oMessage.GetReflection();

    vector<const FieldDescriptor *> vecFD;
    poReflect->ListFields(oMessage, &vecFD);

    for (const auto &poFiled : vecFD) {
        const auto &oFieldRule = poFiled->options().GetExtension(student::field_rule);
        if (poFiled->cpp_type() == google::protobuf::FieldDescriptor::CPPTYPE_STRING && !poFiled->is_repeated()) {
            // 类型是string并且选项非重复的才会校验字段长度类型
            const std::string strValue = poReflect->GetString(oMessage, poFiled);
            const std::string strName = poFiled->name();

            if (oFieldRule.has_length_min()) {
                // 有才进行校验，没有则不进行校验
                if (minLengthCheck(strValue, oFieldRule.length_min())) {
                    cout << "the length of " << strName << " is lower than " << oFieldRule.length_min()<<endl;
                } else {
                    cout << "check min lenth pass"<<endl;
                }
            }
        }
    }
    return 0;
}

int main() {
    Student oStudent1;
    oStudent1.set_name("xiao");

    Student oStudent2;
    oStudent2.set_name("xiaowei");

    allCheck(oStudent1);
    allCheck(oStudent2);

    return 0;
}

如上，如果需要校验最大长度，必填，xss 校验，只需要使用工厂模式，扩展代码即可。

新增一个字段或者变更某个字段的校验规则，只需要修改 Proto，不需要修改代码，从而防止因变更代码导致错误。

3.3 基于 PB 反射的前端页面自动生成方案

在我们常见的运营系统中，经常会涉及到各种各样的表单页面。在前后端交互方面，当需要增加字段或者变更字段的校验规则时，需要面临如下问题：

前端：针对新字段编写 html 代码，同时需要修改前端页面；
后台：针对每个字段做接收，并进行校验。

每增加或变更一个字段，我们都需要在前端和后台进行修改，工作量大，同时频繁变更容易导致错误。有什么方法可以解决这些问题吗？答案是使用 PB 的反射能力。

通过获取 Message 中每个字段的描述然后返回给前端，前端根据字段描述来展示页面，并且对字段进行校验。同时通过这种方式，前后端可以共享一份表单校验规则。

在使用上述方案之后，当我们需要增加字段或者变更字段的校验规则时，只需要在 Proto 中修改字段，大大节省了工作量，同时避免了因发布带来的风险问题。

3.4 通用存储系统

在运营系统中，前端输入字段，传入到后台，后台校验字段之后，一般还需要把数据存储到数据库中。

对于某些运营系统来说，其希望能够快速接入一些数据，传统开发常常会面临如下问题：

如何在不增加或变更表结构的基础上，如何快速接入数据？
如何零开发实现频繁添加字段、新增渠道等需求？
如何兼容不同业务、不同数据协议（比如 PB 中的不同 message）？

答案是使用 PB 的反射，使得有结构的数据转换为非结构的数据，然后存储到非关系型数据库（在微信支付侧一般存入到 table kv）中。

以 3.2 节中的 Proto 为例，举例如下，学生类中定义了两个字段，name 和 email 字段，原始信息为：

Student oStudent;
oStudent.set_name("xiaowei");
oStudent.set_email("test@tencent.com");

通过 PB 的反射，可以转化为平铺的结构：

[{"id":"1","value":"xiaowei"},{"id":"2","value":"test@tencent.com"}]

转化为平铺结构后，可以快速存入到数据库中。如果现在学生信息里需要增加一个字段 address，则不需要修改表结构，从而完成存储动作。利用 PB 反射，可以完成有结构数据和无结构数据之间的转换，达到存储和业务解耦的特性。

四、总结

本文首先给出了 PB 的反射函数，然后再结合自己平时负责的工作，给出了 PB 的进阶使用。通过对 PB 的进阶使用，可以大大提高开发和维护的效率，同时提升代码的优雅度。有需要更进一步研究 PB 的，可以阅读其源代码，不得不说，通过阅读优秀代码能够极大的促进编程能力。

需要注意的是 PB 反射需要依赖大量计算资源，在密集使用 PB 的场景下，需要注意 CPU 的使用情况。