Serialization – Serialization Techniques – JSON




serialization

Serialization tutorial covering JSON, Protobuf, and Thrift

On Github kaktos / serialization

Serialization

Serialization Techniques

- 陈浩 (Chen Hao) / @kaktos

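What is serialization?

Serialization converts a data structure or object into a format that can be stored or transmitted over a network. Deserialization is the reverse process: rebuilding the data structure or object from that format.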
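
As a minimal sketch of this round trip, an in-memory object can be turned into bytes and rebuilt from them with Python's standard json module. The Feed class and its fields are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Feed:
    # Hypothetical record type, used only for illustration.
    id: int
    title: str


def serialize(feed: Feed) -> bytes:
    """Turn the object into bytes that can be stored or sent over a network."""
    return json.dumps(asdict(feed)).encode("utf-8")


def deserialize(data: bytes) -> Feed:
    """Rebuild an equal object from the bytes produced by serialize()."""
    return Feed(**json.loads(data.decode("utf-8")))


original = Feed(id=1, title="20111119")
wire = serialize(original)    # bytes suitable for a file or a socket
restored = deserialize(wire)  # a new object equal to the original
```

The restored object is a separate instance that compares equal to the original. That is all a serialization format has to guarantee.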

Evolution of serialization formats

XML, JSON (BSON), YAML, Protocol Buffers, MessagePack, Thrift

JSON

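Lightweight, easy to read, based on JavaScript syntax, with broad language support.

Two structures: key/value pairs (objects, i.e. hashes) and ordered collections (arrays).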

                            
 { "appid" : "album",
  "content" : "asdfasdfafd",
  "ext" : { "piclist" : [ { "content" : null,
            "cover130" : "http://test2.img.pp.sohu.com.cn/2011/11/10/14/16/u163937567_1344a50f638g134_s.gif",
            "coverId" : 337942648
          },
          { "content" : null,
            "cover130" : "http://test1.img.pp.sohu.com.cn/2011/11/10/14/15/u163937567_1344a502ebbg134_s.gif",
            "coverId" : 337942641
          }
        ],
      "total" : 2
    },
  "id" : "337942648_337942641",
  "status" : 0,
  "title" : "20111119",
  "url" : "http://kaktos.i.sohu.com/photoset/41069588/photos/"
}                           
                        

Data types

  • String
  • Number
  • Object
  • Array
  • Boolean
  • null
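
How these six types surface in a host language depends on the parser. A small sketch using Python's standard json module shows one such mapping; the sample text is made up for illustration.

```python
import json

# One member per JSON type; Number appears twice (integer and fraction).
text = '{"s": "album", "n": 2, "f": 0.5, "o": {"k": "v"}, "a": [1, 2], "b": true, "z": null}'
value = json.loads(text)

# Map each key to the name of the Python type chosen by the parser.
type_names = {key: type(item).__name__ for key, item in value.items()}
```

JSON has a single Number type. Python's parser happens to split it into int and float, and other languages make different choices.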

Language support: JavaScript

var feed = eval("(" + feedText + ")");

A safer approach: eval executes whatever code the text contains, while JSON.parse accepts only valid JSON

var feed = JSON.parse(feedText);
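
The same distinction applies outside JavaScript. The Python sketch below shows a strict JSON parser accepting data and rejecting text that is really code. Both input strings are hypothetical.

```python
import json

trusted = '{"status": 0, "title": "20111119"}'
# Hypothetical hostile input: a valid Python expression, but not valid JSON.
hostile = '__import__("os").getcwd()'

feed = json.loads(trusted)  # parsed as data, nothing is executed

try:
    json.loads(hostile)
    rejected = False
except json.JSONDecodeError:
    # The parser refuses the input instead of running it.
    rejected = True
```

Passing the hostile string to eval would have executed it. The parser only ever builds data.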

Language support: Java

Jackson

Gson

Protocol Buffers

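Unlike JSON, protobuf is a binary format that is not self-describing: decoding requires the .proto schema. Messages are small, encoding and decoding are fast, and many languages are supported.

Defining the format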

                            
//--------------------------------------- photopark.proto
syntax = "proto2";  // required/optional labels and [default = ...] are proto2 features
package photopark;
option java_package = "photopark.protobuf";
enum PhotoPrivilege {
	PUBLIC_PHOTO = 0;
	PRIVATE_PHOTO = 1;
}
message Photo {
	optional int64 id = 1;
	optional int64 photoset_id = 2;
	required int64 user_id = 3;
	optional string name = 4;
	optional PhotoPrivilege privilege = 5 [default = PUBLIC_PHOTO];
	optional string hosturl = 6;
	optional string img_names = 7;
	optional string dimensions = 8;
	optional int32 image_size = 9;
	optional int64 upload_time = 11;
	optional string upload_ip = 12;
	optional int64 last_modified = 14;
	optional int32 status = 15;
}                            
                            
                        

Compiling

protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/photopark.proto

Generates Photopark.java

protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/photopark.proto

Generates photopark_pb2.py

Usage

Serialization (Java)

        // Assumes: import photopark.protobuf.Photopark.Photo;
        Photo.Builder photo = Photo.newBuilder();
        photo.setStatus(1);
        photo.setUserId(userId);  // required field: build() throws if it is unset
        photo.setName(filename);
        photo.setHosturl(_hosturl);
        photo.setImgNames(_imagNames);
        photo.setImageSize((int) lsize);
        photo.setDimensions(dimensions.trim());
        photo.setUploadIp(uploadIP);
        photo.setUploadTime(System.currentTimeMillis());
        photo.setLastModified(System.currentTimeMillis());
        byte[] data = photo.build().toByteArray();  // the serialized message
                        

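Deserialization (Python)

import photopark_pb2

photo = photopark_pb2.Photo()
photo.ParseFromString(data)  # fills photo in place; the return value is not the message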

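Encoding

Varints are a way of serializing integers using one or more bytes. Each byte carries 7 bits of the value; its most significant bit is set when more bytes follow. The 7-bit groups are stored least significant group first.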

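    1010 1100 0000 0010    # 300 encoded as a varint

    1010 1100 0000 0010
    → 010 1100  000 0010   (drop the continuation bit of each byte)
    → 000 0010  010 1100   (reverse the order of the 7-bit groups)
    → 100101100            (concatenate)
    → 256 + 32 + 8 + 4 = 300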
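
The steps above can be sketched in Python as follows. This is an illustrative implementation for non-negative integers, not the protobuf library's own code; the function names are chosen here.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the top bit
        else:
            out.append(low7)         # last byte: top bit stays clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        # Groups arrive least significant first, so shift by 7 bits per byte.
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")


encoded = encode_varint(300)      # the two bytes 0xAC 0x02
decoded = decode_varint(encoded)  # back to 300
```

Small values are the common case and cost a single byte. That accounts for much of protobuf's compactness.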
                        
                        

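Encoding

  • A protobuf message is a sequence of key-value pairs
  • Each key is a varint that combines the field number from the .proto file with the wire type of the value: key = (field_number << 3) | wire_type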
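
To make the key layout concrete, the sketch below hand-encodes three fields of the Photo message defined earlier (user_id = 3, name = 4, status = 15). The helper names are hypothetical and the field values are made up. Real code would use the classes generated by protoc.

```python
WIRE_VARINT = 0            # int32, int64, bool, enum
WIRE_LENGTH_DELIMITED = 2  # string, bytes, embedded messages


def encode_varint(value: int) -> bytes:
    """Base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    # The low 3 bits hold the wire type, the remaining bits the field number.
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return field_key(field_number, WIRE_VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    # Length-delimited values are prefixed with their byte length as a varint.
    return (field_key(field_number, WIRE_LENGTH_DELIMITED)
            + encode_varint(len(payload)) + payload)


message = (
    encode_int_field(3, 300)            # user_id
    + encode_string_field(4, "a.gif")   # name
    + encode_int_field(15, 1)           # status
)
```

No field names appear in the output, only field numbers. This is why the format is not self-describing, and why renaming a field is harmless while renumbering one is not.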

Thrift

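An RPC framework open-sourced by Facebook (now Apache Thrift). Structs and services are declared in an interface definition file: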

                            
          struct UserProfile {
            1: i32 uid,
            2: string name,
            3: string blurb
          }
          service UserStorage {
            void store(1: UserProfile user),
            UserProfile retrieve(1: i32 uid)
          }                        
                            
                        

Data types

  • bool: A boolean value (true or false)
  • byte: An 8-bit signed integer
  • i16: A 16-bit signed integer
  • i32: A 32-bit signed integer
  • i64: A 64-bit signed integer
  • double: A 64-bit floating point number
  • string: A text string encoded using UTF-8 encoding
  • binary: A sequence of unencoded bytes
  • list, set, map: Containers of the types above
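
For comparison with protobuf, here is a hand-written sketch of how the UserProfile struct above could be laid out on the wire. It assumes Thrift's binary protocol (TBinaryProtocol), which the slides do not name. The helper names are hypothetical and are not the Thrift library API.

```python
import struct

# Type codes of the binary protocol for the types used here (assumed layout).
T_STOP = 0
T_I32 = 8
T_STRING = 11


def field_header(type_code: int, field_id: int) -> bytes:
    # One byte of type code followed by a big-endian 16-bit field id.
    return struct.pack(">bh", type_code, field_id)


def i32_field(field_id: int, value: int) -> bytes:
    # Fixed 4 bytes, big-endian: no varint compression in this protocol.
    return field_header(T_I32, field_id) + struct.pack(">i", value)


def string_field(field_id: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    # Strings carry a 4-byte big-endian length prefix.
    return field_header(T_STRING, field_id) + struct.pack(">i", len(payload)) + payload


def encode_user_profile(uid: int, name: str, blurb: str) -> bytes:
    # Fields in IDL order, terminated by a single stop byte.
    return (i32_field(1, uid) + string_field(2, name)
            + string_field(3, blurb) + bytes([T_STOP]))


data = encode_user_profile(7, "ab", "")
```

As with protobuf, fields are identified by number rather than by name. Under this protocol integers take a fixed width, which trades some size for simpler encoding.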

THE END

kaktos @ Sohu Technology Department