RIP TSRMLS_CC – Adam Harvey – @LGnome




Slides for my RIP TSRMLS_CC talk.

On Github LawnGnome / rip-tsrmls-cc

RIP TSRMLS_CC

Adam Harvey

@LGnome

That's my Twitter account. I actually don't talk about cats much, but I do talk about lots of non-PHP things.
PHP 7 came out in December 2015. It's been kept quiet, but you may have heard of it.

API churn

  • 333 functions/macros removed
  • 410 functions/macros added
In userland, we did a good job of maintaining backward compatibility, but that wasn't the case for the API exposed by the Zend Engine and PHP itself. 333 functions is almost exactly one quarter of the functions and macros exposed by PHP 5.6.

API churn

  • 516 out of 990 functions/macros differ
Here's the more insidious bit: more than half of the functions that kept the same name have changed. (And that excludes TSRM-related noise.) Different types, different numbers of parameters, different return values. Most don't matter, but some really do.
So, now that I've thoroughly terrified you, what do you do with your PHP 4 or 5 era extension that your site relies on?

What do?

  • One way migration
  • Separate branches
  • Shared codebase
  • Ignore PHP 7 and hope it goes away
The way I see it, you have three real options. (pause) I guess there's a fourth.
Let's look at the pros and cons of each in turn. First up: one way migration.

Advantages

  • One time cost
  • Conceptually simple
  • Cleaner code

Disadvantages

  • You no longer support PHP 5
Not supporting PHP 5 may not be a huge issue if the extension only serves something internal and you're planning a hard cutover. Although I'm not going to talk a lot about this option as we go on, the things I highlight are also things you'd want to do with this option.
Maintaining parallel branches is another option.

Advantages

  • Both versions are supported
  • Able to write idiomatic code for both versions

Disadvantages

  • Maintenance is a pain
  • Potentially double the work
  • Branches may diverge over time
This might work if you have almost everything abstracted away, but if you do, the parts of your code that interface with PHP are probably so simple anyway that you could just support both versions in the one branch.
Finally (excluding ignoring PHP 7), you can run a shared codebase.

Advantages

  • Both versions are supported
  • Less maintenance overhead
  • Happy users
Happy users, because they can just download the latest release; cf. memcached.

Disadvantages

  • So many API changes
I'm going to delve deeper into this. There are ways you can mitigate the API changes — how hard it is depends a lot on what type of extension you're writing. Simple C library wrappers: not too bad. Deep voodoo magic extensions: bad (but still doable).
As I've touched on already, it depends on your use case. If you're doing a flag day cutover of your app, and you have an extension that solely exists to serve that app, you probably just want to do a one way migration and be done with it. Otherwise, I'd go for option 3: support both. Separate branches don't work in the long term: ask Python developers.
Whatever you choose, the best way to start is by trying to build your extension against PHP 7. It will fail, but the compiler errors will guide you pretty well for a lot of things, and once you clear them up you can use your test suite to catch the rest. (You have tests, right?) Still, that won't pick up everything, and there are things that you're better off knowing about in advance. So let's get into it.
So we know that a lot changed. What do we need to look for when migrating an extension?

zval

Let's start from the inside and work our way out. Many of the changes that matter when working on an extension are fundamentally type-related, so let's start with the type that underpins all of PHP.
typedef struct {
	zend_uchar type;
	zend_uint  refcount__gc;
	zend_uchar is_ref__gc;
	union {
		long lval;
		double dval;
		struct {
			char *val;
			int len;
		} str;
		HashTable *ht;
		zend_object_value obj;
	} value;
} zval;
Here's a simplified version of what the zval struct looks like in PHP 5. The important thing to remember is that each type corresponds to a member in the value union. Many of these values have changed: either in structure or in type.
typedef struct {
	zend_value value;
	union {
		struct {
			zend_uchar type;
			...
		} v;
		uint32_t type_info;
	} u1;
} zval;
Here's a simplified version of the PHP 7 zval. We'll talk about the specifics of the types that have changed representation after this, but note that the refcount and is_ref fields are gone. zvals are now embedded and passed around by value rather than allocated separately and passed by pointer, so simple types don't need to be refcounted at all, and the types that do need it carry the refcount in the structure their value points to rather than in the zval itself.

IS_LONG

PHP 5

long lval;

PHP 7

zend_long lval;

#if defined(__LP64__) || defined(_LP64) || defined(_WIN64)
typedef int64_t zend_long;
#else
typedef int32_t zend_long;
#endif
Integers are now this zend_long type, which means that integers are consistently 64-bit on all 64-bit platforms, including Windows (which used to be an oddball). This is good, but think about what happens if you're on Win64 and pass a long (which is 32-bit there) to zend_parse_parameters(): it writes a 64-bit zend_long through your pointer and overwrites four bytes of whatever sits next to your variable.

IS_BOOL

PHP 5

long lval;

PHP 7

#define IS_FALSE 2
#define IS_TRUE  3

ZEND_API int zend_is_true(zval *op);
IS_BOOL is gone altogether! In PHP 5, it used the lval to indicate whether it was true or false. In PHP 7, false and true are separate types. The ZVAL_BOOL macro still exists for setting, but to check the value you either have to check the type or call zend_is_true().

IS_STRING

PHP 5

struct {
	char *val;
	int   len;
} str;

PHP 7

zend_string *str;
Let's talk about a more interesting one. Three types are now pointers to other structures with their own garbage collection. The Zend Engine has retained the Z_STRLEN and Z_STRVAL macros, but there's now a string structure that gets used throughout the engine, not just for zvals. Let's look at it in more detail...
typedef struct {
	zend_refcounted_h gc;
	zend_ulong        h;
	size_t            len;
	char              val[1];
} zend_string;
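Key points: lengths are now size_t, not signed int, as $DEITY intended. Reference counting now happens within the structure, via the gc header. h is a cached hash value. And it's variable length: the characters are stored inline starting at val, and each string is allocated with enough room for its own contents.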
zend_string *zend_string_alloc(size_t len, int persistent);
zend_string *zend_string_init(const char *s, size_t len, int pers);
zend_string *zend_string_dup(zend_string *s, int persistent);
void zend_string_release(zend_string *s);

uint32_t zend_string_addref(zend_string *s);
uint32_t zend_string_delref(zend_string *s);

#define ZSTR_VAL(zstr)  (zstr)->val
#define ZSTR_LEN(zstr)  (zstr)->len
A new set of functions has been added to deal with zend_strings. These are the most important ones (the full set is in zend_string.h). Again, note that reference counting is done on the string, not the zval, so it has "methods" to deal with that.
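The layout and the refcounting "methods" can be modelled outside PHP. This sketch uses a hypothetical `mock_string` with the same shape as zend_string (refcount, cached hash, length, inline characters) and plain malloc/free. It is not the engine's implementation, which uses the Zend allocator and handles persistent and interned strings. It also uses a C99 flexible array member where PHP declares `val[1]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of zend_string; not a PHP API. */
typedef struct {
	uint32_t refcount;  /* stands in for zend_refcounted_h gc */
	uint64_t h;         /* cached hash; 0 means "not computed yet" */
	size_t   len;
	char     val[];     /* characters stored inline after the header */
} mock_string;

/* Size = header up to val, plus len bytes, plus the NUL terminator. */
static mock_string *mock_string_alloc(size_t len)
{
	mock_string *s = malloc(offsetof(mock_string, val) + len + 1);
	if (s == NULL) return NULL;
	s->refcount = 1;
	s->h = 0;
	s->len = len;
	return s;
}

/* Always copies the input, like zend_string_init(). */
static mock_string *mock_string_init(const char *str, size_t len)
{
	mock_string *s = mock_string_alloc(len);
	if (s == NULL) return NULL;
	memcpy(s->val, str, len);
	s->val[len] = '\0';
	return s;
}

static uint32_t mock_string_addref(mock_string *s)
{
	return ++s->refcount;
}

/* Drops one reference and frees the string when the last one goes. */
static void mock_string_release(mock_string *s)
{
	assert(s->refcount > 0);
	if (--s->refcount == 0) free(s);
}
```

Because the count lives in the string itself, a zval and a hash key (say) can share one allocation, with each holder releasing its reference when it's done.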

PHP 5

RETURN_STRING(str, duplicate)
ZVAL_STRING(zv, str, duplicate)
add_assoc_string(zv, key, str, duplicate)

PHP 7

RETURN_STRING(str)
ZVAL_STRING(zv, str)
add_assoc_string(zv, key, str)
Most macros and functions that dealt with setting strings had parameters indicating whether you wanted to duplicate the input. Those are now gone, since they have to create a zend_string anyway and will always duplicate. You can override this by instantiating the string directly using the zend_string API, but why would you?
#if ZEND_MODULE_API_NO < 20151012

#undef ZVAL_STRING
#define ZVAL_STRING(zv, str) do {              \
	const char *__s = (str);               \
	int __l = strlen(__s);                 \
	zval *__z = (zv);                      \
	Z_STRLEN_P(__z) = __l;                 \
	Z_STRVAL_P(__z) = estrndup(__s, __l);  \
	Z_TYPE_P(__z) = IS_STRING;             \
} while (0)

#endif
Again, if you're supporting both versions, your options are kind of problematic. You're going to have to have compatibility wrappers or redefine the macros (my preference, but make sure you do it after including all PHP headers!). The downside is that it's ugly, but you can basically crib the implementations from the PHP 5.6 source. I'm going to show you how the sausage is made, but there are wrappers available for some of this.

IS_OBJECT

PHP 5

typedef struct {
	zend_object_handle handle;
	const zend_object_handlers *handlers;
} zend_object_value;

zend_object_value obj;

PHP 7

zend_object *obj;
Another new type! I'm going to talk more about class and object handling later, but let's focus for now on the representation. In PHP 5, zend_object_value is a small inline structure holding a handlers pointer and an object handle, which is an index into the engine's global object store.
typedef struct {
	zend_refcounted_h           gc;
	uint32_t                    handle;
	zend_class_entry           *ce;
	const zend_object_handlers *handlers;
	HashTable                  *properties;
	zval                        properties_table[1];
} zend_object;
You'll rarely poke at this structure directly, but again: refcounting is done on the object. The class entry and properties are now inline, which improves caching and performance. Note that this is variable length, though: this matters if you're overriding the create_object handler, and I'll talk about it later. I won't get into the API, because it hasn't changed much.

IS_RESOURCE

PHP 5

long lval;

PHP 7

zend_resource *res;
Finally, resources change from being indexes stored in the long value to being pointers to their own structures.
typedef struct {
	zend_refcounted_h gc;
	int               handle;
	int               type;
	void             *ptr;
} zend_resource;
This one's actually a huge improvement: the resource type is now kept inline rather than having to poke through a murky API. The bad news is that the API for resources in general has changed, which I'll talk more about later.
#define IS_UNDEF     0

#define IS_REFERENCE 10
zend_reference *ref;
Finally, there are a bunch of new types. The two that are important are UNDEF, which is for undefined variables (as the name might suggest), and REFERENCE, which replaces the built in is_ref field in the zval with a pointer to a refcounted structure (remembering the earlier point about not having refcounting in the zval itself).
Let's look at more concrete things you need to audit. We'll start with parameter parsing, since that's not going to be caught by the compiler.

PHP 5

char *str;
int len;

zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &len);

PHP 7

char *str;
size_t len;

zend_parse_parameters(ZEND_NUM_ARGS(), "s", &str, &len);
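For example: as I mentioned earlier, string lengths are now size_t. zend_parse_parameters() only receives a pointer, so it writes a full size_t whatever you declared. If you don't change the length variable to size_t, you get interesting-looking segfaults on 64-bit platforms, where size_t is twice the size of int. This one's insidious.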
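Here's why that turns into a crash rather than a truncated length. This standalone model uses a hypothetical `mock_store_len()` to stand in for the part of zend_parse_parameters() that stores the length, and a byte buffer to stand in for the stack. It uses memcpy so that the model itself is well defined, even though the real bug isn't.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PHP 7's "s" length store: it always
 * writes sizeof(size_t) bytes, whatever the caller declared. */
static void mock_store_len(void *dest, size_t len)
{
	assert(dest != NULL);
	memcpy(dest, &len, sizeof len);
}

/* Simulates a caller that declared "int len" directly followed by
 * other data. Returns how many bytes beyond the int were overwritten. */
static size_t bytes_clobbered_past_int(void)
{
	unsigned char frame[sizeof(int) + sizeof(size_t)];
	size_t clobbered = 0;

	memset(frame, 0xAA, sizeof frame);   /* sentinel pattern */
	mock_store_len(frame, 5);            /* as if passed &len with int len */

	for (size_t i = sizeof(int); i < sizeof frame; i++) {
		if (frame[i] != 0xAA) clobbered++;
	}
	return clobbered;
}
```

On 64-bit platforms, where int is 4 bytes and size_t is 8, this reports four clobbered bytes. In a real extension those bytes belong to whatever the compiler placed next to `len`.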
#if ZEND_MODULE_API_NO >= 20151012
typedef size_t zend_string_len_t;
#else
typedef int zend_string_len_t;
#endif

char *str;
zend_string_len_t len;
zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &len);
How do you deal with this if you want to support both versions? You can do some macro and typedef magic.
#if ZEND_MODULE_API_NO < 20151012
typedef long zend_long;
#endif

zend_long l;
zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "l", &l);
I'd do something similar for zend_long too.
Arrays aren't hugely different, but there are two changes of note, and one will bite you silently if you're not careful.

PHP 5

zval *zv, **zv_pp;

zend_hash_update(ht, "key", 4, &zv, sizeof(zval *), NULL);
zend_hash_find(ht, "key", 4, (void **) &zv_pp);

PHP 7

zend_string *key = zend_string_init("key", 3, 0);

zend_hash_update(ht, key, zv);
zv = zend_hash_find(ht, key);

zend_string_release(key);
This is a good'un. Array keys in PHP 5 included the null terminator in their length because… hell if I know. In PHP 7, they don't (partly because we're using zend_strings). You'll also note that the API has changed significantly (for the better, since it removes a bunch of parameters nobody ever used).
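The reason this bites silently is that key comparison checks the length as well as the bytes, so an off-by-one length gives you a miss rather than an error. Here's a toy model. The `mock_key` type and helpers are hypothetical, and hashing is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key representation: bytes plus an explicit length. */
typedef struct {
	const char *key;
	size_t      len;
} mock_key;

/* Keys match only if both length and bytes match, as in a bucket comparison. */
static int mock_key_equal(mock_key a, mock_key b)
{
	assert(a.key != NULL && b.key != NULL);
	return a.len == b.len && memcmp(a.key, b.key, a.len) == 0;
}

/* PHP 5 convention: the length includes the trailing NUL. */
static mock_key php5_key(const char *s)
{
	mock_key k = { s, strlen(s) + 1 };
	return k;
}

/* PHP 7 convention: the length excludes the NUL, as with zend_string. */
static mock_key php7_key(const char *s)
{
	mock_key k = { s, strlen(s) };
	return k;
}
```

So if you port a lookup but keep `sizeof("key")` as the length, you're asking for a key one byte too long, and it will never match anything stored with the PHP 7 convention.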

IS_PTR

zend_string *key;
my_struct *ptr;

zend_hash_update_ptr(ht, key, ptr);
ptr = (my_struct *) zend_hash_find_ptr(ht, key);
There's another aspect to this too: you'll have noticed that we didn't provide a size. That's because HashTables now store zvals only, inline in each bucket. To store a raw pointer instead, you use a parallel API that internally wraps the pointer in a zval with the new IS_PTR type.
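Conceptually, the _ptr variants box your pointer in a zval tagged IS_PTR (as ZVAL_PTR() does) and unbox it on the way out. The model below uses a hypothetical single-bucket "table" so that the boxing is visible. It is not how zend_hash_update_ptr() is actually written, and the tag values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags; the numbers do not match PHP's. */
enum { MOCK_IS_UNDEF = 0, MOCK_IS_PTR = 1 };

typedef struct {
	unsigned char type;
	union {
		void *ptr;
	} value;
} mock_zval;

/* One bucket stands in for a HashTable: buckets hold zvals,
 * not arbitrary-sized data, so a raw pointer must be wrapped. */
typedef struct {
	mock_zval val;
} mock_bucket;

/* Box: tag the slot as a pointer and store it (like ZVAL_PTR()). */
static void mock_update_ptr(mock_bucket *b, void *ptr)
{
	b->val.type = MOCK_IS_PTR;
	b->val.value.ptr = ptr;
}

/* Unbox: hand back the raw pointer, or NULL if the slot holds none. */
static void *mock_find_ptr(const mock_bucket *b)
{
	assert(b != NULL);
	return b->val.type == MOCK_IS_PTR ? b->val.value.ptr : NULL;
}
```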
zval *
compat_zend_hash_find(HashTable *ht, const char *key, size_t len) {
#if ZEND_MODULE_API_NO >= 20151012
	zend_string *zs = zend_string_init(key, len, 0);
	zval *val = zend_hash_find(ht, zs);
	zend_string_release(zs);
	return val;
#else
	/* PHP 5 key lengths include the NUL; the stored data is a zval *. */
	zval **val = NULL;
	if (zend_hash_find(ht, key, len + 1, (void **) &val) == SUCCESS) {
		return *val;
	}
	return NULL;
#endif
}
This one's tricky to shim between versions. So far, everyone I've seen who's done it has written wrappers with different names: you can add the _ptr functions to PHP 5 easily enough, but that doesn't help with the other API changes. (If you only target PHP 7, zend_hash_str_find() takes a char * and a length directly and skips the temporary zend_string.) You need to audit all zend_hash function calls and decide whether to DIY or pull in a compatibility helper.
HashPosition pos;
ulong num_key;
char *key;
uint key_len;
zval **zv_pp;

zend_hash_internal_pointer_reset_ex(ht, &pos);
while (zend_hash_get_current_data_ex(ht, (void **) &zv_pp, &pos)
       == SUCCESS) {
	if (zend_hash_get_current_key_ex(ht, &key, &key_len,
	                                 &num_key, 0, &pos) ==
	                                 HASH_KEY_IS_STRING) {
		...
	}
	zend_hash_move_forward_ex(ht, &pos);
}
One amazing new feature that I want to highlight, even though it's not directly a migration topic: HashTables now have these incredible iteration macros. If you've had to iterate an array before, you'll understand why this is a big deal. Here's the old code...
zend_ulong num_key;
zend_string *key;
zval *zv;

ZEND_HASH_FOREACH_KEY_VAL(ht, num_key, key, zv) {
	if (key) {
		...
	}
} ZEND_HASH_FOREACH_END();
Here's the new.
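Part of why these macros read so cleanly is that the FOREACH macro opens a loop and a block, and ZEND_HASH_FOREACH_END() closes them. That's also why forgetting the END macro produces baffling brace errors. Here's a hypothetical miniature version over an array of buckets. It is not the real macro definition, although, like the real ones, it skips deleted slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bucket: key is NULL for integer keys, like zend_string *key. */
typedef struct {
	const char   *key;
	unsigned long h;     /* integer key */
	int           val;
	int           used;  /* 0 marks a deleted slot */
} mock_bucket;

typedef struct {
	mock_bucket *data;
	size_t       used;
} mock_table;

/* Opens a do-block and a for loop; MOCK_FOREACH_END() closes both. */
#define MOCK_FOREACH_KEY_VAL(ht, _h, _key, _val) do {     \
	for (size_t _i = 0; _i < (ht)->used; _i++) {      \
		mock_bucket *_p = &(ht)->data[_i];        \
		if (!_p->used) continue;                  \
		(_h) = _p->h;                             \
		(_key) = _p->key;                         \
		(_val) = &_p->val;

#define MOCK_FOREACH_END() \
	}                  \
} while (0)
```

Usage mirrors the real thing: `MOCK_FOREACH_KEY_VAL(&t, h, key, val) { ... } MOCK_FOREACH_END();`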
Let's talk about resources, those weird holdovers from the old days. As I mentioned earlier, they're actually a fair bit nicer to use, but that doesn't necessarily mean that you want to.

PHP 5

int zend_register_resource(zval *zv, void *ptr, int type);
void *zend_fetch_resource(zval **id, int default_id,
                          const char *name, int *found_type,
                          int num_types, ...);
int zend_list_delete(int id);

PHP 7

zend_resource *zend_register_resource(void *ptr, int type);
void *zend_fetch_resource(zend_resource *res, const char *name,
                          int type);
void *zend_fetch_resource_ex(zval *res, const char *name,
                             int type);
int zend_list_close(zend_resource *res);
PHP 5 had a set of macros (ZEND_REGISTER_RESOURCE, ZEND_FETCH_RESOURCE and friends) that mapped to the underlying functions. As you can see, the functions have changed a tonne for the better, but you can't really shim them in any meaningful way, particularly since zend_register_resource() sets the zval you pass on PHP 5 but returns a zend_resource on PHP 7. On the bright side, the basic workflow hasn't really changed: you register a pointer, fetch it, and delete/close it.
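Here's that register/fetch/close workflow as a toy version of PHP 7's zend_resource, where the type travels with the pointer. Everything prefixed `mock_` is hypothetical. The real functions emit warnings, live in the engine's resource list, and call the destructor you registered for the type; here the caller passes a destructor in directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of zend_resource: type stored inline with the pointer. */
typedef struct {
	int   type;
	void *ptr;
} mock_resource;

static mock_resource *mock_register_resource(void *ptr, int type)
{
	mock_resource *res = malloc(sizeof *res);
	if (res == NULL) return NULL;
	res->type = type;
	res->ptr = ptr;
	return res;
}

/* Returns the stored pointer only if the type matches. */
static void *mock_fetch_resource(const mock_resource *res, int type)
{
	if (res == NULL || res->type != type) return NULL;
	return res->ptr;
}

/* Frees the extension data with the supplied destructor and marks
 * the resource closed (type -1) so later fetches fail. */
static void mock_close_resource(mock_resource *res, void (*dtor)(void *))
{
	assert(res != NULL);
	if (res->ptr != NULL && dtor != NULL) dtor(res->ptr);
	res->ptr = NULL;
	res->type = -1;
}
```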
#if ZEND_MODULE_API_NO >= 20151012
#define ZEND_REGISTER_RESOURCE(zv, ptr, type) \
	ZVAL_RES(zv, zend_register_resource(ptr, type))

/* default_id is ignored; unlike PHP 5's macro, this does not
 * RETURN_FALSE on failure, so check rsrc for NULL yourself. */
#define ZEND_FETCH_RESOURCE(rsrc, rsrc_type, zv_pp, default_id, name, type) \
	(rsrc) = (rsrc_type) zend_fetch_resource_ex(*(zv_pp), name, type)

#define ZEND_CLOSE_RESOURCE(zv) \
	zend_list_close(Z_RES_P(zv))
#else
#define ZEND_CLOSE_RESOURCE(zv) \
	zend_list_delete(Z_LVAL_P(zv))
#endif
Your options are either to write a wrapper like the hashtable wrapper, or reimplement the macros on PHP 7. I personally prefer the wrapper option (and implemented it in pecl-compat), but here's a rough version of the macros if you'd prefer that (the ignored values are unfortunate).
Let's talk about objects. Basic object handling is largely unchanged, mercifully.

PHP 5

zval *zv;

zv = zend_read_property(ce, obj, name, strlen(name), 0 TSRMLS_CC);

PHP 7

zval rv;
zval *zv;

zv = zend_read_property(ce, obj, name, strlen(name), 0, &rv);
The one common API that has changed a bit is reading properties. In PHP 7, you have to provide the storage for the returned value (this is only used if there's a __get method or a custom read_property, and you should use the return value "zv" and not access "rv").
zval *
compat_zend_read_property(zend_class_entry *ce, zval *obj,
                          const char *name, int name_length,
                          int silent, zval *rv TSRMLS_DC) {
#if ZEND_MODULE_API_NO >= 20151012
	return zend_read_property(ce, obj, name, name_length,
	                          silent, rv);
#else
	(void) rv;
	return zend_read_property(ce, obj, name, name_length,
	                          silent TSRMLS_CC);
#endif
}
As zend_read_property isn't a macro, you'll have to add a shim. I'd go with a little inline function or macro, assuming you're writing your own.
As I mentioned earlier, there is a difference with objects with custom allocators because the zend_object struct is variable length (to cope with properties).

PHP 5

typedef struct {
	zend_object std;
	my_struct *data;
} my_object;

zend_object_value my_object_new(zend_class_entry *ce TSRMLS_DC) {
	my_object *intern;
	zend_object_value retval;

	intern = emalloc(sizeof(my_object));
	/* ... */
	retval.handle = zend_objects_store_put(intern,
		(zend_objects_store_dtor_t) zend_objects_destroy_object,
		(zend_objects_free_object_storage_t) my_object_free,
		NULL TSRMLS_CC);
	retval.handlers = &my_object_handlers;
	return retval;
}
In PHP 5, the start of a create_object handler looks like this (in general, they have lots of boilerplate). You allocate the structure, and then you later register it in the object store and return that value. (Also the only slide with smaller text. Sorry.)

PHP 5

typedef struct {
	zend_object std;
	my_struct *data;
} my_object;

my_object *my_object_get(zval *zv TSRMLS_DC) {
	return (my_object *)
		zend_object_store_get_object(zv TSRMLS_CC);
}
Retrieving an object was easy: you'd just cast what you got back from the object store.

PHP 7

typedef struct {
	my_struct *data;
	zend_object std;
} my_object;

zend_object *my_object_new(zend_class_entry *ce) {
	my_object *intern;

	intern = emalloc(sizeof(my_object) +
	                 zend_object_properties_size(ce));
	/* ... */
	return &intern->std;
}
The PHP 7 version of create_object has three major differences. Firstly, we've reordered the structure fields: the zend_object has to be the last element now because it's variable length. Secondly, there's the emalloc: you have to add the variable space required for the class's declared properties. Finally, you don't call zend_objects_store_put any more: you return a pointer to the zend_object buried within your structure.

PHP 7

typedef struct {
	my_struct *data;
	zend_object std;
} my_object;

my_object *my_object_get(zval *zv) {
	zend_object *obj = Z_OBJ_P(zv);

	return (my_object *)
		((char *)(obj) - XtOffsetOf(my_object, std));
}
Oh. Oh no. The getter is more complicated than before. The zend_object pointer within the zval points partway through our struct, so we use XtOffsetOf to calculate how far back we have to go. The char * is because offsetof returns a value in bytes. XtOffsetOf instead of offsetof is because of ancient compilers.
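Both halves of the pattern can be exercised outside PHP: over-allocating the wrapper, handing out a pointer to the embedded member, and walking back from that member to the wrapper. In this sketch `mock_std` stands in for zend_object, with a trailing one-element array playing the role of properties_table. All names are hypothetical, calloc replaces emalloc, and the standard offsetof replaces XtOffsetOf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for zend_object: fixed header plus a trailing array that is
 * over-allocated per class (PHP declares zval properties_table[1]). */
typedef struct {
	unsigned handle;
	int      props[1];
} mock_std;

typedef struct {
	int      data;   /* extension fields first... */
	mock_std std;    /* ...embedded "zend_object" last */
} my_object;

/* Like create_object: allocate the wrapper plus room for extra
 * properties, and return only a pointer to the embedded member. */
static mock_std *my_object_new(size_t nprops, int data)
{
	size_t extra = nprops > 1 ? (nprops - 1) * sizeof(int) : 0;
	my_object *intern = calloc(1, sizeof(my_object) + extra);
	if (intern == NULL) return NULL;
	intern->data = data;
	return &intern->std;
}

/* Like the PHP 7 getter: step back from the member to its container. */
static my_object *my_object_from_std(mock_std *std)
{
	assert(std != NULL);
	return (my_object *) ((char *) std - offsetof(my_object, std));
}
```

Note that the pointer handed out is not the start of the allocation, so it must never be passed to free() directly.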

PHP 7

zend_class_entry *my_object_ce;
zend_object_handlers my_object_handlers;

PHP_MINIT_FUNCTION(my_extension) {
	zend_class_entry ce;

	INIT_CLASS_ENTRY(ce, "My\\Object", NULL);
	my_object_ce = zend_register_internal_class_ex(&ce, NULL);
	my_object_ce->create_object = my_object_new;

	memcpy(&my_object_handlers, &std_object_handlers,
	       sizeof(zend_object_handlers));
	my_object_handlers.offset = XtOffsetOf(my_object, std);

	return SUCCESS;
}
There's one other wrinkle in PHP 7, too: you have to set the new offset field on the object handlers structure to the offset of your zend_object within your custom structure. This allows PHP to free your entire structure when the object is deleted (although you still have to free anything else you've allocated and pointed to). The reality is that if you support both versions, you're going to end up with two versions of all of this object code.
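The offset field is what lets code that only knows about the embedded object find the start of your allocation. Roughly speaking, the engine subtracts handlers->offset from the zend_object pointer before freeing. The standalone model below shows that arithmetic from the engine's side. The `mock_` names are hypothetical, and malloc/free stand in for emalloc/efree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
	unsigned handle;
} mock_object;

/* Stand-in for zend_object_handlers: only the offset field matters here. */
typedef struct {
	size_t offset;
} mock_handlers;

typedef struct {
	char       *buffer;  /* anything this points to is the extension's job to free */
	mock_object std;
} my_object;

/* Engine side: knows only the embedded object and its handlers. */
static void *mock_engine_container(mock_object *obj, const mock_handlers *h)
{
	assert(obj != NULL && h != NULL);
	return (char *) obj - h->offset;
}

/* Frees the whole allocation, not just the embedded part. */
static void mock_engine_free(mock_object *obj, const mock_handlers *h)
{
	free(mock_engine_container(obj, h));
}
```

If the offset were left at zero, the engine would hand free() a pointer into the middle of the block. The offset field exists to prevent exactly that kind of heap corruption.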
  • TSRMLS_C
  • TSRMLS_CC
  • TSRMLS_D
  • TSRMLS_DC
  • TSRMLS_FETCH
The last change I'll talk about is the one that renamed the talk: TSRMLS macros are no more in PHP 7. They've been preserved solely to ensure that you don't have to remove them if you're supporting both versions, but for PHP 7 only code, you can rip them out.
Finally, let's look at a few tools and references that I've personally found useful.

phpng-upgrading

https://wiki.php.net/phpng-upgrading

The absolute best guide available today is the phpng upgrading guide on the php.net wiki. It's not perfect, but it covers a lot of the things I've talked about in more depth, and a lot more besides.

LXR

http://lxr.php.net/

LeXeR is a great tool for seeing how definitions and prototypes change between versions. Enter a definition, select PHP_5_6 and PHP_7_0, and you can see exactly how it changed!

git diff PHP-5.6..PHP-7.0 ext/gmp

Comparing other extensions before and after PHP 7 support is really useful too. You can do it easily with a Git checkout of php-src: pick an extension that does something like what you want, and diff PHP-5.6 against PHP-7.0. Or find a PECL extension and compare its versions side by side.

pecl-compat

https://github.com/flaupretre/pecl-compat/

Francois Laupretre has written a set of headers called pecl-compat that you can pull into your project (I've done so in PECL radius through a submodule). They're not comprehensive, but he happily accepts pull requests. Where possible, they shim PHP 5 APIs so you can write PHP 7 code; where that's not possible, they provide compatibility functions or macros instead. Object support would be great!
Don't be afraid. Upgrading looks scary when you first see what's changed, and so do the compiler errors when you first try to build your extension against PHP 7, but it's really not that bad. I did RADIUS in a pretty lazy afternoon, and nine out of ten PHP contributors agree that I write terrible code. (Ask Nikita about my CSV code.) If you're methodical, you'll have no problem.

Thank you!

Questions?

https://joind.in/talk/fc496

@LGnome

Slides: https://lawngnome.github.io/rip-tsrmls-cc/
