Object properties and immutability

There has been much discussion in recent weeks in PHP circles about how to make objects more immutable. There have been a number of proposals made either formally or informally that relate to object property access, all aimed at making objects safer through restricting write access in some way.

Since my last mega post on PHP object ergonomics was so well-received and successful (it resulted in both constructor promotion and named arguments being added to PHP 8.0, thanks Nikita!), I figure I'll offer another summary of the problem space in the hopes of a deeper analysis suggesting a unified way forward.

The problem space

At a high level, there are a series of goals that different people have laid out, usually implicitly. Not everyone is pushing for all of these, but at least someone is, and they're all in the same problem space, and as far as I can tell no one is actively against any of these desires.

  1. Reduce the need for boilerplate getter/setter methods by making properties immutable, and thus safe to make public without fear of them getting modified out from under you.
  2. Make objects more predictable generally by making them immutable entirely, or nearly entirely.
  3. Make "evolvable" objects more ergonomic. "Evolvable" objects refer to immutable objects that have methods to spawn "a new version of this object with some change." PSR-7 and PSR-13 are the canonical examples here, but DateTimeImmutable is also a good example. Often these are implemented using "with-er" methods: withX($new_x_value).
  4. There has long been a desire to bring back "objects that pass by value" rather than by reference/handle. Mainly this is to avoid "spooky action at a distance" when dealing with data objects, where you do want them to behave more like strings or arrays than like function pointers.
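
To make the last goal concrete, here is a rough sketch of what "spooky action at a distance" looks like in PHP today, compared with how arrays behave. The Money class and both function names are made up for illustration.

```php
<?php

// Hypothetical data object, invented for this illustration.
class Money
{
    public int $cents;

    public function __construct(int $cents) {
        $this->cents = $cents;
    }
}

// The callee mutates its argument. Because objects are passed by handle,
// the caller's object changes too.
function applyDiscount(Money $price): Money {
    $price->cents -= 100;
    return $price;
}

// Arrays are passed by value, so the same pattern leaves the caller's copy alone.
function applyDiscountToArray(array $price): array {
    $price['cents'] -= 100;
    return $price;
}

$original = new Money(500);
$discounted = applyDiscount($original);

$originalArray = ['cents' => 500];
$discountedArray = applyDiscountToArray($originalArray);
```

After this runs, $original and $discounted are the same modified object, while $originalArray is untouched. The object case is the behavior that pass-by-value objects or immutability would avoid.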

The lay of the land

Objects come in multiple flavors, depending on the use case. One of the great failings of conventional OOP (from the C++, Java, and PHP family) is that it conflates all of these flavors into a single syntax with no clear way to differentiate them, despite them needing different functionality. (The terms below are not any official terminology, just an approximation of what I see in practice.)

I would argue there are a few other, lesser-used examples of object types. I may do a more complete taxonomy of objects, but for now this covers all of the variations that are relevant to this discussion.

Service objects

Service objects, particularly in PHP, are essentially closed over lambda functions with funny extra syntax for partial application. They consist of one or more stateless methods and zero or more dependencies. Those dependencies are by design hidden from the outside world, and by design never change over the life of the object. An example from my book on functional programming in PHP:

class ThemeRenderer
{
    private $config;
 
    public function __construct(SysConfiguration $config) {
        $this->config = $config;
    }
 
    public function renderUser(User $user): string {
        // ...
    }

    public function renderProduct(Product $product): string {
        // ...
    }
}

Which is equivalent to:

function renderUser(SysConfiguration $config, User $u): string {...}
function renderProduct(SysConfiguration $config, Product $p): string {...}

$config = getConfigFromWherever();

$userRenderer = fn(User $u): string
    => renderUser($config, $u);

$productRenderer = fn(Product $p): string
    => renderProduct($config, $p);

In this case, the $config property needs to be bound once, not accessible externally, and unable to change at random during the life of the object. Technically, if it's readable externally that doesn't hurt anything, but it's leaking an implementation detail. Making it modifiable externally is a no-no. It has to be readable internally, obviously. Making it writeable internally is not preferable, but personal discipline (which, in practice, almost all modern developers have in this case) is usually good enough since it's only the class author you need to rely on to not screw it up.

Service objects passing by reference (Java) or handle (PHP) is the only reasonable way to implement them, as otherwise you're duplicating functions pointlessly. They're stateless post-construction so protecting against spooky-action-at-a-distance is unnecessary.

Serializing service objects is almost never a good idea; they're very likely to have a reference, direct or indirect, to some unserializable value (DB connection, etc.) and serializing them is almost never useful.

In some cases (such as this example), cloning a service object is a safe if odd thing to do. The use cases are few, but those that do exist would involve making a clone with some property changed, generally a configuration of some kind.

For example, I was working on a library that represented different branches of work (Git-esque) but in SQL. The Repository object had a DB connection and various other properties, but also properties that indicated which "branch" it was on. It then used that to generate the appropriate SQL when a user tries to read or save an entity.

Making a new branch involved cloning a new object and changing the $branch property on the new object. That way, the Repository itself was still immutable and stateless, but could spawn alternate versions of itself as needed. That's quite easy in PHP due to how PHP's property access rules work:

class Repository {
    protected string $branch;
    protected Connection $db;
 
    public function forBranch(string $new_branch): self {
        $new = clone($this);
        $new->branch = $new_branch;
        return $new;
    }
}

This is really the only kind of cloning that makes sense on service objects.

Entity objects

Entity objects represent a value, not a function. Specifically, they represent a value that has a sense of persistent identity independent of the memory address of the object. A User record, a Product record, etc. are common examples. Generally they are read from and persisted to a database or REST service, and that ID is necessary for keeping track of the value being represented (to know what DB record to update, for example). Entities, however, come in two variants, useful in two different situations in different languages, and that's one of the challenges of discussing OOP.

Fat entities

Fat entities contain a lot of business logic themselves, and tend to be memory-resident for longer periods. They synchronize to a datastore, but most interaction is done in-memory. This is the classic "network of interacting objects passing messages to each other" model of OOP that was first envisioned. And, in practice, with PHP being itself mostly stateless between requests, this is almost never done in PHP, really. Most attempts to do so end up being grossly complicated because of all the startup and teardown effort, and the convoluted dependencies that result. ActiveRecord is an example of a cheap knock-off of Fat Entities. We won't talk about them too much, but for completeness let's consider how they interact with cloning and immutability.

Fat entities are where service and value objects overlap. A fat entity may have dependencies on services, up to and including a database connection (directly or transitively), which is what makes them so annoying to work with, especially in a short-lived environment like PHP.

Fat entities cannot be immutable. Their whole purpose is to be a place where data does change, behind a strict opaque interface. More specifically, they almost always disallow external changes but allow internal changes, be that a simple setter method or some other complex response to an incoming message.

Fat entities also make little sense to clone. Doing so creates two divergent representations of the same entity, which because they're mutable will readily go out of sync.
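
A minimal sketch of that divergence, assuming a hypothetical User entity (the class and its methods are invented here, not taken from any library):

```php
<?php

// Hypothetical entity, invented for this illustration.
class User
{
    private int $id;
    private string $email;

    public function __construct(int $id, string $email) {
        $this->id = $id;
        $this->email = $email;
    }

    public function id(): int { return $this->id; }
    public function email(): string { return $this->email; }

    public function changeEmail(string $email): void {
        $this->email = $email;
    }
}

$user = new User(42, 'old@example.com');
$copy = clone $user;

// Both objects claim to be user 42, but they no longer agree on its state.
$copy->changeEmail('new@example.com');
```

Whichever of the two gets persisted last wins, and nothing in the code indicates which one that should be.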

Fat entities, like service objects, should generally pass by reference/handle, because you don't want to have different versions of your Product or User object floating around in memory.

It may make sense to serialize a fat entity, however. There is potential for values to get out of sync if it's done excessively, but serializing an object to JSON or XML is how you would send it across the wire to/from a REST service. Technically even saving it to a database is a form of serializing. However, deserializing a fat entity requires reconnecting any service-esque dependencies, which is basically impossible without global variables to access. That's one of the reasons fat entities in PHP are just a bad idea in general. Please stop using them.

Skinny entities

Skinny entities contain little business logic themselves beyond being an identified set of values. The logic they do contain generally revolves around validation. Unlike fat entities, they keep their dependencies to a minimum and avoid service dependencies. (If an entity has a service dependency, I would not call it a skinny entity anymore but a fat entity.) They may have other objects internally that are value objects to represent complex structures (such as an Address property of a Person object), but those are not really dependencies but compound internal values.

Like fat entities, they are internally mutable, externally opaque (but mutable through methods), dangerous to clone, but reasonable to serialize. Like fat entities, you generally want pass by reference/handle semantics.

Value objects

Value objects are objects that do not represent behavior (like a function or service object) but represent a value. They don't have state. They are state.

From a functional point of view, they're product types: A Point object is the product of two integer values, $x and $y. From an object-oriented point of view, they define a new "thingie" that has opaque internal properties that may or may not correspond to public methods. Both viewpoints are valid and valuable (no pun intended), but both share an important feature: The identity of a value object is defined by the totality of its properties. That is, Point(x: 3, y: 5) and Point(x: 3, y: 6) represent different, discrete values just as much as 5 and 6 are different, discrete values.

By extension, it follows that value objects must be both internally and externally immutable. Allowing them to be mutable means the object memory address may suddenly refer to a different intrinsic value, causing spooky-action-at-a-distance. Allowing the Point to be changed in-place is akin to allowing 6 to be redefined as half of 10, and then nothing makes sense anymore.

Depending on the use case, sometimes individual properties should be externally readable, other times not. The internal representation may not be the same as the external representation (via a method), but should still be immutable. There may also be invariant relationships between properties, such as one needs to be twice another, or similar.

A value object in which all properties are externally readable is sometimes called a "Struct object", because it works basically like a struct in languages like C, C++, or Go.

Value objects generally benefit from pass-by-value semantics, because they reduce spooky-action-at-a-distance over a call boundary. However, as long as they are immutable it doesn't really matter, and passing by reference/handle is an optimization detail. If they were mutable, you would generally want pass-by-value semantics.

That's well and good as far as it goes, but there's more to it than that. Just as integers and strings have intrinsic operations on them (potentially an infinite number), so too do value objects. What those operations are and whether they make sense varies with the type.

For integers, common operations are addition, subtraction, and multiplication.

For a Point, they may include move_up($y), move_left($x), or area() (to compute the area from the origin).

For a DateTimeImmutable object, they may include nextTuesday().

For a PSR-7 URL object, it may (does) include "same thing but with this query string."

Importantly, every one of those operations must produce a new value object, because to do otherwise would change the value of 6 and upset the laws of physics. And that is potentially more involved than it sounds.

In some cases, making an entirely new object via a new call is entirely feasible. For a Rectangle object, it's reasonable to just expose both height and width in the constructor:

class Rectangle {
    public function __construct(public int $height, public int $width) {}
    
    public function widenBy(int $widen) {
        return new static($this->height, $this->width + $widen);
    }
}

For more complex value objects that can be highly prohibitive. Consider Url again. It has at least eight properties needed according to its interface, and a given implementation may have more. A fresh new call with eight $this->blah arguments in it is a lot of code to write and to read, especially when you have to repeat it eight times (once for each "withX" method, or equivalent). It also forces the constructor to have a parameter for every potentially settable property. In some cases that's fine, in others not.

It's perfectly valid to want to support passing a string as the constructor argument and parsing it out to components, but if the constructor must take individual components then it can't. In this case that could be pushed off on a static constructor factory, but that won't always be feasible. The core point is that making all operations on value objects construct a new value from scratch, always, forces the constructor to be written in a specific way that may or may not be desirable.
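
As a sketch of that static factory workaround: the Url class here is a hypothetical, heavily simplified stand-in (three properties, not the real PSR-7 interface), and fromString() is a name I made up. It relies on PHP's built-in parse_url().

```php
<?php

// Hypothetical, heavily simplified URL value object (not the PSR-7 interface).
class Url
{
    public function __construct(
        private string $scheme,
        private string $host,
        private string $path,
    ) {}

    // Named constructor: parses a string, then delegates to the
    // component-based constructor.
    public static function fromString(string $url): static {
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
            throw new InvalidArgumentException("Cannot parse URL: $url");
        }
        return new static($parts['scheme'], $parts['host'], $parts['path'] ?? '/');
    }

    public function __toString(): string {
        return "{$this->scheme}://{$this->host}{$this->path}";
    }
}

$url = Url::fromString('https://example.com/docs');
```

This keeps string parsing available, but only because the parsing logic can live in a static method. The constructor itself is still dictated by the property list.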

It also presumes that there's no relationship between properties, and that properties are trivially set. If there's complex logic around determining the properties of the value object in the first place, constructor-only logic would require first building all of the properties into stand-alone variables by hand, possibly validating them externally to the object, then passing them in all at once. That's procedural code, not object-oriented.

All of which is to say that clone-and-modify is a much, much more straightforward and ergonomic pattern for such methods. It sidesteps the problem of conflating the constructor with the internal object structure. It makes the resulting code vastly simpler. It's also what is typically done these days, and is essentially exactly what the Repository example above with $branch does.
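
A minimal sketch of clone-and-modify with-ers as they can be written today. The Url class and its four properties are assumptions for illustration, far smaller than a real implementation.

```php
<?php

// Hypothetical, simplified URL value object; the property set is an assumption.
class Url
{
    private string $scheme = 'https';
    private string $host = '';
    private string $path = '/';
    private string $query = '';

    public function __construct(string $host) {
        $this->host = $host;
    }

    // Each with-er touches only the property it cares about. Adding a fifth
    // property to the class would not require changing any of these methods.
    public function withPath(string $path): static {
        $new = clone $this;
        $new->path = $path;
        return $new;
    }

    public function withQuery(string $query): static {
        $new = clone $this;
        $new->query = $query;
        return $new;
    }

    public function __toString(): string {
        $query = $this->query === '' ? '' : '?' . $this->query;
        return "{$this->scheme}://{$this->host}{$this->path}{$query}";
    }
}

$base = new Url('example.com');
$search = $base->withPath('/search')->withQuery('q=php');
```

Note that the constructor takes only a host here. The with-ers never call it, so its signature is free to be whatever suits the class.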

However, let's look at how to enforce immutability. Properties of a value object may or may not be externally readable, must not be externally writeable, and must not be internally writable either. But forbidding internal writeability precludes the clone-and-modify approach, unless some special mechanism is carved out as an exception.

Value objects are also the object type most often serialized, which means deserializing an object needs to allow its properties to be re-set. That's another carve-out exception to property-level immutability.
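
To illustrate why deserialization needs write access, here is a sketch using PHP's existing __serialize() and __unserialize() magic methods. The Point class is a hypothetical example.

```php
<?php

// Hypothetical value object, invented for this illustration.
class Point
{
    private int $x;
    private int $y;

    public function __construct(int $x, int $y) {
        $this->x = $x;
        $this->y = $y;
    }

    public function x(): int { return $this->x; }
    public function y(): int { return $this->y; }

    public function __serialize(): array {
        return ['x' => $this->x, 'y' => $this->y];
    }

    // Called on a freshly created object whose constructor was NOT run,
    // so the properties must be writable here.
    public function __unserialize(array $data): void {
        $this->x = $data['x'];
        $this->y = $data['y'];
    }
}

$p = new Point(3, 5);
$restored = unserialize(serialize($p));
```

If $x and $y could only ever be written from the constructor, __unserialize() would have no legal way to populate them.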

In short, while object-level immutability is absolutely vital for a value object, property-level immutability only works in narrow circumstances and precludes many behaviors.

One possible way around that problem is a builder object; such objects are themselves mutable, but take all the same parameters as the value object they build, and finally produce a constructed value object. In order to get around the verbosity of repeating every property every time, though, they would need to take a value object as a starting point:

class Rectangle {
    public function __construct(public int $height, public int $width) {}
    
    public function widenBy(int $widen) {
        $b = new RectBuilder($this);
        $b->setWidth($this->width + $widen);
        return $b->buildRect();
    }
}

But that then presumes all properties are available publicly, either directly or through accessor methods. That's an awful lot of complexity just to force the constructor to be boring, which isn't even desirable in the first place.
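
For completeness, a runnable sketch of what such a builder might look like. RectBuilder is hypothetical, and its method names simply follow the snippet above.

```php
<?php

class Rectangle {
    public function __construct(public int $height, public int $width) {}

    public function widenBy(int $widen): Rectangle {
        $b = new RectBuilder($this);
        $b->setWidth($this->width + $widen);
        return $b->buildRect();
    }
}

// Hypothetical builder: mutable, seeded from an existing Rectangle.
class RectBuilder {
    private int $height;
    private int $width;

    public function __construct(Rectangle $from) {
        // This only works because Rectangle exposes its properties publicly.
        $this->height = $from->height;
        $this->width = $from->width;
    }

    public function setHeight(int $height): void { $this->height = $height; }
    public function setWidth(int $width): void { $this->width = $width; }

    public function buildRect(): Rectangle {
        return new Rectangle($this->height, $this->width);
    }
}

$r = new Rectangle(2, 3);
$wider = $r->widenBy(4);
```

That is a second class, with one setter per property, whose only job is to route values back into the constructor.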

Value objects can appear in surprising places. For instance, arguably a Command object in a Command Bus, CQRS, or Event Sourcing model should be a value object. However, it also benefits from having more flexibility around its construction than a procedural call to a constructor. The more robust the object, the more it benefits from actually being object-oriented rather than procedural with extra curly braces.

Carrier objects

There's no standard name for this pattern as far as I'm aware, so I am inventing one. The canonical example here is PSR-14 Event objects, but a collection (such as a Set or Queue) is another example in which the values being carried are all of the same type. They have no service dependencies, but should be mutable, both internally and externally. Their whole point is to be modified by a series of other operations to arrive at a final result. They may or may not encapsulate dependency-free business logic.

Carrier objects may be implemented the same as value objects, with each mutation method instead returning a new object instance. However, that is not always optimal. Unlike true value objects they do not represent a "specific thing" akin to an integer. In the case of PSR-14, the spec authors specifically chose not to force them to be value objects precisely because of the lack of language support to require it, and relying on the self-discipline of an infinite number of developers, rather than just a single class author, seemed unwise.

Sometimes a collection object makes sense to be immutable (a Set), other times it really shouldn't be (a Queue or Stack). When they are mutable, though, they're mutable through a very restricted set of operations (push, pop, etc.). Their actual properties should be publicly unreadable and unwriteable, but internally writable.
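
A minimal sketch of such a mutable carrier, assuming a hypothetical Stack class: the backing array is private, and the only way to change it is through push() and pop().

```php
<?php

// Hypothetical mutable collection, invented for this illustration.
class Stack
{
    // Not readable or writable from outside; only push() and pop() touch it.
    private array $items = [];

    public function push(mixed $item): void {
        $this->items[] = $item;
    }

    public function pop(): mixed {
        if ($this->items === []) {
            throw new UnderflowException('Stack is empty');
        }
        return array_pop($this->items);
    }

    public function isEmpty(): bool {
        return $this->items === [];
    }
}

$stack = new Stack();
$stack->push('a');
$stack->push('b');
$top = $stack->pop();
```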

Builder objects, as mentioned in the last section, probably fall into this category as well.

Carrier objects may or may not make sense to serialize, depending on the context. They may or may not benefit from pass-by-reference/handle vs pass-by-value semantics, depending on the context.

Review

Looking back across our review of object types, let's consider in what situations they should or should not be mutable.

                                Public read  Public write  Private read  Private write  Change on clone  Serialize
Service object (dependencies)   N            N             Y             N              N                N
Service object (configuration)  N            N             Y             N              M                N
Fat entity                      M            N             Y             M              N                M
Skinny entity                   M            N             Y             M              N                M
Value object                    M            N             Y             N              Y                Y
Carrier object                  M            N             Y             M              M                M

A few important points to note:

  • Everything but Entities needs to be clonable.
  • Everything but service objects needs to be serializable.
  • Cloning an object and changing none of its properties renders cloning utterly pointless. The whole point of cloning is to be step one in "make a new object like this other one, but with this difference." If there is no way to make it different, then cloning has no purpose.
  • Public write is almost never useful, except on carrier objects. Sometimes an internal-only struct object could be viewed as a carrier object, but we'll set that aside for now.
  • Everything needs internal read (unsurprisingly), so we can ignore that as a factor.
  • Service objects need cloning, but not serializing. Entity objects need serializing, but not cloning. Value and carrier objects need both.

The places where we want immutable properties include:

  • Service object dependencies
  • Value object properties

The places we want properties to be mutable include:

  • Internal to entities
  • Internal to carrier objects
  • On service object cloning
  • On value object clone and serialization
  • On carrier object clone and serialization

Importantly, we end up in a situation where whether or not we want a given property to be mutable is context-dependent. It needs to be settable from a constructor, obviously, but also in select other contexts. Specifically, in initialization contexts where we're setting up a new object, but not necessarily internal to its constructor.

Also, there are cases where a property should be publicly read-only, but still modifiable internally. That's particularly the case for value and carrier objects. Of note, those are only sometimes the same as those select contexts. For entity objects, for instance, public-read, private-write-at-any-time is a valid pattern.

We then end up with the following combinations:

  • public read, private write
  • public read, private read, init write
  • public none, private write
  • public none, private read
  • public none, private read, init write

public none, private write is already covered by the private keyword today.

Asymmetric cloning

An important caveat here is that not all value objects are struct objects. A value object may well have an internal representation that is not accessible externally, but may need to be updated when cloning or serializing. It could also have invariants between properties. That leads to another point that we've not considered yet: Clone access is asymmetric.

That can be handled today just fine. Since no property is truly immutable, making all properties private and accessible only through methods has essentially the desired effect, if verbosely. At that point, however, neither asymmetric visibility nor initonly is useful, since no property would be externally writable and the class author can be trusted to not violate their own "can't touch this" expectations. The whole point of making properties public, though, is to not need the extra getter methods.

However, how can we allow a property to be modified on clone when cloned internally (inside a method that has the extra logic in it to be aware of internal structures and inter-property dependencies) but not externally? Following existing visibility rules is insufficient in this case; it would require that a property that is publicly readable is also publicly writable-on-clone. Sometimes that is fine, but often it bypasses internal validation.
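
Here is a sketch of that bypass in today's PHP, where a public property is always publicly writable. The Discount class and its 0-100 rule are hypothetical.

```php
<?php

// Hypothetical value object whose $percent must stay within 0..100.
class Discount
{
    public int $percent;

    public function __construct(int $percent) {
        self::assertValid($percent);
        $this->percent = $percent;
    }

    public function withPercent(int $percent): static {
        self::assertValid($percent);
        $new = clone $this;
        $new->percent = $percent;
        return $new;
    }

    private static function assertValid(int $percent): void {
        if ($percent < 0 || $percent > 100) {
            throw new InvalidArgumentException('Percent must be 0-100');
        }
    }
}

$d = new Discount(10);

// Because $percent is public, outside code can clone and write directly,
// skipping the validation in withPercent().
$broken = clone $d;
$broken->percent = 250;
```

The internal with-er enforces the rule. The external clone-then-write does not, and nothing stops it.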

What we need to do is divorce public-read from public-clone.

If we view public clone as a case of public write, then where we end up is public-read, private-write. If not, then we need a third operation beyond get/set, for cloning (or, arguably, any "init" context including serialization).

The proposals

A number of proposals have been floated to address some of the desires listed at the start of this post. These are presented in no particular order.

initonly properties

This would allow a property to be flagged initonly, much as it can be flagged public or static today. If marked, the property would be read-only except in select "initialization" contexts. At present, that is considered the constructor and deserialization.

class Foo {
    public initonly string $bar;

    public function __construct(string $bar) {
        // This is legal
        $this->bar = $bar;
    }

    public function setBar(string $bar) {
        // This is an error.
        $this->bar = $bar;
    }
}

$f = new Foo('beep');

// This is legal.
print $f->bar;

// This is an error.
$f->bar = 'baz';

Asymmetric visibility

This would allow properties to have different public/protected/private visibility for get and set operations. That is, a property could be publicly readable, but privately writable. (Or, technically, the other way around although no use case for that has been demonstrated.)

class Foo {
    get:public set:private string $bar;

    public function __construct(string $bar) {
        // This is legal
        $this->bar = $bar;
    }

    public function setBar(string $bar) {
        // This is legal.
        $this->bar = $bar;
    }
}

$f = new Foo('beep');

// This is legal.
print $f->bar;

// This is an error.
$f->bar = 'baz';

clone-with

This would allow a clone operation to be extended to modify properties of the cloned object. It is similar conceptually to Rust's struct update syntax, which allows this:

struct Point {
   x: f32,
   y: f32,
}

let p1: Point = Point { x: 10.3, y: 0.4 };
let p2 = Point { y: 5.0, x: p1.x };

to be shortened to:

let p2 = Point { y: 5.0, ..p1 };

The proposed PHP syntax equivalent would look like this:

$new = clone $f with {
    x: $newX,
    y: $newY,
};

Which would be logically equivalent to what with-er methods do today:

$new = clone($f);
$new->x = $newX;
$new->y = $newY;
return $new;

The new syntax would make it easier to set multiple properties in one pass, and turn the entire operation into a single expression, making it valid anywhere an expression is. (Think match() expressions, short-functions, etc.)

As a side effect, it would also create a known "clone context" in which an initonly property could be modifiable. It's unclear at the moment how property visibility would interact with it, but presumably you would be able to set only those properties visible in the calling scope.
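
Until something like clone-with exists, a class can approximate it internally with a small helper. The private with() method here is a hypothetical name of my own, not part of any proposal, and the Point class is just a stand-in.

```php
<?php

class Point
{
    public function __construct(private int $x = 0, private int $y = 0) {}

    public function x(): int { return $this->x; }
    public function y(): int { return $this->y; }

    // Hypothetical helper approximating "clone $this with {...}".
    // It is private, so only the class's own methods can use it.
    private function with(array $changes): static {
        $new = clone $this;
        foreach ($changes as $name => $value) {
            if (!property_exists($new, $name)) {
                throw new LogicException("Unknown property: $name");
            }
            $new->$name = $value;
        }
        return $new;
    }

    public function moveTo(int $x, int $y): static {
        return $this->with(['x' => $x, 'y' => $y]);
    }
}

$p = new Point(1, 2);
$q = $p->moveTo(3, 4);
```

This gets the "single expression" benefit inside the class, at the cost of string property names and a helper that every such class has to carry around.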

__clone() arguments

An alternative to clone-with that has been discussed is to allow arguments to be passed to __clone() methods. __clone() is already called after an object is cloned to allow it to deep-clone properties if needed, but this approach would make it possible to pass additional values to it to use as it sees fit. Most likely they would usually be passed as variadics although that's not strictly required.

This approach could effectively emulate clone-with by using named arguments and variadics, like so:

class Foo {
    private $bar;
    private $beep;

    // Very basic, needs more error handling.
    public function __clone(...$args) {
        foreach ($args as $k => $v) {
            if (property_exists($this, $k)) {
                $this->$k = $v;
            }
        }
    }
}

$f = new Foo();
clone($f, bar: 'newbar', beep: 'newbeep');

As with clone-with, it would also create a "clone context" that could be treated as an initialization context. Essentially, it would allow more flexibility than clone-with at the cost of a chunk of boilerplate code for the most common use case. Its impact on visibility is also unclear.

Immutability analysis

Technically initonly and asymmetric visibility are not mutually exclusive, but combining them would lead to a highly complex set of combinations. Effectively, you would be able to declare a property public, private, or protected in each context of get, set, or "set in init context". I dislike this idea, as it just gets too complicated too fast.

initonly has the appeal of simplicity, as it's a single flag to consider. It would only be viable if combined with one or the other clone enhancements to create a "clone init" scope for modification, but could reasonably work with either one.

However, initonly creates a problem: even in an init context, you can only write to a property that is also visible for writing from the calling scope. For example:

class Foo {
    public initonly int $baz = 5;
    private initonly int $beep = 6;
}

$f = new Foo();

// OK
print $f->baz;

// Error
$f2 = clone $f with {
    baz: 8,
    beep: 9,
};

Setting $beep in that clone operation cannot be allowed, since it is private. Setting $baz could be allowed, but suppose $baz is required to be a multiple of 5, a constraint the type system cannot represent: the external clone would bypass it. Normally, addressing that involves a setter or with-er method, but then you're back to where we are today. It would also mean that $baz cannot be public for reading, because that would also make it public for write/clone and bypass validation. At that point, the only benefit is that the one class author couldn't accidentally modify the property without it throwing an error.

Asymmetric visibility has the opposite trade-off. It would allow "public read, private set" configuration, where "set" includes "change after cloning." It technically doesn't need a clone-context or init-context at all, although it would in no way conflict with either clone enhancement.

The previous example would in this case become:

class Foo {
    get:public set:private int $baz = 5;
    private int $beep = 6;

    public function setBaz(int $baz) {
        if ($baz % 5 !== 0) throw new Exception();
        $this->baz = $baz;
    }

    public function withBaz(int $baz) {
        if ($baz % 5 !== 0) throw new Exception();
        return clone $this with { baz: $baz };
    }
}

$f = new Foo();

// OK.
print $f->baz;

// Error.
$f2 = clone $f with {
    baz: 8,
    beep: 9,
};

Now, we're explicitly saying that third parties can read $baz but not write to it. Whether that writing happens right after a clone operation or not is irrelevant. The external clone fails because it's an external write operation, where the internal one succeeds. But $baz can still be accessed externally.

The downside is that there's nothing to indicate if setBaz() or withBaz() is appropriate for this object. The property cannot be forced to be clone-only, and thus the object cannot be forced to be clone-only. That is left up to the self-discipline and documentation of the class author to enforce by hand. However, I would argue that it's reasonable to trust the author of a class to not modify things that shouldn't be modified, and thus an acceptable limitation. It's not a reasonable expectation to trust that no consumer of the class will ever modify-via-clone to produce an invalid state.

I would therefore argue that asymmetric visibility is "close enough" to initonly in terms of the net result, in addition to offering other benefits unrelated to immutability (as with the setBaz() example above), that it is the superior option of the two.
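
For reference, this is roughly how the same Foo has to be written in PHP as it stands to get comparable guarantees. It is only an approximation: the baz() accessor is my stand-in for get:public, and a plain private property stands in for set:private.

```php
<?php

class Foo
{
    private int $baz = 5;

    // Stand-in for "get:public": a read-only accessor.
    public function baz(): int {
        return $this->baz;
    }

    public function withBaz(int $baz): static {
        if ($baz % 5 !== 0) {
            throw new InvalidArgumentException('baz must be a multiple of 5');
        }
        $new = clone $this;
        $new->baz = $baz;
        return $new;
    }
}

$f = new Foo();
$f2 = $f->withBaz(10);
```

External code can read the value through baz() and cannot write to it at all, clone or no clone. The getter is exactly the boilerplate that asymmetric visibility would remove.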

Enhanced clone analysis

If we went with initonly, then one or the other clone enhancements would be a requirement to avoid blocking evolvable objects (which would be one of the main use cases). With asymmetric visibility, neither is a requirement but either is a nice-to-have.

Neither is a clear winner. One offers more flexibility and more boilerplate while the other less flexibility and less boilerplate. Technically they aren't mutually exclusive, either, although that would potentially result in some highly fugly code.

Assuming asymmetric visibility, I think I would favor clone-with. Technically, there is nothing that either one could do that couldn't be done with a method, unless there are initonly properties to deal with. If not, then any clone-with-args use case could also be implemented as a method on the class, as today. clone-with then becomes an optimization of the most-common case, which turns that most-common-case into a single expression that could be used internally or externally as appropriate in context.

If we went with initonly, then clone-with-args is potentially necessary as otherwise there are use cases that initonly would be incompatible with, with no way around it.

Recommendation

My analysis, therefore, suggests we should implement clone-with and asymmetric visibility, as two independent but play-nice-together features. That combination would result in a similar net-benefit for object properties as the combination of constructor promotion and named arguments did for object construction.
