On strong type checking in PHP, and the opportunities it presents today.
Coming from a dynamic language background, mostly PHP and Python, I was pleasantly surprised by the way the Rust compiler has your back, catching at compile time most of the issues that would otherwise only show up at runtime. In my role as a cloud architect supporting a large PHP web application, I would love to have that certainty for all the code we push to production.
There are static code analysis tools that try to cover that gap. They do a good job of inferring types from usage, but if the codebase you are analysing doesn’t carry much type information (either as types in function/method signatures or as PHPDoc annotations), there is only so much they can do.
What was unclear to me, however, was just how far those tools could go, and how many errors they could catch if you are thoughtful about the type information you give them.
Inspired by that question, I set out to build a PHP library with an interface similar to Rust’s Iterator trait. The reasoning behind this choice was twofold:
- Rust’s Iterator interface is a lot friendlier than PHP’s array functions. It is also lazy by default, which I thought would be an interesting thing to implement.
- I wanted to replicate the way Rust checks the functions you pass to the different methods, statically validating that the types match along the entire chain.
I am very happy with the results. You can take a look at the library here; it is also available via Composer here.
But back to the tools. There are a couple of lessons I learned, both from building the library and from using it day to day, that I think are worth sharing with the rest of the community:
Use static code analysis tools in your PHP projects
I don’t see this recommended as often as I think it should be. Tools like Phan, PHPStan or Psalm should be part of the standard PHP application toolkit just as much as PHPUnit is today, and, as with writing tests, it’s best to incorporate them from the start. Make sure you also run them in your CI pipeline: I can’t tell you how many bugs Phan caught for us before they reached production.
Start writing typed code, annotate existing code before converting it
As I said above, the tools only work as well as you let them. For new code, be as explicit with your types as possible.
Adding types to existing code can be a lot more complex, especially if you don’t have a robust test suite. Luckily, these tools have you covered by way of annotations, which give you the benefit of type information without changing the runtime semantics of existing code. Over time, and with the confidence the tools give you, you can add native types to your function parameters, knowing that any call with a different type has already been caught.
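As a minimal sketch of that two-step migration, consider a hypothetical `applyDiscount` function (not from the library). The first version keeps the untyped signature and only adds PHPDoc, so PHP behaves exactly as before while the analyser checks callers. The second version promotes the same information to native types. Two names are used only so both versions fit in one file; in a real codebase you would edit the same function in place.

```php
<?php
declare(strict_types=1);

// Step 1 (hypothetical legacy code): types live only in PHPDoc.
// PHP ignores these annotations at runtime, but static analysers
// such as Phan, PHPStan or Psalm check every call site against them.
/**
 * @param int   $price Price in cents.
 * @param float $rate  Discount rate between 0 and 1.
 * @return int
 */
function applyDiscountLegacy($price, $rate)
{
    return (int) round($price * (1 - $rate));
}

// Step 2: once the analyser reports no violating callers, move the
// annotations into the native signature so PHP enforces them too.
function applyDiscount(int $price, float $rate): int
{
    return (int) round($price * (1 - $rate));
}
```

The legacy version still accepts a numeric string such as `'1000'` at runtime, which is exactly the kind of call the analyser flags before you tighten the signature.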
Don’t be constrained by the current PHP type system
This was the biggest lesson I learned from writing the library. It makes heavy use of generics, expressed through PHPDoc annotations, to catch the kind of errors I was after: for example, calling `map` on an array of `Foo` instances with a closure that calls a method that doesn’t exist on `Foo`.
To show a bit of the magic, let me walk you through the `map` method.
From a types perspective, the `map` operation is defined as follows:
- It takes an iterator over elements of type A
- It also takes a function that takes a parameter of type A and returns a value of type B
- It returns an iterator over elements of type B
So how do we translate that into something that, in this case, PHPStan can understand? Using PHPDoc annotations.
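Before applying this to a class, it can help to see the three-part specification on a standalone function. The following is a minimal sketch with a hypothetical `lazyMap` helper built on a PHP generator. It is not the library's implementation. The `@template` tags name the element types A and B, and because the body is a generator, `$fn` only runs when a consumer asks for the next element.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lazily maps an iterable of A into a Generator of B.
 *
 * @template K
 * @template A
 * @template B
 * @param iterable<K, A> $items an iterator over elements of type A
 * @param callable(A): B $fn    a function from A to B
 * @return Generator<K, B, mixed, void> an iterator over elements of type B
 */
function lazyMap(iterable $items, callable $fn): Generator
{
    foreach ($items as $key => $value) {
        // Runs only when the consumer requests this element.
        yield $key => $fn($value);
    }
}

// A = string, B = int. Nothing is computed until $lengths is iterated.
$lengths = lazyMap(['one', 'two', 'three'], fn(string $s): int => strlen($s));
```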
First, we declare the generic types of our iterator, in our case the `LazyIter` class:
```php
/**
 * @template TKey
 * @template TValue
 */
class LazyIter
{
    // ... methods such as map() go here
}
```
For the sake of the example we only care about `TValue`, which is the type of the iterator’s elements. `TKey` is there due to some internal implementation details, and stays the same across the `map` operation.
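To see how class-level templates get bound to concrete types, here is a minimal sketch using a hypothetical `ArrayWrapper` class that is not part of the library. Because the constructor parameter is annotated as `array<TKey, TValue>`, an analyser like PHPStan should be able to infer `ArrayWrapper<int, string>` from the array literal without any explicit type arguments. This is the same kind of inference that tells it the element type of an iterator built from an array.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical generic wrapper, used only to illustrate class templates.
 *
 * @template TKey of array-key
 * @template TValue
 */
final class ArrayWrapper
{
    /** @param array<TKey, TValue> $items */
    public function __construct(private array $items)
    {
    }

    /** @return TValue|null the first element, or null when empty */
    public function first(): mixed
    {
        foreach ($this->items as $value) {
            return $value;
        }
        return null;
    }

    /** @return array<TKey, TValue> */
    public function toArray(): array
    {
        return $this->items;
    }
}

// An analyser should infer ArrayWrapper<int, string> here,
// so $words->first() is typed as string|null.
$words = new ArrayWrapper(['one', 'two', 'three']);
```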
Then, we define the `map` method:
```php
/**
 * @template UValue
 * @param callable(TValue): UValue $callable
 * @return LazyIter<TKey, UValue>
 */
public function map(callable $callable): LazyIter
{
    return new LazyIter(new Iterators\MapIterator($this->iterator, $callable));
}
```
There is a bit to unpack there.
- We define a new template type, `UValue`, scoped to this method. This will be the element type of the new iterator.
- We specify that the function we receive is a callable that takes a `TValue` (i.e. the element type of our original iterator) and returns a `UValue`.
- We specify that the method returns a new instance of our iterator with types `TKey` and `UValue`.
And that’s it. Well, not quite: there is still the implementation to write. But the important point is that you can enforce that specification both ways.
You can use these tools to validate that your implementation matches it, and anyone calling your code will get an error if they use it incorrectly, assuming they run these tools too.
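The library’s own `Iterators\MapIterator` is not reproduced here. As a minimal sketch under that caveat, the hypothetical `MapIteratorSketch` below (name and details are assumptions) wraps any `Iterator` and applies the callable on demand. Its PHPDoc repeats the same `TValue` to `UValue` contract, so the analyser can check the body against it as well as the callers.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical lazy mapping iterator (not the library's MapIterator).
 *
 * @template TKey
 * @template TValue
 * @template UValue
 * @implements Iterator<TKey, UValue>
 */
final class MapIteratorSketch implements Iterator
{
    /** @var callable(TValue): UValue */
    private $callable; // "callable" is not allowed as a native property type

    /**
     * @param Iterator<TKey, TValue> $inner
     * @param callable(TValue): UValue $callable
     */
    public function __construct(private Iterator $inner, callable $callable)
    {
        $this->callable = $callable;
    }

    /** @return UValue the mapped element, computed on demand */
    public function current(): mixed
    {
        return ($this->callable)($this->inner->current());
    }

    /** @return TKey keys pass through unchanged */
    public function key(): mixed
    {
        return $this->inner->key();
    }

    public function next(): void
    {
        $this->inner->next();
    }

    public function rewind(): void
    {
        $this->inner->rewind();
    }

    public function valid(): bool
    {
        return $this->inner->valid();
    }
}

$lengths = new MapIteratorSketch(
    new ArrayIterator(['one', 'two', 'three']),
    fn(string $s): int => strlen($s),
);
```

Because the callable is applied in `current()`, each element is only transformed when a consumer asks for it, and nothing runs if the iterator is never traversed. One trade-off of this design is that calling `current()` directly twice on the same position runs the callable twice. Caching the value per position would avoid that.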
What kind of errors? They’ll probably vary from tool to tool, but this is what PHPStan reports for the following code:
```php
LazyIter::fromArray(["one", "two", "three", "four"])
    ->map(fn(int $n): int => $n * 2)
    ->collect();
```
```
Parameter #1 $callable of method LazyIter\LazyIter<int,string>::map() expects callable(string): int, Closure(int): int given.
```
And just to clarify, this all happens without ever executing the code. The tool infers the element type (`string`) from the array passed to `fromArray`, then validates that the callable passed to `map` matches the specification of what it should be.
That was a complex example, but hopefully it illustrates how far these tools can go and how much value you can get from them. It also illustrates features that aren’t part of the language today, such as generics, which you can still leverage in your own projects.