Advanced Elm Advanced Elm

Units of Measure & Phantom Types

Check out a free preview of the full Advanced Elm course:
The "Units of Measure & Phantom Types" Lesson is part of the full, Advanced Elm course featured in this preview video. Here's what you'd learn in this lesson:

Using the Mars Climate Orbiter disaster as an application, Richard demonstrates how to use custom types to add measurement units to numbers, and contrasts this method with a refactored solution using phantom types.

Get Unlimited Access Now

Transcript from the "Units of Measure & Phantom Types" Lesson

[00:00:00]
>> Richard Feldman: So part three, creating constraints. We'll talk about a few things here. First, we're gonna talk about units of measure. And we're gonna talk about accessible HTML. Then the Never type. And finally, type parameter design. Okay, so this right here is the Mars Climate Orbiter. I should say the ill-fated Mars Climate Orbiter.

[00:00:19] A bad thing happened to this, because there were two different teams of programmers working on it. And one of the teams put some of their units in imperial units like inches and feet, and stuff, and the others used the SI units of centimeters, and millimeters, and so forth.

[00:00:35] And they got their units mixed up, and so their calculations when they sort of put them together didn't end up giving it the right numbers, and so it crashed. So that's a bummer, and it would have been nice if we could have averted that disaster by having some way to prevent ourselves from mixing up our units of measure.

[00:00:55] So here's a way we can do that using Elm. The server defining a type called length and length has a type parameter called units or a type variable called units. And all it holds inside is two things. One, is it holds on to a float, which can be actual measurements itself.

[00:01:11] And then it's holding into another guide for the units. Now, let's separately define type four centimeters, type for inches. And just like we had in the previous example with preview. We're gonna have these just defined with a single variant that doesn't hold on anything. Since are just place holders.

[00:01:27] So they're just there to describe the units themselves. They don't actually hold any information themselves. So we can write a function that called centimeters, which is sort of convenience that takes a float and gives you back the length of centimeters. Same thing with inches. Give limitations of those just be to take the float, put it inside this length variant, and then the hard code centimeters in this case and inches in the other case for the units value.

[00:01:51]
>> Richard Feldman: We could also write a function called Add, which we won't get in the implementation just yet, but it would have this type. It would say length units to length units. And so if you gave it some different things that had the same units. They would work out.

[00:02:04] So I could say it like add 5 centimeters and 12 centimeters. And that'll work out and give me some answer that had a length centimeters, which is what I want. And I could also do the same thing with inches, say add inches 5, inches 12 and that would work out.

[00:02:18] But if I tried to do centimeters for one, inches for the other, those would not unify and it would give me a compile error. Okay, so here's one way we can implement add. We can say add and then immediately destructure this one variant with num1, which would be the float of the first one, and then num2, for the second one.

[00:02:37] And we can also have units1 and units2, which we know from the type, are going to be the same thing. So we can return length and just by chance we decided to pick units1 there, but we could have picked either one, we could have also said it returns units2.

[00:02:51] Another way we could do this is what's known as a phantom type. Now that transition was pretty quick, so let me just go back and forth so you can see it too. So this is with the phantom type and this is without phantom type, without. So the difference is phantom type version, we still have units up here in the type of length, but we don't actually have it in the variant.

[00:03:12] We don't use it, it just sits up there being constrained, being all constrainty, but it doesn't actually exist anywhere in the implementation. It's something that we're defining at the type level purely for the benefit of opting into a constraint that we're going to enforce by other means than what's actually inside there.

[00:03:31] So again going back and forth. So this version actually holds on to some units. This version doesn't. This version has to pattern match on those. It has to pull the units out and this one pulls it out in both places, whereas the other one doesn't. And then, of course, the first one ends up needing to provide that to the variants, so that we can actually construct one of these length values.

[00:03:51] Which it does, and it arbitrarily chooses units one, but it could just have easily have chosen units2. Whereas this one doesn't provide that at all. So this sorta begs the question, okay, well in this version, we can see that it needs to be at least returning some sort of units here, either units1 or units2.

[00:04:10] But this one doesn't need to be doing that. So when it says that it takes length units and length units and returns length units, why is that true? I mean, is that just some sort of made up nonsense like really, couldn't we write just as easily write this one, where it claims to take ab and return c?

[00:04:27] Yeah, we could. So in actuality, there is no constraint on length, like in a to the way that we designed this implementation. The whole point of the phantom type, is that it's a constraint that is sort of free floating, does not exist by default, doesn't do anything by default.

[00:04:45] But when you write type annotations in Elm, you have the choice of choosing to make them more constrained than they have to be if you want to. And the compiler will respect that. So most of the time in Elm, when you write a type annotation, it's just for documentation.

[00:04:59] And you're gonna put down pretty close to exactly what the compiler would infer. However, there are some examples like this where it might make sense for you to say, you know what, I know that technically, I could have length of one thing and length of another thing as the two arguments in return, a length of a third thing.

[00:05:17] And I don't really care what they are, doesn't really matter. I'm just gonna ignore them anyway so, whatever. Just give me anything, give me centimeters, millimeters, feet, uers, I don't care. I doesn't matter because I'm not gonna use those values anyway. You could do that but then you would be defeating the entire purpose of this design, which is that the whole point here is that we want them to be constrained.

[00:05:38] We want to enforce that they are the same in all three cases, or rather the case of the two arguments. Because otherwise it's entirely possible that you will give centimeters on one side and inches on the other side. And there's still compile they just will result in it unfortunately crushed miles pro.

[00:05:57] So the trade off here is that here we have a little bit more enforcement. Its worth noting though that although we have a little bit more enforcement, we still have to constrain this add type annotation artificially. So notice that we chose arbitrarily to return units1 here. Because of that, really, in actuality, we could change the length of this other one to say, length units2, or length b or length x, and then return the length units.

[00:06:24] Really, we are still artificially constraining this one on purpose, and opting in to saying actually, no, I promised, those second ones need to be there. And this is sort of a natural consequence of having types that we don't really end up using in our actual calculations, or anything like that.

[00:06:41] So this implementation at least requires that the return value is going to be one of the two that we gave it, but it's still not sort of innately as constrained as we've made it here. In either implementation, the phantom type, or the non-phantom type version, we have to sort of opt into constraining it more.

[00:06:56] If we want it to actually do its job of preventing the type mismatch between the inches, the imperial units and the SI units. Okay, so keeping on that mind, what are the tradeoffs between the two like? Why would I choose one or the other? I mean I think the value of having some sort of enforcement is pretty clear, we can prevent ourselves from making the thing crash due to mixed up units of measure.

[00:07:19] However, it's less clear in Italy what the tradeoffs are or why you might choose to use the phantom type versus actually holding on to the units inside the value. And it's honestly, the tradeoff really comes down to the degree of flexibility that the phantom types give you and also there is a little bit of a performance aspect to it.

[00:07:37] So in Elm, when you have a custom type with a single variant, and that variant holds onto one value. That's sort of a trivial type, you look at this and you say yeah, this is just a wrapper around this one type. Do we really need that wrapper at run time?

[00:07:54] And the answer is, no we don't. And so actually when Elm encounters this specific situation, a custom type with one variant that holds on the one value. It does what's called unboxing, when you run it, it was dash dash optimize. Which is to say, this code right here at compiles sometime is a custom type.

[00:08:10] But a run type is just a float, like it's as if we just said alias Length = Float at run time. Now compile is not gonna behave exactly like a fully box wrapped up custom type with all the trimmings, but a compile time, that's a performance optimization I was able to do.

[00:08:25] Because it says well we'll just substitute it in there and we already checked to make sure that it's gonna be used consistently across the code base. So it's fine to just unbox it and not bother putting that wrapper around it since the wrapper was only for the benefit of the compiler anyway.

[00:08:39] However, it can't do that with this version, because this version is holding on to two things. If we were to substitute it with float, then this other piece of information which theoretically we might be using at runtime, would get discarded, we'd no longer be, it wouldn't work anymore.

[00:08:52] Now, as it happens, we know that we're not using this thing at runtime, but the compiler doesn't really have any way of guaranteeing that ahead of time. I guess, theoretically, it might be able to figure out someday but there's the running joke. With a sufficiently smart compiler you can do just about anything, but of course, compilers don't just build themselves.

[00:09:11] So there is a definite performance advantage here and the flexibility is both a pro and a con. On the one hand, the flexibility to be able to write essentially whatever type of variables you want in here. To make them as constrained or unconstrained as you like, it's sort of a blessing and a curse.

[00:09:27] On one hand, it's a blessing in the sense that hey, you can be as flexible as you want, maybe we want this. On the other hand, it's some sets of foot gun, I can very easily shoot myself in the foot with this and make it so that I have defined a type for add, which completely defeats the entire purpose of my going down this road anyway.

[00:09:44] That makes it so that it's perfectly possible to mix these units up, and I won't get any compile time errors. So, that flexibility is both a pro and a con. So essentially the tradeoff between these two options here is, the phantom type version where we only hold onto a float and we do not hold onto the units.

[00:10:01] We only have the units as part of the type variable so that we can constrain it if we want to, gives us more performance but fewer guaranties. It's a little bit easier to shoot ourselves in the foot. The other version doesn't quite has good performance but it does have a little bit more in the way of guarantees because this is actually part of the implementation.

[00:10:21] Having said that, the performance difference here is probably not going to really matter in practice for almost all use cases. So I would definitely benchmark before worrying about this from a performance perspective.