SIP-ZZ - Opaque types

Language

Authors: Erik Osheim and Jorge Vicente Cantero

Supervisor and advisor: Sébastien Doeraene

History

Date Version
Sep 20th 2017 Initial Draft

Introduction

This is a proposal to introduce syntax for type aliases that only exist at compile time and emulate wrapper types.

The goal is that operations on these wrapper types must not create any extra overhead at runtime while still providing a type safe use at compile time.

Some use cases for opaque types are:

  • New numeric classes, such as unsigned ints. There would no longer need to be a boxing overhead for such classes. So this is similar to value types in .NET and newtype in Haskell.

  • Classes representing units of measure. Again, no boxing overhead would be incurred for these classes.

  • Classes representing different entities with the same underlying type, such as Id and Password being defined in terms of String.

We expand on all these points in our Motivation section.

For a definition of boxing and previous state-of-the-art, we recommend reading SIP-15.

Implementation note

The proposal is currently in an early stage. It’s being implemented, but the proposed implementation strategy is not detailed enough to be able to predict with certainty that it will work as specified. Consequently, details of the proposal might change driven by implementation concerns.

Opaque types

Motivation

Value classes, a Scala feature proposed in SIP-15, were introduced to the language to offer inlined classes whose operations have zero overhead at runtime.

This feature allows users to define classes with a few restrictions in exchange of performance (value classes are boxed/unboxed under some concrete scenarios, that are deemed to be commonplace).

These scenarios, while certainly common, do not cover the majority of scenarios that library authors have to deal with. In reality, experimentation shows that they are insufficient, and hence performance-sensitive code suffers (see Appendix A).

In the following example, we show the current limitations of value classes and motivate the need of compile-time wrapper types.

An example

Imagine we are working with floating-point data that are logarithmically-scaled (e.g. sensor data). We might have gigabytes of data which would be convenient to represent in terms of arrays of primitive doubles (rather than boxing each value).

We might choose to use value classes as follows:

package object valueclass {
  class Logarithm(val exponent: Double) extends AnyVal {
    def toDouble: Double = math.exp(exponent)
    def +(that: Logarithm): Logarithm = Logarithm(toDouble + that.toDouble)
    def *(that: Logarithm): Logarithm = new Logarithm(exponent + that.exponent)
  }
  
  object Logarithm {
    def apply(x: Double): Logarithm = new Logarithm(math.log(x))
  }
}

package object usesites {
  // 1e7 is approximately (e ** 16.11809565095832),
  // so x is encoded as 16.11809565095832.
  val x: Logarithm = Logarithm(1e7) 
}

Users of this library can use new Logarithm(...) to wrap Double values (and access alternate implementations of + and *) without allocating Logarithm values. This is confirmed when users have a look at the generated bytecode for Logarithm and notice that the signature of apply is redefined as def apply(x: Double): Double and that val x: Logarithm = Logarithm(1e7) is instead val x: Double = Logarithm.apply(1e7).

Value classes can rewrite many instances Logarithm in terms of Double, including local variables, method parameters, return types, and most fields. SIP-15 lays out the exact circumstances when these rules occur, and extension methods ensure that boxing and unboxing occurs transparently.

Unfortunately, this transparent boxing is relatively easy to trigger. Some very common situations where Logarithm instances will be allocated at runtime include:

  • use in arrays (e.g. Array[Logarithm])
  • use in generic collections (e.g. List[Logarithm])
  • parameters or return values of anonymous functions
  • calling equals or hashCode on a Logarithm

Concretely, consider:

val x = Logarithm(1e5)
val xs = List(Logarithm(12345.0), Logarithm(67890.0)).map(_ + x)

When looking at the bytecode, the author will notice that the the emitted code boxes Double values into allocated Logarithm values, stores those in the list, then produces a Function1[Logarithm, Logarithm] which unboxes each logarithm and allocates a new result. Users who expect use of value classes to be allocation-free (and equivalent to working with the underlying values) will be sorely-disappointed in these cases.

Appendix A illustrates more cases where this unintended boxing/unboxing happens, hurting the performance of the code.

Boxing/unboxing of value classes happens anywhere in the program where the type signatures are generic and require the runtime to pass a java.lang.Object.

Given the frequency of these scenarios in real-world code, we would like to introduce the notion of compile-time wrapper types. Compile-time wrapper types do not depend on the runtime and its capabilities, and are always erased at runtime. The previous code snippet can be implemented using opaque types and will produce code that never boxes/unboxes.

In exchange, opaque types disallow the redefinition of equals and hashCode, which in our experience is often unnecessary.

Definition

Let’s continue with the example in our motivation section, and define Logarithm with an opaque type:

package object opaquetypes {
  opaque type Logarithm = Double
}

The opaque type definition is akin to the one of type aliases, except that it is prefixed by a new opaque keyword.

Opaque types are always accompanied by type companions. Type companions define the public API of an opaque type, and they are defined in the same way that class companions are. It is possible to define an opaque type without a type companion, but then the opaque type is useless.

Let’s define a type companion for our previous opaque type Logarithm:

package object opaquetypes {
  // The previous opaque type definition
  opaque type Logarithm = Double

  object Logarithm {
    // These are the ways to lift to the logarithm type
    def apply(d: Double): Logarithm = math.log(d)

    def safe(d: Double): Option[Logarithm] =
      if (d > 0.0) Some(math.log(d)) else None
      
    // This is the first way to unlift the logarithm type
    def exponent(l: Logarithm): Double = l

    // Extension methods define opaque types' public APIs
    implicit class LogarithmOps(val `this`: Logarithm) extends AnyVal {
      // This is the second way to unlift the logarithm type
      def toDouble: Double = math.exp(`this`)
      def +(that: Logarithm): Logarithm = Logarithm(math.exp(`this`) + math.exp(that))
      def *(that: Logarithm): Logarithm = Logarithm(`this` + that)
    }
  }
}

The above Logarithm type companion contains the following definitions:

  • Methods to lift the type from Double to Logarithm (i.e. apply and safe)
  • Methods to lower the type from Logarithm to Double (i.e. exponent)
  • Extension methods to unlift the type from Logarithm to Double (i.e. toDouble)
  • Extension methods to define more operations on the type (i.e. like + and *)

The key peculiarity of opaque types is that they behave like normal type aliases inside their type companion; that is, users can convert from the type alias and its equivalent definition interchangeably without the use of explicit lift and unlift methods. We can say that opaque types are “transparent” inside their type companion.

However, the story changes for users of this API. Outside of their type companions, opaque types are not transparent and, therefore, the Scala compiler fails to compile code that pretends they are. A common example is:

val l: Logarithm = 1.0

which fails to compile with a type mismatch error:

<console>:11: error: type mismatch;
 found   : Double
 required: Logarithm
       val l: Logarithm = 1.0
                          ^

The same happens for val d: Double = l where l is an instance of Logarithm.

The idea, then, is to let library authors create wrapper types and their API in a concrete, isolated part of their code and force users to use this API to lift to and unlift from the opaque type.

By design, downstream users can define more operations on these opaque types via their own extension methods, but they cannot create a new API to lift/unlift them, e.g. users need to use the library-provided toDouble, Logarithm.safe and Logarithm.apply.

The following code showcases legit uses of the Logarithm opaque type:

package object usesites {
  import opaquetypes._
  val l = Logarithm(1.0)
  val l2 = Logarithm(2.0)
  val l3 = l + l2
  val d = l3.toDouble
  val l3: Logarithm = (1.0).asInstanceOf[Logarithm]
}

While the following fails to typecheck:

package object usesites {
  import opaquetypes._
  val l: Logarithm = Logarithm(1.0)
  val d: Double = l // fails to typecheck
  val l2: Logarithm = 1.0 // fails to typecheck
}

For the sake of completeness, this is how you extend these opaque types with more operations:

package object usesites {
  // ...
  implicit class UserOps(`this`: Logarithm) extends AnyVal {
    def **(that: Logarithm): Logarithm = Logarithm(`this` * that)
  }
}

Note that the rational for this encoding is to allow users to convert from the opaque type and the underlying type in constant time. Users have to be able to tag complex type structures without having to reallocate, iterate over or inspect it.

Formal definition

The Scala Language doesn’t have a concept of either opaque types or type companions. In the following section, we formalize our previous definitions and specify the required changes to the Scala Language Specification.

Opaque type definition

An opaque type follows the same structure as an alias type but it requires the use of a new opaque modifier.

LocalModifier     ::=  ‘abstract’
                    |  ‘final’
                    |  ‘sealed’
                    |  ‘implicit’
                    |  ‘lazy’
                    |  ‘opaque’

This new modifier is then used to qualify the type alias definition:

Def        ::= [‘opaque‘] ‘type’ {nl} TypeDef
TypeDef    ::=  id [TypeParamClause] ‘=’ Type

Opaque modifiers are only valid for type definitions. Note that contrary to type alias definitions, opaque type definitions cannot be overridden.

Here’s a formal definition of opaque types:

An opaque type t = T defines t to be an alias name for the type T only in the scope of the opaque type companion t. The left hand side of an opaque type may have a type parameter clause, e.g. type t[tps] = T. The scope of a type parameter extends over the right hand side T and the type parameter clause tps itself.

As per this definition, opaque types modify the type equivalence relationship explained in the 3.5.1. Equivalence section of the Scala Language Specification. In particular, the next item qualifies the type equivalence for opaque types:

If t is defined by an opaque type opaque type t = T, then t is not equivalent to T unless t occurs in the template of the opaque type companion.

In the Implementation notes, we explain how this can be achieved in the implementation.

Opaque type companion

We define a type companion in a similar way companion classes are described in the Scala Language Specification in 5.5. Object Definitions:

Generally, a companion module of an opaque type is an object which has the same name as the opaque type and is defined in the same scope and compilation unit. Conversely to companion classes, the companion class of the module does not have a notion of companion type.

These opaque type companions are also part of the implicit search scope of the opaque type t. Therefore, uses of extension methods defined in the opaque type companion do not require users to import the opaque type companion explicitly.

Technical discussions

In our current proposal, we make several trade-offs. Next, we defend these trade-offs and propose alternative ways to achieve the same (if any).

opaque as a modifier

opaque has been added as a modifier to a type def for simplicity. We believe that a new keyword for this feature is worthwhile.

Note that adding opaque as a modifier prevents the use of opaque anywhere in the users’ program, which could possibly break someone’s code. To fix this scenario, we could create a Scalafix rule that will rewrite any place where opaque is an identifier.

For those SIP Committee members wary of adding a new keyword to the language, we propose an alternative approach. Instead of defining opaque types with the opaque modifier, opaque types may also be defined by combining the existing new and type keywords, e.g. new type Logarithm = Double. This option would be akin to the syntax used in Haskell for wrapper types, e.g. newtype.

Type companions

This proposal only adds the notion of type companions for opaque types. After discussing with members of the SIP Committee and Scala compiler hackers, we believe that a notion of type companions extended to type aliases would not work because the typechecker aggressively dealiases**, and it is not possible to link to the type companion symbol once type aliases have been dealiased.

** Note that dealiasing means beta reducing a type alias.

Static extension methods

The extension methods synthesized for implicit value classes are not static. We have not measured if the current encoding has an impact in performance, but if so, we can consider changing the encoding of extension methods either in this proposal or in a future one.

Implementation notes

To implement opaque types, we need to modify three compiler phases: parser, namer and typer. At the time of this writing, it is unclear if later phases like erasure must be changed as well, but we think this should not be necessary.

There are several key ideas in the current, work-in-progress implementation:

  • To meet the type equivalence relationship for opaque types, we synthesize two implicit conversions inside the opaque type companion, if they do not already exist. If opaque type t = T, then two implicit conversions are synthesized, one from t to T is synthesized and another for the other way around. The body of these methods will use t.asInstanceOf[T] and viceversa **.

  • Phases after typer always dealias opaque types. This way, erasure and codegen can unwrap opaque types out of the box and, at the bytecode level, their underlying representation is used instead.

All these ideas are open to refinements by the SIP Committee.

On removing asInstanceOf at compile-time

** Note that these asInstanceOf can be removed at compile-time, but there is no precedent of doing so in the Scalac compiler. However, it is not clear whether leaving these casts will have an impact on performance – the underlying virtual machine may remove the operation based on type analysis due to the fact that the cast is from Double => Double via Class Hierarchy Analysis. Despite not having a precedence, the authors of the proposal incline to remove these checks at compile-time. This will also require a modification to the spec.

We’re also assuming that implicit enrichment via value types is a sufficient way to provide zero-cost extension methods. We’ll need to benchmark opaque types + value class enrichment to ensure that there aren’t hidden performance problems.

Cross-platform

We believe that by implementing opaque types early in the pipeline, Scala.js and Scala Native can compile them out-of-the-box. Thus, we do not expect opaque types to have problems for different backends, since erasure will always see the dealiased types.

Conclusion

We believe that opaque types fit in the language nicely. The combination of type aliases and value classes (for the zero runtime overhead of extension methods) result in compile-time wrapper types that address the performance issues of value classes.

As a summary, opaque types are:

  • A subset of type aliases that are transparent only within the type companion.
  • Extensible (users can define their own extension methods).
  • Fit for performance-sensitive code (no boxing/unboxing by design).
  • A lightweight way to add extra type-safety to strings, numbers, etc.

References

  1. (Appendix A) The High Cost of AnyVal classes