Collections

Iterators

Language

An iterator is not a collection, but rather a way to access the elements of a collection one by one. The two basic operations on an iterator it are next and hasNext. A call to it.next() will return the next element of the iterator and advance the state of the iterator. Calling next again on the same iterator will then yield the element one beyond the one returned previously. If there are no more elements to return, a call to next will throw a NoSuchElementException. You can find out whether there are more elements to return using Iterator’s hasNext method.

The most straightforward way to “step through” all the elements returned by an iterator it uses a while-loop:

while (it.hasNext)
  println(it.next())
while it.hasNext do
  println(it.next())

Iterators in Scala also provide analogues of most of the methods that you find in the Iterable and Seq classes. For instance, they provide a foreach method which executes a given procedure on each element returned by an iterator. Using foreach, the loop above could be abbreviated to:

it.foreach(println)

As always, for-expressions can be used as an alternate syntax for expressions involving foreach, map, withFilter, and flatMap, so yet another way to print all elements returned by an iterator would be:

for (elem <- it) println(elem)
for elem <- it do println(elem)

There’s an important difference between the foreach method on iterators and the same method on iterable collections: When called on an iterator, foreach will leave the iterator at its end when it is done. So calling next again on the same iterator will fail with a NoSuchElementException. By contrast, when called on a collection, foreach leaves the number of elements in the collection unchanged (unless the passed function adds or removes elements, but this is discouraged, because it may lead to surprising results).

The other operations that Iterator has in common with Iterable have the same property. For instance, iterators provide a map method, which returns a new iterator:

scala> val it = Iterator("a", "number", "of", "words")
val it: Iterator[java.lang.String] = <iterator>

scala> it.map(_.length)
val res1: Iterator[Int] = <iterator>

scala> it.hasNext
val res2: Boolean = true

scala> res1.foreach(println)
1
6
2
5

scala> it.hasNext
val res4: Boolean = false

As you can see, after the call to it.map, the it iterator hasn’t advanced to its end, but traversing the iterator resulting from the call to res1.foreach also traverses it and advances it to its end.

Another example is the dropWhile method, which can be used to find the first elements of an iterator that has a certain property. For instance, to find the first word in the iterator above that has at least two characters you could write:

scala> val it = Iterator("a", "number", "of", "words")
val it: Iterator[java.lang.String] = <iterator>

scala> it.dropWhile(_.length < 2)
val res4: Iterator[java.lang.String] = <iterator>

scala> res4.next()
val res5: java.lang.String = number

Note again that it was changed by the call to dropWhile: it now points to the second word “number” in the list. In fact, it and the result res4 returned by dropWhile will return exactly the same sequence of elements.

One way to circumvent this behavior is to duplicate the underlying iterator instead of calling methods on it directly. The two iterators that result will each return exactly the same elements as the underlying iterator it:

scala> val (words, ns) = Iterator("a", "number", "of", "words").duplicate
val words: Iterator[String] = <iterator>
val ns: Iterator[String] = <iterator>

scala> val shorts = words.filter(_.length < 3).toList
val shorts: List[String] = List(a, of)

scala> val count = ns.map(_.length).sum
val count: Int = 14

The two iterators work independently: advancing one does not affect the other, so that each can be destructively modified by invoking arbitrary methods. This creates the illusion of iterating over the elements twice, but the effect is achieved through internal buffering. As usual, the underlying iterator it cannot be used directly and must be discarded.

In summary, iterators behave like collections if one never accesses an iterator again after invoking a method on it. The Scala collection libraries make this explicit with an abstraction IterableOnce, which is a common superclass of Iterable and Iterator. IterableOnce[A] only has two methods: iterator: Iterator[A] and knownSize: Int. If an IterableOnce object is in fact an Iterator, its iterator operation always returns itself, in its current state, but if it is an Iterable, its iterator operation always return a new Iterator. A common use case of IterableOnce is as an argument type for methods that can take either an iterator or a collection as argument. An example is the appending method concat in class Iterable. It takes an IterableOnce parameter, so you can append elements coming from either an iterator or a collection.

All operations on iterators are summarized below.

Operations in class Iterator

WHAT IT IS WHAT IT DOES
Abstract Methods:  
it.next() Returns next element on iterator and advances past it.
it.hasNext Returns true if it can return another element.
Variations:  
it.buffered A buffered iterator returning all elements of it.
it.grouped(size) An iterator that yields the elements returned by it in fixed-sized sequence “chunks”.
it.sliding(size) An iterator that yields the elements returned by it in sequences representing a sliding fixed-sized window.
Duplication:  
it.duplicate A pair of iterators that each independently return all elements of it.
Additions:  
it.concat(jt)
or it ++ jt
An iterator returning all elements returned by iterator it, followed by all elements returned by iterator jt.
it.padTo(len, x) The iterator that first returns all elements of it and then follows that by copies of x until length len elements are returned overall.
Maps:  
it.map(f) The iterator obtained from applying the function f to every element returned from it.
it.flatMap(f) The iterator obtained from applying the iterator-valued function f to every element in it and appending the results.
it.collect(f) The iterator obtained from applying the partial function f to every element in it for which it is defined and collecting the results.
Conversions:  
it.toArray Collects the elements returned by it in an array.
it.toList Collects the elements returned by it in a list.
it.toIterable Collects the elements returned by it in an iterable.
it.toSeq Collects the elements returned by it in a sequence.
it.toIndexedSeq Collects the elements returned by it in an indexed sequence.
it.toLazyList Collects the elements returned by it in a lazy list.
it.toSet Collects the elements returned by it in a set.
it.toMap Collects the key/value pairs returned by it in a map.
Copying:  
it.copyToArray(arr, s, n) Copies at most n elements returned by it to array arr starting at index s. The last two arguments are optional.
Size Info:  
it.isEmpty Test whether the iterator is empty (opposite of hasNext).
it.nonEmpty Test whether the collection contains elements (alias of hasNext).
it.size The number of elements returned by it. Note: it will be at its end after this operation!
it.length Same as it.size.
it.knownSize The number of elements, if this one is known without modifying the iterator’s state, otherwise -1.
Element Retrieval Index Search:  
it.find(p) An option containing the first element returned by it that satisfies p, or None is no element qualifies. Note: The iterator advances to after the element, or, if none is found, to the end.
it.indexOf(x) The index of the first element returned by it that equals x. Note: The iterator advances past the position of this element.
it.indexWhere(p) The index of the first element returned by it that satisfies p. Note: The iterator advances past the position of this element.
Subiterators:  
it.take(n) An iterator returning of the first n elements of it. Note: it will advance to the position after the n‘th element, or to its end, if it contains less than n elements.
it.drop(n) The iterator that starts with the (n+1)‘th element of it. Note: it will advance to the same position.
it.slice(m,n) The iterator that returns a slice of the elements returned from it, starting with the m‘th element and ending before the n‘th element.
it.takeWhile(p) An iterator returning elements from it as long as condition p is true.
it.dropWhile(p) An iterator skipping elements from it as long as condition p is true, and returning the remainder.
it.filter(p) An iterator returning all elements from it that satisfy the condition p.
it.withFilter(p) Same as it filter p. Needed so that iterators can be used in for-expressions.
it.filterNot(p) An iterator returning all elements from it that do not satisfy the condition p.
it.distinct An iterator returning the elements from it without duplicates.
Subdivisions:  
it.partition(p) Splits it into a pair of two iterators: one returning all elements from it that satisfy the predicate p, the other returning all elements from it that do not.
it.span(p) Splits it into a pair of two iterators: one returning all elements of the prefix of it that satisfy the predicate p, the other returning all remaining elements of it.
Element Conditions:  
it.forall(p) A boolean indicating whether the predicate p holds for all elements returned by it.
it.exists(p) A boolean indicating whether the predicate p holds for some element in it.
it.count(p) The number of elements in it that satisfy the predicate p.
Folds:  
it.foldLeft(z)(op) Apply binary operation op between successive elements returned by it, going left to right and starting with z.
it.foldRight(z)(op) Apply binary operation op between successive elements returned by it, going right to left and starting with z.
it.reduceLeft(op) Apply binary operation op between successive elements returned by non-empty iterator it, going left to right.
it.reduceRight(op) Apply binary operation op between successive elements returned by non-empty iterator it, going right to left.
Specific Folds:  
it.sum The sum of the numeric element values returned by iterator it.
it.product The product of the numeric element values returned by iterator it.
it.min The minimum of the ordered element values returned by iterator it.
it.max The maximum of the ordered element values returned by iterator it.
Zippers:  
it.zip(jt) An iterator of pairs of corresponding elements returned from iterators it and jt.
it.zipAll(jt, x, y) An iterator of pairs of corresponding elements returned from iterators it and jt, where the shorter iterator is extended to match the longer one by appending elements x or y.
it.zipWithIndex An iterator of pairs of elements returned from it with their indices.
Update:  
it.patch(i, jt, r) The iterator resulting from it by replacing r elements starting with i by the patch iterator jt.
Comparison:  
it.sameElements(jt) A test whether iterators it and jt return the same elements in the same order. Note: Using the iterators after this operation is undefined and subject to change.
Strings:  
it.addString(b, start, sep, end) Adds a string to StringBuilder b which shows all elements returned by it between separators sep enclosed in strings start and end. start, sep, end are all optional.
it.mkString(start, sep, end) Converts the collection to a string which shows all elements returned by it between separators sep enclosed in strings start and end. start, sep, end are all optional.

Laziness

Unlike operations directly on a concrete collection like List, operations on Iterator are lazy.

A lazy operation does not immediately compute all of its results. Instead, it computes the results as they are individually requested.

So the expression (1 to 10).iterator.map(println) would not print anything to the screen. The map method in this case doesn’t apply its argument function to the values in the range, it returns a new Iterator that will do this as each one is requested. Adding .toList to the end of that expression will actually print the elements.

A consequence of this is that a method like map or filter won’t necessarily apply its argument function to all the input elements. The expression (1 to 10).iterator.map(println).take(5).toList would only print the values 1 to 5, for instance, since those are only ones that will be requested from the Iterator returned by map.

This is one of the reasons why it’s important to only use pure functions as arguments to map, filter, fold and similar methods. Remember, a pure function has no side-effects, so one would not normally use println in a map. println is used to demonstrate laziness as it’s not normally visible with pure functions.

Laziness is still valuable, despite often not being visible, as it can prevent unneeded computations from happening, and can allow for working with infinite sequences, like so:

def zipWithIndex[A](i: Iterator[A]): Iterator[(Int, A)] =
  Iterator.from(0).zip(i)

Buffered iterators

Sometimes you want an iterator that can “look ahead”, so that you can inspect the next element to be returned without advancing past that element. Consider for instance, the task to skip leading empty strings from an iterator that returns a sequence of strings. You might be tempted to write the following

def skipEmptyWordsNOT(it: Iterator[String]) =
  while (it.next().isEmpty) {}
def skipEmptyWordsNOT(it: Iterator[String]) =
  while it.next().isEmpty do ()

But looking at this code more closely, it’s clear that this is wrong: The code will indeed skip leading empty strings, but it will also advance it past the first non-empty string!

The solution to this problem is to use a buffered iterator. Class BufferedIterator is a subclass of Iterator, which provides one extra method, head. Calling head on a buffered iterator will return its first element but will not advance the iterator. Using a buffered iterator, skipping empty words can be written as follows.

def skipEmptyWords(it: BufferedIterator[String]) =
  while (it.head.isEmpty) { it.next() }
def skipEmptyWords(it: BufferedIterator[String]) =
  while it.head.isEmpty do it.next()

Every iterator can be converted to a buffered iterator by calling its buffered method. Here’s an example:

scala> val it = Iterator(1, 2, 3, 4)
val it: Iterator[Int] = <iterator>

scala> val bit = it.buffered
val bit: scala.collection.BufferedIterator[Int] = <iterator>

scala> bit.head
val res10: Int = 1

scala> bit.next()
val res11: Int = 1

scala> bit.next()
val res12: Int = 2

scala> bit.headOption
val res13: Option[Int] = Some(3)

Note that calling head on the buffered iterator bit does not advance it. Therefore, the subsequent call bit.next() returns the same value as bit.head.

As usual, the underlying iterator must not be used directly and must be discarded.

The buffered iterator only buffers the next element when head is invoked. Other derived iterators, such as those produced by duplicate and partition, may buffer arbitrary subsequences of the underlying iterator. But iterators can be efficiently joined by adding them together with ++:

scala> def collapse(it: Iterator[Int]) = if (!it.hasNext) Iterator.empty else {
      |  var head = it.next
      |  val rest = if (head == 0) it.dropWhile(_ == 0) else it
      |  Iterator.single(head) ++ rest
      |}
def collapse(it: Iterator[Int]): Iterator[Int]

scala> def collapse(it: Iterator[Int]) = {
      |  val (zeros, rest) = it.span(_ == 0)
      |  zeros.take(1) ++ rest
      |}
def collapse(it: Iterator[Int]): Iterator[Int]

scala> collapse(Iterator(0, 0, 0, 1, 2, 3, 4)).toList
val res14: List[Int] = List(0, 1, 2, 3, 4)
scala> def collapse(it: Iterator[Int]) = if !it.hasNext then Iterator.empty else
      |  var head = it.next
      |  val rest = if head == 0 then it.dropWhile(_ == 0) else it
      |  Iterator.single(head) ++ rest
      |
def collapse(it: Iterator[Int]): Iterator[Int]

scala> def collapse(it: Iterator[Int]) =
      |  val (zeros, rest) = it.span(_ == 0)
      |  zeros.take(1) ++ rest
      |
def collapse(it: Iterator[Int]): Iterator[Int]

scala> collapse(Iterator(0, 0, 0, 1, 2, 3, 4)).toList
val res14: List[Int] = List(0, 1, 2, 3, 4)

In the second version of collapse, the unconsumed zeros are buffered internally. In the first version, any leading zeros are dropped and the desired result constructed as a concatenated iterator, which simply calls its two constituent iterators in turn.

Contributors to this page: