Raffi Khatchadourian (based on “Scala for the Impatient” 2nd Edition by Cay Horstmann)
October 20, 2020
Iterable
sAn Iterable
is any collection that can yield an Iterator
, which allows you to systematically access each element of the collection (see the Iterator Pattern).
Seq
uencesSeq
is an ordered sequence of values like an array or a List
.IndexedSeq
allows fast random access through an integer index.An ArrayBuffer
is an IndexedSeq
but a LinkedList
is not.
Set
sSet
is an unordered collection of values.SortedSet
maintains an sorted visitation order.
Map
sMap
is a set of (key, value) pairs.SortedMap
maintains a sorted visitation order by the keys.Each Scala collection trait or class has a companion object with an apply
method that constructs instances of the collection.
toSeq
, toSet
, toMap
, etc. to convert between collection types.to[C]
for type C
(the target collection type).scala.collection.mutable.Map
scala.collection.immutable.Map
The preference is to immutability.
Compute the set of all digits of an integer:
Vector
is the immutable equivalent of an ArrayBuffer
.Range
is an integer sequence, e.g., 1 to 10
or 10.to(30, 10)
.Nil
(empty) or an object with a head
and a tail
.tail
is also a List
.car
and cdr
operations in Lisp.val lst = List(4, 2) // : List[Int] = List(4, 2)
lst.head // : Int = 4
lst.tail // : List[Int] = List(2)
lst.tail.head // : Int = 2
lst.tail.tail // : List[Int] = List()
List
sYou can use ::
to create a List
with a given head
and tail
:
List(9, 4, 2)
.cons
operator in Lisp for constructing lists.List
sNatural fit for recursion:
def sum(lst: List[Int]): Int =
if (lst == Nil) 0 else lst.head + sum(lst.tail)
sum(List(9, 4, 2)) // : Int = 15
Use pattern matching:
def sum(lst: List[Int]): Int = lst match {
case Nil => 0
case h :: t => h + sum(t) // h is lst.head, t is lst.tail
}
sum(List(9, 4, 2)) // : Int = 15
This is just for demonstration purposes. Should really just use the built-in method:
Set
sSet
is an unordered collection of unique elements.Set
that already exists in the Set
has no effect.Set(2, 0, 1) + 1 == Set(2, 0, 1) // : Boolean = true
Set(2, 0, 1) + 4 == Set(2, 0, 1, 4) // : Boolean = true
LinkedHashSet
sElements are traversed in the order for which they were inserted:
val weekdays = scala.collection.mutable.LinkedHashSet("Mo", "Tu", "We", "Th", "Fr")
weekdays.mkString(", ") // : String = Mo, Tu, We, Th, F
SortedSet
sElements are traversed in the sorted order:
Set
Operations+
is used for adding elements to unordered collections.+:
and :+
are for pre-pending and appending elements to ordered collections, respectively.Vector(1, 2, 3) :+ 5 // : Vector[Int] = Vector(1, 2, 3, 5)
1 +: Vector(1, 2, 3) // : Vector[Int] = Vector(1, 1, 2, 3)
val numbers = ArrayBuffer(1, 2, 3) // : ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
numbers += 5 // : ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 5)
map
applies a function to each element of a collection and results in another collection containing the mapped elements.Same as:
However, the map
version is preferred as it is more easily parallelizable (more on this later.)
If a function returns a collection, you can “flatten” each of these into a single result using flatMap
.
Consider the following function that returns a collection:
Now, let’s map
each of the names to the value produced by this function:
names.map(ulcase) // : List[Vector[String]] = Vector("PETER", "peter"), Vector("PAUL", "paul"), Vector("MARY", "mary")
Suppose we just want a List
of names rather than a List
of Vector
s of names:
transform
instead of map
.collect
works with partial functions, i.e., those that may not be defined for all inputs.To apply a function where the return value is not important (e.g., it returns Unit
), use foreach
:
Outputs:
Peter
Paul
Mary
c.reduceLeft(op)
applies op
to successive elements.reduceRight
Does the same but starts at the end of the collection:
Expands to: 1 - (7 - (2 - 9)) = 1 - 7 + 2 - 9 = -13
coll.foldLeft(init)(op)
.QUESTION: What kind of method is this?
QUESTION: Why use this kind of method?
Count the frequencies of letters in a String
.
Map
:val freq = scala.collection.mutable.Map[Char, Int]()
for (c <- "Mississippi") freq(c) = freq.getOrElse(c, 0) + 1
After this code, freq
becomes:
collection.mutable.Map[Char,Int] = Map(M -> 1, s -> 4, p -> 2, i -> 4)
op
is the partially filled map.op
is the new letter.Map[Char, Int]()
is the initial, empty map.scanLeft
and scanRight
combine folding and mapping.Suppose we want to combine the following two collections:
Specifically, we want a list of pairs of corresponding elements from each list: (5.0, 10), (20.0, 2), ...
.
Apply a function to each pair:
Total price of all items:
zipAll
:
List(5.0, 20.0, 9.95).zipAll(List(10, 2), 0.0, 1) // : List[(Double, Int)] = List((5.0,10), (20.0,2), (9.95,1))
zipWithIndex
:"Scala".zipWithIndex // : collection.immutable.IndexedSeq[(Char, Int)] = Vector((S,0), (c,1), (a,2), (l,3), (a,4))
iterator
method.Source.fromFile
returns an iterator
to avoid reading an entire file into memory.
Iterator
Iterator
Or:
Vector(1, 2, 3)
Vector(2, 3, 4)
Vector(3, 4, 5)
Vector(4, 5, 6)
Vector(5, 6, 7)
Vector(6, 7, 8)
Vector(7, 8, 9)
Vector(8, 9, 10)
Obtaining the number of elements “exhausts” the iterator:
Iterator
to a Collection
Iterator
Iterator
without consuming it.LazyList
sStream
s in Scala version 2.12.x.
LazyList
is fully lazy.Stream
, in Scala 2.12.x, was only tail lazy.Iterator
s are “lazy” alternatives to Collection
s.
Collection
s, useful for large or computationally intensive collections to construct.Iterator
s are fragile.
next
(and even length
) change the iterator.LazyList
implements an immutable linked list.
LazyList
s compute their elements on-demand, they can be infinite collections!
count
, sum
, max
or min
) will not terminate.LazyList
Examples#::
operator is similar to ::
for List
s.tail
is unevaluated:head
to compute values:LazyList
sLazyList
s are deferred:val squares = numsFrom(1).map(x => x * x)
// : scala.collection.immutable.LazyList[scala.math.BigInt] = LazyList(<not computed>)
head
to force the evaluation of the next element:To get multiple elements, you need to invoke operations that force the “realization” of the list.
Outputs:
1
4
9
16
25
Alternatively, squares.take(5).force
results in LazyList(1, 4, 9, 16, 25)
.
LazyList
s from Iterator
sLazyList
from an Iterator
.Source.getLines
returns Iterator[String]
, which allows you to only visit each line once.LazyList
s, on the other hand, elements are memoized; that is, the value of each element is computed at most once.
import scala.io.Source
val words = Source.fromFile("/usr/share/dict/words").getLines().to(LazyList)
words // : scala.collection.immutable.LazyList[String] = LazyList(<not computed>)
words.head // : String = A
words(5) // : String = AOL's
words // : scala.collection.immutable.LazyList[String] = LazyList(A, A's, AMD, AMD's, AOL, AOL's, <not computed>)
List
s, other kinds of Collection
s can also be made “lazy” using the view
method.view
method returns a Collection
whose methods’ execution is deferred.import scala.math._
val palindromicSquares = (1 to 1000000).view
.map(x => x * x)
.filter(x => x.toString == x.toString.reverse) // : scala.collection.View[Int] = View(<not computed>)
LazyList
s, this View
also is not yet computed.LazyList
s, the results are not memoized.
LazyList
s, the force
method will force the computation of elements.apply
method forces evaluation of the entire collection.Collection
.Collection
.The following code will set the elements in the given slice to 0
but leave the remaining elements unchanged:
To sum a large collection coll
in parallel, simply write coll.par.sum
.
par
Methodpar
method of Collection
results a parallel implementation of the Collection.
Collection
implementation will execute parallel versions of its operations.Count the even numbers in coll
by evaluating the predicate on sub-collections in parallel and combine the results:
You can even parallelize for
loops using .par
:
This prints the numbers in the order that the threads working on the task process them:
25 26 27 28 29 30 37 12 13 14 15 16 17 43 44 45 46 47 48 49 41 42 6 7 8 9 10 11 3 4 5 1 2 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 62 63 64 65 66 67 68 69 70 71 72 73 74 56 57 58 59 60 61 53 54 55 51 52 50 0 40 18 19 20 21 38 39 31 22 23 24 32 33 34 35 36
Compare this to the output of the sequential version, (0 until 100).foreach(i => print(s"$i "))
:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
You should never try to mutate shared (mutable) variables in parallel operations as the results are unpredictable.
In one run, count
may be 186119. But, in another run of the same code, count
may be 337736.
build.sbt
:scalaVersion := "2.13.3"
libraryDependencies += "org.scala-lang.modules" %% "scala-parallel-collections" % "0.2.0"
.par
methods, as well as the parallel collection hierarchy..par
have types include the ParSeq
, ParSet
, and ParMap
traits.Seq
, Set
, or Map
.You can convert a parallel collection to a sequential one using seq
:
As previously mentioned, not all operations are parallelizable, such as those maintaining data ordering.
Examples include reduceLeft
and reduceRight
.
Alternatively, you may use reduce
, which operates on portions of the collection, combining the results.
However, the given operation must be associative, i.e., fulfilling (a op b) op c = a op (b op c).
Folding has a similar problem, and there is a fold
method for that.
Unfortunately, it is not as flexible as both arguments must be elements of the collection.
Produces:
val str: String = 12345678910111213141516171819202122232425262728...
aggregate
to solve the problem of complex parallel folding.is equivalent to:
str
.aggregate
is deprecated for sequential collections.Write a function that receives a collection of strings and a map from strings to integers. Return a collection of integers that are values of the map corresponding to one of the strings in the collection. For example, given Array("Tom", "Fred", "Harry")
and Map("Tom" -> 3, "Dick" -> 4, "Harry" -> 5)
, return Array(3, 5)
. Hint: Use flatMap
to combine the Option
values returned by get
.
The method java.util.TimeZone.getAvailableIDs
yields time zones such as Africa/Cairo
and Asia/Chungking
. Which continent has the most time zones? Hint: groupBy
.
Harry Hacker reads a file into a String
and wants to use a parallel collection to update the letter frequencies concurrently on portions of the String
. He uses the following code:
val frequencies = new scala.collection.mutable.HashMap[Char, Int]
for (c <- str.par) frequencies(c) = frequencies.getOrElse(c, 0) + 1
Why is this a terrible idea? How can he really parallelize the computation? (Hint: Use aggregate
.)