class: center, middle, title-slide-cs49365

## CSci 493.65 Parallel Computing
## Chapter 1: Introduction

.author[
Stewart Weiss
]
.license[
Copyright 2021-24 Stewart Weiss. Unless noted otherwise, all content is released under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/). Background image: George Washington Bridge at Dusk, by Stewart Weiss.
]

---
name: cc-notice
template: default
layout: true

.bottom-left[© Stewart Weiss. CC-BY-SA.]

---
name: tinted-slide
template: cc-notice
layout: true
class: tinted

---
name: objectives
### Objectives of This Course

This course has two primary goals:

- To introduce you to the major concepts and ideas of parallel computing, and
- To give you the basic ability to write simple parallel programs using MPI, OpenMP, and Pthreads.

--

So it has both theoretical and practical objectives. But we begin with the basics: what is parallel computing, and why bother with it?

---
name: intro
### Introduction

- People invented computers to solve problems faster than they could be solved by hand.

--

- Computers have gotten faster and faster, as predicted by Gordon Moore (1965, Moore's Law).

--

- But individual processors are not capable of solving the most significant computational problems, nor will they ever be, because of the problems' inherent computational complexity.

--

- Parallelism in computers came into existence in the 1970s.

---
name: parallelcomputing
### What is Parallel Computing?

.redbold[Parallel computing] is the use of multiple processors or computers working together on a common task.

- Each processor works on its section of the problem or data.
- Processors can exchange information.

---
### Sequential Versus Parallel Computing

- An ordinary sequential program is designed for .redbold[sequential] (serial) computation:

--

  - It runs on a single computer with a single processor.

--

  - A problem is decomposed into a sequence of discrete instructions.
  - Instructions are executed one after another.
  - Only one instruction can be executed at a time.

--

- A parallel program is designed for .redbold[parallel] computation:

--

  - It runs on multiple processors.
  - A problem is decomposed into discrete components that can be solved concurrently.
  - Each component is decomposed into a sequence of instructions.
  - Instructions from each component execute simultaneously on different processors.

---
### Why Do We Need It?

A single CPU has limits on

- performance (time and speed)
- available memory (problem size)

--

With parallel computing

- we can solve problems that are too large to fit on a single CPU
- and/or solve problems that can't be solved in a reasonable amount of time

--

We can

- solve larger problems (like more nodes in the Traveling Salesman Problem),
- solve the same problem in less time,
- solve problems with greater accuracy (like more accurate prediction or estimation), or
- solve many more cases of the same problem (different "what if" scenarios).

---
### Hard Problems

Some problems cannot be solved on ordinary computers; they need supercomputers, those with many, many, many processors in them.
Just a few examples:

- modeling computational fluid dynamics and turbulence
- materials design: finding new superconductors, immunological agents
- genome sequencing, genetic engineering, protein folding, enzyme activity, and cell modeling
- natural language understanding, automated reasoning
- forecasting severe weather events
- predicting global warming

---
### Definitions

- .redbold[Serial or sequential code] allows only a single thread of execution to work on a single data item at any one time.
- .redbold[Parallel code] can have multiple computations in progress at any instant of time. These could be
  - a single thread of execution operating on multiple data items simultaneously,
  - multiple serial threads of execution in a single executable,
  - multiple executables (think separate programs) all working on the same problem, or
  - any combination of the above.

---
### Definitions

- A .redbold[task] is the name we use for a single instance of an executable. Each task has its own virtual address space and may have multiple threads.
- A .redbold[parallel computer] is a computer containing more than one processor. Parallel computers can be categorized as either multicomputers or multiprocessors.
- A .redbold[multicomputer] is a computer that contains two or more computers connected by an interconnection network.
- A .redbold[centralized multiprocessor], also known as a symmetrical multiprocessor (SMP), is a computer in which the processors share access to a single, global memory.
- A .redbold[multi-core processor] is a particular type of multiprocessor in which the individual processors (called ".redbold[cores]") are in a single integrated circuit.

---
### Definitions

- A .redbold[node] is a discrete unit of a computer system that typically runs its own instance of the operating system.
- A .redbold[cluster] is a collection of machines or nodes that function in some way as a single resource.
- .redbold[Parallel programming] is programming in a language that allows one to explicitly indicate how the different parts of the computation can be executed concurrently by different processors.

---
### Data Dependence Graphs

A .redbold[data dependence graph] is a directed graph G = <V, E> in which each vertex represents a task to be completed, and an edge from vertex s to vertex t exists if and only if task s must be completed before task t can be started. When a task t cannot be started until s completes, we say t .redbold[depends] on s.

Example
.center[
]

So the task "execute" depends on the task "parse", but "parse" and "cleanup" do not depend on each other.
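---
### Data Dependence Graphs

The dependence relation is easy to represent in code. Here is a minimal sketch (not from the original notes; the task names and the `may_run_in_parallel` helper are made up for illustration) that encodes the example above as a matrix, where `depends_on[t][s]` is true exactly when task t depends on task s:

```C
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical tasks from the example above. */
enum { PARSE, EXECUTE, CLEANUP, NUM_TASKS };
static const char *name[NUM_TASKS] = { "parse", "execute", "cleanup" };

/* depends_on[t][s] is true iff task t cannot start until task s completes. */
static const bool depends_on[NUM_TASKS][NUM_TASKS] = {
    /*              parse  execute cleanup */
    /* parse   */ { false, false,  false },
    /* execute */ { true,  false,  false },  /* execute depends on parse   */
    /* cleanup */ { false, false,  false },  /* cleanup depends on nothing */
};

/* Two tasks may run in parallel if neither depends on the other.
   (A complete test would also follow transitive dependences.)     */
static bool may_run_in_parallel(int a, int b)
{
    return !depends_on[a][b] && !depends_on[b][a];
}

int main(void)
{
    printf("%s and %s in parallel? %s\n", name[PARSE], name[CLEANUP],
           may_run_in_parallel(PARSE, CLEANUP) ? "yes" : "no");
    printf("%s and %s in parallel? %s\n", name[PARSE], name[EXECUTE],
           may_run_in_parallel(PARSE, EXECUTE) ? "yes" : "no");
    return 0;
}
```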
---
### Data Dependence Graphs and Parallelism

When tasks do not depend on each other, they can be run in parallel. Data dependence graphs show the parallelism possible in a problem: the longest path through the graph is the longest sequence of tasks that must execute one after another, so it shows the limits of parallelization.

.center[
]

Here the path 7, 11, 10 has length 3, as do several other paths. But tasks 7, 5, and 3 can run in parallel, as can 11 and 8.
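---
### Data Dependence Graphs and Parallelism

To make the "longest path" idea concrete, here is a small sketch (mine, not from the notes) that computes the length of the longest chain of dependent tasks in a dependence graph whose tasks are numbered in topological order. The five-task graph below is a made-up example, not the one in the figure:

```C
#include <stdio.h>

#define NUM_TASKS 5

/* A made-up dependence graph, numbered so that every edge goes from a
   lower-numbered task to a higher-numbered one (topological order).
   dep[t][s] == 1 means task t depends on task s.                      */
static const int dep[NUM_TASKS][NUM_TASKS] = {
    {0,0,0,0,0},   /* task 0: no dependences          */
    {1,0,0,0,0},   /* task 1 depends on task 0        */
    {1,0,0,0,0},   /* task 2 depends on task 0        */
    {0,1,1,0,0},   /* task 3 depends on tasks 1 and 2 */
    {0,0,0,1,0},   /* task 4 depends on task 3        */
};

int main(void)
{
    int chain[NUM_TASKS]; /* longest chain of dependent tasks ending at t */
    int longest = 0;

    for (int t = 0; t < NUM_TASKS; t++) {
        chain[t] = 1;                      /* the task itself */
        for (int s = 0; s < t; s++)
            if (dep[t][s] && chain[s] + 1 > chain[t])
                chain[t] = chain[s] + 1;
        if (chain[t] > longest)
            longest = chain[t];
    }
    /* No schedule, no matter how many processors it uses, can finish
       in fewer than `longest` task steps.                             */
    printf("critical path length = %d tasks\n", longest);
    return 0;
}
```

No matter how many processors are available, this graph needs at least 4 task steps, because tasks 0, 1, 3, 4 must run one after another.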
---
### 1.5.2 Data Parallelism

A data dependence graph exhibits .redbold[data parallelism] when it contains instances of a task that applies the same sequence of operations to different elements of a data set. It can be

- fine-grained, as in

  ```C
  for ( i = 0; i < 4000; i++ )
      A[i]++;
  ```

  which increments each element of an array named `A`,

- or coarse-grained, as in

  ```C
  sort(A, 0, 999);
  sort(A, 1000, 1999);
  sort(A, 2000, 2999);
  sort(A, 3000, 3999);
  ```

  which applies a function named `sort` to multiple ranges of an array named `A`.
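---
### Data Parallelism

As a preview of OpenMP (covered later in the course), here is one way the fine-grained example above could be expressed in parallel. This is just a sketch: the `#pragma omp parallel for` directive asks the compiler to divide the independent loop iterations among threads. It can be compiled with `gcc -fopenmp`.

```C
#include <stdio.h>

#define N 4000

int A[N];

int main(void)
{
    /* Fine-grained data parallelism: the N increments are independent,
       so the iterations can be divided among the threads in any way.   */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        A[i]++;

    printf("A[0] = %d, A[N-1] = %d\n", A[0], A[N-1]);
    return 0;
}
```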
---
### Functional Parallelism

A data dependence graph exhibits .redbold[functional parallelism] if it contains independent tasks that apply different operations to possibly different data elements.

.center[
]

In this dependence graph, the compiler has various tasks that could execute in parallel, so it exhibits functional parallelism.

---
### Data versus Functional Parallelism

- Partitioning the problem by task (i.e., functional parallelism):
  - Each process performs a different "function" or executes a different code section.
  - First identify the functions, then look at their data requirements.
  - Commonly programmed with message-passing libraries.
- Partitioning the problem by data (i.e., data parallelism):
  - Each process does the same work on a unique piece of data.
  - First divide the data. Each process then becomes responsible for whatever work is needed to process its data.
  - Data placement is an essential part of a data-parallel algorithm.
  - Usually more scalable than functional parallelism (it can handle larger and larger problem sizes).

The next slide sketches the task-partitioning (functional) approach in code.
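---
### Functional Parallelism

As a preview of Pthreads (also covered later in the course), here is a minimal sketch of the task-partitioning approach: two threads apply different operations (a sum and a product) to different data at the same time. The function and variable names are made up for illustration. It can be compiled with `gcc -pthread`.

```C
#include <stdio.h>
#include <pthread.h>

#define N 1000

/* Two unrelated pieces of work: different operations on different data. */
static double a[N], b[N];
static double sum, product;

static void *compute_sum(void *arg)
{
    (void)arg;                     /* unused */
    sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += a[i];
    return NULL;
}

static void *compute_product(void *arg)
{
    (void)arg;                     /* unused */
    product = 1.0;
    for (int i = 0; i < N; i++)
        product *= b[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 1.0; }

    pthread_t t1, t2;
    /* The two tasks do not depend on each other, so they may run
       at the same time on different processors.                   */
    pthread_create(&t1, NULL, compute_sum, NULL);
    pthread_create(&t2, NULL, compute_product, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("sum = %g, product = %g\n", sum, product);
    return 0;
}
```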
---
### Pipelining

Pipelining is a form of partial parallelism. When a sequence of instructions is executed many times with different data, it can be staged like an assembly line. Code like this:

```C
sum = 0;
for ( i = 0; i < 4; i++ )
    sum = sum + a[i];
```

lends itself to pipelining. The loop is unrolled into four instructions:

```
sum0 = a[0];
sum1 = sum0 + a[1];
sum2 = sum1 + a[2];
sum3 = sum2 + a[3];
```

which become like stages on an assembly line:

.center[
]

---
### Paths Towards Parallel Programming

How do we introduce parallel computation into sequential programming languages, or get programmers to write parallel code when their languages are sequential?

- Modifying compilers so that they detect parallelism and generate parallel code.
  - Compilers that do this are called .redbold[parallelizing compilers].
  - Typically 90% of execution time is spent in 10% of the code, so finding and parallelizing that code saves a lot of time.
  - Downsides: the programmer does not help much and can actually hinder the compiler, and the analysis is difficult when the code makes heavy use of pointers.

---
### Paths Towards Parallel Programming

- Providing libraries and/or preprocessors that extend the features of a sequential language to allow parallel computation.
  - MPI is an example of such a library. Programmers write code that uses the library, which is linked into the program. (A small example appears at the end of this chapter.)
  - OpenMP is an example of a modification of the compiler, with new preprocessor directives as well as libraries that allow the programmer to specify parallel code.

---
### Paths Towards Parallel Programming

- Creating new parallel languages.
  - Many new languages have been written to support some type of parallelism. They number in the dozens, perhaps more than a hundred.
  - Many are not useful for all purposes, and most are not widely supported across multiple hardware platforms.
  - Some popular ones: Clojure, Haskell, Parlog, C#, C*, Java, Python, Fortran M, Occam.
  - Wikipedia has a long list of them.
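---
### Paths Towards Parallel Programming

To give a flavor of the library approach mentioned above, here is the classic minimal MPI program (a preview; MPI is covered in detail later in the course). The same executable is started as several processes, and each one learns its rank by calling the library. It is compiled with the `mpicc` wrapper and launched with, for example, `mpirun -np 4 ./hello`.

```C
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime         */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I?           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes are there? */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut the MPI runtime down     */
    return 0;
}
```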