ECE565 Lecture Notes: Chapter 4
© 2009 by Vijaykumar

Parallel Processing and Multiprocessors
- why parallel processing?
- types of parallel processors
- cache coherence
- synchronization
- memory ordering

Why Parallel Processing
- go past the physical limits of uniprocessing (speed of light)
- pros: performance, power, cost-effectiveness (commodity parts), fault tolerance
- cons: applications are difficult to parallelize
  - automatic parallelization by compiler is hard in general cases
  - parallel program development is hard
  - IT IS THE SOFTWARE, stupid!

Amdahl's Law
- speedup = 1 / (frac_enhanced / speedup_enhanced + (1 - frac_enhanced))
- a speedup of 80 with 100 processors requires frac_parallel = 0.9975,
  i.e., only 0.25% of the work can be serial
- may help: problems where the parallel part scales faster than the serial
  part, e.g., O(n^2) parallel work vs. O(n) serial work
- challenge: long latencies (often several microseconds), so data locality
  must be achieved in some fashion

Application Domains
- parallel processing: true parallelism in one job; data may be tightly shared
- OS: a large parallel program that runs a lot of the time
  - typically hand-crafted and fine-tuned
  - data more loosely shared
  - typically locked data structures at differing granularities
- transaction processing: parallelism among independent transactions;
  throughput-oriented parallelism

Types
- Flynn taxonomy (1966): not all-encompassing, but simple
- based on the number of instruction streams and data streams:
  - SISD: uniprocessor
  - SIMD: vector-like
  - MISD: few practical examples
  - MIMD: multiprocessors; most common, very flexible

Single Instruction Single Data (SISD)
- your basic uniprocessor
- [figure: an instruction unit and an execution unit fed by instruction
  storage and operand storage; a single instruction stream and a single
  data stream]

Single Instruction Multiple Data (SIMD)
- [figure: a control processor broadcasts one instruction stream to many
  ALUs, each with its own registers, flag, and memory, connected by an
  interconnect/alignment network]
- vectors are the same as SIMD: deeply pipelined FUs instead of the
  multiple FUs in the figure above
- instructions and data are usually kept separate
- leads to the data-parallel programming model
- works best for very regular, loop-oriented problems; many important
  classes, e.g., graphics
- not for commercial databases or middleware (80% of server codes)
- automatic parallelization can work

Multiple Instruction Multiple Data (MIMD)
- most flexible and of most interest to us
- has become the general-purpose computer
- automatic parallelization is more difficult

Perfection: PRAM
- parallel RAM: a theoretical model
- fully shared memory with unit latency
- no contention, no need to exploit locality
- [figure: processors connected to main memory through an interconnection
  network, with unit latency and no contention anywhere]

Perfection not achievable
- latencies grow as the system size grows
- bandwidths are restricted by the memory organization and the interconnect
- dealing with reality leads to a division between UMA and NUMA

UMA: uniform memory access
- [figure: processors reach main memory through an interconnection network;
  latency is long, with contention in the network and in the memory banks]
- latencies are the same for all processors, but may be high
- data placement is unimportant
- latency gets worse as the system grows => scalability issues
- typically used in small MPs only
- contention restricts bandwidth
- caches are often allowed in UMA systems

Caches
- another way of tackling latency/bandwidth problems
- hold recently used data
- BUT: cache coherence problems

NUMA: non-uniform memory access
- [figure: each processor has a local memory with short latency; remote
  memories are reached through an interconnection network with long
  latency and contention]
- latency is low to local memory
- latency is much higher to remote memories
- performance is very sensitive to data placement
- bandwidth to local memory may be higher
- contention in the network and for the memories