Tree-based Launch in Open MPI


I've mentioned it before: the run-time systems of MPI implementations are frequently unsung heroes.

A lot of blood, sweat, tears, and innovation go into parallel run-time systems, particularly those that can scale to very large systems. But they're not discussed often, mainly because they're not as sexy as ultra-low latency numbers or other popular MPI benchmarks.

Here's one cool thing that we added to the runtime in Open MPI a few years ago, and have continued to improve on over the years (including pretty pictures!).

In the 1990s, when clusters of Linux servers were a new concept, the only way to launch MPI processes on remote servers was via ssh (rsh was used for a while, but it eventually mostly died out).

While job schedulers and cluster resource managers tend to offer fast MPI/parallel job startup these days, there are a surprising number of users who still use ssh-backed job startup mechanisms. There are a number of valid and good reasons for this, but we'll explore that another time.

Let's take a step back and look at what a job launcher does.

Conceptually, parallel job launchers are simple: loop over the target processes and start each one on its target machine. Keeping with the ssh theme here, the figure below shows this model using individual ssh connections:

An obvious optimization, one that Open MPI has done since its inception, is to connect to each target machine only once and then launch all the desired target processes from that single connection:

(NOTE: the above figure is a bit simplified: mpirun actually launches a proxy daemon on each node; the daemon then forks each of the target MPI processes).
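To make the difference between the two models concrete, here is a small Python sketch. It does not run ssh at all; it only counts connections under the simplified "one daemon per host" model described in the note above. The placement list, the host names `node1` through `node3`, and the `launch_plan` helper are all hypothetical and are not part of Open MPI.

```python
from collections import defaultdict


def launch_plan(placement: list[str]) -> dict[str, list[int]]:
    """Group MPI ranks by the host they are placed on (hypothetical helper).

    placement[rank] is the host that rank runs on. Each key of the result
    stands for one ssh connection, which would start one daemon on that host;
    each value lists the ranks that daemon would fork locally.
    """
    plan: dict[str, list[int]] = defaultdict(list)
    for rank, host in enumerate(placement):
        plan[host].append(rank)
    return dict(plan)


if __name__ == "__main__":
    # Hypothetical placement: 8 ranks spread over 3 hosts.
    placement = ["node1", "node1", "node2", "node2",
                 "node3", "node3", "node3", "node3"]
    plan = launch_plan(placement)
    print("ssh connections, one per process:", len(placement))  # 8
    print("ssh connections, one per host:", len(plan))          # 3
    for host, ranks in plan.items():
        print(f"{host}: daemon forks ranks {ranks}")
```

In this toy placement, the per-host model needs 3 ssh connections instead of 8, and the daemon on `node3` forks ranks 4 through 7 locally. The savings grow with the number of processes per server.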

As your parallel application grows to more servers, such a serial launch mechanism becomes an obvious bottleneck: the total launch time grows linearly with the number of servers.

It therefore makes sense to parallelize the launcher by using a tree-based launch structure. Have the job initiator (shown as "mpirun" in each figure) be the root of a tree. Each server that gets launched can, in turn, launch further servers. The inherent parallelization speeds up the overall launch from O(N) to O(log N):

Schweet!
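The O(N) versus O(log N) claim can be checked with a back-of-the-envelope model. The Python sketch below assumes that every launch takes one unit of time and that, in the tree case, every process already running (mpirun included) starts exactly one new server per step, so the number of running servers roughly doubles each step. Real launch times also depend on ssh latency, daemon startup cost, and the exact tree shape, all of which this model ignores. The function names are illustrative only.

```python
def linear_launch_steps(n_servers: int) -> int:
    """mpirun opens one ssh connection after another: one step per server."""
    return n_servers


def tree_launch_steps(n_servers: int) -> int:
    """Count steps when every running node launches one new server per step."""
    launched = 0  # servers that already have a daemon running
    steps = 0
    while launched < n_servers:
        launchers = launched + 1  # every launched server, plus mpirun itself
        launched = min(n_servers, launched + launchers)
        steps += 1
    return steps


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(f"{n:5d} servers: linear={linear_launch_steps(n):5d} "
              f"tree={tree_launch_steps(n)}")
```

After k steps at most 2^k - 1 servers are running, so under these assumptions the tree needs ceil(log2(N + 1)) steps: 11 steps for 1024 servers, versus 1024 steps for the linear loop.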

Open MPI debuted a tree-based ssh launcher back in the v1.3 series (circa 2009). The first-generation tree-based launcher used a binomial tree. This shape effectively amortized the high cost of creating ssh connections.
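For readers who have not met a binomial tree before, here is a textbook formulation in Python. Node 0 plays the role of mpirun and nodes 1 through size - 1 are the servers. A node's parent is found by clearing its lowest set bit, and its children are found by setting, one at a time, each bit below that lowest set bit. This is a generic sketch for illustration; it is not Open MPI's actual launch or routing code, and the function names are hypothetical.

```python
from typing import Optional


def binomial_children(rank: int, size: int) -> list[int]:
    """Children of `rank` in a binomial tree of `size` nodes rooted at 0."""
    children = []
    mask = 1
    while mask < size:
        if rank & mask:
            break  # reached the lowest set bit of rank: no more children
        child = rank | mask  # same as rank + mask, since this bit is clear
        if child < size:
            children.append(child)
        mask <<= 1
    return children


def binomial_parent(rank: int) -> Optional[int]:
    """Parent of `rank`: clear its lowest set bit (the root has no parent)."""
    return None if rank == 0 else rank & (rank - 1)


if __name__ == "__main__":
    size = 8  # node 0 (mpirun) plus 7 servers
    for rank in range(size):
        print(rank, "launches", binomial_children(rank, size))
```

With 8 nodes, node 0 starts nodes 1, 2, and 4; node 2 starts node 3; node 4 starts nodes 5 and 6; and node 6 starts node 7. No node is more than three hops from the root, and the expensive ssh connection setups are spread across the tree instead of all being made by mpirun.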

Note that the tree-based ssh structure requires setting up password-less/passphrase-less ssh logins between each pair of servers in the HPC cluster. If you use the same ssh keys on every server, this is trivial to set up. If you use different ssh keys on each server, it's a little more work.

That being said, Open MPI allows users to disable the tree-based launch and use the linear ssh launcher, if desired.

This blog entry is getting a bit long, so stay tuned: I'll describe a few more fun things about the Open MPI ssh tree-based launching system in the next entry...
