
Distributed Computing with Python

Distributed Computing

The term distributed computing refers to multiple software components, running on multiple computers, that operate together as a single system. These computers are linked over a network to form a cluster. The cluster represents a distributed system that shares data and coordinates processing power.

Benefits of Distributed Computing

There are various advantages of using distributed computing. Some of these advantages are the following:

  • Scalability
  • Performance
  • Resilience
  • Cost-efficiency

Python Tools for Computing

There are various tools in Python that provide low-level primitives for sending and receiving messages. Some of these tools are the following:

  1. Open MPI (message-passing library, used from Python via bindings such as mpi4py)
  2. Python multiprocessing (process-based parallelism in the standard library)
  3. ZeroMQ (universal message passing library)
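As a point of comparison for the tools listed above, Python's built-in multiprocessing module can spread work across the cores of a single machine. The following is a minimal sketch with a placeholder worker function (`square` is illustrative, not part of any library):

```python
from multiprocessing import Pool

def square(n):
    # Placeholder CPU-bound task
    return n * n

if __name__ == '__main__':
    # Map the inputs over a pool of worker processes
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Unlike the distributed systems discussed below, this approach is confined to a single machine.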

Similarly, there are numerous domain-specific tools in Python. Some of these tools are the following:

  1. TensorFlow (machine-learning framework for model training)
  2. Spark (unified engine for data processing)
  3. Flink (stateful computations for stream processing)

However, all of these tools are either low-level or domain-specific, which limits their usefulness for general-purpose distributed computing.

Ray in Python

Ray is an open-source project that supports parallel and distributed computing in Python. It provides a simple, universal, and user-friendly API for building any distributed application. Ray includes several important libraries, such as RLlib, a scalable reinforcement-learning library, and Tune, a scalable hyperparameter-tuning library. Ray provides various benefits over Python's traditional multiprocessing module, which cannot scale beyond a single machine to cope with the requirements of modern applications.

Advantages of Using Ray

There are many benefits of using the Ray framework over the traditional multiprocessing module. Some of these benefits are the following:

  • It runs the same code on multiple computers.
  • It supports microservices and actors that hold state and communicate with each other.
  • It handles machine failures gracefully.
  • It handles large objects and numerical data efficiently.
  • It makes complex distributed computing possible.

Ray converts existing functions and classes into their distributed counterparts: functions become tasks and classes become actors.

Sample Ray Task

Here is a simple example of using Ray for distributed computing. The example compares a normal Python function that generates a Fibonacci sequence to the equivalent Ray task. Ray can be installed with pip:

pip install 'ray[default]'

The next step is to import the required libraries:

import os
import time
import ray

Following is the simple Python code to generate the Fibonacci sequence:

def fibonacci_local(seq_size):
    fib = []
    for i in range(seq_size):
        if i < 2:
            fib.append(i)
            continue
        fib.append(fib[i - 1] + fib[i - 2])
    return seq_size

Similarly, the following code represents a Ray task to generate a Fibonacci sequence:

@ray.remote
def fibonacci_distributed(seq_size):
    fib = []
    for i in range(seq_size):
        if i < 2:
            fib.append(i)
            continue
        fib.append(fib[i - 1] + fib[i - 2])
    return seq_size

The `@ray.remote` decorator creates a Ray task that can be scheduled across the machine's CPU cores. Note that the function returns the sequence size rather than the sequence itself: shipping large objects between processes adds overhead, so a distributed function should avoid requiring or returning large data parameters where possible. For a speed comparison, it is also important to measure the overall time both the normal function and the Ray task take to generate a specific number of Fibonacci sequences. The following code compares the execution times of the two functions:

# Normal Python
def run_local(seq_size):
    start_time = time.time()
    results = [fibonacci_local(seq_size) for _ in range(os.cpu_count())]
    duration = time.time() - start_time
    print('Sequence size: {}, Local execution time: {}'.format(seq_size, duration))

# Ray
def run_remote(seq_size):
    # Start Ray
    ray.init()
    start_time = time.time()
    results = ray.get([fibonacci_distributed.remote(seq_size) for _ in range(os.cpu_count())])
    duration = time.time() - start_time
    print('Sequence size: {}, Remote execution time: {}'.format(seq_size, duration))

The ‘os.cpu_count()’ call returns the number of CPUs in the system, so each function generates one 100000-number Fibonacci sequence per CPU core. Because the Ray function distributes the computation across the CPUs, it produces the required output in less time: in this sample run, the execution time for the normal Python function is 4.2s, while the execution time for the Ray function is 1.7s. In the same way, developers can build highly efficient applications using Ray for distributed computing in Python.


© Learn Python 101 — All Rights Reserved