Parallel processing/Multiprocessing using Python and Databricks

Sanajit Ghosh
3 min read · May 24, 2024

This is a simple program that raises each number in a list of 10,000 numbers to a power, first synchronously and then asynchronously. In the async case, four processes run in parallel using a pool of size 4. Here one of the async methods, pool.map, is used to run the function in parallel across the CPU cores.

The synchronous operation takes 189 sec, roughly double the 99 sec the async operation needs to complete the task.
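The comparison above can be sketched as follows. This is a minimal, self-contained version of the experiment; the helper names `run_sync` and `run_async` are mine, not from the article, and the input size is reduced so it runs quickly:

```python
import time
from multiprocessing.pool import Pool

def power_of_number(num):
    # Same work as in the article: raise the number to its own power.
    return num ** num

def run_sync(numbers):
    # Synchronous baseline: one process, one number at a time.
    return [power_of_number(n) for n in numbers]

def run_async(numbers, workers=4):
    # Async version: fan the work out across a pool of worker processes.
    with Pool(workers) as pool:
        return pool.map(power_of_number, numbers)

if __name__ == '__main__':
    numbers = list(range(2000))

    start = time.time()
    sync_results = run_sync(numbers)
    print(f'sync:  {time.time() - start:.2f}s')

    start = time.time()
    async_results = run_async(numbers, workers=4)
    print(f'async: {time.time() - start:.2f}s')

    # Both approaches must produce identical results.
    assert sync_results == async_results
```

Note that `pool.map` preserves input order, so the two result lists compare equal element for element.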

Increasing the pool size to 8 shaves about 2 sec off the total completion time.

The Databricks cluster I am using is a 32-core machine, so at most 32 cores, i.e. 32 parallel processes, can be spun up to run tasks in parallel. However, the trade-off should be considered when choosing the pool size: a larger pool does not always give better results, as other tasks might be left waiting in the queue.
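One way to respect that trade-off is to size the pool from the machine's core count rather than hard-coding it. A small sketch, assuming we want to leave headroom for other work on the cluster (the helper `suggested_pool_size` and its `fraction` parameter are illustrative, not from the article):

```python
import multiprocessing

def suggested_pool_size(fraction=0.5):
    """Return a pool size that uses only a fraction of the available cores,
    leaving the rest free for other tasks queued on the machine."""
    cores = multiprocessing.cpu_count()
    return max(1, int(cores * fraction))

if __name__ == '__main__':
    # On a 32-core Databricks node this would suggest a pool of 16.
    print(f'{multiprocessing.cpu_count()} cores -> pool size {suggested_pool_size()}')
```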

Increasing the pool size to 16 brings the total completion time down to 91 sec.

Methods like map(), imap() and imap_unordered() submit multiple tasks to the process pool. The advantages of imap_unordered() are:

1. It yields each result as soon as it is ready, rather than completing all tasks and returning the results at once like map().
2. Results come back in arbitrary order, whichever worker finishes first.

Both properties help significantly to improve concurrency and are effective at keeping CPU utilization high and costs down.
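The difference is easiest to see with tasks of uneven duration. A minimal sketch (the `slow_square` worker and its simulated delays are mine, for illustration):

```python
import random
import time
from multiprocessing.pool import Pool

def slow_square(n):
    # Simulate tasks that take unpredictable amounts of time.
    time.sleep(random.random() * 0.1)
    return n * n

if __name__ == '__main__':
    with Pool(4) as pool:
        # map() would block until every task finished and then return a list
        # in input order; imap_unordered() yields each result the moment its
        # worker completes it, so fast tasks are not held up by slow ones.
        for result in pool.imap_unordered(slow_square, range(10)):
            print(f'Got result: {result}', flush=True)
```

With imap_unordered() the printed results typically appear out of order, which is exactly why it keeps all workers busy.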

from multiprocessing.pool import Pool
import time

def power_of_number(num):
    toThePowerNum = num ** num
    return toThePowerNum

def list_of_numbers():
    lst = [num for num in range(10000)]
    return lst

# entry point
def main():
    start_time = time.time()
    number_list = list_of_numbers()
    print(f'there are total {len(number_list)} numbers in the list {number_list} \n\n')
    with Pool(16) as pool:
        for result in pool.imap_unordered(power_of_number, number_list, chunksize=100):
            print(f'Got result: {result}', flush=True)

    print("---Total time taken for completion %s seconds ---" % (time.time() - start_time))

if __name__ == '__main__':
    main()

If the same code is executed in VS Code, we can see the subprocesses with respect to the CPU cores. As the number of workers in the process pool increases, CPU utilization also rises rapidly, and beyond a certain point, if no workers are left free, tasks can queue up and even fail. So there should be a sensible trade-off between the chunksize, the pool size, and the async method used.
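That trade-off can be explored empirically by timing the same workload across different pool sizes and chunksizes. A sketch of such a sweep; the `benchmark` helper and the particular sizes tried are my choices, not from the article:

```python
import time
from multiprocessing.pool import Pool

def power_of_number(num):
    return num ** num

def benchmark(numbers, pool_size, chunksize):
    """Time one full pass over the workload with the given settings."""
    start = time.time()
    with Pool(pool_size) as pool:
        # Drain the iterator so every task actually runs.
        list(pool.imap_unordered(power_of_number, numbers, chunksize=chunksize))
    return time.time() - start

if __name__ == '__main__':
    numbers = list(range(1500))
    for pool_size in (4, 8, 16):
        for chunksize in (10, 100):
            elapsed = benchmark(numbers, pool_size, chunksize)
            print(f'pool={pool_size:2d} chunksize={chunksize:3d} -> {elapsed:.2f}s')
```

Plotting or eyeballing these timings on your own cluster is a more reliable guide than any fixed rule, since the sweet spot depends on core count, task cost, and what else is running.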

[Figure: CPU utilization with 4 cores / Pool(4)]
[Figure: CPU utilization with 8 cores / Pool(8)]
