AWS Lambda And Multi Threading Using Python

June 11, 2019  1 minute read  

An example of using Python multi-threading in AWS Lambda.

Why Multi Threading

Using multithreading in AWS Lambda can speed up your Lambda execution and reduce cost as Lambda charges in 100 ms unit.

Python ThreadPoolExecutor

Note that ThreadPoolExecutor is available with Python 3.6 and 3.7+ runtime. ThreadPoolExecutor provides a simple abstraction to using multiple threads to perform tasks concurrently.

Creating a ThreadPoolExecutor

You can create a ThreadPoolExecutor instance using the following syntax. You can control number of concurrent workers / threads by setting max_workers parameter.


thread_pool_executor = ThreadPoolExecutor(max_workers=4)

Submit A Task To Worker

Workers in this thread pool executor receives a task via ThreadPoolExecutor submit method.


thread_pool_executor.submit(
    a_task, 
    parameters_for_task_function)

Free Resources

ThreadPoolExecutor shutdown method with wait parameter as True tells ThreadPoolExecutor instance that it should free the resources it is using when pending tasks are done executing.


thread_pool_executor.shutdown(wait=True) 

Context Manager

Using Context Manager(with statement) syntax stops the function from exiting the with block before all the tasks(Futures) are completed. When all tasks are completed, it will call ThreadPoolExecutor shutdown method to free the resources that it was using.


with ThreadPoolExecutor(max_workers=4) as executor:
    executor.submit(task)

Future Iterator

concurrent.futures.as_completed returns an iterator over Future instances.

Example

The following is an example using ThreadPoolExecutor and as_completed in Context Manager syntax.

from concurrent.futures import \
    ThreadPoolExecutor, as_completed

# task that is performed in parallel
def task(item, index):     
    if item.frequency > 10:
        return True, index
    else:
        return False, index

# multi-threading function 
def countHighFrequencyItem(list_of_items):
    if len(list_of_items) == 0:
        return 0

    all_tasks = []

    with ThreadPoolExecutor(max_workers=4) as executor:
        for item_index in range(len(list_of_items)):
            all_tasks.append(
                executor.submit(
                    task, 
                    list_of_items[item_index], 
                    item_index)

        temp_res = list(range(len(list_of_items)))
        # process completed tasks
        for future in as_completed(all_tasks):
            tooFrequent, index = future.result()
            temp_res[index] = tooFrequent

        count = 0
        for is_frequent in temp_res:
            if is_frequent:
                count += 1

        return count

Summary

You can use Python ThreadPoolExecutor to execute functions concurrently to reduce total runtime possibly at the expense of higher memory usage.

Support Jun

Thank you for reading!    Support JunSupport Jun

Support Jun on Amazon US

Support Jun on Amazon Canada

If you are preparing for Software Engineer interviews, I suggest Elements of Programming Interviews in Java for algorithm practice. Good luck!

You can also support me by following me on Medium or Twitter.

Feel free to contact me if you have any questions.

Comments