About this Event
2461 SW Campus Way, Corvallis, OR 97331
TITLE: EphFlow: Addressing Resource Limitations of Cross-platform FaaS Workflows via Serverless Ephemeral VM Provisioning
ABSTRACT: Scientific workflows are essential for automating complex computational pipelines. While serverless Function-as-a-Service (FaaS) cloud platforms have the potential to enable wider adoption of workflows, existing systems present significant adoption barriers: vendor lock-in, strict resource limitations, and limited language support. We present EphFlow, a novel middleware that enables serverless execution of workflows within or across multiple cloud platforms without code modifications. EphFlow exposes the FaaS programming abstraction, supports multiple programming languages, and dynamically instantiates ephemeral VMs to host executions that exceed the capacity of FaaS platforms. EphFlow introduces three key innovations: (1) a client-server architecture that supports deployment of FaaS actions in general-purpose containers, abstracting platform-specific serverless APIs while generalizing to multiple programming languages (Python, R, and Julia demonstrated); (2) automatic in-DAG provisioning of ephemeral VMs, which transparently overcomes FaaS execution time and memory limits by injecting lifecycle management actions into workflow DAGs; and (3) passive S3-based coordination, which eliminates the need for a dedicated workflow engine. Experiments demonstrate the ability to deploy workflows within or across five different platforms (GitHub Actions, AWS Lambda, Google Cloud Run, OpenWhisk, and SLURM), all building upon a common foundation of REST APIs for container invocation. The system has been evaluated qualitatively and quantitatively using both synthetic workflows and a realistic event-driven ecological forecasting application (FLARE). Qualitative evaluations demonstrate the deployment of multi-language FaaS workflows across these platforms, as well as ephemeral in-workflow VM provisioning (as GitHub self-hosted Runners on AWS EC2), without any code changes or managed servers.
An event-driven deployment of a synthetic workflow across AWS Lambda, Google Cloud, and GitHub Actions over two weeks (1,051 invocations) quantifies the total latency between container action invocations in a workflow DAG (medians ranging from 9.3 s on Lambda to 131.0 s with ephemeral VM provisioning), as well as the overheads of individual steps in the middleware's entry point (0.3–5 s to process a workflow configuration payload and invoke the user function). Experiments also demonstrate the ability to combine FaaS and ephemeral VM provisioning in a single workflow containing actions that exceed AWS Lambda resource limits. These results show the feasibility of deploying FaaS workflows across serverless providers without vendor lock-in, particularly for workflows with long-running actions (minutes or longer).
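The second innovation, injecting VM lifecycle actions into the workflow DAG, can be illustrated with a short Python sketch. This is not EphFlow's implementation: the `Action` model, the `inject_vm_lifecycle` function, the `provision_vm__`/`terminate_vm__` naming, and the placeholder durations and memory sizes for the injected actions are all hypothetical. Only AWS Lambda's published maxima (900 s timeout, 10,240 MB memory) are used as the FaaS limits. In EphFlow the provisioned VMs are GitHub self-hosted Runners on AWS EC2; this sketch only rewrites the graph and launches nothing.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Example FaaS limits: AWS Lambda's documented maxima (900 s timeout,
# 10,240 MB memory). Used here only for illustration.
LAMBDA_MAX_SECONDS = 900
LAMBDA_MAX_MEMORY_MB = 10_240


@dataclass
class Action:
    """A workflow DAG node (hypothetical model, not EphFlow's actual schema)."""
    name: str
    est_seconds: int
    memory_mb: int
    platform: str = "lambda"
    deps: list[str] = field(default_factory=list)


def exceeds_faas_limits(a: Action) -> bool:
    """True if the action cannot run within the assumed FaaS limits."""
    return a.est_seconds > LAMBDA_MAX_SECONDS or a.memory_mb > LAMBDA_MAX_MEMORY_MB


def inject_vm_lifecycle(dag: dict[str, Action]) -> dict[str, Action]:
    """Return a new DAG where each over-limit action is moved to an ephemeral VM
    and bracketed by hypothetical provision/teardown FaaS actions."""
    out: dict[str, Action] = {}
    for name, a in dag.items():
        if not exceeds_faas_limits(a):
            out[name] = Action(a.name, a.est_seconds, a.memory_mb, a.platform, list(a.deps))
            continue
        prov = f"provision_vm__{name}"
        term = f"terminate_vm__{name}"
        # Provisioning is a short FaaS action that inherits the original dependencies.
        out[prov] = Action(prov, 60, 512, "lambda", list(a.deps))
        # The heavy action now waits for the VM and runs on it.
        out[name] = Action(a.name, a.est_seconds, a.memory_mb, "ephemeral_vm", [prov])
        # Teardown is a short FaaS action that runs once the heavy action finishes.
        out[term] = Action(term, 30, 512, "lambda", [name])
    return out


# Usage: a three-step workflow whose middle step exceeds the 900 s timeout.
dag = {
    "fetch": Action("fetch", est_seconds=30, memory_mb=256),
    "long_step": Action("long_step", est_seconds=3600, memory_mb=4096, deps=["fetch"]),
    "publish": Action("publish", est_seconds=20, memory_mb=256, deps=["long_step"]),
}
new_dag = inject_vm_lifecycle(dag)

# graphlib expects a mapping from each node to its predecessors.
order = list(TopologicalSorter({n: a.deps for n, a in new_dag.items()}).static_order())
print(order)
```

In this sketch the downstream `publish` action still depends on `long_step` directly, so teardown and downstream work may proceed in parallel; whether EphFlow orders them this way is not stated in the abstract.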
MAJOR ADVISOR: Renato Figueiredo
COMMITTEE: Kyle Hale
COMMITTEE: Wenqian Dong
GCR: Adam Branscum