Git For Data - DataOps best practices at scale
Concepts such as Dev/Test environments and CI/CD are harder to implement in data engineering, because the data, and not just the code, must be managed.
In this session, we will review ways to achieve version control over the data lake, with lakeFS, using git-like semantics to create and access those versions.
We will introduce familiar concepts from code version control: “branch”, to create an isolated version of the data; “commit”, to create a reproducible point in time; and “merge”, to incorporate your changes in one atomic action.
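To make the three concepts concrete, here is a minimal sketch of the semantics as a toy in-memory model. It is an illustration only, not the lakeFS API: the class ToyRepo, its methods, and the path events/day1.csv are hypothetical names invented for this example, and the merge is a naive overlay with no conflict detection.

```python
import itertools


class ToyRepo:
    """Toy model of git-like data versioning (hypothetical, NOT the lakeFS API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.commits = {}    # commit id -> immutable snapshot {path: content}
        self.branches = {}   # branch name -> commit id at the branch head
        self.staged = {"main": {}}  # branch name -> uncommitted working state
        self.branches["main"] = self._snapshot({})

    def _snapshot(self, objects):
        # Store a copy so that later writes cannot alter a committed version.
        commit_id = f"c{next(self._ids)}"
        self.commits[commit_id] = dict(objects)
        return commit_id

    def branch(self, name, source="main"):
        # A branch starts at the source head; no data is shared mutably.
        self.branches[name] = self.branches[source]
        self.staged[name] = dict(self.commits[self.branches[source]])

    def write(self, branch, path, content):
        # Writes are visible only on this branch until merged.
        self.staged[branch][path] = content

    def commit(self, branch):
        # A commit freezes the branch state into a reproducible point in time.
        self.branches[branch] = self._snapshot(self.staged[branch])
        return self.branches[branch]

    def merge(self, source, dest):
        # Assumption: source simply overrides dest; a single head update
        # makes all changes visible at once.
        merged = {**self.commits[self.branches[dest]],
                  **self.commits[self.branches[source]]}
        self.branches[dest] = self._snapshot(merged)
        self.staged[dest] = dict(merged)
        return self.branches[dest]

    def read(self, ref, path):
        # ref may be a branch name or a commit id.
        commit_id = self.branches.get(ref, ref)
        return self.commits[commit_id].get(path)


repo = ToyRepo()
repo.write("main", "events/day1.csv", "v1")
base = repo.commit("main")

repo.branch("experiment")
repo.write("experiment", "events/day1.csv", "v2")
repo.commit("experiment")
before_merge = repo.read("main", "events/day1.csv")  # main is still isolated

repo.merge("experiment", "main")
after_merge = repo.read("main", "events/day1.csv")
at_base = repo.read(base, "events/day1.csv")  # the old commit stays readable
```

In this sketch, main keeps serving the old file until the merge, and the earlier commit remains readable afterwards, which is the property that makes rollback and reproducibility possible.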
We will review real-life examples of how lakeFS customers use these concepts to reduce their cloud storage costs, increase their data engineering efficiency, and achieve millisecond recoveries from data outages.
The session will include a live demo - from nothing to complete data lake versioning in 15 minutes or less.