Add Apache Airflow

Merged: Administrator requested to merge github/fork/duyet/patch-1 into master on Jan 28, 2019

Created by: duyet

What is this Python project?

Apache Airflow: Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command-line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
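
To make the "DAG of tasks" idea concrete, here is a minimal sketch of dependency-ordered execution. It does not use Airflow's API; it relies only on Python's standard-library `graphlib`, and the task names (`extract`, `transform_a`, `transform_b`, `load`) are hypothetical placeholders chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}


def run_in_dependency_order(deps):
    """Run every task only after all of its upstream tasks have finished."""
    executed = []
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        # Tasks returned together have no unmet dependencies, so a real
        # scheduler could hand them to separate workers in parallel.
        for task in sorter.get_ready():
            executed.append(task)  # stand-in for doing the actual work
            sorter.done(task)
    return executed


order = run_in_dependency_order(dependencies)
print(order)
```

In this sketch `transform_a` and `transform_b` become ready at the same time once `extract` completes, which is the property a scheduler can exploit to spread work across workers. A real Airflow deployment adds scheduling, retries, and state tracking on top of this basic ordering.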

What's the difference between this Python project and similar ones?

Airflow vs. Luigi:

Airflow

  • Easy-to-use UI (+)
  • Built-in scheduler (+)
  • Easy testing of DAGs (+)
  • Separates output data and task state (+)
  • Strong and active community (+)

Luigi

  • Creating and testing tasks is difficult (-)
  • The UI is challenging to navigate (-)
  • Not scalable due to tight coupling with cron jobs; the number of worker processes is bounded by the number of cron workers assigned to a job (-)
  • Re-running pipelines is not possible (-)

Airflow vs. Oozie:

Airflow

  • Python code for DAGs (+)
  • Has connectors for every major service/cloud provider (+)
  • More versatile (+)
  • Advanced metrics (+)
  • Better UI and API (+)
  • Capable of creating extremely complex workflows (+)
  • Jinja templating (+)
  • Can be parallelized (=)
  • Native connections to HDFS, Hive, Pig, etc. (=)
  • Graph as DAG (=)

Oozie

  • Java or XML for DAGs (---)
  • Hard to build complex pipelines (-)
  • Smaller, less active community (-)
  • Worse web GUI (-)
  • Java API (-)
  • Can be parallelized (=)
  • Native connections to HDFS, Hive, Pig, etc. (=)
  • Graph as DAG (=)

--

Anyone who agrees with this pull request can vote for it by adding a 👍 to it; the maintainer will usually merge it when votes reach 20.

Source branch: github/fork/duyet/patch-1