In this article, we'll go over how to integrate DataHub events into Jira workflows using the DataHub Actions framework. Before diving in, we'll give a bit of background on what DataHub is and how you can use its Actions framework for effective data management. Finally, we'll walk through a specific example of writing a custom action that creates a Jira ticket when a data product is created in DataHub.
Hopefully, this article can serve as a general template for how to integrate DataHub events of interest into your specific Jira processes.
What’s DataHub?
DataHub is a data catalog that supports data discovery, governance, and metadata management. It provides features that allow organizations to implement their own data mesh, a decentralized method of data management that empowers individual business domains to take initiative over their own data quality and requirements.
A metadata platform such as DataHub is extremely beneficial for various reasons:
- Cross-domain data discovery and analysis: data consumers (e.g. data analysts/scientists) can use DataHub to discover relevant datasets that they can use for their analyses. To help data consumers make sense of data across various domains, each domain needs to enrich its data assets with sufficient business context.
- Data governance: DataHub allows organizations to establish a single source of truth for their data, implement strong security by enforcing access policies on data assets, and ensure regulatory compliance by supporting data classification.
What are DataHub Actions?
In a data-mature organization, metadata is always evolving. Thus, it’s important for organizations to react to metadata changes in real time.
- DataHub provides the Actions Framework to help integrate metadata changes that occur in DataHub into a larger event-based architecture.
- The framework allows you to specify configuration that triggers certain actions depending on events that occur in DataHub.
- Common use cases include notifying relevant parties upon changes in a dataset, integrating metadata changes into organizational workflows, etc.
One of the common use cases of the DataHub Actions framework is to integrate metadata changes into organization-specific notifications. DataHub provides native support for this on certain third-party platforms, such as Slack and Teams.
In this article, we'll look at how to use the Actions framework to integrate DataHub metadata changes into Jira workflows. Specifically, we'll implement a custom action that creates a Jira ticket upon creating a new data product. However, the configuration and code here can easily be altered to trigger Jira workflows for other DataHub events.
Developing a Custom Jira Action
Each DataHub Action is run in a separate pipeline, which is a continuously running process that repeats the following steps: poll event data from relevant sources, apply transformation/filtering to those events, and execute the desired action.
- We can define the configuration for our action pipeline via YAML.
- We can define the logic to execute for our custom action by extending DataHub's base Action class.
Let's walk through both of these steps.
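To make the pipeline loop described above a bit more concrete, here is a minimal sketch in plain Python. It is not DataHub's implementation: `run_pipeline`, `keep_data_products`, and the dict-based events are hypothetical stand-ins that only illustrate the poll, transform/filter, act cycle.

```python
from typing import Callable, Iterable, Optional

# Stand-in for an event payload; this is NOT DataHub's event class.
Event = dict


def run_pipeline(
    source: Iterable[Event],
    transform: Callable[[Event], Optional[Event]],
    act: Callable[[Event], None],
) -> int:
    """Poll events, transform/filter each one, and act on the survivors."""
    handled = 0
    for raw_event in source:            # 1. poll event data from the source
        event = transform(raw_event)    # 2. transform/filter (None means "drop")
        if event is None:
            continue
        act(event)                      # 3. execute the desired action
        handled += 1
    return handled


def keep_data_products(event: Event) -> Optional[Event]:
    """Hypothetical filter: only let data product events through."""
    return event if event.get("entityType") == "dataProduct" else None


events = [
    {"entityType": "dataProduct", "operation": "CREATE"},
    {"entityType": "dataset", "operation": "CREATE"},
]
seen = []
handled_count = run_pipeline(events, keep_data_products, seen.append)
```

In the real framework, the source, filter, and action are wired together from the YAML configuration rather than passed in as functions.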
YAML configuration
Our YAML configuration requires specifying four key aspects:
- Action Pipeline Name (should be unique and static)
- Event Source Configurations
- Transform + Filter Configurations
- Action Configuration
There are two possible event sources for DataHub Actions: the Kafka event source and the DataHub Cloud event source.
Kafka is the framework's default event source. Unless you're on an instance of DataHub Cloud above v0.3.7, you'll be processing event data from Kafka.
The Kafka event source emits two types of events: Entity Change Events and Metadata Change Log events.
Since we're listening for the creation of data products, and the data structure of the Entity Change Event is easier to work with, we'll filter for Entity Change Events.
Here's what our YAML file will look like (jira_action.yaml).
# jira_action.yaml
# 1. Action Pipeline Name
# This can be whatever you like as long as it's unique.
name: "jira_action"
# 2. Event Source - Where to source event data from.
# Kafka is the default event source (https://docs.datahub.com/docs/actions/sources/kafka-event-source).
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
# 3. Filter - Filter events that reach the Action
# We'll listen for Data Product CREATE events.
# For more information about other events: https://docs.datahub.com/docs/actions/events/entity-change-event#entity-create-event
filter:
  event_type: "EntityChangeEvent_v1"
  event:
    entityType: "dataProduct"
    category: "LIFECYCLE"
    operation: "CREATE"
# 4. Action - What action to take on events.
# Here, we'll reference our custom Action file (jira_action.py).
action:
  type: "jira_action:JiraAction"
  config:
    # Action-specific configs (map)
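To build some intuition for what the filter in the configuration above does, here is a rough sketch of field matching against an Entity Change Event. It assumes simple exact-equality matching on top-level fields; the framework's real filter may support more than this. `matches_filter` and the example URNs are hypothetical.

```python
def matches_filter(event: dict, expected: dict) -> bool:
    """Return True when every expected field equals the event's value."""
    return all(event.get(key) == value for key, value in expected.items())


# Mirrors the `event` block of the filter in jira_action.yaml.
event_filter = {
    "entityType": "dataProduct",
    "category": "LIFECYCLE",
    "operation": "CREATE",
}

# Simplified Entity Change Event payloads (URNs are made up for illustration).
created = {
    "entityUrn": "urn:li:dataProduct:example",
    "entityType": "dataProduct",
    "category": "LIFECYCLE",
    "operation": "CREATE",
}
tagged = {
    "entityUrn": "urn:li:dataset:abc",
    "entityType": "dataset",
    "category": "TAG",
    "operation": "ADD",
}

matches_created = matches_filter(created, event_filter)
matches_tagged = matches_filter(tagged, event_filter)
```

Only the data product CREATE event satisfies all three fields, so only that event would reach our action.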
Defining our Custom Action Class
Next, we need to implement our logic for creating a Jira ticket in a custom Action class, which we'll define in a separate jira_action.py file.
Our action class will extend DataHub's base action class, which consists of overriding the following methods:
- create(): invoked when instantiating the action. If you specified any action-specific config in your YAML file, this method will pass that config as a dictionary to all instances of this action.
- act(): invoked when an event is received. This method will contain the core logic of our action, i.e. creating the Jira ticket.
- close(): invoked when our action pipeline is shut down.
Since we didn't specify any action-specific config, and we don't have to worry about any special cleanup once our action is terminated, our work will primarily consist of overriding act().
We will use Python Jira, a Python wrapper around the Jira REST API, to programmatically interact with our Jira instance. For more information/examples on how to programmatically interact with Jira, check out the docs.
Here's what our code will look like.
# jira_action.py
from datahub_actions.action.action import Action
from datahub_actions.event.event_envelope import EventEnvelope
from datahub_actions.event.event import Event
from datahub_actions.pipeline.pipeline_context import PipelineContext
from jira import JIRA


class JiraAction(Action):
    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "Action":
        """
        Shares any action-specific config across all instances of the action.
        """
        return cls(ctx, config_dict)

    def __init__(self, ctx: PipelineContext, config_dict: dict):
        self.ctx = ctx
        self.config = config_dict

    def act(self, event: EventEnvelope) -> None:
        """
        Create a Jira ticket, including the DataHub link, when a data product is created in DataHub.
        We'll use the Python Jira API to programmatically interact with Jira (https://jira.readthedocs.io/index.html).
        """
        event_object = event.event
        entity_urn = event_object.entityUrn
        # Extract DataHub link for data product
        DATAHUB_DOMAIN = "http://localhost:9002/"  # replace with link to your DataHub instance
        data_product_link = f"{DATAHUB_DOMAIN}{entity_urn}"
        # Authenticate into your Jira instance (https://jira.readthedocs.io/examples.html#authentication).
        jira = JIRA(
            token_auth="API token",  # Self-Hosted Jira (e.g. Server): the PAT token
            # basic_auth=("email", "API token"),  # Jira Cloud: a username/token tuple
            # basic_auth=("admin", "admin"),  # a username/password tuple [Not recommended]
            # auth=("admin", "admin"),  # a username/password tuple for cookie auth [Not recommended]
        )
        # Create Jira issue (https://jira.readthedocs.io/examples.html#issues).
        # For more information about the attributes you can specify in a Jira Issue,
        # check out the Issue class (https://github.com/pycontribs/jira/blob/main/jira/resources.py)
        issue_dict = {
            'project': {},  # JIRA project to create issue under in dict form (ex: {'id': 123})
            'summary': 'New Data Product',
            'description': f'Data Product Link: {data_product_link}',
            'issuetype': {},  # define issue type in dict form (ex: {'name': 'Bug'})
            'reporter': '',
            'assignee': ''
        }
        jira.create_issue(fields=issue_dict)

    def close(self) -> None:
        """
        Clean up any processes happening inside the Action.
        """
        pass
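Since create() passes any action-specific config to the action as a dictionary, one possible refinement is to read the Jira settings from that config instead of hard-coding them in act(). The sketch below is only an illustration: `build_issue_fields` and the config keys `datahub_base_url`, `jira_project_key`, and `jira_issue_type` are hypothetical names, not part of DataHub or Python Jira.

```python
def build_issue_fields(entity_urn: str, config: dict) -> dict:
    """Build the `fields` dict for a Jira issue from the action config.

    The config keys used here are hypothetical; name them however you like
    under `action.config` in your YAML file.
    """
    base_url = config.get("datahub_base_url", "http://localhost:9002/")
    data_product_link = f"{base_url}{entity_urn}"
    return {
        "project": {"key": config["jira_project_key"]},
        "summary": "New Data Product",
        "description": f"Data Product Link: {data_product_link}",
        "issuetype": {"name": config.get("jira_issue_type", "Task")},
    }


example_fields = build_issue_fields(
    "urn:li:dataProduct:example",  # made-up URN for illustration
    {"jira_project_key": "DATA"},
)
```

Inside act(), the result could then be passed along as jira.create_issue(fields=build_issue_fields(entity_urn, self.config)), which keeps environment-specific values out of the code.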
Although the configuration and code here are specific to data product create events, they can be altered to integrate other DataHub events into Jira workflows, such as adding/removing tags, terms, domains, owners, etc.
- You can find the list of different Entity Change Events here.
- Listening for these events would involve changing the Filter configuration in our YAML to the field values of the specific Entity Change Event.
For example, to create a Jira ticket for an Add Tag Event on a Dataset, we'd update our Filter configuration in our YAML as follows:
filter:
  event_type: "EntityChangeEvent_v1"
  event:
    entityType: "dataset"
    category: "TAG"
    operation: "ADD"
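If one action ends up handling several kinds of events, the ticket summary probably needs to vary by event. Here is a small sketch of how that could look, working on a plain dict for simplicity. `summarize_event` is a hypothetical helper, and it assumes tag events carry the tag URN in a `modifier` field, so verify that against the Entity Change Event docs for your DataHub version.

```python
def summarize_event(event: dict) -> str:
    """Return a short Jira summary line for an Entity Change Event payload."""
    entity_type = event.get("entityType", "entity")
    category = event.get("category")
    operation = event.get("operation")

    if category == "LIFECYCLE" and operation == "CREATE":
        return f"New {entity_type} created"
    if category == "TAG":
        verb = "added to" if operation == "ADD" else "removed from"
        # Assumption: the tag URN is available under "modifier".
        tag = event.get("modifier", "unknown tag")
        return f"Tag {tag} {verb} {entity_type}"
    # Fallback for categories this sketch does not special-case.
    return f"{category} {operation} on {entity_type}"


tag_summary = summarize_event(
    {
        "entityUrn": "urn:li:dataset:abc",
        "entityType": "dataset",
        "category": "TAG",
        "operation": "ADD",
        "modifier": "urn:li:tag:PII",
    }
)
```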
Running our Action
Now that we've created our action configuration and implementation, we can run this action by placing these two files (jira_action.yaml and jira_action.py) in the same Python runtime environment as our DataHub instance.
Then, we can run our action via the CLI using the following command:
datahub actions -c jira_action.yaml
For more information on developing/running a custom action, check out the docs.
Wrap-up
Thanks for reading! To briefly recap what we talked about:
- DataHub is a data catalog that facilitates efficient data discovery, management, and governance.
- DataHub provides its own Actions Framework to integrate metadata changes into organizational workflows in real time.
- Using the framework, we can write our own action to integrate DataHub events into Jira workflows by simply defining the action pipeline in YAML and the implementation logic in a custom Python class.
If you have any other ideas/experiences with using DataHub Actions to implement real-time data governance, I'd love to hear about them in the comments!
Sources
DataHub Actions:
Python Jira: