NiFi
This plugin extracts the following:
- NiFi flow as DataFlow entity
- Ingress, egress processors, remote input and output ports as DataJob entity
- Input and output ports receiving remote connections as Dataset entity
- Lineage information between external datasets and ingress/egress processors by analyzing provenance events
Current limitations:
- Only a limited set of ingress/egress processors is supported:
  - S3: ListS3, FetchS3Object, PutS3Object
  - SFTP: ListSFTP, FetchSFTP, GetSFTP, PutSFTP
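As a rough illustration of the limitation above, the supported processors can be thought of as a lookup table keyed by external system. The sketch below is not the connector's code: the names SUPPORTED_PROCESSORS and external_system_for are hypothetical, and it assumes that a processor type may arrive either as a short name or as a fully qualified Java class name whose last segment is the short name.

```python
from typing import Optional

# Supported processor short names, grouped by external system.
# The names come from the limitations list; the grouping structure is illustrative.
SUPPORTED_PROCESSORS = {
    "S3": {"ListS3", "FetchS3Object", "PutS3Object"},
    "SFTP": {"ListSFTP", "FetchSFTP", "GetSFTP", "PutSFTP"},
}


def external_system_for(processor_type: str) -> Optional[str]:
    """Return "S3" or "SFTP" for a supported processor type, else None.

    Assumption: a fully qualified class name ends with the short name,
    e.g. "org.apache.nifi.processors.aws.s3.ListS3" -> "ListS3".
    """
    short_name = processor_type.rsplit(".", 1)[-1]
    for system, names in SUPPORTED_PROCESSORS.items():
        if short_name in names:
            return system
    return None
```

A processor whose type is not in the table (for example a hypothetical "GetFile") returns None, which corresponds to lineage to external datasets not being extracted for it.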
CLI based Ingestion
Install the Plugin
pip install 'acryl-datahub[nifi]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
Config Details
- Options
- Schema
Note that a dot (.) is used to denote nested fields in the YAML recipe.
Field (✅ = required), type | Description |
---|---|
site_url ✅ string | URI to connect |
auth Enum | NiFi authentication mode. Must be one of: NO_AUTH, SINGLE_USER, CLIENT_CERT, KERBEROS Default: NO_AUTH |
ca_file One of boolean, string | Path to PEM file containing certs for the root CA(s) for the NiFi |
client_cert_file string | Path to PEM file containing the public certificates for the user/client identity, must be set for auth = "CLIENT_CERT" |
client_key_file string | Path to PEM file containing the client’s secret key |
client_key_password string | The password to decrypt the client_key_file |
password string | NiFi password, must be set for auth = "SINGLE_USER" |
provenance_days integer | Time window, in days, to analyze provenance events for external datasets Default: 7 |
site_name string | Site name to identify this site with, useful when using input and output ports receiving remote connections Default: default |
site_url_to_site_name map(str,string) | Lookup to find site_name for site_url, required if using remote process groups in the NiFi flow Default: {} |
username string | NiFi username, must be set for auth = "SINGLE_USER" |
env string | The environment that all assets produced by this connector belong to Default: PROD |
process_group_pattern AllowDenyPattern | regex patterns for filtering process groups Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
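process_group_pattern.allow array(string) | List of regex patterns to include in ingestion Default: ['.*'] |
process_group_pattern.deny array(string) | List of regex patterns to exclude from ingestion. Default: [] |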
process_group_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
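To make the process_group_pattern options more concrete, here is a minimal sketch of allow/deny filtering using only the Python standard library. It is not the AllowDenyPattern class itself: the function name is_allowed is hypothetical, and the sketch assumes that deny patterns take precedence over allow patterns and that patterns are matched from the start of the process group name.

```python
import re
from typing import List


def is_allowed(name: str, allow: List[str], deny: List[str], ignore_case: bool = True) -> bool:
    """Return True if name matches an allow pattern and no deny pattern.

    Assumptions (not confirmed by the table above): deny wins over allow,
    and patterns are anchored at the start of the name, as with re.match.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # Any deny match excludes the name, regardless of the allow list.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    # Otherwise the name must match at least one allow pattern.
    return any(re.match(pattern, name, flags) for pattern in allow)
```

With the defaults from the table (allow ['.*'], deny [], ignoreCase True), every process group name passes the filter.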
The JSONSchema for this configuration is inlined below.
{
"title": "NifiSourceConfig",
"description": "Any source that produces dataset urns in a single environment should inherit this class",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"site_url": {
"title": "Site Url",
"description": "URI to connect",
"type": "string"
},
"auth": {
"description": "Nifi authentication. must be one of : NO_AUTH, SINGLE_USER, CLIENT_CERT, KERBEROS",
"default": "NO_AUTH",
"allOf": [
{
"$ref": "#/definitions/NifiAuthType"
}
]
},
"provenance_days": {
"title": "Provenance Days",
"description": "time window to analyze provenance events for external datasets",
"default": 7,
"type": "integer"
},
"process_group_pattern": {
"title": "Process Group Pattern",
"description": "regex patterns for filtering process groups",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"site_name": {
"title": "Site Name",
"description": "Site name to identify this site with, useful when using input and output ports receiving remote connections",
"default": "default",
"type": "string"
},
"site_url_to_site_name": {
"title": "Site Url To Site Name",
"description": "Lookup to find site_name for site_url, required if using remote process groups in nifi flow",
"default": {},
"type": "object",
"additionalProperties": {
"type": "string"
}
},
"username": {
"title": "Username",
"description": "Nifi username, must be set for auth = \"SINGLE_USER\"",
"type": "string"
},
"password": {
"title": "Password",
"description": "Nifi password, must be set for auth = \"SINGLE_USER\"",
"type": "string"
},
"client_cert_file": {
"title": "Client Cert File",
"description": "Path to PEM file containing the public certificates for the user/client identity, must be set for auth = \"CLIENT_CERT\"",
"type": "string"
},
"client_key_file": {
"title": "Client Key File",
"description": "Path to PEM file containing the client\u2019s secret key",
"type": "string"
},
"client_key_password": {
"title": "Client Key Password",
"description": "The password to decrypt the client_key_file",
"type": "string"
},
"ca_file": {
"title": "Ca File",
"description": "Path to PEM file containing certs for the root CA(s) for the NiFi",
"anyOf": [
{
"type": "boolean"
},
{
"type": "string"
}
]
}
},
"required": [
"site_url"
],
"additionalProperties": false,
"definitions": {
"NifiAuthType": {
"title": "NifiAuthType",
"description": "An enumeration.",
"enum": [
"NO_AUTH",
"SINGLE_USER",
"CLIENT_CERT",
"KERBEROS"
]
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
}
}
}
Authentication
This connector supports the following authentication mechanisms:
Single User Authentication (auth: SINGLE_USER)
The connector passes the username and password, as used on the NiFi login page, to the /access/token REST endpoint. This mode also works when a Kerberos login identity provider is set up for NiFi.
Client Certificates Authentication (auth: CLIENT_CERT)
The connector uses client_cert_file (required), client_key_file (optional), and client_key_password (optional) for mutual TLS authentication.
Kerberos Authentication via SPNEGO (auth: KERBEROS)
If NiFi has been configured to use Kerberos SPNEGO, the connector passes the user's Kerberos ticket to NiFi over the /access/kerberos REST endpoint. It is assumed that the user's Kerberos ticket is already present on the machine on which ingestion runs. This is usually done by installing krb5-user and then running kinit for the user.
sudo apt install krb5-user
kinit user@REALM
No Authentication (auth: NO_AUTH)
This is useful for testing purposes.
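The per-mode requirements described in this section can be summarized as a small pre-flight check on a recipe's config block. The sketch below is illustrative only and is not how the connector validates its configuration: REQUIRED_FIELDS_BY_AUTH and missing_auth_fields are hypothetical names, and the required fields are taken from the "must be set for auth = ..." notes in the config table.

```python
from typing import Dict, List

# Fields the documentation marks as required for each auth mode.
REQUIRED_FIELDS_BY_AUTH: Dict[str, List[str]] = {
    "NO_AUTH": [],
    "SINGLE_USER": ["username", "password"],
    "CLIENT_CERT": ["client_cert_file"],
    "KERBEROS": [],  # relies on a Kerberos ticket already obtained via kinit
}


def missing_auth_fields(config: Dict[str, object]) -> List[str]:
    """Return the auth-related fields that are required but absent or empty."""
    auth = str(config.get("auth", "NO_AUTH"))  # NO_AUTH is the documented default
    if auth not in REQUIRED_FIELDS_BY_AUTH:
        raise ValueError(f"Unsupported auth value: {auth!r}")
    return [field for field in REQUIRED_FIELDS_BY_AUTH[auth] if not config.get(field)]
```

For example, a config with auth set to SINGLE_USER and a username but no password yields ["password"], while a config that omits auth falls back to NO_AUTH and yields an empty list.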
Access Policies
This connector requires the following access policies to be set in NiFi for the ingestion user.
Global Access Policies
Policy | Privilege | Resource | Action |
---|---|---|---|
view the UI | Allows users to view the UI | /flow | R |
query provenance | Allows users to submit a Provenance Search and request Event Lineage | /provenance | R |
Component level Access Policies (required to be set on root process group)
Policy | Privilege | Resource | Action |
---|---|---|---|
view the component | Allows users to view component configuration details | /<component-type>/<component-UUID> | R |
view the data | Allows users to view metadata and content for this component in flowfile queues in outbound connections and through provenance events | /data/<component-type>/<component-UUID> | R |
view provenance | Allows users to view provenance events generated by this component | /provenance-data/<component-type>/<component-UUID> | R |
Code Coordinates
- Class Name: datahub.ingestion.source.nifi.NifiSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for NiFi, feel free to ping us on our Slack.