Apache NiFi is a powerful data processing and integration platform that provides a user-friendly interface for designing data flows. Docker is a popular tool for containerization, allowing you to package and run applications in lightweight, portable containers. Docker Compose, on the other hand, simplifies the management of multi-container Docker applications by defining and running them with a single YAML file.
In this guide, we’ll walk through the process of setting up Apache NiFi using Docker Compose.
Prerequisites:
- Docker installed on your system, including Docker Compose (the `docker compose` command used in this guide). You can download and install Docker from the official website.
Step 1: Create a Docker Compose file
Create a new directory for your NiFi project and navigate into it. Inside this directory, create a file named `docker-compose.yml` with the following content:
```yaml
version: "3"
services:
  # configuration manager for NiFi
  zookeeper:
    hostname: myzookeeper
    container_name: zookeeper_container_persistent
    image: 'bitnami/zookeeper:3.7.0'
    restart: on-failure
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      - my_persistent_network
  # version control for nifi flows
  registry:
    hostname: myregistry
    container_name: registry_container_persistent
    image: 'apache/nifi-registry:1.24.0'
    restart: on-failure
    ports:
      - "18080:18080"
    environment:
      - LOG_LEVEL=INFO
      - NIFI_REGISTRY_DB_DIR=/opt/nifi-registry/nifi-registry-current/database
      - NIFI_REGISTRY_FLOW_PROVIDER=file
      - NIFI_REGISTRY_FLOW_STORAGE_DIR=/opt/nifi-registry/nifi-registry-current/flow_storage
    volumes:
      - ./nifi_registry/database:/opt/nifi-registry/nifi-registry-current/database
      - ./nifi_registry/flow_storage:/opt/nifi-registry/nifi-registry-current/flow_storage
    networks:
      - my_persistent_network
  # data extraction, transformation and load service
  nifi:
    hostname: mynifi
    container_name: nifi_container_persistent
    image: 'apache/nifi:1.24.0'
    restart: on-failure
    ports:
      - '8091:8080'
    environment:
      - NIFI_WEB_HTTP_PORT=8080
      - NIFI_CLUSTER_IS_NODE=true
      - NIFI_CLUSTER_NODE_PROTOCOL_PORT=8082
      # 'zookeeper' is the Compose service name, resolvable by other containers on the network
      - NIFI_ZK_CONNECT_STRING=zookeeper:2181
      - NIFI_ELECTION_MAX_WAIT=30 sec
      # example key only (no quotes: in list syntax they would become part of the value)
      - NIFI_SENSITIVE_PROPS_KEY=12345678901234567890A
    healthcheck:
      # runs inside the container, so it targets container port 8080, not host port 8091
      test: "${DOCKER_HEALTHCHECK_TEST:-curl localhost:8080/nifi/}"
      interval: "60s"
      timeout: "3s"
      start_period: "5s"
      retries: 5
    volumes:
      - ./nifi/database_repository:/opt/nifi/nifi-current/database_repository
      - ./nifi/flowfile_repository:/opt/nifi/nifi-current/flowfile_repository
      - ./nifi/content_repository:/opt/nifi/nifi-current/content_repository
      - ./nifi/provenance_repository:/opt/nifi/nifi-current/provenance_repository
      - ./nifi/state:/opt/nifi/nifi-current/state
      - ./nifi/logs:/opt/nifi/nifi-current/logs
      - ./nifi/input_directory:/opt/nifi/nifi-current/input_directory
      - ./nifi/output_directory:/opt/nifi/nifi-current/output_directory
      # uncomment the next line after copying the /conf directory from the container to your local directory to persist NiFi flows
      #- ./nifi/conf:/opt/nifi/nifi-current/conf
    networks:
      - my_persistent_network
networks:
  my_persistent_network:
    driver: bridge
```
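This Docker Compose file defines three services that share the bridge network `my_persistent_network`:
- `zookeeper` (`bitnami/zookeeper:3.7.0`) coordinates the NiFi cluster. Because `NIFI_CLUSTER_IS_NODE=true`, NiFi runs as a cluster node even though there is only one node.
- `registry` (`apache/nifi-registry:1.24.0`) provides version control for flows and is published on host port `18080`.
- `nifi` (`apache/nifi:1.24.0`) runs NiFi itself. `NIFI_WEB_HTTP_PORT=8080` makes NiFi serve plain HTTP on port `8080` inside the container, and the `'8091:8080'` mapping publishes that port as `8091` on the host. `NIFI_SENSITIVE_PROPS_KEY` is the key NiFi uses to encrypt sensitive property values in the flow; it must be at least 12 characters long.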
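The Compose file hard-codes `NIFI_SENSITIVE_PROPS_KEY`, which is fine for local experiments but not much beyond that. A minimal sketch, assuming you want to keep the key out of `docker-compose.yml`: generate a random key and store it in a `.env` file next to the Compose file. Docker Compose reads `.env` from the project directory for variable substitution, so the environment entry can then become `- NIFI_SENSITIVE_PROPS_KEY=${NIFI_SENSITIVE_PROPS_KEY}`. The function name `generate_sensitive_key` and the 32-character length are illustrative choices, not part of NiFi or Compose.

```shell
#!/bin/sh
# Hypothetical helper: append a random NIFI_SENSITIVE_PROPS_KEY to a .env file
# that Docker Compose reads from the project directory.
generate_sensitive_key() {
  env_file="${1:-.env}"

  # Do not overwrite an existing key: NiFi needs the same key to decrypt
  # sensitive values it has already stored in the flow.
  if [ -f "$env_file" ] && grep -q '^NIFI_SENSITIVE_PROPS_KEY=' "$env_file"; then
    echo "NIFI_SENSITIVE_PROPS_KEY already present in $env_file" >&2
    return 0
  fi

  # Make sure the new entry starts on its own line if the file lacks a final newline.
  if [ -s "$env_file" ] && [ -n "$(tail -c 1 "$env_file")" ]; then
    echo >>"$env_file"
  fi

  # 32 random alphanumeric characters (NiFi requires at least 12).
  key="$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32)"
  printf 'NIFI_SENSITIVE_PROPS_KEY=%s\n' "$key" >>"$env_file"
  echo "Wrote NIFI_SENSITIVE_PROPS_KEY to $env_file"
}

generate_sensitive_key
```

Set the key before NiFi starts for the first time: values already encrypted with one key cannot simply be read with another, so changing it later is not a one-line edit. Keep `.env` out of version control.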
Volumes: These volumes define paths on the host machine mapped to directories within the Apache NiFi container. Here’s a breakdown of each volume definition:
- `./nifi/database_repository:/opt/nifi/nifi-current/database_repository` persists NiFi's database repository, which holds internal databases such as the flow configuration (audit) history.
- `./nifi/flowfile_repository:/opt/nifi/nifi-current/flowfile_repository` persists the FlowFile repository, which tracks the attributes and current state of every FlowFile in the flow.
- `./nifi/content_repository:/opt/nifi/nifi-current/content_repository` persists the content repository, which holds the actual data of the FlowFiles.
- `./nifi/provenance_repository:/opt/nifi/nifi-current/provenance_repository` persists the provenance repository, which records what happened to each FlowFile as it moved through the flow.
- `./nifi/state:/opt/nifi/nifi-current/state` persists the component state that NiFi stores locally on the node.
- `./nifi/conf:/opt/nifi/nifi-current/conf` exposes NiFi's configuration directory, which contains `nifi.properties` and, by default, the flow definition, so you can edit the configuration on the host and keep your flows across container restarts. Because a bind mount hides the image's own files, the host directory must already contain a complete configuration; as the comment in the Compose file says, enable this mapping only after copying `/opt/nifi/nifi-current/conf` out of the container.
- `./nifi/logs:/opt/nifi/nifi-current/logs` keeps NiFi's log files on the host.
- `./nifi/input_directory` and `./nifi/output_directory` are mapped to directories of the same name under `/opt/nifi/nifi-current`. They give you folders for exchanging files between the host and your flows, for example as the source directory of a `GetFile` processor or the target directory of a `PutFile` processor.
These volume mappings enable Apache NiFi to store its various data repositories, configuration files, and logs outside the container, ensuring persistence and easy access to these files even if the container is stopped or removed.
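If a bind-mount source folder does not exist, Docker creates it, and on Linux that folder is typically owned by root. Assuming both images run their service as a non-root user (UID 1000), NiFi and NiFi Registry may then fail with permission errors when writing to those folders. A minimal sketch that creates the host folders up front; run it from the directory that contains `docker-compose.yml`. The function name `create_nifi_dirs` is hypothetical, and `conf` is left out on purpose because it should be filled by copying the container's configuration rather than starting empty.

```shell
#!/bin/sh
# Hypothetical helper: pre-create the host folders used by the bind mounts in
# docker-compose.yml so they are owned by the current user instead of root.
create_nifi_dirs() {
  for d in database_repository flowfile_repository content_repository \
           provenance_repository state logs input_directory output_directory; do
    mkdir -p "nifi/$d"
  done
  # Folders for the NiFi Registry service
  mkdir -p nifi_registry/database nifi_registry/flow_storage
}

create_nifi_dirs
ls -d nifi/* nifi_registry/*
```

If your own UID is not 1000, you may additionally need to give UID 1000 write access, for example with `sudo chown -R 1000:1000 nifi nifi_registry`.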
Apache NiFi Registry is a companion service that stores versioned flows. From the NiFi UI you can put a process group under version control, commit its changes to a bucket in the Registry, and later import a specific version into the same or another NiFi instance. This gives you a change history for your flows, the ability to revert uncommitted local changes, and a way to promote flows between environments such as development and production. When the Registry is secured, access policies control who can read or write each bucket. To use it in this setup, add a registry client in NiFi's Controller Settings that points to `http://registry:18080`, the Compose service name and container port of the Registry.
Step 2: Start Apache NiFi
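Save the `docker-compose.yml` file, then run the following command from the same directory:

```shell
docker compose up -d
```

This command downloads the ZooKeeper, NiFi Registry, and NiFi images if they are not already available locally and starts three containers: `zookeeper_container_persistent`, `registry_container_persistent`, and `nifi_container_persistent`. The `-d` flag runs them in detached mode, in the background.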
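The comment in the Compose file says to copy NiFi's `/conf` directory out of the container before enabling the `./nifi/conf` mapping. While the containers are running and no `conf` bind mount is active, `docker cp` can do this. The function below is a hypothetical wrapper around that command; it assumes the container name `nifi_container_persistent` and the configuration path `/opt/nifi/nifi-current/conf`, both taken from `docker-compose.yml`.

```shell
#!/bin/sh
# Hypothetical helper: copy the default NiFi configuration from the running
# container into ./nifi/conf on the host.
copy_nifi_conf() {
  container="${1:-nifi_container_persistent}"   # container_name from docker-compose.yml

  if ! command -v docker >/dev/null 2>&1; then
    echo "docker is not installed or not on PATH" >&2
    return 1
  fi

  mkdir -p nifi
  # Because ./nifi/ is an existing directory, 'docker cp' places the copied
  # 'conf' directory inside it, producing ./nifi/conf.
  docker cp "$container:/opt/nifi/nifi-current/conf" ./nifi/ || return 1
  echo "Copied configuration to ./nifi/conf"
}
```

After it succeeds, uncomment the `#- ./nifi/conf:/opt/nifi/nifi-current/conf` line and run `docker compose up -d` again so that Compose recreates the NiFi container with the new mount.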
Step 3: Access Apache NiFi
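Once the containers are running, open your web browser and go to `http://localhost:8091/nifi`. The Compose file publishes NiFi's container port `8080` on host port `8091`. NiFi can take a minute or two to finish starting, so the page may not load immediately. The NiFi Registry UI is available at `http://localhost:18080/nifi-registry`.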
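Because NiFi takes a while to start, it can help to wait until the UI responds before opening the browser or running scripts against it. A minimal polling sketch, assuming `curl` is installed on the host and the `'8091:8080'` port mapping from the Compose file; the function name `wait_for_nifi`, the 180-second default limit, and the 5-second interval are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: poll the NiFi UI until it responds or a time limit expires.
# Defaults assume the '8091:8080' port mapping from docker-compose.yml.
wait_for_nifi() {
  url="${1:-http://localhost:8091/nifi/}"
  max_wait="${2:-180}"   # total seconds to wait
  interval="${3:-5}"     # seconds between attempts
  elapsed=0

  while [ "$elapsed" -lt "$max_wait" ]; do
    # -s: silent, -f: treat HTTP status >= 400 as failure,
    # -o /dev/null: discard the body, --max-time: cap each attempt
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "NiFi is responding at $url"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "Gave up after ${max_wait}s waiting for $url" >&2
  return 1
}
```

For example, after `docker compose up -d`, calling `wait_for_nifi` blocks until the UI answers. A redirect also counts as success, because `curl -f` only fails on status codes of 400 and above.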
Step 4: Stop and Remove Apache NiFi
To stop and remove the Apache NiFi container, run the following command:
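```shell
docker compose down
```

This command stops and removes the containers and the network defined in the `docker-compose.yml` file. The images and the bind-mounted directories on the host (`./nifi` and `./nifi_registry`) are not removed, so your repositories, state, and logs are still there the next time you run `docker compose up -d`.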
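Since the bind-mounted data lives in `./nifi` and `./nifi_registry` on the host, backing up the setup can be as simple as archiving those folders while the containers are stopped. A minimal sketch using `tar`; the function name `backup_nifi_data` and the archive naming scheme are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: archive the bind-mounted NiFi and NiFi Registry folders
# into a timestamped tarball. Run it after 'docker compose down' so no files
# are being written during the copy.
backup_nifi_data() {
  dest_dir="${1:-.}"   # where to write the archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest_dir/nifi-backup-$stamp.tar.gz"

  # Collect only the folders that exist, reusing the positional parameters as a list.
  set --
  for d in nifi nifi_registry; do
    if [ -d "$d" ]; then
      set -- "$@" "$d"
    fi
  done

  if [ "$#" -eq 0 ]; then
    echo "No nifi/ or nifi_registry/ folder found in $(pwd)" >&2
    return 1
  fi

  tar -czf "$archive" "$@" || return 1
  echo "$archive"   # print the archive path for the caller
}
```

To restore, stop the containers and extract the archive in the project directory, for example with `tar -xzf nifi-backup-<timestamp>.tar.gz`.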
Conclusion
In this guide, you learned how to run Apache NiFi on Docker using Docker Compose. Docker Compose makes it easy to define and manage multi-container applications, allowing you to quickly spin up and tear down NiFi instances for your data processing needs. Experiment with different configurations and scale your NiFi deployment as needed to handle various data integration tasks.
Feel free to adjust the Docker Compose file according to your specific requirements, such as configuring additional environment variables or volumes.