By ergemp

Using Zookeeper for FLUME configurations

Flume is a simple and robust Apache project for streaming data ingestion into Apache Hadoop HDFS. I generally use Flume for fast, simple, real-time ingestion from Apache Kafka into HDFS, and for ingesting log files as well. One problem I face with Flume is that there is no cluster option for the Flume processes. When a Flume process dies, it has to be started again before the streaming ingestion resumes. That is enough if only the individual process went down and the OS layer is still up. But what if the OS layer goes down? Then a second (standby) server should already be up and waiting for a fast switch-over.

The first option that comes to mind is to install Apache Flume on a standby server and replicate all of the Flume configuration files to it. If something goes wrong, the Flume processes are started on the standby server manually with flume-ng, or by a bash script that automates the flume-ng calls. This may be the only option, but the heavy lifting of replicating the configuration files to the standby servers can be handed over to Zookeeper by storing the Flume configuration files in Zookeeper nodes. flume-ng can then connect to Zookeeper to read the configuration and run it.


Installing Zookeeper


For this, all you need is a running Zookeeper installation. You should already have one if you have clustered HDFS name-nodes in your environment. Just in case you don't, see the Zookeeper Documentation.


Creating Zookeeper Directories


The first step is to create a node in the Zookeeper tree. I will create the /flume-configs2 node, connecting through the first server of my Zookeeper cluster.


[hadoop@dwh-hadoop-nn1 ~]$ /u01/hadoop/zookeeper/bin/zkCli.sh \
-server dwh-zookeeper001 create /flume-configs2 "flume-configs2" 

Connecting to dwh-zookeeper001
2018-11-21 20:28:14,886 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.12-e5259e437540f349646870ea94dc2658c4e44b3b, built on 03/27/2018 03:55 GMT
2018-11-21 20:28:14,889 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=dwh-hadoop-nn1
2018-11-21 20:28:14,889 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_181
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/u01/hadoop/jdk/jre
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/u01/hadoop/zookeeper/bin/../build/classes:/u01/hadoop/zookeeper/bin/../build/lib/*.jar:/u01/hadoop/zookeeper/bin/../lib/slf4j-log4j12-1.7.25.jar:/u01/hadoop/zookeeper/bin/../lib/slf4j-api-1.7.25.jar:/u01/hadoop/zookeeper/bin/../lib/netty-3.10.6.Final.jar:/u01/hadoop/zookeeper/bin/../lib/log4j-1.2.17.jar:/u01/hadoop/zookeeper/bin/../lib/jline-0.9.94.jar:/u01/hadoop/zookeeper/bin/../lib/audience-annotations-0.5.0.jar:/u01/hadoop/zookeeper/bin/../zookeeper-3.4.12.jar:/u01/hadoop/zookeeper/bin/../src/java/lib/*.jar:/u01/hadoop/zookeeper/bin/../conf:
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.10.0-693.11.1.el7.x86_64
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=hadoop
2018-11-21 20:28:14,891 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/hadoop
2018-11-21 20:28:14,892 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hadoop
2018-11-21 20:28:14,892 [myid:] - INFO  [main:ZooKeeper@441] - Initiating client connection, connectString=dwh-zookeeper001 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
2018-11-21 20:28:14,909 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@1028] - Opening socket connection to server dwh-zookeeper001/10.100.2.189:2181. Will not attempt to authenticate using SASL (unknown error)
2018-11-21 20:28:14,955 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@878] - Socket connection established to dwh-zookeeper001/10.100.2.189:2181, initiating session
2018-11-21 20:28:14,961 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server dwh-zookeeper001/10.100.2.189:2181, sessionid = 0x100036d0a6d0034, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
Created /flume-configs2

Import Flume configuration to Zookeeper


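Now that the root node is created, I am going to create the first configuration node and put the contents of the Flume configuration file in it. The node is named after the Flume agent, because flume-ng looks up the configuration under the base path by agent name.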


[hadoop@dwh-hadoop-nn1 ~]$ /u01/hadoop/zookeeper/bin/zkCli.sh \
-server dwh-zookeeper001 \
create /flume-configs2/delphoi-to-dwh-hadoop-all "$(cat /u01/hadoop/flume/conf/delphoi-to-dwh-hadoop-all)"

Connecting to dwh-zookeeper001
2018-11-21 20:28:41,083 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.12-e5259e437540f349646870ea94dc2658c4e44b3b, built on 03/27/2018 03:55 GMT
2018-11-21 20:28:41,085 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=dwh-hadoop-nn1
2018-11-21 20:28:41,086 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_181
2018-11-21 20:28:41,087 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2018-11-21 20:28:41,087 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/u01/hadoop/jdk/jre
2018-11-21 20:28:41,087 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/u01/hadoop/zookeeper/bin/../build/classes:/u01/hadoop/zookeeper/bin/../build/lib/*.jar:/u01/hadoop/zookeeper/bin/../lib/slf4j-log4j12-1.7.25.jar:/u01/hadoop/zookeeper/bin/../lib/slf4j-api-1.7.25.jar:/u01/hadoop/zookeeper/bin/../lib/netty-3.10.6.Final.jar:/u01/hadoop/zookeeper/bin/../lib/log4j-1.2.17.jar:/u01/hadoop/zookeeper/bin/../lib/jline-0.9.94.jar:/u01/hadoop/zookeeper/bin/../lib/audience-annotations-0.5.0.jar:/u01/hadoop/zookeeper/bin/../zookeeper-3.4.12.jar:/u01/hadoop/zookeeper/bin/../src/java/lib/*.jar:/u01/hadoop/zookeeper/bin/../conf:
2018-11-21 20:28:41,087 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2018-11-21 20:28:41,088 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2018-11-21 20:28:41,088 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2018-11-21 20:28:41,088 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2018-11-21 20:28:41,088 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2018-11-21 20:28:41,088 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.10.0-693.11.1.el7.x86_64
2018-11-21 20:28:41,088 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=hadoop
2018-11-21 20:28:41,088 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/hadoop
2018-11-21 20:28:41,088 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hadoop
2018-11-21 20:28:41,089 [myid:] - INFO  [main:ZooKeeper@441] - Initiating client connection, connectString=dwh-zookeeper001 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
2018-11-21 20:28:41,107 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@1028] - Opening socket connection to server dwh-zookeeper001/10.100.2.189:2181. Will not attempt to authenticate using SASL (unknown error)
2018-11-21 20:28:41,156 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@878] - Socket connection established to dwh-zookeeper001/10.100.2.189:2181, initiating session
2018-11-21 20:28:41,162 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server dwh-zookeeper001/10.100.2.189:2181, sessionid = 0x100036d0a6d0036, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
Created /flume-configs2/delphoi-to-dwh-hadoop-all
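
When the configuration file changes later, the node has to be updated instead of created again, because create fails on a node that already exists. The following is a minimal sketch of such an update, assuming the same paths and server as above. The function name update_flume_config and the ZKCLI and ZK_SERVER variables are my own illustrative names, not part of Zookeeper or Flume. It relies on the zkCli set command, which overwrites the data of an existing node.

```shell
# Assumed defaults, taken from the commands used in this post.
ZKCLI="${ZKCLI:-/u01/hadoop/zookeeper/bin/zkCli.sh}"
ZK_SERVER="${ZK_SERVER:-dwh-zookeeper001}"

# Hypothetical helper: overwrite the data of an existing configuration node
# with the contents of a local Flume configuration file.
update_flume_config() {
  local znode="$1" conf_file="$2"
  if [ ! -r "$conf_file" ]; then
    echo "cannot read $conf_file" >&2
    return 1
  fi
  # "set" replaces the node data; "create" would fail on an existing node.
  "$ZKCLI" -server "$ZK_SERVER" set "$znode" "$(cat "$conf_file")"
}
```

It could be called as update_flume_config /flume-configs2/delphoi-to-dwh-hadoop-all /u01/hadoop/flume/conf/delphoi-to-dwh-hadoop-all.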

To check the nodes in Zookeeper, use the ls command as follows. The get command returns the contents of a node.


[hadoop@dwh-hadoop-nn1 ~]$ /u01/hadoop/zookeeper/bin/zkCli.sh \
-server dwh-zookeeper001 \
ls /flume-configs2
Connecting to dwh-zookeeper001
2018-11-21 20:29:00,115 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.12-e5259e437540f349646870ea94dc2658c4e44b3b, built on 03/27/2018 03:55 GMT
2018-11-21 20:29:00,118 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=dwh-hadoop-nn1
2018-11-21 20:29:00,118 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_181
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/u01/hadoop/jdk/jre
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/u01/hadoop/zookeeper/bin/../build/classes:/u01/hadoop/zookeeper/bin/../build/lib/*.jar:/u01/hadoop/zookeeper/bin/../lib/slf4j-log4j12-1.7.25.jar:/u01/hadoop/zookeeper/bin/../lib/slf4j-api-1.7.25.jar:/u01/hadoop/zookeeper/bin/../lib/netty-3.10.6.Final.jar:/u01/hadoop/zookeeper/bin/../lib/log4j-1.2.17.jar:/u01/hadoop/zookeeper/bin/../lib/jline-0.9.94.jar:/u01/hadoop/zookeeper/bin/../lib/audience-annotations-0.5.0.jar:/u01/hadoop/zookeeper/bin/../zookeeper-3.4.12.jar:/u01/hadoop/zookeeper/bin/../src/java/lib/*.jar:/u01/hadoop/zookeeper/bin/../conf:
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.10.0-693.11.1.el7.x86_64
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=hadoop
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/hadoop
2018-11-21 20:29:00,120 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hadoop
2018-11-21 20:29:00,121 [myid:] - INFO  [main:ZooKeeper@441] - Initiating client connection, connectString=dwh-zookeeper001 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
2018-11-21 20:29:00,138 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@1028] - Opening socket connection to server dwh-zookeeper001/10.100.2.189:2181. Will not attempt to authenticate using SASL (unknown error)
2018-11-21 20:29:00,184 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@878] - Socket connection established to dwh-zookeeper001/10.100.2.189:2181, initiating session
2018-11-21 20:29:00,190 [myid:] - INFO  [main-SendThread(dwh-zookeeper001:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server dwh-zookeeper001/10.100.2.189:2181, sessionid = 0x100036d0a6d0038, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[delphoi-to-dwh-hadoop-all]
[hadoop@dwh-hadoop-nn1 ~]$ 
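
The get command is not shown above, so here is a small sketch of how it could be used to check what was imported. As the transcripts show, zkCli.sh prints a lot of client logging around the actual result, so this sketch keeps only the lines that start with the agent name, which is how Flume property keys begin. The function name show_flume_config and the ZKCLI and ZK_SERVER variables are hypothetical names for illustration. Comment lines and blank lines of the stored file are dropped by the filter.

```shell
# Assumed defaults, taken from the commands used in this post.
ZKCLI="${ZKCLI:-/u01/hadoop/zookeeper/bin/zkCli.sh}"
ZK_SERVER="${ZK_SERVER:-dwh-zookeeper001}"

# Hypothetical helper: print only the property lines of a stored agent config.
show_flume_config() {
  local agent="$1"
  # Keep the lines whose text starts with "<agent>." and drop everything else
  # (client log lines, the WATCHER block, any trailing node metadata).
  "$ZKCLI" -server "$ZK_SERVER" get "/flume-configs2/$agent" 2>/dev/null |
    awk -v prefix="$agent." 'index($0, prefix) == 1'
}
```

For the node created above, the call would be show_flume_config delphoi-to-dwh-hadoop-all.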

Run Flume with the Configs in the Zookeeper


Since the Flume configuration is now stored in a Zookeeper node, flume-ng can run with the configuration held in Zookeeper. To make Flume use this configuration, run it with the following parameters: -z takes the Zookeeper connection string, -p the base path of the configuration nodes, and -n the agent name, which is also the name of the node under the base path. My Zookeeper installation has three nodes, so I list all three, comma separated, for high availability. To run the Flume agent as a background process, I use nohup and an & at the end of the command.


nohup flume-ng agent -c ./conf -z dwh-zookeeper001:2181,dwh-zookeeper002:2181,dwh-zookeeper003:2181 -p /flume-configs2 -n delphoi-to-dwh-hadoop-all &
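
For a standby server, the command above can be wrapped in a small script so that starting an agent takes a single call. This is only a sketch under the assumptions of this post: the three Zookeeper hosts, port 2181 and the /flume-configs2 base path. The function names zk_connect_string and start_flume_agent are hypothetical helpers of mine, not part of Flume.

```shell
# Hypothetical helper: join hosts into "host:port,host:port,...".
zk_connect_string() {
  local port="$1" out="" host
  shift
  for host in "$@"; do
    # Prepend a comma only when out already holds an entry.
    out="${out:+$out,}$host:$port"
  done
  printf '%s\n' "$out"
}

# Hypothetical wrapper: start the named agent in the background,
# reading its configuration from Zookeeper.
start_flume_agent() {
  local agent="$1" zk
  zk="$(zk_connect_string 2181 dwh-zookeeper001 dwh-zookeeper002 dwh-zookeeper003)"
  nohup flume-ng agent -c ./conf -z "$zk" -p /flume-configs2 -n "$agent" &
  echo "started $agent with pid $!"
}
```

On the standby server, start_flume_agent delphoi-to-dwh-hadoop-all would then bring up the same agent without copying any configuration file.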
