I want to kill all my hadoop jobs automatically when my code encounters an unhandled exception. What is the best practice for doing this?
Thanks
6 Answers
Depending on the version, do:
version <2.3.0
Kill a hadoop job:
hadoop job -kill $jobId
You can get a list of all job IDs with:
hadoop job -list
version >=2.3.0
Kill a hadoop job:
yarn application -kill $ApplicationId
You can get a list of all application IDs with:
yarn application -list

The following commands are deprecated:
hadoop job -list
hadoop job -kill $jobId
Consider using these instead:
mapred job -list
mapred job -kill $jobId

Run the list command to show all the jobs, then use the job ID or application ID in the appropriate kill command.
Kill mapred jobs:
mapred job -list
mapred job -kill <jobId>
Kill yarn jobs:
yarn application -list
yarn application -kill <ApplicationId>

An unhandled exception will (assuming it is repeatable, like bad data, as opposed to read errors from a particular data node) eventually fail the job anyway.
You can configure the maximum number of times a particular map or reduce task can fail before the entire job fails through the following properties:
mapred.map.max.attempts - The maximum number of attempts per map task. In other words, the framework will try to execute a map task this many times before giving up on it.
mapred.reduce.max.attempts - Same as above, but for reduce tasks.
If you want the job to fail at the first task failure, set both values to 1 (the default is 4).
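As a rough sketch of the fail-fast setting described above, the helper below prints the generic -D options for both properties. It assumes Hadoop 2 or later, where these properties are named mapreduce.map.maxattempts and mapreduce.reduce.maxattempts (the mapred.* names in this answer are the older spellings), and a driver that parses generic options through ToolRunner. The function name, jar, class, and paths are hypothetical.

```shell
#!/usr/bin/env bash

# Print -D options that cap map and reduce task attempts.
# Assumption: Hadoop 2+ property names; default cap here is 1 (fail fast).
fail_fast_opts() {
  local attempts="${1:-1}"
  printf '%s\n' "-D mapreduce.map.maxattempts=${attempts} -D mapreduce.reduce.maxattempts=${attempts}"
}

# Example call (hypothetical jar, driver class, and paths; left commented
# out because it needs a Hadoop installation):
# hadoop jar my-job.jar com.example.MyDriver $(fail_fast_opts) /input /output
```

With the attempt limit at 1, the first failed task fails the whole job, so nothing is left running for the driver to clean up.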

Simply force-kill the process ID; the hadoop job will also be killed automatically. Use this command:
kill -9 <process_id>
e.g. process ID 4040 (namenode):
username@hostname:~$ kill -9 4040

Use the commands below to kill all jobs running on yarn.
For accepted jobs, use the command below.
for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
For running jobs, use the command below.
for x in $(yarn application -list -appStates RUNNING | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
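
To tie the two loops above back to the original question, here is a minimal sketch of a wrapper that runs a driver command and kills the YARN applications only if the driver exits with a non-zero status (which is what a JVM does after an unhandled exception in its main thread). The function names kill_yarn_apps and run_or_kill are made up for illustration. Like the loops above, it kills every application the listing returns, not only those started by the driver.

```shell
#!/usr/bin/env bash

# Kill every YARN application in the given comma-separated states.
kill_yarn_apps() {
  local states="${1:-ACCEPTED,RUNNING}"
  local app_id
  # Match on the application_ prefix instead of skipping a fixed number of
  # header lines, so any extra header or log lines are ignored.
  yarn application -list -appStates "$states" 2>/dev/null |
    awk '$1 ~ /^application_/ { print $1 }' |
    while read -r app_id; do
      # </dev/null stops the kill command from consuming the ID list on stdin
      yarn application -kill "$app_id" </dev/null
    done
}

# Run a driver command; if it exits non-zero, kill the applications and
# return the driver's exit status to the caller.
run_or_kill() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "driver exited with status $status; killing YARN applications" >&2
    kill_yarn_apps
  fi
  return "$status"
}
```

It could be called as run_or_kill hadoop jar my-job.jar com.example.MyDriver, where the jar and class names are placeholders.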