What is the difference between directory monitoring and file watching?
Question:
The reference documentation contains two chapters, "Directory Monitoring" and "Directory Monitoring with File Orders". What is the difference between these features?
Answer:
Directory Monitoring:
is used to start jobs automatically when a file event is triggered in a directory. You can either monitor a directory to start a job or have a separate order created for every file.
Directory monitoring for job starts
You can have any job start automatically on changes to one or more directories by adding the <start_when_directory_changed> element to the job configuration. JobScheduler will start the job if an event is triggered in the directory for a file name that matches the given regular expression.
However, your job implementation will have to account for the fact that multiple files can arrive simultaneously. JobScheduler passes the file names to the job in the environment variable SCHEDULER_TASK_TRIGGER_FILES and via the API method spooler_task.trigger_files(). Multiple file names are separated by ";". Additionally, your job has to be careful when handling file names that contain spaces, as the examples below show. Moreover, it is up to your job implementation to move or remove the files from the input directory and to handle the respective errors. Therefore it can be more convenient to use file orders.
Example:
<job name="my_job"> <!-- for unix shell --> <script language = "shell"><![CDATA[ IFS=";" for trigger_file in ${SCHEDULER_TASK_TRIGGER_FILES} do echo $trigger_file mv "$trigger_file" /tmp/output done IFS=$' \t\n' exit 0 ]]</script> <!-- for windows shell --> <-- <script language = "shell"><![CDATA[ @echo off if not defined SCHEDULER_TASK_TRIGGER_FILES exit 0 set trigger_files=%SCHEDULER_TASK_TRIGGER_FILES:;=?% :loop for /F "usebackq tokens=1* delims=?" %%i in ('%trigger_files%') do ( set trigger_files=%%j @echo %%~fi move /y "%%~fi" \tmp\output goto loop ) exit 0 ]]></script> --> <start_when_directory_changed directory="/tmp" regex="sos.*"/> </job>
Directory Monitoring for File Orders
Starting with release 1.2.9 you can have orders created automatically for every file that appears in one of the monitored directories. This is done by adding one or more <file_order_source/> elements as the first job nodes of your job chain. For an explanation of orders see "What is the concept of job chains and order processing?"
For every directory covered by a <file_order_source> element, orders will be created automatically for files that match the given regular expression. You do not have to deal with concurrency issues: the order is created just once, and the file name is provided by the order parameter
order.params().value("scheduler_file_path")
. Once the file has been processed by your jobs, it will be moved or removed by a <file_order_sink> element at the end of the job chain. Should the file be deleted manually, JobScheduler will automatically remove the order unless it is currently being processed by a job node. As orders can be stored persistently in a JobScheduler database, processing can be resumed after a restart of JobScheduler. The same is true for errors during processing in job nodes: the order can be set back or the job node can be repeated after a given delay.
Example:
<job_chain name = "inbound_files"> <file_order_source directory = "/tmp/inbound" regex = "[^~]$" delay_after_error = "5"/> <file_order_source directory = "/tmp/inbound.add" regex = "[^~]$" delay_after_error = "5"/> <job_chain_node state = "convert" next_state = "transfer" error_state = "error" job = "file_convert"/> <job_chain_node state = "transfer" next_state = "success" error_state = "error"/> job = "file_transfer"/> <file_order_sink move_to = "/tmp/inbound.success" state = "success"/> <file_order_sink move_to = "/tmp/inbound.error" state = "error"/> </job_chain>
For more details see the reference documentation on Directory Monitoring with File Orders.