Difference between revisions of "OS4X Directory Scanner"

Latest revision as of 14:12, 7 March 2024

What is the OS4X Directory Scanner?

The directory scanner scans configured directories (by default without recursion) for new files (older than 60 seconds) and applies a matching pattern to them. If the pattern matches, the file is moved to the configured outgoing directory and an executable is started with the parameters defined for this directory scanner entry, based on either fixed or dynamic values.

The OS4X Directory Scanner has been available since OS4X 3 as part of OS4X 3 Core.

Configuration of scanning tasks

Using the directory scanner requires some configuration via the web interface and, optionally, additional configuration on the filesystem (if you want to modify the behaviour in more depth).

Queues menu.png

Menu entry

The menu entry "Dir.scanner" in the administrative web interface exists if the binary

os4x_ds_dryrun

exists in the installation directory for binaries of OS4X.

Clicking on that link shows the currently configured directory scanner entries. In a default installation, this view is empty.

Dirscanner empty.png

You can click on "New" or the empty paper icon to create a new entry. In order to edit an entry, click on the edit icon.

The following screenshot shows the edit page of an existing directory scanner entry:

Dirscanner scantest.png

Name

The name of the directory scanner entry can be any human-readable string. It only appears in the logs.

Directory

The directory on which the directory scanner works. Remember that, by default, only that directory is scanned, without its subdirectories. The configured outgoing directory cannot be used here, since files are moved into that directory before the command for a found file is executed.

Regular expression

The file name found in the configured directory must match this regular expression. Regular expressions are quite complex but very powerful. Applying the expression to the name of the found file must result in a true value (which means that any output of the regular expression is valid, except the empty string). The engine compiling these regular expression values is always PCRE, which implements Perl-style regular expressions, which are widely used across different systems.

If the regular expression is not correct, the directory scanner will identify this situation, add a log entry to the system log and disable this configured directory scanner configuration.

Since OS4X version 2018-10-25, you can choose the engine for the regular expression (before that point, POSIX was the default engine). With the PCRE engine, regular expressions can be much more expressive, which helps greatly in selecting the correct file or directory.
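The behaviour described above (an invalid expression disables the entry, a valid one is tested against the file name) can be sketched as follows. This is only an illustration: it uses Python's `re` module, which is Perl-style but not identical to PCRE or POSIX, the function names are hypothetical, and the use of a search (rather than a full match) is an assumption.

```python
import re

def compile_or_disable(pattern):
    """Return a compiled pattern, or None if the expression is invalid.

    Returning None stands in for "log the problem and disable the entry".
    """
    try:
        return re.compile(pattern)
    except re.error:
        return None

def matches(pattern, filename):
    """True if the pattern is valid and finds a match in the file name."""
    compiled = compile_or_disable(pattern)
    return compiled is not None and compiled.search(filename) is not None
```

Before saving an entry, it is worth checking the expression against a few real file names, for example with the dry-run function described further down.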

Ignore files with suffix ".part"

Many upload tools append a filename suffix ".part" during the transmission and remove it afterwards. These files can be ignored by enabling this checkbox. This is also a handy feature if serialization of incoming OFTP2 files is enabled.

Age of entry

With the given age, only entities (files or directories, depending on your configuration) which are older than this number of seconds are picked up. The minimum age of entries must be at least the value of the send queue daemon timeslice value.
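A minimal sketch of such an age check, assuming the modification time is the timestamp that is compared (the text above does not say which timestamp OS4X uses); the function names are hypothetical:

```python
import os
import time

def is_old_enough(mtime, min_age_seconds, now):
    """True if the entry is strictly older than the configured age."""
    return (now - mtime) > min_age_seconds

def file_is_old_enough(path, min_age_seconds):
    """Apply the age check to a real file, using its modification time."""
    return is_old_enough(os.path.getmtime(path), min_age_seconds, time.time())
```

The point of the age limit is to avoid picking up files which are still being written.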

Recursive search path depth

You can influence how "deep" the directory scanner scans for valid entries. By default, the directory scanner scans for objects only in the configured directory (depth: 0). If you want to dig deeper, you can give a valid depth value.
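To illustrate what the depth value means, here is a small sketch of a depth-limited scan. It is not the scanner's implementation; the function name `scan` and its parameters are hypothetical.

```python
import os

def scan(root, max_depth=0):
    """Yield file paths at most max_depth levels below root (0 = root only)."""
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # stop descending below the configured depth
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

With depth 0 only files directly in the configured directory are returned; depth 1 adds files in its immediate subdirectories, and so on.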

Type selection

If OS4X Enterprise is configured and licensed, you have the option to

  • Scan for files, handled for OS4X Core enqueueing
  • Scan for directories or files to be used for OS4X Enterprise job creation

Depending on your choice, you get a different configuration view:

OS4X Core

Configuration values types

A found file matching the configured regular expression leads to a number of parameters which are then passed to the executable for later use. There are two types of configuration values you can use for every single configuration parameter:

fixed values

The easiest way to use a configuration value is to preset it with a fixed value. This is usually a good choice if, for example, the directory is partner-based and the configuration of the communication partner is fixed (because the file resides in that configured directory).

variable values

Another way to obtain a configuration value is to extract it from the found file. The found filename (without path) is passed to the configured regular expression, and the first variable definition (normally enclosed in round brackets: '(' and ')') provides the value. Subsequent variable extractions are ignored. If no variable value can be extracted by the configured regular expression from the given file, an empty string is used as the parameter value.
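The extraction rule can be sketched like this. The sketch uses Python's `re` module as a stand-in for the configured engine, and `extract_value` is a hypothetical name:

```python
import re

def extract_value(pattern, filename):
    """Return the first bracketed group of the match, or "" if unavailable."""
    match = re.search(pattern, filename)
    if match is None or match.re.groups == 0:
        return ""  # no match, or the pattern defines no group at all
    # Only the first group counts; later groups are ignored.
    return match.group(1) or ""
```

For a file named `ACME_42.dat` and the pattern `^([A-Z]+)_(\d+)\.dat$`, only the first group (`ACME`) is used; the second group is ignored.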

"matching pattern activates functionality" configuration values

Some parameters are activated whenever the returned value is non-empty, so even a zero ("0") activates the functionality. The content of the value is not interpreted: configure a value only if you want to enable the functionality, and leave it empty otherwise.
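In other words, the check is purely "empty or not". A short sketch, with hypothetical function and switch names:

```python
def is_activated(value):
    """A switch is active for any non-empty value, including "0"."""
    return value != ""

def active_switches(values):
    """Return the names of all switches whose value is non-empty."""
    return sorted(name for name, value in values.items() if is_activated(value))
```

Note that this differs from a typical truth test, where "0" or "no" would usually count as disabled.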

Configuration values

The following parameters are available and are passed to the executable configured below:

Partner selection

This parameter normally defines a partner shortname. It is used by the enqueueing script.

Virtual filename selection

Since the file has one name on the filesystem and a separate name during transport (and later on at the partner's receiving side), you have to define a virtual filename.

Comment selection

This comment will be put into the comment field of the enqueued file when using the standard enqueueing process.

Originator SFID / Destination SFID

For a separate sender's and receiver's SFID extraction, this value defines with which SFID the file will be sent. Leave empty if you want to use the partner's default configuration.

Passive switch selection

If the found file should be enqueued passively, the value of this configuration parameter must be non-empty (see "os4xeq", parameter "-P").

Binary transfer mode selection

If the default transfer mode of "binary" should be used instead of fixed or variable record length, this parameter activates this functionality if a non-empty value is returned.

Fixed record length transfer mode selection

If the default transfer mode should be overridden and "fixed record length" files should be transferred, a non-empty return value activates this functionality.

Variable record length transfer mode selection

If the default transfer mode should be overridden and "variable record length" files should be transferred, a non-empty return value activates this functionality.

record length selection

If a non-binary transfer mode is used for the found file, you have to define which record length is being used (max.: 2048). This value is ignored in binary transfer mode.

Execution

As stated before, a found file matching the regular expression pattern has a number of configuration values. These parameters are passed to an executable whose task is to handle them. You can insert any executable you want: you can script your own or use a preset included in the standard installation.

The preset is:

  • "OS4X Core enqueueing" (dirscanner_os4xeq.sh): This script parses all parameters correctly to enqueue the found file to the OS4X send queue with the given parameters.

The presets are not fixed; you may insert any executable you want.
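When writing your own executable, a useful first step is a script that only records the arguments it receives, so you can see what the directory scanner actually passes before acting on it. The sketch below makes no assumption about the number or order of parameters; the function name is hypothetical.

```python
import sys

def describe_arguments(argv):
    """Return one line per positional argument, numbered from 1."""
    return [f"arg {i}: {value!r}" for i, value in enumerate(argv[1:], start=1)]

if __name__ == "__main__":
    # Write to stderr so that the output is clearly diagnostic.
    for line in describe_arguments(sys.argv):
        print(line, file=sys.stderr)
```

Empty parameters show up as `''`, which makes it easy to spot values that no regular expression extracted.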

OS4X Enterprise send job

If you want to create OS4X send jobs automatically via the directory scanner, you can switch to "OS4X Enterprise send job" mode.

For every single send job created by the directory scanner, a new directory will be created in the configured outgoing directory with a configured name prefix, appended by the dynamic job number.

Scan for

You can create OS4X send jobs from single files (which match your regular expression configured above) or from directories (which are scanned for files within them and in their subdirectories). Every single file is moved into the created outgoing send job directory, and the original directory is removed.

Sender selection

The sender of the job will be defined here. Only jobs with a valid (non-deleted) sender are created. If the sender is not valid, the directory scanner entry will be deactivated dynamically by the directory scanner.

Sender selection (regular expression)

By defining a regular expression, the PCRE engine will be used to extract the value (first match) of the regexp. This value is being searched in all fields for person search (see below). If exactly one value is found, this person is used as the sender entity. The entity must be active.

Recipient selection

The recipient of the job will be defined here. Only jobs with a valid (non-deleted) recipient are created. If the recipient is not valid, the directory scanner entry will be deactivated dynamically by the directory scanner.

Remember that the plugin group for send jobs of the recipient of the job will be executed, which can be configured at user, department, location or company level.

Recipient selection (regular expression)

By defining a regular expression, the PCRE engine will be used to extract the value (first match) of the regexp. This value is being searched in all fields for person search (see below). If exactly one value is found, this person is used as the recipient entity. The entity must be active.

Job comment

An optional job comment can be added to the send job.

OS4X Enterprise receive job

If you want to create OS4X receive jobs automatically via the directory scanner, you can switch to "OS4X Enterprise receive job" mode.

For every single receive job created by the directory scanner, a new directory will be created in the configured outgoing directory with a configured name prefix, appended by the dynamic job number.

Scan for

You can create OS4X receive jobs from single files (which match your regular expression configured above) or from directories (which are scanned for files within them and in their subdirectories). Every single file is moved into the created outgoing receive job directory, and the original directory is removed.

Sender selection

The sender of the job will be defined here. Only jobs with a valid (non-deleted) sender are created. If the sender is not valid, the directory scanner entry will be deactivated dynamically by the directory scanner.

Sender selection (regular expression)

By defining a regular expression, the PCRE engine will be used to extract the value (first match) of the regexp. This value is being searched in all fields for person search (see below). If exactly one value is found, this person is used as the sender entity. The entity must be active.

Recipient selection

The recipient of the job will be defined here. Only jobs with a valid (non-deleted) recipient are created. If the recipient is not valid, the directory scanner entry will be deactivated dynamically by the directory scanner.

Remember that the plugin group for receive jobs of the recipient of the job will be executed, which can be configured at user, department, location or company level.

Recipient selection (regular expression)

By defining a regular expression, the PCRE engine will be used to extract the value (first match) of the regexp. This value is being searched in all fields for person search (see below). If exactly one value is found, this person is used as the recipient entity. The entity must be active.

Job comment

An optional job comment can be added to the receive job.

Person regular expression

For OS4X Enterprise send or receive jobs, persons can be looked up dynamically by means of PCRE regular expressions. The regular expression must return a match with a first value. This value is then searched for in the following fields to find a unique active person entry; the first occurrence defines the value:

  • address code
  • username
  • recipient's comment
  • API key
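One possible reading of this lookup is sketched below. Several points are assumptions rather than documented behaviour: the "first value" is taken to be the first bracketed group, the fields are tried in the order listed, and a field only counts if it identifies exactly one active person. The field keys and the function name are hypothetical, and Python's `re` stands in for PCRE.

```python
import re

# Hypothetical keys for the four searched fields, in the listed order.
SEARCH_FIELDS = ("address_code", "username", "comment", "api_key")

def find_person(pattern, name, persons):
    """Return the unique active person identified by the extracted value."""
    match = re.search(pattern, name)
    if match is None or match.re.groups == 0 or not match.group(1):
        return None  # nothing extracted, so no person can be selected
    value = match.group(1)
    for field in SEARCH_FIELDS:
        hits = [p for p in persons if p.get("active") and p.get(field) == value]
        if len(hits) == 1:
            return hits[0]
    return None
```

If no field yields a unique active person, no sender or recipient is selected in this sketch.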

Sorting order

The OS4X Directory Scanner scans directories in the order of the configuration, so keep in mind that entries at the top of the list are scanned first. You may want to configure the same directory several times with different regular expressions for file name matching, which reduces the number of scanned directories at the cost of more complex file name patterns. Since several regular expressions may match the same file, you can arrange your entries in the desired order by clicking the icons on the right-hand side of the directory list (up and down).

Dirscanner movedown.png
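Why the order matters can be shown with a short sketch: a found file is moved away once an entry has matched it, so the first matching entry in the list handles the file. The entry names, patterns and function name below are hypothetical.

```python
import re

def first_matching_entry(entries, filename):
    """Return the name of the first entry whose pattern matches, else None."""
    for name, pattern in entries:
        if re.search(pattern, filename):
            return name
    return None

entries = [
    ("invoices", r"^INV_.*\.xml$"),  # specific entry first
    ("catch-all", r".*"),            # generic entry last
]
```

If the catch-all entry were placed first, it would take every file and the more specific entry would never see one.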

Enable / disable entry

To test entries, you can enable and disable directory scanner entries (OS4X also disables an entry itself if something is not configured correctly). Disabled entries are not used by the directory scanner and are displayed as a grey line. To disable an entry, click on the third icon on the left-hand side, titled "deactivate directory entry '...'":

Dirscanner disable.png

To enable an entry, click on the icon "activate directory entry '...'":

Dirscanner enable.png

Preview / Dry-run

A preview shows you via the web interface what would happen if the directory scanner started to work on the selected entry. Click on the icon "dry-run directory entry '...' for verification" to start the process:

Dirscanner start dryrun.png

The newly opened window shows you what would happen:

Dirscanner dryrun.png

Clone entry

For faster configuration, a directory scanner entry can be cloned. Every configuration parameter is copied into the cloned entry, and a new name has to be given to the entry. Click on the icon "Clone" to clone an entry:

Dirscanner clone.png

Then configure all parameters for the cloned entry:

Dirscanner clone2.png

Delete entry

In order to delete a directory scanner entry, click on the trash icon titled "delete directory entry '...'":

Dirscanner delete.png

Then confirm the deletion:

Dirscanner delete2.png

Logging

Logging will be done in general for the following items:

  • a regular expression is not valid
  • a directory is not accessible
  • a file which should be moved by the directory scanner cannot be moved (this covers moves within one filesystem as well as moves across filesystems)

In addition, logging of every found file and every command execution can be enabled globally via a configuration parameter ("Configuration" -> "Logging" -> "Enable directory scanner logging?"). If this configuration parameter is enabled, every single file which has been found by the directory scanner and which matches the configured regular expression will be logged, including the time and date, the script, all parameters, the return code of the script and its output. Log vault functionality is available here, too.

often used regular expressions

  • Everything (any file):
.*

external links

Since Regular Expressions are not everybody's best friend, some handy tools are available online for testing and verifying regular expressions. Some are listed here, but they may be offline from time to time. Use your favorite search engine to look for tools helping with regular expressions.

  • https://regex101.com
  • http://regexpal.com/
  • http://www.fileformat.info/tool/regex.htm