OS4X Enterprise - Fetch files from (S)FTP server

OS4X offers an easy way to create OS4X Enterprise receive jobs from FTP server content. This solution is based on two mechanisms:

  1. Mount the remote server directory as a local directory.
  2. Point the OS4X Directory Scanner at that mounted directory so that it creates OS4X Enterprise receive jobs.

This documentation covers all technical aspects of automatically fetching new files from an (S)FTP server.

Assumption

All details explained here are based on the freely available, pre-installed OS4X VMware image, which is also available for other virtualization solutions as an OVA. In general, all solutions explained here can be used in any modern Linux environment. All steps must be executed as user "root" unless the documentation explicitly says to switch to another user.

In this example, the following attributes are used:

  • User running OS4X: "www-data", group "www-data".
  • Target mount point for FTP directory: /mnt/ftp/server1
  • Target FTP server: 192.168.20.71, username "os4x", password "os4x"
  • Path on server containing files for OS4X Enterprise jobs: "to_be_retrieved"

The user connecting to the server must be allowed to delete files, since each file is deleted from the server after it has been transferred.

Install required packages

You need to install the following packages for (S)FTP mounting:

apt-get -y install sshfs curlftpfs
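
To confirm that the installation provided the expected tools, a small check like the following may help. This is a minimal sketch and not part of OS4X; the function name "require_cmds" is made up for illustration.

```shell
# Hypothetical helper (not part of OS4X): report any of the given
# commands that cannot be found in PATH.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage after the installation:
#   require_cmds sshfs curlftpfs && echo "all tools installed"
```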

Change user group membership

The user running OS4X (configured in "Configuration" -> "Daemon" -> "Run OS4X programs as user") must be added to the user group "fuse":

adduser www-data fuse

Change permissions of /dev/fuse

By default, the required device file "/dev/fuse" is only writable by user "root". We need to extend the permissions:

chgrp fuse /dev/fuse
chmod g+rw /dev/fuse
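
Both the group membership and the device permissions can be verified afterwards. The following is a minimal sketch under the assumption that GNU "stat" is available (as on Debian-based systems); the function names "user_in_group" and "group_can_rw" are hypothetical.

```shell
# Hypothetical helpers (not part of OS4X) to verify the FUSE prerequisites.

# Succeeds if user $1 is a member of group $2.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Succeeds if path $1 belongs to group $2 and the group may read and write it.
group_can_rw() {
    [ "$(stat -c '%G' "$1" 2>/dev/null)" = "$2" ] || return 1
    perms=$(stat -c '%A' "$1") || return 1   # e.g. "crw-rw----"
    case "$perms" in
        ????rw*) return 0 ;;                 # characters 5 and 6 are the group r/w bits
        *)       return 1 ;;
    esac
}

# Example usage with the values from this page:
#   user_in_group www-data fuse && group_can_rw /dev/fuse fuse && echo "FUSE access OK"
```

Note that a new group membership only applies to processes started after the change, so the OS4X daemon may need a restart.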

Create target mountpoint

The FTP target directory must be mounted somewhere in the local filesystem to be readable by OS4X. You may use any directory (ideally an empty one). This directory name is needed later for the configuration of the OS4X directory scanner. The owner of the target directory must be the configured user running OS4X:

mkdir -p /mnt/ftp/server1
chown www-data:www-data /mnt/ftp/server1

Save FTP credentials securely

In order to automatically connect to the (S)FTP server, save the credentials in a single line in the following file:

/root/.netrc

The syntax of the file is simple: per line, one server can be given by its name (hostname or IP), followed by keywords for username and password, with their values. Example:

machine 192.168.20.71 login os4x password os4x

This file must be readable by root only, so set its permissions after creating or modifying it:

chmod 600 /root/.netrc
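
Creating the file under a restrictive umask avoids a short window in which the credentials are readable by other users. The following is a minimal sketch; the helper name "write_netrc" is hypothetical, and the helper overwrites the target file, so it only suits a setup with a single server entry.

```shell
# Hypothetical helper: write a single-server .netrc that is never world-readable.
# Arguments: target file, host, login, password.
# Note: an existing file is overwritten.
write_netrc() {
    (
        umask 077    # files created in this subshell get mode 600
        printf 'machine %s login %s password %s\n' "$2" "$3" "$4" > "$1"
    ) || return 1
    chmod 600 "$1"   # also tighten the mode if the file already existed
}

# Example usage (as root) with the example credentials from this page:
#   write_netrc /root/.netrc 192.168.20.71 os4x os4x
```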

Add server mount for bootup

FTP servers differ in their setup, so the most common variants are documented here. To mount the (S)FTP server at bootup (the most common approach), add a line to the filesystem table file:

/etc/fstab

Change the IP address (and possibly the username) and the mount point according to your needs.

Add simple FTP server

The line to be added has the following syntax:

curlftpfs#192.168.20.71 /mnt/ftp/server1 fuse auto,allow_other,disable_eprt,_netdev 0 0
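
When the entry is added by a script rather than by hand, it is worth making the step repeatable so that running it twice does not duplicate the line. The following is a minimal sketch; the helper name "add_fstab_line" and the ".bak" backup suffix are assumptions for illustration.

```shell
# Hypothetical helper: append a line to an fstab-style file unless the
# exact same line is already present.
# Arguments: line to add, target file (defaults to /etc/fstab).
add_fstab_line() {
    line=$1
    file=${2:-/etc/fstab}
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0                       # already present, nothing to do
    fi
    if [ -f "$file" ]; then
        cp -p "$file" "$file.bak" || return 1   # keep a backup of the previous state
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example usage (as root):
#   add_fstab_line 'curlftpfs#192.168.20.71 /mnt/ftp/server1 fuse auto,allow_other,disable_eprt,_netdev 0 0'
```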

Add FTPS server

The line to be added has the following syntax:

curlftpfs#192.168.20.71 /mnt/ftp/server1 fuse auto,allow_other,disable_eprt,_netdev,ssl,no_verify_peer,no_verify_hostname 0 0

Add FTP over implicit TLS server

The line to be added has the following syntax:

curlftpfs#192.168.20.71 /mnt/ftp/server1 fuse auto,allow_other,disable_eprt,_netdev,ssl 0 0

If you receive an error complaining about an invalid hostname in the certificate, like:

Error connecting to ftp: SSL: certificate subject name (a.b.c.d) does not match target host name 'a.b.c.d'

add "no_verify_peer" and "no_verify_hostname" to the option list. Be aware that this disables certificate verification, so the identity of the server is no longer checked:

curlftpfs#192.168.20.71 /mnt/ftp/server1 fuse auto,allow_other,disable_eprt,_netdev,ssl,no_verify_peer,no_verify_hostname 0 0

Add FTP over explicit TLS server

The line to be added has the following syntax:

curlftpfs#192.168.20.71 /mnt/ftp/server1 fuse auto,allow_other,disable_eprt,_netdev,ssl_control 0 0
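
Before committing one of the lines above to "/etc/fstab", it can be useful to try the same options once by hand. The following is a minimal sketch, assuming it runs as root and that the credentials are already stored in "/root/.netrc"; the helper name "try_curlftpfs" is hypothetical. Only the curlftpfs and FUSE options are passed here, since "auto" and "_netdev" are only meaningful in "/etc/fstab".

```shell
# Hypothetical helper: mount once by hand, list the remote directory,
# then unmount again.
# Arguments: curlftpfs option string, host, mount point.
try_curlftpfs() {
    opts=$1
    host=$2
    mnt=$3
    if [ ! -d "$mnt" ]; then
        echo "mount point $mnt does not exist" >&2
        return 1
    fi
    curlftpfs -o "$opts" "$host" "$mnt" || return 1
    ls -l "$mnt"
    rc=$?                 # remember whether listing the remote directory worked
    umount "$mnt"
    return "$rc"
}

# Example usage for the explicit TLS variant:
#   try_curlftpfs allow_other,disable_eprt,ssl_control 192.168.20.71 /mnt/ftp/server1
```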

Add SFTP (SSH File Transfer Protocol)

For this task, you have to know the absolute path on the remote server that is to be mounted. For an automatic mount, you need to save your own SSH public key in the remote system's file "~/.ssh/authorized_keys". This enables an automatic login without a password prompt (if allowed by the remote SSH server).

sshfs#os4x@192.168.20.71:/home/os4x/ /mnt/ftp/server1 fuse auto,_netdev,allow_other 0 0
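
Setting up the key could look like the following minimal sketch. It assumes OpenSSH with its standard tools "ssh-keygen" and "ssh-copy-id", an Ed25519 key without a passphrase, and that root mounts the file system (so root's key is used); the helper name "ensure_ssh_key" and the key file path are assumptions for illustration.

```shell
# Hypothetical helper: create a passphrase-less SSH key pair unless the
# key file already exists. An existing key is never overwritten.
# Argument: path of the private key file.
ensure_ssh_key() {
    keyfile=$1
    if [ -f "$keyfile" ]; then
        return 0
    fi
    mkdir -p -m 700 "$(dirname "$keyfile")" || return 1
    ssh-keygen -q -t ed25519 -N '' -f "$keyfile"
}

# Example usage (as root) with the example server from this page:
#   ensure_ssh_key /root/.ssh/id_ed25519
#   ssh-copy-id -i /root/.ssh/id_ed25519.pub os4x@192.168.20.71   # asks for the password once
#   ssh os4x@192.168.20.71 true   # should now succeed without a password prompt
```

Logging in once by hand also stores the server's host key, which an unattended mount at bootup cannot confirm interactively.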

Using a proxy server

When using a proxy server, you have to add an option to the entry line in "/etc/fstab":

proxy=http://username:password@proxy-server:3128

A complete line in "/etc/fstab" would be:

curlftpfs#192.168.20.71 /mnt/ftp/server1 fuse auto,allow_other,disable_eprt,_netdev,proxy=http://proxyuser:proxypwd@proxy-server:3128 0 0

Beware that "/etc/fstab" is readable system-wide, so the proxy credentials are visible to every local user. Use a dedicated proxy user for this task only (e.g. one with limited permissions). You may also want to set up the proxy environment variable as described in OS4X HTTP Proxy support.

Testing server connection

After having added the appropriate entry in "/etc/fstab", execute the following command to mount the file system:

mount -a

If no error occurs, the target mountpoint shows the content of the remote server.
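
Since "mount -a" may not report every problem, an explicit check of the mount point can be added. The following is a minimal sketch using the "mountpoint" tool from util-linux; the helper name "check_mount" is hypothetical.

```shell
# Hypothetical helper: succeed only if $1 is an active mount point.
check_mount() {
    if mountpoint -q "$1"; then
        echo "$1 is mounted"
    else
        echo "$1 is NOT mounted" >&2
        return 1
    fi
}

# Example usage with the values from this page:
#   check_mount /mnt/ftp/server1 && ls -l /mnt/ftp/server1/to_be_retrieved
```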

Add directory scanner rule

In the administrative web interface, navigate to "Queues" -> "Dir.scanner". In the toolbar at the top of the panel, click the "Add" button. In the window that opens, configure the directory scanner rule to match your mounted directory:

[Screenshot: Dirscanner Receive job.png]

Be sure to select "OS4X Enterprise receive job" as the type, since this implies that the files are "incoming" files, not to be sent to external partners. For file selection (per OS4X job), two possibilities exist:

  • Multiple files in one OS4X job: if your OS4X jobs are divided into subdirectories (relative to the directory configured in "Directory" above), you can generate OS4X Enterprise jobs based on the content of these directories. One job will be created per directory.
  • Single file per OS4X job: if the files are placed in that directory without any grouping, you can create incoming jobs, one per file.

The sender of the job is for documentation purposes only. Commonly, you configure a person at the sender company who is responsible for the data or service.

The recipient of the job determines which plugin group is executed: the plugin group configured for receive jobs of that recipient is run.

Side notes

  • During the transfer of files from the mounted server, the directory scanner waits until the current task has finished. This is required because exactly one directory scanner process runs per OS4X installation, and it processes all directory scanner rules.
  • The used solution "curlftpfs" automatically reconnects if server connection is lost.
  • In many cases, the time stamps of files on a mounted file system do not match the clock of the local system. "curlftpfs" does not provide a mechanism to transfer only those files from the server which are not in use by other processes (e.g. a file that is still being uploaded from another source). You can avoid erroneous situations with the following approach:
    • during upload, name the file with a prefix or suffix (e.g. "*.uploading")
    • configure the regular expression of your scanning rule so that it does not pick up such files
    • once the upload has completed, rename the file to a name without that prefix or suffix
    • if nothing helps, you can use a separate mechanism, e.g. cron, to move files to a target directory which is then configured as the scanned directory of the directory scanner
  • During the transfer of the files from an external server, the OS4X Enterprise job itself is already created and given the status "waiting for files". The plugin logs contain information about what is actually in progress:

[Screenshot: Dirscanner move file log.png]
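
The cron-based fallback mentioned in the side notes could be sketched as follows. This is a minimal example, not an OS4X feature: the helper name "move_stable_files", the staging directory and the five-minute threshold are assumptions. It treats a file as complete once it has not been modified for more than the given number of minutes and skips files that still carry the ".uploading" suffix. It relies on GNU "find".

```shell
# Hypothetical helper: move regular files from directory $1 to directory $2
# if they were last modified more than $3 minutes ago (default: 5).
# Files named "*.uploading" are left untouched.
move_stable_files() {
    src=$1
    dst=$2
    min_age=${3:-5}
    find "$src" -maxdepth 1 -type f ! -name '*.uploading' -mmin "+$min_age" \
        -exec mv -- {} "$dst"/ \;
}

# Example usage, e.g. from a script called by cron every few minutes
# (the target directory is a made-up example):
#   move_stable_files /mnt/ftp/server1/to_be_retrieved /var/spool/os4x-incoming 5
```

The target directory is then the one configured in the directory scanner rule, and it must be writable by the user running the script and readable by the user running OS4X.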

External documentation

Since technologies from external sources are used, good documentation about mounting remote file systems into your Linux environment is available elsewhere. The underlying technology is called "FUSE" (Filesystem in Userspace), which implements file systems in user space instead of in kernel space, where file systems normally reside on Unix.

External resources: