To execute an extraction, you must have the 'Execute extractions' right.
In the Home tab, click on the 'Extractions' button:
The 'Extractions' tool opens, listing all existing definition files. Definition files are located in the WEB-INF/param/extraction directory:
A contextual 'Extractions' tab also appears in the ribbon:
Select a definition file and click on the 'Modify' button from the 'Extractions' tab:
The 'Extraction details' tool opens, displaying the various extraction components in tree form. The tool title is the name of the definition file:
In the 'Extractions' tool, select a definition file and click on 'Execute' from the 'Extractions' tab. You can also click on the 'Execute' button in the 'Extraction details' tool.
Certain parameters are required to run an extraction. The extraction execution tool opens, allowing you to view the extraction details and enter these parameters:
In the administration area, you can schedule a 'Run extraction' task. The parameters requested are the same as those shown above. Values for query variables and optional columns must be supplied in JSON format.
An extraction can be run from a URL: url-du-cms.ext/_extraction/extract?file=extractionDef.xml&lang=en&maVariable=value&pipeline=pipeline1.xml
This URL requires authentication; token authentication is supported. The logged-in user must have the right to "View and execute extractions".
Query parameter values must be URL-encoded.
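As an illustration of the encoding requirement, the following minimal sketch builds the extraction URL with Python's standard library. The function name build_extraction_url, the https:// scheme and the sample variable value are assumptions made for the example; only the /_extraction/extract path and the file, lang, pipeline and maVariable parameters come from the URL shown above.

```python
from urllib.parse import quote, urlencode


def build_extraction_url(base_url, definition_file, lang=None, pipeline=None, variables=None):
    """Return the extraction URL with every query parameter percent-encoded.

    Hypothetical helper: the parameter names mirror the example URL in this page.
    """
    params = {"file": definition_file}
    if lang:
        params["lang"] = lang
    # Query variables are passed as ordinary query parameters.
    params.update(variables or {})
    if pipeline:
        params["pipeline"] = pipeline
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/_extraction/extract?{query}"


url = build_extraction_url(
    "https://url-du-cms.ext",  # assumed scheme and placeholder host
    "extractionDef.xml",
    lang="en",
    pipeline="pipeline1.xml",
    variables={"maVariable": "a value & more"},
)
print(url)
```

The unencoded value "a value & more" would otherwise be cut at the ampersand and read as a second parameter.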
An output is the combination of zero, one or more style sheets (XSLT files) and a result file format (XML, text or PDF).
Style sheets (XSLT files) are used to customize the extraction result file.
If you choose to generate output in PDF format, a style sheet is mandatory.
An output is defined in an XML file located in WEB-INF/param/extraction/config.
This file defines the output label, style sheets and output format.
Its syntax is as follows:
<pipeline>
    <label i18n="true">I18N_KEY</label>
    <stylesheets>
        <xslt name="foo.xsl"/>
        <xslt name="bar.xsl"/>
        <xslt name="qux.xsl"/>
    </stylesheets>
    <out type="text" path="monchemin/vers/${meta1}/mondossier" extension="rtf"/>
    <extractions>
        <extraction>extraction_1.xml</extraction>
        <extraction>extraction_2.xml</extraction>
    </extractions>
</pipeline>
The label can be a key in an i18n catalog, or just a character string.
Style sheets must be located in WEB-INF/param/extraction/stylesheets.
The type attribute on the out tag is mandatory. Valid output types are xml, text and pdf.
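The rules on the out tag and on PDF output can be checked before deploying a definition. The sketch below is a hypothetical validator, not part of the CMS: the function name check_pipeline and the wording of the messages are assumptions, and it only covers the mandatory type attribute, the three valid types and the style sheet requirement for PDF.

```python
import xml.etree.ElementTree as ET

VALID_TYPES = {"xml", "text", "pdf"}

PIPELINE_XML = """
<pipeline>
  <label i18n="true">I18N_KEY</label>
  <stylesheets>
    <xslt name="foo.xsl"/>
  </stylesheets>
  <out type="text" extension="rtf"/>
</pipeline>
"""


def check_pipeline(xml_text):
    """Return a list of problems found in an output definition (empty if none)."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    out = root.find("out")
    out_type = out.get("type") if out is not None else None
    if out_type is None:
        problems.append("missing mandatory 'type' attribute on <out>")
    elif out_type not in VALID_TYPES:
        problems.append(f"unknown output type: {out_type}")
    # A PDF output needs at least one style sheet.
    if out_type == "pdf" and not root.findall("stylesheets/xslt"):
        problems.append("pdf output requires a style sheet")
    return problems


print(check_pipeline(PIPELINE_XML))
```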
The path attribute on the out tag is optional. It is the path of the subfolder or file in which the results of the extraction run will be placed. The path may contain variables: for example, if you set mypath/to/${title}/mondossier/result.xml, one result file will be created in mypath/to/Content1/mondossier/result.xml, another in mypath/to/Content2/mondossier/result.xml, etc.
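The effect of a variable in the path can be imitated with Python's string.Template, which happens to use the same ${name} notation. This is only a sketch of the substitution: how the CMS actually resolves ${title} for each content is not described here, and the function name expand_result_paths is hypothetical.

```python
from string import Template


def expand_result_paths(path_pattern, contents):
    """Return one result path per content, replacing ${...} variables with its values."""
    template = Template(path_pattern)
    return [template.substitute(content) for content in contents]


paths = expand_result_paths(
    "mypath/to/${title}/mondossier/result.xml",
    [{"title": "Content1"}, {"title": "Content2"}],
)
for path in paths:
    print(path)
```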
The extension attribute on the out tag is optional. It is the extension of the result file(s) when the path attribute is a folder and not a file.
In addition, if the type is text, the following attributes can be added:
You can enter the list of extractions managed by this output. If no extraction is entered, the output will be proposed for all extractions.
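The selection rule for outputs can be summarized in a few lines. In this sketch the dictionary layout and the file name pipeline2.xml are assumptions made for the example; the rule itself (an empty list means the output is offered for every extraction) is the one stated above.

```python
def outputs_for(extraction_file, pipelines):
    """Return the names of the outputs offered for the given extraction.

    `pipelines` maps an output definition file to its list of extractions;
    an empty list means the output applies to all extractions.
    """
    return [
        name
        for name, extractions in pipelines.items()
        if not extractions or extraction_file in extractions
    ]


pipelines = {
    "pipeline1.xml": ["extraction_1.xml", "extraction_2.xml"],
    "pipeline2.xml": [],  # hypothetical output with no <extractions> list
}
print(outputs_for("extraction_1.xml", pipelines))
print(outputs_for("extraction_3.xml", pipelines))
```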
In the 'Extractions' tab, click on the 'Results' button:
The 'Results' tool opens, containing a list of all existing results files. The name of a results file is constructed from the name of the definition file and the execution date:
To refresh the tool, click on the Refresh button:
To download a results file, select it and click on the Download button:
To delete a results file, select it and click on the Delete button:
The results of an extraction run can be downloaded from a URL: url-du-cms.ext/_extraction/download/my/path/to/the/result.xml
my/path/to/the/result.xml is the path to the result file.
This URL requires authentication; token authentication is supported. The logged-in user must have the right to "View and execute extractions".
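Result paths produced from content titles may contain spaces or other characters that are not valid in a URL. The following sketch, with a hypothetical build_download_url helper and an assumed https:// scheme, percent-encodes the path while keeping the slashes between folders.

```python
from urllib.parse import quote


def build_download_url(base_url, result_path):
    """Return the download URL for a result file, percent-encoding the path."""
    # quote() leaves '/' untouched by default, so the folder structure is preserved.
    encoded_path = quote(result_path.lstrip("/"))
    return f"{base_url.rstrip('/')}/_extraction/download/{encoded_path}"


download_url = build_download_url(
    "https://url-du-cms.ext",  # assumed scheme and placeholder host
    "mypath/to/Content 1/mondossier/result.xml",
)
print(download_url)
```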
Since version 4.4, the URL for accessing the results is included in the email received when an extraction is run.
In the administration area, you can schedule a "Delete obsolete extractions" task. This task takes a lifetime parameter: extractions older than this lifetime are deleted.
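The lifetime rule amounts to comparing each execution date with a cutoff. The sketch below only illustrates that comparison: the function name, the data layout, the sample file names and the use of days as the unit are all assumptions, since the unit of the lifetime parameter is not stated here.

```python
from datetime import datetime, timedelta


def obsolete_results(results, lifetime, now):
    """Return the names of results whose execution date is older than the lifetime.

    `results` maps a result name to its execution date (hypothetical layout).
    """
    cutoff = now - lifetime
    return sorted(name for name, executed in results.items() if executed < cutoff)


now = datetime(2024, 3, 31)
results = {
    "old-result.xml": datetime(2024, 1, 15),     # hypothetical file names
    "recent-result.xml": datetime(2024, 3, 20),
}
print(obsolete_results(results, timedelta(days=30), now))
```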