  1. General presentation
  2. Configuration
  3. Detecting duplicates
    1. Unit detection by enriching workflow
      1. Configuration
      2. Behavior
    2. Global detection in a scheduled task

This feature is available from version 4.5.

General presentation

Duplicate detection checks whether two items of content are identical (duplicates) or almost identical (near-duplicates). 

Two contents are considered duplicates if all the attributes in a configured list have identical values.

Two contents are considered near-duplicates if the text attributes in a configured list all match to within two characters (for example, "toto" and "tata" would be close enough to be near-duplicates).

Duplicate detection can be performed either:

  • on a particular content
  • on all types of content

Configuration

Duplicate detection is configured in an XML file named duplicate-contents.xml, which must be present in the WEB-INF/param directory. By default, there is no duplicate configuration.

This file lists the content types to check and, for each of them, the attributes to compare.

The syntax of the XML file is as follows: 

<duplicates>
 <content-type id="org.ametys.web.default.Content.article">
  <attribute path="title" strict="true" />
  <attribute path="comment"/>
 </content-type>
</duplicates>

The content types to check for duplicates are listed with:

  • a <content-type> tag whose "id" attribute identifies the content type.

Each <content-type> tag contains the list of attributes to check, using:

  • an <attribute> tag, containing:
    • the "path" attribute, giving the path of the attribute to check
    • the "strict" attribute (optional), indicating whether the search is strict only (exact duplicates only, no near-duplicates). By default, this attribute is set to false.

The following attribute types are not supported by duplicate detection: geocode, reference, user.

It is not possible to specify a repeater or composite in its entirety. It is, however, possible to target a particular attribute of the repeater or composite, with the following syntax: repeater/attribute1 or composite/attribute1.
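
As an illustration of the path syntax for repeaters and composites, here is a minimal sketch of a configuration. The content type id and the attribute names ("address", "city", "sessions", "label") are hypothetical and only serve to show the shape of the paths:

<duplicates>
 <!-- Hypothetical content type id, for illustration only -->
 <content-type id="org.example.Content.event">
  <attribute path="title" strict="true" />
  <!-- "address" is assumed to be a composite, "city" one of its attributes -->
  <attribute path="address/city" />
  <!-- "sessions" is assumed to be a repeater, "label" one of its attributes -->
  <attribute path="sessions/label" />
 </content-type>
</duplicates>

Listing path="address" or path="sessions" alone would not work, since a composite or repeater cannot be targeted as a whole.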

The near-duplicate search only applies to fields of type String. 

Detecting duplicates

Unit detection by enriching workflow

Configuration

All workflows supplied by Ametys are already configured to detect duplicates. 

If you are using a custom workflow, duplicate detection is performed in the extensible editing post-functions (generally placed at creation and modification):

<function type="avalon">
<arg name="role">org.ametys.cms.workflow.extensions.ExtensibleFunction</arg>
<arg name="extension-point">org.ametys.cms.workflow.extensions.PostContentEditionFunctionsExtensionPoint</arg>
</function>
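
As a rough sketch of where this function might sit in a custom workflow, assuming an OSWorkflow-style definition in which post-functions are declared inside an action. The action id, name, and target step below are placeholders, and the exact placement in your own workflow may differ:

<!-- Hypothetical edit action: id, name and step are placeholders -->
<action id="2" name="edit">
 <results>
  <unconditional-result old-status=" " status=" " step="1" />
 </results>
 <post-functions>
  <!-- Runs the extensible post-edition functions, where duplicate detection is performed -->
  <function type="avalon">
   <arg name="role">org.ametys.cms.workflow.extensions.ExtensibleFunction</arg>
   <arg name="extension-point">org.ametys.cms.workflow.extensions.PostContentEditionFunctionsExtensionPoint</arg>
  </function>
 </post-functions>
</action>

The same function would typically be repeated in the creation action, so that detection runs on both creation and modification.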

Behavior

When you save a content whose type is listed in the duplicate-contents.xml configuration file, a search for duplicates and near-duplicates is performed.

If duplicates or near-duplicates are found, a window is displayed, with a link to open the duplicate content.

Global detection in a scheduled task

A new "Duplicate detection" entry is available in the administration tool's scheduled tasks.

This task launches duplicate detection and analyzes the contents whose type is listed in the duplicate-contents.xml configuration file.

A report is sent by email, with links to open the duplicate contents.

