This feature is available from version 4.5.
Duplicate detection checks whether two items of content are identical (duplicates) or almost identical (near-duplicates).
Two contents are considered duplicates if the attributes present in a configured list are identical.
Two contents are considered near-duplicates if the text attributes present in a configured list are all identical to within two characters (for example, "toto" and "tata" would be considered close enough to be near-duplicates).
Duplicate detection can be performed either when a content is saved, through the workflow, or by a scheduled task.
Duplicate detection is configured in an XML file named duplicate-contents.xml, which must be present in the WEB-INF/param directory. By default, no duplicate detection is configured.
This file lists the contents to be checked and the attributes to be checked.
The syntax of the XML file is as follows:
<duplicates>
    <content-type id="org.ametys.web.default.Content.article">
        <attribute path="title" strict="true" />
        <attribute path="comment"/>
    </content-type>
</duplicates>
The content types to check for duplicates are listed in <content-type> tags, each identified by its id attribute.
Each <content-type> tag contains the list of attributes to check, declared via <attribute> tags, each identified by its path.
The following attribute types are not supported by duplicate detection: geocode, reference, user.
It is not possible to specify a repeater or composite in its entirety. It is, however, possible to target a particular attribute of the repeater or composite, with the following syntax: repeater/attribute1 or composite/attribute1.
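As an illustration of the path syntax described above, a configuration targeting attributes inside a repeater and a composite might look like the following sketch. The names authors, lastname, address, and city are hypothetical and only serve the example; they are not part of the default article content type.

```xml
<duplicates>
    <content-type id="org.ametys.web.default.Content.article">
        <!-- "authors" is a hypothetical repeater; "lastname" is one of its attributes -->
        <attribute path="authors/lastname"/>
        <!-- "address" is a hypothetical composite; "city" is one of its attributes -->
        <attribute path="address/city"/>
    </content-type>
</duplicates>
```

Declaring authors or address alone, without a sub-attribute, would not be valid, since a repeater or composite cannot be checked as a whole.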
The near-duplicate search only applies to fields of type String.
All workflows supplied by Ametys are already configured to detect duplicates.
If you are using a custom workflow, duplicate detection is performed in the extensible editing post-functions (generally placed at creation and modification):
<function type="avalon">
    <arg name="role">org.ametys.cms.workflow.extensions.ExtensibleFunction</arg>
    <arg name="extension-point">org.ametys.cms.workflow.extensions.PostContentEditionFunctionsExtensionPoint</arg>
</function>
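For orientation, this function would typically sit inside the post-functions of an edit action in the workflow descriptor. The sketch below assumes the standard OSWorkflow descriptor layout. The action id, its name, and the result values are hypothetical placeholders, not taken from an actual Ametys workflow, so adapt them to your own workflow.

```xml
<!-- Hypothetical edit action: id, name and result values are placeholders -->
<action id="2" name="edit-content">
    <results>
        <unconditional-result old-status="null" status="null" step="1"/>
    </results>
    <post-functions>
        <!-- Runs the extensible post-edition functions, where duplicate detection is performed -->
        <function type="avalon">
            <arg name="role">org.ametys.cms.workflow.extensions.ExtensibleFunction</arg>
            <arg name="extension-point">org.ametys.cms.workflow.extensions.PostContentEditionFunctionsExtensionPoint</arg>
        </function>
    </post-functions>
</action>
```

The same post-function would be added to each action that creates or modifies content, so that the check runs on every save.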
When you save a content whose type is listed in the duplicate-contents.xml configuration file, a search for duplicates and near-duplicates is performed.
If duplicates or near-duplicates are found, a window is displayed with a link to open each duplicate content.
A new "Duplicate detection" entry is available in the administration tool's scheduled tasks.
This task runs duplicate detection on all contents whose type is listed in the duplicate-contents.xml configuration file.
A report is sent by email, with links to open the duplicate contents.