How to Identify Duplicate Records by Multiple Fields in ServiceNow

I have some records in a table which are duplicates. In this case, I can only identify “duplicate records” by whether several fields are identical; not just one. For example, let’s say I want to identify Incidents where the Short description, Assignment group, and Assigned to user are all the same. If only one or two of those fields match, but one of them is different, then the records are not duplicates. The’re only duplicates if all three fields are the same.

Is there any way to accomplish this with a script? Maybe by using GlideAggregate?

Thanks for your question, and indeed, there is a way to identify duplicate records based on multiple matching fields in a script in ServiceNow!

At first glance, you might think that the GlideAggregate APIs do not support any direct mechanism by which to identify duplicate records across multiple identical fields. However, there is a way!
By getting just a smidge clever, we can call the .groupBy() API method multiple times, in combination with the .addAggregate() and .addHaving() methods, to accomplish our goal of identifying duplicate records with more than one field that are identical.

Consider the following code, for example - which is a basic example of how to find duplicates in a table based on multiple fields:

Note: This is not the optimal solution. There is far more optimal and functional code at the end of this article. If you just came here for the solution and want to skip the the learning, you can click here.
var dupeCount = 0;
var gaMyTableDuplicate = new GlideAggregate('incident');
gaMyTableDuplicate.addAggregate('COUNT');
gaMyTableDuplicate.groupBy('short_description');
gaMyTableDuplicate.groupBy('assignment_group');
gaMyTableDuplicate.groupBy('assigned_to');
gaMyTableDuplicate.addHaving('COUNT', '>', 1);
gaMyTableDuplicate.query();

while (gaMyTableDuplicate.next()) {
  dupeCount = gaMyTableDuplicate.getAggregate('COUNT');
  gs.info(
    dupeCount + ' records found with the following identical values:\n' +
    '•Short description: ' + (gaMyTableDuplicate.getValue('short_description') || '(blank)') + '\n' +
    '•Assignment group: ' + (gaMyTableDuplicate.assignment_group.getDisplayValue() || '(blank)') + '\n' +
    '•Assigned to: ' + (gaMyTableDuplicate.assigned_to.getDisplayValue() || '(blank)') + '\n\n'
  );
}

This basic method of using the GlideAggregate .groupBy() method allows us to group by fields, count groups matching on those fields, and filter where the groups have more than one record in them (aka, duplicates!)

When we execute this in a background script, we get results printed out which look something like this:

5 records found with the following identical values:
•Short description: (blank)
•Assignment group: Team Development Code Reviewers
•Assigned to: Tim Woodruff

Cool, that works!

Now let's drop it into a function we can call, and add some arguments, and some input validation.

function getMultiFieldDupes(tableName, arrDupeFields) {
    var iDupeField, gaMyTableDuplicate;
    var dupeCount = 0;
    
    if (typeof tableName !== 'string' || !tableName) {
        throw new Error(
            'getMultiFieldDupes(): tableName must be a string consisting of a valid ' +
            'table name in the ServiceNow database.'
        );
    }
    
    if ( //If arrDupeFields is not an array or a string, or if it's an array but it's empty
        typeof arrDupeFields === 'undefined' ||
        !arrDupeFields ||
        (!Array.isArray(arrDupeFields)  && typeof arrDupeFields !== 'string') ||
        !arrDupeFields.length
    ) {
        throw new Error(
            'getMultiFieldDupes(): arrDupeFields must be a string with a single ' +
            'field name, or an array of strings - each string representing a ' +
            'field name in the ' + tableName + ' table.'
        );
    }
    
    //If arrDupeFields is a string, convert it to an array.
    if (typeof arrDupeFields === 'string') {
        arrDupeFields = arrDupeFields.split(',');
    }
    
    gaMyTableDuplicate = new GlideAggregate(tableName);
    gaMyTableDuplicate.addAggregate('COUNT');
    
    //Group by each field in the arrDupeFields array
    for (
        iDupeField = 0;
        iDupeField < arrDupeFields.length;
        iDupeField++
    ) {
        gaMyTableDuplicate.groupBy(arrDupeFields[iDupeField]);
    }
    
    gaMyTableDuplicate.addHaving('COUNT', '>', 1);
    gaMyTableDuplicate.query();
    
    while (gaMyTableDuplicate.next()) {
        dupeCount = gaMyTableDuplicate.getAggregate('COUNT');
        gs.info( //todo: replace with dynamic field values - but whatever, this is just an intermediary example. 
            dupeCount + ' records found with the following identical values:\n' +
            '•Short description: ' + (gaMyTableDuplicate.getValue('short_description') || '(blank)') + '\n' +
            '•Assignment group: ' + (gaMyTableDuplicate.assignment_group.getDisplayValue() || '(blank)') + '\n' + 
            '•Assigned to: ' + (gaMyTableDuplicate.assigned_to.getDisplayValue() || '(blank)') + '\n\n'
        );
    }
}

Now we can call this function with a table name and an array of fields to group by, and it will return pretty much the same results as before.

Next, let's add some additional features and functionality.

I noticed that with the first version of our code, we were outputting records where some of the fields (such as the short description) were blank.

This may be fine, but I can imagine a scenario in which we may not want that.
Therefore, let's add a feature to allow us to filter out records where any of a specified (optional) list of fields is blank.

Since we may want to get the sys_ids for each duplicate record within a given duplicate "type" (each specific combination of unique field values), let's also add a feature to return an array of sys_ids for each duplicate record found within each duplicate "type".

And finally, rather than printing it out, let's have the function return an array of objects with details about each set of duplicates found.


Optimal Solution

This solution supports specifying one or more field(s) you want to compare for duplicates (yes, it works with only a single dupe-field as well), specifying an optional list of fields to filter out if they're blank, and even optionally getting a list of sys_ids for all of the records which match each duplicate scenario!

/**
 * Get records with duplicate values in multiple fields, and return a list of the records'
 * counts and the dupe fields' values.
 *
 * @param {string} tableName - The table to check for duplicates. Must be a string.
 *
 * @param {string|string[]} arrDupeFields - The fields to check for duplicates.
 *  Must be a string or an array of strings.
 *
 * @param {string|string[]} [nonBlankFields=[]] - This optional parameter, if specified, will
 *  be an array of strings consisting of the names of the fields which should not be allowed
 *  to be blank when checking for duplicates.
 * To put it another way, if any of the fields specified in the `nonBlankFields` array are
 *  blank, then the record should not be considered a duplicate.
 * Note that the fields specified in nonBlankFields do NOT need to also be specified in
 * arrDupeFields.
 *
 * @param {boolean} [getDupeSysIds=false] - If true, the sys_ids of the records with the
 *  duplicate values will be included in the returned object. If false, they will not be
 *  included.
 * This may have a performance impact, as it requires an additional query for each
 *  duplicate combination, so use with caution.
 * Default is false.
 *
 * @returns {{dupeCount: (*|number), fields: {}}[]} - An array of objects, each object
 *  representing a confluence of duplicate values in the specified fields - a specific
 *  combination of field values - and the number of records that have that combination of
 *  those values in the specified fields.
 */
function getMultiFieldDupes(tableName, arrDupeFields, nonBlankFields, getDupeSysIds) {
    var gaMyTableDuplicate, grDupe, iNonBlankField, iDupeField, objDupeRecord;
    var arrDupeRecords = [];
    var dupeCount = 0;
    
    /***** INPUT VALIDATION *****/
    
    getDupeSysIds = (typeof getDupeSysIds === 'boolean') ? getDupeSysIds : false;
    
    if (typeof tableName !== 'string' || !tableName) {
        throw new Error(
            'getMultiFieldDupes(): tableName must be a string consisting of a valid ' +
            'table name in the ServiceNow database.'
        );
    }
    
    if ( //If arrDupeFields is not an array or a string, or if it's an array but it's empty
        typeof arrDupeFields === 'undefined' ||
        !arrDupeFields ||
        (!Array.isArray(arrDupeFields)  && typeof arrDupeFields !== 'string') ||
        !arrDupeFields.length
    ) {
        throw new Error(
            'getMultiFieldDupes(): arrDupeFields must be a string with a single ' +
            'field name, or an array of strings - each string representing a ' +
            'field name in the ' + tableName + ' table.'
        );
    }
    
    //If arrDupeFields is a string, convert it to an array.
    if (typeof arrDupeFields === 'string') {
        arrDupeFields = arrDupeFields.split(',');
    }
    
    //If nonBlankFields is undefined, null, or an empty string, set it to an empty array.
    //If it's a string, convert it to an array.
    if (typeof nonBlankFields === 'undefined' || !nonBlankFields) {
        nonBlankFields = [];
    } else if (typeof nonBlankFields === 'string') {
        //Splitting just in case the input data is a comma-separated string - which it
        // shouldn't be, but I don't trust anyone who calls this code. They seem sus.
        nonBlankFields = nonBlankFields.split(',');
    } else if (!Array.isArray(nonBlankFields)) {
        //If it's not a string or an array or undefined, throw an error because wth am I s'posed to do with that
        throw new Error(
            'getMultiFieldDupes(): nonBlankFields must be a string with a single ' +
            'field name, or an array of strings - each string representing a ' +
            'field name in the ' + tableName + ' table.'
        );
    }
    
    /***** ACTUALLY DOING THE THING *****/
    gaMyTableDuplicate = new GlideAggregate(tableName);
    gaMyTableDuplicate.addAggregate('COUNT');
    
    //Group by each field in the arrDupeFields array
    for (
        iDupeField = 0;
        iDupeField < arrDupeFields.length;
        iDupeField++
    ) {
        gaMyTableDuplicate.groupBy(arrDupeFields[iDupeField]);
    }
    
    //If any nonBlankFields were specified, add a query to exclude records where
    // any of those fields are blank.
    for (
        iNonBlankField = 0;
        iNonBlankField < nonBlankFields.length;
        iNonBlankField++
    ) {
        gaMyTableDuplicate.addNotNullQuery(nonBlankFields[iNonBlankField]);
    }
    
    //Only show records with more than one match (duplicates)
    gaMyTableDuplicate.addHaving('COUNT', '>', 1);
    gaMyTableDuplicate.query();
    
    while (gaMyTableDuplicate.next()) {
        dupeCount = gaMyTableDuplicate.getAggregate('COUNT');
        
        //Populate the arrDupeRecords array with some info about the records that have duplicates
        objDupeRecord = {
            "dupeCount": dupeCount,
            "fields": {}
        };
        
        //For each field in the arrDupeFields array, add that field's value to
        // the objDupeRecord.fields object.
        for (
            iDupeField = 0;
            iDupeField < arrDupeFields.length;
            iDupeField++
        ) {
            objDupeRecord.fields[arrDupeFields[iDupeField]] = gaMyTableDuplicate.getValue(arrDupeFields[iDupeField]);
        }
        
        if (getDupeSysIds) {
            objDupeRecord.dupe_sys_ids = [];
            //Add the sys_ids of all the records that have this combination of dupe fields in objDupeRecord.dupe_sys_ids:
            grDupe = new GlideRecord(tableName);
            
            for (
                iDupeField = 0;
                iDupeField < arrDupeFields.length;
                iDupeField++
            ) {
                grDupe.addQuery(arrDupeFields[iDupeField], objDupeRecord.fields[arrDupeFields[iDupeField]]);
            }
            
            grDupe.query();
            while (grDupe.next()) {
                objDupeRecord.dupe_sys_ids.push(grDupe.getUniqueValue());
            }
        }
        
        arrDupeRecords.push(objDupeRecord);
    }
    
    return arrDupeRecords;
}

Usage

Example 1: Duplicate Incidents with sys_ids

Calling the function in this way will get a list of field values and sys_ids for records which are duplicates based on the Short description, Assignment group, and Assigned to fields all matching. It will not count any records with a blank Short description or Assigned to field as duplicates.

var dupeRecords;
dupeRecords = getMultiFieldDupes(
    'incident', //Table name
    ['short_description', 'assignment_group', 'assigned_to'], //Fields to check for duplicates
    ['short_description','assigned_to'], //Fields that must not be blank
    true //Get the sys_ids of the records with duplicates
);
//Print out the results:
gs.info(
    'Found ' + dupeRecords.length + ' duplicate types:\n' +
    JSON.stringify(dupeRecords, null, 2)
);

Output:

[
    {
        "dupeCount": "2",
        "fields": {
            "short_description": "Watchlist ticket",
            "assignment_group": "cfcbad03d711110050f5edcb9e61038f",
            "assigned_to": "713e4bd0dba90110495d70f33996198b"
        },
        "dupe_sys_ids": [
            "3c4cf41e1bd069900f679713b24bcbff",
            "b44cb01e1bd069900f679713b24bcb36"
        ]
    },
    {
        "dupeCount": "4",
        "fields": {
            "short_description": "Some Short Description",
            "assignment_group": "cfcbad03d711110050f5edcb9e61038f",
            "assigned_to": "713e4bd0dba90110495d70f33996198b"
        },
        "dupe_sys_ids": [
            "0a8fe93a1b50e9900f679713b24bcbb6",
            "0a9ab1041ba4ed900f679713b24bcb0d",
            "134542be1b90e9900f679713b24bcba6",
            "1511c27a1b90e9900f679713b24bcbb7"
        ]
    }
]

Example 2: Users with duplicate first+last names

This function-call will get a list of users who have the same first and last name. It will not get the sys_ids of each duplicate user, but will give you the list of first+last names corresponding to duplicates.

var dupeRecords = getMultiFieldDupes(
    'sys_user', //Table name
    ['first_name', 'last_name'],
    null, //No fields must be non-blank
    false //Don't get the sys_ids of the records with duplicates
);
gs.info(
    'Found ' + dupeRecords.length + ' duplicate types:\n' +
    JSON.stringify(dupeRecords, null, 2)
);

Output:

[
    {
        "dupeCount": "15",
        "fields": {
            "first_name": "",
            "last_name": ""
        }
    },
    {
        "dupeCount": "2",
        "fields": {
            "first_name": "John",
            "last_name": "Smith"
        }
    }
]

Example 3: Incidents with duplicate Short descriptions

Calling the function in this way will return the Incident Short descriptions which are duplicates. This is to demonstrate that this function works whether you specify one dupe field, or multiple.

This will also filter out any records with a blank Assignment group field, just to demonstrate that the blank field filter does not need to operate exclusively on a field which is included in your list of dupe-fields.

var dupeRecords = getMultiFieldDupes(
    'incident', //Table name
    ['short_description'], //Field to check for duplicates
    'assignment_group', //Field that must not be blank
    false //Don't get the sys_ids of the records with duplicates
);
gs.info(
    'Found ' + dupeRecords.length + ' duplicate types:\n' +
    JSON.stringify(dupeRecords, null, 2)
);

Output:

[
    {
        "dupeCount": "5",
        "fields": {
            "short_description": ""
        }
    },
    {
        "dupeCount": "2",
        "fields": {
            "short_description": "Reset my password"
        }
    }
]

Thanks for reading! If you enjoyed this article or found this tool helpful, please consider sharing it on your favorite social media so others can find it too! If you want to see more from SN Pro Tips and Tim Woodruff, consider subscribing below or following Tim on LinkedIn!