
Duplicate File Detective Help


Comparison Options


The Comparison Options window provides granular control over how Duplicate File Detective compares files. The window contains three tabs, as described below.

 

General

 

Compare file names - Names must match in order for files to be considered duplicates.

   o Ignore whitespace and special characters during name comparison - Disregards all non-alphanumeric characters during file name comparisons (e.g. "_test.txt" and "test.txt" would be considered the same).

   o Ignore numeric characters (digits) during name comparison - Disregards all numeric characters during file name comparisons (e.g. "test1.txt" and "test2.txt" would be considered the same). You can combine this with the special character option above so that files such as "test.txt" and "test(1).txt" are considered to be the same.

   o Matching mode - Allows you to match all characters in the file name (the default) or a specific number of characters at the beginning or end. You can also choose to ignore a specific number of characters at the end of the file name.

Compare file extensions - Extensions must match in order for files to be considered duplicates.

Compare file sizes - Sizes must match in order for files to be considered duplicates.

   o Compare file contents - The contents of files must match in order for files to be considered duplicates.

When hashing zip files, enumerate and hash the files they contain - Zip files (those with a .zip extension) often contain metadata (such as stored timestamps) that prevents otherwise identical archives from matching under normal file content comparison. Enabling this option causes the archived contents (i.e. the individual files) to be hashed independently of the zip file that contains them, improving comparison potential. You can also specify a password that will be used to access the contents of encrypted zip files.

Byte-for-byte content match confirmation - Confirms that matches identified by content hashing are identical at the byte level.

Compare last modified date and time - File modified date/time stamps must match in order for files to be considered duplicates.

   o Ignore times - Only the date part of timestamps will be compared.

   o Ignore seconds when comparing timestamps - Will compare date, hour, and minute portions of timestamps (but not seconds).

Compare music tags - The audio tags that you specify (see below) must match in order for files to be considered duplicates.

Compare parent folders up to this depth - When enabled, requires that parent folder names (up to the specified depth) also match. For example, with a depth of "1", "c:\temp\folder1\test.txt" would match "d:\temp2\folder1\test.txt" (because the immediate parent folder is the same), but would not match "d:\temp2\folder2\test.txt" (because the immediate parent folder is different).
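The name, timestamp, and parent folder options above each reduce a file to a comparison key that must be equal for two files to match. The Python sketch below illustrates that idea under stated assumptions: the functions `name_key`, `parent_key`, and `timestamp_key` and their parameters are hypothetical, the name rules are applied to the file name without its extension (the help text does not say exactly which part of the name they act on), and this is not Duplicate File Detective's actual implementation.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def name_key(stem, ignore_special=False, ignore_digits=False, mode="all", count=0):
    """Hypothetical normalization of a file name (extension removed) for comparison."""
    key = stem
    if ignore_special:
        # Keep letters and digits only: drops spaces, underscores, parentheses, etc.
        key = "".join(ch for ch in key if ch.isalnum())
    if ignore_digits:
        key = "".join(ch for ch in key if not ch.isdigit())
    if mode == "first":          # match only the first `count` characters
        key = key[:count]
    elif mode == "last":         # match only the last `count` characters
        key = key[-count:] if count > 0 else ""
    elif mode == "ignore_last":  # ignore the last `count` characters
        key = key[:-count] if count > 0 else key
    return key


def parent_key(path, depth):
    """Names of the `depth` closest parent folders, excluding the drive/root."""
    parent = PureWindowsPath(path).parent
    names = parent.parts[1:] if parent.anchor else parent.parts
    return tuple(names[-depth:]) if depth > 0 else ()


def timestamp_key(modified, ignore_times=False, ignore_seconds=False):
    """Truncate a last-modified timestamp according to the selected options."""
    if ignore_times:
        return modified.date()
    if ignore_seconds:
        return modified.replace(second=0, microsecond=0)
    return modified


# Examples taken from the option descriptions above
print(name_key("_test", ignore_special=True) == name_key("test", ignore_special=True))  # True
print(name_key("test(1)", ignore_special=True, ignore_digits=True))                     # test
print(parent_key(r"c:\temp\folder1\test.txt", 1))                                      # ('folder1',)
print(parent_key(r"d:\temp2\folder2\test.txt", 1))                                     # ('folder2',)
```

In this sketch, two files count as duplicates only if every enabled key compares equal, which is how the options combine.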

 

Notes:

Files will be considered duplicates of one another only when all the chosen comparison options match.

Byte-for-byte content matching will slow the overall duplicate search process considerably, and is rarely necessary (see file hashing notes below).

When using file content comparison, combining it with other match options (such as file name and/or extension) will often improve performance by reducing the number of files that need to be hashed.
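The notes above suggest an ordering: cheap comparisons first, hashing only where needed, and byte-for-byte checks last. A minimal Python sketch of that staged approach follows. It assumes SHA-256 as the hash and groups files by size plus (optionally) extension before hashing; `find_duplicates` and its parameters are hypothetical, and the program's real search pipeline is not documented here.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_identical(paths):
    """Byte-for-byte confirmation: partition paths into groups of identical files."""
    clusters = []
    for p in paths:
        for cluster in clusters:
            if filecmp.cmp(cluster[0], p, shallow=False):
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return [c for c in clusters if len(c) > 1]


def find_duplicates(paths, compare_ext=True, confirm_bytes=False):
    # Stage 1: cheap metadata key. Files that share no size (and extension)
    # with any other file cannot be duplicates and are never hashed.
    buckets = defaultdict(list)
    for p in paths:
        ext = os.path.splitext(p)[1] if compare_ext else None
        buckets[(os.path.getsize(p), ext)].append(p)

    groups = []
    for candidates in buckets.values():
        if len(candidates) < 2:
            continue
        # Stage 2: hash only the remaining candidates.
        by_hash = defaultdict(list)
        for p in candidates:
            by_hash[sha256_of(p)].append(p)
        for same_hash in by_hash.values():
            if len(same_hash) < 2:
                continue
            # Stage 3 (optional, slow): confirm matches at the byte level.
            confirmed = split_identical(same_hash) if confirm_bytes else [same_hash]
            groups.extend(sorted(g) for g in confirmed)
    return groups
```

Here every extra criterion (size, extension) shrinks the set of files that reach the hashing stage, which is why combining content comparison with other options tends to speed things up.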

 

Hashing

 

When "Compare file contents" is selected on the General Tab of the Comparison Options window (see above), this tab can be used to specify precisely which hashing method is used to generate file content checksums.

 

A file hash is a numerical checksum value, derived through a mathematical formula, that represents the contents of the related file as a whole. Stronger hash algorithms produce checksums that are far less likely to collide (i.e. to give two different files the same value) than weaker ones, and are therefore more likely to correctly identify duplicate files. Generally, the stronger the file hashing algorithm, the longer it takes to produce a file checksum.

 

Duplicate File Detective supports the following file comparison hash types:

 

CRC32 - A quick, 32-bit checksum.

ADLER32 - Another 32-bit checksum, similar in accuracy to CRC32.

MD5 - A very accurate, slower 128-bit checksum.

SHA1 - An even more accurate, slower 160-bit checksum.

SHA256 - An even more accurate, slower 256-bit checksum.

SHA512 - An even more accurate, slower 512-bit checksum.
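As a rough illustration of the algorithms listed above, the following Python sketch computes each checksum type with the standard `zlib` and `hashlib` modules, reading data in chunks so large files are not loaded into memory at once. The `zip_member_checksums` helper hints at how the zip option on the General tab could hash archived files individually. All helper names are hypothetical; note also that Python's `zipfile` module can only decrypt traditionally (ZipCrypto) encrypted archives, not AES-encrypted ones, and expects the password as bytes.

```python
import hashlib
import zipfile
import zlib

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time


def stream_checksum(stream, algorithm="SHA256"):
    """Checksum a binary stream with CRC32, ADLER32, MD5, SHA1, SHA256 or SHA512."""
    algorithm = algorithm.upper()
    chunks = iter(lambda: stream.read(CHUNK_SIZE), b"")
    if algorithm in ("CRC32", "ADLER32"):
        # zlib's running checksums: CRC32 starts at 0, Adler-32 starts at 1
        update = zlib.crc32 if algorithm == "CRC32" else zlib.adler32
        value = 0 if algorithm == "CRC32" else 1
        for chunk in chunks:
            value = update(chunk, value)
        return f"{value:08x}"  # 32 bits -> 8 hex digits
    digest = hashlib.new(algorithm.lower())  # "md5", "sha1", "sha256", "sha512"
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_checksum(path, algorithm="SHA256"):
    """Checksum a file on disk with the chosen algorithm."""
    with open(path, "rb") as f:
        return stream_checksum(f, algorithm)


def zip_member_checksums(path, algorithm="SHA256", password=None):
    """Hash each file inside a zip archive independently of the archive itself."""
    results = {}
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info, pwd=password) as member:
                results[info.filename] = stream_checksum(member, algorithm)
    return results
```

For example, an MD5 result from `file_checksum` is a 32-character hex string, while a SHA512 result is 128 characters, reflecting the bit lengths listed above.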

 

Stronger file content hashing algorithms (such as SHA1 and SHA256) are very unlikely to produce false positives (i.e. mistakenly identify two files as identical when they are actually different). Even the smallest difference in file contents will, with overwhelming probability, produce a completely different hash, a property known in cryptography as the avalanche effect. If you must be absolutely certain that two files are identical, use byte-for-byte content match confirmation, which validates file comparisons at the binary level.
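The avalanche effect is easy to observe. This short Python sketch (an illustration only, using SHA-256 from the standard library and an arbitrary sample string) flips a single bit of the input and counts how many of the 256 output bits change; for a well-designed hash the count is typically close to half.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


original = b"Duplicate File Detective sample content"
modified = bytearray(original)
modified[-1] ^= 0x01  # flip the lowest bit of the last byte

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(bytes(modified)).digest()

print("input bits changed: ", differing_bits(original, bytes(modified)))
print("digest bits changed:", differing_bits(digest_a, digest_b), "of", len(digest_a) * 8)
```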

 

File Matching options are project-specific, and are saved and loaded on a per-project basis.

 

Music Tags

 

Many types of audio files (including MP3, WMA, OGG, ASF, etc.) contain special data fields called tags. Tags were designed to store additional information about an audio file, such as the track title, artist, album name, genre, and more.

 

Audio tags can also be useful when searching for duplicate songs. File content comparison (through hashing or byte-by-byte analysis) is often ineffective at detecting duplicate audio files because their contents naturally tend to vary depending upon how (and when) the audio itself was captured - meaning that the files themselves are often not truly identical. However, we can compare audio tags to great effect - for example, if two music files have identical title and artist tags, they are very likely to be the same song.

 

When "Compare music tags" is selected on the General Tab of the Comparison Options window (see above), this tab is used to specify precisely which tags are used to compare audio files. Duplicate File Detective supports a core set of audio tags which have been broadly adopted within the music industry (including artist, title, album, track, etc.). Audio tag comparisons are always performed in a case-insensitive manner (i.e. differences between upper and lower case are ignored).

 

Duplicate File Detective supports extraction and comparison of audio tags from the following music file formats: MP3, Ogg Vorbis, FLAC, MPC, Speex, WavPack, TrueAudio, Monkey's Audio (APE), WAV, AIFF, MP4 and ASF. Supported audio file extensions include .mp3, .ogg, .flac, .oga, .mpc, .wv, .spx, .tta, .m4a, .m4b, .m4p, .3g2, .mp4, .mv4, .ap3, .wma, .asf, .aif, .aiff, and .wav.

 

Important: Audio files are not required to contain tag data (although most do), and this duplicate detection method will not work with files that lack it.

 

You can choose how music tags with empty values are handled during comparison. You can either elect to always fail the comparison (in which case all selected tags must have values for files to be considered a match), or elect not to fail the comparison as long as at least one other non-empty tag matches (for example, if the title tag matches but the artist tag is empty).
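The tag-matching rules described above (case-insensitive comparison, plus a choice of how empty values are treated) can be sketched in Python as follows. This assumes tags have already been extracted into dictionaries by some tag-reading library (not shown), treats a tag as empty if it is missing or an empty string in either file, and uses a hypothetical `tags_match` function; it is not the program's actual code.

```python
from typing import Mapping, Optional, Sequence


def tags_match(a: Mapping[str, Optional[str]], b: Mapping[str, Optional[str]],
               selected: Sequence[str], fail_on_empty: bool = True) -> bool:
    """Compare the selected tags of two files, ignoring upper/lower case."""
    matched = 0  # selected tags that are non-empty in both files and equal
    for tag in selected:
        va, vb = a.get(tag) or "", b.get(tag) or ""
        if not va or not vb:
            if fail_on_empty:
                return False  # policy 1: any empty selected tag fails the comparison
            continue          # policy 2: skip empty tags and rely on the others
        if va.casefold() != vb.casefold():
            return False      # a non-empty mismatch always fails
        matched += 1
    return matched > 0        # at least one non-empty tag must actually match


song_a = {"title": "Yesterday", "artist": "The Beatles"}
song_b = {"title": "YESTERDAY", "artist": ""}
print(tags_match(song_a, song_b, ["title", "artist"]))                       # False
print(tags_match(song_a, song_b, ["title", "artist"], fail_on_empty=False))  # True
```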

 

There's also an option that causes music tags to be extracted from compatible file types even if those tags aren't used for comparison purposes. Enable this option if you want to see music tags in the results report output but don't wish to compare music tag data values.

 


 

See also:

File Checksums