Preprocessor Commands

KDiff3 supports two preprocessor options.

Preprocessor-Command:: When any file is read, it will be piped through this external command. The output of this command will be visible instead of the original file. You can write your own preprocessor that fulfills your specific needs. Use this to cut away disturbing parts of the file, or to automatically correct the indentation etc.
Line-Matching Preprocessor-Command:: When any file is read, it will be piped through this external command. If a preprocessor-command (see above) is also specified, then the output of the preprocessor is the input of the line-matching preprocessor. The output will only be used during the line matching phase of the analysis. You can write your own preprocessor that fulfills your specific needs. Each input line must have a corresponding output line.

The idea is to allow the user greater flexibility while configuring the diff-result. But this requires an external program, and many users don't want to write one themselves. The good news is that very often sed or perl will do the job.

Example: a simple test case. Consider file a.txt (6 lines):

      aa
      ba
      ca
      da
      ea
      fa

And file b.txt (3 lines):

      cg
      dg
      eg

Without a preprocessor the following lines would be placed next to each other:

      aa - cg
      ba - dg
      ca - eg
      da
      ea
      fa

This is probably not wanted, since the first letter contains the actually interesting information. To help the matching algorithm ignore the second letter, we can use a line-matching preprocessor command that replaces 'g' with 'a':

   sed 's/g/a/'

With this command the result of the comparison would be:

      aa
      ba
      ca - cg
      da - dg
      ea - eg
      fa

Internally the matching algorithm sees the files after running the line matching preprocessor, but on the screen the file is unchanged. (The normal preprocessor would change the data also on the screen.)
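To try this outside KDiff3, the two files of the test case can be recreated in a console and b.txt run through the same command. This is only a sketch: the scratch directory and the variable names are assumptions made for illustration.

```shell
# Recreate the example files in a scratch directory (location is arbitrary).
dir=$(mktemp -d)
printf '%s\n' aa ba ca da ea fa > "$dir/a.txt"
printf '%s\n' cg dg eg > "$dir/b.txt"

# What the matching algorithm would see for b.txt after the
# line-matching preprocessor; every input line yields one output line.
matched_b=$(sed 's/g/a/' "$dir/b.txt")
printf '%s\n' "$matched_b"
# -> ca
# -> da
# -> ea

rm -r "$dir"
```

The lines ca, da and ea now have identical counterparts in a.txt, which is why they end up next to each other in the comparison.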

sed Basics

This section only introduces some very basic features of sed. For more information see info:/sed or http://www.gnu.org/software/sed/manual/html_mono/sed.html. A precompiled version for Windows can be found at http://unxutils.sourceforge.net. Note that the following examples assume that the sed-command is in some directory in the PATH-environment variable. If this is not the case, you have to specify the full absolute path for the command.

Note

Also note that the following examples use the single quotation mark ('), which won't work on Windows. On Windows you should use double quotation marks (") instead.

In this context only the sed-substitute-command is used:

   sed 's/REGEXP/REPLACEMENT/FLAGS'

Before you use a new command within KDiff3, you should first test it in a console. Here the echo-command is useful. Example:

   echo abrakadabra | sed 's/a/o/'
   -> obrakadabra

This example shows a very simple sed-command that replaces the first occurrence of "a" with "o". If you want to replace all occurrences then you need the "g"-flag:

   echo abrakadabra | sed 's/a/o/g'
   -> obrokodobro

The "|"-symbol is the pipe-command that transfers the output of the previous command to the input of the following command. If you want to test with a longer file then you can use cat on Unix-like systems or type on Windows-like systems. sed will do the substitution for each line.

   cat filename | sed options

Examples For sed-Use In KDiff3

Ignoring Other Types Of Comments

Currently KDiff3 understands only C/C++ comments. Using the Line-Matching-Preprocessor-Command you can also ignore other types of comments, by converting them into C/C++-comments. Example: To ignore comments starting with "#", you would like to convert them to "//". Note that you also must enable the "Ignore C/C++-Comments" option to get an effect. An appropriate Line-Matching-Preprocessor-Command would be:

   sed 's/#/\/\//'

Since for sed the "/"-character has a special meaning, it is necessary to place the "\"-character before each "/" in the replacement-string. Sometimes the "\" is required to add or remove a special meaning of certain characters. The single quotation marks (') before and after the substitution-command are important now, because otherwise the shell will try to interpret some special characters like '#', '$' or '\' before passing them to sed. Note that on Windows you will need the double quotation marks (") here. Windows substitutes other characters like '%', so you might have to experiment a little bit.
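A quick console check of this command, in the style of the earlier examples. The sample input line is made up for illustration. The second form relies on the fact that sed accepts a delimiter other than "/" for the substitute-command, which avoids the backslashes.

```shell
# Hypothetical sample line; any text containing a '#' comment will do.
line='x = 1  # set x'

# The command from the text: '/' must be escaped in the replacement.
converted=$(printf '%s\n' "$line" | sed 's/#/\/\//')
printf '%s\n' "$converted"
# -> x = 1  // set x

# Same substitution with '|' as delimiter, so '/' needs no escaping.
converted_alt=$(printf '%s\n' "$line" | sed 's|#|//|')
printf '%s\n' "$converted_alt"
# -> x = 1  // set x
```

Without the "g"-flag only the first "#" on each line is converted, which is enough to start a C++-style comment.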

Case-Insensitive Diff

Use the following Line-Matching-Preprocessor-Command to convert all input to uppercase:

   sed 's/\(.*\)/\U\1/'

Here the ".*" is a regular expression that matches any string and in this context matches all characters in the line. The "\1" in the replacement string refers to the matched text within the first pair of "\(" and "\)". The "\U" converts the inserted text to uppercase.
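The "\U" in the replacement is a GNU sed extension, so the command above may not work with other sed implementations. The sketch below first tests the command in a console (assuming GNU sed is installed) and then shows tr as a possible alternative; tr also produces exactly one output line per input line.

```shell
# Assumes GNU sed: \U uppercases the text inserted by \1.
upper_sed=$(echo 'Hello World' | sed 's/\(.*\)/\U\1/')
echo "$upper_sed"
# -> HELLO WORLD

# Possible alternative without the GNU extension:
# tr maps every lowercase letter to its uppercase counterpart.
upper_tr=$(echo 'Hello World' | tr '[:lower:]' '[:upper:]')
echo "$upper_tr"
# -> HELLO WORLD
```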

Ignoring Version Control Keywords

CVS and other version control systems use several keywords to insert automatically generated strings (info:/cvs/Keyword substitution). All of them follow the pattern "$KEYWORD generated text$". We now need a Line-Matching-Preprocessor-Command that removes only the generated text:

   sed 's/\$\(Revision\|Author\|Log\|Header\|Date\).*\$/\$\1\$/'

The "\|" separates the possible keywords. You might want to modify this list according to your needs. The "\" before the "$" is necessary because otherwise the "$" matches the end of the line.
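A console test of this command, assuming GNU sed (the "\|" alternation is not available in every sed). The sample lines are invented. Note that ".*" is greedy and matches up to the last "$" in the line; the second command is a hedged variant that uses "[^$]*" to stop at the next "$" instead.

```shell
# Hypothetical source line containing an expanded keyword.
line='// $Revision: 1.42 $ last change'
stripped=$(printf '%s\n' "$line" |
  sed 's/\$\(Revision\|Author\|Log\|Header\|Date\).*\$/\$\1\$/')
printf '%s\n' "$stripped"
# -> // $Revision$ last change

# Variant: [^$]* matches only up to the next '$', so a later '$'
# on the same line is left alone.
line2='$Revision: 1.42 $ costs 5$'
stripped2=$(printf '%s\n' "$line2" |
  sed 's/\$\(Revision\|Author\|Log\|Header\|Date\)[^$]*\$/\$\1\$/')
printf '%s\n' "$stripped2"
# -> $Revision$ costs 5$
```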

While experimenting with sed you might come to understand and even like these regular expressions. They are useful because there are many other programs that also support similar things.

Ignoring Numbers

Ignoring numbers actually is a built-in option. But as another example, this is how it would look as a Line-Matching-Preprocessor-command:

   sed 's/[0123456789.-]//g'

Any character within '[' and ']' is a match and will be replaced with nothing.
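Tested in a console with an invented sample line, the command behaves as follows. The second call uses the range "0-9" as a shorter way to write the same character set.

```shell
# Hypothetical sample line with an integer and a decimal number.
cleaned=$(echo 'row 12: width=3.5cm' | sed 's/[0123456789.-]//g')
echo "$cleaned"
# -> row : width=cm

# Equivalent character set written with a range; the '-' at the end
# of the brackets is taken literally.
cleaned_range=$(echo 'row 12: width=3.5cm' | sed 's/[0-9.-]//g')
echo "$cleaned_range"
# -> row : width=cm
```

Keep in mind that "." and "-" are removed everywhere in the line, not only where they are part of a number.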

Ignoring Certain Columns

Sometimes a text is very strictly formatted, and contains columns that you always want to ignore, while there are other columns you want to preserve for analysis. In the following example the first five columns (characters) are ignored, the next ten columns are preserved, then again five columns are ignored and the rest of the line is preserved.

   sed 's/.....\(..........\).....\(.*\)/\1\2/'

Each dot '.' matches any single character. The "\1" and "\2" in the replacement string refer to the matched text within the first and second pair of "\(" and "\)" denoting the text to be preserved.
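A console test with an invented record may make the column arithmetic easier to follow. The sample consists of 5 characters to ignore ("00017"), 10 to keep ("CustomerID"), 5 to ignore ("12:30") and a remainder to keep.

```shell
# Hypothetical fixed-width record: 5 + 10 + 5 characters, then the rest.
kept=$(echo '00017CustomerID12:30 payload' |
  sed 's/.....\(..........\).....\(.*\)/\1\2/')
echo "$kept"
# -> CustomerID payload

# A line shorter than 20 characters does not match the pattern
# and passes through unchanged.
short=$(echo 'short' | sed 's/.....\(..........\).....\(.*\)/\1\2/')
echo "$short"
# -> short
```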

Combining Several Substitutions

Sometimes you want to apply several substitutions at once. You can then use the semicolon ';' to separate these from each other. Example:

   echo abrakadabra | sed 's/a/o/g;s/\(.*\)/\U\1/'
   -> OBROKODOBRO

Using perl instead of sed

Instead of sed you might want to use something else like perl.

   perl -p -e 's/REGEXP/REPLACEMENT/FLAGS'

But some details are different in perl. Note that where sed needed "\(" and "\)" perl requires the simpler "(" and ")" without preceding '\'. Example:

   sed 's/\(.*\)/\U\1/'
   perl -p -e 's/(.*)/\U\1/'

Order Of Preprocessor Execution

The data is piped through all internal and external preprocessors in the following order:

Normal preprocessor,
Line-Matching-Preprocessor,
Ignore case (conversion to uppercase),
Detection of C/C++ comments,
Ignore numbers,
Ignore white space

The data after the normal preprocessor will be preserved for display and merging. The other operations only modify the data that the line-matching-diff-algorithm sees.

In the rare cases where you use a normal preprocessor note that the line-matching-preprocessor sees the output of the normal preprocessor as input.
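The first two steps can be imitated in a console by chaining two commands. This is only a sketch of the data flow, not of how KDiff3 is implemented: the function names and the two sed commands are arbitrary choices for illustration.

```shell
# Hypothetical normal preprocessor: strip trailing white space.
normal_pp() { sed 's/[[:space:]]*$//'; }
# Hypothetical line-matching preprocessor: turn '#' comments into '//'.
line_match_pp() { sed 's/#/\/\//'; }

# Output of the normal preprocessor: kept for display and merging.
displayed=$(printf 'x = 1  # note   \n' | normal_pp)
printf '%s\n' "$displayed"
# -> x = 1  # note

# The line-matching preprocessor receives that output as its input;
# its result is only seen by the line-matching algorithm.
matched=$(printf '%s\n' "$displayed" | line_match_pp)
printf '%s\n' "$matched"
# -> x = 1  // note
```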

Warning

The preprocessor-commands are often very useful, but as with any option that modifies your texts or hides away certain differences automatically, you might accidentally overlook certain differences and in the worst case destroy important data.

For this reason, if a normal preprocessor-command is in use during a merge, KDiff3 will tell you so and ask whether it should be disabled. It won't warn you, however, if a Line-Matching-Preprocessor-command is active. The merge will not complete until all conflicts are solved. If you disabled "Show White Space", the differences that were removed with the Line-Matching-Preprocessor-command will also be invisible. If the Save-button remains disabled during a merge (because of remaining conflicts), make sure to enable "Show White Space". If you don't want to merge these less important differences manually, you can select "Choose [A|B|C] For All Unsolved White space Conflicts" in the Merge-menu.