
Scanf

The Scan For field in the Import/Export field specification is used to pull out parts of a string before importing or exporting.

This field uses the C scanf function, which offers very flexible ways of parsing strings.

Tip: We use a single string argument, so the simplest expression you can use in the Scan For field is %s. This extracts the first whitespace-delimited text in the target string.
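As a rough illustration in C, the sketch below assumes that Scan For behaves like a call to sscanf (the string-reading form of scanf) on the field's text with one 1024-character output buffer. The internals are not documented here, and scan_for is a hypothetical helper, not a Collect! API.

```c
#include <stdio.h>

/* Hypothetical helper, not a Collect! API: applies a Scan For style
 * format to a field's text and stores the one extracted string in out.
 * out is assumed to hold 1024 characters, the buffer size documented for
 * Scan For. A bare %s does not limit how much is copied, so input longer
 * than 1023 characters would overflow out.
 * Returns 1 if a string was stored, 0 otherwise. */
int scan_for(const char *field_text, const char *format, char out[1024])
{
    out[0] = '\0'; /* empty result when nothing can be extracted */
    return sscanf(field_text, format, out) == 1;
}
```

For example, scan_for("John Doe", "%s", buf) stores John in buf, the same result shown for %s in Example 1.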

Description

The scanf function reads data from the standard input stream and writes the data into the locations given by its arguments. Each argument must be a pointer to a variable whose type corresponds to a type specifier in the format. If copying takes place between strings that overlap, the behavior is undefined.
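sscanf follows the same rules as scanf but reads from a string rather than from standard input, which makes it convenient for small demonstrations. The sketch below (parse_record and the record layout are invented for illustration) passes one pointer of the matching type for each conversion.

```c
#include <stdio.h>

/* Hypothetical example: parses text such as "Smith 42 1250.75".
 * Each argument after the format is a pointer whose type matches its
 * conversion: a char array for %31s, int * for %d, double * for %lf.
 * name must hold 32 characters; the width 31 leaves room for the null.
 * Returns the number of fields assigned (3 on full success). */
int parse_record(const char *line, char name[32], int *age, double *balance)
{
    return sscanf(line, "%31s %d %lf", name, age, balance);
}
```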


Format Specification

A format specification has the following form:

% [ * ] [ width ] [ { h | l | L } ] type

The format argument specifies the interpretation of the input and can contain one or more of the following.

* Whitespace characters: blank (' '); tab ('\t'); or newline ('\n').

A whitespace character causes scanf to read, but not store, all consecutive whitespace characters in the input up to the next non-whitespace character. One whitespace character in the format matches any number (including 0) and combination of whitespace characters in the input.
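A minimal sketch of the whitespace rule, again using sscanf on a string in place of stdin; read_pair is a hypothetical name. The single space in the format matches a run of spaces, tabs, and newlines, or no whitespace at all.

```c
#include <stdio.h>

/* Hypothetical helper: reads two integers separated by a comma.
 * The space before ',' in the format is a whitespace directive: it
 * matches any amount of whitespace in the input, including none.
 * (%d skips leading whitespace on its own, so no space is needed
 * before the second %d.) Returns the number of integers stored. */
int read_pair(const char *text, int *a, int *b)
{
    return sscanf(text, "%d ,%d", a, b);
}
```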

* Non-whitespace characters, except for the percent sign (%).

A non-whitespace character causes scanf to read, but not store, a matching non-whitespace character. If the next character in stdin does not match, scanf terminates.
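The sketch below illustrates a literal character in the format, using sscanf and a hypothetical parse_time helper. When the input does not contain the expected character, scanning stops and the return value counts only the fields stored so far.

```c
#include <stdio.h>

/* Hypothetical helper: parses a time written as HH:MM.
 * The ':' in the format must match a ':' in the input. If it does not,
 * sscanf stops there, leaves minutes untouched, and returns 1. */
int parse_time(const char *text, int *hours, int *minutes)
{
    return sscanf(text, "%d:%d", hours, minutes);
}
```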

* Format specifications, introduced by the percent sign (%).

A format specification causes scanf to read and convert characters in the input into values of a specified type. The value is assigned to an argument in the argument list.

The format is read from left to right. Characters outside format specifications are expected to match the sequence of characters in stdin; the matching characters in stdin are scanned but not stored. If a character in stdin conflicts with the format specification, scanf terminates, and the character is left in stdin as if it had not been read.

When the first format specification is encountered, the value of the first input field is converted according to this specification. It is then stored in the location that is specified by the first argument. The second format specification causes the second input field to be converted and stored in the second argument, and so on through the end of the format string.

An input field is defined as all characters up to the first whitespace character (space, tab, or newline), or up to the first character that cannot be converted according to the format specification, or, finally, until the field width (if specified) is reached. If there are too many arguments for the given specifications, the extra arguments are evaluated but ignored. The results are unpredictable if there are not enough arguments for the format specification.
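To show an input field ending at a character that cannot be converted, here is a small sketch with sscanf; split_number_suffix is a hypothetical name. The digits end the %d field, and the next field starts at the first unconverted character.

```c
#include <stdio.h>

/* Hypothetical helper: in "123abc", %d converts digits until it meets
 * 'a', which cannot belong to a decimal integer, so number is 123.
 * Scanning resumes at 'a', and %7s stores the rest (suffix holds 8). */
int split_number_suffix(const char *text, int *number, char suffix[8])
{
    return sscanf(text, "%d%7s", number, suffix);
}
```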

Each field of the format specification is a single character or a number signifying a particular format option. The type character, which appears after the last optional format field, determines whether the input field is interpreted as a character, a string, or a number. The simplest format specification contains only the percent sign and a type character (for example, %s). If a percent sign (%) is followed by a character that has no meaning as a format-control character, that character and the following characters (up to the next percent sign) are treated as an ordinary sequence of characters; in other words, a sequence of characters that must match the input. For example, to specify that a percent-sign character is to be input, use %%.
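A sketch of matching a literal percent sign with %%, using sscanf and a hypothetical read_percent helper. Because a literal at the end of a format does not change the return value, the sketch adds %n (the n type listed in the Type table) to confirm that the % was actually matched.

```c
#include <stdio.h>

/* Hypothetical helper: reads a percentage written as, e.g., "75%".
 * %% matches one literal '%' and uses no argument. %n stores how many
 * characters have been read so far; it is reached only if the '%'
 * matched, so consumed stays 0 when the percent sign is missing.
 * %n does not add to sscanf's return value.
 * Returns 1 on success, 0 otherwise. */
int read_percent(const char *text, int *value)
{
    int consumed = 0;
    if (sscanf(text, "%d%%%n", value, &consumed) != 1 || consumed == 0)
        return 0;
    return 1;
}
```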

An asterisk (*) following the percent sign suppresses assignment of the next input field, which is interpreted as a field of the specified type. The field is scanned but not stored. See Example 1 below for a demonstration.
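One detail worth knowing when checking results in C: suppressed fields are not counted in the return value. The sketch below uses sscanf and a hypothetical third_word helper.

```c
#include <stdio.h>

/* Hypothetical helper: skips the first two whitespace-delimited words
 * and stores the third. Fields read with %* are scanned but not
 * assigned, and they are not counted in sscanf's return value, so
 * success is a return of 1, not 3. out must hold 64 characters. */
int third_word(const char *text, char out[64])
{
    return sscanf(text, "%*s %*s %63s", out) == 1;
}
```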


Type

The type character is the only required format field; it appears after any optional format fields. The type character determines whether the associated argument is interpreted as a character, string, or number.

Tip: In the Collect! Scan For function, only the s type specifier is used. A 1024-character buffer stores the text extracted by the scanf function. The resulting text is then read into a specified field, or exported to a file based on standard Collect! formatting specifiers.

Type Characters for scanf functions
Character | Type of Input Expected | Type of Argument
c | When used with scanf functions, specifies a single-byte character. Whitespace characters that are ordinarily skipped are read when c is specified. To read the next non-whitespace single-byte character, use %1s; to read the next non-whitespace wide character, use %1ws. | Pointer to char when used with scanf functions; pointer to wchar_t when used with wscanf functions.
C | When used with scanf functions, specifies a wide character. Whitespace characters that are ordinarily skipped are read when C is specified. To read the next non-whitespace single-byte character, use %1s; to read the next non-whitespace wide character, use %1ws. | Pointer to wchar_t when used with scanf functions; pointer to char when used with wscanf functions.
d | Decimal integer. | Pointer to int.
i | Decimal, hexadecimal, or octal integer. | Pointer to int.
o | Octal integer. | Pointer to int.
u | Unsigned decimal integer. | Pointer to unsigned int.
x | Hexadecimal integer. | Pointer to int.
e, E, f, g, G | Floating-point value consisting of an optional sign (+ or -), a series of one or more decimal digits possibly containing a decimal point, and an optional exponent ('e' or 'E') followed by an optionally signed integer value. | Pointer to float.
n | No input is read from the stream or buffer. | Pointer to int, into which is stored the number of characters successfully read from the stream or buffer up to that point in the current call to the scanf or wscanf functions.
s | String, up to the first whitespace character (space, tab, or newline). To read strings not delimited by space characters, use a set of square brackets ([ ]), as described in the Examples section. | When used with scanf functions, a single-byte character array; when used with wscanf functions, a wide-character array. In either case, the array must be large enough for the input field plus the terminating null character, which is automatically appended.
S | String, up to the first whitespace character (space, tab, or newline). To read strings not delimited by space characters, use a set of square brackets ([ ]), as described in the Examples section. | When used with scanf functions, a wide-character array; when used with wscanf functions, a single-byte character array. In either case, the array must be large enough for the input field plus the terminating null character, which is automatically appended.

The types C and S are Microsoft extensions and are not ANSI-compatible; c and s are standard ANSI C types.

Thus, to read single-byte or wide characters with scanf functions and wscanf functions, use format specifiers as follows.

To Read Character As | Use This Function | With These Format Specifiers
single byte | scanf functions | c, hc, or hC
wide | scanf functions | C, lc, or lC

To scan strings with scanf functions and wscanf functions, use the prefixes h and l analogously with the format type specifiers s and S.
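The difference between %c and %1s described in the type table can be seen in a short sketch with sscanf; first_chars is a hypothetical name, and only the standard c and s types are used.

```c
#include <stdio.h>

/* Hypothetical helper contrasting %c and %1s on the same text.
 * %c stores the very next character, even a space; it does not skip
 * whitespace. %1s first skips whitespace, then stores one character
 * followed by a terminating null, so word_start needs room for 2.
 * Returns 1 if both conversions succeeded. */
int first_chars(const char *text, char *raw, char word_start[2])
{
    int got_raw = sscanf(text, "%c", raw);
    int got_word = sscanf(text, "%1s", word_start);
    return got_raw == 1 && got_word == 1;
}
```

With the input "  x", raw receives a space while word_start receives x.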


Width

Width is a positive decimal integer which controls the maximum number of characters to be read from stdin. No more than width characters are converted and stored at the corresponding argument. Fewer than width characters may be read if a whitespace character (space, tab, or newline) or a character that cannot be converted according to the given format occurs before width is reached.
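A short sketch of the width limit, using sscanf on a string in place of stdin; split_digits is a hypothetical name. The second call shows fewer than width characters being read when whitespace comes first.

```c
#include <stdio.h>

/* Hypothetical helper: %4d converts at most 4 characters, so "123456"
 * yields head = 1234, and the next %d starts at the first unread
 * character, yielding tail = 56. In "12 3456" the space ends the first
 * field before the width is reached, giving 12 and 3456. */
int split_digits(const char *text, int *head, int *tail)
{
    return sscanf(text, "%4d%d", head, tail);
}
```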

The optional prefixes h, l, and L indicate the size of the argument (long or short, single-byte character or wide character, depending upon the type character that they modify). These format-specification characters are used with type characters in scanf or wscanf functions to specify interpretation of arguments as shown in the table below. With integer and floating-point types these prefixes are part of ANSI C; combining h with the character and string types (hc, hC, hs, hS) is a Microsoft extension and is not ANSI-compatible. The type characters and their meanings are described in ANSI C documentation.

Size Prefixes for scanf and wscanf Format-Type Specifiers
To Specify | Use Prefix | With Type Specifier
double | l | e, E, f, g, or G
long int | l | d, i, o, x, or X
long unsigned int | l | u
short int | h | d, i, o, x, or X
short unsigned int | h | u
Single-byte character with scanf | h | c or C
Single-byte character with wscanf | h | c or C
Wide character with scanf | l | c or C
Wide character with wscanf | l | c or C
Single-byte character string with scanf | h | s or S
Wide-character string with scanf | l | s or S
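A sketch of the size prefixes with standard integer and floating-point types, using sscanf; read_sizes is a hypothetical name. Each prefix has to agree with the pointer type passed for it.

```c
#include <stdio.h>

/* Hypothetical helper: the size prefix must match the pointer type.
 *   %hd -> short *,  %ld -> long *,  %lf -> double *
 * Plain %d would expect an int *, and plain %f a float *.
 * Returns the number of fields stored (3 on full success). */
int read_sizes(const char *text, short *s, long *l, double *d)
{
    return sscanf(text, "%hd %ld %lf", s, l, d);
}
```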


Examples

Following are examples of the use of scanf functions.

Scan For: %s // Reads a string.


To read strings not delimited by space characters, a set of characters in brackets ([ ]) can be substituted for the s (string) type character. The corresponding input field is read up to the first character that does not appear in the bracketed character set. If the first character in the set is a caret (^), the effect is reversed. The input field is read up to the first character that does appear in the rest of the character set.

Note that %[a-z] and %[z-a] are interpreted as equivalent to %[abcde...z]. This is a common scanf function extension, but note that the ANSI standard does not require it.
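The bracket sets described above can be tried with sscanf. The sketch spells out the digits instead of using a range such as 0-9, since ranges are an extension; split_address and the address text are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: splits text such as "4521 Main Street, Apt 2".
 * %15[0123456789] stores a run of digits (at most 15, plus a null).
 * %63[^,] then stores everything up to, but not including, the first
 * comma. A scan set does not skip leading whitespace the way %s does,
 * so street begins with the space that follows the number.
 * Returns the number of fields stored (2 on full success). */
int split_address(const char *text, char number[16], char street[64])
{
    return sscanf(text, "%15[0123456789]%63[^,]", number, street);
}
```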

To store a string without storing a terminating null character ('\0'), use the specification %nc where n is a decimal integer. In this case, the c type character indicates that the argument is a pointer to a character array. The next n characters are read from the input stream into the specified location, and no null character ('\0') is appended. If n is not specified, its default value is 1.
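A small sketch of %nc with n = 3, using sscanf and a hypothetical read_code helper. Pre-filling the buffer makes it visible that nothing is written after the third character.

```c
#include <stdio.h>

/* Hypothetical helper: %3c copies exactly three characters into code
 * and appends no null terminator, so code holds raw characters, not a
 * C string. Like %c, it does not skip leading whitespace.
 * Returns 1 if three characters were read. */
int read_code(const char *text, char code[3])
{
    return sscanf(text, "%3c", code);
}
```

If code is declared as char code[4] filled with x characters, reading "ABCDEF" changes only the first three to A, B, and C; code[3] is still x.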

The scanf function scans each input field, character by character. It may stop reading a particular input field before it reaches a whitespace character for any of the following reasons:

1. The specified width has been reached.

2. The next character cannot be converted as specified.

3. The next character conflicts with a character in the control string that it is supposed to match.

4. The next character fails to appear in a given character set.

For whatever reason, when the scanf function stops reading an input field, the next input field is considered to begin at the first unread character. The conflicting character, if there is one, is considered unread and is the first character of the next input field or the first character in subsequent read operations.
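A minimal sketch of the unread conflicting character becoming the start of the next field, using sscanf; read_two is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: in "12-34" the '-' cannot continue the first
 * integer, so the first %d stops after "12". The '-' is left unread,
 * becomes the first character of the next field, and the second %d
 * converts it together with the digits as -34. */
int read_two(const char *text, int *first, int *second)
{
    return sscanf(text, "%d%d", first, second);
}
```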


Example 1

Scan For: %*s %s // Omits the first string and reads the next string.

This command will skip over the first string and output the second. This is useful for extracting the last name from a name string such as John Doe.

Input: John Doe

Using %*s %s the output is Doe.
Using %s the output is John.


Example 2

Scan For: %*s %[a-zA-Z,. ] // Omits the first string, then reads all the characters in the [ ] set (letters, comma, period, and space) and stops at the first character that is not in the set.

This command will skip over the first string and output the remaining text. This is useful for dropping a leading number from a field and bringing in the rest of the data, for example a Debtor Company name such as 567 Collections, Inc.

Input: 567 Collections, Inc.

Using %*s %[a-zA-Z,. ] the output is Collections, Inc. Without the space inside the brackets, reading stops at the space after Collections, so only the text up to and including the comma is returned.
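In standard C, a bracket set stops at the first character that is not in the set, and that includes a space. The sketch below runs the set with and without a space through sscanf so the two results can be compared. compare_company_formats is a hypothetical name, and it assumes the C library accepts ranges such as a-z (the common extension noted earlier).

```c
#include <stdio.h>

/* Hypothetical helper: runs the Example 2 set without and with a space
 * and stores each result in its own 1024-char buffer. A buffer stays
 * empty if its scan stores nothing.
 * Returns how many of the two scans stored a value. */
int compare_company_formats(const char *text,
                            char without_space[1024],
                            char with_space[1024])
{
    int stored = 0;
    without_space[0] = '\0';
    with_space[0] = '\0';
    /* ' ' is not in this set, so reading stops after "Collections,". */
    stored += sscanf(text, "%*s %[a-zA-Z,.]", without_space) == 1;
    /* ' ' is in this set, so reading continues to the end of the text. */
    stored += sscanf(text, "%*s %[a-zA-Z,. ]", with_space) == 1;
    return stored;
}
```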


Example 3

Scan For: %[^*] // Brings in all characters before the first asterisk (*).

This command will output all the text in a field that comes before the first asterisk (*). This is useful for extracting text from a field that has extra, unneeded characters at the end, such as John Doe **.

Input: John Doe **

Using %[^*] the output is John Doe


Example 4

Scan For: %[^,] // Outputs all the text before the first comma.

This command will output all the text before the first comma.

This is useful for extracting a last name with a generation from a Name field such as Doe III, John.

Input: Doe III, John

Using %[^,] the output is Doe III


Example 5

Scan For: %*[^,]%*[,]%[^\"] // Omits everything before the comma. Then omits the comma. Lastly outputs all remaining text to end of field.

This command will skip over all the text before and including the comma, then output the remaining text. This is useful for extracting a first name with middle names from a Name field such as Doe, John Harry William.

Input: Doe, John Harry William

Using %*[^,]%*[,]%[^\"] the output is John Harry William
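In standard C, %[ does not skip leading whitespace, so when the Example 5 format is run through a plain sscanf call the stored text begins with the space that follows the comma. The sketch compares it with the same format plus a space directive before the final set, which skips that space. Whether Collect! trims leading whitespace after scanning is not stated here, and compare_name_formats is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: runs the Example 5 format as written and with a
 * space directive added before the final scan set. Each result goes in
 * its own 1024-char buffer, left empty if its scan stores nothing.
 * Returns how many of the two scans stored a value. */
int compare_name_formats(const char *name,
                         char as_written[1024],
                         char space_skipped[1024])
{
    int stored = 0;
    as_written[0] = '\0';
    space_skipped[0] = '\0';
    /* Skip the last name, skip the comma, keep everything up to a quote. */
    stored += sscanf(name, "%*[^,]%*[,]%[^\"]", as_written) == 1;
    /* Same, but the ' ' directive first skips whitespace after the comma. */
    stored += sscanf(name, "%*[^,]%*[,] %[^\"]", space_skipped) == 1;
    return stored;
}
```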


Phone Number Import

You can use Scan For formats in imports to output different parts of a phone number. Each format extracts a specific number of characters from the phone number string.

Your import map needs three fields: Area Code, Exchange, and Number. You can use the append, default value, and other parameters in these field specifications.

Scan For: %3s // Outputs Area Code

Scan For: %*3s %3s // Outputs Exchange

Scan For: %*6s %4s // Outputs Number

Tip: You cannot use a default value, such as a dash, in a field specification that uses Scan For. Put the dash in a separate field specification.

Input: 2503910466

Using %3s for Area Code, the output is 250

Using %*3s %3s for Exchange, the output is 391

Using %*6s %4s for Number, the output is 0466
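The phone number split can be sketched in C with one sscanf call per part, mirroring three separate field specifications. split_phone is a hypothetical name, and the sketch only assumes Scan For behaves like sscanf on the field text.

```c
#include <stdio.h>

/* Hypothetical helper: applies the Area Code, Exchange, and Number
 * formats to one phone number string, each with its own sscanf call.
 * The space in "%*3s %3s" matches zero whitespace here, so scanning
 * continues directly after the skipped digits. Buffers need room for
 * the digits plus a terminating null.
 * Returns 1 if all three parts were extracted. */
int split_phone(const char *phone, char area[4], char exchange[4], char number[5])
{
    return sscanf(phone, "%3s", area) == 1
        && sscanf(phone, "%*3s %3s", exchange) == 1
        && sscanf(phone, "%*6s %4s", number) == 1;
}
```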


See Also

- File Format Specification
- How To Use Import/Export
- Import Menu
- Export Menu
- Import Field Specification
- Import/Export Topics

