How To Use Scanf
The Scan For field in the Import/Export field specification is used to pull out parts of a string
before importing or exporting.
This field uses the scanf function in 'C', which offers very flexible ways of parsing strings.
We use a single string argument, so the simplest expression you can use in the Scan For
field is %s. This would extract the first white space delimited text in the target string.
Description
The scanf function reads data from the standard input stream, and writes the data into the location
given by argument. Each argument must be a pointer to a variable; this variable must be of a type
that corresponds to a type specifier in format. If copying takes place between strings that overlap,
the behavior is undefined.
Format Specification
A format specification has the following form:
% [ * ] [ width ] [ { h | l | L } ] type
The format argument specifies the interpretation of the input and can contain one or more of the
following:
- White space characters: blank (' '); tab ('\t'); or newline ('\n').
A white space character causes scanf to read, but not store, all consecutive white space
characters in the input up to the next non-white space character. One white space character
in the format matches any number (including 0) and combination of white space characters in
the input.
- Non white space characters, except for the percent sign (%).
A non white space character causes scanf to read, but not store, a matching non white space
character. If the next character in stdin does not match, scanf terminates.
- Format specifications, introduced by the percent sign (%).
A format specification causes scanf to read and convert characters in the input into values of a
specified type. The value is assigned to an argument in the argument list.
The format is read from left to right. Characters outside format specifications are expected to
match the sequence of characters in stdin; the matching characters in stdin are scanned but not
stored. If a character in stdin conflicts with the format specification, scanf terminates, and the
character is left in stdin as if it had not been read.
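The matching rules above can be tried out in C with sscanf, the member of the scanf family that reads from a string instead of stdin. This is a minimal sketch, not Collect! code: the helper name read_after_label, the label text Name and the 32 character buffer are assumptions chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: stores the token that follows the literal label
 * "Name:" in `out`. Returns 1 on success, 0 if the input does not match. */
static int read_after_label(const char *input, char out[32])
{
    out[0] = '\0';
    /* "Name" and ":" must match the input exactly. Each space in the
     * format matches any run of white space, including none. */
    return sscanf(input, "Name : %31s", out) == 1;
}
```

Under these assumptions, both Name:Bob and Name   :   Bob yield Bob, while Nome: Bob yields nothing because scanning stops at the first character that conflicts with the format.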
When the first format specification is encountered, the value of the first input field is converted
according to this specification. It is then stored in the location that is specified by the first
argument. The second format specification causes the second input field to be converted and stored
in the second argument, and so on through the end of the format string.
An input field is defined as all characters up to the first white space character (space, tab, or
newline), or up to the first character that cannot be converted according to the format specification,
or, finally, until the field width (if specified) is reached. If there are too many arguments for the
given specifications, the extra arguments are evaluated but ignored. The results are unpredictable
if there are not enough arguments for the format specification.
Each field of the format specification is a single character or a number signifying a particular
format option. The type character, which appears after the last optional format field, determines
whether the input field is interpreted as a character, a string, or a number. The simplest format
specification contains only the percent sign and a type character (for example, %s). If a percent
sign (%) is followed by a character that has no meaning as a format-control character, that character
and the following characters (up to the next percent sign) are treated as an ordinary sequence of
characters; in other words, a sequence of characters that must match the input. For example, to
specify that a percent-sign character is to be input, use %%.
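As a small illustration of %%, the sketch below uses sscanf in C to skip a number and its percent sign and keep the word that follows. The helper name word_after_percent and the buffer size are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: stores the word that follows a percentage such as
 * "50%" in `out`. Returns 1 on success, 0 otherwise. */
static int word_after_percent(const char *input, char out[32])
{
    out[0] = '\0';
    /* %*d scans the number without storing it; %% matches one literal
     * percent sign in the input. */
    return sscanf(input, "%*d%% %31s", out) == 1;
}
```

With the input 50% off the helper stores off. With 50 off it stores nothing, because the percent sign required by the format is missing.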
An asterisk (*) following the percent sign suppresses assignment of the next input field, which is
interpreted as a field of the specified type. The field is scanned but not stored. See the examples
at the end of this document for clarification of this point.
Type
The type character is the only required format field; it appears after any optional format fields.
The type character determines whether the associated argument is interpreted as a character, string,
or number.
In the Collect! "Scan For" function, only the 's' type specifier is used. A 1024 character
buffer is used to store the text extracted by the scanf function. The resulting text is then
read into a specified field, or exported to file based on standard Collect! formatting
specifiers.
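The following C sketch shows one way a single-argument scan into a 1024 character buffer could look. It is an assumption about the general approach, not the actual Collect! implementation: the names scan_for and SCAN_BUFFER_SIZE are hypothetical, sscanf is used because the input is a string, and the expression is assumed to contain exactly one conversion that stores a result.

```c
#include <stdio.h>
#include <string.h>

#define SCAN_BUFFER_SIZE 1024 /* buffer size quoted in the text above */

/* Hypothetical helper: applies a Scan For expression to `input` and copies
 * the extracted text into `out`, which must hold SCAN_BUFFER_SIZE bytes.
 * Returns 1 if a string was extracted, 0 otherwise. */
static int scan_for(const char *expression, const char *input, char *out)
{
    char buffer[SCAN_BUFFER_SIZE] = "";
    int converted;

    out[0] = '\0';

    /* An unbounded %s copies as many characters as the input supplies,
     * so inputs that could overflow the buffer are refused. */
    if (strlen(input) >= SCAN_BUFFER_SIZE) {
        return 0;
    }

    /* Only one string argument is passed, so only one result can be stored. */
    converted = sscanf(input, expression, buffer);
    strcpy(out, buffer);
    return converted == 1;
}
```

Called as scan_for("%s", "John Doe", out), this sketch stores John in out.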
Type Characters for scanf functions
Character | Type of Input Expected | Type of Argument
c | When used with scanf functions, specifies single-byte character. White space characters that are ordinarily skipped are read when c is specified. To read next non white space single-byte character, use %1s; to read next non white space wide character, use %1ws. | Pointer to char when used with scanf functions, pointer to wchar_t when used with wscanf functions.
C | When used with scanf functions, specifies wide character. White space characters that are ordinarily skipped are read when C is specified. To read next non white space single-byte character, use %1s; to read next non white space wide character, use %1ws. | Pointer to wchar_t when used with scanf functions, pointer to char when used with wscanf functions.
d | Decimal integer. | Pointer to int.
i | Decimal, hexadecimal, or octal integer. | Pointer to int.
o | Octal integer. | Pointer to int.
u | Unsigned decimal integer. | Pointer to unsigned int.
x | Hexadecimal integer. | Pointer to int.
e, E, f, g, G | Floating-point value consisting of optional sign (+ or -), series of one or more decimal digits containing decimal point, and optional exponent ('e' or 'E') followed by an optionally signed integer value. | Pointer to float.
n | No input read from stream or buffer. | Pointer to int, into which is stored number of characters successfully read from stream or buffer up to that point in current call to scanf functions or wscanf functions.
s | String, up to first white space character (space, tab or newline). To read strings not delimited by space characters, use set of square brackets ([ ]), as discussed in the Examples section. | When used with scanf functions, signifies single-byte character array; when used with wscanf functions, signifies wide-character array. In either case, character array must be large enough for input field plus terminating null character, which is automatically appended.
S | String, up to first white space character (space, tab or newline). To read strings not delimited by space characters, use set of square brackets ([ ]), as discussed in the Examples section. | When used with scanf functions, signifies wide-character array; when used with wscanf functions, signifies single-byte character array. In either case, character array must be large enough for input field plus terminating null character, which is automatically appended.
The types C and S are Microsoft extensions and are not ANSI-compatible; c and s are standard type characters.
Thus, to read single-byte or wide characters with scanf functions and wscanf functions, use format
specifiers as follows.
To Read Character As | Use This Function | With These Format Specifiers
single byte | scanf functions | c, hc, or hC
wide | scanf functions | C, lc, or lC
To scan strings with scanf functions and wscanf functions, use the prefixes h and l analogously
with the format type specifiers s and S.
Width
Width is a positive decimal integer which controls the maximum number of characters to be read from
stdin. No more than width characters are converted and stored at the corresponding argument. Fewer
than width characters may be read if a white space character (space, tab, or newline) or a character
that cannot be converted according to the given format occurs before width is reached.
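A short C sketch of the width rule, using sscanf on a string. The helper name read_up_to_five and the width of 5 are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most five characters of the first
 * white-space-delimited token of `input` into `out`.
 * Returns 1 if something was read, 0 otherwise. */
static int read_up_to_five(const char *input, char out[6])
{
    out[0] = '\0';
    /* The width 5 caps the conversion; a white space character ends it
     * earlier. The sixth byte of `out` holds the terminating null. */
    return sscanf(input, "%5s", out) == 1;
}
```

For abcdefgh the result is abcde, since the width is reached first. For abc defgh the result is abc, since the space ends the field before the width is reached.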
The optional prefixes h, l, and L indicate the size of the argument (long or short, single-byte
character or wide character, depending upon the type character that they modify). These
format-specification characters are used with type characters in scanf or wscanf functions to specify
interpretation of arguments as shown in the table below. The prefixes h and l are Microsoft
extensions, and are not ANSI-compatible, when used with the character types c, C, s, and S; with the
numeric types they are standard. The type characters and their meanings are described in
ANSI C documentation.
Size Prefixes for scanf and wscanf Format-Type Specifiers
To Specify | Use Prefix | With Type Specifier
double | l | e, E, f, g, or G
long int | l | d, i, o, x, or X
long unsigned int | l | u
short int | h | d, i, o, x, or X
short unsigned int | h | u
Single-byte character with scanf | h | c or C
Single-byte character with wscanf | h | c or C
Wide character with scanf | l | c or C
Wide character with wscanf | l | c or C
Single-byte character string with scanf | h | s or S
Wide-character string with scanf | l | s or S
Examples
Following are examples of the use of scanf functions.
Scan For: %s
Reads a string, up to the first white space.
To read strings not delimited by space characters, a set of characters in brackets ([ ]) can be
substituted for the s (string) type character. The corresponding input field is read up to the
first character that does not appear in the bracketed character set. If the first character in the
set is a caret (^), the effect is reversed. The input field is read up to the first character that
does appear in the rest of the character set.
Note that %[a-z] and %[z-a] are interpreted as equivalent to %[abcde...z]. This is a common scanf
function extension, but note that the ANSI standard does not require it.
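The two bracket forms can be sketched in C as follows. The helper names and buffer sizes are hypothetical, and the digit set is spelled out in full so that the sketch does not depend on the range extension.

```c
#include <stdio.h>

/* Hypothetical helper: reads the leading run of decimal digits. */
static int read_digits(const char *input, char out[16])
{
    out[0] = '\0';
    /* Reading stops at the first character that is not in the set. */
    return sscanf(input, "%15[0123456789]", out) == 1;
}

/* Hypothetical helper: reads everything before the first semicolon. */
static int read_until_semicolon(const char *input, char out[64])
{
    out[0] = '\0';
    /* The caret reverses the set: reading stops at the first semicolon. */
    return sscanf(input, "%63[^;]", out) == 1;
}
```

For 2024-05 the first helper stores 2024. For red green;blue the second stores red green, spaces included. A bracket conversion must match at least one character, so an input that starts with a character outside the set produces no result.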
To store a string without storing a terminating null character ('\0'), use the specification %nc
where n is a decimal integer. In this case, the c type character indicates that the argument is a
pointer to a character array. The next n characters are read from the input stream into the specified
location, and no null character ('\0') is appended. If n is not specified, its default value is 1.
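Because %nc appends no null character, C code that wants a usable string has to terminate it itself. A minimal sketch, assuming a fixed count of four characters and a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies up to four characters of `input` into `out`
 * with %4c. Returns 1 if the conversion succeeded, 0 otherwise. */
static int read_four_chars(const char *input, char out[5])
{
    /* %4c writes no terminating null character, so the buffer is
     * cleared first to guarantee a terminated string. */
    memset(out, 0, 5);
    /* Unlike %s, %c does not skip leading white space. */
    return sscanf(input, "%4c", out) == 1;
}
```

For the input ab cdef this stores ab c, with the space counted as one of the four characters.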
The scanf function scans each input field, character by character. It may stop reading a particular
input field before it reaches a space character for a variety of reasons.
- The specified width has been reached.
- The next character cannot be converted as specified.
- The next character conflicts with a character in the control string that it is supposed to
match.
- The next character fails to appear in a given character set.
For whatever reason, when the scanf function stops reading an input field, the next input field is
considered to begin at the first unread character. The conflicting character, if there is one, is
considered unread and is the first character of the next input field or the first character in
subsequent read operations.
EXAMPLE: OMIT FIRST STRING AND LOAD NEXT ONE
Scan For: %*s %s
Omits the first string and reads the next string.
This command will skip over the first string and output the second. This is useful for extracting
the last name from a name string such as John Doe.
Input: John Doe
Using %*s %s the output is Doe.
Using %s the output is John.
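The same expression can be checked in C with sscanf. The helper name second_word and the buffer size are assumptions for this sketch.

```c
#include <stdio.h>

/* Hypothetical helper: skips the first word of `name` and stores the
 * second word in `out`. Returns 1 on success, 0 otherwise. */
static int second_word(const char *name, char out[64])
{
    out[0] = '\0';
    /* %*s scans the first word without storing it; %63s stores the next. */
    return sscanf(name, "%*s %63s", out) == 1;
}
```

John Doe gives Doe. Note that the expression always returns the second word, so John Q Doe gives Q, and a single word such as John gives no result.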
EXAMPLE: OMIT FIRST STRING THEN LOAD UNTIL UNSPECIFIED CHARACTER FOUND
Scan For: %*s %[a-zA-Z,. ]
Omits the first string, then reads characters that appear in the [ ] set and stops at the first
character that is not in the set. The set must include a space, otherwise reading stops at the
first space.
This command will skip over the first string and output the remaining text. This is useful for
dropping the first string from a field and bringing in the remaining data from a Debtor Company
such as 567 Collections, Inc.
Input: 567 Collections, Inc.
Using %*s %[a-zA-Z,. ] the output is Collections, Inc.
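In plain C, a bracket set reads across word boundaries only if the set itself contains a space. The sketch below compares the two cases with sscanf. The helper names are hypothetical, and the a-z style ranges rely on the common range extension rather than on the ANSI standard.

```c
#include <stdio.h>

/* Hypothetical helper: skips the first token, then reads letters,
 * commas, periods and spaces. */
static int text_after_first_token(const char *input, char out[128])
{
    out[0] = '\0';
    /* The space inside the brackets lets the scan continue past spaces. */
    return sscanf(input, "%*s %127[a-zA-Z,. ]", out) == 1;
}

/* Same pattern without the space in the set, for comparison. */
static int word_after_first_token(const char *input, char out[128])
{
    out[0] = '\0';
    return sscanf(input, "%*s %127[a-zA-Z,.]", out) == 1;
}
```

For 567 Collections, Inc. the first helper stores Collections, Inc. while the second stops at the space and stores only Collections, with its comma.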
EXAMPLE: LOAD ALL TEXT BEFORE AN ASTERISK
Scan For: %[^*]
Brings in all characters before the first asterisk (*).
This command will output all the text from a field that is before the first asterisk. This is
useful for extracting text from a field with extra unneeded text such as John Doe **.
Input: John Doe **
Using %[^*] the output is John Doe
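A C sketch of this expression with sscanf shows one detail worth knowing: the space between Doe and the first asterisk is part of the result in plain C. Whether Collect! trims it afterwards is not stated here. The helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: stores all text before the first asterisk. */
static int text_before_asterisk(const char *input, char out[128])
{
    out[0] = '\0';
    /* [^*] accepts every character except an asterisk, spaces included. */
    return sscanf(input, "%127[^*]", out) == 1;
}
```

For John Doe ** the stored text is John Doe followed by one space. An input that begins with an asterisk produces no result.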
EXAMPLE: LOAD ALL TEXT BEFORE A COMMA
Scan For: %[^,]
Outputs all the text before the first comma.
This is useful for extracting a last name with a generation from a Name field such as Doe III, John.
Input: Doe III, John
Using %[^,] the output is Doe III
EXAMPLE: OMIT TEXT BEFORE COMMA AND LOAD REMAINING TEXT
Scan For: %*[^,]%*[,]%[^\"]
Omits everything before the comma. Then omits the comma. Lastly outputs all remaining text to end
of field.
This command will skip over all the text before and including the comma, then output the remaining
text. This is useful for extracting a first name with middle names from a Name field such as Doe,
John Harry William.
Input: Doe, John Harry William
Using %*[^,]%*[,]%[^\"] the output is John Harry William
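The sketch below runs this expression through sscanf in C, where the backslash and the quote have to be escaped inside the string literal. In plain C the final conversion keeps the space that follows the comma; the variant with a space before the last conversion skips it. Any trimming that Collect! may apply is not covered by this sketch, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: the expression from the example, written as a
 * C string literal. Stores everything after the first comma. */
static int after_comma(const char *input, char out[128])
{
    out[0] = '\0';
    return sscanf(input, "%*[^,]%*[,]%127[^\\\"]", out) == 1;
}

/* Variant: the space before the last conversion skips white space. */
static int after_comma_trimmed(const char *input, char out[128])
{
    out[0] = '\0';
    return sscanf(input, "%*[^,]%*[,] %127[^\\\"]", out) == 1;
}
```

For Doe, John Harry William the first helper stores the names with a leading space, and the second stores John Harry William.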
EXAMPLE: PHONE NUMBER IMPORT
You can use scanf strings in imports for outputting different parts of phone numbers.
The commands will extract a specific number of characters from a phone number string.
Your import map needs three fields - Area Code, Exchange and Number. You can use the append, default
values and other parameters in the field specifications.
Scan For: %3s // Outputs Area Code
Scan For: %*3s %3s // Outputs Exchange
Scan For: %*6s %4s // Outputs Number
You cannot use a default value, such as a dash, in the same field specification that uses
Scan For. The default value must go in a separate field specification.
Input: 2503910466
Using %3s for Area Code, the output is 250
Using %*3s %3s for Exchange, the output is 391
Using %*6s %4s for Number, the output is 0466
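The three expressions can be checked together in C, with one sscanf call per field, mirroring the three field specifications. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical container for the three parts of a phone number. */
struct phone_parts {
    char area[4];
    char exchange[4];
    char number[5];
};

/* Applies the three Scan For expressions from the example, one per field.
 * Returns 1 if all three parts were extracted, 0 otherwise. */
static int split_phone(const char *digits, struct phone_parts *p)
{
    p->area[0] = p->exchange[0] = p->number[0] = '\0';
    return sscanf(digits, "%3s", p->area) == 1          /* Area Code */
        && sscanf(digits, "%*3s %3s", p->exchange) == 1 /* Exchange  */
        && sscanf(digits, "%*6s %4s", p->number) == 1;  /* Number    */
}
```

For 2503910466 the parts are 250, 391 and 0466. The expressions assume an unformatted run of digits: with 250 391 0466 the %*6s conversion stops at the first space, so the Number field would not receive the last four digits.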
Troubleshooting
Scan For supports only one return result. If you try to load more than one result into a field,
Collect! displays the message below, indicating the Record Definition and Field that need to
be corrected.
ScanF Message