Scanf
The Scan For field in the Import/Export field specification is
used to pull out part of a string before importing or exporting.
This field uses the C scanf function, which offers very flexible
ways to parse strings.
Collect! passes a single string argument, so the simplest
expression you can use in the Scan For field is %s.
This extracts the first whitespace-delimited piece of text
in the target string.
Description
The scanf function reads data from the standard input stream, and
writes the data into the location given by argument. Each argument
must be a pointer to a variable; this variable must be of a type that
corresponds to a type specifier in format. If copying takes place
between strings that overlap, the behavior is undefined.
Format Specification
A format specification has the following form:
% [ * ] [ width ] [ { h | l | L } ] type
The format argument specifies the interpretation of the input and
can contain one or more of the following.
* Whitespace characters: blank (' '); tab ('\t'); or newline ('\n').
A whitespace character causes scanf to read, but not store,
all consecutive whitespace characters in the input up to the
next non-whitespace character. One whitespace character in
the format matches any number (including 0) and combination of
whitespace characters in the input.
* Non-whitespace characters, except for the percent sign (%).
A non-whitespace character causes scanf to read, but not store,
a matching non-whitespace character. If the next character in
stdin does not match, scanf terminates.
* Format specifications, introduced by the percent sign (%).
A format specification causes scanf to read and convert
characters in the input into values of a specified type.
The value is assigned to an argument in the argument list.
The format is read from left to right. Characters outside
format specifications are expected to match the sequence of
characters in stdin; the matching characters in stdin are
scanned but not stored. If a character in stdin conflicts
with the format specification, scanf terminates, and the
character is left in stdin as if it had not been read.
When the first format specification is encountered, the value of
the first input field is converted according to this specification.
It is then stored in the location that is specified by the first
argument. The second format specification causes the second
input field to be converted and stored in the second argument,
and so on through the end of the format string.
An input field is defined as all characters up to the first whitespace
character (space, tab, or newline), or up to the first character that
cannot be converted according to the format specification, or,
finally, until the field width (if specified) is reached. If there are
too many arguments for the given specifications, the extra
arguments are evaluated but ignored. The results are
unpredictable if there are not enough arguments for the format
specification.
Each field of the format specification is a single character or a
number signifying a particular format option. The type character,
which appears after the last optional format field, determines
whether the input field is interpreted as a character, a string, or
a number. The simplest format specification contains only the
percent sign and a type character (for example, %s). If a percent
sign (%) is followed by a character that has no meaning as a
format-control character, that character and the following
characters (up to the next percent sign) are treated as an ordinary
sequence of characters; in other words, a sequence of characters
that must match the input. For example, to specify that a
percent-sign character is to be input, use %%.
An asterisk (*) following the percent sign suppresses assignment
of the next input field, which is interpreted as a field of the
specified type. The field is scanned but not stored. See the
examples at the end of this document for clarification of this point.
Type
The type character is the only required format field; it appears
after any optional format fields. The type character determines
whether the associated argument is interpreted as a character,
string, or number.
In the Collect! "Scan For" function, only the 's' type
specifier is used. A 1024 character buffer is used to store the text
extracted by the scanf function. The resulting text is then read into
a specified field, or exported to file based on standard Collect!
formatting specifiers.
Type Characters for scanf functions
Character | Type of Input Expected | Type of Argument |
c | When used with scanf functions, specifies a single-byte character. Whitespace characters that are ordinarily skipped are read when c is specified. To read the next non-whitespace single-byte character, use %1s; to read the next non-whitespace wide character, use %1ws. | Pointer to char when used with scanf functions, pointer to wchar_t when used with wscanf functions. |
C | When used with scanf functions, specifies a wide character. Whitespace characters that are ordinarily skipped are read when C is specified. To read the next non-whitespace single-byte character, use %1s; to read the next non-whitespace wide character, use %1ws. | Pointer to wchar_t when used with scanf functions, pointer to char when used with wscanf functions. |
d | Decimal integer. | Pointer to int. |
i | Decimal, hexadecimal, or octal integer. | Pointer to int. |
o | Octal integer. | Pointer to int. |
u | Unsigned decimal integer. | Pointer to unsigned int. |
x | Hexadecimal integer. | Pointer to int. |
e, E, f, g, G | Floating-point value consisting of an optional sign (+ or -), a series of one or more decimal digits optionally containing a decimal point, and an optional exponent ('e' or 'E') followed by an optionally signed integer value. | Pointer to float. |
n | No input read from stream or buffer. | Pointer to int, into which is stored the number of characters successfully read from the stream or buffer up to that point in the current call to scanf functions or wscanf functions. |
s | String, up to the first whitespace character (space, tab, or newline). To read strings not delimited by space characters, use a set of square brackets ([ ]), as discussed in the Examples section. | When used with scanf functions, signifies a single-byte character array; when used with wscanf functions, signifies a wide-character array. In either case, the character array must be large enough for the input field plus the terminating null character, which is automatically appended. |
S | String, up to the first whitespace character (space, tab, or newline). To read strings not delimited by space characters, use a set of square brackets ([ ]), as discussed in the Examples section. | When used with scanf functions, signifies a wide-character array; when used with wscanf functions, signifies a single-byte character array. In either case, the character array must be large enough for the input field plus the terminating null character, which is automatically appended. |
The types C and S are Microsoft extensions and are not
ANSI-compatible. The types c and s are part of ANSI C.
Thus, to read single-byte or wide characters with scanf functions
and wscanf functions, use format specifiers as follows.
To Read Character As | Use This Function | With These Format Specifiers |
single byte | scanf functions | c, hc, or hC |
wide | scanf functions | C, lc, or lC |
To scan strings with scanf functions, and wscanf functions, use
the prefixes h and l analogously with format type-specifiers s and S.
Width
Width is a positive decimal integer which controls the maximum
number of characters to be read from stdin. No more than width
characters are converted and stored at the corresponding
argument. Fewer than width characters may be read if a
whitespace character (space, tab, or newline) or a character that
cannot be converted according to the given format occurs before
width is reached.
The optional prefixes h, l, and L indicate the size of the argument
(long or short, single-byte character or wide character, depending
upon the type character that they modify). These format-specification
characters are used with type characters in scanf or wscanf functions
to specify interpretation of arguments as shown in the table below.
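The h prefix, when used with the character and string types (c, C, s,
and S), is a Microsoft extension and is not ANSI-compatible. With numeric
types, the h, l, and L prefixes are part of ANSI C. The type characters
and their meanings are described in ANSI C documentation.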
Size Prefixes for scanf and wscanf Format-Type Specifiers
To Specify | Use Prefix | With Type Specifier |
double | l | e, E, f, g, or G |
long int | l | d, i, o, x, or X |
long unsigned int | l | u |
short int | h | d, i, o, x, or X |
short unsigned int | h | u |
Single-byte character with scanf | h | c or C |
Single-byte character with wscanf | h | c or C |
Wide character with scanf | l | c or C |
Wide character with wscanf | l | c or C |
Single-byte character string with scanf | h | s or S |
Wide-character string with scanf | l | s or S |
Examples
Following are examples of the use of scanf functions.
Scan For: %s
// Reads a string.
To read strings not delimited by space characters, a set of characters
in brackets ([ ]) can be substituted for the s (string) type character.
The corresponding input field is read up to the first character that
does not appear in the bracketed character set. If the first character
in the set is a caret (^), the effect is reversed. The input field is
read up to the first character that does appear in the rest of the
character set.
Note that %[a-z] and %[z-a] are interpreted as equivalent to %[abcde...z].
This is a common scanf function extension, but note that the ANSI
standard does not require it.
To store a string without storing a terminating null character ('\0'), use
the specification %nc where n is a decimal integer. In this case, the c
type character indicates that the argument is a pointer to a character
array. The next n characters are read from the input stream into the
specified location, and no null character ('\0') is appended. If n is
not specified, its default value is 1.
The scanf function scans each input field, character by character. It
may stop reading a particular input field before it reaches a space
character for a variety of reasons.
1. The specified width has been reached.
2. The next character cannot be converted as specified.
3. The next character conflicts with a character in the control string
that it is supposed to match.
4. The next character fails to appear in a given character set.
For whatever reason, when the scanf function stops reading an
input field, the next input field is considered to begin at the
first unread character. The conflicting character, if there is
one, is considered unread and is the first character of the next
input field or the first character in subsequent read operations.
Example 1
Scan For: %*s %s
// Omits the first string and reads the next string.
This command will skip over the first string and output the second.
This is useful for extracting the last name from a name string
such as John Doe.
Input: John Doe
Using %*s %s the output is Doe.
Using %s the output is John.
Example 2
Scan For: %*s %[a-zA-Z,. ]
// Omits the first string, then reads characters that appear in the [ ] set
and stops at the first character that is not in the set.
This command will skip over the first string and output the remaining text.
Note the space inside the brackets. Without it, scanf stops reading at the
first space, right after Collections,.
This is useful for dropping the first string from a field and bringing in the
remaining data from a Debtor Company such as 567 Collections, Inc.
Input: 567 Collections, Inc.
Using %*s %[a-zA-Z,. ] the output is Collections, Inc.
Example 3
Scan For: %[^*]
// Brings in all characters before the first asterisk (*).
This command will output all the text from a field that is before the first
asterisk. This is useful for extracting text from a field with extra unneeded
text such as John Doe **.
Input: John Doe **
Using %[^*] the output is John Doe
Example 4
Scan For: %[^,]
// Outputs all the text before the first comma.
This command will output all the text before the first comma.
This is useful for extracting a last name with a generation from a Name field
such as Doe III, John.
Input: Doe III, John
Using %[^,] the output is Doe III
Example 5
Scan For: %*[^,]%*[,]%[^\"]
// Omits everything before the comma. Then omits the
comma. Lastly outputs all remaining text, up to the first backslash or
double quote or, if there is none, to the end of the field.
This command will skip over all the text before and including the comma, then
output the remaining text. This is useful for extracting a first name with middle
names from a Name field such as Doe, John Harry William.
Input: Doe, John Harry William
Using %*[^,]%*[,]%[^\"] the output is John Harry William
Phone Number Import
You can use scanf strings in imports
to output different parts of a phone number. Each
command extracts a specific number of characters
from the phone number string.
Your import map needs three fields - Area Code,
Exchange and Number. You can use append, default
values and other parameters in the field specifications.
Scan For: %3s // Outputs Area Code
Scan For: %*3s %3s // Outputs Exchange
Scan For: %*6s %4s // Outputs Number
You cannot put a default value, such as a dash, in a
field specification that uses Scan For.
The default value must go in a separate field specification.
Input: 2503910466
Using %3s for Area Code, the output is 250
Using %*3s %3s for Exchange, the output is 391
Using %*6s %4s for Number, the output is 0466
See Also
- File Format Specification
- How To Use Import/Export
- Import Menu
- Export Menu
- Import Field Specification
- Import/Export Topics