Previously, we covered a simple introduction to the C language. Today, we are going to delve into characters and work through some related programs for processing character data. You will find that many programs are just expanded versions of the prototypes that we discuss here.
Text input and output are dealt with as streams of characters. A text stream is a sequence of characters divided into lines; each line consists of zero or more characters followed by a newline character.
The standard library in C provides several functions for reading and writing one character at a time, of which getchar and putchar are the simplest.
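Each time it is called, getchar reads the next input character from a text stream and returns that character as its value. Each time putchar is called, it prints one character, the value of its integer argument.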
A surprising amount of useful code can be written with just getchar and putchar. Consider the following file copying program.
The relational operator != means “not equal to”.
Why is c declared as int type?
What appear to be characters on the keyboard or screen are, like everything else, stored internally just as bit patterns (refer to the ASCII code of this post). The type char is specifically meant for storing such character data, but any integer type can be used. We used int for a subtle but important reason.
The problem is distinguishing the end of input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF (“end of file”). We must declare c to be of a type big enough to hold any value that getchar returns; char will not do, because c must be able to hold EOF in addition to every possible character. Hence, we use int.
EOF is an integer symbolic constant defined in the <stdio.h> header. By using the symbolic constant, we are assured that nothing in the program depends on the specific numeric value.
Concise version of the program
An experienced C programmer would write the copying program more concisely:
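In C, an assignment such as c = getchar() is an expression and has a value, so it can appear as part of a larger expression, as it does in the concise version. The while gets a character, assigns it to c, then tests whether the character was the end-of-file signal. If it was not, the body of the while is executed, printing the character. The while then repeats.
This version shrinks the program to fewer lines and centralizes the input: there is now only one reference to getchar. It is more compact and, once the idiom is familiar, easier to read. You’ll see this style often.
The parentheses around the assignment within the condition are necessary. The precedence of != is higher than that of =, which means that in the absence of the parentheses the relational test != would be done before the assignment =. So c = getchar() != EOF is equivalent to c = (getchar() != EOF), which sets c to 0 or 1 depending on whether the call to getchar hit end of file.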
The next program counts the characters in its input:
The operator ++ means increment by one, so ++nc adds one to nc. You could instead write nc = nc + 1, but ++nc is more concise and often more efficient. There is a corresponding operator -- which decrements by 1.
Note that ++ and -- can each be used as a prefix operator (++nc) or a postfix operator (nc++). Both forms increment nc; they differ only in the value of the expression when it is used as part of a larger expression.
The conversion specification %ld tells printf that the corresponding argument is a long integer.
It is possible to cope with even bigger numbers using double.
Version 2 of the Character Counting Program
We’ve replaced the while loop with a for loop for a more concise presentation of the logic. printf uses %f for both float and double; %.0f suppresses printing of the decimal point and of the fraction part, which is zero.
Note that the body of the for loop is empty because all the work is done in the test and increment parts. C's grammar still requires a for statement to have a body, so an isolated semicolon, called the null statement, is put there.
Here is the output of these programs. I used Ctrl+Z to signal end of file (on Unix-like systems, Ctrl+D does the same).
The next program counts input lines. Remember that the standard library ensures that an input text stream appears as a sequence of lines, each terminated by a newline, so counting lines is just counting newlines.
Our line counting program is,
The expected output would look like this:
The fourth in our series of useful programs counts lines, words, and characters, with the loose definition that a word is any sequence of characters that does not contain a blank, tab, or newline.
Example:
The phrase “steem is going to the the moon”, entered as one line, consists of 1 line, 7 words, and 31 characters (30 visible characters and spaces plus the newline that ends the line).
The code for this program is given below:
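Every time the program encounters the first character of a word, it counts one more word. The variable state records whether the program is currently inside a word; the state OUT means that it is not, that it is “outside a word”. We prefer the symbolic constants OUT and IN to the literal values 0 and 1 because they make the program more readable. (You’ll appreciate this technique when you start writing larger programs.) The line
nl = nw = nc = 0;
sets all three variables to zero. This works because an assignment is an expression with a value and assignments associate from right to left. It is as if we had written
nl = (nw = (nc = 0));
but the first form is the one you will normally see.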
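The operator || means OR, so the line
if (c == ' ' || c == '\n' || c == '\t')
says “if c is a blank OR c is a newline OR c is a tab”. There is a corresponding operator && for AND; its precedence is just higher than that of ||. Expressions connected by && or || are evaluated left to right, and evaluation stops as soon as the truth or falsehood of the whole expression is known.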
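This example also shows an else, which specifies an alternative action when the condition part of an if statement is false.
The general form is
if (expression)
    statement1
else
    statement2
One and only one of the two statements is executed: if the expression is true, statement1 runs; otherwise statement2 runs.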
Disclaimer: this article is a summary of section 1.5 of the book The C Programming Language (ANSI C) by Brian Kernighan and Dennis Ritchie. Apart from rephrasing, the content is identical: most of the equations are screenshots from the book, and the same lines of code are treated.