Modern C
Jens Gustedt
INRIA, FRANCE
ICUBE, STRASBOURG, FRANCE
E-mail address: jens gustedt inria fr
URL: http://icube-icps.unistra.fr/index.php/Jens_Gustedt
This is a preliminary version of this book compiled on October 27, 2015.
It contains feature complete versions of Levels 0, 1 and 2, and most of the material that I foresee for Level 4.
The table of contents already gives you a glimpse of what should follow for the rest.
You might find a more up to date version at
http://icube-icps.unistra.fr/index.php/File:ModernC.pdf (inline)
http://icube-icps.unistra.fr/img_auth.php/d/db/ModernC.pdf (download)
You may well share this by pointing others to my home page or one of the links above.
Since I don’t know yet how all of this will be published at the end, please don’t distribute the file itself.
If you represent a publishing house that would like to distribute this work under an open license, preferably
CC-BY, please drop me a note.
All rights reserved, Jens Gustedt, 2015
Special thanks go to the people that encouraged the writing of this book by providing me with constructive
feedback, in particular Cédric Bastoul, Lucas Nussbaum, Vincent Loechner, Kliment Yanev, Szabolcs Nagy and
Marcin Kowalczuk.
PRELIMINARIES. The C programming language has been around for a long time — the canonical reference
for it is the book written by its creators, Kernighan and Ritchie [1978]. Since then, C has been used in an
incredible number of applications. Programs and systems written in C are all around us: in personal computers,
phones, cameras, set-top boxes, refrigerators, cars, mainframes, satellites, basically in any modern device that has
a programmable interface.
In contrast to the ubiquitous presence of C programs and systems, good knowledge of and about C is
much more scarce. Even experienced C programmers often appear to be stuck in some degree of self-inflicted
ignorance about the modern evolution of the C language. A likely reason for this is that C is seen as an "easy
to learn" language, allowing a programmer with little experience to quickly write or copy snippets of code that
at least appear to do what they are supposed to. In a way, C fails to motivate its users to climb to higher levels of
knowledge.
This book is intended to change that general attitude. It is organized in chapters called “Levels” that sum-
marize levels of familiarity with the C language and programming in general. Some features of the language are
presented in parts on earlier levels, and elaborated in later ones. Most notably, pointers are introduced at Level 1
but only explained in detail at Level 2. This leads to many forward references for impatient readers to follow.
As the title of this book suggests, today’s C is not the same language as the one originally designed by
its creators Kernighan and Ritchie (usually referred to as K&R C). In particular, it has undergone an important
standardization and extension process now driven by ISO, the International Organization for Standardization. This led to
three major publications of C standards in the years 1989, 1999 and 2011, commonly referred to as C89, C99 and
C11. The C standards committee puts a lot of effort into guaranteeing backwards compatibility such that code
written for earlier versions of the language, say C89, should compile to a semantically equivalent executable with
a compiler that implements a newer version. Unfortunately, this backwards compatibility has had the unwanted
side effect of not motivating projects that could benefit greatly from the new features to update their code base.
In this book we will mainly refer to C11, as defined in JTC1/SC22/WG14 [2011], but at the time of this
writing many compilers don’t implement this standard completely. If you want to compile the examples of this
book, you will need at least a compiler that implements most of C99. For the changes that C11 adds to C99,
using an emulation layer such as my macro package P99 might suffice. The package is available at
http://p99.gforge.inria.fr/.
Programming has become a very important cultural and economic activity and C remains an important
element in the programming world. As in all human activities, progress in C is driven by many factors, corporate
or individual interest, politics, beauty, logic, luck, ignorance, selfishness, ego, sectarianism, ... (add your primary
motive here). Thus the development of C has not been and cannot be ideal. It has flaws and artifacts that can only
be understood with their historic and societal context.
An important part of the context in which C developed was the early appearance of its sister language
C++. One common misconception is that C++ evolved from C by adding its particular features. Whereas this is
historically correct (C++ evolved from a very early C), it is not particularly relevant today. In fact, C and C++
separated from a common ancestor more than 30 years ago, and have evolved separately ever since. But this
evolution of the two languages has not taken place in isolation; they have exchanged and adopted each other’s
concepts over the years. Some new features, such as the recent addition of atomics and threads, have been designed
in close collaboration between the C and C++ standards committees.
Nevertheless, many differences remain and generally all that is said in this book is about C and not C++.
Many code examples that are given will not even compile with a C++ compiler.
Rule A C and C++ are different, don’t mix them and don’t mix them up.
ORGANIZATION. This book is organized in levels. The starting level, encounter, will introduce you to the very
basics of programming with C. By the end of it, even if you don’t have much experience in programming, you
should be able to understand the structure of simple programs and start writing your own.
The acquaintance level details most principal concepts and features such as control structures, data types,
operators and functions. It should give you a deeper understanding of the things that are going on when you run
your programs. This knowledge should be sufficient for an introductory course in algorithms and other work at
that level, with the notable caveat that pointers aren’t fully introduced yet at this level.
The cognition level goes to the heart of the C language. It fully explains pointers, familiarizes you with
C’s memory model, and allows you to understand most of C’s library interface. Completing this level should
enable you to write C code professionally; it therefore begins with an essential discussion about the writing and
organization of C programs. I personally would expect anybody who graduated from an engineering school with
a major related to computer science or programming in C to master this level. Don’t be satisfied with less.
The experience level then goes into detail in specific topics, such as performance, reentrancy, atomicity,
threads and type generic programming. These are probably best discovered as you go, that is when you encounter
them in the real world. Nevertheless, as a whole they are necessary to round off the picture and to provide you
with full expertise in C. Anybody with some years of professional programming in C or who heads a software
project that uses C as its main programming language should master this level.
Last but not least comes ambition. It discusses my personal ideas for a future development of C. C as it
is today has some rough edges and particularities that only have historical justification. I propose possible paths
to improve on the lack of general constants, to simplify the memory model, and more generally to improve the
modularity of the language. This level is clearly much more specialized than the others; most C programmers can
probably live without it, but the curious ones among you could perhaps take up some of the ideas.
Contents
Level 0. Encounter 1
1. Getting started 1
1.1. Imperative programming 1
1.2. Compiling and running 3
2. The principal structure of a program 6
2.1. Grammar 6
2.2. Declarations 7
2.3. Definitions 9
2.4. Statements 10
Level 1. Acquaintance 13
Warning to experienced C programmers 13
3. Everything is about control 14
3.1. Conditional execution 15
3.2. Iterations 17
3.3. Multiple selection 20
4. Expressing computations 22
4.1. Arithmetic 22
4.2. Operators that modify objects 24
4.3. Boolean context 24
4.4. The ternary or conditional operator 26
4.5. Evaluation order 27
5. Basic values and data 28
5.1. Basic types 30
5.2. Specifying values 32
5.3. Initializers 34
5.4. Named constants 35
5.5. Binary representations 39
6. Aggregate data types 46
6.1. Arrays 46
6.2. Pointers as opaque types 51
6.3. Structures 52
6.4. New names for types: typedef 56
7. Functions 58
7.1. Simple functions 58
7.2. main is special 59
7.3. Recursion 61
8. C Library functions 66
8.1. Mathematics 70
8.2. Input, output and file manipulation 70
8.3. String processing and conversion 79
8.4. Time 83
8.5. Runtime environment settings 85
8.6. Program termination and assertions 88
Level 2. Cognition 91
9. Style 91
9.1. Formatting 91
9.2. Naming 92
10. Organization and documentation 95
10.1. Interface documentation 97
10.2. Implementation 99
10.3. Macros 99
10.4. Pure functions 101
11. Pointers 104
11.1. Address-of and object-of operators 105
11.2. Pointer arithmetic 106
11.3. Pointers and structs 108
11.4. Opaque structures 110
11.5. Array and pointer access are the same 111
11.6. Array and pointer parameters are the same 111
11.7. Null pointers 113
12. The C memory model 113
12.1. A uniform memory model 114
12.2. Unions 114
12.3. Memory and state 116
12.4. Pointers to unspecific objects 117
12.5. Implicit and explicit conversions 118
12.6. Alignment 119
13. Allocation, initialization and destruction 121
13.1. malloc and friends 121
13.2. Storage duration, lifetime and visibility 129
13.3. Initialization 134
13.4. Digression: a machine model 136
14. More involved use of the C library 138
14.1. Text processing 138
14.2. Formatted input 145
14.3. Extended character sets 146
14.4. Binary files 153
15. Error checking and cleanup 154
15.1. The use of goto for cleanup 156
Level 3. Experience 159
15.2. Project organization 159
16. Performance 159
16.1. Inline functions 159
16.2. Avoid aliasing: restrict qualifiers 159
16.3. Functionlike macros 160
16.4. Optimization 160
16.5. Measurement and inspection 160
17. Variable argument lists 160
17.1. va_arg functions 160
17.2. __VA_ARGS__ macros 160
17.3. Default arguments 160
18. Reentrancy and sharing 160
18.1. Short jumps 160
18.2. Long jumps 162
18.3. Signal handlers 162
18.4. Atomic data and operations 162
19. Threads 162
20. Type generic programming 162
21. Runtime constraints 162
Level 4. Ambition 163
22. The rvalue overhaul 164
22.1. Introduce register storage class in file scope 164
22.2. Typed constants with register storage class and const qualification 166
22.3. Extend ICE to register constants 169
22.4. Unify designators 171
22.5. Functions 174
23. Improve type generic expression programming 174
23.1. Storage class for compound literals 175
23.2. Inferred types for variables and functions 176
23.3. Anonymous functions 179
24. Improve the C library 181
24.1. Add requirements for sequence points 181
24.2. Provide type generic interfaces for string search functions 182
25. Modules 184
25.1. C needs a specific approach 184
25.2. All is about naming 184
25.3. Modular C features 185
26. Simplify the object and value models 186
26.1. Remove objects of temporary lifetime 186
26.2. Introduce comparison operator for object types 186
26.3. Make memcpy and memcmp consistent 187
26.4. Enforce representation consistency for _Atomic objects 187
26.5. Make string literals char const[] 187
26.6. Default initialize padding to 0 187
26.7. Make restrict qualification part of the function interface 187
26.8. References 188
27. Contexts 188
27.1. Introduce evaluation contexts in the standard 188
27.2. Convert object pointers to void* in unspecific context 188
27.3. Introduce nullptr as a generic null pointer constant and deprecate NULL 189
Appendix A. 191
Reminders 195
Listings 203
Appendix. Bibliography 205
Appendix. Index 207
LEVEL 0
Encounter
This first level of the book may be your first encounter with the programming language
C. It provides you with a rough knowledge about C programs, about their purpose, their
structure and how to use them. It is not meant to give you a complete overview, it can’t and
it doesn’t even try. On the contrary, it is supposed to give you a general idea of what this is
all about and open up questions, promote ideas and concepts. These then will be explained
in detail on the higher levels.
1. Getting started
In this section I will try to introduce you to one simple program that has been chosen
because it contains many of the constructs of the C language. If you already have experi-
ence in programming you may find parts of it feel like needless repetition. If you lack such
experience, you might feel overwhelmed by the stream of new terms and concepts.
In either case, be patient. For those of you with programming experience, it’s very
possible that there are subtle details you’re not aware of, or assumptions you have made
about the language that are not valid, even if you have programmed C before. For the
ones approaching programming for the first time, be assured that after approximately ten
pages from now your understanding will have increased a lot, and you should have a much
clearer idea of what programming might represent.
An important bit of wisdom for programming in general, and for this book in particular,
is summarized in the following citation from The Hitchhiker’s Guide to the Galaxy:
Rule B Don’t panic.
It’s not worth it. There are many cross references, links and pieces of side information in
the text. There is an Index on page 207. Follow those if you have a question. Or just take
a break.
1.1. Imperative programming. To get started and see what we are talking about
consider our first program in Listing 1:
You probably see that this is a sort of language, containing some weird words like
“main”, “include”, “for”, etc. laid out and colored in a peculiar way and mixed with a
lot of weird characters, numbers, and text “Doing some work” that looks like an ordinary
English phrase. It is designed to provide a link between us, the human programmers, and
a machine, the computer, to tell it what to do — give it “orders”.
Rule 0.1.1.1 C is an imperative programming language.
In this book, we will not only encounter the C programming language, but also some
vocabulary from an English dialect, C jargon, the language that helps us to talk about C.
It will not be possible to immediately explain each term the first time it occurs. But I will
explain each one, in time, and all of them are indexed such that you can easily cheat and
jump to more explanatory text, at your own risk.
As you can probably guess from this first example, such a C program has different
components that form some intermixed layers. Let’s try to understand it from the inside
out.
LISTING 1. A first example of a C program
 1  /* This may look like nonsense, but really is -*- mode: C -*- */
 2  #include <stdlib.h>
 3  #include <stdio.h>
 4
 5  /* The main thing that this program does. */
 6  int main(void) {
 7    // Declarations
 8    double A[5] = {
 9      [0] = 9.0,
10      [1] = 2.9,
11      [4] = 3.E+25,
12      [3] = .00007,
13    };
14
15    // Doing some work
16    for (size_t i = 0; i < 5; ++i) {
17      printf("element %zu is %g, \tits square is %g\n",
18             i,
19             A[i],
20             A[i]*A[i]);
21    }
22
23    return EXIT_SUCCESS;
24  }
1.1.1. Giving orders. The visible result of running this program is to output 5 lines
of text on the command terminal of your computer. On my computer using this program
looks something like
Terminal
0 > ./getting-started
1 element 0 is 9, its square is 81
2 element 1 is 2.9, its square is 8.41
3 element 2 is 0, its square is 0
4 element 3 is 7e-05, its square is 4.9e-09
5 element 4 is 3e+25, its square is 9e+50
We can easily identify parts of the text that this program outputs (prints in the C
jargon) inside our program, namely the blue part of Line 17. The real action (statement
in C) happens between that line and Line 20. The statement is a call to a function
named printf.
. getting-started.c
17      printf("element %zu is %g, \tits square is %g\n",
18             i,
19             A[i],
20             A[i]*A[i]);
Here, the printf function receives four arguments, enclosed in a pair of parentheses,
“( ... )”:
• The funny-looking text (the blue part) is a so-called string literal that serves as
a format for the output. Within the text are three markers (format specifiers)
that mark the positions in the output where numbers are to be inserted. These
markers start with a "%" character. This format also contains some special
escape characters that start with a backslash, namely "\t" and "\n".
• After a comma character we find the word “i”. The thing that “i” stands for will
be printed in place of the first format specifier, "%zu".
• Another comma separates the next argument “A[i]”. The thing that it stands for
will be printed in place of the second format specifier, the first "%g".
• Last, again separated by comma, appears “A[i]*A[i]”, corresponding to the
last "%g".
We will later explain what all of these arguments mean. Let’s just remember that we
identified the main purpose of that program, namely to print some lines on the terminal,
and that it “orders” function printf to fulfill that purpose. The rest is some sugar to
specify which numbers will be printed and how many of them.
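To see the format specifiers and escape characters at work in isolation, here is a minimal sketch. The function name format_element and the use of snprintf (which writes into a character buffer instead of the terminal) are assumptions made for this illustration only; they are not part of Listing 1. The format string itself is the one from Line 17.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Listing 1: formats one element and its
   square into buf, which has room for len characters. Returns the number
   of characters that the complete text needs. */
int format_element(char buf[], size_t len, size_t i, double x) {
  /* "%zu" is replaced by i, the first "%g" by x and the last "%g" by x*x.
     "\t" inserts a tabulator character, "\n" ends the line. */
  return snprintf(buf, len, "element %zu is %g, \tits square is %g\n",
                  i, x, x*x);
}
```

Called with 1 for i and 2.9 for x, this should fill buf with the same text as the line for element 1 in the terminal output above.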
1.2. Compiling and running. In the form shown above, the program text that we have
listed cannot be understood by your computer.
There is a special program, called a compiler, that translates the C text into something
that your machine can understand, the so-called binary code or executable. What that
translated program looks like and how this translation is done is much too complicated to
explain at this stage.[1] However, for the moment we don’t need to understand more deeply,
as we have that tool that does all the work for us.
Rule 0.1.2.1 C is a compiled programming language.
The name of the compiler and its command line arguments depend a lot on the platform
on which you will be running your program. There is a simple reason for this: the target
binary code is platform dependent, that is its form and details depend on the computer
on which you want to run it; a PC has different needs than a phone, your fridge doesn’t
speak the same language as your set-top box. In fact, that’s one of the reasons for C to
exist.
Rule 0.1.2.2 A C program is portable between different platforms.
It is the job of the compiler to ensure that our little program above, once translated for
the appropriate platform, will run correctly on your PC, your phone, your set-top box and
maybe even your fridge.
That said, there is a good chance that a program named c99 might be present on your
PC and that this is in fact a C compiler. You could try to compile the example program
using the following command:
Terminal
0 > c99 -Wall -o getting-started getting-started.c -lm
The compiler should do its job without complaining, and output an executable file
called getting-started in your current directory.[Exs 2]
In the above line
• c99 is the compiler program.
• -Wall tells it to warn us about anything that it finds unusual.
[1] In fact, the translation itself is done in several steps that go from textual replacement, over proper
compilation, to linking. Nevertheless, the tool that bundles all this is traditionally called compiler and not translator,
which would be more accurate.
[Exs 2] Try the compilation command in your terminal.
• -o getting-started tells it to store the compiler output in a file named
getting-started.
• getting-started.c names the source file, namely the file that contains
the C code that we have written. Note that the .c extension at the end of the file
name refers to the C programming language.
• -lm tells it to add some standard mathematical functions if necessary; we will
need those later on.
Now we can execute our newly created executable. Type in:
Terminal
0 > ./getting-started
and you should see exactly the same output as I have given you above. That’s what
portable means: wherever you run that program, its behavior should be the same.
If you are not lucky and the compilation command above didn’t work, you’d have to
look up the name of your compiler in your system documentation. You might even have
to install a compiler if one is not available. The names of compilers vary. Here are some
common alternatives that might do the trick:
Terminal
0 > clang -Wall -lm -o getting-started getting-started.c
1 > gcc -std=c99 -Wall -lm -o getting-started getting-started.c
2 > icc -std=c99 -Wall -lm -o getting-started getting-started.c
Some of these, even if they are present on your computer, might not compile the
program without complaining.[Exs 3]
With the program in Listing 1 we presented an ideal world — a program that works
and produces the same result on all platforms. Unfortunately, when programming yourself
very often you will have a program that only works partially and that maybe produces
wrong or unreliable results. Therefore, let us look at the program in Listing 2. It looks
quite similar to the previous one.
If you run your compiler on that one, it should give you some diagnostic, something
similar to this
Terminal
0 > c99 -Wall -o getting-started-badly getting-started-badly.c
1 getting-started-badly.c:4:6: warning: return type of ’main’ is not ’int’ [-Wmain]
2 getting-started-badly.c: In function ’main’:
3 getting-started-badly.c:16:6: warning: implicit declaration of function ’printf’ [-Wimplicit-func
4 getting-started-badly.c:16:6: warning: incompatible implicit declaration of built-in function ’pr
5 getting-started-badly.c:22:3: warning: ’return’ with a value, in function returning void [enabled
Here we had a lot of long “warning” lines that are even too long to fit on a terminal
screen. In the end the compiler produced an executable. Unfortunately, the output when
we run the program is different. This is a sign that we have to be careful and pay attention
to details.
clang is even more picky than gcc and gives us even longer diagnostic lines:
[Exs 3] Start writing a textual report about your tests with this book. Note down which command worked for you.
LISTING 2. An example of a C program with flaws
 1  /* This may look like nonsense, but really is -*- mode: C -*- */
 2
 3  /* The main thing that this program does. */
 4  void main() {
 5    // Declarations
 6    int i;
 7    double A[5] = {
 8      9.0,
 9      2.9,
10      3.E+25,
11      .00007,
12    };
13
14    // Doing some work
15    for (i = 0; i < 5; ++i) {
16      printf("element %d is %g, \tits square is %g\n",
17             i,
18             A[i],
19             A[i]*A[i]);
20    }
21
22    return 0;
23  }
Terminal
0 > clang -Wall -o getting-started-badly getting-started-badly.c
1 getting-started-badly.c:4:1: warning: return type of ’main’ is not ’int’ [-Wmain-return-type]
2 void main() {
3 ^
4 getting-started-badly.c:16:6: warning: implicitly declaring library function ’printf’ with type
5 ’int (const char *, ...)’
6 printf("element %d is %g, \tits square is %g\n", /*@label{printf-start-badly}*/
7 ^
8 getting-started-badly.c:16:6: note: please include the header <stdio.h> or explicitly provide a d
9 ’printf’
10 getting-started-badly.c:22:3: error: void function ’main’ should not return a value [-Wreturn-typ
11 return 0;
12 ^ ~
13 2 warnings and 1 error generated.
This is a good thing! Its diagnostic output is much more informative. In particular
it gave us two hints: it expected a different return type for main and it expected us to
have a line such as Line 3 of Listing 1 to specify where the printf function comes from.
Notice how clang, unlike gcc, did not produce an executable. It considers the problem
in Line 22 fatal. Consider this to be a feature.
In fact depending on your platform you may force your compiler to reject programs
that produce such diagnostics. For gcc such a command line option would be -Werror.
Rule 0.1.2.3 A C program should compile cleanly without warnings.
So we have seen two of the points in which Listings 1 and 2 differed, and these two
modifications turned a good, standard conforming, portable program into a bad one. We
also have seen that the compiler is there to help us. It nailed the problem down to the
lines in the program that cause trouble, and with a bit of experience you will be able to
understand what it is telling you.[Exs 4] [Exs 5]
2. The principal structure of a program
Compared to our little examples from above, real programs will be more complicated
and contain additional constructs, but their structure will be very similar. Listing 1 already
has most of the structural elements of a C program.
There are two categories of aspects to consider in a C program: syntactical aspects
(how do we specify the program so the compiler understands it) and semantic aspects (what
do we specify so that the program does what we want it to do). In the following subsections
we will introduce the syntactical aspects (“grammar”) and three different semantic aspects,
namely declarative parts (what things are), definitions of objects (where things are) and
statements (what are things supposed to do).
2.1. Grammar. Looking at its overall structure, we can see that a C program is com-
posed of different types of text elements that are assembled in a kind of grammar. These
elements are:
special words: In Listing 1 we have used the following special words[6]: #include, int, void,
double, for, and return. In our program text, here, they will usually be printed in bold
face. These special words represent concepts and features that the C language imposes
and that cannot be changed.
punctuations: There are several punctuation concepts that C uses to structure the program
text.
• There are five sorts of parentheses: { ... }, ( ... ), [ ... ], /* ... */ and
< ... >. Parentheses group certain parts of the program together and should always
come in pairs. Fortunately, the < ... > parentheses are rare in C, and only
used as shown in our example, on the same logical line of text. The other four are
not limited to a single line; their contents might span several lines, like they did
when we used printf earlier.
• There are two different separators or terminators, comma and semicolon. When we
used printf we saw that commas separated the four arguments to that function; in
Line 12 we saw that a comma also can follow the last element of a list of elements.
. getting-started.c
12 [3] = .00007,
One of the difficulties for newcomers in C is that the same punctuation characters are
used to express different concepts. For example, {} and [] are each used for two differ-
ent purposes in our program.
Rule 0.2.1.1 Punctuation characters can be used with several different meanings.
comments: The construct /* ... */ that we saw above tells the compiler that everything
inside it is a comment, see e.g. Line 5.
[Exs 4] Correct Listing 2 step by step. Start from the first diagnostic line, fix the code that is mentioned there,
recompile and so on, until you have a flawless program.
[Exs 5] There is a third difference between the two programs that we didn’t mention, yet. Find it.
[6] In the C jargon these are directives, keywords and reserved identifiers.
. getting-started.c
5 /* The main thing that this program does. */
Comments are ignored by the compiler. It is the perfect place to explain and
document your code. Such “in-place” documentation can (and should) improve
the readability and comprehensibility of your code a lot. Another form of com-
ment is the so-called C++-style comment as in Line 15. These are marked by //.
C++-style comments extend from the // to the end of the line.
literals: Our program contains several items that refer to fixed values that are part of the
program: 0, 1, 3, 4, 5, 9.0, 2.9, 3.E+25, .00007, and
"element %zu is %g, \tits square is %g\n". These are called literals.
identifiers: These are “names” that we (or the C standard) give to certain entities in
the program. Here we have: A, i, main, printf, size_t, and EXIT_SUCCESS.
Identifiers can play different roles in a program. Amongst others they may refer
to:
• data objects (such as A and i), these are also referred to as variables,
• type aliases, size_t, that specify the “sort” of a new object, here of i.
Observe the trailing _t in the name. This naming convention is used by the
C standard to remind you that the identifier refers to a type.
• functions (main and printf),
• constants (EXIT_SUCCESS).
functions: Two of the identifiers refer to functions: main and printf. As we have already
seen printf is used by the program to produce some output. The function main
in turn is defined, that is its declaration int main(void) is followed by a
block enclosed in { ... } that describes what that function is supposed to
do. In our example this function definition goes from Line 6 to 24. main has a
special role in C programs as we will encounter them, it must always be present
since it is the starting point of the program’s execution.
operators: Of the numerous C operators our program only uses a few:
• = for initialization and assignment,
• < for comparison,
• ++ to increment a variable, that is to increase its value by 1,
• * to perform the multiplication of two values.
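The elements listed above can be pointed out in a few lines of code. The following is only a sketch: the function sum_of_squares and the values in it are made up for this illustration and do not appear in Listing 1. The comments name the role that each piece of punctuation, each literal and each operator plays.

```c
#include <stddef.h>

/* A comment of the first form; the compiler ignores it. */
double sum_of_squares(void) {        // { ... } encloses the block of the function
  double A[3] = {                    // [ ] declares an array, { ... } encloses its initializer
    [0] = 1.5,                       // [ ] designates an element, 1.5 is a literal
    [1] = 0.5,
    [2] = 2.0,                       // a comma may follow the last element
  };
  double sum = 0.0;                  // = used for initialization
  for (size_t i = 0; i < 3; ++i) {   // < compares, ++ increases i by 1
    sum = sum + A[i]*A[i];           // = used for assignment, * multiplies
  }
  return sum;                        // a semicolon terminates each statement
}
```

The identifiers here are A, i, sum, sum_of_squares and size_t; the special words are #include, double, void, for and return.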
2.2. Declarations. Declarations have to do with the identifiers that we encountered
above. As a general rule:
Rule 0.2.2.1 All identifiers of a program have to be declared.
That is, before we use an identifier we have to give the compiler a declaration
that tells it what that identifier is supposed to be. This is where identifiers differ from
keywords; keywords are predefined by the language, and must not be declared or redefined.
Three of the identifiers we use are effectively declared in our program: main, A and
i. Later on, we will see where the other identifiers (printf, size_t , and EXIT_SUCCESS)
come from.
Above, we already mentioned the declaration of the main function. All three declara-
tions, in isolation as “declarations only”, look like this:
    int main(void);
    double A[5];
    size_t i;
These three follow a pattern. Each has an identifier (main, A or i) and a specification
of certain properties that are associated with that identifier.
• i is of type size_t.
• main is additionally followed by parentheses, ( ... ), and thus declares a function of
type int.
• A is followed by brackets, [ ... ], and thus declares an array. An array is an aggregate
of several items of the same type, here it consists of 5 items of type double. These
5 items are ordered and can be referred to by numbers, called indices, from 0 to 4.
Each of these declarations starts with a type, here int, double and size_t. We will
see later what that represents. For the moment it is sufficient to know that this specifies
that all three identifiers, when used in the context of a statement, will act as some sort of
“numbers”.
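As a small sketch of how such declarations are used, consider the following function. Its name, element, is an assumption made for this illustration; the array A and its initializer are copied from Listing 1. It also shows why the terminal output above reports 0 for element 2: an item of the array that has no initializer of its own is set to 0.

```c
#include <stddef.h>

/* Declaration only: element is a function that receives a size_t
   and returns a double. */
double element(size_t i);

/* The same declaration again, this time followed by a block. */
double element(size_t i) {
  /* A consists of 5 items of type double, with indices 0 to 4.
     Index 2 has no initializer and therefore holds the value 0. */
  double A[5] = {
    [0] = 9.0,
    [1] = 2.9,
    [4] = 3.E+25,
    [3] = .00007,
  };
  /* Only indices 0 to 4 refer to items of A. For any other index this
     sketch returns 0 instead of accessing the array. */
  if (i < 5) {
    return A[i];
  }
  return 0.0;
}
```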
For the other three identifiers, printf, size_t and EXIT_SUCCESS, we don’t see any
declaration. In fact they are pre-declared identifiers, but as we saw when we tried to compile
Listing 2, the information about these identifiers doesn’t come out of nowhere. We
have to tell the compiler where it can obtain information about them. This is done right at
the start of the program, in the Lines 2 and 3: printf is provided by stdio.h (#include <stdio.h>), whereas
size_t and EXIT_SUCCESS come from stdlib.h (#include <stdlib.h>). The real declarations of these identifiers
are specified in .h files with these names somewhere on your computer. They could
be something like:
    int printf(char const format[static 1], ...);
    typedef unsigned long size_t;
    #define EXIT_SUCCESS 0
but this is not important for the moment. This information is normally hidden from
you in these include files or header files. If you need to know the semantics of these,
it’s usually a bad idea to look them up in the corresponding files, as they tend to be barely
readable. Instead, search in the documentation that comes with your platform. For the
brave, I always recommend a look into the current C standard, as that is where they all
come from. For the less courageous the following commands may help:
Terminal
> apropos printf
> man printf
> man 3 printf
Declarations may be repeated, but only if they specify exactly the same thing.
Rule 0.2.2.2 Identifiers may have several consistent declarations.
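As a minimal sketch of this rule, the following fragment declares the same function twice before giving its code. The name square is made up for illustration and is not part of the program discussed here.

```c
/* Two declarations of the same function: allowed, because they agree. */
double square(double x);
double square(double x);

/* A conflicting declaration such as "int square(double x);"
   would be rejected by the compiler. */

/* The code of the function itself. */
double square(double x) {
  return x*x;
}
```

Repeating a declaration is therefore harmless, which is what makes it possible for several header files to mention the same identifier.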
Another property of declarations is that they might only be valid (visible) in some
part of the program, not everywhere. A scope is a part of the program where an identifier
is valid.
Rule 0.2.2.3 Declarations are bound to the scope in which they appear.
In Listing 1 we have declarations in different scopes.
• A is visible inside the definition of main, starting at its very declaration on Line 8
and ending at the closing } on Line 24 of the innermost { ... } block that
contains that declaration.
• i has a more restricted visibility. It is bound to the for construct in which it is
declared. Its visibility reaches from that declaration in Line 16 to the end of the
{ ... } block that is associated with the for in Line 21.
• main is not enclosed in any { ... } block, so it is visible from its declaration
onwards until the end of the file.
In a slight abuse of terminology, the first two types of scope are called block scope.
The third type, as used for main, is called file scope. Identifiers in file scope are often
referred to as globals.
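The three kinds of visibility can be seen side by side in a small sketch. The identifiers length, sum_of_squares, total and sq are hypothetical and only serve to illustrate the scopes.

```c
#include <stddef.h>

/* File scope: visible from here until the end of the file. */
size_t const length = 5;

double sum_of_squares(double const A[static 5]) {
  double total = 0.0;               /* block scope: the function body */
  for (size_t i = 0; i < length; ++i) {
    double const sq = A[i]*A[i];    /* block scope: the loop body only */
    total += sq;
  }
  /* Neither i nor sq is visible at this point any more. */
  return total;
}
```

Using i or sq after the closing } of the for would be an error, whereas length and sum_of_squares remain usable in everything that follows in the file.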
2.3. Definitions. Generally, declarations only specify the kind of object an identifier
refers to, not what the concrete value of an identifier is, nor where the object it refers to
can be found. This important role is filled by a definition.
Rule 0.2.3.1 Declarations specify identifiers whereas definitions specify objects.
We will later see that things are a little bit more complicated in real life, but for now
we can make a simplification:
Rule 0.2.3.2 An object is defined at the same time as it is initialized.
Initializations augment the declarations and give an object its initial value. For in-
stance:
size_t i = 0;
is a declaration of i that is also a definition with initial value 0.
A is a bit more complex:
getting-started.c
 8   double A[5] = {
 9     [0] = 9.0,
10     [1] = 2.9,
11     [4] = 3.E+25,
12     [3] = .00007,
13   };
This initializes the 5 items in A to the values 9.0, 2.9, 0.0, 0.00007 and 3.0E+25, in
that order. The form of an initializer we see here is called designated: a pair of brackets
with an integer designates which item of the array is initialized with the corresponding
value. E.g. [4] = 3.E+25 sets the last item of the array A to the value 3.E+25. As a
special rule, any position that is not listed in the initializer is set to 0. In our example the
missing [2] is filled with 0.0.[7]
Rule 0.2.3.3 Missing elements in initializers default to 0.
You might have noticed that array positions, indices, above are not starting at 1 for
the first element, but with 0. Think of an array position as the “distance” of the corresponding
array element from the start of the array.
Rule 0.2.3.4 For an array with n elements, the first element has index 0, the last has index n-1.
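Both rules can be checked with a few lines of code. This is only a sketch: the array repeats the initializer from getting-started.c at file scope, and the function count_zeros is a hypothetical helper.

```c
#include <stddef.h>

/* Same initializer as in the listing: position 2 is not mentioned. */
double const A[5] = {
  [0] = 9.0,
  [1] = 2.9,
  [4] = 3.E+25,
  [3] = .00007,
};

/* Counts the items of A that have the value 0. */
size_t count_zeros(void) {
  size_t zeros = 0;
  for (size_t i = 0; i < 5; ++i) {   /* valid indices are 0, ..., 4 */
    if (!A[i]) {
      ++zeros;
    }
  }
  return zeros;
}
```

Only A[2] was left out of the initializer, so count_zeros finds exactly one item that is 0.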
For a function we have a definition (as opposed to only a declaration) if its declaration
is followed by braces { ... } containing the code of the function.
int main(void) {
  ...
}
[7] We will see later how these number literals with dots . and exponents E+25 work.
In our examples so far we have seen two different kinds of objects, data objects,
namely i and A, and function objects, main and printf.
In contrast to declarations, where several were allowed for the same identifier, defini-
tions must be unique:
Rule 0.2.3.5 Each object must have exactly one definition.
This rule concerns data objects as well as function objects.
2.4. Statements. The second part of the main function consists mainly of statements.
Statements are instructions that tell the compiler what to do with identifiers that have been
declared so far. We have
getting-started.c
16   for (size_t i = 0; i < 5; ++i) {
17     printf("element %zu is %g, \tits square is %g\n",
18            i,
19            A[i],
20            A[i]*A[i]);
21   }
22
23   return EXIT_SUCCESS;
We have already discussed the lines that correspond to the call to printf. There are
also other types of statements: a for and a return statement, and an increment operation,
indicated by the operator ++.
2.4.1. Iteration. The for statement tells the compiler that the program should execute
the printf line a number of times. It is the simplest form of domain iteration that C has
to offer. It has four different parts.
The code that is to be repeated is called the loop body; it is the { ... } block that
follows the for ( ... ). The other three parts are those inside the ( ... ) part, divided by
semicolons:
(1) The declaration, definition and initialization of the loop variable i that we
already discussed above. This initialization is executed once before any of the
rest of the whole for statement.
(2) A loop condition, i < 5, that specifies how long the for iteration should continue.
This one tells the compiler to continue iterating as long as i is strictly less
than 5. The loop condition is checked before each execution of the loop body.
(3) Another statement, ++i, is executed after each iteration. In this case it increases
the value of i by 1 each time.
If we put all those together, we ask the program to perform the part in the block 5
times, setting the value of i to 0, 1, 2, 3, and 4 respectively in each iteration. The fact that
we can identify each iteration with a specific value for i makes this an iteration over the
domain 0, ..., 4. There is more than one way to do this in C, but a for is the easiest,
cleanest and best tool for the task.
Rule 0.2.4.1 Domain iterations should be coded with a for statement.
A for statement can be written in several ways other than what we just saw. Often
people place the definition of the loop variable somewhere before the for or even reuse the
same variable for several loops. Don’t do that.
Rule 0.2.4.2 The loop variable should be defined in the initial part of a for.
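To see the order in which the parts of a for are executed, it can help to spell them out by hand. The sketch below uses two hypothetical functions, sum_for and sum_by_hand, that compute the same sum over the domain 0, ..., n-1. The second one is only meant as an illustration of the steps, not as a recommended style.

```c
#include <stddef.h>

/* Domain iteration over 0, ..., n-1 with a for statement. */
size_t sum_for(size_t n) {
  size_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += i;
  }
  return sum;
}

/* The same steps written out one by one. */
size_t sum_by_hand(size_t n) {
  size_t sum = 0;
  {
    size_t i = 0;        /* (1) executed once, before anything else */
    while (i < n) {      /* (2) checked before each run of the body */
      sum += i;          /*     the loop body */
      ++i;               /* (3) executed after each iteration */
    }
  }
  return sum;
}
```

The for version keeps the loop variable, the condition and the update together in one place, which is the reason for the two rules above.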
2.4.2. Function return. The last statement in main is a return. It tells the main function
to return to the statement that it was called from once it’s done. Here, since main has
int in its declaration, a return must send back a value of type int to the calling statement.
In this case that value is EXIT_SUCCESS.
Even though we can’t see its definition, the printf function must contain a similar
return statement. At the point where we call the function in Line 17, execution of the
statements in main is temporarily suspended. Execution continues in the printf function
until a return is encountered. After the return from printf, execution of the statements in
main continues from where it stopped.
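The same mechanism applies to functions that we write ourselves. In the following sketch the function print_square is hypothetical; it calls printf and hands the value that printf returns (the number of characters written, or a negative value on error) back to its own caller.

```c
#include <stdio.h>

/* Prints one line and returns what printf itself returned. */
int print_square(double x) {
  /* print_square is suspended here until printf executes its return. */
  int const printed = printf("%g squared is %g\n", x, x*x);
  /* Now control goes back to the statement that called print_square. */
  return printed;
}
```

A call such as print_square(2.0) prints "2 squared is 4" followed by a newline, and evaluates to the number of characters of that line.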
[Figure 1: diagram with three columns connected by call and return arrows: the process startup code that calls main(), the program code of main from getting-started.c, and the function printf in the C library.]
FIGURE 1. Execution of a small program
In Figure 1 we have a schematic view of the execution of our little program. First, a
process startup routine (on the left) that is provided by our platform calls the user-provided
function main (middle). That in turn calls printf, a function that is part of the C library,
on the right. Once a return is encountered there, control returns back to main, and when we
reach the return in main, it passes back to the startup routine. The latter transfer of control,
from a programmer’s point of view, is the end of the program’s execution.
LEVEL 1
Acquaintance
This chapter is supposed to get you acquainted with the C programming language,
that is to provide you with enough knowledge to write and use good C programs. “Good”
here refers to a modern understanding of the language, avoiding most of the pitfalls of
early dialects of C, offering you some constructs that were not present before, and that are
portable across the vast majority of modern computer architectures, from your cell phone
to a mainframe computer.
Having worked through this you should be able to write short code for everyday needs,
not extremely sophisticated, but useful and portable. In many ways, C is a permissive
language, a programmer is allowed to shoot themselves in the foot or other body parts if
they choose to, and C will make no effort to stop them. Therefore, just for the moment, we
will introduce some restrictions. We’ll try to avoid handing out guns in this chapter, and
place the key to the gun safe out of your reach for the moment, marking its location with
big and visible exclamation marks.
The most dangerous constructs in C are the so-called casts, so we’ll skip them at this
level. However, there are many other pitfalls that are less easy to avoid. We will approach
some of them in a way that might look unfamiliar to you, in particular if you have learned
your C basics in the last millennium or if you have been initiated to C on a platform that
wasn’t upgraded to current ISO C for years.
• We will focus primarily on the unsigned versions of integer types.
• We will introduce pointers in steps: first, in disguise as parameters to functions
(6.1.4), then with their state (being valid or not, 6.2) and then, only when we
really can’t delay it any further (11), using their entire potential.
• We will focus on the use of arrays whenever possible, instead.
Warning to experienced C programmers. If you already have some experience with
C programming, this may need some getting used to. Here are some of the things that may
provoke allergic reactions. If you happen to break out in spots when you read some code
here, try to take a deep breath and let it go.
We bind type modifiers and qualifiers to the left. We want to separate identifiers visu-
ally from their type. So we will typically write things as
char* name;
where char* is the type and name is the identifier. We also apply the left binding rule to
qualifiers and write
char const* const path_name;
Here the first const qualifies the char to its left, the * makes it into a pointer and the second
const again qualifies what is to its left.
We use array or function notation for pointer parameters to functions, wherever these
assume that the pointer can’t be null. Examples:
size_t strlen(char const string[static 1]);
int main(int argc, char* argv[argc+1]);
int atexit(void function(void));
The first stresses the fact that strlen must receive a valid (non-null) pointer and will access
at least one element of string. The second summarizes the fact that main receives an array
of pointers to char: the program name, argc-1 program arguments and one null pointer
that terminates the array. The third emphasizes that semantically atexit receives a function
as an argument. The fact that technically this function is passed on as a function pointer
is usually of minor interest, and the commonly used pointer-to-function syntax is barely
readable. Here are syntactically equivalent declarations for the three functions above as
they would be written by many:
size_t strlen(const char *string);
int main(int argc, char **argv);
int atexit(void (*function)(void));
As you now hopefully see, this is less informative and more difficult to comprehend visu-
ally.
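To show the array notation in a complete function, here is a small sketch. The function string_length is a hypothetical stand-in that mimics what strlen does; it is not the library's implementation.

```c
#include <stddef.h>

/* The parameter must point to at least one char, so it can't be null. */
size_t string_length(char const string[static 1]) {
  size_t length = 0;
  while (string[length]) {   /* stop at the terminating 0 character */
    ++length;
  }
  return length;
}
```

Inside the function, string is used with brackets like any other array, which matches the way the parameter is declared.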
We define variables as close to their first use as possible. Lack of variable initializa-
tion, especially for pointers, is one of the major pitfalls for novice C programmers. This
is why we should, whenever possible, combine the declaration of a variable with the first
assignment to it: the tool that C gives us for this purpose is a definition - a declaration
together with an initialization. This gives a name to a value, and introduces this name at
the first place where it is used.
This is particularly convenient for for-loops. The iterator variable of one loop is se-
mantically a different object from the one in another loop, so we declare the variable within
the for to ensure it stays within the loop’s scope.
We use prefix notation for code blocks. To be able to read a code block it is important
to capture two things about it easily: its purpose and its extent. Therefore:
• All { are prefixed on the same line with the statement or declaration that intro-
duces them.
• The code inside is indented by one level.
• The terminating } starts a new line on the same level as the statement that intro-
duced the block.
• Block statements that have a continuation after the } continue on the same line.
Examples:
int main(int argc, char* argv[argc+1]) {
  puts("Hello world!");
  if (argc > 1) {
    while (true) {
      puts("some programs never stop");
    }
  } else {
    do {
      puts("but this one does");
    } while (false);
  }
  return EXIT_SUCCESS;
}
3. Everything is about control
In our introductory example we saw two different constructs that allowed us to control
the flow of a program execution: functions and the for-iteration. Functions are a way to
transfer control unconditionally. The call transfers control unconditionally to the function
and a return-statement unconditionally transfers it back to the caller. We will come back
to functions in Section 7.
The for statement is different in that it has a controlling condition (i < 5 in the ex-
ample) that regulates if and when the dependent block or statement ({ printf(...)}) is
executed. C has five conditional control statements: if, for, do, while and switch. We will
look at these statements in this section.
There are several other kinds of conditional expressions we will look at later on: the
ternary operator, denoted by an expression in the form “cond ? A : B”, and the
compile-time preprocessor conditionals (#if-#else) and type generic expressions (noted
with the keyword _Generic). We will visit these in Sections 4.4 and 20, respectively.
3.1. Conditional execution. The first construct that we will look at is specified by
the keyword if. It looks like this:
if (i > 25) {
  j = i - 25;
}
Here we compare i against the value 25. If it is larger than 25, j is set to the value i - 25.
In that example i > 25 is called the controlling expression, and the part in { ... } is
called the dependent block.
This form of an if statement is syntactically quite similar to the for statement that we
already have encountered. It is a bit simpler: the part inside the parentheses has only one
part that determines whether the dependent statement or block is run.
There is a more general form of the if construct:
if (i > 25) {
  j = i - 25;
} else {
  j = i;
}
It has a second dependent statement or block that is executed if the controlling con-
dition is not fulfilled. Syntactically, this is done by introducing another keyword else that
separates the two statements or blocks.
The if (...) ... else ... is a selection statement. It selects one of the two
possible code paths according to the contents of ( ... ). The general form is
if (condition) statement0-or-block0
else statement1-or-block1
The possibilities for the controlling expression “condition” are numerous. They can
range from simple comparisons as in this example to very complex nested expressions. We
will present all the primitives that can be used in Section 4.3.2.
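Wrapped into a function, the if-else from above can be tried out directly. The name reduce is hypothetical and only chosen for this sketch.

```c
#include <stddef.h>

/* Returns i - 25 if i is larger than 25, and i itself otherwise. */
size_t reduce(size_t i) {
  size_t j;
  if (i > 25) {
    j = i - 25;    // taken if the controlling expression is true
  } else {
    j = i;         // taken otherwise
  }
  return j;
}
```

Exactly one of the two dependent blocks is executed for any given value of i, so j always has a value when the return is reached.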
The simplest of such “condition” specifications in an if statement can be seen in
the following example, in a variation of the for loop from Listing 1.
for (size_t i = 0; i < 5; ++i) {
  if (i) {
    printf("element %zu is %g, \tits square is %g\n",
           i,
           A[i],
           A[i]*A[i]);
  }
}
Here the condition that determines whether printf is executed or not is just i: a nu-
merical value by itself can be interpreted as a condition. The text will only be printed when
the value of i is not 0.[Exs 1]
There are two simple rules for the evaluation of a numerical “condition”:
Rule 1.3.1.1 The value 0 represents logical false.
Rule 1.3.1.2 Any value different from 0 represents logical true.
The operators == and != allow us to test for equality and inequality, respectively.
a == b is true if the value of a is equal to the value of b and false otherwise; a != b is
false if a is equal to b and true otherwise. Knowing how numerical values are evaluated as
conditions, we can avoid redundancy. For example, we can rewrite
if (i != 0) {
  ...
}
as:
if (i) {
  ...
}
The type bool, specified in stdbool.h, is what we should be using if we want to
store truth values. Its values are false and true. Technically, false is just another name for
0 and true for 1. It’s important to use false and true (and not the numbers) to emphasize
that a value is to be interpreted as a condition. We will learn more about the bool type in
Section 5.5.4.
Redundant comparisons quickly become unreadable and clutter your code. If you have
a conditional that depends on a truth value, use that truth value directly as the condition.
Again, we can avoid redundancy by rewriting something like:
bool b = ...;
...
if ((b != false) == true) {
  ...
}
as
bool b = ...;
...
if (b) {
  ...
}
Generally:
Rule 1.3.1.3 Don’t compare to 0, false or true.
Using the truth value directly makes your code clearer, and illustrates one of the basic
concepts of the C language:
Rule 1.3.1.4 All scalars have a truth value.
Here scalar types include all the numerical types such as size_t, bool or int that we
already encountered, and pointer types, that we will come back to in Section 6.2.
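The following sketch illustrates these rules. Both functions, is_set and describe, are hypothetical helpers: the first stores a numerical value as a truth value, the second uses a value directly as a condition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Any value different from 0 becomes true, the value 0 becomes false. */
bool is_set(size_t value) {
  bool const b = value;
  return b;
}

/* Uses the truth value of the argument, without comparing it to 0. */
char const* describe(size_t value) {
  if (value) {
    return "true";
  } else {
    return "false";
  }
}
```

So is_set(42) gives true and not 42: a bool only ever holds one of the two values false and true.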
[Exs 1] Add the if (i) condition to the program and compare the output to the previous.
3.2. Iterations. Previously, we encountered the for statement that allows us to iterate
over a domain; in our introductory example it declared a variable i that was set to the
values 0, 1, 2, 3 and 4. The general form of this statement is
for (clause1; condition2; expression3) statement-or-block
This statement is actually quite generic. Usually “clause1” is an assignment expression
or a variable definition. It serves to state an initial value for the iteration domain.
“condition2” tests if the iteration should continue. Then, “expression3” updates the
iteration variable that had been used in “clause1”. It is performed at the end of each
iteration. Some advice:
• In view of Rule 0.2.4.2 “clause1” should in most cases be a variable definition.
• Because for is relatively complex with its four different parts and not so easy to
capture visually, “statement-or-block” should usually be a { ... } block.
Let’s see some more examples:
for (size_t i = 10; i; --i) {
  something(i);
}
for (size_t i = 0, stop = upper_bound(); i < stop; ++i) {
  something_else(i);
}
for (size_t i = 9; i <= 9; --i) {
  something_else(i);
}
The first for counts i down from 10 to 1, inclusive. The condition is again just the
evaluation of the variable i, no redundant test against value 0 is required. When i becomes
0, it will evaluate to false and the loop will stop. The second for declares two variables,
i and stop. As before i is the loop variable, stop is what we compare against in the
condition, and when i becomes greater than or equal to stop, the loop terminates.
The third for appears like it would go on forever, but actually counts down from 9 to
0. In fact, in the next section we will see that “sizes” in C, that is numbers that have type
size_t , are never negative.[Exs 2]
Observe that all three for statements declare variables named i. These three variables
with the same name happily live side by side, as long as their scopes don’t overlap.
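The first two loop forms can be tried out with a sketch like the following, where the calls to something and something_else are replaced by a summation. The function names sum_down and sum_up are hypothetical.

```c
#include <stddef.h>

/* Adds up start, start-1, ..., 1; the loop stops when i becomes 0. */
size_t sum_down(size_t start) {
  size_t sum = 0;
  for (size_t i = start; i; --i) {
    sum += i;
  }
  return sum;
}

/* Adds up 0, 1, ..., bound-1; the bound is stored once in stop. */
size_t sum_up(size_t bound) {
  size_t sum = 0;
  for (size_t i = 0, stop = bound; i < stop; ++i) {
    sum += i;
  }
  return sum;
}
```

Each of the two functions has its own variable i, and neither of them interferes with the other.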
There are two more iterative statements in C, namely while and do.
while (condition) statement-or-block
do statement-or-block while(condition);
The following example shows a typical use of the first:
#include <tgmath.h>

double const eps = 1E-9;            // desired precision
...
double const a = 34.0;
double x = 0.01;                    // start value, needs 0 < a*x < 2 to converge
while (fabs(1.0 - a*x) >= eps) {    // iterate until close
  x *= (2.0 - a*x);                 // Heron approximation
}
[Exs 2] Try to imagine what happens when i has value 0 and is decremented by means of operator --.
It iterates as long as the given condition evaluates true. The do loop is very similar,
except that it checks the condition after the dependent block:
do {                                // iterate
  x *= (2.0 - a*x);                 // Heron approximation
} while (fabs(1.0 - a*x) >= eps);   // iterate until close
This means that if the condition evaluates to false, a while-loop will not run its dependent
block at all, and a do-loop will run it once before terminating.
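This difference can be made visible with a small sketch. The two functions below are hypothetical and do nothing but count how often their dependent block is run; inside the block the condition is set to false so that no loop runs more than once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of runs of the dependent block of a while loop. */
size_t runs_of_while(bool condition) {
  size_t runs = 0;
  while (condition) {     // checked before the block
    ++runs;
    condition = false;
  }
  return runs;
}

/* Number of runs of the dependent block of a do loop. */
size_t runs_of_do(bool condition) {
  size_t runs = 0;
  do {
    ++runs;
    condition = false;
  } while (condition);    // checked after the block
  return runs;
}
```

Called with false, runs_of_while returns 0 whereas runs_of_do returns 1. Called with true, both return 1.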
As with the for statement, for do and while it is advisable to use the { ... } block
variants. There is also a subtle syntactical difference between the two, do always needs a
semicolon ; after the while (condition) to terminate the statement. Later we will see
that this is a syntactic feature that turns out to be quite useful in the context of multiple
nested statements, see Section 10.3.
All three iteration statements become even more flexible with break and continue state-
ments. A break statement stops the loop without re-evaluating the termination condition or
executing the part of the dependent block after the break statement:
while (true) {
  double prod = a*x;
  if (fabs(1.0 - prod) < eps)       // stop if close enough
    break;
  x *= (2.0 - prod);                // Heron approximation
}
This way, we can separate the computation of the product a*x, the evaluation of the
stop condition and the update of x. The condition of the while then becomes trivial. The
same can be done using a for, and there is a tradition among C programmers to write it
as follows:
for (;;) {
  double prod = a*x;
  if (fabs(1.0 - prod) < eps)       // stop if close enough
    break;
  x *= (2.0 - prod);                // Heron approximation
}
for(;;) here is equivalent to while(true). The fact that the controlling expression of a for
(the middle part between the ;;) can be omitted and is interpreted as “always true” is just
an historic artifact in the rules of C and has no other special reason.
The continue statement is less frequently used. Like break, it skips the execution of the
rest of the dependent block, so all statements in the block after the continue are not executed
for the current iteration. However, it then re-evaluates the condition and continues from
the start of the dependent block if the condition is true.
for (size_t i = 0; i < max_iterations; ++i) {
  if (x > 1.0) {    // check if we are on the correct side of 1
    x = 1.0/x;
    continue;
  }
  double prod = a*x;
  if (fabs(1.0 - prod) < eps)       // stop if close enough
    break;
  x *= (2.0 - prod);                // Heron approximation
}
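The interplay of break and continue can be tried out with a self-contained variant of this loop. The following is only a sketch under some assumptions: the function name inverse, the start value 1.0 and the halving step are choices made for this illustration, and the absolute value is tested with two explicit comparisons instead of fabs. It relies on the fact that the update x *= (2.0 - a*x) approaches 1/a as soon as 0 < a*x < 2 holds.

```c
#include <stddef.h>

/* Approximates 1/a for a > 0, giving up after max_iterations steps. */
double inverse(double a, size_t max_iterations) {
  double const eps = 1E-9;            // desired precision
  double x = 1.0;
  for (size_t i = 0; i < max_iterations; ++i) {
    double const prod = a*x;
    if (prod >= 2.0) {                // too far off for the update below
      x *= 0.5;
      continue;                       // go on with ++i and the condition
    }
    double const err = 1.0 - prod;
    if (-eps < err && err < eps)      // stop if close enough
      break;
    x *= (2.0 - prod);                // Heron approximation
  }
  return x;
}
```

For example inverse(34.0, 100) first halves x a few times by means of the continue branch, and then needs only a handful of update steps before the break is reached. Note that in a for loop a continue also executes the ++i before the condition is checked again.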
In the examples above we made use of a standard macro fabs, that comes with the
tgmath.h header.[3] It calculates the absolute value of a double. If you are interested in
how this works, Listing 1.1 is a program that does the same thing without the use of fabs.
In it, fabs has been replaced by several explicit comparisons.
The task of the program is to compute the inverse of all numbers that are provided to
it on the command line. An example of a program execution looks like:
Terminal
> ./heron 0.07 5 6E+23
heron: a=7.00000e-02, x=1.42857e+01, a*x=0.999999999996
heron: a=5.00000e+00, x=2.00000e-01, a*x=0.999999999767
heron: a=6.00000e+23, x=1.66667e-24, a*x=0.999999997028
To process the numbers on the command line the program uses another library function
strtod from stdlib.h.[Exs 4][Exs 5][Exs 6]
LISTING 1.1. A program to compute inverses of numbers
#include <stdlib.h>
#include <stdio.h>

/* lower and upper iteration limits centered around 1.0 */
static double const eps1m01 = 1.0 - 0x1P-01;
static double const eps1p01 = 1.0 + 0x1P-01;
static double const eps1m24 = 1.0 - 0x1P-24;
static double const eps1p24 = 1.0 + 0x1P-24;

int main(int argc, char* argv[argc+1]) {
  for (int i = 1; i < argc; ++i) {        // process args
    double const a = strtod(argv[i], 0);  // arg -> double
    double x = 1.0;
    for (;;) {                            // by powers of 2
      double prod = a*x;
      if (prod < eps1m01) x *= 2.0;
      else if (eps1p01 < prod) x *= 0.5;
      else break;
    }
    for (;;) {                            // Heron approximation
      double prod = a*x;
      if ((prod < eps1m24) || (eps1p24 < prod))
        x *= (2.0 - prod);
      else break;
    }
    printf("heron: a=%.5e,\tx=%.5e,\ta*x=%.12f\n",
           a, x, a*x);
  }
  return EXIT_SUCCESS;
}
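The conversion step of Listing 1.1 can also be looked at in isolation. In this sketch the function to_double is a hypothetical wrapper; as in the listing, the second argument of strtod is 0, which means that we don't ask where in the string the number ended.

```c
#include <stdlib.h>

/* Converts the text of one command line argument to a double. */
double to_double(char const text[static 1]) {
  return strtod(text, 0);
}
```

With this, to_double("6E+23") gives the same value as the number literal 6E+23 in the source code. For text that doesn't start with a number strtod returns 0.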
[3] “tgmath” stands for type generic mathematical functions.
[Exs 4] Analyse Listing 1.1 by adding printf calls for intermediate values of x.
[Exs 5] Describe the use of the parameters argc and argv in Listing 1.1.
[Exs 6] Print out the values of eps1m01 and observe the output when you change them slightly.
3.3. Multiple selection. The last control statement that C has to offer is called switch
statement and is another selection statement. It is mainly used when cascades of if-else
constructs would be too tedious:
if (arg == 'm') {
  puts("this is a magpie");
} else if (arg == 'r') {
  puts("this is a raven");
} else if (arg == 'j') {
  puts("this is a jay");
} else if (arg == 'c') {
  puts("this is a chough");
} else {
  puts("this is an unknown corvid");
}
In this case, we have a choice that is more complex than a false-true decision and that can
have several outcomes. We can simplify this as follows:
switch (arg) {
  case 'm': puts("this is a magpie");
            break;
  case 'r': puts("this is a raven");
            break;
  case 'j': puts("this is a jay");
            break;
  case 'c': puts("this is a chough");
            break;
  default : puts("this is an unknown corvid");
}
Here we select one of the puts calls according to the value of the arg variable. Like printf,
the function puts is provided by stdio.h. It outputs a line with the string that is passed
as an argument. We provide specific cases for characters 'm', 'r', 'j', 'c' and a
fallback case labeled default. The default case is triggered if arg doesn’t match any of
the case values.[Exs 7]
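The same selection can be packed into a function, which makes it easy to test. This is a sketch only: the function corvid_name is hypothetical, and it uses return in place of break to leave the switch.

```c
/* Maps a character to the name of a bird, with a fallback. */
char const* corvid_name(char arg) {
  switch (arg) {
    case 'm': return "magpie";
    case 'r': return "raven";
    case 'j': return "jay";
    case 'c': return "chough";
    default : return "unknown corvid";
  }
}
```

A call such as puts(corvid_name('r')) then prints the line "raven", and any character without a case of its own ends up in the default case.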
Syntactically, a switch is as simple as
1 switch (expression) statement-or-block
and the semantics of it are quite straightforward: the case and default labels serve as jump
targets. According to the value of the expression, control just continues at the statement
that is labeled accordingly. If we hit a break statement, the whole switch under which
it appears terminates and control is transferred to the next statement after the switch.
By that specification a switch statement can in fact be used much more widely than
iterated if-else constructs.
switch (count) {
  default: puts("++++ ..... +++");
  case 4: puts("++++");
  case 3: puts("+++");
  case 2: puts("++");
  case 1: puts("+");
  case 0:;
}
[Exs 7] Test the above switch statement in a program. See what happens if you leave out some of the break
statements.
Once we have jumped into the block, the execution continues until it reaches a break or the
end of the block. In this case, because there are no break statements, we end up running all
subsequent puts statements. For example, the output when the value of count is 3 would
be a triangle with three lines.
Terminal
+++
++
+
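To check this behavior without looking at printed output, the puts calls can be replaced by a counter. The function triangle_lines in the following sketch is hypothetical; it returns the number of lines that the switch above would print for a given count.

```c
#include <stddef.h>

/* Counts the lines that would be printed for the value count. */
size_t triangle_lines(size_t count) {
  size_t lines = 0;
  switch (count) {
    default: ++lines;   // no break: execution continues with the next line
    case 4:  ++lines;
    case 3:  ++lines;
    case 2:  ++lines;
    case 1:  ++lines;
    case 0:;
  }
  return lines;
}
```

For count equal to 3 the result is 3, as in the output above. Any value larger than 4 jumps to default and passes through all five statements.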
The structure of a switch can be more flexible than if-else, but it is restricted in another
way:
Rule 1.3.3.1 case values must be integer constant expressions.
In Section 5.4.2 we will see what these expressions are in detail. For now it suffices to
know that these have to be fixed values that we provide directly in the source such as the
4, 3, 2, 1, 0 above. In particular variables such as count above are only allowed in the
switch part but not for the individual cases.
With the greater flexibility of the switch statement also comes a price: it is more error
prone. In particular, we might accidentally skip variable definitions:
Rule 1.3.3.2 case labels must not jump beyond a variable definition.
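A case label that is placed after a variable definition inside the same block would be reached without the initialization of that variable being executed. One possible way to stay clear of this, sketched below with a hypothetical function area, is to give each case that needs a variable its own { ... } block.

```c
#include <stddef.h>

/* Area of a square ('s') or of a right triangle with two equal legs ('t'). */
size_t area(char shape, size_t side) {
  switch (shape) {
    case 's': {
      size_t const result = side*side;     // visible in this block only
      return result;
    }
    case 't': {
      size_t const result = side*side/2;   // a different object
      return result;
    }
    default:
      return 0;
  }
}
```

Here no case label lies between a definition and the end of its scope, so a jump can never skip an initialization.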
4. Expressing computations
We’ve already made use of some simple examples of expressions. These are code
snippets that compute some value based on other values. The simplest such expressions
are certainly arithmetic expressions that are similar to those that we learned in school. But
there are others, notably comparison operators such as == and != that we already saw
earlier.
In this section, the values and objects on which we will do these computations will be
mostly of the type size_t that we already met above. Such values correspond to “sizes”, so
they are numbers that cannot be negative. Their range of possible values starts at 0. What
we would like to represent are all the non-negative integers, often denoted as $\mathbb{N}$, $\mathbb{N}_0$, or
“natural” numbers in mathematics. Unfortunately computers are finite so we can’t directly
represent all the natural numbers, but we can do a reasonable approximation. There is a
big upper limit SIZE_MAX that is the upper bound of what we can represent in a size_t .
Rule 1.4.0.3 The type size_t represents values in the range [0, SIZE_MAX].
The value of SIZE_MAX is quite large, depending on the platform it should be one of
$2^{16} - 1 = 65535$
$2^{32} - 1 = 4294967295$
$2^{64} - 1 = 18446744073709551615$
The first value is a minimal requirement, the other two values are much more commonly
used today. They should be large enough for calculations that are not too sophisticated.
The standard header stdint.h provides SIZE_MAX such that you don’t have to figure it
out yourself to write portable code.
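To find out which of these values your platform uses, a sketch like the following can help. The function print_size_max is hypothetical; it prints the limit with the %zu format that we already used for size_t and also returns it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Prints the upper limit of size_t on this platform and returns it. */
size_t print_size_max(void) {
  size_t const max = SIZE_MAX;
  printf("SIZE_MAX is %zu\n", max);
  return max;
}
```

Whatever the platform, the value that is printed is at least 65535.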
The concept of “numbers that cannot be negative” to which we referred for size_t
corresponds to what C calls unsigned integer types. The symbols and combinations
like + or != are called operators and the things to which they are applied are called
operands, so in something like “a + b”, “+” is the operator and “a” and “b” are its
operands.
For an overview of all C operators see the tables in the appendix; Table 2 lists the
operators that operate on values, Table 3 those that operate on objects and Table 4 those that
operate on types.
4.1. Arithmetic. Arithmetic operators form the first group in Table 2 of operators
that operate on values.
4.1.1. +, - and *. Arithmetic operators +, - and * mostly work as we would expect
by computing the sum, the difference and the product of two values.
size_t a = 45;
size_t b = 7;
size_t c = (a - b)*2;
size_t d = a - b*2;
must result in c being equal to 76, and d to 31. As you can see from that little example,
sub-expressions can be grouped together with parentheses to enforce a preferred binding of
the operator.
In addition, operators + and - also have unary variants. -b just gives the negative of b,
namely a value a such that b + a is 0. +a simply provides the value of a. The following
would give 76 as well.
size_t c = (+a + -b)*2;
Even though we use an unsigned type for our computation, negation and difference
by means of the operator - is well defined. In fact, one of the miraculous properties of
size_t is that +-* arithmetic always works where it can. This means that as long as the
final mathematical result is within the range [0, SIZE_MAX], then that result will be the
value of the expression.
Rule 1.4.1.1 Unsigned arithmetic is always well defined.
Rule 1.4.1.2 Operations +, - and * on size_t provide the mathematically correct result
if it is representable as a size_t.
In case that we have a result that is not representable, we speak of arithmetic overflow.
Overflow can e.g. happen if we multiply two values that are so large that their mathematical
product is greater than SIZE_MAX. We’ll look at how C deals with overflow in the next
section.
4.1.2. Division and remainder. The operators / and % are a bit more complicated,
because they correspond to integer division and the remainder operation. You might not be
as used to them as to the other three arithmetic operators. a/b evaluates to the number
of times b fits into a, and a%b is the remaining value once the maximum number of b are
removed from a. The operators / and % come in pairs: if we have z = a / b the remainder
a % b could be computed as a - z*b:
Rule 1.4.1.3 For unsigned values, a == (a/b)*b + (a%b).
A familiar example for the % operator is the hours on a clock. Say we have a 12
hour clock: 6 hours after 8 o’clock is 2 o’clock. Most people are able to compute time
differences on 12 hour or 24 hour clocks. This computation corresponds to a % 12, in
our example (8 + 6) % 12 == 2.[Exs 8] Another similar use for % is computation with
minutes in the hour, of the form a % 60.
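The clock computation and Rule 1.4.1.3 can be sketched as two small functions. The names hours_after and division_identity_holds are hypothetical and only serve this illustration; note that on this dial 12 o’clock is represented as 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: position on a 12 hour dial, in the range 0..11,
   that is reached delta hours after start. */
size_t hours_after(size_t start, size_t delta) {
  return (start + delta) % 12;
}

/* Hypothetical helper: checks the identity of Rule 1.4.1.3 for one
   pair of values. The second operand b must not be 0. */
bool division_identity_holds(size_t a, size_t b) {
  return a == (a/b)*b + (a%b);
}
```

With these, hours_after(8, 6) gives 2 as in the text, and division_identity_holds(45, 7) is true because 45/7 is 6 and 45%7 is 3.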
There is only one exceptional value that is not allowed for these two operations: 0.
Division by zero is forbidden.
Rule 1.4.1.4 Unsigned / and % are well defined only if the second operand is not 0.
The % operator can also be used to explain additive and multiplicative arithmetic on
unsigned types a bit better. As already mentioned above, when an unsigned type is given a
value outside its range, it is said to overflow. In that case, the result is reduced as if the %
operator had been used. The resulting value “wraps around” the range of the type. In the
case of size_t, the range is 0 to SIZE_MAX, therefore
Rule 1.4.1.5 Arithmetic on size_t implicitly does computation %(SIZE_MAX+1).
Rule 1.4.1.6 In case of overflow, unsigned arithmetic wraps around.
This means that for size_t values, SIZE_MAX + 1 is equal to 0 and 0 - 1 is equal
to SIZE_MAX.
This “wrapping around” is the magic that makes the - operators work for unsigned
types. For example, the value -1 interpreted as a size_t is equal to SIZE_MAX and
so adding -1 to a value a, just evaluates to a + SIZE_MAX which wraps around to
a + SIZE_MAX - (SIZE_MAX+1)= a - 1.
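As a minimal sketch, the wrap-around identities just described can be checked in one function. The name wraps_as_described is hypothetical. Every result is stored in a size_t object first, so the check does not depend on how size_t relates to int on a particular platform.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the wrap-around identities from the
   text hold for the value a. */
bool wraps_as_described(size_t a) {
  size_t const max = SIZE_MAX;
  size_t const zero = 0;
  size_t const minus_one = -1;          /* -1 converted to size_t  */
  size_t const wrapped_up = max + 1;    /* wraps around to 0       */
  size_t const wrapped_down = zero - 1; /* wraps around to the top */
  size_t const sum = a + minus_one;     /* adding "-1" ...         */
  size_t const dec = a - 1;             /* ... is subtracting 1    */
  return (wrapped_up == 0)
      && (wrapped_down == SIZE_MAX)
      && (minus_one == SIZE_MAX)
      && (sum == dec);
}
```

By Rules 1.4.1.5 and 1.4.1.6 this function should return true for every possible argument, including 0 and SIZE_MAX.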
Operators / and % have the nice property that their results are never larger than
their first operand:
Rule 1.4.1.7 The result of unsigned / and % is never larger than the first operand.
And thus
Rule 1.4.1.8 Unsigned / and % can’t overflow.
[Exs 8] Implement some computations using a 24 hour clock, e.g. 3 hours after ten, 8 hours after twenty.
4.2. Operators that modify objects. Another important operation that we already
have seen is assignment, a = 42. As you can see from that example this operator is not
symmetric, it has a value on the right and an object on the left. In a freaky abuse of
language C jargon often refers to the right hand side as rvalue (right value) and to the
object on the left as lvalue (left value). We will try to avoid that vocabulary whenever we
can: speaking of a value and an object is completely sufficient.
C has other assignment operators. For any binary operator @ of the five we have
seen above there is an assignment operator with the syntax
    an_object @= some_expression;
They are just convenient abbreviations for combining the arithmetic operator @ and
assignment, see Table 3. An equivalent form would be
    an_object = (an_object @ (some_expression));
In other words there are operators +=, -=, *=, /=, and %=. For example in a for loop
operator += could be used:
    for (size_t i = 0; i < 25; i += 7) {
      ...
    }
The syntax of these operators is a bit picky: you aren’t allowed to have blanks between
the different characters, e.g. “i + = 7” instead of “i += 7” is a syntax error.
Rule 1.4.2.1 Operators must have all their characters directly attached to each other.
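A short sketch of the compound assignment operators in use. The function names count_steps and scaled are hypothetical. The first wraps the for loop from above, the second shows why the equivalent form puts parentheses around some_expression.

```c
#include <stddef.h>

/* Hypothetical helper: number of iterations of the loop from the text.
   step must not be 0, otherwise the loop never ends. */
size_t count_steps(size_t limit, size_t step) {
  size_t count = 0;
  for (size_t i = 0; i < limit; i += step) {
    count += 1;            /* same as count = count + 1 */
  }
  return count;
}

/* Hypothetical helper: the right hand side is evaluated as a whole,
   so this computes x * (2 + 3) and not (x * 2) + 3. */
size_t scaled(size_t x) {
  x *= 2 + 3;
  return x;
}
```

For limit 25 and step 7 the loop variable takes the values 0, 7, 14 and 21, so count_steps(25, 7) is 4, and scaled(4) is 20.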
We already have seen two other operators that modify objects, namely the increment
operator ++ and the decrement operator --:
• ++i is equivalent to i += 1,
• --i is equivalent to i -= 1.
All these assignment operators are real operators, they return a value (but not an
object!). You could, if you were twisted enough, write something like
    a = b = c += ++d;
    a = (b = (c += (++d))); // same
But such combinations of modifications to several objects in one go are generally frowned
upon. Don’t do that unless you want to obfuscate your code. Such changes to objects that
are involved in an expression are referred to as side effects.
Rule 1.4.2.2 Side effects in value expressions are evil.
Rule 1.4.2.3 Never modify more than one object in a statement.
For the increment and decrement operators there are even two other forms, namely
postfix increment and postfix decrement. They differ from the ones that we have seen
in the result when they are used inside a larger expression. But since you will nicely obey
Rule 1.4.2.2, you will not be tempted to use them.
4.3. Boolean context. Several operators yield a value 0 or 1 depending on whether
some condition is verified or not, see Table 2. They can be grouped in two categories,
comparisons and logical evaluation.
4.3.1. Comparison. In our examples we already have seen the comparison operators
==, !=, <, and >. Whereas the latter two perform strict comparison between their operands,
operators <= and >= perform “less or equal” and “greater or equal” comparison, respectively.
All these operators can be used in control statements as we have already seen, but
they are actually more powerful than that.
Rule 1.4.3.1 Comparison operators return the values false or true.
Remember that false and true are nothing else than fancy names for 0 and 1 respectively.
So they can perfectly well be used in arithmetic or for array indexing. In the following
code
    size_t c = (a < b) + (a == b) + (a > b);
    size_t d = (a <= b) + (a >= b) - 1;
we have that c will always be 1, and d will be 1 if a and b are equal and 0 otherwise. With
    double largeA[N] = { 0 };
    ...
    /* fill largeA somehow */

    size_t sign[2] = { 0, 0 };
    for (size_t i = 0; i < N; ++i) {
      sign[(largeA[i] < 1.0)] += 1;
    }
the array element sign[0] will hold the number of values in largeA that are greater than or
equal to 1.0 and sign[1] those that are strictly less.
Finally, let’s mention that there also is an identifier “not_eq” that may be used as a
replacement for !=. This feature is rarely used. It dates back to the times when some
characters were not properly present on all computer platforms. To be able to use it you’d
have to include the file iso646.h.
4.3.2. Logic. Logic operators operate on values that are already supposed to represent
values false or true. If they are not, the rules that we described for conditional execution
with Rules 1.3.1.1 and 1.3.1.2 apply first. The operator ! (not) logically negates
its operand, operator && (and) is logical and, operator || (or) is logical or. The results of
these operators are summarized in the following table:
TABLE 1. Logical operators

    a         not a
    false     true
    true      false

    a and b   false     true
    false     false     false
    true      false     true

    a or b    false     true
    false     false     true
    true      true      true
Similarly to the comparison operators we have
Rule 1.4.3.2 Logic operators return the values false or true.
Again, remember that these values are nothing else than 0 and 1 and can thus be used
as indices:
    double largeA[N] = { 0 };
    ...
    /* fill largeA somehow */

    size_t isset[2] = { 0, 0 };
    for (size_t i = 0; i < N; ++i) {
      isset[!!largeA[i]] += 1;
    }
Here the expression !!largeA[i] applies the ! operator twice and thus just ensures that
largeA[i] is evaluated as a truth value according to the general Rule 1.3.1.4. As a result,
the array elements isset[0] and isset[1] will hold the number of values that are equal
to 0.0 and unequal, respectively.
Operators && and || have a particular property that is called short circuit evaluation.
This barbaric term denotes the fact that the evaluation of the second operand is omitted, if
it is not necessary for the result of the operation. Suppose isgreat and issmall are two
functions that yield a scalar value. Then in this code
    if (isgreat(a) && issmall(b))
      ++x;
    if (issmall(c) || issmall(d))
      ++y;
the second function call on each line would conditionally be omitted during execution:
issmall(b) if isgreat(a) was 0, issmall(d) if issmall(c) was not 0. Equivalent
code would be
    if (isgreat(a))
      if (issmall(b))
        ++x;
    if (issmall(c)) ++y;
    else if (issmall(d)) ++y;
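Short circuit evaluation can be made visible by counting calls. The following is a sketch only: the counter calls and the functions is_small, calls_for_and and calls_for_or are hypothetical, and the counting is a deliberate side effect that serves no purpose other than to observe which operands are evaluated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical call counter, only used to observe evaluations. */
size_t calls = 0;

/* Hypothetical predicate that records each of its calls. */
bool is_small(size_t x) {
  ++calls;
  return x < 10;
}

/* Number of calls to is_small needed to evaluate the && expression. */
size_t calls_for_and(size_t a, size_t b) {
  calls = 0;
  if (is_small(a) && is_small(b)) { /* nothing to do */ }
  return calls;
}

/* Number of calls to is_small needed to evaluate the || expression. */
size_t calls_for_or(size_t a, size_t b) {
  calls = 0;
  if (is_small(a) || is_small(b)) { /* nothing to do */ }
  return calls;
}
```

Here calls_for_and(50, 1) is 1, because the first operand is already false, whereas calls_for_and(1, 50) is 2. For || it is the other way around: calls_for_or(1, 50) is 1 and calls_for_or(50, 1) is 2.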
4.4. The ternary or conditional operator. The ternary operator is very similar to
an if statement, only that it is an expression that returns the value of the chosen branch:
    size_t size_min(size_t a, size_t b) {
      return (a < b) ? a : b;
    }
Similar to the operators && and ||, the second and third operand are only evaluated if
they are really needed. The macro sqrt from tgmath.h computes the square root of a
non-negative value. Calling it with a negative value raises a domain error.
    #include <tgmath.h>

    #ifdef __STDC_NO_COMPLEX__
    # error "we need complex arithmetic"
    #endif

    double complex sqrt_real(double x) {
      return (x < 0) ? CMPLX(0, sqrt(-x)) : CMPLX(sqrt(x), 0);
    }
In this function sqrt is only called once, and the argument to that call is never negative. So
sqrt_real is always well behaved, no bad values are ever passed to sqrt.
Complex arithmetic and the tools used for it need the header complex.h, which is
indirectly included by tgmath.h. They will be introduced later in Section 5.5.6.
In the example above we also see conditional compilation that is achieved with preprocessor
directives: the #ifdef construct ensures that we hit the #error condition only if the macro
__STDC_NO_COMPLEX__ is defined, that is, only if the platform does not provide complex
arithmetic.
4.5. Evaluation order. Of the above operators we have seen that &&, || and ?:
condition the evaluation of some of their operands. This implies in particular that for
these operators there is an evaluation order on the operands: the first operand, since it is a
condition for the remaining ones is always evaluated first:
Rule 1.4.5.1 &&, ||, ?: and , evaluate their first operand first.
Here, , is the only operator that we haven’t introduced yet. It evaluates its operands in
order and the result is then the value of the right operand. E.g. (f(a), f(b)) would first
evaluate f(a), then f(b) and the result would be the value of f(b). This feature is rarely
useful in clean code, and is a trap for beginners. E.g. A[i, j] is not a two-dimensional
index for matrix A, but results just in A[j].
Rule 1.4.5.2 Don’t use the , operator.
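To see why the rule is a good one, here is a minimal sketch of what the , operator actually computes. All three function names are hypothetical. The cast (void)i is not part of the trap itself; it is only there to tell the compiler that discarding i is intended, so that the example compiles without a diagnostic.

```c
#include <stddef.h>

/* Hypothetical function without side effects. */
size_t twice(size_t x) {
  return 2*x;
}

/* Both calls are evaluated, left to right, but only the value of
   the right operand survives. */
size_t comma_value(size_t a, size_t b) {
  return (twice(a), twice(b));
}

/* NOT a two-dimensional index: i is evaluated and discarded,
   the result is simply A[j]. */
double comma_index(double const A[], size_t i, size_t j) {
  return A[(void)i, j];
}
```

So comma_value(3, 5) is 10, the value computed from the right operand alone, and for an array holding 1.5, 2.5 and 3.5 the call comma_index(A, 0, 2) gives 3.5.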
Other operators don’t have an evaluation restriction. E.g. in an expression such as
f(a)+g(b) there is no pre-established ordering specifying whether f(a) or g(b) is to be
computed first. If either of the functions f or g works with side effects, e.g. if f modifies b
behind the scenes, the outcome of the expression will depend on the chosen order.
Rule 1.4.5.3 Most operators don’t sequence their operands.
That chosen order can depend on your compiler, on the particular version of that compiler,
on compile time options or just on the code that surrounds the expression. Don’t rely
on any such particular sequencing, it will bite you.
The same holds for the arguments of functions. In something like
    printf("%g and %g\n", f(a), f(b));
we wouldn’t know which of the last two arguments is evaluated first.
Rule 1.4.5.4 Function calls don’t sequence their argument expressions.
The only reliable way not to depend on evaluation ordering of arithmetic expressions
is to ban side effects:
Rule 1.4.5.5 Functions that are called inside expressions should not have side effects.
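If a function with side effects can’t be avoided, one way out is to give each call its own statement, because statements are executed one after the other. The following is a sketch under that assumption; the names counter, next_id and ordered_difference are hypothetical.

```c
/* Hypothetical state that is modified behind the scenes. */
unsigned counter = 0;

/* Hypothetical function with a side effect: it counts its calls. */
unsigned next_id(void) {
  return ++counter;
}

/* The expression next_id() - next_id() could evaluate either call
   first, so its value is not predictable. Here each call has its
   own statement, and the order is fixed. */
unsigned ordered_difference(void) {
  unsigned const first = next_id();
  unsigned const second = next_id();
  return second - first;
}
```

Written like this, ordered_difference always returns 1, whatever the compiler or its options.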
5. Basic values and data
We will now change the angle of view from the way “how things are to be done”
(statements and expressions) to the things on which C programs operate, values and
data.
A concrete program at an instant in time has to represent values. Humans have
a similar strategy: nowadays we use a decimal presentation to write numbers down on
paper, a system that we inherited from the Arabic culture. But we have other systems to
write numbers: Roman notation, e.g., or textual notation. To know that the word “twelve”
denotes the value 12 is a non-trivial step, and reminds us that European languages are
denoting numbers not entirely in decimal but also in other systems. English is mixing with
base 12, French with bases 16 and 20. For non-natives in French such as myself, it may be
difficult to spontaneously associate “quatre vingt quinze” (four times twenty and fifteen)
with the number 95.
Similarly, representations of values in a computer can vary “culturally” from architecture
to architecture or are determined by the type that the programmer gave to the value.
What representation a particular value has should in most cases not be your concern; the
compiler is there to organize the translation between values and representations back and
forth.
Not all representations of values are even observable from within your program. They
only are so, if they are stored in addressable memory or written to an output device. This
is another assumption that C makes: it supposes that all data is stored in some sort of
storage called memory that allows to retrieve values from different parts of the program at
different moments in time. For the moment only keep in mind that there is something like
an observable state, and that a C compiler is only obliged to produce an executable that
reproduces that observable state.
5.0.1. Values. A value in C is an abstract entity that usually exists beyond your program,
the particular implementation of that program and the representation of the value
during a particular run of the program. As an example, the value and concept of 0 should
and will always have the same effects on all C platforms: adding that value to another value
x will again be x, evaluating a value 0 in a control expression will always trigger the false
branch of the control statement. C has the very simple rule
Rule 1.5.0.6 All values are numbers or translate to such.
This really concerns all values a C program is about, whether these are the characters
or texts that we print, truth values, measures that we take, relations that we investigate.
First of all, think of these numbers as of mathematical entities that are independent of your
program and its concrete realization.
The data of a program execution are all the assembled values of all objects at a given
moment. The state of the program execution is determined by:
• the executable
• the current point of execution
• the data
• outside intervention such as IO from the user.
If we abstract from the last point, an executable that runs with the same data from
the same point of execution must give the same result. But since C programs should be
portable between systems, we want more than that. We don’t want the result of a
computation to depend on the executable (which is platform specific) but ideally that it
only depends on the program specification itself.
5.0.2. Types. An important step in that direction is the concept of types. A type
is an additional property that C associates with values. Up to now we already have seen
several such types, most prominently size_t, but also double or bool.
Rule 1.5.0.7 All values have a type that is statically determined.
Rule 1.5.0.8 Possible operations on a value are determined by its type.
Rule 1.5.0.9 A value’s type determines the results of all operations.
5.0.3. Binary representation and the abstract state machine. Unfortunately, the variety
of computer platforms is not such that the C standard can impose the results of the
operations on a given type completely. Things that are not completely specified as such
by the standard are e.g. how the sign of a signed type is represented, the so-called sign
representation, or to which precision a double floating point operation is performed, the
so-called floating point representation. C only imposes as many properties on all
representations as are needed so that the results of operations can be deduced a priori from
two different sources:
• the values of the operands
• some characteristic values that describe the particular platform.
E.g. the operations on the type size_t can be entirely determined when inspecting the value
of SIZE_MAX in addition to the operands. We call the model to represent values of a given
type on a given platform the binary representation of the type.
Rule 1.5.0.10 A type’s binary representation determines the results of all operations.
Generally, all information that we need to determine that model is within reach of any C
program: the C library headers provide the necessary information through named values
(such as SIZE_MAX), operators and function calls.
Rule 1.5.0.11 A type’s binary representation is observable.
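As a small illustration of that observability, the number of bits that size_t uses for its values can be derived from SIZE_MAX alone. This is a sketch; the function name size_t_width is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bits that are needed to write
   SIZE_MAX in binary, obtained by halving until nothing is left. */
unsigned size_t_width(void) {
  unsigned width = 0;
  for (size_t max = SIZE_MAX; max; max /= 2) {
    ++width;
  }
  return width;
}
```

For the three values of SIZE_MAX that were listed for Rule 1.4.0.3, this function would return 16, 32 and 64, respectively.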
This binary representation is still a model and so an abstract representation in the sense
that it doesn’t completely determine how values are stored in the memory of a computer
or on a disk or other persistent storage device. That representation would be the object
representation. In contrast to the binary representation, the object representation usually is
not of much concern to us, as long as we don’t want to hack together values of objects in
main memory or have to communicate between computers that have a different platform
model. Much later, in Section 12.1, we will see that we may even observe the object
representation if such an object is stored in memory and we know its address.
As a consequence all computation is fixed through the values, types and their binary
representations that are specified in the program. The program text describes an abstract
state machine that regulates how the program switches from one state to the next. These
transitions are determined by value, type and binary representation, only.
Rule 1.5.0.12 (as-if) Programs execute as if following the abstract state machine.
5.0.4. Optimization. How a concrete executable achieves this goal is left to the discretion
of the compiler creators. Most modern C compilers produce code that doesn’t follow
the exact code prescription, they cheat wherever they can and only respect the observable
states of the abstract state machine. For example a sequence of additions with constant
values such as
    x += 5;
    /* do something else without x in the mean time */
    x += 7;
may in many cases be done as if it were specified as either
    /* do something without x */
    x += 12;
or
    x += 12;
    /* do something without x */
The compiler may perform such changes to the execution order as long as there will be no
observable difference in the result, e.g. as long as we don’t print the intermediate value of “x”
and as long as we don’t use that intermediate value in another computation.
But such an optimization can also be forbidden because the compiler can’t prove that
a certain operation will not force a program termination. In our example, much depends on
the type of “x”. If the current value of x could be close to the upper limit of the type, the
innocent looking operation x += 7 may produce an overflow. Such overflows are handled
differently according to the type. As we have seen above, overflow of an unsigned type
makes no problem and the result of the condensed operation will always be consistent with
the two separated ones. For other types such as signed integer types (signed) or floating
point types (double) an overflow may “raise an exception” and terminate the program. So
in these cases the optimization cannot be performed.
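For an unsigned type the consistency of the condensed operation can be checked directly. The following is a sketch; the function names two_steps, one_step and condensation_is_safe are hypothetical.

```c
#include <stdbool.h>

/* The two separate additions from the text. */
unsigned two_steps(unsigned x) {
  x += 5;
  x += 7;
  return x;
}

/* The condensed form that a compiler might use instead. */
unsigned one_step(unsigned x) {
  x += 12;
  return x;
}

/* Hypothetical helper: true if both forms agree for the value x. */
bool condensation_is_safe(unsigned x) {
  return two_steps(x) == one_step(x);
}
```

Because unsigned arithmetic wraps around (Rule 1.4.1.6), both forms agree even close to the upper limit of the type: for the value -7U, which is 6 below the maximum, both return 5.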
This allowed slackness between program description and abstract state machine is a
very valuable feature, commonly referred to as optimization. Combined with the relative
simplicity of its language description, this is actually one of the main features that allows
C to outperform other programming languages that have a lot more knobs and whistles.
An important consequence of the discussion above can be summarized as follows.
Rule 1.5.0.13 Type determines optimization opportunities.
5.1. Basic types. C has a series of basic types and some means of constructing derived
types from them that we will describe later in Section 6.
Mainly for historical reasons, the system of basic types is a bit complicated and the
syntax to specify such types is not completely straightforward. There is a first level of
specification that is entirely done with keywords of the language, such as signed, int or
double. This first level is mainly organized according to C internals. On top of that there
is a second level of specification that comes through header files and for which we already
have seen examples, too, namely size_t or bool. This second level is organized by type
semantic, that is by specifying what properties a particular type brings to the programmer.
We will start with the first level specification of such types. As we already discussed
above in Rule 1.5.0.6, all basic values in C are numbers, but there are numbers of different
kinds. As a principal distinction we have two different classes of numbers, with
two subclasses each, namely unsigned integers, signed integers, real floating point
numbers and complex floating point numbers.
All these classes contain several types. They differ according to their precision,
which determines the valid range of values that are allowed for a particular type.[9] Table 2
contains an overview of the 18 base types. As you can see from that table there are some
types which we can’t directly use for arithmetic, so-called narrow types. As a rule of
thumb we get
Rule 1.5.1.1 Each of the 4 classes of base types has 3 distinct unpromoted types.
[9] The term precision is used here in a restricted sense as the C standard defines it. It is
different from the accuracy of a floating point computation.
TABLE 2. Base types according to the four main type classes. Types
marked with * don’t allow for arithmetic, they are promoted
before doing arithmetic. Type char is special since it can be unsigned or
signed, depending on the platform. All types in the table are considered
to be distinct types, even if they have the same class and precision.

    class                      systematic name         other name
    integers, unsigned         _Bool *                 bool
                               unsigned char *
                               unsigned short *
                               unsigned int            unsigned
                               unsigned long
                               unsigned long long
    integers, [un]signed       char *
    integers, signed           signed char *
                               signed short *          short
                               signed int              signed or int
                               signed long             long
                               signed long long        long long
    floating point, real       float
                               double
                               long double
    floating point, complex    float _Complex          float complex
                               double _Complex         double complex
                               long double _Complex    long double complex
Contrary to what many people believe, the C standard doesn’t even prescribe the precision
of these 12 types, it only constrains them. They depend on a lot of factors that are
implementation dependent. Thus, to choose the “best” type for a given purpose in a
portable way could be a tedious task, if we didn’t get help from the compiler
implementation.
Remember that unsigned types are the most convenient types, since they are the only
types that have an arithmetic that is defined consistently with mathematical properties,
namely modulo operation. They can’t raise signals on overflow and can be optimized best.
They are described in more detail in Section 5.5.1.
Rule 1.5.1.2 Use size_t for sizes, cardinalities or ordinal numbers.
Rule 1.5.1.3 Use unsigned for small quantities that can’t be negative.
If your program really needs values that may both be positive and negative but don’t
have fractions, use a signed type, see Section 5.5.5.
Rule 1.5.1.4 Use signed for small quantities that bear a sign.
Rule 1.5.1.5 Use ptrdiff_t for large differences that bear a sign.
If you want to do fractional computation with values such as 0.5 or 3.77189E+89
use floating point types, see Section 5.5.6.
Rule 1.5.1.6 Use double for floating point calculations.
Rule 1.5.1.7 Use double complex for complex calculations.
TABLE 3. Some semantic arithmetic types for specialized use cases

    type        header     context of definition        meaning
    uintmax_t   stdint.h                                maximum width unsigned integer, preprocessor
    intmax_t    stdint.h                                maximum width signed integer, preprocessor
    errno_t     errno.h    Appendix K                   error return instead of int
    rsize_t     stddef.h   Appendix K                   size arguments with bounds checking
    time_t      time.h     time(0), difftime(t1, t0)    calendar time in seconds since epoch
    clock_t     time.h     clock()                      processor time
The C standard defines a lot of other types, among them other arithmetic types that
model special use cases. Table 3 lists some of them. The first two represent the types with
maximal width that the platform supports.
The second pair are types that can replace int and size_t in certain contexts. The first,
errno_t, is just another name for int to emphasize the fact that it encodes an error value;
rsize_t, in turn, is used to indicate that an interface performs bounds checking on its “size”
parameters.
The two types time_t and clock_t are used to handle times. They are semantic types,
because the precision of the time computation can be different from platform to platform.
The way to have a time in seconds that can be used in arithmetic is the function difftime:
it computes the difference of two timestamps. clock_t values represent the platform’s model
of processor clock cycles, so the unit of time here is usually much below the second;
CLOCKS_PER_SEC can be used to convert such values to seconds.
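The two conversions to seconds can be sketched as follows. The function names seconds_between and cpu_seconds are hypothetical; difftime and CLOCKS_PER_SEC come from time.h.

```c
#include <time.h>

/* Hypothetical helper: seconds that have passed from t0 to t1,
   as a double that can be used in arithmetic. */
double seconds_between(time_t t0, time_t t1) {
  return difftime(t1, t0);
}

/* Hypothetical helper: converts a processor time, as returned by
   clock() or as a difference of two such values, to seconds. */
double cpu_seconds(clock_t ticks) {
  return (double)ticks / CLOCKS_PER_SEC;
}
```

A typical use would store time(0) or clock() in an object before and after a computation and then pass the two values, or their difference, to these functions.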
5.2. Specifying values. We have already seen several ways in which numerical constants,
so-called literals, can be specified:
    123         decimal integer constant. The most natural choice for most of us.
    077         octal integer constant. This is specified by a sequence of digits, the first
                being 0 and the following between 0 and 7, e.g. 077 has the value 63. This
                type of specification has merely historical value and is rarely used nowadays.
                There is only one octal literal that is commonly used, namely 0 itself.
    0xFFFF      hexadecimal integer constant. This is specified by a start of 0x followed by
                a sequence of digits between 0, ..., 9, a, ..., f, e.g. 0xbeaf is value 48815.
                The a ... f and x can also be written in capitals, 0XBEAF.
    1.7E-13     decimal floating point constants. Quite familiar for the version that just has
                a decimal point. But there is also the “scientific” notation with an exponent.
                In the general form mEe is interpreted as m · 10^e.
    0x1.7aP-13  hexadecimal floating point constants. Usually used to describe floating point
                values in a form that will ease to specify values that have exact
                representations. The general form 0XhPe is interpreted as h · 2^e. Here h is
                specified as a hexadecimal fraction. The exponent e is still specified as a
                decimal number.
    'a'         integer character constant. These are characters put between ' apostrophes,
                such as 'a' or '?'. These have values that are only implicitly fixed by the
                C standard. E.g. 'a' corresponds to the integer code for the character “a” of
                the Latin alphabet.
                Inside character constants a “\” character has a special meaning. E.g. we
                already have seen '\n' for the newline character.
    "hello"     string literals. They specify text, e.g. as we needed it for the printf and
                puts functions. Again, the “\” character is special as in character constants.
All but the last are numerical constants, they specify numbers. An important rule
applies:
Rule 1.5.2.1 Numerical literals are never negative.
That is if we write something like -34 or -1.5E-23, the leading sign is not considered
part of the number but is the negation operator applied to the number that comes after. We
will see below where this is important. Bizarre as this may sound, the minus sign in the
exponent is considered to be part of a floating point literal.
In view of Rule 1.5.0.7 we know that all literals must not only have a value but also a
type. Don’t mix up the fact of a constant having a positive value with its type, which can
be signed.
Rule 1.5.2.2 Decimal integer constants are signed.
This is an important feature, we’d probably expect the expression -1 to be a signed,
negative value.
To determine the exact type for integer literals we always have a “first fit” rule. For
decimal integers this reads:
Rule 1.5.2.3 A decimal integer constant has the first of the 3 signed types that fits it.
This rule can have surprising effects. Suppose that on a platform the minimal signed
value is −2^15 = −32768 and the maximum value is 2^15 − 1 = 32767. The constant
32768 then doesn’t fit into signed and is thus signed long. As a consequence the expression
-32768 has type signed long. Thus the minimal value of the type signed on such a platform
cannot be written as a literal constant.[Exs 10]
Rule 1.5.2.4 The same value can have different types.
Deducing the type of an octal or hexadecimal constant is a bit more complicated.
These can also be of an unsigned type if the value doesn’t fit for a signed one. In our
example above the hexadecimal constant 0x7FFF has the value 32767 and thus type signed.
Unlike for the decimal constant, the constant 0x8000 (value 32768 written in hexadecimal)
then is an unsigned and the expression -0x8000 again is unsigned.[Exs 11]
Rule 1.5.2.5 Don’t use octal or hexadecimal constants to express negative values.
Or if we formulate it positively:
Rule 1.5.2.6 Use decimal constants to express negative values.
Integer constants can be forced to be unsigned or to be of a type of minimal width. This
is done by appending “U”, “L” or “LL” to the literal. E.g. 1U has value 1 and type unsigned,
1L is signed long and 1ULL has the same value but type unsigned long long.[Exs 12]
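The types of such constants can be inspected with the C11 _Generic construct. The following is a sketch only: the macro TYPE_NAME and the function suffixes_behave_as_described are hypothetical names, and the strings use the systematic type names of this section. Only checks that hold on every platform are included.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical macro: a string that names the type of expression X. */
#define TYPE_NAME(X) _Generic((X),               \
    int: "signed",                               \
    unsigned: "unsigned",                        \
    long: "signed long",                         \
    unsigned long: "unsigned long",              \
    long long: "signed long long",               \
    unsigned long long: "unsigned long long",    \
    default: "other")

/* Hypothetical helper: true if the suffixes select the types that
   are described in the text. */
bool suffixes_behave_as_described(void) {
  return !strcmp(TYPE_NAME(1), "signed")
      && !strcmp(TYPE_NAME(1U), "unsigned")
      && !strcmp(TYPE_NAME(1L), "signed long")
      && !strcmp(TYPE_NAME(1ULL), "unsigned long long")
      && !strcmp(TYPE_NAME(-1U), "unsigned")  /* negation keeps the type */
      && (-1U == UINT_MAX);                   /* and wraps around        */
}
```

The macro can also be used to explore constants whose type depends on the platform, such as 32768 or 0x8000 from the discussion above.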
A common error is to try to assign a hexadecimal constant to a signed under the expectation
that it will represent a negative value. Consider something like int x = 0xFFFFFFFF.
This is done under the assumption that the hexadecimal value has the same binary representation
as the signed value −1. On most architectures with 32 bit signed this will be true (but not
on all of them) but then nothing guarantees that the effective value +4294967295 is
converted to the value −1.

TABLE 4. Examples for constants and their types, under the supposition
that signed and unsigned have the commonly used representation with 32
bit.

    constant x    value          type           value of −x
    2147483647    +2147483647    signed         −2147483647
    2147483648    +2147483648    signed long    −2147483648
    4294967295    +4294967295    signed long    −4294967295
    0x7FFFFFFF    +2147483647    signed         −2147483647
    0x80000000    +2147483648    unsigned       +2147483648
    0xFFFFFFFF    +4294967295    unsigned       +1
    1             +1             signed         −1
    1U            +1             unsigned       +4294967295

[Exs 10] Show that if the minimal and maximal values for signed long long have similar properties, the smallest
integer value for the platform can’t be written as a combination of one literal with a minus sign.
[Exs 11] Show that if in that case the maximum unsigned is 2^16 − 1 that then -0x8000 has value 32768, too.
[Exs 12] Show that the expressions -1U, -1UL and -1ULL have the maximum values and type of the three usable
unsigned types, respectively.
You remember that value 0 is important. It is so important that it has a lot of equivalent
spellings: 0, 0x0 and '\0' are all the same value, a 0 of type signed int. 0 has no decimal
integer spelling: 0.0 is a decimal spelling for the value 0 but seen as a floating point value,
namely with type double.
Rule 1.5.2.7 Different literals can have the same value.
For integers this rule looks almost trivial; for floating point constants it is less
obvious. Floating point values are only an approximation of the value they present literally,
because binary digits of the fractional part may be truncated or rounded.
Rule 1.5.2.8 The effective value of a decimal floating point constant may be different
from its literal value.
E.g. on my machine the constant 0.2 has in fact the value 0.2000000000000000111,
and as a consequence constants 0.2 and 0.2000000000000000111 have the same value.
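This observation can be checked with a short sketch. It assumes that double is the IEEE 754 64 bit format and that the compiler rounds decimal constants to the nearest representable value; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Both literals are rounded to the same double, so the comparison holds
   under the assumptions stated above. Storing the constants in objects of
   type double ensures that the comparison is done on double values. */
bool literals_share_value(void) {
    double a = 0.2;
    double b = 0.2000000000000000111;
    return a == b;
}

/* Print the effective value of 0.2 with 19 decimals. The digits that
   appear depend on the platform and on the C library. */
void print_effective_value(void) {
    printf("%.19f\n", 0.2);
}
```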
Hexadecimal floating point constants have been designed because they better
correspond to binary representations of floating point values. In fact, on most modern
architectures such a constant (that has not too many digits) will exactly correspond to the literal
value. Unfortunately, these beasts are almost unreadable for mere humans.
Finally, floating point constants can be followed by the letters f or F to denote a float
or by l or L to denote a long double. Otherwise they are of type double. Beware that
different types of constants generally lead to different values for the same literal. A typical
example:
          float            double                  long double
literal   0.2F             0.2                     0.2L
value     0x1.99999AP-3F   0x1.999999999999AP-3    0xC.CCCCCCCCCCCCCCDP-6L
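The float and double columns of this example can be verified as follows. This is a sketch that assumes float and double are the IEEE 754 32 bit and 64 bit formats; the long double column is left out because that type varies more between platforms. The function names are hypothetical.

```c
#include <stdbool.h>

/* The float nearest to 0.2 is not the double nearest to 0.2, so the same
   literal text yields two different values. */
bool float_and_double_differ(void) {
    float  f = 0.2F;
    double d = 0.2;
    return (double)f != d;
}

/* The hexadecimal spellings from the example denote exactly the values
   that the decimal literals are rounded to. */
bool hex_spellings_match(void) {
    float  f_hex = 0x1.99999AP-3F;
    float  f_dec = 0.2F;
    double d_hex = 0x1.999999999999AP-3;
    double d_dec = 0.2;
    return f_hex == f_dec && d_hex == d_dec;
}
```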
Rule 1.5.2.9 Literals have value, type and binary representation.
5.3. Initializers. We have already seen (Section 2.3) that the initializer is an important
part of an object definition. Accessing uninitialized objects has undefined behavior;
the easiest way out is to avoid that situation systematically:
Rule 1.5.3.1 All variables should be initialized.
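A minimal sketch of what this rule looks like in practice; the function and variable names are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Every object receives a value at its definition, so no read can ever
   see an indeterminate value. */
double sum_three(void) {
    double values[3] = { 1.0, 2.0, 3.0 };
    double total = 0.0;          /* the accumulator starts from a known value */
    for (size_t i = 0; i < 3; ++i) {
        total += values[i];
    }
    return total;
}

/* The initializer { 0 } sets the first element explicitly; all remaining
   elements are then initialized to zero as well. */
int count_zero_elements(void) {
    int counters[4] = { 0 };
    int zeros = 0;
    for (size_t i = 0; i < 4; ++i) {
        if (counters[i] == 0) ++zeros;
    }
    return zeros;
}
```

Here sum_three() returns 6.0 and count_zero_elements() returns 4. Without the initializers, both results would depend on reads of uninitialized objects.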
Modern c

  • 1. Modern C Jens Gustedt INRIA, FRANCE ICUBE, STRASBOURG, FRANCE E-mail address: jens gustedt inria fr URL: http://icube-icps.unistra.fr/index.php/Jens_Gustedt This is a preliminary version of this book compiled on October 27, 2015. It contains feature complete versions of Levels 0, 1 and 2, and most of the material that I foresee for Level 4. The table of contents already gives you a glimpse on what should follow for the rest. You might find a more up to date version at http://icube-icps.unistra.fr/index.php/File:ModernC.pdf (inline) http://icube-icps.unistra.fr/img_auth.php/d/db/ModernC.pdf (download) You may well share this by pointing others to my home page or one of the links above. Since I don’t know yet how all of this will be published at the end, please don’t distribute the file itself. If you represent a publishing house that would like to distribute this work under an open license, preferably CC-BY, please drop me a note. All rights reserved, Jens Gustedt, 2015 Special thanks go to the people that encouraged the writing of this book by providing me with constructive feedback, in particular Cédric Bastoul, Lucas Nussbaum, Vincent Loechner, Kliment Yanev, Szabolcs Nagy and Marcin Kowalczuk.
  • 2.
  • 3. 3 PRELIMINARIES. The C programming language has been around for a long time — the canonical reference for it is the book written by its creators, Kernighan and Ritchie [1978]. Since then, C has been used in an incredible number of applications. Programs and systems written in C are all around us: in personal computers, phones, cameras, set-top boxes, refrigerators, cars, mainframes, satellites, basically in any modern device that has a programmable interface. In contrast to the ubiquitous presence of C programs and systems, good knowledge of and about C is much more scarce. Even experienced C programmers often appear to be stuck in some degree of self-inflicted ignorance about the modern evolution of the C language. A likely reason for this is that C is seen as an "easy to learn" language, allowing a programmer with little experience to quickly write or copy snippets of code that at least appear to do what it’s supposed to. In a way, C fails to motivate its users to climb to higher levels of knowledge. This book is intended to change that general attitude. It is organized in chapters called “Levels” that sum- marize levels of familiarity with the C language and programming in general. Some features of the language are presented in parts on earlier levels, and elaborated in later ones. Most notably, pointers are introduced at Level 1 but only explained in detail at Level 2. This leads to many forward references for impatient readers to follow. As the title of this book suggests, today’s C is not the same language as the one originally designed by its creators Kernighan and Ritchie (usually referred to as K&R C). In particular, it has undergone an important standardization and extension process now driven by ISO, the International Standards Organization. This led to three major publications of C standards in the years 1989, 1999 and 2011, commonly referred to as C89, C99 and C11. 
The C standards committee puts a lot of effort into guaranteeing backwards compatibility such that code written for earlier versions of the language, say C89, should compile to a semantically equivalent executable with a compiler that implements a newer version. Unfortunately, this backwards compatibility has had the unwanted side effect of not motivating projects that could benefit greatly from the new features to update their code base. In this book we will mainly refer to C11, as defined in JTC1/SC22/WG14 [2011], but at the time of this writing many compilers don’t implement this standard completely. If you want to compile the examples of this book, you will need at least a compiler that implements most of C99. For the changes that C11 adds to C99, using an emulation layer such as my macro package P99 might suffice. The package is available at http: //p99.gforge.inria.fr/. Programming has become a very important cultural and economic activity and C remains an important element in the programming world. As in all human activities, progress in C is driven by many factors, corporate or individual interest, politics, beauty, logic, luck, ignorance, selfishness, ego, sectarianism, ... (add your primary motive here). Thus the development of C has not been and cannot be ideal. It has flaws and artifacts that can only be understood with their historic and societal context. An important part of the context in which C developed was the early appearance of its sister language C++. One common misconception is that C++ evolved from C by adding its particular features. Whereas this is historically correct (C++ evolved from a very early C) it is not particularly relevant today. In fact, C and C++ separated from a common ancestor more than 30 years ago, and have evolved separately ever since. But this evolution of the two languages has not taken place in isolation, they have exchanged and adopted each other’s concepts over the years. 
Some new features, such as the recent addition of atomics and threads have been designed in a close collaboration between the C and C++ standard committees. Nevertheless, many differences remain and generally all that is said in this book is about C and not C++. Many code examples that are given will not even compile with a C++ compiler. Rule A C and C++ are different, don’t mix them and don’t mix them up. ORGANIZATION. This book is organized in levels. The starting level, encounter, will introduce you to the very basics of programming with C. By the end of it, even if you don’t have much experience in programming, you should be able to understand the structure of simple programs and start writing your own. The acquaintance level details most principal concepts and features such as control structures, data types, operators and functions. It should give you a deeper understanding of the things that are going on when you run your programs. This knowledge should be sufficient for an introductory course in algorithms and other work at that level, with the notable caveat that pointers aren’t fully introduced yet at this level. The cognition level goes to the heart of the C language. It fully explains pointers, familiarizes you with C’s memory model, and allows you to understand most of C’s library interface. Completing this level should enable you to write C code professionally, it therefore begins with an essential discussion about the writing and organization of C programs. I personally would expect anybody who graduated from an engineering school with a major related to computer science or programming in C to master this level. Don’t be satisfied with less. The experience level then goes into detail in specific topics, such as performance, reentrancy, atomicity, threads and type generic programming. These are probably best discovered as you go, that is when you encounter them in the real world. 
Nevertheless, as a whole they are necessary to round off the picture and to provide you with full expertise in C. Anybody with some years of professional programming in C or who heads a software project that uses C as its main programming language should master this level.

Last but not least comes ambition. It discusses my personal ideas for a future development of C. C as it is today has some rough edges and particularities that only have historical justification. I propose possible paths to improve on the lack of general constants, to simplify the memory model, and more generally to improve the modularity of the language. This level is clearly much more specialized than the others. Most C programmers can probably live without it, but the curious ones among you could perhaps take up some of the ideas.
Contents

Level 0. Encounter
  1. Getting started
    1.1. Imperative programming
    1.2. Compiling and running
  2. The principal structure of a program
    2.1. Grammar
    2.2. Declarations
    2.3. Definitions
    2.4. Statements

Level 1. Acquaintance
  Warning to experienced C programmers
  3. Everything is about control
    3.1. Conditional execution
    3.2. Iterations
    3.3. Multiple selection
  4. Expressing computations
    4.1. Arithmetic
    4.2. Operators that modify objects
    4.3. Boolean context
    4.4. The ternary or conditional operator
    4.5. Evaluation order
  5. Basic values and data
    5.1. Basic types
    5.2. Specifying values
    5.3. Initializers
    5.4. Named constants
    5.5. Binary representations
  6. Aggregate data types
    6.1. Arrays
    6.2. Pointers as opaque types
    6.3. Structures
    6.4. New names for types: typedef
  7. Functions
    7.1. Simple functions
    7.2. main is special
    7.3. Recursion
  8. C Library functions
    8.1. Mathematics
    8.2. Input, output and file manipulation
    8.3. String processing and conversion
    8.4. Time
    8.5. Runtime environment settings
    8.6. Program termination and assertions

Level 2. Cognition
  9. Style
    9.1. Formatting
    9.2. Naming
  10. Organization and documentation
    10.1. Interface documentation
    10.2. Implementation
    10.3. Macros
    10.4. Pure functions
  11. Pointers
    11.1. Address-of and object-of operators
    11.2. Pointer arithmetic
    11.3. Pointers and structs
    11.4. Opaque structures
    11.5. Array and pointer access are the same
    11.6. Array and pointer parameters are the same
    11.7. Null pointers
  12. The C memory model
    12.1. A uniform memory model
    12.2. Unions
    12.3. Memory and state
    12.4. Pointers to unspecific objects
    12.5. Implicit and explicit conversions
    12.6. Alignment
  13. Allocation, initialization and destruction
    13.1. malloc and friends
    13.2. Storage duration, lifetime and visibility
    13.3. Initialization
    13.4. Digression: a machine model
  14. More involved use of the C library
    14.1. Text processing
    14.2. Formatted input
    14.3. Extended character sets
    14.4. Binary files
  15. Error checking and cleanup
    15.1. The use of goto for cleanup

Level 3. Experience
    15.2. Project organization
  16. Performance
    16.1. Inline functions
    16.2. Avoid aliasing: restrict qualifiers
    16.3. Functionlike macros
    16.4. Optimization
    16.5. Measurement and inspection
  17. Variable argument lists
    17.1. va_arg functions
    17.2. __VA_ARGS__ macros
    17.3. Default arguments
  18. Reentrancy and sharing
    18.1. Short jumps
    18.2. Long jumps
    18.3. Signal handlers
    18.4. Atomic data and operations
  19. Threads
  20. Type generic programming
  21. Runtime constraints

Level 4. Ambition
  22. The rvalue overhaul
    22.1. Introduce register storage class in file scope
    22.2. Typed constants with register storage class and const qualification
    22.3. Extend ICE to register constants
    22.4. Unify designators
    22.5. Functions
  23. Improve type generic expression programming
    23.1. Storage class for compound literals
    23.2. Inferred types for variables and functions
    23.3. Anonymous functions
  24. Improve the C library
    24.1. Add requirements for sequence points
    24.2. Provide type generic interfaces for string search functions
  25. Modules
    25.1. C needs a specific approach
    25.2. All is about naming
    25.3. Modular C features
  26. Simplify the object and value models
    26.1. Remove objects of temporary lifetime
    26.2. Introduce comparison operator for object types
    26.3. Make memcpy and memcmp consistent
    26.4. Enforce representation consistency for _Atomic objects
    26.5. Make string literals char const[]
    26.6. Default initialize padding to 0
    26.7. Make restrict qualification part of the function interface
    26.8. References
  27. Contexts
    27.1. Introduce evaluation contexts in the standard
    27.2. Convert object pointers to void* in unspecific context
    27.3. Introduce nullptr as a generic null pointer constant and deprecate NULL

Appendix A.
Reminders
Listings
Bibliography
Index
LEVEL 0
Encounter

This first level of the book may be your first encounter with the programming language C. It provides you with a rough knowledge about C programs, about their purpose, their structure and how to use them. It is not meant to give you a complete overview; it can’t and it doesn’t even try. On the contrary, it is supposed to give you a general idea of what this is all about and open up questions, promote ideas and concepts. These then will be explained in detail on the higher levels.

1. Getting started

In this section I will try to introduce you to one simple program that has been chosen because it contains many of the constructs of the C language. If you already have experience in programming you may find parts of it feel like needless repetition. If you lack such experience, you might feel overwhelmed by the stream of new terms and concepts.

In either case, be patient. For those of you with programming experience, it’s very possible that there are subtle details you’re not aware of, or assumptions you have made about the language that are not valid, even if you have programmed C before. For the ones approaching programming for the first time, be assured that after approximately ten pages from now your understanding will have increased a lot, and you should have a much clearer idea of what programming might represent.

An important bit of wisdom for programming in general, and for this book in particular, is summarized in the following citation from the Hitchhiker’s guide to the Galaxy:

Rule B Don’t panic.

It’s not worth it. There are many cross references, links and pieces of side information in the text. There is an Index on page 207. Follow those if you have a question. Or just take a break.

1.1. Imperative programming. To get started and see what we are talking about, consider our first program in Listing 1. You probably see that this is a sort of language, containing some weird words like “main”, “include”, “for”, etc. laid out and colored in a peculiar way and mixed with a lot of weird characters, numbers, and text “Doing some work” that looks like an ordinary English phrase. It is designed to provide a link between us, the human programmers, and a machine, the computer, to tell it what to do, that is to give it “orders”.

Rule 0.1.1.1 C is an imperative programming language.

In this book, we will not only encounter the C programming language, but also some vocabulary from an English dialect, C jargon, the language that helps us to talk about C. It will not be possible to immediately explain each term the first time it occurs. But I will explain each one, in time, and all of them are indexed such that you can easily cheat and jump to more explanatory text, at your own risk.

As you can probably guess from this first example, such a C program has different components that form some intermixed layers. Let’s try to understand it from the inside out.
LISTING 1. A first example of a C program

   1  /* This may look like nonsense, but really is -*- mode: C -*- */
   2  #include <stdlib.h>
   3  #include <stdio.h>
   4
   5  /* The main thing that this program does. */
   6  int main(void) {
   7    // Declarations
   8    double A[5] = {
   9      [0] = 9.0,
  10      [1] = 2.9,
  11      [4] = 3.E+25,
  12      [3] = .00007,
  13    };
  14
  15    // Doing some work
  16    for (size_t i = 0; i < 5; ++i) {
  17      printf("element %zu is %g, \tits square is %g\n",
  18             i,
  19             A[i],
  20             A[i]*A[i]);
  21    }
  22
  23    return EXIT_SUCCESS;
  24  }

1.1.1. Giving orders. The visible result of running this program is to output 5 lines of text on the command terminal of your computer. On my computer using this program looks something like

Terminal
  > ./getting-started
  element 0 is 9,         its square is 81
  element 1 is 2.9,       its square is 8.41
  element 2 is 0,         its square is 0
  element 3 is 7e-05,     its square is 4.9e-09
  element 4 is 3e+25,     its square is 9e+50

We can easily identify parts of the text that this program outputs (prints in the C jargon) inside our program, namely the quoted text on Line 17 (printed in blue in the book). The real action (statement in C) happens between that line and Line 20. The statement is a call to a function named printf.

getting-started.c
  17      printf("element %zu is %g, \tits square is %g\n",
  18             i,
  19             A[i],
  20             A[i]*A[i]);

Here, the printf function receives four arguments, enclosed in a pair of parentheses, “( ... )”:
• The funny-looking text (the blue part) is a so-called string literal that serves as a format for the output. Within the text are three markers (format specifiers) that mark the positions in the output where numbers are to be inserted. These markers start with a "%" character. This format also contains some special escape characters that start with a backslash, namely "\t" and "\n".
• After a comma character we find the word “i”. The thing that “i” stands for will be printed in place of the first format specifier, "%zu".
• Another comma separates the next argument “A[i]”. The thing that this stands for will be printed in place of the second format specifier, the first "%g".
• Last, again separated by a comma, appears “A[i]*A[i]”, corresponding to the last "%g".

We will later explain what all of these arguments mean. Let’s just remember that we identified the main purpose of that program, namely to print some lines on the terminal, and that it “orders” the function printf to fulfill that purpose. The rest is some sugar to specify which numbers will be printed and how many of them.

1.2. Compiling and running. As it is shown above, the program text that we have listed cannot be understood by your computer. There is a special program, called a compiler, that translates the C text into something that your machine can understand, the so-called binary code or executable. What that translated program looks like and how this translation is done is much too complicated to explain at this stage.[1] However, for the moment we don’t need to understand more deeply, as we have that tool that does all the work for us.

Rule 0.1.2.1 C is a compiled programming language.

The name of the compiler and its command line arguments depend a lot on the platform on which you will be running your program.
There is a simple reason for this: the target binary code is platform dependent, that is its form and details depend on the computer on which you want to run it; a PC has different needs than a phone, your fridge doesn’t speak the same language as your set-top box. In fact, that’s one of the reasons for C to exist.

Rule 0.1.2.2 A C program is portable between different platforms.

It is the job of the compiler to ensure that our little program above, once translated for the appropriate platform, will run correctly on your PC, your phone, your set-top box and maybe even your fridge.

That said, there is a good chance that a program named c99 might be present on your PC and that this is in fact a C compiler. You could try to compile the example program using the following command:

Terminal
  > c99 -Wall -o getting-started getting-started.c -lm

The compiler should do its job without complaining, and output an executable file called getting-started in your current directory.[Exs 2] In the above line
• c99 is the compiler program.
• -Wall tells it to warn us about anything that it finds unusual.
• -o getting-started tells it to store the compiler output in a file named getting-started.
• getting-started.c names the source file, namely the file that contains the C code that we have written. Note that the .c extension at the end of the file name refers to the C programming language.
• -lm tells it to add some standard mathematical functions if necessary; we will need those later on.

[1] In fact, the translation itself is done in several steps that go from textual replacement, over proper compilation, to linking. Nevertheless, the tool that bundles all this is traditionally called compiler and not translator, which would be more accurate.
[Exs 2] Try the compilation command in your terminal.

Now we can execute our newly created executable. Type in:

Terminal
  > ./getting-started

and you should see exactly the same output as I have given you above. That’s what portable means: wherever you run that program, its behavior should be the same.

If you are not lucky and the compilation command above didn’t work, you’d have to look up the name of your compiler in your system documentation. You might even have to install a compiler if one is not available. The names of compilers vary. Here are some common alternatives that might do the trick:

Terminal
  > clang -Wall -lm -o getting-started getting-started.c
  > gcc -std=c99 -Wall -lm -o getting-started getting-started.c
  > icc -std=c99 -Wall -lm -o getting-started getting-started.c

Some of these, even if they are present on your computer, might not compile the program without complaining.[Exs 3]

With the program in Listing 1 we presented an ideal world: a program that works and produces the same result on all platforms. Unfortunately, when programming yourself very often you will have a program that only works partially and that maybe produces wrong or unreliable results. Therefore, let us look at the program in Listing 2. It looks quite similar to the previous one.
If you run your compiler on that one, it should give you some diagnostic, something similar to this

Terminal
  > c99 -Wall -o getting-started-badly getting-started-badly.c
  getting-started-badly.c:4:6: warning: return type of ’main’ is not ’int’ [-Wmain]
  getting-started-badly.c: In function ’main’:
  getting-started-badly.c:16:6: warning: implicit declaration of function ’printf’ [-Wimplicit-func
  getting-started-badly.c:16:6: warning: incompatible implicit declaration of built-in function ’pr
  getting-started-badly.c:22:3: warning: ’return’ with a value, in function returning void [enabled

Here we had a lot of long “warning” lines that are even too long to fit on a terminal screen. In the end the compiler produced an executable. Unfortunately, the output when we run the program is different. This is a sign that we have to be careful and pay attention to details.

clang is even more picky than gcc and gives us even longer diagnostic lines:

[Exs 3] Start writing a textual report about your tests with this book. Note down which command worked for you.
LISTING 2. An example of a C program with flaws

   1  /* This may look like nonsense, but really is -*- mode: C -*- */
   2
   3  /* The main thing that this program does. */
   4  void main() {
   5    // Declarations
   6    int i;
   7    double A[5] = {
   8      9.0,
   9      2.9,
  10      3.E+25,
  11      .00007,
  12    };
  13
  14    // Doing some work
  15    for (i = 0; i < 5; ++i) {
  16       printf("element %d is %g, \tits square is %g\n",
  17              i,
  18              A[i],
  19              A[i]*A[i]);
  20    }
  21
  22    return 0;
  23  }

Terminal
  > clang -Wall -o getting-started-badly getting-started-badly.c
  getting-started-badly.c:4:1: warning: return type of ’main’ is not ’int’ [-Wmain-return-type]
  void main() {
  ^
  getting-started-badly.c:16:6: warning: implicitly declaring library function ’printf’ with type
        ’int (const char *, ...)’
       printf("element %d is %g, \tits square is %g\n", /*@label{printf-start-badly}*/
       ^
  getting-started-badly.c:16:6: note: please include the header <stdio.h> or explicitly provide a d
        ’printf’
  getting-started-badly.c:22:3: error: void function ’main’ should not return a value [-Wreturn-typ
    return 0;
    ^      ~
  2 warnings and 1 error generated.

This is a good thing! Its diagnostic output is much more informative. In particular it gave us two hints: it expected a different return type for main and it expected us to have a line such as Line 3 of Listing 1 to specify where the printf function comes from. Notice how clang, unlike gcc, did not produce an executable. It considers the problem in Line 22 fatal. Consider this to be a feature.

In fact, depending on your platform you may force your compiler to reject programs that produce such diagnostics. For gcc such a command line option would be -Werror.

Rule 0.1.2.3 A C program should compile cleanly without warnings.
So we have seen two of the points in which Listings 1 and 2 differed, and these two modifications turned a good, standard conforming, portable program into a bad one. We also have seen that the compiler is there to help us. It nailed the problem down to the lines in the program that cause trouble, and with a bit of experience you will be able to understand what it is telling you.[Exs 4] [Exs 5]

2. The principal structure of a program

Compared to our little examples from above, real programs will be more complicated and contain additional constructs, but their structure will be very similar. Listing 1 already has most of the structural elements of a C program.

There are two categories of aspects to consider in a C program: syntactical aspects (how do we specify the program so the compiler understands it) and semantic aspects (what do we specify so that the program does what we want it to do). In the following subsections we will introduce the syntactical aspects (“grammar”) and three different semantic aspects, namely declarative parts (what things are), definitions of objects (where things are) and statements (what things are supposed to do).

2.1. Grammar. Looking at its overall structure, we can see that a C program is composed of different types of text elements that are assembled in a kind of grammar. These elements are:

special words: In Listing 1 we have used the following special words[6]: #include, int, void, double, for, and return. In our program text, here, they will usually be printed in bold face. These special words represent concepts and features that the C language imposes and that cannot be changed.

punctuations: There are several punctuation concepts that C uses to structure the program text.
• There are five sorts of parentheses: { ... }, ( ... ), [ ... ], /* ... */ and < ... >. Parentheses group certain parts of the program together and should always come in pairs. Fortunately, the < ... > parentheses are rare in C, and only used as shown in our example, on the same logical line of text. The other four are not limited to a single line; their contents might span several lines, like they did when we used printf earlier.
• There are two different separators or terminators, comma and semicolon. When we used printf we saw that commas separated the four arguments to that function; in Line 12 we saw that a comma also can follow the last element of a list of elements.

getting-started.c
  12      [3] = .00007,

One of the difficulties for newcomers in C is that the same punctuation characters are used to express different concepts. For example, {} and [] are each used for two different purposes in our program.

Rule 0.2.1.1 Punctuation characters can be used with several different meanings.

comments: The construct /* ... */ that we saw above tells the compiler that everything inside it is a comment, see e.g. Line 5.

[Exs 4] Correct Listing 2 step by step. Start from the first diagnostic line, fix the code that is mentioned there, recompile and so on, until you have a flawless program.
[Exs 5] There is a third difference between the two programs that we didn’t mention, yet. Find it.
[6] In the C jargon these are directives, keywords and reserved identifiers.
getting-started.c
  5  /* The main thing that this program does. */

Comments are ignored by the compiler. They are the perfect place to explain and document your code. Such “in-place” documentation can (and should) improve the readability and comprehensibility of your code a lot. Another form of comment is the so-called C++-style comment as in Line 15. These are marked by //. C++-style comments extend from the // to the end of the line.

literals: Our program contains several items that refer to fixed values that are part of the program: 0, 1, 3, 4, 5, 9.0, 2.9, 3.E+25, .00007, and "element %zu is %g, \tits square is %g\n". These are called literals.

identifiers: These are “names” that we (or the C standard) give to certain entities in the program. Here we have: A, i, main, printf, size_t, and EXIT_SUCCESS. Identifiers can play different roles in a program. Amongst others they may refer to:
• data objects (such as A and i); these are also referred to as variables,
• type aliases, size_t, that specify the “sort” of a new object, here of i. Observe the trailing _t in the name. This naming convention is used by the C standard to remind you that the identifier refers to a type.
• functions (main and printf),
• constants (EXIT_SUCCESS).

functions: Two of the identifiers refer to functions: main and printf. As we have already seen, printf is used by the program to produce some output. The function main in turn is defined, that is its declaration int main(void) is followed by a block enclosed in { ... } that describes what that function is supposed to do. In our example this function definition goes from Line 6 to 24. main has a special role in C programs as we will encounter them; it must always be present since it is the starting point of the program’s execution.
operators: Of the numerous C operators our program only uses a few:
• = for initialization and assignment,
• < for comparison,
• ++ to increment a variable, that is to increase its value by 1,
• * to perform the multiplication of two values.

2.2. Declarations. Declarations have to do with the identifiers that we encountered above. As a general rule:

Rule 0.2.2.1 All identifiers of a program have to be declared.

That is, before we use an identifier we have to give the compiler a declaration that tells it what that identifier is supposed to be. This is where identifiers differ from keywords; keywords are predefined by the language, and must not be declared or redefined.

Three of the identifiers we use are effectively declared in our program: main, A and i. Later on, we will see where the other identifiers (printf, size_t, and EXIT_SUCCESS) come from.

Above, we already mentioned the declaration of the main function. All three declarations, in isolation as “declarations only”, look like this:

  int main(void);
  double A[5];
  size_t i;

These three follow a pattern. Each has an identifier (main, A or i) and a specification of certain properties that are associated with that identifier.
• i is of type size_t.
• main is additionally followed by parentheses, ( ... ), and thus declares a function of type int.
• A is followed by brackets, [ ... ], and thus declares an array. An array is an aggregate of several items of the same type; here it consists of 5 items of type double. These 5 items are ordered and can be referred to by numbers, called indices, from 0 to 4.

Each of these declarations starts with a type, here int, double and size_t. We will see later what that represents. For the moment it is sufficient to know that this specifies that all three identifiers, when used in the context of a statement, will act as some sort of “numbers”.

For the other three identifiers, printf, size_t and EXIT_SUCCESS, we don’t see any declaration. In fact they are pre-declared identifiers, but as we saw when we tried to compile Listing 2, the information about these identifiers doesn’t come out of nowhere. We have to tell the compiler where it can obtain information about them. This is done right at the start of the program, in Lines 2 and 3: printf is provided by stdio.h, whereas size_t and EXIT_SUCCESS come from stdlib.h. The real declarations of these identifiers are specified in .h files with these names somewhere on your computer. They could be something like:

  int printf(char const format[static 1], ...);
  typedef unsigned long size_t;
  #define EXIT_SUCCESS 0

but this is not important for the moment. This information is normally hidden from you in these include files or header files. If you need to know the semantics of these, it’s usually a bad idea to look them up in the corresponding files, as they tend to be barely readable. Instead, search in the documentation that comes with your platform. For the brave, I always recommend a look into the current C standard, as that is where they all come from.
For the less courageous the following commands may help:

Terminal
  > apropos printf
  > man printf
  > man 3 printf

Declarations may be repeated, but only if they specify exactly the same thing.

Rule 0.2.2.2 Identifiers may have several consistent declarations.

Another property of declarations is that they might only be valid (visible) in some part of the program, not everywhere. A scope is a part of the program where an identifier is valid.

Rule 0.2.2.3 Declarations are bound to the scope in which they appear.

In Listing 1 we have declarations in different scopes.
• A is visible inside the definition of main, starting at its very declaration on Line 8 and ending at the closing } on Line 24 of the innermost { ... } block that contains that declaration.
• i has a more restricted visibility. It is bound to the for construct in which it is declared. Its visibility reaches from that declaration in Line 16 to the end of the { ... } block that is associated with the for in Line 21.
• main is not enclosed in any { ... } block, so it is visible from its declaration onwards until the end of the file.

In a slight abuse of terminology, the first two types of scope are called block scope. The third type, as used for main, is called file scope. Identifiers in file scope are often referred to as globals.

2.3. Definitions. Generally, declarations only specify the kind of object an identifier refers to, not what the concrete value of an identifier is, nor where the object it refers to can be found. This important role is filled by a definition.

Rule 0.2.3.1 Declarations specify identifiers whereas definitions specify objects.

We will later see that things are a little bit more complicated in real life, but for now we can make a simplification:

Rule 0.2.3.2 An object is defined at the same time as it is initialized.

Initializations augment the declarations and give an object its initial value. For instance:

  size_t i = 0;

is a declaration of i that is also a definition with initial value 0.

A is a bit more complex:

getting-started.c
   8    double A[5] = {
   9      [0] = 9.0,
  10      [1] = 2.9,
  11      [4] = 3.E+25,
  12      [3] = .00007,
  13    };

This initializes the 5 items in A to the values 9.0, 2.9, 0.0, 0.00007 and 3.0E+25, in that order. The form of an initializer we see here is called designated: a pair of brackets with an integer designates which item of the array is initialized with the corresponding value. E.g. [4] = 3.E+25 sets the last item of the array A to the value 3.E+25. As a special rule, any position that is not listed in the initializer is set to 0. In our example the missing [2] is filled with 0.0.[7]

Rule 0.2.3.3 Missing elements in initializers default to 0.
You might have noticed that array positions, indices, above are not starting at 1 for the first element, but with 0. Think of an array position as the “distance” of the corresponding array element from the start of the array.

Rule 0.2.3.4 For an array with n elements, the first element has index 0, the last has index n-1.

For a function we have a definition (as opposed to only a declaration) if its declaration is followed by braces { ... } containing the code of the function.

[7] We will see later how these number literals with dots . and exponents E+25 work.
  int main(void) {
    ...
  }

In our examples so far we have seen two different kinds of objects, data objects, namely i and A, and function objects, main and printf.

In contrast to declarations, where several were allowed for the same identifier, definitions must be unique:

Rule 0.2.3.5 Each object must have exactly one definition.

This rule concerns data objects as well as function objects.

2.4. Statements. The second part of the main function consists mainly of statements. Statements are instructions that tell the compiler what to do with identifiers that have been declared so far. We have

getting-started.c
  16    for (size_t i = 0; i < 5; ++i) {
  17      printf("element %zu is %g, \tits square is %g\n",
  18             i,
  19             A[i],
  20             A[i]*A[i]);
  21    }
  22
  23    return EXIT_SUCCESS;

We have already discussed the lines that correspond to the call to printf. There are also other types of statements: a for and a return statement, and an increment operation, indicated by the operator ++.

2.4.1. Iteration. The for statement tells the compiler that the program should execute the printf line a number of times. It is the simplest form of domain iteration that C has to offer. It has four different parts.

The code that is to be repeated is called the loop body; it is the { ... } block that follows the for ( ... ). The other three parts are those inside the ( ... ) part, divided by semicolons:
(1) The declaration, definition and initialization of the loop variable i that we already discussed above. This initialization is executed once before any of the rest of the whole for statement.
(2) A loop condition, i < 5, that specifies how long the for iteration should continue. This one tells the compiler to continue iterating as long as i is strictly less than 5. The loop condition is checked before each execution of the loop body.
(3) Another statement, ++i, is executed after each iteration. In this case it increases the value of i by 1 each time.
If we put all those together, we ask the program to perform the part in the block 5 times, setting the value of i to 0, 1, 2, 3, and 4 respectively in each iteration. The fact that we can identify each iteration with a specific value for i makes this an iteration over the domain 0, ..., 4. There is more than one way to do this in C, but a for is the easiest, cleanest and best tool for the task.

Rule 0.2.4.1 Domain iterations should be coded with a for statement.

A for statement can be written in several ways other than what we just saw. Often people place the definition of the loop variable somewhere before the for or even reuse the same variable for several loops. Don’t do that.
Rule 0.2.4.2 The loop variable should be defined in the initial part of a for.

2.4.2. Function return. The last statement in main is a return. It tells the main function to return to the statement that it was called from once it’s done. Here, since main has int in its declaration, a return must send back a value of type int to the calling statement. In this case that value is EXIT_SUCCESS.

Even though we can’t see its definition, the printf function must contain a similar return statement. At the point where we call the function in Line 17, execution of the statements in main is temporarily suspended. Execution continues in the printf function until a return is encountered. After the return from printf, execution of the statements in main continues from where it stopped.

[FIGURE 1. Execution of a small program. The diagram shows three columns, process startup, program code (Lines 6 to 24 of Listing 1) and C library (the function printf), connected by call and return arrows.]

In Figure 1 we have a schematic view of the execution of our little program. First, a process startup routine (on the left) that is provided by our platform calls the user-provided function main (middle). That in turn calls printf, a function that is part of the C library, on the right. Once a return is encountered there, control returns back to main, and when we reach the return in main, it passes back to the startup routine. The latter transfer of control, from a programmer’s point of view, is the end of the program’s execution.
LEVEL 1
Acquaintance

This chapter is supposed to get you acquainted with the C programming language, that is to provide you with enough knowledge to write and use good C programs. “Good” here refers to a modern understanding of the language, avoiding most of the pitfalls of early dialects of C, offering you some constructs that were not present before, and that are portable across the vast majority of modern computer architectures, from your cell phone to a mainframe computer. Having worked through this you should be able to write short code for everyday needs, not extremely sophisticated, but useful and portable.

In many ways, C is a permissive language: a programmer is allowed to shoot themselves in the foot or other body parts if they choose to, and C will make no effort to stop them. Therefore, just for the moment, we will introduce some restrictions. We’ll try to avoid handing out guns in this chapter, and place the key to the gun safe out of your reach for the moment, marking its location with big and visible exclamation marks.

The most dangerous constructs in C are the so-called casts, so we’ll skip them at this level. However, there are many other pitfalls that are less easy to avoid. We will approach some of them in a way that might look unfamiliar to you, in particular if you have learned your C basics in the last millennium or if you have been initiated to C on a platform that wasn’t upgraded to current ISO C for years.

  • We will focus primarily on the unsigned versions of integer types.
  • We will introduce pointers in steps: first, in disguise as parameters to functions (6.1.4), then with their state (being valid or not, 6.2) and then, only when we really can’t delay it any further (11), using their entire potential.
  • We will focus on the use of arrays whenever possible, instead.

Warning to experienced C programmers. If you already have some experience with C programming, this may need some getting used to.
Here are some of the things that may provoke allergic reactions. If you happen to break out in spots when you read some code here, try to take a deep breath and let it go.

We bind type modifiers and qualifiers to the left. We want to separate identifiers visually from their type. So we will typically write things as

    char* name;

where char* is the type and name is the identifier. We also apply the left binding rule to qualifiers and write

    char const* const path_name;

Here the first const qualifies the char to its left, the * makes it a pointer and the second const again qualifies what is to its left.

We use array or function notation for pointer parameters to functions, wherever these assume that the pointer can’t be null. Examples

    size_t strlen(char const string[static 1]);
    int main(int argc, char* argv[argc+1]);
    int atexit(void function(void));

The first stresses the fact that strlen must receive a valid (non-null) pointer and will access at least one element of string. The second summarizes the fact that main receives an array of pointers to char: the program name, argc-1 program arguments and one null pointer that terminates the array. The third emphasizes that semantically atexit receives a function as an argument. The fact that technically this function is passed on as a function pointer is usually of minor interest, and the commonly used pointer-to-function syntax is barely readable. Here are syntactically equivalent declarations for the three functions above as they would be written by many:

    size_t strlen(const char *string);
    int main(int argc, char **argv);
    int atexit(void (*function)(void));

As you now hopefully see, this is less informative and more difficult to comprehend visually.

We define variables as close to their first use as possible. Lack of variable initialization, especially for pointers, is one of the major pitfalls for novice C programmers. This is why we should, whenever possible, combine the declaration of a variable with the first assignment to it: the tool that C gives us for this purpose is a definition, that is a declaration together with an initialization. This gives a name to a value, and introduces this name at the first place where it is used.

This is particularly convenient for for-loops. The iterator variable of one loop is semantically a different object from the one in another loop, so we declare the variable within the for to ensure it stays within the loop’s scope.

We use prefix notation for code blocks. To be able to read a code block it is important to capture two things about it easily: its purpose and its extent. Therefore:

  • All { are prefixed on the same line with the statement or declaration that introduces them.
  • The code inside is indented by one level.
  • The terminating } starts a new line on the same level as the statement that introduced the block.
  • Block statements that have a continuation after the } continue on the same line.

Examples:

    int main(int argc, char* argv[argc+1]) {
      puts("Hello world!");
      if (argc > 1) {
        while (true) {
          puts("some programs never stop");
        }
      } else {
        do {
          puts("but this one does");
        } while (false);
      }
      return EXIT_SUCCESS;
    }

3. Everything is about control

In our introductory example we saw two different constructs that allowed us to control the flow of a program execution: functions and the for-iteration. Functions are a way to
transfer control unconditionally. The call transfers control unconditionally to the function and a return-statement unconditionally transfers it back to the caller. We will come back to functions in Section 7.

The for statement is different in that it has a controlling condition (i < 5 in the example) that regulates if and when the dependent block or statement ({ printf(...) }) is executed. C has five conditional control statements: if, for, do, while and switch. We will look at these statements in this section.

There are several other kinds of conditional expressions we will look at later on: the ternary operator, denoted by an expression in the form “cond ? A : B”, and the compile-time preprocessor conditionals (#if-#else) and type generic expressions (noted with the keyword _Generic). We will visit these in Sections 4.4 and 20, respectively.

3.1. Conditional execution. The first construct that we will look at is specified by the keyword if. It looks like this:

    if (i > 25) {
      j = i - 25;
    }

Here we compare i against the value 25. If it is larger than 25, j is set to the value i - 25. In that example i > 25 is called the controlling expression, and the part in { ... } is called the dependent block.

This form of an if statement is syntactically quite similar to the for statement that we already have encountered. It is a bit simpler: the part inside the parentheses has only one part, which determines whether the dependent statement or block is run.

There is a more general form of the if construct:

    if (i > 25) {
      j = i - 25;
    } else {
      j = i;
    }

It has a second dependent statement or block that is executed if the controlling condition is not fulfilled. Syntactically, this is done by introducing another keyword else that separates the two statements or blocks.

The if (...) ... else ... is a selection statement. It selects one of the two possible code paths according to the contents of ( ... ).
The general form is

    if (condition) statement0-or-block0
    else statement1-or-block1

The possibilities for the controlling expression “condition” are numerous. They can range from simple comparisons as in this example to very complex nested expressions. We will present all the primitives that can be used in Section 4.3.2.

The simplest of such “condition” specifications in an if statement can be seen in the following example, in a variation of the for loop from Listing 1.

    for (size_t i = 0; i < 5; ++i) {
      if (i) {
        printf("element %zu is %g, \tits square is %g\n",
               i,
               A[i],
               A[i]*A[i]);
      }
    }
Here the condition that determines whether printf is executed or not is just i: a numerical value by itself can be interpreted as a condition. The text will only be printed when the value of i is not 0.[Exs 1]

There are two simple rules for the evaluation of a numerical “condition”:

Rule 1.3.1.1 The value 0 represents logical false.

Rule 1.3.1.2 Any value different from 0 represents logical true.

The operators == and != allow us to test for equality and inequality, respectively. a == b is true if the value of a is equal to the value of b and false otherwise; a != b is false if a is equal to b and true otherwise. Knowing how numerical values are evaluated as conditions, we can avoid redundancy. For example, we can rewrite

    if (i != 0) {
      ...
    }

as:

    if (i) {
      ...
    }

The type bool, specified in stdbool.h, is what we should be using if we want to store truth values. Its values are false and true. Technically, false is just another name for 0 and true for 1. It’s important to use false and true (and not the numbers) to emphasize that a value is to be interpreted as a condition. We will learn more about the bool type in Section 5.5.4.

Redundant comparisons quickly become unreadable and clutter your code. If you have a conditional that depends on a truth value, use that truth value directly as the condition. Again, we can avoid redundancy by rewriting something like:

    bool b = ...;
    ...
    if ((b != false) == true) {
      ...
    }

as

    bool b = ...;
    ...
    if (b) {
      ...
    }

Generally:

Rule 1.3.1.3 Don’t compare to 0, false or true.

Using the truth value directly makes your code clearer, and illustrates one of the basic concepts of the C language:

Rule 1.3.1.4 All scalars have a truth value.

Here scalar types include all the numerical types such as size_t, bool or int that we already encountered, and pointer types, which we will come back to in Section 6.2.
[Exs 1] Add the if (i) condition to the program and compare the output to the previous.
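Rules 1.3.1.1 to 1.3.1.4 can be tried out with a small sketch. The two helper functions below, count_nonzero and count_if_set, are hypothetical names chosen for this illustration; they only show how a numerical value and a bool are used directly as conditions, without comparing them to 0 or true.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: counts the elements of A that are not 0.
size_t count_nonzero(size_t len, double const A[]) {
  size_t count = 0;
  for (size_t i = 0; i < len; ++i) {
    if (A[i]) {          // no "!= 0.0" needed, see Rule 1.3.1.3
      ++count;
    }
  }
  return count;
}

// Hypothetical helper: a bool is used directly as the condition.
size_t count_if_set(bool flag, size_t count) {
  if (flag) {            // not "if (flag == true)"
    ++count;
  }
  return count;
}
```

Called with an array that holds the values 9.0, 2.9, 0.0, .00007 and 3.E+25, count_nonzero should report 4, because only the element with value 0.0 is interpreted as false.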
3.2. Iterations. Previously, we encountered the for statement that allows us to iterate over a domain; in our introductory example it declared a variable i that was set to the values 0, 1, 2, 3 and 4. The general form of this statement is

    for (clause1; condition2; expression3) statement-or-block

This statement is actually quite generic. Usually “clause1” is an assignment expression or a variable definition. It serves to state an initial value for the iteration domain. “condition2” tests if the iteration should continue. Then, “expression3” updates the iteration variable that had been used in “clause1”. It is performed at the end of each iteration. Some advice

  • In view of Rule 0.2.4.2 “clause1” should in most cases be a variable definition.
  • Because for is relatively complex with its four different parts and not so easy to capture visually, “statement-or-block” should usually be a { ... } block.

Let’s see some more examples:

    for (size_t i = 10; i; --i) {
      something(i);
    }
    for (size_t i = 0, stop = upper_bound(); i < stop; ++i) {
      something_else(i);
    }
    for (size_t i = 9; i <= 9; --i) {
      something_else(i);
    }

The first for counts i down from 10 to 1, inclusive. The condition is again just the evaluation of the variable i; no redundant test against value 0 is required. When i becomes 0, it will evaluate to false and the loop will stop. The second for declares two variables, i and stop. As before i is the loop variable, stop is what we compare against in the condition, and when i becomes greater than or equal to stop, the loop terminates.

The third for looks as if it would go on forever, but actually counts down from 9 to 0. In fact, in the next section we will see that “sizes” in C, that is numbers that have type size_t, are never negative.[Exs 2]

Observe that all three for statements declare variables named i.
These three variables with the same name happily live side by side, as long as their scopes don’t overlap.

There are two more iterative statements in C, namely while and do.

    while (condition) statement-or-block
    do statement-or-block while(condition);

The following example shows a typical use of the first:

    #include <tgmath.h>

    double const eps = 1E-9;            // desired precision
    ...
    double const a = 34.0;
    double x = 0.01;                    // start value, needs 0 < a*x < 2
    while (fabs(1.0 - a*x) >= eps) {    // iterate until close
      x *= (2.0 - a*x);                 // Heron approximation
    }

The iteration only converges if the start value satisfies 0 < a*x < 2, because each step squares the error 1.0 - a*x. This is why x starts with a small value here.

[Exs 2] Try to imagine what happens when i has value 0 and is decremented by means of operator --.
It iterates as long as the given condition evaluates true. The do loop is very similar, except that it checks the condition after the dependent block:

    do {                                // iterate
      x *= (2.0 - a*x);                 // Heron approximation
    } while (fabs(1.0 - a*x) >= eps);   // iterate until close

This means that if the condition evaluates to false, a while-loop will not run its dependent block at all, and a do-loop will run it once before terminating.

As with the for statement, for do and while it is advisable to use the { ... } block variants. There is also a subtle syntactical difference between the two: do always needs a semicolon ; after the while (condition) to terminate the statement. Later we will see that this is a syntactic feature that turns out to be quite useful in the context of multiple nested statements, see Section 10.3.

All three iteration statements become even more flexible with break and continue statements. A break statement stops the loop without re-evaluating the termination condition or executing the part of the dependent block after the break statement:

    while (true) {
      double prod = a*x;
      if (fabs(1.0 - prod) < eps)       // stop if close enough
        break;
      x *= (2.0 - prod);                // Heron approximation
    }

This way, we can separate the computation of the product a*x, the evaluation of the stop condition and the update of x. The condition of the while then becomes trivial. The same can be done using a for, and there is a tradition among C programmers to write it as follows:

    for (;;) {
      double prod = a*x;
      if (fabs(1.0 - prod) < eps)       // stop if close enough
        break;
      x *= (2.0 - prod);                // Heron approximation
    }

for(;;) here is equivalent to while(true). The fact that the controlling expression of a for (the middle part between the ;;) can be omitted and is interpreted as “always true” is just a historic artifact in the rules of C and has no other special reason.

The continue statement is less frequently used.
Like break, it skips the execution of the rest of the dependent block, so all statements in the block after the continue are not executed for the current iteration. However, it then re-evaluates the condition (for a for, after performing “expression3”) and continues from the start of the dependent block if the condition is true.

    for (size_t i = 0; i < max_iterations; ++i) {
      if (x > 1.0) {    // check if we are on the correct side of 1
        x = 1.0/x;
        continue;
      }
      double prod = a*x;
      if (fabs(1.0 - prod) < eps)       // stop if close enough
        break;
      x *= (2.0 - prod);                // Heron approximation
    }
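The interplay of for, break and continue can be put together in one self-contained function. This is only a sketch under assumptions that are not in the text above: the function name inverse, the start value 1.0 and the halving step that brings the guess into the range where the iteration converges are all chosen for this illustration.

```c
#include <math.h>
#include <stddef.h>

// Hypothetical helper: approximates 1/a for a > 0 and gives up after
// max_iterations rounds.
double inverse(double a, size_t max_iterations) {
  double const eps = 1E-9;         // desired precision
  double x = 1.0;                  // start value, an assumption of this sketch
  for (size_t i = 0; i < max_iterations; ++i) {
    double prod = a*x;
    if (prod >= 2.0) {             // guess too large, the update would diverge
      x *= 0.5;
      continue;                    // skip the rest, go on with ++i and the test
    }
    if (fabs(1.0 - prod) < eps) {  // stop if close enough
      break;
    }
    x *= (2.0 - prod);             // Heron approximation
  }
  return x;
}
```

With a bound such as 100 for max_iterations, a call like inverse(34.0, 100) should leave the loop through the break, so that the product of 34.0 and the result is within eps of 1.0. If the bound is too small, the function simply returns the best guess that it has reached.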
In the examples above we made use of a standard macro fabs, that comes with the tgmath.h header[3]. It calculates the absolute value of a double. If you are interested in how this works, Listing 1.1 is a program that does the same thing without the use of fabs. In it, fabs has been replaced by several explicit comparisons.

The task of the program is to compute the inverse of all numbers that are provided to it on the command line. An example of a program execution looks like:

    Terminal
    > ./heron 0.07 5 6E+23
    heron: a=7.00000e-02, x=1.42857e+01, a*x=0.999999999996
    heron: a=5.00000e+00, x=2.00000e-01, a*x=0.999999999767
    heron: a=6.00000e+23, x=1.66667e-24, a*x=0.999999997028

To process the numbers on the command line the program uses another library function strtod from stdlib.h.[Exs 4][Exs 5][Exs 6]

LISTING 1.1. A program to compute inverses of numbers

    #include <stdlib.h>
    #include <stdio.h>

    /* lower and upper iteration limits centered around 1.0 */
    static double const eps1m01 = 1.0 - 0x1P-01;
    static double const eps1p01 = 1.0 + 0x1P-01;
    static double const eps1m24 = 1.0 - 0x1P-24;
    static double const eps1p24 = 1.0 + 0x1P-24;

    int main(int argc, char* argv[argc+1]) {
      for (int i = 1; i < argc; ++i) {        // process args
        double const a = strtod(argv[i], 0);  // arg -> double
        double x = 1.0;
        for (;;) {                            // by powers of 2
          double prod = a*x;
          if (prod < eps1m01) x *= 2.0;
          else if (eps1p01 < prod) x *= 0.5;
          else break;
        }
        for (;;) {                            // Heron approximation
          double prod = a*x;
          if ((prod < eps1m24) || (eps1p24 < prod))
            x *= (2.0 - prod);
          else break;
        }
        printf("heron: a=%.5e,\tx=%.5e,\ta*x=%.12f\n",
               a, x, a*x);
      }
      return EXIT_SUCCESS;
    }

[3] “tgmath” stands for type generic mathematical functions.
[Exs 4] Analyse Listing 1.1 by adding printf calls for intermediate values of x.
[Exs 5] Describe the use of the parameters argc and argv in Listing 1.1. [Exs 6] Print out the values of eps1m01 and observe the output when you change them slightly.
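Listing 1.1 passes 0 as the second argument of strtod and therefore cannot tell whether a command line argument really was a number. As a hedged sketch, the second parameter of strtod can receive the position where the conversion stopped; the helper name parse_double and its interface are hypothetical and not part of the listing.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: converts text to a double and reports whether
// the whole string was consumed as a number.
bool parse_double(char const text[static 1], double result[static 1]) {
  char* end = 0;
  double const value = strtod(text, &end);
  if (end == text || *end) {   // nothing converted, or trailing characters
    return false;
  }
  *result = value;
  return true;
}
```

A caller could use it as if (parse_double(argv[i], &a)) { ... } and skip arguments such as "abc" or "5x", for which the function returns false and leaves the result untouched.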
3.3. Multiple selection. The last control statement that C has to offer is called the switch statement and is another selection statement. It is mainly used when cascades of if-else constructs would be too tedious:

    if (arg == 'm') {
      puts("this is a magpie");
    } else if (arg == 'r') {
      puts("this is a raven");
    } else if (arg == 'j') {
      puts("this is a jay");
    } else if (arg == 'c') {
      puts("this is a chough");
    } else {
      puts("this is an unknown corvid");
    }

In this case, we have a choice that is more complex than a false-true decision and that can have several outcomes. We can simplify this as follows:

    switch (arg) {
      case 'm': puts("this is a magpie");
                break;
      case 'r': puts("this is a raven");
                break;
      case 'j': puts("this is a jay");
                break;
      case 'c': puts("this is a chough");
                break;
      default:  puts("this is an unknown corvid");
    }

Here we select one of the puts calls according to the value of the arg variable. Like printf, the function puts is provided by stdio.h. It outputs a line with the string that is passed as an argument. We provide specific cases for characters 'm', 'r', 'j', 'c' and a fallback case labeled default. The default case is triggered if arg doesn’t match any of the case values.[Exs 7]

Syntactically, a switch is as simple as

    switch (expression) statement-or-block

and the semantics of it are quite straightforward: the case and default labels serve as jump targets. According to the value of the expression, control just continues at the statement that is labeled accordingly. If we hit a break statement, the whole switch under which it appears terminates and control is transferred to the next statement after the switch.

By that specification a switch statement can in fact be used much more widely than iterated if-else constructs.

    switch (count) {
      default:puts("++++ ..... +++");
      case 4: puts("++++");
      case 3: puts("+++");
      case 2: puts("++");
      case 1: puts("+");
      case 0:;
    }

Once we have jumped into the block, the execution continues until it reaches a break or the end of the block. In this case, because there are no break statements, we end up running all subsequent puts statements. For example, the output when the value of count is 3 would be a triangle with three lines.

    Terminal
    +++
    ++
    +

The structure of a switch can be more flexible than if-else, but it is restricted in another way:

Rule 1.3.3.1 case values must be integer constant expressions.

In Section 5.4.2 we will see what these expressions are in detail. For now it suffices to know that these have to be fixed values that we provide directly in the source such as the 4, 3, 2, 1, 0 above. In particular variables such as count above are only allowed in the switch part but not for the individual cases.

With the greater flexibility of the switch statement also comes a price: it is more error prone. In particular, we might accidentally skip variable definitions:

Rule 1.3.3.2 case labels must not jump beyond a variable definition.

[Exs 7] Test the above switch statement in a program. See what happens if you leave out some of the break statements.
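The fall-through behavior and Rule 1.3.3.2 can be illustrated with two small functions. Both are sketches with hypothetical names (triangle_lines, scaled): the first counts how many statements a fall-through switch of the shape shown above executes, the second shows one possible way to respect Rule 1.3.3.2, namely giving each case that needs a variable a { ... } block of its own.

```c
#include <stddef.h>

// Hypothetical helper: number of statements that are executed when we
// jump into a switch without break statements.
size_t triangle_lines(size_t count) {
  size_t lines = 0;
  switch (count) {
    default: ++lines;   // stands for the "++++ ..... +++" line
    case 4:  ++lines;   // no break, execution falls through
    case 3:  ++lines;
    case 2:  ++lines;
    case 1:  ++lines;
    case 0:;
  }
  return lines;
}

// Hypothetical helper: every definition lives in a block of its own,
// so no case label can jump beyond it.
size_t scaled(char unit, size_t value) {
  switch (unit) {
    case 'k': {
      size_t const factor = 1000;
      return value*factor;
    }
    case 'M': {
      size_t const factor = 1000000;
      return value*factor;
    }
    default:
      return value;
  }
}
```

For a count of 3 the first function jumps to case 3 and then runs the statements of cases 3, 2 and 1, which corresponds to the three lines of the triangle.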
4. Expressing computations

We’ve already made use of some simple examples of expressions. These are code snippets that compute some value based on other values. The simplest such expressions are certainly arithmetic expressions that are similar to those that we learned in school. But there are others, notably comparison operators such as == and != that we already saw earlier.

In this section, the values and objects on which we will do these computations will be mostly of the type size_t that we already met above. Such values correspond to “sizes”, so they are numbers that cannot be negative. Their range of possible values starts at 0. What we would like to represent are all the non-negative integers, often denoted as ℕ, ℕ₀, or “natural” numbers in mathematics. Unfortunately computers are finite so we can’t directly represent all the natural numbers, but we can do a reasonable approximation. There is a big upper limit SIZE_MAX that is the upper bound of what we can represent in a size_t.

Rule 1.4.0.3 The type size_t represents values in the range [0, SIZE_MAX].

The value of SIZE_MAX is quite large; depending on the platform it should be one of

    2^16 − 1 = 65535
    2^32 − 1 = 4294967295
    2^64 − 1 = 18446744073709551615

The first value is a minimal requirement, the other two values are much more commonly used today. They should be large enough for calculations that are not too sophisticated. The standard header stdint.h provides SIZE_MAX such that you don’t have to figure it out yourself to write portable code.

The concept of “numbers that cannot be negative” to which we referred for size_t corresponds to what C calls unsigned integer types. The symbols and combinations like + or != are called operators and the things to which they are applied are called operands, so in something like “a + b”, “+” is the operator and “a” and “b” are its operands.
For an overview of all C operators see the tables in the appendix; Table 2 lists the operators that operate on values, Table 3 those that operate on objects and Table 4 those that operate on types.

4.1. Arithmetic. Arithmetic operators form the first group in Table 2 of operators that operate on values.

4.1.1. +, - and *. Arithmetic operators +, - and * mostly work as we would expect by computing the sum, the difference and the product of two values.

    size_t a = 45;
    size_t b = 7;
    size_t c = (a - b)*2;
    size_t d = a - b*2;

must result in c being equal to 76, and d to 31. As you can see from that little example, sub-expressions can be grouped together with parentheses to enforce a preferred binding of the operator.

In addition, operators + and - also have unary variants. -b just gives the negative of b, namely a value a such that b + a is 0. +a simply provides the value of a. The following would give 76 as well.

    size_t c = (+a + -b)*2;
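The three expressions can be wrapped into functions to check the stated results. The function names below are hypothetical and only serve this illustration.

```c
#include <stddef.h>

// Parentheses bind the difference first: (45 - 7)*2 is 76.
size_t grouped(size_t a, size_t b) {
  return (a - b)*2;
}

// Without parentheses the product binds first: 45 - 7*2 is 31.
size_t ungrouped(size_t a, size_t b) {
  return a - b*2;
}

// Unary + and - applied to unsigned values, same result as grouped.
size_t with_unary(size_t a, size_t b) {
  return (+a + -b)*2;
}
```

For the values 45 and 7 these functions should return 76, 31 and 76, respectively. The intermediate value -b is not representable as a negative number in a size_t, yet the final result is the expected one.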
Even though we use an unsigned type for our computation, negation and difference by means of the operator - are well defined. In fact, one of the miraculous properties of size_t is that +, - and * arithmetic always works where it can. This means that as long as the final mathematical result is within the range [0, SIZE_MAX], then that result will be the value of the expression.

Rule 1.4.1.1 Unsigned arithmetic is always well defined.

Rule 1.4.1.2 Operations +, - and * on size_t provide the mathematically correct result if it is representable as a size_t.

In case that we have a result that is not representable, we speak of arithmetic overflow. Overflow can e.g. happen if we multiply two values that are so large that their mathematical product is greater than SIZE_MAX. We’ll look at how C deals with overflow in the next section.

4.1.2. Division and remainder. The operators / and % are a bit more complicated, because they correspond to integer division and remainder operation. You might not be as used to them as to the other three arithmetic operators. a/b evaluates to the number of times b fits into a, and a%b is the remaining value once the maximum number of b are removed from a. The operators / and % come in pairs: if we have z = a / b the remainder a % b could be computed as a - z*b:

Rule 1.4.1.3 For unsigned values, a == (a/b)*b + (a%b).

A familiar example for the % operator is the hours on a clock. Say we have a 12 hour clock: 6 hours after 8 o’clock is 2 o’clock. Most people are able to compute time differences on 12 hour or 24 hour clocks. This computation corresponds to a % 12, in our example (8 + 6) % 12 == 2.[Exs 8] Another similar use for % is computation with minutes in the hour, of the form a % 60.

There is only one exceptional value that is not allowed for these two operations: 0. Division by zero is forbidden.

Rule 1.4.1.4 Unsigned / and % are well defined only if the second operand is not 0.
The % operator can also be used to explain additive and multiplicative arithmetic on unsigned types a bit better. As already mentioned above, when an unsigned type is given a value outside its range, it is said to overflow. In that case, the result is reduced as if the % operator had been used. The resulting value “wraps around” the range of the type. In the case of size_t, the range is 0 to SIZE_MAX, therefore

Rule 1.4.1.5 Arithmetic on size_t implicitly does computation %(SIZE_MAX+1).

Rule 1.4.1.6 In case of overflow, unsigned arithmetic wraps around.

This means that for size_t values, SIZE_MAX + 1 is equal to 0 and 0 - 1 is equal to SIZE_MAX.

This “wrapping around” is the magic that makes the - operators work for unsigned types. For example, the value -1 interpreted as a size_t is equal to SIZE_MAX and so adding -1 to a value a just evaluates to a + SIZE_MAX, which wraps around to a + SIZE_MAX - (SIZE_MAX+1) = a - 1.

Operators / and % have the nice property that their results are never larger than their first operand:

Rule 1.4.1.7 The result of unsigned / and % is never larger than the first operand.

And thus

Rule 1.4.1.8 Unsigned / and % can’t overflow.

[Exs 8] Implement some computations using a 24 hour clock, e.g. 3 hours after ten, 8 hours after twenty.
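The rules of this section can be checked with a few one-line functions. This is a minimal sketch; all function names are hypothetical, and in clock12 the value 0 stands for 12 o’clock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hour shown on a 12 hour clock, `later` hours after `hour`.
size_t clock12(size_t hour, size_t later) {
  return (hour + later) % 12;
}

// These two wrap around at the borders of [0, SIZE_MAX].
size_t successor(size_t a)   { return a + 1; }
size_t predecessor(size_t a) { return a - 1; }

// Rule 1.4.1.3, only to be called with b different from 0.
bool division_matches(size_t a, size_t b) {
  return a == (a/b)*b + (a%b);
}

// SIZE_MAX + 1 wraps to 0, and 0 - 1 wraps to SIZE_MAX.
bool wraps_around(void) {
  return !successor(SIZE_MAX) && (predecessor(0) == SIZE_MAX);
}
```

For example clock12(8, 6) evaluates (8 + 6) % 12 and gives 2, and wraps_around() should hold on any platform, whatever the actual value of SIZE_MAX is.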
4.2. Operators that modify objects. Another important operation that we already have seen is assignment, a = 42. As you can see from that example this operator is not symmetric: it has a value on the right and an object on the left. In a freaky abuse of language C jargon often refers to the right hand side as rvalue (right value) and to the object on the left as lvalue (left value). We will try to avoid that vocabulary whenever we can: speaking of a value and an object is completely sufficient.

C has other assignment operators. For any binary operator @ of the five we have seen above, there is an assignment operator with the syntax

    an_object @= some_expression;

They are just convenient abbreviations for combining the arithmetic operator @ and assignment, see Table 3. An equivalent form would be

    an_object = (an_object @ (some_expression));

In other words there are operators +=, -=, *=, /=, and %=. For example in a for loop operator += could be used:

    for (size_t i = 0; i < 25; i += 7) {
      ...
    }

The syntax of these operators is a bit picky: you aren’t allowed to have blanks between the different characters, e.g. “i + = 7” instead of “i += 7” is a syntax error.

Rule 1.4.2.1 Operators must have all their characters directly attached to each other.

We already have seen two other operators that modify objects, namely the increment operator ++ and the decrement operator --:

  • ++i is equivalent to i += 1,
  • --i is equivalent to i -= 1.

All these assignment operators are real operators, they return a value (but not an object!). You could, if you were foolish enough, write something like

    a = b = c += ++d;
    a = (b = (c += (++d))); // same

But such combinations of modifications to several objects in one go are generally frowned upon. Don’t do that unless you want to obfuscate your code. Such changes to objects that are involved in an expression are referred to as side effects.

Rule 1.4.2.2 Side effects in value expressions are evil.
Rule 1.4.2.3 Never modify more than one object in a statement.

For the increment and decrement operators there are even two other forms, namely postfix increment and postfix decrement. They differ from the ones that we have seen in the result when they are used inside a larger expression. But since you will nicely obey Rule 1.4.2.2, you will not be tempted to use them.

4.3. Boolean context. Several operators yield a value 0 or 1 depending on whether some condition is verified or not, see Table 2. They can be grouped in two categories, comparisons and logical evaluation.
4.3.1. Comparison. In our examples we already have seen the comparison operators ==, !=, <, and >. Whereas the latter two perform strict comparison between their operands, operators <= and >= perform “less or equal” and “greater or equal” comparison, respectively. All these operators can be used in control statements as we have already seen, but they are actually more powerful than that.

Rule 1.4.3.1 Comparison operators return the values false or true.

Remember that false and true are nothing other than fancy names for 0 and 1 respectively. So they can perfectly well be used in arithmetic or for array indexing. In the following code

    size_t c = (a < b) + (a == b) + (a > b);
    size_t d = (a <= b) + (a >= b) - 1;

we have that c will always be 1, and d will be 1 if a and b are equal and 0 otherwise. With

    double largeA[N] = { 0 };
    ...
    /* fill largeA somehow */

    size_t sign[2] = { 0, 0 };
    for (size_t i = 0; i < N; ++i) {
      sign[(largeA[i] < 1.0)] += 1;
    }

the array element sign[0] will hold the number of values in largeA that are greater than or equal to 1.0 and sign[1] those that are strictly less.

Finally, let’s mention that there also is an identifier “not_eq” that may be used as a replacement for !=. This feature is rarely used. It dates back to the times where some characters were not properly present on all computer platforms. To be able to use it you’d have to include the file iso646.h.

4.3.2. Logic. Logic operators operate on values that are already supposed to represent values false or true. If they are not, the rules that we described for conditional execution with Rules 1.3.1.1 and 1.3.1.2 apply first. The operator ! (not) logically negates its operand, operator && (and) is logical and, operator || (or) is logical or. The results of these operators are summarized in the following table:

TABLE 1. Logical operators

    a        not a
    false    true
    true     false

    a and b  false    true
    false    false    false
    true     false    true

    a or b   false    true
    false    false    true
    true     true     true

Similar as for the comparison operators we have

Rule 1.4.3.2 Logic operators return the values false or true.

Again, remember that these values are nothing else than 0 and 1 and can thus be used as indices:

    double largeA[N] = { 0 };
    ...
    /* fill largeA somehow */

    size_t isset[2] = { 0, 0 };
    for (size_t i = 0; i < N; ++i) {
      isset[!!largeA[i]] += 1;
    }

Here the expression !!largeA[i] applies the ! operator twice and thus just ensures that largeA[i] is evaluated as a truth value according to the general Rule 1.3.1.4. As a result, the array elements isset[0] and isset[1] will hold the number of values that are equal to 0.0 and unequal, respectively.

Operators && and || have a particular property that is called short circuit evaluation. This barbaric term denotes the fact that the evaluation of the second operand is omitted, if it is not necessary for the result of the operation. Suppose isgreat and issmall are two functions that yield a scalar value. Then in this code

    if (isgreat(a) && issmall(b))
      ++x;
    if (issmall(c) || issmall(d))
      ++y;

the second function call in each condition would conditionally be omitted during execution: issmall(b) if isgreat(a) was 0, issmall(d) if issmall(c) was not 0. Equivalent code would be

    if (isgreat(a))
      if (issmall(b))
        ++x;
    if (issmall(c)) ++y;
    else if (issmall(d)) ++y;

4.4. The ternary or conditional operator. The ternary operator is very similar to an if statement, only that it is an expression that returns the value of the chosen branch:

    size_t size_min(size_t a, size_t b) {
      return (a < b) ? a : b;
    }

Similar to the operators && and || the second and third operand are only evaluated if they are really needed. The macro sqrt from tgmath.h computes the square root of a non-negative value. Calling it with a negative value raises a domain error.

    #include <tgmath.h>

    #ifdef __STDC_NO_COMPLEX__
    # error "we need complex arithmetic"
    #endif

    double complex sqrt_real(double x) {
      return (x < 0) ? CMPLX(0, sqrt(-x)) : CMPLX(sqrt(x), 0);
    }

In this function sqrt is only called once, and the argument to that call is never negative.
So sqrt_real is always well behaved, no bad values are ever passed to sqrt. Complex arithmetic and the tools used for it need the header complex.h which is#include <complex.h> indirectly included by tgmath.h. They will be introduced later in Section 5.5.6.#include <tgmath.h>
In the example above we also see conditional compilation that is achieved with preprocessor directives: the #ifdef construct ensures that we hit the #error condition only if the macro __STDC_NO_COMPLEX__ is defined, that is, only if the platform lacks complex arithmetic.

4.5. Evaluation order. Of the above operators we have seen that &&, || and ?: condition the evaluation of some of their operands. This implies in particular that for these operators there is an evaluation order on the operands: the first operand, since it is a condition for the remaining ones, is always evaluated first:

Rule 1.4.5.1 &&, ||, ?: and , evaluate their first operand first.

Here, the comma (,) is the only operator that we haven't introduced yet. It evaluates its operands in order, and the result is then the value of the right operand. E.g. (f(a), f(b)) would first evaluate f(a), then f(b), and the result would be the value of f(b). This feature is rarely useful in clean code and is a trap for beginners. E.g. A[i, j] is not a two-dimensional index for matrix A, but results just in A[j].

Rule 1.4.5.2 Don't use the , operator.

Other operators don't have an evaluation restriction. E.g. in an expression such as f(a)+g(b) there is no pre-established ordering specifying whether f(a) or g(b) is to be computed first. If any of the functions f or g works with side effects, e.g. if f modifies b behind the scenes, the outcome of the expression will depend on the chosen order.

Rule 1.4.5.3 Most operators don't sequence their operands.

That chosen order can depend on your compiler, on the particular version of that compiler, on compile time options or just on the code that surrounds the expression. Don't rely on any such particular sequencing; it will bite you.

The same holds for the arguments of functions. In something like

    printf("%g and %g\n", f(a), f(b));

we wouldn't know which of the last two arguments is evaluated first.

Rule 1.4.5.4 Function calls don't sequence their argument expressions.

The only reliable way not to depend on evaluation ordering of arithmetic expressions is to ban side effects:

Rule 1.4.5.5 Functions that are called inside expressions should not have side effects.
5. Basic values and data

We will now change the angle of view from the way "how things are to be done" (statements and expressions) to the things on which C programs operate, values and data. A concrete program at an instance in time has to represent values. Humans have a similar strategy: nowadays we use a decimal presentation to write numbers down on paper, a system that we inherited from the Arabic culture. But we have other systems to write numbers: Roman notation, e.g., or textual notation. To know that the word "twelve" denotes the value 12 is a non-trivial step, and reminds us that European languages denote numbers not entirely in decimal but also in other systems. English is mixing with base 12, French with bases 16 and 20. For non-natives in French such as myself, it may be difficult to spontaneously associate "quatre vingt quinze" (four times twenty and fifteen) with the number 95.

Similarly, representations of values in a computer can vary "culturally" from architecture to architecture or are determined by the type that the programmer gave to the value. What representation a particular value has should in most cases not be your concern; the compiler is there to organize the translation between values and representations back and forth.

Not all representations of values are even observable from within your program. They only are so if they are stored in addressable memory or written to an output device. This is another assumption that C makes: it supposes that all data is stored in some sort of storage called memory that allows values to be retrieved from different parts of the program at different moments in time. For the moment only keep in mind that there is something like an observable state, and that a C compiler is only obliged to produce an executable that reproduces that observable state.

5.0.1. Values. A value in C is an abstract entity that usually exists beyond your program, the particular implementation of that program and the representation of the value during a particular run of the program. As an example, the value and concept of 0 should and will always have the same effects on all C platforms: adding that value to another value x will again be x, evaluating a value 0 in a control expression will always trigger the false branch of the control statement. C has the very simple rule

Rule 1.5.0.6 All values are numbers or translate to such.

This really concerns all values a C program is about, whether these are the characters or texts that we print, truth values, measures that we take, or relations that we investigate. First of all, think of these numbers as mathematical entities that are independent of your program and its concrete realization.

The data of a program execution are all the assembled values of all objects at a given moment. The state of the program execution is determined by:
  • the executable
  • the current point of execution
  • the data
  • outside intervention such as IO from the user.
If we abstract from the last point, an executable that runs with the same data from the same point of execution must give the same result. But since C programs should be portable between systems, we want more than that. We don't want the result of a computation to depend on the executable (which is platform specific) but ideally only on the program specification itself.

5.0.2. Types. An important step in that direction is the concept of types. A type is an additional property that C associates with values. Up to now we already have seen several such types, most prominently size_t, but also double or bool.
Rule 1.5.0.7 All values have a type that is statically determined.

Rule 1.5.0.8 Possible operations on a value are determined by its type.

Rule 1.5.0.9 A value's type determines the results of all operations.

5.0.3. Binary representation and the abstract state machine. Unfortunately, the variety of computer platforms is such that the C standard cannot completely impose the results of the operations on a given type. Things that are not completely specified as such by the standard are e.g. how the sign of a signed type is represented, the so-called sign representation, or to which precision a double floating point operation is performed, the so-called floating point representation. C only imposes as many properties on all representations as are needed so that the results of operations can be deduced a priori from two different sources:
  • the values of the operands
  • some characteristic values that describe the particular platform.
E.g. the operations on the type size_t can be entirely determined when inspecting the value of SIZE_MAX in addition to the operands. We call the model to represent values of a given type on a given platform the binary representation of the type.

Rule 1.5.0.10 A type's binary representation determines the results of all operations.

Generally, all information that we need to determine that model is within reach of any C program: the C library headers provide the necessary information through named values (such as SIZE_MAX), operators and function calls.

Rule 1.5.0.11 A type's binary representation is observable.

This binary representation is still a model and so an abstract representation in the sense that it doesn't completely determine how values are stored in the memory of a computer or on a disk or other persistent storage device. That representation would be the object representation.
In contrast to the binary representation, the object representation usually is of not much concern to us, as long as we don't want to hack together values of objects in main memory or have to communicate between computers that have a different platform model. Much later, in Section 12.1, we will see that we may even observe the object representation if such an object is stored in memory and we know its address.

As a consequence, all computation is fixed through the values, types and their binary representations that are specified in the program. The program text describes an abstract state machine that regulates how the program switches from one state to the next. These transitions are determined by value, type and binary representation only.

Rule 1.5.0.12 (as-if) Programs execute as if following the abstract state machine.

5.0.4. Optimization. How a concrete executable achieves this goal is left to the discretion of the compiler creators. Most modern C compilers produce code that doesn't follow the exact code prescription; they cheat wherever they can and only respect the observable states of the abstract state machine. For example, a sequence of additions with constant values such as

    x += 5;
    /* do something else without x in the mean time */
    x += 7;

may in many cases be done as if it were specified as either

    /* do something without x */
    x += 12;

or

    x += 12;
    /* do something without x */

The compiler may perform such changes to the execution order as long as there will be no observable difference in the result, e.g. as long as we don't print the intermediate value of x and as long as we don't use that intermediate value in another computation.

But such an optimization can also be forbidden because the compiler can't prove that a certain operation will not force a program termination. In our example, much depends on the type of x. If the current value of x could be close to the upper limit of the type, the innocent looking operation x += 7 may produce an overflow. Such overflows are handled differently according to the type. As we have seen above, overflow of an unsigned type is not a problem, and the result of the condensed operation will always be consistent with the two separate ones. For other types such as signed integer types (signed) or floating point types (double) an overflow may "raise an exception" and terminate the program. So in these cases the optimization cannot be performed.

This allowed slackness between program description and abstract state machine is a very valuable feature, commonly referred to as optimization. Combined with the relative simplicity of its language description, this is actually one of the main features that allows C to outperform other programming languages that have a lot more bells and whistles. An important consequence of the discussion above can be summarized as follows.

Rule 1.5.0.13 Type determines optimization opportunities.

5.1. Basic types. C has a series of basic types and some means of constructing derived types from them that we will describe later in Section 6.

Mainly for historical reasons, the system of basic types is a bit complicated and the syntax to specify such types is not completely straightforward.
There is a first level of specification that is entirely done with keywords of the language, such as signed, int or double. This first level is mainly organized according to C internals. On top of that there is a second level of specification that comes through header files and for which we already have seen examples, too, namely size_t or bool. This second level is organized by type semantics, that is by specifying what properties a particular type brings to the programmer.

We will start with the first level specification of such types. As we already discussed above in Rule 1.5.0.6, all basic values in C are numbers, but there are numbers of different kinds. As a principal distinction we have two different classes of numbers, with two subclasses each, namely unsigned integers, signed integers, real floating point numbers and complex floating point numbers.

All these classes contain several types. They differ according to their precision, which determines the valid range of values that are allowed for a particular type. [9] Table 2 contains an overview of the 18 base types. As you can see from that table there are some types which we can't directly use for arithmetic, so-called narrow types. As a rule of thumb we get

Rule 1.5.1.1 Each of the 4 classes of base types has 3 distinct unpromoted types.

[9] The term precision is used here in a restricted sense as the C standard defines it. It is different from the accuracy of a floating point computation.
TABLE 2. Base types according to the four main type classes. Types with a grey background in the original (marked here with *) don't allow for arithmetic; they are promoted before doing arithmetic. Type char is special since it can be unsigned or signed, depending on the platform. All types in the table are considered to be distinct types, even if they have the same class and precision.

  class                        systematic name          other name
  integers        unsigned     _Bool *                  bool
                               unsigned char *
                               unsigned short *
                               unsigned int             unsigned
                               unsigned long
                               unsigned long long
                  [un]signed   char *
                  signed       signed char *
                               signed short *           short
                               signed int               signed or int
                               signed long              long
                               signed long long         long long
  floating point  real         float
                               double
                               long double
                  complex      float _Complex           float complex
                               double _Complex          double complex
                               long double _Complex     long double complex

Contrary to what many people believe, the C standard doesn't even prescribe the precision of these 12 types; it only constrains them. They depend on a lot of factors that are implementation dependent. Thus, to choose the "best" type for a given purpose in a portable way could be a tedious task if we didn't get help from the compiler implementation.

Remember that unsigned types are the most convenient types, since they are the only types that have an arithmetic that is defined consistently with mathematical properties, namely modular arithmetic. They can't raise signals on overflow and can be optimized best. They are described in more detail in Section 5.5.1.

Rule 1.5.1.2 Use size_t for sizes, cardinalities or ordinal numbers.

Rule 1.5.1.3 Use unsigned for small quantities that can't be negative.

If your program really needs values that may both be positive and negative but don't have fractions, use a signed type, see Section 5.5.5.

Rule 1.5.1.4 Use signed for small quantities that bear a sign.

Rule 1.5.1.5 Use ptrdiff_t for large differences that bear a sign.
If you want to do fractional computation with values such as 0.5 or 3.77189E+89, use floating point types, see Section 5.5.6.

Rule 1.5.1.6 Use double for floating point calculations.

Rule 1.5.1.7 Use double complex for complex calculations.
TABLE 3. Some semantic arithmetic types for specialized use cases

  type        header      context of definition        meaning
  uintmax_t   stdint.h                                 maximum width unsigned integer, preprocessor
  intmax_t    stdint.h                                 maximum width signed integer, preprocessor
  errno_t     errno.h     Appendix K                   error return instead of int
  rsize_t     stddef.h    Appendix K                   size arguments with bounds checking
  time_t      time.h      time(0), difftime(t1, t0)    calendar time in seconds since epoch
  clock_t     time.h      clock()                      processor time

The C standard defines a lot of other types, among them other arithmetic types that model special use cases. Table 3 lists some of them. The first two represent the types with maximal width that the platform supports.

The second pair are types that can replace int and size_t in certain contexts. The first, errno_t, is just another name for int to emphasize the fact that it encodes an error value; rsize_t, in turn, is used to indicate that an interface performs bounds checking on its "size" parameters.

The two types time_t and clock_t are used to handle times. They are semantic types, because the precision of the time computation can be different from platform to platform. The way to have a time in seconds that can be used in arithmetic is the function difftime: it computes the difference of two timestamps. clock_t values represent the platform's model of processor clock cycles, so the unit of time here is usually much below the second; CLOCKS_PER_SEC can be used to convert such values to seconds.

5.2. Specifying values. We have already seen several ways in which numerical constants, so-called literals, can be specified:

  123
      decimal integer constant. The most natural choice for most of us.
  077
      octal integer constant. This is specified by a sequence of digits, the first being 0 and the following between 0 and 7, e.g. 077 has the value 63. This type of specification has merely historical value and is rarely used nowadays. There is only one octal literal that is commonly used, namely 0 itself.
  0xFFFF
      hexadecimal integer constant. This is specified by a start of 0x followed by a sequence of digits between 0, ..., 9, a, ..., f, e.g. 0xbeaf is value 48815. The a, ..., f and x can also be written in capitals, 0XBEAF.
  1.7E-13
      decimal floating point constants. The version that just has a decimal point is quite familiar, but there is also the "scientific" notation with an exponent: in the general form, mEe is interpreted as $m \cdot 10^{e}$.
  0x1.7aP-13
      hexadecimal floating point constants. Usually used to describe floating point values in a form that makes it easy to specify values that have exact representations. The general form 0XhPe is interpreted as $h \cdot 2^{e}$. Here h is specified as a hexadecimal fraction. The exponent e is still specified as a decimal number.
  'a'
      integer character constant. These are characters put into ' apostrophes, such as 'a' or '?'. These have values that are only implicitly fixed by the C standard. E.g. 'a' corresponds to the integer code for the character "a" of the Latin alphabet. Inside character constants a "\" character has a special meaning. E.g. we already have seen '\n' for the newline character.
  "hello"
      string literals. They specify text, e.g. as we needed it for the printf and puts functions. Again, the "\" character is special as in character constants.

All but the last are numerical constants; they specify numbers. An important rule applies:

Rule 1.5.2.1 Numerical literals are never negative.

That is, if we write something like -34 or -1.5E-23, the leading sign is not considered part of the number but is the negation operator applied to the number that comes after. We will see below where this is important. Bizarre as this may sound, the minus sign in the exponent is considered to be part of a floating point literal.

In view of Rule 1.5.0.7 we know that all literals must not only have a value but also a type. Don't mix up the fact of a constant having a positive value with its type, which can be signed.

Rule 1.5.2.2 Decimal integer constants are signed.

This is an important feature: we'd probably expect the expression -1 to be a signed, negative value.

To determine the exact type for integer literals we always have a "first fit" rule. For decimal integers this reads:

Rule 1.5.2.3 A decimal integer constant has the first of the 3 signed types that fits it.

This rule can have surprising effects. Suppose that on a platform the minimal signed value is $-2^{15} = -32768$ and the maximum value is $2^{15} - 1 = 32767$. The constant 32768 then doesn't fit into signed and is thus signed long. As a consequence the expression -32768 has type signed long. Thus the minimal value of the type signed on such a platform cannot be written as a literal constant.[Exs 10]

Rule 1.5.2.4 The same value can have different types.

Deducing the type of an octal or hexadecimal constant is a bit more complicated. These can also be of an unsigned type if the value doesn't fit for a signed one. In our example above the hexadecimal constant 0x7FFF has the value 32767 and thus type signed.
Unlike the decimal constant, the constant 0x8000 (value 32768 written in hexadecimal) then is an unsigned, and the expression -0x8000 again is unsigned.[Exs 11]

Rule 1.5.2.5 Don't use octal or hexadecimal constants to express negative values.

Or if we formulate it positively

Rule 1.5.2.6 Use decimal constants to express negative values.

Integer constants can be forced to be unsigned or to be of a type of minimal width. This is done by appending "U", "L" or "LL" to the literal. E.g. 1U has value 1 and type unsigned, 1L is signed long and 1ULL has the same value but type unsigned long long.[Exs 12]

A common error is to try to assign a hexadecimal constant to a signed under the expectation that it will represent a negative value. Consider something like int x = 0xFFFFFFFF.

[Exs 10] Show that if the minimal and maximal values for signed long long have similar properties, the smallest integer value for the platform can't be written as a combination of one literal with a minus sign.
[Exs 11] Show that if in that case the maximum unsigned is $2^{16} - 1$, then -0x8000 has value 32768, too.
[Exs 12] Show that the expressions -1U, -1UL and -1ULL have the maximum values and type of the three usable unsigned types, respectively.
TABLE 4. Examples for constants and their types, under the supposition that signed and unsigned have the commonly used representation with 32 bit.

  constant x    value          type           value of −x
  2147483647    +2147483647    signed         −2147483647
  2147483648    +2147483648    signed long    −2147483648
  4294967295    +4294967295    signed long    −4294967295
  0x7FFFFFFF    +2147483647    signed         −2147483647
  0x80000000    +2147483648    unsigned       +2147483648
  0xFFFFFFFF    +4294967295    unsigned       +1
  1             +1             signed         −1
  1U            +1             unsigned       +4294967295

This is done under the assumption that the hexadecimal value has the same binary representation as the signed value −1. On most architectures with 32 bit signed this will be true (but not on all of them), but then nothing guarantees that the effective value +4294967295 is converted to the value −1.

You remember that value 0 is important. It is so important that it has a lot of equivalent spellings: 0, 0x0 and '\0' are all the same value, a 0 of type signed int. 0 has no decimal integer spelling: 0.0 is a decimal spelling for the value 0 but seen as a floating point value, namely with type double.

Rule 1.5.2.7 Different literals can have the same value.

For integers this rule looks almost trivial; for floating point constants this is less obvious. Floating point values are only an approximation of the value they represent literally, because binary digits of the fractional part may be truncated or rounded.

Rule 1.5.2.8 The effective value of a decimal floating point constant may be different from its literal value.

E.g. on my machine the constant 0.2 has in fact the value 0.2000000000000000111, and as a consequence constants 0.2 and 0.2000000000000000111 have the same value.

Hexadecimal floating point constants have been designed because they better correspond to binary representations of floating point values. In fact, on most modern architectures such a constant (that has not too many digits) will exactly correspond to the literal value.
Unfortunately, these beasts are almost unreadable for mere humans.

Finally, floating point constants can be followed by the letters f or F to denote a float or by l or L to denote a long double. Otherwise they are of type double. Beware that different types of constants generally lead to different values for the same literal. A typical example:

             float             double                  long double
  literal    0.2F              0.2                     0.2L
  value      0x1.99999AP-3F    0x1.999999999999AP-3    0xC.CCCCCCCCCCCCCCDP-6L

Rule 1.5.2.9 Literals have value, type and binary representation.

5.3. Initializers. We already have seen (Section 2.3) that the initializer is an important part of an object definition. Accessing uninitialized objects has undefined behavior; the easiest way out is to avoid that situation systematically:

Rule 1.5.3.1 All variables should be initialized.