## Sequence points

A particular type of question is asked time after time in C programming forums. Two things about such questions irritate the experienced programmers in these forums. Firstly, questions of this type are so common that many do not want to respond to them at all, even when responding means merely posting a link to another thread where a similar question has been answered. Secondly, and more importantly, even when someone provides the correct answer, many others ignore it and fill up the thread with incorrect answers.

The questions usually involve finding the output of code like this.

#include <stdio.h>

int main()
{
    int i = 5;
    printf("%d %d %d\n", i, i--, ++i);
    return 0;
}


The output is 5 6 5 when compiled with gcc and 6 6 6 when compiled with the Microsoft C/C++ compiler that comes with Microsoft Visual Studio. The versions of the compilers with which I got these outputs are gcc (Debian 4.3.2-1.1) 4.3.2 and Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86.

Another such frequently asked question looks more or less like this.

#include <stdio.h>

int main()
{
    int a = 5;
    a += a++ + a++;
    printf("%d\n", a);
    return 0;
}


In this case, I got 17 as the output with both compilers.

The behaviour of such C programs is undefined. In the statements printf("%d %d %d\n", i, i--, ++i); and a += a++ + a++;, there is no sequence point between the modifications of the variable; the semicolon that ends each statement marks the sequence point by which they must all be complete. C guarantees that all side effects of a given expression are completed by the next sequence point in the program. If the value of a variable is modified more than once between two consecutive sequence points, the behaviour is undefined. Such code may behave differently when compiled with different compilers.

Before I quote the relevant sections from the ISO/IEC standard, let me quote something from K&R. In Section 2.12 (Precedence and Order of Evaluation) of the book, the authors write,

C, like most languages, does not specify the order in which the operands of an operator are evaluated. (The exceptions are &&, ||, ?:, and ','.) For example, in a statement like
x = f() + g();
f may be evaluated before g or vice versa; thus if either f or g alters a variable on which the other depends, x can depend on the order of evaluation. Intermediate results can be stored in temporary variables to ensure a particular sequence.

In the next paragraph, they write,

Similarly, the order in which function arguments are evaluated is not specified, so the statement
printf("%d %d\n", ++n, power(2, n));   /* WRONG */

can produce different results with different compilers, depending on whether n is incremented before power is called. The solution, of course, is to write
++n;
printf("%d %d\n", n, power(2, n));


They provide one more example in this section.

One unhappy situation is typified by the statement
a[i] = i++;
The question is whether the subscript is the old value of i or the new. Compilers can interpret this in different ways, and generate different answers depending on their interpretation.

If you want to read more on this, download the ISO/IEC 9899 C standard and turn to page 438 for Annex C (Sequence points). It lists all the sequence points. The end of a full expression, such as an expression statement terminated by ;, is one of them. The + and ++ operators are not sequence points.

Next, read section 5.1.2.3 (Program execution), point 2.

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)

## A problem of two files

Once upon a time there was a hacker called John. He ran many websites on various web servers. The web servers were located in Canada. They were directly connected to the Internet. There were no proxy servers or firewalls between the servers and the Internet.

The websites were very popular and thousands of users from all over the world visited them every day. From a glance at the web servers' access logs, he knew that his servers received most of their hits from China. One day he wanted to create a report of the usage of his various web servers. The old-fashioned John decided to write his own tool to do this.

He wrote a C program to process logs from these web servers. The tool would pull down the access logs from each web server via SCP and process them to create reports. For every valid request found in the log, a certain function called add_to_statistics was called. The file containing the definition of the function began in this manner:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>


After those two lines in this function, the struct in_addr objects were not used at all. Later in the code the client IP address in client_ip was used to find the geographical location of the client from a database that mapped IP address ranges to countries. Finally, the code plotted a pie chart for each web server. The chart showed how many hits a server had received from each country. The larger the number of hits from a country to a server, the larger that country's slice in the pie chart for that web server.

The colour to be used to fill the pie for each country was stored in a property file which began like this:

Canada: rgb(0, 0, 255)
India: rgb(0, 255, 0)
US: rgb(255, 0, 0)
China: rgb(255, 255, 0)

Which color was used most in the pie charts generated by the tool? Why?

Update: Prunthaban has answered the question correctly. The answer is: blue. In each pie chart Canada occupied the whole pie and hence blue was used to fill the entire pie chart for each web server. From the man page of inet_ntoa:

The inet_ntoa() function converts the Internet host address in, given in network byte order, to a string in IPv4 dotted-decimal notation. The string is returned in a statically allocated buffer, which subsequent calls will overwrite.

## Compiler taking input while compiling

I came across this tricky question a few weeks ago.

Write a small C program, which while compiling takes another program from input terminal, and on running gives the result for the second program. (NOTE: The key is, think UNIX).

At first, it looked like a strange puzzle. How can C code make the compiler ask for input while it is already compiling the code? After a little thinking, the solution became obvious. I assumed that the compiler is allowed to accept the input during the preprocessing step.

The preprocessor allows us to direct the compiler to include another file with the #include directive. So, why not use it to read the standard input?

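So, the solution has only one line of code: #include "/dev/stdin". On a Windows machine, CON represents the standard input, so: #include "CON". In the session below, reader.c contains only that one line; the program that follows it was typed at the terminal while the compiler was reading its input, and the last line is the output of the resulting executable.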

andromeda:/home/susam# cat reader.c
#include "/dev/stdin"
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("hello, world\n");
    return 0;
}
hello, world


## Obfuscating main function

Today, I was amused to see the following code in an Orkut community.

#include <stdio.h>

#define decode(s,t,u,m,p,e,d) m ## s ## u ## t
#define begin decode(a,n,i,m,a,t,e)

begin()
{
    printf("Stumped?\n");
}


Three years ago, when I was in college, I ran a mailing list called 'ncoders' which had around 150 members. I deleted it last year after the group became inactive and I lost interest in it. We used to discuss programming and Internet protocols in the group. One day we were discussing how we could obfuscate the main() function in C so that it didn't seem to appear in the code. I wrote the above code and posted it to the group. That's why I was amused to see it again on Orkut today. Probably, the code survived in the inboxes of some subscribers even though the community died. I searched the web to see if this code had been posted on other websites as well, and indeed I found many occurrences of it.

Here's an explanation of the code.

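Two tokens can be concatenated in the preprocessed code using the ## preprocessor operator. In decode(a,n,i,m,a,t,e), the parameters s, t, u and m receive the arguments a, n, i and m respectively, so m ## s ## u ## t pastes together m, a, i and n. Now, the meaning of the macro decode(s,t,u,m,p,e,d) becomes clear.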

So, begin() becomes decode(a,n,i,m,a,t,e)(), decode(a,n,i,m,a,t,e)() becomes m ## a ## i ## n(), and m ## a ## i ## n() becomes main().