A quick introduction to C - learn programming in one day
============================================================
v1.00

Disclaimer: This is very dull stuff for everyone who can really program in C,
so please don't waste your time reading it. If you still do, don't flame me
telling me that it was lame. But for everyone who needs to get anything from
basic knowledge to a thorough understanding of plain C, this tutorial should
be a fast and easy to understand guide. This document might be improved for
better comprehensibility later, if necessary. Any constructive input is very
welcome. - Mixter

http://mixter.warrior2k.com/C-intro.tar.gz
http://mixter.void.ru/C-intro.tar.gz

Contents

I.    Compiler and linker                    - Running gcc and ld
II.   Functions, types and function calls    - The basic syntax of C
III.  Predefined keywords and expressions    - The commonly used set of C keywords
IV.   Library functions and system calls     - Some useful library functions for C
V.    Arrays and pointers                    - Pointer and memory concept in C
VI.   Structures                             - Structured programming
VII.  Advanced stuff                         - Some more important-to-know things
VIII. Debugging, etc.                        - Some tips & tricks, and gdb usage

I. Compiler and linker

The compiler translates your C source into binary machine language format.
The compiler is called cc, gcc, or c++/g++, if you're using UNIX. If you're
using UNIX without a compiler, install one first. If you want to follow this
tutorial from Windows, download the gcc port Cygwin.

( [] means optional/not required. ... means more similar arguments follow. )

1. cc -c [-o filename.o] source.c [source2.c ...]
   Generate object files - binaries with your code translated to machine
   code, but not an executable program (the functions aren't linked yet,
   there are no executable headers/tables and no entry/starting function).
   (-o can only be combined with -c when there is a single source file.)

2. ld -o executable object.o [object2.o ...]
   Link object files to a binary. The function that will be called on
   startup must be there, which is always the function called "main()".
   (In practice, run "cc -o executable object.o ..." for this step: cc calls
   ld and adds the startup code and the C library, which a bare ld command
   line lacks.)

3.
cc -o executable source.c [source2.c source3.c ...]
   Compile and link an executable from source files. Do 1. and 2. together.

Basic flags: ( Most of them can be at almost any place in the command line. )

-Wall                  warn about most common problems
-O, -O2, -O3           code optimization for speed
-I PATH                search PATH for includes
-g, -ggdb, -g3         compile with debugging info
-L PATH                search PATH for libraries
-ansi, -pedantic       be more strict on sloppy code
-v                     display what gcc is doing
-include file          include a header at compile time
-Dmacro[=definition], -Umacro
                       define / undefine a macro at compile time

Some preprocessor commands (can be anywhere in the source code, each on a
line of its own):

#include "filename"    will read the file 'filename' and interpret it as if
                       its contents were in the source right at the position
                       of the #include
#include <filename>    do the same, but search for the file in the include
                       path(s)
#define NAME           defines NAME with an empty replacement, so that
                       #ifdef NAME is true (the command line flag -DNAME
                       defines NAME as 1 instead)
#define NAME 2         will internally replace all occurrences of NAME with 2
#undef NAME            removes any previous definition of NAME
#define MACRO(x)       will define a macro (which works much like a function)

II. Functions, types and function calls

Every function looks like this:

void   func   ( int a )   { }
[1]    [2]    [3]         [4]

[1] TYPE: Every function has a return type, the type of the value it returns
(void means that it returns nothing). Values are stored in variables.
Variables can be of the predefined types. The following keywords are types.
The sizes and ranges shown are typical for current systems; the C standard
only guarantees minimum sizes:

char    1 byte  (one character like 'a' is one byte)    Value: -128 to 127
short   2 bytes (used for small numerical values)       Value: -32768 to 32767
int     4 bytes (numerical values)                      Value: -(2^31) to 2^31 - 1
float   4 bytes (fraction values, like 1.5 0.02 10.10, etc.)
long    4 or 8 bytes (big numerical values)             Value: -(2^63) to 2^63 - 1
                                                        when it is 8 bytes

By default, these types are signed, meaning they can be negative (whether a
plain char is signed depends on the compiler). If preceded by the keyword
"unsigned", the whole value range will be used for non-negative values only,
and the maximum of a variable is about twice as big. A variable preceded by
the keyword "const" (constant) gets its value when it is declared and cannot
be modified afterwards.

[2] function name.
every custom name for functions and variables in C starts with a letter or
_ and can contain letters, numbers and _. keep names under 32 chars: ANSI C
only guarantees that the first 31 characters of a local name are significant

[3] arguments. each function has at least an empty set of () parentheses
after its name. an argument is a type and an argument name, which can be
used inside the function. arguments are separated by commas:
func (int abc, char xyz)

[4] the function body containing C code, starting with { and ending with }

Inside the function body, there can be declarations of variables [1] (in
ANSI C at the beginning of a block, before anything else), expressions [2],
and function calls [3]. All of these are always ended by a semicolon ; at
the end.

[1] Declarations look like "type name;". This allocates a variable of the
custom name which can then be accessed, e.g.: int i;
Variables can also be initialized, meaning that they are given a value with
the = sign right in the declaration, e.g.: char xyz = 'A';

[2] Expressions can be lots of things, like boolean operations, assigning
values to variables and the use of the predefined C keywords. Like function
calls, every expression implicitly returns a value.

assign:          i = 11;     (returns 11)
                 i = i - 1;  (returns i-1) ...
compare:         i == 11;    (returns 1 if i is 11, else 0)
                 i != 11;    is the opposite
increase by one: i++;        (returns the old value of i; ++i returns the
                              new value)
decrease:        i--;

[3] Function calls look like this: name ( [args] );
If the function has no arguments, it can be called as name();
Else, each argument MUST be passed a valid value. E.g.
int test (int i, char b) must be called with something like: test(1,'x');
Of course, either values or names of variables can be passed as arguments
to the function.

A sample program you should understand:

int f2 (int y)
{
    y = y * 5;
    return y--;
}

void f1 (int x)
{
    int z = x;
    x = f2 (z++);
}

int main ()
{
    int abc = 1 + 2;
    f1 (abc);
    return 0;
}

III. Predefined keywords and expressions

Following are the keywords in C and their usage.
Some keywords you already know include the types, and the mathematic/boolean
operators, which are:

+ (plus)  - (minus)  * (times)  / (divide)  % (remainder of a division)
^ (bitwise exclusive or; C has no power operator, use pow() from math.h)
! (not)  = (assign)  == (compare)  < (compare if smaller)
> (compare if bigger)  ++ (increment)  -- (decrement)  != (compare if not)
&& (and: a&&b is 1 if a is nonzero and b is nonzero)
|| (or: a||b is 1 if a is nonzero or b is nonzero)
+= (add the value on the right to the variable on the left), /= -= *= %= ^=

/* */    comments. anything /* inside */ such comments will be ignored by gcc

return.  return is used to leave the current function and return a value to
         the function that called this one. e.g. if your function is
         int func(), you can return an int value. if your function is
         char func(), you can return a char value, and so on. in void
         functions, you just return;

{}       braces. braces are used to mark the beginning and the end of a
         block of code. at the beginning of a new block, variables can be
         declared. blocks are necessary for loops and conditions where more
         than one command is executed

break.   whenever in a for/while loop, break out of it. whenever in a
         switch, leave it when break is reached. syntax: break;

sizeof.  returns the size of a variable or type in bytes. sizeof(int)
         returns 4 on systems where an int takes 4 bytes. sizeof(char)
         always returns 1, sizeof(char[8]) returns 8.

while.   used to make a loop, a block of code that repeats again as long as
         a given expression is true (isn't 0).
         syntax: while (expr) code; (while expr isn't 0, run code again).
         example: int i = 1; while (i) { i--; i++; }
         (this loop never ends, because i is 1 again every time the
         expression is tested)

for.     loop given an initial command, an expression, and a command that
         is run on each looping.
         syntax: for (command;expr;command) code;
         same example with for: int i; for (i=1; i; i--) i++;

switch / case. compare an expression to several values and execute code
         that is supposed to run when the expressions match.
         example:
         switch (i)
         {
           case 12:  /* i is 12 */ break;
           case 0:   /* i is 0 */ break;
           case -32: /* i is -32 */ break;
           default:  /* i is none of the above */ break;
         }

if.      run code if an expression is true.
         syntax: if (expr) code; [ else code; ]
         example: if (i == j) { j -= k; j--; } else k = j;
         (if i equals j subtract k from j and decrease j. or else assign
         j's value to k)

?.       the question mark is used in what is called ternary condition. it
         is a simplified if/else, syntax: value?expr1:expr2
         (if value is nonzero, return expr1, else expr2. the expr's can be
         anything from functions to complex expressions)
         example: biggest = ( i > j ) ? i : j;

A sample program you should understand:

#include <stdio.h>   /* needed for printf, see chapter IV */

int main()
{
    int i = 100;

    while (i > 50)
    {
        switch (--i)
        {
          case 75: i++; break;
        }
        if (i == 76) i -= 2;
    }
    for (i = 200;i > 25;i--)
    {
        if (i % 2) i /= 2;
        if (i % 10) break;
    }
    printf ("%d %s be divided by five\n", i, ( !(i%5) ? "can" : "cannot" ));
    return 0;
}

IV. Library functions and system calls

Before continuing with the actual C language, you need the most basic system
and library functions. They are called just like your own, custom functions.
Some are written in assembler, to wrap around low level system calls of the
operating system, others are complex loops and input handling functions that
would cost a lot of time to be written manually at low level. There are
around 200 system functions and 2000 basic library functions in a normal
operating system. To learn more than the basic functions, read the man pages
in /usr/man/man2 and /usr/man/man3 (e.g. with "man 2 open" or
"man 3 printf"). They have a comprehensible format, divided into
description, necessary include header files, function return types and
arguments, detailed description of what the function does, and return value.
From this point, you may encounter functions that aren't explained in this
tutorial. If you don't understand them, just look them up in the man pages.

void exit(int);
- terminate a program.
(Note: This function returns void, meaning it returns nothing, and takes one
argument, of type int - a number).

int printf(char *format, ...);
- printf takes a "format" argument which must be a text string (char * means
text string containing one or more characters, see chapter V. for a detailed
explanation). It takes an arbitrary number of variables after that, and
expands the format arguments in its first string, displaying the supplied
variables as part of the text string. Some format arguments are: %d for an
int, %s for a 'char *' string, %c for a single character, %f for float.
In text strings, the compiler also translates certain characters preceded by
a \ backslash to control characters, like \n to a new line (enter), \r to
carriage return, \x01 to hexadecimal 1 in the ascii table.
printf ("%c %d %s!\n", 'C', 4, "idiots"); displays: "C 4 idiots!"
Quotes "" have to be used for marking text strings in C, '' for single chars.
Related functions: fprintf, sprintf, vsprintf, vfprintf, scanf, sscanf, etc.

int open(char *file, int flags);
- open a file on the disk. the flags can be, for example, O_RDONLY for
reading and O_WRONLY|O_CREAT for creating and writing a file. Those are an
example of internal values #define'd in the headers. When O_CREAT is used,
open() takes a third argument, the permissions of the new file, e.g. 0644.
The return value of an open() call is the file descriptor, a reference
number to the open file, and must be assigned to an int variable, i.e.
int filedesc = open ("/etc/motd", O_RDONLY); On failure, -1 is returned.
Related: fopen, close, fclose, read, write

int read(int fd, char *buf, int siz);
- read data from file descriptor fd, store it in char *buf, and don't read
more than <siz> bytes. read() returns how many bytes were actually read
(0 at the end of the file, -1 on failure). read() does not put a terminating
zero byte after the text.
ex.: read(filedesc,mytext,100);
Related: write, open, close, lseek, readdir, fcntl, select, ioctl, fwrite

int write(int fd, char *buf, int size);
- write our text from char *buf to file descriptor fd, attempt to write
<size> bytes. returns the amount of written bytes.
ex.: write(1,"Hello world",11);
(Explanation: file descriptors 0, 1, and 2 are already open when a program
starts. 0 is used for reading direct input, 1 for writing to the current
terminal, and 2 for writing to the user's terminal on a separate channel,
the "standard error" output).

int close(int fd);
- close file. (the file descriptor will become meaningless)
Related: fclose, fcntl, shutdown, unlink

int snprintf(char *str, int siz, char *format, ...);
- like printf(), but the output will not be displayed, but instead saved in
the variable buffer char *str (the first argument). siz is the maximum
amount of bytes that may be written to str, including the terminating zero
byte. (text string buffers have a fixed size that must not be exceeded). the
sprintf(char *str, char *format, ...) function can also be used for similar
tasks, when the input variables are of predictable length.
Related: sscanf, scanf, strcpy, strncpy, strdup, memcpy, strcat, strncat

int scanf(char *format, ...);
- scanf uses a format argument like printf, but it reads user input. it
returns the number of variables that have successfully been filled with user
input. user input is read as a text string and converted to fit into the
variables via the format string.
ex.: scanf("%d %c", &x, &y); ...expects the user to type a number and a
character, separated by a space (x must be an int and y a char).
Related: sscanf, vscanf, vsscanf, vfscanf, getc, printf, strtol

int strcmp(char *string1, char *string2);
- compare text in string1 and string2. return 0 if both strings are exactly
the same, else a value bigger or smaller than 0 (depending on which string
comes first when sorted).
Related: strncmp, strstr, memcmp, strcasecmp, strncasecmp, strcoll

int strlen(char *string);
- return the length of the string in characters, not counting the
terminating zero byte.

int atoi(char *string);
- convert a number in text form to int, and return the value as int.
ex.: atoi("12345"); returns 12345
Related: atol, atof, strtol, strtoul, strtod

int sleep(int seconds);
- suspend the program for an amount of seconds.
Related: usleep, setitimer, getitimer, select, time, signal, alarm

int isalnum(int c);
- check the value of c, as a character, and return a nonzero value if the
character is a letter or a number, else return 0.
Related: isalpha, isascii, isblank, iscntrl, isdigit, isgraph, islower, ...

int execl(char *filename, char *arg0, ...);
- stop executing the current program, and execute the program specified by
filename instead with the given arguments. arg0 is by convention the name of
the program itself. the last argument must be NULL (which is defined to 0).
ex.: execl("/bin/ls", "ls", "-la", "/", NULL);
stops the program and runs /bin/ls

A sample program you should understand:

#include <stdio.h>   /* these includes are needed for the system calls */
#include <stdlib.h>  /* you can find their names in the man pages */
#include <string.h>
#include <ctype.h>
#include <fcntl.h>
#include <unistd.h>

#define HI "enter some text on the next line\n" /* you can #define anything */

int main()
{
    int fd = open("blah", O_WRONLY|O_CREAT, 0644);
    char mytext[100], stuff[100];
    int len;

    write(2, HI, sizeof(HI) - 1);   /* - 1: don't write the trailing zero */
    scanf("%99s", mytext);          /* leave room for the trailing zero */
    printf("you typed %d characters\n", (int)strlen(mytext));
    printf("isalnum(%c) returned %d\n", 0x90, isalnum(0x90));
    printf("now enter a number\n");
    len = read(0, mytext, 99);
    mytext[len > 0 ? len : 0] = 0;  /* read() does not terminate the text */
    printf("atoi(\"%s\") returned %d\n", mytext, atoi(mytext));
    snprintf(stuff, 100, "this is a stupid file %d %d %.10s\n", 8, 9, "10");
    write(fd, stuff, strlen(stuff));
    close(fd);
    printf("strcmp(\"xx\",\"xx\") == %d\nplease wait...\n", strcmp("xx", "xx"));
    sleep(5);
    execl("/bin/sh", "sh", "-c", "/bin/cat blah ; /bin/rm -f blah", NULL);
    exit(0);
}
/* compile and run this program to verify the stuff it does */

V. Arrays and pointers

Unlike most other higher programming languages, C lets the programmer
interact directly with memory addresses, and every variable, array, and
string, has a direct location in memory. Therefore, like assembly language,
C uses pointers, which contain nothing else than memory addresses, and which
can refer to data anywhere in the program's memory.
See chapter II for the sizes of variables.
An array is a number of variables of identical type that are located one
after the other in memory. Such an array can be of any type, but most
frequently is an array of characters, which is simply called text or string.
The way to access an array is to have a pointer to point to its beginning in
memory. Every function that operates with strings or arrays takes a pointer
as argument.

int main() {char *p = "text"; printf("this is a %s\n",p); return 0;}

For example, for each string, the printf() function takes a char pointer,
and dereferences it to the array in memory. It then accesses the string of
characters that are at the memory address directly.

char *name;      declares a pointer variable. a pointer variable is 4 bytes
                 large on 32-bit systems (8 bytes on 64-bit systems) and can
                 contain any memory address in the program's memory. this is
                 a char type pointer. pointers can be any type, but each
                 needs to have a certain type, because they reference raw
                 data in memory, and C needs to recognize it as a certain
                 type of data
                 note: void *, the void pointer is an exception that can
                 reference any type of data in memory. normally, it should
                 not be used.

char name[10];   declares an array of (in this case) 10 times the space of
                 the variable type in bytes. char takes 1 byte, so this
                 allocates a 10 byte array. when name is used in an
                 expression or passed to a function, it acts like a normal
                 char * pointer to the first element of the array

&variable        the & references the address of a variable. any use of & in
                 C preceding a variable always returns its memory address
                 that can be assigned directly to a pointer.
                 ( int i; int *p = &i; ). a use for referencing is passing
                 variables to functions that take pointer arguments:
                 int f(int *var) {return *var + 2;}
                 int main() {int a = 5; printf ("%d + 2 = %d\n", a, f(&a));
                             return 0;}

*variable        * is the opposite, a dereference.
                 it returns the actual value of the contents in memory, as a
                 variable of the pointer's type:
                 int main() {char *text = "hi world"; printf("%c\n", *text);
                             return 0;}

pointer[4]       dereference of the element at index 4 of the array that
                 pointer points to. text[4] of char *text = "hello"; returns
                 'o' (remember that counting in C begins at 0. element 0 is
                 char 'h', so index 4 is the fifth element).

(char) 120               type casting is a method used in C to "convert" a
int i = (int) 'x';       variable from one type to another. to cast types,
float f = (float) i;     precede the variable or value with the new type in
(int) f                  parentheses, i.e.: (type) variable.
                 this expression returns the actual variable/value with the
                 new type. use type casting with care, and only when really
                 needed, because it is used to override the "sanity checks"
                 of the compiler. e.g.:
                 (unsigned int) -1 returns 4294967295 with 4 byte ints (try
                 to find out why ;)
                 *(char *) 0 will crash, since the program casts 0 to a
                 pointer and dereferences it, trying to read memory at the
                 non-existent address zero
                 int main() {char *p = "x"; printf ("%p",(void *)p);
                             return 0;}
                 will print the value of pointer p. the value you'll see is
                 an actual address in memory.
                 int main() {printf ("%p",(void *)"xyz"); return 0;}
                 will do a similar thing.

malloc(int size) malloc is a function to allocate new memory dynamically in
                 the program. the memory is allocated at a different
                 location in memory (the heap). it can have a dynamic size,
                 unlike the memory allocated in normal arrays, like
                 "char[100]" (it is theoretically possible to allocate
                 dynamic sizes, like char[myint], but this is not part of
                 ANSI C (C89) and not supported by every compiler, so it
                 results in un-portable programs). hence, you should use
                 malloc() when you need an amount of memory whose size you
                 can't determine at compile time (such as input/output from
                 the user or from the network. the other approach is to use
                 a buffer, just a big, normal array of which you only use
                 the amount of space that you need each time). the argument
                 to malloc is the size in bytes/characters to be allocated.
                 the returned address is assigned to a pointer (the type
                 cast shown here is optional in C). malloc returns NULL if
                 no memory could be allocated:
                 char *p = (char *)malloc(100);
                 int *i = (int *)malloc(100 * sizeof(int));

free(void *pointer)
                 free must be used to release any unused memory previously
                 allocated by malloc(). e.g.
                 char *p = (char *)malloc(100); /* program here */ free(p);

memset(void *dest, int c, int len)
                 good example of a function that operates with pointers.
                 memset will fill "len" bytes in the memory beginning at the
                 address pointed to by "dest" with the character "c" (it is
                 cast to a char internally).

memcpy(void *dest, void *src, int len)
                 another example. memcpy will copy "len" bytes in the memory
                 at the address pointed to by "src" into the memory at the
                 address pointed to by "dest"

Make sure to compile and run both following programs, modify and debug them
if necessary, until you fully understand them.

A sample program with explanations:

#include <stdio.h>
#include <string.h>

int main()
{
    /* an array of 100 bytes which we just allocated in stack memory with
       this declaration. used as a function argument, charp acts as a
       pointer that contains the address of the array */
    char charp[100];
    /* an un-initialized int pointer, its own size is 4 bytes on 32-bit
       systems (8 on 64-bit), since it is designed to contain memory
       addresses */
    int *intp;
    int blah = 543210;   /* an int variable, an int is 4 bytes */

    intp = &blah;        /* intp now contains the address of the blah integer */

    /* fill 100 bytes of the memory that charp points to (the allocated
       array in this case) with binary zeroes (used as delimiter in text
       strings) */
    memset(charp,0,100);

    /* Write into our array pointed to by charp, with a maximum of 100
       bytes. intp contains the address of int blah.
       The expression *intp dereferences this address and returns the
       contents of the memory address */
    snprintf(charp, 100,
             "The content at memory address %p evaluated as int is: %d",
             (void *)intp, *intp);
    puts(charp);   /* print out the contents of our array/buffer to console */

    memset(charp,0,100);
    snprintf(charp, 100,
             "The content at memory address %p evaluated as char is: %c",
             (void *)intp, *intp);
    puts(charp);
}

A sample program you should understand:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char p[10], *q = p, *xyz = (char *)&q, abc, *x = (char *)malloc(100);
    const char format[] = "0x%lx - 0x%lx = %d, &q = 0x%lx, abc = %c";

    strncpy(p, "abcdefghij", sizeof(p));
    abc = p[3];
    q = (char *)((int *)q + 1);   /* moves q by one int: 4 bytes */
    snprintf (x, 100, format, (unsigned long)q, (unsigned long)p,
              (int)(q - p), (unsigned long)(unsigned char)*xyz, abc);
    printf ("%s\n", x);
    printf ("%c%c%c%c\n", p[2], (p[0]) + 14, x[strlen(x)-1], p[4]);
    free(x);
    return 0;
}

VI. Structures

A structure declaration can consist of declarations of various types and
arrays, and even other structures. An actual structure consists of
variables, arrays, and structs of the declared types, which are allocated in
a consecutive area in memory. That way, a pattern of variables can be
declared once, and each time it is used, the necessary memory for all
elements is allocated at once, and the elements can be addressed as members
of this structure.

A structure declaration in C looks like:

struct somename
{
    int one;
    float two;
    char three[100];
} mystruct;

'somename' is the declaration name. it follows the C keyword 'struct'.
To create an actual structure in memory, we use the declaration name to
allocate enough memory to hold the variables we declared in our pattern:

int main() { struct somename myname; }

The declaration name is optional and not syntactically necessary when a
'struct' declaration is used to directly create a structure, like
'mystruct'.
int one, float two and char three[] are the members of the structure. They
are the variables each structure of this type will consist of.
We can also see how big our somename-structures in memory will be: each 108
bytes with 4 byte ints and floats (one int, 4 bytes, one float, 4 bytes, 100
char = 100 bytes). The compiler may add padding between members, so use
sizeof(struct somename) to get the real size. At the place where you see the
declaration of 'mystruct' (after the closing brace before the semicolon),
variables of the structure type can be declared.

The typedef keyword: a type definition is simply an instruction to the
compiler to create a new keyword for a type. The syntax is:
typedef <existing type> <new name>;

An example of a structured program with explanation:

#include <stdio.h>
#include <string.h>

struct processor   /* declare our structure */
{
    float mhz, volt;
    char generation[4];
};

/* type definitions... type blahh = float, type hmm = char pointer, etc. */
typedef float blahh;
typedef char * hmm;
typedef char x;
typedef struct processor cpu;   /* type "struct processor" = type "cpu" */

int main()
{
    /* normal declarations of variables and structures */
    blahh v;                  /* float */
    hmm id1 = "686";          /* char pointer */
    x * id2 = "586";          /* char pointer */
    struct processor cyrix;   /* processor structure */
    cpu pentium;              /* processor structure */

    v = 3.0;
    cyrix.mhz = 333.33;
    cyrix.volt = v;
    strcpy(cyrix.generation,id1);

    /* structs of one type are identical in size and variables, hence we
       can even copy structs of the same declaration around in memory */
    memcpy (&pentium, &cyrix, sizeof (struct processor));
    pentium.mhz -= 100.0;
    strcpy(pentium.generation,id2);

    printf ("pentium: mhz: %f volt: %f generation: %s\n",
            pentium.mhz, pentium.volt, pentium.generation);
    printf ("cyrix: mhz: %f volt: %f generation: %s\n",
            cyrix.mhz, cyrix.volt, cyrix.generation);
    return 0;
}

There's something more: structure pointers. Basically, this is nothing new,
just putting together what you (should) know about pointers and about
structs already. A structure declaration is a type, and a structure is a
variable just like any other, except that it consists of smaller members of
different types.
So yes, for each structure you declare you can declare pointers of that
structure type, that point to a memory area of the size of your structure.
The only new thing here is the way that you address the actual elements of a
structure in memory that a structure pointer points to. As you noticed from
the example above, elements of a structure are addressed with name.element
(a '.' between the structure name and the member). But when the structure is
not an actual variable but a structure pointer, the elements are
dereferenced using the '->' expression, example:
struct name *namep; namep->element = 1;

An example of structure pointers used for a linked list with explanations:

#include <stdio.h>    /* we need this for fscanf, fopen, NULL, ... */
#include <stdlib.h>   /* and this for malloc and free */

#define FILENAME "ipaddr.txt"   /* put some numeric ip addresses in here... */

struct host             /* declare structure type "host"... */
{
    int ip[4];          /* consisting of 4 int's to hold an ip address */
    struct host *next;  /* and a pointer of its own type (yes, possible) */
};

int main()
{
    FILE *fp = fopen(FILENAME, "r");   /* open the list with numeric addresses */
    /* a pointer to our first host structure. we malloc() the memory for it */
    struct host *firsthost = (struct host *) malloc(sizeof(struct host));
    struct host *hostp, *np;           /* more struct pointers */

    if (fp == NULL || firsthost == NULL)   /* no file or no memory: give up */
        return 1;

    /* point hostp to our first host struct. while the end of the file isn't
       reached (feof() library function), go on to parse its input as
       follows */
    for (hostp=firsthost;!feof(fp);)
    {
        /* expect a numeric ip on one line and parse it into our int's.
           stop if the line does not contain four numbers */
        if (fscanf(fp, "%d.%d.%d.%d\n",
                   &hostp->ip[0], &hostp->ip[1],
                   &hostp->ip[2], &hostp->ip[3]) != 4)
            break;
        /* allocate a new host struct that the 'next' pointer will point to */
        hostp->next = (struct host *) malloc(sizeof(struct host));
        /* now point from this structure to that next one.
           we don't lose the address in memory, since we know the first host
           struct, and its next pointer points to the second host struct,
           and the next pointer of that struct to the following, and so on.
           this principle is called "linked list" and is realized with
           structure pointers inside a structure that point to a structure
           of the same type */
        hostp = hostp->next;
    }
    hostp->next = NULL;   /* put a NULL here so we'll know it's the last record */

    /* similar loop. works its way through our linked chain of host structs,
       until the last one (whose next pointer is NULL), and prints the stuff */
    for (hostp = firsthost;hostp->next != NULL;hostp = hostp->next)
    {
        printf("Address: %d.%d.%d.%d\n",
               hostp->ip[0], hostp->ip[1], hostp->ip[2], hostp->ip[3]);
    }

    /* now, "clean up" memory by going through our linked list of host
       structs and free()ing every struct that we have allocated
       dynamically. this program does it by saving the next struct address
       in an extra pointer, freeing the current struct and then pointing to
       the next struct. kind of messy and there are better approaches, but
       it works ;) */
    hostp = firsthost;
    while (hostp != NULL)
    {
        np = hostp->next;
        free(hostp);
        hostp = np;
    }
    fclose(fp);
    return 0;
}

A sample program you should understand:

#include <stdio.h>
#include <string.h>

struct pizza
{
    int salami, cheese, tomatoes, anchovies;
    char taste[16];
    int idx;
};

struct pizza pizzas[] =   /* This is how you fill in structs, with {}... */
{
    {1, 1, 0, 0, "yummy", 0},
    {0, 1, 0, 0, "cheesy", 1},
    {0, 1, 0, 1, "great", 2},
    {0, 1, 1, 0, "tasty", 3},
    {1, 1, 0, 1, "fine", 4},
    {0, 0, 0, 0, "strange", 0}
};

void getdesc(char *buffer, struct pizza *pp)
{
    memset(buffer, 0, 128);
    if (pp->salami)
        snprintf(buffer, 128, "salami");
    if (pp->cheese)
        snprintf(buffer + strlen(buffer), 128 - strlen(buffer), "%scheese",
                 pp->salami ? " and " : "");
    if (pp->tomatoes)
        snprintf(buffer + strlen(buffer), 128 - strlen(buffer), "%stomatoes",
                 (pp->salami || pp->cheese) ?
" and " : ""); if (pp->anchovies) snprintf(buffer + strlen(buffer), 128 - strlen(buffer), "%sanchovies", (pp->salami + pp->cheese + pp->tomatoes) ? " and " : ""); } int main() { struct pizza *pizzap = &pizzas[0]; char buffer[128]; strncpy(pizzas[2].taste, "odd", 16); while (strcmp(pizzap->taste, "strange")) { getdesc(buffer, pizzap); printf("Pizza #%d with %s tastes %s\n", pizzap->idx, buffer, pizzap->taste); pizzap = &pizzas[pizzap->idx + 1]; } return 0; } VII. Advanced stuff Low-level bit operations So, you're still here? Ok, here is some complicated stuff, starting with bit operations. You already know the logical/boolean and (&&), or (||), and not (!). You can use them to generate boolean values 0 or 1 like: if ((condition1 && condition2) || !condition3) ... && |(C1)==1|(C1)==0 || |(C1)==1|(C1)==0 ! ----------------------- ----------------------- ----------- (C2)==1| 1 | 0 (C2)==1| 1 | 1 (C2)==1| 0 (C2)==0| 0 | 1 (C2)==0| 1 | 0 (C2)==0| 1 As bit operators, we have similar operations, & (and), | (or), ~ (not), but it means, that each single bit is &'ed |'ed or 'ed with the rules above. Example: char i; i = 1 & 0; /* i is 0, 00000001 & 00000000 = 00000000 */ Some useful example you can realize with this are flags (storing more than one attribute in an integer) it's pretty hard to explain, so here's an example: #define FLAG_A 1 /* Note: there are tricks to implementing this. */ #define FLAG_B 2 /* This won't work with arbitrary values for flags. */ #define FLAG_C 3 #define FLAG_D 4 main() { int i = FLAG_A|FLAG_B; i |= FLAG_C; if (i & FLAG_A) puts("flag a"); if (i & FLAG_B) puts("flag b"); if (i & FLAG_C) puts("flag c"); if (i & FLAG_D) puts("flag d"); } Another operator we have are the byte shifts, << and >>. x<<1 shifts all bits in a byte left 1 position, x>>2 shifts them right two positions. The bits that are "shifted out" get lost, and the new bits at the other end of the byte will be binary 0. 
Talking in bits now, here's an example:

00000001 << 1 == 00000010
01100101 >> 3 == 00001100

If you're stunned and totally fascinated now, try figuring out the output of
the following line... have fun (clue: 1024, 8 and 128 are powers of 2):

int main() { printf ("%d %d %d %d\n", 1024 << 1, 8 >> 8, 8 << 8, 128 >> 3);
             return 0; }

Prototypes

Every good program should have its functions prototyped. This is also
essential when managing lots of source files that contain functions and form
one big application. Prototypes should be put in header files. We used
prototypes in chapter IV, like they're found in the man pages, too. E.g.:

type/returnvalue function (argtype name, argtype name, etc.);
int read (int fd, char *buf, int siz);

Or simpler:

type function(type, type, type, etc.);
int read (int, char *, int);

(The names of the arguments in prototypes are allowed to enhance
readability.)

extern.

No matter how big your project is and how many source files you have lying
around, each actual variable and function may only be defined once (the
definition is what creates the variable in memory, or contains the function
body). Now, you may run into problems. Let's say you want to use your
variable int counter; and your function void useconsole(int fd); from
main.c, which contains the main function. But you need them at different
places as well, so you put them in: main.h, console.h, colors.h, ansi.h,
etc... If those headers contain the actual definitions, compiling object
files will work just fine, but upon linking, the linker complains about
duplicate symbols. Here you need the 'extern' keyword. Put extern before
your function prototypes and variables in the headers, and it means: do not
really create that function or variable here, because that is done
elsewhere (in exactly one source file), and my functions will find the
variables/functions when linking, but here's the specification so my
function knows the function arguments/types it is working with. (For
function prototypes, extern is the default and can be left out; for
variables it is required.) Examples:

extern int counter;
extern char buffer[];
extern int getinput(char *);

Threads (well, kind of)

Modern computers are multitasked and can execute numerous processes at once,
blah blah, etc.
Well, a single CPU core can't really: it processes one instruction of one
process at a time, but your operating system can multitask by maintaining a
queue of operations that are waiting to be executed for the various
processes that it manages in an internal process table. "Real" threads mean
that you have one process image in which several streams of instructions
are executed in parallel; the process can wait for the completion of one of
them while moving on with a different part of its program. Spawning new
processes is also the easiest way of letting a program do different things
in parallel: it simply runs as several different processes. The system
function for this is called fork(). When it is called, the process splits
into two. The really important thing is the return value of fork, because
after fork() both processes continue running the same program code. The
only way to execute different tasks is by depending on the return value of
fork. fork() will return 0 to the new process (or "child" process), and
return the new process number to the original (or "parent") process. If no
process could be created, it returns -1 to the parent. Example:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <sys/wait.h>

    int main()
    {
        int pid = fork();

        if (pid == 0) {
            /* do some childish stuff here... */
            exit(0);
        } else if (pid > 0) {
            printf("my child's pid is %d...\n", pid);
            wait4(pid, 0, 0, 0);
            /* wait4() is a system function that hangs the parent process
               around until the child process has terminated (the portable
               POSIX equivalent is waitpid(pid, NULL, 0)). If you decide
               not to wait() for your child processes, you should ignore
               SIGCHLD signals: by default the system expects you to wait,
               and until you do, terminated children stay around in a
               "zombie"/"defunct" state. */
        }
        return 0;
    }

Input/Output redirection, pipes, and synchronous I/O multiplexing

Remember, file descriptors 0 (stdin), 1 (stdout), 2 (stderr) are always
open? They point to the user's console by default. But they can be closed
and reopened, and the input/output will be redirected to any other file
descriptor. int dup(int); is a function that duplicates a file descriptor.
For example, dup(0) will return a new descriptor, an integer, which points
to the same input/output as fd 0. int dup2(int, int); does the same, but the
number of the file descriptor to be used as the duplicate can be specified.
The first argument is the existing file descriptor, the second one is the
number of the new descriptor that should become a duplicate of the first.
That way, it is possible to redirect onto a descriptor that is already in
use, such as 0, 1, 2, or whatever (it is closed first). An example for dup2:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        char *text = "Hello world 12345\n", buf[100];
        int fd, fd2 = 10;

        /* file flags: read+write, truncate file, create the file if it
           does not exist; with O_CREAT, open() needs the file permissions
           as a third argument */
        fd = open("file1", O_RDWR | O_TRUNC | O_CREAT, 0600);
        write(fd, text, strlen(text));
        lseek(fd, 0, SEEK_SET);   /* rewind the file pointer to byte 0 */
        printf("fd: %d fd2: %d\n", fd, fd2);
        dup2(fd, fd2);   /* make fd2 (desc. 10) a duplicate of fd (desc. 3) */
        printf("fd: %d fd2: %d\n", fd, fd2);
        memset(buf, 0, sizeof(buf));   /* make sure buf ends up terminated */
        read(fd2, buf, strlen(text));
        printf("file1, fd2, contents: %s\n", buf);
        close(fd);
        close(fd2);
        unlink("file1");   /* clean up */
        return 0;
    }

int pipe(int[2]); takes an array of 2 integers, and assigns them the values
of two new descriptors. These descriptors will point to a "pipe", meaning
that everything that is written on fd[1] can be read on fd[0]. This can be
useful for having a child and a parent process communicate with each other.

int select(int n, fd_set *readfd, fd_set *writefd, fd_set *exceptfd,
           struct timeval *timeout);

select() is a system call that will watch over a number of file
descriptors, for a given maximum amount of time, and report those which are
ready for reading or writing, or where an exception has occurred. select
will return the number of descriptors whose status has changed. The first
argument is a number that should be 1 higher than the highest-numbered
descriptor in any of the sets.
The fd_sets readfd, writefd and exceptfd are used to store file descriptors
inside them using FD_SET(descriptor, &fdset). After the select call, the
status of those descriptors is returned by FD_ISSET(descriptor, &fdset):
non-zero indicates a change, 0 indicates no change. The last argument,
timeout, is a struct timeval and specifies the maximum amount of seconds
and microseconds that the select() call will wait for any changes. If the
timeval fields are both 0, select returns immediately, and if the argument
is NULL (a null pointer), select will wait forever until any descriptor
changes. Note: I won't include sockets or networking in this tutorial, but
understanding this kind of I/O usage will help you in programming better
client/server apps.

An example for threads, synchronous I/O and pipes:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <signal.h>
    #include <time.h>
    #include <sys/select.h>

    #define CHLDS 9   /* The amount of sub-processes is variable */

    void dothread(int *, int *);   /* This spawns a child process that sends
                                      some data over its pipe to the parent */

    int main(int argc, char **argv)
    {
        /* We're going to need an array for the children's pids, two
           integers per child for the pipe file descriptors (there are
           CHLDS + 1 children), a buffer to read from the pipes, a
           read-status fileset and a struct timeval for select() */
        int childs[CHLDS + 1], pipes[CHLDS + 1][2], i, j, cols = 0;
        char buf[64];
        fd_set fds;
        struct timeval t;

        signal(SIGCHLD, SIG_IGN);   /* Avoid zombies, since we don't wait() */
        for (i = 0; i < CHLDS + 1; i++) {
            pipe(pipes[i]);                   /* Initialize the pipe */
            dothread(&childs[i], pipes[i]);   /* Spawn a child and provide
                                                 it a pipe */
        }
        for (i = 1; i > 0;) {
            FD_ZERO(&fds);   /* Zero out the fdset before calling select() */
            for (j = 0; j < CHLDS + 1; j++)
                FD_SET(pipes[j][0], &fds);   /* Put each read descriptor in
                                                the fileset */
            t.tv_sec = 3;    /* select() may modify the timeout, so set it */
            t.tv_usec = 0;   /* again before every call */
            /* Do the select call to see on which fd's incoming data is
               pending; the last pipe created has the highest descriptors */
            i = select(pipes[CHLDS][1] + 1, &fds, NULL, NULL, &t);
            for (j = 0; j < CHLDS + 1; j++)
                if (FD_ISSET(pipes[j][0], &fds)) {   /* Is there data to read? */
                    memset(buf, 0, 64);
                    read(pipes[j][0], buf, sizeof(buf) - 1);
                    printf(" #%d:%s ", j, buf);   /* Read and print it out */
                    if (cols++ > 10) {   /* Some check to avoid writing across */
                        cols = 0;        /* the end of each line. Every computer */
                        printf("\n");    /* science teacher would love this
                                            part... =P */
                    }
                }
        }
        printf("finished\n");
        return 0;
    }

    void dothread(int *p1, int *p2)
    {
        int i = 100000;
        char buf[32];

        if ((*p1 = fork()))
            return;   /* Assign the return value of fork to the integer.
                         For the parent process, return */
        srand(time(NULL) ^ getpid());   /* Seed once, differently per child */
        while (i--) {
            usleep((rand() % 100) * 100);
            snprintf(buf, 32, "\x1b[%d;3%dmX\x1b[0;0m", (rand() % 2), (rand() % 7));
            write(p2[1], buf, strlen(buf));   /* Write some stuff to the pipe */
        }
        raise(SIGKILL);
        exit(0);   /* Don't let the child process return and go on in */
    }              /* the parent's thread of operation! */


VIII. Debugging, etc.

There are some generic tools that will make it easier for you to understand
the execution thread of programs. Of course, the cheapest way is still to
put printf's or similar things into your program, but the usage of debugging
and development tools can be quite easy and is worth a try.

gcc preprocessor - There's an interesting feature that you can use in
complicated programs to realize a "context" variable. An example of a
program where you can find this is eggdrop. The preprocessor replaces all
occurrences of __LINE__ in a source file with the current line number, and
__FILE__ with the name of the current file... example:

    bash$ printf "__LINE__ __FILE__\n__LINE__\n__LINE__\n" > a.c ; gcc -E a.c
    # 1 "a.c"
    1 "a.c"
    2
    3

To utilize this, make global variables int context; and char filename[50];
and

    #define Context context=__LINE__;strncpy(filename,__FILE__,50)

Then put lots of Context; in your program and make it print out the last
context on an abnormal termination (assertion failed, segmentation fault,
etc.)
strace and ltrace - strace is run as 'strace progname', and will show all
the system calls a program makes by intercepting them with ptrace calls.
strace works on Linux; on Solaris and BSD, you use truss, or ktrace and
kdump. You can even attach these to a running program, or trace a program
together with all its child processes. There are quite a few options which
you should check out if you don't know them. ltrace is a similar program
which intercepts calls to shared (dynamic) libraries. In addition to the
low-level system calls, you can watch the program make all its libc calls,
and calls to other libraries it is dynamically linked to, in real time.

indent - a source formatting and indenting program that can be very
helpful. There are some guidelines for writing source in a comprehensible
format, like GNU, K&R and Berkeley. If you take the examples inside this
document, they are not really well formatted, and therefore not easily
readable. (Hey, the harder you try to understand the code the better you'll
really understand it. ;) Even if your coding style is sloppy, you're
encouraged to use indent on your programs to understand them better later
on, and to get used to good style.

gdb - the GNU debugger is so complex and useful in so many situations that
a complete description is way beyond this tutorial. Like strace, you can run
a program through gdb (then type 'r' to start it inside gdb), or attach gdb
to a running process. Like in any debugger, you can set a breakpoint at any
code address or function, e.g. 'break *printf'. A suspended program can be
continued with 'c'. Memory can be examined with 'x address', e.g.
'x 0xbffff000'; 'x/5xb address' will print the next 5 bytes in hex.
Register values can be viewed with an info command, 'info registers'.
Registers can be referenced by preceding their name with $, e.g. 'x $esp'.
Functions can be disassembled with 'disas function', or by using the memory
address instead.
bt/backtrace, when used in a suspended program or on a core file, will
analyze the stack frames and show the functions called before.


Credits

So far only YounGoat. Thanks for helping me out with that example program. :)