Basic String Handling in C

From Compsci.ca Wiki

Revision as of 14:11, 11 June 2007 by Dan

C++ is not C

For better or worse, C and C++ are not the same language. One of the simplest yet most profound ways the two languages differ is in how they handle strings.

C++ has a standard string class (std::string) which hides how strings are stored and lets programmers deal with them in a fairly high-level manner. C does not.

What is a string in C?

A single character in the ASCII character set, which is the one we most commonly deal with, occupies a single byte of memory (in C a char is one byte by definition, and on virtually all systems a byte is 8 bits). A string is just a sequence of bytes stored one after another in memory. The variable representing a string is just a pointer to the first character in that string.

So how do we know where a string ends?

A string ends with a null byte (one whose value is zero, written '\0' in C). The functions that deal with strings read characters until they run into that null byte, which tells them they've reached the end of the string.


Allocating Strings

Note: If we want a 42 character string, we have to allocate space for 43 characters, so that one of them can be set to zero.

First we'll create room for a 42-character string on the stack. Stack storage belongs to the function that declares it, so the array ceases to exist when that function returns.

#include <stdlib.h>

int main(void)
{
   /* room for 42 characters plus the terminating null character */
   char foo[43];

   return 0;
}

And now we'll allocate a string on the heap, so that it sticks around until we explicitly free the memory. The malloc() function does this. It returns a void pointer, which we cast to the type of pointer we want, in this case a char pointer. (In C the cast is optional, since a void pointer converts to any object pointer type automatically; it is required only if the code is compiled as C++.)

We give malloc() the number of bytes we want to allocate. The sizeof operator gives the size of a type in bytes. sizeof(char) is always 1 by definition in C, but writing sizeof(char) * 43 (43 being the 42 characters we want plus the null character) keeps the intent visible and matches the pattern used when allocating other types.

#include <stdlib.h>

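int main(void)
{
   /* room for 42 characters plus the null character */
   char * bar = (char *)malloc(sizeof(char) * 43);

   if (bar == NULL)
   {
      return 1; /* malloc() returns NULL when allocation fails */
   }

   return 0;
}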

Freeing up a string

When we're done with a heap-allocated string, we should pass it to free() so that it stops taking up memory.

Stack-allocated strings must not be passed to free(); their memory is released automatically when the function that declared them returns.

#include <stdlib.h>

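int main(void)
{
   char * bar = (char *)malloc(sizeof(char) * 43);

   if (bar == NULL)
   {
      return 1; /* allocation failed */
   }

   /* do something with bar... */

   /* free the same pointer that malloc() returned */
   free(bar);

   return 0;
}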

Any questions?

This is the point where you should stop yourself and figure out if you really understand the above before going further.

We know how to allocate and free strings... now what?

The C header file "string.h" contains numerous functions for handling strings.

Copying a string to a variable

The simplest way to copy one string into another is to use strcpy(). This function takes a character pointer as its first argument (the destination) and a pointer to constant characters as its second argument (the source). The source string, including its null character, is copied into the destination, which must be large enough to hold it.

#include <stdlib.h>
#include <string.h>

int main()
{
   char foo[43];

   strcpy(foo, "Hello world");

   return 0;
}

And it can be used to copy from one string variable to another.

#include <stdlib.h>
#include <string.h>

int main()
{
   char foo[43];
   char bar[43];

   strcpy(foo, "Hello world");
   strcpy(bar, foo);

   return 0;
}

Concatenating Strings

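The strcat() function handles this. It works much like strcpy(), except that it appends the source to the end of the string already in the destination, starting where that string's null character was. The destination must already contain a null-terminated string and must have room for both.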

#include <stdlib.h>
#include <string.h>

int main()
{
   char foo[43];

   strcpy(foo, "Hello ");
   strcat(foo, "world");

   return 0;
}

But there's a catch...

The strcpy() and strcat() functions aren't very smart. In fact, a basic version of strcpy() takes only a few lines to write yourself.

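#include <stddef.h>

/* A naive strcpy(). It is named my_strcpy because defining our own
   function called strcpy would clash with the standard library. */
char * my_strcpy(char * dest, const char * source)
{
   size_t i;

   for (i = 0; source[i] != '\0'; i++)
   {
      dest[i] = source[i];
   }

   /* copy the terminating null character too */
   dest[i] = '\0';

   return dest;
}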

The problem comes from a simple question: if I've allocated 4 characters for the destination, but there are 7 characters in the source, what happens?

The answer is: the function will continue to copy the source string to the destination string without regard for the fact that there isn't enough space in the destination.

Writing past the end of an array is undefined behavior in C. The program may crash with a segmentation fault, or it may keep running after silently overwriting memory it shouldn't have touched. This is the infamous "buffer overflow" problem.

So how is this fixed?

Careful use of the strncpy() and strncat() functions avoids this problem. Each is similar to strcpy() and strcat() respectively, but takes an extra argument giving the maximum number of characters to copy or concatenate.

The other difference is in how they handle the null character that must end every string. If strncpy() stops because it hit the limit, it does not write a null character, so we have to insert one by hand. strncat() always writes a null character after the characters it appends, so the destination needs room for the appended characters plus one more.

So, let's say I have a character array six characters long and want to copy "Hello world" into it. Clearly the array isn't big enough. Since one of the six characters is needed for the null character, I'll copy only five characters from "Hello world".

#include <stdlib.h>
#include <string.h>

int main()
{
   char foo[6];

   strncpy(foo, "Hello world", 5);
   foo[5] = '\0';

   return 0;
}

Using strncat() is a bit more complicated, because the destination string probably already contains some characters. As a result, you can't simply pass the size of the destination array minus one as the limit and be safe: the existing characters already use up part of that space.

You have to calculate the length of the destination string first. For that we use the strlen() function, which counts the characters before the null character.

Knowing the current length of the string, we can figure out how many characters' worth of space are left. Here the array holds six characters, so the string can use at most five, and 5 - strlen(foo) more characters can be appended.

#include <stdlib.h>
#include <string.h>

int main()
{
   char foo[6];

   /* this is just your basic strcpy() */
   strcpy(foo, "He");

   /* now, we use strncat() to get the rest of "Hello" without overflowing */
   strncat(foo, "llo world", 5 - strlen(foo));
   foo[5] = '\0';

   return 0;
}

How do I compare two strings?

We've already seen that strings in C are just character pointers. So comparing two "string" variables directly with == just compares the pointers. This only tells us whether the two strings are located at the same place in memory; if they are, then they are the same string.

However, that usually isn't what we're testing. Generally we don't care whether two strings are stored at different places in memory, only whether they contain the same characters. For that, we have to use functions that compare the characters themselves.

strcmp()

The most basic function for comparing two strings is strcmp(). It goes through the two strings character by character until it finds a pair that differs or reaches the end of both.

If the first string is less than the second (it would come first when ordered by character values), the function returns a number less than zero. If the first string is greater than the second, it returns a positive number. If they're the same, it returns zero.

This trips up many programmers because zero is "false" in C, so if (strcmp(a, b)) is true when the strings are different. To test whether two strings are equal, compare the result of strcmp() to zero.

#include <stdlib.h>
#include <string.h>

int main()
{
   char foo[] = "Hello";
   char bar[] = "hello";
   char baz[3];

   if (strcmp(foo, bar) == 0)
   {
      strcpy(baz, "yo");
   }
   else
   {
      strcpy(baz, "oy");
   }

   return 0;
}

After the code above runs, baz holds "oy": strcmp() is case-sensitive, and "Hello" and "hello" differ in their first character.

Case-insensitive Comparison

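Case-insensitive comparisons are possible as well, using the strcasecmp() function. Note that strcasecmp() is not part of standard C: it is defined by POSIX and declared in the header <strings.h> (with an "s").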


#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() is declared here (POSIX) */

int main()
{
   char foo[] = "Hello";
   char bar[] = "hello";
   char baz[3];

   if (strcasecmp(foo, bar) == 0)
   {
      strcpy(baz, "yo");
   }
   else
   {
      strcpy(baz, "oy");
   }

   return 0;
}

Limited Length Comparisons

Both strcmp() and strcasecmp() have versions, strncmp() and strncasecmp(), which let you specify the maximum number of characters to compare. So, let's say we have "abcdefg" and "abcz", and we only care about the first three characters.

#include <stdlib.h>
#include <string.h>

int main()
{
   char foo[] = "abcdefg";
   char bar[] = "abcz";
   char baz[3];

   if (strncmp(foo, bar, 3) == 0)
   {
      strcpy(baz, "yo");
   }
   else
   {
      strcpy(baz, "oy");
   }

   return 0;
}

Case-insensitive Limited Length Comparison

#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() is declared here (POSIX) */

int main()
{
   char foo[] = "Abcdefg";
   char bar[] = "abcz";
   char baz[3];

   if (strncasecmp(foo, bar, 3) == 0)
   {
      strcpy(baz, "yo");
   }
   else
   {
      strcpy(baz, "oy");
   }

   return 0;
} 


Credits

Tutorial written by wtd, moved to wiki by Cornflake
