How to Detect, Debug and Prevent Stack-Overflow!

Today I ran into my very first stack overflow bug. It was a frustrating experience to find the actual bug as I had to go through all of my code and double-check the logic, buffers, pretty much my entire code before I was finally able to detect that the bug I got was due to stack overflow. But once I managed to find out the fact that the source of my problem is stack overflow, the fix was a pretty easy one. This is one of those bugs that will make you doubt pretty much everything you know..! I am writing this article so that you won’t have to go through the same frustrating process as I did..! So let’s begin.

What is stack overflow? If the stack size allocated is not enough to hold the local variables then the data is put into the memory region adjacent to the stack, hence corrupting the region and causing unexplained behaviors

In this article, let’s discuss how stack overflow occurs, best practices to prevent it and how to detect if it actually occurred. Let’s start with what is stack and how it is implemented on microcontrollers. If you already familiar with some topics feel free to skip them and navigate to the topic of interest using the table of contents. 

Stack implementation

The stack is basically the region of memory used by the processor to store local variables. 

What are the local variables?

The variables declared inside a C block are called local variables. It is also called automatic variables as they come into existence only when the code inside a block is executing and it disappears once the block is exited. 

The value contained in these variables are usually garbage values and they need to be initialized every time we enter a function.

void aFunction()
{
    int x;
    printf ("x = %d\n", x);
}


void main()
{
    aFunction();
    .
    .
    .
    aFunction();
}
>> x = -487
>> x = 514

As you can see in the above listing, some random garbage value is printed every time this function is called. 

void aFunction()
{
    int x = 10;
    printf ("x = %d\n", x);
}


void main()
{
    aFunction();
    .
    .
    .
    aFunction();
}
>> x = 10
>> x = 10

But if you initialize the variable then the expected value will be printed as you can see in the above listing. This behavior is due to the fact that space for the local variable is allocated on the stack so whatever value is present on that space is used as the value for the local variable unless overwritten through initialization.

The other types of C variables include the extern/global variables and static variables. 

For example, see the code below 

#include <stdio.h>

// a and b are extern/global variable
int a = 100;
static int b;

void aFunction()
{
    int x = 10;
    int y = 24;
    int z = x + y;
    printf ("x+y=%d", z);
}


void main()
{
    aFunction();
    printf ("a = %d, b = %d", a, b);
    .
    .
    .
}

Here a is the extern/global variable, which can be accessed by all the files in your project and b is the static global variable, which is private to the file on which it is declared. These are of no interest to us other than the fact that space for them is allocated during compile time on the data region of the RAM. 

Lets next see the place of stack among other memory regions.

Memory regions in the RAM

The RAM is divided into 3 regions after the compilation of your code is done. They include 

  • bss
  • data and
  • stack

The data region of the RAM usually contains the global/ extern and static variables that are already initialized and the bss region usually contains the global/extern variables and static variables that are not initialized and hence are defaulted to zeros

RAM memory allocation in Code composer studio

(The pic above is the result of an example program compiled on Tiva TM4C123G launchpad, a development board by Texas Instruments on the Code Composer Studio IDE)

Also while compiling the stack size must be specified. Usually, a default value is specified but it is configurable by us through the make file.

In this development board, the RAM size is 32kB and after the program is downloaded it will look something like this

RAM memory allocation

As you can see the stack is on the bottom where local variables are placed, followed by data region where the initialized extern/global variables and static variables are placed and then by bss where the uninitialized extern and static variables are placed after being initialized to their default values, which is usually zeros and the leftover space is used for heap memory which is used dynamic memory allocation using malloc() and free() functions.

Okay keep this information in mind, later we will use this information as we discuss about stack-overflows.

How do stacks work?

Stack is a last in first out data structure which means the value stored last will come out first.

stack of books

It basically works like a stack of books. The book you place on the top will be the one you will take out first. On microcontrollers, as the power is applied, some initializations take place during the booting process (you can read more about it here). One such initialization is the allocation of the stack on the memory. 

Stack on the memory looks like this

Note that there is no column on memory on address and data types, it’s for our own reference. As you can see at memory address 0000 there is an 8-bit unsigned integer on it with a value of 115, at address 0001 there is a character ‘y’ and so on. This is just an example of how stack might look like.

Let’s take an example to see how this stack is filled.

void aFunction()
{
    int x = 10;
    int y = 24;
    int z = x + y;
    printf ("x+y=%d", z);
}


void main()
{
    aFunction();
    .
    .
    .
}
Stack being filled
Stack being filled

As you can see in the pic above, first the value x is pushed onto the stack, followed by the variable y and then by the variable z. As the variable is added, our processor must keep track of which location is actually the top of the stack. For this, it uses one of its registers to store the address of the top. This register is called the Stack Pointer (usually abbreviated into SP) which is also shown in the image above.

Then when printf statement is executed, the function returns back to the main() function. At this point as the function returns, the processor in your microcontroller executes 3 steps to remove or pop the variables from the stack as you can see in the next pic.

stack being emptied
stack being emptied

I must note here that when I say a variable is removed from a stack, it does not mean that that particular address in memory erased is written with zeros, this removing action merely decrements our stack pointer, leaving the value as it is. I have greyed the inactive stack area just to make it visual for you to understand.

When another variable is added to the stack, it just takes this value as its default value. This is the reason I mentioned above that before using a local variable, it must be initialized else it is going to contain some random garbage values as default!

The reason for this kind of implementation is just to save some processor time, else every time a value is removed from stack the processor has to write zeros to it and this will cause unnecessary delays in execution. Basically, this means more work for us programmers and less work for the CPU!

This terminology of the action of placing the variable onto the stack is called a PUSH and removing the variable is called a POP. This is how stacks work. I hope you got the idea. Next, let’s have a look at when and how a stack-overflow error can occur. 

Stackoverflow 

As we saw in a previous section about memory regions, the stack is allocated some space while we compile the code. This space is fixed while the program runs so it is necessary to allocate enough space. 

Let’s take the same configuration we just saw with the amount of stack space allocated to be 1024bytes.

Now let’s consider an example 

int a = 100;
int b = 200;

void aFunction()
{
    int x[1024] = {0}; // assigns zeros to all 1024 places like x[0], x[1], ...
    int y = 10;
    int z = 5;
    printf ("y=%d", y);
}


void main()
{
    aFunction();
    printf ("a = %d, b = %d", a, b);
    .
    .
    .
}

stack at the beginning of the main() function

stack at the beginning of the main() function

Once the program enters the main() function at line 14, the memory contents are as shown in the above picture. As you can see the memory address 1024 contains the variable a = 100 and next to it at address 1025 sits our variable b = 200, which are right above the stack region.

Once it enters the aFunction() and executes line 6, the stack memory is filled with the buffer and the resulting memory allocation looks like this

memory after the line 6 is executed
memory after the line 6 is executed

Now it comes to line 7 and the magic of stack overflow happens. Here since the stack is already full, it goes on to overwrite the adjacent memory regions and this results in your program trying to overwrite the data memory region and this results in a stack overflow. Usually, on microcontroller hardware, some mechanism is provided where if our code tries to write to some other memory region it sends a signal to the processor and the processor enters the fault state and stops program execution.

This error is called the stack-overflow error. I hope you got the idea. Now as we know what a stack overflow error is, let’s have a look at how to detect it using our debugger.

Using A Debugger to Detect Stack Overflow

There are some advanced options available on debuggers that allow us to set breakpoints based on register value. Using these options we can set a breakpoint so that if the stack pointer points to say 1023 (assuming we have a 1024 byte stack and it grows upwards from zero). Then, if the code hits that breakpoint, we can get to know that the stack is filled up to that point at some point in our code or not. Then if you step over, in very few steps you will probably run into the run-time error you are trying to confirm. If this happens then you can be sure that its the stack-overflow that’s causing the problem!

So once you are sure the problem is on stack allocation, just double the stack size and see if that fixes-up the issue. If not, that means your bug is someplace else in your code!

How to allocate enough stack size?

Using the above methods we can make sure that stack overflow is a cause of the run-time error or not. But to answer the question of how much stack space should be allocated on the RAM is a bit tricky!

Usually big applications use libraries and drivers and several other layers below the application code. So there will be a lot of nested functions starting down from main() function. Example main() will call functionA() which will call functionB() and so on and so forth till one of the function returns. Here it will be not so easy to calculate how much local variables each function uses for every possible scenario.

Also as microcontrollers usually come with very little RAM space, it will be difficult to know what stack size can be okay for our application. Too much stack space can leave very little RAM for our application and too little stack size can cause stack-overflow. 

In these situations, the best solution would be to keep using the default stack size as provided by the manufacturers until the point you run into strange bugs crashing your system. Once you notice some strange run time errors making the system to reset or halt, you can do a stack test as mentioned above to see if that is a problem. If it is indeed the problem then you can try doubling up the stack size on the RAM.

Once you double the stack size and your application starts running fine again, you can calculate the stack usage by first filling the stack with some known value say 0xFF for each byte and then let your application run for some duration over a variety of scenarios and then you can halt the processor to check how much of the stack is actually overwritten, since the places overwritten will have some value other than 0xFF.By experimenting like this we can figure out how much stack our application uses.

Say for example, after increasing stack to 2048 from 1024 and doing the stack test, you find out that the stack never grows beyond 1100, then it’s probably a good idea to resize the stack again to say 1200 bytes so that unnecessary space is not consumed by the stack and the extra RAM can be used by your application instead.

So once you figure out the number for the stack, you can go back and reduce the stack size again, just make sure that you leave some headroom for the future growth of your application!

Next, let’s see the best practices that we can follow to avoid such errors.

Best Practices to avoid stack overflows

Best Practice#1: Don’t allocate arrays on the stack

This is obvious from the previous example. It’s always a good idea to not declare arrays on the stack. It might work if the array is small enough, but as your program evolves and you decide to come and change the array size and suddenly your program starts crashing then you will not know where to look as the program will compile just fine.

Say you need to limit the access to an array to only with a function, what to do in this situation? Here you can use a C language feature and declare it as local static then the compiler will allocate the array outside the stack as seen in the next listing.

int a = 100;
int b;
 
void aFunction()
{
   static int x[1024] = {0}; // assigns zeros to all 1024 places like x[0], x[1], ...
   int y = 10;
   int z = 5;
   printf ("y=%d", y);
}
 
 
void main()
{
   aFunction();
   printf ("a = %d, b = %d", a, b);
   .
   .
   .
}

The static keyword is normally used inside functions to keep a variable in memory even when a function exits, so on that subsequent calls, the function can use the value. But it can also be used like in the above use case.

Best Practice#2: Pass pointers instead of big structs to functions

Another scenario where we can cause stack-overflow is by passing big structs to functions. 

Consider this example below. 

struct s_type {
   char char_buffer[512];
   uint32_t x;
   uint64_t y;
};
 
 
void print_function(s_type s_arg)
{
   printf ("%s", s_arg.char_buffer);
}
 
 
void main()
{
   struct s_type s;
   s.x = 123;
   s.y = 12345;
   strcpy(s.char_buffer,"some random string");
   print_function(s);
   .
   .
   .
}

We  have a struct data type called “s_type” and in the main function we declare a variable of that struct. Now the main function is already occupying more than half the stack as it has a struct with a buffer of 512 bytes. 

So if this struct is passed directly to our print function, the program will certainly crash as a copy of the entire struct will be made and this copy will be sent to the function and the function will try to allocate this copy on the stack, since there will be no space for 2 of this type on the stack, the program is going to get a runtime error.

Instead in these scenarios, it’s a better idea to pass it as a pointer as shown in the next listing

struct s_type {
   char char_buffer[512];
   uint32_t x;
   uint64_t y;
};
 
struct s_type s;
 
void print_function(const s_type * s_ptr)
{
   printf ("%s", s_ptr->char_buffer);
}

void main()
{
   int x[512] = {0};
   s.x = 123;
   s.y = 12345;
   strcpy(s.char_buffer,"some random string");
   print_function(&amp;s);
   .
   .
   .
}

Here since just the address of the struct is passed, it is going not going to increase the stack size by more than 4bytes (considering a 32bit address) hence the program should run just fine. 

Care should be taken when passing pointers as unlike copies if the value of a variable is changed on the function, the change will be sustained in the calling function too. Hence the keyword const was used on the function signature.

Best Practice#3: Avoid recursive functions

Recursive functions that take in a parameter can also cause stack overflows. Lets take a simple example of calculating the factorial of a number. For those of you who does not remember factorials, its a mathematical function that has the formula as follows.

N! = N x (N-1) x (N-2) x (N-3)…..x 1

For example 

4! = 4 x 3 x 2 x 1 = 24

What is a recursive function? Recursive functions are functions that call themselves.

In C, the functions, one of the features (in our case a source of a bug) is that functions can call themselves. It is there as a means of reducing code size, but it has the potential to cause stack overflows unless used wisely. 

The above mathematical formula for calculating factorials can be implemented in code using recursive functions as follows. 

Recursion example 1

long factorial(int a)
{
   if (a == 1)
   {
       return 1;
   }
   else
   {
       return a* factorial(a-1);
   }
}
 
void main()
{
   int x = 3;
   long y = factorial(x);
   printf("Factorial of %d is %ld ", x, y);
   return;
}
  • Here as 3 is passed to the function factorial(),the stack size increases by sizeof(int), lets assume sizeof(int) is 4 bytes for this example. 
  • Then it checks if 3 is equal to 1, then it decides no and goes to the next line and returns 3*factorial(2). Since its another function call the stack size again increments by 4 so totally stack size has incremented by 8. This time it reaches the last line with 2*factorial(1). 
  • Then once again this function calls factorial() function another time and the size of stack grows again by 4 so the total increment is 12bytes.  
  • This time though, since the test to check if the parameter passed in is equal to 1 succeeds and it returns 1, thus reducing the stack size by 4 and now its down to 8 again. 
  • This returns to the statement 2*factorial(1), so here 1 is substituted in the place of “factorial(1)” and hence it becomes 2*1 and now the stack size again decreases by 4 to become just 4.
  • Next it comes to the original function call of “3*factorial(2)” here “factorial(2)” is substituted by “2” and hence becomes “3*2”. Now it returns 6 as the final answer back to main() function. Once the function returns its stack gets popped again to go back to the original stack size of 0.

Thus stack size changed as

0 -> 4 -> 8 -> 12 -> 8 -> 4 -> 0

Let’s say we changed the code slightly like this by adding another local variable b

Recursion example 2

long factorial(int a)
{
   long b;
   if (a == 1)
   {
       return 1;
   }
   else
   {
       b = a* factorial(a-1);
       return b;
   }
}

Now the stack size change will be as follows

0 -> 12 -> 24 -> 36 -> 24 -> 12 -> 0 

This is because now the total amount of stack increase is sizeof(int) + sizeof(long) as we have 2 local variable int a and long b (assuming size of long to be 8),  hence the total size of local variables in this case is 12.

As we can see the maximum stack size was number of recursive calls times the size of the local variables. 

For the 1st case it was 3×4=12 and for the second case it was 3×12=36. 

Now if we wish to calculate the factorial of say 100 using the above code, the program will probably run into a stack overflow bug. Because 100×12=1200bytes and we allocated a stack size of only 1024 bytes!

The workaround for this is to implement the factorial formula as follows

Recursion example 3

long factorial(int a)
{
   long b = 1;
   for (int i = 0; i<a; i++)
   {
       b = b*a;
       a--;
   }
   return b;
}
 
void main()
{
   int x = 3;
   long y = factorial(x);
   printf("Factorial of %d is %ld ", x, y);
   return;
}

Here the same formula is implemented without the use of recursive functions and hence stack size will be incremented only once by 12bytes thus eliminating the stack overflow problem.

Why not keep all variables global?

You may argue that instead of having any local variables, why not keep it all global?

There are 3 reasons this is not such a good idea

  1. Pushing and Popping data in and out of stack can help reduce the RAM usage as compared to keeping everything global. In the recursion example 3, even though the stack size grew by 12, once the function returned to main(), the total RAM usage reduced again to 0, so more RAM for the other parts of your application!
  2. If everything is kept global, it will pollute the namespace as each global variable name should be unique
  3. Also, this will cause future bugs if your colleague who uses your code for a feature he is developing and he/she changes the values contained in your global variable, it will reflect in incorrect results next time the function that uses this particular global variable is called the next time around!

Related Questions

What is buffer overflow? It is very similar to stack overflow, where you write to an array past its end. But it is much more difficult to identify as instead of going into a runtime error, the program will happily write to the next memory location on the RAM, thus corrupting the data present there.

For example, consider the code below.

char a[10];
strcpy(a, "0123456789");

Here 11 bytes will be copied to the location a and corrupt whatever is sitting in memory next to a

“0123456789” is 11 bytes as strings end with a NULL character ‘\0’ so in memory it will look like this as it is converted from ASCII

A[0] = 0x30
A[1] = 0x31
A[2] = 0x32
A[3] = 0x33
A[4] = 0x34
A[5] = 0x35
A[6] = 0x36
A[7] = 0x37
A[8] = 0x38
A[9] = 0x39
A[10] = 0x00 // the buffer overflows here as it is the 11th memory location.

This method can be prevented by checking the number of bytes being written to an array every time some data is written to an array.

Alright, I will stop here. I hope you guys enjoyed reading this article. 

You can email us or contact us through this link if you have any questions or suggestions.

If you liked the post, feel free to share this post with your friends and colleagues! 

EI

We’re passionate about inventing embedded devices and we hope you are too! This blog deals with a wide variety of topics from C programming to IOT to networking certifications and more. Hope you enjoy your time spent here and hope you get some value out of our blog!

You may also like...