How string copy becomes costly – An Analysis

Let us work through a simple but interesting example, with an analysis at the end. I will start with a requirement.

Assume that you are getting N strings from an XML file, where N is really huge, and each string contains many characters. A library is used to parse the strings from the XML file, and it returns them one by one.

The library passes each string and its length to a callback method, and our program has to keep its own copy of the string.

Let me write some simple code for this. To keep it easy to follow, I will leave out the library and everything else. Let us focus on the core problem, which is copying N strings.

#include <stdlib.h>
#include <string.h>

void getStrings(void);

int main(void){
   getStrings();
   return 0;
}

void getStrings(void){
   int numberOfCharacters = 2014; // buffer size; in the real program the length is returned by the library after parsing the xml file
   char *string1 = (char*)malloc(numberOfCharacters); // to hold the string
   strcpy(string1,"This string we will get from library, for this example we are hard coding it");
   char *string2 = (char*)malloc(numberOfCharacters); // to hold the copy
   strcpy(string2,string1); // keep a copy of the string returned by the library
   free(string2);
   free(string1);
}

The code above looks very simple, but the last strcpy() call becomes costly when the number of strings is very large.

How is strcpy costly?

A simple implementation of strcpy looks like this –

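char* strcpy(char * string1, const char * string2){
  char * originalStringPointer = string1; // remember the start of the destination
  while(*string2 != '\0'){ // test every source character for the terminator
    *string1++ = *string2++; // copy one character and advance both pointers
  }
  *string1 = '\0'; // terminate the destination
  return originalStringPointer;
}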

So behind the scenes, for every string copy (for all your N strings, where N is very large), strcpy walks the source one character at a time, checking each character for the terminating '\0' before copying it.

Let us take one example and analyse this – 

Analysis I –

If I have 100 strings, all of the same length, say 10,

then the number of times the loop in strcpy runs is

100 * 10 = 1000 times, which doesn't make much difference.

Analysis II – 

Let us take bigger values.

If I have 3000 strings of length 100 each,

then the loop in strcpy runs 3000 * 100 = 300,000 times.

300,000 iterations is a lot, and we assumed that all strings have the same size of just 100. In a real-world example the strings may be much longer, and the count grows with them.
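The arithmetic in both analyses can be checked with a small sketch. The functions countingCopy and totalIterations below are hypothetical helpers written only for this illustration (they are not part of any library). countingCopy uses the same loop as the simple strcpy shown earlier, but also reports how many times the loop body ran.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical instrumented copy: the same loop as the simple strcpy above,
   but it returns the number of times the loop body ran. */
size_t countingCopy(char *destination, const char *source) {
    size_t iterations = 0;
    while (*source != '\0') {
        *destination++ = *source++;
        iterations++;
    }
    *destination = '\0';
    return iterations;
}

/* Copies one string of `length` characters `count` times and returns the
   total number of loop iterations. Lengths of 256 or more are rejected
   (returns 0) to keep the sketch within its fixed-size buffers. */
size_t totalIterations(size_t count, size_t length) {
    char source[256];
    char destination[256];
    size_t total = 0;

    if (length >= sizeof source) {
        return 0;
    }
    memset(source, 'x', length);   /* build a string of `length` characters */
    source[length] = '\0';

    for (size_t i = 0; i < count; i++) {
        total += countingCopy(destination, source);
    }
    return total;
}
```

Calling totalIterations(100, 10) gives 1000 and totalIterations(3000, 100) gives 300000, which matches the two analyses.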

In such a scenario strcpy becomes very costly, and your application performs badly if you don't handle this case properly. If you happen to read all the strings during app launch, your application can take several seconds to show the first screen.

A clever programmer finds a better way to achieve this.

Better approach for better performance –

If you look at the problem statement, notice that we have both the string and its length. So instead of strcpy we can use memcpy, which is told up front how many bytes to copy. It does not need to test every character for the terminating '\0', so it can typically move the data in larger blocks rather than one character at a time.

So just replacing strcpy() with memcpy() avoids the 300,000 per-character terminator checks in the second analysis.

Your getStrings method now looks like this:

void getStrings(void){
  int numberOfCharacters = 2014; // buffer size for this example
  char *string1 = (char*)malloc(numberOfCharacters); // to hold the string
  strcpy(string1,"This string we will get from library, for this example we are hard coding it");
  size_t length = strlen(string1); // in the real program the library returns this length
  char *string2 = (char*)malloc(length + 1); // to hold the copy, plus the terminating '\0'
  memcpy(string2,string1,length + 1); // copy the characters and the terminator in one call
  free(string2);
  free(string1);
}
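In the real program the copy would happen inside the parser's callback. The sketch below assumes a callback that receives the text and its length. The names duplicateWithLength, onStringParsed, storedStrings and MAX_STRINGS are made up for illustration and do not belong to any particular XML library. The helper writes the terminating '\0' itself, so it also works if the text handed to the callback is not null-terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated, null-terminated copy of the first `length`
   bytes of `text`, or NULL if the allocation fails. The caller frees it. */
char *duplicateWithLength(const char *text, size_t length) {
    char *copy = (char *)malloc(length + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, text, length);  /* length is known, no terminator scan needed */
    copy[length] = '\0';
    return copy;
}

/* Hypothetical storage for the copies; a real program would likely
   use a growable array instead of a fixed limit. */
#define MAX_STRINGS 4096
char *storedStrings[MAX_STRINGS];
size_t storedCount = 0;

/* Hypothetical callback: assumed to be invoked once per parsed string. */
void onStringParsed(const char *text, size_t length) {
    if (storedCount >= MAX_STRINGS) {
        return;  /* storage full, drop the string in this sketch */
    }
    char *copy = duplicateWithLength(text, length);
    if (copy != NULL) {
        storedStrings[storedCount++] = copy;
    }
}
```

For example, calling onStringParsed("abc", 3) stores a fresh copy of "abc" in storedStrings[0], which the program later releases with free().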

 

It is always good practice to look for the best possible way to do something. Even a single line of code that runs hundreds or thousands of times can become a bottleneck.
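Whether a given line really is the bottleneck is best confirmed by measuring. The following is a minimal timing sketch, assuming the standard clock() function is precise enough for the workload. The function name timeCopies and its parameters are hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies one string of `length` characters `count` times and returns the
   elapsed CPU time in seconds, or -1.0 if an allocation fails.
   A non-zero `useMemcpy` selects memcpy, otherwise strcpy is used. */
double timeCopies(int useMemcpy, size_t count, size_t length) {
    char *source = (char *)malloc(length + 1);
    char *destination = (char *)malloc(length + 1);
    if (source == NULL || destination == NULL) {
        free(source);
        free(destination);
        return -1.0;
    }
    memset(source, 'x', length);
    source[length] = '\0';

    volatile char sink = 0;  /* discourages the compiler from removing the copies */
    clock_t start = clock();
    for (size_t i = 0; i < count; i++) {
        if (useMemcpy) {
            memcpy(destination, source, length + 1);  /* includes the '\0' */
        } else {
            strcpy(destination, source);
        }
        sink = destination[length / 2];
    }
    clock_t end = clock();
    (void)sink;

    free(source);
    free(destination);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing timeCopies(0, 3000, 100) with timeCopies(1, 3000, 100) gives a rough idea of the difference. The results depend heavily on the compiler, the optimization level and the C library, since many C libraries ship optimized strcpy implementations, so it is worth collecting the numbers on the target platform.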
