Scalability, Architecture, Multitenancy : Database


Assume that you have multiple clients, and each client has access to a similar kind of data, so the database architecture should remain the same for every client.

For example, assume that you have developed an ERP product for a company. Many companies will have the same requirements, so the database design will remain the same.

Most developers are unsure whether to go for a single database instance or multiple database instances. Following are the two approaches with their pros and cons.

Approach 1 : Single instance database

  Keep a single instance of the database, but store a client id with each record to keep track of each client's data.

Advantages : 

1) Better resource utilization – If your database is hosted on a cloud service like Amazon, then every additional instance you create is allocated its own resources and billed separately.
Example – check the pricing for a DB instance on Amazon

2) Scalability – Scaling is easier since a single instance is maintained.

Problems : 

1) Security issues – Since every client’s data lives in the same instance, developers must be careful to show data only to the appropriate users. The highest risk factor with a single instance is security.

2) Data loss – If the instance fails due to a problem related to one client, it affects every other client, so routine database backups are a must. There is no compromise on data loss.

3) Performance issues – For a huge data set the database performance might degrade (this matters only for very large data sets, not for medium-scale databases).

This is called Multitenancy – a principle in software architecture where a single instance of the software runs on a server, serving multiple tenants.

Approach 2 : Multiple instances database

With this approach all the problems of the first approach are solved, but the remaining problem is scalability.

Algorithm Efficiency : Merge sort vs Insertion sort


Importance of Algorithm Efficiency :

Let us learn how much an algorithm's efficiency matters. We will analyse this with an interesting experiment. Before that, let us recall the efficiencies of both algorithms.

Insertion sort :

 The average and worst case running time of insertion sort is (c1) O(n²)

Merge Sort :

 The average and worst case running time of merge sort is (c2) O(n log n)

But let us analyse the efficiency with an experiment :

Experiment :

Let us take a faster computer, Computer A, running insertion sort, and a slower computer, Computer B, running merge sort.

They each must sort an array of one million numbers, considering the following factors:

  • Computer A executes one billion instructions per second
  • Computer B executes ten million instructions per second
  • Computer A is 100 times faster than Computer B in raw computing power
  • To make the difference even more dramatic, suppose the world’s craftiest programmer writes insertion sort in machine language for Computer A, and the resulting code requires 2n² instructions to sort n numbers (2 is the constant c1)
  • Merge sort is written by an average programmer in a high-level language with an inefficient compiler, and the resulting code takes 50·n·lg(n) instructions (50 is the constant c2)

With the above assumptions , to sort one million numbers

Time taken by Insertion Sort in Computer A ( Faster computer ) :
( 2 × (10⁶)² instructions ) / 10⁹ instructions/second = 2000 seconds,

Time taken by Merge Sort in Computer B ( Slower Computer ) :
( 50 × 10⁶ × lg(10⁶) instructions ) / 10⁷ instructions/second ≈ 100 seconds,
This experiment shows that even on a computer that is 100 times slower, the better algorithm wins against a faster computer running a less efficient algorithm.

Computer B finishes 20 times faster than Computer A, even though Computer A is 100 times faster in raw computing power.

This shows that even on low-end hardware an efficient algorithm can perform much better. One popular (if rough) example is Android vs iOS 🙂

Even though many Android smartphones are high end, with plenty of RAM and fast processors, they can perform worse than an iPhone that has less RAM and a slower processor.
Reference : Introduction to algorithms by Thomas H. Cormen

Graph Theory, Matrices – A Realtime example

Observe the matrix given below 

Adjacency Matrix


The above matrix has all zeros on the diagonal.

Let me draw a graph for this ,

Directed Graph


Assume that A, B and C are entities in database , Say Person entity.

A – Ram , B – Suma, C – Prema

Ram – Posted 
 Hello ….
Suma – Commented
  Idiot 🙂 …
Ram – Replied
  You Idiot …
Prema – commented
  See every one call you as idiot 🙂
Directed Graph
Hold on a moment … is the above graph correct ?
Observe what has changed in the graph below.
Un-directed Graph


It is not mandatory that only A can comment on B! B can also comment on A.

So the first graph shown, a directed graph, is wrong; it should be an undirected graph.

Now think about how Facebook keeps track of connected friends and mutual friends. How big must Facebook's graph be?

The intention of this article is to show how the matrix concept is used to represent a graph, and how graph theory is used in real time to represent a social networking site like Facebook.

Now you can see why one should learn graph theory and matrices.

Let us have a look at Facebook's DB and big data:
Total monthly active users – 1,310,000,000
Total mobile users – 680,000,000
Every 20 minutes on Facebook:
Links shared – 1 million
Friend requests – 2 million
Messages sent – 3 million

For loop VS While Loop – which is faster ?

Many times I have come across a very basic question: is the for loop faster, or the while loop, in C and other languages? Most of the time inexperienced developers ask me this, so I thought I would write an article on it.

The answer is that there is no difference in performance; for and while loops perform the same. It may differ slightly between compilers, because it depends on how a particular compiler generates machine code, but those differences are negligible.

I verified this by looking at the intermediate assembly code generated for both a for loop and a while loop. The generated assembly shows that both produce a near-identical number of instructions, and similar instructions.

Instead of explaining it in text, I am posting screenshots of the generated assembly code with the details marked. The assembly code was generated on Mac OS X in Xcode using the Assemble option.

For Loop Assembly Code


While Loop Assembly Code



Endianness – When does it matter ?

We all know what endianness is in computer architectures; big-endian and little-endian architectures work differently. In big endian the most significant byte is stored at the smallest address, whereas in little endian the least significant byte is stored at the smallest address.

The world’s most popular processors, Intel's, are little-endian machines, whereas the old Motorola processors were all big endian. Network protocols and routers work with big-endian data.

Most of the time endianness doesn’t come into the picture, because programmers don't usually deal with low-level bytes. If I write int number = 4; the program runs properly on both kinds of architecture, because when the value is processed it sits in processor registers rather than being read byte by byte from memory. Endianness comes into the picture when we deal with the bytes in memory.

For example – if we apply some effect or transform on an image using shift operators or some standard APIs, things might go wrong. OpenGL APIs internally shift bits in memory to perform certain operations, and in such cases the entire image may look inverted.

The common convention in networking is big endian; TCP/IP and other network protocols deal with big-endian data, so those who work on networking should be aware of endianness.


Let us take an example of how endianness might be a problem.

Assume that System 1 is sending an IP address to System 2, where System 1 has an 80x86 processor (little endian) and System 2 is a SPARC machine (big endian). The bytes sent from System 1 will be interpreted by the SPARC machine in big-endian order.


Let us take the IP address 192.168.2.1 and convert it to a hex value.

IP address to hex: take each octet and multiply it by 256^n, where n is the octet's zero-based index counted from the right

= (1 * 256 ^ 0) + (2 * 256 ^ 1) + ( 168 * 256 ^ 2) + (192 * 256 ^ 3)  

= 1 + 512 + 11010048 + 3221225472

= 3232236033 ( decimal )

= C0A80201 ( Hex )

System 1 means the value C0 A8 02 01, but in its little-endian memory the bytes sit as 01 02 A8 C0; if those bytes are sent as-is, System 2 will interpret them as the value 01 02 A8 C0.

Issues of this kind happen purely because of endianness.




Recursion VS Iteration – An Analysis with fibonacci and factorial


It is always difficult to choose one over the other, but recursive and iterative methods can be chosen wisely by analysing the algorithm against certain input values. Let us study the usage of recursive methods and analyse how recursive calls work internally.

How Does Recursion Work ?

The concept of a stack frame comes into the picture when we deal with recursion: for each call a stack frame is created and the values are pushed onto the stack. Once the recursive call reaches the base condition, the program starts popping the stack frames and returning the values. Below is a C program to find the factorial of a given number using a recursive method.

Let us take an example –

Finding Factorial of a Number

#include <stdio.h>

long int count = 1;

long int factorial(long int n) {
    count++; // to count how many times the method gets called
    if (n == 0) {
        return 1;
    }
    return n * factorial(n - 1);
}

int main(int argc, const char * argv[]) {
    int n = 13;
    printf("Factorial of %d = %ld , number of times called %ld\n", n, factorial(n), count);
    return 0;
}

Let us analyse how it works.

Every time factorial(n) is called, a stack frame is created and the call is pushed onto the stack; the entire call stack looks like below. The image below shows the stack operations for a given recursive call.


If I give the input 10, the counter reports 12,

If I give the input 14, the counter reports 16,

Generalising, the counter value is n + 2 (the function is called n + 1 times, and count starts at 1).

This looks OK, but let us take a different example and compare both: a program to find a Fibonacci number, analysed with respect to the call stack.

Finding Fibonacci number 

#include <stdio.h>

long int count = 1;

long int fibonacci(int n) {
    count++; // to count how many times the method gets called
    if (n <= 1) {
        return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
}

long int fibonacciIteration(int n) {
    long int x = 0, y = 1, z = 1;
    for (int i = 0; i < n; i++) {
        x = y;
        y = z;
        z = x + y;
    }
    return x;
}

int main(int argc, const char * argv[]) {
    int n = 4;
    printf("fib value recursive %ld ", fibonacci(n));
    n = 100; // let us give 100 as input for the iterative method
    printf("\nIterative method %ld ", fibonacciIteration(n));
    printf("\nnumber of times called %ld\n", count);
    return 0;
}

Now the call stack looks as follows; the image shows the memory layout of the call stack for a given recursive call of the Fibonacci program.

Call stack of Fibonacci

Let us analyse this example ,

If we observe the call stack, fib(2), fib(1) and fib(0) are called multiple times, which is unnecessary since those values have already been calculated.

For the input n = 4, the total number of times the fibonacci method is called (per the counter) = 10.

Now let us take a few more inputs and check how many times the function gets called:
n = 5 => 16 times,
n = 6 => 26 times,

Generalising the number of calls gives us an astonishing result:
the counter value = 2·fib(n+1), which grows exponentially, roughly φⁿ (and bounded above by 2ⁿ)

Let us take some interesting inputs 🙂
n = 30 => 2692538 times
n = 31 => 4356618 times (just by increasing n by 1, the count grew by over 1.6 million)

n = 42 => 866988874 times. Unbelievable! These results show how dangerous the recursive method is in this case.

n = 43 => my program does not even finish, even after 5 minutes.

Let us look at the iterative method.

long int fibonacciIteration(int n) {
    long int x = 0, y = 1, z = 1;
    for (int i = 0; i < n; i++) {
        x = y;
        y = z;
        z = x + y;
    }
    return x;
}

This method returns almost instantly even for input n = 100 (though note that fib(93) already overflows a signed 64-bit integer, so for such large n the printed value wraps around).

The conclusion: choosing between iterative and recursive methods should be based on the problem statement. The problem looks simple with small inputs, but analysing it with bigger inputs shows the big picture.

Update: Complexity of recursive Fibonacci –

The recurrence for this algorithm is T(n) = T(n−1) + T(n−2) + Θ(1)

which solves to T(n) = Θ(φⁿ)

where φ is the golden ratio, φ = (1 + √5) / 2.

I strongly recommend reading Introduction to Algorithms, 3rd edition: Purchase on Flipkart

A video tutorial on the analysis of the recursive Fibonacci function is here

How string copy becomes costly – An Analysis

Let us try an interesting example: simple, but with an interesting analysis at the end. Let me start with a requirement.

Assume that you are getting N strings from an XML file, where N is really huge, and each string has very many characters. A library is used to parse the strings from the XML file and returns them one by one.

The library returns each string and its length in a callback method; our program has to copy the string to keep its own copy.

Let me write simple code for this. To keep it easy to understand I will avoid the library usage and other details, and focus on the core problem: copying N strings.

#include <stdlib.h>
#include <string.h>

void getStrings() {
    int numberOfCharacters = 2014; // this length will be returned by the library after parsing the xml file
    char *string1 = (char*)malloc(numberOfCharacters); // to hold the string
    strcpy(string1, "This string we will get from library, for this example we are hard coding it");
    char *string2 = (char*)malloc(numberOfCharacters); // to hold a copy
    strcpy(string2, string1); // keep a copy of the string returned by the library
}

int main() {
    getStrings();
    return 0;
}

The above problem looks very simple, but the last strcpy() call becomes costly, since the number of strings is huge.

How is strcpy costly ?

strcpy typically looks like below –

char* strcpy(char *string1, const char *string2) {
    char *originalStringPointer = string1;
    while (*string2 != '\0') {
        *string1++ = *string2++;
    }
    *string1 = '\0';
    return originalStringPointer;
}

So behind the scenes, for every string copy (for all your N strings, where N is huge), strcpy iterates over and copies each character.

Let us take one example and analyse this – 

Analysis I –

If I have 100 strings, all of the same length, say 10,

then the number of times the loop in strcpy runs is

100 × 10 = 1000 times; this doesn't make much difference.

Analysis II – 

Let us take bigger values.

If I have 3000 strings of length 100 each,

then the loop in strcpy runs 3000 × 100 = 300,000 times.

300,000 iterations is big, and we assumed all strings are of the same size, just 100 characters; in a real-world example they may be far longer.

In such a scenario strcpy becomes very costly, and your application will perform really badly if you don't handle this case properly. If you happen to read all the strings during app launch, your application may take several seconds to show the first screen.

A clever programmer finds a better way to achieve this.

Better approach for better performance –

If you observe the problem statement, notice that we have both the string and its length. So instead of strcpy we can use memcpy, which copies a known number of bytes without testing each one for the terminating '\0', and typically moves word-sized chunks of memory rather than single characters.

So just replacing strcpy() with memcpy() avoids the 300,000 per-character terminator checks of the second analysis.

Your getStrings method now looks like below:

void getStrings() {
    int numberOfCharacters = 2014; // this length will be returned by the library after parsing the xml file
    char *string1 = (char*)malloc(numberOfCharacters); // to hold the string
    strcpy(string1, "This string we will get from library, for this example we are hard coding it");
    char *string2 = (char*)malloc(numberOfCharacters); // to hold a copy
    memcpy(string2, string1, numberOfCharacters); // keep a copy with memcpy
}


It is always good practice to find the best possible way to do something; even one line of code that runs hundreds or thousands of times can become a bottleneck.

Importance of DB Design – An analysis

The most important step in any big application's development life cycle is database design, whenever data storage is required. Many developers never think much before finalising a database design: some simply add fields to a table, and some define datatypes without considering the kind of data actually stored. Let's take a simple example and do an analysis that shows the importance of database design even with a small set of data.

Consider the following example , a database designed for a simple requirement and then the analysis of the same.

Example – Requirement

Design a database for a web application where registered users post articles. Anyone can post an article; each article falls under a category, and a category is in turn defined under a department and a stream. There will be a maximum of 100 departments and 100 streams in the system.

For example , an article on “ Analysis of computer simulated data on wheels of a plane “, this article might fall under “Aerospace” Department, Physics stream .

Another article on “ Calculation of Voltage variations within the cooling system “ ,this article might fall under “Aerospace” Department , Electrical stream.

Considering the above requirement and the two examples, the following design sounds good. (Note: the database designed here is simple, chosen just for analysis.)

DB Design

In the above design, observe that the department id and stream id fields in the respective tables are defined as INT, although their values will never go beyond 100 as per the requirement. The INT datatype is the first mistake: the fields could be declared as TINYINT, which takes 1 byte and holds signed values up to +127, whereas INT takes 4 bytes. So 3 extra bytes are wasted for every stored value.

Let us analyse this scenario with simple real time example –

Analysis I –

If each table (department and stream) holds 100 rows, then given the storage sizes above, 3 bytes are wasted per row, i.e. 300 bytes per table for 100 records.

So overall 600 bytes are wasted across these two tables. That alone doesn't make any difference; it is just a matter of a few hundred bytes.

But watch what happens if I change the ArticlePost table as follows (this kind of mistake usually happens when the database is bigger, with 100+ tables; with only a few tables there is much less chance of it).

Analysis II –

Modified ArticlePost table


For quick access, if I put both department id and stream id in the ArticlePost table itself, things get worse!

Now consider 10 articles for each department/stream combination; then the total number of rows in ArticlePost = 10 × 100 × 100 = 100,000 articles. This is simple math: total ArticlePosts = (articles per dept/stream pair) × number of departments × number of streams.

With 100,000 rows, each row now wastes 6 bytes (3 bytes for each of the two fields declared INT instead of TINYINT), so how much memory is wasted?

100,000 × 6 = 600,000 bytes ≈ 0.57 MB wasted.

In real systems 100,000 rows is not big at all, and this is just one table with two misdeclared fields; across a schema with 100+ tables and millions of rows, such waste multiplies quickly. We should take this seriously: memory wasted purely because of two extra fields and improper datatype usage.

Have a look at the datatypes and their storage requirements in the MySQL official documentation.



Update –

Just observe how the datatypes in an RDBMS look: INT(4), INT(10), VARCHAR(25).

For VARCHAR(N), N limits how many characters can be stored. (Note that for INT(N) in MySQL, N is only a display width and does not change the 4-byte storage; assuming it does is a common misconception.)

Remember what we studied in C? We can relate this idea to C's bit fields.

Assume that I want to hold the status of a request in my C program written for a government office complaint management system.

struct complaintStatus{
  int isComplaintRaised;
  int isComplainReceived;
  int isComplaintResolved;
} status;

Since we are holding only a status, it is not required to have an int of 4 bytes! A single bit is sufficient to hold either 0 or 1, where 0 indicates NO and 1 indicates YES.

Let me modify the above struct:

struct complaintStatus{
 int isComplaintRaised : 1;
 int isComplainReceived : 1;
 int isComplaintResolved : 1;
} status;

With the above structure, the compiler packs all three flags into the storage of a single int, restricting the amount of memory used by status.