Playing for Change Episode 4

Distributed Database Storage Network

 

I have recently come across a big problem with one of my online tools: slow databases. It’s not something I can fix directly because I use a basic hosting company, but I have come up with an idea to take load off a single database and spread it among many. I am calling this Distributed Database Storage. This process differs from current mainstream practice and could possibly change the way we store our applications’ information on a day-to-day basis. To start, you need a few different types of servers: the main application server, a control database, and as many databases as you require.

The following diagram shows how many web applications store their information in databases.

typicalDataStore.jpg

In the diagram above, client computers request a page from the application server. The application server then makes a query to one of the databases. The database that is chosen is usually the one that, at the time of the query, has the lowest load of running queries. The databases sync with each other to create an exact copy on each system. Updates can be slower in this system because information must be synced from one database to the others. This is a good design for systems that do not need instant updates to the database as a whole.
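As a rough illustration of that selection step, here is a minimal Python sketch. The database names and query counts are made up, and a real setup would get its load figures from the database servers or a load balancer rather than a hard-coded dictionary.

```python
# Hypothetical replicas mapped to their current number of running queries.
running_queries = {"db1": 12, "db2": 3, "db3": 7}


def pick_least_loaded(load_by_db):
    """Return the name of the database with the fewest running queries."""
    return min(load_by_db, key=load_by_db.get)


# The application server would send its next query to this replica.
chosen = pick_least_loaded(running_queries)
```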

The idea is that instead of duplicating data across multiple database servers, each piece of data will reside on only a single database server, and the application will spread the data among many databases. Since the application needs to know which server to connect to, it requires an index of the information. To make reads of the index as fast as possible, it will be loaded entirely into the memory of the control database. The application will contact the Database Index/Router asking where the needed information is stored; the router will then send the query to the appropriate database and send the response back to the application server.
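To make the lookup concrete, here is a small Python sketch of the index/router idea. It is only a sketch under my own assumptions: the class and function names are invented for illustration, the "databases" are plain dictionaries standing in for real servers, and each row is identified by a simple string key.

```python
class IndexRouter:
    """In-memory index mapping a row key to the database that stores it."""

    def __init__(self):
        self.location = {}  # row key -> database name

    def register(self, key, db_name):
        """Record that the row with this key lives on db_name."""
        self.location[key] = db_name

    def locate(self, key):
        """Return the name of the database holding this key."""
        return self.location[key]


def fetch(router, databases, key):
    """Ask the router where the key lives, then read it from that database."""
    db_name = router.locate(key)
    return databases[db_name][key]


# Stand-in databases: each row exists on exactly one of them.
databases = {
    "db1": {"user:1": "Alice"},
    "db2": {"user:2": "Bob"},
}

# Build the index once so later lookups are served from memory.
router = IndexRouter()
for db_name, rows in databases.items():
    for key in rows:
        router.register(key, db_name)
```

With this in place, `fetch(router, databases, "user:2")` consults the index first and only then touches the one database that holds the row.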

distibutedDataStore.jpg

The next phase is to implement an optimization script that will run every month or week, depending on the traffic to the site. The optimization will rank the rows in the database from extremely high activity to extremely low activity. The site will then have to go offline for the few minutes of data swapping. Each row will be given a value from 1 to ∞, with 1 being the most active and each increment being less active than the previous one. Each row will then be moved to an appropriate database to balance the load among all of the system’s databases.
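Here is one way the ranking and redistribution could look in Python. This is a sketch, not a finished design: the access counts are invented, and dealing rows out round-robin in rank order is just one simple balancing strategy I am assuming for illustration.

```python
def rank_rows(access_counts):
    """Return {row_key: rank}, where rank 1 is the most active row."""
    # Sort by descending access count; ties are broken by key for stable output.
    ordered = sorted(access_counts, key=lambda k: (-access_counts[k], k))
    return {key: rank for rank, key in enumerate(ordered, start=1)}


def assign_rows(ranks, db_names):
    """Deal rows out to databases in rank order, round-robin."""
    ordered = sorted(ranks, key=ranks.get)
    return {key: db_names[i % len(db_names)] for i, key in enumerate(ordered)}


# Hypothetical access counts gathered since the last optimization run.
access_counts = {"a": 900, "b": 850, "c": 40, "d": 5}

ranks = rank_rows(access_counts)
placement = assign_rows(ranks, ["db1", "db2"])
```

Because the rows are handed out in rank order, the two busiest rows ("a" and "b") end up on different databases instead of competing on the same one. After the rows are moved, the in-memory index would need to be updated with the new locations.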

This outline allows any number of databases to be used together. It also makes it easy to add to the database network by updating a single database record file. I have personally started work on implementing a Distributed Database Storage Network for Idea-Labs.com. There are still some finer details that need to be planned out before a successful implementation, but I feel this is a good abstract to begin my journey.