Many testers know how to write SQL queries and how to fetch data from the tables. But only a few understand the internal architecture and how MySQL interacts with storage engines. This blog discusses the MySQL architecture, different layers and journey of SQL statement in MySQL data.
MySQL architecture is broken into three layers basically which can be defined by,
1. Application Layer
2. Logical Layer
3. Physical Layer
1. Application Layer
The application layer is where the clients and users interact with the MySQL RDBMS. All the services needed for connection handling, authentication, security are here. There are three main components in this layer namely Administrators, Clients, Query Users as shown in the below figure.
Administrators use various administrative interface and utilities like mysqladmin which performs tasks like shutting down the server and creating or dropping databases, mysqldump for backing up the database or copying databases to another server.
Clients communicate with MySQL through various interfaces and utilities like MySQL API’s. The MySQL API sends the query to the server as a series of tokens.
c. Query User
The query users interact with MySQL RDBMS through a query interface that is mysql.
2. Logical Layer
The Logical Layer takes the data from the Application Layer. Any functionality provided across storage engines lives at this level like stored procedures, triggers and views. It is divided into subsystems like Query Processor, Transaction Management, Recovery Management, Storage Management. These subsystems work together to process the requests issued to the MySQL database server. The output of one of the above subsystems becomes the input for another. Below is the basic conceptional diagram.
Query Processor – Now let’s try to understand query execution with a simple example.
SELECT * FROM Manufacture.Orders WHERE order_id = 1009;
In this query, we want to know order details for the order id 1009.
a. When a query request is received from a client in the application layer, we would execute this query using MySQL client session (MySQL Client is a utility to execute all MySQL commands and return an output). It is the responsibility of the first component in Query Processor i.e embedded DML (Data Manipulation Language) precompiler to extract the relevant SQL statements embedded in the client API command or to translate the client commands into the corresponding SQL statements.
b. When a client session is established a “MySQL THREAD” is created in MySQL to handle all the command executions and sessions.
c. The query is submitted to the MySQL server which is a “mysqld” process running on the remote or local server. “mysqld” is MySQL Server daemon program which runs in background and manages database related incoming and outgoing requests from clients. mysqld is a multi-threaded process which allows connection to multiple sessions, listen for all connections and manages MySQL instance.
d. Once entering these server process the query is checked against a list of previously executed SELECTs maintained by MySQL DBMS engine in an area of memory known as “QUERY CACHE”. If the same query was executed early the result set will be stored in the Query cache. This saves MySQL from doing any work of parsing, preprocessing and retrieving data from disk or memory resulting in fast output.
e. If the same exact query is not present in QUERY CACHE, MySQL will move to the next component i.e “PARSER” which will create a parse tree structure based on the query so that it can be easily understood by the other components.
f. Once the parse tree structure is ready, the Query Preprocessor will check the SQL syntax and check the semantics of the MySQL query to determine if the query is valid.
g. Once the MySQL query is deemed to be valid, the security integration manager is responsible for checking the GRANTS of user on the objects they want to access through the query. For example, we want to retrieve data from ‘Manufacture’ database’s ‘Orders’ table.
h. After determining the proper permissions and access, MySQL will take the query to the most important component i.e “QUERY OPTIMIZER”. This is the place where MySQL will decide on the execution plan to access and retrieve the data as fast as possible. Query Optimizer is responsible for checking which indexes exist on the specified table and if any exist whether it can use the index to retrieve data faster. An index here on ‘order_id’ column of ‘Orders’ table will help the storage engine layer to locate and retrieve data faster. Once confirmed MySQL will create an “EXECUTION PLAN” and pass on the plan to “Storage engine” layer (which I will be discussing in the next paragraphs).
Transaction Management – Next comes the Transaction Management which is responsible for making sure that the transaction is logged and executed automatically. It does so through the aid of the log manager and the concurrency-control manager. It is also responsible for issuing the COMMIT and the ROLLBACK SQL commands.
Recovery management – Here the log Manager logs every operation executed in the database and stores the operation logs as MySQL Commands. In case of system crash executing these commands will bring back the database to the last stable stage. The recovery manager is responsible for restoring the database to its last stable state. It does so by using the log for the database, which is acquired from the buffer manager and executing each operation in the log. Since the log manager logs all operations performed on the database (from the beginning of the database’s life), executing each command in the log file would recover the database to its last stable state.
Storage Management – Storage Management has buffer manager which allocates memory resources. It accepts the request from the execution engine, requests the details from buffer manager, receives references to data with memory from buffer manager. Returns this data to the upper layer.
3. Physical Layer
The third layer is the Physical Layer which contains the storage engines. They are responsible for storing and retrieving all data stored in MySQL. Physical Layer of MySQL is slightly different from other RDBMS. Here the physical system consists of Pluggable Storage Engine architecture that enables storage engines to be loaded into and unloaded from a running MySQL server.
The various engines available are MyISAM, InnoDB, CSV, Archive etc. Starting from MySQL 5.5.5, the default storage engine for new tables is InnoDB (The CREATE TABLE statement in MySQL creates InnoDB tables by default). Each storage engine has different characteristics and based on the requirement of application we can choose an appropriate storage engine. When you create a table, you can choose what storage engine to use. For example, if we have a CSV data, which accepts only text separated by comma, you can use the CSV storage engine which stores data in text files using comma-separated values format. Similarly, you can use Archive storage engine for storing large amounts of data without indexes.
MySQL stores each database (also called a schema) as a subdirectory of its data directory in the underlying filesystem. Every database has a corresponding data directory. When you create a table, MySQL stores the table definition in a .frm file with the same name as the table. Thus, when you create a table named Orders, MySQL stores the table definition in Orders.frm. These .frm files does not store data but only have the format that contains the description of the table structure.
I hope this gives an overview of MySQL Architecture. Please stay tuned for my other blog series on SQL query execution, database modeling and Liquibase.
If you are a startup finding it hard to hire technical QA engineers, learn more about Qxf2 Services.
I am an engineer with about 10 years of experience in software development and testing. Prior to joining Qxf2, I worked with reputed companies like Dell and Infosys. They helped me gain good experience in the manufacturing and healthcare domains. My QA career started at Dell. I got exposure to various testing tools, processes and methodologies, got an opportunity to work on various platforms like JMS Hermes, SOAP UI, data integration, queues, etc. I chose Qxf2 because it allowed remote working. My interests are vegetable gardening using organic methods, listening to music and reading books.