Database Management - Challenges and Opportunities
7 Most Frequently Asked Questions Regarding NoSQL and MongoDB Management
Author: Seth Chuang
What are the key features of NoSQL databases that distinguish them from RDBMS?
1. The amount of data our systems have to work with is constantly increasing and creating new challenges. NoSQL databases are built to handle a variety of data structures, such as JSON and XML.
2. NoSQL databases can scale out, which fits a decentralized (distributed) architecture. Graph databases are the exception.
3. NoSQL databases do not rely on relations between tables or on normalization. They generally don't support SQL syntax or JOIN operations. Because the aggregate is the smallest unit of storage, its rich data structure makes it easy to spread data across multiple nodes.
4. There is no need to define a fixed table schema in advance, so the database is schema-free.
5. Last but not least, most of them are open source.
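To make points 3 and 4 concrete, here is a minimal sketch in plain Python, with no database involved. The `collection` list, the field names, and the `order_total` helper are hypothetical and only stand in for a real document store.

```python
import json

# A hypothetical "collection": just a list of documents (dicts).
# No table schema is declared anywhere.
collection = []

# An order aggregate: the customer and the line items live inside one
# document, so the whole unit can be placed on a single node.
collection.append({
    "_id": 1,
    "customer": {"name": "Alice", "city": "Taipei"},
    "items": [
        {"sku": "A-100", "qty": 2, "price": 9.5},
        {"sku": "B-200", "qty": 1, "price": 20.0},
    ],
})

# A second document with a different shape is accepted as-is (schema-free).
collection.append({"_id": 2, "customer": {"name": "Bob"}, "coupon": "WELCOME"})


def order_total(doc):
    """Sum qty * price over the embedded items; missing items count as zero."""
    return sum(item["qty"] * item["price"] for item in doc.get("items", []))


# Each document serializes to JSON independently of the others.
serialized = [json.dumps(doc) for doc in collection]
totals = {doc["_id"]: order_total(doc) for doc in collection}
```

Because each order carries everything it needs, computing its total touches one document and no second table.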
What are the main advantages of NoSQL over RDBMS?
Ideal for huge amounts of data
A traditional RDBMS is designed to operate on a single node, so adding system resources means scaling up, which is costly and eventually runs into hardware limits. Conversely, NoSQL databases can usually scale horizontally across many commodity (x86) servers, which effectively addresses the performance problems caused by hardware bottlenecks.
Avoiding impedance mismatch problems
Using an RDBMS often comes with the problem of impedance mismatch, because the database and the programming language use different data models. The usual workaround is ORM (object-relational mapping) combined with SQL JOINs, which can result in poor system performance. In addition, if you need to change the database schema, you have to contact the DBA to modify it. NoSQL databases let you avoid these impedance mismatch problems.
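The mismatch can be illustrated with a small sketch in plain Python. The `Order` and `Item` classes and the two helper functions are hypothetical; they imitate by hand what an ORM does when it splits an object into table rows and reassembles it again.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Item:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: List[Item] = field(default_factory=list)


def to_rows(order):
    """Relational shape: one row for `orders`, one row per item for `order_items`."""
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, it.sku, it.qty) for it in order.items]
    return order_row, item_rows


def from_rows(order_row, item_rows):
    """Rebuild the object by matching rows on order_id (the job of JOIN plus ORM)."""
    order_id, customer = order_row
    items = [Item(sku, qty) for (oid, sku, qty) in item_rows if oid == order_id]
    return Order(order_id, customer, items)


order = Order(7, "Alice", [Item("A-100", 2), Item("B-200", 1)])

# Relational round trip: split into two tables, then reassemble.
order_row, item_rows = to_rows(order)
rebuilt = from_rows(order_row, item_rows)

# Document shape: the nested object maps to one document directly.
document = asdict(order)
```

The relational path needs two mapping functions and a match on `order_id`; the document path is a single conversion of the nested object.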
No need for JOIN operations
Sometimes it is necessary to use JOIN operations to combine rows from two or more tables, but these operations can seriously affect system performance. Conversely, NoSQL graph databases pre-process the relationships between graph nodes to keep queries efficient.
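As a rough illustration of pre-processed relationships, the sketch below builds an adjacency map once from a flat table of rows, so that each hop becomes a direct lookup. The names and the sample data are hypothetical, and a real graph database uses its own storage format rather than a Python dict.

```python
from collections import deque

# Relational shape: a flat "friendships" table of (person, friend) rows.
friend_rows = [
    ("ann", "bob"), ("bob", "cara"), ("cara", "dan"), ("ann", "eve"),
]

# Graph shape: relationships are pre-processed once into an adjacency map,
# so each hop is a direct lookup instead of a scan or JOIN over the table.
adjacency = {}
for person, friend in friend_rows:
    adjacency.setdefault(person, set()).add(friend)
    adjacency.setdefault(friend, set()).add(person)


def within_hops(start, max_hops):
    """Breadth-first search: everyone reachable in at most max_hops steps."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    seen.discard(start)
    return seen


friends_of_friends = within_hops("ann", 2)
```

In a relational model, each extra hop would typically mean another self-JOIN on the friendships table.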
What types of NoSQL databases exist?
1. Document database
Widely used in various scenarios (though not designed for transactional systems). It is better suited to flexible queries than a column database.
2. Column database
Widely used in a variety of scenarios (though not designed for transactional systems). It is better suited to write operations than a document database.
3. Key-value database
It can only be queried by key, and the value is treated as an opaque object, so individual fields inside the object cannot be queried. Suitable for storing user information such as shopping carts and web sessions.
4. Graph database
The data is structured as a graph of nodes and connections. Suitable for recording complex relationships, such as social networks, recommendation engines, and transportation routes.
The first three types are aggregate-oriented databases, which are suitable for processing huge amounts of data in a decentralized architecture but usually do not support ACID transactions.
The last type, the graph database, is suitable for processing complex relationships on a single server and supports ACID transactions, but does not support relational models.
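The difference between a key-value database and a document database described above can be sketched in a few lines of plain Python. The `kv_store`, `documents`, `kv_get`, and `find` names are hypothetical stand-ins, not the API of any real product.

```python
import json

# Key-value store: the value is an opaque blob, addressed only by key.
kv_store = {
    "session:42": json.dumps({"user": "alice", "cart": ["A-100", "B-200"]}),
    "session:43": json.dumps({"user": "bob", "cart": []}),
}


def kv_get(key):
    """The only supported lookup: fetch the whole value by its key."""
    return kv_store.get(key)


# Document store: the database understands the structure of each value,
# so it can filter on fields inside the document.
documents = [
    {"_id": 42, "user": "alice", "cart": ["A-100", "B-200"]},
    {"_id": 43, "user": "bob", "cart": []},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(k) == v for k, v in criteria.items())
    ]


blob = kv_get("session:42")                 # caller must decode the blob itself
alice_docs = find(documents, user="alice")  # query by a field, not by key
```

With the key-value store, finding Alice's session without knowing its key would require reading and decoding every value in the application.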
MongoDB, the Leading NoSQL Database
DB-Engines and Stack Overflow survey reports rank MongoDB as the No. 1 NoSQL database. MongoDB is the most popular NoSQL database available, as it improves development efficiency and is well suited to large amounts of data.
The design concept of MongoDB adopts the Nexus Architecture, which combines the advantages of RDBMS and NoSQL. From RDBMS it takes a query language and strong consistency; from NoSQL it takes a flexible data structure and horizontal scalability.
What are the main advantages of MongoDB?
1. Unstructured: suitable for storing unstructured and semi-structured (JSON, XML) data.
2. Schema-free: works out of the box, with no predefined schema.
3. Replication (replica set): multiple nodes maintain copies of the same data, and read requests can be spread across them to improve read performance.
4. Sharding (sharded cluster): data is distributed across multiple nodes, and write requests can be spread across them to improve write performance.
5. Storage engine: the storage engine can be swapped to suit the application scenario.
6. UI tools: there are many UI tools available for MongoDB management, for example gudab, Robo 3T, Ops Manager, and Compass.
Replication in MongoDB is managed through the replica set. A replica set is suitable for a production environment with a small, stable amount of data. Through replication, the same data is maintained by multiple nodes, which can effectively improve read performance. MongoDB uses primary-secondary replication. The primary node receives all write operations and, by default, read operations as well, while the secondary nodes replicate its data and can serve read operations only.
MongoDB also provides high availability through an active-standby mechanism. When the primary goes down, the secondaries take over: a new primary is elected from among them to complete the failover. The primary records the application's operations in an operation log (oplog), which serves as the synchronization source for the secondaries.
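The interplay of oplog, secondaries, and failover can be sketched as a toy model in plain Python. Everything here is a simplification made for illustration: the `Node` and `ReplicaSet` classes are hypothetical, the "election" simply picks the most up-to-date surviving node instead of holding a vote, and unreplicated writes are dropped on failover.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0      # number of oplog entries already applied
        self.alive = True

    def apply(self, entry):
        key, value = entry
        self.data[key] = value
        self.applied += 1


class ReplicaSet:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]
        self.oplog = []       # ordered list of (key, value) write operations

    def write(self, key, value):
        """Writes go to the primary, which records them in the oplog."""
        entry = (key, value)
        self.oplog.append(entry)
        self.primary.apply(entry)

    def sync(self, node):
        """A secondary catches up by replaying oplog entries it has not seen."""
        for entry in self.oplog[node.applied:]:
            node.apply(entry)

    def fail_primary(self):
        """Mark the primary as down and promote the most up-to-date survivor."""
        self.primary.alive = False
        candidates = [n for n in self.nodes if n.alive]
        self.primary = max(candidates, key=lambda n: n.applied)
        # Simplification: writes the new primary never received are discarded.
        del self.oplog[self.primary.applied:]
        return self.primary


rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write("x", 1)
rs.sync(rs.nodes[1])          # node-b replicates the first write
rs.write("y", 2)              # not yet replicated to any secondary
new_primary = rs.fail_primary()
rs.write("z", 3)              # accepted by the new primary
rs.sync(rs.nodes[2])          # node-c catches up from the oplog
```

In this run node-b becomes primary because it has applied more of the oplog than node-c, and the write to "y" is lost because it never left the failed primary.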
What is a MongoDB sharded cluster?
A sharded cluster is suitable for a production environment with a large and continuously growing amount of data. Sharding distributes data across multiple nodes to improve write performance. MongoDB uses the router (mongos) as the application's entry point, so the application can treat the entire sharded cluster as a stand-alone database, which effectively reduces development complexity. The config server records metadata such as routing rules.
For huge amounts of data, it is recommended to use a sharded cluster from the start, because sharding a data set after it has already grown very large not only consumes resources but also takes time.
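The routing idea can be sketched with a toy router in plain Python. This is an assumption-laden illustration, not how mongos is implemented: the `Router` class is hypothetical, the shards are in-memory dicts, and the routing rule is a simple hash modulo the shard count, whereas MongoDB assigns ranges of shard key values to shards using metadata kept on the config servers.

```python
import hashlib


class Router:
    """Hypothetical stand-in for a mongos-style router over in-memory shards."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        # Each shard holds its own slice of the data.
        self.shards = {name: {} for name in self.shard_names}

    def shard_for(self, shard_key):
        """Routing rule: stable hash of the shard key, modulo the shard count."""
        digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
        return self.shard_names[int(digest, 16) % len(self.shard_names)]

    def insert(self, shard_key, document):
        self.shards[self.shard_for(shard_key)][shard_key] = document

    def find(self, shard_key):
        """The application asks the router; it never needs to know the shard."""
        return self.shards[self.shard_for(shard_key)].get(shard_key)


router = Router(["shard-0", "shard-1", "shard-2"])
for user_id in range(1000):
    router.insert(user_id, {"user_id": user_id})

# How many documents ended up on each shard.
sizes = {name: len(docs) for name, docs in router.shards.items()}
```

The application only ever calls `insert` and `find` on the router, which is the sense in which the cluster looks like a single database. The sketch also hints at why sharding late is expensive: once data exists, changing how keys map to shards means moving documents between nodes.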
Are there any monitoring tools for MongoDB?
There are many UI tools available for MongoDB management, for example gudab, Robo 3T, Ops Manager, Compass, and many more. gudab, for example, provides the most important functions for managing MongoDB: monitoring, alerts, backup, individualized dashboards, and activity and user management modules.
A particularly handy feature is automatic detection of the cluster architecture. Auto-discovery detects and maps the system topology, lists all MongoDB instances in the cluster, and displays each host's type for monitoring. If a replica set or sharded cluster has multiple members, you don't need to enter the FULLNAME (host:port) of each one to add it to monitoring. Just enter one of them and all related cluster members are added.
gudab also provides metric monitoring, covering more than 30 metrics such as CPU, RAM, disk, connections, indexes, and throughput, and displays the system status in real-time charts. MongoDB itself provides commands for querying performance metrics; gudab records the results and renders them as a time series at a chosen time granularity. Additionally, the administrator can set alerts individually for every monitored metric on a server. Notifications can be sent by e-mail or through other customized methods. gudab also lets users schedule a planned offline window for a database so that the system does not keep sending alarms during that time. Compared with common monitoring tools, gudab provides full UI operation: there is no need to maintain multiple sets of configuration files, since servers are imported once and then managed with a few clicks.
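The general idea of rendering recorded samples as a time series and alerting on it can be sketched in plain Python. This does not reflect gudab's actual implementation; the function names, the sample data, the threshold, and the maintenance window are all hypothetical.

```python
from collections import defaultdict


def bucket_average(samples, granularity):
    """Group (timestamp_seconds, value) samples into fixed windows and average them."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % granularity].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def alerts(series, threshold, maintenance=None):
    """Return window starts whose average exceeds the threshold.

    `maintenance` is an optional (start, end) pair in seconds; windows that
    start inside [start, end) are silenced.
    """
    fired = []
    for start, avg in series.items():
        if maintenance and maintenance[0] <= start < maintenance[1]:
            continue
        if avg > threshold:
            fired.append(start)
    return fired


# Hypothetical connection-count samples taken every 20 seconds.
samples = [(0, 10), (20, 20), (40, 30), (60, 90), (80, 110), (120, 95), (140, 105)]

# One averaged data point per 60-second window.
series = bucket_average(samples, granularity=60)

all_alerts = alerts(series, threshold=80)
quiet_alerts = alerts(series, threshold=80, maintenance=(120, 180))
```

A coarser granularity smooths out short spikes, while the maintenance window suppresses alerts that are expected during planned downtime.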
Another useful feature is backup and restore. gudab provides a template script to run a full backup manually and then runs an additional oplog backup (the data changed in the meantime) automatically. In this way gudab can provide differential backups and export/import restoration on selected servers.
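The combination of a full backup with an oplog backup can be sketched in plain Python. This is not gudab's template script; the operation format and the function names are hypothetical and only show the principle of restoring a snapshot and then replaying the operations recorded after it.

```python
import copy


def apply_op(data, op):
    """Apply one oplog-style operation: ("set", key, value) or ("delete", key)."""
    if op[0] == "set":
        data[op[1]] = op[2]
    elif op[0] == "delete":
        data.pop(op[1], None)
    else:
        raise ValueError(f"unknown operation: {op[0]}")


def full_backup(data, oplog):
    """Snapshot the data and remember how many oplog entries it already covers."""
    return {"snapshot": copy.deepcopy(data), "oplog_position": len(oplog)}


def oplog_backup(backup, oplog):
    """Copy only the operations recorded after the full backup was taken."""
    return list(oplog[backup["oplog_position"]:])


def restore(backup, oplog_delta):
    """Start from the snapshot, then replay the saved operations in order."""
    data = copy.deepcopy(backup["snapshot"])
    for op in oplog_delta:
        apply_op(data, op)
    return data


live, oplog = {}, []
for op in [("set", "a", 1), ("set", "b", 2)]:
    apply_op(live, op)
    oplog.append(op)

backup = full_backup(live, oplog)      # full backup after two writes

for op in [("set", "a", 10), ("delete", "b"), ("set", "c", 3)]:
    apply_op(live, op)
    oplog.append(op)

delta = oplog_backup(backup, oplog)    # only the three later operations
restored = restore(backup, delta)
```

The oplog backup stays small because it holds only the changes since the snapshot, yet replaying it on top of the snapshot reproduces the live state.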