Database is a systematic organized or structured repository of indexed information (usually as a group of linked data files) that allows easy retrieval, updating, analysis, and output of data. Stored usually in a computer, this data could be in the form of graphics, reports, scripts, tables, text, etc., representing almost every kind of information. Most computer applications (including antivirus software, spreadsheets, word-processors) are databases at their core.
Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.
Since DBMSs comprise a significant market, computer and storage vendors often take into account DBMS requirements in their own development plans. Databases and DBMSs can be categorized according to the database model(s) that they support (such as relational or XML), the type(s) of computer they run on (from a server cluster to a mobile phone), the query language(s) used to access the database (such as SQL or XQuery), and their internal engineering, which affects performance, scalability, resilience, and security.
A database is not generally portable across different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one DBMS. Computer scientists may classify database-management systems according to the database models that they support; the most popular database systems since the 1980s have all supported the relational model – generally associated with the SQL language. Sometimes a DBMS is loosely referred to as a “database”.
Outside the world of professional information technology, the term database is often used to refer to any collection of related data (such as a spreadsheet or a card index). This article is concerned only with databases where the size and usage requirements necessitate use of a database management system.
Existing DBMSs provide various functions that allow management of a database and its data which can be classified into four main functional groups:
- Data definition – Creation, modification and removal of definitions that define the organization of the data.
- Update – Insertion, modification, and deletion of the actual data
- Retrieval – Providing information in a form directly usable or for further processing by other applications. The retrieved data may be made available in a form basically the same as it is stored in the database or in a new form obtained by altering or combining existing data from the database.
- Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure.
Both a database and its DBMS conform to the principles of a particular database model. “Database system” refers collectively to the database model, database management system, and database.
Origin of database management
Following the technology progress in the areas of processors, computer memory, computer storage, and computer networks, the sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitude. The development of database technology can be divided into three eras based on data model or structure: navigational, SQL/relational, and post-relational.
The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in the mid-1980s did computing hardware become powerful enough to allow the wide deployment of relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated in all large-scale data processing applications, and as of 2015 they remain dominant: IBM DB2, Oracle, MySQL, and Microsoft SQL Server are the top DBMS. The dominant database language, standardised SQL for the relational model, has influenced database languages for other data models.
Object databases were developed in the 1980s to overcome the inconvenience of object-relational impedance mismatch, which led to the coining of the term “post-relational” and also the development of hybrid object-relational databases.
The next generation of post-relational databases in the late 2000s became known as NoSQL databases, introducing fast key-value stores and document-oriented databases. A competing “next generation” known as NewSQL databases attempted new implementations that retained the relational/SQL model while aiming to match the high performance of NoSQL compared to commercially available relational DBMSs.
Terminologies used in database management
Database languages are specific to a particular data model. Notable examples include:
- SQL/XML combines XQuery with SQL.
- SQL combines the roles of data definition, data manipulation, and query in a single language. It was one of the first commercial languages for the relational model, although it departs in some respects from the relational model as described by Codd (for example, the rows and columns of a table can be ordered). SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. The standards have been regularly enhanced since and is supported (with varying degrees of conformance) by all mainstream commercial relational DBMSs.
- OQL is an object model language standard (from the Object Data Management Group). It has influenced the design of some of the newer query languages like JDOQL and EJB QL.
- XQuery is a standard XML query language implemented by XML database systems such as MarkLogic and eXist, by relational databases with XML capability such as Oracle and DB2, and also by in-memory XML processors such as Saxon.
Accessing the database: DBMS and RDBMS
A database management system (DBMS) is a type of software that allows you to define, manipulate, retrieve and manage data stored within a database. A relational database management system (RDBMS) is a type of database management software that was developed in the 1970s, based on the relational model, and is still the most popular way to manage a database. DBMS technologies began in the 1960s to support hierarchical databases, and they include IBM’s Information Management System and CA’s Integrated Database Management System.
Relational database
A relational database, is a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways.
Relational databases are made up of a set of tables with data that fits into a predefined category. Each table has at least one data category in a column, and each row has a certain data instance for the categories which are defined in the columns.
The Structured Query Language (SQL) is the standard user and application program interface for a relational database. Relational databases are easy to extend, and a new data category can be added after the original database creation without requiring that you modify all the existing applications.
Distributed database
A distributed database is a database in which portions of the database are stored in multiple physical locations, and in which processing is dispersed or replicated among different points in a network.
Distributed databases can be homogeneous or heterogeneous. All the physical locations in a homogeneous distributed database system have the same underlying hardware and run the same operating systems and database applications. The hardware, operating systems or database applications in a heterogeneous distributed database may be different at each of the locations.
Object-oriented database
Items created using object-oriented programming languages are often stored in relational databases, but object-oriented databases are well-suited for those items.
An object-oriented database is organized around objects rather than actions, and data rather than logic. For example, a multimedia record in a relational database can be a definable data object, as opposed to an alphanumeric value.
Graph database
A graph-oriented database, or graph database, is a type of NoSQL database that uses graph theory to store, map and query relationships. Graph databases are basically collections of nodes and edges, where each node represents an entity, and each edge represents a connection between nodes.
Graph databases are growing in popularity for analyzing interconnections. For example, companies might use a graph database to mine data about customers from social media.
Cloud database
A cloud database is a database that has been optimized or built for a virtualized environment, either in a hybrid cloud, public cloud or private cloud. Cloud databases provide benefits such as the ability to pay for storage capacity and bandwidth on a per-use basis, and they provide scalability on demand, along with high availability.
A cloud database also gives enterprises the opportunity to support business applications in a software-as-a-service deployment.
NoSQL database
NoSQL databases are effective for big data performance issues that relational databases aren’t built to solve. They are most effective when an organization must analyze large chunks of unstructured data or data that’s stored across multiple virtual servers in the cloud.
A database language may also incorporate features like:
- DBMS-specific Configuration and storage engine management
- Computations to modify query results, like counting, summing, averaging, sorting, grouping, and cross-referencing
- Constraint enforcement (e.g. in an automotive database, only allowing one engine type per car)
- Application programming interface version of the query language, for programmer convenience.