The spot for professionals interested in advanced data analytics & applications of innovative technologies!

Database Management Systems

Data has proven to be the most important commodity in the digital economy. But to harness its value, data needs to be stored and processed. Only then can it be transformed into information and further refined into knowledge. The mechanism for storing, curating, and processing data - facilitating these transformations - is the database. To work in information technology or management analytics is to fundamentally understand the foundation of database design and its importance.

A Quick Description of the Database Management System

A database management system is a software system which facilitates the organization of collected data into a particular database architecture. Many types of databases exist, each with their own benefits. For example, relational (Relational Database Management System, or RDBMS), document store, key-value store, column-oriented, graph, or others. Popular DBMSs include MongoDB, Cassandra, Redis, MySQL, Microsoft SQL Server, SQLite, Neo4j, among many, many, many more.
Click Here

101: Understanding The Basics of Databases for Analytics

Database Basics for Management Analytics

Relational

Relational (aka SQL) databases are the oldest general purpose database type and remain extremely popular today. Regardless of your industry or profession, if you are in operations or analytics, you will come across RDBMs and need to understand the mechanics and processes of this architecture.

Non-Relational

Also termed “NoSQL” this is a varied collection of modern database types that offer approaches that differ from the standard relational pattern. NoSQL is said to stand for either “non-SQL” or “not only SQL” indicating that this may allow for SQL-like querying.

Relational

The Power of The Relational Database Paradigms

What is a relational data model?

The relational database type uses a structure that allows users to identify and access data in relation to another piece of data in the database. These databases are extremely popular just as they are easy to understand because data is organized into structured, defined, and organized tables.

Also known as: the SQL database

SQL stands for “structured query language” and was a language developed to perform operations in the relational database system. Because of the relational-nature of this model and its organization of tables, rows, and columns, SQL optimizes the querying of data collected.

The design of a relational database

Luckily, this is a fairly simple architecture to understand. The basics inform us that a relational database is made up of tables, records, fields, columns, and rows affecting the way we build and query data in this model.

Tables:
A table is a set of data elements (values).
Records:
A piece of data in a file is called a record.
Fields:
Each item in a record is called a field.
Rows:
One piece of data or record is called a row.
Columns:
Each item or field is called a column.
relational database model example

Example: diagram of the relational data model for South African Municipalities uploaded to Research Gate by Jean Vincent Fonou Dombeau.

Examples of relational databases

There are a variety of relational database products available. Selecting one over the other is typically an enterprise decision or a personal one, depending on the work and environment. Open-source databases such as PostgreSQL or MariaDB are free while others requires purchase or subscription plans such as the Oracle-owned MySQL. Cloud-based databases are also available with Amazon and Google, among others.

As an analyst you can ensure that you are relatively agnostic to which database you use over another if you properly understand the fundamental nature of relational database management systems and its model, and are able to communicate/code in the structured querying language . Then your work is a matter of transforming data into valuable information and knowledge.

What you need to get started

First, you will learn how to query data from a single table using basic data selection techniques such as selecting columns, sorting a result set, and filtering rows. Your next effort is where you learn more advanced query techniques such as joining multiple tables, using set operations, and constructing subqueries. It is wise to also learn how to manage database tables such as creating new tables or modifying an existing table’s structure.

Learn structured query language

There are abundant courses online teaching SQL for students. The best experience involves practice. If you are completely new, consider a course where you are also playing with simple tables in a hosted RDMS environment that way you are not stuck downloading and configuring any software. If you have access to a database system you can take courses that provide datasets while teaching more advanced concepts and techniques.

Non-Relational

The Power of The Non-Relational Database Paradigms

What is a non-relational data model?

An alternative emerged to legacy relational database systems and were made increasingly popular – from necessity – by tech giants such as Google, Netflix, Facebook, and Amazon all of whom manage and process big data. In contrast to traditional RDBMs, non-relational databases do not store data in structured tables but instead provide a variety of new ways to store data. Each with important benefits and trade-offs. NoSQL databases may be organized as key-value, graph-store, document-based, or column-based.

Why the No-SQL database design?

Modern applications placed new demands on existing database systems that were lacking within the RDBMs architecture. On the one hand, relational databases were not designed to cope with the scale and agility challenges of applications today, nor were they developed  with the capacities of today’s storage and processing power. Given the commodity of today’s exploding volume of new and rapidly changing data types — structured, semi-structured, unstructured and polymorphic data, No-SQL databases emerged.

Different types of No-SQL databases

Luckily, this is a fairly simple architecture to understand. The basics inform us that a relational database is made up of tables, records, fields, columns, and rows affecting the way we build and query data in this model.

Document databases:
These databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
Graph databases:
Graph database architecture is used to store information about networks of data, such as social connections. Examples include Neo4J and Giraph.
Key-value stores:
This is the simplest NoSQL database. Every single item in the database is stored as an attribute name (or ‘key’), together with its value. Examples are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as ‘integer’, which adds functionality.
Wide-column stores:
Examples include Cassandra and HBase. These databases are optimized for queries over large datasets, and store columns of data together, instead of rows.

Example: diagram of a non-relational data model which represents complex and deeply nested document structure. Photo from Ebay engineering.

Will you need a no-sql databases?

In terms of building new applications, considerations of alternative data models such as those available in the non-relational paradigm are extremely important. There are multiple examples and products for the various frameworks. It often depends on scalability, latency, data type, performance, querying language, etc etc… But if you want your application to handle a lot of complicated querying, database transactions and routine analysis of data, you’ll probably want to stick with a relational database.

As an analyst working with applications that are heavy into data analytics, you may not depend upon non-relational databases as much as you may assume given their growing popularity. This is because non-relational databases have no joins like there would be in relational databases making it difficult to query data. This means you would need to perform multiple queries and join the data manually within your code – which is not easy or efficient in terms of mean-adjusted error and effort.

What you need to get started

First, you will need a very strong understanding of exactly what your data requirements are before you select between a relational or non-relational data model. This requires an audit of data sources, use-cases, storage and processing requirements, application goals, and user experience. Nevertheless you may need to store unstructured data so that you can convert it into meaningful information for your business – the correct database can facilitate this.

Learning new querying languages

When it comes to actually using a database we all need a concrete mechanism for creating, manipulating and querying data. We need a query language. We already know about SQL. But what if your data isn’t structured in tables, columns and rows? How do we query this data? It it is worthwhile to understand various querying languages. Examples include GQL – the Graph Querying Language for graph databases or Object Query Language (OQL) a query language standard for object-oriented databases also modeled after SQL.

Case studies

How graphing database Neo4j helped solve metasearch for Lyft

Women in technology

Read our interviews and browse resources to inspire and engage

Join our newsletter!

Discover your professional and personal career goals