Getting Started with SQL: A Beginner's Guide to Data Analysis on SQL

Photo by Growtika on Unsplash

Getting Started with SQL: A Beginner's Guide to Data Analysis on SQL

Introduction to SQL for Data Analysis

Structured Query Language (SQL) is a widely-used programming language for managing and analyzing data stored in relational databases. For data to be accessed and analyzed for use in decision-making, it has to be initially collected and stored. And how is it stored? in a database.

Accessing the data requires knowledge of SQL. A survey run by Luke Barousse revealed that SQL had a 43% likelihood of being a requirement. The requirement for the skill tends to increase as you progress in your career.

This introductory guide will provide you with a solid foundation in SQL fundamentals. We will cover basic SQL syntax, data manipulation, and data aggregation techniques, along with practical examples and exercises to help you develop your skills.

SQL Basics

A database is a collection of structured data that is organized and managed for easy access, manipulation, and retrieval. Databases are used to store large volumes of data logically and efficiently.

In a database, tables are used to organize the data into rows and columns. Each row in a table represents a single record or instance of a particular entity, while each column represents a specific attribute of that entity. For example, a database for an online store might have tables for products, customers, orders, and shipping information.

The SQL syntax consists of a series of commands and statements used to retrieve, insert, update, and delete data in a database. The most basic SQL command is the SELECT statement, which is used to retrieve data from a table.

For example, the following SQL query would select all records from a table named "customers" and return them in a tabular format;

SELECT *
FROM customers;

The SQL SELECT statement takes in the syntax;

SELECT column1, column2, ...
FROM table_name;

Next, a SELECT statement can have an optional WHERE clause, which allows us to fetch data from a table that matches the specified condition(s). An example includes;

SELECT *
FROM customers
WHERE name = 'Chris'

There are various data types and operators used in SQL. Data types are used to define the type of data that can be stored in a particular column of a table.

The most common data types include integers, decimals, strings, dates, and Boolean values. Operators are used in SQL to perform logical and arithmetic operations on data. Some common operators in SQL include:

= (equals)

< (less than)

> (greater than)

<= (less than or equal to)

>= (greater than or equal to)

<> (not equal to)

AND

OR

NOT

They are useful in performing advanced queries and filtering through the data. For example, the following SQL query would select all records from a table named "orders" where the total price is greater than 100:

SELECT * 
FROM orders 
WHERE total_price > 100;

This query would retrieve all records from the "orders" table where the total price is greater than 100 and return them in a tabular format.

You can also combine multiple operators to perform SQL queries. For example;

SELECT *
FROM Customers
WHERE (country = 'Kenya' OR country = 'Uganda') AND orders > 100;

The ORDER BY clause is useful when seeking to sort the result in any order, either descending or ascending. For example;

SELECT *
FROM Customers
ORDER BY orders;

There are other clauses such as GROUP BY, LIKE, UNION, ANY and ALL, CASE, HAVING, and EXISTS which are likewise useful when selecting and sorting out data we may seek from a database.

We may also seek to join two tables and the SQL JOIN joins two tables based on a common column and results in data with matching values in these columns. There are other types of joins such as INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.

An example of a JOIN is;

SELECT Customers.customer_id, Customers.name, orders.amount
FROM Customers
JOIN orders
ON Customers.customer_id = orders.customer;

SQL is also useful in creating databases in SQL. The CREATE DATABASE statement is used. An example is;

CREATE DATABASE Customers

After creating databases, you need to create tables. The CREATE TABLE statement is used. For example;

CREATE TABLE Customers_Kenya (
  id INT,
  name VARCHAR(50),
  address TEXT,
  email VARCHAR(50),
  phone VARCHAR(10)

In the event you want to drop a database, the DROP DATABASE Customers; query can be used to delete the Customers database. You can also delete a table using the query DROP TABLE Customers_Kenya;

Altering a table is useful when seeking to add a column, rename a column, modify, or delete a column. The ALTER TABLE command is used when seeking to alter a column.

After creating a table, you may want to insert new rows in the database table. To do this, the following command is used;

INSERT INTO Customers_Kenya(id, name, address, email, phone)
VALUES (1, 'user', '123-4567', 'User@gmail.com', 0712345678);

To update a record, we use the UPDATE statement;

UPDATE Customers_Kenya
SET name = 'Johnny'
WHERE id = 1;

The DELETE statement is used to delete rows from a database table.

DELETE FROM Customers_Kenya
WHERE id = 1;

Constraints in SQL are rules that are used to ensure data integrity in a database. They can be applied to a table or a column within a table, and are used to enforce business rules, improve data quality, and maintain data consistency. Here's an example of how constraints can be used in SQL;

CREATE TABLE Customers (
  id INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  address VARCHAR(100),
  email VARCHAR(50) UNIQUE NOT NULL
);

In this example, the email column has a UNIQUE constraint, which means that no two rows can have the same value in the email column. Additionally, the NOT NULL constraint ensures that every row in the table must have a value for the email column, so we don't end up with any missing data.

If we try to insert a new row into the table with an email address that already exists, the database will generate an error, preventing us from violating the constraint.

Conclusion

SQL is a database management language that is useful for data scientists, data analysts, and data engineers. If you are seeking to start a career in data, it is highly recommended that you learn SQL.

In this introduction to SQL, we covered the basic concepts of SQL, including how to create tables, how to insert, update, and delete data, and how to query data from a database. We also touched on some important concepts like constraints which are essential for ensuring the integrity and consistency of a database.

Did you find this article valuable?

Support Pius Mutuma by becoming a sponsor. Any amount is appreciated!