Getting Started with SQL: A Beginner's Guide to Data Analysis on SQL
Introduction to SQL for Data Analysis
Structured Query Language (SQL) is a widely-used programming language for managing and analyzing data stored in relational databases. For data to be accessed and analyzed for use in decision-making, it has to be initially collected and stored. And how is it stored? in a database.
Accessing the data requires knowledge of SQL. A survey run by Luke Barousse revealed that SQL had a 43% likelihood of being a requirement. The requirement for the skill tends to increase as you progress in your career.
This introductory guide will provide you with a solid foundation in SQL fundamentals. We will cover basic SQL syntax, data manipulation, and data aggregation techniques, along with practical examples and exercises to help you develop your skills.
SQL Basics
A database is a collection of structured data that is organized and managed for easy access, manipulation, and retrieval. Databases are used to store large volumes of data logically and efficiently.
In a database, tables are used to organize the data into rows and columns. Each row in a table represents a single record or instance of a particular entity, while each column represents a specific attribute of that entity. For example, a database for an online store might have tables for products, customers, orders, and shipping information.
The SQL syntax consists of a series of commands and statements used to retrieve, insert, update, and delete data in a database. The most basic SQL command is the SELECT
statement, which is used to retrieve data from a table.
For example, the following SQL query would select all records from a table named "customers" and return them in a tabular format;
SELECT *
FROM customers;
The SQL SELECT statement takes in the syntax;
SELECT column1, column2, ...
FROM table_name;
Next, a SELECT statement can have an optional WHERE clause, which allows us to fetch data from a table that matches the specified condition(s). An example includes;
SELECT *
FROM customers
WHERE name = 'Chris'
There are various data types and operators used in SQL. Data types are used to define the type of data that can be stored in a particular column of a table.
The most common data types include integers, decimals, strings, dates, and Boolean values. Operators are used in SQL to perform logical and arithmetic operations on data. Some common operators in SQL include:
= (equals)
< (less than)
> (greater than)
<= (less than or equal to)
>= (greater than or equal to)
<> (not equal to)
AND
OR
NOT
They are useful in performing advanced queries and filtering through the data. For example, the following SQL query would select all records from a table named "orders" where the total price is greater than 100:
SELECT *
FROM orders
WHERE total_price > 100;
This query would retrieve all records from the "orders" table where the total price is greater than 100 and return them in a tabular format.
You can also combine multiple operators to perform SQL queries. For example;
SELECT *
FROM Customers
WHERE (country = 'Kenya' OR country = 'Uganda') AND orders > 100;
The ORDER BY clause is useful when seeking to sort the result in any order, either descending or ascending. For example;
SELECT *
FROM Customers
ORDER BY orders;
There are other clauses such as GROUP BY, LIKE, UNION, ANY and ALL, CASE, HAVING, and EXISTS
which are likewise useful when selecting and sorting out data we may seek from a database.
We may also seek to join two tables and the SQL JOIN
joins two tables based on a common column and results in data with matching values in these columns. There are other types of joins such as INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN
.
An example of a JOIN is;
SELECT Customers.customer_id, Customers.name, orders.amount
FROM Customers
JOIN orders
ON Customers.customer_id = orders.customer;
SQL is also useful in creating databases in SQL. The CREATE DATABASE
statement is used. An example is;
CREATE DATABASE Customers
After creating databases, you need to create tables. The CREATE TABLE
statement is used. For example;
CREATE TABLE Customers_Kenya (
id INT,
name VARCHAR(50),
address TEXT,
email VARCHAR(50),
phone VARCHAR(10)
In the event you want to drop a database, the DROP DATABASE Customers;
query can be used to delete the Customers database. You can also delete a table using the query DROP TABLE Customers_Kenya;
Altering a table is useful when seeking to add a column, rename a column, modify, or delete a column. The ALTER TABLE
command is used when seeking to alter a column.
After creating a table, you may want to insert new rows in the database table. To do this, the following command is used;
INSERT INTO Customers_Kenya(id, name, address, email, phone)
VALUES (1, 'user', '123-4567', 'User@gmail.com', 0712345678);
To update a record, we use the UPDATE
statement;
UPDATE Customers_Kenya
SET name = 'Johnny'
WHERE id = 1;
The DELETE
statement is used to delete rows from a database table.
DELETE FROM Customers_Kenya
WHERE id = 1;
Constraints in SQL are rules that are used to ensure data integrity in a database. They can be applied to a table or a column within a table, and are used to enforce business rules, improve data quality, and maintain data consistency. Here's an example of how constraints can be used in SQL;
CREATE TABLE Customers (
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
address VARCHAR(100),
email VARCHAR(50) UNIQUE NOT NULL
);
In this example, the email
column has a UNIQUE
constraint, which means that no two rows can have the same value in the email
column. Additionally, the NOT NULL
constraint ensures that every row in the table must have a value for the email
column, so we don't end up with any missing data.
If we try to insert a new row into the table with an email address that already exists, the database will generate an error, preventing us from violating the constraint.
Conclusion
SQL is a database management language that is useful for data scientists, data analysts, and data engineers. If you are seeking to start a career in data, it is highly recommended that you learn SQL.
In this introduction to SQL, we covered the basic concepts of SQL, including how to create tables, how to insert, update, and delete data, and how to query data from a database. We also touched on some important concepts like constraints which are essential for ensuring the integrity and consistency of a database.