Let’s Learn the Basics of SQL

What is SQL, and Why do I Need it?

Tim McAleer
5 min readFeb 24, 2021

SQL is a query language designed primarily to access and manage data stored in a relational database. There are many Dialects of SQL available and the one you end up using will likely be dictated by either your employer or personal preference. These Dialects are much like the languages we speak: they may use differing terms for very specific reasons, but are built on the same foundation of communication. This blog is to illuminate some of these foundational methods that every Dialect uses.

Examples of different dialects

SELECT lesson_plan_1 FROM this_blog

The single most important distinction to learn is the difference between a table and a dataset. A table contains rows of information separated into columns. A dataset contains a set of tables, each with their own purpose and data. Some information may be duplicated across tables, but each is organized according to the needs of the table. For example, a person’s salary may be listed in a “Jobs” table along with the department they work in, but information about the department may be held in a “Departments” table.

Some notes about the format of the code. SQL, like any other written language, is best written in a way that enhances readability. The SQL statements SELECT and FROM are written in all caps to denote that it is part of the SQL language. Any columns or tables you are accessing are written lowercase for the opposite reason, to signify these are attributes of the data you want to read. Last, it is good form to give each step of the process its own line. SQL statements can be written as a run-on sentence in a single line but as a general rule each time your SQL command has a new purpose it should have its own line. For this example, let’s say you wanted to access the entire “Jobs” table.

SELECT * 
FROM jobs

SELECT is how you choose what columns you wish to see in your result. The * is a symbol meaning “all of the columns.” If you were to choose 2 or more columns from a larger table, you would separate them with commas.

FROM is how you choose what table you are accessing. Again, a list of tables would be separated by commas. There is no * function for tables.

That’s all you need for a basic SQL command! You’ve successfully accessed your data! Good job!

WHERE learning > 0

SELECT first_name, last_name
FROM jobs
WHERE salary > 50000

The WHERE statement should look familiar if you have worked with comparison operators before. This is how you define the parameters of your data to retrieve exactly what you want. This example will result in a table showing only the first and last names of every person who makes over $50,000. It’s key to remember that the salary column will not be returned unless specified. If you are looking for a range of salary, the BETWEEN statement exists to simplify your search.

SELECT first_name, last_name
FROM jobs
WHERE salary BETWEEN 50000 AND 80000

Remember that BETWEEN is inclusive and subsequently a little less clear to read than writing it out, but will work either way:

SELECT first_name, last_name
FROM jobs
WHERE salary >= 50000 AND salary <= 80000

Aggregate Functions, GROUP BY and HAVING

This section is the biggest stumbling block for beginners. Don’t worry! There’s nothing wrong with writing code that throws an error. Learning why errors happen will always be the best way to make fewer errors. Let’s take it one step at a time.

Aggregate Functions

These are generally pretty self explanatory and there are many functions available for your use. Research is important to discover if a function exists to make your work easier.

SELECT COUNT(*)
FROM jobs

The code above does one thing and it’s right in the name: It will return a count of the number of rows in your “Jobs” table. One important consideration is that this function will return the same result no matter what column you put in the COUNT() function. For example, COUNT(first_name) and COUNT(salary) will return the same number, since that column is present in every row and the function returns the number of rows. SUM(salary) will return the total dollar figure salary but will not work on a column of strings, like first_name.

GROUP BY and HAVING

Let’s say your “Jobs” table had the salaries for every position in your company and you only wanted to see the average salary of all the software engineers in your company. The GROUP BY statement will do exactly that, group the data by whatever column you supply through the aggregate function you’ve supplied.

SELECT title, AVG(salary)
FROM jobs
GROUP BY title
HAVING title = 'software_engineer'

The HAVING statement above is very similar to the WHERE statement except it is only used after a GROUP BY. In this example, your resulting table will have only one row, and two columns, one with the title ‘software_engineer’ and one column with the average salary of every software engineer.

Without the HAVING statement, your table would have as many rows as you have job titles and each title would have its own average salary listed.

WHERE, GROUP BY and HAVING are easy to mix up. The key to remember is that a GROUP BY is always used when you need to define a result from an aggregate function. Producing an average salary from your entire table will work without grouping, but as soon as you try to limit your results by job title or another parameter you will need a GROUP BY to organize.

When in doubt, it’s best practice to visualize what the table you’re looking for will look like. With a table containing 30 rows and 3 job titles, if you GROUP BY the job title your result will only have 3 rows. Each row can contain an average salary or a count or any single measurement for each job title. Attempting to produce 10 different individual salaries in a table grouped into 3 rows will produce an error. Planning your results ahead of time will always benefit your SQL queries and will go a long way toward getting the results you want.

--

--