
DuckDB: Friendly SQL
SQL is a friendly language. It's easy to learn and straightforward to use. Think of it as a universal language: a kind of simplified English that works across many databases.
An old language that still holds up
SQL was born in the 1970s, making it over 50 years old. Although it has evolved, it hasn't changed dramatically. A veteran coder from the disco era could still read today's SQL and feel right at home.
It's a language you can pick up quickly, and even basic skills can take you far.
That said, ease is subjective. For some, the syntax can seem daunting at first, and some folks argue SQL is clunky or illogical. But don't worry, help is here.
SQL for the 21st century
In recent years, DuckDB has become a hot topic among data professionals. It is a columnar analytical database that's innovative in many ways. So much so that some are choosing it over the trusty Python library pandas.
One of DuckDB's standout features is its updated SQL syntax, dubbed “Friendly SQL.”
This addresses complaints about SQL being too rigid. The creators of DuckDB have cleverly kept the traditional syntax while adding new features. Users can stick with the old, embrace the new, or mix and match.
Below are examples of DuckDB's Friendly SQL in action. They use the file randomers.csv, which was produced by a synthetic data generator, and show how DuckDB handles CSV data with ease.
More than just CSV reading
DuckDB makes reading CSVs easy. Just point it to the file (see the example below). The best part? The file can be local or stored in the cloud. DuckDB handles it the same way.
It also processes other formats like JSON and Parquet with the same simplicity.
For complex files, you can use functions like read_csv, read_json, or read_parquet.
No extra libraries or config files needed, which makes life easier for data analysts.
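As a sketch of what that looks like, here is a hypothetical call to read_csv for a semicolon-delimited file (the file name messy.csv and the options are illustrative, not from the dataset used in this post):

```sql
-- Read a semicolon-delimited CSV, stating explicitly that it has a header row.
SELECT *
FROM read_csv('messy.csv', delim = ';', header = true);
```

For well-behaved files, the plain `FROM 'file.csv'` form shown below is usually all you need; the function form is there for when DuckDB's automatic detection needs a nudge.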
Skip the SELECT
No need to write SELECT *.
If you omit it, DuckDB selects all columns automatically.
FROM 'randomers.csv';
| date_of_birth | name | city |
|---------------|-----------------|--------------------|
| 1996-05-04 | Amanda Wilson | Matthewfurt |
| 1999-07-29 | Joseph Gonzalez | Christophershire |
| 1924-03-14 | Mark Werner | West Jennifermouth |
| NULL | Jordan Ballard | Ronaldstad |
| 1995-12-31 | Anthony Henry | NULL |
Ignore rows with NULL
The COLUMNS(*) expression is a lifesaver.
It applies a condition to all columns, like filtering out rows with NULL values.
FROM 'randomers.csv'
WHERE COLUMNS(*) IS NOT NULL;
| date_of_birth | name | city |
|---------------|-----------------|--------------------|
| 1996-05-04 | Amanda Wilson | Matthewfurt |
| 1999-07-29 | Joseph Gonzalez | Christophershire |
| 1924-03-14 | Mark Werner | West Jennifermouth |
Group and sort by all columns
DuckDB lets you group (GROUP BY ALL) and sort (ORDER BY ALL) by all columns without listing them individually.
This saves time, though sometimes listing columns explicitly is clearer.
SELECT
EXTRACT('year' FROM date_of_birth) AS "Year",
EXTRACT('month' FROM date_of_birth) AS "Month",
COUNT(*) AS "Count"
FROM 'randomers.csv'
GROUP BY ALL
ORDER BY ALL;
| Year | Month | Count |
|-------|-------|-------|
| 1924 | 3 | 1 |
| 1995 | 12 | 1 |
| 1996 | 5 | 1 |
| 1999 | 7 | 1 |
| NULL | NULL | 1 |
Get aggregates quickly
Have you ever used pandas' describe? DuckDB has something similar called SUMMARIZE.
It gives you aggregates over a table or query.
SUMMARIZE FROM 'randomers.csv';
| column_name | column_type | min | max | approx_unique | avg | std | q25 | q50 | q75 | count | null_percentage |
|---------------|-------------|------------------|--------------------|--------------:|---------------------|------|------------|------------|------------|------:|----------------:|
| date_of_birth | DATE | 1924-03-14 | 1999-07-29 | 4 | 1979-01-10 18:00:00 | NULL | 1960-02-06 | 1996-03-03 | 1997-12-15 | 5 | 20.00 |
| name | VARCHAR | Amanda Wilson | Mark Werner | 5 | NULL | NULL | NULL | NULL | NULL | 5 | 0.00 |
| city | VARCHAR | Christophershire | West Jennifermouth | 4 | NULL | NULL | NULL | NULL | NULL | 5 | 20.00 |
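SUMMARIZE also accepts an arbitrary query, not just a table or file. A small sketch (the filter here is illustrative):

```sql
-- Summarize only the rows that have a known city.
SUMMARIZE SELECT date_of_birth, city
FROM 'randomers.csv'
WHERE city IS NOT NULL;
```

This is handy when you want summary statistics for a filtered or transformed slice of the data rather than the whole file.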
Substrings, Python-style
Some DuckDB functions mimic Python's simplicity. For example, extracting the first three characters of a string can be done two ways:
SELECT
name,
SUBSTR(name, 1, 3) AS "Option 1",
name[1:3] AS "Option 2"
FROM 'randomers.csv';
| name | Option 1 | Option 2 |
|-----------------|----------|----------|
| Amanda Wilson | Ama | Ama |
| Joseph Gonzalez | Jos | Jos |
| Mark Werner | Mar | Mar |
| Jordan Ballard | Jor | Jor |
| Anthony Henry | Ant | Ant |
IF simplifies CASE
Do you find the CASE clause too wordy?
Use IF for simple yes/no logic, similar to MySQL.
SELECT
date_of_birth,
IF(MONTH(date_of_birth) > 6, 'Second half of the year', 'First half of the year') AS dob_half
FROM 'randomers.csv'
WHERE date_of_birth IS NOT NULL;
| date_of_birth | dob_half |
|---------------|-------------------------|
| 1996-05-04 | First half of the year |
| 1999-07-29 | Second half of the year |
| 1924-03-14 | First half of the year |
| 1995-12-31 | Second half of the year |
Easier data type casting
Converting data types is simpler with DuckDB.
You can use a shorthand syntax or the traditional CAST.
SELECT
date_of_birth::VARCHAR AS "Option 1",
CAST(date_of_birth AS VARCHAR) AS "Option 2"
FROM 'randomers.csv';
| Option 1 | Option 2 |
|------------|------------|
| 1996-05-04 | 1996-05-04 |
| 1999-07-29 | 1999-07-29 |
| 1924-03-14 | 1924-03-14 |
| NULL | NULL |
| 1995-12-31 | 1995-12-31 |
Exclude unwanted columns
Are you working with a table that has too many columns?
Use EXCLUDE to drop the ones you don't need.
SELECT * EXCLUDE(city)
FROM 'randomers.csv';
| date_of_birth | name |
|---------------|-----------------|
| 1996-05-04 | Amanda Wilson |
| 1999-07-29 | Joseph Gonzalez |
| 1924-03-14 | Mark Werner |
| NULL | Jordan Ballard |
| 1995-12-31 | Anthony Henry |
Conclusion
These few examples show how DuckDB breathes new life into SQL without breaking what works. Friendly SQL is a thoughtful update. It makes data work easier.
Is that all DuckDB can do? Of course not; it can do much more.