
DuckDB: Friendly SQL
SQL is a friendly language. It's easy to learn and straightforward to use. Think of it as a universal language: a kind of simplified English that works across many databases.
An old language that still holds up
SQL was born in the 1970s, making it over 50 years old. Although it has evolved, it hasn't changed dramatically. A veteran coder from the disco era could still read today's SQL and feel right at home.
It's a language you can pick up quickly, and even basic skills can take you far.
That said, ease is subjective. For some, the syntax can seem daunting at first, and some folks argue SQL is clunky or illogical. But don't worry, help is here.
SQL for the 21st century
In recent years, DuckDB has become a hot topic among data professionals. It is a columnar analytical database that's innovative in many ways. So much so that some are choosing it over the trusty Python library pandas.
One of DuckDB's standout features is its updated SQL syntax, dubbed “Friendly SQL.”
This addresses complaints about SQL being too rigid. The creators of DuckDB have cleverly kept the traditional syntax while adding new features. Users can stick with the old, embrace the new, or mix and match.
Below are examples of DuckDB's Friendly SQL in action. They use the file randomers.csv, which was produced by a synthetic data generator, and show how DuckDB handles CSV data with ease.
More than just CSV reading
DuckDB makes reading CSVs easy. Just point it to the file (see the example below). The best part? The file can be local or stored in the cloud. DuckDB handles it the same way.
It also processes other formats like JSON and Parquet with the same simplicity.
For complex files, you can use functions like read_csv, read_json, or read_parquet.
No extra libraries or config files needed, which makes life easier for data analysts.
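As a sketch of what that looks like, here is a hypothetical call to read_csv for a semicolon-delimited file (the file name messy.csv and the options are illustrative, not from the dataset used in this post):

```sql
-- Read a semicolon-delimited CSV, stating explicitly that it has a header row.
SELECT *
FROM read_csv('messy.csv', delim = ';', header = true);
```

For well-behaved files, the plain `FROM 'file.csv'` form shown below is usually all you need; the function form is there for when DuckDB's automatic detection needs a nudge.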
Skip the SELECT
No need to write SELECT *.
If you omit it, DuckDB selects all columns automatically.
FROM 'randomers.csv';
| date_of_birth | name | city |
|---------------|-----------------|--------------------|
| 1996-05-04 | Amanda Wilson | Matthewfurt |
| 1999-07-29 | Joseph Gonzalez | Christophershire |
| 1924-03-14 | Mark Werner | West Jennifermouth |
| NULL | Jordan Ballard | Ronaldstad |
| 1995-12-31 | Anthony Henry | NULL |
Ignore rows with NULL
The COLUMNS(*) expression is a lifesaver.
It applies a condition to all columns, like filtering out rows with NULL values.
FROM 'randomers.csv'
WHERE COLUMNS(*) IS NOT NULL;
| date_of_birth | name | city |
|---------------|-----------------|--------------------|
| 1996-05-04 | Amanda Wilson | Matthewfurt |
| 1999-07-29 | Joseph Gonzalez | Christophershire |
| 1924-03-14 | Mark Werner | West Jennifermouth |
Group and sort by all columns
DuckDB lets you group (GROUP BY ALL) and sort (ORDER BY ALL) by all columns without listing them individually.
This saves time, though sometimes listing columns explicitly is clearer.
SELECT
EXTRACT('year' FROM date_of_birth) AS "Year",
EXTRACT('month' FROM date_of_birth) AS "Month",
COUNT(*) AS "Count"
FROM 'randomers.csv'
GROUP BY ALL
ORDER BY ALL;
| Year | Month | Count |
|-------|-------|-------|
| 1924 | 3 | 1 |
| 1995 | 12 | 1 |
| 1996 | 5 | 1 |
| 1999 | 7 | 1 |
| NULL | NULL | 1 |
Get aggregates quickly
Have you ever used pandas' describe? DuckDB has something similar called SUMMARIZE.
It gives you aggregates over a table or query.
SUMMARIZE FROM 'randomers.csv';
| column_name | column_type | min | max | approx_unique | avg | std | q25 | q50 | q75 | count | null_percentage |
|---------------|-------------|------------------|--------------------|--------------:|---------------------|------|------------|------------|------------|------:|----------------:|
| date_of_birth | DATE | 1924-03-14 | 1999-07-29 | 4 | 1979-01-10 18:00:00 | NULL | 1960-02-06 | 1996-03-03 | 1997-12-15 | 5 | 20.00 |
| name | VARCHAR | Amanda Wilson | Mark Werner | 5 | NULL | NULL | NULL | NULL | NULL | 5 | 0.00 |
| city | VARCHAR | Christophershire | West Jennifermouth | 4 | NULL | NULL | NULL | NULL | NULL | 5 | 20.00 |
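SUMMARIZE also accepts an arbitrary query, not just a table or file. A small sketch (the filter here is illustrative):

```sql
-- Summarize only the rows that have a known city.
SUMMARIZE SELECT date_of_birth, city
FROM 'randomers.csv'
WHERE city IS NOT NULL;
```

This is handy when you want summary statistics for a filtered or transformed slice of the data rather than the whole file.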
Substrings, Python-style
Some DuckDB functions mimic Python's simplicity. For example, extracting the first three characters of a string can be done two ways:
SELECT
name,
SUBSTR(name, 1, 3) AS "Option 1",
name[1:3] AS "Option 2"
FROM 'randomers.csv';
| name | Option 1 | Option 2 |
|-----------------|----------|----------|
| Amanda Wilson | Ama | Ama |
| Joseph Gonzalez | Jos | Jos |
| Mark Werner | Mar | Mar |
| Jordan Ballard | Jor | Jor |
| Anthony Henry | Ant | Ant |
IF simplifies CASE
Do you find the CASE clause too wordy?
Use IF for simple yes/no logic, similar to MySQL.
SELECT
date_of_birth,
IF(MONTH(date_of_birth) > 6, 'Second half of the year', 'First half of the year') AS dob_half
FROM 'randomers.csv'
WHERE date_of_birth IS NOT NULL;
| date_of_birth | dob_half |
|---------------|-------------------------|
| 1996-05-04 | First half of the year |
| 1999-07-29 | Second half of the year |
| 1924-03-14 | First half of the year |
| 1995-12-31 | Second half of the year |
Easier data type casting
Converting data types is simpler with DuckDB.
You can use a shorthand syntax or the traditional CAST.
SELECT
date_of_birth::VARCHAR AS "Option 1",
CAST(date_of_birth AS VARCHAR) AS "Option 2"
FROM 'randomers.csv';
| Option 1 | Option 2 |
|------------|------------|
| 1996-05-04 | 1996-05-04 |
| 1999-07-29 | 1999-07-29 |
| 1924-03-14 | 1924-03-14 |
| NULL | NULL |
| 1995-12-31 | 1995-12-31 |
Exclude unwanted columns
Are you working with a table that has too many columns?
Use EXCLUDE to drop the ones you don't need.
SELECT * EXCLUDE(city)
FROM 'randomers.csv';
| date_of_birth | name |
|---------------|-----------------|
| 1996-05-04 | Amanda Wilson |
| 1999-07-29 | Joseph Gonzalez |
| 1924-03-14 | Mark Werner |
| NULL | Jordan Ballard |
| 1995-12-31 | Anthony Henry |
Conclusion
These few examples show how DuckDB breathes new life into SQL without breaking what works. Friendly SQL is a thoughtful update. It makes data work easier.
Is that all DuckDB can do? Of course not; it can do much more.