Implementing SQL JOIN Support in ChronDB via PostgreSQL Protocol
SQL JOIN operations are a fundamental part of any robust database system, allowing developers to combine data from multiple tables based on related columns. In this blog post, we’ll explore how we implemented INNER JOIN and LEFT JOIN support in ChronDB, our Git-based chronological database with PostgreSQL protocol compatibility.

Background: ChronDB Architecture
Before diving into the JOIN implementation, let’s briefly review ChronDB’s architecture. ChronDB is a chronological key-value database built on Git’s internal structure, providing versioning capabilities by default. We’ve implemented a PostgreSQL-compatible protocol layer that allows clients to connect using standard PostgreSQL drivers and execute SQL queries.
The database uses Git repositories as the storage engine:
- Git repository: database
- Git branch: schema
- Directory in the repository: table
- JSON files: documents (records)
The Challenge: Implementing JOIN Operations
JOIN operations are complex because they require:
- Parsing and understanding the JOIN syntax in SQL queries
- Finding the related records across different tables
- Combining the records based on join conditions
- Handling different types of joins (INNER, LEFT, RIGHT)
- Supporting filtering and projection on joined data
Our implementation in PR #24 focused on supporting the most common types: INNER JOIN and LEFT JOIN.
Implementation Approach
1. SQL Parsing Enhancements
The first step was improving our SQL parser to recognize JOIN clauses. This required:
- Adding JOIN type detection in the tokenizer
- Implementing join condition parsing with support for table.field format
- Updating the SELECT query parser to handle JOIN clauses
2. Data Retrieval and Join Execution
The core of the implementation is in the query execution logic:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
| ;; Pseudocode for the JOIN implementation
(defn handle-select [index storage query branch]
(let [primary-table (:from query)
join-clauses (:join query)
primary-docs (get-docs-from-table index storage primary-table branch)]
(if (empty? join-clauses)
;; Regular select without joins
(filter-and-project primary-docs (:where query) (:select query))
;; Handle joins
(let [joined-docs
(for [primary-doc primary-docs
:let [result-doc (process-joins primary-doc join-clauses index storage branch)]]
result-doc)]
(filter-and-project joined-docs (:where query) (:select query))))))
(defn process-joins [primary-doc join-clauses index storage branch]
(reduce
(fn [doc join-clause]
(let [join-type (:type join-clause) ;; "INNER" or "LEFT"
join-table (:table join-clause)
join-condition (:condition join-clause)
;; Find matching documents in the secondary table
matching-docs (find-matching-docs doc join-table join-condition index storage branch)]
(case join-type
"INNER" (when (seq matching-docs)
(merge-docs doc (first matching-docs) join-table))
"LEFT" (if (seq matching-docs)
(merge-docs doc (first matching-docs) join-table)
(merge-with-nulls doc join-table))
;; Default case - no join
doc)))
primary-doc
join-clauses))
|
3. Join Condition Evaluation
The join condition parser supports the standard syntax:
1
| SELECT * FROM users INNER JOIN orders ON users.id = orders.user_id
|
We parse this into a condition structure that can be evaluated against documents:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
| (defn parse-join-condition [condition-tokens]
(let [left-side (parse-field (first condition-tokens))
operator (second condition-tokens)
right-side (parse-field (nth condition-tokens 2))]
{:left left-side
:operator operator
:right right-side}))
(defn evaluate-join-condition [primary-doc secondary-doc condition]
(let [left-value (get-nested-value primary-doc (:left condition))
right-value (get-nested-value secondary-doc (:right condition))
operator (:operator condition)]
(case operator
"=" (= left-value right-value)
"<>" (not= left-value right-value)
;; Handle other operators
)))
|
4. Full-Text Search Integration
We also improved the full-text search capability to work seamlessly with JOINs:
1
2
3
4
5
6
7
8
9
10
11
12
13
| (defn- fts-get-matching-docs
"Find documents matching FTS condition"
[index storage fts-condition branch]
(let [search-terms (extract-search-terms fts-condition)
normalized-terms (normalize-search-terms search-terms)]
(try
(let [matching-ids (index/search index normalized-terms)]
(when (seq matching-ids)
(log/debug "Found" (count matching-ids) "matching documents for terms" search-terms)
(storage/get-documents storage matching-ids branch)))
(catch Exception e
(log/error "Error in FTS search:" (.getMessage e))
[]))))
|
Implementation Challenges
1. Table and Field Name Resolution
One challenge was handling table-qualified field names. When referencing users.name
, we needed to distinguish this from the name
field in other tables.
Our solution was to prefix fields from joined tables with the table name in the result set:
1
2
3
4
5
6
7
| (defn merge-docs [primary-doc secondary-doc join-table]
(let [secondary-map (into {}
(keep (fn [[k v]]
(when (not= k :_table)
[(keyword (str join-table "." (name k))) v])))
secondary-doc)]
(merge primary-map secondary-map)))
|
2. LEFT JOIN Null Handling
For LEFT JOINs, we needed to handle cases where no matching records exist in the secondary table. We solved this by creating a map with null values for all fields in the secondary table:
1
2
3
4
5
| (defn create-null-secondary-map [join-table secondary-fields]
(into {}
(map (fn [field]
[(keyword (str join-table "." (name field))) nil]))
secondary-fields))
|
3. Multiple Matches Handling
In real databases, a JOIN might match multiple records in the secondary table. Our initial implementation only used the first match, but we later enhanced it to handle multiple matches correctly:
1
2
3
4
5
6
7
8
9
10
11
12
| (if (seq matching-secondary-docs)
;; For matches, merge primary with each secondary doc
(map (fn [sec-doc]
(let [secondary-map (into {}
(keep (fn [[k v]]
(when (not= k :_table)
[(keyword (str join-table "." (name k))) v])))
sec-doc)]
(merge primary-map secondary-map)))
matching-secondary-docs)
;; For non-matches in LEFT JOIN, merge with nulls
[(merge primary-map null-secondary-map)])
|
JOIN operations can be expensive, especially with large datasets. To optimize performance:
- We implemented indexes for commonly joined fields
- Added logging to help diagnose slow queries
- Limited the result set size for joined queries to prevent memory issues
- Used lazy sequences where possible to avoid loading all data at once
Testing the Implementation
We wrote comprehensive tests to verify the JOIN functionality:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| -- Basic INNER JOIN test
SELECT users.name, orders.amount
FROM users
INNER JOIN orders ON users.id = orders.user_id
WHERE orders.amount > 100
-- LEFT JOIN test with nulls
SELECT users.name, orders.amount
FROM users
LEFT JOIN orders ON users.id = orders.user_id
-- Multi-table join
SELECT users.name, orders.id, items.name
FROM users
INNER JOIN orders ON users.id = orders.user_id
INNER JOIN items ON orders.item_id = items.id
|
Conclusion
Implementing JOIN support in ChronDB significantly enhances its capabilities as a database system. While our Git-based storage is primarily designed for document storage (example json), the addition of JOIN operations brings relational capabilities that many applications require.
The implementation leverages Clojure’s strengths in data processing and transformation while maintaining the core benefits of ChronDB - versioning, auditability, and a simple data model.
In future updates, we plan to:
- Add support for FULL OUTER JOINs
- Optimize JOIN performance with better indexing
- Support more complex join conditions with AND/OR operators
- Implement aggregate functions across joined tables
We’re excited about how this enhancement makes ChronDB more versatile while maintaining its unique chronological, Git-based architecture.
References