ListWords API - Filtering and Querying
Overview
The ListWords API filters results with a CEL (Common Expression Language) expression passed in the filter field. All List APIs share this filtering approach, so the same expression syntax works across them.
API Definition
message ListWordsRequest {
string filter = 1; // CEL expression for filtering
string order_by = 2; // Sorting specification
common.v1.PaginationRequest pagination = 3;
}
Filter Fields
The following fields are available for filtering:
| Field | Type | Operators | Description |
|---|---|---|---|
| language | string | ==, != | ISO 639-1 language code (e.g., "en", "fr") |
| keyword | string | ==, != | Searches within the lemma AND inflected forms (partial match, case-insensitive) |
| category | string[] | in | Filters by word categories/tags (OR logic) |
| surface | string[] | in | Batch lookup by lemma OR inflected form (exact match) |
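Clients usually assemble the filter from user input rather than writing it by hand. The Python sketch below shows one way to do that for the four fields in the table. The helper names are hypothetical; the sketch assumes that clauses are combined with && (as in the Complete Examples) and that escaping backslashes, double quotes, and newlines is enough to produce a valid CEL string literal.

```python
from typing import List, Optional, Sequence


def cel_string(value: str) -> str:
    """Render a Python string as a double-quoted CEL string literal."""
    # Escape backslashes first so the escapes added afterwards are not doubled.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def cel_list(values: Sequence[str]) -> str:
    """Render a sequence of strings as a CEL list literal such as ["a", "b"]."""
    return "[" + ", ".join(cel_string(v) for v in values) + "]"


def build_filter(
    language: Optional[str] = None,
    keyword: Optional[str] = None,
    categories: Optional[Sequence[str]] = None,
    surfaces: Optional[Sequence[str]] = None,
) -> str:
    """Build a ListWords filter in which every supplied field must match (&&)."""
    clauses: List[str] = []
    if language is not None:
        clauses.append(f"language == {cel_string(language)}")
    if keyword is not None:
        clauses.append(f"keyword == {cel_string(keyword)}")
    if categories:
        clauses.append(f"category in {cel_list(categories)}")
    if surfaces:
        clauses.append(f"surface in {cel_list(surfaces)}")
    # In this sketch an empty string stands for "no filter".
    return " && ".join(clauses)
```

For instance, `build_filter(language="en", categories=["cet4"])` returns the same filter string as Example 1 under Complete Examples.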
Filter Examples
Language Filtering
Find all English words:
filter: 'language == "en"'
Find all French words:
filter: 'language == "fr"'
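Keyword Search
The keyword field searches both the lemma and all inflected forms with a case-insensitive partial (substring) match, which suits loose, free-text search. It is substring matching, not typo-tolerant fuzzy matching.
Search for words containing "book":
filter: 'keyword == "book"'
Search by an inflected form (returns the base word):
Partial match: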
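The keyword rule can be stated precisely as a predicate. The Python sketch below models the documented behaviour (case-insensitive substring match against the lemma and every inflected form); it is not the server's implementation, and the `Word` record with a `forms` list is a hypothetical stand-in for the real word message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    # Hypothetical record: a lemma plus its inflected forms.
    lemma: str
    forms: List[str] = field(default_factory=list)


def keyword_matches(word: Word, keyword: str) -> bool:
    """True if keyword appears, ignoring case, inside the lemma or any form."""
    needle = keyword.lower()
    return any(needle in text.lower() for text in [word.lemma, *word.forms])


if __name__ == "__main__":
    book = Word("book", ["books", "booked", "booking"])
    print(keyword_matches(book, "BOOK"))  # True: case is ignored
    print(keyword_matches(book, "oked"))  # True: substring of "booked"
    print(keyword_matches(book, "cook"))  # False
```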
Category Filtering
Find CET-4 level words:
filter: 'category in ["cet4"]'
Find words that are in either CET-4 OR CET-6:
filter: 'category in ["cet4", "cet6"]'
Surface Term Lookup (Batch Lookup)
The surface field performs an exact-match batch lookup of words by their lemma or any of their inflected forms.
Key differences from keyword:
- keyword: Partial match (contains), searches in both lemma and forms
- surface: Exact match, used for batch lookup of specific terms
The surface field is particularly useful for:
- Looking up multiple specific words in one query
- Finding words by exact inflected forms (e.g., "running" → "run")
- Batch operations with exact term matching
Find word by exact lemma:
Find word by exact inflected form:
Batch lookup multiple words:
Batch lookup by mixed lemmas and forms:
filter: 'surface in ["run", "swam", "walked"]'
// Returns: run (matched by lemma), swim and walk (matched by inflected forms)
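A batch lookup returns words, but the requested terms may be inflected forms as often as lemmas. If the response does not say which input term produced each word (an assumption; the response message is not described on this page), the client can rebuild that mapping itself. The `Word` record and `resolve_surface_terms` below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Word:
    # Hypothetical shape of a returned word: its lemma plus inflected forms.
    lemma: str
    forms: List[str] = field(default_factory=list)


def resolve_surface_terms(
    terms: Sequence[str], words: Sequence[Word]
) -> Dict[str, Optional[str]]:
    """Map each requested surface term to the lemma of the word it matched.

    Matching is exact but case-insensitive against the lemma or any form,
    mirroring the surface filter. Terms that matched nothing map to None.
    """
    index: Dict[str, str] = {}
    for word in words:
        for text in [word.lemma, *word.forms]:
            # setdefault keeps the first word seen when two words share a form.
            index.setdefault(text.lower(), word.lemma)
    return {term: index.get(term.lower()) for term in terms}


if __name__ == "__main__":
    results = [
        Word("run", ["runs", "running", "ran"]),
        Word("swim", ["swims", "swam", "swum"]),
        Word("walk", ["walks", "walked"]),
    ]
    print(resolve_surface_terms(["run", "swam", "walked", "flew"], results))
    # {'run': 'run', 'swam': 'swim', 'walked': 'walk', 'flew': None}
```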
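Combined Filtering
English words in the technology category:
filter: 'language == "en" && category in ["technology"]'
CET-4 words containing "comp":
filter: 'category in ["cet4"] && keyword == "comp"'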
English words matching specific surface forms:
Sorting
The order_by field supports:
| Field | Description |
|---|---|
| lemma | Sort by lemma alphabetically |
| created_at | Sort by creation time |
| updated_at | Sort by last update time |
Append desc to sort in descending order (ascending is the default):
order_by: "lemma" # Ascending (A-Z)
order_by: "lemma desc" # Descending (Z-A)
order_by: "created_at desc" # Newest first
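To validate order_by before sending a request, a client could parse it as in the Python sketch below. It accepts only what this page documents, a single field name optionally followed by desc; whether the API also accepts asc, multiple fields, or an empty value is not stated, so the sketch rejects them. The function names are hypothetical, and the rows are plain dictionaries standing in for words.

```python
from typing import Any, Dict, List, Tuple

SORTABLE_FIELDS = {"lemma", "created_at", "updated_at"}


def parse_order_by(order_by: str) -> Tuple[str, bool]:
    """Split an order_by value into (field, descending); ascending is the default."""
    parts = order_by.split()
    if len(parts) not in (1, 2):
        raise ValueError(f"invalid order_by: {order_by!r}")
    name = parts[0]
    if name not in SORTABLE_FIELDS:
        raise ValueError(f"unsupported sort field: {name!r}")
    if len(parts) == 2 and parts[1] != "desc":
        raise ValueError(f"unsupported sort direction: {parts[1]!r}")
    return name, len(parts) == 2


def sort_rows(rows: List[Dict[str, Any]], order_by: str) -> List[Dict[str, Any]]:
    """Sort hypothetical row dicts the way an order_by value describes."""
    name, descending = parse_order_by(order_by)
    return sorted(rows, key=lambda row: row[name], reverse=descending)
```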
Pagination
Standard pagination with page number and page size:
- page_no: Page number (1-indexed)
- page_size: Number of items per page (max: 100)
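With a 1-indexed page_no, page n starts at row (n - 1) * page_size. The Python sketch below shows that arithmetic and a loop that walks every page. `fetch_page` is a hypothetical callable wrapping one ListWords call; because the fields of the pagination response are not documented here, the loop stops at the first short page (so when the total is an exact multiple of page_size it makes one extra request that comes back empty). The lower bound of 1 for page_size is also an assumption.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")
MAX_PAGE_SIZE = 100  # documented maximum for page_size


def _check_page_size(page_size: int) -> None:
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")


def page_offset(page_no: int, page_size: int) -> int:
    """Convert a 1-indexed page number into a zero-based row offset."""
    if page_no < 1:
        raise ValueError("page_no is 1-indexed and must be >= 1")
    _check_page_size(page_size)
    return (page_no - 1) * page_size


def iter_all(
    fetch_page: Callable[[int, int], List[T]], page_size: int = MAX_PAGE_SIZE
) -> Iterator[T]:
    """Yield every item by requesting pages 1, 2, ... until a short page arrives."""
    _check_page_size(page_size)
    page_no = 1
    while True:
        items = fetch_page(page_no, page_size)
        yield from items
        if len(items) < page_size:
            return  # a short (or empty) page means there is nothing further
        page_no += 1
```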
Complete Examples
Example 1: English CET-4 words, sorted alphabetically
{
"filter": "language == \"en\" && category in [\"cet4\"]",
"order_by": "lemma",
"pagination": {
"page_no": 1,
"page_size": 50
}
}
Example 2: Batch lookup with sorting
{
"filter": "surface in [\"running\", \"swimming\", \"walking\"]",
"order_by": "lemma",
"pagination": {
"page_no": 1,
"page_size": 10
}
}
Example 3: Technology words updated recently
{
"filter": "language == \"en\" && category in [\"technology\"]",
"order_by": "updated_at desc",
"pagination": {
"page_no": 1,
"page_size": 20
}
}
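The escaped quotes inside filter in these payloads are easy to get wrong by hand. Building the body as a data structure and serializing it with Python's json module produces that escaping automatically. This is a minimal sketch: `list_words_body` is a hypothetical name, the field names simply mirror the examples above, and how the body is sent to the service is not covered on this page.

```python
import json
from typing import Any, Dict


def list_words_body(
    filter_expr: str, order_by: str, page_no: int, page_size: int
) -> Dict[str, Any]:
    """Assemble a ListWordsRequest with the same shape as the JSON examples."""
    return {
        "filter": filter_expr,
        "order_by": order_by,
        "pagination": {"page_no": page_no, "page_size": page_size},
    }


if __name__ == "__main__":
    body = list_words_body(
        'language == "en" && category in ["cet4"]', "lemma", 1, 50
    )
    # json.dumps escapes the inner double quotes, as in Example 1.
    print(json.dumps(body, indent=2))
```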
Implementation Details
Surface Term Lookup
The surface filter is translated into a query that:
1. Matches the word's lemma directly (case-insensitive)
2. Joins with the lexemes and lexeme_forms tables to find words with matching inflected forms
3. Uses OR logic to return words that match ANY of the provided surface terms
SQL logic (simplified):
SELECT * FROM words
WHERE
  LOWER(lemma) IN ('running', 'swimming')
  OR EXISTS (
    SELECT 1 FROM lexemes l
    JOIN lexeme_forms f ON l.id = f.lexeme_id
    WHERE l.word_id = words.id
      AND LOWER(f.text) IN ('running', 'swimming')
  );
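Because both conditions sit in one WHERE clause, a single query resolves every term in the batch, whether a term matches a word's lemma or one of its inflected forms, and each matching word is returned once.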
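The simplified query can be exercised end to end with Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here only so the sketch runs without a server; the schema is a minimal hypothetical one containing just the columns the query references, the demo data is made up, and the terms are bound as parameters rather than inlined into the SQL text.

```python
import sqlite3
from typing import List, Sequence

# Minimal hypothetical schema: only the columns the simplified query references.
SCHEMA = """
CREATE TABLE words (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL);
CREATE TABLE lexemes (
    id INTEGER PRIMARY KEY,
    word_id INTEGER NOT NULL REFERENCES words(id)
);
CREATE TABLE lexeme_forms (
    id INTEGER PRIMARY KEY,
    lexeme_id INTEGER NOT NULL REFERENCES lexemes(id),
    text TEXT NOT NULL
);
"""


def lookup_surface(conn: sqlite3.Connection, terms: Sequence[str]) -> List[str]:
    """Return lemmas whose lemma or any inflected form equals one of the terms."""
    if not terms:
        return []
    # SQLite's built-in LOWER() folds ASCII letters only.
    lowered = [t.lower() for t in terms]
    placeholders = ", ".join("?" for _ in lowered)
    sql = f"""
        SELECT lemma FROM words
        WHERE LOWER(lemma) IN ({placeholders})
           OR EXISTS (
                SELECT 1 FROM lexemes l
                JOIN lexeme_forms f ON l.id = f.lexeme_id
                WHERE l.word_id = words.id
                  AND LOWER(f.text) IN ({placeholders})
           )
        ORDER BY lemma
    """
    # Only "?" markers are interpolated; the terms are bound once per IN list.
    rows = conn.execute(sql, lowered + lowered).fetchall()
    return [lemma for (lemma,) in rows]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with three words and a few forms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO words (id, lemma) VALUES (?, ?)",
                     [(1, "run"), (2, "swim"), (3, "walk")])
    conn.executemany("INSERT INTO lexemes (id, word_id) VALUES (?, ?)",
                     [(10, 1), (20, 2), (30, 3)])
    conn.executemany("INSERT INTO lexeme_forms (lexeme_id, text) VALUES (?, ?)",
                     [(10, "running"), (10, "ran"), (20, "swam"), (30, "walked")])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(lookup_surface(conn, ["Running", "swam"]))  # ['run', 'swim']
```

The ORDER BY clause is added only to make the demo output deterministic.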
Category Filtering Logic
Category filtering uses OR logic:
- category in ["cet4", "cet6"] returns words that have EITHER "cet4" OR "cet6" in their categories array
- A word with categories: ["cet4", "technology"] will match the above filter
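In set terms, the OR rule is a non-empty intersection between the word's categories and the requested list. The Python predicate below states that rule as a model of the behaviour, not the service's code. In PostgreSQL, the jsonb `?|` operator expresses the same "any of" test, for example `categories ?| array['cet4', 'cet6']` for a hypothetical jsonb column named categories; this page does not say which JSONB operator the service actually uses.

```python
from typing import Iterable


def category_matches(word_categories: Iterable[str], requested: Iterable[str]) -> bool:
    """OR semantics: True when the word has at least one requested category."""
    return not set(word_categories).isdisjoint(requested)


if __name__ == "__main__":
    print(category_matches(["cet4", "technology"], ["cet4", "cet6"]))  # True
    print(category_matches(["technology"], ["cet4", "cet6"]))  # False
```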
Performance Considerations
- Keyword searches use case-insensitive partial matching
- Surface term lookups use indexed queries for efficiency
- Category filtering leverages JSONB array operators in PostgreSQL
- Pagination is applied after filtering and sorting to minimize data transfer
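The order of those steps also matters for correctness: a page has to be cut from the fully sorted result, otherwise each page would only be sorted internally and consecutive pages would not form one consistent ordering. The Python sketch below models filter, then sort, then paginate over in-memory rows; `list_page`, the row dictionaries, and the predicate are hypothetical, and the sketch says nothing about how the service executes the query.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def list_page(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    sort_field: str,
    descending: bool,
    page_no: int,
    page_size: int,
) -> List[Row]:
    """Return one 1-indexed page after filtering and sorting all rows."""
    matched = [row for row in rows if predicate(row)]  # 1. filter
    ordered = sorted(matched, key=lambda row: row[sort_field], reverse=descending)  # 2. sort
    start = (page_no - 1) * page_size  # 3. paginate
    return ordered[start:start + page_size]
```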