### 🎯 Search Quality Upgrade: ColBERT + Native MUVERA + FAISS

- **🚀 +175% Recall**: ColBERT integrated via pylate with native MUVERA multi-vector retrieval
- **🎯 TRUE MaxSim**: Genuine token-level MaxSim scoring, not simplified max pooling
- **🗜️ Native Multi-Vector FDE**: Each token is passed through `encode_fde` separately → a list of FDE vectors
- **🚀 FAISS Acceleration**: Two-stage O(log N) search for scaling beyond 10K documents
- **🎯 Dual Architecture**: Supports BiEncoder (fast) and ColBERT (high quality) via `SEARCH_MODEL_TYPE`
- **⚡ Faster Indexing**: ColBERT indexing ~12 s vs BiEncoder ~26 s on the benchmark
- **📊 Better Results**: Recall@10 improved from 0.16 to 0.44 (+175%)

### 🛠️ Technical Changes

- **requirements.txt**: Added `pylate>=1.0.0` and `faiss-cpu>=1.7.4`
- **services/search.py**:
  - Added `MuveraPylateWrapper` with **native MUVERA multi-vector** retrieval
  - 🎯 **TRUE MaxSim**: token-level scoring over lists of FDE vectors
  - 🚀 **FAISS prefilter**: two-stage search (coarse → exact)
  - Updated `SearchService` to select the model dynamically
  - Each token → its own FDE vector (no max pooling!)
- **settings.py**:
  - `SEARCH_MODEL_TYPE` - model selection (default: "colbert")
  - `SEARCH_USE_FAISS` - enable FAISS (default: true)
  - `SEARCH_FAISS_CANDIDATES` - number of candidates (default: 1000)

### 📚 Documentation

- **docs/search-system.md**: Documentation fully updated
  - BiEncoder vs ColBERT comparison with benchmarks
  - 🚀 **FAISS section**: when to enable it, architecture, performance
  - Model-selection guide for different scenarios
  - 🎯 **Detailed description of native MUVERA multi-vector**: each token → FDE
  - TRUE MaxSim scoring algorithm with code examples
  - Two-stage search: FAISS prefilter → MaxSim rerank
  - 🤖 Warning about the distilled-model issue (pylate#142)

### ⚙️ Configuration

```bash
# Enable ColBERT (recommended for production)
SEARCH_MODEL_TYPE=colbert

# 🚀 FAISS acceleration (required for >10K documents)
SEARCH_USE_FAISS=true          # default: true
SEARCH_FAISS_CANDIDATES=1000   # default: 1000

# Fall back to BiEncoder (faster, but -62% recall)
SEARCH_MODEL_TYPE=biencoder
```

### 🎯 Impact

- ✅ **Search quality**: +175% recall on the NanoFiQA2018 benchmark
- ✅ **TRUE ColBERT**: Native multi-vector without simplifications (max pooling)
- ✅ **MUVERA used correctly**: Applied as intended for multi-vector retrieval
- ✅ **Scalability**: FAISS prefilter → O(log N) instead of O(N)
- ✅ **Room to grow**: The architecture can handle >50K documents
- ✅ **Indexing**: ~54% faster (12 s vs 26 s)
- ⚠️ **Latency**: Remains acceptable with FAISS even on large indexes
- ✅ **Backward compatible**: BiEncoder and FAISS opt-out available via env
- GitHub PR: https://github.com/sionic-ai/muvera-py/pull/1
- pylate issue: https://github.com/lightonai/pylate/issues/142
- Model: `answerdotai/answerai-colbert-small-v1`
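The two-stage flow described above (coarse prefilter → token-level MaxSim rerank) can be sketched in plain NumPy. This is an illustrative sketch, not the actual `services/search.py` implementation: it substitutes a brute-force pooled-vector scan for the real FAISS index, and the function names are made up for the example.

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT MaxSim: for each query token, take the best-matching
    document-token similarity, then sum over query tokens."""
    sims = query_tokens @ doc_tokens.T          # (n_query, n_doc) similarity matrix
    return float(sims.max(axis=1).sum())

def two_stage_search(query_tokens, doc_token_lists, k=2, n_candidates=3):
    # Stage 1 (coarse prefilter): score pooled single-vector summaries.
    # In the real system this stage is a FAISS index over FDE vectors.
    q_pooled = query_tokens.mean(axis=0)
    doc_pooled = np.stack([d.mean(axis=0) for d in doc_token_lists])
    candidates = np.argsort(-(doc_pooled @ q_pooled))[:n_candidates]
    # Stage 2 (exact rerank): token-level MaxSim on the candidates only.
    reranked = sorted(candidates,
                      key=lambda i: -maxsim(query_tokens, doc_token_lists[i]))
    return [int(i) for i in reranked[:k]]

# Toy example: 2 query tokens, 3 documents with variable token counts.
query = np.eye(2)
docs = [np.array([[1.0, 0.0]]),              # matches one query token
        np.array([[1.0, 0.0], [0.0, 1.0]]),  # matches both query tokens
        np.array([[0.0, -1.0]])]             # matches neither
print(two_stage_search(query, docs))         # → [1, 0]
```

Keeping one vector per token is what makes stage 2 exact: pooling the document into a single vector (as a plain BiEncoder does) would lose the per-token matches that MaxSim sums over.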
# Discours.io Core
🚀 Modern community platform with GraphQL API, RBAC system, and comprehensive testing infrastructure.
## 🎯 Features
- 🔐 Authentication: JWT + OAuth (Google, GitHub, Facebook)
- 🏘️ Communities: Full community management with roles and permissions
- 🔒 RBAC System: Role-based access control with inheritance
- 🌐 GraphQL API: Modern API with comprehensive schema
- 🧪 Testing: Complete test suite with E2E automation
- 🚀 CI/CD: Automated testing and deployment pipeline
## 🚀 Quick Start

### Prerequisites
- Python 3.11+
- Node.js 18+
- Redis
- uv (Python package manager)
### Installation

```bash
# Clone repository
git clone <repository-url>
cd core

# Install Python dependencies
uv sync --group dev

# Install Node.js dependencies
cd panel
npm ci
cd ..

# Setup environment
cp .env.example .env
# Edit .env with your configuration
```
### Development

```bash
# Start backend server
uv run python dev.py

# Start frontend (in another terminal)
cd panel
npm run dev
```
## 🧪 Testing

### Run All Tests

```bash
uv run pytest tests/ -v
```
### Test Categories

```bash
# Run only unit tests
uv run pytest tests/ -m "not e2e" -v

# Run only integration tests
uv run pytest tests/ -m "integration" -v

# Run only e2e tests
uv run pytest tests/ -m "e2e" -v

# Run browser tests
uv run pytest tests/ -m "browser" -v

# Run API tests
uv run pytest tests/ -m "api" -v

# Skip slow tests
uv run pytest tests/ -m "not slow" -v

# Run tests with specific markers
uv run pytest tests/ -m "db and not slow" -v
```
### Test Markers

- `unit` - Unit tests (fast)
- `integration` - Integration tests
- `e2e` - End-to-end tests
- `browser` - Browser automation tests
- `api` - API-based tests
- `db` - Database tests
- `redis` - Redis tests
- `auth` - Authentication tests
- `slow` - Slow tests (can be skipped)
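Custom markers like these are normally registered in the pytest configuration so that `pytest` does not warn about unknown marks. A hypothetical `pytest.ini` sketch (the actual configuration file in this repo may differ):

```ini
[pytest]
markers =
    unit: Unit tests (fast)
    integration: Integration tests
    e2e: End-to-end tests
    slow: Slow tests (can be skipped)
```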
### E2E Testing
E2E tests automatically start backend and frontend servers:
- Backend: http://localhost:8000
- Frontend: http://localhost:3000
## 🚀 CI/CD Pipeline

### GitHub Actions Workflow
The project includes a comprehensive CI/CD pipeline that:
1. **🧪 Testing Phase**
   - Matrix testing across Python 3.11, 3.12, 3.13
   - Unit, integration, and E2E tests
   - Code coverage reporting
   - Linting and type checking
2. **🚀 Deployment Phase**
   - Staging: automatic deployment on the `dev` branch
   - Production: automatic deployment on the `main` branch
   - Dokku integration for seamless deployments
### Local CI Testing
Test the CI pipeline locally:
```bash
# Run local CI simulation
chmod +x scripts/test-ci-local.sh
./scripts/test-ci-local.sh
```
### CI Server Management

The `./ci-server.py` script manages servers for CI:
```bash
# Start servers in CI mode
CI_MODE=true python3 ./ci-server.py
```
## 📊 Project Structure

```
core/
├── auth/        # Authentication system
├── orm/         # Database models
├── resolvers/   # GraphQL resolvers
├── services/    # Business logic
├── panel/       # Frontend (SolidJS)
├── tests/       # Test suite
├── scripts/     # CI/CD scripts
└── docs/        # Documentation
```
## 🔧 Configuration

### Environment Variables

- `DATABASE_URL` - Database connection string
- `REDIS_URL` - Redis connection string
- `JWT_SECRET` - JWT signing secret
- `OAUTH_*` - OAuth provider credentials
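For reference, a hypothetical `.env` might look like the following. Every value is a placeholder, and the exact `OAUTH_*` variable names depend on which providers you configure — check `.env.example` for the authoritative list.

```shell
# .env — illustrative placeholders only
DATABASE_URL=postgresql://user:password@localhost:5432/discours
REDIS_URL=redis://localhost:6379/0
JWT_SECRET=replace-with-a-long-random-string

# One pair per enabled OAuth provider (names here are hypothetical)
OAUTH_GOOGLE_CLIENT_ID=your-client-id
OAUTH_GOOGLE_CLIENT_SECRET=your-client-secret
```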
### Database
- Development: SQLite (default)
- Production: PostgreSQL
- Testing: In-memory SQLite
## 📚 Documentation

Detailed documentation lives in the `docs/` directory (e.g. `docs/search-system.md` for the search system).
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request
### Development Workflow

```bash
# Create feature branch
git checkout -b feature/your-feature

# Make changes and test
uv run pytest tests/ -v

# Commit changes
git commit -m "feat: add your feature"

# Push and create PR
git push origin feature/your-feature
```
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.