Untone 3c40bbde2b [0.9.29] - 2025-10-08
### 🎯 Search Quality Upgrade: ColBERT + Native MUVERA + FAISS

- **🚀 +175% Recall**: Integrated ColBERT via pylate with NATIVE MUVERA multi-vector retrieval
- **🎯 TRUE MaxSim**: Real token-level MaxSim scoring rather than simplified max pooling
- **🗜️ Native Multi-Vector FDE**: Each token is passed through `encode_fde` separately → a list of FDE vectors
- **🚀 FAISS Acceleration**: Two-stage O(log N) search for scaling beyond 10K documents
- **🎯 Dual Architecture**: Supports BiEncoder (fast) and ColBERT (higher quality) via `SEARCH_MODEL_TYPE`
- **Faster Indexing**: ColBERT indexing takes ~12s vs ~26s for BiEncoder on the benchmark
- **📊 Better Results**: Recall@10 improved from 0.16 to 0.44 (+175%)

### 🛠️ Technical Changes

- **requirements.txt**: Added `pylate>=1.0.0` and `faiss-cpu>=1.7.4`
- **services/search.py**:
  - Added `MuveraPylateWrapper` with **native MUVERA multi-vector** retrieval
  - 🎯 **TRUE MaxSim**: token-level scoring over lists of FDE vectors
  - 🚀 **FAISS prefilter**: two-stage search (coarse → exact)
  - Updated `SearchService` to select the model dynamically
  - Each token → its own FDE vector (no max pooling!)
- **settings.py**:
  - `SEARCH_MODEL_TYPE` - model selection (default: "colbert")
  - `SEARCH_USE_FAISS` - enable FAISS (default: true)
  - `SEARCH_FAISS_CANDIDATES` - number of candidates (default: 1000)
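
The distinction between token-level MaxSim and max pooling can be illustrated with a few lines of plain Python. This is a minimal sketch, not the code in `services/search.py`: the function names `maxsim_score` and `max_pooled_score` and the toy two-dimensional vectors are assumptions made for the example, and real FDE vectors are much larger.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Token-level MaxSim: each query token keeps its best-matching
    document token, and those maxima are summed over the query."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def max_pool(vectors: Sequence[Vector]) -> List[float]:
    """Collapse several token vectors into one by element-wise maximum."""
    return [max(column) for column in zip(*vectors)]


def max_pooled_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Simplified alternative: one pooled vector per side, one dot product."""
    return dot(max_pool(query_vecs), max_pool(doc_vecs))


# Toy "token embeddings" (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first query token

print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
print(max_pooled_score(query, doc_a), max_pooled_score(query, doc_b))
```

MaxSim keeps one similarity per query token, so a document is rewarded for each query token it covers; pooling first reduces every document to a single vector before any comparison happens.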

### 📚 Documentation

- **docs/search-system.md**: Documentation fully updated
  - BiEncoder vs ColBERT comparison with benchmarks
  - 🚀 **FAISS section**: when to enable it, architecture, performance
  - Guide to choosing a model for different scenarios
  - 🎯 **Detailed description of native MUVERA multi-vector**: each token → FDE
  - TRUE MaxSim scoring algorithm with code examples
  - Two-stage search: FAISS prefilter → MaxSim rerank
  - 🤖 Warning about the issue with distilled models (pylate#142)
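
The two-stage flow (coarse prefilter → exact rerank) might look roughly like the sketch below. Several things here are assumptions for illustration only: a brute-force dot product stands in for the FAISS index, mean pooling stands in for the real FDE encoding, and `two_stage_search` with its parameters is a hypothetical name, not the project's API.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Average token vectors into one coarse vector (stand-in for an FDE)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Exact token-level score: best document token per query token, summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def two_stage_search(
    query_vecs: Sequence[Vector],
    docs: Dict[str, List[Vector]],
    n_candidates: int,
    top_k: int,
) -> List[str]:
    # Stage 1 (coarse): one cheap comparison per document.
    # A real deployment would ask a FAISS index for these candidates instead.
    q_coarse = mean_pool(query_vecs)
    coarse = sorted(docs, key=lambda d: dot(q_coarse, mean_pool(docs[d])), reverse=True)
    candidates = coarse[:n_candidates]
    # Stage 2 (exact): rerank only the surviving candidates with MaxSim.
    reranked = sorted(candidates, key=lambda d: maxsim(query_vecs, docs[d]), reverse=True)
    return reranked[:top_k]


docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0], [1.0, 0.0]],
    "c": [[-1.0, 0.0], [0.0, -1.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]
print(two_stage_search(query, docs, n_candidates=2, top_k=1))
```

The expensive MaxSim step only runs on `n_candidates` documents, which is the role `SEARCH_FAISS_CANDIDATES` plays in the configuration.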

### ⚙️ Configuration

```bash
# Enable ColBERT (recommended for production)
SEARCH_MODEL_TYPE=colbert

# 🚀 FAISS acceleration (required for >10K documents)
SEARCH_USE_FAISS=true              # default: true
SEARCH_FAISS_CANDIDATES=1000       # default: 1000

# Fall back to BiEncoder (faster, but -62% recall)
SEARCH_MODEL_TYPE=biencoder
```
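
For illustration, these variables could be read on the Python side roughly as follows. This is a hedged sketch rather than the contents of `settings.py`: the `SearchSettings` dataclass, the `load_search_settings` helper, and the set of strings accepted as "true" are assumptions; only the variable names and defaults come from the changelog.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SearchSettings:
    model_type: str
    use_faiss: bool
    faiss_candidates: int


def load_search_settings(env: Mapping[str, str] = os.environ) -> SearchSettings:
    """Read the three search variables, falling back to the documented defaults."""
    model_type = env.get("SEARCH_MODEL_TYPE", "colbert").strip().lower()
    if model_type not in {"colbert", "biencoder"}:
        raise ValueError(f"Unsupported SEARCH_MODEL_TYPE: {model_type!r}")
    # Assumption: "1", "true" and "yes" all count as enabled.
    use_faiss = env.get("SEARCH_USE_FAISS", "true").strip().lower() in {"1", "true", "yes"}
    faiss_candidates = int(env.get("SEARCH_FAISS_CANDIDATES", "1000"))
    return SearchSettings(model_type, use_faiss, faiss_candidates)


print(load_search_settings({}))
print(load_search_settings({"SEARCH_MODEL_TYPE": "biencoder", "SEARCH_USE_FAISS": "false"}))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the process environment.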

### 🎯 Impact

- **Search quality**: +175% recall on the NanoFiQA2018 benchmark
- **TRUE ColBERT**: Native multi-vector without simplifications (no max pooling)
- **MUVERA done right**: Used as intended, for multi-vector retrieval
- **Scalability**: FAISS prefilter → O(log N) instead of O(N)
- **Ready for growth**: The architecture can handle >50K documents
- **Indexing**: ~54% faster (12s vs 26s)
- ⚠️ **Latency**: With FAISS, stays acceptable even on large indexes
- **Backward Compatible**: BiEncoder and disabling FAISS are available via env
### 🔗 References

- GitHub PR: https://github.com/sionic-ai/muvera-py/pull/1
- pylate issue: https://github.com/lightonai/pylate/issues/142
- Model: `answerdotai/answerai-colbert-small-v1`

Discours.io Core

🚀 Modern community platform with GraphQL API, RBAC system, and comprehensive testing infrastructure.

🎯 Features

  • 🔐 Authentication: JWT + OAuth (Google, GitHub, Facebook)
  • 🏘️ Communities: Full community management with roles and permissions
  • 🔒 RBAC System: Role-based access control with inheritance
  • 🌐 GraphQL API: Modern API with comprehensive schema
  • 🧪 Testing: Complete test suite with E2E automation
  • 🚀 CI/CD: Automated testing and deployment pipeline
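
To make "role-based access control with inheritance" concrete, here is a minimal sketch of how inherited permissions can be resolved. The role names, permission strings, and the `effective_permissions` helper are all hypothetical and are not taken from the project's RBAC implementation.

```python
from typing import Dict, Set

# Hypothetical role graph: each role lists the roles it inherits from.
ROLE_PARENTS: Dict[str, Set[str]] = {
    "reader": set(),
    "author": {"reader"},
    "editor": {"author"},
    "admin": {"editor"},
}

# Hypothetical permissions granted directly to each role.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "reader": {"post:read"},
    "author": {"post:create"},
    "editor": {"post:update_any"},
    "admin": {"community:manage"},
}


def effective_permissions(role: str) -> Set[str]:
    """A role's own permissions plus everything inherited from its ancestors."""
    seen: Set[str] = set()
    perms: Set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the role graph
        seen.add(current)
        perms |= ROLE_PERMISSIONS.get(current, set())
        stack.extend(ROLE_PARENTS.get(current, set()))
    return perms


def can(role: str, permission: str) -> bool:
    return permission in effective_permissions(role)


print(sorted(effective_permissions("editor")))
print(can("author", "community:manage"))
```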

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Redis
  • uv (Python package manager)

Installation

# Clone repository
git clone <repository-url>
cd core

# Install Python dependencies
uv sync --group dev

# Install Node.js dependencies
cd panel
npm ci
cd ..

# Setup environment
cp .env.example .env
# Edit .env with your configuration

Development

# Start backend server
uv run python dev.py

# Start frontend (in another terminal)
cd panel
npm run dev

🧪 Testing

Run All Tests

uv run pytest tests/ -v

Test Categories

Run only unit tests

uv run pytest tests/ -m "not e2e" -v

Run only integration tests

uv run pytest tests/ -m "integration" -v

Run only e2e tests

uv run pytest tests/ -m "e2e" -v

Run browser tests

uv run pytest tests/ -m "browser" -v

Run API tests

uv run pytest tests/ -m "api" -v

Skip slow tests

uv run pytest tests/ -m "not slow" -v

Run tests with specific markers

uv run pytest tests/ -m "db and not slow" -v

Test Markers

  • unit - Unit tests (fast)
  • integration - Integration tests
  • e2e - End-to-end tests
  • browser - Browser automation tests
  • api - API-based tests
  • db - Database tests
  • redis - Redis tests
  • auth - Authentication tests
  • slow - Slow tests (can be skipped)
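
As an illustration of how these markers combine with the `-m` expressions shown earlier, a test module might be tagged like this. The test names and bodies are placeholders invented for the example; only the marker names come from the list above.

```python
import pytest


@pytest.mark.db
@pytest.mark.slow
def test_bulk_import_is_idempotent():
    # Hypothetical slow database test; a dict stands in for a real table.
    rows = {1: "a", 2: "b"}
    rows.update({1: "a"})
    assert len(rows) == 2


@pytest.mark.db
def test_lookup_by_id():
    # Hypothetical fast database test.
    table = {1: "alice"}
    assert table[1] == "alice"
```

With these tags, `-m "db and not slow"` would select `test_lookup_by_id` and skip `test_bulk_import_is_idempotent`.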

E2E Testing

E2E tests automatically start backend and frontend servers:

  • Backend: http://localhost:8000
  • Frontend: http://localhost:3000

🚀 CI/CD Pipeline

GitHub Actions Workflow

The project includes a comprehensive CI/CD pipeline that:

  1. 🧪 Testing Phase

    • Matrix testing across Python 3.11, 3.12, 3.13
    • Unit, integration, and E2E tests
    • Code coverage reporting
    • Linting and type checking
  2. 🚀 Deployment Phase

    • Staging: Automatic deployment on dev branch
    • Production: Automatic deployment on main branch
    • Dokku integration for seamless deployments

Local CI Testing

Test the CI pipeline locally:

# Run local CI simulation
chmod +x scripts/test-ci-local.sh
./scripts/test-ci-local.sh

CI Server Management

The ./ci-server.py script manages servers for CI:

# Start servers in CI mode
CI_MODE=true python3 ./ci-server.py

📊 Project Structure

core/
├── auth/           # Authentication system
├── orm/            # Database models
├── resolvers/      # GraphQL resolvers
├── services/       # Business logic
├── panel/          # Frontend (SolidJS)
├── tests/          # Test suite
├── scripts/        # CI/CD scripts
└── docs/           # Documentation

🔧 Configuration

Environment Variables

  • DATABASE_URL - Database connection string
  • REDIS_URL - Redis connection string
  • JWT_SECRET - JWT signing secret
  • OAUTH_* - OAuth provider credentials

Database

  • Development: SQLite (default)
  • Production: PostgreSQL
  • Testing: In-memory SQLite
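
One way to express this per-environment choice in code is sketched below. The `resolve_database_url` helper, the default file name, and the SQLAlchemy-style URL strings are assumptions for illustration; the README only states which database each environment uses.

```python
import os
from typing import Mapping

# Assumed SQLAlchemy-style URLs; the default file name is hypothetical.
DEFAULT_DEV_URL = "sqlite:///dev.db"
TEST_URL = "sqlite:///:memory:"


def resolve_database_url(env: Mapping[str, str] = os.environ, testing: bool = False) -> str:
    """Pick the connection string: in-memory SQLite for tests,
    DATABASE_URL when it is set (e.g. PostgreSQL in production),
    and a local SQLite file otherwise."""
    if testing:
        return TEST_URL
    return env.get("DATABASE_URL", DEFAULT_DEV_URL)


print(resolve_database_url({}))
print(resolve_database_url({"DATABASE_URL": "postgresql://user:pass@db/core"}))
print(resolve_database_url({}, testing=True))
```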

📚 Documentation

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request

Development Workflow

# Create feature branch
git checkout -b feature/your-feature

# Make changes and test
uv run pytest tests/ -v

# Commit changes
git commit -m "feat: add your feature"

# Push and create PR
git push origin feature/your-feature

📈 Status

Badges: Tests, Coverage, Python, Node.js

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
