### 🎯 Search Quality Upgrade: ColBERT + Native MUVERA + FAISS

- **🚀 +175% Recall**: ColBERT integrated via pylate with native MUVERA multi-vector retrieval
- **🎯 TRUE MaxSim**: Genuine token-level MaxSim scoring, not simplified max pooling
- **🗜️ Native Multi-Vector FDE**: Each token is passed through `encode_fde` separately → a list of FDE vectors
- **🚀 FAISS Acceleration**: Two-stage O(log N) search for scaling beyond 10K documents
- **🎯 Dual Architecture**: Supports BiEncoder (fast) and ColBERT (high quality) via `SEARCH_MODEL_TYPE`
- **⚡ Faster Indexing**: ColBERT indexing ~12 s vs BiEncoder ~26 s on the benchmark
- **📊 Better Results**: Recall@10 improved from 0.16 to 0.44 (+175%)

### 🛠️ Technical Changes

- **requirements.txt**: Added `pylate>=1.0.0` and `faiss-cpu>=1.7.4`
- **services/search.py**:
  - Added `MuveraPylateWrapper` with **native MUVERA multi-vector** retrieval
  - 🎯 **TRUE MaxSim**: token-level scoring over lists of FDE vectors
  - 🚀 **FAISS prefilter**: two-stage search (coarse → exact)
  - Updated `SearchService` to select the model dynamically
  - Each token → its own FDE vector (no max pooling!)
- **settings.py**:
  - `SEARCH_MODEL_TYPE` - model selection (default: "colbert")
  - `SEARCH_USE_FAISS` - enable FAISS (default: true)
  - `SEARCH_FAISS_CANDIDATES` - number of candidates (default: 1000)

### 📚 Documentation

- **docs/search-system.md**: Documentation fully updated
  - BiEncoder vs ColBERT comparison with benchmarks
  - 🚀 **FAISS section**: when to enable it, architecture, performance
  - Model-selection guide for different scenarios
  - 🎯 **Detailed description of native MUVERA multi-vector**: each token → FDE
  - TRUE MaxSim scoring algorithm with code examples
  - Two-stage search: FAISS prefilter → MaxSim rerank
  - 🤖 Warning about the distilled-model issue (pylate#142)

### ⚙️ Configuration

```bash
# Enable ColBERT (recommended for production)
SEARCH_MODEL_TYPE=colbert

# 🚀 FAISS acceleration (required for >10K documents)
SEARCH_USE_FAISS=true          # default: true
SEARCH_FAISS_CANDIDATES=1000   # default: 1000

# Fall back to BiEncoder (faster, but -62% recall)
SEARCH_MODEL_TYPE=biencoder
```

### 🎯 Impact

- ✅ **Search quality**: +175% recall on the NanoFiQA2018 benchmark
- ✅ **TRUE ColBERT**: Native multi-vector without simplifications (max pooling)
- ✅ **MUVERA used correctly**: Applied as intended for multi-vector retrieval
- ✅ **Scalability**: FAISS prefilter → O(log N) instead of O(N)
- ✅ **Room to grow**: The architecture can handle >50K documents
- ✅ **Indexing**: ~54% faster (12 s vs 26 s)
- ⚠️ **Latency**: Remains acceptable with FAISS even on large indexes
- ✅ **Backward compatible**: BiEncoder and FAISS opt-out available via env
- GitHub PR: https://github.com/sionic-ai/muvera-py/pull/1
- pylate issue: https://github.com/lightonai/pylate/issues/142
- Model: `answerdotai/answerai-colbert-small-v1`
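The two-stage flow described above (coarse prefilter → token-level MaxSim rerank) can be sketched in plain NumPy. This is an illustrative sketch, not the actual `services/search.py` implementation: it substitutes a brute-force pooled-vector scan for the real FAISS index, and the function names are made up for the example.

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT MaxSim: for each query token, take the best-matching
    document-token similarity, then sum over query tokens."""
    sims = query_tokens @ doc_tokens.T          # (n_query, n_doc) similarity matrix
    return float(sims.max(axis=1).sum())

def two_stage_search(query_tokens, doc_token_lists, k=2, n_candidates=3):
    # Stage 1 (coarse prefilter): score pooled single-vector summaries.
    # In the real system this stage is a FAISS index over FDE vectors.
    q_pooled = query_tokens.mean(axis=0)
    doc_pooled = np.stack([d.mean(axis=0) for d in doc_token_lists])
    candidates = np.argsort(-(doc_pooled @ q_pooled))[:n_candidates]
    # Stage 2 (exact rerank): token-level MaxSim on the candidates only.
    reranked = sorted(candidates,
                      key=lambda i: -maxsim(query_tokens, doc_token_lists[i]))
    return [int(i) for i in reranked[:k]]

# Toy example: 2 query tokens, 3 documents with variable token counts.
query = np.eye(2)
docs = [np.array([[1.0, 0.0]]),              # matches one query token
        np.array([[1.0, 0.0], [0.0, 1.0]]),  # matches both query tokens
        np.array([[0.0, -1.0]])]             # matches neither
print(two_stage_search(query, docs))         # → [1, 0]
```

Keeping one vector per token is what makes stage 2 exact: pooling the document into a single vector (as a plain BiEncoder does) would lose the per-token matches that MaxSim sums over.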
# Discours.io Core
🚀 Modern community platform with GraphQL API, RBAC system, and comprehensive testing infrastructure.
## 🎯 Features
- 🔐 Authentication: JWT + OAuth (Google, GitHub, Facebook)
- 🏘️ Communities: Full community management with roles and permissions
- 🔒 RBAC System: Role-based access control with inheritance
- 🌐 GraphQL API: Modern API with comprehensive schema
- 🧪 Testing: Complete test suite with E2E automation
- 🚀 CI/CD: Automated testing and deployment pipeline
## 🚀 Quick Start

### Prerequisites
- Python 3.11+
- Node.js 18+
- Redis
- uv (Python package manager)
### Installation

```bash
# Clone repository
git clone <repository-url>
cd core

# Install Python dependencies
uv sync --group dev

# Install Node.js dependencies
cd panel
npm ci
cd ..

# Setup environment
cp .env.example .env
# Edit .env with your configuration
```
### Development

```bash
# Start backend server
uv run python dev.py

# Start frontend (in another terminal)
cd panel
npm run dev
```
## 🧪 Testing

### Run All Tests

```bash
uv run pytest tests/ -v
```
### Test Categories

```bash
# Run only unit tests
uv run pytest tests/ -m "not e2e" -v

# Run only integration tests
uv run pytest tests/ -m "integration" -v

# Run only e2e tests
uv run pytest tests/ -m "e2e" -v

# Run browser tests
uv run pytest tests/ -m "browser" -v

# Run API tests
uv run pytest tests/ -m "api" -v

# Skip slow tests
uv run pytest tests/ -m "not slow" -v

# Run tests with specific markers
uv run pytest tests/ -m "db and not slow" -v
```
### Test Markers

- `unit` - Unit tests (fast)
- `integration` - Integration tests
- `e2e` - End-to-end tests
- `browser` - Browser automation tests
- `api` - API-based tests
- `db` - Database tests
- `redis` - Redis tests
- `auth` - Authentication tests
- `slow` - Slow tests (can be skipped)
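Custom markers like these are normally registered in the pytest configuration so that `pytest` does not warn about unknown marks. A hypothetical `pytest.ini` sketch (the actual configuration file in this repo may differ):

```ini
[pytest]
markers =
    unit: Unit tests (fast)
    integration: Integration tests
    e2e: End-to-end tests
    slow: Slow tests (can be skipped)
```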
### E2E Testing
E2E tests automatically start backend and frontend servers:
- Backend: http://localhost:8000
- Frontend: http://localhost:3000
## 🚀 CI/CD Pipeline

### GitHub Actions Workflow
The project includes a comprehensive CI/CD pipeline that:
1. **🧪 Testing Phase**
   - Matrix testing across Python 3.11, 3.12, 3.13
   - Unit, integration, and E2E tests
   - Code coverage reporting
   - Linting and type checking
2. **🚀 Deployment Phase**
   - Staging: automatic deployment on the `dev` branch
   - Production: automatic deployment on the `main` branch
   - Dokku integration for seamless deployments
### Local CI Testing
Test the CI pipeline locally:
```bash
# Run local CI simulation
chmod +x scripts/test-ci-local.sh
./scripts/test-ci-local.sh
```
### CI Server Management

The `./ci-server.py` script manages servers for CI:
```bash
# Start servers in CI mode
CI_MODE=true python3 ./ci-server.py
```
## 📊 Project Structure

```
core/
├── auth/        # Authentication system
├── orm/         # Database models
├── resolvers/   # GraphQL resolvers
├── services/    # Business logic
├── panel/       # Frontend (SolidJS)
├── tests/       # Test suite
├── scripts/     # CI/CD scripts
└── docs/        # Documentation
```
## 🔧 Configuration

### Environment Variables

- `DATABASE_URL` - Database connection string
- `REDIS_URL` - Redis connection string
- `JWT_SECRET` - JWT signing secret
- `OAUTH_*` - OAuth provider credentials
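For reference, a hypothetical `.env` might look like the following. Every value is a placeholder, and the exact `OAUTH_*` variable names depend on which providers you configure — check `.env.example` for the authoritative list.

```shell
# .env — illustrative placeholders only
DATABASE_URL=postgresql://user:password@localhost:5432/discours
REDIS_URL=redis://localhost:6379/0
JWT_SECRET=replace-with-a-long-random-string

# One pair per enabled OAuth provider (names here are hypothetical)
OAUTH_GOOGLE_CLIENT_ID=your-client-id
OAUTH_GOOGLE_CLIENT_SECRET=your-client-secret
```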
### Database
- Development: SQLite (default)
- Production: PostgreSQL
- Testing: In-memory SQLite
## 📚 Documentation

Detailed documentation lives in the `docs/` directory (e.g. `docs/search-system.md` for the search system).
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request
### Development Workflow

```bash
# Create feature branch
git checkout -b feature/your-feature

# Make changes and test
uv run pytest tests/ -v

# Commit changes
git commit -m "feat: add your feature"

# Push and create PR
git push origin feature/your-feature
```
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.