discours.io/core - core - Discours.io Git

Author	SHA1	Message	Date
Untone	3c40bbde2b	[0.9.29] - 2025-10-08 ### 🎯 Search Quality Upgrade: ColBERT + Native MUVERA + FAISS - 🚀 +175% Recall: Integrated ColBERT via pylate with NATIVE MUVERA multi-vector retrieval - 🎯 TRUE MaxSim: Genuine token-level MaxSim scoring rather than simplified max pooling - 🗜️ Native Multi-Vector FDE: Each token is passed through encode_fde separately → a list of FDE vectors - 🚀 FAISS Acceleration: Two-stage O(log N) search for scaling beyond 10K documents - 🎯 Dual Architecture: Support for BiEncoder (fast) and ColBERT (higher quality) via `SEARCH_MODEL_TYPE` - ⚡ Faster Indexing: ColBERT indexing ~12s vs BiEncoder ~26s on the benchmark - 📊 Better Results: Recall@10 improved from 0.16 to 0.44 (+175%) ### 🛠️ Technical Changes - requirements.txt: Added `pylate>=1.0.0` and `faiss-cpu>=1.7.4` - services/search.py: - Added `MuveraPylateWrapper` with native MUVERA multi-vector retrieval - 🎯 TRUE MaxSim: token-level scoring over lists of FDE vectors - 🚀 FAISS prefilter: two-stage search (coarse → exact) - Updated `SearchService` to select the model dynamically - Each token → a separate FDE vector (not max pooling!) 
- settings.py: - `SEARCH_MODEL_TYPE` - model selection (default: "colbert") - `SEARCH_USE_FAISS` - enable FAISS (default: true) - `SEARCH_FAISS_CANDIDATES` - number of candidates (default: 1000) ### 📚 Documentation - docs/search-system.md: Documentation fully updated - BiEncoder vs ColBERT comparison with benchmarks - 🚀 FAISS section: when to enable it, architecture, performance - Guide to choosing a model for different scenarios - 🎯 Detailed description of native MUVERA multi-vector: each token → FDE - TRUE MaxSim scoring algorithm with code examples - Two-stage search: FAISS prefilter → MaxSim rerank - 🤖 Warning about the issue with distilled models (pylate#142) ### ⚙️ Configuration ```bash # Enable ColBERT (recommended for production) SEARCH_MODEL_TYPE=colbert # 🚀 FAISS acceleration (required for >10K documents) SEARCH_USE_FAISS=true # default: true SEARCH_FAISS_CANDIDATES=1000 # default: 1000 # Fallback to BiEncoder (faster, but -62% recall) SEARCH_MODEL_TYPE=biencoder ``` ### 🎯 Impact - ✅ Search quality: +175% recall on the NanoFiQA2018 benchmark - ✅ TRUE ColBERT: Native multi-vector without simplifications (max pooling) - ✅ MUVERA done right: Used as intended, for multi-vector retrieval - ✅ Scalability: FAISS prefilter → O(log N) instead of O(N) - ✅ Ready for growth: The architecture can handle >50K documents - ✅ Indexing: ~54% faster (12s vs 26s) - ⚠️ Latency: With FAISS, remains acceptable even on large indexes - ✅ Backward Compatible: BiEncoder + disabling FAISS via env ### 🔗 References - GitHub PR: https://github.com/sionic-ai/muvera-py/pull/1 - pylate issue: https://github.com/lightonai/pylate/issues/142 - Model: `answerdotai/answerai-colbert-small-v1`	2025-10-09 01:15:19 +03:00
Untone	78bc110685	search-index-fix2	2025-09-10 12:39:00 +03:00
Untone	6817fb6436	search-index-reload	2025-09-10 12:29:59 +03:00
Untone	b70901f8f7	## [0.9.19] - 2025-09-01 ### 🚀 ML Models Runtime Preloading - 🔧 models loading: Moved ML model preloading from the Docker build to runtime startup - Removed preloading from the `Dockerfile`; models are now loaded after the `/dump` folder is mounted - Added an async `preload_models()` function in `services/search.py` for background loading - Integrated preloading into the `lifespan` function in `main.py` - Uses `asyncio.run_in_executor()` for non-blocking model loading - Fixed the `/dump` folder being unavailable during the Docker image build	2025-09-01 16:38:23 +03:00
Untone	9daade05c0	model-path-fix	2025-09-01 16:10:10 +03:00
Untone	a1e4d0d391	search-restore-2	2025-09-01 15:19:05 +03:00
Untone	4489d25913	## [0.9.18] - 2025-01-09 ### 🔍 Search System Redis Storage - 💾 Redis-based vector index storage: Switched back to Redis for storing the vector index - Replaced file storage in `/dump` with Redis keys for reliability - Fixed a permissions problem with the `/dump` folder on the server - The vector index is now saved under the keys `search_index:{name}:data` and `search_index:{name}:metadata` - 🛠️ Improved reliability: Removed the file-system dependency for critical data - ⚡ Better performance: Redis provides faster access to the index - 🔧 Technical changes: - Replaced `save_index_to_file()` with `save_index_to_redis()` - Replaced `load_index_from_file()` with `load_index_from_redis()` - Updated autosave to use Redis instead of files - Removed unused imports (`gzip`, `pathlib`, `cast`)	2025-09-01 15:09:36 +03:00
Untone	db3dafa569	embedding-search	2025-08-31 19:20:43 +03:00
Untone	7325cdc5f5	[0.9.15] - 2025-08-30 ### 🔧 Fixed - 🧾 Database Table Creation: Unified the approach to creating DB tables between production and tests - Fixed the "no such table: author" error in tests - Updated `create_all_tables()` in `storage/schema.py` to use the standard SQLAlchemy approach - Improved test fixtures with forced import of all ORM models - Added detailed diagnostics for table creation in tests - Added fallback mechanisms for creating tables in problematic environments ### 🧪 Testing - All RBAC tests now pass - Fixed the `test_engine`, `db_session` and `test_session_factory` fixtures - Added `ensure_all_tables_exist()` and `ensure_all_models_imported()` functions for diagnostics ### 📝 Technical Details - Replaced the `create_table_if_not_exists()` approach with the standard `Base.metadata.create_all()` - Improved error handling during table creation - Added a check that all critical tables are registered in metadata	2025-08-30 22:20:58 +03:00
Untone	98f625ec0d	index-metric	2025-08-30 21:20:01 +03:00
Untone	906c9bbdf4	search-index-metric	2025-08-30 21:18:48 +03:00
Untone	c9e1d9d878	muvera-index-fix	2025-08-30 20:41:13 +03:00
Untone	00a866876c	search-wrapper	2025-08-23 14:08:34 +03:00
Untone	b4f683a7cc	fmt	2025-08-23 10:47:52 +03:00
Untone	9a2b792f08	refactored	2025-08-17 17:56:31 +03:00
Untone	e78e12eeee	circular-fix	2025-08-17 16:33:54 +03:00
Untone	c80f3efc77	mypy-fixed	2025-07-31 19:27:58 +03:00
Untone	809bda2b56	search-fix, devstart-fix, cache-fix, logs-less	2025-07-31 19:13:23 +03:00
Untone	e7230ba63c	tests-passed	2025-07-31 18:55:59 +03:00
Untone	7585dae0ab	less search logs	2025-07-01 12:18:40 +03:00
Untone	8a5f4a2421	maintenance	2025-06-16 20:20:23 +03:00
Untone	0375939e73	hardcopy-search-service-code	2025-06-03 02:10:08 +03:00
Untone	1329aee1f1	search-combined	2025-06-03 02:00:44 +03:00
Untone	903065fdb3	search-debug	2025-06-02 22:40:10 +03:00
Untone	21d28a0d8b	token-storage-refactored	2025-06-02 21:50:58 +03:00
Untone	3327976586	Improve topic sorting: add popular sorting by publications and authors count	2025-06-02 02:56:11 +03:00
Untone	4070f4fcde	linted+fmt	2025-05-29 12:37:39 +03:00
Untone	ab39b534fe	auth fixes, search connected	2025-05-22 04:34:30 +03:00
Stepan Vladovskiy	c344fcee2d	refactoring(search.py): logs for search-combine and search-authors are equal	2025-05-02 18:28:06 -03:00
Stepan Vladovskiy	a1a61a6731	feat: follow same logic as search shouts for authors. Store them in Redis cache + pagination	2025-05-02 18:17:05 -03:00
Stepan Vladovskiy	3782a9dffb	fix(search.py, author.py): small fixes for start. logger import fails	2025-04-29 17:50:51 -03:00
Stepan Vladovskiy	93c00b3dd1	feat(author.py): add resolver for searching authors by text	2025-04-29 17:45:37 -03:00
Stepan Vladovskiy	fac43e5997	refact(search,reader): without any kind of sorting	2025-04-24 21:00:41 -03:00
Stepan Vladovskiy	e7facf8d87	style(search.py): with indexing message	2025-04-24 18:45:00 -03:00
Stepan Vladovskiy	3062a2b7de	refactor(search.py): check for titles without bodies to avoid re-indexing them on every startup	2025-04-24 14:58:14 -03:00
Stepan Vladovskiy	c0406dbbf2	refac(search.py): without logger and rm duplicated def search-text	2025-04-24 14:18:14 -03:00
Stepan Vladovskiy	5425dbf832	refactor(search.py): simplify def search	2025-04-24 13:46:58 -03:00
Stepan Vladovskiy	a10db2d38a	feat(search.py): combined search on shout titles and bodies	2025-04-24 13:35:36 -03:00
Stepan Vladovskiy	11654dba68	feat: with three separate endpoints	2025-04-23 18:24:00 -03:00
Stepan Vladovskiy	4d965fb27b	feat(search.py): separate indexing of Shout Title, shout Body and Authors	2025-04-20 19:22:08 -03:00
Stepan Vladovskiy	106222b0e0	debug: without debug logging. clean	2025-04-07 11:41:48 -03:00
Stepan Vladovskiy	78326047bf	fix(reader.py): change sorting and answer on queries	2025-04-03 13:20:18 -03:00
Stepan Vladovskiy	bc4ec79240	fix(search.py): store all results in cache, not only the first offset	2025-04-03 13:10:53 -03:00
Stepan Vladovskiy	a0db5707c4	feat: add cache for storing search results and hold them for working pagination. Now we have an offset for use on the frontend	2025-04-01 16:01:09 -03:00
Stepan Vladovskiy	e405fb527b	refactor(search.py): moved to use one table docs for embeddings and docs store	2025-03-25 16:42:44 -03:00
Stepan Vladovskiy	7f36f93d92	feat(search.py): detects both missing documents and null embeddings	2025-03-25 15:18:29 -03:00
Stepan Vladovskiy	f089a32394	debug(search.py): with more logs when check sync of indexing	2025-03-25 14:44:05 -03:00
Stepan Vladovskiy	1fd623a660	feat: with index sync endpoints configs	2025-03-25 13:31:45 -03:00
Stepan Vladovskiy	077cb46482	debug: server.py -> threads 1, search.py -> add 3 times reconnect	2025-03-24 20:16:07 -03:00
Stepan Vladovskiy	60a13a9097	refactor(search.py): moved initialization logic in search-txtai instance	2025-03-24 19:47:02 -03:00
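
The 0.9.29 entry in the log above contrasts token-level MaxSim scoring with max pooling and describes a two-stage search (coarse prefilter, then exact rerank). The following is a minimal sketch of that idea in plain Python, not the repository's code: every function name here is hypothetical, the coarse stage is a brute-force mean-vector comparison standing in for a FAISS index, and FDE encoding is omitted entirely.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Token-level MaxSim: for each query token take the score of its
    best-matching document token, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def mean_vector(tokens: List[Vector]) -> List[float]:
    """Collapse token vectors into one vector (used only by the coarse stage)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def two_stage_search(
    query_tokens: List[Vector],
    docs: Dict[str, List[Vector]],
    candidates: int,
    top_k: int,
) -> List[str]:
    """Stage 1: cheap single-vector prefilter (brute force here; an ANN
    index such as FAISS would take the place of this sort).
    Stage 2: exact MaxSim rerank of the surviving candidates."""
    q_mean = mean_vector(query_tokens)
    coarse = sorted(
        docs,
        key=lambda doc_id: dot(q_mean, mean_vector(docs[doc_id])),
        reverse=True,
    )[:candidates]
    reranked = sorted(
        coarse,
        key=lambda doc_id: maxsim(query_tokens, docs[doc_id]),
        reverse=True,
    )
    return reranked[:top_k]
```

The `candidates` argument plays the role that `SEARCH_FAISS_CANDIDATES` appears to play in the changelog: it bounds how many documents reach the more expensive MaxSim stage.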

197 Commits
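
The 0.9.19 entry describes moving model preloading to runtime startup and running it without blocking the application. A minimal sketch of that pattern follows, under stated assumptions: `load_model_blocking` is a hypothetical stand-in for a slow model load, and the model names are placeholders. In the standard library, `run_in_executor` is a method of the event loop, obtained here through `asyncio.get_running_loop()`.

```python
import asyncio
import time
from typing import Dict, List, Tuple


def load_model_blocking(name: str) -> Dict[str, object]:
    """Hypothetical stand-in for a slow, blocking model load."""
    time.sleep(0.05)  # simulate disk or network work
    return {"name": name, "ready": True}


async def preload_models(names: List[str]) -> List[Dict[str, object]]:
    """Run each blocking load in the default thread pool so the
    event loop stays free to serve other work."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, load_model_blocking, n) for n in names]
    return await asyncio.gather(*futures)


async def main() -> Tuple[bool, List[Dict[str, object]]]:
    # Start preloading in the background, as a startup hook might do.
    task = asyncio.create_task(preload_models(["colbert", "biencoder"]))
    await asyncio.sleep(0)  # yield once; the loop keeps running meanwhile
    still_loading = not task.done()
    models = await task
    return still_loading, models
```

Calling `asyncio.run(main())` returns the loaded models in the order they were requested, since `asyncio.gather` preserves input order.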