Replace spring-ai-client-chat dependency with spring-ai-model in model implementations
and memory repositories, and with spring-ai-commons in document readers. This change
improves the dependency structure by having components depend on the appropriate
abstraction level.
Additional changes:
- Add slf4j-api dependency to pdf-reader and spring-ai-retry
- Move spring-ai-client-chat to test scope in spring-ai-ollama
- Fix XML formatting in some pom.xml files
Signed-off-by: Christian Tzolov <christian.tzolov@broadcom.com>
* Remove use of the Document.getContent method from spring-ai-core; use getText instead
* Remove deprecated ChatOptionsBuilder class
* Remove deprecated FunctionCallingOptionsBuilder class
The Document class previously allowed multiple media entries while also having a
text field, leading to ambiguity in content handling. This change enforces a
clear separation between text and media documents to prevent content type
confusion and simplify document processing.
A Document now must contain either text content or a single media entry, but
never both. This aligns with the class's primary use in ETL pipelines where
clear content type boundaries are essential for proper embedding generation and
vector database storage.
Additional architectural changes:
- Document now implements a cleaner API by removing deprecated methods
- Removed MediaContent interface implementation from Document class
- Document.getMedia() now returns a single Media object instead of Collection
- Removed EMPTY_TEXT constant in favor of proper null handling
- Constructor signatures simplified and streamlined
- Builder pattern improved to enforce single content type constraint
The breaking changes include:
- Media is now a single entry instead of a collection
- Content field renamed to text for clarity
- Removed support for mixed content types
- Simplified builder API to prevent ambiguous construction
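
The "either text or a single media entry, never both" constraint described above can be sketched as follows. This is a minimal illustration only: SketchDocument and SketchMedia are hypothetical stand-in types, not the actual Document and Media classes, and the real builder may validate differently.

```java
// Hypothetical stand-in for a media entry; not the Spring AI Media class.
final class SketchMedia {
    final String mimeType;
    final byte[] data;

    SketchMedia(String mimeType, byte[] data) {
        if (mimeType == null || data == null) {
            throw new IllegalArgumentException("mimeType and data are required");
        }
        this.mimeType = mimeType;
        this.data = data;
    }
}

// Hypothetical stand-in for a document that holds text OR one media entry.
final class SketchDocument {
    private final String text;       // null when the document holds media
    private final SketchMedia media; // null when the document holds text

    private SketchDocument(String text, SketchMedia media) {
        // Exactly one of the two must be set: never both, never neither.
        if ((text == null) == (media == null)) {
            throw new IllegalArgumentException(
                    "exactly one of text or media must be specified");
        }
        this.text = text;
        this.media = media;
    }

    static Builder builder() {
        return new Builder();
    }

    boolean isText() {
        return text != null;
    }

    String getText() {
        return text;
    }

    SketchMedia getMedia() {
        return media;
    }

    static final class Builder {
        private String text;
        private SketchMedia media;

        Builder text(String text) {
            this.text = text;
            return this;
        }

        Builder media(SketchMedia media) {
            this.media = media;
            return this;
        }

        // Validation happens in the constructor, so an ambiguous
        // combination fails at build time rather than later in a pipeline.
        SketchDocument build() {
            return new SketchDocument(text, media);
        }
    }
}
```

In this sketch, SketchDocument.builder().text("hello").build() succeeds, while setting both text and media on the same builder throws IllegalArgumentException from build().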
Prefer using text-related methods over deprecated content methods to
better reflect the actual content type being handled and improve API clarity.
Document
* Introduced “score” attribute in Document API. It stores the similarity score.
* Consolidate “distance” metadata for Documents. It stores the distance measurement.
* Adopted prefix-less naming convention in Document.Builder and deprecated old methods.
* Deprecated the many overloaded Document constructors in favour of Document.Builder.
Vector Stores
* Every vector store implementation now configures a “score” attribute with the similarity score of the Document embedding. It also includes the “distance” metadata with the distance measurement.
* Fixed error in Elasticsearch where distance and similarity were mixed up.
* Added missing integration tests for SimpleVectorStore.
* The Azure Vector Store and HanaDB Vector Store do not include those measurements because the product documentation does not describe how the similarity score is returned, and without access to the cloud products I could not verify that via debugging.
* Improved tests to actually assert the result of the similarity search based on the returned score.
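
To illustrate why the "score" and "distance" values must not be mixed up, here is a minimal sketch of the two measurements. It assumes cosine similarity and defines distance as 1 - similarity; both are assumptions for illustration, since individual vector stores may use other metrics or normalizations. SimilaritySketch is a hypothetical helper, not part of any vector store API.

```java
// Hypothetical helper for illustration; not part of any vector store API.
final class SimilaritySketch {

    private SimilaritySketch() {
    }

    // Cosine similarity: higher means more similar (1.0 for identical direction).
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("vectors must have non-zero magnitude");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cosine distance (assumed here as 1 - similarity): lower means more similar.
    static double cosineDistance(float[] a, float[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }
}
```

Under these assumptions the two values move in opposite directions: a close match has a high score and a low distance, so swapping them inverts the ranking.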
Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>
- Disable project-wide Checkstyle checks to unblock development
- Add documentation for enabling Checkstyle locally
- Fix remaining checkstyle violations in current codebase
Fixes #1669
This commit introduces a new Markdown document reader with several
key features and improvements:
* Add support for text with various formatting elements
* Implement handling for horizontal rules and hard line breaks
* Add functionality for inline and block code sections
* Incorporate blockquote handling
* Support ordered and unordered lists
* Introduce additional metadata capabilities
* Include JavaDocs
Update ETL documentation to reflect these new features and usage.
Fixes #105