Skip to main content
Sergei Grebnov
Senior Software Engineer at Spice AI
View all authors

Spice v0.19.3-beta (Oct 28, 2024)

ยท 4 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v0.19.3-beta ๐Ÿ“ˆ

Spice v0.19.3-beta improves the performance and stability of data connectors and accelerators, including faster queries across multiple federated sources by optimizing how filters are applied. Anthropic has also been added as a LLM model provider.

Highlights in v0.19.3โ€‹

DataFusion Fixes: Resolved bugs in DataFusion and DataFusion Table Providers, expanding TPC-DS coverage and correctness.

GitHub Data Connector Beta Milestone: The GitHub Data Connector has graduated to Beta after extensive testing, stability, and performance improvements.

Anthropic Models Provider: Anthropic has been added as an LLM provider, including support for streaming.

Example spicepod.yml:

models:
- from: anthropic:claude-3-5-sonnet-20240620
name: claude_3_5_sonnet
params:
anthropic_api_key: ${ secrets:SPICE_ANTHROPIC_API_KEY }

Spice v0.18.1-beta (Sep 23, 2024)

ยท 6 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v0.18.1-beta. ๐ŸŽ๏ธ

The v0.18.1-beta release continues to improve runtime performance and reliability. Performance for accelerated queries joining multiple datasets has been significantly improved with join push-down support. GraphQL, MySQL, and SharePoint data connectors have better reliability and error handling, and a new Microsoft SQL Server data connector has been introduced. Task History now has fine-grained configuration, including the ability to disable the feature entirely. A new spice search CLI command has been added, enabling development-time embeddings-based searches across datasets.

Highlights in v0.18.1-betaโ€‹

Join push-down for accelerations: Queries to the same accelerator will now push-down joins, significantly improving acceleration performance for queries joining multiple tables.

Microsoft SQL Server Data Connector: Use from: mssql: to access and accelerate Microsoft SQL Server datasets.

Example spicepod.yml:

datasets:
- from: mssql:path.to.my_dataset
name: my_dataset
params:
mssql_connection_string: ${secrets:mssql_connection_string}

See the Microsoft SQL Server Data Connector documentation.

Task History: Task History can be configured in the spicepod.yml, including the ability to include, or truncate outputs such as the results of a SQL query.

Example spicepod.yml:

runtime:
task_history:
enabled: true
captured_output: truncated
retention_period: 8h
retention_check_interval: 15m

See the Task History Spicepod reference for more information on possible values and behaviors.

Search CLI Command Use the spice search CLI command to perform embeddings-based searches across search configure datasets. Note: Search requires the ai feature to be installed.

Refresh on File Changes: File Data Connector data refreshes can be configured to be triggered when the source file is modified through a file system watcher. Enable the watcher by adding file_watcher: enabled to the acceleration parameters.

Example spicepod.yml:

datasets:
- from: file://path/to/my_file.csv
name: my_file
acceleration:
enabled: true
refresh_mode: full
params:
file_watcher: enabled

Spice v0.18-beta (Sep 16, 2024)

ยท 6 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

Announcing the release of Spice v0.18-beta.

The v0.18.0-beta release adds new Sharepoint and File data connectors, introduces AWS Identity and Access Management (IAM) support for the S3 Data Connector, improves performance of the GitHub connector, and increases the overall reliability of all data accelerators. The /ready API endpoint was enhanced to report as ready only when all components, including loaded data, have successfully reported readiness.

Highlights in v0.18.0-betaโ€‹

Sharepoint Data Connector: Use from: sharepoint: to access and accelerate documents stored in Microsoft 365 OneDrive for Business (Sharepoint). The CLI also includes a new spice login sharepoint to aid in local development and testing.

Example spicepod.yml:

datasets:
- from: sharepoint:drive:Documents/path:/important_documents/
name: important_documents
params:
sharepoint_client_id: ${secrets:SPICE_SHAREPOINT_CLIENT_ID}
sharepoint_tenant_id: ${secrets:SPICE_SHAREPOINT_TENANT_ID}
sharepoint_client_secret: ${secrets:SPICE_SHAREPOINT_CLIENT_SECRET}

See the Sharepoint Data Connector documentation.

AWS Identity and Access Management (IAM) for S3: A new s3_auth parameter for the s3 data connector to configure the authentication method to use when connecting to S3. Supported values are public, key, and iam_role. Use s3_auth: iam_role to assume the instance IAM role.

Example spicepod.yml:

datasets:
- from: s3://my-bucket
name: bucket
params:
s3_auth: iam_role # Assume IAM role of instance

See the S3 Data Connector documentation.

File Data Connector Use from: file: to query files stored by locally accessible filesystems.

Example spicepod.yml:

datasets:
- from: file://path/to/customer.parquet
name: customer
params:
file_format: parquet

See the File Data Connector documentation.

Improved /ready Api Now includes the initial data load for accelerated datasets in addition to component readiness to ensure readiness is only reported when data has loaded and can be successfully queried.

Spice v0.14-alpha (June 17, 2024)

ยท 4 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

The v0.14-alpha release focuses on enhancing accelerated dataset performance and data integrity, with support for configuring primary keys and indexes. Additionally, the GraphQL data connector been introduced, along with improved dataset registration and loading error information.

Highlightsโ€‹

  • Accelerated Datasets: Ensure data integrity using primary key and unique index constraints. Configure conflict handling to either upsert new data or drop it. Create indexes on frequently filtered columns for faster queries on larger datasets.

  • GraphQL Data Connector: Initial support for using GraphQL as a data source.

Example Spicepod showing how to use primary keys and indexes with accelerated datasets:

datasets:
- from: eth.blocks
name: blocks
acceleration:
engine: duckdb # Use DuckDB acceleration engine
primary_key: '(hash, timestamp)'
indexes:
number: enabled # same as `CREATE INDEX ON blocks (number);`
'(number, hash)': unique # same as `CREATE UNIQUE INDEX ON blocks (number, hash);`
on_conflict:
'(hash, number)': drop # possible values: drop (default), upsert
'(hash, timestamp)': upsert

Primary Keys, constraints, and indexes are currently supported when using SQLite, DuckDB, and PostgreSQL acceleration engines.

Learn more with the indexing quickstart and the primary key sample.

Read the Local Acceleration documentation.

Spice v0.12.2-alpha (May 13, 2024)

ยท 4 min read
Sergei Grebnov
Senior Software Engineer at Spice AI

The v0.12.2-alpha release introduces data streaming and key-pair authentication for the Snowflake data connector, enables general append mode data refreshes for time-series data, improves connectivity error messages, adds nested folders support for the S3 data connector, and exposes nodeSelector and affinity keys in the Helm chart for better Kubernetes management.

Highlightsโ€‹

  • Improved Connectivity Error Messages: Error messages provide clearer, actionable guidance for misconfigured settings or unreachable data connectors.

  • Snowflake Data Connector Improvements: Enables data streaming by default and adds support for key-pair authentication in addition to passwords.

  • API for Refresh SQL Updates: Update dataset Refresh SQL via API.

  • Append Data Refresh: Append mode data refreshes for time-series data are now supported for all data connectors. Specify a dataset time_column with refresh_mode: append to only fetch data more recent than the latest local data.

  • Docker Image Update: The spiceai/spiceai:latest Docker image now includes the ODBC data connector. For a smaller footprint, use spiceai/spiceai:latest-slim.

  • Helm Chart Improvements: nodeSelector and affinity keys are now supported in the Helm chart for improved Kubernetes deployment management.