TIKA-4703: Add Docker CI pipelines for tika-server and tika-grpc #2715

Merged
nddipiazza merged 4 commits into main from TIKA-4703-docker-ci on Apr 21, 2026

@nddipiazza
Contributor

Summary

Moves Docker build infrastructure into the main tika repo so that Docker image releases are tied directly to Tika releases, eliminating the need for cross-repo coordination with tika-docker and tika-grpc-docker.

  • Snapshot workflow (main branch push): builds and pushes apache/tika, apache/tika-full, and apache/tika-grpc snapshot images to Docker Hub
  • Release workflow (version tag push): builds and pushes versioned + latest tags for all three images
  • tika-server Dockerfiles: copied from tika-docker repo (source of truth), plus new Dockerfile.snapshot variants that use the Maven assembly output instead of downloading from Apache mirrors
  • tika-grpc docker-build: Dockerfile, entrypoint script, and build context assembly script
  • TikaGrpcServer: now falls back to a bundled empty default-tika-config.json from classpath when no -c flag is provided, matching standard Java application conventions
  • Tested locally: all three images (minimal, full, grpc) build and start successfully
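
As a rough illustration of the tagging rule the two workflows describe (snapshot builds push a single snapshot tag, release builds push a versioned tag plus latest), here is a minimal shell sketch. It is not taken from the workflows in this PR: the function name `image_tags` is hypothetical, the image names are those listed in the summary, and the release version `4.0.0` used in the usage note is only an example.

```shell
#!/bin/sh
# Hypothetical helper, not part of this PR: print the Docker tags to push
# for one image at a given Tika version.
image_tags() {
    image=$1
    version=$2
    # Every build pushes the version tag itself.
    printf '%s:%s\n' "$image" "$version"
    case "$version" in
        *-SNAPSHOT) ;;                        # snapshot build: no "latest" tag
        *) printf '%s:latest\n' "$image" ;;   # release build: versioned + latest
    esac
}

# Tags a snapshot build would push for the three images named in the summary.
for image in apache/tika apache/tika-full apache/tika-grpc; do
    image_tags "$image" "4.0.0-SNAPSHOT"
done
```

Calling `image_tags apache/tika-grpc 4.0.0` would instead print both the `4.0.0` and `latest` tags for that image.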

Required Setup

DOCKERHUB_USERNAME and DOCKERHUB_TOKEN secrets must be configured in the repo settings for the workflows to push images.
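
For anyone reproducing the push step by hand, a minimal shell sketch of how the two secrets could be consumed is shown here. It assumes the secrets are exposed as environment variables of the same names; the function names are hypothetical, and the actual workflows may perform the login differently.

```shell
#!/bin/sh
# Hypothetical helpers, not part of this PR.

# Fail early when either Docker Hub secret is missing or empty.
require_dockerhub_secrets() {
    if [ -z "${DOCKERHUB_USERNAME:-}" ]; then
        echo "missing secret: DOCKERHUB_USERNAME" >&2
        return 1
    fi
    if [ -z "${DOCKERHUB_TOKEN:-}" ]; then
        echo "missing secret: DOCKERHUB_TOKEN" >&2
        return 1
    fi
}

# Log in by piping the token on stdin so it never appears in the
# process argument list.
dockerhub_login() {
    require_dockerhub_secrets || return 1
    printf '%s' "$DOCKERHUB_TOKEN" |
        docker login --username "$DOCKERHUB_USERNAME" --password-stdin
}
```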

Test plan

  • tika-server minimal: HTTP 200 on port 9998, user 35002:35002
  • tika-server full: HTTP 200 on port 9998, user 35002:35002, ImageMagick verified
  • tika-grpc: gRPC server starts on port 9090, all plugins loaded, no config file required
  • Test Docker push to personal Docker Hub
  • Verify snapshot workflow triggers on main merge
  • Verify release workflow triggers on version tag
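
The tika-server checks in the plan above (HTTP 200 on port 9998, user 35002:35002) could be scripted roughly as follows. This is a sketch under assumptions, not the procedure used for this PR: the function names, the container name `tika-smoke`, and the request path are hypothetical, and the user check assumes the image ships the `id` utility.

```shell
#!/bin/sh
# Hypothetical smoke-test helpers, not part of this PR.

# Succeeds when a GET on the given URL returns HTTP 200.
http_ok() {
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    [ "$code" = "200" ]
}

# Succeeds when the named running container executes as UID:GID 35002:35002.
runs_as_tika_user() {
    ids="$(docker exec "$1" id -u):$(docker exec "$1" id -g)"
    [ "$ids" = "35002:35002" ]
}

# Example usage against a container started with port 9998 published
# (container name and URL path are assumptions):
#   http_ok http://localhost:9998/ && runs_as_tika_user tika-smoke
```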

🤖 Generated with Claude Code

nddipiazza and others added 3 commits March 27, 2026 09:02
Move Docker build infrastructure into the main tika repo so that
Docker image releases are tied directly to Tika releases rather than
requiring cross-repo coordination with tika-docker/tika-grpc-docker.

Snapshot workflow (main branch push):
- Builds tika-server minimal and full images from Maven output
- Builds tika-grpc image from Maven output
- Pushes snapshot tags to Docker Hub (e.g. 4.0.0-SNAPSHOT)

Release workflow (version tag push):
- Builds tika-server minimal/full from Apache mirror JARs with GPG
  verification (multi-arch: amd64, arm64, arm/v7, s390x)
- Builds tika-grpc from Maven output (multi-arch: amd64, arm64)
- Pushes versioned + latest tags to Docker Hub

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- tika-server snapshot Dockerfiles: use assembly tgz (thin JAR + lib/)
  instead of the thin JAR alone, matching the 4.x packaging model
- tika-grpc: bundle default-tika-config.json so the server starts
  without requiring a config volume mount
- tika-grpc: pass -c, -p, and --plugin-roots as CLI args instead of
  system properties so TikaGrpcServer actually picks them up
- tika-grpc: default port is now 9090 (configurable via TIKA_GRPC_PORT)

Tested locally: all three images (minimal, full, grpc) build and start
successfully.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

TikaGrpcServer now falls back to a bundled default-tika-config.json
from the classpath when no -c flag is provided, matching normal Java
application conventions. The default config is empty (no pre-configured
fetchers/emitters) — users configure these at runtime.

This removes the need for a separate config file in the Docker image.
The entrypoint only passes -c when TIKA_CONFIG env var is explicitly set.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
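
The entrypoint behavior described in the commit messages above (CLI arguments instead of system properties, port 9090 by default, `-c` passed only when `TIKA_CONFIG` is set) might look roughly like the sketch below. It is not the script from this PR: the function name is hypothetical, and so are the `TIKA_PLUGIN_ROOTS` variable and its default path. Only the `-c`, `-p`, and `--plugin-roots` flags and the `TIKA_GRPC_PORT` and `TIKA_CONFIG` variables come from the commit messages.

```shell
#!/bin/sh
# Hypothetical sketch of entrypoint argument assembly, not the PR's script.

# Print the TikaGrpcServer CLI arguments, one per line.
build_tika_grpc_args() {
    # -p: listen port, 9090 unless TIKA_GRPC_PORT is set to a non-empty value.
    printf '%s\n' "-p" "${TIKA_GRPC_PORT:-9090}"
    # --plugin-roots: TIKA_PLUGIN_ROOTS and /tika/plugins are assumptions.
    printf '%s\n' "--plugin-roots" "${TIKA_PLUGIN_ROOTS:-/tika/plugins}"
    # -c: only when TIKA_CONFIG is non-empty, so the server can otherwise
    # fall back to the default config bundled on its classpath.
    if [ -n "${TIKA_CONFIG:-}" ]; then
        printf '%s\n' "-c" "$TIKA_CONFIG"
    fi
}
```

With no environment variables set, the function prints `-p`, `9090`, `--plugin-roots`, and the assumed plugin path, and omits `-c` entirely.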
@nddipiazza
Contributor Author

Adding clarification: the Docker images are published to the existing Docker Hub repositories.

These are the same Docker Hub repos currently used by tika-docker and tika-grpc-docker — the GitHub Actions workflows will publish to the same locations, just automated from the main tika repo instead of manually.

@nddipiazza
Contributor Author

@bartek

Comment thread tika-grpc/docker-build/Dockerfile
Comment thread tika-grpc/docker-build/Dockerfile Outdated
Comment thread tika-grpc/docker-build/Dockerfile Outdated
Comment thread tika-grpc/docker-build/Dockerfile Outdated
- run the grpc image as the shared non-root UID/GID
- align the grpc image with Java 21 and an LTS Ubuntu base

Co-authored-by: Copilot <[email protected]>

@joepurdy joepurdy left a comment

@nddipiazza Looks good to me! Thanks for getting these updates in

@nddipiazza nddipiazza merged commit 2260a19 into main Apr 21, 2026
5 checks passed

4 participants