# Operations Overview
This section contains operational runbooks, CI/CD documentation, and business procedures for the Kell Creations platform.
## Current Operational Documentation
| Document | Purpose | Status |
| ------------------------------------------------- | ---------------------------------------------------- | ---------------- |
| [CI/CD Workflow](cicd-workflow.md) | Defines the documentation publishing pipeline | ✅ Comprehensive |
| [Architecture Workflow](architecture-workflow.md) | Defines the diagram authoring and publishing process | ⚠️ Partial (see gap 3) |
## Analysis Findings
!!! info "Last analyzed: 2026-05-22"
### Confirmed strengths
1. **CI/CD documentation is thorough**: The CI/CD workflow document covers platforms, runner architecture, branch behavior, troubleshooting, permissions, and security considerations.
2. **Architecture workflow is well-defined**: The steps documented so far give a clear, step-by-step process for creating and publishing diagrams.
3. **Four Forgejo Actions workflows are operational**: `publish-docs.yml`, `validate-docs.yml`, `flutter-analyze.yml`, and `flutter-test.yml`.
### Gaps and recommendations
#### 1. No operational runbooks
**Priority:** Medium
No runbooks exist for common operational tasks such as:
- Server health checks and restart procedures
- Forgejo runner maintenance and token rotation
- PlantUML server maintenance
- MkDocs container updates
- Backup and recovery procedures
- SSL certificate renewal
- DNS and reverse proxy configuration
**Recommendation:** Create lightweight runbooks for the most critical operations first. Suggested initial candidates:
| Candidate | Description |
| ------------------------------ | -------------------------------------------------------------------------------- |
| Runner maintenance runbook | How to check, restart, re-register, and rotate tokens for Forgejo runners |
| Documentation host maintenance | Docker container updates, published site integrity checks, disk space monitoring |
| Incident response procedure    | What to do when the docs site, the Forgejo Git host, or the runners are down     |
#### 2. No monitoring or alerting documentation
**Priority:** Medium
No documentation exists for how to detect or respond to:
- CI/CD pipeline failures
- Documentation site downtime
- Runner service failures
- Disk space or resource exhaustion
**Recommendation:** Document current monitoring capabilities (even if manual) and identify candidates for automated alerting.
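
As a starting point for the manual checks mentioned above, the following sketch shows how a scheduled script might flag two of the listed conditions: documentation site downtime and disk space exhaustion. It is an illustration only. The URL, the 90% threshold, and all function names are hypothetical and do not describe the current setup.

```python
"""Minimal health-check sketch. All names and thresholds are assumptions."""
import shutil
import urllib.request

# Hypothetical values; replace with the real docs URL and an agreed threshold.
DOCS_URL = "https://docs.example.invalid/"
DISK_ALERT_PERCENT = 90.0


def disk_used_percent(path: str = "/") -> float:
    """Return the percentage of space used on the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def disk_is_nearly_full(used_percent: float, limit: float = DISK_ALERT_PERCENT) -> bool:
    """Return True once usage reaches the alert threshold."""
    return used_percent >= limit


def site_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False
```

A cron job or a scheduled workflow could call `site_is_up(DOCS_URL)` and `disk_is_nearly_full(disk_used_percent("/"))` and send a notification when either check fails. The notification channel is left open here because none is documented yet.
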
#### 3. Architecture workflow is incomplete
**Priority:** Low
The architecture workflow document at `docs/operations/architecture-workflow.md` ends at step 4 (validate repository state) without covering:
- Commit and push procedures
- CI/CD pipeline verification
- Published site verification
- Diagram review process
**Recommendation:** Complete the remaining workflow steps to match the level of detail in the CI/CD workflow document.
#### 4. Local development setup not documented
**Priority:** Low
No documentation covers how to set up a local development environment for:
- MkDocs local preview (including the PlantUML render step)
- Flutter development environment setup
- Forgejo runner local testing
**Recommendation:** Add a developer setup guide, particularly noting that `docs/images/` is a CI/CD build artifact and local MkDocs builds require manual PlantUML rendering.
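
The manual render step noted above could be wrapped in a small script for the setup guide. The sketch below assumes a locally installed `plantuml` command, `.puml` sources under a hypothetical `docs/architecture/` directory, and PNG output. None of these are confirmed by this document, and the CI/CD pipeline may render diagrams differently (for example through the PlantUML server).

```python
"""Local preview sketch: render PlantUML sources, then start the MkDocs dev server."""
import subprocess
from pathlib import Path

# Assumed locations; the real diagram source directory may differ.
DIAGRAM_SOURCES = Path("docs/architecture")  # hypothetical source directory
IMAGE_OUTPUT = Path("docs/images")           # build artifact named in this document


def plantuml_command(sources: list[Path], output_dir: Path) -> list[str]:
    """Build a PlantUML CLI call that writes PNG files into output_dir."""
    # An absolute -o path avoids PlantUML resolving it relative to each source file.
    return ["plantuml", "-tpng", "-o", str(output_dir.resolve()), *map(str, sources)]


def render_and_serve() -> None:
    """Render all diagrams found, then run the MkDocs development server."""
    sources = sorted(DIAGRAM_SOURCES.rglob("*.puml"))
    if sources:
        IMAGE_OUTPUT.mkdir(parents=True, exist_ok=True)
        subprocess.run(plantuml_command(sources, IMAGE_OUTPUT), check=True)
    # `mkdocs serve` blocks until interrupted and rebuilds on file changes.
    subprocess.run(["mkdocs", "serve"], check=True)
```

Calling `render_and_serve()` from the repository root would produce the images before the preview starts. Diagrams edited while the server is running would still need a re-render, which is worth stating in the setup guide.
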
#### 5. CI/CD validation could be expanded
**Priority:** Low
The CI/CD workflow document itself identifies future enhancements that remain unimplemented:
- Broken-link validation
- Markdown linting integration
- PlantUML diagram validation
- Required document metadata checks
- Notification hooks for failed publishes
**Recommendation:** Prioritize Markdown linting and link checking as the highest-value additions to the validation pipeline.
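
To give a sense of the scope of the link check recommended above, here is a minimal sketch that reports relative Markdown links whose target file does not exist. It is not the project's validator. The function name and the regular expression are assumptions, and a dedicated link checker may be a better fit once requirements are clear.

```python
"""Sketch of a relative-link check for Markdown files (illustrative only)."""
import re
from pathlib import Path

# Matches inline links such as [text](target); image links ![alt](src) match too.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")
EXTERNAL_PREFIXES = ("http://", "https://", "mailto:", "#")


def find_broken_links(docs_root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs whose relative target does not exist on disk."""
    broken = []
    for md_file in sorted(docs_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            if target.startswith(EXTERNAL_PREFIXES):
                continue  # external URLs and same-page anchors are out of scope
            # Drop any #anchor suffix before checking the file path.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file, target))
    return broken
```

Running `find_broken_links(Path("docs"))` in a validation job and failing when the list is non-empty would cover the basic case. Two limitations apply: links inside fenced code blocks are also matched, and links into `docs/images/` will be reported unless the diagrams have been rendered first.
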
## Recommended Procedures
The following operational procedures are candidates for formal documentation using the procedure template at `policies/templates/procedure-template.md`:
| Candidate ID | Title | Priority |
| -------------- | ---------------------------------------- | -------- |
| KC-PRO-IT-001 | Forgejo Runner Maintenance Procedure | Medium |
| KC-PRO-IT-002 | Documentation Host Maintenance Procedure | Medium |
| KC-PRO-OPS-001 | Incident Response Procedure | Medium |
| KC-PRO-IT-003 | Local Development Setup Procedure | Low |
| KC-PRO-OPS-002 | Backup and Recovery Procedure | Low |