Automate Your Database Documentation with Metadata Auto-Doc
Database schemas evolve constantly - new tables appear, columns change, indexes get added, and data volumes fluctuate. Yet keeping documentation current often falls by the wayside, leaving teams with outdated wikis and confused developers. What if your database could document itself automatically and alert you when important changes occur?
Enter Metadata Auto-Doc, a Python tool that transforms the tedious task of database documentation into an automated workflow that fits seamlessly into your development process.
https://github.com/steveoliai/metadata-auto-documentation
What Does It Do?
Metadata Auto-Doc connects to your PostgreSQL or BigQuery databases and automatically generates comprehensive documentation in multiple formats:
Markdown files for version control and wikis
HTML reports for sharing with stakeholders
JSON snapshots for programmatic analysis
Mermaid ERD diagrams showing table relationships
Change tracking with configurable thresholds
Slack notifications when significant changes are detected
Key Features
Multi-Database Support
Currently supports PostgreSQL and BigQuery, with a modular architecture that makes adding other databases straightforward. Each database type captures the metadata most relevant to that platform - constraints and indexes for PostgreSQL, partitioning and clustering for BigQuery.
Intelligent Change Detection
The tool doesn't just document your current schema - it tracks changes over time by comparing snapshots. You can set thresholds for what constitutes a "significant" change:
Table additions/removals
Column schema changes
Row count fluctuations above a percentage threshold
Table size changes
Index and constraint modifications
Visual Documentation
Beyond tables and text, Metadata Auto-Doc generates Entity Relationship Diagrams using Mermaid syntax. These diagrams automatically show table relationships and can be embedded in documentation platforms that support Mermaid rendering.
Team Integration
Built-in Slack webhook support means your team gets notified when important schema changes occur. No more surprises when someone adds a table or modifies a critical column - the changes are documented and communicated automatically.
Real-World Usage Scenarios
Scenario 1: Development Team Workflow
# Daily schema documentation as part of CI/CD
python metadata_autodoc.py --source postgres --conn "postgresql://user:pass@prod-db:5432/analytics" --schema public --md docs/schema.md --html docs/schema.html --snapshot-dir ./_snapshots --slack-webhook $SLACK_WEBHOOK_URL
This command runs in your CI pipeline, automatically updating documentation and alerting the team to schema changes in production.
Scenario 2: Data Engineering Monitoring
# BigQuery dataset monitoring
python metadata_autodoc.py --source bigquery --project data-warehouse --dataset analytics --json current_snapshot.json --baseline yesterday_snapshot.json --threshold-row-pct 5.0
Perfect for monitoring data warehouse changes where even small row count variations might indicate pipeline issues.
Scenario 3: Schema Migration Tracking
Before and after a major database migration, you can capture snapshots to document exactly what changed:
# Before migration
python metadata_autodoc.py --source postgres ... --json pre_migration.json
# After migration
python metadata_autodoc.py --source postgres ... --baseline pre_migration.json
The diff report shows precisely what tables, columns, and constraints were affected.
What The Output Looks Like
PostgreSQL Documentation
For each table, you get:
Table comments and descriptions
Approximate row counts and table sizes
Primary and foreign key relationships
Index definitions
Complete column details with data types, nullability, defaults, and comments
BigQuery Documentation
For BigQuery tables, the tool captures:
Table descriptions and types
Partitioning strategies and fields
Clustering configurations
Row counts
Column schemas with modes and descriptions
Change Reports
When comparing snapshots, you see:
Tables added or removed
Column additions, removals, and modifications
Changes to keys, indexes, and table options
Row count and size fluctuations with percentage changes
Flagged items that exceed your configured thresholds
Technical Architecture
The tool follows a clean, extensible pattern:
Collectors (
pg_collect
,bq_collect
) extract metadata using database-specific queriesRenderers transform the metadata into various output formats
Diff engine compares snapshots and identifies changes
Notification system alerts teams to significant changes
This separation makes it easy to add new database types or output formats without touching existing code.
Getting Started
Prerequisites
# For PostgreSQL
pip install sqlalchemy pandas psycopg2-binary
# For BigQuery
pip install google-cloud-bigquery pandas
# For enhanced HTML output
pip install jinja2
# For Slack notifications
pip install requests
Basic Usage
Document a PostgreSQL schema:
python metadata_autodoc.py --source postgres --conn "postgresql://user:pass@host:5432/db" --schema public --md schema_docs.md
Track changes over time:
# First run creates baseline python metadata_autodoc.py --source postgres ... --snapshot-dir ./snapshots # Subsequent runs show changes python metadata_autodoc.py --source postgres ... --snapshot-dir ./snapshots
Generate visual ERD: The Markdown and HTML outputs include Mermaid diagrams that render in GitHub, GitLab, and documentation platforms.
Why This Matters
Database documentation that's out of sync with reality is worse than no documentation at all - it actively misleads developers and creates technical debt. Manual documentation processes simply can't keep pace with modern development cycles.
Metadata Auto-Doc solves this by making documentation a natural byproduct of your database operations. Schema changes get documented automatically, teams stay informed, and your database knowledge stays current without manual intervention.
Whether you're managing a small application database or a sprawling data warehouse, automated metadata documentation helps teams move faster while maintaining clarity about their data structures. It's particularly valuable for:
Data teams tracking warehouse evolution
Development teams maintaining application databases
DevOps engineers monitoring production changes
Data governance initiatives requiring current documentation
Future Enhancements?
The modular architecture makes several enhancements straightforward:
Support for MySQL, SQL Server, and other databases
Integration with documentation platforms like Notion or Confluence
Custom output templates for specific organizational needs
Advanced relationship detection and constraint analysis
Performance metrics and query optimization suggestions
Metadata Auto-Doc represents a shift toward treating database documentation as a continuous, automated process rather than a periodic manual chore. By embedding documentation generation into your existing workflows, you ensure your team always has current, accurate information about your database structures.