Deployment: CredForge Databases¶
This document outlines the deployment steps for the required databases: MongoDB 7.0 and PostgreSQL 16.
Database Requirements¶
- MongoDB 7.0
- PostgreSQL 16
Deployment Methods¶
1. Docker Compose (Recommended for Dev/Test Environments)¶
MongoDB 7.0¶
version: '3.8'
services:
  mongodb:
    image: mongo:7.0
    container_name: mongodb
    ports:
      - "27017:27017"
    environment:
      - MONGO_INITDB_ROOT_USERNAME=admin
      - MONGO_INITDB_ROOT_PASSWORD=your_secure_password
    volumes:
      - mongodb_data:/data/db

volumes:
  mongodb_data:
PostgreSQL 16¶
version: '3.8'
services:
  postgres:
    image: postgres:16
    container_name: postgres
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=admin
      - POSTGRES_PASSWORD=your_secure_password
      - POSTGRES_DB=postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
To deploy using Docker Compose:
- Create a docker-compose.yml file with both services
- Run docker-compose up -d
- Verify services are running with docker-compose ps
2. Custom Installation on Ubuntu 22.04 (Recommended for UAT Environments)¶
MongoDB 7.0 Installation¶
Import MongoDB public GPG key:
curl -fsSL https://pgp.mongodb.com/server-7.0.asc | \
sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg \
--dearmor
Create list file for MongoDB:
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
Update package database and install MongoDB:
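For example (the mongodb-org meta-package comes from the repository added above):
sudo apt-get update
sudo apt-get install -y mongodb-org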
Start and enable MongoDB:
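For example, with systemd (the official packages install the service as mongod):
sudo systemctl start mongod
sudo systemctl enable mongod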
PostgreSQL 16 Installation¶
Add PostgreSQL repository:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
Install PostgreSQL:
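For example (postgresql-16 comes from the PGDG repository added above):
sudo apt-get update
sudo apt-get install -y postgresql-16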
Start and enable PostgreSQL:
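For example, with systemd:
sudo systemctl start postgresql
sudo systemctl enable postgresql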
3. AWS Managed Databases (Recommended for Production Environments)¶
Amazon DocumentDB (MongoDB Compatible)¶
- Create a DocumentDB cluster (a CLI sketch follows this list):
  - Go to AWS Console → DocumentDB
  - Click "Create cluster"
  - Select engine version compatible with MongoDB 7.0
  - Configure instance class (recommended: db.r6g.large or higher)
  - Set up VPC and security groups
  - Configure backup retention period
- Set up encryption at rest
- Configure security:
  - Configure security groups to allow access from application servers
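As an alternative to the console, the same setup can be sketched with the AWS CLI; the identifiers, credentials, and security group below are placeholders to adjust for your account, and the engine version should be whatever DocumentDB offers for your compatibility target:
aws docdb create-db-cluster \
  --db-cluster-identifier credforge-docdb \
  --engine docdb \
  --master-username credforgeadmin \
  --master-user-password your_secure_password \
  --vpc-security-group-ids sg-0123456789abcdef0 \
  --backup-retention-period 7 \
  --storage-encrypted

aws docdb create-db-instance \
  --db-instance-identifier credforge-docdb-1 \
  --db-instance-class db.r6g.large \
  --engine docdb \
  --db-cluster-identifier credforge-docdb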
Amazon RDS for PostgreSQL 16¶
- Create an RDS instance (a CLI sketch follows this list):
  - Go to AWS Console → RDS
  - Click "Create database"
  - Choose PostgreSQL 16
  - Select instance class (recommended: db.r6g.large or higher)
  - Configure storage (recommended: 100GB+ with autoscaling)
  - Set up VPC and security groups
  - Enable Multi-AZ deployment for high availability
  - Configure backup retention period
- Enable encryption at rest
- Configure security:
  - Configure security groups to allow access from application servers
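A comparable AWS CLI sketch for the RDS instance; the identifiers, credentials, security group, and exact engine version are placeholders:
aws rds create-db-instance \
  --db-instance-identifier credforge-postgres \
  --engine postgres \
  --engine-version 16.3 \
  --db-instance-class db.r6g.large \
  --allocated-storage 100 \
  --max-allocated-storage 500 \
  --master-username credforgeadmin \
  --master-user-password your_secure_password \
  --vpc-security-group-ids sg-0123456789abcdef0 \
  --multi-az \
  --storage-encrypted \
  --backup-retention-period 7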
Database Schema Setup¶
PostgreSQL Database Setup¶
Required PostgreSQL Databases¶
cred_forge_db
Create database and user:¶
-- Create database
CREATE DATABASE cred_forge_db;

-- Create user with password
CREATE USER credforge_user WITH PASSWORD 'your_secure_password';

-- Connect to cred_forge_db (e.g. \c cred_forge_db) before running the schema-level grants below
-- Grant privileges on cred_forge_db
GRANT CONNECT ON DATABASE cred_forge_db TO credforge_user;
GRANT USAGE ON SCHEMA public TO credforge_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO credforge_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO credforge_user;

-- Set default privileges for future tables and sequences
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO credforge_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT USAGE, SELECT ON SEQUENCES TO credforge_user;
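To verify connectivity and the grants as the application user, a quick check of this shape can be run (the host is a placeholder):
psql -h localhost -U credforge_user -d cred_forge_db -c '\conninfo'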
Create Tables¶
Database to use - cred_forge_db
CREATE TABLE forge_request_workflow_state (
    request_id VARCHAR(255),
    user_id VARCHAR(255),
    reference_id VARCHAR(255),
    client_id VARCHAR(255),
    workflow_endpoint VARCHAR(255),
    workflow_strategy VARCHAR(255),
    workflow_config JSONB,
    input_json JSONB DEFAULT '{}'::jsonb,
    ecm JSONB,
    features JSONB,
    rules_output JSONB,
    engine_history JSONB,
    created_at TIMESTAMP NOT NULL,
    created_date TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_date);
-- Optional indices for better query performance
CREATE INDEX idx_request_workflow_state_reference_id ON forge_request_workflow_state(reference_id);
CREATE INDEX idx_request_workflow_state_user_id ON forge_request_workflow_state(user_id);
CREATE INDEX idx_request_workflow_state_request_id ON forge_request_workflow_state(request_id);
CREATE INDEX idx_request_workflow_state_client_id ON forge_request_workflow_state(client_id);
CREATE INDEX idx_request_workflow_state_workflow_endpoint ON forge_request_workflow_state(workflow_endpoint);
CREATE INDEX idx_request_workflow_state_workflow_strategy ON forge_request_workflow_state(workflow_strategy);
CREATE INDEX idx_request_workflow_state_created_date ON forge_request_workflow_state(created_date);
CREATE TABLE forge_ecm_response_log (
    id SERIAL,
    request_id VARCHAR(255) NOT NULL,
    user_id VARCHAR(255),
    reference_id VARCHAR(255),
    engine VARCHAR(255),
    client_id VARCHAR(255),
    external_status VARCHAR(255),
    external_response_code INTEGER,
    external_retries INTEGER,
    external_latency FLOAT,
    ecm_data_source VARCHAR(255),
    ecm_message VARCHAR(255),
    metadata_json JSONB,
    ecm_status_code INTEGER,
    created_at TIMESTAMP NOT NULL,
    created_date TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_date);
-- Indices for better query performance
CREATE INDEX idx_ecm_response_log_request_id ON forge_ecm_response_log(request_id);
CREATE INDEX idx_ecm_response_log_reference_id ON forge_ecm_response_log(reference_id);
CREATE INDEX idx_ecm_response_log_user_id ON forge_ecm_response_log(user_id);
CREATE INDEX idx_ecm_response_log_engine ON forge_ecm_response_log(engine);
CREATE INDEX idx_ecm_response_log_created_date ON forge_ecm_response_log(created_date);
CREATE TABLE forge_application_logs (
    id CHAR(36),
    request_id VARCHAR(255),
    client_id VARCHAR(255),
    reference_id VARCHAR(255),
    user_id VARCHAR(255),
    forge_component VARCHAR(255) NOT NULL,
    api_endpoint VARCHAR(255),
    engine_name VARCHAR(255),
    workflow_name VARCHAR(255),
    level VARCHAR(50) NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    message TEXT NOT NULL,
    extra_data JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_date date GENERATED ALWAYS AS ("timestamp"::date) STORED
) PARTITION BY RANGE (created_at);
-- Create indexes
CREATE INDEX idx_application_logs_request_id ON forge_application_logs(request_id);
CREATE INDEX idx_application_logs_client_id ON forge_application_logs(client_id);
CREATE INDEX idx_application_logs_reference_id ON forge_application_logs(reference_id);
CREATE INDEX idx_application_logs_user_id ON forge_application_logs(user_id);
CREATE INDEX idx_application_logs_forge_component ON forge_application_logs(forge_component);
CREATE INDEX idx_application_logs_api_endpoint ON forge_application_logs(api_endpoint);
CREATE INDEX idx_application_logs_engine_name ON forge_application_logs(engine_name);
CREATE INDEX idx_application_logs_workflow_name ON forge_application_logs(workflow_name);
CREATE INDEX idx_application_logs_level ON forge_application_logs(level);
CREATE INDEX idx_application_logs_created_date ON forge_application_logs(created_date);
Create Table Partitions¶
CREATE OR REPLACE FUNCTION public.create_daily_partition(parent_table text, partition_date date, days_ahead integer DEFAULT 0)
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE
    partition_name text;
    partition_start timestamp;
    partition_end timestamp;
    partition_col_name text;
    d date := partition_date;
    end_date date := partition_date + days_ahead;
BEGIN
    -- Detect the partition key column
    SELECT a.attname INTO partition_col_name
    FROM pg_partitioned_table pt
    JOIN pg_class c ON pt.partrelid = c.oid
    JOIN pg_attribute a ON a.attrelid = pt.partrelid AND a.attnum = ANY(pt.partattrs)
    WHERE c.relname = parent_table
    LIMIT 1;

    IF partition_col_name IS NULL THEN
        RAISE EXCEPTION 'Partition column not found for table %', parent_table;
    END IF;

    -- Iterate using WHILE loop (safe for DATE)
    WHILE d <= end_date LOOP
        partition_start := d::timestamp;
        partition_end := (d + 1)::timestamp;
        partition_name := parent_table || '_' || to_char(d, 'YYYY_MM_DD');
        BEGIN
            IF NOT EXISTS (
                SELECT 1
                FROM pg_class c
                JOIN pg_namespace n ON n.oid = c.relnamespace
                WHERE c.relname = partition_name
                  AND n.nspname = 'public'
            ) THEN
                EXECUTE format(
                    'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
                    partition_name,
                    parent_table,
                    partition_start,
                    partition_end
                );
                RAISE NOTICE 'Created partition % for table % on %', partition_name, parent_table, partition_col_name;
            END IF;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'Failed to create partition %: %', partition_name, SQLERRM;
        END;
        d := d + INTERVAL '1 day';
    END LOOP;
    RETURN;
END;
$function$;
Execute the following statements manually, or schedule them as cron jobs:
SELECT create_daily_partition('forge_request_workflow_state', CURRENT_DATE, 10);
SELECT create_daily_partition('forge_ecm_response_log', CURRENT_DATE, 10);
SELECT create_daily_partition('forge_application_logs', CURRENT_DATE, 10);
To set up the cron jobs using pg_cron:
Enable pg_cron:
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT * FROM pg_extension WHERE extname = 'pg_cron';
GRANT USAGE ON SCHEMA cron TO credforge_user;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA cron TO credforge_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA cron GRANT EXECUTE ON FUNCTIONS TO credforge_user;
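If CREATE EXTENSION fails because the library is not preloaded, add pg_cron to shared_preload_libraries and restart PostgreSQL (on Amazon RDS this is set through the DB parameter group rather than postgresql.conf). A minimal postgresql.conf sketch; the target database is an assumption and should match wherever the partition functions were created:
shared_preload_libraries = 'pg_cron'
cron.database_name = 'cred_forge_db'    # assumption: schedule jobs in the application database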
Set crons -
SELECT cron.schedule(
    'create_daily_partitions_for_forge_request_workflow_state',
    '0 0 * * *',
    $$SELECT create_daily_partition('forge_request_workflow_state', CURRENT_DATE, 10);$$
);
SELECT cron.schedule(
    'create_daily_partitions_for_forge_ecm_response_log',
    '0 0 * * *',
    $$SELECT create_daily_partition('forge_ecm_response_log', CURRENT_DATE, 10);$$
);
SELECT cron.schedule(
    'create_daily_partitions_for_forge_application_logs',
    '0 0 * * *',
    $$SELECT create_daily_partition('forge_application_logs', CURRENT_DATE, 10);$$
);
Purge Table Partitions¶
CREATE OR REPLACE FUNCTION public.cleanup_old_partitions_for_table(parent_table text, retention_days integer DEFAULT 90)
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE
    partition_record record;
    partition_date date;
BEGIN
    FOR partition_record IN
        SELECT tablename, schemaname
        FROM pg_tables
        WHERE schemaname = 'public'
          AND tablename LIKE parent_table || '_%_%_%'
    LOOP
        BEGIN
            -- Extract the last three parts of the table name which contain the date
            partition_date := to_date(
                split_part(partition_record.tablename, '_', -3) || '_' ||
                split_part(partition_record.tablename, '_', -2) || '_' ||
                split_part(partition_record.tablename, '_', -1),
                'YYYY_MM_DD'
            );
            IF partition_date < current_date - retention_days THEN
                EXECUTE format('DROP TABLE IF EXISTS %I.%I',
                    partition_record.schemaname,
                    partition_record.tablename);
                RAISE NOTICE 'Deleted partition: %', partition_record.tablename;
            END IF;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'Error processing partition %: %', partition_record.tablename, SQLERRM;
            CONTINUE;
        END;
    END LOOP;
END;
$function$;
Execute the following statements manually, or schedule them as cron jobs:
SELECT cleanup_old_partitions_for_table('forge_request_workflow_state', 7);
SELECT cleanup_old_partitions_for_table('forge_ecm_response_log', 7);
SELECT cleanup_old_partitions_for_table('forge_application_logs', 7);
To set up the cleanup cron jobs using pg_cron, enable pg_cron by following the cron setup steps above.
Set crons -
SELECT cron.schedule(
    'cleanup_old_partitions_for_forge_request_workflow_state',
    '0 0 * * *',
    $$SELECT cleanup_old_partitions_for_table('forge_request_workflow_state', 7);$$
);
SELECT cron.schedule(
    'cleanup_old_partitions_for_forge_ecm_response_log',
    '0 0 * * *',
    $$SELECT cleanup_old_partitions_for_table('forge_ecm_response_log', 7);$$
);
SELECT cron.schedule(
    'cleanup_old_partitions_for_forge_application_logs',
    '0 0 * * *',
    $$SELECT cleanup_old_partitions_for_table('forge_application_logs', 7);$$
);
MongoDB Database Setup¶
Create databases and user¶
// Connect to MongoDB
use admin

// Create application user
db.createUser({
  user: "credforge_user",
  pwd: "your_secure_password",
  roles: [
    { role: "readWrite", db: "cred_forge_db" }
  ]
})
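To verify the new user, a quick connection check of this shape can be used (host and port are placeholders; authSource is admin because the user was created there):
mongosh "mongodb://credforge_user:your_secure_password@localhost:27017/cred_forge_db?authSource=admin" --eval "db.runCommand({ ping: 1 })"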
Create collections in MongoDB¶
// 1. Clients
db.createCollection("clients");
db.clients.createIndex({ "client_id": 1 });
db.clients.createIndex({ "auth_token": 1 });
// 2. Ecm Client Metadata
db.createCollection("ecm_client_metadata");
db.ecm_client_metadata.createIndex({ "client_id": 1 });
db.ecm_client_metadata.createIndex({ "engine": 1 });
// 3. Ecm Output Log
db.createCollection("ecm_output_log");
db.ecm_output_log.createIndex({ "reference_id": 1 });
db.ecm_output_log.createIndex({ "user_id": 1 });
db.ecm_output_log.createIndex({ "engine": 1 });
db.ecm_output_log.createIndex({ "request_id": 1 });
db.ecm_output_log.createIndex({ "client_id": 1 });
// 4. Ecm Request Log
db.createCollection("ecm_request_log");
db.ecm_request_log.createIndex({ "reference_id": 1 });
db.ecm_request_log.createIndex({ "user_id": 1 });
db.ecm_request_log.createIndex({ "request_id": 1 });
db.ecm_request_log.createIndex({ "engine": 1 });
db.ecm_request_log.createIndex({ "client_id": 1 });
// 5. Ecm Response Log
db.createCollection("ecm_response_log");
db.ecm_response_log.createIndex({ "reference_id": 1 });
db.ecm_response_log.createIndex({ "request_id": 1 });
db.ecm_response_log.createIndex({ "user_id": 1 });
db.ecm_response_log.createIndex({ "engine": 1 });
db.ecm_response_log.createIndex({ "client_id": 1 });
// 6. Request Workflow State
db.createCollection("request_workflow_state");
db.request_workflow_state.createIndex({ "client_id": 1 });
db.request_workflow_state.createIndex({ "created_date": 1 });
db.request_workflow_state.createIndex({ "user_id": 1 });
db.request_workflow_state.createIndex({ "request_id": 1 });
db.request_workflow_state.createIndex({ "reference_id": 1 });
db.request_workflow_state.createIndex({ "workflow_strategy": 1 });
db.request_workflow_state.createIndex({ "workflow_endpoint": 1 });
// 7. Unauthorized Requests
db.createCollection("unauthorized_requests");
db.unauthorized_requests.createIndex({ "client_id": 1 });
db.unauthorized_requests.createIndex({ "request_id": 1 });
// 8. Usage
db.createCollection("usage");
db.usage.createIndex({ "client_id": 1 });
db.usage.createIndex({ "request_id": 1 });
db.usage.createIndex({ "api_name": 1 });
// 9. Workflow Configs
db.createCollection("workflow_configs");
db.workflow_configs.createIndex({ "workflow_name": 1 });
db.workflow_configs.createIndex({ "client_id": 1 });
// 10. Workflow Endpoint Config
db.createCollection("workflow_endpoint_config");
db.workflow_endpoint_config.createIndex({ "workflow_endpoint": 1 });
db.workflow_endpoint_config.createIndex({ "client_id": 1 });
// 11. Workflow Event Log
db.createCollection("workflow_event_log");
db.workflow_event_log.createIndex({ "client_id": 1 });
db.workflow_event_log.createIndex({ "reference_id": 1 });
db.workflow_event_log.createIndex({ "request_id": 1 });
db.workflow_event_log.createIndex({ "user_id": 1 });
db.workflow_event_log.createIndex({ "event_type": 1 });
Important Notes¶
- Replace your_secure_password with a strong, secure password in all scripts
- Ensure proper network security rules are in place before running these scripts
- For production environments, use AWS Secrets Manager or similar services to manage credentials
- Regularly rotate database credentials
- Monitor database access logs for any unauthorized access attempts
Security Considerations¶
- Always use strong passwords
- Enable encryption at rest
- Configure network security (firewalls, security groups)
- Regular backups
- Monitor database performance and logs
- Keep databases updated with security patches
Backup and Recovery¶
MongoDB¶
- Regular automated backups
- Point-in-time recovery (for DocumentDB)
- Backup verification
- Recovery testing
PostgreSQL¶
- Automated snapshots
- Point-in-time recovery
- WAL archiving
- Backup verification
- Recovery testing
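For self-managed deployments, logical backups along these lines can complement snapshots; the hosts, credentials, and paths are placeholders:
# MongoDB logical backup
mongodump --uri="mongodb://credforge_user:your_secure_password@localhost:27017/cred_forge_db?authSource=admin" --out=/backups/mongo/$(date +%F)
# PostgreSQL logical backup in custom format (restore with pg_restore)
pg_dump -h localhost -U credforge_user -d cred_forge_db -Fc -f /backups/postgres/cred_forge_db_$(date +%F).dump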
Monitoring¶
- Set up CloudWatch metrics (for AWS deployments)
- Configure database-specific monitoring tools
- Set up alerts for:
- High CPU usage
- Low disk space
- Connection count
- Error rates
- Replication lag
Maintenance¶
- Regular security updates
- Performance optimization
- Index maintenance
- Vacuum operations (PostgreSQL)
- Database statistics updates
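For the PostgreSQL maintenance items above, statements along these lines can be run (or scheduled) during low-traffic windows; the table names are taken from the schema in this document:
-- Reclaim dead space and refresh planner statistics
VACUUM (ANALYZE) forge_ecm_response_log;
-- Refresh planner statistics only
ANALYZE forge_request_workflow_state;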
Scaling Ladder¶
The following table provides recommended node counts for PostgreSQL and MongoDB deployments at various daily request volumes, assuming the per-node baseline specifications below:
- PostgreSQL: 1GB RAM, 2 vCPU, 25GB disk (3000 IOPS, 125 MBps)
- MongoDB: 4GB RAM, 2 vCPU, 100GB disk (3000 IOPS, 125 MBps)
| Requests / Day | PostgreSQL Nodes | MongoDB Nodes |
|---|---|---|
| 10K | 1 | 1 |
| 25K | 2 | 2 |
| 50K | 3 | 3 |
| 75K | 4 | 4 |
| 100K | 5 | 5 |
Notes:
- For high availability and failover, consider running at least 2 nodes (primary + standby/replica) at all times, even at lower request volumes.
- Scale up node count and/or instance size if you observe CPU, RAM, or disk IOPS/throughput consistently above 50% utilization.
- Adjust disk size and IOPS as data volume grows.
- Monitor database metrics and tune accordingly.