PKI & Certificates
Root CAs, intermediate CAs, certificate chains, and what can go wrong. The plumbing behind every HTTPS connection.
The trust problem
When your browser connects to a bank's website, it needs to verify it is talking to the real server and not an impersonator. But you cannot hardcode every legitimate server's public key into every browser. Public Key Infrastructure (PKI) solves this with a chain of trust.
The idea: a small set of root Certificate Authorities (CAs) are trusted by default in operating systems and browsers. Any certificate signed by a trusted CA (or by an intermediate CA that traces back to a trusted root) is accepted.
The certificate chain
Certificates are organized in a hierarchy. A typical chain looks like:
Root CA (DigiCert / Let's Encrypt / Sectigo)
└── Intermediate CA
    └── End-entity certificate (your domain: api.yourbank.com)
Root CAs keep their private keys offline in hardware security modules (HSMs) in physically secured facilities. They sign intermediate CA certificates and then go back offline. Day-to-day certificate issuance is handled by intermediates.
When a client connects to your server, it receives your end-entity certificate plus the intermediate certificate(s). It then validates the chain up to a root it already trusts.
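The chain walk described above can be sketched with a toy model. This is an illustration only: `ToyCert` and `build_chain` are hypothetical names, certificates are reduced to subject and issuer strings, and no signatures, validity dates, or extensions are checked. A real client relies on its TLS library for all of that.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToyCert:
    """Simplified stand-in for an X.509 certificate (no real cryptography)."""
    subject: str
    issuer: str
    is_ca: bool = False


def build_chain(leaf: ToyCert,
                presented: List[ToyCert],
                trusted_roots: Dict[str, ToyCert]) -> List[str]:
    """Follow issuer links from the leaf until a trusted root is reached.

    `presented` holds the intermediates sent by the server;
    `trusted_roots` maps root subject names to the client's root store.
    Returns the subjects from leaf to root, or raises ValueError.
    """
    by_subject = {cert.subject: cert for cert in presented}
    chain = [leaf]
    current = leaf
    # Bound the walk so a malformed, looping chain cannot spin forever.
    for _ in range(len(presented) + 1):
        if current.issuer in trusted_roots:
            chain.append(trusted_roots[current.issuer])
            return [cert.subject for cert in chain]
        parent = by_subject.get(current.issuer)
        if parent is None:
            raise ValueError(f"no certificate presented for issuer {current.issuer!r}")
        if not parent.is_ca:
            raise ValueError(f"{parent.subject!r} is not a CA and cannot issue certificates")
        chain.append(parent)
        current = parent
    raise ValueError("chain is too long or contains a loop")
```

If the server omits the intermediate, `presented` is empty and the walk stops at the first step with a `ValueError`, because the leaf's issuer is neither presented nor in the root store.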
Missing intermediate certificate
One of the most common TLS failures: the server sends only its end-entity certificate without the intermediate. Most browsers cache intermediates and compensate. But mobile apps, API clients, and Java applications using a custom TrustManager often fail with a certificate chain error. Always configure your server to send the full chain.
What is inside a certificate?
A certificate is an X.509 structure containing:
Subject: CN=api.yourbank.com, O=Your Bank Ltd, C=IN
Issuer: CN=DigiCert TLS RSA SHA256 2020 CA1
SANs: api.yourbank.com, *.yourbank.com
Validity: 2024-01-01 to 2025-01-01
Public key: EC (P-256), 256 bits
Key usage: Digital Signature, Key Encipherment
Signature: SHA256withRSA (issued by the intermediate)
Key fields to understand:
- Subject Alternative Names (SANs): The actual hostnames the certificate is valid for. Modern browsers match against SANs and ignore the CN. A certificate for api.yourbank.com will fail for payments.yourbank.com unless that name, or a matching wildcard such as *.yourbank.com, is also in the SANs.
- Key usage: Limits what the key pair can do. A certificate flagged only for Digital Signature cannot be used for key encipherment.
- Extended Key Usage: Further restricts usage. serverAuth for TLS servers, clientAuth for mTLS clients.
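The SAN matching rule can be sketched in a few lines. This is a simplified illustration, not how any particular TLS library implements it: `hostname_matches_san` and `cert_covers` are hypothetical helper names, and the sketch assumes a wildcard appears only as the entire leftmost label and covers exactly one label. It ignores IP address SANs, internationalized names, and public-suffix restrictions.

```python
from typing import Iterable


def hostname_matches_san(hostname: str, san: str) -> bool:
    """Return True if a single SAN entry covers the hostname."""
    host_labels = hostname.lower().rstrip(".").split(".")
    san_labels = san.lower().rstrip(".").split(".")
    # A wildcard never spans multiple labels, so label counts must agree.
    if len(host_labels) != len(san_labels):
        return False
    if san_labels[0] == "*":
        # The wildcard stands in for exactly one non-empty leftmost label.
        return host_labels[0] != "" and host_labels[1:] == san_labels[1:]
    return host_labels == san_labels


def cert_covers(hostname: str, sans: Iterable[str]) -> bool:
    """Return True if any SAN entry on the certificate covers the hostname."""
    return any(hostname_matches_san(hostname, san) for san in sans)
```

With the SANs from the example certificate, `cert_covers("payments.yourbank.com", ["api.yourbank.com", "*.yourbank.com"])` is true because of the wildcard entry, while the bare domain `yourbank.com` and a deeper name such as `a.b.yourbank.com` are not covered.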
DV, OV, and EV certificates
CAs offer different levels of validation before issuing a certificate:
- DV (Domain Validation): CA verifies you control the domain (DNS record or file challenge). No identity verification. Fast and cheap (Let's Encrypt is DV). The certificate only proves control of the domain, not who owns it.
- OV (Organization Validation): CA verifies the legal organization in addition to domain control. The certificate includes the organization name. Most banking and payment APIs use OV.
- EV (Extended Validation): Stricter identity checks, more paperwork, higher cost. Browsers used to show a green bar for EV but removed it as it provided false security assurance to end users. Still used by some large banks for their main website.
Certificate revocation
When a private key is compromised or a certificate is mis-issued, it needs to be revoked before its expiry date. Two mechanisms handle this:
- CRL (Certificate Revocation List): A signed list of revoked serial numbers, published by the CA. Clients download and check it. Large files, infrequent updates, often cached aggressively.
- OCSP (Online Certificate Status Protocol): Real-time per-certificate status check against the CA. Fresher and much smaller than a CRL download, but the extra request to the CA adds latency to the connection. OCSP Stapling solves this: the server fetches and caches the OCSP response, stapling it to the TLS handshake so clients do not need to query the CA.
In practice, most browsers use a "soft-fail" approach: if the OCSP or CRL check fails (CA unreachable), the certificate is still accepted. This is a known weakness. For high-security BFSI integrations, consider hard-fail OCSP or certificate pinning.
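The difference between soft-fail and hard-fail comes down to how an unanswered revocation check is treated. A minimal sketch of that decision, assuming the revocation lookup itself has already been done elsewhere (`RevocationStatus` and `accept_certificate` are hypothetical names, not part of any library):

```python
from enum import Enum


class RevocationStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # responder unreachable, timed out, or gave no usable answer


def accept_certificate(status: RevocationStatus, hard_fail: bool) -> bool:
    """Decide whether to proceed with the connection.

    Soft-fail (hard_fail=False) accepts the certificate when status is UNKNOWN.
    Hard-fail (hard_fail=True) rejects it unless the status is positively GOOD.
    """
    if status is RevocationStatus.REVOKED:
        return False
    if status is RevocationStatus.UNKNOWN:
        return not hard_fail
    return True
```

The weakness of soft-fail is visible in the `UNKNOWN` branch: an attacker who can block traffic to the CA's responder turns a revoked certificate into an accepted one. Hard-fail closes that gap at the cost of availability when the responder is down.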
Certificate expiry: the operational risk
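Expired certificates are among the most common causes of unplanned TLS outages. The CA/Browser Forum has approved a phased reduction of the maximum certificate lifetime from the current 398 days to 47 days by 2029, a schedule proposed by Apple (Google had earlier proposed 90 days). This forces automation and removes reliance on manual calendar reminders.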
Common expiry failures in BFSI systems
- Vendor-managed certificates on payment gateway load balancers not covered by internal monitoring
- Client certificates used for mTLS to NPCI/NACH forgotten after initial setup
- Certificates on internal microservices not in the public-facing monitoring scope
- Root or intermediate CA certificates expiring (happened with AddTrust in 2020, broke many systems that were not tracking the full chain)
Automated renewal (Let's Encrypt + certbot, AWS Certificate Manager, HashiCorp Vault PKI) is the correct solution. If you must manage certificates manually, monitor expiry dates with at least 30-day warnings, and treat a certificate renewal as a scheduled change with a runbook.
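For manually managed certificates, the 30-day warning can be sketched with the Python standard library alone. The helper names and the three status strings below are illustrative assumptions; `ssl.cert_time_to_seconds` and `SSLSocket.getpeercert()` are standard library calls, and the `notAfter` string format is the one `getpeercert()` returns.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a notAfter string such as
    'Jan  1 00:00:00 2025 GMT'. Negative once the certificate has expired."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def expiry_status(not_after: str, now: datetime, warn_days: int = 30) -> str:
    """Classify a certificate as 'expired', 'warning', or 'ok'."""
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    if remaining <= warn_days:
        return "warning"
    return "ok"


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate a live server presents.

    Uses default verification, so the handshake itself fails if the
    certificate is already expired or the chain is incomplete.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `expiry_status(fetch_not_after(host), datetime.now(timezone.utc))` for each endpoint in an inventory and alert on anything other than "ok". The inventory needs to include the vendor-managed, mTLS client, and internal certificates listed above, since those are the ones that public-facing monitoring misses.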
Private CA for internal systems
Public CAs issue certificates for publicly accessible domains. For internal services (microservices, internal dashboards, service mesh), you run a private CA. Every service in your infrastructure trusts your internal root CA.
Options range from OpenSSL-managed CA files (fragile, hard to audit) to purpose-built solutions:
- HashiCorp Vault PKI: short-lived certificates, automatic rotation, audit trail
- AWS Private CA: managed service, integrates with ACM
- EJBCA, step-ca: self-hosted options with ACME support
Quantum impact on PKI
Today's certificates use RSA or ECDSA public keys. Both would be broken by Shor's algorithm on a sufficiently powerful quantum computer. This affects:
- The certificate signature (the CA's signature on your cert)
- The server's authentication during the TLS handshake
- The client certificate in mTLS
NIST standardized ML-DSA (FIPS 204) as the primary replacement for digital signatures. Migration requires CAs to support issuing ML-DSA certificates and TLS stacks to support verifying them. Hybrid certificates (ML-DSA + ECDSA) allow gradual transition.
This migration is harder than key exchange migration because it requires changes all the way up to the root CA. Plan accordingly: the PKI migration will take longer than the TLS cipher suite migration.
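A practical first step in planning is an inventory of which signature algorithms each certificate in a chain uses. The sketch below classifies algorithm names by substring match. The function name, the category labels, and the example name strings are assumptions for illustration; real tooling would read the algorithm OID from each certificate rather than match on display names.

```python
# Families breakable by Shor's algorithm versus NIST post-quantum signature schemes.
QUANTUM_VULNERABLE = ("rsa", "ecdsa", "ed25519", "ed448", "dsa")
POST_QUANTUM = ("ml-dsa", "slh-dsa")


def classify_signature_algorithm(name: str) -> str:
    """Return 'post-quantum', 'hybrid', 'quantum-vulnerable', or 'unknown'."""
    lowered = name.lower()
    has_pq = any(tag in lowered for tag in POST_QUANTUM)
    # Remove post-quantum names first so the 'dsa' inside 'ml-dsa'
    # is not mistaken for classical DSA.
    remainder = lowered
    for tag in POST_QUANTUM:
        remainder = remainder.replace(tag, "")
    has_classical = any(tag in remainder for tag in QUANTUM_VULNERABLE)
    if has_pq and has_classical:
        return "hybrid"
    if has_pq:
        return "post-quantum"
    if has_classical:
        return "quantum-vulnerable"
    return "unknown"
```

Running this over the leaf, intermediate, and root of each chain shows why the migration reaches the root CA: a chain is only as strong as its weakest signature, so a post-quantum leaf signed by an RSA intermediate is still classified as vulnerable at that link.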