Introduction
Today, we're going to talk about a payment system. Understanding it takes some real-world context, because without the proper financial license we cannot move money ourselves; the system relies on licensed third parties to execute the transaction. For example, when buying clothes online, you may be redirected to another platform to pay.
Payment Entities and the Role of Card Organizations
- Payment Entities: When a user buys something with a credit card, the user is the cardholder and the merchant is the payee. Each has a corresponding bank: the user's bank is the issuing bank (issuer), and the merchant's bank is the acquiring bank (acquirer).
- Card Organizations: In the middle sit card organizations such as Visa or MasterCard, also called card networks. They route payment requests between banks. Since users may hold cards from many different banks and merchants may have accounts at many different banks, connecting directly to each one would be a huge workload with a heavy compliance burden, and banks are often slow to update their technology.
The Role of PSP
- PSP Platform: To simplify integration, we usually go through a payment service provider (PSP). It exposes a simple API that we call, and it handles the payment requests on our behalf. In return, it charges a fee, typically in the range of 0.5% to 1% of the transaction amount.
The Core of the Payment System Design
- Recording and Internal Clearing: The payment system we need to design is mainly for recording payments and internal clearing. It's a system to record payment facts.
User Payment Process
- Payment Request: The merchant initiates a payment request.
- API Call: We call the external PSP's API to execute the payment.
- Record Keeping: After PSP execution, we record the transaction, whether it's a success or a failure.
- User Notification: Notify the corresponding user about the payment status.
- Daily Settlement: Perform daily account settlement.
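The steps above can be sketched as a single service. This is a minimal illustration rather than a production design: `FakePsp`, `PaymentFlow`, and the in-memory `records` and `notifications` lists are hypothetical stand-ins for a real PSP client, a database, and a notification channel.

```python
from dataclasses import dataclass, field


@dataclass
class FakePsp:
    """Hypothetical stand-in for an external PSP client."""
    decline_over_minor: int = 100_000

    def charge(self, order_id: str, amount_minor: int) -> bool:
        # A real PSP call is a network request; here large amounts are declined.
        return amount_minor <= self.decline_over_minor


@dataclass
class PaymentFlow:
    psp: FakePsp
    records: list = field(default_factory=list)        # stands in for the payment table
    notifications: list = field(default_factory=list)  # stands in for email or push

    def process_payment(self, order_id: str, user_id: str, amount_minor: int) -> str:
        # API call: ask the PSP to execute the payment.
        ok = self.psp.charge(order_id, amount_minor)
        status = "success" if ok else "failed"
        # Record keeping: store the outcome whether it succeeded or failed.
        self.records.append(
            {"order_id": order_id, "amount_minor": amount_minor, "status": status}
        )
        # User notification.
        self.notifications.append((user_id, f"Payment for {order_id}: {status}"))
        return status

    def daily_settlement_total(self) -> int:
        # Daily settlement, reduced here to summing the successful payments.
        return sum(r["amount_minor"] for r in self.records if r["status"] == "success")
```

Amounts are kept as integers in minor units (for example cents) to avoid floating-point rounding.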
Functional and Non-Functional Requirements
Functional Requirements
- Merchant Side: The merchant should be able to initiate a payment request and view the order's payment status.
- Customer Side: The customer should be able to make a payment.
Non-Functional Requirements
- Security: This is crucial, as anything involving money requires strong security measures.
- Reliability: The system should keep working correctly when components fail.
- Consistency: A payment should be charged exactly once and properly recorded.
- Correctness: The accounts should always be accurate.
- Scalability: Since we're calling an external API, the bottleneck is usually on the external side. Scalability can be addressed later if needed.
High-Level Diagram and Payment Integration
High-Level Diagram
After understanding the requirements, we can draw a high-level diagram.
Payment Integration
- Server-to-Server: The most direct way is for the merchant to collect the user's card number and CVV and send them straight to the bank or card network. However, this requires a payment institution license and a PCI DSS audit (which is complex and expensive), so it is mainly used by large enterprises. The advantage is more autonomy and savings on third-party channel fees.
- Payment Platform SDK: A common method is to use a PSP's SDK, such as Stripe or PayPal, or in China, WeChat Pay or Alipay. The PSP provides a JavaScript library that the merchant embeds in their website. The SDK sends the user's card number to the PSP and returns a token. The merchant then uses this token to call the PSP's API without handling the card number directly. However, the SDK can become a bottleneck, and client-side scripts carry security risks such as XSS injection or Magecart-style skimmers. To mitigate this, the merchant can apply browser security policies like Content Security Policy (CSP) and Subresource Integrity (SRI), or render the card fields in an iframe for stronger isolation. Modern SDKs often use a hybrid mode that places the sensitive fields in an iframe for better data protection.
- Hosted Redirect: The user clicks to place an order and is redirected to a payment platform like PayPal or Stripe. After paying, they are redirected back. This is the simplest method, with the lowest PCI burden and the easiest maintenance, but it offers the least freedom and may lower the user conversion rate.
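To make the CSP and SRI mitigations mentioned for the SDK approach more concrete, here is a minimal sketch. `sri_hash` computes an SRI integrity value the standard way (a base64-encoded SHA-384 digest of the script bytes), and `csp_header` builds a policy value that only allows scripts and frames from the merchant's own origin and one PSP origin. Both function names and the `https://js.psp.example` origin are hypothetical.

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes) -> str:
    """Return an SRI integrity value (sha384) for the given script content."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def csp_header(psp_origin: str) -> str:
    """Build a Content-Security-Policy value restricting scripts and frames."""
    # 'self' allows the merchant's own origin; psp_origin is an assumed SDK host.
    return f"script-src 'self' {psp_origin}; frame-src {psp_origin}"


integrity = sri_hash(b"console.log('sdk');")
policy = csp_header("https://js.psp.example")
```

SRI only helps when the script content is pinned to a fixed version, so whether it can be applied to a particular PSP's SDK is something to check in that PSP's documentation.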
Recording Transactions and Database Design
Database Design
- Single Table Consideration: Initially, we might think of using a single table to store all payment information. But for compliance reasons, payment records should be immutable, which is hard to guarantee when order, payment, and funds data all live in one frequently updated table.
- Three-Table Structure: We should split the data into three tables: business status (orders and products), transaction status (payment details and status), and funds status (money arrival and distribution). These tables have different read and write patterns.
Payment Table Structure
- Payment Table Fields: We need fields like charge ID, order ID, payment method, PSP, status (which can be represented by a state machine), amount, currency, creation and update times, and an idempotency key for uniqueness.
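As a rough sketch of those fields, the table could look like the following in SQLite. The column names, the status values, and the choice to store amounts as integers in minor units are assumptions made for illustration; the `UNIQUE` constraint on `idempotency_key` is what enforces uniqueness at the database level.

```python
import sqlite3

SCHEMA = """
CREATE TABLE payment (
    charge_id       TEXT PRIMARY KEY,
    order_id        TEXT NOT NULL,
    payment_method  TEXT NOT NULL,
    psp             TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'created'
                    CHECK (status IN ('created', 'started', 'processing', 'success', 'failed')),
    amount_minor    INTEGER NOT NULL CHECK (amount_minor > 0),  -- minor units, e.g. cents
    currency        TEXT NOT NULL,                               -- e.g. 'USD'
    idempotency_key TEXT NOT NULL UNIQUE,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

INSERT = (
    "INSERT INTO payment "
    "(charge_id, order_id, payment_method, psp, amount_minor, currency, idempotency_key) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(INSERT, ("ch_1", "order_1", "card", "example_psp", 2500, "USD", "key-abc"))
row = conn.execute(
    "SELECT status, amount_minor FROM payment WHERE charge_id = ?", ("ch_1",)
).fetchone()
```

A second insert that reuses `key-abc` raises `sqlite3.IntegrityError`, so a duplicate request cannot create a second payment row.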
Ensuring Consistency and Handling Workflow
Consistency
- Internal Consistency: This can be handled by a relational database with transactions.
- External Consistency: To avoid double charging or double spending, we use an idempotency key. When the user opens the checkout page, the server generates a key, and every subsequent step of that checkout carries the same key. The payment service de-duplicates requests based on this key.
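The de-duplication described above can be sketched as follows. `IdempotentPayments` is a hypothetical in-memory service; the `psp_calls` counter exists only to show that a repeated request with the same key does not trigger a second charge.

```python
import uuid


class IdempotentPayments:
    """Hypothetical in-memory payment service that de-duplicates by idempotency key."""

    def __init__(self):
        self._by_key = {}   # idempotency_key -> payment record
        self.psp_calls = 0  # how many times we would have charged the PSP

    def new_checkout_key(self) -> str:
        # Generated once, when the checkout page is opened.
        return str(uuid.uuid4())

    def pay(self, idempotency_key: str, order_id: str, amount_minor: int) -> dict:
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            # Retry or double click: return the earlier record, do not charge again.
            return existing
        self.psp_calls += 1
        record = {"order_id": order_id, "amount_minor": amount_minor, "status": "processing"}
        self._by_key[idempotency_key] = record
        return record
```

In a real service the check and the insert must be atomic, for example by relying on a unique constraint in the database rather than a dictionary lookup.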
Workflow and State Machine
- Payment Process: The payment table's status changes through a state machine. It moves from created to started when a valid request is received, then to processing when the PSP is called. Since the PSP call is asynchronous and can fail, we need a retry mechanism with four elements: a timeout, backoff (preferably exponential), jitter, and a dead letter queue for requests that exhaust their retries. The PSP must also de-duplicate using the same idempotency key, so that a retried request does not charge twice.
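A retry helper with those elements might look like the sketch below. It assumes that `operation` enforces its own timeout and raises an exception on failure, and it uses a plain list as a stand-in for a real dead letter queue. The function name, parameters, and default values are all illustrative.

```python
import random
import time


def call_with_retry(operation, *, max_attempts=4, base_delay=0.5, max_delay=8.0,
                    dead_letters=None, sleep=time.sleep, rng=random.random):
    """Retry `operation` with exponential backoff and full jitter.

    After `max_attempts` failures, the last error is appended to
    `dead_letters` (a stand-in for a dead letter queue) and None is returned.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay * 2^attempt, capped at max_delay.
                ceiling = min(max_delay, base_delay * (2 ** attempt))
                # Full jitter: sleep a random fraction of the ceiling.
                sleep(ceiling * rng())
    if dead_letters is not None:
        dead_letters.append(repr(last_error))
    return None
```

Jitter spreads retries from many clients over time, so they do not all hit the PSP at the same moment after an outage. The `sleep` and `rng` parameters are injectable mainly to make the helper easy to test.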
Completing the Payment Process
Callback and Polling
- Callback: Once the PSP reports success, we can inform the user. The PSP usually calls a callback URL that we register, to notify us when the payment has been processed.
- Polling: If the callback fails or never arrives, we fall back to polling the PSP for the status. Once the PSP finishes, the state machine enters success or failed, and we notify the user. If the payment succeeded, we write it to the ledger.
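The callback-then-polling logic can be sketched as one function. Here `callback_status` is the status delivered by the PSP's callback (or `None` if none arrived), and `poll_psp` is a hypothetical function that asks the PSP for the current status of a charge; neither is a real PSP API.

```python
TERMINAL = {"success", "failed"}


def resolve_payment(payment: dict, callback_status, poll_psp, ledger: list,
                    max_polls: int = 3) -> str:
    """Settle a 'processing' payment from a callback, or fall back to polling."""
    status = callback_status
    polls = 0
    # Poll only when the callback did not deliver a terminal status.
    while status not in TERMINAL and polls < max_polls:
        status = poll_psp(payment["charge_id"])
        polls += 1
    if status not in TERMINAL:
        return payment["status"]  # still processing; check again later
    payment["status"] = status
    if status == "success":
        # Only successful payments are written to the ledger.
        ledger.append(
            {"charge_id": payment["charge_id"], "amount_minor": payment["amount_minor"]}
        )
    return status
```

A production version would space the polls out over time instead of looping immediately, and would notify the user once a terminal status is reached.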
Ledger and Accounting
- Ledger Database: The ledger should be an append-only database. There are specialized databases, such as Amazon QLDB or Alibaba Cloud LedgerDB, that are optimized for financial records and have built-in features like auditing and encryption.
- Reconciliation: Banks send settlement files regularly. We run a daily reconciliation job that compares them against our ledger, and we trigger automatic compensation or a manual review when discrepancies appear.
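A daily reconciliation job can be sketched as a comparison between our ledger and the settlement file. The CSV layout used here (`charge_id` and `amount_minor` columns) is an assumption for illustration; real settlement files differ by bank and PSP.

```python
import csv
import io


def reconcile(ledger: dict, settlement_csv: str) -> dict:
    """Compare our ledger (charge_id -> amount in minor units) with a settlement file."""
    settled = {
        row["charge_id"]: int(row["amount_minor"])
        for row in csv.DictReader(io.StringIO(settlement_csv))
    }
    both = set(ledger) & set(settled)
    return {
        "missing_in_settlement": sorted(set(ledger) - set(settled)),
        "missing_in_ledger": sorted(set(settled) - set(ledger)),
        "amount_mismatch": sorted(k for k in both if ledger[k] != settled[k]),
    }


our_ledger = {"ch_1": 2500, "ch_2": 1000, "ch_3": 700}
bank_file = "charge_id,amount_minor\nch_1,2500\nch_2,900\nch_4,300\n"
report = reconcile(our_ledger, bank_file)
```

Each non-empty list in `report` would feed either an automatic compensation step or a manual review queue.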
Potential Issues and Solutions
State Machine Jump
- Problem: A payment can be marked as failed based on a polling result, and then the PSP reports success afterwards.
- Solutions: We can use a monotonic state machine that only moves forward, compare version numbers during updates, or rely on the end-of-day reconciliation to catch the discrepancy.
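The first two solutions can be combined in a small sketch: a transition table that only allows forward moves, plus a version check in the style of optimistic locking. The state names and the `apply_transition` helper are assumptions for illustration.

```python
# Allowed forward transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    "created": {"started"},
    "started": {"processing"},
    "processing": {"success", "failed"},
    "success": set(),
    "failed": set(),
}


def apply_transition(payment: dict, new_status: str, expected_version: int) -> bool:
    """Apply a status change only if it moves forward and the version matches."""
    if payment["version"] != expected_version:
        return False  # another writer updated the row first
    if new_status not in TRANSITIONS[payment["status"]]:
        return False  # backward or illegal jump, e.g. failed -> success
    payment["status"] = new_status
    payment["version"] += 1
    return True
```

A rejected update should be logged and flagged for reconciliation rather than silently dropped, since a late success from the PSP may mean the customer was actually charged.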
Alternative Design with Event Store
Event-Based System
- Concept: Instead of a traditional database, we can use an event store such as Kafka, Apache Pulsar, or an append-only database. Every change is written as an event, and we derive view tables from the events for faster querying.
- Advantages and Disadvantages: This approach offers both the correctness expected of a financial system and the scalability of an internet system, but it comes with high system complexity and high development and maintenance costs.
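The idea of writing every change as an event and deriving a view can be sketched in a few lines. `EventStore` is a hypothetical in-memory append-only log standing in for Kafka, Pulsar, or a ledger database, and `build_status_view` plays the role of the view table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentEvent:
    charge_id: str
    kind: str  # e.g. "created", "processing", "success", "failed"
    amount_minor: int = 0


class EventStore:
    """Hypothetical in-memory append-only log."""

    def __init__(self):
        self._events = []

    def append(self, event: PaymentEvent) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def replay(self) -> tuple:
        # Read-only snapshot; events are never updated or deleted.
        return tuple(self._events)


def build_status_view(events) -> dict:
    """Fold the event log into a charge_id -> latest status view for fast reads."""
    view = {}
    for event in events:
        view[event.charge_id] = event.kind
    return view
```

Because the view is derived, it can be dropped and rebuilt from the log at any time, which is where much of the auditability of this design comes from.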
Fraud Detection
Fraud Detection Methods
- Rule Engine: Simple, and easy to audit and keep compliant, but not suitable for complex fraud.
- Statistical and Machine Learning: Common, but limited by the available data and slow to react to new fraud patterns. It may require manual feature selection and data labeling.
- Unsupervised Learning: Used by large companies to address cold-start and zero-sample problems, but it has a noticeable false-positive rate.
- Infrastructure Support: Graph models or graph neural networks (GNNs) can capture complex fraud patterns, but they are complex to develop and maintain.
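Of the methods above, the rule engine is the easiest to illustrate. The sketch below evaluates a transaction against a list of named rules and returns the names of the rules that fired, which is what makes this approach easy to audit. The rule names, transaction fields, and thresholds are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]


# Thresholds are made up for illustration.
RULES = [
    Rule("large_amount", lambda tx: tx["amount_minor"] > 500_000),
    Rule("country_mismatch", lambda tx: tx["card_country"] != tx["ip_country"]),
    Rule("rapid_retries", lambda tx: tx["attempts_last_hour"] >= 5),
]


def evaluate(tx: dict, rules=RULES) -> list:
    """Return the names of every rule the transaction triggers."""
    return [rule.name for rule in rules if rule.matches(tx)]


suspicious = evaluate(
    {"amount_minor": 600_000, "card_country": "US", "ip_country": "DE", "attempts_last_hour": 1}
)
```

Returning rule names instead of a single score means every block or review decision can be traced back to an explicit, human-readable rule.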