When you send a messy transaction string like AMZN Mktp US*1A2B3C4D5 Amzn.com/bill WA to ParseTx, it passes through a five-stage pipeline before you receive a clean, structured result. Understanding this pipeline helps you predict response times, interpret status values, and build confidently on top of the API, because once a string has been resolved and cached, the same input returns the same output on every later request.

The Enrichment Pipeline

1. Input Normalization

Before anything else, ParseTx sanitizes your raw transaction string. This stage has no network calls and completes in under 1ms on the edge.
The normalizer performs the following operations in sequence:
  • Strips prompt injection vectors — scans for 12+ known jailbreak signatures (e.g. ignore all previous instructions, [INST]) and removes them entirely.
  • Removes numeric PII — any numeric sequence longer than 4 digits (card numbers, order IDs, routing numbers) is stripped before the string ever reaches the AI engine.
  • Cleans noisy substrings — trailing dates (12/03, 06-11), location fragments, and processor-appended order-ID patterns are removed.
  • Truncates to 64 characters — enforces a hard cap to prevent Denial-of-Wallet (DoW) attacks that pad inputs to maximise AI token consumption.
  • Uppercases the result — normalizes casing so that Netflix.com on demand and NETFLIX.COM ON DEMAND resolve to the same cache key.
If the resulting string contains no alphabetic characters at all — or has a vowel ratio below 8% with no spaces — it fails the entropy gate and is returned immediately as "status": "rejected". Do not retry these items.
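
The steps above can be sketched in a few lines of Python. This is only an illustration of the documented order of operations, not ParseTx's actual normalizer. The function names, the two injection patterns, and the regexes are assumptions, and the location-fragment and order-ID cleanup is omitted because its rules are not described here. The sketch also assumes the vowel ratio is measured against alphabetic characters only.

```python
import re

# Hypothetical signatures: only the two examples named in the text.
INJECTION_PATTERNS = [
    re.compile(r"ignore all previous instructions", re.IGNORECASE),
    re.compile(r"\[/?INST\]", re.IGNORECASE),
]
LONG_DIGITS = re.compile(r"\d{5,}")  # numeric runs longer than 4 digits
TRAILING_DATE = re.compile(r"\s*\b\d{1,2}[/-]\d{1,2}\s*$")  # e.g. 12/03, 06-11
MAX_LEN = 64
MIN_VOWEL_RATIO = 0.08


def normalize(raw: str) -> str:
    """Apply the documented steps in order (noise cleanup is partial)."""
    text = raw
    for pattern in INJECTION_PATTERNS:
        text = pattern.sub(" ", text)        # strip injection vectors
    text = LONG_DIGITS.sub(" ", text)        # remove numeric PII
    text = TRAILING_DATE.sub("", text)       # drop a trailing date
    text = re.sub(r"\s+", " ", text).strip() # collapse leftover whitespace
    return text[:MAX_LEN].rstrip().upper()   # hard cap, then uppercase


def passes_entropy_gate(normalized: str) -> bool:
    """Return False for strings the text describes as rejected."""
    letters = [c for c in normalized if c.isalpha()]
    if not letters:
        return False
    vowel_ratio = sum(c in "AEIOU" for c in letters) / len(letters)
    if vowel_ratio < MIN_VOWEL_RATIO and " " not in normalized:
        return False
    return True
```

With this sketch, Netflix.com on demand and NETFLIX.COM ON DEMAND both normalize to NETFLIX.COM ON DEMAND, and a digits-only input normalizes to an empty string that fails the gate.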
2. Cache Lookup

The normalized string is looked up in ParseTx’s global edge cache, distributed worldwide for low-latency access from any region.
  • Cache hit → the stored result is returned immediately. No AI call is made. Response time is typically under 50ms.
  • Cache miss → the request proceeds to the AI enrichment stage.
Cache hits always return "source": "cache" and "confidence": 1 in the response. The cache is pre-warmed with the top 50,000 global merchants, so the most common transaction strings resolve instantly on first request.
3. AI Enrichment

On a cache miss, the normalized string is forwarded to ParseTx’s AI inference engine. A strict structured schema is enforced — categories, booleans, and confidence scores are all validated before the result is used.
  • Results with confidence ≥ 0.80 are written back to the edge cache for 30 days. Results below this threshold are returned to you but not cached, so the next request for the same string will try again.
Expect AI enrichment to take 1.5–3.5 seconds during normal operation. If the AI engine is slow or unavailable, the item returns "status": "retry" — safe to resubmit.
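
To make the interaction between stages 2 and 3 concrete, here is a minimal sketch of the lookup and write-back rule. It uses a toy in-memory cache in place of the global edge cache. The class TTLCache, the function resolve, and the enrich callback are hypothetical names, and treating a TimeoutError as the "engine is slow or unavailable" signal is an assumption. The 0.80 threshold, the 30-day lifetime, and the source and confidence values come from the text.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days
CACHE_MIN_CONFIDENCE = 0.80


class TTLCache:
    """Toy in-memory stand-in for the edge cache (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a miss
            return None
        return value

    def put(self, key: str, value: dict, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def resolve(key: str, cache: TTLCache, enrich: Callable[[str], dict]) -> dict:
    """Cache hit: return stored result. Miss: enrich, cache if confident."""
    cached = cache.get(key)
    if cached is not None:
        return {**cached, "status": "complete", "source": "cache", "confidence": 1}
    try:
        result = enrich(key)  # hypothetical call into the AI engine
    except TimeoutError:
        return {"status": "retry"}  # safe to resubmit
    if result["confidence"] >= CACHE_MIN_CONFIDENCE:
        cache.put(key, result, CACHE_TTL_SECONDS)
    return {**result, "status": "complete", "source": "llm"}
```

One consequence of this rule is visible in the sketch: a low-confidence result is never stored, so a repeat request for the same string reaches the enrichment call again.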
4. Deterministic Output

Once a merchant string is resolved and cached, ParseTx guarantees deterministic output: the same input always returns the same merchant, category, mcc_code, and all other fields. There is no LLM variance on repeat requests.
This is the property that makes ParseTx suitable for production bookkeeping, analytics, and reporting. If RECURRING PMT AUTHORIZED ON 05/01 NETFLIX.COM maps to "category": "Entertainment" today, it maps to "category": "Entertainment" six months from now.
Unlike calling an LLM directly, ParseTx’s canonical cache means identical inputs are resolved once and fanned out consistently to every caller. You never pay for the same merchant string twice, and you never get a different answer on a subsequent call.
5. Fault-Isolated Batch Response

For batch requests, every item in your array is processed inside its own isolated try/catch block. A single item that times out, triggers a rate limit, or encounters a transient upstream error does not fail the rest of the batch.
  • Items that succeed return "status": "complete".
  • Items that fail transiently return "status": "retry" — you can resubmit them.
  • Items that are garbage return "status": "rejected" — do not retry these.
The overall HTTP response is always 200 OK. You must inspect each item’s status field to determine whether it needs action. If any item in the batch returns retry, the envelope-level status is "partial" instead of "complete".
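
Because the HTTP status is always 200, client code has to sort items by their own status. The sketch below shows one way to do that. The response shape is an assumption: the key "results" for the item array and the sample payload are invented for illustration, and only the status values and the envelope-level "partial" come from the text.

```python
from typing import Dict, List


def triage_batch(response: dict) -> Dict[str, List[int]]:
    """Group item positions by per-item status."""
    buckets: Dict[str, List[int]] = {"complete": [], "retry": [], "rejected": []}
    # "results" is a hypothetical field name for the item array.
    for index, item in enumerate(response["results"]):
        buckets.setdefault(item["status"], []).append(index)
    return buckets


# Made-up sample payload, not real API output.
sample_response = {
    "status": "partial",
    "results": [
        {"status": "complete", "merchant": "Netflix"},
        {"status": "retry"},
        {"status": "rejected"},
    ],
}

buckets = triage_batch(sample_response)
# Positions in buckets["retry"] can be resubmitted;
# positions in buckets["rejected"] should not be retried.
```

Keeping positions rather than items makes it easy to map each retry back to the original input string you sent at that index.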

Pipeline Flow

Your input string
        │
        ▼
┌─────────────────────────────────────────┐
│  Stage 1: Normalization                 │
│  Strip PII, noise, dates → uppercase    │
│  Entropy gate → reject garbage          │
└───────────────────┬─────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  Stage 2: Cache Lookup                  │
└──────────┬──────────────────────────────┘
           │
     ┌─────┴──────┐
     │            │
  HIT ✓        MISS ✗
     │            │
     │            ▼
     │   ┌─────────────────────────┐
     │   │  Stage 3: AI Enrichment │
     │   │  Strict schema enforced │
     │   │  Cache result (30 days) │
     │   └────────────┬────────────┘
     │                │
     └────────┬───────┘
              │
              ▼
     Structured JSON result
     (merchant, domain, category,
      mcc_code, is_subscription,
      confidence, status, source)
Batch Deduplication

If you send the same transaction string multiple times within a single batch request, ParseTx resolves it once and fans the result back to each position in your array. You are billed once per unique string, not once per array element. Deduplication happens after normalization, so NETFLIX.COM ON DEMAND and netflix.com on demand are treated as the same string.
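
Deduplication of this kind can be sketched as "normalize, resolve each unique key once, then fan out by position". The function below is a hypothetical illustration, not ParseTx's implementation. The normalize and resolve_one callbacks are stand-ins for stages 1 to 3, and counting billable units as the number of unique keys follows the billing rule stated above.

```python
from typing import Callable, Dict, List, Tuple


def resolve_batch(
    inputs: List[str],
    normalize: Callable[[str], str],
    resolve_one: Callable[[str], dict],
) -> Tuple[List[dict], int]:
    """Resolve each unique normalized string once and fan results out."""
    keys = [normalize(raw) for raw in inputs]
    unique_results: Dict[str, dict] = {}
    for key in keys:
        if key not in unique_results:
            unique_results[key] = resolve_one(key)  # one call per unique key
    results = [unique_results[key] for key in keys]  # one entry per position
    return results, len(unique_results)             # billable units
```

For example, a batch containing NETFLIX.COM ON DEMAND, netflix.com on demand, and one other merchant string would return three results but trigger only two resolutions under this sketch.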

Response Times at a Glance

Cache Hit

Under 50ms
Normalized string found in the global edge cache. No AI call. Confidence is always 1. Source is "cache".

AI Enrichment

1.5 – 3.5 seconds
Cache miss forwarded to the AI inference engine. The result is cached for 30 days after the first call if its confidence is at least 0.80. Source is "llm".
Your first batch of transactions will be the slowest, because cache misses trigger AI inference. Subsequent batches containing the same merchants typically resolve from cache in under 50ms. The more you use ParseTx, the faster it gets.