<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Vectoring AI</title>
<link>https://vectoringai.com/pages/ai-architect.html</link>
<atom:link href="https://vectoringai.com/pages/ai-architect.xml" rel="self" type="application/rss+xml"/>
<description>AI software architect interview questions covering requirements analysis, system design, architecture patterns, tech stack selection, cost optimization, security, scalability, latency, quality, risk management, roadmap strategy, and operational excellence.</description>
<generator>quarto-1.9.36</generator>
<lastBuildDate>Thu, 21 May 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>AI Architect Interview QA - 1</title>
  <dc:creator>Vectoring AI</dc:creator>
  <link>https://vectoringai.com/posts/ai-architect/AI-Architect-Interview-QA-1.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>This is <strong>Part 1</strong> of our AI Architect Interview QA series. It focuses on the <strong>strategic and operational skills</strong> expected of an AI software architect: beyond technical depth, the ability to understand business needs, map system landscapes, make architecture trade-offs, manage risk, control costs, ensure quality, and drive delivery across development phases.</p>
<p>An AI architect bridges business stakeholders, data scientists, ML engineers, platform teams, and security teams, and makes decisions that shape systems for years. These questions test that breadth.</p>
<blockquote class="blockquote">
<p>For related technical content, see <a href="../../posts/system-design/System-Design-Interview-QA-1.html">System Design Interview QA - 1</a>, <a href="../../posts/aiops-interview/MLOps-Interview-QA-1.html">MLOps Interview QA - 1</a>, and <a href="../../posts/design-pattern/Design-Pattern-Interview-QA-1.html">Design Pattern Interview QA - 1</a>.</p>
</blockquote>
<hr>
</section>
<section id="q1-how-do-you-approach-requirements-cartography-for-an-ai-system" class="level2">
<h2 class="anchored" data-anchor-id="q1-how-do-you-approach-requirements-cartography-for-an-ai-system">Q1: How Do You Approach Requirements Cartography for an AI System?</h2>
<p><strong>Answer:</strong></p>
<p><strong>Requirements cartography</strong> is the process of mapping the full landscape of needs, constraints, and stakeholders before any design begins. Unlike simple requirements gathering, cartography produces a structured view of the problem space — surfacing hidden dependencies, conflicting priorities, and non-functional requirements that dominate architecture decisions.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph Discovery["Requirements Cartography"]
        STAKEHOLDERS["Stakeholder Mapping&lt;br/&gt;(who needs what)"]
        BUSINESS["Business Objectives&lt;br/&gt;(OKRs, KPIs, value)"]
        FUNCTIONAL["Functional Requirements&lt;br/&gt;(what the system does)"]
        NFR["Non-Functional Requirements&lt;br/&gt;(how it performs)"]
        CONSTRAINTS["Constraints&lt;br/&gt;(budget, timeline, compliance)"]
        DEPENDENCIES["Dependencies&lt;br/&gt;(data, teams, systems)"]
    end

    subgraph Outputs["Cartography Outputs"]
        CONTEXT["Context Diagram&lt;br/&gt;(system boundaries)"]
        QUALITY["Quality Attribute Scenarios&lt;br/&gt;(measurable NFRs)"]
        RISK_MAP["Risk Register&lt;br/&gt;(identified unknowns)"]
        TRADEOFF["Trade-off Matrix&lt;br/&gt;(conflicting priorities)"]
    end

    Discovery --&gt; Outputs

    style Discovery fill:#6cc3d5,stroke:#333,color:#fff
    style Outputs fill:#56cc9d,stroke:#333,color:#fff
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="ai-specific-requirements-dimensions" class="level3">
<h3 class="anchored" data-anchor-id="ai-specific-requirements-dimensions">AI-Specific Requirements Dimensions</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 38%">
<col style="width: 36%">
</colgroup>
<thead>
<tr class="header">
<th>Dimension</th>
<th>Questions to Ask</th>
<th>Why It Matters</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Data availability</strong></td>
<td>What data exists? Quality? Volume? Freshness? Access rights?</td>
<td>Models are only as good as training data</td>
</tr>
<tr class="even">
<td><strong>Latency tolerance</strong></td>
<td>Real-time (&lt;100ms)? Near-real-time (&lt;5s)? Batch (minutes/hours)?</td>
<td>Drives inference architecture completely</td>
</tr>
<tr class="odd">
<td><strong>Accuracy requirements</strong></td>
<td>What is the acceptable error rate? What do false positives cost relative to false negatives?</td>
<td>Determines model complexity and validation</td>
</tr>
<tr class="even">
<td><strong>Explainability</strong></td>
<td>Must decisions be interpretable? Regulatory or trust reasons?</td>
<td>May rule out black-box models</td>
</tr>
<tr class="odd">
<td><strong>Volume &amp; throughput</strong></td>
<td>Requests/sec at peak? Data volume for training?</td>
<td>Sizing, scaling strategy</td>
</tr>
<tr class="even">
<td><strong>Compliance</strong></td>
<td>GDPR, HIPAA, SOC2? Data residency? Audit trail?</td>
<td>Constrains cloud choices, data flows</td>
</tr>
<tr class="odd">
<td><strong>Budget</strong></td>
<td>CapEx vs OpEx? GPU budget? Ongoing inference cost?</td>
<td>Bounds tech stack and model size</td>
</tr>
<tr class="even">
<td><strong>Team capability</strong></td>
<td>Existing skills? Hiring timeline?</td>
<td>Determines build vs buy vs adapt</td>
</tr>
<tr class="odd">
<td><strong>Time-to-market</strong></td>
<td>MVP in weeks? Full system in months?</td>
<td>Phased delivery strategy</td>
</tr>
<tr class="even">
<td><strong>Evolution</strong></td>
<td>How will requirements change? Multi-model future?</td>
<td>Extensibility of architecture</td>
</tr>
</tbody>
</table>
</section>
<section id="requirements-cartography-framework" class="level3">
<h3 class="anchored" data-anchor-id="requirements-cartography-framework">Requirements Cartography Framework</h3>
<pre><code>1. STAKEHOLDER MAP
   ├── Business sponsors (value, ROI, timeline)
   ├── End users (UX, latency, accuracy)
   ├── Data scientists (experimentation freedom, tooling)
   ├── ML engineers (deployment, monitoring, ops)
   ├── Platform/infra team (cost, security, compliance)
   ├── Security &amp; compliance (data handling, audit)
   └── Support/operations (observability, incident response)

2. QUALITY ATTRIBUTE SCENARIOS (measurable NFRs)
   Format: [Source] [Stimulus] → [Response] [Measure]
   Example: "Under 1000 concurrent users, the recommendation
             API responds within 200ms at p99"

3. CONSTRAINT REGISTER
   ├── Must use existing Kubernetes cluster
   ├── Budget: $50K/month cloud spend cap
   ├── Timeline: MVP in 8 weeks
   ├── Data: Cannot leave EU (GDPR)
   └── Team: 2 ML engineers, 3 backend engineers

4. DEPENDENCY MAP
   ├── Upstream: Customer data lake (daily refresh)
   ├── Upstream: Real-time event stream (Kafka)
   ├── Downstream: Mobile app (REST API consumer)
   ├── Downstream: Analytics dashboard (Looker)
   └── Shared: Auth service, API gateway</code></pre>
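<p>A quality attribute scenario becomes useful once it can be checked automatically. The following is a minimal sketch, assuming a nearest-rank percentile and made-up load-test samples; <code>QualityScenario</code>, <code>percentile_ms</code>, and <code>is_met</code> are illustrative names, not part of any framework.</p>

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class QualityScenario:
    """One measurable NFR: stimulus, percentile, and a numeric limit."""
    name: str
    stimulus: str        # e.g. "1000 concurrent users"
    percentile: float    # e.g. 99 for p99
    limit_ms: float      # e.g. 200


def percentile_ms(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def is_met(scenario, samples):
    """True when the observed percentile is within the scenario limit."""
    return scenario.limit_ms >= percentile_ms(samples, scenario.percentile)


scenario = QualityScenario(
    name="recommendation-api-latency",
    stimulus="1000 concurrent users",
    percentile=99,
    limit_ms=200,
)
# Hypothetical load-test samples in milliseconds (illustrative only).
samples = [120, 135, 150, 160, 180, 190, 195, 198, 199, 240]
print(is_met(scenario, samples))
```

<p>With these samples the single 240ms outlier determines the p99, so the scenario fails even though most requests are fast. That is the kind of tail behavior an averaged target would hide.</p>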
</section>
<section id="common-mistakes-in-ai-requirements" class="level3">
<h3 class="anchored" data-anchor-id="common-mistakes-in-ai-requirements">Common Mistakes in AI Requirements</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 26%">
<col style="width: 38%">
<col style="width: 35%">
</colgroup>
<thead>
<tr class="header">
<th>Mistake</th>
<th>Consequence</th>
<th>Prevention</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Skipping NFR definition</td>
<td>System works in demo, fails at scale</td>
<td>Explicit quality attribute scenarios</td>
</tr>
<tr class="even">
<td>Assuming data quality</td>
<td>Model underperforms in production</td>
<td>Data profiling + validation early</td>
</tr>
<tr class="odd">
<td>Ignoring feedback loops</td>
<td>Predictions shape future training data and reinforce model bias</td>
<td>Map causal effects explicitly</td>
</tr>
<tr class="even">
<td>No cost ceiling</td>
<td>GPU bills spiral out of control</td>
<td>Budget as a first-class constraint</td>
</tr>
<tr class="odd">
<td>Single-point accuracy target</td>
<td>“95% accuracy” hides failures on rare classes and edge cases</td>
<td>Define per-class metrics, edge cases</td>
</tr>
</tbody>
</table>
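<p>The single-point accuracy mistake is easy to demonstrate. This sketch uses invented labels for an imbalanced fraud dataset and a deliberately useless model that always predicts the majority class; <code>per_class_recall</code> is an illustrative helper.</p>

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall for each label: correct predictions / actual occurrences."""
    actual = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / actual[label] for label in actual}


# Illustrative data: 95 legitimate transactions, 5 fraudulent ones.
y_true = ["ok"] * 95 + ["fraud"] * 5
# A model that always predicts "ok" (hypothetical worst case).
y_pred = ["ok"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
print(accuracy)  # overall accuracy
print(recall)    # recall per class
```

<p>The model reaches 95% accuracy while catching no fraud at all, which is why the requirement should name per-class targets rather than one aggregate number.</p>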
<hr>
</section>
</section>
<section id="q2-how-do-you-design-the-architecture-of-an-aiml-system" class="level2">
<h2 class="anchored" data-anchor-id="q2-how-do-you-design-the-architecture-of-an-aiml-system">Q2: How Do You Design the Architecture of an AI/ML System?</h2>
<p><strong>Answer:</strong></p>
<p>AI system architecture follows a <strong>layered approach</strong> where each layer has distinct concerns — data ingestion, feature engineering, model training, serving, monitoring, and orchestration. The architect must decide boundaries, communication patterns, and the degree of coupling between ML and application logic.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph DataLayer["Data Layer"]
        SOURCES["Data Sources&lt;br/&gt;(DBs, APIs, streams, lakes)"]
        INGEST["Ingestion&lt;br/&gt;(batch + streaming)"]
        STORE["Feature Store&lt;br/&gt;(online + offline)"]
    end

    subgraph MLLayer["ML Layer"]
        TRAIN["Training Pipeline&lt;br/&gt;(experimentation → production)"]
        REGISTRY["Model Registry&lt;br/&gt;(versioning, metadata)"]
        EVAL["Evaluation&lt;br/&gt;(offline metrics, A/B tests)"]
    end

    subgraph ServingLayer["Serving Layer"]
        INFERENCE["Inference Service&lt;br/&gt;(real-time / batch / streaming)"]
        GATEWAY["API Gateway&lt;br/&gt;(routing, rate limiting)"]
        CACHE_S["Prediction Cache&lt;br/&gt;(repeated queries)"]
    end

    subgraph ObsLayer["Observability Layer"]
        METRICS["Metrics&lt;br/&gt;(latency, throughput, errors)"]
        DRIFT["Drift Detection&lt;br/&gt;(data + model quality)"]
        ALERTS["Alerting + Automation&lt;br/&gt;(retrain triggers)"]
    end

    subgraph OrcLayer["Orchestration Layer"]
        PIPELINE["Pipeline Orchestrator&lt;br/&gt;(Airflow, Kubeflow, SageMaker)"]
        CICD["CI/CD&lt;br/&gt;(model + app deployment)"]
        IaC["Infrastructure as Code&lt;br/&gt;(Terraform, Pulumi)"]
    end

    SOURCES --&gt; INGEST --&gt; STORE
    STORE --&gt; TRAIN --&gt; REGISTRY --&gt; INFERENCE
    INFERENCE --&gt; GATEWAY
    INFERENCE --&gt; DRIFT
    PIPELINE --&gt; TRAIN
    CICD --&gt; INFERENCE

    style DataLayer fill:#6cc3d5,stroke:#333,color:#fff
    style MLLayer fill:#56cc9d,stroke:#333,color:#fff
    style ServingLayer fill:#ffce67,stroke:#333
    style ObsLayer fill:#ff6b6b,stroke:#333,color:#fff
    style OrcLayer fill:#c3aed6,stroke:#333
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="architecture-patterns-for-ai-systems" class="level3">
<h3 class="anchored" data-anchor-id="architecture-patterns-for-ai-systems">Architecture Patterns for AI Systems</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 27%">
<col style="width: 39%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Pattern</th>
<th>When to Use</th>
<th>Trade-offs</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Monolithic ML</strong></td>
<td>Single model, simple pipeline, small team</td>
<td>Fast to build; hard to scale independently</td>
</tr>
<tr class="even">
<td><strong>Microservice per model</strong></td>
<td>Multiple models, independent scaling, separate teams</td>
<td>Operational complexity; network overhead</td>
</tr>
<tr class="odd">
<td><strong>Event-driven ML</strong></td>
<td>Streaming predictions, real-time features</td>
<td>Complex debugging; eventual consistency</td>
</tr>
<tr class="even">
<td><strong>Lambda architecture</strong></td>
<td>Need both batch accuracy and real-time speed</td>
<td>Dual pipeline maintenance</td>
</tr>
<tr class="odd">
<td><strong>Feature platform</strong></td>
<td>Multiple models share features, team scale</td>
<td>Upfront investment; governance overhead</td>
</tr>
<tr class="even">
<td><strong>Gateway + routing</strong></td>
<td>A/B testing, canary, multi-model serving</td>
<td>Added latency; routing logic</td>
</tr>
<tr class="odd">
<td><strong>Sidecar pattern</strong></td>
<td>Inference co-located with the app (same pod or device), including edge deployments</td>
<td>Model size limits; update complexity</td>
</tr>
<tr class="even">
<td><strong>RAG architecture</strong></td>
<td>LLM + domain knowledge, dynamic content</td>
<td>Retrieval quality; context window limits</td>
</tr>
</tbody>
</table>
</section>
<section id="key-architecture-decisions" class="level3">
<h3 class="anchored" data-anchor-id="key-architecture-decisions">Key Architecture Decisions</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 26%">
<col style="width: 23%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="header">
<th>Decision</th>
<th>Options</th>
<th>Selection Criteria</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Sync vs Async inference</strong></td>
<td>REST API / gRPC vs Message queue / Batch</td>
<td>Latency requirement, throughput pattern</td>
</tr>
<tr class="even">
<td><strong>Model co-location</strong></td>
<td>Embedded in app vs Separate service</td>
<td>Deployment independence, resource isolation</td>
</tr>
<tr class="odd">
<td><strong>Feature computation</strong></td>
<td>Pre-computed (store) vs On-the-fly (runtime)</td>
<td>Freshness requirement, computation cost</td>
</tr>
<tr class="even">
<td><strong>Training location</strong></td>
<td>Cloud managed vs Self-hosted K8s</td>
<td>GPU cost, compliance, team skills</td>
</tr>
<tr class="odd">
<td><strong>State management</strong></td>
<td>Stateless inference vs Stateful sessions</td>
<td>Conversation memory, personalization</td>
</tr>
<tr class="even">
<td><strong>Multi-model orchestration</strong></td>
<td>Pipeline (serial) vs Ensemble (parallel) vs Cascade</td>
<td>Latency budget, accuracy need</td>
</tr>
<tr class="odd">
<td><strong>Data flow</strong></td>
<td>Push (producer-driven) vs Pull (consumer-driven)</td>
<td>Freshness, coupling, backpressure</td>
</tr>
</tbody>
</table>
</section>
<section id="architecture-documentation-c4-model-approach" class="level3">
<h3 class="anchored" data-anchor-id="architecture-documentation-c4-model-approach">Architecture Documentation (C4 Model Approach)</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 29%">
<col style="width: 29%">
<col style="width: 41%">
</colgroup>
<thead>
<tr class="header">
<th>Level</th>
<th>Shows</th>
<th>Audience</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Context</strong> (L1)</td>
<td>System + external actors + neighboring systems</td>
<td>Business stakeholders</td>
</tr>
<tr class="even">
<td><strong>Container</strong> (L2)</td>
<td>Major deployable units (services, DBs, queues)</td>
<td>Tech leads, architects</td>
</tr>
<tr class="odd">
<td><strong>Component</strong> (L3)</td>
<td>Internal structure of a container (modules, classes)</td>
<td>Development team</td>
</tr>
<tr class="even">
<td><strong>Code</strong> (L4)</td>
<td>Implementation details (only for critical paths)</td>
<td>Individual developers</td>
</tr>
</tbody>
</table>
<hr>
</section>
</section>
<section id="q3-how-do-you-evaluate-and-select-a-tech-stack-for-aiml-systems" class="level2">
<h2 class="anchored" data-anchor-id="q3-how-do-you-evaluate-and-select-a-tech-stack-for-aiml-systems">Q3: How Do You Evaluate and Select a Tech Stack for AI/ML Systems?</h2>
<p><strong>Answer:</strong></p>
<p>Tech stack selection for AI systems is a high-stakes decision with long-lived consequences. The architect must evaluate options against requirements while managing trade-offs between maturity, cost, team skills, vendor lock-in, and ecosystem integration.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph Criteria["Selection Criteria"]
        REQ["Requirements Fit&lt;br/&gt;(functional + NFR)"]
        TEAM["Team Skills&lt;br/&gt;(current + hirable)"]
        COST_C["Total Cost of Ownership&lt;br/&gt;(build + run + maintain)"]
        MATURITY["Maturity &amp; Support&lt;br/&gt;(community, docs, enterprise)"]
        LOCKIN["Vendor Lock-in Risk&lt;br/&gt;(portability, exit cost)"]
        ECOSYSTEM["Ecosystem Integration&lt;br/&gt;(existing infra, tools)"]
    end

    subgraph Decision["Decision Framework"]
        WEIGHT["Weight criteria&lt;br/&gt;per project context"]
        COMPARE["Compare options&lt;br/&gt;(scored matrix)"]
        PROTOTYPE["Prototype critical path&lt;br/&gt;(validate assumptions)"]
        DOC["Document decision&lt;br/&gt;(ADR - Architecture Decision Record)"]
    end

    Criteria --&gt; Decision

    style Criteria fill:#6cc3d5,stroke:#333,color:#fff
    style Decision fill:#56cc9d,stroke:#333,color:#fff
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="ai-tech-stack-layers" class="level3">
<h3 class="anchored" data-anchor-id="ai-tech-stack-layers">AI Tech Stack Layers</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 21%">
<col style="width: 27%">
<col style="width: 51%">
</colgroup>
<thead>
<tr class="header">
<th>Layer</th>
<th>Options</th>
<th>Decision Drivers</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Cloud platform</strong></td>
<td>AWS, GCP, Azure, multi-cloud, on-prem</td>
<td>Existing contracts, compliance, GPU availability, team skills</td>
</tr>
<tr class="even">
<td><strong>Orchestration</strong></td>
<td>Airflow, Kubeflow, SageMaker Pipelines, Vertex AI</td>
<td>K8s expertise, cloud lock-in tolerance, pipeline complexity</td>
</tr>
<tr class="odd">
<td><strong>Training</strong></td>
<td>SageMaker, Vertex AI, Azure ML, self-managed K8s + Ray</td>
<td>GPU cost, distributed training needs, experiment scale</td>
</tr>
<tr class="even">
<td><strong>Serving</strong></td>
<td>SageMaker Endpoints, KServe, Seldon, BentoML, vLLM</td>
<td>Latency, multi-model, auto-scaling, model size</td>
</tr>
<tr class="odd">
<td><strong>Feature store</strong></td>
<td>Feast, Tecton, SageMaker FS, Vertex AI FS</td>
<td>Online/offline needs, team size, freshness</td>
</tr>
<tr class="even">
<td><strong>Experiment tracking</strong></td>
<td>MLflow, W&amp;B, Neptune, SageMaker Experiments</td>
<td>Collaboration needs, cost, self-hosted vs SaaS</td>
</tr>
<tr class="odd">
<td><strong>Data platform</strong></td>
<td>Databricks, Snowflake, BigQuery, Redshift</td>
<td>Data volume, SQL/Spark preference, existing investment</td>
</tr>
<tr class="even">
<td><strong>Model format</strong></td>
<td>ONNX, TorchScript, SavedModel, GGUF</td>
<td>Framework diversity, edge deployment, optimization</td>
</tr>
<tr class="odd">
<td><strong>Monitoring</strong></td>
<td>Evidently, WhyLabs, SageMaker Monitor, custom</td>
<td>Drift types, alerting integration, cost</td>
</tr>
<tr class="even">
<td><strong>LLM infra</strong></td>
<td>OpenAI API, Anthropic, self-hosted (vLLM, TGI)</td>
<td>Data privacy, latency, cost, fine-tuning needs</td>
</tr>
</tbody>
</table>
</section>
<section id="evaluation-matrix-template" class="level3">
<h3 class="anchored" data-anchor-id="evaluation-matrix-template">Evaluation Matrix Template</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 24%">
<col style="width: 23%">
<col style="width: 31%">
<col style="width: 20%">
</colgroup>
<thead>
<tr class="header">
<th>Criterion (weight)</th>
<th>Option A: Managed</th>
<th>Option B: Open-source K8s</th>
<th>Option C: Hybrid</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Requirements fit</strong> (25%)</td>
<td>9/10</td>
<td>8/10</td>
<td>9/10</td>
</tr>
<tr class="even">
<td><strong>Team skills</strong> (20%)</td>
<td>8/10 (lower learning curve)</td>
<td>5/10 (K8s expertise needed)</td>
<td>7/10</td>
</tr>
<tr class="odd">
<td><strong>TCO (3-year)</strong> (20%)</td>
<td>6/10 (higher at scale)</td>
<td>8/10 (lower unit cost)</td>
<td>7/10</td>
</tr>
<tr class="even">
<td><strong>Lock-in risk</strong> (15%)</td>
<td>4/10 (high lock-in)</td>
<td>9/10 (portable)</td>
<td>7/10</td>
</tr>
<tr class="odd">
<td><strong>Maturity</strong> (10%)</td>
<td>9/10 (enterprise support)</td>
<td>7/10 (community)</td>
<td>8/10</td>
</tr>
<tr class="even">
<td><strong>Ecosystem</strong> (10%)</td>
<td>8/10 (cloud-native)</td>
<td>7/10 (integration effort)</td>
<td>8/10</td>
</tr>
<tr class="odd">
<td><strong>Weighted score</strong></td>
<td><strong>7.35</strong></td>
<td><strong>7.35</strong></td>
<td><strong>7.70</strong></td>
</tr>
</tbody>
</table>
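<p>The weighted score is the sum of each criterion's weight times its score. A small sketch that recomputes the totals from the rows above keeps the matrix honest when scores change; the dictionary keys are illustrative names chosen for this example.</p>

```python
# Weights and 1-10 scores copied from the matrix rows above.
WEIGHTS = {
    "requirements_fit": 0.25,
    "team_skills": 0.20,
    "tco_3yr": 0.20,
    "lock_in_risk": 0.15,
    "maturity": 0.10,
    "ecosystem": 0.10,
}
SCORES = {
    "managed": {"requirements_fit": 9, "team_skills": 8, "tco_3yr": 6,
                "lock_in_risk": 4, "maturity": 9, "ecosystem": 8},
    "open_source_k8s": {"requirements_fit": 8, "team_skills": 5, "tco_3yr": 8,
                        "lock_in_risk": 9, "maturity": 7, "ecosystem": 7},
    "hybrid": {"requirements_fit": 9, "team_skills": 7, "tco_3yr": 7,
               "lock_in_risk": 7, "maturity": 8, "ecosystem": 8},
}


def weighted_score(scores, weights):
    """Sum of weight * score, rounded to two decimals."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return round(sum(weights[c] * scores[c] for c in weights), 2)


totals = {name: weighted_score(s, WEIGHTS) for name, s in SCORES.items()}
for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(name, total)
```

<p>Because the totals are close, re-running the calculation with slightly different weights is a quick sensitivity check before committing to an option.</p>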
</section>
<section id="architecture-decision-record-adr-template" class="level3">
<h3 class="anchored" data-anchor-id="architecture-decision-record-adr-template">Architecture Decision Record (ADR) Template</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;"># ADR-003: Model Serving Infrastructure</span></span>
<span id="cb2-2"></span>
<span id="cb2-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Status: Accepted (2026-05-21)</span></span>
<span id="cb2-4"></span>
<span id="cb2-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Context</span></span>
<span id="cb2-6">We need to serve 5 ML models (recommendation, fraud, pricing, </span>
<span id="cb2-7">search ranking, personalization) with p99 latency &lt; 200ms at </span>
<span id="cb2-8">10K requests/sec peak. Team has Kubernetes expertise but limited </span>
<span id="cb2-9">cloud-managed ML experience.</span>
<span id="cb2-10"></span>
<span id="cb2-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Decision</span></span>
<span id="cb2-12">Use KServe on existing EKS cluster with Istio service mesh.</span>
<span id="cb2-13"></span>
<span id="cb2-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Rationale</span></span>
<span id="cb2-15"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Leverages existing K8s expertise and infrastructure</span>
<span id="cb2-16"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Supports multiple frameworks (sklearn, PyTorch, TensorFlow)</span>
<span id="cb2-17"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Provides canary deployments and traffic splitting natively</span>
<span id="cb2-18"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>No vendor lock-in (runs on any K8s)</span>
<span id="cb2-19"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Scale-to-zero for low-traffic models reduces cost</span>
<span id="cb2-20"></span>
<span id="cb2-21"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Consequences</span></span>
<span id="cb2-22"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Team needs to learn KServe CRDs and InferenceService API</span>
<span id="cb2-23"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Must manage GPU node pools ourselves (auto-scaling config)</span>
<span id="cb2-24"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Need to build custom monitoring dashboard (Prometheus + Grafana)</span>
<span id="cb2-25"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Upgrade path: can migrate to managed service later if needed</span>
<span id="cb2-26"></span>
<span id="cb2-27"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">## Alternatives Considered</span></span>
<span id="cb2-28"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>SageMaker Endpoints: Higher cost at scale, AWS lock-in</span>
<span id="cb2-29"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>BentoML Cloud: Less mature, limited auto-scaling options</span>
<span id="cb2-30"><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">- </span>Seldon Core: More complex for our use case (inference graphs not needed)</span></code></pre></div></div>
</section>
<section id="build-vs-buy-vs-adapt-decision-framework" class="level3">
<h3 class="anchored" data-anchor-id="build-vs-buy-vs-adapt-decision-framework">Build vs Buy vs Adapt Decision Framework</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 15%">
<col style="width: 24%">
<col style="width: 24%">
<col style="width: 35%">
</colgroup>
<thead>
<tr class="header">
<th>Factor</th>
<th>Build Custom</th>
<th>Buy Managed</th>
<th>Adapt Open-Source</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Time-to-market</strong></td>
<td>Slowest</td>
<td>Fastest</td>
<td>Medium</td>
</tr>
<tr class="even">
<td><strong>Long-term cost</strong></td>
<td>Lowest (at scale)</td>
<td>Highest (at scale)</td>
<td>Medium</td>
</tr>
<tr class="odd">
<td><strong>Team investment</strong></td>
<td>High (build + maintain)</td>
<td>Low (vendor manages)</td>
<td>Medium (customize + ops)</td>
</tr>
<tr class="even">
<td><strong>Differentiation</strong></td>
<td>Maximum (custom to needs)</td>
<td>Limited (shared features)</td>
<td>High (customizable)</td>
</tr>
<tr class="odd">
<td><strong>Risk</strong></td>
<td>Delivery risk (build wrong thing)</td>
<td>Vendor risk (lock-in, shutdown)</td>
<td>Community risk (abandonment)</td>
</tr>
<tr class="even">
<td><strong>Best when</strong></td>
<td>Core competitive advantage</td>
<td>Commodity capability</td>
<td>Common need + specific customization</td>
</tr>
</tbody>
</table>
<hr>
</section>
</section>
<section id="q4-how-do-you-manage-the-cost-latency-quality-triangle-in-ai-systems" class="level2">
<h2 class="anchored" data-anchor-id="q4-how-do-you-manage-the-cost-latency-quality-triangle-in-ai-systems">Q4: How Do You Manage the Cost-Latency-Quality Triangle in AI Systems?</h2>
<p><strong>Answer:</strong></p>
<p>Every AI system faces a fundamental <strong>three-way trade-off</strong> between cost, latency, and quality. Improving any one dimension typically worsens at least one other. The architect’s job is to find the optimal balance point for the specific business context and make trade-offs explicit.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph Triangle["Cost-Latency-Quality Triangle"]
        COST["COST&lt;br/&gt;(compute, storage, API calls)"]
        LATENCY["LATENCY&lt;br/&gt;(response time, throughput)"]
        QUALITY["QUALITY&lt;br/&gt;(accuracy, reliability, UX)"]
    end

    COST ---|"Cheaper models → lower quality"| QUALITY
    COST ---|"Fewer resources → higher latency"| LATENCY
    LATENCY ---|"Faster → simpler model → lower quality"| QUALITY

    style Triangle fill:#f8f9fa,stroke:#333
    style COST fill:#ff6b6b,stroke:#333,color:#fff
    style LATENCY fill:#ffce67,stroke:#333
    style QUALITY fill:#56cc9d,stroke:#333,color:#fff
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="trade-off-scenarios" class="level3">
<h3 class="anchored" data-anchor-id="trade-off-scenarios">Trade-off Scenarios</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 19%">
<col style="width: 15%">
<col style="width: 21%">
<col style="width: 21%">
<col style="width: 21%">
</colgroup>
<thead>
<tr class="header">
<th>Scenario</th>
<th>Cost ↓</th>
<th>Latency ↓</th>
<th>Quality ↑</th>
<th>Technique</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Use smaller model</td>
<td>✅</td>
<td>✅</td>
<td>❌</td>
<td>Distillation, quantization</td>
</tr>
<tr class="even">
<td>Add caching layer</td>
<td>✅</td>
<td>✅</td>
<td>—</td>
<td>Redis/CDN for repeated queries</td>
</tr>
<tr class="odd">
<td>Batch predictions</td>
<td>✅</td>
<td>❌</td>
<td>—</td>
<td>Pre-compute during off-peak</td>
</tr>
<tr class="even">
<td>Cascade (cheap → expensive)</td>
<td>✅</td>
<td>—</td>
<td>✅</td>
<td>Route hard cases to better model</td>
</tr>
<tr class="odd">
<td>Scale horizontally</td>
<td>❌</td>
<td>✅</td>
<td>—</td>
<td>More replicas, load balancing</td>
</tr>
<tr class="even">
<td>Use larger model</td>
<td>❌</td>
<td>❌</td>
<td>✅</td>
<td>GPT-4 instead of GPT-3.5</td>
</tr>
<tr class="odd">
<td>Feature enrichment</td>
<td>❌</td>
<td>❌</td>
<td>✅</td>
<td>More signals → better predictions</td>
</tr>
<tr class="even">
<td>A/B test + rollback</td>
<td>—</td>
<td>—</td>
<td>✅</td>
<td>Validate quality before full deploy</td>
</tr>
<tr class="odd">
<td>Spot/preemptible for training</td>
<td>✅</td>
<td>❌ (training time)</td>
<td>—</td>
<td>Checkpointing + retry</td>
</tr>
<tr class="even">
<td>Edge inference</td>
<td>✅ (no cloud)</td>
<td>✅ (local)</td>
<td>❌ (model size limit)</td>
<td>ONNX, TFLite on device</td>
</tr>
</tbody>
</table>
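<p>The cascade row can be sketched as a router that tries a cheap model first and escalates only when its confidence is low. This is a minimal illustration, assuming each model returns a label and a confidence score; <code>small_model</code>, <code>large_model</code>, and the 0.9 threshold are hypothetical stand-ins.</p>

```python
from typing import Callable, Tuple

# A model is any callable returning (label, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]
# A router returns (label, name of the tier that answered).
Router = Callable[[str], Tuple[str, str]]


def cascade(cheap: Model, expensive: Model, threshold: float) -> Router:
    """Build a router that escalates only low-confidence requests."""
    def route(request: str) -> Tuple[str, str]:
        label, confidence = cheap(request)
        if confidence >= threshold:
            return label, "cheap"
        label, _ = expensive(request)
        return label, "expensive"
    return route


# Hypothetical stand-ins for real models (illustrative only).
def small_model(text: str) -> Tuple[str, float]:
    return ("spam", 0.97) if "free money" in text else ("ham", 0.55)


def large_model(text: str) -> Tuple[str, float]:
    return ("ham", 0.92)


route = cascade(small_model, large_model, threshold=0.9)
print(route("free money now"))        # handled by the cheap model
print(route("meeting moved to 3pm"))  # escalated to the expensive model
```

<p>The threshold is the tuning knob: raising it sends more traffic to the expensive model (higher cost, higher quality), and lowering it does the opposite. Escalated requests pay the latency of both models.</p>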
</section>
<section id="cost-optimization-strategies" class="level3">
<h3 class="anchored" data-anchor-id="cost-optimization-strategies">Cost Optimization Strategies</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 22%">
<col style="width: 43%">
<col style="width: 34%">
</colgroup>
<thead>
<tr class="header">
<th>Strategy</th>
<th>Savings Potential</th>
<th>Applicability</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Right-size inference instances</strong></td>
<td>30-60%</td>
<td>Over-provisioned endpoints</td>
</tr>
<tr class="even">
<td><strong>Auto-scale to zero</strong></td>
<td>80-90% for low-traffic</td>
<td>Dev/staging + off-peak models</td>
</tr>
<tr class="odd">
<td><strong>Spot instances for training</strong></td>
<td>60-90%</td>
<td>Fault-tolerant training jobs</td>
</tr>
<tr class="even">
<td><strong>Model quantization (INT8/FP16)</strong></td>
<td>50-75% inference cost</td>
<td>Applications that tolerate a small accuracy loss</td>
</tr>
<tr class="odd">
<td><strong>Prediction caching</strong></td>
<td>40-80% API call savings</td>
<td>Repeated/similar queries</td>
</tr>
<tr class="even">
<td><strong>Cascade routing</strong></td>
<td>40-60%</td>
<td>Mixed complexity requests</td>
</tr>
<tr class="odd">
<td><strong>Batch inference</strong></td>
<td>70-90% vs real-time</td>
<td>Non-urgent scoring</td>
</tr>
<tr class="even">
<td><strong>Reserved capacity / Savings Plans</strong></td>
<td>30-60%</td>
<td>Steady-state workloads</td>
</tr>
<tr class="odd">
<td><strong>Smaller models (distillation)</strong></td>
<td>50-80%</td>
<td>Where accuracy drop acceptable</td>
</tr>
<tr class="even">
<td><strong>Multi-tenant endpoints</strong></td>
<td>40-70%</td>
<td>Many low-traffic models</td>
</tr>
</tbody>
</table>
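<p>Prediction caching can be illustrated with an in-memory sketch. It assumes predictions are deterministic for a given model version and feature set; <code>PredictionCache</code> and <code>fake_model</code> are illustrative names, and a production system would more likely use a shared store such as Redis with an expiry policy.</p>

```python
import hashlib
import json


class PredictionCache:
    """In-memory cache keyed by a hash of the model version and features."""

    def __init__(self, predict_fn, model_version):
        self._predict_fn = predict_fn
        self._model_version = model_version
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, features):
        # sort_keys makes the key independent of dict insertion order.
        payload = json.dumps([self._model_version, features], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def predict(self, features):
        key = self._key(features)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._predict_fn(features)
        self._store[key] = result
        return result


calls = []


def fake_model(features):
    # Hypothetical stand-in for an expensive inference call.
    calls.append(features)
    return sum(features.values())


cache = PredictionCache(fake_model, model_version="v1")
cache.predict({"age": 30, "visits": 4})
cache.predict({"visits": 4, "age": 30})  # same features, different order
print(cache.hits, cache.misses, len(calls))
```

<p>Including the model version in the key means a new deployment never serves predictions computed by the previous model.</p>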
</section>
<section id="latency-budget-breakdown" class="level3">
<h3 class="anchored" data-anchor-id="latency-budget-breakdown">Latency Budget Breakdown</h3>
<pre><code>Total latency budget: 200ms (p99 target)
├── Network (client → gateway): 20ms
├── API gateway + auth: 10ms
├── Feature retrieval (online store): 15ms
├── Model inference: 80ms
├── Post-processing + business logic: 15ms
├── Response serialization: 5ms
├── Network (gateway → client): 20ms
└── Buffer for variance: 35ms</code></pre>
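<p>The budget above can be checked mechanically. The sketch below sums the allocated components, derives the remaining buffer, and flags any component whose measured latency exceeds its allocation. The component names and the measured value are illustrative assumptions.</p>

```python
BUDGET_MS = 200  # p99 target from the breakdown above

ALLOCATED_MS = {
    "network_in": 20,
    "gateway_auth": 10,
    "feature_retrieval": 15,
    "model_inference": 80,
    "post_processing": 15,
    "serialization": 5,
    "network_out": 20,
}


def remaining_buffer(budget_ms: int, allocated: dict[str, int]) -> int:
    """Budget left after all allocations; a negative value means over budget."""
    return budget_ms - sum(allocated.values())


def over_allocation(measured: dict[str, int], allocated: dict[str, int]) -> dict[str, int]:
    """Components whose measured latency exceeds their allocation, with the overshoot in ms."""
    return {
        name: measured[name] - limit
        for name, limit in allocated.items()
        if measured.get(name, 0) > limit
    }


# Hypothetical p99 measurement where inference runs slower than planned.
measured = dict(ALLOCATED_MS, model_inference=95)

print(remaining_buffer(BUDGET_MS, ALLOCATED_MS))
print(over_allocation(measured, ALLOCATED_MS))
```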
</section>
<section id="quality-assurance-layers" class="level3">
<h3 class="anchored" data-anchor-id="quality-assurance-layers">Quality Assurance Layers</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 21%">
<col style="width: 59%">
<col style="width: 18%">
</colgroup>
<thead>
<tr class="header">
<th>Layer</th>
<th>What It Validates</th>
<th>When</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Offline evaluation</strong></td>
<td>Accuracy, F1, AUC on held-out data</td>
<td>Before deployment</td>
</tr>
<tr class="even">
<td><strong>Shadow testing</strong></td>
<td>Compare new model vs production (no user impact)</td>
<td>Pre-production</td>
</tr>
<tr class="odd">
<td><strong>Canary deployment</strong></td>
<td>Small traffic %, monitor metrics</td>
<td>Gradual rollout</td>
</tr>
<tr class="even">
<td><strong>A/B testing</strong></td>
<td>Statistical comparison of business metrics</td>
<td>Production</td>
</tr>
<tr class="odd">
<td><strong>Online monitoring</strong></td>
<td>Drift, latency, error rate, prediction distribution</td>
<td>Continuous</td>
</tr>
<tr class="even">
<td><strong>User feedback</strong></td>
<td>Explicit ratings, implicit engagement signals</td>
<td>Ongoing</td>
</tr>
</tbody>
</table>
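<p>As an illustration of the canary layer, the following sketch assigns a fixed percentage of users to the new model by hashing a user ID. Hashing is one common way to keep the assignment sticky, and the table does not prescribe it. The ID format and the 5% split are assumptions.</p>

```python
import hashlib


def in_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically assign a user to the canary group.

    Hashing keeps the assignment sticky: the same user always sees the same model.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in 0..99
    return bucket < canary_percent


users = [f"user-{i}" for i in range(10_000)]
share = sum(in_canary(u, 5) for u in users) / len(users)
print(f"canary share: {share:.1%}")
```

<p>Raising <code>canary_percent</code> step by step gives a gradual rollout, and users already in the canary group stay in it.</p>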
<hr>
</section>
</section>
<section id="q5-how-do-you-ensure-security-and-compliance-in-ai-architecture" class="level2">
<h2 class="anchored" data-anchor-id="q5-how-do-you-ensure-security-and-compliance-in-ai-architecture">Q5: How Do You Ensure Security and Compliance in AI Architecture?</h2>
<p><strong>Answer:</strong></p>
<p>AI systems introduce unique security challenges: adversarial attacks on models, data poisoning, prompt injection, PII leakage, and model theft. The architect must address security at every layer while meeting regulatory requirements (GDPR, HIPAA, SOC 2, EU AI Act).</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph Threats["AI-Specific Threats"]
        ADV["Adversarial Attacks&lt;br/&gt;(input manipulation)"]
        POISON["Data Poisoning&lt;br/&gt;(corrupted training data)"]
        EXTRACT["Model Extraction&lt;br/&gt;(stealing model via API)"]
        INJECTION["Prompt Injection&lt;br/&gt;(LLM manipulation)"]
        LEAKAGE["Data Leakage&lt;br/&gt;(PII in outputs/logs)"]
        SUPPLY["Supply Chain&lt;br/&gt;(malicious packages/models)"]
    end

    subgraph Defenses["Defense Layers"]
        NETWORK["Network Security&lt;br/&gt;(VPC, private endpoints)"]
        DATA_SEC["Data Security&lt;br/&gt;(encryption, access control)"]
        MODEL_SEC["Model Security&lt;br/&gt;(signing, validation)"]
        RUNTIME["Runtime Protection&lt;br/&gt;(input validation, guardrails)"]
        AUDIT["Audit &amp; Governance&lt;br/&gt;(logging, compliance)"]
        RESPONSIBLE["Responsible AI&lt;br/&gt;(bias, fairness, transparency)"]
    end

    Threats --&gt; Defenses

    style Threats fill:#ff6b6b,stroke:#333,color:#fff
    style Defenses fill:#56cc9d,stroke:#333,color:#fff
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="security-architecture-checklist" class="level3">
<h3 class="anchored" data-anchor-id="security-architecture-checklist">Security Architecture Checklist</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 21%">
<col style="width: 28%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="header">
<th>Layer</th>
<th>Control</th>
<th>Implementation</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Network</strong></td>
<td>Isolation</td>
<td>VPC, private subnets, VPC endpoints, no public access</td>
</tr>
<tr class="even">
<td><strong>Network</strong></td>
<td>Encryption in transit</td>
<td>TLS 1.3 everywhere, mutual TLS for service-to-service</td>
</tr>
<tr class="odd">
<td><strong>Data</strong></td>
<td>Encryption at rest</td>
<td>KMS/CMK for all storage (S3, DB, volumes)</td>
</tr>
<tr class="even">
<td><strong>Data</strong></td>
<td>Access control</td>
<td>Least privilege IAM, row-level security, column masking</td>
</tr>
<tr class="odd">
<td><strong>Data</strong></td>
<td>PII handling</td>
<td>Tokenization, differential privacy, data minimization</td>
</tr>
<tr class="even">
<td><strong>Model</strong></td>
<td>Integrity</td>
<td>Model signing, hash verification, immutable registry</td>
</tr>
<tr class="odd">
<td><strong>Model</strong></td>
<td>Access</td>
<td>API keys + rate limiting + IP allowlisting</td>
</tr>
<tr class="even">
<td><strong>Inference</strong></td>
<td>Input validation</td>
<td>Schema validation, content filtering, size limits</td>
</tr>
<tr class="odd">
<td><strong>Inference</strong></td>
<td>Output filtering</td>
<td>PII scrubbing, guardrails, response validation</td>
</tr>
<tr class="even">
<td><strong>LLM</strong></td>
<td>Prompt injection defense</td>
<td>System prompts, input/output guards, sandboxing</td>
</tr>
<tr class="odd">
<td><strong>Supply chain</strong></td>
<td>Dependency scanning</td>
<td>Signed containers, vulnerability scanning, SBOM</td>
</tr>
<tr class="even">
<td><strong>Governance</strong></td>
<td>Audit trail</td>
<td>All API calls logged, model lineage tracked</td>
</tr>
<tr class="odd">
<td><strong>Compliance</strong></td>
<td>Data residency</td>
<td>Region-locked processing, data classification</td>
</tr>
</tbody>
</table>
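<p>The two inference-layer rows (input validation and output filtering) can be sketched with the standard library alone. The field names, the 4000-character limit, and the two regular expressions are illustrative assumptions. Production PII detection usually needs a dedicated tool and far broader patterns.</p>

```python
import re

MAX_PROMPT_CHARS = 4000  # assumed size limit
ALLOWED_FIELDS = {"prompt", "user_id"}  # assumed request schema

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def validate_request(payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the input passed."""
    errors = []
    prompt = payload.get("prompt")
    if not isinstance(prompt, str):
        errors.append("prompt must be a string")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append("prompt exceeds size limit")
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        errors.append(f"unexpected fields: {sorted(unknown)}")
    return errors


def scrub_pii(text: str) -> str:
    """Mask e-mail addresses and US-style SSNs before returning or logging text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return US_SSN_RE.sub("[SSN]", text)


print(validate_request({"prompt": "hi", "debug": True}))
print(scrub_pii("Contact jane.doe@example.com, SSN 123-45-6789."))
```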
</section>
<section id="regulatory-compliance-matrix" class="level3">
<h3 class="anchored" data-anchor-id="regulatory-compliance-matrix">Regulatory Compliance Matrix</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 20%">
<col style="width: 44%">
<col style="width: 35%">
</colgroup>
<thead>
<tr class="header">
<th>Regulation</th>
<th>Key Requirements for AI</th>
<th>Architect Response</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>GDPR</strong></td>
<td>Right to explanation, data minimization, consent</td>
<td>Interpretable models, data retention policies, audit logs</td>
</tr>
<tr class="even">
<td><strong>EU AI Act</strong></td>
<td>Risk classification, transparency, human oversight</td>
<td>Risk assessment, model cards, human-in-the-loop for high-risk</td>
</tr>
<tr class="odd">
<td><strong>HIPAA</strong></td>
<td>PHI protection, access logs, BAA</td>
<td>Encryption, access control, audit trail, compliant hosting</td>
</tr>
<tr class="even">
<td><strong>SOC 2</strong></td>
<td>Security, availability, confidentiality controls</td>
<td>Documented policies, automated controls, annual audit</td>
</tr>
<tr class="odd">
<td><strong>CCPA</strong></td>
<td>Data deletion, opt-out of automated decisions</td>
<td>Data lineage, model unlearning capability</td>
</tr>
<tr class="even">
<td><strong>FDA (SaMD)</strong></td>
<td>Clinical validation, change control</td>
<td>Locked models, validation studies, version control</td>
</tr>
</tbody>
</table>
</section>
<section id="zero-trust-architecture-for-ai" class="level3">
<h3 class="anchored" data-anchor-id="zero-trust-architecture-for-ai">Zero-Trust Architecture for AI</h3>
<pre><code>Principle: Never trust, always verify

1. Identity: Every service has a workload identity (no shared credentials)
2. Network: Service mesh (Istio/Linkerd) with mTLS between all services
3. Data: Encrypted at rest AND in transit, even within private network
4. Access: Just-in-time access to training data, not standing permissions
5. Inference: Validate every input (schema + content + rate + origin)
6. Models: Signed artifacts, verified at deployment, immutable in production
7. Observability: Log all access decisions, model inputs/outputs (redacted)
8. Supply chain: Signed containers, scanned dependencies, private registry</code></pre>
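<p>Item 6 (signed artifacts, verified at deployment) can be sketched as follows. This example uses an HMAC over the artifact bytes as a stand-in for a signature, which is a simplifying assumption. Real deployments typically use asymmetric signatures so that the verifier never holds the signing key. The key and artifact bytes are dummy values.</p>

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Produce a keyed SHA-256 digest of the artifact (stand-in for a real signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_before_deploy(data: bytes, expected_sig: str, key: bytes) -> bool:
    """Recompute the digest and compare in constant time; deploy only if True."""
    return hmac.compare_digest(sign(data, key), expected_sig)


key = b"demo-key-not-for-production"   # dummy key for illustration
model_bytes = b"fake model weights"    # dummy artifact
signature = sign(model_bytes, key)     # recorded at registration time

print(verify_before_deploy(model_bytes, signature, key))         # untouched artifact
print(verify_before_deploy(model_bytes + b"x", signature, key))  # tampered artifact
```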
<hr>
</section>
</section>
<section id="q6-how-do-you-architect-ai-systems-for-scalability-and-concurrency" class="level2">
<h2 class="anchored" data-anchor-id="q6-how-do-you-architect-ai-systems-for-scalability-and-concurrency">Q6: How Do You Architect AI Systems for Scalability and Concurrency?</h2>
<p><strong>Answer:</strong></p>
<p>AI systems face unique scaling challenges: GPU-bound inference, large model loading times, stateful sessions (conversational AI), and variable compute costs per request. The architect must design for elastic scaling across multiple dimensions simultaneously.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph Scaling["Scaling Dimensions"]
        HSCALE["Horizontal&lt;br/&gt;(more replicas)"]
        VSCALE["Vertical&lt;br/&gt;(bigger instances)"]
        FUNC["Functional&lt;br/&gt;(decompose by model)"]
        DATA_S["Data&lt;br/&gt;(partition by entity)"]
    end

    subgraph Patterns["Scaling Patterns"]
        ASYNC["Async Processing&lt;br/&gt;(queue-based decoupling)"]
        CACHE_P["Caching&lt;br/&gt;(reduce recomputation)"]
        BATCH_P["Batching&lt;br/&gt;(GPU efficiency)"]
        SHARD["Sharding&lt;br/&gt;(partition load)"]
        CIRCUIT["Circuit Breaker&lt;br/&gt;(graceful degradation)"]
    end

    subgraph Infra["Infrastructure"]
        K8S_I["Kubernetes&lt;br/&gt;(pod autoscaling)"]
        GPU_I["GPU Pools&lt;br/&gt;(heterogeneous nodes)"]
        LB_I["Load Balancer&lt;br/&gt;(intelligent routing)"]
        CDN_I["CDN / Edge&lt;br/&gt;(reduce round-trips)"]
    end

    Scaling --&gt; Patterns --&gt; Infra

    style Scaling fill:#6cc3d5,stroke:#333,color:#fff
    style Patterns fill:#56cc9d,stroke:#333,color:#fff
    style Infra fill:#ffce67,stroke:#333
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="scaling-strategy-by-load-type" class="level3">
<h3 class="anchored" data-anchor-id="scaling-strategy-by-load-type">Scaling Strategy by Load Type</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 28%">
<col style="width: 24%">
<col style="width: 46%">
</colgroup>
<thead>
<tr class="header">
<th>Load Pattern</th>
<th>Challenge</th>
<th>Architecture Response</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Steady high throughput</strong></td>
<td>Cost efficiency at scale</td>
<td>Right-sized reserved instances, model optimization</td>
</tr>
<tr class="even">
<td><strong>Spiky / bursty</strong></td>
<td>Cold start latency on scale-up</td>
<td>Warm pools, pre-scaled buffer, predictive scaling</td>
</tr>
<tr class="odd">
<td><strong>Diurnal (day/night)</strong></td>
<td>Paying for idle capacity</td>
<td>Scheduled scaling, scale-to-zero off-peak</td>
</tr>
<tr class="even">
<td><strong>Event-driven surges</strong></td>
<td>Unpredictable 10-100x spikes</td>
<td>Queue-based decoupling, serverless overflow</td>
</tr>
<tr class="odd">
<td><strong>Gradual growth</strong></td>
<td>Architecture ceiling hit</td>
<td>Horizontal partitioning, data sharding</td>
</tr>
<tr class="even">
<td><strong>Multi-tenant</strong></td>
<td>Noisy neighbor, fair sharing</td>
<td>Resource quotas, priority queues, tenant isolation</td>
</tr>
</tbody>
</table>
</section>
<section id="gpu-aware-scaling" class="level3">
<h3 class="anchored" data-anchor-id="gpu-aware-scaling">GPU-Aware Scaling</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 27%">
<col style="width: 36%">
<col style="width: 36%">
</colgroup>
<thead>
<tr class="header">
<th>Strategy</th>
<th>Description</th>
<th>When to Use</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Dynamic batching</strong></td>
<td>Collect requests and batch GPU inference</td>
<td>High-throughput serving</td>
</tr>
<tr class="even">
<td><strong>Model parallelism</strong></td>
<td>Split large model across multiple GPUs</td>
<td>LLMs (70B+ params)</td>
</tr>
<tr class="odd">
<td><strong>Multi-model serving</strong></td>
<td>Load multiple small models on one GPU</td>
<td>Many low-traffic models</td>
</tr>
<tr class="even">
<td><strong>GPU sharing (MIG/MPS)</strong></td>
<td>Partition GPU across workloads</td>
<td>Mixed-size models</td>
</tr>
<tr class="odd">
<td><strong>CPU offloading</strong></td>
<td>Pre/post-processing on CPU, GPU for inference only</td>
<td>Minimize GPU time</td>
</tr>
<tr class="even">
<td><strong>Speculative decoding</strong></td>
<td>Draft + verify for faster LLM generation</td>
<td>LLM latency reduction</td>
</tr>
</tbody>
</table>
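<p>A minimal sketch of dynamic batching, assuming an in-process queue: the batcher takes up to <code>max_size</code> requests and waits at most <code>max_wait_s</code> for stragglers before running one forward pass. The parameter values and the fake inference function are hypothetical. Dedicated serving frameworks implement the same idea with more sophistication.</p>

```python
import queue
import time


def collect_batch(q: queue.Queue, max_size: int, max_wait_s: float) -> list:
    """Take up to max_size items, waiting at most max_wait_s for stragglers."""
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def fake_gpu_infer(batch: list) -> list:
    # Hypothetical model: one forward pass handles the whole batch.
    return [len(item) for item in batch]


pending = queue.Queue()
for i in range(10):
    pending.put(f"req-{i}")

batch_sizes = []
while not pending.empty():
    batch = collect_batch(pending, max_size=4, max_wait_s=0.05)
    fake_gpu_infer(batch)
    batch_sizes.append(len(batch))

print(batch_sizes)
```

<p>The wait limit is the latency price of batching. A larger window fills batches better but adds delay to the first request in each batch.</p>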
</section>
<section id="concurrency-patterns-for-ai" class="level3">
<h3 class="anchored" data-anchor-id="concurrency-patterns-for-ai">Concurrency Patterns for AI</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Pattern: Queue-based decoupling for variable-cost inference</span></span>
<span id="cb5-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Handles bursts without overwhelming GPU resources</span></span>
<span id="cb5-3"></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb5-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">Producer (API) → Message Queue → Consumer (GPU Workers)</span></span>
<span id="cb5-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     ↓                              ↓</span></span>
<span id="cb5-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Immediate ACK              Process at GPU capacity</span></span>
<span id="cb5-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  (202 Accepted)             Auto-scale workers on queue depth</span></span>
<span id="cb5-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">     ↓                              ↓</span></span>
<span id="cb5-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  Client polls /             Write result to cache/DB</span></span>
<span id="cb5-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  or webhook callback</span></span>
<span id="cb5-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb5-13"></span>
<span id="cb5-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Auto-scaling triggers:</span></span>
<span id="cb5-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 1. Queue depth &gt; 100 → scale up workers</span></span>
<span id="cb5-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2. Average GPU utilization &lt; 30% → scale down</span></span>
<span id="cb5-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 3. Request latency p99 &gt; SLA → scale up</span></span>
<span id="cb5-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 4. Time-based: pre-scale before known peak hours</span></span></code></pre></div></div>
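<p>The flow and triggers above can be sketched in-process with the standard library. This is an illustrative stand-in for a real message broker and autoscaler. The function names, the one-worker scaling step, and the use of <code>upper()</code> as fake inference are assumptions. The thresholds are the ones listed in the comments above.</p>

```python
import queue
import uuid

jobs = queue.Queue()          # stand-in for the message queue
results: dict[str, str] = {}  # stand-in for the result cache/DB


def submit(payload: str) -> dict:
    """API side: enqueue and acknowledge immediately (HTTP 202 semantics)."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, payload))
    return {"status": 202, "job_id": job_id}


def worker_step() -> None:
    """GPU worker side: process one job at its own pace and store the result."""
    job_id, payload = jobs.get()
    results[job_id] = payload.upper()  # placeholder for model inference
    jobs.task_done()


def poll(job_id: str) -> dict:
    """Client side: check whether the result is ready yet."""
    if job_id in results:
        return {"status": 200, "result": results[job_id]}
    return {"status": 202}  # still pending


def desired_workers(current: int, queue_depth: int, gpu_util: float,
                    p99_ms: float, sla_ms: float) -> int:
    """Apply triggers 1-3 from the list above; trigger 4 would be a schedule."""
    if queue_depth > 100 or p99_ms > sla_ms:
        return current + 1
    if gpu_util < 0.30 and current > 1:
        return current - 1
    return current


ack = submit("hello")
print(poll(ack["job_id"]))  # pending until a worker picks the job up
worker_step()
print(poll(ack["job_id"]))
print(desired_workers(current=2, queue_depth=250, gpu_util=0.9, p99_ms=120, sla_ms=200))
```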
</section>
<section id="capacity-planning-formula" class="level3">
<h3 class="anchored" data-anchor-id="capacity-planning-formula">Capacity Planning Formula</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 30%">
<col style="width: 34%">
<col style="width: 34%">
</colgroup>
<thead>
<tr class="header">
<th>Metric</th>
<th>Formula</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Min replicas</strong></td>
<td>Peak RPS ÷ Throughput per replica × Safety factor (rounded up)</td>
<td>1000 RPS ÷ 200/replica × 1.5 = 7.5 → 8</td>
</tr>
<tr class="even">
<td><strong>GPU memory</strong></td>
<td>Model size + Batch size × Input size + Overhead</td>
<td>7GB + 32 × 0.1GB + 2GB = 12.2GB</td>
</tr>
<tr class="odd">
<td><strong>Queue depth target</strong></td>
<td>Acceptable latency × Consumer throughput</td>
<td>5s × 200/s = 1000 messages</td>
</tr>
<tr class="even">
<td><strong>Storage growth</strong></td>
<td>Daily data × Retention × Replication</td>
<td>10GB/day × 90 days × 3 replicas = 2.7TB</td>
</tr>
</tbody>
</table>
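<p>The four formulas translate directly into small helper functions. The sketch below reproduces the example column. The function names are illustrative, and the storage figure assumes decimal units (1TB = 1000GB).</p>

```python
import math


def min_replicas(peak_rps: float, rps_per_replica: float, safety: float) -> int:
    # Round up: a fractional replica cannot be provisioned.
    return math.ceil(peak_rps / rps_per_replica * safety)


def gpu_memory_gb(model_gb: float, batch_size: int, input_gb: float, overhead_gb: float) -> float:
    return model_gb + batch_size * input_gb + overhead_gb


def queue_depth_target(acceptable_latency_s: float, consumer_rps: float) -> int:
    return int(acceptable_latency_s * consumer_rps)


def storage_growth_tb(daily_gb: float, retention_days: int, replication: int) -> float:
    return daily_gb * retention_days * replication / 1000  # decimal TB


print(min_replicas(1000, 200, 1.5))
print(round(gpu_memory_gb(7, 32, 0.1, 2), 1))
print(queue_depth_target(5, 200))
print(storage_growth_tb(10, 90, 3))
```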
<hr>
</section>
</section>
<section id="q7-how-do-you-manage-risk-in-ai-system-architecture" class="level2">
<h2 class="anchored" data-anchor-id="q7-how-do-you-manage-risk-in-ai-system-architecture">Q7: How Do You Manage Risk in AI System Architecture?</h2>
<p><strong>Answer:</strong></p>
<p>AI projects carry unique risks beyond standard software: model performance degradation, data dependency fragility, regulatory uncertainty, and the gap between offline accuracy and real-world value. The architect manages risk through identification, quantification, mitigation, and continuous monitoring.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph RiskCategories["Risk Categories"]
        TECHNICAL["Technical Risk&lt;br/&gt;(model fails, system breaks)"]
        DATA_R["Data Risk&lt;br/&gt;(quality, availability, drift)"]
        BUSINESS["Business Risk&lt;br/&gt;(no value delivered)"]
        OPERATIONAL["Operational Risk&lt;br/&gt;(outage, incidents)"]
        COMPLIANCE_R["Compliance Risk&lt;br/&gt;(regulatory violations)"]
        VENDOR_R["Vendor Risk&lt;br/&gt;(lock-in, shutdown, cost hike)"]
    end

    subgraph Mitigation["Mitigation Strategies"]
        FALLBACK["Fallback Systems&lt;br/&gt;(graceful degradation)"]
        MONITORING_R["Active Monitoring&lt;br/&gt;(detect before impact)"]
        CONTRACTS["Contracts &amp; SLAs&lt;br/&gt;(vendor accountability)"]
        PHASED["Phased Delivery&lt;br/&gt;(validate before scale)"]
        INSURANCE["Insurance Patterns&lt;br/&gt;(redundancy, backups)"]
    end

    RiskCategories --&gt; Mitigation

    style RiskCategories fill:#ff6b6b,stroke:#333,color:#fff
    style Mitigation fill:#56cc9d,stroke:#333,color:#fff
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="ai-risk-register-template" class="level3">
<h3 class="anchored" data-anchor-id="ai-risk-register-template">AI Risk Register Template</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 11%">
<col style="width: 25%">
<col style="width: 15%">
<col style="width: 13%">
<col style="width: 21%">
<col style="width: 13%">
</colgroup>
<thead>
<tr class="header">
<th>Risk</th>
<th>Probability</th>
<th>Impact</th>
<th>Score</th>
<th>Mitigation</th>
<th>Owner</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Model accuracy below threshold</td>
<td>Medium</td>
<td>High</td>
<td><strong>High</strong></td>
<td>Phased rollout + A/B testing + rollback plan</td>
<td>ML Lead</td>
</tr>
<tr class="even">
<td>Training data pipeline fails</td>
<td>Low</td>
<td>Critical</td>
<td><strong>High</strong></td>
<td>Redundant sources + data validation + alerting</td>
<td>Data Eng</td>
</tr>
<tr class="odd">
<td>GPU costs exceed budget 2x</td>
<td>Medium</td>
<td>Medium</td>
<td><strong>Medium</strong></td>
<td>Auto-scaling limits + spot instances + cost alerts</td>
<td>Architect</td>
</tr>
<tr class="even">
<td>Key vendor discontinues service</td>
<td>Low</td>
<td>High</td>
<td><strong>Medium</strong></td>
<td>Abstraction layer + multi-vendor capable</td>
<td>Architect</td>
</tr>
<tr class="odd">
<td>Data drift degrades model silently</td>
<td>High</td>
<td>High</td>
<td><strong>Critical</strong></td>
<td>Model monitoring + automated retraining triggers</td>
<td>MLOps</td>
</tr>
<tr class="even">
<td>Regulatory change (EU AI Act)</td>
<td>Medium</td>
<td>High</td>
<td><strong>High</strong></td>
<td>Build for interpretability + model cards + audit trail</td>
<td>Legal + Arch</td>
</tr>
<tr class="odd">
<td>Single point of failure in serving</td>
<td>Low</td>
<td>Critical</td>
<td><strong>High</strong></td>
<td>Multi-AZ + circuit breaker + fallback model</td>
<td>Platform</td>
</tr>
<tr class="even">
<td>Team member leaves (bus factor)</td>
<td>Medium</td>
<td>Medium</td>
<td><strong>Medium</strong></td>
<td>Documentation + pair programming + cross-training</td>
<td>Manager</td>
</tr>
</tbody>
</table>
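<p>The register does not state how the Score column is derived. One simple scheme that reproduces all eight rows is additive: map each level to a number, add probability and impact, and bucket the sum. The numeric mapping and the cut-offs below are assumptions chosen to match the table, not a standard.</p>

```python
PROBABILITY = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}


def risk_score(probability: str, impact: str) -> str:
    """Bucket the sum of probability and impact levels (assumed scheme)."""
    total = PROBABILITY[probability] + IMPACT[impact]
    if total >= 6:
        return "Critical"
    if total == 5:
        return "High"
    if total == 4:
        return "Medium"
    return "Low"


register = [
    ("Model accuracy below threshold", "Medium", "High"),
    ("Training data pipeline fails", "Low", "Critical"),
    ("GPU costs exceed budget 2x", "Medium", "Medium"),
    ("Data drift degrades model silently", "High", "High"),
]
for risk, probability, impact in register:
    print(f"{risk}: {risk_score(probability, impact)}")
```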
</section>
<section id="graceful-degradation-strategy" class="level3">
<h3 class="anchored" data-anchor-id="graceful-degradation-strategy">Graceful Degradation Strategy</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 14%">
<col style="width: 27%">
<col style="width: 37%">
<col style="width: 20%">
</colgroup>
<thead>
<tr class="header">
<th>Layer</th>
<th>Full Service</th>
<th>Degraded Service</th>
<th>Fallback</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>ML model</strong></td>
<td>Latest v3 model (best accuracy)</td>
<td>Previous v2 model (stable)</td>
<td>Rule-based heuristics</td>
</tr>
<tr class="even">
<td><strong>Feature store</strong></td>
<td>Real-time features</td>
<td>Cached features (1hr old)</td>
<td>Default feature values</td>
</tr>
<tr class="odd">
<td><strong>LLM API</strong></td>
<td>GPT-4 (best quality)</td>
<td>GPT-3.5 (faster, cheaper)</td>
<td>Template responses</td>
</tr>
<tr class="even">
<td><strong>Recommendations</strong></td>
<td>Personalized (ML model)</td>
<td>Popular items (pre-computed)</td>
<td>Editorial curated list</td>
</tr>
<tr class="odd">
<td><strong>Search ranking</strong></td>
<td>ML-ranked results</td>
<td>TF-IDF / BM25 fallback</td>
<td>Alphabetical / recency</td>
</tr>
<tr class="even">
<td><strong>Fraud detection</strong></td>
<td>Real-time ML scoring</td>
<td>Rule-based thresholds</td>
<td>Block &gt; $10K transactions</td>
</tr>
</tbody>
</table>
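<p>Each row of the table is a chain of tiers tried in order. A minimal sketch of that chain, using the LLM row as the example, is shown below. The three handler functions are hypothetical stand-ins, and the primary tier simulates an outage.</p>

```python
from typing import Callable, Sequence

Handler = Callable[[str], str]


def with_fallbacks(tiers: Sequence[tuple[str, Handler]], request: str) -> tuple[str, str]:
    """Try each tier in order; return (tier_name, response) from the first that succeeds."""
    last_error = None
    for name, handler in tiers:
        try:
            return name, handler(request)
        except Exception as exc:  # in production, catch narrower error types
            last_error = exc
    raise RuntimeError("all tiers failed") from last_error


def primary_llm(prompt: str) -> str:
    raise TimeoutError("primary model timed out")  # simulated outage


def cheaper_llm(prompt: str) -> str:
    return f"[degraded] answer to: {prompt}"


def template_response(prompt: str) -> str:
    return "Sorry, we cannot answer right now."


tiers = [("full", primary_llm), ("degraded", cheaper_llm), ("fallback", template_response)]
print(with_fallbacks(tiers, "reset my password"))
```

<p>Returning the tier name alongside the response lets monitoring count how often the system runs degraded.</p>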
</section>
<section id="risk-mitigation-patterns" class="level3">
<h3 class="anchored" data-anchor-id="risk-mitigation-patterns">Risk Mitigation Patterns</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 28%">
<col style="width: 40%">
<col style="width: 31%">
</colgroup>
<thead>
<tr class="header">
<th>Pattern</th>
<th>Description</th>
<th>Use Case</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Circuit breaker</strong></td>
<td>Stop calling failing service, use fallback</td>
<td>Model service overloaded</td>
</tr>
<tr class="even">
<td><strong>Canary deployment</strong></td>
<td>Route 5% traffic to new model, monitor</td>
<td>Model release risk</td>
</tr>
<tr class="odd">
<td><strong>Shadow mode</strong></td>
<td>Run new model in parallel, don’t serve results</td>
<td>Validate before production</td>
</tr>
<tr class="even">
<td><strong>Feature flags</strong></td>
<td>Toggle ML features on/off without deploy</td>
<td>Quick disable if issues</td>
</tr>
<tr class="odd">
<td><strong>Chaos engineering</strong></td>
<td>Intentionally break things to find weaknesses</td>
<td>Pre-production resilience testing</td>
</tr>
<tr class="even">
<td><strong>Data contracts</strong></td>
<td>Formal schema + quality SLA with data producers</td>
<td>Prevent upstream data breaks</td>
</tr>
<tr class="odd">
<td><strong>Model rollback</strong></td>
<td>Automatic revert to previous version</td>
<td>Monitoring-triggered</td>
</tr>
</tbody>
</table>
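<p>The circuit breaker row can be sketched as a small class. After a configurable number of consecutive failures the breaker opens and routes calls straight to the fallback. Once a cool-down has elapsed it lets one trial call through. The thresholds, the injectable clock, and the two model functions are assumptions for illustration.</p>

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, primary, fallback, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback(*args)  # open: skip the failing service
            self.opened_at = None       # half-open: allow one trial call
        try:
            result = primary(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback(*args)
        self.failures = 0  # a success closes the circuit fully
        return result


def overloaded_model(x):
    raise ConnectionError("model service overloaded")  # simulated failure


def rule_based(x):
    return "rule-based answer"


breaker = CircuitBreaker(failure_threshold=2)
for _ in range(4):
    print(breaker.call(overloaded_model, rule_based, "input"))
```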
<hr>
</section>
</section>
<section id="q8-how-do-you-build-an-ai-development-roadmap-and-phased-strategy" class="level2">
<h2 class="anchored" data-anchor-id="q8-how-do-you-build-an-ai-development-roadmap-and-phased-strategy">Q8: How Do You Build an AI Development Roadmap and Phased Strategy?</h2>
<p><strong>Answer:</strong></p>
<p>AI projects have high uncertainty — models may not work, data may not exist, and value is hard to predict before deployment. The architect designs a <strong>phased roadmap</strong> that validates assumptions incrementally, demonstrates value early, and avoids big-bang deployments.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph LR
    subgraph Phases["Development Phases"]
        P0["Phase 0: Discovery&lt;br/&gt;(2-4 weeks)"]
        P1["Phase 1: Proof of Concept&lt;br/&gt;(4-6 weeks)"]
        P2["Phase 2: MVP&lt;br/&gt;(6-12 weeks)"]
        P3["Phase 3: Production&lt;br/&gt;(8-16 weeks)"]
        P4["Phase 4: Scale &amp; Optimize&lt;br/&gt;(ongoing)"]
    end

    P0 --&gt; P1 --&gt; P2 --&gt; P3 --&gt; P4

    style Phases fill:#f8f9fa,stroke:#333
    style P0 fill:#c3aed6,stroke:#333
    style P1 fill:#6cc3d5,stroke:#333,color:#fff
    style P2 fill:#56cc9d,stroke:#333,color:#fff
    style P3 fill:#ffce67,stroke:#333
    style P4 fill:#ff6b6b,stroke:#333,color:#fff
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="phase-breakdown" class="level3">
<h3 class="anchored" data-anchor-id="phase-breakdown">Phase Breakdown</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 15%">
<col style="width: 13%">
<col style="width: 28%">
<col style="width: 42%">
</colgroup>
<thead>
<tr class="header">
<th>Phase</th>
<th>Goal</th>
<th>Deliverables</th>
<th>Go/No-Go Criteria</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>0: Discovery</strong></td>
<td>Understand problem, validate feasibility</td>
<td>Requirements cartography, data audit, risk assessment</td>
<td>Data exists + problem is learnable + business case clear</td>
</tr>
<tr class="even">
<td><strong>1: PoC</strong></td>
<td>Prove model can solve the problem</td>
<td>Notebook + baseline metrics on sample data</td>
<td>Accuracy exceeds heuristic baseline by meaningful margin</td>
</tr>
<tr class="odd">
<td><strong>2: MVP</strong></td>
<td>Deliver working system to limited users</td>
<td>Deployed model + basic API + monitoring</td>
<td>End-to-end works, users get value, latency acceptable</td>
</tr>
<tr class="even">
<td><strong>3: Production</strong></td>
<td>Reliable, scalable, monitored system</td>
<td>Full pipeline + CI/CD + monitoring + security</td>
<td>Meets SLA, handles peak load, passes security review</td>
</tr>
<tr class="odd">
<td><strong>4: Scale</strong></td>
<td>Optimize cost, add features, expand coverage</td>
<td>A/B testing, multi-model, advanced monitoring</td>
<td>ROI positive, continuous improvement loop running</td>
</tr>
</tbody>
</table>
</section>
<section id="phase-details" class="level3">
<h3 class="anchored" data-anchor-id="phase-details">Phase Details</h3>
<pre><code>PHASE 0: DISCOVERY (2-4 weeks)
├── Stakeholder interviews → requirements cartography
├── Data audit (exists? accessible? quality? volume?)
├── Literature review (SOTA, similar solutions)
├── Feasibility assessment (is ML the right tool?)
├── Success criteria definition (what "good" looks like)
├── Risk identification + initial mitigation plan
└── Decision: GO / PIVOT / STOP

PHASE 1: PROOF OF CONCEPT (4-6 weeks)
├── Data exploration + preprocessing prototype
├── Baseline model (simple, interpretable)
├── Evaluation on representative sample
├── Benchmark against heuristic / rule-based approach
├── Architecture spike (validate critical tech choices)
├── Cost estimate (training + serving)
└── Decision: PROCEED / ADJUST SCOPE / STOP

PHASE 2: MVP (6-12 weeks)
├── Data pipeline (automated, validated)
├── Model training pipeline (reproducible)
├── Basic serving infrastructure (REST API)
├── Core monitoring (latency, errors, basic drift)
├── Limited user group deployment (beta)
├── Collect user feedback + real-world metrics
└── Decision: SCALE / ITERATE / PIVOT

PHASE 3: PRODUCTION (8-16 weeks)
├── Hardened infrastructure (HA, auto-scaling, security)
├── Full CI/CD pipeline (model + application)
├── Comprehensive monitoring + alerting
├── A/B testing framework
├── Documentation + runbooks
├── Security review + compliance certification
├── Load testing + chaos engineering
└── Full production deployment

PHASE 4: SCALE &amp; OPTIMIZE (ongoing)
├── Cost optimization (right-sizing, caching, batching)
├── Model improvements (new features, architectures)
├── Additional use cases (expand coverage)
├── Advanced monitoring (concept drift, fairness)
├── User experience refinement
└── Technical debt reduction</code></pre>
</section>
<section id="roadmap-anti-patterns" class="level3">
<h3 class="anchored" data-anchor-id="roadmap-anti-patterns">Roadmap Anti-Patterns</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 34%">
<col style="width: 23%">
<col style="width: 42%">
</colgroup>
<thead>
<tr class="header">
<th>Anti-Pattern</th>
<th>Problem</th>
<th>Better Approach</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Big bang deployment</strong></td>
<td>Months of work, no validation until end</td>
<td>Phased with go/no-go gates</td>
</tr>
<tr class="even">
<td><strong>Infrastructure first</strong></td>
<td>Build platform before proving model works</td>
<td>Model-first → infra follows</td>
</tr>
<tr class="odd">
<td><strong>Perfectionist PoC</strong></td>
<td>Over-engineer proof of concept</td>
<td>Time-boxed, minimum viable experiment</td>
</tr>
<tr class="even">
<td><strong>Skip monitoring</strong></td>
<td>Ship model, discover failure from users</td>
<td>Monitoring from MVP phase</td>
</tr>
<tr class="odd">
<td><strong>No baseline</strong></td>
<td>Can’t prove ML adds value</td>
<td>Always compare against simple heuristic</td>
</tr>
<tr class="even">
<td><strong>Scope creep per phase</strong></td>
<td>Each phase grows unbounded</td>
<td>Fixed time-box + explicit criteria</td>
</tr>
</tbody>
</table>
<hr>
</section>
</section>
<section id="q9-how-do-you-make-architecture-trade-off-decisions-and-document-them" class="level2">
<h2 class="anchored" data-anchor-id="q9-how-do-you-make-architecture-trade-off-decisions-and-document-them">Q9: How Do You Make Architecture Trade-Off Decisions and Document Them?</h2>
<p><strong>Answer:</strong></p>
<p>Architecture is the art of making trade-offs under uncertainty. Every decision involves sacrifice — the architect’s skill is in understanding <em>what</em> to sacrifice given the specific context, making decisions explicitly, and documenting them so they can be reviewed, challenged, and revised.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph Framework["Decision Framework"]
        CONTEXT["1. Understand Context&lt;br/&gt;(constraints, priorities)"]
        OPTIONS["2. Identify Options&lt;br/&gt;(at least 3 alternatives)"]
        ANALYZE["3. Analyze Trade-offs&lt;br/&gt;(pros/cons per option)"]
        DECIDE["4. Decide &amp; Document&lt;br/&gt;(ADR with rationale)"]
        REVIEW["5. Review &amp; Revisit&lt;br/&gt;(as context changes)"]
    end

    CONTEXT --&gt; OPTIONS --&gt; ANALYZE --&gt; DECIDE --&gt; REVIEW
    REVIEW -.-&gt;|"Context changed"| CONTEXT

    style Framework fill:#fff,stroke:#333,color:#333
    style CONTEXT fill:#c3aed6,stroke:#333,color:#fff
    style OPTIONS fill:#56cc9d,stroke:#333,color:#fff
    style ANALYZE fill:#ffce67,stroke:#333
    style DECIDE fill:#ff6b6b,stroke:#333,color:#fff
    style REVIEW fill:#6cc3d5,stroke:#333,color:#fff
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="common-ai-architecture-trade-offs" class="level3">
<h3 class="anchored" data-anchor-id="common-ai-architecture-trade-offs">Common AI Architecture Trade-offs</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 23%">
<col style="width: 21%">
<col style="width: 21%">
<col style="width: 34%">
</colgroup>
<thead>
<tr class="header">
<th>Trade-off</th>
<th>Option A</th>
<th>Option B</th>
<th>Decision Driver</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Build vs Buy</strong></td>
<td>Custom model training pipeline</td>
<td>Managed service (SageMaker, Vertex)</td>
<td>Team size, time-to-market, budget</td>
</tr>
<tr class="even">
<td><strong>Single model vs Ensemble</strong></td>
<td>One model (simple, fast)</td>
<td>Multiple models (accurate, expensive)</td>
<td>Latency budget, accuracy requirement</td>
</tr>
<tr class="odd">
<td><strong>Real-time vs Batch</strong></td>
<td>Instant predictions (costly)</td>
<td>Pre-computed (cheaper, stale)</td>
<td>Freshness requirement</td>
</tr>
<tr class="even">
<td><strong>Monolith vs Microservices</strong></td>
<td>Single deployment unit</td>
<td>Independent services per model</td>
<td>Team autonomy, scaling independence</td>
</tr>
<tr class="odd">
<td><strong>Cloud vs On-prem</strong></td>
<td>Elastic, managed, pay-per-use</td>
<td>Control, compliance, fixed cost</td>
<td>Data sovereignty, GPU economics</td>
</tr>
<tr class="even">
<td><strong>Generality vs Specialization</strong></td>
<td>One model for many tasks</td>
<td>Task-specific models</td>
<td>Accuracy need, maintenance burden</td>
</tr>
<tr class="odd">
<td><strong>Speed vs Safety</strong></td>
<td>Fast deployment (no gate)</td>
<td>Multi-stage approval</td>
<td>Risk tolerance, regulatory context</td>
</tr>
<tr class="even">
<td><strong>Freshness vs Cost</strong></td>
<td>Retrain daily</td>
<td>Retrain monthly</td>
<td>Drift rate, retraining cost</td>
</tr>
</tbody>
</table>
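<p>One way to carry out step 3 of the framework (analyze trade-offs) for a row such as Build vs Buy is a weighted scoring matrix. The sketch below is illustrative only: the criteria, weights, scores, and the names <code>Option</code>, <code>weighted_score</code>, and <code>rank_options</code> are assumptions made for this example, and a real evaluation would take them from the stakeholders' stated priorities.</p>

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One architecture alternative, scored 1 (poor) to 5 (strong) per criterion."""
    name: str
    scores: dict


def weighted_score(option: Option, weights: dict) -> float:
    """Weight-normalized score, so the weights need not sum to exactly 1."""
    total_weight = sum(weights.values())
    return sum(w * option.scores[c] for c, w in weights.items()) / total_weight


def rank_options(options: list, weights: dict) -> list:
    """Return the options ordered from highest to lowest weighted score."""
    return sorted(options, key=lambda o: weighted_score(o, weights), reverse=True)


# Hypothetical weights for a small team under time pressure (assumption).
WEIGHTS = {"time_to_market": 0.5, "cost": 0.3, "control": 0.2}

# Hypothetical scores; the framework asks for at least three alternatives.
OPTIONS = [
    Option("custom_pipeline", {"time_to_market": 2, "cost": 3, "control": 5}),
    Option("managed_service", {"time_to_market": 5, "cost": 3, "control": 2}),
    Option("hybrid", {"time_to_market": 3, "cost": 4, "control": 4}),
]

if __name__ == "__main__":
    for option in rank_options(OPTIONS, WEIGHTS):
        print(f"{option.name}: {weighted_score(option, WEIGHTS):.2f}")
```

<p>The ranking is only as good as the weights, so the weights themselves belong in the ADR: if the decision driver changes (for example, the team grows), re-running the matrix with new weights shows whether the decision should be revisited.</p>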
</section>
<section id="atam-architecture-tradeoff-analysis-method" class="level3">
<h3 class="anchored" data-anchor-id="atam-architecture-tradeoff-analysis-method">ATAM (Architecture Tradeoff Analysis Method)</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 41%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Step</th>
<th>Activity</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>Present architecture to stakeholders</td>
<td>Shared understanding</td>
</tr>
<tr class="even">
<td>2</td>
<td>Identify quality attribute scenarios</td>
<td>Prioritized list of NFRs</td>
</tr>
<tr class="odd">
<td>3</td>
<td>Analyze architectural approaches</td>
<td>Sensitivity points + trade-off points</td>
</tr>
<tr class="even">
<td>4</td>
<td>Identify risks and non-risks</td>
<td>Risk themes</td>
</tr>
<tr class="odd">
<td>5</td>
<td>Document findings</td>
<td>Trade-off matrix + ADRs</td>
</tr>
</tbody>
</table>
</section>
<section id="decision-documentation-principles" class="level3">
<h3 class="anchored" data-anchor-id="decision-documentation-principles">Decision Documentation Principles</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 68%">
<col style="width: 31%">
</colgroup>
<thead>
<tr class="header">
<th>Principle</th>
<th>Why</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Record the WHY, not just the WHAT</strong></td>
<td>Future team understands context</td>
</tr>
<tr class="even">
<td><strong>List alternatives considered</strong></td>
<td>Shows due diligence, aids future revisiting</td>
</tr>
<tr class="odd">
<td><strong>State consequences explicitly</strong></td>
<td>Team knows what they’re accepting</td>
</tr>
<tr class="even">
<td><strong>Assign ownership</strong></td>
<td>Someone monitors if decision remains valid</td>
</tr>
<tr class="odd">
<td><strong>Set review trigger</strong></td>
<td>Prompts re-evaluation when context changes, e.g. “Revisit if traffic exceeds 10K RPS”</td>
</tr>
<tr class="even">
<td><strong>Keep decisions lightweight</strong></td>
<td>A 1-page ADR gets written and read; a 50-page document does not</td>
</tr>
<tr class="odd">
<td><strong>Version decisions</strong></td>
<td>Supersede old ADRs when context changes</td>
</tr>
</tbody>
</table>
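<p>The principles above map naturally onto a small record type. The following is a minimal sketch, not a standard ADR schema: the field names, the <code>supersede</code> and <code>render</code> helpers, and the two example decisions are all hypothetical, chosen only to show how ownership, a review trigger, and versioning can be made explicit.</p>

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class ADR:
    number: int
    title: str
    context: str                     # the WHY: constraints and priorities
    decision: str                    # the WHAT
    alternatives: Tuple[str, ...]    # options that were considered and rejected
    consequences: Tuple[str, ...]    # what the team accepts by deciding this
    owner: str                       # who monitors whether the decision still holds
    review_trigger: str              # condition that forces a revisit
    decided_on: date
    status: str = "accepted"
    superseded_by: Optional[int] = None


def supersede(old: ADR, new: ADR) -> ADR:
    """Return a copy of `old` marked as superseded; the original record is untouched."""
    return replace(old, status="superseded", superseded_by=new.number)


def render(adr: ADR) -> str:
    """Render the ADR as a short plain-text page."""
    status = adr.status
    if adr.superseded_by is not None:
        status += f" (superseded by ADR-{adr.superseded_by:03d})"
    lines = [
        f"ADR-{adr.number:03d}: {adr.title}",
        f"Status: {status} | Owner: {adr.owner} | Decided: {adr.decided_on.isoformat()}",
        f"Context: {adr.context}",
        f"Decision: {adr.decision}",
        "Alternatives: " + "; ".join(adr.alternatives),
        "Consequences: " + "; ".join(adr.consequences),
        f"Review trigger: {adr.review_trigger}",
    ]
    return "\n".join(lines)


# Hypothetical example decisions (assumptions for illustration).
ADR_007 = ADR(
    number=7,
    title="Serve recommendations from batch pre-computation",
    context="Freshness of 24 hours is acceptable; GPU budget is tight.",
    decision="Pre-compute predictions nightly and serve from a key-value store.",
    alternatives=("Real-time inference", "Hybrid with real-time fallback"),
    consequences=("Predictions can be up to one day stale",),
    owner="ml-platform-team",
    review_trigger="Revisit if traffic exceeds 10K RPS",
    decided_on=date(2026, 1, 15),
)
ADR_012 = replace(ADR_007, number=12, title="Move recommendations to real-time inference",
                  decided_on=date(2026, 5, 1))

if __name__ == "__main__":
    print(render(supersede(ADR_007, ADR_012)))
```

<p>Keeping the record immutable and creating a superseded copy, rather than editing in place, preserves the history of why the earlier decision was reasonable at the time.</p>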
</section>
<section id="architecture-fitness-functions" class="level3">
<h3 class="anchored" data-anchor-id="architecture-fitness-functions">Architecture Fitness Functions</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 39%">
<col style="width: 36%">
<col style="width: 23%">
</colgroup>
<thead>
<tr class="header">
<th>Quality Attribute</th>
<th>Fitness Function</th>
<th>Threshold</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Latency</strong></td>
<td>p99 inference latency measured in CI/CD</td>
<td>&lt; 200ms</td>
</tr>
<tr class="even">
<td><strong>Cost</strong></td>
<td>Monthly cloud bill tracked per model</td>
<td>&lt; $X/month per model</td>
</tr>
<tr class="odd">
<td><strong>Availability</strong></td>
<td>Uptime measured over 30-day window</td>
<td>&gt; 99.9%</td>
</tr>
<tr class="even">
<td><strong>Deployability</strong></td>
<td>Time from code merge to production</td>
<td>&lt; 30 minutes</td>
</tr>
<tr class="odd">
<td><strong>Model quality</strong></td>
<td>Automated eval metrics in pipeline</td>
<td>Accuracy &gt; 0.90</td>
</tr>
<tr class="even">
<td><strong>Security</strong></td>
<td>Automated vulnerability scan results</td>
<td>Zero critical findings</td>
</tr>
<tr class="odd">
<td><strong>Coupling</strong></td>
<td>Dependency fan-out per service</td>
<td>&lt; 5 direct dependencies</td>
</tr>
</tbody>
</table>
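<p>A fitness function is only useful if it runs automatically. As a rough sketch of how the thresholds in this table could be checked in a pipeline step, the example below encodes each row as a predicate. The metric keys, the <code>evaluate</code> helper, and the nearest-rank <code>p99</code> calculation are assumptions for illustration; the cost row is left out because its threshold is a placeholder.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FitnessFunction:
    attribute: str
    passes: Callable[[float], bool]


def p99(samples: List[float]) -> float:
    """99th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("p99 needs at least one sample")
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


# Thresholds mirror the table; the metric keys are hypothetical names.
FITNESS_FUNCTIONS: Dict[str, FitnessFunction] = {
    "latency_p99_ms": FitnessFunction("Latency", lambda v: v < 200),
    "availability_pct": FitnessFunction("Availability", lambda v: v > 99.9),
    "deploy_minutes": FitnessFunction("Deployability", lambda v: v < 30),
    "accuracy": FitnessFunction("Model quality", lambda v: v > 0.90),
    "critical_vulns": FitnessFunction("Security", lambda v: v == 0),
    "fan_out": FitnessFunction("Coupling", lambda v: v < 5),
}


def evaluate(measurements: Dict[str, float]) -> List[str]:
    """Return the metric keys that fail; a missing measurement counts as a failure."""
    failures = []
    for key, fitness in FITNESS_FUNCTIONS.items():
        value = measurements.get(key)
        if value is None or not fitness.passes(value):
            failures.append(key)
    return failures


if __name__ == "__main__":
    latencies_ms = [float(x) for x in range(1, 101)]  # stand-in for load-test samples
    measured = {
        "latency_p99_ms": p99(latencies_ms),
        "availability_pct": 99.95,
        "deploy_minutes": 42,
        "accuracy": 0.93,
        "critical_vulns": 0,
        "fan_out": 3,
    }
    failed = evaluate(measured)
    print("failed fitness functions:", failed)
    raise SystemExit(1 if failed else 0)  # non-zero exit fails the pipeline step
```

<p>Treating a missing measurement as a failure is a deliberate choice here: a fitness function that silently passes when its metric stops being reported gives false confidence.</p>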
<hr>
</section>
</section>
<section id="q10-what-are-the-challenges-of-ai-architecture-and-how-do-you-address-them" class="level2">
<h2 class="anchored" data-anchor-id="q10-what-are-the-challenges-of-ai-architecture-and-how-do-you-address-them">Q10: What Are the Challenges of AI Architecture and How Do You Address Them?</h2>
<p><strong>Answer:</strong></p>
<p>AI architecture faces challenges that don’t exist in traditional software — from the inherent uncertainty of ML models to the operational complexity of data-dependent systems. Understanding these challenges and having systematic responses is what separates senior architects from technical leads.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">graph TD
    subgraph Challenges["Key AI Architecture Challenges"]
        UNCERTAINTY["Inherent Uncertainty&lt;br/&gt;(models are probabilistic)"]
        DATA_DEP["Data Dependencies&lt;br/&gt;(upstream changes break models)"]
        FEEDBACK["Feedback Loops&lt;br/&gt;(predictions influence data)"]
        TECHNICAL_DEBT["ML Technical Debt&lt;br/&gt;(glue code, config, entanglement)"]
        REPRODUCIBILITY["Reproducibility&lt;br/&gt;(non-deterministic training)"]
        ORG_CHALLENGE["Organizational&lt;br/&gt;(silos between teams)"]
    end

    subgraph Responses["Architectural Responses"]
        MODULAR["Modular Boundaries&lt;br/&gt;(isolate ML from application)"]
        CONTRACTS["Data Contracts&lt;br/&gt;(explicit interfaces)"]
        OBSERVE["Deep Observability&lt;br/&gt;(detect issues early)"]
        AUTOMATE["Automation&lt;br/&gt;(CI/CD, testing, retraining)"]
        ABSTRACT["Abstraction Layers&lt;br/&gt;(swap components)"]
        CULTURE["Platform Thinking&lt;br/&gt;(self-service for teams)"]
    end

    Challenges --&gt; Responses

    style Challenges fill:#6cc3d5,stroke:#333,color:#fff
    style Responses fill:#56cc9d,stroke:#333,color:#fff
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<section id="challenge-matrix" class="level3">
<h3 class="anchored" data-anchor-id="challenge-matrix">Challenge Matrix</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 16%">
<col style="width: 41%">
</colgroup>
<thead>
<tr class="header">
<th>Challenge</th>
<th>Root Cause</th>
<th>Symptom</th>
<th>Architectural Response</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Model accuracy in prod ≠ offline</strong></td>
<td>Distribution shift, data leakage</td>
<td>Model metrics look great in eval, fail with real users</td>
<td>Shadow testing, A/B testing, continuous monitoring</td>
</tr>
<tr class="even">
<td><strong>Training-serving skew</strong></td>
<td>Different code paths for training vs inference</td>
<td>Silent quality degradation</td>
<td>Feature store, shared preprocessing, end-to-end tests</td>
</tr>
<tr class="odd">
<td><strong>Data dependency fragility</strong></td>
<td>Upstream schema/quality changes unannounced</td>
<td>Model breaks without code change</td>
<td>Data contracts, schema validation, alerting</td>
</tr>
<tr class="even">
<td><strong>Feedback loops</strong></td>
<td>Model predictions influence future training data</td>
<td>Model amplifies biases, creates echo chambers</td>
<td>Feedback detection, diversity injection, holdout groups</td>
</tr>
<tr class="odd">
<td><strong>Configuration complexity</strong></td>
<td>Hyperparams, feature flags, model versions, data versions</td>
<td>Changes cause unexpected interactions</td>
<td>Configuration versioning, canary configs, integration tests</td>
</tr>
<tr class="even">
<td><strong>Undeclared consumers</strong></td>
<td>Other teams start depending on model outputs</td>
<td>Can’t change model without breaking unknown downstream</td>
<td>API contracts, deprecation policies, consumer registry</td>
</tr>
<tr class="odd">
<td><strong>Entanglement</strong></td>
<td>Changing one feature affects other features’ importance</td>
<td>Can’t improve one model without regressing others</td>
<td>Feature importance monitoring, isolated model testing</td>
</tr>
<tr class="even">
<td><strong>Cost explosion</strong></td>
<td>GPU inference at scale, foundation model API calls</td>
<td>Budget overruns, project threatened</td>
<td>Tiered models, caching, batching, cost monitoring</td>
</tr>
</tbody>
</table>
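<p>To make the "data contracts, schema validation" response more concrete, here is a minimal sketch of a contract check that could run at the boundary between an upstream producer and a feature pipeline. It uses only the standard library; the <code>FieldSpec</code> type, the <code>validate_record</code> helper, and the example contract are hypothetical, and a production setup would more likely rely on a dedicated validation tool.</p>

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    dtype: type
    nullable: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate_record(record: Dict[str, Any], contract: Dict[str, FieldSpec]) -> List[str]:
    """Return human-readable contract violations for one record (empty list if none)."""
    violations = []
    for name, spec in contract.items():
        if name not in record:
            violations.append(f"{name}: missing field")
            continue
        value = record[name]
        if value is None:
            if not spec.nullable:
                violations.append(f"{name}: null not allowed")
            continue
        if not isinstance(value, spec.dtype):
            violations.append(
                f"{name}: expected {spec.dtype.__name__}, got {type(value).__name__}"
            )
            continue
        if spec.min_value is not None and value < spec.min_value:
            violations.append(f"{name}: {value} below minimum {spec.min_value}")
        if spec.max_value is not None and value > spec.max_value:
            violations.append(f"{name}: {value} above maximum {spec.max_value}")
    # Fields the producer added without updating the contract are flagged too.
    for name in record:
        if name not in contract:
            violations.append(f"{name}: undeclared field")
    return violations


# Hypothetical contract for an upstream user-profile feed (assumption).
USER_CONTRACT = {
    "user_id": FieldSpec(str),
    "age": FieldSpec(int, min_value=0, max_value=120),
    "country": FieldSpec(str, nullable=True),
}

if __name__ == "__main__":
    drifted = {"user_id": "u1", "age": "34", "country": "DE", "segment": "a"}
    for problem in validate_record(drifted, USER_CONTRACT):
        print(problem)
```

<p>Wiring the violation list into alerting turns "model breaks without code change" into an explicit, attributable event at the data boundary.</p>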
</section>
<section id="ml-technical-debt-categories-from-googles-paper" class="level3">
<h3 class="anchored" data-anchor-id="ml-technical-debt-categories-from-googles-paper">ML Technical Debt Categories (adapted from Google’s “Hidden Technical Debt in Machine Learning Systems” paper)</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 34%">
<col style="width: 28%">
<col style="width: 37%">
</colgroup>
<thead>
<tr class="header">
<th>Debt Type</th>
<th>Example</th>
<th>Prevention</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Glue code</strong></td>
<td>95% glue, 5% ML code</td>
<td>Standardized interfaces, SDK</td>
</tr>
<tr class="even">
<td><strong>Pipeline jungles</strong></td>
<td>Spaghetti data preparation</td>
<td>Managed pipelines, lineage tracking</td>
</tr>
<tr class="odd">
<td><strong>Dead experimental code</strong></td>
<td>Unused model variants in codebase</td>
<td>Regular cleanup, feature flags</td>
</tr>
<tr class="even">
<td><strong>Data testing debt</strong></td>
<td>No validation on training data</td>
<td>Great Expectations, schema tests</td>
</tr>
<tr class="odd">
<td><strong>Configuration debt</strong></td>
<td>Hardcoded paths, magic numbers</td>
<td>Config management, parameterization</td>
</tr>
<tr class="even">
<td><strong>Reproducibility debt</strong></td>
<td>Can’t recreate past results</td>
<td>DVC, MLflow, seed management</td>
</tr>
<tr class="odd">
<td><strong>Monitoring debt</strong></td>
<td>No drift detection, no alerting</td>
<td>Observability from day one</td>
</tr>
<tr class="even">
<td><strong>Abstraction debt</strong></td>
<td>No clean interfaces between components</td>
<td>Hexagonal architecture, ports/adapters</td>
</tr>
</tbody>
</table>
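<p>The reproducibility row can be illustrated without any particular tool. The sketch below is a plain standard-library stand-in, not how DVC or MLflow work internally: it fingerprints a run from its configuration, data, and seed, and uses a local seeded random generator so that the same inputs give the same result. The function names and the toy "training" step are assumptions for this example.</p>

```python
import hashlib
import json
import random


def run_fingerprint(config: dict, data: bytes, seed: int) -> str:
    """Hash everything that determines a run: config, seed, and the raw data bytes."""
    # sort_keys makes the hash independent of dictionary insertion order.
    header = json.dumps({"config": config, "seed": seed}, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256()
    digest.update(header)
    digest.update(data)
    return digest.hexdigest()


def toy_train(config: dict, data: bytes, seed: int) -> float:
    """Stand-in for training: a pseudo-metric that depends only on its inputs."""
    rng = random.Random(seed)  # local generator, so no hidden global random state
    noise = rng.random() * config["noise_scale"]
    return len(data) * config["learning_rate"] + noise


if __name__ == "__main__":
    config = {"learning_rate": 0.01, "noise_scale": 0.1}
    data = b"example training data"
    first = toy_train(config, data, seed=42)
    second = toy_train(config, data, seed=42)
    print("fingerprint:", run_fingerprint(config, data, seed=42))
    print("identical results:", first == second)
```

<p>Storing the fingerprint next to each model artifact gives a cheap test for "can we recreate this result": if the fingerprint of a rerun differs, some input changed, and the hash inputs indicate where to look.</p>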
</section>
<section id="organizational-challenges" class="level3">
<h3 class="anchored" data-anchor-id="organizational-challenges">Organizational Challenges</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 36%">
<col style="width: 30%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Challenge</th>
<th>Symptom</th>
<th>Solution</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Data scientist ↔︎ Engineer gap</strong></td>
<td>“Works in notebook” can’t go to production</td>
<td>MLOps platform, shared tooling, embedded engineers</td>
</tr>
<tr class="even">
<td><strong>No ownership model</strong></td>
<td>Model in production with no team responsible</td>
<td>Clear RACI, model ownership policy</td>
</tr>
<tr class="odd">
<td><strong>Competing priorities</strong></td>
<td>Data team, ML team, platform team misaligned</td>
<td>Shared OKRs, architecture council, regular syncs</td>
</tr>
<tr class="even">
<td><strong>Skill scarcity</strong></td>
<td>Few people understand full stack</td>
<td>Platform abstractions, documentation, enablement</td>
</tr>
<tr class="odd">
<td><strong>Experimentation vs stability</strong></td>
<td>Data scientists want flexibility, ops wants stability</td>
<td>Separate experiment/production environments with promotion gates</td>
</tr>
</tbody>
</table>
</section>
<section id="architecture-maturity-model" class="level3">
<h3 class="anchored" data-anchor-id="architecture-maturity-model">Architecture Maturity Model</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 18%">
<col style="width: 35%">
<col style="width: 45%">
</colgroup>
<thead>
<tr class="header">
<th>Level</th>
<th>Description</th>
<th>Characteristics</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>0: Ad-hoc</strong></td>
<td>Manual everything, notebooks to production</td>
<td>No CI/CD, no monitoring, hero mode</td>
</tr>
<tr class="even">
<td><strong>1: Repeatable</strong></td>
<td>Automated training pipeline, basic serving</td>
<td>Scripts, cron jobs, manual deployment</td>
</tr>
<tr class="odd">
<td><strong>2: Defined</strong></td>
<td>Standard platform, CI/CD, monitoring</td>
<td>ML platform, model registry, defined process</td>
</tr>
<tr class="even">
<td><strong>3: Managed</strong></td>
<td>Metrics-driven, SLAs, auto-retraining</td>
<td>Continuous training, A/B testing, cost tracking</td>
</tr>
<tr class="odd">
<td><strong>4: Optimized</strong></td>
<td>Self-improving, multi-model orchestration</td>
<td>AutoML, automated architecture search, ML-driven ops</td>
</tr>
</tbody>
</table>
<hr>
</section>
</section>
<section id="summary-table" class="level2">
<h2 class="anchored" data-anchor-id="summary-table">Summary Table</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 13%">
<col style="width: 30%">
<col style="width: 56%">
</colgroup>
<thead>
<tr class="header">
<th>#</th>
<th>Topic</th>
<th>Key Concept</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td><strong>Requirements Cartography</strong></td>
<td>Map stakeholders, NFRs, constraints, and dependencies before design</td>
</tr>
<tr class="even">
<td>2</td>
<td><strong>AI System Design</strong></td>
<td>Layered architecture (data → ML → serving → observability → orchestration)</td>
</tr>
<tr class="odd">
<td>3</td>
<td><strong>Tech Stack Selection</strong></td>
<td>Weighted evaluation matrix + ADRs + prototype critical paths</td>
</tr>
<tr class="even">
<td>4</td>
<td><strong>Cost-Latency-Quality</strong></td>
<td>Three-way trade-off; cascade, cache, quantize to optimize</td>
</tr>
<tr class="odd">
<td>5</td>
<td><strong>Security &amp; Compliance</strong></td>
<td>AI-specific threats (adversarial, injection, leakage) + zero-trust</td>
</tr>
<tr class="even">
<td>6</td>
<td><strong>Scalability &amp; Concurrency</strong></td>
<td>GPU-aware scaling, dynamic batching, queue-based decoupling</td>
</tr>
<tr class="odd">
<td>7</td>
<td><strong>Risk Management</strong></td>
<td>Risk register, graceful degradation, circuit breakers, rollback</td>
</tr>
<tr class="even">
<td>8</td>
<td><strong>Roadmap &amp; Phases</strong></td>
<td>Discovery → PoC → MVP → Production → Scale; go/no-go gates</td>
</tr>
<tr class="odd">
<td>9</td>
<td><strong>Trade-off Decisions</strong></td>
<td>ADRs, ATAM, fitness functions; document WHY not just WHAT</td>
</tr>
<tr class="even">
<td>10</td>
<td><strong>Challenges</strong></td>
<td>ML debt, feedback loops, training-serving skew, org silos</td>
</tr>
</tbody>
</table>
<hr>
</section>
<section id="whats-next" class="level2">
<h2 class="anchored" data-anchor-id="whats-next">What’s Next?</h2>
<p>This article covered the strategic and operational dimensions of AI architecture. For related content:</p>
<ul>
<li><strong>System design patterns:</strong> <a href="../../posts/system-design/System-Design-Interview-QA-1.html">System Design Interview QA - 1</a></li>
<li><strong>Design patterns:</strong> <a href="../../posts/design-pattern/Design-Pattern-Interview-QA-1.html">Design Pattern Interview QA - 1</a></li>
<li><strong>MLOps fundamentals:</strong> <a href="../../posts/aiops-interview/MLOps-Interview-QA-1.html">MLOps Interview QA - 1</a></li>
<li><strong>Cloud-agnostic MLOps tools:</strong> <a href="../../posts/aiops-interview/MLOps-Interview-QA-5.html">MLOps Interview QA - 5</a></li>
<li><strong>LLMOps:</strong> <a href="../../posts/aiops-interview/LLMOps-Interview-QA-1.html">LLMOps Interview QA - 1</a></li>
<li><strong>Python production APIs:</strong> <a href="../../posts/swe-interview/Python-SWE-Interview-QA-4.html">Python SWE Interview QA - 4</a></li>
</ul>


</section>

 ]]></description>
  <guid>https://vectoringai.com/posts/ai-architect/AI-Architect-Interview-QA-1.html</guid>
  <pubDate>Thu, 21 May 2026 00:00:00 GMT</pubDate>
  <media:content url="https://vectoringai.com/images/ai-architect/thumb_ai_architect_interview_qa_300.png" medium="image" type="image/png" height="96" width="144"/>
</item>
</channel>
</rss>
