LLM Abstraction Layers: How to Survive When a New Model Ships Every 48 Hours

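A new model ships every 48 hours.

GPT-5.4 comes out → "oh, this one's the best" → you want to switch
But the whole codebase is written against the Anthropic SDK

To switch, you have to:
- Swap the SDK
- Rewrite every call to match the new API format
- Re-tune the prompts
- Re-run every test
- Budget two weeks of work

That's vendor lock-in.

With an abstraction layer, on the other hand:

# Change this and you're done
MODEL = "openai/gpt-5.4"  # previously "anthropic/claude-sonnet-4-6"

One line of code. Two weeks becomes one second.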

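Why this matters now

Things that have actually happened:
- DALL-E 3 support ends in May 2026 → migrate within weeks
- GPT-4 pricing suddenly goes up 30% → unit economics collapse
- Anthropic API goes down for 6 hours → the entire service stops
- A competitor ships a 10x faster model at half the price → you can't switch

Change in enterprise LLM API market share, per Anthropic:
2023: OpenAI 50%, Anthropic 20%
2026: OpenAI 27%, Anthropic 40%

→ The landscape shifted. It will shift just as much two years from now.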

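Design principles for the abstraction layer

There's only one that really matters.

Business logic must not know which LLM provider it's talking to.

# Bad: the provider is baked into the code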
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}]
)
text = response.content[0].text

# Good: the code doesn't know the provider
text = llm.generate(prompt)
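
One way to keep that boundary honest is to have business logic accept anything that has a `generate` method. The sketch below is only an illustration: `TextGenerator`, `FakeLLM`, and `summarize_ticket` are hypothetical names, and `generate` is assumed to return a plain string, as in the one-liner above.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Anything that can turn a prompt into text (hypothetical interface)."""

    def generate(self, prompt: str) -> str: ...


class FakeLLM:
    """Test double: records prompts and returns a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    def generate(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def summarize_ticket(llm: TextGenerator, ticket_body: str) -> str:
    """Business logic: no SDK import, no model name, no vendor response shape."""
    prompt = f"Summarize this support ticket in one sentence:\n\n{ticket_body}"
    return llm.generate(prompt).strip()


fake = FakeLLM(reply="  Customer cannot log in after password reset.  ")
summary = summarize_ticket(fake, "I reset my password and now login fails.")
print(summary)
```

Because `summarize_ticket` depends only on the shape of the object it receives, the same function can run against a real provider in production and against `FakeLLM` in unit tests.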

Approach 1: Build your own abstraction class

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class LLMResponse:
    text: str
    model: str
    input_tokens: int
    output_tokens: int

class BaseLLM(ABC):
    """Common interface for every LLM provider"""

    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> LLMResponse:
        pass

    @abstractmethod
    def generate_with_system(
        self,
        system: str,
        prompt: str,
        **kwargs
    ) -> LLMResponse:
        pass

Anthropic implementation

import anthropic as _anthropic

class AnthropicLLM(BaseLLM):
    def __init__(self, model: str = "claude-sonnet-4-6"):
        self.client = _anthropic.Anthropic()
        self.model = model

    def generate(self, prompt: str, **kwargs) -> LLMResponse:
        response = self.client.messages.create(
            model=self.model,
            max_tokens=kwargs.get("max_tokens", 1024),
            messages=[{"role": "user", "content": prompt}]
        )
        return LLMResponse(
            text=response.content[0].text,
            model=self.model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
        )

    def generate_with_system(
        self, system: str, prompt: str, **kwargs
    ) -> LLMResponse:
        response = self.client.messages.create(
            model=self.model,
            max_tokens=kwargs.get("max_tokens", 1024),
            system=system,
            messages=[{"role": "user", "content": prompt}]
        )
        return LLMResponse(
            text=response.content[0].text,
            model=self.model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
        )

OpenAI implementation

from openai import OpenAI as _OpenAI

class OpenAILLM(BaseLLM):
    def __init__(self, model: str = "gpt-5.4-mini"):
        self.client = _OpenAI()
        self.model = model

    def generate(self, prompt: str, **kwargs) -> LLMResponse:
        response = self.client.chat.completions.create(
            model=self.model,
            # max_tokens is deprecated in the Chat Completions API
            max_completion_tokens=kwargs.get("max_tokens", 1024),
            messages=[{"role": "user", "content": prompt}]
        )
        return LLMResponse(
            text=response.choices[0].message.content,
            model=self.model,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
        )

    def generate_with_system(
        self, system: str, prompt: str, **kwargs
    ) -> LLMResponse:
        response = self.client.chat.completions.create(
            model=self.model,
            # max_tokens is deprecated in the Chat Completions API
            max_completion_tokens=kwargs.get("max_tokens", 1024),
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": prompt}
            ]
        )
        return LLMResponse(
            text=response.choices[0].message.content,
            model=self.model,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
        )
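
Since both implementations normalize token counts into the same `LLMResponse` fields, usage can be tracked without knowing the provider. A minimal sketch, assuming a wrapper named `UsageTrackingLLM` and a stand-in `EchoLLM` (both hypothetical, defined here so the block runs without any SDK):

```python
from dataclasses import dataclass


@dataclass
class LLMResponse:
    # Same shape as the LLMResponse defined earlier in the article.
    text: str
    model: str
    input_tokens: int
    output_tokens: int


class EchoLLM:
    """Stand-in backend (hypothetical): echoes the prompt, counts words as tokens."""

    def generate(self, prompt: str, **kwargs) -> LLMResponse:
        n = len(prompt.split())
        return LLMResponse(text=prompt, model="echo", input_tokens=n, output_tokens=n)


class UsageTrackingLLM:
    """Wraps any object with generate() and accumulates token counts."""

    def __init__(self, inner) -> None:
        self.inner = inner
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        self.calls = 0

    def generate(self, prompt: str, **kwargs) -> LLMResponse:
        response = self.inner.generate(prompt, **kwargs)
        self.total_input_tokens += response.input_tokens
        self.total_output_tokens += response.output_tokens
        self.calls += 1
        return response


tracked = UsageTrackingLLM(EchoLLM())
tracked.generate("hello there")
tracked.generate("one two three")
print(tracked.calls, tracked.total_input_tokens, tracked.total_output_tokens)
```

The wrapper never looks at which vendor produced the response, so it keeps working when the backend underneath is swapped.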

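Factory: control it from environment variables

import os
from typing import Optional

def create_llm(provider: Optional[str] = None, model: Optional[str] = None) -> BaseLLM:
    """
    Select the LLM provider from environment variables.
    Switch providers without modifying the code.
    """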
    provider = provider or os.getenv("LLM_PROVIDER", "anthropic")
    model = model or os.getenv("LLM_MODEL")

    providers = {
        "anthropic": lambda: AnthropicLLM(
            model or "claude-sonnet-4-6"
        ),
        "openai": lambda: OpenAILLM(
            model or "gpt-5.4-mini"
        ),
    }

    if provider not in providers:
        raise ValueError(f"Unsupported provider: {provider}")

    return providers[provider]()

# Usage: business logic knows nothing about the provider
llm = create_llm()
response = llm.generate("Implement quicksort in Python")
print(response.text)

Switching models:

# Using Claude
LLM_PROVIDER=anthropic LLM_MODEL=claude-sonnet-4-6 python app.py

# Switch to GPT without changing a single line of code
LLM_PROVIDER=openai LLM_MODEL=gpt-5.4-mini python app.py

# Switch to Gemini (the factory needs a "google" entry and a matching BaseLLM subclass first)
LLM_PROVIDER=google LLM_MODEL=gemini-3.1-flash python app.py
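
The `create_llm` factory shown earlier only knows `anthropic` and `openai`, so a new provider has to be added before `LLM_PROVIDER=google` can work. One way to allow that without editing the factory each time is a small registry. This is a sketch under assumptions: `register_provider`, `create_llm_from_registry`, and `DummyLLM` are hypothetical names, and `DummyLLM` stands in for a real Gemini-backed class so the block runs without any SDK.

```python
import os
from typing import Callable, Dict, Optional

# Maps provider name -> (class taking a model name, default model).
_PROVIDERS: Dict[str, tuple] = {}


def register_provider(name: str, default_model: str) -> Callable:
    """Class decorator that adds a provider to the registry."""

    def decorator(cls):
        _PROVIDERS[name] = (cls, default_model)
        return cls

    return decorator


@register_provider("dummy", default_model="dummy-small")
class DummyLLM:
    """Placeholder backend; a real one would call a vendor SDK here."""

    def __init__(self, model: str) -> None:
        self.model = model

    def generate(self, prompt: str, **kwargs) -> str:
        return f"[{self.model}] {prompt}"


def create_llm_from_registry(provider: Optional[str] = None, model: Optional[str] = None):
    """Resolve provider and model from arguments first, then environment variables."""
    provider = provider or os.getenv("LLM_PROVIDER", "dummy")
    if provider not in _PROVIDERS:
        known = ", ".join(sorted(_PROVIDERS))
        raise ValueError(f"Unsupported provider: {provider} (known: {known})")
    cls, default_model = _PROVIDERS[provider]
    return cls(model or os.getenv("LLM_MODEL") or default_model)


llm = create_llm_from_registry(provider="dummy", model="dummy-large")
print(llm.generate("ping"))
```

With this shape, adding a provider is one decorated class in its own module, and the error message lists what is actually registered.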

Approach 2: LiteLLM in 10 minutes (recommended)

If rolling your own feels like a chore, LiteLLM handles all of it.

pip install litellm

import litellm
import os

def generate(prompt: str, model: str = None) -> str:
    """
    Change only the model name and the provider switches automatically.
    No change to the code structure.
    """
    model = model or os.getenv("LLM_MODEL", "claude-sonnet-4-6")

    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

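# LiteLLM infers the provider from the model name;
# a "provider/model" prefix makes the choice explicit.

# Claude
result = generate("Hello", model="claude-sonnet-4-6")

# GPT: same function, only the model name changes
result = generate("Hello", model="gpt-5.4-mini")

# Gemini: same again
result = generate("Hello", model="gemini/gemini-3.1-flash")

# Local Ollama
result = generate("Hello", model="ollama/llama4")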

Automatic fallback setup

import litellm

# If Claude fails, try GPT; if GPT fails, try Gemini
response = litellm.completion(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": prompt}],
    fallbacks=[
        "gpt-5.4-mini",
        "gemini/gemini-3.1-flash",
    ],
    num_retries=2,
    timeout=30,
)
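
If you went with the hand-written classes from Approach 1 instead of LiteLLM, the same idea can be sketched as a wrapper that tries backends in order. The names `FallbackLLM`, `FlakyLLM`, and `StableLLM` are hypothetical, and the stand-in backends exist only so the block runs without network access. Catching bare `Exception` is a simplification; real code would probably catch the SDK's specific error types.

```python
class AllBackendsFailed(RuntimeError):
    """Raised when every backend in the chain raised an error."""


class FallbackLLM:
    """Tries each backend in order and returns the first successful result."""

    def __init__(self, backends: list) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = backends

    def generate(self, prompt: str, **kwargs) -> str:
        errors = []
        for backend in self.backends:
            try:
                return backend.generate(prompt, **kwargs)
            except Exception as exc:  # simplification: narrow this in real code
                errors.append(f"{type(backend).__name__}: {exc}")
        raise AllBackendsFailed("; ".join(errors))


class FlakyLLM:
    """Stand-in backend that always fails, simulating an outage."""

    def generate(self, prompt: str, **kwargs) -> str:
        raise ConnectionError("simulated outage")


class StableLLM:
    """Stand-in backend that always succeeds."""

    def generate(self, prompt: str, **kwargs) -> str:
        return f"ok: {prompt}"


llm = FallbackLLM([FlakyLLM(), StableLLM()])
result = llm.generate("hello")
print(result)
```

Because `FallbackLLM` exposes the same `generate` method as the backends it wraps, callers cannot tell whether they hold one provider or a chain of them.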

Approach 3: Even simpler with OpenRouter

No infrastructure required, just this:

import os

from openai import OpenAI

# This alone gives you access to 200+ models
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

def generate(prompt: str) -> str:
    model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-6")

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Claude
LLM_MODEL=anthropic/claude-sonnet-4-6 python app.py

# GPT
LLM_MODEL=openai/gpt-5.4-mini python app.py

# Free Llama
LLM_MODEL=meta-llama/llama-4-scout:free python app.py

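Prompts need abstracting too

Prompts are what break most often when you switch models.

# Bad: depends on one model's quirks
prompt = """
<claude_context>
You are a FastAPI expert.
Always wrap your response in XML tags.
Reason inside <thinking> tags first.
</claude_context>
"""

# Good: provider-independent
SYSTEM_PROMPT = "You are an expert FastAPI developer."
USER_PROMPT = "Implement JWT authentication in FastAPI"

Move prompts out of the code:

# prompts/code_review.yaml
system: "You are a senior software engineer."
user_template: |
  Review the following code:

  {code}

  Analyze it for security, performance, and readability.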

import yaml

def load_prompt(name: str) -> dict:
    with open(f"prompts/{name}.yaml") as f:
        return yaml.safe_load(f)

prompt = load_prompt("code_review")
response = llm.generate_with_system(
    system=prompt["system"],
    prompt=prompt["user_template"].format(code=my_code)
)
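
One fragile spot in a `str.format`-based template is that a missing variable surfaces as a bare `KeyError`, and literal braces in the template itself (a JSON example, say) need escaping as `{{` and `}}`. Here is a minimal sketch of a stricter renderer using only the standard library. `render_prompt` and `PromptError` are hypothetical names, and the template is inlined instead of read from YAML so the block is self-contained.

```python
from string import Formatter


class PromptError(ValueError):
    """Raised when a template and its variables do not match."""


def required_fields(template: str) -> set:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_prompt(template: str, **variables: str) -> str:
    """Fill the template, failing loudly on missing or unused variables."""
    needed = required_fields(template)
    provided = set(variables)
    missing = needed - provided
    unused = provided - needed
    if missing:
        raise PromptError(f"missing variables: {sorted(missing)}")
    if unused:
        raise PromptError(f"unused variables: {sorted(unused)}")
    return template.format(**variables)


USER_TEMPLATE = "Review the following code:\n\n{code}\n\nFocus on security."

# Braces inside the substituted value are fine; only the template is parsed.
rendered = render_prompt(USER_TEMPLATE, code="def f(x): return {'x': x}")
print(rendered)
```

Checking the variables up front means a renamed placeholder in the YAML file fails at render time with a readable message instead of reaching the model half-filled.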

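Escaping when you're already locked in

# Step 1: Find every piece of provider-dependent code
grep -r "anthropic" src/
grep -r "from openai" src/
grep -r "claude" src/
grep -r "gpt-" src/

# Step 2: Wrap direct calls in the abstraction layer

# Before
response = anthropic_client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": prompt}]
)
text = response.content[0].text

# After
text = llm.generate(prompt).text
# llm can be backed by any provider

# Step 3: Test the model switch in staging
import os

# Set one value per environment, not both in the same process.
# With the Approach 1 factory, switch LLM_PROVIDER to match as well.

# Staging: test with GPT
os.environ["LLM_MODEL"] = "gpt-5.4-mini"

# Production: keep Claude
os.environ["LLM_MODEL"] = "claude-sonnet-4-6"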
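
Setting an environment variable only switches the model; the staging run still has to check that the output holds up. One way to sketch that is a small contract check run against every candidate backend. Everything here is hypothetical scaffolding: `check_backend`, the two stand-in backends, and the specific checks are placeholders for whatever your product actually requires.

```python
from typing import Callable, Dict, List, Tuple

# A test case is a prompt plus a predicate the reply must satisfy.
Case = Tuple[str, Callable[[str], bool]]

CASES: List[Case] = [
    ("Reply with the single word: pong", lambda out: "pong" in out.lower()),
    ("Name one Python web framework.", lambda out: len(out.strip()) > 0),
]


def check_backend(generate: Callable[[str], str], cases: List[Case]) -> List[str]:
    """Run every case and return the prompts whose replies failed the check."""
    failures = []
    for prompt, is_ok in cases:
        try:
            reply = generate(prompt)
        except Exception:  # an exception counts as a failed case
            failures.append(prompt)
            continue
        if not is_ok(reply):
            failures.append(prompt)
    return failures


def good_backend(prompt: str) -> str:
    """Stand-in for a model that follows instructions."""
    return "Pong" if "pong" in prompt else "FastAPI"


def chatty_backend(prompt: str) -> str:
    """Stand-in for a model that ignores the first instruction."""
    return "Sure! Happy to help." if "pong" in prompt else "Django"


report: Dict[str, List[str]] = {
    "good": check_backend(good_backend, CASES),
    "chatty": check_backend(chatty_backend, CASES),
}
print(report)
```

In a real staging run the two stand-ins would be replaced by `llm.generate` calls bound to each candidate model, and a non-empty failure list would block the switch.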

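How to choose an abstraction layer

Roll your own:
→ You need fine-grained control
→ You have unusual requirements
→ You want minimal dependencies

LiteLLM:
→ Quick to get started
→ Supports 100+ models
→ Automatic fallback and logging built in
→ The most practical choice

OpenRouter:
→ No infrastructure to manage
→ 200+ models
→ Free models available
→ Prototypes and personal projects

Dedicated gateway (Portkey, TrueFoundry):
→ Enterprise environments
→ RBAC and audit logs required
→ Data compliance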

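Conclusion

Why is switching LLMs so hard?
→ Because nobody abstracted from the start

Principles:
1. Business logic must not know the provider
2. Model selection lives in config files or environment variables
3. Prompts live outside the code
4. Plan the fallback strategy from day one

Do this right now:
grep -rl "anthropic\|openai\|claude\|gpt-" src/ | wc -l
→ The number it prints is the count of affected files, and that's your migration cost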
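
The same count can be produced from Python when you want per-file detail in a script or CI check. A rough sketch: `count_vendor_files` is a hypothetical helper, the pattern mirrors the grep above, and scanning only `.py` files is an assumption about the codebase.

```python
import re
import tempfile
from pathlib import Path

# Same alternatives as the grep pattern, case-sensitive.
VENDOR_PATTERN = re.compile(r"anthropic|openai|claude|gpt-")


def count_vendor_files(root: Path) -> list:
    """Return sorted paths of .py files under root that mention a vendor."""
    hits = []
    for path in root.rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if VENDOR_PATTERN.search(text):
            hits.append(path)
    return sorted(hits)


# Demo on a throwaway directory so the block runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "billing.py").write_text("import anthropic\n", encoding="utf-8")
    (src / "utils.py").write_text("def add(a, b):\n    return a + b\n", encoding="utf-8")
    (src / "chat.py").write_text('MODEL = "gpt-5.4-mini"\n', encoding="utf-8")
    matched = [p.name for p in count_vendor_files(src)]

print(len(matched), matched)
```

Running it as a CI step and failing when the list grows is one way to stop new direct SDK calls from creeping back in after the migration.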


Cell DEVLOG
