The Anthropic API

Everything you actually use when shipping an app on Claude — auth, the model picker, prompt caching, extended thinking, tool use, vision, PDFs, files, batch processing, citations, and the cost levers that decide your bill.

1. Setup & first call

  1. Sign in at console.anthropic.com.
  2. Settings → API Keys → Create Key. Copy it once.
  3. Add billing + set a monthly usage limit.
  4. Export ANTHROPIC_API_KEY in your shell.

# Python
pip install "anthropic>=0.40"
# Node
npm i @anthropic-ai/sdk

from anthropic import Anthropic
client = Anthropic()

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You are a concise senior engineer.",
    messages=[{"role": "user", "content": "Refactor this for clarity: ..."}],
)
print(resp.content[0].text)
print(resp.usage)  # input_tokens, output_tokens, cache_*

2. Model IDs & picking the right one

Model | ID | Best for
Opus 4.7 | claude-opus-4-7 | Hard reasoning, large refactors, multi-step planning.
Opus 4.7 (1M) | claude-opus-4-7[1m] | Same model, 1M-token context. Use for huge codebases or long docs.
Sonnet 4.6 | claude-sonnet-4-6 | Daily workhorse. Strong + fast + cheap. Default for most apps.
Haiku 4.5 | claude-haiku-4-5-20251001 | High-volume, latency-sensitive: classification, routing, autocomplete.

See the Models page for the full comparison, benchmarks, and a decision flowchart.

Default to Sonnet. Upgrade to Opus only when Sonnet visibly struggles. Downgrade to Haiku when the task is bounded and a million calls/day is the norm.
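
The rule of thumb above can be written down as a small routing helper. This is a minimal sketch: the `Task` fields and the one-million threshold are illustrative assumptions, not anything the API defines, and the model IDs are the ones listed in the table.

```python
from dataclasses import dataclass

# Model IDs copied from the table above.
OPUS = "claude-opus-4-7"
SONNET = "claude-sonnet-4-6"
HAIKU = "claude-haiku-4-5-20251001"


@dataclass
class Task:
    """Hypothetical description of a workload; field names are illustrative."""
    bounded: bool = False           # narrow, well-specified output (labels, routes)
    calls_per_day: int = 0          # expected request volume
    sonnet_struggles: bool = False  # set from your own evals, not guessed


def pick_model(task: Task, high_volume: int = 1_000_000) -> str:
    """Default to Sonnet; move to Opus or Haiku only when there is a reason."""
    if task.sonnet_struggles:
        return OPUS
    if task.bounded and task.calls_per_day >= high_volume:
        return HAIKU
    return SONNET
```

Keeping the decision in one function makes it easy to change the default later without touching call sites.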

3. Prompt caching (always-on cost cut)

The single biggest cost lever. Mark any chunk of your prompt with cache_control: {"type": "ephemeral"} and Anthropic caches it for ~5 minutes. Cache hits are ~90% cheaper and faster.

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a senior engineer."},
        {
            "type": "text",
            "text": LONG_CODEBASE_CONTEXT,            # 200k tokens of docs / code
            "cache_control": {"type": "ephemeral"},  # <-- cache this
        },
    ],
    messages=[{"role": "user", "content": "What does foo() do?"}],
)

The next call within 5 minutes that has the same prefix gets the cache hit. Each hit refreshes the TTL, so a prefix that is hit steadily can stay cached for hours.

Rule of thumb: if you're sending the same 10k+ tokens twice in five minutes, you're losing money by not caching.
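
The arithmetic behind that rule can be sketched in a few lines. It assumes the multipliers from the pricing table on this page (cache write at +25% of the base input price, cache read at −90%) and that every call after the first lands inside the TTL; the function name and parameters are illustrative.

```python
def relative_input_cost(calls: int, cached: bool,
                        write_mult: float = 1.25,
                        read_mult: float = 0.10) -> float:
    """Cost of sending the same prefix `calls` times, in units of one uncached send."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(calls)
    # The first call writes the cache; every later call reads it.
    return write_mult + (calls - 1) * read_mult


for n in (1, 2, 10):
    uncached = relative_input_cost(n, cached=False)
    cached = relative_input_cost(n, cached=True)
    print(f"{n} calls: uncached={uncached:.2f} cached={cached:.2f}")
```

Under these assumptions a single call costs 25% more with caching on, and from the second call onward the cached path is cheaper (1.35 versus 2.00 units for two calls).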

4. Extended thinking

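Give Claude scratchpad tokens before its final answer. Pays off on math, multi-step reasoning, and code planning. Opus, Sonnet, and Haiku 4.5 support it; older Haiku models do not. The thinking budget counts toward max_tokens, so max_tokens must be larger than budget_tokens.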

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,  # must exceed budget_tokens, which counts toward this limit
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": HARD_PROBLEM}],
)

for block in resp.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)

5. Tool use

The mechanism that turns Claude into an agent. You define tools as JSON schemas; the model picks when and how to call them.

tools = [{
    "name": "get_weather",
    "description": "Current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's it like in Tokyo?"}],
)

# If resp.stop_reason == "tool_use", run the tool and send tool_result back
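
The round trip described in that comment can be sketched as a small loop. This is a sketch under assumptions: `get_weather` is a stub, and `TOOL_IMPLS`, `run_tool_calls`, and `agent_loop` are hypothetical helper names, not part of the SDK. The `tool_use` and `tool_result` block shapes follow the Messages API.

```python
import json


def get_weather(city: str) -> str:
    """Stand-in implementation; a real app would call a weather service here."""
    return json.dumps({"city": city, "conditions": "unknown (stub)"})


# Maps tool names from the schema list to local Python callables (hypothetical).
TOOL_IMPLS = {"get_weather": get_weather}


def run_tool_calls(content_blocks) -> list:
    """Execute every tool_use block and build the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if getattr(block, "type", None) != "tool_use":
            continue
        impl = TOOL_IMPLS.get(block.name)
        if impl is None:
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": f"Unknown tool: {block.name}", "is_error": True})
            continue
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": impl(**block.input)})
    return results


def agent_loop(client, model, tools, messages, max_turns: int = 5):
    """Call the model, run requested tools, and repeat until it stops asking."""
    for _ in range(max_turns):
        resp = client.messages.create(model=model, max_tokens=1024,
                                      tools=tools, messages=messages)
        if resp.stop_reason != "tool_use":
            return resp
        # Echo the assistant turn back, then answer it with the tool results.
        messages.append({"role": "assistant", "content": resp.content})
        messages.append({"role": "user", "content": run_tool_calls(resp.content)})
    raise RuntimeError("tool loop did not finish within max_turns")
```

The `max_turns` cap is a design choice to keep a misbehaving loop from running up the bill; tune it to the longest tool chain you expect.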

6. Vision (images)

import base64, httpx
img = base64.b64encode(httpx.get("https://.../diagram.png").content).decode()

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": img}},
            {"type": "text", "text": "What does this architecture diagram describe?"},
        ],
    }],
)

Supports PNG, JPEG, GIF, WebP. Up to 100 images per request. Use URL sources ("type": "url") when the file is already public — saves you the base64 dance.
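
To choose between the two source types without repeating the logic at every call site, one option is a small builder. A minimal sketch: `image_block` and `EXT_TO_MEDIA` are hypothetical names, and the extension map covers only the four formats listed above.

```python
import base64
from pathlib import Path

# File extension -> media type for the formats listed above.
EXT_TO_MEDIA = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(src: str) -> dict:
    """Build an image content block from a public URL or a local file path."""
    if src.startswith(("http://", "https://")):
        # Public file: let the API fetch it instead of inlining the bytes.
        return {"type": "image", "source": {"type": "url", "url": src}}
    media_type = EXT_TO_MEDIA.get(Path(src).suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported image type: {src!r}")
    data = base64.b64encode(Path(src).read_bytes()).decode()
    return {"type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data}}
```

The returned dict drops straight into the `content` list of a user message, next to the text block.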

7. PDF input

Send PDFs directly. Claude sees text and visual layout — better than passing only OCR'd text.

import base64

# Read and encode the PDF; the context manager closes the file handle
with open("contract.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64}},
            {"type": "text", "text": "Summarize the indemnity clause and flag risks."},
        ],
    }],
)

8. Files API

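Upload once, reference many times. Useful for large context docs and reusable PDFs. The upload call takes the file itself and has no purpose argument; skills go through a separate beta upload flow, so check the Skills docs for the current call. The example uploads a PDF and references it by file_id.

with open("contract.pdf", "rb") as f:
    file = client.beta.files.upload(file=("contract.pdf", f, "application/pdf"))

resp = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    betas=["files-api-2025-04-14"],
    messages=[{
        "role": "user",
        "content": [
            {"type": "document", "source": {"type": "file", "file_id": file.id}},
            {"type": "text", "text": "Summarize this document."},
        ],
    }],
)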

9. Batch API (50% off)

Submit up to 100k requests, get results within 24 hours, at half the per-token cost. The right tool for evals, backfills, bulk classification.

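batch = client.messages.batches.create(
    requests=[
        {"custom_id": "row-1", "params": {"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [...]}},
        {"custom_id": "row-2", "params": {...}},
        # ...up to 100,000
    ],
)

# Later: retrieve() returns status only; iterate results() once processing has ended
batch = client.messages.batches.retrieve(batch.id)
if batch.processing_status == "ended":
    for entry in client.messages.batches.results(batch.id):
        if entry.result.type == "succeeded":
            print(entry.custom_id, entry.result.message.content[0].text)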
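
For backfills, the request list usually comes from rows of data. A minimal sketch of building it and staying under the 100k-per-batch cap quoted above: `build_requests` and `chunked` are hypothetical helpers, and the `(row_id, prompt)` row shape is an assumption.

```python
from itertools import islice

MAX_BATCH = 100_000  # per-batch request cap quoted above


def build_requests(rows, model="claude-sonnet-4-6", max_tokens=1024):
    """Yield one batch request per (row_id, prompt) pair."""
    for row_id, prompt in rows:
        yield {
            "custom_id": f"row-{row_id}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }


def chunked(requests, size=MAX_BATCH):
    """Split an iterable of requests into lists of at most `size` items."""
    it = iter(requests)
    while chunk := list(islice(it, size)):
        yield chunk
```

Each chunk can then be passed as `requests=` to the batch create call shown above. Keeping `custom_id` tied to your own row ID is what lets you join results back to the source data, since results are not guaranteed to come back in submission order.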

10. Citations

Make Claude cite exact passages from documents you provided. Critical for RAG, compliance, and any UX that displays sources.

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {"type": "text", "media_type": "text/plain", "data": DOC_TEXT},
                "title": "Q3 financial report",
                "citations": {"enabled": True},
            },
            {"type": "text", "text": "What was net revenue?"},
        ],
    }],
)

The response includes citations with character ranges back into the source doc. Render them as footnotes.
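
One way to render those footnotes is sketched here. It assumes a plain-text source document, so each citation carries `cited_text`, `document_title`, `start_char_index`, and `end_char_index`; PDF sources return page-based locations with different fields. `render_with_footnotes` is a hypothetical helper name.

```python
def render_with_footnotes(content_blocks) -> str:
    """Join text blocks, adding [n] markers and a footnote list for citations."""
    body, notes = [], []
    for block in content_blocks:
        if getattr(block, "type", None) != "text":
            continue
        body.append(block.text)
        # `citations` may be missing or None on blocks that cite nothing.
        for cite in (getattr(block, "citations", None) or []):
            notes.append(cite)
            body.append(f"[{len(notes)}]")
    lines = ["".join(body)]
    for n, cite in enumerate(notes, start=1):
        # Character offsets point back into the source text that was sent.
        span = f"chars {cite.start_char_index}-{cite.end_char_index}"
        lines.append(f"[{n}] {cite.document_title} ({span}): {cite.cited_text!r}")
    return "\n".join(lines)
```

Call it as `render_with_footnotes(resp.content)`; the marker lands directly after the text block that carries the citation.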

11. Computer use

Claude can drive a virtual desktop — move mouse, type, take screenshots. Public beta on Opus and Sonnet.

resp = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    betas=["computer-use-2025-01-24"],  # beta flag that pairs with the computer_20250124 tool
    tools=[{"type": "computer_20250124", "name": "computer", "display_width_px": 1024, "display_height_px": 768}],
    messages=[{"role": "user", "content": "Open the Settings app and turn on dark mode."}],
)

Run this in a sandboxed VM. Computer use means Claude can click anything. Treat it like a junior intern with admin access: useful, scary, contained.

12. Streaming

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me about caching."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

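Use streaming for any UI surface. The user sees the first tokens almost immediately instead of waiting for the whole response, so perceived latency drops sharply even though total generation time is unchanged. The Node SDK has an equivalent async iterator.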

13. Pricing & rate limits

Model | Input ($/M tok) | Output ($/M tok) | Cache write | Cache read
Opus 4.7 | ~$15 | ~$75 | +25% | −90%
Sonnet 4.6 | ~$3 | ~$15 | +25% | −90%
Haiku 4.5 | ~$1 | ~$5 | +25% | −90%

Indicative pricing — confirm at anthropic.com/pricing.
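
To turn the table into a per-request estimate, a rough sketch follows. The prices are the indicative figures above, hard-coded as an assumption, and `estimate_cost` is a hypothetical helper whose parameter names mirror the token counters in `resp.usage`.

```python
# Indicative $/M-token prices (input, output) from the table above; confirm before relying on them.
PRICES = {
    "claude-opus-4-7": (15.0, 75.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-haiku-4-5-20251001": (1.0, 5.0),
}
CACHE_WRITE_MULT = 1.25  # +25% on the input price
CACHE_READ_MULT = 0.10   # -90% on the input price


def estimate_cost(model, input_tokens=0, output_tokens=0,
                  cache_creation_input_tokens=0, cache_read_input_tokens=0):
    """Estimate the dollar cost of one response from its usage counters."""
    in_price, out_price = PRICES[model]
    total = (
        input_tokens * in_price
        + cache_creation_input_tokens * in_price * CACHE_WRITE_MULT
        + cache_read_input_tokens * in_price * CACHE_READ_MULT
        + output_tokens * out_price
    )
    return total / 1_000_000
```

Logging this number next to each request makes it easy to see which prompts dominate the bill. Batch requests would halve the result, per the Batch API section.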

Rate limits

Tiered. You start at Tier 1 (low) and auto-promote as you spend. Check Settings → Limits in the console. If you're hitting limits hard, contact sales for Tier 4+.

14. Production patterns