# OpenAI SDK

Drop-in replacements for `openai.OpenAI` and `openai.AsyncOpenAI` with automatic telemetry on every token-consuming endpoint.

```python
from weflayr.sdk.openai.client import OpenAI, AsyncOpenAI
```
## Clients

| Class | Replaces | Mode |
|---|---|---|
| `OpenAI` | `openai.OpenAI` | Synchronous |
| `AsyncOpenAI` | `openai.AsyncOpenAI` | Async / await |
## Constructor parameters

All parameters beyond `api_key` are optional and fall back to environment variables.

| Parameter | Env var | Description |
|---|---|---|
| `api_key` | — | Your OpenAI API key |
| `intake_url` | `WEFLAYR_INTAKE_URL` | Weflayr intake base URL |
| `client_id` | `WEFLAYR_CLIENT_ID` | Your Flare client ID |
| `bearer_token` | `WEFLAYR_CLIENT_SECRET` | Your Flare client secret |
## Coverage

7 of 16 endpoints are covered; the remaining 9 are not yet instrumented.
| Endpoint | Accessor | Sync | Async | Stream | Billing metrics | Notes |
|---|---|---|---|---|---|---|
| `create()` | `client.chat.completions` | ✓ | ✓ | ✓ | `prompt_tokens`, `completion_tokens` | Streaming supported — injects `include_usage` automatically to capture token counts from the final chunk |
| `create()` | `client.embeddings` | ✓ | ✓ | — | `prompt_tokens`, `total_tokens` | Tracks both prompt and total token counts |
| `create()` | `client.responses` | ✓ | ✓ | — | `input_tokens`, `output_tokens`, `cached_tokens` | Stateful Responses API. Also tracks prompt cache hits via `cached_tokens` |
| `create()` | `client.audio.speech` | ✓ | ✓ | — | `char_count` | Billed by character count, not tokens — `char_count` is captured from the input text before the call |
| `create()` | `client.audio.transcriptions` | ✓ | ✓ | — | `prompt_tokens` | Supports `whisper-1` and newer transcription models. Billed by tokens or audio seconds depending on model |
| `create()` | `client.audio.translations` | ✓ | ✓ | — | `prompt_tokens` | Translates audio to English. `whisper-1` only |
| `create()` | `client.completions` | ✓ | ✓ | — | `prompt_tokens`, `completion_tokens` | For `gpt-3.5-turbo-instruct` and similar legacy models. Also tracks `prompt_length` |
| `generate()` | `client.images` | — | — | — | — | Not yet instrumented |
| `edit()` | `client.images` | — | — | — | — | Not yet instrumented |
| `create_variation()` | `client.images` | — | — | — | — | Not yet instrumented |
| `create` / `list` / `retrieve` / `update` / `delete` | `client.beta.assistants` | — | — | — | — | Full Assistants API not instrumented — use direct API integration if needed |
| `threads.*` / `runs.*` | `client.beta.threads` | — | — | — | — | Requires Assistants API — not yet instrumented |
| `jobs.create()` | `client.fine_tuning` | — | — | — | — | Not yet instrumented |
| `create()` | `client.moderations` | — | — | — | — | Free endpoint — no billing tracking. Not instrumented |
| `create()` | `client.batches` | — | — | — | — | Async batch processing not yet instrumented |
| `create` / `list` / `delete` | `client.beta.vector_stores` | — | — | — | — | Not yet instrumented |
## Examples

### Chat completions — standard

```python
from weflayr.sdk.openai.client import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain recursion"}],
    tags={"feature": "docs", "version": "v2"},
)
print(response.choices[0].message.content)
```
### Chat completions — streaming

```python
with client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
    tags={"feature": "streaming-demo"},
) as stream:
    for chunk in stream:
        if chunk.choices:  # the injected usage-only final chunk has no choices
            print(chunk.choices[0].delta.content or "", end="")
# Token counts are captured from the final chunk automatically
```
### Async client

```python
import asyncio

from weflayr.sdk.openai.client import AsyncOpenAI

client = AsyncOpenAI(api_key="sk-...")

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
        tags={"env": "production"},
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
### Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox",
    tags={"pipeline": "rag", "step": "embed"},
)
```
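The Responses API listed in the coverage table can be called the same way. A sketch, assuming `tags` is accepted as on the other endpoints and that the wrapper preserves the upstream `output_text` convenience property:

```python
# Placeholder input text; cached_tokens is populated on prompt cache hits.
response = client.responses.create(
    model="gpt-4o-mini",
    input="Summarise this document in one sentence.",
    tags={"feature": "responses-demo"},
)
print(response.output_text)
# input_tokens, output_tokens and cached_tokens are captured automatically
```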
### Text-to-Speech

```python
audio = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello from Weflayr!",
    tags={"locale": "en-US"},
)
# char_count is tracked automatically from `input`
```
### Transcription

```python
# Use a context manager so the file handle is closed after the call
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        tags={"source": "call-centre"},
    )
```
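Audio translation, also covered per the table above, follows the same pattern as transcription; the filename and tags here are placeholders:

```python
# Translates non-English speech to English text; whisper-1 only.
with open("audio_fr.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        tags={"source": "call-centre"},
    )
print(translation.text)
```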