# OpenAI SDK

Drop-in replacements for `openai.OpenAI` and `openai.AsyncOpenAI` with automatic telemetry on every token-consuming endpoint.

```python
from weflayr.sdk.openai.client import OpenAI, AsyncOpenAI
```
## Clients

| Class | Replaces | Mode |
|---|---|---|
| `OpenAI` | `openai.OpenAI` | Synchronous |
| `AsyncOpenAI` | `openai.AsyncOpenAI` | Async / await |
## Constructor parameters

All parameters beyond `api_key` are optional and fall back to environment variables.

| Parameter | Env var | Description |
|---|---|---|
| `api_key` | — | Your OpenAI API key |
| `intake_url` | `WEFLAYR_INTAKE_URL` | Weflayr intake base URL |
| `client_id` | `WEFLAYR_CLIENT_ID` | Your Flare client ID |
| `bearer_token` | `WEFLAYR_CLIENT_SECRET` | Your Flare client secret |
## Coverage

7 of 16 endpoints are covered; the remaining 9 are not yet instrumented.
| Endpoint | Accessor | Sync | Async | Stream | Billing metrics | Notes |
|---|---|---|---|---|---|---|
| `create()` | `client.chat.completions` | ✓ | ✓ | ✓ | `prompt_tokens`, `completion_tokens` | Streaming supported — injects `include_usage` automatically to capture token counts from the final chunk |
| `create()` | `client.embeddings` | ✓ | ✓ | — | `prompt_tokens`, `total_tokens` | Tracks both prompt and total token counts |
| `create()` | `client.responses` | ✓ | ✓ | — | `input_tokens`, `output_tokens`, `cached_tokens` | Stateful Responses API. Also tracks prompt cache hits via `cached_tokens` |
| `create()` | `client.audio.speech` | ✓ | ✓ | — | `char_count` | Billed by character count, not tokens — `char_count` is captured from the input text before the call |
| `create()` | `client.audio.transcriptions` | ✓ | ✓ | — | `prompt_tokens` | Supports `whisper-1` and newer transcription models. Billed by tokens or audio seconds depending on model |
| `create()` | `client.audio.translations` | ✓ | ✓ | — | `prompt_tokens` | Translates audio to English. `whisper-1` only |
| `create()` | `client.completions` | ✓ | ✓ | — | `prompt_tokens`, `completion_tokens` | For `gpt-3.5-turbo-instruct` and similar legacy models. Also tracks `prompt_length` |
| `generate()` | `client.images` | — | — | — | — | Not yet instrumented |
| `edit()` | `client.images` | — | — | — | — | Not yet instrumented |
| `create_variation()` | `client.images` | — | — | — | — | Not yet instrumented |
| `create` / `list` / `retrieve` / `update` / `delete` | `client.beta.assistants` | — | — | — | — | Full Assistants API not instrumented — use direct API integration if needed |
| `threads.*` / `runs.*` | `client.beta.threads` | — | — | — | — | Requires Assistants API — not yet instrumented |
| `jobs.create()` | `client.fine_tuning` | — | — | — | — | Not yet instrumented |
| `create()` | `client.moderations` | — | — | — | — | Free endpoint — no billing tracking. Not instrumented |
| `create()` | `client.batches` | — | — | — | — | Async batch processing not yet instrumented |
| `create` / `list` / `delete` | `client.beta.vector_stores` | — | — | — | — | Not yet instrumented |
## Examples

### Chat completions — standard

```python
from weflayr.sdk.openai.client import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain recursion"}],
    tags={"feature": "docs", "version": "v2"},
)
print(response.choices[0].message.content)
```
### Chat completions — streaming

```python
with client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
    tags={"feature": "streaming-demo"},
) as stream:
    for chunk in stream:
        if chunk.choices:  # the injected usage-only final chunk has no choices
            print(chunk.choices[0].delta.content or "", end="")
# Token counts are captured from the final chunk automatically
```
### Async client

```python
import asyncio

from weflayr.sdk.openai.client import AsyncOpenAI

client = AsyncOpenAI(api_key="sk-...")

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
        tags={"env": "production"},
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
### Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox",
    tags={"pipeline": "rag", "step": "embed"},
)
```
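The Responses API listed in the coverage table can be called the same way. A sketch, assuming `tags` is accepted as on the other endpoints and that the wrapper preserves the upstream `output_text` convenience property:

```python
# Placeholder input text; cached_tokens is populated on prompt cache hits.
response = client.responses.create(
    model="gpt-4o-mini",
    input="Summarise this document in one sentence.",
    tags={"feature": "responses-demo"},
)
print(response.output_text)
# input_tokens, output_tokens and cached_tokens are captured automatically
```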
### Text-to-Speech

```python
audio = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello from Weflayr!",
    tags={"locale": "en-US"},
)
# char_count is tracked automatically from `input`
```
### Transcription

```python
# Use a context manager so the file handle is closed after the call
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        tags={"source": "call-centre"},
    )
```
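Audio translation, also covered per the table above, follows the same pattern as transcription; the filename and tags here are placeholders:

```python
# Translates non-English speech to English text; whisper-1 only.
with open("audio_fr.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        tags={"source": "call-centre"},
    )
print(translation.text)
```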