# Download the llamafile: a single self-contained executable that bundles the
# model weights together with an inference server.
wget https://huggingface.co/Mozilla/Llama-3.2-1B-Instruct-llamafile/resolve/main/Llama-3.2-1B-Instruct.Q6_K.llamafile

# Mark the downloaded file as executable.
chmod +x Llama-3.2-1B-Instruct.Q6_K.llamafile

# Launch it in server mode, exposing an OpenAI-compatible HTTP API
# (presumably on http://localhost:8080, matching the client code below — confirm).
./Llama-3.2-1B-Instruct.Q6_K.llamafile --server
import asyncio

import openai


async def main() -> str | None:
    """Query the local llamafile server through its OpenAI-compatible API.

    Connects to the server started in the shell step above and sends a single
    chat-completion request, returning the assistant's reply text.

    Returns:
        The content of the first completion choice (may be ``None`` if the
        server returns an empty message).
    """
    # The llamafile server ignores the API key, but the client requires a
    # non-empty value, hence the placeholder.
    ai = openai.AsyncOpenAI(
        base_url="http://localhost:8080/v1",
        api_key="sk-no-key-required",
    )
    response = await ai.chat.completions.create(
        messages=[
            {"role": "system", "content": "..."},
            {"role": "user", "content": "..."},
        ],
        max_tokens=100,
        # Model name as the server reports it for the bundled weights.
        model="Llama-3.2-1B-Instruct.Q6_K.gguf",
    )
    content = response.choices[0].message.content
    return content


if __name__ == "__main__":
    print(asyncio.run(main()))