# Generative Agents
This serves as yet another cobbled-together application of generative agents, utilizing LangChain as the core dependency and substituting in a "proxy" for GPT4.

In short, by utilizing a language model to summarize, rank, and query against information using natural-language queries/instructions, immersive agents can be attained.
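For context, the ranking step follows the generative-agents idea of scoring each stored memory by recency, importance, and relevance before feeding the top results back into the prompt. A minimal sketch of that scoring (the field names, decay rate, and equal weighting are illustrative, not taken from this repo):

```python
import math
import time

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def score_memory(memory: dict, query_embedding: list[float], decay: float = 0.995) -> float:
    """Rank a stored memory against a query: recency + importance + relevance."""
    hours_since_access = (time.time() - memory["last_accessed"]) / 3600.0
    recency = decay ** hours_since_access      # decays the longer a memory sits unused
    importance = memory["importance"] / 10.0   # e.g. an LLM-rated 1-10 score
    relevance = cosine_similarity(query_embedding, memory["embedding"])
    return recency + importance + relevance    # equal weights, for simplicity
```

At query time, every stored memory gets scored this way and the top-k are summarized into the agent's context.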
## Features

- gradio web UI
- saving and loading of agents
- works with non-OpenAI LLMs and embeddings (tested `llamacpp`)
- modified prompts for use with Vicuna (see the sketch below)
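To illustrate that last point: Vicuna-finetuned models expect instructions wrapped in a specific turn structure, so prompts have to be reformatted accordingly. A rough sketch of the v1.1-style format (the repo's actual templates may differ):

```python
# Illustrative only: roughly the Vicuna v1.1 instruction format.
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def format_vicuna_prompt(instruction: str) -> str:
    # Wrap a bare instruction in the USER/ASSISTANT turn structure.
    return f"{VICUNA_SYSTEM}\nUSER: {instruction}\nASSISTANT:"
```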
## Installation

```sh
pip install -r requirements.txt
```
## Usage

Set your environment variables accordingly:
- `LLM_TYPE`: (`oai`, `llamacpp`): the LLM backend to use in LangChain. OpenAI requires some additional environment variables:
  - `OPENAI_API_BASE`: URL for your target OpenAI
  - `OPENAI_API_KEY`: authentication key for OpenAI
  - `OPENAI_API_MODEL`: target model
- `LLM_MODEL`: (`./path/to/your/llama/model.bin`): path to your GGML-formatted LLaMA model, if using `llamacpp` as the LLM backend
- `LLM_EMBEDDING_TYPE`: (`oai`, `llamacpp`, `hf`): the embedding model to use for similarity computing
- `LLM_VECTORSTORE_TYPE`: (`chromadb`): the vector store to use for "infinite" context
- `LLM_PROMPT_TUNE`: (`oai`, `vicuna`, `supercot`, `cocktail`): prompt formatting to use, for variants with specific finetunes for instructions, etc.
- `LLM_CONTEXT`: sets maximum context size
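For example, a fully local `llamacpp` setup might look like this (the model path and context size are placeholders, not defaults):

```sh
export LLM_TYPE=llamacpp
export LLM_MODEL=./models/your-ggml-model.bin
export LLM_EMBEDDING_TYPE=llamacpp
export LLM_VECTORSTORE_TYPE=chromadb
export LLM_PROMPT_TUNE=vicuna
export LLM_CONTEXT=2048
```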
To run:

```sh
python .\src\main.py
```
## Plans
- I do not plan on making this uber user-friendly like mrq/ai-voice-cloning, as this is just a stepping stone for a bigger project integrating generative agents.
- I need to re-implement grabbing relevant context, as I've moved almost entirely away from the provided LangChain implementation (not to knock it; it's just not in my nature to piggyback off of it).
- the """grand""" end goal of this is to either have it hosted as an addon server (more likely, given the... prohibitive VRAM requirements) or live in a C++ library to be used in other programs.
## Caveats
A local LLM isn't quite ready to be truly relied on yet:
- It works, but it's a bit slow.
- You're really, really dependent on how well your variant is performing.
- You have to condition your prompts properly for decent results.
- Some flavors will handle the instruction-based method used here better than others.
- Model size is also a factor in how much memory it will consume.
  - for example, if this were to be used strictly in C++ in a game engine, you'd have to compete for limited (V)RAM
  - however, I don't think mine consumes all that many resources.
- ChromaDB seems to be strictly Python (its node.js bindings still use Python, to my knowledge), yet it's what everyone seems to prefer over FAISS as a vector store (see the sketch below).
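For reference, a minimal sketch of the ChromaDB Python API (the collection name and documents here are made up):

```python
import chromadb

client = chromadb.Client()  # in-memory instance
memories = client.create_collection(name="agent_memories")

# Store observations; ChromaDB embeds them with its default embedding function.
memories.add(
    documents=["Alice greeted Bob at the market.", "Bob bought three apples."],
    ids=["mem-0", "mem-1"],
)

# Query by natural-language similarity.
results = memories.query(query_texts=["What did Bob buy?"], n_results=1)
print(results["documents"])
```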
Utilizing GPT4 (or Anthropic Claude) Just Works a little too nicely, even without conditioning the prompts for a "chat" model. But SaaS models are inherently limited by cost-per-generation, and not everyone will have that luxury for [enter use case for this application of generative agents here].