Generative Agents

This serves as yet another cobbled-together application of generative agents, utilizing LangChain as the core dependency and substituting a local "proxy" for GPT-4.

In short, by using a language model to summarize, rank, and query stored information through natural-language queries/instructions, reasonably immersive agents can be attained.
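Concretely, LangChain's experimental generative-agents module implements this loop: memories are embedded into a vector store, retrieved with a time-weighted retriever that scores on recency, importance, and relevance, and summarized on demand. A minimal sketch using the OpenAI backend (imports reflect LangChain as of this writing; the persona and parameters are illustrative, and this repo's actual wiring may differ):

import faiss
from langchain.chat_models import ChatOpenAI
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings
from langchain.experimental.generative_agents import GenerativeAgent, GenerativeAgentMemory
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain.vectorstores import FAISS

llm = ChatOpenAI(max_tokens=1500)
embeddings = OpenAIEmbeddings()

# FAISS holds the embedded memories; the retriever re-ranks them by
# recency, importance, and relevance when the agent is queried.
index = faiss.IndexFlatL2(1536)  # 1536 = OpenAI ada-002 embedding width
vectorstore = FAISS(embeddings.embed_query, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=vectorstore, other_score_keys=["importance"], k=15
)

memory = GenerativeAgentMemory(llm=llm, memory_retriever=retriever, reflection_threshold=8)
agent = GenerativeAgent(
    name="Tommie",  # illustrative persona, not from this repo
    age=25,
    traits="anxious, likes design",
    status="looking for a job",
    llm=llm,
    memory=memory,
)

agent.memory.add_memory("Tommie saw a stray dog on the way to work.")
stay_in_dialogue, response = agent.generate_dialogue_response("Hello Tommie, how was your morning?")
print(response)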

Features

  • Gradio web UI
  • saving and loading of agents (see the sketch below)
  • works with non-OpenAI LLMs and embeddings (tested with llamacpp)
  • modified prompts for use with Vicuna
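Saving/loading works by persisting the memory Documents themselves (content plus metadata) and re-adding them on load, rather than serializing the entire memory object, which tends to break between systems. Roughly, assuming LangChain's TimeWeightedVectorStoreRetriever (function names here are hypothetical):

import json
from datetime import datetime

from langchain.schema import Document

def save_memories(retriever, path):
    # Persist only the raw Documents; timestamps are stringified for JSON.
    records = [
        {"page_content": d.page_content, "metadata": d.metadata}
        for d in retriever.memory_stream
    ]
    with open(path, "w") as f:
        json.dump(records, f, default=str)

def load_memories(retriever, path):
    # Re-add the Documents into a fresh retriever, restoring timestamps.
    with open(path) as f:
        records = json.load(f)
    docs = []
    for r in records:
        meta = r["metadata"]
        for key in ("created_at", "last_accessed_at"):
            if key in meta:
                meta[key] = datetime.fromisoformat(meta[key])
        docs.append(Document(page_content=r["page_content"], metadata=meta))
    retriever.add_documents(docs)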

Installation

pip install -r requirements.txt

Usage

Set your environment variables accordingly (an example invocation follows the list):

  • LLM_TYPE: (oai, llamacpp): the LLM backend to use in LangChain. OpenAI requires some additional environment variables:
    • OPENAI_API_BASE: URL of your target OpenAI(-compatible) API endpoint
    • OPENAI_API_KEY: authentication key for the OpenAI API
    • OPENAI_API_MODEL: name of the target model
  • LLM_MODEL: (./path/to/your/llama/model.bin): path to your GGML-formatted LLaMA model, if using llamacpp as the LLM backend
  • LLM_EMBEDDING_TYPE: (oai, llamacpp, hf): the embedding backend to use for similarity computations
  • LLM_PROMPT_TUNE: (oai, vicuna, supercot, cocktail): prompt formatting to use, for model variants finetuned on specific instruction formats
  • LLM_CONTEXT: sets the maximum context size, in tokens
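For example, a llamacpp-backed session on a POSIX shell might be configured like so (paths and values illustrative):

export LLM_TYPE=llamacpp
export LLM_MODEL=./models/your-model.ggml.bin
export LLM_EMBEDDING_TYPE=llamacpp
export LLM_PROMPT_TUNE=vicuna
export LLM_CONTEXT=2048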

To run:

python .\src\main.py

Plans

I do not plan on making this über user-friendly like mrq/ai-voice-cloning, as this is just a stepping stone for a bigger project that integrates generative agents.

Caveats

A local LM is quite slow, although things seem to be getting faster as llama.cpp develops. GPU offloading (and the OpenCL PR) brings some very real hope of eventually scrapping the Python side and integrating this entirely in C++.

Even with a model that is more instruction-tuned, like Vicuna (which structures prompts as SYSTEM:\nUSER:\nASSISTANT:), results are still inconsistent.
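For reference, a Vicuna-style prompt for one of these queries looks roughly like this (wording illustrative):

SYSTEM: You are roleplaying as the agent described below.
USER: Rate the poignancy of the following memory: ...
ASSISTANT: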

However, I seem to get consistent results with SuperCOT 33B; it's just, well, slow. SuperCOT 13B also seems to give better answers than Vicuna-1.1 13B, and Cocktail 13B seems to be the best of the 13Bs.

A lot of prompt wrangling is needed, and a lot of the routines could be polished up. For example, each observation queries the LM for an importance rating, and each reaction requires querying for the observed entity, then for the relationship between the agent and that entity (which ends up just summarizing relevant context/memories), and then for the response itself; if any one of these steps fails, the overall failure rate compounds. If anything, I might as well work from the ground up and only really salvage the use of FAISS to store embedded vectors.
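To illustrate how fragile a single step is: the importance rating is itself just another LM query whose free-text reply has to be parsed back into a number, so any formatting slip by the model breaks the chain. A sketch of that step with defensive parsing (prompt wording is illustrative, not this repo's exact prompt):

import re

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

llm = ChatOpenAI()  # stand-in; the same fragility applies to local backends

prompt = PromptTemplate.from_template(
    "On a scale of 1 to 10, where 1 is mundane and 10 is extremely poignant, "
    "rate the poignancy of the following memory. Respond with a single integer.\n"
    "Memory: {observation}\nRating:"
)
chain = LLMChain(llm=llm, prompt=prompt)

reply = chain.run(observation="ate breakfast at my desk")
match = re.search(r"\d+", reply)
# If the model rambles instead of emitting a number, fall back to a
# neutral score rather than crashing the whole reaction pipeline.
rating = int(match.group()) if match else 5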

GPT-4 seems to Just Work, unfortunately.