October 2024 - Dan Austin
AiTuning developed & fine-tuned an AI system that discovers high-quality & enriched B2B leads on autopilot.
Lga (lead gen agent) is a generative AI system that discovers high-quality leads on autopilot for any B2B enterprise user.
Each lead discovered is enriched with their LinkedIn profile & email address and can be exported to the user's CRM. This article explains how AiTuning developed, fine-tuned & deployed the system from the ground up.
When a new user joins, they're greeted with a concise form. This form asks them to describe their ideal client, provide details about their own business, and specify the service or product they're seeking leads for. Once this information is submitted, the AI system jumps into action.
Within a short timeframe, the AI presents a curated list of organisations, as shown above, all ranked by how likely they are to be interested in the user's offering. The initial view provides key details for each organisation, with an option to expand for more in-depth information.
Expanding an organisation reveals crucial facts about the company, along with a carefully filtered list of decision-makers within that organisation. The AI has ranked these staff members based on their likelihood of being the most relevant person to contact.
Once the user is ready, they can click "Enrich & Export" and select their chosen CRM. The OAuth flow is then launched, after which all the organisations and their enriched staff members are delivered directly into the user's CRM, ready for outreach at scale.
To fully appreciate the impact of lga, let's first consider the conventional approach. Traditionally, lead generation involved a labour-intensive process:
A skilled lead generator working full-time could process approximately 8 leads per hour.
Beyond quantifiable metrics, lga offers several intangible benefits:
Our custom evaluation suite showed Anthropic's Claude-3.5-Sonnet outperformed OpenAI's GPT-4o and other model options on our use cases, so we chose it for the prototyping and prompt engineering phase of development.
Once the AI system was built and prompts were consistently effective, Llama 3.1 8B was selected for fine-tuning. This smaller, efficient LLM offered the best balance of performance and resource usage among the small models we tested, making it ideal for the production environment.
Unsloth was employed to accelerate the fine-tuning process whilst minimising resource consumption. It's free and open-source, offers 2.2x faster performance than alternatives, and cuts VRAM usage by 70%. All of this made it an excellent choice for fine-tuning Llama 3.1 8B. Google Colab provided a cost-effective platform for running Unsloth.
Runpod was selected for the serverless deployment of the fine-tuned LLM. Its pay-per-use GPU deployment model, with no idling costs, sub-250ms cold starts, and autoscaling capabilities, ensured efficient and responsive operation of our fine-tuned Llama model.
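In practice, a Runpod serverless worker reduces to a single handler function. Here is a minimal sketch; `runpod.serverless.start` is the SDK's documented entry point, while the model path and payload shape are assumptions made for illustration:

```python
import runpod
from transformers import pipeline

# Loaded once per worker (outside the handler) so warm requests skip it.
# The volume path is an assumption about where the merged weights are mounted.
llm = pipeline("text-generation", model="/runpod-volume/lga-llama-3.1-8b")

def handler(job):
    # Runpod delivers each request's payload under job["input"].
    prompt = job["input"]["prompt"]
    result = llm(prompt, max_new_tokens=512)[0]["generated_text"]
    return {"output": result}

runpod.serverless.start({"handler": handler})
```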
Langfuse played a crucial role in monitoring inference metrics and collecting datapoints for fine-tuning. It is open-source, and therefore can be self-hosted. It also provides centralised prompt management and feedback logging features which are great for scalability.
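Wiring Langfuse in amounts to a few SDK calls per request. A minimal sketch using the Python client (trace, model, and score names are illustrative):

```python
from langfuse import Langfuse

langfuse = Langfuse()  # keys read from the LANGFUSE_* environment variables

prompt = "Rank these organisations by relevance..."  # placeholder
completion = "1. Acme Corp ..."                      # placeholder

# One trace per request; the generation records the LLM call itself.
trace = langfuse.trace(name="rank-organisations")
trace.generation(
    name="llm-ranking",
    model="lga-llama-3.1-8b",  # hypothetical model identifier
    input=prompt,
    output=completion,
)
# User feedback is logged as a score and later mined for fine-tuning data.
langfuse.score(trace_id=trace.id, name="user-feedback", value=1)
```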
HuggingFace provided a secure, private hosting solution for the fine-tuned LoRA adapter and merged model weights.
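Uploading to a private repo takes a couple of calls with the `huggingface_hub` client; the repo id below is a placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()  # auth token picked up from HF_TOKEN or a prior CLI login

repo = "aituning/lga-llama-3.1-8b"  # placeholder repo id
api.create_repo(repo, private=True, exist_ok=True)
api.upload_folder(folder_path="merged_model", repo_id=repo)
```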
Golang was chosen for the REST API server due to its performance, concurrency capabilities, and scalability, all essential attributes for handling large volumes of data in a consumer-facing application.
We leveraged Python for various tasks, such as building & running our custom-made model evaluation suite, fine-tuning, and deployment scripts.
MongoDB gave us the flexibility & performance needed for handling large volumes of lead data. Its ability to store nested structures aligned perfectly with our data models, and its indexing options enabled fast retrieval for the AI system.
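A simplified picture of how a lead might be stored and indexed with pymongo (field names are illustrative, not our actual schema):

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")  # connection string is illustrative
leads = client["lga"]["organisations"]

# Nested document: staff members live inside their organisation.
leads.insert_one({
    "campaign_id": "c_123",
    "name": "Acme Corp",
    "relevance_score": 0.92,
    "staff": [
        {"name": "Jane Doe", "title": "VP Sales", "rank": 1},
    ],
})

# Compound index so a campaign's organisations return in ranked order.
leads.create_index([("campaign_id", ASCENDING), ("relevance_score", DESCENDING)])
```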
AWS hosted both the Golang server and Langfuse deployment on an EC2 instance.
Next.js was utilised for frontend development, offering a powerful and flexible framework for creating a responsive and dynamic user interface.
Vercel provided a seamless and efficient platform for hosting the Next.js application. We linked it to our GitHub repo, which automated deployments.
Apollo was the key data source used for lead discovery. Its database spans 275M contacts & 73M organisations while maintaining very high data accuracy, and its API searches over organisations and staff members, providing all the data our AI system needs to reason over.
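As a rough sketch of what querying such an API looks like (the endpoint and parameter names below are illustrative, not copied from Apollo's documentation):

```python
import os
import requests

# Endpoint and parameters are illustrative of a company-search call;
# consult Apollo's official API reference for the real contract.
resp = requests.post(
    "https://api.apollo.io/v1/mixed_companies/search",
    headers={"X-Api-Key": os.environ["APOLLO_API_KEY"]},
    json={"q_organization_keyword_tags": ["fintech"], "page": 1},
)
organisations = resp.json().get("organizations", [])
```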
Our approach to selecting the right LLM was data-driven and split into two phases: initial model selection for prototyping, and then selecting a model to fine-tune for production use.
To begin, we developed a custom evaluation suite to compare different models. This suite was designed to test each model's performance on tasks specific to our AI system, such as deciding whether an organisation was relevant and which staff members were worth pursuing as leads.
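In spirit, each eval case pairs an organisation description with a human relevance label and scores a model's verdict against it. A heavily simplified sketch, where `call_model` stands in for whichever provider SDK is under test:

```python
EVAL_CASES = [
    {"org": "Series B fintech, 120 staff, hiring SDRs", "relevant": True},
    {"org": "Local bakery, 3 staff", "relevant": False},
]

def evaluate(call_model) -> float:
    """Return accuracy of a model's relevance judgements on the eval set."""
    correct = 0
    for case in EVAL_CASES:
        prompt = (
            "Given the ideal client profile, answer YES or NO: "
            f"is this organisation a relevant lead?\n{case['org']}"
        )
        verdict = "yes" in call_model(prompt).lower()
        correct += verdict == case["relevant"]
    return correct / len(EVAL_CASES)
```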
After running these evals and iterating on the prompts numerous times, we saw Claude-3.5-Sonnet consistently outperform other models such as OpenAI's GPT-4o & Meta's Llama 3.1 70B.
While Claude-3.5-Sonnet performed very well, there were some edge cases that no amount of prompting could fix, and we also needed a more efficient model for production use. Therefore our goal was to find a smaller model that we could fine-tune to achieve better performance than Claude-3.5-Sonnet.
We went through a similar evaluation process with smaller LLMs such as Llama 3.1 8B, Phi 3.5, Mistral's family of models, and more. After running evaluations, we selected Llama 3.1 8B as our base model for fine-tuning. This base model slightly outperformed other small models on our eval suite and also offered excellent resource usage for future deployment.
To fine-tune Llama 3.1 8B we leveraged several great tools and platforms. The following diagram provides a high-level overview of our fine-tuning process:
To begin, we collected the input-output pairs from our work with Claude-3.5-Sonnet, employing a combination of manual human review and automated scripts to ensure high data quality. This data was then formatted into the Alpaca chat template compatible with Llama 3.1 models.
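For reference, the standard Alpaca template looks like this; a small helper maps each curated pair into a single training string:

```python
# The standard Alpaca prompt template; each curated Claude transcript is
# mapped into it as one training example.
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

def to_alpaca(example: dict) -> dict:
    # example carries "instruction", "input", and "output" keys.
    return {"text": ALPACA_TEMPLATE.format(**example)}
```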
When the dataset was ready, we loaded Unsloth into a Google Colab Jupyter notebook and fine-tuned a LoRA adapter on our high-quality dataset. The resulting adapter was then merged into the base model and pushed to Hugging Face for safe storage & later use.
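The core of the notebook follows Unsloth's standard Llama 3.1 recipe; the hyperparameters below are illustrative defaults rather than our production values:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the 4-bit quantised base model via Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B", max_seq_length=4096, load_in_4bit=True
)
# Attach a LoRA adapter to the usual attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # the Alpaca-formatted dataset from the previous step
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2, num_train_epochs=3, output_dir="outputs"
    ),
)
trainer.train()

# Merge the LoRA adapter into the base weights and push to a private repo.
model.push_to_hub_merged("aituning/lga-llama-3.1-8b", tokenizer,
                         save_method="merged_16bit")
```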
Post fine-tuning, we subjected our new model to our rigorous evaluation suite, benchmarking its performance against Claude-3.5-Sonnet. This process revealed areas where our dataset excelled and where it fell short. For scenarios where the model's performance was unsatisfactory, we augmented our dataset with synthetic data specifically crafted to address these edge cases, teaching the model how to handle such scenarios in future iterations.
We repeated this fine-tuning and evaluation cycle until our model consistently outperformed Claude-3.5-Sonnet across our test suite. Once we achieved this milestone, we proceeded to deployment, leveraging Runpod's easily accessible GPUs and streamlined deployment options to bring our fine-tuned model into production.
In production, when users flag suboptimal outputs, these cases are added to the evaluation suite, followed by augmentation of the dataset to show the model how to deal with this type of situation in the future.
This feedback loop ensures continuous improvement of the model's performance in the edge cases it has seen and also its ability to generalise on novel edge cases it encounters in production.
The lga system employs a sophisticated data flow and processing pipeline to deliver high-quality leads to B2B enterprise users. The sequence diagram below illustrates the interactions between the user, the lga system, the LLM, the Apollo API, and the user's CRM.
From the client's viewpoint, the process begins with submitting a form detailing their ideal client, business specifics, and the product or service they're offering. The lga system immediately responds with Campaign and Run IDs, initiating the lead generation process.
The client then enters a polling loop, periodically checking for updates. Once processing is complete, the system returns a paginated list of relevant organisations, complete with detailed information and potential decision-makers.
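From a client's perspective, the whole flow reduces to a few HTTP calls. A sketch where the endpoint paths and field names are hypothetical stand-ins for our actual API:

```python
import time
import requests

BASE = "https://api.example.com"  # stand-in for the lga API host

form = {"ideal_client": "...", "business": "...", "offering": "..."}
run = requests.post(f"{BASE}/campaigns", json=form).json()

# Poll until the run finishes, then page through the ranked organisations.
while requests.get(f"{BASE}/runs/{run['run_id']}").json()["state"] != "complete":
    time.sleep(5)

orgs = requests.get(
    f"{BASE}/campaigns/{run['campaign_id']}/organisations", params={"page": 1}
).json()
```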
When ready, the client can trigger the "Enrich & Export" process. This initiates an OAuth flow with their chosen CRM, ensuring secure authorisation. After successful authentication, the system enriches the lead data with email information and exports it to the CRM.
Behind the scenes, the lga system orchestrates a complex series of operations to generate and refine leads:
We were careful to leverage the power of LLMs and software only in the areas that they respectively excel at.
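Conceptually, that division of labour looks like the sketch below: deterministic work stays in plain code, and the LLM is reserved for judgement calls. The function names here are invented for illustration:

```python
def run_campaign(profile: dict) -> list[dict]:
    candidates = apollo_search(profile)  # deterministic: a plain API query
    # Hard, rule-based filters stay in ordinary code...
    candidates = [o for o in candidates if o["staff_count"] >= 5]
    # ...while fuzzy relevance judgements are delegated to the LLM.
    ranked = llm_rank_organisations(profile, candidates)
    for org in ranked:
        org["leads"] = llm_rank_staff(profile, org["staff"])
    return ranked
```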
It's important to note that at present, lga is not intended for consumer use. It functions solely as an internal tool for AiTuning to reach new clients and a case study to showcase our capabilities. This focus allows us to apply our learnings to our clients' projects and further refine the system over time.
If you have a generative AI project you would like to discuss, please contact us!
Best,
Dan Austin