October 2024 - Dan Austin
AiTuning developed & fine-tuned an AI system that discovers high-quality & enriched B2B leads on autopilot.
Lga (lead gen agent) is a generative AI system that discovers high-quality leads on autopilot for any B2B enterprise user.
Each lead discovered is enriched with their LinkedIn profile & email address and can be exported to the user's CRM. This article explains how AiTuning developed, fine-tuned & deployed the system from the ground up.
When a new user joins, they're greeted with a concise form. This form asks them to describe their ideal client, provide details about their own business, and specify the service or product they're seeking leads for. Once this information is submitted, the AI system jumps into action.
Within a short timeframe, the AI presents a curated list of organisations, as shown above, all ranked by how likely they are to be interested in the user's offering. The initial view provides key details for each organisation, with an option to expand for more in-depth information.
Expanding an organisation reveals crucial facts about the company, along with a carefully filtered list of decision-makers within that organisation. The AI has ranked these staff members based on their likelihood of being the most relevant person to contact.
Once the user is ready, they can click "Enrich & Export" and select their chosen CRM. The OAuth flow is then launched, after which all the organisations and their enriched staff members are delivered directly into the user's CRM, ready for outreach at scale.
To fully appreciate the impact of lga, let's first consider the conventional approach. Traditionally, lead generation involved a labour-intensive process:
A skilled lead generator working full-time could process approximately 8 leads per hour.
Beyond quantifiable metrics, lga offers several intangible benefits:
Our custom evaluation suite showed Anthropic's Claude-3.5-Sonnet outperformed OpenAI's GPT-4o and other model options on our use cases, so we chose it for the prototyping and prompt engineering phase of development.
Once the AI system was built and prompts were consistently effective, Llama 3.1 8B was selected for fine-tuning. This smaller, efficient LLM offered the best balance of performance and resource usage among the small models we tested, making it ideal for the production environment.
Unsloth was employed to accelerate the fine-tuning process whilst minimising resource consumption. It's free and open-source, offers 2.2x faster performance than alternatives, and cuts VRAM usage by 70%. All of this made it an excellent choice for fine-tuning Llama 3.1 8B. Google Colab provided a cost-effective platform for running Unsloth.
Runpod was selected for the serverless deployment of the fine-tuned LLM. Its pay-per-use GPU deployment model, with no idling costs, sub-250ms cold starts, and autoscaling capabilities, ensured efficient and responsive operation of our fine-tuned Llama model.
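In practice, a Runpod serverless worker reduces to a single handler function. Here is a minimal sketch; `runpod.serverless.start` is the SDK's documented entry point, while the model path and payload shape are assumptions made for illustration:

```python
import runpod
from transformers import pipeline

# Loaded once per worker (outside the handler) so warm requests skip it.
# The volume path is an assumption about where the merged weights are mounted.
llm = pipeline("text-generation", model="/runpod-volume/lga-llama-3.1-8b")

def handler(job):
    # Runpod delivers each request's payload under job["input"].
    prompt = job["input"]["prompt"]
    result = llm(prompt, max_new_tokens=512)[0]["generated_text"]
    return {"output": result}

runpod.serverless.start({"handler": handler})
```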
Langfuse played a crucial role in monitoring inference metrics and collecting datapoints for fine-tuning. It is open-source, and therefore can be self-hosted. It also provides centralised prompt management and feedback logging features which are great for scalability.
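Wiring Langfuse in amounts to a few SDK calls per request. A minimal sketch using the Python client (trace, model, and score names are illustrative):

```python
from langfuse import Langfuse

langfuse = Langfuse()  # keys read from the LANGFUSE_* environment variables

prompt = "Rank these organisations by relevance..."  # placeholder
completion = "1. Acme Corp ..."                      # placeholder

# One trace per request; the generation records the LLM call itself.
trace = langfuse.trace(name="rank-organisations")
trace.generation(
    name="llm-ranking",
    model="lga-llama-3.1-8b",  # hypothetical model identifier
    input=prompt,
    output=completion,
)
# User feedback is logged as a score and later mined for fine-tuning data.
langfuse.score(trace_id=trace.id, name="user-feedback", value=1)
```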
HuggingFace provided a secure, private hosting solution for the fine-tuned LoRA adapter and merged model weights.
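Uploading to a private repo takes a couple of calls with the `huggingface_hub` client; the repo id below is a placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()  # auth token picked up from HF_TOKEN or a prior CLI login

repo = "aituning/lga-llama-3.1-8b"  # placeholder repo id
api.create_repo(repo, private=True, exist_ok=True)
api.upload_folder(folder_path="merged_model", repo_id=repo)
```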
Golang was chosen for the REST API server due to its performance, concurrency capabilities, and scalability, all essential attributes for handling large volumes of data in a consumer-facing application.
We leveraged Python for various tasks, such as building & running our custom-made model evaluation suite, fine-tuning, and deployment scripts.
MongoDB gave us the flexibility & performance needed for handling large volumes of lead data. Its ability to store nested structures aligned perfectly with our data models, and its indexing options enabled fast retrieval for the AI system.
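A simplified picture of how a lead might be stored and indexed with pymongo (field names are illustrative, not our actual schema):

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")  # connection string is illustrative
leads = client["lga"]["organisations"]

# Nested document: staff members live inside their organisation.
leads.insert_one({
    "campaign_id": "c_123",
    "name": "Acme Corp",
    "relevance_score": 0.92,
    "staff": [
        {"name": "Jane Doe", "title": "VP Sales", "rank": 1},
    ],
})

# Compound index so a campaign's organisations return in ranked order.
leads.create_index([("campaign_id", ASCENDING), ("relevance_score", DESCENDING)])
```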
AWS hosted both the Golang server and Langfuse deployment on an EC2 instance.
Next.js was utilised for frontend development, offering a powerful and flexible framework for creating a responsive and dynamic user interface.
Vercel provided a seamless and efficient platform for hosting the Next.js application. We linked it to our GitHub repo, which automated deployments.
Apollo was the key data source used for lead discovery. Its database spans 275M contacts & 73M organisations while maintaining very high data accuracy, and its API searches over organisations and staff members, providing all the data our AI system needs to reason over.
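As a rough sketch of what querying such an API looks like (the endpoint and parameter names below are illustrative, not copied from Apollo's documentation):

```python
import os
import requests

# Endpoint and parameters are illustrative of a company-search call;
# consult Apollo's official API reference for the real contract.
resp = requests.post(
    "https://api.apollo.io/v1/mixed_companies/search",
    headers={"X-Api-Key": os.environ["APOLLO_API_KEY"]},
    json={"q_organization_keyword_tags": ["fintech"], "page": 1},
)
organisations = resp.json().get("organizations", [])
```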
Our approach to selecting the right LLM was data-driven and split into two phases: initial model selection for prototyping, and then selecting a model to fine-tune for production use.
To begin, we developed a custom evaluation suite to compare different models. This suite was designed to test each model's performance on tasks specific to our AI system, such as deciding whether an organisation was relevant and which staff members were worth pursuing as leads.
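In spirit, each eval case pairs an organisation description with a human relevance label and scores a model's verdict against it. A heavily simplified sketch, where `call_model` stands in for whichever provider SDK is under test:

```python
EVAL_CASES = [
    {"org": "Series B fintech, 120 staff, hiring SDRs", "relevant": True},
    {"org": "Local bakery, 3 staff", "relevant": False},
]

def evaluate(call_model) -> float:
    """Return accuracy of a model's relevance judgements on the eval set."""
    correct = 0
    for case in EVAL_CASES:
        prompt = (
            "Given the ideal client profile, answer YES or NO: "
            f"is this organisation a relevant lead?\n{case['org']}"
        )
        verdict = "yes" in call_model(prompt).lower()
        correct += verdict == case["relevant"]
    return correct / len(EVAL_CASES)
```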
After running these evals and iterating on the prompts numerous times, we saw Claude-3.5-Sonnet consistently outperform other models such as OpenAI's GPT-4o & Meta's Llama 3.1 70B.
While Claude-3.5-Sonnet performed very well, there were some edge cases that no amount of prompting could fix, and we also needed a more efficient model for production use. Therefore our goal was to find a smaller model that we could fine-tune to achieve better performance than Claude-3.5-Sonnet.
We went through a similar evaluation process with smaller LLMs such as Llama 3.1 8B, Phi 3.5, Mistral's family of models, and more. After running evaluations, we selected Llama 3.1 8B as our base model for fine-tuning. This base model slightly outperformed other small models on our eval suite and also offered excellent resource usage for future deployment.
To fine-tune Llama 3.1 8B we leveraged several great tools and platforms. The following diagram provides a high-level overview of our fine-tuning process:
To begin, we collected the input-output pairs from our work with Claude-3.5-Sonnet, employing a combination of manual human review and automated scripts to ensure high data quality. This data was then formatted into the Alpaca chat template compatible with Llama 3.1 models.
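For reference, the standard Alpaca template looks like this; a small helper maps each curated pair into a single training string:

```python
# The standard Alpaca prompt template; each curated Claude transcript is
# mapped into it as one training example.
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

def to_alpaca(example: dict) -> dict:
    # example carries "instruction", "input", and "output" keys.
    return {"text": ALPACA_TEMPLATE.format(**example)}
```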
When the dataset was ready, we loaded Unsloth into a Google Colab Jupyter notebook and fine-tuned a LoRA adapter on our high-quality dataset. The resulting adapter was then merged into the base model and pushed to Hugging Face for safe storage & later use.
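The core of the notebook follows Unsloth's standard Llama 3.1 recipe; the hyperparameters below are illustrative defaults rather than our production values:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the 4-bit quantised base model via Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B", max_seq_length=4096, load_in_4bit=True
)
# Attach a LoRA adapter to the usual attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # the Alpaca-formatted dataset from the previous step
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2, num_train_epochs=3, output_dir="outputs"
    ),
)
trainer.train()

# Merge the LoRA adapter into the base weights and push to a private repo.
model.push_to_hub_merged("aituning/lga-llama-3.1-8b", tokenizer,
                         save_method="merged_16bit")
```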
Post fine-tuning, we subjected our new model to our rigorous evaluation suite, benchmarking its performance against Claude-3.5-Sonnet. This process revealed areas where our dataset excelled and where it fell short. For scenarios where the model's performance was unsatisfactory, we augmented our dataset with synthetic data specifically crafted to address these edge cases, teaching the model how to handle such scenarios in future iterations.
We repeated this fine-tuning and evaluation cycle until our model consistently outperformed Claude-3.5-Sonnet across our test suite. Once we achieved this milestone, we proceeded to deployment, leveraging Runpod's easily accessible GPUs and streamlined deployment options to bring our fine-tuned model into production.
In production, when users flag suboptimal outputs, these cases are added to the evaluation suite, followed by augmentation of the dataset to show the model how to deal with this type of situation in the future.
This feedback loop ensures continuous improvement of the model's performance in the edge cases it has seen and also its ability to generalise on novel edge cases it encounters in production.
The lga system employs a sophisticated data flow and processing pipeline to deliver high-quality leads to B2B enterprise users. The sequence diagram below illustrates the interactions between the user, the lga system, the LLM, the Apollo API, and the user's CRM.
From the client's viewpoint, the process begins with submitting a form detailing their ideal client, business specifics, and the product or service they're offering. The lga system immediately responds with Campaign and Run IDs, initiating the lead generation process.
The client then enters a polling loop, periodically checking for updates. Once processing is complete, the system returns a paginated list of relevant organisations, complete with detailed information and potential decision-makers.
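From a client's perspective, the whole flow reduces to a few HTTP calls. A sketch where the endpoint paths and field names are hypothetical stand-ins for our actual API:

```python
import time
import requests

BASE = "https://api.example.com"  # stand-in for the lga API host

form = {"ideal_client": "...", "business": "...", "offering": "..."}
run = requests.post(f"{BASE}/campaigns", json=form).json()

# Poll until the run finishes, then page through the ranked organisations.
while requests.get(f"{BASE}/runs/{run['run_id']}").json()["state"] != "complete":
    time.sleep(5)

orgs = requests.get(
    f"{BASE}/campaigns/{run['campaign_id']}/organisations", params={"page": 1}
).json()
```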
When ready, the client can trigger the "Enrich & Export" process. This initiates an OAuth flow with their chosen CRM, ensuring secure authorisation. After successful authentication, the system enriches the lead data with email information and exports it to the CRM.
Behind the scenes, the lga system orchestrates a complex series of operations to generate and refine leads:
We were careful to leverage the power of LLMs and software only in the areas that they respectively excel at.
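Conceptually, that division of labour looks like the sketch below: deterministic work stays in plain code, and the LLM is reserved for judgement calls. The function names here are invented for illustration:

```python
def run_campaign(profile: dict) -> list[dict]:
    candidates = apollo_search(profile)  # deterministic: a plain API query
    # Hard, rule-based filters stay in ordinary code...
    candidates = [o for o in candidates if o["staff_count"] >= 5]
    # ...while fuzzy relevance judgements are delegated to the LLM.
    ranked = llm_rank_organisations(profile, candidates)
    for org in ranked:
        org["leads"] = llm_rank_staff(profile, org["staff"])
    return ranked
```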
It's important to note that at present, lga is not intended for consumer use. It functions solely as an internal tool for AiTuning to reach new clients and a case study to showcase our capabilities. This focus allows us to apply our learnings to our clients' projects and further refine the system over time.
If you have a generative AI project you would like to discuss, please contact us!
Best,
Dan Austin