Brief
Spacecubed needed an easy way to collect large volumes of company data for their Echo platform. We built an internal AI-driven pipeline that automates the extraction of comprehensive company data from a user-provided URL. Users simply enter a company website URL, and the system crawls the site and consults external sources to compile key business information.
Problem
To gather company data for their Echo platform, Spacecubed employees had to browse the web and enter the information by hand. Building a complete company profile (contact email, phone number, physical address, LinkedIn profile, and map location) traditionally meant navigating multiple websites and searching Google for hours. This process was time-consuming, error-prone, and inefficient.
Solution
Figure 1: Diagram of the AI company data extraction agent.
Our solution follows these steps:
- The user enters a URL.
- The AI reads text from the supplied URL and begins extracting data.
- The AI visits other pages on the website (e.g., /about, /contactus, /ourteam) and extracts additional company data.
- For missing details, the AI searches Google for the company's address and LinkedIn profile. It uses the data it has already extracted to confirm that each Google result it selects refers to the same company.
- The AI identifies the Google Maps object for the company location.
- All aggregated company data is compiled and returned to the user.
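Two of the steps above lend themselves to a short illustration: building the list of on-site pages to visit, and checking that a search result refers to the same company as the data already extracted. The sketch below is a minimal TypeScript approximation, not the delivered implementation. The names `candidatePages`, `matchScore`, `pickBestMatch`, `KnownCompanyData`, and `SearchResult`, as well as the scoring weights, are all assumptions made for illustration; the real pipeline relies on an AI model for this judgment rather than fixed string rules.

```typescript
// Hypothetical types; field names are illustrative assumptions.
interface KnownCompanyData {
  name?: string;
  phone?: string;
}

interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Build the on-site pages to visit from the user-supplied URL.
function candidatePages(
  siteUrl: string,
  paths: string[] = ["/about", "/contactus", "/ourteam"]
): string[] {
  const origin = new URL(siteUrl).origin;
  return paths.map((path) => new URL(path, origin).href);
}

// Lowercase and collapse punctuation so names compare loosely.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Keep only digits so phone formats compare loosely.
function digitsOnly(text: string): string {
  return text.replace(/\D/g, "");
}

// Score how well a search result agrees with data already extracted.
// The weights (2 for name, 1 for phone) are arbitrary assumptions.
function matchScore(result: SearchResult, known: KnownCompanyData): number {
  const text = `${result.title} ${result.snippet} ${result.url}`;
  let score = 0;
  const name = known.name ? normalize(known.name) : "";
  if (name !== "" && normalize(text).indexOf(name) !== -1) {
    score += 2;
  }
  const phone = known.phone ? digitsOnly(known.phone) : "";
  if (phone.length >= 6 && digitsOnly(text).indexOf(phone) !== -1) {
    score += 1;
  }
  return score;
}

// Return the highest-scoring result, or undefined if none reaches minScore.
function pickBestMatch(
  results: SearchResult[],
  known: KnownCompanyData,
  minScore: number = 2
): SearchResult | undefined {
  let best: SearchResult | undefined;
  let bestScore = 0;
  for (const result of results) {
    const score = matchScore(result, known);
    if (score >= minScore && score > bestScore) {
      best = result;
      bestScore = score;
    }
  }
  return best;
}
```

Returning `undefined` when no result clears the threshold lets the caller leave a field empty rather than attach another company's details to the profile.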
This solution was delivered as a private npm package to be integrated with the existing Echo platform.
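From the consuming side, the most visible part of such a package is the shape of the profile it returns. The following TypeScript sketch shows one plausible shape, together with helpers for merging partial results from different sources and for listing the fields that still need an external lookup. The interface `CompanyProfile`, its field names, and the functions `mergeProfiles` and `missingFields` are hypothetical and do not describe the package's actual API.

```typescript
// Hypothetical profile shape; field names are assumptions, not the real API.
interface CompanyProfile {
  name?: string;
  email?: string;
  phone?: string;
  address?: string;
  linkedin?: string;
  mapLocation?: string;
}

type ProfileField = keyof CompanyProfile;

const ALL_FIELDS: ProfileField[] = [
  "name",
  "email",
  "phone",
  "address",
  "linkedin",
  "mapLocation",
];

// Merge partial profiles in order. Earlier sources win; later sources
// only fill fields that are still empty.
function mergeProfiles(...parts: CompanyProfile[]): CompanyProfile {
  const merged: CompanyProfile = {};
  for (const part of parts) {
    for (const field of ALL_FIELDS) {
      const value = part[field];
      if (merged[field] === undefined && value !== undefined && value.trim() !== "") {
        merged[field] = value.trim();
      }
    }
  }
  return merged;
}

// List the fields that are still empty and would need an external search.
function missingFields(profile: CompanyProfile): ProfileField[] {
  return ALL_FIELDS.filter((field) => profile[field] === undefined);
}
```

Letting earlier sources win reflects an assumption that the company's own website is more trustworthy than search results. A real integration might weigh sources differently.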
Result
Users receive a unified company profile, including name, contact email, phone number, address, LinkedIn profile, and map location, in seconds. This approach eliminates manual research across multiple sources, greatly improving efficiency and data accuracy for Spacecubed's Echo platform.