ANALYSIS TOOL BOX

7 functions

Data Collection

Gather data from web, PDFs, and external APIs.

Pull data from external sources without building scrapers from scratch: websites, PDFs, SEC filings, Google search results, U.S. Census geographies, and ZIP archives.

Functions

FetchWebsiteText

Scrape a public webpage and return its cleaned text content.

from analysistoolbox.data_collection import FetchWebsiteText

text = FetchWebsiteText(url="https://example.com/article")
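To show roughly what "cleaning" a page involves, here is a minimal standard-library sketch. It is not analysistoolbox's implementation, and the names VisibleTextParser, html_to_text, and fetch_page_text are hypothetical. It keeps visible text nodes, drops script and style content, and collapses whitespace. It only sees the HTML the server returns, so pages that build their content with JavaScript would need a headless browser instead.

```python
import urllib.request
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping elements a browser would not display."""

    SKIP_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_page_text(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its cleaned text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))
```

For example, `html_to_text("<h1>Title</h1><script>var x = 1;</script><p>Body</p>")` keeps "Title" and "Body" and drops the script.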

ExtractTextFromPDF

Extract clean text from a local or remote PDF document.

from analysistoolbox.data_collection import ExtractTextFromPDF

text = ExtractTextFromPDF(file_path="report.pdf")

FetchPDFFromURL

Download a PDF from a URL to a local path.
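As an illustration of the download step, the sketch below streams a URL to disk with the standard library and checks the "%PDF-" signature that a well-formed PDF starts with, so an HTML error page is not silently saved as a .pdf. It is a hypothetical helper (download_pdf), not FetchPDFFromURL's actual code or signature.

```python
import shutil
import urllib.request
from pathlib import Path


def download_pdf(url: str, dest_path: str, timeout: float = 30.0) -> Path:
    """Save the PDF at `url` to `dest_path` and return the local path."""
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create target folder if needed
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        head = resp.read(5)
        # A well-formed PDF begins with "%PDF-"; reject anything else
        if head != b"%PDF-":
            raise ValueError(f"{url} did not return a PDF")
        with open(dest, "wb") as out:
            out.write(head)
            shutil.copyfileobj(resp, out)  # stream the remainder in chunks
    return dest
```

Streaming with `shutil.copyfileobj` avoids holding large reports in memory; the signature check costs only five bytes.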

GetCompanyFilings

Access SEC EDGAR filings programmatically — 10-Ks, 10-Qs, 8-Ks, and more.

from analysistoolbox.data_collection import GetCompanyFilings

filings = GetCompanyFilings(
    company_name="Apple Inc.",
    filing_type="10-K"
)
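GetCompanyFilings accepts a company name, but EDGAR itself indexes filers by Central Index Key (CIK). For readers curious about the underlying data, here is a standard-library sketch that queries EDGAR's public submissions endpoint directly. It is not how analysistoolbox implements GetCompanyFilings. It assumes you already know the CIK (Apple's is 320193) and that the response uses the column-oriented `filings.recent` layout, with parallel lists such as `form` and `accessionNumber`. The helper names are hypothetical. The SEC asks automated clients to send a descriptive User-Agent with contact details.

```python
import json
import urllib.request

# EDGAR submissions endpoint; the CIK is zero-padded to 10 digits
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def submissions_url(cik: int) -> str:
    return SUBMISSIONS_URL.format(cik=cik)


def fetch_submissions(cik: int, user_agent: str) -> dict:
    """Download a filer's submission history (hypothetical helper)."""
    req = urllib.request.Request(submissions_url(cik), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def filter_filings(submissions: dict, form_type: str) -> list:
    """Turn the parallel lists in filings.recent into rows of one form type."""
    recent = submissions["filings"]["recent"]
    rows = zip(
        recent["form"],
        recent["accessionNumber"],
        recent["filingDate"],
        recent["primaryDocument"],
    )
    return [
        {"form": form, "accessionNumber": acc, "filingDate": date, "primaryDocument": doc}
        for form, acc, date, doc in rows
        if form == form_type
    ]
```

Usage would look like `filter_filings(fetch_submissions(320193, "Your Name you@example.com"), "10-K")`.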

GetGoogleSearchResults

Fetch Google search results via the Serper API. Requires a SERPER_API_KEY environment variable.
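The section gives no example call for GetGoogleSearchResults, so here is a standard-library sketch of a raw Serper request. It is not that function's implementation. It assumes Serper's search endpoint at https://google.serper.dev/search, which takes a JSON body with a `q` field and an `X-API-KEY` header and returns organic hits under an `organic` key. The helper names are hypothetical. The key is read from SERPER_API_KEY, as the section requires.

```python
import json
import os
import urllib.request

SERPER_SEARCH_URL = "https://google.serper.dev/search"  # assumed endpoint


def build_serper_request(query: str) -> urllib.request.Request:
    """Build (but do not send) a Serper search request."""
    api_key = os.environ.get("SERPER_API_KEY")
    if not api_key:
        raise RuntimeError("Set the SERPER_API_KEY environment variable first")
    body = json.dumps({"q": query}).encode("utf-8")
    return urllib.request.Request(
        SERPER_SEARCH_URL,
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_organic(response: dict) -> list:
    """Keep title, link, and snippet from each organic result."""
    return [
        {"title": r.get("title"), "link": r.get("link"), "snippet": r.get("snippet")}
        for r in response.get("organic", [])
    ]


def search(query: str) -> list:
    with urllib.request.urlopen(build_serper_request(query), timeout=30) as resp:
        return parse_organic(json.load(resp))
```

Splitting request building from parsing lets you check both without spending API credits.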

FetchUSShapefile

Retrieve U.S. Census TIGER shapefiles for states, counties, tracts, or congressional districts.

from analysistoolbox.data_collection import FetchUSShapefile

gdf = FetchUSShapefile(geography="county", state="Virginia")
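TIGER/Line files are published as zipped shapefiles under a predictable www2.census.gov path. National files cover states and counties, and tract files are published per state by FIPS code (Virginia is 51). The sketch below builds those URLs. It is a hypothetical helper, not FetchUSShapefile's internals. It covers only three geographies, and its FIPS table is a deliberately partial sample. Check the paths against the Census directory listing for the vintage you need.

```python
from typing import Optional

TIGER_BASE = "https://www2.census.gov/geo/tiger/TIGER{year}"

# Partial, illustrative lookup of state FIPS codes; extend as needed
STATE_FIPS = {"California": "06", "Maryland": "24", "New York": "36", "Texas": "48", "Virginia": "51"}


def tiger_url(geography: str, state: Optional[str] = None, year: int = 2023) -> str:
    base = TIGER_BASE.format(year=year)
    if geography == "state":
        return f"{base}/STATE/tl_{year}_us_state.zip"
    if geography == "county":
        # One national file; filter on the STATEFP column after loading
        return f"{base}/COUNTY/tl_{year}_us_county.zip"
    if geography == "tract":
        if state is None:
            raise ValueError("tract files are published per state")
        return f"{base}/TRACT/tl_{year}_{STATE_FIPS[state]}_tract.zip"
    raise ValueError(f"unsupported geography: {geography}")
```

With geopandas installed, `geopandas.read_file(tiger_url("county"))` can usually load the zipped shapefile straight from the URL. Filtering `gdf[gdf["STATEFP"] == STATE_FIPS["Virginia"]]` then keeps Virginia's counties.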

GetZipFile

Download and extract a ZIP archive from a URL.
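GetZipFile has no example here, so the sketch below shows the same idea with only the standard library: download an archive, extract it into a folder, and return the member names. The helper name download_and_extract_zip is hypothetical and does not reflect GetZipFile's real signature. The archive is buffered in memory because `zipfile.ZipFile` needs a seekable file.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract_zip(url: str, dest_dir: str, timeout: float = 60.0) -> list:
    """Download a ZIP archive, extract it into dest_dir, return member names."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = io.BytesIO(resp.read())  # ZipFile requires a seekable stream
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(payload) as archive:
        # extractall strips absolute paths and ".." components from member names
        archive.extractall(dest)
        return archive.namelist()
```

For very large archives, writing the download to a temporary file instead of a `BytesIO` keeps memory use flat.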

Use cases

  • Competitive intelligence from public web sources
  • Financial analysis from SEC filings
  • Document processing pipelines
  • Geospatial analysis with Census boundaries