What the platform is built for
indices.io is designed for structured, deterministic work. The core unit is a task: a repeatable interaction on a specific website that can take parameters at runtime. You can compose tasks into richer workflows. For example, a personal-assistant app can maintain a catalogue of tasks (stored in a vector DB or registry) and have an agent select and parameterise them in response to user queries. Because tasks execute with very low latency, you can deliver a highly responsive experience. Workloads that are inherently unstructured (e.g. “research everything about Company X”) are unsuitable unless they operate over a well-defined, bounded set of sites (e.g. Yahoo Finance, Pitchbook).
What works well
- Structured lists of data
- Examples: stock gainers pages, event listings, job boards, product catalogues
- Predictable navigation flows
- Stable URLs, consistent pagination, obvious filters, and detail pages.
- SME and mid-market websites
- Lighter anti‑bot measures and simpler DOMs tend to be more reliable.
- Authenticated flows with standard login
- OAuth or conventional email/password forms without unusual device challenges.
Proven examples
- Yahoo Finance – Retrieve top gaining stocks
- Luma – Extract upcoming events
- Job boards — Poll for open positions (title, company, location, application URL)
- Octopus Energy — Invoice retrieval behind authenticated endpoints
Less likely to work
- Strong anti-bot/anti-automation protections
- Highly dynamic, enterprise-scale sites
- Roughly a 50/50 chance of success: some work well, others are too complicated or brittle. With complex sites, it is especially important to scope the task tightly (and atomically) and to write a precise description.
Examples
- kayak.co.uk flight retrieval and Booking.com: these currently require more data transfer than our infrastructure can handle reliably.
Designing high‑success tasks
- Choose the right page
- Prefer list/index pages with consistent item structure.
- Scope the task tightly
- One job per task (e.g. “list top gainers” or “export invoices”), not a multi-step workflow with complex branches.
- Keep tasks atomic (retrieve, create, update) and idempotent where possible.
- Write a precise task description
- State the objective, success criteria, and any constraints (such as the maximum number of items to retrieve or the pagination behaviour).
- All requirements of your task should be contained within the task description.
- The description must not contradict the input and output schemas. For example, if your description specifies that at most `limit` tasks should be fetched, but the input schema does not contain a `limit` parameter, the two are contradictory and success rates will be lower.
- Input and output schema shape
- Include only parameters that change between runs. Static parameters should instead be specified in the task description.
- Give strict types, bounds, and formats. For example, with dates, use the description to specify the format (such as `YYYY-MM-DD` or ISO 8601).
- Mark optional fields explicitly.
- Return only the data you need. If a website natively supports filtering, use this to reduce bandwidth.
- Consider pagination and stopping conditions
- Define limits such as `max_items` or `max_pages` to avoid unbounded retrieval.
- Authentication
- Task creation: include credentials in the task description for the first run.
- Subsequent runs: pass credentials via input parameters. Refer to the Octopus Energy example below.
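The rule that a description must not contradict its input schema can be checked mechanically before a task is submitted. The sketch below is illustrative only: the layout of the task dictionary, the convention of wrapping parameter names in backticks inside the description, and the helper name find_contradictions are all assumptions made for this example, not part of the indices platform.

```python
import re

# Hypothetical task definition; the field names are assumptions for this sketch.
task = {
    "description": (
        "Retrieve the top gaining stocks from the gainers list page. "
        "Return at most `limit` rows and stop after `max_pages` pages."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "limit": {"type": "integer", "description": "Maximum rows to return"},
        },
    },
}


def find_contradictions(task):
    """Return parameter names the description mentions but the input schema lacks."""
    # Assumed convention: parameters are written in backticks in the description.
    mentioned = re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", task["description"])
    declared = task["input_schema"].get("properties", {})
    missing = []
    for name in mentioned:
        if name not in declared and name not in missing:
            missing.append(name)
    return missing


print(find_contradictions(task))  # ['max_pages']
```

Here the description refers to a page limit that the schema never declares, so the helper reports it. Either add the parameter to the input schema or state the page limit as a fixed value in the description.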
Input and output schemas
In a way, indices generates an unofficial API for the website you’re integrating with. Just as an API endpoint takes certain input parameters and returns a structured response, a task takes inputs and returns structured output. In the indices platform, the input and output schemas are how you define this contract.
The input schema allows you to vary parameters in each invocation. For example, if you’re submitting an immigration application in a government portal, each execution will have different inputs (such as the name of the person), so these should be declared within the input schema.
The output schema lets you enforce a structure on the task’s output for the benefit of downstream dependencies.
Both schemas are declared using JSON Schema. We don’t support the full JSON Schema syntax. For the properties, only the following fields are supported:
- `type`: one of `string`, `integer`, `number`, `boolean`, `object`, or `array`
- `description`
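To make the contract concrete, here is a sketch of an input and output schema pair for a hypothetical invoice task, with a small helper that flags property fields outside the supported set (type and description). The property names, the top-level object wrapper, and the helper schema_problems are assumptions for this example; the platform reference remains the authority on the exact format.

```python
SUPPORTED_FIELDS = {"type", "description"}
SUPPORTED_TYPES = {"string", "integer", "number", "boolean", "object", "array"}

# Hypothetical schemas; only parameters that change between runs are inputs.
input_schema = {
    "type": "object",
    "properties": {
        "since": {
            "type": "string",
            # The date format is stated in the description, not in a separate field.
            "description": "Earliest invoice date, formatted YYYY-MM-DD",
        },
        "max_items": {
            "type": "integer",
            "description": "Maximum number of invoices to return",
        },
    },
}

output_schema = {
    "type": "object",
    "properties": {
        "invoice_count": {"type": "integer", "description": "Invoices found"},
        "total_amount": {"type": "number", "description": "Sum of invoice totals"},
    },
}


def schema_problems(schema):
    """List properties that use fields or types outside the supported set."""
    problems = []
    for name, spec in schema.get("properties", {}).items():
        extra = sorted(set(spec) - SUPPORTED_FIELDS)
        if extra:
            problems.append(f"{name}: unsupported fields {extra}")
        if spec.get("type") not in SUPPORTED_TYPES:
            problems.append(f"{name}: unsupported type {spec.get('type')!r}")
    return problems


# A property using the standard JSON Schema "format" keyword is flagged.
bad_schema = {"properties": {"since": {"type": "string", "format": "date"}}}
print(schema_problems(input_schema))  # []
print(schema_problems(bad_schema))    # ["since: unsupported fields ['format']"]
```

Running a check like this locally catches schemas that rely on JSON Schema keywords the platform does not accept, such as format, before the task is created.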
Sample task templates
Here are some example configurations we’ve found to be successful across a range of sites.
Yahoo Finance — Top gainers
Luma — Event retrieval
Job boards — Open positions
Octopus Energy — Invoice retrieval (authenticated)