This guide summarises what works best with indices.io and how to design tasks that run fast and reliably. It covers the patterns we support, examples that work well, cases that are challenging, and practical guidance for crafting high‑success tasks. As a general rule, you can expect high success rates on SME websites, dashboards and internal systems, and mixed success rates on enterprise sites with lots going on. In particular, complex anti-bot protection or very large volumes of data transfer will cause problems.

What the platform is built for

indices.io is designed for structured, deterministic work. The core unit is a task: a repeatable interaction on a specific website that can take parameters at runtime. You can compose tasks into richer workflows. For example, a personal‑assistant app can maintain a catalogue of tasks (stored in a vector DB or registry) and have an agent select and parameterise them in response to user queries. Because tasks execute with very low latency, you can deliver a highly responsive experience. Workloads that are inherently unstructured (e.g. “research everything about Company X”) are unsuitable, unless they operate over a well‑defined, bounded set of sites (e.g. Yahoo Finance, Pitchbook).
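
The catalogue idea above can be pictured with a small sketch. Everything here is hypothetical and for illustration only: the `TaskEntry` class, the `select_task` function and the task names are not part of indices.io, and plain word overlap stands in for the vector-DB lookup a real agent would use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskEntry:
    """One catalogue entry: a task name, a description, and its parameter names."""
    name: str
    description: str
    parameters: List[str] = field(default_factory=list)


def select_task(catalogue: List[TaskEntry], query: str) -> Optional[TaskEntry]:
    """Pick the entry whose description shares the most words with the query.

    Returns None when no entry shares any word with the query.
    """
    query_words = set(query.lower().split())
    best, best_score = None, 0
    for entry in catalogue:
        score = len(query_words & set(entry.description.lower().split()))
        if score > best_score:
            best, best_score = entry, score
    return best


# Hypothetical catalogue with two registered tasks.
catalogue = [
    TaskEntry("top_gainers", "retrieve top gaining stocks", ["max_items"]),
    TaskEntry("upcoming_events", "list upcoming events in london", ["limit"]),
]

chosen = select_task(catalogue, "which stocks are gaining today")
print(chosen.name if chosen else "no matching task")
```

Once a task is chosen, the agent only has to fill in the parameters listed on the entry, which is what keeps the workflow bounded.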

What works well

  • Structured lists of data
    • Examples: stock gainers pages, event listings, job boards, product catalogues
  • Predictable navigation flows
    • Stable URLs, consistent pagination, obvious filters, and detail pages.
  • SME and mid-market websites
    • Lighter anti‑bot measures and simpler DOMs tend to be more reliable.
  • Authenticated flows with standard login
    • OAuth or conventional email/password forms without unusual device challenges.

Proven examples

  • Yahoo Finance – Retrieve top gaining stocks
  • Luma – Extract upcoming events
  • Job boards – Poll for open positions (title, company, location, application URL)
  • Octopus Energy – Invoice retrieval behind authenticated endpoints
Full task configurations are provided below, to serve as a reference of what we’ve seen work well.

Less likely to work

  • Strong anti-bot/anti-automation protections
  • Highly dynamic, enterprise-scale sites
    • 50/50 chance of success. Some work well, others are too complicated or brittle. When trying with complex sites, it’s especially important to scope the task tightly (and atomically) and write a precise description.
If your use case needs one of the above, reach out — we’re actively expanding coverage for tougher environments, and would love to hear about your needs.

Examples

  • kayak.co.uk flight retrieval and Booking.com: these currently transfer too much data for our infrastructure to handle reliably.

Designing high‑success tasks

  • Choose the right page
    • Prefer list/index pages with consistent item structure.
  • Scope the task tightly
    • One job per task (eg, “list top gainers” or “export invoices”), not a multi-step workflow with complex branches.
    • Keep tasks atomic (retrieve, create, update) and idempotent where possible.
  • Write a precise task description
    • State the objective, success criteria, and any constraints (such as: max items to retrieve, pagination behaviour).
    • All requirements of your task should be contained within the task description.
    • The description must not contradict the input and output schemas. For example, if your description specifies that at most limit items should be fetched, but the input schema does not contain a limit parameter, this is contradictory and will lead to lower success rates.
  • Input and output schema shape
    • Include only parameters that change between runs. Static parameters should instead be specified in the task description.
    • Give strict types, bounds, and formats. For example, with dates, use the description to specify the format (such as YYYY-MM-DD or ISO 8601).
    • Mark optional fields explicitly.
    • Return only the data you need. If a website natively supports filtering, use this to reduce bandwidth.
  • Consider pagination and stopping conditions
    • Define limits such as max_items or max_pages to avoid unbounded retrieval.
  • Authentication
    • Task creation: include credentials in the task description for the first run.
    • Subsequent runs: pass credentials via input parameters. Refer to the Octopus Energy example below.
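
One way to catch a description that contradicts its input schema before creating a task is a small check like the sketch below. It assumes a convention (borrowed from the sample templates, not enforced by the platform) that parameters are written in backticks inside the description; the function name `undeclared_parameters` is hypothetical.

```python
import re
from typing import List


def undeclared_parameters(description: str, input_schema: dict) -> List[str]:
    """Return backticked names in the description that the input schema lacks."""
    declared = set(input_schema.get("properties", {}))
    referenced = re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", description)
    # Preserve first-seen order while dropping duplicates.
    missing = []
    for name in referenced:
        if name not in declared and name not in missing:
            missing.append(name)
    return missing


description = "Keep fetching events until `limit` events have been fetched."
schema_ok = {"type": "object", "properties": {"limit": {"type": "integer"}}}
schema_bad = {"type": "object", "properties": {}}

print(undeclared_parameters(description, schema_ok))   # []
print(undeclared_parameters(description, schema_bad))  # ['limit']
```

A non-empty result means the description mentions a parameter that callers cannot actually supply.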

Input and output schemas

In a way, indices generates an unofficial API for the website you’re integrating with. Think of an API endpoint that takes certain input parameters and returns a structured response. In the indices platform, the input and output schemas are how you define this contract. The input schema allows you to vary parameters in each invocation. For example, if you’re submitting an immigration application in a government portal, each execution will have different inputs (such as the name of the person), so these should be declared within the input schema. The output schema lets you enforce a structure for the task’s output, so that downstream dependencies can rely on it. Both schemas are declared using JSON Schema. We don’t support the full JSON Schema syntax. For the properties, only the following fields are supported:
  • type: either string, integer, number, boolean, object or array
  • description
If a field is not required, you may also specify a default. Properties may be declared as required or optional in the usual way. Specific examples of legal schemas are given below. The dashboard gives you an easy way to build input and output schemas using a friendly editor.
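
To make the contract concrete, here is a minimal sketch of how a downstream consumer might check a task's output against an output schema. It is not the platform's validator: it only understands `type`, `properties`, `required` and `items`, and the helper names `matches_type` and `validate` are hypothetical.

```python
from typing import List


def matches_type(value, type_name: str) -> bool:
    """Check a Python value against one of the six supported type names."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "object":
        return isinstance(value, dict)
    if type_name == "array":
        return isinstance(value, list)
    return False


def validate(value, schema: dict, path: str = "$") -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    expected = schema.get("type")
    if expected is not None and not matches_type(value, expected):
        return [f"{path}: expected {expected}"]
    problems = []
    if expected == "object":
        for name in schema.get("required", []):
            if name not in value:
                problems.append(f"{path}.{name}: missing required field")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                problems.extend(validate(value[name], sub, f"{path}.{name}"))
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            problems.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return problems


output_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"ticker_symbol": {"type": "string"}, "price": {"type": "number"}},
        "required": ["ticker_symbol", "price"],
    },
}

good = [{"ticker_symbol": "ABC", "price": 12.5}]
bad = [{"ticker_symbol": "ABC", "price": "12.5"}, {"price": 3}]

print(validate(good, output_schema))  # []
print(validate(bad, output_schema))
```

For the `bad` sample, the check reports the string-valued price in the first item and the missing `ticker_symbol` in the second.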

Sample task templates

Here are some example configurations we’ve found to be successful, across a range of sites.
{
  "display_name": "Yahoo Finance — Top Gainers (UK)",
  "website": "https://uk.finance.yahoo.com/markets/stocks/gainers/",
  "task": "Retrieve a list of the top gaining stocks.",
  "input_schema": {
    "type": "object",
    "properties": {
      "max_items": {
        "type": "number",
        "description": "Number of stocks to retrieve"
      }
    },
    "required": ["max_items"]
  },
  "output_schema": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "price": {
          "type": "number"
        },
        "volume": {
          "type": "number"
        },
        "ticker_symbol": {
          "type": "string"
        },
        "change_percent": {
          "type": "number"
        },
        "change_absolute": {
          "type": "number"
        }
      },
      "required": ["ticker_symbol", "name", "price", "change_absolute", "change_percent", "volume"]
    }
  }
}
{
  "display_name": "Luma — AI Events",
  "website": "https://luma.com/ai",
  "task": "Get a list of events in the Luma AI calendar (https://luma.com/ai), which are in London.\n\nKeep fetching events until `limit` number of events has been fetched, or we've reached the end of the list.",
  "input_schema": {
    "type": "object",
    "properties": {
      "limit": {
        "type": "integer",
        "default": 300,
        "minimum": 1,
        "description": "Maximum number of events to return."
      }
    }
  },
  "output_schema": {
    "type": "array",
    "items": {
      "type": "object",
      "title": "Event",
      "properties": {
        "url": {
          "type": "string",
          "format": "uri",
          "description": "Link to the event page."
        },
        "date": {
          "type": "string",
          "format": "date-time",
          "description": "ISO 8601 date and time of the event."
        },
        "title": {
          "type": "string",
          "description": "Title of the event."
        },
        "location": {
          "type": "string",
          "description": "Location where the event takes place."
        }
      },
      "required": ["title", "url", "date", "location"]
    }
  }
}
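
The stopping rule in the Luma task description (stop at `limit` items or at the end of the list) can be pictured as the loop below. This is only an illustration of the rule, not how indices.io executes tasks; `fetch_until`, `fake_fetch_page` and the optional `max_pages` cap are hypothetical names.

```python
from typing import Callable, Optional


def fetch_until(fetch_page: Callable[[int], list], limit: int,
                max_pages: Optional[int] = None) -> list:
    """Collect items page by page until `limit` items, an empty page, or `max_pages`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    items = []
    page = 0
    while len(items) < limit:
        if max_pages is not None and page >= max_pages:
            break
        batch = fetch_page(page)
        if not batch:  # An empty page signals the end of the list.
            break
        items.extend(batch)
        page += 1
    # The last page may overshoot, so trim to the requested limit.
    return items[:limit]


# Hypothetical data source: 7 events served 3 per page.
EVENTS = [f"event-{i}" for i in range(7)]


def fake_fetch_page(page: int) -> list:
    return EVENTS[page * 3:(page + 1) * 3]


print(len(fetch_until(fake_fetch_page, limit=5)))               # 5
print(len(fetch_until(fake_fetch_page, limit=300)))             # 7
print(len(fetch_until(fake_fetch_page, limit=300, max_pages=1)))  # 3
```

Stating both conditions in the task description, as the template does, is what keeps retrieval bounded when the list is shorter or longer than expected.
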
{
  "display_name": "Jobs - The Keystone Group",
  "website": "https://apply.workable.com/interpath-advisory/",
  "task": "Get a list of open positions at the company.\n\nKeep fetching jobs until `limit` number of jobs has been fetched, or we've reached the end of the list.",
  "input_schema": {
    "type": "object",
    "properties": {
      "limit": {
        "type": "integer",
        "default": 300,
        "minimum": 1,
        "description": "Maximum number of jobs to return."
      }
    }
  },
  "output_schema": {
    "type": "array",
    "items": {
      "type": "object",
      "title": "JobListing",
      "required": ["job_name", "company"],
      "properties": {
        "job_id": {
          "type": "string",
          "description": "Job reference or ID code."
        },
        "company": {
          "type": "string",
          "description": "Name of the hiring company or organization."
        },
        "job_name": {
          "type": "string",
          "description": "Title or name of the job."
        },
        "location": {
          "type": "string",
          "description": "What city or country the job is based in."
        },
        "closing_date": {
          "type": "string",
          "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
          "description": "Date the job closes (YYYY-MM-DD)."
        },
        "opening_date": {
          "type": "string",
          "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
          "description": "Date the job opened (YYYY-MM-DD)."
        }
      }
    },
    "title": "JobListingsOutput"
  }
}
{
  "display_name": "Octopus Energy — Invoices",
  "website": "https://octopus.energy/",
  "task": "Return a list of my invoices at Octopus Energy. Each invoice should include the invoice's unique identifier, bill type, to and from dates, and the URL to download the pdf.\n\nDuring task creation only, use the following credentials to log in:\n\n- Login username: <SNIP>\n- Login password: <SNIP>",
  "input_schema": {
    "type": "object",
    "properties": {
      "n": {
        "type": "number",
        "default": 5
      },
      "login_password": {
        "type": "string"
      },
      "login_username": {
        "type": "string"
      }
    },
    "required": ["login_username", "login_password"]
  },
  "output_schema": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string",
          "description": "Invoice ID generated by Octopus"
        },
        "to_date": {
          "type": "string",
          "description": "End date of the invoice/bill"
        },
        "bill_type": {
          "type": "string",
          "description": "electricity or gas"
        },
        "from_date": {
          "type": "string",
          "description": "Start date of the invoice/bill"
        },
        "invoice_pdf_url": {
          "type": "string"
        }
      },
      "required": ["id", "bill_type", "from_date", "to_date", "invoice_pdf_url"]
    }
  }
}
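
For subsequent runs of a task like the Octopus Energy one, the caller has to assemble the input parameters, including credentials. The sketch below shows one way to do that on the client side, assuming credentials live in environment variables. The `build_inputs` helper and the variable names `OCTOPUS_USERNAME` and `OCTOPUS_PASSWORD` are hypothetical, and how the resulting payload is submitted to indices.io is deliberately left out.

```python
import os


def build_inputs(input_schema: dict, supplied: dict) -> dict:
    """Merge supplied values with schema defaults and check required fields."""
    properties = input_schema.get("properties", {})
    unknown = sorted(set(supplied) - set(properties))
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    # Start from declared defaults, then let supplied values override them.
    inputs = {name: spec["default"] for name, spec in properties.items() if "default" in spec}
    inputs.update(supplied)
    missing = [name for name in input_schema.get("required", []) if name not in inputs]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return inputs


# Input schema copied from the Octopus Energy template above.
octopus_input_schema = {
    "type": "object",
    "properties": {
        "n": {"type": "number", "default": 5},
        "login_password": {"type": "string"},
        "login_username": {"type": "string"},
    },
    "required": ["login_username", "login_password"],
}

# Hypothetical environment variable names; placeholders are used if unset.
inputs = build_inputs(octopus_input_schema, {
    "login_username": os.environ.get("OCTOPUS_USERNAME", "user@example.com"),
    "login_password": os.environ.get("OCTOPUS_PASSWORD", "placeholder"),
})
print(sorted(inputs))  # ['login_password', 'login_username', 'n']
```

Keeping credentials out of source code and passing them only at run time matches the guidance in the Authentication bullet.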