API Architecture

The Structify API is a RESTful service that provides endpoints for:
  • Dataset Management: Create and configure dataset schemas
  • Entity Operations: Add, update, and query entities (nodes)
  • Relationship Management: Connect entities with typed relationships
  • Document Processing: Upload and structure unstructured data
  • Job Management: Track asynchronous processing tasks
  • Search & Query: Natural language and relationship-based search

Core Concepts

Datasets

A dataset is a container for your structured data. Each dataset has:
  • A unique name
  • A schema defining entity types and properties
  • Tables for different entity types
  • Relationships between entities
# Create a dataset
dataset = client.datasets.create(
    name="CompanyData",
    description="Company information and relationships"
)

Entities

Entities are the records in your dataset. Each entity:
  • Belongs to a specific table (entity type)
  • Has properties defined by the schema
  • Can be connected to other entities via relationships
# Add an entity
entity = client.entities.add(
    dataset_name="CompanyData",
    table_name="Company",
    entity={
        "Name": "Acme Corp",
        "Industry": "Technology"
    }
)

Relationships

Relationships connect entities in your dataset. They:
  • Have a type (e.g., “Owns”, “Partners With”)
  • Can have properties
  • Are directional (source → target)
# Create a relationship
client.entities.add_relationship(
    dataset_name="CompanyData",
    source_entity_id=company_id,
    target_entity_id=subsidiary_id,
    relationship_name="Owns",
    properties={"percentage": 100}
)

Jobs

Many operations are asynchronous and return a job. Jobs:
  • Track the progress of long-running operations
  • Can be queried for status
  • Return results when complete
# Start a job
job = client.documents.structure(
    document_id=doc_id,
    dataset_name="CompanyData"
)

# Check status
status = client.jobs.get(job_id=job.id)
if status.status == "completed":
    print("Job finished!")
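
The single status check above only reports the state at one moment. A minimal polling sketch follows, assuming the job status is exposed as a string as in the example. The helper name `wait_for_job` and the `"failed"` status are hypothetical and not taken from the API reference.

```python
import time
from typing import Callable

# Assumption: "completed" comes from the example above; "failed" is a
# hypothetical terminal status added for illustration.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

With the client from the example, this might be called as `wait_for_job(lambda: client.jobs.get(job_id=job.id).status)`. Passing the status lookup as a callable keeps the helper independent of the client and easy to test.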

Authentication

All API requests require authentication with an API key:
from structify import Structify

# Using environment variable (recommended)
client = Structify()  # Uses STRUCTIFY_API_TOKEN

# Or explicit API key
client = Structify(api_key="your_api_key")
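
When passing the key explicitly, it can still be read from the environment rather than written into code. The sketch below fails fast with a clear message if the variable is unset. The helper name `load_api_key` is hypothetical; only the `STRUCTIFY_API_TOKEN` variable name comes from the example above.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from STRUCTIFY_API_TOKEN, or raise if it is unset."""
    key = env.get("STRUCTIFY_API_TOKEN", "").strip()
    if not key:
        raise RuntimeError("STRUCTIFY_API_TOKEN is not set")
    return key
```

The result could then be passed as `Structify(api_key=load_api_key())`.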

Rate Limits

API requests are rate limited based on your plan:
Plan         Requests/min   Concurrent Jobs
Free         60             2
Pro          600            10
Enterprise   Custom         Custom

Error Handling

The API uses standard HTTP status codes, which the Python client raises as typed exceptions:
from structify import BadRequestError, NotFoundError

try:
    entity = client.entities.get(entity_id="invalid")
except NotFoundError as e:
    print(f"Entity not found: {e}")
except BadRequestError as e:
    print(f"Invalid request: {e}")

Pagination

List endpoints support pagination:
# Get paginated results
jobs = client.jobs.list(
    limit=10,
    offset=20
)

for job in jobs:
    print(f"Job {job.id}: {job.status}")
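
To walk through every page rather than a single one, the `limit` and `offset` parameters can be advanced in a loop. This is a sketch under two assumptions: each call returns only one page of results, and a page shorter than `limit` marks the end. The helper name `iter_all` is hypothetical.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_all(
    fetch_page: Callable[[int, int], Sequence[T]],
    limit: int = 100,
) -> Iterator[T]:
    """Yield every item by requesting pages of `limit` items at increasing offsets."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            return
        offset += limit
```

Here `fetch_page` would be a small wrapper around a list endpoint, taking `(limit, offset)` and returning the items for that page.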

Async Operations

For better performance, use the async client:
from structify import AsyncStructify
import asyncio

async def process_documents():
    client = AsyncStructify()

    # Upload multiple documents concurrently
    tasks = [
        client.documents.upload(file_path=f"doc{i}.pdf")
        for i in range(10)
    ]
    results = await asyncio.gather(*tasks)
    return results

# Run the coroutine from synchronous code
results = asyncio.run(process_documents())
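
Starting every request at once can run into request or concurrency limits. One common way to cap the number of calls in flight is an `asyncio.Semaphore`, sketched below. The helper name `gather_bounded` and the default of 2 are illustrative assumptions, not part of the client.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrent: int = 2,
) -> List[R]:
    """Run worker(item) for every item, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run_one(item) for item in items))
```

Inside a coroutine this might be used as `await gather_bounded(paths, lambda p: client.documents.upload(file_path=p))`, where `paths` is a list of file paths.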

WebSocket Events

Real-time updates are available via WebSocket:
# Subscribe to job events
async def listen_for_updates():
    async with client.websocket() as ws:
        async for event in ws:
            if event.type == "job.completed":
                print(f"Job {event.job_id} completed")

Best Practices

  1. Use Environment Variables: Store API keys in environment variables, never in code
  2. Batch Operations: Use batch endpoints when processing multiple items
  3. Handle Rate Limits: Implement exponential backoff for rate limit errors
  4. Monitor Jobs: Poll job status or use WebSockets for real-time updates
  5. Cache Results: Cache frequently accessed data to reduce API calls
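
The exponential backoff advice above can be sketched as a small retry wrapper. This is an illustration under stated assumptions, not the client's built-in behavior. `RateLimited` is a hypothetical stand-in for whatever exception the client raises on a rate limit response, and `with_backoff` is a made-up helper name.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error the client raises on HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (RateLimited,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on retry_on errors with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the computed delay.
            sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

To use it, pass the client's actual rate limit exception class via `retry_on` and wrap the request in a zero-argument callable. The random jitter spreads out retries so that many clients do not retry at the same instant.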

Next Steps