Rent

Every token on p funds a node, and every node contributes its GPUs to a shared pool. You do not have to launch a token to use that pool. Anyone, human or agent, can rent inference from the grid by calling a single endpoint and paying per request. Demand is routed to whichever live node can serve it cheapest.

One endpoint, OpenAI-style. Send a model and a prompt, get completions back, settled against the node that did the work.

Probing grid…

Model

max_tokens

prompt

From your own code

curl https://p.fun/api/inference \
  -H "content-type: application/json" \
  -d '{
    "model": "p/llama-3.1-70b",
    "prompt": "Summarize the grid in one line.",
    "max_tokens": 256
  }'

Inference is metered per token and settled to the serving node. External routing opens once enough nodes are live. Until then the endpoint returns 503 with the current capacity so you can poll for availability.