Always the same with every tech hype-train.
People start developing protocols, standards, and overengineered abstractions to get free PR and status. Since the AI hype started, we have seen so many concepts built upon the basic LLM, from Langchain to CoT chains to MCP to UTCP.
I even attended a conference where one of the speakers was adamant that you couldn't "chain model responses" until Langchain came out. Over and over again, we build these abstractions that distance us from the lower layers and the core technology, leaving people with huge knowledge gaps and misunderstandings.
And with LLMs, this cycle got quite fast, and its impact in the end is highly visible: these tools do nothing but poison your context, offer you less control over the response, and tie you into their ecosystem.
Every time I tried simply listing the available functions with a basic signature like:
    fn run_search(query: String, engine: String oneOf Bing, Google, Yahoo)
it provided better and more efficient results than poisoning the context with a bunch of tool definitions because "oooh tool calling works that way".
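Concretely, the setup is roughly this (a minimal sketch; `chat`, `user_request`, and the second signature are placeholders, not real code I'm shipping):

    # Plain-text function listing instead of JSON tool schemas.
    # `chat` stands in for whatever completion client you use.
    SIGNATURES = """\
    fn run_search(query: String, engine: String oneOf Bing, Google, Yahoo)
    fn fetch_page(url: String)
    """

    prompt = (
        "You can call exactly one of these functions. "
        'Reply with a single call, e.g. run_search("utcp docs", "Google").\n'
        + SIGNATURES
        + "\nUser request: " + user_request
    )
    response = chat(prompt)  # plain text comes back; parsing is on you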
Making a simple monad interface beats using Langchain by a wide margin, and you get to keep control over its implementation and design rather than having to use a design made by someone who doesn't see the pattern.
Keeping control over what goes into the prompt gives you way better control over the output. Keeping things simple gives you way better control over the flow and architecture.
I don't care that your favorite influencer says differently. If you go and build, you'll experience it directly.
How do you pull out the call? Parse the response? Deal with invalid calls? Encode and tie results to the original call? Deal with error states? Is it custom work to bring in each new API, or do you have common pieces dealing with, say, REST APIs or shelling out, etc.?
Lots of what you suggest as a better approach isn't project-specific.
If your setup keeps working better, then it's probably got a lot of common pieces that could be reused, right? Or do you write the parsing from scratch each time?
If it’s reused, then is it that different from creating abstractions?
As an aside - models are getting explicitly trained to use tool calls rather than custom things.
You parse it. Invalid calls you revalidate with a model of your choice. Parsing isn't a hard problem; it's easy, and you can parse whatever you want. I've been parsing responses from LLMs since the days of Ada and DaVinci, when they would just complete the text, and it really isn't that hard.
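The whole thing fits in a few lines (a sketch; `response`, `chat`, and `KNOWN_FUNCTIONS` are placeholders, and the argument splitting is deliberately naive - a real version would handle nested commas and quoting):

    import re

    CALL_RE = re.compile(r"(\w+)\((.*)\)", re.S)

    def parse_call(text):
        """Pull the first function-call-shaped span out of a response."""
        m = CALL_RE.search(text)
        if not m:
            return None
        name, raw = m.group(1), m.group(2).strip()
        args = [a.strip().strip('"') for a in raw.split(",")] if raw else []
        return name, args

    call = parse_call(response)
    if call is None or call[0] not in KNOWN_FUNCTIONS:
        # "revalidate": re-ask a model of your choice to emit a valid call
        call = parse_call(chat("Rewrite this as one valid call:\n" + response))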
> Deal with invalid calls? Encode and tie results to the original call? Deal with error states? Is it custom work to bring in each new API, or do you have common pieces dealing with, say, REST APIs or shelling out, etc.?
Why would any LLM framework deal with that? That is basic architecture 101. I don't want to stack another architecture on top of an existing one.
> If it's reused, then is it that different from creating abstractions?
Because you have control over the abstractions. You have control over what goes into the context. You have control over updating those abstractions and prompts based on your context. You have control over choosing your models instead of depending on models supported by the library or the tool you're using.
> As an aside - models are getting explicitly trained to use tool calls rather than custom things.
That's great, but they are also great at generating code, and guess what code does? It calls functions.
I'm not saying they're hard; I'm saying they're common problems that don't need solving each time. I don't re-solve title casing every time I need it.
> Because you have control over the abstractions.
And depending on what you’re using you have that with other libraries/etc.
> That's great,but also they are great at generating code and guess what the code does? Calls functions.
Yep, and a lot more, so it depends on how well you're sandboxing that, I guess.
That's all fine, but it should be noted that proper tool calling using the LLM's structured-response functionality guarantees a compliant response, because invalid tokens are culled as they're generated.
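For anyone unfamiliar, by structured response I mean the constrained-decoding APIs, along these lines (a sketch against an OpenAI-style client; the model name and exact parameter shape are placeholders - check your provider's docs):

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Search for the UTCP docs"}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "tool_call",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "arguments": {"type": "string"},
                    },
                    "required": ["name", "arguments"],
                    "additionalProperties": False,
                },
            },
        },
    )
    # Decoding is constrained to the schema, so json.loads() on the
    # message content cannot fail with a parse error.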
But now you're limited by the model, the provider, and the model's adherence to the output format.
While using structured outputs is great, it can have a large performance impact, and you lose control over it - i.e. having a smaller model via Groq fix the invalid response often works faster than having a large model generate a structured response.
Have 50 tools? It's faster and more precise to stack two small models, or do a search and pass in basic definitions for each, and have it output a function call, than to feed it all 50 tools defined as JSON.
While structured response itself is fine, it really depends on the use case and on the provider. If you can handle the loss of compute seconds, yeah, it's great. If you can't, then nothing beats having absolute control over your provider, model, and output choice.
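The two-stage version I mean looks roughly like this (a sketch; `small_model`, `BASIC_SIGNATURES`, and `user_request` are hypothetical names):

    # Stage 1: a small, fast model narrows 50 tools down to a shortlist.
    shortlist = small_model(
        "Pick the 3 most relevant tools for this request:\n"
        + "\n".join(BASIC_SIGNATURES)
        + "\nRequest: " + user_request
    )
    # Stage 2: only the shortlisted signatures enter the real prompt,
    # keeping the context small instead of feeding in 50 JSON schemas.
    call = small_model(
        "Call exactly one of:\n" + shortlist + "\nRequest: " + user_request
    )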
This approach makes sense when integrating LLMs directly into your application.
However, you still need a protocol for the reverse scenario: when your application needs to integrate with an LLM provider's interface.
For many applications, integrating into the user's existing chat interface is far more valuable than building a custom one. Currently, MCP is the leading option for this, though I haven't yet found any MCP implementations that are genuinely useful.
There are significant advantages to avoiding custom LLM integrations: users can leverage their existing LLM subscriptions, you maintain less code, and your product can focus on its core use case rather than building LLM interfaces.
While this approach won't suit every application, it will likely be the right choice for most.
While I might agree with your standpoint, how is this different from also influencing?
I've seen a lot of influencers suggest "100% assembly", "JavaScript only", "no SQL", which seem quite similar.
Technically yes, and I've caught myself in a bit of a paradoxical conundrum :)
I think there is a curve of "reason" to apply when someone is advocating something like this, especially about technology and abstractions.
While in most places adding abstractions to core technology makes sense, since "it makes it easier to use/manage/deploy", and it is reasonable to use them, LLMs are quite a different case than usual.
Because usually going downstream makes things harder (i.e. going 100% assembly or 100% JS is the harder thing), but going 100% pure LLM is the easier thing - you don't have to learn new frameworks, no need to learn new abstractions; it is shareable, easy to manage, and readable by everyone.
In this case, going upstream is what makes it harder, turns it into code management, makes it harder to reason about and adds inevitable complexity.
If you add a new person to your team and they see that you are using 100% assembly, they have to onboard to it, learn how it works, learn why it was done this way, etc.
If you add a new person to your team and they see that you are using all these tools and abstractions on top of LLMs, it's the same.
But if you are just using the core tech, they can immediately understand what is going on. No wrapped prompts, magic symbols, weird abstractions - "oh this is an agent but this is a chain while this is a retriever which is also an agent but it can only be chained to a non-retriever that uses UTCP to call it".
So as always, it is subjective and any advocacy needs to be applied to a curve of reason - in the end, does it make sense?
Soon, they will discover APIs (Application Programming Interfaces), where programs interact with each other through given contracts/protocols/prototypes.
/pun
Funnily enough, that's exactly what UTCP is. It's just a way to let your AI leverage existing APIs instead of adding any wrapper protocols (like MCP) for the tool call.
The protocol is only for discovery. It gets out of the way for the actual call
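To make that concrete, a UTCP-style manual looks roughly like this (illustrative only; the field names are from memory and may not match the current spec exactly):

    manual = {
        "tools": [{
            "name": "search",
            "description": "Full-text search over our docs",
            "inputs": {"query": {"type": "string"}},
            # The key idea: the manual points at existing infrastructure.
            "tool_provider": {
                "provider_type": "http",
                "url": "https://api.example.com/search",
                "http_method": "GET",
            },
        }]
    }
    # After discovery, the agent calls https://api.example.com/search
    # directly; no UTCP middleman sits on the request path.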
Never heard of it. Is API the thing we develop for MCPs to interact with?
yes but we need a lot of SOAP and XML. soon. /s :)
Then MSFT discovers COM.
That's fine. Wait until AI bros discover COM (Context Orchestration Mechanisms).
Sounds complex. We'll need an AI-enabled Google or Microsoft to figure this stuff out for us and sell as a cloud subscription service.
The common but lazy joke. This is about connecting agents, which have a specific set of requirements about the APIs they can call, with existing APIs.
I have the same feeling as in 2010-2015 with the JS ecosystem craziness. These half-baked ideas. Please, we need to stop; we don't need extra layers of abstraction and tooling that aren't really solving any problem. It's just ego-tripping to create something and get GitHub stars.
I disagree. I think it’s worthwhile trying these things out to see what’s working and what isn’t. We can sit and think and discuss all we want but sometimes you need to build something and try.
none of it is working; MIT just released a paper showing that the overwhelming majority of orgs adopting LLM workflows have seen no benefit
I think that the overwhelming majority of orgs are too inflexible to just insert LLM access, step back, and ask if it was an improvement.
I think you're more likely to see an effect if you can somehow capture how far a solo founder gets before they need to bring on the second employee. Because LLMs aren't better than me at my job, but they are better than me at many jobs which I can tolerate being done poorly.
If there is a significant technological shift, it'll be when those startups start outperforming the ones for which LLMs weren't available at the start.
nah, I am too old; I already know that this is not useful and that it will generate security issues in the short term.
Wow, must be amazing to be the first omniscient human being!
It's called experience. I never said I know everything. I know this is hype nonsense, that's all. Thank me later.
Right now there's only one option available for people creating agents that lets the users of those agents add capabilities through tools. It's MCP. And to be honest, it's not great, because it's very opinionated and requires an internet-wide infrastructure rebuild. UTCP is the only alternative. It takes a lightweight approach that, instead of forcing extra abstraction layers, uses existing protocols.
What's your issue with "half baked ideas"? Would you prefer that we go back to the 80s when everyone baked in a garage and only released fully-baked code? I don't remember things being better then.
And as for the js ecosystem craziness; while I agree that we went through a lot of craziness, I think it was necessary to arrive at the current relative stability around React and a handful of other frameworks in conversation with it. React in particular was originally released in 2013 and definitely benefitted from baking outside in the sun since then.
Maybe we can release a buzzword thing called Tool Calling Protocol. And it's just TCP.
Or Contextual Shared Variables
Or eXecution Model Language
or..
Context Sharing System and Heuristic Transformers Machine Learning are also promising.
Damn these Tool-related Language Additions.
Please fund my Extensible eXpertise Enhancements
In my opinion, there's a lot of unnecessary criticism here. The entire AI field is in a discovery phase, with new ideas, approaches, and methods emerging at many levels, including models and APIs. Right now, we don't even have a single standardized API for model responses (OpenAI's Chat Completions API is the de facto standard, but it's not formally standardized, lacks reasoning responses, and caching remains unclear, etc.). So some ideas from projects like this could stick and contribute something new.
Agreed! Right now, there's only one way to do tool calling, and that's MCP. I'm not happy with MCP, so I want alternatives, so that in the end we can all choose or create the best option. We're talking about the infrastructure of the future right now, and I don't want to rebuild everything that works just because MCP told me to.
Some developers love to implement Rube Goldberg machines as tooling.
Whenever they need to make two or three manual steps or configurations, they'd rather develop an abstraction layer where you just have to press one button to perform them.
Until that button needs to be accompanied by another manual action. Then they will again develop an abstraction which encapsulates the press of the button along with the augmenting action and is triggered by the pressing of a superior button.
Example: Docker->Kubernetes->Helm or any other tooling that uses YAML to write YAML.
Increased abstraction is why we don't all program in assembly any more.
Agreed, but my point is that an abstraction should provide a discernible gain in value, not merely roll a handful of operations of the same level into one.
Or, to re-word it in the words of the parent comment, that's why we don't write programs in assembler that output even more assembler.
The assumption that many decades-old tools might adopt a year-old protocol just to improve handling with agents.
That's optimism.
One really nice thing about using LLMs as components is that they just generate text. We've taught them to sometimes emit JSON messages representing a structured query or command, but it still comes out of the thing as text; the model doesn't have any IO or state of its own. The actual program can then decide what, if anything, to do with that structured request.
I don't like the gradual reframing of the model itself as being in charge of the tools, aided by a framework that executes whatever the model pumps out. It's not good to abstract away the connection between the text-generator and the actual volatile IO of your program.
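In other words, roughly this shape, where the model never touches IO itself (the function names, `model`, `prompt`, and `parse_call` are hypothetical):

    # The model only ever produces text; the program owns all IO.
    ALLOWED = {"run_search": run_search, "fetch_page": fetch_page}

    text = model(prompt)        # pure text out, no side effects
    call = parse_call(text)     # a structured request, but still just data
    if call and call[0] in ALLOWED:
        result = ALLOWED[call[0]](*call[1])  # the *program* decides to act
    else:
        result = None           # or log it, refuse, ask the model again...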
What kind of abstractions around LLM I/O would you prefer?
Initial commit: Jun 24, 2025. Already has a migration path from 0.x to 1.x.
Why am I feeling so old now?
On a more serious note, do any models support this?
> Initial commit: Jun 24, 2025. Already has a migration path from 0.x to 1.x.
because that is perfectly reasonable to the LLM that wrote that README
we should have age verification to post on HN
By not using a server as an interface, UTCP more tightly couples the caller to the tool. For example, a search MCP server could make the search provider an optional parameter to be dynamically selected without breaking the contract. With UTCP you cannot do that.
Given that tool calling usually does not constitute the bulk of a typical LLM query's time, I think optimizing the latency by eliminating the interface is an unwise architectural choice. Interfaces are good things.
It's not about latency. It's about security and rebuilding existing infrastructure. There are plenty of communication protocols to get anything you can imagine done on the internet. We should use those and write our interfaces and wrappers using those, not MCP. That is what UTCP does. It allows agents to use existing infra, communication protocols and security.
This is insane! We've been training models to write and edit code, and now this? These guys should read this (CodeAct) paper https://arxiv.org/abs/2402.01030
> LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
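The gist of CodeAct in loop form (my sketch, not the paper's code; `model`, `task`, `MAX_TURNS`, and `sandbox_globals` are placeholders, and `exec` here stands in for a real sandbox):

    import io
    import contextlib

    history = [{"role": "user", "content": task}]
    for _ in range(MAX_TURNS):
        action = model(history)  # the model emits Python source as its "action"
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(action, sandbox_globals)  # must be truly sandboxed in practice
            observation = buf.getvalue()
        except Exception as e:
            observation = f"Error: {e}"  # errors feed back as observations
        history.append({"role": "user", "content": observation})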
We probably don’t need another tool calling protocol unless it is also a tool composition protocol.
Armin Ronacher has recently been making some good points about tool composition: https://lucumr.pocoo.org/2025/8/18/code-mcps/
hulloo I think this is really good feedback and something we'll look into, thank you for sharing!
I promise, it's the last one we'll ever need https://en.wikipedia.org/wiki/The_Last_One_%28software%29
Cmd+F "modern"
"The Universal Tool Calling Protocol (UTCP) is a modern, flexible, and scalable standard for defining and interacting with tools across a wide variety of communication protocols."
Into the trash it goes.
hey man, I get it, we're using fancy words
But we're real people working on a real protocol that we feel improves on the status quo, and calling it 'trash' because it doesn't fit your vernacular is unnecessary, and dare I say hurtful
So pls, if you have feedback, happy to improve on it, but don't just dismiss it
But there is a grain of truth to their commentary. "Modern" gets old fast; it probably shouldn't be used, just like "new" shouldn't be used in project/library names.
IMHO, removing "modern, flexible and scalable" would improve that line.
I've diagonally read through the README, and for my taste there is overuse of "soft" adjectives (simple, easy, ...) which makes me (an engineer) discard it as marketing-driven.
as is tradition, the younger generation of devs are going to get burned by hype vaporware and create a whole ecosystem of orphaned protocols and tools
reminds me of the early 2000s and all the NoSQL trash
It has "universal" in the name so this must be it, right?
You know I really don’t get it. I must be missing something obvious.
Any “tool protocol” is really just a typed function interface.
For decades, we’ve had dozens (hundreds? thousands?) of different formats/languages to describe those.
Why do we keep making more?
Does it really matter who is calling the function and for which purpose? Does it matter if it’s implemented by a server or a command line executable? Does the data transport protocol matter? Does “model” matter?
Here's your tool protocol bro, good job! 3k GitHub stars for you sir!!!
If you were to try to implement what you described, you'd figure out what you missed quickly. Namely, that you have to interface your interface to a text interface.
I don't want to be trite, but terminals and things like MUDs and their precursors have been interfacing with humans via text for 40 years.
Huh? This is all text
So this basically says either have the tool implement this interface or write a wrapper (which would be the equivalent of writing an MCP server), right? So this is slightly more general than MCP, or am I missing something?
Why is everyone complaining? This is great and a much-needed improvement over MCP. MCP is not yet the "definitive answer"; perhaps there's time to replace it with something better.
Hi HN,
UTCP Contributor here.
I understand your frustration with people adding abstractions on top of existing infrastructure for no reason, but UTCP is actually trying to avoid exactly that.
Picture the following scenario. You create an agent that your users should use in their daily life. Now you want your users to be able to do more than generate text. A.k.a. call tools.
How do you do this?
Option 1: you implement your own infrastructure allowing them to call whatever tools your custom infra supports. -> less flexibility on the user side regarding which tools they can use
Option 2: you use an MCP client. Now your users can add any tool they want as long as it has MCP support. This however adds an extra layer of infra. Agent -> MCP Server -> Original Tool (maybe a REST API, or a CLI command, etc.)
Option 3: you use the UTCP client. Now your users can add any tool that uses any communication protocol. Agent -> Original tool (maybe a REST API, CLI command, MCP server, etc.)
TLDR, UTCP is about security and not rebuilding existing infrastructure. There are plenty of communication protocols to get anything you can imagine done on the internet. We should use those and write interfaces and wrappers using those, not MCP. That is what UTCP does. It allows agents to use existing infra, communication protocols and security.
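In client terms, it looks something like this (a hypothetical sketch; `UtcpClient` and the method names are illustrative, not necessarily the real library's API):

    import asyncio

    async def main():
        # Load manuals describing tools and their native endpoints.
        client = await UtcpClient.create(providers_file="providers.json")
        tools = await client.search_tools("web search")          # discovery
        result = await client.call_tool("web.search", {"query": "utcp"})
        # The call goes straight to the tool's native endpoint (HTTP,
        # CLI, ...); no UTCP server sits in between.

    asyncio.run(main())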
We're fully community-driven OSS, and just aiming to make a protocol that is useful for people
Hope this makes sense and happy building <3
Will UTCP work with OpenAI models despite their absence of general support for MCP?
In general, models themselves don't support MCP; it's the wrappers around models that do.
I'm tired boss [meme]
Obligatory XKCD: https://xkcd.com/927/
maybe we could implement a UTCP->REST bridge for another unnecessary abstraction layer..
/s