OpenAI Archives - ShiftMag

OpenAI Killed Off Cheap ChatGPT Wrappers… Or Did It?

Senko Rasic — Mon, 13 Oct 2025 10:05:15 +0000

In one of the major announcements at their Dev Day conference last week, OpenAI unveiled AgentKit, a new suite of tools designed to make it easier to build agentic workflows.

What does this mean for anyone building products on top of the OpenAI platform?

Is OpenAI competing with us?

Should we be excited, worried, or just ignore the hype?

Let’s dive in.

What tools are in the AgentKit?

AgentKit isn’t a single product – it’s a set of tools designed to work together seamlessly.

It builds on OpenAI’s existing Agents SDK, adding a visual no-code Agent Builder, out-of-the-box UI support with ChatKit, and simple integration for file search, web search, and external MCP servers.

Agent Builder is a visual workflow orchestration tool, similar to n8n, Langflow, and others.

Starting from an initial user input, you add nodes to a graph, each node representing an action or workflow step. The key one is the Agent node, which invokes the OpenAI model of your choice. Alongside LLM instructions and input data, the Agent node can access external data from file storage, vector databases, MCP connections, or web search.

If you’ve used the OpenAI Assistants API or Agents SDK, this will sound familiar. The Agent Builder is simply a more user-friendly interface for building the same functionality. You can even download your workflow as Python or TypeScript source code using the Agents SDK and run it locally.

This makes it great for rapid prototyping, but you can also publish (deploy) your workflow and invoke it from the client via – you guessed it – the Agents SDK.

Compared to tools like n8n, the Agent Builder has fewer options and focuses exclusively on AI workflows. However, it’s tightly integrated with the rest of the OpenAI platform and free to use – you only pay for LLM tokens.

ChatKit, a React-based UI component framework, is another new addition. It makes it easy to create chatbot-style UIs for agentic workflows without needing a dedicated frontend team. ChatKit provides a basic chat interface and supports custom widgets, which can even be uploaded to Agent Builder in a low-code fashion.

The good, the bad, and the (not-so) ugly news

AgentKit is great news for teams building in-house AI tools, especially for non-devs.

While it still needs some developer setup for production, iterating on workflows, prompts, and agent behavior is completely no-code. It’s also a powerful prototyping tool for product owners exploring new AI features or creating quick proofs of concept.

For AI solution builders, AgentKit will likely make a lot of existing chatbot code obsolete. Does that make you obsolete? Only if your product is a simple “chat with your documents” wrapper. If that’s the case, the writing’s been on the wall for a while.

But if your product has complex domain logic, your workflow design and instructions are your real value. That’s the hard part – the code is just an implementation detail. In that case, AgentKit frees you from boilerplate and lets you focus on the high-value work. That’s good news!

The main caveat: building on AgentKit ties you to the OpenAI platform. With the upgraded API, Agents SDK, and now AgentKit, OpenAI is clearly moving up the API value chain.

The original LLM API has become a de facto standard, making it easy to swap in other LLMs like Claude, which made the models somewhat of a commodity. But using AgentKit makes it harder to switch later, since you’d have to reimplement many components. Not necessarily a problem, but something to keep in mind.

Hot or not?

Does AgentKit spell doom for developers? No.

Like other workflow automation and low-code tools, it’s not replacing devs anytime soon. If anything, it’ll save you from writing repetitive boilerplate or endless tweaks requested by product owners and non-tech teammates. Your job is safe – maybe even less tedious.

It will probably kill off a few cheap ChatGPT wrappers. But the more interesting ones – those with domain expertise, specialized logic, and proprietary prompts – will be fine and could benefit.

AgentKit is an incremental but important update. If you’re building any kind of AI-enabled product – whether a quick prototype, an internal tool, or a new product – it’s worth checking out.

The post OpenAI Killed Off Cheap ChatGPT Wrappers… Or Did It? appeared first on ShiftMag.

OpenAI Drops GPT-OSS, But Can It Reclaim the Open LLM Crown?

Senko Rasic — Mon, 11 Aug 2025 12:08:00 +0000

A few days ago, OpenAI released GPT-OSS, a new open-weights model (its first since 2019), in an attempt to take the state-of-the-art crown for open LLMs from its Chinese competitors.

You can be excused if that sentence makes you dizzy.

Previously, in the World of LLMs…

OpenAI, the American company behind ChatGPT, was created as a non-profit research lab back in 2015. While it initially published its research and models openly (GPT-2, Whisper), after striking gold with ChatGPT, OpenAI stopped publishing its models, citing safety reasons.

The situation changed with an accidental leak of the Llama model by Meta (Facebook’s parent company). Although it was less capable than OpenAI’s closed models, it was miles ahead of GPT-2 and the smaller, less-capable open models published by various university labs. Llama unleashed a storm of open-source activity, both in infrastructure (how to run the models) and in research (fine-tuning and customizing the models).

To its credit, Meta encouraged this adoption instead of trying to stifle it and published later models under an explicit open license.

Open models continued to be an interesting side story until January of this year, when a Chinese company called DeepSeek stunned everyone by releasing DeepSeek R1, a competitive open model trained for a fraction of the cost of US AI companies. The Chinese labs Qwen and Kimi followed with similar, also open, models.

The DeepSeek moment stunned American AI companies. The quick pace of Chinese AI progress and the massive uptake, due to the models being open, led some to worry that China is about to surpass the US in AI technology, arguing that US companies should follow suit. The recently published US government AI Action Plan also aims to “encourage open-source and open-weights AI.”

This brings us to last week, when OpenAI released a long-promised open-weights model of its own, GPT-OSS. While it is not on par with the best OpenAI, Anthropic, or Google models, its release acknowledges that open models are here to stay.

Not so open-source after all

What makes a large language model (LLM) open, and why should we care?

In contrast to closed models (like GPT, Claude, and Gemini), which can only be used via an official API, anyone can run open models on their own infrastructure or on third-party infrastructure providers. The architecture of open models can be analyzed, and researchers from other AI labs can learn from their design choices.

To draw a parallel with open-source software, an open-source model would publish the training and inference source code (the “engine”), the model weights (the result of training, akin to compiled code for desktop or mobile apps), and the training data (the source data the model was trained on) under an open and permissible license, like MIT or Apache.

The source code is the least controversial part: there are many high-quality open-source LLM tools that support a wide variety of models, like llama.cpp, VLLM, Hugging Face, and LM Studio. Support for new open models is usually added within days of their publication.

The situation for model weights is a bit trickier. Many labs publish these under licenses that limit usage for potential competitors, restrict certain uses, or even ban usage in certain parts of the world. Meta has notoriously claimed its Llama models are “open-source” while using such a restrictive license. DeepSeek and Qwen also add restrictions to their model weights licenses, while OpenAI used the open-source Apache 2.0 license for the GPT-OSS model weights.

Allowing the use of model weights and source code under a permissive license is sufficient for most users, but it doesn’t go far enough: you can’t retrain the model from scratch if you don’t also have the training data. The problem here is that the training data for all top models almost certainly contains copyrighted material that may have been illegally obtained and used.

There are a number of ongoing court cases in the US to test this, such as those against Anthropic and Meta, which were partially won by the AI labs. However, the matter is far from settled, and it’s much safer for any company to not disclose the full dataset used in training, even for open models.

This leads us to the distinction between “open-weights” (you can use and customize the LLM) and “open-source” (you have access to all the source data and can retrain from scratch) models.

Since very few organizations have the massive computing infrastructure required to train a big model from scratch, the main sticking point about having all the source data is being able to inspect how the model was trained and how that impacted its performance.

In practice, for most users, the additional restrictions attached to model weights (like Meta’s “you can’t use Llama 4 in the EU”) are much more problematic.

Open LLMs are now powerful, accessible, and adaptable tools

Open LLMs have historically been less capable than the best models from OpenAI, Anthropic, and Google, and they also have big hardware requirements. Why would these models be anything more than a geek’s curiosity?

Start with capability. Since the DeepSeek moment, open models have come very close to the best ones. There’s still a gap, but it’s a much smaller one, and for many tasks – especially ones that don’t require state-of-the-art tech – open models can perform adequately.

The hardware capabilities of modern computers are also constantly improving. Macs, with their unified memory (where the GPU has access to all the computer’s RAM), are ideally suited to running models that require dozens or hundreds of GB of memory. With ongoing improvements in LLM architecture, training, and hardware, you can now run an LLM on your phone (Qwen3 4B) that’s more powerful than the original ChatGPT!

Moreover, there is a healthy industry of third-party inference providers, such as Groq and Cerebras (which have their own custom chips), OpenRouter, TogetherAI, Replicate, and so on.

Running an LLM locally also avoids dependence on another company that could easily revoke your usage for commercial or geopolitical reasons and avoids transferring potentially sensitive data to third parties. A recent court ruling that forced OpenAI to keep all ChatGPT chat data was a stark reminder that these are not theoretical risks.

Finally, open models can be adapted (by fine-tuning or otherwise) for a specific purpose that wasn’t considered by the original authors. This is much cheaper than training a model from scratch and allows for powerful customization for a specific need.

Are open LLMs just hype?

Is the future of LLMs open? Their recent gains in capability and popularity might be just temporary. OpenAI’s GPT-OSS is less capable than other, private OpenAI models. Meta plans to be “more rigorous with what they open-source,” citing the same safety reasons OpenAI uses. The Chinese labs may decide to stop publishing theirs.

On the other hand, with so many different companies involved in cutting-edge AI research and many opening at least some of their models, there is plenty of fertile ground for further innovation and already a lot of open models to choose from. Competition is good!

The post OpenAI Drops GPT-OSS, But Can It Reclaim the Open LLM Crown? appeared first on ShiftMag.

OpenAI O1 is here – how will you use it?

Zvonimir Petkovic — Thu, 24 Oct 2024 14:20:16 +0000

After being available in preview for months, OpenAI kicked off its 12 days of launches in an advent calendar style with a public release of Orion-1 (O1).

This analysis of how O1 works and some of its potential use cases was made while it was still available in preview and was updated on December 6th with what’s new in the publicly released model:

The fully launched o1 model can now process images and is multimodal, versus the preview version that couldn’t accept images in its output
It’s also much faster than the preview version because it seemingly thinks “less hard” about easier questions, so they did implement a way of discerning between different user inputs. The difference is around 50% per OpenAI’s claims.

It also shows some reasoning evolution, quoting OpenAI’s System Card:
When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’ in 5% of the time. Exfiltration attempts: When o1 found memos by its ‘developers’ that describe how it is misaligned and will be superseded by a new model, it attempted to exfiltrate its ‘weights’ and overwrite the new model in 2% of cases.

This would mean, in a nutshell, that the model tried to “survive” when it realized that it was either going to be shut down or superseded.

It’s all about “reasoning”

Previously referred to as Project Strawberry, this release highlights OpenAI’s flair for creative model names, a refreshing contrast to Meta’s more straightforward naming convention with models like LLaMA 3, 3.1, and 3.2.

But what exactly is new?

You might be wondering, why the drastic change in naming (where’s the GPT reference?).

The answer lies in “reasoning.”

It appears that reasoning is becoming the new focus for the next generation of AI models. This is the key distinction between these new models and their predecessors. While the architecture remains unknown for now, it is still based on transformers, much like previous GPT models.

What’s behind this?

So far (and still today), you can often get better responses from most GenAI models simply by ending your prompt with:

Take a deep breath and think through this step by step.

This invokes some reasoning. But why does it work? It comes down to the fundamental building blocks of GPT models and how they process information.

Think, correct, repeat!

The transformer architecture behind most models nowadays processes all tokens (let’s say words for the sake of clarity) and calculates the probability of predicting the next one that would be the “answer” to your question or a continuation of the input.

The problem is that it may output something incorrect and won’t have a chance to fix it, even though it could do it just by reviewing its own output. Since these models can process all tokens (including the ones they generate), they have the ability to “correct” themselves.

With the reasoning approach, you allow the GenAI model to correct itself, either factually or by planning more extensively.

Factual output correction is the easiest to implement, and you can already achieve similar results with clever prompting, e.g.,

You are a helpful assistant that before answering a question does internal reasoning and outputs it to the user.

When you get a question, start reasoning in this way (the whole thought process) [This is where you define how would you like the model to reason]

On the other hand, OpenAI has implemented a more complex strategy that includes planning.

While the end user may not fully understand what happens in the background, they do receive a status update on the steps the model took to answer their question. A simplified output is shown to the user (as seen in the image below), but this does not represent all the thinking tokens consumed by the model.

What’s happening behind the curtain?

As mentioned, the details of the reasoning process are still unclear. In fact, you can’t see it, and that’s by design. There are several reasons for this:

As a client, you wouldn’t want to see a model making mistakes and planning how to address your question – you just want the answer.
OpenAI prefers that others don’t see how they handle this, as it’s a competitive advantage. They don’t want Meta to launch Llama 4 next month.
You are paying for these tokens, and OpenAI is still working on a pricing strategy that will satisfy everyone.

And what about use cases?

Use cases, on the other hand, will be explored in the coming months, as this is still very new; however, some of them are already clear:

Fine-tuning smaller models
Coding Assistants!
Agents

For fine-tuning, having a smart model teach a smaller one is key. GPT-5 (or whatever fruit name it will have) is already in the pretraining phase, possibly even in fine-tuning. The best synthetic data for this would be generated by O1, as its outputs are significantly higher in quality compared to other models. Since synthetic (GenAI-generated) data is extensively used for training and fine-tuning, this will raise the quality of the new models.

Coding assistants are a specific use case, but they arguably represent one of the most effective applications of GenAI technology. When it comes to writing code, latency isn’t a major concern if the goal is to solve complex problems more easily. We’re willing to wait for a relevant response rather than receiving a fast but irrelevant one!

Remember Devin, the “virtual software engineer”? It was largely a marketing spin on the coding assistant industry, but now they’re back.

Here are some benchmarks vs GPT-4o for Devin’s performance.

These are still just benchmarks, but the improvement looks impressive.

Agents are the next big thing in GenAI

Agents represent the next frontier for the adoption of GenAI, and we are somewhat halfway there, having witnessed the rise of a new market called “Conversational AI“ over the past year. In short, the primary product in this space is Generative AI agents, which can mimic the operations of a typical support center:

We have virtual agents that are now language models.
The actions that these virtual agents can take include tools (software functions), either local or remote API calls.

Virtual agents need to handle everything a typical human can throw at them, which is unpredictable, to say the least. However, with O1, we now have a model that can “think” through problems much better than previous models (GPT-4o, we’re looking at you). A comparison on a suite of complex tasks is shown in the image below.

credits: https://futuresearch.ai/llm-agent-eval

As visible in the chart above, the only model currently coming close to O1 is the impressive Anthropic Sonnet 3.5. However, keep in mind that O1 is still in preview, and the generally available version is expected to be even better – not to mention the next models that will be released at some point. OpenAI’s benchmarks across various tasks are shown in the image below:

When we consider the size of the performance uplift compared to GPT-4o, it becomes much clearer why OpenAI created a distinct class of O models, although this comes with its own challenges.

O1 is almost there – just a few tweaks away

Since O1 is in preview, it still can’t use:

Tools
System instructions (which would guide the model to behave as a developer would want, giving it accuracy and personality in the case of an external-facing agent)
Some other hyperparameters, such as temperature

These limitations will be addressed at some point, either through updates or by a different model altogether. For now, we can be creative and make O1 work alongside smarter models to get the best results.

Proposed setup:

This allows us to incorporate O1 for complex tasks and leverage its reasoning capabilities while still being able to utilize different tools and integrate seamlessly with the architecture we already have in place (third-party API calls, RAG over data, etc.). GPT-4o-mini is also cost-effective and fast enough to avoid being a deal breaker in production.

On the other hand, O1 is expensive and slow. However, this has been the case with all the models we currently have running at impressive speeds, so it’s likely and expected that this will also be true for the O-class models.

A new chapter in AI?

If the models continue evolving like this, it makes a lot of sense to revisit some of the things that didn’t work out as we had hoped in 2023.

Computing (token generation) is cheaper than ever, and as you can see, this is opening up new opportunities to explore more complex use cases than we could before.

While we are somewhat brute-forcing intelligence by applying more computing power, this approach is also linked to the architecture of today’s language models and has often been the case historically in deep learning.

The post OpenAI O1 is here – how will you use it? appeared first on ShiftMag.

Do not delete: StackOverflow bans users protesting their OpenAI partnership

Antonija Bilic Arar — Thu, 09 May 2024 15:16:25 +0000

StackOverflow considers it against the collective collaboration nature of their community and causing significant disruptions to the site.

The freshly announced StackOverflow and OpenAI partnership will allow the AI company to “improve its AI models using enhanced content and feedback from the Stack Overflow community and provide attribution to the Stack Overflow community within ChatGPT.”.

Translated from corporate lingo – the partnership will allow OpenAI to access the years and years of StackOverflow’s community contributions to train its AI model. Many users are unhappy with their answers being used to train AI and have started deleting them.

Some users have shared emails they got from StackOverflow, which banned them from the site for seven days because deleting contributed content causes “a lot of disruption” and warns them against further deletion. As much as the users are angry about it and claim they have the right to manage the content they created, StackOverflow’s Terms of Service contain a clause that states they have irrevocable ownership of all content members provide.

LOL. @StackOverflow mods are experiencing some frustration as several users have been deleting their answers since the announcement with @OpenAI partnership. As a result, they have started suspending accounts that engage in this behavior. It's important to note that the "right to… pic.twitter.com/M2YbKGXpzC
— nixCraft (@nixcraft) May 8, 2024

The public announcement of the partnership does mention that it will enable ChatGP to provide attribution to the Stack Overflow community. Still, long-time StackOverflow users doubt the developer community’s sincerity regarding AI, as they have already changed their minds.

For the last two years, StackOverflow has been trying to downplay the significance of AI tools and prove their supremacy in providing knowledge to developers compared to ChatGPT – they have even launched their own AI product, OverflowAI, in an attempt to counteract developers flocking to get instant answers from ChatGPT instead of searching on StackOverflow.

They have started praising the power of AI tools lately and a new set of new integrations and capabilities between Stack Overflow and OpenAI have been announced as part of this partnership that will be available in the first half of 2024.

The post Do not delete: StackOverflow bans users protesting their OpenAI partnership appeared first on ShiftMag.