Nat Rubio-Licht
Senior Reporter

Nat Rubio-Licht is a Senior Reporter at The Deep View. Nat previously led CIO Upside, a newsletter dedicated to enterprise tech, for The Daily Upside. They've also worked for Protocol, The LA Business Journal, and Seattle Magazine. Reach out to Nat at nat@thedeepview.ai.

Apple finds new way to spot AI hallucinations

Apple may not have homegrown AI. But it wants to make sure the technology is done right.

On Tuesday, Apple published research detailing a new way to find and quash incidents of hallucination, the pesky mistakes that an AI model makes when it doesn’t have enough training data and starts making guesses. Apple’s research introduces “Reinforcement Learning for Hallucination Span Detection,” which pinpoints not just when an AI model hallucinates, but where exactly within a line of text the model goes wrong.

Apple’s training framework gives the model small rewards each time it accurately identifies incorrect phrases or words, based on how closely its flagged spans match those of human evaluators (a minimal sketch of this kind of span-level reward follows the list below).

  • This turns hallucination detection from a “binary task” into a “multi-step decision-making process,” Apple said in its research.
  • To put it simply, it’s the difference between a teacher saying you failed a test with no explanation and a teacher telling you exactly which answers you got wrong and why.
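
To make the reward idea concrete, here is a minimal sketch of what a span-level reward could look like. This is our illustration, not Apple's published implementation; the function name, character-offset representation and F1-style scoring are all assumptions.

```python
# Hypothetical span-level hallucination reward, in the spirit of Apple's
# described approach. The scoring scheme here is an assumption for
# illustration, not Apple's published method.

def span_reward(predicted_spans, gold_spans):
    """F1 overlap between spans the model flags as hallucinated and
    human-annotated gold spans; each span is a (start, end) pair of
    character offsets into the model's output."""
    def to_chars(spans):
        chars = set()
        for start, end in spans:
            chars.update(range(start, end))
        return chars

    pred, gold = to_chars(predicted_spans), to_chars(gold_spans)
    if not pred and not gold:
        return 1.0  # correctly reported "no hallucination"
    if not pred or not gold:
        return 0.0  # flagged clean text, or missed a hallucination entirely
    overlap = len(pred & gold)
    precision, recall = overlap / len(pred), overlap / len(gold)
    if precision + recall == 0:
        return 0.0  # flagged spans, but none overlap the gold spans
    return 2 * precision * recall / (precision + recall)

# The detector flags characters 10-25; annotators marked 12-30.
print(span_reward([(10, 25)], [(12, 30)]))  # partial credit, ~0.79
```

Partial overlap earns partial credit, which is what turns detection from a single yes/no judgment into the graded, multi-step process Apple describes.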

“Most existing research works focus on a binary hallucination detection problem, where the goal is to determine if the model output contains hallucinations or not,” Apple said in the paper. “While useful, this formulation is limited: in many real-world applications, one often needs to know which specific spans in the model output are hallucinated in order to assess the reliability of the generated content.”

And Apple’s system proved itself, outperforming conventional methods on the RAGTruth benchmark, an AI truth-checking test covering tasks like summarization, question answering, and data-to-text generation.

Our Deeper View

While Apple may be seen as miles behind in the AI race, this is a misconception. Apple has effectively removed itself from the competition entirely, instead hitching its wagon to Google through a multi-year agreement to use Gemini to power Siri. However, Apple still bears the burden of doing AI right. With almost 2.5 billion devices in the hands of users worldwide, it’s vital that an AI-powered Siri makes as few mistakes as possible, especially if many of those users aren’t AI-savvy. This research is a sign that Apple understands the consequences of getting it wrong.

New pact pushes back on AI replacement race

AI ethicists have put out another plea for the world to pay attention to the tech’s risks.

On Wednesday, a coalition of leaders across industries announced the “Pro-Human AI Declaration,” united by a broad, simple proclamation that AI “should serve humanity, not the reverse.”

“This race to replace poses risks to societal stability, national security, economic prosperity, civil liberties, privacy, and democratic governance,” the statement reads. “It also imperils the human experiences of childhood and family, faith, and community.”

The declaration, which counts names like Yoshua Bengio, Steve Bannon, Susan Rice, Sir Richard Branson and Joseph Gordon-Levitt among its endorsers, proposes five central tenets for creating trustworthy and controllable AI:

  • Keeping humans in charge, which suggests meaningful human controls and override capabilities, an AI “off-switch,” independent oversight and an end to the superintelligence race
  • Avoiding power concentration, such as preventing AI monopolies, ensuring democratic oversight of major impacts on work, society, and civic life, and sharing AI’s benefits broadly
  • Protecting the human experience, which proposes that AI should not be allowed to exploit or stunt children's growth, should not be addictive and should not “supplant” foundational relationships
  • Human agency and liberty, suggesting that AI should not be granted personhood, that humans should retain rights to their data and privacy, and that AI should not “enfeeble” users
  • Responsibility and accountability for AI companies, advising that AI should not create a “liability shield” for companies, developers or users, and that all model failures should be made transparent

This is not the first time tech ethicists have implored the industry to pay attention to the dangers that lie ahead on our current AI trajectory. In October, the Future of Life Institute put out a petition calling for a moratorium on developing superintelligence, claiming that the tech harbors “extreme large-scale risks.” The petition garnered more than 135,000 signatures, and many of its signatories also endorsed the Pro-Human AI Declaration.

Our Deeper View

AI is moving so fast that it often breaks out of restraints quicker than we can make them. Getting people to pay attention to the risks the tech presents is a huge challenge. The fact is that people won’t pay attention to responsible AI until AI actually creates a major crisis. So I ask: What will it take? How many wrongful death lawsuits against LLM providers are going to have to pile up? How many people need to lose their jobs? How many self-driving cars need to crash? Though the ethos of innovation has long been to move fast and break things, what will it have to break to get people to act?

AI giants race to build lighter models

AI firms are racing to enable users to do more with faster, lighter models.

On Tuesday, both OpenAI and Google released new iterations of their flagship models. Each of these models boasts quality outputs and capabilities at faster speeds and lower costs.

Let’s check out the specs.

  • OpenAI’s model, GPT-5.3 Instant, declines fewer questions and offers a less defensive tone, OpenAI said in its announcement. Conversation flow is more consistent, and information synthesis for web searches is more relevant. This update is targeted primarily at everyday users of ChatGPT, with updates to its heavier Thinking and Pro models coming soon.
  • Google’s model, Gemini 3.1 Flash-Lite, is targeted at high-volume developer workloads, offering quicker responses at lower cost. The model is a good fit for tasks like translation and content moderation where “cost is a priority,” Google said in its announcement. However, it can also handle complex, reasoning-heavy workloads, such as generating dashboards or creating simulations (see the routing sketch after this list).
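
To show why the lighter tier matters in practice, here is a hedged sketch of the routing logic a cost-conscious developer might write. The model identifiers and task categories are our assumptions for illustration, not vendor-documented behavior.

```python
# Hypothetical cost-tiering sketch: send high-volume, low-stakes tasks to a
# lighter model and reserve a heavier model for reasoning-heavy requests.
# Model identifiers below are assumptions, not confirmed API names.

LIGHT_MODEL = "gemini-3.1-flash-lite"  # fast and cheap: translation, moderation
HEAVY_MODEL = "gemini-3-pro"           # slower and pricier: complex reasoning

HIGH_VOLUME_TASKS = {"translation", "content_moderation", "classification"}

def pick_model(task_type: str) -> str:
    """Route a request to the cheapest model that can plausibly handle it."""
    return LIGHT_MODEL if task_type in HIGH_VOLUME_TASKS else HEAVY_MODEL

print(pick_model("translation"))       # gemini-3.1-flash-lite
print(pick_model("dashboard_design"))  # gemini-3-pro
```

The design point is simple: when most traffic is routine, routing it to a cheaper tier keeps token spend proportional to task difficulty rather than to raw volume.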

OpenAI’s latest addition is available today to all users in ChatGPT, as well as to developers in the API. GPT-5.2 Instant will remain available for three months until it is retired in June. Google’s latest model, meanwhile, is available in preview to developers through the Gemini API in Google AI Studio and for enterprises in Vertex AI.

These models come amid lightweight releases from Chinese open-source competitors like Alibaba’s Qwen, which unveiled its Small Model Series, ranging from 800 million to 9 billion parameters, earlier this week.

Our Deeper View

Though these models court different audiences, the objective is the same: To offer cost-effective and faster alternatives to heavier reasoning models. OpenAI’s latest offering, targeting consumer audiences, could more quickly answer the kinds of queries users might otherwise take to a search engine, saving OpenAI money and keeping its user base consistent as it rolls out ads. Google, meanwhile, is saving developers from eating through their token budgets on tedious tasks at a time when inference costs are mounting. These models could signify a broader trend: AI firms are starting to realize that less is often more.

The fight over who controls AI just began

Everyone’s got an opinion on Anthropic’s face-off with the Pentagon.

The past few days have brought a deluge of news and updates as the government continues to blacklist the AI firm, with its technology now being shut out of the US Treasury and the federal housing agency, along with the military. The company’s designation as a supply chain risk marks an unprecedented retaliatory move by the government, and risks chilling future contracts between tech companies and federal agencies.

If you’re trying to sort out the situation, The Deep View has picked three of the most pointed and viral essays published in recent days:

  • “Clawed,” by Dean W. Ball for Hyperdimensional. In this long-form piece, Ball makes the argument that the fight between Anthropic and the U.S. government is indicative of a “death rattle of the old republic.” The fight also marks one of the first times the question of who should control AI has been debated in the public eye, and the government got off on “extraordinarily bad footing” in the argument.
  • “Anthropic and Alignment,” by Ben Thompson for Stratechery. Thompson argues that Anthropic’s insistence on designating how its models can be used is “fundamentally misaligned with reality,” claiming that it is intolerable for corporate executives to supersede the decisions of elected officials.
  • “A Few Observations on AI Companies and Their Military Usage Policies,” by Sarah Shoker for fishbowlification. Shoker, former leader of OpenAI’s geopolitics team, takes a broader view of the subject, pointing out that frontier AI labs don’t have coherent policies around military AI use, which has allowed these firms to live in a vague grey area of “optionality.” Additionally, the use of AI in military action is largely opaque, with policy, disinformation and the fog of war creating “black boxes” all around.

Even Deep View readers have strong thoughts on the topic: In our daily poll, we asked, “Should Anthropic have acquiesced to the Pentagon’s request to remove safety restrictions?” to which 78.8% responded “No.”


Our Deeper View

This situation has made one reality abundantly clear: AI will impact our society and future in monumental ways. The fight between the US government and Anthropic is about more than just one company or one contract. Instead, it boils down to power, for the first time putting on public display “the nexus of control over frontier AI,” as Ball’s essay notes. In the end, whoever controls the technology effectively controls the future.

Claude becomes No. 1 app hours after Pentagon ban

In standing up to the US government, Anthropic has built up so much goodwill that people are chalking “GOD LOVES ANTHROPIC” on the sidewalks outside its San Francisco HQ. The standoff also gave OpenAI the opening it needed.

After Anthropic stood firm in its refusal to bend on its two conditions for using its AI — no mass surveillance of US citizens and no fully autonomous weapons — the Pentagon and the Trump Administration went to DEFCON 1 on Friday. They designated the company a supply chain risk, a label typically reserved for adversaries. President Donald Trump has also directed every government organization to “immediately cease” using Anthropic’s technology.

“The Terms of Service of Anthropic’s defective altruism will never outweigh the safety, the readiness, or the lives of American troops on the battlefield,” Secretary of War Pete Hegseth said in a post on X.

While facing the fallout of losing hundreds of millions in revenue, Anthropic has seen a massive outpouring of support, both in the AI community and beyond. In the hours after the government ban, its Claude app skyrocketed to the top of Apple’s App Store, dethroning ChatGPT from the No. 1 spot. Anthropic employees, meanwhile, broadly took to X to praise their employer. AI leaders like Ilya Sutskever have commended the company for its stance, and the app has even earned the public admiration of celebrities who have nothing to do with AI, like Katy Perry.

In an interview with CBS News, CEO Dario Amodei called disagreeing with the government the “most American thing in the world.”

Anthropic vowed to take the Pentagon to court over the extent of the supply chain risk designation. In a statement on Friday, the company called the designation “legally unsound” and said it “set a dangerous precedent” for how companies negotiate with the government.

In the meantime, OpenAI seized the opportunity to sign a contract with the Department of War to use its models instead.

While OpenAI claimed that its agreement with the Pentagon upholds its “redlines” over domestic surveillance and autonomous weapons and “has more guardrails than any previous agreement for classified AI deployments,” the reality is a little more nuanced.

Anthropic sought to preserve explicit restrictions barring the use of its models for mass surveillance of U.S. citizens and fully autonomous weapons. The Pentagon, which often structures contracts around broad “all lawful purposes” language, reportedly preferred not to carve out those exceptions. Anthropic declined to move forward under those terms. OpenAI later signed a Department of War agreement structured around the standard federal contracting language. OpenAI CEO Sam Altman defended the agreement, saying, “I do not believe unelected leaders of private companies should have as much power as our democratically elected government.”

Our Deeper View

While all the attention is certainly a silver lining for Anthropic, the standoff was also a moment of truth for the company’s founding principles of AI responsibility, safety, and ethics. By not bending to the Pentagon’s will, especially given that human rights and lives might be at stake, the company kept its promises. Its refusal to capitulate may also put pressure on rivals: An open letter titled “We Will Not Be Divided” began circulating on social media and has since garnered 537 signatures from Google employees and 89 from OpenAI employees. By holding to its stated objectives and incurring the wrath of the federal government, Anthropic has effectively made itself a martyr.

Anthropic defies Pentagon over AI guardrails

Amid pressure from the Pentagon to give in to its demands to loosen its safeguards, Anthropic continues to stand firm.

In a statement on Thursday afternoon, Anthropic CEO Dario Amodei made it clear that the company cannot accede to the Department of War’s demand to roll back its safeguards that prevent its AI models from being used in two key areas: mass surveillance of U.S. citizens and fully autonomous weapons.

Amodei noted that AI’s use in mass surveillance posed “serious, novel risks to our fundamental liberties.” And while the tech may someday be helpful in fully autonomous weaponry, the guardrails simply don’t exist today to deploy this safely.

"In a narrow set of cases, we believe AI can undermine, rather than defend, democratic values,” Amodei said in his statement. “Some uses are also simply outside the bounds of what today’s technology can safely and reliably do.”

Amodei said that Anthropic’s Claude models are widely deployed throughout the defense and intelligence community, including in the government’s classified networks, in national laboratories, and in mission-critical applications such as intelligence analysis, modeling and simulation, operational planning, and cybersecurity operations. Thus far, its safeguards haven’t presented an issue in these cases, he said.

Though Anthropic’s “strong preference” is to continue supporting the military, it will only do so with its safeguards in place. Otherwise, it cannot “in good conscience” submit to the Pentagon’s requests and continue the relationship.

Amodei’s response is the latest move in the fight between the company and the Pentagon. Earlier this week, the agency took its first steps toward blacklisting Anthropic by labeling it a “supply chain risk,” a label generally reserved for companies from adversarial countries.

  • The unprecedented move not only threatens Anthropic’s contract with the military but could also force all defense vendors to cut ties with the company.
  • And after his meeting with Amodei, Secretary of War Pete Hegseth contradicted himself by threatening to invoke the Defense Production Act, which would force Anthropic to tailor its models to the military’s demands regardless.
  • Additionally, the Pentagon struck a deal with xAI on Monday to use its Grok models in classified systems, including weapons development and battlefield operations.

Policymakers, however, have started to warn that the sparring match between Anthropic and the Pentagon will only sour future relationships between the government and Silicon Valley AI firms, with Dean Ball, former AI adviser to the Trump Administration, calling Hegseth’s contradictory threats “incoherent.”

Our Deeper View

Anthropic standing firm against the Pentagon’s threats was its only option, given that the company has built its reputation on AI safety and on deploying AI only under guidelines that ensure it does no harm. Though recent changes to its Responsible Scaling Policy have already tested its moral and ethical standards, backing down here would have been a sharp about-face, betraying its core principles. Though the fallout could cost Anthropic a large chunk of its revenue from government agencies and vendors, there may be a silver lining: Gaining further trust with its primary audience of risk-averse but AI-hungry enterprises.

Why vibe coding has boosted demand for engineers

Even as vibe coding allows users to produce massive quantities of code, that doesn’t mean everyone can be a software engineer without training.

Earlier this week, the viral 2028 Global Intelligence Crisis report from Citrini Research painted a bleak picture of the impacts of AI adoption at scale in which the economy and job markets crash as a result of AI-enabled productivity. This was a worst-case scenario thought experiment, but not everyone is buying into the doom and gloom.

In a rebuttal to Citrini’s post, Citadel Securities laid out the current state of the labor market in the face of burgeoning AI adoption. The findings note that the current unemployment rate is 4.28% and that software engineering job postings are up 11% year over year.

In a post on X, Aaron Levie, CEO of Box, said of the data that while AI is allowing coding novices to do more, it’s also leading them to launch new custom software that eventually requires more expertise. "This is counterintuitive for some … but if you lower the cost of something that was previously supply-constrained, demand for that thing goes up. Software engineering is just one of the easiest examples to contemplate," Levie said.
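
Levie’s argument is essentially a price-elasticity claim, and a toy calculation makes it concrete. The numbers and the constant-elasticity demand curve below are illustrative assumptions, not market data.

```python
# Toy illustration of Levie's point: when the unit cost of producing software
# drops, total demand for engineering can rise if demand is elastic.
# All figures here are made-up assumptions for illustration only.

def engineering_demand(cost_per_feature: float, elasticity: float,
                       baseline_cost: float = 100.0,
                       baseline_demand: float = 1000.0) -> float:
    """Constant-elasticity demand: scales with (cost/baseline)^(-elasticity)."""
    return baseline_demand * (cost_per_feature / baseline_cost) ** (-elasticity)

# AI halves the cost of shipping a feature; with elasticity 1.5,
# demand roughly 2.8x's -- so total spend on engineering goes UP.
before = engineering_demand(100.0, 1.5)  # 1000 features demanded
after = engineering_demand(50.0, 1.5)    # ~2828 features demanded
print(before, after)
print(50.0 * after > 100.0 * before)     # True: total spend rises
```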


This adds another layer to the conflicting narratives about AI’s impact on the job market, particularly in software engineering. While some estimates suggest AI will fully automate software engineering, among other jobs, rendering them obsolete, other data shows that AI is increasing workloads rather than shrinking them. One projection indicates that many of those laid off as a result of AI will be rehired to do similar work.

In short: While vibe coding is making it easier for tech novices to produce code, designs or proofs-of-concept, turning that code and those prototypes into something that’s actually useful requires a more deft hand, thereby creating more demand for technical expertise.

Our Deeper View

While this is good news for the software engineers worrying that their livelihood is on unsteady ground, it may also be another point in favor of the “SaaS-pocalypse” argument. Though executives of companies like Salesforce and Workday are downplaying the impact that AI will have on legacy software firms, AI is enabling enterprises to build their own custom tooling more easily than ever. That means they’re likely to shift their spending from legacy SaaS platforms to in-house developers and the AI tools that facilitate their rapid building.

Google’s Nano Banana 2 solves a key AI flaw

Google has once again raised the bar on AI image generation.

On Thursday, the company unveiled Nano Banana 2, the latest iteration of its image model, offering more advanced world knowledge, higher-quality outputs and better reasoning at faster speeds than its predecessor. Arguably, the biggest upgrade is how it handles text.

Nano Banana 2 is powered by real-time information and images gathered from web search. In a post on X, Google noted that users can create images with “real-world accuracy,” including improved lighting, textures and details.

“This deep understanding also helps you create infographics, turn notes into diagrams and generate data visualizations,” Google said in its announcement.

Of all of the upgrades that Nano Banana 2 touts, two in particular stick out: Creative control and text rendering.

  • Nano Banana 2 renders text with far more accuracy, a capability past image generators have largely struggled with; garbled text has long been one of the easiest ways to flag an image as AI-generated. The model can also translate localized text within an image between languages.
  • The model also offers more creative control, including better instruction following, subject and character consistency and production-ready specs with resolutions from 512px to 4K.
  • These capabilities open the door for Google’s image model to be far more valuable for enterprise use cases, such as graphic design or marketing, where it can now be used to create printable materials.

Nano Banana 2 is currently available across the Google and Gemini suite, including in the Gemini app, search, AI Studio, Google Cloud and in the Google Ads platform.

Our Deeper View

Though it’s easier to make the case for embedding language models or agents into enterprise processes, image generation models are a harder sell, with inconsistency and poor text rendering impeding marketing departments from using them. Nano Banana 2, however, might break the mold, allowing creatives and marketers to render billboards, printed programs, or entire campaigns with text that looks far more polished and professional. Given that Google is powering this model with web data, however, copyright issues may still be a thorn in its side. As copyright infringement cases against AI firms persist, enterprises might want to pause before taking the legal risk, even if the capabilities of Google’s new model seem enticing.

Pressure mounts on Anthropic’s AI ethics stance

Anthropic might be risking the thing that makes it Anthropic.

On Tuesday, the company announced changes to its Responsible Scaling Policy, the framework that prevents Anthropic’s models from being released without proper safety and security measures.

The biggest change? The company has struck its pledge to hold back models if it can’t guarantee proper risk mitigations ahead of release. Additionally, the company no longer commits to refraining from training models above a certain capability level without specific safety measures in place.

In an interview with Time, Anthropic’s chief science officer Jared Kaplan said that it “wouldn't actually help anyone” for it to stop training AI models. “We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”

Some of the highlights from the new version of the policy include:

  • Anthropic meeting or exceeding the “overall risk reduction posture” of competitors
  • Delaying development if Anthropic is considered to be in the lead on AI development and its models in production are deemed to carry catastrophic risk
  • Commitments to release “risk reports” every three to six months to remain transparent about the safety issues its models may face
  • Introducing a “frontier safety roadmap” describing concrete plans for risk defenses across security, alignment, safeguards and policy

Even with the changes, Anthropic is still standing firm in the ongoing dispute over its models being used for warfare: After a meeting with CEO Dario Amodei on Tuesday, Defense Secretary Pete Hegseth is reportedly giving the company until Friday to roll back the AI safety guardrails on its chatbot, Claude. Anthropic has two major ethical boundaries: that its models not be used for fully autonomous targeting in military operations or for the surveillance of U.S. citizens.

If Anthropic continues to refuse, the Pentagon will label the firm a “supply chain risk” and invoke the Defense Production Act, giving the agency access to Claude “regardless of if they want to or not,” according to CNN.

But not all AI firms have the same reservations: On Monday, xAI struck a deal with the Pentagon to use Grok in classified systems, including weapons development and battlefield operations.

Our Deeper View

Though holding out against the use of its models for acts of warfare still gives it the ethical high ground, with so much attention and cash flowing into Anthropic, the company loosening its tight safety standards felt inevitable. As Harvey Dent says in The Dark Knight, “You either die a hero, or live long enough to see yourself become the villain.” While Anthropic isn’t a “villain” by any stretch of the imagination, it continues to face situations that challenge its moral compass, chipping away at its position as the poster child for ethical conduct in AI. In the interview with Time, Kaplan denied that this move was financially motivated. But fewer restrictions will allow the company to innovate faster and fit in better with the “move fast and break things” ethos of Silicon Valley. We just thought that was the mindset Anthropic was created to oppose.
