May 27, 2026 ChainGPT

AudioHijack: Inaudible Audio Hack Can Hijack AI Voice Wallets

Researchers reveal “inaudible” audio hack that can hijack AI voice models: a new threat for crypto services that use voice interfaces

A team at Zhejiang University has demonstrated a powerful new attack that hides commands inside audio files. Humans can’t hear these commands, but AI voice models obey them. The technique, dubbed AudioHijack, was presented at the 47th IEEE Symposium on Security and Privacy in San Francisco. It achieved a success rate of up to 96% in experiments and poses direct risks to any service that relies on large audio-language models (LALMs), including voice-enabled crypto wallets, trading assistants, and customer-support bots.

What the researchers did
- Attack name and venue: AudioHijack, presented at IEEE S&P by Zhejiang University researchers led by Ph.D. student Meng Chen.
- Speed and scope: The adversarial signal can be trained in about half an hour and is “context-agnostic.” It can be appended to arbitrary audio and still steer the target model regardless of what the human user says.
- Effectiveness: The team tested 13 open-source LALMs and also reported attacks on commercial systems from Microsoft and Mistral. Reported outcomes include making models refuse legitimate requests, spread false information, inject harmful links, or change personality. Models could also be made to perform actions the user did not request, such as web searches, file downloads, and sending emails that could contain personal data.

How the hack works
- Rather than altering the text prompt, AudioHijack manipulates the digital waveform itself. It changes numerical sample values in ways that are imperceptible to people but that the AI model interprets differently.
- Because the hidden commands ride on the audio signal rather than on visible text, they bypass many existing protections that scan for prompt injection or suspicious text content.
- The researchers also reported unpublished follow-up work showing similar manipulations in live AI voice chats.
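To make “changing numerical sample values” concrete, the Python sketch below adds a small per-sample perturbation to a synthetic tone. Each change is capped at an assumed budget `epsilon`, and the sketch measures how small the change is relative to the signal. It is a minimal illustration, not AudioHijack itself. The random noise stands in for the optimized, model-specific signal the researchers train. The tone, the 16 kHz sample rate, and `epsilon = 0.002` are hypothetical values chosen for illustration.

```python
import math
import random


def add_bounded_perturbation(samples, delta, epsilon):
    """Add delta to samples, clipping each change to [-epsilon, epsilon]
    and keeping every result inside the [-1.0, 1.0] float audio range."""
    out = []
    for x, d in zip(samples, delta):
        d = max(-epsilon, min(epsilon, d))
        out.append(max(-1.0, min(1.0, x + d)))
    return out


def snr_db(clean, perturbed):
    """Signal-to-noise ratio in dB, treating (perturbed - clean) as the noise."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((p - x) ** 2 for x, p in zip(clean, perturbed))
    return 10 * math.log10(signal_power / noise_power)


RATE = 16_000  # assumed sample rate (Hz), for illustration only

# One second of a 440 Hz tone at amplitude 0.5, standing in for speech or music.
clean = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]

# Random noise stands in for the perturbation; in the real attack the signal
# is trained against the target model rather than drawn at random.
rng = random.Random(0)
delta = [rng.uniform(-1.0, 1.0) for _ in range(RATE)]
perturbed = add_bounded_perturbation(clean, delta, epsilon=0.002)

max_change = max(abs(p - x) for x, p in zip(clean, perturbed))
print(f"max per-sample change: {max_change:.4f}")
print(f"SNR: {snr_db(clean, perturbed):.1f} dB")
```

A high signal-to-noise ratio shows the change is numerically tiny. Whether it is actually inaudible also depends on psychoacoustic factors such as masking, which this sketch does not model.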
Delivery vectors
- The manipulated audio could be delivered through common channels: streaming videos, music files, voice notes, uploads to transcription services (e.g., audio from Zoom calls), or any other media that LALMs process.

Implications for the crypto industry
- Voice-based authentication and trading: Any crypto product that accepts voice commands could be coerced into executing unintended actions or divulging sensitive information. Examples include wallets with voice unlock, voice-driven trading apps, and voice-activated treasury interfaces.
- Social engineering and phishing: Attackers could embed the signal in a podcast, video ad, or voice memo to manipulate the AI assistants their victims use downstream. This could trigger phishing pages, malicious links, or data exfiltration.
- Third-party services: Custodial or compliance tools that rely on AI transcription or voice agents could become an attack vector if they process adversarial audio.

Defenses and limits
- Best-performing defense tested: Monitoring a model’s internal attention mechanisms was the most effective countermeasure the team evaluated.
- Adaptive attackers: The researchers also found that attackers aware of such defenses could reduce the signal strength and still retain much of the attack’s effectiveness. This makes single-point mitigations fragile.
- Practical takeaway: Chen warned that these defenses struggle because models have difficulty distinguishing normal user intent from adversarial audio. In his words, “it takes just half an hour to train this signal… you can use it to attack the target model whenever you want, no matter what the user says.”

What crypto teams should do now
- Treat voice as an elevated-risk vector: Avoid relying solely on voice authentication or single-factor voice commands for high-value actions.
- Add multi-factor checks for sensitive operations, and require explicit human confirmation for fund transfers and key operations.
- Audit any third-party voice AI integrations and limit which tools can perform privileged tasks.
- Consider detection layers that inspect low-level audio features or monitor model internal states, but plan for adaptive adversaries.
- Limit automatic processing of meeting audio or other sources that may contain sensitive material unless the audio has been verified clean.

Bottom line
AudioHijack exposes a new class of adversarial audio attacks that are cheap to create and effective across multiple LALMs. Automation and voice interfaces are increasingly common in the crypto sector, so the finding is a red flag there. Voice channels can be weaponized against AI-based tooling, and defenders will need layered, adaptive controls to stay ahead.
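The recommendations under “What crypto teams should do now” can be expressed as a simple authorization gate that sits between a voice assistant and the tools it can call. The Python sketch below is a minimal illustration under stated assumptions. The tool names, the `HIGH_VALUE_USD` threshold, and the `ToolRequest` fields are all hypothetical. A real deployment would take the second-factor and confirmation flags from its own authentication system, never from the model.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    VOICE = "voice"
    TEXT = "text"


# Hypothetical tool names for illustration only.
READ_ONLY_TOOLS = {"get_balance", "get_price"}
PRIVILEGED_TOOLS = {"transfer_funds", "export_keys", "send_email", "download_file"}
OTHER_TOOLS = {"web_search"}
KNOWN_TOOLS = READ_ONLY_TOOLS | PRIVILEGED_TOOLS | OTHER_TOOLS

HIGH_VALUE_USD = 1_000.0  # assumed threshold for "high-value" actions


@dataclass
class ToolRequest:
    tool: str
    channel: Channel
    amount_usd: float = 0.0
    second_factor_ok: bool = False  # set by the auth system (e.g. TOTP), never by the model
    human_confirmed: bool = False   # user approved an on-screen summary of the action


def authorize(req: ToolRequest) -> tuple[bool, str]:
    """Return (allowed, reason) for a tool call proposed by an AI assistant."""
    if req.tool not in KNOWN_TOOLS:
        return False, "unknown tool"  # default-deny anything not explicitly listed

    verified = req.second_factor_ok and req.human_confirmed
    sensitive = req.tool in PRIVILEGED_TOOLS or req.amount_usd >= HIGH_VALUE_USD

    if sensitive and not verified:
        return False, "needs second factor and explicit human confirmation"
    # Unverified voice input may only trigger read-only tools.
    if req.channel is Channel.VOICE and req.tool not in READ_ONLY_TOOLS and not verified:
        return False, "voice channel limited to read-only tools"
    return True, "ok"


if __name__ == "__main__":
    requests = [
        ToolRequest("get_balance", Channel.VOICE),
        ToolRequest("web_search", Channel.VOICE),
        ToolRequest("transfer_funds", Channel.VOICE, amount_usd=50.0),
        ToolRequest("transfer_funds", Channel.VOICE, amount_usd=50.0,
                    second_factor_ok=True, human_confirmed=True),
    ]
    for r in requests:
        print(r.tool, r.channel.value, authorize(r))
```

The design choice is that the gate does not rely on the model’s interpretation of audio for authorization. A voice-originated request reaches privileged tools only after checks that the audio channel cannot supply. As a result, an adversarial signal that steers which tools the model calls still cannot complete a transfer on its own.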