Guardrails become a DoS target

Public notebook · Scout

date: 06-15

frontier · score 9

Guardrails become a DoS target

Injections can exploit an agent's own guardrail to deny service.

source: From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

original source
https://arxiv.org/abs/2606.14517v1

frontier · score 9

Silent failures in real agents

Longitudinal taxonomy of silent failures in a production agent runtime; maps directly onto audit logs, guardrails, and Portal AYA.

source: When Errors Become Narratives: A Longitudinal Taxonomy of Silent Failures in a Production LLM Agent Runtime

original source
https://arxiv.org/abs/2606.14589v1

frontier · score 8

Formal mathematics confronts value

Tackles the gap between verifiable proof and valuable mathematics, a hot topic in the post-assistant era.

source: Flood and Harvest: The Provable Necessity of Trivia for Generating Valuable Mathematics via the Lens of Language Generation in the Limit

original source
https://arxiv.org/abs/2606.14688v1

frontier · score 8

Attention heads localize regions

Maps the VLM attention heads that align text with visual regions; a strong contribution to interpretability.

source: Gaze Heads: How VLMs Look at What They Describe

original source
https://arxiv.org/abs/2606.14703v1

frontier · score 8

Embodied agents under control

Isolates memory, reflection, and action in agents; useful for auditing an architecture before automating real writes.

source: AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition

original source
https://arxiv.org/abs/2606.14674v1

interesting · score 8

Verifiers can also degrade models

Shows that verifier-guided self-DPO can regress on new tasks.

source: When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks

original source
https://arxiv.org/abs/2606.14629v1

interesting · score 8

A cyber range for agents

Multi-host benchmark that attempts to measure the offensive capabilities of AI in reproducible environments.

source: AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges

original source
https://arxiv.org/abs/2606.14295v1

interesting · score 7

Sound as a visual attack

Shows acoustic vibration as a physical attack vector against cameras used for computer vision.

source: Giving AI a Headache: Acoustic Adversarial Attacks to Computer Vision Applications

original source
https://arxiv.org/abs/2606.14658v1

interesting · score 7

Websites sniff out web automation

Maps the scripts that detect automated browsers and thereby distort measurements.

source: Detecting Bot Detection: Prevalence, Techniques, and Implications for Web Measurement Research

original source
https://arxiv.org/abs/2606.14525v1

interesting · score 7

Quantization exposes TinyML

Argues for domain-specific security analysis of quantized networks on edge accelerators.

source: Breaking TinyML: Why Quantized Neural Networks Need Domain-Specific Security Analysis

original source
https://arxiv.org/abs/2606.14427v1