Cisco Talos researchers have discovered that more than 1,100 instances of the Ollama framework, used for locally hosting large language models (LLMs), are directly accessible on the public internet.
Approximately 20 percent of these servers were actively serving models without any form of authentication, leaving them susceptible to a spectrum of high-severity attacks.
In just ten minutes of scanning Shodan, Cisco’s proof-of-concept detection tool identified 1,139 publicly exposed Ollama endpoints, of which 214 responded to model queries without requiring credentials.
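Cisco has not released its proof-of-concept tool, but the workflow it describes can be approximated in a short Python sketch: query Shodan for the default Ollama port, then test each hit for credential-free model listing. The Shodan query string, the shodan and requests packages, and the /api/tags model-listing endpoint are implementation assumptions here, not details confirmed by Talos.

```python
# Minimal sketch of the discovery workflow described above (not Talos's tool).
# Assumes a Shodan API key and the `shodan` and `requests` packages.
import shodan
import requests

SHODAN_API_KEY = "YOUR_SHODAN_API_KEY"  # placeholder

def find_exposed_ollama():
    """Search Shodan for hosts on Ollama's default port, then test each one
    for credential-free model listing via the /api/tags endpoint."""
    api = shodan.Shodan(SHODAN_API_KEY)
    results = api.search("port:11434")
    exposed, responsive = [], []
    for match in results["matches"]:
        host = f'http://{match["ip_str"]}:11434'
        exposed.append(host)
        try:
            # A 200 response with a model list and no credentials indicates
            # an unauthenticated, actively serving endpoint.
            r = requests.get(f"{host}/api/tags", timeout=5)
            if r.status_code == 200 and "models" in r.json():
                responsive.append(host)
        except requests.RequestException:
            pass  # dormant, filtered, or unreachable host
    return exposed, responsive

if __name__ == "__main__":
    all_hosts, open_hosts = find_exposed_ollama()
    print(f"{len(all_hosts)} exposed endpoints, {len(open_hosts)} serving models without auth")
```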
This lack of access control enables adversaries to reconstruct internal model weights through repeated queries, effectively performing model-extraction attacks.
It also permits attackers to bypass content filters through prompt injection, coercing the models into generating disallowed outputs such as malicious code or disinformation, and it opens the door to backdoor injection, where attackers upload tampered models or alter server configurations.
Even the remaining 80 percent of “dormant” servers pose a significant risk. Though they did not host active models at the time of discovery, these endpoints remain vulnerable to unauthorized model uploads, resource exhaustion attacks that inflate hosting costs, and lateral movement across compromised networks.
Moreover, a striking 88.9 percent of the discovered servers adhered to the OpenAI-compatible API schema, exposing endpoints such as /v1/chat/completions, which streamlines the adaptation of existing exploit scripts across multiple LLM hosting platforms.
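Because the API surface mirrors OpenAI's, off-the-shelf client code works against an exposed host once its base URL is repointed, which is also the property defenders can use to verify their own endpoints. A hedged illustration with the openai Python package follows; the host address and model name are placeholders.

```python
# Illustration of the compatibility point above: a stock OpenAI client works
# against an exposed Ollama host once base_url is repointed (example address).
from openai import OpenAI

client = OpenAI(
    base_url="http://192.0.2.10:11434/v1",  # exposed Ollama endpoint (placeholder)
    api_key="ollama",  # Ollama ignores the key, but the SDK requires a value
)

resp = client.chat.completions.create(
    model="llama3",  # any model the server reports as installed (example name)
    messages=[{"role": "user", "content": "Reply with OK if you can hear me."}],
)
print(resp.choices[0].message.content)
```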
Technical Fingerprinting and Exposure Analysis
Cisco’s methodology combined Shodan’s indexed network signatures with default-port heuristics to achieve high-confidence identification of Ollama instances. Specifically, most deployments listen on port 11434 and present “Ollama” or “uvicorn” in their HTTP response headers.
Correlating default port usage with the presence of the Uvicorn ASGI server signature reduces false positives and identifies instances that employ non-standard configurations.
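A simplified version of that correlation check, written here as an assumption about how such a heuristic might be implemented rather than as Cisco's actual logic, could look like this:

```python
# Simplified fingerprint check (an approximation of the described heuristics,
# not Cisco's implementation): correlate port 11434 with Ollama/uvicorn signatures.
import requests

def looks_like_ollama(ip, port=11434, timeout=5):
    try:
        r = requests.get(f"http://{ip}:{port}/", timeout=timeout)
    except requests.RequestException:
        return False
    server_header = r.headers.get("Server", "").lower()
    body = r.text.lower()
    # Ollama's root endpoint typically answers "Ollama is running"; some
    # deployments report a uvicorn Server header. Requiring one of these
    # signatures on the default port cuts down on false positives.
    return "ollama" in body or "ollama" in server_header or "uvicorn" in server_header

print(looks_like_ollama("192.0.2.10"))  # placeholder address
```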
Geospatial analysis revealed that 36.6 percent of exposed servers were located in the United States, followed by China at 22.5 percent and Germany at 8.9 percent—highlighting widespread lapses in network perimeter isolation and firewall enforcement in AI infrastructure deployments.
Recommendations for Immediate Mitigation
To remediate these vulnerabilities, Cisco Talos specialists recommend enforcing strong authentication mechanisms, such as API key or OAuth 2.0 token validation, paired with role-based access control, to ensure that only authorized users can invoke the model.
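As a concrete sketch of that pattern, the example below places a small authenticating reverse proxy in front of an Ollama instance bound to localhost. The FastAPI and httpx stack, the X-API-Key header, and the environment-variable key store are illustrative choices, not part of Talos's guidance.

```python
# Minimal authenticating gateway in front of a localhost-only Ollama instance.
# The stack (FastAPI + httpx) is illustrative, not a Talos recommendation.
import os
import httpx
from fastapi import FastAPI, Request, HTTPException, Response

OLLAMA_URL = "http://127.0.0.1:11434"  # Ollama bound to loopback only
API_KEYS = set(os.environ.get("GATEWAY_API_KEYS", "").split(","))

app = FastAPI()

@app.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request):
    # Reject any request that does not carry a recognized key.
    key = request.headers.get("x-api-key")
    if not key or key not in API_KEYS:
        raise HTTPException(status_code=401, detail="missing or invalid API key")

    # Forward the authenticated request to the private Ollama listener.
    async with httpx.AsyncClient(base_url=OLLAMA_URL, timeout=120) as client:
        upstream = await client.request(
            request.method,
            f"/{path}",
            content=await request.body(),
            headers={"content-type": request.headers.get("content-type", "application/json")},
        )
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type"),
    )
```

Run with, for example, `uvicorn gateway:app --port 8080` while keeping port 11434 firewalled to loopback, so only requests that pass the key check ever reach the model.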
Network isolation should be achieved by deploying inference servers within private subnets or VPNs and restricting inbound traffic to trusted IP ranges, ideally terminating TLS and stripping identifying headers at a reverse proxy.
Changing default service ports and suppressing metadata such as “uvicorn” or “Ollama” in HTTP headers can impede automated scanning efforts. Additionally, integrating API gateways that support rate limiting and anomaly detection will help detect and throttle abusive usage patterns.
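For deployments without a managed API gateway, even a small in-process limiter can blunt model-extraction query floods. The token bucket below is a generic sketch with assumed limits, intended to be keyed on a client identifier such as source IP and slotted into a proxy layer like the gateway shown earlier.

```python
# Generic per-client token bucket (assumed limits), suitable for dropping into
# a proxy layer to throttle abusive query patterns.
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate=1.0, burst=20):
        self.rate, self.burst = rate, burst          # tokens/sec, max bucket size
        self.tokens = defaultdict(lambda: burst)     # per-client token counts
        self.updated = defaultdict(time.monotonic)   # last refill time per client

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.updated[client_id]
        self.updated[client_id] = now
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens[client_id] = min(self.burst, self.tokens[client_id] + elapsed * self.rate)
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False

limiter = TokenBucket(rate=0.5, burst=10)   # roughly 30 requests/minute per client
if not limiter.allow("203.0.113.7"):        # e.g. keyed on source IP (example address)
    print("429 Too Many Requests")
```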
Finally, organizations should schedule regular exposure audits using Shodan alerts or custom scanners (for example, naabu or Nmap with adaptive fingerprints) to promptly identify and remediate any regressions in server exposure.
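A minimal scheduled audit in that spirit, with a placeholder CIDR and a simple port-plus-banner confirmation, might look like the sketch below; it complements rather than replaces naabu or Nmap.

```python
# Lightweight scheduled audit sketch: sweep an internal range (placeholder CIDR)
# for hosts answering on Ollama's default port. Not a substitute for naabu/Nmap.
import ipaddress
import socket
import requests

def audit_range(cidr="10.0.0.0/24", port=11434):
    findings = []
    for ip in ipaddress.ip_network(cidr).hosts():
        ip = str(ip)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(0.5)
            if s.connect_ex((ip, port)) != 0:
                continue  # port closed or filtered
        try:
            # Confirm it is actually an LLM endpoint rather than an unrelated service.
            r = requests.get(f"http://{ip}:{port}/api/tags", timeout=3)
            if r.ok:
                findings.append((ip, [m["name"] for m in r.json().get("models", [])]))
        except requests.RequestException:
            findings.append((ip, "port open, service not confirmed"))
    return findings

for host, detail in audit_range():
    print(f"EXPOSED: {host} -> {detail}")
```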
While Shodan offers a rapid snapshot of exposed AI services, Cisco warns that a comprehensive security posture requires complementary discovery methods such as active probing, multi-source indexing (Censys, ZoomEye), and the examination of non-standard ports to close gaps in LLM deployment security.