NYU Internet Census

The NYU Internet Census is a security research project sponsored by the OSIRIS lab that conducts internet-wide surveys across different services and protocols to gain insights into global exposure to common vulnerabilities. The data collected is available to the public in an effort to enable security research.

Introduction

The project started out as SSL NYU, which focused on monitoring the global use of the SSL certificates that the public relies on to secure internet services. The data was published in cooperation with the University of Michigan at scans.io, making it available to projects such as the EFF SSL Observatory, which had previously revealed issues and misconfigurations in the SSL landscape.

The Scanning and Collection Process

NYU Internet Census gathers data in two stages. The first stage scans all public IPv4 addresses to determine which ones have the relevant service port open. In the second stage, collection takes place for each IP address identified this way: we connect to the service and communicate with it.
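
The two stages described above can be sketched as a small pipeline. This is only an illustration of the structure, not the project's actual scanning code: the function names (discover, collect, run_study) are hypothetical, and the network operations are passed in as callables so that the example runs on canned results without sending any traffic.

```python
from typing import Callable, Dict, Iterable, Iterator


def discover(addresses: Iterable[str], port: int,
             is_open: Callable[[str, int], bool]) -> Iterator[str]:
    """Stage 1: yield only the addresses whose service port is open."""
    for address in addresses:
        if is_open(address, port):
            yield address


def collect(hosts: Iterable[str], port: int,
            fetch: Callable[[str, int], str]) -> Dict[str, str]:
    """Stage 2: communicate with each responsive host and keep its reply."""
    return {host: fetch(host, port) for host in hosts}


def run_study(addresses: Iterable[str], port: int,
              is_open: Callable[[str, int], bool],
              fetch: Callable[[str, int], str]) -> Dict[str, str]:
    """Chain both stages: only hosts found in stage 1 reach stage 2."""
    return collect(discover(addresses, port, is_open), port, fetch)


if __name__ == "__main__":
    # Canned answers stand in for real probes (assumption for illustration).
    # The addresses come from the RFC 5737 documentation ranges.
    responsive = {"192.0.2.10", "192.0.2.30"}
    results = run_study(
        ["192.0.2.10", "192.0.2.20", "192.0.2.30"],
        443,
        is_open=lambda address, port: address in responsive,
        fetch=lambda address, port: f"reply from {address}:{port}",
    )
    print(results)
```

Keeping the stages separate means the costly collection step only runs against hosts that answered in the first stage.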

At no point does NYU bypass any technical barriers or otherwise access non-public-facing computers. We make every effort to reduce the impact on remote networks, and we follow the best practices outlined by the ZMap developers.

Services and collected data

  • NYU collects all SSL certificates visible on public IPv4 HTTPS web servers. This data can be used to detect changes such as the malicious replacement of a certificate, or to reveal the revocation of a compromised previous certificate. It is complementary to the Electronic Frontier Foundation's SSL Observatory project. Other uses include detecting certificates that are insecurely reused or that remain in active use after being revoked. In addition, the NYU data shows all IP addresses and services that claim to represent a particular domain, which in turn can be used for asset identification and for detecting malicious certificate usage. The certificate fields can also be used to identify software and hardware in specific situations. The SSL work is being expanded to encompass non-HTTP services, such as SSL- and STARTTLS-enabled email services like SMTP, IMAP and POP.
  • NYU performs several HTTP studies that collect the HTML content of all public IPv4 web servers. The main HTTP study requests the index page (“/”) on TCP port 80, and other studies request other specific pages, potentially on other TCP ports. This behavior is similar to what search engines do, except that NYU does not crawl the servers beyond the initially requested page. One potential use of this data set is the identification of compromised web servers and of injected malicious HTML snippets, such as iframes that point to non-advertisement web servers. We found several instances of JavaScript and direct iframes pointing to so-called "exploit kits" that try to infect client computers. We also use this data to identify vulnerable embedded devices by fingerprinting the content and headers of the HTTP response.
  • NYU gathers the reverse DNS records for all IPv4 addresses. This data enables organizational asset discovery and can help identify misconfigurations and possibly DNS hijacking attempts.
  • NYU uses the domain names gathered from the above processes as well as certain TLD zone files to conduct DNS "ANY" record requests. This data is also useful for asset discovery and the identification of phishing portals, as well as new malicious domains matching algorithmic patterns.
  • NYU scans a growing number of TCP and UDP services. TCP studies include SSH, SMB, Telnet, RDP, Mongo, Redis, CouchDB, and more. UDP studies include NetBIOS, DNS, NTP, IPMI, NAT-PMP, BACNet, SIP, SNMP, MDNS, and quite a few others. We use the metadata from these publicly exposed services to identify large-scale misconfigurations and vulnerabilities in consumer, enterprise, and critical infrastructure systems.
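
As an illustration of the certificate analyses mentioned in the first item above, the following minimal sketch indexes certificate observations in two ways: by the domain names a certificate claims, and by certificate fingerprint. The input layout (tuples of IP address, DER-encoded certificate bytes, and a list of names) and all function names are assumptions made for this example; they do not describe the format of the published data sets.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (ip, DER-encoded certificate, claimed names).
Observation = Tuple[str, bytes, List[str]]


def fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def index_observations(
    observations: Iterable[Observation],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build name -> IPs and fingerprint -> IPs lookup tables."""
    ips_by_name: Dict[str, Set[str]] = defaultdict(set)
    ips_by_cert: Dict[str, Set[str]] = defaultdict(set)
    for ip, der_bytes, names in observations:
        ips_by_cert[fingerprint(der_bytes)].add(ip)
        for name in names:
            # Domain names are case-insensitive, so normalise before indexing.
            ips_by_name[name.lower()].add(ip)
    return ips_by_name, ips_by_cert


def reused_certificates(ips_by_cert: Dict[str, Set[str]],
                        min_hosts: int = 2) -> Dict[str, Set[str]]:
    """Return the certificates that were seen on at least min_hosts addresses."""
    return {fp: ips for fp, ips in ips_by_cert.items() if len(ips) >= min_hosts}
```

A certificate that appears on several addresses is not necessarily a problem, since load-balanced and CDN-hosted services share certificates legitimately. The index only narrows down which cases deserve a closer look.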

Accessing our data

All data sets gathered are post-processed and published in compressed form for public use in cooperation with the University of Michigan. You can find the data on scans.io.
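
Because the data sets are published in compressed form, they are best read as a stream rather than unpacked in full. The sketch below assumes a gzip-compressed file that holds one JSON record per line. That layout and the function name read_records are assumptions made for illustration; check the description of each data set on scans.io for its actual format.

```python
import gzip
import json
from typing import Iterator


def read_records(path: str) -> Iterator[dict]:
    """Stream records from a gzip-compressed, line-delimited JSON file.

    The file layout is an assumption made for this example; it is not a
    description of any specific published data set.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Reading line by line keeps memory use flat, which matters for files that are far larger than available RAM once decompressed.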

Acknowledgements

NYU Internet Census employs a range of open-source tools, most notably the ZMap software developed by Zakir Durumeric, Eric Wustrow, and J. Alex Halderman at the University of Michigan. We publish a few of our own tools as well, including Recog, which is used in the processing stage of our scanning system.

Terms of Service

Use of the NYU Internet Census research datasets available on this website ("Project NYU data") is subject to the following terms. By accessing or using NYU Internet Census data, you accept these terms of service. If you are using NYU Internet Census data on behalf of another organization or entity, you represent that you have authority to accept these terms on behalf of the organization or entity and that the organization or entity accepts these terms. Subject to these terms, NYU grants you a worldwide, non-exclusive, non-transferable license to use or reproduce NYU Internet Census data. NYU Internet Census data is published on this website with the intention of helping enhance cybersecurity and may not be used:

  • To do anything illegal or in violation of the rights of others, including unlawful access or damage to computers.
  • To facilitate or encourage illegal activity.

You agree to abide by all applicable laws when using NYU Internet Census data. You are responsible at all times for the consequences of your use of Project NYU data. NYU is not responsible for the actions of third parties, and you agree to hold harmless and indemnify NYU and its affiliates, officers, employees, and agents from any claim, action, or damages, known and unknown, related to the use of NYU Internet Census data. NYU does not make any representations or warranties of any kind regarding NYU Internet Census data. If any portion of these terms is found to be unenforceable, the remaining portion shall remain in effect. If NYU does not enforce these terms, it shall not be considered a waiver of the terms. NYU reserves the right to update and modify these terms from time to time.

Getting in touch

Feel free to contact research[at]scan.lol with any further questions. We also appreciate any community analysis results and hope for your collaboration. We can be contacted using the web chat widget below or at #isislab on irc.freenode.net.

Opt-Out

If you would like to be excluded from some or all of our probes, please let us know at research[at]scan.lol. Make sure to mention your CIDR blocks or list of IP addresses, as well as your affiliation.
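
To show why CIDR blocks or explicit IP addresses are the useful form for an opt-out request, here is a minimal sketch of how such a list could be checked before an address is probed. It uses Python's standard ipaddress module. The function names are hypothetical, and this is not a description of how NYU actually applies its exclusions.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_exclusions(entries: Iterable[str]) -> List[Network]:
    """Parse CIDR blocks and single addresses; blank entries are skipped.

    strict=False accepts blocks written with host bits set, such as
    "192.0.2.7/24", and treats them as the enclosing network.
    """
    return [
        ipaddress.ip_network(entry.strip(), strict=False)
        for entry in entries
        if entry.strip()
    ]


def is_excluded(address: str, exclusions: Iterable[Network]) -> bool:
    """Return True if the address falls inside any excluded network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in exclusions)


if __name__ == "__main__":
    # Documentation ranges (RFC 5737) stand in for a real opt-out list.
    exclusions = parse_exclusions(["192.0.2.0/24", "198.51.100.7"])
    for candidate in ["192.0.2.55", "198.51.100.8"]:
        status = "skip" if is_excluded(candidate, exclusions) else "probe"
        print(candidate, status)
```

A single address is parsed as a one-address network, so both forms requested above can be handled by the same check.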