Announcement

Welcome to the IMPACT Forum, a place where researchers, developers, data & tool providers, and other cyber risk stakeholders can discuss all things IMPACT!

#1 2018-05-04 2:07:16 pm

PHick
Member
Registered: 2017-01-11
Posts: 2

Update on CAIDA datasets

We send this announcement to let you know about several new and improved datasets available from CAIDA and to remind you that we would like to hear about publications (including presentations,
web pages, class projects, etc.) using CAIDA data.

See our overview table at http://www.caida.org/data/overview/ for a complete list of all our current datasets. See below for a list of new or modified datasets.

Changes in data access over the last year
=========================================

Several changes over the last year affect the way users access
some of our most popular datasets.

Many CAIDA datasets are available exclusively through IMPACT (the DHS Information Marketplace for Policy and Analysis of Cyber-risk & Trust):

  - recent (less than 1 year old) IPv4 Archipelago (Ark) data
  - all datasets from the UCSD Network Telescope
  - our DDoS dataset (http://www.caida.org/data/passive/ddos-20070804_dataset.xml)

In our data overview table http://www.caida.org/data/overview/ , these datasets are marked in the third column ('Availability') as 'Request access from IMPACT'. IMPACT provides data access only to U.S.-based researchers and researchers working in one of the IMPACT partner countries ( https://www.impactcybertrust.org/help_international ). If you need access to these datasets but are not eligible to obtain them through IMPACT, please contact us to discuss possible alternative solutions.

Archipelago data
----------------

All IPv6 Ark traceroute data, and all IPv4 traceroute data older than one year, are now accessible through our public data server http://data.caida.org (i.e. no username/password is required to access them).

The most recent year of Ark IPv4 traceroute data is now available exclusively through the IMPACT portal.
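The access rule above can be sketched in a few lines of Python. This is only an illustration, not a CAIDA tool: the function name ark_traceroute_source is hypothetical, the portal URL is just the IMPACT site root, and "one year" is approximated as 365 days, which is an assumption about where exactly the cutoff falls.

```python
from datetime import date, timedelta

# Locations named in this announcement.
PUBLIC_SERVER = "http://data.caida.org"
IMPACT_PORTAL = "https://www.impactcybertrust.org"


def ark_traceroute_source(ip_version, collected, today):
    """Return where Ark traceroute data collected on `collected` can be obtained.

    Hypothetical helper: the 365-day window is an approximation of
    "the most recent year" and is not an official cutoff.
    """
    if ip_version not in (4, 6):
        raise ValueError("ip_version must be 4 or 6")
    if ip_version == 6:
        # All IPv6 Ark traceroute data is on the public server.
        return PUBLIC_SERVER
    cutoff = today - timedelta(days=365)
    # IPv4 data older than one year is public; newer data goes through IMPACT.
    return PUBLIC_SERVER if collected < cutoff else IMPACT_PORTAL


if __name__ == "__main__":
    today = date(2018, 5, 4)
    print(ark_traceroute_source(4, date(2018, 3, 1), today))
    print(ark_traceroute_source(4, date(2016, 3, 1), today))
    print(ark_traceroute_source(6, date(2018, 3, 1), today))
```

Passing the current date in explicitly, rather than reading the clock inside the function, keeps the rule easy to check against known dates.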

Before last year, access to the most recent two years of Ark IPv4 traceroute data could be obtained directly from CAIDA. Existing accounts (username/password) on https://topo-data.caida.org remain valid until their nominal expiration date.

The old Ark data server https://topo-data.caida.org is scheduled to be retired. Existing accounts on this data server will be gradually moved to the main CAIDA data server https://data.caida.org. If you regularly download new Ark data, please send us email and we will make sure you don't lose access during this transition.

Telescope data
--------------

All data sets related to the UCSD Network Telescope are now available exclusively through IMPACT.


New and improved datasets
=========================

Since the last of these announcements (April 2017) we have added/improved a number of datasets:

* Anonymized 2018 Internet Traces Dataset

In March 2018 we took the first monthly trace on our new 10 Gb link monitor in New York City. This monitor picks up where we left off in Chicago in March 2016. This move was forced on us when the link we were monitoring in Chicago was upgraded from 10 Gb to 100 Gb.

The March and April traces are now online. We plan to publish these traces at least quarterly; depending on storage resources we may decide to publish all monthly traces (as we did before 2014).

We are also still looking into options for upgrading hardware to capture on 100 Gb links.

* Macroscopic Internet Topology Data Kit (ITDK)

The latest ITDK is available through IMPACT only.

http://www.caida.org/data/internet-topology-data-kit/

We released a new Ark ITDK: ITDK-2017-08. The ITDKs contain two router-level topologies generated from the same IP-level topology based on data from the Ark IPv4 Routed /24 Topology Dataset. They also include an IPv6 router-level topology; assignments of routers to ASes; geographic locations of each router; and Domain Name Service (DNS) lookups of all observed IP addresses. The latest snapshot is available through IMPACT; older snapshots are available publicly at the URL above.

* PeeringDB Dataset

http://www.caida.org/data/peeringdb/

PeeringDB (https://www.peeringdb.com/) provides an online database of peering policies, traffic volumes and geographic presence of participating networks. Our researchers have been taking daily snapshots of this database since August 2010. These are now available on our website (with permission from PeeringDB).

* AS Facilities Dataset

Available through IMPACT only.

http://www.caida.org/data/as-facilities/

This dataset, created in April 2017, contains information about the geographic locations of interconnection facilities and the autonomous systems (ASes) that have peering interconnections at those facilities.

* Internet eXchange Points (IXPs) Dataset

http://www.caida.org/data/ixps/

This dataset provides information about Internet eXchange Points (IXPs) and their geographic locations, facilities, prefixes, and member ASes. It is derived by combining information from PeeringDB, Hurricane Electric, Packet Clearing House (PCH), and GeoNames.

* Geolocated Router Data Set

Available through IMPACT only.

https://www.caida.org/publications/papers/2017/look_at_router_geolocation/ (links to: https://www.impactcybertrust.org/dataset_view?idDataset=792 )

We provide a ground-truth dataset of 16,586 router interface addresses and locations, with city-level accuracy. This data set was used in our publication "A Look at Router Geolocation in Public and Commercial Databases" (IMC 2017) by M. Gharaibeh, A. Shah, B. Huffaker, H. Zhang, R. Ensafi, and C. Papadopoulos.

* BGP Communities Data Set

Available through IMPACT only.

https://www.caida.org/data/bgp-communities/

We developed a web-mining tool that automatically compiles a dictionary of BGP communities and their geolocation semantics. The resulting dictionary represents our best effort to extract meaningful geolocation information encoded by network operators into the BGP community attributes they set up for their networks.


Did you publish anything using CAIDA data?
==========================================

To those of you who have already sent us publication information, thank you very much!

If you have not done so already, then please email information on your publication to data-info at caida.org, including the following minimum information:

    - Authors
    - Title
    - Date of publication (Year/Month)
    - Dataset(s) used

Any additional information, if available, is of course also very welcome:

    - DOI
    - Booktitle (for proceedings)
    - Journal (for articles)
    - Institution (for tech reports)
    - URL (either to a paper, or to an abstract)

We will use this information to update our list of publications that use CAIDA data:

    http://www.caida.org/data/publications/bydate/index.xml

This publication list is an important factor in communications with our sponsors, members, and funding agencies. Your research papers, enabled by CAIDA datasets, are an important indicator of CAIDA's success in creating and curating Internet datasets for use by the research community, and they provide valuable help in CAIDA's efforts to secure future funding for our data curation work.


Suggestions for future data sets welcome
========================================

We would very much welcome suggestions on possible future data sets that would be most useful for your research. This will help us make decisions on future datasets and help allocate our resources on both the technical and policy side.
