IP address blocklists are a useful source of information about
repeat attackers. Such information can be used to prioritize which
traffic to divert for deeper inspection (e.g., repeat offender
traffic), or which traffic to serve first (e.g., traffic from
sources that are not blocklisted). But blocklists also suffer from
overspecialization -- each list is geared towards a specific
purpose -- and they may be inaccurate due to misclassification or
stale information. We propose BLAG, a system that evaluates and
aggregates multiple blocklist feeds, producing a more useful,
accurate, and timely master blocklist tailored to the specific
customer network. BLAG uses a sample of the legitimate sources of
the customer network's inbound traffic to evaluate the accuracy of
each blocklist over regions of address space. It then leverages
recommendation systems to select the most accurate information to
aggregate into its master blocklist. Finally, BLAG identifies
portions of the master blocklist that can be expanded into larger
address regions (e.g., /24 prefixes) to uncover more malicious
addresses with minimal collateral damage. Our evaluation of
blocklists of various attack types and three ground-truth datasets
shows that BLAG achieves specificity of up to 99%, improves
recall by up to 114 times compared to competing approaches, and
detects attacks up to 13.7 days faster, which makes it a promising
approach for blocklist generation.
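The evaluation, aggregation, and expansion steps can be illustrated with a small Python sketch. It is not BLAG's implementation. It replaces the recommendation-system step with a plain accuracy threshold, and it assumes that a blocklist's accuracy over a region is the fraction of its listings in that /24 that are absent from the legitimate-source sample. The helper names (`region_accuracy`, `aggregate`, `expand`) and the thresholds `min_accuracy` and `min_listed` are hypothetical.

```python
import ipaddress
from collections import defaultdict


def prefix24(addr: str) -> ipaddress.IPv4Network:
    """Return the /24 network that contains an IPv4 address."""
    return ipaddress.ip_network(f"{addr}/24", strict=False)


def region_accuracy(blocklist: set[str], legit: set[str]) -> dict:
    """Per-/24 accuracy of one blocklist: the fraction of its listings in
    each region that do NOT appear in the legitimate-source sample."""
    total: dict = defaultdict(int)
    correct: dict = defaultdict(int)
    for addr in blocklist:
        region = prefix24(addr)
        total[region] += 1
        if addr not in legit:
            correct[region] += 1
    return {region: correct[region] / total[region] for region in total}


def aggregate(blocklists: dict[str, set[str]], legit: set[str],
              min_accuracy: float = 0.9) -> set[str]:
    """Hypothetical stand-in for the recommendation step (not BLAG's
    method): keep a listing only if its (blocklist, /24) pair meets
    min_accuracy and the address is not a sampled legitimate source."""
    master: set[str] = set()
    for listed in blocklists.values():
        accuracy = region_accuracy(listed, legit)
        for addr in listed:
            if accuracy[prefix24(addr)] >= min_accuracy and addr not in legit:
                master.add(addr)
    return master


def expand(master: set[str], legit: set[str], min_listed: int = 2) -> list:
    """Promote a /24 to a whole-prefix entry when it holds at least
    min_listed master-blocklist addresses and contains no sampled
    legitimate source, so no known legitimate sender is swept in."""
    counts: dict = defaultdict(int)
    for addr in master:
        counts[prefix24(addr)] += 1
    legit_regions = {prefix24(addr) for addr in legit}
    return sorted(region for region, n in counts.items()
                  if n >= min_listed and region not in legit_regions)


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737); the data is made up.
    feeds = {
        "spam": {"198.51.100.7", "198.51.100.9", "203.0.113.5"},
        "scan": {"198.51.100.20", "192.0.2.44", "192.0.2.10"},
    }
    legit_sample = {"192.0.2.10"}
    master = aggregate(feeds, legit_sample)
    print(sorted(master))
    print(expand(master, legit_sample))
```

Run as a script, it keeps the four listings in 198.51.100.0/24 and 203.0.113.0/24, drops the scan feed's 192.0.2.0/24 listings because half of them hit the legitimate sample, and expands only 198.51.100.0/24, the one region with at least two listings. A recommendation system, unlike this fixed threshold, can also estimate scores for entries that the sample does not cover directly.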

Although the performance of blocklists can be improved, they need to be
used carefully. Blocklists can unjustly block legitimate users
because of IP address reuse, so more users may be blocked than
intended. IP addresses can be reused either at the
same time (Network Address Translation) or over time (dynamic
addressing). We present two new techniques to identify reused
addresses. We build a crawler using the BitTorrent Distributed Hash
Table to detect NATed addresses and use the RIPE Atlas measurement
logs to detect dynamically allocated address spaces. We then
analyze 151 publicly available IPv4 blocklists to show the
implications of reused addresses, and find that 53--60% of
blocklists contain reused addresses, amounting to about
30.6K--45.1K listings. We also find that reused addresses
can potentially affect as many as 78 legitimate users for as many
as 44 days.
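The two detection techniques can be illustrated with a short Python sketch. It is not the detailed methodology. It assumes that the BitTorrent DHT crawler's output reduces to (ip, node_id, unix_time) sightings, and it flags an address as NATed when two different node IDs answer from it within an hour. It also assumes that the RIPE Atlas logs reduce to (probe_id, ip, unix_time) records, and it marks a /24 as dynamically allocated when a probe in it changes address. The record formats, the one-hour `window`, the /24 granularity, and the function names are all hypothetical. A rule this simple could be fooled, for example, by a single host that restarts its client with a fresh node ID.

```python
import ipaddress
from collections import defaultdict


def nated_addresses(dht_obs, window: int = 3600) -> set[str]:
    """Flag an IP as NATed when two different DHT node IDs are seen from it
    within `window` seconds. dht_obs holds (ip, node_id, unix_time) tuples,
    a hypothetical format for the crawler's output."""
    sightings = defaultdict(list)
    for ip, node_id, t in dht_obs:
        sightings[ip].append((t, node_id))
    nated = set()
    for ip, seen in sightings.items():
        seen.sort()
        # Checking time-adjacent pairs suffices: if any two sightings within
        # the window differ in node ID, some adjacent pair between them does
        # too, and that pair is also within the window.
        for (t1, id1), (t2, id2) in zip(seen, seen[1:]):
            if id1 != id2 and t2 - t1 <= window:
                nated.add(ip)
                break
    return nated


def dynamic_prefixes(atlas_logs) -> set:
    """Mark every /24 a probe was logged in as dynamic if that probe used
    more than one address. atlas_logs holds (probe_id, ip, unix_time)
    tuples, a hypothetical digest of the RIPE Atlas logs."""
    addresses = defaultdict(set)
    for probe_id, ip, _t in atlas_logs:
        addresses[probe_id].add(ip)
    dynamic = set()
    for ips in addresses.values():
        if len(ips) > 1:
            dynamic.update(ipaddress.ip_network(f"{ip}/24", strict=False)
                           for ip in ips)
    return dynamic


def reused_listings(blocklist, nated, dynamic) -> set[str]:
    """Listings that are NATed addresses or fall inside a dynamic /24."""
    return {ip for ip in blocklist
            if ip in nated
            or ipaddress.ip_network(f"{ip}/24", strict=False) in dynamic}


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737); the data is made up.
    dht = [("198.51.100.5", "node-a", 0), ("198.51.100.5", "node-b", 120),
           ("198.51.100.6", "node-c", 0), ("198.51.100.6", "node-c", 60),
           ("203.0.113.9", "node-d", 0), ("203.0.113.9", "node-e", 90_000)]
    atlas = [(1, "192.0.2.10", 0), (1, "192.0.2.77", 86_400),
             (2, "203.0.113.1", 0), (2, "203.0.113.1", 86_400)]
    blocklist = {"198.51.100.5", "192.0.2.200", "203.0.113.9"}
    print(sorted(reused_listings(blocklist, nated_addresses(dht),
                                 dynamic_prefixes(atlas))))
```

Run as a script, it prints `['192.0.2.200', '198.51.100.5']`. The first listing falls in a /24 where probe 1 changed address, and the second answered the crawler with two node IDs two minutes apart. The node IDs seen from 203.0.113.9 are more than a day apart, so this rule does not flag it.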