Yuhong Nan, "Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps"

CERIAS Weekly Security Seminar - Purdue University

English - February 26, 2020 21:30 - 225 MB Video

A long-standing challenge in analyzing information leaks within
mobile apps is to automatically identify the code operating on
sensitive data. Because all existing solutions rely on system APIs
(e.g., IMEI, GPS location) or features of user interfaces (UI),
content from app servers, such as a user's Facebook profile or
payment history, falls through the cracks.

In this talk, I will introduce ClueFinder, a novel semantics-driven
solution for automatic discovery of sensitive user data, including
data that originates on the server side. ClueFinder utilizes natural
language processing (NLP) to automatically locate the program
elements (variables, methods, etc.) of interest, and then performs a
learning-based program structure analysis to accurately identify
those that actually carry sensitive content. Using this new
technique, we analyzed over 400k popular apps, an unprecedented
scale for this type of research. Our findings bring to light the
pervasiveness of information leaks, and the channels through which
the leaks happen, including unintentional over-sharing across
libraries and aggressive data acquisition behaviors.
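
To make the first stage more concrete, here is a minimal sketch of
how program elements might be shortlisted by the words in their
names. It is not ClueFinder's implementation: the keyword set, the
function names, and the plain token matching are all assumptions
made for illustration, standing in for the NLP step described
above. The learning-based structure analysis is not shown.

```python
import re

# Hypothetical vocabulary; the abstract does not list the real keywords.
SENSITIVE_KEYWORDS = {
    "password", "email", "phone", "address",
    "payment", "profile", "location",
}

def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    tokens = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs (IMEI), capitalized or lowercase words, digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]

def candidate_elements(identifiers):
    """Return identifiers whose tokens overlap the keyword set.

    These are only candidates: a later analysis would still have to
    decide whether each one really carries sensitive content.
    """
    return [
        name for name in identifiers
        if SENSITIVE_KEYWORDS & set(split_identifier(name))
    ]

if __name__ == "__main__":
    names = ["userEmailAddress", "payment_history", "mButton", "getIMEI"]
    print(candidate_elements(names))
```

Name matching alone over-approximates: an identifier such as
`addressBarColor` would be flagged too, which is why the talk pairs
the semantic step with a second analysis to filter candidates.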