“I Am Spartacus” Privacy via Obfuscation for Vulnerable Populations

From IIW

“I Am Spartacus!” Privacy Via Obfuscation For Vulnerable Populations (Victims of Abuse, etc)

Wednesday 10J

Convener: Mike Kiser

Notes-taker(s): Mike Kiser

Tags for the session - technology discussed/ideas considered:

Discussion notes, key understandings, outstanding questions, observations, and, if appropriate to this discussion: action items, next steps:

Link to open-source tool discussed: https://github.com/derrumbe/Spartacus-as-a-Service

In addition to the below content, we discussed how this could be useful to immigrants (a U.S. visa application requires 5 years of social media accounts), victims of abuse needing to relocate, and journalists who report on topics that those in power would rather not have investigated.

(1) Spartacus, Kirk Gibson, and the Right to Privacy. A retelling of the ending of Spartacus serves as inspiration for the right to privacy through obscurity. [see: https://www.youtube.com/watch?v=-8h_v_our_Q for entertainment and informative purposes.]

(2) Use Case: Seeking to Be Forgotten. Follows the difficulty users face in removing their accounts and data from online networks and applications, using a sample identity as a typical use case.

(a) A sample identity was fabricated (name, photo, backstory, etc). Note that this kind of wholesale creation of identities is much easier now with sites like http://thispersondoesnotexist.com . Accounts for this identity were created at 27 different services representing a wide swath of online activity:
• Ashley Madison • Bumble • Dropbox • Evernote • Facebook • Gmail • Google • Groupon • Instagram • iTunes • LinkedIn • Medium • Myspace • Netflix • Pinterest • Skype • Soundcloud • Spotify • Steam • Tinder • Tumblr • Twitch • Twitter • WhatsApp • WordPress • Xing • Yahoo

(b) Summary findings are shared based on these systems (“How many sell your data?” “How many let you retain rights to your creations?” . . .)
(c) Content was then produced for these online systems, and the resulting search results and targeted advertising were noted (based on this activity).
(d) An attempt was then made to delete all accounts and their data (with mixed results depending on the target service).

Results across this set of services ranged widely from success to failure. Some services are required to retain some information even after an account deletion request—particularly those that are commercial in nature. Other sites may also have already mined these services for data (photos, other collateral, etc.)—rendering account deletion relatively unhelpful. More regulation is unlikely to help much either, given existing retention requirements and the difficulty of enforcement and incentivization.

(3) Inadequacy of Privacy Legislation. A brief overview of current legislation shows how widely it varies around the world; the lack of protection, along with daily headlines, is encouraging people to delete their online information and accounts and to value their privacy more highly.

The lack of adequate legislation and the current tendency of enterprises to misuse personal data and sacrifice privacy leads to the need for an alternative that would “encourage” these services to provide deletion of data and accounts.

In short, a better way is sought, which leads to . . .

(4) Privacy Through Obfuscation

(a) The goal is obfuscation of the “real” person or data. This is done through the injection of additional accounts or data, creating false positives in the environment and obscuring which person (or which data) is real.

[see https://mitpress.mit.edu/books/obfuscation for a solid background on obfuscation techniques]

(b) Many of these techniques are also based on previous research into location obfuscation:

[e.g. https://link.springer.com/chapter/10.1007/978-3-540-73538-0_4
Ardagna C.A., Cremonini M., Damiani E., De Capitani di Vimercati S., Samarati P. (2007) Location Privacy Protection Through Obfuscation-Based Techniques. In: Barker S., Ahn GJ. (eds) Data and Applications Security XXI. DBSec 2007. Lecture Notes in Computer Science, vol 4602. Springer, Berlin, Heidelberg]

(c) Three primary techniques explored:

a. “Enlarging the Radius” —this technique constitutes the creation of additional identities / accounts on the target system with the same name but slightly different personal data. This enlarges the pool of potential targets and helps to preserve the privacy of the original identity. (“one name, many accounts”)
b. “Shifting the Center” — this technique employs the creation of additional identities / accounts in such a way as to bias the peer group into a different center—one that no longer focuses on the original identity, but one that makes the original identity an outlier. This is primarily focused on a common photograph or other visual marker, with the other data varying per account. (“one photo, many accounts”)
c. “Filling the Channel with Noise” — a technique which floods the existing account with extraneous false data to help mask the real data. (e.g. Liking various activities or postings on Facebook as a cover for real preferences.) [see this previous research for more information: http://www.kevinludlow.com/blog/1610/Bayesian_Flooding_and_Facebook_Manipulation_RD/]

Note that flooding the channel is site dependent—for example, Facebook restricted the ability to insert bogus content into a user’s account in 2008 (but still left “life events” open to manipulation). By contrast, Twitter still allows for the programmatic flooding of accounts with content. Others are more restrictive. Polluting past search results for engines such as Google is also possible. Each app or online site may call for its own technique to ensure obfuscation.

(5) An open-source proof-of-concept that facilitates these obfuscation techniques is introduced. Named, for obvious reasons, “Spartacus as a Service,” it allows for automatic obfuscation of a chosen identity on a small scale; lessons learned from its usage are discussed.

a. The current version of Spartacus as a Service may be found at: https://github.com/derrumbe/Spartacus-as-a-Service
b. It is an open-source tool, written largely in Node.js, under an MIT license
c. Development is ongoing, and this is expected to be a long-term project (the first official release would coincide with Black Hat/DEF CON)
d. Authorization for obfuscation is done via OAuth for a signed-in user (explicit consent is therefore given)
e. Additional resources have been incorporated to generate this content. A Markov chain is used to generate new content based on a textual repository (ranging from political platforms to the oft-used Jane Austen canon to Aaron Franklin’s book on BBQ; he’s a big deal here in Austin). Amazon Mechanical Turk may be used to circumvent bothersome pieces such as CAPTCHAs.
f. Note that this is not a tool that *prevents* targeted advertising and the like; instead, it seeks to dilute the value of the information that companies hold about a user, mixing true information with fake so that it is impossible to tell what the real content is (or, in some cases, who the real person is).

(6) Results of these techniques are examined, with the primary measures being search results and targeted advertising rates/topics (a baseline having already been established from the original identity).

(7) Finally, lessons learned from obfuscation are discussed, as well as the practicality of this technique going forward. (Spoiler alert: it's not practical at scale, for a few obvious reasons.) But false personas might become common in the future, much as many people use burner email addresses (see Apple's new email proxy service, for instance). [What it should do is push the consent-receipt concept: empower people to know and control what they've shared without it being sold off.]