Data Transfer Project – Universal Data Portability for All

From IIW

Data Transfer Project: Universal Data Portability for All (Overview, Demo, How To)

Day/Session: Wednesday 4C

Convener: Jessie “Chuy” Chavez

Notes-taker(s): Scott Mace

Discussion notes, key understandings, outstanding questions, observations, and, if appropriate to this discussion: action items, next steps:

Enabling Universal Data Portability

Google, Facebook, Twitter, Microsoft have contributed.

  • What portability looks like today. Ideal portability solution. Data Transfer Project Overview. Data Transfer Project Architecture. DTP & screenshots. What’s next
  • Mini-version launched 8-10 years ago.
  • Download your data. Launched as a one-stop hub. We’re already at 80 products, especially with GDPR. Good framework in place to make sure we capture everything. After Cambridge Analytica, an Irish blogger wrote an article that got 200,000 retweets. It pointed directly at Takeout.
  • Surprises, I didn’t know Google had my voice searches. Next wave is to make it more user friendly. We have Privacy Advisor, launched today inside Search.
  • Challenges today. Mobile overtook desktop. Now what do you do? iCal good to download, upload elsewhere. Other formats not as easy yet.
  • Data also deletable.
  • N squared scaling problem.
  • What if we had a common set of data models.
  • Open to all companies
  • Scalable
  • Enables innovation
  • Reciprocal
  • No one company can own it.
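The N squared scaling problem noted above can be made concrete with a little arithmetic. This is a minimal sketch, not something shown in the session: the function names are hypothetical, and it assumes each direct converter is one-way and that a common data model needs one exporter and one importer per service.

```python
def pairwise_converters(n_services: int) -> int:
    """One-way converters needed so every service can send to every other one."""
    return n_services * (n_services - 1)


def adapters_with_common_model(n_services: int) -> int:
    """One exporter plus one importer per service, all targeting a shared model."""
    return 2 * n_services


# Compare the two approaches as the number of participating services grows.
for n in (2, 4, 10, 50):
    print(n, pairwise_converters(n), adapters_with_common_model(n))
```

Under these assumptions the direct approach grows quadratically while the common-model approach grows linearly, which is the motivation for a shared set of data models.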
  • DTP is open source, 1300 commits. Make it super easy. We’re adding model overlays. Also microdata formats coming in. All on GitHub.
  • Focus on customer data.
  • People moving from Google+ to Google Groups, Slack channels, a different problem because the data is owned by many.
  • Architecture slide. Mail exporter. Mail & photo exporter. Photo exporter. Internal representation of Mail Exchange Format, Photo Exchange Format. OAuth 1 and OAuth 2.
  • Mail is well defined. Photo is well defined, except for the little motion thing in Google. That is a fidelity problem: how to represent things that are unique to each product space.
  • Importers: Mail importer, Photo importer.
  • Idea is each of these exists once.
  • We call those adapters, generic name. The magic happens in the adapter.
  • Apple has nested albums. Apple doesn’t have to change its API.
  • We provide the plug-ins. Applied what was learned over the years with Takeout.
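The adapter idea above (one exporter and one importer per service, meeting at a common exchange format) might look roughly like the following. This is an illustrative sketch only and not the project's actual code: the `Photo` model, the class names, and the flattening of nested albums into a single path string are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Protocol


@dataclass
class Photo:
    """Hypothetical common photo model shared by all adapters."""
    title: str
    url: str
    album: Optional[str] = None  # flat album name; the common model has no nesting


class PhotoExporter(Protocol):
    def export(self) -> Iterable[Photo]: ...


class PhotoImporter(Protocol):
    def import_photos(self, photos: Iterable[Photo]) -> int: ...


class NestedAlbumExporter:
    """Exporter for an imaginary service whose albums can be nested."""

    def __init__(self, tree: dict) -> None:
        self._tree = tree

    def export(self) -> Iterator[Photo]:
        yield from self._walk(self._tree, [])

    def _walk(self, node: dict, path: List[str]) -> Iterator[Photo]:
        for name, child in node.items():
            if isinstance(child, dict):
                # A nested album: descend, remembering the path so far.
                yield from self._walk(child, path + [name])
            else:
                # The adapter absorbs the mismatch by flattening the path.
                album = " / ".join(path) if path else None
                yield Photo(title=name, url=child, album=album)


class InMemoryImporter:
    """Importer for an imaginary destination that just stores the photos."""

    def __init__(self) -> None:
        self.stored: List[Photo] = []

    def import_photos(self, photos: Iterable[Photo]) -> int:
        before = len(self.stored)
        self.stored.extend(photos)
        return len(self.stored) - before


def transfer(exporter: PhotoExporter, importer: PhotoImporter) -> int:
    """Move photos through the common model; returns how many were imported."""
    return importer.import_photos(exporter.export())


source = NestedAlbumExporter({
    "Trips": {"2018": {"beach.jpg": "https://example.com/beach.jpg"}},
    "cat.jpg": "https://example.com/cat.jpg",
})
destination = InMemoryImporter()
print(transfer(source, destination), destination.stored)
```

The point of the design is that the source service's own API stays unchanged; any translation, such as flattening nested albums, lives in the adapter.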
  • Hosting environment:
    • Docker image containing the demo front end and back-end server. Running on a local machine.
  • Steps:
    1. Download our demo image from DockerHub
    2. Obtain and download keys for the APIs you wish to support
    3. Run it!
  • First FB contribution: FB photos.
  • Instructions are online. You do have to have Docker.
  • Demo 2 – Prototype: Data Transfer Project at Google.
  • We have Google Dashboard, a high-level summary of all the data you have at Google across all the products, e.g. whether location history is on. Drop-down shows how to download data. Can transfer the data. Slides showing migration. Transfer to Flickr. OAuth screen at Google, followed by OAuth screen at Flickr. Trying to use OAuth scopes as best they can.
  • Take token, issue reads / uploads of files. Can do it now.
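The "take token, issue reads / uploads" step can be sketched as a simple paged copy loop. This is a hedged illustration, not the project's implementation: `copy_all`, `read_page`, `upload`, the cursor scheme, and the token strings are hypothetical stand-ins for whatever the source and destination APIs actually provide.

```python
from typing import Callable, List, Optional, Tuple

# (items on this page, cursor for the next page or None when finished)
Page = Tuple[List[bytes], Optional[str]]


def copy_all(read_page: Callable[[str, Optional[str]], Page],
             upload: Callable[[str, bytes], None],
             source_token: str,
             destination_token: str) -> int:
    """Read every page from the source and upload each item to the destination."""
    copied = 0
    cursor: Optional[str] = None
    while True:
        items, cursor = read_page(source_token, cursor)
        for item in items:
            upload(destination_token, item)
            copied += 1
        if cursor is None:
            return copied


def make_fake_source(pages: List[List[bytes]]) -> Callable[[str, Optional[str]], Page]:
    """Build an in-memory read_page function that serves the given pages in order."""
    def read_page(token: str, cursor: Optional[str]) -> Page:
        index = 0 if cursor is None else int(cursor)
        next_cursor = str(index + 1) if index + 1 < len(pages) else None
        return pages[index], next_cursor
    return read_page


uploaded: List[bytes] = []
count = copy_all(make_fake_source([[b"a", b"b"], [b"c"]]),
                 lambda token, item: uploaded.append(item),
                 source_token="export-token",
                 destination_token="import-token")
print(count, uploaded)
```

In a real transfer the two tokens would come from the two OAuth screens described above, one per service.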
  • Future plans
  • Productionizing existing code (O’Reilly Site Reliability Engineering book)
  • Additional partners
  • Additional verticals
  • Consumer-facing functionality.
  • Big announcement was in July.
  • Agricultural findata project out of Purdue, really cool project. So farmers can have a unified dashboard from disparate systems
  • Strava to Fitbit, Google Fit, Garmin
  • What does it look like on the consumer side? How users actually handle the transfer.
  • Am I deleting at the same time, close their account? A lot of confusion
  • A privacy / UX challenge & tradeoff between those concerns
  • Getting involved
  • Build adapters!
  • Log bugs / requests
  • Make it your own
  • Local French businesses sharing power / utility data
  • Would love to see importers into identity hubs
  • The more users empowered to move their data around [the better]
  • Displayed GitHub repository
  • Constant flow of commits
  • Right now most of the developers are from Microsoft or Facebook
  • Cool fact, most of our developers are women
  • Web site with our white paper, pages of the stuff government officials and regulators want to see
  • Get into everything about minimizing data.
  • E.g. if Facebook does a social media export, should it drag along your friends and their comments? Should it include posts but not the name of the person who wrote them? These are next-generation concerns. GDPR advises not necessarily including data from a person who is not the one doing the transfer.
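One possible reading of the minimization question above is to export only what the transferring user authored. The sketch below is an assumption for illustration, not a policy stated in the session or in GDPR: the `Post` and `Comment` types and the `minimize` rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Post:
    author: str
    text: str
    comments: List[Comment] = field(default_factory=list)


def minimize(posts: List[Post], exporting_user: str) -> List[Post]:
    """Keep the user's own posts, with only the comments they wrote themselves."""
    result = []
    for post in posts:
        if post.author != exporting_user:
            continue  # other people's posts are left out entirely
        own_comments = [c for c in post.comments if c.author == exporting_user]
        result.append(Post(post.author, post.text, own_comments))
    return result


feed = [
    Post("alice", "Holiday photos", [Comment("bob", "Nice!"), Comment("alice", "Thanks")]),
    Post("bob", "My new bike"),
]
print(minimize(feed, "alice"))
```

Other policies are possible, such as keeping friends' comments with the author name removed; the session left the choice open.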
  • Not everyone allows the token to be revoked. We only want it for the length of time it takes to transfer your data. Bad actors could keep tokens. Kind of a failure of OAuth.
  • Some tokens are actually too broad. We don’t want to change profiles.
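The two token concerns above (keep the token only for the length of the transfer, and avoid scopes broader than needed) could be expressed as follows. This is a sketch under stated assumptions: `issue` and `revoke` are hypothetical callables standing in for a provider's token endpoints, the scope names are made up, and it presumes the provider supports revocation at all, which the notes say is not always true.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, Set


def excess_scopes(granted: Iterable[str], needed: Iterable[str]) -> Set[str]:
    """Scopes the token carries beyond what the transfer actually needs."""
    return set(granted) - set(needed)


@contextmanager
def short_lived_token(issue: Callable[[], str],
                      revoke: Callable[[str], None]) -> Iterator[str]:
    """Hold a token only for the duration of the with-block, then revoke it."""
    token = issue()
    try:
        yield token
    finally:
        # Runs even if the transfer fails, so the token is never left behind.
        revoke(token)


revoked = []
with short_lived_token(lambda: "tok-123", revoked.append) as token:
    pass  # the transfer itself would happen here

print(excess_scopes({"photos.read", "profile.write"}, {"photos.read"}), revoked)
```

A transfer tool could refuse to proceed, or at least warn, when `excess_scopes` is non-empty.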
  • Q: Is what you’re running in the service the same as what’s in the repo?
  • Yes. Aside from Google authentication code.
  • Q: Will it appear on Google’s site?
  • We will vet it, but if it’s useful, it will appear on the Google site.
  • Q: Export of authenticated data?
  • We haven’t done it yet, but it came up at MyData a lot. Data could be an asset for insurance reasons. Verifying data is really important. We were wondering if we could add that as a layer to this. You would also have to ensure that each product is verifiable.
  • Video, presented in 2017 at Datentag in Berlin, the first conference on data portability.
  • We have a talk from Ali & Greg. Geoffrey Delcroix. Really interesting stuff on data portability. Microsoft came on board.
  • Google My Account, a lot of stuff. Today we launched Privacy Advisor. Can be annoying if you’re looking for driving directions. There are tradeoffs.
  • Privacy Advisor is contextual. You’ll see it in Search.
  • In Takeout we’re launching Schedule Takeout.
  • We keep hoping standalone viewers will come out.