Machine Learning/Computer Vision & Internet Identity

From IIW

Day/Session: Tuesday 1I

Convener: Liam Broza

Notes-taker(s): Nick Roy

Discussion notes, key understandings, outstanding questions, observations, and, if appropriate to this discussion: action items, next steps:

Half the people asked them to build this tech on OAuth and half on Sovrin, so they decided to come here to hash it out

Building a personal information manager

Build a database of yourself - where you've been, what you own, etc.

Don't trust/can't control stuff run by the big internet identity/services companies

Built a tool with about 7 other people - LifeScope

Have web browser extension for a bunch of browsers

Uses ETL tricks to figure out all your interactions, string this together into events
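
The notes do not say how interactions are strung into events. A minimal sketch of one common approach, assuming events are formed by grouping time-sorted interactions whenever the gap between neighbors stays under a threshold. The `Interaction` fields, the `group_into_events` name and the 30-minute gap are all hypothetical, not LifeScope's actual schema or logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Interaction:
    source: str          # hypothetical: which integration produced the record
    timestamp: datetime  # when the interaction happened
    summary: str         # short description of what happened


def group_into_events(interactions, max_gap=timedelta(minutes=30)):
    """Group interactions into events.

    Interactions are sorted by time; a new event starts whenever the gap
    to the previous interaction exceeds max_gap (an assumed threshold).
    Returns a list of events, each a list of Interaction objects.
    """
    events = []
    for item in sorted(interactions, key=lambda i: i.timestamp):
        if events and item.timestamp - events[-1][-1].timestamp <= max_gap:
            events[-1].append(item)   # close enough in time: same event
        else:
            events.append([item])     # gap too large: start a new event
    return events
```

A real pipeline would likely also key on location, participants or source, not just time.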

Could be put into an exascale model of the self

Tag things - semantic metadata on top of all of this

Return personal data ownership to the user

25-30 integrations

Search/sort through the info captured

Easy to search and share the info

Wrote about 100 scrapers for the top sites on the web

Parsers produce graphs

API - GraphQL - ingress and egress of data

Hang out with a lot of AR/VR people - want to do an AR visualization of your stuff, where you've been, etc

Liam has 7 TB of LifeScope data over 11 years for himself

Can do data science and visualization on it

Thinking about a human report card

Q: How are you categorizing people on their human report card?

A: Don't make up your own. There is a lot of existing work on this

Storage: on nodes you control or a private blockchain

Machine learning - computer vision scared Liam. You can use stuff like TensorFlow to do all sorts of elaborate tracking/modeling on a phone or in a browser using computer vision. Detect every object in the scene and how they all fit together, for example. Do pose matching on subjects. Facial recognition, not just the face, but the expression. Pathing - how things relate to each other in the scene. Photogrammetry - push photos and videos together from the same place and create 3D models - completely passive. Facebook could create a reproduction of Street View just from people's photos. You can compile the computer vision models into shaders - you can run as many as you want at the same time, and they will do as much machine learning as the computer is capable of. The performance is improving on a weekly basis.

Talking about how to capture this data and relate it to all the other data you have. Alternative to the closed computer vision systems out there.

Q: You do realize you're weaponizing this as well.

A: It's an arms race, everyone else already has this - intel, military, etc. Question is do 'we' have this?

If individuals have access to this data, it limits what a third-party correlator can do with it. A third party couldn't tamper with the data unchallenged. You now have the ability to refute a tampered version.

For individuals that are non-state actors, if you have a data sharing agreement (GDPR), you could use this as a motivation to say that collecting data about other individuals has to come with their consent. All data everywhere should be self-sovereign.

Q: How do you trust who collected that data? How do you know the data wasn't modified when it was collected? How do we go back and say that is the thing that really happened?

A: Have to be completely open-source and people own their own data. We each need a private mesh that owns our own data.

If there are multiple copies, those multiple copies can be used to build a verifiable authenticity. We all become correlators for each other. Then you are protected against a malicious third party doing correlation. (How?)
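
The "(How?)" was left open in the session. One partial answer, sketched under assumptions: if several parties each hold a copy of the same record, comparing content digests shows which copies diverge from the majority. The function names and the majority rule are hypothetical, and this only detects disagreement. It does not prove which version is true, and it does not protect against a colluding majority.

```python
import hashlib
from collections import Counter


def digest(record: bytes) -> str:
    """Return the SHA-256 hex digest of a record's bytes."""
    return hashlib.sha256(record).hexdigest()


def check_copies(copies):
    """Compare independently held copies of one record.

    copies maps a holder's name to the bytes of their copy.
    Returns (majority_digest, dissenters), where dissenters is the sorted
    list of holders whose copy does not match the most common digest.
    """
    digests = {holder: digest(data) for holder, data in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    dissenters = sorted(h for h, d in digests.items() if d != majority)
    return majority, dissenters
```

For example, with three holders where one copy has been altered, `check_copies` returns the digest shared by the other two and names the holder of the altered copy.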

Q: Who do I sue if something goes wrong with this?

A: Put in a github pull request.

I want to have my data vault and have the great machine learning trainers come to me and use my data to make suggestions to me about things like the best place to fish based on my preferences, skill level, etc.

Need an obfuscated ID - decentralization system to act as a trusted broker. Want to build as little of this as possible, leverage data sovereignty built by others.

Leverage an authentication system that is operated by others, but decentralized. Right now using OAuth 2 to get claims/scopes - who, what, when, where and how.
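
For illustration, a minimal sketch of a scope check. OAuth 2.0 represents scope as a space-delimited string of scope tokens, so checking access comes down to a subset test. The scope names used here (`events:read` and so on) are hypothetical, not the project's actual scope system.

```python
def parse_scope(scope: str) -> set:
    """Split an OAuth 2.0 scope string (space-delimited tokens) into a set."""
    return set(scope.split())


def is_allowed(granted_scope: str, required_scope: str) -> bool:
    """Return True if every required scope token was granted.

    Token order does not matter; both arguments are scope strings.
    """
    return parse_scope(required_scope) <= parse_scope(granted_scope)


# Hypothetical scope names, for illustration only.
granted = "events:read locations:read"
can_read_events = is_allowed(granted, "events:read")    # True
can_write_events = is_allowed(granted, "events:write")  # False
```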

Suggestion to look really seriously at DID.

Did it in OAuth, everyone wanted Sovrin, question was 'what's Sovrin?'

There are at least 6 nodes doing DID stuff.

If you're doing a test thing, Sovrin's really the only game in town. As long as you're using DIDs, you don't have to pick one.
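
The reason you don't have to pick one is visible in the identifier itself: a DID has the form `did:<method>:<method-specific-id>`, so code can be written against DIDs in general and dispatch on the method. A minimal sketch, assuming a simplified pattern that is looser than the full DID grammar. The `parse_did` helper is hypothetical, and the identifier in the usage note is illustrative only.

```python
import re
from typing import NamedTuple

# Simplified pattern (an assumption, not the full DID syntax):
# "did:", a lowercase method name, ":", then a method-specific identifier.
DID_PATTERN = re.compile(r"did:([a-z0-9]+):([A-Za-z0-9._:%-]+)")


class ParsedDid(NamedTuple):
    method: str      # e.g. "sov" for Sovrin
    identifier: str  # method-specific identifier


def parse_did(did: str) -> ParsedDid:
    """Split a DID into its method and method-specific identifier."""
    match = DID_PATTERN.fullmatch(did)
    if match is None:
        raise ValueError(f"not a DID: {did!r}")
    return ParsedDid(match.group(1), match.group(2))
```

Calling `parse_did("did:sov:WRfXPg8dantKVubE3HX8pw")` yields the method `sov`, which a resolver layer could use to choose the right network without the rest of the application caring.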

Broke the infrastructure up into Kubernetes pods

Each node type comes together in clusters

The clusters network together

API layer talks to the data cluster

Have looked at ledgers, blockchain stuff, etc.

Ingest is via OAuth with their own scope system, but it's modular

When you go to manage your stuff, want you to be able to turn off components, etc.

Q: Don't get why you built this, other than 'you want us to have this.'

A: Looking for $4M (slide). Wants a Ray Bradbury house as an alternative to the Internet. How do we right the ship? Want to make it easier to find 'objective truth.' You're losing insight about yourself that the big companies already have, if you're not collecting this stuff. Trying to prevent social engineering as an attack vector.

Q: Is this trying to enable free will versus societal programming?

A: That is the moonshot idea here. Want to build a predictive personal chatbot to help you run your life.

COEL - Classification of Everyday Life

Want a pseudonymizer between attribute bundles and consumers of that data
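
A minimal sketch of what such a pseudonymizer could look like, assuming a keyed hash (HMAC-SHA256) that gives each consumer a different, stable pseudonym for the same person, so two consumers cannot join their data on the identifier. The function names, the `allowed` attribute filter and the key handling are hypothetical.

```python
import hashlib
import hmac


def pseudonym(secret_key: bytes, subject_id: str, consumer_id: str) -> str:
    """Derive a stable pseudonym for one subject, specific to one consumer.

    The same (subject, consumer) pair always maps to the same value, but
    different consumers see unrelated values for the same subject.
    """
    message = f"{consumer_id}\x00{subject_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def release_bundle(secret_key, subject_id, consumer_id, attributes, allowed):
    """Build the attribute bundle a consumer receives.

    Only attributes named in `allowed` are released, and the real subject
    identifier is replaced by the consumer-specific pseudonym.
    """
    return {
        "subject": pseudonym(secret_key, subject_id, consumer_id),
        "attributes": {k: v for k, v in attributes.items() if k in allowed},
    }
```

Whoever holds `secret_key` can re-derive the mapping, so in this sketch the key would have to stay with the individual or a broker they trust.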

Q: Don't know that I get exactly what you're saying, still, and how does an average person use this?

A: Right now it's a cloud service.

Q: Where is that data stored?

A: Currently launch a microcluster of MongoDB, mint credentials to encrypt storage, but the person doesn't fully control the key.

In order to do the machine learning, you have to decrypt the data. You could do machine learning just on your own data.

Data cleanliness is a huge problem.

With 5G around the corner and IoT there will be a massive amount of data collected on everyone.

Projection: By 2025 there will be 180 zettabytes generated by edge devices. Most of it will be collected, processed and dumped.

If you have a computing infrastructure that is decentralized, you can process it privately - no one else has all the data. All the processing happens off the central infrastructure.

Encryption will continue to get broken - every three letter agency is collecting all this data in Utah and in a few years the crypto will be broken.

Until homomorphic encryption is practical, there is no real privacy, there is only point in time privacy. State actors have unlimited resources, so privacy by obscurity doesn't apply. Non-state actors are disincentivized to correlate if you make it inefficient for them to do correlation.

We fall over ourselves on privacy at the expense of data access and understanding.

A credential is a way of making an attestation about the final product - but should be self-sovereign. Transactions are all about exchanges of value. I want to get value from my data.

Post-privacy and post-secrecy world. How does the individual maintain control? You can't maintain control. There is a time window - where you are right now is much more valuable than where you were 10 years ago.

In the edge space, want to control how much footprint you left behind. How do we manage post-privacy and post-secrecy? Supremacy of control is access.

AWS DeepLens or Kinect for Azure. You take a camera and couple it with a LIDAR sensor. The big internet companies are all working on productizing this. They want to put two edge device cameras in every room on the planet. Then they can run reports on who is doing what all the time.

Felt that privacy was an unapproachable topic full of jerks before - but privacy is about having a bit of space to reflect outside of the public domain. It has to be nuanced. Ultimately we will learn things about ourselves. 90% of the world's data has been collected in the last two years. Have to think about privacy in the same way we think about identity - contextual, fleeting, situational. An opportunity to reflect - a screen we own that reflects upon what's happening with my data.

"The Known Citizen: A History of Privacy in Modern America" (Sarah Igo book)