Cleanse

Day Three: Your Data Roadmap

A road snaking through the field of view.
You will make the map for your own data management systems! (pngimg.com, creative commons licence)

Today's plan is this: you are going to make a list. Of all your online accounts. Yes, all of them. Or at least as many as you can remember.

It will be a long list. So more specifically, you are going to start making this list. Because you will undoubtedly wake up at 2am remembering, "What was the name of that one shop where I bought Hanukkah-themed socks for Uncle Roger eight years ago?" So get started now, because you'll need a place to write that down in the middle of the night.

Trust me, you will thank me for this later. I am the Marie Kondo of personal data.

A Roadmap to the Roadmap 

Lay your list out as follows, with these columns (or download the template here):

  • Write the name of the service in the first column.
  • Then write a short description of what it's for (Mail/Music/Calendars/Search/Contacts/Shopping/Social/Work etc.).
  • Add space for five checkboxes.
  • Then space for notes.
  • Then a checkbox called DONE.

A great way to start is actually in your password manager, as it has a pretty good list of any online accounts with passwords that are autofilled. If you've just migrated successfully to Firefox (in Day Two) you can find those logins here. If you use LastPass or another external password manager, log in and look for the list.

Consider, too, that if you have an account with a Platform (like Google or Amazon),  chances are you use it for many things at a time. You may buy supplies on Amazon.com, stream music on Amazon Music, and listen to books on Kindle or Audible (an Amazon company, it says so in the splash header). You probably use Google Calendar, Classroom, Maps, Mail, Search, Contacts, and much more (Wave and G+ anyone? anyone?).

In your list, break these different services out into separate line items, because you will deal with each of those elements separately.

Your password manager isn't everything. I'm betting if you're not like me (which is basically, I freely admit it, everyone else on the planet) you have probably used some form of Single Sign On to purchase something in the past.

This is that system where instead of creating a unique login for a service, you click "login with Google" or "login with Facebook" or Apple or GitHub or Microsoft. You've seen this before.

The trick will be to try to remember as many of those accounts and purchases as you can, if possible, especially if you use those services often and want to use them again.

Please do not make this list on Google Sheets, because you are about to migrate away from Google products, or at least be more selective in how you use them. But you can make your list on a piece of paper (or several; how about a small booklet?).

If you must use software, here is a version of the list available for spreadsheet software. I recommend you download LibreOffice, the free, open source alternative to Word, Excel, and PowerPoint, which you can use to open this file. It covers most of the same everyday functionality as the Office 365 suite, looks fairly similar, and is straightforward to use.
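If you would rather build the spreadsheet yourself, here is a minimal Python sketch that produces a CSV file with the columns described above. It is only one possible layout, not the official template: the column names, the `build_roadmap_csv` helper, and the example rows are my own assumptions, and the five checkbox labels are the categories explained in the next section.

```python
import csv
import io

# Assumed column layout: service, purpose, five checkboxes, notes, DONE.
# The five checkbox labels are the categories defined later in this post.
COLUMNS = ["Service", "Purpose", "LOSE", "SSO", "STUCK", "MIGRATE", "STAY", "Notes", "DONE"]


def build_roadmap_csv(rows):
    """Return CSV text with a header row and one line per service.

    Any column missing from a row is left blank, ready to be ticked later.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical starter rows: one platform broken out into separate line items.
example_rows = [
    {"Service": "Amazon.com", "Purpose": "Shopping"},
    {"Service": "Amazon Music", "Purpose": "Music"},
    {"Service": "Audible", "Purpose": "Audiobooks"},
]

roadmap_csv = build_roadmap_csv(example_rows)
print(roadmap_csv)
```

To keep the result, write `roadmap_csv` to a file on your own disk (for example with `open("data_roadmap.csv", "w", newline="")`) and open it in LibreOffice Calc.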

The Master of your Accounts

The purpose of this list is three-fold. First, it's going to give you a sense of where your personal data currently is. Even if there are a lot of places, grounding this will feel much less overwhelming than just waving your hands in the air and not knowing where to start.

The second purpose of this map is that it is going to give you a road map for what you want to do with that data. You will make this roadmap yourself, based on your own choices. You will start to be intentional about your choices, not just signing up for whatever because your friend sent you a link.

You will leave the services that don't serve you. After all, some of those services you probably don't need anymore (unless you need an annual dose of Hanukkah-themed socks). You can choose to close those accounts down and wipe as much of your data as you can.

Some services are things you still need, but you need to deal with them in a more data-conscious way. And others still, you are not in charge of. We will discuss how to deal with these below.

The third purpose of this map, however, is that it is a checklist. Because you will list these services and logins, decide what to do with them, then do it. Yes, you will! That's why you need the roadmap. To help navigate these three weeks.

On your list, label the five checkboxes and mark them based on these categories:

  • LOSE: What can you live happily without? You will eventually log in to each of these and delete your account information so you can move on. (Not today. Today you're writing this list.)
  • SSO: Which are Single Sign On (SSO) or linked through Google, Facebook, or Apple? The ones you want to keep, you'll go through and create separate logins in a few days' time, once we have sorted out your email, email masking, and shopping habits.
  • STUCK: These are systems you can't leave, because you have no control over the fact that you have to use them. Think, like, not legally allowed to leave, not just stuff like you're stuck on TikTok because you like it. Maybe these systems belong to your work, or a work-issued laptop, or your church or temple, or your doctor's office. You didn't make the choice of what system to use in these cases. 
  • MIGRATE: Necessary, beloved, or core services that you rely on and have some control over: that is, you once had a choice to join or not (even if now it feels like you have no choice). Think LinkedIn or Google. However, you're concerned about how they use your data. In the coming weeks you will address these services in turn. You'll download or migrate your data somewhere more trustworthy. Or you'll find alternative systems that do the same job in a more privacy-oriented way. Or you'll trim and slim down your contact list. Or you'll adjust your login emails and profile names so they aren't easily attached to your do-it-all Gmail. There are many approaches here. In "notes" we'll indicate how you choose to manage and migrate.
  • STAY: Systems that respect your privacy or help you achieve your privacy goals. You'll return to some of these to ensure that your logins are safe.

For everything that isn't LOSE or STAY, you're going to choose one of the following options. Let me tell you what they are, and why you might choose them.

Most Privacy Isn't Private

A lot of what you know of as "privacy" is not privacy at all: it's a strange form of window dressing. Many companies have long and extensive Privacy Policies, so long they put you to sleep and you don't really read them before you click accept.

Or they have complicated Privacy Dashboards that look like a 747 cockpit, controlling who can see what kind of data about you. This is common on social media. Facebook, for example, lets you control which friends can see what details about you, but it won't let you control what Facebook itself records about you (which, at one point years ago, even included things you typed into its status box without ever submitting).

These Privacy Dashboards often feature what human-computer interaction specialists call "dark patterns": user interface controls that convince you you have some kind of control, when really they're misleading. For instance, in Twitter's privacy settings they used to create a long generated list of topics the system thought you were interested in based on your system interactions. It then invited you to deselect, manually, anything you wanted the system to forget about you, but the list was hundreds of items long and it had a nasty habit of resetting, forgetting what you checked or unchecked, making you start over again.

I once took them at their word and spent three days or more meticulously unchecking every box, repeatedly, until it was done. Ugh.

Then there is an entire professional class of computer managers and scientists who are specialists in "Security and Privacy." This conflates a series of information security questions (like, "can my system get hacked") with data privacy questions, leaving no room for questions like, "how in control of my own data am I?" In other words, it's their job to tell companies how to keep your data in a digital Fort Knox, so no other company can steal it, leaving said company free to mine your data for whatever it wishes.

It's like you've put your money in the bank and it's got a lot of heavy duty locks and security systems, but inside people are diving into your money like Scrooge McDuck and playing with it to turn it into their own investment vehicle, and you don't see the interest or returns.

I want you to become passionately curious about what happens under the hood.  That is, your choice of services will no longer be about whether it's pretty, or has a nice design, or lets you talk to your friends, or has cool videos, or comes in Rose Gold. 

Your question will be: where does it keep my data, what does it do with that data, and how much control do I have over it? This is an important question, because in the past fifteen years, collecting your data has become a massive part of these companies' game, one that they are playing for keeps.

So, what is their game?

The personal data economy (and its offspring, "generative AI") were born of specific advances in data processing that we call machine learning. Machines can 'learn' by frequencies of association -- what word is likely to come next, what products people are likely to buy together, etc.  To do this, they have developed ways of associating data to know what data typically comes together and which things are categorically separate.

The initial approach to this was based on relational databases, which is just a technical way of saying that data is stored so as to preserve the relationships between items in the database, or across datasets.

Say you have three datasets about different things -- music, shopping, and email, for instance. But they all have an email address or a phone number in common. With that one element in common, it becomes possible to tie those three datasets together and associate them with an individual.
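To make that concrete, here is a small Python sketch of the linking step. Everything in it is made up for illustration: the three datasets, their field names, the example.com addresses, and the `link_by_key` helper are my assumptions, not any company's actual pipeline. It simply shows how one shared field is enough to fold separate records into a single profile.

```python
# Three made-up datasets about different things. None of this is real data.
music = [
    {"email": "pat@example.com", "top_genre": "jazz"},
]
shopping = [
    {"email": "pat@example.com", "last_purchase": "holiday socks"},
    {"email": "sam@example.com", "last_purchase": "bird feeder"},
]
mail = [
    {"email": "pat@example.com", "contact_count": 212},
]


def link_by_key(key, **datasets):
    """Merge records from every dataset that share the same value for `key`."""
    profiles = {}
    for dataset_name, records in datasets.items():
        for record in records:
            # One profile per distinct key value (here: per email address).
            profile = profiles.setdefault(record[key], {})
            for field, value in record.items():
                if field != key:
                    profile[f"{dataset_name}.{field}"] = value
    return profiles


profiles = link_by_key("email", music=music, shopping=shopping, mail=mail)

for email, profile in profiles.items():
    print(email, profile)
```

In this toy data, the address that appears in all three datasets ends up with one combined profile covering music, shopping, and mail, while the address that appears in only one dataset stays thin. That difference is the whole point of keeping your datastreams separate.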

That's not all. With advances in data mining, it was soon clear that you could pinpoint someone with just three pieces of information that didn't have a single element in common -- a phone number, an E-ZPass ID, a ZIP code, for instance. It turns out that when you bring these three disparate things together, you end up with unique combinations. Just three pieces of information, even if they are about disparate things, are enough to pinpoint an individual. The needle in the haystack is found.
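Here is a toy Python illustration of why combinations matter. The six records, the three attributes (ZIP code, birth year, vehicle type), and the `unique_share` helper are all hypothetical, chosen by me so the effect is easy to see; this is a sketch of the idea, not a reproduction of any published study.

```python
from collections import Counter

# Made-up records. Each attribute on its own is shared by several people.
records = [
    {"person": "A", "zip": "08540", "birth_year": 1985, "vehicle": "sedan"},
    {"person": "B", "zip": "08540", "birth_year": 1985, "vehicle": "pickup"},
    {"person": "C", "zip": "08540", "birth_year": 1990, "vehicle": "sedan"},
    {"person": "D", "zip": "10001", "birth_year": 1985, "vehicle": "sedan"},
    {"person": "E", "zip": "10001", "birth_year": 1990, "vehicle": "pickup"},
    {"person": "F", "zip": "10001", "birth_year": 1990, "vehicle": "sedan"},
]


def unique_share(records, fields):
    """Fraction of records whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    singled_out = sum(1 for r in records if combos[tuple(r[f] for f in fields)] == 1)
    return singled_out / len(records)


for fields in (["zip"], ["birth_year"], ["vehicle"], ["zip", "birth_year", "vehicle"]):
    print(fields, unique_share(records, fields))
```

In this made-up set, no single attribute singles anyone out, but the combination of all three singles out every person. Real datasets are messier, but the mechanism is the same.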

Another issue has to do with single datasets, because if they're capacious enough and contain enough different kinds of information, they can also be de-anonymized and individuals can be identified. My Princeton colleague Arvind did this with the Netflix dataset about a decade ago: it turns out, your combination of preferences for slasher films, rom-coms, and sci-fi is unique to you. So we want to be careful about how many aspects of our lives we give to one company as well.

There is more, of course, and systems have moved on since then too. But a key element here is that companies profit by bringing your datasets about disparate things together. They can do that through having single IDs or sign-ons (phone numbers, etc.). Or they can do that through owning many digital systems and elements and bringing them together. Or they can do that by having a big enough catalog that you will show many sides of your personality while using them.

You can see how platforms got so good at this game, and how our every interaction on them fed their empire. We have to break that party up and do what we can to keep our datastreams separate.

Some systems will help you do this. They will protect your data to the death. They will erect a Fort Knox just for you. They will keep working at privacy and protection, because that is the core of their product. In the coming weeks, you will come to know these companies and explore their services.  You will migrate some of your services over to them. If they betray you, you will move again. You will learn how to figure this out and it will feel more like a fun game than a chore. 

For everything else, you have a few other options.

Privacy through Data Sovereignty

One option is to take your data back. Make it yours, make it local. As in, have it on a disk at home (if you haven't bought an external drive yet, go back to Day One).  Or you manage a server with your data on it, locally or elsewhere.

Honestly, this might be you one day or it might not be, no judgement. We don't all have time or resources to do it all alone. (That said, a year from now if you pop up in my timeline and say, "Hey Janet now I serve my own email!" I will give you a huge digital high-five.)

But that doesn't mean we should cede control. So another way to manage this is to migrate your data to trusted providers: companies that don't sell it, mine it, or use it to enrich their own pockets and investors. We will explore this more throughout the 21 days.

Note that when you take your data back, you're losing some of the Fort Knox security. That is, you'll have to think about how you keep that data secure and protected. You won't have an army of IT people trying to protect corporate investments.  

Of course, if stuff is stored on an external hard drive that isn't plugged in, is password protected, and is held under lock and key in your house, that is a more secure option in many ways. But certainly do not assume that "privacy by obscurity" is a good way forward: as I've said elsewhere, that's basically the rhythm method for the Internet.

Still, we can't realistically own everything or control everything, and not every alternative is viable for everyday users. Also, we don't always control what systems we're asked to use. Your kids' daycare, your father's oncologist's office, the work computer you're issued with Google suite on it... We have to have other options.

It's not all or nothing. We may be required to play their game, but we can definitely take evasive action.

Below are two techniques for mainstream services that I innovated over a decade ago. I have used them for twelve years to thwart the advertising machinery, the algorithms, and even AI. They may not work forever, and they don't work for everything, but they are pretty effective at producing some form of obfuscation overall. If your threat model is one of someone piecing everything together, these are fairly effective strategies. I call them "Data Balkanization" and "Render to Caesar."

Data Balkanization

The principle behind data Balkanization is simple: don't put your eggs in one basket. Just go one step further by spreading your traces across hostile domains. Let me explain.

Say you use Google for maps, but you use Yahoo! for mail. Both are going to mine the heck out of what you give to them, of course. But Google and Yahoo! are also rival companies. They are not going to share data with each other. Yahoo! can mine your mail all it wants (so we'll be attentive to that) but it won't know where you are or where you're going. Google can record every time it gave you directions, but it can't link that up with anything in your email.

Remember those IT guys assembling a digital Fort Knox? You can make it work for you instead of against you.

Here's another example. You talk with your mom in WhatsApp, your colleagues on Skype, and your boyfriend on Signal. WhatsApp is owned by Facebook; Skype is owned by Microsoft; and Signal is run by a privacy-centric nonprofit foundation. Facebook and Microsoft hate each other with approximately the fire of a thousand suns. They are also both so enormous and powerful that they are unlikely to either merge or eat each other's lunch. Signal would rather die than make your data available to anyone. Your data is unlikely to be merged when spread across those three platforms, especially provided it is spread thinly.

Data balkanization is anti-platform. It resists the idea that all your data should belong in one place, to one company, to do with what they please.  And it's not the only option.

Render to Caesar

This one is also anti-platform. Its name comes from a Christian parable where early followers of Jesus inquire if they should be paying him their money instead of taxes to the Roman state. Jesus asks them, whose head is it on the coin? Why, Caesar's head, they reply, to which the answer is: "Render to Caesar only what is Caesar's." In other words, it's okay for his followers to pay taxes to Rome as long as they don't give away their spiritual fealty too.

I'm not asking for your spirituality, or your fealty! Instead I am asking that you consider those systems you really can't give up with a little more grace. For instance, you don't want to leave your birdwatching group on Facebook. It gives you so much joy!  But then you don't also need Facebook to stay in touch with your high school girlfriend's best friend's mother's aunt's neighbor. Why not use Facebook for just one thing -- birdwatching -- and nothing else?

This is the principle behind Render to Caesar. You choose to use one system for one thing only, so that's all they know about you. So Mark Zuckerberg knows you like birds: cool beans! He doesn't need to know everything else about you. He'll also show you pics in your feed of things that "other people who liked birds liked" and you will resist clicking because you don't want to feed his algorithms. To Facebook, all you like is birds. That's it. You're a one-hit wonder.

Think about it. How about LinkedIn only for professional networking; X only for public transit and weather announcements; your work email only for work email. Render to Caesar. Only give them what you choose, and make it small.

The Bottom Line is ... It's Not A Hard Line

With your data roadmap, go back to the elements you checked STUCK, SSO, or MIGRATE, and consider to which ones you will apply a private service, data Sovereignty, data Balkanization, or Render to Caesar going forward. You don't have to decide just yet, but at least have these options in mind, and write any choices you make into your list in the notes section.

Anything you have accessed through SSO, you will Balkanize. You want to sever those ties to Google or Facebook or whatever that the single login establishes.

Anything you have listed STUCK or MIGRATE, think of these choices. You might be able to choose an alternative or keep some data at home. For those that are more limiting, rendering some data to Caesar or spreading your traces thin may help create some layer of obfuscation.

You also have to make it difficult for them to piece a single picture of you back together. Everyone only gets one piece of the puzzle, a sliver at that. They'll have to go knocking on a lot of doors to assemble all the pieces.

All this is to say, you don't have to give up these systems entirely. You just have to not give them the entirety of your life.

Note that this is not the same as complete invisibility online. It's also not disappearing from the Web entirely. It is making it harder to assemble your data for pinpointing purposes and is thwarting the data economy instead of feeding it with your traces as you live online.

Now, you may ask yourself, isn't this crazy? How do you live with all this rendering and balkanizing, Janet? Don't you have like fifteen email addresses? (I have an uncountable number of email addresses.)

Actually, I love living like this. I find it so, so much easier to live like this, I can't believe it. If all my emails were actually going to just one email address, I'd be swamped. I'd have so much stuff in one spot I'd be drowning.

With the different parts of my life separated, with their own space for each element, I am far freer. I can choose when I log in, what I check, which notifications I'm subject to. I don't have to rely so heavily on spam filters. I can also do better at balancing between work and life, because I can stop checking work stuff when I get home, and stop checking life stuff while I'm at work, if and when I choose. It is liberating.

Because someone else didn't balkanize your data -- you did. You sorted it out. You put it away. It's your personal KonMari method for digital stuff. You know exactly where it is and why.

--

Tomorrow, we'll tackle your email, which will make all of this more possible. For today, though, we need your roadmap. You likely have a long list to make. Go ahead, I'll wait.