Bring on the noise [post 44/100]

How many questions a day is the average human prepared to answer? When I was in school, days when we were asked loads of questions were scary – they were exam days, or worse, pop quiz days. Most questions were asked person to person, and were related to events that were shared between those people or the wider community. Maybe a couple of times a day (outside of art or design classes) someone would ask me whether I liked something – we seem to be able to gauge whether the person we’re with likes a thing without having to ask. Now I’m being asked hundreds of times every day.

Every article that has a Facebook LIKE button on it is asking me. Every post on Facebook is asking me. Every interstitial ad served to me on my mobile device is asking me. Hundreds of times a day I’m being asked (albeit tacitly) whether I like something. I’m pretty sure my responses have little to do with the reality of my interests and tastes.

It’s already been a couple of years since we began to figure out that a million ‘Likes’ on Facebook didn’t directly correspond to an equivalent number (or even a reliable ratio) of actions in the real world. That’s because when we ‘Like’ something on Facebook, it’s often not because we actually like that thing in any literal sense. I am often guilty of this, usually in one or more of the following ways:

Images or links that are so horrible they’re funny
Things my friends have posted specifically asking for them to be passed on (but that don’t necessarily fall inside my areas of interest)
Things my friends have posted that I know are important to them, and that I want them to feel good about or supported in
Random things that I see when I’m hung over or bored in the airport and really should be doing something other than rifling through Facebook

I’m sure that people also ‘Like’ a disproportionate amount of stuff posted by their boyfriend/girlfriend/spouse/partner/inamorato/crush. Equally, I’m sure that lots of people don’t ‘Like’ things posted by people they’re on the outs with, but not so badly that they’ve un-friended them. My point is, what we ‘Like’ isn’t necessarily the same as what we like.

Maybe that’s because this construct is a new one that doesn’t have a direct analogue in the physical/historical world, and so we’re still getting to grips with what it’s for. ‘Liking’ is often called ‘grooming’ behaviour, after the way that primates groom one another. It’s definitely social, and somewhat affectionate, but it doesn’t necessarily carry meaning in the same way that it would if I walked up to you on the street and said, hey, I really like your shoes.

But social networks aren’t the only ones gathering data about you in the interest of figuring out what you like. Advertising platforms do it too, tracking how often you click on ads, how often you dismiss interstitials, and so forth. The assumption there is that if I don’t click on the ad, or I dismiss the interstitial, then I don’t like/am not interested in the thing. Trouble is, that might not be it at all – I might actually be very interested in that thing but just don’t have time to look at it right then. Or, more maddeningly, I am so interested that I have actually just bought the thing and therefore really do not need to have it pushed at me anymore.

And increasingly, the objects in our physical environments will gather data on how we interact with them (or don’t), where we go, where we stand still, for how long, etc. If you were at SXSW it’s entirely possible there was a beacon that could have tracked how much time you spent in the toilet (not that it did, as far as I know).

All of this is allegedly in the name of gaining better understanding of us in order to be able to provide a better service. But I’m not sure this is an effective tactic. As I’ve said many times before, unless you know why I’m doing what I’m doing, you can’t really draw viable conclusions from my actions. And a lot of the data we’re collecting and intend to start collecting is so loose that we may not even be sure what it is we’re watching for.

A lot of the bigger-is-better logic behind Big Data seems to be predicated on statistical concepts of relevance, i.e. the larger your sample size, the smaller your margin of error. But that only applies if you are asking a question, and moreover are asking the same question consistently every time.
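
To put rough numbers on that, here is a minimal sketch in Python. It uses the standard normal-approximation formula for the margin of error of a sample proportion. The two rates in it are invented purely for illustration (they are not real Facebook figures): I’m assuming 30% of people genuinely like a thing, while 55% click ‘Like’ for all the social reasons listed earlier.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion
    (normal approximation: z * sqrt(p(1-p)/n))."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical rates, made up for this example only.
TRUE_RATE = 0.30           # share of people who actually like the thing
OBSERVED_LIKE_RATE = 0.55  # share of people who click 'Like'

for n in (100, 10_000, 1_000_000):
    moe = margin_of_error(OBSERVED_LIKE_RATE, n)
    gap = abs(OBSERVED_LIKE_RATE - TRUE_RATE)
    # The margin of error shrinks as n grows; the gap between what we
    # measured and what we wanted to know does not depend on n at all.
    print(f"n={n:>9,}  margin of error=+/-{moe:.4f}  gap from true rate={gap:.2f}")
```

The margin of error gets smaller with every extra zero on the sample size, but the gap between the ‘Like’ rate and the rate of actual liking stays exactly where it was. A bigger sample gives a more precise answer to the question the data is really answering, which may not be the question we thought we were asking.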

So what is the question we are asking, with all of this data we’re gathering? Any data scientist will tell you that large volumes of relatively unstructured data are extremely difficult to extract meaning from. Yet lots of companies seem to be adopting the dubious non-strategy of ‘gather everything and we’ll figure it out later.’

I can’t help but think that these practices are simply saddling us with more noise, rather than with something useful and valuable. Not to mention the fact that a lot of data gathering practices can be considered intrusive to the individual being studied. Perhaps it would be helpful for us to be a bit more thoughtful about this as well – I’m all for gathering data but I tend to be quite clear about what I’m gathering it for, what the question is that I’m trying to answer. That way I can be reasonably certain that I’ll get a reliable answer to my questions, and just as importantly be respectful, fair and open with the individual users whose actions are my source.