Tuesday, December 2, 2014

The Ginormous Stackrank of Human Experiences

I've decided to accouche an idea that began over four years ago.

Back then, I was freshly emerging from the ethics-heavy portion of my graduate education. The moral reasoning models I was learning copulated with the decision analysis tools I was exposed to, and my brain conceived The Carmack Vector Addition Theory of Ethics: Advancing the Ball.

In its first trimester, the idea was mostly geared toward enabling a more rigorous mathematical approach to ethical decision making. As the idea continued to gestate, I developed some alternative titles for the approach: "Mathematizing Morality" or "Quantifying Compassion." I also debated various designs and objectives. Eventually, though, the example of Facemash from The Social Network (Mark Zuckerberg developed a website that let visitors compare two student pictures side by side and choose who was "hot" and who was "not") prevailed due to its simplicity. My intention now is to create a giant stack rank of human experiences.

How would the system work? I'll go into greater detail below, but the crux of the system is pretty simple: users choose which experience they prefer out of a pair. For example, you might be asked, "Which do you prefer?" between (A) graduating from college and (B) falling in love. You select one, then move on to the next pair the system feeds you and repeat. The end result after millions of selections is a giant, robust stack rank of human experiences.

Below I detail (1) Rules, (2) Approach, (3) Next Steps, (4) Problems/Solutions, (5) Initial List, and (6) Further Commentary.

Rules

  1. You can only select between experiences you've actually had
  2. No experience in the list can exceed a "2" level of detail
    1. 1 = Having coffee
    2. 2 = Having coffee with a friend
    3. 3 = Having coffee with a friend in the morning
  3. You must answer honestly

Approach

A user clicks on a link and arrives on the landing page/app home. There are two options: "View the List" or "Participate." If the user chooses "View the List", they are taken to the stackrank where they can search and browse.

If the user chooses "Participate," he or she is given a Batch (40 experiences). The experiences all have three options: "I have experienced this", "I haven't experienced this", or "this experience doesn't qualify, e.g. it exceeds the level of detail or is not an actual human experience" (the last option is for quality control). The default, "I haven't experienced this", is selected for all. The user selects the appropriate option for all 40 experiences, based on his/her own past.

The user is then fed a set of between 10 and 100 experience pairs (only experiences the user indicated s/he has had are presented). Each pair has two options; for example, (A) graduating from college and (B) falling in love. The user selects A or B, and is then shown the next pair as well as a progress bar (e.g. pair 2 of 100). At the end of the set of pairs, the user is given the option to add a question of his/her own. If s/he chooses "no thanks," s/he is returned to the landing page/app home. If s/he chooses "add a question," s/he is taken to a screen where s/he submits a question (some brief submission guidelines display).

The system randomly-ish presents the new submissions to subsequent participants, and uses the results to update G-SHE (Ginormous Stackrank of Human Experiences) in real time, much as a chess rating system would. The system also feeds pairs in a strategic way (e.g. doesn't often ask participants whether they'd rather fall in love vs. lose a child) in order to elicit the most differential inputs, similar to the methodology for pairing opponents in large-bracket sport competitions.
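The real-time update step could look a lot like a chess rating update. Here's a minimal Python sketch using the plain Elo formulas with a conventional K-factor of 32; the function names are my own illustration, not part of any existing system:

```python
# Elo-style update for one "Which do you prefer?" answer.
# K = 32 is a conventional starting value, not a tuned choice.

def expected_score(rating_a, rating_b):
    """Probability that experience A is preferred over experience B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_ratings(rating_a, rating_b, a_chosen, k=32.0):
    """Return new (rating_a, rating_b) after one pairwise selection."""
    e_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_chosen else 0.0
    rating_a += k * (score_a - e_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - e_a))
    return rating_a, rating_b

# Example: both experiences start at 1500; the user prefers "falling in love".
love, graduating = update_ratings(1500.0, 1500.0, a_chosen=True)
print(round(love), round(graduating))  # → 1516 1484
```

Each selection nudges the two experiences apart by an amount proportional to how surprising the result was, which is exactly why millions of selections converge toward a stable stack rank.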

Next steps


  1. Determine if G-SHE (or a list substantially like it) is already out there in the world somewhere. If so, consider abandoning or redirecting the effort
  2. Decide which rating system to use
    1. Elo - the most common; used to rank chess players
    2. Glicko - Elo + ratings reliability
    3. Glicko-2 - Glicko + ratings volatility
    4. There may be a better rating system - these are just the first three I've researched so far
    5. I'm thinking Glicko-2
  3. Identify an existing list of human experiences to start with
  4. Develop the tool
  5. Distribute the tool
  6. Manage the tool


Problems/Solutions

This effort will doubtless run into numerous problems as it proceeds; I'll start capturing them below, each problem followed by candidate solutions.

Similar experiences submitted (duplicates)
-Use existing tech to detect similar submissions and have a human decide whether they're essentially duplicates, then merge if yes
-Whatever approach is taken to solve this problem in comparable settings, such as user feedback fora
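As a low-tech starting point for flagging likely duplicates for human review, fuzzy string matching would go a long way. A sketch using Python's standard-library difflib, where the 0.8 similarity threshold is an arbitrary illustration, not a tested value:

```python
# Flag existing experiences whose wording closely matches a new submission,
# so a human can decide whether to merge them. Threshold is illustrative.
from difflib import SequenceMatcher

def likely_duplicates(new_submission, existing, threshold=0.8):
    """Return existing experiences that closely match the new one."""
    new_norm = new_submission.lower().strip()
    return [e for e in existing
            if SequenceMatcher(None, new_norm, e.lower().strip()).ratio() >= threshold]

existing = ["Having coffee with a friend", "Falling in love", "Going fishing"]
print(likely_duplicates("having coffee with friends", existing))
# → ['Having coffee with a friend']
```

Anything flagged would go to a human (or later, a crowd-moderation queue) rather than being merged automatically.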

Too little participation
-Could display leaderboards - e.g. who's submitted the most qualifying questions, who's submitted the most selections, etc.
-Could pay folks on mechanical turk to participate
-Could ask volunteers or ethics students to participate
-Could display the full list only if the user participates (only give a sample of the list until the user participates)
-Could exchange statistical analysis of the results for participation
-Whatever other solutions survey firms use to solve this problem

Quality of questions
-Enable a button on the selection screen to "recommend removing this experience" (usually because it (1) is not an actual human experience or (2) exceeds a "2" level of detail)
-Enable a button on the "have you had this experience" screen to recommend removal
-Enable Wiki-style comments, or some crowd-based moderation approach used in comparable settings such as Wikipedia

Bots complete batches
Leverage existing human-detection tech and restrict participation to humans

Same person selects between the same pair 2+ times
Authenticate users, or require a sign-in so the system can avoid presenting a pair that a given user has already seen
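The sign-in idea reduces to persisting, per user, the set of unordered pairs already shown. A minimal sketch (all names are illustrative, not from any existing codebase):

```python
# Track which unordered pairs each signed-in user has already judged,
# so the same user is never asked about the same pair twice.
seen_pairs = {}  # user_id -> set of frozenset({experience_a, experience_b})

def should_present(user_id, exp_a, exp_b):
    """True if this user hasn't yet chosen between these two experiences."""
    pair = frozenset((exp_a, exp_b))  # unordered: (A, B) == (B, A)
    return pair not in seen_pairs.get(user_id, set())

def record_presented(user_id, exp_a, exp_b):
    seen_pairs.setdefault(user_id, set()).add(frozenset((exp_a, exp_b)))

record_presented("alice", "Falling in love", "Skydiving")
print(should_present("alice", "Skydiving", "Falling in love"))  # → False
print(should_present("alice", "Skydiving", "Going fishing"))    # → True
```

Using a frozenset makes the check order-independent, so swapping which experience appears as (A) versus (B) doesn't slip past the filter.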

Participant lies
-Exclude from the effective data set any results from participants whose selection profiles vary more than 3 standard deviations from the median
-Use some other "smart" techniques to detect likely liars and underweight or eliminate their responses from the calculations
-Require a set amount of time on each question (similar to completing the blood donation questionnaire) to disincentivize speeding through the questions
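The 3-standard-deviation filter could be sketched as follows, scoring each participant by how often their picks agree with the current consensus ranking. The agreement-rate framing and all names are my own assumptions about how "selection profile" would be operationalized:

```python
# Drop participants whose agreement rate with the consensus ranking sits
# more than 3 standard deviations from the median rate. Illustrative only.
import statistics

def filter_outliers(agreement_rates, n_sd=3.0):
    """agreement_rates: {participant_id: fraction of picks matching consensus}."""
    values = list(agreement_rates.values())
    med = statistics.median(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return dict(agreement_rates)  # everyone identical; nothing to drop
    return {pid: r for pid, r in agreement_rates.items()
            if abs(r - med) <= n_sd * sd}

rates = {"u1": 0.78, "u2": 0.79, "u3": 0.80, "u4": 0.80, "u5": 0.81,
         "u6": 0.81, "u7": 0.82, "u8": 0.82, "u9": 0.83, "u10": 0.84,
         "bot": 0.05}
kept = filter_outliers(rates)
print("bot" in kept)  # → False
```

One caveat: with only a handful of participants, a single extreme value inflates the standard deviation enough to hide itself, so a cut this strict only bites once participation is reasonably large.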

Participant tires due to quantity of pairings
-Cap the number of pairs a user can complete per day/week
-Allow completion in batches that don't exceed a defined number of pairs


Initial List

I hope to find an existing list of human experiences that comply with rule #2, so I don't have to reinvent the wheel. However, the approach is scalable even if I do have to start from scratch. Here's a candidate initial list:

Being displaced due to a civil war
Waking up after a good sleep
Having sex
Skydiving
Giving birth
Mastering a foreign language
Being tortured for over six months
Losing a life partner
Going fishing
Having an accomplishment recognized at work
Eating lunch
Reading a book for pleasure
Your child dying
Voting in a meaningful government election
Having coffee with a friend
Taking a nap
Falling in love

Further commentary


  • I hope this list will be a useful tool for preference utilitarians. Though I'm not 100% sure yet of all the applications for this stack rank, I expect creative applications will be identified and developed by those who become acquainted with the result. I can imagine think tanks, policy analysts, ethicists, and others being interested in the data; demographers might collect rich data on the participants, then categorize and analyze the results. I also think the average person would be fascinated by the list itself: how interesting it would be to browse and see how various experiences rank!
  • Q. Why the "2" level of detail? A. To engender consistency and simplicity. The greater the complexity, the more difficult (and potentially less reliable) the preferences become. Plus, constraining the base unit worked well for Twitter... 
  • Q. As sales and marketing professionals will tell you, people's actual choices are better measures of their preferences than what they choose in survey responses. How do you solve for that? A. I don't: that's a weakness in my approach. However, since not all experiences are chosen (say, being raised Catholic), my approach enables a comparison of a greater breadth of human experience than would be possible with a choice-based approach. 
  • Q. Your baby has a long way to go before it matures into a robust adult. How will you get this effort there, given your limited expertise? A. I'm convinced that once smart people see what I'm going for, they'll identify and share improvements. I believe we only need a strong proof of concept to inspire better future versions (like how thefacebook.com of 2004 inspired the far more sophisticated version we now know in 2014 as Facebook).
  • In future iterations I'd like to provide a more sophisticated approach to letting the participant choose the experiences they've had, which populates the pool from which their presented pairings are drawn. 
  • I'd like to capture the data from the batch phase where participants indicate whether they've had the experience. That data element itself is interesting, in addition to being a useful basis for the system to decide what experience pairs to present to a participant (e.g. present several pairs that include rare experiences to participants who have had that experience).
  • So far the best title I have is "The Ginormous Stackrank of Human Experiences", acronym G-SHE; lmk if you have a catchier one.

