Friday, October 03, 2008

Review: StackOverflow

StackOverflow is a new programming Q&A web site created by Jeff Atwood and Joel Spolsky. (Well, I think Joel just helped fund and promote it and didn't do any of the actual creation; that was Jeff and his team.) It is still in beta, and things are still in flux, but the changes are calming down somewhat. It's not a discussion board (though some would like to use it that way); it's specifically designed for programming questions and answers. The questions can be as detailed as you need them to be (here's one on writing XML files using a particular tool in a particular encoding scheme using C#), but there are lots of more generic ones too. One seemingly popular question is "what are some of your favourite "hidden" features of <language>"; there are such questions on C, C++, C#, Java, Python, Perl, and others. I've found a couple of helpful tips in those ones.

The site was meant to be an amalgam of Yahoo! Answers (though only for programming questions), Digg, and a wiki. You can ask and answer questions (and make comments on both), and each question can have one "accepted" answer. For each question and answer, you can "upvote" if you like it (i.e. it's correct, partially correct, or at least helpful) or "downvote" if you don't (if it's wrong or not helpful at all). You can also mark a question or answer as "offensive" if it's hate speech or spam or something like that. If a question or answer gets enough offensive votes, it vanishes entirely. With the speed at which things get voted on, this can happen very quickly, so if someone posts some spam "question", it will likely vanish within a couple of minutes. You can also tag each question with up to five categories, similar to Gmail labels, so if you want to search for all questions on C#, you can just click the C# tag. (There's even a sqlanywhere tag, though there's only been one SQL Anywhere question so far.) The idea of the site is that if you have a programming question, even if you don't know about SO, Google searches will hit the SO site and you'll find your answer quickly and easily.

I think the idea of this site originated in part with experts-exchange.com (which used to be expertsexchange.com but they added the dash because it looked like ExpertSexChange.com (snicker)). A number of times in the past, I have done Google searches looking for information and come across a question on that site that was similar to what I was looking for. But when I went to the site to look at the conversation, it said that I had to become a member (i.e. pay) to see the answers. I immediately thought "bite me" and went to the next Google entry. I'm sure Jeff did the same thing, and then decided that having such a site for programmers that was free would be a good thing.

On a related note, there is advertising on the SO site, but it's only one unobtrusive ad down the right side of each page. Maybe I'm naïve when it comes to internet advertising (actually, there's no maybe about it), but I can't imagine that even with several thousand users, that one ad is bringing in enough money to pay Jeff a living wage (he is working on this full-time) plus pay the other three or four part-time employees.

SO uses the concept of "reputation" to (a) give users some "credibility" (though that's artificial; I'll get to that in a bit) and (b) allow users to gain limited "moderator" abilities. The more upvotes your questions and answers get, the more reputation you gain, and obviously you lose rep points for downvotes. There are also badges for certain milestones (e.g. you get a "good answer" badge for each of your answers with a net upvote of 25), but they're essentially just for fun. Once you have enough rep points, you can start doing extra things: you can't leave comments until you have 50 points, you can retag other people's questions when you get to 500, and you can edit other people's posts and delete comments when you get to 2000.
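To make those thresholds concrete, here is a rough Python sketch of a reputation-to-privilege lookup. The three numbers are the ones quoted above; the names `PRIVILEGE_THRESHOLDS` and `unlocked_privileges` are made up for illustration, and this is certainly not how the site itself is implemented.

```python
# Reputation thresholds as quoted in this post (beta-era values that may change).
# The dictionary and function names are hypothetical, chosen for illustration.
PRIVILEGE_THRESHOLDS = {
    "leave comments": 50,
    "retag other people's questions": 500,
    "edit other people's posts and delete comments": 2000,
}


def unlocked_privileges(reputation):
    """Return the privileges available at this reputation, lowest threshold first."""
    ordered = sorted(PRIVILEGE_THRESHOLDS.items(), key=lambda item: item[1])
    return [name for name, needed in ordered if reputation >= needed]


# A score of 1939 is enough to comment and retag, but not yet to edit.
print(unlocked_privileges(1939))
```

Keeping the thresholds in one table means a changed number (quite possible while the site is in beta) only has to be updated in one place.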

Anyway, enough about what the site is, and onto my impressions of it.

Moderation in moderation

SO was in private beta for several weeks before it went public. While it was private, everything was wonderful. Once it went public, I didn't notice a significant drop in the quality of questions or answers, but there was an (expected) increase in the amount of junk added: people asking silly or subjective questions ("Should the open brace of an if be on the same line as the if or on the next line?"). One question contained the subject line "Why do birds..." and the rest of the question was "...suddenly appear, every time you are near?". The user who posed the question was "The Carpenters". The question was down-voted so fast that within two minutes of the question being posted, I couldn't find it anymore. I was one of those who down-voted it, mainly because it forced that horrible song into my head, and it's still there. Now that I've mentioned it, it's probably in your head too. Sorry 'bout that.

However, the moderation is having its problems too. I've seen lots of questions closed by one user and then re-opened by a different one. I saw one question that was an exact duplicate of another one posted a couple of minutes earlier by the same guy (it looked like he posted a question and didn't think it worked, so he did it again). One of them received five or six answers and was then closed as a duplicate, while the other one (that remained open) had no answers. Part of this problem is the speed at which answers show up, which is the next topic.

For the most part, the moderation is done automatically by the community as a whole: if a question or answer is stupid, meaningless, or an obvious troll, it's downvoted or marked as offensive and disappears. There is no provision for voting on users, but at least one user has been deleted because of useless postings (this was "Consultant Barbie", who answered a bunch of questions with "<topic of question> is hard. Let's go shopping!"). This is odd because in one of the SO podcasts, Jeff mentioned that he did not want to delete users who did this, because it would basically make them mad and they'd just keep creating new accounts or finding other ways to be a pest. If you just ignore them and downvote their inane answers, they will either get bored and stop or they won't, but either way you will rarely see their downvoted answers.

I'm not sure I'm sold on the idea of closing threads, other than for questions that are duplicates of previous questions. It won't be long until there are hundreds of users with enough rep to close questions, and then there will be edit wars where one user will continuously close a thread and another will continuously re-open it. At that point a real moderator will need to step in, and then the whole self-moderation thing goes out the window.

The Fastest Gun in the West

One problem that's come up a number of times was summarized in a question and given the name The Fastest Gun in the West problem. You see a question that you know the answer to, and you take the time to write a well thought-out and researched answer. Once you click "Post your answer", you find that eight other people have answered in the meantime, and some of those answers have even been upvoted. Those answers are frequently quick and dirty ("I dunno, maybe try this"), and may even be wrong. In the long run, you'd think that your longer and better answer would get more upvotes, but it doesn't always seem to work that way. In the end, you see a question that you know the answer to, and immediately start writing the fastest, shortest answer possible. You post pseudocode rather than actually making sure your code compiles. You say "there may be issues on Mac" rather than taking two minutes to look up what the issues are and post them. All in all, the quality of answers tends to go down.

In the long run, I think this will be less of an issue, as people's reputation gets high enough that they don't worry so much about the numbers. If you write the best answer to a question but people vote up an earlier answer more than yours, oh well. You can always leave a comment explaining why your answer is better or more complete or whatever.

Reputation is everything — or not

Your abilities on SO are based solely on your reputation score. The more people upvote your questions or answers, the more reputation you get. You also gain rep by having lots of people respond to (or even view) your questions. The idea is that if you ask smart questions and give helpful answers, you'll get a high rep score, and people will be able to trust your answers. This is meaningless for (at least) three reasons:

  1. A very knowledgeable user who just joined SO last week will have a much lower rep score than someone who joined a few months ago (and this will get worse the longer the site is around).
  2. There are rep whores out there who post questions and answers willy-nilly in the hopes of gaining rep points. One upvote is worth five downvotes, so as long as you don't post utter garbage, you're bound to gain more than you lose, even if your answers aren't always that helpful. I've seen answers that were wrong but voted up anyway that began "I don't know but maybe..." Why would you upvote that kind of answer?
  3. Some questions are not programming-related, but still count towards your reputation. I am a prime example of this. As of today (Sept. 30, 2008), I have a reputation score of 1939. I have asked three questions and given 65 answers. My top answer to an actual programming question has 18 upvotes, but I have five answers that are much higher than that:
    • two answers to one question ("What is your favourite programmer cartoon?") that combine for 102 upvotes
    • two more for another question ("Great programming quotes") that combine for 107, and
    • one answer at an unbelievable 123 (and still growing daily). That one is an answer to the question "Confessions of your worst WTF Moment" where I tell the story of the time I accidentally got my colleague's fingerprints inserted into the FBI database. (I blogged about that a couple of years ago.)
    The majority of my reputation (1090 points out of 1939) has come from those five answers to non-programming questions. So most of my reputation on this question-and-answer site comes from two quotes I didn't make, two comics I didn't draw, and a story (though admittedly a pretty funny one). Less than half actually comes from programming questions or answers. I'm sure that I'm not the only one in this situation, though I suppose in five years, assuming I actually ask some more good questions and give some more helpful answers, it will all even out.
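The arithmetic behind points 2 and 3 can be sketched in a few lines of Python. The point values below (+10 per upvote, -2 per downvote) are an assumption chosen only because they match the five-to-one ratio mentioned above; the post doesn't state the actual values, and the sketch ignores any caps or other rules the site may apply.

```python
# Assumed point values: not stated in the post, picked to match the 5:1 ratio.
UPVOTE_POINTS = 10
DOWNVOTE_POINTS = -2


def net_reputation(upvotes, downvotes):
    """Net reputation change from a batch of votes under the assumed values."""
    return upvotes * UPVOTE_POINTS + downvotes * DOWNVOTE_POINTS


# An answer with one upvote and four downvotes still comes out ahead.
print(net_reputation(1, 4))  # 2
# It takes five downvotes to cancel a single upvote.
print(net_reputation(1, 5))  # 0

# Share of the 1939-point score that came from the five non-programming answers.
print(round(1090 / 1939 * 100, 1))  # 56.2
```

Under those assumed values, an answer has to attract five times as many downvotes as upvotes before it costs its author anything, which is why posting willy-nilly tends to pay off.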

What's a programming question?

One of the biggest problems right now is questions that involve programmers but don't actually involve programming. There have been questions on interview tips (both from the interviewer and interviewee points of view), writing a resume, salaries, certification, and that kind of thing. There are lots of very subjective questions that don't have real answers (or at least, not a single answer), like "Should programmers have laptops or desktops?" or "What's the most influential book every programmer should read?" There have also been some other questions that might be tangentially related to programming, like questions on hardware setups or networking problems. Those ones are sometimes "justified" by things like "As a programmer, I need to have my network properly configured or I can't do my job". This may be true, but it's true for many other non-programmer jobs too. You shouldn't be able to just prefix any question with "As a programmer, ..." and automatically have it apply. As a programmer, I need to eat healthy foods, but asking about whether asparagus is better for you than broccoli isn't a valid question for SO.

When such questions appear, some folks just answer them, others downvote and complain, and sometimes a moderator will just close the question as not being a programming question. If that's the case, why is the "programming cartoons" thread still open? The guidelines (in the FAQ) aren't clear. Well, they try to be clear, stating "Avoid asking questions that are subjective, argumentative, or require extended discussion" and "try to refrain from asking questions about Stack Overflow itself unless you absolutely, positively have to". But searching the "subjective" tag gives you 583 questions, the vast majority of which are not closed. One question asks "How do you vent stress as a programmer?" Is that a programming question? No, and it's subjective and could require (or incite) extended discussion. So it should be avoided, according to the FAQ, right? But this question has received 34 (net) upvotes and 133 answers and has not been closed. There are no negative comments or answers saying that it's not a programming question. So are these kinds of questions allowed, or not?

What the site really needs is forums.stackoverflow.com or something like that — a message board where you can go and discuss things. I suppose you could use the comments for that, but it would be nice to have a place to discuss things that's separate from the Q&A part of the site.

Community-owned posts

One idea that I didn't get at first was community-owned posts. When you ask a question, you can mark it as "Community-owned", and then you get no reputation points for that question. Every answer is also marked as community-owned, and so people who answer get no rep either. I get that — if you're asking a subjective question (i.e. one that might have many answers), you might mark it as community-owned so that people know you're not just trying to bump your rep score by asking such a question. In that case, it's your choice as to whether to mark the question as community-owned. The thing that I didn't get about it was that there are a few rules that will automatically turn a regular question into a community-owned question, which means that the person who asked will get no further reputation score from that question. This happens if:

  1. the asker edits the question more than five times,
  2. more than four different people edit the question, or
  3. the question receives more than 30 answers.

I didn't like the idea that other people had control over whether I received reputation score from my own question. However, I heard Jeff talk on the podcast about the reasons behind these decisions, and now I think it's kind of clever. Here is the reasoning behind each of these rules:

  1. This is to prevent someone from continually editing their own question just so that it stays on the home page.
  2. Any question that has been edited by lots of different people was more than likely not very clear to begin with, and so rewarding the asker with rep points after others have cleaned up the question doesn't make sense.
  3. A question that has more than 30 answers is more than likely a subjective one or a poll or something similar, not a specific programming question that the site was designed for. In that case, you should not be rewarded for asking such a question.
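The three automatic triggers can be written down as a small Python check. This is only a sketch of the rules as described in this post: the `QuestionStats` class, its field names, and the function name are hypothetical, and whether the asker counts among the "different people" in rule 2 is a guess.

```python
from dataclasses import dataclass


@dataclass
class QuestionStats:
    """Hypothetical counters for one question; not the site's real data model."""
    edits_by_asker: int
    distinct_editors: int  # assumption: counts everyone who edited the question
    answer_count: int


def becomes_community_owned(stats):
    """Return True if any of the three automatic triggers applies."""
    return (
        stats.edits_by_asker > 5       # rule 1: asker edited more than five times
        or stats.distinct_editors > 4  # rule 2: more than four different editors
        or stats.answer_count > 30     # rule 3: more than 30 answers
    )


# A never-edited question with 133 answers trips rule 3 on answer count alone.
print(becomes_community_owned(QuestionStats(0, 0, 133)))  # True
```

All three comparisons are strict, so a question sitting exactly at five self-edits, four editors, and 30 answers would still earn its asker reputation.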

Conclusion

Overall I'm pretty psyched about StackOverflow. I think it could be a really useful resource for every programmer. It's also a lot of fun (which is why some have begun calling it "CrackOverflow"), though it's still very new, so we'll see if the novelty wears off after a while. There are still some kinks to work out, but once people figure out (and get comfortable with) the community moderation thing, and realize that the reputation scores don't tell you how much credence to give an answer, I can see this site becoming one of the most popular programming sites anywhere.

4 comments:

Anonymous said...

I just learned this from SO myself, but when you Google a question and get an answer from Expert Sex Change, you can scroll all the way to the bottom and see the real answers. I did not know this before - it would have been immensely useful.

tom s. said...

"an unbelievable 123" - now 435!

Peter said...

Generally I'm not a fan of question and answer sites. They don't seem to work well, and you often have to go through a registration process before asking.

I just tried SO for a problem I was having. No registration required and my question was answered in less than five minutes! Further, the answer was spot on and helped.

It's like calling a friend for a quick question, but in a case where you don't have a friend who knows the specific subject matter.

Anonymous said...

Stack Overflow can be a useful site to find answers. The one big problem is that it is full of anally retentive people that think "Pointing you in the right direction" is what it's all about. If someone has a problem and you know the answer why not give them the answer? The site drives me mad so I rarely use it.