<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Wim&#039;s blog &#187; programming</title>
	<atom:link href="http://blog.grandtrunk.net/tag/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.grandtrunk.net</link>
	<description></description>
	<lastBuildDate>Sun, 06 Jun 2010 10:31:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Historical currency converter web service</title>
		<link>http://blog.grandtrunk.net/2010/02/historical-currency-converter-web-service/</link>
		<comments>http://blog.grandtrunk.net/2010/02/historical-currency-converter-web-service/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 17:28:28 +0000</pubDate>
		<dc:creator>Wim</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://blog.grandtrunk.net/?p=94</guid>
		<description><![CDATA[Looking for an excuse to try out Google AppEngine, and encouraged by someone on StackOverflow looking for a free web service to convert between currencies at historical dates, I built the Historical currency converter web service. Using a very simple RESTful API, you can convert between all currencies on the ECB&#8217;s list, using exchange rates [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://code.google.com/appengine/images/appengine-silver-120x30.gif" alt="" width="120" height="30" />Looking for an excuse to try out <a href="https://appengine.google.com/">Google AppEngine</a>, and encouraged by someone on StackOverflow looking for a free web service to convert between currencies at historical dates, I built the <a href="http://currencies.apps.grandtrunk.net/">Historical currency converter web service</a>. Using a very simple RESTful API, you can convert between all currencies on the <a href="http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html">ECB&#8217;s list</a>, using exchange rates that date back to January 1999.</p>
<p><span id="more-94"></span></p>
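<p>As a rough illustration of the API, here is a minimal Python sketch that builds a request URL following the pattern of the <code>getrate</code> link at the end of this post (date, source currency, target currency). The helper name <code>build_rate_url</code> is made up for this example, and the sketch only constructs the URL; it assumes nothing about the format of the response.</p>
<pre>
```python
from datetime import date

BASE_URL = "http://currencies.apps.grandtrunk.net"


def build_rate_url(day, from_currency, to_currency):
    """Build a getrate URL following the pattern of the example link in this post."""
    # Dates are written as YYYY-MM-DD and currency codes in lower case,
    # as in the example link getrate/2009-11-15/usd/zar.
    return "{}/getrate/{}/{}/{}".format(
        BASE_URL, day.isoformat(), from_currency.lower(), to_currency.lower()
    )


if __name__ == "__main__":
    # Hypothetical helper name; only the URL pattern comes from the post.
    print(build_rate_url(date(2009, 11, 15), "USD", "ZAR"))
```
</pre>
<p>Fetching the resulting URL with any HTTP client should then return the rate for that day.</p>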
<p>My preliminary findings: Google AppEngine is really cool (obviously), and using Python again after almost <a href="http://www.qualitysheet.com">OD-ing on PHP</a> was very pleasant. I&#8217;m still learning to properly use the datastore though: setting a multi-column primary key to guarantee unique (date, currency) records wasn&#8217;t very straightforward. Also, the import of the <a href="http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html">historical data</a> was a bit of a hassle with the import script timing out, until I found out that the <a href="http://code.google.com/appengine/docs/python/tools/uploadingdata.html">BulkLoader</a> can automatically split this over multiple HTTP requests. Finally, getting this to run on my own domain, <code>currencies.apps.grandtrunk.net</code>, took some time until I found out the right DNS magic to set in <a href="http://www.dreamhost.com/">DreamHost</a>&#8217;s panel (if you&#8217;re interested: I&#8217;m now fully hosting <code>apps.grandtrunk.net</code>, which allows me to set the domain validation code using a normal file uploaded to DreamHost; <code>apps.grandtrunk.net</code> is also the domain I told <a href="http://www.google.com/apps/">Google Apps</a> to use, while at DreamHost I needed to set a CNAME (alias) record for <code>currencies.apps.grandtrunk.net</code> that points to <code>ghs.google.com</code>). The cron job is also humming along nicely now, downloading daily updates, so <a href="http://currencies.apps.grandtrunk.net/getrate/2009-11-15/usd/zar">convert away</a> while I watch my quota trickle down on the dashboard&#8230;</p>
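<p>For the unique (date, currency) records, one common approach (not necessarily what this service does) is to combine both fields into a single key string, so that writing the same pair twice replaces the record instead of duplicating it. The sketch below uses a plain dict as a stand-in for the datastore; the function names and the rate values are invented for illustration. In the App Engine Python datastore, such a string could serve as the <code>key_name</code> of an entity.</p>
<pre>
```python
from datetime import date


def rate_key_name(day, currency):
    """Combine date and currency into one string usable as a unique key name."""
    return "{}_{}".format(day.isoformat(), currency.upper())


# A plain dict stands in for the datastore here: storing under the same
# key name overwrites the old record instead of creating a duplicate.
rates = {}


def store_rate(day, currency, rate):
    rates[rate_key_name(day, currency)] = rate


if __name__ == "__main__":
    # Made-up rates, for illustration only.
    store_rate(date(2009, 11, 15), "usd", 1.25)
    store_rate(date(2009, 11, 15), "USD", 1.5)  # same (date, currency): replaces the record
    print(rates)
```
</pre>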
]]></content:encoded>
			<wfw:commentRss>http://blog.grandtrunk.net/2010/02/historical-currency-converter-web-service/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Questioner or answerer?</title>
		<link>http://blog.grandtrunk.net/2010/01/questioner-or-answerer/</link>
		<comments>http://blog.grandtrunk.net/2010/01/questioner-or-answerer/#comments</comments>
		<pubDate>Thu, 07 Jan 2010 18:21:26 +0000</pubDate>
		<dc:creator>Wim</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[StackOverflow]]></category>

		<guid isPermaLink="false">http://blog.grandtrunk.net/?p=74</guid>
		<description><![CDATA[Yesterday on StackOverflow, I came across one of those users who kept asking questions, but didn&#8217;t really seem to understand much of the responses. Looking at his profile, it turned out he had asked over a hundred questions, but contributed fewer than ten answers. I won&#8217;t be tempted to comment on his ability to actually [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday on <a href="http://stackoverflow.com/">StackOverflow</a>, I came across one of those users who kept asking questions, but didn&#8217;t really seem to understand much of the responses. Looking at his profile, it turned out he had asked over a hundred questions, but contributed fewer than ten answers. I won&#8217;t be tempted to comment on his ability to actually answer any SO questions (although his understanding of others&#8217; answers to his own questions, except when he was able to copy-paste someone&#8217;s source code, didn&#8217;t seem to be that great either), but it did get me thinking about what a &#8216;common&#8217; ratio of questions versus answers would be for other SO users (personally, I&#8217;m at 1/85 right now). Of course, that triggered my data-analysis and graphing gene&#8230;</p>
<p><span id="more-74"></span>Using the <a href="http://blog.stackoverflow.com/category/cc-wiki-dump/">StackOverflow public data set</a> (the October 2009 one, which I still had lying around from <a href="http://blog.grandtrunk.net/2009/11/stackoverflow-user-diversity/">last time</a>), I set about plotting the number of questions versus the number of answers contributed by each user. This graph shows the raw results:</p>
<p><img class="alignnone" src="http://grandtrunk.net/images/so_q_a.png" alt="" width="480" height="360" /></p>
<p>Clearly, several very active users tend to &#8216;specialize&#8217; towards either the Questions or the Answers axis. One user is even up to almost 800 questions, but barely gave any answers. Can you earn reputation points that way? Sure: if they are good questions, other people who have hit the same dead end will pass by and express their eternal gratitude for seeing their problem already solved by voting the question up. Here&#8217;s the same graph as before (except for the log-log axes), but with colors to show reputation classes:</p>
<p><img src="http://grandtrunk.net/images/so_q_a_r.png" alt="" width="480" height="360" /></p>
<p>The next graph plots each user&#8217;s reputation versus their &#8220;Q-to-A ratio&#8221; (their number of questions divided by their number of answers). For users without any answers I just plotted their number of questions (red dots).</p>
<p><img class="alignnone" src="http://grandtrunk.net/images/so_r_n.png" alt="" width="480" height="360" /></p>
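<p>The plotted value can be sketched in a few lines of Python. The function name and the returned flag are made up for this example; the only rule taken from the text is that users without answers are plotted at their number of questions.</p>
<pre>
```python
def q_to_a_ratio(questions, answers):
    """Return (value, has_answers) for one user.

    With at least one answer, value is questions / answers. Without any
    answers the ratio is undefined, so value is the plain number of
    questions, as described above for the red dots.
    """
    if answers == 0:
        return float(questions), False
    return questions / answers, True


if __name__ == "__main__":
    print(q_to_a_ratio(1, 85))   # a user with 1 question and 85 answers
    print(q_to_a_ratio(800, 0))  # a user with questions only
```
</pre>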
<p>The vast majority of users have given many more answers than they have asked questions (so they have a Q/A-ratio of less than one) and inhabit the region near the horizontal axis. Most high-reputation users also tend to answer more than they ask. In the lower reputation regions (&lt;1000) much higher Q/A-ratios can be seen (including those with an &#8216;infinite&#8217; ratio, i.e. with <em>no</em> answers).</p>
<p>(As I found out later, someone else had already <a href="http://meta.stackoverflow.com/questions/2557/which-accounts-have-more-questions-than-answers#2609">computed the top Q/A-ratios on an older SO dump</a>, although I always prefer looking at things with some graphs&#8230;)</p>
<p>Finally, here are some Q/A-ratio distributions: the first one shows the number of users in each Q/A-ratio class. (Never mind the fact that the 1/1 class seems extraordinarily large in comparison to its neighbours: I&#8217;ve used a pretty strange class grouping function (an arctangent, so it can show both 0 and infinity symmetrically around 0) to make it fit properly on the graph.) The distribution is clearly weighted towards the right, again showing that most users have contributed more answers than questions.</p>
<p><img src="http://grandtrunk.net/images/so_rcc.png" alt="" width="480" height="240" /></p>
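<p>The exact grouping function is not given here, so the following is only one possible arctangent-based mapping, with made-up function names and an arbitrary number of classes. It sends a ratio of 0 to position 0, an &#8216;infinite&#8217; ratio to position 1 and a ratio of 1/1 to the midpoint 0.5 (rather than 0), so that r and 1/r end up at mirrored positions.</p>
<pre>
```python
import math


def ratio_position(questions, answers):
    """Map a Q/A ratio to [0, 1]; a ratio of 1/1 lands at 0.5."""
    # atan2 copes with answers == 0 (an 'infinite' ratio) without dividing.
    # A user with no questions and no answers ends up at position 0.
    return math.atan2(questions, answers) / (math.pi / 2)


def ratio_class(questions, answers, n_classes=20):
    """Index of the equal-width class that the user's position falls into."""
    position = ratio_position(questions, answers)
    # Position 1.0 (no answers at all) is folded into the last class.
    return min(int(position * n_classes), n_classes - 1)


if __name__ == "__main__":
    for q, a in [(0, 5), (1, 10), (3, 3), (10, 1), (5, 0)]:
        print(q, a, ratio_position(q, a), ratio_class(q, a))
```
</pre>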
<p>We can also weight users by their reputation, so we get the fraction of total reputation that is represented by the users with a certain Q/A-ratio. The already large group on the left side of the Q/A scale becomes even more important now, meaning that most reputation is owned by users with a low (&lt;1/10) Q-to-A ratio. This is confirmed by the next graph, which shows the average reputation of all users in a given Q/A class.</p>
<p><img class="alignnone" src="http://grandtrunk.net/images/so_rct.png" alt="" width="480" height="240" /></p>
<p><img class="alignnone" src="http://grandtrunk.net/images/so_rca.png" alt="" width="480" height="240" /></p>
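<p>Both reputation-based distributions boil down to the same bookkeeping per Q/A class. A minimal sketch, assuming each user has already been assigned to a class (the class labels, function name and numbers below are invented for illustration):</p>
<pre>
```python
from collections import defaultdict


def reputation_by_class(users):
    """users: iterable of (ratio_class, reputation) pairs.

    Returns two dicts keyed by class: the fraction of total reputation
    held by that class, and the average reputation of its users.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for ratio_class, reputation in users:
        totals[ratio_class] += reputation
        counts[ratio_class] += 1
    grand_total = sum(totals.values())
    if grand_total == 0:
        shares = {c: 0.0 for c in totals}
    else:
        shares = {c: totals[c] / grand_total for c in totals}
    averages = {c: totals[c] / counts[c] for c in totals}
    return shares, averages


if __name__ == "__main__":
    # Invented sample data: (class label, reputation).
    sample = [("low", 3000), ("low", 1000), ("high", 1000)]
    print(reputation_by_class(sample))
```
</pre>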
]]></content:encoded>
			<wfw:commentRss>http://blog.grandtrunk.net/2010/01/questioner-or-answerer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>StackOverflow user diversity</title>
		<link>http://blog.grandtrunk.net/2009/11/stackoverflow-user-diversity/</link>
		<comments>http://blog.grandtrunk.net/2009/11/stackoverflow-user-diversity/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 11:50:24 +0000</pubDate>
		<dc:creator>Wim</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[StackOverflow]]></category>

		<guid isPermaLink="false">http://blog.grandtrunk.net/?p=61</guid>
		<description><![CDATA[I&#8217;ve been wondering what the diversity of knowledge of StackOverflow users would be like. It seemed like an interesting research idea to see how many people have responded only to questions in a very narrow field, and how many others have broader knowledge and can contribute useful answers in more diverse fields. Apparently, there is [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been wondering what the diversity of knowledge of <a href="http://stackoverflow.com">StackOverflow</a> users would be like. It seemed like an interesting research idea to see how many people have responded only to questions in a very narrow field, and how many others have broader knowledge and can contribute useful answers in more diverse fields. Apparently, there is even supposed to be a badge for that (the <a href="http://stackoverflow.com/badges/15/generalist">Generalist badge</a>), but <a href="http://meta.stackoverflow.com/questions/14875/better-definition-of-generalist-badge">it hasn&#8217;t been implemented yet</a>.</p>
<p>It&#8217;s easy to do this using tags. First, some sort of clustering should be applied according to how often each pair of tags shows up on the same question (a user who knows both ASP and ASP.net shouldn&#8217;t be considered a &#8216;diverse&#8217; person, so this should be factored out first). Next, we can count in how many different clusters the user has contributed a good answer.</p>
<p><span id="more-61"></span>I&#8217;ve had a stab at trying this on the latest (October 2009) <a href="http://blog.stackoverflow.com/category/cc-wiki-dump/">StackOverflow public data set</a>. I&#8217;ve ignored the SU and SF parts of the dump. The idea is to count how many of SO&#8217;s questions you could conceivably answer, given your <em>proficiency</em> in each of the tags of that question.</p>
<p>First, I&#8217;ve scored all answers each user has given. An answer gets 20 points for being the accepted answer; another 80 points are distributed over all answers to a question in relation to each answer&#8217;s votes. This is not exactly the same as the reputation earned for that answer: popular questions get a lot more up-votes on their answers, but in this analysis a good answer to a popular question isn&#8217;t really worth much more than a good answer to an unpopular one.</p>
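<p>A minimal sketch of this scoring in Python. Two details are assumptions that the description leaves open: the 80 points are shared strictly in proportion to votes, and negative vote totals count as zero. The function name is made up as well.</p>
<pre>
```python
def score_answers(votes, accepted_index=None):
    """Score all answers to one question.

    votes: list with the vote count of each answer.
    accepted_index: position of the accepted answer, or None.
    """
    # Assumption: negative vote totals count as zero.
    usable = [max(v, 0) for v in votes]
    total = sum(usable)
    scores = []
    for index, v in enumerate(usable):
        # Assumption: the 80 points are shared in proportion to votes.
        points = 80.0 * v / total if total != 0 else 0.0
        if index == accepted_index:
            points += 20.0
        scores.append(points)
    return scores


if __name__ == "__main__":
    # Three answers with 3, 1 and 0 votes; the first one is accepted.
    print(score_answers([3, 1, 0], accepted_index=0))
```
</pre>
<p>With this scheme a single answer can earn at most 100 points, whatever the popularity of the question.</p>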
<p>The points for all your answers are distributed over the tags with which the question is tagged. This gives each user a number of points for all tags. These points are converted into a <em>proficiency</em> that the user has for this tag: <code>proficiency = 1 - exp(-points / 500)</code>. So 1000 points (10 very good answers) get you to 86%; more points will asymptotically get you to 100%.</p>
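<p>In Python, this step could look roughly as follows. The proficiency formula is the one given above; splitting an answer&#8217;s score equally over the question&#8217;s tags is an assumption, as are the function names.</p>
<pre>
```python
import math
from collections import defaultdict


def tag_points(scored_answers):
    """scored_answers: iterable of (score, tags) pairs for one user."""
    points = defaultdict(float)
    for score, tags in scored_answers:
        for tag in tags:
            # Assumption: an answer's score is split equally over the question's tags.
            points[tag] += score / len(tags)
    return dict(points)


def proficiency(points):
    """proficiency = 1 - exp(-points / 500), the formula given above."""
    return 1.0 - math.exp(-points / 500.0)


if __name__ == "__main__":
    print(round(proficiency(1000), 4))  # 1 - exp(-2), roughly 0.8647
```
</pre>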
<p>At this point we can compute the average proficiency over all tags for each user, which is plotted in this graph (average tag proficiency versus reputation, for users with reputation &gt; 1000):</p>
<p><img class="alignnone" src="http://www.grandtrunk.net/images/so_rep_tp.png" alt="average tag proficiency versus reputation" /></p>
<p>The next step is to compute the question proficiency. This is done, for each question and each user, by taking the geometric average of this user&#8217;s tag proficiencies over all tags that this question is tagged with (I&#8217;ve chosen the geometric average rather than the arithmetic one, since you&#8217;re supposed to be knowledgeable on <em>all</em> components of a question in order to answer it). These per-question proficiencies can again be averaged (arithmetically), yielding the user&#8217;s <em>average question proficiency</em>, which is a measure of how many of the site&#8217;s questions he/she could answer. This graph plots average question proficiency versus reputation:</p>
<p><img class="alignnone" src="http://www.grandtrunk.net/images/so_rep_qp.png" alt="average question proficiency versus reputation" /></p>
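<p>The two averaging steps can be sketched like this (function names are invented; treating a tag the user has never answered as proficiency 0 is an assumption, and <code>math.prod</code> needs Python 3.8 or later):</p>
<pre>
```python
import math


def question_proficiency(tag_proficiency, question_tags):
    """Geometric mean of the user's proficiencies over the question's tags."""
    # Assumption: a tag the user never answered counts as proficiency 0,
    # which pulls the whole geometric mean down to 0.
    values = [tag_proficiency.get(tag, 0.0) for tag in question_tags]
    if not values:
        return 0.0
    return math.prod(values) ** (1.0 / len(values))


def average_question_proficiency(tag_proficiency, questions):
    """Arithmetic mean of the per-question proficiencies over all questions."""
    if not questions:
        return 0.0
    total = sum(question_proficiency(tag_proficiency, tags) for tags in questions)
    return total / len(questions)


if __name__ == "__main__":
    # Invented proficiencies for one user, and two questions given as tag lists.
    user = {"python": 0.25, "django": 1.0}
    print(question_proficiency(user, ["python", "django"]))
    print(average_question_proficiency(user, [["python", "django"], ["python", "perl"]]))
```
</pre>
<p>For the first question the geometric mean is about 0.5, where an arithmetic mean would give 0.625: the weakest tag weighs more heavily, which is the point of choosing the geometric average.</p>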
<p>Note that this last one is a fairly heavy query (~1 second per user), so it could be more feasible to base an actual implementation on tag proficiency only (see also <a href="http://www.grandtrunk.net/images/so_tp_qp.png">this graph</a> of tag vs. question proficiency), although this would overvalue knowledge of topics SO doesn&#8217;t care much about.</p>
<p>Analysing the question proficiency vs. reputation graph, we see that the two are clearly related (if you answer enough questions, you&#8217;re bound to have covered most of the tags). Still, some users reach a possible threshold value of 20% question proficiency at a reputation of only 10,000, while others couldn&#8217;t get there even at a reputation of 60,000.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.grandtrunk.net/2009/11/stackoverflow-user-diversity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
