<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Wim&#039;s blog &#187; Research</title>
	<atom:link href="http://blog.grandtrunk.net/category/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.grandtrunk.net</link>
	<description></description>
	<lastBuildDate>Sun, 06 Jun 2010 10:31:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Questioner or answerer?</title>
		<link>http://blog.grandtrunk.net/2010/01/questioner-or-answerer/</link>
		<comments>http://blog.grandtrunk.net/2010/01/questioner-or-answerer/#comments</comments>
		<pubDate>Thu, 07 Jan 2010 18:21:26 +0000</pubDate>
		<dc:creator>Wim</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[StackOverflow]]></category>

		<guid isPermaLink="false">http://blog.grandtrunk.net/?p=74</guid>
		<description><![CDATA[Yesterday on StackOverflow, I came across one of those users who kept asking questions, but didn&#8217;t really seem to understand much of the responses. Looking at his profile, it turned out he had asked over a hundred questions, but contributed fewer than ten answers. I won&#8217;t be tempted to start about his capabilities of actually [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday on <a href="http://stackoverflow.com/">StackOverflow</a>, I came across one of those users who kept asking questions, but didn&#8217;t really seem to understand much of the responses. Looking at his profile, it turned out he had asked over a hundred questions, but contributed fewer than ten answers. I won&#8217;t be tempted to start about his capabilities of actually answering any SO questions (although his understanding of others&#8217; answers to his own questions, except when he was able to copy-paste someone&#8217;s source code, also didn&#8217;t seem to be that great), but it did get me thinking about what a &#8216;common&#8217; ratio of questions versus answers would be for other SO users (personally, I&#8217;m at 1/85 right now). Of course, that triggered my data-analysis and graphing gene&#8230;</p>
<p><span id="more-74"></span>Using the <a href="http://blog.stackoverflow.com/category/cc-wiki-dump/">StackOverflow public data set</a> (the October 2009 one, which I still had lying around from <a href="http://blog.grandtrunk.net/2009/11/stackoverflow-user-diversity/">last time</a>), I set about plotting the number of questions versus the number of answers contributed by each user. This graph shows the raw results:</p>
<p><img class="alignnone" src="http://grandtrunk.net/images/so_q_a.png" alt="" width="480" height="360" /></p>
<p>Clearly, several very active users tend to &#8216;specialize&#8217; towards either the Questions or the Answers axis. One user is even up to almost 800 questions, but barely gave any answers. Can you earn reputation points that way? Sure, if they are good questions: other people who have hit the same dead end will pass by and express their eternal gratitude at seeing their problem already solved by voting the question up. Here&#8217;s the same graph as before (but now with log-log axes), with colors added to show reputation classes:</p>
<p><img src="http://grandtrunk.net/images/so_q_a_r.png" alt="" width="480" height="360" /></p>
<p>The next graph plots each user&#8217;s reputation versus their &#8220;Q-to-A ratio&#8221; (their number of questions divided by their number of answers). For users without any answers I just plotted their number of questions (red dots).</p>
<p><img class="alignnone" src="http://grandtrunk.net/images/so_r_n.png" alt="" width="480" height="360" /></p>
<p>The vast majority of users give many more answers than they have asked questions (so they have a Q/A-ratio of less than one) and inhabit the region near the horizontal axis. Most high-reputation users also tend to answer more than they ask. In the lower reputation regions (&lt;1000) much higher Q/A-ratios can be seen (including those with an &#8216;infinite&#8217; ratio, i.e. with <em>no</em> answers).</p>
<p>(As I found out later, someone else had already <a href="http://meta.stackoverflow.com/questions/2557/which-accounts-have-more-questions-than-answers#2609">computed the top Q/A-ratios on an older SO dump</a>, although I always prefer looking at things with some graphs&#8230;)</p>
<p>Finally, here are some Q/A-ratio distributions: the first one shows the number of users in each Q/A-ratio class. (Never mind the fact that the 1/1 class seems extraordinarily large in comparison to its neighbours; I&#8217;ve used a pretty strange class grouping function (an arctangent, so it can show both 0 and infinity symmetrically around 0) to make it fit properly on the graph.) The distribution is clearly weighted towards the right, again showing that most users have contributed more answers than questions.</p>
<p><img src="http://grandtrunk.net/images/so_rcc.png" alt="" width="480" height="240" /></p>
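<p>As a rough illustration of such a class grouping (a sketch only, not necessarily the function used for the graph above): taking the arctangent of questions over answers squeezes every ratio from 0 to infinity into a finite interval, with 1/1 in the middle. The names <code>qa_angle</code> and <code>qa_class</code> and the number of classes are made up for this example.</p>

```python
import math

def qa_angle(questions, answers):
    """Map a Q/A ratio onto [-1, 1]: -1 is 'only answers', 0 is 1/1, +1 is 'only questions'."""
    # atan2 copes with answers == 0 (an 'infinite' ratio) without dividing by zero.
    angle = math.atan2(questions, answers)  # between 0 and pi/2 for non-negative counts
    return (angle - math.pi / 4) / (math.pi / 4)

def qa_class(questions, answers, classes=21):
    """Assign a user to one of `classes` equal-width bins on the arctangent scale."""
    position = (qa_angle(questions, answers) + 1) / 2  # rescaled to 0 .. 1
    # The very last value (position == 1) is folded into the last bin.
    return min(int(position * classes), classes - 1)
```

<p>With an odd number of classes the 1/1 users end up in the middle bin, and a ratio of 1/3 lands exactly as far from the centre as a ratio of 3/1.</p>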
<p>We can also weight users by their reputation, so we get the fraction of total reputation that is represented by the users with a certain Q/A-ratio. The already large group on the left side of the Q/A scale becomes even more important now, meaning that most reputation is owned by users with a low (&lt;1/10) Q-to-A ratio. This is confirmed by the next graph which shows the average reputation of all users in a given Q/A class.</p>
<p><img class="alignnone" src="http://grandtrunk.net/images/so_rct.png" alt="" width="480" height="240" /></p>
<p><img class="alignnone" src="http://grandtrunk.net/images/so_rca.png" alt="" width="480" height="240" /></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.grandtrunk.net/2010/01/questioner-or-answerer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>StackOverflow user diversity</title>
		<link>http://blog.grandtrunk.net/2009/11/stackoverflow-user-diversity/</link>
		<comments>http://blog.grandtrunk.net/2009/11/stackoverflow-user-diversity/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 11:50:24 +0000</pubDate>
		<dc:creator>Wim</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[StackOverflow]]></category>

		<guid isPermaLink="false">http://blog.grandtrunk.net/?p=61</guid>
		<description><![CDATA[I&#8217;ve been wondering what the diversity of knowledge of StackOverflow users would be like. It seemed like an interesting research idea to see how many people have responded only to questions in a very narrow field, and how many others have broader knowledge and can contribute useful answers in more diverse fields. Apparently, there is [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been wondering what the diversity of knowledge of <a href="http://stackoverflow.com">StackOverflow</a> users would be like. It seemed like an interesting research idea to see how many people have responded only to questions in a very narrow field, and how many others have broader knowledge and can contribute useful answers in more diverse fields. Apparently, there is even supposed to be a badge for that (the <a href="http://stackoverflow.com/badges/15/generalist">Generalist badge</a>), but <a href="http://meta.stackoverflow.com/questions/14875/better-definition-of-generalist-badge">it hasn&#8217;t been implemented yet</a>.</p>
<p>It&#8217;s easy to do this using tags: some sort of clustering should be applied according to how often each pair of tags shows up on the same question (a user who knows both ASP and ASP.net shouldn&#8217;t be considered a &#8216;diverse&#8217; person, so this should be factored out first). Next we can count in how many different clusters this user has contributed a good answer.</p>
<p><span id="more-61"></span>I&#8217;ve had a stab at trying this on the last (October 2009) <a href="http://blog.stackoverflow.com/category/cc-wiki-dump/">StackOverflow public data set</a>. I&#8217;ve ignored the SU and SF parts of the dump. The idea is to count how many of SO&#8217;s questions you could conceivably answer, given your <em>proficiency </em>in each of the tags of that question.</p>
<p>First I&#8217;ve scored all answers each user has given. 20 points go to the accepted answer, and another 80 points are distributed over all answers to a question in proportion to each answer&#8217;s votes. This is not exactly the same as the reputation earned for that answer, since answers to popular questions get a lot more up-votes, but in this analysis a good answer to a popular question isn&#8217;t really worth much more than a good answer to an unpopular one.</p>
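<p>A minimal sketch of this scoring rule for a single question, assuming the 80 points are shared strictly in proportion to the votes and that negative vote totals count as zero (both are assumptions of this example, as is the function name <code>score_answers</code>):</p>

```python
def score_answers(votes, accepted=None):
    """Return the points per answer for one question.

    votes: dict mapping an answer id to its vote count.
    accepted: id of the accepted answer, or None if there is none.
    """
    # Assumption: a negative vote total is treated as zero votes.
    positive = {answer: max(count, 0) for answer, count in votes.items()}
    total = sum(positive.values())
    points = {}
    for answer, count in positive.items():
        # 80 points are shared by all answers, proportionally to their votes.
        share = 80.0 * count / total if total > 0 else 0.0
        # 20 extra points go to the accepted answer.
        bonus = 20.0 if answer == accepted else 0.0
        points[answer] = share + bonus
    return points
```

<p>For a question with two answers at 3 votes and 1 vote, where the first one is accepted, this gives 80 and 20 points respectively.</p>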
<p>The points for all your answers are distributed over the tags with which the question is tagged. This gives each user a number of points for all tags. These points are converted into a <em>proficiency</em> that user has for this tag: <code>proficiency = 1 - exp(-points / 500)</code>. So 1000 points (10 very good answers) gets you to 86%, more points will asymptotically get you to 100%.</p>
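<p>In code this could look roughly as follows. The formula is the one given above; splitting an answer&#8217;s points equally over the question&#8217;s tags is an assumption of this sketch, and the function names are made up.</p>

```python
import math
from collections import defaultdict

def tag_points(scored_answers):
    """Sum a user's points per tag.

    scored_answers: iterable of (points, tags) pairs, one per answer.
    """
    totals = defaultdict(float)
    for points, tags in scored_answers:
        if not tags:
            continue
        # Assumption: the points are split equally over the question's tags.
        for tag in tags:
            totals[tag] += points / len(tags)
    return dict(totals)

def proficiency(points, scale=500.0):
    """proficiency = 1 - exp(-points / 500)"""
    return 1.0 - math.exp(-points / scale)
```

<p>Here <code>proficiency(1000)</code> comes out at about 0.86, matching the 86% mentioned above.</p>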
<p>At this point we can compute the average proficiency over all tags for each user, which is plotted in this graph (average tag proficiency versus reputation, for users with reputation &gt; 1000):</p>
<p><img class="alignnone" src="http://www.grandtrunk.net/images/so_rep_tp.png" alt="average tag proficiency versus reputation" /></p>
<p>The next step is to compute the question proficiency. This is done, for each question and each user, by taking the geometric average of this user&#8217;s tag proficiencies over all tags that this question is tagged with (I&#8217;ve chosen the geometric average rather than the arithmetic one, since you&#8217;re supposed to be knowledgeable on <em>all </em>components of a question in order to answer it). These per-question proficiencies can again be averaged (arithmetically), yielding the user&#8217;s <em>average question proficiency</em> which is a measure for how many of the site&#8217;s questions he/she could answer. This graph plots average question proficiency versus reputation:</p>
<p><img class="alignnone" src="http://www.grandtrunk.net/images/so_rep_qp.png" alt="average question proficiency versus reputation" /></p>
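<p>The two averaging steps can be sketched like this (an illustration, not the actual query; the function names are made up, and a tag the user never scored on is assumed to count as proficiency 0):</p>

```python
import math

def question_proficiency(tag_prof, question_tags):
    """Geometric average of the user's proficiencies over the question's tags."""
    # Assumption: a tag the user has no points for has proficiency 0.
    values = [tag_prof.get(tag, 0.0) for tag in question_tags]
    if not values:
        return 0.0
    return math.prod(values) ** (1.0 / len(values))

def average_question_proficiency(tag_prof, questions):
    """Arithmetic average of the per-question proficiencies.

    questions: list of tag lists, one per question on the site.
    """
    if not questions:
        return 0.0
    total = sum(question_proficiency(tag_prof, tags) for tags in questions)
    return total / len(questions)
```

<p>Because of the geometric average, a single unknown tag pulls the proficiency for that question down to zero, which is exactly the &#8216;you need to know all components&#8217; effect.</p>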
<p>Note that this last one is a fairly heavy query (~1 second per user), so it could be more feasible to base an actual implementation on tag proficiency only (see also <a href="http://www.grandtrunk.net/images/so_tp_qp.png">this graph</a> of tag vs. question proficiency), although this would overvalue knowledge of topics SO doesn&#8217;t care much about.</p>
<p>Analysing the question proficiency vs. reputation graph, we see that both are clearly related (if you answer enough questions, you&#8217;re bound to have covered most of the tags). Still, some users reach a possible threshold value of 20% question proficiency at a reputation of only 10,000; while others couldn&#8217;t get there even at 60,000 reps.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.grandtrunk.net/2009/11/stackoverflow-user-diversity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TOP500 list by interconnect</title>
		<link>http://blog.grandtrunk.net/2009/08/top500-list-by-interconnect/</link>
		<comments>http://blog.grandtrunk.net/2009/08/top500-list-by-interconnect/#comments</comments>
		<pubDate>Wed, 26 Aug 2009 19:29:35 +0000</pubDate>
		<dc:creator>Wim</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://blog.grandtrunk.net/?p=22</guid>
		<description><![CDATA[While attending a Hot Interconnects talk on supercomputing, I got the following idea. The TOP500 site provides graphs of the number of systems and total performance per interconnect family, which shows an approximate measure of the popularity of the different interconnects. But how do they affect the performance of an individual system? Clearly, a high-performance interconnect [...]]]></description>
			<content:encoded><![CDATA[<p>While attending a <a title="Hot Interconnects" href="http://www.hoti.org/">Hot Interconnects</a> talk on supercomputing, I got the following idea. The <a href="http://www.top500.org/">TOP500</a> site provides <a href="http://www.top500.org/charts/list/34/conn">graphs of the number of systems and total performance per interconnect family</a>, which shows an approximate measure of the popularity of the different interconnects. But how do they affect the performance of an individual system? Clearly, a high-performance interconnect should result in higher efficiency than a commodity one. But by how much? And which systems would use what type of interconnect?</p>
<p><span id="more-22"></span>For the following set of graphs, I downloaded the XML file for the current TOP500 list (since updated to the November 2009 list). I decided to look at each system&#8217;s LINPACK performance (<em>Rmax</em>) compared with its <em>peak</em> performance (<em>Rpeak</em>) which is what you get when adding the theoretical performance of all processor cores in the system. The result should be a measure of how efficiently the system&#8217;s total computational power can be used, which is mostly a function of the interconnect (also software, but for a TOP500 listing we would expect that to be rather good!).</p>
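<p>The computation itself is tiny. A sketch, assuming the list has already been parsed into (interconnect, <em>Rmax</em>, <em>Rpeak</em>) tuples; the parsing step is left out and the function names are made up for this example:</p>

```python
from collections import defaultdict

def efficiency(rmax, rpeak):
    """Fraction of the theoretical peak performance reached on LINPACK."""
    return rmax / rpeak

def mean_efficiency_by_interconnect(systems):
    """Average the efficiency per interconnect family.

    systems: iterable of (interconnect, rmax, rpeak) tuples, with rmax and
    rpeak in the same unit.
    """
    groups = defaultdict(list)
    for interconnect, rmax, rpeak in systems:
        groups[interconnect].append(efficiency(rmax, rpeak))
    return {name: sum(values) / len(values) for name, values in groups.items()}
```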
<p>So here is the graph:</p>
<p><img class="alignnone" src="http://www.grandtrunk.net/images/top500.png" alt=""  /></p>
<p>Here&#8217;s the same data but this time efficiency (<em>Rmax/Rpeak</em>) is plotted directly:</p>
<p><img class="alignnone" src="http://www.grandtrunk.net/images/top500_eff.png" alt=""  /></p>
<p>For top-of-the-list machines, Infiniband and proprietary interconnects (mainly BlueGene, Cray and SGI&#8217;s NUMAlink) are the most common. Their efficiency is rather similar at 75-80% for the large machines, and up to 95% for smaller ones. The main alternative interconnection technology, Ethernet, only starts to be in common use from system <a href="http://top500.org/system/9987">#98</a> onwards. Clearly, the efficiency of these systems is significantly lower, at some 55%.</p>
<p>Two main outliers to this theory can be seen. <a href="http://top500.org/system/10186">System #5</a> uses Infiniband but manages to get only a 46% efficiency. This is the <strong>Tianhe-1</strong> cluster located in Tianjin, China. It consists of 5120 ATI Radeon HD 4870 video cards, in addition to a number of Intel Xeon CPUs. I would guess that the software for this rather radical new architecture (especially at this scale!) is not yet fully up-to-date, and that this system should get a boost in <em>Rmax</em> by the time of the next TOP500 list next June.</p>
<p>The other exception is <a href="http://top500.org/system/10269">#486</a>, again a Chinese system, which uses a 10-Gbit Ethernet network (rather than 1-Gb Ethernet for the others &#8211; at least, as far as I could tell from parsing the list). Clearly, the faster Ethernet technology significantly improves efficiency: at 70% it comes much closer to the Infiniband and proprietary-interconnect powered systems. When compared to 1GbE, the difference between 10GbE and Infiniband is minimal though, and the price also comes pretty close. So I guess, in the end, you do get what you pay for&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.grandtrunk.net/2009/08/top500-list-by-interconnect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Practical Compressor Test</title>
		<link>http://blog.grandtrunk.net/2004/07/practical-compressor-test/</link>
		<comments>http://blog.grandtrunk.net/2004/07/practical-compressor-test/#comments</comments>
		<pubDate>Mon, 05 Jul 2004 16:19:00 +0000</pubDate>
		<dc:creator>Wim</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://blog.grandtrunk.net/2009/12/practical-compressor-test/</guid>
		<description><![CDATA[Welcome to the &#8216;Practical Compressor Test&#8217;. Unlike some other compressor comparison sites, I won&#8217;t be looking for the compressor that squeezes out the last bit of compression. Instead I&#8217;ll try to find the most practical compressor out there. This means compression and decompression times are taken into account, so PAQAR and the like, which can achieve very good compression at [...]]]></description>
			<content:encoded><![CDATA[<p>Welcome to the &#8216;Practical Compressor Test&#8217;. Unlike some other compressor comparison sites, I won&#8217;t be looking for the compressor that squeezes out the last bit of compression. Instead I&#8217;ll try to find the most practical compressor out there. This means compression and decompression times are taken into account, so <a href="http://www2.cs.fit.edu/~mmahoney/compression/">PAQAR</a> and the like, which can achieve very good compression at the expense of insanely long run times (several hours on this benchmark!) are <strong>not</strong> considered.</p>
<p>Instead I&#8217;ll be focusing on very well known, established compressors that are easily obtained (I only use precompiled packages and won&#8217;t build from source) and have reasonable run times. Also I won&#8217;t try every combination of compression options but limit the test to one general option (-1 to -9 for gzip and bzip2, -m1 to -m5 for RAR, &#8230;).</p>
<p><span id="more-5"></span></p>
<h3>Updates</h3>
<ul>
<li>2007/04/25: Updated <strong>7-zip</strong> to version 4.45: small improvements in speed and compression ratio</li>
<li>2006/06/08: Reran all tests on a Pentium D 830 (dual-core, 3 GHz, 2x 1MiB L2) and 2 GiB RAM</li>
<li>2006/02/13: Updated <strong>7-zip</strong> to version 4.33: both faster compression (up to 20%) and better compression ratios (a few %)</li>
<li>2005/11/27: Updated <strong>7-zip</strong> to version 4.30 beta: high compression (-mx=7 &#8230; -mx=9) is now up to 40% faster</li>
<li>2005/11/03: Updated <strong>7-zip</strong> to version 4.29 beta: no changes</li>
<li>2005/04/12: Updated <strong>7-zip</strong> to version 4.16: the ratio is the same, but compression and decompression are now 30% to 40% faster!</li>
<li>2004/11/10: Added the <strong>7-zip</strong> compressor</li>
<li>2004/09/30: Added the <strong>lzop</strong> compression algorithm</li>
</ul>
<h3>The Test</h3>
<p>Just like Johan De Bock&#8217;s excellent <a href="http://uclc.info/gimp_source_compression_test.htm">GIMP Source Compression Test</a>, I&#8217;ll be using the <a href="ftp://ftp.gimp.org/pub/gimp/v2.0/gimp-2.0.0.tar.bz2">GIMP 2.0 Source tarfile</a>:</p>
<ul>
<li>The test file: <a href="ftp://ftp.gimp.org/pub/gimp/v2.0/gimp-2.0.0.tar.bz2">GIMP 2.0.0</a> Sources as one TAR</li>
<li>MD5-Hash: d2a1c33317fb57bbed3641671b2da163</li>
<li>Total size: 78,745,600 Bytes</li>
</ul>
<p>Each program is used to compress and decompress the file with each of the selected command line switches. The time required for both compression and decompression is measured, as is the size of the resulting archive. To have an idea of the accuracy of the timing measurements, the test is repeated three times and the minimum runtime (= the one with the least disturbances) is reported. All this is done on a Pentium D 830 (3.0 GHz) processor with 2 GiB RAM, running Fedora Core 5 Linux. Only one CPU is used for the tests (using -mmt=off for 7-zip, default for other compressors).</p>
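<p>The timing procedure can be sketched as below. This is an illustration rather than the script actually used: it measures wall-clock time, which is an assumption, and the names <code>min_runtime</code> and <code>shell_job</code> are made up for this example.</p>

```python
import subprocess
import time

def min_runtime(run, repeats=3):
    """Call `run` a number of times and return the fastest time in seconds.

    The minimum is taken because it is the run with the least disturbance
    from other activity on the machine.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return min(timings)

def shell_job(command):
    """Wrap an argument list, e.g. ["gzip", "-9", "-c", "gimp-2.0.0.tar"], as a callable.

    Standard output is discarded, so a compressor writing to stdout (-c)
    can be timed without producing a file.
    """
    return lambda: subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
```

<p>A call like <code>min_runtime(shell_job(["gzip", "-9", "-c", "gimp-2.0.0.tar"]))</code> would then give the compression time for one gzip setting, provided the test file is in the current directory.</p>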
<h3>The Contenders</h3>
<table>
<tbody>
<tr>
<th>Name</th>
<th>Version</th>
<th>Switches</th>
</tr>
<tr class="even">
<td><a href="http://www.gzip.org/">gzip</a></td>
<td>1.3.5</td>
<td>-1 &#8230; -9</td>
</tr>
<tr class="odd">
<td><a href="http://www.bzip.org/">bzip2</a></td>
<td>1.0.3</td>
<td>-1 &#8230; -9</td>
</tr>
<tr class="even">
<td><a href="http://www.rarlab.com/">RAR</a></td>
<td>3.60b2</td>
<td>-m1 &#8230; -m5</td>
</tr>
<tr class="odd">
<td><a href="http://www.info-zip.org/pub/infozip/Zip.html">Zip</a></td>
<td>2.31</td>
<td>-1 &#8230; -9</td>
</tr>
<tr class="even">
<td>(N)compress</td>
<td>4.2.4</td>
<td>(none)</td>
</tr>
<tr class="odd">
<td><a href="http://p7zip.sourceforge.net/">7-zip</a></td>
<td>4.42</td>
<td>-mx=1 &#8230; -mx=9</td>
</tr>
<tr class="even">
<td><a href="http://www.lzop.org/">lzop</a></td>
<td>1.02rc1</td>
<td>-1 &#8230; -9</td>
</tr>
<tr class="odd">
<td>Zoo</td>
<td>2.1</td>
<td>(none) and -h</td>
</tr>
</tbody>
</table>
<p>The versions aren&#8217;t always the latest and the greatest, but they are – again in the spirit of being &#8216;practical&#8217; – the most recent ones installed by the Fedora Core 5 distribution.</p>
<h3>Results – graphs</h3>
<p>These graphs show the results for the various compressors and their switches. Compression and decompression time, respectively, are on the horizontal axis; compression efficiency (compressed size / original size, so smaller is better) is on the vertical axis.</p>
<p><img class="styled alignnone" src="http://grandtrunk.net/compression/compress.png" alt="" width="480" height="360" /><br />
<img class="styled alignnone" src="http://grandtrunk.net/compression/decompress.png" alt="" width="480" height="360" /></p>
<h3>Conclusions</h3>
<ul>
<li>The best compressor can be found near the lower (efficient) left (fast) corner. You&#8217;re right: there isn&#8217;t any! All real compressors are either inefficient (gzip), slow (lzma/7-zip), or moderate on both counts (zip). This means your ideal compressor will depend on how you value speed against efficiency.</li>
<li>gzip is consistently faster but less efficient than bzip2. RAR is on all counts better than bzip2, so that&#8217;s probably why you have to pay for it <img src='http://blog.grandtrunk.net/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </li>
<li>The simple switches used give you a decent choice between speed and efficiency; however, the effects are usually smaller than the differences between different compression algorithms. gzip and zip use the same algorithm so compression ratios are similar, but the timing depends on the optimizations used in the actual implementation.</li>
<li>Among the free compressors, the sequence lzop – gzip – bzip2 – 7zip gives you a wide range of speed / ratio trade-offs. compress and zoo are obviously outdated: they are both slower and less efficient than gzip.</li>
</ul>
<h3>Ranking</h3>
<p>As we have already seen, your ideal compressor will depend on what you use it for. This ranking can be customized to what you want to do. It will compute the time required for 1 compression, a number of downloads over a network and the following decompressions.</p>
<p><em>total time = compression time + n * (compressed file size / network speed + decompression time)</em></p>
<p>For instance, if you compress a file to send it over a network once, <em>n</em> equals one and compression time will have a big influence. If you want to post a file to be downloaded many times, <em>n</em> is big so long compression times will weigh less in the final decision. Finally, slow networks will do best with a slow but efficient algorithm, while for fast networks a speedy, possibly less efficient algorithm is needed.</p>
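<p>As a rough sketch of how such a ranking can be computed from the formula above: the four rows are copied from the results table at the end of this page, the file size is that of the test file, and the function names plus the reading of kbps as 1000 bits per second are assumptions of this example.</p>

```python
ORIGINAL_SIZE = 78_745_600  # bytes, size of the GIMP 2.0.0 tar file

def total_time(comp_s, decomp_s, ratio, network_kbps, n=1):
    """total time = compression time + n * (transfer time + decompression time)"""
    compressed_bits = ORIGINAL_SIZE * ratio * 8
    # Assumption: 1 kbps is 1000 bits per second; 0 stands for an
    # infinitely fast network, so no transfer time is counted.
    transfer_s = compressed_bits / (network_kbps * 1000) if network_kbps > 0 else 0.0
    return comp_s + n * (transfer_s + decomp_s)

# A few rows from the results table:
# (compression time in s, decompression time in s, compression ratio)
RESULTS = {
    "gzip -6": (6.2, 1.5, 0.234),
    "bzip2 -9": (26.0, 6.2, 0.168),
    "7za -mx=9": (88.2, 2.5, 0.112),
    "lzop -1": (0.9, 1.1, 0.341),
}

def best(network_kbps, n=1):
    """Name of the entry in RESULTS with the lowest total time."""
    return min(RESULTS, key=lambda name: total_time(*RESULTS[name], network_kbps, n))
```

<p>Among these four entries, lzop -1 wins when no network time is counted, bzip2 -9 wins for a single download at 1000 kbps, and 7za -mx=9 takes over once the file is downloaded many times.</p>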
<p><a name="ranking"></a><br />

    <style>
      #practical-compression img { background: none; border: none; padding: 0; }
      #practical-compression input[type="text"] { width: 10ex; }
    </style>
    <div id='practical-compression'>
      <form action='#ranking' method='get' style='text-align: left'>
        <ul>
          <li>One compression</li>
          <li>Network speed: <input type='text' name='network' value='1000' size='5'/> kbps
            or 0 for infinite speed (i.e. no network transmission time is counted)</li>
          <li><input type='text' name='decomp' value='1' size='5'/>
            download/decompression cycle(s), or -1 for <b>only</b> download and decompression (i.e. no compression time is counted)</li>
        </ul>
        <input type='submit' value='Show me!'/>
      </form>

      Network speed: <b>1000 kbps</b>     &nbsp;&nbsp;
      Downloads/decompressions: <b>1</b><br/>

      <div style='width: 350px; height: 300px; overflow: auto; border: solid 1px #eee'><img src='http://chart.apis.google.com/chart?cht=bhs&chs=300x1000&chco=ff0000%2C00ff00%2C0000ff&chbh=a&chd=t%3A0%2C0%2C1%2C4%2C4%2C2%2C2%2C2%2C2%2C2%2C1%2C2%2C0%2C2%2C1%2C2%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C3%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%2C0%7C34%2C32%2C37%2C31%2C30%2C40%2C40%2C41%2C42%2C43%2C29%2C44%2C53%2C45%2C50%2C47%2C56%2C56%2C57%2C56%2C56%2C57%2C55%2C55%2C58%2C58%2C50%2C55%2C26%2C55%2C27%2C62%2C62%2C64%2C64%2C66%2C66%2C64%2C62%2C63%2C63%2C81%2C82%2C82%2C82%2C82%2C82%2C90%2C97%7C6%2C9%2C5%2C10%2C11%2C9%2C9%2C9%2C9%2C8%2C24%2C8%2C1%2C7%2C5%2C7%2C2%2C2%2C1%2C2%2C3%2C1%2C4%2C4%2C1%2C1%2C7%2C5%2C33%2C5%2C33%2C1%2C1%2C1%2C1%2C1%2C0%2C5%2C8%2C10%2C11%2C0%2C0%2C0%2C0%2C0%2C0%2C1%2C1&chxt=y%2Cy&chxl=0%3A%7C-m2%7C-m3%7C-mx%3D3%7C-m4%7C-m5%7C-9%7C-8%7C-7%7C-6%7C-5%7C-mx%3D5%7C-4%7C-m1%7C-3%7C-mx%3D1%7C-2%7C-6%7C-6%7C-5%7C-7%7C-7%7C-5%7C-8%7C-8%7C-4%7C-4%7C-1%7C-9%7C-mx%3D9%7C-9%7C-mx%3D7%7C-3%7C-3%7C-2%7C-2%7C-1%7C-1%7C-7%7Ch%7C-8%7C-9%7C-1%7C-3%7C-4%7C-6%7C-5%7C-2%7C-f%7Cq%7C1%3A%7Crar%7Crar%7C7za%7Crar%7Crar%7Cbzip2%7Cbzip2%7Cbzip2%7Cbzip2%7Cbzip2%7C7za%7Cbzip2%7Crar%7Cbzip2%7C7za%7Cbzip2%7Czip%7Cgzip%7Czip%7Cgzip%7Czip%7Cgzip%7Cgzip%7Czip%7Czip%7Cgzip%7Cbzip2%7Cgzip%7C7za%7Czip%7C7za%7Czip%7Cgzip%7Czip%7Cgzip%7Czip%7Cgzip%7Clzop%7Czoo%7Clzop%7Clzop%7Clzop%7Clzop%7Clzop%7Clzop%7Clzop%7Clzop%7Ccompress%7Czoo%7C' alt='chart' /></div>
<br/><img src='http://www.grandtrunk.net/gifs/dot_red.gif' width='10' height='10' alt='red' /> Compression time    &nbsp;&nbsp;<img src='http://www.grandtrunk.net/gifs/dot_green.gif' width='10' height='10' alt='green' /> Transmission time &nbsp;&nbsp;<img src='http://www.grandtrunk.net/gifs/dot_blue.gif' width='10' height='10' alt='blue' /> Decompression time</div>
</p>
<h3>Optimal compressors</h3>
<p>For any bandwidth / #downloads combination we can now determine the optimal compressor.</p>
<p><img class="styled alignnone" src="http://grandtrunk.net/compression/best.png" alt="" width="640" height="360" /><br />
<img class="alignnone styled" src="http://grandtrunk.net/compression/bestfree.png" alt="" width="640" height="360" /></p>
<h3>Related sites</h3>
<ul>
<li><a href="http://uclc.info/">Ultimate Command Line Compressors</a>, comparing almost all existing algorithms on efficiency, by Johan De Bock.</li>
<li> <a href="http://www.compression-links.info/">Compression Links Info</a>, a good all-round site about everything related to compression.</li>
<li> <a href="http://en.wikipedia.org/wiki/Data_compression">The Wikipedia article on data compression</a></li>
</ul>
<h3>Results – table</h3>
<table>
<tbody>
<tr>
<th>Algorithm</th>
<th>Effort</th>
<th>Compression time (s)</th>
<th>Decompression time (s)</th>
<th>Compression ratio</th>
</tr>
<tr class="even">
<td>gzip</td>
<td>-1</td>
<td>2.6</td>
<td>1.8</td>
<td>27.7%</td>
</tr>
<tr class="odd">
<td></td>
<td>-2</td>
<td>2.8</td>
<td>1.7</td>
<td>26.7%</td>
</tr>
<tr class="even">
<td></td>
<td>-3</td>
<td>3.2</td>
<td>1.7</td>
<td>26.0%</td>
</tr>
<tr class="odd">
<td></td>
<td>-4</td>
<td>3.9</td>
<td>1.6</td>
<td>24.5%</td>
</tr>
<tr class="even">
<td></td>
<td>-5</td>
<td>4.7</td>
<td>1.6</td>
<td>23.8%</td>
</tr>
<tr class="odd">
<td></td>
<td>-6</td>
<td>6.2</td>
<td>1.5</td>
<td>23.4%</td>
</tr>
<tr class="even">
<td></td>
<td>-7</td>
<td>7.3</td>
<td>1.5</td>
<td>23.3%</td>
</tr>
<tr class="odd">
<td></td>
<td>-8</td>
<td>10.9</td>
<td>1.5</td>
<td>23.2%</td>
</tr>
<tr class="even">
<td></td>
<td>-9</td>
<td>13.3</td>
<td>1.5</td>
<td>23.2%</td>
</tr>
<tr class="odd">
<td>bzip2</td>
<td>-1</td>
<td>18.4</td>
<td>8.4</td>
<td>21.2%</td>
</tr>
<tr class="even">
<td></td>
<td>-2</td>
<td>19.1</td>
<td>7.4</td>
<td>19.9%</td>
</tr>
<tr class="odd">
<td></td>
<td>-3</td>
<td>19.9</td>
<td>7.0</td>
<td>19.1%</td>
</tr>
<tr class="even">
<td></td>
<td>-4</td>
<td>21.0</td>
<td>6.9</td>
<td>18.4%</td>
</tr>
<tr class="odd">
<td></td>
<td>-5</td>
<td>22.1</td>
<td>6.7</td>
<td>17.9%</td>
</tr>
<tr class="even">
<td></td>
<td>-6</td>
<td>23.8</td>
<td>6.5</td>
<td>17.5%</td>
</tr>
<tr class="odd">
<td></td>
<td>-7</td>
<td>23.9</td>
<td>6.4</td>
<td>17.2%</td>
</tr>
<tr class="even">
<td></td>
<td>-8</td>
<td>24.8</td>
<td>6.4</td>
<td>16.9%</td>
</tr>
<tr class="odd">
<td></td>
<td>-9</td>
<td>26.0</td>
<td>6.2</td>
<td>16.8%</td>
</tr>
<tr class="even">
<td>rar</td>
<td>-m1</td>
<td>4.0</td>
<td>1.4</td>
<td>22.1%</td>
</tr>
<tr class="odd">
<td></td>
<td>-m2</td>
<td>17.4</td>
<td>1.0</td>
<td>14.5%</td>
</tr>
<tr class="even">
<td></td>
<td>-m3</td>
<td>25.4</td>
<td>1.0</td>
<td>13.4%</td>
</tr>
<tr class="odd">
<td></td>
<td>-m4</td>
<td>27.9</td>
<td>11.6</td>
<td>13.1%</td>
</tr>
<tr class="even">
<td></td>
<td>-m5</td>
<td>31.1</td>
<td>12.8</td>
<td>12.5%</td>
</tr>
<tr class="odd">
<td>zip</td>
<td>-1</td>
<td>2.8</td>
<td>1.0</td>
<td>27.7%</td>
</tr>
<tr class="even">
<td></td>
<td>-2</td>
<td>3.0</td>
<td>0.9</td>
<td>26.7%</td>
</tr>
<tr class="odd">
<td></td>
<td>-3</td>
<td>3.5</td>
<td>0.9</td>
<td>26.0%</td>
</tr>
<tr class="even">
<td></td>
<td>-4</td>
<td>4.1</td>
<td>0.9</td>
<td>24.5%</td>
</tr>
<tr class="odd">
<td></td>
<td>-5</td>
<td>5.0</td>
<td>0.9</td>
<td>23.8%</td>
</tr>
<tr class="even">
<td></td>
<td>-6</td>
<td>6.7</td>
<td>0.8</td>
<td>23.4%</td>
</tr>
<tr class="odd">
<td></td>
<td>-7</td>
<td>8.0</td>
<td>0.8</td>
<td>23.3%</td>
</tr>
<tr class="even">
<td></td>
<td>-8</td>
<td>12.0</td>
<td>0.8</td>
<td>23.2%</td>
</tr>
<tr class="odd">
<td></td>
<td>-9</td>
<td>14.7</td>
<td>0.8</td>
<td>23.2%</td>
</tr>
<tr class="even">
<td>7za</td>
<td>-mx=1</td>
<td>14.0</td>
<td>4.4</td>
<td>20.8%</td>
</tr>
<tr class="odd">
<td></td>
<td>-mx=3</td>
<td>15.7</td>
<td>3.4</td>
<td>15.7%</td>
</tr>
<tr class="even">
<td></td>
<td>-mx=5</td>
<td>65.1</td>
<td>2.7</td>
<td>12.1%</td>
</tr>
<tr class="odd">
<td></td>
<td>-mx=7</td>
<td>87.8</td>
<td>2.6</td>
<td>11.5%</td>
</tr>
<tr class="even">
<td></td>
<td>-mx=9</td>
<td>88.2</td>
<td>2.5</td>
<td>11.2%</td>
</tr>
<tr class="odd">
<td>lzop</td>
<td>-1</td>
<td>0.9</td>
<td>1.1</td>
<td>34.1%</td>
</tr>
<tr class="even">
<td></td>
<td>-2</td>
<td>0.9</td>
<td>1.1</td>
<td>34.3%</td>
</tr>
<tr class="odd">
<td></td>
<td>-3</td>
<td>0.9</td>
<td>1.0</td>
<td>34.3%</td>
</tr>
<tr class="even">
<td></td>
<td>-4</td>
<td>0.9</td>
<td>1.0</td>
<td>34.3%</td>
</tr>
<tr class="odd">
<td></td>
<td>-5</td>
<td>0.9</td>
<td>1.1</td>
<td>34.3%</td>
</tr>
<tr class="even">
<td></td>
<td>-6</td>
<td>0.9</td>
<td>1.0</td>
<td>34.3%</td>
</tr>
<tr class="odd">
<td></td>
<td>-7</td>
<td>13.3</td>
<td>0.8</td>
<td>26.6%</td>
</tr>
<tr class="even">
<td></td>
<td>-8</td>
<td>27.6</td>
<td>0.8</td>
<td>26.3%</td>
</tr>
<tr class="odd">
<td></td>
<td>-9</td>
<td>31.4</td>
<td>0.8</td>
<td>26.3%</td>
</tr>
<tr class="even">
<td>compress</td>
<td>-f</td>
<td>3.4</td>
<td>1.2</td>
<td>37.8%</td>
</tr>
<tr class="odd">
<td>zoo</td>
<td>q</td>
<td>4.8</td>
<td>1.4</td>
<td>40.6%</td>
</tr>
<tr class="even">
<td></td>
<td>h</td>
<td>23.5</td>
<td>2.0</td>
<td>25.8%</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://blog.grandtrunk.net/2004/07/practical-compressor-test/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
