<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>LiraNuna's Development Blog</title>
	<atom:link href="http://www.liranuna.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.liranuna.com</link>
	<description>Just another coder</description>
	<lastBuildDate>Sun, 03 Mar 2013 21:01:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>PHP Path Resolution &#8211; Now PHP 5.3 compatible!</title>
		<link>http://www.liranuna.com/php-path-resolution-now-php-5-3-compatible/</link>
		<comments>http://www.liranuna.com/php-path-resolution-now-php-5-3-compatible/#comments</comments>
		<pubDate>Sun, 29 Aug 2010 07:52:40 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Releases]]></category>
		<category><![CDATA[path resolution]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=1125</guid>
		<description><![CDATA[Presenting the PHP 5.3 compatible version of my PHP path resolution class! Why should there be a difference? Because my original version used create_function, which&#8230; created a whole new function object every time you used one of the class&#8217;s methods. This was very memory expensive and even incurred a slight performance hit. The new version [...]]]></description>
				<content:encoded><![CDATA[<p>Presenting the PHP 5.3 compatible version of my PHP path resolution class!</p>
<p>Why should there be a difference? Because my <a href="http://www.liranuna.com/php-path-resolution-class-relative-paths-made-easy/">original version</a> used <a href="http://php.net/manual/en/function.create-function.php">create_function</a>, which&#8230; created a whole new function object every time you used one of the class&#8217;s methods. This was very memory expensive and even incurred a slight performance hit.</p>
<p><span id="more-1125"></span></p>
<p>The new version uses PHP 5.3&#8217;s <a href="http://www.php.net/manual/en/functions.anonymous.php">anonymous functions</a> to create more readable code in addition to reducing memory consumption and execution time.</p>
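<p>To show the idea in isolation, here is a minimal sketch of an anonymous function passed straight to array_reduce. The helper name join_segments is made up for this illustration and is not part of the Path class.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical helper, not part of the Path class: joins path
// segments using a PHP 5.3 anonymous function as the reducer.
function join_segments(array $segments) {
	return array_reduce($segments, function($carry, $segment) {
		// $carry is null on the first call because no initial value is given
		return $carry === null ? $segment : "$carry/$segment";
	});
}
&nbsp;
echo join_segments(array('var', 'www', 'site')) . "\n"; // var/www/site</pre></td></tr></table></div>

<p>The callback is ordinary PHP code instead of a string, so editors can highlight it and syntax errors show up when the file is parsed.</p>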
<p>As usual, the source code is under the <a href="http://sam.zoy.org/wtfpl/">WTFPL</a> for you to enjoy without any restrictions.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
&nbsp;
<span style="color: #009933; font-style: italic;">/**
 * @class Path
 *
 * @brief Utility class that handles file and directory paths
 *
 * This class handles basic important operations done to file system paths.
 * It safely resolves relative paths and removes all ambiguity from them.
 *
 * @author Liran Nuna
 */</span>
final <span style="color: #000000; font-weight: bold;">class</span> Path
<span style="color: #009900;">&#123;</span>
	<span style="color: #009933; font-style: italic;">/**
	 * Returns the parent path of this path.
	 * &quot;/path/to/directory&quot; will return &quot;/path/to&quot;
	 *
	 * @arg $path	The path to retrieve the parent path from
	 */</span>
	<span style="color: #000000; font-weight: bold;">public</span> static <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #990000;">dirname</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #990000;">dirname</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">self</span><span style="color: #339933;">::</span><span style="color: #004000;">normalize</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #009933; font-style: italic;">/**
	 * Returns the last item on the path.
	 * &quot;/path/to/directory&quot; will return &quot;directory&quot;
	 *
	 * @arg $path	The path to retrieve the base from
	 */</span>
	<span style="color: #000000; font-weight: bold;">public</span> static <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #990000;">basename</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #990000;">basename</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">self</span><span style="color: #339933;">::</span><span style="color: #004000;">normalize</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #009933; font-style: italic;">/**
	 * Normalizes the path for safe usage
	 * This function does several operations to the given path:
	 *   * Removes unnecessary slashes (///path//to/////directory////)
	 *   * Removes current directory references (/path/././to/./directory/./././)
	 *   * Resolves relative paths (/path/from/../to/somewhere/in/../../directory)
	 *
	 * @arg $path	The path to normalize
	 */</span>
	<span style="color: #000000; font-weight: bold;">public</span> static <span style="color: #000000; font-weight: bold;">function</span> normalize<span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #990000;">array_reduce</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">explode</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$a</span><span style="color: #339933;">,</span> <span style="color: #000088;">$b</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
			<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$a</span> <span style="color: #339933;">===</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span>
				<span style="color: #000088;">$a</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;/&quot;</span><span style="color: #339933;">;</span>
&nbsp;
			<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$b</span> <span style="color: #339933;">===</span> <span style="color: #0000ff;">&quot;&quot;</span> <span style="color: #339933;">||</span> <span style="color: #000088;">$b</span> <span style="color: #339933;">===</span> <span style="color: #0000ff;">&quot;.&quot;</span><span style="color: #009900;">&#41;</span>
				<span style="color: #b1b100;">return</span> <span style="color: #000088;">$a</span><span style="color: #339933;">;</span>
&nbsp;
			<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$b</span> <span style="color: #339933;">===</span> <span style="color: #0000ff;">&quot;..&quot;</span><span style="color: #009900;">&#41;</span>
				<span style="color: #b1b100;">return</span> <span style="color: #990000;">dirname</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$a</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
			<span style="color: #b1b100;">return</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;/\/+/&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;/&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;<span style="color: #006699; font-weight: bold;">$a</span>/<span style="color: #006699; font-weight: bold;">$b</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #009933; font-style: italic;">/**
	 * Combines a list of paths into one safe path
	 *
	 * @arg $root	The path or array with values to combine into a single path
	 * @arg ...		Paths relative to root, or arrays of them
	 *
	 * @note		This function works with multidimensional arrays recursively.
	 */</span>
	<span style="color: #000000; font-weight: bold;">public</span> static <span style="color: #000000; font-weight: bold;">function</span> combine<span style="color: #009900;">&#40;</span><span style="color: #000088;">$root</span><span style="color: #339933;">,</span> <span style="color: #000088;">$rel1</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$arguments</span> <span style="color: #339933;">=</span> <span style="color: #990000;">func_get_args</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">self</span><span style="color: #339933;">::</span><span style="color: #004000;">normalize</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">array_reduce</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$arguments</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$a</span><span style="color: #339933;">,</span><span style="color: #000088;">$b</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
			<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">is_array</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$a</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
				<span style="color: #000088;">$a</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array_reduce</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$a</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'Path::combine'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">is_array</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$b</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
				<span style="color: #000088;">$b</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array_reduce</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$b</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'Path::combine'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
			<span style="color: #b1b100;">return</span> <span style="color: #0000ff;">&quot;<span style="color: #006699; font-weight: bold;">$a</span>/<span style="color: #006699; font-weight: bold;">$b</span>&quot;</span><span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #009933; font-style: italic;">/**
	 * Empty, private constructor, to prevent instantiation
	 */</span>
	<span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">function</span> __construct<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #666666; font-style: italic;">// Prevents instantiation</span>
	<span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/php-path-resolution-now-php-5-3-compatible/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Diff parser for CodeMirror</title>
		<link>http://www.liranuna.com/diff-parser-for-codemirror/</link>
		<comments>http://www.liranuna.com/diff-parser-for-codemirror/#comments</comments>
		<pubDate>Sun, 07 Feb 2010 02:56:36 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[Releases]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=1108</guid>
		<description><![CDATA[I wrote a quick diff parser for CodeMirror. CodeMirror is a real-time code editor for your browser. I know that diff isn&#8217;t a format edited by humans, but I found myself needing diff syntax highlighting where other code is shown. More on that soon. Live Example, Download the parser.]]></description>
				<content:encoded><![CDATA[<p>I wrote a quick diff parser for <a href="http://marijn.haverbeke.nl/codemirror/">CodeMirror</a>.</p>
<p>CodeMirror is a real-time code editor for your browser. I know that diff isn&#8217;t a format edited by humans, but I found myself needing diff syntax highlighting where other code is shown.<br />
More on that soon.</p>
<p><a href="http://liranuna.com/codemirror-diff/">Live Example</a>, <a href="http://www.liranuna.com/wordpress/wp-content/uploads/2010/02/codemirror-diff.zip">Download the parser</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/diff-parser-for-codemirror/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>PHP Path resolution class &#8211; Relative paths made easy</title>
		<link>http://www.liranuna.com/php-path-resolution-class-relative-paths-made-easy/</link>
		<comments>http://www.liranuna.com/php-path-resolution-class-relative-paths-made-easy/#comments</comments>
		<pubDate>Sat, 05 Dec 2009 11:24:31 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[path resolution]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=1079</guid>
		<description><![CDATA[Lately I&#8217;ve been working on a project that required me to handle a lot of file-system related operations, especially paths. While PHP offers some basic functions to handle paths, such as basename and dirname to resolve the base name and the (direct) parent of a path, it does not offer any means of normalizing or combining [...]]]></description>
				<content:encoded><![CDATA[<p>Lately I&#8217;ve been working on a project that required me to handle a lot of file-system related operations, especially paths.</p>
<p>While PHP offers some basic functions to handle paths, such as <a href="http://php.net/manual/function.basename.php">basename</a> and <a href="http://php.net/manual/function.dirname.php">dirname</a> to resolve the base name and the (direct) parent of a path, it does not offer any means of normalizing or combining a path if it&#8217;s on a remote file system that is not in the server&#8217;s reach. If the files are local, it offers the function <a href="http://php.net/manual/function.realpath.php">realpath</a>.</p>
<p>I didn&#8217;t like this situation and decided to write a &#8216;static&#8217; utility class to handle file paths safely, without worrying about possible path masquerading from broken code.</p>
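<p>A small sketch of that limitation, assuming the path below does not exist on the machine running the script (the path itself is invented for the example):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Assumption: this path does not exist on the machine running the script.
$missing = '/no/such/root/folder/../file.txt';
&nbsp;
// realpath() needs the path to exist, so it returns false here
var_dump(realpath($missing));
&nbsp;
// dirname() and basename() only work on the string and leave '..' unresolved
echo dirname($missing) . "\n";  // /no/such/root/folder/..
echo basename($missing) . "\n"; // file.txt</pre></td></tr></table></div>
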
<p><span id="more-1079"></span></p>
<p>I hope someone will find the result useful:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
&nbsp;
<span style="color: #009933; font-style: italic;">/**
 * @class Path
 *
 * @brief Utility class that handles file and directory paths
 *
 * This class handles basic important operations done to file system paths.
 * It safely resolves relative paths and removes all ambiguity from them.
 *
 * @author Liran Nuna
 */</span>
final <span style="color: #000000; font-weight: bold;">class</span> Path
<span style="color: #009900;">&#123;</span>
	<span style="color: #009933; font-style: italic;">/**
	 * Returns the parent path of this path.
	 * &quot;/path/to/directory&quot; will return &quot;/path/to&quot;
	 *
	 * @arg $path	The path to retrieve the parent path from
	 */</span>
	<span style="color: #000000; font-weight: bold;">public</span> static <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #990000;">dirname</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #990000;">dirname</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">self</span><span style="color: #339933;">::</span><span style="color: #004000;">normalize</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #009933; font-style: italic;">/**
	 * Returns the last item on the path.
	 * &quot;/path/to/directory&quot; will return &quot;directory&quot;
	 *
	 * @arg $path	The path to retrieve the base from
	 */</span>
	<span style="color: #000000; font-weight: bold;">public</span> static <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #990000;">basename</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #990000;">basename</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">self</span><span style="color: #339933;">::</span><span style="color: #004000;">normalize</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #009933; font-style: italic;">/**
	 * Normalizes the path for safe usage
	 * This function does several operations to the given path:
	 *   * Removes unnecessary slashes (///path//to/////directory////)
	 *   * Removes current directory references (/path/././to/./directory/./././)
	 *   * Resolves relative paths (/path/from/../to/somewhere/in/../../directory)
	 *
	 * @arg $path	The path to normalize
	 */</span>
	<span style="color: #000000; font-weight: bold;">public</span> static <span style="color: #000000; font-weight: bold;">function</span> normalize<span style="color: #009900;">&#40;</span><span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #990000;">array_reduce</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">explode</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$path</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #990000;">create_function</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'$a, $b'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'
			if($a === 0)
				$a = &quot;/&quot;;
&nbsp;
			if($b === &quot;&quot; || $b === &quot;.&quot;)
				return $a;
&nbsp;
			if($b === &quot;..&quot;)
				return dirname($a);
&nbsp;
			return preg_replace(&quot;/\/+/&quot;, &quot;/&quot;, &quot;$a/$b&quot;);
		'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #009933; font-style: italic;">/**
	 * Combines a list of paths into one safe path
	 *
	 * @arg $root	The path or array with values to combine into a single path
	 * @arg ...		Paths relative to root, or arrays of them
	 *
	 * @note		This function works with multidimensional arrays recursively.
	 */</span>
	<span style="color: #000000; font-weight: bold;">public</span> static <span style="color: #000000; font-weight: bold;">function</span> combine<span style="color: #009900;">&#40;</span><span style="color: #000088;">$root</span><span style="color: #339933;">,</span> <span style="color: #000088;">$rel1</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$arguments</span> <span style="color: #339933;">=</span> <span style="color: #990000;">func_get_args</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">self</span><span style="color: #339933;">::</span><span style="color: #004000;">normalize</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">array_reduce</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$arguments</span><span style="color: #339933;">,</span> <span style="color: #990000;">create_function</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'$a,$b'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'
			if(is_array($a))
				$a = array_reduce($a, &quot;Path::combine&quot;);
			if(is_array($b))
				$b = array_reduce($b, &quot;Path::combine&quot;);
&nbsp;
			return &quot;$a/$b&quot;;
		'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #009933; font-style: italic;">/**
	 * Empty, private constructor, to prevent instantiation
	 */</span>
	<span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">function</span> __construct<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #666666; font-style: italic;">// Prevents instantiation</span>
	<span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Using this class is simple: Path::basename and Path::dirname perform the same operations as PHP&#8217;s native basename and dirname, <strong>but more safely</strong>:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #666666; font-style: italic;">// PHP's native basename will return '..'</span>
<span style="color: #b1b100;">echo</span> <span style="color: #990000;">basename</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/path/to/treasure/island/monster/../..'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Safe basename will return 'treasure'</span>
<span style="color: #b1b100;">echo</span> Path<span style="color: #339933;">::</span><span style="color: #990000;">basename</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/path/to/treasure/island/monster/../..'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// PHP's native dirname will return '/path/to/treasure/island/monster/..'</span>
<span style="color: #b1b100;">echo</span> <span style="color: #990000;">dirname</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/path/to/treasure/island/monster/../..'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Safe dirname will return '/path/to'</span>
<span style="color: #b1b100;">echo</span> Path<span style="color: #339933;">::</span><span style="color: #990000;">dirname</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/path/to/treasure/island/monster/../..'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>Path::normalize will sanitize paths and return the safe real path even if it does not exist on the server:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #666666; font-style: italic;">// Normalize will 'sanitize' this path</span>
<span style="color: #666666; font-style: italic;">// Result: '/path/to/candy/up/ahead/please/go/right'</span>
<span style="color: #b1b100;">echo</span> Path<span style="color: #339933;">::</span><span style="color: #004000;">normalize</span><span style="color: #009900;">&#40;</span>
	<span style="color: #0000ff;">'///../path//to/./monster/././/'</span> <span style="color: #339933;">.</span>
	<span style="color: #0000ff;">'//../candy/.//./up/ahead/.//./'</span> <span style="color: #339933;">.</span>
	<span style="color: #0000ff;">'test//back/../..//please/go///'</span> <span style="color: #339933;">.</span>
	<span style="color: #0000ff;">'/left/./../right/123_test!/../'</span>
<span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>
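<p>If the array_reduce form is hard to follow, the same steps can be sketched as a plain loop over a stack of segments. The function name normalize_with_stack is made up for this illustration and is not part of the Path class. I only expect it to match Path::normalize for slash-separated paths like the ones shown in this post.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical stand-alone function, not part of the Path class.
function normalize_with_stack($path) {
	$stack = array();
	foreach (explode('/', $path) as $segment) {
		if ($segment === '' || $segment === '.') {
			continue;            // skip empty segments and "."
		}
		if ($segment === '..') {
			array_pop($stack);   // go up one level; does nothing at the root
		} else {
			$stack[] = $segment; // descend into the named segment
		}
	}
	// Like Path::normalize, the result is always rooted at "/"
	return '/' . implode('/', $stack);
}
&nbsp;
// Prints: /path/to/directory
echo normalize_with_stack('/path/from/../to/somewhere/in/../../directory') . "\n";</pre></td></tr></table></div>
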

<p>Lastly, Path::combine will combine paths from a variable number of strings and arrays to form one safe path:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #666666; font-style: italic;">// Combine paths from a relative path and root</span>
<span style="color: #666666; font-style: italic;">// Result: '/var/www/www.site.com/index.html'</span>
<span style="color: #b1b100;">echo</span> Path<span style="color: #339933;">::</span><span style="color: #004000;">combine</span><span style="color: #009900;">&#40;</span>
	<span style="color: #0000ff;">'/var/www/www.site.com/'</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">'img/../css/jqueryui/../../index.html'</span>
<span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// Combine will also take values from arrays</span>
<span style="color: #666666; font-style: italic;">// Result: '/path/to/directory/sub/TEST/test/lastDirectory/filename.ext'</span>
<span style="color: #b1b100;">echo</span> Path<span style="color: #339933;">::</span><span style="color: #004000;">combine</span><span style="color: #009900;">&#40;</span>
	<span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
		<span style="color: #0000ff;">&quot;/path/to&quot;</span><span style="color: #339933;">,</span>
		<span style="color: #0000ff;">&quot;folder/../directory&quot;</span>
	<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'sub'</span><span style="color: #339933;">,</span>
	<span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
		<span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
			<span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
				<span style="color: #0000ff;">'TEST'</span><span style="color: #339933;">,</span>
				<span style="color: #0000ff;">'test'</span><span style="color: #339933;">,</span>
			<span style="color: #009900;">&#41;</span>
		<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
		<span style="color: #0000ff;">'lastDirectory'</span><span style="color: #339933;">,</span>
	<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
	<span style="color: #0000ff;">'filename.ext'</span>
<span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span></pre></td></tr></table></div>

<p>As always, code I post is under the <a href="http://sam.zoy.org/wtfpl/">WTFPL</a>, so you can use it without any obligations.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/php-path-resolution-class-relative-paths-made-easy/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>libellen on devkitARM r26</title>
		<link>http://www.liranuna.com/libellen-on-devkitarm-r26/</link>
		<comments>http://www.liranuna.com/libellen-on-devkitarm-r26/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 17:30:48 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[libellen]]></category>
		<category><![CDATA[Releases]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=1066</guid>
		<description><![CDATA[Thanks to a contributed patch from Iván Vodopiviz, libellen now works on the latest release of devkitARM. libellen also received an official svn repository, incorporating this patch. Get the latest libellen sources from svn using: svn co http://svn.liranuna.com/libellen/trunk ellen Current revision is 4, so this release is named libellen r4.]]></description>
				<content:encoded><![CDATA[<p>Thanks to a contributed patch from Iván Vodopiviz, libellen now works on the latest release of <a href="http://www.devkitpro.org/">devkitARM</a>.</p>
<p>libellen also received an official svn repository, incorporating this patch. Get the latest libellen sources from svn using:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">svn co</span> http:<span style="color: #000000; font-weight: bold;">//</span>svn.liranuna.com<span style="color: #000000; font-weight: bold;">/</span>libellen<span style="color: #000000; font-weight: bold;">/</span>trunk ellen</pre></td></tr></table></div>

<p>Current revision is 4, so this release is named libellen r4.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/libellen-on-devkitarm-r26/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SSE intrinsics optimizations in popular compilers</title>
		<link>http://www.liranuna.com/sse-intrinsics-optimizations-in-popular-compilers/</link>
		<comments>http://www.liranuna.com/sse-intrinsics-optimizations-in-popular-compilers/#comments</comments>
		<pubDate>Sat, 25 Jul 2009 06:33:47 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[Articles]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=984</guid>
		<description><![CDATA[Lately I have been playing a lot with SSE optimizations and I really enjoy it so far &#8211; using functions to tell the compiler what instructions to use makes you feel the power at your fingertips. At first I was naive and thought the compiler would do exactly what it&#8217;s told, assuming that [...]]]></description>
				<content:encoded><![CDATA[<p>Lately I have been playing a lot with SSE optimizations and I really enjoy it so far &#8211; using functions to tell the compiler what instructions to use makes you feel the power at your fingertips. At first I was naive and thought the compiler would do exactly what it&#8217;s told, assuming that you know what you&#8217;re doing &#8211; the SSE intrinsics header file was mostly a bunch of calls to internal GCC functions, or &#8216;extern&#8217; declarations in MSVC, suggesting that the compiler will simply follow your lead.</p>
<p>I assumed wrong &#8211; the compiler will take the liberty to optimize your code even further, at points you wouldn&#8217;t even think about, though I have noticed that this is not always the case with MSVC. MSVC will sometimes trust the coder too much, even when optimizations could obviously be made. After grasping the concept of SSE and what it could do, I quickly realized MSVC won&#8217;t optimize as well as GCC 4.x or ICC would.</p>
<p>I read a lot of forum posts by people who want to gain speed by using SSE to optimize their core math operations, such as a 4D vector or a 4&#215;4 matrix. SSE can notably boost performance, by about 10-30% depending on usage, but there is no magic switch that tells the compiler to use SSE for you. You need to know how to use intrinsics and optimize along the way, carefully examining the resulting assembly code.</p>
<p>This article will closely inspect and analyze the assembly output of 3 major compilers &#8211; GCC 4.x targeting Linux (4.3.3 specifically), the latest stable MSVC 2008 (version 9.0.30729.1 SP1) and ICC 11.1.</p>
<p><span id="more-984"></span></p>
<p>I&#8217;ll start by declaring the options I give for each compiler &#8211; I am keeping it minimal and simple yet enough to output sane and optimized code.</p>
<p>GCC command line:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">gcc</span> <span style="color: #660033;">-O2</span> <span style="color: #660033;">-msse</span> test.c <span style="color: #660033;">-S</span> <span style="color: #660033;">-o</span> test.asm</pre></td></tr></table></div>

<p>MSVC command line:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;">cl  <span style="color: #000000; font-weight: bold;">/</span>O2 <span style="color: #000000; font-weight: bold;">/</span>arch:SSE <span style="color: #000000; font-weight: bold;">/</span>c <span style="color: #000000; font-weight: bold;">/</span>FA test.c</pre></td></tr></table></div>

<p>ICC&#8217;s command line:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;">icc <span style="color: #660033;">-O2</span> <span style="color: #660033;">-msse</span> test.c <span style="color: #660033;">-S</span> <span style="color: #660033;">-o</span> test.asm</pre></td></tr></table></div>

<p>MSVC automatically generates a file called test.asm, so there is no need to specify an output file. Regardless of that, note the remarkable resemblance of the commands&#8230;</p>
<h2>Basics</h2>
<p>Let&#8217;s begin with a simple assignment test:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;xmmintrin.h&gt;</span>
&nbsp;
<span style="color: #0000ff;">extern</span> <span style="color: #0000ff;">void</span> printv<span style="color: #008000;">&#40;</span>__m128 m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	__m128 m <span style="color: #000080;">=</span> _mm_set_ps<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">4</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	__m128 z <span style="color: #000080;">=</span> _mm_setzero_ps<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	printv<span style="color: #008000;">&#40;</span>m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	printv<span style="color: #008000;">&#40;</span>z<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	<span style="color: #0000ff;">return</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>This will assign m to be [1, 2, 3, 4] and z to be [0, 0, 0, 0]. Note that the undefined &#8220;extern&#8221; function &#8216;printv&#8217; is there to stop the compilers from optimizing the variables away and to &#8220;prove&#8221; that they are used. Since we only generate assembly with each compiler and never link, there is no need to actually define printv.</p>
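<p>If you do want to link and run the tests, printv needs a body. The following is only a minimal sketch of what it could look like, not the definition used for this article (there is none); the helper &#8216;lane&#8217; is a hypothetical addition that makes the lane order easy to check. It shows that _mm_set_ps takes its arguments from the highest lane down to the lowest, which is why the vector reads [1, 2, 3, 4] in memory.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <xmmintrin.h>

/* Hypothetical definition of printv; the article only declares it. */
void printv(__m128 m)
{
	float lanes[4];

	/* Unaligned store: writes lanes 0..3 to increasing addresses. */
	_mm_storeu_ps(lanes, m);
	printf("[%g, %g, %g, %g]\n", lanes[0], lanes[1], lanes[2], lanes[3]);
}

/* Hypothetical helper: returns lane i (0..3) of m. */
float lane(__m128 m, int i)
{
	float lanes[4];

	assert(i >= 0 && i < 4);
	_mm_storeu_ps(lanes, m);
	return lanes[i];
}
```

<p>Calling printv(_mm_set_ps(4, 3, 2, 1)) with this sketch prints the lanes starting from lane 0, the last argument given to _mm_set_ps.</p>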
<p>The variable m is actually const, but we didn&#8217;t hint that to the compiler. The compiler should understand that &#8216;m&#8217; does not change and move it into a read-only data section (.rodata with GCC on Linux). The zero vector should use the xorps opcode to generate a fast zero vector without spending any const memory (x XOR x is always 0).</p>
<p>The output:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">MSVC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40400000</span> <span style="color: #666666; font-style: italic;">; 3.0f</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40800000</span> <span style="color: #666666; font-style: italic;">; 4.0f</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@3f800000 <span style="color: #666666; font-style: italic;">; 1.0f</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40000000</span> <span style="color: #666666; font-style: italic;">; 2.0f</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
&nbsp;
GCC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
&nbsp;
	<span style="color: #339933;">.</span>LC0<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span> <span style="color: #666666; font-style: italic;">; 1.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1073741824</span> <span style="color: #666666; font-style: italic;">; 2.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1077936128</span> <span style="color: #666666; font-style: italic;">; 3.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1082130432</span> <span style="color: #666666; font-style: italic;">; 4.0f</span>
&nbsp;
ICC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">xorps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
&nbsp;
	_2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40400000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40800000</span></pre></td></tr></table></div>

<p>Both GCC and ICC understood that the variable &#8216;m&#8217; is const, and moved it to a read-only (const) data section. MSVC however chose to use <em>4 xmm registers </em>to create &#8216;m&#8217; &#8211; it not only writes to valuable registers that are crucial to have in a real application, it also forces the use of the stack if those registers actually contained information, which is common when inlining. It will also invalidate cache usage, since the data is in the opcode, effectively eliminating future prefetches. All compilers, however, used xorps to create a zero vector, which is pleasing to see.</p>
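<p>The .long values in the GCC and ICC listings are simply the IEEE-754 bit patterns of the four floats, printed in decimal by GCC and in hexadecimal by ICC. A small sketch for checking them yourself; the function names are made up for illustration, and it assumes a 32-bit IEEE-754 float, as on x86:</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the raw bit pattern of f (assumes a 32-bit IEEE-754 float). */
uint32_t float_bits(float f)
{
	uint32_t bits;

	assert(sizeof(float) == sizeof(uint32_t));
	memcpy(&bits, &f, sizeof bits); /* well-defined way to reinterpret the bytes */
	return bits;
}

/* Inverse of float_bits: rebuilds a float from its raw bit pattern. */
float bits_to_float(uint32_t bits)
{
	float f;

	assert(sizeof(float) == sizeof(uint32_t));
	memcpy(&f, &bits, sizeof f);
	return f;
}
```

<p>For example, float_bits(1.0f) gives 0x3f800000, which is 1065353216 in decimal, the first entry of .LC0 above.</p>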
<h2>Arithmetic prediction</h2>
<p>The next test is arithmetic prediction. The test will see how the compiler deals with predefined SSE operations, such as arithmetic, much like predefined integer operations. The compiler should predict and precompute operations such as &#8216;1+1&#8217; and use the answer directly instead of making the CPU compute a static answer. The test is as follows:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;xmmintrin.h&gt;</span>
&nbsp;
<span style="color: #0000ff;">extern</span> <span style="color: #0000ff;">void</span> printv<span style="color: #008000;">&#40;</span>__m128 m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	__m128 m <span style="color: #000080;">=</span> _mm_set_ps<span style="color: #008000;">&#40;</span><span style="color: #000040;">-</span><span style="color: #0000dd;">4</span>, <span style="color: #000040;">-</span><span style="color: #0000dd;">3</span>, <span style="color: #000040;">-</span><span style="color: #0000dd;">2</span>, <span style="color: #000040;">-</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	__m128 one <span style="color: #000080;">=</span> _mm_set1_ps<span style="color: #008000;">&#40;</span><span style="color:#800080;">1.0f</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	printv<span style="color: #008000;">&#40;</span>_mm_and_ps<span style="color: #008000;">&#40;</span>m, _mm_setzero_ps<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Always a zero vector</span>
	printv<span style="color: #008000;">&#40;</span>_mm_or_ps<span style="color: #008000;">&#40;</span>m, _mm_set1_ps<span style="color: #008000;">&#40;</span><span style="color: #000040;">-</span><span style="color:#800080;">0.0f</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Negate all (nop, all negative)</span>
	printv<span style="color: #008000;">&#40;</span>_mm_add_ps<span style="color: #008000;">&#40;</span>m, _mm_setzero_ps<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Add 0 (nop; x+0=x)</span>
	printv<span style="color: #008000;">&#40;</span>_mm_sub_ps<span style="color: #008000;">&#40;</span>m, _mm_setzero_ps<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Subtract 0 (nop; x-0=x)</span>
	printv<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>m, one<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Multiply by one (nop)</span>
	printv<span style="color: #008000;">&#40;</span>_mm_div_ps<span style="color: #008000;">&#40;</span>m, one<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Division by one (nop)</span>
&nbsp;
	<span style="color: #0000ff;">return</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>On the first test, the compiler should <strong>always</strong> send a zero xmm register to printv, since x &amp; 0 is always equal to 0. The rest of the tests should always result in sending the same register contents, since each of them is just a simple way to create a nop (no operation).</p>
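<p>To double check those expectations at run time, the six operations can be compared lane by lane against the predicted results. This is only a sketch: both function names are hypothetical and it uses the same m as the test above. Keep in mind that these are identities for this particular constant m; for instance, x+0 does not return x unchanged when x is -0.0f, so a compiler can only fold such an addition when it knows the operand.</p>

```c
#include <assert.h>
#include <xmmintrin.h>

/* Returns 1 if all four lanes of a and b compare equal, 0 otherwise. */
int all_lanes_equal(__m128 a, __m128 b)
{
	/* cmpeq sets a lane to all ones when equal; movemask collects the 4 sign bits. */
	return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}

/* Asserts that each operation from the test yields the predicted vector. */
void check_predicted_results(void)
{
	__m128 m = _mm_set_ps(-4, -3, -2, -1);
	__m128 zero = _mm_setzero_ps();
	__m128 one = _mm_set1_ps(1.0f);

	assert(all_lanes_equal(_mm_and_ps(m, zero), zero));            /* x & 0 = 0 */
	assert(all_lanes_equal(_mm_or_ps(m, _mm_set1_ps(-0.0f)), m));  /* sign bit already set */
	assert(all_lanes_equal(_mm_add_ps(m, zero), m));               /* x + 0 = x */
	assert(all_lanes_equal(_mm_sub_ps(m, zero), m));               /* x - 0 = x */
	assert(all_lanes_equal(_mm_mul_ps(m, one), m));                /* x * 1 = x */
	assert(all_lanes_equal(_mm_div_ps(m, one), m));                /* x / 1 = x */
}
```

<p>Calling check_predicted_results() from a small main passes silently if every prediction holds, whether or not the compiler folded the operations away.</p>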
<p>The results:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">MSVC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@c0800000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@c0400000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@c0000000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@bf800000
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">xorps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR tv129<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">32</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">32</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">80000000</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">0</span>
	<span style="color: #b00040;">orps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">32</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv129<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">32</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">addps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">32</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">32</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">subps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv129<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">32</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
&nbsp;
GCC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">xorps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC0<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC0<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC0<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
&nbsp;
	<span style="color: #339933;">.</span>LC0<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3212836864</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3221225472</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3225419776</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3229614080</span>
&nbsp;
ICC<span style="color: #339933;">:</span>
        <span style="color: #b00040;">xorps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
        <span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>2<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">orps</span>      _2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">mulps</span>     _2il0floatpacket<span style="color: #339933;">.</span>1<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">divps</span>     _2il0floatpacket<span style="color: #339933;">.</span>1<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
&nbsp;
	_2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0xbf800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xc0000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xc0400000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xc0800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>1<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>2<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span></pre></td></tr></table></div>

<p>The results are certainly interesting. MSVC has decided to not optimize the code and did exactly what it was told, resulting in redundant code. More should be noted: xorps (line 7) could&#8217;ve been moved after the unpcklps instruction (line 10) to take advantage of <a href="http://avisynth.org/mediawiki/Filter_SDK/Instruction_pairing">instruction pairing</a> (when the processor executes the same opcode again, it&#8217;s usually faster, especially in SSE-land, where the CPU operates on large 128-bit registers). GCC&#8217;s code does exactly what we expect from a modern compiler; it performs static checks for all operations. ICC seems to be selective about what it can determine, leaving the redundant OR, multiplication and division in place while optimizing the others out.</p>
<h2>Shuffles</h2>
<p>The next test is about shuffles. There will be several tests regarding redundant shuffles that could be easily optimized out or merged, such as double reverses and subsequent shuffles.</p>
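<p>Before looking at the test, it helps to see how the 8-bit shuffle immediate is read. When both inputs of _mm_shuffle_ps are the same vector, each pair of bits in the immediate picks the source lane for one result lane, starting from lane 0 at the lowest bits. The sketch below decodes an immediate and merges two subsequent shuffles into one; the function names are invented for illustration and it only covers the case where both inputs are the same vector.</p>

```c
#include <assert.h>

/* Source lane picked for result lane i (0..3) by a shuffle immediate,
 * assuming both shuffle inputs are the same vector. */
unsigned shuffle_source(unsigned imm, unsigned i)
{
	assert(i < 4);
	return (imm >> (2 * i)) & 3u;
}

/* Immediate equivalent to shuffling with `first` and then with `second`. */
unsigned compose_shuffles(unsigned first, unsigned second)
{
	unsigned imm = 0;
	unsigned i;

	for (i = 0; i < 4; i++) {
		/* Lane i of the final result comes from lane second(i) of the
		 * intermediate vector, which came from lane first(second(i)). */
		unsigned src = shuffle_source(first, shuffle_source(second, i));
		imm |= src << (2 * i);
	}
	return imm;
}
```

<p>With this, 0xE4 decodes to lanes 0, 1, 2, 3 (the identity), 0x1B decodes to 3, 2, 1, 0 (a reverse), and compose_shuffles(0x1B, 0x1B) gives back 0xE4, which is the kind of merging an optimizing compiler is expected to do.</p>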

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;xmmintrin.h&gt;</span>
&nbsp;
<span style="color: #0000ff;">extern</span> <span style="color: #0000ff;">void</span> printv<span style="color: #008000;">&#40;</span>__m128 m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	__m128 m <span style="color: #000080;">=</span> _mm_set_ps<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">4</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0xE4</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// NOP - shuffles to same order</span>
	printv<span style="color: #008000;">&#40;</span>m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x1B</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Reverses the vector</span>
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x1B</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Reverses the vector again, NOP</span>
	printv<span style="color: #008000;">&#40;</span>m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x1B</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Reverses the vector</span>
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x1B</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Reverses the vector again, NOP</span>
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x1B</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// All should be optimized to one shuffle</span>
	printv<span style="color: #008000;">&#40;</span>m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0xC9</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Those two shuffles together swap pairs</span>
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x2D</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// And could be optimized to 0x4E</span>
	printv<span style="color: #008000;">&#40;</span>m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x55</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Broadcast element 1 (the second element) to all lanes</span>
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x55</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Redundant - since all are the same</span>
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x55</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Let's stress it again</span>
	m <span style="color: #000080;">=</span> _mm_shuffle_ps<span style="color: #008000;">&#40;</span>m, m, <span style="color: #208080;">0x55</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// And one last time</span>
	printv<span style="color: #008000;">&#40;</span>m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	<span style="color: #0000ff;">return</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>The output here should contain the minimum number of shuffles. The first two tests should have no shuffle at all. The third should have only one shuffle to reverse the vector (mask = 0x1B). The fourth test should merge the two shuffles, with masks 0xC9 and 0x2D, into a single shuffle with mask 0x4E (swap pairs). The last test should be reduced to a single shuffle, since every repetition ends up selecting the same value.</p>
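<p>The expected masks can be checked by hand, or with a small helper. The following is a minimal sketch and not part of the original test: lane and compose are hypothetical helper names, and the composition rule only holds when both source operands of each shuffle are the same vector, as they are in every call above.</p>

```cpp
// Source lane (0..3) that an 8-bit shufps-style mask selects for result lane i.
constexpr unsigned lane(unsigned mask, unsigned i)
{
	return (mask >> (2 * i)) & 3u;
}

// Single mask equivalent to shuffling with `first` and then with `second`,
// assuming each shuffle uses the same vector for both of its inputs.
// Result lane i of the second shuffle reads lane(second, i) of the first
// shuffle's output, which in turn came from lane(first, ...) of the input.
constexpr unsigned compose(unsigned first, unsigned second)
{
	return  lane(first, lane(second, 0))
	     | (lane(first, lane(second, 1)) << 2)
	     | (lane(first, lane(second, 2)) << 4)
	     | (lane(first, lane(second, 3)) << 6);
}

// The cases from the test program above.
static_assert(compose(0x1B, 0x1B) == 0xE4, "double reverse is the identity shuffle");
static_assert(compose(compose(0x1B, 0x1B), 0x1B) == 0x1B, "triple reverse is one reverse");
static_assert(compose(0xC9, 0x2D) == 0x4E, "the two shuffles merge into swap pairs");
static_assert(compose(0x55, 0x55) == 0x55, "repeating a broadcast changes nothing");
```

<p>All four checks are evaluated at compile time, which is the same kind of reasoning the tests hope the compilers perform on their own.</p>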

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">MSVC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40800000</span> <span style="color: #666666; font-style: italic;">; 4.0f</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40400000</span> <span style="color: #666666; font-style: italic;">; 3.0f</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40000000</span> <span style="color: #666666; font-style: italic;">; 2.0f</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@3f800000 <span style="color: #666666; font-style: italic;">; 1.0f</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">228</span>			<span style="color: #666666; font-style: italic;">; 000000e4H</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">27</span>			<span style="color: #666666; font-style: italic;">; 0000001bH</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">27</span>			<span style="color: #666666; font-style: italic;">; 0000001bH</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">27</span>			<span style="color: #666666; font-style: italic;">; 0000001bH</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">27</span>			<span style="color: #666666; font-style: italic;">; 0000001bH</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">27</span>			<span style="color: #666666; font-style: italic;">; 0000001bH</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">201</span>			<span style="color: #666666; font-style: italic;">; 000000c9H</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">45</span>			<span style="color: #666666; font-style: italic;">; 0000002dH</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">85</span>			<span style="color: #666666; font-style: italic;">; 00000055H</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">85</span>			<span style="color: #666666; font-style: italic;">; 00000055H</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">85</span>			<span style="color: #666666; font-style: italic;">; 00000055H</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">85</span>			<span style="color: #666666; font-style: italic;">; 00000055H</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
&nbsp;
GCC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">-</span><span style="color: #ff0000;">24</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">-</span><span style="color: #ff0000;">24</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">201</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">45</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">-</span><span style="color: #ff0000;">24</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">-</span><span style="color: #ff0000;">24</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">ebp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">85</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
&nbsp;
	<span style="color: #339933;">.</span>LC0<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span> <span style="color: #666666; font-style: italic;">; 1.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1073741824</span> <span style="color: #666666; font-style: italic;">; 2.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1077936128</span> <span style="color: #666666; font-style: italic;">; 3.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1082130432</span> <span style="color: #666666; font-style: italic;">; 4.0f</span>
&nbsp;
	<span style="color: #339933;">.</span>LC1<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1082130432</span> <span style="color: #666666; font-style: italic;">; 4.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1077936128</span> <span style="color: #666666; font-style: italic;">; 3.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1073741824</span> <span style="color: #666666; font-style: italic;">; 2.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span> <span style="color: #666666; font-style: italic;">; 1.0f</span>
&nbsp;
ICC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	addl      <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">4</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">228</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">27</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">27</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">27</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">27</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">27</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">201</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">45</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">85</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">85</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">85</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">85</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
&nbsp;
	_2il0floatpacket<span style="color: #339933;">.</span>0<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40400000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40800000</span></pre></td></tr></table></div>

<p>The results are interesting and quite surprising: GCC passed all of the tests except the shuffle merge, while MSVC and ICC didn&#8217;t optimize <em>any</em> of the shuffles. Shame. It is interesting to note that GCC chose to load the reversed vector from memory a second time (.LC1) for the later operations instead of keeping it cached in a spare xmm register. (Copying between registers is faster than loading from memory, even if the memory is aligned.)</p>
<h2>Dynamic input</h2>
<p>All of the previous tests were about the compiler being able to make decisions about static data. Now it&#8217;s time for functions, where the input and output aren&#8217;t known at compile time. A notable example of a vector operation is normalization. Here is a function that takes an SSE vector and returns a normalized copy.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;">__m128 normalize<span style="color: #008000;">&#40;</span>__m128 m<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	__m128 l <span style="color: #000080;">=</span> _mm_mul_ps<span style="color: #008000;">&#40;</span>m, m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	l <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>l, _mm_shuffle_ps<span style="color: #008000;">&#40;</span>l, l, <span style="color: #208080;">0x4E</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">return</span> _mm_div_ps<span style="color: #008000;">&#40;</span>m, _mm_sqrt_ps<span style="color: #008000;">&#40;</span>_mm_add_ps<span style="color: #008000;">&#40;</span>l,
	                                   _mm_shuffle_ps<span style="color: #008000;">&#40;</span>l, l, <span style="color: #208080;">0x11</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>The function is already hand-optimized. It gives the compiler hints about what should be a temporary variable and what should be reused, and it takes a total of 7 operations (one multiply, two shuffles, two adds, one square root and one divide).</p>
<p>The result we expect is a perfect projection of the SSE intrinsics to assembly using only 3 vectors (original, length and square):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">MSVC<span style="color: #339933;">:</span>
	normalize<span style="color: #339933;">:</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
		<span style="color: #b00040;">mulps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">78</span> <span style="color: #666666; font-style: italic;">; 0000004eH</span>
		<span style="color: #b00040;">addps</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">17</span> <span style="color: #666666; font-style: italic;">; 00000011H</span>
		<span style="color: #b00040;">addps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">sqrtps</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">divps</span>		<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #00007f; font-weight: bold;">ret</span>
&nbsp;
GCC<span style="color: #339933;">:</span>
	normalize<span style="color: #339933;">:</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">mulps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">shufps</span>	<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">78</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">addps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">shufps</span>	<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">17</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">addps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">sqrtps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">divps</span>		<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
		<span style="color: #00007f; font-weight: bold;">ret</span>
&nbsp;
ICC<span style="color: #339933;">:</span>
	normalize<span style="color: #339933;">:</span>
		<span style="color: #b00040;">movaps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span>
		<span style="color: #b00040;">mulps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span>
		<span style="color: #b00040;">movaps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">78</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">addps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span>
		<span style="color: #b00040;">movaps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">shufps</span>    <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">17</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">addps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span>
		<span style="color: #b00040;">sqrtps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm4</span>
		<span style="color: #b00040;">divps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
		<span style="color: #00007f; font-weight: bold;">ret</span></pre></td></tr></table></div>

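<p>Good to see that all compilers are on par here: each one emits the same ten instructions before the return, which is exactly the result we expected. The only difference is that ICC spreads the work over five xmm registers where MSVC and GCC use three.</p>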
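<p>To make the two shuffle masks less magical, here is a minimal sketch that repeats the same intrinsics with the lane selection written out in comments. The helpers normalize4 and check_normalize are hypothetical names added only for this illustration, and the check assumes an x86 target with SSE enabled.</p>

```cpp
#include <xmmintrin.h>
#include <cassert>

// Same sequence of intrinsics as normalize() above, with the shuffle masks spelled out.
static inline __m128 normalize(__m128 m)
{
	__m128 l = _mm_mul_ps(m, m);                    // (xx, yy, zz, ww)
	// 0x4E picks lanes (2, 3, 0, 1): the two halves of the vector trade places
	l = _mm_add_ps(l, _mm_shuffle_ps(l, l, 0x4E));  // (xx+zz, yy+ww, xx+zz, yy+ww)
	// 0x11 picks lanes (1, 0, 1, 0): after the add every lane holds xx+yy+zz+ww
	return _mm_div_ps(m, _mm_sqrt_ps(_mm_add_ps(l, _mm_shuffle_ps(l, l, 0x11))));
}

// Hypothetical helper: normalize four floats from memory and write the result back.
void normalize4(const float in[4], float out[4])
{
	_mm_storeu_ps(out, normalize(_mm_loadu_ps(in)));
}

// Sanity check on a vector whose length is exactly 5.
void check_normalize()
{
	const float in[4] = { 3.0f, 0.0f, 4.0f, 0.0f };
	float out[4];
	normalize4(in, out);
	assert(out[0] > 0.5999f && out[0] < 0.6001f);
	assert(out[1] == 0.0f);
	assert(out[2] > 0.7999f && out[2] < 0.8001f);
	assert(out[3] == 0.0f);
}
```

<p>Calling check_normalize() from any main is enough to exercise it. Note that all four components take part in the length, so this is a 4D normalize, not a 3D one with an ignored w.</p>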
<h2>Inline Functions</h2>
<p>The next test combines function calls with static compile-time data to get inline functions. An inline function should have its code embedded into the calling routine and be optimized for that specific case where possible. A classic candidate for inlining is &#8216;abs&#8217;:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;xmmintrin.h&gt;</span>
&nbsp;
<span style="color: #0000ff;">extern</span> <span style="color: #0000ff;">void</span> printv<span style="color: #008000;">&#40;</span>__m128 m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #ff0000; font-style: italic;">/* This is called _mm_abs_ps because 'abs' is a built in function
   and C does not allow overloading */</span>
<span style="color: #0000ff;">inline</span> __m128 _mm_abs_ps<span style="color: #008000;">&#40;</span>__m128 m<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">return</span> _mm_andnot_ps<span style="color: #008000;">&#40;</span>_mm_set1_ps<span style="color: #008000;">&#40;</span><span style="color: #000040;">-</span><span style="color:#800080;">0.0f</span><span style="color: #008000;">&#41;</span>, m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
		<span style="color: #666666;">// All positive</span>
	printv<span style="color: #008000;">&#40;</span>_mm_abs_ps<span style="color: #008000;">&#40;</span>_mm_set_ps<span style="color: #008000;">&#40;</span><span style="color:#800080;">1.0f</span>, <span style="color:#800080;">0.0f</span>, <span style="color:#800080;">0.0f</span>, <span style="color:#800080;">1.0f</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
		<span style="color: #666666;">// All negative</span>
	printv<span style="color: #008000;">&#40;</span>_mm_abs_ps<span style="color: #008000;">&#40;</span>_mm_set_ps<span style="color: #008000;">&#40;</span><span style="color: #000040;">-</span><span style="color:#800080;">1.0f</span>, <span style="color: #000040;">-</span><span style="color:#800080;">0.0f</span>, <span style="color: #000040;">-</span><span style="color:#800080;">0.0f</span>, <span style="color: #000040;">-</span><span style="color:#800080;">1.0f</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
		<span style="color: #666666;">// Mixed</span>
	printv<span style="color: #008000;">&#40;</span>_mm_abs_ps<span style="color: #008000;">&#40;</span>_mm_set_ps<span style="color: #008000;">&#40;</span><span style="color: #000040;">-</span><span style="color:#800080;">1.0f</span>, <span style="color: #000040;">-</span><span style="color:#800080;">0.0f</span>, <span style="color:#800080;">0.0f</span>, <span style="color:#800080;">1.0f</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>

<p>The result we expect is perfect inlining of the function, producing the same vector for all three calls. A good compiler will also avoid duplicating the data and reuse a single constant vector throughout the program, since the linker will most likely not do it.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">MSVC<span style="color: #339933;">:</span>
	main<span style="color: #339933;">:</span>
		<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@3f800000
		<span style="color: #b00040;">xorps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">80000000</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">0</span>
		<span style="color: #b00040;">movaps</span>	XMMWORD PTR tv166<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm4</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">andnps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #00007f; font-weight: bold;">call</span>	printv
		<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@bf800000
		<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">80000000</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv166<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm4</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">andnps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #00007f; font-weight: bold;">call</span>	printv
		<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@bf800000
		<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">80000000</span>
		<span style="color: #b00040;">xorps</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
		<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@3f800000
		<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv166<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">16</span><span style="color: #009900; font-weight: bold;">&#93;</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
		<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
		<span style="color: #b00040;">andnps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm4</span>
		<span style="color: #00007f; font-weight: bold;">call</span>	printv
&nbsp;
GCC<span style="color: #339933;">:</span>
	main<span style="color: #339933;">:</span>
		<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
		<span style="color: #00007f; font-weight: bold;">call</span>	printv
		<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
		<span style="color: #00007f; font-weight: bold;">call</span>	printv
		<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
		<span style="color: #00007f; font-weight: bold;">call</span>	printv
&nbsp;
	<span style="color: #339933;">.</span>LC0<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">2147483648</span> <span style="color: #666666; font-style: italic;">; -0.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">2147483648</span> <span style="color: #666666; font-style: italic;">; -0.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">2147483648</span> <span style="color: #666666; font-style: italic;">; -0.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">2147483648</span> <span style="color: #666666; font-style: italic;">; -0.0f</span>
&nbsp;
	<span style="color: #339933;">.</span>LC1<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span> <span style="color: #666666; font-style: italic;">; 1.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0</span>          <span style="color: #666666; font-style: italic;">; 0.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0</span>          <span style="color: #666666; font-style: italic;">; 0.0f</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span> <span style="color: #666666; font-style: italic;">; 1.0f</span>
&nbsp;
ICC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>7<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	addl      <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">4</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span>
	<span style="color: #b00040;">movaps</span>    <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
	<span style="color: #b00040;">andnps</span>    _2il0floatpacket<span style="color: #339933;">.</span>6<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">andnps</span>    _2il0floatpacket<span style="color: #339933;">.</span>8<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">andnps</span>    _2il0floatpacket<span style="color: #339933;">.</span>9<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
&nbsp;
	_2il0floatpacket<span style="color: #339933;">.</span>6<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x00000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x00000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>7<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>8<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0xbf800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xbf800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>9<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x00000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xbf800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>11<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span></pre></td></tr></table></div>

<p>This time each compiler chose its own way of optimizing. MSVC&#8217;s horrible assignment code, in addition to its inability to predict static operations, resulted in redundant code. ICC inlined the function, but kept some of the static data on the stack (even though it was comfortably available in aligned read-only space) and did not perform any precomputation. GCC optimized the code as we expected, but it &#8220;forgot&#8221; to remove the unnecessary helper vector (LC0), which is not used. This isn&#8217;t a big deal, though, because the linker will simply remove unreferenced const objects. GCC most likely kept it in case the inline function would have had use for it.</p>
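<p>For reference, here is a minimal sketch of what a fully folded result has to look like. It applies the same andnot trick to the three test inputs and compares the raw bit patterns. The names abs_ps, lane_bits and check_abs_folds are hypothetical and exist only for this illustration (the function is renamed to keep the sketch out of the _mm_ namespace).</p>

```cpp
#include <xmmintrin.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Same body as _mm_abs_ps above: clear the sign bit of every lane.
static inline __m128 abs_ps(__m128 m)
{
	return _mm_andnot_ps(_mm_set1_ps(-0.0f), m);   // (~0x80000000) & m, per lane
}

// Hypothetical helper: copy the four lanes out as raw 32-bit patterns.
void lane_bits(__m128 m, std::uint32_t bits[4])
{
	float f[4];
	_mm_storeu_ps(f, m);
	std::memcpy(bits, f, sizeof f);
}

// All three inputs from the test must collapse to one and the same constant.
void check_abs_folds()
{
	std::uint32_t a[4], b[4], c[4];
	lane_bits(abs_ps(_mm_set_ps( 1.0f,  0.0f,  0.0f,  1.0f)), a);  // all positive
	lane_bits(abs_ps(_mm_set_ps(-1.0f, -0.0f, -0.0f, -1.0f)), b);  // all negative
	lane_bits(abs_ps(_mm_set_ps(-1.0f, -0.0f,  0.0f,  1.0f)), c);  // mixed
	assert(std::memcmp(a, b, sizeof a) == 0);
	assert(std::memcmp(a, c, sizeof a) == 0);
	// 1.0f, 0.0f, 0.0f, 1.0f: the same four words GCC stored in .LC1
	assert(a[0] == 0x3f800000u && a[1] == 0u && a[2] == 0u && a[3] == 0x3f800000u);
}
```

<p>Since the three results are bit-for-bit identical, a single 16-byte constant is all the program really needs, which is what GCC ended up loading before each call.</p>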
<h2>SSE comparison prediction</h2>
<p>A good compiler should also predict the outcome of a comparison and eliminate the unused code if the check is always true or always false. SSE provides a way to compare 4 floats at once using the cmp*ps instructions. For each component, the instruction writes a mask of all 1 bits if the comparison is true and all 0 bits if it is false. The comparison could be eliminated easily if the result is known at compile time &#8211; especially in inline functions. The test will implement the function &#8216;sign&#8217;, which returns 1, 0 or -1 per component.</p>
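<p>As a quick aside before the test itself, here is a minimal sketch of what those masks look like in practice. It uses _mm_movemask_ps to collect the top bit of each lane into an integer and a bitwise AND to apply a mask to data. The helper names positive_lanes, keep_positive and check_masks are hypothetical and not part of the test.</p>

```cpp
#include <xmmintrin.h>
#include <cassert>

// Hypothetical helper: bit i of the result is set when lane i of m is greater than zero.
int positive_lanes(__m128 m)
{
	__m128 mask = _mm_cmpgt_ps(m, _mm_setzero_ps()); // each lane: all 1 bits or all 0 bits
	return _mm_movemask_ps(mask);                    // gather the top bit of every lane
}

// Hypothetical helper: keep the positive lanes and zero the rest by ANDing with the mask.
__m128 keep_positive(__m128 m)
{
	return _mm_and_ps(m, _mm_cmpgt_ps(m, _mm_setzero_ps()));
}

void check_masks()
{
	__m128 m = _mm_setr_ps(1, -2, 3, -4);    // same input vector as the test
	assert(positive_lanes(m) == 0x5);        // lanes 0 and 2 are positive
	float out[4];
	_mm_storeu_ps(out, keep_positive(m));
	assert(out[0] == 1.0f && out[1] == 0.0f && out[2] == 3.0f && out[3] == 0.0f);
}
```

<p>Because the mask is just data, a compiler that knows the inputs can compute it ahead of time exactly like any other constant. The test code follows.</p>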

<div class="wp_syntax"><table><tr><td class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;xmmintrin.h&gt;</span>
&nbsp;
<span style="color: #0000ff;">extern</span> <span style="color: #0000ff;">void</span> printv<span style="color: #008000;">&#40;</span>__m128 m<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">inline</span> __m128 sign<span style="color: #008000;">&#40;</span>__m128 m<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	<span style="color: #0000ff;">return</span> _mm_and_ps<span style="color: #008000;">&#40;</span>_mm_or_ps<span style="color: #008000;">&#40;</span>_mm_and_ps<span style="color: #008000;">&#40;</span>m, _mm_set1_ps<span style="color: #008000;">&#40;</span><span style="color: #000040;">-</span><span style="color:#800080;">0.0f</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>,
				_mm_set1_ps<span style="color: #008000;">&#40;</span><span style="color:#800080;">1.0f</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>,
			  _mm_cmpneq_ps<span style="color: #008000;">&#40;</span>m, _mm_setzero_ps<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	__m128 m <span style="color: #000080;">=</span> _mm_setr_ps<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #000040;">-</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">3</span>, <span style="color: #000040;">-</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
	printv<span style="color: #008000;">&#40;</span>_mm_cmpeq_ps<span style="color: #008000;">&#40;</span>m, m<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Equal to itself</span>
	printv<span style="color: #008000;">&#40;</span>_mm_cmpgt_ps<span style="color: #008000;">&#40;</span>m, _mm_setzero_ps<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Greater than zero</span>
	printv<span style="color: #008000;">&#40;</span>_mm_cmplt_ps<span style="color: #008000;">&#40;</span>m, _mm_setzero_ps<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Less than zero</span>
&nbsp;
	printv<span style="color: #008000;">&#40;</span>sign<span style="color: #008000;">&#40;</span>_mm_setr_ps<span style="color: #008000;">&#40;</span> <span style="color: #0000dd;">1</span>,  <span style="color: #0000dd;">2</span>,  <span style="color: #0000dd;">3</span>,  <span style="color: #0000dd;">4</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// All 1's</span>
	printv<span style="color: #008000;">&#40;</span>sign<span style="color: #008000;">&#40;</span>_mm_setr_ps<span style="color: #008000;">&#40;</span><span style="color: #000040;">-</span><span style="color: #0000dd;">1</span>, <span style="color: #000040;">-</span><span style="color: #0000dd;">2</span>, <span style="color: #000040;">-</span><span style="color: #0000dd;">3</span>, <span style="color: #000040;">-</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// All -1's</span>
	printv<span style="color: #008000;">&#40;</span>sign<span style="color: #008000;">&#40;</span>_mm_setr_ps<span style="color: #008000;">&#40;</span> <span style="color: #0000dd;">0</span>,  <span style="color: #0000dd;">0</span>,  <span style="color: #0000dd;">0</span>,  <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// All 0's</span>
	printv<span style="color: #008000;">&#40;</span>sign<span style="color: #008000;">&#40;</span>m<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// Mixed</span>
<span style="color: #008000;">&#125;</span></pre></td></tr></table></div>
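
<p>To make the bit trick inside sign easier to follow, here is a minimal scalar sketch of what happens to a single component. It is only a model and not SSE code: the helper names (lane_mask, to_bits, from_bits, sign_lane) are hypothetical and do not appear in the test above, and the sketch assumes 32-bit IEEE 754 floats.</p>

```cpp
#include <cstdint>
#include <cstring>

// Scalar model of one component of a cmp*ps result: all 32 bits set
// when the comparison holds, all bits clear otherwise.
inline std::uint32_t lane_mask(bool holds) {
	return holds ? 0xFFFFFFFFu : 0u;
}

// Reinterpret the bits of a float as an integer (hypothetical helper).
inline std::uint32_t to_bits(float f) {
	std::uint32_t u;
	std::memcpy(&u, &f, sizeof u);
	return u;
}

// Reinterpret an integer bit pattern as a float (hypothetical helper).
inline float from_bits(std::uint32_t u) {
	float f;
	std::memcpy(&f, &u, sizeof f);
	return f;
}

// One component of sign(): keep only the sign bit of x (AND with -0.0f),
// OR in the bits of 1.0f to get +1 or -1, then AND with the "x != 0"
// mask so that zero inputs collapse to 0.
inline float sign_lane(float x) {
	const std::uint32_t sign_bit = to_bits(x) & to_bits(-0.0f);
	const std::uint32_t one      = to_bits(1.0f);
	return from_bits((sign_bit | one) & lane_mask(x != 0.0f));
}
```

<p>Applying sign_lane to each component of the mixed vector (1, -2, 3, -4) gives (1, -1, 1, -1). Since every input in the test is a literal, this is the kind of constant a compiler could store directly instead of emitting the and/or/compare sequence at run time.</p>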

<p>A good compiler will eliminate those checks and store a constant copy of the result instead, especially where every comparison fails and the result is simply a zero vector. In the following test, we will check the code generated for several comparisons with constant results, ideally precomputed without sacrificing application size.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="asm" style="font-family:monospace;">MSVC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40400000</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@c0800000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@3f800000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@c0000000
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpeqps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR tv258<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpltps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">cmpltps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv258<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@3f800000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40400000</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40800000</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">40000000</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">0</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR tv271<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm4</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@<span style="color: #ff0000;">80000000</span>
	<span style="color: #b00040;">shufps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">0</span>
	<span style="color: #b00040;">movaps</span>	XMMWORD PTR tv272<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">orps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> XMMWORD PTR tv258<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">cmpneqps</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@c0400000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@c0800000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@bf800000
	<span style="color: #b00040;">movss</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff; font-weight: bold;">DWORD</span> PTR __real@c0000000
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> XMMWORD PTR tv258<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv272<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">orps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv271<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">cmpneqps</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> XMMWORD PTR tv258<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm3</span>
	<span style="color: #b00040;">unpcklps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv272<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">orps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv271<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">cmpneqps</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> XMMWORD PTR _m<span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv272<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> XMMWORD PTR tv258<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">orps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> XMMWORD PTR tv271<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #46aa03; font-weight: bold;">esp</span><span style="color: #339933;">+</span><span style="color: #ff0000;">64</span><span style="color: #009900; font-weight: bold;">&#93;</span>
	<span style="color: #b00040;">cmpneqps</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	_printv
&nbsp;
GCC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC2<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">rsp</span><span style="color: #009900; font-weight: bold;">&#41;</span>
	<span style="color: #b00040;">cmpeqps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpltps</span>	<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">rsp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">rsp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpltps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpneqps</span>	<span style="color: #339933;">.</span>LC3<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">movaps</span>	<span style="color: #339933;">.</span>LC0<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">xorps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">orps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpneqps</span>	<span style="color: #339933;">.</span>LC4<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpneqps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
	<span style="color: #b00040;">xorps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">rsp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpneqps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>	<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">rsp</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #339933;">.</span>LC0<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">orps</span>	<span style="color: #339933;">.</span>LC1<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #339933;">%</span>rip<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>	<span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>	printv
&nbsp;
	<span style="color: #339933;">.</span>LC0<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">2147483648</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">2147483648</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">2147483648</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">2147483648</span>
	<span style="color: #339933;">.</span>LC1<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span>
	<span style="color: #339933;">.</span>LC2<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3221225472</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1077936128</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3229614080</span>
	<span style="color: #339933;">.</span>LC3<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1065353216</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1073741824</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1077936128</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">1082130432</span>
	<span style="color: #339933;">.</span>LC4<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3212836864</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3221225472</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3225419776</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">3229614080</span>
&nbsp;
ICC<span style="color: #339933;">:</span>
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>8<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	addl      <span style="color: #0000ff; font-weight: bold;">$</span><span style="color: #ff0000;">4</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">esp</span>
	<span style="color: #b00040;">cmpeqps</span>   <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">xorps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpltps</span>   _2il0floatpacket<span style="color: #339933;">.</span>8<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>8<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">xorps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">cmpltps</span>   <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>9<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>9<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">andps</span>     _2il0floatpacket<span style="color: #339933;">.</span>10<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">xorps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">cmpneqps</span>  <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">orps</span>      _2il0floatpacket<span style="color: #339933;">.</span>11<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">andps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>12<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>10<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">andps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">orps</span>      _2il0floatpacket<span style="color: #339933;">.</span>11<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">xorps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">cmpneqps</span>  <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">xorps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">cmpneqps</span>  <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>10<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">movaps</span>    _2il0floatpacket<span style="color: #339933;">.</span>8<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">orps</span>      _2il0floatpacket<span style="color: #339933;">.</span>11<span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #b00040;">xorps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span>
	<span style="color: #b00040;">cmpneqps</span>  <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span>
	<span style="color: #b00040;">andps</span>     <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #339933;">%</span><span style="color: #46aa03; font-weight: bold;">xmm0</span>
	<span style="color: #00007f; font-weight: bold;">call</span>      printv
&nbsp;
	_2il0floatpacket<span style="color: #339933;">.</span>8<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xc0000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40400000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xc0800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>9<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40400000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x40800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>10<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>11<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>12<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0xbf800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xc0000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xc0400000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0xc0800000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>14<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x80000000</span>
	_2il0floatpacket<span style="color: #339933;">.</span>15<span style="color: #339933;">:</span>
		<span style="color: #339933;">.</span>long	<span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span><span style="color: #339933;">,</span><span style="color: #ff0000;">0x3f800000</span></pre></td></tr></table></div>

<p>None of the compilers optimized the comparisons, even though doing so could benefit the code to a large extent, especially when inlined. It&#8217;s worth noting that GCC merged some of the constants, eliminating 2 of the vectors that ICC left. ICC and GCC both optimized away useless ORs where possible, while MSVC simply followed the code intrinsic by intrinsic.</p>
<h2>Conclusion</h2>
<p>I keep hearing the catch-phrase among programmers that &#8220;the compiler is better than you [think].&#8221; I completely disagree with it and object to its use. Not only does it lead novice programmers to give the compiler credit in cases where no compiler can be expected to optimize, it also makes more advanced programmers lazy and inclined to believe the compiler knows what it&#8217;s doing.</p>
<p>Shown here is a case of using the so-called &#8216;intrinsics&#8217; to <em>guide</em> the compiler as opposed to <em>instructing</em> it. As seen in the above examples, only GCC (and to an extent, ICC) behaves the way we expect it to, though it still misses a few of the cases (such as merging shuffles and predicting vector branches). MSVC is most likely the worst example of an SSE-guided compiler &#8211; not only did it fail to optimize <em>any</em> of the tests, it also generated horrible assignment code which abused the stack most of the time and hurt performance by not utilizing the cache properly.</p>
<p>If you code using SSE intrinsics, I advise you to take a closer look at the generated code if you want maximum performance. Taking advantage of SSE for speed will bring a lot of satisfaction if used properly &#8211; instruction pairing, redundant arithmetic operations and redundant compares should be optimized by human beings most of the time, and you should not rely on the compiler to do that. Compilers are given much more credit than they deserve.</p>
<p>As a side note about GCC&#8217;s near perfection in code generation &#8211; I was quite surprised to see it surpass even Intel&#8217;s own compiler! It shows that even compiler writers, who know their own hardware and internal mechanisms, can overlook simple problems in the way humans think &#8211; redundancy in most cases. I highly recommend giving the newest GCC 4.4 a try. If you are on Linux, you most likely have GCC 4.3.x, or, if your distribution is an early bird (Gentoo, Fedora&#8230;), you might already have 4.4. Windows users are lucky: GCC 4.4 has been ported successfully to Windows in both the <a href="https://sourceforge.net/project/shownotes.php?release_id=691876">MinGW suite</a> and the <a href="http://www.tdragon.net/recentgcc/">TDM suite</a>. Mac users might have to compile GCC 4.4 themselves using Xcode&#8217;s compiler (which is actually GCC 4.0.1).</p>
<p>Happy optimizing!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/sse-intrinsics-optimizations-in-popular-compilers/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Securing your Debian server against slowloris</title>
		<link>http://www.liranuna.com/securing-your-debian-server-against-slowloris/</link>
		<comments>http://www.liranuna.com/securing-your-debian-server-against-slowloris/#comments</comments>
		<pubDate>Sat, 27 Jun 2009 02:26:38 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[Articles]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=949</guid>
		<description><![CDATA[I recently came across a very nasty DoS attack that any script kiddy can execute &#8211; called slowloris. It involves sending partial HTTP packets while Apache2 patiently waits for an unreasonable amount of time for the remaining data while consuming a thread, doing so continuously will prevent Apache2 from opening more threads and serving potential [...]]]></description>
				<content:encoded><![CDATA[<p>I recently came across a very nasty DoS attack that any script kiddie can execute, called <a href="http://ha.ckers.org/slowloris/">slowloris</a>. It involves sending partial HTTP requests while Apache2 patiently waits an unreasonable amount of time for the remaining data, consuming a thread for each one. Doing so continuously prevents Apache2 from opening more threads and serving potential web viewers.</p>
<p>One old remedy for this was supposedly <a href="http://www.zdziarski.com/projects/mod_evasive/">mod_evasive</a>, but it doesn&#8217;t really work against this specific type of attack, as it reacts too late to recognize it as an attack.</p>
<p>Very recently, an Apache module fixing this vulnerability was released &#8211; <a href="ftp://ftp.monshouwer.eu/pub/linux/mod_antiloris/">mod_antiloris</a>, but it&#8217;s made with a RedHat-based server in mind. Here are the steps to get it working on Debian or any other Debian-compatible server (such as Ubuntu).</p>
<p><span id="more-949"></span></p>
<p>First install the prerequisites. I assume you are using the threaded version of Apache, else you are not vulnerable to this type of attack.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">apt-get install</span> <span style="color: #c20cb9; font-weight: bold;">gcc</span> apache2-threaded-dev</pre></td></tr></table></div>

<p>Next, get the module source, extract it and compile:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">wget</span> <span style="color: #ff0000;">&quot;ftp://ftp.monshouwer.eu/pub/linux/mod_antiloris/mod_antiloris-0.3.tar.bz2&quot;</span>
<span style="color: #c20cb9; font-weight: bold;">tar</span> xvf mod_antiloris-<span style="color: #000000;">0.3</span>.tar.bz2
<span style="color: #7a0874; font-weight: bold;">cd</span> mod_antiloris-<span style="color: #000000;">0.3</span><span style="color: #000000; font-weight: bold;">/</span></pre></td></tr></table></div>

<p>The following command will end with an error &#8211; this is perfectly normal! Since apxs2 (the APache eXtenSion tool) for Debian isn&#8217;t modified to handle Debian-style modules, <strong>do not run it as root</strong>, as it will mess up your system, thinking it&#8217;s RedHat compatible.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">apxs2</span> <span style="color: #660033;">-a</span> <span style="color: #660033;">-i</span> <span style="color: #660033;">-c</span> mod_antiloris.c</pre></td></tr></table></div>

<p>Because apxs2 didn&#8217;t have permission to copy the module, we&#8217;ll do it ourselves:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">cp</span> .libs<span style="color: #000000; font-weight: bold;">/</span>mod_antiloris.so <span style="color: #000000; font-weight: bold;">/</span>usr<span style="color: #000000; font-weight: bold;">/</span>lib<span style="color: #000000; font-weight: bold;">/</span>apache2<span style="color: #000000; font-weight: bold;">/</span>modules<span style="color: #000000; font-weight: bold;">/</span>mod_antiloris.so</pre></td></tr></table></div>

<p>Now we&#8217;ll add a Debian-style .load file to load the module automatically:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">su</span> <span style="color: #660033;">-c</span> <span style="color: #ff0000;">&quot;echo 'LoadModule antiloris_module /usr/lib/apache2/modules/mod_antiloris.so' &gt; /etc/apache2/mods-available/antiloris.load&quot;</span></pre></td></tr></table></div>

<p>Then we&#8217;ll enable the module, Debian style:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> a2enmod antiloris</pre></td></tr></table></div>

<p>And reload Apache&#8217;s configuration and modules:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #000000; font-weight: bold;">/</span>etc<span style="color: #000000; font-weight: bold;">/</span>init.d<span style="color: #000000; font-weight: bold;">/</span>apache2 reload</pre></td></tr></table></div>

<p>This module mitigates the slowloris DoS attack, so I urge you to install it as soon as possible if you are using Apache as your HTTP server.</p>
<p>I would like to give credit where it is due: I did not develop this module, I just wrote instructions on how to make it Debian compatible, since it seems to be RedHat centric. The module was written and hosted by <a href="http://www.monshouwer.eu/index.php?url=adres">Kees Monshouwer</a>, for whom I cannot seem to find any official website.<br />
I hope this will help people as much as it helped me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/securing-your-debian-server-against-slowloris/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Sintia has a Twitter page</title>
		<link>http://www.liranuna.com/sintia-has-a-twitter-page/</link>
		<comments>http://www.liranuna.com/sintia-has-a-twitter-page/#comments</comments>
		<pubDate>Sat, 13 Jun 2009 08:09:12 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=947</guid>
		<description><![CDATA[Sintia, the suicidal MegaHAL IRC service bot I have run for over 6 years on EFnet, now has a Twitter page! Sintia will be updating it regularly. Check Sintia&#8217;s page here.]]></description>
				<content:encoded><![CDATA[<p>Sintia, the suicidal MegaHAL IRC service bot I have run for over 6 years on EFnet, now has a Twitter page! Sintia will be updating it regularly.</p>
<p>Check Sintia&#8217;s page <a href="http://twitter.com/sintiahal">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/sintia-has-a-twitter-page/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Lessons updated</title>
		<link>http://www.liranuna.com/lessons-updated/</link>
		<comments>http://www.liranuna.com/lessons-updated/#comments</comments>
		<pubDate>Fri, 01 May 2009 02:47:37 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=876</guid>
		<description><![CDATA[My aging NDS 2D lessons have been updated to conform to the latest version of libnds. They should now compile and run on a modern devkitARM toolchain. I hope someone will find this useful.]]></description>
				<content:encoded><![CDATA[<p>My aging <a href="http://www.liranuna.com/nds-2d-tuts/">NDS 2D lessons</a> have been updated to conform to the latest version of libnds. They should now compile and run on a modern devkitARM toolchain.</p>
<p>I hope someone will find this useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/lessons-updated/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>NDS Blending Demo</title>
		<link>http://www.liranuna.com/nds-blending-demo/</link>
		<comments>http://www.liranuna.com/nds-blending-demo/#comments</comments>
		<pubDate>Sun, 05 Apr 2009 02:16:09 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[Releases]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=858</guid>
		<description><![CDATA[First I would like to start with the fact that this demo was lying around in my HDD since Halloween. This demo was written to demonstrate how easy it is to utilize the DS&#8217;s hardware blending and create impressive effects with no effort. In this demo, the witch is flying in the sky, and whenever [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: center;"><img class="size-full wp-image-859 alignnone" title="blending-demo" src="http://www.liranuna.com/wordpress/wp-content/uploads/2009/04/blending-demo.png" alt="blending-demo" width="256" height="192" /></p>
<p style="text-align: left;">First, I would like to start with the fact that this demo has been lying around on my HDD since Halloween.</p>
<p style="text-align: left;">This demo was written to demonstrate how easy it is to utilize the DS&#8217;s hardware blending and create impressive effects with no effort. In this demo, the witch is flying in the sky, and whenever she passes in front of the moon, she turns black, as if silhouetted by the moonlight.</p>
<p style="text-align: left;">The demo is composed of 2 backgrounds and a sprite. The sprite is set to blend with the first background (the moon) with 0 blending, which results in the black color.</p>
<p style="text-align: left;">Download: <a href="http://www.liranuna.com/wordpress/wp-content/uploads/2009/04/blending-demo.zip">blending-demo</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/nds-blending-demo/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Rhythmbox &#8216;Now Playing&#8217; script for XChat</title>
		<link>http://www.liranuna.com/rhythmbox-now-playing-script-for-xchat/</link>
		<comments>http://www.liranuna.com/rhythmbox-now-playing-script-for-xchat/#comments</comments>
		<pubDate>Sat, 28 Mar 2009 04:39:58 +0000</pubDate>
		<dc:creator>LiraNuna</dc:creator>
				<category><![CDATA[Releases]]></category>

		<guid isPermaLink="false">http://www.liranuna.com/?p=846</guid>
		<description><![CDATA[This is a small script I wrote for myself when using XChat and Rhythmbox. Ever since the latest Rhythmbox release, there has been an undocumented feature in rhythmbox-client to print the string received from shoutcast streams, such as my favorite di.fm radio, which I normally have on. I found several xchat-rhythmbox announcers but they all [...]]]></description>
				<content:encoded><![CDATA[<p>This is a small script I wrote for myself when using <a href="http://www.xchat.org/">XChat</a> and <a href="http://projects.gnome.org/rhythmbox/">Rhythmbox</a>.</p>
<p>Ever since the latest Rhythmbox release, there has been an undocumented feature in rhythmbox-client to print the string received from shoutcast streams, such as my favorite di.fm radio, which I normally have on. I found several xchat-rhythmbox announcers, but they all lacked the ability to determine whether Rhythmbox is currently streaming music or playing a music file.</p>
<p>Now that I actually have free time, I could write a small script to do exactly what I wanted, and I&#8217;ve decided to share it. The source/script is released under the terms of the <a href="http://sam.zoy.org/wtfpl/">WTFPL</a>.</p>
<p>Download link: <a href="http://www.liranuna.com/wordpress/wp-content/uploads/2009/03/rhythmbox_nowplaying.tar.gz">rhythmbox_nowplaying.tar.gz</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.liranuna.com/rhythmbox-now-playing-script-for-xchat/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
