<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1935045230982980752</id><updated>2012-02-16T11:38:44.397-08:00</updated><title type='text'>Hexley in Motion</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://hexley.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1935045230982980752/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://hexley.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Hexley</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1935045230982980752.post-1036325724707484703</id><published>2009-01-28T19:59:00.000-08:00</published><updated>2009-01-28T20:47:54.056-08:00</updated><title type='text'>The Pipeline</title><content type='html'>&lt;span class="Apple-style-span"  style="font-family:verdana;"&gt;What if I told you that the best thing that happened to parallel programming was the following:&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span 
class="Apple-style-span"  style="font-size:small;"&gt;main()&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;{&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    for (each unique func1, func3, and inArray1 dataset)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    {&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;        "setup" func1, func3, and inArray1...&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;        compute();&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="  "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    }&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span 
class="Apple-style-span"  style="font-size:small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="  "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;compute()&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;{&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="  "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    for (each element in inArray1)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:arial;"&gt;&lt;span class="Apple-style-span"  style=" ;font-family:Georgia;"&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" 
"&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;        outArray1[i] = func1(inArray1[i]);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    }&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    // func2 is a well known transformation&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    // that takes input from outArray1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    // and outputs to outArray2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  
style="font-size:small;"&gt;    func2(outArray1[], outArray2[]);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    for (each element in outArray2)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    {&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;        outArray3[i] = func3(outArray2[i]);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;    }&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  
style="font-size:small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"   style=" ;font-family:'courier new';font-size:13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style=" "&gt;&lt;span class="Apple-style-span" style="  "&gt;&lt;span class="Apple-style-span" style="font-family: verdana;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;Programmers are free to implement main, but have no control over the structure of the compute function beyond implementing func1 and func3. The for loops in the compute function execute very fast because they run on dedicated parallel hardware that processes as many elements at a time as it can. &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: verdana;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: verdana;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;It's the best thing that happened to parallel programming because its structure ensures there are no data dependencies between the elements of each loop, so no synchronization primitives are needed. &lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: verdana;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: verdana;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;As you can guess by now, func1 is our vertex program, func2 is the GPU rasterizer, and func3 is our fragment program. 
And it is the way people program real-time graphics right now.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1935045230982980752-1036325724707484703?l=hexley.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hexley.blogspot.com/feeds/1036325724707484703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://hexley.blogspot.com/2009/01/pipeline.html#comment-form' title='39 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1935045230982980752/posts/default/1036325724707484703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1935045230982980752/posts/default/1036325724707484703'/><link rel='alternate' type='text/html' href='http://hexley.blogspot.com/2009/01/pipeline.html' title='The Pipeline'/><author><name>Hexley</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>39</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1935045230982980752.post-4216887548081427870</id><published>2009-01-16T23:40:00.000-08:00</published><updated>2009-01-17T12:10:28.169-08:00</updated><title type='text'>The future of real-time rendering</title><content type='html'>&lt;span style="font-family:verdana;"&gt;It seems like everyone in the industry is speculating about where real-time rendering is heading these days. People are either really excited about &lt;span id="SPELLING_ERROR_0" class="blsp-spelling-error"&gt;Larrabee&lt;/span&gt; or really doubtful that Intel can compete with &lt;span id="SPELLING_ERROR_1" class="blsp-spelling-error"&gt;ATI&lt;/span&gt;/&lt;span id="SPELLING_ERROR_2" class="blsp-spelling-error"&gt;NVIDIA&lt;/span&gt;. 
After all, Intel is the same company that brought us the i740 and the beloved Intel &lt;span id="SPELLING_ERROR_3" class="blsp-spelling-error"&gt;IGP&lt;/span&gt; (integrated graphics processor). The way I see it, whether Intel will succeed or not depends on one question: "Do we need more &lt;span id="SPELLING_ERROR_4" class="blsp-spelling-error"&gt;programmability&lt;/span&gt; in real-time graphics?" If you believe that the current state of affairs, namely triangles, maps (textures, light maps, normal maps, etc.), and &lt;span id="SPELLING_ERROR_5" class="blsp-spelling-error"&gt;shaders&lt;/span&gt; (vertex and fragment/pixel), is all we need, then it's quite possible that &lt;span id="SPELLING_ERROR_6" class="blsp-spelling-error"&gt;Larrabee&lt;/span&gt; will go down in history as another one of Intel's failed attempts. But if &lt;span id="SPELLING_ERROR_7" class="blsp-spelling-error"&gt;DX&lt;/span&gt;10 and &lt;span id="SPELLING_ERROR_8" class="blsp-spelling-error"&gt;DX&lt;/span&gt;11 level hardware has any appeal to you at all, then what you need to ask is: why am I stuck with the pipeline?&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Make no mistake, there was nothing magical about the &lt;span id="SPELLING_ERROR_9" class="blsp-spelling-error"&gt;GPU&lt;/span&gt; trumping the CPU in graphics performance back when the &lt;span id="SPELLING_ERROR_10" class="blsp-spelling-error"&gt;GPU&lt;/span&gt; was introduced. The &lt;span id="SPELLING_ERROR_11" class="blsp-spelling-error"&gt;GPU&lt;/span&gt; is fast for graphics because it's a &lt;span id="SPELLING_ERROR_12" class="blsp-spelling-corrected"&gt;parallel&lt;/span&gt; machine on a chip and raster graphics is an &lt;span id="SPELLING_ERROR_13" class="blsp-spelling-corrected"&gt;inherently&lt;/span&gt; &lt;span id="SPELLING_ERROR_14" class="blsp-spelling-corrected"&gt;parallel&lt;/span&gt; problem. 
Everything in the pipeline comes in multiples: &lt;span id="SPELLING_ERROR_15" class="blsp-spelling-error"&gt;vertices&lt;/span&gt;, maps, and pixels. In the end, they are all arrays of data. Better yet, they are arrays of data that you perform similar tasks on. &lt;span id="SPELLING_ERROR_16" class="blsp-spelling-error"&gt;GPU&lt;/span&gt; architects observed this, and rightfully optimized for data and instruction &lt;span id="SPELLING_ERROR_17" class="blsp-spelling-corrected"&gt;parallelism&lt;/span&gt;. &lt;span id="SPELLING_ERROR_18" class="blsp-spelling-error"&gt;CPUs&lt;/span&gt; never had the luxury of building in such &lt;span id="SPELLING_ERROR_19" class="blsp-spelling-corrected"&gt;parallelism&lt;/span&gt;. The CPU's task was to be the conductor, and the trait of a conductor is the ability to branch fast and keep latency low when computing. It rarely sees an array of thousands or millions of elements on which it can perform the same instructions. Cache is the &lt;span id="SPELLING_ERROR_20" class="blsp-spelling-error"&gt;CPU's&lt;/span&gt; answer to reducing latency: because everything is unpredictable, the more memory it can pack into its cache, the faster it can fetch and thus compute. Given the chance, a CPU architect would rather spend silicon on cache, to improve the odds of reducing latency, than on computation. (The &lt;span id="SPELLING_ERROR_21" class="blsp-spelling-error"&gt;flipside&lt;/span&gt; is that the bigger the cache, the slower it is. This is why cache comes in levels.) The &lt;span id="SPELLING_ERROR_22" class="blsp-spelling-error"&gt;GPU&lt;/span&gt; is almost the exact opposite: it has little need for cache, because the CPU sends it a stream that is the definition of predictable. The vertex &lt;span id="SPELLING_ERROR_23" class="blsp-spelling-error"&gt;shader&lt;/span&gt; sends another predictable stream to the geometry setup unit. 
The &lt;span id="SPELLING_ERROR_24" class="blsp-spelling-error"&gt;rasterizer&lt;/span&gt; sends a stream of quads to the fragment &lt;span id="SPELLING_ERROR_25" class="blsp-spelling-error"&gt;shader&lt;/span&gt;. The fragment &lt;span id="SPELLING_ERROR_26" class="blsp-spelling-error"&gt;shader&lt;/span&gt; accumulates results in color buffers (tiling these further promotes uniform access). There may be branches along the way through the stages, but you can bet that there's more predictability in each stage than there is randomness. What's the point of all this? To illustrate that CPU and &lt;span id="SPELLING_ERROR_27" class="blsp-spelling-error"&gt;GPU&lt;/span&gt; architects take the same computer architecture classes, and that there isn't some black magic sauce that allows &lt;span id="SPELLING_ERROR_28" class="blsp-spelling-error"&gt;ATI&lt;/span&gt;/&lt;span id="SPELLING_ERROR_29" class="blsp-spelling-error"&gt;NVIDIA&lt;/span&gt; chips to be fast. It is simply optimizing for the use case. 
&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;This post is getting long, so I will rant about being stuck in the pipeline next time...&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1935045230982980752-4216887548081427870?l=hexley.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://hexley.blogspot.com/feeds/4216887548081427870/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://hexley.blogspot.com/2009/01/future-of-real-time-rendering.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1935045230982980752/posts/default/4216887548081427870'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1935045230982980752/posts/default/4216887548081427870'/><link rel='alternate' type='text/html' href='http://hexley.blogspot.com/2009/01/future-of-real-time-rendering.html' title='The future of real-time rendering'/><author><name>Hexley</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
