Fix what was requested to be fixed and finish some unfinished sentences, add an explanation of the clock generation algorithm
@@ -31,7 +31,7 @@ This is the first part of a series of blog posts that aims to demystify, once an
## A brief history
### Beginnings
NVidia was conceived in 1992 by three engineers from LSI Logic and Sun Microsystems - Jensen Huang (now one of the world's richest men, still the CEO and, apparently, mobbed by fans in the country of his birth, Taiwan), Curtis Priem (whose boss almost convinced him to work on Java instead of founding the company) and Chris Malachowsky (a veteran of graphics chip development). They saw a business opportunity in the PC graphics and audio market, which was dominated by low-end, high-volume players such as S3 Graphics, Tseng Labs, Cirrus Logic and Matrox (the only one that still exists today - after exiting the consumer graphics market in 2003 and ceasing to design graphics cards entirely in 2014). The company was formally founded on April 5, 1993, after all three left their jobs at LSI Logic and Sun between December 1992 and March 1993. It immediately (well, after the requisite $3 million of venture capital funding was acquired - a little nepotism owing to their reputation helped) began work on its first-generation graphics chip, making it one of the first in a rush of dozens of companies attempting to develop graphics cards - both established players in the 2D graphics market, such as Number Nine and S3, and new companies, almost all of which no longer exist and many of which failed to even release a single graphics card. The chip was initially named GXNV ("GX next version", after a graphics card whose development Malachowsky led at Sun), but Huang asked for it to be renamed to NV1 in order to not get sued. This also inspired the name of the company - NVidia - after other names such as "Primal Graphics" and "Huaprimal" were considered and rejected, and their originally chosen name, Invision, turned out to have been trademarked by a toilet paper company. In a perhaps ironic twist of fate, toilet paper turned out to be an apt metaphor for the sales, if not quality, of their first product, which Jensen Huang appears to be embarrassed to discuss when asked, and he has been quoted as saying "You don't build NV1 because you're great". The product was released in 1995 after a two-year development cycle and the creation, in 1994, of what Nvidia dubbed a hardware simulator - which actually appears to have been simply a set of Windows 3.x drivers, called the NV0, intended to emulate their architecture.
### The NV1
The NV1 was a combination graphics, audio, DRM (yes, really) and game port card implementing what Nvidia dubbed the "NV Unified Media Architecture (UMA)"; the chip was manufactured by SGS-Thomson Microelectronics - now STMicroelectronics - on the 350 nanometer node, who also white-labelled Nvidia's design (except the DAC, which seems to have been designed by SGS, at least based on the original contract text from 1993) as the STG-2000 (without audio functionality - this was also called the "NV1-V32", for 32-bit VRAM, in internal documentation, with Nvidia's own being the NV1-D64). The card was designed to implement a reasonable level of 3D graphics functionality, as well as audio, public-key encryption for DRM purposes (which was never used, as it would have required the cooperation of software companies) and Sega Saturn game ports, all within a single megabyte of memory (memory cost $50 a megabyte when the initial design of the NV1 chip began in 1993). In order to achieve this, many techniques had to be used that ultimately compromised the quality of the card's 3D rendering, such as forward texture mapping, where a texel (a pixel of a texture) is mapped directly to a point on the screen, instead of the more traditional inverse texture mapping, which iterates through screen pixels and maps texels onto them. While this has memory space advantages (the texture can very easily be cached in the very limited amount of VRAM Nvidia had to work with), it has many more disadvantages - most notably, this approach does not support UV mapping (a special coordinate system used to map textures to three-dimensional objects) and other aspects of what would today be considered basic graphical functionality. Additionally, the fundamental implementation of 3D rendering used quad patching instead of the traditional triangle-based approach - this has very advantageous implications for things like curved surfaces, and may have been a very effective design for the CAD/CAM customers purchasing more high-end 3D products, but it turned out to not be particularly useful for the actual intended target market - gaming. There was also a total lack of SoundBlaster compatibility (required for audio to sound half-decent in many games) in the audio engine, and partially-emulated and very slow VGA compatibility, which led to slow performance in the games people *actually played* - unless your favourite game was a crappier, slower version of Descent, Virtua Cop or Daytona USA for some reason. Another body blow to Nvidia was received when Microsoft released Direct3D in 1996 with DirectX 2.0, which simultaneously used triangles, became the standard 3D API and killed all of the numerous non-OpenGL proprietary 3D APIs, including S3's S3D and later Metal, ATI's 3DCIF, and Nvidia's NVLIB.
@@ -47,7 +47,7 @@ At this point, Nvidia had no sales, no customers, and barely any money (at some
However, this was nothing compared to the body blow about to hit the entire industry, Nvidia included. At a conference in early 1996, an $80,000 SiliconGraphics (then the world leader in accelerated graphics) machine crashed during a demo by the then-CEO Ed McCracken. While the machine was being rebooted, if accounts of the event are to be believed, people started leaving, many of them - based on rumours they had heard - heading downstairs to another demo by a then-tiny company made up of ex-SGI employees calling itself "3D/fx" (later shortened to 3dfx), which was claiming comparable graphics quality for $250...and had demos to prove it. Most supposed "wonder innovations" in the tech industry turn out to be too good to be true, but when their card, the "Voodoo Graphics", was first released in the form of Orchid's "Righteous 3D" in October 1996, the claims held up. Despite the fact that it was a 3D-only card that required a separate 2D card to be installed, and the fact that it could not accelerate graphics in a window (which almost all other 3D cards could do), the card's performance was so high relative to the other efforts (including the NV1) that it not only received rave reviews on its own but kicked off a revolution in consumer 3D graphics, which especially caught fire when GLQuake was released in January 1997.
The reasons that 3dfx was able to design such an effective GPU when all others failed are numerous. The price of RAM plummeted by 80% through 1996, which allowed 3dfx to cut their estimated retail price for the Voodoo from $1000 to $300. Many of their staff members came from what was at the time perhaps the most respected, and certainly the largest, company in the graphics industry, SiliconGraphics (which by 1997 had over fifteen years of experience in developing graphical hardware). And while 3dfx used the proprietary Glide API, it also supported OpenGL and Direct3D - Glide was designed to be very similar to OpenGL while allowing 3dfx to approximate standard graphical techniques. This, together with their driver design, kept the hardware itself simple: the Voodoo only accelerates edge interpolation (where a triangle is converted into "spans" of horizontal lines, and the positions of nearby vertexes are used to determine each span's start and end positions), texture mapping and blending, span interpolation (which, to simplify a complex topic, in a GPU of this era generally involves z-buffering, also known as depth buffering - keeping track of which pixels are in front of which - and colour buffering - storing the colour of each pixel sent to the screen in a buffer, which allows for blending and alpha transparency), and the final presentation of the rendered 3D scene - the rest was all done in software. All of these reasons were key to the low price and high quality of the card.
Effectively, Nvidia had to design a graphics architecture that could at the very least get close to 3dfx's performance, on a shoestring budget, with very few resources (60% of the staff, including the entire sales and marketing teams, having been laid off to preserve money). Since they did not have the time, they could not completely redesign the NV1 from scratch even if they felt the need to - that would take two years, time that Nvidia didn't have, and any design that came out of such an effort would be immediately obsoleted by other companies, such as 3dfx with its Voodoo line and ATI with its initially rather pointless, but rapidly advancing in performance and driver stability, Rage series of chips. On top of that, the chip would have to work reasonably well on the first tapeout, as they simply did not have the capital to produce more revisions of the chip. The fact that they were able to achieve a successful design in the form of the NV3 under such conditions is testament to the intelligence, skill and luck of Nvidia's designers. Later on in this blog post, we will explore how they managed to achieve this.
@@ -56,15 +56,15 @@ It was with these financial, competitive and time constraints in mind that desig
After the NV2 disaster, the company made several calls on the NV3's design that turned out to be very good decisions. First, they acquiesced to Sega's advice (which they might have done already, but too late to save the Mutara V08/NV2) and moved to an inverse-texture-mapping, triangle-based model (although some remnants of the original quad patching design remain), and removed the never-used DRM functionality from the card. This may have been assisted by the replacement of Curtis Priem as chief designer with the rather egg-shaped David Kirk, perhaps notable as a "Special Thanks" credit on Gex and the producer of the truly unparalleled *3D Baseball* on the Sega Saturn during his time at Crystal Dynamics - Priem had insisted on including the DRM functionality in the NV1 because, back when he worked at Sun, the game he had written as a demo of the GX GPU designed by Malachowsky was regularly pirated. Another decision that turned out to pay very large dividends was forgoing a native API entirely and instead building the card around accelerating the most popular graphical APIs - which led to an initial focus on Direct3D (although OpenGL drivers were first publicly released in alpha form in December 1997, and released fully in early 1998). Initially DirectX 3.0 was targeted, but 5.0 came out late during the development of the chip (4.0 was cancelled due to lack of developer excitement about its functionality), and the chip is mostly Direct3D 5.0 compliant (with the exception of some blending modes such as additive blending, which Jensen Huang later claimed was due to Microsoft not giving them the specification in time). This was made much easier by the design of their driver, which allowed, and still allows, graphical APIs to be plugged in as "clients" to the Resource Manager kernel - as I mentioned earlier, this will be explained in full detail later. The VGA core (which was so separate from the main GPU on the NV1 that it had its own PCI ID) was replaced by a VGA core licensed from Weitek (who would soon exit the graphics market), placed in the chip parallel to the main GPU with its own 32-bit bus; this massively accelerated performance in unaccelerated VESA titles, like Doom, and provided a real advantage over the 3D-only 3dfx cards (3dfx did have a combination card, the SST-96 or Voodoo Rush, but it used a crappy Alliance 2D chip and was generally considered a failure). Finally, Huang, in his capacity as the CEO, allowed the chip to be expanded (in terms of physical size and number of gates) from its original specification, allowing for a more complex design with more features.
The initial revision of the architecture appears to have been completed in January 1997. Then, aided by hardware simulation software (unlike the NV0, an actual hardware simulation) purchased from another almost-bankrupt company, an exhaustive test set was completed. The first bugs presented themselves almost immediately: the "C" character in the MS-DOS codepage appeared incorrectly, Windows took 15 minutes to boot, and moving the mouse cursor required a map of the screen so you didn't lose it by moving too far, but ultimately the testing was completed. However, Nvidia didn't have the money to respin the silicon for a second stepping if problems appeared, so it had to work at least reasonably well in the first stepping. Luckily for Nvidia, when the card came back it worked well enough to be sold to Nvidia's board partners (almost certainly thanks to that hardware simulation package), and the company survived - most accounts indicate it was only three or four weeks away from bankruptcy; when 3dfx saw the RIVA 128 at its reveal at the CGDC 1997 conference, the response of one of the founders was "You guys are still around?". Nvidia's financial problems were so severe that 3dfx almost *bought* Nvidia, effectively for the purpose of killing the company as a theoretical competitor, but refused, as they assumed Nvidia would be bankrupt within months anyway (a disastrous decision). However, this revision of the chip - revision A - was not the revision that Nvidia actually commercialised. SGS-Thomson dropped the plans for the STG-3000 at some point, which led Nvidia, now flush with cash (revenue in the first nine months of 1997 was only $5.5 million, but skyrocketed up to $23.5 million in the last three months - the first three-month period of the RIVA 128's availability - owing to the numerous sales of RIVA 128 chips to add-in board partners), to create a new revision of the chip that removed the sound functionality (although some remnants of it were left behind); some errata were also fixed and other minor adjustments made to the silicon - there are mentions of quality problems with early cards in a lawsuit filed against STB Systems (who were the first OEM partner for the RIVA 128), though it is not clear whether the problems were on STB's or Nvidia's end. The respun revision B silicon was completed in October 1997 and was presumably available a month or two later. It is most likely that some revision A cards were sold at retail, but based on the dates, these would have to be very early units, with the earliest Nvidia RIVA 128 drivers that I have discovered (labelled as "Version 0.75") dated August 1997 (these also have NV1 support - and are actually the only Windows NT drivers with NV1 support), and reviews starting to drop on websites like Anandtech in the first half of September 1997. There are no known drivers for the audio functionality in revision A of the RIVA 128, so anyone wishing to use it would have to write custom drivers.
The card generally reviewed quite well at launch and was considered the fastest graphics card released in 1997, with raw speed, although not video quality, higher than the Voodoo1. Most likely, the lower quality of the NV3 architecture's graphics output owes much to the card's rushed development (due to Nvidia's financial situation), which led to shortcuts being taken in the GPU design process in order to ship on time. For example, some of the Direct3D 5.0 blending modes are not supported, and per-polygon mipmapping, a graphical technique involving scaling down textures as you move away from an object in order to prevent shimmering, was used instead of the more accurate per-pixel approach, causing seams between different mipmapping levels. The dithering quality and the quality of the RIVA 128's bilinear texture filtering were also often criticised. Furthermore, some games exhibited seams between polygons, and the drivers were generally very rough at launch, especially if the card was installed as an upgrade and the previous card's drivers were not fully removed. While Nvidia was able to fix many of the driver issues by the time of the version 3.xx drivers, which were released in 1998 and 1999, and even wrote a fairly decent OpenGL ICD, the standards for graphical quality had risen over time, and what was considered "decent" in 1997 was considered "bad" or even "awful" by 1999. Nevertheless, over a million units were sold within a few months and Nvidia's immediate existence as a company was secured; an enhanced version (revision C, also called "NV3T"), branded as the RIVA 128 ZX, was released in March 1998 in order to compete with a hypothetically very fast and much-hyped, but not actually very good, card - the Intel/Lockheed Martin i740 chip (the predecessor of the universally detested Intel iGPUs). As of 2024, Intel has finally managed to produce a graphics card people actually want to buy (and aren't forced to use due to a lack of financial resources, as with their iGPUs), the mid-range Intel Arc B580, based on the "Battlemage" architecture, Intel's 16th-generation GPU architecture (or in Intel parlance, the 13th, because of great names such as "Generation 12.7"). Better late than never, I guess.
After all of this history and exposition, we are finally ready to actually explore the GPU behind the RIVA 128 series. I refer to it as NV3, as NV3 is the name of the architecture behind it (and can be used to refer to all cards manufactured using it: the STG-3000, if a prototype form of one ever turns out to exist, the RIVA 128, and the RIVA 128 ZX). Note that the architecture of the 32-bit Weitek core will not be discussed at length here unless it is absolutely required. It is pretty much a standard SVGA core, and really is not that interesting compared to the main GPU. It is not even substantially integrated with the main GPU, although there are a few areas in the design that allow the main GPU to write directly to the Weitek core's registers.
## Architectural Overview
The NV3 is the third generation of the NV architecture designed by Nvidia in 1997, commercialised as the RIVA 128 (or RIVA 128 ZX). It implements a "partially" (by modern standards; by the standards of 1997 it was one of the more fully featured and complete accelerators available) hardware-accelerated, fixed-function 2D and 3D render path, primarily aimed at desktop software and video games. It can use the legacy PCI 2.1 bus, or the then-brand-new AGP 1X bus, with the RIVA 128 ZX improving this further to AGP 2X. The primary goals of the architecture were to be cheap to manufacture, to be completed quickly (due to the very bad financial condition of Nvidia at that time), and to beat the 3dfx Voodoo1 in raw pixel-pushing performance. It generally achieved these goals, with some caveats, with a cost of $15 per chip in bulk, a design period of somewhere around nine months (excluding Revision B), and mostly better-than-Voodoo performance (although the Glide API did help 3dfx out); the Nvidia architecture is much more efficient at drawing small triangles, but this rapidly drops off to slightly-better-than-Voodoo raw performance when drawing larger triangles (which, given the RIVA's higher clock speed, probably makes it less efficient per clock overall). While the focus of study has been the Revision B card, efforts have been made to understand both the A and C revisions. Depending on the revision, the NV_PFB_BOOT_0 register in MMIO space (at offset `0x100000`) returns the following values:
| Revision | NV_PFB_BOOT_0 value |
| -------- | ------------------- |
@@ -84,8 +84,7 @@ Furthermore, the PCI configuration space Revision ID register must return the fo
There is a common misconception that the PCI ID is different on RIVA 128 ZX chips. This is partially true, but misleading. The standard NV3 architecture uses a PCI vendor ID of `0x12D2` (labelled as "SGS/Thomson-Nvidia joint venture" - not the later Nvidia vendor ID!) and `0x0018` for the device ID. If ACPI is enabled on a RIVA 128 ZX, the device ID changes to `0x0019`. However, the presence of a `0x0019` device ID is not sufficient: the revision must be C, or `0x20`, for a RIVA 128 ZX to be detected, and the specific device ID does not matter. This has been verified by reading both reverse-engineered VBIOS and driver code. The device ID can be either value; the best way to check is to use the revision ID encoded into the board at manufacturing time (either via the NV_PFB_BOOT_0 register, or the PCI configuration space registers).
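To make that detection rule concrete, here is a minimal C sketch of the check described above. The `pci_read_*` helpers are placeholders for whatever configuration-space access mechanism is available (they are not real driver functions); the ID and revision values are the ones quoted in the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define NV3_VENDOR_ID       0x12D2  /* "SGS/Thomson-Nvidia joint venture" */
#define NV3_DEVICE_ID       0x0018  /* standard NV3 device ID             */
#define NV3_DEVICE_ID_ACPI  0x0019  /* RIVA 128 ZX with ACPI enabled      */
#define NV3_REVISION_C      0x20    /* revision C = RIVA 128 ZX (NV3T)    */

/* Placeholder helpers - assume some way of reading the card's PCI
   configuration space exists; these names are made up for illustration. */
extern uint16_t pci_read_vendor_id(void);
extern uint16_t pci_read_device_id(void);
extern uint8_t  pci_read_revision_id(void);

/* True if the installed NV3 card is a RIVA 128 ZX. The device ID is only
   used to confirm the card is an NV3 at all - the ZX check itself relies
   purely on the revision ID. */
static bool nv3_is_riva_128_zx(void)
{
    uint16_t vendor = pci_read_vendor_id();
    uint16_t device = pci_read_device_id();

    if (vendor != NV3_VENDOR_ID)
        return false;
    if (device != NV3_DEVICE_ID && device != NV3_DEVICE_ID_ACPI)
        return false;

    return pci_read_revision_id() == NV3_REVISION_C;
}
```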
The NV3 architecture incorporates accelerated triangle setup (which the Voodoo Graphics only implements around two thirds of), the aforementioned span and edge interpolation, texture mapping, blending, and final presentation. It does not accelerate the initial polygon transformation or lighting phases. It is capable of rendering in 2D at a resolution of up to 1280x1024 (at least 1600x1200 on the ZX - I am not sure of the exact limit) in 32-bit colour. 3D rendering is only possible in 16-bit colour, and at 960x720 or lower on a 4MB card due to a lack of VRAM. While 2MB and even 1MB cards were planned, they were seemingly never released - the level of pain of using them can only be imagined; there were also low-end cards released that only used a 64-bit memory bus, handled using a manufacture-time configuration mechanism, sometimes exposed via DIP switches, known as the straps, which will be explained in Part 2. The RIVA 128 ZX, to compete with the i740, had, among other changes that will be described later, an increased amount of VRAM (8 megabytes) that also allowed it to perform 3D rendering at higher resolutions of up to 1280x1024. The design of the RIVA is very complex compared to other contemporaneous video cards; I am not sure why such a complex design was used, but it was inherited from the NV1 - the only real reason I can think of is that the overengineered design is intended to be future-proof and easy to enhance without requiring complete rewiring of the silicon, as many other companies had to do. EDID is supported for monitor identification via an entirely software-programmed I2C bus. The GPU is split into a large number (around a dozen) of subsystems (or really "functional blocks", since they are implemented as hardware), the names of which all start with the letter "P" for some reason; some examples of subsystems are `PGRAPH`, `PTIMER`, `PFIFO`, `PRAMDAC` and `PBUS` - presumably, a subsystem has a 1:1 mapping with a functional block on the GPU die, since the registers are named after the subsystem that they are a part of. There are several hundred different registers across the entire graphics card, so things are necessarily simplified for brevity, at least in Part 1. To be honest, the architecture of this graphics card is too complicated to show in a diagram without simplifying things so much as to be effectively pointless, or complicating it to the point of not being useful (I tried!), so a diagram has not been provided.
### Fundamental Concept: The Scene Graph
In order to begin to understand the Nvidia NV3 architecture, you have to understand the fundamental concept of a scene graph. Although the architecture does not strictly implement one, the concept is still useful for understanding how graphical objects are represented by the GPU. A scene graph is a form of tree where the nodes are graphical objects, and the properties of a parent object cascade down to its children; this is how almost all modern game engines represent 3D space (Unity, Unreal, Godot...). A very easy way to understand how a scene graph works (with the caveat that, in this case, characteristics of parent nodes do not automatically cascade down to their children, although they can) is - I am not joking - to install Roblox Studio, place some objects into the scene, and save the file as an "RBXLX" file (it has to be RBXLX, as by default the engine has exported a binary format since 2013, although the structure is similar). Then, open it in a text editor of your choice. You will see an XML representation of the scene you have created, structured as a scene graph.
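If you would rather not install Roblox Studio, the same idea can be boiled down to a tiny C sketch (the type and field names here are purely illustrative, not anything the NV3 or its drivers actually define):

```c
#include <stddef.h>

/* A purely illustrative scene graph node: every node is a graphical object
   with its own properties, a parent and a list of children. Resolving a
   node's final state means walking up the tree and combining (cascading)
   the properties of its ancestors with its own. */
typedef struct scene_node {
    const char         *name;               /* e.g. "Camera", "Part"      */
    float               local_transform[16]; /* 4x4 transform, row-major  */
    struct scene_node  *parent;             /* NULL for the tree's root   */
    struct scene_node **children;           /* pointers to child nodes    */
    size_t              child_count;
} scene_node;
```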
@@ -93,7 +92,11 @@ In order to begin to understand the Nvidia NV3 architecture, you have to underst
The concept of the scene graph is almost certainly how the functional block of all Nvidia GPUs that actually implements the 2D and 3D drawing engine that makes the GPU, well, a GPU, received its name: `PGRAPH`. This part has survived all the way from the very first NV1 to the Blackwell architecture, powering Nvidia's latest AI-focused GPUs and the brand-new RTX 5000 series of consumer-focused GPUs (Nvidia has not done a ground-up redesign since they started development of their initial NV1 architecture in 1993, although the Ship of Theseus argument applies here).
### Clocks
The RIVA 128 is not dependent on the host clock of the machine that it is inserted into. It has (depending on boot-time configuration) a 13.5 or 14.3 megahertz clock crystal, from which the hardware derives the memory clock (MCLK) and the video clock (VCLK). Note that these names are misleading: the memory clock also drives the actual rendering and timing on the card, with VCLK seemingly just handling the actual pushing out of frames. The clocks are controlled by registers in `PRAMDAC` set by the Video BIOS (which does not otherwise play a serious role in this particular iteration of the Nvidia architecture - it only performs a very basic POST sequence, initialises the card and sets its clock speeds; after the card is initialised, the VBIOS is effectively never needed again, although there are mechanisms to read from it after initialisation), and can later be overridden by the drivers. Each clock was configured by the OEM manufacturer using three parameters (`m`, `n` and `p`), from which the card generates the final memory and pixel clock speeds using the following algorithm:
`(frequency * nv3->pramdac.pixel_clock_n) / (nv3->pramdac.pixel_clock_m << nv3->pramdac.pixel_clock_p);`
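In other words, the crystal frequency is scaled up by `n`, divided back down by `m`, and then divided by a power of two selected by `p`. As a small, self-contained C sketch of the same formula (the function name and the exact 14.3 MHz crystal value are my own assumptions):

```c
#include <stdint.h>

/* Reference crystal frequencies selected at boot time by the straps.
   13.5 MHz is exact; the "14.3 MHz" crystal is assumed here to be the
   standard 14.31818 MHz part. */
#define NV3_CRYSTAL_13_5_MHZ  13500000u
#define NV3_CRYSTAL_14_3_MHZ  14318180u

/* Computes a clock (MCLK or VCLK) in Hz from its m/n/p parameters,
   mirroring (frequency * n) / (m << p). */
static uint32_t nv3_clock_hz(uint32_t crystal_hz, uint32_t m, uint32_t n, uint32_t p)
{
    /* 64-bit intermediate so crystal_hz * n cannot overflow 32 bits. */
    return (uint32_t)(((uint64_t)crystal_hz * n) / ((uint64_t)m << p));
}
```

Plugging in the typical targets mentioned below (a memory clock of around 100 MHz and a pixel clock of around 40 MHz) gives a feel for the ranges the OEM-chosen `m`, `n` and `p` values have to hit.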
The RAMDAC in the card, which handles the final conversion of the digital image generated by the GPU into an analog video signal as well as clock generation (via three phase-locked loops), has its own clock (ACLK) that ran at around 200 MHz in the RIVA 128 (revision A/B) and 260 MHz in revision C (RIVA 128 ZX) cards. Unlike the other clocks, it was not configurable by OEM manufacturers.
Generally, most manufacturers set the memory clock to around 100 MHz and the pixel clock to around 40 MHz.
@@ -153,8 +156,7 @@ This is the primary area of memory mapping, and is set up as Base Address Regist
| `0x681200-0x681FFF` | USER_DAC | Optional for external DAC? |
| `0x800000-0xFFFFFF` | USER | Graphics object submission area (for PFIFO, via DMA) |
_Note_: There is a wrinkle to this setup. The VBIOS has to be able to communicate with the main GPU in real mode, when PCI is not available. This is achieved by mapping I/O ports `0x3d0`-`0x3d3` in the Weitek core to the registers of a mechanism called RMA - Real Mode Access - which is used to form a 32-bit address; once a 32-bit address has been formed by writing to all four RMA registers (internally, this is implemented using a mode register), the next SVGA x86 I/O port read/write becomes a read/write of the main GPU's PCI BAR0 MMIO space. This allows the VBIOS to POST the GPU during its initialisation process.
#### DFB
DFB means "Dumb Framebuffer" (that's what Nvidia chose to call it) and is simply a linear framebuffer. On the NV3 it is mapped into PCI BAR1 and has a size of `0x400000` by default (presumably depending on the VRAM size?); on later GPUs it was moved into BAR0, starting at offset `0x1000000`. It is presumably meant for manipulating the framebuffer contents directly, without using the GPU's DMA facilities.
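As a purely illustrative sketch (the mapping helper, the 16-bit mode and the function name are all assumptions on my part, not anything Nvidia defines), drawing to the dumb framebuffer is just linear address arithmetic:

```c
#include <stddef.h>
#include <stdint.h>

/* Plots one pixel into the dumb framebuffer. Assumes BAR1 has already
   been mapped into our address space by some platform-specific mechanism,
   and that the display is in a 16-bit-per-pixel mode with the given pitch
   (bytes per scanline). */
static void dfb_put_pixel16(volatile uint8_t *bar1, uint32_t pitch,
                            uint32_t x, uint32_t y, uint16_t colour)
{
    volatile uint16_t *pixel =
        (volatile uint16_t *)(bar1 + (size_t)y * pitch + (size_t)x * 2u);

    *pixel = colour;  /* a plain memory write - no DMA, no acceleration */
}
```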
@@ -174,7 +176,7 @@ or in the form of bitwise math - code is from my in progress RIVA 128 emulatino
I'm not entirely sure why they did this, but I assume it was to provide a more convenient interface to the user and for general efficiency reasons.
#### Interrupts
Any graphics card worth its salt needs an interrupt system. So a REALLY good one must have two completely different systems for notifying the rest of the system about events, right? There is a traditional interrupt system, with both software and hardware interrupts supported (distinguished by bit 31 of the interrupt status register), controlled by a register in `PMC` that turns interrupts on and off for the different components of the GPU. Each component of the GPU also allows individual interrupts to be turned on or off, and has its own interrupt status register. Each component (including the removed-in-revision-B `PAUDIO`, for some reason) is represented by a bit in the `PMC` interrupt status register. If the interrupt status register of a component, ANDed with its interrupt enable register, is non-zero, an interrupt is declared to be pending (with some minor exceptions that will be explained in later parts) and a PCI/AGP IRQ is sent. The interrupt registers are set up such that, when they are viewed in hexadecimal, an enabled interrupt appears as a 1 and a disabled interrupt as a 0. Interrupts can be turned off GPU-wide (or for just one of hardware or software) via the `PMC_INTR_EN` register (at `0x0140`).
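A rough C sketch of that pending-interrupt check, as I currently understand it (the struct, variable and function names are mine, and the top-level enable handling is simplified - the real `PMC_INTR_EN` register distinguishes hardware and software interrupts):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-subsystem interrupt state. */
typedef struct {
    uint32_t intr;     /* subsystem interrupt status register */
    uint32_t intr_en;  /* subsystem interrupt enable register */
} nv3_subsystem_irq;

/* A subsystem raises its bit in the PMC interrupt status register when any
   of its status bits is also enabled. */
static bool nv3_subsystem_irq_pending(const nv3_subsystem_irq *s)
{
    return (s->intr & s->intr_en) != 0;
}

/* The PCI/AGP IRQ line is asserted when at least one subsystem bit is set
   in the PMC interrupt status register and PMC_INTR_EN (at 0x0140) allows
   interrupts at all (simplified to "any enable bit set" here). */
static bool nv3_irq_asserted(uint32_t pmc_intr, uint32_t pmc_intr_en)
{
    return (pmc_intr != 0) && (pmc_intr_en != 0);
}
```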
This allows an interrupt to be implemented as:
@@ -186,7 +188,7 @@ This allows an interrupt to be implemented as:
Time-sensitive functions are provided by a nice, simple programmable interval timer (except for the fact that, for some strange reason, the counter is 56-bit, split into two 32-bit registers: `PTIMER_TIME0`, of which only bits 31 through 5 are meaningful, and `PTIMER_TIME1`...which has bits 28 through 0 meaningful instead?) that fires an interrupt whenever the threshold value in nanoseconds (set by the `PTIMER_ALARM` register) is exceeded. This is how the drivers internally keep track of many actions that they need to perform, and it is the first functional block you need to get right if you ever hope to emulate the RIVA 128.
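Assuming the two registers simply concatenate into a single 56-bit nanosecond count (my reading of the bit layout above, not something I have confirmed against real hardware), reassembling the counter looks like this:

```c
#include <stdint.h>

/* Reassembles the 56-bit PTIMER counter from its two 32-bit halves,
   taking PTIMER_TIME1 bits 28:0 as the high part (29 bits) and
   PTIMER_TIME0 bits 31:5 as the low part (27 bits). */
static uint64_t nv3_ptimer_value(uint32_t time0, uint32_t time1)
{
    uint64_t high = time1 & 0x1FFFFFFFu;        /* bits 28:0 of TIME1 */
    uint64_t low  = (time0 >> 5) & 0x07FFFFFFu; /* bits 31:5 of TIME0 */

    return (high << 27) | low;  /* 29 + 27 = 56 meaningful bits */
}
```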
#### Graphics Commands & DMA Engine Overview
What may be called graphics commands in other GPU architectures are instead called graphics objects in the NV3 architecture - in fact, all Nvidia architectures use this nomenclature. They are submitted to the GPU core via a custom DMA engine (although Parallel I/O can also be used) with its own translation lookaside buffer and other memory management structures. There are 8 DMA channels, of which only one is allowed to run at a time; a mechanism known as "context switching" must be performed to use other channels (involving writing to PGRAPH registers for every class to set the current channel ID), with channel 0 being the default. Each DMA channel occupies 64 kilobytes of an area of RAM called RAMIN (which will be explained later), and is further divided into subchannels that are `0x2000` bytes in length. The meaning of what is in those subchannels depends on the type (or, as Nvidia calls it, class) of the object submitted into them, with each attribute of an object being called a method. All objects have a defined name (really just a 32-bit value) and another 32-bit value storing various information about the object - where it is relative to the start of `RAMIN`, whether it is a software-injected or hardware graphical rendering object (bit 31), the channel and subchannel ID the object is associated with, and the object's class. This is called their *context*. Contexts are stored in an area of RAM called `RAMFC` if the channel they belong to is not currently in use; if it is, they are stored in `RAMHT` - a hash table*, where the hash key is computed by XORing together every byte of the object's name (which must be above 4096, as Nvidia's drivers reserve the IDs below that) and then XORing the result with the channel ID. This is then multiplied by 16 to get the object's offset from the start of RAMHT. (It seems the drivers have to ensure on their own that this area does not fill up, with only basic error handling from the hardware itself!) The first four bytes of an entry are the object's name, then its context, and finally the actual methods of the object that we discussed earlier.
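Written out in C, the hash described above comes out to just a few lines (a sketch of my current understanding, so treat the details as provisional):

```c
#include <stdint.h>

/* Computes the RAMHT byte offset for a graphics object: XOR the four bytes
   of the object's name together, XOR in the channel ID, then multiply the
   result by 16 to turn the hash into an offset from the start of RAMHT. */
static uint32_t nv3_ramht_offset(uint32_t name, uint8_t channel_id)
{
    uint8_t hash = (uint8_t)(name & 0xFF)
                 ^ (uint8_t)((name >> 8) & 0xFF)
                 ^ (uint8_t)((name >> 16) & 0xFF)
                 ^ (uint8_t)((name >> 24) & 0xFF)
                 ^ channel_id;

    return (uint32_t)hash * 16u;
}
```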
The exact method lists of every graphics object are incredibly long and often shared between several different types of objects (the first `0x100` bytes are shared, and usually the first bytes after that are shared too), so they won't be listed in Part 1, but an overall list of graphics objects follows (note - these are the graphics objects defined by the *hardware*; the *drivers* implement their own, much larger set of graphics objects that do not map exactly to the ones in the GPU; furthermore, as you will see later, due to the large - 8KB - size of each object, *a single object does not mean that only one - or even any - primitive is actually drawn!*):
@@ -235,7 +237,7 @@ A basic (presumably pre-transformed...?) 2D triangle. Depending on the methods u
* A set of up to 8 triangles with a single arbitrary 32-bit colour for the entire mesh, and three 16-bit position values for each of the triangles' vertexes.
* A part of a mesh of up to 16 triangles with a 32-bit colour and two 32-bit position values for each of the points on the mesh.
**`0x0C` (Windows 95 GDI Text Acceleration)**: A piece of hardware functionality intended to accelerate the manner by which Windows 95's GDI (and its DIB Engine?) renders text. This is a very complicated set of clipping logic that won't be covered until Part 3 - it's too long for this part, and I don't fully understand it yet.
**`0x0D` (Memory to memory format)**: Changes the format of a set of pixels in VRAM. Allows changing the line length, line count (vertical size) and pitch of the image.
@@ -294,7 +296,7 @@ If the GPU detects either that the cache ran out during submission, that the cac
Not really sure what this is for, but I assume it's a spare area for random stuff.
#### Interrupts 2.0: Notifiers
However, some people at Nvidia decided that they were too cool for interrupts. Why have an interrupt that tells the GPU to do something, when *you could have an interrupt that has the GPU tell the drivers to do something*?! So they implemented the incredible "notifier" system. It appears to have been implemented to allow the drivers to manage GPU resources that the silicon could not manage itself. Every single subsystem in the GPU has a notifier enable register alongside its interrupt enable register (some have multiple notifier enable registers for different types of notifiers!). Notifiers appear to be intended to work with the object class system (although they may also exist within GPU subsystems, they mostly exist within `PGRAPH`, `PME` and `PVIDEO`) and are actually different *per class of object* - each object has a set of "notification parameters" that can be used to trigger a notification, which is requested via the `SetNotify` method at `0x104` within an object when it is stored inside RAMHT. There is also the `SetNotifyCtxDma` method, usually but not always at `0x0`, which is used for the aforementioned context switching. Notifiers appear to remain "requested" until the GPU processes them, and PGRAPH can take up to 16 software notifier types and 1 hardware notifier type.
More research is ongoing. It seems most notifiers are generated by the drivers in order to manage hardware resources that they would not otherwise be capable of managing, such as the PFIFO caches.