From beba6130aaf2ed6b7b74426993e9adf1b65f6afb Mon Sep 17 00:00:00 2001
From: RichardG867
Date: Thu, 6 Feb 2025 20:59:24 -0300
Subject: [PATCH] Re-review

---
 _posts/2025-01-22-riva128-part-1.md | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/_posts/2025-01-22-riva128-part-1.md b/_posts/2025-01-22-riva128-part-1.md
index d07aad6..73816f6 100644
--- a/_posts/2025-01-22-riva128-part-1.md
+++ b/_posts/2025-01-22-riva128-part-1.md
@@ -50,7 +50,7 @@ In a perhaps ironic twist of fate, toilet paper turned out to be an apt metaphor
 ### The NV1
-The **NV1** was a combination graphics, audio, DRM (yes, really) and game port card implementing what NVIDIA dubbed the "NV Unified Media Architecture" (UMA); the chip was manufactured by SGS-Thomson Microelectronics (now STMicroelectronics) on the 350 nanometer node, who also white-labelled NVIDIA's design (which allegedly[^contract] featured a DAC block designed by SGS-Thomson) as the STG-2000, a variant without audio functionality, also called the "NV1-V32" (for 32-bit VRAM) in internal documentation as opposed to NVIDIA's NV1-D64. The chip was designed to implement a reasonable level of 3D graphics functionality, as well as audio, public-key encryption for DRM purposes (ultimately never used as it would have required the cooperation of software companies) and Sega Saturn game ports, all within a single megabyte of RAM, as memory costs were around $50 a megabyte when initial design began in 1993.
+The **NV1** was a combination graphics, audio, DRM (yes, really) and game port card implementing what NVIDIA dubbed the "NV Unified Media Architecture" (UMA); the chip was manufactured by SGS-Thomson Microelectronics (now STMicroelectronics) on a 350-nanometer process node. SGS-Thomson also white-labelled NVIDIA's design (which allegedly[^contract] featured a DAC block designed by SGS-Thomson) as the STG-2000, a variant without audio functionality, also called the "NV1-V32" (for 32-bit VRAM) in internal documentation as opposed to NVIDIA's NV1-D64. The chip was designed to implement a reasonable level of 3D graphics functionality, as well as audio, public-key encryption for DRM purposes (ultimately never used as it would have required the cooperation of software companies) and Sega Saturn game ports, all within a single megabyte of RAM, as memory costs were around $50 a megabyte when initial design began in 1993.
 [^contract]: Source: [Strategic Collaboration Agreement between NVIDIA and SGS-Thomson](http://web.archive.org/web/20240722140726/https://contracts.onecle.com/nvidia/sgs.collab.1993.11.10.shtml), originally covering NV1 but later revised to include NV3, apparently part of a filing with the US Securities and Exchange Commission.
@@ -60,37 +60,37 @@ Additionally, the fundamental implementation of 3D rendering used quad patching
 [^metal]: S3's later API for the Savage family, not to be confused with Apple's Metal from many years later.
-The upshot of all of this was what can be understood as nothing less than the total failure of NVIDIA to sell or convince anyone to develop for NV1 in any way, despite its innovative silicon design. While Diamond Multimedia purchased 250,000 chips to place into their "Edge 3D" series of boards, barely any of them sold, and those that did sell were often returned, leading to the chips themselves being returned to NVIDIA and hundreds of thousands of chips sitting simply unused in warehouses. Barely any NV1-capable software was released, with the few pieces of software that do exist coming via a partnership with Sega (more on that later), while most others were forced to run under software emulators for Direct3D (or other APIs) written by Priem, which were made possible by the software architecture NVIDIA chose for their drivers, but were slower and worse-looking than software rendering, buggy, and extremely unappealing.
+The upshot of all of this was what can be understood as nothing less than the total failure of NVIDIA to sell or convince anyone to develop for NV1 in any way, despite its innovative silicon design. While Diamond Multimedia purchased 250,000 chips to place into their "Edge 3D" series of cards, and other manufacturers produced cards in smaller quantities, barely any of them sold, and those that did sell were often returned, leading to the chips themselves being returned to NVIDIA and hundreds of thousands of chips sitting simply unused in warehouses. Barely any NV1-capable software was released, with the few pieces of software that do exist coming via a partnership with Sega (more on that later), while most others were forced to run under software emulators for Direct3D (or other APIs) written by Priem, which were made possible by the software architecture NVIDIA chose for their drivers, but were slower and worse-looking than software rendering, buggy, and generally extremely unappealing.
-NVIDIA lost $6.4 million in 1995 on a revenue of $1.1 million, and $3 million on a revenue of $3.9 million in 1996. Most of the capital that allowed NVIDIA to continue operating were from the milestone payments from SGS-Thomson for developing the chip, their NV2 contract with Sega (again, more on that later), and their venture capital funding, but not from the very few NV1 sales. The NV1 was poorly reviewed, had very little software and ultimately no sales; despite various desperate efforts to revive it, including releasing the SDK for free (including the proprietary NVLIB API used to develop games for the chip) and straight up begging their customers on their website to spam developers with requests to add NV1 support to games, the chip was effectively dead within a year.
+NVIDIA lost $6.4 million in 1995 on a revenue of $1.1 million, and $3 million on a revenue of $3.9 million in 1996. Most of the capital that allowed NVIDIA to continue operating came from the milestone payments from SGS-Thomson for developing the chip, their NV2 contract with Sega (again, more on that later), and their venture capital funding, but not from the very few NV1 sales. The NV1 was poorly reviewed, had very little software and ultimately almost no sales; despite various desperate efforts to revive it, including releasing the SDK for free (with a new proprietary *NVLIB* API for game development as an alternative to direct hardware programming) and by early 1996 straight up begging customers on their website to spam developers with requests to add NV1 support to games, the chip was effectively dead within a year.
 ### The NV2
-Nevertheless, NVIDIA grew to close to a hundred employees, including sales and marketing teams. The company, and especially its cofounders, remained confident in their architecture and overall prospects of success. They had managed to solidify a business relationship with Sega, to the point where they had initially won the contract to provide the graphics hardware for the successor to the Sega Saturn, at that time codenamed "V08". The GPU was codenamed "Mutara" (after the nebula critical to the plot in *Star Trek II: The Wrath of Khan*) and the overall architecture was the **NV2**. It maintained many of the functional characteristics of the NV1 and was essentially a more powerful successor to that card. According to available sources, this would have been the only NVIDIA chip manufactured by the then-just founded Helios Semiconductor.
+Nevertheless, NVIDIA grew to close to a hundred employees, including sales and marketing teams. The company, and especially its cofounders, remained confident in their architecture and overall prospects of success. They had managed to solidify a business relationship with Sega, to the point where they had initially won the contract to provide the graphics hardware for the successor to the Sega Saturn, at that time codenamed "V08". The GPU was codenamed "Mutara" (after the nebula critical to the plot in *Star Trek II: The Wrath of Khan*) and the overall architecture was the **NV2**. It maintained many of the functional characteristics of the NV1 and was essentially a more powerful successor to that chip. According to available sources, this would have been the only NVIDIA chip manufactured by the then newly founded Helios Semiconductor.
-However, problems started to emerge almost immediately. Game developers, especially Sega's internal teams, were not happy with having to use a GPU with such a heterodox design; for example, porting games to or from the PC, which Sega did do at the time, would be made far harder. This position was especially championed by Yu Suzuki, head of one of Sega's most prestigious internal development teams Sega-AM2, responsible for the *Daytona USA*, *Virtua Racing*, *Virtua Fighter*, and *Shenmue* series among others, who sent his best graphics programmer to interface with NVIDIA and push for NVIDIA to change the rendering method to a more traditional triangle-based approach. At this point, the story diverges: some tellings claim that NVIDIA simply refused to accede to Sega's request and this severely damaged their relationship irrepairably, leading to the NV2's cancellation; others that the NV2 wasn't killed until it failed to produce any video during a demonstration, and Sega still paid NVIDIA for developing it to prevent bankruptcy, with a single engineer apparently assigned to (and succeeding at) getting the card working for the sole purpose of receiving a milestone payment.
+However, problems started to emerge almost immediately. Game developers, especially Sega's internal teams, were not happy with having to use a GPU with such a heterodox design; for example, porting games to or from the PC, which Sega did do at the time, would be made far harder. This position was especially championed by Yu Suzuki, head of one of Sega's most prestigious internal development teams, Sega-AM2, responsible for the *Daytona USA*, *Virtua Racing*, *Virtua Fighter*, and *Shenmue* series among others, who sent his best graphics programmer to interface with NVIDIA and push for the company to change the rendering method to a more traditional triangle-based approach. At this point, the story diverges: some tellings claim that NVIDIA simply refused to accede to Sega's request, and this damaged their relationship irreparably, leading to the NV2's cancellation; others that the NV2 wasn't killed until it failed to produce any video during a demonstration, and Sega still paid NVIDIA for developing it to prevent bankruptcy, with a single engineer apparently assigned to (and succeeding at) getting the chip working for the sole purpose of receiving a milestone payment.
-At some point, Sega, as a traditional Japanese company, couldn't simply kill the deal, so the NV2 was officially relegated to be used in the successor to the educational toddler-aimed Sega Pico, while in reality, Sega of America had already been told to "not worry" about NVIDIA anymore. NVIDIA got the hint, and the NV2 was cancelled. With both NV1 and NV2 out of the picture, NVIDIA had no sales, no customers, and barely any money; at some point in late 1996, the company had $3 million and was burning through $330,000 a month, and most of the NV2 team had been redeployed to the next-generation NV3. No venture capital funding was going to be forthcoming due to the failure to actually create any products people wanted to buy, at least not without extremely unfavourable terms on things like ownership. The company was effectively almost a complete failure and a waste of years of the employees' time.
+At some point, Sega, as a traditional Japanese company, couldn't simply kill the deal, so the NV2 was officially relegated to be used in the successor to the educational, toddler-aimed Sega Pico, while in reality, Sega of America had already been told to "not worry" about NVIDIA anymore. NVIDIA got the hint, and the NV2 was cancelled. With both NV1 and NV2 out of the picture, NVIDIA had no sales, no customers, and barely any money; by late 1996, the company had $3 million in the bank and was burning through $330,000 a month, and most of the NV2 team had been redeployed to the next-generation NV3. No venture capital funding was going to be forthcoming due to the failure to actually create any products people wanted to buy, at least not without extremely unfavourable terms on things like ownership. The company was effectively almost a complete failure and a waste of years of the employees' time.
 ### Near destruction of the company
-By the end of 1996, things had gotten infinitely worse, with the competition heating up extraordinarily fast; despite NV1 being the first texture-mapped consumer GPU ever released, they had been fundamentally outclassed by their competition. It was a one-two punch: initially, Rendition - founded around the same time as NVIDIA in 1993 - released its V1000 chip based on a custom RISC architecture, and while not particularly fast, it was, for a few months, the only card that could run Quake (the hottest game of 1996) in hardware accelerated mode. The V1000 was an early market leader, alongside S3's laughably bad ViRGE (Video and Rendering Graphics Engine) which was infamously slower than software rendering on high-end CPUs at launch, and was reserved for high-volume OEM bargain-bin disaster machines.
+By the end of 1996, things had gotten infinitely worse, with the competition heating up extraordinarily fast; despite the NV1 being the first texture-mapped consumer GPU ever released, NVIDIA had been fundamentally outclassed by the competition. It was a one-two punch: initially, Rendition - founded around the same time as NVIDIA in 1993 - released its V1000 chip based on a custom RISC architecture, and while not particularly fast, it was, for a few months, the only chip that could run Quake (the hottest game of 1996) in hardware-accelerated mode. The V1000 was an early market leader, alongside S3's laughably bad ViRGE (Video and Rendering Graphics Engine), which was infamously slower than software rendering on high-end CPUs at launch, and was reserved for high-volume OEM bargain-bin disaster machines.
 However, this was nothing compared to the body blow about to hit the entire industry, NVIDIA included. At a conference in early 1996, an $80,000 machine from SiliconGraphics, then the world leader in accelerated graphics, crashed during a demo by the then-CEO Ed McCracken. If accounts of the event are to be believed, while the machine rebooted, people who had heard rumors left the room and headed downstairs to another demo by a then-tiny company made up of ex-SGI employees calling themselves "3D/fx" (later shortened to 3dfx), claiming comparable graphics quality for $250... with demos to prove it. As with many cases of supposed "wonder innovations" in the tech industry, it seemed too good to be true, but when their card, the "Voodoo Graphics", was first released in the form of the "Righteous 3D" by Orchid in October 1996, it turned out to be true. Despite the fact that it was a 3D-only card and required a 2D card to be installed, and the fact it could not accelerate graphics in a window (which almost all other cards could do), performance was so high relative to other products (including the NV1) that it not only had rave reviews on its own but also kicked off a revolution in consumer 3D graphics, which especially caught fire when GLQuake was released in January 1997.
 The reasons for 3dfx being able to design such an effective GPU when all others failed were numerous. The price of RAM plummeted by 80% throughout 1996, allowing the Voodoo's estimated retail price to be cut from $1000 to $300; many of their staff members came from SiliconGraphics, perhaps the most respected and certainly the largest company in the graphics industry of that time[^sgi]; and while 3dfx used the proprietary Glide API, it also supported OpenGL and Direct3D. Glide was designed to be very similar to OpenGL while allowing 3dfx to approximate standard graphical techniques, and their driver design meant the hardware only had to accelerate edge interpolation[^edge], texture mapping and blending, span interpolation[^span], and final presentation of the rendered 3D scene - the rest was all done in software. All of these factors were key to what proved to be an exceptionally low price for what was considered exceptionally high quality at the time.
-[^sgi]: By 1997, SGI had over 15 years of experience in developing graphical hardware, while also suffering from rampant mismanagement and experiencing the start of what would later prove to be their terminal decline.
+[^sgi]: By 1997, SGI had over 15 years of experience in developing graphics hardware, while also suffering from rampant mismanagement and experiencing the start of what would later prove to be their terminal decline.
 [^edge]: Where a triangle is converted into "spans" of horizontal lines, and the positions of nearby vertexes are used to determine the span's start and end positions.
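+As an aside, a minimal sketch in C may help make the edge interpolation described in the footnote above more concrete. It walks the two edges of a single flat-bottomed triangle one scanline at a time; this is purely an illustration (the function and type names are made up for the example), as real hardware of this era worked in fixed point and handled arbitrary triangles by splitting them.
+
+```c
+#include <stdio.h>
+
+typedef struct { float x, y; } vec2;
+
+/* Walk the two edges of a flat-bottomed triangle from the apex down,
+   producing the start and end X coordinates of each horizontal span. */
+static void edge_interpolate(vec2 apex, vec2 left, vec2 right)
+{
+    float dxl = (left.x  - apex.x) / (left.y  - apex.y); /* left edge slope  */
+    float dxr = (right.x - apex.x) / (right.y - apex.y); /* right edge slope */
+    float xl = apex.x, xr = apex.x;
+
+    for (int y = (int)apex.y; y <= (int)left.y; y++) {
+        /* Span interpolation (texturing, blending, Z) would process xl..xr here. */
+        printf("span y=%d: x=%.1f..%.1f\n", y, xl, xr);
+        xl += dxl;
+        xr += dxr;
+    }
+}
+
+int main(void)
+{
+    edge_interpolate((vec2){ 50.0f, 10.0f }, (vec2){ 20.0f, 60.0f }, (vec2){ 80.0f, 60.0f });
+    return 0;
+}
+```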
-[^span]: To simplify a complex topic, in a GPU of this era, span interpolation generally involves Z-buffering (also known as depth buffering), sorting polygons back to front, and color buffering, storing the color of each pixel sent to the screen in a buffer which allows for blending and alpha transparency. Some GPUs do not implement a Z-buffer, with examples including the NV1, original ATI 3D Rage and PS1 Geometry Transformation Engine, so sorting of polygons has to be handled by the programmer.
+[^span]: To simplify a complex topic, in a GPU of this era, span interpolation generally involves Z-buffering (also known as depth buffering), sorting polygons back to front, and color buffering, storing the color of each pixel sent to the screen in a buffer which allows for blending and alpha transparency. Some GPUs do not implement a Z-buffer and delegate polygon sorting to software instead; examples include the NV1, original ATI Rage and the PlayStation 1's Geometry Transformation Engine.
 Effectively, NVIDIA had to design a graphics architecture that could at the very least get close to 3dfx's performance, on a shoestring budget and with very few resources, as 60% of their staff (including the entire sales and marketing teams) had been laid off to conserve money. They could not do a complete redesign of the NV1 from scratch even if they felt the need to, as it would take two years (time they simply didn't have) and any design that came out of this effort would be immediately obsoleted by competitors, such as 3dfx's Voodoo series and ATI's Rage, which was initially rather pointless but rapidly advancing in performance and driver stability. The chip would also have to work reasonably well on the first tapeout, as there was no capital to produce more revisions of the chip. The fact NVIDIA were able to achieve a successful design in the form of the NV3 under such conditions was a testament to the intelligence, skill and luck of their designers; we will explore how they managed to achieve this later on in this write-up.
 ### The NV3
-It was with these financial, competitive and time constraints in mind that design on the NV3 began in 1996. This chip would eventually be commercialised as the RIVA 128, standing for "Real-time Interactive Video and Animation accelerator" followed by a nod to its 128-bit internal bus width which was very large at the time. NVIDIA retained SGS-Thomson (soon to be STMicroelectronics) as their manufacturing partner, in exchange for SGS-Thomson cancelling their competing STG-3001 GPU. In a similar vein to the NV1, NVIDIA was to sell the chip as "NV3" and SGS-Thomson was to white-label it as STG-3000, once again separated by audio functionality; however, NVIDIA convinced SGS-Thomson to cancel their own part and stick to manufacturing the NV3 instead, which would prove to be a terrible decision when NVIDIA dropped them in favor of TSMC for manufacturing of the RIVA 128 ZX due to both yield issues and pressure from venture capital funders. STMicro went on to manufacture PowerVR chips for a few more years, before dropping out of the market entirely by 2001.
+It was with these financial, competitive and time constraints in mind that design on the NV3 began in 1996. This chip would eventually be commercialised as the RIVA 128, standing for "Real-time Interactive Video and Animation accelerator" followed by a nod to its 128-bit internal bus width, which was very large for the time. NVIDIA retained SGS-Thomson (soon to be STMicroelectronics) as their manufacturing partner, in exchange for SGS-Thomson cancelling their competing STG-3001 GPU. In a similar vein to the NV1, NVIDIA was to sell the chip as "NV3" and SGS-Thomson was to white-label it as STG-3000, once again separated by audio functionality; however, NVIDIA convinced SGS-Thomson to cancel their own part and stick to manufacturing the NV3 instead, which later proved to be a terrible decision for SGS-Thomson when NVIDIA dropped them in favor of TSMC for manufacturing of the RIVA 128 ZX, due to both yield issues and pressure from venture capital funders. ST went on to manufacture the PowerVR Kyro series of GPU chips before dropping out of the market entirely by 2002.
 After the NV2 disaster, the company made several calls on the NV3's design that turned out to be very good ones. First, they acquiesced to Sega's advice (which they might have already done to save the Mutara V08/NV2, but it was too late) and moved to an inverse texture mapping, triangle-based model, although some remnants of the original quad patching design remain. The unused DRM functionality was also removed, which may have been assisted by David Kirk[^dkirk] taking over from Curtis Priem as chief designer, as Priem insisted on including the DRM functionality with the NV1, citing piracy issues with the game he had written as a demo of the Malachowsky-designed GX GPU back when he worked at Sun.
@@ -158,7 +158,7 @@ The scene graph is almost certainly the namesake for the functional block actual
 The RIVA 128 is not dependent on the host machine's clock. It has a 13.5 or 14.3 MHz (depending on boot-time configuration) clock crystal, split by the hardware into a memory clock (MCLK) and video clock (VCLK). Note that these names are misleading; the MCLK also handles the chip's actual rendering and timing, with the VCLK seemingly just handling the actual pushing out of frames.
-The actual clocks are controlled by registers in `PRAMDAC` set by the video BIOS, which can later be overridden by drivers. In this iteration of the NVIDIA architecture, the VBIOS only performs a very basic POST sequence, initialises the card and sets its clock speed; once the chip is initialised, the VBIOS is effectively never needed again, although there are mechanisms to read from it after initialisation. Clocks were controlled by card manufacturers through parameters `m`/`n`/`p`, from which the chip derives the final memory and pixel clock speed with the formula `(frequency * n) / (m << p)`. Generally, most manufacturers set the memory clock at around 100 MHz, and the pixel clock at around 40 MHz.
+The actual clocks are controlled by registers in `PRAMDAC` set by the video BIOS, which can later be overridden by drivers. In this iteration of the NVIDIA architecture, the VBIOS only performs a very basic POST sequence, initialises the card and sets its clock speed; once the chip is initialised, the VBIOS is effectively never needed again, although there are mechanisms to read from it after initialisation. Clocks were controlled by card manufacturers through parameters `m`/`n`/`p`, from which the chip derives the final memory and pixel clock speed with the formula `(frequency * n) / (m << p)`. Generally, most manufacturers set the memory clock at around 100 MHz, and the pixel clock at around 40 MHz, although drivers seemingly reduce these clocks in some cases.
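+To illustrate, here is a minimal sketch in C of that derivation; the `m`/`n`/`p` coefficients below are hypothetical values chosen to land near the typical clock speeds mentioned, not values dumped from a real VBIOS, and the function name is made up for the example.
+
+```c
+#include <stdio.h>
+
+/* Final clock speed per the formula above: (frequency * n) / (m << p). */
+static double nv3_clock_mhz(double crystal_mhz, unsigned m, unsigned n, unsigned p)
+{
+    return (crystal_mhz * n) / (double)(m << p);
+}
+
+int main(void)
+{
+    /* Assuming a 13.5 MHz crystal; hypothetical coefficients chosen to
+       approximate the typical ~100 MHz memory clock and ~40 MHz pixel clock. */
+    printf("MCLK: %.2f MHz\n", nv3_clock_mhz(13.5, 7, 208, 2)); /* ~100.29 MHz */
+    printf("VCLK: %.2f MHz\n", nv3_clock_mhz(13.5, 7,  84, 2)); /* ~40.50 MHz  */
+    return 0;
+}
+```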
 The chip's RAMDAC handles final conversion of the digital image generated by the GPU into an analog video signal, and clock generation via three phase-locked loops. It has its own clock (ACLK) running at around 200 MHz on RIVA 128 (revision A/B) and 260 MHz on RIVA 128 ZX (revision C) chips, which, unlike the other clocks, was not configurable by manufacturers.
@@ -247,9 +247,11 @@ The least straightforward part of this timer is the counter itself, a 56-bit val
 ### Graphics commands and DMA engine
-What may be called *graphics commands* in other GPU architectures are instead called *graphics objects* in the NV3 and all other NVIDIA architectures. Objects are submitted into the GPU core via writing into the `NV_USER` section of the MMIO BAR0 region using programmed I/O. Despite the fact that a custom memory access engine with its own translation lookaside buffer and other memory management structures was implemented for types of graphics objects that perform memory transfers, it does not seem to be used for graphics object submission until the NV4 architecture. Existing documentation is contradictory on if this exists on the NV3, but drivers do not seem to use DMA to submit graphics objects; if a DMA submission method exists, it certainly works very differently to later versions of the architecture.
+What may be called *graphics commands* in other GPU architectures are instead called *graphics objects* in the NV3 and all other NVIDIA architectures. Objects are submitted into the GPU core by writing into the `NV_USER` section of the MMIO BAR0 region through programmed I/O. Despite the fact that a custom memory access engine with its own translation lookaside buffer and other memory management structures was implemented for graphics object types that perform memory transfers, it does not seem to be used for graphics object submission until the NV4 architecture. Existing documentation is contradictory on whether or not this exists on the NV3, but drivers do not seem to use DMA to submit graphics objects; if a DMA submission method exists, it certainly works very differently to later versions of the architecture.
-There are 8 DMA channels, with the default being channel 0 (also the only channel accessible through PIO?), but only one can be used at a time; using other channels requires a *context switch*, which entails writing the current channel ID to to PGRAPH registers for every class. All DMA channels use 64 KB of RAMIN memory (to be explained later), further divided into 8 KB (`0x2000`) subchannels, effectively representing one object; the meaning of what is in those subchannels depends on the type (or *class* to use NVIDIA terminology) of the object submitted into them, with the attributes of each object being called a *method*. A simple way to program the GPU is to simply create subchannels for specific objects (such as one for text, one for rectangle, etc...) and change their data and methods as the program runs in order to create a graphical effect. However, this is a severely limited way of programming the GPU (although Nvidia did successfully deploy it for simpler projects, such as the NT 3.x miniport driver and early versions of the NT 4.0 miniport driver, before their full Resource Manager was able to be ported), and you are intended to use context switches between DMA channels, as well as additional classes defined in the drivers, to program the card to its full potential.
+There are 8 DMA channels, with the default being channel 0 (also the only channel accessible through PIO?), but only one can be used at a time; using other channels requires a *context switch*, which entails writing the current channel ID to the `PGRAPH` registers for every class. All DMA channels use 64 KB of `RAMIN` memory (to be explained later), further divided into 8 KB (`0x2000`) subchannels, effectively representing one object; the meaning of what is in those subchannels depends on the type (or *class* to use NVIDIA terminology) of the object submitted into them, with the attributes of each object being called a *method*. A simple way to program the GPU is to just create subchannels for specific objects (such as one for text, one for rectangle, and so on) and change their data and methods as the program runs in order to create a graphical effect; however, this is a severely limited approach[^nt], and tapping the chip's full potential requires the use of context switches between DMA channels, as well as the additional classes implemented in software by the drivers.
+[^nt]: NVIDIA have successfully deployed this approach on simpler projects, such as early versions of their Windows NT miniport drivers, before the full Resource Manager was able to be ported.
 All objects have a *context*, consisting of a 32-bit "name" and another 32-bit value storing its class, associated channel and subchannel ID, where it is relative to the start of `RAMIN`, and whether it's a software-injected or hardware graphical rendering object (bit 31). Contexts are stored in an area of RAM called `RAMFC` if the object's channel is not being used; otherwise, they are stored in `RAMHT`, a *hash table* where the hash key is a single byte calculated by XORing each byte of the object's name[^htdriver] as well as the channel ID. Objects are stored in `RAMHT` as structures consisting of their 8-byte context followed by the *methods* mentioned earlier; an object's byte offset in `RAMHT` is its hash multiplied by 16.
@@ -257,7 +259,7 @@
 The exact set of methods of every graphics object in the architecture is incredibly long and often shared between several different types of objects (although the first 256 bytes and usually a few more after that are shared), and thus won't be listed in part 1. An overall list of graphics objects can be found in the next section, but note that these are the ones defined by the hardware, while the drivers implement a much larger set of objects that do not map exactly to the ones in the GPU; furthermore, as you will see later, as each object is quite large at 8 KB, a single object does not mean that only one graphics object (or even any at all - some are used to represent DMA objects, for example) is drawn once the object is processed. Objects can also be connected together with a special type of object called a "patchcord" constructed by the Resource Manager; the name is a remnant from the old NV1 quad patching days.
-Graphics objects, after they are written to `NV_USER`, are sent to one of two caches within the `PFIFO` subsystem: `CACHE0` which holds a single entry (really intended for the notifier engine - more on it later - to be able to inject graphics commands from software), or `CACHE1` which holds 32 entries on revisions A-B and 64 on revision C onwards. What these critical components actually do will be explored in full in later parts, but they effectively just store object names and contexts as they are waiting to be sent to `RAMIN`; a "pusher" pushes objects in from the bus as they are written into `NV_USER`, and a "puller" pulls them out of the bus and sends them where they need to be inside of the VRAM (or to `RAMRO` if they are invalid).
+After being written to `NV_USER`, graphics objects are sent to one of two caches within the `PFIFO` subsystem: `CACHE0`, which holds a single entry (really intended for the [notifier engine](#interrupts-20-notifiers) to be able to inject graphics commands from software), or `CACHE1`, which holds 32 entries on revisions A-B and 64 on revision C onwards. What these critical components actually do will be explored in full in later parts, but they effectively just store object names and contexts as they are waiting to be sent to `RAMIN`; a "pusher" pushes objects in from the bus as they are written into `NV_USER`, and a "puller" pulls them out of the bus and sends them where they need to be inside of the VRAM (or to `RAMRO` if they are invalid).
 Once objects are pulled out, the GPU will manipulate the various registers in the `PGRAPH` subsystem in order to draw the object (if the object is actually rendered), and/or use the DMA engine to perform any DMA operations the graphics object may require. Objects do not appear to "disappear" on frame refresh; instead, it would appear that they are simply drawn over, and most likely, any renderer will clear the entire screen (with a *Rectangle* object for instance) before resubmitting any graphics objects they need to render.
@@ -351,7 +353,7 @@ The `PFIFO_RUNOUT_STATUS` register holds the current state of the `RAMRO` region
 #### RAMAU
-`RAMAU` was an area used on NV1 cards and revision A NV3 cards for storing audio data being streamed into the CPU. On Revision B and later cards, the area is still mapped to MMIO space, but its functionality has been entirely removed and it is dummied out.
+`RAMAU` was an area used on the NV1 and revision A NV3 chips for storing audio data being streamed into the CPU. On revision B and later chips, this area is still mapped to MMIO space, but its functionality has been removed entirely.
 ### Interrupts 2.0: Notifiers
@@ -375,4 +377,4 @@ I haven't looked into this part as much, so expect more information in an update
 The next part will dive into how NVIDIA's drivers work and how they make this ridiculously complicated mess of an architecture transform itself into a GPU that allows you to run games you may actually want to play. Stay tuned!
----
\ No newline at end of file
+---