forked from mrq/FidelityFX-FSR2

FidelityFX FSR v2.0.1a

This commit is contained in:
parent c8fc17d281
commit 2e6d42ad0a

README.md (82 changes)
@@ -64,7 +64,7 @@ You can find the binaries for FidelityFX FSR in the release section on GitHub.
 - [Robust Contrast Adaptive Sharpening (RCAS)](#robust-contrast-adaptive-sharpening-rcas)
 - [Building the sample](#building-the-sample)
 - [Version history](#version-history)
-- [Limitations](#limitations)
+- [Limitations](release_notes.txt)
 - [References](#references)
 
 # Introduction
@@ -98,17 +98,17 @@ To use FSR2 you should follow the steps below:
 
 7. Include the `ffx_fsr2.h` header file in your codebase where you wish to interact with FSR2.
 
-8. Create a backend for your target API. E.g. for DirectX12 you should call [`ffxFsr2GetInterfaceDX12`](../src/ffx-fsr2-api/ffx_fsr2.h#L204). A scratch buffer should be allocated of the size returned by calling [`ffxFsr2GetScratchMemorySizeDX12`](../src/ffx-fsr2-api/dx12/ffx_fsr2_dx12.h#L40) and the pointer to that buffer passed to [`ffxFsr2GetInterfaceDX12`](../src/ffx-fsr2-api/dx12/ffx_fsr2_dx12.h#L55).
+8. Create a backend for your target API. E.g. for DirectX12 you should call [`ffxFsr2GetInterfaceDX12`](src/ffx-fsr2-api/dx12/ffx_fsr2_dx12.h#L55). A scratch buffer should be allocated of the size returned by calling [`ffxFsr2GetScratchMemorySizeDX12`](src/ffx-fsr2-api/dx12/ffx_fsr2_dx12.h#L40) and the pointer to that buffer passed to [`ffxFsr2GetInterfaceDX12`](src/ffx-fsr2-api/dx12/ffx_fsr2_dx12.h#L55).
 
-9. Create a FSR2 context by calling [`ffxFsr2ContextCreate`](../src/ffx-fsr2-api/ffx_fsr2.h#L204). The parameters structure should be filled out matching the configuration of your application. See the API reference documentation for more details.
+9. Create a FSR2 context by calling [`ffxFsr2ContextCreate`](src/ffx-fsr2-api/ffx_fsr2.h#L213). The parameters structure should be filled out matching the configuration of your application. See the API reference documentation for more details.
 
-10. Each frame you should call [`ffxFsr2ContextDispatch`](../src/ffx-fsr2-api/ffx_fsr2.h#L254) to launch FSR2 workloads. The parameters structure should be filled out matching the configuration of your application. See the API reference documentation for more details.
+10. Each frame you should call [`ffxFsr2ContextDispatch`](src/ffx-fsr2-api/ffx_fsr2.h#L254) to launch FSR2 workloads. The parameters structure should be filled out matching the configuration of your application. See the API reference documentation for more details.
 
-11. When your application is terminating (or you wish to destroy the context for another reason) you should call [`ffxFsr2ContextDestroy`](../src/ffx-fsr2-api/ffx_fsr2.h#L268). The GPU should be idle before calling this function.
+11. When your application is terminating (or you wish to destroy the context for another reason) you should call [`ffxFsr2ContextDestroy`](src/ffx-fsr2-api/ffx_fsr2.h#L277). The GPU should be idle before calling this function.
 
-12. Sub-pixel jittering should be applied to your application's projection matrix. This should be done when performing the main rendering of your application. You should use the [`ffxFsr2GetJitterOffset`](../src/ffx-fsr2-api/ffx_fsr2.h#L268) function to compute the precise jitter offsets. See [Camera jitter](#camera-jitter) section for more details.
+12. Sub-pixel jittering should be applied to your application's projection matrix. This should be done when performing the main rendering of your application. You should use the [`ffxFsr2GetJitterOffset`](src/ffx-fsr2-api/ffx_fsr2.h#L422) function to compute the precise jitter offsets. See [Camera jitter](#camera-jitter) section for more details.
 
-13. For the best upscaling quality it is strongly advised that you populate the [Reactive mask](#reactive-mask) and [Transparency & composition mask](#transparency-and-composition-mask) according to our guidelines. You can also use [`ffxFsr2ContextGenerateReactiveMask`](../src/ffx-fsr2-api/ffx_fsr2.h#L265) as a starting point.
+13. For the best upscaling quality it is strongly advised that you populate the [Reactive mask](#reactive-mask) and [Transparency & composition mask](#transparency-and-composition-mask) according to our guidelines. You can also use [`ffxFsr2ContextGenerateReactiveMask`](src/ffx-fsr2-api/ffx_fsr2.h#L265) as a starting point.
 
 14. Applications should expose [scaling modes](#scaling-modes), in their user interface in the following order: Quality, Balanced, Performance, and (optionally) Ultra Performance.
 
@@ -148,7 +148,7 @@ The table below summarizes the measured performance of FSR2 on a variety of hard
 | | Performance (2x) | 0.2ms | 0.2ms | 0.2ms | 0.3ms | 0.4ms | 0.5ms | 0.5ms | 0.8ms | 1.3ms |
 | | Ultra perf. (3x) | 0.2ms | 0.2ms | 0.2ms | 0.3ms | 0.4ms | 0.4ms | 0.4ms | 0.7ms | 1.1ms |
 
-Figures are rounded to the nearest 0.1ms and are without [`enableSharpening`](../src/ffx-fsr2-api/ffx_fsr2.h#L127) set.
+Figures are rounded to the nearest 0.1ms and are without [`enableSharpening`](src/ffx-fsr2-api/ffx_fsr2.h#L127) set.
 
 ## Memory requirements
 Using FSR2 requires some additional GPU local memory to be allocated for consumption by the GPU. When using the FSR2 API, this memory is allocated when the FSR2 context is created, and is done so via the series of callbacks which comprise the backend interface. This memory is used to store intermediate surfaces which are computed by the FSR2 algorithm as well as surfaces which are persistent across many frames of the application. The table below includes the amount of memory used by FSR2 under various operating conditions. The "Working set" column indicates the total amount of memory used by FSR2 as the algorithm is executing on the GPU; this is the amount of memory FSR2 will require to run. The "Persistent memory" column indicates how much of the "Working set" column is required to be left intact for subsequent frames of the application; this memory stores the temporal data consumed by FSR2. The "Aliasable memory" column indicates how much of the "Working set" column may be aliased by surfaces or other resources used by the application outside of the operating boundaries of FSR2.
@@ -175,23 +175,23 @@ For details on how to manage FSR2's memory requirements please refer to the sect
 ## Input resources
 FSR2 is a temporal algorithm, and therefore requires access to data from both the current and previous frame. The following table enumerates all external inputs required by FSR2.
 
-> The resolution column indicates if the data should be at 'rendered' resolution or 'presentation' resolution. 'Rendered' resolution indicates that the resource should match the resolution at which the application is performing its rendering. Conversely, 'presentation' indicates that the resolution of the target should match that which is to be presented to the user. All resources are from the current rendered frame, for DirectX(R)12 and Vulkan(R) applications all input resources should be transitioned to [`D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE`](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_resource_states) and [`VK_ACCESS_SHADER_READ_BIT`](https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/VkAccessFlagBits.html) respectively before calling [`ffxFsr2ContextDispatch`](../src/ffx-fsr2-api/ffx_fsr2.h#L254).
+> The resolution column indicates if the data should be at 'rendered' resolution or 'presentation' resolution. 'Rendered' resolution indicates that the resource should match the resolution at which the application is performing its rendering. Conversely, 'presentation' indicates that the resolution of the target should match that which is to be presented to the user. All resources are from the current rendered frame, for DirectX(R)12 and Vulkan(R) applications all input resources should be transitioned to [`D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE`](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_resource_states) and [`VK_ACCESS_SHADER_READ_BIT`](https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/VkAccessFlagBits.html) respectively before calling [`ffxFsr2ContextDispatch`](src/ffx-fsr2-api/ffx_fsr2.h#L254).
 
 | Name | Resolution | Format | Type | Notes |
 | ----------------|------------------------------|------------------------------------|-----------|------------------------------------------------|
-| Color buffer | Render | `APPLICATION SPECIFIED` | Texture | The render resolution color buffer for the current frame provided by the application. If the contents of the color buffer are in high dynamic range (HDR), then the [`FFX_FSR2_ENABLE_HIGH_DYNAMIC_RANGE`](../src/ffx-fsr2-api/ffx_fsr2.h#L87) flag should be set in the [`flags`](../src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L101) structure. |
-| Depth buffer | Render | `APPLICATION SPECIFIED (1x FLOAT)` | Texture | The render resolution depth buffer for the current frame provided by the application. The data should be provided as a single floating point value, the precision of which is under the application's control. The configuration of the depth should be communicated to FSR2 via the [`flags`](../src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164). You should set the [`FFX_FSR2_ENABLE_DEPTH_INVERTED`](../src/ffx-fsr2-api/ffx_fsr2.h#L90) flag if your depth buffer is inverted (that is [1..0] range), and you should set the [`FFX_FSR2_ENABLE_DEPTH_INFINITE`](../src/ffx-fsr2-api/ffx_fsr2.h#L91) flag if your depth buffer has an infinite far plane. If the application provides the depth buffer in `D32S8` format, then FSR2 will ignore the stencil component of the buffer, and create an `R32_FLOAT` resource to address the depth buffer. On GCN and RDNA hardware, depth buffers are stored separately from stencil buffers. |
-| Motion vectors | Render or presentation | `APPLICATION SPECIFIED (2x FLOAT)` | Texture | The 2D motion vectors for the current frame provided by the application in [**(<-width, -height>**..**<width, height>**] range. If your application renders motion vectors with a different range, you may use the [`motionVectorScale`](../src/ffx-fsr2-api/ffx_fsr2.h#L125) field of the [`FfxFsr2DispatchDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L114) structure to adjust them to match the expected range for FSR2. Internally, FSR2 uses 16-bit quantities to represent motion vectors in many cases, which means that while motion vectors with greater precision can be provided, FSR2 will not benefit from the increased precision. The resolution of the motion vector buffer should be equal to the render resolution, unless the [`FFX_FSR2_ENABLE_DISPLAY_RESOLUTION_MOTION_VECTORS`](../src/ffx-fsr2-api/ffx_fsr2.h#L88) flag is set in the [`flags`](../src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164), in which case it should be equal to the presentation resolution. |
+| Color buffer | Render | `APPLICATION SPECIFIED` | Texture | The render resolution color buffer for the current frame provided by the application. If the contents of the color buffer are in high dynamic range (HDR), then the [`FFX_FSR2_ENABLE_HIGH_DYNAMIC_RANGE`](src/ffx-fsr2-api/ffx_fsr2.h#L87) flag should be set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure. |
+| Depth buffer | Render | `APPLICATION SPECIFIED (1x FLOAT)` | Texture | The render resolution depth buffer for the current frame provided by the application. The data should be provided as a single floating point value, the precision of which is under the application's control. The configuration of the depth should be communicated to FSR2 via the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164). You should set the [`FFX_FSR2_ENABLE_DEPTH_INVERTED`](src/ffx-fsr2-api/ffx_fsr2.h#L90) flag if your depth buffer is inverted (that is [1..0] range), and you should set the [`FFX_FSR2_ENABLE_DEPTH_INFINITE`](src/ffx-fsr2-api/ffx_fsr2.h#L91) flag if your depth buffer has an infinite far plane. If the application provides the depth buffer in `D32S8` format, then FSR2 will ignore the stencil component of the buffer, and create an `R32_FLOAT` resource to address the depth buffer. On GCN and RDNA hardware, depth buffers are stored separately from stencil buffers. |
+| Motion vectors | Render or presentation | `APPLICATION SPECIFIED (2x FLOAT)` | Texture | The 2D motion vectors for the current frame provided by the application in [**(<-width, -height>**..**<width, height>**] range. If your application renders motion vectors with a different range, you may use the [`motionVectorScale`](src/ffx-fsr2-api/ffx_fsr2.h#L125) field of the [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) structure to adjust them to match the expected range for FSR2. Internally, FSR2 uses 16-bit quantities to represent motion vectors in many cases, which means that while motion vectors with greater precision can be provided, FSR2 will not benefit from the increased precision. The resolution of the motion vector buffer should be equal to the render resolution, unless the [`FFX_FSR2_ENABLE_DISPLAY_RESOLUTION_MOTION_VECTORS`](src/ffx-fsr2-api/ffx_fsr2.h#L88) flag is set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164), in which case it should be equal to the presentation resolution. |
 | Reactive mask | Render | `R8_UNORM` | Texture | As some areas of a rendered image do not leave a footprint in the depth buffer or include motion vectors, FSR2 provides support for a reactive mask texture which can be used to indicate to FSR2 where such areas are. Good examples of these are particles, or alpha-blended objects which do not write depth or motion vectors. If this resource is not set, then FSR2's shading change detection logic will handle these cases as best it can, but for optimal results, this resource should be set. For more information on the reactive mask please refer to the [Reactive mask](#reactive-mask) section. |
-| Exposure | 1x1 | `R32_FLOAT` | Texture | A 1x1 texture containing the exposure value computed for the current frame. This resource is optional, and may be omitted if the [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](../src/ffx-fsr2-api/ffx_fsr2.h#L92) flag is set in the [`flags`](../src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164). |
+| Exposure | 1x1 | `R32_FLOAT` | Texture | A 1x1 texture containing the exposure value computed for the current frame. This resource is optional, and may be omitted if the [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](src/ffx-fsr2-api/ffx_fsr2.h#L92) flag is set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164). |
 
 ## Depth buffer configurations
-It is strongly recommended that an inverted, infinite depth buffer is used with FSR2. However, alternative depth buffer configurations are supported. An application should inform the FSR2 API of its depth buffer configuration by setting the appropriate flags during the creation of the [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164). The table below contains the appropriate flags.
+It is strongly recommended that an inverted, infinite depth buffer is used with FSR2. However, alternative depth buffer configurations are supported. An application should inform the FSR2 API of its depth buffer configuration by setting the appropriate flags during the creation of the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164). The table below contains the appropriate flags.
 
 | FSR2 flag | Note |
 |----------------------------------|--------------------------------------------------------------------------------------------|
-| [`FFX_FSR2_ENABLE_DEPTH_INVERTED`](../src/ffx-fsr2-api/ffx_fsr2.h#L90) | A bit indicating that the input depth buffer data provided is inverted [max..0]. |
-| [`FFX_FSR2_ENABLE_DEPTH_INFINITE`](../src/ffx-fsr2-api/ffx_fsr2.h#L91) | A bit indicating that the input depth buffer data provided is using an infinite far plane. |
+| [`FFX_FSR2_ENABLE_DEPTH_INVERTED`](src/ffx-fsr2-api/ffx_fsr2.h#L90) | A bit indicating that the input depth buffer data provided is inverted [max..0]. |
+| [`FFX_FSR2_ENABLE_DEPTH_INFINITE`](src/ffx-fsr2-api/ffx_fsr2.h#L91) | A bit indicating that the input depth buffer data provided is using an infinite far plane. |
 
 
 ## Providing motion vectors
@@ -201,7 +201,7 @@ A key part of a temporal algorithm (be it antialiasing or upscaling) is the prov
 
 ![alt text](docs/media/super-resolution-temporal/motion-vectors.svg "A diagram showing a 2D motion vector.")
 
-If your application computes motion vectors in another space - for example normalized device coordinate space - then you may use the [`motionVectorScale`](../src/ffx-fsr2-api/ffx_fsr2.h#L125) field of the [`FfxFsr2DispatchDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L114) structure to instruct FSR2 to adjust them to match the expected range for FSR2. The code examples below illustrate how motion vectors may be scaled to screen space. The example HLSL and C++ code below illustrates how NDC-space motion vectors can be scaled using the FSR2 host API.
+If your application computes motion vectors in another space - for example normalized device coordinate space - then you may use the [`motionVectorScale`](src/ffx-fsr2-api/ffx_fsr2.h#L125) field of the [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) structure to instruct FSR2 to adjust them to match the expected range for FSR2. The code examples below illustrate how motion vectors may be scaled to screen space. The example HLSL and C++ code below illustrates how NDC-space motion vectors can be scaled using the FSR2 host API.
 
 ```HLSL
 // GPU: Example of application NDC motion vector computation
@ -213,7 +213,7 @@ dispatchParameters.motionVectorScale.y = (float)renderHeight;
|
|||
```
|
||||
|
||||
### Precision & resolution
|
||||
Internally, FSR2 uses 16bit quantities to represent motion vectors in many cases, which means that while motion vectors with greater precision can be provided, FSR2 will not currently benefit from the increased precision. The resolution of the motion vector buffer should be equal to the render resolution, unless the [`FFX_FSR2_ENABLE_DISPLAY_RESOLUTION_MOTION_VECTORS`](../src/ffx-fsr2-api/ffx_fsr2.h#L88) flag is set in the [`flags`](../src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L114) structure when creating the [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164), in which case it should be equal to the presentation resolution.
|
||||
Internally, FSR2 uses 16bit quantities to represent motion vectors in many cases, which means that while motion vectors with greater precision can be provided, FSR2 will not currently benefit from the increased precision. The resolution of the motion vector buffer should be equal to the render resolution, unless the [`FFX_FSR2_ENABLE_DISPLAY_RESOLUTION_MOTION_VECTORS`](src/ffx-fsr2-api/ffx_fsr2.h#L88) flag is set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) structure when creating the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164), in which case it should be equal to the presentation resolution.
|
||||
|
||||
### Coverage
|
||||
FSR2 will perform better quality upscaling when more objects provide their motion vectors. It is therefore advised that all opaque, alpha-tested and alpha-blended objects should write their motion vectors for all covered pixels. If vertex shader effects are applied - such as scrolling UVs - these calculations should also be factored into the calculation of motion for the best results. For alpha-blended objects it is also strongly advised that the alpha value of each covered pixel is stored to the corresponding pixel in the [reactive mask](#reactive-mask). This will allow FSR2 to perform better handling of alpha-blended objects during upscaling. The reactive mask is especially important for alpha-blended objects where writing motion vectors might be prohibitive, such as particles.
@@ -225,19 +225,19 @@ Therefore, it is strongly encouraged that applications provide a reactive mask t
 
 While there are other applications for the reactive mask, the primary application for the reactive mask is producing better results of upscaling images which include alpha-blended objects. A good proxy for reactiveness is actually the alpha value used when compositing an alpha-blended object into the scene, therefore, applications should write `alpha` to the reactive mask. It should be noted that it is unlikely that a reactive value of close to 1 will ever produce good results. Therefore, we recommend clamping the maximum reactive value to around 0.9.
 
-If a [Reactive mask](#reactive-mask) is not provided to FSR2 (by setting the [`reactive`](../src/ffx-fsr2-api/ffx_fsr2.h#L121) field of [`FfxFsr2DispatchDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L114) to `NULL`) then an internally generated 1x1 texture with a cleared reactive value will be used.
+If a [Reactive mask](#reactive-mask) is not provided to FSR2 (by setting the [`reactive`](src/ffx-fsr2-api/ffx_fsr2.h#L121) field of [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) to `NULL`) then an internally generated 1x1 texture with a cleared reactive value will be used.
 
 ## Transparency & composition mask
 In addition to the [Reactive mask](#reactive-mask), FSR2 provides for the application to denote areas of other specialist rendering which should be accounted for during the upscaling process. Examples of such special rendering include areas of raytraced reflections or animated textures.
 
 While the [Reactive mask](#reactive-mask) adjusts the accumulation balance, the [Transparency & composition mask](#transparency-and-composition-mask) adjusts the pixel locks created by FSR2. A pixel with a value of 0 in the [Transparency & composition mask](#ttransparency-and-composition-mask) does not perform any additional modification to the lock for that pixel. Conversely, a value of 1 denotes that the lock for that pixel should be completely removed.
 
-If a [Transparency & composition mask](#transparency-and-composition-mask) is not provided to FSR2 (by setting the [`transparencyAndComposition`](#../src/ffx-fsr2-api/ffx_fsr2.h#L122) field of [`FfxFsr2DispatchDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L114) to `NULL`) then an internally generated 1x1 texture with a cleared transparency and composition value will be used.
+If a [Transparency & composition mask](#transparency-and-composition-mask) is not provided to FSR2 (by setting the [`transparencyAndComposition`](#src/ffx-fsr2-api/ffx_fsr2.h#L122) field of [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) to `NULL`) then an internally generated 1x1 texture with a cleared transparency and composition value will be used.
 
 ## Automatically generating reactivity
 To help applications generate the [Reactive mask](#reactive-mask) and the [Transparency & composition mask](#transparency-and-composition-mask), FSR2 provides an optional helper API. Under the hood, the API launches a compute shader which computes these values for each pixel using a luminance-based heuristic.
 
-Applications wishing to do this can call the [`ffxFsr2ContextGenerateReactiveMask`](../src/ffx-fsr2-api/ffx_fsr2.h#L265) function and should pass two versions of the color buffer, one containing opaque only geometry, and the other containing both opaque and alpha-blended objects.
+Applications wishing to do this can call the [`ffxFsr2ContextGenerateReactiveMask`](src/ffx-fsr2-api/ffx_fsr2.h#L265) function and should pass two versions of the color buffer, one containing opaque only geometry, and the other containing both opaque and alpha-blended objects.
 
 ## Exposure
 FSR2 provides two values which control the exposure used when performing upscaling. They are as follows:
@@ -249,7 +249,7 @@ The exposure value should match that which the application uses during any subse
 
 > In various stages of the FSR2 algorithm described in this document, FSR2 will compute its own exposure value for internal use. It is worth noting that all outputs from FSR2 will have this internal tonemapping reversed before the final output is written. Meaning that FSR2 returns results in the same domain as the original input signal.
 
-Poorly selected exposure values can have a drastic impact on the final quality of FSR2's upscaling. Therefore, it is recommended that [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](../src/ffx-fsr2-api/ffx_fsr2.h#L92) is used by the application, unless there is a particular reason not to. When [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](../src/ffx-fsr2-api/ffx_fsr2.h#L92) is set in the [`flags`](../src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L101) structure, the exposure calculation shown in the HLSL code below is used to compute the exposure value, this matches the exposure response of ISO 100 film stock.
+Poorly selected exposure values can have a drastic impact on the final quality of FSR2's upscaling. Therefore, it is recommended that [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](src/ffx-fsr2-api/ffx_fsr2.h#L92) is used by the application, unless there is a particular reason not to. When [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](src/ffx-fsr2-api/ffx_fsr2.h#L92) is set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure, the exposure calculation shown in the HLSL code below is used to compute the exposure value, this matches the exposure response of ISO 100 film stock.
 
 ```HLSL
 float ComputeAutoExposureFromAverageLog(float averageLogLuminance)
@@ -296,7 +296,7 @@ ffx_types.h
 ffx_util.h
 ```
 
-To use the FSR2 API, you should link `ffx_fsr2_api_x64.lib` which will provide the symbols for the application-facing APIs. However, FSR2's API has a modular backend, which means that different graphics APIs and platforms may be targetted through the use of a matching backend. Therefore, you should further include the backend lib matching your requirements, referencing the table below.
+To use the FSR2 API, you should link `ffx_fsr2_api_x64.lib` which will provide the symbols for the application-facing APIs. However, FSR2's API has a modular backend, which means that different graphics APIs and platforms may be targeted through the use of a matching backend. Therefore, you should further include the backend lib matching your requirements, referencing the table below.
 
 | Target | Library name |
 |---------------------|-------------------------|
|
||||
|
@ -305,11 +305,11 @@ To use the FSR2 API, you should link `ffx_fsr2_api_x64.lib` which will provide t
|
|||
|
||||
> Please note the modular architecture of the FSR2 API allows for custom backends to be implemented. See the [Modular backend](#modular-backend) section for more details.
|
||||
|
||||
To begin using the API, the application should first create a [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164) structure. This structure should be located somewhere with a lifetime approximately matching that of your backbuffer; somewhere on the application's heap is usually a good choice. By calling [`ffxFsr2ContextCreate`](../src/ffx-fsr2-api/ffx_fsr2.h#L204) the [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164) structure will be populated with the data it requires. Moreover, a number of calls will be made from [`ffxFsr2ContextCreate`](../src/ffx-fsr2-api/ffx_fsr2.h#L204) to the backend which is provided to [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164) as part of the [`FfxFsr2ContextDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L101) structure. These calls will perform such tasks as creating intermediate resources required by FSR2 and setting up shaders and their associated pipeline state. The FSR2 API does not perform any dynamic memory allocation.
|
||||
To begin using the API, the application should first create a [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164) structure. This structure should be located somewhere with a lifetime approximately matching that of your backbuffer; somewhere on the application's heap is usually a good choice. By calling [`ffxFsr2ContextCreate`](src/ffx-fsr2-api/ffx_fsr2.h#L213) the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164) structure will be populated with the data it requires. Moreover, a number of calls will be made from [`ffxFsr2ContextCreate`](src/ffx-fsr2-api/ffx_fsr2.h#L213) to the backend which is provided to [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164) as part of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure. These calls will perform such tasks as creating intermediate resources required by FSR2 and setting up shaders and their associated pipeline state. The FSR2 API does not perform any dynamic memory allocation.
|
||||
|
||||
Each frame of your application where upscaling is required, you should call [`ffxFsr2ContextDispatch`](../src/ffx-fsr2-api/ffx_fsr2.h#L254). This function accepts the [`FfxFsr2Context`](../src/ffx-fsr2-api/ffx_fsr2.h#L164) structure that was created earlier in the application's lifetime as well as a description of precisely how upscaling should be performed and on which data. This description is provided by the application filling out a [`FfxFsr2DispatchDescription`](../src/ffx-fsr2-api/ffx_fsr2.h#L114) structure.
|
||||
Each frame of your application where upscaling is required, you should call [`ffxFsr2ContextDispatch`](src/ffx-fsr2-api/ffx_fsr2.h#L254). This function accepts the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164) structure that was created earlier in the application's lifetime as well as a description of precisely how upscaling should be performed and on which data. This description is provided by the application filling out a [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) structure.
Destroying the context is performed by calling [`ffxFsr2ContextDestroy`](src/ffx-fsr2-api/ffx_fsr2.h#L277). Please note that the GPU should be idle before attempting to call [`ffxFsr2ContextDestroy`](src/ffx-fsr2-api/ffx_fsr2.h#L277); the function does not perform implicit synchronization to ensure that resources being accessed by FSR2 are not currently in flight. The reason for this choice is to avoid FSR2 introducing additional GPU flushes for applications which already perform adequate synchronization at the point where they might wish to destroy the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164). This allows an application to perform the most efficient possible creation and teardown of the FSR2 API when required.
There are additional helper functions which are provided as part of the FSR2 API. These helper functions perform tasks like the computation of sub-pixel jittering offsets, as well as the calculation of rendering resolutions based on dispatch resolutions and the default [scaling modes](#scaling-modes) provided by FSR2.
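
Putting the steps above together, the intended call ordering can be sketched as below. The function bodies here are hypothetical mocks so that the sketch is self-contained; the real entry points declared in `ffx_fsr2.h` take fully populated description structures and a backend interface.

``` CPP
#include <cstdint>

// Hypothetical stand-ins for the FSR2 entry points, so this sketch compiles on
// its own. Only the call ordering is taken from the documentation above.
typedef int32_t FfxErrorCode;
const FfxErrorCode FFX_OK = 0;

struct FfxFsr2Context { bool alive = false; };

FfxErrorCode ffxFsr2ContextCreate(FfxFsr2Context* context)   { context->alive = true;  return FFX_OK; }
FfxErrorCode ffxFsr2ContextDispatch(FfxFsr2Context* context) { return context->alive ? FFX_OK : 1; }
FfxErrorCode ffxFsr2ContextDestroy(FfxFsr2Context* context)  { context->alive = false; return FFX_OK; }

// Create the context once, dispatch every frame that requires upscaling, then
// destroy it only after the GPU is idle.
bool runUpscalerLifetime(int frameCount)
{
    FfxFsr2Context context; // lifetime should roughly match the backbuffer's

    if (ffxFsr2ContextCreate(&context) != FFX_OK)
        return false;

    for (int frame = 0; frame < frameCount; ++frame)
        if (ffxFsr2ContextDispatch(&context) != FFX_OK)
            return false;

    // The real ffxFsr2ContextDestroy performs no implicit synchronization, so
    // the application must wait for the GPU to go idle before this call.
    return ffxFsr2ContextDestroy(&context) == FFX_OK;
}
```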
@ -327,7 +327,7 @@ Out of the box, the FSR2 API will compile into multiple libraries following the
## Memory management
If the FSR2 API is used with one of the supplied backends (e.g. DirectX(R)12 or Vulkan(R)) then all the resources required by FSR2 are created as committed resources directly using the graphics device provided by the host application. However, by overriding the create and destroy family of functions present in the backend interface it is possible for an application to more precisely control the memory management of FSR2.
To do this, you can either provide a full custom backend to FSR2 via the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure passed to the [`ffxFsr2ContextCreate`](src/ffx-fsr2-api/ffx_fsr2.h#L213) function, or you can retrieve the backend for your desired API and override the resource creation and destruction functions to handle them yourself, simply by overwriting the [`fpCreateResource`](src/ffx-fsr2-api/ffx_fsr2_interface.h#L399) and [`fpDestroyResource`](src/ffx-fsr2-api/ffx_fsr2_interface.h#L403) function pointers.
``` CPP
// Setup DX12 interface.
@ -355,7 +355,7 @@ errorCode = ffxFsr2ContextCreate(&context, &contextDescription);
FFX_ASSERT(errorCode == FFX_OK);
```
One interesting advantage of an application taking control of the memory management required for FSR2 is that resource aliasing may be performed, which can yield a memory saving. The table present in [Memory requirements](#memory-requirements) demonstrates the savings available through using this technique. In order to realise the savings shown in this table, an appropriate area of memory - the contents of which are not required to survive across a call to the FSR2 dispatches - should be found to share with the aliasable resources required for FSR2. Each [`FfxFsr2CreateResourceFunc`](src/ffx-fsr2-api/ffx_fsr2_interface.h#L399) call made by FSR2's core API through the FSR2 backend interface will contain a set of flags as part of the [`FfxCreateResourceDescription`](src/ffx-fsr2-api/ffx_types.h#L251) structure. If [`FFX_RESOURCE_FLAGS_ALIASABLE`](src/ffx-fsr2-api/ffx_types.h#L101) is set in the [`flags`](src/ffx-fsr2-api/ffx_types.h#L208) field, this indicates that the resource may be safely aliased with other resources in the rendering frame.
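
A custom create-resource callback can inspect this flag to decide whether a resource may live in a shared, aliasable area of memory. The types below are simplified stand-ins (including the bit value chosen for the flag) so the sketch is self-contained; the real declarations live in `ffx_types.h` and `ffx_fsr2_interface.h`.

``` CPP
#include <cstdint>

// Simplified stand-ins for the FSR2 types referenced above; the bit value of
// the flag here is illustrative, not the value from ffx_types.h.
const uint32_t FFX_RESOURCE_FLAGS_ALIASABLE = 1u << 0;

struct FfxCreateResourceDescription {
    uint32_t flags; // FSR2 sets FFX_RESOURCE_FLAGS_ALIASABLE on transient resources
};

// A custom fpCreateResource callback could route aliasable resources into a
// shared memory area whose contents need not survive across FSR2 dispatches.
bool isAliasable(const FfxCreateResourceDescription& desc)
{
    return (desc.flags & FFX_RESOURCE_FLAGS_ALIASABLE) != 0;
}
```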
## Temporal Antialiasing
Temporal antialiasing (TAA) is a technique which uses the output of previous frames to construct a higher quality output from the current frame. As FSR2 has a similar goal - albeit with the additional goal of also increasing the resolution of the rendered image - there is no longer any need to include a separate TAA pass in your application.
@ -372,7 +372,7 @@ Internally, these function implement a Halton[2,3] sequence [[Halton](#reference
![alt text](docs/media/super-resolution-temporal/jitter-space.svg "A diagram showing how to map sub-pixel jitter offsets to projection offsets.")
It is important to understand that the values returned from the [`ffxFsr2GetJitterOffset`](src/ffx-fsr2-api/ffx_fsr2.h#L422) function are in unit pixel space, and in order to composite them correctly into a projection matrix we must convert them into projection offsets. The diagram above shows a single pixel in unit pixel space, and in projection space. The code listing below shows how to correctly composite the sub-pixel jitter offset value into a projection matrix.
``` CPP
const int32_t jitterPhaseCount = ffxFsr2GetJitterPhaseCount(renderWidth, displayWidth);
@ -388,9 +388,9 @@ const Matrix4 jitterTranslationMatrix = translateMatrix(Matrix3::identity, Vecto
const Matrix4 jitteredProjectionMatrix = jitterTranslationMatrix * projectionMatrix;
```
Jitter should be applied to *all* rendering. This includes opaque, alpha transparent, and raytraced objects. For rasterized objects, the sub-pixel jittering values calculated by the [`ffxFsr2GetJitterOffset`](src/ffx-fsr2-api/ffx_fsr2.h#L422) function can be applied to the camera projection matrix which is ultimately used to perform transformations during vertex shading. For raytraced rendering, the sub-pixel jitter should be applied to the ray's origin - often the camera's position.
Whether you elect to use the recommended [`ffxFsr2GetJitterOffset`](src/ffx-fsr2-api/ffx_fsr2.h#L422) function or your own sequence generator, you must set the [`jitterOffset`](src/ffx-fsr2-api/ffx_fsr2.h#L124) field of the [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) structure to inform FSR2 of the jitter offset that has been applied in order to render each frame. Moreover, if not using the recommended [`ffxFsr2GetJitterOffset`](src/ffx-fsr2-api/ffx_fsr2.h#L422) function, care should be taken that your jitter sequence never generates a null vector; that is, a value of 0 in both the X and Y dimensions.
The table below shows the jitter sequence length for each of the default quality modes.
@ -403,7 +403,7 @@ The table below shows the jitter sequence length for each of the default quality
| Custom | [1..n]x (per dimension) | `ceil(8 * n^2)` |
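
The recommended helpers can be approximated in plain C++ as below. The Halton(2,3) sequence and the `ceil(8 * n^2)` phase-count formula come from the text above; the helper names mirror the API for readability, but the exact indexing (skipping index zero so a null vector is never produced) is an assumption for illustration rather than the library's actual implementation.

``` CPP
#include <cmath>
#include <cstdint>

// Radical inverse in the given base: the classic Halton low-discrepancy sequence.
float haltonSequence(int32_t index, int32_t base)
{
    float result = 0.0f;
    float fraction = 1.0f;
    while (index > 0) {
        fraction /= (float)base;
        result += fraction * (float)(index % base);
        index /= base;
    }
    return result;
}

// Jitter phase count from the table above: ceil(8 * n^2), where n is the
// upscaling ratio per dimension (displayWidth / renderWidth).
int32_t jitterPhaseCount(int32_t renderWidth, int32_t displayWidth)
{
    const float ratio = (float)displayWidth / (float)renderWidth;
    return (int32_t)std::ceil(8.0f * ratio * ratio);
}

// Sub-pixel jitter offset in [-0.5, 0.5) from the Halton(2,3) sequence.
// Index zero is skipped so that x and y are never simultaneously zero.
void jitterOffset(int32_t index, int32_t phaseCount, float* x, float* y)
{
    const int32_t phase = (index % phaseCount) + 1;
    *x = haltonSequence(phase, 2) - 0.5f;
    *y = haltonSequence(phase, 3) - 0.5f;
}
```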
## Camera jump cuts
Most applications with real-time rendering have a large degree of temporal consistency between any two consecutive frames. However, there are cases where a change to a camera's transformation might cause an abrupt change in what is rendered. In such cases, FSR2 is unlikely to be able to reuse any data it has accumulated from previous frames, and should clear this data so as to exclude it from consideration in the compositing process. In order to indicate to FSR2 that a jump cut has occurred with the camera, you should set the [`reset`](src/ffx-fsr2-api/ffx_fsr2.h#L131) field of the [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) structure to `true` for the first frame of the discontinuous camera transformation.
Rendering performance may be slightly less than typical frame-to-frame operation when using the reset flag, as FSR2 will clear some additional internal resources.
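
As a sketch, the `reset` flag might be driven as below. The `MockDispatchDescription` type and the distance-threshold heuristic are purely illustrative; most applications know about cuts explicitly (for example from a cinematic or camera system), and the real `FfxFsr2DispatchDescription` carries many more fields.

``` CPP
#include <cmath>

// Minimal stand-ins: the real dispatch description carries many more fields.
struct Vec3 { float x, y, z; };
struct MockDispatchDescription { bool reset = false; };

// One illustrative heuristic: treat a large camera translation between two
// consecutive frames as a jump cut.
bool isJumpCut(const Vec3& prevPos, const Vec3& currPos, float threshold)
{
    const float dx = currPos.x - prevPos.x;
    const float dy = currPos.y - prevPos.y;
    const float dz = currPos.z - prevPos.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) > threshold;
}

// Set reset to true only for the first frame of the discontinuity.
void prepareDispatch(MockDispatchDescription& desc, const Vec3& prevPos, const Vec3& currPos)
{
    desc.reset = isJumpCut(prevPos, currPos, 10.0f /* illustrative threshold */);
}
```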
@ -426,7 +426,7 @@ The following table illustrates the mipmap biasing factor which results from eva
| Ultra performance | 3.0X (per dimension) | -2.58 |
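
The bias values in the table above are consistent with deriving the mipmap bias from the per-dimension scaling ratio as `log2(1 / ratio) - 1`; the sketch below reproduces the `-2.58` figure for the 3.0x "Ultra performance" mode. The function name is illustrative.

``` CPP
#include <cmath>

// Mipmap bias from the per-dimension upscaling ratio, matching the table above:
// mipBias = log2(renderResolution / displayResolution) - 1 = log2(1 / ratio) - 1.
// For the 3.0x mode: log2(1 / 3) - 1 ~= -2.58.
float mipBias(float upscaleRatioPerDimension)
{
    return std::log2(1.0f / upscaleRatioPerDimension) - 1.0f;
}
```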
## HDR support
High dynamic range images are supported in FSR2. To enable this, you should set the [`FFX_FSR2_ENABLE_HIGH_DYNAMIC_RANGE`](src/ffx-fsr2-api/ffx_fsr2.h#L87) bit in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure. Images should be provided to FSR2 in linear color space.
> Support for additional color spaces might be provided in a future revision of FSR2.
@ -483,7 +483,7 @@ The following table contains all resources consumed by the [Compute luminance py
| Name | Temporal layer | Resolution | Format | Type | Notes |
| ----------------|-----------------|--------------|-------------------------|-----------|----------------------------------------------|
| Color buffer | Current frame | Render | `APPLICATION SPECIFIED` | Texture | The render resolution color buffer for the current frame provided by the application. If the contents of the color buffer are in high dynamic range (HDR), then the [`FFX_FSR2_ENABLE_HIGH_DYNAMIC_RANGE`](src/ffx-fsr2-api/ffx_fsr2.h#L87) flag should be set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure. |
### Resource outputs
The following table contains all resources produced or modified by the [Compute luminance pyramid](#compute-luminance-pyramid) stage.
@ -492,11 +492,11 @@ The following table contains all resources produced or modified by the [Compute
| Name | Temporal layer | Resolution | Format | Type | Notes |
| ----------------------------|-----------------|------------------|-------------------------|-----------|----------------------------------------------|
| Exposure | Current frame | 1x1 | `R32_FLOAT` | Texture | A 1x1 texture containing the exposure value computed for the current frame. This resource is optional, and may be omitted if the [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](src/ffx-fsr2-api/ffx_fsr2.h#L92) flag is set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164). |
| Current luminance | Current frame | `Render * 0.5` | `R16_FLOAT` | Texture | A texture at 50% of render resolution which contains the luminance of the current frame. |
### Description
The [Compute luminance pyramid](#compute-luminance-pyramid) stage is implemented using FidelityFX [Single Pass Downsampler](single-pass-downsampler.md), an optimized technique for producing mipmap chains using a single compute shader dispatch. Instead of the conventional (full) pyramidal approach, SPD provides a mechanism to produce a specific set of mipmap levels for an arbitrary input texture, as well as performing arbitrary calculations on that data as we store it to the target location in memory. In FSR2, we are interested in producing up to two intermediate resources, depending on the configuration of the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164). The first resource is a low-resolution representation of the current luminance, which is used later in FSR2 to attempt to detect shading changes. The second is the exposure value, and while it is always computed, it is only used by subsequent stages if the [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](src/ffx-fsr2-api/ffx_fsr2.h#L92) flag is set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure upon context creation. The exposure value - either from the application, or the [Compute luminance pyramid](#compute-luminance-pyramid) stage - is used in the [Adjust input color](#adjust-input-color) stage of FSR2, as well as by the [Reproject & accumulate](#reproject-and-accumulate) stage.
![alt text](docs/media/super-resolution-temporal/auto-exposure.svg "A diagram showing the mipmap levels written by auto-exposure.")
@ -542,8 +542,8 @@ The following table contains all resources consumed by the [Adjust input color](
| Name | Temporal layer | Resolution | Format | Type | Notes |
| ----------------|-----------------|--------------|---------------------------|-----------|----------------------------------------------|
| Color buffer | Current frame | Render | `APPLICATION SPECIFIED` | Texture | The render resolution color buffer for the current frame provided by the application. If the contents of the color buffer are in high dynamic range (HDR), then the [`FFX_FSR2_ENABLE_HIGH_DYNAMIC_RANGE`](src/ffx-fsr2-api/ffx_fsr2.h#L87) flag should be set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure. |
| Exposure | Current frame | 1x1 | ``R32_FLOAT`` | Texture | A 1x1 texture containing the exposure value computed for the current frame. This resource can be supplied by the application, or computed by the [Compute luminance pyramid](#compute-luminance-pyramid) stage of FSR2 if the [`FFX_FSR2_ENABLE_AUTO_EXPOSURE`](src/ffx-fsr2-api/ffx_fsr2.h#L92) flag is set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure. |
### Resource outputs
The following table contains all resources produced or modified by the [Adjust input color](#adjust-input-color) stage.
@ -592,8 +592,8 @@ The following table contains all of the resources which are required by the reco
| Name | Temporal layer | Resolution | Format | Type | Notes |
| ----------------------------|-----------------|------------|------------------------------------|-----------|------------------------------------------------|
| Depth buffer | Current frame | Render | `APPLICATION SPECIFIED (1x FLOAT)` | Texture | The render resolution depth buffer for the current frame provided by the application. The data should be provided as a single floating point value, the precision of which is under the application's control. The configuration of the depth should be communicated to FSR2 via the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164). You should set the [`FFX_FSR2_ENABLE_DEPTH_INVERTED`](src/ffx-fsr2-api/ffx_fsr2.h#L90) flag if your depth buffer is inverted (that is, [1..0] range), and you should set the `FFX_FSR2_ENABLE_DEPTH_INFINITE` flag if your depth buffer has an infinite far plane. If the application provides the depth buffer in `D32S8` format, then FSR2 will ignore the stencil component of the buffer, and create an `R32_FLOAT` resource to address the depth buffer. On GCN and RDNA hardware, depth buffers are stored separately from stencil buffers. |
| Motion vectors | Current frame | Render or presentation | `APPLICATION SPECIFIED (2x FLOAT)` | Texture | The 2D motion vectors for the current frame provided by the application in [*(<-width, -height>*..*<width, height>*] range. If your application renders motion vectors with a different range, you may use the [`motionVectorScale`](src/ffx-fsr2-api/ffx_fsr2.h#L125) field of the [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) structure to adjust them to match the expected range for FSR2. Internally, FSR2 uses 16bit quantities to represent motion vectors in many cases, which means that while motion vectors with greater precision can be provided, FSR2 will not benefit from the increased precision. The resolution of the motion vector buffer should be equal to the render resolution, unless the [`FFX_FSR2_ENABLE_DISPLAY_RESOLUTION_MOTION_VECTORS`](src/ffx-fsr2-api/ffx_fsr2.h#L88) flag is set in the [`flags`](src/ffx-fsr2-api/ffx_fsr2.h#L103) field of the [`FfxFsr2ContextDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L101) structure when creating the [`FfxFsr2Context`](src/ffx-fsr2-api/ffx_fsr2.h#L164), in which case it should be equal to the presentation resolution. |
### Resource outputs
The following table contains all of the resources which are produced by the reconstruct & dilate stage.
@ -609,7 +609,7 @@ The following table contains all of the resources which are produced by the reco
### Description
The first step of the [Reconstruct & dilate](#reconstruct-and-dilate) stage is to compute the dilated depth values and motion vectors from the application's depth values and motion vectors for the current frame. Dilated depth values and motion vectors emphasise the edges of geometry which has been rendered into the depth buffer. This is because the edges of geometry will often introduce discontinuities into a contiguous series of depth values, meaning that as depth values and motion vectors are dilated, they will naturally follow the contours of the geometric edges present in the depth buffer. In order to compute the dilated depth values and motion vectors, FSR2 looks at the depth values for a 3x3 neighbourhood for each pixel and then selects the depth values and motion vectors in that neighbourhood where the depth value is nearest to the camera. In the diagram below, you can see how the central pixel of the 3x3 kernel is updated with the depth value and motion vectors from the pixel with the largest depth value - the pixel on the central, right hand side.
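
A minimal sketch of this dilation step is shown below. It assumes an inverted depth convention in which a larger stored value is nearer to the camera, matching the "largest depth value" wording above; with a conventional depth buffer the comparison would flip. The names and data layout are illustrative, not FSR2's actual implementation.

``` CPP
#include <cstdint>
#include <vector>

struct MotionVector { float x, y; };

// Nearest-depth dilation over a 3x3 neighbourhood: each pixel takes the depth
// value and the motion vector of whichever neighbour is nearest to the camera,
// so motion vectors naturally follow the contours of geometric edges.
void dilate3x3(const std::vector<float>& depth,
               const std::vector<MotionVector>& motion,
               int32_t width, int32_t height,
               std::vector<float>& dilatedDepth,
               std::vector<MotionVector>& dilatedMotion)
{
    dilatedDepth = depth;
    dilatedMotion = motion;
    for (int32_t y = 0; y < height; ++y) {
        for (int32_t x = 0; x < width; ++x) {
            int32_t best = y * width + x;
            // Scan the neighbourhood, clamped at the image border.
            for (int32_t dy = -1; dy <= 1; ++dy) {
                for (int32_t dx = -1; dx <= 1; ++dx) {
                    const int32_t nx = x + dx;
                    const int32_t ny = y + dy;
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                        continue;
                    const int32_t idx = ny * width + nx;
                    if (depth[idx] > depth[best]) // inverted depth: larger = nearer
                        best = idx;
                }
            }
            dilatedDepth[y * width + x] = depth[best];
            dilatedMotion[y * width + x] = motion[best];
        }
    }
}
```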
As this stage is the first time that motion vectors are consumed by FSR2, this is where motion vector scaling is applied if using the FSR2 host API. Motion vector scaling factors are provided via the [`motionVectorScale`](src/ffx-fsr2-api/ffx_fsr2.h#L125) field of the [`FfxFsr2DispatchDescription`](src/ffx-fsr2-api/ffx_fsr2.h#L114) structure and allow you to transform non-screenspace motion vectors into the screenspace motion vectors which FSR2 expects.
``` CPP
// An example of how to manipulate motion vector scaling factors using the FSR2 host API.
```

@ -1,11 +0,0 @@

include(common)

add_library(winpixeventruntimelib SHARED IMPORTED GLOBAL)
set_property(TARGET winpixeventruntimelib PROPERTY IMPORTED_IMPLIB ${CMAKE_CURRENT_SOURCE_DIR}/WinPixEventRuntime.lib)

set(WINPIXEVENT_BIN
    "${CMAKE_CURRENT_SOURCE_DIR}/WinPixEventRuntime.dll"
)

copyTargetCommand("${WINPIXEVENT_BIN}" ${CMAKE_RUNTIME_OUTPUT_DIRECTORY} copied_winpixevent_bin)
add_dependencies(winpixeventruntimelib copied_winpixevent_bin)

@ -1,531 +0,0 @@

/*==========================================================================;
 *
 *  Copyright (C) Microsoft Corporation. All Rights Reserved.
 *
 *  File:    PIXEvents.h
 *  Content: PIX include file
 *           Don't include this file directly - use pix3.h
 *
 ****************************************************************************/
#pragma once

#ifndef _PixEvents_H_
#define _PixEvents_H_

#ifndef _PIX3_H_
# error Do not include this file directly - use pix3.h
#endif

#include "PIXEventsCommon.h"

#if defined(XBOX) || defined(_XBOX_ONE) || defined(_DURANGO)
# define PIX_XBOX
#endif

#if _MSC_VER < 1800
# error This version of pix3.h is only supported on Visual Studio 2013 or higher
#elif _MSC_VER < 1900
# ifndef constexpr // Visual Studio 2013 doesn't support constexpr
#  define constexpr
#  define PIX3__DEFINED_CONSTEXPR
# endif
#endif

namespace PIXEventsDetail
{
    template<typename... ARGS>
    struct PIXEventTypeInferer
    {
        static constexpr PIXEventType Begin() { return PIXEvent_BeginEvent_VarArgs; }
        static constexpr PIXEventType SetMarker() { return PIXEvent_SetMarker_VarArgs; }
        static constexpr PIXEventType BeginOnContext() { return PIXEvent_BeginEvent_OnContext_VarArgs; }
        static constexpr PIXEventType SetMarkerOnContext() { return PIXEvent_SetMarker_OnContext_VarArgs; }

        // Xbox and Windows store different types of events for context events.
        // On Xbox these include a context argument, while on Windows they do
        // not. It is important not to change the event types used on the
        // Windows version as there are OS components (eg debug layer & DRED)
        // that decode event structs.
#ifdef PIX_XBOX
        static constexpr PIXEventType GpuBeginOnContext() { return PIXEvent_BeginEvent_OnContext_VarArgs; }
        static constexpr PIXEventType GpuSetMarkerOnContext() { return PIXEvent_SetMarker_OnContext_VarArgs; }
#else
        static constexpr PIXEventType GpuBeginOnContext() { return PIXEvent_BeginEvent_VarArgs; }
        static constexpr PIXEventType GpuSetMarkerOnContext() { return PIXEvent_SetMarker_VarArgs; }
#endif
    };

    template<>
    struct PIXEventTypeInferer<void>
    {
        static constexpr PIXEventType Begin() { return PIXEvent_BeginEvent_NoArgs; }
        static constexpr PIXEventType SetMarker() { return PIXEvent_SetMarker_NoArgs; }
        static constexpr PIXEventType BeginOnContext() { return PIXEvent_BeginEvent_OnContext_NoArgs; }
        static constexpr PIXEventType SetMarkerOnContext() { return PIXEvent_SetMarker_OnContext_NoArgs; }

#ifdef PIX_XBOX
        static constexpr PIXEventType GpuBeginOnContext() { return PIXEvent_BeginEvent_OnContext_NoArgs; }
        static constexpr PIXEventType GpuSetMarkerOnContext() { return PIXEvent_SetMarker_OnContext_NoArgs; }
#else
        static constexpr PIXEventType GpuBeginOnContext() { return PIXEvent_BeginEvent_NoArgs; }
        static constexpr PIXEventType GpuSetMarkerOnContext() { return PIXEvent_SetMarker_NoArgs; }
#endif
    };
|
||||
|
||||
inline void PIXCopyEventArguments(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit)
|
||||
{
|
||||
// nothing
|
||||
}
|
||||
|
||||
template<typename ARG, typename... ARGS>
|
||||
void PIXCopyEventArguments(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, ARG const& arg, ARGS const&... args)
|
||||
{
|
||||
PIXCopyEventArgument(destination, limit, arg);
|
||||
PIXCopyEventArguments(destination, limit, args...);
|
||||
}
|
||||
|
||||
template<typename STR, typename... ARGS>
|
||||
__declspec(noinline) void PIXBeginEventAllocate(PIXEventsThreadInfo* threadInfo, UINT64 color, STR formatString, ARGS... args)
|
||||
{
|
||||
UINT64 time = PIXEventsReplaceBlock(threadInfo, false);
|
||||
if (!time)
|
||||
return;
|
||||
|
||||
UINT64* destination = threadInfo->destination;
|
||||
UINT64* limit = threadInfo->biasedLimit;
|
||||
if (destination >= limit)
|
||||
return;
|
||||
|
||||
limit += PIXEventsSafeFastCopySpaceQwords;
|
||||
*destination++ = PIXEncodeEventInfo(time, PIXEventTypeInferer<ARGS...>::Begin());
|
||||
*destination++ = color;
|
||||
|
||||
PIXCopyEventArguments(destination, limit, formatString, args...);
|
||||
|
||||
*destination = PIXEventsBlockEndMarker;
|
||||
threadInfo->destination = destination;
|
||||
}
|
||||
|
||||
template<typename STR, typename... ARGS>
|
||||
void PIXBeginEvent(UINT64 color, STR formatString, ARGS... args)
|
||||
{
|
||||
PIXEventsThreadInfo* threadInfo = PIXGetThreadInfo();
|
||||
UINT64* destination = threadInfo->destination;
|
||||
UINT64* limit = threadInfo->biasedLimit;
|
||||
|
||||
if (destination < limit)
|
||||
{
|
||||
limit += PIXEventsSafeFastCopySpaceQwords;
|
||||
UINT64 time = PIXGetTimestampCounter();
|
||||
*destination++ = PIXEncodeEventInfo(time, PIXEventTypeInferer<ARGS...>::Begin());
|
||||
*destination++ = color;
|
||||
|
||||
PIXCopyEventArguments(destination, limit, formatString, args...);
|
||||
|
||||
*destination = PIXEventsBlockEndMarker;
|
||||
threadInfo->destination = destination;
|
||||
}
|
||||
else if (limit != nullptr)
|
||||
{
|
||||
        PIXBeginEventAllocate(threadInfo, color, formatString, args...);
    }
}

template<typename STR, typename... ARGS>
__declspec(noinline) void PIXSetMarkerAllocate(PIXEventsThreadInfo* threadInfo, UINT64 color, STR formatString, ARGS... args)
{
    UINT64 time = PIXEventsReplaceBlock(threadInfo, false);
    if (!time)
        return;

    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;

    if (destination >= limit)
        return;

    limit += PIXEventsSafeFastCopySpaceQwords;
    *destination++ = PIXEncodeEventInfo(time, PIXEventTypeInferer<ARGS...>::SetMarker());
    *destination++ = color;

    PIXCopyEventArguments(destination, limit, formatString, args...);

    *destination = PIXEventsBlockEndMarker;
    threadInfo->destination = destination;
}

template<typename STR, typename... ARGS>
void PIXSetMarker(UINT64 color, STR formatString, ARGS... args)
{
    PIXEventsThreadInfo* threadInfo = PIXGetThreadInfo();
    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;
    if (destination < limit)
    {
        limit += PIXEventsSafeFastCopySpaceQwords;
        UINT64 time = PIXGetTimestampCounter();
        *destination++ = PIXEncodeEventInfo(time, PIXEventTypeInferer<ARGS...>::SetMarker());
        *destination++ = color;

        PIXCopyEventArguments(destination, limit, formatString, args...);

        *destination = PIXEventsBlockEndMarker;
        threadInfo->destination = destination;
    }
    else if (limit != nullptr)
    {
        PIXSetMarkerAllocate(threadInfo, color, formatString, args...);
    }
}

#if !PIX_XBOX
template<typename STR, typename... ARGS>
__declspec(noinline) void PIXBeginEventOnContextCpuAllocate(PIXEventsThreadInfo* threadInfo, void* context, UINT64 color, STR formatString, ARGS... args)
{
    UINT64 time = PIXEventsReplaceBlock(threadInfo, false);
    if (!time)
        return;

    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;

    if (destination >= limit)
        return;

    limit += PIXEventsSafeFastCopySpaceQwords;
    *destination++ = PIXEncodeEventInfo(time, PIXEventTypeInferer<ARGS...>::BeginOnContext());
    *destination++ = color;

    PIXCopyEventArguments(destination, limit, context, formatString, args...);

    *destination = PIXEventsBlockEndMarker;
    threadInfo->destination = destination;
}

template<typename STR, typename... ARGS>
void PIXBeginEventOnContextCpu(void* context, UINT64 color, STR formatString, ARGS... args)
{
    PIXEventsThreadInfo* threadInfo = PIXGetThreadInfo();
    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;
    if (destination < limit)
    {
        limit += PIXEventsSafeFastCopySpaceQwords;
        UINT64 time = PIXGetTimestampCounter();
        *destination++ = PIXEncodeEventInfo(time, PIXEventTypeInferer<ARGS...>::BeginOnContext());
        *destination++ = color;

        PIXCopyEventArguments(destination, limit, context, formatString, args...);

        *destination = PIXEventsBlockEndMarker;
        threadInfo->destination = destination;
    }
    else if (limit != nullptr)
    {
        PIXBeginEventOnContextCpuAllocate(threadInfo, context, color, formatString, args...);
    }
}
#endif

template<typename CONTEXT, typename STR, typename... ARGS>
void PIXBeginEvent(CONTEXT* context, UINT64 color, STR formatString, ARGS... args)
{
#if PIX_XBOX
    PIXBeginEvent(color, formatString, args...);
#else
    PIXBeginEventOnContextCpu(context, color, formatString, args...);
#endif

    // TODO: we've already encoded this once for the CPU event - figure out way to avoid doing it again
    UINT64 buffer[PIXEventsGraphicsRecordSpaceQwords];
    UINT64* destination = buffer;
    UINT64* limit = buffer + PIXEventsGraphicsRecordSpaceQwords - PIXEventsReservedTailSpaceQwords;

    *destination++ = PIXEncodeEventInfo(0, PIXEventTypeInferer<ARGS...>::GpuBeginOnContext());
    *destination++ = color;

    PIXCopyEventArguments(destination, limit, formatString, args...);
    *destination = 0ull;

    PIXBeginGPUEventOnContext(context, static_cast<void*>(buffer), static_cast<UINT>(reinterpret_cast<BYTE*>(destination) - reinterpret_cast<BYTE*>(buffer)));
}

#if !PIX_XBOX
template<typename STR, typename... ARGS>
__declspec(noinline) void PIXSetMarkerOnContextCpuAllocate(PIXEventsThreadInfo* threadInfo, void* context, UINT64 color, STR formatString, ARGS... args)
{
    UINT64 time = PIXEventsReplaceBlock(threadInfo, false);
    if (!time)
        return;

    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;

    if (destination >= limit)
        return;

    limit += PIXEventsSafeFastCopySpaceQwords;
    *destination++ = PIXEncodeEventInfo(time, PIXEventTypeInferer<ARGS...>::SetMarkerOnContext());
    *destination++ = color;

    PIXCopyEventArguments(destination, limit, context, formatString, args...);

    *destination = PIXEventsBlockEndMarker;
    threadInfo->destination = destination;
}

template<typename STR, typename... ARGS>
void PIXSetMarkerOnContextCpu(void* context, UINT64 color, STR formatString, ARGS... args)
{
    PIXEventsThreadInfo* threadInfo = PIXGetThreadInfo();
    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;
    if (destination < limit)
    {
        limit += PIXEventsSafeFastCopySpaceQwords;
        UINT64 time = PIXGetTimestampCounter();
        *destination++ = PIXEncodeEventInfo(time, PIXEventTypeInferer<ARGS...>::SetMarkerOnContext());
        *destination++ = color;

        PIXCopyEventArguments(destination, limit, context, formatString, args...);

        *destination = PIXEventsBlockEndMarker;
        threadInfo->destination = destination;
    }
    else if (limit != nullptr)
    {
        PIXSetMarkerOnContextCpuAllocate(threadInfo, context, color, formatString, args...);
    }
}
#endif

template<typename CONTEXT, typename STR, typename... ARGS>
void PIXSetMarker(CONTEXT* context, UINT64 color, STR formatString, ARGS... args)
{
#if PIX_XBOX
    PIXSetMarker(color, formatString, args...);
#else
    PIXSetMarkerOnContextCpu(context, color, formatString, args...);
#endif

    UINT64 buffer[PIXEventsGraphicsRecordSpaceQwords];
    UINT64* destination = buffer;
    UINT64* limit = buffer + PIXEventsGraphicsRecordSpaceQwords - PIXEventsReservedTailSpaceQwords;

    *destination++ = PIXEncodeEventInfo(0, PIXEventTypeInferer<ARGS...>::GpuSetMarkerOnContext());
    *destination++ = color;

    PIXCopyEventArguments(destination, limit, formatString, args...);
    *destination = 0ull;

    PIXSetGPUMarkerOnContext(context, static_cast<void*>(buffer), static_cast<UINT>(reinterpret_cast<BYTE*>(destination) - reinterpret_cast<BYTE*>(buffer)));
}

__declspec(noinline) inline void PIXEndEventAllocate(PIXEventsThreadInfo* threadInfo)
{
    UINT64 time = PIXEventsReplaceBlock(threadInfo, true);
    if (!time)
        return;

    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;

    if (destination >= limit)
        return;

    limit += PIXEventsSafeFastCopySpaceQwords;
    *destination++ = PIXEncodeEventInfo(time, PIXEvent_EndEvent);
    *destination = PIXEventsBlockEndMarker;
    threadInfo->destination = destination;
}

inline void PIXEndEvent()
{
    PIXEventsThreadInfo* threadInfo = PIXGetThreadInfo();
    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;
    if (destination < limit)
    {
        limit += PIXEventsSafeFastCopySpaceQwords;
        UINT64 time = PIXGetTimestampCounter();
        *destination++ = PIXEncodeEventInfo(time, PIXEvent_EndEvent);
        *destination = PIXEventsBlockEndMarker;
        threadInfo->destination = destination;
    }
    else if (limit != nullptr)
    {
        PIXEndEventAllocate(threadInfo);
    }
}

#if !PIX_XBOX
__declspec(noinline) inline void PIXEndEventOnContextCpuAllocate(PIXEventsThreadInfo* threadInfo, void* context)
{
    UINT64 time = PIXEventsReplaceBlock(threadInfo, true);
    if (!time)
        return;

    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;

    if (destination >= limit)
        return;

    limit += PIXEventsSafeFastCopySpaceQwords;
    *destination++ = PIXEncodeEventInfo(time, PIXEvent_EndEvent_OnContext);
    PIXCopyEventArgument(destination, limit, context);
    *destination = PIXEventsBlockEndMarker;
    threadInfo->destination = destination;
}

inline void PIXEndEventOnContextCpu(void* context)
{
    PIXEventsThreadInfo* threadInfo = PIXGetThreadInfo();
    UINT64* destination = threadInfo->destination;
    UINT64* limit = threadInfo->biasedLimit;
    if (destination < limit)
    {
        limit += PIXEventsSafeFastCopySpaceQwords;
        UINT64 time = PIXGetTimestampCounter();
        *destination++ = PIXEncodeEventInfo(time, PIXEvent_EndEvent_OnContext);
        PIXCopyEventArgument(destination, limit, context);
        *destination = PIXEventsBlockEndMarker;
        threadInfo->destination = destination;
    }
    else if (limit != nullptr)
    {
        PIXEndEventOnContextCpuAllocate(threadInfo, context);
    }
}
#endif

template<typename CONTEXT>
void PIXEndEvent(CONTEXT* context)
{
#if PIX_XBOX
    PIXEndEvent();
#else
    PIXEndEventOnContextCpu(context);
#endif
    PIXEndGPUEventOnContext(context);
}

}

template<typename... ARGS>
void PIXBeginEvent(UINT64 color, PCWSTR formatString, ARGS... args)
{
    PIXEventsDetail::PIXBeginEvent(color, formatString, args...);
}

template<typename... ARGS>
void PIXBeginEvent(UINT64 color, PCSTR formatString, ARGS... args)
{
    PIXEventsDetail::PIXBeginEvent(color, formatString, args...);
}

template<typename... ARGS>
void PIXSetMarker(UINT64 color, PCWSTR formatString, ARGS... args)
{
    PIXEventsDetail::PIXSetMarker(color, formatString, args...);
}

template<typename... ARGS>
void PIXSetMarker(UINT64 color, PCSTR formatString, ARGS... args)
{
    PIXEventsDetail::PIXSetMarker(color, formatString, args...);
}

template<typename CONTEXT, typename... ARGS>
void PIXBeginEvent(CONTEXT* context, UINT64 color, PCWSTR formatString, ARGS... args)
{
    PIXEventsDetail::PIXBeginEvent(context, color, formatString, args...);
}

template<typename CONTEXT, typename... ARGS>
void PIXBeginEvent(CONTEXT* context, UINT64 color, PCSTR formatString, ARGS... args)
{
    PIXEventsDetail::PIXBeginEvent(context, color, formatString, args...);
}

template<typename CONTEXT, typename... ARGS>
void PIXSetMarker(CONTEXT* context, UINT64 color, PCWSTR formatString, ARGS... args)
{
    PIXEventsDetail::PIXSetMarker(context, color, formatString, args...);
}

template<typename CONTEXT, typename... ARGS>
void PIXSetMarker(CONTEXT* context, UINT64 color, PCSTR formatString, ARGS... args)
{
    PIXEventsDetail::PIXSetMarker(context, color, formatString, args...);
}

inline void PIXEndEvent()
{
    PIXEventsDetail::PIXEndEvent();
}

template<typename CONTEXT>
void PIXEndEvent(CONTEXT* context)
{
    PIXEventsDetail::PIXEndEvent(context);
}

template<typename CONTEXT>
class PIXScopedEventObject
{
    CONTEXT* m_context;

public:
    template<typename... ARGS>
    PIXScopedEventObject(CONTEXT* context, UINT64 color, PCWSTR formatString, ARGS... args)
        : m_context(context)
    {
        PIXBeginEvent(context, color, formatString, args...);
    }

    template<typename... ARGS>
    PIXScopedEventObject(CONTEXT* context, UINT64 color, PCSTR formatString, ARGS... args)
        : m_context(context)
    {
        PIXBeginEvent(context, color, formatString, args...);
    }

    ~PIXScopedEventObject()
    {
        PIXEndEvent(m_context);
    }
};

template<>
class PIXScopedEventObject<void>
{
public:
    template<typename... ARGS>
    PIXScopedEventObject(UINT64 color, PCWSTR formatString, ARGS... args)
    {
        PIXBeginEvent(color, formatString, args...);
    }

    template<typename... ARGS>
    PIXScopedEventObject(UINT64 color, PCSTR formatString, ARGS... args)
    {
        PIXBeginEvent(color, formatString, args...);
    }

    ~PIXScopedEventObject()
    {
        PIXEndEvent();
    }
};

#define PIXConcatenate(a, b) a ## b
#define PIXGetScopedEventVariableName(a, b) PIXConcatenate(a, b)
#define PIXScopedEvent(context, ...) PIXScopedEventObject<PIXInferScopedEventType<decltype(context)>::Type> PIXGetScopedEventVariableName(pixEvent, __LINE__)(context, __VA_ARGS__)

#ifdef PIX3__DEFINED_CONSTEXPR
#undef constexpr
#undef PIX3__DEFINED_CONSTEXPR
#endif

#endif // _PIXEvents_H__
/*==========================================================================;
 *
 *  Copyright (C) Microsoft Corporation. All Rights Reserved.
 *
 *  File:       PIXEventsCommon.h
 *  Content:    PIX include file
 *              Don't include this file directly - use pix3.h
 *
 ****************************************************************************/
#pragma once

#ifndef _PIXEventsCommon_H_
#define _PIXEventsCommon_H_

#include <cstdint>

#if defined(_M_X64) || defined(_M_IX86)
#include <emmintrin.h>
#endif

//
// The PIXBeginEvent and PIXSetMarker functions have an optimized path for
// copying strings that works by copying 128 bits or 64 bits at a time. In
// some circumstances this may result in PIX logging the remaining memory
// after the null terminator.
//
// By default this optimization is enabled unless Address Sanitizer is enabled,
// since this optimization can trigger a global-buffer-overflow when copying
// string literals.
//
// The PIX_ENABLE_BLOCK_ARGUMENT_COPY macro controls whether or not this
// optimization is enabled. Applications may also explicitly set this macro
// to 0 to disable the optimization if necessary.
//

#if defined(PIX_ENABLE_BLOCK_ARGUMENT_COPY)
// Previously set values override everything
# define PIX_ENABLE_BLOCK_ARGUMENT_COPY_SET 0
#elif defined(__has_feature)
# if __has_feature(address_sanitizer)
// Disable block argument copy when address sanitizer is enabled
#  define PIX_ENABLE_BLOCK_ARGUMENT_COPY 0
#  define PIX_ENABLE_BLOCK_ARGUMENT_COPY_SET 1
# endif
#endif

#if !defined(PIX_ENABLE_BLOCK_ARGUMENT_COPY)
// Default to enabled.
# define PIX_ENABLE_BLOCK_ARGUMENT_COPY 1
# define PIX_ENABLE_BLOCK_ARGUMENT_COPY_SET 1
#endif

struct PIXEventsBlockInfo;

struct PIXEventsThreadInfo
{
    PIXEventsBlockInfo* block;
    UINT64* biasedLimit;
    UINT64* destination;
};

extern "C" UINT64 WINAPI PIXEventsReplaceBlock(PIXEventsThreadInfo* threadInfo, bool getEarliestTime) noexcept;

enum PIXEventType
{
    PIXEvent_EndEvent                     = 0x000,
    PIXEvent_BeginEvent_VarArgs           = 0x001,
    PIXEvent_BeginEvent_NoArgs            = 0x002,
    PIXEvent_SetMarker_VarArgs            = 0x007,
    PIXEvent_SetMarker_NoArgs             = 0x008,

    PIXEvent_EndEvent_OnContext           = 0x010,
    PIXEvent_BeginEvent_OnContext_VarArgs = 0x011,
    PIXEvent_BeginEvent_OnContext_NoArgs  = 0x012,
    PIXEvent_SetMarker_OnContext_VarArgs  = 0x017,
    PIXEvent_SetMarker_OnContext_NoArgs   = 0x018,
};

static const UINT64 PIXEventsReservedRecordSpaceQwords = 64;
// This is used to make sure the SSE string copy always ends its 16-byte write
// within the current block. That way only a "destination < limit" check is
// needed, instead of "destination < limit - 1". Since both pointers are
// UINT64* and SSE writes in 16-byte chunks, 8 bytes are kept in reserve, so
// even if SSE overwrites 8 extra bytes, those bytes still belong to the
// correct block; on the next iteration destination will be greater than limit.
// This reserve is also used for fixed-size UMD events and PIXEndEvent, since
// these require less space than variable-length user events and do not need a
// large reserved space.
static const UINT64 PIXEventsReservedTailSpaceQwords = 2;
static const UINT64 PIXEventsSafeFastCopySpaceQwords = PIXEventsReservedRecordSpaceQwords - PIXEventsReservedTailSpaceQwords;
static const UINT64 PIXEventsGraphicsRecordSpaceQwords = 64;

// Bits 7-19 (13 bits)
static const UINT64 PIXEventsBlockEndMarker = 0x00000000000FFF80;

// Bits 10-19 (10 bits)
static const UINT64 PIXEventsTypeReadMask  = 0x00000000000FFC00;
static const UINT64 PIXEventsTypeWriteMask = 0x00000000000003FF;
static const UINT64 PIXEventsTypeBitShift  = 10;

// Bits 20-63 (44 bits)
static const UINT64 PIXEventsTimestampReadMask  = 0xFFFFFFFFFFF00000;
static const UINT64 PIXEventsTimestampWriteMask = 0x00000FFFFFFFFFFF;
static const UINT64 PIXEventsTimestampBitShift  = 20;

inline UINT64 PIXEncodeEventInfo(UINT64 timestamp, PIXEventType eventType)
{
    return ((timestamp & PIXEventsTimestampWriteMask) << PIXEventsTimestampBitShift) |
           (((UINT64)eventType & PIXEventsTypeWriteMask) << PIXEventsTypeBitShift);
}
// Bits 60-63 (4)
static const UINT64 PIXEventsStringAlignmentWriteMask = 0x000000000000000F;
static const UINT64 PIXEventsStringAlignmentReadMask  = 0xF000000000000000;
static const UINT64 PIXEventsStringAlignmentBitShift  = 60;

// Bits 55-59 (5)
static const UINT64 PIXEventsStringCopyChunkSizeWriteMask = 0x000000000000001F;
static const UINT64 PIXEventsStringCopyChunkSizeReadMask  = 0x0F80000000000000;
static const UINT64 PIXEventsStringCopyChunkSizeBitShift  = 55;

// Bit 54
static const UINT64 PIXEventsStringIsANSIWriteMask = 0x0000000000000001;
static const UINT64 PIXEventsStringIsANSIReadMask  = 0x0040000000000000;
static const UINT64 PIXEventsStringIsANSIBitShift  = 54;

// Bit 53
static const UINT64 PIXEventsStringIsShortcutWriteMask = 0x0000000000000001;
static const UINT64 PIXEventsStringIsShortcutReadMask  = 0x0020000000000000;
static const UINT64 PIXEventsStringIsShortcutBitShift  = 53;

inline UINT64 PIXEncodeStringInfo(UINT64 alignment, UINT64 copyChunkSize, BOOL isANSI, BOOL isShortcut)
{
    return ((alignment & PIXEventsStringAlignmentWriteMask) << PIXEventsStringAlignmentBitShift) |
           ((copyChunkSize & PIXEventsStringCopyChunkSizeWriteMask) << PIXEventsStringCopyChunkSizeBitShift) |
           (((UINT64)isANSI & PIXEventsStringIsANSIWriteMask) << PIXEventsStringIsANSIBitShift) |
           (((UINT64)isShortcut & PIXEventsStringIsShortcutWriteMask) << PIXEventsStringIsShortcutBitShift);
}

template<UINT alignment, class T>
inline bool PIXIsPointerAligned(T* pointer)
{
    return !(((UINT64)pointer) & (alignment - 1));
}
// Generic template version, slower because of the additional clearing write
template<class T>
inline void PIXCopyEventArgument(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, T argument)
{
    if (destination < limit)
    {
        *destination = 0ull;
        *((T*)destination) = argument;
        ++destination;
    }
}

// int32 specialization to avoid slower double memory writes
template<>
inline void PIXCopyEventArgument<INT32>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, INT32 argument)
{
    if (destination < limit)
    {
        *reinterpret_cast<INT64*>(destination) = static_cast<INT64>(argument);
        ++destination;
    }
}

// unsigned int32 specialization to avoid slower double memory writes
template<>
inline void PIXCopyEventArgument<UINT32>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, UINT32 argument)
{
    if (destination < limit)
    {
        *destination = static_cast<UINT64>(argument);
        ++destination;
    }
}

// int64 specialization to avoid slower double memory writes
template<>
inline void PIXCopyEventArgument<INT64>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, INT64 argument)
{
    if (destination < limit)
    {
        *reinterpret_cast<INT64*>(destination) = argument;
        ++destination;
    }
}

// unsigned int64 specialization to avoid slower double memory writes
template<>
inline void PIXCopyEventArgument<UINT64>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, UINT64 argument)
{
    if (destination < limit)
    {
        *destination = argument;
        ++destination;
    }
}

// Floats must be cast to double when the data is written so they print
// correctly when the data is read back; this mirrors varargs promotion,
// where a float passed to a varargs function is converted to double.
template<>
inline void PIXCopyEventArgument<float>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, float argument)
{
    if (destination < limit)
    {
        *reinterpret_cast<double*>(destination) = static_cast<double>(argument);
        ++destination;
    }
}
// char has to be cast to a longer signed integer type because printf does
// not correctly ignore the upper bits of an unsigned long long for a char
// format specifier
template<>
inline void PIXCopyEventArgument<char>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, char argument)
{
    if (destination < limit)
    {
        *reinterpret_cast<INT64*>(destination) = static_cast<INT64>(argument);
        ++destination;
    }
}

// unsigned char has to be cast to a longer unsigned integer type because
// printf does not correctly ignore the upper bits of an unsigned long long
// for a char format specifier
template<>
inline void PIXCopyEventArgument<unsigned char>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, unsigned char argument)
{
    if (destination < limit)
    {
        *destination = static_cast<UINT64>(argument);
        ++destination;
    }
}

// bool has to be cast to an integer since it's not explicitly supported by
// string format routines; there is no format specifier for bool, but it
// works with the integer format specifiers
template<>
inline void PIXCopyEventArgument<bool>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, bool argument)
{
    if (destination < limit)
    {
        *destination = static_cast<UINT64>(argument);
        ++destination;
    }
}

inline void PIXCopyEventArgumentSlowest(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, _In_ PCSTR argument)
{
    *destination++ = PIXEncodeStringInfo(0, 8, TRUE, FALSE);
    while (destination < limit)
    {
        UINT64 c = static_cast<uint8_t>(argument[0]);
        if (!c)
        {
            *destination++ = 0;
            return;
        }
        UINT64 x = c;
        c = static_cast<uint8_t>(argument[1]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 8;
        c = static_cast<uint8_t>(argument[2]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 16;
        c = static_cast<uint8_t>(argument[3]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 24;
        c = static_cast<uint8_t>(argument[4]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 32;
        c = static_cast<uint8_t>(argument[5]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 40;
        c = static_cast<uint8_t>(argument[6]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 48;
        c = static_cast<uint8_t>(argument[7]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 56;
        *destination++ = x;
        argument += 8;
    }
}
|
||||
|
||||
inline void PIXCopyEventArgumentSlow(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, _In_ PCSTR argument)
|
||||
{
|
||||
#if PIX_ENABLE_BLOCK_ARGUMENT_COPY
|
||||
if (PIXIsPointerAligned<8>(argument))
|
||||
{
|
||||
*destination++ = PIXEncodeStringInfo(0, 8, TRUE, FALSE);
|
||||
UINT64* source = (UINT64*)argument;
|
||||
while (destination < limit)
|
||||
{
|
||||
UINT64 qword = *source++;
|
||||
*destination++ = qword;
|
||||
//check if any of the characters is a terminating zero
|
||||
if (!((qword & 0xFF00000000000000) &&
|
||||
(qword & 0xFF000000000000) &&
|
||||
(qword & 0xFF0000000000) &&
|
||||
(qword & 0xFF00000000) &&
|
||||
(qword & 0xFF000000) &&
|
||||
(qword & 0xFF0000) &&
|
||||
(qword & 0xFF00) &&
|
||||
(qword & 0xFF)))
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
else
|
||||
#endif // PIX_ENABLE_BLOCK_ARGUMENT_COPY
|
||||
{
|
||||
PIXCopyEventArgumentSlowest(destination, limit, argument);
|
||||
}
|
||||
}
|
||||
|
||||
template<>
|
||||
inline void PIXCopyEventArgument<PCSTR>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, _In_ PCSTR argument)
|
||||
{
|
||||
if (destination < limit)
|
||||
{
|
||||
if (argument != nullptr)
|
||||
{
|
||||
#if (defined(_M_X64) || defined(_M_IX86)) && PIX_ENABLE_BLOCK_ARGUMENT_COPY
|
||||
if (PIXIsPointerAligned<16>(argument))
|
||||
{
|
||||
*destination++ = PIXEncodeStringInfo(0, 16, TRUE, FALSE);
|
||||
__m128i zero = _mm_setzero_si128();
|
||||
if (PIXIsPointerAligned<16>(destination))
|
||||
{
|
||||
while (destination < limit)
|
||||
{
|
||||
__m128i mem = _mm_load_si128((__m128i*)argument);
|
||||
_mm_store_si128((__m128i*)destination, mem);
|
||||
//check if any of the characters is a terminating zero
|
||||
__m128i res = _mm_cmpeq_epi8(mem, zero);
|
||||
destination += 2;
|
||||
if (_mm_movemask_epi8(res))
|
||||
break;
|
||||
argument += 16;
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
while (destination < limit)
|
||||
{
|
||||
__m128i mem = _mm_load_si128((__m128i*)argument);
|
||||
_mm_storeu_si128((__m128i*)destination, mem);
|
||||
//check if any of the characters is a terminating zero
|
||||
__m128i res = _mm_cmpeq_epi8(mem, zero);
|
||||
destination += 2;
|
||||
if (_mm_movemask_epi8(res))
|
||||
break;
|
||||
argument += 16;
|
||||
}
|
||||
}
|
||||
}
|
||||
else
|
||||
#endif // (defined(_M_X64) || defined(_M_IX86)) && PIX_ENABLE_BLOCK_ARGUMENT_COPY
|
||||
{
|
||||
PIXCopyEventArgumentSlow(destination, limit, argument);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
*destination++ = 0ull;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
template<>
|
||||
inline void PIXCopyEventArgument<PSTR>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, _In_ PSTR argument)
|
||||
{
|
||||
PIXCopyEventArgument(destination, limit, (PCSTR)argument);
|
||||
}

inline void PIXCopyEventArgumentSlowest(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, _In_ PCWSTR argument)
{
    *destination++ = PIXEncodeStringInfo(0, 8, FALSE, FALSE);
    while (destination < limit)
    {
        UINT64 c = static_cast<uint16_t>(argument[0]);
        if (!c)
        {
            *destination++ = 0;
            return;
        }
        UINT64 x = c;
        c = static_cast<uint16_t>(argument[1]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 16;
        c = static_cast<uint16_t>(argument[2]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 32;
        c = static_cast<uint16_t>(argument[3]);
        if (!c)
        {
            *destination++ = x;
            return;
        }
        x |= c << 48;
        *destination++ = x;
        argument += 4;
    }
}

inline void PIXCopyEventArgumentSlow(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, _In_ PCWSTR argument)
{
#if PIX_ENABLE_BLOCK_ARGUMENT_COPY
    if (PIXIsPointerAligned<8>(argument))
    {
        *destination++ = PIXEncodeStringInfo(0, 8, FALSE, FALSE);
        UINT64* source = (UINT64*)argument;
        while (destination < limit)
        {
            UINT64 qword = *source++;
            *destination++ = qword;
            //check if any of the characters is a terminating zero
            //TODO: check if reversed condition is faster
            if (!((qword & 0xFFFF000000000000) &&
                (qword & 0xFFFF00000000) &&
                (qword & 0xFFFF0000) &&
                (qword & 0xFFFF)))
            {
                break;
            }
        }
    }
    else
#endif // PIX_ENABLE_BLOCK_ARGUMENT_COPY
    {
        PIXCopyEventArgumentSlowest(destination, limit, argument);
    }
}

template<>
inline void PIXCopyEventArgument<PCWSTR>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, _In_ PCWSTR argument)
{
    if (destination < limit)
    {
        if (argument != nullptr)
        {
#if (defined(_M_X64) || defined(_M_IX86)) && PIX_ENABLE_BLOCK_ARGUMENT_COPY
            if (PIXIsPointerAligned<16>(argument))
            {
                *destination++ = PIXEncodeStringInfo(0, 16, FALSE, FALSE);
                __m128i zero = _mm_setzero_si128();
                if (PIXIsPointerAligned<16>(destination))
                {
                    while (destination < limit)
                    {
                        __m128i mem = _mm_load_si128((__m128i*)argument);
                        _mm_store_si128((__m128i*)destination, mem);
                        //check if any of the characters is a terminating zero
                        __m128i res = _mm_cmpeq_epi16(mem, zero);
                        destination += 2;
                        if (_mm_movemask_epi8(res))
                            break;
                        argument += 8;
                    }
                }
                else
                {
                    while (destination < limit)
                    {
                        __m128i mem = _mm_load_si128((__m128i*)argument);
                        _mm_storeu_si128((__m128i*)destination, mem);
                        //check if any of the characters is a terminating zero
                        __m128i res = _mm_cmpeq_epi16(mem, zero);
                        destination += 2;
                        if (_mm_movemask_epi8(res))
                            break;
                        argument += 8;
                    }
                }
            }
            else
#endif // (defined(_M_X64) || defined(_M_IX86)) && PIX_ENABLE_BLOCK_ARGUMENT_COPY
            {
                PIXCopyEventArgumentSlow(destination, limit, argument);
            }
        }
        else
        {
            *destination++ = 0ull;
        }
    }
}

template<>
inline void PIXCopyEventArgument<PWSTR>(_Out_writes_to_ptr_(limit) UINT64*& destination, _In_ const UINT64* limit, _In_ PWSTR argument)
{
    PIXCopyEventArgument(destination, limit, (PCWSTR)argument);
}

#if defined(__d3d12_x_h__) || defined(__d3d12_h__)

inline void PIXSetGPUMarkerOnContext(_In_ ID3D12GraphicsCommandList* commandList, _In_reads_bytes_(size) void* data, UINT size)
{
    commandList->SetMarker(D3D12_EVENT_METADATA, data, size);
}

inline void PIXSetGPUMarkerOnContext(_In_ ID3D12CommandQueue* commandQueue, _In_reads_bytes_(size) void* data, UINT size)
{
    commandQueue->SetMarker(D3D12_EVENT_METADATA, data, size);
}

inline void PIXBeginGPUEventOnContext(_In_ ID3D12GraphicsCommandList* commandList, _In_reads_bytes_(size) void* data, UINT size)
{
    commandList->BeginEvent(D3D12_EVENT_METADATA, data, size);
}

inline void PIXBeginGPUEventOnContext(_In_ ID3D12CommandQueue* commandQueue, _In_reads_bytes_(size) void* data, UINT size)
{
    commandQueue->BeginEvent(D3D12_EVENT_METADATA, data, size);
}

inline void PIXEndGPUEventOnContext(_In_ ID3D12GraphicsCommandList* commandList)
{
    commandList->EndEvent();
}

inline void PIXEndGPUEventOnContext(_In_ ID3D12CommandQueue* commandQueue)
{
    commandQueue->EndEvent();
}

#endif // defined(__d3d12_x_h__) || defined(__d3d12_h__)

template<class T> struct PIXInferScopedEventType { typedef T Type; };
template<class T> struct PIXInferScopedEventType<const T> { typedef T Type; };
template<class T> struct PIXInferScopedEventType<T*> { typedef T Type; };
template<class T> struct PIXInferScopedEventType<T* const> { typedef T Type; };
template<> struct PIXInferScopedEventType<UINT64> { typedef void Type; };
template<> struct PIXInferScopedEventType<const UINT64> { typedef void Type; };
template<> struct PIXInferScopedEventType<INT64> { typedef void Type; };
template<> struct PIXInferScopedEventType<const INT64> { typedef void Type; };
template<> struct PIXInferScopedEventType<UINT> { typedef void Type; };
template<> struct PIXInferScopedEventType<const UINT> { typedef void Type; };
template<> struct PIXInferScopedEventType<INT> { typedef void Type; };
template<> struct PIXInferScopedEventType<const INT> { typedef void Type; };


#if PIX_ENABLE_BLOCK_ARGUMENT_COPY_SET
#undef PIX_ENABLE_BLOCK_ARGUMENT_COPY
#endif

#undef PIX_ENABLE_BLOCK_ARGUMENT_COPY_SET

#endif //_PIXEventsCommon_H_
@@ -1,144 +0,0 @@

/*==========================================================================;
 *
 *  Copyright (C) Microsoft Corporation. All Rights Reserved.
 *
 *  File:       pix3.h
 *  Content:    PIX include file
 *
 ****************************************************************************/
#pragma once

#ifndef _PIX3_H_
#define _PIX3_H_

#include <sal.h>

#ifndef __cplusplus
#error "Only C++ files can include pix3.h. C is not supported."
#endif

#if !defined(USE_PIX_SUPPORTED_ARCHITECTURE)
#if defined(_M_X64) || defined(USE_PIX_ON_ALL_ARCHITECTURES) || defined(_M_ARM64)
#define USE_PIX_SUPPORTED_ARCHITECTURE
#endif
#endif

#if !defined(USE_PIX)
#if defined(USE_PIX_SUPPORTED_ARCHITECTURE) && (defined(_DEBUG) || DBG || defined(PROFILE) || defined(PROFILE_BUILD)) && !defined(_PREFAST_)
#define USE_PIX
#endif
#endif

#if defined(USE_PIX) && !defined(USE_PIX_SUPPORTED_ARCHITECTURE)
#pragma message("Warning: PIX markers are only supported on AMD64 and ARM64")
#endif

#if defined(XBOX) || defined(_XBOX_ONE) || defined(_DURANGO) || defined(_GAMING_XBOX)
#include "pix3_xbox.h"
#else
#include "pix3_win.h"
#endif

// These flags are used by both PIXBeginCapture and PIXGetCaptureState
#define PIX_CAPTURE_TIMING                  (1 << 0)
#define PIX_CAPTURE_GPU                     (1 << 1)
#define PIX_CAPTURE_FUNCTION_SUMMARY        (1 << 2)
#define PIX_CAPTURE_FUNCTION_DETAILS        (1 << 3)
#define PIX_CAPTURE_CALLGRAPH               (1 << 4)
#define PIX_CAPTURE_INSTRUCTION_TRACE       (1 << 5)
#define PIX_CAPTURE_SYSTEM_MONITOR_COUNTERS (1 << 6)
#define PIX_CAPTURE_VIDEO                   (1 << 7)
#define PIX_CAPTURE_AUDIO                   (1 << 8)

union PIXCaptureParameters
{
    enum PIXCaptureStorage
    {
        Hybrid = 0,
        Disk,
        Memory,
    };

    struct GpuCaptureParameters
    {
        PVOID reserved;
    } GpuCaptureParameters;

    struct TimingCaptureParameters
    {
        PWSTR FileName;
        UINT32 MaximumToolingMemorySizeMb;
        PIXCaptureStorage CaptureStorage;

        BOOL CaptureGpuTiming;

        BOOL CaptureCallstacks;
        BOOL CaptureCpuSamples;
        UINT32 CpuSamplesPerSecond;
    } TimingCaptureParameters;
};

typedef PIXCaptureParameters* PPIXCaptureParameters;


#if defined(USE_PIX) && defined(USE_PIX_SUPPORTED_ARCHITECTURE)

#define PIX_EVENTS_ARE_TURNED_ON

#include "PIXEventsCommon.h"
#include "PIXEvents.h"

// Starts a programmatically controlled capture.
// captureFlags uses the PIX_CAPTURE_* family of flags to specify the type of capture to take.
extern "C" HRESULT WINAPI PIXBeginCapture1(DWORD captureFlags, _In_opt_ const PPIXCaptureParameters captureParameters);
inline HRESULT PIXBeginCapture(DWORD captureFlags, _In_opt_ const PPIXCaptureParameters captureParameters) { return PIXBeginCapture1(captureFlags, captureParameters); }

// Stops a programmatically controlled capture.
// If discard == TRUE, the captured data is discarded.
// If discard == FALSE, the captured data is saved.
extern "C" HRESULT WINAPI PIXEndCapture(BOOL discard);

extern "C" DWORD WINAPI PIXGetCaptureState();

extern "C" void WINAPI PIXReportCounter(_In_ PCWSTR name, float value);

#else

// Stub out these APIs when PIX is not in use
inline HRESULT PIXBeginCapture1(DWORD, _In_opt_ const PIXCaptureParameters*) { return S_OK; }
inline HRESULT PIXBeginCapture(DWORD, _In_opt_ const PIXCaptureParameters*) { return S_OK; }
inline HRESULT PIXEndCapture(BOOL) { return S_OK; }
inline DWORD PIXGetCaptureState() { return 0; }
inline void PIXReportCounter(_In_ PCWSTR, float) {}
inline void PIXNotifyWakeFromFenceSignal(_In_ HANDLE) {}

inline void PIXBeginEvent(UINT64, _In_ PCSTR, ...) {}
inline void PIXBeginEvent(UINT64, _In_ PCWSTR, ...) {}
inline void PIXBeginEvent(void*, UINT64, _In_ PCSTR, ...) {}
inline void PIXBeginEvent(void*, UINT64, _In_ PCWSTR, ...) {}
inline void PIXEndEvent() {}
inline void PIXEndEvent(void*) {}
inline void PIXSetMarker(UINT64, _In_ PCSTR, ...) {}
inline void PIXSetMarker(UINT64, _In_ PCWSTR, ...) {}
inline void PIXSetMarker(void*, UINT64, _In_ PCSTR, ...) {}
inline void PIXSetMarker(void*, UINT64, _In_ PCWSTR, ...) {}
inline void PIXScopedEvent(UINT64, _In_ PCSTR, ...) {}
inline void PIXScopedEvent(UINT64, _In_ PCWSTR, ...) {}
inline void PIXScopedEvent(void*, UINT64, _In_ PCSTR, ...) {}
inline void PIXScopedEvent(void*, UINT64, _In_ PCWSTR, ...) {}

// don't show warnings about expressions with no effect
#pragma warning(disable:4548)
#pragma warning(disable:4555)

#endif // USE_PIX

// Use these functions to specify colors to pass as metadata to a PIX event/marker API.
// Use PIX_COLOR() to specify a particular color for an event,
// or use PIX_COLOR_INDEX() to specify a set of unique event categories and let PIX choose
// the colors to represent each category.
inline UINT PIX_COLOR(BYTE r, BYTE g, BYTE b) { return 0xff000000 | (r << 16) | (g << 8) | b; }
inline UINT PIX_COLOR_INDEX(BYTE i) { return i; }
const UINT PIX_COLOR_DEFAULT = PIX_COLOR_INDEX(0);

#endif // _PIX3_H_
@@ -1,48 +0,0 @@

/*==========================================================================;
 *
 *  Copyright (C) Microsoft Corporation. All Rights Reserved.
 *
 *  File:       pix3_win.h
 *  Content:    PIX include file
 *              Don't include this file directly - use pix3.h
 *
 ****************************************************************************/

#pragma once

#ifndef _PIX3_H_
#error Don't include this file directly - use pix3.h
#endif

#ifndef _PIX3_WIN_H_
#define _PIX3_WIN_H_

// PIXEventsThreadInfo is defined in PIXEventsCommon.h
struct PIXEventsThreadInfo;

extern "C" PIXEventsThreadInfo* PIXGetThreadInfo() noexcept;

#if defined(USE_PIX) && defined(USE_PIX_SUPPORTED_ARCHITECTURE)
// Notifies PIX that an event handle was set as a result of a D3D12 fence being signaled.
// The event specified must have the same handle value as the handle
// used in ID3D12Fence::SetEventOnCompletion.
extern "C" void WINAPI PIXNotifyWakeFromFenceSignal(_In_ HANDLE event);
#endif

// The following defines denote the different metadata values that have been used
// by tools to denote how to parse PIX marker event data. The first two values
// are legacy values.
#define WINPIX_EVENT_UNICODE_VERSION 0
#define WINPIX_EVENT_ANSI_VERSION 1
#define WINPIX_EVENT_PIX3BLOB_VERSION 2

#define D3D12_EVENT_METADATA WINPIX_EVENT_PIX3BLOB_VERSION

__forceinline UINT64 PIXGetTimestampCounter()
{
    LARGE_INTEGER time = {};
    QueryPerformanceCounter(&time);
    return time.QuadPart;
}

#endif //_PIX3_WIN_H_
@@ -1,48 +0,0 @@

// This file is part of the FidelityFX SDK.
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

Texture2D<float4>   r_in   : register(t0);
RWTexture2D<float4> rw_out : register(u0);

cbuffer cbBlit0 : register(b0) {

    int    g_outChannelRed;
    int    g_outChannelGreen;
    int    g_outChannelBlue;
    float2 outChannelRedMinMax;
    float2 outChannelGreenMinMax;
    float2 outChannelBlueMinMax;
};

[numthreads(8, 8, 1)]
void CS(uint3 globalID : SV_DispatchThreadID)
{
    float4 srcColor = r_in[globalID.xy];

    // remap channels
    float4 dstColor = float4(srcColor[g_outChannelRed], srcColor[g_outChannelGreen], srcColor[g_outChannelBlue], 1.f);

    // apply offset and scale
    dstColor.rgb -= float3(outChannelRedMinMax.x, outChannelGreenMinMax.x, outChannelBlueMinMax.x);
    dstColor.rgb /= float3(outChannelRedMinMax.y - outChannelRedMinMax.x, outChannelGreenMinMax.y - outChannelGreenMinMax.x, outChannelBlueMinMax.y - outChannelBlueMinMax.x);

    rw_out[globalID.xy] = dstColor;
}
@@ -1,36 +0,0 @@

// FidelityFX Super Resolution Sample
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved.
//
// This file is part of the FidelityFX Super Resolution beta which is
// released under the BETA SOFTWARE EVALUATION LICENSE AGREEMENT.
//
// See file LICENSE.txt for full license details.

Texture2D<float4>   r_in   : register(t0);
RWTexture2D<float4> rw_out : register(u0);

cbuffer cbBlit0 : register(b0) {

    int    g_outChannelRed;
    int    g_outChannelGreen;
    int    g_outChannelBlue;
    float2 outChannelRedMinMax;
    float2 outChannelGreenMinMax;
    float2 outChannelBlueMinMax;
};

[numthreads(8, 8, 1)]
void CS(uint3 globalID : SV_DispatchThreadID)
{
    float4 srcColor = r_in[globalID.xy];

    // remap channels
    float4 dstColor = float4(srcColor[g_outChannelRed], srcColor[g_outChannelGreen], srcColor[g_outChannelBlue], 1.f);

    // apply offset and scale
    dstColor.rgb -= float3(outChannelRedMinMax.x, outChannelGreenMinMax.x, outChannelBlueMinMax.x);
    dstColor.rgb /= float3(outChannelRedMinMax.y - outChannelRedMinMax.x, outChannelGreenMinMax.y - outChannelGreenMinMax.x, outChannelBlueMinMax.y - outChannelBlueMinMax.x);

    rw_out[globalID.xy] = dstColor;
}