CUDA C PROGRAMMING GUIDE
PG-02829-001 | July
Design Guide

CHANGES FROM VERSION 9.0
- Documented restriction that operator-overloads cannot be __global__ functions, in Operator Function (first sketch below).
- Removed guidance to break 8-byte shuffles into two 4-byte instructions; 8-byte shuffle variants are provided since CUDA 9.0. See Warp Shuffle Functions (second sketch below).
- Passing __restrict__ references to __global__ functions is now supported; updated comment in __global__ Functions and Function Templates.
- Documented CUDA_ENABLE_CRC_CHECK in CUDA Environment Variables.
- Warp matrix functions [PREVIEW FEATURE] now support matrix products with m=32, n=8, k=16 and m=8, n=32, k=16, in addition to m=n=k=16 (third sketch below).
- Added new Unified Memory sections: System Allocator, Hardware Coherency, Access Counters (fourth sketch below).
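The operator-overload restriction is easy to see in code. Below is a minimal sketch; the Vec type and the addKernel name are illustrative, not taken from the guide. An operator overload may be declared __device__ (or __host__ __device__) and used from kernels, but it cannot itself be a __global__ function; note also that a __global__ function must return void, which no useful operator signature can satisfy.

    // Hypothetical example type; not from the guide.
    struct Vec { float x, y; };

    // Not allowed: an operator overload cannot be a __global__ function.
    // __global__ Vec operator+(const Vec& a, const Vec& b);  // compile error

    // Allowed: operator overloads may be __device__ (or __host__ __device__).
    __device__ Vec operator+(const Vec& a, const Vec& b) {
        Vec r;
        r.x = a.x + b.x;
        r.y = a.y + b.y;
        return r;
    }

    // Kernels may then use the overload as usual.
    __global__ void addKernel(const Vec* a, const Vec* b, Vec* c) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        c[i] = a[i] + b[i];  // resolves to the __device__ operator+
    }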
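On the shuffle change: since CUDA 9.0 the __shfl_*_sync() intrinsics have overloads for 8-byte types such as double and long long, so a value no longer has to be split into two 4-byte shuffles by hand. A minimal warp-sum sketch under that assumption (warpSum and the buffer names are illustrative):

    #include <cstdio>

    // Warp-wide sum of 32 doubles using the 8-byte __shfl_down_sync
    // overload directly; no manual split into two 4-byte shuffles.
    __global__ void warpSum(const double* in, double* out) {
        double v = in[threadIdx.x];
        // Tree reduction across one fully active 32-thread warp.
        for (int offset = 16; offset > 0; offset /= 2)
            v += __shfl_down_sync(0xffffffffu, v, offset);
        if (threadIdx.x == 0) *out = v;  // lane 0 holds the total
    }

    int main() {
        double h[32], result = 0.0, *dIn, *dOut;
        for (int i = 0; i < 32; ++i) h[i] = 1.0;
        cudaMalloc(&dIn, 32 * sizeof(double));
        cudaMalloc(&dOut, sizeof(double));
        cudaMemcpy(dIn, h, 32 * sizeof(double), cudaMemcpyHostToDevice);
        warpSum<<<1, 32>>>(dIn, dOut);  // exactly one warp
        cudaMemcpy(&result, dOut, sizeof(double), cudaMemcpyDeviceToHost);
        printf("warp sum = %f\n", result);  // expect 32.0
        cudaFree(dIn);
        cudaFree(dOut);
        return 0;
    }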
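The extra warp-matrix shapes are selected purely through the fragment dimensions of the C++ wmma API. Here is a sketch of one warp computing a single m=32, n=8, k=16 tile product, assuming a compute capability 7.0 device and the preview API in mma.h; the kernel name and the assumption that A, B, and D are densely packed device buffers are mine, not the guide's.

    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // One warp computes D = A*B + 0 for a single 32x8x16 tile:
    // A is 32x16 (row-major), B is 16x8 (col-major), D is 32x8.
    __global__ void wmmaTile32x8x16(const half* A, const half* B, float* D) {
        wmma::fragment<wmma::matrix_a, 32, 8, 16, half, wmma::row_major> aFrag;
        wmma::fragment<wmma::matrix_b, 32, 8, 16, half, wmma::col_major> bFrag;
        wmma::fragment<wmma::accumulator, 32, 8, 16, float> accFrag;

        wmma::fill_fragment(accFrag, 0.0f);    // start from a zero accumulator
        wmma::load_matrix_sync(aFrag, A, 16);  // leading dimension = k = 16
        wmma::load_matrix_sync(bFrag, B, 16);  // leading dimension = k = 16
        wmma::mma_sync(accFrag, aFrag, bFrag, accFrag);
        wmma::store_matrix_sync(D, accFrag, 8, wmma::mem_row_major);  // ld = n = 8
    }

    // Launched with exactly one warp: wmmaTile32x8x16<<<1, 32>>>(dA, dB, dD);

Swapping the three dimension arguments to 8, 32, 16 (or back to 16, 16, 16) selects the other supported shapes.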
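The new System Allocator section concerns platforms where kernels can access memory obtained from the ordinary host allocator, with Hardware Coherency and Access Counters covering the hardware side. A hedged sketch, assuming such a platform and gating on the pageable-memory-access device attribute (the increment kernel is illustrative; on unsupported systems the program simply reports that and exits):

    #include <cstdio>
    #include <cstdlib>

    __global__ void increment(int* p, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i] += 1;
    }

    int main() {
        int pageable = 0;
        cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, 0);
        if (!pageable) {
            printf("GPU cannot access pageable host memory on this system.\n");
            return 0;
        }
        const int n = 1024;
        int* data = (int*)malloc(n * sizeof(int));  // plain system allocation
        for (int i = 0; i < n; ++i) data[i] = i;
        increment<<<(n + 255) / 256, 256>>>(data, n);  // GPU dereferences malloc'd memory
        cudaDeviceSynchronize();
        printf("data[42] = %d\n", data[42]);  // expect 43
        free(data);
        return 0;
    }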
TABLE OF CONTENTS

Chapter 1. Introduction
  From Graphics Processing to General Purpose Parallel Computing
  CUDA: A General-Purpose Parallel Computing Platform and Programming Model
  A Scalable Programming Model
  Document Structure
Chapter 2. Programming Model
  Kernels
  Thread Hierarchy
  Memory Hierarchy
  Heterogeneous Programming
  Compute Capability
Chapter 3. Programming Interface
  Compilation with NVCC
    Compilation Workflow
      Offline Compilation
      Just-in-Time Compilation
    Binary Compatibility
    PTX Compatibility
    Application Compatibility
    C/C++ Compatibility
    64-Bit Compatibility
  CUDA C Runtime
    Initialization
    Device Memory
    Shared Memory
    Page-Locked Host Memory
      Portable Memory
      Write-Combining Memory
      Mapped Memory
    Asynchronous Concurrent Execution
      Concurrent Execution between Host and Device
      Concurrent Kernel Execution
      Overlap of Data Transfer and Kernel Execution
      Concurrent Data Transfers
      Streams
      Events
      Synchronous Calls
    Multi-Device System
      Device Enumeration
      Device Selection
      Stream and Event Behavior
      Peer-to-Peer Memory Access
      Peer-to-Peer Memory Copy
    Unified Virtual Address Space
    Interprocess Communication
    Error Checking
    Call Stack
    Texture and Surface Memory
      Texture Memory
      Surface Memory
      CUDA Arrays
      Read/Write Coherency
    Graphics Interoperability
      OpenGL Interoperability
      Direct3D Interoperability
      SLI Interoperability
  Versioning and Compatibility
  Compute Modes
  Mode Switches
  Tesla Compute Cluster Mode for Windows
Chapter 4. Hardware Implementation
  SIMT Architecture
  Hardware Multithreading
Chapter 5. Performance Guidelines
  Overall Performance Optimization Strategies
  Maximize Utilization
    Application Level
    Device Level
    Multiprocessor Level
      Occupancy Calculator
  Maximize Memory Throughput
    Data Transfer between Host and Device
    Device Memory Accesses
  Maximize Instruction Throughput
    Arithmetic Instructions
    Control Flow Instructions
    Synchronization Instruction
Appendix A. CUDA-Enabled GPUs
Appendix B. C Language Extensions
  Function Execution Space Specifiers
    __device__
    __global__
    __host__
    __noinline__ and __forceinline__
  Variable Memory Space Specifiers
    __device__
    __constant__
    __shared__
    __managed__
    __restrict__
  Built-in Vector Types
    char, short, int, long, longlong, float, double
    dim3
  Built-in Variables
    gridDim
    blockIdx
    blockDim
    threadIdx
    warpSize
  Memory Fence Functions
  Synchronization Functions
  Mathematical Functions
  Texture Functions
    Texture Object API
      tex1Dfetch
      tex1D
      tex1DLod
      tex1DGrad
      tex2D
      tex2DLod
      tex2DGrad
      tex3D
      tex3DLod
      tex3DGrad
      tex1DLayered
      tex1DLayeredLod
      tex1DLayeredGrad
      tex2DLayered
      tex2DLayeredLod
      tex2DLayeredGrad
      texCubemap