Bird wrote:But I can't get Dynamic Parallelism working. I always get a cudaLaunchKernel cudaErrorInvalidSource(300) error
That is the whole purpose of the effort.
I have not tried it yet, so I might get the same errors too, but that is another issue.
So far we are now past the point where the engine builds and links the code generated by the compiler.
I still get these nasty warnings,
Code:
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include\crt/host_runtime.h(256): warning C4505: '__cudaUnregisterBinaryUtil': unreferenced function with internal linkage has been removed
1>Done building project "ndSolverCuda.vcxproj".
1>ndCudaDevice.cu
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include\crt/host_runtime.h(256): warning C4505: '__cudaUnregisterBinaryUtil': unreferenced function with internal linkage has been removed
1>Done building project "ndSolverCuda.vcxproj".
...
1>C:/Users/julio/AppData/Local/Temp/tmpxft_0000a98c_00000000-7_ndSolverCuda_d.device-link.reg.c(2): warning C4100: 'prelinked_fatbinc': unreferenced formal parameter
1>C:/Users/julio/AppData/Local/Temp/tmpxft_0000a98c_00000000-7_ndSolverCuda_d.device-link.reg.c(3): warning C4100: 'prelinked_fatbinc': unreferenced formal parameter
...
I assume it is because the compiler is generating them but the code is not calling them yet.
Anyway, I think I have a good refactoring now.
The cuda context is now a dll containing all the cuda kernels, and it is made of .cu files and headers.
All the C++ glue code is included in the ndNewton library.
So that glue code is what the engine sees for the scene and solver, and it is all C++.
Then the cuda side is an interface in the dll, but the interface itself does not implement any cuda code,
all the cuda stuff is hidden in the implementation. The class looks like this
Code:
class ndCudaContext
{
	public:
	D_CUDA_API ndCudaContext();
	D_CUDA_API ~ndCudaContext();
	D_CUDA_API bool IsValid() const;
	D_CUDA_API const char* GetStringId() const;

	ndCudaDevice* m_device;
	ndCudaContextImplement* m_implement;
};
So the dll is a 100% cuda application and the context class is just the C++ glue.
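To make the idea concrete, here is a minimal sketch of that kind of interface/implementation split in plain C++. Every name in it (ExampleContext, ExampleContextImplement, the "example device" string) is made up for illustration and is not the real ndCudaContext code; the assumption is only that the public class holds an opaque pointer and forwards to an implementation the engine never sees.

```cpp
#include <cassert>
#include <string>

// ---- what a public header would contain: plain C++, no cuda types ----
class ExampleContextImplement; // forward declaration only, details stay hidden

class ExampleContext
{
	public:
	ExampleContext();
	~ExampleContext();
	ExampleContext(const ExampleContext&) = delete;
	ExampleContext& operator=(const ExampleContext&) = delete;

	bool IsValid() const;
	const char* GetStringId() const;

	private:
	ExampleContextImplement* m_implement; // opaque to the engine side
};

// ---- what would be hidden inside the dll (hypothetical stand-in) ----
class ExampleContextImplement
{
	public:
	std::string m_name = "example device"; // placeholder, not a real device query
};

ExampleContext::ExampleContext()
	:m_implement(new ExampleContextImplement())
{
}

ExampleContext::~ExampleContext()
{
	delete m_implement;
}

bool ExampleContext::IsValid() const
{
	return m_implement != nullptr;
}

const char* ExampleContext::GetStringId() const
{
	assert(IsValid());
	return m_implement->m_name.c_str();
}
```

With this layout the engine side only needs the first half, so it can be compiled by a regular C++ compiler, while the second half is free to use whatever the cuda toolchain requires.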
If an app does not use dlls, we can still load the dll as a resource using the loadDll function.
This seems to work. I managed to initialize the device, but it does not do anything so far and has asserts almost everywhere.
It is just too much to cover in one weekend, but I am hopeful this will allow for two things.
1-use dynamic parallelism.
2-use the standard code libraries in case we need them.
If anyone can sync and try to build it, that would help make sure I did not make an error in the cmake scripts.
Now I will continue adding the functionality again.
One of the problems of making a strict cuda project is that I will not be able to share high level newton code, so I will have to rely on casting and proxy data structures like we did in 3.14.
For example, ndArray cannot be used, so those will have to stay on the ndNewton side, and the call will have to pass the address and size to the cuda context, because it is not aware of the high level classes,
but I think that will be fine, and far better than not working.
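As a rough illustration of the address and size idea, the sketch below uses a small non-owning proxy struct. All names here (ExampleBufferProxy, MakeProxy, ExampleContextSum) are hypothetical, std::vector stands in for ndArray, and the "context" function just sums on the host; a real version would presumably copy the data to the device (for example with cudaMemcpy) instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical proxy: a non-owning (address, size) pair. This is the only
// thing the low level side would need to know about the container.
struct ExampleBufferProxy
{
	const float* m_data;
	std::size_t m_size;
};

// High level side: std::vector is a stand-in for ndArray here.
ExampleBufferProxy MakeProxy(const std::vector<float>& array)
{
	return ExampleBufferProxy{ array.data(), array.size() };
}

// Stand-in for a function exported by the dll. It only sees the raw pointer
// and the element count, never the high level container type.
float ExampleContextSum(const ExampleBufferProxy& proxy)
{
	assert((proxy.m_data != nullptr) || (proxy.m_size == 0));
	float sum = 0.0f;
	for (std::size_t i = 0; i < proxy.m_size; ++i)
	{
		sum += proxy.m_data[i];
	}
	return sum;
}
```

The proxy does not own the memory, so the array on the high level side has to stay alive and must not be resized while the low level side is still using the pointer.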