One of the cool things about SYCL is that Intel supports it, and it integrates rather nicely with Visual Studio 2017.
I tried this about two years ago, when they said it integrated with VS 2015,
but it was a contrived process with many moronic steps that I could never get to work.
I took another look, and now they have an automatic integration that lets the user make a SYCL app just by creating a project: you select DPC++ and the type of project, and that's all. From the Intel documentation:
[url]https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-dpcpp-compiler/top.html[/url]
It says this:
Windows*
The compiler integrates into the following versions of Microsoft Visual Studio*:
Visual Studio 2019
Visual Studio 2017
And it is true, so I guess that we can just go for SYCL with a minimum of VS 2017.
I just made a hello world project, and this is what one of the SYCL kernels looks like:
Code:
//************************************
// Compute vector addition in DPC++ on device: sum of the data is returned in
// 3rd parameter "sum_parallel"
//************************************
void VectorAddInDPCPP(
    queue& q,
    const IntArray &addend_1,
    const IntArray &addend_2,
    IntArray &sum_parallel)
{
    // create the range object for the arrays managed by the buffer
    range<1> num_items{array_size};
    buffer<int, 1> addend_1_buf(addend_1.data(), num_items);
    buffer<int, 1> addend_2_buf(addend_2.data(), num_items);
    buffer<int, 1> sum_buf(sum_parallel.data(), num_items);
    auto TestKernel = [&](handler &h)
    {
        auto sum_accessor = sum_buf.get_access<dp_write>(h);
        auto addend_1_accessor = addend_1_buf.get_access<dp_read>(h);
        auto addend_2_accessor = addend_2_buf.get_access<dp_read>(h);
        h.parallel_for(num_items, [=](id<1> i)
        {
            sum_accessor[i] = addend_1_accessor[i] + addend_2_accessor[i];
        });
    };
    q.submit(TestKernel);
}
This is all C++ code, but guess what: this is how kernels look in Newton now.
Code:
void VectorAddInNewton(
    ndThreadPool* const threadPool,
    const ndArray<ndInt32>& addend_1,
    const ndArray<ndInt32>& addend_2,
    ndArray<ndInt32>& sum_parallel)
{
    auto TestKernel = ndMakeObject::ndFunction([&](ndInt32 threadIndex, ndInt32 threadCount)
    {
        D_TRACKTIME();
        const ndStartEnd startEnd(sum_parallel.GetCount(), threadIndex, threadCount);
        for (ndInt32 i = startEnd.m_start; i < startEnd.m_end; ++i)
        {
            sum_parallel[i] = addend_1[i] + addend_2[i];
        }
    });
    threadPool->ParallelExecute(TestKernel);
}
The similarity is remarkable.
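To make the parallel concrete, here is a minimal sketch of the same pattern in plain standard C++ with std::thread. It assumes that ndStartEnd hands each thread one contiguous slice of the index range; the real implementation is not shown in this post, so treat that as a guess. WorkRange, SplitRange, ParallelExecute and VectorAddThreaded are hypothetical names made up for the illustration, not Newton or SYCL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: the half-open index range [start, end) owned by one worker.
struct WorkRange
{
    std::size_t start;
    std::size_t end;
};

// Split itemCount items into threadCount contiguous chunks whose sizes differ by at most one.
WorkRange SplitRange(std::size_t itemCount, std::size_t threadIndex, std::size_t threadCount)
{
    assert(threadCount > 0 && threadIndex < threadCount);
    const std::size_t chunk = itemCount / threadCount;
    const std::size_t remainder = itemCount % threadCount;
    // The first `remainder` workers take one extra item each.
    const std::size_t start = threadIndex * chunk + std::min(threadIndex, remainder);
    const std::size_t size = chunk + (threadIndex < remainder ? 1 : 0);
    return { start, start + size };
}

// Run kernel(threadIndex, threadCount) once per worker thread and wait for all of them.
template <typename Kernel>
void ParallelExecute(std::size_t threadCount, Kernel kernel)
{
    std::vector<std::thread> workers;
    workers.reserve(threadCount);
    for (std::size_t i = 0; i < threadCount; ++i)
    {
        workers.emplace_back(kernel, i, threadCount);
    }
    for (std::thread& worker : workers)
    {
        worker.join();
    }
}

void VectorAddThreaded(
    std::size_t threadCount,
    const std::vector<int>& addend_1,
    const std::vector<int>& addend_2,
    std::vector<int>& sum_parallel)
{
    assert(addend_1.size() == sum_parallel.size());
    assert(addend_2.size() == sum_parallel.size());
    // Same shape as the kernels above: a lambda that only knows which slice it owns.
    auto testKernel = [&](std::size_t threadIndex, std::size_t count)
    {
        const WorkRange range = SplitRange(sum_parallel.size(), threadIndex, count);
        for (std::size_t i = range.start; i < range.end; ++i)
        {
            sum_parallel[i] = addend_1[i] + addend_2[i];
        }
    };
    ParallelExecute(threadCount, testKernel);
}
```

The difference from the SYCL version is mostly who owns the loop: here each worker iterates over its own slice, while parallel_for hands the kernel a single index and lets the runtime decide how to distribute the range.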
There are a few hurdles. I can only get SYCL to work when selecting the host device; the GPU is recognized, but executing the kernel on it returns false.
Also, I have not found how to enumerate devices other than Intel's, but they say the new versions can select a CUDA back end, and it is OpenCL based, so if there is an OpenCL driver,
it should recognize not only Intel but also AMD and Nvidia GPUs.
Anyway, I will experiment with this, because this could be a huge development for Newton.