Posted
over 13 years
ago
by
mbien
Since I am doing a lot with OpenCL lately I decided to try to improve the tooling around OpenCL a bit. A weekend later the NetBeans OpenCL Pack was born :).
Features Including:
OpenCL Editor with syntax highlighting code completion and CL
... [More]
reference pages integration
OpenCL compiler integration
In-editor annotations of compiler warnings and errors updated as you type
JOCL project template
Technical Details:
The editor uses ANTLR as parser and lexer. This allows such simple things like keyword highlighting and also more complex features like semantic highlighting, formatting and auto completion (formatting is not yet implemented). It can also detect and report syntax errors, however this feature is automatically disabled if an OpenCL compiler is present on the host system. All with help of JOCL detected OpenCL implementations can be used as compiler backend.
Instead of using the old OpenGL Pack as template I decided to write it from scratch using latest NetBeans 7 and Java 7 APIs. So you will have to start NB with JDK7 to be able to use it.
Download
you can download it from the NetBeans plugin portal, sourcecode is on github
feedback and/or contributions/bugreports are as always appreciated
Screenshots:
have fun!
[Less]
|
Posted
over 13 years
ago
by
mbien
Since I am doing a lot with OpenCL lately I decided to try to improve the tooling around OpenCL a bit. A weekend later the NetBeans OpenCL Pack was born :).
Features Including:
OpenCL Editor with syntax highlighting code completion and CL
... [More]
reference pages integration
OpenCL compiler integration
In-editor annotations of compiler warnings and errors updated as you type
JOCL project template
Technical Details:
The editor uses ANTLR as parser and lexer. This allows such simple things like keyword highlighting and also more complex features like semantic highlighting, formatting and auto completion (formatting is not yet implemented). It can also detect and report syntax errors, however this feature is automatically disabled if an OpenCL compiler is present on the host system. All with help of JOCL detected OpenCL implementations can be used as compiler backend.
Instead of using the old OpenGL Pack as template I decided to write it from scratch using latest NetBeans 7 and Java 7 APIs. So you will have to start NB with JDK7 to be able to use it.
Download
you can download it from the NetBeans plugin portal
[mirror], sourcecode is on github
feedback and/or contributions/bugreports are as always appreciated
Screenshots:
have fun!
[Less]
|
Posted
over 13 years
ago
by
mbien
Ok some of them are big, but I will only cover the little things with this blog entry :).
CLKernel
I added multiple utility methods to CLKernel and related classes. It is for example now possible to create a kernel and set its arguments in
... [More]
one line.
CLKernel sha512 = program.createCLKernel("sha512", padBuffer, digestBuffer, rangeBuffer);
Thanks to feedback in the jocl forums I also added methods to set vector typed arguments directly. In past you could do this only by setting them via a java.util.Buffer.
kernel.setArg(index, x, y, z, w);
Another small feature of CLKernel is to enforce 32bit arguments. You may want to switch between single and double floatingpoint precision at runtime or mix between both to improve performance you will have to compile the program with the double FP extension enabled. By setting kernel.setForce32bitArgs(true) all java doubles used as kernel arguments will be automatically cast down to 32bit CL floats (see MultiDeviceFractal demo for a example). This is nothing special but might safe you several if(single){setArg((float)foo)}else{setArg(foo)} constructs.
CLWork
CLKernel still only represents the function in the OpenCL program you want to call - nothing more. The new CLWork object contains everything required for kernel execution, like the NDRange and the kernel itself.
int size = buffer.getNIOCapacity();
CLWork1D work = CLWork.create1D(program.createCLKernel("sum", buffer, size));
work.setWorkSize(size, 1).optimizeFor(device);
// execute
queue.putWriteBuffer(buffer, false)
.putWork(work)
.putReadBuffer(buffer, true);
optimizeFor(device) adjusts the workgroup size to meet device specific recommended values. This should make sure that all computing units of your GPU are used by dividing the work into groups (however this only works if your task does not care about the workgroup size, see javadoc).
CLSubDevice
Sometimes you don't want to put your CLDevice under 100% load. This might be the case for example if your device is the CPU your application is running on or if you have to share the GPU with an OpenGL context for rendering. One easy way of controlling device load is to limit the amount of compute units used for a task.
CLPlatform platform = CLPlatform.getDefault(version(CL_1_1), type(CPU));
CLDevice devices = platform.getMaxFLOPSDevice(type(CPU));
CLSubDevice[] subs = device.createSubDevicesByCount(4, 4);
// array contains now two virtual devices containing four CPU cores each
CLContext context = CLContext.create(subs);
CLCommandQueue queue = subs[0].createCommandQueue();
...
CLSubDevices extends CLDevice and can be used for context creation, queue creation and everywhere you would use the CLDevice. Prior to creating subdevices you should check if device.isFissionSupported() returns true.
CLProgram builder
Ok, this utility is not that new but I haven't blogged about it yet. If program.build() isn't enough you should take a look at the program builder.
CLBuildConfiguration stores everything which is needed for program compilation and is easily configurable via the builder pattern :).
// reusable builder
CLBuildConfiguration builder = CLProgramBuilder.createConfiguration()
.withOption(ENABLE_MAD)
.forDevices(context.getDevices())
.withDefine("RADIUS", 5)
.withDefine("ENABLE_FOOBAR");
builder.build(programA);
builder.build(programB);
...
CLBuildConfiguration is fully reusable and can be upgraded to CLProgramConfiguration if you combine it with a CLProgram. Both can be serialised which allows to store the build configuration or the entire prebuild program on disc or send it over the network. (caching binaries on disc can safe startup time for example)
// program configuration
ois = new ObjectInputStream(new FileInputStream(file));
CLProgramConfiguration programConfig = CLProgramBuilder.loadConfiguration(ois, context);
assertNotNull(programConfig.getProgram());
ois.close();
program = programConfig.build(); // builds from source or loads binaries if possible
assertTrue(program.isExecutable());
Note: loading binaries and associating them with the right driver/device is currently not trivial with OpenCL. Even if everything works as intended it is still possible that the driver refuses the binaries for some reason (driver update...etc). Thats why its recommended to add the program source to the configuration before calling build() to allow a automatic rebuild as fallback.
// another entry point for complex builds (prepare() returns CLProgramConfiguration)
program.prepare().withOption(ENABLE_MAD).forDevice(context.getMaxFlopsDevice()).build();
(all snippets have been stolen from the junit tests)
I am sure I forgot something... but this should cover at least some of the incremental improvements. Expect a few more blog entries for the larger features soon.
- - - - - -
In other news: Nvidia released OpenCL 1.1 drivers, some of us thought this would never happen -> all major vendors (AMD, Intel, NV, IBM, ZiiLABS ..) support now OpenCL 1.1 (screenshot)
have fun!
[Less]
|
Posted
over 13 years
ago
by
mbien
Ok some of them are big, but I will only cover the little things with this blog entry :).
CLKernel
I added multiple utility methods around CLKernel. It is for example now possible to create a kernel and set its arguments in one line.
... [More]
CLKernel sha512 = program.createCLKernel("sha512", padBuffer, digestBuffer, rangeBuffer);
Thanks to feedback in the jocl forums I also added methods to set vector typed arguments directly. In past you could do this only by setting them via a java.util.Buffer.
kernel.setArg(index, x, y, z, w);
Another small feature of CLKernel is to enforce 32bit arguments. You may want to switch between single and double floatingpoint precision at runtime or mix between both to improve performance you will have to compile the program with the double FP extension enabled. If you do so, all 32bit float arguments will become 64bit floats. By setting kernel.setForce32bitArgs(true) all java doubles used as kernel arguments will be automatically cast down to 32bit CL floats (see MultiDeviceFractal demo for a example). This is nothing special but might safe you several if(single){setArg((float)foo)}else{setArg(foo)} constructs.
CLWork
CLKernel still only represents the function in the OpenCL program you want to call - nothing more. The new CLWork object contains everything required for kernel execution, like the NDRange and the kernel itself.
int size = buffer.getNIOCapacity();
CLWork1D work = CLWork.create1D(program.createCLKernel("sum", buffer, size));
work.setWorkSize(size, 1).optimizeFor(device);
// execute
queue.putWriteBuffer(buffer, false)
.putWork(work)
.putReadBuffer(buffer, true);
optimizeFor(device) adjusts the workgroup size to meet device specific recommended values. This should make sure that all computing units of your GPU are used by dividing the work into groups (however this only works if your task does not care about the workgroup size, see javadoc).
CLSubDevice
Sometimes you don't want to put your CLDevice under 100% load. This might be the case for example if your device is the CPU your application is running on or if you have to share the GPU with an OpenGL context for rendering.
CLPlatform platform = CLPlatform.getDefault(version(CL_1_1), type(CPU));
CLDevice devices = platform.getMaxFLOPSDevice(type(CPU));
CLSubDevice[] subs = device.createSubDevicesByCount(4, 4);
// array contains now two virtual devices containing four CPU cores each
CLContext context = CLContext.create(subs);
CLProgram builder
Ok, this utility is not that new but I haven't blogged about it yet. If program.build() isn't enough you should take a look at the program builder.
CLBuildConfiguration stores everything which is needed for program compilation and is easily configurable via the builder pattern :).
// reusable builder
CLBuildConfiguration builder = CLProgramBuilder.createConfiguration()
.withOption(ENABLE_MAD)
.forDevices(context.getDevices())
.withDefine("RADIUS", 5)
.withDefine("ENABLE_FOOBAR");
builder.build(programA);
builder.build(programB);
...
CLBuildConfiguration is fully reusable and can be upgraded to CLProgramConfiguration if you combine it with a CLProgram. Both can be serialised which allows to store the build configuration or the entire prebuild program on disc or send it over the network. (caching binaries on disc can safe startup time for example)
// program configuration
ois = new ObjectInputStream(new FileInputStream(file));
CLProgramConfiguration programConfig = CLProgramBuilder.loadConfiguration(ois, context);
assertNotNull(programConfig.getProgram());
ois.close();
program = programConfig.build(); // builds from source or loads binaries if possible
assertTrue(program.isExecutable());
Note: loading binaries and associating them with the right driver/device is currently not trivial with OpenCL. Even if everything works as intended it is still possible that the driver refuses the binaries for some reason (driver update...etc). Thats why its recommended to add the program source to the configuration before calling build() to allow a automatic rebuild as fallback.
// another entry point for complex builds (prepare() returns CLProgramConfiguration)
program.prepare().withOption(ENABLE_MAD).forDevice(context.getMaxFlopsDevice()).build();
(all snippets have been stolen from the junit tests)
I am sure I forgot something... but this should cover at least some of the incremental improvements. Expect a few more blog entries for the larger features soon.
- - - - - -
In other news: Nvidia released OpenCL 1.1 drivers, some of us thought this would never happen -> all three vendors (AMD, Intel, NV) support now OpenCL 1.1 (screenshot)
have fun!
[Less]
|
Posted
over 13 years
ago
by
mbien
One nice feature of OpenCL is that the platform abstraction was handled in the spec from the first day on. You can install all OpenCL drivers side by side and let the application choose at runtime, on which device and on which platform it should
... [More]
execute the kernels.
As of today there are three four vendors which provide OpenCL implementations for the desktop. AMD and Intel support the OpenCL 1.1 specification where Nvidia apparently tries to stick with 1.0 to encourage their customers to stick with CUDA ;-). [edit] And of course there is also Apple providing out-of-the box OpenCL 1.0 support in MacOS 10.6.
JOCL contains a small CLInfo utility which can be used to quickly verify OpenCL installations. Here is the output on my system (ubuntu was booted) having all three SDKs installed:
CL_PLATFORM_NAMEATI StreamNVIDIA CUDAIntel(R) OpenCLCL_PLATFORM_VERSIONOpenCL 1.1 ATI-Stream-v2.2 (302)OpenCL 1.0 CUDA 4.0.1OpenCL 1.1 LINUXCL_PLATFORM_PROFILEFULL_PROFILEFULL_PROFILEFULL_PROFILECL_PLATFORM_VENDORAdvanced Micro Devices, Inc.NVIDIA CorporationIntel(R) CorporationCL_PLATFORM_ICD_SUFFIX_KHRAMDNVIntelCL_PLATFORM_EXTENSIONS[cl_khr_icd, cl_amd_event_callback][cl_khr_icd, cl_khr_byte_addressable_store, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_gl_sharing][cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_intel_printf, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]CL_DEVICE_NAMEIntel(R) Core(TM) i7 CPU 940 @ 2.93GHzGeForce GTX 295GeForce GTX 295Intel(R) Core(TM) i7 CPU 940 @ 2.93GHzCL_DEVICE_TYPECPUGPUGPUCPUCL_DEVICE_AVAILABLEtruetruetruetrueCL_DEVICE_VERSIONOpenCL 1.1 ATI-Stream-v2.2 (302)OpenCL 1.0 CUDAOpenCL 1.0 CUDAOpenCL 1.1 CL_DEVICE_PROFILEFULL_PROFILEFULL_PROFILEFULL_PROFILEFULL_PROFILECL_DEVICE_ENDIAN_LITTLEtruetruetruetrueCL_DEVICE_VENDORGenuineIntelNVIDIA CorporationNVIDIA CorporationIntel(R) CorporationCL_DEVICE_EXTENSIONS[cl_amd_device_attribute_query, cl_khr_byte_addressable_store, cl_khr_int64_extended_atomics, cl_khr_local_int32_extended_atomics, cl_amd_fp64, cl_amd_printf, cl_khr_local_int32_base_atomics, cl_khr_int64_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics, cl_ext_device_fission][cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics][cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics][cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_intel_printf, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]CL_DEVICE_MAX_COMPUTE_UNITS830308CL_DEVICE_MAX_CLOCK_FREQUENCY2934124212422930CL_DEVICE_VENDOR_ID40984318431832902CL_DEVICE_OPENCL_C_VERSIONOpenCL C 1.1 com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info string [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info string [error: CL_INVALID_VALUE]OpenCL C 1.1 CL_DRIVER_VERSION2.0270.41.06270.41.061.1CL_DEVICE_ADDRESS_BITS64323264CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT8118CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR161116CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT4114CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG2112CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT4114CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE0112CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR16com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]16CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT8com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]8CL_DEVICE_NATIVE_VECTOR_WIDTH_INT4com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]4CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG2com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]2CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF0com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]0CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT4com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]4CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE0com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]2CL_DEVICE_MAX_WORK_GROUP_SIZE10245125121024CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS3333CL_DEVICE_MAX_WORK_ITEM_SIZES[1024, 1024, 1024][512, 512, 64][512, 512, 64][1024, 1024, 1024]CL_DEVICE_MAX_PARAMETER_SIZE4096435243521024CL_DEVICE_MAX_MEM_ALLOC_SIZE10737418242348318722347008003154703360CL_DEVICE_GLOBAL_MEM_SIZE322122547293932748893880320012618813440CL_DEVICE_LOCAL_MEM_SIZE32768163841638432768CL_DEVICE_HOST_UNIFIED_MEMORYtruecom.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]trueCL_DEVICE_MAX_CONSTANT_BUFFER_SIZE655366553665536131072CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE640064CL_DEVICE_GLOBAL_MEM_CACHE_SIZE3276800262144CL_DEVICE_MAX_CONSTANT_ARGS899128CL_DEVICE_IMAGE_SUPPORTfalsetruetruetrueCL_DEVICE_MAX_READ_IMAGE_ARGS0128128128CL_DEVICE_MAX_WRITE_IMAGE_ARGS088128CL_DEVICE_IMAGE2D_MAX_WIDTH0409640968192CL_DEVICE_IMAGE2D_MAX_HEIGHT032768327688192CL_DEVICE_IMAGE3D_MAX_WIDTH0204820482048CL_DEVICE_IMAGE3D_MAX_HEIGHT0204820482048CL_DEVICE_IMAGE3D_MAX_DEPTH0204820482048CL_DEVICE_MAX_SAMPLERS01616128CL_DEVICE_PROFILING_TIMER_RESOLUTION110001000340831CL_DEVICE_EXECUTION_CAPABILITIES[EXEC_KERNEL, EXEC_NATIVE_KERNEL][EXEC_KERNEL][EXEC_KERNEL][EXEC_KERNEL, EXEC_NATIVE_KERNEL]CL_DEVICE_HALF_FP_CONFIG[][][][]CL_DEVICE_SINGLE_FP_CONFIG[DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO][INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA][INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA][DENORM, INF_NAN, ROUND_TO_NEAREST]CL_DEVICE_DOUBLE_FP_CONFIG[][DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA][DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA][DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]CL_DEVICE_LOCAL_MEM_TYPEGLOBALLOCALLOCALGLOBALCL_DEVICE_GLOBAL_MEM_CACHE_TYPEREAD_WRITENONENONEREAD_WRITECL_DEVICE_QUEUE_PROPERTIES[PROFILING_MODE][OUT_OF_ORDER_MODE, PROFILING_MODE][OUT_OF_ORDER_MODE, PROFILING_MODE][OUT_OF_ORDER_MODE, PROFILING_MODE]CL_DEVICE_COMPILER_AVAILABLEtruetruetruetrueCL_DEVICE_ERROR_CORRECTION_SUPPORTfalsefalsefalsefalsecl_khr_fp16falsefalsefalsefalsecl_khr_fp64falsetruetruetruecl_khr_gl_sharing | cl_APPLE_gl_sharingtruetruetruetrue
The CLInfo utility is part of the jocl-demos project and is also available via webstart. For a plain text version of the above output you can run:
java -jar jocl.jar:gluegen-rt.jar\
-Djava.library.path="path/to/jocl/libs:path/to/gluegen/libs" com.jogamp.opencl.util.CLInfo
(btw to install the intel sdk on debian based systems follow this link)
happy coding!
[Less]
|
Posted
over 13 years
ago
by
mbien
One nice feature of OpenCL is that the platform abstraction was handled in the spec from the first day on. You can install all OpenCL drivers side by side and let the application choose at runtime, on which device and on which platform it should
... [More]
execute the kernels.
As of today there are three four vendors which provide OpenCL implementations for the desktop. AMD and Intel support the OpenCL 1.1 specification where Nvidia apparently tries to stick with 1.0 to encourage their customers to stick with CUDA ;-). [edit] And of course there is also Apple providing out-of-the box OpenCL 1.0 support in MacOS 10.6.
JOCL contains a small CLInfo utility which can be used to quickly verify OpenCL installations. Here is the output on my system (ubuntu was booted) having all three SDKs installed:
CL_PLATFORM_NAMEATI StreamNVIDIA CUDAIntel(R) OpenCLCL_PLATFORM_VERSIONOpenCL 1.1 ATI-Stream-v2.2 (302)OpenCL 1.0 CUDA 4.0.1OpenCL 1.1 LINUXCL_PLATFORM_PROFILEFULL_PROFILEFULL_PROFILEFULL_PROFILECL_PLATFORM_VENDORAdvanced Micro Devices, Inc.NVIDIA CorporationIntel(R) CorporationCL_PLATFORM_ICD_SUFFIX_KHRAMDNVIntelCL_PLATFORM_EXTENSIONS[cl_khr_icd, cl_amd_event_callback][cl_khr_icd, cl_khr_byte_addressable_store, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_gl_sharing][cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_intel_printf, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]CL_DEVICE_NAMEIntel(R) Core(TM) i7 CPU 940 @ 2.93GHzGeForce GTX 295GeForce GTX 295Intel(R) Core(TM) i7 CPU 940 @ 2.93GHzCL_DEVICE_TYPECPUGPUGPUCPUCL_DEVICE_AVAILABLEtruetruetruetrueCL_DEVICE_VERSIONOpenCL 1.1 ATI-Stream-v2.2 (302)OpenCL 1.0 CUDAOpenCL 1.0 CUDAOpenCL 1.1 CL_DEVICE_PROFILEFULL_PROFILEFULL_PROFILEFULL_PROFILEFULL_PROFILECL_DEVICE_ENDIAN_LITTLEtruetruetruetrueCL_DEVICE_VENDORGenuineIntelNVIDIA CorporationNVIDIA CorporationIntel(R) CorporationCL_DEVICE_EXTENSIONS[cl_amd_device_attribute_query, cl_khr_byte_addressable_store, cl_khr_int64_extended_atomics, cl_khr_local_int32_extended_atomics, cl_amd_fp64, cl_amd_printf, cl_khr_local_int32_base_atomics, cl_khr_int64_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics, cl_ext_device_fission][cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics][cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics][cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_intel_printf, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]CL_DEVICE_MAX_COMPUTE_UNITS830308CL_DEVICE_MAX_CLOCK_FREQUENCY2934124212422930CL_DEVICE_VENDOR_ID40984318431832902CL_DEVICE_OPENCL_C_VERSIONOpenCL C 1.1 com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info string [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info string [error: CL_INVALID_VALUE]OpenCL C 1.1 CL_DRIVER_VERSION2.0270.41.06270.41.061.1CL_DEVICE_ADDRESS_BITS64323264CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT8118CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR161116CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT4114CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG2112CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT4114CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE0112CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR16com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]16CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT8com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]8CL_DEVICE_NATIVE_VECTOR_WIDTH_INT4com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]4CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG2com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]2CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF0com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]0CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT4com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]4CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE0com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]2CL_DEVICE_MAX_WORK_GROUP_SIZE10245125121024CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS3333CL_DEVICE_MAX_WORK_ITEM_SIZES[1024, 1024, 1024][512, 512, 64][512, 512, 64][1024, 1024, 1024]CL_DEVICE_MAX_PARAMETER_SIZE4096435243521024CL_DEVICE_MAX_MEM_ALLOC_SIZE10737418242348318722347008003154703360CL_DEVICE_GLOBAL_MEM_SIZE322122547293932748893880320012618813440CL_DEVICE_LOCAL_MEM_SIZE32768163841638432768CL_DEVICE_HOST_UNIFIED_MEMORYtruecom.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]trueCL_DEVICE_MAX_CONSTANT_BUFFER_SIZE655366553665536131072CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE640064CL_DEVICE_GLOBAL_MEM_CACHE_SIZE3276800262144CL_DEVICE_MAX_CONSTANT_ARGS899128CL_DEVICE_IMAGE_SUPPORTfalsetruetruetrueCL_DEVICE_MAX_READ_IMAGE_ARGS0128128128CL_DEVICE_MAX_WRITE_IMAGE_ARGS088128CL_DEVICE_IMAGE2D_MAX_WIDTH0409640968192CL_DEVICE_IMAGE2D_MAX_HEIGHT032768327688192CL_DEVICE_IMAGE3D_MAX_WIDTH0204820482048CL_DEVICE_IMAGE3D_MAX_HEIGHT0204820482048CL_DEVICE_IMAGE3D_MAX_DEPTH0204820482048CL_DEVICE_MAX_SAMPLERS01616128CL_DEVICE_PROFILING_TIMER_RESOLUTION110001000340831CL_DEVICE_EXECUTION_CAPABILITIES[EXEC_KERNEL, EXEC_NATIVE_KERNEL][EXEC_KERNEL][EXEC_KERNEL][EXEC_KERNEL, EXEC_NATIVE_KERNEL]CL_DEVICE_HALF_FP_CONFIG[][][][]CL_DEVICE_SINGLE_FP_CONFIG[DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO][INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA][INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA][DENORM, INF_NAN, ROUND_TO_NEAREST]CL_DEVICE_DOUBLE_FP_CONFIG[][DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA][DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA][DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]CL_DEVICE_LOCAL_MEM_TYPEGLOBALLOCALLOCALGLOBALCL_DEVICE_GLOBAL_MEM_CACHE_TYPEREAD_WRITENONENONEREAD_WRITECL_DEVICE_QUEUE_PROPERTIES[PROFILING_MODE][OUT_OF_ORDER_MODE, PROFILING_MODE][OUT_OF_ORDER_MODE, PROFILING_MODE][OUT_OF_ORDER_MODE, PROFILING_MODE]CL_DEVICE_COMPILER_AVAILABLEtruetruetruetrueCL_DEVICE_ERROR_CORRECTION_SUPPORTfalsefalsefalsefalsecl_khr_fp16falsefalsefalsefalsecl_khr_fp64falsetruetruetruecl_khr_gl_sharing | cl_APPLE_gl_sharingtruetruetruetrue
The CLInfo utility is part of the jocl-demos project and is also available via webstart. For a plain text version of the above output you can run:
java -jar jocl.jar:gluegen-rt.jar\
-Djava.library.path="path/to/jocl/libs:path/to/gluegen/libs" com.jogamp.opencl.util.CLInfo
(btw to install the intel sdk on debian based systems follow this link)
happy coding!
[Less]
|
Posted
over 14 years
ago
by
mbien
I am currently working on Java Binding for the OpenCL API using GlueGen (as used in JOGL, JOAL). The project started as part of my bachelor of CS thesis short after the release of the first OpenCL specification draft and is now fully feature
... [More]
complete with OpenCL 1.1. JOCL is currently in the stabilization phase, a beta release shouldn't be far away.
Overview - How does it work?
JOCL enables applications running on the JVM to use OpenCL for massively parallel, high performance computing tasks, executed on heterogeneous hardware (GPUs, CPUs, FPGAs etc) in a platform independent manner. JOCL consists of two parts, the low level and the high level binding.
The low level bindings (LLB) are automatically generated using the official OpenCL headers as input and provide a high performance, JNI based, 1:1 mapping to the C functions.
This has the following advantages:
reduces maintenance overhead and ensures spec conformance
compiletime JNI bindings are the fastest way to access native libs from the JVM
makes translating OpenCL C code into Java JOCL very easy (e.g. from books or tutorials)
flexibility and stability: OpenCL libs are loaded dynamically and accessed via function pointers
The hand written high level bindings (HLB) is build on top of LLB and hides most boilerplate code (like object IDs, pointers and resource management) behind easy to use java objects.
HLB use direct NIO buffers internally for fast memory transfers between the JVM and the OpenCL implementation and is very GC friendly. Most of the API is designed for method chaining but of course you don't have to use it this way if you don't want to. JOCL also seamlessly integrates with JOGL 2 (both are built and tested together). Just pass the JOGL context as parameter to the JOCL context factory and you will receive a shared context. If you already know OpenCL and Java, HLB should be very intuitive for you.
The project is available on jogamp.org. Please use the mailinglist / forum for feedback or questions and the bugtracker if you experience any issues.
The JOCL root repository is located on github, you may also want to take a look at the jocl-demos project. (If the demos are not enough you might also want to take a look at the junit tests)
Screenshots (sourcecode in jocl-demos project):
More regarding OpenGL interoperability and other features in upcoming blog entries.
The following sample shows basic setup, computation and cleanup using the high level APIs.
Hello World or parallel a b=c
/**
* Hello Java OpenCL example. Adds all elements of buffer A to buffer B
* and stores the result in buffer C.
* Sample was inspired by the Nvidia VectorAdd example written in C/C
* which is bundled in the Nvidia OpenCL SDK.
* @author Michael Bien
*/
public class HelloJOCL {
public static void main(String[] args) throws IOException {
// Length of arrays to process (arbitrary number)
int elementCount = 11444777;
// Local work size dimensions
int localWorkSize = 256;
// rounded up to the nearest multiple of the localWorkSize
int globalWorkSize = roundUp(localWorkSize, elementCount);
// setup
CLContext context = CLContext.create();
CLProgram program = context.createProgram(
HelloJOCL.class.getResourceAsStream("VectorAdd.cl")
).build();
CLBuffer<FloatBuffer> clBufferA =
context.createFloatBuffer(globalWorkSize, READ_ONLY);
CLBuffer<FloatBuffer> clBufferB =
context.createFloatBuffer(globalWorkSize, READ_ONLY);
CLBuffer<FloatBuffer> clBufferC =
context.createFloatBuffer(globalWorkSize, WRITE_ONLY);
out.println("used device memory: "
(clBufferA.getSize() clBufferB.getSize() clBufferC.getSize())/1000000 "MB");
// fill read buffers with random numbers (just to have test data).
fillBuffer(clBufferA.getBuffer(), 12345);
fillBuffer(clBufferB.getBuffer(), 67890);
// get a reference to the kernel functon with the name 'VectorAdd'
// and map the buffers to its input parameters.
CLKernel kernel = program.createCLKernel("VectorAdd");
kernel.putArgs(clBufferA, clBufferB, clBufferC).putArg(elementCount);
// create command queue on fastest device.
CLCommandQueue queue = context.getMaxFlopsDevice().createCommandQueue();
// asynchronous write to GPU device,
// blocking read later to get the computed results back.
long time = nanoTime();
queue.putWriteBuffer(clBufferA, false)
.putWriteBuffer(clBufferB, false)
.put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
.putReadBuffer(clBufferC, true);
time = nanoTime() - time;
// cleanup all resources associated with this context.
context.release();
// print first few elements of the resulting buffer to the console.
out.println("a b=c results snapshot: ");
for(int i = 0; i < 10; i )
out.print(clBufferC.getBuffer().get() ", ");
out.println("...; " clBufferC.getBuffer().remaining() " more");
out.println("computation took: " (time/1000000) "ms");
}
private static final void fillBuffer(FloatBuffer buffer, int seed) {
Random rnd = new Random(seed);
while(buffer.remaining() != 0)
buffer.put(rnd.nextFloat()*100);
buffer.rewind();
}
private static final int roundUp(int groupSize, int globalSize) {
int r = globalSize % groupSize;
if (r == 0) {
return globalSize;
} else {
return globalSize groupSize - r;
}
}
}
VectorAdd.cl
// OpenCL Kernel Function for element by element vector addition
kernel void VectorAdd(global const float* a,
global const float* b,
global float* c, int numElements) {
// get index into global data array
int iGID = get_global_id(0);
// bound check (equivalent to the limit on a 'for' loop)
if (iGID >= numElements) {
return;
}
// add the vector elements
c[iGID] = a[iGID] b[iGID];
}
[Less]
|
Posted
over 14 years
ago
by
mbien
I am currently working on Java Binding for the OpenCL API using GlueGen (as used in JOGL, JOAL). The project started as part of my bachelor of CS thesis short after the release of the first OpenCL specification draft and is now fully feature
... [More]
complete with OpenCL 1.1. JOCL is currently in the stabilization phase, a beta release shouldn't be far away.
Overview - How does it work?
JOCL enables applications running on the JVM to use OpenCL for massively parallel, high performance computing tasks, executed on heterogeneous hardware (GPUs, CPUs, FPGAs etc) in a platform independent manner. JOCL consists of two parts, the low level and the high level binding.
The low level bindings (LLB) are automatically generated using the official OpenCL headers as input and provide a high performance, JNI based, 1:1 mapping to the C functions.
This has the following advantages:
reduces maintenance overhead and ensures spec conformance
compiletime JNI bindings are the fastest way to access native libs from the JVM
makes translating OpenCL C code into Java JOCL very easy (e.g. from books or tutorials)
flexibility and stability: OpenCL libs are loaded dynamically and accessed via function pointers
The hand written high level bindings (HLB) is build on top of LLB and hides most boilerplate code (like object IDs, pointers and resource management) behind easy to use java objects.
HLB use direct NIO buffers internally for fast memory transfers between the JVM and the OpenCL implementation and is very GC friendly. Most of the API is designed for method chaining but of course you don't have to use it this way if you don't want to. JOCL also seamlessly integrates with JOGL 2 (both are built and tested together). Just pass the JOGL context as parameter to the JOCL context factory and you will receive a shared context. If you already know OpenCL and Java, HLB should be very intuitive for you.
The project is available on jogamp.org. Please use the mailinglist / forum for feedback or questions and the bugtracker if you experience any issues.
The JOCL root repository is located on github, you may also want to take a look at the jocl-demos project. (If the demos are not enough you might also want to take a look at the junit tests)
Screenshots (sourcecode in jocl-demos project):
More regarding OpenGL interoperability and other features in upcoming blog entries.
The following sample shows basic setup, computation and cleanup using the high level APIs.
Hello World or parallel a b=c
/**
* Hello Java OpenCL example. Adds all elements of buffer A to buffer B
* and stores the result in buffer C.
* Sample was inspired by the Nvidia VectorAdd example written in C/C
* which is bundled in the Nvidia OpenCL SDK.
* @author Michael Bien
*/
public class HelloJOCL {
public static void main(String[] args) throws IOException {
// Length of arrays to process (arbitrary number)
int elementCount = 11444777;
// Local work size dimensions
int localWorkSize = 256;
// rounded up to the nearest multiple of the localWorkSize
int globalWorkSize = roundUp(localWorkSize, elementCount);
// setup
CLContext context = CLContext.create();
CLProgram program = context.createProgram(
HelloJOCL.class.getResourceAsStream("VectorAdd.cl")
).build();
CLBuffer<FloatBuffer> clBufferA =
context.createFloatBuffer(globalWorkSize, READ_ONLY);
CLBuffer<FloatBuffer> clBufferB =
context.createFloatBuffer(globalWorkSize, READ_ONLY);
CLBuffer<FloatBuffer> clBufferC =
context.createFloatBuffer(globalWorkSize, WRITE_ONLY);
out.println("used device memory: "
(clBufferA.getSize() clBufferB.getSize() clBufferC.getSize())/1000000 "MB");
// fill read buffers with random numbers (just to have test data).
fillBuffer(clBufferA.getBuffer(), 12345);
fillBuffer(clBufferB.getBuffer(), 67890);
// get a reference to the kernel functon with the name 'VectorAdd'
// and map the buffers to its input parameters.
CLKernel kernel = program.createCLKernel("VectorAdd");
kernel.putArgs(clBufferA, clBufferB, clBufferC).putArg(elementCount);
// create command queue on fastest device.
CLCommandQueue queue = context.getMaxFlopsDevice().createCommandQueue();
// asynchronous write to GPU device,
// blocking read later to get the computed results back.
long time = nanoTime();
queue.putWriteBuffer(clBufferA, false)
.putWriteBuffer(clBufferB, false)
.put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
.putReadBuffer(clBufferC, true);
time = nanoTime() - time;
// cleanup all resources associated with this context.
context.release();
// print first few elements of the resulting buffer to the console.
out.println("a b=c results snapshot: ");
for(int i = 0; i < 10; i )
out.print(clBufferC.getBuffer().get() ", ");
out.println("...; " clBufferC.getBuffer().remaining() " more");
out.println("computation took: " (time/1000000) "ms");
}
private static final void fillBuffer(FloatBuffer buffer, int seed) {
Random rnd = new Random(seed);
while(buffer.remaining() != 0)
buffer.put(rnd.nextFloat()*100);
buffer.rewind();
}
private static final int roundUp(int groupSize, int globalSize) {
int r = globalSize % groupSize;
if (r == 0) {
return globalSize;
} else {
return globalSize groupSize - r;
}
}
}
VectorAdd.cl
// OpenCL Kernel Function for element by element vector addition
kernel void VectorAdd(global const float* a,
global const float* b,
global float* c, int numElements) {
// get index into global data array
int iGID = get_global_id(0);
// bound check (equivalent to the limit on a 'for' loop)
if (iGID >= numElements) {
return;
}
// add the vector elements
c[iGID] = a[iGID] b[iGID];
}
[Less]
|
Posted
over 14 years
ago
by
mbien
Thanks to Justin Stoecker, computer science graduate student at the University of Miami, JOGL gets a new set of getting started tutorials:
JOGL, or Java Bindings for OpenGL, allows Java programs to access the OpenGL API for graphics
... [More]
programming. The graphics code in JOGL programs will look almost identical to that found in C or C++ OpenGL programs, as the API is automatically generated from C header files. This is one of the greatest strengths of JOGL, as it is quite easy to port OpenGL programs written in C or C++ to JOGL; learning JOGL is essentially learning OpenGL[...]
Tutorials:
index
Tutorial 1 - Environment Setup
Tutorial 2 - Creating a Window
Tutorial 3 - Creating a Render Loop
Thanks Justin! [Less]
|
Posted
over 14 years
ago
by
mbien
Thanks to Justin Stoecker, computer science graduate student at the University of Miami, JOGL gets a new set of getting started tutorials:
JOGL, or Java Bindings for OpenGL, allows Java programs to access the OpenGL API for graphics
... [More]
programming. The graphics code in JOGL programs will look almost identical to that found in C or C++ OpenGL programs, as the API is automatically generated from C header files. This is one of the greatest strengths of JOGL, as it is quite easy to port OpenGL programs written in C or C++ to JOGL; learning JOGL is essentially learning OpenGL[...]
Tutorials:
index
Tutorial 1 - Environment Setup
Tutorial 2 - Creating a Window
Tutorial 3 - Creating a Render Loop
Thanks Justin! [Less]
|