Extended CUDA Library (ecuda)
2.0
Encapsulates CUDA API event objects and functions.
#include <event.hpp>
Public Member Functions
event () | Default constructor. |
event (unsigned flags) | Constructs an event with the given flags. |
~event () | Destructor. |
void record (cudaStream_t stream=0) | Records an event. |
cudaError_t synchronize () | Wait until the completion of all device work preceding the most recent call to record(). |
cudaError_t query () | Query the status of all device work preceding the most recent call to record(). |
float operator- (event &other) | Computes the elapsed time between another event and this event. |
Static Public Member Functions
static float elapsed_time (event &start, event &end) | Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds). |
Encapsulates CUDA API event objects and functions.
CUDA events are useful for assessing the running time of device operations. This can be handy when optimizing kernel implementations or thread configurations.
This class is just a thin wrapper around the corresponding cudaEvent* functions in the CUDA API, exposing the event functions in a more C++-like style. The documentation is shamelessly lifted from the official CUDA documentation (http://docs.nvidia.com/cuda/index.html).
For example, to get the running time of a kernel function:
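A minimal sketch of such a measurement, assuming the class is exposed as ecuda::event through <ecuda/event.hpp> (both the namespace and the include path are assumptions), and using a hypothetical kernel named scaleKernel with arbitrary launch dimensions:

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <ecuda/event.hpp> // assumed include path for the event wrapper

// Hypothetical kernel, present only to give the timer something to measure.
__global__ void scaleKernel(float* data, int n, float factor)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float* deviceData = 0;
    if (cudaMalloc(&deviceData, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(deviceData, 0, n * sizeof(float));

    ecuda::event start, stop; // assumed namespace

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    start.record();       // mark the point just before the kernel launch
    scaleKernel<<<blocks, threads>>>(deviceData, n, 2.0f);
    stop.record();        // mark the point just after the kernel launch
    stop.synchronize();   // wait until the kernel and the stop event have completed

    // Elapsed time between the two recorded events, in milliseconds.
    std::printf("kernel time: %f ms\n", ecuda::event::elapsed_time(start, stop));

    cudaFree(deviceData);
    return 0;
}
```

Calling synchronize() on the second event before reading the elapsed time matters here, because record() only enqueues the event and returns immediately.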
event (unsigned flags) | inline |
Constructs an event with the given flags.
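As of now, valid flags specified by the CUDA API are:
cudaEventDefault: Default event creation flag.
cudaEventBlockingSync: The event uses blocking synchronization. A host thread that waits on such an event with synchronize() blocks until the event has completed.
cudaEventDisableTiming: The event does not need to record timing data.
cudaEventInterprocess: The event may be used as an interprocess event. This flag must be combined with cudaEventDisableTiming.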
flags | A valid event flag. |
static float elapsed_time (event &start, event &end) | inline |
Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).
If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the record() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of operations in other streams could execute between the two measured events, altering the timing significantly.
If record() has not been called on either event, the underlying cudaEventElapsedTime call fails with cudaErrorInvalidResourceHandle. If record() has been called on both events but one or both of them has not yet completed (that is, query() would return cudaErrorNotReady on at least one of the events), it fails with cudaErrorNotReady. If either event was created with the cudaEventDisableTiming flag, it fails with cudaErrorInvalidResourceHandle.
start | Starting event. |
end | Ending event. |
cudaError_t query () | inline |
Query the status of all device work preceding the most recent call to record().
This applies to the appropriate compute streams, as specified by the arguments to record().
If this work has been completed successfully by the device, or if record() has not been called on this event, then cudaSuccess is returned. If this work has not yet been completed by the device, then cudaErrorNotReady is returned.
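The return values above suggest a simple non-blocking polling pattern. The following is only a sketch: the ecuda namespace, the <ecuda/event.hpp> include path, and the busyKernel kernel are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <ecuda/event.hpp> // assumed include path for the event wrapper

// Hypothetical kernel that keeps the device busy for a little while.
__global__ void busyKernel(float* data, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 10000; ++k) x = x * 1.0001f + 0.5f;
        data[i] = x;
    }
}

int main()
{
    const int n = 1 << 20;
    float* deviceData = 0;
    if (cudaMalloc(&deviceData, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(deviceData, 0, n * sizeof(float));

    ecuda::event done; // assumed namespace
    busyKernel<<<(n + 255) / 256, 256>>>(deviceData, n);
    done.record();

    // Poll without blocking; real host-side work could be done inside the loop.
    unsigned long polls = 0;
    cudaError_t status;
    while ((status = done.query()) == cudaErrorNotReady) ++polls;

    if (status == cudaSuccess)
        std::printf("device work finished after %lu polls\n", polls);
    else
        std::printf("query failed: %s\n", cudaGetErrorString(status));

    cudaFree(deviceData);
    return 0;
}
```

Any status other than cudaSuccess or cudaErrorNotReady ends the loop and is reported, so an unexpected error cannot leave the host spinning forever.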
void record (cudaStream_t stream=0) | inline |
Records an event.
If stream is non-zero, the event is recorded after all preceding operations in stream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed. Since the operation is asynchronous, query() and/or synchronize() must be used to determine when the event has actually been recorded.
If record() has previously been called on this event, then this call will overwrite any existing state in the event. Any subsequent calls which examine the status of the event will only examine the completion of this most recent call to record().
stream | Stream in which to record event. |
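As an illustration of recording into a specific stream, the sketch below times an asynchronous memset issued on a user-created stream. The ecuda namespace and the <ecuda/event.hpp> include path are assumptions; the stream and memory calls are from the standard CUDA runtime API.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>
#include <ecuda/event.hpp> // assumed include path for the event wrapper

int main()
{
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return 1;

    const std::size_t bytes = static_cast<std::size_t>(1) << 24;
    void* deviceBuffer = 0;
    if (cudaMalloc(&deviceBuffer, bytes) != cudaSuccess) return 1;

    {
        ecuda::event start, stop; // assumed namespace

        start.record(stream);                            // enqueued before the memset
        cudaMemsetAsync(deviceBuffer, 0, bytes, stream); // asynchronous work in the same stream
        stop.record(stream);                             // enqueued after the memset

        stop.synchronize(); // record() is asynchronous, so wait before reading the time

        std::printf("memset time: %f ms\n", ecuda::event::elapsed_time(start, stop));
    } // both events go out of scope before the stream is destroyed

    cudaFree(deviceBuffer);
    cudaStreamDestroy(stream);
    return 0;
}
```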
cudaError_t synchronize () | inline |
Wait until the completion of all device work preceding the most recent call to record().
This applies to the appropriate compute streams, as specified by the arguments to record(). If record() has not been called on this event, cudaSuccess is returned immediately.
Waiting for an event that was created with the cudaEventBlockingSync flag will cause the calling CPU thread to block until the event has been completed by the device. If the cudaEventBlockingSync flag has not been set, then the CPU thread will busy-wait until the event has been completed by the device.
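To illustrate the difference, the sketch below creates an event that blocks the waiting host thread rather than busy-waiting. The ecuda namespace, the <ecuda/event.hpp> include path, and the fillKernel kernel are assumptions; the flag constants come from the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <ecuda/event.hpp> // assumed include path for the event wrapper

// Hypothetical kernel standing in for real device work.
__global__ void fillKernel(int* data, int n, int value)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main()
{
    const int n = 1 << 20;
    int* deviceData = 0;
    if (cudaMalloc(&deviceData, n * sizeof(int)) != cudaSuccess) return 1;

    // cudaEventBlockingSync: the waiting host thread blocks instead of busy-waiting.
    // cudaEventDisableTiming: this event is used for synchronization only, not timing.
    ecuda::event done(cudaEventBlockingSync | cudaEventDisableTiming); // assumed namespace

    fillKernel<<<(n + 255) / 256, 256>>>(deviceData, n, 42);
    done.record();

    const cudaError_t status = done.synchronize(); // blocks until the kernel has finished
    if (status != cudaSuccess)
        std::printf("synchronize failed: %s\n", cudaGetErrorString(status));
    else
        std::printf("device work completed\n");

    cudaFree(deviceData);
    return status == cudaSuccess ? 0 : 1;
}
```

An event created with cudaEventDisableTiming cannot be passed to elapsed_time(), so a separate pair of default events is needed if the same work also has to be timed.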