Managing memory

Allocating

The vex::vector<T> constructor accepts a const reference to std::vector<vex::backend::command_queue>. A vex::Context instance is conveniently convertible to this type, but the command queues may also be initialized elsewhere (e.g. with the OpenCL backend, vex::backend::command_queue is a typedef for cl::CommandQueue), which eliminates the need to create a vex::Context altogether. Each command queue in the list should uniquely identify a single compute device.

The contents of the created vector are partitioned across all devices present in the queue list. The size of each partition is proportional to the device bandwidth, which is measured the first time the device is used. All vectors of the same size are guaranteed to be partitioned consistently, which minimizes inter-device communication.
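To make the proportional split concrete, here is a minimal sketch in plain C++ (no VexCL dependency) of how partition boundaries could be derived from per-device weights such as measured bandwidth. The function partition_bounds and the example weights are hypothetical; VexCL's own implementation may round or align the boundaries differently.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative sketch (not VexCL's actual code): split n elements across
// devices in proportion to a per-device weight such as measured bandwidth.
// Returns weight.size() + 1 bounds; partition d covers [bounds[d], bounds[d + 1]).
std::vector<std::size_t> partition_bounds(std::size_t n,
                                          const std::vector<double> &weight) {
    assert(!weight.empty());
    const double total = std::accumulate(weight.begin(), weight.end(), 0.0);
    assert(total > 0);

    std::vector<std::size_t> bounds(weight.size() + 1, 0);
    double cumulative = 0;
    for (std::size_t d = 0; d < weight.size(); ++d) {
        cumulative += weight[d];
        // Boundaries depend only on n and the weights, so equally sized
        // vectors on the same devices are always split the same way.
        bounds[d + 1] = static_cast<std::size_t>(n * (cumulative / total));
    }
    bounds.back() = n; // absorb floating-point round-off in the last part
    return bounds;
}
```

With weights {2, 1, 1}, a vector of 1000 elements would be split into parts of 500, 250, and 250 elements. Because the boundaries depend only on the vector size and the device weights, the sketch also illustrates why equally sized vectors can be partitioned consistently.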

In the example below, three device vectors of the same size are allocated. Vector A is copied from the host vector a, and the other vectors are created uninitialized:

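#include <vector>
#include <vexcl/vexcl.hpp>

const size_t n = 1024 * 1024;
vex::Context ctx( vex::Filter::Any );  // Use all available compute devices.

std::vector<double> a(n, 1.0);

vex::vector<double> A(ctx, a);  // Initialized with a copy of host vector a.
vex::vector<double> B(ctx, n);  // Uninitialized.
vex::vector<double> C(ctx, n);  // Uninitialized.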

Assuming that the current system has an NVIDIA GPU, an AMD GPU, and an Intel CPU installed, the partitioning might look like this:

[Figure: possible partitioning of vectors A, B, and C across an NVIDIA GPU, an AMD GPU, and an Intel CPU]
template <typename T>
class vector : public vex::vector_expression<Expr>

Device vector.

Public Functions

vector()

Empty constructor.

vector(const vector &v)

Copy constructor.

vector(vector &&v)

Move constructor.

vector(const backend::command_queue &q, const backend::device_vector<T> &buffer, size_t size = 0)

Wraps a native buffer without owning it.

May be used to apply VexCL functions to buffers allocated and managed outside of VexCL.

vector(const std::vector<backend::command_queue> &queue, size_t size, const T *host = 0, backend::mem_flags flags = backend::MEM_READ_WRITE)

Creates a vector of the given size and optionally copies host data.

vector(size_t size, const T *host = 0, backend::mem_flags flags = backend::MEM_READ_WRITE)

Creates a vector of the given size and optionally copies host data.

This version uses the most recently created VexCL context.

vector(const std::vector<backend::command_queue> &queue, const std::vector<T> &host, backend::mem_flags flags = backend::MEM_READ_WRITE)

Creates a new device vector and copies the host vector into it.

vector(const std::vector<T> &host, backend::mem_flags flags = backend::MEM_READ_WRITE)

Creates a new device vector and copies the host vector into it.

This version uses the most recently created VexCL context.

template <class Expr>
vector(const Expr &expr)

Constructs a new vector from a vector expression.

This will fail if VexCL is unable to automatically determine the expression size and the compute devices to use.

void swap(vector &v)

Swap function.

void resize(const vector &v, backend::mem_flags flags = backend::MEM_READ_WRITE)

Resizes the vector.

Borrows devices, size, and data from the given vector. Any data contained in the resized vector will be lost as a result.

void resize(const std::vector<backend::command_queue> &queue, size_t size, const T *host = 0, backend::mem_flags flags = backend::MEM_READ_WRITE)

Resizes the vector with the given parameters.

This is equivalent to reconstructing the vector with the given parameters. Any data contained in the resized vector will be lost as a result.

void resize(const std::vector<backend::command_queue> &queue, const std::vector<T> &host, backend::mem_flags flags = backend::MEM_READ_WRITE)

Resizes the vector.

This is equivalent to reconstructing the vector with the given parameters. Any data contained in the resized vector will be lost as a result.

void resize(size_t size, const T *host = 0, backend::mem_flags flags = backend::MEM_READ_WRITE)

Resizes the vector.

void clear()

Fills vector with zeros.

This does not change the vector size!

const backend::device_vector<T> &operator()(unsigned d = 0) const

Returns memory buffer located on the given device.

backend::device_vector<T> &operator()(unsigned d = 0)

Returns memory buffer located on the given device.

const_iterator begin() const

Returns const iterator to the first element of the vector.

const_iterator end() const

Returns const iterator referring to the past-the-end element in the vector.

iterator begin()

Returns iterator to the first element of the vector.

iterator end()

Returns iterator referring to the past-the-end element in the vector.

const element operator[](size_t index) const

Access vector element.

element operator[](size_t index)

Access vector element.

const element at(size_t index) const

at() style access is identical to operator[].

element at(size_t index)

at() style access is identical to operator[].

size_t size() const

Returns vector size.

size_t nparts() const

Returns the number of vector parts.

Each partition is located on a single device.

size_t part_size(unsigned d) const

Returns the size of the vector part located on the given device.

size_t part_start(unsigned d) const

Returns index of the first element located on the given device.

const std::vector<backend::command_queue> &queue_list() const

Returns a reference to the vector of command queues used to construct the vector.

backend::device_vector<T>::mapped_array map(unsigned d = 0)

Maps vector part located on the given device to a host array.

This returns a smart pointer that will be unmapped automatically upon destruction.

backend::device_vector<T>::mapped_array map(unsigned d = 0) const

Maps the vector part located on the given device to a host array.

This returns a smart pointer that will be unmapped automatically upon destruction.

const vector &operator=(const vector &x)

Copy assignment.

const vector &operator=(vector &&v)

Move assignment.

template <class Expr>
auto operator=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator+=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator-=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator*=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator/=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator%=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator&=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator|=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator^=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator<<=(const Expr &expr)

Expression assignment operator.

template <class Expr>
auto operator>>=(const Expr &expr)

Expression assignment operator.
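The values returned by part_start() and part_size() are enough to translate a global element index into a device number and an offset within that device's buffer. The plain C++ sketch below assumes a hypothetical bounds array holding part_start(0), ..., part_start(nparts() - 1) followed by size(); the helper locate is illustrative and not part of VexCL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: bounds = {part_start(0), ..., part_start(nparts-1), size()}.
// Returns the partition (device) holding global element `index` and the
// element's offset within that partition.
std::pair<unsigned, std::size_t> locate(const std::vector<std::size_t> &bounds,
                                        std::size_t index) {
    assert(bounds.size() >= 2 && bounds.front() <= index && index < bounds.back());
    // First boundary strictly greater than index; the owning partition is the
    // one just before it. Empty partitions (repeated boundaries) are skipped.
    auto it = std::upper_bound(bounds.begin(), bounds.end(), index);
    const auto d = static_cast<unsigned>(it - bounds.begin() - 1);
    return {d, index - bounds[d]};
}
```

For bounds {0, 500, 750, 1000}, global index 600 would be found on device 1 at offset 100.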

Copying

The vex::copy() function copies data between host and compute device memory spaces. It comes in two forms: a simple one, which accepts whole vectors, and an STL-like one, which accepts pairs of iterators:

std::vector<double> h(n);       // Host vector.
vex::vector<double> d(ctx, n);  // Device vector.

// Simple form:
vex::copy(h, d);    // Copy data from host to device.
vex::copy(d, h);    // Copy data from device to host.

// STL-like form:
vex::copy(h.begin(), h.end(), d.begin()); // Copy data from host to device.
vex::copy(d.begin(), d.end(), h.begin()); // Copy data from device to host.

The STL-like variant can copy sub-ranges of the vectors, or copy data from/to raw host pointers.
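Conceptually, when a sub-range spans several partitions, the copy breaks down into one transfer per device involved. The bookkeeping can be sketched in plain C++ as follows, using a hypothetical bounds layout (partition starts followed by the total size); the chunk struct and split_range are illustrative names, not VexCL's implementation of vex::copy().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one per-device transfer.
struct chunk {
    unsigned device;     // partition (device) number
    std::size_t offset;  // first element within that device's buffer
    std::size_t global;  // corresponding global index
    std::size_t count;   // number of elements to transfer
};

// Splits the global range [first, last) into one chunk per partition it
// overlaps; partition d covers [bounds[d], bounds[d + 1]).
std::vector<chunk> split_range(const std::vector<std::size_t> &bounds,
                               std::size_t first, std::size_t last) {
    assert(!bounds.empty() && first <= last && last <= bounds.back());
    std::vector<chunk> chunks;
    for (std::size_t d = 0; d + 1 < bounds.size(); ++d) {
        // Intersect the requested range with partition d.
        const std::size_t lo = std::max(first, bounds[d]);
        const std::size_t hi = std::min(last, bounds[d + 1]);
        if (lo < hi)
            chunks.push_back({static_cast<unsigned>(d), lo - bounds[d], lo, hi - lo});
    }
    return chunks;
}
```

For bounds {0, 500, 750, 1000}, the global range [400, 800) touches all three devices: 100 elements from device 0, 250 from device 1, and 50 from device 2.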

Vectors also overload the array subscript operator, vex::vector::operator[](), so that users may read or write individual vector elements directly. This operation is highly inefficient, since each element access results in a separate host-device transfer, and should be used with caution. Iterators allow element access as well, so STL algorithms may in principle be used with device vectors. This would be very slow, but may serve as a temporary building block.

Another option for host-device data transfer is to map a device memory buffer to a host array. The mapped array may then be read or written transparently. The vex::vector::map() method maps the d-th partition of the vector and returns the mapped array:

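vex::vector<double> X(ctx, n);
{
    // Unmapped automatically when mapped_ptr goes out of scope.
    auto mapped_ptr = X.map(0);
    // Only the part located on device 0 is filled here.
    // host_function() stands for any user-supplied function of the index.
    for(size_t i = 0; i < X.part_size(0); ++i)
        mapped_ptr[i] = host_function(i);
}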

Shared virtual memory

Both OpenCL 2.0 and CUDA 6.0 allow the host and the compute devices to share the same virtual address range, so that buffers no longer need to be copied between devices. In other words, there is no need to keep track of buffers and copy them explicitly across devices: just use shared pointers. OpenCL 2.0 calls this concept Shared Virtual Memory (SVM), and CUDA 6.0 calls it Unified Memory. In VexCL, both are abstracted by the vex::svm_vector<T> class.

The vex::svm_vector<T> constructor, unlike that of vex::vector<T>, takes a single instance of vex::backend::command_queue, because an SVM vector has to be associated with a single device context. Otherwise, SVM vectors in VexCL may be used in the same way as normal vectors.

Example:

// Allocate SVM vector for the first device in context:
vex::svm_vector<int> x(ctx.queue(0), n);

// Fill the vector on the host.
{
    auto p = x.map(vex::backend::MAP_WRITE);
    for(size_t i = 0; i < n; ++i)
        p[i] = static_cast<int>(i * 2);
}
template <typename T>
class svm_vector : public vex::vector_expression<Expr>

Shared Virtual Memory wrapper class.

Public Functions

svm_vector(const cl::CommandQueue &q, size_t n)

Allocates SVM vector on the given device.

size_t size() const

Returns the size of the SVM vector.

const cl::CommandQueue &queue() const

Returns a reference to the command queue associated with the SVM vector.

mapped_pointer map(cl_map_flags map_flags = CL_MAP_READ | CL_MAP_WRITE)

Returns a host pointer ready to be read or written by the host.

This returns a smart pointer that will be unmapped automatically upon destruction.

const svm_vector &operator=(const svm_vector &other)

Copy assignment operator.