An instance of vex::Reductor<T, ReduceOP=vex::SUM>
allows one to
reduce an arbitrary vector expression to a single value of type T. Supported
reduction operations are vex::SUM
, vex::MIN
, and
vex::MAX
. Reductor objects are steteful – they keep small
temporary buffers on compute devices and receive a list of command queues at
construction.
In the following example an inner product of two vectors is computed:
vex::Reductor<double, vex::SUM> sum(ctx);
double s = sum(x * y);
Also, see Random number generation for an example of estimating value of \(\pi\) with the Monte Carlo method.
Reduce operations may be combined with the
vex::CombineReductors
class. This way several
reduction operations will be fused into single compute kernel. The operations
should return the same scalar type, and the result of the combined reduction
operation will be appropriately sized OpenCL/CUDA vector type.
In the following example minimum and maximum values of the vector are computed at the same time:
vex::Reductor<double, vex::CombineReductors<vex::MIN, vex::MAX>> minmax(ctx);
cl_double2 m = minmax(x);
std::cout << "min(x) = " << m.s[0] << std::endl;
std::cout << "max(x) = " << m.s[1] << std::endl;
In fact, the operation is so common, that VexCL provides a convenience typedef
vex::MIN_MAX
.
Reductor
¶Parallel reduction of arbitrary expression.
Reduction uses small temporary buffer on each device present in the queue parameter. One Reductor class for each reduction kind is enough per thread of execution.
Public Functions
Reductor
(const std::vector<backend::command_queue> &queue = current_context().queue())¶Constructor.
operator()
(const Expr &expr) const¶Compute reduction of a vector expression.
operator()
(const Expr &expr) constCompute reduction of a multivector expression.
SUM
¶Summation.
Subclassed by vex::SUM_Kahan
MIN
¶Minimum element.
MAX
¶Maximum element.
CombineReductors
¶Combines several reduce operations.
One of the most common operations in linear algebra is the matrix-vector
product. An instance of vex::SpMat
class holds a representation of
a sparse matrix. Its constructor accepts a sparse matrix in common CRS format.
In the example below a vex::SpMat is constructed from an Eigen sparse
matrix:
Eigen::SparseMatrix<double, Eigen::RowMajor, int> E;
vex::SpMat<double, int> A(ctx, E.rows(), E.cols(),
E.outerIndexPtr(), E.innerIndexPtr(), E.valuePtr());
Matrix-vector products may be used in vector expressions. The only restriction is that the expressions have to be additive. This is due to the fact that the operation involves inter-device communication for multi-device contexts.
// Compute residual value for a system of linear equations:
Z = Y - A * X;
This restriction may be lifted for single-device contexts. In this case VexCL
does not need to worry about inter-device communication. Hence, it is possible
to inline matrix-vector product into a normal vector expression with the help of
vex::make_inline()
:
residual = sum(Y - vex::make_inline(A * X));
Z = sin(vex::make_inline(A * X));
SpMat
¶Sparse matrix in hybrid ELL-CSR format.
Public Functions
SpMat
(const std::vector<backend::command_queue> &queue, size_t n, size_t m, const idx_t *row, const col_t *col, const val_t *val)¶Constructor.
Constructs GPU representation of the \(n \times m\)matrix. Input matrix is in CSR format. GPU matrix utilizes ELL format and is split equally across all compute devices.
::
make_inline
(const MVProdExpr &expr)¶Inlines a sparse matrix - vector product.
When applied to a matrix-vector product, the product becomes inlineable. That is, it may be used in any vector expression (not just additive expressions). The user has to guarantee the function is only used in single-device expressions.
Example:
// Get maximum residual value:
eps = sum( fabs(f - vex::make_inline(A * x)) );
VexCL provides several standalone parallel primitives that may not be used as
part of a vector expression. These are vex::inclusive_scan_by_key()
,
vex::exclusive_scan_by_key()
, vex::sort()
,
vex::sort_by_key()
, vex::reduce_by_key()
. All of these
functions take VexCL vectors both as input and output parameters.
Sort and scan functions take an optional function object used for comparison
and summing of elements. The functor should provide the same interface as, e.g.
std::less
for sorting or std::plus
for summing; additionally, it should
provide a VexCL function for device-side operations.
Here is an example of such an object comparing integer elements in such a way that even elements precede odd ones:
template <typename T>
struct even_first {
#define BODY \
char bit1 = 1 & a; \
char bit2 = 1 & b; \
if (bit1 == bit2) return a < b; \
return bit1 < bit2;
// Device version.
VEX_FUNCTION(bool, device, (int, a)(int, b), BODY);
// Host version.
bool operator()(int a, int b) const { BODY }
#undef BODY
};
Same functor could be created with the help of VEX_DUAL_FUNCTOR
macro, which takes return type, sequence of arguments (similar to the
VEX_FUNCTION
), and the body of the functor:
template <typename T>
struct even_first {
VEX_DUAL_FUNCTOR(bool, (T, a)(T, b),
char bit1 = 1 & a;
char bit2 = 1 & b;
if (bit1 == bit2) return a < b;
return bit1 < bit2;
)
};
Note that VexCL already provides vex::less<T>
,
vex::less_equal<T>
, vex::greater<T>
,
vex::greater_equal<T>
, and vex::plus<T>
.
The need to provide both host-side and device-side parts of the functor comes from the fact that multidevice vectors are first sorted partially on each of the compute devices and then merged on the host.
Sorting algorithms may also take tuples of keys/values (in fact, any
Boost.Fusion sequence will do). One will have to explicitly specify the
comparison functor in this case. Both host and device variants of the
comparison functor should take 2n
arguments, where n
is the number of
keys. The first n
arguments correspond to the left set of keys, and the
second n
arguments correspond to the right set of keys. Here is an example
that sorts values by a tuple of two keys:
vex::vector<int> keys1(ctx, n);
vex::vector<float> keys2(ctx, n);
vex::vector<double> vals (ctx, n);
struct {
VEX_FUNCTION(bool, device, (int, a1)(float, a2)(int, b1)(float, b2),
return (a1 == b1) ? (a2 < b2) : (a1 < b1);
);
bool operator()(int a1, float a2, int b1, float b2) const {
return std::make_tuple(a1, a2) < std::make_tuple(b1, b2);
}
} comp;
vex::sort_by_key(std::tie(keys1, keys2), vals, comp);
::
inclusive_scan
(vector<T> const &input, vector<T> &output, T init, Oper oper)Inclusive scan.
::
exclusive_scan
(vector<T> const &input, vector<T> &output, T init, Oper oper)Exclusive scan.
::
inclusive_scan_by_key
(K &&keys, const vector<V> &ivals, vector<V> &ovals, Comp comp, Oper oper, V init = V())Inclusive scan by key.
::
exclusive_scan_by_key
(K &&keys, const vector<V> &ivals, vector<V> &ovals, Comp comp, Oper oper, V init = V())Exclusive scan by key.
::
sort
(K &&keys, Comp comp)Sorts the vector into ascending order.
::
sort_by_key
(K &&keys, V &&vals, Comp comp)Sorts the elements in keys and values into ascending key order.
VEX_DUAL_FUNCTOR
(type, args, ...)¶Defines both device and host versions of a function call operator.
The intended use is the creation of comparison and reduction functors for use with scan/sort/reduce algorithms.
Example:
template <typename T>
struct less {
VEX_DUAL_FUNCTOR(bool, (T, a)(T, b),
return a < b;
)
};
less
: public std::less<T>¶Function object class for less-than inequality comparison.
The need for host-side and device-side parts comes from the fact that vectors are partially sorted on device and then final merge step is done on host.
less_equal
: public std::less_equal<T>¶Function object class for less-than-or-equal inequality comparison.
greater
: public std::greater<T>¶Function object class for greater-than inequality comparison.
greater_equal
: public std::greater_equal<T>¶Function object class for greater-than-or-equal inequality comparison.
plus
: public std::plus<T>¶Binary function object class whose call returns the result of adding its two arguments.