shmem_collect

Concatenates blocks of data from multiple PEs to an array in every PE
participating in the collective routine.

Definitions

C11 Synopsis

int shmem_collect(shmem_team_t team, TYPE *dest, const TYPE *source,
                    size_t nelems);
int shmem_fcollect(shmem_team_t team, TYPE *dest, const TYPE *source,
                    size_t nelems);

C/C++ Synopsis

int shmem_TYPENAME_collect(shmem_team_t team, TYPE *dest, const TYPE *source,
                            size_t nelems);
int shmem_TYPENAME_fcollect(shmem_team_t team, TYPE *dest, const TYPE *source,
                            size_t nelems);
where TYPE is one of the standard RMA types and has a corresponding TYPENAME
specified by Table:5.
int shmem_collectmem(shmem_team_t team, void *dest, const void *source,
                        size_t nelems);
int shmem_fcollectmem(shmem_team_t team, void *dest, const void *source,
                        size_t nelems);

Deprecated Synopsis

Deprecated C/C++ Synopsis

void shmem_collect32(void *dest, const void *source, size_t nelems,
                     int PE_start, int logPE_stride, int PE_size,
                     long *pSync);
void shmem_collect64(void *dest, const void *source, size_t nelems,
                     int PE_start, int logPE_stride, int PE_size,
                     long *pSync);
void shmem_fcollect32(void *dest, const void *source, size_t nelems,
                     int PE_start, int logPE_stride, int PE_size,
                     long *pSync);
void shmem_fcollect64(void *dest, const void *source, size_t nelems,
                     int PE_start, int logPE_stride, int PE_size,
                     long *pSync);

Datatype Reference Table

Table:5

|           TYPE          |      TYPENAME       |
|-------------------------|---------------------|
|   float                 |     float           |
|   double                |     double          |
|   long double           |     longdouble      |
|   char                  |     char            |
|   signed char           |     schar           |
|   short                 |     short           |
|   int                   |     int             |
|   long                  |     long            |
|   long long             |     longlong        |
|   unsigned char         |     uchar           |
|   unsigned short        |     ushort          |
|   unsigned int          |     uint            |
|   unsigned long         |     ulong           |
|   unsigned long long    |     ulonglong       |
|   int8_t                |     int8            |
|   int16_t               |     int16           |
|   int32_t               |     int32           |
|   int64_t               |     int64           |
|   uint8_t               |     uint8           |
|   uint16_t              |     uint16          |
|   uint32_t              |     uint32          |
|   uint64_t              |     uint64          |
|   size_t                |     size            |
|   ptrdiff_t             |     ptrdiff         |

Arguments

team        A valid OpenSHMEM team handle
dest        Symmetric address of an array large enough to accept the
            concatenation of the source arrays on all participating PEs.
            The type of dest should match that implied in the
            SYNOPSIS section.
source      Symmetric address of the source data object. The type of source
            should match that implied in the SYNOPSIS section.
nelems      The number of elements in the source array. For
            shmem_[f]collectmem, elements are bytes; for
            shmem_[f]collect{32,64}, elements are 4 or 8 bytes, respectively.

---Deprecated---------------------------------------------------

PE_start    The lowest PE number of the active set of PEs.
logPE_stride    The log (base 2) of the stride between consecutive PE
            numbers in the active set.
PE_size     The number of PEs in the active set.
pSync       Symmetric address of a work array of size at least
            SHMEM_COLLECT_SYNC_SIZE.

Description

OpenSHMEM collect and fcollect routines perform a collective operation to
concatenate nelems data items from the source array into the dest array,
over an OpenSHMEM team or active set in processor number order.
The resultant dest array contains the contribution from PEs as follows:
    • For an active set, the data from PE PE_start is first, then the
        contribution from PE PE_start + PE_stride second, and so on,
        where PE_stride is 2 raised to the power logPE_stride.
    • For a team, the data from PE number 0 in the team is first, then the
        contribution from PE 1 in the team, and so on.

The collected result is written to the dest array for all PEs that
participate in the operation. The same dest and source arrays must be passed
by all PEs that participate in the operation.

The fcollect routines require that nelems be the same value in all
participating PEs, while the collect routines allow nelems to vary from PE
to PE.
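
The placement rule above can be sketched in plain C without calling into
OpenSHMEM. The helper names collect_offsets and fcollect_offset are
hypothetical and exist only for this illustration. Assuming each PE's block
is laid out contiguously in team PE order, as described above, the starting
offset of a PE's block in dest is the sum of the nelems values of all
lower-numbered PEs:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the OpenSHMEM API.
 * nelems[pe] is the element count that PE `pe` of the team passes to a
 * collect call. On return, offsets[pe] holds the element index in dest
 * where that PE's block begins. The return value is the total number of
 * elements dest must be able to hold.
 */
size_t collect_offsets(const size_t *nelems, size_t npes, size_t *offsets)
{
    size_t total = 0;

    assert(nelems != NULL && offsets != NULL);
    for (size_t pe = 0; pe < npes; pe++) {
        offsets[pe] = total;  /* exclusive prefix sum of the counts */
        total += nelems[pe];
    }
    return total;
}

/*
 * Hypothetical helper for the fcollect case: nelems is identical on every
 * PE, so the offset reduces to a single multiplication.
 */
size_t fcollect_offset(size_t pe, size_t nelems)
{
    return pe * nelems;
}
```

With four PEs contributing 1, 2, 3, and 4 elements, this sketch yields
offsets 0, 1, 3, and 6 and a total of 10 elements, which is the layout the
C/C++ example at the end of this page relies on.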

Team-based collect routines operate over all PEs in the provided team
argument. All PEs in the provided team must participate in the operation.
If team compares equal to SHMEM_TEAM_INVALID or is otherwise invalid, the
behavior is undefined.

Active-set-based collective routines operate over all PEs in the active set
defined by the PE_start, logPE_stride, PE_size triplet. As with all
active-set-based collective routines, each of these routines assumes that
only PEs in the active set call the routine. If a PE not in the active set
calls this collective routine, the behavior is undefined.
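
As a rough illustration of how the triplet selects PEs, the sketch below
enumerates and tests active-set membership in plain C. The helper names
active_set_pe and in_active_set are hypothetical and are not OpenSHMEM
routines. The only assumption is the one stated in the Arguments section:
the stride between consecutive members is 2 raised to the power
logPE_stride.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the OpenSHMEM API.
 * Returns the PE number of the i-th member (0-based) of the active set
 * described by the (PE_start, logPE_stride, PE_size) triplet.
 */
int active_set_pe(int PE_start, int logPE_stride, int PE_size, int i)
{
    assert(logPE_stride >= 0 && i >= 0 && i < PE_size);
    return PE_start + i * (1 << logPE_stride);
}

/*
 * Hypothetical helper: returns 1 if `pe` belongs to the active set and
 * 0 otherwise. A PE for which this returns 0 must not call the
 * active-set-based collective.
 */
int in_active_set(int pe, int PE_start, int logPE_stride, int PE_size)
{
    int stride = 1 << logPE_stride;
    int delta = pe - PE_start;

    if (delta < 0 || delta % stride != 0)
        return 0;
    return (delta / stride) < PE_size;
}
```

For example, the triplet (2, 1, 4) describes the active set of PEs 2, 4, 6,
and 8, so PE 5 and PE 10 fall outside it.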

The arguments PE_start, logPE_stride, and PE_size must have the same values
on all PEs in the active set. The same pSync work array must be passed by
all PEs in the active set.

Upon return from a collective routine, the following are true for the
local PE:
    • The dest array is updated and the source array may be safely reused.
    • For active-set-based collective routines, the values in the pSync
        array are restored to the original values.

Return Values

Zero on successful local completion. Nonzero otherwise.

Notes

The collective routines can operate on active sets whose PE_size is not a
power of two, with some performance degradation. They operate with no
performance degradation when nelems is not a power of two.

Examples

C/C++ Example

The following shmem_collect example is for C/C++ programs:
#include <shmem.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    static long lock = 0;

    shmem_init();
    int mype = shmem_my_pe();
    int npes = shmem_n_pes();
    int my_nelem = mype + 1; /* linearly increasing number of elements with PE */
    int total_nelem = (npes * (npes + 1)) / 2;

    int *source = (int *) shmem_malloc(npes * sizeof(int)); /* symmetric alloc */
    int *dest = (int *) shmem_malloc(total_nelem * sizeof(int));

    for (int i = 0; i < my_nelem; i++)
        source[i] = (mype * (mype + 1)) / 2 + i;
    for (int i = 0; i < total_nelem; i++)
        dest[i] = -9999;

    /* Wait for all PEs to initialize source/dest: */
    shmem_team_sync(SHMEM_TEAM_WORLD);

    shmem_int_collect(SHMEM_TEAM_WORLD, dest, source, my_nelem);

    shmem_set_lock(&lock); /* Lock prevents interleaving printfs */
    printf("%d: %d", mype, dest[0]);
    for (int i = 1; i < total_nelem; i++)
        printf(", %d", dest[i]);
    printf("\n");
    shmem_clear_lock(&lock);
    shmem_free(dest);   /* release symmetric allocations on all PEs */
    shmem_free(source);
    shmem_finalize();
    return 0;
}