CUDA programming language, being a C with extensions, offers the ability to use arbitrary data structures in GPU programs. But in order for the hardware to perform efficient global loads and stores with variables of structured types, additional alignment details must be specified.

Take a look at this structure definition:
typedef struct{
    float a;
    float b;
} testStructure;

Without alignment specification the compiler will not automatically use a single 64-bit global memory load/store instruction, but will emit two 32-bit load instructions instead. 
This significantly impacts aggregate load/store bandwidth, since the latter breaks coalescing rules because of incontiguous memory access pattern. Refer to section 5.1.2.1 of the Programming Guide.