The main reason bits are grouped together is to represent characters. Because of the binary nature of all things computing, ideal 'clumps' of bits come in powers of 2, i.e. 1, 2, 4, 8, 16, 32... (basically because they can always be divided into smaller equal packages; it also creates shortcuts for storing sizes, but that's another story). Four bits (a nybble, in some circles) give us 2^4 or 16 unique characters. As most alphabets are larger than this, 2^8 (or 256 characters) is a more suitable choice.
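To make the arithmetic concrete, here is a minimal Python sketch of how many distinct patterns (and so characters) a group of n bits can hold. The function name distinct_values is just something I made up for illustration.

```python
def distinct_values(bits):
    """Number of distinct patterns a group of `bits` bits can hold (2^bits)."""
    return 2 ** bits


# Power-of-two group sizes mentioned above.
for width in (1, 2, 4, 8, 16, 32):
    print(f"{width:2d} bits -> {distinct_values(width)} patterns")
```

A nybble comes out at 16 patterns, too few for a 26-letter alphabet, while 8 bits give 256, which leaves room for both cases, digits and punctuation.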
Originally (in the mid-1950s) the term byte was used for a string of bits of any length, usually one transmission, that is, one (command) sequence or similar. The word is usually credited to Werner Buchholz, who in 1956, during the design of IBM's Stretch computer, used the word bite (derived from, but distinct from, bit) to describe a bitstring that could encode a character to be transmitted between peripherals. To avoid spelling problems (bite is easily mistyped as bit) this eventually became byte (hence nybble for half-byte).
I have also seen references to BYTE being an acronym, most commonly Binary Yoked Transfer Element (see acronymfinder.com). There is a vague mention on Dictionary.com of another possible origin from an IBM acronym, but I suspect that bite may have changed to byte for a little bit of both reasons.
Machines exist that have used bytes of other lengths (particularly 7 or 9 bits). These have not really survived, mainly because they are not as easy to manipulate. You certainly can't split an odd number of bits into two equal halves, which means that if you were to divide bytes, you would have to keep track of the length of each bitstring.
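As a rough illustration of the splitting problem, the sketch below divides a bitstring into a high part and a low part. The helper split_bits and its return layout are my own invention for the example, not anything a real machine defines.

```python
def split_bits(value, width):
    """Split a `width`-bit value into a high part and a low part.

    Returns ((high, high_width), (low, low_width)). The widths are
    returned too, because for an odd `width` the two parts differ in size.
    """
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    low_width = width // 2
    high_width = width - low_width
    low = value & ((1 << low_width) - 1)   # keep the bottom low_width bits
    high = value >> low_width              # shift the rest down
    return (high, high_width), (low, low_width)


print(split_bits(0xA7, 8))        # 8-bit byte: two 4-bit nybbles
print(split_bits(0b1010111, 7))   # 7-bit byte: a 4-bit part and a 3-bit part
```

With 8 bits both halves are always 4 bits wide, so the widths never need to be stored. With 7 bits the halves are 4 and 3 bits, and anything handling them has to carry that extra bookkeeping around.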
Finally, 8 is also a convenient number: many people (psychologists and the like) claim that the human mind can generally recall only about 7 or 8 things immediately (without playing memory tricks).
I think I'm getting off track now.....