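The main work in your example is the iterative growing of the array. Each time the array grows, a new block of memory has to be allocated and the existing elements copied over, which wastes time in sequential and parallel code alike. Pre-allocate the output to its final size before the loop.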
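Since the code from the question is not repeated here, the following is only a minimal Python sketch of the difference; the function names and the squared values are made up for illustration. The first version rebuilds the output on every iteration, the second allocates it once and fills it in place.

```python
def grow_iteratively(n):
    # Each concatenation creates a new list and copies every element
    # collected so far, so the total copying grows quadratically with n.
    out = []
    for i in range(n):
        out = out + [i * i]
    return out


def preallocated(n):
    # Allocate the full output once, then write each result into its slot.
    out = [0] * n
    for i in range(n):
        out[i] = i * i
    return out
```

Both functions return the same values; only the amount of allocation and copying differs.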
Starting parallel threads takes some time. For such trivial code, the overhead can be expected to exceed the actual work. Compare this with instructing 8 people to say the numbers from 1 to 50: it is much faster to say them yourself.
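The analogy can be tried out with a rough Python sketch. This assumes Python's standard thread pool as a stand-in for whatever parallel mechanism your code uses; the function names and the default of 8 workers are hypothetical, chosen to match the example above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_sequential(n):
    # One "person" says all numbers from 1 to n.
    return list(range(1, n + 1))


def count_with_workers(n, workers=8):
    # Split 1..n into contiguous chunks, at most one per worker.
    chunk = -(-n // workers)  # ceiling division
    bounds = [(lo, min(lo + chunk - 1, n)) for lo in range(1, n + 1, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Starting the pool and dispatching the chunks is the overhead.
        parts = pool.map(lambda b: list(range(b[0], b[1] + 1)), bounds)
        return [x for part in parts for x in part]


def timed(fn, *args):
    # Return the result together with the elapsed wall-clock time in seconds.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(count_sequential, 50)` and `timed(count_with_workers, 50)` returns the same list in both cases, so the two timings can be compared directly. The exact numbers depend on the machine, but for a payload this small the time spent starting and coordinating the workers is likely to dominate.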