In the June 2, 1999 issue of The Journal of the American Medical Association, Dr Douglas S. King and colleagues published an article [1] reporting the results of a clinical investigation of the effects of androstenedione (AD), a widely used sports supplement introduced by Patrick Arnold of LPJ Research. The popular media immediately reported the article's conclusions to their readers and viewers, proclaiming that AD is proven to be ineffective and dangerous. The truth of the matter is different: what is being widely claimed about this study is in many cases entirely false. Here we will examine the study closely and find that the popular reports of its results and conclusions are at best erroneous.
Purpose of the Study
The reported purpose was to determine if AD actually increases blood testosterone levels, muscle fiber size, or strength, and to determine effects on blood lipids and on indicators of liver function.
Results of the Study
After subjects took 100 mg of AD, testosterone levels at nine points between 1 and 5 hours were observed to increase about 20% (value estimated from their graph) compared to baseline and the 30-minute measurement. Compared with the placebo group, the AD group started with levels about 20% lower; then, for nine consecutive measurements between 1 and 5 hours, its levels were in every case higher than those of the placebo group, usually by about 10%.
This information was not presented in popular accounts of this study and was glossed over within the JAMA article. The authors chose not to give numerical results, providing only a graph with a statement that the change was not “significant.”
In the longer-term study (8 weeks) the AD group suffered no decreases in LH or FSH.
The AD group suffered no adverse physical effects.
No adverse changes in markers of liver function were seen from AD.
A fairly small worsening (decrease) of HDL cholesterol from an average 1.09 mmol/L to an average 0.96 mmol/L was seen in the AD group.
In 6 weeks of AD usage, the AD group lost almost 5 lb. of fat on average without dieting while the placebo group lost less than 2 lb. of fat, again without dieting. This result was also deemed not significant.
The AD group increased strength by an average of 30% over 8 weeks while the placebo group increased strength by an average of about 31%. The two groups had similar increases in size of muscle fibers and changes in body circumferences.
Both groups gained about 6 lb. of lean mass on average. The AD group made the same gain in lean mass as the placebo group while losing 3 lb. more fat.
In the longer-term study of 8 weeks, increases in levels of estradiol and estrone in the AD group were about 30-40% compared to their baseline values.
Conclusions Made by Authors
The authors concluded somehow from their data, “Androstenedione supplementation does not increase serum testosterone concentrations or enhance skeletal muscle adaptations to resistance training in normotestosterogenic young men [young men with normal testosterone] and may result in adverse health consequences.”
I know it may seem rather confusing that they concluded this from the above results, but that is exactly what they did. And it is their conclusions that are being trumpeted in the popular media, not their actual findings.
Methods of the Study
Thirty young men between the ages of 19 and 29 were recruited. These subjects had been performing no weight training and had not been using AD or any other nutritional supplements.
The first experiment was a study to determine the short-term (acute) effect of AD. Ten of the subjects were chosen for this study; five received AD and five received placebo. Blood testosterone levels were determined before taking AD or placebo and then every 30 minutes for the next six hours. This experiment was repeated a week later.
For the longer term experiment over 8 weeks to study effects on resistance training and effect on testosterone, the 20 subjects were divided randomly into a group of 10 which would receive AD and a group of 10 that would receive placebo. One of the members of the AD group was found to have a health problem and therefore was removed from the study, giving a size of n=9 for the AD group.
The study was performed double-blind and neither the authors nor anyone involved in the study knew who was in which group until the experiments were concluded and statistical analysis began.
During the 8 week experiment, the subjects receiving AD took 100 mg three times per day at 9 AM, 3 PM, and at bedtime. Subjects received AD or placebo only during weeks 1 and 2, 4 and 5, and 7 and 8. This was done to follow the cycling recommendations of the manufacturer.
Resistance training consisted of training three days per week on 10 exercises under supervision of one of the scientists. Maximal strength (one rep maximum, or 1RM) was determined at weeks 4 and 8 on each of eight exercises (for some unstated reason, leg press and calf press were not tested for strength).
Body composition was determined by hydrostatic (underwater) weighing.
Diet was not controlled. The subjects were asked to eat in the same manner as they normally did, and after the study was over, all said that they did so.
Measurements were taken of various body circumferences such as biceps, chest, waist, etc., and biopsies were taken of extremely small samples of muscle to measure changes in fiber size.
The Meaning of Statistical Significance
Before analyzing and criticizing the methods used, we need first to consider the question of statistical significance. Unfortunately there will be a little bit of work to this and few will consider the next page to be fun. It is very important though and necessary to understanding the JAMA study, as well as many other scientific or medical studies.
When dealing with small numbers of subjects, chance becomes very important and must be considered.
Let’s say for example that usually 1/4 of men are fast gainers, 1/2 are average, and 1/4 are slow gainers. While no actual numbers exist, let’s say that for a given program using beginners, fast gainers will add 15 lb. of muscle in 8 weeks, average gainers will add 10 lb., and slow gainers will add 5 lb.
How do we describe this variation? We can take an average of everybody, and in this case it happens that the average is 10 lb. of muscle gain. But some gain more, some gain less. There is a statistical formula, which we need not know, that lets us calculate from the data for each person a number representing the variability, called the standard deviation. About 2/3 of people will be within one standard deviation of the average, and about 95% will be within two standard deviations of the average. In our example, the standard deviation is 4.3 lb. So it is quite common that one person will gain as much as 4.3 lb. more or less than average, but quite uncommon for someone to be different from the average by more than twice that (8.6 lb.). This is just to give you an idea of what a standard deviation is.
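The "about 2/3" and "about 95%" figures are the usual rules of thumb for a bell-shaped (normal) distribution. As a minimal sketch, assuming gains are normally distributed (an assumption made here for illustration; the three-level fast/average/slow example is only roughly bell-shaped), the fractions can be checked with Python's standard library. The function name `fraction_within` is made up for this example.

```python
from statistics import NormalDist

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

print(f"within 1 SD: {fraction_within(1):.1%}")  # roughly two thirds
print(f"within 2 SD: {fraction_within(2):.1%}")  # roughly 95%
```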
Hang on, it gets easier!
So let’s say that we do a study with 10 people in each group to find out whether Supplement X gives better gains than placebo. When we are done, the X group gained a little more on average than the placebo group.
We need to know if it’s reasonable to believe that could have just happened by chance. If so then we do not know whether X works or not.
A statistician can take the standard deviations of each group, the number of people in each group, and the difference in the average gains, and tell you how likely it is that chance could have given that difference. If his math says there’s less than a 5 percent chance of the difference happening by luck, this is called a “significant” increase.
This is very important: “Significant” has nothing to do with whether a change is important or how large the measured change is. “Significant” only means unlikely to happen by chance. A large measured increase that would be of life-changing importance if true might not be a significant increase, while a tiny measured increase that no one could ever care about might be a significant increase.
It is very easy to be deceived on this and to incorrectly assume that “significant” means substantial or important and “not significant” means very small and unimportant. Not at all. “Significant” is a technical word that is very easily misunderstood because the scientific usage is contrary to the common one.
So let’s say the Supplement X group on average gained one more lb. of muscle after 8 weeks. Is that significant? Again, the answer has nothing to do with whether one lb. is a worthwhile amount or not. In this particular study, 1 lb. is not significant: this could easily happen by chance. All it would take would be for the X group to have gotten one extra fast gainer in place of a slow gainer, and that can easily be a result of luck.
What do we conclude then? That X did not give any gains? Definitely we cannot conclude this. Certainly it is possible that it gives 1 lb. of gains: we observed that much change, after all, and it might be that chance did not work in favor of the Supplement X group. It’s even possible that X might cause more gains than we observed, but it suffered from bad luck in this study, with its group getting more than its share of slow gainers. So we do not know. The 1 lb. observed change is not significant because we cannot be confident that it did not happen by chance.
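How easily luck alone produces a 1 lb. difference can be checked with a small simulation. The sketch below assumes the hypothetical population from the example above (1/4 fast gainers at 15 lb., 1/2 average at 10 lb., 1/4 slow at 5 lb.) and a Supplement X that does nothing at all, so both groups are drawn from the same population. The function name, trial count, and seed are choices made for this illustration.

```python
import random

# Hypothetical population from the example: 1/4 fast, 1/2 average, 1/4 slow gainers.
GAINS_LB = [15, 10, 10, 5]

def simulate_chance_differences(trials=20_000, n_per_group=10, seed=1):
    """Estimate how often two groups drawn from the SAME population
    differ in average gain by 1 lb. or more, purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        group_x = [rng.choice(GAINS_LB) for _ in range(n_per_group)]
        placebo = [rng.choice(GAINS_LB) for _ in range(n_per_group)]
        diff = sum(group_x) / n_per_group - sum(placebo) / n_per_group
        if abs(diff) >= 1.0:
            hits += 1
    return hits / trials

print(f"chance of a 1 lb. or larger difference: {simulate_chance_differences():.0%}")
```

Under these assumptions, well over half of the simulated studies show a difference of at least 1 lb. in one direction or the other even though the supplement has no effect, which is why such a difference cannot be called significant.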
How big a change would have been needed for the statistics to say the difference is “significant”?
With subjects as variable as this, and only 10 in each group, the answer is about 4 lb. Any measured increase (caused by Supplement X or not) of less than 4 lb. will not be found significant by this study. Only observed increases of more than 4 lb. would be deemed significant.
So if we don’t care about gains of less than 4 lb. anyway, then “no significant increase” in the scientific sense matches up well with practical application. But if we do consider a 1 to 3 lb. change important, then the study is not sensitive enough and is inadequate.
To be able to find smaller observed changes such as 1 lb. to be significant, we would need either less variability among the subjects, or more subjects, so luck could not have as much effect. With the same variability, we would need 120 subjects per group if we wanted to be able to find 1 lb. to be significant. Or if we must use only 10 subjects per group, we would need to reduce their standard deviation to only about 1 lb., by finding a set of people where gains would be similar for everyone unless Supplement X does something.
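The thresholds quoted above can be approximated with the standard two-sample t-test formula. This is a sketch under stated assumptions, not the calculation the author necessarily used: both groups are taken to have the same standard deviation and size, the test is two-tailed at the 5% level, and the critical values of Student's t are hard-coded from standard tables. The function and constant names are invented for this example.

```python
from math import sqrt

def min_significant_difference(sd: float, n_per_group: int, t_crit: float) -> float:
    """Smallest difference between two group averages that a two-sample
    t-test would call significant, given equal SDs and equal group sizes."""
    standard_error = sd * sqrt(2.0 / n_per_group)  # SE of the difference in means
    return t_crit * standard_error

# Two-tailed 5% critical values of Student's t, taken from standard tables.
T_CRIT_DF18 = 2.101   # 10 + 10 - 2 degrees of freedom
T_CRIT_DF238 = 1.970  # 120 + 120 - 2 degrees of freedom

print(f"SD 4.3 lb., 10 per group:  {min_significant_difference(4.3, 10, T_CRIT_DF18):.1f} lb.")
print(f"SD 4.3 lb., 120 per group: {min_significant_difference(4.3, 120, T_CRIT_DF238):.1f} lb.")
print(f"SD 1.0 lb., 10 per group:  {min_significant_difference(1.0, 10, T_CRIT_DF18):.1f} lb.")
```

With these inputs the three cases come out at roughly 4 lb., 1 lb., and 1 lb., in line with the figures in the text.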
Now back to the JAMA study.
Criticism of Methods
Consultation with any expert would have resulted in these investigators finding out from the very beginning that their study could not be expected to discover any effects of AD unless those effects were quite large, and larger than anyone really was expecting. Having this information and being advised of how to overcome the problems, perhaps a useful study could have been performed, and we would have fairly accurate and precise measurements of the effects of AD. That, however, was not to be.
What are the problems with the methods?
Not only was the number of subjects small, but the variability of the subjects was (and could have been predicted to be) very high. As we saw in the above discussion, when both of these are the case, only rather large effects can be found significant, and moderate effects that might well be very valuable simply cannot be distinguished from the other variation.
When the number of subjects is small, assignment to groups should not simply be random. If this is done, then it is quite possible (as happened in this study) that one group will consist of mostly fatter people than the other, or weaker people, or people with higher starting levels of testosterone, etc. Instead what should be done is that subjects should be paired, finding the most similar matches for each person, and then one member of the pair is assigned randomly to one of the two groups, and the other member to the other group. This results in groups that are much more similar to each other. This is standard procedure in the pharmaceutical industry for small-scale trials.
Since the JAMA study failed to do this, their results are open to question. Would perhaps the AD group have done even better if they hadn’t been the fatter and weaker group to start out with? Or, where the AD group outperformed the placebo group, was their initial weakness and fatness the sole cause of whatever difference was seen between the groups? We don’t know. Obviously, this was a flaw in the study. Even if the authors did not wish to use the paired method, their statistician should have warned them that the randomization process yielded, by bad luck, groups substantially and significantly different from each other, and advised that the subjects should be re-randomized to yield groups that were not different from each other.
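The paired assignment described above can be sketched in a few lines. This is an illustration only: it matches on a single baseline measure (starting body-fat percentage), whereas a real trial would likely match on several, and the subject data are invented for the example rather than taken from the study. The function name `matched_pair_assignment` is likewise hypothetical.

```python
import random
from statistics import mean

def matched_pair_assignment(subjects, key, seed=None):
    """Split subjects into two groups by matched-pair randomization.

    Subjects are ranked on one baseline measure, adjacent subjects are paired,
    and a random shuffle decides which member of each pair goes to which group.
    """
    rng = random.Random(seed)
    ranked = sorted(subjects, key=key)
    group_a, group_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random order within the pair
        group_a.append(pair[0])
        group_b.append(pair[1])
    if len(ranked) % 2:  # odd subject left over: assign at random
        rng.choice([group_a, group_b]).append(ranked[-1])
    return group_a, group_b

# Hypothetical subjects: (id, starting body-fat %). NOT data from the study.
SUBJECTS = list(enumerate(
    [14.0, 27.5, 19.0, 23.0, 31.0, 16.5, 21.0, 25.0, 18.0, 29.0,
     22.0, 15.5, 26.0, 20.0, 24.0, 17.0, 28.0, 30.0, 19.5, 23.5]))

ad_group, placebo_group = matched_pair_assignment(SUBJECTS, key=lambda s: s[1], seed=42)
print("mean body fat, AD group:     ", mean(s[1] for s in ad_group))
print("mean body fat, placebo group:", mean(s[1] for s in placebo_group))
```

Because each pair contributes one member to each group, the two group averages can differ by no more than the average gap within pairs, which keeps the groups close on the matched measure however the random draws fall.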
Untrained individuals were used as subjects. If the authors had selected trained athletes, drawing perhaps from the various athletic teams of their university, they would have been able to find subjects who had reached near steady-state results from their training, and who therefore would not show much variability in their gains. With these subjects, even small effects of AD might be very apparent. Untrained subjects, however, can make rapid improvements independent of whether any supplements are used, and with very high variability.
An athlete, say a 3rd or 4th year college player who does not use drugs, will generally not suddenly add 5 or 10 lb. of muscle while continuing his regular training program. If he is lucky he may have gained 5 lb. over the last year and will gain 5 lb. during the next year. Perhaps over 8 weeks a few of them will add a pound or two of lean mass, and some will lose a pound or two, but the variability will be much lower than with untrained individuals. It is very possible for one person to gain 10 lb. more than another person over 8 weeks if both are new to training. Choice of such subjects guaranteed that small effects of AD which might be of great relevance to athletes would be deemed statistically not significant – not reliably distinguished from chance – by this study.
Overfat individuals were used as subjects. Starting bodyfat percentages were over 23% for the AD group and over 21% for the placebo group. This might have affected conversion to estrogen [2], because much aromatization occurs in fat tissue, and in any case such subjects are hardly representative of athletes.
Subjects were on a diet atypical of athletes, receiving only 85-98 grams of protein per day. This is shown in various studies to be inadequate for individuals undergoing resistance training [3,4,5]. It cannot be ruled out that the reason the AD and placebo groups gained the same amounts is that protein consumption was the limiting factor in each case.
Failure to control diet essentially guaranteed inconclusive results with respect to fat loss.
In short, the methods used were such that it could not have been rationally expected that possibly important effects of androstenedione (such as perhaps raising testosterone levels by 20% or 30% during a window of a few hours, or improving ability to lose fat by say 1/4 or 1/2 lb. per week, or improving strength of a trained athlete by 5%) could be detected. If they did exist they either were not detected, or if detected were deemed not significant because they could have occurred by chance. In other words, the study was born to fail.
Valid Conclusions to be Drawn from the Data
While the study could not determine whether small increases such as 20-30% arise from the supplement or from chance, it does have some limit of detection, though this is not reported and I cannot compute it because the authors fail to provide the necessary data. It is obviously rather high, though. The data, though variable, seems precise enough to let us say it is proven that testosterone levels are not increased by 200%, or perhaps even that testosterone is not increased by 100%. The authors did not perform the calculations needed to determine exactly what limit might be set, and instead conclude that there is “no effect.”
Work done elsewhere proving with 95% statistical confidence that an increase does occur [6] is not disproven by the JAMA study. King et al. cannot state with even 50% confidence from their data that there was no increase (since after all they did observe an increase). One simply cannot claim, as these authors did, that one’s observations of an increase of about 30% prove that there is no effect!
Nothing can be concluded concerning fat loss because the data is too variable. AD may be beneficial or it may not. It may be concluded that no adverse effects occurred on markers of liver function or on LH and FSH levels. It may also be concluded that the AD group suffered a small worsening of HDL cholesterol while the placebo group saw a small improvement.
The data of the JAMA study on effects of androstenedione does not support the principal conclusions of the authors, which are an exercise in illogic. While in medicine it is the general practice to assume that something unknown does not work and the burden of proof is to show that it does work, nonetheless it is logically erroneous to claim that failure to prove effect (because of limitations of your method) proves that there is no effect. When, as is the case with AD, others have proven effect, one should not try to make a case with data that proves nothing one way or the other because of statistical uncertainty. I can only conclude that, whether among the authors or the reviewers, a strong bias must have existed against androstenedione.
Although the efficacy, if any, of androstenedione clearly cannot be compared to anabolic/androgenic steroids, the inability of this study to detect effects even when it observes them, and the willingness of JAMA to publish such a study, is rather reminiscent of JAMA articles of the past announcing that anabolic steroids are not shown to improve athletic performance.
There are, after all, none so blind as those who will not see.
Special thanks are due to Dr Karlis Ullis, Dr Tim Ziegenfuss, Dr Richard Cohen, and Will Brink. These gentlemen and I collaborated, with Will Brink organizing the affair, in producing a letter to JAMA in rebuttal of the King study. No doubt various ideas and points of view written here originate from their contributions.
1. King DS, Sharp RL, Vukovich MD, et al. Effect of oral androstenedione on serum testosterone and adaptations to resistance training in young men. JAMA. 1999;281:2020-2028.
2. Kley HK, Deselaers T, Peerenboom H, Kruskemper HL. Enhanced conversion of androstenedione to estrogens in obese males. J Clin Endocrinol Metab. 1980;51(5):1128-32.
3. Lemon PW. Is increased dietary protein necessary or beneficial for individuals with a physically active life style? Nutr Rev. 1996;54:S169.
4. Lemon PW. Do athletes need more dietary protein and amino acids? Int J Sport Nutr. 1995;S39-61.
5. Tarnopolsky MA. Evaluation of protein requirements for trained strength athletes. J Appl Physiol. 1992;73(5):1986-1995.
6. Earnest CP, Olson MA, Beckham SG, et al. Oral 4-androstene-3,17-dione and 4-androstene-3,17-diol supplementation in young males. JPEN. 1999;23(1):S16.