ML practitioners want to minimize the loss achievable on a given compute budget, i.e. find the minimum of L(N). If we decrease C until the minimum of L(N) coincides with y = 2.0025 (the loss the Hoffmann scaling law predicts for GPT-3), the resulting value of C is approximately 1.05E+23 FLOP, and the value of N at that minimum is approximately 15E+9 parameters. In turn, using the approximation C = 6*N*D, the resulting value of D is 1.05E+23 / (6 * 15E+9) ~= 1.17E+12 tokens.
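The arithmetic above can be checked with a short sketch; the helper name `tokens_from_compute` is ours, and the constants are the ones quoted in the text (the standard C ~= 6*N*D approximation for transformer training compute):

```python
def tokens_from_compute(compute_flop: float, n_params: float) -> float:
    """Invert the C ~= 6*N*D approximation to get the token count D."""
    return compute_flop / (6 * n_params)

C = 1.05e23   # compute budget in FLOP, from the text
N = 15e9      # parameters at the minimum of L(N), from the text

D = tokens_from_compute(C, N)
print(f"D ~= {D:.3e} tokens")  # roughly 1.17e12
```

This just confirms that the quoted token count follows from the quoted C and N under the 6*N*D rule of thumb.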