"etc." is the most important part here. There is NL SFT and code SFT data which guessing by the names are instruction data very likely from GPT-4. It is known in finetuning community that training with GPT-4 data is the easiest way of improving the model. If that's the case base JetMoE should be compared to finetuned llama, not base llama.
I guess it's good that they mentioned some of it, but yeah, that isn't especially helpful when you're claiming it's 100% open source.
I'm not sure why they feel the need to be so secretive if all of the sources really are open.