Good point about the MLP architectures. I actually had a section on TSMixer prior to publishing but then removed it, as the article was already long and I was focusing more on papers out of the big three conferences (NIPS, ICML, and ICLR). I do like the relative simplicity of the architecture compared to transformers, though, and I've already added it to FF. I look forward to testing it out on my own datasets, and maybe TiDE as well.
The Google article sounds interesting, though I wasn't able to tell whether they tested it on multivariate time series data or just univariate (I only skimmed it quickly). If they did evaluate on multivariate data, I'd be curious how they handle the problem of datasets with different numbers of covariates.
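For what it's worth, the simplest workaround I can imagine is zero-padding every dataset's covariates up to a fixed maximum width and carrying a mask so the model can ignore the padded channels. Purely a hypothetical sketch in PyTorch (the pad_covariates helper and max_covariates width are my own invention, not anything from the paper):

```python
import torch

def pad_covariates(x: torch.Tensor, max_covariates: int):
    """Zero-pad the covariate dimension of a (batch, time, n_cov) tensor
    to a fixed width, returning a boolean mask marking the real channels
    so downstream layers can ignore the padding."""
    batch, time, n_cov = x.shape
    padded = torch.zeros(batch, time, max_covariates,
                         dtype=x.dtype, device=x.device)
    padded[..., :n_cov] = x
    mask = torch.zeros(max_covariates, dtype=torch.bool, device=x.device)
    mask[:n_cov] = True
    return padded, mask

# Two datasets with 3 and 7 covariates, both mapped to a shared width of 8
a, mask_a = pad_covariates(torch.randn(32, 96, 3), max_covariates=8)
b, mask_b = pad_covariates(torch.randn(32, 96, 7), max_covariates=8)
print(a.shape, b.shape)  # torch.Size([32, 96, 8]) torch.Size([32, 96, 8])
```

Of course that caps the covariate count at whatever width you pre-train with; a learned per-dataset projection into a shared latent space would be more flexible, but either way it's a nontrivial design choice.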
I can see pre-training potentially working well on univariate data, but with multivariate data it's much harder for me to imagine a scenario where it works well.