Benchmarking PyCaret AutoML Against IndoBERT Fine-Tuning for Sentiment Analysis on Indonesian IKN Twitter Data

A comparative study reports that transformer-based fine-tuning substantially outperforms classical AutoML on Indonesian-language sentiment tasks, with IndoBERT reaching 89.6% accuracy versus Logistic Regression's 77.6%. The 12-percentage-point gap underscores a persistent pattern across non-English NLP: pretrained language models dominate narrow, domain-specific classification even on modest datasets. For practitioners deploying sentiment systems in underrepresented languages, the finding reinforces that transfer learning from pretrained checkpoints now sets the baseline, making classical pipelines largely obsolete for text understanding.
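To put the two reported accuracies in perspective, the short sketch below works out the absolute gap and the relative reduction in error rate. The helper name `accuracy_gap` and its return keys are illustrative assumptions; only the 89.6% and 77.6% figures come from the summary above.

```python
def accuracy_gap(acc_a: float, acc_b: float) -> dict:
    """Compare two accuracies given in percent (acc_a is the stronger model)."""
    err_a = 100.0 - acc_a  # error rate of the stronger model, in percent
    err_b = 100.0 - acc_b  # error rate of the weaker model, in percent
    return {
        "gap_points": round(acc_a - acc_b, 1),
        "error_a": round(err_a, 1),
        "error_b": round(err_b, 1),
        # Fraction of the weaker model's errors that the stronger model removes.
        "relative_error_reduction": (err_b - err_a) / err_b,
    }


# Reported figures: IndoBERT 89.6%, Logistic Regression 77.6%.
result = accuracy_gap(89.6, 77.6)
print(result)
```

Read this way, the error rate falls from 22.4% to 10.4%, so the fine-tuned model removes a little over half of the classical model's errors. Whether that difference is statistically robust depends on the test-set size, which the summary does not state.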
Modelwire context
Analyst take: The study's real contribution is not the accuracy gap itself, which was already predictable, but the choice of domain: IKN (Ibu Kota Nusantara) Twitter data ties model evaluation directly to a high-stakes political and infrastructure context, suggesting Indonesian NLP benchmarking is maturing from generic corpora toward policy-relevant datasets.
This paper lands on the same day as 'Benchmarking Logistic Regression, SVM, and LightGBM Against BiLSTM with Attention for Sentiment Analysis on Indonesian Product Reviews,' and the two studies reach opposite conclusions about classical ML viability, which is the more important signal. Together they suggest the answer is domain-dependent: classical pipelines hold up on high-volume e-commerce data but fall apart on shorter, noisier political social media text where pretrained contextual representations carry more weight. Practitioners choosing architectures for Indonesian-language deployments get a clearer decision rule, but not a universal one: the right choice depends heavily on corpus characteristics.
Watch whether the indobenchmark/indobert-base-p1 checkpoint gets tested against newer multilingual models like mDeBERTa or SeaLLM on the same IKN dataset within the next six months. If the gap narrows significantly, the 12-point advantage attributed to IndoBERT may reflect checkpoint recency more than architectural fit.
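A like-for-like comparison of this kind only means something if every candidate is scored on the same held-out split. The sketch below shows one way to structure that, under stated assumptions: the names `Predictor`, `accuracy`, and `compare_on_same_split` are hypothetical, the four example texts are toy placeholders rather than IKN data, and the two stand-in predictors are trivial rules, not real checkpoints. In practice each predictor would wrap a fine-tuned model.

```python
from typing import Callable, Dict, List, Sequence

# A predictor maps a batch of texts to a list of predicted labels.
Predictor = Callable[[Sequence[str]], List[str]]


def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def compare_on_same_split(
    models: Dict[str, Predictor],
    texts: Sequence[str],
    labels: Sequence[str],
) -> Dict[str, float]:
    """Score every model on the identical texts and labels."""
    return {name: accuracy(predict(texts), labels) for name, predict in models.items()}


# Toy placeholders only; NOT drawn from the IKN dataset.
texts = ["bagus sekali", "buruk sekali", "sangat bagus", "tidak bagus"]
labels = ["positive", "negative", "positive", "negative"]

# Stand-in predictors; real use would wrap fine-tuned checkpoints here.
models: Dict[str, Predictor] = {
    "keyword_rule": lambda xs: ["positive" if "bagus" in x else "negative" for x in xs],
    "always_positive": lambda xs: ["positive" for _ in xs],
}

scores = compare_on_same_split(models, texts, labels)
print(scores)
```

Keeping the split fixed and swapping only the predictor is what lets a narrowed or widened gap be attributed to the model rather than to the data.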
This analysis is generated by Modelwire’s editorial layer from our archive and the summary above. It is not a substitute for the original reporting.
Mentions: PyCaret · IndoBERT · indobenchmark/indobert-base-p1 · Ibu Kota Nusantara
Modelwire Editorial
Modelwire summarizes, we don’t republish. The full content lives on arxiv.org.