Author: Cakmak, Muhammet
Record dates: 2026-04-25; 2026-04-25
Issue date: 2025
ISSN: 2169-3536
DOI: https://doi.org/10.1109/ACCESS.2025.3567774
Handle: https://hdl.handle.net/11486/8441

Abstract: In recent years, Vision Transformers (ViTs) have gained prominence as a highly effective method for image classification, often outperforming traditional Convolutional Neural Networks (CNNs). However, their relatively slow processing speed limits their practical use, particularly in real-time applications. Conversely, CNN-based transfer learning models provide faster inference but may struggle with classification accuracy on complex datasets. To address these challenges, Temporal Channel Attention (TCA) modules have been introduced to optimize efficiency and performance. This study proposes a hybrid architecture combining EfficientNet, Vision Transformer, and TCA modules, integrating the accuracy of ViTs, the computational efficiency of CNNs, and the enhancement capabilities of TCA. The model is designed to classify Siirt and Kirmizi pistachio varieties with high precision. It achieves outstanding results, including 99.07% accuracy, 99.12% recall, and a Cohen's Kappa score of 98.10%. These findings highlight the model's robustness, demonstrating its ability to perform reliable classifications with minimal bias, making it well-suited for real-world applications.

Language: en
Rights: info:eu-repo/semantics/openAccess
Keywords: Accuracy; Computational modeling; Transformers; Convolutional neural networks; Random forests; Computer vision; Artificial intelligence; Machine learning; Feature extraction; Image classification; Classification; Vision Transformer; EfficientNet; TCA; Machine learning
Title: A New Lightweight Hybrid Model for Pistachio Classification Using Transformers and EfficientNet
Type: Article
Volume: 13
Pages: 85857-85872
DOI: 10.1109/ACCESS.2025.3567774
Scopus ID: 2-s2.0-105004908278
Scopus quartile: Q1
Web of Science ID: WOS:001492129400023
WoS quartile: Q2
ORCID: 0000-0002-3752-6642
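The abstract does not detail the TCA module's internals, but modules of this family typically gate CNN feature channels with a learned attention vector before the features are passed on (here, presumably to the transformer stage). The sketch below is a minimal, hypothetical illustration of such channel-attention gating in the squeeze-and-excitation style, using NumPy; the function name, shapes, and reduction ratio `r` are all assumptions, not the authors' implementation.

```python
# Hypothetical sketch of channel-attention gating, NOT the paper's TCA module.
import numpy as np

def channel_attention(feats: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Reweight the channels of a (C, H, W) feature map.

    feats : (C, H, W) CNN feature map
    w1    : (C//r, C) squeeze projection (bottleneck, reduction ratio r)
    w2    : (C, C//r) excitation projection back to C channels
    """
    # Squeeze: global average pool each channel into a scalar descriptor.
    z = feats.mean(axis=(1, 2))                  # shape (C,)
    # Excitation: bottleneck MLP (ReLU) followed by a sigmoid gate per channel.
    h = np.maximum(w1 @ z, 0.0)                  # shape (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h)))       # shape (C,), values in (0, 1)
    # Scale: broadcast the per-channel gate over all spatial positions.
    return feats * gate[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feats = rng.standard_normal((C, H, W))
out = channel_attention(feats,
                        rng.standard_normal((C // r, C)),
                        rng.standard_normal((C, C // r)))
print(out.shape)  # (8, 4, 4)
```

The gate leaves the spatial layout untouched and only rescales channels, which is why such modules add very little compute, consistent with the paper's emphasis on a lightweight design.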