General Information
    • ISSN: 1793-8201 (Print), 2972-4511 (Online)
    • Abbreviated Title: Int. J. Comput. Theory Eng.
    • Frequency: Quarterly
    • DOI: 10.7763/IJCTE
    • Editor-in-Chief: Prof. Mehmet Sahinoglu
    • Associate Editor-in-Chief: Assoc. Prof. Alberto Arteta, Assoc. Prof. Engin Maşazade
    • Managing Editor: Ms. Cecilia Xie
    • Abstracting/Indexing: Scopus (since 2022), INSPEC (IET), CNKI, Google Scholar, EBSCO, etc.
    • Average Days from Submission to Acceptance: 192 days
    • APC: 800 USD
    • E-mail: editor@ijcte.org

IJCTE 2025 Vol.17(1): 1-12
DOI: 10.7763/IJCTE.2025.V17.1363

Efficient Packet Payload Feature Extraction Using the BIGBIRD Model

Son A. Pham¹,* and Yasuhiro Nakamura²
1. Graduate School of Science and Engineering, National Defense Academy, Yokosuka, Japan
2. Department of Computer Science, School of Electrical and Computer Engineering, National Defense Academy, Yokosuka, Japan
Email: pisonnda@gmail.com (S.A.P.); yas@nda.ac.jp (Y.N.)
*Corresponding author

Manuscript received February 20, 2024; revised March 22, 2024; accepted June 17, 2024; published January 9, 2025

Abstract—Cyber-attacks on the Internet have become a major concern in recent years. Addressing these threats requires continuous monitoring and analysis of communication patterns in cyberspace. However, the large volume and diversity of incoming packets and payloads make it difficult to process them all at once, so a preliminary clustering of payloads is essential for subsequent analysis and interpretation. Previous studies have applied natural language processing models such as N-gram, Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT) to extract payload features for distinguishing between benign and malicious content. However, these models overlook the sequential order and byte-level positioning within payloads, which limits their ability to capture the intrinsic characteristics of payload content. This study introduces a novel model, built on BIGBIRD, that extracts comprehensive features of payload content by taking the ordering of byte sequences into account. Comparative experiments demonstrate that the clustering rate and clustering accuracy of the proposed method surpass those of other text feature extraction models such as N-gram, Word2Vec, and BERT when the same clustering models are used. Moreover, the practical applicability of the proposed model is validated by applying it to actual observed data. This research contributes to the field of cybersecurity and is expected to support future advancements and applications in this domain.

Keywords—clustering, feature extraction, computer network and communications, network observation
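
As a rough illustration of the limitation the abstract attributes to N-gram features, the sketch below counts overlapping byte n-grams in a payload and compares two payloads by cosine similarity. It is not the authors' method or code: the function names `byte_ngrams` and `cosine_similarity` and the toy payloads are assumptions chosen for this example. It shows that two payloads with different byte ordering can yield identical bag-of-n-gram features.

```python
import math
from collections import Counter


def byte_ngrams(payload: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams; positions within the payload are discarded."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy payloads: different byte order, same multiset of bigrams.
p1 = b"aabab"
p2 = b"abaab"

f1 = byte_ngrams(p1, n=2)
f2 = byte_ngrams(p2, n=2)

print(f1 == f2)                    # the two feature vectors are indistinguishable
print(cosine_similarity(f1, f2))   # approximately 1.0 despite p1 != p2
```

A feature extractor that encodes byte positions, as the abstract says the proposed model does, would be able to tell such payloads apart.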


Cite: Son A. Pham and Yasuhiro Nakamura, "Efficient Packet Payload Feature Extraction Using the BIGBIRD Model," International Journal of Computer Theory and Engineering, vol. 17, no. 1, pp. 1-12, 2025.

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

