KR20240091474A

KR20240091474A - IEEE 802.15.4 Unslotted CSMA/CA Reinforcement Learning-based Dynamic Backoff Parameter Selection Device and Method for Massive IoT

Info

Publication number: KR20240091474A
Application number: KR1020220174428A
Authority: KR
Inventors: 정상화; 이시현
Original assignee: 부산대학교 산학협력단
Priority date: 2022-12-14
Filing date: 2022-12-14
Publication date: 2024-06-21

Abstract

The present invention relates to a reinforcement learning-based dynamic backoff parameter selection device and method for massive IoT using IEEE 802.15.4 unslotted CSMA/CA, in which parameters are adjusted through Q-learning, a reinforcement learning technique, without additional packet load. An electronic device including at least one processor for this parameter selection comprises: a MAC layer module that requests an action value from a Q-learning module and receives it, requests packet transmission from a PHY layer module, and exchanges channel information with the PHY layer module; a Q-learning module that includes a Q-table, performs state maintenance, action selection, and reward management, and maintains the Q(s,a) values; and a PHY layer module that receives packet transmission requests from the MAC layer module and, when another packet is received, passes it up to the MAC layer module.

Description

IEEE 802.15.4 Unslotted CSMA/CA Reinforcement Learning-based Dynamic Backoff Parameter Selection Device and Method for Massive IoT

The present invention relates to the massive Internet of Things (IoT), and more specifically to a reinforcement learning-based dynamic backoff parameter selection device and method for massive IoT using IEEE 802.15.4 unslotted CSMA/CA, in which parameters are adjusted through Q-learning, a reinforcement learning technique, without additional packet load.

In the Internet of Things, various objects connect to a core network over wireless networks and exchange information.

Using various types of IoT technologies, people can remotely manage or monitor the operation of objects from a distance.

Over the past few years, IoT-based systems have spread through industry and developed into smart-city applications such as smart homes, smart transportation, and smart hospitals.

With the rapid spread of these applications, demand for data traffic on wireless networks is growing explosively. The large number of IoT devices in networks that are less sensitive to latency and have low throughput requirements but need wide coverage is collectively called massive IoT (mIoT).

Meanwhile, IoT devices deployed in smart cities and similar settings aim to form a low-power wide-area network (LPWAN) that needs little human maintenance over long periods. In response to demand for such networks, the IEEE established the IEEE 802.15.4 standard. IEEE 802.15.4 specifies a physical (PHY) layer and a MAC layer that includes a collision avoidance mechanism, and supports low-power wide-area networks.

Like IEEE 802.11 and IEEE 802.15.3, IEEE 802.15.4 uses CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance).

CSMA/CA comes in slotted and unslotted variants; the key difference is whether time is synchronized by beacons.

In massive IoT, the unslotted variant has been adopted by technologies such as Wi-SUN (Wireless Smart-metering Utility Network) and Zigbee, in part because of the overhead of the beacons needed to time-synchronize many IoT devices.

Figure 1 is a flowchart of the IEEE 802.15.4 unslotted CSMA/CA algorithm.

A node that wants to transmit a packet must first obtain access to the channel through the unslotted CSMA/CA algorithm.

Each transmitting node randomly selects a backoff value in the range [0, 2^BE-1] and delays transmission by that period. It then performs CCA (Clear Channel Assessment) to determine whether the target channel is idle.

If the channel is idle, the node is granted access and can transmit the packet. If not, it increments BE by 1, draws a new random backoff value from the wider range, and retries CCA, repeating until the maximum number of retries is reached.
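The channel access procedure described above can be sketched as follows. This is a minimal illustration rather than the standard's reference implementation: the `channel_idle` callback, the function name, and the returned tuple are hypothetical, backoff is counted in abstract unit periods instead of real time, and the default values for macMaxBE and macMaxCSMABackoffs are assumptions (only the macMinBE default of 3 is stated in this document).

```python
import random
from typing import Callable, Optional, Tuple


def unslotted_csma_ca(
    channel_idle: Callable[[], bool],      # hypothetical CCA callback: True if channel is idle
    mac_min_be: int = 3,                   # default stated in this document
    mac_max_be: int = 5,                   # assumed default
    mac_max_csma_backoffs: int = 4,        # assumed default
    rng: Optional[random.Random] = None,
) -> Tuple[bool, int, int]:
    """Return (access_granted, backoff_periods_waited, cca_attempts)."""
    rng = rng or random.Random()
    nb = 0               # number of failed CCA attempts so far
    be = mac_min_be      # current backoff exponent
    waited = 0
    while True:
        # Random delay drawn from [0, 2^BE - 1] unit backoff periods.
        waited += rng.randint(0, 2 ** be - 1)
        if channel_idle():
            return True, waited, nb + 1
        # Channel busy: widen the backoff range and try again.
        nb += 1
        be = min(be + 1, mac_max_be)
        if nb > mac_max_csma_backoffs:
            return False, waited, nb
```

Calling `unslotted_csma_ca(lambda: True)` grants access after a single CCA, while a channel that is always busy exhausts the retries and reports failure.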

Table 1 lists the algorithm parameters of unslotted CSMA/CA.

macMaxFrameRetries is the number of retransmission attempts allowed for a packet, and macMaxCSMABackoffs is the number of channel-idle checks allowed for a single packet. macMaxBE and macMinBE are the maximum and minimum values of the backoff exponent, respectively.

CSMA/CA is a contention-based protocol, and several studies have confirmed that network performance, such as packet collision probability and delay, can be improved by tuning the parameters in Table 1 appropriately.

However, most of these studies optimized performance with static parameters for a specific environment. Others obtain suitable parameters through additional packet exchange in the network or through control packets from a central controller.

Static parameter tuning has the drawback that a user must readjust the parameters whenever the network environment changes.

Approaches based on additional packet transmission have the drawback of placing extra load on the network.

Therefore, new technology is needed that can guarantee at least a certain level of performance even as contention intensifies in networks with a large number of IoT devices, such as mIoT.

Republic of Korea Patent Publication No. 10-2009-0022366
Republic of Korea Patent No. 10-0813884
Republic of Korea Patent No. 10-2308799

The present invention is intended to solve the problems of prior-art massive IoT technology. Its object is to provide a reinforcement learning-based dynamic backoff parameter selection device and method for massive IoT using IEEE 802.15.4 unslotted CSMA/CA, in which parameters are adjusted through Q-learning, a reinforcement learning technique, without additional packet load.

Another object of the present invention is to provide such a device and method that improve MAC-layer performance by adjusting the macMaxBE, macMaxCSMABackoffs, and macMinBE values when channel access contention intensifies, so that at least a certain level of performance is guaranteed even under heavy contention in networks with a large number of IoT devices.

Another object of the present invention is to provide such a device and method in which each node accumulates a reward for each macMinBE value according to whether its transmissions succeed and gradually selects higher macMinBE values as the number of contending nodes grows, so that nodes tune the parameter through their own learning and achieve good packet transmission rate and delay.

Another object of the present invention is to provide such a device and method in which each node autonomously tunes its macMinBE value from learned information, without additional overhead or manual parameter adjustment, so that performance is maintained even as contention intensifies.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above objects, the reinforcement learning-based dynamic backoff parameter selection device for massive IoT using IEEE 802.15.4 unslotted CSMA/CA according to the present invention is an electronic device including at least one processor, comprising: a MAC layer module that requests an action value from a Q-learning module and receives it, requests packet transmission from a PHY layer module, and exchanges channel information with the PHY layer module; a Q-learning module that includes a Q-table, performs state maintenance, action selection, and reward management, and maintains the Q(s,a) values; and a PHY layer module that receives packet transmission requests from the MAC layer module and, when another packet is received, passes it up to the MAC layer.

To achieve another object, the reinforcement learning-based dynamic backoff parameter selection method for massive IoT using IEEE 802.15.4 unslotted CSMA/CA according to the present invention is performed in an electronic device including at least one processor and comprises: selecting a minBE value through a channel access unit and a Q-learning module, and initializing the BE, NB, and macMaxCSMABackoffs variables; generating, through the channel access unit, a backoff delay of random duration within the backoff range [0, 2^BE-1]; exchanging, through a channel sensing unit, information on whether the channel is idle; determining whether the channel is idle and, if so, transmitting the packet through a MAC layer management unit and a packet transceiver unit; determining, through the packet transceiver unit and the MAC layer management unit, whether an ACK message has been received, and passing the result to a state maintenance unit; and updating the Q-table through the state maintenance unit and a reward management unit.
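The sequence of steps in the method can be illustrated with a short sketch. Everything named here is hypothetical: the function, its callbacks (`channel_idle`, `ack_received`), the default parameter values, and above all the reward of +1 or -1 and the simplified running-average update, which stand in for the patent's own reward and update equations (not reproduced in this text). The Q-table is laid out with actions as rows and states as columns.

```python
import random
from typing import Callable, List, Optional, Tuple


def send_one_packet(
    q_table: List[List[float]],            # q_table[action_index][state]
    state: int,
    actions: List[int],                    # candidate minBE values (hypothetical set)
    channel_idle: Callable[[], bool],      # hypothetical CCA callback
    ack_received: Callable[[], bool],      # hypothetical "transmit, then wait for ACK" callback
    mac_max_be: int = 8,                   # assumed value
    mac_max_csma_backoffs: int = 4,        # assumed value
    alpha: float = 0.1,                    # assumed learning rate
    rng: Optional[random.Random] = None,
) -> Tuple[bool, int, int]:
    """Return (delivered, chosen_min_be, total_backoff_periods)."""
    rng = rng or random.Random()
    # Select minBE from the Q-table, then initialise BE and NB.
    a = max(range(len(actions)), key=lambda i: q_table[i][state])
    be, nb, total_backoff = actions[a], 0, 0
    delivered = False
    while nb <= mac_max_csma_backoffs:
        # Random backoff within [0, 2^BE - 1] (counted, not slept).
        total_backoff += rng.randint(0, 2 ** be - 1)
        if channel_idle():
            # Channel idle: transmit and check for the ACK.
            delivered = ack_received()
            break
        nb += 1
        be = min(be + 1, mac_max_be)
    # Placeholder reward; NOT the patent's reward equations.
    reward = 1.0 if delivered else -1.0
    # Simplified Q-table update toward the observed reward.
    q_table[a][state] += alpha * (reward - q_table[a][state])
    return delivered, actions[a], total_backoff
```

Separating the channel and ACK checks into callbacks keeps the sketch independent of any particular radio or simulator.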

The reinforcement learning-based dynamic backoff parameter selection device and method for massive IoT using IEEE 802.15.4 unslotted CSMA/CA according to the present invention, as described above, have the following effects.

First, parameters can be adjusted through Q-learning, a reinforcement learning technique, without additional packet load.

Second, when channel access contention intensifies, MAC-layer performance is improved by adjusting the macMaxBE, macMaxCSMABackoffs, and macMinBE values, so that at least a certain level of performance is guaranteed even under heavy contention in networks with a large number of IoT devices.

Third, each node accumulates a reward for each macMinBE value according to whether its transmissions succeed and gradually selects higher macMinBE values as the number of contending nodes grows, so that nodes tune the parameter through their own learning and achieve good packet transmission rate and delay.

Fourth, each node autonomously tunes its macMinBE value from learned information, without additional overhead or manual parameter adjustment, so that performance is maintained even as contention intensifies.

Figure 1 is a flowchart of the IEEE 802.15.4 unslotted CSMA/CA algorithm.
Figure 2 is a configuration diagram of the reinforcement learning model.
Figure 3 is a graph of packet transmission rate versus macMinBE value and number of contending nodes.
Figure 4 is a configuration diagram of the reinforcement learning-based dynamic backoff parameter selection device according to the present invention.
Figure 5 is a flowchart of the reinforcement learning-based dynamic backoff parameter selection method according to the present invention.
Figure 6 is a graph of packet transmission rate versus number of contending nodes.
Figure 7 is a graph of delay versus number of contending nodes.
Figure 8 is a graph of the MinBE selection ratio versus number of contending nodes according to the present invention.

Hereinafter, preferred embodiments of the reinforcement learning-based dynamic backoff parameter selection device and method for massive IoT using IEEE 802.15.4 unslotted CSMA/CA according to the present invention are described in detail.

The features and advantages of the device and method according to the present invention will become apparent from the detailed description of each embodiment below.

Figure 2 is a configuration diagram of the reinforcement learning model.

The terms used in this disclosure are, as far as possible, general terms in wide current use, chosen with regard to their function in this disclosure; they may nevertheless vary with the intent of those skilled in the art, precedent, or the emergence of new technology. In certain cases the applicant has chosen a term arbitrarily, and its meaning is then set out in detail in the relevant part of the description. The terms used in this disclosure should therefore be defined by their meaning and the overall content of this disclosure rather than by their names alone.

Throughout the specification, when a part is said to "include" an element, this means that it may further include other elements, not that it excludes them, unless specifically stated otherwise. Terms such as "... unit" and "module" refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of the two.

In particular, units that process at least one function or operation may be implemented as an electronic device including at least one processor, and at least one peripheral device may be connected to the electronic device depending on how the function or operation is processed. Peripheral devices may include data input devices, data output devices, and data storage devices.

The reinforcement learning-based dynamic backoff parameter selection device and method for massive IoT using IEEE 802.15.4 unslotted CSMA/CA according to the present invention adjust parameters through Q-learning, a reinforcement learning technique, without additional packet load.

To this end, the present invention may include a configuration that improves MAC-layer performance by adjusting the macMaxBE, macMaxCSMABackoffs, and macMinBE values when channel access contention intensifies, so that at least a certain level of performance is guaranteed even under heavy contention in networks with a large number of IoT devices.

To achieve good packet transmission rate and delay, the present invention may include a configuration in which each node accumulates a reward for each macMinBE value according to whether its transmissions succeed and gradually selects higher macMinBE values as the number of contending nodes grows, so that the node tunes the parameter through its own learning.

The present invention may include a configuration in which each node autonomously tunes its macMinBE value from learned information, without additional overhead or manual parameter adjustment, so that performance is maintained even as contention intensifies.

In the description below, macMaxFrameRetries is the number of retransmission attempts allowed for a packet, and macMaxCSMABackoffs is the number of channel-idle checks allowed for a single packet. macMaxBE and macMinBE are the maximum and minimum values of the backoff exponent, respectively.

Reinforcement learning, shown in Figure 2, is a machine learning approach that combines optimization based on the Markov decision process (MDP) with the trial-and-error concept from animal psychology, and it is widely researched and developed for solving system optimization problems.

Reinforcement learning is a system control method in which an agent, interacting with a simulation or system environment that holds all of the system's environment information, builds a reward function from data derived from the environment and improves it iteratively to reach the optimal goal.

To do so, the agent must go through a connected process of observing changes in environment state, controlling its actions, designing the system reward function, improving its policy, and deriving an optimization model. Good learning results require the state definition, action selection, reward function, and policy design to work well together.

Reinforcement learning suits wireless networks in particular because it can learn from the network's changing conditions. Its elements are the policy, reward, value function, environment, and agent. The policy defines how the agent's actions are chosen, and the reward is the environment's evaluation of the agent's action. The value function is the discounted sum of the rewards the agent can expect to receive from a given state onward. The policy is generally chosen to maximize this value function.

Q-learning, used in the present invention, is an off-policy algorithm based on the temporal-difference (TD) model.

Whereas the value function V(s) stores rewards for the agent's state s only, Q(s,a) stores rewards that reflect both the agent's state and its action.

The Q(s,a) value is updated by Equation 1 and Equation 2, and the resulting Q-table over (s,a) yields a policy that selects the action maximizing the reward.
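For orientation, the textbook Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') − Q(s,a)], can be sketched as follows. This is the standard form and is offered as an assumption; Equations 1 and 2 of the patent are not reproduced in this text and may differ. The function name, the learning rate `alpha`, and the discount factor `gamma` are illustrative, and the table is indexed as `q[action][state]`.

```python
from typing import List


def q_update(
    q: List[List[float]],   # q[action][state]
    state: int,
    action: int,
    reward: float,
    next_state: int,
    alpha: float = 0.1,     # assumed learning rate
    gamma: float = 0.9,     # assumed discount factor
) -> float:
    """Apply one textbook Q-learning update in place and return the new value."""
    # Best value attainable from the next state, over all actions (off-policy target).
    best_next = max(q[a][next_state] for a in range(len(q)))
    # Temporal-difference error between the target and the current estimate.
    td_error = reward + gamma * best_next - q[action][state]
    q[action][state] += alpha * td_error
    return q[action][state]
```

The update is off-policy because the target uses the maximum over next actions, regardless of which action the agent actually takes next.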

Because unslotted CSMA/CA is a contention-based channel access protocol, the more nodes try to access the channel, the higher the probability that the channel is occupied and, even when a packet is transmitted, the higher the probability that it collides.

In a network with a large number of IoT devices, such as mIoT, at least a certain level of performance must therefore be guaranteed even as contention intensifies.

The parameters given in IEEE 802.15.4, as in Table 1, have default values, but these defaults do not account for intensified channel access contention.

The present invention aims to improve MAC-layer performance by adjusting the macMaxBE, macMaxCSMABackoffs, and macMinBE values.

The four main elements of a reinforcement learning design are the agent, state, action, and reward shown in Figure 2. The agent is the entity that learns, and the state is the environment as the agent sees it, that is, what the agent observes before acting. The action is what the agent does in the current state, and the reward is the environment's evaluation of the action taken in that state.

In the present invention, the agents are the nodes of the network, the state is the packet transmission rate, the action is selecting the macMinBE value of Table 1, and the reward depends on whether an ACK message is received after a packet is transmitted. The ACK message is sent by the parent node when it has successfully received the packet. The state, action, and reward are detailed below.

The state in the present invention is given by Equation 3.

Equation 3 gives the packet transmission rate, which takes integer values from 0 to 20, so there are 21 states. That is, each node, as the learning agent, observes its current packet transmission rate and acts on it.

The action in the present invention is selecting the macMinBE value.

The macMinBE value is the value assigned to BE (the backoff exponent) when the CSMA/CA algorithm starts. For example, when macMinBE is 3, the default value in Table 1, BE is initialized to 3 when the CSMA/CA algorithm runs, and the backoff time is chosen at random within the range [0, 2³-1].

Figure 3 is a graph of packet transmission rate versus macMinBE value and number of contending nodes.

As Figure 3 shows, a higher macMinBE value is advantageous as the number of contending nodes increases. With more contending nodes, starting from a wider backoff range gives the nodes more backoff values to choose from.

For example, with 50 nodes, the packet collision probability is lower when the 50 nodes choose among the 256 backoff values in the range [0, 2⁸-1] than when they choose among the 8 values in the range [0, 2³-1].
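The 50-node example can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that all nodes draw their backoff value at the same instant, independently and uniformly from [0, 2^BE-1], and it ignores CCA (which in practice prevents some of these overlaps). Under that simplification it computes the probability that a given node's backoff value is chosen by no other node. The function name is hypothetical.

```python
def p_backoff_unshared(num_nodes: int, be: int) -> float:
    """Probability that no other node draws the same backoff value as a given node.

    Simplifying assumption: every node draws once, independently and uniformly,
    from the 2^BE values in [0, 2^BE - 1].
    """
    window = 2 ** be
    # Each of the other (num_nodes - 1) nodes misses our value with probability 1 - 1/window.
    return (1 - 1 / window) ** (num_nodes - 1)


p_be3 = p_backoff_unshared(50, 3)   # 8 candidate backoff values
p_be8 = p_backoff_unshared(50, 8)   # 256 candidate backoff values
print(f"BE=3: {p_be3:.4f}, BE=8: {p_be8:.4f}")
```

With 50 nodes, the chance of holding an unshared backoff value is well under 1% for BE = 3 and above 80% for BE = 8 under these assumptions, which is the intuition behind starting from a wider range when contention is high.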

The action selected is the one with the largest value in the Q-learning Q-value array.

The Q-value array accumulates reward values with actions as rows and states as columns.

Since the state in the present invention is each node's packet transmission rate and the action is the macMinBE value to which BE is initialized at the start, each element of the Q-value table is the accumulated reward from applying a given macMinBE value at a given packet transmission rate.

Each node therefore takes the action with the highest accumulated reward at its current packet transmission rate.
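The greedy selection described above can be sketched as follows, with actions as rows and the 21 states as columns. The candidate macMinBE set `ACTIONS`, the zero initialization, and the tie-breaking rule (the lowest candidate wins) are assumptions made for illustration; the source does not specify them.

```python
from typing import List

NUM_STATES = 21                 # states 0..20, as stated in the description
ACTIONS = [3, 4, 5, 6, 7, 8]    # hypothetical candidate macMinBE values


def new_q_table() -> List[List[float]]:
    """Create a zero-initialised table indexed as q[action_index][state]."""
    return [[0.0] * NUM_STATES for _ in ACTIONS]


def select_min_be(q_table: List[List[float]], state: int) -> int:
    """Return the macMinBE whose accumulated reward is highest in this state."""
    # max() keeps the first maximum, so ties resolve to the lowest candidate.
    best_row = max(range(len(ACTIONS)), key=lambda a: q_table[a][state])
    return ACTIONS[best_row]
```

A purely greedy rule never tries actions whose entries are still low; whether and how the patent handles exploration is not stated in this text.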

As explained for the action, the reward in the present invention is the outcome of applying the macMinBE value selected according to each node's packet transmission rate.

The design of the reward values directly affects action selection.

The reward in the present invention depends on whether an ACK message is received after transmission. If the ACK is received after transmission, the node is rewarded according to Equation 4, and a penalty is applied according to Equation 5 to BE values larger than the BE value at which the node succeeded in transmitting.

Conversely, if the ACK is not received after transmission, the reward is given according to Equation 6, and a reward is given according to Equation 7 to BE values larger than the BE value at which the node failed to transmit.

S is the state, that is, the packet transmission rate. The term 20/S adjusts the reward for a transmission success or failure flexibly, so that its magnitude grows as the packet transmission rate falls.

NB is the number of CSMA/CA attempts executed by the node, and MaxNB is the maximum number of CSMA/CA attempts that can be executed. That is, the fewer CSMA/CA attempts are made, the smaller the penalty.

Finally, A is the action, that is, the macMinBE value selected by the node, and MaxBE is the largest BE value that can be selected. That is, the lower the BE value at which transmission succeeds, the smaller the penalty.

The other parameter values are changed as follows.

To guarantee the autonomy of the reinforcement learning agent, macMaxBE is set to 8, the maximum value in Table 1.

macMaxCSMABackoffs is set according to Equation 8 so that, whichever macMinBE value the agent selects, it can explore up to the maximum macMaxBE value of 8.

Figure 4 is a configuration diagram of a reinforcement learning-based dynamic backoff parameter selection device according to the present invention.

The reinforcement learning-based dynamic backoff parameter selection device according to the present invention consists mainly of a MAC layer module 200, a Q-learning module 100, and a PHY layer module 300.

The MAC layer module 200 includes a channel access unit 50 and a MAC layer management unit 60.

The channel access unit 50 requests an action value from the action selection unit 30 of the Q-learning module 100, receives the action value, and requests packet transmission from the MAC layer management unit 60.

For CCA, it also exchanges channel information with the channel detection unit 70 of the PHY layer module 300.

The MAC layer management unit 60 receives the packet transmission request from the channel access unit 50 and requests the packet transceiver unit 80 of the PHY layer module 300 to transmit the packet.

The Q-learning module 100 includes a Q-table 10, an action selection unit 30, a state maintenance unit 40, and a reward management unit 20.

The Q-table 10 maintains the Q(s,a) values through information exchange with the state maintenance unit 40, the action selection unit 30, and the reward management unit 20.

The action selection unit 30 receives an action value request from the channel access unit 50, obtains the state value from the state maintenance unit 40, refers to the Q-table 10, and returns the action with the maximum Q(s,a) value.

The state maintenance unit 40 receives information on whether a packet whose transmission was requested by the MAC layer module 200 succeeded, and continually updates and maintains the state.

The reward management unit 20 generates a reward by referring to the action selection unit 30 and the packet transmission result information of the MAC layer, and updates the Q-table 10 to reflect it.

The packet transceiver unit 80 of the PHY layer module 300 receives packet transmission requests from the MAC layer module 200, which is the upper layer, and, when another packet is received, delivers that packet to the upper MAC layer. The channel detection unit 70 exchanges channel information with the channel access unit 50 of the MAC layer module 200.
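The Q-table update performed by the reward management unit 20 can be pictured with the textbook tabular Q-learning rule, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') − Q(s,a)]. The sketch below shows only that generic rule; the learning rate `alpha`, the discount factor `gamma`, and the function name `q_update` are assumptions for illustration, and the reward itself is taken as an input because it is defined by Equations 4 to 7.

```python
def q_update(
    q_table: list[list[float]],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    alpha: float = 0.1,  # assumed learning rate
    gamma: float = 0.9,  # assumed discount factor
) -> float:
    """Apply one off-policy temporal-difference update and return the new Q(s,a).

    q_table[a][s] is indexed with actions as rows and states as columns.
    """
    # Off-policy target: the best value reachable from the next state.
    best_next = max(row[next_state] for row in q_table)
    td_error = reward + gamma * best_next - q_table[action][state]
    q_table[action][state] += alpha * td_error
    return q_table[action][state]


if __name__ == "__main__":
    q = [[0.0] * 3 for _ in range(2)]
    q[1][2] = 10.0
    print(q_update(q, state=0, action=0, reward=1.0, next_state=2, alpha=0.5, gamma=0.5))
```

Because the update reads only the node's own state, action, and transmission result, it needs no extra packets to be exchanged, which is consistent with the stated aim of adjusting parameters without additional packet load.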

The reinforcement learning-based dynamic backoff parameter selection method according to the present invention is described in detail as follows.

Figure 5 is a flow chart showing the reinforcement learning-based dynamic backoff parameter selection method according to the present invention.

First, the minBE value is selected through the channel access unit 50 and the Q-learning module 100, and the BE, NB, and macMaxCSMABackoffs variables are initialized (S501).

Next, the channel access unit 50 generates a backoff delay of random duration within the backoff range [0, 2^BE-1] (S502).

Channel idle-state information is then exchanged through the channel detection unit 70 (S503).

Next, it is determined whether the channel is idle (S504). If the channel is idle, the packet is transmitted through the MAC layer management unit 60 and the packet transceiver unit 80 (S505), whether an ACK message is received is determined through the packet transceiver unit 80 and the MAC layer management unit 60 (S506), and the result information is passed to the state maintenance unit 40 (S507).

The Q-table is then updated through the state maintenance unit 40 and the reward management unit 20 (S510).

If the channel is determined not to be idle (S504), the channel access unit 50 determines whether to proceed with the next round of the backoff algorithm (S508, S509).
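The flow of steps S501 to S510 can be sketched as a single function with the layer interactions passed in as callables. This is an illustrative sketch, not the ns-3 implementation: all function and parameter names are hypothetical, and the handling of a busy channel (increment NB, raise BE up to macMaxBE, give up once NB exceeds macMaxCSMABackoffs) is assumed to follow the standard IEEE 802.15.4 unslotted CSMA/CA procedure.

```python
import random
from typing import Callable

MAC_MAX_BE = 8  # macMaxBE as set in the description


def csma_ca_attempt(
    select_min_be: Callable[[], int],                   # asks the Q-learning module for macMinBE
    wait: Callable[[int], None],                        # waits the given number of backoff periods
    channel_is_idle: Callable[[], bool],                # CCA result from the channel detection unit
    send_and_wait_ack: Callable[[], bool],              # transmits and reports whether an ACK arrived
    update_q: Callable[[int, int, int, bool], None],    # hands the result to the Q-learning module
    max_backoffs: int,                                  # macMaxCSMABackoffs
    rng: random.Random,
) -> str:
    min_be = select_min_be()                  # S501: select minBE, initialize BE and NB
    be, nb = min_be, 0
    while True:
        wait(rng.randint(0, 2 ** be - 1))     # S502: random delay in [0, 2^BE - 1]
        if channel_is_idle():                 # S503-S504: clear channel assessment
            acked = send_and_wait_ack()       # S505-S506: transmit, check for ACK
            update_q(min_be, be, nb, acked)   # S507, S510: report result, update Q-table
            return "success" if acked else "no_ack"
        nb += 1                               # S508-S509: assumed standard retry rule
        be = min(be + 1, MAC_MAX_BE)
        if nb > max_backoffs:
            return "channel_access_failure"


if __name__ == "__main__":
    cca_results = iter([False, False, True])
    outcome = csma_ca_attempt(
        select_min_be=lambda: 3,
        wait=lambda periods: None,
        channel_is_idle=lambda: next(cca_results),
        send_and_wait_ack=lambda: True,
        update_q=lambda min_be, be, nb, acked: print(min_be, be, nb, acked),
        max_backoffs=5,
        rng=random.Random(0),
    )
    print(outcome)
```

Passing the layers in as callables keeps the sketch independent of any particular simulator; in this sketch the Q-table is not updated when channel access fails, which is a simplification rather than something the flow chart states.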

Figure 6 is a graph of packet transmission rate according to the change in the number of competing nodes.

The results of measuring the performance of the present invention with network simulator 3 (ns-3) are as follows.

For the evaluation, the present invention is applied within the MAC layer and the CSMA/CA module of the IEEE 802.15.4-based LR-WPAN model of ns-3, its performance is compared with that of the prior art, and the frequency with which nodes select each macMinBE value is observed.

Because the scenario assumes intensifying contention among nodes, the number of nodes is gradually increased while packet transmission rate and delay are compared and analyzed. To isolate whether the learning agent selects the macMinBE value properly, macMaxBE and macMaxCSMABackoffs in the baseline are set to the same values as in the present invention, namely 8 and the value given by Equation 3, respectively.

In Figure 6, the present invention behaves much like the existing CSMA/CA when there are 10 or fewer competing nodes. As contention intensifies, however, its packet transmission rate declines less than that of the existing technique, which indicates that the nodes are progressively selecting appropriate macMinBE values. In addition, when the packet transmission rate of the existing algorithm at macMinBE = 5 and 6 is cross-compared with the delay in Figure 7, the delay of the proposed technique is always better, while its packet transmission rate gradually overtakes that of the existing technique. In other words, its performance is superior to that of the existing technique.

Figure 7 is a graph of delay according to the change in the number of competing nodes.

As shown in Figure 8, the mechanism of the present invention selects progressively higher MinBE values as the number of competing nodes increases.

Figure 8 is a graph of the MinBE selection ratio according to the change in the number of competing nodes according to the present invention.

Thus, as the number of competing nodes increases, each node can adjust its parameters appropriately through its own learning, and performance in terms of packet transmission rate and delay is also better than that of the prior art.

The reinforcement learning-based dynamic backoff parameter selection device and method for the large-scale Internet of Things in IEEE 802.15.4 unslotted CSMA/CA according to the present invention, as described above, allow parameters to be adjusted without additional packet load through Q-learning, a reinforcement learning technique of artificial intelligence.

The present invention improves the performance of the MAC layer by adjusting the macMaxBE, macMaxCSMABackoffs, and macMinBE values when channel access contention intensifies, so that performance above a certain level can be guaranteed in a network environment with a large number of IoT devices even as contention intensifies.

As described above, it will be understood that the present invention may be implemented in modified forms without departing from its essential characteristics.

Therefore, the disclosed embodiments should be considered from an illustrative rather than a limiting point of view. The scope of the present invention is indicated by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100. Q-learning module
200. MAC layer module
300. PHY layer module

Claims

A reinforcement learning-based dynamic backoff parameter selection device for the large-scale Internet of Things in IEEE 802.15.4 unslotted CSMA/CA, implemented as an electronic device including at least one processor, the device comprising:
a MAC layer module that requests and receives an action value from a Q-learning module, requests packet transmission from a PHY layer module, and exchanges channel information with the PHY layer module;
the Q-learning module, which contains a Q-table, performs state maintenance, action selection, and reward management, and maintains Q(s,a) values; and
the PHY layer module, which receives a packet transmission request from the MAC layer module and, when another packet is received, delivers the received packet to the upper MAC layer module.

The device of claim 1, wherein the MAC layer module comprises:
a channel access unit that requests and receives an action value from the action selection unit of the Q-learning module, requests packet transmission from a MAC layer management unit, and exchanges channel information with the channel detection unit of the PHY layer module for CCA; and
the MAC layer management unit, which receives the packet transmission request from the channel access unit and requests packet transmission from the packet transceiver unit of the PHY layer module.

The device of claim 1, wherein the Q-learning module comprises:
a Q-table that maintains the Q(s,a) values by exchanging information with a state maintenance unit, an action selection unit, and a reward management unit;
the action selection unit, which receives an action value request from the channel access unit of the MAC layer module, checks the state value from the state maintenance unit, and returns the action with the maximum Q(s,a) value by referring to the Q-table;
the state maintenance unit, which receives information on the success or failure of a packet whose transmission was requested by the MAC layer module and continually updates and maintains the state; and
the reward management unit, which generates a reward by referring to the action selection unit and the packet transmission result information of the MAC layer and updates the Q-table to reflect the reward.

The device of claim 1, wherein the PHY layer module comprises:
a packet transceiver unit that receives a packet transmission request from the MAC layer module, which is the upper layer, and, when another packet is received, delivers that packet to the MAC layer, which is the upper layer; and
a channel detection unit that exchanges channel information with the channel access unit of the MAC layer module.

The device of claim 1, wherein the Q-learning module:
uses an off-policy algorithm based on the TD (temporal-difference) model,
stores the reward reflecting the agent's state and action,
The value is,
,
is updated by, and has a policy of selecting the action that maximizes the reward through the Q-table of (s,a).

The device of claim 5, wherein the Q-learning module adjusts the macMaxBE, macMaxCSMABackoffs, and macMinBE values with the goal of improving the performance of the MAC layer, using the reinforcement learning design elements of agent, state, action, and reward,
wherein macMaxCSMABackoffs is the number of opportunities to determine whether the channel is idle for a single packet, and macMaxBE and macMinBE are the maximum and minimum values of the backoff exponent, respectively.

The device of claim 6, wherein the agent is a node of the network,
the state is the packet transmission rate,
the action is the selection of the macMinBE value, and
the reward varies depending on whether an ACK message, which is sent when the parent node successfully receives the packet, is received after the packet is transmitted.

The device of claim 7, wherein the state
is defined as,
the node, which is the learning agent, acts by observing its current packet transmission rate, and
the action selects the macMinBE value, which is the value applied to BE (the backoff exponent) when the CSMA/CA algorithm starts.

The device of claim 8, wherein, in action selection, the entry with the maximum value in the Q-value array of Q-learning is selected,
the Q-value array accumulates reward values with actions as rows and states as columns,
each element of the Q-value table is the accumulated reward obtained when the macMinBE value is applied according to the packet transmission rate of each node, and
each node takes the action with the highest accumulated reward value at its current packet transmission rate.

The device of claim 7, wherein the reward varies depending on whether the ACK message is received after transmission,
if the ACK is successfully received after transmission,
a reward is given through, and for BE values that are greater than the BE value at which the node succeeded in transmitting,
a penalty is imposed through
if ACK reception fails after transmission,
a reward is given through, and for BE values that are greater than the BE value at which the node failed to transmit,
a reward is given through
where S (state) is the packet transmission rate, NB is the number of CSMA/CA attempts executed by the node, MaxNB is the maximum number of CSMA/CA attempts that can be executed, A (action) is the macMinBE value selected by the node, and MaxBE is the maximum BE value that can be selected.

A reinforcement learning-based dynamic backoff parameter selection method for the large-scale Internet of Things in IEEE 802.15.4 unslotted CSMA/CA, performed in an electronic device including at least one processor, the method comprising:
selecting a minBE value through a channel access unit and a Q-learning module, and initializing the BE, NB, and macMaxCSMABackoffs variables;
generating, through the channel access unit, a backoff delay of random duration within the backoff range [0, 2^BE-1];
exchanging channel idle-state information through a channel detection unit;
determining whether the channel is idle and, if the channel is idle, transmitting a packet through a MAC layer management unit and a packet transceiver unit;
determining, through the packet transceiver unit and the MAC layer management unit, whether an ACK message is received, and passing the result information to a state maintenance unit; and
updating the Q-table through the state maintenance unit and a reward management unit.

The method of claim 11, further comprising, if the channel is determined not to be idle,
determining through the channel access unit whether to proceed with the next round of the backoff algorithm.

The method of claim 11, wherein the Q-learning module adjusts the macMaxBE, macMaxCSMABackoffs, and macMinBE values with the goal of improving the performance of the MAC layer, using the reinforcement learning design elements of agent, state, action, and reward,
wherein macMaxCSMABackoffs is the number of opportunities to determine whether the channel is idle for a single packet, and macMaxBE and macMinBE are the maximum and minimum values of the backoff exponent, respectively.

The method of claim 13, wherein the agent is a node of the network,
the state is the packet transmission rate,
the action is the selection of the macMinBE value, and
the reward varies depending on whether an ACK message, which is sent when the parent node successfully receives the packet, is received after the packet is transmitted.