KR20250088147A

KR20250088147A - Method and device for generating learning data for generative artificial intelligence using spatial information and real-time sensor information

Info

Publication number: KR20250088147A
Application number: KR1020230177925A
Authority: KR
Inventors: 최형환; 김성호
Original assignee: (주)이지스
Priority date: 2023-12-08
Filing date: 2023-12-08
Publication date: 2025-06-17
Also published as: WO2025121984A1

Abstract

The present invention relates to a method and device for generating learning data for generative artificial intelligence using spatial information and real-time sensor information. The method, performed by a computing device including at least one processor, comprises: acquiring location-based object data for objects including a preset object; converting metadata that specifies the object and object attribute data, both derived from the location-based object data, into a text-listing format to generate documented data; and generating learning data based on the documented data.

Description

METHOD AND DEVICE FOR GENERATING LEARNING DATA FOR GENERATIVE ARTIFICIAL INTELLIGENCE USING SPATIAL INFORMATION AND REAL-TIME SENSOR INFORMATION

The present invention relates to a method and device for generating learning data for generative artificial intelligence using spatial information and real-time sensor information, and more particularly to a method for converting data that does not exist in text form into documented data in text form so that it can be provided as learning data for generative AI.

The material described in this section merely provides background information on one embodiment of the present invention and does not constitute prior art.

Hyperscale artificial intelligence (AI), such as a large language model (LLM), is an AI model trained on a very large amount of text data. It performs natural language processing tasks and can be used for a variety of language modeling tasks.

Hyperscale AI models are the largest and most complex models in the field of artificial intelligence, and they show very high performance in areas such as natural language processing, image recognition, and speech recognition.

A representative example of hyperscale AI is the GPT (generative pretrained transformer) series developed by OpenAI. It performs particularly well in natural language processing and, through training on large volumes of text data, has acquired a high level of language understanding and generation ability.

Hyperscale AI is also already in use in many fields. In natural language processing it supports applications such as automatic translation, automatic summarization, and question answering; in image analysis it can be applied to image classification and object recognition; and in the speech domain it can be used for speech synthesis and speech recognition.

The progress of hyperscale AI is largely due to advances in deep learning algorithms and hardware technology. Advances in deep learning algorithms have allowed hyperscale AI models to learn more complex patterns, and advances in hardware have greatly increased the speed at which those models can be trained.

An LLM can be trained on a dataset of tens of billions of sentences or more, drawn from web documents and text data such as the Internet, books, newspaper articles, and blogs, and can be used in applications such as natural language understanding, sentence generation, machine translation, chatbots, and automatic summarization. A deep learning model based on such an LLM is an AI technology that learns the language used by humans and performs various tasks in that language.

A typical task for an LLM-based deep learning model is to understand human-written sentences through natural language understanding and to answer questions on that basis. Such models are attracting particular attention in fields such as chatbots, which are AI systems that converse with humans.

Representative LLM-based deep learning models, such as OpenAI's ChatGPT and Google's Bard, are trained on Internet documents, so they inevitably perform poorly when answering questions about data that is being generated in real time and was not learned in advance. In particular, for small and medium-sized businesses and workplaces, which see roughly one million openings and closures a year, the accuracy of answers to location-related user queries is markedly low.

As a result, for user queries about data that has not been documented in text form, or about data generated in real time for monitoring, an LLM-based deep learning model either cannot answer or produces results of markedly low accuracy.

In particular, the generative AI used in conversational AI services relies on a language model trained on data that predates the user's question. Because of this limitation, the model cannot be trained at all on map data that does not exist as text on the Internet or on real-time data generated at the present moment. Consequently, for most questions about location or environmental information that is closely tied to everyday life, it either cannot answer or gives answers containing many errors.

Republic of Korea Patent No. 10-2570178 (2023.08.21)

The present invention was devised in response to the background art described above. It aims to provide a method for converting location-based object data, such as map, sensor, weather, and performance data that is not available as text on the Internet or on social networks, into documented data in real time and providing it as learning data for a generative AI model.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the description below.

As a technical means for achieving the above technical object, one embodiment of the present invention provides a method for generating learning data for generative artificial intelligence, performed by a computing device including at least one processor. The method includes: acquiring location-based object data for objects including a preset object; converting metadata that specifies the object and object attribute data, based on the location-based object data, into a text-listing format to generate documented data; and generating learning data based on the documented data.

Alternatively, in the step of converting the metadata that specifies the object and the object attribute data into a text-listing format to generate documented data, when the object includes one or more unit objects, a string is generated for each unit object by converting the metadata that specifies that unit object and the unit object attribute data into a text-listing format, and the per-unit-object strings are merged to generate the documented data.

Alternatively, the location-based object data is at least one of: spatial data based on any one of a point, a polygon, or a polyline; topographic data based on digital elevation model (DEM) data; sensor data generated by real-time environmental monitoring; public data obtained through an open API (application programming interface); or spatial analysis data based on three-dimensional spatial information.

Alternatively, when the location-based object data includes spatial data based on the point, the metadata includes at least one of: coordinate information of the unit object; address information structured by geocoding the coordinate information; absolute altitude information at the coordinates; and location characteristic information including the average slope and slope direction of a preset change area defined relative to the coordinates, or the distance to surrounding roads or major facilities.
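The point metadata listed above can be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be WGS84 latitude and longitude, the address and elevation are passed in by the caller because the source does not name a geocoder or elevation service, and `haversine_m`, `point_metadata`, and the `facilities` list are hypothetical names introduced here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def point_metadata(lat, lon, address, elevation_m, facilities):
    """Build a metadata dict for one point-type unit object.

    `address` and `elevation_m` are assumed to come from a geocoder and an
    elevation source chosen by the implementer. `facilities` is a
    hypothetical, non-empty list of (name, lat, lon) tuples.
    """
    nearest = min(facilities, key=lambda f: haversine_m(lat, lon, f[1], f[2]))
    distance = haversine_m(lat, lon, nearest[1], nearest[2])
    return {
        "coordinates": (lat, lon),
        "address": address,
        "elevation_m": elevation_m,
        "nearest_facility": nearest[0],
        "nearest_facility_distance_m": round(distance),
    }
```

The average slope and slope direction of the surrounding area are omitted here because they require an elevation grid.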

Alternatively, in the step of converting the metadata that specifies the object and the object attribute data into a text-listing format to generate documented data, the metadata converted into the text-listing format and the unit object attribute data are separated by a delimiter to generate the string for each unit object.
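A minimal sketch of the text-listing conversion and merge described above. The source does not say which delimiter is used, so `FIELD_SEP` and `SECTION_SEP` are assumptions, as are the function names and the "key: value" layout.

```python
FIELD_SEP = ", "     # assumed separator between key-value pairs
SECTION_SEP = " | "  # assumed delimiter between metadata and attribute data


def to_listing(fields):
    """Render a dict as a flat 'key: value' text listing."""
    return FIELD_SEP.join(f"{key}: {value}" for key, value in fields.items())


def unit_string(metadata, attributes):
    """Build the string for one unit object: metadata, delimiter, attributes."""
    return to_listing(metadata) + SECTION_SEP + to_listing(attributes)


def documented_data(units):
    """Merge per-unit-object strings into one document, one unit per line.

    `units` is an iterable of (metadata, attributes) dict pairs.
    """
    return "\n".join(unit_string(meta, attrs) for meta, attrs in units)
```

Keeping one unit object per line makes the merged document easy to split back into records when it is later chunked for training.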

Alternatively, when the location-based object data includes spatial data based on the polygon, the metadata includes at least one of: address information structured by geocoding the center-point coordinates of the polygon; and location characteristic information including the area, slope, slope direction, or elevation above sea level of the polygon.
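The polygon area and center point mentioned above could be computed with the standard shoelace formula, as sketched below. The source does not state how the center point is derived, so the use of the area centroid is an assumption. The vertices are also assumed to be in a projected, metre-based coordinate system, and `polygon_area_centroid` is a hypothetical name.

```python
def polygon_area_centroid(ring):
    """Return (area, (cx, cy)) of a simple polygon using the shoelace formula.

    `ring` is a list of (x, y) vertices in a projected coordinate system
    (an assumption); the ring may be open or explicitly closed.
    """
    n = len(ring)
    twice_area = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * signed_area) = sum / (3 * twice_area)
    cx /= 3 * twice_area
    cy /= 3 * twice_area
    return abs(twice_area) / 2, (cx, cy)
```

The resulting centroid would then be passed to whichever geocoder the implementation uses to obtain the structured address.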

Alternatively, when the object data includes spatial data based on the polyline, the metadata includes at least one of: coordinate information including the latitude and longitude of the center point of the polyline and the coordinates of its start and end points; address information structured by geocoding the coordinate information; and location characteristic information including the line length, line slope, or average elevation of the polyline, or information on the surrounding environment located in a preset change area defined relative to the coordinate information.
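As an illustration of the polyline quantities above, the sketch below derives the start and end points, the midpoint along the line, the length, the mean elevation, and an overall slope. It assumes vertices given as (x, y, z) in metres in a projected coordinate system, whereas the source describes latitude and longitude for the center point; `polyline_metadata` is a hypothetical name.

```python
import math


def polyline_metadata(points):
    """Summarise a polyline given as a list of (x, y, z) vertices in metres."""
    segments = list(zip(points, points[1:]))
    lengths = [math.dist(a[:2], b[:2]) for a, b in segments]
    total = sum(lengths)
    if total == 0:
        raise ValueError("polyline has zero length")

    # Length-weighted mean elevation along the line.
    mean_elev = sum(l * (a[2] + b[2]) / 2 for l, (a, b) in zip(lengths, segments)) / total

    # Walk along the line until half of the total length is reached.
    midpoint = points[-1][:2]
    remaining = total / 2
    for l, (a, b) in zip(lengths, segments):
        if l > 0 and remaining <= l:
            t = remaining / l
            midpoint = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            break
        remaining -= l

    return {
        "start": points[0][:2],
        "end": points[-1][:2],
        "midpoint": midpoint,
        "length_m": total,
        "mean_elevation_m": mean_elev,
        # Overall slope: elevation change from start to end over line length.
        "slope_pct": 100 * (points[-1][2] - points[0][2]) / total,
    }
```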

Alternatively, when the object data includes the topographic data, the metadata includes at least one of: center coordinate information of a coordinate group defined with a preset reference resolution set as the unit column; address information structured by geocoding the center coordinate information; and location characteristic information including the elevation, slope, slope direction, or land cover of a preset change area defined relative to the center coordinate information.
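One way to read the terrain case is to tile the DEM into blocks at the preset resolution and summarise each block by its center coordinate, mean elevation, and slope. The sketch below follows that reading. The block layout, the axis orientation (x grows with column index, y with row index), the edge-difference slope estimate, and the name `block_summaries` are all assumptions made for illustration.

```python
import math


def block_summaries(dem, cell_size, block, origin=(0.0, 0.0)):
    """Summarise a DEM grid in square blocks of `block` x `block` cells.

    dem       : list of equal-length rows of elevations in metres
    cell_size : cell edge length in metres
    block     : cells per block edge (the assumed "reference resolution")
    origin    : coordinate of the outer corner of cell (0, 0)
    Only complete blocks are reported.
    """
    if block < 2:
        raise ValueError("block must span at least 2 cells")
    rows, cols = len(dem), len(dem[0])
    span = (block - 1) * cell_size  # distance between opposite edge cells
    out = []
    for r0 in range(0, rows - block + 1, block):
        for c0 in range(0, cols - block + 1, block):
            cells = [dem[r][c0:c0 + block] for r in range(r0, r0 + block)]
            mean_z = sum(map(sum, cells)) / (block * block)
            # Mean elevation change between opposite edges of the block.
            dzdx = (sum(row[-1] for row in cells) - sum(row[0] for row in cells)) / block / span
            dzdy = (sum(cells[-1]) - sum(cells[0])) / block / span
            out.append({
                "center": (origin[0] + (c0 + block / 2) * cell_size,
                           origin[1] + (r0 + block / 2) * cell_size),
                "mean_elevation_m": mean_z,
                "slope_deg": math.degrees(math.atan(math.hypot(dzdx, dzdy))),
            })
    return out
```

A production pipeline would more likely use a raster library and a 3x3 gradient kernel; the edge-difference estimate is used here only to keep the example dependency-free.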

Alternatively, when the location-based object data includes the sensor data, the metadata includes the location information and measured values of a sensor, collected at preset time intervals.
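A sketch of how readings collected at a preset time unit might be turned into text lines. The tuple layout of a reading, the one-hour default, the use of a per-bucket mean, and the output wording are assumptions, and `sensor_lines` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timezone


def sensor_lines(readings, unit_seconds=3600):
    """Group readings into fixed time buckets and render one text line each.

    `readings` is an iterable of (sensor_id, lat, lon, unix_ts, value).
    """
    buckets = defaultdict(list)
    location = {}
    for sensor_id, lat, lon, ts, value in readings:
        start = int(ts) - int(ts) % unit_seconds  # floor to the bucket start
        buckets[(sensor_id, start)].append(value)
        location[sensor_id] = (lat, lon)

    lines = []
    for (sensor_id, start), values in sorted(buckets.items()):
        lat, lon = location[sensor_id]
        when = datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")
        mean = sum(values) / len(values)
        lines.append(
            f"sensor: {sensor_id}, location: {lat}, {lon}, "
            f"time: {when} UTC, mean: {mean:.1f}, samples: {len(values)}"
        )
    return lines
```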

Alternatively, the sensor is an IoT sensor including one or more of a fine dust sensor, a bridge vibration monitoring sensor, or a traffic volume monitoring sensor.

Alternatively, when the location-based object data includes public data, the step of converting the metadata that specifies the object and the object attribute data into a text-listing format to generate documented data includes: generating basic learning data based on at least one open API; training a first language model on the basic learning data; using the pre-trained first language model to determine the open API that matches a user query, and then calling the determined open API to receive a result value; and converting the result value into documented data and providing it as a prompt to the first language model.
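The query-to-API flow above can be sketched as follows. In the source, the first language model chooses the open API. Here a simple keyword match stands in for that model so the example runs on its own. The registry, the stubbed fetch functions, their return values, and the prompt layout are all hypothetical.

```python
def fetch_air_quality(region):
    """Stub standing in for a real open-API call (hypothetical values)."""
    return {"region": region, "pm10": 41, "grade": "moderate"}


def fetch_weather(region):
    """Stub standing in for a real open-API call (hypothetical values)."""
    return {"region": region, "temp_c": 21.5, "sky": "clear"}


# Hypothetical registry: API name -> (trigger keywords, fetch function).
REGISTRY = {
    "air_quality": (("dust", "air", "pm10"), fetch_air_quality),
    "weather": (("weather", "temperature", "rain"), fetch_weather),
}


def select_api(query):
    """Keyword-overlap stand-in for the language model's API selection."""
    words = query.lower().split()
    scores = {name: sum(k in words for k in kws) for name, (kws, _) in REGISTRY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def build_prompt(query, region):
    """Call the matched API, document the result as text, and build a prompt."""
    name = select_api(query)
    if name is None:
        return query  # no matching API: pass the query through unchanged
    result = REGISTRY[name][1](region)
    document = ", ".join(f"{key}: {value}" for key, value in result.items())
    return f"Context ({name}): {document}\nQuestion: {query}"
```

For example, `build_prompt("What is the fine dust level today", "Seoul")` prepends the documented air-quality result to the question before it is handed to the model.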

Alternatively, when the object data includes the spatial analysis data, the metadata includes at least one of: spatial extent information including the floor and unit number of the object; or viewshed information on the proportion of one or more environmental elements, based on the result of a viewshed analysis of the object.

Meanwhile, as a technical means for achieving the above technical object, one embodiment of the present invention provides a computing device for generating learning data for generative artificial intelligence. The device includes: a processor including at least one core; and a memory including program code executable by the processor. In accordance with execution of the program code, the processor acquires location-based object data for objects including a preset object, converts metadata that specifies the object and object attribute data, based on the location-based object data, into a text-listing format to generate documented data, and generates learning data based on the documented data.

According to the above means for solving the problem, the present invention can convert location-based object data, such as map, sensor, weather, and performance data that is not available as text on the Internet or on social networks, into documented data in real time and provide it as learning data for a generative AI model. Because the generative AI model is retrained on the documented data, the accuracy of its answers to user queries can be improved.

FIG. 1 is a block diagram of a computing device according to one embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method for generating learning data of a generative artificial intelligence according to one embodiment of the present disclosure.
FIG. 3 is a diagram illustrating a learning data generation process for spatial data based on points according to one embodiment of the present disclosure.
FIG. 4 is a diagram illustrating a learning data generation process for spatial data based on polygons according to one embodiment of the present disclosure.
FIG. 5 is a diagram illustrating a learning data generation process for spatial data based on a polyline according to one embodiment of the present disclosure.
FIG. 6 is a diagram illustrating a learning data generation process for terrain data according to one embodiment of the present disclosure.
FIG. 7 is a diagram illustrating a learning data generation process for sensor data according to one embodiment of the present disclosure.
FIG. 8 is a diagram illustrating a learning data generation process for public data according to one embodiment of the present disclosure.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains (hereinafter, a person skilled in the art) can readily practice it. The embodiments presented here are provided so that a person skilled in the art can use or practice the present invention, and various modifications to them will be apparent to such a person. That is, the present invention may be implemented in many different forms and is not limited to the embodiments below.

Throughout this specification, identical or similar reference numerals refer to identical or similar components. In addition, to describe the present invention clearly, reference numerals for parts of the drawings that are unrelated to the description may be omitted.

The term "or" as used in the present invention is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless otherwise specified or clear from the context, "X uses A or B" should be understood to mean any of the natural inclusive permutations. For example, unless otherwise specified or clear from the context, "X uses A or B" may be interpreted as any of the following: X uses A, X uses B, or X uses both A and B.

The term "at least one of A or B" as used in the present invention should be interpreted as referring to A, to B, and to the combination of A and B.

The term "and/or" as used in the present invention should be understood to refer to and include all possible combinations of one or more of the listed related concepts.

The terms "comprises" and/or "comprising" as used in the present invention should be understood to mean that the stated features and/or components are present. However, these terms should be understood not to exclude the presence or addition of one or more other features, other components, and/or combinations thereof.

Unless otherwise specified in the present invention or unless the context clearly indicates a singular form, the singular should generally be construed to include "one or more".

The term "Nth (N is a natural number)" as used in the present invention may be understood as an expression used to distinguish components from one another according to a given criterion, such as function, structure, or convenience of explanation. For example, components that perform different functional roles may be distinguished as a first component and a second component. However, components that are substantially the same within the technical idea of the present invention but need to be distinguished for convenience of explanation may also be distinguished as a first component and a second component.

Meanwhile, the term "module" or "unit" as used in the present invention may be understood as referring to an independent functional unit that processes computing resources, such as a computer-related entity, firmware, software or a part thereof, hardware or a part thereof, or a combination of software and hardware. A "module" or "unit" may consist of a single element, or may be expressed as a combination or set of multiple elements. For example, in a narrow sense, a "module" or "unit" may refer to a hardware element of a computing device or a set of such elements, an application program that performs a specific software function, a procedure implemented through software execution, or a set of instructions for program execution. In a broad sense, a "module" or "unit" may refer to the computing device itself that constitutes a system, or to an application executed on the computing device. However, since the above is only an example, the concept of a "module" or "unit" may be defined in various ways within a scope understandable to a person skilled in the art based on the content of the present invention.

The above explanation of terms is intended to aid understanding of the present invention. Therefore, unless these terms are explicitly stated to be limiting, it should be noted that they are not used to limit the technical idea of the present invention.

FIG. 1 is a block diagram of a computing device according to one embodiment of the present invention.

The computing device (100) according to one embodiment of the present invention may be a hardware device, or part of a hardware device, that performs comprehensive data processing and computation, or it may be a software-based computing environment connected through a communication network. For example, the computing device (100) may be a server that performs intensive data processing and shares resources, or a client that shares resources through interaction with a server. The computing device (100) may also be a cloud system in which multiple servers and clients interact to process data comprehensively. Since the above is only one example of the type of the computing device (100), the type of the computing device (100) may be configured in various ways within a scope understandable to a person skilled in the art based on the content of the present invention.

Referring to FIG. 1, the computing device (100) according to one embodiment of the present invention may include a processor (110), a memory (120), and a network unit (130). However, since FIG. 1 is only an example, the computing device (100) may include other components for implementing a computing environment, and only some of the components disclosed above may be included in the computing device (100).

The processor (110) according to one embodiment of the present invention may be understood as a component that includes hardware and/or software for performing computing operations. For example, the processor (110) may read a computer program and process data through its arithmetic and control functions. A processor (110) for such data processing may include a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), a tensor processing unit (TPU), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA). Since the types of processor (110) described above are only examples, the type of the processor (110) may be configured in various ways within a scope understandable to a person skilled in the art based on the content of the present invention.

The processor (110) may use a pre-trained first language model to generate response data for a user query about location-based object data. The processor (110) may also retrain the first language model and the second language model based on the response data produced with the first language model.

Here, the first language model may be a generative AI model, and the second language model may be a large language model (LLM). The second language model may therefore be used by the first language model, which generates content based on input prompts in human language. Accordingly, the processor (110) may use the first language model, built on the second language model, to create an agent that responds to user queries.

Here, the location-based object data may be any one of: spatial data based on any one of a point, a polygon, or a polyline; topographic data based on digital elevation model (DEM) data; sensor data for real-time environmental monitoring; public data obtained through an open API (application programming interface); or spatial analysis data based on three-dimensional spatial information.

The location-based object data may include text information, such as coordinates and, among the spatial information, elevation above sea level and slope, together with non-text information, that is, everything else that has not been converted into text.

While the neural network models are being trained, the processor (110) may perform operations representing at least one neural network block included in the first language model and the second language model.

The processor (110) may use the second language model produced by the above training process to extract text information for the learning data of the first language model, and may generate the learning data for the first language model from the extracted text information. The processor (110) may then input a user query into the first language model trained in this way and generate response data representing a result estimated with the knowledge available at the time of the query.

Beyond the examples above, the types of learning datasets according to language distribution and the outputs of the first and second language models may be configured in various ways within a range understandable to those skilled in the art based on the contents of the present disclosure.

The memory (120) according to one embodiment of the present invention may be understood as a component that includes hardware and/or software for storing and managing data processed by the computing device (100). That is, the memory (120) may store any form of data generated or determined by the processor (110) and any form of data received by the network unit (130). For example, the memory (120) may include at least one type of storage medium among flash memory, a hard disk, a multimedia card micro, card-type memory, random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disk. The memory (120) may also include a database system that controls and manages data under a predetermined scheme. These memory types are only examples, and the type of the memory (120) may be configured in various ways within a range understandable to those skilled in the art based on the contents of the present invention.

The network unit (130) according to one embodiment of the present invention may be understood as a component that transmits and receives data over any known wired or wireless communication system. For example, the network unit (130) may transmit and receive data using a wired or wireless communication system such as a local area network (LAN), wideband code division multiple access (WCDMA), long term evolution (LTE), wireless broadband internet (WiBro), fifth-generation mobile communication (5G), ultra-wideband (UWB), ZigBee, radio frequency (RF) communication, wireless LAN, Wi-Fi (wireless fidelity), near field communication (NFC), or Bluetooth. These communication systems are only examples, and the network unit (130) may use various other wired or wireless communication systems to transmit and receive data.

FIG. 2 is a flowchart illustrating a method for generating learning data for generative artificial intelligence according to one embodiment of the present disclosure.

Referring to FIG. 2, the computing device (100) acquires location-based object data containing text information or non-text information about objects, including a preset object (S100).

Based on the acquired location-based object data, the computing device (100) converts the metadata that identifies the object and the object attribute data into a text listing format to generate documented data (S200). In doing so, the computing device (100) may use the pre-trained second language model to convert unstructured text into a structured format.

When the object includes one or more unit objects, the computing device (100) generates, for each unit object, a string in which the metadata identifying that unit object and its attribute data are converted into a text listing format, and then merges the per-unit-object strings to generate the documented data.

Here, the computing device (100) generates the string for each unit object by separating the text-listed metadata and object attribute data with a delimiter such as a comma (,) or a special symbol (*, #).
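The per-unit-object conversion and merge described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the `key=value` field layout, and the one-line-per-unit-object merge are all assumptions, since the description only requires that fields be listed as text and separated by a delimiter.

```python
from typing import Mapping, Sequence


def unit_object_string(metadata: Mapping[str, object],
                       attributes: Mapping[str, object],
                       delimiter: str = ",") -> str:
    """List metadata and attribute fields as text, separated by a delimiter."""
    # The key=value layout is an assumption made for readability.
    fields = [f"{key}={value}" for key, value in metadata.items()]
    fields += [f"{key}={value}" for key, value in attributes.items()]
    return delimiter.join(fields)


def merge_unit_strings(unit_strings: Sequence[str]) -> str:
    """Merge per-unit-object strings into one document, one object per line."""
    return "\n".join(unit_strings)


# Hypothetical sample data: (metadata, attributes) per unit object.
units = [
    ({"x": 127.0276, "y": 37.4979}, {"name": "Restaurant A", "category": "Chinese"}),
    ({"x": 127.0301, "y": 37.4985}, {"name": "Cafe B", "category": "Cafe"}),
]
document = merge_unit_strings([unit_object_string(m, a) for m, a in units])
print(document)
```

Passing `delimiter="#"` or `delimiter="*"` selects one of the other delimiters mentioned in the description.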

The computing device (100) generates learning data from the documented data and provides the generated learning data to the first language model so that the first language model can be trained on it (S300).
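The description does not fix a file format for the learning data. As one possible sketch, assuming a JSON Lines layout with a single `text` field per record (both the layout and the `lines_per_record` parameter are assumptions), documented text could be packaged like this:

```python
import json


def to_jsonl(document: str, lines_per_record: int = 2) -> str:
    """Group non-empty document lines into JSON Lines records with a 'text' field."""
    lines = [line for line in document.splitlines() if line.strip()]
    records = []
    for start in range(0, len(lines), lines_per_record):
        chunk = "\n".join(lines[start:start + lines_per_record])
        # ensure_ascii=False keeps Korean place names readable in the output.
        records.append(json.dumps({"text": chunk}, ensure_ascii=False))
    return "\n".join(records)


print(to_jsonl("x=1,name=A\nx=2,name=B\nx=3,name=C"))
```

Each output line is an independent JSON object, which is a common input shape for fine-tuning tools, although the actual format would depend on the model being trained.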

FIG. 3 is a diagram illustrating the process of generating learning data for point-based spatial data according to one embodiment of the present disclosure.

Spatial data is represented with two data types: vector and raster. The vector type represents the real world with the geometric forms of points, polylines, and polygons, whereas the raster type represents it as a grid of pixels (or cells). A point consists of a latitude-longitude pair in most coordinate systems and is useful when a feature is too small to be represented as a polygon. For example, a point can mark the location of Seoul on a world map. A polyline consists of two or more vertices connected by edges and is used to represent features that naturally take the form of a line. For example, a polyline can show a subway line on a map of Seoul. A polygon consists of three or more vertices connected in a closed shape and is used to represent the boundary of an area. For example, a polygon can show the boundary of Seoul on a map of South Korea.

For point-based spatial data, the metadata may include one or more of the following: the coordinate (x, y) information of the unit object; address information structured by geocoding the coordinate information; absolute elevation information for the coordinate information; and location characteristic information including the average slope and aspect (slope direction) of a preset area around the coordinates and the distances to surrounding roads or major facilities.

Here, a unit object may be the reference point of a unit column of the learning data, and major facilities may include convenience facilities such as schools, subway stations, and bus stops, as well as educational facilities, medical facilities, restaurants, and financial institutions.

Accordingly, the computing device (100) extracts a unit-column reference point (S211), enters the metadata in text form, and generates a string for each unit object by separating the metadata and the object attribute data already held for that point with a delimiter (S212).

The computing device (100) enters the metadata and object attribute data for every unit object in the point-based spatial data to generate a string for each unit object, and then merges the generated strings to produce the documented data (S213).
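One piece of the point metadata, the distance from a point to nearby facilities, can be sketched with the haversine great-circle formula. The function names, the facility dictionary, and the coordinates below are hypothetical, and the description does not say which distance measure is used, so treat this only as an illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def point_metadata(lon, lat, facilities):
    """Coordinates plus the rounded distance to each named facility."""
    meta = {"x": lon, "y": lat}
    for name, (f_lon, f_lat) in facilities.items():
        meta[f"distance_to_{name}_m"] = round(haversine_m(lon, lat, f_lon, f_lat))
    return meta


# Hypothetical facility coordinates, for illustration only.
facilities = {"subway_station": (127.0276, 37.4979)}
print(point_metadata(127.0301, 37.4985, facilities))
```

The resulting dictionary can then be listed as delimiter-separated text together with the object attribute data for that point.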

The first language model may be trained with learning data based on the documented data for points. Thus, when a user query such as "Give me a list of Chinese restaurants within 10 minutes of Gangnam Station" is input from a user terminal or an application, the first language model can provide response data for the query through artificial intelligence-based metadata search.

FIG. 4 is a diagram illustrating the process of generating learning data for polygon-based spatial data according to one embodiment of the present disclosure.

As shown in FIG. 4, for polygon-based spatial data, the computing device (100) enters metadata including one or more of the following: address information structured by geocoding the center-point coordinates of the polygon, and location characteristic information including the area, slope, aspect (slope direction), and elevation above sea level of the polygon.

The computing device (100) generates a single string by separating the metadata and the unit object attribute information of each unit object with a delimiter, repeats this string generation for every unit object in the polygon, and then generates the documented data for the polygon.
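Two of the polygon metadata items, the area and a center point, can be computed with the shoelace formula. The sketch below assumes vertices given in a planar (projected) coordinate system and takes the area centroid as the center point; the description does not specify either choice, and the function name is hypothetical.

```python
def polygon_area_and_centroid(vertices):
    """Shoelace formula for a simple polygon given as planar (x, y) vertices."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    signed_area = twice_area / 2.0
    # Dividing by the signed area keeps the centroid correct for either winding.
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), centroid


area, center = polygon_area_and_centroid([(0, 0), (4, 0), (4, 4), (0, 4)])
print(area, center)
```

For longitude/latitude input, the vertices would first need to be projected to a metric coordinate system for the area to be meaningful.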

The first language model may be trained with learning data based on the documented data for polygons. Thus, the first language model can provide response data for user queries such as "Which building in Seoul stands at the highest elevation?", "Which apartment complex in Busan has the largest area?", and "Give me the cadastral information for the most expensive land in South Korea."

FIG. 5 is a diagram illustrating the process of generating learning data for polyline-based spatial data according to one embodiment of the present disclosure.

As shown in FIG. 5, for polyline-based spatial data, the computing device (100) enters metadata including one or more of the following: coordinate information including the longitude-latitude coordinates of the center point of the polyline and the coordinates of its start and end points; address information structured by geocoding the coordinate information; and location characteristic information including the line length, the line slope, the average elevation of the line, or surrounding environment information (such as the distance to the sea or a river) for a preset area around the coordinates.

The computing device (100) generates a single string by separating the metadata and the unit object attribute information of each unit object with a delimiter, repeats this string generation for every unit object in the polyline, and then generates the documented data for the polyline.
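The polyline metadata items that depend only on geometry (start point, end point, line length, and a center point) can be sketched as below. The sketch assumes planar coordinates and interprets the center point as the point halfway along the line; both are assumptions, and the function name is hypothetical.

```python
import math


def polyline_metadata(vertices):
    """Start, end, total length, and the point halfway along a planar polyline."""
    if len(vertices) < 2:
        raise ValueError("a polyline needs at least two vertices")
    segments = list(zip(vertices, vertices[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    remaining = total / 2.0
    midpoint = vertices[-1]
    for (a, b), seg_len in zip(segments, lengths):
        if seg_len > 0 and remaining <= seg_len:
            # Interpolate inside the segment that contains the halfway point.
            t = remaining / seg_len
            midpoint = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            break
        remaining -= seg_len
    return {"start": vertices[0], "end": vertices[-1],
            "length": total, "midpoint": midpoint}


print(polyline_metadata([(0, 0), (3, 0), (3, 4)]))
```

Slope and average elevation would additionally require an elevation value per vertex, which this sketch leaves out.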

The first language model may be trained with learning data based on the documented data for polylines. Thus, the first language model can provide response data for user queries such as "What is the name of the longest road in Seoul?" and "Show me a route where I can drive with a view of the sea."

FIG. 6 is a diagram illustrating the process of generating learning data for terrain data according to one embodiment of the present disclosure.

Referring to FIG. 6, when the location-based object data is terrain data, the computing device (100) sets a preset reference resolution as the unit column (S221). For example, when the reference resolution is set to 10 cm and the unit column is organized on a 1 m basis, the computing device (100) may convert the 10 data values per unit area (width × height) into text as one unit column.

The computing device (100) enters, as metadata, one or more of the following: the center coordinate information of the coordinate group defined by the unit column; address information structured by geocoding the center coordinate information; and location characteristic information including the elevation, slope, aspect (slope direction), and land cover (mountain, river, road, and so on) of a preset area around the center coordinates (S222).

If the terrain data is contour data, the computing device (100) first converts it into DEM data using a known DEM conversion tool and then enters the metadata based on the converted DEM data.

The computing device (100) generates a single string by separating the metadata and the unit object attribute information of each unit object with a delimiter, repeats this string generation for every unit object in the terrain data, and then generates the documented data for the terrain data.
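The terrain flow can be sketched as splitting a DEM grid into square blocks and writing one line of text per block. Everything below is illustrative: the block size stands in for the unit column, the forward-difference slope estimate is a simple substitute for whatever slope method is actually used, and the function and field names are hypothetical.

```python
import math


def block_metadata(dem, row0, col0, size, cell_m):
    """Summarize one size x size block of a DEM grid (elevations in metres)."""
    cells = [dem[r][c] for r in range(row0, row0 + size)
             for c in range(col0, col0 + size)]
    slopes = []
    for r in range(row0, row0 + size - 1):
        for c in range(col0, col0 + size - 1):
            # Forward differences approximate the elevation gradient.
            dz_dx = (dem[r][c + 1] - dem[r][c]) / cell_m
            dz_dy = (dem[r + 1][c] - dem[r][c]) / cell_m
            slopes.append(math.degrees(math.atan(math.hypot(dz_dx, dz_dy))))
    return {
        "center_row": row0 + (size - 1) / 2,
        "center_col": col0 + (size - 1) / 2,
        "mean_elevation_m": sum(cells) / len(cells),
        "mean_slope_deg": sum(slopes) / len(slopes) if slopes else 0.0,
    }


def dem_to_document(dem, size, cell_m, delimiter=","):
    """Emit one delimiter-separated line of text per complete block."""
    lines = []
    for row0 in range(0, len(dem) - size + 1, size):
        for col0 in range(0, len(dem[0]) - size + 1, size):
            meta = block_metadata(dem, row0, col0, size, cell_m)
            lines.append(delimiter.join(f"{k}={v}" for k, v in meta.items()))
    return "\n".join(lines)


flat = [[5.0] * 4 for _ in range(4)]
print(dem_to_document(flat, size=2, cell_m=0.1))
```

A geocoded address and a land cover class per block would be added to the same dictionary before the line is written, given sources for those values.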

The first language model may be trained with learning data based on the documented data for terrain data. Thus, the first language model can provide response data for user queries such as "Find coniferous-forest terrain with a slope of 30 degrees or more where landslides are likely."

FIG. 7 is a diagram illustrating the process of generating learning data for sensor data according to one embodiment of the present disclosure.

Referring to FIG. 7, for sensor data, the computing device (100) communicates with IoT equipment and standard sensors to collect each sensor's location information and measured values and enters them as metadata (S231), then converts the metadata into text to generate documented data at a preset time interval (S232, S233).

Here, the sensor may be an IoT sensor for detecting the state of an object, including one or more of a fine-dust sensor, a bridge vibration monitoring sensor, or a traffic volume monitoring sensor.

The computing device (100) embeds the documented data in document files (document A, document B, document C, and so on) (S234) and stores the document files in a database (S235).
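The sensor flow of S231 to S235 can be sketched with the standard-library `sqlite3` module standing in for the database. The sensor identifiers, field order, and table schema are assumptions, and the sketch stores plain text only; it does not show how the data is embedded in the document files.

```python
import sqlite3
from datetime import datetime, timezone


def reading_to_text(sensor_id, lon, lat, kind, value, unit, measured_at):
    """Convert one sensor reading into a comma-separated line of text."""
    return ",".join([sensor_id, f"{lon}", f"{lat}", kind,
                     f"{value}{unit}", measured_at.isoformat()])


def store_document(conn, name, text):
    """Store (or overwrite) a named document in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (name TEXT PRIMARY KEY, body TEXT)")
    conn.execute("INSERT OR REPLACE INTO documents VALUES (?, ?)", (name, text))
    conn.commit()


conn = sqlite3.connect(":memory:")
ts = datetime(2023, 12, 8, 9, 0, tzinfo=timezone.utc)
# Hypothetical readings collected within one time interval.
lines = [
    reading_to_text("pm-001", 126.978, 37.5665, "PM10", 42, "ug/m3", ts),
    reading_to_text("tr-007", 127.0276, 37.4979, "traffic", 310, "veh/h", ts),
]
store_document(conn, "document_A", "\n".join(lines))
```

Running `store_document` again for the same name at the next interval overwrites the stored text, so a query against the table always sees the most recent documented readings.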

For example, for user queries such as "Which area has the worst fine dust right now?" and "Where in Seoul is traffic most congested right now?", the first language model can generate response data in real time using the document files stored in the database.

FIG. 8 is a diagram illustrating the process of generating learning data for public data according to one embodiment of the present disclosure.

Referring to FIG. 8, when the location-based object data is public data such as weather information (typhoons, rain, weather, and so on) or performance information, the computing device (100) generates basic learning data from an open API list containing at least one open API (S241) and trains the first language model on the basic learning data (S242).

When a user query is input (S243), the computing device (100) uses the pre-trained first language model to determine the open API that matches the query (S244), then calls the determined open API and receives the result (S245, S246).

The computing device (100) converts the received result into documented data and provides it as a prompt to the first language model (S246). The first language model then returns response data to the user based on the prompt input (S247).
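The query-time flow of S243 to S247 can be sketched without any network access. In the sketch below, the two API functions are stubs returning made-up values, the API list and its keywords are hypothetical, and a simple keyword match stands in for the first language model's choice of API, which the description leaves to the trained model.

```python
def fake_weather_api(query):
    """Stub standing in for a real open weather API call."""
    return {"region": "Seoul", "forecast": "rain"}


def fake_performance_api(query):
    """Stub standing in for a real open performance-listing API call."""
    return {"venue": "Seoul Arts Center", "title": "Example Concert"}


# Hypothetical open API list: name -> (trigger keywords, callable).
API_LIST = {
    "weather": (("typhoon", "rain", "weather"), fake_weather_api),
    "performance": (("performance", "concert"), fake_performance_api),
}


def select_api(query):
    """Keyword match standing in for the model's API selection (S244)."""
    lowered = query.lower()
    for name, (keywords, _) in API_LIST.items():
        if any(word in lowered for word in keywords):
            return name
    return None


def build_prompt(query):
    """Call the matched API and wrap its result as documented text in a prompt."""
    name = select_api(query)
    if name is None:
        return query
    result = API_LIST[name][1](query)
    documented = ",".join(f"{key}={value}" for key, value in result.items())
    return f"Context: {documented}\nQuestion: {query}"


print(build_prompt("Will it rain tomorrow?"))
```

The returned prompt would then be passed to the first language model, which produces the response returned to the user.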

For example, for user queries such as "When are we likely to come within range of this typhoon?" and "What is playing at the Seoul Arts Center tomorrow?", the first language model can look up the matching information in real time and answer.

Meanwhile, when the location-based object data is spatial analysis data based on three-dimensional spatial information, the computing device (100) enters, as metadata, spatial extent information for the object, including its floor and unit number, and viewshed information giving the proportion of one or more environmental elements (sky, river, sea, forest, park, road, and so on) within the object's visible area, based on the result of a viewshed analysis of the object.
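Given the output of a viewshed analysis, the proportion of each environmental element is a simple count ratio. The sketch below assumes the analysis yields one label per visible sample (for example, per ray or per pixel); that input format and the function name are assumptions.

```python
from collections import Counter


def visibility_ratios(labels):
    """Share of each environmental element among the visible samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels for the visible samples from one unit of a building.
samples = ["sky", "sky", "river", "park"]
print(visibility_ratios(samples))
```

The resulting ratios can be listed as text next to the floor and unit number to form the metadata for that object.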

In this way, the present invention can convert location-based object data such as map information, sensor measurements, weather information, and performance information into documented data in text form and provide it as learning data for a generative artificial intelligence model.

Accordingly, the generative artificial intelligence model is retrained on the documented data for location-based object data and can provide accurate answers to user queries that existing generative AI cannot answer. For example, for map-location queries, the model can answer on the basis of accurate address and coordinate information, including building area, land, and the positional relationships between objects. For queries about real-time data such as weather, disasters, and sensor monitoring, it can provide accurate answers based on data collected in real time.

The various embodiments of the present invention described above may be combined with further embodiments and may be modified within a range understandable to those skilled in the art in light of the detailed description above. The embodiments of the present invention should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in combined form. Accordingly, all changes or modifications derived from the meaning and scope of the claims of the present invention and their equivalents should be construed as falling within the scope of the present invention.

100: Computing device
110: Processor
120: Memory
130: Network unit

Claims

A method for generating learning data for generative artificial intelligence, performed by a computing device including at least one processor, the method comprising:
acquiring location-based object data for objects including a preset object;
converting, based on the location-based object data, metadata that identifies the object and object attribute data into a text listing format to generate documented data; and
generating learning data based on the documented data.

The method of claim 1,
wherein the step of converting the metadata and the object attribute data into a text listing format to generate documented data comprises:
when the object includes at least one unit object, generating, for each unit object, a string in which the metadata identifying the unit object and the unit object attribute data are converted into a text listing format, and merging the strings of the unit objects to generate the documented data.

The method of claim 1,
wherein the location-based object data is
at least one of spatial data based on any one of points, polygons, or polylines, terrain data based on digital elevation model (DEM) data, sensor data generated through real-time environmental monitoring, public data obtained through an open application programming interface (API), or spatial analysis data based on three-dimensional spatial information.

The method of claim 3,
wherein, when the location-based object data includes the point-based spatial data, the metadata
includes at least one of: coordinate information of a unit object; address information structured by geocoding the coordinate information; absolute elevation information for the coordinate information; and location characteristic information including the average slope, the aspect (slope direction), or the distance to surrounding roads or major facilities for a preset area around the coordinate information.

The method of claim 4,
wherein the step of converting the metadata and the object attribute data into a text listing format to generate documented data comprises:
separating the metadata converted into the text listing format and the unit object attribute data with a delimiter to generate a string for each unit object.

The method of claim 3,
wherein, when the location-based object data includes the polygon-based spatial data, the metadata
includes at least one of: address information structured by geocoding the center-point coordinate information of the polygon; and location characteristic information including the area, slope, aspect (slope direction), or elevation above sea level of the polygon.

The method of claim 3,
wherein, when the object data includes the polyline-based spatial data, the metadata
includes at least one of: coordinate information including the longitude-latitude coordinates of the center point of the polyline and the coordinates of its start and end points; address information structured by geocoding the coordinate information; and location characteristic information including the line length of the polyline, the line slope, the average elevation of the line, or surrounding environment information for a preset area around the coordinate information.

The method of claim 3,
wherein, when the object data includes the terrain data, a preset reference resolution is set as a unit column, and the metadata
includes at least one of: center coordinate information of a coordinate group defined by the unit column; address information structured by geocoding the center coordinate information; and location characteristic information including the elevation, slope, aspect (slope direction), or land cover of a preset area around the center coordinate information.

The method of claim 3,
wherein, when the location-based object data includes the sensor data, the metadata
includes location information and measured values of a sensor collected at a preset time interval.

The method of claim 9,
wherein the sensor is
an IoT sensor including at least one of a fine-dust sensor, a bridge vibration monitoring sensor, or a traffic volume monitoring sensor.

The method of claim 3,
wherein the step of converting the metadata and the object attribute data into a text listing format to generate documented data comprises, when the location-based object data includes the public data:
generating basic learning data based on at least one open API;
training a first language model based on the basic learning data;
determining an open API that matches a user query using the pre-trained first language model, and then calling the determined open API to receive a result; and
converting the result into documented data and providing it as a prompt to the first language model.

The method of claim 3,
wherein, when the object data includes the spatial analysis data, the metadata
includes at least one of: spatial extent information for the object, including its floor and unit number; or viewshed information on the proportion of one or more environmental elements based on a viewshed analysis result for the object.

A computing device that generates learning data for generative artificial intelligence, the device comprising:
a processor including at least one core; and
a memory including program code executable by the processor,
wherein the processor, upon executing the program code:
acquires location-based object data for objects including a preset object;
converts, based on the location-based object data, metadata that identifies the object and object attribute data into a text listing format to generate documented data; and
generates learning data based on the documented data.