CN108427600B - Data task processing method, application server and computer readable storage medium - Google Patents
- Publication number: CN108427600B
 - Application number: CN201810066359.7A
 - Authority: CN (China)
 - Prior art keywords: data, task, dependency, tasks, running
 - Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
 
Classifications
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
 - G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
 - G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
 - G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
 - G06F9/4494—Execution paradigms, e.g. implementations of programming paradigms data driven
 - G06F9/45512—Command shells
 - G06F2209/484—Precedence
 
Landscapes
- Engineering & Computer Science (AREA)
 - Software Systems (AREA)
 - Theoretical Computer Science (AREA)
 - Physics & Mathematics (AREA)
 - General Engineering & Computer Science (AREA)
 - General Physics & Mathematics (AREA)
 - Computing Systems (AREA)
 - Quality & Reliability (AREA)
 - Computer And Data Communications (AREA)
 - Debugging And Monitoring (AREA)
 - Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
 
Abstract
The invention discloses a data task processing method comprising the following steps: acquiring a task list from a terminal device; configuring the task dependencies in the task list to analyze the dependency relationship between data and tasks; recording the execution process of data synchronization; determining, according to the execution process of the data synchronization and the dependency relationship between the data and the tasks, whether the data has been synchronized; if the data has been synchronized, executing the task whose data synchronization has completed; and if the data has not been synchronized, issuing early-warning information. The invention also provides an application server and a computer-readable storage medium. With the data task processing method, application server and computer-readable storage medium provided by the invention, whether data has been synchronized can be determined from the execution process of data synchronization and the dependency relationship between data and tasks, so that a task is executed only after its data has been synchronized.
    Description
Technical Field
      The present invention relates to the field of data analysis, and in particular, to a data task processing method, an application server, and a computer-readable storage medium.
    Background
      Hadoop is an open-source distributed infrastructure that lets users develop distributed programs without knowing the details of the underlying distributed layer. Hadoop implements a distributed file system that provides high-throughput access to application data and is suitable for applications with very large data sets. Task scheduling is generally implemented with Oozie, but Oozie-based scheduling alone cannot analyze the dependency relationship between data and tasks. For example, a model task should be executed only after the data for a given period has finished synchronizing; when the dependency control between data and tasks is disordered, the resulting problems are difficult to locate once they occur.
    Disclosure of Invention
      In view of this, the present invention provides a data task processing method, an application server and a computer-readable storage medium, which can determine whether data has been synchronized according to the execution process of data synchronization and the dependency relationship between data and tasks, so that a task is executed only after its data has been synchronized.
      First, in order to achieve the above object, the present invention provides a data task processing method, which is applied to an application server, the method including:
      acquiring a task list from a terminal device;
      configuring the task dependencies in the task list to analyze the dependency relationship between data and tasks;
      recording the execution process of data synchronization;
      determining whether the data has been synchronized according to the execution process of the data synchronization and the dependency relationship between the data and the tasks;
      if the data has been synchronized, executing the task whose data synchronization has completed;
      and if the data has not been synchronized, issuing early-warning information.
      Optionally, the step of configuring the task dependencies in the task list to analyze the dependency relationship between data and tasks specifically includes the following steps:
      obtaining the effective dependency (relier) configuration of a flow node of the task;
      executing the dependency state query statement and outputting the original dependency result;
      merging a plurality of task nodes, completing the dependency state, and de-duplicating the dependency results;
      and labeling the de-duplicated dependency results with a dependency configuration slice label, thereby completing the analysis of the dependency relationships of all tasks.
      Optionally, the step of executing the task whose data synchronization has completed, if the data has been synchronized, includes the following steps:
      acquiring the round-run tasks and rerun tasks that are waiting;
      and executing the round-run tasks and the rerun tasks.
      Optionally, before the step of executing the round-run tasks and the rerun tasks, the method further includes the following steps:
      sorting the round-run tasks and the rerun tasks by priority level;
      and preferentially executing the task with the higher priority level.
      Optionally, the method further comprises the steps of:
      monitoring the currently executed task;
      and issuing an early warning when an abnormality occurs during task execution.
      In addition, in order to achieve the above object, the present invention further provides an application server, where the application server includes a memory and a processor, the memory stores a data task processing system that can run on the processor, and the data task processing system, when executed by the processor, implements the following steps:
      acquiring a task list from a terminal device;
      configuring the task dependencies in the task list to analyze the dependency relationship between data and tasks;
      recording the execution process of data synchronization;
      determining whether the data has been synchronized according to the execution process of the data synchronization and the dependency relationship between the data and the tasks;
      if the data has been synchronized, executing the task whose data synchronization has completed;
      and if the data has not been synchronized, issuing early-warning information.
      Optionally, the step of configuring the task dependencies in the task list to analyze the dependency relationship between data and tasks specifically includes the following steps:
      obtaining the effective dependency (relier) configuration of a flow node of the task;
      executing the dependency state query statement and outputting the original dependency result;
      merging a plurality of task nodes, completing the dependency state, and de-duplicating the dependency results;
      and labeling the de-duplicated dependency results with a dependency configuration slice label, thereby completing the scheduling dependencies of all tasks.
      Optionally, the step of executing the task whose data synchronization has completed, if the data has been synchronized, includes the following steps:
      acquiring the round-run tasks and rerun tasks that are waiting;
      sorting the round-run tasks and the rerun tasks by priority level;
      and preferentially executing the task with the higher priority level.
      Optionally, when executed by the processor, the data task processing system further implements the following steps:
      monitoring the currently executed task;
      and issuing an early warning when an abnormality occurs during task execution.
      Further, to achieve the above object, the present invention also provides a computer readable storage medium storing a data task processing system, which is executable by at least one processor to cause the at least one processor to perform the steps of the data task processing method as described above.
      Compared with the prior art, the application server, data task processing method and computer-readable storage medium provided by the invention first acquire a task list from the terminal device; then configure the task dependencies in the task list to analyze the dependency relationship between data and tasks; then record the execution process of data synchronization; then determine whether the data has been synchronized according to that execution process and the dependency relationship between the data and the tasks; and finally execute the task whose data synchronization has completed if the data has been synchronized, or issue early-warning information if it has not. This avoids the disordered dependency control between data and tasks found in the prior art: whether data has been synchronized can be determined from the execution process of data synchronization and the dependency relationship between data and tasks, so that a task is executed only after its data has been synchronized.
    Drawings
      FIG. 1 is a schematic diagram of an alternative application environment for various embodiments of the present invention;
      FIG. 2 is a schematic diagram of an alternative hardware architecture of the application server of FIG. 1;
      FIG. 3 is a schematic diagram of program modules of a first embodiment of a data task processing system in accordance with the present invention;
      FIG. 4 is a schematic diagram of program modules of a second embodiment of a data task processing system in accordance with the present invention;
      FIG. 5 is a flowchart illustrating a data task processing method according to a first embodiment of the present invention;
      FIG. 6 is a flowchart illustrating a data task processing method according to a second embodiment of the present invention;
      reference numerals:
      
      
      The implementation, functional features and advantages of the present invention are further explained below with reference to the accompanying drawings.
    Detailed Description
      In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
      It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
      Fig. 1 is a schematic diagram of an alternative application environment according to various embodiments of the present invention.
      In the present embodiment, the present invention can be applied to an application environment including, but not limited to, a terminal device 1, an application server 2, and a network 3. The terminal device 1 may be a mobile device such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, or a vehicle-mounted device, or a fixed terminal such as a digital TV, a desktop computer, a notebook, or a server. The application server 2 may be a rack server, a blade server, or a tower server, and may be an independent server or a server cluster composed of a plurality of servers. The network 3 may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another communication network.
      The application server  2 is in communication connection with one or more terminal devices  1 through the network 3 respectively to perform data transmission and interaction.
      Fig. 2 is a schematic diagram of an alternative hardware architecture of the application server  2 in fig. 1.
      In this embodiment, the application server 2 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which may be communicatively connected to each other through a system bus. It is noted that fig. 2 only shows the application server 2 with components 11-13; not all of the shown components are required, and more or fewer components may be implemented instead.
      The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the application server 2, such as a hard disk or internal memory of the application server 2. In other embodiments, the memory 11 may be an external storage device of the application server 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the application server 2. The memory 11 may also comprise both an internal storage unit of the application server 2 and an external storage device thereof. In this embodiment, the memory 11 is generally used for storing the operating system installed in the application server 2 and various types of application software, such as the program code of the data task processing system 200. The memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
      The processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 12 is typically used to control the overall operation of the application server  2. In this embodiment, the processor 12 is configured to run the program code stored in the memory 11 or process data, for example, run the data task processing system 200.
      The network interface 13 may comprise a wireless network interface or a wired network interface, and the network interface 13 is generally used for establishing a communication connection between the application server  2 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the application server  2 with one or more terminal devices  1 through the network 3, and establish a data transmission channel and a communication connection between the application server  2 and the one or more terminal devices  1.
      The application environment and the hardware structure and function of the related devices of the various embodiments of the present invention have been described in detail so far. Hereinafter, various embodiments of the present invention will be proposed based on the above-described application environment and related devices.
      First, the present invention provides a data task processing system 200.
      Referring now to FIG. 3, a program module diagram of a first embodiment of a data task processing system 200 in accordance with the present invention is shown.
      In this embodiment, the data task processing system 200 includes a series of computer program instructions stored on the memory 11, which when executed by the processor 12, may implement the data task processing operations of the embodiments of the present invention. In some embodiments, data task processing system 200 may be divided into one or more modules based on the particular operations implemented by the portions of the computer program instructions. For example, in fig. 3, the data task processing system 200 can be divided into an acquisition module 201, a configuration module 202, a recording module 203, a judgment module 204, an execution module 205, and an early warning module 206. Wherein:
      The acquisition module 201 is configured to acquire the task list from the terminal device 1.
      Specifically, a Hadoop data platform center is built in the application server 2 and acquires data from the external terminal device 1. When the application server 2 processes the data acquired by the Hadoop data platform center, it performs operations such as data acquisition, data cleaning and data analysis. Each of these processes may involve a plurality of tasks, some of which must be executed sequentially and some of which may be executed in parallel.
      In this embodiment, the application server 2 acquires the task list from the terminal device 1 through the acquisition module 201 and manages the execution and ordering of these tasks through Oozie. Oozie is a Hadoop-based scheduler in which scheduling flows are written in XML, and it can schedule MapReduce (mr), Pig, Hive, shell, jar and similar jobs. Through Oozie, the application server 2 executes task flow nodes in order, with support for fork (branching into multiple nodes) and join (merging multiple nodes into one node).
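      The fork and join behaviour described above can be illustrated with a minimal Python sketch. This is not Oozie's API or workflow syntax; the function names and the task names are hypothetical, and a thread pool stands in for the parallel branches purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_fork_join(branches):
    """Run each branch callable in parallel (fork) and wait for all of them (join)."""
    with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
        futures = [pool.submit(branch) for branch in branches]
        # Join: result() blocks until the branch finishes and re-raises its error, if any.
        return [future.result() for future in futures]


def run_workflow(steps):
    """Execute steps in order; a list entry is treated as a fork of parallel branches."""
    results = []
    for step in steps:
        if isinstance(step, list):
            results.append(run_fork_join(step))
        else:
            results.append(step())
    return results


# Hypothetical task names, for illustration only.
workflow = [
    lambda: "collect",
    [lambda: "clean_a", lambda: "clean_b"],  # fork: both branches finish before "analyze"
    lambda: "analyze",
]
print(run_workflow(workflow))
```

      The sequential steps run one after another, while the two branches of the fork run concurrently and are joined before the next step starts.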
      The configuration module 202 is configured to configure the dependencies (reliers) of the tasks in the task list, so as to analyze the dependency relationship between data and tasks.
      Specifically, configuring the dependencies of a task establishes the dependency relationship between data and tasks, so that only tasks whose data is complete can be executed. In this embodiment, the application server 2 obtains the effective relier (dependency) configuration of a task flow node (fork), executes the relier state query statement and outputs the original result, then merges the plurality of task nodes, completes the dependency state and de-duplicates the dependency results, and finally labels the de-duplicated dependency results with a dependency configuration slice label, thereby completing the scheduling dependencies of all tasks.
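      One possible reading of the merge, complete, de-duplicate and label steps is sketched below in Python. The patent does not specify the data layout, so everything here is an assumption made for illustration: the row shape (relier name, date string, state), the use of 'N' to complete a missing state, and the use of the node name as the slice label are all hypothetical.

```python
def analyze_dependencies(configured, raw_results):
    """
    configured:  dict mapping a node name to the relier names it depends on (assumed shape).
    raw_results: iterable of (relier_name, datestr, state) rows from the state queries.
    Returns a dict mapping each node name (used here as the label) to its sorted,
    de-duplicated list of (relier_name, datestr, state) rows.
    """
    # De-duplicate raw rows; a 'Y' row wins over any other row for the same key.
    states = {}
    for relier, datestr, state in raw_results:
        key = (relier, datestr)
        if states.get(key) != "Y":
            states[key] = state

    dates = sorted({datestr for _, datestr in states})
    labelled = {}
    for node, reliers in configured.items():
        rows = set()
        for relier in reliers:
            for datestr in dates:
                # Complete the state: a relier with no row for a date counts as not done.
                rows.add((relier, datestr, states.get((relier, datestr), "N")))
        labelled[node] = sorted(rows)
    return labelled


# Hypothetical names and dates, for illustration only.
configured = {"model_a": ["i_sync_x", "i_sync_y"]}
raw = [
    ("i_sync_x", "20180101", "Y"),
    ("i_sync_x", "20180101", "Y"),  # duplicate row
    ("i_sync_y", "20180102", "Y"),
]
print(analyze_dependencies(configured, raw))
```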
      Please refer to table 1, which shows the requirement of the dependent configuration format in this embodiment.
      TABLE 1
      
      
      In this embodiment, the application server 2 loads the configuration file through Hive load, overwriting the configuration table. The latest configuration file is collected from the production environment and, after modification, submitted to operations for deployment. The deployment commands that implement the dependency configuration are:
      Step 1: upload the script to the /tmp directory and grant permissions on it (so that the private user can operate on it)
      chmod 777 /tmp/relier_config_all.txt
      Step 2: switch user (not needed if your private user is allowed to execute hive commands)
      sudo su - hduser0006
      Step 3: execute the commands
      hive -e "use aml_awbs; set mapred.job.queue.name=queue_0006_02;
      truncate table fm_relier_check_script;
      load data local inpath '/tmp/relier_config_all.txt' into table aml_awbs.fm_relier_check_script;"
      In another embodiment of the invention, the dependency configuration may instead be implemented by modifying the existing configuration with the following deployment commands:
      Step 1: switch user (not needed if your private user is allowed to execute hive commands)
      sudo su - hduser0006
      Step 2: execute the commands
      hive -e "set mapred.job.queue.name=queue_0006_02;
      insert overwrite table aml_awbs.fm_relier_check_script
      select relier_name,
      src_job_name,
      if(relier_name='i_jt-aml-999-cd',
      'select concat(y,m,d) datestr, \'Y\' state from aml_awbs.JOB_STATE where JOB_NAME=\'jt-aml-999-cd\'',
      relier_name) script_string,
      fork
      from aml_awbs.fm_relier_check_script"
      The recording module 203 is configured to record an execution process of data synchronization of the task.
      Specifically, as described above, only tasks whose data is complete can be executed. Therefore, so that updates or modifications to the data that the Hadoop data platform center obtains from the external terminal device 1 are tracked, the application server 2 records the execution process of data synchronization through the recording module 203. In this embodiment, the recording module 203 uses shell to create a log and a state table, in which it records the execution process and the execution time of data synchronization.
      The judging module 204 is configured to determine whether the data has been synchronized according to the execution process of the data synchronization and the dependency relationship between the data and the task.
      Specifically, before a task is executed, the application server 2 determines through the judging module 204 whether the data has been synchronized. The determination is based on the execution process and execution time of data synchronization recorded in the log and state table created by shell, together with the dependency relationship between the data and the task.
      The execution module 205 is configured to, when the data has been synchronized, execute the task that has completed the data synchronization.
      The early warning module 206 is configured to send out early warning information if the data is not synchronized.
      Specifically, the execution module 205 will execute the task only when the data is synchronized, i.e. the data is complete. When the data is not synchronized, the early warning module 206 sends early warning information, which includes, but is not limited to, data information that synchronization is not completed, time of last synchronization, and the like, to notify a worker to perform manual intervention.
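      The gating logic of the judging, execution and early-warning modules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the state table is modelled as a dictionary keyed by (job name, date) with 'Y' meaning synchronized, and all function, job and task names are hypothetical.

```python
def unsynced_dependencies(required, sync_state, run_date):
    """Return the required data jobs not recorded as synchronized ('Y') for run_date."""
    return [job for job in required if sync_state.get((job, run_date)) != "Y"]


def run_if_synced(task, required, sync_state, run_date, execute, warn):
    """Execute the task only if all of its data dependencies are synchronized."""
    missing = unsynced_dependencies(required, sync_state, run_date)
    if missing:
        # Data not synchronized: issue early-warning information instead of running.
        warn(f"task {task}: data not synchronized for {run_date}: {', '.join(missing)}")
        return False
    execute(task)
    return True


# Hypothetical state table contents, for illustration only.
state = {("i_sync_x", "20180101"): "Y"}
executed, warnings = [], []
run_if_synced("model_a", ["i_sync_x"], state, "20180101", executed.append, warnings.append)
run_if_synced("model_b", ["i_sync_x", "i_sync_y"], state, "20180101", executed.append, warnings.append)
print(executed)
print(warnings)
```

      In this sketch the warning message carries the names of the unsynchronized data jobs, mirroring the statement that the early-warning information includes the data whose synchronization is not complete.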
      Through the program modules 201-206 described above, the data task processing system 200 provided by the present invention first acquires a task list from the terminal device 1; then configures the task dependencies in the task list to analyze the dependency relationship between data and tasks; then records the execution process of data synchronization; then determines whether the data has been synchronized according to that execution process and the dependency relationship between the data and the tasks; and finally executes the task whose data synchronization has completed if the data has been synchronized, or issues early-warning information if it has not. This avoids the disordered dependency control between data and tasks found in the prior art: whether data has been synchronized can be determined from the execution process of data synchronization and the dependency relationship between data and tasks, so that a task is executed only after its data has been synchronized.
      Further, a second embodiment of the present invention (shown in fig. 4) is proposed based on the first embodiment of the data task processing system 200 described above. In this embodiment, the data task processing system 200 further comprises a sorting module 207, wherein:
      The acquisition module 201 is further configured to obtain the round-run tasks that are waiting.
      As can be seen from the above, in the first embodiment a task is executed only when its data synchronization has completed. In the present embodiment, the tasks include, but are not limited to, round-run tasks and rerun tasks. A round-run task is a task that is executed cyclically within its effective date range, and a rerun task is a task that needs to be executed again after its execution has failed.
      Specifically, the application server 2 acquires the waiting round-run tasks through the acquisition module 201, determines whether their dependency configuration is satisfied, and, if it is, resolves the series of effective round-run dates for each task.
      Table 2 lists the configuration requirements for round-run tasks:
      TABLE 2
      
      
      In this embodiment, the code for implementing the round-run task is:
      Insert into the configuration table:
      Step 1: switch user (not needed if your private user is allowed to execute hive commands)
      sudo su - hduser0006
      Step 2: execute the commands
      hive -e "use aml_awbs; set mapred.job.queue.name=queue_0006_02;
      insert overwrite table aml_awbs.fm_model_config
      select
      board, series, model, date_start, date_end, ask_execute, ask_export, desc_interval, desc_relier
      from aml_awbs.fm_model_config
      where regexp_replace(upper(concat(board,series,model)),'','') <>
      regexp_replace(upper(concat('ky','zq','1214-13')),'','')
      union all
      select 'ky' board, 'zq' series, '1214-13' model,
      '20150101' date_start, '20990101' date_end,
      'Y' ask_execute, 'N' ask_export,
      'w:d:1' desc_interval,
      'i_jt-aml-investzq-import-cd:15' desc_relier
      from default.dual"
      The acquisition module 201 is further configured to obtain the rerun tasks that are waiting.
      Specifically, Table 3 lists the configuration requirements for rerun tasks in one implementation of the present invention:
      TABLE 3
      
      
      In this embodiment, the code for implementing the rerun task is:
      Insert into the configuration table:
      Step 1: switch user (not needed if your private user is allowed to execute hive commands)
      sudo su - hduser0006
      Step 2: execute the commands
      hive -e "set mapred.job.queue.name=queue_0006_02;
      insert into table aml_awbs.fm_model_task_rerun_set
      select 'ky','zq','1214-25','20141202','y','y','1.0' from default.dual"
      The sorting module 207 is configured to sort the round-run tasks and the rerun tasks by priority level.
      The execution module 205 is further configured to preferentially execute the task with the higher priority level.
      Specifically, in this embodiment, the sorting module 207 assigns priority levels to the round-run tasks and rerun tasks according to the chronological order in which the tasks were acquired. It will be appreciated that in other embodiments of the invention, the priority rules may be set according to actual requirements.
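      Ordering by acquisition time can be sketched in Python as follows. The sketch assumes that an earlier acquisition time means a higher priority, which the text above does not state explicitly; the PendingTask type, its fields and the task names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingTask:
    name: str
    kind: str         # "round-run" or "rerun"
    acquired_at: int  # hypothetical acquisition timestamp


def order_by_priority(tasks):
    """Order waiting tasks so that earlier-acquired tasks come first (assumed rule)."""
    # sorted() is stable, so tasks acquired at the same time keep their input order.
    return sorted(tasks, key=lambda task: task.acquired_at)


pending = [
    PendingTask("model_b", "rerun", 20),
    PendingTask("model_a", "round-run", 10),
    PendingTask("model_c", "round-run", 20),
]
for task in order_by_priority(pending):
    print(task.name, task.kind)
```

      A different priority rule, as the text allows for other embodiments, would only change the sort key.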
      The early warning module 206 is further configured to monitor a currently executed task, and issue an early warning when an abnormality occurs in the task execution process.
      Specifically, the application server  2 monitors the currently executed task through the early warning module 206, and when an abnormality occurs in the task execution process, sends out an early warning to notify the staff to process in time.
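      The monitoring behaviour described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `run_with_warning` and the in-memory `warnings` list are hypothetical stand-ins for the early warning module and its notification channel.

```python
import logging
from typing import Callable, List


def run_with_warning(name: str, task: Callable[[], None], warnings: List[str]) -> bool:
    """Run one task; if it raises, record an early warning instead of crashing.

    `warnings` is a hypothetical stand-in for whatever channel notifies staff.
    """
    try:
        task()
        return True
    except Exception as exc:  # any abnormality during task execution
        message = f"task {name} failed: {exc}"
        logging.warning(message)
        warnings.append(message)
        return False
```

      A caller would wrap each task invocation in `run_with_warning` and forward the collected messages to staff.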
      Through the program module 207, the data task processing system 200 provided by the present invention can also sort the acquired round-run tasks and rerun tasks by priority level, execute higher-priority tasks first, monitor the currently executing task, and issue an early warning when an abnormality occurs during task execution, thereby supervising the tasks.
      In addition, the invention also provides a data task processing method.
      Fig. 5 is a schematic flow chart of a data task processing method according to a first embodiment of the present invention. In this embodiment, the execution order of the steps in the flowchart shown in fig. 5 may be changed and some steps may be omitted according to different requirements.
      Step S301: acquire a task list from the terminal device 1.
      Specifically, a Hadoop data platform center is built in the application server 2 and acquires data from the external terminal device 1. When the application server 2 processes the data acquired by the Hadoop data platform center, it performs operations such as data acquisition, data cleaning, and data analysis. Each of these processes may involve multiple tasks; some tasks must run sequentially, while others may run in parallel.
      In this embodiment, the application server 2 acquires a task list from the terminal device 1 and manages the execution and ordering of these tasks through Oozie. Oozie is a Hadoop-based scheduler in which scheduling flows are written in XML and which can schedule MapReduce (mr), Pig, Hive, shell, jar, and similar jobs. Through Oozie, the application server 2 executes the task flow nodes in order, with support for fork (branching into multiple nodes) and join (merging multiple nodes into one node).
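      The fork and join semantics mentioned above can be illustrated in plain Python. This sketch does not use Oozie or its XML workflow format; it only mimics the control flow with the standard library, and the name `fork_join` and the branch names in the usage note are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fork_join(branches: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Run every branch in parallel (fork) and wait for all of them (join)."""
    with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
        futures = {name: pool.submit(fn) for name, fn in branches.items()}
        # result() blocks until the branch finishes, so this is the join point.
        return {name: future.result() for name, future in futures.items()}
```

      For example, `fork_join({"clean": clean_fn, "analyze": analyze_fn})` would return only after both branches have finished, which is the point at which a downstream node may start.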
      Step S302: configure the task dependencies (reliers) in the task list to establish the dependency relationship between data and tasks.
      Specifically, the dependency configuration of a task defines the relationship between data and tasks, so that only tasks whose data is complete can be executed. In this embodiment, the application server 2 resolves the scheduling dependencies of all tasks as follows: it obtains the valid relier (dependency) configuration of the task flow node fork, executes the relier state query statements and outputs the raw results, merges the multiple task nodes, fills in the missing dependency states, de-duplicates the dependency results, and finally attaches a dependency configuration slice label to the de-duplicated results.
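      The dependency-resolution sequence above (merge task nodes, fill in missing states, de-duplicate, label) can be sketched in Python as follows. All names here are assumptions made for illustration: the row layout `(relier_name, date, state)`, the function `resolve_dependencies`, and in particular the slice label format `task@date`, which the patent does not specify.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape of one raw row returned by a relier state query:
# (relier_name, date string, state), where state "Y" means the data is ready.
RawRow = Tuple[str, str, str]


def resolve_dependencies(
    configured: Dict[str, List[str]],  # task node -> relier names it depends on
    raw_results: Iterable[RawRow],     # raw output of the state queries
    run_date: str,
) -> List[Tuple[str, str, str, str]]:
    """Return de-duplicated (slice_label, task, relier, state) rows."""
    # Reliers reported as ready for the requested date.
    ready = {name for name, date, state in raw_results
             if date == run_date and state == "Y"}
    rows = set()  # a set removes duplicate dependency results
    for task, reliers in configured.items():         # merge all task nodes
        for relier in reliers:
            state = "Y" if relier in ready else "N"   # fill in missing states
            # The label format below is an assumption, not from the patent.
            rows.add((f"{task}@{run_date}", task, relier, state))
    return sorted(rows)
```

      A task would then be considered runnable for `run_date` only if every row carrying its label has state `Y`.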
      Table 1 shows the required format of the dependency configuration in this embodiment.
      TABLE 1
      
      In this embodiment, the application server 2 loads a configuration file through hive load and overwrites the configuration table. The latest configuration file is collected from the production environment, modified, and then submitted to operations for deployment. The following deployment commands configure the dependencies:
      Step 1: upload the script to the /tmp directory and grant permissions on it (so that the personal user can operate on it)
      chmod 777 /tmp/relier_config_all.txt
      Step 2: switch user (not needed if your personal user is allowed to execute hive commands)
      sudo su - hduser0006
      Step 3: execute the commands
      hive -e "use aml_awbs; set mapred.job.queue.name=queue_0006_02;
      truncate table fm_relier_check_script;
      load data local inpath '/tmp/relier_config_all.txt' into table aml_awbs.fm_relier_check_script;"
      In another embodiment of the invention, the dependency may be configured by deploying commands that modify the existing configuration:
      Step 1: switch user (not needed if your personal user is allowed to execute hive commands)
      sudo su - hduser0006
      Step 2: execute the commands
      hive -e "set mapred.job.queue.name=queue_0006_02;
      insert overwrite table aml_awbs.fm_relier_check_script
      select relier_name,
      src_job_name,
      if(relier_name='i_jt-aml-999-cd',
      'select concat(y,m,d) datestr, \'Y\' state from aml_awbs.JOB_STATE where JOB_NAME=\'jt-aml-999-cd\'',
      relier_name) script_string,
      fork
      from aml_awbs.fm_relier_check_script"
      Step S303: record the execution process of the data synchronization of the task.
      Specifically, as noted above, only tasks whose data is complete can be executed. Therefore, to confirm that the Hadoop data platform center has obtained the data from the external terminal device 1, the application server 2 records the execution process of data synchronization whenever the data is updated or modified. In this embodiment, the application server 2 uses the shell to create a log and a state table, in which it records the execution process and the execution time of data synchronization.
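      As a rough illustration of recording synchronization in a log and a state table, the following Python sketch writes one state row and returns one log line. The patent does this with the shell on Hive; here an in-memory SQLite database is swapped in so the example is self-contained, and the table name `job_state`, its columns, and the log line format are all assumptions.

```python
import sqlite3
from datetime import datetime


def record_sync(conn: sqlite3.Connection, job_name: str, state: str, when: datetime) -> str:
    """Insert one row into a hypothetical state table and return a log line."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_state "
        "(job_name TEXT, datestr TEXT, state TEXT, finished_at TEXT)"
    )
    conn.execute(
        "INSERT INTO job_state VALUES (?, ?, ?, ?)",
        (job_name, when.strftime("%Y%m%d"), state, when.isoformat()),
    )
    conn.commit()
    # The log line records both the outcome and the execution time.
    return f"{when.isoformat()} {job_name} state={state}"
```

      The state table answers "is this job's data ready for a given date", while the log line keeps the execution time for later diagnosis.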
      Step S304: determine whether data synchronization is complete according to the recorded execution process of the data synchronization and the dependency relationship between the data and the task.
      Specifically, before executing a task, the application server 2 first determines whether the data has finished synchronizing. It makes this determination from the execution process and execution time of data synchronization recorded in the log and state table created through the shell, together with the dependency relationship between the data and the task.
      Step S305: if data synchronization is complete, execute the task whose data has been synchronized.
      Step S306: if data synchronization is not complete, issue early warning information.
      Specifically, the application server 2 executes a task only when its data is synchronized, that is, when the data is complete. When the data is not synchronized, the application server 2 issues warning information, which includes but is not limited to the data whose synchronization is incomplete and the time of the last synchronization, to notify staff to intervene manually.
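      Steps S304 to S306 amount to a gate in front of each task. The Python sketch below shows one possible shape of that gate; the function name `run_if_synced`, the `last_sync` mapping (relier name to the date of its last completed synchronization), and the wording of the warning are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def run_if_synced(
    task_name: str,
    reliers: List[str],
    last_sync: Dict[str, Optional[str]],  # relier -> date of last completed sync
    run_date: str,
    execute: Callable[[], None],
) -> Optional[str]:
    """Execute the task if every relier is synced for run_date.

    Returns None when the task ran, otherwise the warning text.
    """
    missing = [r for r in reliers if last_sync.get(r) != run_date]
    if not missing:
        execute()
        return None
    # The warning names the unsynchronized data and its last sync time.
    details = ", ".join(
        f"{r} (last sync: {last_sync.get(r) or 'never'})" for r in missing
    )
    return f"WARNING {task_name}: data not synchronized for {run_date}: {details}"
```

      Returning the warning text instead of sending it keeps the sketch free of any particular notification mechanism.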
      Through steps S301 to S306, the data task processing method provided by the present invention first acquires a task list from the terminal device 1; then configures the task dependencies to analyze the dependency relationship between the data and the tasks; then records the execution process of data synchronization; next determines whether the data is synchronized according to that execution process and the dependency relationship between the data and the tasks; and finally executes the task if its data is synchronized, or issues early warning information if it is not. This avoids the disordered control of data and task dependencies found in the prior art: whether the data is synchronized is determined from the execution process of data synchronization and the dependency relationship between the data and the tasks, and a task is executed only when its data has been synchronized.
      Further, based on the above-described first embodiment of the data task processing method of the present invention, a second embodiment of the data task processing method of the present invention is proposed.
      Fig. 6 is a schematic flow chart of a data task processing method according to a second embodiment of the present invention. In this embodiment, the method further includes the steps of:
      Step S401: acquire a waiting round-run task.
      As described above, in the first embodiment a task is executed only when data synchronization is complete. In the present embodiment, the tasks include, but are not limited to, round-run tasks and rerun tasks. A round-run task is a task that is executed cyclically within its effective date range, and a rerun task is a task that must be executed again after a failed execution.
      Specifically, the application server 2 obtains the round-run tasks in the waiting state, determines whether each task's dependency configuration is satisfied, and, if it is, derives the series of effective round-run dates for the task.
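      Deriving the series of effective round-run dates can be sketched in Python as below. The `date_start` and `date_end` values use the `YYYYMMDD` format seen in the configuration commands of this document; the function name `effective_run_dates` and the `step_days` parameter are hypothetical, and the sketch does not attempt to interpret the `desc_interval` field, whose format is not explained in the text.

```python
from datetime import date, datetime, timedelta
from typing import List


def effective_run_dates(date_start: str, date_end: str, today: date,
                        step_days: int = 1) -> List[str]:
    """List run dates from date_start up to min(date_end, today), inclusive."""
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    start = datetime.strptime(date_start, "%Y%m%d").date()
    # Never schedule past today, even if the configured end date is later.
    end = min(datetime.strptime(date_end, "%Y%m%d").date(), today)
    dates = []
    current = start
    while current <= end:
        dates.append(current.strftime("%Y%m%d"))
        current += timedelta(days=step_days)
    return dates
```

      With the sample configuration values `'20150101'` and `'20990101'`, the far-future end date means the series is bounded by the current date in practice.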
      Table 2 lists the configuration requirements of the round-run task:
      TABLE 2
      
      
      In this embodiment, the code for implementing the round-run task is as follows.
      Insert into the configuration table:
      Step 1: switch user (not needed if your personal user is allowed to execute hive commands)
      sudo su - hduser0006
      Step 2: execute the commands
      hive -e "use aml_awbs; set mapred.job.queue.name=queue_0006_02;
      insert overwrite table aml_awbs.fm_model_config
      select
      board, series, model, date_start, date_end, ask_execute, ask_export, desc_interval, desc_relier
      from aml_awbs.fm_model_config
      where regexp_replace(upper(concat(board, series, model)), '', '') <>
      regexp_replace(upper(concat('ky', 'zq', '1214-13')), '', '')
      union all
      select 'ky' board, 'zq' series, '1214-13' model,
      '20150101' date_start, '20990101' date_end,
      'Y' ask_execute, 'N' ask_export,
      'w:d:1' desc_interval,
      'i_jt-aml-investzq-import-cd:15' desc_relier
      from default.dual"
      Step S402: acquire a waiting rerun task.
      Specifically, refer to Table 3, which lists the configuration requirements of the rerun task in one implementation of the present invention:
      TABLE 3
      
      
      In this embodiment, the code for implementing the rerun task is as follows.
      Insert into the configuration table:
      Step 1: switch user (not needed if your personal user is allowed to execute hive commands)
      sudo su - hduser0006
      Step 2: execute the commands
      hive -e "set mapred.job.queue.name=queue_0006_02;
      insert into table aml_awbs.fm_model_task_rerun_set
      select 'ky', 'zq', '1214-25', '20141202', 'y', 'y', '1.0' from default.dual"
      Step S403: sort the round-run tasks and the rerun tasks by priority level.
      Step S404: execute higher-priority tasks first.
      Specifically, in this embodiment, the application server 2 ranks the round-run tasks and the rerun tasks by the order in which they were acquired. It will be appreciated that in other embodiments of the invention, the priority rule may be set according to actual requirements.
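      Ranking by acquisition order, as this embodiment describes, can be sketched in Python as follows. The `WaitingTask` record and its `acquired_at` field (any increasing sequence number or timestamp) are hypothetical; other embodiments could substitute a different sort key.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WaitingTask:
    name: str
    kind: str         # "round-run" or "rerun"
    acquired_at: int  # hypothetical sequence number or timestamp of acquisition


def execution_order(round_run: List[WaitingTask], rerun: List[WaitingTask]) -> List[WaitingTask]:
    """Merge both queues; tasks acquired earlier get higher priority."""
    # sorted() is stable, so tasks acquired at the same time keep list order.
    return sorted(round_run + rerun, key=lambda task: task.acquired_at)
```

      Changing the priority rule then only requires changing the `key` function, which matches the statement that the rule may be set according to actual requirements.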
      Step S405: monitor the currently executing task and issue an early warning when an abnormality occurs during task execution.
      Specifically, the application server 2 monitors the currently executing task and, when an abnormality occurs during task execution, issues an early warning so that staff can handle it promptly.
      Through steps S401 to S405, the data task processing method provided by the invention can also sort the acquired round-run tasks and rerun tasks by priority level, execute higher-priority tasks first, monitor the currently executing task, and issue an early warning when an abnormality occurs during task execution, thereby supervising the tasks.
      The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
      Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
      The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
    Claims (8)
1. A data task processing method is applied to an application server, and is characterized by comprising the following steps:
      acquiring a task list from the terminal equipment;
      configuring a task dependency in the task list to analyze a dependency relationship between data and a task, wherein the step of configuring the task dependency in the task list to analyze the dependency relationship between the data and the task specifically comprises the following steps:
      obtaining effective dependency configuration of the process nodes of the task;
      executing the dependency state query statement and outputting an original dependency result;
      combining a plurality of task nodes, completing the dependency state, and removing the duplication of the dependency result;
      labeling a dependency configuration slice label for the de-duplicated dependency result, and completing analysis of dependency relationships of all tasks;
      recording the execution process of the data synchronization of the task;
      judging whether the data are synchronous according to the execution process of the data synchronization and the dependency relationship between the data and the task, wherein the judging whether the data are synchronous according to the execution process of the data synchronization and the dependency relationship between the data and the task comprises the following steps: judging whether the data are synchronously completed according to the execution process of the data synchronization recorded in the log and the state table created by the shell, the execution time of the data synchronization and the dependency relationship between the data and the tasks;
      if the data is synchronized, executing the task of completing the data synchronization;
      and if the data are not synchronized, sending out early warning information.
    2. The data task processing method of claim 1, wherein the step of executing the task that has completed data synchronization if the data has been completed synchronously specifically comprises the steps of:
      acquiring the waiting round-run tasks and rerun tasks;
      and executing the round-run tasks and the rerun tasks.
    3. The data task processing method of claim 2, wherein the step of executing the round-run tasks and the rerun tasks is preceded by the steps of:
      sorting the round-run tasks and the rerun tasks by priority level;
      and preferentially executing the task with the higher priority level.
    4. A data task processing method according to claim 3, characterized in that said method further comprises the steps of:
      monitoring a currently executed task;
      and issuing an early warning when an abnormality occurs during task execution.
    5. An application server, comprising a memory, a processor, the memory having stored thereon a data task processing system operable on the processor, the data task processing system when executed by the processor implementing the steps of:
      acquiring a task list from the terminal equipment;
      configuring the task dependents to analyze the dependency relationship between the data and the tasks, wherein the step of configuring the task dependents to analyze the dependency relationship between the data and the tasks specifically comprises the following steps:
      obtaining effective dependency configuration of the process nodes of the task;
      executing the dependency state query statement and outputting an original dependency result;
      combining a plurality of task nodes, completing the dependency state, and removing the duplication of the dependency result;
      labeling a dependency configuration slice label for the de-duplicated dependency result, and completing analysis of dependency relationships of all tasks;
      recording the execution process of data synchronization;
      judging whether the data are synchronous according to the execution process of the data synchronization and the dependency relationship between the data and the task, wherein the judging whether the data are synchronous according to the execution process of the data synchronization and the dependency relationship between the data and the task comprises the following steps: judging whether the data are synchronously completed according to the execution process of the data synchronization recorded in the log and the state table created by the shell, the execution time of the data synchronization and the dependency relationship between the data and the tasks;
      if the data is synchronized, executing the task of completing the data synchronization;
      and if the data are not synchronized, sending out early warning information.
    6. The application server of claim 5, wherein the step of executing the task that has completed the data synchronization if the data has been synchronized includes the following steps:
      acquiring the waiting round-run tasks and rerun tasks;
      sorting the round-run tasks and the rerun tasks by priority level;
      and preferentially executing the task with the higher priority level.
    7. The application server of claim 6, wherein the data task processing system, when executed by the processor, further performs the steps of:
      monitoring a currently executed task;
      and issuing an early warning when an abnormality occurs during task execution.
    8. A computer-readable storage medium storing a data task processing system executable by at least one processor to cause the at least one processor to perform the steps of the data task processing method according to any one of claims 1-4.
    Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201810066359.7A CN108427600B (en) | 2018-01-24 | 2018-01-24 | Data task processing method, application server and computer readable storage medium | 
| PCT/CN2018/089192 WO2019144552A1 (en) | 2018-01-24 | 2018-05-31 | Data task processing method, application server and computer-readable storage medium | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201810066359.7A CN108427600B (en) | 2018-01-24 | 2018-01-24 | Data task processing method, application server and computer readable storage medium | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN108427600A CN108427600A (en) | 2018-08-21 | 
| CN108427600B true CN108427600B (en) | 2021-03-16 | 
Family
ID=63156041
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201810066359.7A Active CN108427600B (en) | 2018-01-24 | 2018-01-24 | Data task processing method, application server and computer readable storage medium | 
Country Status (2)
| Country | Link | 
|---|---|
| CN (1) | CN108427600B (en) | 
| WO (1) | WO2019144552A1 (en) | 
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN113946626A (en) * | 2021-10-26 | 2022-01-18 | 中国平安人寿保险股份有限公司 | Data synchronization detection method, device, computer equipment and storage medium | 
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN102129390A (en) * | 2011-03-10 | 2011-07-20 | 中国科学技术大学苏州研究院 | Task scheduling system of on-chip multi-core computing platform and method for task parallelization | 
| CN102750179A (en) * | 2011-04-22 | 2012-10-24 | 中国移动通信集团河北有限公司 | Method and device for scheduling tasks between cloud computing platform and data warehouse | 
| CN103873567A (en) * | 2014-03-03 | 2014-06-18 | 北京智谷睿拓技术服务有限公司 | Task-based data transmission method and data transmission device | 
| CN104092591A (en) * | 2014-08-04 | 2014-10-08 | 飞狐信息技术(天津)有限公司 | Task monitoring method and system | 
| CN106980543A (en) * | 2017-04-05 | 2017-07-25 | 福建智恒软件科技有限公司 | The distributed task dispatching method and device triggered based on event | 
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN103116525A (en) * | 2013-01-24 | 2013-05-22 | 贺海武 | Map reduce computing method under internet environment | 
| CN104615486B (en) * | 2014-12-26 | 2019-07-02 | 北京京东尚科信息技术有限公司 | For searching for the multi-task scheduling of Extension Software Platform and executing methods, devices and systems | 
| CN106294496B (en) * | 2015-06-09 | 2020-02-07 | 北京京东尚科信息技术有限公司 | Data migration method and tool based on hadoop cluster | 
| CN105184470A (en) * | 2015-08-28 | 2015-12-23 | 浪潮软件股份有限公司 | Message mode-based method for integrating task lists of multiple business systems | 
- 2018-01-24: CN CN201810066359.7A, patent CN108427600B (en), status: Active
- 2018-05-31: WO PCT/CN2018/089192, patent WO2019144552A1 (en), status: Ceased (not active)
Also Published As
| Publication number | Publication date | 
|---|---|
| WO2019144552A1 (en) | 2019-08-01 | 
| CN108427600A (en) | 2018-08-21 | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||