CN117182929A - Compliance control method and device for on-orbit assembly of dual-arm robot

Info

Publication number: CN117182929A
Application number: CN202311461575.9A
Authority: CN
Inventors: 刘磊; 曹钰雪; 谢心如; 刘乃龙; 徐拴锋; 张强; 张涛
Original assignee: Beijing Institute of Control Engineering
Current assignee: Beijing Institute of Control Engineering
Priority date: 2023-11-06
Filing date: 2023-11-06
Publication date: 2023-12-08
Anticipated expiration: 2043-11-06
Also published as: CN117182929B

Abstract

The invention relates to the technical field of compliance control for dual-arm space robots, and in particular to a compliance control method and device for on-orbit assembly of a dual-arm robot. The method includes: obtaining the current motion state of the dual-arm robot; inputting that motion state into a pre-trained target model to obtain the desired trajectory of the target object at the current moment and control parameters of an impedance control model adapted to the current environment, wherein the target model is obtained by training a preset neural network using the motion state of the dual manipulators, the operating force state of the dual manipulators, and the motion state of the target object as training samples; and, based on the desired trajectory of the target object, the control parameters of the impedance control model, and a preset dual-loop impedance control model, obtaining desired joint angles of the dual manipulators adapted to the current environment, which serve as control commands for the dual manipulators to achieve compliance control of the dual-arm robot. The invention can improve the efficiency and flexibility of on-orbit assembly by a dual-arm robot.

Description

A compliance control method and device for on-orbit assembly of a dual-arm robot

Technical field

The invention relates to the technical field of compliance control for dual-arm space robots, and in particular to a compliance control method and device for on-orbit assembly of a dual-arm robot.

Background

With the widespread application of ground robotics and the rapid development of aerospace technology, applying robots to on-orbit servicing in space has shown great advantages and benefits. Compared with a single-arm robot system, the coordinated operation of a dual-arm robot allows it to handle a wider variety of tasks; the system also has a high load capacity and is suited to manipulating large-inertia or flexible objects. Applying dual-arm robots to on-orbit servicing is therefore of great significance.

At present, pure motion planning and position control cannot achieve dual-arm cooperative operations that require force-position coordination. A compliance control method is therefore needed to ensure that, throughout the on-orbit servicing process, the target object does not slip and no irreversible damage is caused to the system.

In the related art, compliance control methods all use a framework in which motion planning and the impedance controller are separate. When the position of the target object changes, the path often has to be re-planned, which makes on-orbit assembly inefficient. In addition, the parameters of the impedance controller in this framework remain essentially unchanged during operation, so the algorithm has poor adaptability and robustness, which limits the flexibility of dual-arm operation.

On this basis, a compliance control method and device for on-orbit assembly of a dual-arm robot is urgently needed to solve the technical problems of low efficiency and poor flexibility during on-orbit assembly.

Summary of the invention

To solve the technical problems of low efficiency and poor flexibility during on-orbit assembly by a dual-arm robot, embodiments of the present invention provide a compliance control method and device for on-orbit assembly of a dual-arm robot.

In a first aspect, embodiments of this specification provide a compliance control method for on-orbit assembly of a dual-arm robot, including:

obtaining the current motion state of the dual-arm robot;

inputting the current motion state of the dual-arm robot into a pre-trained target model to obtain the desired trajectory of the current target object and control parameters of an impedance control model adapted to the current environment, wherein the target model is obtained by training a preset neural network using the motion state of the dual manipulators, the operating force state of the dual manipulators, and the motion state of the target object as training samples;

obtaining, based on the desired trajectory of the current target object, the control parameters of the impedance control model, and a preset dual-loop impedance control model, desired joint angles of the dual manipulators adapted to the current environment, which serve as control commands for the dual manipulators to achieve compliance control of the dual-arm robot.

In a second aspect, embodiments of the present invention also provide a compliance control device for on-orbit assembly of a dual-arm robot, including:

an acquisition module, configured to obtain the current motion state of the dual-arm robot;

an input module, configured to input the current motion state of the dual-arm robot into a pre-trained target model to obtain the desired trajectory of the current target object and control parameters of an impedance control model adapted to the current environment;

a calculation module, configured to obtain, based on the desired trajectory of the current target object, the control parameters of the impedance control model, and a preset dual-loop impedance control model, desired joint angles of the dual manipulators adapted to the current environment, so as to achieve compliance control of the dual-arm robot.

In a third aspect, embodiments of this specification also provide an electronic device including a memory and a processor, wherein a computer program is stored in the memory and, when the processor executes the computer program, the method described in any embodiment of this specification is implemented.

In a fourth aspect, embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described in any embodiment of this specification.

Embodiments of this specification provide a compliance control method and device for on-orbit assembly of a dual-arm robot. The acquired current motion state of the dual-arm robot is input into a pre-trained target model to obtain the desired trajectory of the current target object and control parameters of the impedance model adapted to the current environment. Then, based on the desired trajectory of the current target object, the control parameters of the impedance model, and the preset dual-loop impedance control model, desired joint angles of the dual manipulators adapted to the current environment are obtained and used as control commands for the dual manipulators, achieving compliance control of the dual-arm robot. The above scheme therefore optimizes the motion planning of the robot's dual manipulators and the control parameters of the impedance control model simultaneously, and adaptively adjusts the impedance controller parameters according to the current environmental characteristics, which improves both the efficiency of on-orbit assembly and the flexibility of dual-arm operation.

Description of the drawings

To explain the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of this specification; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a flow chart of a compliance control method for on-orbit assembly of a dual-arm robot provided by an embodiment of this specification;

Figure 2 is a hardware architecture diagram of an electronic device provided by an embodiment of this specification;

Figure 3 is a structural diagram of a compliance control device for on-orbit assembly of a dual-arm robot provided by an embodiment of this specification.

Detailed description

To make the purpose, technical solutions, and advantages of the embodiments of this specification clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this specification without creative effort fall within the scope of protection of this specification.

Referring to Figure 1, an embodiment of this specification provides a compliance control method for on-orbit assembly of a dual-arm robot. The method includes:

Step 100: obtaining the current motion state of the dual-arm robot;

Step 102: inputting the current motion state of the dual-arm robot into a pre-trained target model to obtain the desired trajectory of the current target object and control parameters of an impedance control model adapted to the current environment, wherein the target model is obtained by training a preset neural network using the motion state of the dual manipulators, the operating force state of the dual manipulators, and the motion state of the target object as training samples;

Step 104: obtaining, based on the desired trajectory of the current target object, the control parameters of the impedance control model, and a preset dual-loop impedance control model, desired joint angles of the dual manipulators adapted to the current environment, which serve as control commands for the dual manipulators to achieve compliance control of the dual-arm robot.

In this embodiment, the acquired current motion state of the dual-arm robot is input into the pre-trained target model to obtain the desired trajectory of the target object and control parameters of the impedance model adapted to the current environment. Then, based on the desired trajectory of the target object, the control parameters of the impedance model, and the preset dual-loop impedance control model, desired joint angles of the dual manipulators adapted to the current environment are obtained and used as control commands for the dual manipulators, achieving compliance control of the dual-arm robot. The above scheme therefore optimizes the motion planning of the robot's dual manipulators and the control parameters of the impedance control model simultaneously, and adaptively adjusts the impedance controller parameters according to the current environmental characteristics, which improves both the efficiency of on-orbit assembly and the flexibility of dual-arm operation.
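
As a non-limiting illustration, one control cycle covering steps 100, 102 and 104 can be sketched as follows. The names `control_cycle`, `ModelOutput` and the three callables are hypothetical and do not appear in the original disclosure; the trained target model and the dual-loop impedance controller are passed in as opaque callables rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class ModelOutput:
    """Hypothetical container for what the target model returns in step 102."""
    object_trajectory: Vector  # desired position change of the object's center of mass
    damping: Vector            # impedance damping parameters for the current environment
    stiffness: Vector          # impedance stiffness parameters for the current environment


def control_cycle(
    read_motion_state: Callable[[], Vector],
    target_model: Callable[[Vector], ModelOutput],
    dual_loop_impedance: Callable[[ModelOutput], Tuple[Vector, Vector]],
) -> Tuple[Vector, Vector]:
    """Run steps 100, 102 and 104 once; return joint-angle commands for both arms."""
    state = read_motion_state()         # step 100: current motion state
    output = target_model(state)        # step 102: trajectory + impedance parameters
    return dual_loop_impedance(output)  # step 104: desired joint angles
```

Keeping the three stages behind callables mirrors the description: planning and impedance parameter selection come from a single model call, and only the last stage depends on the robot's kinematics.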

The execution of each step shown in Figure 1 is described below.

For step 100, the current motion state of the dual-arm robot is obtained.

It should be noted that, in this embodiment, the motion state of the dual-arm robot at the current moment refers specifically to the pose and velocity of the end effectors of the dual manipulators at the current moment.

For step 102:

In one embodiment of this specification, the motion state of the dual manipulators includes the joint angles and joint angular velocities of the dual manipulators, and the pose and velocity of their end effectors;

the operating force state of the dual manipulators includes the six-dimensional force information at the ends of the dual manipulators and the external environmental force acting on the target object's center of mass;

the motion state of the target object includes the pose of the target object's center of mass.

In this embodiment, to obtain the desired motion of the target object at the current moment and an impedance controller parameter adjustment strategy adapted to the current environment, the motion state of the dual manipulators, the operating force information of the dual manipulators, and the motion state of the target object acquired above are used as training samples, and the preset neural network is trained online on target object motion planning and the impedance controller parameter adjustment strategy until the algorithm converges. To improve learning efficiency, the error vector between the object's current position and the target position is also included in the state information of the training samples.
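
By way of example only, the state information of one training sample, including the position error vector, might be organized as below. The class `TrainingSample`, its field names, and the assumption that the first three entries of the object pose are the x, y, z position are illustrative choices, not part of the original disclosure; the sketch also concatenates the raw force values, whereas the embodiment described later passes them through the LSTM and MLP modules first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingSample:
    """Hypothetical record of the quantities used as training-sample state."""
    arm_motion: List[float]     # joint angles, joint velocities, end-effector poses and velocities
    arm_force: List[float]      # six-dimensional end forces and external force on the object
    object_pose: List[float]    # pose of the object's center of mass (first 3 entries assumed x, y, z)
    goal_position: List[float]  # target position x, y, z

    def position_error(self) -> List[float]:
        """Error vector between the target position and the object's current position."""
        return [g - c for g, c in zip(self.goal_position, self.object_pose[:3])]

    def state_vector(self) -> List[float]:
        """Concatenate all state components, with the error vector appended last."""
        return self.arm_motion + self.arm_force + self.object_pose + self.position_error()
```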

In one embodiment of this specification, the preset neural network includes an LSTM network module, an MLP network module, a deep reinforcement learning network module, and an experience replay pool module connected in sequence.

Deep reinforcement learning algorithms generalize well and can improve the autonomy of system planning and control. In this embodiment, based on a deep reinforcement learning network, the LSTM network, MLP network, deep reinforcement learning network, and experience replay pool are designed as a centralized neural network structure, so that dual-manipulator motion planning and adaptive adjustment of the impedance control model's control parameters are unified in one framework. This addresses two problems in the related art: the low on-orbit assembly efficiency caused by separating manipulator motion planning from the impedance control model, and the poor flexibility of the dual manipulators caused by the poor adaptability and robustness of the traditional impedance control model.

In one embodiment of this specification, the target model is trained as follows:

acquiring environmental state quantities at a preset frequency, wherein the environmental state quantities include the operating force state of the dual manipulators, the motion state of the dual manipulators, and the motion state of the target object;

inputting the operating force state of the dual manipulators into the LSTM network module to obtain a time-sequenced operating force state of the dual manipulators;

using the MLP network module to extract features from the time-sequenced operating force state of the dual manipulators to obtain the feature vector of the force state of the dual manipulators at the current moment;

using the feature vector of the force state of the dual manipulators, the motion state of the dual manipulators, and the motion state of the target object as the input state vector of the deep reinforcement learning network module;

setting the environmental reward function of the agent and training the agent with the deep reinforcement learning network module to obtain the output action for the target object, wherein the output action includes the position change of the target object's center of mass and the control parameters of the impedance control model;

storing the acquired environmental state quantities, the output action for the target object, the environmental reward value obtained by executing that output action, and the environmental state quantities after executing the action in the experience replay pool module to obtain experience data;

updating the current neural network with the experience data until the algorithm converges, to obtain the target model.
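
As a non-limiting illustration of the experience replay pool used in the last two training steps, the sketch below stores (state, action, reward, next state) tuples and draws random mini-batches. The names `Transition` and `ReplayPool`, the fixed capacity, and uniform sampling are assumptions made for the example; the disclosure does not specify the pool's size or sampling rule.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Optional, Sequence


class Transition(NamedTuple):
    state: Sequence[float]       # environmental state quantities before the action
    action: Sequence[float]      # position change of the object and impedance parameters
    reward: float                # environmental reward for executing the action
    next_state: Sequence[float]  # environmental state quantities after the action


class ReplayPool:
    """Fixed-capacity pool; the oldest experience is discarded when full."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, transition: Transition) -> None:
        self._data.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw up to batch_size distinct transitions uniformly at random."""
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self) -> int:
        return len(self._data)
```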

In this embodiment, the motion state of the dual manipulators and the motion state of the target object are acquired at a preset frequency of 20 Hz, which helps keep the operation of the dual manipulators stable. Assuming that the control frequency of the impedance control model is h, the operating force state acquired by the preset neural network is a time series of length h/20. At each time t, the time series of the operating force state, with shape [h/20, 3], is input into the LSTM network module, and the LSTM network module outputs a time series of the operating force state with the same shape (here 3 indicates that the operating force state is a force in three spatial directions). A single-layer MLP network module then performs feature extraction on the time series output by the LSTM network module to obtain the feature vector of the operating force state at the current moment. Finally, the motion state of the manipulators and the motion state of the target object acquired at the current moment are concatenated with the feature vector of the operating force state at the current moment, and together they serve as the input state vector of the deep reinforcement learning network.
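
The shape bookkeeping described above can be illustrated with a small sliding window that collects the force samples arriving at the impedance control frequency between two 20 Hz network updates. This is a sketch only: the class `ForceWindow`, the helper `window_length`, and the example control frequency of 1000 Hz are assumptions, and the LSTM and MLP modules themselves are not implemented here.

```python
from collections import deque
from typing import List, Sequence, Tuple


def window_length(control_hz: int, sample_hz: int = 20) -> int:
    """Number of force samples per network update, i.e. h / 20."""
    if control_hz % sample_hz != 0:
        raise ValueError("control frequency must be a multiple of the sampling frequency")
    return control_hz // sample_hz


class ForceWindow:
    """Keeps the most recent h/20 three-dimensional force samples."""

    def __init__(self, control_hz: int, sample_hz: int = 20) -> None:
        self._buf = deque(maxlen=window_length(control_hz, sample_hz))

    def push(self, force_xyz: Sequence[float]) -> None:
        if len(force_xyz) != 3:
            raise ValueError("expected a force with three components")
        self._buf.append(tuple(force_xyz))

    def ready(self) -> bool:
        return len(self._buf) == self._buf.maxlen

    def as_sequence(self) -> List[Tuple[float, ...]]:
        """Return the window as a list of shape [h/20, 3], oldest sample first."""
        return list(self._buf)
```

With an assumed control frequency of 1000 Hz, `window_length(1000)` is 50, so the sequence passed to the LSTM module would have shape [50, 3].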

In this embodiment, the dual manipulators of the dual-arm robot exhibit strong temporal dependence when cooperatively manipulating the same target object, and conventional reinforcement learning networks cannot handle high-dimensional time-series information. A centralized neural network structure with an LSTM network module and an MLP network module is therefore designed. This centralized structure acquires the motion state of the dual manipulators and the state of their interaction forces with the environment at the same time, so that the motion planning of the dual manipulators and the parameters of the impedance control model are optimized simultaneously. During operation, the dual-arm robot can thus adaptively adjust the control parameters of the impedance control model according to the current environmental characteristics and system motion state, which improves both the efficiency of on-orbit assembly and the flexibility of dual-arm operation.

In addition, this embodiment first uses the LSTM network module to process the high-dimensional time-series force state information of the dual manipulators acquired at each moment, and then uses the MLP network module to extract features from the time-series force state information output by the LSTM network module. The force-trend features carried in the resulting feature vector significantly improve compliant operation performance.

For step 104:

In one embodiment of this specification, step 104 may specifically include the following steps:

obtaining the desired trajectories and desired operating forces at the ends of the dual manipulators from the desired trajectory of the target object, the kinematic constraint of the closed-chain system, and the dynamic model;

obtaining the compliant desired trajectories at the ends of the dual manipulators from the control parameters of the impedance control model, the preset dual-loop control model, and the desired trajectories and desired operating forces at the ends of the dual manipulators;

obtaining the desired joint angles of the dual manipulators from the compliant desired trajectories at the ends of the dual manipulators and the inverse kinematics of the manipulators.

In this embodiment, the trained target model is applied to the cooperative assembly operation of the robot's dual manipulators. From the motion state of the dual-arm robot at the current moment, the trained target model obtains the position change of the current target object's center of mass (that is, the desired trajectory of the target object) and the control parameters of the impedance control model adapted to the current environment. In the impedance control model, the desired inertia has a large effect on the stability of the whole dual-arm robot system but only a small effect on the robot's final motion state. This embodiment therefore sets the desired inertia to a fixed value, for example 1, during both training and the whole on-orbit assembly process. Consequently, the control parameters of the impedance control model output by the target model in this embodiment are the damping and stiffness parameters adapted to the current environment.
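
By way of example only, the model's output action could be decoded into a position change and impedance parameters as sketched below, with the desired inertia held at the fixed value 1 mentioned above. The nine-element action layout, the normalization of each element to [-1, 1], and all numeric ranges (`max_step`, `damping_range`, `stiffness_range`) are hypothetical choices for illustration; the disclosure does not give them.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ImpedanceParams:
    inertia: float                 # held fixed, as in the embodiment
    damping: Tuple[float, ...]     # per-axis damping chosen by the model
    stiffness: Tuple[float, ...]   # per-axis stiffness chosen by the model


def _clip(u: float) -> float:
    return max(-1.0, min(1.0, u))


def _scale(u: float, lo: float, hi: float) -> float:
    """Map a normalized value in [-1, 1] linearly onto [lo, hi]."""
    return lo + 0.5 * (_clip(u) + 1.0) * (hi - lo)


def decode_action(
    action: Sequence[float],
    max_step: float = 0.01,                                # assumed max position change per update
    damping_range: Tuple[float, float] = (10.0, 200.0),    # assumed bounds
    stiffness_range: Tuple[float, float] = (50.0, 2000.0), # assumed bounds
) -> Tuple[Tuple[float, ...], ImpedanceParams]:
    """Split a 9-element action into (position change, impedance parameters)."""
    if len(action) != 9:
        raise ValueError("expected 3 position, 3 damping and 3 stiffness entries")
    delta = tuple(max_step * _clip(u) for u in action[:3])
    damping = tuple(_scale(u, *damping_range) for u in action[3:6])
    stiffness = tuple(_scale(u, *stiffness_range) for u in action[6:9])
    return delta, ImpedanceParams(1.0, damping, stiffness)
```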

In one embodiment of this specification, the preset dual-loop impedance control model includes an outer-loop impedance model and an inner-loop impedance model, where the outer-loop impedance model is used to correct the desired trajectory of the target object and the inner-loop impedance model is used to correct the desired trajectories of the dual manipulators.

First, a second-order impedance model is established in Cartesian space:

$$M_d(\ddot{x} - \ddot{x}_d) + B_d(\dot{x} - \dot{x}_d) + K_d(x - x_d) = F_e - F_d$$

where $M_d$, $B_d$ and $K_d$ are the desired inertia, damping and stiffness of the second-order impedance model; $\ddot{x}$, $\dot{x}$ and $x$ are the actual acceleration, velocity and position of the manipulator in Cartesian space; $\ddot{x}_d$, $\dot{x}_d$ and $x_d$ are the desired acceleration, velocity and position of the manipulator; $F_e$ is the actual contact force; and $F_d$ is the desired contact force of the manipulator. In this embodiment the three impedance parameters are all positive definite matrices, and diagonal matrices are usually chosen to obtain a linearly decoupled response.
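
As a non-limiting illustration, a single-axis version of a second-order impedance model can be integrated numerically to turn a force error into a trajectory correction. The sketch assumes the form m·ë + b·ė + k·e = ΔF, where e is the deviation of the compliant trajectory from the desired one and ΔF is the difference between actual and desired contact force; this sign convention, the semi-implicit Euler integration, and all parameter values are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def impedance_step(
    e: float, e_dot: float, force_error: float,
    m: float, b: float, k: float, dt: float,
) -> Tuple[float, float]:
    """Advance m*e_ddot + b*e_dot + k*e = force_error by one time step."""
    e_ddot = (force_error - b * e_dot - k * e) / m
    e_dot = e_dot + e_ddot * dt   # update velocity first (semi-implicit Euler)
    e = e + e_dot * dt            # then position, using the new velocity
    return e, e_dot


def compliant_trajectory(
    desired: Sequence[float],
    force_errors: Sequence[float],
    m: float = 1.0, b: float = 40.0, k: float = 400.0, dt: float = 0.001,
) -> List[float]:
    """Add the impedance correction e to each desired position sample."""
    e, e_dot = 0.0, 0.0
    out: List[float] = []
    for x_d, delta_f in zip(desired, force_errors):
        e, e_dot = impedance_step(e, e_dot, delta_f, m, b, k, dt)
        out.append(x_d + e)
    return out
```

With zero force error the correction stays at zero and the desired trajectory passes through unchanged; under a constant force error the correction settles at ΔF/k, so a larger stiffness gives a smaller deviation.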

An outer-loop impedance model is established between the target object and the external environment. Rewriting the second-order impedance model in Cartesian space gives the outer-loop impedance control model:

$$M_o\ddot{e}_o + B_o\dot{e}_o + K_o e_o = F_{ext},\quad e_o = x_o - x_o^d,\quad \dot{e}_o = \dot{x}_o - \dot{x}_o^d,\quad \ddot{e}_o = \ddot{x}_o - \ddot{x}_o^d$$

where $e_o$ is the trajectory error between the actual trajectory $x_o$ and the desired trajectory $x_o^d$ of the target object; $\dot{e}_o$ and $\ddot{e}_o$ are the first and second derivatives of the target object's trajectory error; $F_{ext}$ is the external force exerted by the environment on the target object; $M_o$, $B_o$ and $K_o$ are the desired inertia, damping and stiffness of the outer-loop impedance model; $\dot{x}_o$ and $\dot{x}_o^d$ are the first derivatives of the actual and desired trajectories of the target object; and $\ddot{x}_o$ and $\ddot{x}_o^d$ are the second derivatives of the actual and desired trajectories of the target object.

Next, an inner-loop impedance control model is established for the end operating force of manipulator i. The inner-loop impedance model is:

$$M_i\ddot{e}_i + B_i\dot{e}_i + K_i e_i = \Delta F_i,\quad e_i = x_i - x_i^d,\quad \dot{e}_i = \dot{x}_i - \dot{x}_i^d,\quad \ddot{e}_i = \ddot{x}_i - \ddot{x}_i^d$$

where $e_i$ is the trajectory error between the actual trajectory $x_i$ and the desired trajectory $x_i^d$ of manipulator i; $\Delta F_i$ is the error between the desired force $F_i^d$ and the actual operating force $F_i$ at the end of manipulator i; $M_i$, $B_i$ and $K_i$ are the desired inertia, damping and stiffness of the inner-loop impedance model; $\dot{x}_i$ and $\dot{x}_i^d$ are the first derivatives of the actual and desired trajectories of manipulator i; and $\ddot{x}_i$ and $\ddot{x}_i^d$ are the second derivatives of the actual and desired trajectories of manipulator i.

In this embodiment, the preset outer-loop impedance control model is used to correct the desired trajectory of the current target object so as to minimize the disturbance force exerted by the environment on the target object. When the target object and the environment do not interact, the outer-loop impedance model no longer corrects the desired trajectory of the target object. The result is the compliant desired trajectory of the target object, which achieves compliance control between the target object and the environment.

In one embodiment of this specification, a stable rigid connection must be maintained between the dual manipulators and the target object throughout their cooperative motion. The end effectors of the dual manipulators and the target object must therefore satisfy the closed-chain kinematic constraint at all times. The kinematic constraint of the closed-chain system is:

$${}^{b_i}T_{e_i}\,{}^{e_i}T_{o}\,{}^{o}T_{w} = {}^{b_i}T_{w},\quad i = 1, 2$$

where ${}^{b_i}T_{e_i}$ is the homogeneous transformation matrix of the end-effector coordinate frame of manipulator i relative to its base coordinate frame; ${}^{o}T_{w}$ is the homogeneous transformation matrix of the world coordinate frame relative to the coordinate frame of the target object's center of mass; ${}^{e_i}T_{o}$ can be obtained from the pose of the contact point between the end effector of manipulator i and the target object; and ${}^{b_i}T_{w}$ is obtained by inverting the base coordinate frame of manipulator i.
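
As a non-limiting illustration, the closed-chain constraint can be used to compute the end-effector pose in the base frame from the object pose. The sketch assumes the constraint is rearranged as T_base_ee = inverse(T_world_base) · T_world_object · T_object_grasp, where all three inputs are 4x4 homogeneous transforms and the grasp (contact point) pose is fixed in the object frame. The function names and this particular arrangement of the frames are assumptions for the example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 homogeneous transformation matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Inverse of [R p; 0 1], computed as [R^T, -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translation(x: float, y: float, z: float) -> Matrix:
    """Pure-translation homogeneous transform."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def end_effector_in_base(
    t_world_base: Matrix, t_world_object: Matrix, t_object_grasp: Matrix
) -> Matrix:
    """Desired end-effector pose in the base frame for one manipulator."""
    t_base_world = rigid_inverse(t_world_base)
    return matmul(matmul(t_base_world, t_world_object), t_object_grasp)
```

Evaluating `end_effector_in_base` at every sample of the object's trajectory, once per manipulator, yields the two end-effector trajectories that keep the chain closed.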

The dynamic model includes the force balance equation and the moment balance equation of the target object. The force balance equation of the target object is:

$$m\dot{v}_o = F_{ext} + F_l + F_r + G$$

The moment balance equation of the target object is:

$$I\dot{\omega}_o + \omega_o \times (I\omega_o) = \tau_{ext} + \tau_l + \tau_r + r_{ext} \times F_{ext} + r_l \times F_l + r_r \times F_r$$

where $F_{ext}$ and $\tau_{ext}$ are the external force and moment exerted by the environment on the target object; $F_l$, $\tau_l$, $F_r$ and $\tau_r$ are the forces and moments exerted on the target object by the end effectors of the left and right manipulators; $r_{ext}$, $r_l$ and $r_r$ are the vectors between the object's center of mass and the points at which $F_{ext}$, $F_l$ and $F_r$ act on the target object; $v_o$ and $\omega_o$ are the linear and angular velocity of the target object's center of mass; $m$ is the mass of the target object; $I$ is the moment of inertia of the target object; and $G$ is the gravity acting on the target object.
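
By way of example only, the force balance can be used to find the total force the two end effectors must supply for a desired object acceleration. The sketch assumes the balance m·a = F_ext + F_l + F_r + G and then splits the required force equally between the arms. The equal split is an illustrative assumption: the balance alone does not determine how the load is shared, and the disclosure does not state the allocation rule. The function name is hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def required_arm_forces(
    mass: float,
    accel: Sequence[float],
    f_ext: Sequence[float],
    gravity: Sequence[float] = (0.0, 0.0, 0.0),
) -> Tuple[Vec3, Vec3]:
    """Return (F_l, F_r) satisfying m*a = F_ext + F_l + F_r + G with F_l == F_r."""
    # Net force the two end effectors must provide together.
    net = [mass * a - fe - g for a, fe, g in zip(accel, f_ext, gravity)]
    # Assumed load sharing: each arm carries half of the net force.
    half = (0.5 * net[0], 0.5 * net[1], 0.5 * net[2])
    return half, half
```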

In this embodiment, the kinematic constraint of the closed-chain system is first used to decompose the compliant desired trajectory of the current target object. Specifically, the transformation matrix of the target object's center of mass relative to the world coordinate frame at each moment is obtained from the compliant trajectory of the target object; this matrix is inverted and substituted into the kinematic constraint of the closed-chain system to compute the transformation matrix of each manipulator's end effector relative to its base coordinate frame, which gives the desired trajectories of the two end effectors. The desired operating forces at the ends of the dual manipulators are then computed from the dynamic model above. Next, the control parameters of the impedance control model output by the target model and adapted to the current environment, together with the desired trajectories and desired operating forces at the ends of the dual manipulators computed above, are substituted into the inner-loop impedance control model to compute the compliant trajectories at the ends of the dual manipulators. Finally, inverse kinematics is solved for the manipulators to obtain the desired joint angles corresponding to the compliant trajectories, which serve as control commands for the dual manipulators, thereby achieving compliance control of the robot's on-orbit assembly.

In summary, this embodiment provides a compliance control method and device for on-orbit assembly of a dual-arm robot. A centralized neural network structure unifies the motion planning of the target object with the adjustment of the impedance control model's parameters. Because this centralized structure acquires the motion state and the interaction force state of the two arms synchronously, the control parameters of the impedance control model can be adjusted adaptively according to the current environmental characteristics and system motion state, which improves both the efficiency of on-orbit assembly and the flexibility of dual-arm operation. In addition, the LSTM network module and the MLP network module extract features from the force state information in advance, and the force-trend features carried in the resulting feature vector significantly improve compliant operation performance.

As shown in Figures 2 and 3, embodiments of this specification provide a compliance control device for on-orbit assembly of a dual-arm robot. The device embodiments may be implemented in software, in hardware, or in a combination of the two. At the hardware level, Figure 2 shows a hardware architecture diagram of the electronic device in which the compliance control device is located. In addition to the processor, memory, network interface, and non-volatile storage shown in Figure 2, the electronic device may typically include other hardware, such as a forwarding chip responsible for processing messages. Taking a software implementation as an example, as shown in Figure 3, the device in the logical sense is formed when the CPU of the electronic device reads the corresponding computer program from non-volatile storage into memory and runs it.

As shown in Figure 3, the compliance control device for on-orbit assembly of a dual-arm robot provided by this embodiment includes:

an acquisition module 300, configured to obtain the current motion state of the dual-arm robot;

an input module 302, configured to input the motion state of the dual-arm robot into a pre-trained target model to obtain the desired trajectory of the target object and the control parameters of the impedance control model adapted to the current environment;

a calculation module 304, configured to obtain, based on the desired trajectory of the target object, the control parameters of the impedance control model, and a preset dual-loop impedance control model, the desired joint angles of the dual robotic arms adapted to the current environment, so as to achieve compliant control of the dual-arm robot.

In this embodiment, the acquisition module 300 may be used to perform step 100 of the above method embodiment, the input module 302 may be used to perform step 102, and the calculation module 304 may be used to perform step 104.
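As a rough illustration of how the three modules chain together in software, the following sketch wires them as plain callables. The class name, field names, and callable signatures are hypothetical; the specification only fixes the order acquisition, model inference, calculation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class ComplianceController:
    """Hypothetical wiring of modules 300, 302, and 304 as callables."""
    # Module 300: returns the current motion state of the dual-arm robot.
    acquire: Callable[[], Sequence[float]]
    # Module 302: maps the motion state to (desired object trajectory, impedance parameters).
    model: Callable[[Sequence[float]], Tuple[Sequence[float], Sequence[float]]]
    # Module 304: maps trajectory and impedance parameters to desired joint angles.
    compute: Callable[[Sequence[float], Sequence[float]], Sequence[float]]

    def step(self) -> Sequence[float]:
        """Run one control cycle: step 100, then step 102, then step 104."""
        state = self.acquire()
        trajectory, impedance_params = self.model(state)
        return self.compute(trajectory, impedance_params)
```

Keeping the modules as injected callables makes it straightforward to replace the trained model or the impedance computation with stubs when testing the control loop.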

In one embodiment of this specification, the motion state of the dual robotic arms includes the joint angles and joint angular velocities of the dual robotic arms, and the poses and velocities of their end effectors;

the operating force state of the dual robotic arms includes six-dimensional force information at the ends of the dual robotic arms and the external environmental force acting on the center of mass of the target object;

the motion state of the target object includes the pose of the target object's center of mass.

In one embodiment of this specification, the preset neural network includes an LSTM network module, an MLP network module, a deep reinforcement learning network module, and an experience replay pool module connected in sequence.

In one embodiment of this specification, the target model is trained as follows:

obtaining environmental state quantities at a preset frequency, wherein the environmental state quantities include the operating force state of the dual robotic arms, the motion state of the dual robotic arms, and the motion state of the target object;

inputting the operating force state of the dual robotic arms into the LSTM network module to obtain the operating force state of the dual robotic arms with time-series information;

using the MLP network module to extract features from the operating force state of the dual robotic arms, obtaining a feature vector of the force state of the dual robotic arms at the current moment;

using the feature vector of the force state of the dual robotic arms, the motion state of the dual robotic arms, and the motion state of the target object as the input state vector of the deep reinforcement learning network module;

setting the environmental reward function of the agent and training the agent with the deep reinforcement learning network module to obtain the output action for the target object, wherein the output action includes the position change of the target object's center of mass and the control parameters of the impedance control model;

storing the obtained environmental state quantities, the output action for the target object, the environmental reward value obtained by executing the action, and the environmental state quantities after executing the action in the experience replay pool module to obtain experience data;

updating the current neural network with the experience data until the algorithm converges, thereby obtaining the target model.
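The experience replay step in the training procedure above can be sketched as follows. This is a generic fixed-capacity replay buffer, not the patent's implementation; the class name `ReplayPool`, the `Transition` fields, and uniform random sampling are assumptions.

```python
import random
from collections import deque, namedtuple

# One stored experience: state before the action, the action, the reward,
# and the state after the action.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayPool:
    """Fixed-capacity experience pool; the oldest entries are dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        """Append one transition to the pool."""
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw a uniform random batch without replacement (at most the pool size)."""
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

During training, each control cycle would call `store` once, and each network update would call `sample` to obtain a batch of decorrelated transitions.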

In one embodiment of this specification, the preset dual-loop impedance model includes an outer-loop impedance model and an inner-loop impedance model, wherein:

the outer-loop impedance model relates the following quantities: the trajectory error between the actual trajectory and the desired trajectory of the target object; the first and second derivatives of that trajectory error; the external force exerted by the environment on the target object; and the desired inertia, damping, and stiffness of the outer-loop impedance model;

the inner-loop impedance model relates the following quantities: the trajectory error between the actual trajectory and the desired trajectory of robotic arm i; the error between the desired force and the actual operating force at the end of robotic arm i; and the desired inertia, damping, and stiffness of the inner-loop impedance model.
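The standard second-order impedance form matches the quantities listed for the two loops and is sketched below for reference. The symbols and signs are assumptions, not the patent's notation: $e_o$ is the object trajectory error, $F_e$ the environmental force, $M_o, B_o, K_o$ the outer-loop inertia, damping, and stiffness; $e_i$ is the trajectory error of arm $i$, $e_{F,i}$ its force error, and $M_{in}, B_{in}, K_{in}$ the inner-loop inertia, damping, and stiffness.

```latex
% Outer loop: object-level impedance (assumed notation)
M_o\,\ddot{e}_o + B_o\,\dot{e}_o + K_o\,e_o = F_e

% Inner loop: impedance of arm i, driven by the force error (assumed notation)
M_{in}\,\ddot{e}_i + B_{in}\,\dot{e}_i + K_{in}\,e_i = e_{F,i}
```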

In one embodiment of this specification, the calculation module is configured to perform the following operations:

obtaining the desired trajectories and desired operating forces at the ends of the dual robotic arms from the desired trajectory of the target object, the kinematic constraint of the closed-chain system, and the dynamic model;

obtaining the compliant desired trajectories at the ends of the dual robotic arms from the control parameters of the impedance control model, the preset dual-loop impedance control model, and the desired trajectories and desired operating forces at the ends of the dual robotic arms;

obtaining the desired joint angles of the dual robotic arms from the compliant desired trajectories at the ends of the dual robotic arms and the inverse kinematics of the robotic arms.
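The second operation above, turning a desired trajectory and a force error into a compliant trajectory, can be illustrated in one dimension. The sketch assumes the standard second-order impedance form m·ë + b·ė + k·e = f, integrates it with semi-implicit Euler, and adds the resulting offset e to the desired position. The function names, the scalar setting, and the sign convention are assumptions, not the patent's formulation.

```python
def impedance_step(e, e_dot, force, m, b, k, dt):
    """Advance m*e_ddot + b*e_dot + k*e = force by one semi-implicit Euler step."""
    e_ddot = (force - b * e_dot - k * e) / m
    e_dot = e_dot + e_ddot * dt   # update velocity first
    e = e + e_dot * dt            # then position, using the new velocity
    return e, e_dot


def compliant_trajectory(desired, force_errors, m, b, k, dt):
    """Offset each desired position by the impedance response to the force error."""
    e, e_dot = 0.0, 0.0
    out = []
    for x_d, f in zip(desired, force_errors):
        e, e_dot = impedance_step(e, e_dot, f, m, b, k, dt)
        out.append(x_d + e)
    return out
```

With a constant force error f, the offset settles at f / k, so a lower stiffness k yields a more yielding trajectory. This is the kind of trade-off that adapting the impedance parameters online is meant to manage.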

In one embodiment of this specification, the kinematic constraint of the closed-chain system is as given in the method embodiment above.

The dynamic model includes the force balance equation and the moment balance equation of the target object, both as given in the method embodiment above.

In these equations, the quantities are, in order: the external force and moment exerted by the environment on the target object; the forces and moments exerted on the target object by the end effectors of the left and right robotic arms; the vectors from the points at which the environmental, left-arm, and right-arm forces act on the target object to the object's center of mass; the linear and angular velocities of the target object's center of mass; the mass of the target object; the moment of inertia of the target object; and the gravity acting on the target object.

It can be understood that the structure illustrated in the embodiments of this specification does not constitute a specific limitation on the compliance control device for on-orbit assembly of a dual-arm robot. In other embodiments of this specification, the compliance control device may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

Because the information exchange between the modules in the above device and their execution processes are based on the same concept as the method embodiments of this specification, the details can be found in the description of the method embodiments and are not repeated here.

Embodiments of this specification also provide an electronic device including a memory and a processor. A computer program is stored in the memory, and when the processor executes the computer program, it implements the compliance control method for on-orbit assembly of a dual-arm robot according to any embodiment of this specification.

Embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program causes the processor to perform the compliance control method for on-orbit assembly of a dual-arm robot according to any embodiment of this specification.

Specifically, a system or device equipped with a storage medium may be provided. The storage medium stores software program code that implements the functions of any of the above embodiments, and the computer (or CPU or MPU) of the system or device reads and executes the program code stored in the storage medium.

In this case, the program code read from the storage medium can itself implement the functions of any of the above embodiments, so the program code and the storage medium storing it form part of this specification.

Examples of storage media for providing the program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer over a communication network.

In addition, it should be clear that the functions of any of the above embodiments can be implemented not only by executing the program code read by the computer, but also by having an operating system or the like running on the computer perform some or all of the actual operations according to instructions in the program code.

It can further be understood that the program code read from the storage medium may be written into a memory provided on an expansion board inserted into the computer, or into a memory provided in an expansion module connected to the computer. A CPU or the like installed on the expansion board or expansion module then performs some or all of the actual operations according to instructions in the program code, thereby implementing the functions of any of the above embodiments.

It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by hardware under the direction of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this specification, not to limit them. Although this specification has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this specification.

Claims

1. A compliance control method for on-orbit assembly of a dual-arm robot, characterized by comprising:

obtaining the current motion state of the dual-arm robot;

inputting the current motion state of the dual-arm robot into a pre-trained target model to obtain the desired trajectory of the current target object and the control parameters of an impedance control model adapted to the current environment, wherein the target model is obtained by training a preset neural network with the motion state of the dual robotic arms, the operating force state of the dual robotic arms, and the motion state of the target object as training samples;

obtaining, based on the desired trajectory of the current target object, the control parameters of the impedance control model, and a preset dual-loop impedance control model, the desired joint angles of the dual robotic arms adapted to the current environment, which serve as control instructions for the dual robotic arms to achieve compliant control of the dual-arm robot.

2. The method according to claim 1, characterized in that the motion state of the dual robotic arms includes the joint angles and joint angular velocities of the dual robotic arms, and the poses and velocities of their end effectors;

the operating force state of the dual robotic arms includes six-dimensional force information at the ends of the dual robotic arms and the external environmental force acting on the center of mass of the target object;

the motion state of the target object includes the pose of the target object's center of mass.

3. The method according to claim 1, wherein the preset neural network includes an LSTM network module, an MLP network module, a deep reinforcement learning network module and an experience replay pool module connected in sequence.

4. The method according to claim 3, characterized in that the target model is trained as follows:

obtaining environmental state quantities at a preset frequency, wherein the environmental state quantities include the operating force state of the dual robotic arms, the motion state of the dual robotic arms, and the motion state of the target object;

inputting the operating force state of the dual robotic arms into the LSTM network module to obtain the operating force state of the dual robotic arms with time-series information;

using the MLP network module to extract features from the operating force state of the dual robotic arms, obtaining a feature vector of the force state of the dual robotic arms at the current moment;

using the feature vector of the force state of the dual robotic arms, the motion state of the dual robotic arms, and the motion state of the target object as the input state vector of the deep reinforcement learning network module;

setting the environmental reward function of the agent and training the agent with the deep reinforcement learning network module to obtain the output action for the target object, wherein the output action includes the position change of the target object's center of mass and the control parameters of the impedance control model;

storing the obtained environmental state quantities, the output action for the target object, the environmental reward value obtained by executing the action, and the environmental state quantities after executing the action in the experience replay pool module to obtain experience data;

updating the current neural network with the experience data until the algorithm converges, thereby obtaining the target model.

5. The method according to claim 1, wherein the preset dual-loop impedance model includes an outer-loop impedance model and an inner-loop impedance model, wherein:

the outer-loop impedance model relates the following quantities: the trajectory error between the actual trajectory and the desired trajectory of the target object; the first and second derivatives of that trajectory error; the external force exerted by the environment on the target object; and the desired inertia, damping, and stiffness of the outer-loop impedance model;

the inner-loop impedance model relates the following quantities: the trajectory error between the actual trajectory and the desired trajectory of robotic arm i; the error between the desired force and the actual operating force at the end of robotic arm i; and the desired inertia, damping, and stiffness of the inner-loop impedance model.

6. The method according to claim 1, characterized in that obtaining the desired joint angles of the dual robotic arms adapted to the current environment, based on the desired trajectory of the current target object, the control parameters of the impedance control model, and the preset dual-loop impedance control model, comprises:

obtaining the desired trajectories and desired operating forces at the ends of the dual robotic arms from the desired trajectory of the current target object, the kinematic constraint of the closed-chain system, and the dynamic model;

obtaining the compliant desired trajectories at the ends of the dual robotic arms from the control parameters of the impedance control model, the preset dual-loop impedance control model, and the desired trajectories and desired operating forces at the ends of the dual robotic arms;

obtaining the desired joint angles of the dual robotic arms from the compliant desired trajectories at the ends of the dual robotic arms and the inverse kinematics of the robotic arms.

7. The method according to claim 6, characterized in that:

the kinematic constraint of the closed-chain system relates the following homogeneous transformation matrices: the transformation of the end-effector coordinate system of robotic arm i relative to its base coordinate system; the transformation of the world coordinate system relative to the coordinate system of the target object's center of mass; a transformation obtained from the pose of the contact point between the end effector of robotic arm i and the target object; and a transformation obtained by inverting the pose of the base coordinate system of robotic arm i;

the dynamic model includes the force balance equation and the moment balance equation of the target object;

in these equations, the quantities are, in order: the external force and moment exerted by the environment on the target object; the forces and moments exerted on the target object by the end effectors of the left and right robotic arms; the vectors from the points at which the environmental, left-arm, and right-arm forces act on the target object to the object's center of mass; the linear and angular velocities of the target object's center of mass; the mass of the target object; the moment of inertia of the target object; and the gravity acting on the target object.

8. A compliance control device for on-orbit assembly of a dual-arm robot, characterized by comprising:

an acquisition module, configured to obtain the current motion state of the dual-arm robot;

an input module, configured to input the current motion state of the dual-arm robot into a pre-trained target model to obtain the desired trajectory of the target object at the current moment and the control parameters of an impedance control model adapted to the current environment;

a calculation module, configured to obtain, based on the desired trajectory of the target object at the current moment, the control parameters of the impedance control model, and a preset dual-loop impedance control model, the desired joint angles of the dual robotic arms adapted to the current environment, which serve as control instructions for the dual robotic arms to achieve compliant control of the dual-arm robot.

9. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and when the processor executes the computer program, the method according to any one of claims 1-7 is implemented.

10. A computer-readable storage medium having a computer program stored thereon, which when the computer program is executed in a computer, causes the computer to perform the method according to any one of claims 1-7.