Enterprise fission path optimization and dynamic capability construction based on the soft actor-critic algorithm

Gu, Hengsheng

doi:10.1038/s41598-025-06180-w

Article
Open access
Published: 01 July 2025

Hengsheng Gu¹

Scientific Reports volume 15, Article number: 20942 (2025)

Abstract

This study explores an enterprise fission path optimization strategy based on the Soft Actor-Critic (SAC) algorithm and analyzes its impact on the parent company and on overall operational efficiency. First, enterprise finance and marketing data from public datasets of the National Bureau of Statistics are pre-processed. Second, a multi-level reward function is designed that covers short-term financial and market indices and also incorporates long-term indices that measure dynamic capabilities, such as innovation, market agility, and resource integration. Finally, using SAC as the reinforcement learning algorithm, the enterprise fission scenario is modeled as a complex decision environment in which the state space includes the enterprise’s current financial situation, market performance, and dynamic capability level. The action space encompasses the strategic choices available in enterprise fission, so that the fission decision process can be simulated. The entropy regularization in SAC prompts the model to balance exploration and exploitation when optimizing dynamic capability construction. The experimental results show that the fission path optimized by deep reinforcement learning (DRL) improves resource allocation efficiency and market response speed by an average of 20.4% and 25.2%, respectively. Dynamic capability construction is also enhanced: the innovation capability index increases by 15.4%, market agility by 12.3%, and resource integration capability by 10.5%. This indicates that the strategy can help accelerate the formation of industrial clusters. Therefore, the SAC-based enterprise fission path optimization strategy constructed in this study can bring lasting competitive advantages to enterprises.

Introduction

In today’s rapidly changing business environment, enterprise fission, as an important strategic tool, plays a significant role in enhancing enterprise flexibility, innovation capability, and market competitiveness. Enterprise fission refers to the process by which a parent enterprise forms a new independent enterprise by separating some of its assets, businesses, or subsidiaries^1,2. This process not only helps enterprises optimize resource allocation and improve organizational efficiency but also promotes innovation, injecting new vitality into the long-term development of enterprises. In the modern business environment, enterprise fission has become an important strategic tool to improve enterprises’ flexibility, innovation, and market competitiveness^3,4,5. With the rapid changes in the global economy and the continuous advancement of technology, the market environment faced by enterprises is becoming more complex and changeable. To maintain an advantage in the fierce market competition, enterprises need to constantly adjust their strategies to meet new challenges and opportunities^6,7. As a strategic choice, enterprise fission can improve the enterprises’ overall operation efficiency and market response speed through the reallocation of resources and the optimization of organizational structure⁸. However, how to effectively implement enterprise fission and optimize the fission path is a critical question. Moreover, achieving the optimal allocation of resources and the construction of dynamic capabilities are key issues that urgently need to be addressed in the field of enterprise strategic management.

Deep reinforcement learning (DRL) combines reinforcement learning (RL) with deep learning (DL) techniques and has shown great potential in solving complex decision problems in recent years^9,10,11. By simulating the interaction of an agent with its environment, DRL can automatically learn a policy that maximizes a given objective in an unknown or dynamic environment¹². Compared with traditional optimization methods, DRL has stronger adaptive ability and higher decision-making efficiency, so it is widely used in autonomous driving, robot control, financial transactions, and other fields^13,14,15. The Soft Actor-Critic (SAC) algorithm, an important DRL variant, has demonstrated excellent performance in various application scenarios due to its good balance between exploration and exploitation¹⁶. SAC introduces an entropy regularization term that increases the randomness of the policy. This makes it well suited to searching for better fission paths and strategies in the decision-making environment of enterprise fission, which is full of uncertainty and complexity.

This study introduces the SAC algorithm into the research of enterprise fission path optimization, aiming to explore its application potential and optimization effects in complex business environments. A multi-level reward function is constructed, taking into account both short-term financial and market indices and long-term dynamic capability indices of enterprises, such as innovation capability, market agility, and resource integration capability. Comparative experiments with other RL algorithms are conducted to verify the superiority and effectiveness of the SAC algorithm in optimizing the enterprise fission path.

This study has both theoretical value, enriching research in the fields of enterprise fission and DRL, and practical significance. By optimizing the enterprise fission path, enterprises can allocate resources more effectively, improve market response speed, and accelerate the formation of industrial clusters, thereby gaining a lasting competitive advantage. The study therefore offers practical guidance for enterprises in implementing fission strategies and in enhancing their dynamic capabilities and market competitiveness.

Literature review

As a strategic management tool, enterprise fission is critical in enhancing enterprise flexibility and competitiveness^17,18,19. According to the existing literature, enterprise fission usually includes two forms: spin-off and split-off²⁰. A spin-off is when the parent company separates part of its business or subsidiary into a separate company and continues to hold part of the shares of the new company. A split-off is when the parent company completely gives up its shares, making the new company completely independent. Yu et al. (2023)²¹ explored the relationship between subsidiaries and parent companies. Dong et al. (2023)²², based on the FISM model, proposed an intelligent path identification method for the risk of enterprise capital chain disruption. Through big data analysis, they optimized the path planning during the enterprise fission process, enhancing capital operation efficiency and risk management capabilities. Abdulla et al. (2017)²³ showed that enterprise fission could bring many benefits through optimizing resource allocation, improving organizational efficiency, and promoting innovation. For example, Handoyo et al. (2023)²⁴ pointed out that new companies could focus more on core business through fission, thus enhancing operational efficiency and market competitiveness. However, enterprise fission also faced many challenges, such as how to balance the interests between the parent company and the new company, and ensure a smooth transition in the fission process²⁵. From the analysis of the aforementioned literature, it is evident that enterprise fission, as an important method of modern enterprise strategic adjustment, has received widespread attention in recent years. However, research on how to scientifically and effectively plan the path of enterprise fission is relatively scarce, which limits a comprehensive and in-depth understanding of the fission process. 
Existing studies mainly focus on the motives, patterns, and impacts of enterprise fission, with little attention to optimizing fission paths.

The dynamic capability theory emphasizes the ability of enterprises to maintain competitive advantages by integrating, constructing, and reconfiguring internal and external resources in a constantly changing environment²⁶. Although existing research has made some progress in the construction and evaluation of dynamic capability, effectively improving dynamic capability in practice is still a challenge²⁷. Especially during enterprise fission, how to enhance dynamic capability through optimization strategy needs further research. Cordeiro et al. (2023)²⁸ believed that the improvement of dynamic capability not only depended on the acquisition and allocation of resources but also needed to be realized through strategic management and organizational reform.

DRL combines the advantages of DL and RL to achieve adaptive optimization in intricate environments²⁹. In recent years, DRL has achieved remarkable results in autonomous driving, robot control, and financial transactions. For instance, Kathirgamanathan et al. (2021)³⁰ argued that, owing to its entropy regularization, the SAC algorithm can strike a balance between exploration and exploitation and is an effective tool for continuous action space problems. However, there has been relatively little research on the application of DRL in enterprise management. Existing studies mainly focus on production scheduling, supply chain management, and related fields, while research on enterprise strategic decision-making, especially the enterprise fission process, is still at an initial stage. Mohamadi et al. (2024)³¹ discussed the application of DRL in supply chain optimization and found that it remarkably improved the supply chain’s efficiency and response speed. Anbazhagan & Mugelan (2024)³² applied the SAC algorithm to resource optimization in NB-IoT (Narrowband Internet of Things) networks, designing efficient resource allocation strategies with RL to improve network performance and address resource bottleneck issues. Zhang et al. (2024)³³ proposed a new framework combining the SAC algorithm, disjoint graph embeddings, and autoencoders to solve workshop scheduling problems, significantly optimizing scheduling efficiency and enterprise production management processes.

To fill the research gaps mentioned above, this study explores the optimization of enterprise fission paths with the following innovations. First, short-term financial and market indices and long-term dynamic capability indices are considered together to assess the impact of enterprise fission. Second, the advantage of the SAC algorithm in complex decision-making environments is used to optimize the enterprise fission path while balancing exploration and exploitation. The results indicate that the fission path optimized by DRL not only improves resource allocation efficiency and market response speed but also effectively enhances the dynamic capabilities of enterprises.

Research methodology

Data source and preprocessing

This study utilizes enterprise finance and marketing data sourced from two publicly available datasets maintained by the National Bureau of Statistics (NBS). The two datasets are “Annual Financial Database of Industrial Enterprises Above Designated Size” and the “Microdata Module of Enterprise Economic Census”. The financial database contains core financial indices, including balance sheets, income statements, and cash flow statements of industrial enterprises, with specific fields covering total assets, net profit, total liabilities, and cash flow. The economic census microdata module provides corporate marketing indices such as sales revenue, market share, industry classification, and regional distribution information. These two comprehensive data panels are integrated through unified social credit codes, establishing a complete dataset that encompasses both financial performance and market behavior dimensions of enterprises. Data needs to be cleaned and pre-processed before use, as presented in Fig. 1:

First, the data are cleaned and preprocessed. Data cleaning is a vital step to ensure data quality and mainly involves dealing with missing values and outliers. For variables with only a small amount of missing data, mean imputation is used to preserve data integrity. Samples with more severe missingness are excluded to avoid adverse effects on the analysis results. Outlier detection is another important part of data cleaning. Another key step in data preprocessing is data standardization. Because the indices are measured on different scales, all continuous variables are standardized to eliminate the influence of scale differences on model training. Specifically, the Z-score standardization method is adopted: each variable has its mean subtracted and is then divided by its standard deviation (SD). The equation is as follows:

$$X' = \frac{X-\mu}{\sigma}. \tag{1}$$

X refers to the raw data, µ represents the mean, and σ stands for the SD. After standardization, each variable has a mean of 0 and an SD of 1, which eliminates the influence of scale and facilitates subsequent model training.
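As a minimal sketch of the two preprocessing steps described above (mean imputation followed by Z-score standardization, Eq. (1)), the following Python snippet uses only the standard library. The sample values and the use of the population SD are assumptions made for illustration; the source does not state which SD estimator is used.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def z_score(values):
    """Apply Eq. (1): subtract the mean, then divide by the SD.

    The population SD is used here (an assumption).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical total-assets column with one missing value.
total_assets = [120.0, None, 95.0, 140.0, 105.0]
imputed = impute_mean(total_assets)
standardized = z_score(imputed)
```

After this transformation the column has mean 0 and SD 1, as stated in the text.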

The data screening process follows three sequential steps to ensure sample quality. First, industry classification serves as the primary criterion to select enterprises with typical strategic adjustment needs, focusing specifically on the manufacturing and information technology service sectors. Second, data completeness and continuity requirements eliminate enterprise samples with over 15% missing financial indices or incomplete three-year consecutive data records. Third, enterprise size parameters (measured by employee count and annual revenue) exclude micro-enterprises and individual businesses, guaranteeing that all research samples possess both practical strategic adjustment foundations and data traceability. This rigorous screening procedure ultimately produces a high-quality enterprise sample set spanning multiple industries and regions, providing reliable data support for model training and experimental analysis.
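The three screening steps can be sketched as a sequence of filters. Only the 15% missing-data limit and the three-year continuity requirement come from the text; the sector labels, the size thresholds, and the record layout below are hypothetical placeholders.

```python
MAX_MISSING_RATIO = 0.15                            # stated in the text
REQUIRED_CONSECUTIVE_YEARS = 3                      # stated in the text
TARGET_SECTORS = {"manufacturing", "it_services"}   # hypothetical labels
MIN_EMPLOYEES = 20                                  # hypothetical threshold
MIN_ANNUAL_REVENUE = 3_000_000                      # hypothetical threshold


def has_consecutive_years(years, run_length=REQUIRED_CONSECUTIVE_YEARS):
    """Return True if `years` contains `run_length` consecutive calendar years."""
    unique = sorted(set(years))
    return any(
        unique[i + run_length - 1] - unique[i] == run_length - 1
        for i in range(len(unique) - run_length + 1)
    )


def passes_screening(record):
    """Apply the industry, completeness, and size filters in that order."""
    if record["sector"] not in TARGET_SECTORS:
        return False
    if record["missing_ratio"] > MAX_MISSING_RATIO:
        return False
    if not has_consecutive_years(record["years"]):
        return False
    return (record["employees"] >= MIN_EMPLOYEES
            and record["annual_revenue"] >= MIN_ANNUAL_REVENUE)


# Hypothetical enterprise records.
enterprises = [
    {"id": "A", "sector": "manufacturing", "missing_ratio": 0.05,
     "years": [2019, 2020, 2021], "employees": 300, "annual_revenue": 5e7},
    {"id": "B", "sector": "retail", "missing_ratio": 0.00,
     "years": [2019, 2020, 2021], "employees": 300, "annual_revenue": 5e7},
    {"id": "C", "sector": "it_services", "missing_ratio": 0.20,
     "years": [2019, 2020, 2021], "employees": 300, "annual_revenue": 5e7},
    {"id": "D", "sector": "it_services", "missing_ratio": 0.10,
     "years": [2018, 2020, 2021], "employees": 300, "annual_revenue": 5e7},
    {"id": "E", "sector": "manufacturing", "missing_ratio": 0.00,
     "years": [2019, 2020, 2021], "employees": 5, "annual_revenue": 1e5},
]
selected = [e["id"] for e in enterprises if passes_screening(e)]
```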

After the above data cleaning and preprocessing, a high-quality dataset containing key financial indices (such as total assets, total liabilities, net profit, etc.) and marketing indices (such as sales revenue, market share, etc.) is obtained. These indices reflect the enterprises’ short-term financial situation and market performance. Furthermore, these offer critical data to evaluate the long-term dynamic capabilities of enterprises (such as market agility, innovation, and resource integration), thus providing a solid foundation for the construction and training of subsequent models.

Additionally, exploratory analysis of the data is performed to understand its basic distribution and characteristics. The distribution of each index is analyzed, and the potential patterns and rules in the data are identified. Finally, the processed data is divided into training and test sets. The training set is used to train the model, and the test set is utilized to evaluate the model’s performance. The method of random sampling is adopted to ensure the representativeness and balance of training and test sets. The time series characteristics are also taken into account during the data partitioning process to ensure that the training and test sets have continuity in time to reflect the real enterprise operating environment.

Model design and construction

A multi-level reward function is designed to optimize the enterprise fission path and enhance the dynamic capability, and a decision model is constructed based on the SAC algorithm. The model not only covers the short-term financial and market indices of the enterprise but also integrates the long-term dynamic capability indices into it, to realize the enterprise fission path’s comprehensive optimization. The model is displayed in Fig. 2:

Based on the above design, the SAC algorithm’s application in enterprise fission decisions not only improves the efficiency of enterprise resource allocation and market response speed but also offers strong support for the construction of dynamic capability. Corporate dynamic capability is defined as an enterprise’s comprehensive ability to achieve resource reconfiguration and environmental adaptation through strategic adjustments during fission processes, encompassing three specific dimensions. (1) Innovation capability is quantified through annual patent applications and research and development (R&D) investment ratio; (2) Market agility uses the reciprocal of product iteration cycles and market strategy adjustment response time as evaluation indices; (3) Resource integration capability is characterized by supply chain coordination efficiency, calculated through weighted measures of on-time order delivery rate and cross-departmental resource sharing rate. Resource allocation efficiency specifically refers to the quantitative performance of asset portfolio optimization during fission processes. It is calculated as the ratio between effective output resources (the sum of core business revenue and R&D achievement transformation) and total resource input (the sum of net fixed assets and operating costs). This index reflects the Pareto improvement level of resource utilization. Market agility focuses on an enterprise’s dynamic adaptability to market demand changes, measured through two sub-indices. The first is market information perception speed, representing the average time from market signal identification to strategy formulation. The second is strategy implementation efficiency, defined as an inverse proportional function of both cost consumption and period required for new market strategy execution. These parameters are transformed into standardized feature vectors for state space input in the model. 
All values undergo industry-benchmarked normalization processing to ensure validity in cross-enterprise comparisons.
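The resource allocation efficiency ratio defined above can be written out directly. The sketch below follows the stated definition (effective output over total input). The numeric inputs are invented for illustration, and treating the "improvement" as a relative change between pre- and post-fission values is an assumption.

```python
def resource_allocation_efficiency(core_revenue, rd_transformation,
                                   net_fixed_assets, operating_costs):
    """Ratio of effective output resources to total resource input."""
    total_input = net_fixed_assets + operating_costs
    if total_input <= 0:
        raise ValueError("total resource input must be positive")
    return (core_revenue + rd_transformation) / total_input


def improvement_rate(before, after):
    """Relative change of an index (assumed definition of 'improvement')."""
    return (after - before) / before


# Hypothetical pre- and post-fission figures in the same monetary unit.
eff_before = resource_allocation_efficiency(800.0, 50.0, 600.0, 400.0)
eff_after = resource_allocation_efficiency(900.0, 120.0, 600.0, 400.0)
rate = improvement_rate(eff_before, eff_after)
```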

The design of the reward function is the core of RL and directly affects the agent’s learning direction and effect. The reward function designed here has two levels, short-term and long-term, which together cover financial, market, and dynamic capability indices. The first level consists of short-term financial and market indices. These indices reflect the current operational status of the enterprise and comprise net profit ($R_{\text{profit}}$), cash flow ($R_{\text{cash}}$), and market share ($R_{\text{market share}}$). The short-term reward function is designed to motivate enterprises to maintain good financial and market performance during the fission process. It is defined as:

$$R_{\text{short term}} = \alpha_{1} R_{\text{profit}} + \alpha_{2} R_{\text{cash}} + \alpha_{3} R_{\text{market share}}. \tag{2}$$

α₁, α₂, and α₃ are weight parameters, which are adjusted experimentally to optimize the model’s performance. The second level consists of the long-term dynamic capability indices. Dynamic capabilities include innovation, resource integration, and market agility, which are the keys for enterprises to maintain competitive advantages in the ever-changing market environment. The number of patents ($R_{\text{innovation}}$) is selected as a measure of innovation capability, market response speed ($R_{\text{agility}}$) as a measure of market agility, and supply chain synergy efficiency ($R_{\text{resource integration}}$) as a measure of resource integration capability. The long-term reward function is designed to motivate enterprises to enhance their dynamic capabilities through fission. It can be written as Eq. (3):

$$R_{\text{long term}} = \beta_{1} R_{\text{innovation}} + \beta_{2} R_{\text{agility}} + \beta_{3} R_{\text{resource integration}}. \tag{3}$$

β₁, β₂, and β₃ represent the weight parameters, and the optimal values are determined by experiment. The comprehensive reward function combines short- and long-term indices and is defined as:

$$R = \lambda_{1} R_{\text{short term}} + \lambda_{2} R_{\text{long term}}. \tag{4}$$

λ₁ and λ₂ refer to the comprehensive weights, and the experiment determines the optimal values. Through this multilevel reward function design, the short- and long-term effects of enterprise fission can be comprehensively evaluated.
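A minimal sketch of the multi-level reward in Eqs. (2) to (4) is given below. The weight values are placeholders chosen only so that each group sums to 1. The source states that the actual weights are determined experimentally and does not report them.

```python
# Hypothetical weights; the source does not report the tuned values.
ALPHA = (0.4, 0.3, 0.3)    # profit, cash flow, market share      (Eq. 2)
BETA = (0.4, 0.3, 0.3)     # innovation, agility, resource integ. (Eq. 3)
LAMBDA = (0.5, 0.5)        # short-term vs. long-term             (Eq. 4)


def weighted_sum(weights, terms):
    """Linear combination of reward terms with the given weights."""
    if len(weights) != len(terms):
        raise ValueError("weights and terms must have the same length")
    return sum(w * t for w, t in zip(weights, terms))


def total_reward(short_terms, long_terms):
    """Combine short- and long-term rewards as in Eq. (4).

    short_terms: (R_profit, R_cash, R_market_share)
    long_terms:  (R_innovation, R_agility, R_resource_integration)
    All terms are assumed to be standardized beforehand.
    """
    r_short = weighted_sum(ALPHA, short_terms)
    r_long = weighted_sum(BETA, long_terms)
    return LAMBDA[0] * r_short + LAMBDA[1] * r_long


example = total_reward((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```

Keeping the two levels as separate weighted sums makes it straightforward to vary λ₁ and λ₂ and observe how the learned policy trades short-term results against long-term capability gains.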

It is crucial to define reasonable state and action spaces when constructing the RL model. The state space reflects the current state of the enterprise, while the action space contains the fission strategy that the enterprise can adopt. The state space includes the enterprise’s current financial position, market performance, and dynamic capability level. Specific variables include total assets ($\text{total}_{\text{assets}}$), total liabilities ($\text{total}_{\text{liabilities}}$), net profit ($\text{net}_{\text{profit}}$), cash flow ($\text{cash}_{\text{flow}}$), market share ($\text{market}_{\text{share}}$), innovation index ($\text{innovation}_{\text{index}}$), market agility index ($\text{agility}_{\text{index}}$), and resource integration index ($\text{resource}_{\text{integration}_{\text{index}}}$). The representation of state space reads:

$$S=\{\text{total}_{\text{assets}}, \text{total}_{\text{liabilities}}, \text{net}_{\text{profit}}, \text{cash}_{\text{flow}}, \text{market}_{\text{share}}, \text{innovation}_{\text{index}}, \text{agility}_{\text{index}}, \text{resource}_{\text{integration}_{\text{index}}}\}. \tag{5}$$

The action space involves various strategic choices for enterprise fission, such as spin-off ($\text{spin}_{\text{off}}$), split-off ($\text{split}_{\text{off}}$), etc. The specific parameters of each strategy, such as proportion, time, etc., are also included in the action space. The representation of action space is as follows:

$$A=\{\text{spin}_{\text{off}}, \text{split}_{\text{off}}, \text{resource}_{\text{allocation}}, \text{market}_{\text{strategy}}\}. \tag{6}$$
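One possible encoding of the state and action spaces in Eqs. (5) and (6) is sketched below. The field names mirror the variables in the text. Representing each action component as a continuous intensity in [0, 1] is an assumption made here because SAC operates on continuous actions; the source does not specify the encoding.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class FissionState:
    """State vector of Eq. (5); values are assumed to be standardized."""
    total_assets: float
    total_liabilities: float
    net_profit: float
    cash_flow: float
    market_share: float
    innovation_index: float
    agility_index: float
    resource_integration_index: float

    def as_vector(self):
        return list(astuple(self))


@dataclass(frozen=True)
class FissionAction:
    """Action vector of Eq. (6); each component is a hypothetical
    intensity in [0, 1] (for example, the proportion of assets spun off)."""
    spin_off: float
    split_off: float
    resource_allocation: float
    market_strategy: float

    def as_vector(self):
        # Clip to the assumed valid range before passing to the environment.
        return [min(1.0, max(0.0, x)) for x in astuple(self)]


state = FissionState(0.3, -0.1, 0.8, 0.2, 0.5, 1.1, -0.4, 0.0)
action = FissionAction(0.25, 0.0, 1.4, -0.2)
```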

To clarify the logical architecture of the enterprise fission path model, this study develops an enterprise fission decision model based on the SAC algorithm, as illustrated in Fig. 3.

Figure 3 presents the SAC-based enterprise fission decision model, which comprises three core components: state space, action space, and reward function. The state space incorporates financial indices (cash flow, total liabilities, net profit, and total assets), market performance (market share and sales revenue), and dynamic capabilities (innovation index, market agility index, and resource integration index). The action space encompasses the fission strategies, namely spin-off strategies, split-off strategies, market strategy adjustments, and resource reallocation. The reward function evaluates fission decision outcomes through both short-term and long-term indices. Short-term indices include net profit growth rate, cash flow stability, and market share improvement. Long-term indices encompass patent quantity growth, market response cycle reduction, and supply chain coordination efficiency enhancement. The model operates in three sequential phases: decision, implementation, and evaluation. In the fission decision phase, the model takes the state space data (financial status, market performance, and dynamic capabilities) together with the candidate strategies in the action space (such as split-off strategies and resource reallocation) and makes decisions through the SAC algorithm. This process involves the interaction between the policy and value networks, as well as entropy regularization and experience replay, to optimize the decision-making policy. In the fission implementation phase, the chosen fission strategies and resource allocations are executed. Finally, in the fission evaluation phase, the execution outcomes are scored by the reward function, which integrates short-term and long-term indices to quantify fission effects and feeds the result back to the decision phase. This closed loop optimizes the decision-making process via the SAC algorithm to enhance resource allocation efficiency, market response speed, and dynamic capability development after enterprise fission, with the aim of generating sustainable competitive advantages for enterprises.

Application of the SAC algorithm in enterprise fission decision

The SAC algorithm is an RL algorithm based on the policy gradient that balances exploitation and exploration well. It increases the randomness of the policy by introducing an entropy regularization term, thus enhancing the model’s exploration ability. Its optimization goal is to maximize the weighted sum of reward and policy entropy:

$$J(\pi)=\mathbb{E}_{(s,a)\sim\rho_{\pi}}\left[\sum_{t=0}^{T}\gamma^{t}\left(r(s_{t},a_{t})+\alpha\,\mathcal{H}\left(\pi(\cdot\mid s_{t})\right)\right)\right]. \tag{7}$$

γ represents the discount factor, α indicates the entropy regularization coefficient, and $\mathcal{H}$ refers to the entropy of the policy.
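To make Eq. (7) concrete, the sketch below evaluates the entropy-regularized return for a single sampled trajectory. The expectation in Eq. (7) would be an average of this quantity over many trajectories. The one-dimensional Gaussian policy used for the entropy term is an illustrative assumption, not a detail given in the source.

```python
import math


def gaussian_entropy(std):
    """Differential entropy of a 1-D Gaussian policy with the given SD."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def soft_return(rewards, entropies, gamma=0.99, alpha=0.2):
    """Sum over t of gamma^t * (r_t + alpha * H_t) for one trajectory."""
    if len(rewards) != len(entropies):
        raise ValueError("rewards and entropies must have the same length")
    return sum(
        gamma ** t * (r + alpha * h)
        for t, (r, h) in enumerate(zip(rewards, entropies))
    )


# Hypothetical three-step trajectory with a fixed policy SD of 0.5.
rewards = [1.0, 0.5, 0.8]
entropies = [gaussian_entropy(0.5)] * 3
value = soft_return(rewards, entropies)
```

A larger α raises the contribution of the entropy term and therefore favors more random, exploratory policies.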

In the model training process, the parameters of the policy network and the value network are initialized first. The policy network generates the decision policy of the enterprise, and the value network estimates the long-term value of a decision. During training, the interaction data of the agent are stored in the experience replay pool, and the parameters of the policy network are updated by the policy gradient method to maximize the expected cumulative reward. At the same time, the parameters of the value network are updated through the Bellman equation so that the network approximates the true value function.

To reduce the estimation error, a double value-network architecture is adopted: two independent value networks are used for estimation, and the minimum of their outputs is taken as the target value. In addition, a target network is introduced to stabilize the training process, and its parameters gradually track those of the main network through a soft update strategy.
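The two stabilization mechanisms described above can be sketched with plain Python numbers standing in for network outputs and parameters. The target follows the standard SAC formulation, in which the minimum of the two value estimates is combined with the entropy term. The soft-update rate `tau` is not given in the source, so the value 0.005 is a placeholder.

```python
def soft_q_target(reward, done, q1_next, q2_next, log_prob_next,
                  gamma=0.99, alpha=0.2):
    """Bellman target using the minimum of two value estimates.

    q1_next, q2_next: target-network estimates for the next state-action pair.
    log_prob_next:    log-probability of the next action under the policy.
    """
    min_q = min(q1_next, q2_next)
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * (min_q - alpha * log_prob_next)


def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward the online one."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]


# Hypothetical scalar values standing in for network outputs and weights.
y = soft_q_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0,
                  log_prob_next=-1.0, gamma=0.5, alpha=0.2)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

Taking the minimum counteracts the tendency of a single value network to overestimate returns, and a small `tau` keeps the target slowly moving.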

The SAC algorithm’s training process is shown in Fig. 4:

The SAC algorithm first initializes the parameters of the policy and value networks randomly in the training process to ensure the diversity of initial states. Next, the initial state is sampled from the enterprise’s current financial position and dynamic capability level to provide comprehensive information about its current state. The policy network generates fission decision strategies based on this state, and then performs corresponding actions to simulate the decision-making process of the enterprise in the market.

The reward for each action is calculated according to the pre-designed multi-level reward function, which takes into account short-term financial performance and long-term dynamic capability gains. In this way, the model can balance short-term gains and long-term development when making decisions. After each action is performed, the new state and the reward earned are recorded and stored in the experience replay pool, ensuring that the model can learn from historical decisions.

Samples are randomly drawn from the experience replay pool for network updating. The value network’s target is calculated by the Bellman equation, and its parameters are updated to better estimate the possible future gains. At the same time, the policy gradient method is used to optimize the parameters of the policy network, which makes the generated decision policy more efficient.

At the end of each iteration, the current policy is checked against a predetermined convergence condition. If the condition is met, training is complete; otherwise, the procedure returns to the state-sampling step and training continues. Through repeated iterations, the SAC algorithm continuously optimizes the policy and value networks, enabling enterprises to make more flexible and efficient decisions in a complex and changeable market environment. This in turn can improve resource allocation efficiency and market response speed, thus enhancing competitiveness and sustainable development ability.
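The experience replay step in the loop above can be illustrated with a fixed-capacity buffer that overwrites its oldest entries and returns uniformly sampled mini-batches. This is a generic sketch rather than the authors' implementation. The default capacity and batch size follow the values reported in the experimental setup, and the transition layout is an assumption.

```python
import random


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self._storage = []
        self._next = 0  # index that the next write will use

    def add(self, transition):
        if len(self._storage) < self.capacity:
            self._storage.append(transition)
        else:
            # Buffer is full: overwrite the oldest transition.
            self._storage[self._next] = transition
        self._next = (self._next + 1) % self.capacity

    def sample(self, batch_size=64):
        """Draw a mini-batch uniformly at random without replacement."""
        return random.sample(self._storage, batch_size)

    def __len__(self):
        return len(self._storage)


# Small demonstration with integer placeholders instead of real transitions.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(step)
batch = buffer.sample(batch_size=2)
```

Sampling at random from past experience breaks the correlation between consecutive decisions, which is why the update step draws from the pool instead of using only the latest transition.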

Experimental setup

SAC is an RL algorithm based on the policy gradient that balances exploration and exploitation well. To verify its effect on enterprise fission decisions, enterprise financial and marketing data disclosed by the NBS are selected. First, the original data are cleaned, which includes handling missing values and outliers and standardizing the variables to ensure data integrity and consistency. After data preprocessing, the SAC algorithm is used to train the model. The key parameters are set as follows. Entropy regularization coefficient (α): 0.2, to balance exploration and exploitation. Discount factor (γ): 0.99, reflecting the importance of future rewards. Learning rate: 0.001 for the policy network and 0.002 for the value network. Batch size: 64 samples per update. Experience replay pool size: 100,000 experience samples. The effect of the model on resource allocation efficiency, market response speed, and dynamic capability improvement is evaluated by comparing the changes in various indices before and after fission. The main evaluation indices are as follows. Resource allocation efficiency measures the improvement of enterprise resource utilization. Market response speed measures how quickly an enterprise responds to changes in the market. The innovation capability index measures the innovation capability of an enterprise. The market agility index measures an enterprise’s ability to adapt to the market. The resource integration index measures the ability of an enterprise to integrate resources. After multiple rounds of training and optimization, the SAC-based enterprise fission decision model generates an optimized fission path.
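For reference, the reported hyperparameters can be gathered in a single configuration object. The values below are taken from the text; the class and field names, and the helper method, are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SACConfig:
    """Training settings reported in the experimental setup."""
    alpha: float = 0.2              # entropy regularization coefficient
    gamma: float = 0.99             # discount factor
    policy_lr: float = 0.001        # learning rate of the policy network
    value_lr: float = 0.002         # learning rate of the value network
    batch_size: int = 64            # samples per update
    replay_capacity: int = 100_000  # experience replay pool size

    def ready_to_update(self, stored_transitions):
        """An update needs at least one full batch in the replay pool."""
        return stored_transitions >= self.batch_size


config = SACConfig()
```

Freezing the dataclass prevents accidental changes to the settings during a run, which helps keep the comparison between algorithms on equal terms.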

Results and discussion

Experimental results based on the SAC algorithm

After several rounds of training and testing, the fission path optimized by the SAC algorithm exhibits remarkable results, as shown in Fig. 5.

Figure 5 shows that after fission, the resource allocation efficiency of enterprises increases by 20.4% on average. This indicates that by optimizing fission decisions, enterprises are able to allocate resources more rationally and reduce waste and redundancy. Post-fission enterprises improve their market response speed by 25.2%. This result reveals that the optimized fission path enables enterprises to respond more quickly to market changes, seize opportunities, and enhance market competitiveness. In the process of fission, the enterprises’ innovation capability index, market agility, and resource integration improve by 15.4%, 12.3%, and 10.5%, respectively.
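The percentages above are before/after comparisons. The paper does not give the formula, so the sketch below assumes the usual relative-change definition; the function name and the sample index values are hypothetical and chosen only to illustrate the arithmetic.

```python
def improvement_rate(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (assumed definition)."""
    if before == 0:
        raise ValueError("baseline value must be non-zero")
    return (after - before) / abs(before) * 100.0


# Hypothetical index values, not taken from the paper's dataset.
baseline_efficiency = 0.50
post_fission_efficiency = 0.602
rate = improvement_rate(baseline_efficiency, post_fission_efficiency)
print(f"{rate:.1f}%")  # 20.4%
```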

Comparative analysis with other algorithms

To comprehensively evaluate the SAC algorithm’s effect on enterprise fission decisions, the SAC algorithm is compared with other commonly used RL algorithms: Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Deep Deterministic Policy Gradient (DDPG). The experiment employs the same dataset and experimental settings for every algorithm and compares their performance in resource allocation efficiency, market response speed, and dynamic capability improvement through multiple rounds of training and testing. The changes in each index before and after fission are recorded for each algorithm, and the specific results are depicted in Figs. 6, 7, 8, 9 and 10.

Figure 6 indicates that in terms of resource allocation efficiency, the SAC algorithm shows an improvement rate of 20.4%, markedly outperforming the other algorithms. The DQN algorithm has the lowest improvement rate, at only 8.4%; the PPO algorithm achieves 12.1%; and the DDPG algorithm reaches 14.9%, slightly higher than PPO but still considerably lower than SAC. The SAC algorithm’s superior performance in resource allocation efficiency is primarily due to its entropy regularization feature, which enables the model to achieve a better balance between exploration and exploitation and thus allocate resources more efficiently.

Figure 7 shows that in terms of market response speed, the SAC algorithm exhibits an improvement rate of 25.2%, exceeding the other algorithms. The DDPG, PPO, and DQN algorithms have improvement rates of 20.4%, 16.4%, and 11.7%, respectively. This demonstrates that the SAC algorithm can respond more quickly to the complex and variable market environment during the enterprise fission process, helping enterprises gain an advantage in a competitive market. This is because the SAC algorithm’s entropy regularization feature allows it to learn and adapt to market changes more quickly in a complicated decision-making environment, thereby enhancing the enterprise’s market response speed.

Figure 8 illustrates that regarding the innovation capability index, the SAC algorithm achieves an improvement rate of 15.4%, higher than the other algorithms. The DQN, PPO, and DDPG algorithms show improvement rates of only 6.9%, 10.5%, and 12.8%, respectively. The outstanding performance of the SAC algorithm in the innovation capability index is due to its ability to explore new strategies and paths more effectively, thus stimulating the innovation potential of enterprises. This is crucial for the continuous development and competitiveness enhancement of enterprises after fission. By effectively exploring and utilizing resources, the SAC algorithm aids enterprises in identifying innovation paths and strategies during the fission process, thereby bolstering overall innovation capability.

Figure 9 depicts that in terms of market agility, the SAC algorithm achieves the highest improvement rate at 12.3%, slightly surpassing the other algorithms. The DQN, PPO, and DDPG algorithms demonstrate improvement rates of 6.7%, 9.5%, and 11.3%, respectively. Although the SAC algorithm’s lead in market agility is smaller than in the other indices, it still outperforms the other algorithms. This indicates that the proposed algorithm can maintain a high level of market agility when handling the complex decision-making environment of enterprise fission, helping companies adapt quickly to market changes.

Figure 10 reveals that regarding resource integration capability, the SAC algorithm attains an improvement rate of 10.5%, also superior to the other algorithms. The DDPG, PPO, and DQN algorithms exhibit improvement rates of 9.8%, 9.4%, and 7.0%, respectively. The performance of the SAC algorithm in resource integration capability demonstrates its ability to effectively integrate and utilize various enterprise resources, facilitating resource reallocation and integration after fission and providing strong support for sustainable development.

According to the above study, it can be observed that the SAC algorithm excels in innovation capability, resource allocation efficiency, market agility, market response speed, and resource integration capability, markedly outperforming other RL algorithms. Its entropy regularization feature enables the model to balance exploration and exploitation effectively, making it highly performant in the complex decision-making environment of enterprise fission. By applying the SAC algorithm, enterprises not only enhance key performance indices but also optimize resource allocation and utilization efficiency during the fission process. By optimizing strategy selection and decision-making, the SAC algorithm guides enterprises to identify optimal pathways and methods during fission, thus bolstering overall competitiveness and market performance. Additionally, the SAC algorithm stands out in enhancing enterprise innovation, market agility, and resource integration capabilities, prominently boosting the dynamic capabilities of enterprises. Enhancing dynamic capabilities is pivotal for maintaining competitiveness and achieving sustainable growth in a rapidly changing market environment.
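To make the comparison concrete, the improvement rates reported for Figs. 6 to 10 can be tabulated and the gap between SAC and the strongest baseline computed for each index. The sketch below uses only the percentages quoted in the text; the dictionary layout and the helper `margin_over_best_baseline` are illustrative assumptions.

```python
# Improvement rates (%) as reported in the text for Figs. 6-10.
REPORTED = {
    "resource allocation efficiency": {"SAC": 20.4, "DDPG": 14.9, "PPO": 12.1, "DQN": 8.4},
    "market response speed": {"SAC": 25.2, "DDPG": 20.4, "PPO": 16.4, "DQN": 11.7},
    "innovation capability": {"SAC": 15.4, "DDPG": 12.8, "PPO": 10.5, "DQN": 6.9},
    "market agility": {"SAC": 12.3, "DDPG": 11.3, "PPO": 9.5, "DQN": 6.7},
    "resource integration": {"SAC": 10.5, "DDPG": 9.8, "PPO": 9.4, "DQN": 7.0},
}


def margin_over_best_baseline(rates, method="SAC"):
    """Return the strongest competing algorithm and the gap in percentage points."""
    best_name, best_rate = max(
        ((name, rate) for name, rate in rates.items() if name != method),
        key=lambda item: item[1],
    )
    return best_name, round(rates[method] - best_rate, 1)


margins = {index: margin_over_best_baseline(rates) for index, rates in REPORTED.items()}
for index, (baseline, gap) in margins.items():
    print(f"{index}: SAC leads {baseline} by {gap} percentage points")
```

On these figures DDPG is the closest baseline for every index, and the SAC lead ranges from 0.7 percentage points (resource integration) to 5.5 percentage points (resource allocation efficiency).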

Discussion

Through the analysis of the results presented in this study, it can be found that the proposed approach of optimizing enterprise fission paths based on the SAC algorithm is effective. This approach significantly enhances key indices such as resource allocation efficiency, market response speed, and the construction of dynamic capabilities. The experimental results indicate that the SAC algorithm performs exceptionally well in balancing exploration and exploitation. This results in an average increase of 20.4% in resource allocation efficiency, and a 25.2% improvement in market response speed. At the same time, improvements of 15.4%, 12.3%, and 10.5% are achieved in innovation capability, market agility, and resource integration capability, respectively. Compared to other RL algorithms, the SAC algorithm stands out in complex decision-making environments, further validating its applicability and advantages in enterprise strategic management. As proposed in the studies by Song et al. (2022)³⁴ and Zhao et al. (2024)³⁵, the SAC algorithm demonstrates remarkable advantages in efficiency, further supporting its applicability in complex optimization problems. This study expands the application of DRL in the field of enterprise management. Meanwhile, it provides new ideas and practical references for optimizing enterprise fission paths and enhancing dynamic capabilities, holding significant theoretical and practical implications.

Further analysis of the advantages of the SAC algorithm reveals that, as the core RL method of this study, its unique strengths lie in its ability to efficiently handle continuous action spaces and complex decision-making environments. The entropy regularization mechanism in SAC ensures the diversity of policy exploration, which is particularly crucial in the highly uncertain scenario of enterprise fission. This aligns with the views of Hao et al. (2022)³⁶. By introducing entropy regularization, the algorithm can discover potential optimal paths that might otherwise go unnoticed without diverse exploration. Other algorithms, such as PPO and DDPG, show certain limitations under the research objectives. While PPO performs well in terms of stability and efficiency, it falls short in the diversity of strategy exploration, which may limit its ability to discover innovative paths in enterprise fission. DDPG, although effective in continuous action spaces, is prone to overfitting and lacks the robustness of the entropy regularization mechanism, making it insufficient to cope with the complexity of dynamic decision-making environments. In contrast, SAC, with its dual-network architecture and entropy mechanism, provides a more balanced and adaptable solution, which is also confirmed by the experimental results.
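For reference, the entropy-regularized objective behind this discussion is commonly written as below in the SAC literature. This is the standard maximum-entropy formulation rather than a formula quoted from this paper, and the symbols $\rho_\pi$ (state-action distribution under policy $\pi$), $r$ (reward), and $\mathcal{H}$ (entropy) are assumed notation; $\alpha$ is the entropy regularization coefficient, set to 0.2 in the experimental setup.

$$
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s_t)}\left[ \log \pi(a \mid s_t) \right]
$$

With $\alpha = 0$ the entropy term vanishes and the objective reduces to the ordinary expected return; a larger $\alpha$ gives more weight to policy diversity, which is the exploration behavior the paragraph above attributes to SAC.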

Despite the results of this study highlighting the effectiveness of the SAC algorithm, further exploration of its applicability in different industries and economic environments is necessary. Future research could attempt to combine SAC with other RL algorithms or domain-specific heuristic methods to enhance performance. Additionally, by introducing more globally representative datasets, a more comprehensive assessment of the applicability of the SAC algorithm can be achieved.

Conclusion

This study investigates the path analysis and dynamic capability construction of enterprise fission based on DRL, especially the application of the SAC algorithm in this process. During the research process, the publicly available dataset of NBS is first employed to analyze and preprocess the enterprises’ financial and marketing data. Subsequently, a multi-level reward function is constructed, and the enterprise fission scenario is modeled as a complex decision environment to take full advantage of the SAC algorithm. By introducing the SAC algorithm into the enterprise fission decision process, significant improvements are demonstrated in multiple dimensions such as resource allocation efficiency, market response speed, market agility, innovation, and resource integration. This innovative application not only provides a new idea for the practice of DRL in enterprise management but also furnishes an important reference for the study in related fields. Although this study has achieved certain results, there are still some limitations. For example, the selection of datasets mainly focuses on enterprise finance and marketing data in China, which may have limited the universality of the study to some extent. Consequently, there are plans to apply the SAC algorithm in other countries and industries in the future to verify its effectiveness in diverse economic environments and industry characteristics.

Data availability

All data generated or analysed during this study are included in this published article and its supplementary information files.

References

1. Pearson, R. Preface to the special issue: The emergence of private fusion enterprises. J. Fusion Energy. 42 (2), 47 (2023).
2. Unterrainer, C. et al. Organizational and psychological features of successful democratic enterprises: A systematic review of qualitative research. Front. Psychol. 13, 947559 (2022).
3. Su, J., Zhang, Y. & Wu, X. How market pressures and organizational readiness drive digital marketing adoption strategies’ evolution in small and medium enterprises. Technol. Forecast. Soc. Chang. 193, 122655 (2023).
4. Hu, Z. Research on influencing factors of cross-border e-commerce enterprise competitiveness. Asian Bus. Res. 7 (2), 27 (2022).
5. Xin, D. et al. How to build a business ecosystem for spin-off enterprises supported by parent enterprises? A comparative case study based on resource orchestration theory. Foreign Economics & Management 44 (04), 34–50 (2022).
6. Yang, J. & Zeng, X. Research on the influence of strategic adjustment on enterprise performance from the perspective of mixed reform of state-owned enterprises: take Shanxi Fen wine as an example. Front. Bus. Econ. Manage. 14 (3), 155–161 (2024).
7. Li, C. et al. Economic policy uncertainty and enterprise strategic change: evidence from China. Strategic Change. 32 (4–5), 125–137 (2023).
8. Shen, Z. et al. Study on the functional mechanism of enterprise internationalization strategy adjustment in the digital era. IOP Conference Series: Materials Science and Engineering 780 (7), 072002 (2020).
9. Mankowitz, D. J. et al. Faster sorting algorithms discovered using deep reinforcement learning. Nature 618 (7964), 257–263 (2023).
10. Le, H. et al. CodeRL: Mastering code generation through pretrained models and deep reinforcement learning. Adv. Neural. Inf. Process. Syst. 35, 21314–21328 (2022).
11. Panzer, M. & Bender, B. Deep reinforcement learning in production systems: A systematic literature review. Int. J. Prod. Res. 60 (13), 4316–4341 (2022).
12. Huang, S. et al. CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms. J. Mach. Learn. Res. 23 (274), 1–18 (2022).
13. Wurman, P. R. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602 (7896), 223–228 (2022).
14. Ju, H. et al. Transferring policy of deep reinforcement learning from simulation to reality for robotics. Nat. Mach. Intell. 4 (12), 1077–1087 (2022).
15. Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602 (7897), 414–419 (2022).
16. Sun, W. et al. High robustness energy management strategy of hybrid electric vehicle based on improved soft actor-critic deep reinforcement learning. Energy 258, 124806 (2022).
17. Uhlenbruck, K., Meyer, K. E. & Hitt, M. A. Organizational transformation in transition economies: resource-based and organizational learning perspectives. J. Manage. Stud. 40 (2), 257–282 (2003).
18. Kirtley, J. & O’Mahony, S. What is a pivot? Explaining when and how entrepreneurial firms decide to make strategic change and pivot. Strateg. Manag. J. 44 (1), 197–230 (2023).
19. Knossalla, C. E. & Carbon, C. C. Neither entrepreneurship nor intrapreneurship: a review of how to become an innovative split-off start-up. Front. Sociol. 8, 1267706 (2023).
20. Wang, S. et al. Smart manufacturing business management system for network industry spin-off enterprises. Enterp. Inform. Syst. 16 (2), 285–306 (2022).
21. Yu, Y., Jia, X. & Qi, H. Parent company board reform and subsidiary optimization of cash holdings: A quasi-natural experiment from central state-owned enterprises in China. Res. Int. Bus. Finance. 66, 102058 (2023).
22. Dong, J., Yang, C. & Yu, L. Big data analysis and calculation of capital chain break risk intelligent path identification based on FISM model. Highlights in Science, Engineering and Technology 47, 274–283 (2023).
23. Abdulla, A. et al. A retrospective analysis of funding and focus in US advanced fission innovation. Environ. Res. Lett. 12 (8), 084016 (2017).
24. Handoyo, S. et al. A business strategy, operational efficiency, ownership structure, and manufacturing performance: the moderating role of market uncertainty and competition intensity and its implication on open innovation. J. Open. Innovation: Technol. Market Complex. 9 (2), 100039 (2023).
25. Fontoura, P. & Coelho, A. More cooperative… more competitive? Improving competitiveness by sharing value through the supply chain. Manag. Decis. 60 (3), 758–783 (2022).
26. Bari, N., Chimhundu, R. & Chan, K. C. Dynamic capabilities to achieve corporate sustainability: A roadmap to sustained competitive advantage. Sustainability 14 (3), 1531 (2022).
27. Schulze, A. & Brusoni, S. How dynamic capabilities change ordinary capabilities: reconnecting attention control and problem-solving. Strateg. Manag. J. 43 (12), 2447–2477 (2022).
28. Cordeiro, M., Puig, F. & Ruiz-Fernández, L. Realizing dynamic capabilities and organizational knowledge in effective innovations: The capabilities typological map. J. Knowl. Manage. 27 (10), 2581–2603 (2023).
29. Volpe, G., Mangini, A. M. & Fanti, M. P. A deep reinforcement learning approach for competitive task assignment in enterprise blockchain. IEEE Access. 11, 48236–48247 (2023).
30. Kathirgamanathan, A., Mangina, E. & Finn, D. P. Development of a soft actor critic deep reinforcement learning approach for harnessing energy flexibility in a large office building. Energy AI. 5, 100101 (2021).
31. Mohamadi, N. et al. An application of deep reinforcement learning and vendor-managed inventory in perishable supply chain management. Eng. Appl. Artif. Intell. 127, 107403 (2024).
32. Anbazhagan, S. & Mugelan, R. K. Next-gen resource optimization in NB-IoT networks: Harnessing soft actor–critic reinforcement learning. Comput. Netw. 252, 110670 (2024).
33. Zhang, W. et al. A novel soft actor–critic framework with disjunctive graph embedding and autoencoder mechanism for job shop scheduling problems. J. Manuf. Syst. 76, 614–626 (2024).
34. Song, L. et al. Research on PID parameter tuning and optimization based on SAC-auto for USV path following. J. Mar. Sci. Eng. 10 (12), 1847 (2022).
35. Zhao, Y. et al. SAC: An ultra-efficient spin-based architecture for compressed DNNs. ACM Trans. Archit. Code Optim. 21 (1), 1–26 (2024).
36. Hao, D. et al. Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games. Inf. Sci. 617, 17–40 (2022).

Funding

The study received no funding.

Author information

Authors and Affiliations

School of Economics & Management, Tongji University, Shanghai, 200092, China
Hengsheng Gu

Authors

Hengsheng Gu

Contributions

Hengsheng Gu was responsible for the data collection, experiments, writing, and subsequent revision of the entire article.

Corresponding author

Correspondence to Hengsheng Gu.

Ethics declarations

Competing interests

This study does not have competing interests as defined by Nature Research, or other interests that may be considered to influence the results reported and discussed herein.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Gu, H. Enterprise fission path optimization and dynamic capability construction based on the soft actor-critic algorithm. Sci Rep 15, 20942 (2025). https://doi.org/10.1038/s41598-025-06180-w

Received: 23 September 2024
Accepted: 06 June 2025
Published: 01 July 2025
DOI: https://doi.org/10.1038/s41598-025-06180-w