US20160055589A1 - Automated claim risk factor identification and mitigation system - Google Patents
- Publication number
- US20160055589A1 (U.S. application Ser. No. 14/464,288)
- Authority
- US
- United States
- Prior art keywords
- data
- model
- risk
- computer
- variables
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Definitions
- aspects of this invention relate to a computerized assessment and recommended intervention for high risk workers' compensation claims and, more specifically, to applying machine learning technologies to data and delivering results via electronic means over a computer network in an effort to identify and mitigate costs associated with high risk workers' compensation claims.
- the American workers' compensation system is a no-fault scheme that has been implemented by each of the fifty states. While each state's law may vary from that of another state, all states share the same basic concepts: (1) if an employee is injured at work, the employer must provide defined benefits (medical, indemnity (lost wages) and disability) regardless of fault and (2) the employer is immune from a tort lawsuit by the employee on account of the employee's work injury.
- Some states allow employers to insure their obligations to provide workers' compensation benefits through, for example, a primary insurance policy that provides first dollar coverage, a large deductible policy, a state managed fund, or a self-insurance program.
- Employers that are allowed to self-insure their workers' compensation liabilities are generally required to purchase some type of excess insurance that provides coverage for claims that exceed a self-insured retention (SIR).
- Although some claims are readily identifiable from the outset as high risk, a large percentage of high cost claims can be labeled as "migratory claims."
- a migratory claim appears to be much like a normal risk claim and then medical conditions gradually worsen over time. For example, initially, a low back strain may result in lost time, limited medical treatment, and dispensing of pharmaceuticals.
- the normal claim can migrate to high risk if the claimant continues to experience pain and opts for a costly surgical intervention.
- the claimant will have multiple surgeries over an extended period of time and likely end up on a cocktail of high-powered addictive drugs. Without a different medical treatment protocol, this pattern will repeat over and over, incurring hundreds of thousands, even millions, of dollars in costs.
- aspects of the present invention permit quickly and accurately predicting claim outcomes by applying statistical and/or machine learning techniques not only for scoring claims at their inception but also for migratory claims.
- aspects of the invention provide fully integrated scoring engines to automatically generate predictions, store predictions, validate ongoing model performance, allow for automated model retraining, and electronically deliver the results.
- aspects of the invention facilitate targeted interventions based on the predictions to mitigate the risk of migratory claims.
- One aspect of the present invention comprises an automated claim risk factor identification and mitigation system.
- software instructions are stored on one or more tangible, non-transitory computer-readable media and are executable by a processor.
- a processor executable method is provided.
- FIG. 1 is a diagram depicting a system of identifying claim risk factors and suggesting intervention strategies for mitigating potential claim losses, according to one embodiment of the invention.
- FIG. 2 is an exemplary flowchart depicting the data intake component in further detail according to one embodiment of the invention.
- FIG. 3 is an exemplary flowchart depicting the processed data component wherein various forms of data are stored in a repository according to one embodiment of the invention.
- FIG. 4 is an exemplary flowchart depicting a generalization of the scoring engine component according to one embodiment of the invention.
- FIG. 5 is an exemplary flowchart depicting a detailed view of the scoring engine component according to one embodiment of the invention.
- FIG. 6 is an exemplary flowchart depicting a model training/retraining process used by the scoring engine component according to one embodiment of the invention.
- FIG. 7 is an exemplary flowchart depicting a model scoring layer in greater detail according to one embodiment of the invention.
- FIG. 8 is an exemplary flowchart depicting a report engine according to one embodiment of the invention.
- FIG. 9 is an exemplary screenshot depicting contents of the prediction report according to one embodiment of the invention.
- FIG. 10 is an exemplary screenshot depicting contents of the prediction report in further detail according to one embodiment of the invention.
- FIG. 1 is a diagram depicting a general overview of one embodiment of an automated claim risk factor identification and mitigation system. Aspects of the invention combine statistical and machine learning techniques along with related data to predict high risk workers' compensation claims and ultimately make suggestions as to how to mitigate ongoing claim risk.
- the system is configured with two models: one predicting claims that could exceed $50,000 in total cost and another predicting whether a claim is likely to exceed the self-insured retention or deductible. It is to be understood that an unlimited number of predictive models is within the scope of the invention; given the breadth of data, the system is capable of handling predictions on a wide range of dependent variables.
- the platform contains all processes necessary to automatically train, test and validate all models independently at any interval of time. This includes the ability to automatically train, test, and validate a model at a frequency that is responsive to unexpected changes to model performance metrics data 581 .
- the models in this embodiment predict a binary indicator—High Risk/Normal Risk with an indication of fit.
- the prediction data is stored along with all input data every time model scoring is run.
- This embodiment accommodates up to “N” models, wherein the models are configured to run both independently (in serial or parallel execution) and in tandem (in serial execution) depending on needs.
- the model output is stored for all executions of each model and further summarized for presentation via a network connected online reporting tool.
- the online tool reports claim level prediction output including a severity ranking in a number of claim related risk factors. Elevated indications in certain risk factors are tied to specific interventions. For example, a high risk indication in a Pharmacological Risk Factor generates an intervention of Seeking a Pharmacy Benefit Manager. It is this automated intervention strategy coupled with the identification of high risk claims (and particularly high risk claim factors) using multiple predictive models that provides improved predictions and mitigation of high risk claims in accordance with aspects of the invention.
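The automated link from elevated risk factors to specific interventions can be sketched as a simple lookup. This is an illustrative assumption; the category and intervention names below (beyond the Pharmacological example given above) are hypothetical:

```python
# Hypothetical mapping of risk factor categories to suggested interventions.
# Only the Pharmacological example is taken from the description above.
INTERVENTIONS = {
    "Pharmacological Risk Factor": "Seek a Pharmacy Benefit Manager",
    "Non-Pharmacological Treatment Risk Factor": "Utilization Review",
    "Claimant Biological Risk Factor": "Drug Indication Review",
}


def suggest_interventions(elevated_factors):
    """Return the suggested intervention for each factor flagged as high risk."""
    return [INTERVENTIONS[f] for f in elevated_factors if f in INTERVENTIONS]


print(suggest_interventions(["Pharmacological Risk Factor"]))
# -> ['Seek a Pharmacy Benefit Manager']
```

Factors without a configured intervention (e.g., personal or regulatory factors, discussed later) simply produce no suggestion.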
- a data intake component 100 comprises a process that retrieves and/or accepts workers' compensation claims data, payment data, medical data, pharmacy data, and data from other sources that are eventually applied to a model to be used in the system.
- the sources include but are not limited to U.S. census data, Social Security disability data, state regulatory issues, medical coding data, pharmacy databases, chronic condition and co-morbidity data, etc.
- a processed data component 110 refers to a database used by the system to house relevant data and to construct model-ready data for scoring.
- a scoring engine component 120 is preferably an automated scoring engine operating in accordance with aspects of the invention.
- the scoring engine 120 also contains an automated variable creation component, which is a process that creates variables by statistical techniques or other binning techniques, for use by the process model.
- the scoring engine 120 contains one or more predictive models.
- two models are housed. A first model is based on a neural network, and predicts whether a claim is likely to exceed $50,000 in total paid. The other is based on a logistic regression model and determines whether a claim is likely to exceed a self-insured retention or deductible.
- scoring engine 120 stores incoming scoring data, an unlimited number of predictive models, execution logs, model validation data, and current or historical scored claims.
- a prediction database and report component i.e., a report engine 130 houses several components that serve to supplement the output from scoring engine 120 , and to present the output to a user.
- FIG. 2 depicts a detailed view of the data intake component 100 in accordance with one embodiment of the invention.
- the data intake component 100 comprises a data intake process that is made up of two processes.
- a first process combines an extract, transform and load (ETL) process to bring raw data files into a SQL database engine.
- a second process validates the data using check routines to ensure quality and completeness.
- Claims or policy data 210, claim payment data 220, and medical and/or prescription billing data 230 are matched so that proper connections are established between the different data types.
- Alternative embodiments provide for various categories of data quality checks, including but not limited to: a) ensuring the claim and medical data belong to a known policy serviced by the provider, b) ensuring certain data fields are NOT NULL or correctly populated, c) eliminating duplicates, and/or d) ensuring proper connections between claim and medical bill data.
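The categories of quality checks above can be sketched as follows. The field names are hypothetical, since the patent does not specify a record schema:

```python
def validate_claim_record(record, known_policies):
    """Run basic quality checks on one claim record; return a list of failures.

    Field names ("policy_id", "claim_id", "injury_date") are assumptions.
    """
    failures = []
    # a) the claim must belong to a known policy serviced by the provider
    if record.get("policy_id") not in known_policies:
        failures.append("unknown policy")
    # b) required fields must be NOT NULL / correctly populated
    for field in ("claim_id", "injury_date"):
        if not record.get(field):
            failures.append("missing " + field)
    return failures


def deduplicate(records):
    """c) eliminate duplicate claim records by claim_id, keeping the first seen."""
    seen, unique = set(), []
    for r in records:
        if r["claim_id"] not in seen:
            seen.add(r["claim_id"])
            unique.append(r)
    return unique
```

Records that fail any check would be held out of the repository rather than passed to the models.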
- various forms of other data 240 are similarly matched so that proper connections are established between the different data types. This ensures that each data type is connected to the proper claim.
- FIG. 3 depicts a detailed view of the processed data component 110 in accordance with one embodiment of the invention.
- the claim data repository 350 is, for example, a SQL database comprised of all the claim, payment and medical data that has met the requisite quality checks.
- this database houses certain cross-reference tables necessary for analysis and model scoring. These additional tables include, for example, medical coding cross-references, pharmacy data, jurisdictional risk data, etc.
- the claim data repository 350 contains an evidence based medical treatment crosswalk for comparison to medical bill data.
- This SQL database serves as the core repository of claims data utilized by the models.
- the loaded and validated data in claim data repository 350 comprises the claims or policy data 210 , claim payment data 220 , and medical and/or prescription billing data 230 in one embodiment of the invention.
- the loaded and validated data comprises other data 240 .
- the cross-reference tables made available by claim data repository 350 are standard to the industry; examples include the NCCI Part of Body, Nature of Injury, and Cause of Injury cross-reference tables.
- the cross-reference tables are widely available but are not known to be widely used in the industry.
- the HCUP comorbidity and chronic condition databases (available at: http://www.hcup-us.ahrq.gov/tools_software.jsp) are implemented in the database in one alternative embodiment.
- a U.S. census database is implemented to impute socio-demographic details about each claimant.
- FIG. 4 depicts a more detailed view of scoring engine component 120 in accordance with an embodiment of the invention.
- the scoring engine 120 receives necessary relevant data 410 from the claim data repository 350 .
- a model 420 serves to score and classify the necessary relevant data 410 .
- the model 420 depicted here indicates, for example, classification is implemented using a machine learning or “neural network” model. All open claims are processed and stored for archival and analysis, represented by historical scored open claim data 574 , and scored open claim data 572 is pulled by the prediction database 820 of FIG. 8 . Claims identified as standard risk 430 are not processed further, while claims identified as high risk 440 are passed along to the report engine 130 for further processing.
- FIG. 5 depicts a further detailed description of several components related to the scoring engine component 120 .
- the scoring engine 120 includes a model data layer 510 , an automated variable creation component 520 , supporting model data structures 530 , and a model scoring layer 540 .
- the automated variable creation component 520 comprises a process that creates variables by statistical techniques, or other binning techniques, for use by the process model or process models implemented by the scoring engine 120 .
- Variable selection and construction is a key component of building successful predictive models. Certain variables contain information that can be binned based on importance with respect to the dependent variable. Assessing the importance of a given variable is particularly important where the cardinality for a particular SQL field is large.
- one ICD9 code indicates more importance (“risk”) than another ICD9 code in a given model predicting total claim cost.
- the same ICD9 code, when applied in different models, has completely different levels of risk. Therefore, it is imperative that this relative importance is tuned for a specific model.
- the automated variable creation process 520 automatically takes an individual ICD9 code and generates an importance score with respect to a dependent variable given to the process.
- this process is not limited to scoring only the ICD9 code variables.
- other variables are also scored in this manner in alternative embodiments.
- the scores generated by this process become part of the variables utilized in the predictive models.
- standard data mining techniques are utilized to produce this score such as binning.
- "riskiness" related to a particular variable, for example an ICD9 code variable, is derived from data that serves as the foundation of automated variable creation process 520 .
- a process automatically generates a score for a variable that is passed to the model used by scoring engine 120 .
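One simple way to derive such an importance score is sketched below, under the assumption that "importance" is the historical high-risk rate among claims carrying a given ICD9 code. The patent describes binning and statistical techniques without fixing a formula, so this is illustrative only:

```python
from collections import defaultdict


def icd9_importance(claims):
    """Score each ICD9 code by the fraction of historical claims carrying it
    that turned out high risk (the dependent variable given to the process).

    claims: iterable of (icd9_code, is_high_risk) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # code -> [high_risk_count, total]
    for code, high_risk in claims:
        counts[code][0] += int(high_risk)
        counts[code][1] += 1
    return {code: hi / total for code, (hi, total) in counts.items()}


history = [("724.2", True), ("724.2", True), ("724.2", False), ("845.00", False)]
scores = icd9_importance(history)
# "724.2" (lumbago) scores higher than "845.00" (ankle sprain) in this toy data
```

Because the score is computed against a specific dependent variable, the same code can receive different scores in different models, as noted above.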
- various tables that also exist within the scoring engine component 120 are invoked by the model data layer 510 as supporting model data structures 530 , comprising reference resources such as claim exclusion tables, NCCI cross-reference tables, ICD9 cross-reference tables, target variable manipulation tables, and other tables.
- the scoring engine 120 can be broadly defined as a collection of processes and data used to generate, store, and validate predictions. This process extracts data (i.e., the relevant data 410 ) from claim data repository 350 for transformation into a model scoring record.
- a “model scoring record” represents all relevant predictive variables summarized at the claim level with a corresponding dependent variable (variable for prediction) when such record is used for training, testing and validation.
- a model scoring record will not have a dependent variable when presented to the predictive model for scoring.
- Each model will have its own set of input variables used for prediction.
- a model record is constructed utilizing the source data from claim data repository 350 and the application of model specific data that was created by the automated variable creation process 520 and/or other processes 530 . Once built, the model scoring record is stored for subsequent scoring by the predictive model. The scoring process is run, whereby the model scoring records are scored utilizing the respective model. As part of the same process, the model also scores updated diagnostic data (“validation data”) to benchmark model performance.
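Construction of a model scoring record, with the dependent variable attached only for training/testing/validation, can be sketched as follows. The variable names are hypothetical:

```python
def build_scoring_record(claim, derived_scores, outcome=None):
    """Summarize a claim's predictive variables at the claim level.

    The dependent variable ("high_risk") is attached only when the record is
    used for training, testing or validation; records presented for scoring
    omit it.  Field names are assumptions for illustration.
    """
    record = {
        "claim_id": claim["claim_id"],
        "claimant_age": claim["claimant_age"],
        # model-specific data from the automated variable creation process
        "icd9_importance": derived_scores.get(claim["icd9_code"], 0.0),
    }
    if outcome is not None:  # known outcome -> training/testing/validation record
        record["high_risk"] = outcome
    return record
```

A scoring run would build one such record per non-closed claim, store it, and then pass it to the respective predictive model.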
- Each model has its own model database that contains data and electronic program code for transforming the raw data into model scoring records.
- each model database will contain the logic and data necessary to automatically train, test and if necessary validate the model based on feedback as indicated by model retraining process 550 and model feedback data 560 .
- all of the resulting model scoring records and associated scoring data are written out to SQL database tables in a model output layer 570 for storage and analysis.
- ICD9 code data is transformed in various ways.
- groups of ICD9 codes are arranged into high risk workers' compensation injury classifications—Back and Thoracic, Knee, Shoulder, Burns, Reflex Sympathetic Dystrophy Syndrome, Pain, Diabetes, etc.
- this classification process groups ICD9 codes into certain injury types and not all ICD9 codes in a given classification are created equal with respect to riskiness.
- the medical data from claims is leveraged to better understand the riskiness of a given ICD9 within a certain classification.
- various other forms of data are transformed in various ways to render an enhanced prediction regarding the riskiness of a given claim.
- the model scoring engine collectively refers to the input data, the model training/testing/validation processes, the resulting predictive models and the generated output data.
- the general component parts of scoring engine 120 are: the model data layer 510 , which contains the data and transformations used to prepare for model scoring; the model retraining process 550 , which is an automated mechanism to retrain a model with new or updated data; the model scoring layer 540 , which provides an application of a specific model to its related model data.
- scoring engine 120 includes a model output layer 570 for the data captured as a result of processing in the model scoring layer 540 .
- model data layer 510 : to enable automated scoring, the data presented to the respective predictive model must be properly transformed.
- a model scoring record is defined.
- the scoring engine 120 consumes data from the claim data repository 350 for non-closed claims and writes open claim input data 513 which is used to construct model scoring records for the non-closed claims.
- training, testing and validation model scoring records are also created, supplemented in an alternative embodiment by validation set input data 516 .
- Some models employ automated random or other sampling techniques to balance the training, testing, and validation model scoring records according to aspects of the invention.
- the model data layer 510 for that model is refreshed with the most current claim information.
- the model scoring engine 120 contains the ability to individually train/retrain each model. Every predictive model performing classification or regression must be initially trained with model scoring data that has the dependent variable present.
- the scoring engine 120 is utilized to train each predictive model prior to its first scoring execution.
- scoring engine 120 is capable of automatically retraining and testing each model as part of the scoring process. This ability forms a core of a true machine learning process. In this manner, the scoring engine 120 can dynamically adapt its predictive models as new data, representing changing circumstances, enters the system.
- model scoring layer 540 : trained models and the associated training and testing data are stored as objects inside a component of the database engine. These objects are created as a result of the implementation platform; it is not necessary that the models be stored as database objects. Each model can be referenced as a function call, whereby the function is passed a model scoring record and returns the original model scoring record plus a prediction and an indication of certainty about that prediction.
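The function-call interface described above can be sketched with a trivial stand-in model. The 0.5 cut point follows the exemplary transition point mentioned later in the document; everything else is illustrative:

```python
def score_record(model, record):
    """Apply a model to a model scoring record; return the record plus a
    prediction and a certainty measure, per the interface described above.

    'model' is any callable returning a probability in [0, 1].
    """
    probability = model(record)
    return {
        **record,
        "prediction": "High Risk" if probability >= 0.5 else "Standard Risk",
        "certainty": probability,
    }


def toy_model(record):
    """Stand-in model: high probability when total paid exceeds $50,000."""
    return 0.9 if record["total_paid"] > 50_000 else 0.1


scored = score_record(toy_model, {"claim_id": "C1", "total_paid": 80_000})
# scored["prediction"] == "High Risk", scored["certainty"] == 0.9
```

In the patent's embodiment the callable would wrap a stored neural network or logistic regression object rather than a threshold rule.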
- model scoring layer 540 scores open claims as well as validation model scoring records. The scoring of validation data is performed in an effort to understand model performance over time.
- model scoring layer 540 preferably hands its data to model output layer 570 for storage in database tables.
- the scored open claims are stored as historical scored open claim data 574 and the scored validation claims with history are stored as scored validation data 579 .
- scoring engine 120 retains the prediction along with the model scoring record for each of the open claims and validation data.
- model output layer 570 also catalogs a confusion matrix for assessing the effectiveness of the learning. The confusion matrix is based on the last execution of the model using the validation data. The validation set input data 516 is used when cataloging the confusion matrix. In another embodiment, the confusion matrix is captured historically.
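For a binary High Risk/Normal Risk indicator, the confusion matrix over the scored validation data reduces to four counts, which can be tabulated as:

```python
def confusion_matrix(actual, predicted):
    """Tabulate binary predictions against known outcomes (True = high risk)."""
    matrix = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            matrix["tp"] += 1        # correctly flagged high risk
        elif p and not a:
            matrix["fp"] += 1        # false alarm
        elif not p and not a:
            matrix["tn"] += 1        # correctly left as standard risk
        else:
            matrix["fn"] += 1        # missed high risk claim
    return matrix


m = confusion_matrix([True, False, True, False], [True, True, False, False])
# m == {"tp": 1, "fp": 1, "tn": 1, "fn": 1}
```

Storing these counts per model execution supports the historical capture described in the alternative embodiment.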
- scoring engine 120 is configured to implement various predictive models.
- scoring engine 120 uses one or more predictive models, alone or in combination.
- a detailed summary of each exemplary model is provided in Appendix A and Appendix B, respectively.
- Appendix A sets forth a first model, MdlTRIAGEINT001, which identifies claims that are more likely to exceed the self-insured retention or deductible.
- Appendix B sets forth a second model, MdlTRIAGEEXT001, which identifies claims that are likely to exceed a cap, such as $50,000 in total cost. It should be apparent that the predictive models are used, in part, to identify claims at different points in a claim's lifecycle.
- MdlTRIAGEINT001 looks for claims that have the potential to breach the self-insured retention.
- the model MdlTRIAGEEXT001 is trained to identify a claim that is likely to exceed $50,000 in total expenses.
- An appearance on a prediction report generated by the report engine 130 of FIG. 1 is an indication that a given claim is classified as elevated risk.
- inclusion on a report produced by the report engine 130 alerts a claim's analyst to review the treatment pattern for a given claim and develop an appropriate action plan based on the automated recommendations.
- FIG. 6 depicts the model training/retraining process 550 used by scoring engine 120 in accordance with an embodiment of the invention.
- the SQL Server Integration Services training package which is a part of the SQL Server 2008 R2 Database Platform, provides a suitable implementation of model training/retraining process 550 .
- the R2 Database Platform provides higher level tools that reduce the need for programming, and it is important to note that the services provided by this platform may be implemented in other languages and platforms with equal effectiveness.
- the entire predictive model automation process is constructed utilizing a combination of the SQL Server Database Engine, SQL Server Integration Services Packages and models and predictive model objects stored in SQL Server Analysis Services.
- Step 620 indicates that the list was successfully rebuilt.
- the list of claims used in a TEST set to evaluate the performance of the model is cleared from the mining structure during a truncate test table step 630 and indicated at 640 when successfully completed.
- the mining structure and mining models are reprocessed during an analysis services processing step 650 .
- This step includes loading the new claims from the rebuilt list of claims in the build model step 610 into the mining structure. A random 30% of the data from these new claims is held out to form the previously mentioned TEST set.
- retraining of the model occurs during the analysis services processing step 650 based on the remaining 70% of new claims from the build model step 610 .
- the TEST set now stored in the mining structure from the analysis services processes step 650 if successful at 660 , is written to a database table during a data flow task step 670 and indicated at 680 as successful.
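The random 70/30 train/test partition used in the retraining steps above can be sketched as follows (the patent implements this inside SQL Server Analysis Services; this standalone version is illustrative):

```python
import random


def split_train_test(claims, test_fraction=0.3, seed=42):
    """Randomly hold out a fraction of new claims as the TEST set and return
    the remainder for retraining, mirroring the 30%/70% split above.

    A fixed seed is used here only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(claims)
    rng.shuffle(shuffled)
    cutoff = int(len(shuffled) * test_fraction)
    return shuffled[cutoff:], shuffled[:cutoff]  # (train, test)


train, test = split_train_test(range(100))
# len(train) == 70, len(test) == 30
```

The model is then retrained on the 70% partition and evaluated against the held-out 30%.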
- FIG. 7 depicts an embodiment of the model scoring layer 540 in greater detail, as implemented in the previously mentioned SQL Server Integration Services training package.
- a store run time step 710 indicated at 720 when successful, controls an archive opens step 730 , indicated at 740 when successful.
- operations proceed to a score open claims step 750 to track and archive the status of open claims.
- Success of the score open claims step 750 is indicated at 760 .
- the store run time step 710 further controls a score validation claims step 770 , indicated at 780 when successful. Model performance is checked over time in the score validation claims step 770 as well as a store validation metrics step 785 , indicated at 790 when successful.
- To verify that a given model is performing as expected, certain test cases, termed a "validation set," are pulled out of the general population of claims data and scored by the models. The validation set changes over time to include new claims that have recently closed or met some other criteria. The output of the model is checked against reality by tracking the costs associated with a given claim. Since the claim's outcome is known, scoring the associated model record will lead to either an accurate or inaccurate prediction.
- Data is pulled from the claim data repository 350 depicted in FIG. 3 .
- the validation set input data 516 depicted in FIG. 5 creates test cases. A confusion matrix is created from scored validation data 576 and then used to generate model performance metrics data 581 . This data indicates whether the model is performing as expected.
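Summary performance metrics can be derived from the confusion matrix counts and tracked over time to detect unexpected changes in model performance. The specific metrics below (accuracy, precision, recall) are standard choices, not ones the patent names:

```python
def performance_metrics(tp, fp, tn, fn):
    """Derive summary metrics from binary confusion matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # of claims flagged high risk, how many truly were
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # of truly high risk claims, how many were flagged
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


m = performance_metrics(tp=8, fp=2, tn=85, fn=5)
# accuracy 0.93, precision 0.8, recall 8/13
```

A drop in these metrics between validation runs could trigger the automated retraining described earlier.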
- FIG. 8 depicts the report engine 130 in further detail.
- Incoming claim files 810 that have been designated high risk by the scoring engine 120 are stored in a prediction database 820 and are used by a prediction report 830 .
- An intervention database 825 maintains model interventions associated with different risks.
- the scoring engine 120 generates a great deal of data about an individual claim. Not only is a current prediction available, but historical predictions are also available. This is a more complete picture of an individual claim's risk than a single point estimate.
- the many variables that are found in the model scoring record are not always of value for deciphering what is driving claim riskiness. Organizing the model data and the entire prediction history paints a more compelling picture of claim risk factors.
- the data from the model output layer 570 is transformed into the prediction report 830 .
- scored open claim data 572 is available for use by the prediction database 820 .
- the data is then pulled from the prediction database 820 and is used by the report 830 .
- the report engine 130 preferably generates a prediction report 830 that displays, for example, model variables categorized into one of eight "risk factor categories," as illustrated in Table 1:
- Claimant Personal Risk Factors: Attributes particular to the injured worker. The severity score would consider factors such as the age and gender of the claimant as well as the claimant's tenure with the employer.
- Claim Risk Severity: Attributes related to the type of injury suffered by the claimant. The severity score would consider the seriousness of the work related injury (broken arm, strained back, burned hand, head injury, etc.).
- Non-Pharmacological Treatment Risk Factors: Attributes found in the medical data that are not related to pharmacy. The severity score will consider the diagnoses that appear on medical bills as well as the medical services being performed.
- Pharmacological Treatment Risk Factors: Attributes found in the medical data that involve prescription drugs. The severity score will consider the types of drugs that are prescribed to the claimant.
- Claimant Biological Risk Factors: Attributes associated with the medical bills that are related to coexisting medical conditions, i.e., medical issues that are not necessarily related to the compensable workers' compensation injury but nevertheless appear in the medical bill data. The severity score would consider issues such as diabetes, hypertension and other similar conditions.
- Physician Outcome (when available): Not available at this time.
- Lost Time Risk Factors: The severity score considers the indemnity expenses on the claim as well as how the diagnosis codes (from the medical data) impact the expected lost time.
- Regulatory/Legal Risk Factors: Attributes related to the legal environment associated with the jurisdiction governing the claim. The severity score considers jurisdictional factors such as the ability to settle medical/indemnity and who can direct care on the claim.
- risk score is based on crosswalk data that identifies risk.
- state risk is based on a third party tool that assesses workers' compensation risk by jurisdiction.
- a further transformation takes the current point estimate prediction and factors in past prediction changes to give an indication of the prediction trend for each claim: Increasing, Decreasing or Flat. The trend indication allows for varying degrees of change (depending on the current prediction score relative to the previous score change) before an alert of an increasing or decreasing trend is presented. This helps ensure that a reported change in trend is material. Representative selections of code that perform various portions of this transformation, such as the initial selection of data used to render the evaluation of comorbidity factors, are detailed in the specification.
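As a sketch only (not the specification's actual code), the trend classification described above might be implemented as follows, where the 0.05 materiality threshold is an assumed, tunable value:

```python
def prediction_trend(current, previous, threshold=0.05):
    """Classify a claim's prediction trend as Increasing, Decreasing or Flat.

    Changes smaller than the threshold are treated as immaterial, so minor
    score fluctuations do not trigger a trend alert.  The threshold value is
    illustrative; the document only says degrees of change may vary.
    """
    delta = current - previous
    if delta > threshold:
        return "Increasing"
    if delta < -threshold:
        return "Decreasing"
    return "Flat"


print(prediction_trend(0.72, 0.60))  # -> Increasing
print(prediction_trend(0.61, 0.60))  # -> Flat
```

In the full system, the previous score would come from the stored prediction history rather than a single prior value.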
- the resulting data from these additional transformations are the foundation of the prediction report 830 .
- the prediction report 830 generated by the report engine 130 serves as the primary user interface into the prediction and intervention suggestion system.
- the report contains claim identification data in addition to the risk factor categories and prediction trend indication.
- an indication of the $50,000 prediction from model MdlTRIAGEEXT001 appears on this report.
- a prediction and trend indication is given for the MdlTRIAGEINT001 model (likely to exceed retention model).
- model variable categories have interventions stored in the Intervention database 825 .
- an identified risk factor linked to a specific intervention as indicated at 840 can be determined based upon prediction report 830 .
- claimant personal risk factors cannot be mitigated by intervention.
- regulatory and/or legal risk factors are based on the law of the jurisdiction governing the claim and, thus, cannot be changed.
- most other risk factors have specific suggested interventions.
- the specific suggested interventions are cataloged in intervention database 825 .
- the design is flexible so as to support interventions for both generic and specific risk factors. There is no limit to the number of interventions that can be configured. Included below in Table 2 is an exemplary list of interventions tied to specific risk factors; a more complete description of particularly relevant interventions is described below.
- the interventions are based on the expert medical opinion of a medical doctor but apply generically based on the risk factor category:
- Utilization Review is allowed in many jurisdictions; other jurisdictions do not address the use of UR but do not prohibit it.
- PBM Pharmacy Benefit Managers
- DIR drug indication reviews
- Modern Medical, Inc. has an excellent Opioid Defense Manager that identifies opioid overuse prospectively and intervenes at the level of the prescriber and injured worker.
- CBT Cognitive Behavioral Therapy
- COPE, a national CBT provider group, understands workers' compensation and does not use psychiatric diagnostic or billing codes. They can evaluate the patient and recommend, if appropriate, a limited intervention to help injured workers recover more quickly.
- FRP Functional Restoration Programs
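The risk-factor-to-intervention lookup against the Intervention database 825 might be sketched as follows (the mapping contents and function name are hypothetical stand-ins, not the actual database rows):

```python
# Hypothetical rows standing in for the Intervention database 825;
# categories and wording are illustrative only.
INTERVENTIONS = {
    "Pharmacological": "Engage a Pharmacy Benefit Manager (PBM)",
    "Opioid Use": "Refer for a drug indication review (DIR)",
    "Psychosocial": "Evaluate for Cognitive Behavioral Therapy (CBT)",
    "Chronic Pain": "Consider a Functional Restoration Program (FRP)",
}

def suggest_interventions(elevated_factors):
    """Return the cataloged intervention for each elevated risk factor.
    Categories with no cataloged intervention (e.g., claimant personal
    or regulatory/legal risk factors) are reported as not mitigable."""
    return {
        factor: INTERVENTIONS.get(factor, "No intervention available")
        for factor in elevated_factors
    }
```

This reflects the design point above: generic and specific risk factors alike resolve through the same catalog, and factors that cannot be mitigated simply have no catalog entry.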
- the prediction report 830 is delivered via the Internet, shown at 850, or other suitable data communications network to web report users 860 and web service consumers 870. Not all consumers 860 of this prediction and intervention process will want to receive reporting online. In one embodiment, consumers 860 will integrate the output into existing business systems. In other embodiments, integration is not necessary or desired. Therefore, a secure web service, for example, provides prediction report data generated by the report engine 130 to such web service consumers 870. The web service requires individualized authentication and will take a claim number or the like as input and return the contents of the corresponding prediction report 830 as an output generated by the report engine 130.
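The web service contract described here (authenticate, accept a claim number, return the prediction report contents) can be sketched as a plain function; the token store, claim number, and report fields below are hypothetical illustrations, not the actual service interface:

```python
import json

# Hypothetical in-memory stand-ins for the prediction database and the
# credential store behind the secure web service.
PREDICTION_REPORTS = {
    "WC-0001": {"risk": "High Risk", "risk_score": 87, "trend": "Increasing"},
}
AUTHORIZED_TOKENS = {"token-abc123"}

def get_prediction_report(auth_token, claim_number):
    """Authenticate the caller, then return the prediction report for
    the given claim number as a JSON string (sketch only)."""
    if auth_token not in AUTHORIZED_TOKENS:
        raise PermissionError("individualized authentication required")
    report = PREDICTION_REPORTS.get(claim_number)
    if report is None:
        raise KeyError("no prediction report for claim " + claim_number)
    return json.dumps(report)
```

In a deployed embodiment this function body would sit behind a secured HTTP endpoint rather than an in-memory dictionary.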
- FIGS. 9 and 10 are exemplary screenshots, for illustrative purposes only, depicting contents of the prediction report according to one alternative embodiment.
- Risk factor categories are displayed including, for example, Lost Time Risk Factors 910 and Regulatory/Legal Risk Factors 920 .
- a Risk Score 930 can be thought of as a certainty measure that further explains the High Risk/Standard Risk determination made by scoring engine 120. The closer this probability is to 1, the more certain the model is about the high risk prediction. The closer the probability is to 0, the more certain the model is of a standard risk claim, with 0.5 being an exemplary transition point between high risk and standard risk.
- the score from the model is displayed in FIG. 9 as the probability score multiplied by 100 and appears in the column titled “Risk Score.”
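The probability-to-display conversion described above is straightforward; a minimal sketch (function name is illustrative, the 0.5 transition point follows the exemplary value in the text):

```python
def display_risk(probability, threshold=0.5):
    """Convert the model's probability output into the displayed Risk
    Score (probability multiplied by 100) and the High Risk/Standard
    Risk label, using the exemplary 0.5 transition point."""
    risk_score = round(probability * 100)
    label = "High Risk" if probability >= threshold else "Standard Risk"
    return risk_score, label
```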
- Examples of well-known computing systems, environments, and/or configurations that are suitable for use with aspects of the invention in various embodiments include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, network PCs, minicomputers, mainframe computers, distributed computing environments (such as cloud-based computing) that include any of the above systems or devices, and the like.
- Embodiments of the aspects of the invention are described in the general context of data and/or processor-executable instructions in various embodiments, such as program modules, stored on one or more tangible, non-transitory storage media and executed by one or more processors or other devices.
- program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types.
- aspects of the invention are also practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules are located in both local and remote storage media including memory storage devices.
- processors, computers and/or servers execute the processor-executable instructions (e.g., software, firmware, and/or hardware) such as those illustrated herein to implement aspects of the invention.
- in various embodiments, aspects of the invention are implemented with processor-executable instructions.
- the processor-executable instructions are organized into one or more processor-executable components or modules on a tangible processor readable storage medium in various embodiments.
- Aspects of the invention are implemented with any number and organization of such components or modules in various embodiments. For example, aspects of the invention are not limited to the specific processor-executable instructions or the specific components or modules illustrated in the figures and described herein.
- Other alternative embodiments of the aspects of the invention include different processor-executable instructions or components having more or less functionality than illustrated and described herein.
- the model takes in a host of variables/features, discussed below, and outputs a probability of becoming an excess claim.
- This model data resides in the MDLTRIAGEINT001 and MDLCOMMONTRIAGEINT databases on the production system.
- the training set includes: a) all open large loss claims, and b) all closed claims. Data deemed to have insufficient completeness of medical or pharmacy data will not be included. 80% of the claims meeting these criteria were used for training.
- the validation set has the same criteria as the training set except that it uses the 20% of the data that was not used for training; data deemed to have insufficient completeness of medical or pharmacy data is likewise excluded.
- a full set of potential training data resides in the table mdlTRIAGEINT001_Training.
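The 80/20 holdout scheme described above can be sketched as follows (field names, the completeness flag, and the fixed seed are assumptions for illustration; the patent does not disclose its sampling code):

```python
import random

def split_training_validation(claims, train_fraction=0.8, seed=42):
    """Split qualifying claims into training and validation sets.
    Claims with insufficient medical or pharmacy data completeness are
    excluded first, then 80% of the remainder is used for training and
    the 20% holdout for validation."""
    usable = [c for c in claims if c.get("data_complete")]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(usable)
    cutoff = int(len(usable) * train_fraction)
    return usable[:cutoff], usable[cutoff:]
```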
- the features for the model can be divided into several classes:
- the area under the curve measurement in R is 0.86 using the 20% holdout sample.
- the ratio of true positives to all actual positives is 87%.
- the ratio of false positives to all predicted positives is 19%.
- variable importance measures are used to identify relevant features for the model.
- the logistic regression was chosen as the ultimate solution. Although it performs slightly worse than the neural network, it is less of a black box and lends itself to understanding what drives predictions.
- the model is mdlLogisticRegression in the mining structure mdlTRIAGEINT001_Training.
- the ratio of true positives to all actual positives is 89%.
- the ratio of false positives to all predicted positives is 17%.
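The two ratios quoted above are recall (true positives over all actual positives) and the false discovery rate (false positives over all predicted positives). A minimal sketch of computing them from a confusion matrix; the counts below are invented illustrations chosen only to reproduce figures close to the reported 89% and 17%, not the patent's actual data:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall (TP / all actual positives) and false discovery
    rate (FP / all predicted positives) from confusion matrix counts."""
    recall = tp / (tp + fn)
    false_discovery_rate = fp / (tp + fp)
    return recall, false_discovery_rate

# Illustrative counts only: 89 true positives out of 100 actual
# positives, and 18 false positives among 107 predicted positives.
recall, fdr = classification_metrics(tp=89, fp=18, fn=11, tn=882)
```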
- the model takes in a host of variables/features, discussed below, and outputs a probability of exceeding the $50,000 threshold.
- This model data resides in the MDLTRIAGEEXT001 and MDLCOMMONTRIAGEEXT databases on the production system.
- the model itself resides in the TRIAGEEXT001 database of the production Analysis Services database.
- the training set is restricted to claim administrators where we have reasonably complete data. For training we use all closed claims and those open claims that have already hit the $50,000 threshold.
- a full set of potential training data resides in the table mdlTRIAGEEXT001_TrainingSet.
- the features for the model can be divided into several classes:
- the neural net was chosen as the ultimate solution for two compelling reasons: it had a slight edge in performance throughout the development process, and it is harder for clients to reverse engineer the probability scores it produces.
- the oversampled training data is in the table mdlTRIAGEEXT001_30pOversample. The oversample percentage was chosen to accommodate a maximum false positive rate of 5%.
- the model is Neural Net in the mining structure mdlTRIAGEEXT001.
- the final model performs very well at its appointed task.
- the final model has a confusion matrix based on the test data (in mdlTRIAGEEXT001_Test):
Abstract
A system to predict and identify claims that have a high likelihood of exceeding a predetermined limitation in a given excess workers' compensation insurance policy and to present the automated indication of possible intervention strategies to mitigate potential claims costs. The processing system includes a computer server, database engine, computer programming instructions, network connectivity, associated claims, payment, medical, pharmacy and other relevant data, a plurality of statistical and machine learning algorithms and a method for electronically displaying and attaching the results to a business process. The system will use all available data to analyze the medical treatment pattern of a claimant and based on automated findings make recommendations as to appropriate interventions to positively impact claims costs.
Description
- Aspects of this invention relate to a computerized assessment and recommended intervention for high risk workers' compensation claims and, more specifically, to applying machine learning technologies to data and delivering results via electronic means over a computer network in an effort to identify and mitigate costs associated with high risk workers' compensation claims.
- The American workers' compensation system is a no-fault scheme that has been implemented by each of the fifty states. While each state's law may vary from that of another state, all states share the same basic concepts: (1) if an employee is injured at work, the employer must provide defined benefits (medical, indemnity (lost wages) and disability) regardless of fault and (2) the employer is immune from a tort lawsuit by the employee on account of the employee's work injury. Generally, most states allow employers to insure their obligations to provide workers' compensation benefits through, for example, a primary insurance policy that provides first dollar coverage, a large deductible policy, a state managed fund, or a self-insurance program. Employers that are allowed to self-insure their workers' compensation liabilities are generally required to purchase some type of excess insurance that provides coverage for claims that exceed a self-insured retention (SIR).
- As medical and pharmaceutical technology has advanced, so have the expenses associated with these treatments. In addition, medical cost inflation has also aggressively trended upward over the last several decades. It is well known that medical expenses per capita in the US far exceed other industrialized nations. Not unexpectedly, in the last decade, the primary expense associated with workers' compensation claims has dramatically shifted from lost wages to medical and pharmaceutical related expenses. Currently, medical and pharmacy expense make up, on average, 60% of the total costs in workers' compensation claims. Understanding these trends coupled with predicting and early intervention on high risk claims is paramount to tackling the problem of medical cost inflation and over utilization in workers' compensation claims.
- Given the shift to medical-related claims expenses and the medical cost trends in the United States, workers' compensation insurers are acutely aware of the need to better manage claims-related medical costs. On complex claims, human adjusters must plow through voluminous medical records to understand the medical cost drivers on a claim and formulate a plan to mitigate claims costs. An automated approach to such review would drastically reduce the time to identify and intervene on problem claims.
- Workers' compensation claims can be classified into two broad categories—Medical Only and Indemnity claims. Medical only claims incur limited medical costs, no lost wage costs, and then close. Indemnity claims involve injuries that cause the employee to be out of work for a period of time. While most injured employees return to work, some do not. Those who do not return to work receive workers' compensation benefits for life or for a substantial period of time. Some injuries are catastrophic and are known to be high risk at claim outset. This class of claim would include injuries such as death, some amputations, serious burns, brain injuries and paralysis. These claims are assigned to only experienced adjusters and nurse case managers.
- Although some claims are readily identifiable from the outset as high risk, a large percentage of high cost claims can be labeled as migratory claims. A migratory claim appears to be much like a normal risk claim and then medical conditions gradually worsen over time. For example, initially, a low back strain may result in lost time, limited medical treatment, and dispensing of pharmaceuticals. However, the normal claim can migrate to high risk if the claimant continues to experience pain and opts for a costly surgical intervention. In a typical migratory large loss claim, the claimant will have multiple surgeries over an extended period of time and likely end up on a cocktail of high-powered addictive drugs. Without a different medical treatment protocol, this pattern will repeat over and over incurring hundreds of thousands even millions of dollars in costs.
- The workers' compensation insurance industry and, in particular, self-insured employers and their excess carriers, have been slow to adopt automation. While automation has occurred, most of it has centered on workflow for handling claims and sending alerts when red flags appear on a claim. Further, conventional approaches lack the ability to effectively predict migratory claims. At most, current implementations merely include rudimentary models based on the summation of red flags to generate a single risk score. An automated approach for the accurate and early identification of such claims and for suggesting intervention techniques would greatly improve existing manual processes to identify migratory claims.
- Briefly, aspects of the present invention permit quickly and accurately predicting claim outcomes by applying statistical and/or machine learning techniques not only for scoring claims at their inception but also for migratory claims. Moreover, aspects of the invention provide fully integrated scoring engines to automatically generate predictions, store predictions, validate ongoing model performance, allow for automated model retraining, and electronically deliver the results. Advantageously, aspects of the invention facilitate targeted interventions based on the predictions to mitigate the risk of migratory claims.
- One aspect of the present invention comprises a system for an automated claim risk factor identification and mitigation system.
- In another aspect, software instructions are stored on one or more tangible, non-transitory computer-readable media and are executable by a processor.
- In yet another aspect, a processor executable method is provided.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- Other features will be in part apparent and in part pointed out hereinafter.
-
FIG. 1 is a diagram depicting a system of identifying claim risk factors and suggesting intervention strategies for mitigating potential claim losses, according to one embodiment of the invention. -
FIG. 2 is an exemplary flowchart depicting the data intake component in further detail according to one embodiment of the invention. -
FIG. 3 is an exemplary flowchart depicting the processed data component wherein various forms of data are stored in a repository according to one embodiment of the invention. -
FIG. 4 is an exemplary flowchart depicting a generalization of the scoring engine component according to one embodiment of the invention. -
FIG. 5 is an exemplary flowchart depicting a detailed view of the scoring engine component according to one embodiment of the invention. -
FIG. 6 is an exemplary flowchart depicting a model training/retraining process used by the scoring engine component according to one embodiment of the invention. -
FIG. 7 is an exemplary flowchart depicting a model scoring layer in greater detail according to one embodiment of the invention. -
FIG. 8 is an exemplary flowchart depicting a report engine according to one embodiment of the invention. -
FIG. 9 is an exemplary screenshot depicting contents of the prediction report according to one embodiment of the invention. -
FIG. 10 is an exemplary screenshot depicting contents of the prediction report in further detail according to one embodiment of the invention. - Corresponding reference characters indicate corresponding parts throughout the drawings.
-
FIG. 1 is a diagram depicting a general overview of one embodiment of an automated claim risk factor identification and mitigation system. Aspects of the invention combine statistical and machine learning techniques along with related data to predict high risk workers' compensation claims and ultimately make suggestions as to how to mitigate ongoing claim risk. In an embodiment, the system is configured with two models—one predicting claims that could exceed $50,000 in total cost and another model that predicts whether a claim is likely to exceed the self-insured retention or deductible. It is to be understood that an unlimited number of predictive models are within the scope of the invention. Given the breadth of data, the system is capable of handling predictions on a wide range of dependent variables, and the system accommodates such flexibility. The platform contains all processes necessary to automatically train, test and validate all models independently at any interval of time. This includes the ability to automatically train, test, and validate a model at a frequency that is responsive to unexpected changes to model performance metrics data 581. - The models in this embodiment predict a binary indicator—High Risk/Normal Risk with an indication of fit. The prediction data is stored along with all input data every time model scoring is run. This embodiment accommodates up to “N” models, wherein the models are configured to run both independently (in serial or parallel execution) and in tandem (in serial execution) depending on needs. The model output is stored for all executions of each model and further summarized for presentation via a network connected online reporting tool. The online tool reports claim level prediction output including a severity ranking in a number of claim related risk factors. Elevated indications in certain risk factors are tied to specific interventions.
For example, a high risk indication in a Pharmacological Risk Factor generates an intervention of Seeking a Pharmacy Benefit Manager. It is this automated intervention strategy coupled with the identification of high risk claims (and particularly high risk claim factors) using multiple predictive models that provides improved predictions and mitigation of high risk claims in accordance with aspects of the invention.
- Referring further to
FIG. 1, the automated claim risk factor identification and mitigation system comprises four components that will be described in greater detail below. The components refer to software objects in one embodiment, or processes invoked by the software objects in alternative embodiments. A data intake component 100 comprises a process that retrieves and/or accepts workers' compensation claims data, payment data, medical data, pharmacy data, and data from other sources that are eventually applied to a model to be used in the system. In alternative embodiments, the sources include but are not limited to U.S. census data, Social Security disability data, state regulatory issues, medical coding data, pharmacy databases, chronic condition and co-morbidity data, etc. A processed data component 110 refers to a database used by the system to house relevant data and to construct model-ready data for scoring. - In
FIG. 1, a scoring engine component 120 is preferably an automated scoring engine operating in accordance with aspects of the invention. As will be described in greater detail below, the scoring engine 120 also contains an automated variable creation component, which is a process that creates variables by statistical techniques or other binning techniques, for use by the process model. Further, the scoring engine 120 contains one or more predictive models. In one alternative embodiment, two models are housed. A first model is based on a neural network, and predicts whether a claim is likely to exceed $50,000 in total paid. The other is based on a logistic regression model and determines whether a claim is likely to exceed a self-insured retention or deductible. Additionally, scoring engine 120 stores incoming scoring data, an unlimited number of predictive models, execution logs, model validation data, and current or historical scored claims. And a prediction database and report component, i.e., a report engine 130, houses several components that serve to supplement the output from scoring engine 120, and to present the output to a user. -
FIG. 2 depicts a detailed view of the data intake component 100 in accordance with one embodiment of the invention. The data intake component 100 comprises a data intake process that is made up of two processes. A first process combines an extract, transform and load (ETL) process to bring raw data files into a SQL database engine. After loading to the database, a second process validates the data using check routines to ensure quality and completeness. Claims or policy data 210, claim payment data 220, and medical and/or prescription billing data 230 are matched so that proper connections are established between the different data types. Alternative embodiments provide for various categories of data quality checks, including but not limited to: a) ensuring the claim and medical data belong to a known policy serviced by the provider, b) ensuring certain data fields are NOT NULL or correctly populated, c) eliminating duplicates, and/or d) ensuring proper connections between claim and medical bill data. In an alternative embodiment, various forms of other data 240 are similarly matched so that proper connections are established between the different data types. This ensures that each data type is connected to the proper claim. -
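The categories of data quality checks listed here can be sketched as a validation routine; the record field names and policy store are illustrative assumptions, not the disclosed schema:

```python
def validate_record(record, known_policies):
    """Run representative data quality checks from the intake process
    and return a list of failures (an empty list means the record
    passes validation and may be loaded)."""
    failures = []
    # a) the claim must belong to a known policy serviced by the provider
    if record.get("policy_id") not in known_policies:
        failures.append("unknown policy")
    # b) required fields must be populated (NOT NULL)
    for field in ("claim_number", "injury_date"):
        if not record.get(field):
            failures.append("missing " + field)
    return failures
```

Duplicate elimination and claim-to-medical-bill matching (checks c and d) would be additional passes over the loaded tables in the same spirit.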
FIG. 3 depicts a detailed view of the processed data component 110 in accordance with one embodiment of the invention. After the data is validated by data intake component 100, the validated data is then loaded to a claim data repository 350 for analysis and model scoring. The claim data repository 350 is, for example, a SQL database comprised of all the claim, payment and medical data that has met the requisite quality checks. In addition, this database houses certain cross-reference tables necessary for analysis and model scoring. These additional tables include, for example, medical coding cross-references, pharmacy data, jurisdictional risk data, etc. Notably, the claim data repository 350 contains an evidence based medical treatment crosswalk for comparison to medical bill data. This SQL database serves as the core repository of claims data utilized by the models. The loaded and validated data in claim data repository 350 comprises the claims or policy data 210, claim payment data 220, and medical and/or prescription billing data 230 in one embodiment of the invention. In an alternative embodiment, the loaded and validated data comprises other data 240. - In alternative embodiments, the cross-reference tables made available by
claim data repository 350 are standard to the industry; examples include the NCCI Part of Body, Nature of Injury, and Cause of Injury cross-reference tables. In other alternative embodiments, the cross-reference tables are widely available, but not known to be used widely in the industry. For example, the HCUP comorbidity and chronic condition databases (available at: http://www.hcup-us.ahrq.gov/tools_software.jsp) are implemented in the database in one alternative embodiment. As a further example, a U.S. census database is implemented to impute socio-demographic details about each claimant. -
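The use of a comorbidity cross-reference table against a claim's diagnosis codes can be sketched as a simple lookup (the table excerpt below is an illustrative stand-in, not the HCUP tables' actual contents or format):

```python
# Hypothetical excerpt of a comorbidity cross-reference table of the
# kind housed in the claim data repository 350; entries are
# illustrative only.
COMORBIDITY_XREF = {
    "250.00": "Diabetes",
    "401.9": "Hypertension",
    "305.1": "Tobacco use",
}

def comorbidity_flags(icd9_codes):
    """Map a claim's ICD9 codes through the cross-reference table to
    the set of comorbidity categories present on the claim."""
    return {COMORBIDITY_XREF[code] for code in icd9_codes if code in COMORBIDITY_XREF}
```

In the described system this join would be performed in SQL against the repository's cross-reference tables rather than an in-memory dictionary.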
FIG. 4 depicts a more detailed view of scoring engine component 120 in accordance with an embodiment of the invention. The scoring engine 120 receives necessary relevant data 410 from the claim data repository 350. A model 420 serves to score and classify the necessary relevant data 410. The model 420 depicted here indicates, for example, that classification is implemented using a machine learning or “neural network” model. All open claims are processed and stored for archival and analysis, represented by historical scored open claim data 574, and scored open claim data 572 is pulled by the prediction database 820 of FIG. 8. Claims identified as standard risk 430 are not processed further, while claims identified as high risk 440 are passed along to the report engine 130 for further processing. -
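The score-and-route flow described for FIG. 4 (standard risk claims stop, high risk claims pass on for report generation) can be sketched as follows, assuming the model is any callable returning a probability (function and field names are hypothetical):

```python
def route_claims(model, open_claims, threshold=0.5):
    """Score each open claim with the model and route it: standard
    risk claims are not processed further, while high risk claims are
    passed along for report generation."""
    high_risk, standard_risk = [], []
    for claim in open_claims:
        probability = model(claim)
        scored = {**claim, "score": probability}
        (high_risk if probability >= threshold else standard_risk).append(scored)
    return high_risk, standard_risk
```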
FIG. 5 depicts a further detailed description of several components related to the scoring engine component 120. As shown, the scoring engine 120 includes a model data layer 510, an automated variable creation component 520, supporting model data structures 530, and a model scoring layer 540. According to aspects of the invention, the automated variable creation component 520 comprises a process that creates variables by statistical techniques, or other binning techniques, for use by the process model or process models implemented by the scoring engine 120. Variable selection and construction is a key component of building successful predictive models. Certain variables contain information that can be binned based on importance with respect to the dependent variable. Assessing the importance of a given variable is particularly important where the cardinality for a particular SQL field is large. For example, in one embodiment, one ICD9 code indicates more importance (“risk”) than another ICD9 code in a given model predicting total claim cost. In alternative embodiments, the same ICD9 code, when applied in different models, has completely different levels of risk. Therefore, it is imperative that this relative importance is tuned for a specific model. - In an embodiment, the automated
variable creation process 520 automatically takes an individual ICD9 code and generates an importance score with respect to a dependent variable given to the process. However, this process is not limited to scoring only the ICD9 code variables. Thus, other variables are also scored in this manner in alternative embodiments. The scores generated by this process become part of the variables utilized in the predictive models. Preferably, standard data mining techniques, such as binning, are utilized to produce this score. In alternative embodiments, “riskiness” related to a particular variable, for example an ICD9 code variable, is derived from data that serves as the foundation of automated variable creation process 520. In additional alternative embodiments, a process automatically generates a score for a variable that is passed to the model used by scoring engine 120. In still other alternative embodiments, various tables that also exist within the scoring engine component 120 are invoked by the model data layer 510 as supporting model data structures 530 comprising reference resources such as claim exclusion tables, NCCI cross-reference tables, ICD9 cross-reference tables, target variable manipulation tables, and other tables. - Still referring to
FIG. 5, the scoring engine 120 can be broadly defined as a collection of processes and data used to generate, store, and validate predictions. This process extracts data (i.e., the relevant data 410) from claim data repository 350 for transformation into a model scoring record. A “model scoring record” represents all relevant predictive variables summarized at the claim level with a corresponding dependent variable (the variable for prediction) when such record is used for training, testing and validation. A model scoring record will not have a dependent variable when presented to the predictive model for scoring. Each model will have its own set of input variables used for prediction. A model record is constructed utilizing the source data from claim data repository 350 and the application of model specific data that was created by the automated variable creation process 520 and/or other processes 530. Once built, the model scoring record is stored for subsequent scoring by the predictive model. The scoring process is run, whereby the model scoring records are scored utilizing the respective model. As part of the same process, the model also scores updated diagnostic data (“validation data”) to benchmark model performance. - Each model has its own model database that contains data and electronic program code for transforming the raw data into model scoring records. In addition, each model database will contain the logic and data necessary to automatically train, test and if necessary validate the model based on feedback as indicated by
model retraining process 550 and model feedback data 560. - In an embodiment, all of the resulting model scoring records and associated scoring data are written out to SQL database tables in a
model output layer 570 for storage and analysis. - With respect to the automated
variable creation process 520 as used in the model scoring engine 120, alternative embodiments provide for a number of transformations to be performed on data that result in an enhanced prediction regarding the riskiness of a given claim. In one alternative embodiment, for example, ICD9 code data is transformed in various ways. First, for example, groups of ICD9 codes are arranged into high risk workers' compensation injury classifications—Back and Thoracic, Knee, Shoulder, Burns, Reflex Sympathetic Dystrophy Syndrome, Pain, Diabetes, etc. However, this classification process groups ICD9 codes into certain injury types, and not all ICD9 codes in a given classification are created equal with respect to riskiness. In this manner, the medical data from claims is leveraged to better understand the riskiness of a given ICD9 within a certain classification. In alternative embodiments, various other forms of data are transformed in various ways to render an enhanced prediction regarding the riskiness of a given claim. - As mentioned when describing the
scoring engine component 120 in FIG. 1, in an embodiment, the model scoring engine collectively refers to the input data, the model training/testing/validation processes, the resulting predictive models and the generated output data. In an embodiment, the general component parts of scoring engine 120 are: the model data layer 510, which contains the data and transformations used to prepare for model scoring; the model retraining process 550, which is an automated mechanism to retrain a model with new or updated data; and the model scoring layer 540, which provides an application of a specific model to its related model data. In addition, scoring engine 120 includes a model output layer 570 for the data captured as a result of processing in the model scoring layer 540. - Referring further to the
model data layer 510, to enable automated scoring, the data presented to the respective predictive model must be properly transformed. For each model, a model scoring record is defined. Thescoring engine 120 consumes data from theclaim data repository 350 for non-closed claims and writes openclaim input data 513 which is used to construct model scoring records for the non-closed claims. In addition, training, testing and validation model scoring records are also created, supplemented in an alternative embodiment by validation setinput data 516. Some models employ automated random or other sampling techniques to balance the training, testing, and validation model scoring records according to aspects of the invention. Preferably, every time thescoring engine 120 is executed, themodel data layer 510 for that model is refreshed with the most current claim information. - Still referring to
FIG. 5, the model retraining process 550 will now be described in greater detail. The model scoring engine 120 contains the ability to individually train/retrain each model. Every predictive model performing classification or regression must be initially trained with model scoring data that has the dependent variable present. The scoring engine 120 is utilized to train each predictive model prior to its first scoring execution. Moreover, scoring engine 120 is capable of automatically retraining and testing each model as part of the scoring process. This ability forms the core of a true machine learning process. In this manner, the scoring engine 120 can dynamically adapt its predictive models as new data, representing changing circumstances, enters the system. - In the
model scoring layer 540, trained models and the associated training and testing data are stored as objects inside a component of the database engine. These objects are created as a result of the implementation platform; it is not necessary that the models be stored as database objects. Each model can be referenced as a function call, whereby the function is passed a model scoring record and the function returns the original model scoring record plus a prediction and an indication of certainty about that prediction. In operation, the model scoring layer 540 scores open claims as well as validation model scoring records. The scoring of validation data is performed in an effort to understand model performance over time. - Referring now to model
output layer 570, the model scoring layer 540 preferably hands its data to model output layer 570 for storage in database tables. This includes scored open claims 572, scored validation data 576, and metadata generated by the execution of scoring engine 120 (e.g., date/time stamps, step identifiers, errors, etc.). Both the scored claims and the validation claims have current and historical tables. In an embodiment, the scored open claims are stored as historical scored open claim data 574 and the scored validation claims with history are stored as scored validation data 579. - In an embodiment, scoring
engine 120 retains the prediction along with the model scoring record for each of the open claims and validation data. In an embodiment, the model output layer 570 also catalogs a confusion matrix for assessing the effectiveness of the learning. The confusion matrix is based on the last execution of the model using the validation data. The validation set input data 516 is used when cataloging the confusion matrix. In another embodiment, the confusion matrix is captured historically. - As described above, scoring
engine 120 is configured to implement various predictive models. In an embodiment, scoring engine 120 uses one or more predictive models, alone or in combination. A detailed summary of each exemplary model is provided in Appendix A and Appendix B, respectively. Appendix A sets forth a first model, MdlTRIAGEINT001, which identifies claims that are more likely to exceed the self-insured retention or deductible. Appendix B sets forth a second model, MdlTRIAGEEXT001, which identifies claims that are likely to exceed a cap, such as $50,000 in total cost. It should be apparent that the predictive models are used, in part, to identify claims at different points in a claim's lifecycle. For example, MdlTRIAGEINT001 looks for claims that have the potential to breach the self-insured retention. The model MdlTRIAGEEXT001, on the other hand, is trained to identify a claim that is likely to exceed $50,000 in total expenses. An appearance on a prediction report generated by the report engine 130 of FIG. 1 is an indication that a given claim is classified as elevated risk. Thus, inclusion on a report produced by the report engine 130 alerts a claims analyst to review the treatment pattern for a given claim and develop an appropriate action plan based on the automated recommendations. -
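The function-call interface described for the model scoring layer—a function receives a model scoring record and returns the record plus a prediction and a certainty indication—can be sketched as follows. This is a minimal illustration, not the stored Analysis Services models: the record fields and the stand-in model are invented for the example, and only the 0.5 transition point comes from the specification.

```python
def score_record(model, record):
    """Apply a predictive model to a model scoring record, returning the
    original record plus a prediction and an indication of certainty,
    mirroring the function-call interface of the model scoring layer."""
    probability = model(record)  # estimated P(high risk)
    high = probability >= 0.5
    return dict(record,
                prediction="High Risk" if high else "Standard Risk",
                certainty=probability if high else 1 - probability)

# A stand-in "model" for illustration only: treats claims with an
# opioid prescription as higher risk.
toy_model = lambda rec: 0.8 if rec.get("opioid_flag") else 0.2

scored = score_record(toy_model, {"claim_number": "C-1001", "opioid_flag": 1})
# scored carries the original fields plus a prediction and certainty
```

In a production setting the same signature would wrap a call into the stored predictive model rather than a hard-coded rule.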
FIG. 6 depicts the model training/retraining process 550 used by scoring engine 120 in accordance with an embodiment of the invention. The SQL Server Integration Services training package, which is a part of the SQL Server 2008 R2 Database Platform, provides a suitable implementation of model training/retraining process 550. The R2 Database Platform provides higher level tools that reduce the need for programming; it is important to note, however, that the services provided by this platform may be implemented in other languages and platforms with equal effectiveness. In this embodiment, the entire predictive model automation process is constructed utilizing a combination of the SQL Server Database Engine, SQL Server Integration Services packages, and models and predictive model objects stored in SQL Server Analysis Services. - As shown in
FIG. 6, beginning at 610, a list of claims used to train the model is rebuilt. Step 620 indicates that the list was successfully rebuilt. In a preprocessing step, the list of claims used in a TEST set to evaluate the performance of the model is cleared from the mining structure during a truncate test table step 630, indicated at 640 when successfully completed. The mining structure and mining models are reprocessed during an analysis services processing step 650. This step includes loading the new claims from the rebuilt list of claims in the build model step 610 into the mining structure. A random 30% of the data from these new claims is held out for the previously mentioned TEST set. Also, retraining of the model occurs during the analysis services processing step 650 based on the remaining 70% of new claims from the build model step 610. The TEST set now stored in the mining structure from the analysis services processing step 650, if successful at 660, is written to a database table during a data flow task step 670, indicated at 680 as successful. -
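The 70/30 holdout performed during the analysis services processing step can be sketched as follows. This is a simplified illustration: the fixed seed and the use of integer claim identifiers are assumptions made for the example.

```python
import random

def split_train_test(claim_ids, test_fraction=0.30, seed=42):
    """Randomly hold out a TEST set and keep the remainder for
    retraining, mirroring the 30%/70% split of newly loaded claims."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    ids = list(claim_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * test_fraction)
    return ids[cut:], ids[:cut]        # (training ~70%, TEST ~30%)

train_ids, test_ids = split_train_test(range(100))
# 70 claims retrain the model; 30 are written out as the TEST set
```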
FIG. 7 depicts an embodiment of the model scoring layer 540 in greater detail, as implemented in the previously mentioned SQL Server Integration Services training package. As depicted, a store run time step 710, indicated at 720 when successful, controls an archive opens step 730, indicated at 740 when successful. As shown, from the archive opens step 730, operations proceed to a score open claims step 750 to track and archive the status of open claims. Success of the score open claims step 750 is indicated at 760. - The store
run time step 710 further controls a score validation claims step 770, indicated at 780 when successful. Model performance is checked over time in the score validation claims step 770 as well as a store validation metrics step 785, indicated at 790 when successful. To verify that a given model is performing as expected, certain test cases, termed a "validation set," are pulled out of the general population of claims data and scored by the models. The validation set changes over time to include new claims that have recently closed or met some other criteria. The output of the model is checked against reality by tracking the costs associated with a given claim. Since the claim's outcome is known, scoring the associated model record will lead to either an accurate or inaccurate prediction. Data is pulled from the claim data repository 350 depicted in FIG. 3. The validation set input data 516 depicted in FIG. 5 is used to create the test cases. A confusion matrix is created from scored validation data 576 and then used to generate model performance metrics data 581. This data indicates whether the model is performing as expected. -
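The confusion matrix built from scored validation data, and the performance metrics derived from it, might be computed along these lines. This is a generic sketch: the label names and the particular metric choices are assumptions, not the patent's stored implementation.

```python
def confusion_matrix(actuals, predictions, positive="High Risk"):
    """Tally the outcomes of scoring validation records whose true
    results are already known."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(actuals, predictions):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

def performance_metrics(cm):
    """Derive simple performance metrics from a confusion matrix."""
    return {
        "recall": cm["tp"] / (cm["tp"] + cm["fn"]),
        "precision": cm["tp"] / (cm["tp"] + cm["fp"]),
        "accuracy": (cm["tp"] + cm["tn"]) / sum(cm.values()),
    }

actual    = ["High Risk", "High Risk", "Standard Risk", "Standard Risk"]
predicted = ["High Risk", "Standard Risk", "Standard Risk", "Standard Risk"]
cm = confusion_matrix(actual, predicted)
# cm == {"tp": 1, "fp": 0, "tn": 2, "fn": 1}
```

Tracking these figures run over run is what allows the system to detect a model drifting out of expected performance.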
FIG. 8 depicts the report engine 130 in further detail. Incoming claim files 810 that have been designated high risk by the scoring engine 120 are stored in a prediction database 820 and are used by a prediction report 830. An intervention database 825 maintains model interventions associated with different risks. The scoring engine 120 generates a great deal of data about an individual claim. Not only is a current prediction available, but historical predictions are also available. This is a more complete picture of an individual claim's risk than a single point estimate. In addition, the many variables that are found in the model scoring record are not always of value for deciphering what is driving claim riskiness. Organizing the model data and the entire prediction history paints a more compelling picture of claim risk factors. Thus, the data from the model output layer 570 is transformed into the prediction report 830. After the process is run, scored open claim data 572 is available for use by the prediction database 820. The data is then pulled from the prediction database 820 and used by the report 830. The report engine 130 preferably generates a prediction report 830 that displays, for example, model variables categorized into one of eight “risk factor categories,” as illustrated in Table 1: -
TABLE 1
Claimant Personal Risk Factors: Attributes particular to the injured worker. The severity score would consider factors such as the age and gender of the claimant as well as the claimant's tenure with the employer.
Claim Risk Severity: Attributes that are related to the type of injury suffered by the claimant. The severity score would consider the seriousness of the work related injury - broken arm, strained back, burned hand, head injury, etc.
Non-Pharmacological Treatment Risk Factors: Attributes found in the medical data that are not related to pharmacy. The severity score will consider the diagnoses that appear on medical bills as well as the medical services being performed.
Pharmacological Treatment Risk Factors: Attributes found in the medical data that involve prescription drugs. The severity score will consider the types of drugs that are prescribed to the claimant.
Claimant Biological Risk Factors: Attributes associated with the medical bills that are related to coexisting medical conditions. Coexisting conditions are medical issues that are not necessarily related to the compensable workers compensation injury, but nevertheless appear in the medical bill data. The severity score would consider issues such as diabetes, hypertension and other similar conditions.
Physician Outcome (when Available): Not available at this time.
Claimant Psycho-Social Risk Factors: Attributes related to the claimant's lifestyle choices and psychological risk factors. The severity score would consider things like tobacco use, alcohol use, substance abuse, depression, post-traumatic stress disorder, etc.
Lost Time Risk Factors: Attributes related to the lost time expenses and estimates. The severity score considers the indemnity expenses on the claim as well as how the diagnosis codes (from the medical data) impact the expected lost time.
Regulatory/Legal Risk Factors: Attributes related to the legal environment associated with the jurisdiction governing the claim.
The severity score considers jurisdiction-related factors such as the ability to settle medical/indemnity and who can direct care on the claim. - As part of the transformation into each of these “risk factor categories,” in some instances, additional explanatory predictive models are utilized to grade risk in the respective category. In risk categories without a predictive model, the risk score is based on crosswalk data that identifies risk. For example, the state risk is based on a third party tool that assesses workers' compensation risk by jurisdiction. A further transformation takes the current point estimate prediction and factors in past prediction changes to give an indication of the prediction trend for each claim—Increasing, Decreasing or Flat. The trend indication allows for varying degrees of change (depending on the current prediction score relative to the previous score change) before an alert of an increasing or decreasing trend is presented. This more accurately reflects whether a reported change in trend is material. Representative selections of code that perform various portions of this transformation, such as the initial selecting of data used to render the evaluation of comorbidity factors, are detailed below:
-
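In lieu of the original code (not reproduced in this text), the trend-indication portion of the transformation can be sketched as follows. The fixed tolerance is an assumption made for the example; the specification states that the actual implementation varies the allowed change with the current prediction score.

```python
def prediction_trend(current_score, previous_score, tolerance=5):
    """Classify the prediction trend for a claim as Increasing,
    Decreasing or Flat on the 0-100 risk score scale.  Changes within
    `tolerance` points are treated as Flat so that immaterial movement
    does not raise a trend alert."""
    delta = current_score - previous_score
    if delta > tolerance:
        return "Increasing"
    if delta < -tolerance:
        return "Decreasing"
    return "Flat"

# prediction_trend(82, 60) -> "Increasing"
# prediction_trend(55, 58) -> "Flat"
```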
- The resulting data from these additional transformations are the foundation of the
prediction report 830. - Referring further to
FIG. 8, the prediction report 830 generated by the report engine 130 serves as the primary user interface into the prediction and intervention suggestion system. The report contains claim identification data in addition to the risk factor categories and prediction trend indication. In an embodiment, an indication of the $50,000 prediction from MdlTRIAGEEXT001 appears on this report. In an alternative embodiment, a prediction and trend indication is given for the MdlTRIAGEINT001 model (the likely-to-exceed-retention model). - Most, if not all, model variable categories have interventions stored in the
Intervention database 825. In some instances, an identified risk factor linked to a specific intervention, as indicated at 840, can be determined based upon prediction report 830. Not every risk factor can be mitigated, however. For example, claimant personal risk factors cannot be mitigated by intervention. Likewise, regulatory and/or legal risk factors are based on the law of the jurisdiction governing the claim and, thus, cannot be changed. However, most other risk factors have specific suggested interventions. The specific suggested interventions are cataloged in intervention database 825. The design is flexible so as to support interventions for both generic and specific risk factors. There is no limit to the number of interventions that can be configured. Included below in Table 2 is an exemplary list of interventions tied to specific risk factors; a more complete description of particularly relevant interventions follows the table. The interventions are based on the expert medical opinion of a medical doctor but apply generically based on the risk factor category: -
TABLE 2
Non-Pharmacological Treatment Risk Factors
  Interventions to consider: Utilization Review using State-Specific Practice Guidelines, ODG/ACOEM Practice Guidelines, Medical Society Practice Guidelines, Cochrane or others; IME, Second Opinion or Chart Review with Telephonic Peer Intervention.
  Rationale: Is the appropriate and best treatment being rendered?
Pharmacological Treatment Risk Factors
  Interventions to consider: PBM use, including formulary management, Drug Indication Review, Independent Pharmacist Evaluation +/− Telephonic Peer Intervention.
  Rationale: Are the appropriate and most cost-effective drugs being used? Are the prescribed drugs related to the claim? Are there adverse events due to the drugs?
Claimant Biological Risk Factors
  Interventions to consider: IME, Second Opinion or Chart Review with Telephonic Peer Intervention.
  Rationale: Is the appropriate and best treatment being rendered?
Claimant Psycho-Social Risk Factors
  Interventions to consider: Cognitive Behavioral Therapy (CBT) Evaluation.
  Rationale: Could there be a psychological disorder or risk factor? Could there be substance abuse?
Lost Time Risk Factors
  Interventions to consider: Return to Work Program Evaluation.
  Rationale: Are there any possible accommodations that would allow the IW to return to work? Are there other avenues for return to work, including volunteer work and specialized RTW programs?
Other Considerations
  Interventions to consider: Functional Restoration Program.
  Rationale: High scores in 2 or more Risk Categories may point to the need for an intensive, multidisciplinary intervention, where physical and psychosocial concerns are addressed, with a view to restoring function, decreasing pain and reducing Rx costs.
- Utilization Review (UR)—Is allowed in many jurisdictions; other jurisdictions do not address the use of UR but do not prohibit it.
Consideration should be given to performing formal or informal utilization review through a reputable URO in order to assess whether or not a particular course of action is supported by evidence-based medical guidelines, such as those published by: States, WLDI (ODG), ACOEM, Medical Societies or independent groups such as Cochrane.
- An Independent Medical Evaluation (IME), Second Opinion or Chart Review with Telephonic Peer Intervention allows one to assess the adequacy of the diagnosis and treatment plan and to suggest alternative management.
- Pharmacy Benefit Managers (PBM) have many available tools to ensure appropriate and cost-effective use of prescription medications, in addition to any savings they accomplish by reducing pharmacy bills. In alternative embodiments, some of these strategies include formulary management, drug indication reviews (DIR; helps identify the appropriateness of a medication to the compensable diagnosis), independent pharmacy evaluations and more intensive programs to assess and manage prescription patterns (including telephonic peer review consultations). Modern Medical, Inc. has an excellent Opioid Defense Manager that identifies opioid overuse prospectively and intervenes at the level of the prescriber and injured worker.
- Cognitive Behavioral Therapy (CBT) can help to address psychosocial risk factors that delay recovery and increase the cost of a claim. COPE, a national CBT provider group, understands workers' compensation and does not use psychiatric diagnostic or billing codes. They can evaluate the patient and recommend, if appropriate, a limited intervention to help injured workers recover more quickly.
- Functional Restoration Programs (FRP) are multi-disciplinary, intensive interventions that address both psychosocial factors (fear, disability mindset, catastrophic thinking, stress, anxiety) and medical factors (deconditioning, pain and opioid abuse). Their intervention is intensive (30-40 hours weekly for 2-6 weeks, depending on severity and age of the claim) and has as its aim the recovery of function, return to work, reduction in pain and elimination or decrease in the use of opioids and other medications. An evaluation lasts one to two days in one embodiment, but varies in duration in other embodiments, and can identify patients that are most likely to succeed.
- Still referring to
FIG. 8, in an embodiment the prediction report 830 is delivered via the Internet, shown at 850, or other suitable data communications network to web report users 860 and web service consumers 870. Not all consumers 860 of this prediction and intervention process will want to receive reporting online. In one embodiment, consumers 860 will integrate the output into existing business systems. In other embodiments, integration is not necessary or desired. Therefore, a secure web service, for example, provides prediction report data generated by the report engine 130 to such web service consumers 870. The web service requires individualized authentication and will take a claim number or the like as input and return the contents of the corresponding prediction report 830 as an output generated by the report engine 130. -
FIGS. 9 and 10 are exemplary screenshots, for illustrative purposes only, depicting contents of the prediction report according to one alternative embodiment. Risk factor categories are displayed including, for example, Lost Time Risk Factors 910 and Regulatory/Legal Risk Factors 920. A Risk Score 930 can be thought of as a certainty measure that further explains the High Risk/Standard Risk determination made by scoring engine 120. The closer this probability is to 1, the more certain the model is about the high risk prediction. The closer the probability is to 0, the more certain the model is of a standard risk claim, with 0.5 being an exemplary transition point between high risk and standard risk. The score from the model is displayed in FIG. 9 as the probability multiplied by 100 and appears in the column titled "Risk Score." - Although described in connection with an exemplary computing system environment, embodiments of the aspects of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. The computing system environment is not intended to suggest any limitation as to the scope of use or functionality of any aspect of the invention. Moreover, the computing system environment should not be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment. Examples of well-known computing systems, environments, and/or configurations that are suitable for use with aspects of the invention in various embodiments include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, network PCs, minicomputers, mainframe computers, distributed computing environments (such as cloud-based computing) that include any of the above systems or devices, and the like.
- Embodiments of the aspects of the invention are described in the general context of data and/or processor-executable instructions in various embodiments, such as program modules, stored on one or more tangible, non-transitory storage media and executed by one or more processors or other devices. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. In various embodiments, aspects of the invention are also practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In various embodiments of a distributed computing environment, program modules are located in both local and remote storage media including memory storage devices.
- In alternative embodiments, processors, computers and/or servers execute the processor-executable instructions (e.g., software, firmware, and/or hardware) such as those illustrated herein to implement aspects of the invention.
- Alternative embodiments of the aspects of the invention are implemented with processor-executable instructions. The processor-executable instructions are organized into one or more processor-executable components or modules on a tangible processor readable storage medium in various embodiments. Aspects of the invention are implemented with any number and organization of such components or modules in various embodiments. For example, aspects of the invention are not limited to the specific processor-executable instructions or the specific components or modules illustrated in the figures and described herein. Other alternative embodiments of the aspects of the invention include different processor-executable instructions or components having more or less functionality than illustrated and described herein.
- The order of execution or performance of the operations in embodiments of the aspects of the invention illustrated and described herein is not essential, unless otherwise specified. That is, in alternative embodiments, the operations are performed in any order, unless otherwise specified, and embodiments of the aspects of the invention include additional or fewer operations than those disclosed herein. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the invention.
- When introducing elements of aspects of the invention or the embodiments thereof, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that alternative embodiments include additional elements other than the listed elements.
- In view of the above, it will be seen that several advantages of the aspects of the invention are achieved and other advantageous results attained.
- Not all of the depicted components illustrated or described are required in alternative embodiments. In addition, alternative implementations and embodiments include additional components. Variations in the arrangement and type of the components are capable of being made in alternative embodiments without departing from the spirit or scope of the claims as set forth herein. Alternative embodiments provide that additional, different or fewer components are capable of being provided and components combined. Further, alternative embodiments provide for a component implemented alternatively or in addition by several components.
- The above description illustrates the aspects of the invention by way of example and not by way of limitation. This description enables one skilled in the art to make and use the aspects of the invention, and describes several embodiments, adaptations, variations, alternatives and uses of the aspects of the invention, including what is presently believed to be the best mode of carrying out the aspects of the invention. Additionally, it is to be understood that the aspects of the invention are not limited in their application to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings. The aspects of the invention are capable of other embodiments and of being practiced or carried out in various ways. Also, it will be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.
- Having described aspects of the invention in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the invention as defined in the appended claims. It is contemplated that various changes could be made in the above constructions, products, and process without departing from the scope of aspects of the invention. In the preceding specification, various preferred embodiments have been described with reference to the accompanying drawings. It will, however, be evident that alternative embodiments provide for various modifications and changes to be made thereto, and additional embodiments implemented, without departing from the broader scope of the aspects of the invention as set forth in the claims that follow. The specification and drawings are accordingly to be regarded in an illustrative rather than restrictive sense.
- Predictive Model 1: Identify Claims at Risk of Exceeding the SIR/Deductible Introduction
- This will serve as the primary documentation for version 1.0 of the Internal WC claims triage model. The goal of this model is to identify higher risk claims for review by the claims department so that they can be effectively triaged, specifically—
-
- “To identify claims, for review and action by Claims Triage personnel, which have not otherwise been reported or claims that are not currently open, and the identified claim has potential large-loss exposure.”
- The model takes in a host of variables/features, discussed below, and outputs a probability of becoming an excess claim.
- This model data resides in the MDLTRIAGEINT001 and MDLCOMMONTRIAGEINT databases on the production system. The model itself resides in the TRIAGEINT001 database of the production Analysis Services instance.
- The training set includes: a) all open large loss claims, and b) all closed claims that did not become large losses. Data deemed to have insufficient completeness of medical or pharmacy data is not included. 80% of the claims meeting these criteria were used for training.
- The validation set has the same criteria as the training set, except that it uses the 20% of the data that was not used for training; data deemed to have insufficient completeness of medical or pharmacy data is likewise excluded.
- A full set of potential training data resides in the table MdlTRIAGEINT001_Training. The features for the model can be divided into several classes:
-
- 1. Claim and Policy variables—are sourced from claim/policy level data. Potential features in this category are listed below:
- a. Claim Administrator—the administrator's name
- c. Claim Number—Claim number assigned to the claim
- e. Claimant Name—Name of the Injured Worker
- f. Claimant's Gender—Gender of the Injured Worker
- i. Policy Number—policy number under which the claim is covered
- j. Policy Effective Date—Effective date of the policy under which the claim is covered
- k. Accident description—description of the accident
- n. Nature of Injury—NCCI nature of injury description
- o. Part of Body—NCCI part of body description
- p. Cause of Injury—NCCI cause of injury description
- q. NOI POB Severity Score—severity score based on the Nature of Injury and Part of Body NCCI categories. The scoring process is discussed below. Some combinations are unclassified.
- r. COI Severity Score—severity score based on the Cause of Injury NCCI category.
- s. Claim Status—the status of the claim (either Open/Closed or more rarely Unknown)
- t. Claim Reopen—a Boolean identifying claims that reopen. 0 indicates no reopen, 1 indicates a reopen.
- u. Claimant birth date—date of birth for the injured worker
- x. Date of Injury—date the accident occurred
- bb. Benefit state—identifies which state's laws govern claim benefits
- gg. Hire Date—date injured worker was hired (Jan. 1, 1900 is an unknown).
- jj. Employee job class code—code that identifies the job category of the injured worker
- aaa. Total Paid—total medical, indemnity and expense paid on a given claim
- bbb. Self-Insured Retention—the amount of loss the insured retains for the claim as governed by the policy terms
- 2. Medical Diagnosis variables. These are sourced from ICD9 codes on the medical bills.
- q. Back Severity—Back injuries given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- r. Brain Severity—brain injuries given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- s. Burn Severity—same as q. for burn injuries given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- v. Knee Severity—same as q. for knee severity
- z. Shoulder Severity—shoulder injuries given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- aa. Spinal Cord Severity—spinal cord injury given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- ff. Comorbidity—Diabetes—a flag indicating diabetes
- hh. Comorbidity—Hypertension—a flag indicating hypertension
- jj. Comorbidity Obesity—a flag indicating obesity
- 3. Medical Service variables are related to classified service codes sourced from the medical bills. They are grouped using CMS BETOS classifications.
- a. Musculoskeletal Procedure Flag—high risk musculoskeletal procedure
- d. Emergency Room Service Flag—high risk ER services
- f. Imaging Flag—flag indicating the presence of x-ray/CT or MRI scans
- 4. Pharmacy variables are sourced using the drug codes on the medical bills, with the drug data filled in using a cross-reference database.
- a. NSAID Flag—a flag indicating the use of NSAID drugs
- b. Opioid Flag—a flag indicating the use of opioid based pain relievers.
- c. Muscle Relaxant Flag—a flag indicating the use of muscle relaxant drugs
- 5. Other Model variables are derived variables
- g. Risk Category—a ‘High’ or ‘Low’ risk category based on the total paid at a threshold of $50,000
- h. Total Incurred—the sum of the claim's paid losses and loss reserves
- Because of the tremendous number of NCCI categories and ICD9 codes, these categories need to be grouped into a smaller subset in order to build effective models. To do this, we implement a severity score as described below.
- The NOI and POB fields interact strongly, so wherever possible we want to use both fields together to set the severity. We do this using the following algorithm:
-
- 1. If the combined NOI/POB have at least 100 claims, use the combined NOI and POB and calculate the probability of high risk.
- 2. If the combined NOI/POB have less than 100 claims, and either the NOI or POB individually have more than 100 claims, use the probability of high risk calculated from that field. If BOTH individually have more than 100, use the higher probability.
- 3. If neither category has 100 claims in it, then group up the POB based on the body part (toe gets grouped with foot, fingers with hands, etc.) and check steps 1 and 2 again.
- 4. Now that each NOI/POB has a probability of high risk associated with it, score as follows: the top x % get a score of 4, the next x % get a 3, the next x % get a 2, and the remainder are scored as a 1.
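The fallback scoring steps above can be sketched in Python. This is a minimal illustration using in-memory claim lists; the 100-claim cutoff comes from the text, while the quartile cut points stand in for the unspecified x % buckets:

```python
def high_risk_probability(claims):
    """Fraction of the claims in a group that are high risk (illustrative)."""
    return sum(c["high_risk"] for c in claims) / len(claims) if claims else None

def noi_pob_probability(combined, noi_only, pob_only, min_claims=100):
    """Steps 1-3: prefer the combined NOI/POB group, then fall back to
    whichever individual field has enough claims."""
    if len(combined) >= min_claims:
        return high_risk_probability(combined)
    qualifying = [g for g in (noi_only, pob_only) if len(g) >= min_claims]
    if qualifying:
        # If BOTH fields qualify individually, take the higher probability.
        return max(high_risk_probability(g) for g in qualifying)
    return None  # step 3: regroup POB by body region and retry steps 1 and 2

def severity_score(prob, all_probs):
    """Step 4: bucket each group's probability into a 1-4 severity by rank
    (quartiles stand in for the unspecified x % cut points)."""
    ranked = sorted(all_probs, reverse=True)
    rank, n = ranked.index(prob), len(ranked)
    if rank < n * 0.25:
        return 4
    if rank < n * 0.50:
        return 3
    if rank < n * 0.75:
        return 2
    return 1
```

The same probability-then-bucket pattern applies to the COI severity process below, with grouping by common cause instead of by NOI/POB.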
-
-
- 1. Group together COI codes based on common causes (all air/boat/motor vehicle collisions get lumped together)
- 2. Set the probability of high risk for each grouped COI for groups with 100 or more claims
- 3. Apply step 4 above.
- The revised ICD9 process is used in the Diagnosis Related Variables:
-
- 1. Instead of dividing into nurse-driven categories, use the entire set of ICD9 codes
- 2. For ICD9 codes with at least 100 claims, calculate the probability of high risk
- 3. For 5 digit ICD9 codes with less than 100 claims, group them with the 4 digit category code and assign probability that way, where possible
- 4. Fill in severities of 1-4 as used in the other scoring processes
- 5. Break the severities back into medical categories
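Steps 2 and 3 of this process amount to a sparse-code fallback: score at the 5-digit level when enough claims exist, otherwise back off to the 4-digit category code. A minimal Python sketch (the ICD9 codes, counts, and dictionary layout are illustrative assumptions):

```python
def icd9_probability(code, counts, highs, min_claims=100):
    """P(high risk) for an ICD9 code; fall back to the 4-digit
    category code when the full code has too few claims."""
    if counts.get(code, 0) >= min_claims:
        return highs.get(code, 0) / counts[code]
    parent = code[:-1]  # e.g. hypothetical '722.52' -> category '722.5'
    if counts.get(parent, 0) >= min_claims:
        return highs.get(parent, 0) / counts[parent]
    return None  # too sparse at both levels; left unscored
```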
- The area under the curve measurement in R is 0.86 using the 20% holdout sample. The ratio of true positives to all actual positives is 87%. The ratio of false positives to all predicted positives is 19%.
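These figures correspond to standard classifier metrics that can be checked by hand. A sketch of each (the counts in the tests are invented, not the model's actual confusion matrix):

```python
def recall(tp, fn):
    """Ratio of true positives to all actual positives."""
    return tp / (tp + fn)

def false_discovery_rate(tp, fp):
    """Ratio of false positives to all predicted positives."""
    return fp / (tp + fp)

def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the probability a random positive outscores a random negative."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```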
- Feature selection for a model of this type is a tricky process. Several variables (e.g., the financial variables) are obviously highly correlated with expensive claims but must be excluded because they are unfit for an early identification model. At all times, we have tried to restrict ourselves to variables that can show up early in a claim's lifetime.
- Below is an example of the variable importance measures used to identify relevant features for the model.
-
Attribute | Chi Squared Statistic
---|---
Self-Insured Retention | 117.949
Benefit State | 84.054
Comorbidity - Diabetes | 83.502
Brain Severity | 29.355
Musculoskeletal Procedure Flag | 20.295
Opioid Flag | 15.367

- The logistic regression was chosen as the ultimate solution. Although it performs slightly worse than the neural network, it is less of a black box and lends itself to understanding what drives predictions.
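Chi-squared statistics like these can be computed directly for a binary feature against the High/Low risk label. A hand-rolled version for a 2x2 contingency table, shown as a minimal sketch (the counts in the test are invented):

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 contingency table
    [[a, b], [c, d]]: feature present/absent crossed with high/low risk."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator
```

Features with larger statistics are more strongly associated with the risk label, which is how a ranking like the table above is produced.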
- The model is mdlLogisticRegression in the mining structure mdlTRIAGEINT001_Training.
- The ratio of true positives to all actual positives is 89%. The ratio of false positives to all predicted positives is 17%.
- To assess the validity of the model, a separate validation set, mdlTRIAGEINT001_Validation, has been built based on the description of the validation data above. The ratio of true positives to all actual positives has been running at 82-83%. The ratio of false positives to all predicted positives has been running at 21%. The performance against the validation data is understandably somewhat worse, since data with insufficient completeness is included.
- All scored claim results are saved in an archive table (mdlTRIAGEINT001_OpenArchive), so over time we will build up a set of data for actual historical validation of the model.
-
-
- mdlTRIAGEINT001_Metrics—Data for the confusion matrix of the validation set
- mdlTRIAGEINT001_Open—a table containing all the features of the open claims
- mdlTRIAGEINT001_OpenScored—a table containing the scored open claims
- mdlTRIAGEINT001_OpenArchive—the archive of all historical scored claims
- mdlTRIAGEINT001_Test—the randomly withheld test set. This is created by Analysis Services when the model is trained.
- mdlTRIAGEINT001_Training—the data available for training before splitting off the test set
- mdlTRIAGEINT001_Validation—the validation set consisting of the validation data as described above
- mdlTRIAGEINT001_ValidationScored—the scored validation claims. Contains all validation claims that have ever been scored.
- mdlxRefTRIAGEINT001_COISeverity—severity scores for the COI NCCI fields
- mdlxRefTRIAGEINT001_ICD9Severity—severity scores using the revised ICD9 scoring process
- mdlxRefTRIAGEINT001_POBNOISeverity—severity scores for the NOI/POB NCCI fields
- tempmdlTRIAGEINT001_ClaimBenefitState—benefit state assignment by claim
- tempmdlTRIAGEINT001_ClaimDiagnosisSeverity—a temporary table containing each claim and its diagnoses with severity scores
- tempmdlTRIAGEINT001_Rx—a temporary table containing each claim and its Rx fields
- tempmdlTRIAGEINT001_ModelSourceData—a temporary table containing the full feature set for all claims. This is the source for the mdlTRIAGEINT001_Training table.
-
-
- spcGetErrorinfo—a helper process that logs errors upon failure of any of the other routines
- spcmdlTRIAGEINT001_BuildOpenSet—builds the open set from the source data
- spcmdlTRIAGEINT001_BuildTrainingSet—builds the mdlTRIAGEINT001_Training table from the source data
- spcmdlTRIAGEINT001_BuildValidationSet—builds the validation set from the source data
- spcmdlTRIAGEINT001_ClaimDiagnosisSeverity—builds the temporary medical severity table described above
- spcmdlTRIAGEINT001_ClaimExclusions—builds the set of claim exclusions
- spcmdlTRIAGEINT001_PolicyPeriodInfo—builds the set of data from the policy system
- spcTRIAGEINT001_List—builds the Rx temporary table described above
- spcTRIAGEINT001_ModelSourceData—builds the source data temporary table described above
- spcTRIAGEINT001_OpenSet—builds the open claims data. No inputs; outputs the refreshed temp tables and the mdlTRIAGEINT001_Open table
- spcTRIAGEINT001_TrainingSet—builds the training data. No inputs; outputs the refreshed temp tables and the mdlTRIAGEINT001_TrainingSet table
- SQL Server Agent Jobs
-
- mdlTRIAGEINT001_BuildAndScoreOpens—This archives the open claims, builds the new open claims and validation sets, scores them, and updates the metrics table.
- mdlTRIAGEINT001_BuildTrainingSet—This trains the model by updating the training set, retraining the model, and then refreshing the table containing the test set.
- SSIS Packages
-
- mdlTRIAGEINT001_BuildOpen—this package invokes the stored procedure that builds the open set
- mdlTRIAGEINT001_BuildTraining—this package invokes the stored procedure that builds the training set, retrains the model, and refreshes the test set
- mdlTRIAGEINT001_BuildValidation—this package invokes the stored procedure that builds the validation set
- mdlTRIAGEINT001_ScoreOpens—this package archives the opens, scores both the opens and the validation set, and calculates the confusion matrix for the validation set.
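The archive-then-score flow implemented by the scoring package above can be sketched as follows; the claim fields and the toy model here are placeholders for illustration, not the production implementation:

```python
def score_opens(open_claims, archive, model):
    """Archive the current open claims, then score each one,
    attaching the model's probability of high risk."""
    archive.extend(open_claims)  # history accumulates for later validation
    return [dict(claim, score=model(claim)) for claim in open_claims]

# Toy stand-in model: bump the score when an opioid flag is present.
toy_model = lambda claim: 0.9 if claim.get("opioid") else 0.1
```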
- Predictive Model 2: Identify Claims at Risk of Exceeding $50,000 in Total Cost
- This will serve as the primary documentation for version 1.0 of the Primary WC claims triage model. The goal of this model is to identify higher risk claims in order for clients to effectively triage, specifically—
-
- “To identify claims, for review and action by Claims Administrator personnel,
- which are likely to exceed $50 k in total spend.”
- The model takes in a host of variables/features, discussed below, and outputs a probability of exceeding the 50 k threshold.
- This model's data resides in the MDLTRIAGEEXT001 and MDLCOMMONTRIAGEEXT databases on the production system. The model itself resides in the TRIAGEEXT001 database of the production Analysis Services instance.
- The training set is restricted to claim administrators where we have reasonably complete data. For training we use all closed claims and those open claims that have already hit the 50 k threshold.
- A full set of potential training data resides in the table mdlTRIAGEEXT001_TrainingSet. The features for the model can be divided into several classes:
-
- 1. Claim and Policy variables—are sourced from claim/policy level data. Potential features in this category are listed below:
- a. Claim Administrator—the administrator's name
- c. Claim Number—Claim number assigned to the claim
- e. Claimant Name—Name of the Injured Worker
- f. Claimant's Gender—Gender of the Injured Worker
- i. Policy Number—policy number under which the claim is covered
- j. Policy Effective Date—Effective date of the policy under which the claim is covered
- k. Accident description—description of the accident
- n. Nature of Injury—NCCI nature of injury description
- o. Part of Body—NCCI part of body description
- p. Cause of Injury—NCCI cause of injury description
- q. NOI POB Severity Score—severity score based on the Nature of Injury and Part of Body NCCI categories. The scoring process is discussed below. Some claims are unclassified.
- r. COI Severity Score—severity score based on the Cause of Injury NCCI category.
- s. Claim Status—the status of the claim (either Open/Closed or more rarely Unknown)
- t. Claim Reopen—a Boolean identifying claims that reopen (0 = no reopen, 1 = reopen)
- u. Claimant birth date—date of birth for the injured worker
- x. Date of Injury—date the accident occurred
- bb. Benefit state—identifies which state's laws govern claim benefits
- gg. Hire Date—date the injured worker was hired (Jan. 1, 1900 indicates an unknown date)
- jj. Employee job class code—code that identifies the job category of the injured worker
- aaa. Total Paid—total medical, indemnity and expense paid on a given claim
- bbb. Self-Insured Retention—the amount of loss the insured retains for the claim as governed by the policy terms
- 2. Medical Diagnosis variables. These are sourced from ICD9 codes on the medical bills.
- q. Back Severity—Back injuries given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- r. Brain Severity—for brain injuries given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- s. Burn Severity—same as q. for burn injuries given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- v. Knee Severity—same as q. for knee severity
- z. Shoulder Severity—shoulder injuries given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- aa. Spinal Cord Severity—spinal cord injury given severity scores with the newer automated process described below, using refined sets of ICD9 codes
- ff. Comorbidity—Diabetes—a flag indicating diabetes
- h. Comorbidity—Hypertension—a flag indicating hypertension
- jj. Comorbidity Obesity—a flag indicating obesity
- 3. Medical Service variables relate to classified service codes sourced from the medical bills. They are grouped using CMS BETOS classifications.
- a. Musculoskeletal Procedure Flag—high risk musculoskeletal procedure
- d. Emergency Room Service Flag—high risk ER services
- f. Imaging Flag—flag indicating the presence of x-ray/CT or MRI scans
- 4. Pharmacy variables are sourced using the drug codes on the medical bills, with the drug data filled in using a cross-reference database.
- a. NSAID Flag—a flag indicating the use of NSAID drugs
- b. Opioid Flag—a flag indicating the use of opioid based pain relievers.
- c. Muscle Relaxant Flag—a flag indicating the use of muscle relaxant drugs
- 5. Other Model variables are derived variables
- g. Risk Category—a ‘High’ or ‘Low’ risk category based on the total paid at a threshold of 50 k
- h. Total Incurred—the sum of the claim's paid losses and loss reserves
- Because of the tremendous number of NCCI categories and ICD9 codes, these categories must be grouped into a smaller subset before effective models can be built. To do this, we implement a severity score as described below.
- The NOI and POB fields interact strongly, so wherever possible we want to use both fields together to set the severity. We do this using the following algorithm:
-
- 1. If the combined NOI/POB have at least 100 claims, use the combined NOI and POB and calculate the probability of high risk.
- 2. If the combined NOI/POB have less than 100 claims, and either the NOI or POB individually has more than 100 claims, use the probability of high risk calculated from that field. If BOTH individually have more than 100, use the higher probability.
- 3. If neither category has 100 claims in it, then group up the POB based on the body part (toe gets grouped with foot, fingers with hands, etc.) and check steps 1 and 2 again.
- 4. Now that each NOI/POB has a probability of high risk associated with it, score as follows: the top x % get a score of 4, the next x % get a 3, the next x % get a 2, and the remainder are scored as a 1.
-
-
- 1. Group together COI codes based on common causes (all air/boat/motor vehicle collisions get lumped together)
- 2. Set the probability of high risk for each grouped COI for groups with 100 or more claims
- 3. Apply step 4 above.
- The revised ICD9 process is used in the Diagnosis Related Variables:
-
- 1. Instead of dividing into nurse-driven categories, use the entire set of ICD9 codes
- 2. For ICD9 codes with at least 100 claims, calculate the probability of high risk
- 3. For 5 digit ICD9 codes with less than 100 claims, group them with the 4 digit category code and assign probability that way, where possible
- 4. Fill in severities of 1-4 as used in the other scoring processes
- 5. Break the severities back into medical categories
- Feature selection for a model of this type is a tricky process. Several variables (e.g., the financial variables) are obviously highly correlated with expensive claims but must be excluded because they are unfit for an early identification model. At all times, we have tried to restrict ourselves to variables that can show up early in a claim's lifetime.
- An example of the measures used for feature selection is given below:
-
Attribute | Value | Score (favors HIGH or LOW)
---|---|---
Musculoskeletal Procedure Flag | 1 | 100
Back Severity | >=4 | 88.63
Comorbidity Obesity Flag | >=1 | 86.82
Muscle Relaxant Flag | >=1 | 78.64
Claimant's Gender | Female | 43.22
NSAID Flag | <1 | 26.81

- The neural net was chosen as the ultimate solution for two compelling reasons: it had a slight edge in performance throughout the development process, and it is harder for clients to reverse engineer the probability scores it produces.
- To deal with the high skew in the data, an oversampling process was used: high risk claims were oversampled up to 30% of the training set. The oversampled training data is in the table mdlTRIAGEEXT001_30pOversample. The oversample percentage was chosen to accommodate a maximum false positive rate of 5%.
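A sketch of such an oversampling routine in Python, mirroring the spirit of the spcOVERSAMPLE parameters described later (source field, target value, percentage, seed); resampling the minority class with replacement is an assumption about the implementation:

```python
import random

def oversample(rows, is_target, target_pct, seed=42):
    """Duplicate randomly chosen target rows until they make up
    roughly `target_pct` of the result (sampling with replacement)."""
    rng = random.Random(seed)
    targets = [r for r in rows if is_target(r)]
    others = [r for r in rows if not is_target(r)]
    # Solve (t + k) / (t + o + k) = p for k extra copies: k = p*o/(1-p) - t.
    need = int(target_pct * len(others) / (1 - target_pct)) - len(targets)
    return rows + [rng.choice(targets) for _ in range(max(0, need))]
```

For example, starting from 10 high risk claims in 100, reaching a 30% high risk share requires duplicating 28 of them.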
- The model is Neural Net in the mining structure mdlTRIAGEEXT001.
- The final model performs very well at its appointed task.
- The final model has a confusion matrix based on the test data (in mdlTRIAGEEXT001_Test):
-
Test Matrix | High Risk (actual) | Low Risk (actual)
---|---|---
High Risk (predicted) | 91% | 5%
Low Risk (predicted) | 9% | 95%

- To assess the validity of the model, a separate validation set has been built (mdlTRIAGEEXT001_Validation); it consists of the 400 most recently closed claims and the 100 youngest claims to hit the high risk threshold:
-
Validation Matrix | High Risk (actual) | Low Risk (actual)
---|---|---
High Risk (predicted) | 92% | 0.25%
Low Risk (predicted) | 8% | 99.75%

- All scored claim results are saved in an archive table (mdlTRIAGEEXT001_OpenArchive), so over time we will build up a set of data for actual historical validation of the model.
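These matrices are column-normalized: each actual-class column sums to 100%. Given raw counts, the normalization is straightforward; the counts in the example below are illustrative, chosen only to reproduce the validation percentages:

```python
def column_normalize(matrix):
    """Normalize confusion-matrix counts so each actual-class column sums
    to 1. `matrix[predicted][actual]` holds the raw counts."""
    col_totals = [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]
    return [[row[j] / col_totals[j] for j in range(len(row))] for row in matrix]
```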
-
-
- mdlTRIAGEEXT001_30pOversample—training data oversampled to a high risk percentage of 30
- mdlTRIAGEEXT001_Metrics—Data for the confusion matrix of the validation set
- mdlTRIAGEEXT001_Open—a table containing all the features of the open claims
- mdlTRIAGEEXT001_OpenScored—a table containing the scored open claims
- mdlTRIAGEEXT001_OpenArchive—the archive of all historical scored claims
- mdlTRIAGEEXT001_Test—the randomly withheld test set. This is created by Analysis Services when the model is trained.
- mdlTRIAGEEXT001_Validation—the validation set consisting of the 100 youngest high risk claims and 400 of the most recently closed claims
- mdlTRIAGEEXT001_ValidationScored—the scored validation claims. Contains all validation claims that have ever been scored.
- mdlxRefTRIAGEEXT001_COISeverity—severity scores for the COI NCCI fields
- mdlxRefTRIAGEEXT001_ICD9Severity—severity scores using the revised ICD9 scoring process
- mdlxRefTRIAGEEXT001_POBNOISeverity—severity scores for the NOI/POB NCCI fields
- tempmdlTRIAGEEXT001_MedicalSeverity—a temporary table containing each claim and its diagnoses with severity scores
- tempmdlTRIAGEEXT001_Rx—a temporary table containing each claim and its Rx fields
- tempmdlTRIAGEEXT001_SourceData—a temporary table containing the full feature set for all claims; this gets broken into training and validation sets.
-
-
- sp_GetErrorinfo—a helper process that logs errors upon failure of any of the other routines
- spcOVERSAMPLE—creates an oversample table. Inputs are: @TABLENAME, a varchar containing the name of the input table you wish to oversample; @SOURCEFIELD, the field you want to oversample on; @TARGETVALUE, the value you want to oversample; @OS_PCT, the percentage oversample you want; and @SEED, the seed for the random number generator. Output is an oversampled table.
- spcTRIAGEEXT001_MedicalSeverity—builds the temporary medical severity table described above
- spcTRIAGEEXT001_List—builds the Rx temporary table described above
- spcTRIAGEEXT001_ModelSourceData—builds the source data temporary table described above
- spcTRIAGEEXT001_OpenSet—builds the open claims data. No inputs; outputs the refreshed temp tables and the mdlTRIAGEEXT001_Open table
- spcTRIAGEEXT001_TrainingSet—builds the training data. No inputs; outputs the refreshed temp tables and the mdlTRIAGEEXT001_TrainingSet table
- spcTRIAGEEXT001_ValidationSet—builds the validation data. No inputs; outputs the refreshed temp tables and the mdlTRIAGEEXT001_ValidationSet table
- SQL Server Agent Jobs
-
- mdlTRIAGEEXT001_BuildAndScoreOpens—This archives the open claims, builds the new open claims and validation sets, scores them, and updates the metrics table.
- mdlTRIAGEEXT001_BuildTrainingSet—This trains the model by updating the training set, retraining the model, and then refreshing the table containing the test set.
- SSIS Packages
-
- mdlTRIAGEEXT001_BuildOpen—this package invokes the stored procedure that builds the open set
- mdlTRIAGEEXT001_BuildTraining—this package invokes the stored procedure that builds the training set, retrains the model, and refreshes the test set
- mdlTRIAGEEXT001_BuildValidation—this package invokes the stored procedure that builds the validation set
- mdlTRIAGEEXT001_ScoreOpens—this package archives the opens, scores both the opens and the validation set, and calculates the confusion matrix for the validation set.
Claims (20)
1. A computer-executable method of identifying elevated risk claims and providing suggestions for mitigating ongoing claim risk, said method comprising the steps of:
retrieving input data stored in a database, said input data determined to be relevant to a claim;
rendering variables for use by a predictive model, said variables comprising the input data and each having an importance score associated therewith based at least in part on a predictive model to which the variables are to be applied;
accessing the predictive model, said predictive model stored on a memory storage device associated with a processor;
executing, by the processor, the predictive model as a function of the accessed variables to yield a risk score for the claim; and
generating a report identifying the claim as an elevated risk claim when the risk score exceeds a predetermined threshold and providing suggestions for mitigating ongoing claim risk based on the variables.
2. The computer-executable method as recited in claim 1, wherein the input data stored in the database relates to one or more of the following types of data or tables: ICD9 code data, workers compensation claim data, claim payment data, medical/prescription billing data, U.S. census data, Social Security disability data, state regulatory issues data, medical coding data, pharmacy database data, chronic condition data, comorbidity data, ICD9 cross reference tables, NCCI cross reference tables, target variable manipulation tables, claim exclusion tables, and an evidence-based medical treatment crosswalk for comparison to medical bill data.
3. The computer-executable method as recited in claim 1, wherein the predictive model comprises one or more of the following types of predictive models: a model that identifies claims likely to exceed a self-insured retention or deductible, and a model that identifies claims likely to exceed a predetermined total cost.
4. The computer-executable method as recited in claim 1, further comprising transforming the input data into model scoring records, said model scoring records representing all relevant variables for the predictive model.
5. The computer-executable method as recited in claim 1, wherein the elevated risk claims comprise catastrophic claims and migratory claims.
6. The computer-executable method as recited in claim 1, wherein the variables are categorized into 1 of 8 risk factor categories.
7. The computer-executable method as recited in claim 6, wherein providing suggestions for mitigating ongoing claim risk based on the variables comprises retrieving specific suggested interventions stored in an intervention database according to risk factor category.
8. A computer-implemented method of mitigating elevated risk workers' compensation claims, the method comprising:
retrieving, by a processor, input data stored in a database, said input data determined to be relevant to mitigating a claim;
rendering, by the processor, variables for use by a predictive model, said variables comprising the input data and each having an importance score based at least in part on a predictive model to which the variables are to be applied;
accessing the predictive model, said predictive model stored on a memory storage device associated with the processor;
executing, by the processor, the predictive model as a function of the accessed variables to yield a risk score for the claim;
retrieving, by the processor, one or more specific suggested interventions stored in an intervention database according to a risk factor category associated with the claim;
generating a report on a computer user interface, said report identifying the claim as a predicted elevated risk claim when the risk score exceeds a predetermined threshold, wherein the report includes the retrieved suggestions for mitigating ongoing claim risk for the elevated risk claim.
9. The computer-implemented method as recited in claim 8, further comprising implementing said one or more suggestions to mitigate ongoing claim risk.
10. The computer-implemented method as recited in claim 9, wherein the data stored in the database relates to one or more of the following types of data or tables: ICD9 code data, workers compensation claim data, claim payment data, medical/prescription billing data, U.S. census data, Social Security disability data, state regulatory issues data, medical coding data, pharmacy database data, chronic condition data, comorbidity data, ICD9 cross reference tables, NCCI cross reference tables, target variable manipulation tables, claim exclusion tables, and an evidence-based medical treatment crosswalk for comparison to medical bill data.
11. The computer-implemented method as recited in claim 10, wherein the predictive model comprises one or more of the following types of predictive models: a model that identifies claims likely to exceed a self-insured retention or deductible, and a model that identifies claims likely to exceed a predetermined total cost.
12. The computer-implemented method as recited in claim 11, wherein the elevated risk workers' compensation claims comprise catastrophic claims and migratory claims.
13. A computer system for predicting and mitigating elevated risk workers' compensation claims comprising:
a computer-implemented user interface, executed by at least one computer system having at least one computer processor, said processor electronically retrieving data stored in a database and displaying the data on the interface, said data determined to be relevant to an elevated risk workers' compensation claim mitigation;
a variable rendering computer processing module, executed by the at least one computer processor, rendering at least one variable for use by the system, wherein said at least one variable comprises both the data relevant to the identification and mitigation model, and an importance score in part based upon predetermined risk predictions to which the variables are to be applied;
a predictive model computer processing module, executed by at least one computer processor, accessing at least one predictive model stored on a memory storage device and executing the predetermined risk predictions as a function of the at least one variable rendered by the variable rendering computer processing module to yield a risk score for a claim; and
a report generation computer processing module, executed by the at least one computer processor, generating a report identifying a predicted elevated risk claim based on the risk score and providing suggestions for mitigating ongoing claim risk based on the variables.
14. The system as recited in claim 13, wherein the data stored in the database relates to one or more of the following types of data or tables: ICD9 code data, workers compensation claim data, claim payment data, medical/prescription billing data, U.S. census data, Social Security disability data, state regulatory issues data, medical coding data, pharmacy database data, chronic condition data, comorbidity data, ICD9 cross reference tables, NCCI cross reference tables, target variable manipulation tables, claim exclusion tables, and an evidence based medical treatment crosswalk for comparison to medical bill data.
15. The system as recited in claim 14, wherein the predictive model comprises one or more of the following types of predictive models: a model that identifies claims likely to exceed a self-insured retention or deductible, and/or a model that identifies claims likely to exceed a predetermined total cost.
16. The system as recited in claim 15, further comprising an identification and mitigation model database containing the data transformed into model scoring records, said model scoring records representing all relevant variables for the predictions.
17. The system as recited in claim 16, wherein the elevated risk workers' compensation claims comprise catastrophic claims and migratory claims.
18. The system as recited in claim 13, wherein the variables are categorized into 1 of 8 risk factor categories.
19. The system as recited in claim 18, further comprising additional explanatory predictive models to grade the risk of the risk factor category of the variables.
20. The system as recited in claim 19, further comprising specific suggested interventions stored in the intervention database according to risk factor category.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/464,288 US20160055589A1 (en) | 2014-08-20 | 2014-08-20 | Automated claim risk factor identification and mitigation system |
US15/686,420 US10679299B2 (en) | 2014-08-20 | 2017-08-25 | Machine learning risk factor identification and mitigation system |
US16/875,211 US20200279334A1 (en) | 2014-08-20 | 2020-05-15 | Machine learning risk factor identification and mitigation system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/464,288 US20160055589A1 (en) | 2014-08-20 | 2014-08-20 | Automated claim risk factor identification and mitigation system |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/686,420 Continuation US10679299B2 (en) | 2014-08-20 | 2017-08-25 | Machine learning risk factor identification and mitigation system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160055589A1 true US20160055589A1 (en) | 2016-02-25 |
Family
ID=55348685
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/464,288 Abandoned US20160055589A1 (en) | 2014-08-20 | 2014-08-20 | Automated claim risk factor identification and mitigation system |
US15/686,420 Active 2035-05-20 US10679299B2 (en) | 2014-08-20 | 2017-08-25 | Machine learning risk factor identification and mitigation system |
US16/875,211 Abandoned US20200279334A1 (en) | 2014-08-20 | 2020-05-15 | Machine learning risk factor identification and mitigation system |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/686,420 Active 2035-05-20 US10679299B2 (en) | 2014-08-20 | 2017-08-25 | Machine learning risk factor identification and mitigation system |
US16/875,211 Abandoned US20200279334A1 (en) | 2014-08-20 | 2020-05-15 | Machine learning risk factor identification and mitigation system |
Country Status (1)
Country | Link |
---|---|
US (3) | US20160055589A1 (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180173854A1 (en) * | 2016-12-21 | 2018-06-21 | Cerner Innovation, Inc. | Monitoring predictive models |
US10394871B2 (en) | 2016-10-18 | 2019-08-27 | Hartford Fire Insurance Company | System to predict future performance characteristic for an electronic record |
US10445354B2 (en) | 2016-10-05 | 2019-10-15 | Hartford Fire Insurance Company | System to determine a credibility weighting for electronic records |
US10943464B1 (en) | 2017-09-27 | 2021-03-09 | State Farm Mutual Automobile Insurance Company | Real property monitoring systems and methods for detecting damage and other conditions |
CN112534456A (en) * | 2018-06-01 | 2021-03-19 | 全球保修服务有限公司 | System and method for analyzing protection plan and warranty data |
US11017116B2 (en) * | 2018-03-30 | 2021-05-25 | Onsite Health Diagnostics, Llc | Secure integration of diagnostic device data into a web-based interface |
CN113657343A (en) * | 2021-08-30 | 2021-11-16 | 平安医疗健康管理股份有限公司 | Method and device for generating claim resource data based on machine learning |
US11188565B2 (en) | 2017-03-27 | 2021-11-30 | Advanced New Technologies Co., Ltd. | Method and device for constructing scoring model and evaluating user credit |
US11257018B2 (en) * | 2018-12-24 | 2022-02-22 | Hartford Fire Insurance Company | Interactive user interface for insurance claim handlers including identifying insurance claim risks and health scores |
US20220083899A1 (en) * | 2020-09-11 | 2022-03-17 | International Business Machines Corporation | Validation of ai models using holdout sets |
US11314620B1 (en) * | 2020-12-09 | 2022-04-26 | Capital One Services, Llc | Methods and systems for integrating model development control systems and model validation platforms |
US20220222440A1 (en) * | 2021-01-14 | 2022-07-14 | Rumman Chowdhury | Systems and methods for assessing risk associated with a machine learning model |
WO2022234112A1 (en) * | 2021-05-07 | 2022-11-10 | Swiss Reinsurance Company Ltd. | Cloud-based, scalable, advanced analytics platform for analyzing complex medical risk data and providing dedicated electronic trigger signals for triggering risk-related activities in the context of medical risk-transfer, and method thereof |
US11574364B2 (en) * | 2019-03-28 | 2023-02-07 | Change Healthcare Holdings, Llc | Systems and methods for automated review of risk adjustment data on submitted medical claims |
WO2024017630A1 (en) | 2022-07-20 | 2024-01-25 | Swiss Reinsurance Company Ltd. | Digital life and/or health claims processing system integrating multiple claim channels, and method thereof |
CN118014737A (en) * | 2024-01-10 | 2024-05-10 | 保腾网络科技有限公司 | Health insurance risk control method and system based on big data |
US12210937B2 (en) * | 2018-11-16 | 2025-01-28 | Sap Se | Applying scoring systems using an auto-machine learning classification approach |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10756970B1 (en) * | 2019-02-20 | 2020-08-25 | Amdocs Development Limited | System, method, and computer program for automatic reconfiguration of a communication network |
US10403056B2 (en) * | 2014-12-08 | 2019-09-03 | Nec Corporation | Aging profiling engine for physical systems |
US11461848B1 (en) | 2015-01-14 | 2022-10-04 | Alchemy Logic Systems, Inc. | Methods of obtaining high accuracy impairment ratings and to assist data integrity in the impairment rating process |
US11853973B1 (en) | 2016-07-26 | 2023-12-26 | Alchemy Logic Systems, Inc. | Method of and system for executing an impairment repair process |
US11494845B1 (en) * | 2016-08-31 | 2022-11-08 | Nationwide Mutual Insurance Company | System and method for employing a predictive model |
US11854700B1 (en) | 2016-12-06 | 2023-12-26 | Alchemy Logic Systems, Inc. | Method of and system for determining a highly accurate and objective maximum medical improvement status and dating assignment |
US12165209B1 (en) | 2017-09-19 | 2024-12-10 | Alchemy Logic Systems Inc. | Method of and system for providing a confidence measurement in the impairment rating process |
US12183466B1 (en) * | 2018-03-12 | 2024-12-31 | Alchemy Logic Systems Inc. | Method of and system for impairment rating repair for the managed impairment repair process |
CN109492095A (en) * | 2018-10-16 | 2019-03-19 | 平安健康保险股份有限公司 | Claims Resolution data processing method, device, computer equipment and storage medium |
US11625687B1 (en) | 2018-10-16 | 2023-04-11 | Alchemy Logic Systems Inc. | Method of and system for parity repair for functional limitation determination and injury profile reports in worker's compensation cases |
US11580475B2 (en) * | 2018-12-20 | 2023-02-14 | Accenture Global Solutions Limited | Utilizing artificial intelligence to predict risk and compliance actionable insights, predict remediation incidents, and accelerate a remediation process |
US11928737B1 (en) | 2019-05-23 | 2024-03-12 | State Farm Mutual Automobile Insurance Company | Methods and apparatus to process insurance claims using artificial intelligence |
US11669907B1 (en) * | 2019-06-27 | 2023-06-06 | State Farm Mutual Automobile Insurance Company | Methods and apparatus to process insurance claims using cloud computing |
US11848109B1 (en) | 2019-07-29 | 2023-12-19 | Alchemy Logic Systems, Inc. | System and method of determining financial loss for worker's compensation injury claims |
US20240087750A1 (en) * | 2020-06-17 | 2024-03-14 | University Of Florida Research Foundation, Incorporated | Machine learning systems and methods for predicting risk of incident opioid use disorder and opioid overdose |
US11727119B2 (en) * | 2020-06-18 | 2023-08-15 | International Business Machines Corporation | Migration risk assessment, recommendation, and implementation |
US12131229B2 (en) | 2020-06-29 | 2024-10-29 | Optum Services (Ireland) Limited | Predictive data analysis techniques using bidirectional encodings of structured data fields |
US20230401529A1 (en) * | 2022-06-08 | 2023-12-14 | Express Scripts Strategic Development, Inc. | System and method for automatic detection for multiple failed orders at a back end pharmacy |
US20240020602A1 (en) * | 2022-07-15 | 2024-01-18 | Ukg Inc. | Systems and methods for generating multiple schedules for computational efficiency |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060224416A1 (en) * | 2005-03-29 | 2006-10-05 | Group Health Plan, Inc., D/B/A Healthpartners | Method and computer program product for predicting and minimizing future behavioral health-related hospital admissions |
US20090105550A1 (en) * | 2006-10-13 | 2009-04-23 | Michael Rothman & Associates | System and method for providing a health score for a patient |
US20110082712A1 (en) * | 2009-10-01 | 2011-04-07 | DecisionQ Corporation | Application of bayesian networks to patient screening and treatment |
US20120284052A1 (en) * | 2011-04-29 | 2012-11-08 | Sandra Lombardi Saukas | Systems and methods for providing a comprehensive initial assessment for workers compensation cases |
US20140081652A1 (en) * | 2012-09-14 | 2014-03-20 | Risk Management Solutions Llc | Automated Healthcare Risk Management System Utilizing Real-time Predictive Models, Risk Adjusted Provider Cost Index, Edit Analytics, Strategy Management, Managed Learning Environment, Contact Management, Forensic GUI, Case Management And Reporting System For Preventing And Detecting Healthcare Fraud, Abuse, Waste And Errors |
US20160048766A1 (en) * | 2014-08-13 | 2016-02-18 | Vitae Analytics, Inc. | Method and system for generating and aggregating models based on disparate data from insurance, financial services, and public industries |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7089592B2 (en) * | 2001-03-15 | 2006-08-08 | Brighterion, Inc. | Systems and methods for dynamic detection and prevention of electronic fraud |
US7363240B1 (en) | 2001-12-28 | 2008-04-22 | Travelers Property Casualty Corp. | Method and system for enhanced medical triage |
US7813937B1 (en) * | 2002-02-15 | 2010-10-12 | Fair Isaac Corporation | Consistency modeling of healthcare claims to detect fraud and abuse |
US8473311B2 (en) | 2003-10-22 | 2013-06-25 | Medco Health Solutions, Inc. | Computer system and method for generating healthcare risk indices using medical claims information |
US8359209B2 (en) * | 2006-12-19 | 2013-01-22 | Hartford Fire Insurance Company | System and method for predicting and responding to likelihood of volatility |
US8244654B1 (en) | 2007-10-22 | 2012-08-14 | Healthways, Inc. | End of life predictive model |
US8117043B2 (en) | 2009-05-14 | 2012-02-14 | Hartford Fire Insurance Company | System for evaluating potential claim outcomes using related historical data |
US8515788B2 (en) | 2009-09-04 | 2013-08-20 | The Travelers Indemnity Company | Methods and systems for providing customized risk mitigation/recovery to an insurance customer |
US8521555B2 (en) | 2009-12-09 | 2013-08-27 | Hartford Fire Insurance Company | System and method using a predictive model for nurse intervention program decisions |
US8543428B1 (en) | 2009-12-10 | 2013-09-24 | Humana Inc. | Computerized system and method for estimating levels of obesity in an insured population |
US20150286792A1 (en) * | 2014-04-03 | 2015-10-08 | HCMS Group LLC | Systems and methods for health risk determination |
US20160034662A1 (en) | 2014-08-01 | 2016-02-04 | The Travelers Indemnity Company | Systems, methods, and apparatus for identifying and mitigating potential chronic pain in patients |
US20160034664A1 (en) | 2014-08-01 | 2016-02-04 | The Travelers Indemnity Company | Systems, methods, and apparatus facilitating health care management and prevention of potential chronic pain in patients |
- 2014
  - 2014-08-20 US US14/464,288 patent/US20160055589A1/en not_active Abandoned
- 2017
  - 2017-08-25 US US15/686,420 patent/US10679299B2/en active Active
- 2020
  - 2020-05-15 US US16/875,211 patent/US20200279334A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
Jose M. Valderas, Defining Comorbidity: Implications for Understanding Health and Health Services, Annals of Family Medicine, July 2009. *
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10445354B2 (en) | 2016-10-05 | 2019-10-15 | Hartford Fire Insurance Company | System to determine a credibility weighting for electronic records |
US10394871B2 (en) | 2016-10-18 | 2019-08-27 | Hartford Fire Insurance Company | System to predict future performance characteristic for an electronic record |
US20180173854A1 (en) * | 2016-12-21 | 2018-06-21 | Cerner Innovation, Inc. | Monitoring predictive models |
US11923094B2 (en) | 2016-12-21 | 2024-03-05 | Cerner Innovation, Inc. | Monitoring predictive models |
US11244764B2 (en) * | 2016-12-21 | 2022-02-08 | Cerner Innovation, Inc. | Monitoring predictive models |
US11188565B2 (en) | 2017-03-27 | 2021-11-30 | Advanced New Technologies Co., Ltd. | Method and device for constructing scoring model and evaluating user credit |
US10943464B1 (en) | 2017-09-27 | 2021-03-09 | State Farm Mutual Automobile Insurance Company | Real property monitoring systems and methods for detecting damage and other conditions |
US20230260048A1 (en) * | 2017-09-27 | 2023-08-17 | State Farm Mutual Automobile Insurance Company | Implementing Machine Learning For Life And Health Insurance Claims Handling |
US11783422B1 (en) * | 2017-09-27 | 2023-10-10 | State Farm Mutual Automobile Insurance Company | Implementing machine learning for life and health insurance claims handling |
US11017116B2 (en) * | 2018-03-30 | 2021-05-25 | Onsite Health Diagnostics, Llc | Secure integration of diagnostic device data into a web-based interface |
US12067625B2 (en) * | 2018-06-01 | 2024-08-20 | World Wide Warranty Life Services Inc. | System and method for protection plans and warranty data analytics |
US20210217093A1 (en) * | 2018-06-01 | 2021-07-15 | World Wide Warranty Life Services Inc. | A system and method for protection plans and warranty data analytics |
CN112534456A (en) * | 2018-06-01 | 2021-03-19 | 全球保修服务有限公司 | System and method for analyzing protection plan and warranty data |
US12210937B2 (en) * | 2018-11-16 | 2025-01-28 | Sap Se | Applying scoring systems using an auto-machine learning classification approach |
US12118493B2 (en) * | 2018-12-24 | 2024-10-15 | Hartford Fire Insurance Company | Interactive graphical user interface for insurance claim handlers including identifying insurance claim risks and health utilizing machine learning |
US20220138647A1 (en) * | 2018-12-24 | 2022-05-05 | Hartford Fire Insurance Company | System and method providing risk relationship resource allocation tool |
US11257018B2 (en) * | 2018-12-24 | 2022-02-22 | Hartford Fire Insurance Company | Interactive user interface for insurance claim handlers including identifying insurance claim risks and health scores |
US20230401511A1 (en) * | 2018-12-24 | 2023-12-14 | Hartford Fire Insurance Company | System and method providing risk relationship resource allocation tool |
US11783259B2 (en) * | 2018-12-24 | 2023-10-10 | Hartford Fire Insurance Company | Interactive graphical user interface for insurance claim handlers including identifying insurance claim risks and health scores via a body diagram dashboard |
US11574364B2 (en) * | 2019-03-28 | 2023-02-07 | Change Healthcare Holdings, Llc | Systems and methods for automated review of risk adjustment data on submitted medical claims |
US20230114791A1 (en) * | 2019-03-28 | 2023-04-13 | Change Healthcare Holdings, Llc | Systems and methods for automated review of risk adjustment data on submitted medical claims |
US20220083899A1 (en) * | 2020-09-11 | 2022-03-17 | International Business Machines Corporation | Validation of ai models using holdout sets |
US11715037B2 (en) * | 2020-09-11 | 2023-08-01 | International Business Machines Corporation | Validation of AI models using holdout sets |
US20220179771A1 (en) * | 2020-12-09 | 2022-06-09 | Capital One Services, Llc | Methods and systems for integrating model development control systems and model validation platforms |
US11599444B2 (en) * | 2020-12-09 | 2023-03-07 | Capital One Services, Llc | Methods and systems for integrating model development control systems and model validation platforms |
US11314620B1 (en) * | 2020-12-09 | 2022-04-26 | Capital One Services, Llc | Methods and systems for integrating model development control systems and model validation platforms |
US20220222440A1 (en) * | 2021-01-14 | 2022-07-14 | Rumman Chowdhury | Systems and methods for assessing risk associated with a machine learning model |
WO2022234112A1 (en) * | 2021-05-07 | 2022-11-10 | Swiss Reinsurance Company Ltd. | Cloud-based, scalable, advanced analytics platform for analyzing complex medical risk data and providing dedicated electronic trigger signals for triggering risk-related activities in the context of medical risk-transfer, and method thereof |
CN113657343A (en) * | 2021-08-30 | 2021-11-16 | 平安医疗健康管理股份有限公司 | Method and device for generating claim resource data based on machine learning |
WO2024017630A1 (en) | 2022-07-20 | 2024-01-25 | Swiss Reinsurance Company Ltd. | Digital life and/or health claims processing system integrating multiple claim channels, and method thereof |
CN118014737A (en) * | 2024-01-10 | 2024-05-10 | 保腾网络科技有限公司 | Health insurance risk control method and system based on big data |
Also Published As
Publication number | Publication date |
---|---|
US10679299B2 (en) | 2020-06-09 |
US20170352105A1 (en) | 2017-12-07 |
US20200279334A1 (en) | 2020-09-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10679299B2 (en) | Machine learning risk factor identification and mitigation system | |
US11798670B2 (en) | Methods and systems for managing patient treatment compliance | |
Justo et al. | Real-world evidence in healthcare decision making: global trends and case studies from Latin America | |
Comfort et al. | Effect of health insurance on the use and provision of maternal health services and maternal and neonatal health outcomes: a systematic review | |
CA2632730C (en) | Analyzing administrative healthcare claims data and other data sources | |
US20060271405A1 (en) | Pharmaceutical care of patients and documentation system therefor | |
Faruquee et al. | A scoping review of research on the prescribing practice of Canadian pharmacists | |
US11361381B1 (en) | Data integration and prediction for fraud, waste and abuse | |
US20160188819A1 (en) | Medical claims lead summary report generation | |
Burton et al. | Quality improvement initiative to decrease variability of emergency physician opioid analgesic prescribing | |
US11244029B1 (en) | Healthcare management system and method | |
Markowitz et al. | Nurse practitioner scope of practice and patient harm: Evidence from medical malpractice payouts and adverse action reports | |
Sarayani et al. | Topiramate utilization after phentermine/topiramate approval for obesity management: risk minimization in the era of drug repurposing | |
Georgieva et al. | Cost avoidance from health system specialty pharmacist interventions in patients with multiple sclerosis | |
US20180018433A1 (en) | Systems, devices, and methods for encouraging use of preferred drugs | |
McCann et al. | Out-of-pocket spending and health care utilization associated with initiation of different medications for opioid use disorder: Findings from a national commercially insured cohort | |
US20240143984A1 (en) | System for creating a temporal predictive model | |
US12183466B1 (en) | Method of and system for impairment rating repair for the managed impairment repair process | |
Hanna et al. | Advanced Therapies: Widening The Gap Between Payers And Regulators | |
Li et al. | Can AI Distort Human Capital? | |
PICHLER | CERTIFYING LETHE AS A DIGITAL HEALTH APPLICATION: ASSESSMENT OF CLINICAL VALIDITY, APPROVAL STANDARDS AND PROFITABILITY | |
Gusrianto et al. | A System Dynamics Model to Enhance the Indonesian Food and Drug Authority's Approach to Reduce Unauthorized Drug Sales in West Sumatra Province | |
Hanna et al. | Attributes Defining Patient Engagement And Centeredness In Health Care Research And Practice: A Framework Developed By The Ispor Patient-Centered Special Interest Group | |
Zauner et al. | Dexhelpp-Aging Population: Routine Data Based Analyses Of Fractures Due To Falls Less Than Three Meters–Hospitalization, Readmission And Mortality | |
Karaca et al. | PHP78 CHARACTERISTICS OF HOMELESS INDIVIDUALS USING INPATIENT AND EMERGENCY DEPARTMENT SERVICES |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MIDWEST EMPLOYERS CASUALTY COMPANY, MISSOURI Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BILLINGS, BRIAN ANDRE;REEL/FRAME:033575/0088 Effective date: 20140806 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |