WO2018107993A1 - Method and apparatus for identifying false address information - Google Patents

Method and apparatus for identifying false address information

Info

Publication number
WO2018107993A1
Authority
WO
WIPO (PCT)
Prior art keywords
address information
verified
grid
account
determining
Prior art date
Application number
PCT/CN2017/114441
Other languages
English (en)
French (fr)
Inventor
蒋贤礼
Original Assignee
阿里巴巴集团控股有限公司
蒋贤礼
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 蒋贤礼
Priority to KR1020197020451A (patent KR102208892B1, ko)
Priority to JP2019531993A (patent JP6756921B2, ja)
Priority to EP17880372.2A (patent EP3557447A4, en)
Publication of WO2018107993A1 (zh)
Priority to US16/440,895 (patent US10733217B2, en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/126 Applying verification of the received information the source of the received data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles
    • H04L67/306 User profiles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12 Detection or prevention of fraud
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/60 Context-dependent security
    • H04W12/63 Location-dependent; Proximity-dependent

Definitions

  • The present application relates to the field of information technology, and in particular to a method and an apparatus for identifying false address information.
  • Address information is usually verified either through a search engine or through logistics information.
  • Search-engine verification inputs the to-be-verified address information into an existing search engine and uses the address information already indexed by the search engine to determine whether the to-be-verified address exists.
  • Logistics-information verification checks the authenticity of the to-be-verified address information against the address information in logistics information that has already been collected.
  • With search-engine verification, the accuracy and coverage of the result depend on the amount of address information the selected search engine has indexed: the more address information it includes and the wider the area it covers, the higher the accuracy and coverage of the verification result can be.
  • However, search engines index address information for developed areas fairly comprehensively and accurately, while their coverage of remote areas is relatively poor, so the accuracy of search-engine-based address verification is unstable and low overall.
  • With logistics-information verification, the logistics industry protects logistics information strictly, which makes that information difficult to obtain.
  • Moreover, the logistics business does not require logistics information to be accurate or authentic.
  • For example, a user name such as "Sun Wukong" or an address such as "the east gate of a certain residential district in a certain city" is neither true nor precise, yet it does not hinder the logistics business.
  • Such information cannot be used to verify the to-be-verified address information, so verification based on logistics information also struggles to achieve good accuracy and coverage.
  • Furthermore, even if the address information provided by the user is authentic, it is difficult to verify whether the address is actually the work address or residence address of that user. In other words, the address may be real without being the user's own address. For example, user a gives the home address c of user b as his own home address. Assuming that home address c is a real address, the prior art can only identify that home address c is real and cannot determine whether it belongs to user a. For user a, home address c is in fact false address information. Such false address information is difficult to identify in the prior art, which lowers the accuracy of risk control based on address information.
  • An embodiment of the present application provides a method for identifying false address information, to solve the problem that prior-art verification of address information has a low accuracy rate and can hardly verify the correspondence between an address and an account, which results in low accuracy in identifying false address information.
  • An embodiment of the present application provides an apparatus for identifying false address information, to solve the problem that prior-art verification of address information has a low accuracy rate and can hardly verify the correspondence between an address and an account, which results in low accuracy in identifying false address information.
  • A method for identifying false address information includes:
  • An apparatus for identifying false address information includes:
  • a first determining module, configured to determine the to-be-verified address information of an account;
  • a second determining module, configured to determine the resident range of the account within the pre-divided geographical ranges, according to the geographical location information reported by the account within a preset time period and a trained classification model;
  • a matching module, configured to match the to-be-verified address information with the resident range;
  • an identification module, configured to determine whether the to-be-verified address information is false address information according to the result of matching the to-be-verified address information with the resident range.
  • The to-be-verified address information of the account is determined first. Then, according to the geographical location information reported by the account within the preset time period, the trained classification model is used to determine, within the pre-divided geographical ranges, the resident range of the user who uses the account. Finally, whether the to-be-verified address information is false address information is determined according to the result of matching the to-be-verified address information against the grid corresponding to the resident range.
  • The resident range of the user who uses the account is determined from the geographical location information historically reported by the account and from the classification model. Because that geographical location information is not only true but also tied to the account, the determined resident range is not only true but can also be confirmed to belong to the account. Matching the to-be-verified address information against the resident range therefore makes the identification of false address information more accurate.
  • FIG. 2 is a schematic diagram of a map grid provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an apparatus for identifying false address information according to an embodiment of the present application.
  • FIG. 1 shows a process for identifying false address information according to an embodiment of the present application, which specifically includes the following steps:
  • S101: Determine the to-be-verified address information of the account.
  • Verification of address information is usually performed by the server of the service provider.
  • The service provider can also entrust a third party to verify the address information.
  • Verification may be started by the server according to a preset condition (e.g., verifying address information at a fixed frequency or periodically) or initiated by a third party (e.g., a third-party server submits an address information verification request). This application does not specifically limit how verification of address information is started.
  • The user provides address information to the server through an account, so address information generally corresponds to an account. Therefore, in the embodiment of the present application, the server may determine the to-be-verified address information of the account.
  • The to-be-verified address information may be a home address, a work address, or another address where the user regularly stays that has already been set in the account information. When the server determines that the account needs risk control, it may retrieve each piece of address information already set in the account as the to-be-verified address information of the account.
  • Alternatively, the to-be-verified address information may be address information returned by the account after the server sends address query information to the account. The address query information may include at least one of text information, audio information, and video information. For example, the text information may be "please provide a detailed home address" or "please provide a detailed work address", prompting the account to return the to-be-verified address information to the server.
  • That is, the server may first determine an account that needs risk control, then send address query information to the account, and take the address information returned by the account as the to-be-verified address information of the account.
  • How the server determines the to-be-verified address information of the account is not specifically limited and may be set by staff according to the needs of the actual application.
  • Whether the server takes the home address or the work address of the account as the to-be-verified address information may likewise be set by staff according to actual needs, and the to-be-verified address information may also include both the home address and the work address of the account.
  • The server may be a single device, or may be a system composed of multiple devices, that is, a distributed server.
  • S102: According to the geographical location information reported by the account within the preset time period and the trained classification model, determine the resident range of the account within the pre-divided geographical ranges.
  • The server may further determine the resident range of the account, taken as the resident range of the user who uses the account, so that the to-be-verified address information can subsequently be verified and false address information identified.
  • Specifically, the server may first determine the geographical location information reported by the account. The account may report by sending the server the geographical location of the device it is currently logged in on at a preset frequency (for example, once every 30 minutes), or by sending that location to the server when the account logs in. The reporting manner may be set according to actual application requirements, or may follow prior-art methods for obtaining a user's location in real time.
  • This application does not limit how the geographical location information reported by the account is determined. The longer the account stays in one place, the more geographical location information it reports there, so the places where the user of the account regularly stays, that is, the resident range of the account, can be determined from the reported geographical location information.
  • The geographical location information used may be part or all of the geographical location information reported by the account, which may be set according to actual application requirements.
  • The server can determine each piece of geographical location information reported by the account within a preset time period.
  • The preset time period may be a period traced back from the current time. For example, if the current time is November 11, 2016 and the preset time period is 4 months back, the server may determine the geographical location information reported by the account between July 11, 2016 and November 11, 2016. The preset time period may also run from a specified start time to a specified end time, for example from January 1 to June 1. It can be set by staff according to the needs of the actual application and is not specifically limited in this application.
  • The length of the preset time period may also be set by staff according to the needs of the actual application, for example 4 months or 9 months. Because a typical housing lease lasts at least half a year, a preset time period longer than 6 months makes it more likely that the life trajectory of the account changed within the period. The length of the preset time period is nevertheless not specifically limited in this application.
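  • The look-back window described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names `months_back` and `in_window` and the `(grid, date)` report layout are assumptions made for the example, and clamping the day to the end of a shorter month is a design choice the source does not discuss.

```python
import calendar
from datetime import date

def months_back(end, months):
    """Return the date `months` calendar months before `end`."""
    index = end.year * 12 + (end.month - 1) - months
    year, month0 = divmod(index, 12)
    # Clamp the day so that e.g. March 31 minus one month gives the last day of February.
    day = min(end.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)

def in_window(reports, end, months=4):
    """Keep the (grid, report_date) pairs reported inside the look-back window."""
    start = months_back(end, months)
    return [r for r in reports if start <= r[1] <= end]
```

  • With an end date of November 11, 2016 and a 4-month window, `months_back` gives July 11, 2016, matching the example in the text.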
  • The server may also divide the map into a number of grids according to a preset grid size and use the grids as the pre-divided geographical ranges. Using grids instead of precise positions when determining the resident range of the user who uses the account avoids errors caused by limited positioning accuracy and adds tolerance to the positioning error of the geographical location information.
  • The grids of the map may be divided as shown in FIG. 2.
  • FIG. 2 is a schematic diagram of a map grid provided by an embodiment of the present application. The map stored in the server has been pre-divided into grids, where each grid is a dotted square and can be represented by longitude and latitude. The side length of a grid can be set by staff according to the needs of the actual application; for example, a square grid may have a side length of 500 meters. Note that the shorter the side length of the pre-divided grids, the more precise the determined resident range of the user who uses the account, but also the higher the accuracy required of the geographical location information reported by the account and the greater the influence of positioning error. Of course, a grid can also have another shape, such as a circle or a triangle; this application does not specifically limit the shape.
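  • As a rough sketch of how a reported position could be assigned to a square grid of about 500 meters, the example below uses a simple flat-earth approximation. The function name `grid_cell`, the constant `METERS_PER_DEG_LAT`, and the row/column indexing scheme are assumptions for illustration; the source only states that grids can be represented by longitude and latitude.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude

def grid_cell(lat, lon, side_m=500.0):
    """Map a latitude/longitude pair to a (row, col) index of a ~side_m square grid."""
    row = math.floor(lat * METERS_PER_DEG_LAT / side_m)
    # Use the latitude of the row's lower edge so every point in a row shares one scale.
    row_lat = row * side_m / METERS_PER_DEG_LAT
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(row_lat))
    col = math.floor(lon * meters_per_deg_lon / side_m)
    return (row, col)
```

  • A smaller `side_m` gives a finer resident range but makes the result more sensitive to positioning error, which is the trade-off the text describes.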
  • The server may then determine, according to the pre-divided grids, the number and times of occurrences in each grid of the geographical location information reported by the account within the preset time period, and from them determine the feature values of the account in each grid. The feature values may be as shown in Table 1.
  • Table 1
  • Feature value | Description
  • Occurrence percentage | Number of occurrences in the grid as a percentage of the total number of occurrences
  • Occurrence-day percentage | Number of days with an occurrence in the grid as a percentage of the total number of days with an occurrence
  • Working-day ratio | Number of working days with an occurrence in the grid as a percentage of the total number of days with an occurrence in the grid
  • Holiday ratio | Number of holidays with an occurrence in the grid as a percentage of the total number of days with an occurrence in the grid
  • Working-day daytime ratio | Number of working days with a daytime occurrence in the grid as a percentage of the total number of days with an occurrence in the grid
  • Working-day nighttime ratio | Number of working days with a nighttime occurrence in the grid as a percentage of the total number of days with an occurrence in the grid
  • Holiday daytime ratio | Number of holidays with a daytime occurrence in the grid as a percentage of the total number of days with a daytime occurrence
  • Holiday nighttime ratio | Number of holidays with a nighttime occurrence in the grid as a percentage of the total number of days with a nighttime occurrence
  • The eight feature values above can be used to determine how often and during which time periods the account appears in each grid. For example, for each grid, the occurrence percentage and the occurrence-day percentage indicate whether the account often appears in the grid. Obviously, if the account does not often appear in a grid, that grid is unlikely to be the resident range of the user who uses the account.
  • The working-day ratio helps determine whether a grid is the resident range of the user who uses the account: a grid in which the account appears more often on working days is more likely to be the resident range. The holiday ratio helps determine that a grid is not an area where the account works or resides. For example, if the user often goes to a gym on weekends, the account appears frequently on holidays in the grid containing the gym, but that grid is not an area where the user works or lives.
  • The working-day nighttime ratio helps determine whether a grid is the area where the account lives, and so on.
  • The feature values determined in each grid reflect the life trajectory and routine of the account across the grids of the map. They make it possible to exclude the interference of areas where the account appears only at low frequency (that is, geographical ranges where the account seldom appears), so that the grid corresponding to the resident range of the user who uses the account, and also the grid corresponding to the area where the account lives, can be determined more accurately.
  • The geographical location information may carry the time at which it was reported, so in the present application the server can determine some of the feature values in Table 1 from the reporting time of each piece of geographical location information.
  • The reporting time may be the system time of the server when it receives the geographical location information, or the time at which the geographical location information was determined. Which reporting time is adopted is not specifically limited and can be set by staff according to the needs of the actual application.
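  • The sketch below computes a few of the Table 1 feature values from a list of `(grid, reporting time)` pairs. It is illustrative only: the function name `grid_features`, treating Saturday and Sunday as holidays, treating 20:00 to 06:00 as nighttime, and the exact denominators are all assumptions, since the source does not define them precisely.

```python
from collections import defaultdict
from datetime import datetime

def is_night(ts):
    # Assumption: nighttime is 20:00 to 06:00.
    return ts.hour >= 20 or ts.hour < 6

def grid_features(reports):
    """reports: list of (grid, datetime). Returns {grid: {feature name: value}}."""
    total = len(reports)
    all_days = {ts.date() for _, ts in reports}
    by_grid = defaultdict(list)
    for grid, ts in reports:
        by_grid[grid].append(ts)
    features = {}
    for grid, stamps in by_grid.items():
        days = {ts.date() for ts in stamps}
        # Assumption: Monday to Friday are working days, weekends are holidays.
        workdays = {d for d in days if d.weekday() < 5}
        work_nights = {ts.date() for ts in stamps if ts.weekday() < 5 and is_night(ts)}
        features[grid] = {
            "occurrence_pct": len(stamps) / total,
            "occurrence_day_pct": len(days) / len(all_days),
            "workday_ratio": len(workdays) / len(days),
            "holiday_ratio": (len(days) - len(workdays)) / len(days),
            "workday_night_ratio": len(work_nights) / len(days),
        }
    return features
```

  • A grid with a high `workday_night_ratio` is a plausible living area, while a grid with a high `holiday_ratio` and few working-day occurrences resembles the gym example in the text.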
  • The server may also use the trained classification model to determine the grids in which the user of the account is frequently present as the resident range of the user who uses the account. That is, the server may input the feature values of the account in each grid into the trained classification model and, according to the classification result that the model outputs for each grid, determine which grids form the resident range of the user who uses the account.
  • The server may select one or more feature values to determine the resident range of the user who uses the account; this application does not require the server to use all of the feature values.
  • Nor is this application limited to the eight feature values shown in Table 1. Which feature values are used may be set by staff according to the needs of the actual application.
  • The training process for the classification model may be as follows:
  • The server may first select, as training samples, a number of accounts whose address information has been verified as real, that is, accounts with known real address information. It then collects the geographical location information reported by each training sample and, for each training sample, determines its feature values in each grid from the number and times of its occurrences in each grid.
  • The server may then input the feature values corresponding to each training sample into the classification model in turn and obtain a classification result.
  • The initial parameters of the classification model may be randomly generated or set by staff.
  • The classification result is the model's determination, for each training sample, of whether each grid belongs to the grids corresponding to the resident range or to the grids that do not.
  • The server may determine the accuracy of the classification results from the grid in which the coordinates of each training sample's real address information fall, and adjust the parameters of the classification model according to that accuracy.
  • This process may be repeated until a preset number of repetitions is reached or the accuracy of the classification results reaches a preset threshold, which may be set by staff as needed.
  • The classification model may be a classification algorithm such as a random forest, logistic regression, or a neural network; this application does not limit which classification model is adopted.
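  • A minimal sketch of the training loop above, using logistic regression (one of the model families the text names). The source only says that parameters are adjusted according to the accuracy; stochastic gradient descent is swapped in here as a concrete update rule, and the names `train`, `predict`, `accuracy`, the learning rate, and the two-feature sample layout are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, x):
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def predict(weights, bias, x):
    """True if the grid with feature vector x is classified as resident range."""
    return score(weights, bias, x) >= 0.5

def accuracy(weights, bias, samples):
    return sum(predict(weights, bias, x) == bool(y) for x, y in samples) / len(samples)

def train(samples, max_rounds=1000, target_accuracy=0.95, lr=0.5):
    """samples: list of (feature vector, label), label 1 = grid contains the real address."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(max_rounds):
        # Stop at the preset accuracy threshold or after the preset number of rounds.
        if accuracy(weights, bias, samples) >= target_accuracy:
            break
        for x, y in samples:
            err = score(weights, bias, x) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias
```

  • Here each feature vector stands for one (training sample, grid) pair, and the label is derived from whether the sample's known real address falls in that grid.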
  • S104: Determine, according to the result of matching the to-be-verified address information with the resident range, whether the to-be-verified address information is false address information.
  • After the server has used the trained classification model to determine, among the grids, the grid corresponding to the resident range of the user who uses the account, it can match the to-be-verified address information against the resident range and determine whether the to-be-verified address information is false address information.
  • Specifically, the server may first determine the coordinates of the to-be-verified address information from its corresponding longitude and latitude, then determine which grid those coordinates fall in, and finally determine whether the grid corresponding to the to-be-verified address information is the same as the grid corresponding to the resident range of the user who uses the account (that is, whether the coordinates of the to-be-verified address information fall within the grid corresponding to the resident range). If so, the to-be-verified address information is determined not to be false address information; if not, it is determined to be false address information.
  • Here, the grid corresponding to the to-be-verified address information matching the grid corresponding to the resident range means that the coordinates of the to-be-verified address information lie within the grid corresponding to the resident range of the user who uses the account.
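  • The matching step can be sketched as a set-membership test. The names `cell_of` and `is_false_address` are hypothetical, the address is assumed to have already been converted to longitude and latitude, and a fixed-degree grid (0.005 degrees, roughly 500 meters of latitude) is used only to keep the example short.

```python
import math

def cell_of(lat, lon, cell_deg=0.005):
    """Assumed grid: fixed-size cells in degrees, indexed by (row, col)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def is_false_address(address_lat, address_lon, resident_cells):
    """The address is treated as false if its grid is not in the resident range."""
    return cell_of(address_lat, address_lon) not in resident_cells
```

  • For instance, if `resident_cells` holds the grids output by the classification model for an account, an address whose coordinates fall in one of them is accepted, and any other address is flagged.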
  • In other words, the server may determine the grid corresponding to the resident range of the user who uses the account, match the grid corresponding to the account's to-be-verified address information against it, and determine from the matching result whether the to-be-verified address information is false address information.
  • The resident range of the user who uses the account is derived from the geographical location information historically reported by the account and is determined within a pre-divided map grid, so the grid corresponding to the resident range is highly reliable and can be confirmed to belong to the account. Matching the to-be-verified address information against that grid therefore yields a highly accurate matching result and a more accurate identification of false address information.
  • The positioning accuracy of different devices may not be fully consistent, and the positioning accuracy of the same device may vary under different external conditions. If the geographical location information reported by the account includes information with low positioning accuracy, the grid determined for the resident range of the user who uses the account may be inaccurate, which affects the accuracy of the subsequent identification of false address information.
  • Therefore, when determining the geographical location information reported by the account within the preset time period, the server may further select, according to a preset positioning accuracy threshold, the geographical location information whose positioning accuracy is not lower than the threshold, use it as the geographical location information of the account that is input into the trained classification model, and determine the grid corresponding to the resident range of the user who uses the account.
  • Likewise, when training the classification model, the server may determine for each training sample, from the geographical location information it reported within the preset time period, the geographical location information whose positioning accuracy is not lower than the positioning accuracy threshold, and train the classification model on it.
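  • A small sketch of the accuracy filter follows. The source does not say how positioning accuracy is measured; this example assumes each report carries an estimated error radius in meters (field `error_m`), so that "accuracy not lower than the threshold" becomes "error radius not greater than a limit". The field names and the 100-meter default are hypothetical.

```python
def filter_by_accuracy(reports, max_error_m=100.0):
    """Keep reports whose estimated positioning error is within the limit."""
    return [r for r in reports if r["error_m"] <= max_error_m]
```

  • The same filter would be applied both to the reports of the account being checked and to the reports of the training samples, so that the model is trained and used on comparable data.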
  • When training the classification model, the server may use a commonly used method to select the better model from a number of candidate classification models as the model for determining the grid corresponding to the resident range. Specifically, the server may train several classification models on the training samples, calculate the area under the receiver operating characteristic curve (the AUC of the ROC curve) for each model, and use the model with the largest AUC as the trained classification model. Of course, which classification model is selected may also be decided by staff according to the needs of the actual application, for example choosing a model with a faster classification speed in view of time cost; this application does not specifically limit the choice.
  • Because classification models trained on different types of data may differ, as described above, the server in the embodiment of the present application may, to improve the applicability of the classification model, set aside a preset proportion of the training samples for testing each classification model. That is, the samples used to train each classification model may differ from the samples used to compute its AUC, so as to obtain a better model selection result. The preset proportion may be set by staff and is not limited by this application.
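  • The AUC-based selection can be sketched in plain Python. The AUC is computed here from its pairwise-ranking definition (the fraction of positive/negative pairs that the model orders correctly, counting ties as half). The names `auc` and `select_by_auc`, and the representation of each candidate model as a scoring function, are assumptions for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_by_auc(models, test_x, test_y):
    """models: {name: scoring function}. Returns the name with the largest held-out AUC."""
    return max(models, key=lambda name: auc([models[name](x) for x in test_x], test_y))
```

  • `test_x` and `test_y` stand for the held-out proportion of the training samples, kept separate from the data each candidate model was trained on.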
  • In addition, the server may determine the geographical location information reported by a training sample over a period of time, which may or may not coincide with the preset time period. The start and end of that period may be set by staff according to the needs of the actual application, for example starting from the time at which the training sample's address information was determined to be real and tracing back the geographical location information reported by the training sample within the preceding 4 months. This application does not specifically limit this.
  • the classification result determined by the classification model by the feature value is It is also possible to distinguish between the resident range of the user who uses the account, the resident residence of the user who uses the account, and the resident working range of the user who uses the account.
  • the information to be verified of the account determined by the server in step S101 may further include: Verify the residential address information and the work address to be verified. Therefore, the classification model completed by the training can determine the resident resident range and the resident working range of the account through the geographical location information reported by the account.
  • when selecting training samples, the server may take a number of accounts with known real residential address information and known real work address information as training samples. For each training sample, it determines from the geographic location information reported by the sample the number and times of the sample's occurrences in each grid, and from those the sample's corresponding feature values in each grid. Finally, it trains the classification model on the feature values of each training sample in each grid, the known real residential address information of each training sample, and the known real work address information of each training sample.
  • a classification model trained in this way may, when determining the resident range, classify it only as a resident residence range or a resident working range.
  • in step S103, when the to-be-verified address information is to-be-verified residential address information, the coordinates of the to-be-verified residential address information are determined from its longitude and latitude, and it is judged whether those coordinates fall within the resident residence range; if yes, the to-be-verified address information is determined not to be false address information; if not, it is determined to be false address information.
  • when the to-be-verified address information is to-be-verified work address information, the coordinates of the to-be-verified work address information are determined from its longitude and latitude, and it is judged whether those coordinates fall within the resident working range; if yes, the to-be-verified address information is determined not to be false address information; if not, it is determined to be false address information.
  • the contact information may include: a phone number, address information, and the like.
  • the verification of the address information may be performed by a financial institution when the account applies to it for a credit card or credit service. The server may be the financial institution's own server for verifying address information, or the financial institution may be a third party that initiates an address verification request to the server. A financial institution usually verifies address information in two respects: whether the address information is real, and whether the address information actually belongs to the account.
  • after steps S101 to S104, the server can determine whether the to-be-verified address information of the account is false address information. The server determines not only the authenticity of the to-be-verified address information but also whether it corresponds to the account, that is, whether it matches the resident range of the user who uses the account.
  • the to-be-verified address information may be the to-be-verified residential address information of the account and/or the to-be-verified work address information of the account. Identifying whether the to-be-verified address information is false address information makes it possible to determine the risk of the account: for example, if the account provides false address information, the account is more likely to be attempting to obtain a loan fraudulently, and vice versa.
  • for example, suppose user d applies to bank f for a credit card through account e and, as the bank requires, provides residential address g and work address h, and suppose the server i of the bank f determines that the to-be-verified address information of the account e is the to-be-verified residential address (residential address g) and the to-be-verified work address (work address h).
  • the server i may first determine, in the pre-divided geographical ranges, the resident residence range and the resident working range of the account e, according to the geographic location information reported by the account e within the preset time period and the trained classification model.
  • the server i then matches the to-be-verified residential address information against the resident residence range and the to-be-verified work address information against the resident working range, and determines from the two matching results whether the to-be-verified residential address information and the to-be-verified work address information are false address information.
  • the server i may determine that the account e carries a higher risk as soon as either the to-be-verified residential address information or the to-be-verified work address information is false address information, and then decline to provide the credit card service to the account e or reduce the credit limit offered to the account e. What operation is taken after determining that the account provided false address information is not specifically limited by the present application.
  • the execution bodies of the steps of the method provided by the embodiment of the present application may all be the same device, or the method may also be performed by different devices.
  • the execution body of step S101 and step S102 may be device 1
  • the execution body of step S103 may be device 2
  • the execution body of step S101 may be device 1
  • the execution body of step S102 and step S103 may be device 2
  • the server can be a distributed server composed of multiple devices.
  • the execution body of each step of the method provided by the embodiment of the present application is not limited to a server, and may be a terminal, and the terminal may be a mobile phone, a personal computer, a tablet computer, or the like.
  • the embodiment of the present application further provides a device for identifying false address information, as shown in FIG. 3 .
  • FIG. 3 is a schematic structural diagram of an apparatus for identifying a fake address information according to an embodiment of the present disclosure, including:
  • the first determining module 201 determines the address information of the account to be verified
  • the second determining module 202 determines the resident range of the account in the pre-divided geographical ranges, according to the geographic location information reported by the account in the preset time period and the trained classification model;
  • the matching module 203 matches the to-be-verified address information with the resident range
  • the identification module 204 determines whether the to-be-verified address information is false address information according to the matching result of the to-be-verified address information and the resident range.
  • the geographical location information includes: longitude and latitude.
  • the geographic location information further includes positioning accuracy. The second determining module 202 selects, according to a preset positioning accuracy threshold, the geographic location information whose positioning accuracy is not less than the threshold from the geographic location information reported by the account in the preset time period, and determines the resident range of the account in the pre-divided geographical ranges according to the selected geographic location information and the trained classification model.
  • the second determining module 202 divides the map into a plurality of grids according to a preset grid size, and uses each grid on the map as a pre-divided geographic range.
  • the second determining module 202 trains the classification model as follows: it takes a number of accounts with known real address information as training samples; for each training sample, it determines from the geographic location information reported by the sample the number and times of the sample's occurrences in each grid, and from those the sample's corresponding feature values in each grid; it then trains the classification model on the feature values of each training sample in each grid and the known real address information of each training sample.
  • the second determining module 202 determines the corresponding feature values of the account in each grid according to the geographic location information reported by the account in the preset time period, inputs those feature values into the trained classification model, and determines the resident range of the account.
  • the identifying module 204 determines the coordinates of the to-be-verified address information according to the longitude and latitude corresponding to the to-be-verified address information, and determines whether the coordinates of the to-be-verified address information fall within the resident range, and if so, Then, it is determined that the to-be-verified address information is not false address information, and if not, it is determined that the to-be-verified address information is false address information.
  • the to-be-verified address information includes to-be-verified residential address information and to-be-verified work address information. The second determining module 202 determines the resident residence range and the resident working range of the account in the pre-divided geographical ranges, according to the geographic location information reported by the account in the preset time period and the trained classification model.
  • to train the classification model, the second determining module 202 takes a number of accounts with known real residential address information and known real work address information as training samples; for each training sample, it determines from the geographic location information reported by the sample the number and times of the sample's occurrences in each grid, and from those the sample's corresponding feature values in each grid; it then trains the classification model on the feature values of each training sample in each grid, the known real residential address information of each training sample, and the known real work address information of each training sample, so that the classification model can be used to determine the resident residence range and the resident working range.
  • the corresponding feature values of a training sample in any grid include at least one of the following, each expressed as a proportion of the sample's total number of occurrence days unless stated otherwise: the sample's number of occurrences in the grid as a proportion of its total number of occurrences; the days on which the sample appears in the grid; the workdays on which it appears in the grid; the holidays on which it appears in the grid; the workdays on which it appears in the grid during the daytime; the workdays on which it appears in the grid at night; the holidays on which it appears in the grid during the daytime; and the holidays on which it appears in the grid at night.
  • when the to-be-verified address information is to-be-verified residential address information, the identifying module 204 determines the coordinates of the to-be-verified residential address information from its longitude and latitude and judges whether those coordinates fall within the resident residence range; if yes, it determines that the to-be-verified address information is not false address information; if not, it determines that the to-be-verified address information is false address information.
  • when the to-be-verified address information is to-be-verified work address information, the identifying module 204 determines the coordinates of the to-be-verified work address information from its longitude and latitude and judges whether those coordinates fall within the resident working range; if yes, it determines that the to-be-verified address information is not false address information; if not, it determines that the to-be-verified address information is false address information.
  • the apparatus for identifying the fake address information as shown in FIG. 3 may be located in a server, and the server may be a single device or a system composed of multiple devices, that is, a distributed server.
  • the controller can be implemented in any suitable manner, for example, the controller can take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (eg, software or firmware) executable by the (micro)processor.
  • examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the memory's control logic.
  • besides implementing the controller purely as computer readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, and the like.
  • Such a controller can therefore be considered a hardware component, and the means for implementing various functions included therein can also be considered as a structure within the hardware component.
  • alternatively, the means for implementing various functions can be regarded both as software modules that implement the method and as structures within a hardware component.
  • the system, device, module or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • a typical implementation device is a computer.
  • the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction apparatus, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media include both persistent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • as defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the application can be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the present application can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • program modules can be located in both local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Business, Economics & Management (AREA)
  • Remote Sensing (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Telephonic Communication Services (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

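A method and apparatus for identifying false address information. The to-be-verified address information of an account is first determined (S101). Then, based on the geographic location information reported by the account within a preset time period, a trained classification model is used to determine the resident range of the account within pre-divided geographic ranges (S102). Finally, whether the to-be-verified address information is false address information is determined from the result of matching it against the grids corresponding to the resident range (S104). The resident range of the account is thus derived from the geographic location information historically reported by the account together with the classification model. Because that location information is both real and tied to the account, the resident range is likewise real and attributable to the account, so matching the to-be-verified address information against the resident range yields more accurate identification of false address information.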

Description

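Method and apparatus for identifying false address information
Technical Field
The present application relates to the field of information technology, and in particular to a method and apparatus for identifying false address information.
Background
With the development of information technology, more and more services are carried out over networks, and the security of those services is commonly safeguarded by verifying the authenticity of the address information that users provide. For example, the basic information of an account is verified; if the account's address information can be shown to be false, the account carries a higher risk and must be handled with caution when services are executed.
In the prior art, address information is usually verified through a search engine or through logistics information.
Specifically, verification through a search engine means entering the to-be-verified address information into an existing search engine and using the address information the engine has already indexed to determine whether the address really exists. Verification through logistics information means checking the authenticity of the to-be-verified address information against the addresses found in logistics records that have already been collected.
However, when address information is verified through a search engine, both the accuracy and the coverage of the result depend on how much address information the chosen search engine has indexed. If the engine has indexed many addresses over a wide area, accuracy and coverage may be high. Search engines usually index addresses in busy areas fairly completely and accurately but index far fewer addresses in remote areas, so the accuracy of search-engine-based address verification is unstable and, overall, insufficient.
Verification through logistics information has two problems. First, to protect the privacy of its users, the logistics industry guards logistics information closely, which makes it hard to obtain. Second, the accuracy and authenticity of logistics information do not have to be verified for delivery to work: a recipient name such as "Sun Wukong" or an address such as "east gate of a certain residential compound in a certain district of a certain city" is neither real nor precise, yet it does not hinder delivery. Such records cannot be used to verify to-be-verified address information, so verification based on logistics information also struggles to ensure accuracy and coverage.
Further, even if the address information a user provides is real, it is difficult to verify whether the address is that user's work address or residential address; the address may be real without being the user's own. For example, suppose user a gives user b's home address c as his or her own home address, and home address c really exists. The prior art can only recognize that home address c is real and cannot determine whether it belongs to user a. For user a, home address c is in fact false address information, and because the prior art can hardly identify this kind of false address information, the accuracy of address-based risk control is reduced.
It can be seen that, because the prior-art methods for verifying address information have the above shortcomings, false address information is identified with low accuracy.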
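Summary of the Invention
An embodiment of the present application provides a method for identifying false address information, to solve the problem that false address information is verified with low accuracy because prior-art address verification is inaccurate and can hardly verify the correspondence between an address and an account.
An embodiment of the present application provides an apparatus for identifying false address information, to solve the problem that false address information is verified with low accuracy because prior-art address verification is inaccurate and can hardly verify the correspondence between an address and an account.
The embodiments of the present application adopt the following technical solutions:
A method for identifying false address information, comprising:
determining to-be-verified address information of an account;
determining a resident range of the account in pre-divided geographic ranges, according to the geographic location information reported by the account within a preset time period and a trained classification model;
matching the to-be-verified address information against the resident range;
determining, according to the result of matching the to-be-verified address information against the resident range, whether the to-be-verified address information is false address information.
An apparatus for identifying false address information, comprising:
a first determining module, which determines to-be-verified address information of an account;
a second determining module, which determines a resident range of the account in pre-divided geographic ranges, according to the geographic location information reported by the account within a preset time period and a trained classification model;
a matching module, which matches the to-be-verified address information against the resident range;
an identifying module, which determines, according to the result of matching the to-be-verified address information against the resident range, whether the to-be-verified address information is false address information.
At least one of the above technical solutions adopted by the embodiments of the present application can achieve the following beneficial effects:
The to-be-verified address information of an account is first determined. Then, based on the geographic location information reported by the account within a preset time period, a trained classification model is used to determine, within pre-divided geographic ranges, the resident range of the user who uses the account. Finally, whether the to-be-verified address information is false address information is determined from the result of matching it against the grids corresponding to the resident range. In the present application, the resident range of the user who uses the account is thus derived from the geographic location information historically reported by the account together with the classification model. Because that location information is both real and tied to the account, the resident range is likewise real and attributable to the account, so matching the to-be-verified address information against the resident range yields more accurate identification of false address information.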
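Brief Description of the Drawings
The drawings described here are provided for further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
FIG. 1 shows a process for identifying false address information according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a map grid according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an apparatus for identifying false address information according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.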
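FIG. 1 shows a process for identifying false address information according to an embodiment of the present application, which specifically includes the following steps:
S101: Determine the to-be-verified address information of an account.
In the prior art, it is usually the service provider that needs address information verified, so verification is usually performed by the service provider's server. The service provider may also entrust a third party with the verification. Verification may be started by the server according to preset conditions (for example, at a fixed frequency or at regular intervals) or initiated by a third party (for example, a third-party server submits a verification request for the address information). The present application does not specifically limit how verification is started.
In addition, a user generally provides address information to the server through an account, so address information usually corresponds to an account. In this embodiment, the server may therefore first determine the to-be-verified address information of the account.
Specifically, the to-be-verified address information may be an address at which the user is resident, such as the home address or work address already set in the account's profile. When the server determines that risk control is needed for the account, it can retrieve the address information already set for the account and use it as the account's to-be-verified address information.
Alternatively, the to-be-verified address information may be the address information the account returns after the server sends it an address inquiry. The address inquiry may contain at least one of text, audio, and video; for example, the text may be "Please provide your detailed home address" or "Please provide your detailed work address", prompting the account to return the to-be-verified address information to the server. In this case, the server first determines the account that needs risk control, sends the address inquiry to that account, and receives the returned address information as the account's to-be-verified address information.
The present application does not specifically limit how the server determines the account's to-be-verified address information; staff may set this according to the needs of the actual application. Likewise, whether the server determines the account's home address or its work address may be set by staff according to actual needs, or the to-be-verified address information may include both the home address and the work address of the account.
Note that in this embodiment the server may be a single device or a system composed of multiple devices, that is, a distributed server.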
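S102: Determine the resident range of the account in pre-divided geographic ranges, according to the geographic location information reported by the account within a preset time period and a trained classification model.
In modern society, people's daily trajectories are fairly fixed and regular: on workdays they work at an office or study at school during the day and return home to rest at night, whereas on holidays their movements are more random, since besides resting at home they may visit scenic spots, shopping districts, and similar places. Because of this fairly fixed, regular pattern, a user's living area and working area can be determined quite accurately from the user's location at different times.
Therefore, in this embodiment, after determining the account's to-be-verified address information, the server may further determine the account's resident range, taken as the resident range of the user who uses the account, so that the to-be-verified address information can subsequently be verified and false address information identified.
Specifically, the life trajectory of the user who uses the account (hereinafter, the account's life trajectory) must first be determined in order to determine that user's resident range, so the server may first determine the geographic location information reported by the account. Reporting may mean that, after login, the account sends the server the geographic location of the device on which it is currently logged in at a preset frequency (for example, once every 30 minutes), or that the account sends that location at login time. The reporting method may be set according to the needs of the actual application, and the methods used in the prior art to obtain a user's location in real time may also be used; the present application does not limit the specific method. The longer the account stays at one place, the more location reports it sends from there, so the reported location information can be used to determine the resident range of the user who uses the account, that is, the account's resident range.
In addition, the location information used may be part or all of the geographic location information reported by the account, as set according to the needs of the actual application.
Further, people's workplaces and residences are usually fairly fixed and in most cases do not change in the short term, yet mobility in modern society is relatively high, so in the present application the server may determine the geographic location information reported by the account within a preset time period. The preset time period may be a span traced back from the current moment: for example, if the current date is November 11, 2016 and the period traces back 4 months, the server determines the location information reported by the account between July 11, 2016 and November 11, 2016. It may instead run from a specified start time to a specified end time, for example from January 1 to June 1. Staff may set it according to the needs of the actual application, and the present application does not specifically limit it.
Furthermore, the length of the preset time period may be set by staff according to actual needs, for example 4 months or 9 months. Because housing leases usually run for at least half a year, a period longer than 6 months raises the likelihood that the account's life trajectory changed within it. The present application nevertheless does not specifically limit the length. Restricting the reports to the preset time period yields a reasonably regular life trajectory: the period is neither so long that several life trajectories are found nor so short that the life trajectory is hard to determine.
Next, because device positioning accuracy is not fixed and location fixes become erroneous when the device is affected by its environment, the location information reported by the account is not uniform in positioning accuracy. To determine the resident range of the user who uses the account more accurately, the server may divide the map into a number of grids according to a preset grid size and use the grids on the map as the pre-divided geographic ranges. Geographic ranges then replace precisely positioned location information when determining the resident range, which avoids the effect of positioning errors and adds tolerance for positioning accuracy. The grids into which the map is divided may be as shown in FIG. 2.
FIG. 2 is a schematic diagram of a map grid according to an embodiment of the present application. The map stored in the server has been divided into grids in advance; each grid is a square drawn with dashed lines and can be expressed in terms of longitude and latitude. The side length of a grid may be set by staff according to the needs of the actual application, for example 500 meters. Note that the shorter the side length, the more precise the determined resident range, but also the higher the accuracy required of the reported location information and the greater the effect of positioning errors. The grids may also take other shapes, such as circles or triangles, and the present application does not specifically limit this.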
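The grid division described above can be sketched as follows. This is a minimal illustration rather than the method claimed in the application: the function name `grid_cell`, the constant `METERS_PER_DEG_LAT`, and the `ref_lat` parameter are all assumptions introduced here, and the conversion from meters to degrees uses a rough flat-earth approximation that is only reasonable over a city-sized area.

```python
import math

# Approximate length of one degree of latitude in meters (an assumption of
# this sketch; the application does not specify any conversion).
METERS_PER_DEG_LAT = 111_320.0


def grid_cell(lat, lon, cell_m=500.0, ref_lat=0.0):
    """Map a (lat, lon) pair to integer (row, col) indices of a square grid.

    cell_m is the side length of a grid in meters (500 m in the example in
    the text). ref_lat is a hypothetical reference latitude used to scale
    longitude degrees so that cells are roughly square near that latitude.
    """
    lat_step = cell_m / METERS_PER_DEG_LAT
    lon_step = cell_m / (METERS_PER_DEG_LAT * math.cos(math.radians(ref_lat)))
    return (math.floor(lat / lat_step), math.floor(lon / lon_step))
```

Two reports taken about a hundred meters apart usually fall in the same cell, which is the tolerance for positioning error that the text describes; a shorter `cell_m` gives finer cells but makes the result more sensitive to that error.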
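The server may then determine, for each pre-divided grid, the number and times of occurrences of the location information reported by the account within the preset time period, and determine the account's feature values in each grid. The feature values may be as shown in Table 1.
Feature identifier | Feature description
Occurrence-count ratio | Occurrences in the grid as a proportion of total occurrences
Occurrence-day ratio | Days with an occurrence in the grid as a proportion of total occurrence days
Workday ratio | Workdays with an occurrence in the grid as a proportion of total occurrence days
Holiday ratio | Holidays with an occurrence in the grid as a proportion of total occurrence days
Workday-daytime ratio | Workdays with a daytime occurrence in the grid as a proportion of total occurrence days
Workday-night ratio | Workdays with a night occurrence in the grid as a proportion of total occurrence days
Holiday-daytime ratio | Holidays with a daytime occurrence in the grid as a proportion of total occurrence days
Holiday-night ratio | Holidays with a night occurrence in the grid as a proportion of total occurrence days
Table 1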
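The eight feature values in Table 1 can be computed per grid roughly as sketched below. The application does not define "daytime" or "holiday" precisely, so this sketch assumes that daytime runs from 08:00 to 20:00 and that holidays are simply Saturdays and Sundays; the function name `cell_features`, the feature keys, and the sample data are likewise hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Feature name -> internal bucket of distinct dates (names are assumptions).
DAY_FEATURES = {
    "day_ratio": "any",
    "workday_ratio": "workday",
    "holiday_ratio": "holiday",
    "workday_daytime_ratio": "workday_day",
    "workday_night_ratio": "workday_night",
    "holiday_daytime_ratio": "holiday_day",
    "holiday_night_ratio": "holiday_night",
}


def cell_features(reports, day_start=8, day_end=20):
    """reports: iterable of (datetime, cell). Returns {cell: {feature: ratio}}."""
    reports = list(reports)
    counts = defaultdict(int)                     # cell -> number of reports
    days = defaultdict(lambda: defaultdict(set))  # cell -> bucket -> dates
    all_days = set()                              # every date with a report
    for ts, cell in reports:
        counts[cell] += 1
        all_days.add(ts.date())
        kind = "holiday" if ts.weekday() >= 5 else "workday"  # weekend only
        part = "day" if day_start <= ts.hour < day_end else "night"
        for bucket in ("any", kind, kind + "_" + part):
            days[cell][bucket].add(ts.date())
    features = {}
    for cell, n in counts.items():
        row = {"count_ratio": n / len(reports)}
        for name, bucket in DAY_FEATURES.items():
            row[name] = len(days[cell][bucket]) / len(all_days)
        features[cell] = row
    return features


# Hypothetical reports: office (A) on two workdays, home (B) on a workday
# night, and a gym (C) on a Saturday afternoon.
example = [
    (datetime(2016, 11, 7, 10), "A"),
    (datetime(2016, 11, 7, 22), "B"),
    (datetime(2016, 11, 8, 10), "A"),
    (datetime(2016, 11, 12, 15), "C"),
]
example_features = cell_features(example)
```

A real deployment would need a proper public-holiday calendar and a decision about time zones, neither of which is addressed here.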
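As Table 1 shows, the 8 feature values above reveal, for each grid, how frequently the account appears there and during which periods. The occurrence-count ratio and the occurrence-day ratio indicate whether the account appears in the grid often; a grid in which the account seldom appears is unlikely to be the resident range of the user who uses the account. The workday ratio indicates whether the grid is the user's resident range: people's workday travel is usually fairly fixed, so a grid with many workday occurrences is more likely to be the resident range. The holiday ratio indicates whether the grid is not an area where the user works or lives (for example, a user who often goes to a gym on weekends appears frequently in the gym's grid on holidays, yet does not work or live there). The workday-daytime ratio indicates whether the grid is the account's working area, the workday-night ratio indicates whether it is the account's living area, and so on. In other words, the feature values determined in each grid reflect the account's life trajectory and routine across the map grids and filter out the interference of areas where the account appears only rarely, so that the grids corresponding to the resident range can be determined more accurately, along with the grids corresponding to the account's living area and working area.
In addition, location information reported by an account usually carries the time of reporting, so in the present application the server can use the reporting time of each piece of location information to determine some of the feature values in Table 1. The reporting time may be the server's system time when it receives the location information, the time at which the location was determined, or the device's sending time when the location information was sent to the server. Using the server's system time makes the reporting times of all accounts uniform and easy to manage, but introduces errors caused by network delay. The present application does not specifically limit which reporting time is used; staff may set it according to the needs of the actual application.
Finally, the server may use the trained classification model to determine, among the grids, those in which the user of the account frequently appears, as that user's resident range. That is, the server inputs the account's feature values in each grid into the trained classification model and, from the model's classification result for each grid, determines which grids belong to the resident range of the user who uses the account.
Note that the server may select one or more of the above feature values for determining the resident range. The present application neither requires all of the feature values to be used nor limits the features to the 8 shown in Table 1; staff may decide on the feature values according to the needs of the actual application.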
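The classification model may be trained as follows:
First, the server takes a number of accounts whose address information has been verified as real, that is, accounts with known real address information, as training samples. It then collects the geographic location information reported by each training sample and, for each training sample, determines the sample's feature values in each grid from the number and times of its occurrences there.
Next, the server inputs the feature values of each training sample into the classification model in turn and obtains a classification result. The model's initial parameters may be generated randomly or set by staff. The classification result states, for each training sample, whether each grid belongs to the resident range or to the non-resident range.
The server then determines the accuracy of the classification result from the positions in the grids of the coordinates of each training sample's known real address information, and adjusts the parameters of the classification model according to that accuracy.
The above process may be repeated until a preset number of iterations is reached or until the accuracy of the classification result reaches a preset threshold, which staff may set as needed.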
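The training loop above, with its two stopping rules (a preset number of rounds or a preset accuracy threshold), can be sketched with a small logistic-regression classifier. The application only says that parameters are adjusted according to the accuracy; the per-sample gradient update used here is a swapped-in standard technique, and the function names, learning rate, and toy data are all assumptions made for illustration.

```python
import math


def predict_proba(w, b, x):
    """Probability that a grid with feature vector x is a resident grid."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, lr=0.5, max_rounds=1000, target_acc=0.95):
    """Fit weights by stochastic gradient descent on the logistic loss.

    Stops after max_rounds passes or once training accuracy reaches
    target_acc, mirroring the two stopping rules described in the text.
    """
    w, b, acc = [0.0] * len(samples[0]), 0.0, 0.0
    for _ in range(max_rounds):
        for x, y in zip(samples, labels):
            err = predict_proba(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        correct = sum(
            (predict_proba(w, b, x) >= 0.5) == bool(y)
            for x, y in zip(samples, labels)
        )
        acc = correct / len(samples)
        if acc >= target_acc:
            break
    return w, b, acc


# Toy (occurrence-count ratio, workday-night ratio) vectors, one per
# (training sample, grid) pair; label 1 means the known real address of
# the sample lies in that grid.
X = [[0.8, 0.7], [0.6, 0.5], [0.7, 0.9], [0.1, 0.0], [0.05, 0.1], [0.2, 0.1]]
y = [1, 1, 1, 0, 0, 0]
weights, bias, accuracy = train(X, y)
```

In practice a library implementation of random forest, logistic regression, or a neural network would replace this hand-written loop; the sketch only shows how labels derived from known real addresses drive the parameter updates.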
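Further, in the present application the classification model may be a classification algorithm such as a random forest, logistic regression, or a neural network; the present application does not limit which classification model is used.
S103: Match the to-be-verified address information against the resident range.
S104: Determine, according to the result of matching the to-be-verified address information against the resident range, whether the to-be-verified address information is false address information.
In this embodiment, after the server has used the trained classification model to determine, among the grids, those corresponding to the resident range of the user who uses the account, it can match the to-be-verified address information against the resident range and judge whether the to-be-verified address information is false address information.
Specifically, the server first determines the coordinates of the to-be-verified address information from its longitude and latitude, then determines the grid in which those coordinates lie, and finally judges whether that grid is the same as a grid corresponding to the resident range of the user who uses the account (that is, whether the coordinates of the to-be-verified address information fall within a grid of the resident range). If so, the to-be-verified address information is determined not to be false address information; if not, it is determined to be false address information.
Here, a match between the grid of the to-be-verified address information and a grid of the resident range means that the coordinates of the to-be-verified address information lie within a grid corresponding to the resident range of the user who uses the account.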
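The matching in steps S103 and S104 reduces to a set-membership test once the address and the resident range are expressed as grids. The sketch below assumes, purely for brevity, a grid whose cells are a fixed number of degrees on each side; `to_cell`, `is_false_address`, and `step_deg` are hypothetical names, and the address is assumed to have been geocoded to latitude and longitude already.

```python
import math


def to_cell(lat, lon, step_deg=0.005):
    """Index of the grid containing (lat, lon); cells are step_deg degrees wide."""
    return (math.floor(lat / step_deg), math.floor(lon / step_deg))


def is_false_address(addr_lat, addr_lon, resident_cells, step_deg=0.005):
    """True when the address falls outside every grid of the resident range."""
    return to_cell(addr_lat, addr_lon, step_deg) not in resident_cells


# Hypothetical resident range consisting of a single grid.
resident = {to_cell(30.2741, 120.1551)}
nearby_is_false = is_false_address(30.2742, 120.1552, resident)    # same grid
faraway_is_false = is_false_address(31.2304, 121.4737, resident)   # other city
```

An address just across a cell boundary from the resident range would be flagged by this strict test; whether neighbouring grids should also count as a match is a design choice the text leaves open.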
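With the method shown in FIG. 1, the server determines the grids corresponding to the resident range of the user who uses the account, matches the grid of the account's to-be-verified address information against them, and determines from the matching result whether the to-be-verified address information is false address information. When the account's to-be-verified address is verified, the resident range is determined within the pre-divided map grids from the location information historically reported by the account, so the grids of the resident range are highly credible and can be attributed to the account. Matching the to-be-verified address information against those grids therefore gives an accurate result, which raises the accuracy of false address identification.
In addition, different devices may differ in positioning accuracy, and the same device may position with different accuracy under different external conditions. If the location information reported by the account includes low-accuracy fixes, the grids subsequently determined for the resident range may be inaccurate, which in turn lowers the accuracy of false address identification.
Therefore, in this embodiment, when determining the location information reported by the account within the preset time period, the server may also select, according to a preset positioning accuracy threshold, the location information whose positioning accuracy is not less than the threshold, and input the selected location information into the trained classification model to determine the grids corresponding to the resident range of the user who uses the account.
Similarly, for each training sample, the server may select from the location information reported within the preset time period the location information whose positioning accuracy is not less than the positioning accuracy threshold, and use it to train the classification model.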
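Selecting the reports to use, by preset time period and by positioning accuracy threshold, might look like the sketch below. It follows the wording of the text, where a report is kept when its accuracy is not less than the threshold, so `accuracy` is assumed to be a score for which higher means better; many positioning APIs instead report an error radius in meters, in which case the comparison would be reversed. The record layout and the name `select_reports` are assumptions.

```python
from datetime import datetime


def select_reports(reports, start, end, min_accuracy):
    """Keep reports inside [start, end] whose accuracy score meets the threshold.

    Each report is a dict with a "time" (datetime) and an "accuracy" score;
    this layout is hypothetical.
    """
    return [
        r for r in reports
        if start <= r["time"] <= end and r["accuracy"] >= min_accuracy
    ]


# The four-month window from the example in the text.
window_start = datetime(2016, 7, 11)
window_end = datetime(2016, 11, 11)
sample_reports = [
    {"time": datetime(2016, 8, 1, 9), "accuracy": 0.9},   # kept
    {"time": datetime(2016, 8, 2, 9), "accuracy": 0.2},   # too inaccurate
    {"time": datetime(2016, 1, 1, 9), "accuracy": 0.95},  # outside the window
]
kept = select_reports(sample_reports, window_start, window_end, 0.5)
```

The same filter would be applied to the reports of each training sample before feature values are computed.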
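Further, different classification models perform differently on different kinds of data, and the distribution of training samples' location information is usually random (for example, some training samples work very close to where they live and others very far away), so for the same training samples different classification models may yield classification results of differing accuracy. In this embodiment, therefore, when training the classification model the server may use a common method to pick the better-performing model from several classification models as the model for determining the grids of the resident range. Specifically, the server may train several classification models on the training samples, compute for each the area under the curve (AUC) of its receiver operating characteristic curve (ROC curve), and take the model with the largest AUC as the trained classification model. Staff may also choose the model according to the needs of the actual application, for example selecting a faster model in view of time cost; the present application does not specifically limit this.
Furthermore, since classification models trained on different kinds of data may differ as noted above, to improve the model's applicability the server may set aside a preset proportion of the training samples for testing each classification model, so that the samples used to train the models and the samples used to compute the AUC are not exactly the same, which gives a better model selection. The preset proportion may be set by staff and is not limited by the present application.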
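Model selection by AUC on held-out samples can be sketched as below. The AUC is computed here as the fraction of (positive, negative) pairs that a model ranks correctly, which is equivalent to the area under the ROC curve. The function names, the 30% hold-out ratio, and the example scores are assumptions; the scores stand in for the outputs of already-trained candidate models.

```python
import random


def split_samples(samples, test_ratio=0.3, seed=0):
    """Shuffle and split samples into (train, test) by a preset ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pick_best(labels, scores_by_model):
    """Name of the candidate model with the largest AUC on the test labels."""
    return max(scores_by_model, key=lambda name: auc(labels, scores_by_model[name]))


# Hypothetical held-out labels and the scores two candidate models gave them.
test_labels = [1, 1, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.3, 0.1],
    "model_b": [0.9, 0.2, 0.3, 0.1],
}
best = pick_best(test_labels, candidate_scores)
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sketch but would be replaced by a sort-based computation on large test sets.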
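In addition, a training sample's life trajectory is not completely fixed either, so when the classification model is trained, the server may, for each training sample, determine the location information the sample reported over a period of time. That period may or may not coincide with the preset time period, and its start and end points may be set by staff according to the needs of the actual application, for example, tracing back 4 months from the moment the training sample's address information was confirmed to be real. The present application does not specifically limit this.
Further, given the life trajectory and routine of the account that the feature values in Table 1 reflect, together with the reporting time of each piece of location information, the classification result that the classification model derives from the feature values can further divide the resident range of the user who uses the account into that user's resident residence range and resident working range.
Accordingly, the to-be-verified address information of the account determined by the server in step S101 may further include to-be-verified residential address information and to-be-verified work address information. The trained classification model can then determine the account's resident residence range and resident working range from the location information reported by the account.
Furthermore, when selecting training samples, the server may take a number of accounts with known real residential address information and known real work address information as training samples. For each training sample, it determines from the location information reported by the sample the number and times of the sample's occurrences in each grid, and from those the sample's corresponding feature values in each grid. Finally, it trains the classification model on the feature values of each training sample in each grid, the known real residential address information of each training sample, and the known real work address information of each training sample. A classification model trained in this way may, when determining the resident range, classify it only as a resident residence range or a resident working range.
In addition, in step S103, when the to-be-verified address information is to-be-verified residential address information, the coordinates of the to-be-verified residential address information are determined from its longitude and latitude, and it is judged whether those coordinates fall within the resident residence range; if so, the to-be-verified address information is determined not to be false address information; if not, it is determined to be false address information. When the to-be-verified address information is to-be-verified work address information, the coordinates of the to-be-verified work address information are determined from its longitude and latitude, and it is judged whether those coordinates fall within the resident working range; if so, the to-be-verified address information is determined not to be false address information; if not, it is determined to be false address information.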
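When judging the risk of an account that applies for a loan or a credit card, a financial institution usually requires the account to provide information such as identity information, contact information, and asset information, and verifies each item to determine the account's potential risk before proceeding. The contact information may include a phone number, address information, and so on.
Accordingly, in another embodiment of the present application, the address information may be verified by a financial institution when the account applies to it for a credit card or credit service. The server may be the financial institution's own server for verifying address information, or the financial institution may be a third party that initiates an address verification request to the server. A financial institution usually verifies address information in two respects: whether the address information is real, and whether the address information actually belongs to the account.
Further, after steps S101 to S104, the server can determine whether the to-be-verified address information of the account is false address information. The server determines not only the authenticity of the to-be-verified address information but also whether it corresponds to the account, that is, whether it matches the resident range of the user who uses the account.
Furthermore, the to-be-verified address information may be the account's to-be-verified residential address information and/or the account's to-be-verified work address information. Identifying whether the to-be-verified address information is false address information makes it possible to determine the account's risk: if the account provides false address information, the account is more likely to be attempting to obtain a loan fraudulently, and vice versa. For example, suppose user d applies to bank f for a credit card through account e and, as the bank requires, provides residential address g and work address h. Suppose further that server i of bank f determines that the to-be-verified address information of account e is the to-be-verified residential address (residential address g) and the to-be-verified work address (work address h). Server i may first determine, in the pre-divided geographic ranges, the resident residence range and the resident working range of account e, according to the location information reported by account e within the preset time period and the trained classification model. It then matches the to-be-verified residential address information against the resident residence range and the to-be-verified work address information against the resident working range, and determines from the two matching results whether the to-be-verified residential address information and the to-be-verified work address information are false address information. Server i may determine that account e carries a higher risk as soon as either of the two is false address information, and then decline to provide the credit card service to account e or reduce the credit limit offered to account e. What operation is taken after determining that the account provided false address information is not specifically limited by the present application.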
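The bank example above can be condensed into a small decision function. This sketch assumes that both addresses and both resident ranges have already been converted to grid identifiers, and it flags the account as high risk as soon as either address fails to match; the function name, the returned keys, and the grid identifiers are hypothetical, and the follow-up action is left to the caller as in the text.

```python
def assess_account(residence_cell, work_cell, home_cells, work_cells):
    """Match each to-be-verified address against its own resident range.

    residence_cell / work_cell: grids containing the claimed addresses.
    home_cells / work_cells: grids of the resident residence range and
    resident working range determined by the classification model.
    """
    residence_false = residence_cell not in home_cells
    work_false = work_cell not in work_cells
    return {
        "residence_false": residence_false,
        "work_false": work_false,
        # High risk when at least one address is identified as false.
        "high_risk": residence_false or work_false,
    }


# Hypothetical account e: the claimed residence g matches, the claimed
# workplace h does not.
result = assess_account(
    residence_cell=(6054, 24031),
    work_cell=(6100, 24100),
    home_cells={(6054, 24031)},
    work_cells={(6060, 24040)},
)
```

Keeping the two flags separate, rather than returning only the combined risk, lets a caller apply different policies to a false residential address and a false work address.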
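Note that the steps of the method provided by the embodiments of the present application may all be executed by the same device, or by different devices. For example, steps S101 and S102 may be executed by device 1 and step S103 by device 2; or step S101 may be executed by device 1 and steps S102 and S103 by device 2; and so on. That is, the server may be a distributed server composed of multiple devices. Moreover, the executing entity of the steps is not limited to a server and may also be a terminal, such as a mobile phone, a personal computer, or a tablet computer.
Based on the false address identification process shown in FIG. 1, an embodiment of the present application further provides a corresponding apparatus for identifying false address information, as shown in FIG. 3.
FIG. 3 is a schematic structural diagram of an apparatus for identifying false address information according to an embodiment of the present application, comprising:
a first determining module 201, which determines to-be-verified address information of an account;
a second determining module 202, which determines a resident range of the account in pre-divided geographic ranges, according to the geographic location information reported by the account within a preset time period and a trained classification model;
a matching module 203, which matches the to-be-verified address information against the resident range;
an identifying module 204, which determines, according to the result of matching the to-be-verified address information against the resident range, whether the to-be-verified address information is false address information.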
所述地理位置信息包括:经度、纬度。
所述地理位置信息还包括:定位精度,所述第二确定模块202,根据预设的定位精度阈值,从所述账户在预设时间段内上报的各地理位置信息中,确定定位精度不小于所述预设的定位精度阈值的地理位置信息,根据定位精度不小于所述预设的定位精度阈值的地理位置信息,以及训练完成的分类模型,在预先划分的地理范围中,确定所述账户常驻范围。
所述第二确地模块,根据预设的网格大小,将地图划分为若干网格,将所述地图上的各网格,作为预先划分的地理范围。
所述第二确定模块202,采用下述方法训练所述分类模型:确定若干已知真实地址信息的账户,作为训练样本,针对每个训练样本,根据该训练样本上报的若干地理位置信息,确定该训练样本出现在各网格中的次数以及时间,根据该训练样本在各网格中出现的次数和时间,确定该训练样本在各网格中对应的特征值,根据各训练样本在各网格中对应的特征值,以及各训练样本已知真实地址信息,训练所述分类模型。
所述第二确定模块202,根据所述账户在预设时间段内上报的各地理位置信息,确定所述账户在各网格中对应的特征值,将所述账户在各网格中对应的特征值输入所述训练完成的分类模型中,确定所述账户的常驻范围。
所述识别模块204,根据所述待核实地址信息对应的经度以及纬度,确定所述待核实地址信息的坐标,判断所述待核实地址信息的坐标是否落入所述常驻范围内,若是,则确定所述待核实地址信息不是虚假地址信息,若否,则确定所述待核实地址信息是虚假地址信息。
所述待核实地址信息包括:待核实居住地址信息以及待核实工作地址信息,所述第二确定模块202,根据所述账户在预设时间段内上报的各地理位置信息以及训练完成的分类模型,在预先划分的地理范围中,确定所述账户常驻居住范围以及常驻工作范围。
所述第二确定模块202,训练所述分类模型,确定已知真实居住地址信息以及已知真实工作地址信息的若干账户,作为训练样本,针对每个训练样本,根据该训练样本上报的若干地理位置信息,确定该训练样本出现在每个网格中的次数以及时间,根据该训练样本在每个网格中出现的次数和时间,确定该训练样本在各网格中对应的特征值,根据各训练样本在各网格中对应的特征值、各训练样本已知真实居住地址信息以及各训练样本已知真实工作地址信息,训练所述分类模型,以使得所述分类模型用于确定常驻居住范围以及常驻工作范围。
该训练样本在任一网格中对应的特征值包括:该训练样本在该网格内出现次数占总出现次数的比例、该训练样本在该网格内出现天数占总出现天数的比例、该训练样本在该网格内工作日出现天数占总出现天数的比例、该训练样本 在该网格内节假日出现天数占总出现天数的比例、该训练样本在该网格内工作日白天出现天数占总出现天数的比例、该训练样本在该网格内工作日夜间出现天数占总出现天数的比例、该训练样本在该网格内节假日白天出现天数占总出现天数的比例、该训练样本在该网格内节假日夜间出现天数占总出现天数的比例中的至少一种。
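When the address information to be verified is residential address information to be verified, the identification module 204 determines the coordinates of the residential address information to be verified from its corresponding longitude and latitude, and judges whether those coordinates fall within the resident residential range; if so, it determines that the address information to be verified is not false address information; if not, it determines that the address information to be verified is false address information. When the address information to be verified is work address information to be verified, the identification module 204 determines the coordinates of the work address information to be verified from its corresponding longitude and latitude, and judges whether those coordinates fall within the resident work range; if so, it determines that the address information to be verified is not false address information; if not, it determines that the address information to be verified is false address information.
Specifically, the apparatus for identifying false address information shown in FIG. 3 may be located in a server. The server may be a single device, or a system composed of multiple devices, that is, a distributed server.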
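In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program the device themselves to "integrate" a digital system onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing the logical method flow can readily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.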
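A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by that (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can be regarded both as software modules implementing the method and as structures within the hardware component.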
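The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described by dividing it into various units according to function. Of course, when implementing this application, the functions of the units may be implemented in one or more pieces of software and/or hardware.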
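Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.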
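In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.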
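It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art should understand that embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
The above are merely embodiments of this application and are not intended to limit this application. For those skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of the claims of this application.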

Claims (22)

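  1. A method for identifying false address information, characterized in that the method comprises:
    determining address information to be verified of an account;
    determining a resident range of the account, among pre-divided geographic ranges, based on geographic location information reported by the account within a preset time period and a trained classification model;
    matching the address information to be verified against the resident range;
    determining, based on the result of matching the address information to be verified against the resident range, whether the address information to be verified is false address information.
  2. The method of claim 1, characterized in that the geographic location information comprises longitude and latitude.
  3. The method of claim 2, characterized in that the geographic location information further comprises positioning accuracy; and determining the resident range of the account, among the pre-divided geographic ranges, based on the geographic location information reported by the account within the preset time period and the trained classification model specifically comprises:
    selecting, according to a preset positioning accuracy threshold, the geographic location information whose positioning accuracy is not less than the preset positioning accuracy threshold from the geographic location information reported by the account within the preset time period;
    determining the resident range of the account, among the pre-divided geographic ranges, based on the geographic location information whose positioning accuracy is not less than the preset positioning accuracy threshold and the trained classification model.
  4. The method of claim 1, characterized in that pre-dividing the geographic ranges specifically comprises:
    dividing a map into a number of grids according to a preset grid size;
    using the grids on the map as the pre-divided geographic ranges.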
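  5. The method of claim 4, characterized in that the classification model is trained as follows:
    determining a number of accounts whose real address information is known, as training samples;
    for each training sample, determining, from the geographic location information reported by the training sample, the number of times the training sample appeared in each grid;
    determining the feature values of the training sample for each grid from the number of times the training sample appeared in each grid;
    training the classification model using the feature values of each training sample for each grid and each training sample's known real address information.
  6. The method of claim 5, characterized in that determining the resident range of the account, among the pre-divided geographic ranges, based on the geographic location information reported by the account within the preset time period and the trained classification model specifically comprises:
    determining the feature values of the account for each grid from the geographic location information reported by the account within the preset time period;
    inputting the feature values of the account for each grid into the trained classification model to determine the resident range of the account.
  7. The method of claim 1, characterized in that determining, based on the result of matching the address information to be verified against the resident range, whether the address information to be verified is false address information specifically comprises:
    determining the coordinates of the address information to be verified from the longitude and latitude corresponding to the address information to be verified;
    judging whether the coordinates of the address information to be verified fall within the resident range;
    if so, determining that the address information to be verified is not false address information;
    if not, determining that the address information to be verified is false address information.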
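  8. The method of claim 1, characterized in that the address information to be verified comprises residential address information to be verified and work address information to be verified;
    determining the resident range of the account, among the pre-divided geographic ranges, based on the geographic location information reported by the account within the preset time period and the trained classification model specifically comprises:
    determining a resident residential range and a resident work range of the account, among the pre-divided geographic ranges, based on the geographic location information reported by the account within the preset time period and the trained classification model.
  9. The method of claim 8, characterized in that training the classification model specifically comprises:
    determining a number of accounts whose real residential address information and real work address information are known, as training samples;
    for each training sample, determining, from the geographic location information reported by the training sample, the number of times and the times at which the training sample appeared in each grid;
    determining the feature values of the training sample for each grid from the number of times and the times at which the training sample appeared in each grid;
    training the classification model using the feature values of each training sample for each grid, each training sample's known real residential address information, and each training sample's known real work address information, so that the classification model can be used to determine the resident residential range and the resident work range.
  10. The method of claim 9, characterized in that the feature values of the training sample for any grid comprise at least one of: the proportion of the training sample's occurrences in the grid to its total occurrences; the proportion of the days on which the training sample appeared in the grid to its total days of appearance; the proportion of the workdays on which it appeared in the grid to its total days of appearance; the proportion of the holidays on which it appeared in the grid to its total days of appearance; the proportion of the days on which it appeared in the grid during workday daytime to its total days of appearance; the proportion of the days on which it appeared in the grid during workday nighttime to its total days of appearance; the proportion of the days on which it appeared in the grid during holiday daytime to its total days of appearance; and the proportion of the days on which it appeared in the grid during holiday nighttime to its total days of appearance.
  11. The method of claim 8, characterized in that determining, based on the result of matching the address information to be verified against the resident range, whether the address information to be verified is false address information specifically comprises:
    when the address information to be verified is residential address information to be verified, determining the coordinates of the residential address information to be verified from its corresponding longitude and latitude; judging whether the coordinates of the residential address information to be verified fall within the resident residential range; if so, determining that the address information to be verified is not false address information; if not, determining that the address information to be verified is false address information;
    when the address information to be verified is work address information to be verified, determining the coordinates of the work address information to be verified from its corresponding longitude and latitude; judging whether the coordinates of the work address information to be verified fall within the resident work range; if so, determining that the address information to be verified is not false address information; if not, determining that the address information to be verified is false address information.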
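  12. An apparatus for identifying false address information, characterized by comprising:
    a first determining module, which determines address information to be verified of an account;
    a second determining module, which determines a resident range of the account, among pre-divided geographic ranges, based on geographic location information reported by the account within a preset time period and a trained classification model;
    a matching module, which matches the address information to be verified against the resident range;
    an identification module, which determines, based on the result of matching the address information to be verified against the resident range, whether the address information to be verified is false address information.
  13. The apparatus of claim 12, characterized in that the geographic location information comprises longitude and latitude.
  14. The apparatus of claim 13, characterized in that the geographic location information further comprises positioning accuracy, and the second determining module selects, according to a preset positioning accuracy threshold, the geographic location information whose positioning accuracy is not less than the preset positioning accuracy threshold from the geographic location information reported by the account within the preset time period, and determines the resident range of the account, among the pre-divided geographic ranges, based on the geographic location information whose positioning accuracy is not less than the preset positioning accuracy threshold and the trained classification model.
  15. The apparatus of claim 12, characterized in that the second determining module divides a map into a number of grids according to a preset grid size, and uses the grids on the map as the pre-divided geographic ranges.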
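  16. The apparatus of claim 15, characterized in that the second determining module trains the classification model as follows: determining a number of accounts whose real address information is known, as training samples; for each training sample, determining, from the geographic location information reported by the training sample, the number of times the training sample appeared in each grid; determining the feature values of the training sample for each grid from the number of times the training sample appeared in each grid; and training the classification model using the feature values of each training sample for each grid and each training sample's known real address information.
  17. The apparatus of claim 16, characterized in that the second determining module determines the feature values of the account for each grid from the geographic location information reported by the account within the preset time period, and inputs the feature values of the account for each grid into the trained classification model to determine the resident range of the account.
  18. The apparatus of claim 12, characterized in that the identification module determines the coordinates of the address information to be verified from the longitude and latitude corresponding to the address information to be verified, and judges whether the coordinates of the address information to be verified fall within the resident range; if so, it determines that the address information to be verified is not false address information; if not, it determines that the address information to be verified is false address information.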
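  19. The apparatus of claim 12, characterized in that the address information to be verified comprises residential address information to be verified and work address information to be verified, and the second determining module determines a resident residential range and a resident work range of the account, among the pre-divided geographic ranges, based on the geographic location information reported by the account within the preset time period and the trained classification model.
  20. The apparatus of claim 19, characterized in that the second determining module trains the classification model by: determining a number of accounts whose real residential address information and real work address information are known, as training samples; for each training sample, determining, from the geographic location information reported by the training sample, the number of times and the times at which the training sample appeared in each grid; determining the feature values of the training sample for each grid from the number of times and the times at which the training sample appeared in each grid; and training the classification model using the feature values of each training sample for each grid, each training sample's known real residential address information, and each training sample's known real work address information, so that the classification model can be used to determine the resident residential range and the resident work range.
  21. The apparatus of claim 20, characterized in that the feature values of the training sample for any grid comprise at least one of: the proportion of the training sample's occurrences in the grid to its total occurrences; the proportion of the days on which the training sample appeared in the grid to its total days of appearance; the proportion of the workdays on which it appeared in the grid to its total days of appearance; the proportion of the holidays on which it appeared in the grid to its total days of appearance; the proportion of the days on which it appeared in the grid during workday daytime to its total days of appearance; the proportion of the days on which it appeared in the grid during workday nighttime to its total days of appearance; the proportion of the days on which it appeared in the grid during holiday daytime to its total days of appearance; and the proportion of the days on which it appeared in the grid during holiday nighttime to its total days of appearance.
  22. The apparatus of claim 19, characterized in that, when the address information to be verified is residential address information to be verified, the identification module determines the coordinates of the residential address information to be verified from its corresponding longitude and latitude, and judges whether the coordinates of the residential address information to be verified fall within the resident residential range; if so, it determines that the address information to be verified is not false address information; if not, it determines that the address information to be verified is false address information; and when the address information to be verified is work address information to be verified, the identification module determines the coordinates of the work address information to be verified from its corresponding longitude and latitude, and judges whether the coordinates of the work address information to be verified fall within the resident work range; if so, it determines that the address information to be verified is not false address information; if not, it determines that the address information to be verified is false address information.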
PCT/CN2017/114441 2016-12-14 2017-12-04 一种虚假地址信息识别的方法及装置 WO2018107993A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020197020451A KR102208892B1 (ko) 2016-12-14 2017-12-04 잘못된 주소 정보를 식별하기 위한 방법 및 장치
JP2019531993A JP6756921B2 (ja) 2016-12-14 2017-12-04 偽住所情報識別方法およびデバイス
EP17880372.2A EP3557447A4 (en) 2016-12-14 2017-12-04 METHOD AND DEVICE FOR IDENTIFYING FALSE ADDRESS INFORMATION
US16/440,895 US10733217B2 (en) 2016-12-14 2019-06-13 Method and apparatus for identifying false address information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611153869.5 2016-12-14
CN201611153869.5A CN107066478B (zh) 2016-12-14 2016-12-14 一种虚假地址信息识别的方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/440,895 Continuation US10733217B2 (en) 2016-12-14 2019-06-13 Method and apparatus for identifying false address information

Publications (1)

Publication Number Publication Date
WO2018107993A1 true WO2018107993A1 (zh) 2018-06-21

Family

ID=59619172

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/114441 WO2018107993A1 (zh) 2016-12-14 2017-12-04 一种虚假地址信息识别的方法及装置

Country Status (7)

Country Link
US (1) US10733217B2 (zh)
EP (1) EP3557447A4 (zh)
JP (1) JP6756921B2 (zh)
KR (1) KR102208892B1 (zh)
CN (2) CN111858937B (zh)
TW (1) TWI699652B (zh)
WO (1) WO2018107993A1 (zh)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858937B (zh) * 2016-12-14 2024-04-30 创新先进技术有限公司 一种虚假地址信息识别的方法及装置
CN110069626B (zh) * 2017-11-09 2023-08-04 菜鸟智能物流控股有限公司 一种目标地址的识别方法、分类模型的训练方法以及设备
CN110392122B (zh) * 2018-04-16 2021-12-07 腾讯大地通途(北京)科技有限公司 地址类型的确定方法和装置、存储介质、电子装置
US10721242B1 (en) * 2018-04-27 2020-07-21 Facebook, Inc. Verifying a correlation between a name and a contact point in a messaging system
US10462080B1 (en) 2018-04-27 2019-10-29 Whatsapp Inc. Verifying users of an electronic messaging system
CN108416672A (zh) * 2018-05-11 2018-08-17 试金石信用服务有限公司 金融风险评估方法、系统、服务器及存储介质
CN109359186B (zh) * 2018-10-25 2020-12-08 杭州时趣信息技术有限公司 一种确定地址信息的方法、装置和计算机可读存储介质
CN109636568A (zh) * 2018-10-25 2019-04-16 深圳壹账通智能科技有限公司 电话号码的风险检测方法、装置、设备及存储介质
CN109919357B (zh) * 2019-01-30 2021-01-22 创新先进技术有限公司 一种数据确定方法、装置、设备及介质
CN111667127B (zh) * 2019-03-05 2023-04-18 杭州海康威视系统技术有限公司 一种智能监管方法、装置及电子设备
CN109978075B (zh) * 2019-04-04 2021-09-28 江苏满运软件科技有限公司 车辆虚假位置信息识别方法、装置、电子设备、存储介质
CN110599200B (zh) * 2019-09-10 2022-11-01 携程计算机技术(上海)有限公司 Ota酒店的虚假地址的检测方法、系统、介质及设备
CN110807068B (zh) * 2019-10-08 2022-09-23 北京百度网讯科技有限公司 换设备用户的识别方法、装置、计算机设备和存储介质
CN110708333B (zh) * 2019-10-22 2022-04-01 深圳市卡牛科技有限公司 一种位置验证方法以及相关设备
CN110807685B (zh) * 2019-10-22 2021-09-07 上海钧正网络科技有限公司 信息处理方法、装置、终端及可读存储介质
CN111310462A (zh) * 2020-02-07 2020-06-19 北京三快在线科技有限公司 用户属性的确定方法、装置、设备及存储介质
CN111400442B (zh) * 2020-02-28 2024-06-04 深圳前海微众银行股份有限公司 常驻地址分析方法、装置、设备及可读存储介质
US11803748B2 (en) * 2020-05-29 2023-10-31 Sap Se Global address parser
US11436240B1 (en) * 2020-07-03 2022-09-06 Kathleen Warnaar Systems and methods for mapping real estate to real estate seeker preferences
JP7577598B2 (ja) 2020-09-28 2024-11-05 Kddi株式会社 成果報酬決定サーバ、成果報酬決定方法、及びコンピュータプログラム
CN113076752A (zh) * 2021-03-26 2021-07-06 中国联合网络通信集团有限公司 识别地址的方法和装置
CN113609290B (zh) * 2021-07-28 2025-04-18 北京沃东天骏信息技术有限公司 一种地址识别方法及装置、存储介质
CN113722617A (zh) * 2021-09-30 2021-11-30 京东城市(北京)数字科技有限公司 企业实际办公地址的识别方法、装置及电子设备
CN114066606B (zh) * 2021-11-17 2024-07-19 四川新网银行股份有限公司 一种基于文本转义为gps距离的资料虚假识别系统及方法
JP7576058B2 (ja) * 2022-03-30 2024-10-30 楽天グループ株式会社 情報処理システム、方法及びプログラム
CN114757201A (zh) * 2022-04-14 2022-07-15 阿里巴巴(中国)有限公司 收货地址的识别方法、存储介质和处理器
CN115022014B (zh) * 2022-05-30 2023-07-14 平安银行股份有限公司 登录风险识别方法、装置、设备及存储介质
CN115333954B (zh) * 2022-08-10 2024-03-15 河南龙翼信息技术有限公司 虚假地址云端分析系统
CN115374713B (zh) * 2022-10-25 2022-12-27 成都新希望金融信息有限公司 一种gps真伪识别模型的训练方法
CN116561450A (zh) * 2023-05-25 2023-08-08 浪潮软件股份有限公司 一种用于企业注册申报的地址规范化管理方法及平台

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463668A (zh) * 2014-10-24 2015-03-25 南京邦科威信息科技有限公司 一种在线信用审核方法及装置
CN105447129A (zh) * 2015-11-18 2016-03-30 腾讯科技(深圳)有限公司 个性化内容获取方法、用户属性挖掘方法、系统和装置
US20160132930A1 (en) * 2014-11-10 2016-05-12 Brian Handly Mobile Device Proximity Determination
CN105787104A (zh) * 2016-03-21 2016-07-20 百度在线网络技术(北京)有限公司 用户属性信息的获取方法和装置
CN107066478A (zh) * 2016-12-14 2017-08-18 阿里巴巴集团控股有限公司 一种虚假地址信息识别的方法及装置

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040139049A1 (en) 1996-08-22 2004-07-15 Wgrs Licensing Company, Llc Unified geographic database and method of creating, maintaining and using the same
US6122624A (en) * 1998-05-28 2000-09-19 Automated Transaction Corp. System and method for enhanced fraud detection in automated electronic purchases
US6728767B1 (en) * 2000-08-18 2004-04-27 Cisco Technology, Inc. Remote identification of client and DNS proxy IP addresses
GB2402841B (en) * 2003-06-10 2005-05-11 Whereonearth Ltd A method of providing location based information to a mobile terminal within a communications network
US7454192B1 (en) * 2005-02-04 2008-11-18 Sprint Communications Company, L.P. Postal address validation using mobile telephone location information
US20080102819A1 (en) * 2006-10-30 2008-05-01 Henrik Bengtsson System and method for verifying contact data
US8220034B2 (en) 2007-12-17 2012-07-10 International Business Machines Corporation User authentication based on authentication credentials and location information
US8863258B2 (en) 2011-08-24 2014-10-14 International Business Machines Corporation Security for future log-on location
US9465800B2 (en) * 2013-10-01 2016-10-11 Trunomi Ltd. Systems and methods for sharing verified identity documents
CN103825942B (zh) * 2014-02-24 2018-07-10 可牛网络技术(北京)有限公司 自动查询应用程序app行为报告的方法、装置及服务器
US20150310434A1 (en) 2014-04-29 2015-10-29 Dennis Takchi Cheung Systems and methods for implementing authentication based on location history
KR101667644B1 (ko) * 2014-10-10 2016-10-19 나이스평가정보 주식회사 고객정보 진위여부검증 지원시스템
CN104361023B (zh) * 2014-10-22 2018-01-30 浙江中烟工业有限责任公司 一种情境感知的移动终端烟草信息推送方法
CN104598573B (zh) * 2015-01-13 2017-06-16 北京京东尚科信息技术有限公司 一种用户的生活圈提取方法及系统
SG11201706149XA (en) * 2015-01-27 2017-08-30 Beijing Didi Infinity Tech And Dev Co Ltd Methods And Systems For Providing Information For An On-Demand Service
CN104765873B (zh) * 2015-04-24 2019-03-26 百度在线网络技术(北京)有限公司 用户相似度确定方法和装置
US20170017921A1 (en) * 2015-07-16 2017-01-19 Bandwidth.Com, Inc. Location information validation techniques
CN105260795B (zh) * 2015-10-13 2019-05-03 广西师范学院 一种基于条件随机场的重点人员位置时空预测方法
CN106027544B (zh) * 2016-06-24 2019-12-06 深圳壹账通智能科技有限公司 地址信息的校验方法、云服务器及手持终端

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3557447A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034157A (zh) * 2019-12-24 2021-06-25 中国移动通信集团浙江有限公司 集团成员识别方法、装置及计算设备
CN113034157B (zh) * 2019-12-24 2023-12-26 中国移动通信集团浙江有限公司 集团成员识别方法、装置及计算设备

Also Published As

Publication number Publication date
JP2020502673A (ja) 2020-01-23
TWI699652B (zh) 2020-07-21
KR20190094230A (ko) 2019-08-12
KR102208892B1 (ko) 2021-01-29
CN111858937A (zh) 2020-10-30
EP3557447A4 (en) 2019-11-20
EP3557447A1 (en) 2019-10-23
TW201822032A (zh) 2018-06-16
JP6756921B2 (ja) 2020-09-16
CN107066478A (zh) 2017-08-18
CN107066478B (zh) 2020-06-09
US20190294620A1 (en) 2019-09-26
CN111858937B (zh) 2024-04-30
US10733217B2 (en) 2020-08-04

Similar Documents

Publication Publication Date Title
WO2018107993A1 (zh) 一种虚假地址信息识别的方法及装置
TWI698770B (zh) 資源轉移監測方法、裝置、監測設備及儲存媒體
US10991248B2 (en) Parking identification and availability prediction
CN108446281B (zh) 确定用户亲密度的方法、装置及存储介质
US10715949B2 (en) Determining timing for determination of applicable geo-fences
CA3090497C (en) Transaction classification based on transaction time predictions
US20220038465A1 (en) Methods and Systems for Authenticating a Reported Geolocation of a Mobile Device
CN110706376B (zh) 一种人流量统计方法和装置
CN113449986B (zh) 一种业务分配方法、装置、服务器及存储介质
CN108416616A (zh) 投诉举报类别的排序方法和装置
US11954190B2 (en) Method and apparatus for security verification based on biometric feature
US8423525B2 (en) Life arcs as an entity resolution feature
RU2641246C2 (ru) Способ и устройство оценки безопасности
CN107389078A (zh) 一种路线推荐方法、装置及计算机可读存储介质
CN110458394B (zh) 一种基于对象关联度的指标测算方法及装置
US20210398144A1 (en) Impact Based Fraud Detection
CN117037316A (zh) 移动打卡方法、装置、计算机设备和存储介质
CN110175738A (zh) 医疗资源丰富度评价方法及系统
Rahmatulloh et al. Point Clipping Algorithm on Employee Presence Application for Geolocation of Employee Position
CN106708872B (zh) 一种关联对象的识别方法及装置
CN117272257A (zh) 一种职业身份认证方法、装置及设备
CN114638634A (zh) 数据处理方法及装置
Yi et al. Why You Go Reveals Who You Know: Disclosing Social Relationship by Cooccurrence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17880372

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019531993

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20197020451

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2017880372

Country of ref document: EP

Effective date: 20190715
