API Technical Specs

Learn how to configure and leverage our services to achieve your toughest recruiting needs.

This documentation is for Version 10 of the Sovren REST API, released on December 15, 2020. Both V9 and V10 use the same parsing and matching engines under the hood, but V10 is more streamlined and has a vastly simpler output. Please visit this link for an in-depth comparison.

Resume Parser Overview

The Sovren Resume/CV Parser takes in documents and returns structured JSON responses representing a human understanding of the data. Your integration task isn't just to get the API call to succeed, but rather to understand the bigger picture and how to properly configure each transaction. For example, different configurations are needed for processing batches of resumes, resumes from college students, or even resumes coming from Australia or New Zealand. Below we will discuss the most important points to understand for your integration.

To parse a resume accurately, you must tell us when that resume was written or submitted to your system. This is not obvious, but it is 100x more important than any other setting when parsing resumes. We cannot determine that date from the resume. You must specify it explicitly. We refer to this date as the Document Last Modified Date.

Revision Date

Revision Dates are required for every transaction because they make or break the accuracy of parsing. A Revision Date tells the parser when the document was last revised, which affects the interpretation of terms such as 'current'. In a candidate upload scenario, it's safe to assume the candidate is uploading a reasonably current version of their resume, so you can use today's date as the Document Last Modified Date. In any other situation, the resume was received sometime before today, and you must supply the date it was actually written or received.

If you leave this date off and parse a batch of 1 million resumes, your oldest and least employable candidates will be misrepresented as the most experienced, most employable, ready-to-go-to-work candidates, even though you received their resumes years ago. Let's look at an example below.


Molly Adams

(678) 555-1212
missadams@yahoo.com

930 Via Mil Cumbres Unit 119
Solana Beach, California 92075

Work History
Technical Difference    2014 - current
Senior Engineer
  • built company website in .NET
...

Sample Parsed Data Points
Field Name                                      | Correct Revision Date (2015-01-01) | No Revision Date (parsed as today)
Months Experience for job title Senior Engineer | 12                                 | 103
Months Experience for skill .NET                | 12                                 | 103
Last Used Date for skill .NET                   | 2014-01-01                         | 2022-08-02
Total Months Experience                         | 12                                 | 103
Average Months Per Employer                     | 12                                 | 103

As the sample parsed data points show, without a Revision Date the parser can't properly calculate the metadata: it reports that the candidate has more than 8 times the experience they actually have (103 months instead of 12) and that this experience is current. This type of mistake will pollute any searching or matching software and push these false positives high into the result set.
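The arithmetic behind the table can be reproduced with a few lines of Python. This is only an illustrative sketch, not the parser's actual algorithm: the helper `months_between` is a hypothetical name, and it assumes "2014" is read as January 2014 and that experience is counted in whole calendar months.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Assumption: the job entry "2014 - current" starts in January 2014.
job_start = date(2014, 1, 1)

# "current" resolves to the Revision Date when one is supplied...
correct_revision_date = date(2015, 1, 1)
# ...and to the parse date (2022-08-02 in the table) when it is not.
parsed_as_today = date(2022, 8, 2)

print(months_between(job_start, correct_revision_date))  # 12
print(months_between(job_start, parsed_as_today))        # 103
```

The same resume yields 12 or 103 months depending only on which date stands in for "current".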

How do I determine the correct Revision Date?

The most correct Revision Date is the last time the file was authored. Since resumes can come from many different places, there are a few things to look for when determining the most correct Revision Date. Here are a few use cases and our recommended approach for determining the correct date.

File upload control

When a user uploads a file directly from their file system, we would recommend using the last modified date of the file. More documentation of File.lastModified can be found at https://developer.mozilla.org/en-US/docs/Web/API/File/lastModified.

Batch of resumes on disk

When you have a batch of resumes on disk that you are processing you need to look at the last modified date of the files and make sure that they all aren't the same, or within seconds of each other. If those dates are the same, then the metadata of the file was overwritten at some point during file transfer and isn't valid. You need to go back to the source and move those files over using a different approach.
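A quick sanity check for this situation might look like the sketch below. The function name `mtimes_look_overwritten` and the 5-second tolerance are assumptions chosen for illustration; pick a tolerance that matches how your files were transferred.

```python
from pathlib import Path


def mtimes_look_overwritten(folder: str, tolerance_seconds: float = 5.0) -> bool:
    """Return True when every file in the folder was modified within
    tolerance_seconds of every other file, which suggests the original
    timestamps were lost during a copy or transfer."""
    mtimes = [p.stat().st_mtime for p in Path(folder).iterdir() if p.is_file()]
    if len(mtimes) < 2:
        # A single file (or an empty folder) cannot be judged this way.
        return False
    return max(mtimes) - min(mtimes) <= tolerance_seconds
```

If this returns True for a batch folder, treat the dates as invalid and re-transfer the files from the source in a way that preserves their metadata.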

Batch of resumes from a database

When you have a batch of resumes from a database those are usually stored with a profile. If the date modified for the file was stored in the database you should use that, but if not you should look for a last modified date on the profile and use that.
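The fallback order described above can be written as a small helper. This is a minimal sketch; `pick_revision_date` and its parameter names are hypothetical and stand in for whatever columns your own database uses.

```python
from datetime import date
from typing import Optional


def pick_revision_date(
    file_modified: Optional[date],
    profile_modified: Optional[date],
) -> Optional[date]:
    """Prefer the stored file modified date; otherwise fall back to the
    profile's last modified date. Returns None when neither is known."""
    if file_modified is not None:
        return file_modified
    return profile_modified


# The file date wins when both are present.
print(pick_revision_date(date(2019, 3, 4), date(2021, 6, 1)))  # 2019-03-04
# The profile date is used only when no file date was stored.
print(pick_revision_date(None, date(2021, 6, 1)))              # 2021-06-01
```

A None result means no trustworthy date is available for that record, and it should be investigated before parsing rather than silently defaulting to today.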

Sourcing resumes from a third-party such as a job board

When receiving resumes from a third-party API they should provide this date in the API response. If you don't see a date, reach out to the third-party to clarify.

How Does Resume Parsing Work?

What we call parsing is actually a multi-step process. First, we convert the source document to plain text, analyze it, and decide if the text is usable for parsing. If the plain text is not usable, we immediately return a response indicating the issue. If the plain text is usable it continues on to the parser and then returns a parsed document in the response. The graphic below illustrates this workflow.

Resume Parser Workflow

The vast majority of problems in parsing are not from processing the plain text, but from conversion to plain text. For example, documents can be corrupted in many ways, or the visual layout of a document may not match the order in which the text is actually stored. The point of explaining this is that when you find a mistake in the output, don't assume it's a parsing mistake. Look at the converted text and see if it is as expected (reads logically). If the converted text is malformed, we cannot fix it.

Documents That Can Cause Problems

If you want to minimize conversion problems, don't use PDF documents. Many PDFs convert/parse fine; however, the reason for most of our "this document did not parse correctly" bug reports is that the document is a corrupt PDF file. PDF is a broken standard that often hides issues with the underlying text. If a PDF is corrupt, there is nothing Sovren can do to make that document convert to text "as a human would read it". More information regarding problems with the PDF format and how to check if a PDF is corrupt can be found here. Additionally, here are some tips for constructing an electronic resume.

Besides corrupt PDFs, we can predict - with very high accuracy - certain types of resumes that will not give satisfactory results.

LinkedIn Profile PDFs

While Sovren currently parses most LinkedIn profiles accurately, we cannot guarantee that we will always be able to do so. LinkedIn is determined to keep their data private by making their PDF profiles not compatible with any parsing software (Sovren and our competitors included). We are constantly working to adapt our parsing algorithms to the various changes LinkedIn makes regularly. It is our prediction that, at some point in the future, LinkedIn will make it impossible to extract any useful information from their profiles. We strongly advise our customers to avoid relying on LinkedIn profiles whenever possible.

Artists & Graphic Designers

The goal of these resumes is to create the most visually unique document representing their skills as a designer. This prevents accurate text extraction because candidates will use images instead of text, have text run diagonally across the resume, use vertical text, etc. Parsing can only be as accurate as the text extracted from the source document.

Extremely Long CVs Typical in Academia & Medicine

These documents are usually tens of pages long and are flooded with patents, publications, and speaking events. They describe work experience in very uncommon ways, and because that experience is often at a school or university, it is easily confused with education.

Images & OCR

We don't provide Optical Character Recognition (OCR) because it introduces too many errors to allow accurate parsing.

Since Sovren supports text-based formats, you can use an OCR provider and send the plain text output to Sovren to be parsed.

Entry Level

The Parser assumes that all resumes contain Employment History and Education. When confronted with a resume that seems to be missing Employment History or Education, it will assume that it has made a mistake, missed that data, and will try to treat other data as Employment History or Education.

Although that's a good strategy, it fails for student/new graduate/under-educated worker resumes where it is probable that their resume really does not contain any Employment History (and perhaps no Education). Therefore, when parsing a resume from a student or recent graduate or a worker with no advanced education (i.e., not even high school), set Coverage.EntryLevel = true in the config string (the default is false). This will tell the Parser that it's acceptable to not find Employment History and will result in more accurate parsing for student/recent graduate resumes only.

Australia / New Zealand / South Africa

In particular, Australia, New Zealand, and South Africa can present challenges in special cases where resumes are written in English and contain contact information with addresses in a 4-digit postal code country. More information on this topic can be found in the Languages section.

System Requirements

We recommend using our Sovren REST API to integrate the Sovren components. The API is deployed as an ASP.NET Web API and runs within IIS on Microsoft Windows operating systems. This section will discuss the system requirements for this API, and the requirements to run the DLLs directly for legacy customers.

Software Requirements

The following operating systems are supported:

  • Windows Server 2012, 2012 R2, 2016, or 2019
  • Windows 8, 8 Pro, or 10

32-bit (x86) and 64-bit (x64) versions of each of these systems are supported, but 64-bit is strongly recommended. Linux, Mono and other non-Microsoft platforms are not supported.

Prerequisites (the software will not run without these components):

  • REST & SOAP APIs (Recommended)
    • IIS 8.0 or later (REST/SOAP APIs only) Note: By default, IIS is not enabled on Windows Server. Please follow the installation instructions below to enable IIS.
    • .NET Framework 4.8 (you can download it here)

Hardware Requirements

The following hardware is recommended, above and beyond the requirements of the operating system:

  • Memory: At least 1 GB free
    • Baseline of 400 MB plus 50 MB per language plus 10 MB per concurrent parse.
  • CPU: Modern fast multi-core CPU is recommended, the faster the better.
    • Parsing speeds are directly related to processor speeds. Parsing speed improves with faster clock rates and larger CPU caches like the ones found in Intel Xeon processors.
  • Hard Disk: 40 MB for installation of runtime files. No temp space or additional storage required.
  • If running in a VM, do NOT starve the VM. Give it multiple processors, plenty of memory, and processing priority.
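The memory guideline above is a simple linear formula, sketched below for capacity planning. The function name `estimated_memory_mb` is hypothetical, and the result is only the baseline estimate stated in the list, not a measured figure.

```python
def estimated_memory_mb(languages: int, concurrent_parses: int) -> int:
    """Baseline of 400 MB plus 50 MB per language plus 10 MB per concurrent parse."""
    return 400 + 50 * languages + 10 * concurrent_parses


# Example: 2 languages and 8 concurrent parses.
print(estimated_memory_mb(languages=2, concurrent_parses=8))  # 580
```

Any such estimate sits on top of what the operating system itself requires, and the "at least 1 GB free" recommendation still applies.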

Installing IIS

The following steps describe how to install IIS on a Windows Server 2016 instance. For information on how to install IIS in other Windows platforms, please visit https://www.iis.net/.

  1. Log into your Windows Server instance and go to Server Manager
  2. Under Manage, select Add Roles and Features
  3. Select Role-based or feature-based installation and click Next
  4. Select your server from the available server pool and click Next
  5. Select Web Server (IIS) – choose Include Management Tools as well and click Next
  6. Select ASP.NET 4.6 and click Next
  7. Under Application Development, select ASP.NET 4.6 and Add Features
  8. Click Next and Install

Web Service Installation Guide

You can download the latest version of the web service APIs from My Sovren Portal. Contact support@sovren.com if you need login information for the download site.

Once all the System Requirements have been met, you can begin the installation. The instructions listed below are for Windows 8/10 or Windows Server 2012/2012 R2/2016 and IIS 8/10. You must install both web services listed below to get successful API responses.

Installing the Sovren.SaaS.Service Web Service

  1. Download and unzip the file
    • For the REST service: download and unzip the Sovren.Saas.Service.Rest.Installed.zip file
    • For the SOAP service: download and unzip the Sovren.Saas.Service.Soap.Installed.zip file
  2. Copy the Sovren.Saas.Service.* folder into a pre-defined folder on your server so that you have a structure like this: C:\inetpub\wwwroot\Sovren.Saas.Service.*
    • Make sure the folder has the necessary read / write permissions for the IIS user (default is IIS_IUSRS) on your server
    • Note: the * in the path above denotes either Rest or Soap depending on your download
  3. Install your license. For more information regarding licensing refer to the full documentation here.
  4. Open IIS Manager, go to Application Pools and create a new application pool called SovrenSaas with .NET CLR Version 4.0.30319 and Managed pipeline mode set to Integrated. (It's highly recommended that this application not share an app pool with any other website.)
  5. Disable automatic recycling of the Application Pool to avoid the 6-10 second delay that occurs during the first parse after the Application Pool is recycled.
    • In the Application Pools, open the Advanced Settings for the web service's Application Pool (i.e. SovrenSaas).
    • Set the Idle Time-out (minutes) to 0. This setting is under the Process Model section.
    • Set the Regular Time Interval (minutes) to 0. This setting is under the Generate Recycle Event Log Entry section. It defaults to 1740 minutes, which after a period of 29 hours causes a long delay when parsing the next resume/CV.
  6. Create the web site in IIS
    • If you installed the service in the folder mentioned above then expand Default Web Site, find your folder, right-click, and select Convert to Application. Select the SovrenSaas application pool you created earlier and select OK.
    • If you chose a different folder then right-click on Default Web Site, select Add Application, set alias to Sovren Saas, select the SovrenSaas application pool and set the physical path to the root of the folder you created.
  7. Verify the service is up
  8. The Web Service is now installed. Proceed with configuring and testing the web service.

Installing the Sovren.SaaS.ConversionService

We split the document conversion piece of our web service into its own project. In our document converter we leverage some third-party components that are .NET wrappers on C++ modules. These components are highly accurate, but one of them in particular will create a stack overflow exception on certain documents. We moved conversion into a separate service so we could implement retry logic and better handle those exceptions. Rather than having that request return a 500 error and forcing IIS to reset the entire parser, we just fail over the conversion service and parsing can continue uninhibited. For this process we will create 2 instances of the Conversion Service for increased insulation from the aforementioned stack overflow errors.

Installation steps:

  1. From the download in the previous section, copy the contents of the Sovren.Saas.ConversionService folder into a sibling folder of your Sovren.SaaS.Service.* web service installation. (For example C:\inetpub\wwwroot\Sovren.Saas.ConversionService)
  2. Install your license. For more information regarding licensing refer to the full documentation here. If you already installed your license globally on the server you can skip this step.
  3. Open IIS Manager, go to Application Pools and create 2 new application pools called SovrenConversionService1 and SovrenConversionService2 with .NET CLR Version 4.0.30319 and Managed pipeline mode set to Integrated. These new application pools are critical for the failover of the conversion service to work effectively.
  4. Disable automatic recycling of the Application Pool to avoid the delay that occurs during the first parse after the Application Pool is recycled.
    • In the Application Pools, open the Advanced Settings for the web service's Application Pool (i.e. SovrenConversionService1 and SovrenConversionService2).
    • For each of the application pools do the following
      • Disable Rapid Fail Protection. This setting is under Rapid Fail Protection and is called Enabled. When enabled, it keeps the application pool from automatically restarting after a specified number of crashes in a certain timeframe. We moved the document conversion step of parsing to its own service so that it could fail over multiple times quickly without failing entire transactions. It's critical that this service is able to restart as many times as required.
      • Set the Idle Time-out (minutes) to 0. This setting is under the Process Model section.
      • Set the Regular Time Interval (minutes) to 0. This setting is under the Generate Recycle Event Log Entry section. It defaults to 1740 minutes, which after a period of 29 hours causes a long delay when parsing the next resume/CV.
  5. Create the web site in IIS
    • Right-click on Default Web Site, select Add Application, set alias to SovrenConversionService1 (so that it matches the URL used in the verification step), select the SovrenConversionService1 application pool and set the physical path to the root of the folder you created.
    • Repeat the prior step for the 2nd Conversion Service. Note: both conversion services can point to the same physical path, but they must use separate application pools.
  6. Verify the service is up
    • Navigate to http://localhost/SovrenConversionService1/ConversionService.asmx and verify that the standard Microsoft SOAP test page is returned.
  7. Lastly we need to make the Sovren.SaaS.Service aware of our ConversionServices
    • Navigate to the folder where you installed the Sovren.SaaS.Service
    • Open up the web.config
    • Modify the ConverterUrls setting with a semicolon-delimited list of the ConversionService URLs. For example: http://localhost/SovrenConversionService1/ConversionService.asmx;http://localhost/SovrenConversionService2/ConversionService.asmx
  8. Finally, test a parsing transaction and verify that you receive a successful result. A 500 error that returns Conversion Failed means there is an issue making the call to the ConversionService.
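
As a convenience, the ConverterUrls value and the "verify the service is up" check can be scripted. The sketch below is an assumption-laden illustration: `build_converter_urls` and `is_reachable` are hypothetical helper names, the URL pattern is copied from the example above, and the reachability check only confirms that the endpoint answers an HTTP GET with status 200, not that conversion itself works.

```python
import urllib.error
import urllib.request


def build_converter_urls(host: str = "localhost", instance_count: int = 2) -> str:
    """Build the semicolon-delimited ConverterUrls value for N conversion services."""
    urls = [
        f"http://{host}/SovrenConversionService{i}/ConversionService.asmx"
        for i in range(1, instance_count + 1)
    ]
    return ";".join(urls)


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers an HTTP GET with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


converter_urls = build_converter_urls()
print(converter_urls)
```

To check both instances on the server itself, call `is_reachable(url)` for each entry in `converter_urls.split(";")`. A False result points to an IIS or application pool problem rather than a parsing problem.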