By default, our resume parser ensures an optimal balance between parsing accuracy and speed. If you have specific preferences, please contact sales@sovren.com.
This service is designed to parse resumes/CVs. It assumes that all files passed to it are resumes/CVs. It does not attempt to detect whether a document is a resume/CV or not. It should not be used to try to extract information from other types of documents.
This service supports all commercially viable document formats used for text documents. It does not support parsing of image files (such as TIFF or JPG) or of scanned images embedded within supported document formats. Always send the original file, not the result of copy/paste, not a conversion by some other software, not a scanned image, and not a version marked up with recruiter notes or other non-resume information. If you pass garbage into the service, you are likely to get garbage out. The best results are always obtained by parsing the original resume/CV file.
In order to provide parsing for a wide range of languages, the parser does not provide low-usage fields for some languages.
If you are running batch transactions (e.g. iterating through files in a folder), do not try to reparse a file after the service returns an exception for it: you will get the same result each time, and credits will be deducted from your account.
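A minimal sketch of a batch loop that attempts each file exactly once. The `parse_one` callable is a hypothetical stand-in for whatever client code sends a file to the service; it is not part of the API.

```python
from typing import Callable, Dict, Iterable, Tuple


def parse_batch(
    paths: Iterable[str],
    parse_one: Callable[[str], dict],
) -> Dict[str, Tuple[str, object]]:
    """Attempt each path once; record failures instead of retrying them."""
    results: Dict[str, Tuple[str, object]] = {}
    for path in paths:
        if path in results:
            continue  # already attempted (success or failure): never resend
        try:
            results[path] = ("parsed", parse_one(path))
        except Exception as exc:
            # Resending the same file would give the same result, so record it and move on.
            results[path] = ("failed", str(exc))
    return results


# Demo with a fake parser (hypothetical) that rejects one file.
calls = []


def fake_parse(path: str) -> dict:
    calls.append(path)
    if path.endswith(".bad"):
        raise ValueError("conversion failed")
    return {"file": path}


outcome = parse_batch(["a.docx", "b.bad", "b.bad", "c.pdf"], fake_parse)
```

Persisting the `results` mapping between runs would extend the same guarantee across restarts of the batch job.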
FlexRequests are experimental and may not always provide accurate or reliable results. Sovren gives no guarantees or warranties of any kind with the use of FlexRequests, nor does it accept any liability for their use. It is your sole responsibility to use FlexRequests in a responsible and ethical manner, not to use them for harmful, malicious or unethical purposes, and to ensure that your use of FlexRequests aligns with applicable laws and regulations.
Request Body
DocumentAsBase64Stringoptionalstring
A Base64 encoded string of the resume file bytes. This should use the standard 'base64' encoding as defined in RFC 4648 Section 4 (not the 'base64url' variant). .NET users can use the Convert.ToBase64String(byte[]) method.
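For non-.NET clients, the same encoding can be produced with Python's standard library. This is a sketch: the sample bytes are made up for illustration, and wrapping the value in a dictionary keyed by `DocumentAsBase64String` assumes the request body is sent as JSON.

```python
import base64


def encode_document(file_bytes: bytes) -> str:
    """Return standard Base64 (RFC 4648 Section 4) text for the raw file bytes."""
    # b64encode uses the standard alphabet ('+' and '/'), not the URL-safe one ('-' and '_').
    return base64.b64encode(file_bytes).decode("ascii")


# Illustrative bytes only; in practice read the original resume file in binary mode.
sample = b"\xfb\xff\xfe resume bytes"
encoded = encode_document(sample)
request_body = {"DocumentAsBase64String": encoded}
```

Avoid `base64.urlsafe_b64encode` here, since it produces the 'base64url' variant.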
SkillsSettingsoptionalobject
Enable skills normalization and enhanced candidate summarization, and specify the version of the skills taxonomy for this parsing transaction.
SkillsSettings.Normalizeoptionalbool
When true:
Raw skills will be normalized. These will be output under Value.ResumeData.Skills.Normalized. Read more about the benefits of using a skills taxonomy.
An enhanced candidate summary is generated, leveraging the taxonomy structure to relate skills to profession groups.
This setting has no effect when TaxonomyVersion is set to (or defaults to) V1.
SkillsSettings.TaxonomyVersionoptionalstring
Specifies the version of the skills taxonomy to use. One of:
V1 - Deprecated. This is the default for old accounts. Will be removed in a future release.
V2 - This is the default for new accounts, and must be explicitly set if you have access to V1 and V2.
Benefits of V2 include:
2x larger skills taxonomy, updated frequently based on real-world data.
15-40% higher accuracy of extracted skills.
Better clustering of skill synonyms.
Distinguish skill types (IT / Professional / Soft).
Improved candidate summary.
Compatibility with the taxonomy used in Textkernel's Data Enrichment APIs and Jobfeed, enabling standardization of taxonomies across all of your data and benchmarking against jobs posted online.
ProfessionsSettingsoptionalobject
Enable normalization of job titles using our proprietary taxonomy and international standards.
ProfessionsSettings.Normalizeoptionalbool
When true, the most recent 3 job titles will be normalized. This includes a proprietary value from our profession taxonomy, plus ONET and ISCO mappings. Read more about the benefits of using a professions taxonomy.
When enabling professions normalization, additional charges apply.
The following languages are supported: English, Chinese (Simplified), Dutch, French, German, Italian, Polish, Portuguese, and Spanish. For documents in other languages, no normalized values will be returned.
For Sovren AI Matching, normalized professions are automatically indexed and used when profession normalization is enabled during parsing (through IndexingOptions). To leverage profession normalization for user-created searches, enable profession normalization at query time.
The professions taxonomy and the mappings are compatible with the taxonomies used in Textkernel's Data Enrichment APIs and Jobfeed, enabling standardization of taxonomies across all of your data and benchmarking against jobs posted online.
ProfessionsSettings.Versionoptionalobject
Specifies the versions to use when normalizing professions if more than one is available.
ProfessionsSettings.Version.ONEToptionalstring
The ONET Version to use when normalizing professions. One of:
2010
2019
This parameter defaults to "2010".
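The skills and professions settings above could be assembled as follows. This is a sketch under assumptions: the helper name is hypothetical, and nesting the dotted field names into JSON objects is inferred from the field paths rather than stated in the text.

```python
import json

SKILLS_TAXONOMY_VERSIONS = {"V1", "V2"}
ONET_VERSIONS = {"2010", "2019"}


def build_normalization_settings(
    normalize_skills: bool = True,
    taxonomy_version: str = "V2",
    normalize_professions: bool = True,
    onet_version: str = "2010",
) -> dict:
    """Build the SkillsSettings / ProfessionsSettings fragment of a request body."""
    if taxonomy_version not in SKILLS_TAXONOMY_VERSIONS:
        raise ValueError(f"TaxonomyVersion must be one of {sorted(SKILLS_TAXONOMY_VERSIONS)}")
    if onet_version not in ONET_VERSIONS:
        raise ValueError(f"ONET version must be one of {sorted(ONET_VERSIONS)}")
    return {
        "SkillsSettings": {
            # Normalize has no effect when the taxonomy version is V1.
            "Normalize": normalize_skills,
            "TaxonomyVersion": taxonomy_version,
        },
        "ProfessionsSettings": {
            "Normalize": normalize_professions,
            "Version": {"ONET": onet_version},
        },
    }


fragment = build_normalization_settings(onet_version="2019")
fragment_json = json.dumps(fragment, indent=2)
```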
DocumentLastModifiedoptionalstring
Mandatory date, in YYYY-MM-DD format, so that the Parser knows how to interpret dates in the document that are expressed as "current" or "as of" or similar. To find out why this is so important and how to calculate/find it, read here. Failing to pass a DocumentLastModified, or passing a DocumentLastModified that is clearly improbable, may result in rejection of data and/or additional charges, and will utterly decimate the usefulness of AI Matching. Use of the DocumentLastModified field is subject to the Acceptable Use Policy.
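One possible way to produce the YYYY-MM-DD value is from a file's modification timestamp. This is a sketch under an assumption: the file system timestamp is only a good source if it was preserved when the file was uploaded or copied. Both helper names are hypothetical.

```python
import os
from datetime import datetime, timezone


def format_last_modified(timestamp: float) -> str:
    """Format a POSIX timestamp as a YYYY-MM-DD date (UTC)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")


def document_last_modified(path: str) -> str:
    """Return the file's modification date in the format DocumentLastModified expects."""
    return format_last_modified(os.path.getmtime(path))
```

If the original timestamp has been lost, a date recorded by your own system when the document was received may be a better source than the file system.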
OutputHtmloptionalbool
When true, the original file is converted to HTML and stored in the Html property.
OutputRtfoptionalbool
When true, the original file is converted to RTF and stored in the Rtf property.
OutputPdfoptionalbool
When true, the original file is converted to PDF and stored in the Pdf property as a byte array.
UseLLMParseroptionalbool
When true, and the document is English, the LLM parser will be used. See here for more information.
OutputCandidateImageoptionalbool
When true, if the document contains inline images, the image that is most likely to be a photo of the candidate is returned as a byte array.
Configurationoptionalstring
Optional parser configuration string to be used for parsing. If not specified, the default parser configuration will be used. For more information regarding the parser configuration string and assistance generating one, refer to the Parser Configuration Documentation.
GeocodeOptionsoptionalobject
Get or insert geocode coordinate values (latitude/longitude) during the parse transaction.
GeocodeOptions.IncludeGeocodingoptionalbool
When set to true, we will automatically geocode the parsed address by calling our /geocode endpoint, and you will be charged accordingly. This parameter defaults to false.
GeocodeOptions.Provideroptionalstring
The Provider you wish to use to geocode the postal address (current options are "Google", "Bing", or "None"). If not specified, we will default to Google. If you are just trying to update the postal address in the document, please set this to "None". If passing "Google" or "Bing", ProviderKey is required.
GeocodeOptions.ProviderKeyoptionalstring
The Provider Key for the specified Provider. If using Bing you must specify your own provider key.
GeocodeOptions.PostalAddressoptionalobject
The postal address you wish to geocode. For best results, specify as many of the PostalAddress fields as possible. If provided, this address will be used to get the geocode coordinates instead of the address included in the ParsedDocument (if present); however, the address in the ParsedDocument will not be modified.
The address line (i.e. Street address for U.S. address) for the postal address
GeocodeOptions.GeoCoordinatesoptionalobject
The geographic coordinates (latitude/longitude) for your postal address. Use this if you already have latitude/longitude coordinates and simply wish to add them to your parsed document. If provided, these values will be inserted into your ParsedDocument, and the address included in the ParsedDocument (if present) will not be modified.
When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions.
Skills Normalization must be included to index documents using the V2 Skills Taxonomy. These algorithms ignore raw skills and only consider the normalized skill concepts for skills category scoring. This leads to improved scoring and ranking because normalization produces fewer false negatives than simple exact keyword matching.
IndexingOptions.IndexIdoptionalstring
When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions. This determines what index to place the parsed document in. This is case-insensitive.
IndexingOptions.DocumentIdoptionalstring
When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions. This determines what id to give to the parsed document. This is restricted to alphanumeric with dashes and underscores. All values will be converted to lower-case.
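A client-side check for DocumentId values might look like the sketch below. It assumes "alphanumeric" means ASCII letters and digits, which the text does not state explicitly; the function names are hypothetical.

```python
import re

# Assumption: ASCII letters, digits, dashes and underscores only.
_DOCUMENT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_document_id(raw: str) -> bool:
    """Return True when the id contains only alphanumerics, dashes and underscores."""
    return _DOCUMENT_ID_PATTERN.fullmatch(raw) is not None


def normalize_document_id(raw: str) -> str:
    """Validate the id and lower-case it, mirroring the conversion described above."""
    if not is_valid_document_id(raw):
        raise ValueError(f"invalid DocumentId: {raw!r}")
    return raw.lower()
```

Lower-casing on the client makes collisions visible early: `Resume-1` and `resume-1` map to the same id.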
IndexingOptions.UserDefinedTagsoptionalstring[]
The user-defined tags you want the document to have.
Unique field name to be returned alongside the reply in the response
FlexRequests[i].DataTypeoptionalstring
The data type for the reply. One of: text, numeric, bool, list, enumeration
FlexRequests[i].EnumerationValuesoptionalstring[]
If DataType is enumeration, this is the list of possible replies. This is limited to a maximum of 50 values.
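The DataType and EnumerationValues constraints can be checked before sending a request. This is a sketch: the helper is hypothetical, and treating an empty EnumerationValues list as a problem is an assumption rather than something stated above.

```python
from typing import List, Optional

ALLOWED_DATA_TYPES = {"text", "numeric", "bool", "list", "enumeration"}
MAX_ENUMERATION_VALUES = 50


def flex_request_problems(
    data_type: str,
    enumeration_values: Optional[List[str]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems: List[str] = []
    if data_type not in ALLOWED_DATA_TYPES:
        problems.append(f"unknown DataType: {data_type!r}")
    if data_type == "enumeration":
        values = enumeration_values or []
        if not values:
            # Assumption: an enumeration without possible replies is not useful.
            problems.append("EnumerationValues is empty")
        elif len(values) > MAX_ENUMERATION_VALUES:
            problems.append(
                f"EnumerationValues has {len(values)} entries; the maximum is {MAX_ENUMERATION_VALUES}"
            )
    return problems
```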
SkillsDataoptionalstring[]
This feature is not recommended and only available as an add-on. Please reach out to sales@sovren.com. String[] of your custom skills list names and the Sovren "builtin" skills list. If no list is provided, the Sovren builtin skills list will be used. The parser automatically detects the language and looks for a corresponding skills list in that language; if no match is found, this list is ignored.
NormalizerDataoptionalstring
This feature is not recommended and only available as an add-on. Please reach out to sales@sovren.com. Name of your custom normalization data file. If no list is provided, the Sovren builtin skills list will be used (English only). When using custom normalization files, the language to be used is determined by the Parser (the default fallback language is English if the Parser cannot find a match).
A response code elaborating on the HTTP status code. The following is a list of codes that can be returned by the service:
Success - Successful transaction
ConversionException - There was an issue converting the document
MissingParameter - A required parameter wasn't provided
InvalidParameter - A parameter was incorrectly specified
Timeout - The transaction reached its timeout limit
AuthenticationError - An error occurred with the credentials provided
Info.Messagestring
This message further describes the code providing additional detail.
Info.TransactionIdstring
The (GUID) id for a specific API transaction. Use this when contacting support@sovren.com about issues.
Info.EngineVersionstring
The version of the parsing/matching engine running under-the-hood.
Info.ApiVersionstring
The version of the API.
Info.TotalElapsedMillisecondsinteger
How long the transaction took on Sovren's server, in milliseconds. If the transaction takes longer to complete on the client side, that extra duration is solely network latency.
Info.TransactionCostdecimal
How many credits the transaction costs.
Info.CustomerDetailsobject
Information about the customer who made the API call.
Value.CustomerDetails.AccountIdstring
The AccountId for the account.
Value.CustomerDetails.Namestring
The customer name on the account.
Value.CustomerDetails.IPAddressstring
The client IP Address where the API call originated.
Value.CustomerDetails.Regionstring
The region for the account, also known as the 'Data Center'.
Value.CustomerDetails.CreditsRemainingdecimal
The number of credits remaining to be used by the account.
The number of requests that can be made at one time. If using sub-accounts, this is the maximum number of concurrent requests across all accounts, not just this single sub-account.
Value.CustomerDetails.ExpirationDatedate
The date that the current credits expire.
Valueobject
Contains response data for the transaction.
Value.ParsingResponseobject
The status of the parse transaction.
Value.ParsingResponse.Codestring
The following is a list of codes that can be returned by the service:
Success - Successful transaction
WarningsFoundDuringParsing - Parsing was successful. This is not an error code. This is an advanced level message about the document, not about the parsing. For more information, refer to the ResumeQuality section in the parsed document output and to the documentation here.
PossibleTruncationFromTimeout - The timeout occurred before the document was finished parsing, which can result in truncation
Timeout - The transaction reached its timeout limit
ConversionException - There was an issue converting the document
Value.ParsingResponse.Messagestring
A short human-readable description explaining the Code value.
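Client code typically needs to group these codes into outcomes. The sketch below is one interpretation: treating PossibleTruncationFromTimeout as a usable but partial result is an assumption, and the function name is hypothetical.

```python
# WarningsFoundDuringParsing is documented as a successful parse, not an error.
SUCCESS_CODES = {"Success", "WarningsFoundDuringParsing"}
# Assumption: a truncated parse still carries data, so flag it rather than discard it.
PARTIAL_CODES = {"PossibleTruncationFromTimeout"}


def classify_parsing_code(code: str) -> str:
    """Map a Value.ParsingResponse.Code to 'ok', 'partial' or 'failed'."""
    if code in SUCCESS_CODES:
        return "ok"
    if code in PARTIAL_CODES:
        return "partial"
    # Timeout, ConversionException and any unrecognized code are treated as failures.
    return "failed"
```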
Value.GeocodeResponseobject
If geocoding was requested in the ParseOptions.GeocodeOptions, the status of the geocode transaction will be output here.
Value.GeocodeResponse.Codestring
The following is a list of codes that can be returned by the service:
Success - Successful transaction
MissingParameter - A required parameter wasn't provided
InvalidParameter - A parameter was incorrectly specified
InsufficientData - The address provided doesn't have enough granularity to be geocoded effectively
CoordinatesNotFound - The geocoding provider couldn't find any coordinates matching the provided address
Value.GeocodeResponse.Messagestring
A short human-readable description explaining the Code value.
Value.IndexingResponseobject
If indexing was requested in the ParseOptions.IndexingOptions, the status of the index transaction will be output here.
Value.IndexingResponse.Codestring
The following is a list of codes that can be returned by the service:
Success - Successful transaction
MissingParameter - A required parameter wasn't provided
InvalidParameter - A parameter was incorrectly specified
AuthenticationError - An error occurred with the credentials provided
DataNotFound - Data with the specified name wasn't found
ConstraintError - Data in the request is not allowed with the specific action being requested
Value.IndexingResponse.Messagestring
A short human-readable description explaining the Code value.
Value.ProfessionNormalizationResponseobject
If profession normalization was requested via ProfessionsSettings.Normalize, the status of the profession normalization transaction will be output here.
Value.ProfessionNormalizationResponse.Codestring
The following is a list of codes that can be returned by the service:
This field is not available for all languages. See low-usage fields for more information. The preferred given (first) name or nickname. This is rarely populated.
The candidate's location/address. The Parser does not standardize addresses. Address standardization services are available, including for example the Google Maps API, that can take the Parser's contact info fields and standardize/geocode the data.
These values are not very global-friendly, but the Parser does normalize all degrees to one of these pre-defined types. This list is sorted, as well as possible, by increasing level of education, although there are certainly ambiguities from one discipline to another, such as whether professional is above or below bachelors. Here are the possible values:
Normalized GPA is a decimal value that is output only when a GPA has been provided. This value is normalized from 0.0 to 1.0, with 1.0 being the top mark, so that all GPAs across all scales can be compared, taking into account different min/max values and whether high or low numbers are ranked higher. For example:
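One way to picture this normalization is a linear rescale between the scale's minimum and maximum, inverted when lower marks are better. The formula below is an assumption for illustration, not the Parser's documented algorithm, and the function name is hypothetical.

```python
def normalize_gpa(
    score: float,
    min_score: float,
    max_score: float,
    high_is_best: bool = True,
) -> float:
    """Rescale a GPA to the 0.0-1.0 range, where 1.0 is always the top mark."""
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    fraction = (score - min_score) / (max_score - min_score)
    # On scales where the lowest number is the best mark, flip the result.
    return fraction if high_is_best else 1.0 - fraction


# A 3.0 on a 0-4 scale where 4 is best.
four_point = normalize_gpa(3.0, 0.0, 4.0)
# A 2.0 on a 1-5 scale where 1 is best.
inverted = normalize_gpa(2.0, 1.0, 5.0, high_is_best=False)
```

Under this sketch both sample marks land at the same normalized value, which is what makes marks from different scales comparable.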
A paragraph of text that summarizes the candidate's experience. This paragraph is generated based on other data points in the ExperienceSummary. It will be the same language as the resume for Czech, Dutch, English, French, German, Greek, Hungarian, Italian, Norwegian, Russian, Spanish, and Swedish. To always generate the summary in English, set "OutputFormat.AllSummariesInEnglish = True;" in the configuration string when parsing.
In order for this value to be accurate, you must have provided an accurate DocumentLastModified when you parsed this resume.
This summary can be further enhanced by enabling SkillsSettings.Normalize on the request.
The number of months of work experience as indicated by the range of start and end date values in the various jobs/positions in the resume. Overlapping date ranges are not double-counted. This value is NOT derived from text like "I have 15 years of experience".
In order for this value to be accurate, you must have provided an accurate DocumentLastModified when you parsed this resume.
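Counting months without double-counting overlaps amounts to merging date ranges before summing them. The sketch below assumes month granularity and half-open ranges (start month included, end month excluded); the Parser's exact rounding rules are not described here, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

YearMonth = Tuple[int, int]  # (year, month), month in 1..12


def months_of_experience(ranges: Iterable[Tuple[YearMonth, YearMonth]]) -> int:
    """Sum the months covered by the ranges, counting overlapping months once."""
    # Convert each (year, month) to a running month index so ranges are plain integers.
    spans = sorted((sy * 12 + sm, ey * 12 + em) for (sy, sm), (ey, em) in ranges)
    total = 0
    current_start = current_end = None
    for start, end in spans:
        if current_end is None or start > current_end:
            # Gap before this range: close the previous merged span.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent range: extend the merged span.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Two jobs that overlap by six months: Jan 2015-Jan 2017 and Jul 2016-Jul 2018.
overlapping = months_of_experience([((2015, 1), (2017, 1)), ((2016, 7), (2018, 7))])
```

For a position still held by the candidate, the end of the range would come from DocumentLastModified, which is why that value affects this calculation.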
The number of months of management experience as indicated by the range of start and end date values in the various jobs/positions in the resume that have been determined to be management-level positions. Overlapping date ranges are not double-counted. This value is NOT derived from text like "I have 10 years of management experience".
In order for this value to be accurate, you must have provided an accurate DocumentLastModified when you parsed this resume.
Job titles are examined to determine the best category for any executive experience. If the candidate possesses no executive experience, "NONE" will be output. One of:
This field is not available for all languages. See low-usage fields for more information. A score (0-100), where 0 means a candidate is more likely to have had (and want/pursue) short-term/part-time/temp/contracting jobs and 100 means a candidate is more likely to have had (and want/pursue) traditional full-time, direct-hire jobs.
In order for this value to be accurate, you must have provided an accurate DocumentLastModified when you parsed this resume.
The highest score calculated from any of the position titles. The score is based on the wording of the title, not on the experience described within the position description.
This field is not available for all languages. See low-usage fields for more information. Any abnormal findings about the candidate's career will be reported here. For example, if the candidate held a management-level position in a previous job, but not their current job.
This field is not available for all languages. See low-usage fields for more information. The degree of certainty that the company name is accurate. One of:
This field is not available for all languages. See low-usage fields for more information. True if the candidate was self-employed at this job/position.
Deprecated. Use Profession Normalization (ProfessionsSettings.Normalize) instead. This value was a very basic normalization that simply removed special characters, etc. Profession Normalization is much more advanced and is taxonomy-based.
This field is not available for all languages. See low-usage fields for more information. The degree of certainty that the job title value is accurate. One of:
Deprecated. Use Profession Normalization (ProfessionsSettings.Normalize) instead. These variations are generated with a very basic algorithm. Profession Normalization is much more accurate/advanced and is taxonomy-based.
This field is not available for all languages. See low-usage fields for more information. How many employees were supervised in this position/job, or null.
Deprecated. Use Profession Normalization (ProfessionsSettings.Normalize) instead. The name of the skills taxonomy that this position was categorized as based on skills found in the job description. This field will not be output when SkillsSettings.TaxonomyVersion is set to (or defaults to) V2.
Deprecated. Use Profession Normalization (ProfessionsSettings.Normalize) instead. The name of the skills subtaxonomy that this position was categorized as based on skills found in the job description. This field will not be output when SkillsSettings.TaxonomyVersion is set to (or defaults to) V2.
Deprecated. Use Profession Normalization (ProfessionsSettings.Normalize) instead. The percentage of this job described by the TaxonomyName. This value will always be 0 when SkillsSettings.TaxonomyVersion is set to (or defaults to) V2.
True if Sovren found this by matching from a known list of certifications. False if Sovren found this by analyzing the context and determining it was a certification.
Sovren generates several possible variations for some certifications to be used in AI Matching. This greatly improves Matching, since different candidates have different ways of listing a certification. If this certification is a Sovren-created variation of a certification found on the resume, this property will be true.
Value.ResumeData.Licensesobject[]
This field is not available for all languages. See low-usage fields for more information. Licenses found on a resume. These are professional licenses, not driving licenses. For driving licenses, see PersonalAttributes.
Value.ResumeData.Licenses[i].Namestring
The name of the license.
Value.ResumeData.Licenses[i].MatchedFromListbool
True if Sovren found this by matching from a known list of licenses. False if Sovren found this by analyzing the context and determining it was a license.
Value.ResumeData.Associationsobject[]
This field is not available for all languages. See low-usage fields for more information. Associations/organizations found on a resume.
Any text within the Text that is recognized as a qualification (such as DDS), degree (such as B.S.), or a certification (such as PMP). Each qualification is listed separately.
The primary language of the parsed text. The value is one of the ISO 639-1 codes. When the language could not be automatically determined, it is reported as the special value Invariant/Unknown (iv). The two-letter ISO codes reported by the Parser, such as zh for Chinese, do not differentiate between language variants, such as Mandarin and Cantonese.
For a listing of the languages and regions supported by the most recent version, refer to the parser tech specs.
This is an IETF language tag (RFC 3066) that represents the actual cultural context regarding formatting of numbers, dates, character symbols, and so on. This value is usually a simple concatenation of the Language and Country codes, such as "en-US" for US English, but beware that CultureInfo can be set independently of Language and Country to achieve fine-tuned cultural control over parsing, so if you use this value you should not assume that it always matches the Language and Country.
The exact text that was used to identify the beginning of the section. If there was no text indicator and the location was calculated, then the value is "CALCULATED".
This field is not available for all languages. See low-usage fields for more information. A list of quality assessments for the resume. These are very useful for providing feedback to candidates about why their resume did not parse properly. These can also be used to determine if a resume is 'high quality' enough to put into your system. More information is available in the Resume Quality Documentation.
A list of user-defined tags that are assigned to this resume. These are used to filter search/match queries in the AI Matching Engine.
NOTE: you may add/remove these prior to indexing. This is the only property you may modify prior to indexing.
Value.RedactedResumeDataobject
This property is the Value.ResumeData with all of the Personally Identifiable Information (PII) fields such as first name, last name, email addresses, phone numbers, etc. redacted.
Value.ConversionMetadataobject
Information about converting the document to plain text.
The suggested extension based on the DetectedType.
Value.ConversionMetadata.OutputValidityCodestring
The computed validity based on the source text. This will indicate whether a document looks like a legitimate resume or not. See here for more details.