Prologue
This article is a thorough treatise on an in-house developed tool for parsing and validating CVRF documents, aptly named "cvrfparse". The article is split into two parts. The first part, intended for CVRF document producers and consumers, is a hands-on manual detailing how to use cvrfparse. The second part, intended for burgeoning Python programmers, explores some of the inner workings of the tool.
Introduction
The CVRF parser or "cvrfparse" is a Python-based command line tool that offers simple parsing and validation of CVRF documents. Using it, you can quickly query a CVRF document for any of its contents. For example, let's say one of your vendors releases a bundle of security advisories encoded in CVRF. There are a dozen individual CVRF documents each with multiple vulnerabilities across hundreds of products. Using cvrfparse, you can quickly ascertain which documents contain vulnerable products you might have installed in your infrastructure. We'll see how, shortly.
Cvrfparse is a validating parser. Before you start looking for data in a CVRF document, you might want to quickly check to ensure a CVRF document is well-formed and/or valid (in fact you'll need a well-formed and valid document before you can parse it). This is useful for document producers who provide CVRF content to their customers.
Without further ado, let's get to it and check out the tool. You can download the tool as a Python package from The Python Package Index (PyPI) or check out the source at GitHub. The only third-party code you may need to install is the lxml library. The easiest way to install cvrfparse and all required dependencies is to use pip. A typical invocation would be:
[sb:~] mike% pip install cvrfparse
The sample CVRF document used in the examples below is included in the distribution of the tool.
Need a CVRF Refresher?
If you're working with CVRF at any level, the two-part CVRF Missing Manual blog series is highly recommended. In fact, savvy readers will notice that the sample CVRF document included with cvrfparse is the same one created for that blog series.
The Tool: Cvrfparse
Before we dive into some examples, let's first explore all of the options we can specify when using the tool. To do that, we invoke cvrfparse with the "help" switch:
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --help
usage: cvrfparse.py [-h] -f FILE
                    [--cvrf [{all,DocumentTitle,DocumentType,DocumentPublisher,DocumentTracking,DocumentNotes,DocumentDistribution,AggregateSeverity,DocumentReferences,Acknowledgments} ...]]
                    [--vuln [{all,Title,ID,Notes,DiscoveryDate,ReleaseDate,Involvements,CVE,CWE,ProductStatuses,Threats,CVSSScoreSets,Remediations,References,Acknowledgments} ...]]
                    [--prod [{all,Branch,FullProductName,Relationship,ProductGroups} ...]]
                    [-c] [-s] [-V] [-S SCHEMA] [-C CATALOG] [-v]

Validate/parse a CVRF 1.1 document and emit user-specified bits.

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  candidate CVRF 1.1 XML file
  --cvrf [{all,DocumentTitle,DocumentType,DocumentPublisher,DocumentTracking,DocumentNotes,DocumentDistribution,AggregateSeverity,DocumentReferences,Acknowledgments} ...]
                        emit CVRF elements, use "all" to glob all CVRF elements.
  --vuln [{all,Title,ID,Notes,DiscoveryDate,ReleaseDate,Involvements,CVE,CWE,ProductStatuses,Threats,CVSSScoreSets,Remediations,References,Acknowledgments} ...]
                        emit Vulnerability elements, use "all" to glob all Vulnerability elements.
  --prod [{all,Branch,FullProductName,Relationship,ProductGroups} ...]
                        emit ProductTree elements, use "all" to glob all ProductTree elements.
  -c, --collate         collate all of the Vulnerability elements by ordinal into separate files
  -s, --strip-ns        strip namespace header from element tags before printing
  -V, --validate        validate the CVRF document
  -S SCHEMA, --schema SCHEMA
                        specify local alternative for cvrf.xsd
  -C CATALOG, --catalog CATALOG
                        specify location for catalog.xml (default is ./cvrfparse/schemata/catalog.xml)
  -v, --version         show program's version number and exit
While the help display might seem to offer a lot of confusing options, if you know CVRF, it's really quite simple: each command line option is described in the help text above.
Cvrfparse Command-line Examples: Remote Validation
Now that we have the options down, let's explore a simple standard invocation: validating a CVRF document against the remote schema files. This may sound intimidating, but it's actually the easiest (and default) way to ensure you have a valid and well-formed CVRF document to work with. Here's how to do it:
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --validate
Fetching schemata...
Valid
Ok, that was easy. Now that we know what it looks like when we work with a valid and well-formed CVRF document, let's muck with it a bit and see an example of when validation fails:
[sb:~/cvrfparse] mike% sed 's/<InitialReleaseDate>2011-05-25T00:00:00+00:00/<InitialReleaseDate>TODAY/' cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml > cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-invalid.xml
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-invalid.xml --validate
Fetching schemata...
cvrfparse/CVRF-1.1-cisco-sa-20110525-rvs4000-invalid.xml:31:0:ERROR:SCHEMASV:SCHEMAV_CVC_DATATYPE_VALID_1_2_1: Element '{http://www.icasi.org/CVRF/schema/cvrf/1.1}InitialReleaseDate': 'TODAY' is not a valid value of the atomic type 'xs:dateTime'.
Ah. That's nifty. cvrfparse not only told us the document was invalid, but also exactly where and how it was invalid.
For our next example, let's see what happens when the document is not well-formed:
[sb:~/cvrfparse] mike% sed 's/InitialReleaseDate/InitialReleaseDatefoo/' cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml > cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-notwellformed.xml
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-notwellformed.xml --validate
cvrfparse.py: Parsing error, document "cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-notwellformed.xml" is not well-formed: cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-notwellformed.xml:31:96:FATAL:PARSER:ERR_TAG_NAME_MISMATCH: Opening and ending tag mismatch: InitialReleaseDatefoo line 31 and InitialReleaseDate
Again cvrfparse found the error and told us exactly where and what it is. Oh cvrfparse, what can't you do!
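The difference between "not valid" and "not well-formed" is easy to see in a few lines of Python. The following is only a rough sketch and not how cvrfparse does it: it uses the standard library's xml.etree.ElementTree instead of lxml, the check_well_formed() helper is a made-up name, and the two tiny XML snippets are invented stand-ins for the sample document. Note that the first snippet parses fine even though 'TODAY' would never pass schema validation; well-formedness only cares about the XML syntax.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "Well-formed") or (False, reason) for an XML string.

    Hypothetical helper for illustration; cvrfparse itself relies on lxml.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem
        line, column = err.position
        reason = "not well-formed at line {0}, column {1}: {2}".format(line, column, err)
        return False, reason
    return True, "Well-formed"


# Well-formed, although 'TODAY' is not a legal xs:dateTime (a validity problem)
good = "<DocumentTracking><InitialReleaseDate>TODAY</InitialReleaseDate></DocumentTracking>"
# Opening and closing tag names do not match (a well-formedness problem)
bad = "<DocumentTracking><InitialReleaseDatefoo>TODAY</InitialReleaseDate></DocumentTracking>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

Catching the first case takes a schema-aware validator, which is exactly what the --validate option brings to the table.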
Cvrfparse Command-line Examples: Local Validation
Normally, when --validate is specified, cvrfparse fetches the remote schemata from all over the Internet. While this is the simplest way to invoke the validation logic, it's also the slowest and can take several seconds to complete. For a single document, this is probably acceptable, but if you're doing bulk validation and running cvrfparse from a script or in a pipeline, there is a faster way. You can force cvrfparse to use local copies of the various schema files required to validate, resulting in a dramatic performance increase (on my home machine and 20Mbps cable modem I saw a 50x speed increase). To facilitate local validation, cvrfparse ships with copies of all of the required schema files and a catalog file that points to them. To invoke local validation, we use the --schema option to point to the CVRF 1.1 schema file and the --catalog option to point to the local catalog.xml (the --catalog option can be omitted if catalog.xml is in the default location, ./cvrfparse/schemata/catalog.xml).
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --validate --schema cvrfparse/schemata/cvrf/1.1/cvrf.xsd --catalog cvrfparse/schemata/catalog.xml
Valid
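If you do want to drive bulk local validation from a script, something along these lines might serve as a starting point. Treat it as a sketch under assumptions: build_validate_command() and validate_all() are hypothetical helper names, the paths are the ones used in the invocation above, and since this article doesn't spell out which output stream or exit code cvrfparse uses to report results, the sketch simply collects everything the tool prints.

```python
import subprocess

# Paths taken from the local validation example; adjust to your checkout
CVRFPARSE = "./cvrfparse/cvrfparse.py"
SCHEMA = "cvrfparse/schemata/cvrf/1.1/cvrf.xsd"
CATALOG = "cvrfparse/schemata/catalog.xml"


def build_validate_command(cvrf_file, schema=SCHEMA, catalog=CATALOG):
    """Build the argument list for one local validation run (hypothetical helper)."""
    return [CVRFPARSE, "--file", cvrf_file, "--validate",
            "--schema", schema, "--catalog", catalog]


def validate_all(cvrf_files):
    """Run cvrfparse once per file and return {filename: combined output}.

    Assumption: we capture both stdout and stderr because the stream
    cvrfparse reports on is not documented here.
    """
    results = {}
    for path in cvrf_files:
        completed = subprocess.run(build_validate_command(path),
                                   capture_output=True, text=True)
        results[path] = (completed.stdout + completed.stderr).strip()
    return results


# Show the command that would be run for the sample document
print(" ".join(build_validate_command(
    "cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml")))
```

Calling validate_all() with a list of CVRF filenames would then give you one entry per document to inspect or log.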
Once we're sure we have a well-formed and valid CVRF document, we can start emitting some elements. A common use-case would be to query a document for the Document Title and Document Type:
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --cvrf DocumentTitle DocumentType
[{http://www.icasi.org/CVRF/schema/cvrf/1.1}DocumentTitle] Cisco Security Advisory: Cisco RVS4000 and WRVS4400N Web Management Interface Vulnerabilities
[{http://www.icasi.org/CVRF/schema/cvrf/1.1}DocumentType] Security Advisory
Sweet. Now, if you don't want to see that pesky namespace header preceding every line of output, use the --strip-ns option:
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --cvrf DocumentTitle DocumentType --strip-ns
[DocumentTitle] Cisco Security Advisory: Cisco RVS4000 and WRVS4400N Web Management Interface Vulnerabilities
[DocumentType] Security Advisory
Ah, much better. Another useful example is to emit the Product Tree Full Product Name elements with their corresponding Product ID attributes:
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --prod FullProductName --strip-ns
[FullProductName] Cisco RVS4000 Gigabit Security Router version 1(ProductID: CVRF1.1-PID-0001)
[FullProductName] Cisco RVS4000 Gigabit Security Router version 2(ProductID: CVRF1.1-PID-0002)
[FullProductName] Cisco RVS4000 Gigabit Security Router version 1.3.3.5(ProductID: CVRF1.1-PID-0006)
[FullProductName] Cisco RVS4000 Gigabit Security Router version 2.0.2.7(ProductID: CVRF1.1-PID-0007)
[FullProductName] Cisco WRVS4400N Wireless-N Gigabit Security Router version 1.0(ProductID: CVRF1.1-PID-0003)
[FullProductName] Cisco WRVS4400N Wireless-N Gigabit Security Router version 1.1(ProductID: CVRF1.1-PID-0004)
[FullProductName] Cisco WRVS4400N Wireless-N Gigabit Security Router version 2.0(ProductID: CVRF1.1-PID-0005)
[FullProductName] Cisco WRVS4400N Wireless-N Gigabit Security Router version 2.0.2.1(ProductID: CVRF1.1-PID-0008)
Want to quickly check to see if there are any high priority CVSS Scores? We can pull out the CVSS Score Sets from each vulnerability:
[sjc-vpn6-826:~/PycharmProjects/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --strip-ns --vuln CVSSScoreSets | grep BaseScore | sort -r -k2
[BaseScore] 9.3
[BaseScore] 9.0
[BaseScore] 5.0
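One caveat with the shell pipeline: without -n, sort compares the scores as text, so a score of 10.0 would land below 9.3. If you'd rather post-process the output in Python, here is a small sketch that sorts numerically. The base_scores() helper is a made-up name, the 10.0 line is a hypothetical addition to the three scores shown above, and the 7.0 cut-off for "high" is an assumption you should tune to your own policy.

```python
def base_scores(output_lines):
    """Extract BaseScore values from --strip-ns output, highest first.

    Hypothetical helper; expects lines shaped like "[BaseScore] 9.3".
    """
    prefix = "[BaseScore]"
    scores = []
    for line in output_lines:
        line = line.strip()
        if line.startswith(prefix):
            # float() tolerates the space left after the prefix
            scores.append(float(line[len(prefix):]))
    return sorted(scores, reverse=True)


# The three scores from the sample document plus one invented 10.0 line
sample_output = [
    "[BaseScore] 5.0",
    "[BaseScore] 10.0",
    "[BaseScore] 9.3",
    "[BaseScore] 9.0",
]

scores = base_scores(sample_output)
print(scores)

# Assumed threshold: treat anything at or above 7.0 as high priority
high = [score for score in scores if score >= 7.0]
print(high)
```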
Cvrfparse Command-line Examples: Vulnerability Container Collation
As we learned above, cvrfparse can also collate each vulnerability in a document by Vulnerability Ordinal, writing each one to a separate file:
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --strip-ns --collate
[sb:~/cvrfparse] mike% ls -sh cvrfparse*txt
24 cvrfparse-Cisco_Security_Advisory:_Cisco_RVS4000_and_WRVS4400N_Web_Management_Interface_Vulnerabilities-ordinal-1.txt
24 cvrfparse-Cisco_Security_Advisory:_Cisco_RVS4000_and_WRVS4400N_Web_Management_Interface_Vulnerabilities-ordinal-2.txt
24 cvrfparse-Cisco_Security_Advisory:_Cisco_RVS4000_and_WRVS4400N_Web_Management_Interface_Vulnerabilities-ordinal-3.txt
[sb:~/cvrfparse] mike% head cvrfparse-Cisco_Security_Advisory:_Cisco_RVS4000_and_WRVS4400N_Web_Management_Interface_Vulnerabilities-ordinal-1.txt
[Vulnerability] (Ordinal: 1)
[Title] Retrieval of the configuration file
[Notes]
[Note] The Cisco RVS4000 and WRVS4400N Gigabit Security Routers deliver high-speed network access and IPsec VPN capabilities for small businesses. They also provides firewall and intrusion prevention capabilities. The Cisco RVS4000 and WRVS4400N Gigabit Security Routers contains a web management interface vulnerability:
Nicely done. If we had invoked cvrfparse as above on a CVRF document that had no Vulnerability Containers (which is perfectly valid), the program would have quietly and correctly done nothing.
Under the Hood
As I've done in the past, in all of my technical blogs where I release code, I like to choose some linchpin code block and discuss it. With cvrfparse, we'll have a look at a few interesting sections. We'll check out the three functions that perform most of the work: validation, parsing and vulnerability collation.
Validation
The validation function accepts two arguments: a file object containing the un-parsed schema document and an lxml-parsed (and consequently well-formed) CVRF document. The function first attempts to parse the schema into an ElementTree object. Provided the schema is well-formed (what a disaster if your schema was broken!), control proceeds to the next line, which calls XMLSchema to turn the parsed schema into an XML Schema validator. This object has an assertValid method that raises an exception when validation fails. To find out why validation failed, we can check the error_log object. Assuming all goes well, the assertion will not fail and the function will return True and the string "Valid".
from lxml import etree

def cvrf_validate(f, cvrf_doc):
    """
    Validates a CVRF document

    f: file object containing the schema
    cvrf_doc: the serialized CVRF ElementTree object
    returns: a tuple containing the return code (True for valid / False for invalid) and a reason for the code
    """
    try:
        xmlschema_doc = etree.parse(f)
    except etree.XMLSyntaxError as e:
        log = e.error_log.filter_from_level(etree.ErrorLevels.FATAL)
        return False, 'Parsing error, schema document "{0}" is not well-formed:{1}'.format(f.name, log)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    try:
        xmlschema.assertValid(cvrf_doc)
        return True, "Valid"
    except etree.DocumentInvalid:
        return False, xmlschema.error_log
Parsing
The parsing function is even simpler. It also accepts two arguments: the parsed CVRF document and the elements the user wishes to emit, encoded as a list. It returns a dictionary that maps the name of the file to write to onto a list of the items to write. The function kicks off by declaring an empty list that we'll use to store the items the user wants to emit. The function makes liberal use of Python's versatile workhorse iteration construct, the for loop. The top-level for loop iterates over each element name in parsables. For each one, we use the lxml etree iter() method as an iterator that filters the ElementTree down to the nodes matching that element. Finally, we iterate over each child in that node and add everything we find to the items list. When we've exhausted all of the items in parsables, we return a dictionary that contains the file to write the output to, which is currently standard output, and the list of the items to write.
def cvrf_parse(cvrf_doc, parsables):
    """
    Parse a cvrf_doc and return a list of elements as determined by parsables

    cvrf_doc: the serialized CVRF ElementTree object
    parsables: list of elements to parse from a CVRF doc
    returns: a dictionary of the format {filename:[item, ...]}
    """
    items = []
    for element in parsables:
        for node in cvrf_doc.iter(element):
            for child in node.iter():
                items.append(child)
    # Hardcoded output for now, eventually make this user-tunable
    return {"stdout": items}
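To watch the nested iteration in action without the full tool, here is a stripped-down sketch. It is an approximation under assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, the tiny inline document is invented for illustration, and parse_elements() and strip_ns() are hypothetical helpers. The namespace string is the one visible in the un-stripped output shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespace as it appears in the un-stripped cvrfparse output
CVRF_NS = "{http://www.icasi.org/CVRF/schema/cvrf/1.1}"

# Invented miniature document, not the real sample advisory
SAMPLE = """<cvrfdoc xmlns="http://www.icasi.org/CVRF/schema/cvrf/1.1">
  <DocumentTitle>Example Advisory</DocumentTitle>
  <DocumentType>Security Advisory</DocumentType>
  <DocumentPublisher Type="Vendor">
    <ContactDetails>psirt@example.com</ContactDetails>
  </DocumentPublisher>
</cvrfdoc>"""


def parse_elements(doc, parsables):
    """Collect each requested element and all of its descendants."""
    items = []
    for element in parsables:
        # iter(tag) walks the whole tree but yields only matching nodes
        for node in doc.iter(CVRF_NS + element):
            # iter() with no tag yields the node itself, then its descendants
            for child in node.iter():
                items.append(child)
    return items


def strip_ns(tag):
    """Drop a leading "{namespace}" from an element tag, if present."""
    return tag.split("}", 1)[-1]


doc = ET.fromstring(SAMPLE)
for item in parse_elements(doc, ["DocumentTitle", "DocumentPublisher"]):
    print("[{0}] {1}".format(strip_ns(item.tag), (item.text or "").strip()))
```

Asking for DocumentPublisher also returns its ContactDetails child, because the innermost loop walks the node's entire subtree and not only the node itself.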
Vulnerability Collation
As our denouement, let's have a look at the vulnerability collation function, cvrf_collate_vuln(). It accepts only a single familiar argument, the parsed CVRF document, and returns a dictionary of exactly the same format as cvrf_parse() does. The function starts by declaring an empty dictionary which will hold the results. Next on its to-do list is the creation of a root filename in which the collation process will store the goods. We use the findtext() method, which is part of ElementTree's XPath-like query language, ElementPath, to find the first (and only, assuming the document is valid) DocumentTitle element and return its contents. If you look closely, you'll notice the rather long line of string methods is actually operating on two different strings. The first set removes the curly braces from the namespace specifier string to accommodate the format required by findtext(). The second preps the filename by removing all extraneous whitespace from the Document Title and replacing any "internal" spaces with underscores.
Next, the for loop uses the findall() method, which issues an XPath-like query and returns all matching elements. In this case, we want to iterate over each Vulnerability element. We create the specific filename, which is prefixed by the string literal "cvrfparse-", followed by the title we just created, followed by the string literal "-ordinal-", followed by the vulnerability's ordinal, and capped with the string literal ".txt". The function then uses the iter() method we saw above to create an iterator over the vulnerability's elements and stores it in the dictionary, keyed by the filename.
def cvrf_collate_vuln(cvrf_doc):
    """
    Zip through a cvrf_doc and return all vulnerability elements collated by ordinal

    cvrf_doc: the serialized CVRF ElementTree object
    returns: a dictionary of the format {filename:[item, ...], filename:[item, ...]}
    """
    results = {}
    # Obtain document title to use in the filename(s), tiptoeing around the curly braces in our NS definition
    document_title = cvrf_doc.findtext("cvrf:DocumentTitle", namespaces={"cvrf": CVRF_Syntax.NAMESPACES["CVRF"].replace("{", "").replace("}", "")}).strip().replace(" ", "_")

    # Constrain XPath search to the Vulnerability container
    for node in cvrf_doc.findall('.//' + CVRF_Syntax.NAMESPACES['VULN'] + 'Vulnerability'):
        # Create filename based on ordinal number to use as a key for results dictionary
        filename = 'cvrfparse-' + document_title + '-ordinal-' + node.attrib['Ordinal'] + '.txt'
        # Create an iterator to iterate over each child element and populate results dictionary values
        results[filename] = node.iter()
    return results
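Here is a self-contained sketch of the same collation idea that you can run without cvrfparse installed. Several things in it are assumptions: it uses the standard library's xml.etree.ElementTree in place of lxml, the inline document is invented, collate() is a hypothetical name, and the vuln namespace URI is assumed by analogy with the cvrf one shown in the tool's output. It also stores a list instead of the bare iterator the real function keeps, so that each entry can be read more than once.

```python
import xml.etree.ElementTree as ET

CVRF_NS = "http://www.icasi.org/CVRF/schema/cvrf/1.1"
# Assumption: vuln namespace follows the same pattern as the cvrf namespace
VULN_NS = "http://www.icasi.org/CVRF/schema/vuln/1.1"

# Invented miniature document with two vulnerability containers
SAMPLE = """<cvrfdoc xmlns="http://www.icasi.org/CVRF/schema/cvrf/1.1">
  <DocumentTitle> Example Advisory Title </DocumentTitle>
  <Vulnerability xmlns="http://www.icasi.org/CVRF/schema/vuln/1.1" Ordinal="1">
    <Title>First issue</Title>
  </Vulnerability>
  <Vulnerability xmlns="http://www.icasi.org/CVRF/schema/vuln/1.1" Ordinal="2">
    <Title>Second issue</Title>
  </Vulnerability>
</cvrfdoc>"""


def collate(doc):
    """Return {filename: [element, ...]} with one entry per Vulnerability."""
    results = {}
    # findtext() wants the bare namespace URI in its prefix map (no braces)
    title = doc.findtext("cvrf:DocumentTitle", namespaces={"cvrf": CVRF_NS})
    # Trim outer whitespace, then turn internal spaces into underscores
    title = title.strip().replace(" ", "_")
    # findall() accepts the "{uri}Tag" form directly in the path
    for node in doc.findall(".//{" + VULN_NS + "}Vulnerability"):
        filename = "cvrfparse-" + title + "-ordinal-" + node.attrib["Ordinal"] + ".txt"
        # Materialize the iterator so the result can be traversed repeatedly
        results[filename] = list(node.iter())
    return results


collated = collate(ET.fromstring(SAMPLE))
for filename in sorted(collated):
    print(filename, len(collated[filename]))
```

Each list holds the Vulnerability element itself followed by its descendants, which mirrors what the iterator in cvrf_collate_vuln() yields.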
Conclusion
We looked at the newly open sourced tool, cvrfparse, a validating parser for CVRF. It's up for grabs at PyPI and GitHub! As work continues on the tool, your comments, critiques, and pull requests are welcomed.