Index/query scripts for the IntEnz enzyme dataset
- Index IntEnz XML files, tested with the IntEnz December 2019 release
$ ./nosqlbiosets/intenz/ --help
usage: [-h] [-infile INFILE] [--index INDEX] [--doctype DOCTYPE]
       [--host HOST] [--port PORT] [--db DB]

Index IntEnz xml files, with Elasticsearch, MongoDB or Neo4j

optional arguments:
  -h, --help            show this help message and exit
  -infile INFILE, --infile INFILE
                        Input file name (intenz/ASCII/intenz.xml)
  --index INDEX         Name of the Elasticsearch index or MongoDB database
  --doctype DOCTYPE     Document type name for Elasticsearch, collection name
                        for MongoDB
  --host HOST           Elasticsearch, MongoDB or Neo4j server hostname
  --port PORT           Elasticsearch, MongoDB or Neo4j server port
  --db DB               Database: 'Elasticsearch', 'MongoDB' or 'Neo4j'
- Query API (naive and not comprehensive); most queries are for MongoDB, a few for Neo4j
$ ./nosqlbiosets/intenz/ --help
usage: [-h] [--limit LIMIT] qc outfile

Save IntEnz reaction connections as graph files

positional arguments:
  qc             MongoDB query clause to select subsets of IntEnz entries,
                 e.g.: '{"reactions.label.value": "Chemically balanced"}'
  outfile        File name for saving the output graph. Format is selected
                 based on the file extension of the given output file; .xml
                 for GraphML, .gml for GML, .json for Cytoscape.js, or
                 .d3js.json for d3js format

optional arguments:
  -h, --help     show this help message and exit
  --limit LIMIT  Maximum number of enzyme-metabolite connections
./nosqlbiosets/intenz/ '{"reactions.label.value": "Chemically balanced"}' \
    balanced-reactions.xml --limit 800

./nosqlbiosets/intenz/ '{"cofactors.#text": "Pyrroloquinoline quinone"}' \
    cofactors.json

./nosqlbiosets/intenz/ '{"$text": {"$search": "poly(A)"}}' polyA.json
- Tests with the query API
Example graph
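The two positional arguments of the graph script (`qc` and `outfile`) can be illustrated with a small Python sketch. It only mirrors what the help text above states: the query clause is a JSON string, and the output format follows from the file extension. The helper names `query_clause` and `output_format` are hypothetical and are not taken from the nosqlbiosets code; the actual script may select the format differently.

```python
import json

# Suffix-to-format table copied from the help text. The longer suffix is
# listed first so that ".d3js.json" is not mistaken for plain ".json".
FORMATS = [
    (".d3js.json", "d3js"),
    (".json", "Cytoscape.js"),
    (".xml", "GraphML"),
    (".gml", "GML"),
]


def output_format(outfile):
    """Return the graph format implied by the file name (hypothetical helper)."""
    for suffix, fmt in FORMATS:
        if outfile.endswith(suffix):
            return fmt
    raise ValueError("Unsupported output file extension: " + outfile)


def query_clause(field, value):
    """Serialize a single-field MongoDB query clause as a JSON string."""
    return json.dumps({field: value})


if __name__ == "__main__":
    # Builds the same clause and file name used in the first example command
    print(query_clause("reactions.label.value", "Chemically balanced"))
    print(output_format("balanced-reactions.xml"))
```

Building the clause with `json.dumps` avoids hand-quoting mistakes when the string is later passed as a single shell argument.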
Example command lines for indexing
Default server connection settings are read from ../../conf/dbservers.json
# Download IntEnz xml files
wget -nc -P ./data
# Index with Elasticsearch, requires ~5m to 15m
./nosqlbiosets/intenz/ --db Elasticsearch --infile ./data/intenz.xml \
    --index intenz
# Index with MongoDB, requires ~1m with local server, ~12m with MongoDB Atlas
./nosqlbiosets/intenz/ --db MongoDB --infile ./data/intenz.xml
# Index with Neo4j, requires ~12m
./nosqlbiosets/intenz/ --db Neo4j --infile ./data/intenz.xml