Index/query scripts for FDA Adverse Event Reporting System dataset

Note: this work is at an early stage

https://open.fda.gov/data/faers/

Synopsis

Download all archived report records; individual JSON files normally include 1200 reports

wget -nc https://api.fda.gov/download.json
grep drug-event download.json | cut -c 20- > drug-event.json.zip.files

# File names are not unique, some archive file names exist in multiple years
# For this reason we download with folder option '-x'
xargs wget -x -nc --no-host-directories --cut-dirs=0 < drug-event.json.zip.files
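
The grep/cut step above depends on the exact character layout of download.json. As a hedged alternative, the URL list could also be built by parsing the JSON. This is only a sketch: it assumes the archive URLs sit in a "file" field under results -> drug -> event -> partitions, and the function names are hypothetical, not part of this project.

```python
import json


def drug_event_urls(download_index):
    """Return the archive URLs listed for the drug event endpoint.

    Assumption: URLs are stored in the "file" field of each entry under
    results -> drug -> event -> partitions.
    """
    partitions = download_index["results"]["drug"]["event"]["partitions"]
    return [p["file"] for p in partitions]


def write_url_list(index_path, out_path):
    """Write one URL per line, in the form expected by 'xargs wget'."""
    with open(index_path) as f:
        urls = drug_event_urls(json.load(f))
    with open(out_path, "w") as out:
        for url in urls:
            out.write(url + "\n")
    return len(urls)
```

Calling write_url_list("download.json", "drug-event.json.zip.files") would produce the same kind of file that the xargs command reads.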

Index all .json.zip files in a given folder

# Elasticsearch
./nosqlbiosets/fda/faers.py --esindex faers --infile\
   ~/data/fda/faers/drug/event/2019q1\
   --dbtype Elasticsearch --recreateindex true\
   --host localhost

# MongoDB
./nosqlbiosets/fda/faers.py --mdbcollection faers --infile ~/data/fda/\
 --dbtype MongoDB --host localhost --mdbdb biosets
./nosqlbiosets/fda/faers.py --mdbcollection faers\
  --infile ~/data/fda/faers/drug/event/2019q1 --recreateindex true\
  --dbtype MongoDB --host localhost --mdbdb biosets
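
To check what a folder contains before indexing it, the archives can be read directly. The following is a minimal sketch, not the implementation in faers.py: it assumes each .json.zip archive holds JSON documents with the reports listed under a "results" key, and read_reports is a hypothetical helper name.

```python
import glob
import json
import os
import zipfile


def read_reports(folder):
    """Yield each report found in the .json.zip archives under folder."""
    # '**' with recursive=True also matches files directly in folder
    pattern = os.path.join(folder, "**", "*.json.zip")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                with archive.open(name) as member:
                    doc = json.load(member)
                # Assumption: reports are listed under the "results" key
                for report in doc.get("results", []):
                    yield report
```

For example, sum(1 for _ in read_reports(folder)) gives the number of reports the archives in that folder would contribute.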

Update the database with new report files

./nosqlbiosets/fda/faers.py --dbtype Elasticsearch\
    --infile ~/data/fda/drug-event-0001-of-0035.json --esindex faers

./nosqlbiosets/fda/faers.py --dbtype MongoDB\
    --infile ~/data/fda/drug-event-0001-of-0035.json\
    --mdbdb biosets --mdbcollection faers

TODO/Ideas

A project similar to ClinVar Miner, built on the FAERS database, could be helpful for researchers and practitioners as well as for the public.