OTMI Repository
From OpenTextMining
The OTMI repository (on http://www.nature.com/) currently hosts two years' worth of content (2005 and 2006) for five journals:
- Nature (nature)
- Nature Genetics (ng)
- Nature Reviews Drug Discovery (nrd)
- Nature Structural & Molecular Biology (nsmb)
- The Pharmacogenomics Journal (tpj)
Directories of available OTMI files are provided using OPML files.
- Master file - master OPML file for all journals
  - http://www.nature.com/otmi/journals.opml
  - references journal OPML files via attribute opmlUri
- Journal file - OPML file for a given journal (here, tpj)
  - e.g. http://www.nature.com/tpj/otmi/tpj.opml
  - references issue OPML files via attribute opmlUri
  - references issue OTMI tarball via attribute gzipUri (see Note 1)
- Issue file - OPML file listing the OTMI files for one issue
  - e.g. http://www.nature.com/tpj/journal/v5/n1/otmi/otmi-manifest.opml
  - references article OTMI files via attribute otmiUri
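The directory hierarchy above can be walked by reading the opmlUri, gzipUri and otmiUri attributes out of each OPML file. The following is a minimal sketch in Python, not an official client: the sample document and the helper name collect_attribute are hypothetical, and the real OPML files may be laid out differently. Only the attribute names and the two URLs are taken from this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a journal-level OPML file. The element layout is an
# assumption based on ordinary OPML (<outline> elements inside <body>); only
# the attribute names opmlUri and gzipUri come from the description above.
SAMPLE_JOURNAL_OPML = """<opml version="1.1">
  <head><title>tpj</title></head>
  <body>
    <outline text="v5 n1"
             opmlUri="http://www.nature.com/tpj/journal/v5/n1/otmi/otmi-manifest.opml"
             gzipUri="http://www.nature.com/tpj/journal/v5/n1/otmi/otmi-contents.tar.gz"/>
  </body>
</opml>
"""


def collect_attribute(opml_text, attribute):
    """Return the value of `attribute` from every element that carries it."""
    root = ET.fromstring(opml_text)
    return [el.attrib[attribute] for el in root.iter() if attribute in el.attrib]


# Issue manifests and issue tarballs referenced by the journal file.
issue_manifests = collect_attribute(SAMPLE_JOURNAL_OPML, "opmlUri")
issue_tarballs = collect_attribute(SAMPLE_JOURNAL_OPML, "gzipUri")
print(issue_manifests)
print(issue_tarballs)
```

The same helper could be applied to the master file (opmlUri) and to an issue file (otmiUri), since each level uses one attribute to point at the next.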
OTMI files are available at issue or article levels:
- Issue level - tarball of OTMI files for a complete issue (see Note 1)
  - each tarball has a corresponding MD5 digest file (see Note 2)
- Article level - OTMI file for an individual article
Note 1: If you are using a command-line tool such as curl or wget, you may need to add the following option to preserve the compressed file:
--header 'Accept-Encoding: compress, gzip'
e.g.
% curl --header 'Accept-Encoding: compress, gzip' 'http://www.nature.com/tpj/journal/v5/n1/otmi/otmi-contents.tar.gz'
and likewise for the MD5 digest
% curl --header 'Accept-Encoding: compress, gzip' 'http://www.nature.com/tpj/journal/v5/n1/otmi/otmi-contents.tar.gz.md5'
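For scripted downloads, the header from Note 1 can also be sent from Python's standard library. This is a sketch under assumptions: the function names build_request and download are hypothetical, and whether the server needs this header for clients other than curl or wget is not stated on this page.

```python
import urllib.request


def build_request(url):
    """Build a GET request carrying the Accept-Encoding header from Note 1."""
    return urllib.request.Request(
        url, headers={"Accept-Encoding": "compress, gzip"}
    )


def download(url, destination):
    """Save the response body to `destination` as received.

    urllib does not decompress response bodies on its own, so the bytes
    are written to disk exactly as the server sent them.
    """
    with urllib.request.urlopen(build_request(url)) as response:
        with open(destination, "wb") as out:
            out.write(response.read())
```

A call such as download('http://www.nature.com/tpj/journal/v5/n1/otmi/otmi-contents.tar.gz', 'otmi-contents.tar.gz') would then fetch the issue tarball, and the same call with the .md5 URL would fetch its digest.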
Note 2: MD5 digests are generated with the md5sum utility. To check a digest, issue the following command, which should give the response shown:
% md5sum -c otmi-contents.tar.gz.md5
otmi-contents.tar.gz.md5: OK
For more information on using md5sum, see e.g. the article "Using MD5SUM to Validate the Integrity of (Downloaded) Files".
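Where md5sum is not installed, the check in Note 2 can be approximated with Python's hashlib. This is a sketch, assuming the digest file uses the usual md5sum layout (hex digest first, then the file name); the function names are hypothetical and the actual contents of the .md5 files are not shown on this page.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(md5_path):
    """Read the first whitespace-separated field of the digest file.

    Assumes md5sum-style content: '<hex digest>  <file name>'.
    """
    with open(md5_path, "r", encoding="ascii") as handle:
        return handle.read().split()[0].lower()


def verify(tarball_path, md5_path):
    """Return True when the tarball's MD5 matches the digest file."""
    return md5_of_file(tarball_path) == expected_digest(md5_path)
```

With both files downloaded into the current directory, verify('otmi-contents.tar.gz', 'otmi-contents.tar.gz.md5') returns True when the tarball is intact.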
