Installation

GimmeMotifs runs on Linux. On Windows 10 it will run fine using the Windows Subsystem for Linux.

Conda - the easy way

The preferred way to install GimmeMotifs is by using conda. Activate the required channels and install mamba (you only have to do this once).

In this example, conda and mamba versions are pinned due to a bug with mamba. For more information, see issue 271.

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda install -c conda-forge "conda>=4.12" "mamba>=0.27"

You can install GimmeMotifs with one command. In the current environment:

$ mamba install gimmemotifs

Or create a specific environment:

$ mamba create -n gimme gimmemotifs

# Activate the environment before you use GimmeMotifs
$ mamba activate gimme

Installation successful? Good. Have a look at the configuration section.

Upgrading from 0.11.1

The way genomes are installed and used has been changed from 0.11.1 to 0.12.0. Basically, we have switched to the faidx index used and supported by many other tools. This means that the old (<=0.11.1) GimmeMotifs index cannot be used by GimmeMotifs 0.12.0 and higher. You can re-install genomes using genomepy, which is now the preferred tool for genome management for GimmeMotifs. However, because of this change you can now also directly supply a genome FASTA instead of a genome name. Pre-indexing is not required anymore.

Pip

Installation from PyPI with pip is a relatively straightforward option. Install with pip as follows:

$ pip install gimmemotifs

Or the (unstable) develop branch with the newest bells, whistles and bugs:

$ pip install git+https://github.com/vanheeringen-lab/gimmemotifs.git@develop

Note that several dependencies and many of the motif tools (such as MEME) need to be installed separately. Instructions for doing so are not included here.

Source - developers install

Want to fix that darned bug yourself? Want to try out the latest features?

Well look no further! You can install the develop branch with the newest bells, whistles and bugs:

# download the gimmemotifs code
$ git clone https://github.com/vanheeringen-lab/gimmemotifs.git
$ cd gimmemotifs
$ git checkout develop

# setup the gimme conda environment
$ conda env create -f requirements.yaml
$ conda activate gimme
$ python setup.py build  # installs the motif discovery tools
$ pip install -e .       # installs gimmemotifs (in editable mode)

# test if the install was successful
$ gimme -h

Once installed, you can edit the code in the gimmemotifs folder, and the changes are immediately active! Check out how good your fixes are with unit tests:

$ pytest -vvv --disable-pytest-warnings

Configuration

The configuration file

All of GimmeMotifs’ configuration is stored in ~/.config/gimmemotifs/gimmemotifs.cfg. The configuration file is created at first run with all defaults set, but you can always edit it afterwards. It contains two sections main and params that take care of paths, file locations, parameter settings etc. Additionally, every motif tool has it’s own section. Let’s have a look at the options.

[main]
bg = bg
template_dir = templates
score_dir = score_dists
gene_dir = genes
motif_databases = motif_databases
tools = included_tools/
  • template_dir The location of the jinja2 html templates, used to generate the reports.

  • score_dir To generate p-values, a pre-calculated file with mean and sd of score distributions is needed. These are located here.

  • gene_dir Directory with bed-files containing gene locations. This is needed to create promoter background sequences.

  • motif_databases Contains various motif databases.

  • tools Here all tools included with GimmeMotifs are stored.

[params]
fraction = 0.2
use_strand = False
abs_max = 1000
analysis = xl
enrichment = 1.5
size = 200
lsize = 500
background = gc,random
cluster_threshold = 0.95
scan_cutoff = 0.9
available_tools = AMD,BioProspector,ChIPMunk,DiNAMO,GADEM,HMS,Homer,Improbizer,MDmodule,MEME,MEMEW,MotifSampler,Posmo,ProSampler,Trawler,Weeder,XXmotif,Yamda
tools = BioProspector,Homer,MEME
pvalue = 0.001
max_time = -1
ncpus = 12
motif_db = gimme.vertebrate.v5.0.pfm
use_cache = False

This section specifies all the default GimmeMotifs parameters. Most of these can also be specified at the command-line when running GimmeMotifs, in which case they will override the parameters specified.

Input Data

Genomes - and how to get them

You will need genome FASTA files for a lot of the tools that are included with GimmeMotifs.

The most straightforward way to download and index a genome is to use the genomepy tool, which is installed with GimmeMotifs.

$ genomepy install hg38 --provider UCSC --annotation

Here, the hg38 genome and accompanying gene annotation will be downloaded from UCSC to the directory ~/.local/share/genomes/hg38. You can change this default location by editing the file ~/.config/genomepy/genomepy.yaml and change the following line:

genomes_dir: /data/genomes

If this file does not exist, you can generate it with genomepy config generate. After downloading a genome with genomepy, you can use its name (e.g. hg38) for gimme commands.

MotifSampler

If you want to use MotifSampler there is one more step that you’ll have to take after installation of GimmeMotifs. For every organism, you will need a MotifSampler background. Note that human (hg19, hg38) and mouse (mm9, mm10) background models are included, so for these organisms MotifSampler will work out of the box. For other organisms the necessary background files can be created with CreateBackgroundModel (which is included with GimmeMotifs or can be downloaded from the same site as MotifSampler). The background model file needs to be saved in the directory /usr/share/gimmemotifs/MotifSampler and it should be named <organism_index_name>.bg. So, for instance, if I downloaded the human epd background (epd_homo_sapiens_499_chromgenes_non_split_3.bg), this file should be saved as /usr/share/gimmemotifs/MotifSampler/hg19.bg. here.