Installation¶
GimmeMotifs runs on Linux. On Windows 10 it will run fine using the Windows Subsystem for Linux.
Conda - the easy way¶
The preferred way to install GimmeMotifs is by using conda. Activate the required channels and install mamba (you only have to do this once).
In this example, minimum versions of conda and mamba are specified because of a bug with mamba. For more information, see issue 271.
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda install -c conda-forge "conda>=4.12" "mamba>=0.27"
You can install GimmeMotifs with one command. In the current environment:
$ mamba install gimmemotifs
Or create a specific environment:
$ mamba create -n gimme gimmemotifs
# Activate the environment before you use GimmeMotifs
$ mamba activate gimme
Installation successful? Good. Have a look at the configuration section.
Upgrading from 0.11.1¶
The way genomes are installed and used changed between 0.11.1 and 0.12.0. GimmeMotifs now uses the faidx index, which is used and supported by many other tools. This means that the old (<=0.11.1) GimmeMotifs index cannot be used by GimmeMotifs 0.12.0 and higher. You can re-install genomes using genomepy, which is now the preferred tool for genome management for GimmeMotifs. Because of this change, you can also supply a genome FASTA directly instead of a genome name. Pre-indexing is no longer required.
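For readers unfamiliar with the faidx index: it is the plain-text .fai format written by samtools faidx, with one tab-separated record per sequence (name, length, byte offset of the first base, bases per line, bytes per line). The sketch below only illustrates that layout on an in-memory FASTA. It is not the code GimmeMotifs uses, and the function name build_fai_records is made up for this example.

```python
def build_fai_records(fasta: bytes):
    """Return faidx-style records: (name, length, offset, linebases, linewidth)."""
    records = []
    current = None  # [name, length, offset, linebases, linewidth]
    position = 0    # byte position of the start of the current line
    for line in fasta.splitlines(keepends=True):
        if line.startswith(b">"):
            if current is not None:
                records.append(tuple(current))
            # The sequence name is the header text up to the first whitespace.
            name = line[1:].split()[0].decode()
            # The first base starts right after the header line.
            current = [name, 0, position + len(line), 0, 0]
        elif current is not None and line.strip():
            bases = len(line.rstrip(b"\r\n"))
            if current[3] == 0:
                # The first sequence line defines bases per line and bytes per line.
                current[3], current[4] = bases, len(line)
            current[1] += bases
        position += len(line)
    if current is not None:
        records.append(tuple(current))
    return records


example = b">chr1 test\nACGT\nAC\n>chr2\nGG\n"
for record in build_fai_records(example):
    print("\t".join(str(field) for field in record))
```

Because every record stores the byte offset and line layout, a tool can seek straight to any region of a large genome without reading the whole file, which is why a separate GimmeMotifs-specific index is no longer needed.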
Pip¶
Installation from PyPI with pip is a relatively straightforward option.
Install with pip as follows:
$ pip install gimmemotifs
Or the (unstable) develop branch with the newest bells, whistles and bugs:
$ pip install git+https://github.com/vanheeringen-lab/gimmemotifs.git@develop
Note that several dependencies and many of the motif tools (such as MEME) need to be installed separately. Instructions for doing so are not included here.
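Since a pip install leaves the external motif tools to you, a quick check of what is actually available can save some confusion. The following is a minimal sketch using only the Python standard library. It assumes the command-line entry point is called gimme and that MEME provides an executable called meme; executable names for other tools differ and are not covered here.

```python
import importlib.util
import shutil


def check_install(executables=("gimme", "meme")):
    """Report whether the gimmemotifs package and some executables can be found."""
    report = {"gimmemotifs": importlib.util.find_spec("gimmemotifs") is not None}
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        report[exe] = shutil.which(exe) is not None
    return report


for name, found in check_install().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" entry only means the name was not found in the current environment; make sure the right conda or virtual environment is active before drawing conclusions.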
Source - developers install¶
Want to fix that darned bug yourself? Want to try out the latest features?
Well, look no further! You can install the develop branch with the newest bells, whistles and bugs:
# download the gimmemotifs code
$ git clone https://github.com/vanheeringen-lab/gimmemotifs.git
$ cd gimmemotifs
$ git checkout develop
# setup the gimme conda environment
$ conda env create -f requirements.yaml
$ conda activate gimme
$ python setup.py build # installs the motif discovery tools
$ pip install -e . # installs gimmemotifs (in editable mode)
# test if the install was successful
$ gimme -h
Once installed, you can edit the code in the gimmemotifs folder, and the changes are immediately active! Check out how good your fixes are with unit tests:
$ pytest -vvv --disable-pytest-warnings
Configuration¶
The configuration file¶
All of GimmeMotifs’ configuration is stored in ~/.config/gimmemotifs/gimmemotifs.cfg.
The configuration file is created at first run with all defaults set, but you can always edit it afterwards.
It contains two sections, main and params, that take care of paths, file locations, parameter settings etc.
Additionally, every motif tool has its own section.
Let’s have a look at the options.
[main]
bg = bg
template_dir = templates
score_dir = score_dists
gene_dir = genes
motif_databases = motif_databases
tools = included_tools/
template_dir
The location of the jinja2 html templates, used to generate the reports.
score_dir
To generate p-values, a pre-calculated file with mean and sd of score distributions is needed. These are located here.
gene_dir
Directory with bed-files containing gene locations. This is needed to create promoter background sequences.
motif_databases
Contains various motif databases.
tools
Here all tools included with GimmeMotifs are stored.
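If you want to inspect these settings from a script, the file can be read as a regular INI file. The sketch below uses Python's standard configparser and assumes the file follows the layout shown above; the helper name read_main_section is made up for this example and is not part of the GimmeMotifs API.

```python
import configparser
from pathlib import Path

# Default location of the configuration file, as described above.
CONFIG_PATH = Path.home() / ".config" / "gimmemotifs" / "gimmemotifs.cfg"


def read_main_section(path=CONFIG_PATH):
    """Return the options of the [main] section as a dict (empty if unavailable)."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is skipped silently
    if not parser.has_section("main"):
        return {}
    return dict(parser["main"])


for option, value in read_main_section().items():
    print(f"{option} = {value}")
```

On a machine where GimmeMotifs has not been run yet, the file does not exist and the loop prints nothing.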
[params]
fraction = 0.2
use_strand = False
abs_max = 1000
analysis = xl
enrichment = 1.5
size = 200
lsize = 500
background = gc,random
cluster_threshold = 0.95
scan_cutoff = 0.9
available_tools = AMD,BioProspector,ChIPMunk,DiNAMO,GADEM,HMS,Homer,Improbizer,MDmodule,MEME,MEMEW,MotifSampler,Posmo,ProSampler,Trawler,Weeder,XXmotif,Yamda
tools = BioProspector,Homer,MEME
pvalue = 0.001
max_time = -1
ncpus = 12
motif_db = gimme.vertebrate.v5.0.pfm
use_cache = False
This section specifies all the default GimmeMotifs parameters. Most of these can also be specified at the command-line when running GimmeMotifs, in which case they will override the values set here.
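The relationship between these defaults and command-line values can be sketched as follows. This is an illustration of the override idea only, not how GimmeMotifs implements it: the function effective_params and the overrides dictionary are hypothetical, and only a handful of the options above are included.

```python
import configparser

# A shortened copy of the [params] section shown above.
DEFAULTS = """
[params]
fraction = 0.2
use_strand = False
size = 200
background = gc,random
tools = BioProspector,Homer,MEME
ncpus = 12
"""


def effective_params(cfg_text, overrides):
    """Parse typed defaults, then let explicitly given values take precedence."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["params"]
    params = {
        "fraction": section.getfloat("fraction"),
        "use_strand": section.getboolean("use_strand"),
        "size": section.getint("size"),
        "background": section.get("background").split(","),
        "tools": section.get("tools").split(","),
        "ncpus": section.getint("ncpus"),
    }
    # None stands for "not given on the command line": the default is kept.
    params.update({key: value for key, value in overrides.items() if value is not None})
    return params


print(effective_params(DEFAULTS, {"ncpus": 4, "size": None}))
```

Here ncpus is replaced by the explicit value, while size keeps the default from the configuration because no value was supplied.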
Input Data¶
Genomes - and how to get them¶
You will need genome FASTA files for a lot of the tools that are included with GimmeMotifs.
The most straightforward way to download and index a genome is to use the genomepy tool, which is installed with GimmeMotifs.
$ genomepy install hg38 --provider UCSC --annotation
Here, the hg38 genome and accompanying gene annotation will be downloaded from UCSC to the directory ~/.local/share/genomes/hg38.
You can change this default location by editing the file ~/.config/genomepy/genomepy.yaml and changing the following line:
genomes_dir: /data/genomes
If this file does not exist, you can generate it with genomepy config generate.
After downloading a genome with genomepy, you can use its name (e.g. hg38) for gimme commands.
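To see where a genome name such as hg38 ends up on disk, the rule described above (a per-genome directory inside genomes_dir) can be sketched in a few lines. This is a simplified illustration: it scans the configuration text for a plain genomes_dir: line rather than using a real YAML parser, it does not handle inline comments, and the function names are made up for this example.

```python
from pathlib import Path

# Default download location mentioned above.
DEFAULT_GENOMES_DIR = Path.home() / ".local" / "share" / "genomes"


def genomes_dir_from_config(config_text):
    """Return the genomes_dir setting from genomepy.yaml text, or the default."""
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("genomes_dir:"):
            value = stripped.split(":", 1)[1].strip().strip("'\"")
            if value:
                return Path(value).expanduser()
    return DEFAULT_GENOMES_DIR


def genome_dir(name, config_text=""):
    """Directory in which a genome with this name is expected."""
    return genomes_dir_from_config(config_text) / name


print(genome_dir("hg38"))
print(genome_dir("hg38", "genomes_dir: /data/genomes\n"))
```

With the example setting above, hg38 would be expected in /data/genomes/hg38 instead of the default location.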
MotifSampler¶
If you want to use MotifSampler there is one more step that you’ll have to take after installation of GimmeMotifs. For every organism, you will need a MotifSampler background. Note that human (hg19, hg38) and mouse (mm9, mm10) background models are included, so for these organisms MotifSampler will work out of the box. For other organisms the necessary background files can be created with CreateBackgroundModel (which is included with GimmeMotifs or can be downloaded from the same site as MotifSampler). The background model file needs to be saved in the directory /usr/share/gimmemotifs/MotifSampler and it should be named <organism_index_name>.bg. So, for instance, if I downloaded the human epd background (epd_homo_sapiens_499_chromgenes_non_split_3.bg), this file should be saved as /usr/share/gimmemotifs/MotifSampler/hg19.bg.
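The renaming step can be scripted. Below is a minimal sketch, assuming you already have a background model file and know the genome name you use with gimme commands. The helper install_background is hypothetical and not part of GimmeMotifs, and writing to /usr/share usually requires administrator rights.

```python
import shutil
from pathlib import Path

# Directory named above; usually writable only with administrator rights.
MOTIFSAMPLER_DIR = "/usr/share/gimmemotifs/MotifSampler"


def install_background(model_file, genome_name, target_dir=MOTIFSAMPLER_DIR):
    """Copy a background model to <target_dir>/<genome_name>.bg and return the new path."""
    target = Path(target_dir) / f"{genome_name}.bg"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(model_file, target)
    return target
```

For the example above, install_background("epd_homo_sapiens_499_chromgenes_non_split_3.bg", "hg19") would produce /usr/share/gimmemotifs/MotifSampler/hg19.bg, provided the source file exists in the current directory.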