$Id: MANUAL,v 1.2 2000/04/17 16:17:02 dirk Exp $

NAME
	ROSE - Random-model Of Sequence Evolution 
	Rose implements a new probabilistic model of RNA-, DNA-, or
	protein-sequence evolution.

SYNOPSIS
	rose [-I <dir>[:<dir>]] <input file> | -

AVAILABILITY
	http://bibiserv.TechFak.Uni-Bielefeld.DE/rose/

DESCRIPTION

Rose: generating sequence families

Jens Stoye (1), Dirk Evers (2) and Folker Meyer (2)

1 Research Center for Interdisciplinary Studies on Structure Formation (FSPM)
2 Technische Fakultaet, University of Bielefeld, Postfach 100 131,
  33501 Bielefeld, Germany 

Motivation: We present a new probabilistic model of the evolution of
RNA-, DNA-, or protein-like sequences and a software tool, Rose, that
implements this model. Guided by an evolutionary tree, a family of
related sequences is created from a common ancestor sequence by
insertion, deletion and substitution of characters. During this
artificial evolutionary process, the `true' history is logged and the
`correct' multiple sequence alignment is created simultaneously. The
model also allows for varying rates of mutation within the sequences,
making it possible to establish so-called sequence motifs.  

Results: The data created by Rose are suitable for the evaluation of
methods in multiple sequence alignment computation and the prediction of
phylogenetic relationships. It can also be useful when teaching courses
in or developing models of sequence evolution and in the study of
evolutionary processes.
 
OPTIONS
	-I dir[:dir]
		A colon-separated list of directories that the input
		parser searches for include files.

USAGE

	rose <input file> | -

	Input is read from an input file, or from stdin if a '-' (minus)
	is given on the command line.
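
	A minimal Python sketch of how a wrapper script might assemble the
	command line described in the SYNOPSIS. The helper name rose_argv and
	the directory names are hypothetical; only the -I option and the '-'
	convention come from this manual.

```python
def rose_argv(input_file=None, include_dirs=()):
    """Build an argument list that follows the SYNOPSIS line.

    input_file=None stands for "read from stdin", written as '-'.
    """
    argv = ["rose"]
    if include_dirs:
        # -I takes a single colon-separated list of directories
        argv += ["-I", ":".join(include_dirs)]
    argv.append(input_file if input_file is not None else "-")
    return argv


print(rose_argv("sample2"))
print(rose_argv(None, ["defaults", "."]))  # hypothetical directories
```

	The resulting list could be handed to subprocess.run(), with the
	input text passed through its input= argument when '-' is used.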

	The input stream may contain the following parameters:

Name                    Type       Default           Optional  Comment

StdOut                  Boolean    True              Yes       output to stdout
OutputFilename          String     None              Yes       output to single filename
OutputFilebase          String     None              Yes       out to separate files named...
SequenceSuffix          String     ".fas"            Yes       sequence file suffix
AlignmentFormat         String     "PHYLIP"          Yes       "FASTA" or "PHYLIP"
AlignmentWithAncestors  Boolean    False             Yes       alignment will contain ancestors
AlignmentSuffix         String     ".fa" or ".phy"   Yes       alignment file suffix
TreeSuffix              String     ".tree"           Yes       tree file suffix
SequenceOutputLen       Integer    60                Yes       Length of Seq on a Line
SeedVal                 Integer    None              Yes       Seed of random num gen
SequenceLen             Integer    100               Yes       average sequence length
SequenceNum             Integer    10                Yes       How many sequences?
InputType               Integer    1                 Yes       1=Protein, 4=DNA
Relatedness             Integer    1                 Yes       nonsense default value!
ChooseFromLeaves        Boolean    True              Yes       Output only leaf seqs
TreeWithSequences       Boolean    False             Yes       Tree with seqs attached
TreeSequencesWithGaps   Boolean    False             Yes       Sequences in tree will contain alignment gaps
TreeWithAncestors       Boolean    False             Yes       Give all ancestors in the tree
TheTree                 Tree       None              Yes       Tree in Phylip format
TheSequence             String     None              Yes       Start Sequence
ThePAMMatrix            FP Matrix  None              No!*      The Mutation Matrix
TheAlphabet             String     None              No!       The used Alphabet
TheFreq                 FP Vector  None              No!       The average freq of Elem
TheInsertThreshold      FP         0.03              Yes       Insertion only % time
TheDeleteThreshold      FP         0.03              Yes       Deletion only % time
TheMutationProbability  FP Vector  [1.0+]            Yes       at a given site
TheDNAmodel             String     None              No!*      "JC","HKY","F81","F84","K2P"
MeanSubstitution        Double     0.01342302        Yes       Mean Subst. Rate (all)
TransitionBias          Double     1.0               Yes       needed for HKY, K2P
TTratio                 Double     0.0               Yes       Transition/Transversion (F84)
NumberOfRuns            Integer    1                 Yes       number of rose-runs
TheInsFunc              FP Vector  None              No!       Prob of certain length
TheDelFunc              FP Vector  None              No!       Prob of certain length


* Either ThePAMMatrix or TheDNAModel must be specified.
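
As an illustration of the "Optional" column, the following Python sketch
lists which mandatory parameters are absent from a set of tag names. The
function name missing_required is hypothetical, and the comparison
ignores case only because the table and the footnote capitalise the
DNA-model tag differently; which spelling Rose itself accepts is not
settled here.

```python
# Tags marked "No!" in the Optional column of the parameter table.
REQUIRED = ["TheAlphabet", "TheFreq", "TheInsFunc", "TheDelFunc"]
# Tags marked "No!*": at least one of the two has to be present.
MODEL_TAGS = ["ThePAMMatrix", "TheDNAmodel"]


def missing_required(tags):
    """Return the mandatory tags that do not appear in `tags`."""
    given = {tag.lower() for tag in tags}  # assumption: compare ignoring case
    missing = [tag for tag in REQUIRED if tag.lower() not in given]
    if not any(tag.lower() in given for tag in MODEL_TAGS):
        missing.append("ThePAMMatrix or TheDNAmodel")
    return missing


dna_defaults = ["InputType", "TheAlphabet", "TheFreq", "TheInsertThreshold",
                "TheDeleteThreshold", "TheInsFunc", "TheDelFunc", "ThePAMMatrix"]
print(missing_required(dna_defaults))
```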

Assignment
==========

{Tag} = {Value} [;]

Example:
OutputFilename = "myoutput";
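
Input files can also be generated by a script. The following Python
sketch renders values in the assignment syntax shown above; the helper
names format_value and format_assignment are hypothetical, and the
trailing semicolon, which the syntax marks as optional, is always
written.

```python
def format_value(value):
    """Render a Python value in the input-file syntax (a sketch)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "True" if value else "False"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # fixed-point notation so that the text always contains a "."
        text = f"{value:.8f}".rstrip("0")
        return text + "0" if text.endswith(".") else text
    if isinstance(value, str):
        if '"' in value or "\n" in value:
            raise ValueError("strings may not contain quotes or newlines")
        return '"' + value + '"'
    if isinstance(value, (list, tuple)):  # vectors and nested matrices
        return "[" + ",".join(format_value(item) for item in value) + "]"
    raise TypeError(f"unsupported value: {value!r}")


def format_assignment(tag, value):
    return f"{tag} = {format_value(value)};"


print(format_assignment("OutputFilename", "myoutput"))
print(format_assignment("TheFreq", [0.25, 0.25, 0.25, 0.25]))
```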

Includes
========

Include directives may be placed anywhere between complete assignments
in the input file and may be nested to a given depth.

%include {include filename}

Example:
%include protein-defaults
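
How the directories given with -I are searched is not spelled out in
this manual. The Python sketch below shows one plausible reading, a
left-to-right search that returns the first match; both that order and
the helper name resolve_include are assumptions.

```python
import os


def resolve_include(name, search_path, exists=os.path.isfile):
    """Return the first `directory/name` that exists, or None.

    search_path is a colon-separated directory list as passed to -I.
    Assumption: directories are tried from left to right.
    """
    for directory in search_path.split(":"):
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

The exists parameter is only there so the lookup can be tried without
touching the file system.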

Comments
========

Can be any of:

C type comments:	/* A comment
			stretching several lines */
C++ type comments:	//Another comment ending with the line

Bourne Shell comments:	# The hash has to be the first character on the line
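
A script that pre-processes input files may want to drop these comments
itself. The Python sketch below handles the three styles listed above
and leaves double-quoted strings untouched; it is an illustration, not
the tokenizer Rose uses, and the name strip_comments is hypothetical.

```python
import re

_TOKEN = re.compile(
    r'"[^"\n]*"'      # double-quoted string: matched first so it is kept
    r"|/\*.*?\*/"     # C comment, may stretch over several lines
    r"|//[^\n]*"      # C++ comment, ends with the line
    r"|^#[^\n]*",     # shell comment, '#' must be the first character
    re.DOTALL | re.MULTILINE,
)


def strip_comments(text):
    """Remove comments from input-file text, keeping string contents."""
    def keep_strings(match):
        token = match.group(0)
        return token if token.startswith('"') else ""
    return _TOKEN.sub(keep_strings, text)


sample = 'InputType = 4 // DNA\n# note\nA = "a//b"; /* c\nd */\n'
print(strip_comments(sample))
```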

Type Description
================

Name		Regexp like		Description		Example

Integer		{DIGIT}+		1 or more digits	123
FP		{DIGIT}+"."{DIGIT}*	FP has to have "."	3.4 or .5
		"."{DIGIT}*
Boolean		[Tt]"rue"					True or false
		[Ff]"alse"
String		\"[^\"\n]*\"		double quoted
					text no newlines	"An Example"
Vector		[\[\{]{Objects}[\]\}]				[4], {.4,5.5}
Matrix								[[3,2],[5,5]]
Tree					Phylip Tree		(a,b,(c,d:5,e));
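
The scalar types in this table can be turned into a small classifier.
The Python sketch below transcribes the regular expressions into the re
module's syntax; Boolean is read here as True/true/False/false, matching
the example column. The function name classify is hypothetical, and
Vector, Matrix and Tree are left out because they need a real parser.

```python
import re

# Order matters only for readability; full matches cannot overlap here.
PATTERNS = [
    ("FP", re.compile(r"\d+\.\d*|\.\d*")),  # as tabulated, "." alone also matches
    ("Integer", re.compile(r"\d+")),
    ("Boolean", re.compile(r"[Tt]rue|[Ff]alse")),
    ("String", re.compile(r'"[^"\n]*"')),
]


def classify(token):
    """Return the type name whose pattern matches the whole token, or None."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return None


for token in ["123", "3.4", ".5", "True", '"An Example"']:
    print(token, classify(token))
```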


Parse Errors
============

Parse errors are reported in compiler style, giving file names and line
numbers in a nested fashion together with the expected symbol.

Example:
In file included from sample1:2:
protein-defaults:13: parse error, expecting `EQ' or `OBRACE'
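
Because the messages follow the compiler convention, they are easy to
pick apart in a script. The Python sketch below is based only on the two
lines of the example; other message shapes may exist, and the name
parse_report is hypothetical.

```python
import re

INCLUDED_FROM = re.compile(r"^In file included from (?P<file>[^:\s]+):(?P<line>\d+):$")
ERROR_LINE = re.compile(r"^(?P<file>[^:\s]+):(?P<line>\d+):\s*(?P<message>.+)$")


def parse_report(lines):
    """Return the first error found, with the chain of including files."""
    chain = []
    for line in lines:
        match = INCLUDED_FROM.match(line)
        if match:
            chain.append((match["file"], int(match["line"])))
            continue
        match = ERROR_LINE.match(line)
        if match:
            return {
                "file": match["file"],
                "line": int(match["line"]),
                "message": match["message"],
                "included_from": chain,
            }
    return None


report = parse_report([
    "In file included from sample1:2:",
    "protein-defaults:13: parse error, expecting `EQ' or `OBRACE'",
])
print(report)
```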


EXAMPLES

rose sample2

Takes this input file
----------------------------------------------
# Sample2 for ISMB97 Poster

%include dna-defaults

SequenceNum = 5
ChooseFromLeaves = True
TheSequence = "AGTCTGTACTATAATGGGAGGAAAGCC"
TheTree = ((a:3,b:5):5,(c:4,d:2,e:4):5,(f:3,g:4):6,(h:3,i:3):4);
TheMutationProbability =
	[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,
	0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
----------------------------------------------
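
In the input file above, TheMutationProbability has one entry per
character of TheSequence, and the six 0.0 entries line up with the
substring TATAAT. Reading 0.0 as "this site is never mutated" is an
interpretation based on the remark about sequence motifs in the
DESCRIPTION. Under that reading, such a vector could be generated with a
Python sketch like this one (the name motif_mask is hypothetical):

```python
def motif_mask(sequence, motif, free=1.0, fixed=0.0):
    """One probability per site; the first occurrence of motif gets `fixed`."""
    start = sequence.find(motif)
    if start < 0:
        raise ValueError("motif not found in sequence")
    end = start + len(motif)
    return [fixed if start <= i < end else free for i in range(len(sequence))]


seq = "AGTCTGTACTATAATGGGAGGAAAGCC"
mask = motif_mask(seq, "TATAAT")
print("TheMutationProbability = [" + ",".join(str(p) for p in mask) + "]")
```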

includes this default file
----------------------------------------------
#
# default rose include file for DNA
#

InputType = 4 // DNA
TheAlphabet = "ACGT"
TheFreq = [.25,.25,.25,.25]

TheInsertThreshold = 0.09
TheDeleteThreshold = 0.09

TheInsFunc = [.2,.2,.2,1,1,1,1]
TheDelFunc = [.2,.2,.2,1,1,1,1]

ThePAMMatrix = [[.97,.01,.01,.01],
                [.01,.97,.01,.01],
                [.01,.01,.97,.01],
                [.01,.01,.01,.97]]
----------------------------------------------
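
The values in this default file happen to be consistent with each other:
TheFreq has one entry per letter of TheAlphabet and sums to 1, and
ThePAMMatrix is square with rows that each sum to 1. Whether Rose
requires these properties is not stated in this manual, so the Python
sketch below is only a sanity check under that assumption; the name
check_defaults is hypothetical.

```python
import math


def check_defaults(alphabet, freq, matrix):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    n = len(alphabet)
    if len(freq) != n:
        problems.append("TheFreq length differs from TheAlphabet length")
    if not math.isclose(sum(freq), 1.0, abs_tol=1e-9):
        problems.append("TheFreq does not sum to 1")
    if len(matrix) != n or any(len(row) != n for row in matrix):
        problems.append("ThePAMMatrix is not n x n")
    for index, row in enumerate(matrix):
        if not math.isclose(sum(row), 1.0, abs_tol=1e-9):
            problems.append(f"ThePAMMatrix row {index} does not sum to 1")
    return problems


pam = [[.97, .01, .01, .01],
       [.01, .97, .01, .01],
       [.01, .01, .97, .01],
       [.01, .01, .01, .97]]
print(check_defaults("ACGT", [.25, .25, .25, .25], pam))
```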

results in something like this
----------------------------------------------
#i
ACGCTGTAGTATAATGGGAGGAACGCT

#h
ACTATGTCCAATCAACTATAATGGGAGGAACCCT

#e
AGTCCGTACTATAATGGGTTCCAGGAATGC

#d
AGTCAGTACTATAATGGGTTCCAGGAAAGC

#c
AGTCCGTAATATAATGTGTTCCAGGAATCC


Alignment:
       i  ACGCTGT-------AGTATAATGGG----AGGAACGCT
       h  ACTATGTCCAATCAACTATAATGGG----AGGAACCCT
       e  AGTCCGT-------ACTATAATGGGTTCCAGGAATGC-
       d  AGTCAGT-------ACTATAATGGGTTCCAGGAAAGC-
       c  AGTCCGT-------AATATAATGTGTTCCAGGAATCC-


(
(
(
i:3,
h:3):4,
(
e:4,
d:2,
c:4):5));
----------------------------------------------
Giving you:

1. The chosen leaf sequences
2. Their alignment
3. The corresponding tree with distances
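
The sequences and the alignment in this output are related in a simple
way: removing the gap characters from an alignment row gives back the
sequence of the same name, and all rows have equal length. The Python
sketch below checks this for rows i and h copied from the output above;
treating '-' as the only gap character is an assumption drawn from that
output, and the helper names are hypothetical.

```python
def degap(row, gap="-"):
    """Remove gap characters from one alignment row."""
    return row.replace(gap, "")


def alignment_matches(sequences, alignment):
    """True if every degapped alignment row equals its sequence."""
    return all(degap(alignment[name]) == seq for name, seq in sequences.items())


sequences = {
    "i": "ACGCTGTAGTATAATGGGAGGAACGCT",
    "h": "ACTATGTCCAATCAACTATAATGGGAGGAACCCT",
}
alignment = {
    "i": "ACGCTGT-------AGTATAATGGG----AGGAACGCT",
    "h": "ACTATGTCCAATCAACTATAATGGG----AGGAACCCT",
}
print(alignment_matches(sequences, alignment))
```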

ENVIRONMENT
	No environment variables are used.

FILES
	protein-defaults	default config file to include for protein seqs
	dna-defaults		default config file to include for dna seqs

SEE ALSO

	For a complete description of the functionality of ROSE see:

	Stoye, J., Evers, D., & Meyer, F. (1998)
	Rose: generating sequence families.
	In Bioinformatics, Vol. 14, Issue 2, pp. 157-163.

http://www.oup.co.uk/bioinformatics/hdb/Volume_14/Issue_02/ps/btb005_gml.ps.gz

preprint version:
ftp://ftp.uni-bielefeld.de/pub/papers/techfak/pi/Report97-04.ps.gz

BUGS
	If you encounter strange behaviour please contact:
	mailto:folker@TechFak.Uni-Bielefeld.DE
	mailto:dirk@TechFak.Uni-Bielefeld.DE
	mailto:Jens.Stoye@CeBiTec.Uni-Bielefeld.DE
