Finding Regulatory Motifs of Genetic Networks Using Cut-Sort Algorithm
Ahmad M. Al-Omari, Mohammed H. Tawalbeh, Abedalmuhdi M. Almomany | Pages: 77-90

Abstract— Understanding the targets of regulatory genes has become a challenging problem for bioinformaticians and biologists in systems biology. The core of this problem is motif finding, that is, identifying short, recurring patterns in DNA or amino-acid sequences that presumably have a regulatory function. A motif is considered a signature of a protein family that binds to the corresponding sequence in the genome. The major difficulty in finding motifs arises from the fact that, most of the time, motifs are not well conserved, and discovering such degenerate motifs by aligning multiple sequences is itself hard. Usually, a motif discovery algorithm relies on some prior information about the motifs to be discovered. In this paper, we present a novel algorithm for finding conserved sequence motifs in DNA without a priori knowledge about the motifs. Although presented for DNA, the algorithm can be applied to sequence motifs in both DNA and proteins. It works by cutting sequences that contain conserved motifs into equal-length fragments, sorting the fragments, and then extending each fragment in both directions. The algorithm runs in a very short time: it takes 5.5 seconds on a real data sequence of length N = 28,000 nucleotides to find its identical, degenerate, long, and short motifs. It can also be easily parallelized by implementing it on general-purpose graphics processing units (GPGPUs). The algorithm is guaranteed to find a globally optimal solution within a short time, even for sequences with very long motifs.
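The cut, sort, and extend steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the fragments are cut or how degenerate matches are scored, so the sketch assumes every overlapping window of a fixed length `k` is a fragment and handles exact repeats only. The names `cut`, `extend`, and `find_repeats` are hypothetical.

```python
from itertools import groupby


def cut(sequence, k):
    """Cut the sequence into every length-k fragment, keeping start positions.

    Assumption: overlapping windows; the paper may cut differently.
    """
    return [(sequence[i:i + k], i) for i in range(len(sequence) - k + 1)]


def extend(sequence, starts, k):
    """Grow a shared fragment left and right while all occurrences agree."""
    left = 0
    # Extend to the left while every occurrence has the same preceding base.
    while (all(s - left - 1 >= 0 for s in starts)
           and len({sequence[s - left - 1] for s in starts}) == 1):
        left += 1
    right = 0
    # Extend to the right while every occurrence has the same following base.
    while (all(s + k + right < len(sequence) for s in starts)
           and len({sequence[s + k + right] for s in starts}) == 1):
        right += 1
    first = starts[0]
    motif = sequence[first - left:first + k + right]
    return motif, [s - left for s in starts]


def find_repeats(sequence, k):
    """Return {motif: start positions} for exact repeats of length >= k."""
    # Sorting places identical fragments next to each other.
    fragments = sorted(cut(sequence, k))
    motifs = {}
    for _, group in groupby(fragments, key=lambda item: item[0]):
        starts = [pos for _, pos in group]
        if len(starts) < 2:
            continue  # a fragment seen only once is not a repeat
        motif, positions = extend(sequence, starts, k)
        # Several seed fragments can extend to the same motif; merge them.
        merged = set(motifs.get(motif, [])) | set(positions)
        motifs[motif] = sorted(merged)
    return motifs
```

For example, `find_repeats("GGACGTACTTACGTACCA", 4)` returns `{"ACGTAC": [2, 10]}`: the three repeated 4-mers `ACGT`, `CGTA`, and `GTAC` all extend to the same 6-base motif. Sorting dominates the cost here, and because each group of identical fragments is extended independently, that step is the natural candidate for the parallelization the abstract mentions.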