pan_matrix.txt (3.88 MB)

Pan Matrix data

dataset

posted on 2013-12-13, 14:05 authored by Lars Snipen, David Ussery

The file pan_matrix.txt is a large tab-separated table in which each row corresponds to a genome and each column to a domain sequence family. The rows are named by BIOID code; see map_ecoli.txt to look up the strain names. The columns are named Cluster 1, Cluster 2, and so on. The Pfam-A domain sequence corresponding to each cluster is given in the file cluster_info.txt (see below). Cell (i,j) of the table holds the number of occurrences of domain sequence j in genome i.
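As a rough illustration of the layout described above, the following Python sketch parses a table of this shape using only the standard library. It is not the authors' code, and several details are assumptions: that the first line is a header holding the cluster names, that the first field of every later line is the row name, and that all remaining fields are integer counts. The function name `read_pan_matrix`, the row labels `GID1` and `GID2`, and the counts in the sample are made up for the example and are not taken from the real file.

```python
import csv
import io


def read_pan_matrix(handle):
    """Parse a tab-separated pan-matrix into (genomes, clusters, counts).

    Assumed layout: one header line with cluster names, then one line per
    genome whose first field is the row name and whose other fields are counts.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genomes, counts = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        genomes.append(row[0])
        counts.append([int(value) for value in row[1:]])
    n_cols = len(counts[0]) if counts else 0
    # The header may or may not include a corner cell above the row names,
    # so keep only the last n_cols header fields.
    clusters = header[-n_cols:] if n_cols else []
    return genomes, clusters, counts


# Hypothetical two-genome, three-cluster sample (not real data).
sample = (
    "\tCluster 1\tCluster 2\tCluster 3\n"
    "GID1\t1\t0\t2\n"
    "GID2\t0\t0\t1\n"
)
genomes, clusters, counts = read_pan_matrix(io.StringIO(sample))

# Cell (i, j): occurrences of domain sequence family j in genome i.
i = genomes.index("GID1")
j = clusters.index("Cluster 3")
cell = counts[i][j]

# Number of genomes in which each cluster occurs at least once.
genomes_with_cluster = [sum(1 for row in counts if row[k] > 0)
                        for k in range(len(clusters))]
```

To read the real file, the same function could be called as `read_pan_matrix(open("pan_matrix.txt", newline=""))`, provided the file follows the assumed layout. Slicing the header from the right is a deliberate choice so that the sketch works whether or not the header line has a field above the row names.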