Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but it can be any collection of data on which analysis can be performed.
The file cluster_info.txt is a table with two columns. Each row corresponds to one domain sequence family. The first column gives the family name: Cluster 1, Cluster 2, and so on. The second column lists the Pfam-A domains of that family in their order of occurrence.
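As a rough illustration of how the two-column table could be read, the following Python sketch maps each cluster name to its ordered list of Pfam-A domains. The description does not state the file's delimiters, so the tab between columns and the whitespace between domains are assumptions, as is the absence of a header row. The function name `parse_cluster_info`, its parameters, and the sample rows are hypothetical and used only for illustration; the actual file may identify domains differently.

```python
from typing import Dict, Iterable, List, Optional


def parse_cluster_info(
    lines: Iterable[str],
    column_sep: str = "\t",            # ASSUMPTION: columns are tab-separated
    domain_sep: Optional[str] = None,  # ASSUMPTION: None means split on whitespace
) -> Dict[str, List[str]]:
    """Map each cluster name to its ordered list of Pfam-A domains."""
    clusters: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # First column: cluster name; everything after the separator: domains.
        name, _, domains = line.partition(column_sep)
        # Keep the domains in file order, since the order is meaningful.
        clusters[name.strip()] = [
            d.strip() for d in domains.split(domain_sep) if d.strip()
        ]
    return clusters


# Made-up rows for illustration only; these are not taken from the real file.
sample = [
    "Cluster 1\tPF00001 PF00002\n",
    "Cluster 2\tPF00003\n",
]
parsed = parse_cluster_info(sample)
print(parsed["Cluster 1"])
```

To try this on the real file, an open file handle can be passed in place of `sample`, for example `parse_cluster_info(open("cluster_info.txt"))`, after adjusting `column_sep` and `domain_sep` to match the delimiters actually used.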