The benchmark dataset is available from the link below.
The dataset contains 2,141 DNA segments in total. Of these, 741 are σ70 promoter sequences of E. coli K-12. The remaining 1,400 are randomly chosen non-promoter sequences extracted from the coding and intergenic regions of the E. coli K-12 genome. Every sequence is 81 bp long: 60 bp upstream of the TSS (transcription start site), the TSS itself, and 20 bp downstream [1, 2].
Click Here
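As a quick sanity check on the layout described above, the following is a minimal sketch that parses sequences and splits each one into its upstream region, TSS base, and downstream region. It assumes the download is in FASTA format, which is not stated here, and that the TSS sits at 0-based index 60; the function names parse_fasta and split_at_tss are hypothetical and only for illustration.

```python
from typing import Dict, Iterable, Tuple

UPSTREAM = 60    # bases before the TSS
DOWNSTREAM = 20  # bases after the TSS
SEGMENT_LENGTH = UPSTREAM + 1 + DOWNSTREAM  # 81 bp, counting the TSS base


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-style lines into {header: sequence} (format is an assumption)."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records


def split_at_tss(seq: str) -> Tuple[str, str, str]:
    """Return (upstream, tss_base, downstream) for one 81 bp segment."""
    if len(seq) != SEGMENT_LENGTH:
        raise ValueError(f"expected {SEGMENT_LENGTH} bp, got {len(seq)}")
    return seq[:UPSTREAM], seq[UPSTREAM], seq[UPSTREAM + 1:]


if __name__ == "__main__":
    # Synthetic record, not taken from the dataset.
    demo = [">example", "A" * 60 + "G" + "T" * 20]
    records = parse_fasta(demo)
    upstream, tss, downstream = split_at_tss(records["example"])
    print(len(upstream), tss, len(downstream))  # 60 G 20
```

Run over the real file, the same check should report 2,141 records of 81 bp each if the description above holds; any sequence of a different length raises a ValueError.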
[1] Hao Lin, Zhi-Yong Liang, Hua Tang, and Wei Chen. Identifying sigma70 promoters with novel pseudo nucleotide composition. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017.
[2] Socorro Gama-Castro, Heladia Salgado, Alberto Santos-Zavaleta, Daniela Ledezma-Tejeida, Luis Muñiz-Rascado, Jair Santiago García-Sotelo, Kevin Alquicira-Hernández, Irma Martínez-Flores, Lucia Pannier, Jaime Abraham Castro-Mondragón, et al. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Research, 44(D1):D133–D143, 2015.