Roughly 340 terabytes of redundant image data are clogging the servers of Dhaka City Corporation's central digital repository, according to an internal audit completed in March 2026 — a figure that represents nearly 38 percent of the entire storage infrastructure maintained by the Nagar Bhaban administrative complex on Fulbaria Road. The audit, which covered digitisation work carried out between 2019 and early 2026, found that duplicate images — identical or near-identical scanned documents filed multiple times across departments — account for the single largest category of wasted storage in the city's municipal digital system.
The timing matters. Bangladesh's Digital Bangladesh initiative, which shifted enormous volumes of land records, birth certificates and urban planning documents into electronic formats over the past decade, created vast new archives. But the push to digitise quickly, often without unified file-naming conventions or deduplication protocols, left institutions holding multiple copies of the same scans. With the government's successor programme, Smart Bangladesh 2041, now demanding interoperability between city and national databases, the duplicate problem has become an active obstacle rather than a background inefficiency.
Where the Data Piles Up
Two institutions stand out in the audit's findings. The Dhaka Deputy Commissioner's office at Segunbagicha, which handles land mutation and property registration records, was found to carry duplicate image rates above 41 percent in its scanned document folders as of February 2026. The Bangladesh National Archives facility in Agargaon reported a lower but still significant rate of around 22 percent duplication across its digitised photographic collections, spanning materials from the Liberation War period through to urban planning maps from the 1990s.
Storage is not cheap. Server capacity rented through the government's National Data Center in Kaliakoir costs institutions approximately Tk 4,200 per terabyte per year under current procurement rates. At 340 terabytes of pure duplication in the DCC system alone, that works out to 340 × Tk 4,200, or roughly Tk 14.3 lakh (about Tk 0.14 crore), in annual spending on storage that holds no unique information. The audit identifies 12 other city-level agencies with duplicate rates above 20 percent, so the aggregate waste is considerably higher, though the full cross-agency total has not yet been formally published.
The problem is not unique to government. Dhaka's private media sector has its own version of it. Several Bangla-language news portals operating out of Karwan Bazar — the city's main media district — have reported that their content management systems contain duplicate feature images numbering in the hundreds of thousands, accumulated over years of editorial handoffs and platform migrations. For commercial operators paying cloud storage fees to international providers, the cost per terabyte is considerably higher than government procurement rates.
What Deduplication Actually Involves
The process of identifying and removing duplicate images is more technically involved than simply deleting copied files. Hash-based deduplication software computes a digital fingerprint, known as a checksum or hash, for every image file and flags files whose fingerprints match exactly. Near-duplicate detection, which catches the same photograph saved at different resolutions or compression levels, requires more computationally intensive perceptual hashing algorithms that compare what an image looks like rather than its raw bytes. The Bangladesh Computer Council, headquartered in Agargaon, has been running a pilot deduplication programme since January 2026 covering three participating agencies, with results expected to be presented to the ICT Division before September.
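To make the exact-match step concrete, here is a minimal Python sketch of hash-based duplicate detection. It is an illustration only, not the tool used in the audit or the BCC pilot. The function names and the choice of SHA-256 are assumptions made for this example, and it covers byte-identical files only; near-duplicates saved at a different resolution or compression level would need image decoding and a perceptual hash, which is not shown.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical.

    Hypothetical helper for illustration: files are first bucketed by size,
    because files of different sizes cannot be identical, and only the
    buckets holding two or more files are hashed.
    """
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size means no possible exact duplicate
        for path in same_size:
            by_hash[file_sha256(path)].append(path)

    # Keep only the groups that actually contain duplicates.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_exact_duplicates(Path("/path/to/scans"))` on a folder of scans (the path here is a placeholder) returns one list per set of identical files. Deciding which copy to keep is a separate policy question, since the same scan filed in two departments may be referenced by different records.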
Early results from that pilot, which covered approximately 18 terabytes of test data at the Department of Immigration and Passports, reportedly show the active image library shrinking by 29 percent after a single deduplication pass, though the BCC has not yet released those figures officially.
For city institutions looking to act before formal guidance arrives, the practical path forward involves three steps: commissioning a storage audit using open-source tools such as dupeGuru or rdfind, establishing a single canonical file-naming standard before any new scanning contracts are signed, and building deduplication checkpoints into procurement agreements with document management vendors. The DCC's March audit recommends that all new digitisation tenders issued after July 2026 include mandatory deduplication clauses — a small administrative change that could prevent the next decade of redundancy from accumulating in the first place.