HASH VALUE: AUTHENTICATION AND ADMISSIBILITY IN INDIAN PERSPECTIVE
Hash values play a significant role in establishing the authenticity and integrity of data and evidence in the digital world, particularly in cryptography, data analysis and forensic imaging. The hash value, popularly known as the fingerprint of data, is the single most crucial factor: it not only authenticates the integrity of data but also plays a crucial role in validating the forensic processes and equipment used for forensic examination. This article seeks to explain the various aspects of the hash value, such as the hashing algorithm, its uniqueness and the applicable standards, which play an important role in proving forensic electronic evidence in a court of law and can be a determining factor in deciding the fate of a case. It covers the vital issues of which a legal counsel or jurist must be aware when dealing with digital evidence, with particular reference to the provisions of the Information Technology Act, 2000.
“The forensic sciences require adherence to standards of operation and of performance. These standards must be clearly enunciated and must be, at least in their basic form, the consensus of opinion of workers in that particular subject area. Stated differently, forensic scientists are not entitled to indulge whims in the conduct of their work. A forensic scientist who adopts an extreme position that runs counter to the flow of prevailing opinion on a subject, or who enters an area in which operational norms have not been established has a burden even greater than usual to justify that position in the light of good scientific practice.” (Thornton, J. I. (1997), “The General Assumptions and Rationale of Forensic Identification,” in David L. Faigman, David H. Kaye, Michael J. Saks, & Joseph Sanders, Editors, Modern Scientific Evidence: The Law and Science of Expert Testimony, Volume 2, St. Paul, MN: West Publishing Company.)
Electronic evidence is becoming more commonplace in civil and criminal cases. The hash value is an important tool for examining, discovering and authenticating electronic evidence. It is arguably the most significant and most widely accepted means of authenticating electronic evidence, and it is admissible in courts of law across the world. “A “hash value” is an electronic fingerprint. The data within a file is represented through the cryptographic algorithm as that hash value.” The hash value is an important tool to identify and authenticate digital evidence, and can thus be used in a court of law to rebut an allegation that the exhibited electronic data has been altered. The admissibility of hash values has increased over the years because they identify data with a high degree of accuracy and can confirm whether two records or files match or differ.
“Managing Discovery of Electronic Information: A Pocket Guide for Judges” defines “hash value” as:-
“A unique numerical identifier that can be assigned to a file, a group of files, or a portion of a file, based on a standard mathematical algorithm applied to the characteristics of the data set. The most commonly used algorithms, known as MD5 and SHA, will generate numerical values so distinctive that the chance that any two data sets will have the same hash value, no matter how similar they appear, is less than one in one billion. ‘Hashing’ is used to guarantee the authenticity of an original data set and can be used as a digital equivalent of the Bates stamp used in paper document production.”
Hashing is the process of mapping a large amount of data to a much smaller, fixed-size value with the help of a hashing function or algorithm. A hashing algorithm transforms an arbitrarily long block of data into a large number of fixed length. This large number, called the hash value, has a few distinctive characteristics:
- It has no correlation with the original data from which it is derived and nothing regarding the original data can be inferred from it.
- Slight changes in the original data produce large, seemingly random changes in the hash value, so that the new value looks altogether different.
- Generated hash values are evenly scattered over the wide range of possible values (i.e., all possible values are equally likely to occur).
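The second characteristic above can be illustrated with a minimal sketch. It assumes Python's standard `hashlib` module and SHA-256 as the algorithm; the two sample messages and the helper names `sha256_hex` and `differing_bits` are illustrative and do not come from any forensic tool.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash value of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bit positions differ between two hexadecimal hash values."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Two messages that differ in a single character ("dog" versus "cog").
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

h1 = sha256_hex(original)
h2 = sha256_hex(altered)

print(h1)
print(h2)
print(differing_bits(h1, h2), "of 256 bits differ")
```

Running the sketch prints two hash values that bear no visible resemblance to each other, although the inputs differ by one letter.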
The most widely used hash functions are MD5 and SHA. MD5 was developed by Professor Ronald L. Rivest of MIT. The MD5 algorithm takes as input a message of arbitrary length and produces as output a 128-bit fingerprint, or message digest, of the input, which is used to verify data integrity and is claimed to be as unique to that specific data as a fingerprint is to a specific individual. SHA stands for Secure Hash Algorithm. The SHA hash functions are a set of cryptographic hash functions designed by the National Security Agency (NSA). The five algorithms are denoted SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512. SHA-1 produces a message digest that is 160 bits long; the numbers in the other four algorithms’ names denote the bit length of the digest they produce, and these four are collectively known as SHA-2. SHA-1 is similar to the MD4 and MD5 algorithms developed by Rivest, but it is slightly slower and more secure. Practical collisions have since been demonstrated for both MD5 and SHA-1, which is why the SHA-2 family is generally preferred for new work.
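The digest lengths stated above can be checked with a short sketch, assuming Python's standard `hashlib` module. The helper name `digest_bits` is illustrative, and on systems configured to disable MD5 the first entry may raise an error.

```python
import hashlib


def digest_bits(algorithm: str) -> int:
    """Return the length, in bits, of the digest produced by the named algorithm."""
    return hashlib.new(algorithm).digest_size * 8


# The algorithm names are those accepted by hashlib for MD5, SHA-1 and SHA-2.
bit_lengths = {
    name: digest_bits(name)
    for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
}

for name, bits in bit_lengths.items():
    print(f"{name:>6}: {bits} bits")
```

The output length is fixed for each algorithm and does not depend on the size of the input message.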
Uses of Hashing in Forensic Process
Hash values are used during different phases of the forensic process:-
- A hash value is used to ensure that the examined copy or mirror image is an exact replica of the original. The basic principle of forensic examination of electronic evidence is that the examination is never conducted on the original evidence, except in exceptional circumstances. An image is used during the forensic examination in order to preserve the integrity of the original evidence. A hash value of the imaged copy is taken before any examination and matched with the hash value of the original evidence; if the hash values are the same, the copy is treated the same as the original.
- The pre-acquisition hash is computed to preserve the authenticity and integrity of the evidence when it is seized or received for examination. It can be re-calculated and compared at any later stage to establish that there has been no alteration of or tampering with the evidence. It also verifies the authenticity of the process used for forensic imaging: a match between the pre-acquisition hash and the post-acquisition hash authenticates and validates the imaging process as well as the hardware and software used for it. This establishes evidentiary value by proving the veracity, authenticity and integrity of both the forensic process and the evidence item. The pre-acquisition hash value can be used by the court or the defense to confirm or contradict the version put forward by the prosecution or the investigators.
- A mismatch between the acquisition hash and the hash of the source image file may have serious ramifications: the examiner’s technical knowledge and expertise would be challenged at trial, and the authenticity and validation of the forensic software would be called into question when evidence is led. The authenticity and integrity of the outcome of the forensic examination would also be challenged, because a change in the MD5 hash value implies that the information on the destination media differs from that on the source media. There may be a number of reasons for a mismatch, such as the destination media being larger than the source media, or a failure of the forensic software or hardware during the acquisition process.
- During forensic examination, the first hash is calculated at the time of acquisition of the evidence, so that its authenticity and integrity can be established in a court of law by comparing that hash value with one calculated at any later stage or before the court. The second hash value is calculated on the image; compared with the hash value of the original evidence, it proves the authenticity of the image so prepared and of any evidence or artifacts subsequently extracted from it. The third hash value is calculated on the original media after imaging, to establish that the original media was not altered in the preparation of the forensic backup. Further, a hash value can be calculated at any stage, before or after examination, which establishes the authenticity not only of the examination process but also of its result.
- During the forensic examination process, hash values can be used to eliminate files that are not relevant and would unnecessarily delay the examination, such as operating system files, which the investigator can ignore. Conversely, if the data sought to be recovered is already known to the investigator, then rather than searching everything, the investigator can search for the identified files by their hash values, for example existing images of child pornography or documents containing proprietary data.
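The three-hash comparison and the known-file filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular forensic product: it uses Python's standard library, takes SHA-256 as the algorithm, lets an ordinary file copy stand in for forensic imaging, and uses hypothetical file names and helper names (`hash_file`, `acquisition_is_verified`, `split_known_files`).

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so that large images need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def acquisition_is_verified(pre_hash, image_hash, post_hash):
    """True only if source-before, image, and source-after hashes all match."""
    return pre_hash == image_hash == post_hash


def split_known_files(paths, known_hashes):
    """Separate files whose hash is in a known set from all other files."""
    known, unknown = [], []
    for p in paths:
        (known if hash_file(p) in known_hashes else unknown).append(p)
    return known, unknown


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.bin"   # hypothetical original evidence
    image = Path(tmp) / "image.dd"      # hypothetical forensic image
    source.write_bytes(b"evidence bytes " * 1000)

    pre_hash = hash_file(source)        # first hash: at acquisition
    shutil.copyfile(source, image)      # stand-in for the imaging step
    image_hash = hash_file(image)       # second hash: of the image
    post_hash = hash_file(source)       # third hash: original after imaging
    verified = acquisition_is_verified(pre_hash, image_hash, post_hash)

    # Simulate a faulty acquisition: one extra trailing byte on the image.
    with open(image, "ab") as f:
        f.write(b"\x00")
    tampered_matches = hash_file(image) == pre_hash

    # Known-file filtering: set aside a file whose hash is already on record.
    os_file = Path(tmp) / "system.dll"
    doc = Path(tmp) / "notes.txt"
    os_file.write_bytes(b"stock operating system file")
    doc.write_bytes(b"case-specific document")
    known_hashes = {hash_file(os_file)}
    known, unknown = split_known_files([os_file, doc], known_hashes)

print("acquisition verified:", verified)
print("altered image still matches:", tampered_matches)
print("set aside:", [p.name for p in known], "examine:", [p.name for p in unknown])
```

The same `split_known_files` helper serves both purposes described in the last point: with a set of hashes of stock operating system files it excludes irrelevant material, and with a set of hashes of files being sought it locates them.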
Admissibility in Court of Law
- Hash values can be used to authenticate evidence in a court of law as well as during the discovery process. One method of authenticating electronic evidence under Rule 901(b)(4) of the US Federal Rules of Evidence is the use of “hash values” or “hash marks” when making documents. A “hash value” is an alphanumeric string that serves to identify an individual digital file as a kind of “digital fingerprint.” Although it may be possible for two digital files to have hash values that “collide,” or overlap, it is unlikely that the values of two dissimilar images will do so. United States v. Cartier, 543 F.3d 442, 446 (8th Cir. 2008). In the present case, the district court found that files with the same hash value have a 99.99 percent probability of being identical.
- Commenting on the role of hash values in the identification of the impugned data by the investigating agency on a peer-to-peer network, and on their admissibility in court, the court observed that “the hash value is a reference code comprising a string of letters and numbers, which is used to identify each piece of the content to be shared. This enables the tracker to recognize pieces of the content file as they are shared and is intended to ensure that the content files are correctly downloaded and unmodified.”
- A court dealing with the choice between keyword search and search by hash value observed that limiting the search to hash values would unduly restrict the search of digital devices: “A file can be mislabeled; its extension (a sort of suffix indicating the type of file) can be changed; it can actually be converted to a different file type (just as a chat transcript can be captured as an image file, so can an image be inserted into a word-processing file and saved as such). Any of these manipulations could change a document’s hash value. And in any event a limited hash-value search would not have turned up any chat transcripts (which, again, can be saved as image files).”
- A hash value can be used to authenticate the integrity of data exchanged between parties, since any alteration would result in a change in the hash value. Dealing with an exchange of data accompanied by allegations of manipulation levelled by the opposite party, the court observed that “Defendant’s concerns regarding maintaining the integrity of the spreadsheet’s values and data could have been addressed by the less intrusive and more efficient use of “hash marks.” For example, Defendant could have run the data through a mathematical process to generate a shorter symbolic reference to the original file, called a “hash mark” or “hash value,” that is unique to that particular file. This “digital fingerprint” akin to a tamper evident seal on a software package would have shown if the electronic spreadsheets were altered. When an electronic file is sent with a hash mark, others can read it, but the file cannot be altered without a change also occurring in the hash mark. The producing party can be certain that the file was not altered by running the creator’s hash mark algorithm to verify that the original hash mark is generated. This method allows a large amount of data to be self-authenticating with a rather small hash mark, efficiently assuring that the original image has not been manipulated.”
- The Information Technology Act, 2000 also supports the internationally accepted hash function as a unique and reliable method of authenticating the integrity of data, as emerges from the Explanation to Section 3, which provides:-
Explanation.– For the purposes of this sub-section, “hash function” means an algorithm mapping or translation of one sequence of bits into another, generally smaller, set known as “hash result” such that an electronic record yields the same hash result every time the algorithm is executed with the same electronic record as its input making it computationally infeasible–
a. to derive or reconstruct the original electronic record from the hash result produced by the algorithm;
b. that two electronic records can produce the same hash result using the algorithm.
Section 3 further provides that the authentication of an electronic record shall be effected by the use of an asymmetric crypto system and hash function.
- Rules 3, 4 and 5 of the Information Technology (Certifying Authorities) Rules, 2000 provide for the use of the hash function in the authentication of information by digital signature and in the creation and verification of digital signatures. They further provide that verification establishes that the electronic record was unaltered, which is known to be the case if the hash result computed by the verifier is identical to the hash result extracted from the digital signature during the verification process. Rule 6 of the Information Technology (Certifying Authorities) Rules, 2000 recognizes MD5 and SHA-2 as the accepted standard digital hash functions.
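The comparison step in verification, where the hash result computed by the verifier is matched against the hash result extracted from the digital signature, can be sketched as below. The sketch assumes Python's standard library and SHA-256 as a representative of the SHA-2 family. It deliberately omits the asymmetric crypto system: the variable `extracted_hash` is only a placeholder for the value that would be recovered from the signature, and the sample record and helper names are illustrative.

```python
import hashlib
import hmac


def hash_result(record: bytes) -> bytes:
    """Compute the hash result of an electronic record (SHA-256 assumed)."""
    return hashlib.sha256(record).digest()


def record_is_unaltered(record: bytes, extracted_hash: bytes) -> bool:
    """Compare the verifier's own hash result with the one from the signature."""
    return hmac.compare_digest(hash_result(record), extracted_hash)


signed_record = b"Agreement dated 1 January: pay Rs. 1,000"

# Placeholder: in a real digital signature this value would be recovered
# from the signature using the signer's public key.
extracted_hash = hash_result(signed_record)

received_ok = record_is_unaltered(signed_record, extracted_hash)
received_altered = record_is_unaltered(
    b"Agreement dated 1 January: pay Rs. 9,000", extracted_hash
)

print("unaltered record verifies:", received_ok)
print("altered record verifies:", received_altered)
```

A change of a single digit in the record is enough for the two hash results to differ, so the altered record fails the comparison.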
Thus, the hash value is an internationally accepted and scientifically attested means of establishing the reliability and authenticity of electronic evidence, and has been recognized as admissible by courts in the USA as well as in various other digitally advanced countries. The provisions of the Information Technology Act, 2000 and the Rules framed under it also recognize the hash value as unique, and MD5 and SHA-2 as the standard hash functions attuned to international standards. How far these are used in investigation and digital forensics, and how far they are admitted in Indian courts, is yet to be seen, and there is still a long journey ahead; but the burst in cyber offences over the last few years hardly leaves any choice for the investigating agencies and forensic institutions.
L-3 Communications Westwood Corp. v. Joseph Emile Robicheaux, Jr. et al., No. 06-0279
USA v. Wellman, No. 10-4689
Dramatico Entertainment Ltd. v. British Sky Broadcasting Ltd., [2012] EWHC 268 (Ch)
Lorraine v. Markel, ESI Admissibility Opinion, No. PWG-06-1893
Shirley Williams v. Sprint/United Management Company, No. CIV.A.03-2200-JWL-DJW