Coming up with a definition for data turned out to be one of our early challenges.
The term “data” is used in a variety of ways depending on field and use. A computer scientist might use the term to refer to the flow of zeros and ones that streams through data cables to transmit videos, web pages, and the like, or to the aggregate of all the information available on the World Wide Web. A statistician or survey researcher might think of a numeric dataset structured for use in a statistical package. In general, we will take an intermediate approach. Content on the web, video streams, survey responses, and engineering measurements constitute potential sources of data for researchers, but the data we are concerned with here are the product of taking that raw informational input and assembling it into a structured form for analysis. In other words, data are a product of researchers as well as an input for research. Research data collections (or datasets) are generally in electronic form and are accompanied by, or incorporate, metadata: documentation that describes the structure and content of the data.
In brief, the term “data,” as used in our book, describes electronic files of information that have been collected systematically, structured, and documented to serve as input for further research.
Summarized from the introduction to Databrarianship: The Academic Data Librarian in Theory and Practice.