Glossary Term

Data

Etymology and Terminology
- The word 'data' is the plural of 'datum,' meaning 'thing given' in Latin.
- The first English use of the word 'data' was in the 1640s.
- The term 'data processing' was first used in 1954.
- In everyday language and technical fields, 'data' is often used as a mass noun.
- Some style guides recognize the different meanings of the term, while others recommend the form that suits the target audience.

Meaning
- Data, information, knowledge, and wisdom are closely related concepts.
- Data becomes information after it has been analyzed.
- The extent to which data is informative depends on its unexpectedness.
- Knowledge is the awareness of the environment possessed by an entity.
- Data is often considered the least abstract concept, while knowledge is the most abstract.

Types of Data
- Data can be discrete or continuous.
- It can describe quantity, quality, facts, statistics, or other basic units of meaning.
- Data can be represented as numbers or characters.
- Field data is collected in an uncontrolled environment, while experimental data is generated in a controlled scientific experiment.
- Data sets include price indices, unemployment rates, literacy rates, and census data.

Data Analysis
- Data is analyzed using techniques such as calculation, reasoning, discussion, presentation, and visualization.
- Raw data is typically cleaned before analysis.
- Outliers and errors are removed from raw data.
- Data analysis can yield insights and intelligence.
- Data science uses machine learning and AI methods for efficient analysis of big data.

Data Collection and Longevity
- Data can be gathered through primary or secondary sources.
- Primary sources involve the researcher obtaining the data firsthand.
- Secondary sources involve the researcher obtaining data that has already been collected by other sources.
- Data analysis methodologies include data triangulation and data percolation.
- The longevity of data is an important concern in computer science, technology, and library science.
- Scientific research generates large amounts of data, but storing it on hard drives or optical discs may lead to unreadability after a few decades.
- Data accessibility is a problem, as much scientific data is never published or deposited in data repositories.
- Surveys have shown that the likelihood of retrieving data decreases over time after publication.
- The requirement for FAIR data (Findable, Accessible, Interoperable, and Reusable) aims to improve the reproducibility of research.
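
Example: cleaning raw data

The cleaning step described under Data Analysis (removing errors and outliers from raw data before analysis) can be illustrated with a short Python sketch. This is only one possible approach, not a method prescribed by this glossary entry. The function name `clean`, the sample readings, and the choice of the interquartile range (IQR) rule with a multiplier of 1.5 are all assumptions made for illustration.

```python
import math
import statistics


def clean(raw, k=1.5):
    """Drop unusable entries, then drop outliers using the IQR rule.

    `k` is the IQR multiplier; 1.5 is a common convention, assumed here.
    """
    # Step 1: remove errors (missing values, non-numeric entries, NaN/inf).
    valid = []
    for x in raw:
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            continue
        if not math.isfinite(x):
            continue
        valid.append(float(x))

    # Too few points to estimate quartiles meaningfully; return as-is.
    if len(valid) < 4:
        return valid

    # Step 2: remove outliers that fall outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, _, q3 = statistics.quantiles(valid, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in valid if low <= x <= high]


# Hypothetical raw readings: two unusable entries, one NaN, one outlier (55.0).
raw = [10.1, 9.8, None, 10.3, "n/a", 10.0, 55.0, float("nan"), 9.9, 10.2]
cleaned = clean(raw)
print(cleaned)  # [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
```

Whether a value such as 55.0 is a genuine error or a meaningful observation depends on the data set, so in practice the threshold and the rule itself should be chosen to suit the domain.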