Glossary Term
Data
Etymology and Terminology
- The word 'data' is the plural of 'datum,' meaning 'thing given' in Latin.
- The first English use of the word 'data' was in the 1640s.
- The term 'data processing' was first used in 1954.
- In everyday language and technical fields, 'data' is often used as a mass noun.
- Style guides differ: some distinguish between the singular and plural senses of the term, while others simply recommend whichever form best suits the target audience.
Meaning
- Data, information, knowledge, and wisdom are closely related concepts.
- Data is said to become information once it has been analyzed or viewed in context.
- The extent to which data is informative depends on its unexpectedness.
- Knowledge is an entity's awareness of its environment.
- Data is often considered the least abstract of these concepts, information the next least, and knowledge the most abstract.
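The link between informativeness and unexpectedness can be made concrete with Shannon's measure of information, in which an outcome of probability p carries log2(1/p) bits. The following is a minimal sketch under that assumption; the function names `surprisal` and `entropy` are illustrative and do not come from this glossary.

```python
import math
from collections import Counter


def surprisal(p: float) -> float:
    """Information content, in bits, of an outcome with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1 / p)


def entropy(symbols) -> float:
    """Average surprisal (Shannon entropy, in bits per symbol) of a sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Weight each symbol's surprisal by its observed relative frequency.
    return sum((n / total) * surprisal(n / total) for n in counts.values())
```

For instance, `entropy("aaaa")` is zero because every symbol is fully expected, while `entropy("abab")` is one bit per symbol, the same as a fair coin flip.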
Types of Data
- Data can be discrete or continuous.
- It can describe quantity, quality, facts, statistics, or other basic units of meaning.
- Data can be represented as numbers or characters.
- Field data is collected in an uncontrolled environment, while experimental data is generated in a controlled scientific experiment.
- Examples of data sets include price indices, unemployment rates, literacy rates, and census data.
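The distinction between discrete, continuous, and qualitative data can be illustrated with a rough heuristic that labels a single observation by its Python type. This is only a sketch: the function name `describe` is hypothetical, and type is an imperfect proxy, since a floating-point value may still record a discrete quantity.

```python
from numbers import Integral, Real


def describe(value) -> str:
    """Roughly classify one observation by its Python type (a heuristic only)."""
    if isinstance(value, bool):
        return "qualitative"  # True/False is a category, not a quantity
    if isinstance(value, Integral):
        return "discrete"     # countable values, e.g. household size
    if isinstance(value, Real):
        return "continuous"   # measured values, e.g. temperature
    if isinstance(value, str):
        return "qualitative"  # labels or free text made of characters
    return "unknown"
```

Under this heuristic, `describe(3)` gives "discrete", `describe(21.5)` gives "continuous", and `describe("blue")` gives "qualitative".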
Data Analysis
- Data is analyzed using techniques such as calculation, reasoning, discussion, presentation, and visualization.
- Raw data is typically cleaned before analysis.
- Cleaning removes outliers and obvious instrument or data-entry errors from the raw data.
- Data analysis can yield insights and intelligence.
- Data science uses machine learning and other AI methods to analyze big data efficiently.
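The cleaning step described above can be sketched as follows. The glossary does not name a specific method, so this example assumes one common convention, Tukey's interquartile-range rule, for flagging outliers; the function name `clean` and the multiplier `k` are illustrative choices.

```python
import math
import statistics


def clean(raw, k: float = 1.5):
    """Drop unparseable entries, then drop values outside Tukey's IQR fences."""
    values = []
    for item in raw:
        try:
            number = float(item)
        except (TypeError, ValueError):
            continue  # treat unparseable entries as data-entry errors
        if math.isfinite(number):
            values.append(number)  # skip NaN and infinite readings
    if len(values) < 4:
        return values  # too few points to estimate quartiles sensibly
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]
```

Given readings such as `[10.1, 9.8, "10.3", "n/a", None, 10.0, 9.9, 250.0, 10.2]`, the unparseable entries are discarded and the value 250.0 falls outside the fences, so it is removed as an outlier. Whether such a value is a genuine error or a meaningful observation remains a judgment call for the analyst.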
Data Collection and Longevity
- Data can be gathered through primary or secondary sources.
- Primary sources involve the researcher obtaining the data firsthand.
- Secondary sources involve the researcher obtaining data that has already been collected by other sources.
- Data analysis methodologies include data triangulation and data percolation.
- The longevity of data is an important concern in computer science, technology, and library science.
- Scientific research generates large amounts of data, but data stored on hard drives or optical discs may become unreadable after a few decades.
- Data accessibility is a problem, as much scientific data is never published or deposited in data repositories.
- Surveys have shown that the likelihood of retrieving data decreases over time after publication.
- The requirement for FAIR data (Findable, Accessible, Interoperable, and Reusable) aims to improve the reproducibility of research.