Glossary Term
Binary data
Binary Data Basics
- A discrete variable with only one state contains zero information.
- The bit, with two possible values, is the standard unit of information.
- The number of states in a collection of n bits is 2^n.
- The number of states of a collection of discrete variables grows exponentially with the number of variables.
- Ten bits have more states (1024) than three decimal digits (1000).
- Binary data consists of categorical data with two possible values.
- Binary data is often used to represent the outcome of an experiment or a yes-no question.
- Binary data is nominal data: the two values are qualitatively different and cannot be compared numerically.
- Binary data can also represent the presence or absence of a feature.
- In a two-party system, binary data can represent party choice in an election.
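The state counts above can be checked with a few lines of arithmetic. This is a minimal sketch; the helper names `state_count` and `information_bits` are illustrative and do not come from the glossary.

```python
import math


def state_count(states_per_variable: int, num_variables: int) -> int:
    """Joint states of `num_variables` discrete variables with equally many states each."""
    return states_per_variable ** num_variables


def information_bits(num_states: int) -> float:
    """Information, in bits, needed to pick one of `num_states` equally likely states."""
    return math.log2(num_states)


ten_bits = state_count(2, 10)       # n bits have 2^n states
three_digits = state_count(10, 3)   # three decimal digits

print(ten_bits, three_digits)       # ten bits have more states than three digits
print(information_bits(1))          # a single-state variable carries zero information
```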
Binary Variables
- Binary variables have two possible values.
- Each independent and identically distributed (i.i.d.) binary random variable follows a Bernoulli distribution.
- Total counts of i.i.d. binary variables follow a binomial distribution.
- Binary data need not come from i.i.d. variables.
- Counts of binary variables that are not i.i.d. need not follow a binomial distribution.
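The relationship between the Bernoulli and binomial distributions can be sketched with their probability mass functions. The function names and the value p = 0.3 below are assumptions chosen for illustration.

```python
from math import comb


def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for one binary variable with success probability p."""
    return p if x == 1 else 1 - p


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(total = k) for the sum of n i.i.d. binary variables."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = 0.3  # illustrative success probability
n = 5    # illustrative number of i.i.d. variables

# Probabilities of every possible total, from 0 successes up to n successes.
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(distribution)
```

With n = 1 the binomial distribution reduces to the Bernoulli distribution, which matches the statement that a single binary variable is a count with one trial.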
Counting and Conversion of Binary Data
- Binary data can be converted to a vector of count data with one coordinate per possible value, counting 1 for the value that occurs and 0 for the value that does not.
- Grouping binary data allows for counting the occurrences of each value.
- Binary data can be simplified to a single count by designating one value as "success" and the other as "failure" and counting the successes.
- Count data with n=1 is binary data.
- The success count of n i.i.d. binary variables follows a binomial distribution, where n is the total number of variables.
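The conversions in this list can be sketched as follows. The sample observations, the category labels, and the helper names are all hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical answers to a yes-no question.
observations = ["yes", "no", "yes", "yes", "no"]
categories = ("yes", "no")


def to_count_vector(value, categories=categories):
    """One coordinate per possible value: 1 for the value that occurs, 0 otherwise."""
    return tuple(1 if value == c else 0 for c in categories)


def grouped_counts(values, categories=categories):
    """Group the observations and count how often each value occurs."""
    counts = Counter(values)
    return tuple(counts[c] for c in categories)


def success_count(values, success="yes"):
    """Reduce the data to a single count by treating one value as success."""
    return sum(1 for v in values if v == success)


print(to_count_vector("yes"))        # count vector for a single observation (n = 1)
print(grouped_counts(observations))  # counts per value over the whole group
print(success_count(observations))   # single count of successes
```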
Binary Regression
- Binary regression is regression analysis in which the predicted outcome is a binary variable.
- Binomial regression can be used when binary data is converted to count data.
- Logistic regression and probit regression are common methods for binary regression.
- Multinomial regression models counts of i.i.d. categorical variables with more than two categories.
- Non-i.i.d. binary data can be modeled using more complex distributions like the beta-binomial distribution.
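Logistic and probit regression differ in the function that maps a linear predictor to a probability. The sketch below shows only that mapping, not model fitting; the intercept and slope are made-up values, and the function names are illustrative.

```python
import math


def logistic(z: float) -> float:
    """Inverse link used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def probit(z: float) -> float:
    """Inverse link used by probit regression: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predicted_probability(x, intercept, slope, inverse_link=logistic):
    """Probability that the binary outcome equals 1 for predictor value x."""
    return inverse_link(intercept + slope * x)


# Hypothetical coefficients, not estimated from any data.
intercept, slope = -1.0, 0.5

for x in (0.0, 2.0, 4.0):
    print(x,
          predicted_probability(x, intercept, slope, logistic),
          predicted_probability(x, intercept, slope, probit))
```

Both functions return values strictly between 0 and 1 and equal 0.5 when the linear predictor is zero, so either can serve as the probability of the outcome coded 1.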
Binary Representation and Formats
- In digital hardware, 1 and 0 typically represent two different voltage levels.
- By convention, a computer interprets the higher voltage as 1 and the lower voltage as 0.
- Different physical methods can be used to store these two states.
- Magnetic tapes with a coating of ferromagnetic material can store 1 and 0 data.
- The orientation of magnetic domains determines whether it is interpreted as 1 or 0.
- Textual data can be represented in binary format, such as compressed or formatted files.
- Image data can sometimes be represented in textual format, like the X PixMap image format.
- A binary format stores data as raw bits or bytes, without interpreting it as text.
- Textual formats may include formatting codes and other text-related elements.
- The choice between binary and textual formats depends on the nature of the data.
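The difference between a textual and a binary representation of the same value can be sketched with Python's standard `struct` module. The value 1024 and the 16-bit little-endian layout are assumptions chosen for illustration.

```python
import struct

value = 1024

# Textual format: each decimal digit is stored as one ASCII character.
as_text = str(value).encode("ascii")

# Binary format: the number is stored directly as a 16-bit unsigned integer
# ("<H" means little-endian unsigned short).
as_binary = struct.pack("<H", value)

# The same value written out as a string of 1s and 0s.
bits = format(value, "016b")

print(as_text, len(as_text))      # four bytes, one per digit
print(as_binary, len(as_binary))  # two bytes
print(bits)

# Reading the binary form back requires knowing the layout used to write it.
(decoded,) = struct.unpack("<H", as_binary)
print(decoded)
```

The textual form can be read without knowing the layout, while the binary form is more compact but only meaningful to a reader that knows how the bytes were packed.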