ASCII-American Standard Code for Information Interchange. A character encoding scheme used by many computers. The ASCII standard uses 7 of the 8 bits that make up a byte to define the codes for 128 characters. Example: in ASCII, the number seven is treated as a character and is encoded as 00110111. Because a byte can have a total of 256 possible values, an additional 128 characters can be encoded in a byte, but there is no formal ASCII standard for those additional 128 characters. Most IBM-compatible personal computers use an IBM extended character set that includes international characters, line- and box-drawing characters, Greek letters, and mathematical symbols.
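As a quick check of the encoding described in this entry, the following Python sketch prints the ASCII code of a character as a full byte. The helper name ascii_bits is hypothetical and is introduced only for illustration.

```python
def ascii_bits(ch: str) -> str:
    """Return the 8-bit binary string for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        # Codes above 127 fall outside the 7-bit ASCII standard.
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "08b")


print(ascii_bits("7"))  # 00110111 (decimal 55)
```

The leading bit is always 0 for ASCII characters, which reflects the standard's use of only 7 of the 8 bits.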
C-A programming language.
CBw.d-Instructions that the SAS System uses to read standard numeric values from column-binary files, translating the data into standard binary format. The w value specifies the width of the variable, usually 8, with a valid range of 1 through 32. The d value specifies the number of digits to the right of the decimal point in the numeric value.
card-Also known as deck, a physical record of data. A survey may have multiple cards for each respondent, all cards together comprising a logical record. Based on the IBM punch cards of 80-column length.
case-The unit of analysis in a particular data file. Can be an individual respondent to a questionnaire, a customer, or an industry. In the Roper Reports, each case is an interview respondent.
codebook-Description of the organization and content of a data file. Contains the code ranges and the code meanings needed to interpret the data file.
column binary-A code originally used with punched cards in which successive bits are represented by the presence or absence of punches in contiguous positions in columns. Using this method, responses to more than one question can be stored in a single column.
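To illustrate how one column can hold several responses, here is a minimal Python sketch that lists the punched rows of a single card column. It assumes, purely for illustration, that the column is held as a 12-bit integer with row 1 as the most significant bit; real column-binary files may order or pack the bits differently. The name punched_rows is hypothetical.

```python
from typing import List


def punched_rows(column_bits: int, n_rows: int = 12) -> List[int]:
    """List the punched rows (1-based) of one card column.

    Assumption: row 1 is the most significant of the n_rows bits.
    """
    return [
        row
        for row in range(1, n_rows + 1)
        if (column_bits >> (n_rows - row)) & 1
    ]


print(punched_rows(0b010000100000))  # [2, 7]
```

Here rows 2 and 7 are both punched, so the single column carries two separate responses.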
data dictionary-A file, part of a file, or part of a printed codebook containing information about a data file, including the name of the element, its format, location, and size.
Data Documentation Initiative (DDI)-An international committee sponsored by ICPSR that is developing a new metadata standard for social science documentation. This standard, developed by representatives from the international social science research community, is intended to fill the need for a structured codebook standard that will serve as an interchange format and permit the development of new Web applications. The Document Type Definition (DTD) for the DDI standard is written in XML (Extensible Markup Language) and is available at http://www.icpsr.umich.edu/DDI/.
deck-Also known as card, a physical record of data. A survey may have multiple decks for each respondent, all decks together comprising a logical record. Based on the IBM punch cards of 80-column length.
documentation-Information that accompanies a data file, describing the condition of the data, the creation of the file, the location and size of variables in the file, and the values (or codes) of the variables.
export file-A file produced by a software package that is designed to be read on another computer, often with a different operating system, running a version of the same software package.
HTML-HyperText Markup Language
ICPSR-Inter-university Consortium for Political and Social Research
informat-The instructions that specify how SAS reads the numbers and characters in a data file.
intermediate variable-A variable used when recoding data to input information from individual punches in multipunch data. Sets of intermediate variables are then recoded to produce final variables.
logical record-A complete unit of data for a particular unit of analysis, in this project a single respondent. Multiple physical records, called cards or decks, may make up a logical record.
missing value-A value code that indicates no data are present for a variable for a particular case. To be distinguished from non-response values (respondent refused to answer or was not asked the question) and from invalid responses (the response did not have a valid value code equivalent). Non-responses and invalid responses may or may not have value categories provided in the questionnaire and may be treated differently from true missing data during analysis.
multipunched-A way of recording data, originally used with punched cards, in which successive bits are represented by the presence or absence of punches in contiguous positions in columns. Using this method, responses to more than one question can be stored in a single column.
OCR-Optical character recognition
PDF-Portable Document Format, a published standard format developed by Adobe Systems, accessed with proprietary software.
punch card-A paper medium used for recording computer-readable data. The card is punched by a special machine called a keypunch that works like a typewriter, except that it punches holes in cards instead of typing characters on paper. The punch cards are then processed with a card reader that transfers the punched information to a computer-readable digital format.
PUNCH.d-Instructions that the SAS System uses to read whether a row of column-binary data is punched. The d value specifies which row in a card column to read. Valid values for d are 1 through 12. The informat assigns 1 to the variable if the row is punched and 0 if it is not.
questionnaire-The set of questions asked in a survey. In the Yale Roper Collection, the questionnaire, with columns and codes written next to the question, may substitute for a codebook.
recode-Changing the value code of a variable from one value to another. For example, changing 0 and 1 values in column binary data files to value ranges of 0 through 12. Also known as data transformation.
respondent-In survey research, the person responding to the survey questions.
ROWw.d-Instructions that the SAS System uses to read a column-binary field down a card column. The w value specifies the row where the field begins, with a range between 1 and 12. The d value specifies the length in rows of the field. Valid values for d are 1 through 25, with a default of 1. The informat assigns the relative position of the punch in the field range to a numeric variable.
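The "relative position" idea can be sketched in Python as follows. This is an illustrative analogue, not SAS's implementation: the function name read_row_field is hypothetical, the column is modeled as a set of punched row numbers, and returning None when the field has no punch or more than one punch is an assumption made for this sketch.

```python
from typing import Optional, Set


def read_row_field(punched: Set[int], w: int, d: int = 1) -> Optional[int]:
    """Return the 1-based position of the single punch within rows w..w+d-1."""
    field = range(w, w + d)
    hits = [row - w + 1 for row in field if row in punched]
    # Assumption for this sketch: zero or multiple punches yield no value.
    return hits[0] if len(hits) == 1 else None


print(read_row_field({5}, w=3, d=4))     # 3 (row 5 is the third row of rows 3-6)
print(read_row_field({1, 5}, w=3, d=4))  # 3 (row 1 lies outside the field)
print(read_row_field({3, 5}, w=3, d=4))  # None
```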
SAS-Set of proprietary computer programs used for analysis of social science statistical data. (No longer an acronym; originally stood for Statistical Analysis System.)
SGML-Standard Generalized Markup Language
SPSS-Statistical Package for the Social Sciences. Set of proprietary computer programs used for analysis of social science statistical data.
single-punch-A single response coded in a column.
split sample-A method of data collection in which one group of respondents is queried with one form of a questionnaire and a second group is queried with a different form of the questionnaire.
spread-Recoding multiple responses that have been coded in a single column of a record to a separate column for each response.
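A spread can be sketched in Python as turning the punches of one column into one 0/1 variable per row. The function name spread and the variable names r1, r2, and so on are hypothetical, chosen only for this example.

```python
from typing import Dict, Iterable


def spread(punched_rows: Iterable[int], n_rows: int = 12) -> Dict[str, int]:
    """Map one multipunched column to a separate 0/1 variable for each row."""
    punched = set(punched_rows)
    return {f"r{row}": int(row in punched) for row in range(1, n_rows + 1)}


print(spread([1, 3], n_rows=4))  # {'r1': 1, 'r2': 0, 'r3': 1, 'r4': 0}
```

Each resulting variable can then be analyzed as an ordinary single-response item.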
system file-A data file or collection of data files specifically formatted for a particular software package; may not be readable by other software packages.
TIFF-Tagged Image File Format
values-The numeric or character equivalents for a particular variable in a data file.
variable-An item in a data file to which a value has been assigned. A data file contains the values of certain variables measured for a set of cases. In the Roper Report data files, variables are responses to questions or parts of questions from each person interviewed.
XML-Extensible Markup Language, a data format for structured document interchange on the Web.
xray-A form of output that is organized by card, column, and row; each bit has its own unique location within this framework. The total number of punched bits across all observations is recorded for each location in the data set. This sum often provides a response frequency for individual response options.
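The tally behind an xray can be sketched in Python as a count of punches per (card, column, row) location across all observations. The function name xray and the tuple layout are assumptions made for illustration; the sample data are invented solely to exercise the code.

```python
from collections import Counter
from typing import Iterable, Tuple

Location = Tuple[int, int, int]  # (card, column, row)


def xray(observations: Iterable[Iterable[Location]]) -> Counter:
    """Count, for each location, how many observations have a punch there."""
    counts: Counter = Counter()
    for punches in observations:
        # set() ensures a location is counted at most once per observation.
        counts.update(set(punches))
    return counts


# Invented sample: three respondents, punches on card 1.
sample = [
    [(1, 10, 1), (1, 10, 3)],
    [(1, 10, 1)],
    [(1, 10, 3), (1, 11, 2)],
]
counts = xray(sample)
print(counts[(1, 10, 1)])  # 2
```

If row 1 of column 10 represents one response option, the count of 2 is the frequency of that option in the sample.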
Sources of Glossary Terms
Armor, David J., and Arthur S. Couch. 1972. Data-text Primer: An Introduction to Computerized Social Data Analysis. New York: The Free Press.
Dodd, Sue A. 1982. Cataloging Machine-readable Data Files: An Interpretive Manual. Chicago: American Library Association.
Dodd, Sue A., and Ann M. Sandberg-Fox. 1985. Cataloging Microcomputer Files: A Manual of Interpretation for AACR2. Chicago: American Library Association.
Geda, Carolyn L. [n.d.] Data Preparation Manual. Sponsored by John D. Peine, Project Coordinator, Heritage Conservation and Recreation Service, U.S. Department of the Interior.
Jacobs, Jim. Glossary of Selected Social Science Computing Terms and Social Science Data Terms. University of California, San Diego. Available at http://odwin.ucsd.edu/glossary/index.html.
SAS Institute. 1990. SAS Language: Reference. Version 6, 1st ed. Cary, NC: SAS Institute.
Sippl, Charles J. 1966. Computer Dictionary. Indianapolis: Howard W. Sams & Co., Inc.
Sippl, Charles J., and Roger J. Sippl. 1980. Computer Dictionary. 3rd ed. Indianapolis: Howard W. Sams & Co., Inc.
Spencer, Donald D. 1968. Computer Programmer’s Dictionary and Handbook. Waltham, MA: Blaisdell Publishing Company.
SPSS, Inc. 1988. SPSS-X User’s Guide. 3rd ed. Chicago: SPSS, Inc.
Weik, Martin H. 1969. Standard Dictionary of Computers and Information Processing. New York: Hayden Book Company.