
### RDM Services at Your Library

Need help managing your research data? Library staff at your institution can help with every phase of the data lifecycle. Services may include:

- developing data management plans
- documenting research data
- sharing and long-term preservation of data
- using online research data repositories (including Scholars Portal Dataverse)
- publishing options
- author rights

Please select your institution below to contact local research data services:

#### Ontario

Brock University

Carleton University

Lakehead University

Laurentian University

McMaster University

Nipissing University

#### Quebec

#### Western Canada

### Terminology

**Cases** – These are the units of analysis, things that have certain characteristics or properties. For example, the cases could be individuals in a statistics class, all residents of Chicago, hamburgers, cities, countries, organizations, or lakes. We want to reach a conclusion about their characteristics.

**Data** refers broadly to quantitative data *and* other research files (e.g., field notes, ethnographic descriptive text, images). Dataverse accepts all kinds of data and files.

**Tabular data** is quantitative data (numbers) arranged in a table. Dataverse can run statistical analyses only on tabular data files. Accepted file formats are SPSS (.por and .sav), Stata, CSV (with an SPSS card), and TAB (with DDI). Dataverse will maintain the usability of tabular data files over time: for example, if .sav files become obsolete, Dataverse will republish deposited data in new, usable formats.

**Network data** is represented in XML files. These files contain information about network properties (nodes and edges). Network data is used for network analysis (e.g., social network analysis). Dataverse can visualize network data from GraphML files.
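To make "nodes and edges" concrete, here is a minimal sketch that reads a tiny GraphML file with Python's standard library and counts each node's ties (its degree). The three-person network is made up for illustration, and the sketch only handles plain `<node>` and `<edge>` elements, not GraphML attributes or nested graphs; it is not how Dataverse itself processes network files.

```python
import xml.etree.ElementTree as ET

# Hypothetical GraphML: three people (nodes) linked by two ties (edges).
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="alice"/>
    <node id="bob"/>
    <node id="carol"/>
    <edge source="alice" target="bob"/>
    <edge source="bob" target="carol"/>
  </graph>
</graphml>"""

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_graphml(text):
    """Return (node ids, edge pairs) parsed from a GraphML string."""
    root = ET.fromstring(text)
    nodes = [n.get("id") for n in root.iterfind(".//g:node", NS)]
    edges = [(e.get("source"), e.get("target"))
             for e in root.iterfind(".//g:edge", NS)]
    return nodes, edges


nodes, edges = read_graphml(GRAPHML)

# Degree (number of ties per node), a basic network-analysis measure.
degree = {node: 0 for node in nodes}
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

print(degree)  # {'alice': 1, 'bob': 2, 'carol': 1}
```

Because the graph is undirected, each edge adds one to the degree of both of its endpoints.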

**Frequency** – The frequency of a value is the number of cases that fall into that category; it is also called a “count”.
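As a small illustration, the sketch below tallies the frequency of each value for a hypothetical COOKED variable recorded for six hamburgers (the cases). The responses are invented for the example.

```python
from collections import Counter

# Hypothetical values of the COOKED variable for six hamburgers (cases).
cooked = ["rare", "medium", "well-done", "medium", "rare", "medium"]

# Counter tallies how many cases fall into each category.
frequencies = Counter(cooked)

# most_common() lists categories from highest to lowest count.
for value, count in frequencies.most_common():
    print(f"{value}: {count}")
```

This prints `medium: 3`, `rare: 2`, and `well-done: 1`. The frequencies always add up to the total number of cases (here, six).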

**Metadata** – text that describes your research study. Metadata fields include the abstract, keywords, and data collection mode, among others. Every metadata field in Dataverse is defined on the site itself, and all fields comply with version 2 of the DDI (Data Documentation Initiative) standard schema. For an overview of DDI standards, visit ddialliance.org. To view a complete list of DDI fields in Dataverse, see this document (PDF).

**Values** – These are the possible outcomes for a single variable. They can differ from case to case. Values can be numbers or named categories. For example, the variable GENDER traditionally has two values, “man” and “woman”. Some people (cases) are men, and some are women.

**Variable** – This is the characteristic or property in which we are interested. It is a characteristic that pertains to the cases. A variable must be able to take on different values for different cases. Variables include characteristics like people’s GENDER, people’s HEIGHT, the DEPTHS of lakes, lake TEMPERATURES, organizations’ REVENUES, and whether a hamburger is COOKED rare, medium, or well-done. Often we look at two or more different variables at a time and ask whether they are related for a specific set of cases. For example, we might want to know if GENDER is related to HEIGHT among human beings, or if TEMPERATURE is related to DEPTH for lakes.

**Variable: Character** – In this level of measurement, the values of the variable are “qualities” or categoric pigeonholes, which may or may not be orderable. These categoric values can be given code numbers, but the numbers do not refer to an equal-interval scale or to real quantities. Generally, we cannot compute a mean or other quantitative summary measures for the variable. These categories should be exhaustive and mutually exclusive.
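The point about code numbers can be shown with a short sketch. Here a hypothetical PROVINCE variable is stored as codes 1, 2, and 3; the coding scheme and responses are invented for illustration. Averaging the codes gives a number that matches no province, so the mode (the most frequent category) is used instead.

```python
from statistics import mode

# Hypothetical code numbers for a categorical variable (PROVINCE).
# The numbers are labels only: 2 is not "twice" 1.
PROVINCE_CODES = {1: "Ontario", 2: "Quebec", 3: "British Columbia"}

responses = [1, 1, 3, 2, 1, 3]  # one coded value per case

# The mean of these codes (about 1.83) would not correspond to any province.
# The mode, the most frequent category, is a meaningful summary.
most_common_code = mode(responses)
print(PROVINCE_CODES[most_common_code])  # Ontario
```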

**Variable: Continuous** – This level of measurement is like the interval-ratio level. The values of the variable are quantitative, definite meaningful numbers on a scale. Furthermore, we can think of them as points along a continuum that can be subdivided forever. Measuring length or distance with a ruler is a simple example of collecting data at a continuous level of measurement. Most researchers treat percentages and other kinds of proportions as continuous data. It makes sense to compute a mean and other quantitative summary measures for these data.

**Variable: Discrete** – These are quantitative variables whose values fall along a scale or metric, often with a true 0, but they are not really continuous. The units of measurement are whole numbers, and it makes little sense to subdivide them indefinitely. For instance, people generally don’t have a fraction of a sibling or a fractional number of body piercings; only whole numbers make sense. These data are discrete, but notice that the numbers do refer to a real scale (not just code numbers), and most researchers end up treating them as if they were continuous data. It makes sense to compute a mean (and other quantitative summary measures). For example, we can talk about the “mean number of children born to women in the Yukon” and come up with a fractional amount, although each woman has only a whole number of children.
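The children example can be sketched in a few lines. The counts below are made up and are not real Yukon data; they simply show that a mean of whole-number values can itself be fractional.

```python
from statistics import mean

# Made-up counts of children for five women (discrete: whole numbers only).
children = [0, 2, 1, 3, 2]

# 8 children across 5 women gives a fractional mean.
avg = mean(children)
print(avg)  # 1.6
```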

From Garner, R. (2005). *The joy of stats: A short guide to introductory statistics in the social sciences*. Peterborough, ON: Broadview Press.

### Other Guides

Scholars Portal: odesi

Harvard Dataverse Project: Advanced User Guide

### Resources for Statistics and Data

**Introductory:**

Garner, R. (2010). *The joy of stats: A short guide to introductory statistics in the social sciences*. Toronto, ON: University of Toronto Press.

Rowntree, D. (2000). *Statistics without tears: A primer for non-mathematicians*. London: Penguin.

**Intermediate:**

Blaikie, N. W. H. (2003). *Analyzing quantitative data: From description to explanation*. London: Sage Publications.

Erickson, B. H., & Nosanchuk, T. A. (1992). *Understanding data* (2nd ed.). Toronto, ON: University of Toronto Press.