Many of the evaluation techniques we describe involve collecting information from or about discrete units, such as trees, streets, blocks, or residents. In many cases, it may not be practical to perform a complete census of every unit in the overall population. However, it is still possible to obtain reliable information about the overall population by collecting data from a representative subset or sample. Sampling is simply the technique used to choose representative units for study from a larger population. Sampling is a prerequisite of several of the assessment methods discussed in section 3, including photogrammetry, ground survey, and public polling.

The reason for using statistically sound sampling methods is to avoid **bias**
in the estimates of the parameter(s) you are measuring. Although the value of
any single estimate (biased or not) is unlikely to equal the true population
value, the mean of a large number of unbiased estimates will approximate the
true value. In contrast, the mean of a large number of biased estimates will
either be higher or lower than the true population value, depending on the direction
of the bias. Hence, if you are interested in knowing the actual value of a parameter
from the population (e.g., actual percent tree canopy cover), you generally
want to use an unbiased estimator of that parameter. In some situations, a small
bias (e.g., a tendency to slightly over- or underestimate cover) can be tolerated
if the bias is small relative to the standard deviation of the estimation errors
(perhaps 10% to 15% or less).
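
The difference between biased and unbiased estimates can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the true canopy cover (30%), the number of dots per estimate, and the `shadow_error` rate (the chance that a non-canopy dot is misread as canopy) are all hypothetical, as are the function names.

```python
import random

def estimate_cover(true_cover, n_dots, shadow_error, rng):
    """Return one simulated dot-grid estimate of canopy cover.

    shadow_error is a hypothetical chance that a non-canopy dot is
    misread as canopy (e.g., because it falls on a tree shadow).
    """
    hits = 0
    for _ in range(n_dots):
        is_canopy = rng.random() < true_cover
        if not is_canopy and rng.random() < shadow_error:
            is_canopy = True  # misclassification: shadow counted as canopy
        hits += is_canopy
    return hits / n_dots

def mean_of_estimates(true_cover, shadow_error, n_estimates=2000, n_dots=100, seed=1):
    """Average many independent estimates made with the same method."""
    rng = random.Random(seed)
    estimates = [estimate_cover(true_cover, n_dots, shadow_error, rng)
                 for _ in range(n_estimates)]
    return sum(estimates) / n_estimates

TRUE_COVER = 0.30  # assumed true canopy cover (30%)
unbiased_mean = mean_of_estimates(TRUE_COVER, shadow_error=0.0)
biased_mean = mean_of_estimates(TRUE_COVER, shadow_error=0.05)

print(f"true cover:                 {TRUE_COVER:.3f}")
print(f"mean of unbiased estimates: {unbiased_mean:.3f}")
print(f"mean of biased estimates:   {biased_mean:.3f}")
```

Individual estimates scatter around the true value in both cases, but only the unbiased method averages out close to it. The biased method settles on a value that is consistently too high, no matter how many estimates are averaged.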

Bias in estimates can come from various sources. For instance, if tree shadows are counted as canopy in aerial photo interpretation (misclassification bias), the canopy cover estimate will be biased upward. In public polling, people who fail to respond to a survey may constitute a source of sampling bias. If some segment(s) of the population (e.g., retirees, working couples, low-income households) are either more or less likely to respond than other population segments, responses may not be representative of the population as a whole. Many types of bias can be avoided through good sampling design and the careful implementation of appropriate evaluation techniques.

Most statistical methods are based on the assumption of **random sampling**.
This simply means that every unit in the population has an equal chance of being
chosen for the sample. Furthermore, the selection of random units should be
**independent** of other units that have been sampled. If you reject a sample
unit because you think it is too close to one already chosen, your sample will
not be random and independent. A relatively simple and reliable method for randomization
is to use random numbers. Most spreadsheet, database, and statistical programs
that run on personal computers have functions that generate random numbers.
Although these random number generators may not be optimal, they will generally
suffice. You can also obtain random numbers from online generators (e.g., https://www.random.org).

Several techniques can be used to draw a random sample from a population that consists of individual objects or records (e.g., street addresses or tree numbers). Many spreadsheet programs include tools that can produce a random sample of a specified size from a range of cells. Alternatively, you can assign a unique random number to each unit or record, sort on the random number, and pick the required number of units from the top of the sorted database.
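
The "assign a random number, sort, and take the top" approach can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name `draw_random_sample` and the list of 500 tree numbers are hypothetical.

```python
import random

def draw_random_sample(units, sample_size, seed=None):
    """Draw a simple random sample by sorting units on random numbers."""
    if sample_size > len(units):
        raise ValueError("sample_size cannot exceed the number of units")
    rng = random.Random(seed)
    # Pair each unit with its own random number (the random "sort key").
    keyed = [(rng.random(), unit) for unit in units]
    # Sort on the random number only, then take units from the top.
    keyed.sort(key=lambda pair: pair[0])
    return [unit for _, unit in keyed[:sample_size]]

# Hypothetical inventory: street trees numbered 1 through 500.
tree_ids = list(range(1, 501))
sample = draw_random_sample(tree_ids, 25, seed=42)
print(sorted(sample))
```

Python's standard library also offers `random.sample(tree_ids, 25)`, which draws a sample without replacement in one step. The longer version is shown because it mirrors the spreadsheet procedure described above.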

In some cases, it is necessary to take random samples across a geographic area, such as part or all of a city or forested area. In such a situation, random sample points can be assigned by randomly sampling from a coordinate grid that has been established for the area in question. This may either be an existing set of map-based coordinates, such as UTM or State Plane grids, or an arbitrary grid based on units measured on a map or aerial photograph (e.g., distances measured from the bottom and left edge of the map or photo). After you have determined the range of X and Y coordinates within the area to be sampled, X and Y coordinates can be selected randomly to generate random sample points.
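
A minimal sketch of this coordinate-based approach follows. The coordinate ranges are made-up UTM-style values, and the optional `inside` function (which would test whether a point actually falls within an irregular study area) is a hypothetical helper that you would need to supply.

```python
import random

def random_points(x_range, y_range, n_points, inside=None, seed=None):
    """Generate random (x, y) sample points within a rectangular range.

    inside: optional function (x, y) -> bool that returns True when the
    point lies within the study area. Points outside are redrawn, so the
    function must accept at least part of the rectangle.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n_points:
        x = rng.uniform(x_range[0], x_range[1])
        y = rng.uniform(y_range[0], y_range[1])
        if inside is None or inside(x, y):
            points.append((x, y))
    return points

# Hypothetical easting/northing ranges (meters) that bound the study area.
x_range = (630000.0, 636000.0)
y_range = (4260000.0, 4265000.0)
points = random_points(x_range, y_range, 10, seed=7)
for x, y in points:
    print(f"{x:.0f} E, {y:.0f} N")
```

Discarding points that fall outside the study area does not compromise randomness, because the rule does not depend on where other sample points landed. Discarding a point because it is close to one already chosen would.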

In stratified sampling, the population is divided into subgroups (strata), and
each stratum is sampled separately. If strata are assigned so that each is more
or less homogeneous with respect to the characteristic(s) being measured, fewer
samples will be needed to adequately characterize each stratum. For instance,
if tree cover is to be assessed in different portions of a city, visual
estimates of the tree canopy cover could be used to help demarcate zones
where canopy cover is relatively uniform. A sample of street trees might be
stratified by tree species, size, and/or age, depending on the purpose of the
evaluation. If these trees were classified in a municipal street tree database,
stratification might be accomplished relatively simply from existing tree data.
However, if such data are lacking, it may be necessary to conduct a preliminary
sample to delineate the population before sampling occurs. For example, in a
study we conducted on utility pruning, we needed to sample from a population
of matched pairs of London plane (*Platanus* x *acerifolia*) street
trees that were both directly under conductors and had clearances within a certain
range. Because existing tree inventories did not contain all of the necessary
information, we surveyed the study area to identify a population of trees that
met these criteria. These trees constituted a particular stratum of the street
tree population.

Once strata are assigned and delineated, samples are drawn at random from within each stratum. If the number of samples selected from each stratum is not proportional to the size of the stratum, then the averages from each will have to be weighted to obtain an overall population average.
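
The weighting step can be sketched as follows, assuming each stratum's weight is its share of the total population (or total area). The three zones, their sizes, and the canopy cover readings are invented for illustration.

```python
def stratified_mean(strata):
    """Combine stratum means into one population mean.

    strata: list of (stratum_size, sample_values) pairs, where
    stratum_size is the size of the whole stratum (e.g., its area or
    number of units), not the number of samples taken from it.
    """
    total_size = sum(size for size, _ in strata)
    weighted_mean = 0.0
    for size, values in strata:
        stratum_mean = sum(values) / len(values)
        weighted_mean += (size / total_size) * stratum_mean
    return weighted_mean

# Hypothetical zones: (area in hectares, sampled percent canopy cover)
zones = [
    (600, [10, 12, 14]),   # large zone with sparse canopy
    (300, [30, 34]),       # mid-sized zone with moderate canopy
    (100, [50, 60, 70]),   # small zone with dense canopy
]

all_values = [v for _, values in zones for v in values]
print(f"unweighted mean of samples: {sum(all_values) / len(all_values):.1f}")
print(f"weighted population mean:   {stratified_mean(zones):.1f}")
```

In this made-up case the small, densely treed zone was sampled as heavily as the largest zone, so the unweighted mean of all samples overstates citywide cover. Weighting by zone area corrects for that.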

In general:

- up to a point, the reliability of estimates will increase as sample size increases;
- the more variable the population is with respect to the characteristic(s) being rated, the larger the sample should be;
- a large sample is required to accurately estimate the frequencies of relatively rare events or characteristics;
- larger sample sizes are needed in order to detect relatively small differences between means or proportions; smaller sample sizes may suffice if the differences are relatively large.

The optimum sample size represents a compromise between cost and accuracy, since both generally increase with increasing sample size. You can determine an optimum sample size by identifying the point of diminishing returns beyond which further increases in accuracy are not worth the additional costs of data collection. Optimum sample size will vary with the type of data being collected, so it is not possible to set a single number for all applications.
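
The point of diminishing returns can be seen by tabulating cost and precision side by side. The sketch below assumes simple random sampling of a proportion (such as canopy cover from a dot grid) and uses the standard error, sqrt(p(1-p)/n), as the measure of precision. The expected cover of 30% and the cost per sample point are hypothetical.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a proportion estimated from n random samples."""
    return math.sqrt(p * (1.0 - p) / n)

expected_cover = 0.30   # assumed proportion (30% canopy cover)
cost_per_point = 2.0    # hypothetical cost units per sample point

for n in (50, 100, 200, 400, 800):
    se = standard_error_of_proportion(expected_cover, n)
    print(f"n={n:4d}  cost={n * cost_per_point:7.1f}  standard error={se:.4f}")
```

Cost rises in direct proportion to sample size, but the standard error shrinks only with the square root of the sample size. Halving the standard error therefore requires four times as many samples.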

However, you can use certain statistical formulas to estimate the **minimum**
sample size needed for a specific purpose. A number of statistics web sites
include on-line interactive calculators that allow you to estimate required
sample sizes. Before you can use these sample size calculators, you will need
to know several things about the data you are collecting and how it will be
analyzed:

**Type of data.** Main data types include:

- *continuous* - variables can take any value, e.g., tree diameters
- *discrete* - variables can only have certain discrete values. Types of discrete data include:
  - *ranks* - ordered ratings, e.g., low, moderate, high
  - *counts* - e.g., number of trees by species
  - *binary* - variables have only two outcomes, e.g., present/absent. Binary data is typically expressed as proportions or percents, such as the percent canopy cover determined from dot grid counts (canopy is rated as present or absent for each dot).

**Type of analysis.** Continuous data are
typically analyzed using linear models, including linear regression and analysis
of variance techniques. Discrete data may be analyzed in various ways, including
contingency table analysis, logistic regression, and survival analysis. Different
formulas are used to estimate sample sizes for various analysis methods.

**Expected values.** To estimate sample sizes
for analyses of continuous data you will have to specify estimates of expected
population means (the Greek letter mu may be used for this term) and standard
deviations or variances (the Greek letter sigma symbolizes the population
standard deviation; variance is the square of the standard deviation). For
proportions, estimates of the expected proportions are needed; margins of
error (as percents) may also be needed.

**Data structure.** If data are paired or
arranged in blocks or other more complex designs, the structure of the statistical
model should be specified.

**Confidence level.**
The confidence level is expressed as (1-alpha), where the Greek letter alpha
(the significance level) is the probability of Type I error, the chance that
you will say that a difference is significant when it really isn't (i.e., the
probability of rejecting the null hypothesis when it is true). Alpha is
typically set at a low level, often 5% (alpha=0.05), meaning that there would
only be a 5% (1 in 20) chance of deciding that a spurious difference is real
(i.e., you have a 95% chance of avoiding Type I error).

**Power.**
This parameter is the counterpart of the confidence level, and is expressed
as (1-beta), where beta is the probability of Type II error (failing to detect
a difference that is real). Power is the probability of detecting a real
difference (i.e., the probability of rejecting the null hypothesis when it is
false). If you are interested in detecting real differences, the power of a
test should be high, generally 80% (0.8) or greater.
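
To show how these inputs fit together, the sketch below applies two widely used textbook approximations based on the normal distribution: one for estimating a proportion within a given margin of error, and one for comparing the means of two groups. It is not the method used by any particular calculator. The function names and example values are hypothetical, and dedicated calculators (which often use the t distribution) may return slightly larger numbers. It requires Python 3.8 or later for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def n_for_proportion(expected_p, margin_of_error, alpha=0.05):
    """Samples needed to estimate a proportion within +/- margin_of_error.

    Normal approximation: n = z^2 * p * (1 - p) / e^2, with z taken at
    the (1 - alpha/2) quantile of the standard normal distribution.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n = z ** 2 * expected_p * (1.0 - expected_p) / margin_of_error ** 2
    return math.ceil(n)

def n_per_group_for_means(sigma, difference, alpha=0.05, power=0.80):
    """Samples per group needed to detect a difference between two means.

    Normal approximation for a two-sided test with equal group sizes:
    n = 2 * (z_alpha + z_power)^2 * sigma^2 / difference^2
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * (z_alpha + z_power) ** 2 * sigma ** 2 / difference ** 2
    return math.ceil(n)

# Hypothetical case 1: canopy cover expected near 30%, wanted within
# +/- 5 percentage points at a 95% confidence level (alpha = 0.05).
print(n_for_proportion(expected_p=0.30, margin_of_error=0.05))

# Hypothetical case 2: detect a 5 cm difference in mean trunk diameter
# between two groups when the standard deviation is about 10 cm.
print(n_per_group_for_means(sigma=10.0, difference=5.0))
```

Lowering alpha, raising power, or trying to detect a smaller difference all increase the required sample size, which is consistent with the general rules listed earlier.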

Some useful web sites with sample size calculators are listed below. Additional sites can be found by following links on some of these pages or by searching on the term "sample size" on various web search engines.

http://www.stat.uiowa.edu/~rlenth/Power/
: **Russ Lenth's Java applets for power and sample size** - This site provides
a variety of powerful but easy to use applets that allow you to calculate sample
size and interactively see how sample size, power, alpha, and other study design
factors are interrelated.

http://www.quantitativeskills.com/sisa/
: **SISA: Simple Interactive Statistical Analysis** - This site includes
a number of statistical analysis applications that can be run interactively
online. It includes sample size calculators for both continuous and binary (proportion)
data.

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
: **Power and Sample Size Estimation** - A downloadable application (PS)
for calculating sample size and power.